...
The outline of this page is :
1) Problem description
2) Program flow
3) Test data file and caveats
Data date of predefined data set is: 2019-10-15 till 2019-10-17
1) Description
The WIGOS id contains four parts such as 0-2XXXX-0-YYYYY,
...
old stations and their WIGOS ids.
2)Program description
Code Block | ||
---|---|---|
| ||
''' Created on 22 Oct 2019 # Copyright 2005-2018 ECMWF. # This software is licensed under the terms of the Apache Licence Version 2.0 # which can be obtained at http://www.apache.org/licenses/LICENSE-2.0. # In applying this licence, ECMWF does not waive the privileges and immunities # granted to it by virtue of its status as an intergovernmental organisation # nor does it submit to any jurisdiction This is a test program to encode Wigos Synop requires 1) ecCodes version 2.14.81 or above (available at https://confluence.ecmwf.int/display/ECC/Releases) 2) python3.6.8-01 To run the program -i <input bufr >./addWigosProg.py -m <mode [web|json]> -l <logFile> -o <output BUFR file>i synop_multi_subset.bufr -o out_synop_multisubset.bufr -w WIGOS_TEMP_IDENT.csv Uses BUFR version 4 template and adds the WIGOS Identifier 301150 REQUIRES TablesVersionNumber above 28 Author : Roberto Ribas Garcia ECMWF 28/10/2019 ''' from eccodes import * import argparse import json import re import pandas as pd import numpy as np import Modifications performance improvement ( uses skipExtraKeyAttributes) and codes_clone 04/11/2019 changes for SYNOP and TEMP messages 05/11/2019 fixed codes_clone issue 05/11/2019 ''' from eccodes import * import argparse import json import re import pandas as pd import numpy as np import logging import requests import os def read_cmd_line(): p=argparse.ArgumentParser() p.add_argument("-i","--input",help="input bufr file") p.add_argument("-o","--output",help="output bufr file with wigos") p.add_argument("-m","--mode",choices=["web","json"],help=" wigos source [ json file or web ]") p.add_argument("-l","--logfile",help="log file ") args=p.parse_args() return args def read_oscar_json(jsonFile): with open(jsonFile,"r") as f: jtext=json.load(f) return jtext def read_oscar_web(oscarURL="https://oscar.wmo.int/surface/rest/api/search/station?"): r=requests.get(oscarURL) jtext=json.loads(r.text) return jtext def parse_json_into_dataframe(jtext): ''' parses the JSON from the file wigosJsonFile filters the stations by wigosStationIdentifiers key in the dictionaries ''' wigosStations=[] nowigosStations=[] for d in jtext: if "wigosStationIdentifiers" in d.keys(): wigosStations.append(d) else: nowigosStations.append(d) ''' uses only the wigos 0-20XXX-0-YYYYY (surface) ''' p=re.compile("0-20\d{3}-0-\d{5}") fwigosStations=[] for d in wigosStations: wigosInfo=d["wigosStationIdentifiers"] for e in wigosInfo: if e["primary"]==True: wigosId=e["wigosStationIdentifier"] if p.match(wigosId): wigosParts=wigosId.split("-") d["wigosIdentifierSeries"]=wigosParts[0] d["wigosIssuerOfIdentifier"]=wigosParts[1] d["wigosIssueNumber"]=wigosParts[2] d["wigosLocalIdentifierCharacter"]=wigosParts[3] d["oldID"]=wigosParts[3][-5:] fwigosStations.append(d) df=pd.DataFrame(fwigosStations) df=df[["longitude","latitude","name","wigosStationIdentifiers","wigosIdentifierSeries","wigosIssuerOfIdentifier","wigosIssueNumber", "wigosLocalIdentifierCharacter","oldID"]] return df def get_ident(bid): ''' gets the ident of the message by combining blockNumber and stationNumber keys from the input BUFR file the ident may be single valued or multivalued ( only single valued are considered further) considered further) ''' ident=None if ( codes_is_defined(bid, "blockNumber") and codes_is_defined(bid,"stationNumber") ): ''' ident=None if ( blockNumber=codes_isget_definedarray(bid, "blockNumber") and stationNumber=codes_isget_definedarray(bid,"stationNumber") if len(blockNumber)==1 and len(stationNumber): ==1: blockNumber=codes_get_array(bid,"blockNumber") ident="{0:02d}{1:03d}".format(int(blockNumber),int(stationNumber)) elif stationNumber=codes_get_array(bid,"stationNumber") len(blockNumber)==1 and len(stationNumber)!=1: if len(blockNumber)==1 and =np.repeat(blockNumber,len(stationNumber))==1: ident=[str("{0:02d}{1:03d}".format(intb,s)) for b,s in zip(blockNumber),int(stationNumber)) elif len(blockNumber)==1 if b!=CODES_MISSING_LONG and len(stationNumber)s!=1:CODES_MISSING_LONG] elif len(blockNumber)!=1 blockNumber=np.repeat(blockNumber,and len(stationNumber))!=1: ident=[str("{0:02d}{1:03d}".format(b,s)) for b,s in zip(blockNumber,stationNumber) if b!=CODES_MISSING_LONG and s!=CODES_MISSING_LONG] elif len(blockNumber)!=1 and len(stationNumber)!=1:''' ident=[str("{0:02d}{1:03d}".format(b,s)) for b,s in zip(blockNumber,stationNumber) here only the first element of the list is returned to the main program this avoids lists being used in the if b!=CODES_MISSING_LONG and s!=CODES_MISSING_LONG] dataframe query and breaking the logic ''' return ident def copy_header(bid,obidif isinstance(ident,list): ''' this function copies the header keys ''' ident=ident[0] return ident bhc=codes_get(bid,"bufrHeaderCentre") codes_set(obid,"bufrHeaderCentre",bhc) def add_wigos_info(ident,bid,odf,obid): bhsc=codes_get(bid,"bufrHeaderSubCentre")''' codes_set(obid,"bufrHeaderSubCentre",bhsc) usn=codes_get(bid,"updateSequenceNumber") codes_set(obid,"updateSequenceNumber",usn) dc=codes_get(bid,"dataCategory")add the wigos information to the message ident pointed by bid the odf contains the WIGOS information for ident codes_set(obid,"dataCategory",dc) obid is the output handle dsc=codes_get(bid,"dataSubCategory") ''' codes_set(obid,"dataSubCategory",dsc) year=if codes_is_getdefined(bid, "typicalYearshortDelayedDescriptorReplicationFactor"): codes_set(obid,"typicalYear",year) monthshortDelayed=codes_get_array(bid,"typicalMonthshortDelayedDescriptorReplicationFactor") codes_set(obid,"typicalMonth",month) else: day=codes_get(bid,"typicalDay") shortDelayed=None if codes_is_setdefined(obidbid, "typicalDaydelayedDescriptorReplicationFactor",day): hour delayedDesc=codes_get_array(bid,"typicalHourdelayedDescriptorReplicationFactor") codes_set(obid,"typicalHour",hour) else: delayedDesc=None tmin=codes_get(bid,"typicalMinute") if codes_is_setdefined(obidbid, "typicalMinuteextendedDelayedDescriptorReplicationFactor",tmin): sec extDelayedDesc=codes_get_array(bid,"typicalSecondextendedDelayedDescriptorReplicationFactor") codes_set(obid,"typicalSecond",sec) else: return extDelayedDesc=None def add_wigos_info(ident,bid,wdf,obid): nsubsets=codes_get(bid,"numberOfSubsets") '''compressed=codes_get(bid,"compressedData") add the wigos information to the message ident pointed by bid masterTablesVersionNumber=codes_get(bid,"masterTablesVersionNumber") if masterTablesVersionNumber<28: the wdf is the wholemasterTablesVersionNumber=28 wigos dataframe and obid is the output bid '''unexpandedDescriptors=codes_get_array(bid,"unexpandedDescriptors") outUD=list(unexpandedDescriptors) if codes_is_defined(bid, "shortDelayedDescriptorReplicationFactor"):outUD.insert(0,301150) shortDelayed=codes_get_array(bid,"shortDelayedDescriptorReplicationFactor") else:''' only treat the uncompressed messages with 1 shortDelayed=Nonesubset for if codes_is_defined(bid, "delayedDescriptorReplicationFactor"): delayedDesc=codes_get_array(bid,"delayedDescriptorReplicationFactor")future add treatment of compressed messages with more than 1 subset else:''' delayedDesc=None if compressed==0 and nsubsets==1: ''' IMPORTANT, takes into account nsubsets=codes_get(bid,"numberOfSubsets") compressed=codes_get(bid,"compressedData")delayed replications ( all possible cases) to accommodate masterTablesVersionNumber=codes_get(bid,"masterTablesVersionNumber") if masterTablesVersionNumber<28:SYNOP + TEMP messages masterTablesVersionNumber=28''' if shortDelayed is not None: unexpandedDescriptors=codes_get_array(bid,"unexpandedDescriptors") outUD=list(unexpandedDescriptors) outUD.insert(0,301150codes_set_array(obid,"inputShortDelayedDescriptorReplicationFactor",shortDelayed) if delayedDesc is not '''None: only treat the uncompressed messages with 1 subset codes_set_array(obid,"inputDelayedDescriptorReplicationFactor",delayedDesc) for future add treatment of compressed messages with more than 1 subset if extDelayedDesc is not None: ''' if compressed==0 and nsubsets==1: codes_set_array(obid,"inputExtendedDelayedDescriptorReplicationFactor",extDelayedDesc) if shortDelayed is not None: codes_set(obid,"masterTablesVersionNumber",masterTablesVersionNumber) codes_set_array(obid,"inputShortDelayedDescriptorReplicationFactornumberOfSubsets",shortDelayednsubsets) if delayedDesc is not None: codes_set_array(obid, "inputDelayedDescriptorReplicationFactorunexpandedDescriptors",delayedDesc)outUD) wis=odf["wigosIdentifierSeries"].values copy_header(bid,obid) if len(wis)!=1: codes_set(obid,"masterTablesVersionNumber",masterTablesVersionNumber) wis=wis[0] codes_set(obid,"numberOfSubsets",nsubsetswigosIdentifierSeries",int(wis)) odf=wdf.query("oldID=='{0}'".format(ident))wid=odf["wigosIssuerOfIdentifier"].values if not odf.empty: len(wid)!=1: wid=wid[0] codes_set_array(obid, "unexpandedDescriptorswigosIssuerOfIdentifier",outUDint(wid)) wiswin=odf["wigosIdentifierSerieswigosIssueNumber"].values if len(wiswin)!=1: wis=wiswin=win[0] codes_set(obid,"wigosIdentifierSerieswigosIssueNumber",int(wiswin)) wid wlid=odf["wigosIssuerOfIdentifierwigosLocalIdentifierCharacter"].values if len(wid)!=1:wlid="{0:5}".format(wlid[0]) wid=wid[0] logging.info(" wlid here {0}".format(wlid)) codes_set(obid,"wigosIssuerOfIdentifierwigosLocalIdentifierCharacter",intstr(widwlid)) codes_bufr_copy_data(bid,obid) else: win=odf["wigosIssueNumber"].values logging.info(" skipping compressed message id {0} with {1} if len(win)!=1:subsets ".format(ident,nsubsets)) return win=win[0] def main(): print("ecCodes version {0}".format(codes_set(obid,"wigosIssueNumber",int(win)) _get_api_version())) args=read_cmd_line() logfile=args.logfile wlid=odf["wigosLocalIdentifierCharacter"].values logging.basicConfig(filename=logfile,level=logging.INFO,filemode="w") wlid="{0:5}".format(wlid[0])infile=args.input outfile=args.output logging.info(" wlid here {0}".format(wlid)) mode=args.mode if mode=="web": codes_set(obid,"wigosLocalIdentifierCharacter",str(wlid)) jtext=read_oscar_web() cdirectory=os.getcwd() codes_bufr_copy_data(bid,obidoscarFile=os.path.join(cdirectory,"oscar.json") else: with open(oscarFile,"w") as f: logging.info(" wigos {0} is empty for ident {1}".format(ident,odf["wigosLocalIdentifierCharacter"].values)) json.dump(jtext,f) else: loggingcdirectory=os.infogetcwd(") skipping compressed message id {0} with {1} subsets ".format(ident,nsubsets))oscarFile=os.path.join(cdirectory,"oscar.json") return obid def main()with open(oscarFile,"r") as f: args=read_cmd_line() logfile=args.logfile logging.basicConfig(filename=logfile,level=logging.INFO,filemode="w"jtext=json.load(f) infile=args.input outfile=args.output wigosDf=parse_json_into_dataframe(jtext) mode=args.mode if mode=="web":f=open(infile,"rb") jtext=read_oscar_web(nmsg=codes_count_in_file(f) cdirectory=os.getcwd(fout=open(outfile,"wb") for i oscarFile=os.path.join(cdirectory,"oscar.json")in range(0,nmsg): with open(oscarFile,"w") as f:bid=codes_bufr_new_from_file(f) json.dump(jtext,fobid=codes_clone(bid) else: codes_set(bid, cdirectory=os.getcwd('skipExtraKeyAttributes', 1) oscarFile=os.path.join(cdirectory,"oscar.json"codes_set(bid,"unpack",1) with open(oscarFile,"r") as f:ident=get_ident(bid) if ident: jtext=json.load(f) logging.info (" \t message {0} ident {1} ".format(i+1,ident)) wigosDf=parse_json_into_dataframe(jtext) fodf=openwigosDf.query(infile,"rb") nmsg=codes_count_in_file(f) fout=open(outfile,"wb") "oldID=='{0}'".format(ident)) for i in range(0,nmsg): obid=codes_bufr_new_from_samples("BUFR4") if bid=codes_bufr_new_from_file(f)not odf.empty: codes_set(bid,"unpack",1) ident=get_ident(bidadd_wigos_info(ident,bid, odf,obid) if ident: codes_write(obid,fout) logging.info (" \t message {0} ident {1} ".format(i+1,ident)) else: add_wigos_info(ident,bid, wigosDf, obid) logging.info(" wigos {0} is empty for codes_write(obid,foutident {1}".format(ident,odf["wigosLocalIdentifierCharacter"].values)) else: logging.info ("message {0} rejected ".format(i+1)) codes_release(obid) codes_release(bid) f.close() print (" finished") if __name__ == '__main__': main() |
The program can be called with the following arguments
...
that are uncompressed ( compressed =0) and single subset ( numberOfSubsets=1) if their ident matches the ones in wigosDf.
5) a new function ( copy_header) was added to avoid changing the header of the message. Now, it copies the keys from bid to obid except typicalDate which is read onlyIf get_ident function founds many idents on a message only returns the first one.
During program execution a log file is generated containing information about the processing.
...
- Only uncompressed messages (compressed =0) and single subset (numberOfSubsets=1) are considered
- The Oscar information retrieved from the web server has to be cleared for this program to work. This is the goal of the function parse_json_into_dataframe that uses regular expressions to filter out the WIGOS data.
- When setting the WIGOS information It is important to preserve the data types , for example "wigosLocalIdentifierCharacter" is a character string.
- The masterTablesVersionNumber must be above 28 otherwise no WIGOS ids can be added. This is done in the add_wigos_info function that updates the table version number key for each message processed.
Results
The output file contains 22724 messagesfile contains 19543 SYNOP messages obtained from running the program on a input BUFR file containing raw SYNOP data received through GTS
View file | ||||
---|---|---|---|---|
|
This file contains 7 TEMP messages obtained running the program on a BUFR file containing raw TEMP messages.
View file | ||||
---|---|---|---|---|
|