...
The outline of this page is:
1) Problem description
2) Program flow
3) Test data file and caveats
The predefined data set covers the dates 2019-10-15 to 2019-10-17.
1) Description
The WIGOS id contains four parts such as 0-2XXXX-0-YYYYY,
...
old stations and their WIGOS ids.
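The four-part structure can be illustrated with a minimal sketch. The helper name `split_wigos_id` and the sample identifier are assumptions made for illustration only; the pattern is the surface pattern 0-20XXX-0-YYYYY that the program on this page filters on, applied here with `fullmatch`, which is slightly stricter than the `match` call in the program.

```python
import re

# Surface-station pattern 0-20XXX-0-YYYYY, as used by the program on this page.
SURFACE_WIGOS = re.compile(r"0-20\d{3}-0-\d{5}")


def split_wigos_id(wigos_id):
    """Return the four parts of a surface WIGOS id, or None if it does not match.

    Hypothetical helper, not part of the original program.
    """
    if not SURFACE_WIGOS.fullmatch(wigos_id):
        return None
    series, issuer, issue_number, local_id = wigos_id.split("-")
    return series, issuer, issue_number, local_id


parts = split_wigos_id("0-20000-0-10384")  # illustrative identifier
print(parts)
```

The last five characters of the local identifier correspond to the old block and station number ident, which is the value the program uses to match stations.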
2) Program description
```python
'''
Created on 22 Oct 2019

# Copyright 2005-2018 ECMWF.
# This software is licensed under the terms of the Apache Licence Version 2.0
# which can be obtained at http://www.apache.org/licenses/LICENSE-2.0.
# In applying this licence, ECMWF does not waive the privileges and immunities
# granted to it by virtue of its status as an intergovernmental organisation
# nor does it submit to any jurisdiction

This is a test program to encode Wigos Synop
requires
1) ecCodes version 2.14.1 or above (available at https://confluence.ecmwf.int/display/ECC/Releases)
2) python3.6.8-01

To run the program
./addWigosProg.py -i <input bufr> -m <mode [web|json]> -l <logFile> -o <output BUFR file>

Uses BUFR version 4 template and adds the WIGOS Identifier 301150
REQUIRES TablesVersionNumber above 28

Author : Roberto Ribas Garcia ECMWF 28/10/2019
Modifications
   performance improvement ( uses skipExtraKeyAttributes) and codes_clone 04/11/2019
   changes for SYNOP and TEMP messages 05/11/2019
   fixed codes_clone issue 05/11/2019
'''
from eccodes import *
import argparse
import json
import re
import pandas as pd
import numpy as np
import logging
import requests
import os


def read_cmd_line():
    p=argparse.ArgumentParser()
    p.add_argument("-i","--input",help="input bufr file")
    p.add_argument("-o","--output",help="output bufr file with wigos")
    p.add_argument("-m","--mode",choices=["web","json"],help=" wigos source [ json file or web ]")
    p.add_argument("-l","--logfile",help="log file ")
    args=p.parse_args()
    return args


def read_oscar_json(jsonFile):
    with open(jsonFile,"r") as f:
        jtext=json.load(f)
    return jtext


def read_oscar_web(oscarURL="https://oscar.wmo.int/surface/rest/api/search/station?"):
    r=requests.get(oscarURL)
    jtext=json.loads(r.text)
    return jtext


def parse_json_into_dataframe(jtext):
    '''
    parses the JSON from the file wigosJsonFile
    filters the stations by wigosStationIdentifiers key in the dictionaries
    '''
    wigosStations=[]
    nowigosStations=[]
    for d in jtext:
        if "wigosStationIdentifiers" in d.keys():
            wigosStations.append(d)
        else:
            nowigosStations.append(d)
    # uses only the wigos 0-20XXX-0-YYYYY (surface)
    p=re.compile(r"0-20\d{3}-0-\d{5}")
    fwigosStations=[]
    for d in wigosStations:
        wigosInfo=d["wigosStationIdentifiers"]
        for e in wigosInfo:
            if e["primary"]==True:
                wigosId=e["wigosStationIdentifier"]
                if p.match(wigosId):
                    wigosParts=wigosId.split("-")
                    d["wigosIdentifierSeries"]=wigosParts[0]
                    d["wigosIssuerOfIdentifier"]=wigosParts[1]
                    d["wigosIssueNumber"]=wigosParts[2]
                    d["wigosLocalIdentifierCharacter"]=wigosParts[3]
                    d["oldID"]=wigosParts[3][-5:]
                    fwigosStations.append(d)
    df=pd.DataFrame(fwigosStations)
    df=df[["longitude","latitude","name","wigosStationIdentifiers","wigosIdentifierSeries",
           "wigosIssuerOfIdentifier","wigosIssueNumber","wigosLocalIdentifierCharacter","oldID"]]
    return df


def get_ident(bid):
    '''
    gets the ident of the message by combining blockNumber and stationNumber keys from the input BUFR file
    the ident may be single valued or multivalued ( only single valued are considered further)
    '''
    ident=None
    if ( codes_is_defined(bid, "blockNumber") and codes_is_defined(bid,"stationNumber") ):
        blockNumber=codes_get_array(bid,"blockNumber")
        stationNumber=codes_get_array(bid,"stationNumber")
        if len(blockNumber)==1 and len(stationNumber)==1:
            ident="{0:02d}{1:03d}".format(int(blockNumber[0]),int(stationNumber[0]))
        elif len(blockNumber)==1 and len(stationNumber)!=1:
            blockNumber=np.repeat(blockNumber,len(stationNumber))
            ident=[str("{0:02d}{1:03d}".format(b,s)) for b,s in zip(blockNumber,stationNumber)
                   if b!=CODES_MISSING_LONG and s!=CODES_MISSING_LONG]
        elif len(blockNumber)!=1 and len(stationNumber)!=1:
            ident=[str("{0:02d}{1:03d}".format(b,s)) for b,s in zip(blockNumber,stationNumber)
                   if b!=CODES_MISSING_LONG and s!=CODES_MISSING_LONG]
    # here only the first element of the list is returned to the main program
    # this avoids lists being used in the dataframe query and breaking the logic
    # (an empty list gives None so that the message is rejected)
    if isinstance(ident,list):
        ident=ident[0] if ident else None
    return ident


def add_wigos_info(ident,bid,odf,obid):
    '''
    add the wigos information to the message ident pointed by bid
    odf contains the WIGOS information for ident
    obid is the output handle
    '''
    if codes_is_defined(bid, "shortDelayedDescriptorReplicationFactor"):
        shortDelayed=codes_get_array(bid,"shortDelayedDescriptorReplicationFactor")
    else:
        shortDelayed=None
    if codes_is_defined(bid, "delayedDescriptorReplicationFactor"):
        delayedDesc=codes_get_array(bid,"delayedDescriptorReplicationFactor")
    else:
        delayedDesc=None
    if codes_is_defined(bid, "extendedDelayedDescriptorReplicationFactor"):
        extDelayedDesc=codes_get_array(bid,"extendedDelayedDescriptorReplicationFactor")
    else:
        extDelayedDesc=None
    nsubsets=codes_get(bid,"numberOfSubsets")
    compressed=codes_get(bid,"compressedData")
    masterTablesVersionNumber=codes_get(bid,"masterTablesVersionNumber")
    if masterTablesVersionNumber<28:
        masterTablesVersionNumber=28
    unexpandedDescriptors=codes_get_array(bid,"unexpandedDescriptors")
    outUD=list(unexpandedDescriptors)
    outUD.insert(0,301150)
    # only treat the uncompressed messages with 1 subset
    # for future add treatment of compressed messages with more than 1 subset
    if compressed==0 and nsubsets==1:
        # IMPORTANT, takes into account delayed replications ( all possible cases)
        # to accommodate SYNOP + TEMP messages
        if shortDelayed is not None:
            codes_set_array(obid,"inputShortDelayedDescriptorReplicationFactor",shortDelayed)
        if delayedDesc is not None:
            codes_set_array(obid,"inputDelayedDescriptorReplicationFactor",delayedDesc)
        if extDelayedDesc is not None:
            codes_set_array(obid,"inputExtendedDelayedDescriptorReplicationFactor",extDelayedDesc)
        codes_set(obid,"masterTablesVersionNumber",masterTablesVersionNumber)
        codes_set(obid,"numberOfSubsets",nsubsets)
        codes_set_array(obid,"unexpandedDescriptors",outUD)
        # the first matching row of odf is used for every WIGOS key
        wis=odf["wigosIdentifierSeries"].values[0]
        codes_set(obid,"wigosIdentifierSeries",int(wis))
        wid=odf["wigosIssuerOfIdentifier"].values[0]
        codes_set(obid,"wigosIssuerOfIdentifier",int(wid))
        win=odf["wigosIssueNumber"].values[0]
        codes_set(obid,"wigosIssueNumber",int(win))
        wlid=odf["wigosLocalIdentifierCharacter"].values
        wlid="{0:5}".format(wlid[0])
        logging.info(" wlid here {0}".format(wlid))
        codes_set(obid,"wigosLocalIdentifierCharacter",str(wlid))
        codes_bufr_copy_data(bid,obid)
    else:
        logging.info(" skipping compressed message id {0} with {1} subsets ".format(ident,nsubsets))
    return


def main():
    print("ecCodes version {0}".format(codes_get_api_version()))
    args=read_cmd_line()
    logfile=args.logfile
    logging.basicConfig(filename=logfile,level=logging.INFO,filemode="w")
    infile=args.input
    outfile=args.output
    mode=args.mode
    cdirectory=os.getcwd()
    oscarFile=os.path.join(cdirectory,"oscar.json")
    if mode=="web":
        # download the OSCAR data and keep a local copy in oscar.json
        jtext=read_oscar_web()
        with open(oscarFile,"w") as f:
            json.dump(jtext,f)
    else:
        # reuse the local copy stored by a previous run in web mode
        with open(oscarFile,"r") as f:
            jtext=json.load(f)
    wigosDf=parse_json_into_dataframe(jtext)
    f=open(infile,"rb")
    nmsg=codes_count_in_file(f)
    fout=open(outfile,"wb")
    for i in range(0,nmsg):
        bid=codes_bufr_new_from_file(f)
        obid=codes_clone(bid)
        codes_set(bid,'skipExtraKeyAttributes',1)
        codes_set(bid,"unpack",1)
        ident=get_ident(bid)
        if ident:
            logging.info(" \t message {0} ident {1} ".format(i+1,ident))
            odf=wigosDf.query("oldID=='{0}'".format(ident))
            if not odf.empty:
                add_wigos_info(ident,bid,odf,obid)
                codes_write(obid,fout)
            else:
                logging.info(" wigos {0} is empty for ident {1}".format(ident,odf["wigosLocalIdentifierCharacter"].values))
        else:
            logging.info("message {0} rejected ".format(i+1))
        codes_release(obid)
        codes_release(bid)
    f.close()
    fout.close()
    print(" finished")


if __name__ == '__main__':
    main()
```
The program can be called with the following arguments
...
4) For each message, create the message identifier (concatenation of blockNumber + stationNumber) and add the WIGOS information to the messages that are uncompressed (compressedData = 0) and single subset (numberOfSubsets = 1), provided their ident matches one of those in wigosDf.
5) If the get_ident function finds several idents in a message, it returns only the first one.
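Steps 4 and 5 can be sketched without ecCodes, working on plain lists in place of the arrays read from the BUFR message. The helper name `build_ident` is hypothetical, and `MISSING` is an assumed stand-in for the `CODES_MISSING_LONG` constant of ecCodes.

```python
# Assumed stand-in for ecCodes' CODES_MISSING_LONG; the real constant comes
# from the eccodes module.
MISSING = 2147483647


def build_ident(block_numbers, station_numbers):
    """Return the first valid 5-digit ident, or None (hypothetical helper)."""
    if not block_numbers or not station_numbers:
        return None
    # A single block number applies to every station number in the message.
    if len(block_numbers) == 1:
        block_numbers = list(block_numbers) * len(station_numbers)
    idents = [
        "{0:02d}{1:03d}".format(b, s)
        for b, s in zip(block_numbers, station_numbers)
        if b != MISSING and s != MISSING
    ]
    # Only the first ident is kept, as described in step 5.
    return idents[0] if idents else None


print(build_ident([10], [384]))        # single-valued message
print(build_ident([3], [MISSING, 5]))  # the first valid pair is used
```

Returning a single string rather than a list keeps the later lookup against the oldID column a simple equality test.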
During program execution a log file is generated containing information about the processing.
An earlier revision (04/11/2019) added a copy_header function to avoid changing the header of the message: it copied the header keys from bid to obid, except typicalDate, which is read-only. The current listing clones the input message with codes_clone instead.
At this point some caveats are needed:
- Only uncompressed messages (compressedData = 0) with a single subset (numberOfSubsets = 1) are considered.
- The OSCAR information retrieved from the web server has to be cleaned before the program can use it. This is the job of the function parse_json_into_dataframe, which uses a regular expression to keep only the primary surface WIGOS identifiers of the form 0-20XXX-0-YYYYY.
- When setting the WIGOS information it is important to preserve the data types; for example, "wigosLocalIdentifierCharacter" is a character string.
- The masterTablesVersionNumber must be 28 or above, otherwise no WIGOS ids can be added. The add_wigos_info function takes care of this by raising the table version number key to 28 for every processed message that carries a lower value.
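The caveats about cleaning, data types and table version can be illustrated with a minimal sketch that uses only the standard library. The helper names `primary_surface_ids` and `typed_wigos_values` and the sample records are assumptions for illustration; the record layout mirrors the keys the program reads from the OSCAR result.

```python
import re

SURFACE_WIGOS = re.compile(r"0-20\d{3}-0-\d{5}")
MIN_TABLES_VERSION = 28


def primary_surface_ids(stations):
    """Map old 5-digit ident -> primary surface WIGOS id (hypothetical helper)."""
    result = {}
    for station in stations:
        # Stations without the wigosStationIdentifiers key are skipped.
        for entry in station.get("wigosStationIdentifiers", []):
            wigos_id = entry.get("wigosStationIdentifier", "")
            if entry.get("primary") and SURFACE_WIGOS.fullmatch(wigos_id):
                result[wigos_id[-5:]] = wigos_id
    return result


def typed_wigos_values(wigos_id, tables_version):
    """Return the values to set: integers for the numeric keys, a string for the local id."""
    series, issuer, issue_number, local_id = wigos_id.split("-")
    return {
        "masterTablesVersionNumber": max(tables_version, MIN_TABLES_VERSION),
        "wigosIdentifierSeries": int(series),
        "wigosIssuerOfIdentifier": int(issuer),
        "wigosIssueNumber": int(issue_number),
        "wigosLocalIdentifierCharacter": "{0:5}".format(local_id),
    }


# Made-up records shaped like the OSCAR search result used by the program.
sample = [
    {"name": "A", "wigosStationIdentifiers": [
        {"wigosStationIdentifier": "0-20000-0-10384", "primary": True}]},
    {"name": "B", "wigosStationIdentifiers": [
        {"wigosStationIdentifier": "0-250-0-XYZ", "primary": True}]},
    {"name": "C"},
]
ids = primary_surface_ids(sample)
values = typed_wigos_values(ids["10384"], tables_version=13)
print(ids)
print(values)
```

In the program itself the same values are written with codes_set, using int() for the three numeric keys and str() for the local identifier.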
Results
The output file contains 19543 SYNOP messages obtained by running the program on an input BUFR file containing raw SYNOP data received through the GTS.
View file | ||||
---|---|---|---|---|
|
This file contains 7 TEMP messages obtained by running the program on a BUFR file containing raw TEMP messages.
View file | ||||
---|---|---|---|---|
|
The output file produced by an earlier revision of the program contained 22724 messages.