...
The outline of this page is:
1) Problem description
2) Program flow
3) Test data file and caveats
The predefined data set covers 2019-10-15 to 2019-10-17.
1) Description
The WIGOS id contains four parts such as 0-2XXXX-0-YYYYY,
...
old stations and their WIGOS ids.
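As an illustration of the four-part structure, the following minimal sketch splits a WIGOS id into its components and derives the traditional five-digit station ident from the last part. The function name `split_wigos_id` and the sample id are hypothetical and only serve the example; the dictionary keys mirror the ecCodes key names used by the program on this page.

```python
def split_wigos_id(wigos_id):
    """Split a WIGOS id of the form series-issuer-issueNumber-localIdentifier."""
    parts = wigos_id.split("-")
    if len(parts) != 4:
        raise ValueError("expected four dash-separated parts, got {0!r}".format(wigos_id))
    series, issuer, issue_number, local_id = parts
    return {
        # the first three parts are numeric
        "wigosIdentifierSeries": int(series),
        "wigosIssuerOfIdentifier": int(issuer),
        "wigosIssueNumber": int(issue_number),
        # the local identifier is kept as a character string
        "wigosLocalIdentifierCharacter": local_id,
        # last five characters: the old block number + station number ident
        "oldID": local_id[-5:],
    }


if __name__ == "__main__":
    # hypothetical sample id, used only for illustration
    print(split_wigos_id("0-20000-0-10393"))
```

Keeping the local identifier as a string matters because leading zeros (for example in `06011`) would be lost in an integer.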
2) Program description
```python
'''
Created on 22 Oct 2019

# Copyright 2005-2018 ECMWF.
# This software is licensed under the terms of the Apache Licence Version 2.0
# which can be obtained at http://www.apache.org/licenses/LICENSE-2.0.
# In applying this licence, ECMWF does not waive the privileges and immunities
# granted to it by virtue of its status as an intergovernmental organisation
# nor does it submit to any jurisdiction

This is a test program to encode Wigos Synop
requires
 1) ecCodes version 2.14.1 or above (available at https://confluence.ecmwf.int/display/ECC/Releases)
 2) python3.6.8-01

To run the program
 ./addWigosProg.py -i <input bufr> -m <mode [web|json]> -l <logFile> -o <output BUFR file>

Uses BUFR version 4 template and adds the WIGOS Identifier 301150
REQUIRES masterTablesVersionNumber 28 or above

Author : Roberto Ribas Garcia ECMWF 28/10/2019

Modifications
 performance improvement (uses skipExtraKeyAttributes) and codes_clone 04/11/2019
 changes for SYNOP and TEMP messages 05/11/2019
 fixed codes_clone issue 05/11/2019
'''
from eccodes import *
import argparse
import json
import re
import pandas as pd
import numpy as np
import logging
import requests
import os


def read_cmd_line():
    p = argparse.ArgumentParser()
    p.add_argument("-i", "--input", help="input bufr file")
    p.add_argument("-o", "--output", help="output bufr file with wigos")
    p.add_argument("-m", "--mode", choices=["web", "json"], help=" wigos source [ json file or web ]")
    p.add_argument("-l", "--logfile", help="log file ")
    args = p.parse_args()
    return args


def read_oscar_json(jsonFile):
    with open(jsonFile, "r") as f:
        jtext = json.load(f)
    return jtext


def read_oscar_web(oscarURL="https://oscar.wmo.int/surface/rest/api/search/station?"):
    r = requests.get(oscarURL)
    jtext = json.loads(r.text)
    return jtext


def parse_json_into_dataframe(jtext):
    '''
    parses the JSON from the file wigosJsonFile
    filters the stations by wigosStationIdentifiers key in the dictionaries
    '''
    wigosStations = []
    nowigosStations = []
    for d in jtext:
        if "wigosStationIdentifiers" in d.keys():
            wigosStations.append(d)
        else:
            nowigosStations.append(d)
    # uses only the wigos 0-20XXX-0-YYYYY (surface)
    # raw string so that \d is not treated as a string escape
    p = re.compile(r"0-20\d{3}-0-\d{5}")
    fwigosStations = []
    for d in wigosStations:
        wigosInfo = d["wigosStationIdentifiers"]
        for e in wigosInfo:
            if e["primary"] == True:
                wigosId = e["wigosStationIdentifier"]
                if p.match(wigosId):
                    wigosParts = wigosId.split("-")
                    d["wigosIdentifierSeries"] = wigosParts[0]
                    d["wigosIssuerOfIdentifier"] = wigosParts[1]
                    d["wigosIssueNumber"] = wigosParts[2]
                    d["wigosLocalIdentifierCharacter"] = wigosParts[3]
                    d["oldID"] = wigosParts[3][-5:]
                    fwigosStations.append(d)
    df = pd.DataFrame(fwigosStations)
    df = df[["longitude", "latitude", "name", "wigosStationIdentifiers",
             "wigosIdentifierSeries", "wigosIssuerOfIdentifier", "wigosIssueNumber",
             "wigosLocalIdentifierCharacter", "oldID"]]
    return df


def get_ident(bid):
    '''
    gets the ident of the message by combining blockNumber and stationNumber keys
    from the input BUFR file
    the ident may be single valued or multivalued ( only single valued are considered further)
    '''
    ident = None
    if codes_is_defined(bid, "blockNumber") and codes_is_defined(bid, "stationNumber"):
        blockNumber = codes_get_array(bid, "blockNumber")
        stationNumber = codes_get_array(bid, "stationNumber")
        if len(blockNumber) == 1 and len(stationNumber) == 1:
            # index explicitly instead of converting a one-element array to int
            ident = "{0:02d}{1:03d}".format(int(blockNumber[0]), int(stationNumber[0]))
        elif len(blockNumber) == 1 and len(stationNumber) != 1:
            blockNumber = np.repeat(blockNumber, len(stationNumber))
            ident = [str("{0:02d}{1:03d}".format(int(b), int(s)))
                     for b, s in zip(blockNumber, stationNumber)
                     if b != CODES_MISSING_LONG and s != CODES_MISSING_LONG]
        elif len(blockNumber) != 1 and len(stationNumber) != 1:
            ident = [str("{0:02d}{1:03d}".format(int(b), int(s)))
                     for b, s in zip(blockNumber, stationNumber)
                     if b != CODES_MISSING_LONG and s != CODES_MISSING_LONG]
    # here only the first element of the list is returned to the main program
    # this avoids lists being used in the dataframe query and breaking the logic
    if isinstance(ident, list):
        # an empty list (all values missing) yields None instead of an IndexError
        ident = ident[0] if ident else None
    return ident


def add_wigos_info(ident, bid, odf, obid):
    '''
    add the wigos information to the message ident pointed by bid
    the odf contains the WIGOS information for ident
    obid is the output handle
    '''
    if codes_is_defined(bid, "shortDelayedDescriptorReplicationFactor"):
        shortDelayed = codes_get_array(bid, "shortDelayedDescriptorReplicationFactor")
    else:
        shortDelayed = None
    if codes_is_defined(bid, "delayedDescriptorReplicationFactor"):
        delayedDesc = codes_get_array(bid, "delayedDescriptorReplicationFactor")
    else:
        delayedDesc = None
    if codes_is_defined(bid, "extendedDelayedDescriptorReplicationFactor"):
        extDelayedDesc = codes_get_array(bid, "extendedDelayedDescriptorReplicationFactor")
    else:
        extDelayedDesc = None

    nsubsets = codes_get(bid, "numberOfSubsets")
    compressed = codes_get(bid, "compressedData")
    masterTablesVersionNumber = codes_get(bid, "masterTablesVersionNumber")
    if masterTablesVersionNumber < 28:
        masterTablesVersionNumber = 28
    unexpandedDescriptors = codes_get_array(bid, "unexpandedDescriptors")
    outUD = list(unexpandedDescriptors)
    outUD.insert(0, 301150)

    # only treat the uncompressed messages with 1 subset
    # for future add treatment of compressed messages with more than 1 subset
    if compressed == 0 and nsubsets == 1:
        # IMPORTANT, takes into account delayed replications ( all possible cases)
        # to accommodate SYNOP + TEMP messages
        if shortDelayed is not None:
            codes_set_array(obid, "inputShortDelayedDescriptorReplicationFactor", shortDelayed)
        if delayedDesc is not None:
            codes_set_array(obid, "inputDelayedDescriptorReplicationFactor", delayedDesc)
        if extDelayedDesc is not None:
            codes_set_array(obid, "inputExtendedDelayedDescriptorReplicationFactor", extDelayedDesc)
        codes_set(obid, "masterTablesVersionNumber", masterTablesVersionNumber)
        codes_set(obid, "numberOfSubsets", nsubsets)
        codes_set_array(obid, "unexpandedDescriptors", outUD)

        # the first matching station is used when the query returns several rows
        wis = odf["wigosIdentifierSeries"].values[0]
        codes_set(obid, "wigosIdentifierSeries", int(wis))
        wid = odf["wigosIssuerOfIdentifier"].values[0]
        codes_set(obid, "wigosIssuerOfIdentifier", int(wid))
        win = odf["wigosIssueNumber"].values[0]
        codes_set(obid, "wigosIssueNumber", int(win))
        wlid = odf["wigosLocalIdentifierCharacter"].values
        wlid = "{0:5}".format(wlid[0])
        logging.info(" wlid here {0}".format(wlid))
        codes_set(obid, "wigosLocalIdentifierCharacter", str(wlid))
        codes_bufr_copy_data(bid, obid)
    else:
        logging.info(" skipping compressed message id {0} with {1} subsets ".format(ident, nsubsets))
    return


def main():
    print("ecCodes version {0}".format(codes_get_api_version()))
    args = read_cmd_line()
    logfile = args.logfile
    logging.basicConfig(filename=logfile, level=logging.INFO, filemode="w")
    infile = args.input
    outfile = args.output
    mode = args.mode
    if mode == "web":
        # download the station list and keep a local copy in oscar.json
        jtext = read_oscar_web()
        cdirectory = os.getcwd()
        oscarFile = os.path.join(cdirectory, "oscar.json")
        with open(oscarFile, "w") as f:
            json.dump(jtext, f)
    else:
        # reuse the oscar.json stored in the current directory
        cdirectory = os.getcwd()
        oscarFile = os.path.join(cdirectory, "oscar.json")
        with open(oscarFile, "r") as f:
            jtext = json.load(f)
    wigosDf = parse_json_into_dataframe(jtext)
    f = open(infile, "rb")
    nmsg = codes_count_in_file(f)
    fout = open(outfile, "wb")
    for i in range(0, nmsg):
        bid = codes_bufr_new_from_file(f)
        obid = codes_clone(bid)
        codes_set(bid, 'skipExtraKeyAttributes', 1)
        codes_set(bid, "unpack", 1)
        ident = get_ident(bid)
        if ident:
            logging.info(" \t message {0} ident {1} ".format(i + 1, ident))
            odf = wigosDf.query("oldID=='{0}'".format(ident))
            if not odf.empty:
                add_wigos_info(ident, bid, odf, obid)
                codes_write(obid, fout)
            else:
                logging.info(" wigos {0} is empty for ident {1}".format(ident, odf["wigosLocalIdentifierCharacter"].values))
        else:
            logging.info("message {0} rejected ".format(i + 1))
        codes_release(obid)
        codes_release(bid)
    f.close()
    fout.close()
    print(" finished")


if __name__ == '__main__':
    main()
```
The program can be called with the following arguments
...
that are uncompressed (compressed=0) and single subset (numberOfSubsets=1) if their ident matches one of those in wigosDf.
5) If the get_ident function finds several idents in a message, it returns only the first one.
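The way the ident is built and reduced to a single value can be sketched without ecCodes. This is a simplified illustration, not the program's own function: `build_ident` is a hypothetical name, plain Python lists stand in for the arrays returned by `codes_get_array`, and `MISSING` is an assumed stand-in for the ecCodes constant `CODES_MISSING_LONG`.

```python
# assumption: stand-in for ecCodes' CODES_MISSING_LONG
MISSING = 2147483647


def build_ident(block_numbers, station_numbers):
    """Combine block and station numbers into 5-digit idents and return the first one."""
    if not block_numbers or not station_numbers:
        return None
    if len(block_numbers) == 1:
        # a single block number applies to every station number
        block_numbers = list(block_numbers) * len(station_numbers)
    idents = [
        "{0:02d}{1:03d}".format(b, s)
        for b, s in zip(block_numbers, station_numbers)
        if b != MISSING and s != MISSING
    ]
    # only the first ident is kept, so the result can be used directly
    # in a dataframe query such as oldID=='<ident>'
    return idents[0] if idents else None


if __name__ == "__main__":
    print(build_ident([10], [393]))
    print(build_ident([6], [11, 447]))
```

Returning a single string (or `None`) keeps the later lookup in wigosDf simple, at the cost of ignoring the remaining stations of a multi-station message.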
During program execution a log file is generated containing information about the processing.
...
- Only uncompressed messages (compressed=0) with a single subset (numberOfSubsets=1) are considered.
- The Oscar information retrieved from the web server has to be cleaned before this program can use it. This is the goal of the function parse_json_into_dataframe, which uses a regular expression to keep only the stations whose primary WIGOS id has the surface form 0-20XXX-0-YYYYY.
- When setting the WIGOS information it is important to preserve the data types; for example, "wigosLocalIdentifierCharacter" is a character string.
- The masterTablesVersionNumber must be at least 28, otherwise no WIGOS ids can be added. The add_wigos_info function takes care of this by raising the table version number key to 28 for each processed message that carries a lower value.
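The last two caveats can be illustrated with a small sketch that prepares the values to be written to the output message, independent of ecCodes. The helper `prepare_output_keys` and the sample inputs are hypothetical; the key names, the descriptor 301150 and the minimum table version 28 are taken from the program above.

```python
WIGOS_SEQUENCE = 301150   # WIGOS identifier sequence added in front of the template
MIN_TABLES_VERSION = 28   # lowest masterTablesVersionNumber accepted for WIGOS ids


def prepare_output_keys(unexpanded_descriptors, tables_version, wigos_id):
    """Return the key/value pairs to set on the output message (illustrative only)."""
    series, issuer, issue_number, local_id = wigos_id.split("-")
    return {
        # raise the table version if the input message uses an older one
        "masterTablesVersionNumber": max(tables_version, MIN_TABLES_VERSION),
        # the WIGOS sequence goes first, the original descriptors follow unchanged
        "unexpandedDescriptors": [WIGOS_SEQUENCE] + list(unexpanded_descriptors),
        # numeric parts are integers
        "wigosIdentifierSeries": int(series),
        "wigosIssuerOfIdentifier": int(issuer),
        "wigosIssueNumber": int(issue_number),
        # the local identifier stays a string, padded to at least 5 characters
        "wigosLocalIdentifierCharacter": "{0:5}".format(local_id),
    }


if __name__ == "__main__":
    # hypothetical input: one descriptor, an old table version, a sample WIGOS id
    for key, value in prepare_output_keys([307080], 14, "0-20000-0-10393").items():
        print(key, repr(value))
```

Collecting the values in one place makes it easy to check types before calling `codes_set` or `codes_set_array` on the output handle.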
Results
The output file contains 19543 SYNOP messages obtained by running the program on an input BUFR file containing raw SYNOP data received through GTS.
A second output file contains 7 TEMP messages obtained by running the program on a BUFR file containing raw TEMP messages.