Introduction

Below are simple checks that can be implemented to verify that the file content is as expected. The assumption is that the fields in all expected files stay the same unless there is a change or a problem in data production (dispatching etc.).

  • the total number of fields must match the expected number
  • the actual full field list must be identical to the expected one

An example of how to create a reference field list from given files (GRIBs in this case) and compare it to an actual field list follows.

The same approach can be used for any type of file, provided that a suitable tool for creating the field list exists or is written.

Workflow

  • create a reference field list
    • get a full data sample and check thoroughly that it contains all expected fields
      • if it does, the field list created as described below can be stored as the valid reference for future use
    • if the data change (e.g. fields are added or removed after a model upgrade), a new valid reference must be created
  • create an actual field list
    • as a first quick check, e.g. after all data have been retrieved, compare the number of fields with the number of reference fields
    • the subsequent full check compares the complete field list with the reference list
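The two checks in the workflow above can be sketched in Python as follows. This is a minimal sketch and not part of get_field_list.py: it assumes that a field list is simply a list of identifier strings, and the function names `quick_count_check` and `full_check` as well as the sample identifiers are hypothetical.

```python
from collections import Counter


def quick_count_check(actual, reference):
    """First quick check: the number of fields equals the number of reference fields."""
    return len(actual) == len(reference)


def full_check(actual, reference):
    """Full check: both lists hold the same identifiers.

    Order is ignored, but duplicates are counted.
    """
    return Counter(actual) == Counter(reference)


# Made-up identifiers, for illustration only
reference = ["2t:sfc:0", "2t:sfc:6", "msl:sfc:0"]
actual = ["2t:sfc:0", "msl:sfc:0", "10u:sfc:0"]

print(quick_count_check(actual, reference))  # the counts agree ...
print(full_check(actual, reference))         # ... but the content differs
```

In this example the quick check passes because both lists have three entries, while the full check fails. This is why the count comparison is only a first quick check and does not replace the full comparison.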

Examples of get_field_list.py usage

An example of creating the reference or actual field list with the Python script get_field_list.py (the ecCodes Python API is a prerequisite).

  • this is a version of get_field_list.py modified for the needs of the LC-WFV data sets
    • each data set requires its own set of GRIB keys, which together must unambiguously identify every expected field
    • the script is fairly straightforward to modify for other data sets
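How a field list entry could be built from GRIB keys is sketched below. This is not the actual get_field_list.py. The key names in `KEYS` are assumptions chosen for illustration, since every data set needs its own selection, and the helper names `field_id` and `iter_field_ids` are hypothetical. The ecCodes calls used are `codes_grib_new_from_file`, `codes_get` and `codes_release`.

```python
# Illustrative key set only (an assumption): each data set needs keys that
# together identify every expected field unambiguously.
KEYS = ("shortName", "typeOfLevel", "level", "stepRange")


def field_id(get_key, keys=KEYS):
    """Join the values of the chosen keys into one identifier string."""
    return ":".join(str(get_key(key)) for key in keys)


def iter_field_ids(path, keys=KEYS):
    """Yield one identifier per GRIB message in the file at `path`."""
    import eccodes  # imported here so that field_id() works without ecCodes

    with open(path, "rb") as f:
        while True:
            gid = eccodes.codes_grib_new_from_file(f)
            if gid is None:  # end of file reached
                break
            try:
                yield field_id(lambda key: eccodes.codes_get(gid, key), keys)
            finally:
                eccodes.codes_release(gid)


# field_id() only needs a callable that returns a value for a key,
# so it can be tried without a GRIB file:
sample = {"shortName": "2t", "typeOfLevel": "surface", "level": 0, "stepRange": "6"}
print(field_id(sample.get))  # 2t:surface:0:6
```

With ecCodes installed, looping over `iter_field_ids("lw.grib2")` and printing each identifier on its own line would give a list that can be sorted and compared to a reference.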
#!/bin/ksh
set -ex

# $reflist is a link to the reference field list
# $DTS_ALLOW_NEW_REFERENCE is "true" if a new reference is required/expected

# get actual field list for comparison to the reference
python $DTS_BIN/get_field_list.py -c lw.grib2 > list.tmp
awk '{print $1}' list.tmp | sort > list

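# check if anything changed: lines only in the actual list are "added",
# lines only in the reference are "removed"
# NOTE: '%%' assumes the script is preprocessed by SMS/ecFlow, where '%%' yields a literal '%';
# when run directly, GNU diff expects '%<' and '%>'
diff --changed-group-format='%%<' --unchanged-group-format='' list $reflist > diff.added.tmp || true
diff --changed-group-format='%%>' --unchanged-group-format='' list $reflist > diff.removed.tmp || true
sort diff.added.tmp   > diff.added
sort diff.removed.tmp > diff.removed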

if [[ -s diff.added || -s diff.removed ]] ; then
  # some differences found

  if [[ "${DTS_ALLOW_NEW_REFERENCE}" = "true" ]] ; then
    # a change is expected: store the actual list as the new reference
    cp -f list $reflist
    echo "A new partial reference field list created! ($reflist)"
  else
    echo "Differences from the current reference field list found!"
    exit 1
  fi

else
  smslabel info "The actual reference is valid ($reflist)"
fi
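
The comparison step of the script above can also be expressed in Python. The sketch below is an assumption-laden illustration, not a replacement for the script: it uses `collections.Counter` differences instead of `diff`, the function name `compare_to_reference` is hypothetical, and the identifiers are made up.

```python
from collections import Counter


def compare_to_reference(actual, reference):
    """Return (added, removed).

    added:   sorted identifiers present only in the actual list
    removed: sorted identifiers present only in the reference list
    """
    actual_count, reference_count = Counter(actual), Counter(reference)
    added = sorted((actual_count - reference_count).elements())
    removed = sorted((reference_count - actual_count).elements())
    return added, removed


# Made-up identifiers, for illustration only
reference = ["2t:sfc:0", "2t:sfc:6", "msl:sfc:0"]
actual = ["10u:sfc:0", "2t:sfc:0", "msl:sfc:0"]

added, removed = compare_to_reference(actual, reference)
if added or removed:
    print("added:  ", added)
    print("removed:", removed)
else:
    print("The actual reference is valid")
```

Reporting the added and removed fields separately makes it easier to decide whether the difference is an expected change that justifies a new reference, or a problem in data production.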




# quick check: the number of GRIB messages must equal the number of reference fields
cc_ref=$(wc -l < $reflist)
cc=$(grib_count $INP_FILE_NAME)
if [[ $cc -ne $cc_ref ]] ; then
.....