...
There are many situations where a user is only interested in a subset of the dataset spatial domain.
For example, when comparing modelled river flow against observations, it is reasonable to
...
be able to extract the
...
time-series at those point coordinates rather than dealing with many GB of data.
...
Similarly, when
...
focusing on a specific catchment it is likely that you want to deal with only that part of the spatial domain.
In summary, there are two operations of data size reduction that are very popular on CEMS-Flood datasets
...
, area cropping and time-series extraction.
There are
...
two ways to perform those operations:
...
- Remotely - Using the CDS API
...
- Locally (full control on the process)
Table of Contents |
---|
GloFAS
Example script to crop and extract time series from different GloFAS products:
CDS API
...
language | py |
---|---|
title | Area cropping: GloFAS Medium-range reforecast |
collapse | true |
...
- to perform the operation remotely on the CDS compute nodes and retrieve only the reduced data.
- Locally - Using the CDS API to retrieve the entire data and perform the operation locally.
This section provides scripts for both cases and for both CEMS-Flood products, GloFAS and EFAS.
Set up a Python environment
If you have not done it yet, create a Python virtual environment.
Activate the conda environment and install the additional Python package https://corteva.github.io/rioxarray
Code Block | ||||
---|---|---|---|---|
| ||||
conda install rioxarray |
Prepare and retrieve data (for local processing)
For the following exercises on extracting time series on the local machine, we are going to use the latitude and longitude coordinates from a tiny subset of the GRDC dataset.
Copy the content of the code block into an empty file named "GRDC.csv", the file should reside in your working folder.
Code Block | ||||||
---|---|---|---|---|---|---|
| ||||||
grdc_no,wmo_reg,sub_reg,river,station,country,lat,long,area,altitude
6118010,6,18,CLAIE,SAINT-JEAN-BREVELAY,FR,47.824831,-2.703308,137.0,99.98
6118015,6,18,YVEL,LOYAT (PONT D129),FR,47.993815,-2.368849,315.0,88.26
6118020,6,18,"ARON, RUISSEAU D'",GRAND-FOUGERAY (LA BERNADAISE),FR,47.71222,-1.690835,118.0,68.0
6118025,6,18,"CANUT, RUISSEAU DE",SAINT-JUST (LA RIVIERE COLOMBEL),FR,47.775309,-1.979609,37.0,80.22
6118030,6,18,AFF,PAIMPONT (PONT DU SECRET),FR,47.981631,-2.143762,30.2,119.01
6118050,6,18,COET-ORGAN,QUISTINIC (KERDEC),FR,47.904164,-3.201265,47.7,94.42
6118060,6,18,EVEL,GUENIN,FR,47.899928,-2.975167,316.0,95.16
6118070,6,18,STER-GOZ,BANNALEC (PONT MEYA),FR,47.906833,-3.752172,69.7,85.08
6118080,6,18,MOROS,CONCARNEAU (PONT D22),FR,47.882934,-3.875375,20 |
Then, retrieve the following datasets into the same working folder.
Code Block | ||||||
---|---|---|---|---|---|---|
| ||||||
import cdsapi c = cdsapi.Client() c.retrieve( 'cems-glofas-historical', { 'variable': 'river_discharge_in_the_last_24_hours', 'format': 'grib', 'hydrological_model': 'lisflood', |
...
'product_type': 'intermediate', |
...
'hyear': '2021', |
...
'hmonth': 'january', |
...
'hday': [ |
...
'01', ' |
...
02' |
...
, ' |
...
03', |
...
'04', '05', '06', |
...
'07', ' |
...
08' |
...
, ' |
...
09', |
...
'10', '11', '12', |
...
|
...
' |
...
13' |
...
, ' |
...
14', |
...
'15', |
...
'16', '17', ' |
...
18' |
...
,
|
...
'19', '20', ' |
...
21' |
...
,
|
...
'22', '23', '24', '25', '26', |
...
' |
...
27', |
...
'28', '29', '30', |
...
'31', ], 'system_version': 'version_3_1', }, 'glofas_historical.grib') |
Code Block |
---|
...
...
|
...
|
...
import cdsapi c = cdsapi.Client() c.retrieve( 'efas-reforecast', { 'format': 'grib', |
...
|
...
|
...
'product_type': 'ensemble_perturbed_reforecasts', |
...
|
...
|
...
' |
...
variable': 'river_discharge_in_the_last_6_hours', |
...
|
...
|
...
' |
...
model_levels': 'surface_level', |
...
|
...
'hyear': '2007', |
...
'hmonth': 'march', 'hday': [ |
...
|
...
'04', '07', ], ' |
...
leadtime_hour': [ |
...
'0', '12', '18', |
...
' |
...
6' |
...
, ], }, ' |
...
efas_reforecast.grib') |
EFAS
Warning | ||
---|---|---|
| ||
An issue has been identified with the EFAS sub-region extraction tool, whereby it serves data that is not correctly located on the river network. The sub-region extraction tool has therefore been removed from the EFAS CDS entries, and any area specified in cdsapi requests will return the entire domain . Data previously downloaded using this tool should be disregarded. For more information please see EFAS-Known Issues |
Warning | ||
---|---|---|
| ||
When transforming from lat/lon (source coordinates) to projected LAEA (target coordinates), you need to consider that the number of decimal places of the source coordinates affects the target coordinates precision: An interval of 0.001 degrees corresponds to about 100 metres in LAEA. An interval of 0.00001 degrees corresponds to about 1 metre in LAEA. |
Remote processing
Time series extraction:
Code Block | ||||||
---|---|---|---|---|---|---|
| ||||||
Area cropping:
Code Block |
---|
|
...
| |||
## === retrieve |
...
EFAS Seasonal |
...
Forecast === ## === subset |
...
Switzerland region === |
...
import cdsapi if __name__ == '__main__': |
...
c = cdsapi.Client() YEARS = ['%d'%(y) for y in range( |
...
2020, |
...
2022)] MONTHS = [' |
...
%02d'%(m) for m in range(1,13)] LEADTIMES = ['%d'%(l) for l in range(24, |
...
5160,24)] for year in YEARS: for month in MONTHS: c.retrieve( ' |
...
efas-seasonal |
...
', { |
...
|
...
|
...
|
...
'variable': 'river_discharge_in_the_last_24_hours', |
...
'format': 'grib', |
...
' |
...
model_ |
...
levels': ' |
...
surface_ |
...
level', |
...
' |
...
year': year, |
...
' |
...
month': |
...
'12' if year == '2020' else month, 'leadtime_hour': LEADTIMES, |
...
'area': [ |
...
47. |
...
9, |
...
5. |
...
8, |
...
45. |
...
7, |
...
10. |
...
6 ] }, |
...
f' |
...
efas_seasonal_ |
...
{year}_{month}.grib') |
...
Local processing
Time series extraction:
Info |
---|
...
|
...
| |
EFAS x and y coordinates, when converted from GRIB to NetCDF, are not projected coordinates but matrix indexes (i, j), It is necessary to download the upstream area that contains the projected coordinates and replace them in EFAS, as described in the code block below. |
Code Block | ||||||
---|---|---|---|---|---|---|
| ||||||
import xarray as xr
import pandas as pd
from pyproj import Transformer,CRS
parameter = "dis06"
ds = xr.open_dataset("efas_reforecast.grib", engine="cfgrib")
df = pd.read_csv("GRDC.csv")
uparea = xr.open_dataset("ec_uparea4.0.nc") # the upstream area
# replace x, y
ds["x"] = uparea["x"]
ds["y"] = uparea["y"]
# define reprojection parameters
laea_proj = CRS.from_proj4("+proj=laea +lat_0=52 +lon_0=10 +x_0=4321000 +y_0=3210000 +ellps=GRS80 +units=m +no_defs")
transformer = Transformer.from_crs('epsg:4326', laea_proj, always_xy=True)
total = len(df)
rows = []
count = 0
for lon, lat, id in zip(df.long, df.lat, df.grdc_no):
x1, y1 = transformer.transform(lon, lat)
extracted = ds.sel(x=x1, y=y1, number = 1, method="nearest")[parameter]
df_temp = extracted.drop_vars(["surface", "number"]).to_dataframe().reset_index()
df_temp["grdc"] = str(id)
df_temp = df_temp.drop("step", axis=1)
df_temp = df_temp.set_index(["grdc","time"])
rows.append(df_temp)
count += 1
print(f"progress: {count/total*100} %")
out = pd.concat(rows)
out.to_csv("extracted.csv", index="grdc") |
Area cropping:
Code Block | ||||||
---|---|---|---|---|---|---|
| ||||||
import xarray as xr
import rioxarray as rio
from pyproj import Transformer, CRS
import numpy as np
# Rhine's basin bounding box coordinates in WGS84.
coords = [5.450796, 11.871059, 46.296530, 50.972204] # W,E,S,N
# source/target reference systems proj
EPSG_4326 = '+proj=longlat +datum=WGS84 +no_defs'
EPSG_3035 = '+proj=laea +lat_0=52 +lon_0=10 +x_0=4321000 +y_0=3210000 +ellps=GRS80 +units=m +no_defs' # EFAS
# read EFAS reforecast and the upstream area
ds = xr.open_dataset("efas_reforecast.grib", engine="cfgrib")
uparea = xr.open_dataset("ec_uparea4.0.nc")
# replace x, y coordinates
ds["x"] = uparea["x"]
ds["y"] = uparea["y"]
# add reference system to EFAS dataset
ds = ds.rio.write_crs(EPSG_3035)
# Function to convert 4 coordinates into a Polygon
def bbox_from_wesn(coords, s_srs, t_srs):
w, e, s, n = coords
transformer = Transformer.from_crs(s_srs, t_srs, always_xy=True)
# topleft
topleft = transformer.transform(w,n)
#bottomleft
bottomleft = transformer.transform(w, s)
#topright
topright = transformer.transform(e,n)
# bottomright
bottomright = transformer.transform(e,s)
bbox = [
{
'type': 'Polygon',
'coordinates': [[
topleft, bottomleft, bottomright, topright, topleft
]]
}
]
return bbox
bbox = bbox_from_wesn(coords, s_srs=EPSG_4326, t_srs=EPSG_3035)
ds_clipped = ds.rio.clip(bbox)
ds_clipped.to_netcdf("efas_reforecast_rhyne.nc")
|
GloFAS
Remote processing
Time series extraction:
Code Block | ||||||
---|---|---|---|---|---|---|
| ||||||
import cdsapi
from datetime import datetime, timedelta
def get_monthsdays(start =[2019,1,1],end=[2019,12,31]):
# reforecast time index
start, end = datetime(*start),datetime(*end)
days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
monthday = [d.strftime("%B-%d").split("-") for d in days if d.weekday() in [0,3] ]
return monthday
if __name__ == '__main__':
c = cdsapi.Client()
# station coordinates (lat,lon)
COORDS = {
"Thames":[51.35,-0.45]
}
# select date index corresponding to the event
MONTHSDAYS = get_monthsdays(start =[2019,7,11],end=[2019,7,11])
YEAR = '2007'
LEADTIMES = ['%d'%(l) for l in range(24,1128,24)]
# loop over date index (just 1 in this case)
for md in MONTHSDAYS:
month = md[0].lower()
day = md[1]
# loop over station coordinates
for station in COORDS:
station_point_coord = COORDS[station]*2 # coordinates input for the area keyword
c.retrieve(
'cems-glofas-reforecast',
{
'system_version': 'version_2_2',
'variable': 'river_discharge_in_the_last_24_hours',
'format': 'grib',
'hydrological_model': 'htessel_lisflood',
'product_type': ['control_reforecast','ensemble_perturbed_reforecasts'],
'area':station_point_coord,
'hyear': YEAR,
'hmonth': month ,
'hday': day ,
'leadtime_hour': LEADTIMES,
},
f'glofas_reforecast_{station}_{month}_{day}.grib') |
Area cropping:
Code Block | ||||||
---|---|---|---|---|---|---|
| ||||||
## === retrieve GloFAS Medium-Range Reforecast ===
## === subset India, Pakistan, Nepal and Bangladesh region ===
import cdsapi
from datetime import datetime, timedelta
def get_monthsdays():
start, end = datetime(2019, 1, 1), datetime(2019, 12, 31)
days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
monthday = [d.strftime("%B-%d").split("-") for d in days if d.weekday() in [0,3] ]
return monthday
MONTHSDAYS = get_monthsdays()
if __name__ == '__main__':
c = cdsapi.Client()
# user inputs
BBOX = [40.05 ,59.95, 4.95, 95.05] # North West South East
YEARS = ['%d'%(y) for y in range(1999,2019)]
LEADTIMES = ['%d'%(l) for l in range(24,1128,24)]
# submit request
for md in MONTHSDAYS:
month = md[0].lower()
day = md[1]
c.retrieve(
'cems-glofas-reforecast',
{
'system_version': 'version_2_2',
'variable': 'river_discharge_in_the_last_24_hours',
'format': 'grib',
'hydrological_model': 'htessel_lisflood',
'product_type': 'control_reforecast',
'area': BBOX,# < - subset
'hyear': YEARS,
'hmonth': month ,
'hday': day , |
...
import cdsapi
from datetime import datetime, timedelta
def get_monthsdays(start =[2019,1,1],end=[2019,12,31]):
# reforecast time index
start, end = datetime(*start),datetime(*end)
days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
monthday = [d.strftime("%B-%d").split("-") for d in days if d.weekday() in [0,3] ]
return monthday
if __name__ == '__main__':
c = cdsapi.Client()
# station coordinates (lat,lon)
COORDS = {
"Thames":[51.35,-0.45]
}
# select date index corresponding to the event
MONTHSDAYS = get_monthsdays(start =[2019,7,11],end=[2019,7,11])
YEAR = '2007'
LEADTIMES = ['%d'%(l) for l in range(24,1128,24)]
# loop over date index (just 1 in this case)
for md in MONTHSDAYS:
month = md[0].lower()
day = md[1]
# loop over station coordinates
for station in COORDS:
station_point_coord = COORDS[station]*2 # coordinates input for the area keyword
c.retrieve(
'cems-glofas-reforecast',
{
'system_version': 'version_2_2',
'variable': 'river_discharge_in_the_last_24_hours',
'format': 'grib',
'hydrological_model': 'htessel_lisflood',
'product_type': ['control_reforecast','ensemble_perturbed_reforecasts'],
'area':station_point_coord,
'hyear': YEAR,
'hmonth': month ,
'hday': day ,
'leadtime_hour': LEADTIMES,
},
f'glofas_reforecast_{station}_{month}_{day}.grib')
Local machine
EFAS
Warning | ||
---|---|---|
| ||
When transforming from lat/lon (source coordinates) to projected LAEA (target coordinates), you need to consider that the number of decimal places of the source coordinates affects the target coordinates precision: An interval of 0.001 degrees corresponds to about 100 metres in LAEA. An interval of 0.00001 degrees corresponds to about 1 metre in LAEA. |
CDS API
to update once cropping works....
Local machine
Time series extraction
We are going to extract EFAS reforecast's timeseries at locations defined by latitude and longitude coordinates from a tiny subset of the GRDC dataset.
Info | ||
---|---|---|
| ||
EFAS data's x and y coordinates are not projected coordinates but matrix indexes (i, j), It is necessary to download the upstream area static file that contains the projected coordinates and replace it in EFAS. |
...
title | Get the EFAS reforecast data |
---|---|
collapse | true |
...
|
...
'leadtime_hour': LEADTIMES, |
...
}, |
...
Copy the content into an empty file named "GRDC.csv", the file should reside in the same folder of the efas_reforecast.grib file and the upstream area.
Code Block | ||||
---|---|---|---|---|
| ||||
grdc_no,wmo_reg,sub_reg,river,station,country,lat,long,area,altitude
6118010,6,18,CLAIE,SAINT-JEAN-BREVELAY,FR,47.824831,-2.703308,137.0,99.98
6118015,6,18,YVEL,LOYAT (PONT D129),FR,47.993815,-2.368849,315.0,88.26
6118020,6,18,"ARON, RUISSEAU D'",GRAND-FOUGERAY (LA BERNADAISE),FR,47.71222,-1.690835,118.0,68.0
6118025,6,18,"CANUT, RUISSEAU DE",SAINT-JUST (LA RIVIERE COLOMBEL),FR,47.775309,-1.979609,37.0,80.22
6118030,6,18,AFF,PAIMPONT (PONT DU SECRET),FR,47.981631,-2.143762,30.2,119.01
6118050,6,18,COET-ORGAN,QUISTINIC (KERDEC),FR,47.904164,-3.201265,47.7,94.42
6118060,6,18,EVEL,GUENIN,FR,47.899928,-2.975167,316.0,95.16
6118070,6,18,STER-GOZ,BANNALEC (PONT MEYA),FR,47.906833,-3.752172,69.7,85.08
6118080,6,18,MOROS,CONCARNEAU (PONT D22),FR,47.882934,-3.875375,20 |
Code Block | ||||||
---|---|---|---|---|---|---|
| ||||||
import xarray as xr
import pandas as pd
from pyproj import Transformer,CRS
parameter = "dis06"
ds = xr.open_dataset("efas_reforecast.grib", engine="cfgrib")
df = pd.read_csv("GRDC.csv")
uparea = xr.open_dataset("ec_uparea4.0.nc") # the upstream area
# replace x, y
ds["x"] = uparea["x"]
ds["y"] = uparea["y"]
# define reprojection parameters
laea_proj = CRS.from_proj4("+proj=laea +lat_0=52 +lon_0=10 +x_0=4321000 +y_0=3210000 +ellps=GRS80 +units=m +no_defs")
transformer = Transformer.from_crs('epsg:4326', laea_proj, always_xy=True)
total = len(df)
rows = []
count = 0
for lon, lat, id in zip(df.long, df.lat, df.grdc_no):
x1, y1 = transformer.transform(lon, lat)
extracted = ds.sel(x=x1, y=y1, number = 1, method="nearest")[parameter]
df_temp = extracted.drop_vars(["surface", "number"]).to_dataframe().reset_index()
df_temp["grdc"] = str(id)
df_temp = df_temp.drop("step", axis=1)
df_temp = df_temp.set_index(["grdc","time"])
rows.append(df_temp)
count += 1
print(f"progress: {count/total*100} %")
out = pd.concat(rows)
out.to_csv("extracted.csv", index="grdc") |
Code Block | ||||
---|---|---|---|---|
| ||||
grdc,time,y,x,latitude,longitude,valid_time,dis06
6118010,2007-03-04,2827500.0,3372500.0,47.82327782578349,-2.7316459165737292,2007-03-04 00:00:00,9.68457
6118010,2007-03-04,2827500.0,3372500.0,47.82327782578349,-2.7316459165737292,2007-03-04 06:00:00,9.5390625
6118010,2007-03-04,2827500.0,3372500.0,47.82327782578349,-2.7316459165737292,2007-03-04 12:00:00,9.783691
6118010,2007-03-04,2827500.0,3372500.0,47.82327782578349,-2.7316459165737292,2007-03-04 18:00:00,9.83252
6118010,2007-03-07,2827500.0,3372500.0,47.82327782578349,-2.7316459165737292,2007-03-07 00:00:00,11.904785
6118010,2007-03-07,2827500.0,3372500.0,47.82327782578349,-2.7316459165737292,2007-03-07 06:00:00,12.73584
6118010,2007-03-07,2827500.0,3372500.0,47.82327782578349,-2.7316459165737292,2007-03-07 12:00:00,12.832031
6118010,2007-03-07,2827500.0,3372500.0,47.82327782578349,-2.7316459165737292,2007-03-07 18:00:00,13.336914
6118015,2007-03-04,2842500.0,3402500.0,48.001715448932764,-2.3682021184138047,2007-03-04 00:00:00,11.508301
6118015,2007-03-04,2842500.0,3402500.0,48.001715448932764,-2.3682021184138047,2007-03-04 06:00:00,11.17334
6118015,2007-03-04,2842500.0,3402500.0,48.001715448932764,-2.3682021184138047,2007-03-04 12:00:00,11.09082
6118015,2007-03-04,2842500.0,3402500.0,48.001715448932764,-2.3682021184138047,2007-03-04 18:00:00,11.20752
6118015,2007-03-07,2842500.0,3402500.0,48.001715448932764,-2.3682021184138047,2007-03-07 00:00:00,14.920898
6118015,2007-03-07,2842500.0,3402500.0,48.001715448932764,-2.3682021184138047,2007-03-07 06:00:00,15.956055
6118015,2007-03-07,2842500.0,3402500.0,48.001715448932764,-2.3682021184138047,2007-03-07 12:00:00,16.127441
6118015,2007-03-07,2842500.0,3402500.0,48.001715448932764,-2.3682021184138047,2007-03-07 18:00:00,15.900879
6118020,2007-03-04,2802500.0,3447500.0,47.71290346585623,-1.6892419697226784,2007-03-04 00:00:00,13.782227
|
Cropping to a bounding box
...
language | py |
---|---|
title | Script |
collapse | true |
...
f'glofas_reforecast_{month}_{day}.grib') |
Code Block | ||||||
---|---|---|---|---|---|---|
| ||||||
## === retrieve GloFAS Seasonal Forecast ===
## === subset South America/Amazon region ===
import cdsapi
if __name__ == '__main__':
c = cdsapi.Client()
YEARS = ['%d'%(y) for y in range(2020,2022)]
MONTHS = ['%02d'%(m) for m in range(1,13)]
LEADTIMES = ['%d'%(l) for l in range(24,2976,24)]
for year in YEARS:
for month in MONTHS:
c.retrieve(
'cems-glofas-seasonal',
{
'variable': 'river_discharge_in_the_last_24_hours',
'format': 'grib',
'year': year,
'month': '12' if year == '2020' else month,
'leadtime_hour': LEADTIMES,
'area': [ 10.95, -90.95, -30.95, -29.95 ]
},
f'glofas_seasonal_{year}_{month}.grib') |
Code Block | ||||||
---|---|---|---|---|---|---|
| ||||||
## === retrieve GloFAS Seasonal Reforecast ===
## === subset South America/Amazon region ===
import cdsapi
if __name__ == '__main__':
c = cdsapi.Client()
YEARS = ['%d'%(y) for y in range(1981,2021)]
MONTHS = ['january', 'february', 'march', 'april', 'may', 'june', 'july', 'august', 'september', 'october', 'november', 'december']
LEADTIMES = ['%d'%(l) for l in range(24,2976,24)]
for year in YEARS:
for month in MONTHS:
c.retrieve(
'cems-glofas-seasonal-reforecast',
{
'system_version': 'version_2_2',
'variable':'river_discharge_in_the_last_24_hours',
'format':'grib',
'hydrological_model':'htessel_lisflood',
'hyear': year,
'hmonth': month,
'leadtime_hour': LEADTIMES,
'area': [ 10.95, -90.95, -30.95, -29.95 ]
},
f'glofas_seasonal_reforecast_{year}_{month}.grib') |
Local processing
Time series extraction:
Code Block | ||||||
---|---|---|---|---|---|---|
| ||||||
import xarray as xr
import pandas as pd
parameter = "dis24"
ds = xr.open_dataset("glofas_historical.grib", engine="cfgrib",backend_kwargs={'time_dims':['time']})
df = pd.read_csv("GRDC.csv")
total = len(df)
rows = []
count = 0
for lon, lat, id in zip(df.long, df.lat, df.grdc_no):
extracted = ds.sel(longitude=lon, latitude=lat, method="nearest")[parameter]
df_temp = extracted.drop_vars(["surface"]).to_dataframe().reset_index()
df_temp["grdc"] = str(id)
df_temp = df_temp.set_index(["grdc", "time"])
rows.append(df_temp)
count += 1
print(f"progress: {count/total*100} %")
out = pd.concat(rows)
out.to_csv("extracted.csv", index="grdc") |
Area cropping:
Code Block | ||||||
---|---|---|---|---|---|---|
| ||||||
import xarray as xr
# Rhine's basin bounding box
bbox = [50.972204, 46.296530, 5.450796, 11.871059] # N,S,W,E
ds = xr.open_dataset("glofas_historical.grib", engine="cfgrib")
|
...
ds_cropped = ds.sel(
|
...
longitude=slice( |
...
bbox[ |
...
2], |
...
bbox[ |
...
3]), |
...
latitude=slice( |
...
bbox[0], |
...
bbox[1]) ) ds_cropped.to_netcdf(" |
...
glofas_ |
...
historical_cropped.nc") |