miranda.convert package
Data Conversion module.
- miranda.convert.aggregate(ds: Dataset, freq: str = 'day') → dict[str, Dataset]
- Parameters:
ds (xarray.Dataset)
freq (str)
- Returns:
dict[str, xarray.Dataset]
- miranda.convert.aggregations_possible(ds: Dataset, freq: str = 'day') → dict[str, set[str]]
Determine which aggregations are possible based on variables within a dataset.
- Parameters:
ds (xarray.Dataset)
freq (str)
- Returns:
dict[str, set[str]]
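The two aggregation functions above carry little description. As a rough, stdlib-only illustration of what daily aggregation involves (not miranda's implementation), the sketch below assumes a hypothetical mapping from source variables to daily outputs (tas to daily mean, minimum and maximum; pr to daily mean). It reports which outputs are possible for a set of variables, then buckets hourly samples by calendar day and applies each aggregation.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical mapping: source variable -> {daily output name: aggregation function}.
# Illustration only; this is not the mapping used by miranda.
DAILY_AGGREGATIONS = {
    "tas": {"tas": mean, "tasmin": min, "tasmax": max},
    "pr": {"pr": mean},
}


def possible_daily_aggregations(variables):
    """Return {variable: set of daily outputs} for the variables we know how to aggregate."""
    return {v: set(DAILY_AGGREGATIONS[v]) for v in variables if v in DAILY_AGGREGATIONS}


def aggregate_daily(series):
    """Aggregate hourly samples to daily values.

    `series` maps a variable name to a list of (datetime, value) pairs; the result
    maps each daily output name to {date: aggregated value}.
    """
    result = {}
    for var, samples in series.items():
        # Bucket the samples by calendar day.
        by_day = defaultdict(list)
        for timestamp, value in samples:
            by_day[timestamp.date()].append(value)
        for out_name, func in DAILY_AGGREGATIONS.get(var, {}).items():
            result[out_name] = {day: func(values) for day, values in sorted(by_day.items())}
    return result


# Usage: 24 hourly temperature values for a single day.
hourly = {"tas": [(datetime(2000, 1, 1, hour), float(hour)) for hour in range(24)]}
daily = aggregate_daily(hourly)
```

Variables absent from the mapping (uas, for example) are simply skipped, which mirrors the idea that only some aggregations are possible for a given dataset.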
- miranda.convert.dataset_conversion(input_files: str | PathLike | Sequence[str | PathLike] | Iterator[PathLike] | Dataset, project: str, domain: str | None = None, mask: Dataset | DataArray | None = None, mask_cutoff: float | bool = False, regrid: bool = False, add_version_hashes: bool = True, preprocess: Callable | str | None = 'auto', **xr_kwargs) → Dataset | DataArray
Convert an existing Xarray-compatible dataset to another format with variable corrections applied.
- Parameters:
input_files (str or os.PathLike or Sequence[str or os.PathLike] or Iterator[os.PathLike] or xr.Dataset) – Files or objects to be converted. If given a list or generator, the files are opened with xarray.open_mfdataset() and concatenated.
project ({“cordex”, “cmip5”, “cmip6”, “ets-grnch”, “isimip-ft”, “pcic-candcs-u6”, “converted”}) – Project name for decoding/handling purposes.
domain ({“global”, “nam”, “can”, “qc”, “mtl”}, optional) – Domain to perform subsetting for. Default: None.
mask (xr.Dataset or xr.DataArray, optional) – DataArray or single-variable Dataset containing the mask.
mask_cutoff (float or bool) – If a land_sea_mask is supplied, the threshold above which to mask with it. Default: False.
regrid (bool) – Perform regridding with xesmf. Default: False.
add_version_hashes (bool) – If True, the version name and sha256sum of the source file(s) are added to the global attributes.
preprocess (callable or str, optional) – Preprocessing to apply to each Dataset. If “auto” (default), preprocessing fixes are run based on the fields supplied in the metadata definition. If a callable, it is run over the Dataset (single file) or passed to xarray.open_mfdataset() as its preprocess argument (multi-file dataset).
**xr_kwargs – Arguments passed directly to xarray.
- Returns:
xr.Dataset or xr.DataArray
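The add_version_hashes option records a sha256sum of each source file among the global attributes. A minimal stdlib sketch of computing such checksums is shown below; the attribute name source_file_hashes and the "name: digest" string layout are assumptions for illustration, not miranda's actual output.

```python
import hashlib
from pathlib import Path


def sha256sum(path, chunk_size=2**20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def version_hash_attrs(files):
    """Build a global-attribute entry holding the checksum of every source file.

    The attribute name "source_file_hashes" is hypothetical.
    """
    hashes = {Path(f).name: sha256sum(f) for f in files}
    # netCDF attributes cannot hold mappings, so serialise to a single string.
    return {"source_file_hashes": "; ".join(f"{name}: {h}" for name, h in sorted(hashes.items()))}
```

Reading in chunks keeps memory use flat even for multi-gigabyte climate files.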
- miranda.convert.dataset_corrections(ds: Dataset, project: str) → Dataset
Convert variables to a CF-compliant format.
- miranda.convert.dims_conversion(d: Dataset, p: str, m: dict) → Dataset
Rename dimensions to their CF equivalents.
- Parameters:
d (xarray.Dataset) – Dataset with dimensions to be updated.
p (str) – Dataset project name.
m (dict) – Metadata definition dictionary for project and variable(s).
- Returns:
xarray.Dataset
- miranda.convert.gather_agcfsr(path: str | PathLike) → dict[str, list[Path]]
Gather agCFSR source data.
- Parameters:
path (str or os.PathLike)
- Returns:
dict[str, list[pathlib.Path]]
- miranda.convert.gather_agmerra(path: str | PathLike) → dict[str, list[Path]]
Gather agMERRA source data.
- Parameters:
path (str or os.PathLike)
- Returns:
dict[str, list[pathlib.Path]]
- miranda.convert.gather_ecmwf(project: str, path: str | PathLike, back_extension: bool = False, monthly_means: bool = False) → dict[str, list[Path]]
- Parameters:
project ({“era5-single-levels”, “era5-pressure-levels”, “era5-land”})
path (str or os.PathLike)
back_extension (bool)
monthly_means (bool)
- Returns:
dict[str, list[pathlib.Path]]
- miranda.convert.gather_emdna(path: str | PathLike) → dict[str, list[Path]]
Gather raw EMDNA files for preprocessing.
Put all files with the same member together.
- Parameters:
path (str or os.PathLike)
- Returns:
dict[str, list[pathlib.Path]]
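The per-member grouping described for gather_emdna can be sketched with pathlib and a regular expression. The file-name pattern `<prefix>_<year>.<member>.nc4` is an assumption made for illustration; the real EMDNA naming convention, and the search logic inside gather_emdna, may differ.

```python
import re
from collections import defaultdict
from pathlib import Path

# Hypothetical file-name pattern "<prefix>_<year>.<member>.nc4" (".nc" also accepted);
# the real EMDNA naming convention may differ.
MEMBER_PATTERN = re.compile(r"_(?P<year>\d{4})\.(?P<member>\d{3})\.nc4?$")


def group_by_member(path):
    """Group matching files under `path` into {member: [sorted paths]}."""
    groups = defaultdict(list)
    for file in Path(path).rglob("*.nc*"):
        match = MEMBER_PATTERN.search(file.name)
        if match:  # silently skip files that do not follow the pattern
            groups[match["member"]].append(file)
    return {member: sorted(files) for member, files in sorted(groups.items())}
```

Sorting each group keeps the yearly files in chronological order, which is convenient when they are later concatenated along time.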
- miranda.convert.gather_grnch(path: str | PathLike) → dict[str, list[Path]]
Gather raw ETS-GRNCH files for preprocessing.
- Parameters:
path (str or os.PathLike)
- Returns:
dict[str, dict[str, list[pathlib.Path]]] or None
- miranda.convert.gather_nex(path: str | PathLike) → dict[str, list[Path]]
Gather raw NEX files for preprocessing.
Put all files that should be contained in one dataset in one entry of the dictionary.
- Parameters:
path (str or os.PathLike)
- Returns:
dict[str, list[pathlib.Path]]
- miranda.convert.gather_nrcan_gridded_obs(path: str | PathLike) → dict[str, list[Path]]
Gather NRCan Gridded Observations source data.
- Parameters:
path (str or os.PathLike)
- Returns:
dict[str, list[pathlib.Path]]
- miranda.convert.gather_raw_rdrs_by_years(path: str | PathLike) → dict[str, dict[str, list[Path]]]
Gather raw RDRS files for preprocessing.
- Parameters:
path (str or os.PathLike)
- Returns:
dict[str, dict[str, list[pathlib.Path]]]
- miranda.convert.gather_rdrs(name: str, path: str | PathLike, suffix: str, key: str) → dict[str, dict[str, list[Path]]]
Gather RDRS processed source data.
- Parameters:
name (str)
path (str or os.PathLike)
suffix (str)
key ({“raw”, “cf”}) – Which variable-name dictionary to search.
- Returns:
dict[str, dict[str, list[pathlib.Path]]]
- miranda.convert.gather_sc_earth(path: str | PathLike) → dict[str, list[Path]]
Gather SC-Earth source data.
- Parameters:
path (str or os.PathLike)
- Returns:
dict[str, list[pathlib.Path]]
- miranda.convert.gather_wfdei_gem_capa(path: str | PathLike) → dict[str, list[Path]]
Gather WFDEI-GEM-CaPa source data.
- Parameters:
path (str or os.PathLike)
- Returns:
dict[str, list[pathlib.Path]]
- miranda.convert.load_json_data_mappings(project: str) → dict[str, Any]
Load JSON mappings for supported dataset conversions.
- Parameters:
project (str)
- Returns:
dict[str, Any]
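load_json_data_mappings returns the mapping for a given project. A minimal sketch of that lookup pattern follows, assuming one JSON file per project named `<project>_cf_attrs.json` inside a given folder; both the layout and the hypothetical load_mappings helper are illustrative and not necessarily how miranda stores its definitions.

```python
import json
from pathlib import Path


def load_mappings(project, folder):
    """Load the JSON mapping for `project` from `folder`.

    Assumes a hypothetical layout of one "<project>_cf_attrs.json" file per project.
    """
    candidate = Path(folder) / f"{project}_cf_attrs.json"
    if not candidate.is_file():
        # Unsupported projects fail loudly rather than returning an empty mapping.
        raise ValueError(f"No mapping found for project {project!r} in {folder}")
    with candidate.open(encoding="utf-8") as handle:
        return json.load(handle)
```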
- miranda.convert.metadata_conversion(d: Dataset, p: str, m: dict) → Dataset
Update xarray dataset and data_vars with project-specific metadata fields.
- Parameters:
d (xarray.Dataset) – Dataset with metadata to be updated.
p (str) – Dataset project name.
m (dict) – Metadata definition dictionary for project and variable(s).
- Returns:
xarray.Dataset
- miranda.convert.threshold_mask(ds: Dataset | DataArray, *, mask: Dataset | DataArray, mask_cutoff: float | bool = False) → Dataset | DataArray
Apply a land-sea mask to a Dataset or DataArray, optionally using a threshold (mask_cutoff).
- Parameters:
ds (xr.Dataset or xr.DataArray)
mask (xr.Dataset or xr.DataArray)
mask_cutoff (float or bool)
- Returns:
xr.Dataset or xr.DataArray
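The semantics of mask_cutoff are only loosely described above. The following stdlib sketch assumes that, with a numeric cutoff, cells are kept where the mask value exceeds mask_cutoff and set to NaN elsewhere, and that a boolean mask_cutoff means the mask itself is used as a boolean. With xarray objects, the same idea is usually written as `ds.where(mask > mask_cutoff)`.

```python
import math


def apply_threshold_mask(values, mask, mask_cutoff=False):
    """Return a copy of `values` with masked-out cells set to NaN.

    Assumed semantics: a numeric mask_cutoff keeps cells where mask > mask_cutoff;
    a boolean mask_cutoff means the mask is used directly as a boolean (truthy = keep).
    """
    if len(values) != len(mask):
        raise ValueError("values and mask must have the same length")
    if isinstance(mask_cutoff, bool):
        keep = [bool(m) for m in mask]
    else:
        keep = [m > mask_cutoff for m in mask]
    return [v if k else math.nan for v, k in zip(values, keep)]
```

A land fraction of 0.9 with mask_cutoff=0.5 is kept, while 0.2 becomes NaN.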
- miranda.convert.variable_conversion(d: Dataset, p: str, m: dict) → Dataset
Add variable metadata and remove nonstandard entries.
- Parameters:
d (xarray.Dataset) – Dataset with variable(s) to be updated.
p (str) – Dataset project name.
m (dict) – Metadata definition dictionary for project and variable(s).
- Returns:
xarray.Dataset
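Conceptually, adding variable metadata and removing nonstandard entries amounts to overlaying a definition onto a variable's attributes and filtering against an allow-list. A minimal sketch on plain dictionaries follows; the allow-list is hypothetical, since miranda derives its rules from the metadata definition dictionary.

```python
# Hypothetical allow-list of attributes considered standard.
ALLOWED_ATTRS = {"standard_name", "long_name", "units", "cell_methods", "comment"}


def convert_variable_attrs(attrs, definition):
    """Overlay `definition` onto `attrs`, then drop keys outside ALLOWED_ATTRS."""
    merged = {**attrs, **definition}  # definition values win on conflicts
    return {key: value for key, value in merged.items() if key in ALLOWED_ATTRS}


converted = convert_variable_attrs(
    {"units": "degC", "grib_code": 167},
    {"units": "K", "standard_name": "air_temperature"},
)
```

On an xarray variable, the result would typically be assigned back to its attrs.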
Submodules
miranda.convert._aggregation module
Aggregation module.
miranda.convert._data_corrections module
- miranda.convert._data_corrections.dataset_conversion(input_files: str | PathLike | Sequence[str | PathLike] | Iterator[PathLike] | Dataset, project: str, domain: str | None = None, mask: Dataset | DataArray | None = None, mask_cutoff: float | bool = False, regrid: bool = False, add_version_hashes: bool = True, preprocess: Callable | str | None = 'auto', **xr_kwargs) → Dataset | DataArray
Convert an existing Xarray-compatible dataset to another format with variable corrections applied.
- Parameters:
input_files (str or os.PathLike or Sequence[str or os.PathLike] or Iterator[os.PathLike] or xr.Dataset) – Files or objects to be converted. If given a list or generator, the files are opened with xarray.open_mfdataset() and concatenated.
project ({“cordex”, “cmip5”, “cmip6”, “ets-grnch”, “isimip-ft”, “pcic-candcs-u6”, “converted”}) – Project name for decoding/handling purposes.
domain ({“global”, “nam”, “can”, “qc”, “mtl”}, optional) – Domain to perform subsetting for. Default: None.
mask (xr.Dataset or xr.DataArray, optional) – DataArray or single-variable Dataset containing the mask.
mask_cutoff (float or bool) – If a land_sea_mask is supplied, the threshold above which to mask with it. Default: False.
regrid (bool) – Perform regridding with xesmf. Default: False.
add_version_hashes (bool) – If True, the version name and sha256sum of the source file(s) are added to the global attributes.
preprocess (callable or str, optional) – Preprocessing to apply to each Dataset. If “auto” (default), preprocessing fixes are run based on the fields supplied in the metadata definition. If a callable, it is run over the Dataset (single file) or passed to xarray.open_mfdataset() as its preprocess argument (multi-file dataset).
**xr_kwargs – Arguments passed directly to xarray.
- Returns:
xr.Dataset or xr.DataArray
- miranda.convert._data_corrections.dataset_corrections(ds: Dataset, project: str) → Dataset
Convert variables to a CF-compliant format.
- miranda.convert._data_corrections.dims_conversion(d: Dataset, p: str, m: dict) → Dataset
Rename dimensions to their CF equivalents.
- Parameters:
d (xarray.Dataset) – Dataset with dimensions to be updated.
p (str) – Dataset project name.
m (dict) – Metadata definition dictionary for project and variable(s).
- Returns:
xarray.Dataset
- miranda.convert._data_corrections.load_json_data_mappings(project: str) → dict[str, Any]
Load JSON mappings for supported dataset conversions.
- Parameters:
project (str)
- Returns:
dict[str, Any]
- miranda.convert._data_corrections.metadata_conversion(d: Dataset, p: str, m: dict) → Dataset
Update xarray dataset and data_vars with project-specific metadata fields.
- Parameters:
d (xarray.Dataset) – Dataset with metadata to be updated.
p (str) – Dataset project name.
m (dict) – Metadata definition dictionary for project and variable(s).
- Returns:
xarray.Dataset
- miranda.convert._data_corrections.threshold_mask(ds: Dataset | DataArray, *, mask: Dataset | DataArray, mask_cutoff: float | bool = False) → Dataset | DataArray
Apply a land-sea mask to a Dataset or DataArray, optionally using a threshold (mask_cutoff).
- Parameters:
ds (xr.Dataset or xr.DataArray)
mask (xr.Dataset or xr.DataArray)
mask_cutoff (float or bool)
- Returns:
xr.Dataset or xr.DataArray
- miranda.convert._data_corrections.variable_conversion(d: Dataset, p: str, m: dict) → Dataset
Add variable metadata and remove nonstandard entries.
- Parameters:
d (xarray.Dataset) – Dataset with variable(s) to be updated.
p (str) – Dataset project name.
m (dict) – Metadata definition dictionary for project and variable(s).
- Returns:
xarray.Dataset
miranda.convert._data_definitions module
- miranda.convert._data_definitions.gather_agcfsr(path: str | PathLike) → dict[str, list[Path]]
Gather agCFSR source data.
- Parameters:
path (str or os.PathLike)
- Returns:
dict[str, list[pathlib.Path]]
- miranda.convert._data_definitions.gather_agmerra(path: str | PathLike) → dict[str, list[Path]]
Gather agMERRA source data.
- Parameters:
path (str or os.PathLike)
- Returns:
dict[str, list[pathlib.Path]]
- miranda.convert._data_definitions.gather_ecmwf(project: str, path: str | PathLike, back_extension: bool = False, monthly_means: bool = False) → dict[str, list[Path]]
- Parameters:
project ({“era5-single-levels”, “era5-pressure-levels”, “era5-land”})
path (str or os.PathLike)
back_extension (bool)
monthly_means (bool)
- Returns:
dict[str, list[pathlib.Path]]
- miranda.convert._data_definitions.gather_emdna(path: str | PathLike) → dict[str, list[Path]]
Gather raw EMDNA files for preprocessing.
Put all files with the same member together.
- Parameters:
path (str or os.PathLike)
- Returns:
dict[str, list[pathlib.Path]]
- miranda.convert._data_definitions.gather_grnch(path: str | PathLike) → dict[str, list[Path]]
Gather raw ETS-GRNCH files for preprocessing.
- Parameters:
path (str or os.PathLike)
- Returns:
dict[str, dict[str, list[pathlib.Path]]] or None
- miranda.convert._data_definitions.gather_nex(path: str | PathLike) → dict[str, list[Path]]
Gather raw NEX files for preprocessing.
Put all files that should be contained in one dataset in one entry of the dictionary.
- Parameters:
path (str or os.PathLike)
- Returns:
dict[str, list[pathlib.Path]]
- miranda.convert._data_definitions.gather_nrcan_gridded_obs(path: str | PathLike) → dict[str, list[Path]]
Gather NRCan Gridded Observations source data.
- Parameters:
path (str or os.PathLike)
- Returns:
dict[str, list[pathlib.Path]]
- miranda.convert._data_definitions.gather_raw_rdrs_by_years(path: str | PathLike) → dict[str, dict[str, list[Path]]]
Gather raw RDRS files for preprocessing.
- Parameters:
path (str or os.PathLike)
- Returns:
dict[str, dict[str, list[pathlib.Path]]]
- miranda.convert._data_definitions.gather_rdrs(name: str, path: str | PathLike, suffix: str, key: str) → dict[str, dict[str, list[Path]]]
Gather RDRS processed source data.
- Parameters:
name (str)
path (str or os.PathLike)
suffix (str)
key ({“raw”, “cf”}) – Which variable-name dictionary to search.
- Returns:
dict[str, dict[str, list[pathlib.Path]]]
miranda.convert._reconstruction module
- miranda.convert._reconstruction.reanalysis_processing(data: dict[str, list[str | PathLike]], output_folder: str | PathLike, variables: Sequence[str], aggregate: str | bool = False, domains: str | list[str] = '_DEFAULT', start: str | None = None, end: str | None = None, target_chunks: dict | None = None, output_format: str = 'netcdf', overwrite: bool = False, engine: str = 'h5netcdf', n_workers: int = 4, **dask_kwargs) → None
- Parameters:
data (dict[str, list[str or os.PathLike]])
output_folder (str or os.PathLike)
variables (Sequence[str])
aggregate ({“day”, None})
domains ({“QC”, “CAN”, “AMNO”, “NAM”, “GLOBAL”})
start (str, optional)
end (str, optional)
target_chunks (dict, optional)
output_format ({“netcdf”, “zarr”})
overwrite (bool)
engine ({“netcdf4”, “h5netcdf”})
n_workers (int)
- Returns:
None
miranda.convert.deh module
DEH Hydrograph Conversion module.
- miranda.convert.deh.open_txt(path: str | Path, cf_table: dict | None = {'flag': {'comment': 'See DEH technical information for details.', 'long_name': 'data flag'}, 'q': {'long_name': 'River discharge', 'units': 'm3 s-1'}}) → Dataset
Extract daily DEH river discharge data and convert it to an xr.Dataset with CF-Convention attributes.
miranda.convert.eccc module
Environment and Climate Change Canada Data Conversion module.
miranda.convert.eccc_rdrs module
Environment and Climate Change Canada RDRS conversion tools.
- miranda.convert.eccc_rdrs.convert_rdrs(project: str, input_folder: str | PathLike, output_folder: str | PathLike, output_format: str = 'zarr', working_folder: str | PathLike | None = None, overwrite: bool = False, **dask_kwargs) → None
- Parameters:
project (str)
input_folder (str or os.PathLike)
output_folder (str or os.PathLike)
output_format ({“netcdf”, “zarr”})
working_folder (str or os.PathLike, optional)
overwrite (bool)
**dask_kwargs
- Returns:
None
- miranda.convert.eccc_rdrs.rdrs_to_daily(project: str, input_folder: str | PathLike, output_folder: str | PathLike, working_folder: str | PathLike | None = None, overwrite: bool = False, output_format: str = 'zarr', year_start: int | None = None, year_end: int | None = None, process_variables: list[str] | None = None, **dask_kwargs) → None
Write out RDRS files to daily-timestep files.
- Parameters:
project (str)
input_folder (str or os.PathLike)
output_folder (str or os.PathLike)
working_folder (str or os.PathLike, optional)
overwrite (bool)
output_format ({“netcdf”, “zarr”})
year_start (int, optional)
year_end (int, optional)
process_variables (list of str, optional)
**dask_kwargs
- Returns:
None
miranda.convert.ecmwf module
ECMWF TIGGE Conversion module.
miranda.convert.hq module
Hydro Quebec Weather Station Data Conversion module.
- miranda.convert.hq.open_csv(path: str | Path, cf_table: dict | None = {'hurs': {'cell_methods': 'time: point', 'comment': 'The relative humidity with respect to liquid water for T> 0 C, and with respect to ice for T<0 C.', 'frequency': '1h', 'long_name': 'Near-Surface Relative Humidity', 'out_name': 'hurs', 'standard_name': 'relative_humidity', 'type': 'real', 'units': '%'}, 'prlp': {'cell_methods': 'time: mean', 'comment': 'At surface; includes precipitation of all forms of water in the liquid phase.', 'frequency': 'day', 'long_name': 'Rainfall Flux', 'out_name': 'prlp', 'standard_name': 'rainfall_flux', 'type': 'real', 'units': 'kg m-2 s-1'}, 'prsn': {'cell_methods': 'time: mean', 'comment': 'At surface; includes precipitation of all forms of water in the solid phase.', 'frequency': 'day', 'long_name': 'Snowfall Flux', 'out_name': 'prsn', 'standard_name': 'snowfall_flux', 'type': 'real', 'units': 'kg m-2 s-1'}, 'sfcWind': {'cell_methods': 'time: point', 'comment': 'Near-surface (usually, 10 meters) wind speed.', 'frequency': '1h', 'long_name': 'Near-Surface Wind Speed', 'out_name': 'sfcWind', 'standard_name': 'wind_speed', 'type': 'real', 'units': 'm s-1'}, 'sfcWindAz': {'cell_methods': 'time: point', 'comment': 'Near-surface (usually, 10 meters) direction from which wind originates.', 'frequency': '1h', 'long_name': 'Near-Surface Wind Direction', 'out_name': 'sfcWindAz', 'standard_name': 'wind_direction', 'type': 'real', 'units': 'degree'}, 'snd': {'cell_methods': 'time: point', 'comment': 'The thickness of snow.', 'frequency': '1h', 'long_name': 'Snow Depth', 'out_name': 'snd', 'standard_name': 'surface_snow_thickness', 'type': 'real', 'units': 'm'}, 'tasmax_1h': {'cell_methods': 'time: maximum', 'comment': 'Maximum near-surface (usually, 2 meter) air temperature.', 'frequency': '1h', 'long_name': 'Hourly Maximum Near-Surface Air Temperature', 'out_name': 'tasmax', 'standard_name': 'air_temperature', 'type': 'real', 'units': 'K'}, 'tasmax_day': {'cell_methods': 'time: maximum', 'comment': 'Maximum near-surface (usually, 2 meter) air temperature.', 'frequency': 'day', 'long_name': 'Daily Maximum Near-Surface Air Temperature', 'out_name': 'tasmax', 'standard_name': 'air_temperature', 'type': 'real', 'units': 'K'}, 'tasmin_1h': {'cell_methods': 'time: minimum', 'comment': 'Minimum near-surface (usually, 2 meter) air temperature.', 'frequency': '1h', 'long_name': 'Hourly Minimum Near-Surface Air Temperature', 'out_name': 'tasmin', 'standard_name': 'air_temperature', 'type': 'real', 'units': 'K'}, 'tasmin_day': {'cell_methods': 'time: minimum', 'comment': 'Minimum near-surface (usually, 2 meter) air temperature.', 'frequency': 'day', 'long_name': 'Daily Minimum Near-Surface Air Temperature', 'out_name': 'tasmin', 'standard_name': 'air_temperature', 'type': 'real', 'units': 'K'}}) → DataArray
Extract daily Hydro-Québec (HQ) meteorological data and convert it to an xr.DataArray with CF-Convention attributes.
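The default cf_table keys some entries by variable and frequency (for example tasmax_1h and tasmax_day), with out_name giving the output variable name they share. The small sketch below resolves one entry into an output name and an attribute dictionary. It assumes that out_name names the output variable and that the remaining fields become its attributes; this is an interpretation of the table, not confirmed by the docstring, and resolve_cf_entry is a hypothetical helper.

```python
def resolve_cf_entry(key, cf_table):
    """Return (output variable name, attributes) for one cf_table entry.

    Assumes "out_name" names the output variable and every other field is an attribute.
    """
    entry = dict(cf_table[key])  # copy so the table itself is not modified
    out_name = entry.pop("out_name", key)
    return out_name, entry


# A trimmed excerpt (fewer fields) of the default table documented above.
CF_TABLE = {
    "tasmax_1h": {"cell_methods": "time: maximum", "frequency": "1h", "out_name": "tasmax", "units": "K"},
    "tasmax_day": {"cell_methods": "time: maximum", "frequency": "day", "out_name": "tasmax", "units": "K"},
}

name, attrs = resolve_cf_entry("tasmax_day", CF_TABLE)
```

Both tasmax_1h and tasmax_day resolve to the variable name tasmax; only their attributes (frequency, long_name) differ.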
miranda.convert.melcc module
MELCC (Québec) Weather Stations data conversion module.
- miranda.convert.melcc.concat(files: Sequence[str | Path], output_folder: str | Path, overwrite: bool = True) → Path
Concatenate converted weather station files.
- Parameters:
files (sequence of str or Path)
output_folder (str or Path)
overwrite (bool)
- Returns:
Path
- miranda.convert.melcc.convert_mdb(database: str | Path, stations: Dataset, definitions: Dataset, output: str | Path, overwrite: bool = True) → dict[tuple[str, str], Path]
Convert Microsoft Access databases of MELCC observation data to xarray objects.
- Parameters:
database (str or Path)
stations (xarray.Dataset)
definitions (xarray.Dataset)
output (str or Path)
overwrite (bool)
- Returns:
dict[tuple[str, str], Path]
- miranda.convert.melcc.convert_melcc_obs(metafile: str | Path, folder: str | Path, output: str | Path | None = None, overwrite: bool = True) → dict[tuple[str, str], Path]
Convert MELCC observation data to xarray data objects, returning paths.
- Parameters:
metafile (str or Path)
folder (str or Path)
output (str or Path, optional)
overwrite (bool)
- Returns:
dict[tuple[str, str], Path]
- miranda.convert.melcc.convert_snow_table(file: str | Path, output: str | Path)
Convert snow data provided as an Excel file.
This private data is not included in the MDB files.
- Parameters:
file (str or Path) – The Excel file, with the sheets “Stations”, “Périodes standards” and “Données”.
output (str or Path) – Folder in which to write the netCDF files (one each for snd, sd and snw).
- miranda.convert.melcc.parse_var_code(vcode: str) → dict[str, Any]
Parse a variable code to generate metadata.
- Parameters:
vcode (str)
- Returns:
dict[str, Any]
- miranda.convert.melcc.read_definitions(dbfile: str)
Read variable definition file using mdbtools.
- Parameters:
dbfile (str)
- Returns:
pandas.DataFrame
miranda.convert.utils module
Conversion Utilities submodule.
- miranda.convert.utils.date_parser(date: str, *, end_of_period: bool = False, output_type: str = 'str', strftime_format: str = '%Y-%m-%d') → str | Timestamp | NaTType
Parse a date from the string representation of a single date, or of both a start and an end date.
- Parameters:
date (str) – Date to be converted.
end_of_period (bool) – If True, the date is moved to the end of the month or year, whichever is most appropriate.
output_type ({“datetime”, “str”}) – Desired returned object type.
strftime_format (str) – If output_type == “str”, this sets the strftime format.
- Returns:
pd.Timestamp or str or pd.NaT – Parsed date.
Notes
Adapted from code written by Gabriel Rondeau-Genesse (@RondeauG)
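The end_of_period behaviour can be illustrated with the standard library alone. This is a simplified sketch, not date_parser itself: the hypothetical parse_date below only accepts “YYYY”, “YYYY-MM” and “YYYY-MM-DD”, always returns a string, and does not handle start/end ranges or NaT.

```python
import calendar
from datetime import date


def parse_date(value, end_of_period=False, strftime_format="%Y-%m-%d"):
    """Parse "YYYY", "YYYY-MM" or "YYYY-MM-DD" into a formatted string.

    With end_of_period=True, a year resolves to 31 December and a month to its
    last day; full dates are returned unchanged.
    """
    parts = [int(p) for p in value.split("-")]
    if len(parts) == 1:
        year = parts[0]
        result = date(year, 12, 31) if end_of_period else date(year, 1, 1)
    elif len(parts) == 2:
        year, month = parts
        # monthrange() returns (weekday of first day, number of days in month).
        last_day = calendar.monthrange(year, month)[1]
        result = date(year, month, last_day if end_of_period else 1)
    elif len(parts) == 3:
        result = date(*parts)
    else:
        raise ValueError(f"Unsupported date string: {value!r}")
    return result.strftime(strftime_format)
```

Using calendar.monthrange() handles leap years, so "2000-02" with end_of_period=True resolves to 29 February.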