Exports module

After enriching occurrences, the exports module lets you work with the downloaded data. Several options are available:

Main functions

geoenrich.exports.collate_npy(ds_ref, data_path, output_res=32, slice=None, dimension3={'surface-current-u': 2})

Export a 3D numpy array with all layers for each occurrence of a dataset. WARNING: the dimension3 dictionary must be provided if some variables have a time or depth dimension.

Parameters:
  • ds_ref (str) – The enrichment file name (e.g. gbif_taxonKey).

  • data_path (str) – path where numpy files will be saved.

  • output_res (int) – output data resolution along lat and lon axes.

  • slice (list[int]) – if not None, only process the given slice of the dataset.

  • dimension3 (dict) – Expected length of the third dimension (time dimension * depth dimension) for each variable where it is larger than 1.

Returns:

None
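
The dimension3 rule above can be sketched as follows. The helper build_dimension3, the layer counts, and the variable ID "bathymetry" are hypothetical and only illustrate the "time dimension * depth dimension" product; the collate_npy call uses the signature documented above and assumes geoenrich is installed and the dataset has already been enriched.

```python
def build_dimension3(layer_counts):
    """Return {var_id: n_time * n_depth} for variables whose product exceeds 1.

    layer_counts maps each variable ID to a (n_time, n_depth) tuple.
    This helper is illustrative and is not part of geoenrich.
    """
    dimension3 = {}
    for var_id, (n_time, n_depth) in layer_counts.items():
        length = n_time * n_depth
        if length > 1:
            dimension3[var_id] = length
    return dimension3


# Hypothetical layer counts: (time steps, depth levels) per variable.
layer_counts = {
    "surface-current-u": (2, 1),  # two time steps, one depth level
    "bathymetry": (1, 1),         # purely 2D, so it is left out
}
dimension3 = build_dimension3(layer_counts)


def run_collate(ds_ref, data_path):
    """Call collate_npy with the computed dictionary (requires geoenrich)."""
    from geoenrich.exports import collate_npy

    collate_npy(ds_ref, data_path, output_res=32, dimension3=dimension3)
```

Here only "surface-current-u" ends up in the dictionary, which matches the default value shown in the signature.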

geoenrich.exports.export_png(dataset_ref, occ_id, var_id, target_size=None, value_range=None, path=PosixPath('.'), geo_buff=None, time_buff=None, depth_request=None, downsample=None, cmap='coolwarm', shape='rectangle')

Export a png image of the requested data. If depth is a dimension, the shallowest layer is selected. If time is a dimension, the most recent layer is selected.

Parameters:
  • dataset_ref (str) – The enrichment file name (e.g. gbif_taxonKey).

  • occ_id (str) – ID of the occurrence to get data for. Can be obtained with geoenrich.enrichment.read_ids().

  • var_id (str) – ID of the variable to retrieve.

  • target_size (int tuple) – Size of the target picture (width, height). If None, using the native data resolution.

  • value_range (float list) – Range of the variable. Necessary for consistency between all images.

  • path (str or pathlib.Path) – Path where image files will be saved.

  • geo_buff (int) – (Optional) Geo_buff that was used for enrichment.

  • time_buff (float list) – (Optional) Time_buff that was used for enrichment.

  • depth_request (str) – (Optional) Depth request that was used for enrichment.

  • downsample (dict) – (Optional) Downsample that was used for enrichment.

  • cmap (str) – (Optional) Specify a colormap (see matplotlib.cm for reference).

  • shape (str) – If ‘rectangle’, return data inside the rectangle containing the buffer. If ‘buffer’, only return data within the buffer distance from the occurrence location.

Returns:

None
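
Because value_range is what keeps images comparable, one way to use export_png is to loop over occurrences with a single shared range. The wrapper export_pngs and its exporter argument are hypothetical conveniences, not part of geoenrich; the call itself follows the signature documented above. Occurrence IDs are assumed to be available as a list of strings.

```python
from pathlib import Path


def export_pngs(dataset_ref, var_id, occ_ids, value_range, out_dir, exporter=None):
    """Export one PNG per occurrence, reusing the same value_range for all.

    exporter defaults to geoenrich.exports.export_png. Passing another
    callable lets the loop be exercised without geoenrich installed.
    """
    if exporter is None:
        from geoenrich.exports import export_png as exporter
    out_dir = Path(out_dir)
    for occ_id in occ_ids:
        # Same value_range for every image, so colours mean the same thing.
        exporter(dataset_ref, occ_id, var_id,
                 value_range=value_range, path=out_dir)
    return len(occ_ids)
```

A call such as export_pngs("gbif_taxonKey", var_id, ids, [0.0, 30.0], "images") would then write every image with the same colour scale, where the range values are placeholders to adapt to the variable.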

geoenrich.exports.export_raster(dataset_ref, occ_id, var_id, path=PosixPath('.'), geo_buff=None, time_buff=None, depth_request=None, downsample=None, shape='rectangle', multiband=None)

Export a GeoTiff raster of the requested data. The depth or time dimension (not both) can be stored as bands (see the multiband argument). Otherwise, the shallowest depth and most recent time are selected.

Parameters:
  • dataset_ref (str) – The enrichment file name (e.g. gbif_taxonKey).

  • occ_id (str) – ID of the occurrence to get data for. Can be obtained with geoenrich.enrichment.read_ids().

  • var_id (str) – ID of the variable to retrieve.

  • path (str or pathlib.Path) – Path where raster files will be saved.

  • geo_buff (int) – (Optional) Geo_buff that was used for enrichment.

  • time_buff (float list) – (Optional) Time_buff that was used for enrichment.

  • depth_request (str) – (Optional) Depth request that was used for enrichment.

  • downsample (dict) – (Optional) Downsample that was used for enrichment.

  • shape (str) – If ‘rectangle’, return data inside the rectangle containing the buffer. If ‘buffer’, only return data within the buffer distance from the occurrence location.

  • multiband (str) – If multiband is ‘depth’ or ‘time’, the corresponding dimension is saved into multiple bands.

Returns:

None
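
Since only one of depth or time can be stored as bands, a caller has to decide which one to keep. The helper below is a hypothetical sketch of that decision, not part of geoenrich; it only returns values that the multiband argument is documented to accept (None, ‘depth’ or ‘time’).

```python
def choose_multiband(n_depth, n_time, prefer="time"):
    """Pick a value for the multiband argument of export_raster.

    Returns None when neither dimension has more than one layer,
    'depth' or 'time' when exactly one does, and `prefer` when both do
    (only one of the two can be stored as bands).
    """
    if prefer not in ("depth", "time"):
        raise ValueError("prefer must be 'depth' or 'time'")
    has_depth = n_depth > 1
    has_time = n_time > 1
    if has_depth and has_time:
        return prefer
    if has_depth:
        return "depth"
    if has_time:
        return "time"
    return None
```

The result can be passed directly as multiband=choose_multiband(n_depth, n_time) when calling export_raster.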

geoenrich.exports.export_to_array(res, target_size=None, value_range=None, stack=None, squeeze=True, target_len=None)

Export data as a 3D numpy array where the first 2 dimensions represent geographical coordinates. Option to standardize data by specifying a target size and target value range. The third dimension stores multiple bands if stack is set to depth, time or all.

Parameters:
  • res (dict) – output of geoenrich.exports.retrieve_data().

  • target_size (int tuple) – Size of the target array (width, height). If None, using the native data resolution.

  • value_range (float list) – Range of the variable. Necessary for consistency between all images.

  • stack (str) – If set to ‘depth’, ‘time’ or ‘all’, keep values along the corresponding dimension(s) (returns a 3D array).

  • squeeze (bool) – If true, remove unused dimensions in the output.

  • target_len (int) – Length of the third dimension if data is None (to return uniform results).

Returns:

output data, scaled and resized.

Return type:

numpy.array

geoenrich.exports.get_derivative(dataset_ref, occ_id, var_id, days=(0, 0), geo_buff=None, depth_request='surface', downsample={}, shape='rectangle')

Retrieve data for both specified days and return the derivative. geo_buff and downsample must be identical to the values you used for enrichment.

Parameters:
  • dataset_ref (str) – The enrichment file name (e.g. gbif_taxonKey).

  • occ_id (str) – ID of the occurrence to get data for. Can be obtained with geoenrich.enrichment.read_ids().

  • var_id (str) – ID of the variable to differentiate.

  • days (int tuple) – Start and end days for derivative calculation. If enriching occurrences, provide bounds relative to the occurrence date, e.g. (-7, 0). If enriching areas, provide bounds relative to date_max, e.g. (-7, 0).

  • geo_buff (int) – (Optional) Geo_buff that was used for enrichment.

  • depth_request (str) – (Optional) Depth request that was used for enrichment.

  • downsample (dict) – (Optional) Downsample that was used for enrichment.

  • shape (str) – If ‘rectangle’, return data inside the rectangle containing the buffer. If ‘buffer’, only return data within the buffer distance from the occurrence location.

Returns:

A dictionary of all available variables with corresponding data (numpy.ma.MaskedArray), unit (str), and coordinates (ordered list of dimension names and values).

Return type:

dict
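
To make the role of the days bounds concrete, here is a minimal finite-difference sketch. It assumes the derivative is the change in value divided by the number of days between the two bounds; the documentation above does not state whether geoenrich normalises this way, so treat the formula and the helper name as assumptions. The sample values are made up.

```python
def daily_rate_of_change(start_values, end_values, days):
    """Finite-difference derivative per day between two snapshots.

    days is a (start_day, end_day) tuple such as (-7, 0).
    Illustrative only; not part of geoenrich.
    """
    start_day, end_day = days
    span = end_day - start_day
    if span == 0:
        raise ValueError("start and end days must differ")
    return [(end - start) / span
            for start, end in zip(start_values, end_values)]


# Made-up values for two grid cells, 7 days apart.
rates = daily_rate_of_change([14.0, 15.0], [21.0, 15.0], days=(-7, 0))
```

With these inputs the first cell changes by 1.0 unit per day and the second does not change.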

geoenrich.exports.produce_stats(dataset_ref, var_id, geo_buff=None, time_buff=None, depth_request=None, downsample=None, out_path=PosixPath('.'))

Produce a document named dataset_ref_stats.csv with summary stats of all enriched data. If input data were occurrences, only data within the buffer distance are used for calculations.

Parameters:
  • dataset_ref (str) – The enrichment file name (e.g. gbif_taxonKey).

  • var_id (str) – ID of the variable to retrieve.

  • geo_buff (int) – (Optional) Geo_buff that was used for enrichment.

  • time_buff (float list) – (Optional) Time_buff that was used for enrichment.

  • depth_request (str) – (Optional) Depth request that was used for enrichment.

  • downsample (dict) – (Optional) Downsample that was used for enrichment.

  • out_path (str or pathlib.Path) – Path where you want to save the output stats file.

Returns:

None
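
Once produce_stats has run, the output can be read back with the standard library. This sketch assumes the file name is built by substituting the actual dataset_ref value into dataset_ref_stats.csv and that the file sits directly in out_path; both helpers are hypothetical, and no column names are assumed.

```python
import csv
from pathlib import Path


def stats_file_path(dataset_ref, out_path="."):
    """Expected location of the stats file (naming is an assumption)."""
    return Path(out_path) / f"{dataset_ref}_stats.csv"


def read_stats(csv_path):
    """Return the stats file as a list of dictionaries, one per row."""
    with open(csv_path, newline="") as handle:
        return list(csv.DictReader(handle))
```

For example, read_stats(stats_file_path("gbif_taxonKey", "results")) would load the rows for that dataset if the file exists at that location.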

geoenrich.exports.retrieve_data(dataset_ref, occ_id, var_id, geo_buff=None, time_buff=None, depth_request=None, downsample=None, shape='rectangle', serialized={})

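Retrieve downloaded data for the given occurrence id and variable. If enrichment was done several times with different buffers, specify the geo_buff, time_buff, depth_request and downsample values of the enrichment you want to retrieve.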

Parameters:
  • dataset_ref (str) – The enrichment file name (e.g. gbif_taxonKey).

  • occ_id (str) – ID of the occurrence to get data for. Can be obtained with geoenrich.enrichment.read_ids().

  • var_id (str) – ID of the variable to retrieve.

  • geo_buff (int) – (Optional) Geo_buff that was used for enrichment.

  • time_buff (float list) – (Optional) Time_buff that was used for enrichment.

  • depth_request (str) – (Optional) Depth request that was used for enrichment.

  • downsample (dict) – (Optional) Downsample that was used for enrichment.

  • shape (str) – If ‘rectangle’, return data inside the rectangle containing the buffer. If ‘buffer’, only return data within the buffer distance from the occurrence location.

  • serialized (dict) – (Optional) provide a dictionary of variables to reduce processing time (supports df, dimdict, var, var_source, ds).

Returns:

A dictionary of all available variables with corresponding data (numpy.ma.MaskedArray), unit (str), and coordinates (ordered list of dimension names and values).

Return type:

dict

Other functions (for internal use)

geoenrich.exports.compute_stats(row, en_params, input_type, var_indices, ds, dimdict, var)

Compute and return stats for the given row.

Parameters:
  • row (pandas.Series) – One row of an enrichment file.

  • en_params (dict) – Enrichment parameters as stored in the json config file.

  • input_type (str) – ‘occurrence’ or ‘area’.

  • var_indices (dict) – Dictionary of column indices for the selected variable, output of geoenrich.enrichment.parse_columns().

  • ds (netCDF4.Dataset) – Local dataset.

  • dimdict (dict) – Dictionary of dimensions as returned by geoenrich.satellite.get_metadata.

  • var (dict) – Variable dictionary as returned by geoenrich.satellite.get_metadata.

Returns:

Statistics for the given row.

Return type:

pandas.Series

geoenrich.exports.fetch_data(row, var_id, var_indices, ds, dimdict, var, downsample, indices=None)

Fetch data locally for a specific occurrence and variable.

Parameters:
  • row (pandas.Series) – One row of an enrichment file.

  • var_id (str) – ID of the variable to fetch.

  • var_indices (dict) – Dictionary of column indices for the selected variable, output of geoenrich.enrichment.parse_columns().

  • ds (netCDF4.Dataset) – Local dataset.

  • dimdict (dict) – Dictionary of dimensions as returned by geoenrich.satellite.get_metadata.

  • var (dict) – Variable dictionary as returned by geoenrich.satellite.get_metadata.

  • downsample (dict) – Number of points to skip between each downloaded point, for each dimension, using its standard name as a key.

  • indices (dict) – Coordinates of the netCDF subset. If None, they are read from row and var_indices arguments.

Returns:

Raw data and coordinates along all dimensions.

Return type:

numpy.ma.MaskedArray, list
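
The downsample argument counts points to skip between kept points. The sketch below assumes that a value of n means every (n + 1)-th point is kept along that dimension, which is one reading of the description above. The helper name and the dictionary keys "latitude" and "longitude" are illustrative assumptions.

```python
def kept_indices(n_points, skip):
    """Indices kept along one dimension when `skip` points are skipped
    between each kept point (skip=0 keeps every point).

    Illustrative only; not part of geoenrich.
    """
    if skip < 0:
        raise ValueError("skip must be non-negative")
    return list(range(0, n_points, skip + 1))


# Hypothetical downsample dictionary keyed by dimension standard name.
downsample = {"latitude": 2, "longitude": 2}

lat_indices = kept_indices(10, downsample["latitude"])
```

With 10 latitude points and a skip of 2, the kept indices are 0, 3, 6 and 9.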