Tutorial for Python users

In this tutorial we will enrich Acropora digitifera observations with bathymetry information. Please make sure you have already followed the installation instructions.

Here is a summary of the main steps:

  • Loading your dataset and giving it a name (called reference here).

  • Performing data enrichment by specifying your dataset reference, the environmental variable you need, etc.

  • Exporting the downloaded data to the format you require.

1. Load your occurrence data

First, import all functions from geoenrich.

[ ]:
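import os
import geoenrich  # explicit import: geoenrich.__file__ is used below to locate the bundled example data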
from geoenrich.dataloader import *
from geoenrich.enrichment import *
from geoenrich.exports import *

Three input formats are accepted. You can follow the instructions that match your format:

A. DarwinCore archive

A DarwinCore archive is bundled into the package for user testing (GBIF Occurrence Download 10.15468/dl.megb8n). If you don’t have a dataset and don’t want to register with GBIF yet, you can use this one.

[ ]:
example_path = os.path.split(geoenrich.__file__)[0] + '/data/AcDigitifera.zip'
geodf = open_dwca(path = example_path)

B. Georeferenced points (csv format)

Fill in the path to your csv and the compulsory column names.

[ ]:
geodf = import_occurrences_csv( path = '', id_col = '', date_col = '',
                                lat_col = '', lon_col = '')
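
As an illustration, a minimal occurrence file could be built as shown here. The file name occurrences.csv, the column names (occ_id, date, lat, lon) and the coordinate values are hypothetical, made up for this sketch; only the import_occurrences_csv parameter names come from the cell above.

```python
import csv

# Hypothetical occurrence records: identifier, date, latitude, longitude.
rows = [
    {"occ_id": "1", "date": "2019-05-14", "lat": "26.5", "lon": "127.9"},
    {"occ_id": "2", "date": "2020-08-02", "lat": "24.3", "lon": "124.1"},
]

csv_path = "occurrences.csv"  # hypothetical file name
with open(csv_path, "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["occ_id", "date", "lat", "lon"])
    writer.writeheader()
    writer.writerows(rows)

# The column names chosen above would then be passed to the loader:
# geodf = import_occurrences_csv(path=csv_path, id_col="occ_id", date_col="date",
#                                lat_col="lat", lon_col="lon")
```

The loader call is left as a comment so that the sketch runs without geoenrich installed.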

C. Area bounds (csv format)

See documentation for information about input file format.

[ ]:
example_path = os.path.split(geoenrich.__file__)[0]  + '/data/areas.csv'
geodf = load_areas_file(example_path)

For all formats: Choose a dataset reference and associate it with your dataset

[ ]:
dataset_ref = 'ac_digitifera'
create_enrichment_file(geodf, dataset_ref)

2. Enrich

Define enrichment scope

  • var_id: Pick a variable id from the catalog

  • geo_buff: the buffer around the occurrences (in kilometers). Choose 0 to obtain nearest values.

  • time_buff: the temporal buffer, in days relative to the occurrence date. In this case we download data from 7 days before the occurrence date up to the occurrence date. time_buff is only used for variables that have a time dimension.
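
To get a feel for the size of a buffer expressed in kilometers, the sketch below converts it to approximate degrees of latitude and longitude. It assumes a spherical Earth with about 111.32 km per degree of latitude. The helper km_to_degrees and the example latitude are hypothetical; this is not part of geoenrich, whose internal calculation may differ.

```python
import math

KM_PER_DEGREE_LAT = 111.32  # approximate value, spherical-Earth assumption


def km_to_degrees(buffer_km, latitude):
    """Return (lat_degrees, lon_degrees) roughly equivalent to buffer_km."""
    dlat = buffer_km / KM_PER_DEGREE_LAT
    # A degree of longitude shrinks with the cosine of the latitude.
    dlon = buffer_km / (KM_PER_DEGREE_LAT * math.cos(math.radians(latitude)))
    return dlat, dlon


# Example: a 115 km buffer around a point at 26.5 degrees north (hypothetical location)
dlat, dlon = km_to_degrees(115, 26.5)
print(round(dlat, 2), round(dlon, 2))
```

Away from the equator the longitude span is always the larger of the two, which is why the buffer is best reasoned about in kilometers rather than degrees.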

[ ]:
var_id = 'bathymetry'
dataset_ref = 'ac_digitifera'
geo_buff = 115
time_buff = [-7, 0]
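
To make the meaning of time_buff concrete, the sketch below computes the date window implied by [-7, 0] for one occurrence. It assumes the two values are day offsets relative to the occurrence date, as described above. The helper time_window and the example date are hypothetical, and this is not geoenrich's own code.

```python
from datetime import date, timedelta


def time_window(occurrence_date, time_buff):
    """Return (start, end) dates for day offsets relative to the occurrence date."""
    start = occurrence_date + timedelta(days=time_buff[0])
    end = occurrence_date + timedelta(days=time_buff[1])
    return start, end


# Hypothetical occurrence observed on 2 August 2020
start, end = time_window(date(2020, 8, 2), [-7, 0])
print(start, end)  # 2020-07-26 2020-08-02
```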

Start enrichment

The slice argument allows you to only run the enrichment on a subset of the points.

[ ]:
enrich(dataset_ref, var_id, geo_buff, time_buff, slice = (0, 100))
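
For a large dataset, the slice argument can be used to work through the points in batches. The sketch below only generates the slice tuples. It assumes slice takes a (start, end) pair of row positions with an exclusive end, which should be checked against the geoenrich documentation. The helper batch_slices and the dataset size are hypothetical.

```python
def batch_slices(n_points, batch_size):
    """Yield (start, end) pairs covering range(n_points); end is assumed exclusive."""
    for start in range(0, n_points, batch_size):
        yield (start, min(start + batch_size, n_points))


slices = list(batch_slices(250, 100))  # 250 is a hypothetical dataset size
print(slices)  # [(0, 100), (100, 200), (200, 250)]

# Each pair could then be passed to enrich, for example:
# for s in slices:
#     enrich(dataset_ref, var_id, geo_buff, time_buff, slice=s)
```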

Check the enrichment progress

[ ]:
enrichment_status(dataset_ref)

3. Retrieve and export downloaded data

Select the dataset reference and the variable that you want to export. Then you can choose one of four options:

[ ]:
dataset_ref = 'ac_digitifera'
var_id = 'bathymetry'

A. Export variable statistics for the whole dataset

[ ]:
produce_stats(dataset_ref, var_id, out_path = './')

B. Export data as a raster layer for a given occurrence

[ ]:
ids = read_ids(dataset_ref)
occ_id = ids[0] # first occurrence of the dataset

export_raster(dataset_ref, occ_id, var_id, path = './')

C. Export data as a png file for a given occurrence

[ ]:
ids = read_ids(dataset_ref)
occ_id = ids[0]

export_png(dataset_ref, occ_id, var_id, path = './')

D. Retrieve the raw data (and plot it)

[ ]:
output = retrieve_data(dataset_ref, occ_id, var_id, shape = 'buffer')

data = output['values']
unit = output['unit']
coords = output['coords']

[ ]:
import pandas as pd
from shapely import wkt
from matplotlib import pyplot as plt
%matplotlib notebook

# Get latitude and longitude values for the requested data
lat_dim = [c[0] for c in coords].index('latitude')
lon_dim = [c[0] for c in coords].index('longitude')
lats = coords[lat_dim][1]
longs = coords[lon_dim][1]

# Get coordinates for the occurrence point
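# biodiv_path is not defined in this tutorial; it is assumed to be provided by the geoenrich star imports above
filepath = biodiv_path + dataset_ref + '.csv'
# infer_datetime_format is omitted: it is deprecated in pandas 2.0 and no longer needed
df = pd.read_csv(filepath, parse_dates = ['eventDate'], index_col = 0)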
point = wkt.loads(df.loc[occ_id, 'geometry'])

# Plot
extent = [longs[0] , longs[-1], lats[0] , lats[-1]]
plt.imshow(data, extent = extent, origin = 'lower')
plt.title(var_id + ' (' + unit + ')')
plt.colorbar()
plt.scatter([point.x], [point.y], c='black', marker='x')

# NB: If your data has time or depth dimensions, you will have to pick a slice of the data array to be able to plot it

Appendix: Using occurrence data from GBIF

You may want to use occurrences from GBIF. GeoEnrich provides a few functions to easily download occurrences for any taxon, relying on the pygbif package.

Get GBIF id for the taxon of interest

[ ]:
tax_key = get_taxon_key('Acropora digitifera')

Request an archive with all occurrences of this taxon

[ ]:
request_id = request_from_gbif(tax_key)

Download request

For large requests, some waiting time is needed for the archive to be ready.

[ ]:
download_requested(request_key = request_id)

You can then load the data using open_dwca and the taxon_key parameter.

[ ]:
geodf = open_dwca(taxon_key = tax_key)

You can then create an enrichment file just like with any other dataset (see the first section of the tutorial).