Tutorial for Python users
In this tutorial we will enrich Acropora digitifera observations with bathymetry information. Please make sure you have already followed the installation instructions.
Here is a summary of the main steps:
Loading your dataset and giving it a name (called reference here).
Performing data enrichment by specifying your dataset reference, the environmental variable you need, etc.
Exporting the downloaded data to the format you require.
1. Load your occurrence data
First, import all functions from geoenrich:
[ ]:
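import os
import geoenrich  # binds the package name so geoenrich.__file__ can be used to locate the bundled example data
from geoenrich.dataloader import *
from geoenrich.enrichment import *
from geoenrich.exports import *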
Three input formats are accepted. You can follow the instructions that match your format:
A. DarwinCore archive
A DarwinCore archive is bundled with the package for testing (GBIF Occurrence Download 10.15468/dl.megb8n). If you don’t have a dataset and don’t want to register with GBIF yet, you can use this one.
[ ]:
example_path = os.path.split(geoenrich.__file__)[0] + '/data/AcDigitifera.zip'
geodf = open_dwca(path = example_path)
B. Georeferenced points (csv format)
Fill in the path to your csv and the compulsory column names.
[ ]:
geodf = import_occurrences_csv( path = '', id_col = '', date_col = '',
lat_col = '', lon_col = '')
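To make the expected input more concrete, here is a minimal sketch that writes a small occurrence file with the four compulsory columns, using only the standard library. The file name, the column names (`id`, `date`, `lat`, `lon`) and the two records are invented for this example; any column names should work as long as you pass them to import_occurrences_csv.

```python
import csv
import os
import tempfile

# Hypothetical records: identifier, date, latitude, longitude (illustrative values only)
rows = [
    {"id": 1, "date": "2019-06-01", "lat": 26.5, "lon": 127.9},
    {"id": 2, "date": "2020-02-15", "lat": -18.3, "lon": 147.7},
]

# Write the file to a temporary folder so the example does not touch your project
csv_path = os.path.join(tempfile.mkdtemp(), "occurrences_example.csv")
with open(csv_path, "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["id", "date", "lat", "lon"])
    writer.writeheader()
    writer.writerows(rows)

# Read the header back: these are the names to pass as id_col, date_col, lat_col and lon_col
with open(csv_path, newline="") as f:
    header = next(csv.reader(f))
print(header)
```

With such a file, the call above would become import_occurrences_csv(path = csv_path, id_col = 'id', date_col = 'date', lat_col = 'lat', lon_col = 'lon'). The date format used here (ISO 8601) is an assumption; check the documentation if your dates are written differently.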
C. Area bounds (csv format)
See the documentation for details on the input file format.
[ ]:
example_path = os.path.split(geoenrich.__file__)[0] + '/data/areas.csv'
geodf = load_areas_file(example_path)
For all formats: choose a dataset reference and associate it with your dataset.
[ ]:
dataset_ref = 'ac_digitifera'
create_enrichment_file(geodf, dataset_ref)
2. Enrich
Define enrichment scope
var_id: Pick a variable id from the catalog
geo_buff: the buffer around the occurrences (in kilometers). Choose 0 to obtain the nearest values.
time_buff: the temporal buffer. In this case we download data from 7 days before the occurrence date up to the occurrence date. time_buff is only used for variables that have a time dimension.
[ ]:
var_id = 'bathymetry'
dataset_ref = 'ac_digitifera'
geo_buff = 115
time_buff = [-7, 0]
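To get a feel for what these buffers mean, the sketch below computes the time window and an approximate bounding box for a single occurrence. It is only an illustration and not necessarily how geoenrich computes them internally: the helper names `time_window` and `bounding_box`, the example date and coordinates, and the conversion of roughly 111.32 km per degree of latitude are all assumptions made here.

```python
import math
from datetime import date, timedelta

def time_window(event_date, time_buff):
    """Return (start, end) dates for a buffer given in days relative to the event date."""
    start = event_date + timedelta(days=time_buff[0])
    end = event_date + timedelta(days=time_buff[1])
    return start, end

def bounding_box(lat, lon, geo_buff_km):
    """Approximate (lat_min, lat_max, lon_min, lon_max) for a buffer in kilometers.

    Assumes about 111.32 km per degree of latitude and scales the longitude
    extent by cos(latitude). Not meant for points close to the poles.
    """
    km_per_degree = 111.32
    dlat = geo_buff_km / km_per_degree
    dlon = geo_buff_km / (km_per_degree * math.cos(math.radians(lat)))
    return (lat - dlat, lat + dlat, lon - dlon, lon + dlon)

# Hypothetical occurrence observed on 2020-03-10 at 26.5 N, 127.9 E
start, end = time_window(date(2020, 3, 10), [-7, 0])
print(start, end)  # 2020-03-03 2020-03-10

box = bounding_box(26.5, 127.9, 115)
print(box)
```

With geo_buff = 115, the box spans a little over two degrees of latitude, which gives an idea of how much data is requested around each point.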
Start enrichment
The slice argument allows you to only run the enrichment on a subset of the points.
[ ]:
enrich(dataset_ref, var_id, geo_buff, time_buff, slice = (0, 100))
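If your dataset is large, you may prefer to enrich it batch by batch. The helper below is a small sketch that generates consecutive slice bounds; `batch_slices` is a name made up for this example, and it assumes that the end bound of slice is exclusive, as in Python ranges. Check the documentation of enrich if you need to be sure.

```python
def batch_slices(n_points, batch_size):
    """Return (start, end) tuples covering n_points in consecutive batches."""
    return [(start, min(start + batch_size, n_points))
            for start in range(0, n_points, batch_size)]

# Example with a hypothetical dataset of 250 points, processed 100 at a time
slices = batch_slices(250, 100)
print(slices)  # [(0, 100), (100, 200), (200, 250)]
```

Each tuple can then be passed in turn, for instance with a loop such as: for s in slices: enrich(dataset_ref, var_id, geo_buff, time_buff, slice = s).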
Check the enrichment progress
[ ]:
enrichment_status(dataset_ref)
3. Retrieve and export downloaded data
Select the dataset reference and the variable that you want to export. Then you can choose one of four options:
[ ]:
dataset_ref = 'ac_digitifera'
var_id = 'bathymetry'
A. Export variable statistics for the whole dataset
[ ]:
produce_stats(dataset_ref, var_id, out_path = './')
B. Export data as a raster layer for a given occurrence
[ ]:
ids = read_ids(dataset_ref)
occ_id = ids[0] # first occurrence of the dataset
export_raster(dataset_ref, occ_id, var_id, path = './')
C. Export data as a png file for a given occurrence
[ ]:
ids = read_ids(dataset_ref)
occ_id = ids[0]
export_png(dataset_ref, occ_id, var_id, path = './')
D. Retrieve the raw data (and plot it)
[ ]:
output = retrieve_data(dataset_ref, occ_id, var_id, shape = 'buffer')
data = output['values']
unit = output['unit']
coords = output['coords']
[ ]:
import pandas as pd
from shapely import wkt
from matplotlib import pyplot as plt
%matplotlib notebook
# Get latitude and longitude values for the requested data
lat_dim = [c[0] for c in coords].index('latitude')
lon_dim = [c[0] for c in coords].index('longitude')
lats = coords[lat_dim][1]
longs = coords[lon_dim][1]
# Get coordinates for the occurrence point
filepath = biodiv_path + dataset_ref + '.csv'
df = pd.read_csv(filepath, parse_dates = ['eventDate'], index_col = 0)  # infer_datetime_format is deprecated in recent pandas versions
point = wkt.loads(df.loc[occ_id, 'geometry'])
# Plot
extent = [longs[0] , longs[-1], lats[0] , lats[-1]]
plt.imshow(data, extent = extent, origin = 'lower')
plt.title(var_id + ' (' + unit + ')')
plt.colorbar()
plt.scatter([point.x], [point.y], c='black', marker='x')
# NB: If your data has time or depth dimensions, you will have to pick a slice of the data array to be able to plot it
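For variables with extra dimensions, one possible way to pick a two-dimensional slice is sketched below with NumPy. The arrays are synthetic stand-ins for output['values'] and output['coords']; the dimension names 'time' and 'depth', their order, and the helper `pick_2d_slice` are assumptions made for this example, so check the names actually returned for your variable.

```python
import numpy as np

# Synthetic stand-ins for output['coords'] and output['values'] (assumed layout)
coords = [
    ("time", ["2020-03-09", "2020-03-10"]),
    ("depth", [0.0, 10.0, 20.0]),
    ("latitude", [26.0, 26.5, 27.0, 27.5]),
    ("longitude", [127.0, 127.5, 128.0, 128.5, 129.0]),
]
data = np.arange(2 * 3 * 4 * 5, dtype=float).reshape(2, 3, 4, 5)

def pick_2d_slice(data, coords, indices):
    """Fix one index on each extra dimension, keeping latitude and longitude."""
    names = [c[0] for c in coords]
    # Go through the axes from last to first so that dropping one does not shift the others
    for axis in sorted((names.index(n) for n in indices), reverse=True):
        data = np.take(data, indices[names[axis]], axis=axis)
    return data

# Keep the last time step (index 1) and the shallowest depth level (index 0)
layer = pick_2d_slice(data, coords, {"time": 1, "depth": 0})
print(layer.shape)  # (4, 5)
```

The resulting layer has one row per latitude and one column per longitude, so it can be passed to plt.imshow in place of data in the plotting code above.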
Appendix: Using occurrence data from GBIF
You may want to use occurrences from GBIF. GeoEnrich provides a few functions to easily download occurrences for any taxon, relying on the pygbif package.
Get GBIF id for the taxon of interest
[ ]:
tax_key = get_taxon_key('Acropora digitifera')
Request an archive with all occurrences of this taxon
[ ]:
request_id = request_from_gbif(tax_key)
Download request
For large requests, some waiting time is needed for the archive to be ready.
[ ]:
download_requested(request_key = request_id)
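If you want to automate the waiting, a generic retry loop is one option. The sketch below is not part of geoenrich: `retry_until_ready` is a hypothetical helper, and it assumes that the download call raises an exception while the archive is not ready, which you should verify before relying on it. A fake download function is used here so that the example runs on its own.

```python
import time

def retry_until_ready(action, attempts=5, wait_seconds=60, sleep=time.sleep):
    """Call action() until it succeeds, waiting between failed attempts."""
    last_error = None
    for attempt in range(attempts):
        try:
            return action()
        except Exception as error:  # assumption: a not-ready archive raises an exception
            last_error = error
            if attempt < attempts - 1:
                sleep(wait_seconds)
    raise RuntimeError("still not ready after %d attempts" % attempts) from last_error

# Stand-in for the real download: fails twice, then succeeds
calls = {"count": 0}

def fake_download():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("archive not ready yet")
    return "downloaded"

# The sleep function is replaced by a no-op so the example finishes immediately
result = retry_until_ready(fake_download, attempts=5, wait_seconds=0, sleep=lambda s: None)
print(result, calls["count"])  # downloaded 3
```

In practice you could pass something like lambda: download_requested(request_key = request_id) as the action, with a waiting time suited to the size of your request.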
You can then load the data using open_dwca and the taxon_key parameter.
[ ]:
geodf = open_dwca(taxon_key = tax_key)
You can then create an enrichment file just as with any other dataset (see the first section of this tutorial).