- 1. Introduction
- 2. Installation
- 3. Login to Copernicus
- 4. Running
- 5. Resuming the script at a specific point
This is a wrapper to help you download data from Copernicus servers via a configuration file, thereby eliminating the need to deal with programming. It has been developed in Python 3.9 and makes calls to the Copernicus Marine Toolbox's Python API, recently released as of December 2023.
In a nutshell, this script will look for a CSV file whose rows contain latitude, longitude, depth and specific dates. It will also look for a configuration file - setup.toml - to find out which products and variables you are after. It will then query the Copernicus service for those details and generate a CSV output of the exact same dimensions as your input, merging the fetched information into the input file.
The main advantages over the toolbox's console client are as follows:
- The input data is processed by single unique dates, meaning that rows sharing the same date are processed together by calculating the widest area that embeds all given coordinates for that day. This reduces the number of calls and avoids the generation of massive files, which would occur if the whole set were requested at once.
- Only a single CSV file is generated.
- Although the data is fetched by area and date, the script will find the point in the downloaded dataset closest to the coordinates given in each row of the input file, meaning that the final CSV file has the same number of rows as the input file.
- All individual downloaded files are kept intact - in .nc format - so that they can be post-processed in whichever way you consider appropriate, should you need to do so in the future.
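The per-date grouping described in the first point can be sketched roughly as follows. This is a minimal illustration, not the wrapper's actual code: the sample rows are made up, and the min/max field names merely mirror the kind of bounding-box parameters the toolbox's subset call expects.

```python
from collections import defaultdict

# Hypothetical input rows: (date, lat, lon) triples such as the wrapper
# reads from the input CSV.
rows = [
    ("01/03/2012", 42.5, -8.9),
    ("01/03/2012", 43.1, -9.4),
    ("02/03/2012", 42.0, -8.7),
]

def bounding_boxes(rows):
    """Group rows by date and compute the widest area (min/max latitude
    and longitude) that embeds every coordinate requested for that day."""
    by_date = defaultdict(list)
    for date, lat, lon in rows:
        by_date[date].append((lat, lon))
    boxes = {}
    for date, points in by_date.items():
        lats = [p[0] for p in points]
        lons = [p[1] for p in points]
        boxes[date] = {
            "minimum_latitude": min(lats),
            "maximum_latitude": max(lats),
            "minimum_longitude": min(lons),
            "maximum_longitude": max(lons),
        }
    return boxes

boxes = bounding_boxes(rows)
# One request per unique date instead of one request per row
print(boxes["01/03/2012"])
```

With three input rows over two unique dates, only two requests would be issued.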
This wrapper uses solely the subset functionality of the Copernicus Marine Toolbox's Python API.
First, install Python >=3.9 and <3.12, as required by the Copernicus Marine Toolbox, making sure that pip is installed too. To do so, download the Python version of your choice from https://www.python.org/downloads/ and then follow the instructions at https://docs.python.org/3/using/index.html. Details for Windows, Mac and *nix users are provided in the appropriate sections.
Second, download the source code from GitHub, either by downloading the zip directly from https://github.com/d2gex/copernicus-subset-wrapper.git as shown in the figure below ...
... or by git-cloning it to your preferred location, ensuring that the destination folder is empty:
cd <<your_source_folder>>
git clone https://github.com/d2gex/copernicus-subset-wrapper.git .
Third, install the project dependencies. Rather than installing them system-wide, it is highly recommended to create a virtual environment as described in Python's Virtual Environments and Packages documentation. A quick summary is shown below:
python3 -m venv <<your_virtualenv_folder>>
source /path/to/your_virtualenv_folder/bin/activate # (Linux-way)
\path\to\your_virtualenv_folder\Scripts\activate # (Windows-way)
Either way, you can then install the project requirements with:
pip install -r /path/to/your_source_folder/requirements.txt
The file requirements.txt contains all libraries that are necessary for this wrapper to run.
If you have not yet registered with Copernicus, you first need to do so here.
Then you need to run the login function from the Copernicus API once so that your credentials are generated. Subsequent calls to the wrapper will know where your credentials are stored and will pick them up as needed. To log in, run the following on the console:
copernicusmarine login
You will be asked for the username and password you used earlier in the registration process. Upon providing them, a message will confirm that the credentials have been generated and show where they are stored. You are now ready to use the wrapper without ever worrying about credentials again.
The setup.toml file is the configuration file used by the wrapper. It contains information about the products and variables you are trying to download. It is placed within <<your_source_folder>> and its options, explained within the file itself, should be self-explanatory.
input_filename = "api_parameters.csv" # name of the file holding the input parameters
output_filename = "result.nc" # suffix added to the name of each individual file fetched per input row
dataset_id = "cmems_mod_glo_phy_my_0.083deg_P1M-m" # data set identifier
variables = ["thetao", "zos"] # variables wanting to be fetched
years = [2012, 2020] # date interval of interest. One single year can be defined as [2012]
# distance method used to calculate the nearest point.
# See alternatives on https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.cdist.html
distance = "euclidean"
# [days, hours, minutes, seconds] time added to each start_time - per row - in days, hours, minutes and seconds
time_offset = [0, 23, 59, 59]
start_mode = 0 # 0 start afresh, 1 resume from given years interval and 2 read only from disk
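The options above can be read with standard TOML tooling. The sketch below is illustrative rather than the wrapper's own loading code; it assumes Python 3.11's stdlib tomllib (on 3.9/3.10 the third-party tomli package offers the same API), and the sanity checks shown are merely one plausible way to validate the file before hitting the servers.

```python
try:
    import tomllib  # stdlib on Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # same API for Python 3.9/3.10

# A fragment mirroring the setup.toml options shown above
config_text = """
input_filename = "api_parameters.csv"
dataset_id = "cmems_mod_glo_phy_my_0.083deg_P1M-m"
variables = ["thetao", "zos"]
years = [2012, 2020]
distance = "euclidean"
time_offset = [0, 23, 59, 59]
start_mode = 0
"""

config = tomllib.loads(config_text)

# Basic sanity checks before any download is attempted
assert config["start_mode"] in (0, 1, 2)
start_year, *rest = config["years"]
end_year = rest[0] if rest else start_year  # a single year may be given as [2012]
print(config["dataset_id"], start_year, end_year)
```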
The wrapper will read a CSV file within the data folder in '<<your_source_folder>>', whose name is given by the input_filename variable in your configuration file. An example is shown below:
In a nutshell, the columns lat, lon, time and depth must be named exactly so, and time must be in %d/%m/%Y %H:%M format. The coordinate system is WGS 84 (EPSG:4326). There must also be a column in the spreadsheet identifying each row uniquely, although its name is up to you. In the example above it is called ID_Gil.
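A valid input file along those lines can be checked with a few lines of standard Python. The rows below are hypothetical, and ID_Gil stands in for whatever unique-identifier column you choose:

```python
import csv
import io
from datetime import datetime

# A hypothetical input file in the expected shape
csv_text = """ID_Gil,lat,lon,time,depth
1,42.35,-8.86,01/03/2012 00:00,5
2,42.40,-8.90,01/03/2012 00:00,10
"""

rows = list(csv.DictReader(io.StringIO(csv_text)))
for row in rows:
    # 'time' must follow the %d/%m/%Y %H:%M format required by the wrapper
    row["time"] = datetime.strptime(row["time"], "%d/%m/%Y %H:%M")
    row["lat"], row["lon"] = float(row["lat"]), float(row["lon"])
    row["depth"] = float(row["depth"])

print(rows[0]["time"])
```

If strptime raises a ValueError here, the time column is not in the format the wrapper expects.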
cd your_source_folder
python -m src.main
After the data has been downloaded, look for the resulting CSV file in '<<your_source_folder>>/data/<<dataset_identifier>>/csv/<<dataset_identifier>>.csv'.
The wrapper will also place each downloaded *.nc file in '<<your_source_folder>>/data/<<dataset_identifier>>/nc/'.
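The nearest-point lookup mentioned earlier, mapping each input coordinate to the closest grid point in a downloaded dataset, can be illustrated with plain Euclidean distance (the default distance = "euclidean" option). The wrapper itself relies on scipy's cdist with the configured metric; the grid points below are made up and the stdlib math.dist call is used only to keep the sketch self-contained.

```python
from math import dist

# Made-up grid points as they might appear in a downloaded .nc subset
grid = [(42.333, -8.917), (42.417, -8.917), (42.333, -8.833), (42.417, -8.833)]

def nearest_point(lat, lon, grid):
    """Return the grid point closest to (lat, lon) under Euclidean
    distance, mirroring the distance = "euclidean" option in setup.toml."""
    return min(grid, key=lambda p: dist((lat, lon), p))

print(nearest_point(42.35, -8.86, grid))  # → (42.333, -8.833)
```

This is why the output CSV keeps one row per input row: each requested coordinate is resolved to exactly one grid point.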
Given that fetching data from Copernicus servers falls within the big-data domain, dealing with large datasets does not come without trouble. The natural unreliability of the internet connection you may be using, plus the spatiotemporal inconveniences derived from constantly downloading data, may make the script break at some point. In such a case it is possible to resume
at a desired point by both reducing the original yearly interval you were after and setting start_mode = 1 (resume from the given years interval) in the setup.toml file.
Beware that all files associated with the first year of the new, reduced interval will be deleted entirely and downloaded again.
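For instance, if the original run covered years = [2012, 2020] and broke midway through 2015, the reduced interval might look like the fragment below, together with the start_mode resume setting described above. The interval values here are purely illustrative:

```toml
years = [2015, 2020]  # reduced interval: everything before 2015 is already on disk
```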