The Topo Processor is a collection of small components that can be combined to create a pipeline. It can be run on a local workstation or using AWS Batch. These components include transforming data into cloud-optimised formats, such as COG, and creating STAC metadata.
Follow the Poetry installation guide.
Follow the Docker Engine installation guide (Ubuntu).
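Once both are installed, a quick sanity check that the tools are available on the PATH:

```bash
# Confirm Poetry and Docker are installed and reachable
poetry --version
docker --version
```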
```bash
poetry shell
poetry install
```
The global user configuration is defined by environment variables; example values for these variables can be found in the `.env` file.
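As a sketch of one way to load that configuration into the current shell (this assumes the `.env` entries are plain `KEY=value` lines):

```bash
# Export every variable defined in .env into the current shell
set -a
source .env
set +a
```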
```bash
yarn
yarn build
```
NOTE: AWS deployment is done automatically through GitHub Actions.
To deploy the Batch environment via CDK from your local machine (the deployment targets the AWS account you are currently logged into):

```bash
yarn build
npx cdk deploy
```
To allow the system to perform cross-account AWS requests, you'll need to configure AWS roles inside an AWS SSM parameter. This configuration parameter can be referenced via `$LINZ_SSM_BUCKET_CONFIG_NAME`.
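As an illustrative sketch only (the expected JSON shape of the parameter is an assumption, not documented here), the parameter could be created with the AWS CLI:

```bash
# Hypothetical role configuration; adjust the JSON to the shape the system expects
aws ssm put-parameter \
  --name "$LINZ_SSM_BUCKET_CONFIG_NAME" \
  --type String \
  --value '[{"bucket": "my-bucket", "roleArn": "arn:aws:iam::123456789012:role/read-role"}]'
```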
NOTE: Only the `upload` command is implemented to run on AWS Batch. Currently, job submission is restricted to one job per survey.

NOTE: You may need to set the `AWS_REGION` environment variable to your region.
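For example (`ap-southeast-2` is the region that appears elsewhere in this document; substitute your own):

```bash
export AWS_REGION=ap-southeast-2
```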
```bash
# Passing survey IDs as arguments
node ./build/infra/src/submit.js surveyId1 surveyId3 [...]

# Passing S3 folders as arguments
node ./build/infra/src/submit.js s3://my-bucket/backup2/surveyId1/ s3://my-bucket/backup4/surveyId3/ [...]
```
NOTE: The `upload` command is restricted to one run per survey and to the `Historical Imagery` layer only. To run multiple surveys, please refer to AWS Batch as described above.
| Argument | Description |
| --- | --- |
| `-s` or `--source` | The source of the data to import. Can be a survey ID or a path (local or `s3`) to the survey. |
| `-d` or `--datatype` | The datatype of the upload. Only `imagery.historic` is available at the moment. |
| `-t` or `--target` | The target local directory path or `s3` path of the upload. |
| `-cid` or `--correlationid` | OPTIONAL. The correlation ID of the batch job. AWS Batch only. |
| `-m` or `--metadata` | OPTIONAL. The metadata file (local or `s3`) path. |
| `-f` or `--footprint` | TESTING PURPOSES ONLY. The footprint metadata file (local or `s3`) path. |
| `--force` | Flag to force the upload even if some data is invalid (some items might not be uploaded). |
| `-v` or `--verbose` | Flag to display trace logs. |
The user has to specify the survey ID or the path to the data with `--source`; it will be validated against the latest version of the metadata. A metadata file path can also be specified with `--metadata` if the LDS cache version is not wanted. The `--datatype` has to be `imagery.historic`. The user also has to specify a target folder for the output.
```bash
# Run in a virtual environment (poetry shell):
./upload --source source_path --datatype data.type --target target_folder

# For help:
./upload --help

# To see all logs in a tidy format, use pretty-json-log:
./upload --source source_path --datatype data.type --target target_folder --verbose | pjl
```
The following source and target combinations can be used:
| Source | Target |
| --- | --- |
| `s3` | `s3` |
| `s3` | local |
| local | local |
| local | `s3` |
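For instance, an `s3`-to-local run might look like the following sketch (the bucket path reuses the illustrative example from the AWS Batch section above; the output folder is a placeholder):

```bash
./upload --source s3://my-bucket/backup2/surveyId1/ --datatype imagery.historic --target ./output/
```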
This command adds a survey to the Geostore by using the Geostore API.

Prerequisites: the survey has to be processed by the `upload` command first. The output files of the `upload` command are what will be exported to the Geostore.
| Argument | Description |
| --- | --- |
| `-s`, `--source` TEXT | The `s3` path to the survey to export [required] |
| `-r`, `--role` TEXT | The ARN of the role used to access the source bucket [required] |
| `-c`, `--commit` | Use this flag to commit the creation of the dataset |
| `-v`, `--verbose` | Use verbose to display debug logs |
```bash
poetry run add -s "s3://bucket/survey-path/" -r "arn:aws:iam::123456789:role/read-role"
```
This command follows the current upload status to the Geostore for a particular dataset version. You may have to run it several times as the status gets updated.
| Argument | Description |
| --- | --- |
| `-a`, `--execution-arn` TEXT | The execution ARN received from the Geostore after invoking an upload [required] |
| `-v`, `--verbose` | Use verbose to display debug logs |
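A sketch of the invocation with a placeholder execution ARN (the real ARN is returned in the logs of a successful `add`, as noted below):

```bash
poetry run status --execution-arn "arn:aws:states:ap-southeast-2:123456789012:execution:ABCD"
```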
NOTE: The command to run is given in the logs after successfully calling the `add` command:

```
"info": "To check the export status, run the following command 'poetry run status -arn arn:aws:states:ap-southeast-2:632223577832:execution:ABCD'"
```
This command gives you information about one or all of the datasets created on the Geostore.
| Argument | Description |
| --- | --- |
| `-t`, `--title` TEXT | The Geostore title of the survey to filter by, e.g. `historical-aerial-imagery-survey-2660` |
| `-v`, `--verbose` | Use verbose to display debug logs |
```bash
poetry run list [-t historical-aerial-imagery-survey-2660]
```
Delete a dataset from the Geostore. This is only possible if the dataset does not contain any versions. To delete a dataset which contains a version, contact the Geostore support team.
| Argument | Description |
| --- | --- |
| `-d`, `--dataset-id` TEXT | The ID of the dataset to delete [required] |
| `-c`, `--commit` | Use this flag to commit the deletion of the dataset |
| `-v`, `--verbose` | Use verbose to display debug logs |
```bash
poetry run delete -d ID123ABC [--commit]
```
NOTE: This command is currently only implemented for the `Historical Imagery` layer. Other layers will come later.

This command runs a validation against a layer. It gets the layer's latest version of the metadata and generates the corresponding STAC objects on the fly. It then runs a JSON schema validation (using `jsonschema-rs`) for the `Items` and `Collections`. It outputs the errors and their occurrences, grouped by JSON schema:
"errors": {"https://stac.linz.govt.nz/v0.0.11/aerial-photo/schema.json": {"'aerial-photo:run' is a required property": 4, "'aerial-photo:sequence_number' is a required property": 10}
To validate a version other than the latest one, specify the metadata CSV file to validate with the `--metadata` argument.

The following commands have to be run in a virtual environment (`poetry shell`):
```bash
# Run default:
poetry run validate

# Run against a specific version (can be an s3 or local file):
poetry run validate --metadata s3://bucket/layer_id/metadata_file.csv

# Run against the `Items` only:
poetry run validate --item

# Run against the `Collections` only:
poetry run validate --collection

# For help:
poetry run validate --help

# To see all logs in a tidy format, use pretty-json-log:
poetry run validate --verbose | pjl

# To record the output in an external file:
poetry run validate | tee output.file
```
CI/CD is used to deploy into AWS. To trigger a deployment, create a new `release:` commit and merge it to `master`.

A helpful utility script, `./scripts/version.bump.sh`, automates this process:

```bash
./scripts/version.bump.sh

# Push branch release/v:versionNumber
git push

# Create the pull request
gh pr create

# Merge to master
```