The Topo Processor is a collection of small components that can be combined to create a pipeline. It can be run on a local workstation or using AWS Batch. These components include transforming data into cloud-optimised formats, such as COG, and creating STAC metadata.
Follow the Poetry installation guide.
Follow the Docker Engine installation guide (Ubuntu).
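Once both are installed, a quick sanity check that the tools are available on the PATH:

```bash
# Confirm Poetry and Docker are installed and reachable
poetry --version
docker --version
```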
```bash
poetry shell
poetry install
```
The global user configuration is defined by environment variables; example values for these variables can be found in the `.env` file.
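As a sketch of one way to load that configuration into the current shell (this assumes the `.env` entries are plain `KEY=value` lines):

```bash
# Export every variable defined in .env into the current shell
set -a
source .env
set +a
```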
```bash
yarn
yarn build
```
NOTE: AWS deployment is done automatically through GitHub Actions.
To deploy the Batch environment via CDK from your local machine (the deployment targets the AWS account you are currently logged into):

```bash
yarn build
npx cdk deploy
```
To allow the system to perform cross-account AWS requests, you'll need to configure AWS roles inside an AWS SSM parameter. This configuration parameter can be referenced via `$LINZ_SSM_BUCKET_CONFIG_NAME`.
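As an illustrative sketch only (the expected JSON shape of the parameter is an assumption, not documented here), the parameter could be created with the AWS CLI:

```bash
# Hypothetical role configuration; adjust the JSON to the shape the system expects
aws ssm put-parameter \
  --name "$LINZ_SSM_BUCKET_CONFIG_NAME" \
  --type String \
  --value '[{"bucket": "my-bucket", "roleArn": "arn:aws:iam::123456789012:role/read-role"}]'
```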
NOTE: Only the `upload` command is implemented to run on AWS Batch. Currently, job submission is restricted to one job per survey.

NOTE: You may need to set the `AWS_REGION` environment variable to your region.
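For example (`ap-southeast-2` is the region that appears elsewhere in this document; substitute your own):

```bash
export AWS_REGION=ap-southeast-2
```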
```bash
# Passing survey IDs as arguments
node ./build/infra/src/submit.js surveyId1 surveyId3 [...]

# Passing S3 folders as arguments
node ./build/infra/src/submit.js s3://my-bucket/backup2/surveyId1/ s3://my-bucket/backup4/surveyId3/ [...]
```
NOTE: The `upload` command is restricted to one run per survey and to the `Historical Imagery` layer only. To run multiple surveys, please refer to AWS Batch as described above.
| Argument | Description |
| --- | --- |
| `-s` or `--source` | The source of the data to import. Can be a survey ID or a path (local or `s3`) to the survey. |
| `-d` or `--datatype` | The datatype of the upload. Only `imagery.historic` is available at the moment. |
| `-t` or `--target` | The target local directory path or `s3` path of the upload. |
| `-cid` or `--correlationid` | OPTIONAL. The correlation ID of the batch job. AWS Batch only. |
| `-m` or `--metadata` | OPTIONAL. The metadata file (local or `s3`) path. |
| `-f` or `--footprint` | TESTING PURPOSES ONLY. The footprint metadata file (local or `s3`) path. |
| `--force` | Flag to force the upload even if some data is invalid (some items might not be uploaded). |
| `-v` or `--verbose` | Flag to display trace logs. |
The user has to specify the survey ID or the path to the data with `--source`; it will be validated against the latest version of the metadata. A metadata file path can also be specified with `--metadata` if the LDS cache version is not wanted. The `--datatype` has to be `imagery.historic`. The user also has to specify a target folder for the output.
```bash
# Run in a virtual environment (poetry shell):
./upload --source source_path --datatype data.type --target target_folder

# For help:
./upload --help

# To see all logs in a tidy format, use pretty-json-log:
./upload --source source_path --datatype data.type --target target_folder --verbose | pjl
```
The following source and target combinations can be used:
| Source | Target |
| --- | --- |
| `s3` | `s3` |
| `s3` | local |
| local | local |
| local | `s3` |
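For instance, an `s3`-to-local run might look like the following sketch (the bucket path reuses the illustrative example from the AWS Batch section above; the output folder is a placeholder):

```bash
./upload --source s3://my-bucket/backup2/surveyId1/ --datatype imagery.historic --target ./output/
```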
This command adds a survey to the Geostore by using the Geostore API.

Prerequisites: the survey has to be processed by the `upload` command first. The output files of the `upload` command are what will be exported to the Geostore.
| Argument | Description |
| --- | --- |
| `-s`, `--source` TEXT | The `s3` path to the survey to export [required] |
| `-r`, `--role` TEXT | The ARN of the role used to access the source bucket [required] |
| `-c`, `--commit` | Use this flag to commit the creation of the dataset |
| `-v`, `--verbose` | Use verbose to display debug logs |
```bash
poetry run add -s "s3://bucket/survey-path/" -r "arn:aws:iam::123456789:role/read-role"
```
This command follows the current upload status to the Geostore for a particular dataset version. You may have to run it several times as the status gets updated.
| Argument | Description |
| --- | --- |
| `-a`, `--execution-arn` TEXT | The execution ARN received from the Geostore after invoking an upload [required] |
| `-v`, `--verbose` | Use verbose to display debug logs |
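A sketch of the invocation with a placeholder execution ARN (the real ARN is returned in the logs of a successful `add`, as noted below):

```bash
poetry run status --execution-arn "arn:aws:states:ap-southeast-2:123456789012:execution:ABCD"
```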
NOTE: The command to run is given in the logs after successfully calling the `add` command:

```
"info": "To check the export status, run the following command 'poetry run status -arn arn:aws:states:ap-southeast-2:632223577832:execution:ABCD'"
```
This command gives you information about one or all of the datasets created on the Geostore.
| Argument | Description |
| --- | --- |
| `-t`, `--title` TEXT | The Geostore title of the survey to filter by, e.g. `historical-aerial-imagery-survey-2660` |
| `-v`, `--verbose` | Use verbose to display debug logs |
```bash
poetry run list [-t historical-aerial-imagery-survey-2660]
```
Delete a dataset from the Geostore. This is only possible if the dataset does not contain any versions. To delete a dataset which contains a version, contact the Geostore support team.
| Argument | Description |
| --- | --- |
| `-d`, `--dataset-id` TEXT | The ID of the dataset to delete [required] |
| `-c`, `--commit` | Use this flag to commit the deletion of the dataset |
| `-v`, `--verbose` | Use verbose to display debug logs |
```bash
poetry run delete -d ID123ABC [--commit]
```
NOTE: This command is currently only implemented for the `Historical Imagery` layer. Other layers will come later.

This command runs a validation against a layer. It gets the layer's latest version of the metadata and generates the corresponding STAC objects on the fly. It then runs a JSON schema validation (using `jsonschema-rs`) for the `Items` and `Collections`. It outputs the errors and their occurrences, grouped by JSON schema:
"errors": {"https://stac.linz.govt.nz/v0.0.11/aerial-photo/schema.json": {"'aerial-photo:run' is a required property": 4, "'aerial-photo:sequence_number' is a required property": 10}
To validate a version other than the latest one, specify the metadata CSV file to validate with the `--metadata` argument.

The following commands have to be run in a virtual environment (`poetry shell`):
```bash
# Run default:
poetry run validate

# Run against a specific version (can be an s3 or local file):
poetry run validate --metadata s3://bucket/layer_id/metadata_file.csv

# Run against the `Items` only:
poetry run validate --item

# Run against the `Collections` only:
poetry run validate --collection

# For help:
poetry run validate --help

# To see all logs in a tidy format, use pretty-json-log:
poetry run validate --verbose | pjl

# To record the output in an external file:
poetry run validate | tee output.file
```
CI/CD is used to deploy into AWS. To trigger a deployment, create a new `release:` commit and merge it to `master`.

A helpful utility script, `./scripts/version.bump.sh`, automates this process:

```bash
./scripts/version.bump.sh

# Push branch release/v:versionNumber
git push

# Create the pull request
gh pr create

# Merge to master
```