
Commit

Update scripts
mephenor committed Jul 15, 2024
1 parent f987fa9 commit 27cdb46
Showing 12 changed files with 222 additions and 498 deletions.
1 change: 1 addition & 0 deletions .devcontainer/docker-compose.yml
@@ -36,3 +36,4 @@ services:
DCS_CONFIG_YAML: /workspace/services/dcs/dev_config.yaml
EKSS_CONFIG_YAML: /workspace/services/ekss/dev_config.yaml
FIS_CONFIG_YAML: /workspace/services/fis/dev_config.yaml
FINS_CONFIG_YAML: /workspace/services/fins/dev_config.yaml
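
The new entry points the fins service at its dev config. As a rough illustration of how such a variable might be consumed — the actual loading is done by the service's own config tooling, so this is only a sketch:

```python
# Illustrative only: resolve and parse the YAML file referenced by the
# FINS_CONFIG_YAML environment variable defined in docker-compose.yml.
import os
from pathlib import Path

import yaml  # PyYAML


def load_fins_config(env_var: str = "FINS_CONFIG_YAML") -> dict:
    config_path = Path(os.environ[env_var])
    with config_path.open(encoding="utf-8") as config_file:
        return yaml.safe_load(config_file)
```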
1 change: 1 addition & 0 deletions README.md
@@ -14,6 +14,7 @@ This is a monorepo containing all GHGA file backend microservices.

[Download Controller Service](services/dcs/README.md)
[Encryption Key Store Service](services/ekss/README.md)
[File Information Service](services/fins/README.md)
[File Ingest Service](services/fis/README.md)
[Internal File Registry Service](services/ifrs/README.md)
[Interrogation Room Service](services/irs/README.md)
6 changes: 4 additions & 2 deletions scripts/update_config_docs.py
@@ -92,7 +92,8 @@ def get_schema(service: str) -> str:
"""Returns a JSON schema generated from a Config class."""

config = get_dev_config(service)
return config.schema_json(indent=2) # change eventually to .model_json_schema(...)
# change eventually to .model_json_schema(...)
return config.schema_json(indent=2)


def get_example(service: str) -> str:
@@ -144,7 +145,8 @@ def check_docs(service: str):
if example_expected != example_observed:
print_diff(example_expected, example_observed)
raise ValidationError(
f"Example config YAML at '{example_config_yaml}' is not up to date."
f"Example config YAML at '{
example_config_yaml}' is not up to date."
)

schema_expected = get_schema(service)
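
The in-code comment above ("change eventually to .model_json_schema(...)") refers to the pydantic v2 API. A hedged sketch of what that future variant of get_schema could look like, assuming the Config classes become pydantic v2 models — this is not part of the commit:

```python
# Possible pydantic-v2 form of get_schema (not part of this commit):
# model_json_schema() returns a dict, so it is serialized explicitly.
import json


def get_schema_v2(service: str) -> str:
    """Return a JSON schema generated from a Config class, pydantic-v2 style."""
    config = get_dev_config(service)  # existing helper in update_config_docs.py
    return json.dumps(config.model_json_schema(), indent=2)
```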
3 changes: 2 additions & 1 deletion services/fins/.readme_generation/description.md
@@ -1 +1,2 @@
The File Information Service exposes metadata about files available TODO.
The File Information Service serves publicly available metadata about files registered with the Internal File Registry service.
Currently this includes the SHA256 checksum of the unencrypted file content and the size of the unencrypted file in bytes.
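
For context, the two served values can be pictured with a small, self-contained sketch — purely illustrative and unrelated to the service's actual code:

```python
# Illustrative: compute the two values described above for a local file —
# the SHA-256 checksum of the unencrypted content and its size in bytes.
import hashlib
from pathlib import Path


def file_metadata(path: Path) -> dict[str, object]:
    sha256 = hashlib.sha256()
    with path.open("rb") as file:
        for chunk in iter(lambda: file.read(8192), b""):
            sha256.update(chunk)
    # The key names here are hypothetical, not the service's response schema.
    return {"sha256_hash": sha256.hexdigest(), "size": path.stat().st_size}
```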
182 changes: 47 additions & 135 deletions services/fins/README.md
@@ -1,35 +1,33 @@
# File Ingest Service
# File Information Service

A lightweight service to propagate file upload metadata to the GHGA file backend services
Providing public metadata about files registered with the Internal File Registry

## Description

The File Ingest Service provides an endpoint to populate the Encryption Key Store,
Internal File Registry and Download Controller with output metadata from the S3 upload
script at https://github.com/ghga-de/data-steward-scripts/blob/main/src/s3_upload.py.

The File Information Service serves publicly available metadata about files registered with the Internal File Registry service.
Currently this includes the SHA256 checksum of the unencrypted file content and the size of the unencrypted file in bytes.

## Installation

We recommend using the provided Docker container.

A pre-build version is available at [docker hub](https://hub.docker.com/repository/docker/ghga/file-ingest-service):
A pre-built version is available at [Docker Hub](https://hub.docker.com/repository/docker/ghga/file-information-service):
```bash
docker pull ghga/file-ingest-service:3.0.0
docker pull ghga/file-information-service:1.0.0
```

Or you can build the container yourself from the [`./Dockerfile`](./Dockerfile):
```bash
# Execute in the repo's root dir:
docker build -t ghga/file-ingest-service:3.0.0 .
docker build -t ghga/file-information-service:1.0.0 .
```

For production-ready deployment, we recommend using Kubernetes; however,
for simple use cases, you could run the service using Docker
on a single server:
```bash
# The entrypoint is preconfigured:
docker run -p 8080:8080 ghga/file-ingest-service:3.0.0 --help
docker run -p 8080:8080 ghga/file-information-service:1.0.0 --help
```

If you prefer not to use containers, you may install the service from source:
@@ -38,17 +36,27 @@ If you prefer not to use containers, you may install the service from source:
pip install .

# To run the service:
fis --help
fins --help
```

## Configuration

### Parameters

The service requires the following configuration parameters:
- **`files_to_delete_topic`** *(string)*: The name of the topic for events informing about files to be deleted.


Examples:

```json
"file_deletions"
```


- **`log_level`** *(string)*: The minimum log level to capture. Must be one of: `["CRITICAL", "ERROR", "WARNING", "INFO", "DEBUG", "TRACE"]`. Default: `"INFO"`.

- **`service_name`** *(string)*: Default: `"fis"`.
- **`service_name`** *(string)*: Default: `"fins"`.

- **`service_instance_id`** *(string)*: A string that uniquely identifies this instance across all instances of this service. A globally unique Kafka client ID will be created by concatenating the service_name and the service_instance_id.

@@ -83,119 +91,82 @@ The service requires the following configuration parameters:

- **`log_traceback`** *(boolean)*: Whether to include exception tracebacks in log messages. Default: `true`.

- **`vault_url`** *(string)*: URL of the vault instance to connect to.
- **`kafka_servers`** *(array)*: A list of connection strings to connect to Kafka bootstrap servers.

- **Items** *(string)*


Examples:

```json
"http://127.0.0.1.8200"
[
"localhost:9092"
]
```


- **`vault_role_id`**: Vault role ID to access a specific prefix. Default: `null`.

- **Any of**

- *string, format: password*

- *null*


Examples:

```json
"example_role"
```
- **`kafka_security_protocol`** *(string)*: Protocol used to communicate with brokers. Valid values are: PLAINTEXT, SSL. Must be one of: `["PLAINTEXT", "SSL"]`. Default: `"PLAINTEXT"`.

- **`kafka_ssl_cafile`** *(string)*: Certificate Authority file path containing certificates used to sign broker certificates. If a CA is not specified, the default system CA will be used if found by OpenSSL. Default: `""`.

- **`vault_secret_id`**: Vault secret ID to access a specific prefix. Default: `null`.
- **`kafka_ssl_certfile`** *(string)*: Optional filename of client certificate, as well as any CA certificates needed to establish the certificate's authenticity. Default: `""`.

- **Any of**
- **`kafka_ssl_keyfile`** *(string)*: Optional filename containing the client private key. Default: `""`.

- *string, format: password*
- **`kafka_ssl_password`** *(string, format: password)*: Optional password to be used for the client private key. Default: `""`.

- *null*
- **`generate_correlation_id`** *(boolean)*: A flag, which, if False, will result in an error when inbound requests don't possess a correlation ID. If True, requests without a correlation ID will be assigned a newly generated ID in the correlation ID middleware function. Default: `true`.


Examples:

```json
"example_secret"
true
```


- **`vault_verify`**: SSL certificates (CA bundle) used to verify the identity of the vault, or True to use the default CAs, or False for no verification. Default: `true`.

- **Any of**

- *boolean*

- *string*


Examples:

```json
"/etc/ssl/certs/my_bundle.pem"
false
```


- **`vault_path`** *(string)*: Path without leading or trailing slashes where secrets should be stored in the vault.

- **`vault_secrets_mount_point`** *(string)*: Name used to address the secret engine under a custom mount path. Default: `"secret"`.
- **`db_connection_str`** *(string, format: password)*: MongoDB connection string. Might include credentials. For more information see: https://naiveskill.com/mongodb-connection-string/.


Examples:

```json
"secret"
"mongodb://localhost:27017"
```


- **`vault_kube_role`**: Vault role name used for Kubernetes authentication. Default: `null`.

- **Any of**

- *string*

- *null*
- **`db_name`** *(string)*: Name of the database located on the MongoDB server.


Examples:

```json
"file-ingest-role"
"my-database"
```


- **`service_account_token_path`** *(string, format: path)*: Path to service account token used by kube auth adapter. Default: `"/var/run/secrets/kubernetes.io/serviceaccount/token"`.

- **`private_key`** *(string)*: Base64 encoded private key of the keypair whose public key is used to encrypt the payload.

- **`source_bucket_id`** *(string)*: ID of the bucket the object(s) corresponding to the upload metadata have been uploaded to. This should currently point to the staging bucket.

- **`token_hashes`** *(array)*: List of token hashes corresponding to the tokens that can be used to authenticate calls to this service.

- **Items** *(string)*

- **`file_upload_validation_success_topic`** *(string)*: The name of the topic use to publish FileUploadValidationSuccess events.
- **`file_registered_event_topic`** *(string)*: The name of the topic for events informing about new registered files for which the metadata should be made available.


Examples:

```json
"file_upload_validation_success"
"internal_file_registry"
```


- **`file_validations_collection`** *(string)*: The name of the collection used to store FileUploadValidationSuccess events. Default: `"file-validations"`.
- **`file_registered_event_type`** *(string)*: The name of the type used for events informing about new registered files for which the metadata should be made available.


Examples:

```json
"file-validations"
"file_registered"
```
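
Taken together, the Kafka and MongoDB parameters above describe the service's inputs: it consumes events about newly registered files and persists their public metadata. The sketch below wires the example values from this section into generic kafka-python and pymongo clients; it is only an illustration of the data flow, not the service's actual implementation, and the collection and field names are assumptions.

```python
# Illustrative data flow only — not the service's actual code.
# Consumes file-registered events and stores their public metadata in MongoDB,
# using the example configuration values shown above.
import json

from kafka import KafkaConsumer  # kafka-python
from pymongo import MongoClient

consumer = KafkaConsumer(
    "internal_file_registry",  # file_registered_event_topic
    bootstrap_servers=["localhost:9092"],  # kafka_servers
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)
collection = MongoClient("mongodb://localhost:27017")["my-database"]["file_information"]

for event in consumer:
    # In the real service, events would additionally be filtered by the
    # configured file_registered_event_type ("file_registered").
    payload = event.value
    collection.insert_one(
        {
            # Field names are hypothetical placeholders, not the actual schema.
            "file_id": payload.get("file_id"),
            "decrypted_sha256": payload.get("decrypted_sha256"),
            "decrypted_size": payload.get("decrypted_size"),
        }
    )
```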


@@ -291,73 +262,14 @@ The service requires the following configuration parameters:
```


- **`generate_correlation_id`** *(boolean)*: A flag, which, if False, will result in an error when trying to publish an event without a valid correlation ID set for the context. If True, the a newly correlation ID will be generated and used in the event header. Default: `true`.


Examples:

```json
true
```


```json
false
```


- **`kafka_servers`** *(array)*: A list of connection strings to connect to Kafka bootstrap servers.

- **Items** *(string)*


Examples:

```json
[
"localhost:9092"
]
```


- **`kafka_security_protocol`** *(string)*: Protocol used to communicate with brokers. Valid values are: PLAINTEXT, SSL. Must be one of: `["PLAINTEXT", "SSL"]`. Default: `"PLAINTEXT"`.

- **`kafka_ssl_cafile`** *(string)*: Certificate Authority file path containing certificates used to sign broker certificates. If a CA is not specified, the default system CA will be used if found by OpenSSL. Default: `""`.

- **`kafka_ssl_certfile`** *(string)*: Optional filename of client certificate, as well as any CA certificates needed to establish the certificate's authenticity. Default: `""`.

- **`kafka_ssl_keyfile`** *(string)*: Optional filename containing the client private key. Default: `""`.

- **`kafka_ssl_password`** *(string, format: password)*: Optional password to be used for the client private key. Default: `""`.

- **`db_connection_str`** *(string, format: password)*: MongoDB connection string. Might include credentials. For more information see: https://naiveskill.com/mongodb-connection-string/.


Examples:

```json
"mongodb://localhost:27017"
```


- **`db_name`** *(string)*: Name of the database located on the MongoDB server.


Examples:

```json
"my-database"
```



### Usage:

A template YAML for configuring the service can be found at
[`./example-config.yaml`](./example-config.yaml).
Please adapt it, rename it to `.fis.yaml`, and place it into one of the following locations:
- in the current working directory were you are execute the service (on unix: `./.fis.yaml`)
- in your home directory (on unix: `~/.fis.yaml`)
Please adapt it, rename it to `.fins.yaml`, and place it into one of the following locations:
- in the current working directory where you execute the service (on unix: `./.fins.yaml`)
- in your home directory (on unix: `~/.fins.yaml`)

The config yaml will be automatically parsed by the service.
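
The lookup order described above can be pictured as follows — a minimal sketch assuming only the two documented locations are searched:

```python
# Illustrative: check the two documented locations for the .fins.yaml config.
from pathlib import Path


def find_config_yaml(name: str = ".fins.yaml") -> Path | None:
    for candidate in (Path.cwd() / name, Path.home() / name):
        if candidate.is_file():
            return candidate
    return None
```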

@@ -366,8 +278,8 @@ The config yaml will be automatically parsed by the service.
All parameters mentioned in the [`./example-config.yaml`](./example-config.yaml)
could also be set using environment variables or file secrets.

For naming the environment variables, just prefix the parameter name with `fis_`,
e.g. for the `host` set an environment variable named `fis_host`
For naming the environment variables, just prefix the parameter name with `fins_`,
e.g. for the `host` set an environment variable named `fins_host`
(you may use either upper or lower case; however, it is standard to define all env
variables in upper case).
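
The `fins_` prefixing can be pictured with a pydantic-settings model; this is an assumption about how such prefixing is typically wired, not the service's actual configuration class:

```python
# Illustrative: a pydantic-settings model that reads fins_-prefixed environment
# variables, e.g. FINS_HOST (matching is case-insensitive by default).
from pydantic_settings import BaseSettings, SettingsConfigDict


class FinsSettings(BaseSettings):
    model_config = SettingsConfigDict(env_prefix="fins_")

    host: str = "127.0.0.1"
    port: int = 8080


settings = FinsSettings()  # picks up FINS_HOST / FINS_PORT if set
```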
