make upload directory in S3 bucket configurable #254

trz42 · 2024-02-16T14:59:06Z

This is the final PR to sync the bot code used by NESSI and EESSI. Essentially it provides two parts:

- changes to the upload location of the tarball and the corresponding metadata file
- any fixes to bugs that were potentially introduced by recent PRs:
  - includes two small changes/additions
    - 3a2df5a (report download status also in case of success)
    - bbfb21b (obtain target architecture from comment when processing the bot: status command)
  - updates README.md to include new and missing settings

The PR has been extensively tested through multiple PRs with both a bot instance used for development and several production bot instances (NESSI). See

Changes to the upload location

Current practice in EESSI and NESSI

In EESSI, both the tarball and the metadata file are uploaded to the same unique path

BUCKET_NAME/EESSI_VERSION/TARBALL_TYPE/OS_TYPE/CPU_ARCHITECTURE/TIMESTAMP/

In NESSI, we use a slightly modified directory structure. The tarball is placed at

BUCKET_NAME/tarballs/EESSI_VERSION/TARBALL_TYPE/OS_TYPE/CPU_ARCHITECTURE/TIMESTAMP/

The metadata file is placed at

BUCKET_NAME/new/EESSI_VERSION/TARBALL_TYPE/OS_TYPE/CPU_ARCHITECTURE/TIMESTAMP/
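As an illustration only, the two layouts above can be sketched in Python. The function names and the example values (bucket name, version, architecture, timestamp) are hypothetical and not taken from the bot code; only the path components and the `tarballs`/`new` directories come from the description above.

```python
def legacy_path(eessi_version, tarball_type, os_type, cpu_architecture, timestamp):
    """Join the path components shared by both layouts (no bucket name)."""
    return "/".join(
        [eessi_version, tarball_type, os_type, cpu_architecture, str(timestamp)]
    )


def eessi_locations(bucket, path):
    """EESSI layout: tarball and metadata file share one directory."""
    location = f"{bucket}/{path}/"
    return {"tarball": location, "metadata": location}


def nessi_locations(bucket, path):
    """NESSI layout: tarball under 'tarballs/', metadata file under 'new/'."""
    return {
        "tarball": f"{bucket}/tarballs/{path}/",
        "metadata": f"{bucket}/new/{path}/",
    }


# Hypothetical example values, for illustration only.
path = legacy_path("2023.06", "software", "linux", "x86_64/generic", 1708095546)
print(eessi_locations("my-bucket", path))
print(nessi_locations("my-bucket", path))
```

The only difference between the two layouts is the extra top-level directory inserted between the bucket name and the shared path.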

The motivation for splitting the location for the tarball and the metadata file was to improve the efficiency of the ingestion procedure. In EESSI, the ingestion procedure scans the whole S3 bucket for new tarballs and compares the tarballs it finds to Git branches in a staging repository on GitHub. The EESSI procedure also moves the metadata file in the staging repository to different top-level directories corresponding to the status of the ingestion. This procedure frequently exhausted the available GitHub REST API requests (per hour). It was adjusted in NESSI such that most changes to the staging repository are done locally (in a Git repository on "disk") and only strictly necessary updates are pushed to the repository. Also, instead of moving the metadata file in the staging repository, it is moved within the S3 bucket to keep track of the status of an ingestion.

Suggested changes (this PR) on the bot-side

To synchronise the bot code while accommodating different ingestion procedures and needs in different scenarios, the initial idea for this PR was to make only a part of the upload location configurable. In particular, the `tarballs` and `new` path components could be made configurable via the settings `tarball_dir` and `metadata_dir`, respectively. However, we saw an opportunity to make this even more flexible and thus also take a first step towards future efficiency improvements of the ingestion procedure. Issues that may benefit from this PR are

- deploy should only upload a single tarball #192
- Streamlining deploy workflow #225
- upload tarball and metadata file to different directories in S3 bucket #241
- dealing with multiple bots that deploy #242

So, the suggested changes feature the following capabilities:

- the full path to which a tarball or metadata file is uploaded is configurable
- the location can be specified separately for a tarball and a metadata file
- the specification is customisable per repository target/identifier, e.g., different upload locations can be specified for different CVMFS repositories/versions
- the specification may use variable components; currently those are
  - ${github_repository}, which is expanded to the full name of the repository, e.g., EESSI/software-layer
  - ${pull_request_number}, which is expanded to the number of the pull request in which the tarball was created
  - ${legacy_aws_path}, which is expanded to the legacy path used in EESSI
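To make the variable expansion concrete, here is a minimal Python sketch using the standard library's `string.Template`, which happens to use the same `${name}` syntax. This is not the bot's or the upload script's actual implementation; the function name `expand_prefix` and the example patterns and values are assumptions made for illustration.

```python
from string import Template


def expand_prefix(pattern, github_repository, pull_request_number, legacy_aws_path):
    """Expand the three variables named in the PR description in a directory pattern.

    Hypothetical helper: raises KeyError if the pattern uses an unknown variable.
    """
    return Template(pattern).substitute(
        github_repository=github_repository,
        pull_request_number=str(pull_request_number),
        legacy_aws_path=legacy_aws_path,
    )


# Hypothetical example values, for illustration only.
legacy = "2023.06/software/linux/x86_64/generic/1708095546"

# A pattern that groups metadata files by repository and pull request.
print(expand_prefix("new/${github_repository}/${pull_request_number}",
                    "EESSI/software-layer", 467, legacy))

# A pattern that reproduces the NESSI tarball layout via the legacy path.
print(expand_prefix("tarballs/${legacy_aws_path}",
                    "EESSI/software-layer", 467, legacy))
```

Because `${legacy_aws_path}` is itself available as a variable, a pattern consisting of just that variable would keep the existing EESSI layout unchanged.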

- adds command line args to specify upload target directories/directory pattern
- adds a command line arg to show variables that could be used in a directory pattern
- makes the legacy path accessible (i.e., can be used in the directory pattern)
- slightly modifies some variable names to make their purpose clearer
- fixes small typo
- makes script a bit more 'noisy' to let it tell what it does (should facilitate debugging; as some actions are more configurable it's desirable to have a bit more information about what is going on)


bedroge

Nice work @trz42, looks good to me. Just have some tiny comments.

README.md

truib added 3 commits February 16, 2024 09:44

adds settings and description for them to example cfg file

ccf5230

pass *_prefix settings to upload script

1fd7b60

- read settings from config
- parse settings
- pass settings to upload script
- adjust messages used for reporting via GitHub PR comments

trz42 added enhancement difficulty:medium priority:high refactoring labels Feb 16, 2024

trz42 marked this pull request as draft February 16, 2024 15:00

trz42 mentioned this pull request Feb 16, 2024

TESTING bot PR254 with EESSI/SWL PR467 (Run test suite based on pr366) trz42/software-layer#69

Open

truib added 2 commits February 20, 2024 07:01

return status of PR download in case of no ERROR

3a2df5a

add a bit log info + fixing obtaining target_arch

bbfb21b

trz42 mentioned this pull request Feb 21, 2024

{2023.06}[foss/2021a] CaDiCaL v1.3.0 trz42/software-layer#72

Open

truib added 2 commits February 22, 2024 20:26

Merge branch 'develop' of github.com:EESSI/eessi-bot-software-layer into configurable_upload_directory

f1c4001

add new/missing bot settings to README.md

7ef1e4b

- added new settings `metadata_prefix` and `tarball_prefix`
- added new settings for reporting about pull request download failures
- added missing settings for build permissions and reporting about no permissions

trz42 marked this pull request as ready for review February 23, 2024 19:44

fix flake8 test issue

2eb7f0a

poksumdo mentioned this pull request Feb 24, 2024

{2023.06}[foss/2023a] snakemake v8.4.2 NorESSI/software-layer#281

Merged

bedroge requested changes Feb 26, 2024


README.md (outdated, resolved)

README.md (outdated, resolved)

updating README.md to implement suggestions

8b09bec

bedroge approved these changes Feb 26, 2024


bedroge merged commit 41c1ab1 into EESSI:develop Feb 26, 2024
7 checks passed

This was referenced Feb 27, 2024

release notes for v0.4.0 #258

Merged

release v0.4.0 #259

Merged
