
make upload directory in S3 bucket configurable #254

Merged: 9 commits merged into EESSI:develop on Feb 26, 2024


@trz42 commented on Feb 16, 2024

This is the final PR to synchronise the bot code used by NESSI and EESSI. Essentially, it provides two parts:

The changes have been tested extensively through multiple PRs, using both a bot instance for development and several production bot instances (NESSI). See

Changes to the upload location

Current practice in EESSI and NESSI

In EESSI, both the tarball and the metadata file are uploaded to the same unique path:

`BUCKET_NAME/EESSI_VERSION/TARBALL_TYPE/OS_TYPE/CPU_ARCHITECTURE/TIMESTAMP/`

In NESSI, we use a slightly modified directory structure. The tarball is placed at:

`BUCKET_NAME/tarballs/EESSI_VERSION/TARBALL_TYPE/OS_TYPE/CPU_ARCHITECTURE/TIMESTAMP/`

The metadata file is placed at:

`BUCKET_NAME/new/EESSI_VERSION/TARBALL_TYPE/OS_TYPE/CPU_ARCHITECTURE/TIMESTAMP/`
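The three layouts differ only in an optional top-level directory between the bucket name and the per-upload components. The minimal Python sketch below shows that composition. The helper `upload_dir` and all example values (bucket name, version, architecture, timestamp format) are hypothetical and not taken from the bot code.

```python
def upload_dir(bucket, components, prefix=None):
    """Join bucket name, optional top-level prefix and per-upload components.

    Hypothetical helper for illustration only; not part of the bot.
    """
    parts = [bucket]
    if prefix:
        parts.append(prefix)
    parts.extend(components)
    return "/".join(parts) + "/"


# Hypothetical example values for the path components.
components = [
    "2023.06",         # EESSI_VERSION
    "software",        # TARBALL_TYPE
    "linux",           # OS_TYPE
    "x86_64/generic",  # CPU_ARCHITECTURE
    "1708085123",      # TIMESTAMP
]

# EESSI: tarball and metadata file share one directory.
eessi_dir = upload_dir("my-bucket", components)
# NESSI: separate top-level directories for tarball and metadata file.
nessi_tarball_dir = upload_dir("my-bucket", components, prefix="tarballs")
nessi_metadata_dir = upload_dir("my-bucket", components, prefix="new")

print(eessi_dir)
print(nessi_tarball_dir)
print(nessi_metadata_dir)
```

Seen this way, the NESSI layout is the EESSI layout plus one extra directory level, which is exactly the part that `tarballs` and `new` occupy.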

The motivation for splitting the locations of the tarball and the metadata file was to make the ingestion procedure more efficient. In EESSI, the ingestion procedure scans the whole S3 bucket for new tarballs and compares the tarballs it finds to Git branches in a staging repository on GitHub. The EESSI procedure also moves the metadata file within the staging repository to different top-level directories corresponding to the status of the ingestion. This procedure frequently exhausted the hourly limit of GitHub REST API requests. In NESSI, it was adjusted so that most changes to the staging repository are made locally (in a Git repository on disk) and only strictly necessary updates are pushed to the repository. In addition, instead of moving the metadata file within the staging repository, NESSI moves it within the S3 bucket to keep track of the status of an ingestion.

Suggested changes (this PR) on the bot-side

To synchronise the bot code while accommodating different ingestion procedures and the needs of different scenarios, the initial idea for this PR was to make only part of the upload location configurable. In particular, `tarballs` and `new` could be made configurable via the settings `tarball_dir` and `metadata_dir`, respectively. However, we saw an opportunity to make this even more flexible and thereby take a first step towards future efficiency improvements of the ingestion procedure. Issues that may benefit from this PR are

- adds command-line arguments to specify upload target directories or a directory pattern
- adds a command-line argument to show the variables that can be used in a directory
  pattern
- makes the legacy path accessible (i.e., it can be used in the directory pattern)
- slightly modifies some variable names to make their purpose clearer
- fixes a small typo
- makes the script more verbose about what it does (this should facilitate
  debugging; since more actions are now configurable, it is useful to have more
  information about what is going on)
- reads settings from the configuration file
- parses the settings
- passes the settings to the upload script
- adjusts messages used for reporting via GitHub PR comments
- adds new settings `metadata_prefix` and `tarball_prefix`
- adds new settings for reporting pull request download failures
- adds missing settings for build permissions and for reporting missing permissions
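The bot-side steps in the list (reading the settings from the configuration file, parsing them and passing them on) can be sketched in Python as follows. This is an illustration under assumptions, not the bot's implementation. The INI section name `[deploycfg]`, the placeholder names such as `{eessi_version}` and the use of `str.format_map` for expansion are all hypothetical; only the setting names `tarball_prefix` and `metadata_prefix` come from this PR.

```python
import configparser

# Hypothetical configuration file contents; the section name and the
# placeholder names inside the patterns are assumptions.
CONFIG_TEXT = """
[deploycfg]
bucket_name = my-bucket
tarball_prefix = tarballs/{eessi_version}/{tarball_type}/{os_type}/{cpu_arch}/{timestamp}
metadata_prefix = new/{eessi_version}/{tarball_type}/{os_type}/{cpu_arch}/{timestamp}
"""


def read_prefix_settings(config_text):
    """Read the bucket name and the two prefix patterns from the config."""
    config = configparser.ConfigParser()
    config.read_string(config_text)
    section = config["deploycfg"]
    return {key: section[key]
            for key in ("bucket_name", "tarball_prefix", "metadata_prefix")}


def expand_prefix(pattern, variables):
    """Replace {name} placeholders in a directory pattern.

    Raises KeyError if the pattern uses an undefined variable, which
    surfaces configuration mistakes early.
    """
    return pattern.format_map(variables)


settings = read_prefix_settings(CONFIG_TEXT)

# Hypothetical values the bot would know for a given build job.
variables = {
    "eessi_version": "2023.06",
    "tarball_type": "software",
    "os_type": "linux",
    "cpu_arch": "x86_64/generic",
    "timestamp": "1708085123",
}

# Listing the variable names mirrors the idea of an option that shows
# which variables may be used in a pattern.
print("available variables:", ", ".join(sorted(variables)))

tarball_dir = expand_prefix(settings["tarball_prefix"], variables)
metadata_dir = expand_prefix(settings["metadata_prefix"], variables)
print(f"{settings['bucket_name']}/{tarball_dir}/")
print(f"{settings['bucket_name']}/{metadata_dir}/")
```

With these example patterns the expanded directories follow the NESSI layout. Dropping `tarballs/` and `new/` from both patterns would give the shared EESSI directory instead, without any code change.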
@trz42 marked this pull request as ready for review on February 23, 2024 19:44

@bedroge left a comment

Nice work @trz42, looks good to me. I just have some tiny comments.

README.md: two review threads (outdated, resolved)
@bedroge merged commit 41c1ab1 into EESSI:develop on Feb 26, 2024
7 checks passed
This was referenced on Feb 27, 2024