upload tarball and metadata file to different directories in S3 bucket #241

trz42 · 2024-02-06T08:13:28Z

Currently (in EESSI), the tarball for built software and a metadata file describing the contents of that tarball are uploaded to the same directory in the S3 bucket, and they stay in that directory during ingestion. The ingestion procedure puts only the metadata file into a staging repository. When the state of an ingestion changes, the metadata file is moved to the corresponding top-level directory in that repository. For example, it is first created under new/some_path/TARBALL.meta.txt and moved to staged/some_path/TARBALL.meta.txt when the tarball has been staged from the S3 bucket to the Stratum-0 server.

The current procedure can generate many GitHub API requests, which are subject to an hourly limit of 5000. Hitting that limit causes ingestion to fail or slow down.

In NESSI, we use a slightly different approach. Tarballs are always put under tarballs/some_path/TARBALL in the S3 bucket and are never moved (not moving them is the same as in EESSI; the separate top-level directory is not). Metadata files are initially created under new/some_path/TARBALL.meta.txt in the S3 bucket, that is, in a different top-level directory than the tarballs. The ingestion procedure moves the metadata file within the S3 bucket to the top-level directory corresponding to the state of the ingestion (differs from the EESSI approach). The metadata file is not moved between directories in the staging repository on GitHub (differs from the EESSI approach).
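A minimal sketch of such a state change inside the bucket is given here. It assumes the AWS CLI is available and configured for the endpoint. The function names `metadata_key_for_state` and `move_metadata_file` are hypothetical and are not taken from the NESSI or EESSI scripts.

```shell
#!/usr/bin/env bash

# Compute the new object key when a metadata file changes state.
# Example: new/some_path/TARBALL.meta.txt -> staged/some_path/TARBALL.meta.txt
# (hypothetical helper; not part of the existing scripts)
metadata_key_for_state() {
    local key="$1" from="$2" to="$3"
    case "${key}" in
        "${from}/"*)
            # Strip the old state directory and prepend the new one.
            printf '%s/%s\n' "${to}" "${key#"${from}/"}"
            ;;
        *)
            echo "key '${key}' is not under '${from}/'" >&2
            return 1
            ;;
    esac
}

# Move only the metadata file inside the bucket; the tarball stays in place.
# Assumption: the AWS CLI is installed and has credentials for the endpoint.
move_metadata_file() {
    local bucket="$1" key="$2" from="$3" to="$4" endpoint_url="$5"
    local new_key
    new_key="$(metadata_key_for_state "${key}" "${from}" "${to}")" || return 1
    aws s3 mv "s3://${bucket}/${key}" "s3://${bucket}/${new_key}" \
        --endpoint-url "${endpoint_url}"
}
```

With this layout the state of an ingestion is tracked by object keys in the bucket alone, so a state change needs no request to the GitHub API.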

In NESSI, we have modified the script eessi-upload-to-staging such that the tarball and the metadata file are uploaded to different top-level directories. After the change, the code looks like this:

        echo Uploading to "${url}"
        echo "  store tarball at tarballs/${aws_path}/${aws_file}"
        upload_to_staging_bucket \
                "${file}" \
                "${bucket_name}" \
                "tarballs/${aws_path}/${aws_file}" \
                "${endpoint_url}"
        echo "  store metadata file at new/${aws_path}/${aws_file}.meta.txt"
        upload_to_staging_bucket \
                "${metadata_file}" \
                "${bucket_name}" \
                "new/${aws_path}/${aws_file}.meta.txt" \
                "${endpoint_url}"

The corresponding code in EESSI is:

        echo Uploading to "${url}"
        upload_to_staging_bucket \
                "${file}" \
                "${bucket_name}" \
                "${aws_path}/${aws_file}" \
                "${endpoint_url}"
        upload_to_staging_bucket \
                "${metadata_file}" \
                "${bucket_name}" \
                "${aws_path}/${aws_file}.meta.txt" \
                "${endpoint_url}"

Instead of hardcoding the destination for the uploads, it might be better to make that location configurable. This would also allow for a smoother migration, because using different locations in the S3 bucket will also require changes to the ingestion scripts running as cron jobs on the Stratum-0.
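One possible way to make the destination configurable is sketched here. The environment variables `TARBALL_PREFIX` and `METADATA_PREFIX` and the helper `remote_path` are assumptions made for illustration; they are not existing options of eessi-upload-to-staging. Empty prefixes reproduce the current EESSI layout, while `tarballs` and `new` reproduce the NESSI layout.

```shell
#!/usr/bin/env bash

# Hypothetical settings; an empty prefix keeps the current EESSI behaviour.
tarball_prefix="${TARBALL_PREFIX:-}"
metadata_prefix="${METADATA_PREFIX:-}"

# Join an optional prefix and a path, avoiding a leading slash
# when the prefix is empty and a double slash when it ends in '/'.
remote_path() {
    local prefix="$1" path="$2"
    if [ -n "${prefix}" ]; then
        printf '%s/%s\n' "${prefix%/}" "${path}"
    else
        printf '%s\n' "${path}"
    fi
}

# Placeholder values for illustration only.
aws_path="some_path"
aws_file="TARBALL"

# These strings would be passed as the third argument to upload_to_staging_bucket.
echo "tarball destination:  $(remote_path "${tarball_prefix}" "${aws_path}/${aws_file}")"
echo "metadata destination: $(remote_path "${metadata_prefix}" "${aws_path}/${aws_file}.meta.txt")"
```

Defaulting to empty prefixes would let the upload script and the ingestion scripts on the Stratum-0 be switched to the new layout independently of each other.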


trz42 added enhancement priority:medium difficulty:easy labels Feb 6, 2024

trz42 mentioned this issue Feb 6, 2024

support for rebuilding software or removing paths before ingestion #147

Open

trz42 mentioned this issue Feb 16, 2024

make upload directory in S3 bucket configurable #254

Merged
