make upload directory in S3 bucket configurable #254

Merged (9 commits) on Feb 26, 2024
98 changes: 98 additions & 0 deletions README.md

@@ -404,6 +404,20 @@ submit_command = /usr/bin/sbatch
```
`submit_command` is the full path to the Slurm job submission command used for submitting batch jobs. You may want to verify if `sbatch` is provided at that path or determine its actual location (using `which sbatch`).

```
build_permission = GH_ACCOUNT_1 GH_ACCOUNT_2 ...
```
`build_permission` defines which GitHub accounts have permission to trigger
build jobs, i.e., for which accounts the bot acts on `bot: build ...` commands.
If the value is left empty, everyone can trigger build jobs.

```
no_build_permission_comment = The `bot: build ...` command has been used by user `{build_labeler}`, but this person does not have permission to trigger builds.
```
`no_build_permission_comment` defines a comment (template) that is used when
the account trying to trigger build jobs has no permission to do so.
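The `{build_labeler}` placeholder suggests the comment template is rendered
with Python-style string formatting; a minimal sketch (the template text is
taken from above, the account name is made up):

```python
# Sketch of how the bot presumably fills the {build_labeler} placeholder;
# the account name "some_user" is purely illustrative.
template = (
    "The `bot: build ...` command has been used by user `{build_labeler}`, "
    "but this person does not have permission to trigger builds."
)
comment = template.format(build_labeler="some_user")
print(comment)
```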


#### `[bot_control]` section

The `[bot_control]` section contains settings for configuring the feature to
@@ -485,6 +499,43 @@ This defines a message that is added to the status table in a PR comment
corresponding to a job whose tarball should have been uploaded (e.g., after
setting the `bot:deploy` label).


```
metadata_prefix = LOCATION_WHERE_METADATA_FILE_GETS_DEPOSITED
tarball_prefix = LOCATION_WHERE_TARBALL_GETS_DEPOSITED
```

These two settings are used to define where (which directory) in the S3 bucket
(see `bucket_name` above) the metadata file and the tarball will be stored. The
value `LOCATION...` can be a string value to always use the same 'prefix'
regardless of the target CVMFS repository, or can be a mapping of a target
repository id (see also `repo_target_map` below) to a prefix.

The prefix itself can use some (environment) variables that are set within
the upload script (see `tarball_upload_script` above). Currently those are:
* `'${github_repository}'` (expanded to the full name of the GitHub
  repository, e.g., `EESSI/software-layer`),
* `'${legacy_aws_path}'` (expanded to the legacy/old prefix used for storing
  tarballs/metadata files; that prefix has the form
  `EESSI_VERSION/TARBALL_TYPE/OS_TYPE/CPU_ARCHITECTURE/TIMESTAMP/`), _and_
* `'${pull_request_number}'` (expanded to the number of the pull request from
  which the tarball originates).

Note: it is important to single-quote (`'`) the variables as shown above,
because they are likely not yet defined when the bot invokes the upload
script; single-quoting prevents them from being expanded prematurely.

The list of supported variables can be shown by running
`scripts/eessi-upload-to-staging --list-variables`.

**Examples:**
```
metadata_prefix = {"eessi.io-2023.06": "new/${github_repository}/${pull_request_number}"}
tarball_prefix = {
"eessi-pilot-2023.06": "",
"eessi.io-2023.06": "new/${github_repository}/${pull_request_number}"
}
```
If left empty, the old/legacy prefix is used.

#### `[architecturetargets]` section

The section `[architecturetargets]` defines for which targets (OS/SUBDIR), (for example `linux/x86_64/amd/zen2`) the EESSI bot should submit jobs, and which additional `sbatch` parameters will be used for requesting a compute node with the CPU microarchitecture needed to build the software stack.
@@ -657,6 +708,53 @@ job_test_unknown_fmt = <details><summary>:shrug: UNKNOWN _(click triangle for de
`job_test_unknown_fmt` is used when no test file (produced by the
`bot/check-test.sh` script provided by the target repository) was found.


#### `[download_pr_comments]` section

The `[download_pr_comments]` section sets templates for messages related to
downloading the contents of a pull request.
```
git_clone_failure = Unable to clone the target repository.
```
`git_clone_failure` is shown when `git clone` failed.

```
git_clone_tip = _Tip: This could be a connection failure. Try again and if the issue remains check if the address is correct_.
```
`git_clone_tip` should contain some hint on how to deal with the issue. It is shown when `git clone` failed.

```
git_checkout_failure = Unable to checkout to the correct branch.
```
`git_checkout_failure` is shown when `git checkout` failed.

```
git_checkout_tip = _Tip: Ensure that the branch name is correct and the target branch is available._
```
`git_checkout_tip` should contain some hint on how to deal with the failure. It
is shown when `git checkout` failed.

```
curl_failure = Unable to download the `.diff` file.
```
`curl_failure` is shown when downloading the `PR_NUMBER.diff` file failed.

```
curl_tip = _Tip: This could be a connection failure. Try again and if the issue remains check if the address is correct_
```
`curl_tip` should contain a hint on how to deal with failing downloads of the `.diff` file.

```
git_apply_failure = Unable to download or merge changes between the source branch and the destination branch.
```
`git_apply_failure` is shown when applying the `.diff` file with `git apply`
failed.

```
git_apply_tip = _Tip: This can usually be resolved by syncing your branch and resolving any merge conflicts._
```
`git_apply_tip` should guide the contributor/maintainer on how to resolve the
cause of the `git apply` failure.

# Instructions to run the bot components

The bot consists of three components:
22 changes: 22 additions & 0 deletions app.cfg.example

@@ -147,6 +147,28 @@ deploy_permission =
# template for comment when user who set a label has no permission to trigger deploying tarballs
no_deploy_permission_comment = Label `bot:deploy` has been set by user `{deploy_labeler}`, but this person does not have permission to trigger deployments

# settings for where (directory) in the S3 bucket to store the metadata file and
# the tarball
# - Can be a string value to always use the same 'prefix' regardless of the target
# CVMFS repository, or can be a mapping of a target repository id (see also
# repo_target_map) to a prefix.
# - The prefix itself can use some (environment) variables that are set within
# the script. Currently those are:
# * 'github_repository' (which would be expanded to the full name of the GitHub
# repository, e.g., 'EESSI/software-layer'),
# * 'legacy_aws_path' (which expands to the legacy/old prefix being used for
# storing tarballs/metadata files) and
# * 'pull_request_number' (which would be expanded to the number of the pull
# request from which the tarball originates).
# - The list of supported variables can be shown by running
# `scripts/eessi-upload-to-staging --list-variables`.
# - Examples:
# metadata_prefix = {"eessi.io-2023.06": "new/${github_repository}/${pull_request_number}"}
# tarball_prefix = {"eessi-pilot-2023.06": "", "eessi.io-2023.06": "new/${github_repository}/${pull_request_number}"}
# If left empty, the old/legacy prefix is used.
metadata_prefix =
tarball_prefix =


[architecturetargets]
# defines both for which architectures the bot will build
109 changes: 86 additions & 23 deletions scripts/eessi-upload-to-staging

@@ -41,7 +41,7 @@ function create_metadata_file
_tarball=$1
_url=$2
_repository=$3
_pull_request=$4
_pull_request_number=$4
_pull_request_comment_id=$5

_tmpfile=$(mktemp)
@@ -56,31 +56,43 @@ function create_metadata_file
--arg sha256 "$(sha256sum "${_tarball}" | awk '{print $1}')" \
--arg url "${_url}" \
--arg repo "${_repository}" \
--arg pr "${_pull_request}" \
--arg pr "${_pull_request_number}" \
--arg pr_comment_id "${_pull_request_comment_id}" \
'{
uploader: {username: $un, ip: $ip, hostname: $hn},
payload: {filename: $fn, size: $sz, ctime: $ct, sha256sum: $sha256, url: $url},
link2pr: {repo: $repo, pr: $pr, pr_comment_id: $pr_commend_id},
link2pr: {repo: $repo, pr: $pr, pr_comment_id: $pr_comment_id},
}' > "${_tmpfile}"

echo "${_tmpfile}"
}
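For illustration, a metadata file produced by `create_metadata_file` would
look roughly like this (all values here are made up):

```json
{
  "uploader": {"username": "bot", "ip": "192.0.2.1", "hostname": "build-node"},
  "payload": {
    "filename": "eessi-2023.06-software-linux-x86_64-amd-zen2-1234567890.tar.gz",
    "size": "123456789",
    "ctime": "2024-02-26T12:00:00Z",
    "sha256sum": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
    "url": "https://example.org/bucket/2023.06/software/linux/x86_64/amd/zen2/1234567890/eessi-2023.06-software-linux-x86_64-amd-zen2-1234567890.tar.gz"
  },
  "link2pr": {"repo": "EESSI/software-layer", "pr": "254", "pr_comment_id": "none"}
}
```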

function display_help
{
echo "Usage: $0 [OPTIONS] <filenames>" >&2
echo " -e | --endpoint-url URL - endpoint url (needed for non AWS S3)" >&2
echo " -h | --help - display this usage information" >&2
echo " -i | --pr-comment-id - identifier of a PR comment; may be" >&2
echo " used to efficiently determine the PR" >&2
echo " comment to be updated during the" >&2
echo " ingestion procedure" >&2
echo " -n | --bucket-name BUCKET - bucket name (same as BUCKET above)" >&2
echo " -p | --pull-request NUMBER - a pull request NUMBER; used to" >&2
echo " link the upload to a PR" >&2
echo " -r | --repository FULL_NAME - a repository name ACCOUNT/REPONAME;" >&2
echo " used to link the upload to a PR" >&2
echo "Usage: $0 [OPTIONS] <filenames>" >&2
echo " -e | --endpoint-url URL - endpoint url (needed for non AWS S3)" >&2
echo " -h | --help - display this usage information" >&2
echo " -i | --pr-comment-id - identifier of a PR comment; may be" >&2
echo " used to efficiently determine the PR" >&2
echo " comment to be updated during the" >&2
echo " ingestion procedure" >&2
echo " -l | --list-variables - list variables that are available" >&2
echo " for expansion" >&2
echo " -m | --metadata-prefix PREFIX - a directory to which the metadata" >&2
echo " file shall be uploaded; BASH variable" >&2
echo " expansion will be applied; arg '-l'" >&2
echo " lists variables that are defined at" >&2
echo " the time of expansion" >&2
echo " -n | --bucket-name BUCKET - bucket name (same as BUCKET above)" >&2
echo " -p | --pull-request-number INT - a pull request number (INT); used to" >&2
echo " link the upload to a PR" >&2
echo " -r | --repository FULL_NAME - a repository name ACCOUNT/REPONAME;" >&2
echo " used to link the upload to a PR" >&2
echo " -t | --tarball-prefix PREFIX - a directory to which the tarball" >&2
echo " shall be uploaded; BASH variable" >&2
echo " expansion will be applied; arg '-l'" >&2
echo " lists variables that are defined at" >&2
echo " the time of expansion" >&2
}

if [[ $# -lt 1 ]]; then
@@ -106,8 +118,16 @@ endpoint_url=

# provided via command line arguments
pr_comment_id="none"
pull_request="none"
repository="EESSI/software-layer"
pull_request_number="none"
github_repository="EESSI/software-layer"

# provided via options in the bot's config file app.cfg and/or command line argument
metadata_prefix=
tarball_prefix=

# other variables
legacy_aws_path=
variables="github_repository legacy_aws_path pull_request_number"

while [[ $# -gt 0 ]]; do
case $1 in
@@ -119,20 +139,36 @@
display_help
exit 0
;;
-l|--list-variables)
echo "variables that will be expanded: name (default value)"
for var in ${variables}
do
echo " ${var} (${!var:-unset})"
done
exit 0
;;
-i|--pr-comment-id)
pr_comment_id="$2"
shift 2
;;
-m|--metadata-prefix)
metadata_prefix="$2"
shift 2
;;
-n|--bucket-name)
bucket_name="$2"
shift 2
;;
-p|--pull-request)
pull_request="$2"
-p|--pull-request-number)
pull_request_number="$2"
shift 2
;;
-r|--repository)
repository="$2"
github_repository="$2"
shift 2
;;
-t|--tarball-prefix)
tarball_prefix="$2"
shift 2
;;
-*|--*)
@@ -168,23 +204,50 @@ for file in "$*"; do
basefile=$( basename ${file} )
if check_file_name ${basefile}; then
if tar tf "${file}" | head -n1 > /dev/null; then
aws_path=$(basename ${file} | tr -s '-' '/' \
# 'legacy_aws_path' might be used in tarball_prefix or metadata_prefix
# its purpose is to support the old/legacy method to derive the location
# where to store the tarball and metadata file
export legacy_aws_path=$(basename ${file} | tr -s '-' '/' \
| perl -pe 's/^eessi.//;' | perl -pe 's/\.tar\.gz$//;' )
if [ -z ${tarball_prefix} ]; then
aws_path=${legacy_aws_path}
else
export pull_request_number
export github_repository
aws_path=$(envsubst <<< "${tarball_prefix}")
fi
aws_file=$(basename ${file})
echo "Creating metadata file"
url="${bucket_base}/${aws_path}/${aws_file}"
metadata_file=$(create_metadata_file "${file}" "${url}" \
"${repository}" "${pull_request}" \
echo "create_metadata_file file=${file} \
url=${url} \
github_repository=${github_repository} \
pull_request_number=${pull_request_number} \
pr_comment_id=${pr_comment_id}"
metadata_file=$(create_metadata_file "${file}" \
"${url}" \
"${github_repository}" \
"${pull_request_number}" \
"${pr_comment_id}")
echo "metadata:"
cat ${metadata_file}

echo Uploading to "${url}"
echo " store tarball at ${aws_path}/${aws_file}"
upload_to_staging_bucket \
"${file}" \
"${bucket_name}" \
"${aws_path}/${aws_file}" \
"${endpoint_url}"

if [ -z ${metadata_prefix} ]; then
aws_path=${legacy_aws_path}
else
export pull_request_number
export github_repository
aws_path=$(envsubst <<< "${metadata_prefix}")
fi
echo " store metadata file at ${aws_path}/${aws_file}.meta.txt"
upload_to_staging_bucket \
"${metadata_file}" \
"${bucket_name}" \
11 changes: 10 additions & 1 deletion tasks/build.py

@@ -49,6 +49,7 @@
ERROR_GIT_APPLY = "git apply"
ERROR_GIT_CHECKOUT = "git checkout"
ERROR_GIT_CLONE = "curl"
ERROR_NONE = "none"
GITHUB = "github"
GIT_CLONE_FAILURE = "git_clone_failure"
GIT_CLONE_TIP = "git_clone_tip"
@@ -399,6 +400,9 @@ def download_pr(repo_name, branch_name, pr, arch_job_dir):
error_stage = ERROR_GIT_APPLY
return git_apply_output, git_apply_error, git_apply_exit_code, error_stage

# need to return four items also in case everything went fine
return 'downloading PR succeeded', 'no error while downloading PR', 0, ERROR_NONE


def comment_download_pr(base_repo_name, pr, download_pr_exit_code, download_pr_error, error_stage):
"""
@@ -862,6 +866,8 @@ def request_bot_build_issue_comments(repo_name, pr_number):
status_table (dict): dictionary with 'arch', 'date', 'status', 'url' and 'result'
for all the finished builds;
"""
fn = sys._getframe().f_code.co_name

status_table = {'arch': [], 'date': [], 'status': [], 'url': [], 'result': []}
cfg = config.read_config()

@@ -882,9 +888,12 @@
first_line = comment['body'].split('\n')[0]
arch_map = get_architecture_targets(cfg)
for arch in arch_map.keys():
target_arch = '/'.join(arch.split('/')[-1])
# drop the first element in arch (which names the OS type) and join the remaining items with '-'
target_arch = '-'.join(arch.split('/')[1:])
if target_arch in first_line:
status_table['arch'].append(target_arch)
else:
log(f"{fn}(): target_arch '{target_arch}' not found in first line '{first_line}'")
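For illustration, the corrected `target_arch` derivation maps an architecture
target string as follows (a sketch, with an example target):

```python
# Sketch of the corrected derivation: drop the leading OS-type element and
# join the remaining parts with '-'; "linux/x86_64/amd/zen2" is an example.
arch = "linux/x86_64/amd/zen2"
target_arch = '-'.join(arch.split('/')[1:])
print(target_arch)
# -> x86_64-amd-zen2
```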

# get date, status, url and result from the markdown table
comment_table = comment['body'][comment['body'].find('|'):comment['body'].rfind('|')+1]