make upload directory in S3 bucket configurable #254

Merged (9 commits) on Feb 26, 2024
98 changes: 98 additions & 0 deletions README.md

@@ -404,6 +404,20 @@ submit_command = /usr/bin/sbatch
```
`submit_command` is the full path to the Slurm job submission command used for submitting batch jobs. You may want to verify if `sbatch` is provided at that path or determine its actual location (using `which sbatch`).

```
build_permission = GH_ACCOUNT_1 GH_ACCOUNT_2 ...
```
`build_permission` defines which GitHub accounts have permission to trigger
build jobs, i.e., for which accounts the bot acts on `bot: build ...` commands.
If the value is left empty, everyone can trigger build jobs.

```
no_build_permission_comment = The `bot: build ...` command has been used by user `{build_labeler}`, but this person does not have permission to trigger builds.
```
`no_build_permission_comment` defines a comment (template) that is used when
the account trying to trigger build jobs has no permission to do so.
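The `{build_labeler}` placeholder suggests the comment template is rendered
with Python-style string formatting; a minimal sketch (the template text is
taken from above, the account name is made up):

```python
# Sketch of how the bot presumably fills the {build_labeler} placeholder;
# the account name "some_user" is purely illustrative.
template = (
    "The `bot: build ...` command has been used by user `{build_labeler}`, "
    "but this person does not have permission to trigger builds."
)
comment = template.format(build_labeler="some_user")
print(comment)
```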


#### `[bot_control]` section

The `[bot_control]` section contains settings for configuring the feature to
@@ -485,6 +499,43 @@ This defines a message that is added to the status table in a PR comment
corresponding to a job whose tarball should have been uploaded (e.g., after
setting the `bot:deploy` label).


```
metadata_prefix = LOCATION_WHERE_METADATA_FILE_GETS_DEPOSITED
tarball_prefix = LOCATION_WHERE_TARBALL_GETS_DEPOSITED
```

These two settings are used to define where (which directory) in the S3 bucket
(see `bucket_name` above) the metadata file and the tarball will be stored. The
value `LOCATION...` can be a string value to always use the same 'prefix'
regardless of the target CVMFS repository, or can be a mapping of a target
repository id (see also `repo_target_map` below) to a prefix.

The prefix itself can use some (environment) variables that are set within
the upload script (see `tarball_upload_script` above). Currently those are:
* `'${github_repository}'` (expanded to the full name of the GitHub
  repository, e.g., `EESSI/software-layer`),
* `'${legacy_aws_path}'` (expanded to the legacy/old prefix used for storing
  tarballs/metadata files; that prefix has the form
  `EESSI_VERSION/TARBALL_TYPE/OS_TYPE/CPU_ARCHITECTURE/TIMESTAMP/`), _and_
* `'${pull_request_number}'` (expanded to the number of the pull request from
  which the tarball originates).

Note: it is important to single-quote (`'`) the variables as shown above,
because they are likely not yet defined when the bot invokes the upload
script; single-quoting prevents them from being expanded prematurely.

The list of supported variables can be shown by running
`scripts/eessi-upload-to-staging --list-variables`.

**Examples:**
```
metadata_prefix = {"eessi.io-2023.06": "new/${github_repository}/${pull_request_number}"}
tarball_prefix = {
"eessi-pilot-2023.06": "",
"eessi.io-2023.06": "new/${github_repository}/${pull_request_number}"
}
```
If left empty, the old/legacy prefix is used.

#### `[architecturetargets]` section

The section `[architecturetargets]` defines for which targets (OS/SUBDIR), (for example `linux/x86_64/amd/zen2`) the EESSI bot should submit jobs, and which additional `sbatch` parameters will be used for requesting a compute node with the CPU microarchitecture needed to build the software stack.
@@ -657,6 +708,53 @@ job_test_unknown_fmt = <details><summary>:shrug: UNKNOWN _(click triangle for de
`job_test_unknown_fmt` is used when no test file (produced by the
`bot/check-test.sh` script provided by the target repository) was found.


#### `[download_pr_comments]` section

The `[download_pr_comments]` section sets templates for messages related to
downloading the contents of a pull request.
```
git_clone_failure = Unable to clone the target repository.
```
`git_clone_failure` is shown when `git clone` failed.

```
git_clone_tip = _Tip: This could be a connection failure. Try again and if the issue remains check if the address is correct_.
```
`git_clone_tip` should contain some hint on how to deal with the issue. It is shown when `git clone` failed.

```
git_checkout_failure = Unable to checkout to the correct branch.
```
`git_checkout_failure` is shown when `git checkout` failed.

```
git_checkout_tip = _Tip: Ensure that the branch name is correct and the target branch is available._
```
`git_checkout_tip` should contain some hint on how to deal with the failure. It
is shown when `git checkout` failed.

```
curl_failure = Unable to download the `.diff` file.
```
`curl_failure` is shown when downloading the `PR_NUMBER.diff` file failed.

```
curl_tip = _Tip: This could be a connection failure. Try again and if the issue remains check if the address is correct_
```
`curl_tip` should contain a hint on how to deal with failing downloads of the `.diff` file.

```
git_apply_failure = Unable to download or merge changes between the source branch and the destination branch.
```
`git_apply_failure` is shown when applying the `.diff` file with `git apply`
failed.

```
git_apply_tip = _Tip: This can usually be resolved by syncing your branch and resolving any merge conflicts._
```
`git_apply_tip` should guide the contributor/maintainer on how to resolve the
cause of the `git apply` failure.

# Instructions to run the bot components

The bot consists of three components:
22 changes: 22 additions & 0 deletions app.cfg.example

@@ -147,6 +147,28 @@ deploy_permission =
# template for comment when user who set a label has no permission to trigger deploying tarballs
no_deploy_permission_comment = Label `bot:deploy` has been set by user `{deploy_labeler}`, but this person does not have permission to trigger deployments

# settings for where (directory) in the S3 bucket to store the metadata file and
# the tarball
# - Can be a string value to always use the same 'prefix' regardless of the target
# CVMFS repository, or can be a mapping of a target repository id (see also
# repo_target_map) to a prefix.
# - The prefix itself can use some (environment) variables that are set within
# the script. Currently those are:
# * 'github_repository' (which would be expanded to the full name of the GitHub
# repository, e.g., 'EESSI/software-layer'),
# * 'legacy_aws_path' (which expands to the legacy/old prefix being used for
# storing tarballs/metadata files) and
# * 'pull_request_number' (which would be expanded to the number of the pull
# request from which the tarball originates).
# - The list of supported variables can be shown by running
# `scripts/eessi-upload-to-staging --list-variables`.
# - Examples:
# metadata_prefix = {"eessi.io-2023.06": "new/${github_repository}/${pull_request_number}"}
# tarball_prefix = {"eessi-pilot-2023.06": "", "eessi.io-2023.06": "new/${github_repository}/${pull_request_number}"}
# If left empty, the old/legacy prefix is used.
metadata_prefix =
tarball_prefix =


[architecturetargets]
# defines both for which architectures the bot will build
109 changes: 86 additions & 23 deletions scripts/eessi-upload-to-staging

@@ -41,7 +41,7 @@ function create_metadata_file
_tarball=$1
_url=$2
_repository=$3
_pull_request=$4
_pull_request_number=$4
_pull_request_comment_id=$5

_tmpfile=$(mktemp)
@@ -56,31 +56,43 @@ function create_metadata_file
--arg sha256 "$(sha256sum "${_tarball}" | awk '{print $1}')" \
--arg url "${_url}" \
--arg repo "${_repository}" \
--arg pr "${_pull_request}" \
--arg pr "${_pull_request_number}" \
--arg pr_comment_id "${_pull_request_comment_id}" \
'{
uploader: {username: $un, ip: $ip, hostname: $hn},
payload: {filename: $fn, size: $sz, ctime: $ct, sha256sum: $sha256, url: $url},
link2pr: {repo: $repo, pr: $pr, pr_comment_id: $pr_commend_id},
link2pr: {repo: $repo, pr: $pr, pr_comment_id: $pr_comment_id},
}' > "${_tmpfile}"

echo "${_tmpfile}"
}
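For illustration, a metadata file produced by `create_metadata_file` would
look roughly like this (all values here are made up):

```json
{
  "uploader": {"username": "bot", "ip": "192.0.2.1", "hostname": "build-node"},
  "payload": {
    "filename": "eessi-2023.06-software-linux-x86_64-amd-zen2-1234567890.tar.gz",
    "size": "123456789",
    "ctime": "2024-02-26T12:00:00Z",
    "sha256sum": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
    "url": "https://example.org/bucket/2023.06/software/linux/x86_64/amd/zen2/1234567890/eessi-2023.06-software-linux-x86_64-amd-zen2-1234567890.tar.gz"
  },
  "link2pr": {"repo": "EESSI/software-layer", "pr": "254", "pr_comment_id": "none"}
}
```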

function display_help
{
echo "Usage: $0 [OPTIONS] <filenames>" >&2
echo " -e | --endpoint-url URL - endpoint url (needed for non AWS S3)" >&2
echo " -h | --help - display this usage information" >&2
echo " -i | --pr-comment-id - identifier of a PR comment; may be" >&2
echo " used to efficiently determine the PR" >&2
echo " comment to be updated during the" >&2
echo " ingestion procedure" >&2
echo " -n | --bucket-name BUCKET - bucket name (same as BUCKET above)" >&2
echo " -p | --pull-request NUMBER - a pull request NUMBER; used to" >&2
echo " link the upload to a PR" >&2
echo " -r | --repository FULL_NAME - a repository name ACCOUNT/REPONAME;" >&2
echo " used to link the upload to a PR" >&2
echo "Usage: $0 [OPTIONS] <filenames>" >&2
echo " -e | --endpoint-url URL - endpoint url (needed for non AWS S3)" >&2
echo " -h | --help - display this usage information" >&2
echo " -i | --pr-comment-id - identifier of a PR comment; may be" >&2
echo " used to efficiently determine the PR" >&2
echo " comment to be updated during the" >&2
echo " ingestion procedure" >&2
echo " -l | --list-variables - list variables that are available" >&2
echo " for expansion" >&2
echo " -m | --metadata-prefix PREFIX - a directory to which the metadata" >&2
echo " file shall be uploaded; BASH variable" >&2
echo " expansion will be applied; arg '-l'" >&2
echo " lists variables that are defined at" >&2
echo " the time of expansion" >&2
echo " -n | --bucket-name BUCKET - bucket name (same as BUCKET above)" >&2
echo " -p | --pull-request-number INT - a pull request number (INT); used to" >&2
echo " link the upload to a PR" >&2
echo " -r | --repository FULL_NAME - a repository name ACCOUNT/REPONAME;" >&2
echo " used to link the upload to a PR" >&2
echo " -t | --tarball-prefix PREFIX - a directory to which the tarball" >&2
echo " shall be uploaded; BASH variable" >&2
echo " expansion will be applied; arg '-l'" >&2
echo " lists variables that are defined at" >&2
echo " the time of expansion" >&2
}

if [[ $# -lt 1 ]]; then
@@ -106,8 +118,16 @@ endpoint_url=

# provided via command line arguments
pr_comment_id="none"
pull_request="none"
repository="EESSI/software-layer"
pull_request_number="none"
github_repository="EESSI/software-layer"

# provided via options in the bot's config file app.cfg and/or command line argument
metadata_prefix=
tarball_prefix=

# other variables
legacy_aws_path=
variables="github_repository legacy_aws_path pull_request_number"

while [[ $# -gt 0 ]]; do
case $1 in
@@ -119,20 +139,36 @@
display_help
exit 0
;;
-l|--list-variables)
echo "variables that will be expanded: name (default value)"
for var in ${variables}
do
echo " ${var} (${!var:-unset})"
done
exit 0
;;
-i|--pr-comment-id)
pr_comment_id="$2"
shift 2
;;
-m|--metadata-prefix)
metadata_prefix="$2"
shift 2
;;
-n|--bucket-name)
bucket_name="$2"
shift 2
;;
-p|--pull-request)
pull_request="$2"
-p|--pull-request-number)
pull_request_number="$2"
shift 2
;;
-r|--repository)
repository="$2"
github_repository="$2"
shift 2
;;
-t|--tarball-prefix)
tarball_prefix="$2"
shift 2
;;
-*|--*)
@@ -168,23 +204,50 @@ for file in "$*"; do
basefile=$( basename ${file} )
if check_file_name ${basefile}; then
if tar tf "${file}" | head -n1 > /dev/null; then
aws_path=$(basename ${file} | tr -s '-' '/' \
# 'legacy_aws_path' might be used in tarball_prefix or metadata_prefix
# its purpose is to support the old/legacy method to derive the location
# where to store the tarball and metadata file
export legacy_aws_path=$(basename ${file} | tr -s '-' '/' \
| perl -pe 's/^eessi.//;' | perl -pe 's/\.tar\.gz$//;' )
if [ -z ${tarball_prefix} ]; then
aws_path=${legacy_aws_path}
else
export pull_request_number
export github_repository
aws_path=$(envsubst <<< "${tarball_prefix}")
fi
aws_file=$(basename ${file})
echo "Creating metadata file"
url="${bucket_base}/${aws_path}/${aws_file}"
metadata_file=$(create_metadata_file "${file}" "${url}" \
"${repository}" "${pull_request}" \
echo "create_metadata_file file=${file} \
url=${url} \
github_repository=${github_repository} \
pull_request_number=${pull_request_number} \
pr_comment_id=${pr_comment_id}"
metadata_file=$(create_metadata_file "${file}" \
"${url}" \
"${github_repository}" \
"${pull_request_number}" \
"${pr_comment_id}")
echo "metadata:"
cat ${metadata_file}

echo Uploading to "${url}"
echo " store tarball at ${aws_path}/${aws_file}"
upload_to_staging_bucket \
"${file}" \
"${bucket_name}" \
"${aws_path}/${aws_file}" \
"${endpoint_url}"

if [ -z ${metadata_prefix} ]; then
aws_path=${legacy_aws_path}
else
export pull_request_number
export github_repository
aws_path=$(envsubst <<< "${metadata_prefix}")
fi
echo " store metadata file at ${aws_path}/${aws_file}.meta.txt"
upload_to_staging_bucket \
"${metadata_file}" \
"${bucket_name}" \
11 changes: 10 additions & 1 deletion tasks/build.py

@@ -49,6 +49,7 @@
ERROR_GIT_APPLY = "git apply"
ERROR_GIT_CHECKOUT = "git checkout"
ERROR_GIT_CLONE = "curl"
ERROR_NONE = "none"
GITHUB = "github"
GIT_CLONE_FAILURE = "git_clone_failure"
GIT_CLONE_TIP = "git_clone_tip"
@@ -399,6 +400,9 @@ def download_pr(repo_name, branch_name, pr, arch_job_dir):
error_stage = ERROR_GIT_APPLY
return git_apply_output, git_apply_error, git_apply_exit_code, error_stage

# need to return four items also in case everything went fine
return 'downloading PR succeeded', 'no error while downloading PR', 0, ERROR_NONE


def comment_download_pr(base_repo_name, pr, download_pr_exit_code, download_pr_error, error_stage):
"""
@@ -862,6 +866,8 @@ def request_bot_build_issue_comments(repo_name, pr_number):
status_table (dict): dictionary with 'arch', 'date', 'status', 'url' and 'result'
for all the finished builds;
"""
fn = sys._getframe().f_code.co_name

status_table = {'arch': [], 'date': [], 'status': [], 'url': [], 'result': []}
cfg = config.read_config()

@@ -882,9 +888,12 @@
first_line = comment['body'].split('\n')[0]
arch_map = get_architecture_targets(cfg)
for arch in arch_map.keys():
target_arch = '/'.join(arch.split('/')[-1])
# drop the first element in arch (which names the OS type) and join the remaining items with '-'
target_arch = '-'.join(arch.split('/')[1:])
if target_arch in first_line:
status_table['arch'].append(target_arch)
else:
log(f"{fn}(): target_arch '{target_arch}' not found in first line '{first_line}'")
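For illustration, the corrected `target_arch` derivation maps an architecture
target string as follows (a sketch, with an example target):

```python
# Sketch of the corrected derivation: drop the leading OS-type element and
# join the remaining parts with '-'; "linux/x86_64/amd/zen2" is an example.
arch = "linux/x86_64/amd/zen2"
target_arch = '-'.join(arch.split('/')[1:])
print(target_arch)
# -> x86_64-amd-zen2
```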

# get date, status, url and result from the markdown table
comment_table = comment['body'][comment['body'].find('|'):comment['body'].rfind('|')+1]