
Unable to use Workload Identity Federation and domain-wide delegation to Google Workspace user with googleapiclient upload to Google Drive #440

Closed
benglewis opened this issue Sep 10, 2024 · 6 comments
Labels
bug Something isn't working

Comments

@benglewis

benglewis commented Sep 10, 2024

TL;DR

I am also struggling with this. I am using the Python googleapiclient library to upload files to Google Drive with both GCP Workload Identity Federation (WIF) and domain-wide delegation to my Google Workspace user. The delegation seems to fail. Instead, the files are uploaded as the service account (which I have already given access to the folder), and the upload then gets stuck because the service account does not have enough storage space in its Google Drive allowance. How can I get this to work?

Here are the contents of upload_to_gdrive.py:

import logging
import os
from pathlib import Path
import sys
from googleapiclient.discovery import build
from google.auth import default
from googleapiclient.http import MediaFileUpload
from googleapiclient.errors import HttpError

BATCH_SIZE = 10  # Adjust this value based on your needs
SKIPPED_FOLDERS = {".git"}

logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)


def create_folder_if_not_exists(service, folder_name: str, parent_id: str) -> str:
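    # Escape backslashes and single quotes so names like "Bob's files" don't break the query
    escaped_name = folder_name.replace("\\", "\\\\").replace("'", "\\'")
    query = f"name='{escaped_name}' and '{parent_id}' in parents and mimeType='application/vnd.google-apps.folder' and trashed=false"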
    results = (
        service.files().list(q=query, spaces="drive", fields="files(id)").execute()
    )
    if results["files"]:
        folder_id = results["files"][0]["id"]
        logger.info(f"Folder '{folder_name}' found with ID: {folder_id}")
        return folder_id
    else:
        folder_metadata = {
            "name": folder_name,
            "mimeType": "application/vnd.google-apps.folder",
            "parents": [parent_id],
        }
        folder = service.files().create(body=folder_metadata, fields="id").execute()
        logger.info(
            f"Folder '{folder_name}' created successfully with ID: {folder.get('id')}"
        )
        return folder.get("id")


def upload_file(service, file_path: str, parent: tuple[str, str]):
    parent_name, parent_id = parent
    file_metadata = {"name": Path(file_path).name, "parents": [parent_id]}
    media = MediaFileUpload(file_path, resumable=True)
    try:
        file = (
            service.files()
            .create(body=file_metadata, media_body=media, fields="id")
            .execute()
        )
        logger.debug(
            f"File '{file_path}' uploaded successfully with ID: {file.get('id')} to folder with name: {parent_name} with ID: {parent_id}"
        )
        return file.get("id")
    except HttpError as error:
        logger.error(f"An error occurred while uploading '{file_path}': {error}")
        raise error


def upload_folder(
    service, folder_path: Path, parent_id: str, target_subfolder: str | None = None
):
    target_parent_id = (
        create_folder_if_not_exists(service, target_subfolder, parent_id)
        if target_subfolder
        else parent_id
    )
    # Map each local directory (relative to folder_path) to its Drive folder ID;
    # the root maps to the target parent
    folder_path_to_id = {Path("."): target_parent_id}

    # Path.walk() needs Python 3.12+; it walks top-down, so a directory's
    # Drive ID is always cached before its children are visited
    for root, dirs, files in folder_path.walk():
        # Prune skipped folders by exact name (a substring check on the path
        # would also skip ".github") so walk() never descends into them
        dirs[:] = [d for d in dirs if d not in SKIPPED_FOLDERS]
        rel_root = root.relative_to(folder_path)
        root_id = folder_path_to_id[rel_root]

        # Create each subfolder inside its immediate parent, not the top-level folder
        for dir_name in dirs:
            folder_path_to_id[rel_root / dir_name] = create_folder_if_not_exists(
                service, dir_name, root_id
            )

        # Group files into batches of BATCH_SIZE (upload_batch uploads them one by one)
        file_batch = []
        for file in files:
            file_path = root / file
            file_batch.append((file_path, (str(rel_root), root_id)))

            if len(file_batch) == BATCH_SIZE:
                upload_batch(service, file_batch)
                file_batch = []

        if file_batch:
            upload_batch(service, file_batch)


def upload_batch(service, file_batch: list[tuple[str, tuple[str, str]]]):
    for file_path, (parent_name, parent_id) in file_batch:
        upload_file(service, file_path, (parent_name, parent_id))


def main(folder_path: str, target_subfolder: str | None = None):
    credentials, project = default()
    
    service = build("drive", "v3", credentials=credentials)
    folder_id = os.environ.get("GOOGLE_DRIVE_FOLDER_ID")
    if not folder_id:
        raise ValueError("GOOGLE_DRIVE_FOLDER_ID environment variable is not set")
    upload_folder(service, Path(folder_path), folder_id, target_subfolder)


if __name__ == "__main__":
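    # Configure a handler; otherwise only WARNING and above would be printed
    logging.basicConfig(level=logging.INFO)
    if len(sys.argv) < 2:
        logger.warning("Usage: python upload_to_gdrive.py <folder_path> [target_subfolder]")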
        sys.exit(1)
    folder_path = sys.argv[1]
    target_subfolder = sys.argv[2] if len(sys.argv) == 3 else None
    if not Path(folder_path).is_dir():
        logger.error(f"Error: '{folder_path}' is not a valid directory.")
        sys.exit(1)
    main(folder_path, target_subfolder)
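The directory-to-folder mapping is the easiest part of upload_folder to get wrong: each subfolder has to be created inside the Drive folder of its immediate parent, and skipped folders should be matched by exact name. As a hedged sketch, the hypothetical mirror_tree helper below does this with os.walk (so it also runs before Python 3.12, where Path.walk is unavailable) and takes the folder-creation call as a parameter, so the hierarchy can be dry-run with a fake before touching Drive.

```python
import os
from pathlib import Path
from typing import Callable

# Matched against single directory names, not against the whole path
SKIPPED_FOLDERS = {".git"}


def mirror_tree(
    root: Path,
    target_parent_id: str,
    create_folder: Callable[[str, str], str],
) -> dict[Path, str]:
    """Map every directory under `root` (as a relative path) to a Drive folder ID.

    `create_folder(name, parent_id)` is a caller-supplied (hypothetical) hook
    that must return the ID of folder `name` inside `parent_id`.
    """
    folder_ids: dict[Path, str] = {Path("."): target_parent_id}
    # os.walk is top-down by default, so a directory's own ID is always
    # known before its children are visited.
    for dirpath, dirnames, _filenames in os.walk(root):
        # Prune in place so os.walk never descends into skipped folders.
        dirnames[:] = sorted(d for d in dirnames if d not in SKIPPED_FOLDERS)
        rel_dir = Path(dirpath).relative_to(root)
        parent_id = folder_ids[rel_dir]
        for name in dirnames:
            # Create each subfolder under its immediate parent's Drive ID.
            folder_ids[rel_dir / name] = create_folder(name, parent_id)
    return folder_ids
```

In the real script, the hook could be lambda name, pid: create_folder_if_not_exists(service, name, pid); in a dry run it can simply record the (name, parent_id) pairs it receives.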

Expected behavior

The files should be uploaded as my Google Workspace user.

Observed behavior

The files are uploaded as the service account until it exceeds its Drive storage quota (I have given the service account permissions on the folder, and I am uploading large files).
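One way to confirm which identity the Drive API is actually acting as is to ask it: the Drive v3 about.get method returns the calling user and their storage quota. The sketch below is a hypothetical helper (describe_drive_identity is not part of any library) that expects a Drive v3 service object built as in the script above. If it reports the service account's email instead of the Workspace user's, the delegation is not in effect.

```python
def describe_drive_identity(service) -> str:
    """Return a one-line summary of who the Drive API thinks is calling.

    `service` is assumed to be a Drive v3 client, e.g. the result of
    googleapiclient.discovery.build("drive", "v3", credentials=...).
    """
    about = (
        service.about()
        .get(fields="user(emailAddress),storageQuota(limit,usage)")
        .execute()
    )
    email = about.get("user", {}).get("emailAddress", "<unknown>")
    quota = about.get("storageQuota", {})
    # Drive reports quota values in bytes as strings; "limit" is absent
    # when the account has unlimited storage.
    usage = int(quota.get("usage", 0))
    limit = quota.get("limit")
    limit_text = "unlimited" if limit is None else f"{int(limit)} bytes"
    return f"{email}: {usage} bytes used of {limit_text}"
```

Calling logger.info(describe_drive_identity(service)) right after build(...) in main would show the effective identity before any upload starts.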

Action YAML

permissions:
  id-token: write # This is required for requesting the JWT
  contents: read # This is required for actions/checkout
jobs:
  Deploy:
    runs-on: ubuntu-latest-m
    steps:
      - name: Git clone the repository
        uses: actions/checkout@v4
      - name: Google auth
        id: 'auth'
        uses: google-github-actions/auth@v2
        with:
          token_format: 'access_token'
          workload_identity_provider: '${{ secrets.GCP_WIF_PROVIDER }}'
          service_account: '${{ secrets.GOOGLE_DRIVE_SERVICE_ACCOUNT }}'
          access_token_lifetime: 1800s
          access_token_scopes: https://www.googleapis.com/auth/drive
          access_token_subject: '${{ vars.GOOGLE_DRIVE_SUBJECT_ACCOUNT }}'
          delegates: '${{ secrets.GOOGLE_DRIVE_SERVICE_ACCOUNT }}'
      
      - name: Print the credentials file path
        run: |
          echo "Credentials file path: ${{ steps.auth.outputs.credentials_file_path }}"
          cat ${{ steps.auth.outputs.credentials_file_path }}

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.x'
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install google-auth google-auth-oauthlib google-auth-httplib2 google-api-python-client
      
      - name: Upload to Google Drive
        env:
          GOOGLE_DRIVE_FOLDER_ID: ${{ secrets.GOOGLE_DRIVE_FOLDER_ID }}
          GOOGLE_APPLICATION_CREDENTIALS: ${{ steps.auth.outputs.credentials_file_path }}
        run: |
          python .github/workflows/upload_to_gdrive.py my-folder target-folder-location

Log output

##[debug]Evaluating condition for step: 'Google auth'
##[debug]Evaluating: success()
##[debug]Evaluating success:
##[debug]=> true
##[debug]Result: true
##[debug]Starting: Google auth
##[debug]Register post job cleanup for action: google-github-actions/auth@v2
##[debug]Loading inputs
##[debug]Evaluating: secrets.GCP_WIF_PROVIDER
##[debug]Evaluating Index:
##[debug]..Evaluating secrets:
##[debug]..=> Object
##[debug]..Evaluating String:
##[debug]..=> 'GCP_WIF_PROVIDER'
##[debug]=> '***'
##[debug]Result: '***'
##[debug]Evaluating: secrets.GOOGLE_DRIVE_SERVICE_ACCOUNT
##[debug]Evaluating Index:
##[debug]..Evaluating secrets:
##[debug]..=> Object
##[debug]..Evaluating String:
##[debug]..=> 'GOOGLE_DRIVE_SERVICE_ACCOUNT'
##[debug]=> '***'
##[debug]Result: '***'
##[debug]Evaluating: vars.GOOGLE_DRIVE_SUBJECT_ACCOUNT
##[debug]Evaluating Index:
##[debug]..Evaluating vars:
##[debug]..=> Object
##[debug]..Evaluating String:
##[debug]..=> 'GOOGLE_DRIVE_SUBJECT_ACCOUNT'
##[debug]=> 'my-google-workspace-user@my-workspace.com'
##[debug]Result: 'my-google-workspace-user@my-workspace.com'
##[debug]Evaluating: secrets.GOOGLE_DRIVE_SERVICE_ACCOUNT
##[debug]Evaluating Index:
##[debug]..Evaluating secrets:
##[debug]..=> Object
##[debug]..Evaluating String:
##[debug]..=> 'GOOGLE_DRIVE_SERVICE_ACCOUNT'
##[debug]=> '***'
##[debug]Result: '***'
##[debug]Loading env
Run google-github-actions/auth@v2
##[debug]Using workload identity provider "***"
##[debug]ID token url is https://pipelinesghubeus6.actions.githubusercontent.com/SKne7tZ5N7wvk8SqLMNClC7Djusf78ztXROgwIwWBbKxy0MekT/00000000-0000-0000-0000-000000000000/_apis/distributedtask/hubs/Actions/plans/f1a39477-9491-423a-882e-dfbe5ce25679/jobs/e07742bd-189a-5079-918b-43f8b2f94b89/idtoken?api-version=2.0&audience=https%3A%2F%2Fiam.googleapis.com%2F***
::add-mask::***
##[debug]WorkloadIdentityFederationClient: Computed audience, //iam.googleapis.com/***
##[debug]Creating credentials file
##[debug]WorkloadIdentityFederationClient.createCredentialsFile: Enabling service account impersonation via https://iamcredentials.googleapis.com/v1/projects/-/serviceAccounts/***:generateAccessToken
##[debug]WorkloadIdentityFederationClient.createCredentialsFile: Creating credentials, {
##[debug]  "outputPath": "/home/runner/work/infrastructure/infrastructure/gha-creds-be2508996412c6e7.json"
##[debug]}
Created credentials file at "/home/runner/work/infrastructure/infrastructure/gha-creds-be2508996412c6e7.json"
##[debug]WorkloadIdentityFederationClient.getToken: Built request, {
##[debug]  "method": "POST",
##[debug]  "path": "https://sts.googleapis.com/v1/token",
##[debug]  "body": {
##[debug]    "audience": "//iam.googleapis.com/***",
##[debug]    "grantType": "urn:ietf:params:oauth:grant-type:token-exchange",
##[debug]    "requestedTokenType": "urn:ietf:params:oauth:token-type:access_token",
##[debug]    "scope": "https://www.googleapis.com/auth/cloud-platform",
##[debug]    "subjectTokenType": "urn:ietf:params:oauth:token-type:jwt",
##[debug]    "subjectToken": "***"
##[debug]  }
##[debug]}
##[debug]Successfully generated auth token
::add-mask::***
##[debug]Creating access token
##[debug]Using Domain-Wide Delegation flow
##[debug]WorkloadIdentityFederationClient.getToken: Using cached token, {
##[debug]  "now": 1725886770601,
##[debug]  "cachedAt": 1725886770495
##[debug]}
##[debug]WorkloadIdentityFederationClient.signJWT: Built request, {
##[debug]  "method": "POST",
##[debug]  "path": "https://iamcredentials.googleapis.com/v1/projects/-/serviceAccounts/***:signJwt",
##[debug]  "headers": {
##[debug]    "Authorization": "***"
##[debug]  },
##[debug]  "body": {
##[debug]    "payload": "{\"iss\":\"***\",\"aud\":\"https://oauth2.googleapis.com/token\",\"iat\":1725886770,\"exp\":1725888570,\"sub\":\"my-google-workspace-user@my-workspace.com\",\"scope\":\"https://www.googleapis.com/auth/drive\"}"
##[debug]  }
##[debug]}
##[debug]IAMCredentialsClient.generateDomainWideDelegationAccessToken: Built request, {
##[debug]  "method": "POST",
##[debug]  "path": "https://oauth2.googleapis.com/token",
##[debug]  "headers": {
##[debug]    "Accept": "application/json",
##[debug]    "Content-Type": "application/x-www-form-urlencoded"
##[debug]  },
##[debug]  "body": "grant_type=urn%3Aietf%3Aparams%3Aoauth%3Agrant-type%3Ajwt-bearer&assertion=***"
##[debug]}
::add-mask::***
##[debug]Node Action run completed with exit code 0
##[debug]CLOUDSDK_AUTH_CREDENTIAL_FILE_OVERRIDE='/home/runner/work/infrastructure/infrastructure/gha-creds-be2508996412c6e7.json'
##[debug]GOOGLE_APPLICATION_CREDENTIALS='/home/runner/work/infrastructure/infrastructure/gha-creds-be2508996412c6e7.json'
##[debug]GOOGLE_GHA_CREDS_PATH='/home/runner/work/infrastructure/infrastructure/gha-creds-be2508996412c6e7.json'
##[debug]CLOUDSDK_CORE_PROJECT='hirundo-mvp-dev'
##[debug]CLOUDSDK_PROJECT='hirundo-mvp-dev'
##[debug]GCLOUD_PROJECT='hirundo-mvp-dev'
##[debug]GCP_PROJECT='hirundo-mvp-dev'
##[debug]GOOGLE_CLOUD_PROJECT='hirundo-mvp-dev'
##[debug]Set output credentials_file_path = /home/runner/work/infrastructure/infrastructure/gha-creds-be2508996412c6e7.json
##[debug]Set output project_id = hirundo-mvp-dev
##[debug]Set output auth_token = ***
##[debug]Set output access_token = ***
##[debug]Finishing: Google auth
...
googleapiclient.errors.ResumableUploadError: <HttpError 403 when requesting None returned "The user's Drive storage quota has been exceeded.". Details: "[{'message': "The user's Drive storage quota has been exceeded.", 'domain': 'usageLimits', 'reason': 'storageQuotaExceeded'}]">
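The reason in this error is storageQuotaExceeded, meaning the caller's Drive storage is full; request-rate limits are reported with different reasons, such as userRateLimitExceeded or rateLimitExceeded. A minimal sketch for telling them apart, assuming the error_details attribute that recent google-api-python-client releases populate on HttpError (and therefore on ResumableUploadError) with a list like the one in the message above. The helper and its return labels are hypothetical.

```python
STORAGE_REASONS = {"storageQuotaExceeded"}
RATE_LIMIT_REASONS = {"userRateLimitExceeded", "rateLimitExceeded"}


def classify_drive_error(error_details) -> str:
    """Classify HttpError.error_details as "storage", "rate_limit" or "other"."""
    # error_details may be a list of dicts, a string, or empty
    if not isinstance(error_details, list):
        return "other"
    reasons = {d.get("reason") for d in error_details if isinstance(d, dict)}
    if reasons & STORAGE_REASONS:
        return "storage"
    if reasons & RATE_LIMIT_REASONS:
        return "rate_limit"
    return "other"
```

In upload_file, the except HttpError as error: branch could log classify_drive_error(error.error_details) before re-raising; retrying with backoff helps with rate limits but not with a full Drive.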

Additional information

No response

@benglewis benglewis added the bug Something isn't working label Sep 10, 2024

Hi there @benglewis 👋!

Thank you for opening an issue. Our team will triage this as soon as we can. Please take a moment to review the troubleshooting steps, which list common error messages and their resolution steps.

@sethvargo
Member

Hi @benglewis - the error you're getting is a Drive quota error (storageQuotaExceeded), i.e. the uploads count against the service account's Drive storage rather than your user's. From the debug logs, the domain-wide delegation token was successfully created. To use that token, you need to pull the generated access_token from the auth action outputs. There's no mechanism for Application Default Credentials (ADC) to declare DWD; only the generated access token carries the domain-wide delegation.
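The debug log above shows what the auth action's Domain-Wide Delegation flow does: it asks the IAM Credentials signJwt method to sign a claim set whose sub is the Workspace user, then exchanges the signed JWT at https://oauth2.googleapis.com/token using the jwt-bearer grant. The resulting access token acts as the user, whereas the credentials file used by ADC only impersonates the service account. The sketch below rebuilds the two request bodies from the logged values; it does not sign or send anything, and the helper names are hypothetical.

```python
from urllib.parse import urlencode

TOKEN_URI = "https://oauth2.googleapis.com/token"
JWT_BEARER = "urn:ietf:params:oauth:grant-type:jwt-bearer"


def dwd_claims(service_account: str, subject: str, scope: str,
               iat: int, lifetime_s: int) -> dict:
    """Claim set that is signed (via IAM signJwt) for domain-wide delegation."""
    return {
        "iss": service_account,  # the service account doing the delegation
        "aud": TOKEN_URI,
        "iat": iat,
        "exp": iat + lifetime_s,
        "sub": subject,  # the Workspace user the token will act as
        "scope": scope,
    }


def token_request_body(signed_jwt: str) -> str:
    """Form-encoded body that exchanges the signed JWT for an access token."""
    return urlencode({"grant_type": JWT_BEARER, "assertion": signed_jwt})
```

With the values from the log (iat 1725886770 and access_token_lifetime: 1800s), exp comes out as 1725888570, matching the logged payload.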

@benglewis
Author

Hi @sethvargo ,

I would greatly appreciate it if you could explain, with an example, how I can "pull the generated access_token from the auth action outputs". I tried multiple times to load the credentials JSON output, both from the file path and from the environment variable, and every time I got a different error (for example, that the JSON was invalid or was not credentials for a service account). I would really appreciate a little more guidance on how to use that access_token properly.

Thank you in advance :)

@benglewis
Author

@sethvargo I figured it out...
Here are the needed changes:

  1. Adjust the Python script:
from google.oauth2.credentials import Credentials
...
def main(folder_path: str, target_subfolder: str | None = None):
    credentials = Credentials(os.environ["GOOGLE_APPLICATION_ACCESS_TOKEN"])
    service = build("drive", "v3", credentials=credentials)
  2. Adjust the GitHub Actions workflow:
...
      - name: Upload to Google Drive
        env:
          GOOGLE_DRIVE_FOLDER_ID: ${{ secrets.GOOGLE_DRIVE_FOLDER_ID }}
          GOOGLE_APPLICATION_ACCESS_TOKEN: ${{ steps.auth.outputs.access_token }}
        run: |
          python .github/workflows/upload_to_gdrive.py my-folder target-folder-location
...

That is it :)
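One caveat with this approach: a google.oauth2.credentials.Credentials object built from a bare access token has no refresh token, so it cannot be renewed, and with access_token_lifetime: 1800s any call made after the token expires will fail. Reading the variable through a small helper also makes a missing token fail fast with a clear message. A hedged sketch (access_token_from_env is a hypothetical name; the variable name matches the workflow above):

```python
import os


def access_token_from_env(var: str = "GOOGLE_APPLICATION_ACCESS_TOKEN") -> str:
    """Read the delegated access token exported by the workflow step.

    Raises a clear error instead of a bare KeyError when the step forgot
    to pass steps.auth.outputs.access_token into the environment.
    """
    token = os.environ.get(var, "").strip()
    if not token:
        raise RuntimeError(
            f"{var} is not set; pass steps.auth.outputs.access_token to this step"
        )
    return token
```

main would then call Credentials(access_token_from_env()) instead of indexing os.environ directly.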

@sethvargo
Member

It's an output of the auth action: https://github.com/google-github-actions/auth?tab=readme-ov-file#outputs

@benglewis
Author

Right. The issue I faced was understanding how to use it together with the Python library. My above code shows how I was able to get it to work 🙂
