Azure-compatibility #610

Open
wants to merge 11 commits into
base: main

Conversation

violetbrina
Collaborator

Changes to the analysis runner to enable Azure compatibility.

@violetbrina violetbrina self-assigned this Mar 28, 2023
Contributor

@illusional illusional left a comment

Looking like awesome progress!

'-c',
'--cloud',
required=False,
default=DEFAULT_CLOUD_ENVIRONMENT,
Contributor

Should we omit this rather than provide a default? That would let the analysis-runner decide the default.
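A rough sketch of what omitting the default could look like, using `argparse` from the standard library. The parser wiring, the `build_request_body` helper, and the `cloud_environment` payload key are all assumptions for illustration; the PR's actual CLI code isn't shown in this thread.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI parser: --cloud has no client-side default."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '-c',
        '--cloud',
        required=False,
        default=None,  # None means "not specified"; the server picks the default
        help='Cloud environment to run in (omit to use the server default)',
    )
    return parser


def build_request_body(args: argparse.Namespace) -> dict:
    """Only send the cloud environment when the user explicitly set it."""
    body = {}
    if args.cloud is not None:
        # 'cloud_environment' is an assumed key name, not taken from the PR
        body['cloud_environment'] = args.cloud
    return body
```

With this shape, the client never needs to know `DEFAULT_CLOUD_ENVIRONMENT`; the server applies its own default whenever the key is absent.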

server/util.py Outdated
Comment on lines 124 to 138
if environment == 'gcp':
    # do this to check access-members cache
    gcp_project = dataset_config.get('gcp', {}).get('projectId')

    if not gcp_project:
        raise web.HTTPBadRequest(
            reason=f'The analysis-runner does not support checking group members for the {environment} environment'
        )
elif environment == 'azure':
    azure_resource_group = dataset_config.get('azure', {}).get('resourceGroup')

    if not azure_resource_group:
        raise web.HTTPBadRequest(
            reason=f'The analysis-runner does not support checking group members for the {environment} environment'
        )
Contributor

You can remove this. Group member checks are not in secrets, so no gcp_project ID is needed anymore (I think):

Suggested change
if environment == 'gcp':
    # do this to check access-members cache
    gcp_project = dataset_config.get('gcp', {}).get('projectId')
    if not gcp_project:
        raise web.HTTPBadRequest(
            reason=f'The analysis-runner does not support checking group members for the {environment} environment'
        )
elif environment == 'azure':
    azure_resource_group = dataset_config.get('azure', {}).get('resourceGroup')
    if not azure_resource_group:
        raise web.HTTPBadRequest(
            reason=f'The analysis-runner does not support checking group members for the {environment} environment'
        )

server/util.py Outdated
if environment == 'gcp':
    output_dir = f'gs://cpg-{dataset}-{cpg_namespace(access_level)}/{output_prefix}'
elif environment == 'azure':
    # TODO: need a way for analysis runner to know where to save metadata
Contributor

It follows the same sort of convention, right? The storage-account is cpg{datasetWithoutTabs}, so the path is:

azure://{storage-account}/{main,test}/{output_prefix}
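Something like the following might fill in the TODO, as a sketch only. `get_output_dir` and `azure_storage_account` are hypothetical helpers, the `cpg_namespace` here is a stand-in for the real one, and I'm assuming "datasetWithoutTabs" means stripping any characters that aren't lowercase letters or digits (Azure storage account names only allow those).

```python
import re


def cpg_namespace(access_level: str) -> str:
    """Stand-in for the real helper: assumed to map access level to 'main' or 'test'."""
    return 'test' if access_level == 'test' else 'main'


def azure_storage_account(dataset: str) -> str:
    """Hypothetical: derive the storage account name as cpg{dataset}.

    Assumption: characters other than lowercase letters and digits are
    dropped, since Azure storage account names do not permit them.
    """
    return 'cpg' + re.sub(r'[^a-z0-9]', '', dataset.lower())


def get_output_dir(
    environment: str, dataset: str, access_level: str, output_prefix: str
) -> str:
    """Build the output directory for the given cloud environment."""
    namespace = cpg_namespace(access_level)
    if environment == 'gcp':
        return f'gs://cpg-{dataset}-{namespace}/{output_prefix}'
    if environment == 'azure':
        account = azure_storage_account(dataset)
        return f'azure://{account}/{namespace}/{output_prefix}'
    raise ValueError(f'Unsupported cloud environment: {environment}')
```

Raising on an unknown environment keeps the failure explicit instead of leaving `output_dir` unset.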

import hailtop.batch as hb


@click.command()
Contributor

@illusional illusional Mar 28, 2023

There are a couple of test workflows in examples/batch. Can you use them, or move this one there?
