feat(bigquery): Configurable table read session project #10924
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request.
Hey @juanli16, thanks for the contribution. Some questions on this improvement and your use case: do you have multiple billing projects that can be targeted for different table reads, or is it a single project ID that you want to configure to match the project ID set up on the BigQuery client? We make this configuration more global to the One concern that I have here is changing the signature of the And can we add integration tests? Probably exercising a read of a public dataset table and creating a session from the main project ID that the client is set up with.
Yes, in our use case, we have the same billing project that we want to use for reading tables stored in different projects.
This makes sense to me: it will make the setting more global, and we won't have to set it per table read. And certainly, I can try to add an integration test for it.
It's more about having the right to read from a table without billing access to the project where it lives. In fact, in our use case, the service account does have permission to read the table, but no other permissions, including on the project where said table lives. It's akin to controlling in which project
When reading result sets with Storage Read API acceleration enabled, the read session is currently created in the table's project by default. This works when the destination table is not specified and is created automatically, since it then defaults to the project where the query or job was created. But when reading a table directly or specifying a destination table, it doesn't work in cases where the client doesn't have BQ Storage permissions (just table read permission, for example). This is a common use case: some customers have a main billing project that has access to other GCP projects with permission only to read data from BigQuery tables. With this PR, we default to the defined Query/Job projectID (which defaults to the current `bigquery.Client.projectID`); when reading a table directly, we also default to the `bigquery.Client.projectID`. Reported initially on PR #10924 ~Supersedes #10924~
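The defaulting rule described above can be sketched roughly as follows. This is a minimal illustration, not the library's code: `resolveSessionProject` and its parameters are hypothetical names, and an empty job project stands in for the "reading a table directly" case.

```go
package main

import "fmt"

// resolveSessionProject is a hypothetical helper (not part of the BigQuery
// client library). It returns the project in which a read session would be
// created: the Query/Job project when one is defined, otherwise the
// client's project (the direct table read case).
func resolveSessionProject(clientProjectID, jobProjectID string) string {
	if jobProjectID != "" {
		return jobProjectID
	}
	return clientProjectID
}

func main() {
	// Reading the results of a job that ran in an explicit project.
	fmt.Println(resolveSessionProject("billing-project", "job-project"))
	// Reading a table directly: fall back to the client's project.
	fmt.Println(resolveSessionProject("billing-project", ""))
}
```

Under these assumptions, the table's own project no longer decides where the session is created.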
0746c09 to 9d5da3c
Hi @alvarowolfx, I have rebased my PR on top of your change in #10932, which has been merged. Now it only introduces the
When creating a BigQuery table RowIterator with the Storage Read API enabled, the read session is created by default in the table's project. We should be able to override this, so that the storage of the table data and the permission/cost management of the read session can be kept separate.
To accomplish this, a `TableReadOption`, `WithClientProject`, is added. If set, the session is created in the client's project; otherwise, the default behaviour is kept.