-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Apache Ranger plugin #13297
Apache Ranger plugin #13297
Conversation
Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Erik Anderson.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Skimmed the code. Initial set of comments.
I think we should have product tests that would prove that KRB authentication is working.
@@ -0,0 +1,212 @@ | |||
======================= |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not a connector but system access control
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, this should likely live under /docs/src/main/sphinx/security in a file called ranger-access-control.rst
.
|
||
To connect to Apache Ranger you need: | ||
|
||
* Apache Ranger installed, up and running. Compatible with Ranger v2.1.x, 2.2.x, 2.3.x |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do we verify that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dprophet we should add some tests for this. @kokosing recommends something like TestSqlStandardAccessControlChecks and we need some environemnt with additional services, so we could follow EnvMultinodeKafka.
@@ -0,0 +1,49 @@ | |||
<?xml version="1.0"?> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How these files are used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These files are pure example files. I didnt know what/where to put them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, perhaps we could add these for use with a test and later consider making a tutorial with them.
<artifactId>ranger-plugins-common</artifactId> | ||
<version>${dep.ranger.version}</version> | ||
<exclusions> | ||
<exclusion> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure that things are still working after exclusions? It looks risky. I think we we should shade libraries instead otherwise we may get runtime errors that class is not present.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dprophet ^^ This is also something that we will validate with product tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. This works after exclusions. If you do a mvn dependency:tree you will see a deep tree where same dependencies (with different) versions exists.
bitsondatadev is correct. This can be ironed out with automated product tests.
<artifactId>maven-enforcer-plugin</artifactId> | ||
<configuration> | ||
<rules> | ||
<requireUpperBoundDeps> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please remove these excludes and fix them instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dprophet ^^
} | ||
} | ||
|
||
private void activatePluginClassLoader() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this for? Should you use Plugin thread safe entities, like io.trino.plugin.base.classloader.ClassLoaderSafeConnectorAccessControl
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is for Ranger plugins, not Trino plugins. All ranger plugins use this same setup. I kept it there because I will allow switching access control systems based on catalogs. In some cases I need a simple REGEX matching based on LDAP/AD not configured in Ranger.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not an expert on Ranger, but I've worked on connecting another service to Ranger in this style (having the implementation be part of the service code rather than part of Ranger). So this is my understanding of Ranger class loading and other intricacies:
Ranger has its own class loader which is related to the way it handles injecting code into Trino (or other services). To clarify, am I talking about the cases when a plugin is built in Ranger and then added to the classpath of another service - the way it's done in this PR is different.
In short, Ranger provides a shim class with no logic in it that implements TrinoAuthorizer
and calls out to another implementation class. The implementation class should be only loaded by RangerClassLoader
so it can bring in its own dependencies. There is a disconnect between service-specific work (in this case, reading configuration for example) and authorization work.
All that activatePluginClassLoader
does is:
public void activate() {
preActivateClassLoader.set(Thread.currentThread().getContextClassLoader());
Thread.currentThread().setContextClassLoader(this);
}
In this case, it might work to remove RangerClassLoader
completely and unite the current shim/implementation architecture since Trino has its own system for handling plugin dependencies.
@Override | ||
public Set<String> filterCatalogs(SystemSecurityContext context, Set<String> catalogs) | ||
{ | ||
LOG.debug("==> RangerSystemAccessControl.filterCatalogs(" + catalogs + ")"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use io.trino.plugin.base.util.LoggingInvocationHandler
instead of this logging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have an example of the LoggingInvocationHandler? There is no usage nor examples in Trino code base. The standard in all the code is io.airlift.log.Logger.
} | ||
|
||
viewExpression = new ViewExpression( | ||
context.getIdentity().getUser(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why current user? That means if you have recursive masks or filters then you may end up that you see filtered or masked data in subsequent expression.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the FileBasedSystemAccessControl.java getRowFilters
They also context.getIdentity().getUser() in the same way I do recall.
DISCLAIMER (off topic): Nesting/recursive row level filters is not a good idea for performance reasons
} | ||
|
||
@Override | ||
public void checkCanKillQueryOwnedBy(SystemSecurityContext context, String queryOwner) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do you tests this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have significant changes not pushed for this PR around this permissioning. I am happy to push but @bitsondatadev recommended I push on the next round. That said, there is a lot I havent pushed but are very necessary to this functionality
In the non-pushed changes:
In Ranger there is a checkbox, impersonate. If that checked, TrinoAccessType.IMPERSONATE is set, you can login to the Trino UI, click on a running query, and kill another users query. Also if you are a superuser you can impersonate any user and kill the running queries.
import static io.trino.spi.security.PrincipalType.USER; | ||
import static io.trino.spi.security.Privilege.SELECT; | ||
|
||
public class RangerSystemAccessControlTest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rename to TestRangerSystemAccessControl
It looks like most of the access control methods are not covered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like most of the access control methods are not covered.
I know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
https://ranger.apache.org/ | ||
|
||
Apache Ranger is a framework to enable comprehensive data security across many platforms. Trino is one of these platforms. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apache Ranger is a framework to enable comprehensive data security across many platforms. Trino is one of these platforms. | |
Apache Ranger is a framework to enable comprehensive data security across many platforms, including Trino. |
This plugin is designed to be build and run inside Trino. | ||
|
||
It works with vanilla Apache Ranger, version 2.2.1 and up. You dont need to customize Ranger at all. | ||
|
||
Here are the setup steps. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This plugin is designed to be build and run inside Trino. | |
It works with vanilla Apache Ranger, version 2.2.1 and up. You dont need to customize Ranger at all. | |
Here are the setup steps. | |
Apache Ranger is a framework to enable comprehensive data security across many platforms, including Trino. Rather than maintain the plugin with the Apache Ranger project, this plugin is designed to be built and run inside Trino. | |
It works with vanilla Apache Ranger, version 2.2.1 and up. You don't need to customize Ranger at all. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apache Ranger is trademarked by the ASF, so saying vanilla Apache Ranger seems a bit redundant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cbaenziger vanilla Ranger means ZERO changes in Ranger. Works out of the box. Happy to change the wording.
|
||
### Full Enterprise ready Apache Ranger setup | ||
|
||
* Install and startup Ranger |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to break this up a bit and avoid the "quick and dirty" terminology. Quick may indicate that it should be simple and make someone feel dumb if they get stuck. Dirty may indicate that it's not ready for production.
Happy to help you break this up. We'll also likely want to make some video enablement around this. That will give me a chance to actually run through and validate these steps and any potential hiccups users might encounter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll also likely want to make some video enablement around this.
Ya, I was thinking the same thing. IMO, ASF Ranger community has some poor job of documentation. Worse I have seen.
* Most of the defaults work fine. The configs you want to pay attention to are | ||
* PYTHON_COMMAND_INVOKER, DB_FLAVOR, SQL_CONNECTOR_JAR, db_root_user, db_root_password, db_host, db_name, db_user, db_password, rangerAdmin_password, rangerTagsync_password, rangerUsersync_password, keyadmin_password | ||
* $RANGER_HOME/usersync/install.properties | ||
* In most enterprise environments, ActiveDirectory/LDAP are used to manage users and their roles. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should likely have a separate section for LDAP that we reference here. That would make this section shorter and also leave it up to the reader if they want to just play around they can do as they please. I think the recommendation is sound but not everyone is ready to immediately jump into an LDAP setup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should likely have a separate section for LDAP
Agreed. This should be 2 separate setups. LDAP or stand-alone.
* User: admin, Password: (defined above in rangerAdmin_password) | ||
* Create your Trino service. | ||
* The name is important and needs to be configured in the access-control-ranger.properties file. As of version 2.2.0, just use the Presto service type. Presto, also known as PrestoSQL, is the old name for Trino. | ||
* The defult policies that Ranger installs under your above service is WIDE open. Everything works for everyone. I will not document the full ranger setup. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should document the full ranger setup somewhere in Trino land though and point to it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where?
@Override | ||
public Set<String> filterCatalogs(SystemSecurityContext context, Set<String> catalogs) | ||
{ | ||
LOG.debug("==> RangerSystemAccessControl.filterCatalogs(" + catalogs + ")"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
} | ||
|
||
@Override | ||
public void checkCanKillQueryOwnedBy(SystemSecurityContext context, String queryOwner) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
import static io.trino.spi.security.PrincipalType.USER; | ||
import static io.trino.spi.security.Privilege.SELECT; | ||
|
||
public class RangerSystemAccessControlTest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -0,0 +1,212 @@ | |||
======================= |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, this should likely live under /docs/src/main/sphinx/security in a file called ranger-access-control.rst
.
@@ -34,6 +34,7 @@ from different data sources. | |||
Pinot <connector/pinot> | |||
PostgreSQL <connector/postgresql> | |||
Prometheus <connector/prometheus> | |||
Ranger <connector/ranger> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
.. raw:: html | ||
|
||
<img src="../_static/img/apache_ranger.png" class="connector-logo"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should use the latest Ranger logo. We all love the elephant but anything I see with it now I associate with "outdated and slow".
Hey @dprophet I see now its available at Apache Ranger as well (https://github.com/apache/ranger/tree/master/plugin-trino) |
In my opinion it would still be great to get a plugin into the Trino codebase. This would ensure that it works against the SPI of the current Trino release. The plugin in the Ranger codebase is using Trino 377 at this stage. This is hopelessly out of date and imposes a lot of work on anyone who wants to use this with Trino since they would have to individually update to their version of Trino. From what I understand @dprophet is still hoping to pick this up again and continue the work. |
@dprophet any hope on ranger plugin now OPA is merged? |
@dprophet Are you still working on this PR? @mneethiraj sent a new PR #22675 |
<artifactId>jersey-json</artifactId> | ||
</exclusion> | ||
<exclusion> | ||
<groupId>com.sun.jersey</groupId> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is the server needed for ranger client?
This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua |
I think the new PR is a replacement. #22675 Closing this one. |
Description
Trino native plugin for Apache Ranger. Its a refactoring and rewrite of the Presto It allows full access controls to be applied to catalogs, schemas, tables, columns, and rows.
Strong data access control technologies are one of the required Trino features to make it Enterprise ready.
Trino is a rapidly moving open source project. Apache Ranger moves very slow. Its impossible to maintain a Trino plugin outside of Trino itself. The Trino SystemAccessControl spi interfaces change constantly thus breaking any externally maintained plugin.
If you run Trino, as a Data Mesh, ie datalakes connected to datalakes, the first problem you will run into is not all users should be entitled to the entire content of the Data Mesh. This new feature allows you to secure the contents across a vast ocean Data Mesh. If you look at the Data Mesh as a tree, Ranger allows you to protect anything from the roots to the leaves and everything in between.
Related issues, pull requests, and links
Documentation
(X) Sufficient documentation is included in this PR.
You will find documentation under plugin/trino-ranger. This will be moved to the standard documentation section later.
Release notes
(X) No release notes entries required.
( ) Release notes entries required with the following suggested text: