Apache Ranger plugin #13297

Closed
wants to merge 11 commits into from
dprophet
Contributor

Description

Trino native plugin for Apache Ranger. It's a refactoring and rewrite of the Presto plugin. It allows full access controls to be applied to catalogs, schemas, tables, columns, and rows.

Strong data access control is one of the features Trino needs to be enterprise-ready.

Is this change a fix, improvement, new feature, refactoring, or other?
New "Trino" feature. There was originally a PrestoSQL-based plugin in Apache Ranger, but it was orphaned long ago. This brings these critical enterprise features back into open source Trino.

Trino is a rapidly moving open source project. Apache Ranger moves very slowly. It's impossible to maintain a Trino plugin outside of Trino itself. The Trino SystemAccessControl SPI interfaces change constantly, which breaks any externally maintained plugin.

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)
It's a plugin that implements io.trino.spi.security.SystemAccessControl. It allows you to write SQL access control policies in the Apache Ranger UI while managing who they apply to through the ActiveDirectory/LDAP systems found in typical corporations.

How would you describe this change to a non-technical end user or system administrator?

If you run Trino as a Data Mesh, i.e. data lakes connected to data lakes, the first problem you will run into is that not all users should be entitled to the entire content of the Data Mesh. This new feature allows you to secure the contents across a vast Data Mesh. If you look at the Data Mesh as a tree, Ranger allows you to protect anything from the roots to the leaves and everything in between.

Related issues, pull requests, and links

Documentation

(X) Sufficient documentation is included in this PR.

You will find documentation under plugin/trino-ranger. This will be moved to the standard documentation section later.

Release notes

(X) No release notes entries required.
( ) Release notes entries required with the following suggested text:

cla-bot bot commented Jul 21, 2022

Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Erik Anderson.
This is most likely caused by a git client misconfiguration; please make sure to:

  1. Check if your git client is configured with an email to sign commits: git config --list | grep email
  2. If not, set it up using: git config --global user.email email@example.com
  3. Make sure that the git commit email is configured in your GitHub account settings, see https://github.com/settings/emails

@cla-bot cla-bot bot added the cla-signed label Jul 25, 2022
@github-actions github-actions bot added the docs label Jul 25, 2022
Member

@kokosing kokosing left a comment

Skimmed the code. Initial set of comments.

I think we should have product tests that would prove that KRB authentication is working.

@@ -0,0 +1,212 @@
=======================
Member

It is not a connector but a system access control.

Member

I agree, this should likely live under /docs/src/main/sphinx/security in a file called ranger-access-control.rst.


To connect to Apache Ranger you need:

* Apache Ranger installed, up and running. Compatible with Ranger v2.1.x, 2.2.x, 2.3.x
Member

How do we verify that?

Member

@dprophet we should add some tests for this. @kokosing recommends something like TestSqlStandardAccessControlChecks, and since we need an environment with additional services, we could follow EnvMultinodeKafka.

@@ -0,0 +1,49 @@
<?xml version="1.0"?>
Member

How are these files used?

Member

@dprophet ^^ I assume these files are default confs? If so, we should likely make a tutorial on these. I can also help with this. cc: @mosabua

Contributor Author

These files are pure example files. I didn't know where to put them.

Member

Yeah, perhaps we could add these for use with a test and later consider making a tutorial with them.

<artifactId>ranger-plugins-common</artifactId>
<version>${dep.ranger.version}</version>
<exclusions>
<exclusion>
Member

Are you sure that things still work after the exclusions? It looks risky. I think we should shade the libraries instead; otherwise we may get runtime errors because a class is not present.

Member

@dprophet ^^ This is also something that we will validate with product tests.

Contributor Author

Yes. This works after exclusions. If you run mvn dependency:tree you will see a deep tree where the same dependencies exist with different versions.

bitsondatadev is correct. This can be ironed out with automated product tests.
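
Editorial sketch: the concern above is that an excluded transitive dependency may only surface as a missing class at runtime. One way a test could catch that early is to try loading a list of expected class names through the plugin's class loader. This is a minimal illustration, not code from the PR; RequiredClassCheck and the class names passed to it are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Trino or Ranger: reports which of the
// given class names cannot be loaded through the supplied class loader.
final class RequiredClassCheck
{
    private RequiredClassCheck() {}

    static List<String> missingClasses(ClassLoader loader, List<String> classNames)
    {
        List<String> missing = new ArrayList<>();
        for (String name : classNames) {
            try {
                // initialize=false: only check that the class can be found and
                // linked, without running its static initializers
                Class.forName(name, false, loader);
            }
            catch (ClassNotFoundException | LinkageError e) {
                missing.add(name);
            }
        }
        return missing;
    }
}
```

A test could assert that the returned list is empty for the classes the plugin is known to touch. It does not replace the product tests discussed above, since it only covers the names it is given.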

<artifactId>maven-enforcer-plugin</artifactId>
<configuration>
<rules>
<requireUpperBoundDeps>
Member

Can you please remove these excludes and fix them instead?

}
}

private void activatePluginClassLoader()
Member

What is this for? Should you use Plugin thread safe entities, like io.trino.plugin.base.classloader.ClassLoaderSafeConnectorAccessControl?

Contributor Author

This is for Ranger plugins, not Trino plugins. All ranger plugins use this same setup. I kept it there because I will allow switching access control systems based on catalogs. In some cases I need a simple REGEX matching based on LDAP/AD not configured in Ranger.

Contributor

I'm not an expert on Ranger, but I've worked on connecting another service to Ranger in this style (having the implementation be part of the service code rather than part of Ranger). So this is my understanding of Ranger class loading and other intricacies:

Ranger has its own class loader, which is related to the way it handles injecting code into Trino (or other services). To clarify, I am talking about the cases where a plugin is built in Ranger and then added to the classpath of another service; the way it's done in this PR is different.

In short, Ranger provides a shim class with no logic in it that implements TrinoAuthorizer and calls out to another implementation class. The implementation class should be only loaded by RangerClassLoader so it can bring in its own dependencies. There is a disconnect between service-specific work (in this case, reading configuration for example) and authorization work.

All that activatePluginClassLoader does is:

    public void activate() {
        preActivateClassLoader.set(Thread.currentThread().getContextClassLoader());

        Thread.currentThread().setContextClassLoader(this);
    }

In this case, it might work to remove RangerClassLoader completely and unite the current shim/implementation architecture since Trino has its own system for handling plugin dependencies.
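
Editorial sketch: the activate() method quoted above swaps the thread context class loader and remembers the previous one. A common way to make that swap safe is to restore the previous loader in a finally block. The code below is a minimal, self-contained illustration of that pattern. ContextClassLoaderSketch and its methods are hypothetical names, not Ranger or Trino APIs.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.function.Supplier;

// Hypothetical helper, not part of Ranger or Trino: runs an action with the
// given loader installed as the thread context class loader and always
// restores the previous loader afterwards.
final class ContextClassLoaderSketch
{
    private ContextClassLoaderSketch() {}

    static <T> T withContextClassLoader(ClassLoader loader, Supplier<T> action)
    {
        Thread thread = Thread.currentThread();
        ClassLoader previous = thread.getContextClassLoader();
        thread.setContextClassLoader(loader);
        try {
            return action.get();
        }
        finally {
            // restore even if the action throws
            thread.setContextClassLoader(previous);
        }
    }

    // Stand-in for a plugin class loader; it has no URLs of its own and
    // simply delegates to the loader of this class.
    static ClassLoader newIsolatedLoader()
    {
        return new URLClassLoader(new URL[0], ContextClassLoaderSketch.class.getClassLoader());
    }
}
```

Calling withContextClassLoader(newIsolatedLoader(), ...) lets the action observe the isolated loader while leaving the calling thread's loader unchanged once it returns.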

@Override
public Set<String> filterCatalogs(SystemSecurityContext context, Set<String> catalogs)
{
LOG.debug("==> RangerSystemAccessControl.filterCatalogs(" + catalogs + ")");
Member

use io.trino.plugin.base.util.LoggingInvocationHandler instead of this logging.

Contributor Author

Do you have an example of the LoggingInvocationHandler? There is no usage nor examples in Trino code base. The standard in all the code is io.airlift.log.Logger.
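
Editorial sketch: the general idea behind an invocation-handler style of logging is to wrap the access control object in a java.lang.reflect.Proxy that logs every call, so individual methods need no LOG.debug lines. The code below illustrates that technique with JDK classes only. It is not io.trino.plugin.base.util.LoggingInvocationHandler, whose constructor and output format may differ; CatalogFilter and LoggingProxySketch are hypothetical names.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.Arrays;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical stand-in for one method of an access control interface.
interface CatalogFilter
{
    Set<String> filterCatalogs(String user, Set<String> catalogs);
}

// Hypothetical helper: wraps any interface implementation in a JDK dynamic
// proxy that reports entry and exit of each call to the supplied logger.
final class LoggingProxySketch
{
    private LoggingProxySketch() {}

    static <T> T withLogging(Class<T> type, T delegate, Consumer<String> logger)
    {
        InvocationHandler handler = (proxy, method, args) -> {
            logger.accept("==> " + method.getName() + Arrays.toString(args));
            try {
                Object result = method.invoke(delegate, args);
                logger.accept("<== " + method.getName() + " returned " + result);
                return result;
            }
            catch (InvocationTargetException e) {
                // unwrap so callers see the delegate's original exception
                logger.accept("<== " + method.getName() + " threw " + e.getCause());
                throw e.getCause();
            }
        };
        return type.cast(Proxy.newProxyInstance(type.getClassLoader(), new Class<?>[] {type}, handler));
    }
}
```

With this shape, the logger argument could be a method reference to whatever logging facility the plugin already uses, and the wrapped object is a drop-in replacement for the original.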

}

viewExpression = new ViewExpression(
context.getIdentity().getUser(),
Member

Why the current user? That means that if you have recursive masks or filters, you may end up seeing filtered or masked data in a subsequent expression.

Contributor Author

As I recall, getRowFilters in FileBasedSystemAccessControl.java also uses context.getIdentity().getUser() in the same way.

DISCLAIMER (off topic): Nesting/recursive row level filters is not a good idea for performance reasons

}

@Override
public void checkCanKillQueryOwnedBy(SystemSecurityContext context, String queryOwner)
Member

How do you test this?

Contributor Author

I have significant changes not pushed for this PR around this permissioning. I am happy to push them, but @bitsondatadev recommended I push in the next round. That said, there is a lot I haven't pushed that is necessary for this functionality.


In the non-pushed changes:

In Ranger there is a checkbox, impersonate. If it is checked, TrinoAccessType.IMPERSONATE is set, and you can log in to the Trino UI, click on a running query, and kill another user's query. Also, if you are a superuser you can impersonate any user and kill the running queries.

import static io.trino.spi.security.PrincipalType.USER;
import static io.trino.spi.security.Privilege.SELECT;

public class RangerSystemAccessControlTest
Member

rename to TestRangerSystemAccessControl

It looks like most of the access control methods are not covered.

Contributor Author

It looks like most of the access control methods are not covered.
I know.

Member

@bitsondatadev bitsondatadev left a comment

Hey @dprophet, I made a few suggestions around the documentation and highlighted a few comments that @kokosing made that need to be addressed before we loop him back in.

I am happy to work on the devex side of this PR while you address the build and testing issues.


https://ranger.apache.org/

Apache Ranger is a framework to enable comprehensive data security across many platforms. Trino is one of these platforms.
Member

Suggested change
Apache Ranger is a framework to enable comprehensive data security across many platforms. Trino is one of these platforms.
Apache Ranger is a framework to enable comprehensive data security across many platforms, including Trino.

Comment on lines +3 to +8
This plugin is designed to be build and run inside Trino.

It works with vanilla Apache Ranger, version 2.2.1 and up. You dont need to customize Ranger at all.

Here are the setup steps.

Member

Suggested change
This plugin is designed to be build and run inside Trino.
It works with vanilla Apache Ranger, version 2.2.1 and up. You dont need to customize Ranger at all.
Here are the setup steps.
Apache Ranger is a framework to enable comprehensive data security across many platforms, including Trino. Rather than maintain the plugin with the Apache Ranger project, this plugin is designed to be built and run inside Trino.
It works with vanilla Apache Ranger, version 2.2.1 and up. You don't need to customize Ranger at all.

Apache Ranger is trademarked by the ASF, so saying vanilla Apache Ranger seems a bit redundant?

Contributor Author

@cbaenziger vanilla Ranger means ZERO changes in Ranger. Works out of the box. Happy to change the wording.


### Full Enterprise ready Apache Ranger setup

* Install and startup Ranger
Member

I would like to break this up a bit and avoid the "quick and dirty" terminology. Quick may indicate that it should be simple and make someone feel dumb if they get stuck. Dirty may indicate that it's not ready for production.

Happy to help you break this up. We'll also likely want to make some video enablement around this. That will give me a chance to actually run through and validate these steps and any potential hiccups users might encounter.

Contributor Author

We'll also likely want to make some video enablement around this.

Ya, I was thinking the same thing. IMO, the ASF Ranger community has done a poor job of documentation. The worst I have seen.

* Most of the defaults work fine. The configs you want to pay attention to are
* PYTHON_COMMAND_INVOKER, DB_FLAVOR, SQL_CONNECTOR_JAR, db_root_user, db_root_password, db_host, db_name, db_user, db_password, rangerAdmin_password, rangerTagsync_password, rangerUsersync_password, keyadmin_password
* $RANGER_HOME/usersync/install.properties
* In most enterprise environments, ActiveDirectory/LDAP are used to manage users and their roles.
Member

We should likely have a separate section for LDAP that we reference here. That would make this section shorter and also leave it up to the reader if they want to just play around they can do as they please. I think the recommendation is sound but not everyone is ready to immediately jump into an LDAP setup.

Contributor Author

We should likely have a separate section for LDAP

Agreed. This should be 2 separate setups. LDAP or stand-alone.

* User: admin, Password: (defined above in rangerAdmin_password)
* Create your Trino service.
* The name is important and needs to be configured in the access-control-ranger.properties file. As of version 2.2.0, just use the Presto service type. Presto, also known as PrestoSQL, is the old name for Trino.
* The defult policies that Ranger installs under your above service is WIDE open. Everything works for everyone. I will not document the full ranger setup.
Member

We should document the full ranger setup somewhere in Trino land though and point to it.

Contributor Author

Where?

@@ -34,6 +34,7 @@ from different data sources.
Pinot <connector/pinot>
PostgreSQL <connector/postgresql>
Prometheus <connector/prometheus>
Ranger <connector/ranger>
Member

This would move here.

cc: @mosabua


.. raw:: html

<img src="../_static/img/apache_ranger.png" class="connector-logo">
Member

We should use the latest Ranger logo. We all love the elephant but anything I see with it now I associate with "outdated and slow".

Contributor

osscm commented Nov 2, 2023

Hey @dprophet
Wondering, is there any update or decision on this PR (adding the Ranger plugin to Trino itself)?

I see it's now available at Apache Ranger as well (https://github.com/apache/ranger/tree/master/plugin-trino)

Member

mosabua commented Nov 7, 2023

In my opinion it would still be great to get a plugin into the Trino codebase. This would ensure that it works against the SPI of the current Trino release. The plugin in the Ranger codebase is using Trino 377 at this stage. This is hopelessly out of date and imposes a lot of work on anyone who wants to use this with Trino since they would have to individually update to their version of Trino.

From what I understand @dprophet is still hoping to pick this up again and continue the work.

lozbrown commented Feb 5, 2024

@dprophet any hope for the Ranger plugin now that OPA is merged?

Member

ebyhr commented Jul 17, 2024

@dprophet Are you still working on this PR? @mneethiraj sent a new PR #22675

<artifactId>jersey-json</artifactId>
</exclusion>
<exclusion>
<groupId>com.sun.jersey</groupId>
Member

Why is the server needed for the Ranger client?

github-actions bot commented Sep 5, 2024

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

@github-actions github-actions bot added the stale label Sep 5, 2024
Member

mosabua commented Sep 11, 2024

I think the new PR is a replacement. #22675

Closing this one.

@mosabua mosabua closed this Sep 11, 2024
10 participants