Apache Ranger plugin #13297

Closed
wants to merge 11 commits into from
1 change: 1 addition & 0 deletions docs/src/main/sphinx/connector.rst
@@ -34,6 +34,7 @@ from different data sources.
Pinot <connector/pinot>
PostgreSQL <connector/postgresql>
Prometheus <connector/prometheus>
Ranger <connector/ranger>
Member

This would move here.

cc: @mosabua

Redis <connector/redis>
Redshift <connector/redshift>
SingleStore (MemSQL) <connector/memsql>
212 changes: 212 additions & 0 deletions docs/src/main/sphinx/connector/ranger.rst
@@ -0,0 +1,212 @@
=======================
Member

It is not a connector but system access control

Member

I agree, this should likely live under /docs/src/main/sphinx/security in a file called ranger-access-control.rst.

Apache Ranger connector
=======================

.. raw:: html

<img src="../_static/img/apache_ranger.png" class="connector-logo">
Member

We should use the latest Ranger logo. We all love the elephant but anything I see with it now I associate with "outdated and slow".


https://ranger.apache.org/

Apache Ranger is a framework to enable comprehensive data security across many platforms. Trino is one of these platforms.
Member

Suggested change
Apache Ranger is a framework to enable comprehensive data security across many platforms. Trino is one of these platforms.
Apache Ranger is a framework to enable comprehensive data security across many platforms, including Trino.


Requirements
------------

To connect to Apache Ranger you need:

* Apache Ranger installed, up and running. Compatible with Ranger v2.1.x, 2.2.x, 2.3.x
Member

How do we verify that?

Member

@dprophet we should add some tests for this. @kokosing recommends something like TestSqlStandardAccessControlChecks, and since we need an environment with additional services, we could follow EnvMultinodeKafka.

* Network access from the Trino coordinator to the Apache Ranger HTTPS port.

Configuration
-------------

The connector downloads the JSON policies from the configured Trino service in the Ranger UI editor.

Example access-control-ranger.properties file:

.. code-block:: text

access-control.name=ranger
ranger.use_ugi=true
ranger.service_name=trino-dev-system
ranger.hadoop_config=/workspace/testing/trino-server-dev/etc/trino-ranger-site.xml
ranger.audit_resource=/workspace/testing/trino-server-dev/etc/trino-ranger-audit.xml
ranger.security_resource=/workspace/testing/trino-server-dev/etc/trino-ranger-security.xml
ranger.policy_manager_ssl_resource=/workspace/testing/trino-server-dev/etc/trino-ranger-policymgr-ssl.xml

In the Trino config.properties file, set the following property so that it points to the configuration file above:

.. code-block:: text

...
access-control.config-files=/etc/trino/access-control-ranger.properties


use_ugi (aka UserGroupInformation): Tells the plugin to map users and groups together.
It's much simpler to manage groups of users than individual users. Setting this to true
is required if you are going to use corporate AD/LDAP to manage access controls.

service_name: The name of the service as defined in the Ranger UI.

hadoop_config (aka trino-ranger-site.xml): The Ranger site file required to connect
to corporate AD/LDAP systems. When a user logs in to Trino, this file is used to
connect to the AD/LDAP system and get the list of the user's groups.

audit_resource: Ranger can be configured to send audit reports to various systems.
At the moment this file is required for legacy reasons and should be cleaned up
in the future. Using all default settings is recommended. If you want auditing, use the
trino-http-event-listener and post those events to the HTTP service of your choice.
We recommend using it to post to Kafka, but it's very flexible.

security_resource (aka trino-ranger-security.xml): Configures the connectivity
between the trino-ranger plugin and the running Apache Ranger service.

policy_manager_ssl_resource (aka trino-ranger-policymgr-ssl.xml): Used to set
up two-way SSL client/server validation.


Full Enterprise ready Apache Ranger setup
-----------------------------------------

Install and startup Ranger:

* Quick and dirty setup for Ranger
Member

See README comments on how to break this up and update terminology.


* git clone Ranger. I recommend using an official tagged release. Here is an example:

* git clone --recursive --branch release-ranger-2.2.0 https://github.com/apache/ranger

* Build Ranger. Here is the command I use

.. code-block:: bash

mvn -e -X clean package -pl '!plugin-kylin,!ranger-kylin-plugin-shim' -DskipTests

* You are looking for all of the build targets under 'target/ranger-\*.tar.gz'

* Copy the main Ranger file to your binary install directory and expand it. Let's call this directory $RANGER_HOME

* The main service to install is the ranger-$RANGER_VERSION-admin.tar.gz

* Ranger has 2 configuration files that generate the full ranger install

* $RANGER_HOME/install.properties

* You need to generate this config. There is already a sample install.properties extracted from the admin.tar.gz file

* Most of the defaults work fine. The configs you want to pay attention to are

* PYTHON_COMMAND_INVOKER, DB_FLAVOR, SQL_CONNECTOR_JAR, db_root_user, db_root_password, db_host, db_name, db_user, db_password, rangerAdmin_password, rangerTagsync_password, rangerUsersync_password, keyadmin_password

* $RANGER_HOME/usersync/install.properties

* In most enterprise environments, ActiveDirectory/LDAP are used to manage users and their roles.

* I do NOT recommend trying to manage users inside of Apache Ranger UI. I would configure the usersync service to pull the groups from AD/LDAP and attach your Ranger SQL policies to those external AD/LDAP groups.

* Most of the defaults work fine. The configs you want to pay attention to are

* SYNC_SOURCE=(ldap), rangerUsersync_password (configured above), SYNC_INTERVAL, SYNC_LDAP_URL, SYNC_LDAP_BIND_DN, SYNC_LDAP_BIND_PASSWORD, SYNC_LDAP_DELTASYNC, SYNC_LDAP_SEARCH_BASE, SYNC_LDAP_USER_SEARCH_BASE, SYNC_LDAP_USER_SEARCH_SCOPE, SYNC_LDAP_USER_OBJECT_CLASS, SYNC_LDAP_USER_SEARCH_FILTER, SYNC_LDAP_USER_NAME_ATTRIBUTE, SYNC_LDAP_USER_GROUP_NAME_ATTRIBUTE, SYNC_LDAP_USERNAME_CASE_CONVERSION, SYNC_LDAP_GROUPNAME_CASE_CONVERSION

* SYNC_INTERVAL specifies how often, in minutes, Ranger synchronizes with AD/LDAP. A sync can take some time, so don't make the interval too short. 360 (6 hours) is a good number.

* With both the $RANGER_HOME/install.properties and $RANGER_HOME/usersync/install.properties configured run the setup scripts

* $RANGER_HOME/setup.sh

* $RANGER_HOME/usersync/setup.sh

* Now try and start the main ranger service.

* $RANGER_HOME/ews/start-ranger-admin.sh

* I highly recommend monitoring the Ranger PID, /run/ranger/rangeradmin.pid, and restarting the service if it should fail. If using Docker/Kubernetes, a simple bash script like this works well

.. code-block:: bash

   # Read the PID; the redirect must apply to cat, not to the assignment.
   ranger_admin_pid=$(cat /run/ranger/rangeradmin.pid 2>/dev/null)
   echo "${0##*/}:$LINENO: Waiting for ranger_admin_pid = $ranger_admin_pid"
   # Loop while the process exists and is not a zombie.
   while s=$(ps -p "$ranger_admin_pid" -o s=) && [[ "$s" && "$s" != 'Z' ]]; do
       sleep 1
   done
   echo "${0##*/}:$LINENO: Ranger admin service exited!!!"

* Now try and start the AD/LDAP usersync service

* $RANGER_HOME/usersync/ranger-usersync-services.sh start

* NOTE: If you are running multiple instances of the Ranger UI, you should only ever have 1 and only 1 AD/LDAP usersync service running. Your groups will not properly sync otherwise.

* Now login to the ranger UI and configure the Trino service.

* Example:

* http://localhost:6080

* User: admin, Password: (defined above in rangerAdmin_password)

* Create your Trino service.

* The name is important and needs to be configured in the access-control-ranger.properties file. As of version 2.2.0, just use the Presto service type. Presto, also known as PrestoSQL, is the old name for Trino.

* The default policies that Ranger installs under your service are WIDE open. Everything works for everyone. I will not document the full Ranger setup.

* Just remember, everything a JDBC driver sees, ranger also sees. This means even simple things like date_time functions will break in a fully locked down environment.


Connect Trino to Ranger
-----------------------

* Connecting Trino to Ranger involves 5 files.
* Example files are checked in to the Trino project under plugin/trino-ranger/conf/

* Main Trino config.properties

* access-control.config-files=/usr/lib/trino/etc/access-control-ranger.properties

* Ranger will lock down EVERYTHING; no user can see another user's queries. If you need a system-wide user inside Trino that can see problems across the cluster, you should add an access-control-file-based.properties to the above comma-separated list.

* The access-control-ranger.properties file itself. Here is an example

.. code-block:: text

access-control.name=ranger
ranger.use_ugi=true
ranger.service_name=trino-dev-companyname-com
ranger.hadoop_config=/workspace/testing/trino-server-dev/etc/trino-ranger-site.xml
ranger.audit_resource=/workspace/testing/trino-server-dev/etc/ranger-trino-audit.xml
ranger.security_resource=/workspace/testing/trino-server-dev/etc/ranger-trino-security.xml
ranger.policy_manager_ssl_resource=/workspace/testing/trino-server-dev/etc/ranger-policymgr-ssl.xml

* The ranger.service_name is the name of the service you created under the Ranger UI

* ranger.hadoop_config=

* Example file: trino-ranger-site.xml

* Hopefully using Hadoop is a temporary option. Hadoop is a very heavyweight system just to load config files.

* When a user logs in and executes their first SQL statement, this file is used to pull the list of the user's groups into Trino. This group list is matched against the SQL policies you set up in the Ranger UI.

* ranger.audit_resource

* Example file: ranger-trino-audit.xml

* I don't recommend this, but it's up to you. It was an original Ranger feature. As an alternative, try the trino-http-event-listener and send every incoming SQL query to a Kafka pipe. This way you can take the SQL contents and ingest them into any system you want. Real-time alerting is easier using Kafka pipes.

* ranger.security_resource=

* Example file: ranger-trino-security.xml

* This is the main file that maps Trino and Ranger together. Only the 2 settings below are critical; for the rest of the configuration the defaults are fine.

* ranger.plugin.trino.service.name is the same as the ranger.service_name entry.

* ranger.plugin.trino.policy.rest.url is the URL of the Ranger admin service.

* ranger.policy_manager_ssl_resource=

* Example file: ranger-policymgr-ssl.xml

* Defaults are fine. If you set up two-way SSL verification then you will need to manage key expirations.
Binary file added docs/src/main/sphinx/static/img/apache_ranger.png
95 changes: 95 additions & 0 deletions plugin/trino-ranger/README.md
@@ -0,0 +1,95 @@
# Trino-Ranger plugin

This plugin is designed to be build and run inside Trino.

It works with vanilla Apache Ranger, version 2.2.1 and up. You dont need to customize Ranger at all.

Here are the setup steps.

Comment on lines +3 to +8
Member

Suggested change
This plugin is designed to be build and run inside Trino.
It works with vanilla Apache Ranger, version 2.2.1 and up. You dont need to customize Ranger at all.
Here are the setup steps.
Apache Ranger is a framework to enable comprehensive data security across many platforms, including Trino. Rather than maintain the plugin with the Apache Ranger project, this plugin is designed to be built and run inside Trino.
It works with vanilla Apache Ranger, version 2.2.1 and up. You don't need to customize Ranger at all.


Apache Ranger is trademarked by the ASF, so saying vanilla Apache Ranger seems a bit redundant?

Contributor Author

@cbaenziger vanilla Ranger means ZERO changes in Ranger. Works out of the box. Happy to change the wording.


### Full Enterprise ready Apache Ranger setup

* Install and startup Ranger
Member

I would like to break this up a bit and avoid the "quick and dirty" terminology. Quick may indicate that it should be simple and make someone feel dumb if they get stuck. Dirty may indicate that it's not ready for production.

Happy to help you break this up. We'll also likely want to make some video enablement around this. That will give me a chance to actually run through and validate these steps and any potential hiccups users might encounter.

Contributor Author

> We'll also likely want to make some video enablement around this.

Ya, I was thinking the same thing. IMO, the ASF Ranger community has done a poor job of documentation. The worst I have seen.

* Quick and dirty setup for Ranger
* git clone Ranger. I recommend using an official tagged release. Here is an example:
* git clone --recursive --branch release-ranger-2.2.0 https://github.com/apache/ranger
* Build Ranger. Here is the command I use
* mvn -e -X clean package -pl '!plugin-kylin,!ranger-kylin-plugin-shim' -DskipTests
* You are looking for all of the build targets under "target/ranger-*.tar.gz"
* Copy the main Ranger file to your binary install directory and expand it. Let's call this directory $RANGER_HOME
* The main service to install is the ranger-$RANGER_VERSION-admin.tar.gz
* Ranger has 2 configuration files that generate the full ranger install
* $RANGER_HOME/install.properties
* You need to generate this config. There is already a sample install.properties extracted from the admin.tar.gz file
* Most of the defaults work fine. The configs you want to pay attention to are
* PYTHON_COMMAND_INVOKER, DB_FLAVOR, SQL_CONNECTOR_JAR, db_root_user, db_root_password, db_host, db_name, db_user, db_password, rangerAdmin_password, rangerTagsync_password, rangerUsersync_password, keyadmin_password
* $RANGER_HOME/usersync/install.properties
* In most enterprise environments, ActiveDirectory/LDAP are used to manage users and their roles.
Member

We should likely have a separate section for LDAP that we reference here. That would make this section shorter and also leave it up to the reader if they want to just play around they can do as they please. I think the recommendation is sound but not everyone is ready to immediately jump into an LDAP setup.

Contributor Author

> We should likely have a separate section for LDAP

Agreed. This should be 2 separate setups. LDAP or stand-alone.

* I do NOT recommend trying to manage users inside of Apache Ranger UI. I would configure the usersync service to pull the groups from AD/LDAP and attach your Ranger SQL policies to those external AD/LDAP groups.
* Most of the defaults work fine. The configs you want to pay attention to are
* SYNC_SOURCE=(ldap), rangerUsersync_password (configured above), SYNC_INTERVAL, SYNC_LDAP_URL, SYNC_LDAP_BIND_DN, SYNC_LDAP_BIND_PASSWORD, SYNC_LDAP_DELTASYNC, SYNC_LDAP_SEARCH_BASE, SYNC_LDAP_USER_SEARCH_BASE, SYNC_LDAP_USER_SEARCH_SCOPE, SYNC_LDAP_USER_OBJECT_CLASS, SYNC_LDAP_USER_SEARCH_FILTER, SYNC_LDAP_USER_NAME_ATTRIBUTE, SYNC_LDAP_USER_GROUP_NAME_ATTRIBUTE, SYNC_LDAP_USERNAME_CASE_CONVERSION, SYNC_LDAP_GROUPNAME_CASE_CONVERSION
* SYNC_INTERVAL specifies how often, in minutes, Ranger synchronizes with AD/LDAP. A sync can take some time, so don't make the interval too short. 360 (6 hours) is a good number.
* With both the $RANGER_HOME/install.properties and $RANGER_HOME/usersync/install.properties configured run the setup scripts
* $RANGER_HOME/setup.sh
* $RANGER_HOME/usersync/setup.sh
* Now try and start the main ranger service.
* $RANGER_HOME/ews/start-ranger-admin.sh
* I highly recommend monitoring the Ranger PID, /run/ranger/rangeradmin.pid, and restarting the service if it should fail. If using Docker/Kubernetes, a simple bash script like this works well
```
# Read the PID; the redirect must apply to cat, not to the assignment.
ranger_admin_pid=$(cat /run/ranger/rangeradmin.pid 2>/dev/null)

echo "${0##*/}:$LINENO: Waiting for ranger_admin_pid = $ranger_admin_pid"

# Loop while the process exists and is not a zombie.
while s=$(ps -p "$ranger_admin_pid" -o s=) && [[ "$s" && "$s" != 'Z' ]]; do
    sleep 1
done

echo "${0##*/}:$LINENO: Ranger admin service exited!!!"
```
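
The script above only waits for the admin process to exit. A minimal sketch of the restart half of that recommendation might look like the following. The function names `pid_is_running` and `supervise_ranger_admin` and the 5-second poll interval are illustrative assumptions, not part of Ranger or of this plugin, and `ps -o stat=` is assumed to be available (it is on procps and BSD `ps`).

```shell
#!/usr/bin/env bash
# Sketch only: helper names and poll interval are hypothetical.

# Return 0 if the given PID exists and is not a zombie.
pid_is_running() {
    local pid=$1 state
    # Reject empty or non-numeric input (e.g. a missing PID file).
    [[ $pid =~ ^[0-9]+$ ]] || return 1
    state=$(ps -p "$pid" -o stat= 2>/dev/null) || return 1
    state=${state//[[:space:]]/}
    [[ -n $state && $state != Z* ]]
}

# Poll the PID file and run the start command whenever the process is gone.
# Arguments: PID file path, start command.
supervise_ranger_admin() {
    local pid_file=$1 start_cmd=$2 pid
    while true; do
        pid=$(cat "$pid_file" 2>/dev/null)
        if ! pid_is_running "$pid"; then
            echo "Ranger admin is not running; starting it" >&2
            "$start_cmd" || echo "start command failed" >&2
        fi
        sleep 5
    done
}
```

With the paths used in this README it could be started as `supervise_ranger_admin /run/ranger/rangeradmin.pid "$RANGER_HOME/ews/start-ranger-admin.sh"`. In Kubernetes, letting the container exit and relying on the pod restart policy may be the simpler choice.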

* Now try and start the AD/LDAP usersync service
* $RANGER_HOME/usersync/ranger-usersync-services.sh start
* NOTE: If you are running multiple instances of the Ranger UI, you should only ever have 1 and only 1 AD/LDAP usersync service running. Your groups will not properly sync otherwise.
* Now login to the ranger UI and configure the Trino service.
* Example:
* http://localhost:6080
* User: admin, Password: (defined above in rangerAdmin_password)
* Create your Trino service.
* The name is important and needs to be configured in the access-control-ranger.properties file. As of version 2.2.0, just use the Presto service type. Presto, also known as PrestoSQL, is the old name for Trino.
* The default policies that Ranger installs under your service are WIDE open. Everything works for everyone. I will not document the full Ranger setup.
Member

We should document the full ranger setup somewhere in Trino land though and point to it.

Contributor Author

Where?

* Just remember, everything a JDBC driver sees, ranger also sees. This means even simple things like date_time functions will break in a fully locked down environment.
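
Before running the setup scripts it can help to confirm that the properties called out above are actually set. The following is a small sketch, assuming install.properties uses plain `key=value` lines; the helper name `missing_keys` is hypothetical and not something Ranger ships.

```shell
#!/usr/bin/env bash
# Sketch only: print every requested key that is absent or empty in a
# key=value properties file. Commented-out lines are ignored.
# Usage: missing_keys FILE KEY...
missing_keys() {
    local file=$1 key value
    shift
    for key in "$@"; do
        # Take the last assignment for the key; tolerate "no match".
        value=$(grep -E "^[[:space:]]*${key}[[:space:]]*=" "$file" \
            | tail -n 1 | cut -d= -f2- || true)
        # Drop whitespace so a value of only spaces counts as empty.
        value=${value//[[:space:]]/}
        [[ -n $value ]] || echo "$key"
    done
}
```

For example, `missing_keys "$RANGER_HOME/install.properties" DB_FLAVOR SQL_CONNECTOR_JAR db_root_user db_root_password db_host` prints one line per key that still needs a value and nothing when all of them are set.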


### Connect Trino to Ranger

* Connecting Trino to Ranger involves 5 files.
* Main Trino config.properties
* access-control.config-files=/usr/lib/trino/etc/access-control-ranger.properties
* Ranger will lock down EVERYTHING; no user can see another user's queries. If you need a system-wide user inside Trino that can see problems across the cluster, you should add an access-control-file-based.properties to the above comma-separated list.
* The access-control-ranger.properties file itself. Here is an example
```
access-control.name=ranger
ranger.use_ugi=true
ranger.service_name=trino-dev-companyname-com
ranger.hadoop_config=/workspace/testing/trino-server-dev/etc/trino-ranger-site.xml
ranger.audit_resource=/workspace/testing/trino-server-dev/etc/ranger-trino-audit.xml
ranger.security_resource=/workspace/testing/trino-server-dev/etc/ranger-trino-security.xml
ranger.policy_manager_ssl_resource=/workspace/testing/trino-server-dev/etc/ranger-policymgr-ssl.xml
```
* The ranger.service_name is the name of the service you created under the Ranger UI
* ranger.hadoop_config=
* Example file: plugin/trino-ranger/conf/trino-ranger-site.xml
* Hopefully using Hadoop is a temporary option. Hadoop is a very heavyweight system just to load config files.
* When a user logs in and executes their first SQL statement, this file is used to pull the list of the user's groups into Trino. This group list is matched against the SQL policies you set up in the Ranger UI.
* ranger.audit_resource
* Example file: plugin/trino-ranger/conf/ranger-trino-audit.xml
* I don't recommend this, but it's up to you. It was an original Ranger feature. As an alternative, try the trino-http-event-listener and send every incoming SQL query to a Kafka pipe. This way you can take the SQL contents and ingest them into any system you want. Real-time alerting is easier using Kafka pipes.
* ranger.security_resource=
* Example file: plugin/trino-ranger/conf/ranger-trino-security.xml
* This is the main file that maps Trino and Ranger together. Only the 2 settings below are critical; for the rest of the configuration the defaults are fine.
* ranger.plugin.trino.service.name is the same as the ranger.service_name entry.
* ranger.plugin.trino.policy.rest.url is the URL of the Ranger admin service.
* ranger.policy_manager_ssl_resource=
* Example file: plugin/trino-ranger/conf/ranger-policymgr-ssl.xml
* Defaults are fine. Java keystore is a pain, but if you feel it's necessary feel free to set it up.
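
Since the four `ranger.*` resource properties above are plain file paths, a quick pre-flight check can catch a typo before Trino starts. This is a sketch under the assumption that access-control-ranger.properties uses `key=value` lines exactly as in the example; the helper name `missing_resources` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: report each ranger.* resource property whose value is unset
# or points at a file that does not exist.
# Usage: missing_resources /path/to/access-control-ranger.properties
missing_resources() {
    local file=$1 key path
    for key in ranger.hadoop_config ranger.audit_resource \
               ranger.security_resource ranger.policy_manager_ssl_resource; do
        # Last assignment wins; tolerate a key that is not present.
        path=$(grep "^${key}=" "$file" | tail -n 1 | cut -d= -f2- || true)
        [[ -n $path && -e $path ]] || echo "$key: ${path:-<unset>}"
    done
}
```

Running `missing_resources /usr/lib/trino/etc/access-control-ranger.properties` prints nothing when every referenced file is present, and one line per problem otherwise.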

49 changes: 49 additions & 0 deletions plugin/trino-ranger/conf/ranger-policymgr-ssl.xml
@@ -0,0 +1,49 @@
<?xml version="1.0"?>
Member

How are these files used?

Member

@dprophet ^^ I assume these files are default confs? If so, we should likely make a tutorial on these. I can also help with this. cc: @mosabua

Contributor Author

These files are pure example files. I didn't know where to put them.

Member

Yeah, perhaps we could add these for use with a test and later consider making a tutorial with them.

<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration xmlns:xi="http://www.w3.org/2001/XInclude">
<!-- The following properties are used for 2-way SSL client server validation -->
<property>
<name>xasecure.policymgr.clientssl.keystore</name>
<value>trinoservice-clientcert.jks</value>
<description>
Java Keystore files
</description>
</property>
<property>
<name>xasecure.policymgr.clientssl.truststore</name>
<value>cacerts-xasecure.jks</value>
<description>
java truststore file
</description>
</property>
<property>
<name>xasecure.policymgr.clientssl.keystore.credential.file</name>
<value>jceks://file/tmp/keystore-trinoservice-ssl.jceks</value>
<description>
java keystore credential file
</description>
</property>
<property>
<name>xasecure.policymgr.clientssl.truststore.credential.file</name>
<value>jceks://file/tmp/truststore-trinoservice-ssl.jceks</value>
<description>
java truststore credential file
</description>
</property>
</configuration>