Dynamic Security Controls - Apache Ranger Integration
In this tutorial, we will walk through an overview of Ranger-based policy enforcement, exercise the different permissions you can grant to Dremio users when using Ranger, and finally demonstrate how to implement row-level security controls.
Keep in mind that you will need both Hive and Ranger in your environment, as well as an accessible AD/LDAP server. We have also made a video available if you would rather watch the workflow unfold.
The steps we are about to show work for both the Enterprise and Community editions of Dremio.
To get the most out of this tutorial, we recommend that you first complete the Getting Oriented to Dremio and Working with Your First Dataset tutorials. In addition, we will be using Ranger 0.7 and the latest release of Dremio.
For this tutorial, we will be using the same AD/LDAP server for both Ranger and Dremio, since we will be utilizing the same users and groups. We are also going to use pre-defined Ranger policies as follows:
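For reference, Ranger stores policies of this kind as JSON. Below is a minimal sketch of what one entry of our hive_site_1_policies profile could look like; the service name, policy name, and field values shown are illustrative, and only the grant of the Customers table to the Bharri user is taken from this tutorial:

```json
{
  "service": "hive_site_1",
  "name": "production-customers-select",
  "resources": {
    "database": { "values": ["production"] },
    "table":    { "values": ["customers"] },
    "column":   { "values": ["*"] }
  },
  "policyItems": [
    {
      "users": ["bharri"],
      "accesses": [{ "type": "select", "isAllowed": true }]
    }
  ]
}
```

A policy like this is what the Ranger plug-in consults when Dremio asks whether a given user may read a given Hive table.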
In addition, we will be using a fictitious 'Production' database composed of 20 tables located in Hive; for this tutorial, however, we will apply the policies only to the four tables listed above. You will also have the opportunity to see how we interact with Ranger's audit screen, as well as how Ranger security is enabled in Dremio when you connect to a data source.
Ranger Architectural Overview
In this diagram, we see the Dremio cluster on the right and the Hive environment on the left. In a traditional Hive deployment, the metastore contains all the information about the data stored in HDFS (stats, files, rows, tables, and columns), and Hive Server 2 instances serve ODBC and JDBC requests from above.
Each Hive Server 2 instance has a Ranger plug-in that talks to the Ranger server, which authorizes access to Hive resources according to the Ranger policies created and managed on that server. On the other side, the Dremio cluster (shown on the right) is running in a YARN deployment composed of Coordinator and Executor nodes. In this mode, Dremio integrates with the YARN resource manager to secure compute resources in a shared, multi-tenant environment. This integration allows enterprises to more easily deploy Dremio on a Hadoop cluster, including the ability to shrink or expand resources on demand.
Coordinator nodes are responsible for query planning, web user interface and handling client connections. Executor nodes are largely responsible for query execution and most of the heavy lifting.
The new component for Dremio is the Ranger plug-in, which is installed on the Coordinator node. It allows the Coordinator to communicate with Ranger and to allow or deny access to HDFS resources based on the policies defined on the Ranger server. In this mode, client queries come in via JDBC or ODBC to the Dremio Coordinator, and the access they request is checked against the Ranger policies. Based on that authorization, the Executor node(s) are permitted to access the source and execute the query for the user. The results are then sent back to the Coordinator so they can be presented in the UI.
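To make this flow concrete, here is a small self-contained Python sketch of the decision the Coordinator-side plug-in makes before handing a query to the Executors. This is a toy model, not Dremio or Ranger code; the policy table, users, and groups are illustrative (the grants shown mirror the ones used later in this tutorial):

```python
# Toy model of the Coordinator-side authorization check described above.
# The policy table and user/group names are illustrative, not real Ranger data.

POLICIES = {
    # (database, table) -> set of users/groups allowed to SELECT
    ("production", "customers"): {"bharri"},
    ("production", "vehicles"): {"engineering"},
    ("production", "engines"): {"engineering"},
}

GROUPS = {"cjohan": {"engineering"}}  # AD/LDAP-style group membership

def is_authorized(user: str, database: str, table: str) -> bool:
    """Mimic the plug-in: allow if the user, or any of their groups,
    appears in a matching policy; otherwise deny."""
    allowed = POLICIES.get((database, table), set())
    principals = {user} | GROUPS.get(user, set())
    return bool(allowed & principals)

def run_query(user: str, database: str, table: str) -> str:
    # The Coordinator checks the policy before Executors touch HDFS.
    if not is_authorized(user, database, table):
        return "ACCESS DENIED"
    return f"executing SELECT * FROM {database}.{table} for {user}"

print(run_query("cjohan", "production", "engines"))    # allowed via group
print(run_query("cjohan", "production", "customers"))  # denied
```

The real plug-in evaluates much richer policy semantics (deny conditions, wildcards, column-level rules), but the allow-or-deny decision point sits in the same place: on the Coordinator, before any Executor reads data.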
Dremio and Ranger Security Interaction
In the Production database that we will be using for this tutorial, we are going to focus on the four tables indicated below.
Additionally, we have created a security profile, hive_site_1_policies, in Ranger that maps directly to the permissions we previously defined.
In this part of the tutorial, we are going to make sure that the Ranger policies are being enforced by Dremio. As we can see in the image above, Bharri should only have access to the Customers table, and members of the Marketing group should only have access to the tables granted to them in the policy chart.
As you can see, here we are logged in as Cjohan. This user should have access to both the Vehicles and Engines tables, since they are a member of the Engineering group. Sure enough, the policy appears to be enforced correctly: Cjohan has been granted access to the Engines table because they belong to the Engineering group. However, to double-check the enforcement of the policy, let's try to access a dataset that Cjohan has not been granted access to.
While logged in as Cjohan, when we try to access the Customers table, Dremio presents an access-denied warning. At this point we can conclude that the policy was enforced correctly. The same behavior would be expected if this user tried to access any other dataset they have not been granted access to.
Now let's go ahead and try the same procedure using the Bharri username, which in this case should only have access to the Customers table. Let's head back to the list of available tables and select Customers.
Sure enough, we can see that Bharri has access to the Customers table. Now let's try a different one to double-check the validity of this policy.
As expected, and based on the security policy pre-defined in Ranger, Bharri does not have access to this dataset.
Now, if we head back to the audit screen in Ranger, we can observe that the access attempts we ran from the Dremio interface are recorded, reflecting the result (access granted or denied) for each event. These results map directly to the policies originally defined in Ranger.
Each one of these policies can be edited within Ranger to grant or deny access to the available users and groups mapped from the AD/LDAP server.
Data Reflections Interoperability
Now, let's switch tracks for a moment and talk about Data Reflections before we move into the next section of this tutorial. A Data Reflection is a materialized view of a dataset; in essence, it is a way for Dremio to pre-compute a physical representation of the data that is optimized for various query patterns.
Reflections largely replace the cubes, extracts, and aggregation tables that many users have to create to access and accelerate their data. Because they behave like an index on the data, Data Reflections are invisible to the user: they do not need to be managed by the user and are not treated as another copy of the data, so they reduce the costly overhead of moving and managing data copies. Since they are created and maintained in the background, Dremio users benefit from them immediately without even having to know they exist.
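As a loose analogy (this is not Dremio's actual implementation, and all names here are illustrative), a Reflection behaves like a summary that is pre-computed once in the background and then transparently substituted for matching queries, so the user never has to know it exists:

```python
# Toy analogy for an aggregation Reflection: pre-compute a summary once,
# then transparently serve matching queries from it. Data is illustrative.

SALES = [
    {"region": "east", "amount": 100},
    {"region": "east", "amount": 50},
    {"region": "west", "amount": 75},
]

# "Reflection": a physical representation optimized for one query pattern,
# built and maintained in the background.
reflection = {}
for row in SALES:
    reflection[row["region"]] = reflection.get(row["region"], 0) + row["amount"]

def total_by_region(region: str) -> int:
    # The user simply queries the dataset; the engine decides whether to
    # answer from the reflection instead of rescanning the raw data.
    if region in reflection:
        return reflection[region]  # accelerated path
    return sum(r["amount"] for r in SALES if r["region"] == region)

print(total_by_region("east"))  # 150, served from the pre-computed summary
```

The key property mirrored here is transparency: the caller's query does not change whether or not the pre-computed summary exists.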
What does this mean in a scenario where users don't have access to some of the data that was used to create a Reflection? Data Reflections are a powerful shared asset in Dremio; however, they do not allow users who lack access to the underlying data under the Ranger policies to view datasets served through a Data Reflection.
In our demo scenario, we’ve made some changes to the policies to reflect the following permissions chart:
Now we want to demonstrate that Data Reflections adhere to Ranger security policies. Let's head back to Dremio, log in as Bharri, and try to access the Customers dataset, which according to the chart this user should have access to.
The flame icon next to the table name indicates that this dataset was accelerated by a Data Reflection. When we navigate to the Jobs screen, we can see that the query was indeed accelerated by a raw Reflection rather than being pushed down to the data source.
Now, if we log in as Cjohan and try to access the same dataset, we can confirm that this user does not gain access to it through the Data Reflection that was generated.
In this tutorial we did a complete evaluation of how security policies defined in Ranger and inherited by Dremio are effectively enforced for users. We also saw that these policies continue to be enforced when Data Reflections are in use. This exercise demonstrates how easy, and how trustworthy, it is to accelerate your BI analysis while keeping your data safe with robust security technologies like Ranger in Dremio.