Dynamic Security Controls – Apache Ranger Integration

   

Note: Dremio only supports one data governance policy manager at a time, so you can use either Dremio or Ranger as a policy manager but not both at the same time.

In this tutorial, we will give an overview of Ranger-based policy enforcement, exercise the different permissions that you can grant to Dremio users when using Ranger, and, last but not least, demonstrate how to implement row-level security controls.

Keep in mind that you will need Hive and Ranger in your environment, as well as an accessible AD/LDAP server. We've also made a video available if you would rather watch the workflow unfold.

The steps that we are about to show work for both the Enterprise and Community editions of Dremio.

Environment Setup

To get the most out of this tutorial, we recommend that you first follow the Getting Oriented to Dremio and Working With Your First Dataset tutorials. We will be using Ranger 0.7 and the latest release of Dremio.

For this tutorial, we will be using the same AD/LDAP for both Ranger and Dremio since we will be utilizing the same users and groups. We are also going to use pre-defined Ranger policies as follows:

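| User | Member of | Vehicles | Engines (Engineering group) | Customers (Marketing group) | Sales |
|---|---|---|---|---|---|
| cjohan | Engineering | Yes | Yes | No | No |
| bharri | Marketing | No | No | Yes | Yes |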

In addition, we will be using a fictitious 'Production' database composed of 20 tables located in Hive; for this tutorial, however, we will apply the policies only to the four tables listed above. You will also see how we interact with Ranger's audit screen, and how Ranger security is enabled in Dremio once you connect to a data source.

Ranger Architectural Overview

In this diagram, we see the Dremio cluster on the right and the Hive environment on the left. In a traditional Hive deployment, the metastore holds the metadata about the data stored in HDFS (statistics, files, rows, tables, and columns), and HiveServer2 instances serve ODBC and JDBC requests from client applications.

Each HiveServer2 instance runs a Ranger plug-in that talks to the Ranger server, which authorizes access to Hive resources according to the Ranger policies created and managed there. On the other side of the diagram, the Dremio cluster runs as a YARN deployment composed of Coordinator and Executor nodes. In this mode, Dremio integrates with the YARN ResourceManager to acquire compute resources in a shared, multi-tenant environment. This integration makes it easier for enterprises to run Dremio on a Hadoop cluster, including the ability to shrink or expand resources on demand.

Coordinator nodes are responsible for query planning, web user interface and handling client connections. Executor nodes are largely responsible for query execution and most of the heavy lifting.

The new component on the Dremio side is the Ranger plug-in, which is installed on the Coordinator node. It allows the Coordinator to communicate with the Ranger server and to allow or deny access to Hive resources based on the policies defined there. In this mode, client queries arrive at the Dremio Coordinator via JDBC or ODBC, and each request is checked against the Ranger policies. If the request is authorized, the Executor node(s) are permitted to access the source and execute the query on the user's behalf. The results are then sent back to the Coordinator so they can be presented in the UI.
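The request flow described above can be sketched as follows. This is a minimal Python model, not Dremio's implementation: `Coordinator`, `AccessDenied`, `is_authorized`, and `run_on_executors` are hypothetical names, and the Ranger plug-in and the executor pool are stood in for by plain functions. What it illustrates is the ordering: the authorization check runs on the coordinator before any work is dispatched to executors.

```python
from dataclasses import dataclass
from typing import Callable, List


class AccessDenied(Exception):
    """Raised when the (hypothetical) authorizer rejects a request."""


@dataclass
class Coordinator:
    # Stand-in for the Ranger plug-in on the coordinator:
    # returns True if `user` may read `table`.
    is_authorized: Callable[[str, str], bool]
    # Stand-in for the executor nodes: scans `table` and returns rows.
    run_on_executors: Callable[[str], List[str]]

    def handle_query(self, user: str, table: str) -> List[str]:
        # The authorization check happens on the coordinator, before any
        # work is dispatched to executors.
        if not self.is_authorized(user, table):
            raise AccessDenied(f"{user} is not allowed to read {table}")
        # Only authorized requests reach the executors; their results flow
        # back through the coordinator to the client or UI.
        return self.run_on_executors(table)


# Example wiring with a toy policy and a toy executor function.
dispatched: List[str] = []


def toy_policy(user: str, table: str) -> bool:
    return (user, table) in {("cjohan", "vehicles"), ("cjohan", "engines")}


def toy_executors(table: str) -> List[str]:
    dispatched.append(table)  # record which scans actually ran
    return [f"row from {table}"]


coordinator = Coordinator(toy_policy, toy_executors)
```

With this wiring, `coordinator.handle_query("cjohan", "vehicles")` returns `["row from vehicles"]`, while a request for `"customers"` raises `AccessDenied` and never reaches the toy executor function.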

Dremio and Ranger Security Interaction

In the Production database used for this tutorial, we are going to focus on the four tables from the permissions table above.

Additionally, we have created a security profile hive_site_1_policies in Ranger that maps directly to the permissions that we have previously defined.

In this part of the tutorial, we are going to verify that Dremio enforces the Ranger policies. As we can see in the image above, Cjohan and Bharri should have user-level access only to Vehicles and Sales, respectively, while the Engineering and Marketing groups should have access only to Engines and Customers, respectively.
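The permissions in the chart can be modelled as the union of user-level and group-level grants. The sketch below assumes a much simplified policy shape (a set of readable tables per user and per group, with anything not granted denied); the dictionary and function names are hypothetical and do not reflect Ranger's policy schema.

```python
from typing import Dict, Set

# Hypothetical, simplified stand-in for the hive_site_1_policies grants.
# Only "may read this table" is modelled; real Ranger policies also carry
# resource patterns, access types, and conditions.
USER_GROUPS: Dict[str, Set[str]] = {"cjohan": {"Engineering"}, "bharri": {"Marketing"}}
USER_GRANTS: Dict[str, Set[str]] = {"cjohan": {"vehicles"}, "bharri": {"sales"}}
GROUP_GRANTS: Dict[str, Set[str]] = {"Engineering": {"engines"}, "Marketing": {"customers"}}


def allowed_tables(user: str) -> Set[str]:
    """Tables granted to the user directly plus those granted via the user's groups."""
    tables = set(USER_GRANTS.get(user, set()))
    for group in USER_GROUPS.get(user, set()):
        tables |= GROUP_GRANTS.get(group, set())
    return tables


def is_allowed(user: str, table: str) -> bool:
    # In this sketch, anything not explicitly granted is denied.
    return table.lower() in allowed_tables(user)
```

For example, `allowed_tables("cjohan")` yields `{"vehicles", "engines"}`, and `is_allowed("cjohan", "Customers")` is `False`, which matches the results shown in the walkthrough.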

As you can see here, we are logged in as Cjohan.

This user should have access to the Vehicles table and, as a member of the Engineering group, to the Engines table as well.

Sure enough, the policy is enforced correctly: Cjohan has been granted access to the Engines table because he belongs to the Engineering group. To double-check the enforcement of the policy, however, let's also try to access a dataset that Cjohan has not been granted access to.

While logged in as Cjohan, we try to access the Customers table, and Dremio returns an access denied warning. At this point, we can conclude that the policy is enforced correctly. We would expect the same behavior if this user tried to access the Sales table.

Now let's repeat the same procedure with the Bharri username, which should only have access to Customers and Sales.

Let's head back to the list of available tables and select Customers.

As expected, Bharri has access to the Customers table. Now let's try a different one to double-check the validity of this policy.

As expected, and based on the security policy predefined in Ranger, Bharri does not have access to the Engines table. If we now head back to the audit screen in Ranger, we can see that the access attempts we made from the Dremio interface have been recorded, along with the result (access granted or denied) of each event. These results map directly to the policies originally defined in Ranger.
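To make the audit behavior concrete, the following sketch replays the access attempts from this section and records one event per attempt, whether it was granted or denied. The `AuditEvent` fields, the `production/<table>` resource naming, and `check_and_audit` are illustrative assumptions; they do not reproduce Ranger's actual audit schema.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

# Readable tables per user under the original chart (direct and group grants combined).
GRANTED: Dict[str, Set[str]] = {
    "cjohan": {"vehicles", "engines"},
    "bharri": {"customers", "sales"},
}


@dataclass(frozen=True)
class AuditEvent:
    user: str
    resource: str
    access_type: str
    result: str  # "Allowed" or "Denied"


def check_and_audit(user: str, table: str, log: List[AuditEvent]) -> bool:
    """Evaluate one request and append an audit record, whatever the outcome."""
    allowed = table in GRANTED.get(user, set())
    log.append(AuditEvent(user, f"production/{table}", "select",
                          "Allowed" if allowed else "Denied"))
    return allowed


# Replay the attempts made from the Dremio UI in this section.
audit_log: List[AuditEvent] = []
for user, table in [
    ("cjohan", "vehicles"),
    ("cjohan", "engines"),
    ("cjohan", "customers"),
    ("bharri", "customers"),
    ("bharri", "engines"),
]:
    check_and_audit(user, table, audit_log)

for event in audit_log:
    print(event.user, event.resource, event.result)
```

The resulting log holds three Allowed and two Denied events, one per attempt, which is the kind of granted/denied trail the Ranger audit screen lets you review.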

Each one of these policies can be edited within Ranger to grant or deny access to the available users and groups mapped from the AD/LDAP server.

Data Reflections Interoperability

Now, let's switch tracks for a moment and talk about Data Reflections before we move on to the next section of this tutorial. Data Reflections are materialized views of datasets; in essence, they are a way for Dremio to precompute a physical representation of the data that is optimized for various query patterns.

Reflections largely replace the cubes, extracts, and aggregation tables that many users otherwise have to create to access and accelerate their data. Because they behave like an index on the data, Data Reflections are invisible to users: users don't manage them or query them directly, which reduces the costly overhead of moving and managing separate copies of the data. Since they are created and maintained in the background, Dremio users benefit from them immediately without even having to know they exist.
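As a rough illustration of the idea behind an aggregation-style materialization (a generic materialized-view technique, not Dremio's Reflection internals), the sketch below precomputes `SUM(amount) GROUP BY region` once over a small made-up table and then answers lookups from the stored result instead of rescanning the rows. All names and values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Toy fact table of (region, amount) rows; the values are invented for illustration.
SALES: List[Tuple[str, int]] = [("east", 10), ("west", 5), ("east", 7), ("west", 3)]


def build_aggregate(rows: List[Tuple[str, int]]) -> Dict[str, int]:
    """Precompute SUM(amount) GROUP BY region once, like an aggregation materialization."""
    totals: Dict[str, int] = defaultdict(int)
    for region, amount in rows:
        totals[region] += amount
    return dict(totals)


def total_for_region(region: str, materialized: Dict[str, int]) -> int:
    # The query is answered from the precomputed totals, not by rescanning SALES.
    return materialized.get(region, 0)


materialized = build_aggregate(SALES)
print(total_for_region("east", materialized))
```

The trade-off is the usual one for materialized views: the aggregate costs storage and must be refreshed when the source changes, but repeated queries that match its shape no longer touch the raw rows.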

What does this mean when users don't have access to some of the data that was used to create a Reflection? Data Reflections are a powerful shared asset in Dremio, but that does not mean that users who lack access to the underlying data under Ranger policies can view that data through a Reflection.

In our demo scenario, we’ve made some changes to the policies to reflect the following permissions chart:

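| User | Member of | Vehicles | Engines (Engineering group) | Customers (Marketing group) | Sales |
|---|---|---|---|---|---|
| cjohan | Engineering | No | Yes | No | No |
| bharri | Marketing | No | Yes | Yes | Yes |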

Now we want to demonstrate that Data Reflections adhere to Ranger security policies. Let's head back to Dremio, log in as Bharri, and try to access the Customers dataset, which according to the chart he/she should have access to.

The flame icon next to the table name indicates that this dataset is accelerated by a Data Reflection. When we navigate to the Jobs screen, we can see that this query was indeed served by a raw Reflection rather than being pushed down to the data source.

Now, if we log in as Cjohan and try to access the same dataset, we can confirm that this user cannot access it, even though a Data Reflection has been generated for it.
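The behavior demonstrated here can be sketched as follows, under the assumption (consistent with what the walkthrough shows) that authorization is evaluated against the dataset the user asked for before any Reflection is substituted. The permission sets mirror the updated chart; the row values, the `REFLECTIONS` store, and `run_query` are hypothetical.

```python
from typing import Dict, List, Set, Tuple


class AccessDenied(Exception):
    """Raised when the user may not read the requested dataset."""


# Readable datasets per user, following the updated permissions chart.
PERMISSIONS: Dict[str, Set[str]] = {
    "cjohan": {"engines"},
    "bharri": {"engines", "customers", "sales"},
}

# Hypothetical source rows and a raw Reflection holding a materialized copy of them.
SOURCE_ROWS: Dict[str, List[str]] = {"customers": ["c1", "c2", "c3"]}
REFLECTIONS: Dict[str, List[str]] = {"customers": list(SOURCE_ROWS["customers"])}


def run_query(user: str, dataset: str) -> Tuple[List[str], str]:
    """Return (rows, served_by), where served_by is "reflection" or "source"."""
    # 1. Authorize against the logical dataset, independent of how it will be served.
    if dataset not in PERMISSIONS.get(user, set()):
        raise AccessDenied(f"{user} may not read {dataset}")
    # 2. Only an authorized query may be accelerated by the Reflection.
    if dataset in REFLECTIONS:
        return REFLECTIONS[dataset], "reflection"
    return SOURCE_ROWS.get(dataset, []), "source"
```

Here `run_query("bharri", "customers")` is served from the Reflection, while `run_query("cjohan", "customers")` raises `AccessDenied`: the existence of the Reflection never widens what a user can see.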

Conclusion

In this tutorial, we walked through how security policies defined in Ranger and inherited by Dremio are enforced for users. We also showed that these policies continue to be enforced when queries are accelerated by Data Reflections. This exercise demonstrated how easy, yet trustworthy, it is to accelerate your BI analysis with Dremio while keeping your data safe through robust security technologies like Ranger.
