
Dynamic Security Controls - Apache Ranger Integration

Intro

In this tutorial we introduce the latest addition to Dremio’s dynamic security control capabilities. With the 3.0 release, we’ve added Apache Ranger integration to the product. We will give an overview of Ranger-based policy enforcement, exercise the different permissions that you can grant to Dremio users when using Ranger, and, last but not least, demonstrate how to implement row-level security controls.

We encourage you to work through this tutorial. Keep in mind that you will need Hive and Ranger in your environment, as well as an accessible AD/LDAP server. We’ve also made a video available if you would rather watch how the workflow unfolds.

The steps that we are about to show work for both the Enterprise and Community editions of Dremio.

Assumptions and Demo set up

To get the most out of this tutorial, we recommend that you first follow the “Getting Oriented to Dremio” and “Working With Your First Dataset” tutorials. In addition, we will be using Ranger 0.7 and the latest release of Dremio.

For this tutorial, we will be using the same AD/LDAP for both Ranger and Dremio since we will be utilizing the same users and groups. We are also going to use pre-defined Ranger policies as follows:

                       Engineering          Marketing
User    Member of      Vehicles   Engines   Customers  Sales
cjohan  Engineering    Yes        Yes       No         No
bharri  Marketing      No         No        Yes        Yes

In addition, we will be using a fake ‘Production’ database composed of 20 tables located in Hive. For this tutorial, however, we will apply the policies only to the four tables listed above. You will also see how we interact with Ranger’s audit screen, and how Ranger security is enabled in Dremio when you connect to a data source.


Ranger Architectural Overview

image alt text

In this diagram, we see the Dremio cluster on the right and the Hive environment on the left. In traditional Hive, you have the metastore, which contains all the information about the data stored in HDFS (stats, files, rows, tables and columns), along with HiveServer2 instances, which serve ODBC and JDBC requests from above.

Each HiveServer2 instance has a Ranger plug-in that talks to the Ranger server. The Ranger server authorizes access to Hive resources according to the Ranger policies that are created and managed there.

On the other side, the Dremio cluster (shown on the right) is running in a YARN deployment composed of Coordinator and Executor nodes. In this mode, Dremio integrates with the YARN resource manager to secure compute resources in a shared multi-tenant environment. This integration allows enterprises to more easily integrate Dremio on a Hadoop cluster, including the ability to shrink or expand resources on demand.

Coordinator nodes are responsible for query planning, web user interface and handling client connections. Executor nodes are largely responsible for query execution and most of the heavy lifting.

The new component for Dremio is the Ranger plug-in, which is installed on the Coordinator node. It allows the Coordinator node to communicate with Ranger and allow or deny access to HDFS resources based on the policies defined on the Ranger server instance. In this mode, client queries come in via JDBC or ODBC to the Dremio Coordinator, and access to data is requested through these queries. Each request is checked against the Ranger policies, and based on that authorization the Executor nodes are permitted to access the source and execute the query for the user. The results are then sent back to the Coordinator so they can be presented in the UI.
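The request flow described above can be pictured with a small Python sketch. It is only an illustration of the "authorize on the Coordinator, then dispatch to Executors" ordering; the `Coordinator` class, its `allowed` set and its method names are hypothetical and are not part of Dremio or Ranger.

```python
from dataclasses import dataclass, field


@dataclass
class Coordinator:
    """Toy stand-in for a coordinator that authorizes before dispatching."""

    # Hypothetical allow-list of (user, table) pairs. In a real deployment
    # the decision would come from the policies held by the Ranger server.
    allowed: set = field(default_factory=set)
    # Requests that were handed to the executors, kept for inspection.
    dispatched: list = field(default_factory=list)

    def is_authorized(self, user: str, table: str) -> bool:
        return (user, table) in self.allowed

    def run_query(self, user: str, table: str) -> str:
        if not self.is_authorized(user, table):
            # Denied requests never reach the executors.
            raise PermissionError(f"{user} may not read {table}")
        self.dispatched.append((user, table))  # hand off to the executors
        return f"results of {table} for {user}"


coordinator = Coordinator(allowed={("cjohan", "vehicles")})
print(coordinator.run_query("cjohan", "vehicles"))
try:
    coordinator.run_query("cjohan", "customers")
except PermissionError as err:
    print("access denied:", err)
```

The point of the sketch is the ordering: the permission check sits in front of the hand-off, so a denied query does no work on the Executor side.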

Dremio and Ranger Security Interaction

In the production database that we will be using for this tutorial we are going to focus on the 4 tables indicated below.

image alt text

Additionally, we have created a security profile (‘hive_site_1_policies’) in Ranger that maps directly to the permissions that we have previously defined.

image alt text

image alt text

In this part of the tutorial, we are going to make sure that the Ranger policies are being enforced by Dremio. As we can see in the image above, the users Cjohan and Bharri should only have access to Vehicles and Sales respectively, and the groups Engineering and Marketing should only have access to Engines and Customers respectively.

As you can see here, we are logged in as Cjohan.
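To make the expected outcome concrete before testing it in the UI, here is a minimal Python sketch of how user-level and group-level grants combine. The `GROUPS` and `POLICIES` dictionaries and the `can_access` function are hypothetical names invented for this illustration; they mirror the permissions table from the setup section, not Ranger’s actual policy format. Denying access when no policy matches is an assumption of the sketch.

```python
# Group membership as it would come from AD/LDAP (values from the tutorial).
GROUPS = {"cjohan": {"Engineering"}, "bharri": {"Marketing"}}

# Hypothetical in-memory form of the policies: table -> allowed users/groups.
POLICIES = {
    "vehicles": {"users": {"cjohan"}, "groups": set()},
    "engines": {"users": set(), "groups": {"Engineering"}},
    "customers": {"users": set(), "groups": {"Marketing"}},
    "sales": {"users": {"bharri"}, "groups": set()},
}


def can_access(user: str, table: str) -> bool:
    """Allow if the user is named directly or belongs to an allowed group."""
    policy = POLICIES.get(table.lower())
    if policy is None:
        return False  # assumption: no matching policy means access is denied
    user_groups = GROUPS.get(user, set())
    return user in policy["users"] or bool(user_groups & policy["groups"])


for user in ("cjohan", "bharri"):
    print(user, {table: can_access(user, table) for table in POLICIES})
```

With these inputs, Cjohan is allowed on Vehicles (direct grant) and Engines (through the Engineering group), and Bharri on Sales and Customers, which matches the table.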

image alt text

That user should have access to the Vehicles table and also to the Engines table, since he/she is a member of the Engineering group.

image alt text

Sure enough, the policy was enforced correctly: Cjohan has been granted access to the Engines table because he belongs to the Engineering group. However, an extra step is needed to double-check the enforcement of the policy, so let’s try to access a dataset that Cjohan has not been granted access to.

image alt text

When we try to access the ‘customers’ table while logged in as Cjohan, Dremio shows an access denied warning. At this point we can conclude that the policy was enforced correctly. The same behavior would be expected if the same user tried to access the Sales table.

Now let’s go ahead and try the same procedure with the Bharri user, who in this case should only have access to ‘Customers’ and ‘Sales’.

Let’s head back to the list of available tables and select “customers”.

image alt text

Indeed, we can see that Bharri has access to the ‘customers’ table. Now let’s try a different table to double-check the validity of this policy.

image alt text

As expected, and based on the security policy predefined in Ranger, Bharri does not have access to the ‘Engines’ table.

Now, if we head back to the audit screen in Ranger, we can see that the access attempts we made from the Dremio interface are recorded, with the result (access granted or denied) shown for each event. These results map directly to the policies originally defined in Ranger.
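As a rough mental model of what the audit screen holds, the following Python sketch records one event per access attempt made in this section. The `audit_log` list, the `record_access` function and the field names are hypothetical and chosen for illustration; they are not Ranger’s audit schema.

```python
from datetime import datetime, timezone

audit_log = []  # hypothetical in-memory stand-in for the audit store


def record_access(user: str, table: str, allowed: bool) -> dict:
    """Append one audit event; the field names are illustrative only."""
    event = {
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "resource": f"production/{table}",
        "result": "Allowed" if allowed else "Denied",
    }
    audit_log.append(event)
    return event


# The four attempts made from the Dremio interface in this section.
record_access("cjohan", "engines", True)
record_access("cjohan", "customers", False)
record_access("bharri", "customers", True)
record_access("bharri", "engines", False)

for event in audit_log:
    print(event["user"], event["resource"], event["result"])
```

Each attempt produces exactly one record, whether it was granted or denied, which is what lets the audit trail be compared against the policy definitions.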

image alt text

Each one of these policies can be edited within Ranger to grant or deny access to the available users and groups mapped from the AD/LDAP server.

image alt text

Data Reflections Interoperability

Now, let’s switch tracks for a second and talk about Data Reflections before we move into the next section of this tutorial. Data Reflections are similar to a materialized view of a dataset. In essence, they are a way for Dremio to pre-compute a physical representation of the data that is optimized for various query patterns.

Reflections largely replace the cubes, extracts and aggregations that many users otherwise have to create to access and accelerate their data. Because they behave like an index on the data, Data Reflections are invisible to users and do not need to be managed by them, so users avoid the costly overhead of moving and managing their own copies of data. Since Reflections are created and managed in the background, Dremio users benefit from them immediately without having to know they even exist.

What does this mean when a user lacks access to data that was used to create a Reflection? Data Reflections are a powerful shared asset in Dremio, but they do not let users who are denied access to a dataset by Ranger policies view that data through a Reflection built on it.

In our demo scenario, we’ve made some changes to the policies to reflect the following permissions chart:

                       Engineering          Marketing
User    Member of      Vehicles   Engines   Customers  Sales
cjohan  Engineering    No         Yes       No         No
bharri  Marketing      No         Yes       Yes        Yes

Now we want to demonstrate that Data Reflections adhere to Ranger security policies. Let’s head back to Dremio, log in as Bharri, and try to access the ‘Customers’ dataset, which according to the chart he/she should have access to.

image alt text

The flame icon next to the table name indicates that this query was accelerated by a Data Reflection. When we navigate to the ‘Jobs’ screen, we can see that the query was indeed accelerated by a raw Reflection rather than being pushed down to the data source.

image alt text

image alt text

Now, if we log in as Cjohan and try to access the same dataset, we can confirm that the user cannot reach that dataset through the Data Reflection that was generated.

image alt text
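One way to picture the behavior we just observed is the Python sketch below. It assumes that the permission check is made against the dataset itself, whether or not a Reflection could answer the query. That ordering is an assumption made for illustration, not a description of Dremio internals, and `ALLOWED`, `REFLECTIONS` and `plan_query` are hypothetical names.

```python
# Access to 'customers' taken from the revised permissions chart.
ALLOWED = {("bharri", "customers")}

# Hypothetical registry of datasets that have a usable raw Reflection.
REFLECTIONS = {"customers"}


def plan_query(user: str, dataset: str) -> str:
    """Return how the query would be served, or raise if access is denied."""
    # Assumption: the check is on the dataset, not on the Reflection, so
    # an existing Reflection never widens what a user is allowed to read.
    if (user, dataset) not in ALLOWED:
        raise PermissionError(f"{user} may not read {dataset}")
    return "reflection" if dataset in REFLECTIONS else "source"


print("bharri ->", plan_query("bharri", "customers"))
try:
    plan_query("cjohan", "customers")
except PermissionError as err:
    print("cjohan -> access denied:", err)
```

In this model Bharri’s query is served from the Reflection, while Cjohan’s is rejected before the Reflection is ever considered.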

Conclusion

In this tutorial we evaluated how security policies defined in Ranger and inherited by Dremio are enforced for users. We also saw that these policies continue to be enforced when Data Reflections are used. This exercise demonstrated that you can accelerate your BI analysis with Dremio in an easy yet trustworthy way, while keeping your data safe through robust security technologies like Ranger.