Dynamically Masking Sensitive Data Using Dremio

Dremio

Introduction

The amount of information that organizations must keep safe keeps growing, because it is so easy to collect data from customers, patients, employees, and others. Now more than ever, data security and privacy must remain a priority to protect against costly breaches.

Dremio provides a flexible set of security features that integrate with the controls already deployed across enterprise systems. It also adds capabilities for masking and for uniform, fine-grained security policies, no matter where the data is managed.

In this tutorial we will walk through the steps of dynamically masking data so that it is visible only to users who have been authorized. We will take a dataset that contains social security numbers and salary information, use Dremio to mask this data, and then try to visualize the data using a BI client.

Prerequisites

To get the most out of this tutorial, we encourage you to first complete the “Getting Oriented to Dremio” and “Working With Your First Dataset” tutorials. Note also that the feature we are about to demonstrate is available only in the Enterprise Edition of Dremio.

Loading Unmasked Data Into Dremio

In this tutorial we will work with a predefined dataset that we have already loaded into Dremio. It contains the following information:

  • Employee ID
  • First Name
  • Last Name
  • Social Security Number
  • Credit Card Number
  • CC Code
  • Email
  • Phone number
  • Hire date
  • Job ID
  • Salary
  • Manager ID
  • Department ID

Dealing with Sensitive Data

In this tutorial we will play the role of a data engineer who has been asked to curate the employee database and give certain users access to it. However, the dataset contains several fields that not everyone should be able to see. We’ve been asked to identify the sensitive fields and expose them only to team members who belong to the “Accounting” group.

First, let’s explore the data:

There are 3 different fields that contain sensitive data:

  • SSN = Social Security Number
  • credit_card
  • salary

To verify that this data is fully accessible, we will visualize it using a BI client and a generic user account.

Here we can see that the data is fully accessible, regardless of which user tries to read it.

Applying Security Policies

As mentioned earlier, in this scenario we’ve been asked to give full access to the data only to those who belong to the “Accounting” team. Everyone else will still be able to query the dataset, but the sensitive values will be masked. Let’s see how we can get this done.

Using query_user() or is_member(), Dremio allows us to set up a virtual dataset with selective masking of its columns for different users or groups without having to create multiple datasets.
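Before building the masked dataset, it can help to see what these two functions return for the current session. The following is a minimal sketch, assuming that Dremio accepts a SELECT without a FROM clause and that is_member() returns a boolean; the column aliases are illustrative and not part of the original tutorial.

```sql
-- Illustrative check of the current session (aliases are hypothetical).
-- query_user() is expected to return the name of the user running the query;
-- is_member('Accounting') is expected to be true only for members of that group.
SELECT query_user()            AS logged_in_as,
       is_member('Accounting') AS in_accounting
```

Running this while logged in as different users is a quick way to confirm which branch of a masking rule each of them would take.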

In this case we will use the following query:

SELECT employee_id, first_name, last_name,
  -- Full SSN for the named users and the Accounting group; everyone else sees only
  -- the last 4 digits (SUBSTR position 8 assumes values stored as NNN-NN-NNNN)
  CASE
    WHEN query_user() IN ('gnarly@dremio.com','dremio') OR is_member('Accounting') THEN ssn
    ELSE CONCAT('XXXXXXX',SUBSTR(ssn,8,4))
  END AS ssn,
  -- Same rule for the credit card number (position 16 assumes NNNN-NNNN-NNNN-NNNN)
  CASE
    WHEN query_user() IN ('gnarly@dremio.com','dremio') OR is_member('Accounting') THEN credit_card
    ELSE CONCAT('XXXX-XXXX-XXXX-',SUBSTR(credit_card,16,4))
  END AS credit_card,
  -- No ELSE branch: cc_code is NULL for everyone else
  CASE
    WHEN query_user() IN ('gnarly@dremio.com','dremio') OR is_member('Accounting') THEN cc_code
  END AS cc_code,
  email, phone_number, hire_date, job_id, salary, commission_pct, manager_id, department_id
FROM emp_data

This query checks whether the current user is one of the two named users or belongs to the “Accounting” group. If neither is true, it reveals only the last 4 digits of the SSN and of the credit card number, and returns NULL for the CC code. Let’s take a look at the results.
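To see what the two masking expressions produce, they can be run on literal values. This is a small sketch: the sample SSN and card number are made up, and it assumes the values are stored with dashes (NNN-NN-NNNN and NNNN-NNNN-NNNN-NNNN) and that SUBSTR counts positions from 1.

```sql
-- Made-up sample values, used only to illustrate the string arithmetic.
SELECT
  -- Characters 8 to 11 of '123-45-6789' are '6789'
  CONCAT('XXXXXXX', SUBSTR('123-45-6789', 8, 4))                 AS masked_ssn,   -- 'XXXXXXX6789'
  -- Characters 16 to 19 of '1234-5678-9012-3456' are '3456'
  CONCAT('XXXX-XXXX-XXXX-', SUBSTR('1234-5678-9012-3456', 16, 4)) AS masked_card  -- 'XXXX-XXXX-XXXX-3456'
```

If the source columns use a different layout (for example, SSNs without dashes), the start positions would need to be adjusted accordingly.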

Accessing Masked Data From a BI Client

Now that the data has been masked, let’s check how secure it is when we access it from a BI client. We will use Tableau in this scenario.

Click on the Tableau icon on the top toolbar:

Connect Tableau to Dremio using the same credentials you used to log in to Dremio. Next, try to visualize the data:

We can see that the masking policy has been applied and that it takes effect when we access the data from a BI client. In this case the user cannot see the SSN and credit card information, because we did not log in as one of the users named in the query (such as gnarly@dremio.com) and we are not part of the ‘Accounting’ LDAP group.
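The query used in this tutorial returns salary unmasked, although salary was listed earlier among the sensitive fields. If salary should also be hidden, the same CASE pattern could be extended to that column. The following is a sketch under that assumption, not part of the original query; returning NULL for non-members is one possible choice among several.

```sql
-- Hypothetical extension: hide salary from anyone outside the Accounting group.
SELECT employee_id, first_name, last_name,
  CASE
    WHEN is_member('Accounting') THEN salary
    ELSE NULL  -- non-members see no salary value at all
  END AS salary
FROM emp_data
```

Returning NULL keeps the column's numeric type intact, which tends to be easier for BI clients to handle than a placeholder string.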

Conclusion

In this tutorial we demonstrated a simple but powerful feature that Dremio provides to data engineers and data consumers who want to make sure they comply with the latest and most rigorous security and privacy measures in the data industry.

We used a generic dataset that contained sensitive data (SSN and CC numbers) and used dynamic masking to define column level permissions on a single dataset; this allowed us to provide different levels of visibility to the data without having to create separate virtual datasets for each one of the user groups who would be working with the same data.

Check out Dremio’s Security Architecture Guide to learn more about Dremio’s security features and how they work.