Dremio Jekyll

Dynamic Security Controls - Masking Sensitive Data Using Dremio

Introduction

The amount of information that industries must keep safe is ever-increasing due to how easy is to collect data from customers, patients, employees, etc. Now more than ever, is very critical that we ensure that data security and privacy remain a priority to protect against expensive threats.

Dremio provides a powerful and flexible set of security features that integrate with the controls deployed across enterprise systems, and provides additional capabilities for masking and uniform, fine-grained security policies no matter where the data is managed.

In this tutorial we will walk you through the steps of dynamically masking data so it is only visible by those who have been authorized. We will use a dataset that contains [social security numbers and salary information], we will use Dremio to mask this data and then we will try to visualize the data using a BI client.

Assumptions

To get the most of this tutorial, we recommend that you first follow getting oriented to dremio and working with your first dataset tutorials. It is also important to note that the feature that we are about to demonstrate is available only on the Enterprise Edition version of Dremio.

Loading Unmasked Data Into Dremio

In this tutorial we will work with a predefined dataset that we have already loaded into Dremio and contains the following information:

  • Employee ID
  • First Name
  • Last Name
  • Social Security Number
  • Credit Card Number
  • CC Code
  • Email
  • Phone number
  • Hire date
  • Job ID
  • Salary
  • Manager ID
  • Department ID

Dealing with Sensitive Data

In this tutorial I’m going to play the role of a data engineer who was requested to curate and provide access to the employee database to certain users. However, the dataset contains several fields which not everyone should have access to. I’ve been requested to identify the sensitive fields and provide access to only team members who belong to the “Accounting” group.

First, I’m going to explore the data to see what is it that we are dealing with

image alt text

I have identified 3 different fields that contain sensitive data:

  • SSN = Social Security Number
  • Credit_card
  • Salary

To verify that this data is fully accessible, I’m going to try to visualize it using a BI client and a generic user account.

image alt text

Here we can observe that the data has full open access regardless of the user that tries to access it.

Applying Security Policies

As mentioned earlier, in this scenario I’ve been requested to only provide full access to the data for those who belong to the “Accounting” team, anyone else will be able to have access to the data but it will be masked. Let’s see how can get this done.

Using query_user() or is_member(), Dremio allows us to setup a virtual dataset with selective masking of its columns for different users or groups without having to create multiple datasets.

In this case we will use the following query:

1
2
3
4
5
6
7
8
9
10
11
12
13
SELECT employee_id, first_name, last_name,
CASE
WHEN query_user() IN ('fireflyx','dremio') OR is_member('Accounting') THEN ssn
ELSE CONCAT('XXXXXXX',SUBSTR(ssn,8,4))
END AS ssn,
CASE
WHEN query_user() IN ('fireflyx','dremio') OR is_member('Accounting') THEN credit_card
ELSE CONCAT('XXXX-XXXX-XXXX-',SUBSTR(credit_card,16,4))
END AS creddit_card,
CASE
WHEN query_user() IN ('fireflyx','dremio') OR is_member('Accounting') THEN cc_code
END AS cc_code, email, phone_number, hire_date, job_id, salary, commission_pct, manager_id, department_id
FROM firefly.SSN.employee_SSN_Orig

This query checks to see if the user belongs to the accounting group, if it doesn’t then it will reveal only the last 4 numbers of the SSN field and will show only the last 4 digits of the Credit Card field. Let’s take a look at the results.

image alt text

Accessing Masked Data From a BI Client

Now that the data has been masked, let’s check how secure is the data when we try to access it from a BI client; we will use Tableau in this scenario.

First, I’m going to download the dataset

image alt text

Then I will complete the ODBC connection using my non-admin username and password. This user does not belong to the accounting team

image alt text

Next, I’m going to try to visualize the data

image alt text

We can notice that the masking policy has been successfully applied and it takes effect when trying to access the data from any BI client.

Conclusion

In this tutorial we demonstrated a simple but powerful feature that Dremio provides to data engineers and data consumers who want to make sure they comply with the latest and most rigorous security and privacy measures in the data industry.

We used a generic dataset that contained sensitive data (SSN and CC numbers) and used dynamic masking to define column level permissions on a single dataset; this allowed us to provide different levels of visibility to the data without having to create separate virtual datasets for each one of the user groups who would be working with the same data.

We hope you enjoyed this tutorial, stay tuned to learn more about how can you gain insights from your data faster using Dremio.