Dynamically Masking Sensitive Data Using Dremio

   
Introduction

The amount of information that organizations must keep safe keeps growing, because it is so easy to collect data from customers, patients, employees, and others. Now more than ever, data security and privacy must remain a priority to protect against costly threats.

Dremio provides a flexible set of security features that integrate with the controls already deployed across enterprise systems. It also adds capabilities for data masking and for uniform, fine-grained security policies regardless of where the data is managed.

In this tutorial, we will walk through the steps for dynamically masking data so that it is visible only to authorized users. We will start with a dataset that contains Social Security numbers and salary information. We will then use Dremio to mask the sensitive data and try to visualize it from a BI client.

Prerequisites

To get the most out of this tutorial, we encourage you to first complete the "Getting Oriented to Dremio" and "Working With Your First Dataset" tutorials. Note that the feature demonstrated here is available only in Dremio Enterprise Edition.

Loading Unmasked Data Into Dremio

In this tutorial, we will work with a predefined dataset that has already been loaded into Dremio and contains the following fields:

  • Employee ID
  • First Name
  • Last Name
  • Social Security Number
  • Credit Card Number
  • CC Code
  • Email
  • Phone Number
  • Hire Date
  • Job ID
  • Salary
  • Manager ID
  • Department ID

Dealing with Sensitive Data

In this tutorial, we play the role of a data engineer who has been asked to curate the employee database and give certain users access to it. However, the dataset contains several fields that not everyone should be able to see. We have been asked to identify the sensitive fields and to grant full access to them only to team members who belong to the "Accounting" group.

First, let’s explore the data:

There are three fields that contain sensitive data:

  • ssn (Social Security Number)
  • credit_card
  • salary

To verify that this data is fully accessible, we will visualize it using a BI client and a generic user account.

As we can see, the data is fully accessible regardless of which user queries it.

Applying Security Policies

As mentioned earlier, in this scenario we have been asked to give full access to the data only to members of the "Accounting" group. Everyone else can still query the data, but the sensitive fields will be masked. Let's see how to get this done.

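With the query_user() and is_member() functions, Dremio lets us define a virtual dataset that selectively masks its columns for different users or groups, without having to create multiple datasets. query_user() returns the name of the user running the query, and is_member() returns true if that user belongs to the specified group.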
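The decision that each CASE expression makes can be modeled as a simple predicate: the user sees unmasked values if they are on an explicit allowlist OR belong to the authorized group. The Python sketch below illustrates only that logic. The `Session` class, the `AUTHORIZED_USERS` set, and the sample usernames are hypothetical stand-ins for what query_user() and is_member() report inside Dremio, and the allowlist contains only the "dremio" user from the query.

```python
from dataclasses import dataclass, field

# Hypothetical model of the query context:
# query_user() -> session.username, is_member(g) -> g in session.groups
@dataclass
class Session:
    username: str
    groups: set[str] = field(default_factory=set)

# Hypothetical allowlist mirroring query_user() IN (...) in the SQL
AUTHORIZED_USERS = {"dremio"}
AUTHORIZED_GROUP = "Accounting"

def can_see_unmasked(session: Session) -> bool:
    """True if the user is on the allowlist OR belongs to the authorized group."""
    return session.username in AUTHORIZED_USERS or AUTHORIZED_GROUP in session.groups

print(can_see_unmasked(Session("dremio")))                   # True
print(can_see_unmasked(Session("analyst", {"Accounting"})))  # True
print(can_see_unmasked(Session("analyst", {"Marketing"})))   # False
```

Because either condition is sufficient, a user outside the allowlist still gets full access through group membership. This is what lets the same dataset serve every audience.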

In this case we will use the following query:

SELECT employee_id, first_name, last_name,
  -- Full SSN for authorized users; otherwise only the last 4 digits
  -- (SUBSTR(ssn, 8, 4) assumes the 123-45-6789 format)
  CASE
    WHEN query_user() IN ('[email protected]','dremio') OR is_member('Accounting') THEN ssn
    ELSE CONCAT('XXXXXXX', SUBSTR(ssn, 8, 4))
  END AS ssn,
  -- Full card number for authorized users; otherwise only the last 4 digits
  CASE
    WHEN query_user() IN ('[email protected]','dremio') OR is_member('Accounting') THEN credit_card
    ELSE CONCAT('XXXX-XXXX-XXXX-', SUBSTR(credit_card, 16, 4))
  END AS credit_card,
  -- Card security code is hidden entirely (NULL) from unauthorized users
  CASE
    WHEN query_user() IN ('[email protected]','dremio') OR is_member('Accounting') THEN cc_code
    ELSE NULL
  END AS cc_code,
  email, phone_number, hire_date, job_id, salary, commission_pct, manager_id, department_id
FROM emp_data

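This query checks whether the current user is one of the listed users or a member of the Accounting group. If so, the original values are returned. Otherwise, only the last 4 digits of the SSN and of the credit card number are shown, and cc_code is returned as NULL. Note that this query returns salary unmasked; the same CASE pattern can be applied to it if needed. Let's take a look at the results.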
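The SUBSTR offsets in the query only work for fixed-width values. We assume here that an SSN is stored as 123-45-6789 (11 characters) and a card number as 1234-5678-9012-3456 (19 characters). SQL's SUBSTR uses 1-based positions, so SUBSTR(ssn, 8, 4) starts at the eighth character, where the last four digits begin. The Python sketch below reproduces the two ELSE branches to check that arithmetic. The sample values are made up, and values stored in any other format would be masked incorrectly by these fixed offsets.

```python
def sql_substr(s: str, start: int, length: int) -> str:
    """Emulate SQL SUBSTR(s, start, length), which uses 1-based positions."""
    return s[start - 1 : start - 1 + length]

def mask_ssn(ssn: str) -> str:
    # ELSE branch: CONCAT('XXXXXXX', SUBSTR(ssn, 8, 4))
    return "XXXXXXX" + sql_substr(ssn, 8, 4)

def mask_credit_card(card: str) -> str:
    # ELSE branch: CONCAT('XXXX-XXXX-XXXX-', SUBSTR(credit_card, 16, 4))
    return "XXXX-XXXX-XXXX-" + sql_substr(card, 16, 4)

print(mask_ssn("123-45-6789"))                  # XXXXXXX6789
print(mask_credit_card("1234-5678-9012-3456"))  # XXXX-XXXX-XXXX-3456
```

The masked values keep the original lengths (11 and 19 characters), so downstream reports that expect fixed-width columns still line up.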

Accessing Masked Data From a BI Client

Now that the data has been masked, let's check how well it is protected when we access it from a BI client. We will use Tableau in this scenario.

Click the Tableau icon in the top toolbar:

Connect Tableau to Dremio using the same credentials you used to log in to Dremio. Then try to visualize the data:

The masking policy has been applied successfully, and it takes effect even when the data is accessed from a BI client. In this case, the user cannot see the full SSN and credit card information. This is because we did not log in as [email protected] (or as the dremio user) and we are not a member of the 'Accounting' LDAP group.
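Because the masking logic lives in the dataset's SQL definition, every client that queries the dataset, including a BI tool such as Tableau, receives results based on who is logged in. The sketch below illustrates this with Python's built-in sqlite3 module rather than Dremio, so several things are assumed:

- query_user() and is_member() are registered as hypothetical stand-in SQL functions bound to a simulated login.
- The emp_data table is trimmed to a few columns with made-up values.
- SQLite's `||` operator replaces CONCAT, which older SQLite versions lack.

```python
import sqlite3

# Same CASE pattern as the virtual dataset, trimmed to a few columns.
# SQLite's || operator stands in for CONCAT.
MASKED_VIEW_SQL = """
SELECT employee_id,
  CASE WHEN query_user() IN ('dremio') OR is_member('Accounting') THEN ssn
       ELSE 'XXXXXXX' || substr(ssn, 8, 4) END AS ssn,
  CASE WHEN query_user() IN ('dremio') OR is_member('Accounting') THEN credit_card
       ELSE 'XXXX-XXXX-XXXX-' || substr(credit_card, 16, 4) END AS credit_card,
  CASE WHEN query_user() IN ('dremio') OR is_member('Accounting') THEN cc_code
  END AS cc_code
FROM emp_data
"""

def query_as(username: str, groups: set[str]) -> list[tuple]:
    """Run the masking query in a fresh in-memory database as a simulated user."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical stand-ins for Dremio's query_user() and is_member().
        conn.create_function("query_user", 0, lambda: username)
        conn.create_function("is_member", 1, lambda group: int(group in groups))
        # Made-up sample row in the formats the SUBSTR offsets assume.
        conn.execute("CREATE TABLE emp_data "
                     "(employee_id INTEGER, ssn TEXT, credit_card TEXT, cc_code TEXT)")
        conn.execute("INSERT INTO emp_data VALUES "
                     "(100, '123-45-6789', '1234-5678-9012-3456', '321')")
        return conn.execute(MASKED_VIEW_SQL).fetchall()
    finally:
        conn.close()

print(query_as("accountant", {"Accounting"}))
# [(100, '123-45-6789', '1234-5678-9012-3456', '321')]
print(query_as("analyst", {"Marketing"}))
# [(100, 'XXXXXXX6789', 'XXXX-XXXX-XXXX-3456', None)]
```

The query text is identical for both calls. Only the identity behind the connection changes, which mirrors why the Tableau user above sees masked values.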

Conclusion

In this tutorial, we demonstrated a simple but powerful Dremio feature for data engineers and data consumers who need to comply with rigorous security and privacy requirements.

We used a generic dataset containing sensitive data (SSNs and credit card numbers) and applied dynamic masking to define column-level permissions on a single dataset. This let us provide different levels of visibility into the data without creating a separate virtual dataset for each user group that works with it.

Check out Dremio’s Security Architecture Guide to learn more about Dremio’s security features and how they work.
