The amount of information that industries must keep safe is ever-increasing due to how easy is to collect data from customers, patients, employees, etc. Now more than ever, is very critical that we ensure that data security and privacy remain a priority to protect against expensive threats.
Dremio provides a powerful and flexible set of security features that integrate with the controls deployed across enterprise systems, and provides additional capabilities for masking and uniform, fine-grained security policies no matter where the data is managed.
In this tutorial we will walk you through the steps of dynamically masking data so it is only visible by those who have been authorized. We will use a dataset that contains [social security numbers and salary information], we will use Dremio to mask this data and then we will try to visualize the data using a BI client.
To get the most of this tutorial, we recommend that you first follow getting oriented to dremio and working with your first dataset tutorials. It is also important to note that the feature that we are about to demonstrate is available only on the Enterprise Edition version of Dremio.
In this tutorial we will work with a predefined dataset that we have already loaded into Dremio and contains the following information:
In this tutorial I’m going to play the role of a data engineer who was requested to curate and provide access to the employee database to certain users. However, the dataset contains several fields which not everyone should have access to. I’ve been requested to identify the sensitive fields and provide access to only team members who belong to the “Accounting” group.
First, I’m going to explore the data to see what is it that we are dealing with
I have identified 3 different fields that contain sensitive data:
To verify that this data is fully accessible, I’m going to try to visualize it using a BI client and a generic user account.
Here we can observe that the data has full open access regardless of the user that tries to access it.
As mentioned earlier, in this scenario I’ve been requested to only provide full access to the data for those who belong to the “Accounting” team, anyone else will be able to have access to the data but it will be masked. Let’s see how can get this done.
Using query_user() or is_member(), Dremio allows us to setup a virtual dataset with selective masking of its columns for different users or groups without having to create multiple datasets.
In this case we will use the following query:
1 2 3 4 5 6 7 8 9 10 11 12 13 SELECT employee_id, first_name, last_name, CASE WHEN query_user() IN ('fireflyx','dremio') OR is_member('Accounting') THEN ssn ELSE CONCAT('XXXXXXX',SUBSTR(ssn,8,4)) END AS ssn, CASE WHEN query_user() IN ('fireflyx','dremio') OR is_member('Accounting') THEN credit_card ELSE CONCAT('XXXX-XXXX-XXXX-',SUBSTR(credit_card,16,4)) END AS creddit_card, CASE WHEN query_user() IN ('fireflyx','dremio') OR is_member('Accounting') THEN cc_code END AS cc_code, email, phone_number, hire_date, job_id, salary, commission_pct, manager_id, department_id FROM firefly.SSN.employee_SSN_Orig
This query checks to see if the user belongs to the accounting group, if it doesn’t then it will reveal only the last 4 numbers of the SSN field and will show only the last 4 digits of the Credit Card field. Let’s take a look at the results.
Now that the data has been masked, let’s check how secure is the data when we try to access it from a BI client; we will use Tableau in this scenario.
First, I’m going to download the dataset
Then I will complete the ODBC connection using my non-admin username and password. This user does not belong to the accounting team
Next, I’m going to try to visualize the data
We can notice that the masking policy has been successfully applied and it takes effect when trying to access the data from any BI client.
In this tutorial we demonstrated a simple but powerful feature that Dremio provides to data engineers and data consumers who want to make sure they comply with the latest and most rigorous security and privacy measures in the data industry.
We used a generic dataset that contained sensitive data (SSN and CC numbers) and used dynamic masking to define column level permissions on a single dataset; this allowed us to provide different levels of visibility to the data without having to create separate virtual datasets for each one of the user groups who would be working with the same data.
We hope you enjoyed this tutorial, stay tuned to learn more about how can you gain insights from your data faster using Dremio.