Working with Dremio and LDAP/AD Authentication

Introduction

The Lightweight Directory Access Protocol (LDAP) is an open protocol used to store and retrieve data from a hierarchical directory structure. In this tutorial we will walk through the steps needed to integrate LDAP authentication with Dremio. We will then demonstrate how to control access for different users and groups in Dremio using LDAP: we will create separate Spaces inside Dremio and grant or restrict access for each group. Finally, we will mask sensitive data and access it from a BI client under different usernames to see how the data is masked according to each user’s group memberships.

Assumptions

To get the most out of this tutorial, we recommend that you first follow the “Getting Oriented to Dremio” and “Working With Your First Dataset” tutorials. Note that the feature we are about to demonstrate is available only in the Enterprise Edition of Dremio. While you are welcome to work with any LDAP provider, in this tutorial we will use Okta to create the users and groups that will log into Dremio once the set-up is complete. For more information on how to connect to Okta using the LDAP interface, see the article on this topic in the Okta documentation.

Demo Set-up

For this tutorial we will be using the following predefined policies:

User | Okta Group | Dremio Space | Access to sensitive data
beluga@dremio.com | Engineering | Engineering | No
polar@dremio.com | Human Resources | Human Resources | Yes
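
The policies above can also be written down as data, which makes it easy to state what each user is expected to see before any configuration is done. This is only an illustrative sketch: the `Policy` class and the helper functions are hypothetical names invented for this example and are not part of Dremio or Okta.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """One row of the demo policy table (hypothetical helper type)."""
    user: str
    okta_group: str
    dremio_space: str
    sees_sensitive_data: bool


# Values copied from the policy table in this section.
POLICIES = [
    Policy("beluga@dremio.com", "Engineering", "Engineering", False),
    Policy("polar@dremio.com", "Human Resources", "Human Resources", True),
]


def expected_spaces(user: str) -> set:
    """Spaces the user is expected to see once sharing is configured."""
    return {p.dremio_space for p in POLICIES if p.user == user}


def expects_unmasked(user: str) -> bool:
    """True when the user is expected to see sensitive columns unmasked."""
    return any(p.sees_sensitive_data for p in POLICIES if p.user == user)
```

These expectations are what the verification steps later in the tutorial check by hand.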

Creating Users in Okta

Keep in mind that LDAP is an early access feature in Okta and you will need to get in touch with their customer care team to have it enabled on your account.

To create users, I will first log into my Okta account

Then navigate to “Directory”, go to “People”, and create a couple of users

Here, click on “Add Person”

Next, we add the users that we will be working with; in this case, Beluga Ice and Polar Bear.

Now I’m going to assign these users to their respective groups following the table shown in the “Demo Set-up” section; Beluga Ice will be assigned to the Engineering group and Polar Bear to Human Resources.

Configuring LDAP in Dremio

Before we deploy Dremio we need to make the following changes to the config file.

We are going to edit the dremio.conf file. Inside the Dremio application directory, navigate to “/conf”, then locate and open “dremio.conf”. I will be using a text editor to make the changes.

At the bottom of the file we are going to locate the following block

services: {
  coordinator.enabled: true,
  coordinator.master.enabled: true,
  executor.enabled: true
}

And we are going to add the following lines

coordinator.web.auth.type: "ldap"
coordinator.web.auth.ldap_config: "ad.json"

The “services” block should look like this

services: {
  coordinator.enabled: true,
  coordinator.master.enabled: true,
  executor.enabled: true,
  coordinator.web.auth.type: "ldap",
  coordinator.web.auth.ldap_config: "ad.json"
}
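
A mistyped key, such as a capitalized `Coordinator...`, is easy to miss by eye. The sketch below is a plain line scanner, not a real parser for the configuration format. It assumes each setting sits on its own line and that key names are compared case-sensitively, and the function name `missing_auth_keys` is invented for this example.

```python
import re

# The two settings this tutorial adds to the "services" block.
REQUIRED_KEYS = (
    "coordinator.web.auth.type",
    "coordinator.web.auth.ldap_config",
)


def missing_auth_keys(conf_text: str) -> list:
    """Return the required keys that never appear, spelled exactly, as a line's key."""
    found = set()
    for line in conf_text.splitlines():
        # Capture the dotted name that precedes ':' or '=' on this line.
        match = re.match(r"\s*([A-Za-z_.]+)\s*[:=]", line)
        if match:
            found.add(match.group(1))
    return [key for key in REQUIRED_KEYS if key not in found]
```

Passing the contents of dremio.conf to `missing_auth_keys` returns an empty list when both keys are present with the expected spelling.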

Create a new file inside the /conf directory, name it “ad.json” and paste the following lines:

{
    "connectionMode": "ANY_SSL",
    "servers": [
        {
            "hostname": "dremio.ldap.okta.com",
            "port": 636
        }
    ],
    "names": {
        "bindDN": "uid=naren,dc=dremio,dc=okta,dc=com",
        "bindPassword": "<password>",
        "baseDN": "ou=users,dc=dremio,dc=okta,dc=com",
        "userFilter": "&(objectclass=inetorgperson)",
        "userAttributes": {
            "baseDNs": [
                "ou=users,dc=dremio,dc=okta,dc=com"
            ],
            "searchScope": "SUB_TREE",
            "firstname": "givenName",
            "id": "uid",
            "lastname": "sn",
            "email": "email"
        },
        "groupMembership": "memberOf",
        "groupDNs": ["cn={0},ou=groups,dc=dremio,dc=okta,dc=com"],
        "groupFilter": "(objectClass=groupofUniqueNames)",
        "autoAdminFirstUser": true
    }
}
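
Because ad.json is edited by hand, it can help to confirm that it parses and contains the fields shown above before starting Dremio. The following is a minimal sketch under stated assumptions: `REQUIRED_NAMES` simply lists the fields used in this tutorial's file and is not an authoritative list of what Dremio requires, and `expand_group_dn` assumes that `{0}` in `groupDNs` stands for the group name. Both function names are hypothetical.

```python
import json

# Fields under "names" that appear in this tutorial's ad.json (an assumption,
# not an official list of required settings).
REQUIRED_NAMES = (
    "bindDN", "bindPassword", "baseDN", "userFilter",
    "userAttributes", "groupMembership", "groupDNs", "groupFilter",
)


def check_ldap_config(text: str) -> list:
    """Parse the JSON text and return a list of problems (empty if none found)."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"not valid JSON: {err}"]
    if not isinstance(config, dict):
        return ["top level must be a JSON object"]
    problems = []
    servers = config.get("servers") or []
    if not servers:
        problems.append("no servers listed")
    for server in servers:
        if not isinstance(server, dict) or "hostname" not in server or "port" not in server:
            problems.append(f"server entry missing hostname or port: {server}")
    names = config.get("names") or {}
    for key in REQUIRED_NAMES:
        if key not in names:
            problems.append(f"names.{key} is missing")
    return problems


def expand_group_dn(template: str, group: str) -> str:
    """Substitute a group name for the {0} placeholder (assumed semantics)."""
    return template.replace("{0}", group)
```

Under that assumption, `expand_group_dn("cn={0},ou=groups,dc=dremio,dc=okta,dc=com", "Engineering")` yields the DN that the Engineering group would be looked up under.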

Notice that the hostname points to “dremio.ldap.okta.com”, since Okta is the provider we are using.

Now, double check that there are no Dremio processes running on your machine

sudo ps -ef | grep dremio

At this point we can start Dremio. Navigate to the /bin directory inside Dremio and execute the following command:

./dremio start

I’m going to log into Dremio using my administrator username. Notice that I’m using the email address as username.

Working With Spaces

I’m going to create two spaces, one for Engineering and another for Human Resources, and specify the respective groups in the sharing settings. I will also upload the data for each team and move these datasets to their respective spaces.

At this point I should be able to see both spaces on the main screen

We will skip the steps for uploading the datasets to each space. If you would like to review those steps, please see our “Working With Your First Dataset” tutorial.

Verifying Access to the Spaces

Now I can test access to each space by logging in as a user from each group.

We can confirm that Polar Bear has access only to Human Resources

And Beluga has access only to Engineering

Working with Sensitive Data

In this scenario we are going to test how this works when sensitive data has been made accessible to several groups. To mask the sensitive data, I will follow the same methodology as in our Dynamic Masking tutorial.

In this case I have placed a dataset containing Social Security Number and credit card information in a common space that everyone can access. I want to share this dataset with several groups without having to create a separately masked copy for each one.

I’m going to mask the data for those users who are not members of the “Human Resources” group using the following query

-- Members of 'Human Resources' see the raw values; everyone else sees masked ones.
SELECT employee_id, first_name, last_name,
  CASE
    WHEN is_member('Human Resources') THEN ssn
    ELSE CONCAT('XXXXXXX', SUBSTR(ssn, 8, 4))  -- keep the 4 characters starting at position 8
  END AS ssn,
  CASE
    WHEN is_member('Human Resources') THEN credit_card
    ELSE CONCAT('XXXX-XXXX-XXXX-', SUBSTR(credit_card, 16, 4))  -- keep the 4 characters starting at position 16
  END AS credit_card,
  CASE
    WHEN is_member('Human Resources') THEN cc_code
    ELSE NULL  -- hide the code entirely from other groups
  END AS cc_code,
  email, phone_number, hire_date, job_id, salary, commission_pct, manager_id, department_id
FROM "@lucio@dremio.com"."firefly-ssn-employee_ssn_orig"
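
To make the offsets in the query easier to follow, here is a small Python sketch that mirrors the same masking rules outside of Dremio. It assumes SSNs are formatted like `123-45-6789` and card numbers like `1234-5678-9012-3456`. The tutorial does not state these formats, but they are consistent with the positions used in the query. The function names are hypothetical.

```python
def sql_substr(text: str, start: int, length: int) -> str:
    """Mimic SQL SUBSTR, which counts positions from 1 rather than 0."""
    return text[start - 1:start - 1 + length]


def mask_row(ssn: str, credit_card: str, cc_code: str, is_hr_member: bool) -> dict:
    """Apply the same rules as the CASE expressions in the query."""
    if is_hr_member:
        # Human Resources members see every value unchanged.
        return {"ssn": ssn, "credit_card": credit_card, "cc_code": cc_code}
    return {
        "ssn": "XXXXXXX" + sql_substr(ssn, 8, 4),
        "credit_card": "XXXX-XXXX-XXXX-" + sql_substr(credit_card, 16, 4),
        "cc_code": None,  # a CASE with no matching branch yields NULL
    }
```

With those input formats, a non-member keeps only the last four digits of each number, and the card code is hidden completely.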

If I try to access the dataset from the Polar Bear profile, I should be able to see the unmasked data since that user belongs to Human Resources

Now let’s try to access this dataset using Beluga’s username. Remember that Beluga Ice is a member of the Engineering group.

We can see that the sensitive information was correctly masked according to the rules created in the SQL query.

Accessing Sensitive Data From External Tools

We are going to see how this implementation helps us when users try to visualize sensitive data using a BI client. This method applies to any BI or data science tool that you would like to use to analyze your data. We will use Tableau as an example.

I’m going to log back in as Polar Bear (Human Resources) and create a report in Tableau. After downloading the .tds file that Dremio generates, I’ll provide the same LDAP credentials as the user I’m logged in as in Dremio.

Then I’ll generate a brief report and verify that the information is available to members of the Human Resources group

Now I will perform the same exercise using a member of the Engineering group

We can observe that dynamic masking has successfully hidden the data from the new user

Conclusion

In this tutorial we demonstrated how easy, practical, and powerful it can be to implement LDAP authentication in your Dremio environment. This feature gives data engineers and data consumers a robust way to keep their data safe in accordance with rigorous security and privacy measures.

We used a trial version of Okta and their early access LDAP server offering. We created separate Spaces for each group and demonstrated that access was granted to each group correctly. Additionally, we took a generic dataset containing sensitive data (SSN and credit card numbers) and used dynamic masking to define column-level permissions on a single dataset. This allowed us to provide different levels of visibility into the data without having to create a separate virtual dataset for each user group working with the same data.

We hope you enjoyed this tutorial. Stay tuned to learn more about how you can gain insights from your data faster using Dremio.