What is ADLS Gen2 - and why it matters
The second generation of ADLS, also known as ADLS Gen2, brings together all the great features of ADLS Gen1 and Azure Blob Storage. ADLS Gen2 can be seen by users as a superset of ADLS Gen1, which we talked about in our ADLS explainer.
Described by Microsoft as a “no-compromise data lake”, ADLS Gen2 extends the capabilities of Azure Blob Storage and is optimized for large-scale analytics workloads. Users can store data once and access it through both the existing Blob Storage interface and an HDFS-compliant file system interface, with no programming changes or data copying when doing database operations.
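As a rough illustration of "store once, access through two interfaces", the sketch below builds the two addresses under which the same object is typically reachable: the Blob endpoint URL and the `abfss://` URI used by HDFS-compatible (ABFS) clients. The account, container, and path names are hypothetical placeholders, and the helper `adls_gen2_uris` is our own illustrative function, not part of any Azure SDK.

```python
def adls_gen2_uris(account: str, container: str, path: str) -> dict:
    """Return the Blob URL and the ABFS URI for one stored object.

    Assumes the standard public-cloud endpoint suffixes
    (blob.core.windows.net and dfs.core.windows.net).
    """
    path = path.lstrip("/")  # both forms expect a path relative to the container
    return {
        # Address seen by existing Blob Storage clients
        "blob": f"https://{account}.blob.core.windows.net/{container}/{path}",
        # Address seen by HDFS-compatible clients through the ABFS driver
        "abfss": f"abfss://{container}@{account}.dfs.core.windows.net/{path}",
    }


# Hypothetical names, for illustration only
uris = adls_gen2_uris("mystorageacct", "analytics", "/raw/2020/events.parquet")
for interface, uri in uris.items():
    print(interface, uri)
```

Both strings point at the same bytes; only the interface used to reach them differs.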
Why does this matter?
Up to this point, if you were going to use Azure Storage for your analytics workloads, the first question that you had to answer was: “Do I need to use ADLS or Blob Storage?” Here’s a comparison of their features:
|ADLS Gen1||Blob Storage|
There’s a trade-off between the two solutions, with neither fully covering the myriad of large-scale analytics workloads that we commonly see. The data lake unification that ADLS Gen2 provides allows users to take advantage of the best of both in the same place.
When setting up your Azure account, Azure won’t explicitly ask you if you want to set up an ADLS Gen2 account. Instead, you will be setting up a “Storage Account” that is enabled to support ADLS Gen2. Many users might find that confusing, but a new Storage Account will generally be ADLS Gen2.
What does ADLS Gen2 bring to the table?
Gen2 of Azure Data Lake Storage includes the best of both worlds – ADLS and Blob Storage:
Performance: When making data-driven decisions, time is everything. ADLS Gen2 provides best-in-class storage performance, so fewer computing resources are needed to extract the data to be analyzed. This means faster insights from data as well as reduced costs.
Scalability: For big data analytics, this is one of the most important factors. ADLS Gen2 provides an elastic, scalable environment that can easily adapt to the ever-increasing volume of data that needs to be analyzed.
Security: Keeping your data secure should be a high priority when working with cloud technologies. There are multiple security features provided when creating a data lake in Azure:
Azure Active Directory (AAD) integration provides users with seamless secure access to the apps they are working with in the Azure cloud. It also allows them to deploy, manage and monitor security policies across the entire environment.
A combination of Azure Role-Based Access Control and POSIX ACLs provides flexible and elastic data access control.
Data encryption at rest and, via TLS, in transit.
Hierarchical File System (HFS): allows the collection of objects/files within an account to be organized into a hierarchy of directories and nested subdirectories in the same way that the file system on users’ computers is organized.
High Availability: Read-access geo-redundant storage guarantees at least 99.99% availability of your data. Geo-redundant storage replicates your data to another region so that it remains available should a catastrophic event compromise the original storage location.
Virtually Limitless Storage: Store files up to 5 TB, with unlimited overall storage capacity
Storage Tiers: Blob access tiers (Hot, Cool, Archive) each provide an efficient storage option based on how often you access your data and on your access patterns, and each tier has its own pricing model.
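To make the hierarchical file system item above more concrete, here is a minimal sketch that turns a list of flat, slash-delimited object names into a nested directory tree. It is only a local illustration of the directory/subdirectory idea, not how ADLS Gen2 implements its namespace; the function `build_tree` and the sample paths are hypothetical.

```python
def build_tree(object_names):
    """Group slash-delimited object names into nested dictionaries.

    Directories are dicts; files are stored with the value None.
    """
    root = {}
    for name in object_names:
        parts = name.strip("/").split("/")
        node = root
        # Walk (and create as needed) every directory on the way to the file
        for directory in parts[:-1]:
            node = node.setdefault(directory, {})
        node[parts[-1]] = None  # leaf entry marks a file
    return root


# Hypothetical object names, for illustration only
tree = build_tree([
    "raw/2020/a.csv",
    "raw/2020/b.csv",
    "curated/c.parquet",
])
print(tree)
```

In a flat namespace the slashes are just characters in the object name; with a hierarchical namespace the directories are real entities that can be listed, secured, and organized like folders on a local file system.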
Upgrading or migrating to ADLS Gen2 allows you to immediately start taking advantage not just of these features, but also of all Gen2’s security and performance improvements. However, we recommend that you check ADLS Gen2’s known issues page to make sure that there aren’t any issues with any of the features that you are thinking of using.
One of the characteristics that makes ADLS so attractive is its cost-effectiveness. Its pay-per-use model has no upfront cost: users pay only for the number of gigabytes stored and for the number of transactions (reads and writes) performed on that data.
ADLS Gen2 storage prices can range from $0.002/GB for “Archive” to $0.0184/GB for “Hot” storage. Transaction prices can range anywhere from $0.065 to $6.50 depending on the operation (read or write) and the storage tier that the operation is performed on. Transaction prices also vary depending on whether the account uses a hierarchical or a flat namespace.
To understand how your bill will look at the end of the month, we first need to understand how Azure breaks down your analytics workloads. Azure counts usage in “transactions”, which are defined as reads or writes of data in sizes ranging from 128 KB up to 4 MB.
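The transaction definition can be sketched as a simple chunking rule. The snippet below is a minimal illustration, assuming only the 4 MB upper bound per transaction stated above; the function name `split_into_transactions` is our own, and actual billing rules may differ in detail.

```python
MB = 1024 * 1024
MAX_TRANSACTION_BYTES = 4 * MB  # upper bound per transaction, as stated in the text


def split_into_transactions(item_bytes, max_bytes=MAX_TRANSACTION_BYTES):
    """Split one read or write into chunks of at most max_bytes each."""
    if item_bytes <= 0:
        raise ValueError("item_bytes must be positive")
    full_chunks, remainder = divmod(item_bytes, max_bytes)
    chunks = [max_bytes] * full_chunks
    if remainder:
        chunks.append(remainder)  # final, smaller chunk
    return chunks


chunks = split_into_transactions(9 * MB)
print(len(chunks), [c // MB for c in chunks])
```

The number of chunks, not the number of items, is what the per-transaction price is applied to.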
For example, if a user places a 9 MB item in ADLS, Azure will break this down into 3 transactions: 4 MB + 4 MB + 1 MB. The monthly cost is calculated from the monthly usage volume (transactions) plus the storage used. An example of the cost breakdown for a common use case would be the following:
Say you have an application that writes data into ADLS at a rate of 10 items per second, each item being 4 MB, and another service that runs for 4 hours a day and reads 1000 items per second. The monthly bill would then be:
Note: For this example we will use the following time period parameters:
- Month = 31 days.
- Hours per month for the always-on application = 730 (an average month; 31 full days would strictly be 744 hours).
- 1 hour = 3600 seconds.
| Item | Monthly usage | Unit price | Monthly cost |
| --- | --- | --- | --- |
| Custom app (writes) | 10 items/second x 3600 x 730 | $0.05 per 10k transactions | $131.40 |
| Reading job | 1000 items/second x 3600 x 4 x 31 | $0.004 per 10k transactions | $178.56 |
| Storage | 3.4 TB/day x 31 days | $0.0184 per GB | $1,939.36 |
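The arithmetic behind these figures can be reproduced with a short script. This is a sketch using only the rates and prices from the example above; it assumes 1 TB = 1000 GB and that storage is charged on the volume accumulated over the 31 days, which is what the storage figure implies. The summed total is our own addition and does not appear in the original breakdown.

```python
from decimal import Decimal

SECONDS_PER_HOUR = 3600
HOURS_PER_MONTH = 730   # always-on application
DAYS_PER_MONTH = 31     # daily reading job and storage growth


def transaction_cost(transactions, price_per_10k):
    """Cost of a number of transactions priced per 10,000 operations."""
    return Decimal(transactions) / Decimal(10_000) * price_per_10k


# Custom app: 10 writes per second, all month (each 4 MB item = 1 transaction)
write_tx = 10 * SECONDS_PER_HOUR * HOURS_PER_MONTH
write_cost = transaction_cost(write_tx, Decimal("0.05"))

# Reading job: 1000 reads per second, 4 hours a day
read_tx = 1000 * SECONDS_PER_HOUR * 4 * DAYS_PER_MONTH
read_cost = transaction_cost(read_tx, Decimal("0.004"))

# Storage: 3.4 TB written per day, priced per GB (assumes 1 TB = 1000 GB)
stored_gb = Decimal("3.4") * 1000 * DAYS_PER_MONTH
storage_cost = stored_gb * Decimal("0.0184")

total = write_cost + read_cost + storage_cost
print(f"writes:  ${write_cost:,.2f}")
print(f"reads:   ${read_cost:,.2f}")
print(f"storage: ${storage_cost:,.2f}")
print(f"total:   ${total:,.2f}")
```

Using `Decimal` rather than floats keeps the currency amounts exact, so the results match the table to the cent.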
Dremio and ADLS
Dremio connects to data lakes like ADLS, Amazon S3, HDFS and more, putting all of your data in one place and providing it structure. We provide an integrated, self-service interface for data lakes, designed for BI users and data scientists. Dremio increases the productivity of these users by allowing them to easily search, curate, accelerate, and share datasets with other users. In addition, Dremio allows companies to run their BI workloads from their data lake infrastructure, removing the need to build cubes or BI extracts.
To learn more about how you can increase the value of your ADLS data using Dremio, check out our tutorial: Building a Cloud Data Lake on Azure with Dremio and ADLS