What Is ADLS Gen2, and Why Does It Matter?
Azure Data Lake Storage (ADLS) Gen2, the second generation of ADLS, combines the features of ADLS Gen1 with those of Azure Blob Storage. Users can think of it as a superset of ADLS Gen1, which we covered in depth in our ADLS explainer.
Described by Microsoft as a “no-compromise data lake”, ADLS Gen2 extends Azure Blob Storage capabilities and is optimized for large-scale analytics workloads. Users can store data once and access it through existing Blob Storage and HDFS-compliant file system interfaces, with no programming changes or data copying when performing database operations.
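To illustrate the "store once, access through both interfaces" idea, here is a minimal Python sketch that builds the two addresses for the same stored object: an HTTPS URL on the Blob Storage endpoint and an `abfss://` URI of the kind used by Hadoop-compatible engines. The account, container, and path names are hypothetical placeholders, and the helper functions are our own illustration rather than part of any Azure SDK.

```python
from urllib.parse import quote


def blob_url(account: str, container: str, path: str) -> str:
    """HTTPS URL of the object on the Blob Storage endpoint."""
    return f"https://{account}.blob.core.windows.net/{container}/{quote(path)}"


def abfs_uri(account: str, container: str, path: str) -> str:
    """ABFS URI of the same object on the Data Lake (dfs) endpoint."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{quote(path)}"


if __name__ == "__main__":
    # Hypothetical names, used only for illustration.
    account, container, path = "examplelake", "raw", "sales/2020/orders.parquet"
    print(blob_url(account, container, path))
    print(abfs_uri(account, container, path))
```

Both addresses point at the same data, so an application that speaks the Blob API and an analytics engine that speaks the HDFS-style interface can work on it without making a copy.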
Why does this matter?
Up to this point, if you were going to use Azure Storage for your analytics workloads, the first question that you had to answer was: “Do I need to use ADLS or Blob Storage?” Here’s a comparison of their features:
|ADLS Gen1||Blob Storage|
There’s a trade-off between the two solutions, with neither fully covering the myriad large-scale analytics workloads that we commonly see. The data lake unification that ADLS Gen2 provides allows users to take advantage of the best of both worlds in the same place.
When setting up your Azure account, Azure won’t explicitly ask whether you want to set up an ADLS Gen2 account. Instead, you will set up a “Storage Account” that is enabled to support ADLS Gen2. Many users find that confusing, but a new Storage Account will generally be ADLS Gen2.
What Does ADLS Gen2 Bring to the Table?
Microsoft’s second generation of Azure Data Lake Storage combines the best of both worlds, ADLS and Blob Storage, including:
Performance: When making data-driven decisions, time is everything. ADLS Gen2 provides best-in-class storage performance, so fewer computing resources are needed to extract the data to be analyzed. This translates not only into faster insights from data but also into reduced costs.
Scalability: For big data analytics, this is one of the most important factors. ADLS Gen2 provides an elastic, scalable environment that can easily adapt to the ever-increasing volume of data that needs to be analyzed.
Security: Keeping your data secure and preserving its integrity should be a high priority when working with cloud technologies. Azure provides multiple security features when you create a data lake:
Azure Active Directory (AAD) integration provides users with seamless secure access to the apps they are working with in the Azure cloud. It also allows them to deploy, manage and monitor security policies across the entire environment.
A combination of Azure Role-Based Access Control (RBAC) and POSIX ACLs provides flexible, fine-grained data access control.
Data encryption at rest and, via TLS, in transit.
Hierarchical File System (HFS): allows the collection of objects/files within an account to be organized into a hierarchy of directories and nested subdirectories in the same way that the file system on users’ computers is organized.
High Availability: read-access geo-redundant storage guarantees at least 99.99% availability of your data. Geo-redundant storage replicates your data to a different region so that it remains available should a catastrophic event compromise the original storage location.
Virtually Limitless Storage: store files up to 5 TB, with unlimited overall storage capacity.
Storage Tiers: blob tiers (Hot, Cool, Archive). Each tier has been created to provide an efficient storage option based on how often you access your data and on your access patterns, and each has its own pricing model.
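To make the POSIX ACL point from the Security item above a little more concrete, here is a minimal Python sketch that parses a short-form ACL string (entries of the form `scope:qualifier:permissions`) and looks up a single permission. The user name `analyst` is a hypothetical placeholder, the helpers are our own illustration, and the sketch handles only simple access entries. It does not attempt to reproduce how Azure combines RBAC, masks, and group membership when it evaluates access.

```python
def parse_acl(acl: str) -> dict:
    """Parse a short-form ACL such as 'user::rwx,group::r-x,other::---'.

    Returns a mapping of (scope, qualifier) -> set of granted permission letters.
    An empty qualifier refers to the owning user, owning group, or everyone else.
    """
    entries = {}
    for entry in acl.split(","):
        scope, qualifier, perms = entry.strip().split(":")
        entries[(scope, qualifier)] = {p for p in perms if p != "-"}
    return entries


def is_allowed(entries: dict, scope: str, qualifier: str, permission: str) -> bool:
    """Check whether one ACL entry grants the given permission letter."""
    return permission in entries.get((scope, qualifier), set())


if __name__ == "__main__":
    # Hypothetical ACL: owner has full access, 'analyst' can read and traverse.
    acl = parse_acl("user::rwx,user:analyst:r-x,group::r-x,other::---")
    print(is_allowed(acl, "user", "analyst", "r"))  # read granted
    print(is_allowed(acl, "user", "analyst", "w"))  # write not granted
```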
Upgrading or migrating to ADLS Gen2 lets you immediately take advantage of not just these features but also all the security and performance improvements that have been made available. However, we recommend that you frequently check the ADLS Gen2 known issues page to confirm that there are no open issues with any of the features you are considering using.
One of the characteristics that makes ADLS so attractive is its cost effectiveness. Its pay-per-use model has no upfront cost: users pay only for the gigabytes of data stored and for the number of transactions (reads and writes) performed on that data.
At the time of writing, ADLS Gen2 storage prices range from $0.002/GB for Archive storage to $0.0184/GB for Hot storage. Transaction prices range from $0.065 to $6.50, depending on the operation (read or write) and the storage tier it is performed on. Transaction prices also vary depending on whether the account uses a hierarchical or a flat namespace.
To understand how your bill will look at the end of the month, we first need to understand how Azure breaks down your analytics workloads. Azure breaks items down into “Transactions”, which are defined as reads or writes of data in sizes ranging from 128 KB up to 4 MB.
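As a rough illustration of this breakdown, the following Python sketch counts the transactions needed for a single item, assuming that each started 4 MB chunk is billed as one transaction. The function name and that rounding rule are our assumptions for illustration, so check the current Azure pricing documentation before relying on them.

```python
import math

MAX_TRANSACTION_MB = 4  # upper bound per transaction, as described above


def transactions_for_item(size_mb: float) -> int:
    """Number of transactions for one read or write of an item of size_mb.

    Assumes every started 4 MB chunk counts as one transaction.
    """
    if size_mb <= 0:
        raise ValueError("size_mb must be positive")
    return math.ceil(size_mb / MAX_TRANSACTION_MB)


if __name__ == "__main__":
    for size in (0.5, 4, 10):
        print(f"{size} MB -> {transactions_for_item(size)} transaction(s)")
```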
For example, if a user places a 9 MB item in ADLS, Azure will break it down into 3 transactions: 4 MB + 4 MB + 1 MB. The monthly cost is calculated from the monthly usage volume (transactions) plus the storage used. An example of the cost breakdown for a common use case would be the following:
Say you have an application that writes data into ADLS at a rate of 10 items per second, each item being 4 MB, and another service that runs for 4 hours a day and reads 1000 items per second. The monthly bill would then be:
|Custom app||10 items/second × 3600 × 730||$0.05 per 10k transactions||$131.40|
|Reading job||1000 items/second × 3600 × 4 × 31||$0.04 per 10k transactions||$1,785.60|
|Storage||3.4 Terabyte/Day||$0.0184 per GB||$1,939.36|
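The arithmetic behind these rows can be sketched in a few lines of Python. The function names are ours, and the sketch rests on assumptions read off the table: each 4 MB item counts as one transaction, the custom app runs 730 hours per month, the reading job runs 4 hours a day for 31 days, and the storage total corresponds to 31 days of 3.4 TB (3,400 GB) accumulating at the Hot price.

```python
SECONDS_PER_HOUR = 3600


def transaction_cost(items_per_second: float, hours: float, price_per_10k: float) -> float:
    """Cost of a steady stream of single-transaction items over a number of hours."""
    transactions = items_per_second * SECONDS_PER_HOUR * hours
    return transactions / 10_000 * price_per_10k


def storage_cost(gb_stored: float, price_per_gb: float) -> float:
    """Cost of the stored volume at a flat per-GB price."""
    return gb_stored * price_per_gb


if __name__ == "__main__":
    write_cost = transaction_cost(10, 730, 0.05)     # custom app, always on
    read_cost = transaction_cost(1000, 4 * 31, 0.04)  # reading job, 4 hours a day
    stored = storage_cost(3400 * 31, 0.0184)          # 3.4 TB/day for 31 days

    print(f"Custom app:  ${write_cost:,.2f}")  # $131.40
    print(f"Reading job: ${read_cost:,.2f}")
    print(f"Storage:     ${stored:,.2f}")      # $1,939.36
```

Keeping the rates and prices as parameters makes it easy to rerun the estimate when Azure's prices or your workload change.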
Who is ADLS Gen2 for?
ADLS Gen2 is for customers who use ADLS Gen1, Azure Blob Storage, or both. Because ADLS Gen2 combines the two services, current ADLS Gen1 users won’t see new features beyond those associated with Blob Storage, so they can remain on ADLS Gen1 unless they need those features. The same applies to current Azure Blob Storage users: they can remain in their current environments and save on transaction costs. It is always best practice to define the storage need before selecting the service. For example, if a user only needs to store images or backup files, the simplicity of Azure Blob Storage might be all they need.
Dremio and ADLS
Dremio connects to data lakes like ADLS, Amazon S3, HDFS and more, putting all of your data in one place and giving it structure. We provide an integrated, self-service interface for data lakes, designed for BI users and data scientists. Dremio increases the productivity of these users by allowing them to easily search, curate, accelerate, and share datasets with other users. In addition, Dremio allows companies to run their BI workloads directly on their data lake infrastructure, removing the need to build cubes or BI extracts.
To learn more about how you can increase the value of your ADLS data using Dremio, check out our tutorial: Building a Cloud Data Lake on Azure with Dremio and ADLS