The second generation of ADLS, also known as ADLS Gen2, brings together all the great features of ADLS Gen1 and Azure Blob Storage. ADLS Gen2 can be seen by users as a superset of ADLS Gen1, which we talked about in our ADLS explainer.
Described by Microsoft as a “no-compromise data lake”, ADLS Gen2 extends the capabilities of Azure Blob Storage and is optimized for large-scale analytics workloads. Users can store data once and access it through the existing Blob Storage interface and through HDFS-compliant file system interfaces, with no programming changes or data copying required.
Up to this point, if you were going to use Azure Storage for your analytics workloads, the first question that you had to answer was: “Do I need to use ADLS or Blob Storage?” Here’s a comparison of their features:
|ADLS Gen1||Blob Storage|
There’s a trade-off between both solutions, with neither fully covering the myriad of large scale analytics workloads that we commonly see. The data lake unification that ADLS Gen2 provides allows users to take advantage of the best of both in the same place.
When setting up your Azure account, Azure won’t explicitly ask whether you want to set up an ADLS Gen2 account. Instead, you create a “Storage Account” that is enabled to support ADLS Gen2, which many users find confusing. In practice, a new general-purpose v2 Storage Account becomes an ADLS Gen2 account once its hierarchical namespace option is enabled.
Gen2 of Azure Data Lake Storage includes the best of both worlds – ADLS and Blob Storage:
Performance: When making data-driven decisions, time is everything. ADLS Gen2 provides best-in-class storage performance, so fewer computing resources are needed to extract the data to be analyzed. This means faster insights from data as well as reduced costs.
Scalability: For big data analytics, this is one of the most important factors. ADLS Gen2 provides an elastic, scalable environment that can easily adapt to the ever-increasing volume of data that needs to be analyzed.
Security: Keeping your data secure should be a high priority when working with cloud technologies. There are multiple security features provided when creating a data lake in Azure:
Upgrading or migrating to ADLS Gen2 allows you to immediately start taking advantage not just of these features, but also of all Gen2’s security and performance improvements. However, we recommend that you check ADLS Gen2’s known issues page to make sure that there aren’t any issues with any of the features that you are thinking of using.
One of the characteristics that makes ADLS so attractive is its cost effectiveness. Its pay-per-use model has no upfront cost: users pay only for the number of gigabytes stored and for the number of transactions (reads and writes) performed on that data.
ADLS Gen2 storage prices range from $0.002/GB for archive to $0.0184/GB for “Hot” storage. Transaction prices range from $0.065 to $6.50 depending on the operation (read or write) and the type of storage that the operation is performed on. Transaction prices also vary depending on whether the account uses a hierarchical or a flat namespace.
To understand how your bill will look at the end of the month, we first need to understand how Azure breaks down your analytics workloads. Azure bills activity in “transactions”, which are defined as reads or writes of data in sizes ranging from 128 KB up to 4 MB.
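The transaction rule above can be sketched in a few lines of Python. This is a minimal illustration and not Azure's billing code: the function name `billable_transactions` is hypothetical, and the sketch assumes that every started 4 MB chunk of a read or write counts as one transaction, so any item of 4 MB or less counts as exactly one.

```python
MB = 1024 * 1024
CHUNK_BYTES = 4 * MB  # assumed upper bound on the data covered by one transaction


def billable_transactions(item_bytes: int) -> int:
    """Estimate how many transactions one read or write of an item is billed as.

    Assumption: each started 4 MB chunk is one transaction (ceiling division).
    """
    if item_bytes <= 0:
        raise ValueError("item size must be positive")
    # Integer ceiling division avoids floating-point rounding on large sizes.
    return (item_bytes + CHUNK_BYTES - 1) // CHUNK_BYTES


if __name__ == "__main__":
    for size_mb in (1, 4, 10):
        count = billable_transactions(size_mb * MB)
        print(f"{size_mb} MB item: {count} transaction(s)")
```

Under this assumption, item size matters as much as item count: many reads of items slightly larger than 4 MB are billed as twice as many transactions as the same number of reads of 4 MB items.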
For example, if a user places a 9 MB item in ADLS, Azure will break this down into 3 transactions: 4 MB + 4 MB + 1 MB. The monthly cost is calculated based on the monthly usage volume (transactions) plus the storage used. An example of the cost breakdown for a common use case would be the following:
Say you have an application that writes data into ADLS at a rate of 10 items per second, each item being 4 MB, and another service that runs for 4 hours a day and reads 1,000 items per second. The monthly bill would then be:
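Note: For this example we will use the following time period parameters: 730 hours per month for the always-on writing application and 31 days per month for the 4-hour daily reading job.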
|Item||Monthly usage||Price||Monthly cost|
|Custom app||10 items/second x 3600 x 730||$0.05 every 10k transactions||$131.40|
|Reading job||1000 items/second x 3600 x 4 x 31||$0.004 every 10k transactions||$178.56|
|Storage||3.4 TB/day||$0.0184 per GB||$1,939.36|
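The arithmetic behind these figures can be reproduced with a short Python sketch. The prices and rates are taken from the example above; the constant and function names are hypothetical. The sketch assumes 730 hours and 31 days per month, 1 TB = 1,000 GB, and that the full volume accumulated by the end of the month (3.4 TB per day for 31 days) is billed at the monthly per-GB price, since those assumptions match the storage figure shown.

```python
SECONDS_PER_HOUR = 3600
HOURS_PER_MONTH = 730   # always-on writer
DAYS_PER_MONTH = 31     # reading job runs 4 hours on each of these days

WRITE_PRICE_PER_10K = 0.05     # USD per 10,000 write transactions
READ_PRICE_PER_10K = 0.004     # USD per 10,000 read transactions
STORAGE_PRICE_PER_GB = 0.0184  # USD per GB of "Hot" storage


def transaction_cost(transactions: int, price_per_10k: float) -> float:
    """Cost in USD of a number of transactions priced per 10,000."""
    return transactions / 10_000 * price_per_10k


def monthly_bill() -> dict:
    """Recompute each line of the example bill; items are 4 MB, so 1 transaction each."""
    writes = 10 * SECONDS_PER_HOUR * HOURS_PER_MONTH        # 26,280,000 writes
    reads = 1000 * SECONDS_PER_HOUR * 4 * DAYS_PER_MONTH    # 446,400,000 reads
    stored_gb = 3400 * DAYS_PER_MONTH                       # 3.4 TB/day as 3,400 GB/day

    bill = {
        "custom_app": transaction_cost(writes, WRITE_PRICE_PER_10K),
        "reading_job": transaction_cost(reads, READ_PRICE_PER_10K),
        "storage": stored_gb * STORAGE_PRICE_PER_GB,
    }
    bill["total"] = sum(bill.values())
    return bill


if __name__ == "__main__":
    for item, cost in monthly_bill().items():
        print(f"{item:12s} ${cost:,.2f}")
```

In this example storage accounts for most of the bill, so moving older data to a cheaper tier is likely to save more than reducing the number of reads.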
Dremio connects to data lakes like ADLS, Amazon S3, HDFS and more, putting all of your data in one place and providing it structure. We provide an integrated, self-service interface for data lakes, designed for BI users and data scientists. Dremio increases the productivity of these users by allowing them to easily search, curate, accelerate, and share datasets with other users. In addition, Dremio allows companies to run their BI workloads from their data lake infrastructure, removing the need to build cubes or BI extracts.
To learn more about how you can increase the value of your ADLS data using Dremio, check out our tutorial: Building a Cloud Data Lake on Azure with Dremio and ADLS