As the name suggests, a data lakehouse architecture combines a data lake and a data warehouse. It is more than a simple integration of the two, though: the idea is to bring out the best of both architectures, namely the reliable transactions of a data warehouse and the scalability and low cost of a data […]
The Semantic Layer. The semantic layer is a business representation of corporate data for end users. In most data architectures, the semantic layer sits between your data store (such as a data warehouse or data lake) and the consumption tools your end users rely on. Because the data is represented in a business-friendly format, data analysts can create meaningful dashboards and […]
Data Lineage Definition: Data lineage refers to the data’s “line of descent.” In other words, it’s a record of how data got to a specific location and the intermediate steps and transformations that took place as it traveled through business systems. For organizations that depend on data, understanding where data comes from, evaluating its quality, […]
Data mesh is a decentralized approach to data management that focuses on domain-driven design (DDD). It aims to bring data closer to business units or domains, where people are responsible for generating, governing, and treating the data as a product. A data mesh is an architectural approach to designing data-driven applications. It provides a way […]
To make data available to data consumers like analysts for analytics and reporting, businesses need to aggregate data sources. Data virtualization and data lakes are popular approaches to breaking down data silos and providing centralized data access. Your approach can significantly impact scalability, cost, and performance, so it’s important to understand the differences.
A data pipeline moves data between systems. Data pipelines involve a series of data processing steps to move data from source to target. These steps may involve copying data, moving it from an on-premises system to the cloud, standardizing it, joining it with other data sources, and more.
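The steps named above (copying, standardizing, joining, and writing to a target) can be sketched as a chain of small functions. This is only an illustrative sketch under stated assumptions: the in-memory `ORDERS` and `CUSTOMERS` data, the field names, and every function name are hypothetical, and a real pipeline would read from and write to actual systems.

```python
# Hypothetical in-memory "source" data; a real pipeline would read from
# databases, files, or APIs instead.
ORDERS = [
    {"order_id": 1, "customer": " Alice ", "amount": "19.90"},
    {"order_id": 2, "customer": "BOB", "amount": "5"},
]
CUSTOMERS = {"alice": "EU", "bob": "US"}  # region lookup keyed by normalized name


def extract(rows):
    """Copy step: yield shallow copies so the source rows are never mutated."""
    for row in rows:
        yield dict(row)


def standardize(rows):
    """Standardize step: trim and lowercase names, parse amounts as floats."""
    for row in rows:
        row["customer"] = row["customer"].strip().lower()
        row["amount"] = float(row["amount"])
        yield row


def join_region(rows, regions):
    """Join step: enrich each row with a region from a second data source."""
    for row in rows:
        row["region"] = regions.get(row["customer"], "unknown")
        yield row


def load(rows, target):
    """Load step: append the processed rows to the target (a list here)."""
    target.extend(rows)
    return target


def run_pipeline():
    """Run the steps in order, from source to target."""
    return load(join_region(standardize(extract(ORDERS)), CUSTOMERS), [])


if __name__ == "__main__":
    for record in run_pipeline():
        print(record)
```

Each step is a generator, so rows stream through one at a time rather than being materialized between stages; that is a design choice for this sketch, not a requirement of data pipelines in general.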
An enterprise data warehouse (EDW) is a database that centralizes all of a company’s data in one place for reporting. The information kept in an EDW typically originates in operational systems, such as ERP, CRM, and HR systems. The EDW empowers companies to aggregate and structure this data in a format that teams and employees […]
If your organization depends on data, you need a place to store it. Not only that — you need the right kind of data storage and management solution for the data you use and produce. Most organizations find that a data warehouse or data lake meets their needs. Many even use both. Data lakes and […]
If you’ve ever discussed data warehousing, you’ve probably heard the term “ETL.” It refers to processes that allow businesses to access data, modify it, and store it. Organizations use ETL for a variety of reasons, including the efficient management of data and the ability to run business intelligence (BI) against their data. There are several […]
If you prefer videos over written text, here’s a recording of a presentation of this content. In this article, we’ll go through: ✅ What Iceberg is: a table format specification; a set of APIs and libraries for engines to interact with tables following that specification. ❌ What Iceberg is not: a storage engine; an […]
AWS Glue Architecture. AWS Management Console: defines AWS Glue objects such as crawlers, jobs, tables, and connections; sets up a layout for crawlers to work; designs events and timetables for job triggers; searches and filters AWS Glue objects; edits scripts for transformation scenarios. AWS Glue Data Catalog: AWS Glue Data Catalog provides centralized uniform metadata […]