Azure Data Lake Architecture
Azure Data Lake is a set of services designed for big data analytics:
- Azure Data Lake Storage: massively scalable, secure data lake functionality
- HDInsight: cloud Apache Spark and Hadoop® service for the enterprise
  - Apache Hadoop
  - Apache Spark
  - Apache Kafka
  - Apache HBase
  - Apache Hive LLAP
  - Apache Storm
  - ML Services (R, Python, ML.NET)
- Azure Data Factory (ADF): fully managed cloud-based data integration
Azure Data Lake lets you work with your data through both file system and object storage paradigms.
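To make the two paradigms concrete, the sketch below maps one storage URI to a file-system-style endpoint and an object-style endpoint. It assumes Azure Data Lake Storage Gen2 addressing conventions (the `abfss://` scheme and the `dfs` and `blob` endpoints). The account name `examplelake`, the container `raw`, and the helper `endpoints_for` are hypothetical names chosen for illustration.

```python
from urllib.parse import urlparse


def endpoints_for(abfss_uri: str) -> dict:
    """Return file-system-style and object-style URLs for one abfs(s):// URI.

    Hypothetical helper: assumes the URI has the form
    abfss://<container>@<account>.dfs.core.windows.net/<path>.
    """
    parsed = urlparse(abfss_uri)
    if parsed.scheme not in ("abfs", "abfss"):
        raise ValueError(f"not an abfs(s) URI: {abfss_uri}")

    # The network location carries both the container and the account host.
    container, _, host = parsed.netloc.partition("@")
    if not container or not host:
        raise ValueError(f"expected <container>@<account host>: {abfss_uri}")

    account = host.split(".")[0]
    path = parsed.path.lstrip("/")
    return {
        # Hierarchical (file system) view of the data.
        "file_system": f"https://{account}.dfs.core.windows.net/{container}/{path}",
        # Object (blob) view of the same data.
        "object": f"https://{account}.blob.core.windows.net/{container}/{path}",
    }


urls = endpoints_for(
    "abfss://raw@examplelake.dfs.core.windows.net/sales/2024/orders.csv"
)
print(urls["file_system"])
print(urls["object"])
```

Both URLs point at the same stored bytes. Only the access paradigm differs: directories and files in one case, containers and objects in the other.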
Azure Data Lake is a storage repository that holds a vast amount of raw data in its native format until it is needed.
While a hierarchical data warehouse stores data in files or folders, a data lake uses a flat architecture to store data.
Each data element in a Data Lake is assigned a unique identifier and tagged with a set of extended metadata tags.
When a business question arises, the Data Lake can be queried for relevant data, and that smaller set of data can then be analyzed to help answer the question.
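The flat model described above can be sketched in a few lines of Python. This is only an in-memory illustration, not how Azure Data Lake is implemented: the `FlatLake` and `DataElement` classes, the tag names, and the sample payloads are all hypothetical.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class DataElement:
    payload: bytes  # raw data, kept in its native format
    tags: dict      # extended metadata tags
    # Every element gets its own unique identifier.
    element_id: str = field(default_factory=lambda: str(uuid.uuid4()))


class FlatLake:
    """Hypothetical in-memory stand-in for a flat data lake."""

    def __init__(self):
        # One flat namespace keyed by identifier; no folder hierarchy.
        self._elements = {}

    def put(self, payload: bytes, **tags) -> str:
        element = DataElement(payload, tags)
        self._elements[element.element_id] = element
        return element.element_id

    def query(self, **wanted):
        """Return elements whose tags match every requested key/value pair."""
        return [
            e for e in self._elements.values()
            if all(e.tags.get(k) == v for k, v in wanted.items())
        ]


lake = FlatLake()
lake.put(b"order_id,amount\n1,9.99\n", source="web", kind="orders", year=2024)
lake.put(b'{"clicks": 42}', source="web", kind="clickstream", year=2024)
lake.put(b"order_id,amount\n7,5.00\n", source="store", kind="orders", year=2023)

# A business question ("what did we sell in 2024?") narrows the lake
# to a smaller set of elements that can then be analyzed.
orders_2024 = lake.query(kind="orders", year=2024)
print(len(orders_2024))
```

The query never walks a directory tree. It filters on metadata alone, which is why rich tagging matters in a flat store.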
Azure Data Lake Analytics is a set of analytics services and tools, built on Apache YARN, that complements Azure Data Lake Storage.
The analytics service provisions processing power on demand, so it can handle jobs of any scale, and its pay-as-you-go model is cost effective for short-term or on-demand jobs.
It includes U-SQL, a language that unifies the benefits of SQL with the expressive power of user code and runs on a scalable distributed runtime.
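The idea of combining declarative SQL with user code can be illustrated by analogy. The sketch below is not U-SQL and does not run on Azure: it uses Python's built-in `sqlite3` module to register a user-defined function and call it from a SQL query. The `signups` table, the sample rows, and the `domain_of` function are hypothetical.

```python
import sqlite3


def domain_of(email: str) -> str:
    """User code: extract the lower-cased domain of an e-mail address."""
    return email.rsplit("@", 1)[-1].lower()


conn = sqlite3.connect(":memory:")

# Expose the Python function to SQL under the name domain_of (1 argument).
conn.create_function("domain_of", 1, domain_of)

conn.execute("CREATE TABLE signups (email TEXT)")
conn.executemany(
    "INSERT INTO signups VALUES (?)",
    [("ana@Example.com",), ("bo@example.com",), ("cy@contoso.org",)],
)

# The declarative part (grouping, counting, ordering) stays in SQL,
# while the custom logic lives in ordinary user code.
rows = conn.execute(
    "SELECT domain_of(email) AS domain, COUNT(*) AS n "
    "FROM signups GROUP BY domain_of(email) ORDER BY n DESC, domain"
).fetchall()
print(rows)
conn.close()
```

The same division of labor is what the description of U-SQL points to: set-based operations are expressed in SQL, and anything SQL cannot express cleanly is delegated to user-written functions.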