Azure Data Lake Ingestion

Data Ingestion is the lifeblood of any Data Lake Environment.

In addition to tools like Azure Data Factory, ExcelliMatrix uses our emFramework, as well as third-party ETL tools, to implement a solid Data Ingestion architecture, one that lends itself to strong Data Governance and Monitoring.

One of the primary services used to ingest data into an Azure Data Lake is Azure Data Factory. But when building a robust Data Lake Environment, you can blend Azure services together to produce a much richer solution. Our approach to data ingestion considers the data source and the volatility of the data, from static to real-time. By using the right tool (service) for each source data set, we can capitalize on the Azure services available to us, including:

  • Azure Data Factory
  • Azure Event Grid
  • Azure Event Hub
  • Azure IoT Hub
  • Azure Service Bus
These Azure services, along with some of the tooling we provide in our .NET / .NET Core framework (emFramework), allow us to deliver a solution that is truly enterprise class.
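As a minimal illustration of the streaming side, the sketch below publishes a telemetry record to Azure Event Hubs with the azure-eventhub Python SDK. The connection string, hub name, and payload are placeholder assumptions, not values from a real environment.

```python
from azure.eventhub import EventHubProducerClient, EventData

# Placeholder connection details -- substitute your own namespace and hub.
EVENTHUB_CONN_STR = "Endpoint=sb://<namespace>.servicebus.windows.net/;..."
EVENTHUB_NAME = "telemetry"  # hypothetical hub name

def publish_telemetry(record: str) -> None:
    """Send a single telemetry record to Event Hubs (real-time sources)."""
    producer = EventHubProducerClient.from_connection_string(
        EVENTHUB_CONN_STR, eventhub_name=EVENTHUB_NAME
    )
    with producer:
        batch = producer.create_batch()  # events are always sent in batches
        batch.add(EventData(record))
        producer.send_batch(batch)

publish_telemetry('{"device": "sensor-01", "reading": 72.4}')
```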

Overview

Ingesting data into an Azure Data Lake isn't complex, but it does require a strong understanding of the source data, the Azure architecture, and the expected outcome. When implementing each ingestion process, it is important to ensure that several organizational needs are taken into consideration.
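For example, landing a static extract in the lake's raw zone can be as small as the following sketch using the azure-storage-file-datalake SDK. The account URL, credential, file system, and paths are hypothetical.

```python
from azure.storage.filedatalake import DataLakeServiceClient

# Hypothetical account and container names, for illustration only.
service = DataLakeServiceClient(
    account_url="https://<account>.dfs.core.windows.net",
    credential="<account-key>",
)
file_system = service.get_file_system_client("raw")

def land_file(local_path: str, lake_path: str) -> None:
    """Upload a local extract into the raw zone of the lake."""
    file_client = file_system.get_file_client(lake_path)
    with open(local_path, "rb") as data:
        file_client.upload_data(data, overwrite=True)

land_file("daily_extract.csv", "sales/2024/daily_extract.csv")
```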

Performance

The performance needs of real-time data ingestion are obviously different from those of static data, yet too many implementations take the same approach for both. Performance must be a design consideration for each source data set being implemented.
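One concrete example of that design consideration: for high-volume streams, filling an Event Hubs batch to capacity amortizes network round-trips, whereas sending one event per call does not. A hedged sketch, assuming the same placeholder connection details as above:

```python
from azure.eventhub import EventHubProducerClient, EventData

def publish_many(records: list[str]) -> None:
    """Pack records into size-limited batches instead of per-event sends."""
    producer = EventHubProducerClient.from_connection_string(
        "Endpoint=sb://<namespace>.servicebus.windows.net/;...",  # placeholder
        eventhub_name="telemetry",  # hypothetical
    )
    with producer:
        batch = producer.create_batch()
        for record in records:
            try:
                batch.add(EventData(record))
            except ValueError:  # batch is full -- flush and start a new one
                producer.send_batch(batch)
                batch = producer.create_batch()
                batch.add(EventData(record))
        if len(batch) > 0:
            producer.send_batch(batch)
```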

Monitoring

To ensure the Data Lake Environment is functioning on a minute-to-minute basis, and is ready to support the organization's objectives, a strong approach to monitoring data ingestion is critical. Implementing an event-centric ingestion architecture (the "Information Pipeline") provides a rich set of options in a publish-subscribe orientation. Our approach enables monitoring and alerting solutions that are otherwise unavailable, or ignored, in a typical implementation.
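As one way to realize that publish-subscribe orientation, each ingestion step could emit a status event to an Azure Event Grid topic, where monitoring and alerting subscribers pick it up. A minimal sketch with the azure-eventgrid SDK; the topic endpoint, key, and event schema are assumptions for illustration.

```python
from azure.core.credentials import AzureKeyCredential
from azure.eventgrid import EventGridPublisherClient, EventGridEvent

# Hypothetical topic endpoint and access key.
client = EventGridPublisherClient(
    "https://<topic>.<region>-1.eventgrid.azure.net/api/events",
    AzureKeyCredential("<topic-key>"),
)

def emit_ingestion_status(source: str, status: str, rows: int) -> None:
    """Publish a status event that monitoring subscribers can alert on."""
    client.send(EventGridEvent(
        subject=f"ingestion/{source}",
        event_type="Ingestion.StatusChanged",  # hypothetical event type
        data={"source": source, "status": status, "rows": rows},
        data_version="1.0",
    ))

emit_ingestion_status("sales-extract", "Completed", rows=125_000)
```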

Repository Initialization (Bulk Loading)

By implementing an event-centric data ingestion architecture, the same ingestion pipeline can be used for both standard data loading and data lake initialization (bulk data loading), with the key difference being scale. Taking an event-centric approach means the standard data load path is exercised, and tested, as part of repository initialization, which also tests the scaling capabilities of the Data Lake Environment.
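To make the scale difference concrete, a bulk initialization might simply drive the same landing function with more concurrency. A sketch, assuming the hypothetical land_file helper from the earlier example:

```python
from concurrent.futures import ThreadPoolExecutor

def initialize_repository(extracts: dict[str, str], workers: int = 16) -> None:
    """Bulk-load by fanning the standard landing function out across threads.

    `extracts` maps local file paths to lake paths; only the degree of
    parallelism differs from a routine daily load.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for local_path, lake_path in extracts.items():
            pool.submit(land_file, local_path, lake_path)
```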

Our team knows the importance of the work we do for our clients. We know that our efforts have a direct impact on your productivity, profitability, and success, so we take our tasks seriously! We look forward to providing your company with strong ROI and value.