Data Lake

Updated at May 5th, 2025


Like a data warehouse, the data lake architecture is based on the notion of ingesting the operational systems’ data and transforming it into an analytical model. However, there is a conceptual difference between the two approaches.

A data lake–based system ingests the operational systems’ data. However, instead of being transformed right away into an analytical model, the data is persisted in its raw form, that is, in the original operational model.
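To make "persisted in its raw form" concrete, here is a minimal sketch of ingestion into a lake. It assumes a hypothetical RawZone class that stands in for the lake's storage (in practice usually an object store) and a hypothetical ingest method; the record contents are invented for illustration. The only thing added to each record is ingestion metadata. The payload keeps the operational model exactly as the source system produced it.

```python
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class RawZone:
    """Hypothetical stand-in for the lake's raw storage (e.g., an object store)."""
    files: list = field(default_factory=list)

    def ingest(self, source: str, record: dict) -> None:
        # Persist the record exactly as the operational system produced it.
        # Only ingestion metadata is added; no analytical transformation happens here.
        envelope = {
            "source": source,
            "ingested_at": datetime.now(timezone.utc).isoformat(),
            "payload": record,
        }
        self.files.append(json.dumps(envelope))


lake = RawZone()
lake.ingest("orders-service", {"order_id": 17, "total": "42.50", "status": "PAID"})
lake.ingest("crm", {"customerId": "c-9", "name": "Ada"})
```

Note that the two sources use different field naming conventions. The lake accepts both without complaint, which is exactly what makes ingestion cheap.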

The raw data, however, does not fit the needs of data analysts as is. As a result, it is the job of data engineers and BI engineers to make sense of the data in the lake and to implement the ETL scripts that generate analytical models and feed them into a data warehouse. Figure 16-10 depicts a data lake architecture.

Figure 16-10. Data lake architecture
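The ETL step described above can be sketched as a function that reads raw lake records, keeps only the ones relevant to one analytical model, and maps operational fields to analytical ones. This is an illustrative sketch only. The RAW_RECORDS sample, the etl_sales_facts function, and the order_key/revenue fact schema are all hypothetical, and the "load" step is reduced to returning the rows instead of writing them to a real warehouse.

```python
import json
from decimal import Decimal

# Hypothetical raw records as they might sit in the lake: envelopes around
# the untouched operational payload.
RAW_RECORDS = [
    '{"source": "orders-service", "payload": {"order_id": 1, "total": "19.99", "status": "PAID"}}',
    '{"source": "orders-service", "payload": {"order_id": 2, "total": "5.00", "status": "CANCELLED"}}',
    '{"source": "crm", "payload": {"customerId": "c-9", "name": "Ada"}}',
]


def etl_sales_facts(raw_lines):
    """Extract order records from the lake, transform them into fact rows,
    and return rows ready to be loaded into a warehouse table."""
    facts = []
    for line in raw_lines:
        record = json.loads(line)  # extract: parse the raw envelope
        if record["source"] != "orders-service":
            continue  # this model only cares about orders
        order = record["payload"]
        if order["status"] != "PAID":
            continue  # only completed sales count as revenue
        # transform: map operational fields to the analytical schema
        facts.append({"order_key": order["order_id"], "revenue": Decimal(order["total"])})
    return facts


# load: stand-in for inserting the rows into a warehouse fact table
warehouse_sales_fact = etl_sales_facts(RAW_RECORDS)
```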

Since the operational systems’ data is persisted in its original, raw form and is transformed only afterward, the data lake allows working with multiple, task-oriented analytical models. One model can be used for reporting, another for training ML models, and so on. Furthermore, new models can be added in the future and initialized with the existing raw data.
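As a rough illustration of task-oriented models, the sketch below derives two different analytical models from the same raw order data: a daily revenue report and per-customer features that could feed an ML model. The RAW_ORDERS sample and both function names are hypothetical; the point is only that each model is a separate transformation over one shared raw dataset.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical raw order payloads kept in the lake in their operational form.
RAW_ORDERS = [
    {"order_id": 1, "customer": "c-1", "day": "2025-05-01", "total": "10.00"},
    {"order_id": 2, "customer": "c-2", "day": "2025-05-01", "total": "30.00"},
    {"order_id": 3, "customer": "c-1", "day": "2025-05-02", "total": "20.00"},
]


def reporting_model(orders):
    """Reporting model: revenue aggregated per day."""
    revenue = defaultdict(Decimal)  # Decimal() == Decimal("0")
    for o in orders:
        revenue[o["day"]] += Decimal(o["total"])
    return dict(revenue)


def ml_feature_model(orders):
    """ML input model: per-customer order count and average order value."""
    totals = defaultdict(list)
    for o in orders:
        totals[o["customer"]].append(Decimal(o["total"]))
    return {c: {"orders": len(v), "avg_total": sum(v) / len(v)} for c, v in totals.items()}


report = reporting_model(RAW_ORDERS)
features = ml_feature_model(RAW_ORDERS)
```

A model added later would simply be another function run over the same RAW_ORDERS, so it starts with the full history rather than only with data ingested after it was created.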

That said, the delayed generation of analytical models increases the complexity of the overall system. It’s not uncommon for data engineers to implement and support multiple versions of the same ETL script to accommodate different versions of the operational model, as shown in Figure 16-11.

Figure 16-11. Multiple versions of the same ETL script accommodating different versions of the operational model
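The versioning problem in Figure 16-11 can be sketched as a dispatch table with one transformation per operational schema version that still exists in the lake. This assumes each raw record carries a schema_version field, which is a hypothetical convention; the two payload layouts (a decimal-string total in v1, integer cents under renamed fields in v2) are also invented for illustration.

```python
from decimal import Decimal


def transform_v1(payload):
    # v1 of the operational model: total as a decimal string, e.g. "12.50"
    return {"order_key": payload["order_id"], "revenue": Decimal(payload["total"])}


def transform_v2(payload):
    # v2 renamed the key field and switched to integer cents
    return {"order_key": payload["id"], "revenue": Decimal(payload["amount_cents"]) / 100}


# One ETL step per schema version still present in the lake.
TRANSFORMS = {1: transform_v1, 2: transform_v2}


def etl(records):
    rows = []
    for rec in records:
        transform = TRANSFORMS.get(rec["schema_version"])
        if transform is None:
            raise ValueError(f"no ETL step for schema version {rec['schema_version']}")
        rows.append(transform(rec["payload"]))
    return rows


rows = etl([
    {"schema_version": 1, "payload": {"order_id": 7, "total": "12.50"}},
    {"schema_version": 2, "payload": {"id": 8, "amount_cents": 1250}},
])
```

Every new version of the operational model adds another entry to TRANSFORMS, and old entries cannot be dropped while raw records in the old format remain in the lake.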

Furthermore, since data lakes are schema-less (no schema is imposed on the incoming data) and there is no control over the quality of the incoming data, the data in the lake becomes chaotic at a certain scale. Data lakes make it easy to ingest data but much more challenging to make use of it. Or, as is often said, a data lake becomes a data swamp. Making sense of the chaos and extracting useful analytical data makes the data scientist’s job orders of magnitude more complex.
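Because nothing is validated at ingestion, consumers end up checking the data when they read it. The following is a minimal sketch of that read-time triage, assuming a hypothetical EXPECTED schema that one consumer wants; the three sample records are invented to show the kinds of drift (wrong types, renamed fields) that accumulate when the lake enforces nothing.

```python
# Hypothetical expectations a consumer applies when reading the lake;
# nothing enforced these rules when the data was ingested.
EXPECTED = {"order_id": int, "total": str}


def check(record):
    """Return a list of problems found in one raw record."""
    problems = []
    for name, typ in EXPECTED.items():
        if name not in record:
            problems.append(f"missing {name}")
        elif not isinstance(record[name], typ):
            problems.append(f"{name} is {type(record[name]).__name__}, expected {typ.__name__}")
    return problems


def triage(records):
    """Split raw records into usable ones and rejected ones with their problems."""
    usable, rejected = [], []
    for rec in records:
        issues = check(rec)
        if issues:
            rejected.append((rec, issues))
        else:
            usable.append(rec)
    return usable, rejected


raw = [
    {"order_id": 1, "total": "9.99"},
    {"order_id": "2", "total": "5.00"},  # key stored as a string
    {"orderId": 3, "total": 7.5},        # renamed field, numeric total
]
usable, rejected = triage(raw)
```

Each consumer has to repeat this kind of triage for its own needs, which is one way the extra work described above shows up in practice.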
