20 March 2021

🧭Business Intelligence: New Technologies, Old Challenges (Part II - ETL vs. ELT)

 

Business Intelligence

Data lakes and similar cloud-based repositories drove the requirement of loading the raw data before performing any transformations on the data. At least that’s the approach the new wave of ELT (Extract, Load, Transform) technologies use to handle analytical and data integration workloads, which is probably recommendable for the mentioned cloud-based contexts. However, ELT technologies are especially relevant when is needed to handle data with high velocity, variance, validity or different value of truth (aka big data). This because they allow processing the workloads over architectures that can be scaled with workloads’ demands.

This is probably the most important aspect, even if there can be further advantages, like using built-in connectors to a wide range of sources or implementing complex data flow controls. The ETL (Extract, Transform, Load) tools have the same capabilities, maybe reduced to certain data sources, though their newer versions seem to bridge the gap.

One of the most stressed advantages of ELT is the possibility of having all the (business) data in the repository, though these are not technological advantages. The same can be obtained via ETL tools, even if this might involve upon case a bigger effort, effort depending on the functionality existing in each tool. It’s true that ETL solutions have a narrower scope by loading a subset of the available data, or that transformations are made before loading the data, though this depends on the scope considered while building the data warehouse or data mart, respectively the design of ETL packages, and both are a matter of choice, choices that can be traced back to business requirements or technical best practices.

Some of the advantages seen are context-dependent – the context in which the technologies are put, respectively the problems are solved. It is often imputed to ETL solutions that the available data are already prepared (aggregated, converted) and new requirements will drive additional effort. On the other side, in ELT-based solutions all the data are made available and eventually further transformed, but also here the level of transformations made depends on specific requirements. Independently of the approach used, the data are still available if needed, respectively involve certain effort for further processing.

Building usable and reliable data models is dependent on good design, and in the design process reside the most important challenges. In theory, some think that in ETL scenarios the design is done beforehand though that’s not necessarily true. One can pull the raw data from the source and build the data models in the target repositories.

Data conversion and cleaning is needed under both approaches. In some scenarios is ideal to do this upfront, minimizing the effect these processes have on data’s usage, while in other scenarios it’s helpful to address them later in the process, with the risk that each project will address them differently. This can become an issue and should be ideally addressed by design (e.g. by building an intermediate layer) or at least organizationally (e.g. enforcing best practices).

Advancing that ELT is better just because the data are true (being in raw form) can be taken only as a marketing slogan. The degree of truth data has depends on the way data reflects business’ processes and the way data are maintained, while their quality is judged entirely on their intended use. Even if raw data allow more flexibility in handling the various requests, the challenges involved in processing can be neglected only under the consequences that follow from this.

Looking at the analytics and data integration cloud-based technologies, they seem to allow both approaches, thus building optimal solutions relying on professionals’ wisdom of making appropriate choices.

Previous Post <<||>>Next Post

No comments:

Related Posts Plugin for WordPress, Blogger...

About Me

My photo
Koeln, NRW, Germany
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.