Showing posts with label refactoring. Show all posts
Showing posts with label refactoring. Show all posts

17 February 2024

Business Intelligence: Microsoft Fabric's Notebooks

Business Intelligence Series
Business Intelligence Series 

When several technologies make their entrance in a data-related field like Data Warehousing, Data Analitics or Data Science, one is forced to understand how the respective technologies can be used or misused, respectively what's their place in the bigger picture. Microsoft Fabric introduces several important technologies that will change the way data are stored, processed and consumed. 

The first important technology is the notebook - a web document-like cell-based container for writing and executing code in a collaborative manner. The concept is not new, Jupyter notebooks have been around for almost a decade. In Microsof Fabric, notebooks support multiple languages, from which a default one applies to the whole notebook, while on cell level any of the supported languages can be used. 

One can execute a single cell, multiple cells or the entire notebook in a sequential manner, mix languages for the various operations - load, transform, save, and visualize data when needed. Notebooks can be parametrized and run via the homonymous activity in Data Factory pipelines, automating thus data processing. Probably more functionality is to come. 

Data engineers seems to have great flexibility, though usually flexibility implies constraints and/or mischiefs in other areas. I see for example in presentations the overuse of temporary data objects (mainly views) in Spark SQL as part of complex logic. That's acceptable during prototyping, though such code becomes a danger as soon the logic is deployed into production. Data objects should be created outside of the logic that uses them and should be treated as artifacts, with version control and proper documentation. It's maybe true that temporary objects reduce the volume of objects in the metastore, though is this the way to go?

Temporary objects tend to lead to wheel's reinvention or they get duplicated across multiple notebooks, which can easily create a maintenance nightmare. One needs to consider that the business logic changes a lot, the requirements and the data sources change, and on the long term, the cost of maintaining the code can easily overweight the benefits. 

Notebooks remind me of the beginnings of web programming when HTML was mixed up with client scripting languages like VB Script or Javascript, CSS, respectively server-side scripting languages. It was kind of a spaghetti code, modified repeatedly by multiple programmers, unendingly duplicated, and through a miracle it worked, until it stopped working unexpectedly in strangest situations. The strangest part was when after removing  commented code from a section made the code run again. 

The debugging of another person's code was a nightmare. Code developed by two people for similar purposes was looking unrecognizable different in terms of structure, programming techniques and layout. The technical debt was high, increasing in exponential manner. One was aware that the code needed refactoring, though there were more important things to do or no time allocated for it.

In the meantime the maturity of programming languages, frameworks, methodologies, best practices, and hopefully of programmers improved the overall quality of software (at least on average). Thinking of software from an Engineer's perspective improved the efficiency and effectiveness of a programmer's endeavor. The average programmer is able to write quality code, though there's a considerable minimum of "engineering" knowledge involved beside the mere knowledge of languages and tools. 

Notebooks are good up to a point, beyond which one needs to take a step back, restructure, move the code where it belongs, take a few more steps back and review the good practices and their application, disseminate the knowledge inside the team and use it in the next iterations, respectively refractor the code when needed! Hopefully, people learned from the mistakes of the past. 

Resources:
[1] Microsoft Learn (2023) How to use Microsoft Fabric notebooks (link

03 October 2006

Girish Suryanarayana - Collected Quotes

"A software design can be described as a collection of design decisions. These design decisions include decisions about what classes should be included, how classes should behave, and how they should interact with each other. Each and every design decision is influenced by previously made design decisions, constraints on the design, and the requirements. In turn, every design decision also impacts the design; it can narrow down the set of future design decisions considerably or widen the scope of possible design decisions. In other words, each and every design decision impacts and even changes the context of the design." (Girish Suryanarayana et al, "Refactoring for Software Design Smells: Managing Technical Debt", 2015)

"An important enabling technique to effectively apply the principle of abstraction is to assign single and meaningful responsibility for each abstraction. In particular, the Single Responsibility Principle says that an abstraction should have a single well-defined responsibility and that responsibility should be entirely encapsulated within that abstraction. An abstraction that is suffering from Multifaced Abstraction has more than one responsibility assigned to it, and hence violates the principle of abstraction." (Girish Suryanarayana et al, "Refactoring for Software Design Smells: Managing Technical Debt", 2015)

"Given the significance of technical debt, it is important to be able to measure it. A prerequisite for measuring something is to be able to quantify it. In this context, technical debt is very difficult to quantify accurately. There are two main reasons for this. First, there is no clear consensus on the factors that contribute to technical debt. Second, there is no standard way to measure these factors." (Girish Suryanarayana et al, "Refactoring for Software Design Smells: Managing Technical Debt", 2015)

"How does a smell manifest in design? A smell occurs as a result of a combination of one or more design decisions. In other words, the design ecosystem itself is responsible for the creation of the smell. The presence of the smell in turn impacts the ecosystem in several ways. First, it is likely that the presence of the smell triggers new design decisions that are needed to address the smell! Second, the smell can potentially influence or constrain future design decisions as a result of which one or more new smells may manifest in the ecosystem. Third, smells also tend to have an effect on other smells. For instance, some smells amplify the effects of other smells, or co-occur with or act as precursors to other smells. Clearly, smells share a rich relationship with the ecosystem in which they occur." (Girish Suryanarayana et al, "Refactoring for Software Design Smells: Managing Technical Debt", 2015)

"Technical debt is the debt that accrues when you knowingly or unknowingly make wrong or non-optimal design decisions. [...] when a software developer opts for a quick fix rather than a proper well-designed solution, he introduces technical debt. It is okay if the developer pays back the debt on time. However, if the developer chooses not to pay or forgets about the debt created, the accrued interest on the technical debt piles up, just like financial debt, increasing the overall technical debt. The debt keeps increasing over time with each change to the software; thus, the later the developer pays off the debt, the more expensive it is to pay off. If the debt is not paid at all, then eventually the pile-up becomes so huge that it becomes immensely difficult to change the software. In extreme cases, the accumulated technical debt is so huge that it cannot be paid off anymore and the product has to be abandoned. Such a situation is called technical bankruptcy." (Girish Suryanarayana et al, "Refactoring for Software Design Smells: Managing Technical Debt", 2015)

"The primary intent behind the principle of encapsulation is to separate the interface and the implementation, which enables the two to change nearly independently. This separation of concerns allows the implementation details to be hidden from the clients who must depend only on the interface of the abstraction. If an abstraction exposes implementation details to the clients, it leads to undesirable coupling between the abstraction and its clients, which will impact the clients whenever the abstraction needs to change its implementation details. Providing more access than required can expose implementation details to the clients, thereby, violating the 'principle of hiding'." (Girish Suryanarayana et al, "Refactoring for Software Design Smells: Managing Technical Debt", 2015)

"The principle of abstraction advocates the simplification of entities through reduction and generalization: reduction is by elimination of unnecessary details and generalization is by identification and specification of common and important characteristics." (Girish Suryanarayana et al, "Refactoring for Software Design Smells: Managing Technical Debt", 2015)

"The principle of encapsulation advocates separation of concerns and information hiding through techniques such as hiding implementation details of abstractions and hiding variations." (Girish Suryanarayana et al, "Refactoring for Software Design Smells: Managing Technical Debt", 2015)

"The principle of hierarchy advocates the creation of a hierarchical organization of abstractions using techniques such as classification, generalization, substitutability, and ordering." (Girish Suryanarayana et al, "Refactoring for Software Design Smells: Managing Technical Debt", 2015)

"The principle of modularization advocates the creation of cohesive and loosely coupled abstractions through techniques such as localization and decomposition." (Girish Suryanarayana et al, "Refactoring for Software Design Smells: Managing Technical Debt", 2015)

"Why is the interest compounding in nature for technical debt? One major reason is that often new changes introduced in the software become interwoven with the debt-ridden design structure, further increasing the debt. Further, when the original debt remains unpaid, it encourages or even forces developers to use 'hacks' while making changes, which further compounds the debt." (Girish Suryanarayana et al, "Refactoring for Software Design Smells: Managing Technical Debt", 2015)

Related Posts Plugin for WordPress, Blogger...

About Me

My photo
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.