Showing posts with label Software Engineering. Show all posts
Showing posts with label Software Engineering. Show all posts

13 June 2024

Business Intelligence: One Person Can’t Learn or Do Everything in Microsoft Fabric

Business Intelligence Series
Business Intelligence Series

Today’s Explicit Measures webcast [1] considered an article written by Kurt Buhler (The Data Goblins): [Microsoft] "Fabric is a Team Sport: One Person Can’t Learn or Do Everything" [2]. It’s a well-written article that deserves some thought as there are several important points made. I can’t say I agree with the full extent of some statements, even if some disagreements are probably just a matter of semantics.

My main disagreement starts with the title “One Person Can’t Learn or Do Everything”. As clarified in webcast's chat, the author defines “everything" as an umbrella for “all the capabilities and experiences that comprise Fabric including both technical (like Power BI) or non-technical (like adoption data literacy) and everything in between” [1].

For me “everything” is relative and considers a domain's core set of knowledge, while "expertise" (≠ "mastery") refers to the degree to which a person can use the respective knowledge to build back-to-back solutions for a given area. I’d say that it becomes more and more challenging for beginners or average data professionals to cover the core features. Moreover, I’d separate the non-technical skills because then one will also need to consider topics like Data, Project, Information or Knowledge Management.

There are different levels of expertise, and they can vary in depth (specialization) or breadth (covering multiple areas), respectively depend on previous experience (whether one worked with similar technologies). Usually, there’s a minimum of requirements that need to be covered for being considered as expert (e.g. certification, building a solution from beginning to the end, troubleshooting, performance optimization, etc.). It’s also challenging to roughly define when one’s expertise starts (or ends), as there are different perspectives on the topics. 

Conversely, the term expert is in general misused extensively, sometimes even with a mischievous intent. As “expert” is usually considered an external consultant or a person who got certified in an area, even if the person may not be able to build solutions that address a customer’s needs. 

Even data professionals with many years of experience can be overwhelmed by the volume of knowledge, especially when one considers the different experiences available in MF, respectively the volume of new features released monthly. Conversely, expertise can be considered in respect to only one or more MF experiences or for one area within a certain layer. Lot of the knowledge can be transported from other areas – writing SQL and complex database objects, modelling (enterprise) semantic layers, programming in Python, R or Power Query, building data pipelines, managing SQL databases, etc. 

Besides the standard documentation, training sessions, and some reference architectures, Microsoft made available also some labs and other material, which helps discovering the features available, though it doesn’t teach people how to build complete solutions. I find more important than declaring explicitly the role-based audience, the creation of learning paths for the various roles.

During the past 6-7 months I've spent on average 2 days per week learning MF topics. My problem is not the documentation but the lack of maturity of some features, the gaps in functionality, identifying the respective gaps, knowing what and when new features will be made available. The fact that features are made available or changed while learning makes the process more challenging. 

My goal is to be able to provide back-to-back solutions and I believe that’s possible, even if I might not consider all the experiences available. During the past 22 years, at least until MF, I could build complete BI solutions starting from requirements elicitation, data extraction, modeling and processing for data consumption, respectively data consumption for the various purposes. At least this was the journey of a Software Engineer into the world of data. 

References:
[1] Explicit Measures (2024) Power BI tips Ep.328: Microsoft Fabric is a Team Sport (link)
[2] Data Goblins (2024) Fabric is a Team Sport: One Person Can’t Learn or Do Everything (link)

16 March 2024

Business Intelligence: A Software Engineer's Perspective VII (Think for Yourself!)

Business Intelligence
Business Intelligence Series

After almost a quarter-century of professional experience the best advice I could give to younger professionals is to "gather information and think for themselves", and with this the reader can close the page and move forward! Anyway, everybody seems to be looking for sudden enlightenment with minimal effort, as if the effort has no meaning in the process!

In whatever endeavor you are caught, it makes sense to do upfront a bit of thinking for yourself - what's the task, or more general the problem, which are the main aspects and interpretations, which are the goals, respectively the objectives, how a solution might look like, respectively how can it be solved, how long it could take, etc. This exercise is important for familiarizing yourself with the problem and creating a skeleton on which you can build further. It can be just vague ideas or something more complex, though no matter the overall depth is important to do some thinking for yourself!

Then, you should do some research to identify how others approached and maybe solved the problem, what were the justifications, assumptions, heuristics, strategies, and other tools used in sense-making and problem solving. When doing research, one should not stop with the first answer and go with it. It makes sense to allocate a fair amount of time for information gathering, structuring the findings in a reusable way (e.g. tables, mind maps or other tools used for knowledge mapping), and looking at the problem from the multiple perspectives derived from them. It's important to gather several perspectives, otherwise the decisions have a high chance of being biased. Just because others preferred a certain approach, it doesn't mean one should follow it, at least not blindly!

The purpose of research is multifold. First, one should try not to reinvent the wheel. I know, it can be fun, and a lot can be learned in the process, though when time is an important commodity, it's important to be pragmatic! Secondly, new information can provide new perspectives - one can learn a lot from other people’s thinking. The pragmatism of problem solvers should be combined, when possible, with the idealism of theories. Thus, one can make connections between ideas that aren't connected at first sight.

Once a good share of facts was gathered, you can review the new information in respect to the previous ones and devise from there several approaches worthy of attack. Once the facts are reviewed, there are probably strong arguments made by others to follow one approach over the others. However, one can show that has reached a maturity when is able to evaluate the information and take a decision based on the respective information, even if the decision is not by far perfect.

One should try to develop a feeling for decision making, even if this seems to be more of a gut-feeling and stressful at times. When possible, one should attempt to collect and/or use data, though collecting data is often a luxury that tends to postpone the decision making, respectively be misused by people just to confirm their biases. Conversely, if there's any important benefit associated with it, one can collect data to validate in time one's decision, though that's a more of a scientist’s approach.

I know that's easier to go with the general opinion and do what others advise, especially when some ideas are popular and/or come from experts, though then would mean to also follow others' mistakes and biases. Occasionally, that can be acceptable, especially when the impact is neglectable, however each decision we are confronted with is an opportunity to learn something, to make a difference! 

Previous Post <<||>> Next Post

04 March 2024

Business Intelligence: A Software Engineer's Perspective VI (The Data Citizen)

Business Intelligence
Business Intelligence Series

More than a century ago, Jerbert G Wells wrote on mathematical literacy: "[...] the time may not be very remote when it will be understood that for complete initiation as an efficient citizen of one of the new great complex world-wide States that are now developing, it is as necessary to be able to compute, to think in averages and maxima and minima, as it is now to be able to read and write” [1]. The quote is occasionally misquoted as referring to Statistics, though frankly the boundaries of mathematical, statistical, numerical and data literacy tend to melt into each other, existing multiple dependencies between them.

In the age of big data, data citizens, business people able to use data, data processing and visualization tools for building solutions that enable their job, become steadily a necessity for businesses in their quest of making data-driven decisions, gaining insight and whatever valuable use data might have for the organizations. The need is not new,  Microsoft Access and Excel were used for similar purposes already in the 90s, becoming a maintenance nightmare for IT, data islands without proper backup or documentation existing through the organizations, diverse numbers being reported and contradicting each other. 

Then IT took over, trying to find alternatives for the data islands, implementing concepts like single source(s) of truth, quality gates and supporting processes, designing data models and infrastructures for self-service, allowing users to tap into the data for data exploration, discovery, reporting, etc. Getting all this right required to redesign existing infrastructures, making one step forward and a few steps back, in the end everything is a learning process. Such an effort can easily consume an organization's resources. 

Microsoft and other vendors for data-driven solutions keep insisting on how much potential exist in their tools for the data citizen, how the citizens can bring competitive advantage for organizations, automating business and supporting processes. The potential is not to neglect, though it requires a considerable investment from organizations in training and mentoring data citizens, in building data warehouses or data meshes that focus on end-user self-service needs. The data citizen needs time to learn, to play with the data, build solutions, test their usefulness in the daily tasks, respectively incorporate and disseminate the knowledge gained within the organization. 

There are many scenarios in which results can be obtained with a minimum of effort, however there are also hard limits. Besides the learning effort and the time available, there are cognitive, knowledge and ability limits that vary from person to person. Understanding what good architecture, design and techniques means is unfortunately not for everybody, and here's where the concept of citizen data analyst or citizen scientist breaks, and this independently of the tools used. There are also IT people who have similar challenges. 

It must be also recognized that the solutions built in the early stages by data citizens are primarily personal solutions that need to be reviewed and brought to the standards adopted by the organization. In time, it's expected to reduce considerably such effort by evolving data citizen's knowledge and skillset. Without this further work, the solutions built will tend to display some of the shortcomings of the solutions built on MS Access or Excel

The concept of data citizen can work as long the various assumptions and needs are adequately addressed, however progress will not happen overnight. The effort needs to become part of organization's long-term strategy, and the effort can be considerable for many organizations. Mentorship in terms of technical and non-technical support is needed. It's advisable to proceed in small iterative steps and integrate gradually the lessons learned.

Previous Post <<||>> Next Post

Resources:

[1] “Mankind in the Making”, by Herbert G Wells, 1903 [Source]

28 February 2024

Business Intelligence: A Software Engineer's Perspective V (From Process Management to Mental Models in Knowledge Gaps)

Business Intelligence Series
Business Intelligence Series 

An organization's business processes are probably one of its most important assets because they reflect the business model, philosophy and culture, respectively link the material, financial, decisional, informational and communicational flows across the whole organization with implication in efficiency, productivity, consistency, quality, adaptability, agility, control or governance. A common practice in organizations is to document the business-critical processes and manage them accordingly over their lifetime, making sure that the employees understand and respect them, respectively improve them continuously. 

In what concerns the creation of data artifacts, data without the processual context are often meaningless, no matter how much a data professional knows about data structures/models. Processes allow to delimit the flow and boundaries of data, respectively delimit the essential from non-essential. Moreover, it's the knowledge of processes that allows to reengineer the logic behind systems especially when no proper documentation about the logic is available. 

Therefore, the existence of documented processes allows to bridge the knowledge gaps existing on the factual side, and occasionally also on the technical side. In theory, the processes should provide a complete overview of the procedures, rules, policies and responsibilities existing in the organization, respectively how the business operates. However, even if people tend to understand how the world works locally, when broken down into parts, their understanding is systemically flawed, missing the implications of causal relationships that span time with delays, feedback, variable confusion, chaotic behavior, and/or other characteristics borrowed from the vocabulary of complex systems.  

Jay W Forrester [3], Peter M Senge [1], John D Sterman [2] and several other systems-thinking theoreticians stressed the importance of mental models in making-sense about the world especially in setups that reflect the characteristics of complex systems. Mental models frame our experience about the world in congruent mental constructs that are further used to think, understand and navigate the world. They are however tacit, fuzzy, incomplete, imprecisely stated, inaccurate, evolving simplifications with dual character, enabling on one side, while impeding on the other side cognitive processes like sense-making, learning, thinking or decision-making, limiting the range of action to what is familiar and comfortable. 

On one side one of the primary goals of Data Analytics is to provide new insights, while on the other side the new insights fail to be recognized and put into practice because they conflict with existing mental models, limiting employees to familiar ways of thinking and acting. 

Externalizing and sharing mental models allow besides making assumptions explicit and creating a world view also to strategize, make tests and simulations, respectively make sure that the barriers and further constraints don't impact the decisional process. Sange goes further and advances that mental models, especially at management level, offer a competitive advantage, allowing to maintain coherence and direction, people becoming more perceptive and responsive about environmental or circumstance changes.

The whole process isn't about creating a unique congruent mental model, even if several mental models may converge toward one or more holistic models, but of providing different diverse perspectives and enabling people to make leaps in abstraction (by moving from direct observations to generalizations) while blending advocacy and inquiry to promote collaborative learning. Gradually, people and organizations should recognize a shift from mental models dominated by events to mental models that recognize longer-tern patterns of change and the underlying structures producing those patterns [1].

Probably, for many the concept of mental models seems to be still too abstract, respectively that the effort associated with it is unnecessary, or at least questionable on whether it can make a difference. Conversely, being aware of the positive and negative implications the mental models hold, can makes us explore, even if ad-hoc, the roads they open.

Previous Post <<||>> Next Post

Resources:
[1] Peter M Senge (1990) The Fifth Discipline: The Art & Practice of The Learning Organization
[2] John D Sterman (2000) "Business Dynamics: Systems thinking and modeling for a complex world"
[3] Jay W Forrester (1971) "Counterintuitive Behaviour of Social Systems", Technology Review

21 February 2024

Business Intelligence: A Software Engineer's Perspective IV (The Loom of Interactions)

Business Intelligence Series
Business Intelligence Series 

The process of developing or creating a report is quite simple - there's a demand for data, usually a business problem, the user (aka requestor) defines a set of requirements, the data professional writes one or more queries to address the requirements, which are then used to build one or more reports. The report(s) is/are reviewed by the requestor and with this the process should be over in most of the cases. However, this is rather the exception - a long series of changes over multiple iterations are usually necessary, the queries and the reports get modified and even rewritten until they reach the final form, lot of effort being wasted in the process on both sides.

Common practices for improving the process behind resume to assuring that the requirements are complete and understood upfront, that best practices are followed, that the user gets an early review of the work and that there's a continuous communication, that process' performance is monitored, that controls are in place, etc. Standardizing the process helps to reduce the number of iterations, but only by a factor. Unfortunately, the bigger issue - the knowledge gap - is often ignored.

There's lot of literature on problem solving, on what steps to follow, on how to define the problem, what aspects should be considered, etc. Recipes are good when one knows how to follow them, respectively how to cook, and that can be a tedious process. It is said that framing the right problem is half the way to its solving, and that's so true. Part of the bigger issue is that users need data to better understand the problem, however the drives can be different - sometimes is problem's complexity, while other times the need is apparent, only with the first set of data the users start thinking seriously about the problem. 

So, the first major gap is between the problem and user's knowledge about the problem. Experience and theory can help reduce the gap, however the most important progress comes when the user understands the data behind the various processes that overlap with the problem. Sometimes, it's enough to explore the data visually, while other times deeper explorations are needed. Data literacy is important, though more important are the exposure to the data and problems of different variety and complexity, respectively having the time for this. 

The second gap concerns the data professional - building the data model and the logic for the report requires domain knowledge. The level of knowledge depends from case to case, and typically what one doesn't know has the biggest impact. A data professional can help to the degree of the information, respectively knowledge he has about the business. The expectation to provide a report based on a set of fields might be valid for simple requirements, though the more complex a problem, the more domain knowledge is needed. Moreover, the data professional might need to reengineer the logic from the source system, which can prove challenging only by looking at the data.

Ideally, the two parties should work together starting with problem's framing and build common ground while covering the knowledge gaps on both sides. Of course, the user doesn't need to dive into the technical knowledge unless the organization leverages this interaction further by adopting the data citizen mindset. Such interactions can help to build trust, respectively a basis for further collaboration. Conversely, the more isolated the two parties, the higher the chances for more iterations to occur. 

Covering the knowledge gaps might look like a redistribution of the effort, though by keeping the status quo there is little chance for growth!

18 February 2024

Business Intelligence: A Software Engineer's Perspective III (More of a One-Man Show)

Business Intelligence Series
Business Intelligence Series 

Probably, in some organizations there are still recounted stories about a hero who knew so much about the business and was technically proficient that he/she was able to provide data-driven answers to most business questions. Unfortunately, the times of solo representations are for long gone - the world moves too fast, there are too many questions looking for an answer, many of them requiring a solution before the problem was actually defined, a whole infrastructure is needed to be able to harness the potential of  technologies and data, the volume of knowledge required grows exponentially, etc. 

One of the approaches of handling the knowledge gap between the initial and required knowledge in solving problems based on data is to build all the required knowledge in one person, either on the business or the technical side. More common is to hire a data analyst and build the knowledge in the respective resource, and the approach has great chances to work until the volume of work exceeds a person's limits. The data analyst is forced to request to have the workload prioritized, which might work in certain occasions, while in others one needs to compromise on quality and/or do overtime, and all the issues deriving from this. 

There are also situations in which the complexity of the problem exceeds a person's ability to handle it, and that's not necessarily a matter of intelligence but of knowhow. Some organizations respond with complexity to complexity, while others are more creative and break the complexity in manageable pieces. In both cases, more resources are needed to cover the knowledge and resource gap. Hiring more data analysts can get the work done though it's not a recipe for success. The more diverse the team, the higher the chances to succeed, though again it's a matter of creativity and of covering the knowledge gaps. Sometimes, it's more productive to use the resources already available in organization, though this can involve other challenges. 

Even if much of the knowledge gets documented, as soon the data analyst leaves the organization a void is created until a similar resource is able to fill it. Organizations can better cope with these challenges if they disseminate the knowledge between data professionals respectively within the business. The more resources are involved the higher the level of retention and higher the chances of reusing the knowledge. However, the more people are involved, the higher the costs, especially the one associated with the waste of effort. 

Organizations can compromise by choosing 1-2 resources from each department to be involved in knowledge dissemination, ideally people with data and technology affinity. They shall become data citizens, people who use  data, data processing and visualization for building solutions that enable their job. Data citizens are expected to act as showmen in their knowledge domain and do their magic whenever such requirements arise.

Having a whole team of data citizens opens new opportunities for organizations, though such resources will need beside domain knowledge and data literacy also technical knowledge. Unfortunately, many people will reach their limitations in this area. Besides the learning effort, understanding what good architecture, design and techniques means is unfortunately not for everybody, and here's where the concept of citizen data analyst or citizen scientist breaks, and this independently of the tools used.

A data citizen's effort works best in data discovery, exploration and visualization scenarios where the rapid creation of prototypes reduces the time from idea to solution. However, the results are personal solutions that need to be validated by a technical person, pieces of the solutions maybe redesigned and moved until enterprise solutions result.

Previous Post <<||>> Next Post

17 February 2024

Business Intelligence: A Software Engineer's Perspective II (Major Knowledge Gaps)

Business Intelligence Series
Business Intelligence Series

Solving a problem requires a certain degree of knowledge in the areas affected by the problem, degree that varies exponentially with problem's complexity. This requirement applies to scientific fields with low allowance for errors, as well as to business scenarios where the allowance for errors is in theory more relaxed. Building a report or any other data artifact is closely connected with problem solving as the data artifacts are supposed to model the whole or parts of what is needed for solving the problem(s) in scope.

In general, creating data artifacts requires: (1) domain knowledge - knowledge of the concepts, processes, systems, data, data structures and data flows as available in the organization; (2) technical knowledge - knowledge about the tools, techniques, processes and methodologies used to produce the artifacts; (3) data literacy - critical thinking, the ability to understand and explore the implications of data, respectively communicating data in context; (4) activity management - managing the activities involved. 

At minimum, creating a report may require only narrower subsets from the areas mentioned above, depending on the complexity of the problem and the tasks involved. Ideally, a single person should be knowledgeable enough to handle all this alone, though that's seldom the case. Commonly, two or more parties are involved, though let's consider the two-parties scenario: on one side is the customer who has (in theory) a deep understanding of the domain, respectively on the other side is the data professional who has (in theory) a deep understanding of the technical aspects. Ideally, both parties should be data literates and have some basic knowledge of the other party's domain. 

To attack a business problem that requires one or more data artifacts both parties need to have a common understanding of the problem to be solved, of the requirements, constraints, assumptions, expectations, risks, and other important aspects associated with it. It's critical for the data professional to acquire the domain knowledge required by the problem, otherwise the solution has high chances to deviate from the expectations. The general issue is that there are multiple interactions that are iterative. Firstly, the interactions for building the needed common ground. Secondly, the interaction between the problem and reality. Thirdly, the interaction between the problem and parties’ mental models und understanding about the problem. 

The outcome of these interactions is that the problem and its requirements go through several iterations in which knowledge from the previous iterations are incorporated successively. With each important piece of knowledge gained, it's important to revise and refine the question(s), respectively the problem. If in each iteration there are also programming and further technical activities involved, the effort and costs resulted in the process can explode, while the timeline expands accordingly. 

There are several heuristics that could be devised to address these challenges: (1) build all the required knowledge in one person, either on the business or the technical side; (2) make sure that the parties have the required knowledge for approaching the problems in scope; (3) make sure that the gaps between reality and parties' mental models is minimal; (4) make sure that the requirements are complete and understood before starting the development; (5) adhere to methodologies that accommodate the necessary iterations and endeavor's particularities; (6) make sure that there's a halt condition for regularly reviewing the progress, respectively halting the work; (7) build an organizational culture to support all this. 

The list is open, and the heuristics aren't exclusive, so in theory any combination of them can be considered. Ideally, an organization should reflect all these heuristics in one form or another. The higher the coverage, the more mature the organization is. The question is how organizations with a suboptimal setup can change the status quo?

Previous Post <<||>> Next Post

Business Intelligence: A Software Engineer's Perspective I (Houston, we have a Problem!)

Business Intelligence Series
Business Intelligence Series

One of the critics addressed to the BI/Data Analytics, Data Engineering and even Data Science fields is their resistance to applying Software Engineering (SE) methods in practice. SE can be regarded as the application of sound methods, methodologies, techniques, principles, and practices to obtain high quality economic software in a reproducible manner. At minimum, should be applied SE techniques and practices proven to work, for example the use of best practices, reference technologies, standardized processes for requirements gathering and management, etc. This doesn't mean that one should apply the full extent of SE but consider a minimum that makes sense to adopt.

Unfortunately, the creation of data artifacts (queries, reports, data models, data pipelines, data visualizations, etc.) as process seem to be done after the principle of least action, though least action means here the minimum interaction to push pieces on a board rather than getting the things done. At high level, the process is as follows: get the requirements, build something, present results, get more requirements, do changes, present the results, and the process is repeated ad infinitum.

Given that data artifact's creation finds itself at the intersection of two or more knowledge areas in which knowledge is exchanged in several iterations between the parties involved until a common ground is achieved, this process is totally inefficient from multiple perspectives. First of all, it takes considerably more time than planned to reach a solution, resources being wasted in the process, multiple forms of waste being involved. Secondly, the exchange and retention of knowledge resulting from the process is minimal, mainly on a need by basis. This might look as an efficient approach on the short term, but is inefficient overall.

BI reflects the general issues from SE - most of the issues can be traced back to requirements - if the requirements are incorrect and there's no magic involved in between, then one can't expect for the solution to be correct. The bigger the difference between the initial and final requirements elicited in the process, the more resources are wasted. The more time passes between the start of the development phase and the time a solution is presented to the customer, the longer it takes to build the final solution. Same impact have the time it takes to establish a common ground and other critical factors for success involved in the process.

One can address these issues through better requirements elicitation, rapid prototyping, the use of agile methodologies and similar approaches, though the general feeling is that even if they bring improvements, they don't address the root causes - lack of data literacy skills, lack of knowledge about the business, lack of maturity in planning and executing tasks, the inexistence of well-designed processes and procedures, respectively the lack of an engineering mindset.

These inefficiencies have low impact when building a report occasionally, though they accumulate and tend to create systemic issues in what concerns the overall BI effort. They are addressed locally by experts and in general through a strategic approach like the elaboration of a BI strategy, though organizations seldom pay attention to them. Some organizations consider that they are automatically addressed as part of the data culture, though data culture focuses in general on data literacy and not on the whole set of assumptions mentioned above.

An experienced data professional sees more likely the inefficiencies, tries to address them locally in his interactions with the various stakeholders, he/she can build a business case for addressing them, though it depends on organizations to recognize that they have a problem, respective address the inefficiencies in a strategic and systemic manner!

Previous Post <<||>> Next Post

Business Intelligence: Microsoft Fabric's Notebooks

Business Intelligence Series
Business Intelligence Series 

When several technologies make their entrance in a data-related field like Data Warehousing, Data Analitics or Data Science, one is forced to understand how the respective technologies can be used or misused, respectively what's their place in the bigger picture. Microsoft Fabric introduces several important technologies that will change the way data are stored, processed and consumed. 

The first important technology is the notebook - a web document-like cell-based container for writing and executing code in a collaborative manner. The concept is not new, Jupyter notebooks have been around for almost a decade. In Microsof Fabric, notebooks support multiple languages, from which a default one applies to the whole notebook, while on cell level any of the supported languages can be used. 

One can execute a single cell, multiple cells or the entire notebook in a sequential manner, mix languages for the various operations - load, transform, save, and visualize data when needed. Notebooks can be parametrized and run via the homonymous activity in Data Factory pipelines, automating thus data processing. Probably more functionality is to come. 

Data engineers seems to have great flexibility, though usually flexibility implies constraints and/or mischiefs in other areas. I see for example in presentations the overuse of temporary data objects (mainly views) in Spark SQL as part of complex logic. That's acceptable during prototyping, though such code becomes a danger as soon the logic is deployed into production. Data objects should be created outside of the logic that uses them and should be treated as artifacts, with version control and proper documentation. It's maybe true that temporary objects reduce the volume of objects in the metastore, though is this the way to go?

Temporary objects tend to lead to wheel's reinvention or they get duplicated across multiple notebooks, which can easily create a maintenance nightmare. One needs to consider that the business logic changes a lot, the requirements and the data sources change, and on the long term, the cost of maintaining the code can easily overweight the benefits. 

Notebooks remind me of the beginnings of web programming when HTML was mixed up with client scripting languages like VB Script or Javascript, CSS, respectively server-side scripting languages. It was kind of a spaghetti code, modified repeatedly by multiple programmers, unendingly duplicated, and through a miracle it worked, until it stopped working unexpectedly in strangest situations. The strangest part was when after removing  commented code from a section made the code run again. 

The debugging of another person's code was a nightmare. Code developed by two people for similar purposes was looking unrecognizable different in terms of structure, programming techniques and layout. The technical debt was high, increasing in exponential manner. One was aware that the code needed refactoring, though there were more important things to do or no time allocated for it.

In the meantime the maturity of programming languages, frameworks, methodologies, best practices, and hopefully of programmers improved the overall quality of software (at least on average). Thinking of software from an Engineer's perspective improved the efficiency and effectiveness of a programmer's endeavor. The average programmer is able to write quality code, though there's a considerable minimum of "engineering" knowledge involved beside the mere knowledge of languages and tools. 

Notebooks are good up to a point, beyond which one needs to take a step back, restructure, move the code where it belongs, take a few more steps back and review the good practices and their application, disseminate the knowledge inside the team and use it in the next iterations, respectively refractor the code when needed! Hopefully, people learned from the mistakes of the past. 

Resources:
[1] Microsoft Learn (2023) How to use Microsoft Fabric notebooks (link

12 February 2024

Business Intelligence: A One Man Show I (Some Personal Background and a Big Thanks!)

Business Intelligence Series
Business Intelligence Series

Over the past 24 years, I found myself often in the position of a "one man show" doing almost everything in the data space from requirements gathering to development, testing, deployment, maintenance/support (including troubleshooting and optimization), and Project Management, respectively from operations to strategic management, when was the case. Of course, different tasks of varying complexity are involved! Developing a SSRS or Power BI report has a smaller complexity than developing in the process also all or parts of the Data Warehouse, or Lakehouse nowadays, respectively of building the whole infrastructure needed for reporting. All I can say is that "I've been there, I've done that!". 

Before SSRS became popular, I even built for a customer a whole reporting solution based on SQL Server, HTML & XML, respectively COM+ objects for database access. UI’s look-and-feel was like SSRS, though there was no wizardry involved besides the creative use of programming and optimization techniques. Once I wrote an SQL query, the volume of work needed to build a report was comparable to the one in SSRS. It was a great opportunity to use my skillset, working previously as a web developer and VB/VBA programmer. I worked for many years as a Software Engineer, applying the knowledge acquired in the field whenever it made sense to do so, working alone or in a team, as the projects required.

During this time, I was involved in other types of projects and activities that had less to do with the building of reports and warehouses. Besides of the development of various desktop, web, and data-processing solutions, I was also involved in 6-8 ERP implementations, being responsible for the migration of data, building the architectures needed in the process, supporting key users in various areas like Data Quality or Data Management. I also did Project Management, Application Management, Release and Change Management, and even IT Management. Thus, there were at times at least two components involved - one component was data-related, while the other component had more diversity. It was a good experience, because the second component often needed knowledge of the first, and vice versa. 

For example, arriving to understand the data model and business processes behind an ERP system by building ad-hoc and standardized reports, allowed me to get a good understanding of what data is needed for a Data Migration, which are the dependencies, or the level of quality needed. Similarly, the knowledge acquired by building ETL-based pipelines and data warehouses allowed me to design and build flexible Data Migration solutions, both architectures being quite similar from many perspectives. Knowledge of the data models and architectures involved can facilitate the overall process and is a premise for building reliable performant solutions. 

Similar examples can also be given in Data Management, Data Operations, Data Governance, during and post-implementation ERP support, etc. Reports and data are needed also in the Management areas - it starts from knowing what data are needed in the supporting processes for providing transparency, of getting insights and bringing the processes under control, if needed.

Working alone, being able to build a solution from the beginning to the end was often a job requirement. This doesn't imply that I was a "lone wolf". The nature of a data professional or software engineer’s job requires you to interact with various businesspeople from report requesters to key users, internal and external consultants, intermediary managers, and even upper management. There was also the knowledge of many data professionals involved indirectly – the resources I used to learn from - books, tutorials, blogs, webcasts, code, and training material. I'm thankful for their help over all these years!

Previous Post <<||>> Next Post

22 August 2023

Book Review: Laurent Bossavit's The Leprechauns of Software Engineering (2015)




Software Engineering should be the "establishment and use of sound engineering principles to obtain economically software that is reliable and works on real machines efficiently" [2]. Working for more than 20 years in the field I feel sometimes that its foundation is a strange mix of sound and questionable ideas that take the form of methodologies, principles, standards, myths, folklore, statistics and other similar concepts that form its backbone.

I tend to look with critical eyes at the important numbers advanced in research and pseudo-scientific papers especially when they’re related to my job, this because I know that statistics are seldom what they appear to be - there are accidental and sometimes even intended errors made to support the facts. Unfortunately, the missing row data and often the information about the methodologies used in collecting and processing the respective data make numbers and/or graphics' understanding more challenging, not to mention the considerable amount of effort and time spent to uncover the evidence trail.
Fortunately, there are other professionals who went further down the path of bibliographical references and shared their findings in blogs, papers, books and other media content. It’s also the case of Laurent Bossavit, who in his book, "The Leprechauns of Software Engineering" (2015), looks behind some of the numbers that over time become part of the leprechaunish folklore of IT professionals, puts them into the historical context and provides in appendix the evidence trails for the reader to validate his findings. Over several chapters the author focuses mainly on the cost of defects, Boehm’s cone of uncertainty, the differences in productivity amount individual programmers (aka 10x claim), respectively the relation between poor requirements and defects.

His most important finding is that the references used in most of the researched sources advancing the above numbers were secondary, while the actual sources provide no direct information of empirical data or the methodology for its collection. The way the numbers are advanced and used makes one question the validity of the measurements performed, respectively the character of the mistakes the authors made. Many of the cited papers hardly match the academic requirements of other scientific fields, being a mix of false claims, improperly conducted research and citations.

Secondly, he argues that the small sample sizes used as basis for the experiments, the small population formed usually of students, respectively the way numbers were mixed without any reliable scientific character makes him (and the reader as well) question even more how the experiments were performed in the respective papers. With this, it is more likely that a bigger number of research based on these sources should raise further concerns. The reader can thus ask himself/herself how deep the domino effect goes inside of the Software Engineering field.

In author’s opinion Software Engineering as social process "needs to be studied with tools that borrow as much from the social and cognitive sciences as they do from the mathematical theories of computation". How much is possible to extend the theories and models of the respective fields is an open topic. The bottom line, the field of Software Engineering needs better and scientific empirical experiments that are based on commonly agreed definitions, data collection and processing techniques, respectively higher standards for research publications. Without this, we’ll continue to compare apples with peaches and mix them in calculations so we can get some stories that support our leprechaunish theories.

Overall, the book is a good read for software engineers as well as for other IT professionals. Even if it barely scratched the surface of software myths and folklore, there’s enough material for the readers who want to dive deeper.

Previous Post <<||>> Next Post

References:
[1] Laurent Bossavit (2015) "The Leprechauns of Software Engineering"
[2] Friedrich Bauer (1972) "Software Engineering", Information Processing

07 March 2021

Project Management: Agile Manifesto Reloaded I (Introduction)

 

Project Management

There are so many books written on agile methodologies, each attempting to depict the realities of software development projects. There are many truths considered in them, though they seem to blend in a complex texture in which the writer takes usually the position of a preacher in which the sins of the traditional technologies are contrasted with the agile principles. In extremis everything done in the past seems to be wrong, while the agile methods seem to be a panacea, which is seldom the case.

There are already 20 years since the agile manifesto was published and the methodologies adhering to the respective principles don’t seem to provide the expected success, suffering from the same chronical symptoms of their predecessors - they are poorly understood and implemented, tend to function after hammer’s principle, respectively the software development projects still deliver poor results. Moreover, there are more and more professionals who raise their voice against agile practices.

Frankly, the principles behind the agile manifesto make sense. A project should by definition satisfy stakeholders’ requirements, ideally through regular deliveries that incorporate the needed functionality while gradually seeking to get early feedback from customers, respectively involve the customer through all project’s duration, working together to deliver a feasible product. Moreover, self-organizing teams, face-to-face meetings, constant pace, technical excellence should allow minimizing the waste, respectively maximizing the efficiency in the project. Further aspects like simplicity, good design and architecture should establish a basis for success.

Re-reading the agile manifesto, even if each read pulls from experience more and more pro and cons, the manifesto continues to look like a Christmas wish-list. Even if the represented ideas make sense and satisfy a specific need, they are difficult to achieve in a project’s context and setup. Each wish introduces a constraint that brings with it its own limitations. Unfortunately, each policy introduced by a methodology follows the same pattern, no matter of the methodology considered. Moreover, the wishes cover only a small subset from a project’s texture, are general and let lot of space for interpretation and implementation, though the same can be said about any principles that don’t provide a coherent worldview or a conceptual model.

The software development industry needs a coherent worldview that reflects its assumptions, models, characteristics, laws and challenges. Software Engineering (SE) attempts providing such a worldview though unfortunately is too complex for many and there seem to be a big divide when considered in respect to the worldviews introduced by the various Project Management (PM) methodologies. Studying one or two PM methodologies, learning a few programming languages and even the hand on experience on a few projects won’t fill the gaps in knowledge associated with the SE worldview.

Organizations don’t seem to see the need for professionals of having a formal education in SE. On the other side is expected from employees to have by default some of the skillset required, which is not the case. Besides understanding and implementing a technology there are a set of knowledge areas in which the IT professional must have at least a high-level knowledge if it’s expected from him/her to think critically about the respective areas. Unfortunately, the lack of such knowledge leads sometimes to situations which can impact negatively projects.

Almost each important word from the agile manifesto pulls with it a set of concepts from a SE’ worldview – customer satisfaction, software delivery, working software, requirements management, change management, cooperation, teamwork, trust, motivation, communication, metrics, stakeholders’ management, good design, good architecture, lessons learned, performance management, etc. The manifesto needs to be regarded from a SE’s eyeglasses if one expects value from it.

Previous Post <<||>> Next Post

27 December 2020

Data Warehousing: Data Vault 2.0 (The Good, the Bad and the Ugly)

Data Warehousing
Data Warehousing Series

One of the interesting concepts that seems to gain adepts in Data Warehousing is the Data Vault – a methodology, architecture and implementation for Data Warehouses (DWH) developed by Dan Linstedt between 1990 and 2000, and evolved into an open standard with the 2.0 version.

According to its creator, the Data Vault is a detail-oriented, historical tracking and uniquely linked set of normalized tables that support one or more business functional areas [2]. To hold data at the lowest grain of detail from the source system(s) and track the changes occurred in the data, it splits the fact and dimension tables into hubs (business keys), links (the relationships between business keys), satellites (descriptions of the business keys), and reference (dropdown values) tables [3], while adopting a hybrid approach between 3rd normal form and star schemas. In addition, it provides a two- or three-layered data integration architecture, a series of standards, methods and best practices supposed to facilitate its use.

It integrates several other methodologies that allow bridging the gap between the technical, logistic and execution parts of the DWH life-cycle – the PMI methodology is used for the various levels of planning and execution, while the Scrum methodology is used for coordinating the day-to-day project tasks. Six Sigma is used together with Total Quality Management for the design and continuous improvement of DWH and data-related processes. In addition, it follows the CMMI maturity model for providing a clear baseline for benchmarking an organization’s DWH capabilities in development, acquisition and service areas.

The Good: The decomposition of the source data models into hub, link and satellite tables provides traceability and auditability at raw data level, allowing thus to address the compliance requirements of Sarabanes-Oxley, HIPPA and Basel II by design.

The considered standards, methods, principles and best practices are leveraged from Software Engineering [1], establishing common ground and a standardized approach to DWH design, implementation and testing. It also narrows down the learning and implementation paths, while allowing an incremental approach to the various phases.

Data Vault 2.0 offers support for real-time, near-real-time and unstructured data, while new technologies like MapReduce, NoSQL can be integrated within its architecture, though the same can be said about other approaches as long there’s compatibility between the considered technologies. In fact, except business entities’ decomposition, many of the notions used are common to DWH design.

The Bad: Further decomposing the fact and dimension tables can impact the performance of the queries run against the tables as more joins are required to gather the data from the various tables. The further denormalization of tables can lead to higher data storage needs, though this can be neglectable compared with the volume of additional objects that need to be created in DWH. For an ERP system with a few hundred of meaningful tables the complexity can become overwhelming.

Unless one uses a COTS tool which automates some part of the design and creation process, building everything from scratch can be time-consuming, increasing thus the time-to-market for solutions. However, the COTS tools can introduce restrictions of their own, which can negatively impact the overall experience with the methodology.

The incorporation of non-technical methodologies can have positive impact, though unless one has experience with the respective methodologies, the disadvantages can easily overshadow the (theoretical) advantages.

The Ugly: The dangers of using Data Vault can be corroborated as usual with the poor understanding of the methodology, poor level of skillset or the attempt of implementing the methodology without allowing some flexibility when required. Unless one knows what he is doing, bringing more complexity in a field which is already complex, can easily impact negatively projects’ outcomes.

Previous Post <<||>> Next Post

References:
[1] Dan Linstedt & Michael Olschimke (2015) Building a Scalable Data Warehouse with Data Vault 2.0
[2] Dan Linstedt (?) Data Vault Basics [source]
[3] Dan Linstedt (2018) Data Vault: Data Modeling Specification v 2.0.2 [source]

25 December 2019

Software Engineering: Mea Culpa (Part II: The Beginnings)

Software Engineering
Software Engineering Series

I started programming at 14-15 years old with logical schemas made on paper, based mainly on simple mathematical algorithms like solving equations of second degree, finding prime or special numbers, and other simple tricks from the mathematical world available for a student at that age. It was challenging to learn programming based only on schemas, though, looking back, I think it was the best learning basis a programmer could have, because it allowed me thinking logically and it was also a good exercise, as one was forced to validate mentally or on paper the outputs.

Then I moved to learning Basic and later Pascal on old generation Spectrum computers, mainly having a keyboard with 64K memory and an improvised monitor. It felt almost like a holiday when one had the chance to work 45 minutes or so on an IBM computer with just 640K memory. It was also a motivation to stay long after hours to write a few more lines of code. Even if it made no big difference in what concerns the speed, the simple idea of using a more advanced computer was a big deal.

The jump from logical schemas to actual programming was huge, as we moved from static formulas to exploratory methods like the ones of finding the roots of equations of upper degrees by using approximation methods, working with permutations and a few other combinatoric tools, interpolation methods, and so on. Once I got my own 64K Spectrum keyboard, a new world opened, having more time to play with 2- and 3-dimensional figures, location problems and so on. It was probably the time I got most interesting exposure to things not found in the curricula.  

Further on, during the university years I moved to Fortran, back to Pascal and dBASE, and later to C and C++, the focus being further on mathematical and sorting algorithms, working with matrices, and so on. I have to admit that it was a big difference between the students who came from 2-3 hours of Informatics per week (like I did) and the ones coming from lyceums specialized on Informatics, this especially during years in which learning materials were almost inexistent. In the end all went well.

The jumping through so many programming languages, some quite old for the respective times, even if allowed acquiring different perspectives, it felt sometimes like  a waste of time, especially when one was limited to using the campus computers, and that only during lab hours. That was the reality of those times. Fortunately, the university years went faster than they came. Almost one year after graduation, with a little help, some effort and benevolence, I managed to land a job as web developer, jumping from an interlude with Java to ASP, JavaScript, HTML, ColdFusion, ActionScript, SQL, XML and a few other programming languages ‘en vogue’ during the 2000.

Somewhere between graduation and my first job, my life changed when I was able to buy my own PC (a Pentium). It was the best investment I could make, mainly because it allowed me to be independent of what I was doing at work. It allowed me learning the basics of OOP programming based on Visual Basic and occasionally on Visual C++ and C#. Most of the meaningful learning happened after work, from the few books available, full of mistakes and other challenges.

That was my beginning. It is not my intent to brag about how much or how many programming languages I learned - knowledge is anyway relative - but to differentiate between the realities of then and today, as a bridge over time.

Previous Post <<||>> Next Post

15 May 2019

Software Engineering: Rapid Prototyping (Part I: An Introduction)

Software Engineering
Software Engineering Series

Rapid (software) prototyping (RSP) is a group of techniques applied in Software Engineering to quickly build a prototype (aka mockup, wireframe) to verify the technical or factual realization and feasibility of an application architecture, process or business model. A similar notion is the one of Proof-of-Concept (PoC), which attempts to demonstrate by building a prototype, starting an experiment or a pilot project that a technical concept, business proposal or theory has practical potential. In other words in Software Engineering a RSP encompasses the techniques by which a PoC is lead.

In industries that consider physical products a prototype is typically a small-scale object made from inexpensive material that resembles the final product to a certain degree, some characteristics, details or features being completely ignored (e.g. the inner design, some components, the finishing, etc.). Building several prototypes is much easier and cheaper than building the end product, they allowing to play with a concept or idea until it gets close to the final product. Moreover, this approach reduces the risk of ending up with a product nobody wants.

A similar approach and reasoning is used in Software Engineering as well. Building a prototype allows focusing at the beginning on the essential characteristics or aspects of the application, process or (business) model under consideration. Upon case one can focus on the user interface (UI) , database access, integration mechanism or any other feature that involves a challenge. As in the case of the UI one can build several prototypes that demonstrate different designs or architectures. The initial prototype can go through a series of transformations until it reaches the desired form, following then to integrate more functionality and refine the end product gradually. This iterative and incremental approach is known as rapid evolutional prototyping.

A prototype is useful especially when dealing with the uncertainty, e.g. when adopting (new) technologies or methodologies, when mixing technologies within an architecture, when the details of the implementation are not known, when exploring an idea, when the requirements are expected to change often, etc. Building rapidly a prototype allows validating the requirements, responding agilely to change, getting customers’ feedback and sign-off as early as possible, showing them what’s possible, how the future application can look like, and this without investing too much effort. It’s easier to change a design or an architecture in the concept and design phases than later.

In BI prototyping resumes usually in building queries to identify the source of the data, reengineer the logic from the business application, prove whether the logic is technically feasible, feasibility being translate in robustness, performance, flexibility. In projects that have a broader scope one can attempt building the needed infrastructure for several reports, to make sure that the main requirements are met. Similarly, one can use prototyping to build a data warehouse or a data migration layer. Thus, one can build all or most of the logic for one or two entities, resolving the challenges for them, and once the challenges solved one can go ahead and integrate gradually the other entities.

Rapid prototyping can be used also in the implementation of a strategy or management system to prove the concepts behind. One can start thus with a narrow focus and integrate more functions, processes and business segments gradually in iterative and incremental steps, each step allowing to integrate the lesson learned, address the risks and opportunities, check the progress and change the direction as needed.

Rapid prototyping can prove to be a useful tool when given the chance to prove its benefits. Through its iterative and incremental approaches it allows to reach the targets efficiently

13 May 2019

Software Engineering: Programming (Good Programmer, Bad Programmer)

Software Engineering
Software Engineering Series

The use of denominations like 'good' or 'bad' related to programmers and programming carries with it a thin separation between these two perceptional poles that represent the end results of the programming process, reflecting the quality of the code delivered, respectively the quality of a programmer’s effort and  behavior as a whole. This means that the usage of the two denominations is often contextual, 'good' and 'bad' being moving points on a imaginary value scale with a wide range of values within and outside the interval determined by the two.

The 'good programmer' label is a idealization of the traits associated with being a programmer – analyzing and understanding the requirements, filling the gaps when necessary, translating the requirements in robust designs, developing quality code with a minimum of overwork, delivering on-time, being able to help others, to work as part of a (self-organizing) team and alone, when the project requires it, to follow methodologies, processes or best practices, etc. The problem with such a definition is that there's no fix limit, considering that programmer’s job description can include an extensive range of requirements.

The 'bad programmer' label is used in general when programmers (repeatedly) fail to reach others’ expectations, occasionally the labeling being done independently of one’s experience in the field. The volume of bugs and mistakes, the fuzziness of designs and of the code written, the lack of comments and documentation, the lack of adherence to methodologies, processes, best practices and naming conventions are often considered as indicators for such labels. Sometimes even the smallest mistakes or the wrong perceptions of one’s effort and abilities can trigger such labels.

Labeling people as 'good' or 'bad' has the tendency of reinforcing one's initial perception, in extremis leading to self-fulfilling prophecies - predictions that directly or indirectly cause themselves to become true, by the very terms on how the predictions came into being. Thus, when somebody labels another as 'good' or 'bad' he more likely will look for signs that reinforce his previous believes. This leads to situations in which "good" programmers’ mistakes are easier overlooked than 'bad' programmers' mistakes, even if the mistakes are similar.

A good label can in theory motivate, while a bad label can easily demotivate, though their effects depend from person to person. Such labels can easily become a problem for beginners, because they can easily affect beginners' perception about themselves. It’s so easy to forget that programming is a continuous learning process in which knowledge is relative and highly contextual, each person having strengths and weaknesses.

Each programmer has a particular set of skills that differentiate him from other programmers. Each programmer is unique, aspect reflected in the code one writes. Expecting programmers to fit an ideal pattern is unrealistic. Instead of using labels one should attempt to strengthen the weaknesses and make adequate use of a person’s strengths. In this approach resides the seeds for personal growth and excellence.

There are also programmers who excel in certain areas - conceptual creativity, ability in problem identification, analysis and solving, speed, ingenuity of design and of making best use of the available tools, etc. Such programmers, as Randall Stross formulates it, “are an order of magnitude better” than others. The experience and skills harnessed with intelligence have this transformational power that is achievable by each programmer in time.

Even if we can’t always avoid such labeling, it’s important to become aware of the latent force the labels carry with them, the effect they have on our colleagues and teammates. A label can easily act as a boomerang, hitting us back long after it was thrown away.

12 May 2019

Software Engineering: Programming (Misconceptions about Programming II)

Software Engineering

Continuation

One of the organizational stereotypes is having a big room full of cubicles filled with employees. Even if programmers can work in such settings, improperly designed environments restrict to a certain degree the creativity and productivity, making more difficult employees' collaboration and socialization. Despite having dedicated meeting rooms, an important part of the communication occurs ad-hoc. In open spaces each transient interruption can easily lead inadvertently to loss of concentration, which leads to wasted time, as one needs retaking thoughts’ thread and reviewing the last written code, and occasionally to bugs.

Programming is expected to be a 9 to 5 job with the effective working time of 8 hours. Subtracting the interruptions, the pauses one needs to take, the effective working time decreases to about 6 hours. In other words, to reach 8 hours of effective productivity one needs to work about 10 hours or so. Therefore, unless adequately planned, each project starts with a 20% of overtime. Moreover, even if a task is planned to take 8 hours, given the need of information the allocated time is split over multiple days. The higher the need for further clarifications the higher the chances for effort to expand. In extremis, the effort can double itself.

Spending extensive time in front of the computer can have adverse effects on programmers’ physical and psychical health. Same effect has the time pressure and some of the negative behavior that occurs in working environments. Also, the communication skills can suffer when they are not properly addressed. Unfortunately, few organizations give importance to these aspects, few offer a work free time balance, even if a programmer’s job best fits and requires such approach. What’s even more unfortunate is when organizations ignore the overtime, taking it as part of job’s description. It’s also one of the main reasons why programmers leave, why competent workforce is lost. In the end everyone’s replaceable, however what’s the price one must pay for it?

Trainings are offered typically within running projects as they can be easily billed. Besides the fact that this behavior takes time unnecessarily from a project’s schedule, it can easily make trainings ineffective when the programmers can’t immediately use the new knowledge. Moreover, considering resources that come and go, the unwillingness to invest in programmers can have incalculable effects on an organization performance, respectively on their personal development.

Organizations typically look for self-motivated resources, this request often encompassing organization’s whole motivational strategy. Long projects feel like a marathon in which is difficult to sustain the same rhythm for the whole duration of the project. Managers and team leaders need to work on programmers’ motivation if they need sustained performance. They must act as mentors and leaders altogether, not only to control tasks’ status and rave and storm each time deviations occur. It’s easy to complain about the status quo without doing anything to address the existing issues (challenges).

Especially in dysfunctional teams, programmers believe that management can’t contribute much to project’s technical aspects, while management sees little or no benefit in making developers integrant part of project's decisional process. Moreover, the lack of transparence and communication lead to a wide range of frictions between the various parties.

Probably the most difficult to understand is people’s stubbornness in expecting different behavior by following the same methods and of ignoring the common sense. It’s bewildering the easiness with which people ignore technological and Project Management principles and best practices. It resides in human nature the stubbornness of learning on the hard way despite the warnings of the experienced, however, despite the negative effects there’s often minimal learning in the process...

To be eventually continued…

Software Engineering: Programming (Misconceptions about Programming - Part I)

Software Engineering
Software Engineering Series

Besides equating the programming process with a programmer’s capabilities, minimizing the importance of programming and programmers’ skills in the whole process (see previous post), there are several other misconceptions about programming that influence process' outcomes.


Having a deep knowledge of a programming language allows programmers to easily approach other programming languages, however each language has its own learning curve ranging from a few weeks to half of year or more. The learning curve is dependent on the complexity of the languages known and the language to be learned, same applying to frameworks and architectures, the scenarios in which the languages are used, etc. One unrealistic expectation is that the programmers are capablle of learning a new programming language or framework overnight, this expectation pushing more pressure on programmers’ shoulders as they need to compensate in a short time for the knowledge gap. No, the programming languages are not the same even if there’s high resemblance between them!

There’s lot of code available online, many of the programming tasks involve writing similar code. This makes people assume that programming can resume to copy-paste activities and, in extremis, that there’s no creativity into the act of programming. Beside the fact that using others’ code comes with certain copyright limitations, copy-pasting code is in general a way of introducing bugs in software. One can learn a lot from others’ code, though programmers' challenge resides in writing better code, in reusing code while finding the right the level of abstraction.  
 
There’s the tendency on the market to build whole applications using wizard-like functionality and of generating source-code based on data or ontological models. Such approaches work in a range of (limited) scenarios, and even if the trend is to automate as much in the process, is not what programming is about. Each such tool comes with its own limitations that sooner or later will push back. Changing the code in order to build new functionality or to optimize the code is often not a feasible solution as it imposes further limitations.

Programming is not only about writing code. It involves also problem-solving abilities, having a certain understanding about the business processes, in which the conceptual creativity and ingenuity of design can prove to be a good asset. Modelling and implementing processes help programmers gain a unique perspective within a business.

For a programmer the learning process never stops. The release cycle for the known tools becomes smaller, each release bringing a new set of functionalities. Moreover, there are always new frameworks, environments, architectures and methodologies to learn. There’s a considerable amount of effort in expanding one's (necessary) knowledge, effort usually not planned in projects or outside of them. Trainings help in the process, though they hardly scratch the surface. Often the programmer is forced to fill the knowledge gap in his free time. This adds up to the volume of overtime one must do on projects. On the long run it becomes challenging to find the needed time for learning.

In resource planning there’s the tendency to add or replace resources on projects, while neglecting the influence this might have on a project and its timeline. Each new resource needs some time to accommodate himself on the role, to understand project requirements, to take over the work of another. Moreover, resources are replaced on project with a minimal or even without the knowledge transfer necessary for the job ahead. Unfortunately, same behavior occurs in consultancy as well, consultants being moved from one known functional area into another unknown area, changing the resources like the engines of different types of car, expecting that everything will work as magic.

Related Posts Plugin for WordPress, Blogger...

About Me

My photo
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.