
14 September 2024

🗄️Data Management: Data Governance (Part II: Heroes Die Young)

Data Management Series

In the call for action, there's a tendency in some organizations to idealize and overcharge the purpose and image of the main actors in data governance by calling them heroes. Heroes are people who fight for a goal they believe in with all their being, and occasionally they pay the supreme tribute. Of course, the image of heroes is idealized and many other aspects are ignored, though such images sell ideas and ideals. Organizations might need heroes and heroic deeds to change the status quo, but the heroism doesn't necessarily pay off for the "heroes"!

Sometimes, organizations need a considerable effort to change the status quo. It can be people's resistance to the new, to the demands, or to the ideas propagated, especially when these are not clearly explained and executed. It can be the incommensurable distance between the "AS IS" and the "TO BE" perspectives, especially when clear paths aren't in sight. It can be the lack of resources (e.g., time, money, people, tools), knowledge, understanding or skillset that makes the effort difficult.

Unfortunately, such initiatives favor action over adequate strategies, planning and understanding of the overall context. The call to do something creates waves of actions and reactions which, in the organizational context, can lead to storms and even extreme behavior that ranges from resistance to the new to heroic deeds. Finding a few messages that support the call for action can help, though they can't replace the various factors critical for success.

Leading organizations on a new path requires a well-defined, realistic strategy, respectively adequate tactical and operational planning that reflects the organization's specific needs, knowledge and capabilities. Just demanding that people do their best is not enough, and it's especially in this context that heroism tends to appear. Unfortunately, the whole weight falls on the shoulders of the people chosen as actors in the fight. Ideally, it should be possible to spread the weight over a broader basis, which should be considered the foundation for the new.

The "heroes" metaphor is idealized and the negative outcome probably exaggerated, though extreme situations do occur in organizations when decisions, planning, execution and expectations are far from ideal. Ideal situations are met in books and seldom in practice!

Management demands and people execute, much like in the army, though by contrast people need to understand the reasoning behind what they are doing. Proper execution requires skillset, understanding, training, support, tools and the right resources for the right job. Just relying on people's professionalism and effort is not enough and is suboptimal, but this is what many organizations seem to do!

Organizations tend to respond to the various barriers or challenges with more resources or pressure instead of analyzing and depicting the situation adequately, and eventually changing the strategy, tactics or operations accordingly. It's also difficult to do this as long as an organization doesn't have the capabilities and practices of self-checking, self-introspection, self-reflection, etc. Even if it sounds a bit exaggerated, an organization must know itself to overcome the various challenges. Regular meetings, KPIs and other metrics give the illusion of control when self-control is needed.

Things don't have to be that complex, even if managing data governance is a complex endeavor. Small or midsized organizations are, at least in theory, more capable of handling complexity because they can be more agile, have a more robust structure, and the flow of information and knowledge meets fewer barriers, respectively has a shorter distance to overcome. One can probably appeal to the laws and characteristics of networks to understand more about the deeper implications and about how solutions can be implemented in more complex setups.

22 March 2024

🧭Business Intelligence: Perspectives (Part IX: Dashboards Are Dead & Other Crap)

Business Intelligence Series

I find annoying the posts that declare that a technology is dead, as they seem to seek the sensational and, in the end, don't offer enough arguments for the positions taken; all is just surfing through a few random ideas. Almost each time I click on such a link I find myself disappointed. Maybe it's just me - having too high expectations from ad-hoc experts who haven't understood the role of technologies and their lifecycle.

At least until now, dashboards have been the only visual tool that allows displaying related metrics in a consistent manner, reflecting business objectives, health, or other important perspectives on an organization's performance. More recently, notebooks seem to be getting closer given their capabilities of presenting data visualizations and some of the intermediary steps used to obtain the data, though they are still far away from offering similar capabilities. So, from where could any justification against dashboards' utility come? Even if I heard one or two expert voices saying that they don't need KPIs for managing an organization, organizations still need metrics to understand how they are doing as a whole and in their parts.

Many argue that the design of dashboards is poor, that they don't reflect data visualization best practices, or that they are too difficult to navigate. There are so many books on dashboard and/or graphic design that it's almost impossible not to find such a book in any big library, if one wants to learn more about design. There are many resources online as well, though it's tough to fight a mind's stubborn lack of interest in the topic. Conversely, there's also a lot of crap on the social networks that passes in the mainstream as best practice.

Frankly, design is important, though as long as the dashboards show the right data and the organization can guide itself by the respective numbers, the perfectionists can say whatever they want, even if they are right! Unfortunately, the numbers shown in dashboards raise justified questions, and the reasons are multiple. Do dashboards show the right numbers? Do they focus on the objectives or important issues? Can the numbers be trusted? Do they reflect reality? Can we use them in decision-making?

There are so many things that can go wrong when building a dashboard - there are so many transformations that need to be performed, that the chances of failure are high. It's enough to have several blunders in the code or data visualizations for people to stop trusting the data shown.

Trust and quality are complex concepts and there’s no standard path to address them because they are a matter of perception, which can vary and change dynamically based on the situation. There are, however, approaches that allow minimizing such issues. One can start, for example, by providing transparency. For each dashboard, provide also detailed reports that allow validating the numbers through drilldown (or by running the reports separately, if drilldown isn’t possible). If users don’t trust the data or the report, then they should pinpoint what’s wrong. Of course, the two sources must be in sync, otherwise the validation becomes more complex.
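
As a minimal sketch of such a validation, assuming two hypothetical sources - vw_SalesDashboard holding the aggregated KPI and vw_SalesDetail holding the detail rows behind it - one can compare the two directly:

-- Compare the dashboard aggregate against the sum of the detail rows;
-- any row returned points to a transformation or synchronization issue.
SELECT DSH.Month
, DSH.Revenue AS DashboardRevenue
, SUM(DTL.Amount) AS DetailRevenue
, DSH.Revenue - SUM(DTL.Amount) AS Variance
FROM vw_SalesDashboard DSH
     JOIN vw_SalesDetail DTL
       ON DSH.Month = DTL.Month
GROUP BY DSH.Month
, DSH.Revenue
HAVING DSH.Revenue <> SUM(DTL.Amount);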

There are also issues related to the approach - the way a reporting tool was introduced, the way dashboards flooded the space, how people reacted, etc. Introducing a reporting tool for dashboards is also a matter of strategy, tactics and operations, and the various aspects related to them must be addressed. Few organizations address this properly. Many organizations work on the principle "build it and they will come", even if they build the wrong thing!


16 March 2024

🧭Business Intelligence: A Software Engineer's Perspective (Part VII: Think for Yourself!)

Business Intelligence Series

After almost a quarter-century of professional experience the best advice I could give to younger professionals is to "gather information and think for themselves", and with this the reader can close the page and move forward! Anyway, everybody seems to be looking for sudden enlightenment with minimal effort, as if the effort has no meaning in the process!

In whatever endeavor you are caught, it makes sense to do a bit of thinking for yourself upfront - what's the task, or more generally the problem, what are the main aspects and interpretations, what are the goals, respectively the objectives, what a solution might look like, respectively how it can be solved, how long it could take, etc. This exercise is important for familiarizing yourself with the problem and creating a skeleton on which you can build further. It can be just vague ideas or something more complex, though no matter the overall depth, it's important to do some thinking for yourself!

Then, you should do some research to identify how others approached and maybe solved the problem, what were the justifications, assumptions, heuristics, strategies, and other tools used in sense-making and problem solving. When doing research, one should not stop with the first answer and go with it. It makes sense to allocate a fair amount of time for information gathering, structuring the findings in a reusable way (e.g. tables, mind maps or other tools used for knowledge mapping), and looking at the problem from the multiple perspectives derived from them. It's important to gather several perspectives, otherwise the decisions have a high chance of being biased. Just because others preferred a certain approach, it doesn't mean one should follow it, at least not blindly!

The purpose of research is manifold. First, one should try not to reinvent the wheel. I know, it can be fun, and a lot can be learned in the process, though when time is an important commodity, it's important to be pragmatic! Secondly, new information can provide new perspectives - one can learn a lot from other people’s thinking. The pragmatism of problem solvers should be combined, when possible, with the idealism of theories. Thus, one can make connections between ideas that aren't connected at first sight.

Once a good share of facts has been gathered, you can review the new information with respect to the previous one and devise from there several approaches worthy of attack. Once the facts are reviewed, there are probably strong arguments made by others to follow one approach over the others. However, one shows maturity when one is able to evaluate the information and make a decision based on it, even if the decision is far from perfect.

One should try to develop a feeling for decision-making, even if this seems to be more of a gut feeling and stressful at times. When possible, one should attempt to collect and/or use data, though collecting data is often a luxury that tends to postpone the decision-making, respectively to be misused by people just to confirm their biases. Conversely, if there's any important benefit associated with it, one can collect data to validate one's decision in time, though that's more of a scientist’s approach.

I know that it's easier to go with the general opinion and do what others advise, especially when some ideas are popular and/or come from experts, though that would also mean following others' mistakes and biases. Occasionally, that can be acceptable, especially when the impact is negligible; however, each decision we are confronted with is an opportunity to learn something, to make a difference!


16 February 2024

🧭Business Intelligence: Strategic Management (Part I: What is a BI Strategy?)

Business Intelligence Series

"A BI strategy is a plan to implement, use, and manage data and analytics to better enable your users to meet their business objectives. An effective BI strategy ensures that data and analytics support your business strategy." [1]

The definition is from Microsoft's guide on Power BI implementation planning, a long-awaited resource for those deploying Power BI in their organization. 

I read the definition repeatedly and, even if it looks logically correct, the general feeling is that it falls short, and I'm trying to understand why. A strategy is a plan indeed, even if various theorists use modifiers like unified, comprehensive, integrative, forward-looking, etc. Probably, because it talks about a BI strategy, the definition implies a strategic plan. Conversely, using "strategic plan" in the definition would seem to make it redundant, though it would then pull with it everything a strategy is about.

A business strategy is about enabling users to meet the organization's business objectives, otherwise it would fail by design. Implicitly, an organization's objectives become its employees' objectives. The definition thus kind of states the obvious. Conversely, it talks only about the users, and not all employees are users; it refers only to a subset. Shouldn't a BI strategy support everybody?

Usually, data analytics refers to the procedures and techniques used for exploration and analysis. Isn't it supposed to consider also the visualization of data? Did it forget something else? Ideally, a definition shouldn't define what its terms are about individually, but what they are when used together.

BI as a set of technologies, architectures, methodologies, processes and practices is by definition an enabler if we take these components individually or as a whole. I would play devil's advocate and ask "better than what?". Many of the information systems used in organizations come with a set of reports or functionalities that enable users in their jobs without investing a cent in a BI infrastructure. 

One or two decades ago, one of the big phrases used in sales pitches for BI tools was "competitive advantage". I was asking myself when and where the phrase disappeared. Is BI technologies' success so common that the phrase makes no sense anymore? Did the sellers become more ethical? Or did we recognize that the challenges behind a technology are more of an organizational nature?

When looking at a business strategy, the hierarchy of business objectives forms its backbone, though there are other important elements that form its foundation: mission, vision, purpose, values or principles. A BI strategy needs to be aligned with the business strategy and the other strategies (e.g. quality, IT, communication, etc.). Being able to trace these kinds of relationships between strategies is essential.

We talk about BI, Data Analytics, Data Management and, more recently, Data Science. The relationships between them become more complex. So, what differentiates a BI strategy from the other strategies? The above definition could apply to the other fields as well. Moreover, does it make sense to include them in one form or another?

Independently of how the joint field is called, BI and Data Analytics should be about gaining a deeper understanding of the business and disseminating that knowledge within the organization, respectively about exploring courses of action, building the infrastructure, the skillset, the culture and the mindset to approach more complex challenges, and not only about enabling business goals!

There are no perfect definitions, especially when the concepts used have drifting definitions as well, being caught into a net that makes it challenging to grasp the essence of things. In the end, a definition is good enough if the data professionals can work with it. 

Resources:

[1] Microsoft Learn (2024) Power BI implementation planning: BI strategy (link).

14 February 2024

🧭Business Intelligence: A One-Man Show (Part VI: The Lakehouse Perspective)

Business Intelligence Suite

Continuing the ideas on Christopher Laubenthal's article "Why one person can't do everything in the data space" [1], here's why his analogy between a college's functional structure and the core data roles is poorly chosen. In the last post I mentioned, as a first argument, that the two constructions have different foundations.

Secondly, it's a matter of construction, namely the steps used to arrive from one state to another. Indeed, there's somebody who builds the data warehouse (DWH), somebody who builds the ETL/ELT pipelines for moving the data from the sources to the DWH, somebody who builds the semantic data model that includes business-related logic, respectively people who tap into the data for reporting, data visualizations, data science projects, and whatever else is needed in the organization. On top of this, there should be somebody who manages the DWH. I haven't associated any role with these steps because one of the core roles can be responsible for more than one step.

In the case of a lakehouse, it is the data engineer who moves the data from the various data sources to the data lake, if that doesn't happen already by design or configuration. As per my understanding, the data engineers are the ones who design and build the new lakehouse, respectively move, transform and manage the data as required. Data Analysts, Data Scientists and maybe some Information Designers can then tap into the data. However, the DWH and the lakehouse(s) are technologies that facilitate their work. They can still do their work if the same data are available by other means.

In what concerns the dorm analogy, the verbs were chosen to match the way data warehouses (DWH) or lakehouses are built, though the congruence of the steps is questionable. One could have compared the number of students with the number of data entities, but not with the data themselves. Usually, students move by themselves and occupy the places. The storytellers, the assistants and the researchers are independent of whether the students are hosted in the dorm or not. Therefore, the analogy seems a bit forced.

Frankly, I covered all the steps except the ones related to Data Science by myself in both described scenarios. It helped that I knew the data from the data sources and the transformation rules I had to apply, respectively the techniques needed for moving and transforming the data, and the volume of data entities was somewhat manageable. Conversely, 1-2 more resources in the area of data analysis and visualizations could have helped bring more value to the business.

This opens the challenge of scale, which has to do with systems engineering and how the number of components and the interactions between them increase a system's complexity and the demand for managing the respective components. In the simplest linear models, for each multiplication of the number of components of the same type in the organization, the number of resources managing the respective layer matches the multiplier to some degree. E.g., if a data engineer can handle x data entities in a unit of time, then to handle n*x entities, at least n data engineers are more likely required. However, the output of n resources is only a fraction of n*x, given the dependencies existing between components and other constraints.
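
As a minimal formalization of this linear model (the overhead factor f below is my own addition, not part of the original argument):

% Linear scaling sketch: one engineer handles x entities per unit of time,
% so n*x entities call for roughly n engineers; the effective output is
% reduced by a dependency/coordination factor f, with 0 < f < 1.
\[
  \text{engineers}(n \cdot x) \approx n, \qquad
  \text{output}(n \cdot x) = f \cdot n \cdot x, \quad 0 < f < 1
\]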

An optimization problem resumes to finding out which data roles to choose to cover an organization's needs. A one-man show can be the best solution for small organizations, though unless there's a good division of labor, bringing in a second person will first make the throughput slower before it becomes faster.


Resources:
[1] Christopher Laubenthal (2024) "Why One Person Can’t Do Everything In Data" (link)

13 February 2024

🧭🏭Business Intelligence: A One-Man Show (Part III: The Microsoft Fabric)

Business Intelligence Series

Announced at the end of last year, Microsoft Fabric (MF) became a reality for data professionals, even if there are still many gaps in the overall architecture and some things don't work as they should. The Delta Lake and the various data consumption experiences seem to bring more flexibility, but they also raise questions on how one can use them adequately in building solutions for Data Analytics and/or Data Science.

Currently, as happens with new technologies, data professionals seem to explore the functionality, see what's possible and what's missing, and that's a considerable effort as everybody is more or less on their own. The material released by Microsoft and other professionals should in theory facilitate this effort, though the considerable number of features and the effort needed to review them do the opposite. Some professionals do this as part of their jobs, and exploring the features seems to be a full job in each area, while others, like myself, do it in their own time.

There are organizations that demand that their employees regularly actualize their knowledge in their field of activity, respectively explore how new technologies can be integrated into the organization's architecture. Having a few hours or even a day a week for this can go a long way! Occasionally, I could take 1-2 hours a week during the program and maybe a few more hours from my own time. Unfortunately, most of the significant progress I made in a certain area (SQL Server, Dynamics 365, Software Engineering, Power BI, and now MF) was done in my own time, which became more and more challenging over time given the pace with which new features and technologies develop.

By comparison, it was relatively easy to locally install SQL Server in its various CTP or community versions, deploy one of the readily available databases, and start learning. I'm still doing it, playing with a SQL Server 2022 instance whenever I find the time. Similarly, I can use Power BI and a few other tools, depending again on the time available to make progress. However, with MF things slowly start to get blurry. The 60 days of trial won't cut it anymore as there are so many things to learn - Spark SQL, PySpark, Delta Lake, KQL, Dataflows, etc. Probably there will be ways of learning any of these standalone, though not together in an integrated manner.

The complexity of the tools demands more time, a proper infrastructure and a good project to accommodate them. This doesn't mean that the complexity of the solutions needs to increase as well! Azure Synapse allowed me to reuse many of the techniques I used in the past to build a modern Data Analytics solution, while in other areas I had to accommodate the new. The solution wasn't perfect (only time will tell), though it provided the minimum of what was needed. I expect the same to happen in Microsoft Fabric, even if the number of choices is bigger.

There's a considerable difference between building a minimal viable solution and exploring, respectively harnessing MF's capabilities. The challenge for many organizations is to determine what that minimum is about, how to build that knowledge into the team, especially when starting from zero. 

Conversely, this doesn't mean that the skillset and effort can't be covered by one person. It might be more challenging, though achievable, if the foundation is there, respectively if certain conditions are met. This depends also on the organization's expectations, infrastructure and other characteristics. A whole team is more likely to succeed than one person, but it's no certainty!


🧭Business Intelligence: A One-Man Show (Part II: In the Cusps of Complexity)

Business Intelligence Series

Today I watched on YouTube Power BI Tips' "One Person to Do Everything" episode, which I missed last week [2]. The main topic is based on Christopher Laubenthal's article "Why one person can't do everything in the data space" [1]. The author's arguments are based on an analogy between the various data areas and a college's functional structure. Reading the article, I must say that it takes a poorly chosen analogy to make messy things messier!

One of the most confusing things is that there are so many data-related, context-dependent roles with considerable overlap that it becomes more and more difficult to understand what they cover. The author considers the roles of Data Architect, Data Engineer, Database Administrator (DBA), Data Analyst, Information Designer and Data Scientist. However, for every aspect of a data architecture there are also developers on the database (backend) and reporting (frontend) side. Conversely, there are other data professionals on the management side for the various knowledge areas of Data Management: Data Governance, Data Strategy, Data Security, Data Operations, etc. There are also roles at the border between the business and the technical side like Data Stewards, Business Analysts, Data Citizens, etc.

There are two main aspects here. From the historical perspective, many of these roles appeared when a new set of requirements or a new layer appeared in the architecture. Firstly, it was maybe the DBA, who was supposed to primarily administer the database. Being a keeper of the data and having some knowledge of the data entities, it was easy for him/her to export data for the various reporting needs. In time, such activities were taken over by a second category of data professionals. Then the data were moved to Decision Support Systems and later to Data Warehouses and Data Lakes/Lakehouses, this evolution requiring other professionals to address the challenges of each layer. Every activity performed on the data requires a certain type of knowledge that can result, in the end, in a new denomination.

The second perspective results from the management of data and the knowledge areas associated with it. If in small organizations with one or two systems in place one doesn't need to talk about Data Operations, in big organizations, where a data center or something similar is maybe in place, Data Operations can easily become a topic of its own, a management structure needing to be in place for its "effective and efficient" management. And the same can happen in the other knowledge areas and their interaction with the business. It's an inherent tendency of answering complexity with complexity, which in the long term can be to the detriment of any business. In extremis, organizations tend to have a whole team in each area, which can further increase the overall complexity by a small to not-so-small magnitude.

Fortunately, one of the benefits of technological advancement is that much of the complexity can be moved somewhere else, and these are the areas where the cloud brings the most advantages. Parts of or the whole architecture can be deployed into the cloud, being managed by cloud providers and third parties on an on-demand basis at stable costs. Moreover, with the increasing maturity and integration of the various layers, the impact of the various roles in the overall picture is reduced considerably as areas like governance, security or operations are built in as services, requiring thus fewer resources.

With Microsoft Fabric, all the data needed for reporting becomes, in theory, easily available in OneLake. Unfortunately, there is another type of complexity that is dumped on other professionals' shoulders, and these aspects need to be further considered.


Resources:
[1] Christopher Laubenthal (2024) "Why One Person Can’t Do Everything In Data" (link)
[2] Power BI tips (2024) Ep.292: One Person to Do Everything (link)


12 February 2024

🧭Business Intelligence: A One-Man Show (Part I: Some Personal Background and a Big Thanks!)

Business Intelligence Series

Over the past 24 years, I often found myself in the position of a "one man show", doing almost everything in the data space from requirements gathering to development, testing, deployment, maintenance/support (including troubleshooting and optimization) and Project Management, respectively from operations to strategic management, when it was the case. Of course, tasks of varying complexity are involved! Developing an SSRS or Power BI report is less complex than also developing, in the process, all or parts of the Data Warehouse, or Lakehouse nowadays, respectively building the whole infrastructure needed for reporting. All I can say is that "I've been there, I've done that!".

Before SSRS became popular, I even built for a customer a whole reporting solution based on SQL Server, HTML & XML, respectively COM+ objects for database access. The UI’s look-and-feel was like SSRS’s, though there was no wizardry involved besides the creative use of programming and optimization techniques. Once I wrote a SQL query, the volume of work needed to build a report was comparable to the one in SSRS. It was a great opportunity to use my skillset, having previously worked as a web developer and VB/VBA programmer. I worked for many years as a Software Engineer, applying the knowledge acquired in the field whenever it made sense to do so, working alone or in a team, as the projects required.

During this time, I was involved in other types of projects and activities that had less to do with the building of reports and warehouses. Besides the development of various desktop, web and data-processing solutions, I was also involved in 6-8 ERP implementations, being responsible for the migration of data, building the architectures needed in the process, and supporting key users in various areas like Data Quality or Data Management. I also did Project Management, Application Management, Release and Change Management, and even IT Management. Thus, at times there were at least two components involved - one component was data-related, while the other component had more diversity. It was a good experience, because the second component often needed knowledge of the first, and vice versa.

For example, coming to understand the data model and business processes behind an ERP system by building ad-hoc and standardized reports allowed me to get a good understanding of what data is needed for a Data Migration, what the dependencies are, and the level of quality needed. Similarly, the knowledge acquired by building ETL-based pipelines and data warehouses allowed me to design and build flexible Data Migration solutions, both architectures being quite similar from many perspectives. Knowledge of the data models and architectures involved can facilitate the overall process and is a premise for building reliable, performant solutions.

Similar examples can also be given in Data Management, Data Operations, Data Governance, during and post-implementation ERP support, etc. Reports and data are needed also in the Management areas - it starts from knowing what data are needed in the supporting processes for providing transparency, getting insights and bringing the processes under control, if needed.

Working alone, being able to build a solution from the beginning to the end was often a job requirement. This doesn't imply that I was a "lone wolf". The nature of a data professional or software engineer’s job requires you to interact with various businesspeople from report requesters to key users, internal and external consultants, intermediary managers, and even upper management. There was also the knowledge of many data professionals involved indirectly – the resources I used to learn from - books, tutorials, blogs, webcasts, code, and training material. I'm thankful for their help over all these years!


02 January 2024

🕸Systems Engineering: Never-Ending Stories in Praxis (Quote of the Day)

Systems Engineering Cycle

"[…] the longer one works on […] a project without actually concluding it, the more remote the expected completion date becomes. Is this really such a perplexing paradox? No, on the contrary: human experience, all-too-familiar human experience, suggests that in fact many tasks suffer from similar runaway completion times. In short, such jobs either get done soon or they never get done. It is surprising, though, that this common conundrum can be modeled so simply by a self-similar power law." (Manfred Schroeder, "Fractals, Chaos, Power Laws Minutes from an Infinite Paradise", 1990)

I found the above quote while browsing through Manfred Schroeder's book on fractals, chaos and power laws, a book that also explores related topics like percolation, recursion, randomness, self-similarity, determinism, etc. Unfortunately, once one goes beyond the introductory notes of each chapter, the subjects require more advanced knowledge of Mathematics, respectively further analysis and exploration of the underlying models. Despite this, the book is still an interesting read with ideas to ponder upon.

I found myself a few times in the situation described above - working on a task that didn't seem to end, despite investing more effort, respectively approaching the solution from different angles. The reasons behind such situations were multiple, typically found beyond my direct area of influence and/or decision. In a systemic setup, there are parts of a system that find themselves in opposition, different forces pulling in distinct directions. It can be the case of interests, goals, expectations or solutions which compete or are subject to politics.

For example, in Data Analytics or Data Science there are high chances that no progress can be made beyond a certain point without first addressing the quality of data or the design/architectural issues. The integrations between applications, data migrations and other solutions which heavily rely on data are sensitive to data quality and the architecture's reliability. As long as the source of variability (data, data generators) is not stabilized, providing a stable solution has low chances of success, no matter how much effort is invested, respectively how performant the tools are.

Some of the issues can be solved by allocating resources to handle their implications. Unfortunately, some organizations attempt to solve such issues by allocating the resources to the wrong areas or by addressing the symptoms instead of taking a step back and looking systemically at the problem, analyzing and modeling it accordingly. Moreover, there are organizations which refuse to recognize they have a problem at all! In the blame game, it's much easier to shift the responsibility onto somebody else's shoulders.

Defining the right problem to solve might prove more challenging than expected and usually this requires several iterations in which the knowledge obtained in the process is incorporated gradually. Other times, one attempts to solve the correct problem by using the wrong methodology, architecture and/or skillset. The difference between right and wrong depends on the context, and even between similar problems and factors the context can make a considerable difference.

The above quote can be corroborated with situations in which perfection is demanded. In IT and management setups, excellence is often confused with perfection, the latter being impossible to achieve, though many managers take it as the norm. There's a critical point above which the effort invested outweighs the solution's plausibility by an exponential factor.

Another source of unending effort is when requirements change frequently and swiftly - e.g., the rate at which changes occur outweighs the progress made toward a solution. Unless the requirements are stabilized, the effort spirals outward (in an exponential manner).

Finally, there are cases with extreme character, in which for example the complexity of the task outweighs the skillset and/or the number of resources available. Moreover, there are problems which accept plausible solutions, though there are also problems (especially systemic ones) which don't have stable or plausible solutions. 

Behind most such cases lie factors that tend toward chaotic behavior, occurring especially when the environments are far from favorable. The models used to depict such relations are nonlinear, sometimes expressed as power laws - one quantity varying as a power of another, with the variation increasing with each generation.
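
For reference, the generic form of a power law (a textbook formula, not specific to Schroeder's model of completion times):

% Generic power law: y varies as the k-th power of x; on a log-log plot
% the relation appears as a straight line with slope k.
\[
  y = c\,x^{k} \quad\Longleftrightarrow\quad \log y = \log c + k \log x
\]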


Resources:
[1] Manfred Schroeder (1990) "Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise" (quotes)

25 December 2019

#️⃣Software Engineering: Mea Culpa (Part II: The Beginnings)

Software Engineering
Software Engineering Series

I started programming at 14-15 years old with logical schemas made on paper, based mainly on simple mathematical algorithms like solving second-degree equations, finding prime or special numbers, and other simple tricks from the mathematical world available to a student at that age. It was challenging to learn programming based only on schemas, though, looking back, I think it was the best learning basis a programmer could have, because it forced me to think logically and it was also a good exercise, as one had to validate the outputs mentally or on paper.

Then I moved on to learning Basic and later Pascal on old-generation Spectrum computers, mainly a keyboard with 64K of memory and an improvised monitor. It felt almost like a holiday when one had the chance to work 45 minutes or so on an IBM computer with just 640K of memory. It was also a motivation to stay long after hours to write a few more lines of code. Even if it made no big difference in terms of speed, the simple idea of using a more advanced computer was a big deal.

The jump from logical schemas to actual programming was huge, as we moved from static formulas to exploratory methods like finding the roots of higher-degree equations using approximation methods, working with permutations and a few other combinatoric tools, interpolation methods, and so on. Once I got my own 64K Spectrum keyboard, a new world opened, as I had more time to play with 2- and 3-dimensional figures, location problems and so on. It was probably the time I got the most interesting exposure to things not found in the curricula.

Further on, during the university years I moved to Fortran, back to Pascal and dBASE, and later to C and C++, the focus being further on mathematical and sorting algorithms, working with matrices, and so on. I have to admit that there was a big difference between the students who came from 2-3 hours of Informatics per week (like I did) and the ones coming from lyceums specialized in Informatics, especially during years in which learning materials were almost nonexistent. In the end, all went well.

Jumping through so many programming languages, some quite old even for those times, allowed me to acquire different perspectives, though it sometimes felt like a waste of time, especially when one was limited to using the campus computers, and that only during lab hours. That was the reality of those times. Fortunately, the university years went faster than they came. Almost one year after graduation, with a little help, some effort and benevolence, I managed to land a job as a web developer, jumping from an interlude with Java to ASP, JavaScript, HTML, ColdFusion, ActionScript, SQL, XML and a few other programming languages ‘en vogue’ during the 2000s.

Somewhere between graduation and my first job, my life changed when I was able to buy my own PC (a Pentium). It was the best investment I could make, mainly because it allowed me to be independent of what I was doing at work. It allowed me to learn the basics of OOP based on Visual Basic and occasionally on Visual C++ and C#. Most of the meaningful learning happened after work, from the few books available, full of mistakes and other challenges.

That was my beginning. It is not my intent to brag about how much or how many programming languages I learned - knowledge is anyway relative - but to differentiate between the realities of then and today, as a bridge over time.


21 April 2019

#️⃣Software Engineering: Programming (Part VIII: Pair Programming)

Software Engineering Series

“Two heads are better than one” – a proverb whose wisdom is embraced today in the various forms of harnessing the collective intelligence. The use of groups in problem solving is based on principles like “the collective is more than the sum of its individuals” or “the crowds are better on average at estimations than the experts”. All well and good; based on the rationale of the same proverb, the idea has been advanced of having two developers working together on the same piece of code – one doing the programming while the other looks over the shoulder as an observer or navigator (whatever that means), reviewing each line of code as it is written, strategizing or simply being there.

This approach is known as pair programming and is considered an agile software development technique, adhering thus to the agile principles (see the agile manifesto). Beyond some intangible benefits, its intent is to reduce the volume of defects in software and thus ensure an acceptable quality of the deliverables. It’s also an extreme approach to the peer review concept.

Without considering whether pair programming adheres to the agile principles, the concept has several big loopholes. The first time I read about pair programming it took me some time to digest the idea – I was asking myself what programmer would do that on a daily basis, watching as other programmers code or being watched while coding, each line of code being followed by questions, affirmative or negative nodding… Beyond their status as lone wolves, programmers can cooperate when the tasks ahead require it; however, asking a programmer to actively watch as others program won’t work in the long run!

Talking from my own experience as a programmer and as a professional working together with other programmers, I know that a programmer sees each task as a challenge, a way of learning, of reaching beyond his own condition. Programming is a way of living, with its pluses and minuses.

Moreover, the complexity of the tasks doesn’t come down to handling the programming language but to solving the right problem. Solving the right problem is not something one can overcome with brute force but with intelligence. If using the programming language is the challenge, then the problem lies somewhere else and other countermeasures must be taken!

Some studies have identified that the use of pair programming led to a reduction of defects in software; however, the numbers are misleading as long as they compare apples with pears. To statistically conclude that one method is better than the other means doing the same experiment with the different methods using a representative population. Unless one addresses the requirements of statistics, the numbers advanced are just fiction!

Just think again about the main premise! One doubles the expenditure for a theoretical reduction of the defects?! Actually, it's more than double, considering that different types of communication take place. Without a proven basis, the effort multiplier can be somewhere between 2.2 and 2.5, and for an average project this can be a lot! The costs might be bearable in situations in which labor is cheap; however, programmers’ cooperation is a must.

The whole concept of pair programming seems like a bogus idea, just like two drivers driving the same car! This approach might work when the difference in experience and skills between developers is considerable, a setting met in universities or apprenticeship environments, in which the accent is put on learning and formation. It might work for handling complex tasks, as some adepts declare, however even then it’s less likely that the average programmer will willingly do it!


02 January 2017

#️⃣Software Engineering: Programming (Part VII: Documentation - Lessons Learned)

Software Engineering Series

Introduction


“Documentation is a love letter that you write to your future self.”
Damian Conway

For programmers as well as for other professionals who write code, documentation might seem a waste of time, an effort few are willing to make. On the other side, documenting important facts can sometimes save time and provide a useful base for building one’s own and others’ knowledge. I sometimes found out the hard way what I needed to document. With the hope that others will benefit from my experience, here are my lessons learned:

 

Lesson #1: Document your worked tasks


“The more transparent the writing, the more visible the poetry.”
Gabriel Garcia Marquez


Personally, I like to keep a list of what I worked on each day – typically nothing more than a 3-5 word description of the task, who requested it, and eventually the corresponding project, CR or ticket. I’m doing it because it makes it easier to track my work over time, especially when I have to retrieve some piece of information that is documented in detail somewhere else.

Within the same list one can also track the effective time worked on a task, though I sometimes find it difficult, especially when working on several tasks simultaneously. In theory this can be used to estimate further similar work. One can also use a categorization distinguishing, for example, between the various types of work: design, development, maintenance, testing, etc. This approach offers finer granularity, especially in estimations, though more work is needed to track the time accurately. Therefore, track the information that’s worth tracking, as long as there is value in it.

Documenting tasks offers not only easier retrieval and a base for accurate estimations, but also visibility into my work – for me as well as, if necessary, for others. In addition, it can be a useful sensemaking tool over time.

Lesson #2: Document your code


“Always code as if the guy who ends up maintaining your code will be
a violent psychopath who knows where you live.”
Damian Conway

There are split opinions over the need to document code. There are people who advise against it, and probably one of the most frequent reasons is rooted in the Agile methodology. I have to stress that Agile values “working software over comprehensive documentation”, which doesn’t imply the total absence of documentation. There are also other reasons frequently advanced, like “there’s no need to document something that’s already self-explanatory” (like good code should be), “no time for it”, etc. Probably in each statement there is some grain of truth, especially when considering the fact that in software engineering there are so many requirements for documentation (see e.g. ISO/IEC 26513:2009).

Without diving too deep into the subject: document what’s worth documenting, though this needs to be regarded from a broader perspective, as there might be other people who need to review, modify and manage your code.

Documenting code isn’t limited to the code that’s part of the deliverables, but extends also to intermediary code written for testing or other activities. Personally, I find it useful to save within the same file all the scripts developed within the same day. When some piece of code has a “definitive” character, I save it individually for reuse or faster retrieval, typically with a meaningful name that facilitates the file’s retrieval. With the code, it may help to also provide some metadata: a short description, the purpose, who requested it and when.
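
As a minimal sketch of such a header (the names, ticket and query below are illustrative, not a fixed template):

-- Description: monthly revenue per customer, used by the Sales dashboard
-- Requested by: J. Doe (Controlling), ticket CR-1234
-- Created: 2017-01-02; Modified: 2017-01-05 (added grouping by month)
SELECT CustomerId
, YEAR(OrderDate) AS [Year]
, MONTH(OrderDate) AS [Month]
, SUM(Amount) AS Revenue
FROM Orders -- hypothetical table
GROUP BY CustomerId
, YEAR(OrderDate)
, MONTH(OrderDate);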

Code versioning can be used as a tool to facilitate the process, though not everything is worth versioning.

 

Lesson #3: Document all issues as well the steps used for troubleshooting and fixing


“It’s not an adventure until something goes wrong.”
Yvon Chouinard

Independently of the types of errors occurring while developing or troubleshooting code, one of their common characteristics is that errors can have a recurring character. Therefore, I found it useful to document all the errors I got in terms of screenshots, ways to fix them (including workarounds) and sometimes also the steps followed to troubleshoot the problem.

Considering that the issues are rooted in programming fallacies or undocumented behavior, there is almost always something to learn from one’s own as well as from others’ errors. In fact, that was the reason why I started the “SQL Troubles” blog – as a way to document some of the issues I met, to provide others some help, and, why not, to get some feedback.

 

Lesson #4: Document software installations and changes in configurations


At least for me, this lesson is rooted in the fact that years back, release-candidate as well as final software was quite often not that easy to install, one having to deal with various installation errors rooted in OS or component incompatibilities, invalid or unset permissions, or unexpected presumptions made by the vendor (e.g. default settings). Over the years installation became smoother, though such issues still occur. Documenting the installation in terms of screenshots of the setup settings allows repeating the steps later. It can also provide a base for further troubleshooting when the configuration within the software has changed, or serve as evidence when something goes wrong.


Talking about changes occurring in the environment: more than once I found myself troubleshooting something that had stopped working, only to discover that something changed in the environment. It’s useful to document the changes occurring in an environment, an importance stressed also in the “Configuration Management” section of ITIL® (Information Technology Infrastructure Library).

 

Lesson #5: Document your processes


“Verba volant, scripta manent.” Latin proverb
"Spoken words fly away, written words remain."

In process-oriented organizations one has the expectation that the processes are documented. One can find that it’s not always the case, some organizations relying on the common or individual knowledge of the various processes. Or it might happen that the processes aren’t always documented to the level of detail needed. What one can do is to document the processes from one’s own perspective, to the level of detail needed.

 

Lesson #6: Document your presumptions


“Presumption first blinds a man, then sets him a running.”
Benjamin Franklin

Probably this is more a Project Management-related topic, though I find it useful also when coding: define upfront your presumptions/expectations – where libraries should lie, the type and format of content, the files’ structure, the output, and so on. Even if a piece of software is expected to be a black box with inputs and outputs, at least the input, the output and the expectations about the environment need to be specified upfront.

 

Lesson #7: Document your learning sources


“Intelligence is not the ability to store information, but to know where to find it.”
Albert Einstein

Computer specialists are heavily dependent on the internet to keep up with the advances in the field, best practices, methodologies, techniques, myths, and other knowledge. Even if one learns something, over time the degree of retention varies, and it can decrease significantly if the knowledge wasn’t used for a long time. Nowadays, with a quick search on the internet one can find (almost) everything, though the content available varies in quality and coverage, and it might be difficult to find the same piece of information again. Therefore, independently of the type of source used for learning, I found it useful to document the information sources as well.

 

Lesson #8: Document the known as well the unknown

 

“A genius without a roadmap will get lost in any country but an average person
with a roadmap will find their way to any destination.”
Brian Tracy

Over the years I found it useful to map and structure the learned content for further review, sometimes considering only key information about the subject like definitions, applicability, limitations, or best practices, while other times I provided also a level of depth that allows me and others to memorize and understand the topic. As part of the process I attempted to keep the copyright attributions, just in case I need to refer to the source later. Together with what I learned, I considered also the subjects that I still have to learn and review for further understanding. This provides a good way to map what I know as well as what I don’t. One can use for this a rich text editor or knowledge mapping tools like mind mapping or concept mapping.


Conclusion


Documentation isn’t limited to pieces of code or software; it extends to the knowledge one acquires, its sources, what it takes to troubleshoot the various types of issues, and the work performed on a daily basis. Documenting all these areas of focus should be done based on the principle: “document everything that’s worth documenting”.



25 March 2011

🧭Business Intelligence: Troubleshooting (Part II: Approaching a Query)

Business Intelligence Series

Introduction

You received a (long) query for troubleshooting, reviewing, conversion or any similar task. In addition, you don’t know much about the underlying table structure or business logic. So, what do you do then? Two things are intuitively clear: you don’t need to panic, and understanding the query may help you in your task. Understanding the query – it seems such a simple statement, though there is more to it. Here are some points on how to approach a query.

State your problem
 
“A problem well stated is a problem half solved” (Charles F. Kettering). Before performing any work, check what’s requested from you and whether you have the information required for the task(s) ahead, for example documentation, valid examples, all the code, etc. If something is missing, don’t hesitate to request all the information you need. While waiting for information, you can continue with the next steps. As we don’t live in a perfect world, there will also be cases in which you’ll have to fill the gaps yourself by performing additional research/work. When troubleshooting, it’s important to understand what’s wrong and, when possible, to have data against which to compare the query’s output.

Save the work

Even if you have a copy of the query somewhere on the server, save the previous version of the query and, when possible, use versioning. It might seem a redundant task; however, you never know when you’ll need to refer to it and, as you’ll see next, it can/should be used as a baseline for validating the changes. In case you haven’t saved the query, check whether your RDBMS tracks metadata about the queries run and, if the metadata weren’t reset in the meantime, you might be lucky enough to find a copy of your query.
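
On SQL Server, for instance, one can search the plan cache for a lost query; a minimal sketch (the searched fragment is hypothetical, and the cache retains only recently executed statements):

-- Search the cached query texts for a known fragment of the lost query.
SELECT TOP 10 QST.text
, QS.last_execution_time
FROM sys.dm_exec_query_stats QS
     CROSS APPLY sys.dm_exec_sql_text(QS.sql_handle) QST
WHERE QST.text LIKE '%PurchaseOrderLines%' -- hypothetical fragment
ORDER BY QS.last_execution_time DESC;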

I found that it’s important to save the daily work – the various analyses performed in order to understand a query, the various versions and even the data used for testing. All this work could help you later to review what you did and the steps you missed; you can also reuse one of the queries for further work, etc.

Break down

When the query is too complex, it could be useful to break it into chunks that can be run and understood in isolation. Typically, such chunks derive from the query’s structure (e.g. inline queries, subqueries, the queries participating in unions). I found that often, focusing on only a chunk of a query helps in isolating issues.
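
A minimal sketch with hypothetical tables – the inline query is a chunk that can be run and validated in isolation before looking at the outer query:

SELECT ITM.ItemId
, ITM.ItemName
, POL.Quantity
FROM Items ITM
     JOIN ( -- chunk that can be run and checked separately
       SELECT ItemId
       , SUM(Quantity) AS Quantity
       FROM PurchaseOrderLines
       GROUP BY ItemId
     ) POL
       ON ITM.ItemId = POL.ItemId;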

Restructure

Many programmers still write queries using the old non-ANSI join syntax, in which the join constraints appear in the WHERE clause, making the understanding and troubleshooting of a query more difficult. Often I found myself in the position of first transforming a query to the ANSI SQL syntax before performing further work on it. It’s actually a good occasion to gain a first understanding of the query’s structure, though I’d prefer not to do it so often. In addition, during the restructuring phase it makes sense to differentiate between the join and filter constraints, which helps in isolating the issue(s), as the sketch below shows.
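
A minimal before/after sketch on the same hypothetical tables:

-- Old non-ANSI syntax: join and filter constraints are mixed in WHERE.
SELECT ITM.ItemName
, POL.Quantity
FROM Items ITM
, PurchaseOrderLines POL
WHERE ITM.ItemId = POL.ItemId
  AND POL.Quantity > 0;

-- ANSI syntax: the join constraint moves to ON, the filter stays in WHERE.
SELECT ITM.ItemName
, POL.Quantity
FROM Items ITM
     JOIN PurchaseOrderLines POL
       ON ITM.ItemId = POL.ItemId
WHERE POL.Quantity > 0;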

Check cardinalities

Wrong join constraints lead to duplicates or to fewer records than expected, such differences being difficult to track when the variances in the number of records are quite small. Even if RDBMS come to developers’ help by providing metadata about the join relations, the columns and predicates participating in a join are not always so easy to identify. Therefore, in order to address this issue, one needs to check the constraints between any two tables participating in a join. Sometimes, when the query is based on the table with the lowest level of detail, it can be enough to check the variations in the number of records.
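
A minimal sketch, again on hypothetical tables – checking whether the join column is unique on the “one” side of the join, respectively tracking the record count of the lowest-detail table:

-- Any row returned signals duplicates that will inflate the join's output.
SELECT ItemId
, COUNT(*) AS RecordCount
FROM Items
GROUP BY ItemId
HAVING COUNT(*) > 1;

-- Rerun after adding each join to the lowest-detail table: an increase
-- points to duplicates, a decrease to missing matches.
SELECT COUNT(*) AS RecordCount
FROM PurchaseOrderLines;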

Check filter constraints

Filter constraints are maybe more difficult to identify, especially when the logic built into applications needs to be reengineered. Many of the filter constraints are logical, though when you have no documentation about the schemas it’s like rambling in the dark: you have to check real examples and identify the various values and the impact they have on the behavior of your report.
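
When no documentation is available, profiling the values behind a filter constraint can shed some light; a minimal sketch on a hypothetical column:

-- the distribution of the values a filter constraint is based on
SELECT Status
, COUNT(*) NoRecords
FROM Orders
GROUP BY Status
ORDER BY Status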

Validate changes
 
So, you made the changes and everything looks perfect. Is it so? Your intuition might tell you that the logic of a query is correct, though as software is not based on magic, at least not all the time, check some of the records to ensure the data are rendered as expected, check the totals, compare the current with the previous version, identify variations, etc. You don’t need to use all the techniques you know, but choose the minimal set of tools that allows you to validate the query.
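
For example, assuming the previous and the current version of the query were saved as the (hypothetical) views vOrdersOld and vOrdersNew, the EXCEPT operator and simple totals can surface the variations between the two:

-- records returned by the old version but missing from the new one
-- (swap the two SELECTs to check the opposite direction)
SELECT OrderId
, Amount
FROM dbo.vOrdersOld
EXCEPT
SELECT OrderId
, Amount
FROM dbo.vOrdersNew

-- compare the totals of the two versions
SELECT SUM(Amount) TotalOld
FROM dbo.vOrdersOld

SELECT SUM(Amount) TotalNew
FROM dbo.vOrdersNew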

Perform refactoring
    
Refactoring, the way to (continuous) code improvement, should become part of each developer’s philosophy about programming. A query, as any other piece of code, is rarely perfect: technical and factual knowledge is relative, features get deprecated and new techniques are introduced. On the other side, there is an old saying in IT – don’t change something that’s already working – so a balance should be kept between the two: the apparent stability and the need for change.

Document
    
I hope there’s no need to stress the importance of documentation. From versioning to the description of the logic, it’s a good practice to document the important parts of your work, especially the parts that will facilitate later work.

24 April 2010

🧭Business Intelligence: Troubleshooting (Part I: A Problem Solving Approach)

Business Intelligence Series


Introduction

On several occasions I observed that there are SQL and non-SQL developers who don’t know how to troubleshoot a programming problem in general, respectively a SQL-related issue in particular, and I’m not referring here to the complex problems that typically require the expertise of a specialist, but to simple day-to-day situations: troubleshooting an error thrown by the database engine, an error in the logic, a performance issue, the unavailability of resources, etc. I’m not necessarily talking here about the people posting questions on forums, professional networks or blogs, even if in many situations they could have found an answer to the problem with a little research, but about seeing developers actually at work. It’s true that there are also many cases in which the software throws an error message that gives you no idea where to start, or in which the error is reported at another line than the one where it actually occurs, leading the developer onto a false trail.

Before going into detail, let’s take a short look at what troubleshooting means! Paraphrasing Wikipedia’s general definition, troubleshooting in IT is a type of problem solving applied to software and infrastructure related issues. Software issues refer not only to the various types of errors thrown by software applications, but also to functional, rendering or configuration errors, performance issues, data quality issues, etc. Infrastructure related issues refer to the IT infrastructure – network, information systems, processes, methods or methodologies used. In this post I will refer only to software issues, even if the technique(s) for troubleshooting this kind of issues could be applied to infrastructure issues as well.

Polya’s Approach to Problem Solving

In his book 'How To Solve It', G. Polya, a well-known Hungarian mathematician, advances a natural 4-step approach to solving a problem: 1. understanding the problem, 2. devising a plan, 3. carrying out the plan, and 4. looking back [1]. Polya’s approach can be used for all types of problems, including IT problems, and even if I find it too high-level for solving this type of problem, it’s actually a cornerstone on which more detailed approaches can be built. Let’s look shortly at each of Polya’s four steps!

1. Understanding the problem 
 
Understanding the problem comes down to identifying what is known – the data, the actual facts – and what is not known: what causes the issue and how it can be solved. Somebody said that a problem well understood is half solved, and there are quite good chances of arriving at the wrong solution if the problem is not well understood. If in Mathematics the problem is defined beforehand together with the whole range of constraints, in IT, when troubleshooting, the problem first needs to be defined; in the context of this post the problem revolves around a technical or business issue appearing in the form of an error message, an unexpected/wrong application behavior, a wrong process, etc. Thus the actual facts could comprise the error message, the current vs. expected behavior, the tools used, the high/low level design, the business logic, the affected objects, specific constraints, etc.
 
Defining the issue/problem might not be as simple as it seems, especially when the issue is reported by other people in informal, non-technical terminology – fuzzy formulations like “there is an error in the XYZ screen” without actually detailing what the issue is about, the steps followed and the input that resulted in the respective issue, and other such aspects that need to be addressed in order to understand the problem. All these aspects are not known by the developer, though with a little investigation they are transformed into known information; this involves communicating with the users, looking into the documentation, and gathering any other facts. Actually we could group all these actions under the “gathering the facts” syntagma, and this type of operation can be considered part of this step because it is intrinsic to understanding the problem.

2. Devising a plan 
 
In this step one attempts to find the connection between the data and the unknown, looking at the problem from different angles in order to obtain an idea of the solution, to make a plan [1]. We have a plan when we know which steps we have to follow in order to identify the issue (solve the problem); the steps don’t have to be too detailed, as long as they are addressable, and not necessarily complete, serving rather as a base that can evolve with time, for example when new information/results are found. There could be multiple directions to look into, for example based on a possible list of causes, on the constraints the various features come with, on the different features for implementing the same thing, etc.
 
Naturally, the first questions a developer should ask are: have I seen this issue before, in the same or in a slightly modified form? Can the problem be broken down into smaller (known) problems? Can anything useful be derived from the data, and have all the essential notions involved in the problem been considered [1]? Essential notions are always a thing to look into, mainly because, I would say, many issues derive from feature constraints or from the misuse of features. Tools like brainstorming, check lists, root-cause analysis or conceptual mapping can be used – in fact any tool which helps us track the essential notions and the relations between them.

3. Carrying out the plan 
 
Once the plan is sketched, we can go on and approach each of its branches, performing the successive steps in one branch until we reach an end-point (a point from which we can’t go further). There could be branches going nowhere, multiple solutions, or no apparent solution to the problem. Everything is possible… More likely, while advancing in carrying out the plan, we will discover other intermediary steps and other branches (alternatives for arriving at the same result or for approaching different constraints).

4. Looking back 
 
According to Polya, this step comes down to examining the solution [1]: reviewing the argumentation used and the solution’s construction, considering whether the solution is optimal, whether it can be reused to solve other types of problems or whether it can be improved/refactored. Actually this is a step many developers completely ignore: they found a solution, it’s working, so their work is done! No – even when pressed by time, these aspects of problem solving should be considered, and from my point of view this step also includes actions like documenting the issue and, in special cases, communicating the solution found to the circle of professionals (e.g. in terms of best practices or lessons learned – why not a blog post?). Topics like optimality and refactoring are quite complex and deserve a post of their own, therefore I will limit myself to mentioning only that they typically consider the solution from the point of view of performance, complexity, (re)usability and design, the developer having to trade between these and other similar (quality) dimensions.

Beyond Polya’s Approach

A natural question: do we really have to follow this approach?! Come on, there will be cases when you’ll have the solution without actually defining the problem (explicitly) or devising a plan (explicitly), or only by listing the scope and the constraints! Unconsciously we actually follow the first three steps but forget or completely ignore the fourth, and I feel that applying Polya’s approach brings some “conscious thought” into this process that could help us make the most of it.
 
In many cases the solution will be there in the documentation, giving developers explicit or implicit hints about the areas in which to search; for example, in case of an error related to a query, a first input is the error message returned by the database engine. Fortunately RDBMS vendors like Microsoft and Oracle also provide a longer description for each error, allowing one to understand what the error message is about. This is the happiest case; there are also many software tools that, after running for half an hour, return a fuzzy error message (e.g. ‘an error occurred!’) and nothing more.
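
In SQL Server, for example, the full description behind an error number can be looked up in the sys.messages catalog view (the error number below is just an illustration):

-- the full (English) description for a given error number
SELECT message_id
, severity
, text
FROM sys.messages
WHERE message_id = 8134 -- e.g. divide by zero encountered
  AND language_id = 1033 -- English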

Thank God for the Internet, a dynamic knowledge repository in which lots of knowledge can be found with just a simple click, though one sensitive to the input. In many cases I could find one or more solutions or hints for an error I had, usually just by copy-pasting the error number or the error description or, when the description is too long, only its most important part. I observed that there is a quite important number of professionals who prefer to post their issue in a forum or professional group instead of doing some research by themselves, this lack of effort helping to increase the volume of redundant information on the web, which comes with negative but also positive implications.

When we perform such a search, we actually rely on the solution provided by other users, shortcutting the troubleshooting process, and, at the risk of repeating the same syntagma, this comes with negative but also positive implications. For example, a negative aspect is that people don’t learn how to troubleshoot by themselves, relying instead on ready-available solutions, while a positive aspect is that less time is spent in the troubleshooting process, at least in theory. Actually, considering the mentioned positive aspect, that’s also why I consider the “looking back” step as important, and I’m referring especially to the action of documenting the issue.
 

References:
[1] G. Polya (1973) How To Solve It: A New Aspect of Mathematical Method, 2nd Ed. Princeton University Press. ISBN: 0-691-08097-6.

21 April 2010

#️⃣Software Engineering: Programming (Part II: To get or not Certified?!)

Software Engineering Series

To get or not certified?! That’s a question I have asked myself several times over the years, and frankly it doesn’t have an easy answer because there are many aspects that need to be considered: previous education, the targeted certification, the availability of time, financial resources or learning material, the required software, hands-on experience, the certification’s costs, duration/frequency, objectives, value (on the market) or requirements, contexts, etc.
On many occasions when I had most of the conditions met, I didn’t have the time to do it, or I waited for the requirements of the new set of certifications to appear, referring mainly to the SQL Server 2005 and 2008 versions, or I preferred to continue my “academic” studies; so here I am, after almost 10 years of experience in the world of SQL, without any certification but, I would say, with a rich experience covering mainly the full life-cycle development of applications, reporting, data quality and data integration, ETL, etc.

Enough talking about myself – let’s get to the subject. I’ve seen this topic appearing again recently in 1-2 professional groups, so I’ll try to approach it from a general point of view, because most of the characteristics could also apply to database-related certifications like Microsoft MCITP (Microsoft Certified IT Professional) or MCTS (Microsoft Certified Technology Specialist) for SQL Server.

Naturally, in what concerns certification, the opinions among professionals are split. An often met argument against it is the belief that a certification is just a piece of paper with limited value unless backed up by adequate hands-on experience, while the pro argument is that some companies, employers and customers alike value a certification, considering it a personal achievement reflecting not only the owner’s commitment to approach and take a certification exam, but also a basic level of knowledge. Both views are entirely correct from their own perspective, weighting differently from person to person, from community to community or from one domain of expertise to another, and they have positive and negative aspects, many of them subjective as they are related to people’s perception.

From a global perspective, an IT “certification fulfills a great need by providing standardized exams for the most current and important technologies” [3], allowing one to judge people’s knowledge on the topics encompassed by it [1], being thus a way to quantify knowledge, especially related to general tasks. Certifications offer a path to guide the study of a domain [2], are developed around agreed-upon job tasks [3] and consider a basic knowledge base made of vocabulary, definitions, models, standards, methods, methodologies, guidelines or best practices.

Given that a certification covers most of the topics of a given domain, in theory it provides a wide but superficial coverage of the respective domain, in contrast with hands-on experience – the professional experience accumulated by solving day-to-day tasks – which provides a narrower (task-based) but deeper coverage. Therefore, from my point of view the two are not interchangeable, though together they could offer a wide and deep coverage of the domain; a certification somehow needs to be based on a certain number of years of hands-on experience in order to get more value out of it.

On the other side, variety in hands-on experience could offer a wider coverage of the domain, though I suppose such coverage can be accomplished fully by hands-on experience only over a longer period of time. These suppositions are purely theoretical, because there are many other parameters that need to be considered, for example a person’s capacity for learning by doing vs. theoretical learning (which also involves the understanding of concepts), the learning curve and particularities of the technologies, methods or methodologies involved, the forms of training used, etc.

A certification is not meaningless, as several professionals claim it to be (e.g. J. Shore, T. Graves & others), even when considered from the employers’ perspective, and the fact that it doesn’t count for some employers or professionals – that’s another story. A certification could eventually be considered useless, though that’s not fully true either. Maybe the certification itself is useless for a third party, though it isn’t from the point of view of the learning process, as long as the knowledge accumulated is further used and the certification is not an end in itself.
A certification is not, or shouldn’t be, an end in itself; it should be part of a continuous learning process in which knowledge is perpetually discovered, integrated and reused. Most probably in order to keep the learning process continuous, several certifications, including MCITP, require recertification after a number of years.

There are professional certifications that require provable experience in the respective domain before the candidate is actually accepted for the certification; it’s the case of the PMP (Project Management Professional) and CAPM (Certified Associate in Project Management) certifications from PMI (Project Management Institute), which require a considerable number of hours of non-overlapping direct or indirect PM experience, and the example is not singular: if I’m not mistaken, the CISSP (Certified Information Systems Security Professional) certification also requires a certain number of years of experience. This type of requirement allows, in theory, making the most of the learning process by facilitating the integration of knowledge with experience.

How useful is a certification for the certified person?! It depends also on how well the certification succeeds in covering the knowledge, skills and abilities required by an actual job, and on how much of the knowledge acquired will be used later. There are people who focus only on passing the exam, nothing more, though I would say that might come with downsides in the long term. There are even organizations that encourage and sponsor their employees’ certification by providing training material, courses, partial or full expenses, such initiatives being often part of their strategic effort of creating value and a knowledge-based environment, the professional certification being also a form of recognition, valued in what concerns the employees’ performance and eventually associated with a form of remuneration.

I think that a certification could be beneficial for a person with relatively little or no professional experience in a certain domain, the certification bridging to a small degree the gap to hands-on experience. It could be interesting to study whether hands-on experience can be compensated to some degree by attempting to (re)use the learned concepts in self-driven applications or in several examples.
When learning something new I found it useful to try writing a tutorial or a blog post using a well-defined example, though this won’t entirely replace hands-on experience, the difference between the two being the limited vs. the global scope of handling tasks and of dealing with real-life situations. Most probably it could also be useful to learn about the use of a technique/technology in several contexts, though this equates with a lot of research and effort spent in that direction. Is it worth doing that?!

A certification is an opportunity to enter a “select” circle of professionals, though it also depends on how each vendor or group of certificate holders takes advantage of this “asset” and what other benefits are derived out of it. For example, by publishing domain-related content, certificate holders could be kept up-to-date with new features, trends, best practices, etc.; the professional network thus created could benefit from the potential such networks offer, especially when considering problem solving and the creation, propagation and mapping of knowledge. Of course, such networks could also have side effects, for example the creation of exclusivist networks. I would say that the potential of professional networks is still theoretical, but with the evolution of the Web new possibilities will emerge.

A person taking such a certification manages in theory to cover most of the important topics related to a given domain; however, this doesn’t guarantee that the person is actually capable of (successfully) applying the concepts and techniques in real-life scenarios, the many “brain dumps” and other easy ways of obtaining a certification decreasing the certification’s credibility and value. There are domains over-flooded by people with certifications but without the skills to approach a real project; a company that gives too much credit to a certification could end up stuck with resources that can’t be used, an aspect that impacts other professionals negatively too.
Coming back to the idea that a certification is subject to people’s perception, I have to say that the most important opinion in this direction is not necessarily that of the professionals active in the respective domain, but that of the people from HR, PM and partially of the headhunters, because they are the ones making the selection, deciding who’s hired and who’s not. Considering that few professionals from HR and PM come from the IT domain, there are lots of false and true presumptions when evaluating such candidates, people arriving at their own methods of filtering candidates, and even if such methods are efficient from the result perspective, many good professionals could feel kind of “discriminated” against.

In theory it’s easier to identify a person who has a certification than to navigate through a huge collection of related projects and tasks, or to search a collection of CVs for the various possible combinations of significant terms related to a job description. Somebody (sorry, I don’t remember who) said that a manager spends on average 20-30 seconds on each CV; it then also depends on how eye-catching a certification is in a simple CV scan.
From a semantic point of view I would say that a certification is richer in meaning than any type of written experience, though that also depends on the reviewer’s knowledge about the respective certification. What’s sure is that when choosing between two professionals with similar experience, there are high chances for the one holding a certification to be hired. In addition, considering that there are hundreds of applicants for the good jobs on the market, I would say that a certification could allow a candidate, among many other criteria, to distinguish himself from the crowd.

Given the explosion of technologies in IT, the domain’s dynamics, segmentation and other intrinsic characteristics, IT certifications are more specialized, more segmented and less standardized, making their evaluation difficult, especially when domains intersect each other or when the vendors issuing the certifications compete against each other. Compared with other domains, an IT professional needs to be always up-to-date and to cover multiple related domains in order to do his work efficiently; for example, in order to provide a full life-cycle solution, a developer would have to be something of an expert in Software Engineering, UI and database programming, security, testing, etc. The high segmentation in IT can also be seen in the denominations of the various roles, with lots of confusion deriving from this, especially when matching job descriptions with roles.

The bottom line must also be considered: in IT, as in other domains, knowledge and experience are relative because they also depend on a person’s skills and ability to assimilate, use and (creatively) reuse the knowledge of a domain; a person could in theory accumulate in one year the same experience as others in 2 or more years, just as a person who got certified could in theory handle day-to-day tasks without any difficulty, and just as, in theory, a student with no professional experience could handle programming tasks like a professional with several years of experience.
At least in my country, many University programs provide IT-related curricula within non-IT domains (e.g. Mathematics, Economics, Engineering), and a number of programming courses are also taught in high school or even in lower grades, the theory learned and the small projects theoretically facilitating the certification for a programming language (e.g. C#, Java or C++) or the direct handling of day-to-day tasks. It’s true that in school the focus lies more on the syntax, basic features and algorithmic nature of programming, but this doesn’t diminish the value of this type of learning when done adequately. Such educational experience is not considered professional experience at all, even if it provides a considerable advantage when approaching a certification or a job.

It must be highlighted that taking a certification comes with no guarantee of getting a job or being successful in your career/profession. You have to ask yourself honestly what you want to achieve with a certification and how you’ll use the learning process in order to get the most out of it. You actually have to enjoy the road to the final destination rather than dreaming about the potential success brought by such a certification. It could actually take more time until you recover your investment or until you see that the invested time was worth it, and, as always, some risks need to be assumed. Consider the positive and negative aspects altogether and decide for yourself whether it makes sense to go for a certification.

There is actually a third choice – continuing the academic studies, for example pursuing a bachelor’s, a master’s or, why not, a doctoral degree. Approaching such a degree imposes similar questions as in the case of a certification, though academic degrees are in theory better accepted by society, even if they too come with no guarantees and require more effort and financial resources.


References:
[1] K. Forsberg, H. Mooz, H. Cotterman (2005). Visualizing Project Management: Models and Frameworks for Mastering Complex Systems. John Wiley & Sons. ISBN: 978-0-471-64848-2.
[2] D. Gibson (2008). MCITP SQL Server 2005 Database Developer All-In-One Exam Guide. McGraw-Hill. ISBN: 978-0071546690.
[3] L.A. Snyder, D.E. Rupp, G.C. Thornton (2006). Personnel Selection of Information Technology Workers: The People, The Jobs, and Issues for Human Resources Management. Research in Personnel and Human Resources Management, Vol. 25, Martocchio J.J. (Ed.). JAI Press. ISBN: 978-0762313273.
