Showing posts with label versioning. Show all posts
Showing posts with label versioning. Show all posts

19 July 2025

🏗️Software Engineering: Versioning (Just the Quotes)

"Programs are not used once and discarded, nor are they run forever without change. They evolve. The new version of the integration program has a greater likelihood of surviving changes later without acquiring bugs. It assists instead of intimidating those who must maintain it." (Brian W Kernighan & Phillip J Plauger, "The Elements of Programming Style", 1974)

"Systems with unknown behavioral properties require the implementation of iterations which are intrinsic to the design process but which are normally hidden from view. Certainly when a solution to a well-understood problem is synthesized, weak designs are mentally rejected by a competent designer in a matter of moments. On larger or more complicated efforts, alternative designs must be explicitly and iteratively implemented. The designers perhaps out of vanity, often are at pains to hide the many versions which were abandoned and if absolute failure occurs, of course one hears nothing. Thus the topic of design iteration is rarely discussed. Perhaps we should not be surprised to see this phenomenon with software, for it is a rare author indeed who publicizes the amount of editing or the number of drafts he took to produce a manuscript." (Fernando J Corbató, "A Managerial View of the Multics System Development", 1977)

"When the main design gets changed (as it will), you now have to think about where this design also exists. If you’re in this mode, you are either guaranteeing extra work to keep things in synch or you have a huge versioning problem where it is unclear which version to trust. The former will add time and costs. The latter can introduce errors and affect quality!" (F Alan Goodman, "Defining and Deploying Software Processes", 2006)

"If your code needs comments, consider refactoring it so it doesn’t. Lengthy comments can clutter screen space and might even be hidden automatically by your IDE. If you need to explain a change, do so in the version control system check-in message and not in the code." (Peter Sommerlad, [in Kevlin Henney’s "97 Things Every Programmer Should Know", 2010])

"Releasing software should be easy. It should be easy because you have tested every single part of the release process hundreds of times before. It should be as simple as pressing a button. The repeatability and reliability derive from two principles: automate almost everything, and keep everything you need to build, deploy, test, and release your application in version control." (David Farley & Jez Humble, "Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment Automation", 2010)

"The deployment pipeline has its foundations in the process of continuous integration and is in essence the principle of continuous integration taken to its logical conclusion. The aim of the deployment pipeline is threefold. First, it makes every part of the process of building, deploying, testing, and releasing software visible to everybody involved, aiding collaboration. Second, it improves feedback so that problems are identified, and so resolved, as early in the process as possible. Finally, it enables teams to deploy and release any version of their software to any environment at will through a fully automated process." (David Farley & Jez Humble, "Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment Automation", 2010)

"Many smaller Scrum projects succeed with informal requirements mechanisms such as direct discussion between the Product Owner and Team, but as project complexity and criticality grows, more depth and richness of requirements expression and requirements versioning will likely be required. For example, documentation of interfaces that affect multiple teams becomes critical. Changes to interfaces or new features that cross team boundaries may have a significant impact on the project. These requirements should be elaborated on a just-in-time basis, meaning at, or just prior to the Sprint that implements the new functionality. To address this problem, teams may want centralized support for richer forms of requirements expression, their compilation for review and automated change notification." (Ken Schwaber & Jeff Sutherland, "Software in 30 days: How Agile managers beat the odds, delight their customers, and leave competitors in the dust", 2012)

"DevOps is essentially about gaining fast feedback and decreasing the risk of releases through a holistic approach that is meaningful for both development and operations. One major step for achieving this approach is to improve the fl ow of features from their inception to availability. This process can be refined to the point that it becomes important to reduce batch size" (the size of one package of changes or the amount of work that is done before the new version is shipped) without changing capacity or demand." (Michael Hüttermann et al, "DevOps for Developers", 2013)

"When people use different tools for similar activities" (e.g., version control, work tracking, documentation), they tend to form groups" (camps) around tool usage boundaries. [...] The more we are invested in certain tools, the greater the likelihood of deriving a part of our identity from the tool and its ecosystem." (Sriram Narayan, "Agile IT Organization Design: For Digital Transformation and Continuous Delivery", 2015)

"Automated data orchestration is a key DataOps principle. An example of orchestration can take ETL jobs and a Python script to ingest and transform data based on a specific sequence from different source systems. It can handle the versioning of data to avoid breaking existing data consumption pipelines already in place." (Sonia Mezzetta, "Principles of Data Fabric: Become a data-driven organization by implementing Data Fabric solutions efficiently", 2023)

"Data products should remain stable and be decoupled from the operational/transactional applications. This requires a mechanism for detecting schema drift, and avoiding disruptive changes. It also requires versioning and, in some cases, independent pipelines to run in parallel, giving your data consumers time to migrate from one version to another." (Piethein Strengholt, "Data Management at Scale: Modern Data Architecture with Data Mesh and Data Fabric" 2nd Ed., 2023)

"When performing experiments, the first step is to determine what compute infrastructure and environment you need.16 A general best practice is to start fresh, using a clean development environment. Keep track of everything you do in each experiment, versioning and capturing all your inputs and outputs to ensure reproducibility. Pay close attention to all data engineering activities. Some of these may be generic steps and will also apply for other use cases. Finally, you’ll need to determine the implementation integration pattern to use for your project in the production environment." (Piethein Strengholt, "Data Management at Scale: Modern Data Architecture with Data Mesh and Data Fabric" 2nd Ed., 2023)

"Configuration is coding in a poorly designed programming language without tests, version control, or documentation." (Gregor Hohpe)

"God could create the world in six days because he didn't have to make it compatible with the previous version." (programmer folklore [attributed to Donald Knuth, Mark Twain])

"It is not usually until you’ve built and used a version of the program that you understand the issues well enough to get the design right." (Rob Pike)

"The third version is the first version that doesn't suck." (Mike Simpson)

11 May 2018

Application Support: One Database, Two Vendors, No Love Story or Maybe…

Data Warehousing

Introduction

Situation: An organization has several BI tools provisioned with data from the same data warehouse (DW), the BI infrastructure being supported by the same service provider (vendor). The organization wants to adopt a new BI technology, though for it must be brought another vendor into the picture. The data the tool requires are already available in the DW, though the DW needs to be extended with logic and other components to support the new tool. This means that two vendors will be active in the same DW, more generally in the same environment.

Question(s): What is the best approach for making this work? Which are the challenges for making it work, considering that two vendors?


Preliminary

    When you ask IT people about this situation, many will tell you that’s not a good idea, being circumspect at having two vendors within the same environment. Some will recall previous experience in which things went really bad, escalated to some degree. They will even show their disagreement through body language or increase tonality. Even if they had also good experiences with having two vendors support the same environment, the negative experiences will prevail. It’s the typical reaction to an idea when something that caused considerable trouble is recalled. This behavior is understandable as generally human tend to remember more the issues they had, rather than successes. Problems leave deeper marks than success, especially when challenges are seen as burdens.

    Reacting defensively is a result of the “I’ve been burned once” syndrome. People react adversely and tend to avoid situations in which they were burned, instead of dealing with them, instead of recognizing which were the circumstances that lead to the situation in the first place, of recognizing opportunities for healing and raising above the challenges.


   Personally, at a first glance, the caution would make me advise as well against having two or more vendors playing within same playground. I had my plate of extreme cases in which something went wrong and the vendors started acting like kids. Parents (in general people who work with children) know what I’m talking about, children don’t like to share their toys and parents often find themselves in the position of mediating between them. When the toy get’s broken it’s easy to blame other kid for it, same as somebody else must put the toy at its place, because that somebody played the last time with it. It’s a mix between I’m in charge and the blame game. Who needs that?

  At second sight, if parents made it, why wouldn’t professionals succeed in making two vendors work together? Sure, parents have more practice in dealing with kids, have such situations on a daily basis, and there are fewer variables to think about it… I have seen vendors sitting together until they come up with a solution, I’ve seen vendors open to communicate, putting the customer on the first place, even if that meant living the ego behind. Where there’s a will there’s a way.


The Solution Space

    In IT there are seldom general recipes that always lead to success, and whether a solution works or not depends on a serious of factors – environment, skills, communication, human behavior and quite often chance, the chance of doing the right thing at the right time. However, the recipe can be used as a starting point, eventually to define the best scenario, what will happen when everything goes well. At the opposite side there is the worst scenario, what will happen when everything goes south. These two opposite scenarios are in general the frame in which a solution can be defined.

    Within this frame one can add several other reference points or paths, and these are made of the experience of people handling and experiencing similar situations – what worked, what didn’t, what could work, what are the challenges, and so on. In general, people’s experience and knowledge prove to be a good estimator in decision making, and even if relative, it proves some insight into the problem at hand.


    Let’s reconsider the parents put in the situation of dealing with children fighting for the same toy, though from the perspective of all the toys available to play with. There are several options available: the kids could take (supervised) turn in playing with the toys, fact that could be a win-win situation if they are willing to cooperate. One can take the toys (temporarily) away, though this could lead to other issues. One can reaffirm who’s the owner of each toy, each kid being allowed to play only with his toy. One could buy a second toy, and thus brake the bank even if this will not make the issue entirely go away. Probably there are other solutions inventive parents might find.

    Similarly, in the above stated problem, one option, and maybe the best, is having the vendors share ownership for the DW by finding a way to work together. Defining the ownership for each tool can alleviate some of the problems but not all, same as building a second DW. We can probably all agree that taking the tools away is not the right thing to do, and even if it’s a solution, it doesn’t support the purpose.


Sharing Ownership

    Complex IT environments like the one of a DW depend on vendors’ capability of working together in reaching the same goal, even if in play are different interests. This presumes the disposition of the parties in relinquishing some control, sharing responsibilities. Unfortunately, not all vendors are willing to do that. That’s the point where imaginary obstacles are built, is where effort needs to be put to eliminate such obstacles.

    When working together, often one of the parties must play the coordinator role. In theory, this role can be played by any of the vendors, and the roles can even change after case. Another approach is when the coordinator role can be taken also by a person or a board from the customer side. In case of a data warehouse it can be an IT professional, a Project Manager or a BI Competency Center (BICC) . This would allow to smoothly coordinate the activities, as well to mediate the communication and other types of challenges faced.


    How will ownership sharing work? Let’s suppose vendor A wants to change something in the infrastructure. The change is first formulated, shortly reviewed, and approved by both vendors and customer, and will then be implemented and documented by vendor A as needed. Vendor B is involved in the process by validating the concept and reviewing the documentation, its involvement being minimized. There can be still some delays in the process, though the overhead is somehow minimized. There will be also scenarios in which vendor B needs only to be informed that a change has occurred, or sometimes is enough if a change was properly documented.

    This approach involves also a greater need for documentation, versioning, established processes, their role being to facilitate the communication and track the changes occurred in the environment.


Splitting Ownership

    Splitting the ownership involves setting clear boundaries and responsibilities within which each vendor can perform the work. One is forced thus to draw a line and say which components or activities belong to each vendor. 

    The architecture of existing solutions makes it sometimes hard to split the ownership when the architecture was not designed for it. A solution would be to redesign the whole architecture, though even then might not be possible to draw a clear line when there are grey areas. One needs eventually to consider the advantages and disadvantages and decide to which vendor the responsibility suits best.


    For example, in the context of a DW security can be enforced via schemas within same or different databases, though there are also objects (e.g. tables with basis data) used by multiple applications. One of the vendors (vendor A) will get the ownership of the objects, thus when vendor B needs a change to those table, it must require the change to vendor A. Once the changes are done the vendor B needs to validate the changes, and if there are problems further communication occurs. Per total this approach will take more time than if the vendor B would have done alone the changes. However, it works even if it comes with some challenges.

    There’s also the possibility to give vendor B temporary permissions to do the changes, fact that will shorten the effort needed. The vendor A will still be in charge, and will have to prove the documentation, and do eventually some validation as well.


Separating Ownership

    Giving each vendor its own playground is a costly solution, though it can be the only solution in certain scenarios. For example, when an architecture is supposed to replace (in time) another, or when the existing architecture has certain limitations. In the context of a DW this involves duplicating the data loads, the data themselves, as well logic, eventually processes, and so on.

    Pushing this just to solve a communication problem is the wrong thing to do. What happens if a third or a fourth vendor joins the party? Would it be for each vendor a new environment created? Hopefully, not…

    On the other side, there are also vendors that don’t want to relinquish the ownership, and will play their cards not to do it. The overhead of dealing with such issues may surpass in extremis the costs of having a second environment. In the end the final decision has the customer.


Hybrid Approach


    A hybrid between sharing and splitting ownership can prove to give the best from the two scenarios. It’s useful and even recommended to define the boundaries of work for each vendor, following to share ownership on the areas where there’s an intersection of concerns, the grey areas. For sensitive areas there could be some restrictions in cooperation.

    A hybrid solution can involve as well splitting some parts of the architecture, though the performance and security are mainly the driving factors.


Conclusion

   I wanted with this post to make the reader question some of the hot-brained decisions made when two or more vendors are involved in the supporting an architecture. Even if the problem is put within the context of a DW it’s occurrence extends far beyond this context. We are enablers and problem solvers. Instead of avoiding challenges we should better make sure that we’re removing or minimizing the risks. 

20 December 2017

🗃️Data Management: Versioning (Just the Quotes)

"There are two different methods to detect and collect changes: data versioning, which evaluates columns that identify rows that have changed (e.g., last-update-timestamp columns, version-number columns, status-indicator columns), or by reading logs that document the changes and enable them to be replicated in secondary systems."  (DAMA International, "DAMA-DMBOK: Data Management Body of Knowledge" 2nd Ed., 2017)

"Moving your code to modules, checking it into version control, and versioning your data will help to create reproducible models. If you are building an ML model for an enterprise, or you are building a model for your start-up, knowing which model and which version is deployed and used in your service is essential. This is relevant for auditing, debugging, or resolving customer inquiries regarding service predictions." (Christoph Körner and Kaijisse Waaijer, "Mastering Azure Machine Learning". 2020)

"Versioning is a critical feature, because understanding the history of a master data record is vital to maintaining its quality and accuracy over time." (Cédrine MADERA, "Master Data and Reference Data in Data Lake Ecosystems" [in "Data Lake" ed. by Anne Laurent et al, 2020])

"Versioning of data is essential for ML systems as it helps us to keep track of which data was used for a particular version of code to generate a model. Versioning data can enable reproducing models and compliance with business needs and law. We can always backtrack and see the reason for certain actions taken by the ML system. Similarly, versioning of models (artifacts) is important for tracking which version of a model has generated certain results or actions for the ML system. We can also track or log parameters used for training a certain version of the model. This way, we can enable end-to-end traceability for model artifacts, data, and code. Version control for code, data, and models can enhance an ML system with great transparency and efficiency for the people developing and maintaining it." (Emmanuel Raj, "Engineering MLOps Rapidly build, test, and manage production-ready machine learning life cycles at scale", 2021)

"DevOps and Continuous Integration/Continuous Deployment (CI/CD) are vital to any software project that is developed by more than one developer and needs to uphold quality standards. A central code repository that offers versioning, branching, and merging for collaborative development and approval workflows and documentation features is the minimum requirement here." (Patrik Borosch, "Cloud Scale Analytics with Azure Data Services: Build modern data warehouses on Microsoft Azure", 2021)

"Automated data orchestration is a key DataOps principle. An example of orchestration can take ETL jobs and a Python script to ingest and transform data based on a specific sequence from different source systems. It can handle the versioning of data to avoid breaking existing data consumption pipelines already in place." (Sonia Mezzetta, "Principles of Data Fabric: Become a data-driven organization by implementing Data Fabric solutions efficiently", 2023)

"Data products should remain stable and be decoupled from the operational/transactional applications. This requires a mechanism for detecting schema drift, and avoiding disruptive changes. It also requires versioning and, in some cases, independent pipelines to run in parallel, giving your data consumers time to migrate from one version to another." (Piethein Strengholt, "Data Management at Scale: Modern Data Architecture with Data Mesh and Data Fabric" 2nd Ed., 2023)

"When performing experiments, the first step is to determine what compute infrastructure and environment you need.16 A general best practice is to start fresh, using a clean development environment. Keep track of everything you do in each experiment, versioning and capturing all your inputs and outputs to ensure reproducibility. Pay close attention to all data engineering activities. Some of these may be generic steps and will also apply for other use cases. Finally, you’ll need to determine the implementation integration pattern to use for your project in the production environment." (Piethein Strengholt, "Data Management at Scale: Modern Data Architecture with Data Mesh and Data Fabric" 2nd Ed., 2023)

02 January 2017

#️⃣Software Engineering: Programming (Part VII: Documentation - Lessons Learned)

Software Engineering
Software Engineering Series

Introduction


“Documentation is a love letter that you write to your future self.”
Damian Conway

    For programmers as well for other professionals who write code, documentation might seem a waste of time, an effort few are willing to make. On the other side documenting important facts can save time sometimes and provide a useful base for building own and others’ knowledge. I found sometimes on the hard way what I needed to document. With the hope that others will benefit from my experience, here are my lessons learned:

 

Lesson #1: Document your worked tasks


“The more transparent the writing, the more visible the poetry.”
Gabriel Garcia Marquez


   Personally I like to keep a list with what I worked on a daily basis – typically nothing more than 3-5 words description about the task I worked on, who requested it, and eventually the corresponding project, CR or ticket. I’m doing it because it makes easier to track my work over time, especially when I have to retrieve some piece of information that is somewhere else in detail documented.

   Within the same list one can track also the effective time worked on a task, though I find it sometimes difficult, especially when one works on several tasks simultaneously. In theory this can be used to estimate further similar work. One can use also a categorization distinguishing for example between the various types of work: design, development, maintenance, testing, etc. This approach offers finer granularity, especially in estimations, though more work is needed in tracking the time accurately. Therefore track the information that worth tracking, as long there is value in it.

   Documenting tasks offers not only easier retrieval and base for accurate estimations, but also visibility into my work, for me as well, if necessary, for others. In addition it can be a useful sensemaking tool (into my work) over time.

Lesson #2: Document your code


“Always code as if the guy who ends up maintaining your code will be
a violent psychopath who knows where you live.”
Damian Conway

    There are split opinions over the need to document the code. There are people who advise against it, and probably one of most frequent reasons is rooted in Agile methodology. I have to stress that Agile values “working software over comprehensive documentation”, fact that doesn’t imply the total absence of documentation. There are also other reasons frequently advanced, like “there’s no need to document something that’s already self-explanatory “(like good code should be), “no time for it”, etc. Probably in each statement there is some grain of truth, especially when considering the fact that in software engineering there are so many requirements for documentation (see e.g. ISO/IEC 26513:2009).

   Without diving too deep in the subject, document what worth documenting, however this need to be regarded from a broader perspective, as might be other people who need to review, modify and manage your code.

    Documenting code doesn’t resume only to the code being part of a “deliverables”, but also to intermediary code written for testing or other activities. Personally I find it useful to save within the same fill all the scripts developed within same day. When some piece of code has a “definitive” character then I save it individually for reuse or faster retrieval, typically with a meaningful name that facilitates file’s retrieval. With the code it helps maybe to provide also some metadata like: a short description and purpose (who and when requested it).

   Code versioning can be used as a tool in facilitating the process, though not everything worth versioning.

 

Lesson #3: Document all issues as well the steps used for troubleshooting and fixing


“It’s not an adventure until something goes wrong.”
Yvon Chouinard

   Independently of the types of errors occurring while developing or troubleshooting code, one of the common characteristics is that the errors can have a recurring character. Therefore I found it useful to document all the errors I got in terms of screenshots, ways to fix them (including workarounds) and, sometimes also the steps followed in order to troubleshoot the problem.

   Considering that the issues are rooted in programming fallacies or undocumented issues, there is almost always something to learn from own as well from others’ errors. In fact, that was the reasons why I started the “SQL Troubles” blog – as a way to document some of the issues I met, to provide others some help, and why not, to get some feedback.

 

Lesson #4: Document software installations and changes in configurations


   At least for me this lesson is rooted in the fact that years back quite often release candidate as well final software was not that easy to install, having to deal with various installation errors rooted in OS or components incompatibilities, invalid/not set permissions, or unexpected presumptions made by the vendor (e.g. default settings). Over the years installation became smoother, though such issues are still occurring. Documenting the installation in terms of screenshots with the setup settings allows repeating the steps later. It can also provide a base for further troubleshooting when the configuration within the software changed or as evidence when something goes wrong.


   Talking about changes occurring in the environment, not often I found myself troubleshooting something that stopped working, following to discover that something changed in the environment. It’s useful to document the changes occurring in an environment, importance stressed also in “Configuration Management” section of ITIL® (Information Technology Infrastructure Library).

 

Lesson #5: Document your processes


“Verba volant, scripta manent.” Latin proverb
"Spoken words fly away, written words remain."

    In process-oriented organizations one has the expectation that the processes are documented. One can find that it’s not always the case, some organization relying on the common or individual knowledge about the various processes. Or it might happen that the processes aren’t always documented to the level of detail needed. What one can do is to document the processes from his perspective, to the level of detail needed.

 

Lesson #6: Document your presumptions


“Presumption first blinds a man, then sets him a running.”
Benjamin Franklin

   Probably this is more a Project Management related topic, though I find it useful also when coding: define upfront your presumptions/expectations – where should libraries lie, the type and format of content, files’ structure, output, and so on. Even if a piece of software is expected to be a black-box with input and outputs, at least the input, output and expectations about the environment need to be specified upfront.

 

Lesson #7: Document your learning sources


“Intelligence is not the ability to store information, but to know where to find it.”
Albert Einstein

    Computer specialists are heavily dependent on internet to keep up with the advances in the field, best practices, methodologies, techniques, myths, and other knowledge. Even if one learns something, over time the degree of retention varies, and it can decrease significantly if it wasn’t used for a long time. Nowadays with a quick search on internet one can find (almost) everything, though the content available varies in quality and coverage, and it might be difficult to find the same piece of information. Therefore, independently of the type of source used for learning, I found it useful to document also the information sources.

 

Lesson #8: Document the known as well the unknown

 

“A genius without a roadmap will get lost in any country but an average person
with a roadmap will find their way to any destination.”
Brian Tracy

   Over the years I found it useful to map and structure the learned content for further review, sometimes considering only key information about the subject like definitions, applicability, limitations, or best practices, while other times I provided also a level of depth that allow me and others to memorize and understand the topic. As part of the process I attempted to keep the  copyright attributions, just in case I need to refer to the source later. Together with what I learned I considered also the subjects that I still have to learn and review for further understanding. This provides a good way to map what I known as well what isn’t know. One can use for this a rich text editor or knowledge mapping tools like mind mapping or concept mapping.


Conclusion


    Documentation doesn’t resume only to pieces of code or software but also to knowledge one acquires, its sources, what it takes to troubleshoot the various types of issues, and the work performed on a daily basis. Documenting all these areas of focus should be done based on the principle: “document everything that worth documenting”.



Related Posts Plugin for WordPress, Blogger...

About Me

My photo
Koeln, NRW, Germany
IT Professional with more than 25 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.