
28 June 2020

Strategic Management: Simplicity II (A System's View)

Strategic Management

Whenever one talks in IT about software and hardware components interacting with each other, one talks about a composite referred to as a system. Even if the term Information System (IS) is related to it, a system is defined as a set of interrelated and interconnected components that can be considered together for specific purposes or simple convenience.

A component can be a piece of software or hardware, as well as a person or a group, if we extend the definition. The consideration of people becomes relevant especially in the context of ecologies, in which systems are placed in a broader context that considers people’s interaction with them, as this gives rise to behavior that impacts the system’s functioning.

Within a system each part has a role or function determined in respect to the whole as well as to the other parts. The role or function of a component is typically fixed and predefined, though there are also exceptions, especially when the scope of a component is enlarged, respectively reduced to the degree that the component can be removed or ignored. What one considers or not as part of a system defines the system’s boundaries; it’s what distinguishes it from other systems within the environment(s) considered.

The interaction between the components comes down to the exchange, transmission and processing of data found in different aggregations, ranging from signals to complex data structures. If in non-IT-based systems the changes are determined by the inflow, respectively outflow of energy, in IT the flow is considered in terms of data in its various aggregations (information, knowledge). The data flow (also information flow) represents the ‘fluid’ that nourishes a system’s ‘organism’.

One can grasp the complexity the moment one attempts to describe a system in terms of components, respectively the dependencies existing between them in terms of data and processes. If in nature the processes are extrapolated, in IT they are predefined (even if the knowledge about them is not available). In addition, the less knowledge one has about the infrastructure, the higher the apparent complexity. Even if the system is not necessarily complex, the lack of knowledge and certainty about it makes it appear so. The more one needs to dig for information to reach an acceptable level of knowledge and logical depth, the more time is needed for designing a solution.

Saint Exupéry’s definition of simplicity applies from a system’s functional point of view, though it doesn’t address the relative knowledge about the system, which often is implicit (in people’s heads). People have only fragmented knowledge about the system, which makes it difficult to create the whole picture. It’s typically the role of system or process operational manuals, respectively of data descriptions, to make that knowledge explicit, establishing also a foundation for common knowledge and further communication and understanding.

Between the apparent (perceived) and the real complexity of a system there’s an important gap that needs to be addressed if one wants to manage the systems adequately, respectively to simplify them. Often simplification happens when components or whole systems are replaced, consolidated or migrated, a mix of these approaches existing as well. Simplifications at data level (aka data harmonization) or process level (aka process optimization and redesign) can have an important impact, being inherent to the good (optimal) functioning of systems.

Whether these changes occur in a big bang or in gradual iterations is a question of available resources and organizational capabilities, including the ability to handle such projects, respectively of the impact, opportunities and risks associated with such endeavors. Beyond this, it’s important to regard the problems from a systemic and systematic point of view, in which ecology’s role is important.


Written: Jun-2020, Last Reviewed: Mar-2024

24 June 2020

Strategic Management: Simplicity I (Simple, but not that Simple)

Strategic Management
Strategic Management Series

Simplicity of design has been for centuries the holy grail of architects, while software designers seem to situate themselves in opposition to the trend, as they aim at using a mix of technologies that usually increases the architecture’s complexity (sometimes the more, the newer and fancier, the better). Unfortunately, despite the implied but not necessarily reachable potential, each component added to an information system or infrastructure has the potential of increasing the overall complexity by a factor proportional to the degree of interactions it creates, respectively to the number of issues it creates or allows to propagate through these interactions.

Conversely, one often talks about simplicity in IT without stating what is intended by it, and it can mean many things. Quite often the aim is packed into the ‘keep it simple, stupid’ (aka KISS) mantra, a modern and pejorative alternative to Occam’s razor. KISS became a principle in software architecture design, and it can mean that a simple solution works better than a complex one, or that pursuing something in the simplest manner possible is usually better. The nuances are wide enough to cover a whole spectrum of solutions, arriving at statements that the simplest choice is the most appropriate one to make, which is not necessarily true in IT, where complexity finds itself at home.

Starting with the considerable number of technologies coexisting in integrations and ending with the exceptions existing in processes or the quality of data, things are almost never as simple as one may wish. An IT infrastructure’s complexity depends on the number of existing components, on whether they come from different generations or from different vendors, on whether they are deployed on different operating systems or are supported by different service providers, on the number of customizations made, on the degree of overlapping of the data and the integrations needed to keep the data in sync, respectively on the differences existing in data models, quality and use. In general, the more variance, randomness and challenges one has, the higher the overall complexity.

Paraphrasing Saint Exupéry, in IT simplicity is reached when there is no longer anything to add or to take away, or, in Hans Hofmann’s words, ‘the ability to simplify means to eliminate the unnecessary so that the necessary may speak’. This refers to the features, what a piece of software can do, respectively to the functionality, how a certain outcome is reached, which end up being packed in various logical aggregations (function point, functional requirement, story, epic, model, product, etc.) or physical aggregations (classes, components, packages, services, models, etc.). These are the levels at which simplicity needs to be addressed adequately.

To make something simple one must be able either to design a solution up to the detail that there’s nothing left to add or remove, or to start with something and remove things until simplicity is reached. Both approaches involve a considerable effort, time and multiple iterations, however the first approach can easily become utopian, as some architectures are so complex that sooner or later the second approach comes into play. Therefore, one needs in general to focus on what seems an optimal solution and optimize it continuously in further iterations. Aiming for perfection from the beginning, or even later in the improvement process, is a foolhardy wish.

Even if simplicity is hard to achieve, one can still talk about the elegance of a solution, scenarios in which the various components fit together like the pieces of a puzzle, or about robustness, reliability, correctness, maintainability, (re)usability, or learnability. These latter characteristics are known in Software Engineering as (software) quality attributes.

01 February 2020

Application Architecture: Concept Documents (The Good, the Bad and the Ugly)

Software Engineering

A concept document (simply a concept) is a document that describes at a high level the set of necessary steps and their implications in order to achieve a desired result, typically the subject of a project. In other words, it describes how something can be done or achieved, respectively how a problem can be solved.

The Good: The main aim of the document is to present all the important aspects and to assure that the idea is worthy of consideration, that the steps considered provide a good basis for further work, respectively to provide a good understanding for the various parties involved. Therefore, concepts are used as a basis for the sign-off, respectively for the implementation of software and hardware solutions.

A concept provides information about the context, design, architecture, security, usage, purpose and/or objectives of the future solution, together with the set of assumptions, constraints and implications. A concept is not necessarily a recipe, because it attempts to provide a solution for a given problem or situation that needs one. Even if it bears many similarities in content and structure, a concept is also not a strategy, because a strategy offers an interpretation of the problem, and not a business case either, because the latter focuses mainly on the financial aspects.

A concept thus proves to be a good basis for implementing the described solution, being often an important enabler. On the other side, a written concept is not always necessary, even if the conceptualization must exist in the implementers’ heads.

The Bad: For these reasons projects often require the elaboration of a concept before further work can be attempted. To write such a document one needs to understand the problem/situation and be capable of sketching a solution in which the various steps or components fit together like the pieces of a puzzle. The problem is that the more complex the problem to be solved, the fuzzier the view and understanding of the various pieces becomes, respectively the more challenging it becomes to fit the pieces together. In certain situations, it becomes almost impossible for a single person to understand and handle all the pieces. Solving the puzzle becomes a collective effort in which the complexity is broken into manageable parts, to the detriment of other aspects.

Writing a concept is a time-consuming task. The more accuracy and detail are needed, the longer it takes to write and review the document, time that’s usually stolen from other project phases, especially when the phases are considered as sequential. Writing a concept that covers only 80% of the facts takes about 20% of the effort needed for a ‘perfect’ concept, while the remaining 20% of the facts take 80% of the effort, as the latter involve multiple iterations. In extremis, aiming for perfection will make one start the implementation late or not start at all. It’s an incomprehensible pedantry with an important impact on the project’s timeline and quality, in the hope of a quality increase that is sometimes even illusory.

The Ugly: The concept-based approach is taken to the extreme in ERP implementations, where a concept needs to be written for each process or business area, often carrying fancy names – solution design document, technical design document, business process document, etc. Independently of how it is called, the purpose is to describe how the solution is implemented. The problem is that the conceptualization phase tends to take much longer than planned, given the dependencies between the various business areas in terms of functionality and activities. The complexity can become overwhelming, with an important impact on the project’s budget, time and quality.

29 July 2019

IT: Best Practices (Definitions)

"A preferred and repeatable action or set of actions completed to fulfill a specific requirement or set of requirements during the phases within a product-development process." (Clyde M Creveling, "Six Sigma for Technical Processes: An Overview for R Executives, Technical Leaders, and Engineering Managers", 2006)

"A process or method that is generally recognized to produce superior results. The application of these should result in a positive, measurable change." (Tilak Mitra et al, "SOA Governance", 2008)

"A technique or methodology that, through past experience and research, has proven to reliably lead to a desired result. A commitment to using the best practices in any field (for example, in the domain of IT Architecture) ensures leveraging past experience and all of the knowledge and technology at one’s disposal to ensure success." (Allen Dreibelbis et al, "Enterprise Master Data Management", 2008)

"An effective way of doing something. It can relate to anything from writing program code to IT governance." (Judith Hurwitz et al, "Service Oriented Architecture For Dummies" 2nd Ed., 2009)

"A best practice is commonly understood to be a well-proven, repeatable, and established technique, method, tool, process, or activity that is more certain in delivering the desired results. This indicates that a best practice typically has been used by a large number of people or organizations and/or over a long time, with significant results that are clearly superior over other practices. Knowledge patterns can be used to formalize the description of a best practice." (Jörg Rech et al, "Knowledge Patterns" [in "Encyclopedia of Knowledge Management" 2nd Ed.], 2011)

"A specific method that improves the performance of a team or an organization and can be replicated or adapted elsewhere. Best practices often take the form of guidelines, principles, or ideas that are endorsed by a person or governing body that attests to the viability of the best practice." (Gina Abudi & Brandon Toropov, "The Complete Idiot's Guide to Best Practices for Small Business", 2011)

"A technique, method, process, discipline, incentive, or reward generally considered to be more effective at delivering a particular outcome than by other means." (Craig S Mullins, "Database Administration", 2012)

"In general, Best Practices refer to the methods, currently recognized within a given industry or discipline, to achieve a stated goal or objective. In the OPM3 context, Best Practices are achieved when an organization demonstrates consistent organizational project management processes evidenced by successful outcomes." (Project Management Institute, "Organizational Project Management Maturity Model (OPM3)" 3rd Ed, 2013)

"An effective way of doing something. It can relate to anything from writing program code to IT governance." (Marcia Kaufman et al, "Big Data For Dummies", 2013)

"Those methods, processes, or procedures that have been proven to be the most effective, based on real-world experience and measured results." (Robert F Smallwood, "Information Governance: Concepts, Strategies, and Best Practices", 2014)

"Best practices are defined as commercial or professional procedures that are accepted or prescribed as being effective most of the time. It can also be considered a heuristic, in that is a rule of thumb that generally succeeds but is not guaranteed to always work in every instance." (Michael Winburn & Aaron Wheeler, "Cloud Storage Security", 2015)

"A 'benchmarking' approach where organisations determine who the leader in a particular practice is and then copy that approach. Useful for achieving efficiencies but may diminish differentiation if not used with caution at the strategic level." (Duncan Angwin & Stephen Cummings, "The Strategy Pathfinder" 3rd Ed., 2017)

"A proven activity or process that has been successfully used by multiple enterprises." (ISACA) 

"A superior method or innovative practice that contributes to the improved performance of an organization, usually recognized as best by other peer organizations." (American Society for Quality)

11 July 2019

IT: Cloud Computing (Definitions)

"The service delivery of any IT resource as a networked resource." (David G Hill, "Data Protection: Governance, Risk Management, and Compliance", 2009)

"A technology where the data and the application are stored remotely and made available to the user over the Internet on demand." (Janice M Roehl-Anderson, "IT Best Practices for Financial Managers", 2010)

"A business model where programs, data storage, collaboration services, and other key business tools are stored on a centralized server that users access remotely, often through a browser." (Rod Stephens, "Start Here! Fundamentals of Microsoft .NET Programming", 2011)

"Technology that is rented or leased on a regular, or as-needed basis." (Linda Volonino & Efraim Turban, "Information Technology for Management" 8th Ed, 2011)

"Using programs and data stored on servers connected to computers via the Internet rather than storing software and data on individual computers." (Gina Abudi & Brandon Toropov, "The Complete Idiot's Guide to Best Practices for Small Business", 2011)

"The delivery of computing as a service. Cloud computing applications rely on a network (typically the Internet) to provide users with shared resources, software, and data." (Craig S Mullins, "Database Administration", 2012)

"Using Internet-based resources (e.g., applications, servers, etc.) as opposed to buying and installing in-house." (Bill Holtsnider & Brian D Jaffe, "IT Manager's Handbook, 3rd Ed", 2012)

"A business strategy where part or all of an organization’s information processing and storage is done by online service providers." (Kenneth A Shaw, "Integrated Management of Processes and Information", 2013)

"A computing model that makes IT resources such as servers, middleware, and applications available as services to business organizations in a self-service manner." (Marcia Kaufman et al, "Big Data For Dummies", 2013)

"Computing resources provided over the Internet using a combination of virtual machines (VMs), virtual storage, and virtual networks." (Mark Rhodes-Ousley, "Information Security: The Complete Reference, Second Edition, 2nd Ed.", 2013)

"A model for network access in which large, scalable resources are provided via the Internet as a shared service to requesting users. Access, computing, and storage services can be obtained by users without the need to understand or control the location and configuration of the system. Users consume resources as a service, and pay only for the resources that are used." (Jim Davis & Aiman Zeid, "Business Transformation: A Roadmap for Maximizing Organizational Insights", 2014)

"The delivery of software and other computer resources as a service over the Internet, rather than as a stand-alone product." (Manish Agrawal, "Information Security and IT Risk Management", 2014)

"The provision of computational resources on demand via a network. Cloud computing can be compared to the supply of electricity and gas or the provision of telephone, television, and postal services. All of these services are presented to users in a simple way that is easy to understand without users' needing to know how the services are provided. This simplified view is called an abstraction. Similarly, cloud computing offers computer application developers and users an abstract view of services, which simplifies and ignores much of the details and inner workings. A provider's offering of abstracted Internet services is often called the cloud." (Robert F Smallwood, "Information Governance: Concepts, Strategies, and Best Practices", 2014)

"A computational paradigm that aims at supporting large-scale, high-performance computing in distributed environments via innovative metaphors such as resource virtualization and de-location." (Alfredo Cuzzocrea & Mohamed M Gaber, "Data Science and Distributed Intelligence", 2015)

"A computing model that makes IT resources such as servers, middleware, and applications available as services to business organizations in a self-service manner." (Judith S Hurwitz, "Cognitive Computing and Big Data Analytics", 2015)

"A delivery model for information technology resources and services that uses the Internet to provide immediately scalable and rapidly provisioned resources as services using a subscription or utility-based fee structure." (James R Kalyvas & Michael R Overly, "Big Data: A Businessand Legal Guide", 2015)

"A service that provides storage space and other resources on the Internet" (Nell Dale & John Lewis, "Computer Science Illuminated, 6th Ed.", 2015)

"Delivering hosted services over the Internet, which includes providing infrastructures, platforms, and software as services." (Mike Harwood, "Internet Security: How to Defend Against Attackers on the Web 2nd Ed.", 2015)

"The delivery of computer processing capabilities as a service rather than as a product, whereby shared resources, software, and information are provided to end users as a utility. Offerings are usually bundled as an infrastructure, platform, or software." (Adam Gordon, "Official (ISC)2 Guide to the CISSP CBK" 4th Ed., 2015)

"A general term for anything that involves delivering hosted services over the Internet. These services are broadly divided into: Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS) and Software-as-a-Service (SaaS), and Analytics-as-a-Service (AaaS)."  (Suren Behari, "Data Science and Big Data Analytics in Financial Services: A Case Study", 2016)

"A type of Internet-based technology in which different services (such as servers, storage, and applications) are delivered to an organization’s or an individual’s computers and devices through the Internet." (Jonathan Ferrar et al, "The Power of People: Learn How Successful Organizations Use Workforce Analytics To Improve Business Performance", 2017)

"A form of distributed computing whereby many computers and applications share the same resources to work together, often across geographically separated areas, to provide a coherent service." (O Sami Saydjari, "Engineering Trustworthy Systems: Get Cybersecurity Design Right the First Time", 2018)

"Cloud computing is a general term for the delivery of hosted services over the Internet. Cloud computing enables companies to consume compute resources as a utility - just like electricity - rather than having to build and maintain computing infrastructures in-house." (Thomas Ochs & Ute A Riemann, "IT Strategy Follows Digitalization", 2018)

"Cloud computing refers to the provision of computational resources on demand via a network. Cloud computing can be compared to the supply of a utility like electricity, water, or gas, or the provision of telephone or television services. All of these services are presented to the users in a simple way that is easy to understand without the users’ needing to know how the services are provided. This simplified view is called an abstraction. Similarly, cloud computing offers computer application developers and users an abstract view of services, which simplifies and ignores many of the details and inner workings. A provider’s offering of abstracted Internet services is often called The Cloud." (Robert F Smallwood, "Information Governance for Healthcare Professionals", 2018)

"The delivery of computing services and resources such as the servers, storage, databases, networking, software, and analytic through the internet." (Babangida Zubairu, "Security Risks of Biomedical Data Processing in Cloud Computing Environment", 2018)

"The use of shared remote computing devices for the purpose of providing improved efficiencies, performance, reliability, scalability, and security." (Shon Harris & Fernando Maymi, "CISSP All-in-One Exam Guide" 8th Ed., 2018)

"A computing model that makes information technology resources such as servers, middleware, and applications available over the internet as services to business organizations in a self-service manner." (K Hariharanath, "BIG Data: An Enabler in Developing Business Models in Cloud Computing Environments", 2019)

"Cloud computing refers to the practice of using a network of remote servers, hosted on the Internet to manage, store and process data instead of using a local server or a personal computer." (Jurij Urbančič et al, "Expansion of Technology Utilization Through Tourism 4.0 in Slovenia", 2020)

"A standardized technology delivery capability (services, software, or infrastructure) delivered via internet-standard technologies in a pay-per-use, self-service way." (Forrester)

"Cloud computing is a style of computing in which scalable and elastic IT-enabled capabilities are delivered as a service using internet technologies." (Gartner)

15 May 2019

Software Engineering: Rapid Prototyping (Part I: An Introduction)

Software Engineering
Software Engineering Series

Rapid (software) prototyping (RSP) is a group of techniques applied in Software Engineering to quickly build a prototype (aka mockup, wireframe) to verify the technical or factual realization and feasibility of an application architecture, process or business model. A similar notion is the one of Proof-of-Concept (PoC), which attempts to demonstrate, by building a prototype, starting an experiment or a pilot project, that a technical concept, business proposal or theory has practical potential. In other words, in Software Engineering RSP encompasses the techniques by which a PoC is led.

In industries that deal with physical products a prototype is typically a small-scale object made from inexpensive material that resembles the final product to a certain degree, some characteristics, details or features being completely ignored (e.g. the inner design, some components, the finishing, etc.). Building several prototypes is much easier and cheaper than building the end product, allowing one to play with a concept or idea until it gets close to the final product. Moreover, this approach reduces the risk of ending up with a product nobody wants.

A similar approach and reasoning are used in Software Engineering as well. Building a prototype allows focusing at the beginning on the essential characteristics or aspects of the application, process or (business) model under consideration. Depending on the case, one can focus on the user interface (UI), database access, integration mechanism or any other feature that involves a challenge. As in the case of the UI, one can build several prototypes that demonstrate different designs or architectures. The initial prototype can go through a series of transformations until it reaches the desired form, then more functionality is integrated and the end product refined gradually. This iterative and incremental approach is known as rapid evolutionary prototyping.

A prototype is useful especially when dealing with uncertainty, e.g. when adopting (new) technologies or methodologies, when mixing technologies within an architecture, when the details of the implementation are not known, when exploring an idea, when the requirements are expected to change often, etc. Building a prototype rapidly allows validating the requirements, responding agilely to change, getting customers’ feedback and sign-off as early as possible, showing them what’s possible and how the future application can look, and all this without investing too much effort. It’s easier to change a design or an architecture in the concept and design phases than later.

In BI, prototyping usually comes down to building queries to identify the source of the data, reengineer the logic from the business application, and prove whether the logic is technically feasible, feasibility being translated into robustness, performance and flexibility. In projects that have a broader scope one can attempt building the needed infrastructure for several reports, to make sure that the main requirements are met. Similarly, one can use prototyping to build a data warehouse or a data migration layer. Thus, one can build all or most of the logic for one or two entities, resolving the challenges for them, and once the challenges are solved one can go ahead and integrate the other entities gradually.
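As a minimal illustration (the connection, table and column names below are mere placeholders, and the Python harness is only a sketch, not part of any particular project), a prototype query can be wrapped in a small routine that reports row counts and runtime, giving an early indication of robustness and performance:

# Sketch of a BI prototyping step: run a candidate extraction query against a
# local extract of the source data and print basic feasibility indicators.
# sqlite3 stands in for any DB-API 2.0 driver (pyodbc, cx_Oracle, etc.).
import sqlite3
import time

def prototype_query(conn, sql, label):
    """Run a candidate query and print row count and elapsed time."""
    start = time.perf_counter()
    rows = conn.execute(sql).fetchall()
    elapsed = time.perf_counter() - start
    print(f"{label}: {len(rows)} rows in {elapsed:.2f}s")

conn = sqlite3.connect("erp_extract.db")  # hypothetical local extract of the source data
prototype_query(
    conn,
    """SELECT CustomerId, SUM(Amount) AS Revenue
       FROM SalesInvoices
       WHERE InvoiceDate >= '2019-01-01'
       GROUP BY CustomerId""",
    "Revenue per customer",
)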

Rapid prototyping can be used also in the implementation of a strategy or management system to prove the concepts behind it. One can thus start with a narrow focus and integrate more functions, processes and business segments gradually, in iterative and incremental steps, each step allowing to integrate the lessons learned, address the risks and opportunities, check the progress and change the direction as needed.

Rapid prototyping can prove to be a useful tool when given the chance to prove its benefits. Through its iterative and incremental approach it allows reaching the targets efficiently.

04 May 2019

Data Warehousing: Push vs. Pull

Data Warehousing

In data integrations, data migrations and data warehousing there is the need to move data between two or more systems. In the simplest scenario there are only two systems involved, a source and a target system, though there can be complex scenarios in which data from multiple sources need to be available in a common target system (as in the case of data warehouses/marts or data migrations), or data from one source (e.g. ERP systems) need to be available in other systems (e.g. Web shops, planning systems), or there can be complex cases in which there is a many-to-many relationship (e.g. data from two ERP systems are consolidated in other systems).  

The data can flow in one direction, from the source systems to the target systems (aka unidirectional flow), though there can be situations in which, once the data are modified in the target system, they need to flow back to the source system (aka bidirectional flow), as in the case of planning or product development systems. In complex scenarios the communication may occur multiple times within the same process until a final state is reached.

Independently of the number of systems and the type of communication involved, data need to flow between the systems as smoothly as possible, assuring that the data are consistent between the various systems and available when needed. The architectures responsible for moving data between the systems are based on two simple mechanisms – push vs. pull – or combinations of them.

With a push mechanism the data are pushed from the source system into the target system(s), the source system being responsible for the operation. Typically the push can happen as soon as an event occurs in the source system, an event that leads to or follows a change in the data. There can also be cases when it is preferred to push the data at regular points in time (e.g. hourly, daily), especially when the changes aren’t needed immediately. This latter scenario still allows making changes to the data in the source until they are sent to the other system(s). When the ability to make changes is critical, this can be controlled via specific business rules.

With a pull mechanism the data are pulled from the source system into the target system, the target system being responsible for the operation. This usually happens at regular points in time or on demand, however the target system has to check whether the data have been changed.

Hybrid scenarios may involve a middleware that sits between the systems, being responsible for pulling the data from the source systems and pushing them into the target systems. Another hybrid scenario is when the source system pushes the data to an intermediary repository, the target system(s) pulling the data on a need basis. The repository can reside on the source, on the target or in-between. A variation of it is when the source informs the target that a change happened and it’s up to the target to decide whether it needs the data or not.
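To make the distinction concrete, the short Python sketch below contrasts the two mechanisms; the classes and their methods are purely illustrative, not a real integration API:

# Sketch contrasting the push and pull mechanisms described above.
from datetime import datetime

class SourceSystem:
    def __init__(self):
        self.records = {}        # key -> record
        self.changed_at = {}     # key -> timestamp of the last change
        self.subscribers = []    # targets the source pushes to

    def update(self, key, record):
        self.records[key] = record
        self.changed_at[key] = datetime.now()
        for target in self.subscribers:   # push: the source drives the operation
            target.receive(key, record)

    def changes_after(self, timestamp):   # used by pull: "what changed since my last run?"
        return {k: self.records[k] for k, ts in self.changed_at.items() if ts > timestamp}

class TargetSystem:
    def __init__(self):
        self.records = {}
        self.last_pull = datetime.min

    def receive(self, key, record):       # push endpoint exposed to the source
        self.records[key] = record

    def pull_from(self, source):          # pull: the target drives the operation
        self.records.update(source.changes_after(self.last_pull))
        self.last_pull = datetime.now()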

The main differentiators between the various methods are the timeliness, completeness and consistency of the data. Timeliness refers to the urgency with which data need to be available in the target system(s), completeness refers to the degree to which the data are ready to be sent, while consistency refers to the degree to which the data from the source are consistent with the data from the target systems.

Based on their characteristics, integrations seem to favor push methods, while data migrations and data warehousing favor pull methods, though which method suits best depends entirely on the business needs under consideration.

29 April 2019

Data Management: Data Integration (Part I: From Disintegration to Integration)

Data Management

No matter how tight the integration between the various systems or processes, there will always be gaps that need to be addressed in one way or another. The problems are in general caused by design errors rooted in the complexity of the logic from the integration layer or from the systems integrated. The errors can range from missing or incorrect validation rules, mappings and parameters to data quality issues.

A unidirectional integration involves distributing data from one system (aka publisher) to one or more systems (aka subscribers), while in bidirectional integrations systems can act as both publishers and subscribers, thus resulting in complex data flows with multiple endpoints. In the simplest integrations the records flow one-to-one between systems, though more complex scenarios can involve logic based on business rules, mappings and other types of transformations. The challenge is to reflect the states as needed by the systems with minimal involvement from the users.

Typically it falls within the responsibilities of application/process owners or key users to make sure that the integration works smoothly. When the integration makes use of interface or staging tables, these can be used as a starting point for the troubleshooting, however even then the troubleshooting can be cumbersome and involve a considerable manual effort. When possible, the data can be exported manually from the various systems and matched in Excel or similar solutions. This often leads to personal or departmental solutions that are hard to maintain, control and support.

A better approach is to automate the process by importing the data from the integrated systems at regular points in time into the same database (much like in a data warehouse), modeling the entities and the needed logic in there, and reporting the differences. Even if this approach involves a small investment in the beginning and some optimization in logic or performance over time, it can become a useful tool for troubleshooting the differences. Such solutions can be used successfully in multiple integration scenarios (e.g. web shop or ERP integrations).
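A stripped-down sketch of the comparison logic (in Python, with hypothetical entity and field names) could look as follows – the same idea scales up to reports built on top of a shared database:

# Compare records from two integrated systems by key and report the differences.
def report_differences(source_rows, target_rows, key="OrderId", fields=("Status", "Amount")):
    """List rows missing on either side and fields whose values don't match."""
    source = {row[key]: row for row in source_rows}
    target = {row[key]: row for row in target_rows}

    for k in source.keys() - target.keys():
        print(f"{k}: missing in target system")
    for k in target.keys() - source.keys():
        print(f"{k}: missing in source system")
    for k in source.keys() & target.keys():
        for field in fields:
            if source[k].get(field) != target[k].get(field):
                print(f"{k}: {field} differs ({source[k].get(field)} vs {target[k].get(field)})")

# e.g. rows extracted from an ERP and from a web shop (sample data only)
erp = [{"OrderId": 1001, "Status": "Shipped", "Amount": 250.0}]
shop = [{"OrderId": 1001, "Status": "Open", "Amount": 250.0}]
report_differences(erp, shop)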

A set of reports, one for each entity, can help identify the differences between the systems. Starting from the reported differences, the users can identify, categorize and devise specific countermeasures for the various issues. The best time to have such a solution is shortly before or during UAT. This allows making sure that the integration layer really works and helps correct the issues while they still have a small impact on the systems. Some integration issues might even lead to a postponement of the Go-Live. The second best time is when the first important issues are found, as the issues can be used as support for a business case for implementing this type of solution.

In general, it’s recommended to fix the problems in the integration layer and use the reports only for troubleshooting and for assuring that the integration runs smoothly. There are however situations in which the integration problems can’t be fixed without creating more issues. It’s the case in which multiple systems are involved and integrated over an integration bus.

One extreme approach, not advisable though, is to build a second integration to correct the issues of the first. This solution might work in theory, however the risk of multiplying the issues is really high and the complexity of troubleshooting increases with the degree of dependency between the two integrations. It would be more advisable to rebuild the integration anew, though this approach too has its advantages and disadvantages.

The bottom line is that integration issues should be addressed while they are small and that an automated solution for comparing the data can help in the process.

09 June 2018

Data Migrations (DM): Guiding Principles

Data Migration
Data Migrations Series

Introduction

“An army of principles can penetrate where an army of soldiers cannot."
Thomas Paine

In life as well as in IT, principles serve as patterns of advice in the form of general or fundamental ideas, truths or values stated in a context-independent manner. They can be used as guidelines in understanding and modeling the reality, the world we live in. With the invasion of technologies in our lives, principles serve as a solid ground on which we can build castles – solutions for our problems. Each technology comes with its own set of principles that defines in general terms its usage. That’s why most IT books attempt to capture these sets of principles. Unfortunately, few technical writers manage to define meaningful principles and showcase their usage.

Many of the ideas considered as principles in papers on Data Migration (DM) are at best just practices, and some can be considered best/good practices. Just because something worked well in a previous migration doesn’t automatically mean that the idea behind the respective decision turns into a principle. Some of the advice given is just lessons learned in disguise. Principles, through their generality, apply to a broad range of cases, while practices are more activity-specific.

A DM through its nature finds its characteristics at the intersection of several areas – database-based architecture design, ETL workflows, data management, project management (PM) and services. From these areas one can pull a set of principles that can be used in building DM architectures.

Architecture Principles

“Architecture starts when you carefully put two bricks together.”
Ludwig Mies van der Rohe

There are several general principles that apply to the architecture of applications, independently of the technologies used or the industry, e.g. research first, keep it simple/small, start with the end in mind, model first, design to handle failure, secure by design (aka safety first), prototype, progress iteratively, focus on value, reuse (aka don't reinvent the wheel), test early, early feedback, refactor, govern, validate, document, right tool – right people, make it to last, make it sustainable, partition around limits, scale out, defensive coding, minimal intervention, use common sense, process orientation, follow the data, abstract, anticipate obsolescence, benchmark, single-responsibility, single dispatch, separation of concerns, right perspective.

To these one can add a range of application design characteristics that can be considered as principles as well: extensibility, modularity, adaptability, reusability, repeatability, performance, revocability, auditability, subject-orientation, traceability, robustness, locality, heterogeneity, consistency, atomicity, increased cohesion, reduced coupling, monitoring, usability, etc. There are also several principles that can be transported from problem solving into design – divide and conquer, prioritize, system’s approach, take inventory, and so on.

A DM’s architecture has much in common with a data warehouse’s, as it relies heavily on ETL tasks and data need to be stored for various purposes. Besides the principles of good database design, a few other principles apply: model (the domain) first, denormalize, design for performance, maintainability and security, validate continuously. From the ETL area the following principles can be considered: single point of processing, each step must have a purpose, minimize touch points, rest data for checkpoints, leverage existing knowledge, automate the steps, batch processing.
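As a rough sketch of the ‘rest data for checkpoints’ and ‘each step must have a purpose’ ideas (the step functions, transformations and file locations below are placeholders), each step of a migration pipeline can persist its output before the next one starts, so that a failed run can be resumed and audited:

# Each pipeline step rests its output as a checkpoint before the next step runs.
import json
from pathlib import Path

CHECKPOINT_DIR = Path("dm_checkpoints")
CHECKPOINT_DIR.mkdir(exist_ok=True)

def run_step(name, func, data):
    """Run one step and persist its result, so it can be inspected or resumed."""
    result = func(data)
    (CHECKPOINT_DIR / f"{name}.json").write_text(json.dumps(result))
    return result

def extract(_):
    return [{"id": 1, "name": " Alpha Ltd "}]

def cleanse(rows):
    return [{**row, "name": row["name"].strip()} for row in rows]

def transform(rows):
    return [{"CustomerId": row["id"], "CustomerName": row["name"]} for row in rows]

data = None
for step_name, step in [("extract", extract), ("cleanse", cleanse), ("transform", transform)]:
    data = run_step(step_name, step, data)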

In addition, considering their data-specific character, a DM can be regarded as one or several data products, though in contrast with typical data products, DMs typically have a limited purpose. From this area the following principles could be considered: build trust with transparency, blend in, visualize the complex.

Data Management Principles

Considering that a DM’s focus is an organization's data, some principles need to focus on the management and governance of data. Data Governance, together with Data Quality, Data Architecture, Metadata Management and Master Data Management, are functions of Data Management. The focus is on data, metadata and their lifecycle, on processes, ownership, roles and their responsibilities. With this in mind, several principles can be defined that are supposed to facilitate the functions of Data Management: manage data as an asset, manage the data lifecycle, the business owns the data, integration across the organization, make data/metadata accessible, transparent and auditable processes, one source of truth.

As part of a DM there is customer, employee and vendor information which falls under the General Data Protection Regulation (GDPR) EU 2016/679, which defines the legal framework for data protection and privacy for all individuals within the European Union (EU) and the European Economic Area (EEA), as well as for the export of personal data outside the EU and EEA. The regulation defines a set of principles that make up its backbone: fairness, lawfulness and transparency, purpose limitation, data minimization, accuracy, storage limitation, integrity and confidentiality, accountability [6].

Overseas, the US Federal Trade Commission (FTC) issued in 2012 a report recommending that organizations design and implement their own privacy programs based on a set of best practices. The report reaffirms the FTC’s focus on the Fair Information Practice Principles, which include notice/awareness, choice/consent, access/participation, integrity/security, enforcement/redress [6].


Project Management (PM) Principles

"Management is doing things right […]"
Peter Drucker

A DM, through its characteristics, is a project and, to increase the chances of success, it needs to be managed as a project. Managing the DM as a project is one of the most important principles to consider. The usage of a PM framework will further increase the chances of success, as long as the framework is adequate for the purpose and the organization's team is able to use it. PMI, Prince2 and Agile/Scrum/Kanban are probably the most used PM methodologies and they come with their own sets of principles.

In general, all or some of the PM principles apply independently of whether a framework is used alone or in combination with other PM methodologies: a single project manager, an informed and supportive management, a dedicated team of qualified people to do the work of the project, clearly defined goals addressing stakeholders’ priorities, an integrated plan and schedule, as well as a budget of costs and/or resources required [1].

On the other side, an agile approach could prove to be a better match for a DM given that requirements change a lot, frequent and continuous deliveries are needed, collaboration is necessary, and agile processes as well as self-organizing teams can facilitate the migration. These are just a few of the catchwords that make up the backbone of the Agile Manifesto (see [3]).

An agile form of Prince2 could be something to consider as well, especially when Prince2 is used as the methodology for other projects. For Prince2 the following principles are to be considered: continued business justification, learn from experience, defined roles and responsibilities, manage by stages, management by exception, focus on products, tailor to suit the project environment [2].

All these PM principles reveal important aspects to ponder upon, and maybe with a few exceptions, all can be incorporated in the way the DM project is managed.


Service Principles

Considering the dependencies existing between the DM and Data Quality, as well as to the broader project, a DM can have the characteristics of a service. It’s not an IT service per se, as IT only supports the project technically and eventually from a PM perspective. Even if a DM is not an ITSM service, some of the ITIL principles can still apply: focus on value, design for experience, start where you are, work holistically, progress iteratively, observe directly, be transparent, collaborate and keep it simple [4].


Conclusion

“Obey the principles without being bound by them.”
Bruce Lee

Within a DM all the above principles can be considered, though the network of implications they create can easily shift the focus from the solution to the philosophical aspects, and that’s a marshy road to follow. Even if all principles are noble, not all can be considered. It would be utopian to consider each possible principle. The trick is to identify the most “important” principles (principles that make sense) and prioritize them according to the existing requirements. In theory, this is a one-time process that involves establishing a “framework” of best/good practices for the DM, subsequent migrations needing only to consider the new facts and aspects.


References:
[1] “Principles of project management”, by J. A. Bing, PM Network, 1994 (link)
[2] Axelos (2018) What is PRINCE2? (link)
[3] Agile Manifesto (2001) Principles behind the Agile Manifesto (link)
[4] Axelos (2018) ITIL® Practitioner 9 Guiding Principles (link)
[5] The Data Governance Institute (2018) Goals and Principles for Data Governance (link) 
[6] Navigating the Labyrinth: An Executive Guide to Data Management, by Laura Sebastian-Coleman for DAMA International, Technics Publications, 2018 (link)  

02 October 2010

Business Intelligence: Is MS Access or Excel the Answer to your Problems?

Business Intelligence
Business Intelligence Series

Introduction 

That’s one of the questions that has followed me for years, customers quite often asking for an MS Access or MS Excel solution as an answer to a business need. The beauty of this question is that there is no right answer and, as I stressed on several occasions, there is not always a straightforward answer to such a question in IT, the feasibility of an IT solution relying on many variables formulated typically in terms of business and IT requirements.

When a customer requests an MS Access or Excel solution to be built outside of the Office paradigm, I’m kind of circumspect, and this is not because they are not great tools, but because they are not adequate for all purposes. I even recommend the two for personal or for small-scale solutions, though their applicability should stop right there.

A personal solution is an application developed for personal use, for example to store and maintain the data for a report, to process data automatically, or any other attempt at automating some tasks. By small-scale solutions I’m referring to the following types of applications: 
- applications of basic to average complexity, that don’t require complex design or could be developed by a developer with average skills;
- applications that target a small number of users, usually a small group of max. 10-20 concurrent users; it may occasionally be a whole department, or it could be cross-departmental as long as the previously mentioned conditions are met.

A Short Review 
 
MS Excel is the perfect tool for storing non-relational tabular data, manipulating data manually or with the help of formulas, doing data analysis with pivoting and charting, or querying various data sources. Its extensibility based on its DOM (Document Object Model), VBA (Visual Basic for Applications) and its IDE (Integrated Development Environment), Forms, add-ins, in-house or third-party developed libraries, and the template and wizard-based approach make Excel a powerful development environment. I would say that Excel’s weakness resides in its intrinsic design, in the DOM model which lacks a rich event model, and in the fact that Excel is mainly a tool for data entry, analysis and reporting, the other types of functionality playing a secondary role. Excepting a few new features built into Excel itself, the important new functionality comes as add-ons – the SQL Server-based data mining add-in, MS SharePoint Server-based Web Services features like multiuser collaboration, slicers and a few others.

The extensibility capabilities mentioned above are not only a particularity of Excel but apply to the whole Office family: Access, Word, Outlook, PowerPoint, and even Visio if one considers the “extended family”, each of them with its role. Access’ role is that of a flexible relational data storage, querying and reporting solution, its strength lying mainly in the ease of providing a simple UI (User Interface) for maintaining and navigating the data, and in the ease of pulling data from various sources for further analysis. As in the case of Excel, Access’ weakness resides in its DOM, in the fact that it’s not a full RDBMS (Relational Database Management System), and in all the consequences deriving from that.

Programming for the Masses/Citizens
 
The great thing about VBA is that non-developers too can successfully venture into developing Office-based applications, the possibility of learning from the code generated by the “Record Macro” functionality flattening the learning curve. Enabling “non-developers” to build applications makes Office a powerful and altogether dangerous tool, because such applications can easily be misused. Misused here refers to the fact that often one attempts to build in Excel or Access complex applications that sooner or later break apart under their own complexity, and that organizations end up having a multitude of such applications with no control over their existence, maintenance, security, etc. 

Unfortunately the downsides of such applications are discovered late in the process, when intended functionality is not available, thus having to reinvent the wheel and patch up functionality in a jumble. With some hard work you could achieve functionality similar to the one available in powerful frameworks like .Net, WPF, WCF or Silverlight, to mention the Microsoft technologies I’m somewhat acquainted with. VBA is great but with time became less powerful than VB, C# or C++ (the comparison between VBA and C++ is a little forced), to mention the most important programming languages for writing managed code in .Net. The barriers between the capabilities of the two types of programming languages are somewhat lowered by the possibility of developing add-ins and libraries for MS Office or of using the Office DOM in .Net applications, though few (non-)programmers venture down this path.

The Architectural Perspective 
 
There is another important architectural perspective – separating the data storage and eventually the data processing from the presentation. When using Access or Excel the data storage could also be separated from the presentation, though I’ve seen few solutions doing that, the three layers usually coexisting within the same tier. An Access solution could be split in two, one part for the database and another for the UI and processing, allowing more flexibility in what concerns the architecture, security, version management, etc. 

Access is good for data presentation and rapid prototyping, though the concept and the data controls are quite old, having several limitations when compared with similar controls available, for example, in .Net. The advantage of using simple drag-and-drop or wizards in Access is long gone, the same functionality existing also in Visual Studio (Express), an environment in which applications can be built with drag-and-drop and wizards too, while also taking advantage of additional built-in features. The database layer could be replaced with a full RDBMS, same as the presentation layer could be replaced with a .Net UI. Isn’t it then much easier to build the architecture around a .Net UI and an RDBMS?!
 
Excel is considered by many to be a (relational) database, but is it really so? It’s true that the data can be stored in tabular format, in which a sheet plays the role of a table, and that they are queryable through the various drivers available, though no primary key is available, there is less control over the data entered, and many other features available in an RDBMS need to be provided programmatically, again reinventing the wheel. Same as in the case of Access, Excel could be considered for data storage and presentation, its functionality being reduced when compared with the one of Access. 
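To give an idea of what “provided programmatically” means, here is a quick sketch using Python and pandas, chosen only to keep the example short – the same checks could be written in VBA or .Net, and the file, sheet and column names are placeholders:

# A sheet can be queried like a table, but RDBMS guarantees such as a primary
# key or type enforcement have to be re-implemented by hand.
import pandas as pd

orders = pd.read_excel("Orders.xlsx", sheet_name="Orders")  # the sheet plays the role of a table

# no primary key: duplicates must be detected programmatically
duplicates = orders[orders.duplicated(subset=["OrderId"], keep=False)]
print(f"{len(duplicates)} rows violate the would-be primary key 'OrderId'")

# no type enforcement: invalid values must be hunted down manually
invalid_amounts = orders[pd.to_numeric(orders["Amount"], errors="coerce").isna()]
print(f"{len(invalid_amounts)} rows have a non-numeric 'Amount'")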

Many people are used to the data entry mechanism available in Excel, especially in what concerns data manipulation, and want similar functionality in other tools. If this was Excel’s advantage some time ago, that’s no longer valid, as several rich data grids offer similar data entry functionality which, with some effort, can simulate the functionality of Excel to an acceptable degree, and they can also provide richer validation functionality.

It’s all about Costs 
 
In the past MS Excel and Access were quite cheap as "development platforms" when compared with purchasing an existing IDE, especially when we consider their extensibility through VBA and the IDE’s availability, thus a favorable functionality vs. extensibility ratio. Recently, express (aka community) versions of powerful IDEs like Visual Studio were introduced, as well as open-source IDEs and development frameworks that provide rich capabilities, and the balance of forces changed dramatically in favor of the latter. 

Today you can put together a small-scale application with a minimum of investment, sometimes making the use of Office tools outside of the Office solutions obsolete. The pool of software tools and technologies has changed considerably in the past years, but the mentality in what concerns the IT infrastructure and software development has changed less. It’s true that sometimes organizations lack the resources who could architect and design such solutions, relying mainly on external resources, or it is much easier to rely on the programming skills of an employee who knows “exactly” what's needed and can attempt to solve the problem directly rather than write the requirements down. 

In VBA’s favor comes also the fact that software solutions normally evolve and need to be changed in order to reflect business or philosophy changes, it being much easier for the employee who built the application to introduce such changes directly than to start a whole project for this purpose. This aspect is rooted in another perspective – sometimes organizations ignore the software needs, leaving it to employees to find cheap and fast ways of automating tasks in particular, and of solving work-related problems in general, Excel or Access being quite handy for this purpose. Sure, you can do almost anything in Excel/Access too, but at what cost?

The Strategic Context 
 
Several times I’ve heard people talking about replacing a collection of Excel sheets with an Access solution. I know that in the absence of adequate solutions people end up storing various types of data in Excel sheets, duplicating data, losing control over versions and data quality, making data insecure, unavailable or un-processable. Without a good data management and infrastructure strategy the situation doesn’t change significantly by using an Access solution. 

It’s true that the data could be stored more easily in a global place, some validation could result in better data quality, while security, availability and data maintainability could see some improvements too, however the gain is insignificant when compared with the capabilities of a full-featured RDBMS. Even if a company doesn’t have the resources to invest in a mature RDBMS like Oracle or SQL Server, there are also the Express versions of the respective databases, and several other free solutions exist on the market, especially in the open-source area. On the other side, it’s true that MS Access, through its easy-to-use SQL Designer, allows people to build queries with simple drag-and-drop and limited SQL knowledge, though its value is relative.

Talking about the data management strategy, it concerns mainly the data quality as a function of its six main dimensions (accuracy, conformity, consistency, completeness, duplicates, referential integrity), to which one can add data actuality, accessibility, security, relevance, usability, and so on. The main problem with personal solutions is that they lead to data and logic duplication, and even when such solutions are consolidated in one form or another, their consolidation and integration is quite complex because one has to consider not only the various designs but also the overall requirements from a higher perspective. On the other side, it’s difficult to satisfy the needs of all the people in an organization in one form or another, duplication of data being inevitable, with direct or indirect implications on data quality. Some effort and a good strategy are required in what concerns these aspects, finding the balance between the various requirements and the number of solutions needed to satisfy them.
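For two of the dimensions mentioned, completeness and duplicates, a measurement can be as simple as the sketch below; the field names and sample records are, of course, made up:

# Measure completeness per required field and count duplicate keys over a list of records.
def profile(rows, key="CustomerId", required=("CustomerId", "Name", "Country")):
    total = len(rows)
    completeness = {field: sum(1 for row in rows if row.get(field) not in (None, "")) / total
                    for field in required}
    seen, duplicates = set(), 0
    for row in rows:
        duplicates += row.get(key) in seen
        seen.add(row.get(key))
    return completeness, duplicates

rows = [{"CustomerId": 1, "Name": "Alpha", "Country": "DE"},
        {"CustomerId": 1, "Name": "Alpha GmbH", "Country": ""}]
print(profile(rows))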

Reformulating the Question

How can we determine which tool or set of tools is appropriate for our problem? Normally the answer to this question depends on the needed functionality. The hard road in answering it is to identify all the requirements and the features available in the various tools, weigh both of them, and decide what works best. Unfortunately that’s not an easy task, as one needs to consider not only the actual but also the future requirements, the organization’s strategy, and whatever else might come around. 

Reports, best practices, lessons learned or other types of succinct content might help as well in making a decision without analyzing features and requirements too thoroughly. Sometimes a gut feeling might work as well, especially when it comes from a person with experience in the field. Other times you don’t have too many options – time, resources, knowledge, IT infrastructure, philosophy or politics reducing your area of maneuverability/decision. In the end we learn by doing, by fighting with the constraints and problems we have, and hopefully we learn also from our own or others’ mistakes…

PS: I have several good cumulated years in developing solutions based on Excel and Access, and still I can’t pretend that I know their full potential, especially when judged from the perspective of the new features introduced with Excel 2007 or 2010, even more so when considering their integration with SharePoint, SQL Server or other similar platforms. The various software tools and platforms existing on the market allow people to mix functionality in theoretically unlimited ways, the separation of functionality between layers, SaaS (software as a service) and data meshes changing the way we program and perceive software development.


04 August 2008

Application Architecture: Enterprise Service Bus (Definitions)

"A layer of middleware that enables the delivery and sharing of services across and between business applications. ESBs are typically used to support communication, connections, and mediation in a service-oriented architecture." (Evan Levy & Jill Dyché, "Customer Data Integration", 2006)

"The infrastructure of a SOA landscape that enables the interoperability of services. Its core task is to provide connectivity, data transformations, and (intelligent) routing so that systems can communicate via services. The ESB might provide additional abilities that deal with security, reliability, service management, and even process composition. However, there are different opinions as to whether a tool to compose services is a part of an ESB or just an additional platform to implement composed and process services outside the ESB." (Nicolai M Josuttis, "SOA in Practice", 2007)

"A middleware software architecture construct that provides foundational services for more complex architectures via an event-driven and standards-based messaging engine (the bus). An ESB generally provides an abstraction layer on top of an implementation of an enterprise messaging system. |" (Alex Berson & Lawrence Dubov, "Master Data Management and Data Governance", 2010)

"The infrastructure of an SOA landscape that enables the interoperability of services. Its core task is to provide connectivity, data transformations, and routing so that systems can communicate via services." (David Lyle & John G Schmidt, "Lean Integration", 2010)

"A software layer that provides data between services on an event-driven basis, using standards for data transmission between the services." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"A packaged set of middleware services that are used to communicate between business services in a secure and predictable manner." (Marcia Kaufman et al, "Big Data For Dummies", 2013)

