SQL Troubles

25 October 2012

ⵌProgramming: Assertion (Definitions)

"A constraint that is not attached to a table but is instead a distinct database object. It can therefore be used to enforce rules that apply to multiple tables or to verify that tables are not empty." (Jan L Harrington, "SQL Clearly Explained" 3rd Ed., 2010)

"A declaration or statement, often without support." (Janice M Roehl-Anderson, "IT Best Practices for Financial Managers", 2010)

"A statement about a program that the code claims to be true." (Rod Stephens, "Start Here!™ Fundamentals of Microsoft® .NET Programming", 2011)

"A statement that the code claims is true. If the statement is false, the program stops running so you can decide whether a bug occurred." (Rod Stephens, "Stephens' Visual Basic® Programming 24-Hour Trainer", 2011)

"A component of a regular expression that must be true for the pattern to match but does not necessarily match any characters itself. Often used specifically to mean a zero-width assertion." (Jon Orwant et al, "Programming Perl" 4th Ed., 2012)

"A statement about the program and its data that is supposed to be true. If the statement isn’t true, the assertion throws an exception to tell you that something is wrong." (Rod Stephens, "Beginning Software Engineering", 2015)

"A language feature used to test for conditions that should be guaranteed by program logic. If a condition checked by an assertion is found to be false, a fatal error is thrown. For added performance, assertions can be disabled when an application is deployed." (Daniel Leuck et al, "Learning Java" 5th Ed., 2020)

"A way of ensuring that a method has access to a particular resource, even if the method's callers do not have the required permission. During a stack walk, if a stack frame asserting the required permission is encountered, a security check for that permission will succeed without proceeding further. To perform an assertion of a permission, code must not only have that permission, but also be granted the SecurityPermission.Assertion permission. Unwise use of assertions can create security holes, so they should be used only with the utmost caution." (Damien Watkins et al, "Programming in the .NET Environment", 2002)

24 October 2012

ⵌProgramming: Assembly (Definitions)

"An assembly is the unit of deployment and versioning in the .NET Framework. An assembly contains a manifest, metadata, MSIL, and possibly binary resources. Most assemblies are single files, but an assembly can consist of multiple files, such as DLLs, picture files, and even HTML files." (Adam Nathan, ".NET and COM: The Complete Interoperability Guide", 2002)

"The unit of deployment and versioning in the .NET Framework. It establishes the namespace for resolving requests for types and determines which types and resources are exposed externally and which are accessible only from within the assembly. An assembly includes an assembly manifest that describes the assembly's contents." (Damien Watkins et al, "Programming in the .NET Environment", 2002)

"A managed application module that contains class metadata and managed code as an object in SQL Server. By referencing an assembly, CLR functions, CLR stored procedures, CLR triggers, user-defined aggregates, and user-defined types can be created in SQL Server." (Thomas Moore, "MCTS 70-431: Implementing and Maintaining Microsoft SQL Server 2005", 2006)

"A managed application module, composed of class metadata and managed code, that can be embedded in a database solution as a database object in SQL Server 2005." (Marilyn Miller-White et al, "MCITP Administrator: Microsoft® SQL Server™ 2005 Optimization and Maintenance 70-444", 2007)

"Application logic that is stored in, and managed by, the SQL Server database server, including objects like triggers, CLR software, and stored procedures. Assemblies are written in a .NET language, such a C# or Visual Basic." (Robert D. Schneider and Darril Gibson, "Microsoft SQL Server 2008 All-In-One Desk Reference For Dummies", 2008)

"In SQL Server, a .NET assembly is a compiled SQL CLR executable or DLL." (Michael Coles, "Pro T-SQL 2008 Programmer's Guide", 2008)

"A managed application module that contains class metadata and managed code." (Jim Joseph et al, "Microsoft® SQL Server™ 2008 Reporting Services Unleashed", 2009)

"In .NET applications, the smallest self-contained unit of compiled code. An assembly can be a complete application, or a library that can be called by other applications." (Rod Stephens, "Start Here!™ Fundamentals of Microsoft® .NET Programming", 2011)

"The smallest independent unit of compiled code. Typically, this is a Dynamic Link Library (DLL) or executable program." (Rod Stephens, "Stephens' Visual Basic® Programming 24-Hour Trainer", 2011)

"A managed application module containing class metadata and managed code as an object in SQL Server, against which CLR functions, stored procedures, triggers, user-defined aggregates, and user-defined types can be created in SQL Server." (Microsoft, "SQL Server 2012 Glossary", 2012)

"In SQL Server, a .NET assembly is a compiled SQL CLR executable or DLL." (Jay Natarajan et al, "Pro T-SQL 2012 Programmer's Guide" 3rd Ed., 2012)

"The fundamental logical unit of managed code, consisting of one or more files containing Common Intermediate Language instructions and metadata. See also CIL." (Mark Rhodes-Ousley, "Information Security: The Complete Reference" 2nd Ed., 2013)

23 October 2012

ⵌProgramming: Array (Definitions)

"A group of cells arranged by dimensions. A table is a two-dimensional array in which the cells are arranged in rows and columns, with one dimension forming the rows and the other dimension forming the columns. A cube is a three-dimensional array and can be visualized as a cube, with each dimension of the array forming one edge of the cube." (Microsoft Corporation, "Microsoft SQL Server 7.0 Data Warehouse Training Kit", 2000)

"A collection of objects all of the same type." (Jesse Liberty, "Sams Teach Yourself C++ in 24 Hours" 3rd Ed., 2001)

"A list of variables that have the same name and data type." (Greg Perry, "Sams Teach Yourself Beginning Programming in 24 Hours" 2nd Ed., 2001)

"Values whose members, called elements, are accessed by an index rather than by name. An array has a rank that specifies the number of indices needed to locate an element (sometimes called the number of dimensions) within the array. It may have either zero or nonzero lower bounds in each dimension." (Damien Watkins et al, "Programming in the .NET Environment", 2002)

"A collection of data items, all of the same type, in which each item is uniquely addressed by a 32-bit integer index. Java arrays behave like objects but have some special syntax. Java arrays begin with the index value 0." (Marcus Green & Bill Brogden, "Java 2™ Programmer Exam Cram™ 2 (Exam CX-310-035)", 2003)

"A device that aggregates large collections of hard drives into a logical whole." (Tom Petrocelli, "Data Protection and Information Lifecycle Management", 2005)

"An arithmetically derived matrix or table of rows and columns that is used to impose an order for efficient experimentation. The rows contain the individual experiments. The columns contain the experimental factors and their individual levels or set points." (Clyde M Creveling, "Six Sigma for Technical Processes: An Overview for R Executives, Technical Leaders, and Engineering Managers", 2006)

"A data structure containing an ordered list of elements—any Ruby object—starting with an index of 0." (Michael Fitzgerald, "Learning Ruby", 2007)

"An arithmetically derived matrix or table of rows and columns that is used to impose an order for efficient experimentation. The rows contain the individual experiments. The columns contain the experimental factors and their individual levels or set points." (Lynne Hambleton, "Treasure Chest of Six Sigma Growth Methods, Tools, and Best Practices", 2007)

"In a SQL database, an ordered collection of elements of the same data type stored in a single column and row of a table." (Jan L Harrington, "SQL Clearly Explained" 3rd Ed., 2010)

"A group of values stored together in a single variable and accessed by index." (Rod Stephens, "Stephens' Visual Basic® Programming 24-Hour Trainer", 2011)

"A grouping of similar items of the same storage type in a sequential pattern, and referenced by a sequential index value." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"A variable that holds a series of values with the same data type. An index into the array lets the program select a particular value." (Rod Stephens, "Start Here!™ Fundamentals of Microsoft® .NET Programming", 2011)

"An ordered collection of values. Arrays can be defined as a basic Objective-C type and are implemented as objects under Foundation through the NSArray, and NSMutableArray classes." (Stephen G Kochan, "Programming in Objective-C" 4th Ed., 2011)

"A basic collection of values that is a sequence represented by a single block of memory. Arrays have efficient direct access, but do not easily grow or shrink." (Mark C Lewis, "Introduction to the Art of Programming Using Scala", 2012)

"An ordered sequence of values, stored such that you can easily access any of the values using an integer subscript that specifies the value’s offset in the sequence." (Jon Orwant et al, "Programming Perl" 4th Ed., 2012)

"A group of variables stored under a single name." (Matt Telles, "Beginning Programming", 2014)

"A structure composed of multiple identical variables that can be individually addressed." (Sybase, "Open Server Server-Library/C Reference Manual", 2019)

"A structure that contains an ordered collection of elements of the same data type in which each element can be referenced by its index value or ordinal position in the collection. See also element, ordinary array." (Sybase, "Open Server Server-Library/C Reference Manual", 2019)

"An array is a data structure where elements are associated with an index. They are implemented differently in different programming languages." (Alex Thomas, "Natural Language Processing with Spark NLP", 2020)

05 October 2012

🗄️Data Management: Business Rules – An Introduction

Data Management Series

    "Business rules" seems to be a recurring theme these days – developers, DBAs, architects, business analysts, IT and non-IT professionals talk about the necessity to enforce them in data and semantic models, information systems, processes, departments or whole organizations. They seem to affect the important layers of an organization. In fact the same business rule can affect multiple levels either directly, or indirectly through the hierarchical or networked structure of causality it belongs to. When considered all the business rules, the overall picture can become very complex. The fact that there are multiple levels of interconnected layers, with applications and implications at macro or micro level, makes the complexity to fight back because in order to solve business-specific problems often you have to go at least one level above the level where the problems were defined, or to simplify the problems to a level of detail that allows to tackled.

    The Business Rules Group defines a business rule as "a statement that defines or constrains some aspect of the business" [1], a definition that seems closer to the vocabulary of IT people. Ronald G. Ross, in his book Principles of the Business Rule Approach, defines it as "a directive intended to influence or guide business behavior" [2], a definition closer to the vocabulary of HR people. In fact, the two definitions are quite similar, both highlighting the constraining or guiding role of business rules. They also raise an important question – can everything that is catalogued as a constraint or guideline be considered a business rule? In theory, yes; in practice, constraints and guidelines differ in their impact on the business, so depending on context they need to be considered or not. What to consider is itself an art, which adds to the art of problem solving.

    Besides identification, neither the definition nor the management of business rules seems an easy task. R.G. Ross considers that business rules need to be written and made explicit, expressed in plain language, independent of procedures and workflows, built on facts, motivated by identifiable and important business factors, accessible to authorized parties, specific, single sourced, managed, specified by those people who have relevant knowledge, and they should guide or influence behavior in desired ways [2]. This summarizes the various aspects that need to be considered when defining and managing business rules. Many organizations seem to be challenged by this, especially when they lack maturity in business management.

    Many business rules already exist in the functional and technical specifications written for the various software products built on request, in the documentation of purchased software, in processes, procedures, standards, internally defined and externally enforced policies, and in the daily activities and the knowledge exchanged or held by workers. Sure, the formulations found in such resources need to be enhanced and aggregated in order to be brought to the status of business rule. And here comes the difficulty, as iterative work needs to be performed in order to bring them to the level indicated by R.G. Ross. For sure Ross’ specifications are idealistic, though they offer a “framework” for defining business rules. As for their management, there is a lot to be done within an organization, as this aspect needs to be integrated with the other activities and strategies existing in the organization.

    Often, when an important initiative (better said, a project) starts within an organization, the lack of business rules defined and understood up front is felt in particular. Such events trigger the identification and elicitation of business rules; they are addressed in documentation and remain buried in there. It is also true that it’s difficult to build a business case for further processing of business rules. One argument could be the cost of wrong decisions made without knowing the existing rules, though that’s something difficult to quantify and make visible in an organization. In the end, an organization will most probably recognize the value of business rules once it has reached a certain level of maturity.

References:
[1] Business Rules Group (2000) Defining Business Rules - What Are They Really? [Online] Available from: http://businessrulesgroup.org/first_paper/BRG-whatisBR_3ed.pdf
[2] Ronald G. Ross (2003) Principles of the Business Rule Approach. Addison Wesley. ISBN: 0-201-78893-4.

29 September 2012

Programming: Pair Programming (Definitions)

"An XP practice requiring that each piece of source code to be integrated into the software product should be created by two programmers jointly at one computer."" (Johannes Link & Peter Fröhlich, "Unit Testing in Java", 2003)

"A coding technique where one programmer (the driver) writes code and explains what he or she is doing, while another watches and looks for problems." (Rod Stephens, "Start Here!™ Fundamentals of Microsoft® .NET Programming", 2011)

"A software development approach whereby lines of code (production and/or test) of a component are written by two programmers sitting at a single computer. This implicitly means ongoing real-time code reviews are performed." (IQBBA, "Standard glossary of terms used in Software Engineering", 2011)

"An Extreme Programming practice where two (or three) programmers work together at the same computer. The driver or pilot types while the observer, navigator, or pointer watches and reviews each line of code as it is typed." (Rod Stephens, "Beginning Software Engineering", 2015)

25 September 2012

ⵌProgramming: BLOB (Definitions)

"A type of data column containing binary data such as graphics, sound, or compiled code. This is a general term for text or image data type. BLOBs are not stored in the table rows themselves, but in separate pages referenced by a pointer in the row." (Microsoft Corporation, "SQL Server 7.0 System Administration Training Kit", 1999)

"BLOB is a data type for fields containing large binary data such as images." (S. Sumathi & S. Esakkirajan, "Fundamentals of Relational Database Management Systems", 2007)

"A binary large object. Large value data types [varchar(max), nvarchar(max), and varbinary(max)] are stored as BLOBs. Within SQL Server 2005, BLOBs can be as large as 2GB." (Darril Gibson, "MCITP SQL Server 2005 Database Developer All-in-One Exam Guide", 2008)

"A data type that can hold large objects of arbitrary content such as video files, audio files, images, and so forth. Because the data can be any arbitrary chunk of binary data, the database does not understand its contents so you cannot search in these fields." (Rod Stephens, "Beginning Database Design Solutions", 2008)

"Binary large object (BLOB) data is data that is stored using the varbinary(max) data type. A BLOB column or variable can hold up to 2.1 GB of data, as opposed to a regular non-LOB varbinary or binary column or variable, which can max out at 8,000 bytes of data." (Michael Coles & Rodney Landrum, , "Expert SQL Server 2008 Encryption", 2008)

"Very large binary representation of multimedia objects that can be stored and used in some enhanced relational databases." (Paulraj Ponniah, "Data Warehousing Fundamentals for IT Professionals", 2010)

"A discrete packet of binary data that has an exceptionally large size, such as pictures or audio tracks stored as digital data, or any variable or table column large enough to hold such values. The designation 'binary large object' typically refers to a packet of data that is stored in a database and is treated as a sequence of uninterpreted bytes." (Microsoft, "SQL Server 2012 Glossary", 2012)

"A large assemblage of binary data (e.g., images, movies, multimedia files, even collections of executable binary code) that are associated with a common group identifier and that can, in theory, be moved (from computer to computer) or searched as a singled data object. Traditional databases do not easily handle BLOBs." (Jules H Berman, "Principles of Big Data: Preparing, Sharing, and Analyzing Complex Information", 2013)

"A blob is any resource whose internal structure is functionally opaque for the purpose at hand." (Robert J Glushko, "The Discipline of Organizing: Professional Edition" 4th Ed., 2016)

24 September 2012

ⵌProgramming: Block (Definitions)

"A series of statements enclosed by BEGIN and END. Blocks define which set of statements will be affected by control-of-flow language such as IF or WHILE. You can nest BEGIN...END blocks within other BEGIN... END blocks." (Microsoft Corporation, "SQL Server 7.0 System Administration Training Kit", 1999)

"A section of code grouped together by braces that sets apart a section of code in a smaller area than a full procedure. A procedure might contain several blocks of code." (Greg Perry, "Sams Teach Yourself Beginning Programming in 24 Hours 2nd Ed.", 2001)

"A sequence of PL/SQL code, beginning with DECLARE or BEGIN and ending with END. The block is a core organizational unit of PL/SQL programming. See Chapter 2 for a thorough discussion." (Bill Pribyl & Steven Feuerstein, "Learning Oracle PL/SQL", 2001)

"A series of Transact-SQL statements enclosed by BEGIN and END. You can nest BEGIN...END blocks within other BEGIN...END blocks." (Anthony Sequeira & Brian Alderman, "The SQL Server 2000 Book", 2003)

"A syntactic construct consisting of a sequence of Perl statements that is delimited by braces. The if and while statements are defined in terms of BLOCKs, for instance. Sometimes we also say 'block' to mean a lexical scope; that is, a sequence of statements that acts like a BLOCK, such as within an eval or a file, even though the statements aren’t delimited by braces." (Jon Orwant et al, "Programming Perl" 4th Ed., 2012)

"A Transact-SQL statement enclosed by BEGIN and END." (Microsoft, "SQL Server 2012 Glossary", 2012)

"The information stored in a sector" (Nell Dale & John Lewis, "Computer Science Illuminated" 6th Ed., 2015)

"A set of rows retrieved from a database server that are transmitted as a single result set to satisfy a cursor FETCH request." (Sybase, "Open Server Server-Library/C Reference Manual", 2019)

06 July 2012

🚧Project Management: Change Control Board (Definitions)

"A group of project members, possibly including one or two customers, that reviews and approves or rejects change requests." (Rod Stephens, "Beginning Software Engineering", 2015)

"A committee that makes decisions regarding whether proposed changes to a system should be implemented." (William Stallings, "Effective Cybersecurity: A Guide to Using Best Practices and Standards", 2018)

"A central authority for the management of change requests." (Bruce P Douglass, "Real-Time Agility: The Harmony/ESW Method for Real-Time and Embedded Systems Development", 2009)

"A formally constituted group of stakeholders responsible for reviewing, evaluating, approving, delaying, or rejecting changes to a project, with all decisions and recommendations being recorded." (Project Management Institute, "Practice Standard for Project Estimating", 2010)

05 July 2012

🚧Project Management: Project Manager (Definitions)

"The individual responsible for managing a project." (Timothy J Kloppenborg et al, "Project Leadership", 2003)

"The person responsible for managing a project." (Margaret Y Chu, "Blissful Data ", 2004)

"The person responsible for planning, directing, and controlling a project." (Richard D Stutzke, "Estimating Software-Intensive Systems: Projects, Products, and Processes", 2005)

"The person assigned by the performing organization to lead the team that is responsible for achieving the project objectives." (Project Management Institute, "The Standard for Program Management" 3rd Ed., 2013)

"A professional who is responsible for managing a project’s pursuit of its intended outputs and/or outcomes. Project managers are operational leaders who are responsible for assuring that a project meets its operational goals for delivering work products with prescribed specifications - on time and on budget." (Richard J Heaslip, "Managing Complex Projects and Programs", 2014)

"Individual or body with authority, accountability and responsibility for managing a project to achieve specific objectives." (Chartered Institute of Building, "Code of Practice for Project Management for Construction and Development" 5th Ed., 2014)

"The individual responsible for managing a project and its completion within its scope, budget, and schedule." (Christopher Carson et al, "CPM Scheduling for Construction: Best Practices and Guidelines", 2014)

"The person primarily responsible for managing a project to its successful completion." (Robert F Smallwood, "Information Governance: Concepts, Strategies, and Best Practices", 2014)

"An organizational employee, representative, or consultant appointed to prepare project plans and organize the resources required to complete a project, prior to, during, and upon closure of the project life cycle." (H James Harrington & William S Ruggles, "Project Management for Performance Improvement Teams", 2018)

01 July 2012

📦Data Migrations (DM): An Introduction

Data Migrations Series

Introduction

Basically, Data Migration is the movement of data from one IS (Information System), the legacy system, to a new IS, the target system, which is supposed to replace the legacy system entirely or partially. In the best scenario there are no differences between the two IS, or the differences are minimal and negligible. In the worst scenario, there are multiple legacy systems used as sources, and even multiple target systems, with important differences between them, differences that can even translate into incompatibilities at multiple levels. Such architectures can span geographies, departments, organizations or industries, and can involve a multitude of vendors, generations of systems, network types, different regulations, etc. In many Data Migrations the overall picture can be really complex, though for the sake of simplicity it’s enough to focus on the simplest scenario, in which there is a single source and a single target system, with some differences between them. One can also abstract away the fact that many migrations are parts of bigger projects, for example ERP implementations or other types of application migrations.

Data Migration is quite a complex topic, appearing to many like a black box in which data go in and data come out. That’s valid for the typical user as well as for IT professionals who haven’t been involved in Data Migration projects. There are many books on topics that are tangent to Data Migration – Data Management, Data Quality, Data Integration or Data Warehousing. Except for some presentations available on the Web, a few methodologies published by important companies, one or two books, and a few blogs, there isn’t much material available on Data Migration. The “trend” is also a reflection of the low importance given to Data Migration as a subject, even if many professionals working in the field warn about the considerable impact a Data Migration can have on a project in particular, and on the business in general.

Approaching a topic like Data Migration can be, depending on the case, a complex task; however, with a little intuition and some guidance its complexity falls apart. Often, when exploring such a topic, the 5W1H technique or its extended forms can help. The technique comes down to searching for answers to the “what”, “where”, “why”, “how”, “when”, “who” and “with what” questions. In the case of Data Migration the questions are formulated as: what (data) to migrate, where to migrate, why to migrate, how to migrate, when to migrate, who migrates and with what to migrate?

Why to migrate?

A Data Migration follows from a need – an old system is in place and can’t cope anymore with the business’ growth, a company made an acquisition and the systems need to be consolidated, or the organization decided to change its infrastructure, processes or business model in order to address today’s business requirements like flexibility, availability, manageability, automation, cost cuts, etc. In other words, a Data Migration occurs out of a need for change, and it can itself be a change in what concerns technical infrastructure, processes, procedures, data flow, or ways of doing business. A migration has quite an impact on the business, so here is a legitimate question: does it really make sense to migrate? Why not start from zero with the new system?!

The migration can be a zero point for an organization, though unless a company is starting anew, there are some data lying there in the old system(s) that need to remain available – for example open Purchase Orders that need to be fulfilled, Invoices that need to be paid, a catalog with all the Products and the available stock, information about Customers, what they bought, what they browsed or what they want to buy for Christmas, etc. At least some of the data need to be made available, in one form or another, within the new architecture, if not in the new system.

The availability of old data can be solved by keeping the old system(s) in place and functional, even if the system won’t be fed with new data, or maybe it will. Keeping a system alive involves additional costs for maintaining the infrastructure – software and hardware licenses, consultants, administrators and other people responsible for keeping such a system working properly. With time this can become quite an unnecessary burden. It can be an acceptable choice for some organizations, but it is unlikely to be a good practice. And even if the system is kept, most likely there will be data that need to be available in the new system as well. Integrating the two systems can also be discussed, but again, does it make sense? The bottom line is that in multiple scenarios a Data Migration can prove to be the optimal solution for an organization.

What data to migrate?

Even if it looks like a silly question, it can be one of the most complex questions to answer. In theory, all the data need to be migrated, but are all the data really needed? Typically a database contains historical data not used anymore by the business, obsolete data marked or not for deletion, and garbage data entered by mistake or left behind by incomplete deletions, all of these having low or no value for the business. Hopefully there are also “good data”, quintessential for the business. Somebody would say “what the heck, why do we need to philosophize so much, let’s migrate all the data!”. The decision is understandable, though what if the percentage of “good data” is quite small in comparison with the total volume of data, which can measure a few terabytes?! Sure, nowadays data centers can handle terabytes of data without problems, though there are some factors to be considered – it can be quite a challenge to migrate such a volume of data, the volume of data also affects the performance of databases in particular, and of IS in general, and, a more natural reason, why store something that has minimal value for you?!

It makes sense to migrate only the data that have value for an organization, but what data are needed then? Normally this starts with understanding what entities the business deals with and which attributes characterize them. Many of the entities can be met in the organization’s daily activities, and maybe are already defined in the organization’s glossary or Data Dictionary, so a review of the available inventory might do. If not, more effort needs to be spent for this purpose; activities specific to Data Discovery, Data Categorization, Data Definition or Data Profiling can help, as the case may be, to fill the gaps in understanding. Except for categorization, not all of these activities are necessary, and the analysis needs to go only as deep as the purpose requires.

A first categorization was made above, when data were considered as valuable, not valuable or in between. A second categorization can be made based on the data’s usage: obsolete (not used anymore or marked for deletion), new (not used and recently entered), historical (data used in the past) and actual (data in use). A third categorization can be made based on the status of the entities the data represent, a status that can be associated with the phase of the process the entity is in (e.g. active, inactive, open, invoiced, closed, blocked, etc.). Other meaningful categorizations can be considered as long as they prove to be important in identifying the useful data.

An important categorization in migrations, in particular, and Data Management, in general, is to split data into master data, transaction data and setup data. Master data are data that seldom change and have a long life (until they become obsolete), are referenced throughout the system, and are vital to an organization through their meaning (e.g. Customers, Suppliers, Products, Assets, Employees, Accounts, etc.). Transaction data, in contrast, are data that change often and have a relatively short life, are typically referenced by other transactions, and can be associated with documents or movements through the system (e.g. Purchase Orders, Sales Orders, Invoices, Receipts, Assets Movements, etc.). Setup data are data used to configure a system (e.g. Transaction Types, Document Types, Roles, Permissions, etc.). This categorization deserves full attention, because each of the three categories needs a different handling approach in migration or Data Management.

Based on the identified categories, some rough migration rules can be considered for deciding what data (actually records) to migrate, for example:
- master data, unless they have become obsolete, and open transactions are often migrated entirely;
- historical transaction data spanning a few years back can be migrated in case they are needed in the process;
- master data referenced by migrated transaction data need to be migrated too;
- setup data are entered manually;
- historical data are archived.
There can also be exceptions to the rules, so such scenarios need to be considered too.
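
As a rough illustration only, the rules of thumb above could be sketched in Python as follows. The `Record` fields, the `plan_migration` function, the two-year history cut-off and the choice to archive obsolete master data that nothing references are all assumptions made for this sketch, not part of any specific methodology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    key: str
    category: str            # "master", "transaction" or "setup"
    obsolete: bool = False   # only meaningful for master data
    is_open: bool = False    # only meaningful for transactions
    age_years: float = 0.0   # age of a transaction
    references: tuple = ()   # keys of master records a transaction uses


def plan_migration(records, history_years=2.0):
    """Map each record key to 'migrate', 'manual' or 'archive'."""
    plan = {}
    for rec in records:
        if rec.category == "setup":
            plan[rec.key] = "manual"      # setup data are entered manually
        elif rec.category == "master":
            # Assumption: obsolete master data are archived unless referenced.
            plan[rec.key] = "archive" if rec.obsolete else "migrate"
        elif rec.category == "transaction":
            recent = rec.age_years <= history_years
            plan[rec.key] = "migrate" if rec.is_open or recent else "archive"
        else:
            raise ValueError(f"unknown category: {rec.category}")

    # Master data referenced by migrated transactions must be migrated too.
    for rec in records:
        if rec.category == "transaction" and plan[rec.key] == "migrate":
            for ref in rec.references:
                if ref in plan:
                    plan[ref] = "migrate"
    return plan
```

The second pass is what lets an obsolete Customer still be migrated when an open Sales Order points to it, one of the exceptions the rules need to allow for.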

Each entity is defined by multiple attributes (also called properties or dimensions). They need to go through a similar “categorization” process. In deciding what attributes to migrate, it is important to consider especially their role in defining the entity. Some attributes uniquely identify an entity (e.g. Customer Number, Product Number, Serial Number), others describe its physical characteristics (e.g. color, weight, height), categorize the entity (e.g. Category, Type), reflect its status (e.g. Active, Blocked, Invoiced), record various events (e.g. Creation Date, Delivery Date, Invoice Date), and so on. It looks like another type of categorization, and it is, though it’s more difficult to create rough rules based on it, because in the end the business dictates which attributes are needed. In fact, most of the attributes actually used in the legacy system (those holding distinct, non-null values) are likely needed in the new system too, unless the process has changed considerably or the business is supposed to change its model as well.
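To see which attributes are actually used, one could profile the legacy records by counting the non-null and distinct values per attribute. The sketch below assumes the records are available as a list of dictionaries and treats empty strings as missing values; the sample customers and the function names are made up for illustration.

```python
def profile_attributes(rows):
    """Count non-null and distinct non-null values for each attribute."""
    stats = {}
    for row in rows:
        for attr, value in row.items():
            entry = stats.setdefault(attr, {"non_null": 0, "values": set()})
            # Assumption: None and empty strings both count as "not filled".
            if value is not None and value != "":
                entry["non_null"] += 1
                entry["values"].add(value)
    return {
        attr: {"non_null": e["non_null"], "distinct": len(e["values"])}
        for attr, e in stats.items()
    }


def candidate_attributes(rows, min_distinct=1):
    """List attributes holding at least `min_distinct` distinct values."""
    profile = profile_attributes(rows)
    return sorted(a for a, s in profile.items() if s["distinct"] >= min_distinct)


# Hypothetical legacy records
customers = [
    {"CustomerNumber": "C001", "Category": "Retail", "Fax": None},
    {"CustomerNumber": "C002", "Category": "Retail", "Fax": ""},
]
print(profile_attributes(customers))
print(candidate_attributes(customers))
```

Such a profile only indicates candidates; whether an attribute is really needed remains a decision for the business.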

Where to migrate the data?

When the Data Migration subject is brought to the table, a decision has already been made about the target system. So the “where” question is partially answered; however, that answer addresses only the tip of the iceberg. It shows that an iceberg lies in front of us, though beneath the surface there is much more: a lot of questions and issues that need to be addressed. Like the source, the target needs to be further detailed in entities and their attributes; the targeted processes and procedures need to be considered together with the constraints imposed by the new system. What is actually needed is to identify the data requirements of the new system and reconcile them with the requirements of the old system. Mapping the entities and attributes available in the two systems, a process known as Data Mapping, can offer a good overview of what lies ahead and of what similarities and gaps exist. There will be attributes that are available in the legacy but not in the target system, in which case either the target system needs to be extended or the data associated with the respective attributes are left out. From the opposite perspective, there can be mandatory attributes in the target system which are not available in the organization, and therefore the associated data must be collected and/or made available for the migration. There can also be cases when the data are not available in the legacy system but distributed across various other external or internal sources, so the options include migrating or integrating the respective data, extending the processes to accommodate such scenarios, etc.

Only when the mapping of data is ready and the various related questions are addressed is the “where” question fully answered. Given that changes to the target system may still happen a few days before Go Live, Data Mapping can remain a hot topic until then.
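The two kinds of gaps a Data Mapping reveals, legacy attributes without a home in the target and mandatory target attributes without a source, can be found with simple set operations. The following is a minimal sketch; the attribute names and the shape of the mapping (a dictionary from source attribute to target attribute) are assumptions made for the example.

```python
def mapping_gaps(source_attrs, target_attrs, mapping, target_mandatory):
    """Report gaps in a source-to-target attribute mapping.

    mapping: dict of source attribute -> target attribute (hypothetical shape).
    """
    source_attrs = set(source_attrs)
    target_attrs = set(target_attrs)

    # Legacy attributes with no valid counterpart in the target:
    # extend the target system or leave the data out.
    unmapped_source = sorted(
        a for a in source_attrs if mapping.get(a) not in target_attrs
    )

    # Mandatory target attributes fed by no source attribute:
    # the data must be collected or made available otherwise.
    mapped_targets = {t for s, t in mapping.items() if s in source_attrs}
    missing_mandatory = sorted(set(target_mandatory) - mapped_targets)

    return {
        "unmapped_source": unmapped_source,
        "missing_mandatory": missing_mandatory,
    }


# Hypothetical attribute lists for a Customer entity
gaps = mapping_gaps(
    source_attrs={"CustNo", "Name", "FaxNo"},
    target_attrs={"CustomerNumber", "CustomerName", "TaxId"},
    mapping={"CustNo": "CustomerNumber", "Name": "CustomerName"},
    target_mandatory={"CustomerNumber", "TaxId"},
)
print(gaps)
```

Because the target system may keep changing until shortly before Go Live, such a check is worth rerunning whenever the mapping or the target definition changes.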

With what to migrate?

This question addresses the mix of tools used to migrate the data and, by extension, the whole architecture developed for this purpose. As many experts point out, there is no general solution for such an approach, because each migration is challenged by different requirements and architectures. ETL (Extract, Transform, Load) and Data Integration tools were designed mainly for this kind of purpose, moving data between data sources, therefore the whole Data Migration architecture will most likely be built around such a tool. In addition, topics like the assessment and reporting of Data Quality, Data Cleaning, Data Enrichment, Data Backup or Data Security need to be addressed. They ensure technically that the data are migrated within the intended level of quality and security.

For each of these topics one or more tools are available on the market. The challenge is to find the right mixture for the overall architecture and to make the tools work together in an efficient and effective manner. One of the problems such tools have is that they look at Data Migration or similar problems from their own perspective, which makes them hard to integrate with other tools. Given the increasing need for Data Migration, there most likely exist tools that cover most of its requirements, each with its own advantages and disadvantages. Starting with a new tool can prove to be quite a challenge in itself. Many recommend following a methodology and using tools that have already proved their capabilities in other projects. That’s a good approach, though costs, available resources, the effort to build the infrastructure, the learning curve, etc. need to be considered as well. For some migrations MS Excel or Access will do; for others a more complex framework is needed. Keep in mind that there is no perfect architecture, just the architecture that will help you achieve your targets.

How to migrate the data?

“How” refers mainly to the migration approach, steps, methodologies, processes and procedures used to migrate the data. Secondly, and no less important, it refers to how the mix of tools is used for migration, in other words the implementation. Despite the huge variety of tools and means of achieving the target, some generalities can be outlined for each of these topics.

Migration approach refers to the overall strategy considered for a migration, typically whether the data are migrated all together, the new system becoming functional and replacing the legacy system (the big-bang migration), or the data are migrated in phases, the legacy and target systems functioning in parallel for a certain amount of time (the phased migration). Other variations of migration approaches can be met, under various names. It’s important to know the advantages and disadvantages of all approaches, especially in what concerns their application in your organization.

“Steps” is just a misnomer for the actual Project Plan, in which the different phases and activities of such a project are considered. In a general Data Migration project one can talk about Data Discovery, Data Definition, Data Collection, Data Consolidation, Data Mapping, Data Conversion, Data Transformation, Data Quality Assessment, Data Cleaning, Data Storage, etc. Some of these steps can be considered standalone processes, sometimes being already part of the process landscape existing in an organization. Other steps are just simple activities. Both types of steps share some important characteristics: they can be highly iterative and complex, they are owned by the business with IT functioning as facilitator, each of them depends on input from other steps, they require continuous feedback, etc.

A Data Migration is (or should be) managed like any other IT project, and therefore one can talk about project-specific methodologies like PMBOK, PRINCE2 or PRISM. Many of the aforementioned steps come with their own baggage of methodologies too. In addition, considering that IT functions as a service, service-specific methodologies like ITIL, ISO/IEC, Six Sigma, etc. could be considered.

The actual implementation of all these depends entirely on the project’s scope, the knowledge of all those involved, the constraints met and the resources available for such a project. Many of the problems and situations met are common to all IT projects.

Who will migrate the data?

No Data Migration project can be done without the business, the de facto owner of such a project and its output. A lot of input is needed from the business, and its continuous involvement through the various stages is necessary for the whole duration. Unless the Data Migration is reduced to a rudimentary tool like Excel and can be handled without much expertise, a Data Migration needs technical resources that can elicit the requirements, translate them into technical requirements, build the infrastructure and maybe migrate the data. Which people are involved depends entirely on the overall architecture and methodology. In the best-case scenario the migration comes down to one person pushing a button, with the data flowing like magic from the source to the target system. In reality, multiple people will have to take care of the migration, pushing some magic buttons in a chain of parallel and even redundant steps, monitoring and validating the process. Data owners, data stewards, data custodians, data architects, database administrators, migration and quality assurance specialists, developers, consultants and many other people can be involved, each of them playing their role.

When to migrate the data?

Intuitively, data are or should be migrated when the target system is ready to receive the new data, thus when the development is finished, the system is tested, and all the preparations for the Data Migration are made. The statement is valid for any type of migration. How such a date or dates are calculated when a project starts is in itself a kind of science, or just a matter of needs. There are projects in which the dates for each milestone or phase are calculated back from a desired Go Live date, and projects in which the Go Live is calculated incrementally based on the steps to be performed. Benchmarks from the field can also be used for calculating the dates. The bottom line is that the data must be migrated in time for the Go Live and with minimum disruption for the business.
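Calculating the dates back from a desired Go Live can be illustrated with a few lines of date arithmetic. The phase names and durations below are invented for the example, and the sketch counts plain calendar days, ignoring weekends, holidays and overlapping phases.

```python
from datetime import date, timedelta


def plan_backwards(go_live, phases):
    """Schedule phases so that the last one ends on the Go Live date.

    phases: list of (name, duration_in_days) in execution order.
    Returns a list of (name, start, end) in execution order.
    """
    schedule = []
    end = go_live
    # Walk from the last phase to the first, each ending where the next starts.
    for name, days in reversed(phases):
        start = end - timedelta(days=days)
        schedule.append((name, start, end))
        end = start
    schedule.reverse()
    return schedule


# Hypothetical phases and durations
phases = [("Data Mapping", 30), ("Test Migration", 14), ("Final Migration", 2)]
for name, start, end in plan_backwards(date(2012, 10, 1), phases):
    print(f"{name}: {start} to {end}")
```

The incremental variant works the same way in the opposite direction: start from the project's start date and add the durations to arrive at the Go Live.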

Conclusion

Whether standalone or as a subproject of another project, a Data Migration can be or become quite a complex topic that, through its outcomes, affects the business considerably. The above paragraphs considered some of the important aspects of such a project, the focus being more on figuring out what a migration implies than on a detailed exploration. It’s also a mental exercise and an invitation to the topic.

06 June 2012

🚧Project Management: Lessons Learned (Definitions)

"The learning gained from the process of performing the project. Lessons learned may be identified at any point. Also considered a project record." (Timothy J Kloppenborg et al, "Project Leadership", 2003)

"The systematic capturing of experiences to learn for the future and to avoid mistakes. Lessons learned sessions can, for instance, be held at completion of a project or project phase." (Lars Dittmann et al, "Automotive SPICE in Practice", 2008)

"The learning gained from the process of performing the project. Lessons learned may be identified at any point. Also considered a project record, to be included in the lessons learned knowledge base." (Project Management Institute, "Practice Standard for Project Estimating", 2010)

"An important step in wrapping up management of any implementation process, this step documents successes and failures in each systems development phase as well as the project as a whole." (Linda Volonino & Efraim Turban, "Information Technology for Management 8th Ed", 2011)

"Information from past projects, such as problems that occurred and risks that were encountered, that can inform future projects." (Gina Abudi & Brandon Toropov, "The Complete Idiot's Guide to Best Practices for Small Business", 2011)

"Information provided by team members and other stakeholders about how aspects of the project can be repeated or enhanced and how problems and issues can be prevented or resolved. This information can be used to improve the performance of future projects. Lessons learned are collected in a knowledge base or report." (Bonnie Biafore & Teresa Stover, "Your Project Management Coach: Best Practices for Managing Projects in the Real World", 2012)

"The knowledge gained during a project which shows how project events were addressed or should be addressed in the future with the purpose of improving future performance. |" (For Dummies, "PMP Certification All-in-One For Dummies, 2nd Ed.", 2013)

03 June 2012

📦Data Migrations (DM): What is Data Migration?

Data Migrations Series

If you are working in a data-centric business, it’s almost impossible not to have heard this term, even tangentially. Considering the meaning of “migration” - the act or process of moving from one place to another - intuition might even tell what data migration is about: the process of moving data from one place to another. It’s pretty basic, isn’t it? Now, as data are moved over and over again between various places, for example between the various layers of an application, between databases, between storage media, and so on, we need some precision in defining the term, because not all of these can be considered data migration examples. Actually, we can talk about data copying or data movement without speaking of data migration. So, what is data migration? Here are a few takes on defining data migration:

“process of transferring data from one platform or operating system to another” (Babylon)

"Data migration is the process of transferring data between storage types, formats, or computer systems." (Wikipedia)

"Data migration is the movement of legacy data to new media and technologies as the older ones are displaced." (Toolbox)

“The purpose of data migration is to transfer existing data to the new environment.” (Talend)

“Data Migration is the process of moving data from one or more sources into a target application” (Utopia Inc.)

“[…] is the one off selection, preparation and transportation of appropriate data, of the right quality, to the right place, at the right time.” (J. Morris)

Summarizing the above definitions, data migration can be defined as “the process of selecting, assessing, converting, preparing, validating and moving data from one or more information systems to another system”. The definition isn’t perfect at all: first, because some of the terms need further explanation; secondly, because any of the steps may be skipped or other important steps can be identified in the process; and thirdly, because further clarifications are needed. Anyway, it offers some precision and, at least for this reason, could be preferred to the above definitions.

So, to summarize, data migration supposes the movement of data from one or more information systems, referred to as source systems, to another one, the target system. Typically the new system replaces the old systems, which are either retired or continue to be used with reduced scope, for example for reporting purposes. Even if performed in stages, the movement is typically a one-time activity, so everything has to be perfect. That’s the purpose of the other steps: to minimize the risks of something going wrong. The choice of steps and their complexity depend on the type of information systems involved, on the degree of resemblance between source and target, on business needs, etc.

As mentioned above, not everything that involves data movement can be considered data migration. For example, data integration involves the movement and combination of data from various information systems in order to provide a unified view. Data synchronization involves the movement of data in order to reflect the changes of data in one information system in another, when data from the two systems need to be consistent. Data mirroring involves the synchronization of data too, though it produces an exact copy of the data, the mirroring occurring continuously in real time. Data backup involves the movement/copy of data at a given point in time for eventual restore in case of data loss. Data transfer refers to the movement of raw data between the layers of information systems. To make things even fuzzier, these types of data movements can occur within a data migration too, as data need to be locally integrated, synchronized, transferred, mirrored or backed up. Data migration is overall a complex topic.

01 June 2012

🚧Project Management: Business Case (Definitions)

"The business reason or value for undertaking the project." (Timothy J Kloppenborg et al, "Project Leadership", 2003)

"A “needs assessment” or justification to show why an endeavor is necessary to an organization." (Margaret Y Chu, "Blissful Data ", 2004)

"A business problem, situation, or opportunity that justifies the pursuit of a technology project." (Sharon Allen & Evan Terry, "Beginning Relational Data Modeling" 2nd Ed., 2005)

"[...] a formal document used to justify investments in new products, product enhancements, and marketing expenditures." (Steven Haines, "The Product Manager's Desk Reference", 2008)

"The process that documents the customer's justification for their investments in products and services." (Steven Haines, "The Product Manager's Desk Reference", 2008)

"A document that provides a justification for a business investment, often used in terms of technology investments. Business cases can be built by identifying financial (hard-dollar) benefits and intangible benefits, including mitigation of risk." (Janice M Roehl-Anderson, "IT Best Practices for Financial Managers", 2010)

"A structured format for organizing the reasons, benefits, and estimated costs for initiating a project or program." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"A written document that is used by managers to justify funding for a specific investment and also to provide the bridge between the initial plan and its execution." (Linda Volonino & Efraim Turban, "Information Technology for Management 8th Ed", 2011)

"An analysis of the benefits and costs of making a change to the way things are done." (Mike Clayton, "Brilliant Project Leader", 2012)

"The justification for a project or program, against which performance is compared throughout the life cycle. Typically, the business case contains costs, benefits, risks, and timescales." (Paul C Dinsmore et al, "Enterprise Project Governance", 2012)

"Framework for making decisions, explanation of benefits and costs, anticipated outcomes, and project factors associated with a performance improvement effort." (Joan C Dessinger, "Fundamentals of Performance Improvement" 3rd Ed, 2012)

"A documented economic feasibility study used to establish validity of the benefits to be delivered by a program." (Project Management Institute, "The Standard for Program Management" 3rd Ed., 2013)

"A documented economic feasibility study used to establish validity of the benefits of a selected component lacking sufficient definition and that is used as a basis for the authorization of further project management activities." (For Dummies, "PMP Certification All-in-One For Dummies, 2nd Ed.", 2013)

"Information necessary to enable approval, authorisation and policy-making bodies to assess a project proposal and reach a reasoned decision." (Chartered Institute of Building, "Code of Practice for Project Management for Construction and Development, 5th Ed", 2014)

"A written analysis of the financial, productivity, auditability, and other factors to justify the investment in software and hardware systems, implementation, and training." (Robert F Smallwood, "Information Governance: Concepts, Strategies, and Best Practices", 2014)

"A compilation of costs and benefits associated with a planned project or investment." (Gregory Lampshire, "The Data and Analytics Playbook", 2016)

"A business case captures the reasoning for initiating a project or task. The business case justifies the investment." (by Brian Johnson & Leon-Paul de Rouw, "Collaborative Business Design", 2017)

"A documented evaluation (pre-project) of the potential impact a problem or an opportunity has on the organization to determine if it is worthwhile investing the resources to correct the problem or take advantage of the opportunity for improvement. It captures the reason for initiating a potential project or program." (H James Harrington & William S Ruggles, "Project Management for Performance Improvement Teams", 2018)

"Documentation of the rationale for making a business investment, used both to support a business decision on whether to proceed with the investment and as an operational tool to support management of the investment through its full economic life cycle." (ISACA)

30 May 2012

🚧Project Management: Performance [Measurement] Baseline (Definitions)

"The original approved plan for work such as a project. Usually used with a modifier, e.g., cost baseline, schedule baseline, performance measurement baseline." (Margaret Y Chu, "Blissful Data ", 2004)

"An approved integrated scope-schedule-cost plan for the project work against which project execution is compared to measure and manage performance. Technical and quality parameters may also be included." (Cynthia Stackpole, "PMP® Certification All-in-One For Dummies®", 2011)

"An approved plan for a project, plus or minus approved changes. It is compared to actual performance to determine if performance is within acceptable variance thresholds. Generally refers to the current baseline, but may refer to the original or some other baseline. Usually used with a modifier (e.g., cost performance baseline, schedule baseline, performance measurement baseline, technical baseline). |" (Cynthia Stackpole, "PMP Certification All-in-One For Dummies", 2011)

"An approved, integrated scope-schedule-cost plan for the project work against which project execution is compared to measure and manage performance. The PMB includes contingency reserve, but excludes management reserve." (For Dummies, "PMP Certification All-in-One For Dummies" 2nd Ed., 2013)

"Integrated scope, schedule, and cost baselines used for comparison to manage, measure, and control project execution." (Project Management Institute, "A Guide to the Project Management Body of Knowledge (PMBOK Guide)", 2017)

"An approved set of integrated plans for the project’s Scope, Schedule, and Budget against which project execution will be compared to measure and manage performance. They include 'contingency reserve' but not 'management reserve' allocations." (H James Harrington & William S Ruggles, "Project Management for Performance Improvement Teams", 2018)

"The approved version of a work product that can be changed using formal change control procedures, and is used as the basis for comparison to actual results." (Project Management Institute, "Practice Standard for Scheduling" 3rd Ed., 2019)

06 May 2012

🚧Project Management: Variance (Definitions)

"The difference of revenues, costs, and profit from the planned amounts. One of the most important phases of responsibility accounting is establishing standards in costs, revenues, and profit and establishing performance by comparing actual amounts with the standard amounts. The differences (variances) are calculated for each responsibility center, analyzed, and unfavorable variances are investigated for possible remedial action." (Jae K Shim & Joel G Siegel, "Budgeting Basics and Beyond", 2008)

"A quantifiable deviation, departure, or divergence away from a known baseline or expected value. " (Cynthia Stackpole, "PMP® Certification All-in-One For Dummies®", 2011)

"The difference between the baseline and estimated dates, work, or cost in a project." (Bonnie Biafore, "Successful Project Management: Applying Best Practices and Real-World Techniques with Microsoft® Project", 2011)

"The difference between planned or baseline schedule or cost data and the actual schedule or cost data." (Bonnie Biafore & Teresa Stover, "Your Project Management Coach: Best Practices for Managing Projects in the Real World", 2012)

"Deviation or difference between an estimated value and the actual value." (Tom Klammer, "Statement of Cash Flows: Preparation, Presentation, and Use", 2018)

"The difference between a planned value and the actual measured value" (ITIL)