Saturday, November 10, 2012

Data Quality: Data Migration’s Perspective – Part I: A Bird’s-Eye View

    Imagine you just finished a Data Migration (DM) project, everything went smoothly, the data were loaded into the new system with the minimal amount of issues inherent to such complex projects, the users started to use the new system, everybody seemed satisfied, and a few weeks later rumors propagate within the company with the speed of light – “the migrated data are wrong”, “the new system can’t be used”, “IT did a bad job”, “we have to go back to the previous system”, and so on. The panic spreads, a few heads fall, the business tries to revert to the old system, but there’s a lot of new data available in the new system and it’s not so trivial to move the data back to the old one; in the meantime other rumors appear, and… It’s just a scenario, but it could happen to any company if the appropriate measures aren’t taken at the right time. What could help a company when something like this happens? A good Plan B, aka a good Migration Fallback Plan/Policy – but that’s something nobody would like to use except in extreme situations.

    A common approach to any type of project, DM projects included, is to identify and mitigate the risks before or during the project. That’s something I started to do a few days ago: to prepare a list of the risks associated with DM projects. For this exercise I tried to remember what went wrong in similar projects I worked on and to figure out what else could go wrong. Some online resources helped me refresh my memory too, and I think I also found two or three things I hadn’t really thought about. My attempt was primarily focused on the type of problem mentioned above – minimizing the risk of not having the right data when the new system goes live. Before jumping into the topic I would like to sketch the bigger picture, as I perceive it.

    Having the right data when a system goes live primarily means having good Data Quality (DQ) in the target system after the data were migrated! As a DM is the best exemplification of the GIGO (Garbage-In, Garbage-Out) principle, in order to have good DQ in the target it is important to address DQ at the latest during the DM project. That’s essential and common sense – you can’t expect good data in the new system when there’s a lot of garbage in the old. So a DM, and the Risk Management for such a project, should be built around this. In fact, not having a DQ initiative or project within a DM project is one of the most important risks a company can take. Maybe in a small DM a DQ initiative isn’t necessary, though when the data are important for your company, DQ is a must! In addition, DQ assessments have to be performed in alignment with the new system, not the old one. Even if the data have good quality in the old system, the quality of your data after the DM will be judged in relation to the new system. This is a requirement that can easily be overlooked and whose implications can easily be misunderstood!
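    To make the last point more concrete, here is a minimal sketch of what assessing legacy data against the target system’s expectations could look like. The field names and rules are purely hypothetical – in a real project they would come from the target system’s data model and validations:

```python
# A minimal sketch (hypothetical field names and rules) of assessing legacy Customer
# records against constraints imposed by the target system rather than the legacy one.

target_rules = {
    "CustomerName": lambda v: bool(v and v.strip()),          # mandatory in the target
    "Country":      lambda v: v in {"US", "DE", "FR", "GB"},  # target expects ISO codes
    "VATNumber":    lambda v: v is None or len(v) <= 14,      # target field is shorter
}

def assess(records):
    """Count, per attribute, how many legacy records would violate a target rule."""
    issues = {field: 0 for field in target_rules}
    for rec in records:
        for field, is_valid in target_rules.items():
            if not is_valid(rec.get(field)):
                issues[field] += 1
    return issues

legacy_customers = [
    {"CustomerName": "Contoso", "Country": "Germany", "VATNumber": "DE123456789"},
    {"CustomerName": "",        "Country": "US",      "VATNumber": None},
]
print(assess(legacy_customers))  # {'CustomerName': 1, 'Country': 1, 'VATNumber': 0}
```

    Run early and repeatedly, such an assessment gives the business a measurable picture of the cleaning effort before the first load is even attempted.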

    Many think that DQ is a one-time activity: we do it for a DM project, we’ll have quality data, and we never have to care about their quality again. Totally false! DQ has to be part of a broader strategy – call it Data Governance, Master Data Management, Data Management or any other initiative in which data play an important role. DQ is an ongoing, iterative and consolidated effort; it doesn’t end after the DM but continues for the whole data life cycle, as long as the data have value for the organization. It doesn’t help if the data have high quality when they are migrated and a few weeks or months later the overall quality of, and trust in, the data has decreased considerably.

    Keeping an acceptable level of DQ must be part of an organization’s strategy, and a culture oriented toward DQ must be built. People need to be aware of the importance of having good quality data, and especially of the consequences of having bad quality data. DQ doesn’t concern only the owners or stewards of data, or the people working with data; it concerns the whole organization, because decisions are made based on those data, processes are changed and improved, and an organization’s performance is often judged based on data. The quality of data is a matter of perception – of how users see the quality of data in relation to the needs they have, and the needs change over time. Being aware of what good DQ means and what an organization’s needs are in respect to data is primarily a way of minimizing the negative perception of data, of gaining trust in data and of building a solid basis on which decisions can be made. Secondarily, these organizational data needs have to be addressed in a DM; they are the success factors upon which the success of a DM project is judged.

    For sure, considerable costs are associated with DQ initiatives and with everything related to data, costs which don’t always represent a direct cost component of the products or services handled by an organization. Considering that not all data have the same importance for an organization, it makes sense to prioritize the DQ effort as a whole and the data cleaning needs in particular: the focus should be on the data with the highest impact, tackling data with lower and lower impact as time allows. An equilibrium must be found between the DQ costs and the value of the data. Most probably it is better to spend resources on raising people’s awareness of DQ early on than on cleaning data retroactively later. It also makes sense to invest in tools that help clean the data using automated or semi-automated methods, though some manual/visual control needs to be in place too.

    DQ, and the way the problems associated with it are tackled, depends a lot on an organization’s internal kitchen – people, partners, organization, strategy, maturity, culture, geography, infrastructure, methodologies used, etc. What matters is how these various aspects of an organization are aligned in order to take advantage of one of the most important assets an organization has – its data! For this it’s important to adopt methodologies that support DQ, align them and tweak them as needed, in order to make the most of your data! But before or while doing that, remember that a DM is an organization’s opportunity to change the quality of its data and its data strategy!

Thursday, October 04, 2012

Business Rules – An Introduction

    "Business rules" seems to be a recurring theme these days – developers, DBAs, architects, business analysts, IT and non-IT professionals talk about the necessity to enforce them in data and semantic models, information systems, processes, departments or whole organizations. They seem to affect the important layers of an organization. In fact the same business rule can affect multiple levels either directly, or indirectly through the hierarchical or networked structure of causality it belongs to. When considered all the business rules, the overall picture can become very complex. The fact that there are multiple levels of interconnected layers, with applications and implications at macro or micro level, makes the complexity to fight back because in order to solve business-specific problems often you have to go at least one level above the level where the problems were defined, or to simplify the problems to a level of detail that allows to tackled.

    The Business Rules Group defines a business rule as "a statement that defines or constrains some aspect of the business" [1], a definition which seems closer to the vocabulary of IT people. Ronald G. Ross, in his book Principles of the Business Rule Approach, defines it as "a directive intended to influence or guide business behavior" [2], a definition closer to the vocabulary of HR people. In fact the two definitions are quite similar, highlighting the constraining or guiding role of business rules. They also raise an important question – can everything that is catalogued as a constraint or guideline be considered a business rule? In theory yes; in practice there are constraints and guidelines that have different impacts on the business, so depending on the context they need to be considered or not. What to consider is itself an art, which adds up to the art of problem solving.

    Besides identification, neither the definition nor the management of business rules seems an easy task. R.G. Ross considers that business rules need to be written and made explicit, expressed in plain language, independent of procedures and workflows, built on facts, motivated by identifiable and important business factors, accessible to authorized parties, specific, single sourced, managed, specified by those people who have the relevant knowledge, and that they should guide or influence behavior in desired ways [2]. This summarizes the various aspects that need to be considered when defining and managing business rules. Many organizations seem to be challenged by this, and it can indeed be challenging when business management maturity is lacking.

    Many business rules already exist in functional and technical specifications written for the various software products built on request, in the documentation of purchased software, in processes, procedures, standards, internally defined and externally enforced policies, and in the daily activities and knowledge exchanged or held by workers. Sure, the formulations existing in such resources need to be enhanced and aggregated in order to be brought to the status of business rule. And here comes the difficulty, as iterative work needs to be performed in order to bring them to the level indicated by R.G. Ross. For sure Ross’ specifications are idealistic, though they offer a “framework” for defining business rules. As far as their management is concerned, there is a lot to be done within an organization, as this aspect needs to be integrated with the other activities and strategies existing in the organization.

    Often, when an important initiative – better said, a project – starts within an organization, the lack of up-front defined and understood business rules is felt in particular. Such events trigger the identification and elicitation of business rules; they are addressed in documentation and remain buried there. It is also true that it’s difficult to build a business case for further processing of business rules. An argument could be the costs associated with decisional mistakes made by not knowing the existing rules, though that’s something difficult to quantify and make visible in an organization. In the end, most probably an organization will recognize the value of business rules only when it reaches a certain level of maturity.

References:
[1] Business Rules Group (2000) Defining Business Rules - What Are They Really? [Online] Available from: http://businessrulesgroup.org/first_paper/BRG-whatisBR_3ed.pdf
[2] Ronald G. Ross (2003) Principles of the Business Rule Approach. Addison Wesley. ISBN: 0-201-78893-4.

Saturday, June 30, 2012

Data Migration – An Introduction

Introduction

    Basically, Data Migration is the movement of data from one IS (Information System), the legacy system, to a new IS, the target system, intended to replace the legacy system entirely or partially. In the best scenario there are no differences between the two ISs, or the differences are minimal and negligible. In the worst scenario there are multiple legacy systems used as sources, and even multiple target systems, with important differences between them, differences that can even translate into incompatibilities at multiple levels. Such architectures can span geographies, departments, organizations or industries; they can involve a multitude of vendors, generations of systems, network types, different regulations, etc. In many Data Migrations the overall picture can be really complex, though for the sake of simplicity it’s enough to focus on the simplest scenario, in which there is a single source and a single target system, with some differences between them. We can also abstract away the fact that many migrations are part of bigger projects, for example ERP implementations or other types of application migrations.

    Data Migration is quite a complex topic, to many appearing like a black box in which data go in and data come out. That’s valid for the typical user as well as for IT professionals who haven’t been involved in Data Migration projects. There are many books on topics tangent to Data Migration – Data Management, Data Quality, Data Integration or Data Warehousing. Excepting some presentations available on the Web, a few methodologies exposed by important companies, one or two books, and a few blogs, there isn’t much material available on Data Migration. The “trend” is also a reflection of the low importance given to Data Migration as a subject, even if many professionals working in the field warn about the considerable impact a Data Migration can have on a project in particular, and on the business in general.

    Approaching a topic like Data Migration can be, depending on the case, a complex task; however, with a little intuition and some guidance its complexity falls apart. Often, when exploring such a topic, the 5W1H technique or its extended forms can be of help. The technique comes down to searching for answers to the “what”, “where”, “why”, “how”, “when”, “who” and “with what” questions. In the case of Data Migration the questions are formulated as: what (data) to migrate, where to migrate, why to migrate, how to migrate, when to migrate, who migrates and with what to migrate?

Why to migrate?

    A Data Migration occurs as the follow-up to a need – an old system is in place and can’t cope anymore with the business’ growth, a company made an acquisition and the systems need to be consolidated, or the organization decided to change its infrastructure, processes or business model in order to address today’s business requirements like flexibility, availability, manageability, automation, cost cuts, etc. In other words, a Data Migration occurs out of a need for change, and it can itself be a change in what concerns the technical infrastructure, processes, procedures, data flows and ways of doing business. A migration has quite an impact on the business, so here is a legitimate question: does it really make sense to migrate? Why not start from zero with the new system?!

    The migration can be a zero point for an organization, though unless a company is starting anew, there are some data lying there in the old system(s) that need to remain available – for example open Purchase Orders that need to be fulfilled, Invoices that need to be paid, a catalog with all the Products and the available stock, information about Customers, what they bought, what they browsed or what they want to buy for Christmas, etc. At least some of the data need to be made available in one form or another within the new architecture, if not in the new system.

    The availability of old data can be addressed by keeping the old system(s) in place and functional, even if the system won’t be fed with new data – or maybe it will. Keeping a system alive involves additional costs for maintaining the infrastructure – software and hardware licenses, consultants, administrators and other people responsible for the optimal functioning of such a system. This can become, with time, quite an unnecessary burden. It can be an acceptable choice for some organizations, but it’s unlikely to be a best/good practice. And even if the system is kept, more likely than not there will be data that need to be available also in the new system. One could also discuss integrating the two systems, but again, does it make sense? The bottom line is that in many scenarios a Data Migration can prove to be the optimal solution for an organization.

What data to migrate?

    Even if it looks like a silly question, it can be one of the most complex questions to answer. In theory all the data need to be migrated, but are all the data really needed? Typically in a database one can find historical data no longer used by the business, obsolete data marked or not for deletion, garbage data entered by mistake or left over after incomplete deletions, all of these having low or no value for the business. Hopefully there are also “good data”, quintessential for the business. Somebody might say “what the heck, why do we need to philosophize so much, let’s migrate all the data!”. The decision is understandable, though what if the percentage of “good data” is quite small in comparison with the total volume of data, which can measure a few terabytes?! Sure, nowadays data centers can handle terabytes of data without problems, though there are some factors to be considered – it can be quite a challenge to migrate so much data, the volume of data affects the performance of databases in particular and ISs in general, and, a more natural reason – why store something that has minimal value for you?!

    It makes sense to migrate only the data that have value for the organization, but what data are needed then? Normally this starts by understanding what entities the business deals with and which attributes characterize them. Many of the entities can be encountered in the organization’s daily activity, and maybe are already defined in the organization’s glossary or Data Dictionary, so a review of the available inventory might do. If not, more effort needs to be spent for this purpose; activities specific to Data Discovery, Data Categorization, Data Definition or Data Profiling can help, as the case may be, to fill the understanding gaps. Except for categorization, not all of the others are necessary, and the analysis only needs to be deep enough to serve the purpose.
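    As an example of the kind of lightweight Data Profiling meant here, the following sketch computes, for an extracted legacy table, how well each attribute is populated and how many distinct values it holds. The table and column names are invented for illustration; in practice the rows would be read from the legacy database:

```python
# A small data-profiling sketch: per attribute, the fill rate and the number of
# distinct values give a first hint of which attributes are actually used.

from collections import defaultdict

def profile(rows):
    stats = defaultdict(lambda: {"populated": 0, "distinct": set()})
    for row in rows:
        for column, value in row.items():
            s = stats[column]                 # touch the column even if empty
            if value not in (None, ""):
                s["populated"] += 1
                s["distinct"].add(value)
    total = len(rows)
    return {col: {"fill_rate": round(s["populated"] / total, 2),
                  "distinct_values": len(s["distinct"])}
            for col, s in stats.items()}

products = [
    {"ProductNumber": "P-001", "Category": "Raw Material", "Color": None},
    {"ProductNumber": "P-002", "Category": "Raw Material", "Color": "Blue"},
    {"ProductNumber": "P-003", "Category": None,           "Color": ""},
]
print(profile(products))
```

    An attribute with a fill rate close to zero, or with a single distinct value, is a good candidate for being left out of the migration – subject, of course, to the business’ confirmation.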

    A first categorization was made above, when data were considered valuable, not valuable or somewhere in between. A second categorization can be made based on the data’s usage: obsolete (not used anymore or marked for deletion), new (not used yet and recently entered), historical (data used in the past) and actual (data in use). A third categorization can be made based on the status of the entities they represent, a status that can be associated with the phase of the process the entity is in (e.g. active, inactive, open, invoiced, closed, blocked, etc.). Other meaningful categorizations can be considered as long as they prove important in identifying the useful data.

    An important categorization in migrations in particular, and Data Management in general, is to split data into master data, transaction data and setup data. Master data are data that change only seldom and have a long life (until they become obsolete), are referenced throughout the system, and are vital to an organization through their meaning (e.g. Customers, Suppliers, Products, Assets, Employees, Accounts, etc.). Transaction data, in contrast, are data that change often and have a relatively short life; they are typically referenced by other transactions and can be associated with documents or movements through the system (e.g. Purchase Orders, Sales Orders, Invoices, Receipts, Asset Movements, etc.). Setup data are data used to configure a system (e.g. Transaction Types, Document Types, Roles, Permissions, etc.). This categorization deserves full attention, because each of the three elements needs a different handling approach in migration or Data Management.

    Based on the identified categories, some rough migration rules can be considered when deciding what data (actually records) to migrate, for example (a small sketch follows below):
- master data, unless they have become obsolete, and open transactions are often migrated entirely;
- historical transaction data spanning a few years back can be migrated in case they are needed in the process;
- master data referenced by migrated transaction data need to be migrated too;
- setup data are entered manually;
- historical data are archived.
    There can also be exceptions from the rules, so such possible scenarios need to be considered too.
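    Here is a rough sketch of how rules of this kind could be applied when selecting the records to migrate. The record structure, status values and cut-off date are hypothetical; the actual rules would be agreed with the business:

```python
# Rough migration-rule sketch: keep open or recent transactions, keep non-obsolete
# master data, and pull in any master record referenced by a migrated transaction.

from datetime import date

CUTOFF = date(2010, 1, 1)   # e.g. migrate transactions from the last few years only

def keep_master(record):
    # master data are migrated unless flagged as obsolete
    return not record.get("Obsolete", False)

def keep_transaction(record):
    # open transactions are migrated; closed ones only if recent enough
    return record["Status"] == "Open" or record["DocumentDate"] >= CUTOFF

customers = [{"CustomerNumber": "C-10", "Obsolete": False},
             {"CustomerNumber": "C-11", "Obsolete": True}]
sales_orders = [
    {"OrderNumber": "SO-1", "Status": "Open",   "DocumentDate": date(2008, 5, 1), "CustomerNumber": "C-11"},
    {"OrderNumber": "SO-2", "Status": "Closed", "DocumentDate": date(2009, 3, 1), "CustomerNumber": "C-10"},
]

orders_to_migrate = [so for so in sales_orders if keep_transaction(so)]
# master data referenced by migrated transactions must be migrated too,
# even if they would otherwise be filtered out
referenced = {so["CustomerNumber"] for so in orders_to_migrate}
customers_to_migrate = [c for c in customers
                        if keep_master(c) or c["CustomerNumber"] in referenced]

print([so["OrderNumber"] for so in orders_to_migrate])      # ['SO-1']
print([c["CustomerNumber"] for c in customers_to_migrate])  # ['C-10', 'C-11']
```

    Note how the “referenced master data” rule deliberately overrides the obsolescence rule – exactly the kind of exception the paragraph above warns about.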

    Each entity is defined by multiple attributes (also called properties or dimensions). They need to go through a similar “categorization” process. When deciding what attributes to migrate it is important to consider especially their role in defining the entity. Some of them uniquely identify an entity (e.g. Customer Number, Product Number, Serial Number), describe physical characteristics of the entity (e.g. color, weight, height), categorize the entity (e.g. Category, Type) or its status (e.g. Active, Blocked, Invoiced), mark various events (e.g. Creation Date, Delivery Date, Invoice Date), and so on. It looks like another type of categorization, and it is, though it’s more difficult to create rough rules based on it, because in the end the business dictates which attributes are needed. In fact, most of the attributes used (with distinct not-null values) in the legacy system are more than likely needed also in the new system, unless the process changed considerably, or the business is supposed to change its model as well.

Where to migrate the data?

    When the Data Migration subject is brought to the table, a decision has usually already been made about the target system. So the “where” question is partially answered; however, it addresses only the tip of the iceberg. It shows that an iceberg lies there, in front of us, though in the depths of the water there is something more – a lot of questions and issues that need to be addressed. Like the source, the target needs to be further detailed in entities and their attributes; the targeted processes and procedures need to be considered together with the constraints imposed by the new system. What is actually needed is to identify the data requirements for the new system and reconcile them with the requirements of the old system. Mapping the entities and attributes available in the two systems, a process known as Data Mapping, can offer a good overview of what lies ahead, of what similarities and gaps exist. There will be attributes that are available in the legacy but not in the target system, and therefore the target system needs to be extended or the data associated with the respective attributes have to be left out. From the opposite perspective, there can be mandatory attributes in the target system for which no data are available in the organization, and therefore the associated data must be collected and/or made available for the migration. There can be cases when the data are not available in the legacy system but distributed across various other external or internal sources, so there can be an option to migrate or integrate the respective data, to extend the processes to accommodate such scenarios, etc.
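    A minimal Data Mapping sketch, just to fix the idea: legacy attributes are mapped to target attributes and the gaps in both directions are listed. All attribute names are hypothetical:

```python
# Data Mapping sketch: which legacy attributes have no target (data possibly left behind),
# and which target attributes have no source (data to be collected or defaulted).

legacy_attributes = {"CUST_NO", "CUST_NAME", "REGION", "FAX"}
target_attributes = {"CustomerNumber", "CustomerName", "Region", "TaxRegistrationNumber"}

# agreed mapping between the two data models (legacy -> target)
mapping = {
    "CUST_NO":   "CustomerNumber",
    "CUST_NAME": "CustomerName",
    "REGION":    "Region",
}

unmapped_legacy = legacy_attributes - mapping.keys()
unmapped_target = target_attributes - set(mapping.values())

print("legacy attributes without a target:", unmapped_legacy)   # {'FAX'}
print("target attributes without a source:", unmapped_target)   # {'TaxRegistrationNumber'}
```

    In a real project the mapping would of course live in a spreadsheet or a dedicated tool and carry much more detail (data types, conversions, defaults, owners), but the two gap lists above are where most of the discussions start.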

    Only when the mapping of the data is ready and the various related questions are addressed is the “where” question fully answered. Given the continuous changes to the target system, which may still happen a few days before Go Live, Data Mapping can remain a hot topic until then.

With what to migrate?

    This question addresses the mix of tools used to migrate the data, and by extension the whole architecture developed for this purpose. As many experts point out, there is no general solution for such an endeavor, because each migration is challenged by different requirements and architectures. ETL (Extract, Transform, Load) and Data Integration tools were mainly designed for this kind of purpose – moving data between data sources – therefore more likely than not the whole Data Migration architecture will be built around such a tool. In addition, topics like the assessment and reporting of Data Quality, Data Cleaning, Data Enrichment, Data Backup or Data Security need to be addressed. They will technically ensure that the data are migrated within the intended level of quality and security.
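    To make the ETL idea tangible, here is a bare-bones sketch of the three steps around which a migration load could be built. The structures and the transformation are invented; a dedicated ETL or Data Integration tool would add connectivity, logging, restartability and much more:

```python
# Bare-bones Extract-Transform-Load sketch (hypothetical structures and conversions).

def extract(source_rows):
    # in a real scenario: read from the legacy database, files or an API
    yield from source_rows

def transform(row):
    # apply the agreed Data Mapping and conversions (e.g. country names -> ISO codes)
    country_codes = {"Germany": "DE", "France": "FR"}
    return {
        "CustomerNumber": row["CUST_NO"].strip(),
        "CustomerName":   row["CUST_NAME"].title(),
        "Country":        country_codes.get(row["COUNTRY"], row["COUNTRY"]),
    }

def load(target, rows):
    # in a real scenario: bulk insert into the target system or a staging area
    target.extend(rows)

staging = []
legacy = [{"CUST_NO": " 1001 ", "CUST_NAME": "contoso gmbh", "COUNTRY": "Germany"}]
load(staging, (transform(r) for r in extract(legacy)))
print(staging)  # [{'CustomerNumber': '1001', 'CustomerName': 'Contoso Gmbh', 'Country': 'DE'}]
```

    Whatever tool is chosen, the separation of the three concerns shown above is what makes the loads repeatable and testable – a property worth more than any individual feature of the tool.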

    For each of these topics one or more tools are available on the market. The challenge is to find the right mixture for the overall architecture, to make them work together in an efficient and effective manner. One of the problems such tools have is that they look at Data Migration or similar problems from their own perspective, making them hard to integrate with other tools. Given the increasing need for Data Migration, there likely exist tools that cover most of its requirements, each with its own advantages and disadvantages. Starting with a new tool can prove to be quite a challenge in itself. Many recommend following a methodology and using tools that have already proved their capabilities in other projects. That’s a good approach, though costs, available resources, the effort to build the infrastructure, the learning curve, etc. also need to be considered. For some migrations MS Excel or Access will do, for others a more complex framework is needed. Keep in mind that there is no perfect architecture, just the architecture that will drive you to achieve your targets.

How to migrate the data?

    “How” refers mainly to the migration approach, steps, methodologies, processes and procedures used to migrate the data. Secondly, and no less important, it refers to how the mix of tools is used for the migration – in other words, the implementation. Despite the huge variety of tools and means of achieving the target, some generalities can be outlined for each of these topics.

    The migration approach refers to the overall strategy considered for a migration – typically whether the data are migrated all together, the new system becoming functional and replacing the legacy system (the big-bang migration), or the data are migrated in phases, the legacy and target systems functioning in parallel for a certain amount of time (the phased-out migration). Other variations of migration approaches can be met, under various names. It’s important to know the advantages and disadvantages of all approaches, especially in what concerns their application in your organization.

    “Steps” is just a misnomer for the actual Project Plan in which the different phases and activities of such a project are considered. In a general Data Migration project one can talk about Data Discovery, Data Definition, Data Collection, Data Consolidation, Data Mapping, Data Conversion, Data Transformation, Data Quality Assessment, Data Cleaning, Data Storage, etc. Some of these steps can be considered standalone processes, sometimes already part of the process landscape existing in an organization. Other steps are just simple activities. Both types of steps share some important characteristics – they can be highly iterative and complex, they are owned by the business with IT functioning as a facilitator, each of them depends on the input from other steps, they require continuous feedback, etc.

    A Data Migration is (or should be) managed like any other IT project, and therefore one can talk about project-specific methodologies like PMBOK, PRINCE2 or PRISM. Many of the before-mentioned steps come with their own baggage of methodologies too. In addition, considering that IT functions as a service, service-specific methodologies like ITIL, ISO/IEC, Six Sigma, etc. could be considered.

    The actual implementation of all these depends entirely on the project’s scope, the knowledge of all those involved, the constraints met and the resources available for such a project. Many of the IT-specific problems and situations are common across all IT projects.

Who will migrate the data?

    There is no Data Migration project that can be done without the business, the de facto owner of such a project and its output. A lot of input is needed from the business, and its continuous involvement through the various stages is necessary for the whole duration. Unless the Data Migration comes down to a rudimentary tool like Excel and can be handled without much expertise, a Data Migration needs technical resources who can elicit the requirements, translate them into technical requirements, build the infrastructure and maybe migrate the data. Which people are involved depends entirely on the overall architecture and methodology. In the best-case scenario the migration will come down to one person pushing a button and the data flowing as if by magic from the source to the target system. In reality, multiple people will have to take care of the migration, pushing some magic buttons in a chain of parallel and even redundant steps, monitoring and validating the process. Data owners, data stewards, data custodians, data architects, database administrators, migration and quality assurance specialists, developers, consultants and many other people can be involved, each of them playing their role.

When to migrate the data?

    Intuitively, data are or should be migrated when the target system is ready to receive the new data, thus when the development is finished, the system tested, and all the preparations for the Data Migration made. The statement is valid for any type of migration. How such a date or dates are calculated when a project starts is in itself a kind of science, or just a matter of needs. There are projects in which the dates for each milestone or phase are calculated backwards from a desired Go Live date, and projects in which the Go Live is calculated incrementally based on the steps to be performed. Benchmarks from the field can also be used for calculating the dates. The bottom line is that the data must be migrated in time for the Go Live and with minimal disruption to the business.
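    A toy illustration of the backwards calculation mentioned above – the milestones and durations are made up, but the mechanics stay the same whatever the numbers:

```python
# Calculate milestone start dates backwards from a desired Go Live date.
# Milestones and durations are purely illustrative.

from datetime import date, timedelta

go_live = date(2012, 11, 1)
durations_in_days = [              # listed from Go Live backwards
    ("Cut-over & validation", 7),
    ("Dry-run migration", 14),
    ("Build & test migration logic", 45),
    ("Data mapping & cleaning", 60),
]

end = go_live
for milestone, days in durations_in_days:
    start = end - timedelta(days=days)
    print(f"{milestone}: {start} -> {end}")
    end = start
```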

Conclusion

    Whether standalone or as a subproject of another project, a Data Migration can be or become quite a complex topic that, through its outcomes, affects the business considerably. In the above paragraphs some of the important aspects of such a project were considered, the focus being more on figuring out what a migration implies rather than on a detailed exploration. It’s also a mental exercise and an invitation into the topic.

Sunday, June 03, 2012

Data Migration – What is Data Migration?

    If you are working in a data-centric business it’s almost impossible, even for the average worker, not to have heard this term, at least tangentially. Considering the meaning of “migration” – the act or process of moving from one place to another – intuition might even tell you what data migration is about: the process of moving data from one place to another. Pretty basic, isn’t it? Now, as data are moved over and over again between various places, for example between the various layers of an application, between databases, between storage media, and so on, we need some precision in defining the term, because not all of these can be considered examples of data migration. Actually we can talk about data copying or data movement without speaking of data migration. So, what is data migration? Here are a few takes on defining it:

    “process of transferring data from one platform or operating system to another” (Babylon)

   "Data migration is the process of transferring data between storage types, formats, or computer systems." (Wikipedia)

    "Data migration is the movement of legacy data to new media and technologies as the older ones are displaced." (Toolbox)

    “The purpose of data migration is to transfer existing data to the new environment.” (Talend)

    “Data Migration is the process of moving data from one or more sources into a target application” (Utopia Inc.)

    “[…] is the one off selection, preparation and transportation of appropriate data, of the right quality, to the right place, at the right time.” (J. Morris)

    Summarizing the above definitions, data migration can be defined as “the process of selecting, assessing, converting, preparing, validating and moving data from one or more information systems to another system”. The definition isn’t at all perfect, first of all because some of the terms need further explanation, secondly because any of the steps may be skipped or other important steps can be identified in the process, and thirdly because further clarifications are needed. Anyway, it offers some precision, and at least for this reason it could be preferred to the above definitions.

    So, to recapitulate, data migration supposes the movement of data from one or more information systems, referred to as source systems, to another one, the target system. Typically the new system replaces the old systems, which are retired, or they can continue to be used with reduced scope, for example for reporting purposes. Even if performed in stages, the movement is typically a one-time activity, so everything has to be perfect. That’s the purpose of the other steps – to minimize the risk of something going wrong. The choice of steps and their complexity depends on the type of information systems involved, on the degree of resemblance between source and target, on business needs, etc.

    As mentioned above, not everything that involves data movement can be considered data migration. For example, data integration involves the movement and combination of data from various information systems in order to provide a unified view. Data synchronization involves the movement of data in order to reflect the changes of data in one information system into another, when data from the two systems need to be consistent. Data mirroring involves the synchronization of data, though with an exact copy of the data, the mirroring occurring continuously, in real time. Data backup involves the movement/copying of data at a given point in time for an eventual restore in case of data loss. Data transfer refers to the movement of raw data between the layers of information systems. To make things even fuzzier, these types of data movement can be part of a data migration too, as data may need to be locally integrated, synchronized, transferred, mirrored or backed up. Data migration is, overall, a complex topic.

Monday, April 09, 2012

BI between Products, Partners, People and Processes

    In the previous post, “BI between Potential, Reality, Quality and Stories”, I commented on five of the important findings of a study by KPMG regarding the state of the art in BI initiatives. My comments were centered mainly on the first 3 of the 4Ps (Products, People, Partners and Processes) considered in ITSM (IT Service Management). The connection to IT Service Management isn’t accidental, BI being also an organizational capability. Many of the aspects related to the 4P perspectives reveal the maturity of an organization in leveraging its BI infrastructure. In this post I would like to consider the BI landscape from these four perspectives.

Products

    The products or technology perspective has a dual nature within the BI context. First of all we have to consider the BI infrastructure – the whole set of BI tools we have at our disposal for our shiny reports. Secondly, because the BI infrastructure doesn’t stand on its own, we also have to consider the IT infrastructure on which the BI infrastructure is based – a full range of ISs (Information Systems) in which data are entered, processed, transported and consumed before they are used by the BI tools. For Data Quality issues we often have to consider this broader perspective and tackle the problems at the source. Otherwise we might end up treating the symptoms and not the causes. It’s important to note that the two layers or perspectives are interconnected, the consequences being bidirectional.

    A typical BI infrastructure revolves around several databases, maybe one or more data warehouses and data marts, and one or more reporting systems. In the most basic scenario, the data flow is unidirectional from the databases to the data warehouse/marts, with reports built on top of the data warehouse/marts or directly on the ISs’ databases. In more complex scenarios the data can flow between the various ISs when these are integrated, and even between data warehouses/marts, within a unidirectional or bidirectional flow. Unless the reports are based directly on the ISs’ databases, such architectures lead to data duplication, conversions between complex schemas, and delays between the various layers, to mention just a few of the most important implications. At some point in time the complexity falls down on you.

    One of the problems I have met is that a considerable percentage of ISs are not developed to address BI requirements. It starts with data validation, with the way data are modeled, structured, formatted and made available for BI consumption. If you want to increase the quality of your data, sooner or later you have to address them. Thus the degree to which the systems are designed to cover the BI needs in particular, and decision making in general, matters. This presumes that BI requirements need to be addressed in the early phases of implementations, in software design, or when tools are considered for purchase.

    In addition, many ISs come with their own (standard) reports or reporting frameworks, thus becoming part of your BI infrastructure, intended or not. Even if such reports are meant to cover basic, immediate reporting requirements, they are not always so easy to consume, the logic behind them is not visible, they are hard to extend, they are not always tested, the additional reports built in other tools need to be synchronized with them, etc.

Partners

    We gather huge volumes of data, we are drowning in them; we want to take decisions rooted in data and get visibility into the past, current and future state of the business. How can we achieve that if we don’t have the knowledge and human resources for it? “Partners” is the magic word – external suppliers specialized, in theory, in providing this kind of service: BI analysts and developers, business analysts, data miners and other IT professionals working together in order to build your BI infrastructure. One detail many people forget is that BI tools provide only potential; it is the skills and knowledge of those working with them that transform that potential into success. The success of such projects depends on their capabilities. Not to forget that BI projects are similar to other IT projects, falling prey to the same types of fallacies, plus a few of their own derived from the exploratory and complex nature of BI projects.

    There is a dual nature also in the “partners” perspective – besides the external perspective, which concerns the external partners and the IT department or the business as a whole, there is also the internal perspective, in which the IT department again plays a central role. I have often heard it loudly affirmed that the other departments are customers of the IT department, or the reverse. I have also seen this conception taken to the extreme, in which IT had no say in what concerns the IT infrastructure in general, respectively the BI infrastructure in particular. As long as the IT department isn’t treated as a business partner, an organization will more likely be sabotaged from the inside. Sabotage is maybe too strong a word, though it kind of reflects the state of affairs.

People

    Same as with partners, the people perspective includes a considerable variety of types: IT staff, executives, managers, end users and other types of stakeholders, each of them with a say, grouped in various interest groups that don’t always converge – situations in which politics plays a major role. It’s actually interesting to see how the decision for a given BI solution is made, how the solution takes its place in the landscape, how it’s used and misused, how personalities and knowledge harness it or stand in the way. I feel that there are organizations (people) which do BI just for the sake of doing something, sometimes copying recipes of success without connecting the dots, without clear goals and strategy. There are people who juggle with numbers and BI concepts without knowing their meaning and what they involve. This aspect is reflected in how BI tools are selected, implemented and used.

    Having the best tools, consultants and the highest data quality won’t guarantee the success of a BI initiative without the users’ acceptance, without teaching them how to make constructive use of the tools and data, how to use and build models in order to solve the problems the business is confronted with, how to address strategic, tactical and operational requirements. The transformation from a robot into a knowledge worker doesn’t happen overnight. People need to be made aware of the various aspects of BI – data quality, process and data ownership, how models can be used and misused, how models evolve or become obsolete, how the BI infrastructure has to evolve with the business’ dynamics. There are so many aspects that need to be considered. It’s a continuous learning process.

Processes

    A dual nature can be depicted in the processes perspective too. First of all we have to consider the processes used to manage the whole BI infrastructure efficiently and effectively. They are widely discussed in various methodologies like ITIL, whose implementation is thoroughly documented. Secondly, there is the reflection of departmental processes within the various data perspectives – how they are measured, and how the measurements are further used for continuous improvement. Considering that this aspect is correlated with an organization’s capability model, I don’t think that many organizations reach that far. Sure, the trend is to define meaningful KPIs, growth, health and other types of metrics, but the question is – are you using those metrics constructively, are you aligning them with your strategic, tactical and operational goals? I think there is a lot of potential in this, though in order to measure processes accordingly it is imperative to have the system designed for this purpose too. Back to the technological perspective…

Friday, April 06, 2012

BI between Potential, Reality, Quality and Stories

    Have you ever felt that you are investing quite a lot of time, effort, money and other resources into your BI infrastructure, and in the end you don’t meet your expectations? As it seems, you’re not the only one. The paper “Does your business intelligence tell you the whole story”, released in 2009 by KPMG, provides some interesting numbers to support that:
1. “More than 50% of business intelligence projects fail to deliver the expected benefit” (BI projects failure)
2. “Two thirds of executives feel that the quality of and timely access to data is poor and inconsistent” (reports and data quality)
3. “Seven out of ten executives do not get the right information to make business decisions.” (BI value)
4. “Fewer than 10% of organizations have successfully used business intelligence to enhance their organizational and technological infrastructures”  (BI alignment)
5. “those with effective business intelligence outperform the market by more than 5% in terms of return on equity” (competitive advantage)

    The numbers also reflect my expectations to some degree, though they seem more pessimistic than I expected. That’s not a surprise, considering that such studies can be strongly biased, especially because they reflect expectations, presumptions and personal views of the state of affairs within an organization.

    KPMG builds on the above numbers and several other aspects that revolve around the use of governance and alignment in order to increase the value provided by BI to the business, though I feel that they are barely scratching the surface. Governance and alignment look great in studies and academic work, though alone they can’t bring success, no matter how much their importance and usage is accentuated. Sometimes I feel that people hide behind big words without even grasping the facts. The importance of governance and alignment can’t be neglected, though the argumentation provided by KPMG isn’t flawless. There are statements I can agree with, and many which are circumstantial. Anyway, let’s look a little deeper at the above numbers.

    I suppose there is no surprise concerning the huge rate of BI projects’ failure. The value is somewhat close to the failure rate of software projects. Why would a BI project be an exception from a typical software project, considering that they face almost the same environments and challenges? In fact, given the role played by BI in decision making, I would say that BI projects are more sensitive to the various factors than a typical software project. It doesn’t make sense to revisit the reasons for which software projects fail, but some particular aspects need to be mentioned. KPMG insists on the poor quality of data, on the relevance and volume of the reports and metrics used, the failure to reflect the organization’s objectives, the inflexibility of data models, the lack of standardization, all of them reflecting to one degree or another on the success of a BI project. There is much more to it!

    KPMG refers to a holistic approach concentrated on the change of focus from technology to the actual needs, a change of process and funding. A reflection of the holistic approach is also viewing the BI infrastructure from the point of view of the entire IT infrastructure, of the organization, of the network of partners and of the end products – mainly models and reports. Many of the problems BI initiatives are confronted with refer to the quality of data and its many dimensions (duplicates, conformity, consistency, integrity, accuracy, availability, timeliness, etc.), problems which could in theory be solved in the source systems, mainly through design. Other problems, like dealing with complex infrastructures based on more or less compatible ISs or BI tools, might involve virtualization, consolidation or harmonization of such solutions, plus the addition of other tools.

    Looking at the whole organization, other problems appear: the use of reports and models without understanding the whole baggage of meaning hiding behind them, the different views over the same data and models, the differences in language, problems, requirements and objectives, departmental and organizational politics, the lack of communication, the lack of trust in the existing models and reports, and so on. What all these points have in common is people! People are maybe the most important factor in the adoption and effective usage of BI solutions. It starts with them – identifying their needs – and it ends with them – as end users. Making them aware of all the contextual requirements, actually making them knowledge workers and not considering them just simple machines, could give a boost to your BI strategy.

    Partners don’t encompass just software vendors, service providers or consultants, but also the internal organizational structures – teams, departments, sites or any other similar structure. Many problems in BI can be traced back to partners and the way a partnership is understood, to how resources are managed, how different goals and strategies are harmonized, how people collaborate and coordinate. Maybe the most problematic is the partnership between IT and the other departments on one side, and between IT and the external partners on the other. As long as IT is not seen as a partner, as long as IT is skipped in important decisions or isn’t acting as a mediator between its internal and external partners, there are few chances of succeeding. There are so many aspects and a lot of material written on this topic, there are models and methodologies supposed to make things work, but often between theory and practice there is a long distance.

    How many of the people you have met were blaming the poor quality of the data without actually doing something to improve it? If the quality of your data is one of your major problems, then why aren’t you doing something to improve it? Taking ownership of your data is a major step on the way to better data quality, though a data management strategy is needed. This involves the design of a framework that facilitates data quality and data consumption, and the design and use of policies, practices and procedures to properly manage the full data life cycle. This too can be considered part of your BI infrastructure, and given the huge volume, complexity and diversity of data, it is nowadays a must for an organization.

    The “right information” is an elusive construct. In order to get the right information you must be capable of defining what you want, of designing your infrastructure with that in mind and of learning how to harness your data. You don’t have to look only at your data and information but also at the whole DIKW pyramid. The bottom line is that you don’t have to build only a BI infrastructure but a knowledge management infrastructure, and methodologies like ITIL can help you achieve that, though they are not sufficient. Sooner or later you’ll come to blame the whole DIKW pyramid – the difficulty of extracting information from data, knowledge from information, and the ultimate translation into wisdom. Actually that’s also what the third and fourth of the above statements are screaming out loud – it’s not so easy to get information from the silos of data, just as it’s not easy to align the transformation process with the organization’s strategy.

    Timeliness also has a relative meaning. It’s true that today’s business dynamics requires faster access to data, though it also requires being proactive, and many organizations lack this level of maturity. In order to be proactive it’s necessary to understand your business’ dynamics thoroughly, and that is rooted primarily in your data, in the tools you are using and in the skill set your employees have acquired in order to move between the DIKW layers. I would say that the understanding of DIKW is essential in harnessing your BI infrastructure.

    KPMG considers the 5% increase in return on equity associated with the effective usage of BI a positive sign – not necessarily. The increase could be associated with chance or other factors as well, even if that’s improbable. The increase is also quite small when weighed against the huge amount of resources spent on BI infrastructure. I believe that BI can do much more for organizations when harnessed adequately. It’s just a belief that needs to be backed up by numbers; hopefully that will happen someday, soon.

Saturday, February 18, 2012

Programming Reviewed – Part 1: What Programming Is About

    According to Wikipedia, computer programming (shortly programming or coding) is “the process of designing, writing, testing, debugging, and maintaining the source code of computer programs”. That’s an extensive definition, because typically programming refers mainly to the writing of a set of instructions understandable by a computer or any other electronic device. At least that’s what programming was at its beginnings. With time, given the increasing complexity of software, programming came to include also activities like gathering requirements, architecting, designing, testing, debugging and troubleshooting, refactoring, documenting, configuring, deploying, performing maintenance, etc. Each of these activities comes with its own set of methods, procedures, processes, models, methodologies, best/good practices, standards and tools. In addition, when we look at the architecture of an application, we can delimit several layers: server vs. client, front end (user interface), business layer, back end (database), transport (network), communication or hardware – each coming with its own set of technologies and knowledge baggage, and requiring some specialization too.

    However, abstracting from all these, programming implies the (partial) knowledge of a programming language – an artificial language used to communicate with machines – in terms of syntax, semantics and built-in libraries, and of an IDE (Integrated Development Environment), an application in which the code is written, compiled/interpreted and debugged. As programming can often be a repetitive task, with the same kinds of problems to solve or the same kinds of instructions to write, in addition to the various structures and techniques made available in order to minimize redundancy, a programmer can take advantage of a huge collection of algorithms – abstracted step-by-step instructions – and the related technical literature. The deeper their understanding needs to go, the broader the set of knowledge to be acquired.

    And even if we consider all of the above, that’s not enough, because programming is used in order to model and solve business-specific problems. So some minimal knowledge of the respective business domains is required, and that’s quite a lot if we consider that each project may address one or more business domains. Talking about projects, as most programming is performed within projects, a programmer needs to have some knowledge of the procedures, methods and methodologies for project management and team management. That’s not programming anymore, but it’s part of the landscape and nowadays it’s kind of a must, because programming is performed within projects and teams. This also means that a programmer needs to cover several important interpersonal skills, to which customer-oriented, social and thinking skills add up. They are important because they impact directly or indirectly the act of programming, and many ignore this.

    It’s important to stress that programming is not only the knowledge of languages, algorithms, tools, methods, models, practices, methodologies and standards, but also their adequate use in order to make the most of the programming experience. Or as a quote retrieved a long time ago puts it: “programming is 10% science, 20% ingenuity, and 70% getting the ingenuity to work with the science” (anonymous). We all (or almost all) master our native language to the degree of writing sentences or communicating, though it takes skill to communicate effectively and efficiently, or to make of language a tool of expression through poetry or other forms of literature. Fantasizing a little, programming is like writing poetry: it is one thing to write chunks of words, and another thing to write something meaningful. And programming is a lot about the interpretation and representation of meaning in order to solve problems; it is about understanding and breaking down complexity to a level that can be translated into meaning for machines.

    Programming is an art, to the same degree that any endeavor can be transformed into art. It requires skills, knowledge, dedication, creativity, and most of all the pleasure of programming. Programming is a state of mind, a way or model of thinking, of seeing the whole world in computable terms.