06 February 2010

The Data-Driven Enterprise

    Today I read ‘The Data-Driven Enterprise’ white paper from Informatica, a quite useful paper, especially as it comes from one of the leaders in integration software and services. In this paper the term data-driven enterprise refers to organizations that are “able to take advantage of their data assets to work faster, better and smarter” [1]. To reach this state it is necessary to “invest in the people, processes and technology needed to know where the data resides, to understand it, to clean it and keep it clean, and to get it to where it is needed, when and how it is needed” [1]. It seems that the data-driven enterprise, like the data-driven corporation [2], is just an alternative term for the data-driven organization, a concept that has been in use for quite a few years. Following the DIKW pyramid, a data-driven organization follows a four-stage evolution from data to information and further to knowledge and wisdom. Of particular importance is how knowledge is derived from data and information; organizations capable of creating, managing and putting knowledge to use are known as knowledge-based organizations. It’s interesting that the paper makes no direct reference to knowledge and information, focusing on data as an asset and possibly ignoring information and knowledge as assets. I think it would have helped if the concepts from this paper had also been anchored within these two contexts.

    The paper touches on several important aspects of Data Management, covering concepts like “value of data”, “data quality”, “data integration”, “business involvement”, “data trust”, “relevant data”, “timely data”, “virtualized access”, “compliant reporting” and “Business-IT collaboration”, and highlights the importance of having adequate processes, infrastructure and culture in order to bring more value to the business. I totally agree with the importance of these concepts, though I think there are many other aspects that need to be considered. Almost all vendors juggle with such concepts, though what’s often missing is the knowledge/wisdom and method to put philosophies and technologies to use, to redesign an organization’s infrastructure and culture so that it brings the optimum benefit.

    Since the appearance of data warehouse concepts, the efficient integration of the various data islands existing within and outside an organization has become a Holy Grail for IT vendors and organizations, though given the fast pace at which new technologies appear this hunt looks more like chasing a Fata Morgana in the desert. Informatica builds a strong case for data integration in general and for Informatica 9 in particular, its new infrastructure platform that aims to enable organizations to become data-driven by providing a centralized architecture for enforcing data policy and addressing issues like data timeliness, format, semantics, privacy and quality [3]. On the other hand, the grounds on which Informatica builds its launch strategy can be argued against, considering the grey zone in which they are placed.

Quantifying Value of Data

    How many organizations could say that they can (easily) quantify the real value of their data when there is no market value against which it could be benchmarked? I would say that data have only a potential value, one that increases only with use: once you have learned to explore the data, find patterns and new uses for it, derive knowledge from it and use that knowledge wisely in order to gain profit and a competitive advantage. It might take years to get there. People who witnessed big IT projects like ERP/CRM implementations or data warehousing have seen how their initial expectations were hardly met. How much are they willing to invest in an initiative that might prove its value only years later, especially when many organizations are still fighting the crisis? How could they create a business case for such a project? How much could they rely on the numbers advanced by vendors and on the nice slogans behind their tools, good just for selling a product? To take a quote from the video presentation of Sohaib Abbasi, Chairman and CEO at Informatica, “70% of all current SOA initiatives will be restarted or simply abandoned (Gartner)” [3], and I would bet that many such projects aim to integrate the various systems existing in an organization. Once you have had several such bad experiences, how much are you willing to invest in a new one?

    There are costs that can be quantified, like the number of hours employees spend on maintaining duplicate data and correcting issues caused by bad data quality, or more generally the costs related to waste. There are also costs that can’t be quantified so easily, like the costs associated with bad decisions or lost opportunities caused by missing data or an inadequate reflection of reality. There is another aspect: even if organizations manage to quantify such costs, without some transparency on how they arrived at the respective numbers it feels as if somebody just pulled the numbers out of a magician’s hat. It would be great if the quantification of such costs were somehow standardized, though that’s difficult to do given that each organization approaches Data Management from its own perspective and requirements.
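
    As a rough illustration of what transparency in such a calculation could look like, here is a minimal Python sketch that keeps every quantifiable cost as a separate, traceable line instead of a single figure. The category labels, hours and hourly rate are purely hypothetical assumptions made for the example; they do not come from the Informatica paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostItem:
    """One quantifiable cost driver (all values here are hypothetical)."""
    label: str
    hours_per_month: float
    hourly_rate: float

    @property
    def monthly_cost(self) -> float:
        return self.hours_per_month * self.hourly_rate


def cost_breakdown(items):
    """Return (label, monthly cost) pairs plus the total, so each number can be traced."""
    lines = [(item.label, item.monthly_cost) for item in items]
    total = sum(cost for _, cost in lines)
    return lines, total


# Hypothetical inputs: hours spent per month and an assumed hourly rate.
ITEMS = [
    CostItem("maintaining duplicate data", 40, 50.0),
    CostItem("correcting data quality issues", 25, 50.0),
]

lines, total = cost_breakdown(ITEMS)
for label, cost in lines:
    print(f"{label}: {cost:.2f}")
print(f"total quantifiable cost per month: {total:.2f}")
```

    The sketch covers only the costs that can be counted in hours; the costs of bad decisions or lost opportunities would still have to be estimated separately, with the assumptions stated just as explicitly.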

From Data to Meaning

    Reports are used only to aggregate, analyze and navigate data, while it falls to the Users to give adequate meaning to the data and, together with the data analyst, to find the who, how, when, where, what, why, which and by what means; in a word, to understand the factors that impact the business positively or negatively, the correlations between them and how they can be strengthened or mitigated in order to achieve better quality and outcomes.

    People want nice charts and metrics that give them a bird’s-eye view of the current state, though aggregated data can easily hide the reality because of the quality of the data, the quality of the reports themselves and the degree to which they cover the reality. Part of the data-driven philosophy comes down to understanding the data and reacting to it. I have met people who ignored the data, preferring to take wild guesses; sometimes they were right, other times they were wrong.
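
    To make the point about aggregates hiding reality concrete, here is a small Python sketch in which a metric is reported together with the share of records it is actually based on. The order values, and the convention that `None` or a negative number marks a missing or invalid record, are assumptions made only for this example.

```python
from statistics import mean


def summarize(values):
    """Return the mean of the usable values and the share of records it covers."""
    # Assumption for this sketch: None or a negative number marks a bad record.
    usable = [v for v in values if v is not None and v >= 0]
    coverage = len(usable) / len(values) if values else 0.0
    average = mean(usable) if usable else None
    return average, coverage


# Hypothetical order values; two of the five records are unusable.
order_values = [120.0, None, 80.0, -1.0, 100.0]

average, coverage = summarize(order_values)
print(f"average: {average:.2f}, based on {coverage:.0%} of records")
```

    A chart showing only the average would look perfectly healthy here, while the coverage figure reveals that a large part of the records never made it into the metric.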

From Functionality to Usability

    There are Users who, once they have a tool, want to find out everything about its capabilities, play with it, find other uses for it, and may even come up with nice-to-have features. There are also Users who don’t want to bother getting the data by themselves; they just want the data on time and in the format they need. The fact that Informatica allows Users to analyze the data by themselves is quite a big deal, though, as I already stressed in a previous post, you can’t expect a User to become a data expert overnight; there are even developers who have difficulties in handling complex data analysis requirements. The guys from Informatica tried to make this aspect look simple in their presentation, though it’s not as simple as it seems, especially when dealing with ERP systems like Oracle or SAP that have hundreds of tables, each with a bunch of attributes and complex relations. One of the important challenges for developers is to reengineer the logic implemented in such systems. It’s a whole mindset that needs to be developed, and there are also best practices that need to be considered, as well as specific techniques that allow getting the data in the most efficient way.

    Allowing Users to decide which logic to apply in their reports could prove to be a double-edged sword, with organizations risking ending up with multiple versions of the same story. The various reports need to be aligned, and Users brought onto the same page in terms of expectations and constraints. On the other hand, some Users prefer to prepare the data by themselves because they know the issues existing in the data, or because they have more flexibility in making the data look positive.

Trust, Relevance and Timeliness

    An important part of Informatica’s strategy is based on data trust, relevance and timeliness, three important but hard-to-quantify dimensions of Data Quality. Trust is often correlated with the Users’ perception of the overall Data Quality, the degree to which the aggregated data presented in reports can be backed up with detailed data, and the visibility they have into the business rules and transformations used. If the Users can get a feeling for the data through click-through, drill-down or drill-through reports, and if the business rules and transformations are documented, then most probably data trust won’t be an issue anymore. Data relevance and data timeliness are heavily requirement-dependent: for some Users it is enough to work with one-week-old data, while others need live data. To a greater or lesser degree, all data used by the business are relevant; otherwise I don’t see why they would be maintained.
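
    The requirement-dependent nature of timeliness can be sketched in a few lines of Python: the same data set passes or fails depending on the age each User can tolerate. The function name, dates and thresholds below are hypothetical and serve only as an illustration.

```python
from datetime import datetime, timedelta


def is_timely(last_refresh: datetime, max_age: timedelta, now: datetime) -> bool:
    """True if the data was refreshed within the age this User can tolerate."""
    return now - last_refresh <= max_age


# Hypothetical scenario: the data was last refreshed three days ago.
now = datetime(2010, 2, 6, 12, 0)
last_refresh = datetime(2010, 2, 3, 12, 0)

weekly_ok = is_timely(last_refresh, timedelta(weeks=1), now)      # weekly reporting need
near_live_ok = is_timely(last_refresh, timedelta(minutes=5), now)  # near-live need
print(weekly_ok, near_live_ok)
```

    The same three-day-old data is timely for the first User and stale for the second, which is why timeliness cannot be judged without knowing the requirement behind it.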

Software Tools as Enablers

    Sometimes being aware that there is a problem and doing something to fix it already brings some value to the business, and this without investing in complex technologies, just by handling things methodically and enforcing some management practices: identifying, assessing, addressing, monitoring and controlling issues. I bet this alone could bring a benefit to an organization, and everything starts with recognizing that there is a problem and doing something to fix the root causes. On the other hand, software technologies can enable performing the various tasks more efficiently and effectively, with better quality, fewer resources, in less time and eventually at lower cost. Now, what’s the value of the savings that come from addressing the issue, and what’s the value of the savings that come from using a particular software technology?!

    Software tools like Informatica are just enablers; they don’t guarantee results and don’t eliminate barriers unless people know how to use them and make the most of them. This requires experts who know the business and the various software tools involved, and good, experienced managers to keep such projects on the right track. When the objectives are not met or the final solution doesn’t satisfy all requirements, people end up developing alternative solutions, which I categorize as personal solutions (spreadsheets, MS Access applications), and the organization ends up with islands of duplicated data and logic. Often Users need to make use of such solutions in order to understand their data, and this is an area in which Informatica could easily gain supporters.

Business-IT Collaboration

    It’s no news that the IT/IM and the other functional departments don’t function as partners. IT initiatives are not adequately supported by the business, while in many technology-related initiatives driven by the business at corporate level the IT department is involved only as executor and has little say in the decision to use one technology or another; many such initiatives ignore aspects specific to IT: the usability of the solution, its integration with other solutions, the nuances of the internal architecture and infrastructure. Of course, phrases like “business struggling in working with IT” appear when IT and the business function as separate entities with a minimum of communication, and when the various strategies are not aligned as they are supposed to be. If you don’t inform the IT department of your expectations, and vice versa, each department will end up addressing issues reactively, as they appear, rather than proactively, so it’s no wonder that it takes weeks or months until a solution is provided. The responsiveness of IT is strongly correlated with the resources, the existing infrastructure and the policies in place. In addition, for IT to do its work the business has to share the necessary business knowledge. How can you expect issues to be addressed when even the business is not able to articulate the requirements adequately? In many cases the business figures out what it wants only when a first solution or prototype is provided. It’s an iterative process, and many people ignore this aspect.

    No matter the slogans and the concepts the vendors juggle with, I’m sorry, but I can’t believe that there is one tool that matches all requirements, that provides a fully integrated solution, or that a tool by itself is sufficient to eliminate the language and collaboration barriers between the business and IT!

Human Resources & Co.

    Many organizations don’t have in-house the human resources needed for the various projects related to Data Management, and therefore bring in consultants or outsource parts of the projects. A consultant needs time to understand the processes existing in an organization and the organization’s particularities. Even if business analysts manage to turn the requirements into solid specifications, it’s difficult to cover all the aspects without deep knowledge of the architecture used, just as it’s difficult for consultants to put the pieces of the puzzle together, especially when several of the pieces are missing. Consultants generally expect to be given all the pieces of the puzzle, while the other side expects the consultants to identify the missing pieces.

    When outsourcing tasks (e.g. data analysis) or data-related infrastructure (e.g. data warehouses, data marts), an organization risks losing control over what’s happening; the communication issues are reflected in longer cycle times for issue resolution, turning everything into a challenge. There are many other issues related to outsourcing that perhaps deserve to be addressed in detail.

The Lack of Vision, Policy and Strategy

    An organization needs to have a vision, policy and strategy for data quality in particular and Data Management in general, in order to plan, enforce and coordinate the overall effort toward quality. Their lack can have an unpredictable impact on information systems and the reporting infrastructure in particular, and on the business as a whole. Without them, data quality initiatives tend to have a local and narrow scope and to lack the expected effectiveness, resulting in rework and failure stories. The saying “it’s better to prevent than to cure” best captures the philosophy on which Data Management should be centered.

Lack of Ownership

    The lack of ownership can also be placed in the context of the lack of policy and strategy, though given its importance it deserves special attention. The saying “each employee is responsible for quality” applies to data quality too: each user and department needs to take ownership of the data they have to maintain, for their own or other departments’ purposes, just as they have to take ownership of the reports that fall within the scope of their work, ensure their quality and the related documentation, and take ownership of the explicit and implicit islands of knowledge that exist.

References:
[1] Informatica. (2009). The Data-Driven Enterprise. [Online] Available from: http://www.informatica.com/downloads/7060_data_driven_wp_web.pdf (Accessed: 6 February 2010).
[2] Hertzler. (2006). Eight Aspects of the Data Driven Corporation – Exploring your Gap to Entitlement. [Online] Available from: http://www.hertzler.com/php/portfolio/white.paper.detail.php?article=31 (Accessed: 6 February 2010).
[3] Informatica. (2009). Informatica 9: Infrastructure Platform for the Data-Driven Enterprise, Speaker: Sohaib Abbasi, Chairman and CEO. [Online] Available from: http://www.informatica.com/9/thelibrary.html#page=page-5 (Accessed: 6 February 2010).
