30 January 2017

Data Management: Dirty Data (Definitions)

"Data that contain errors or cause problems when accessed and used. Some examples of dirty data are:   Values in data elements that exceed a reasonable range, e.g., an employee with 4299 years of service. Values in data elements that are invalid, e.g., a value of 'X' in a gender field, where the only valid values are 'M' and 'F'. Missing values, e.g., a blank value in a gender field, where the only valid values are 'M' and 'F'.  Incomplete data, e.g., a company has 10 products but data for only 8 products are included." (Margaret Y Chu, "Blissful Data ", 2004)

"Data that contain inaccuracies and/or inconsistencies." (Carlos Coronel et al, "Database Systems: Design, Implementation, and Management" 9th Ed., 2011)

"Poor quality data." (Linda Volonino & Efraim Turban, "Information Technology for Management" 8th Ed, 2011)

"Data that is incorrect, out-of-date, redundant, incomplete, or formatted incorrectly." (Craig S Mullins, "Database Administration", 2012)

"Data with inaccuracies and potential errors." (Hamid R Arabnia et al, "Application of Big Data for National Security", 2015)

29 January 2017

Data Management: Data Dictionary (Definitions)

"The system tables that contain descriptions of the database objects and how they are structured." (Karen Paulsell et al, "Sybase SQL Server: Performance and Tuning Guide", 1996)

"A set of system tables stored in a catalog. A data dictionary includes definitions of database structures and related information, such as permissions." (Anthony Sequeira & Brian Alderman, "The SQL Server 2000 Book", 2003)

"Software in which metadata is stored, manipulated and defined – a data dictionary is normally associated with a tool used to support software engineering." (Keith Gordon, "Principles of Data Management", 2007)

"A list of descriptions of data items to help developers stay on the same track." (Rod Stephens, "Beginning Database Design Solutions", 2008)

"The place where information about data that exists in the organization is stored. This should include both technical and business details about each data element." (Laura Reeves, "A Manager's Guide to Data Warehousing", 2009)

"Data dictionary are mini database management systems that manages metadata. It is a repository of information about a database that documents data elements of a database. The data dictionary is an integral part of the database management systems and stores metadata or information about the database, attribute names and definitions for each table in the database." (Vijay K Pallaw, "Database Management Systems" 2nd Ed., 2010)

"In the days of mainframe computers, this was a listing of record layouts, describing each field in each type of file." (David C Hay, "Data Model Patterns: A Metadata Map", 2010)

"Software coupled with a data store for managing data definitions." (Craig S Mullins, "Database Administration", 2012)

"A database containing data about all the databases in a database system. Data dictionaries store all the various schema and file specifications and their locations. They also contain information about which programs use which data and which users are interested in which reports." (SQL Server 2012 Glossary, "Microsoft", 2012)

"A reference by which a team can understand what data assets they have, how those assets were created, what they mean, and where to find them." (Evan Stubbs, "Delivering Business Analytics: Practical Guidelines for Best Practice", 2013)

"A repository of the metadata useful to the corporation" (Daniel Linstedt & W H Inmon, "Data Architecture: A Primer for the Data Scientist", 2014)

"A comprehensive record of business and technical definitions of the elements within a dataset. Also referred to as a business glossary." (Jonathan Ferrar et al, "The Power of People", 2017)

"A database containing data about all the databases in a database system. Data dictionaries store all the various schema and file specifications and their locations." (BAAN)

"A read-only collection of database tables and views containing reference information about the database, its structures, and its users." (Oracle)

"A set of system tables, stored in a catalog, that includes definitions of database structures and related information, such as permissions." (Microsoft Technet)

"A set of tables that keep track of the structure of both the database and the inventory of database objects." (IBM)

"A specialized type of database containing metadata; a repository of information describing the characteristics of data used to design, monitor, document, protect, and control data in information systems and databases; an application system supporting the definition and management of database metadata." (TOGAF)

"Metadata that keeps track of database objects such as tables, indexes, and table columns." (MySQL)

Data Management: Master Data (Definitions)

"Data describing the people, places, and things involved in an organization’s business. Examples include people (e.g., customers, employees, vendors, suppliers), places (e.g., locations, sales territories, offices), and things (e.g., accounts, products, assets, document sets). Master data tend to be grouped into master records, which may include associated reference data." (Danette McGilvray, "Executing Data Quality Projects", 2008)

"Master data is the core information for an enterprise, such as information about customers or products, accounts or locations, and the relationships between them. In many companies, this master data is unmanaged and can be found in many, overlapping systems and is often of unknown quality." (Allen Dreibelbis et al, "Enterprise Master Data Management", 2008)

"Data that describes the important details of a business subject area such as customer, product, or material across the organization. Master data allows different applications and lines of business to use the same definitions and data regarding the subject area. Master data gives an accurate, 360° degree view of the business subject." (Tony Fisher, "The Data Asset", 2009)

"The set of codes and structures that identify and organize data, such as customer numbers, employee IDs, and general ledger account numbers." (Janice M Roehl-Anderson, "IT Best Practices for Financial Managers", 2010)

"The data that provides the context for business activity data in the form of common and abstract concepts that relate to the activity. It includes the details (definitions and identifiers) of internal and external objects involved in business transactions, such as customers, products, employees, vendors, and controlled domains (code values)." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"The critical data of a business, such as customer, product, location, employee, and asset. Master data fall generally into four groupings: people, things, places, and concepts and can be further categorized. For example, within people, there are customer, employee, and salesperson. Within things, there are product, part, store, and asset. Within concepts, there are things like contract, warrantee, and licenses. Finally, within places, there are office locations and geographic divisions." (Microsoft, "SQL Server 2012 Glossary", 2012)

"Data that is key to the operation of a business, such as data about customers, suppliers, partners, products, and materials." (Brenda L Dietrich et al, "Analytics Across the Enterprise", 2014)

"The data that describes the important details of a business subject area such as customer, product, or material across the organization. Master data allows different applications and lines of business to use the same definitions and data regarding the subject area. Master data gives an accurate, 360-degree view of the business subject." (Jim Davis & Aiman Zeid, "Business Transformation: A Roadmap for Maximizing Organizational Insights", 2014)

"Informational objects that represent the core business objects (customers, suppliers, products and so on) and are fundamental to an organization. Master data must be referenced in order to be able to perform transactions. In contrast with transaction or inventory data, master data does not change very often." (Boris Otto & Hubert Österle, "Corporate Data Quality", 2015)

"The most critical data is called master data and the companioned discipline of master data management, which is about making the master data within the organization accessible, secure, transparent, and trustworthy." (Piethein Strengholt, "Data Management at Scale", 2020)

27 January 2017

Data Management: Data Quality (Just the Quotes)

"[...] it is a function of statistical method to emphasize that precise conclusions cannot be drawn from inadequate data." (Egon S Pearson & H Q Hartley, "Biometrika Tables for Statisticians" Vol. 1, 1914)

"Not even the most subtle and skilled analysis can overcome completely the unreliability of basic data." (Roy D G Allen, "Statistics for Economists", 1951)

"The enthusiastic use of statistics to prove one side of a case is not open to criticism providing the work is honestly and accurately done, and providing the conclusions are not broader than indicated by the data. This type of work must not be confused with the unfair and dishonest use of both accurate and inaccurate data, which too commonly occurs in business. Dishonest statistical work usually takes the form of: (1) deliberate misinterpretation of data; (2) intentional making of overestimates or underestimates; and (3) biasing results by using partial data, making biased surveys, or using wrong statistical methods." (John R Riggleman & Ira N Frisbee, "Business Statistics", 1951)

"Data are of high quality if they are fit for their intended use in operations, decision-making, and planning." (Joseph M Juran, 1964)

"There is no substitute for honest, thorough, scientific effort to get correct data (no matter how much it clashes with preconceived ideas). There is no substitute for actually reaching a correct chain of reasoning. Poor data and good reasoning give poor results. Good data and poor reasoning give poor results. Poor data and poor reasoning give rotten results." (Edmund C Berkeley, "Computers and Automation", 1969)

"We have found that some of the hardest errors to detect by traditional methods are unsuspected gaps in the data collection (we usually discovered them serendipitously in the course of graphical checking)." (Peter Huber, "Huge data sets", Compstat '94: Proceedings, 1994)

"Data obtained without any external disturbance or corruption are called clean; noisy data mean that a small random ingredient is added to the clean data." (Nikola K Kasabov, "Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering", 1996)

"Unfortunately, just collecting the data in one place and making it easily available isn’t enough. When operational data from transactions is loaded into the data warehouse, it often contains missing or inaccurate data. How good or bad the data is a function of the amount of input checking done in the application that generates the transaction. Unfortunately, many deployed applications are less than stellar when it comes to validating the inputs. To overcome this problem, the operational data must go through a 'cleansing' process, which takes care of missing or out-of-range values. If this cleansing step is not done before the data is loaded into the data warehouse, it will have to be performed repeatedly whenever that data is used in a data mining operation." (Joseph P Bigus,"Data Mining with Neural Networks: Solving business problems from application development to decision support", 1996)

"Blissful data consist of information that is accurate, meaningful, useful, and easily accessible to many people in an organization. These data are used by the organization’s employees to analyze information and support their decision-making processes to strategic action. It is easy to see that organizations that have reached their goal of maximum productivity with blissful data can triumph over their competition. Thus, blissful data provide a competitive advantage." (Margaret Y Chu, "Blissful Data", 2004)

"Let’s define dirty data as: ‘… data that are incomplete, invalid, or inaccurate’. In other words, dirty data are simply data that are wrong. […] Incomplete or inaccurate data can result in bad decisions being made. Thus, dirty data are the opposite of blissful data. Problems caused by dirty data are significant; be wary of their pitfalls."  (Margaret Y Chu, "Blissful Data", 2004)

"Processes must be implemented to prevent bad data from entering the system as well as propagating to other systems. That is, dirty data must be intercepted at its source. The operational systems are often the source of informational data; thus dirty data must be fixed at the operational data level. Implementing the right processes to cleanse data is, however, not easy." (Margaret Y Chu, "Blissful Data", 2004)

"Our culture, obsessed with numbers, has given us the idea that what we can measure is more important than what we can't measure. Think about that for a minute. It means that we make quantity more important than quality." (Donella Meadows, "Thinking in Systems: A Primer", 2008)

"Many new data scientists tend to rush past it to get their data into a minimally acceptable state, only to discover that the data has major quality issues after they apply their (potentially computationally intensive) algorithm and get a nonsense answer as output. (Sandy Ryza, "Advanced Analytics with Spark: Patterns for Learning from Data at Scale", 2009)

"Access to more information isn’t enough - the information needs to be correct, timely, and presented in a manner that enables the reader to learn from it. The current network is full of inaccurate, misleading, and biased information that often crowds out the valid information. People have not learned that 'popular' or 'available' information is not necessarily valid." (Gene Spafford, 2010)

"Accuracy and coherence are related concepts pertaining to data quality. Accuracy refers to the comprehensiveness or extent of missing data, performance of error edits, and other quality assurance strategies. Coherence is the degree to which data - item value and meaning are consistent over time and are comparable to similar variables from other routinely used data sources." (Aileen Rothbard, "Quality Issues in the Use of Administrative Data Records", 2015)

"How good the data quality is can be looked at both subjectively and objectively. The subjective component is based on the experience and needs of the stakeholders and can differ by who is being asked to judge it. For example, the data managers may see the data quality as excellent, but consumers may disagree. One way to assess it is to construct a survey for stakeholders and ask them about their perception of the data via a questionnaire. The other component of data quality is objective. Measuring the percentage of missing data elements, the degree of consistency between records, how quickly data can be retrieved on request, and the percentage of incorrect matches on identifiers (same identifier, different social security number, gender, date of birth) are some examples." (Aileen Rothbard, "Quality Issues in the Use of Administrative Data Records", 2015)

"When we find data quality issues due to valid data during data exploration, we should note these issues in a data quality plan for potential handling later in the project. The most common issues in this regard are missing values and outliers, which are both examples of noise in the data." (John D Kelleher et al, "Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, worked examples, and case studies", 2015)

"A popular misconception holds that the era of Big Data means the end of a need for sampling. In fact, the proliferation of data of varying quality and relevance reinforces the need for sampling as a tool to work efficiently with a variety of data, and minimize bias. Even in a Big Data project, predictive models are typically developed and piloted with samples." (Peter C Bruce & Andrew G Bruce, "Statistics for Data Scientists: 50 Essential Concepts", 2016)

"Errors using inadequate data are much less than those using no data at all." (Charles Babbage)

26 January 2017

Data Management: Data Governance (Definitions)

"The infrastructure, resources, and processes involved in managing data as a corporate asset." (Jill Dyché & Evan Levy, "Customer Data Integration", 2006)

"A process focused on managing the quality, consistency, usability, security, and availability of information." (Alex Berson & Lawrence Dubov, "Master Data Management and Customer Data Integration for a Global Enterprise", 2007)

"The practice of organizing and implementing policies, procedures, and standards for the effective use of an organization's structured or unstructured information assets." (Laura Reeves, "A Manager's Guide to Data Warehousing", 2009)

"The process for addressing how data enters the organization, who is accountable for it, and how - using people, processes, and technologies - data achieves a quality standard that allows for complete transparency within an organization." (Tony Fisher, "The Data Asset", 2009)

"A framework of processes aimed at defining and managing the quality, consistency, usability, security, and availability of information with the primary focus on cross-functional, cross-departmental, and/or cross-divisional concerns of information management." (Alex Berson & Lawrence Dubov, "Master Data Management and Data Governance", 2010)

"The policies and processes that continually work to improve and ensure the availability, accessibility, quality, consistency, auditability, and security of data in a company or institution." (David Lyle & John G Schmidt, "Lean Integration", 2010)

"The exercise of authority, control, and shared decision-making (planning, monitoring, and enforcement) over the management of data assets." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"Data governance is the specification of decision rights and an accountability framework to encourage desirable behavior in the valuation, creation, storage, use, archival and deletion of data and information. It includes the processes, roles, standards and metrics that ensure the effective and efficient use of data and information in enabling an organization to achieve its goals." (Oracle, "Enterprise Information Management: Best Practices in Data Governance", 2011)

"Processes and controls at the data level; a newer, hybrid quality control discipline that includes elements of data quality, data management, information governance policy development, business process improvement, and compliance and risk management."(Robert F Smallwood, "Information Governance: Concepts, Strategies, and Best Practices", 2014)

"The process for addressing how data enters the organization, who is accountable for it, and how that data achieves the organization's quality standards that allow for complete transparency within an organization." (Jim Davis & Aiman Zeid, "Business Transformation", 2014) 

"A company-wide framework that determines which decisions must be made and who should make them. This includes the definition of roles, responsibilities, obligations and rights in handling the company’s resource data. In this, data governance pursues the goal of maximizing the value of the data in the company. While data governance determines how decisions should be made, data management makes the actual decisions and implements them." (Boris Otto & Hubert Österle, "Corporate Data Quality", 2015)

"The discipline of applying controls to data in order to ensure its integrity over time." (Gregory Lampshire, "The Data and Analytics Playbook", 2016)

"Data governance refers to the overall management of the availability, usability, integrity and security of the data employed in an enterprise. Sound data governance programs include a governing body or council, a defined set of procedures and a standard operating procedure." (Dennis C Guster, "Scalable Data Warehouse Architecture: A Higher Education Case Study", 2018)

"It is a combination of people, processes and technology that drives high-quality, high-value information. The technology portion of data governance combines data quality, data integration and master data management to ensure that data, processes, and people can be trusted and accountable, and that accurate information flows through the enterprise driving business efficiency." (Richard T Herschel, "Business Intelligence", 2019)

"The processes and technical infrastructure that an organization has in place to ensure data privacy, security, availability, usability, and integrity." (Lili Aunimo et al, "Big Data Governance in Agile and Data-Driven Software Development: A Market Entry Case in the Educational Game Industry", 2019)

"The management of data throughout its entire lifecycle in the company to ensure high data quality. Data Governance uses guidelines to determine which standards are applied in the company and which areas of responsibility should handle the tasks required to achieve high data quality." (Mohammad K Daradkeh, "Enterprise Data Lake Management in Business Intelligence and Analytics: Challenges and Research Gaps in Analytics Practices and Integration", 2021)

"A set of processes that ensures that data assets are formally managed throughout the enterprise. A data governance model establishes authority and management and decision making parameters related to the data produced or managed by the enterprise." (NSA/CSS)

"The management of the availability, usability, integrity and security of the data stored within an enterprise." (Solutions Review)

"The process of defining the rules that data has to follow within an organization." (Talend)

Data governance 2.0: "An agile approach to data governance focused on just enough controls for managing risk, which enables broader and more insightful use of data required by the evolving needs of an expanding business ecosystem." (Forrester)

"Data governance encompasses the strategies and technologies used to ensure data is in compliance with regulations and organization policies with respect to data usage." (Adobe)

"Data governance encompasses the strategies and technologies used to make sure business data stays in compliance with regulations and corporate policies." (Informatica) [source]

"Data Governance includes the people, processes and technologies needed to manage and protect the company’s data assets in order to guarantee generally understandable, correct, complete, trustworthy, secure and discoverable corporate data." (BI Survey) [source]

"Data governance is a control that ensures that data entry by a business user or an automated process meets business standards. It manages a variety of things including availability, usability, accuracy, integrity, consistency, completeness, and security of data usage. Through data governance, organizations are able to exercise positive control over the processes and methods to handle data." (Logi Analytics) [source]

"Data governance is a structure put in place allowing organisations to proactively manage data quality." (experian) [source]

"Data governance is an organization's internal policy framework that determines the way people make data management decisions. All aspects of data management must be carried out in accordance with the organization's governance policies." (Xplenty) [source]

"Data Governance is the exercise of decision-making and authority for data-related matters." (The Data Governance Institute)

"Data Governance is a system of decision rights and accountabilities for information-related processes, executed according to agreed-upon models which describe who can take what actions with what information, and when, under what circumstances, using what methods." (The Data Governance Institute)

"Data governance is the practice of organizing and implementing policies, procedures and standards for the effective use of an organization's structured/unstructured information assets." (Information Management)

"Data governance is the specification of decision rights and an accountability framework to ensure the appropriate behavior in the valuation, creation, consumption and control of data and analytics." (Gartner)

"The exercise of authority, control and shared decision making (planning, monitoring and enforcement) over the management of data assets. It refers to the overall management of the availability, usability, integrity, and security of the data employed in an enterprise. A sound data governance program includes a governing body or council, a defined set of procedures, and a plan to execute those procedures." (CODATA)

20 January 2017

Data Management: Data Element (Definitions)

"An atomic unit of data; in most cases, a field." (Microsoft Corporation, "Microsoft SQL Server 7.0 Data Warehouse Training Kit", 2000)

"(1) an attribute of an entity; (2) a uniquely named and well-defined category of data that consists of data items and that is included in a record of an activity." (William H Inmon, "Building the Data Warehouse", 2005)

"The most atomic, pure, and simple fact that either describes or identifies an entity. This is also known as an attribute. It can be deployed as a column in a table in a physical structure." (Sharon Allen & Evan Terry, "Beginning Relational Data Modeling" 2nd Ed., 2005)

"The smallest unit of data that is named. The values are stored in a column or a field in a database." (Laura Reeves, "A Manager's Guide to Data Warehousing", 2009)

[data attribute:] "1. An inherent fact, property, or characteristic describing an entity or object; the logical representation of a physical field or relational table column. A given attribute has the same format, interpretation, and domain for all occurrences of an entity. Attributes may contain adjective values (red, round, active, etc.). 2. A unit of data for which the definition, identification, representation, and permissible values are specified by means of a set of characteristics. 3. A representation of a data characteristic variation in the logical or physical data model. A data attribute may or may not be atomic." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"A single unit of data." (SQL Server 2012 Glossary, "Microsoft", 2012)

"A primitive item of data; one that has a value within the context of study and is not further decomposed." (James Robertson et al, "Complete Systems Analysis: The Workbook, the Textbook, the Answers", 2013)

"A unit of data (fact) that can be uniquely defined and used. Example: last name is a data element that can be defined as the family name of an individual and is distinct from other name-related elements." (Gregory Lampshire, "The Data and Analytics Playbook", 2016)

"A basic unit of information that has a unique meaning and subcategories (data items) of distinct value. Examples of data elements include gender, race, and geographic location." (CNSSI 4009-2015)

Data Management: Data Literacy (Definitions)

"Understanding what data mean, including how to read charts appropriately, draw correct conclusions from data and recognize when data are being used in misleading or inappropriate ways." (Jake R Carlson et al., "Determining Data Information Literacy Needs: A Study of Students and Research Faculty", 2011) [source]

"Data literacy is the ability to collect, manage, evaluate, and apply data, in a critical manner." (Chantel Ridsdale et al, "Strategies and Best Practices for Data Literacy Education", [knowledge synthesis report] 2016) [source]

"The data-literate individual understands, explains, and documents the utility and limitations of data by becoming a critical consumer of data, controlling his/her personal data trail, finding meaning in data, and taking action based on data. The data-literate individual can identify, collect, evaluate, analyze, interpret, present, and protect data." (IBM, Building "Global Interest in Data Literacy: A Dialogue", [workshop report] 2016) [source]

"The ability to recognize, evaluate, work with, communicate, and apply data in the context of business priorities and outcomes." (Forrester)

"Data literacy is the ability to derive meaningful information from data, just as literacy in general is the ability to derive information from the written word." (Techtarget) [source]

"Data literacy is the ability to read, work with, analyze and communicate with data, building the skills to ask the right questions of data and machines to make decisions and communicate meaning to others. "(Qlik) [source]

"Data literacy is the ability to read, write and communicate data in context, with an understanding of the data sources and constructs, analytical methods and techniques applied, and the ability to describe the use case application and resulting business value or outcome." (Gartner)

"Data literacy is the ability to read, work with, analyze and communicate with data. It’s a skill that empowers all levels of workers to ask the right questions of data and machines, build knowledge, make decisions, and communicate meaning to others." (Sumo Logic) [source]

"Data literacy is the skill set of reading, communicating, and deriving meaningful information from data. Collecting the data is only the first step. The real value comes from being able to put the information in context and tell a story." (Sisense) [source]

18 January 2017

Data Management: Business Rules (Definitions)

"A statement expressing a policy or condition that governs business actions and establishes data integrity guidelines." (Larry P English, "Improving Data Warehouse and Business Information Quality", 1999)

"An organizational standard operating procedure that requires that certain policies be followed to ensure that a business is run correctly. Business rules ensure that the database maintains its accuracy with business policies."  (Microsoft Corporation, "Microsoft SQL Server 7.0 System Administration Training Kit", 1999)

"[…] a business rule is a compact statement about an aspect of a business. The rule can be expressed in terms that can be directly related to the business, using simple, unambiguous language that's accessible to all interested parties: business owner, business analyst, technical architect, and so on." (Tony Morgan, "Business Rules and Information Systems", 2002) 

"the set of conditions that govern a business event so that it occurs in a way that is acceptable to the business." (Barbara von Halle, 2002)

"The logical rules that are used to run a business." (Anthony Sequeira & Brian Alderman, "The SQL Server 2000 Book", 2003)

"A set of methods or guidelines associated with a company’s data and business processing that reflect its methods of conducting business operations." (Jill Dyché & Evan Levy, "Customer Data Integration" , 2006)

"A statement that defines or constrains some aspect of the business. It is intended to assert business structure or to control or influence the behavior of the business." (Alex Berson & Lawrence Dubov, "Master Data Management and Customer Data Integration for a Global Enterprise", 2007)

"Business-specific rule that constrains the data." (Rod Stephens, "Beginning Database Design Solutions", 2008)

"The defined operations and constraints that help organizations create a data environment that promotes efficient operations and decision making. An example of a business rule for a hospital would be that no male patient can be marked pregnant. Organizations typically have thousands of business rules, but not all facets of the same organizations follow all of them, and, in some cases, the rules can conflict." (Tony Fisher, "The Data Asset", 2009)

"Either a set of conditions, a directive, or an 'element of guidance'. A constraint on a business’s behavior. There is not yet an industry standard definition of business rule although authors seem to be converging." (David C Hay, "Data Model Patterns: A Metadata Map", 2010)

"A directive, intended to govern, guide or influence business behavior, in support of a business policy that has been formulated in response to an opportunity, threat, strength or weakness." (The Business Rules Group, "The Business Motivation Model: Business Governance in a Volatile World", 2005)

"An element of guidance that introduces an obligation or necessity, [and] that is under business jurisdiction" (Business Rules Team, 'Semantics of Business Vocabulary and Business Rules", 2005)

"The logical rules that are used to run a business" (Microsoft)

16 January 2017

Data Management: Data Flow (Definitions)

"The sequence in which data transfer, use, and transformation are performed during the execution of a computer program."  (IEEE," IEEE Standard Glossary of Software Engineering Terminology", 1990)

"A component of a SQL Server Integration Services package that controls the flow of data within the package." (Marilyn Miller-White et al, "MCITP Administrator: Microsoft® SQL Server™ 2005 Optimization and Maintenance 70-444", 2007)

"Activities of a business process may exchange data during the execution of the process. The data flow graph of the process connects activities that exchange data and - in some notations - may also represent which input/output parameters of the activities are involved." (Cesare Pautasso, "Compiling Business Process Models into Executable Code", 2009)

"Data dependency and data movement between process steps to ensure that required data is available to a process step at execution time." (Christoph Bussler, "B2B and EAI with Business Process Management", 2009)

[logical data flow:] "A data flow diagram that describes the flow of information in an enterprise without regard to any mechanisms that might be required to support that flow." (David C Hay, "Data Model Patterns: A Metadata Map", 2010)

[physical data flow:] "A data flow diagram that identifies and represents data flows and processes in terms of the mechanisms currently used to carry them out." (David C Hay, "Data Model Patterns: A Metadata Map", 2010)

"The fact that data, in the form of a virtual entity class, can be sent from a party, position, external entity, or system process to a party, position, external entity, or system process." (David C Hay, "Data Model Patterns: A Metadata Map", 2010)

"An abstract representation of the sequence and possible changes of the state of data objects, where the state of an object is any of: creation, usage, or destruction [Beizer]." (International Qualifications Board for Business Analysis, "Standard glossary of terms used in Software Engineering", 2011)

"Data flow refers to the movement of data from one purpose to another; also the movement of data through a set of systems, or through a set of transformations within one system; it is a nontechnical description of how data is processed. See also Data Chain." (Laura Sebastian-Coleman, "Measuring Data Quality for Ongoing Improvement ", 2012)

"The movement of data through a group of connected elements that extract, transform, and load data." (Microsoft, "SQL Server 2012 Glossary", 2012)

"A path that carries packets of information of known composition; a roadway for data. Every data flow’s composition is recorded in the data dictionary." (James Robertson et al, "Complete Systems Analysis: The Workbook, the Textbook, the Answers", 2013)

"the path, in information systems or otherwise, through which data move during the active phase of a study." (Meredith Zozus, "The Data Book: Collection and Management of Research Data", 2017)

"The lifecycle movement and storage of data assets along business process networks, including creation and collection from external sources, movement within and between internal business units, and departure through disposal, archiving, or as products or other outputs." (Kevin J Sweeney, "Re-Imagining Data Governance", 2018)

"A graphical model that defines activities that extract data from flat files or relational tables, transform the data, and load it into a data warehouse, data mart, or staging table." (Sybase, "Open Server Server-Library/C Reference Manual", 2019)

"An abstract representation of the sequence and possible changes of the state of data objects, where the state of an object is any of: creation, usage, or destruction." (Software Quality Assurance)

Data Management: Data Quality Management (Definitions)

[Total Data Quality Management:] "An approach that manages data proactively as the outcome of a process, a valuable asset rather than the traditional view of data as an incidental by-product." (Karolyn Kerr, "Improving Data Quality in Health Care", 2009)

"The application of total quality management concepts and practices to improve data and information quality, including setting data quality policies and guidelines, data quality measurement (including data quality auditing and certification), data quality analysis, data cleansing and correction, data quality process improvement, and data quality education." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"Data Quality Management (DQM) is about employing processes, methods, and technologies to ensure the quality of the data meets specific business requirements." (Mark Allen & Dalton Cervo, "Strategy, Scope, and Approach" [in "Multi-Domain Master Data Management"], 2015)

"DQM is the management of company data in a manner aware of quality. It is a sub-function of data management and analyzes, improves and assures the quality of data in the company. DQM includes all activities, procedures and systems to achieve the data quality required by the business strategy. Among other things, DQM transfers approaches for the management of quality for physical goods to immaterial goods like data." (Boris Otto & Hubert Österle, "Corporate Data Quality", 2015)

"Data quality management (DQM) is a set of practices aimed at improving and maintaining the quality of data across a company’s business units." (altexsoft) [source]

"Data quality management is a set of practices that aim at maintaining a high quality of information. DQM goes all the way from the acquisition of data and the implementation of advanced data processes, to an effective distribution of data. It also requires a managerial oversight of the information you have." (Data Pine) [source]

"Data quality management is a setup process, which is aimed at achieving and maintaining high data quality. Its main stages involve the definition of data quality thresholds and rules, data quality assessment, data quality issues resolution, data monitoring and control." (ScienceSoft) [source]

"Data quality management is the act of ensuring suitable data quality." (Xplenty) [source]

"Data quality management provides a context-specific process for improving the fitness of data that’s used for analysis and decision making. The goal is to create insights into the health of that data using various processes and technologies on increasingly bigger and more complex data sets." (SAS) [source]

"Data quality management (DQM) refers to a business principle that requires a combination of the right people, processes and technologies all with the common goal of improving the measures of data quality that matter most to an enterprise organization." (BMC) [source]

"Put most simply, data quality management is the process of reviewing and updating your customer data to minimize inaccuracies and eliminate redundancies, such as duplicate customer records and duplicate mailings to the same address." (EDQ) [source]

12 January 2017

Data Management: Reference Data (Definitions)

"Reference data is focused on defining and distributing collections of common values to support accurate and efficient processing of operational and analytical activities." (Martin Oberhofer et al, "Enterprise Master Data Management", 2008)

"Sets of values or classification schemas referred to by systems, applications, data stores, processes, and reports, as well as by transactional and master records. Examples include lists of valid values, code lists, status codes, flags, product types, charts of accounts, product hierarchy." (Danette McGilvray, "Executing Data Quality Projects", 2008)

"Data that describe the infrastructure of an enterprise. These comprise the 'type' entity classes that provide lists of values for other attributes." (David C Hay, "Data Model Patterns: A Metadata Map", 2010)

"Data characterized by shared read operations and infrequent changes. Examples of reference data include flight schedules and product catalogs. Windows Server AppFabric offers the local cache feature for storing this type of data." (Microsoft, "SQL Server 2012 Glossary", 2012)

"Corporate data that has been defined externally and is uniformly changed across company boundaries, such as country codes, currency codes and geo-data." (Boris Otto & Hubert Österle, "Corporate Data Quality", 2015)

"Reference data is commonly used to link and give additional details to the data. It is the data used to classify, organize, or categorize other data. Reference data can also contain value hierarchies, for example, the relationships between product and geographic hierarchies. It is escorted by the discipline Reference Data Management, which makes sure the reference data is consistent and that different versions are managed and distributed properly." (Piethein Strengholt, "Data Management at Scale", 2020)

Data Management: Data Lifecycle (Definitions)

"The data life cycle is the set of processes a dataset goes through from its origin through its use(s) to its retirement. Data that moves through multiple systems and multiple uses has a complex life cycle." (Laura Sebastian-Coleman, "Measuring Data Quality for Ongoing Improvement ", 2012)

"The recognition that as data ages, that data takes on different characteristics" (Daniel Linstedt & W H Inmon, "Data Architecture: A Primer for the Data Scientist", 2014)

"The development of a record in the company’s IT systems from its creation until its deletion. This process may also be designated as “CRUD”, an acronym for the Create, Read/Retrieve, Update and Delete database operations." (Boris Otto & Hubert Österle, "Corporate Data Quality", 2015)

"The series of stages that data moves though from initiation, to creation, to destruction. Example: the data life cycle of customer data has four distinct phases and lasts approximately eight years." (Gregory Lampshire, "The Data and Analytics Playbook", 2016)

10 January 2017

Data Management: Metadata (Definitions)

"Data about data. That is, information about the properties of data, such as the type of data in a column (numeric, text, and so on) or the length of a column, information about the structure of data, or information that specifies the design of objects such as cubes or dimensions. Metadata is an important aspect of SQL Server, Data Transformation Services, and OLAP Services." (Microsoft Corporation, "SQL Server 7.0 System Administration Training Kit", 1999)

"'Data about data' - for example, all information in the data dictionary." (Bill Pribyl & Steven Feuerstein, "Learning Oracle PL/SQL", 2001)

"Any data maintained to support the operations or use of a data warehouse, similar to an encyclopedia for the data warehouse. Nearly all data staging and access tools require some private meta data in the form of specifications or status. There are few coherent standards for meta data viewed in a broader sense. Distinguished from the primary data in the dimension and fact tables." (Ralph Kimball & Margy Ross, "The Data Warehouse Toolkit 2nd Ed ", 2002)

"Data (or information) about data. In the CLR, metadata is used to describe assemblies and types. It is stored with them in the executable files, and is used by compilers, tools, and the runtime system to provide a wide range of services. Metadata is essential for runtime type information and dynamic method invocation. Many architectures/systems use metadata - for example, type libraries in COM provide metadata." (Damien Watkins et al, "Programming in the .NET Environment", 2002)

"Generally described as data about data. It is the data, beyond the data, describing the context in which the data resides." (William A Giovinazzo, "Internet-Enabled Business Intelligence", 2002)

"Information inside an assembly that describes its types. Metadata is required by .NET compilers for binding, required by the CLR for many of its services, and used by object browsers and IntelliSense to provide a rich programming experience. Metadata is the .NET version of COM type information (as found in a type library), but much more expressive." (Adam Nathan, ".NET and COM: The Complete Interoperability Guide", 2002)

"Information about the properties of data, such as the type of data in a column (numeric, text, and so on) or the length of a column." (Anthony Sequeira & Brian Alderman, "The SQL Server 2000 Book", 2003)

"(1) data about data; (2) the description of the structure, content, keys, indexes, and so forth, of data." (William H Inmon, "Building the Data Warehouse", 2005)

"Information about the properties of data, such as the type of data in a column (numeric, text, and so on) or the length of a column. It can also be information about the structure of data or information that specifies the design of objects, such as cubes or dimensions." (Thomas Moore, "EXAM CRAM™ 2: Designing and Implementing Databases with SQL Server 2000 Enterprise Edition", 2005)

"Literally, data about data. Metadata includes data associated with either an information system or an information object for description, administration, legal requirements, technical functionality, use, and preservation. Business metadata includes business names and unambiguous definitions of the data including examples and business rules for the data. Technical metadata is information about column width, data types, and other technical information that would be useful to a programmer or database administrator (DBA)." (Sharon Allen & Evan Terry, "Beginning Relational Data Modeling 2nd Ed.", 2005)

"Data which provides context or otherwise describes information in order to make it more valuable (e.g., more easily retrievable or maintainable); data about data." (Martin J Eppler, "Managing Information Quality" 2nd Ed., 2006)

"Information about how data is stored and structured as well as what the data means." (Reed Jacobsen & Stacia Misner, "Microsoft SQL Server 2005 Analysis Services Step by Step", 2006)

"Information about the properties of data, such as the type of data in a column (numeric, text, and so on) or the length of a column. Metadata can also be information about the structure of data or information that specifies the design of objects, such as cubes or dimensions." (Thomas Moore, "MCTS 70-431: Implementing and Maintaining Microsoft SQL Server 2005", 2006)

"Metadata usually refers to definitions and business rules that have been agreed on and stored in a centralized repository so that the business users - even those across departments and systems - use common terminology for key business terms. Metadata can include information about data’s currency, ownership, source system, derivation (e.g., profit = revenues minus costs), or usage rules. " (Jill Dyché & Evan Levy, "Customer Data Integration: Reaching a Single Version of the Truth", 2006)

"The tables and the fields defining the structure of the data; the data about the data." (Gavin Powell, "Beginning Database Design", 2006)

"(1) Data about data; (2) the description of the structure, content, keys, and indexes of data." (William H Inmon & Anthony Nesavich, "Tapping into Unstructured Data", 2007)

"Data about data is meta data. In other words, metadata is the data about the structure of the data in a database." (S. Sumathi & S. Esakkirajan, "Fundamentals of Relational Database Management Systems", 2007)

"All the information that defines and describes the structures, operations, and contents of the DW/BI system. We identify three types of metadata in the DW/BI system: technical, business, and process." (Ralph Kimball, "The Data Warehouse Lifecycle Toolkit", 2008) 

"Data about data that label, describe, or characterize other data, and make it easier to retrieve, interpret, or use information. Major types include technical, business, and audit trail metadata. (See the definitions for the individual types.)" (Danette McGilvray, "Executing Data Quality Projects", 2008)

"Data about the database such as table names, column names, column data types, column lengths, keys, and indexes. Some relational databases allow you to query tables that contain the database's metadata." (Rod Stephens, "Beginning Database Design Solutions", 2008)

"In general terms, we will use the term metadata for descriptive information that is useful for people or systems to understand something. Common examples include a database catalog or an XML schema, both of which describe the structure of data." (Martin Oberhofer et al, "Enterprise Master Data Management", 2008)

"Data about the organization’s data, found in every data source throughout the enterprise. Metadata describes the information in these data resources. Metadata can be technical, describing the physical characteristics of the data, or it can be business-oriented, describing the way the data represents the needs of the business." (Tony Fisher, "The Data Asset", 2009)

"May be regarded as a subset of data, and are data about data. Metadata summarise data content, context, structure, inter-relationships, and provenance (information on history and origins). They add relevance and purpose to data, and enable the identification of similar data in different data collections." (Mark Olive, "SHARE: A European Healthgrid Roadmap", 2009)

"The definitions, mappings, and other characteristics used to describe how to find, access, and use the company’s data and software components." (Judith Hurwitz et al, "Service Oriented Architecture For Dummies" 2nd Ed., 2009)

"The information describing the properties, such as the type of data in a column (numeric, text, and so on), the length of a column, the structure of database objects, such as tables, measures, dimensions, and cubes, and so on." (Jim Joseph, "Microsoft SQL Server 2008 Reporting Services Unleashed", 2009)

"Data about data and data processes. Metadata is important because it aids in clarifying and finding the actual data." (David Lyle & John G Schmidt, "Lean Integration", 2010)

"Data about data and data processes. Metadata is important because it aids in clarifying and finding the actual data." (David Lyle & John G Schmidt, "Lean Integration: An Integration Factory Approach to Business Agility", 2010)

"Data about data, that is, data concerning data characteristics and relationships." (Carlos Coronel et al, "Database Systems: Design, Implementation, and Management" 9th Ed., 2011)

"Literally, ‘data about data’; data that defines and describes the characteristics of other data, used to improve both business and technical understanding of data and data-related processes." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"Way of describing data so that it can be used by a wide variety of applications." (Linda Volonino & Efraim Turban, "Information Technology for Management 8th Ed", 2011)

"Data about the data; a description or definition of the rows, columns, and/or links in a data set." (Gary Miner et al, "Practical Text Mining and Statistical Analysis for Non-structured Text Data Applications", 2012) 

"Information about the properties or structure of data that is not part of the values the data contains." (SQL Server 2012 Glossary, "Microsoft", 2012)

"Metadata is usually defined as 'data about data', but it would be better defined as explicit knowledge, documented to enable a common understanding of an organization’s data, including what the data is intended to represent (definition of terms and business rules), how it effects this representation (conventions of representation, data definition, system design, system processes), the limits of that representation (what it does not represent), what happens to it as it moves through processes and systems (provenance, lineage, information chain and information life cycle), how data is used and can be used, and how it should not be used." (Laura Sebastian-Coleman, "Measuring Data Quality for Ongoing Improvement ", 2012)

"The simplest definition of metadata is 'data about data'. To be a bit more precise, metadata describes data, providing information such as type, length, textual description, and other characteristics." (Craig S Mullins, "Database Administration: The Complete Guide to DBA Practices and Procedures 2nd Ed", 2012)

"Information stored within an assembly concerning the classes defined in that assembly (such as names and types of fields, method signatures, dependence on other classes, and so on)." (Mark Rhodes-Ousley, "Information Security: The Complete Reference, Second Edition" 2nd Ed., 2013)

"The definitions, mappings, and other characteristics used to describe how to find, access, and use the company’s data and software components." (Marcia Kaufman et al, "Big Data For Dummies", 2013)

"This term can mean a number of things depending on the context in which it is used. It can denote how a set of information is structured, such as the ISBN values assigned to books, the format of the UPC barcodes, and the Library of Congress classifications used in catalog books. It can also be a keyword assigned to a set of data to make it more easily searched for. For example, the list of keywords at the beginning of this book or the definition for hashtag used in online text message exchanges." (Kenneth A Shaw, "Integrated Management of Processes and Information", 2013)

"Data about data, or detailed information describing context, content, and structure of records and their management through time." (Robert F Smallwood, "Information Governance: Concepts, Strategies, and Best Practices", 2014)

"Descriptive data about data that is stored and managed in a database, in order to facilitate access to captured and archived data for further use." (Jim Davis & Aiman Zeid, "Business Transformation: A Roadmap for Maximizing Organizational Insights", 2014)

"The classic definition of metadata is 'data about the data'." (Daniel Linstedt & W H Inmon, "Data Architecture: A Primer for the Data Scientist", 2014)

"The definitions, mappings, and other characteristics used to describe how to find, access, and use the company’s data and software components." (Judith S Hurwitz, "Cognitive Computing and Big Data Analytics", 2015)

"Data about data, such as definitions, lists of values and access rights." (Boris Otto & Hubert Österle, "Corporate Data Quality", 2015)

"Data holding the description of other data. Meta means 'an underlying description'. Misnomer Term that suggests a wrong meaning or inappropriate name." (Hamid R Arabnia et al, "Application of Big Data for National Security", 2015)

"Artifacts of events and objects and contextual information that helps us understand the structure and meaning of data or facts. Example: the definitions of our data elements are metadata we store in the business glossary." (Gregory Lampshire, "The Data and Analytics Playbook", 2016)

"Metadata is often defined as 'data about data', a definition that is nearly as ubiquitous as it is unhelpful. A more content-full definition of metadata is that it is structured description for information resources of any kind." (Robert J Glushko, "The Discipline of Organizing: Professional Edition" 4th Ed., 2016)

"Data associated with other data that describes some important characteristics of the data to which it is bound. For example, the file length and file type associated with a file are metadata." (O Sami Saydjari, "Engineering Trustworthy Systems: Get Cybersecurity Design Right the First Time", 2018)

"Data that describes the characteristics of data; descriptive data." (Sybase, "Open Server Server-Library/C Reference Manual", 2019)

"Metadata describes the data itself. The term metadata is often used in relation to digital media, but in today’s world it plays a vital role in the overall data strategy and architectural design. Obviously metadata is companioned with the discipline metadata data management." (Piethein Strengholt, "Data Management at Scale", 2020)

"A repository whose data associates the tables and columns of a data warehouse with user-defined attributes and facts to enable the mapping of the business view, terms, and needs to the underlying database structure. Metadata can reside on the same server as the data warehouse or on a different database server. It can even be held in a different RDBMS." (Microstrategy)

"A set of data that gives information about other data." (Insight Software)

"descriptive data about data that is stored and managed in a database, in order to facilitate access to captured and archived data for further use." (SAS)

"Information about the properties or structure of data that is not part of the values the data contains." (Microsoft)

"Information describing the characteristics of data including, for example, structural metadata describing data structures (e.g., data format, syntax, and semantics) and descriptive metadata describing data contents (e.g., information security labels)." (NIST SP 800-53)

"Metadata describes other data within a database and is responsible for organization while a business or organization sifts through data sets." (Solutions Review)

"Metadata is data that summarizes information about other data." (Logi Analytics)

"Metadata is information that describes various facets of an information asset to improve its usability throughout its life cycle. It is metadata that turns information into an asset. Generally speaking, the more valuable the information asset, the more critical it is to manage the metadata about it, because it is the metadata definition that provides the understanding that unlocks the value of data." (Gartner)

"Refers to 'data about data', such as: means of creation of the data, purpose of the data, time and date of creation, author of the data, location of the data, and standards used when created." (Board International)


03 January 2017

Data Management: Transactional Data (Definitions)

"Data about the day-to-day dynamic activities of a company, such as invoices." (Gavin Powell, "Beginning Database Design", 2006)

"Data that describe an internal or external event or transaction that takes place as an organization conducts its business. Examples include sales orders, invoices, purchase orders, shipping documents, passport applications, credit card payments, and insurance claims. Transactional data are typically grouped into transactional records, which include associated master and reference data." (Danette McGilvray, "Executing Data Quality Projects", 2008)

"The set of records of individual business activities or events." (Janice M Roehl-Anderson, "IT Best Practices for Financial Managers", 2010)

"Data related to sales, deliveries, invoices, trouble tickets, claims, and other monetary and non-monetary interactions." (Microsoft, "SQL Server 2012 Glossary", 2012)

"A type of data that gathers information about contracts, deliveries, invoices, payments and so forth and exhibits a high frequency of change. Transaction data provide a key to the activities of the core business objects." (Boris Otto & Hubert Österle, "Corporate Data Quality", 2015)

"Information stored from a time-based instance, like a bank deposit or phone call." (Jason Williamson, "Getting a Big Data Job For Dummies", 2015)

"Master data and reference data with associated time dimension." (Hamid R Arabnia et al, "Application of Big Data for National Security", 2015)


02 January 2017

Lessons Learned: Documentation

Introduction


“Documentation is a love letter that you write to your future self.”
Damian Conway

    For programmers, as well as for other professionals who write code, documentation might seem a waste of time, an effort few are willing to make. On the other hand, documenting important facts can save time and provide a useful base for building one’s own and others’ knowledge. I sometimes found out the hard way what I needed to document. With the hope that others will benefit from my experience, here are my lessons learned:

 

Lesson #1: Document the tasks you worked on


“The more transparent the writing, the more visible the poetry.”
Gabriel Garcia Marquez


   Personally, I like to keep a daily list of what I worked on: typically nothing more than a 3-5 word description of the task, who requested it and, where applicable, the corresponding project, CR or ticket. I do it because it makes it easier to track my work over time, especially when I have to retrieve a piece of information that is documented in detail somewhere else.

   Within the same list one can also track the effective time worked on a task, though I sometimes find it difficult, especially when working on several tasks simultaneously. In theory this can be used to estimate similar work in the future. One can also use a categorization that distinguishes, for example, between the various types of work: design, development, maintenance, testing, etc. This approach offers finer granularity, especially in estimations, though more work is needed to track the time accurately. Therefore, track the information that is worth tracking, as long as there is value in it.

   Documenting tasks offers not only easier retrieval and a base for more accurate estimations, but also visibility into my work, for me as well as, if necessary, for others. In addition, it can be a useful sensemaking tool (into my work) over time.
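
   As an illustration, such a task list could be kept in a few lines of Python. This is only a sketch: the `TaskEntry` fields, the `hours_by_category` helper and the sample entries are hypothetical, and a spreadsheet or text file serves the same purpose.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class TaskEntry:
    day: date
    description: str     # the 3-5 word summary of the task
    requested_by: str
    reference: str = ""  # project, CR or ticket, if any
    category: str = ""   # e.g. design, development, maintenance, testing
    hours: float = 0.0   # effective time worked, if tracked


def hours_by_category(entries):
    """Sum the tracked hours per category as a rough base for estimates."""
    totals = defaultdict(float)
    for entry in entries:
        totals[entry.category or "uncategorized"] += entry.hours
    return dict(totals)


# Made-up sample entries.
log = [
    TaskEntry(date(2017, 1, 2), "Fix invoice report", "Finance", "TICKET-1", "maintenance", 1.5),
    TaskEntry(date(2017, 1, 2), "Design staging tables", "BI team", "PROJ-A", "design", 3.0),
    TaskEntry(date(2017, 1, 3), "Patch invoice view", "Finance", "TICKET-2", "maintenance", 0.5),
]
totals = hours_by_category(log)
```

   The optional fields mirror the trade-off described above: the description and requester cost almost nothing to record, while category and hours only pay off if they are tracked consistently.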

Lesson #2: Document your code


“Always code as if the guy who ends up maintaining your code will be
a violent psychopath who knows where you live.”
Damian Conway

    Opinions are split over the need to document code. Some people advise against it, and probably one of the most frequent reasons is rooted in the Agile methodology. I have to stress that Agile values “working software over comprehensive documentation”, a fact that doesn’t imply the total absence of documentation. Other reasons are frequently advanced as well, like “there’s no need to document something that’s already self-explanatory” (as good code should be), “no time for it”, etc. There is probably a grain of truth in each statement, especially considering how many requirements for documentation exist in software engineering (see e.g. ISO/IEC 26513:2009).

   Without diving too deep into the subject: document what is worth documenting. This, however, needs to be regarded from a broader perspective, as there might be other people who need to review, modify and manage your code.

    Documenting code isn’t limited to the code that is part of the “deliverables”; it also covers intermediary code written for testing or other activities. Personally, I find it useful to save all the scripts developed within the same day in the same file. When a piece of code has a “definitive” character, I save it individually for reuse or faster retrieval, typically under a meaningful name that makes the file easier to find. It also helps to provide some metadata with the code, like a short description and its purpose (who requested it and when).

   Code versioning can be used as a tool to facilitate the process, though not everything is worth versioning.
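
   The metadata mentioned above (a short description, who requested the code and when) could, for example, be generated as a comment header for each saved script. A minimal sketch, in which the `script_header` function, its field labels and the SQL-style `--` comment prefix are all assumptions made for illustration:

```python
from datetime import date


def script_header(description: str, requested_by: str, requested_on: date) -> str:
    """Build a SQL-style comment block to paste at the top of a saved script."""
    lines = [
        f"Description:  {description}",
        f"Requested by: {requested_by}",
        f"Requested on: {requested_on.isoformat()}",
    ]
    return "\n".join(f"-- {line}" for line in lines)


# Made-up example values.
header = script_header("Open orders per customer", "Sales", date(2017, 1, 2))
```

   Whether the header is generated or typed by hand matters less than having the same few fields in every file, so that a later search finds them.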

 

Lesson #3: Document all issues as well as the steps used for troubleshooting and fixing them


“It’s not an adventure until something goes wrong.”
Yvon Chouinard

   Independently of the types of errors occurring while developing or troubleshooting code, one common characteristic is that errors tend to recur. Therefore I found it useful to document all the errors I got in the form of screenshots, the ways to fix them (including workarounds) and, sometimes, the steps followed to troubleshoot the problem.

   Considering that the issues are rooted in programming fallacies or undocumented issues, there is almost always something to learn from one’s own as well as from others’ errors. In fact, that was the reason why I started the “SQL Troubles” blog: as a way to document some of the issues I met, to provide others some help, and, why not, to get some feedback.

 

Lesson #4: Document software installations and changes in configurations


   At least for me this lesson is rooted in the fact that years back, release candidates as well as final software were quite often not that easy to install: one had to deal with various installation errors rooted in OS or component incompatibilities, invalid or missing permissions, or unexpected presumptions made by the vendor (e.g. default settings). Over the years installation became smoother, though such issues still occur. Documenting the installation in terms of screenshots with the setup settings allows repeating the steps later. It can also provide a base for further troubleshooting when the configuration within the software has changed, or serve as evidence when something goes wrong.


   Talking about changes occurring in the environment, I often found myself troubleshooting something that stopped working, only to discover that something had changed in the environment. It’s useful to document the changes occurring in an environment, an importance stressed also in the “Configuration Management” section of ITIL® (Information Technology Infrastructure Library).

 

Lesson #5: Document your processes


“Verba volant, scripta manent.” Latin proverb
"Spoken words fly away, written words remain."

    In process-oriented organizations one expects the processes to be documented. One can find that this is not always the case, some organizations relying on the common or individual knowledge about the various processes. Or it might happen that the processes aren’t documented to the level of detail needed. What one can do is to document the processes from one’s own perspective, to the level of detail needed.

 

Lesson #6: Document your presumptions


“Presumption first blinds a man, then sets him a running.”
Benjamin Franklin

   Probably this is more of a Project Management related topic, though I find it useful also when coding: define upfront your presumptions/expectations – where libraries should reside, the type and format of content, the files’ structure, the output, and so on. Even if a piece of software is expected to be a black box with inputs and outputs, at least the input, the output and the expectations about the environment need to be specified upfront.
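   One way to make such presumptions explicit in code is to write them down as named checks that run before the actual work starts. The following is only a sketch under stated assumptions: the Presumption class, the check_presumptions and failed functions, and the three example presumptions (a library folder, an existing input file, a CSV extension) are hypothetical and serve only to illustrate the idea.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Presumption:
    """One documented expectation about the environment or the input."""
    description: str  # human-readable statement of what is presumed
    holds: bool       # whether the presumption is currently satisfied


def check_presumptions(input_file: Path, lib_dir: Path) -> list[Presumption]:
    """Evaluate the (hypothetical) presumptions of a script upfront."""
    return [
        Presumption(f"libraries reside in {lib_dir}", lib_dir.is_dir()),
        Presumption(f"input file {input_file} exists", input_file.is_file()),
        Presumption("input is a CSV file", input_file.suffix.lower() == ".csv"),
    ]


def failed(presumptions: list[Presumption]) -> list[str]:
    """Return the descriptions of the presumptions that do not hold."""
    return [p.description for p in presumptions if not p.holds]
```

   The list doubles as documentation: anyone reading it sees what the script expects, and reporting the result of failed() at startup tells the user which expectation wasn’t met instead of leaving them with an obscure error later.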

 

Lesson #7: Document your learning sources


“Intelligence is not the ability to store information, but to know where to find it.”
Albert Einstein

    Computer specialists are heavily dependent on the internet to keep up with the advances in the field, best practices, methodologies, techniques, myths, and other knowledge. Even if one learns something, the degree of retention varies over time, and it can decrease significantly if the knowledge isn’t used for a long time. Nowadays a quick search on the internet finds (almost) everything, though the content available varies in quality and coverage, and it might be difficult to find the same piece of information again. Therefore, independently of the type of source used for learning, I found it useful to document the information sources as well.

 

Lesson #8: Document the known as well as the unknown

 

“A genius without a roadmap will get lost in any country but an average person
with a roadmap will find their way to any destination.”
Brian Tracy

   Over the years I found it useful to map and structure the learned content for further review, sometimes considering only key information about the subject like definitions, applicability, limitations, or best practices, while other times I also provided a level of depth that allows me and others to memorize and understand the topic. As part of the process I attempted to keep the copyright attributions, just in case I need to refer to the source later. Together with what I learned I also considered the subjects that I still have to learn and review for further understanding. This provides a good way to map what I know as well as what I don’t know. For this one can use a rich text editor or knowledge mapping tools like mind mapping or concept mapping software.


Conclusion


    Documentation isn’t limited to pieces of code or software; it also covers the knowledge one acquires, its sources, what it takes to troubleshoot the various types of issues, and the work performed on a daily basis. Documenting all these areas of focus should be based on the principle: “document everything that is worth documenting”.

Data Management: Information (Definitions)

"Information is data that increases the knowledge of the person who consumes it. Information is distinguished from data in that data may or may not be meaningful whereas information is always meaningful. For example, the numeric portion of an address is data, but it is not information." (Microsoft Corporation, "Microsoft SQL Server 7.0 Data Warehouse Training Kit", 2000)

"Usable, processed data, typically output from a computer program." (Greg Perry, "Sams Teach Yourself Beginning Programming in 24 Hours" 2nd Ed., 2001)

"Data that has been processed in such a way that it can increase the knowledge of the person who receives it." (Sharon Allen & Evan Terry, "Beginning Relational Data Modeling" 2nd Ed., 2005)

"data that human beings assimilate and evaluate to solve a problem or make a decision." (William H Inmon, "Building the Data Warehouse", 2005)

"Information is data with context. It can be externally validated and is independent of applications." (Tom Petrocelli, "Data Protection and Information Lifecycle Management", 2005)

"Information can be defined as all inputs that people process to gain understanding." (Martin J Eppler, "Managing Information Quality" 2nd Ed., 2006)

"Sets of data presented in a context. Information about a business and its environment." (Steve Williams & Nancy Williams, "The Profit Impact of Business Intelligence", 2007)

"1. Generally, understanding concerning any objects such as facts, events, things, processes, or ideas, including concepts that, within a certain context and timeframe, have a particular meaning." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"Data that have been organized so they have meaning and value to the recipient." (Linda Volonino & Efraim Turban, "Information Technology for Management" 8th Ed, 2011)

"Refers to all or part of a raw data item, which, on examination, turns out to be of interest. Such interest can be justified by means of explicit criteria. Also denotes an observation conducted in the field." (Humbert Lesca & Nicolas Lesca, "Weak Signals for Strategic Intelligence: Anticipation Tool for Managers", 2011)

"The result of processing raw data to reveal its meaning. Information consists of transformed data and facilitates decision making." (Carlos Coronel et al, "Database Systems: Design, Implementation, and Management" 9th Ed., 2011)

"Data with additional context in the form of metadata, including definition and relationships between data and possibly other information. Data in context with metadata makes information." (Craig S Mullins, "Database Administration", 2012)

"In the context of this book, a collection of descriptors derived from observation, measurement, calculation, inference, or imagination in a form that can be shared with or communicated to others, or both. The format can be tangible or intangible or some combination of both." (Kenneth A Shaw, "Integrated Management of Processes and Information", 2013)

"An organised and formatted collection of data" (David Sutton, "Information Risk Management: A practitioner’s guide", 2014)

"Any communication on or representation of facts or data in all forms (textual, graphical, audiovisual, digital)." (Gilbert Raymond & Philippe Desfray, "Modeling Enterprise Architecture with TOGAF", 2014)

"Data that has been organized or processed in a useful manner" (Nell Dale & John Lewis, "Computer Science Illuminated" 6th Ed., 2015)

"One view understands information to be content- and purpose- specific knowledge, which is exchanged during human communication. Another takes the view of a purely informational processing perspective, according to which data is the building blocks for information. Accordingly, data is processed into information." (Boris Otto & Hubert Österle, "Corporate Data Quality", 2015)

"Data that has been processed to create meaning. Information is intended to expand the knowledge of the person who receives it. Information is the output of decision support systems and information systems." (Ciara Heavin & Daniel J Power, "Decision Support, Analytics, and Business Intelligence" 3rd Ed., 2017)

"Organized or structured data, processed for a specific purpose to make it meaningful, valuable, and useful in specific contexts." (Project Management Institute, "A Guide to the Project Management Body of Knowledge (PMBOK® Guide)", 2017)

"A structured collection of data presented in a form that people can understand and process. Information is converted into knowledge when it is contextualised with the rest of a person’s knowledge and world model." (Open Data Handbook)
