11 February 2017

⛏️Data Management: Data Collection (Definitions)

"The gathering of information through focus groups, interviews, surveys, and research as required to develop a strategic plan." (Teri Lund & Susan Barksdale, "10 Steps to Successful Strategic Planning", 2006)

"The process of gathering raw or primary specific data from a single source or from multiple sources." (Adrian Stoica et al, "Field Evaluation of Collaborative Mobile Applications", 2008) 

"A combination of human activities and computer processes that get data from sources into files. It gets the file data using empirical methods such as questionnaire, interview, observation, or experiment." (Jens Mende, "Data Flow Diagram Use to Plan Empirical Research Projects", 2009)

"A systematic process of gathering and measuring information about the phenomena of interest." (Kaisa Malinen et al, "Mobile Diary Methods in Studying Daily Family Life", 2015)

"The process of capturing events in a computer system. The result of a data collection operation is a log record. The term logging is often used as a synonym for data collection." (Ulf Larson et al, "Guidance for Selecting Data Collection Mechanisms for Intrusion Detection", 2015)

"This refers to the various approaches used to collect information." (Ken Sylvester, "Negotiating in the Leadership Zone", 2015)

"Set of techniques that allow gathering and measuring information on certain variables of interest." (Sara Eloy et al, "Digital Technologies in Architecture and Engineering: Exploring an Engaged Interaction within Curricula", 2016)

"with respect to research, data collection is the recording of data for the purposes of a study. Data collection for a study may or may not be the original recording of the data." (Meredith Zozus, "The Data Book: Collection and Management of Research Data", 2017)

"The process of retrieving data from different sources and storing them in a unique location for further use." (Deborah Agostino et al, "Social Media Data Into Performance Measurement Systems: Methodologies, Opportunities, and Risks", 2018)

"It is the process of gathering data from a variety of relevant sources in an established systematic fashion for analysis purposes." (Yassine Maleh et al, 'Strategic IT Governance and Performance Frameworks in Large Organizations", 2019)

"A process of storing and managing data." (Neha Garg & Kamlesh Sharma, "Machine Learning in Text Analysis", 2020)

"The process and techniques for collecting the information for a research project." (Tiffany J Cresswell-Yeager & Raymond J Bandlow, "Transformation of the Dissertation: From an End-of-Program Destination to a Program-Embedded Process", 2020)

"The method of collecting and evaluating data on selected variables, which helps in analyzing and answering relevant questions is known as data collection." (Hari K Kondaveeti et al, "Deep Learning Applications in Agriculture: The Role of Deep Learning in Smart Agriculture", 2021)

"Datasets are created by collecting data in different ways: from manual or automatic measurements (e.g. weather data), surveys (census data), records of decisions (budget data) or ongoing transactions (spending data), aggregation of many records (crime data), mathematical modelling (population projections), etc." (Open Data Handbook)

⛏️Data Management: Data Mapping (Definitions)

"The process of identifying correspondence between source data elements and target data elements when migrating data." (Microsoft Corporation, "Microsoft SQL Server 7.0 Data Warehouse Training Kit", 2000)

"The process of noting the relationship of a data element to something or somebody." (Sharon Allen & Evan Terry, "Beginning Relational Data Modeling" 2nd Ed., 2005)

"(1) The process of associating one data element, field, or idea with another data element, field, or idea. (2) In source-to-target mapping, the process of determining (and the resulting documentation of) where the data in a source data store will be moved to another (target) data store." (Danette McGilvray, "Executing Data Quality Projects", 2008)

"The assignment of source data entities and attributes to target data entities and attributes, and the resolution of disparate data." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"Data mapping is the process of creating data element mappings between two distinct data models. This activity is considered to be part of data integration." (Piethein Strengholt, "Data Management at Scale", 2020)

"The process defining a link between two disparate data models. It is often the first step towards data integration." (MuleSoft)

"The process of assigning a source data element to a target data element." (Information Management)

"Data mapping is the process of creating data element mappings between two different data models and is used as a first step for a wide array of data integration tasks, including data transformation between a data source and a destination." (Solutions Review)

"Data mapping is the process of defining a link between two disparate data models in the aim of future data integration." (kloudless)

"Data mapping is the process of mapping source data fields to destination related target fields." (Adobe)

09 February 2017

⛏️Data Management: Data Discovery (Definitions)

"Data discovery is the process where an organization obtains an understanding of which data matters the most and identifies challenges with that data. The outcome of data discovery is that the scope of a data quality initiative should be clear and data quality rules can be defined." (Robert Hawker, "Practical Data Quality", 2023)

"The process of analyzing the type, quality, accessibility, and location of data in all available data repositories. It's critical for determining the current state of a data environment, especially when a recent and accurate data dictionary doesn't exist." (Forrester)

"Data Discovery describes a range of techniques designed to collect and consolidate information before an alysing it to find relationships and outliers between entities (or data items) that may exist. This process may be done on data from the same database or across multiple, disparate databases. (experian) [source]

"Data discovery involves the collection and evaluation of data from various sources and is often used to understand trends and patterns in the data." (Tibco) [source]

"Data discovery is not a tool. It is a business user oriented process for detecting patterns and outliers by visually navigating data or applying guided advanced analytics. Discovery is an iterative process that does not require extensive upfront model creation." (BI Survey) [source]

"Data discovery is the process of using a range of technologies that allow users to quickly clean, combine, and analyze complex data sets and get the information they need to make smarter decisions and impactful discoveries." (Qlik) [source]

"The process of analyzing the type, quality, accessibility, and location of data in all available data repositories. It's critical for determining the current state of a data environment, especially when a recent and accurate data dictionary doesn't exist." (Forrester)

06 February 2017

⛏️Data Management: Data Validation (Definitions)

"Evaluating and checking the accuracy, consistency, timeliness, and security of information, for example by evaluating the believability or reputation of its source." (Martin J Eppler, "Managing Information Quality" 2nd Ed., 2006)

"The process of ensuring accurate data based on data acceptance and exception handling rules." (Evan Levy & Jill Dyché, "Customer Data Integration", 2006)

"The process of ensuring that the values of data conform to specified formats and/or values." (Allen Dreibelbis et al, "Enterprise Master Data Management", 2008)

"(1) To confirm the validity of data. (2) A feature of data cleansing tools." (Danette McGilvray, "Executing Data Quality Projects", 2008)

"The act of determining that data is sound. In security, generally used in the context of validating input." (Mark S Merkow & Lakshmikanth Raghavan, "Secure and Resilient Software Development", 2010)

"Determining and confirming that something satisfies or conforms to defined rules, business rules, integrity constraints, defined standards, etc. The system cannot perform any validating unless it first has a definition of the way things should be validity The degree to which data conforms to domain values and defined business rules." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"This involves demonstrating that the conclusions that come from data analyses fulfill their intended purpose and are consistent." (Jules H Berman, "Principles of Big Data: Preparing, Sharing, and Analyzing Complex Information", 2013)

"The act of testing a model with data that was not used in the model-fitting process." (Meta S Brown, "Data Mining For Dummies", 2014)

[data integrity validation:] "Data integrity validation allows you to verify the integrity of the data that was secured by data protection operations." (CommVault, "Documentation 11.20", 2018)

05 February 2017

⛏️Data Management: Data Stewardship (Definitions)

"Data stewardship is the function that is largely responsible for managing data as an enterprise asset. The data steward is responsible for ensuring that the data provided by the Corporate Information Factory is based on an enterprise view. An individual, a committee, or both may perform data stewardship." (Claudia Imhoff et al, "Mastering Data Warehouse Design", 2003)

"Necessary for resolving conflicts in peer-to-peer replication, data stewardship involves assigning an owner to data that resides on multiple servers and creating rules for how the data should be updated." (Sara Morganand & Tobias Thernstrom , "MCITP Self-Paced Training Kit : Designing and Optimizing Data Access by Using Microsoft SQL Server 2005 - Exam 70-442", 2007)

"An approach to data governance that formalizes accountability for managing information resources on behalf of others and in the best interests of the organization." (Danette McGilvray, "Executing Data Quality Projects", 2008)

"Deals with the ownership and accountability of data, and how people manage the data to the benefit of the organization. Data stewardship functions at two levels - Business Data Stewards deal with the higher-level metadata and governance concerns, while Operational Data Stewards focus primarily on the instances of master data in the enterprise." (Allen Dreibelbis et al, "Enterprise Master Data Management", 2008)

"This is known as a role assigned to a person that is responsible for defining and executing data governance policies." (Martin Oberhofer et al, "The Art of Enterprise Information Architecture", 2010)

"1.The formal, specifically assigned, and entrusted accountability for business (non-technical) responsibilities ensuring effective control and use of data and information resources. 2.The formal accountability for business responsibilities ensuring effective control and use of data assets. (DAMA International, "The DAMA Dictionary of Data Management" 1st Ed., 2011)

"Responsibility and accountability for the actions taken upon a defined set of data, including the definition of the consumers of the data. A data steward is not necessarily the data owner." (Craig S Mullins, "Database Administration: The Complete Guide to DBA Practices and Procedures" 2nd Ed., 2012)


04 February 2017

💠🛠️SQL Server: Administration (Killing Sessions - Killing ‘em Softly and other Snake Stories)

Introduction

There are many posts on the web advising succinctly how to resolve a blocking situation by terminating a session via the kill command, though few of them warn about its use and the important aspects that need to be considered. The command is powerful and, to use an old adage, “with great power comes great responsibility”, a responsibility that is not felt when reading between the lines. The ease with which people treat the topic can be seen in questions like “is it possible to automate terminating sessions?” or in explicit recommendations to terminate sessions when dealing with blockings.

A session is created when a client connects to an RDBMS (Relational Database Management System) like SQL Server, being nothing but an internal logical representation of the connection. It is used further on to perform work against the database(s) via (batches of) SQL statements. Throughout its lifetime, a session is uniquely identified by an SPID (Server Process ID) and addresses one SQL statement at a time. Therefore, when a problem with a session occurs, it can be traced back to a query, where the actual troubleshooting needs to be performed.

Even if each session has a defined scope and memory space, and cannot interact with other sessions, sessions can block each other when attempting to use the same data resources. Thus, a blocking occurs when one session holds a lock on a specific resource and a second session attempts to acquire a conflicting lock type on the same resource. In other words, the first session blocks the second session from acquiring a resource. It’s like a fast-food drive-in in which cars must line up in a queue to place an order. The second car can’t place an order until the first has received its order, so it is blocked from placing an order. The third car must wait for the second, and so on. Similarly, sessions wait in line for a resource, a fact that leads to a blocking chain, with a head (the head or lead blocker) and a tail (the sessions that follow). It’s a FIFO (first in, first out) queue and, using a little imagination, one can compare it metaphorically with a snake. Even if imperfect, the metaphor is appropriate for highlighting some important aspects that can be summed up as follows:

  • Snakes have their roles in the ecosystem
  • Not all snakes are dangerous
  • Grab the snake by its head
  • Killing ‘em Softly
  • Search for a snake’s nest
  • Snakes can kill you in sleep
  • Snake taming

Warning: snakes, as well as blockings, need to be handled by a specialist, so don’t do it yourself unless you know what you are doing!

Snakes have their roles in the ecosystem

Snakes as middle-order predators have an important role in natural ecosystems, as they feed on prey species, whose numbers would increase exponentially if not kept under control. Fortunately, natural ecosystems have such mechanisms, which tend to regulate them automatically. Artificially built ecosystems need such auto-regulation mechanisms as well. As a series of dynamic mechanisms and components that work together toward a purpose, SQL Server is an (artificial) ecosystem that tends to auto-regulate itself. When its environment is adequately sized to handle the volume of information or data it must process, the system will behave smoothly. As soon as it starts processing more data than it can handle, it starts misbehaving to the degree that one of its resources gets exhausted.

Just because a blocking occurs doesn’t mean that it is a bad thing and needs to be terminated. Temporary blockings occur all the time, as an unavoidable characteristic of any RDBMS with lock-based concurrency like SQL Server. They are however easier to observe in systems with heavy workload and concurrent access. The more users in the system touch the same data, the higher the chances for a block to occur. Good database design and application architecture typically minimize the occurrence and duration of blockings, making them almost unobservable. At the opposite extreme, poor database design combined with poor application design can turn blockings into a DBA’s nightmare. Persistent blockings can be a sign of poor database or application design, or a sign that one of the environment’s limits was reached. It’s a sign that something must be done. Restarting SQL Server, terminating sessions or adding more resources have only a temporary effect. The opportunity lies typically in addressing poor database and application design issues, though this becomes costlier with time.

Not all snakes are dangerous

A snake’s size is the easiest characteristic for identifying whether a snake is dangerous or not. Big snakes inspire fear in any mortal. Similarly, “big” blockings (blockings consuming an important percentage of the available resources) are dangerous and have the potential of bringing the whole server down, eating its memory resources slowly until its life comes to a stop. It can be a slow as well as a fast death.

Independently of their size, poisonous snakes are a danger for any living creature. By studying snakes’ characteristics like pupil shape and skin color patterns, people devised simple general rules (with local applicability) for identifying whether snakes are poisonous or not. Thus, snakes with diamond-shaped pupils or having color patterns in which red touches yellow are likely/believed to be poisonous. By observing the behavior of blockings and learning about SQL Server’s internals, one can with time understand the impact of each blocking on the server’s performance.

Grab the snake by its head

Restraining a snake’s head assures that the snake is not able to bite, though it can be dangerous, as the snake might believe it is dealing with a predator that is trying to hurt it, and react accordingly. Similarly, troubleshooting blockings must start with the head, the blocking session, as it’s the one which created the blocking problem in the first place.

In SQL Server sp_who and its alternative sp_who2 provide a list of all sessions, with their status, SPID and a reference to the SPID of the session blocking them. They thus display all the blocking pairs. When one deals with a few blockings, one can easily see whether the sessions form a blocking chain. Especially in environments under heavy load, one can deal with so many blockings that it becomes difficult to identify all the blocking chains that have formed. Identifying blocking chains is necessary because terminating the head blocker directly will often make the whole blocking chain disappear. The other sessions in the chain will thus perform their work undisturbed.
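The idea of reducing blocking pairs to chains and their heads can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `find_blocking_chains` and the sample SPIDs are hypothetical, and the input is assumed to be a mapping already extracted from the blocking pairs (for example from sp_who's output), in which 0 means that the session is not blocked.

```python
from collections import defaultdict


def find_blocking_chains(sessions):
    """Group blocked sessions under their head blocker.

    `sessions` maps each SPID to the SPID blocking it (0 = not blocked).
    Returns {head_spid: [SPIDs waiting in that chain, sorted]}.
    """
    chains = defaultdict(list)
    for spid, blocker in sessions.items():
        if blocker == 0:
            continue  # not blocked, so it is not in the tail of any chain
        # Walk up the chain until reaching a session that is not blocked.
        # `seen` guards against cycles, so the walk always terminates.
        head, seen = blocker, {spid}
        while sessions.get(head, 0) != 0 and head not in seen:
            seen.add(head)
            head = sessions[head]
        chains[head].append(spid)
    return {head: sorted(tail) for head, tail in chains.items()}


# Hypothetical sample: 51 blocks 52 and 53, 53 blocks 54; 60 blocks 61.
sample = {51: 0, 52: 51, 53: 51, 54: 53, 55: 0, 60: 0, 61: 60}
for head, tail in sorted(find_blocking_chains(sample).items()):
    print(f"head blocker {head} -> waiting sessions {tail}")
```

For the sample above, the six blocking-related sessions collapse into two chains, headed by sessions 51 and 60, which are the only candidates worth investigating first.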

Terminating each blocking session in the pairs displayed by sp_who is not recommended, as one terminates more sessions than needed, which could have unexpected repercussions. As a rule, one should restore the system’s health while doing minimal damage.

In many cases terminating the head session will make the blocking chain disperse; however, there are cases in which the head session is replaced by another session (e.g. when the sessions involve the same or similar queries). One will need to repeat the needed steps until all blocking chains dissolve.

Killing ‘em Softly 

Killing a snake, no matter how blamable the act, is sometimes necessary. Therefore, it should be used as a last resort, when there is no other alternative and when needed to save one’s or others’ lives. Similarly, killing a session should be done only in extremis, when necessary, for example when the server’s performance has degraded considerably, affecting other users, or when the session is hanging indefinitely.


The kill command is powerful, having the power of a hammer. The problem is that when you have a hammer, every session looks like a nail. Despite all the aplomb one has when using a tool like a hammer, one needs to be careful in dealing with blockings. A blocking not addressed appropriately can kick back, and in special cases the bite can be deadly, for the system as well as for one’s job. Killing the beast is the easiest approach. Kill one beast and another one will take its territory. It’s one of the laws of nature applicable also to database environments. The difference is that if one doesn’t address the primary cause that led to a blocking, the same type of snake will most likely appear repeatedly.


Unfortunately, the kill command is not a bulletproof way of terminating a session; it may only sever the snake. As the documentation warns, there can be cases in which the method won’t have any effect on the blocking, the blocking continuing to roam around. So, it might be a good idea to check whether the session disappeared and keep an eye on it until it has totally disappeared. Especially when dealing with a blocking chain, it can happen that the head session is replaced by another session, which probably was waiting for the same resources as the previous head session. It may also happen that one deals with two or more blocking chains independent from each other. Such cases appear seldom but are possible.


Killing the head session of a blocking without gathering some data provides fewer opportunities for learning, for understanding what’s happening in your system, and for identifying what caused the blocking to occur. Therefore, before jumping to kill a session, collect the data you need for further troubleshooting.

Search for a snake’s nest 

With the warning that, unless one deals with small snakes, searching for a snake’s nest might not be advisable, the idea behind this heuristic is that where a snake occurs there is most likely also a nest not far away, where several other snakes might hide. Similarly, a query that causes permanent blockings might be the indication of code that generates a range of misbehaving queries. It can be the same code or different pieces of code. One can attempt to improve the performance of a query that leads to blockings by adding more resources on the server or by optimizing SQL Server’s internals, though one can’t compensate for poor programming. When possible, one needs to tackle the problem at the source, otherwise performance improvements are only temporary.

Snakes can kill you in sleep 

When wandering into the wild, as well as when keeping snakes as pets, one must take all measures to assure that nobody’s health is endangered. The same principle should apply to databases as well, and the first line of defense resides in actively monitoring the blockings and addressing them in a timely manner as they occur. Being too confident that nothing will happen and not taking the necessary precautions can prove to be a bad strategy when a problem occurs. In some situations, the damage might be acceptable in comparison with the effort and costs needed to build the monitoring infrastructure, though for critical systems it can come with important costs.

Snake taming

Having snakes as pets doesn’t seem like a good idea, and there are so many reasons why one shouldn’t do it (see PETA’s reasons)! On the other hand, there are also people with uncommon hobbies, who do not limit themselves to having a pet snake, but try to tame it, to have it behave like a pet. There are people who breed snakes to harness their venom for various purposes, an occupation that requires handling snakes closely. There are also people who have raised their relationship with snakes to the level of art, snake charming having been a tradition since ancient Egypt in countries from Southeast Asia, the Middle East, and North Africa. Even if not all snakes are tameable, snake taming and charming is possible. In the process the tamer must deprogram or control the snake’s behavior, following a specific methodology in a safe environment.

No matter how much one tries to avoid persistent blockings, one can learn from troubleshooting blockings about their sources and behavior, as well as about one’s own limitations. One complex blocking can be a good example with which one can test one’s knowledge about SQL Server internals as well as about the applications’ architecture. Each blocking provides a scenario in which one can learn something.

When fighting with a blocking, it’s wise to do it within a safe environment, typically a test or development environment. Fighting with it in a production environment can cause unnecessary stress and damage. So, if you don’t have a safe environment in which to carry out the fight, then build one and try to keep the same essential characteristics as in the production environment!

There will also be situations in which one must fight with a blocking in the production environment. Then, be careful not to damage the data or the environment, and take all the needed precautions!


Conclusion

The comparison between snakes and blockings might not be perfect, though hopefully it will imprint in the reader’s mind the dangers of handling blockings inappropriately and raise awareness of related topics.

⛏️Data Management: Data Matching (Definitions)

[deterministic matching:] "Deterministic matching algorithms compare and match records according to hard-coded business rules according to their precision. For instance, a rule can be set up that stipulates that every 'Bill' be matched with a 'William'." (Jill Dyché & Evan Levy, "Customer Data Integration: Reaching a Single Version of the Truth", 2006)

[probabilistic matching:] "Uses statistical algorithms to deduce the best match between two records. Probabilistic matching usually tracks statistical confidence that two records refer to the same customer." (Evan Levy & Jill Dyché, "Customer Data Integration", 2006)

"A feature of data cleansing tools or the process that matches, or links, associated records through a user-defined or common algorithm." (Danette McGilvray, "Executing Data Quality Projects", 2008)

"Data-matching involves bringing together data from disparate services or sources, comparing it, and eliminating duplicate data. There are two types of algorithms that are used in data matching: (1) deterministic algorithms, which strictly use match criteria and weighting to determine the results, and (2) probabilistic algorithms, which use statistical models to adjust the matching based on the frequency of values found in the data." (Allen Dreibelbis et al, "Enterprise Master Data Management", 2008)

"A highly specialized set of technologies that allows users to derive a high-confidence value of the party identification that can be used to construct a total view of a party from multiple party records." (Alex Berson & Lawrence Dubov, "Master Data Management and Data Governance", 2010)

[deterministic matching:] "A type of matching that relies on defined patterns and rules for assigning weights and scores for determining similarity." (DAMA International, "The DAMA Dictionary of Data Management" 1st Ed., 2010)

[probabilistic matching:] "A type of matching that relies on statistical analysis of a sample data set to project results on the full data set." (DAMA International, "The DAMA Dictionary of Data Management" 1st Ed., 2010)

[fuzzy matching:] "A technique of decomposing words into component parts and comparing the parts to find an acceptable level of correspondence." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"Matching is a technique for statistical control of confounding. In the simplest form, individuals from the two study groups are paired on the basis of similar values of one or more covariates. Matching can be viewed as a special case of stratification in which each stratum consists of only two individuals." (Herbert I Weisberg, "Bias and Causation: Models and Judgment for Valid Comparisons", 2010)

"The process of comparing rows in data sets to determine which rows describe the same thing and are therefore either complimentary or redundant." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

01 February 2017

⛏️Data Management: Data Strategy (Definitions)

"A business plan for leveraging an enterprise’s data assets to maximum advantage." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

[enterprise data strategy:] "A data strategy supporting the entire enterprise." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

[data management strategy:] "Selected courses of actions setting the direction for data management within the enterprise, including vision, mission, goals, principles, policies, and projects." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"A data strategy is a plan for maintaining and improving the quality, integrity, security, and access of the enterprise data. It typically includes business plans that describe how the data will be used to support the enterprise business strategy and goals." (Dewey E Ray, "Valuing Data: An Open Framework", 2018)

"A data strategy is not an algorithm, buzzword, IT project, technology or application, collection of data in storage, department or team, or project or tactic. A data strategy is a set of organization-wide objectives leading to highly efficient processes that turn data resources into outcomes that help the organization fulfill its mission." (Harvinder Atwal, "Practical DataOps: Delivering Agile Data Science at Scale", 2019)

"A data strategy is a plan designed to improve all the ways you acquire, store, manage, share, and use data." (Evan Levy, "TDWI Data Strategy Assessment Guide", 2021)

"A data strategy is a central, integrated concept that articulates how data will enable and inspire business strategy." (MIT CISR)

"A data strategy is a common reference of methods, services, architectures, usage patterns and procedures for acquiring, integrating, storing, securing, managing, monitoring, analyzing, consuming and operationalizing data." (DXC.Technology) [source]

"A data strategy is a highly dynamic process employed to support the acquisition, organization, analysis, and delivery of data in support of business objectives." (Gartner)

⛏️Data Management: Data Management [DM] (Definitions)

"The day-to-day tasks necessary to tactically manage data, including overseeing its quality, lineage, usage, and deployment across systems, organizations, and user communities." (Jill Dyché & Evan Levy, "Customer Data Integration", 2006)

"A corporate service which helps with the provision of information services by controlling or coordinating the definitions and usage of reliable and relevant data." (Keith Gordon, "Principles of Data Management", 2007)

"The policies, procedures, and technologies that dictate the granular management of data in an organization. This includes supervising the quality of data and ensuring it is used and deployed properly." (Tony Fisher, "The Data Asset", 2009)

"Structured approach for capturing, storing, processing, integrating, distributing, securing, and archiving data effectively throughout their life cycle." (Linda Volonino & Efraim Turban, "Information Technology for Management 8th Ed", 2011)

"The business function that develops and executes plans, policies, practices, and projects that acquire, control, protect, deliver, and enhance the value of data." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"The process of managing data as a resource that is valuable to an organization or business, including the process of developing data architectures, developing practices and procedures for dealing with data, and then executing these aspects on a regular basis." (Jim Davis & Aiman Zeid, "Business Transformation", 2014)

"Processes by which data across multiple platforms is integrated, cleansed, migrated, and managed." (Hamid R Arabnia et al, "Application of Big Data for National Security", 2015)

"The full lifecycle care of organizational data assets, through the implementation of accepted good practice, to develop and maintain their value." (Kevin J Sweeney, "Re-Imagining Data Governance", 2018)

"Controlling, protecting, and facilitating access to data in order to provide information consumers with timely access to the data they need. The functions provided by a database management system." (Information Management)

"The development and execution of architectures, policies and practices to manage the data life-cycle needs of an enterprise." (Solutions Review)

"The policies, procedures, and technical choices used to handle data through its entire lifecycle from data collection to storage, preservation and use. A data management policy should take account of the needs of data quality, availability, data protection, data preservation, etc." (Open Data Handbook) 

"The processes, procedures, policies, technologies, and architecture that manage data from definition to destruction, which includes transformation, governance, quality, security, and availability throughout its life cycle." (Forrester)

"The process by which data is acquired, validated, stored, protected, and processed. In turn, its accessibility, reliability, and timeliness is ensured to satisfy the needs of the data users. Data management properly oversees the full data lifecycle needs of an enterprise." (Insight Software)

"Data management comprises all the disciplines related to ingesting, organizing, and maintaining data as a valuable resource." (OmiSci) [source]

"Data management (DM) consists of the practices, architectural techniques, and tools for achieving consistent access to and delivery of data across the spectrum of data subject areas and data structure types in the enterprise, to meet the data consumption requirements of all applications and business processes." (Gartner)

"Data management consists of practices and tools used to ingest, store, organize, and maintain the data created and gathered by an organization in order to deliver reliable and timely data to users." (Qlik) [source]

"Data management is a strategy used by organizations to make data secure, efficient, and available for any relevant business purposes." (Xplenty) [source]

"Data management is the implementation of policies and procedures that put organizations in control of their business data regardless of where it resides. […] Data management is concerned with the end-to-end lifecycle of data, from creation to retirement, and the controlled progression of data to and from each stage within its lifecycle." (Informatica) [source]

"The function of controlling the acquisition, analysis, storage, retrieval, and distribution of data." (IEEE 610.5-1990)


30 January 2017

⛏️Data Management: Dirty Data (Definitions)

"Data that contain errors or cause problems when accessed and used. Some examples of dirty data are:   Values in data elements that exceed a reasonable range, e.g., an employee with 4299 years of service. Values in data elements that are invalid, e.g., a value of 'X' in a gender field, where the only valid values are 'M' and 'F'. Missing values, e.g., a blank value in a gender field, where the only valid values are 'M' and 'F'.  Incomplete data, e.g., a company has 10 products but data for only 8 products are included." (Margaret Y Chu, "Blissful Data ", 2004)

"Data that contain inaccuracies and/or inconsistencies." (Carlos Coronel et al, "Database Systems: Design, Implementation, and Management" 9th Ed., 2011)

"Poor quality data." (Linda Volonino & Efraim Turban, "Information Technology for Management" 8th Ed, 2011)

"Data that is incorrect, out-of-date, redundant, incomplete, or formatted incorrectly." (Craig S Mullins, "Database Administration", 2012)

"Data with inaccuracies and potential errors." (Hamid R Arabnia et al, "Application of Big Data for National Security", 2015)

29 January 2017

⛏️Data Management: Data Dictionary (Definitions)

"The system tables that contain descriptions of the database objects and how they are structured." (Karen Paulsell et al, "Sybase SQL Server: Performance and Tuning Guide", 1996)

"A set of system tables stored in a catalog. A data dictionary includes definitions of database structures and related information, such as permissions." (Anthony Sequeira & Brian Alderman, "The SQL Server 2000 Book", 2003)

"Software in which metadata is stored, manipulated and defined – a data dictionary is normally associated with a tool used to support software engineering." (Keith Gordon, "Principles of Data Management", 2007)

"A list of descriptions of data items to help developers stay on the same track." (Rod Stephens, "Beginning Database Design Solutions", 2008)

"The place where information about data that exists in the organization is stored. This should include both technical and business details about each data element." (Laura Reeves, "A Manager's Guide to Data Warehousing", 2009)

"Data dictionary are mini database management systems that manages metadata. It is a repository of information about a database that documents data elements of a database. The data dictionary is an integral part of the database management systems and stores metadata or information about the database, attribute names and definitions for each table in the database." (Vijay K Pallaw, "Database Management Systems" 2nd Ed., 2010)

"In the days of mainframe computers, this was a listing of record layouts, describing each field in each type of file." (David C Hay, "Data Model Patterns: A Metadata Map", 2010)

"Software coupled with a data store for managing data definitions." (Craig S Mullins, "Database Administration", 2012)

"A database containing data about all the databases in a database system. Data dictionaries store all the various schema and file specifications and their locations. They also contain information about which programs use which data and which users are interested in which reports." (SQL Server 2012 Glossary, "Microsoft", 2012)

"A reference by which a team can understand what data assets they have, how those assets were created, what they mean, and where to find them." (Evan Stubbs, "Delivering Business Analytics: Practical Guidelines for Best Practice", 2013)

"A repository of the metadata useful to the corporation" (Daniel Linstedt & W H Inmon, "Data Architecture: A Primer for the Data Scientist", 2014)

"A comprehensive record of business and technical definitions of the elements within a dataset. Also referred to as a business glossary." (Jonathan Ferrar et al, "The Power of People", 2017)

"A database containing data about all the databases in a database system. Data dictionaries store all the various schema and file specifications and their locations." (BAAN)

"A read-only collection of database tables and views containing reference information about the database, its structures, and its users." (Oracle)

"A set of system tables, stored in a catalog, that includes definitions of database structures and related information, such as permissions." (Microsoft Technet)

"A set of tables that keep track of the structure of both the database and the inventory of database objects." (IBM)

"A specialized type of database containing metadata; a repository of information describing the characteristics of data used to design, monitor, document, protect, and control data in information systems and databases; an application system supporting the definition and management of database metadata." (TOGAF)

"Metadata that keeps track of database objects such as tables, indexes, and table columns." (MySQL)

⛏️Data Management: Master Data (Definitions)

"Data describing the people, places, and things involved in an organization’s business. Examples include people (e.g., customers, employees, vendors, suppliers), places (e.g., locations, sales territories, offices), and things (e.g., accounts, products, assets, document sets). Master data tend to be grouped into master records, which may include associated reference data." (Danette McGilvray, "Executing Data Quality Projects", 2008)

"Master data is the core information for an enterprise, such as information about customers or products, accounts or locations, and the relationships between them. In many companies, this master data is unmanaged and can be found in many, overlapping systems and is often of unknown quality." (Allen Dreibelbis et al, "Enterprise Master Data Management", 2008)

"Data that describes the important details of a business subject area such as customer, product, or material across the organization. Master data allows different applications and lines of business to use the same definitions and data regarding the subject area. Master data gives an accurate, 360° degree view of the business subject." (Tony Fisher, "The Data Asset", 2009)

"The set of codes and structures that identify and organize data, such as customer numbers, employee IDs, and general ledger account numbers." (Janice M Roehl-Anderson, "IT Best Practices for Financial Managers", 2010)

"The data that provides the context for business activity data in the form of common and abstract concepts that relate to the activity. It includes the details (definitions and identifiers) of internal and external objects involved in business transactions, such as customers, products, employees, vendors, and controlled domains (code values)." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"The critical data of a business, such as customer, product, location, employee, and asset. Master data fall generally into four groupings: people, things, places, and concepts and can be further categorized. For example, within people, there are customer, employee, and salesperson. Within things, there are product, part, store, and asset. Within concepts, there are things like contract, warrantee, and licenses. Finally, within places, there are office locations and geographic divisions." (Microsoft, "SQL Server 2012 Glossary", 2012)

"Data that is key to the operation of a business, such as data about customers, suppliers, partners, products, and materials." (Brenda L Dietrich et al, "Analytics Across the Enterprise", 2014)

"The data that describes the important details of a business subject area such as customer, product, or material across the organization. Master data allows different applications and lines of business to use the same definitions and data regarding the subject area. Master data gives an accurate, 360-degree view of the business subject." (Jim Davis & Aiman Zeid, "Business Transformation: A Roadmap for Maximizing Organizational Insights", 2014)

"Informational objects that represent the core business objects (customers, suppliers, products and so on) and are fundamental to an organization. Master data must be referenced in order to be able to perform transactions. In contrast with transaction or inventory data, master data does not change very often." (Boris Otto & Hubert Österle, "Corporate Data Quality", 2015)

"The most critical data is called master data and the companioned discipline of master data management, which is about making the master data within the organization accessible, secure, transparent, and trustworthy." (Piethein Strengholt, "Data Management at Scale", 2020)

26 January 2017

⛏️Data Management: Data Governance (Definitions)

"The infrastructure, resources, and processes involved in managing data as a corporate asset." (Jill Dyché & Evan Levy, "Customer Data Integration", 2006)

"A process focused on managing the quality, consistency, usability, security, and availability of information." (Alex Berson & Lawrence Dubov, "Master Data Management and Customer Data Integration for a Global Enterprise", 2007)

"The practice of organizing and implementing policies, procedures, and standards for the effective use of an organization's structured or unstructured information assets." (Laura Reeves, "A Manager's Guide to Data Warehousing", 2009)

"The process for addressing how data enters the organization, who is accountable for it, and how - using people, processes, and technologies - data achieves a quality standard that allows for complete transparency within an organization." (Tony Fisher, "The Data Asset", 2009)

"A framework of processes aimed at defining and managing the quality, consistency, usability, security, and availability of information with the primary focus on cross-functional, cross-departmental, and/or cross-divisional concerns of information management." (Alex Berson & Lawrence Dubov, "Master Data Management and Data Governance", 2010)

"The policies and processes that continually work to improve and ensure the availability, accessibility, quality, consistency, auditability, and security of data in a company or institution." (David Lyle & John G Schmidt, "Lean Integration", 2010)

"The exercise of authority, control, and shared decision-making (planning, monitoring, and enforcement) over the management of data assets." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"Data governance is the specification of decision rights and an accountability framework to encourage desirable behavior in the valuation, creation, storage, use, archival and deletion of data and information. It includes the processes, roles, standards and metrics that ensure the effective and efficient use of data and information in enabling an organization to achieve its goals." (Oracle, "Enterprise Information Management: Best Practices in Data Governance", 2011)

"Processes and controls at the data level; a newer, hybrid quality control discipline that includes elements of data quality, data management, information governance policy development, business process improvement, and compliance and risk management."(Robert F Smallwood, "Information Governance: Concepts, Strategies, and Best Practices", 2014)

"The process for addressing how data enters the organization, who is accountable for it, and how that data achieves the organization's quality standards that allow for complete transparency within an organization." (Jim Davis & Aiman Zeid, "Business Transformation", 2014) 

"A company-wide framework that determines which decisions must be made and who should make them. This includes the definition of roles, responsibilities, obligations and rights in handling the company’s resource data. In this, data governance pursues the goal of maximizing the value of the data in the company. While data governance determines how decisions should be made, data management makes the actual decisions and implements them." (Boris Otto & Hubert Österle, "Corporate Data Quality", 2015)

"The discipline of applying controls to data in order to ensure its integrity over time." (Gregory Lampshire, "The Data and Analytics Playbook", 2016)

"Data governance refers to the overall management of the availability, usability, integrity and security of the data employed in an enterprise. Sound data governance programs include a governing body or council, a defined set of procedures and a standard operating procedure." (Dennis C Guster, "Scalable Data Warehouse Architecture: A Higher Education Case Study", 2018)

"It is a combination of people, processes and technology that drives high-quality, high-value information. The technology portion of data governance combines data quality, data integration and master data management to ensure that data, processes, and people can be trusted and accountable, and that accurate information flows through the enterprise driving business efficiency." (Richard T Herschel, "Business Intelligence", 2019)

"The processes and technical infrastructure that an organization has in place to ensure data privacy, security, availability, usability, and integrity." (Lili Aunimo et al, "Big Data Governance in Agile and Data-Driven Software Development: A Market Entry Case in the Educational Game Industry", 2019)

"The management of data throughout its entire lifecycle in the company to ensure high data quality. Data Governance uses guidelines to determine which standards are applied in the company and which areas of responsibility should handle the tasks required to achieve high data quality." (Mohammad K Daradkeh, "Enterprise Data Lake Management in Business Intelligence and Analytics: Challenges and Research Gaps in Analytics Practices and Integration", 2021)

"A set of processes that ensures that data assets are formally managed throughout the enterprise. A data governance model establishes authority and management and decision making parameters related to the data produced or managed by the enterprise." (NSA/CSS)

"The management of the availability, usability, integrity and security of the data stored within an enterprise." (Solutions Review)

"The process of defining the rules that data has to follow within an organization." (Talend)

Data governance 2.0: "An agile approach to data governance focused on just enough controls for managing risk, which enables broader and more insightful use of data required by the evolving needs of an expanding business ecosystem." (Forrester)

"Data governance encompasses the strategies and technologies used to ensure data is in compliance with regulations and organization policies with respect to data usage." (Adobe)

"Data governance encompasses the strategies and technologies used to make sure business data stays in compliance with regulations and corporate policies." (Informatica) [source]

"Data Governance includes the people, processes and technologies needed to manage and protect the company’s data assets in order to guarantee generally understandable, correct, complete, trustworthy, secure and discoverable corporate data." (BI Survey) [source]

"Data governance is a control that ensures that data entry by a business user or an automated process meets business standards. It manages a variety of things including availability, usability, accuracy, integrity, consistency, completeness, and security of data usage. Through data governance, organizations are able to exercise positive control over the processes and methods to handle data." (Logi Analytics) [source]

"Data governance is a structure put in place allowing organisations to proactively manage data quality." (experian) [source]

"Data governance is an organization's internal policy framework that determines the way people make data management decisions. All aspects of data management must be carried out in accordance with the organization's governance policies." (Xplenty) [source]

"Data Governance is the exercise of decision-making and authority for data-related matters." (The Data Governance Institute)

"Data Governance is a system of decision rights and accountabilities for information-related processes, executed according to agreed-upon models which describe who can take what actions with what information, and when, under what circumstances, using what methods." (The Data Governance Institute)

"Data governance is the practice of organizing and implementing policies, procedures and standards for the effective use of an organization's structured/unstructured information assets." (Information Management)

"Data governance is the specification of decision rights and an accountability framework to ensure the appropriate behavior in the valuation, creation, consumption and control of data and analytics." (Gartner)

"The exercise of authority, control and shared decision making (planning, monitoring and enforcement) over the management of data assets. It refers to the overall management of the availability, usability, integrity, and security of the data employed in an enterprise. A sound data governance program includes a governing body or council, a defined set of procedures, and a plan to execute those procedures." (CODATA)

20 January 2017

⛏️Data Management: Data Element (Definitions)

"An atomic unit of data; in most cases, a field." (Microsoft Corporation, "Microsoft SQL Server 7.0 Data Warehouse Training Kit", 2000)

"(1) an attribute of an entity; (2) a uniquely named and well-defined category of data that consists of data items and that is included in a record of an activity." (William H Inmon, "Building the Data Warehouse", 2005)

"The most atomic, pure, and simple fact that either describes or identifies an entity. This is also known as an attribute. It can be deployed as a column in a table in a physical structure." (Sharon Allen & Evan Terry, "Beginning Relational Data Modeling" 2nd Ed., 2005)

"The smallest unit of data that is named. The values are stored in a column or a field in a database." (Laura Reeves, "A Manager's Guide to Data Warehousing", 2009)

[data attribute:] "1. An inherent fact, property, or characteristic describing an entity or object; the logical representation of a physical field or relational table column. A given attribute has the same format, interpretation, and domain for all occurrences of an entity. Attributes may contain adjective values (red, round, active, etc.). 2. A unit of data for which the definition, identification, representation, and permissible values are specified by means of a set of characteristics. 3. A representation of a data characteristic variation in the logical or physical data model. A data attribute may or may not be atomic." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"A single unit of data." (SQL Server 2012 Glossary, "Microsoft", 2012)

"A primitive item of data; one that has a value within the context of study and is not further decomposed." (James Robertson et al, "Complete Systems Analysis: The Workbook, the Textbook, the Answers", 2013)

"A unit of data (fact) that can be uniquely defined and used. Example: last name is a data element that can be defined as the family name of an individual and is distinct from other name-related elements." (Gregory Lampshire, "The Data and Analytics Playbook", 2016)

"A basic unit of information that has a unique meaning and subcategories (data items) of distinct value. Examples of data elements include gender, race, and geographic location." (CNSSI 4009-2015)

⛏️Data Management: Data Literacy (Definitions)

"Understanding what data mean, including how to read charts appropriately, draw correct conclusions from data and recognize when data are being used in misleading or inappropriate ways." (Jake R Carlson et al., "Determining Data Information Literacy Needs: A Study of Students and Research Faculty", 2011) [source]

"Data literacy is the ability to collect, manage, evaluate, and apply data, in a critical manner." (Chantel Ridsdale et al, "Strategies and Best Practices for Data Literacy Education", [knowledge synthesis report] 2016) [source]

"The data-literate individual understands, explains, and documents the utility and limitations of data by becoming a critical consumer of data, controlling his/her personal data trail, finding meaning in data, and taking action based on data. The data-literate individual can identify, collect, evaluate, analyze, interpret, present, and protect data." (IBM, Building "Global Interest in Data Literacy: A Dialogue", [workshop report] 2016) [source]

"the ability to understand the principles behind learning from data, carry out basic data analyses, and critique the quality of claims made on the basis of data."  (David Spiegelhalter, "The Art of Statistics: Learning from Data", 2019)

"The ability to recognize, evaluate, work with, communicate, and apply data in the context of business priorities and outcomes." (Forrester)

"Data literacy is the ability to derive meaningful information from data, just as literacy in general is the ability to derive information from the written word." (Techtarget) [source]

"Data literacy is the ability to read, work with, analyze and communicate with data, building the skills to ask the right questions of data and machines to make decisions and communicate meaning to others. "(Qlik) [source]

"Data literacy is the ability to read, write and communicate data in context, with an understanding of the data sources and constructs, analytical methods and techniques applied, and the ability to describe the use case application and resulting business value or outcome." (Gartner)

"Data literacy is the ability to read, work with, analyze and communicate with data. It’s a skill that empowers all levels of workers to ask the right questions of data and machines, build knowledge, make decisions, and communicate meaning to others." (Sumo Logic) [source]

"Data literacy is the skill set of reading, communicating, and deriving meaningful information from data. Collecting the data is only the first step. The real value comes from being able to put the information in context and tell a story." (Sisense) [source]

About Me

Koeln, NRW, Germany
IT professional with more than 24 years of experience in IT, covering the full life cycle of Web/Desktop/Database application development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.