
03 July 2023

📦🔖💫Data Migrations (DM): Comments on "Planning for Successful Data Migration" II (Technical Aspects in Dynamics 365 Finance & Operations)

 

Data Migration
Data Migrations Series

Introduction

This weekend I read chapter 5 on Data Migration ("Planning for Successful Data Migration") from Brent Dawson's recently released book "Becoming a Dynamics 365 Finance and Supply Chain Solution Architect" (published by Packt Publishing, available on Amazon). The chapter makes a few good points; however, some statements require further clarification, while others are questionable.

Concerning the Data Migration (DM), besides several architectural recommendations, the author also makes several technical recommendations that can be summarized as follows:

(10) migrate transactional data manually via direct input or by using the Excel add-in (the author doesn't recommend migrating transactional data using data packages because the data change frequently);
(11) a data outage should be put in place as part of the cutover timeframe;
(12) new transactional data should be migrated after Go-live;
(13) include the effort for data entry in the cutover plan.

General Aspects

In what concerns the data, there are four important phases during a cutover: configuring the production environment, migrating the master data, migrating the transactional data, and importing/creating the new transactional data. After each of these phases a data validation step is required to assure and sign off on data quality.

Ideally, one can make sure that the production environment is correctly set up by deploying a copy of the database with the gold configuration (e.g. export the database and restore it in the target environment). Otherwise, direct data entry and templates, when available, can help obtain the same result, though the effort and the risk of errors are higher.

Moving to the next phases, it's important to understand that a data migration is not a copy-paste of data from one system to another. Often the systems have different schemas, data definitions or granularity of the data entities. Ideally, a DM layer in between should take the source data as input and prepare the load data for the target system. This applies to master as well as transactional data.

For importing data in D365 FO there are the following main options: 
(a) manual data entry
(b) import via the Excel add-in and templates
(c) manual/automated data packages
(d) batch API

As a rule of thumb, if one has no more than 100-200 records for a data entity, it might be OK to enter the data manually, eventually splitting the effort between several users. This would allow users to familiarize themselves with the system, even if errors are made in the process. However, given the importance of having "clean data" and a repeatable process for Go-Live, this approach is less desirable. On the other hand, there will be cases when this is the only available option.

As soon as the data volume goes above this threshold, the manual effort doesn't make sense anymore. Preparing the data in Excel and importing them via the Excel add-in is in most cases recommended, as long as the volume of data is manageable. Moreover, the data can be partitioned and imported in batches of 1000-2000 records. Ideally, the data should be available in the same structure as required by the templates used.
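For example, a minimal sketch in Python of partitioning an already prepared dataset into batches for the Excel add-in (the field names are hypothetical):

def partition(records, batch_size=1000):
    # split the prepared records into batches that can be imported one at a time
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]

records = [{"ItemNumber": f"ITEM-{i:05d}", "Name": f"Item {i}"} for i in range(4500)]
for i, batch in enumerate(partition(records, 1000), start=1):
    print(f"batch {i}: {len(batch)} records")   # 4 batches of 1000 records and one of 500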

There is, however, a second threshold that makes a batch API solution more attractive. How big is this threshold? It depends. I was able to import 50-100k records via partitioning in the Excel add-in, though these values shouldn't be taken as fixed.

The dependencies existing between data will dictate the order in which data must be imported, while the size of each data entity can be used to decide which approach will be used.

Master Data

In theory, the migration of master data can start as soon as the corresponding configuration is available. However, it is recommended to split the two phases and make sure that the environment is fully configured. This helps take a backup of the configuration, when such a snapshot is not available (see golden configuration in previous post).

Before taking a snapshot of the master data from the source system(s), it's recommended to disable the access for changing the respective data (aka master data freeze). Otherwise, besides the fact that the changes will not appear in the target system(s), the changes can make the master data's validation more complex. Sometimes, that's a risk the business is willing to take.

The master data are typically imported a few days before the transactional data need to be imported to allow the team to validate the master data and if the data don’t have the expected quality, perform at most one more migration. Thus, the migration of master data can start one or two weeks earlier, however the longer the timeframe, the higher the chances that the business will be impacted by this (e.g. new orders with new products are needed urgently).

Transaction Data

Before migrating the transactional data, a few processes must be run (e.g. monthly/yearly closing, inventory counting, receiving goods in transit, etc.). Once this is accomplished, the system can be frozen and thus the access to making changes disabled. This can happen in phases, depending on the requirements (e.g. migrating the balances can happen much later, even weeks after Go-Live).

What one can migrate are only open transactions (e.g. open purchase orders, open sales orders, open customer/vendor invoices, active assets) and balances (e.g. inventory, trial balance). Migrating historical data is usually out of the question; a data warehouse or a similar data repository is more appropriate for storing historical data. Otherwise, keeping the source system(s) available for some users to cover regulatory requirements would be a better option, when feasible.

The biggest issue with transactional data is that the referenced values (products, customers, vendors) must be available in the target system(s). Even if the names and descriptions may be the same, the unique identifiers or surrogate keys are likely to change. E.g. a product, vendor or customer will have a different product number, vendor number or customer number than in the source system(s). This means that the old values need to be replaced with the new ones, which can become a tedious and error-prone process even in Excel. Unless the number of records is really small and there's no other solution, I don't recommend this approach.
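A minimal sketch in Python of such a replacement step, assuming hypothetical cross-reference mappings built during the master data migration:

# hypothetical cross-reference mappings: old identifiers -> new identifiers
customer_map = {"C-1001": "US-000245", "C-1002": "US-000246"}
item_map     = {"P-55": "ITEM-00087"}

open_orders = [
    {"OldOrder": "SO-1", "Customer": "C-1001", "Item": "P-55", "Qty": 10},
    {"OldOrder": "SO-2", "Customer": "C-1003", "Item": "P-55", "Qty": 4},   # customer not yet migrated
]

remapped, errors = [], []
for order in open_orders:
    try:
        remapped.append({**order,
                         "Customer": customer_map[order["Customer"]],   # replace the old numbers with the new ones
                         "Item": item_map[order["Item"]]})
    except KeyError as missing:
        errors.append((order["OldOrder"], str(missing)))   # referenced value missing in the target system

print(len(remapped), "orders ready; errors:", errors)

Done manually in Excel, the same lookup must be repeated per column and per entity, which is where errors tend to creep in; this is exactly the kind of step a DM layer automates.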

The alternative would be to build a data migration layer that can address many of the challenges of data migrations. The effort for building such a layer might be high compared with a manual transformation of the data, though it increases the chances of success by a considerable factor.

During and Post-Go-Live

After validating and signing off on the DM (and here extracts from the source and target systems can help), the Go-Live will depend only on the functional testing's results (and many things can go wrong in this area).

During the freeze period(s) of the source systems, it's likely that new master and transactional data will need to be created. Ideally, these data should be entered after the Go-Live announcement, though it isn't a must if a backup of the target system was taken before. For this, the Excel add-ins can become the tool of choice.

With the Go-Live the DM should be over, though there will always be inquiries from the business. In fact, the DM is over only when the auditor has signed off on it. Even when one thinks that everything is over, a few more surprises can appear – forgotten data, data enrichment, data for new features, etc.

Wrap Up

These are the most important aspects the reader should be aware of. There is more to say about the DM architecture and process, and there are more best practices that need to be considered in areas like planning, conceptualization, quality assurance, principles, etc.

Coming back to the best practices from the book, it's worth stressing that the frequency with which data change is not the main driver for deciding which approach to use in the DM. Definitely more important are the volume and complexity of the data entities to be migrated, and this applies to master and transactional data alike. Therefore, the argumentation behind (10) doesn't entirely stand.

Concerning (11), a multi-level data freeze is more appropriate than an outage, even if the author maybe intended to say the same thing.

(12) and (13) make sense, though the new data are part of the daily business (business as usual) and not of the DM. Moreover, if the data entry or import fails for whatever reason, the DM can't be blamed. Even if the lessons learned during the DM can be further used for mass data entry and updates, this doesn't mean that the DM project continues to exist. In theory, the DM layer can be used further on, though the respective layer was built on premises that become obsolete with the Go-Live. One needs to think only from the perspective of the new system. Data Management, or more specifically Master Data Management, should be responsible for this type of data changes!


25 July 2019

💻IT: Blockchain (Definitions)

"A block chain is a perfect place to store value, identities, agreements, property rights, credentials, etc. Once you put something like a Bit coin into it, it will stay there forever. It is decentralized, disinter mediated, cheap, and censorship-resistant." (Kirti R Bhatele et al, "The Role of Artificial Intelligence in Cyber Security", 2019)

"A system made-up of blocks that are used to record transactions in a peer-to-peer cryptocurrency network such as bitcoins." (Murad Al Shibli, "Hybrid Artificially Intelligent Multi-Layer Blockchain and Bitcoin Cryptology", 2020)

"A chain of blocks containing data that is bundled together. This database is shared across a network of computers (so-called distributed ledger network). Each data block links to the previous block in the blockchain through a cryptographic hash of the previous block, a timestamp, and transaction data. The blockchain only allows data to be written, and once that data has been accepted by the network, it cannot be changed." (Jurij Urbančič et al, "Expansion of Technology Utilization Through Tourism 4.0 in Slovenia", 2020)

"A system in which a record of transactions made in Bitcoin or another cryptocurrency is maintained across several computers that are linked in a peer-to-peer network. Amany M Alshawi, "Decentralized Cryptocurrency Security and Financial Implications: The Bitcoin Paradigm", 2020)

"An encrypted ledger that protects transaction data from modification." (David T A Wesley, "Regulating the Internet, Encyclopedia of Criminal Activities and the Deep Web", 2020)

"Blockchain is a decentralized, immutable, secure data repository or digital ledger where the data is chronologically recorded. The initial block named as Genesis. It is a chain of immutable data blocks what has anonymous individuals as nodes who can transact securely using cryptology. Blockchain technology is subset of distributed ledger technology." (Umit Cali & Claudio Lima, "Energy Informatics Using the Distributed Ledger Technology and Advanced Data Analytics", 2020)

"Blockchain is a meta-technology interconnected with other technologies and consists of several architectural layers: a database, a software application, a number of computers connected to each other, peoples’ access to the system and a software ecosystem that enables development. The blockchain runs on the existing stack of Internet protocols, adding an entire new tier to the Internet to ensure economic transactions, both instant digital currency payments and complicated financial contracts." (Aslı Taşbaşı et al, "An Analysis of Risk Transfer and Trust Nexus in International Trade With Reference to Turkish Data", 2020) 

"Is a growing list of records, called blocks, which are linked using cryptography. Each block contains a cryptographic hash of the previous block a timestamp, and transaction data. (Vardan Mkrttchian, "Perspective Tools to Improve Machine Learning Applications for Cyber Security", 2020)

"This is viewed as a mechanism to provide further protection and enhance the security of data by using its properties of immutability, auditability and encryption whilst providing transparency amongst parties who may not know each other, so operating in a trustless environment." (Hamid Jahankhani & Ionuț O Popescu, "Millennials vs. Cyborgs and Blockchain Role in Trust and Privacy", 2020)

"A blockchain is a data structure that represents the record of each accounting move. Each account transaction is signed digitally to protect its authenticity, and no one can intervene in this transaction." (Ebru E Saygili & Tuncay Ercan, "An Overview of International Fintech Instruments Using Innovation Diffusion Theory Adoption Strategies", 2021)

"A system in which a record of transactions made in bitcoin or another cryptocurrency are maintained across several computers that are linked in a peer-to-peer network." (Silvije Orsag et al, "Finance in the World of Artificial Intelligence and Digitalization", 2021)

"It is a decentralized computation and information sharing platform that enables multiple authoritative domains, who don’t trust each other, to cooperate, coordinate and collaborate in a rational decision-making process." (Vinod Kumar & Gotam Singh Lalotra, "Blockchain-Enabled Secure Internet of Things", 2021)

"A concept consisting of the methods, technologies, and tool sets to support a distributed, tamper-evident, and reliable way to ensure transaction integrity, irrefutability, and non-repudiation. Blockchains are write-once, append-only data stores that include validation, consensus, storage, replication, and security for transactions or other records." (Forrester)

[hybrid blockchain:] "A network with a combination of characteristics of public and private blockchains where a blockchain may incorporate select privacy, security and auditability elements required by the implementation." (AICPA)

[private blockchain:] "A restricted access network controlled by an entity or group which is similar to a traditional centralized network." (AICPA)

"A technology that records a list of records, referred to as blocks, that are linked using cryptography. Each block contains a cryptographic hash of the previous block, a timestamp and transaction data." (AICPA)

[public blockchain:] "An open network where participants can view, read and write data, and no one participant has control (e.g., Bitcoin, Ethereum)." (AICPA)
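The definitions above converge on the same mechanics: each block holds a timestamp, transaction data and a cryptographic hash of the previous block, and that chaining is what makes tampering evident. A minimal Python sketch of the idea (a toy illustration using only the standard library, not a production design; the transactions are made up):

import hashlib, json, time

def block_hash(block):
    # hash the block's content in a deterministic order
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def add_block(chain, transactions):
    block = {"index": len(chain),
             "timestamp": time.time(),
             "transactions": transactions,
             "prev_hash": block_hash(chain[-1])}   # the link to the previous block
    chain.append(block)

def is_valid(chain):
    # any change to an earlier block breaks every prev_hash that follows it
    return all(chain[i]["prev_hash"] == block_hash(chain[i - 1]) for i in range(1, len(chain)))

chain = [{"index": 0, "timestamp": time.time(), "transactions": [], "prev_hash": None}]   # genesis block
add_block(chain, [{"from": "alice", "to": "bob", "amount": 5}])
add_block(chain, [{"from": "bob", "to": "carol", "amount": 2}])
print(is_valid(chain))                        # True
chain[1]["transactions"][0]["amount"] = 500   # tamper with an already recorded transaction
print(is_valid(chain))                        # False - the tampering is detected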

31 December 2018

🔭Data Science: Big Data (Just the Quotes)

"If we gather more and more data and establish more and more associations, however, we will not finally find that we know something. We will simply end up having more and more data and larger sets of correlations." (Kenneth N Waltz, "Theory of International Politics Source: Theory of International Politics", 1979)

“There are those who try to generalize, synthesize, and build models, and there are those who believe nothing and constantly call for more data. The tension between these two groups is a healthy one; science develops mainly because of the model builders, yet they need the second group to keep them honest.” (Andrew Miall, “Principles of Sedimentary Basin Analysis”, 1984)

"Big data can change the way social science is performed, but will not replace statistical common sense." (Thomas Landsall-Welfare, "Nowcasting the mood of the nation", Significance 9(4), 2012)

"Big Data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn’t fit the strictures of your database architectures. To gain value from this data, you must choose an alternative way to process it." (Edd Wilder-James, "What is big data?", 2012) [source]

"The secret to getting the most from Big Data isn’t found in huge server farms or massive parallel computing or in-memory algorithms. Instead, it’s in the almighty pencil." (Matt Ariker, "The One Tool You Need To Make Big Data Work: The Pencil", 2012)

"Big data is the most disruptive force this industry has seen since the introduction of the relational database." (Jeffrey Needham, "Disruptive Possibilities: How Big Data Changes Everything", 2013)

"No subjective metric can escape strategic gaming [...] The possibility of mischief is bottomless. Fighting ratings is fruitless, as they satisfy a very human need. If one scheme is beaten down, another will take its place and wear its flaws. Big Data just deepens the danger. The more complex the rating formulas, the more numerous the opportunities there are to dress up the numbers. The larger the data sets, the harder it is to audit them." (Kaiser Fung, "Numbersense: How To Use Big Data To Your Advantage", 2013)

"There is convincing evidence that data-driven decision-making and big data technologies substantially improve business performance. Data science supports data-driven decision-making - and sometimes conducts such decision-making automatically - and depends upon technologies for 'big data' storage and engineering, but its principles are separate." (Foster Provost & Tom Fawcett, "Data Science for Business", 2013)

"Our needs going forward will be best served by how we make use of not just this data but all data. We live in an era of Big Data. The world has seen an explosion of information in the past decades, so much so that people and institutions now struggle to keep pace. In fact, one of the reasons for the attachment to the simplicity of our indicators may be an inverse reaction to the sheer and bewildering volume of information most of us are bombarded by on a daily basis. […] The lesson for a world of Big Data is that in an environment with excessive information, people may gravitate toward answers that simplify reality rather than embrace the sheer complexity of it." (Zachary Karabell, "The Leading Indicators: A short history of the numbers that rule our world", 2014)

"The other buzzword that epitomizes a bias toward substitution is 'big data'. Today’s companies have an insatiable appetite for data, mistakenly believing that more data always creates more value. But big data is usually dumb data. Computers can find patterns that elude humans, but they don’t know how to compare patterns from different sources or how to interpret complex behaviors. Actionable insights can only come from a human analyst (or the kind of generalized artificial intelligence that exists only in science fiction)." (Peter Thiel & Blake Masters, "Zero to One: Notes on Startups, or How to Build the Future", 2014)

"We have let ourselves become enchanted by big data only because we exoticize technology. We’re impressed with small feats accomplished by computers alone, but we ignore big achievements from complementarity because the human contribution makes them less uncanny. Watson, Deep Blue, and ever-better machine learning algorithms are cool. But the most valuable companies in the future won’t ask what problems can be solved with computers alone. Instead, they’ll ask: how can computers help humans solve hard problems?" (Peter Thiel & Blake Masters, "Zero to One: Notes on Startups, or How to Build the Future", 2014)

"As business leaders we need to understand that lack of data is not the issue. Most businesses have more than enough data to use constructively; we just don't know how to use it. The reality is that most businesses are already data rich, but insight poor." (Bernard Marr, Big Data: Using SMART Big Data, Analytics and Metrics To Make Better Decisions and Improve Performance, 2015)

"Big data is based on the feedback economy where the Internet of Things places sensors on more and more equipment. More and more data is being generated as medical records are digitized, more stores have loyalty cards to track consumer purchases, and people are wearing health-tracking devices. Generally, big data is more about looking at behavior, rather than monitoring transactions, which is the domain of traditional relational databases. As the cost of storage is dropping, companies track more and more data to look for patterns and build predictive models." (Neil Dunlop, "Big Data", 2015)

"Big Data often seems like a meaningless buzz phrase to older database professionals who have been experiencing exponential growth in database volumes since time immemorial. There has never been a moment in the history of database management systems when the increasing volume of data has not been remarkable." (Guy Harrison, "Next Generation Databases: NoSQL, NewSQL, and Big Data", 2015)

"Dimensionality reduction is essential for coping with big data - like the data coming in through your senses every second. A picture may be worth a thousand words, but it’s also a million times more costly to process and remember. [...] A common complaint about big data is that the more data you have, the easier it is to find spurious patterns in it. This may be true if the data is just a huge set of disconnected entities, but if they’re interrelated, the picture changes." (Pedro Domingos, "The Master Algorithm", 2015)

"Science’s predictions are more trustworthy, but they are limited to what we can systematically observe and tractably model. Big data and machine learning greatly expand that scope. Some everyday things can be predicted by the unaided mind, from catching a ball to carrying on a conversation. Some things, try as we might, are just unpredictable. For the vast middle ground between the two, there’s machine learning." (Pedro Domingos, "The Master Algorithm", 2015)

"The human side of analytics is the biggest challenge to implementing big data." (Paul Gibbons, "The Science of Successful Organizational Change", 2015)

"To make progress, every field of science needs to have data commensurate with the complexity of the phenomena it studies. [...] With big data and machine learning, you can understand much more complex phenomena than before. In most fields, scientists have traditionally used only very limited kinds of models, like linear regression, where the curve you fit to the data is always a straight line. Unfortunately, most phenomena in the world are nonlinear. [...] Machine learning opens up a vast new world of nonlinear models." (Pedro Domingos, "The Master Algorithm", 2015)

"Underfitting is when a model doesn’t take into account enough information to accurately model real life. For example, if we observed only two points on an exponential curve, we would probably assert that there is a linear relationship there. But there may not be a pattern, because there are only two points to reference. [...] It seems that the best way to mitigate underfitting a model is to give it more information, but this actually can be a problem as well. More data can mean more noise and more problems. Using too much data and too complex of a model will yield something that works for that particular data set and nothing else." (Matthew Kirk, "Thoughtful Machine Learning", 2015)

"We are moving slowly into an era where Big Data is the starting point, not the end." (Pearl Zhu, "Digital Master: Debunk the Myths of Enterprise Digital Maturity", 2015)

"A popular misconception holds that the era of Big Data means the end of a need for sampling. In fact, the proliferation of data of varying quality and relevance reinforces the need for sampling as a tool to work efficiently with a variety of data, and minimize bias. Even in a Big Data project, predictive models are typically developed and piloted with samples." (Peter C Bruce & Andrew G Bruce, "Statistics for Data Scientists: 50 Essential Concepts", 2016)

"Big data is, in a nutshell, large amounts of data that can be gathered up and analyzed to determine whether any patterns emerge and to make better decisions." (Daniel Covington, Analytics: Data Science, Data Analysis and Predictive Analytics for Business, 2016)

"Big Data processes codify the past. They do not invent the future. Doing that requires moral imagination, and that’s something only humans can provide. We have to explicitly embed better values into our algorithms, creating Big Data models that follow our ethical lead. Sometimes that will mean putting fairness ahead of profit." (Cathy O'Neil, "Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy", 2016)

"While Big Data, when managed wisely, can provide important insights, many of them will be disruptive. After all, it aims to find patterns that are invisible to human eyes. The challenge for data scientists is to understand the ecosystems they are wading into and to present not just the problems but also their possible solutions." (Cathy O'Neil, "Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy", 2016)

"Big Data allows us to meaningfully zoom in on small segments of a dataset to gain new insights on who we are." (Seth Stephens-Davidowitz, "Everybody Lies: What the Internet Can Tell Us About Who We Really Are", 2017)

"Effects without an understanding of the causes behind them, on the other hand, are just bunches of data points floating in the ether, offering nothing useful by themselves. Big Data is information, equivalent to the patterns of light that fall onto the eye. Big Data is like the history of stimuli that our eyes have responded to. And as we discussed earlier, stimuli are themselves meaningless because they could mean anything. The same is true for Big Data, unless something transformative is brought to all those data sets… understanding." (Beau Lotto, "Deviate: The Science of Seeing Differently", 2017)

"The term [Big Data] simply refers to sets of data so immense that they require new methods of mathematical analysis, and numerous servers. Big Data - and, more accurately, the capacity to collect it - has changed the way companies conduct business and governments look at problems, since the belief wildly trumpeted in the media is that this vast repository of information will yield deep insights that were previously out of reach." (Beau Lotto, "Deviate: The Science of Seeing Differently", 2017)

"There are other problems with Big Data. In any large data set, there are bound to be inconsistencies, misclassifications, missing data - in other words, errors, blunders, and possibly lies. These problems with individual items occur in any data set, but they are often hidden in a large mass of numbers even when these numbers are generated out of computer interactions." (David S Salsburg, "Errors, Blunders, and Lies: How to Tell the Difference", 2017)

"Just as they did thirty years ago, machine learning programs (including those with deep neural networks) operate almost entirely in an associational mode. They are driven by a stream of observations to which they attempt to fit a function, in much the same way that a statistician tries to fit a line to a collection of points. Deep neural networks have added many more layers to the complexity of the fitted function, but raw data still drives the fitting process. They continue to improve in accuracy as more data are fitted, but they do not benefit from the 'super-evolutionary speedup'."  (Judea Pearl & Dana Mackenzie, "The Book of Why: The new science of cause and effect", 2018)

"One of the biggest myths is the belief that data science is an autonomous process that we can let loose on our data to find the answers to our problems. In reality, data science requires skilled human oversight throughout the different stages of the process. [...] The second big myth of data science is that every data science project needs big data and needs to use deep learning. In general, having more data helps, but having the right data is the more important requirement. [...] A third data science myth is that modern data science software is easy to use, and so data science is easy to do. [...] The last myth about data science [...] is the belief that data science pays for itself quickly. The truth of this belief depends on the context of the organization. Adopting data science can require significant investment in terms of developing data infrastructure and hiring staff with data science expertise. Furthermore, data science will not give positive results on every project." (John D Kelleher & Brendan Tierney, "Data Science", 2018)

"Apart from the technical challenge of working with the data itself, visualization in big data is different because showing the individual observations is just not an option. But visualization is essential here: for analysis to work well, we have to be assured that patterns and errors in the data have been spotted and understood. That is only possible by visualization with big data, because nobody can look over the data in a table or spreadsheet." (Robert Grant, "Data Visualization: Charts, Maps and Interactive Graphics", 2019)

"With the growing availability of massive data sets and user-friendly analysis software, it might be thought that there is less need for training in statistical methods. This would be naïve in the extreme. Far from freeing us from the need for statistical skills, bigger data and the rise in the number and complexity of scientific studies makes it even more difficult to draw appropriate conclusions. More data means that we need to be even more aware of what the evidence is actually worth." (David Spiegelhalter, "The Art of Statistics: Learning from Data", 2019)

"Big data is revolutionizing the world around us, and it is easy to feel alienated by tales of computers handing down decisions made in ways we don’t understand. I think we’re right to be concerned. Modern data analytics can produce some miraculous results, but big data is often less trustworthy than small data. Small data can typically be scrutinized; big data tends to be locked away in the vaults of Silicon Valley. The simple statistical tools used to analyze small datasets are usually easy to check; pattern-recognizing algorithms can all too easily be mysterious and commercially sensitive black boxes." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"Making big data work is harder than it seems. Statisticians have spent the past two hundred years figuring out what traps lie in wait when we try to understand the world through data. The data are bigger, faster, and cheaper these days, but we must not pretend that the traps have all been made safe. They have not." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"Many people have strong intuitions about whether they would rather have a vital decision about them made by algorithms or humans. Some people are touchingly impressed by the capabilities of the algorithms; others have far too much faith in human judgment. The truth is that sometimes the algorithms will do better than the humans, and sometimes they won’t. If we want to avoid the problems and unlock the promise of big data, we’re going to need to assess the performance of the algorithms on a case-by-case basis. All too often, this is much harder than it should be. […] So the problem is not the algorithms, or the big datasets. The problem is a lack of scrutiny, transparency, and debate." (Tim Harford, "The Data Detective: Ten easy rules to make sense of statistics", 2020)

"The problem is the hype, the notion that something magical will emerge if only we can accumulate data on a large enough scale. We just need to be reminded: Big data is not better; it’s just bigger. And it certainly doesn’t speak for itself." (Carl T Bergstrom & Jevin D West, "Calling Bullshit: The Art of Skepticism in a Data-Driven World", 2020)

"[...] the focus on Big Data AI seems to be an excuse to put forth a number of vague and hand-waving theories, where the actual details and the ultimate success of neuroscience is handed over to quasi- mythological claims about the powers of large datasets and inductive computation. Where humans fail to illuminate a complicated domain with testable theory, machine learning and big data supposedly can step in and render traditional concerns about finding robust theories. This seems to be the logic of Data Brain efforts today. (Erik J Larson, "The Myth of Artificial Intelligence: Why Computers Can’t Think the Way We Do", 2021)

"We live on islands surrounded by seas of data. Some call it 'big data'. In these seas live various species of observable phenomena. Ideas, hypotheses, explanations, and graphics also roam in the seas of data and can clarify the waters or allow unsupported species to die. These creatures thrive on visual explanation and scientific proof. Over time new varieties of graphical species arise, prompted by new problems and inner visions of the fishers in the seas of data." (Michael Friendly & Howard Wainer, "A History of Data Visualization and Graphic Communication", 2021)

"Visualizations can remove the background noise from enormous sets of data so that only the most important points stand out to the intended audience. This is particularly important in the era of big data. The more data there is, the more chance for noise and outliers to interfere with the core concepts of the data set." (Kate Strachnyi, "ColorWise: A Data Storyteller’s Guide to the Intentional Use of Color", 2023)

"Visualisation is fundamentally limited by the number of pixels you can pump to a screen. If you have big data, you have way more data than pixels, so you have to summarise your data. Statistics gives you lots of really good tools for this." (Hadley Wickham)

13 January 2010

🗄️Data Management: Data Quality Dimensions (Part III: Completeness)

Data Management
Data Management Series

Completeness refers to the extent to which data are missing in a dataset, a fact reflected in the number of missing values, also referred to as empty (when an empty string or default value is used) or 'Nulls' (aka unknown values), and/or in the number of missing records.

Missing values are typically considered in relation to mandatory attributes, attributes that need a not-Null value for each record, though upon case the analysis might be applied to non-mandatory (optional) attributes too, for example when one intends to understand whether the attributes are adequately maintained or not. It's interesting that [1] also considers the inapplicable attributes, referring to the attributes not applicable (relevant) for certain scenarios (e.g. physical dimensions for service-based materials), which together with the applicable (relevant) attributes can be considered as another type of categorization for attributes. Whether an attribute is mandatory is decided upon business context and not necessarily upon the physical structure containing the attribute; in other words, an attribute could be optional as per the database schema and mandatory as per the business rules.
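A minimal sketch of measuring completeness against mandatory attributes (assuming pandas is available; the table and column names are hypothetical):

import pandas as pd

customers = pd.DataFrame({
    "CustomerId": ["C1", "C2", "C3", "C4"],
    "Name":       ["Alpha", "Beta", None, "Delta"],
    "VatNumber":  ["DE123", "", None, "DE456"],   # an empty string counts as missing too
    "Phone":      [None, "030-123", None, None],  # optional attribute, not checked here
})

mandatory = ["CustomerId", "Name", "VatNumber"]
checked = customers[mandatory].replace("", pd.NA)   # treat empty strings like Nulls
print(checked.isna().sum())                         # number of missing values per mandatory attribute
print((1 - checked.isna().mean()).round(2))         # completeness (fill rate) per mandatory attribute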

'Missing records' can be a misleading term because it is used in several contexts; however, within the data completeness context it refers only to the cases not covered by data integrity. For example, in parent-child table relations the header data was entered though the detail data is missing, either not entered or deleted; such a case is not covered by referential integrity because there is no missing reference, just a parent without child data (1:n cardinality).

A mixed example occurs when the same entity is split across several tables at the same level of detail. One of the tables must function as a parent, falling into the previously mentioned example (1:1 cardinality). In such a scenario it depends how one reports the nonconformances per record: (1) the error is counted only once, independently of how many dimensions raised an error; (2) the error is counted for each dimension. For (2), when the referential integrity fails, an error is raised also for each mandatory attribute.

Both examples deal with explicit data referents – the 'parent' data, though there are cases in which the referents are implicit, for example when the data are not available for a certain time interval (e.g. period, day) even if needed, though also for this case the referents could be made explicit, falling into the previously mentioned examples. In such scenarios all the attributes corresponding to the missing records will be null.
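A minimal sketch of detecting the parent-without-child case from above (assuming pandas is available; the tables are hypothetical):

import pandas as pd

headers = pd.DataFrame({"OrderId": [1, 2, 3], "Customer": ["C1", "C2", "C3"]})
lines   = pd.DataFrame({"OrderId": [1, 1, 3], "Item": ["A", "B", "C"], "Qty": [5, 2, 1]})

# left anti-join: header records that have no corresponding detail records
check = headers.merge(lines[["OrderId"]].drop_duplicates(), on="OrderId",
                      how="left", indicator=True)
print(check[check["_merge"] == "left_only"].drop(columns="_merge"))   # order 2 has a header but no lines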

Normally the completeness of parent-child relations is enforced with the help of referential integrity and database transactions, a set of actions performed as a single unit of work; these allow saving the parent data only if the child data were saved successfully, though such a type of constraint is not always necessary.
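A minimal sketch of that enforcement, using SQLite via Python's sqlite3 module (the tables and values are hypothetical; a full-blown DBMS behaves analogously):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE SalesHeader (OrderId INTEGER PRIMARY KEY, Customer TEXT NOT NULL);
    CREATE TABLE SalesLine (OrderId INTEGER NOT NULL REFERENCES SalesHeader(OrderId),
                            Item TEXT NOT NULL, Qty INTEGER NOT NULL);
""")

header = (1, "CUST-001")
lines = [(1, "ITEM-A", 5), (1, "ITEM-B", None)]   # the None quantity violates NOT NULL

try:
    with conn:   # one transaction: committed on success, rolled back on any error
        conn.execute("INSERT INTO SalesHeader VALUES (?, ?)", header)
        conn.executemany("INSERT INTO SalesLine VALUES (?, ?, ?)", lines)
except sqlite3.IntegrityError as error:
    print("rolled back:", error)

# the header was rolled back together with the failing lines, so no orphaned parent remains
print(conn.execute("SELECT COUNT(*) FROM SalesHeader").fetchone()[0])   # -> 0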

Data should be cleaned when feasible in the source system(s) and this applies to incomplete data as well. It might be feasible to clean the values in Excel files or similar tools, by exporting and then reimporting the clean values back into the respective systems.

In data migrations or similar scenarios, the completeness in particular and data quality in general must be judged against the target system(s), and thus the dataset must be enriched in an intermediate layer as needed. Upon case, one can consider using default values, though this sounds like technical debt, unlikely to be addressed later. Moreover, one should prioritize the effort and consider first the attributes which are needed for the good functioning of the target system(s).

Ideally, data validation techniques should be applied for the mandatory fields in the source systems, when feasible.


Written: Jan-2010, Last Reviewed: Mar-2024 

References:
[1] David Loshin (2009) "Master Data Management"

15 August 2009

🛢DBMS: Deadlocks (Definitions)

 "A situation which arises when two users, each having a lock on one piece of data, attempt to acquire a lock on the other’s piece of data. The SQL Server detects deadlocks, and kills one user’s process." (Karen Paulsell et al, "Sybase SQL Server: Performance and Tuning Guide", 1996)

"A situation that arises when two users, each having a lock on one piece of data, attempt to acquire a lock on the other’s piece. Each user waits for the other to release his or her lock. SQL Server detects deadlocks and kills one user’s process, returning error code 1205." (Patrick Dalton, "Microsoft SQL Server Black Book", 1997)

"A condition that arises when two or more transactions are waiting for one another to release locks." (Peter Gulutzan & Trudy Pelzer, "SQL Performance Tuning", 2002)

"A situation in SQL Server 2000 when two users each have a lock on one piece of data and they attempt to acquire a lock on the other’s piece of data. Each user would wait indefinitely for the other to release the lock, unless one of the user processes is terminated. SQL Server detects deadlocks and terminates one user’s process." (Anthony Sequeira & Brian Alderman, "The SQL Server 2000 Book", 2003)

"A state in which two users or processes cannot continue processing because they each have a resource that the other needs." (Thomas Moore, "EXAM CRAM™ 2: Designing and Implementing Databases with SQL Server 2000 Enterprise Edition", 2005)

"Occurs when multiple connections that hold locks are waiting for locks to be released that are held by the other transactions, that is, the transactions are waiting for each other to release locks." (Sara Morganand & Tobias Thernstrom , "MCITP Self-Paced Training Kit : Designing and Optimizing Data Access by Using Microsoft SQL Server 2005 - Exam 70-442", 2007)

"A specific type of locking problem when concurrent processes are competing for locks. For example, program1 holds a lock on A and is waiting for a lock on B; program2 holds a lock on B and is waiting for a lock on A." (Craig S Mullins, "Database Administration: The Complete Guide to DBA Practices and Procedures" 2nd Ed, 2012)

"A deadlock occurs when two or more user processes each have a lock on a separate page or table and each wants to acquire a lock on the other process’s page or table. The transaction with the least accumulated CPU time is killed and all of its work is rolled back." (Sybase)

"A situation in which two or more users are waiting for data locked by each other." (Oracle)

"A situation where different transactions are unable to proceed, because each holds a lock that the other needs. Because both transactions are waiting for a resource to become available, neither one ever releases the locks it holds." (MySQL)

"Unresolved contention for the use of resources." (IBM)



15 July 2009

🛢DBMS: Online Transaction Processing [OLTP] (Definitions)

"A database management system representing the state of a particular business function at a specific point in time. An OLTP database is typically characterized by having large numbers of concurrent users actively adding and modifying data." (Microsoft Corporation, "SQL Server 7.0 System Administration Training Kit", 1999)

"A data processing system designed to record all of the business transactions of an organization as they occur. An OLTP system is characterized by many concurrent users actively adding and modifying data." (Anthony Sequeira & Brian Alderman, "The SQL Server 2000 Book", 2003)

"Any software capability that applies transactional updates and inquiries interactively." (Margaret Y Chu, "Blissful Data", 2004)

"A relational database system used to manage the day-to-day operations of an organization." (Reed Jacobsen & Stacia Misner, "Microsoft SQL Server 2005 Analysis Services Step by Step", 2006)

"The operational processes for executing a business activity while the customer or end user waits for the execution to complete. One example of OLTP would be an automated teller transaction." (Evan Levy & Jill Dyché, "Customer Data Integration", 2006)

"A data-processing system designed to record all the business transactions of an organization as they occur. An OLTP system is characterized by many concurrent users actively adding and modifying data. Typically, OLTP systems perform large numbers of relatively small transactions." (Jim Joseph et al, "Microsoft® SQL Server™ 2008 Reporting Services Unleashed", 2009)

"Online transaction processing (OLTP) systems are the fundamental systems used to run the business. These are also called operational systems or operational applications. They are often used as sources of data for the data warehouse." (Laura Reeves, "A Manager's Guide to Data Warehousing", 2009)

"A data-processing system designed to record all the business transactions of an organization as they occur. An OLTP system is characterized by many concurrent users actively adding and modifying data. Typically, OLTP systems perform large numbers of relatively small transactions." (Jim Joseph, "Microsoft SQL Server 2008 Reporting Services Unleashed", 2009)

"An approach to database design that focuses on data transactions  in particular inserting, updating, and deleting data." (Ken Withee, "Microsoft Business Intelligence For Dummies", 2010)

"Class of systems that facilitate and manage transaction-oriented applications." (Martin Oberhofer et al, "The Art of Enterprise Information Architecture", 2010)

"A transaction processing system where transactions are executed as soon as they occur." (Linda Volonino & Efraim Turban, "Information Technology for Management 8th Ed", 2011)

"A type of computer processing in which the computer responds immediately to user requests. Each request is a transaction. The opposite of transaction processing is batch processing." (Craig S Mullins, "Database Administration", 2012)

"A type of interactive application in which requests that are submitted by users are processed as soon as they are received. Results are returned to the requester in a relatively short period of time." (Sybase, "Open Server Server-Library/C Reference Manual", 2019)

"Online transaction processing (OLTP) is a mode of processing that is characterized by short transactions recording business events and that normally requires high availability and consistent, short response times. This category of applications requires that a request for service be answered within a predictable period that approaches 'real time'. Unlike traditional mainframe data processing, in which data is processed only at specific times, transaction processing puts terminals online, where they can update the database instantly to reflect changes as they occur. In other words, the data processing models the actual business in real time, and a transaction transforms this model from one business state to another. Tasks such as making reservations, scheduling and inventory control are especially complex; all the information must be current." (Gartner)

09 July 2009

🛢DBMS: Rollback (Definitions)

"A Transact-SQL statement used with a user-defined transaction (before a commit transaction has been received) that cancels the transaction and undoes any changes that were made to the database." (Karen Paulsell et al, "Sybase SQL Server: Performance and Tuning Guide", 1996)

"Rollback of a user-specified transaction to the last savepoint inside a transaction or to the beginning of a transaction." (Microsoft Corporation, "SQL Server 7.0 System Administration Training Kit", 1999)

"Terminates a transaction so that all resources updated within a transaction revert to the original state before the transaction started." (Atul Apte, "Java Connector Architecture: Building Custom Connectors and Adapters", 2002)

"The point in a transaction when all updates to any resources involved in the transaction are reversed." (Kim Haase et al, "The J2EE Tutorial", 2002)

"To remove the updates performed by one or more partially completed transactions. Rollbacks are required to restore the integrity of a database after an application, database, or system failure." (Anthony Sequeira & Brian Alderman, "The SQL Server 2000 Book", 2003)

"Rolling back a transaction means you are undoing the actual transaction you are currently in." (Joseph L Jorden & Dandy Weyn, "MCTS Microsoft SQL Server 2005: Implementation and Maintenance Study Guide - Exam 70-431", 2006)

"This command undoes any database changes not yet committed to the database using the COMMIT command." (Gavin Powell, "Beginning Database Design", 2006)

"A DBMS recovery technique that aborts active applications and attempts to reinstate the state of the database prior to initiating the applications active at the time the database failed." (S. Sumathi & S. Esakkirajan, "Fundamentals of Relational Database Management Systems", 2007)

"A process that reverts writes operations to ensure the consistency of all replica set members." (MongoDb, "Glossary", 2008)

"Undoes changes performed within a transaction before the transaction is committed." (Rod Stephens, "Beginning Database Design Solutions", 2008)

"An operation that returns the database to a previous state. The transaction can be rolled back completely, canceling a pending transaction, or to a specified point. Rollbacks allow the database to be restored to a valid state if invalid operations are performed or after the database server fails." (John Goodson & Robert A Steward, "The Data Access Handbook", 2009)

"A SQL command that restores the database table contents to their original condition (the condition that existed after the last COMMIT statement)." (Carlos Coronel et al, "Database Systems: Design, Implementation, and Management" 9th Ed., 2011)

"To undo the database statements performed prior to a commit of the transaction." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"An operation that ends a current transaction and cancels all the recent changes to the database until the previous checkpoint/ commit point." (Adam Gordon, "Official (ISC)2 Guide to the CISSP CBK" 4th Ed., 2015)

"The process of restoring a set of data or process state to a previous consistent state saved through a checkpoint operation." (O Sami Saydjari, "Engineering Trustworthy Systems: Get Cybersecurity Design Right the First Time", 2018)

01 July 2009

🛢DBMS: Transactions (Definitions)

"A mechanism for ensuring that a set of actions is treated as a single unit of work." (Karen Paulsell et al, "Sybase SQL Server: Performance and Tuning Guide", 1996)

"A series of SQL statements that constitute an atomic unit of work: either all are committed as a unit or they are all rolled back as a unit. A transaction begins with the first statement since the last transaction end and finishes with a transaction end (either COMMIT or ROLLBACK) statement." (Peter Gulutzan & Trudy Pelzer, "SQL Performance Tuning", 2002)

"A group of database operations combined into a logical unit of work that is either wholly committed or rolled back. A transaction is atomic, consistent, isolated, and durable." (Anthony Sequeira & Brian Alderman, "The SQL Server 2000 Book", 2003)

"A logical unit of work consisting of one or more SQL statements that must all succeed or all fail to keep the database in a logically consistent state. A transfer of funds from a bank account is a logical transaction, in that both the withdrawal from one account and the deposit to another account must succeed for the transaction to succeed." (Bob Bryla, "Oracle Database Foundations", 2004)

"A series of database operations that should be treated as a single atomic operation so either they all occur or none of them occur." (Rod Stephens, "Beginning Database Design Solutions", 2008)

"One or more SQL statements that make up a unit of work performed against the database. Either all the statements in a transaction are committed as a unit or all the statements are rolled back as a unit." (John Goodson & Robert A Steward, "The Data Access Handbook", 2009)

"A group of database operations combined into a logical unit of work that is either wholly committed or rolled back. A transaction is atomic, consistent, isolated, and durable." (Jim Joseph, "Microsoft SQL Server 2008 Reporting Services Unleashed", 2009)

"An atomic unit of work with respect to recovery and consistency." (Craig S Mullins, "Database Administration: The Complete Guide to DBA Practices and Procedures" 2nd Ed, 2012)

"An atomic unit of work with respect to recovery and consistency." (Craig S Mullins, "Database Administration", 2012)

"Each individual purchase. Each time customers swipe a card, shell out cash, or press the purchase confirmation button online, a transaction takes place. This data often is referred to as transaction log (T-log) data." (Brittany Bullard, "Style and Statistics", 2016)

25 May 2009

🛢DBMS: Atomicity (Definitions)

"One of the ACID properties; all or none of the transaction must occur. If all parts of the transaction cannot occur successfully, all effects of the transaction must be undone or 'rolled back'." (Atul Apte, "Java Connector Architecture: Building Custom Connectors and Adapters", 2002)

"Atomicity is a feature provided by transactions. It is a principle that states either all of the transactions’ data modifications are performed or none of them are performed." (Anthony Sequeira & Brian Alderman, "The SQL Server 2000 Book", 2003)

[atomic transaction:] "A possibly complex series of actions that is considered as a single operation by those not involved directly in performing the transaction." (Rod Stephens, "Beginning Database Design Solutions", 2008)

"The requirement that tasks within a transaction occur as a group as if they were a single complex task. The tasks are either all performed or none of them are performed. It's all or nothing." (Rod Stephens, "Beginning Database Design Solutions", 2008)

[atomic unit of work:] "A general category of work that an activity might perform. This kind of activity encapsulates the logic to perform a unit of work synchronously on the workflow thread. The unit of work that is performed is atomic in the sense that it is completed entirely during a single execution of the activity. It doesn’t need to suspend execution and wait for external input. It is short-lived and doesn’t perform time-consuming operations. It executes synchronously on the workflow thread and doesn’t create or use other threads." (Bruce Bukovics, "Pro WF: Windows Workflow in .NET 4", 2010)

"Atomicity is the state or fact of being composed of individual units (NOAD). With regard to data, atomicity refers to what constitutes a unit. In modeling, as data is normalized, each attribute is expected to represent one thing, not a set of things. One system may define a name as one thing. Another may define it as three things (first name, middle initial, last name). Atomicity results from decisions about how to structure data." (Laura Sebastian-Coleman, "Measuring Data Quality for Ongoing Improvement", 2012)

"The characteristic of a transaction whereby database modifications must adhere to an 'all or nothing' rule. If any single part of a transaction fails, the entire transaction fails. A database management system (DBMS) must maintain atomicity despite software and hardware failures (the A in ACID)." (Craig S Mullins, "Database Administration", 2012)

"An attribute or property of a transaction whereby a group of statements are run as if a single operation or none of the statements are run." (Sybase, "Open Server Server-Library/C Reference Manual", 2019)

15 April 2009

🛢DBMS: Transactional Replication (Definitions)

"A type of replication that marks selected transactions in the Publisher's database transaction log for replication and then distributes them asynchronously to Subscribers as incremental changes, while maintaining transactional consistency." (Microsoft Corporation, "SQL Server 7.0 System Administration Training Kit", 1999)

"A type of replication where an initial snapshot of data is applied at Subscribers, and then when data modifications are made at the Publisher, the individual transactions are captured and propagated to Subscribers." (Anthony Sequeira & Brian Alderman, "The SQL Server 2000 Book", 2003)

"A type of replication where data and database objects are distributed by first applying an initial snapshot at the Subscriber and then later capturing transactions made at the Publisher and propagating them to individual Subscribers. Transactional replication, as with all replication types, begins with a synchronizing snapshot. After the initial synchronization, transactions, which are committed at the Publisher, are automatically replicated to the Subscribers." (Thomas Moore, "EXAM CRAM™ 2: Designing and Implementing Databases with SQL Server 2000 Enterprise Edition", 2005)

"A replication type that relies on DML operations being captured from a published database and automatically sent to the subscriber database(s). Starts with a snapshot of the publication. Incremental changes both in data and schema at the source are replicated to the destination as they occur." (Marilyn Miller-White et al, "MCITP Administrator: Microsoft® SQL Server™ 2005 Optimization and Maintenance 70-444", 2007)

"Replication that starts with a snapshot and then keeps the Subscribers up-to-date by using the transaction log. Transactions are recorded on the Publisher, distributed to the Subscribers, and then applied to keep the Subscribers up-to-date." (Darril Gibson, "MCITP SQL Server 2005 Database Developer All-in-One Exam Guide", 2008)

"A type of replication that typically starts with a snapshot of the publication database objects and data." (SQL Server 2012 Glossary, "Microsoft", 2012)

"In SQL Replication, a type of processing in which every transaction is replicated to the target table when it is committed in the source table." (Sybase, "Open Server Server-Library/C Reference Manual", 2019)

 "A type of replication that typically starts with a snapshot of the publication database objects and data." (Microsoft Technet)

13 April 2009

🛢DBMS: Roll Back (Definitions)

"To remove partially completed transactions after a database or other system failure." (Microsoft Corporation, "SQL Server 7.0 System Administration Training Kit", 1999)

"To reverse changes made by transactions that were uncommitted at the point in time to which a database is being recovered." (Thomas Moore, "MCTS 70-431: Implementing and Maintaining Microsoft SQL Server 2005", 2006)

"To reverse or undo changes made so far." (Marilyn Miller-White et al, "MCITP Administrator: Microsoft® SQL Server™ 2005 Optimization and Maintenance 70-444", 2007)

"The process of undoing uncommitted transactions. As a part of the recovery process, uncommitted transactions are rolled back to ensure the database is recovered in a consistent state." (Darril Gibson, "MCITP SQL Server 2005 Database Developer All-in-One Exam Guide", 2008)

"Undo the changes made by a transaction, restoring the database to the state it was in before the transaction began." (Jan L Harrington, "Relational Database Design and Implementation" 3rd Ed., 2009)

"End a transaction, undoing any changes made by the transaction and restoring the database to the state it was in before the transaction began." (Jan L Harrington, "SQL Clearly Explained" 3rd Ed., 2010)

"To reverse changes." (Microsoft, "SQL Server 2012 Glossary", 2012)

"To undo the database statements performed prior to a commit of the transaction." (Craig S Mullins, "Database Administration", 2012)

"To restore data that is changed by an SQL statement to the state at its last commit point." (IBM, "Informix Servers 12.1", 2014)

"To return to a previous version, as in rolling back a device driver." (Faithe Wempen, "Computing Fundamentals: Introduction to Computers", 2015)

"To restore data that is changed by an SQL statement to the state at its last commit point." (Sybase, "Open Server Server-Library/C Reference Manual", 2019)

 "To return the values changed by a transaction to their original state." (Microsoft, "ODBC Glossary")

19 February 2009

🛢DBMS: Dirty Read (Definitions)

"Occurs when one transaction modifies a row, and then a second transaction reads that row before the first transaction commits the change. If the first transaction rolls back the change, the information read by the second transaction becomes invalid." (Karen Paulsell et al, "Sybase SQL Server: Performance and Tuning Guide", 1996)

"Reads that contain uncommitted data. For example, transaction 1 changes a row. Transaction 2 reads the changed row before transaction 1 commits the change. If transaction 1 rolls back the change, transaction 2 reads a row that is considered to have never existed." (Microsoft Corporation, "SQL Server 7.0 System Administration Training Kit", 1999)

"A problem arising with concurrent transactions. The Dirty Read problem occurs when a transaction reads a row that has been changed but not committed by another transaction. The result is that Transaction #2's work is based on a change that never really happened. You can avoid Dirty Read by using an isolation level of READ COMMITTED or higher." (Peter Gulutzan & Trudy Pelzer, "SQL Performance Tuning", 2002)

"A problem with uncontrolled concurrent use of a database where a transaction acts on data that have been modified by an update transaction that hasn't committed and is later rolled back." (Jan L Harrington, "Relational Database Design and Implementation" 3rd Ed., 2009)

"The problem that arises when a transaction reads the same data more than once, including data modified by concurrent transactions that are later rolled back." (Jan L Harrington, "SQL Clearly Explained" 3rd Ed., 2010)

"A read that contains uncommitted data." (Microsoft, "SQL Server 2012 Glossary", 2012)

"A read request that does not involve any locking mechanism. This means that data can be read that might later be rolled back resulting in an inconsistency between what was read and what is in the database." (IBM, "Informix Servers 12.1", 2014)

 "A transaction reads data that has been written by another transaction that has not been committed yet. Oracle Database never permits dirty reads." (Oracle)

"An operation that retrieves unreliable data, data that was updated by another transaction but not yet committed. It is only possible with the isolation level known as read uncommitted. This kind of operation does not adhere to the ACID principle of database design. It is considered very risky, because the data could be rolled back, or updated further before being committed; then, the transaction doing the dirty read would be using data that was never confirmed as accurate." (MySQL)

"Reads that contain uncommitted data." (Microsoft Technet)

15 February 2009

🛢DBMS: Throughput (Definitions)

"The volume of work completed in a given time period. It is usually measured in transactions per second (TPS)." (Karen Paulsell et al, "Sybase SQL Server: Performance and Tuning Guide", 1996)

"The number of operations the DBMS can do in a time unit." (Peter Gulutzan & Trudy Pelzer, "SQL Performance Tuning", 2002)

"The amount of work performed by a computer system within a specified time interval; for example, the number of transactions of a certain type that can be processed per second. See also response time, workload, channel capacity." (Richard D Stutzke, "Estimating Software-Intensive Systems: Projects, Products, and Processes", 2005)

"Amount of activity a system can sustain over a period of time." (Marilyn Miller-White et al, "MCITP Administrator: Microsoft® SQL Server™ 2005 Optimization and Maintenance 70-444", 2007)

"The amount of data that is transferred from sender to receiver over time." (John Goodson & Robert A Steward, "The Data Access Handbook", 2009)

"The number of work items processed per unit of time." (Max Domeika, "Software Development for Embedded Multi-core Systems", 2011)

"Given a set of tasks to be performed, the rate at which those tasks are completed. Throughput measures the rate of computation, and it is given in units of tasks per unit time." (Michael McCool et al, "Structured Parallel Programming", 2012)

"The amount of work completed in a unit of time." (Oracle, "Database SQL Tuning Guide Glossary", 2013)

"The rate at which transactions are completed in a system." (Marcia Kaufman et al, "Big Data For Dummies", 2013)

13 February 2009

🛢DBMS: Savepoint (Definitions)

"A marker that the user includes in a user-defined transaction. When transactions are rolled back, they can be rolled back only to the savepoint." (Microsoft Corporation, "SQL Server 7.0 System Administration Training Kit", 1999)

"A marker that allows an application to roll back part of a transaction if a minor error is encountered. The application must still commit or roll back the full transaction when it is complete." (Anthony Sequeira & Brian Alderman, "The SQL Server 2000 Book", 2003)

"A location to which a transaction can return if part of the transaction is conditionally canceled or encounters an error, hence offering a mechanism to roll back portions of transactions." (SQL Server 2012 Glossary, "Microsoft", 2012)

"A named entity that represents the state of data and schemas at a particular point in time within a unit of work." (Sybase, "Open Server Server-Library/C Reference Manual", 2019)

[nested savepoint] "A savepoint that is included or positioned within another savepoint. Nested savepoints allow an application to have multiple levels of savepoints active at a time and allow the application to roll back to any active savepoint as required." (Sybase, "Open Server Server-Library/C Reference Manual", 2019)

 "A marker that allows an application to roll back part of a transaction if a minor error is encountered." (Microsoft Technet)

"A named SCN in a transaction to which the transaction can be rolled back." (Oracle, "Oracle Database Concepts")

"Savepoints help to implement nested transactions. They can be used to provide scope to operations on tables that are part of a larger transaction. For example, scheduling a trip in a reservation system might involve booking several different flights; if a desired flight is unavailable, you might roll back the changes involved in booking that one leg, without rolling back the earlier flights that were successfully booked." (MySQL, "MySQL 8.0 Reference Manual Glossary")

06 February 2009

🛢DBMS: Two-Phase Commit [2PC] (Definitions)

"An approach for maintaining consistency over multiple systems. In the first phase, all backends are asked to confirm a requested change so that in the second phase the commitment of the updates usually succeeds." (Nicolai M Josuttis, "SOA in Practice", 2007)

"A protocol that ensures that transactions that apply to more than one server are completed on all servers or on none." (Microsoft Corporation, "SQL Server 7.0 System Administration Training Kit", 1999)

"A mechanism to synchronize updates on different machines or platforms, so that they all fall or all succeed together. The decision to commit is centralized, but each participant has the right to veto. This is a key process in real-time, transaction-based environments." (Atul Apte, "Java™ Connector Architecture: Building Custom Connectors and Adapters", 2002)

"A process that ensures transactions that apply to more than one server are completed on all servers or on none." (Anthony Sequeira & Brian Alderman, "The SQL Server 2000 Book", 2003)

"This is a special transaction involving two servers in which the transaction must be applied to both servers, or the entire transaction is rolled back from both servers." (Joseph L Jorden & Dandy Weyn, "MCTS Microsoft SQL Server 2005: Implementation and Maintenance Study Guide - Exam 70-431", 2006)

"A transaction processing protocol that first ensures the transaction holds locks on all records involved before committing any updates." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"A protocol that ensures that transactions that apply to more than one server are completed on all servers or none at all. Two-phase commit is coordinated by the transaction manager and supported by resource managers." (Microsoft, "SQL Server 2012 Glossary", 2012)

"A two-step process by which recoverable resources and an external subsystem are committed. During the first step, the database manager subsystems are polled to ensure that they are ready to commit. If all subsystems respond positively, the database manager instructs them to commit." (IBM, "Informix Servers 12.1", 2014)

"A mechanism that is another control used in databases to ensure the integrity of the data held within the database." (Adam Gordon, "Official (ISC)2 Guide to the CISSP CBK" 4th Ed., 2015)

"A two-step process by which recoverable resources and an external subsystem are committed. During the first step, the database manager subsystems are polled to ensure that they are ready to commit. If all subsystems respond positively, the database manager instructs them to commit. See also distributed transaction." (Sybase, "Open Server Server-Library/C Reference Manual", 2019)

"A feature of transaction processing systems that enables a database to be returned to the pretransaction state if an error condition occurs." (Craig S Mullins, "Database Administration: The Complete Guide to DBA Practices and Procedures", 2012)

"A process that ensures transactions that apply to more than one server are completed on all servers or on none." (Microsoft Technet)

"An operation that is part of a distributed transaction, under the XA specification. (Sometimes abbreviated as 2PC.) When multiple databases participate in the transaction, either all databases commit the changes, or all databases roll back the changes." (MySQL, "MySQL 8.0 Reference Manual Glossary")

"The process of committing a distributed transaction in two phases. In the first phase, the transaction processor checks that all parts of the transaction can be committed. In the second phase, all parts of the transaction are committed. If any part of the transaction indicates in the first phase that it cannot be committed, the second phase does not occur. ODBC does not support two-phase commits." (Microsoft, "ODBC Glossary")

05 March 2007

🌁Software Engineering: Protocol (Definitions)

"The language or rules and conventions that two computers use to pass messages across a network medium. Networking software generally implements multiple levels of protocols layered one on top of another." (Owen Williams, "MCSE TestPrep: SQL Server 6.5 Design and Implementation", 1998)

"A set of rules or standards designed to enable computers to connect with one another and exchange information." (Microsoft Corporation, "SQL Server 7.0 System Administration Training Kit", 1999)

"The way in which two computers transfer data between each other." (Greg Perry, "Sams Teach Yourself Beginning Programming in 24 Hours 2nd Ed.", 2001)

"A list of methods that a class must implement to conform or adopt the protocol. Protocols provide a way to standardize an interface across classes." (Stephen G Kochan, "Programming in Objective-C", 2003)

"A set of rules that govern a transaction." (Marcus Green & Bill Brogden, "Java 2™ Programmer Exam Cram™ 2 (Exam CX-310-035)", 2003)

"A set of semantic and syntactic rules that determines the behavior of functions in achieving communication." (Sharon Allen & Evan Terry, "Beginning Relational Data Modeling" 2nd Ed., 2005)

"A language and a set of rules that allow computers to interact in a well-defined way. Examples are FTP, HTTP, and NNTP." (Craig F Smith & H Peter Alesso, "Thinking on the Web: Berners-Lee, Gödel and Turing", 2008)

"A specification - often a standard - that describes how computers communicate with each other, for example, the TCP/IP suite of communication protocols or the OAI-PMH." (J P Getty Trust, "Introduction to Metadata" 2nd Ed., 2008)

"To communicate effectively, client applications and database servers need a commonly agreed-upon approach. A protocol is a communication standard adhered to by both parties that makes these conversations possible." (Robert D Schneider and Darril Gibson, "Microsoft SQL Server 2008 All-In-One Desk Reference For Dummies", 2008)

"A set of rules that computers use to establish and maintain communication amongst themselves." (Judith Hurwitz et al, "Service Oriented Architecture For Dummies" 2nd Ed., 2009)

"the forms and ceremony used to manage the interaction of elements." (Bruce P Douglass, "Real-Time Agility: The Harmony/ESW Method for Real-Time and Embedded Systems Development", 2009)

"The rules governing the syntax, semantics, and synchronization of communication." (David Lyle & John G Schmidt, "Lean Integration", 2010)

"A list of methods that a class must implement to conform to or adopt the protocol. Protocols provide a way to standardize an interface across classes. See also formal protocol and informal protocol." (Stephen G Kochan, "Programming in Objective-C" 4th Ed., 2011)

"A set of conventions that govern the communications between processes. Protocol specifies the format and content of messages to be exchanged." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"The standard or set of rules that govern how devices on a network exchange and how they need to function in order to 'talk' to each other." (Linda Volonino & Efraim Turban, "Information Technology for Management" 8th Ed., 2011)

"A standard set of formats and procedures that enable computers to exchange information." (Microsoft, "SQL Server 2012 Glossary", 2012)

"In networking, an agreed-upon way of sending messages back and forth so that neither correspondent will get too confused." (Jon Orwant et al, "Programming Perl" 4th Ed., 2012)

"A set of guidelines defining network traffic formats for the easy communication of data between two hosts." (Mark Rhodes-Ousley, "Information Security: The Complete Reference, Second Edition, 2nd Ed.", 2013)

"A set of instructions, policies, or fully described procedures for accomplishing a service, operation, or task." (Jules H Berman, "Principles of Big Data: Preparing, Sharing, and Analyzing Complex Information", 2013)

"A set of rules controlling the communication and transfer of data between two or more devices or systems in a communication network." (IBM, "Informix Servers 12.1", 2014)

"A rule or custom that governs how something is done. In a computer context, it refers to a standard for transferring data." (Faithe Wempen, "Computing Fundamentals: Introduction to Computers", 2015)

"A set of rules that defines how data is formatted and processed on a network" (Nell Dale & John Lewis, "Computer Science Illuminated" 6th Ed., 2015)

"Defined policies or standards that users adhere to. Protocols are well-defined and accepted procedures. In computer networking, the term refers to algorithms for exchanging various types of data and their interpretation at origination and destination." (Mike Harwood, "Internet Security: How to Defend Against Attackers on the Web" 2nd Ed., 2015)
