
04 May 2019

🧊Data Warehousing: Architecture (Part I: Push vs. Pull)


In data integrations, data migrations and data warehousing there is the need to move data between two or more systems. In the simplest scenario only two systems are involved, a source and a target system. There are, however, more complex scenarios: data from multiple sources need to be available in a common target system (as in the case of data warehouses/marts or data migrations), data from one source (e.g. an ERP system) need to be available in several other systems (e.g. web shops, planning systems), or there is a many-to-many relationship (e.g. data from two ERP systems are consolidated in several other systems).

The data can flow in one direction, from the source systems to the target systems (aka unidirectional flow), though there are situations in which data modified in the target system need to flow back to the source system (aka bidirectional flow), as in the case of planning or product development systems. In complex scenarios the communication may occur multiple times within the same process until a final state is reached.

Independently of the number of systems and the type of communication involved, data need to flow between the systems as smoothly as possible, assuring that the data are consistent across the various systems and available when needed. The architectures responsible for moving data between the systems are based on two simple mechanisms - push vs. pull - or combinations of them.

With a push mechanism the data are pushed from the source system into the target system(s), the source system being responsible for the operation. Typically the push happens as soon as an event occurs in the source system, an event that leads to or follows a change in the data. There are also cases in which it is preferred to push the data at regular points in time (e.g. hourly, daily), especially when the changes aren't needed immediately in the target. This latter scenario allows changes to still be made to the data in the source system until they are sent to the other system(s). When the ability to make such changes is critical, it can be controlled via specific business rules.
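A minimal sketch of an event-driven push in Python, assuming a hypothetical REST endpoint on the target system and a change hook provided by the source system:

import json
import urllib.request

TARGET_URL = "https://target.example.com/api/customers"  # hypothetical endpoint

def on_record_changed(record: dict) -> None:
    """Hook called by the source system whenever a record is created or
    modified; the source pushes the change as soon as the event occurs."""
    payload = json.dumps(record).encode("utf-8")
    request = urllib.request.Request(
        TARGET_URL, data=payload,
        headers={"Content-Type": "application/json"}, method="POST")
    with urllib.request.urlopen(request) as response:
        response.read()  # the target acknowledges the pushed change

The same hook could instead append the change to a queue that is flushed hourly or daily, when the changes aren't needed immediately in the target system.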

With a pull mechanism the data are pulled from the source system into the target system, the target system being responsible for the operation. This usually happens at regular points in time or on demand; however, the target system has to check whether the data have changed.
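A corresponding sketch of a scheduled pull, again with a hypothetical endpoint; the target keeps a watermark and asks the source only for the records changed since the last pull:

import json
import time
import urllib.request

SOURCE_URL = "https://source.example.com/api/customers"  # hypothetical endpoint

def pull_changes(changed_since: str) -> list:
    """The target asks the source for the records changed since the last
    pull; the target system, not the source, drives the operation."""
    with urllib.request.urlopen(f"{SOURCE_URL}?changed_since={changed_since}") as resp:
        return json.loads(resp.read())

def run_scheduled_pull(interval_seconds: int = 3600) -> None:
    """Pull at regular points in time (here hourly); an on-demand pull
    would simply call pull_changes directly."""
    watermark = "1900-01-01T00:00:00"
    while True:
        for record in pull_changes(watermark):
            watermark = max(watermark, record["modified_at"])
            # ... upsert the record into the target system here
        time.sleep(interval_seconds)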

Hybrid scenarios may involve a middleware that sits between the systems, being responsible for pulling the data from the source systems and pushing them into the target systems. Another hybrid scenario is when the source system pushes the data to an intermediary repository, from which the target system(s) pull the data on a need basis. The repository can reside on the source, on the target or in between. A variation of this is when the source merely informs the target that a change happened and it's up to the target to decide whether it needs the data or not.
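A sketch of the middleware variant, with trivial in-memory stand-ins for the source and target systems; the middleware pulls the changes from the sources into an intermediary repository and pushes them from there into the targets:

from typing import Callable, Dict, List

def middleware_cycle(pulls: List[Callable[[], List[Dict]]],
                     pushes: List[Callable[[Dict], None]]) -> None:
    """Pull the changed records from every source into an intermediary
    repository (here just a list), then push them into every target."""
    staged: List[Dict] = []
    for pull in pulls:
        staged.extend(pull())          # pull side, towards the sources
    for push in pushes:
        for record in staged:
            push(record)               # push side, towards the targets

if __name__ == "__main__":
    source = lambda: [{"id": 1, "name": "bolt M8"}]   # stand-in source system
    target_store: List[Dict] = []                     # stand-in target system
    middleware_cycle([source], [target_store.append])
    print(target_store)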

The main differentiators between the various methods are the timeliness, completeness and consistency of the data. Timeliness refers to the urgency with which data need to be available in the target system(s), completeness refers to the degree to which the data are ready to be sent, while consistency refers to the degree to which the data from the source are consistent with the data from the target systems.

Based on these characteristics, integrations seem to favor push methods, while data migrations and data warehousing favor pull methods, though which method suits best depends entirely on the business needs under consideration.

20 February 2017

⛏️Data Management: Timeliness (Definitions)

"Coming early or at the right, appropriate or adapted to the times or the occasion." (Martin J Eppler, "Managing Information Quality" 2nd Ed., 2006)

[timeliness & availability] "A data quality dimension that measures the degree to which data are current and available for use as specified, and in the time frame in which they are expected." (Danette McGilvray, "Executing Data Quality Projects", 2008)

"the ability of a task to repeatedly meet its timeliness requirements." (Bruce P Douglass, "Real-Time Agility: The Harmony/ESW Method for Real-Time and Embedded Systems Development", 2009)

"A pragmatic quality characteristic that is a measure of the relative availability of data to support a given process within the timetable required to perform the process." (David C Hay, "Data Model Patterns: A Metadata Map", 2010)

"1.The degree to which available data meets the currency requirements of information consumers. 2.The length of time between data availability and the event or phenomenon they describe." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"Timeliness is a dimension of data quality related to the availability and currency of data. As used in the DQAF, timeliness is associated with data delivery, availability, and processing. Timeliness is the degree to which data conforms to a schedule for being updated and made available. For data to be timely, it must be delivered according to schedule." (Laura Sebastian-Coleman, "Measuring Data Quality for Ongoing Improvement ", 2012)

"The degree to which the model contains elements that reflect the current version of the world Transitive Relation When a relation R is transitive then if R links entity A to entity B, and entity B to entity C, then it also links A to C." (Panos Alexopoulos, "Semantic Modeling for Data", 2020)

"The degree to which the actual time and processing time are separated. The timelier the data is, the smaller the gap is between actual time and record time."  (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"Length of time between data availability and the event or phenomenon they describe." (SDMX) 

17 January 2010

🗄️Data Management: Data Quality Dimensions (Part IV: Accuracy)


Accuracy refers to the extent to which data are correct, matching reality with an acceptable level of approximation. Correctness, the quality of being correct, like reality itself, is a vague notion; in many cases it is a question of philosophy and perception, with a high degree of interpretability. However, in what concerns data, correctness is typically the result of measurement, therefore a measure of accuracy relates to the degree to which the data deviate from physical laws, logic or defined rules, though even this context is a swampy field because, to use a well-known syntagm, everything is relative.

From a scientific point of view, we try to model reality with mathematical models which offer various levels of approximation; the more we learn about our world, the more flaws we discover in the existing models, in a continuous quest for better models that approximate reality. Things don't have to be so complicated, though: for basic measurements there are many tools that offer acceptable results for most requirements; on the other side, as requirements change, better approximations might be needed over time.

Another concept related to accuracy and measurement systems is precision, which refers to the degree to which repeated measurements under unchanged conditions lead to the same results; further concepts associated with it are repeatability and reproducibility. Even if accuracy and precision are often confounded, a measurement system can be accurate but not precise, or precise but not accurate (see the target analogy); a valid measurement system thus targets both aspects. Accuracy and precision can be considered dimensions of correctness.
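A small numerical illustration of the distinction, with made-up repeated measurements of a part whose true length is assumed to be 12.0 mm; the bias against the true value speaks to accuracy, the spread of the repeated measurements to precision:

from statistics import mean, stdev

TRUE_VALUE = 12.0  # assumed known reference value, in mm

def bias_and_spread(measurements: list) -> tuple:
    """The bias from the true value indicates (in)accuracy, while the
    standard deviation of repeated measurements indicates (im)precision."""
    return mean(measurements) - TRUE_VALUE, stdev(measurements)

print(bias_and_spread([10.1, 10.2, 10.1]))  # precise (small spread) but not accurate
print(bias_and_spread([11.5, 12.4, 12.1]))  # accurate on average but less precise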

Coming back to accuracy and its use in determining data quality, accuracy is typically strongly related to the measurement tools used; assessing it requires redoing the measurements for all or a sample of the dataset and identifying whether the requested level of accuracy is met, an approach that can involve quite an effort. Accuracy also depends on whether the systems used to store the data are designed to hold the data at the requested level of accuracy, a fact reflected in the characteristics of the data types used (e.g. precision, length).
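A small illustration of the storage aspect, assuming a column defined with two decimal places (e.g. DECIMAL(10,2)); whatever accuracy the measurement delivers, the stored value cannot exceed the precision of the data type:

from decimal import Decimal, ROUND_HALF_UP

measured = Decimal("12.3456")                 # the accuracy delivered by the instrument
stored = measured.quantize(Decimal("0.01"),   # rounding imposed by the column's data type
                           rounding=ROUND_HALF_UP)
print(stored)                                 # 12.35 - part of the accuracy is lost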

Given that a system stores related data (e.g. weight, height, width, length) that should satisfy physical, business or common-sense rules, such rules can be used to check whether the data satisfy them with the desired level of approximation. For example, knowing the height, width, length and the composition of a material (e.g. a metal bar), the approximate weight can be determined and compared with the entered weight; if the difference falls outside a certain interval, then most probably one of the values was entered incorrectly. There are even simpler rules that might apply, for example that the physical dimensions must be positive real values, or, in a generalized formulation, rules involving maximal or minimal limits that lead to the identification of outliers, etc. In fact, most of the time determining data accuracy comes down to defining intervals of possible values, though there are also cases in which complex models and specific techniques are built for this purpose.
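A minimal sketch of such a rule for a rectangular metal bar; the density value (illustrative for steel) and the 5% tolerance are assumptions made for the example:

def check_weight(length_cm: float, width_cm: float, height_cm: float,
                 entered_weight_kg: float,
                 density_kg_per_cm3: float = 0.00785,  # illustrative value for steel
                 tolerance: float = 0.05) -> bool:
    """Compare the entered weight with the weight derived from the dimensions
    and the material's density; flag values outside the accepted interval."""
    if min(length_cm, width_cm, height_cm, entered_weight_kg) <= 0:
        return False                               # physical dimensions must be positive
    expected = length_cm * width_cm * height_cm * density_kg_per_cm3
    return abs(entered_weight_kg - expected) <= tolerance * expected

print(check_weight(100, 5, 5, entered_weight_kg=19.6))  # True, close to the expected ~19.6 kg
print(check_weight(100, 5, 5, entered_weight_kg=25.0))  # False, most probably a wrong entry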

There is another important aspect related to accuracy, the time dependency of data, i.e. whether the data change with time. Data currency or actuality refers to the extent to which data are current. Given the above definition of accuracy, currency could be considered a special type of accuracy, because when the data are not current they no longer reflect reality. If currency is considered a standalone data quality dimension, then accuracy refers only to data that are not time dependent.


Written: Jan-2010, Last Reviewed: Mar-2024

12 February 2009

🛢DBMS: Latency (Definitions)

"The amount of time that elapses between when a change is completed on the Publisher and when it appears in the destination database on the Subscriber." (Microsoft Corporation, "SQL Server 7.0 System Administration Training Kit", 1999)

"The amount of time that elapses between when a data change is completed at one server and when that change appears at another." (Anthony Sequeira & Brian Alderman, "The SQL Server 2000 Book", 2003)

"The amount of time that elapses when a data change is completed at one server and when that change appears at another within a replication architecture (for example, the time between when a change is made at a publisher and when it appears at the subscriber)." (Thomas Moore, "MCTS 70-431: Implementing and Maintaining Microsoft SQL Server 2005", 2006)

"The delay in time for a data change to be propagated between nodes in a replication topology." (Marilyn Miller-White et al, "MCITP Administrator: Microsoft® SQL Server™ 2005 Optimization and Maintenance 70-444", 2007)

[latency of information:] "Latency is a time delay between the moment something is initiated and the moment one of its effects begins or becomes detectable. Latency of information applies this concept to changes, updates, and deletes of information." (Allen Dreibelbis et al, "Enterprise Master Data Management", 2008)

[data latency:] "Technically, the speed in which data is captured is referred to as data latency. It is a measure of data 'freshness', specifically data that are less than 24 hours old." (Linda Volonino & Efraim Turban, "Information Technology for Management" 8th Ed., 2011)

"The measure of time between two events, such as the initiation and completion of an event, or the read on one system and the write to another system." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"The delay that occurs while data is processed or delivered." (Microsoft, "SQL Server 2012 Glossary", 2012)

"In replication, part or all of the approximate difference between the time that a source table is changed and the time that the change is applied to the corresponding target table." (Sybase, "Open Server Server-Library/C Reference Manual", 2019)

 "The amount of time that elapses when a data change is completed at one server and when that change appears at another server." (Microsoft Technet)

