Showing posts with label attributes. Show all posts

04 February 2021

📦Data Migrations (DM): Conceptualization (Part VII: Data Import Layer)

Data Migration
Data Migrations Series

The data requirements for the Data Migration (DM) and Data Quality (DQ) are driven by the processes implemented in the target system(s). Therefore, a good knowledge of these requirements can considerably decrease the effort needed for these two subprojects. The needed knowledge base starts with the entities and their attributes, the dependencies existing between them and the various rules that apply, and ends with the parametrization requirements and the architecture(s) that can be used to import the data.

The DM process starts with defining the entities in scope and their attributes, and with identifying the corresponding entities and attributes in the legacy systems. The attributes that have no correspondent in the legacy system need to be provided by the business and integrated into the DM logic. In addition, one also needs to consider the attributes needed by the business but not available in the target system, some of which are likely available in the legacy systems. For such attributes one needs either to repurpose (misuse) an attribute from the target or to extend the target system.

For each entity a data mapping is created, which documents the data transformations needed for migrating the data. In the process one also needs to consider the attributes’ data types, their (standard) formatting, their domain of definition, as well as the various rules that apply. Their implementation belongs in the DM layer, from which the data are exported in a standard format as needed by the target system.
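A data mapping of this kind can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the entity, the attribute names (`cust_no`, `CustomerGroup`, etc.), the date format and the default value are all hypothetical and not taken from any specific system.

```python
from datetime import datetime

# Hypothetical mapping: target attribute -> (legacy attribute, transformation).
CUSTOMER_MAPPING = {
    "CustomerNumber": ("cust_no", lambda v: v.strip().upper()),
    "Name": ("cust_name", lambda v: v.strip()),
    # Assumes the legacy system stores dates as DD.MM.YYYY text.
    "CreatedOn": ("created", lambda v: datetime.strptime(v, "%d.%m.%Y").date().isoformat()),
}

# Hypothetical defaults for target attributes with no legacy correspondent,
# i.e. values that would be provided by the business.
CUSTOMER_DEFAULTS = {"CustomerGroup": "DOMESTIC"}


def map_record(legacy_record, mapping, defaults):
    """Build a target record from a legacy record by applying the mapping."""
    target = dict(defaults)
    for target_attr, (legacy_attr, transform) in mapping.items():
        target[target_attr] = transform(legacy_record[legacy_attr])
    return target


legacy = {"cust_no": " c-1001 ", "cust_name": "Contoso Ltd ", "created": "04.02.2021"}
mapped = map_record(legacy, CUSTOMER_MAPPING, CUSTOMER_DEFAULTS)
```

Keeping the mapping as data rather than as scattered code makes it easier to review with the business and to adjust when new rules are discovered.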

Exporting the data from the DM layer directly into the target system’s tables has in theory the lowest overhead, even if the rejected records are difficult to track, the rejections resulting only from the records’ validation against the database’s schema. For this approach to work, one must have a good knowledge of the database schema and of the business rules implemented in the target system.

To solve the issue of error logging, systems have a further layer on top of the database model, which also allows running data validation against the target system’s business rules. Modern import frameworks allow loading the data via a set of standard files with a predefined structure. The data can thus be imported into the system manually or via load jobs, a log with the issues being generated in the process. Some frameworks even allow manually editing the failed records and reimporting them. Unfortunately, calling the import layer from the DM layer, that is, from within a database, is not possible, though this would seldom bring a benefit. Some third-party tools attempt to improve the import functionality by calling the target system’s import layer.

The import files must be generated from the DM layer in the required structure and with the appropriate formatting. The challenge, however, resides in identifying all the attributes that should be in scope of the load. It’s an iterative process, which is sometimes backed by trial-and-error heuristics. Unless the target system’s validation rules are known beforehand, the rules need to be discovered in this process, which can prove time-consuming. The discoveries also need to be integrated into the DM logic, which explains the large number of changes that need to be performed.

Given the dependencies existing between entities, the files need to be generated and loaded in a predefined order. These dependencies are also reflected in the data processing and the validation rules considered in the DM layer.
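As an illustration, a valid load order can be derived from the dependencies with a topological sort. The sketch below uses Python's standard `graphlib` module (Python 3.9+); the entities and their dependencies are hypothetical examples.

```python
from graphlib import TopologicalSorter

# Hypothetical dependencies: each entity maps to the set of entities
# that must be loaded before it.
DEPENDENCIES = {
    "Customers": set(),
    "Products": set(),
    "SalesOrders": {"Customers"},
    "SalesOrderLines": {"SalesOrders", "Products"},
}

# static_order() yields every entity after all of its prerequisites and
# raises graphlib.CycleError if the dependencies are circular.
load_order = list(TopologicalSorter(DEPENDENCIES).static_order())
```

The exact position of independent entities (here Customers and Products) is not fixed; only the relative order imposed by the dependencies is guaranteed.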

A quality checkpoint can be implemented between the export from the DM layer and the import to enforce the four-eyes principle. It’s normally the last opportunity for trapping potential issues. A further quality check is performed after the import by validating whether the data were imported as expected.
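The check after import can be as simple as reconciling the exported records with what is read back from the target system. The following is a minimal sketch, assuming both sides are available as lists of dictionaries with a shared key attribute; the function name and the sample records are hypothetical.

```python
def reconcile(exported, imported, key):
    """Compare exported and imported records by key and report the differences."""
    exported_by_key = {record[key]: record for record in exported}
    imported_by_key = {record[key]: record for record in imported}
    common = exported_by_key.keys() & imported_by_key.keys()
    return {
        # Exported but not found in the target (e.g. rejected records).
        "missing": sorted(exported_by_key.keys() - imported_by_key.keys()),
        # Found in the target but never exported.
        "unexpected": sorted(imported_by_key.keys() - exported_by_key.keys()),
        # Present on both sides but with different attribute values.
        "changed": sorted(k for k in common if exported_by_key[k] != imported_by_key[k]),
    }


exported = [
    {"CustomerNumber": "C-1001", "Name": "Contoso Ltd"},
    {"CustomerNumber": "C-1002", "Name": "Fabrikam Inc"},
    {"CustomerNumber": "C-1003", "Name": "Adventure Works"},
]
imported = [
    {"CustomerNumber": "C-1001", "Name": "Contoso Ltd"},
    {"CustomerNumber": "C-1002", "Name": "Fabrikam"},
]
report = reconcile(exported, imported, key="CustomerNumber")
```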


05 May 2018

🔬Data Science: Clustering (Definitions)

"Grouping of similar patterns together. In this text the term 'clustering' is used only for unsupervised learning problems in which the desired groupings are not known in advance." (Laurene V Fausett, "Fundamentals of Neural Networks: Architectures, Algorithms, and Applications", 1994)

"The process of grouping similar input patterns together using an unsupervised training algorithm." (Joseph P Bigus, "Data Mining with Neural Networks: Solving Business Problems from Application Development to Decision Support", 1996)

"Clustering attempts to identify groups of observations with similar characteristics." (Glenn J Myatt, "Making Sense of Data: A Practical Guide to Exploratory Data Analysis and Data Mining", 2006)

"The process of organizing objects into groups whose members are similar in some way. A cluster is therefore a collection of objects, which are 'similar' between them and are 'dissimilar' to the objects belonging to other clusters." (Juan R González et al, "Nature-Inspired Cooperative Strategies for Optimization", 2008)

"Grouping the nodes of an ad hoc network such that each group is a self-organized entity having a cluster-head which is responsible for formation and management of its cluster." (Prayag Narula, "Evolutionary Computing Approach for Ad-Hoc Networks", 2009)

"The process of assigning individual data items into groups (called clusters) so that items from the same cluster are more similar to each other than items from different clusters. Often similarity is assessed according to a distance measure." (Alfredo Vellido & Iván Olie, "Clustering and Visualization of Multivariate Time Series", 2010)

"Verb. To output a smaller data set based on grouping criteria of common attributes." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"The process of partitioning the data attributes of an entity or table into subsets or clusters of similar attributes, based on subject matter or characteristic (domain)." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"A data mining technique that analyzes data to group records together according to their location within the multidimensional attribute space." (SQL Server 2012 Glossary, "Microsoft", 2012)

"Clustering aims to partition data into groups called clusters. Clustering is usually unsupervised in the sense that the training data is not labeled. Some clustering algorithms require a guess for the number of clusters, while other algorithms don't." (Ivan Idris, "Python Data Analysis", 2014)

"Form of data analysis that groups observations to clusters. Similar observations are grouped in the same cluster, whereas dissimilar observations are grouped in different clusters. As opposed to classification, there is not a class attribute and no predefined classes exist." (Efstathios Kirkos, "Composite Classifiers for Bankruptcy Prediction", 2014)

"Organization of data in some semantically meaningful way such that each cluster contains related data while the unrelated data are assigned to different clusters. The clusters may not be predefined." (Sanjiv K Bhatia & Jitender S Deogun, "Data Mining Tools: Association Rules", 2014)

"Techniques for organizing data into groups of similar cases." (Meta S Brown, "Data Mining For Dummies", 2014)

[cluster analysis:] "A technique that identifies homogenous subgroups or clusters of subjects or study objects." (K N Krishnaswamy et al, "Management Research Methodology: Integration of Principles, Methods and Techniques", 2016)

"Clustering is a classification technique where similar kinds of objects are grouped together. The similarity between the objects maybe determined in different ways depending upon the use case. Therefore, clustering in measurement space may be an indicator of similarity of image regions, and may be used for segmentation purposes." (Shiwangi Chhawchharia, "Improved Lymphocyte Image Segmentation Using Near Sets for ALL Detection", 2016)

"Clustering techniques share the goal of creating meaningful categories from a collection of items whose properties are hard to directly perceive and evaluate, which implies that category membership cannot easily be reduced to specific property tests and instead must be based on similarity. The end result of clustering is a statistically optimal set of categories in which the similarity of all the items within a category is larger than the similarity of items that belong to different categories." (Robert J Glushko, "The Discipline of Organizing: Professional Edition" 4th Ed., 2016)

[cluster analysis:] "A statistical technique for finding natural groupings in data; it can also be used to assign new cases to groupings or categories." (Jonathan Ferrar et al, "The Power of People", 2017)

"Clustering or cluster analysis is a set of techniques of multivariate data analysis aimed at selecting and grouping homogeneous elements in a data set. Clustering techniques are based on measures relating to the similarity between the elements. In many approaches this similarity, or better, dissimilarity, is designed in terms of distance in a multidimensional space. Clustering algorithms group items on the basis of their mutual distance, and then the belonging to a set or not depends on how the element under consideration is distant from the collection itself." (Crescenzio Gallo, "Building Gene Networks by Analyzing Gene Expression Profiles", 2018)

"Unsupervised learning or clustering is a way of discovering hidden structure in unlabeled data. Clustering algorithms aim to discover latent patterns in unlabeled data using features to organize instances into meaningfully dissimilar groups." (Benjamin Bengfort et al, "Applied Text Analysis with Python: Enabling Language-Aware Data Products with Machine Learning", 2018)

"The term clustering refers to the task of assigning a set of objects into groups (called clusters) so that the objects in the same cluster are more similar (in some sense or another) to each other than to those in other clusters." (Satyadhyan Chickerur et al, "Forecasting the Demand of Agricultural Crops/Commodity Using Business Intelligence Framework", 2019)

"In the machine learning context, clustering is the task of grouping examples into related groups. This is generally an unsupervised task, that is, the algorithm does not use preexisting labels, though there do exist some supervised clustering algorithms." (Alex Thomas, "Natural Language Processing with Spark NLP", 2020)

"A cluster is a group of data objects which have similarities among them. It's a group of the same or similar elements gathered or occurring closely together." (Hari K Kondaveeti et al, "Deep Learning Applications in Agriculture: The Role of Deep Learning in Smart Agriculture", 2021)

"Clustering describes an unsupervised machine learning technique for identifying structures among unstructured data. Clustering algorithms group sets of similar objects into clusters, and are widely used in areas including image analysis, information retrieval, and bioinformatics." (Accenture)

"Describes an unsupervised machine learning technique for identifying structures among unstructured data. Clustering algorithms group sets of similar objects into clusters, and are widely used in areas including image analysis, information retrieval, and bioinformatics." (Accenture)

"The process of identifying objects that are similar to each other and cluster them in order to understand the differences as well as the similarities within the data." (Analytics Insight)

01 July 2012

📦Data Migrations (DM): An Introduction

Data Migration
Data Migrations Series


Basically, Data Migration is the movement of data from one IS (Information System), the legacy system, to a new IS, the target system, which is supposed to replace the legacy system entirely or partially. In the best scenario there are no differences between the two IS, or the differences are minimal or negligible. In the worst scenario, there are multiple legacy systems used as source, and even multiple target systems, with important differences between them, differences that can even translate into incompatibilities at multiple levels. Such architectures can span geographies, departments, organizations or industries, and can involve a multitude of vendors, generations of systems, network types, different regulations, etc. In many Data Migrations the overall picture can be really complex, though for the sake of simplicity it’s enough to focus on the simplest scenario, in which there is a single source and a single target system, with some differences between them. One can also abstract from the fact that many migrations are parts of bigger projects, for example ERP implementations or other types of application migrations.

Data Migration is quite a complex topic, for many appearing like a black box in which data come in and data come out. That’s valid for the typical user as well as for the IT professionals who haven’t been involved in Data Migration projects. There are many books on topics that are tangential to Data Migration: Data Management, Data Quality, Data Integration or Data Warehousing. Apart from some presentations available on the Web, a few methodologies published by important companies, one or two books, and a few blogs, there isn’t much material available on Data Migration. This scarcity is also a reflection of the low importance given to Data Migration as a subject, even if many professionals working in the field warn about the considerable impact a Data Migration can have on a project in particular, and on the business in general.

Approaching a topic like Data Migration can be, depending on the case, a complex task; however, with a little intuition and some guidance its complexity falls apart. Often, when exploring such a topic, the 5W1H technique or its extended forms can be of help. The technique comes down to searching for answers to the “what”, “where”, “why”, “how”, “when”, “who” and “with what” questions. In the case of Data Migration the questions are formulated as: what (data) to migrate, where to migrate, why to migrate, how to migrate, when to migrate, who migrates and with what to migrate?

Why to migrate?

A Data Migration occurs as the follow-up of a need: an old system is in place and can’t cope anymore with the business’ growth, a company made an acquisition and the systems need to be consolidated, or the organization decided to change its infrastructure, processes or business model in order to address current business requirements like flexibility, availability, manageability, automation, cost cuts, etc. In other words, a Data Migration arises from a need for change, and it can be itself a change in what concerns technical infrastructure, processes, procedures, data flows and ways of doing business. A migration has quite an impact on the business, so here is a legitimate question: does it really make sense to migrate? Why not start from zero with the new system?!

The migration can be a zero point for an organization, though unless a company is starting anew, there are some data lying in the old system(s) that need to remain available, for example open Purchase Orders that need to be fulfilled, Invoices that need to be paid, a catalog with all the Products and the available stock, information about Customers, what they bought, what they browsed or what they want to buy for Christmas, etc. At least some of the data need to be made available in one form or another within the new architecture, if not in the new system.

The availability of old data can be solved by keeping the old system(s) in place and functional, even if the system won’t be fed with new data, or maybe it will. Keeping a system alive involves additional costs for maintaining the infrastructure: software and hardware licenses, consultants, administrators and other people responsible for keeping such a system working properly. With time this can become quite an unnecessary burden. It can be an acceptable choice for some organizations, but it’s unlikely to be a best/good practice. And even if the system is kept, most likely there will be data that need to be available in the new system too. The integration of the two systems can also be discussed, but again, does it make sense? The bottom line is that in multiple scenarios a Data Migration can prove to be the optimal solution for an organization.

What data to migrate?

Even if it looks like a silly question, it can be one of the most complex questions to answer. In theory all the data need to be migrated, but are all the data really needed? Typically, a database contains historical data not used anymore by the business, obsolete data marked or not for deletion, garbage data entered by mistake or left over after incomplete deletions, all of these having low or no value for the business. Hopefully there are also “good data”, quintessential for the business. Somebody would say “what the heck, why do we need to philosophize so much, let’s migrate all the data!”. The decision can be understandable, though what if the percentage of “good data” is quite small in comparison with the total volume of data, which can measure a few terabytes?! Sure, nowadays data centers can handle terabytes of data without problems, though there are some factors to be considered: it can be quite a challenge to migrate so much data, the volume of data also affects the performance of databases in particular and of IS in general, and, a more natural reason, why store something that has minimal value for you?!

It makes sense to migrate only the data that have value for an organization, but what data are needed then? Normally this starts with understanding what entities the business deals with and which attributes characterize them. Many of the entities can be met in the organization’s daily activity, and maybe are already defined in the organization’s glossary or Data Dictionary, so a review of the available inventory might do. If not, more effort needs to be spent for this purpose; activities specific to Data Discovery, Data Categorization, Data Definition or Data Profiling can help, depending on the case, to fill the gaps in understanding. Except for categorization, these activities are not all necessary, and the analysis needs to go only as deep as the purpose requires.

A first categorization was made above, when data were considered as valuable, not valuable or in between. A second categorization can be made based on the data’s usage: obsolete (not used anymore or marked for deletion), new (not used and recently entered), historical (data used in the past) and actual (data in use). A third categorization can be made on the status of the entities they represent, a status that can be associated with the phase of the process the entity represents (e.g. active, inactive, open, invoiced, closed, blocked, etc.). Other meaningful categorizations can be considered as long as they prove to be important in identifying the useful data.

An important categorization in migrations in particular, and in Data Management in general, is the split of data into master data, transaction data and setup data. Master data are data that change only seldom and have a long life (until they become obsolete), are referenced throughout the system, and are vital to an organization through their meaning (e.g. Customers, Suppliers, Products, Assets, Employees, Accounts, etc.). Transaction data, in contrast, are data that change often and have a relatively short life, are typically referenced by other transactions and can be associated with documents or movements through the system (e.g. Purchase Orders, Sales Orders, Invoices, Receipts, Assets Movements, etc.). Setup data are data used to configure a system (e.g. Transaction Types, Document Types, Roles, Permissions, etc.). This categorization deserves full attention, because each of the three categories needs a different handling approach in migration or Data Management.

Based on the identified categories, some rough migration rules can be considered in deciding what data (actually records) to migrate, for example:
- master data, unless they became obsolete, and open transactions are often migrated entirely;
- historical transaction data spanning a few years back can be migrated in case they are needed in the process;
- master data referenced by the migrated transaction data need to be migrated too;
- setup data are entered manually;
- historical data are archived.
There can also be exceptions from the rules, so such possible scenarios need to be considered too.
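Such rough rules can be expressed as a small decision function. The sketch below is only one possible reading of the rules above; the record layout (`category`, `obsolete`, `open`, `year`) and the cutoff year are hypothetical and would differ from project to project.

```python
def should_migrate(record, referenced_master_ids, history_cutoff_year):
    """Apply the rough migration rules to a single record."""
    category = record["category"]
    if category == "setup":
        # Setup data are entered manually in the target system.
        return False
    if category == "master":
        # Migrate unless obsolete; obsolete master data are still needed
        # when referenced by migrated transactions.
        return not record["obsolete"] or record["id"] in referenced_master_ids
    if category == "transaction":
        # Open transactions always; closed ones only within the history window.
        return record["open"] or record["year"] >= history_cutoff_year
    # Anything else is archived rather than migrated.
    return False


referenced = {"C-0007"}
decisions = [
    should_migrate({"id": "C-0001", "category": "master", "obsolete": False}, referenced, 2010),
    should_migrate({"id": "C-0007", "category": "master", "obsolete": True}, referenced, 2010),
    should_migrate({"id": "C-0009", "category": "master", "obsolete": True}, referenced, 2010),
    should_migrate({"id": "PO-1", "category": "transaction", "open": True, "year": 2008}, referenced, 2010),
    should_migrate({"id": "PO-2", "category": "transaction", "open": False, "year": 2008}, referenced, 2010),
    should_migrate({"id": "DT-1", "category": "setup"}, referenced, 2010),
]
```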

Each entity is defined by multiple attributes (also called properties or dimensions). They need to go through a similar “categorization” process. In deciding what attributes to migrate it’s important to consider especially their role in defining the entity. Some of them uniquely identify an entity (e.g. Customer Number, Product Number, Serial Number), others describe physical characteristics of the entity (e.g. color, weight, height), categorize the entity (e.g. Category, Type), reflect its status (e.g. Active, Blocked, Invoiced), record various events (e.g. Creation Date, Delivery Date, Invoice Date), and so on. It looks like another type of categorization, and it is, though it’s more difficult to create some rough rules based on it, because in the end the business dictates which attributes are needed. In fact, most of the attributes used (with distinct non-null values) in the legacy system are most likely also needed in the new system, unless the process changed considerably or the business is also supposed to change its model.

Where to migrate the data?

When the Data Migration subject is brought to the table, a decision was already made about the target system. So the “where” question is partially answered, however it addresses only the tip of the iceberg. It shows that an iceberg lies there, in front of us, though below the surface there is something more: a lot of questions and issues that need to be addressed. Like the source, the target needs to be further detailed into entities and their attributes; the targeted processes and procedures need to be considered together with the constraints imposed by the new system. One actually needs to identify the data requirements of the new system and reconcile them with the requirements of the old system. Mapping the entities and attributes available in the two systems, a process known as Data Mapping, can offer a good overview of what lies ahead, and of what similarities and gaps exist. There will be attributes that are available in the legacy system but not in the target system, and therefore either the target system needs to be extended or the data associated with the respective attributes are left out. From the opposite perspective, there can be mandatory attributes in the target system which are not available in the organization, and therefore the associated data must be collected and/or made available for the migration. There can be cases when the data are not available in the legacy system but are distributed across various other external or internal sources, so there can be an option to migrate or integrate the respective data, extend the processes to accommodate such scenarios, etc.

Only when the mapping of data is ready and the various related questions are addressed is the “where” question fully answered. Given that changes to the target system may still happen a few days before Go Live, Data Mapping can remain a hot topic until then.
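The gaps mentioned above can be made visible with a simple comparison of the attribute sets. This is a simplified sketch: it assumes the attributes were already aligned to common names, which in a real mapping is itself part of the work, and all attribute names are hypothetical.

```python
def attribute_gaps(legacy_attrs, target_attrs, mandatory_target_attrs):
    """Compare legacy and target attributes of one entity and list the gaps."""
    return {
        # Attributes available on both sides.
        "mapped": sorted(legacy_attrs & target_attrs),
        # Candidates for extending the target system or for being left out.
        "legacy_only": sorted(legacy_attrs - target_attrs),
        # Attributes with no legacy source.
        "target_only": sorted(target_attrs - legacy_attrs),
        # Mandatory attributes whose data must be collected for the migration.
        "mandatory_missing": sorted(mandatory_target_attrs - legacy_attrs),
    }


gaps = attribute_gaps(
    legacy_attrs={"CustomerNumber", "Name", "FaxNumber"},
    target_attrs={"CustomerNumber", "Name", "TaxGroup", "Segment"},
    mandatory_target_attrs={"CustomerNumber", "TaxGroup"},
)
```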

With what to migrate?

This question addresses the mix of tools used to migrate the data, and by extension the whole architecture developed for this purpose. As many experts point out, there is no general solution for this, because each migration is challenged by different requirements and architectures. ETL (Extract, Transform, Load) and Data Integration tools were mainly designed for this kind of purpose, moving data between data sources, therefore most likely the whole Data Migration architecture will be built around such a tool. In addition, topics like the assessment and reporting of Data Quality, Data Cleaning, Data Enrichment, Data Backup or Data Security need to be addressed. They ensure technically that the data are migrated at the intended level of quality and security.

For each of these topics one or more tools are available on the market. The challenge is to find the right mix for the overall architecture and to make the tools work together in an efficient and effective manner. One of the problems with such tools is that they look at Data Migration and similar problems from their own perspective, which makes them hard to integrate with other tools. Given the increasing need for Data Migration, there most likely exist tools that cover most of its requirements, each with its own advantages and disadvantages. Starting with a new tool can prove to be quite a challenge in itself. Many recommend following a methodology and using tools that have already proved their capabilities in other projects. That’s a good approach, though costs, available resources, the effort to build the infrastructure, the learning curve, etc. need to be considered too. For some migrations MS Excel or Access will do, for others a more complex framework is needed. Keep in mind that there is no perfect architecture, just the architecture that will drive you to achieve your targets.

How to migrate the data?

“How” refers mainly to the migration approach, steps, methodologies, processes and procedures used to migrate the data. Secondly, and no less important, it refers to how the mix of tools is used for the migration, in other words the implementation. Despite the huge variety of tools and means of achieving the target, some generalities can be outlined for each of these topics.

Migration approach refers to the overall strategy considered for a migration, typically whether the data are migrated all together, the new system becoming functional and replacing the legacy system (the big-bang migration), or the data are migrated in phases, the legacy and target systems functioning in parallel for a certain amount of time (the phased migration). Other variations of migration approaches can be met, under various names. It’s important to know the advantages and disadvantages of both or all approaches, especially in what concerns their application in your organization.

“Steps” is just a misnomer for the actual Project Plan, which covers the different phases and activities of such a project. In a general Data Migration project, one can talk about Data Discovery, Data Definition, Data Collection, Data Consolidation, Data Mapping, Data Conversion, Data Transformation, Data Quality Assessment, Data Cleaning, Data Storage, etc. Some of these steps can be considered as standalone processes, sometimes being already part of the process landscape existing in an organization. Other steps are just simple activities. Both types of steps share some important characteristics: they can be highly iterative and complex, they are owned by the business, with IT functioning as facilitator, each of them depends on the input from other steps, and they require continuous feedback.

A Data Migration is (or should be) managed as any other IT project, and therefore project-specific methodologies like PMBOK, PRINCE2 or PRISM can be considered. Many of the aforementioned steps come with their own baggage of methodologies too. In addition, considering that IT functions as a service, service-specific methodologies and standards like ITIL, ISO/IEC, Six Sigma, etc. could be considered.

The actual implementation of all these depends entirely on the project’s scope, the knowledge of all those involved, the constraints met and the resources available for such a project. Many of the problems and situations encountered are common to all IT projects.

Who will migrate the data?

No Data Migration project can be done without the business, the de facto owner of such a project and of its output. A lot of input is needed from the business, and its continuous involvement through the various stages is necessary for the whole duration. Unless the Data Migration comes down to a rudimentary tool like Excel and can be handled without too much expertise, a Data Migration needs technical resources that can elicit the requirements, translate them into technical requirements, build the infrastructure and maybe migrate the data. Which people are involved depends entirely on the overall architecture and methodology. In the best-case scenario the migration will come down to one person pushing a button, and the data flow as if by magic from the source to the target system. In reality, multiple people will have to take care of the migration, pushing some magic buttons in a chain of parallel and even redundant steps, monitoring and validating the process. Data owners, data stewards, data custodians, data architects, database administrators, migration and quality assurance specialists, developers, consultants and many other people can be involved, each of them playing their role.

When to migrate the data?

Intuitively, data are or should be migrated when the target system is ready to receive the new data, thus when the development is finished, the system is tested, and all the preparations for the Data Migration are made. The statement is valid for any type of migration. How such a date or dates are calculated when a project starts is in itself a kind of science, or just a matter of needs. There are projects in which the dates for each milestone or phase are calculated back from a desired Go Live date, and projects in which the Go Live is calculated incrementally based on the steps to be performed. Benchmarks from the field can also be used for calculating the dates. The bottom line is that the data must be migrated in time for the Go Live and with minimum disruption for the business.
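Calculating the dates back from a desired Go Live can be illustrated with simple date arithmetic. The phases and their durations below are hypothetical, and the sketch counts calendar days only, ignoring weekends, holidays and overlapping phases.

```python
from datetime import date, timedelta

# Hypothetical phases with their durations in calendar days, in execution order.
PHASES = [("Data Mapping", 30), ("Test Migration", 20), ("Final Migration", 5)]


def plan_backwards(go_live, phases):
    """Derive start and end dates for each phase, working back from Go Live."""
    schedule = []
    end = go_live
    for name, days in reversed(phases):
        start = end - timedelta(days=days)
        schedule.append((name, start, end))
        end = start  # the previous phase must finish when this one starts
    return list(reversed(schedule))


schedule = plan_backwards(date(2012, 7, 1), PHASES)
```

The incremental variant works the other way around, adding the durations to a start date to arrive at the Go Live.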


Whether standalone or as a subproject of another project, a Data Migration can be or become quite a complex topic that, through its outcomes, affects the business considerably. The above paragraphs considered some of the important aspects of such a project, the focus being more on figuring out what a migration implies than on a detailed exploration. It’s also a mental exercise and an invitation into the topic.

31 January 2010

🧭Business Intelligence: Enterprise Reporting (Part V: Choosing Report’s Attributes)

Business Intelligence
Business Intelligence Series


How are the attributes of a report chosen? Attributes are added primarily based on users’ specifications, however often these can be too high level, or the user ignored certain aspects, willingly or by mistake. In general, a report needs to show the attributes of high relevance to a certain topic, for example Document information (Document Number, Type, Dates, Statuses, etc.), Product main information (Product Number, Description, Type, Status, etc.), Quantities, Prices, Amounts, Responsible Users (e.g. Buyers, Preparers, Managers, etc.) or Responsible Third Parties (e.g. Customers, Vendors, Carriers).

When choosing the attributes for a report, there are several important sets of attributes which need to be considered:

Unique identifiers

Together with the various Names (e.g. Vendor Name, Customer Name) associated with entities, a report should also include the “unique identifier” (UID) of each entity, whether it’s formed from one or more attributes. The UID allows identifying, for example, whether duplicate records appear in the report, or it could be used to match/join the data from the report with other data sets in order to pull details or for further analysis of the data. For example, in a PO report over PO Shipments a unique (natural) key could be formed from the PO Number, Line Number and Shipment Number; for a Vendor, the Vendor Name or the GSL (Global Supplier Location) Number could be used, though the latter is more adequate because it’s more general and accurate, making the Vendor’s identification easier. In theory, for the same purpose the database (surrogate) unique identifiers could also be used, namely the one from the PO Shipments table, the element dictating the report’s level of detail, respectively the Vendor ID. However, even if surrogate UIDs are easier to use in joins, they could create confusion and overload the report, given that surrogate UIDs would need to be provided for the other elements too.

Documents like Invoices have both an external and an internal unique identifier: the Invoice Number together with the Vendor, a combination typically unique in a system, forms the external UID, while the Document or Voucher Number is used as the internal UID. The external UID is easier to use for external-based considerations, while the internal UID is easier to use for internal needs, so it makes sense to include both types of unique identifiers.
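As a minimal sketch of how a composite (natural) key helps in spotting duplicates, the example below builds the key from PO Number, Line Number and Shipment Number and counts how often each key occurs. The rows, the field names and the helper functions are hypothetical and serve only as illustration.

```python
from collections import Counter

# Hypothetical report rows; the field names are assumptions for illustration.
rows = [
    {"po_number": "PO-1001", "line_number": 1, "shipment_number": 1, "vendor": "Acme"},
    {"po_number": "PO-1001", "line_number": 1, "shipment_number": 2, "vendor": "Acme"},
    {"po_number": "PO-1001", "line_number": 1, "shipment_number": 1, "vendor": "Acme"},
    {"po_number": "PO-1002", "line_number": 1, "shipment_number": 1, "vendor": "Globex"},
]

# The attributes forming the natural key at PO Shipment level.
KEY_FIELDS = ("po_number", "line_number", "shipment_number")


def natural_key(row, key_fields=KEY_FIELDS):
    """Build the composite (natural) key of a row as a tuple."""
    return tuple(row[field] for field in key_fields)


def find_duplicates(rows, key_fields=KEY_FIELDS):
    """Return the keys that occur more than once, together with their counts."""
    counts = Counter(natural_key(row, key_fields) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


duplicates = find_duplicates(rows)
print(duplicates)
```

The same key tuple could serve as a dictionary key for matching the report rows against another data set, which is why it pays off to have all the key attributes available in the report.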

Quantities & Related Attributes

In Item-related reports, most of the time it makes sense to also include the quantities (e.g. Transaction, Ordered, Delivered, Invoiced, On-Hand Quantities) together with the Unit of Measure (UOM) in which they are expressed. A distinction has to be made between the Primary UOM, the UOM in which the Item is stored, and the Transactional UOM, the UOM in which the Item is transacted; for example, the Purchasing UOM, Sales UOM or Transaction UOM could differ from the Primary UOM in which the Item is stored in the Warehouse. In such cases, together with the Transactional UOM, the Primary UOM and, when applicable, the UOM Conversion Rate should also be provided.
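To illustrate why the Primary UOM and the UOM Conversion Rate belong next to the Transactional UOM, here is a small sketch that converts an ordered quantity into the Primary UOM. It assumes the conversion rate is expressed as the number of Primary UOM units per one Transactional UOM unit; systems may store it the other way around, and the field names are hypothetical.

```python
from decimal import Decimal


def to_primary_quantity(transaction_qty, conversion_rate):
    """Convert a quantity from the Transactional UOM into the Primary UOM.

    Assumption: conversion_rate is the number of Primary UOM units
    contained in one Transactional UOM unit.
    """
    return Decimal(str(transaction_qty)) * Decimal(str(conversion_rate))


# Hypothetical report line: 5 boxes ordered, the item is stored in pieces (EA).
line = {
    "item": "ITEM-01",
    "ordered_qty": 5,
    "transaction_uom": "BOX",
    "primary_uom": "EA",
    "uom_conversion_rate": 12,
}

primary_qty = to_primary_quantity(line["ordered_qty"], line["uom_conversion_rate"])
print(line["ordered_qty"], line["transaction_uom"], "=", primary_qty, line["primary_uom"])
```

Without the rate and both UOMs in the report, quantities from lines transacted in different UOMs could not be added up safely.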

Prices/Amounts & Related Attributes

For Item-related reports, and not only, include the various Prices (e.g. Sales, Purchase, Standard Price) together with the Currency Code, even if only one Currency is used; the same rule applies to the amounts stored (e.g. Invoice, Sales Order, Purchase amounts). For financial reports it’s advisable to show both functional amounts, the amounts in the Currency used by the GL (General Ledger), and transactional amounts, the amounts in the Currency used in the transaction. When the level of detail allows it, show also the Quantity and the Price Unit used to calculate the amounts, and the Exchange Rate or UOM Conversion Rate used, if any. When available, include also the Period in which the Amount was booked in the system.
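The relation between these attributes can be sketched as follows. The example assumes that the Price Unit is the number of units the price refers to (e.g. a price per 100 pieces), that the Exchange Rate converts from the transaction Currency into the functional Currency by multiplication, and that amounts are rounded to two decimals; all of these are assumptions, as are the function names.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def transactional_amount(quantity, unit_price, price_unit="1"):
    """Amount in the transaction Currency.

    Assumption: unit_price is the price for `price_unit` units.
    """
    amount = Decimal(quantity) * Decimal(unit_price) / Decimal(price_unit)
    return amount.quantize(CENT, rounding=ROUND_HALF_UP)


def functional_amount(amount, exchange_rate):
    """Amount in the functional (GL) Currency.

    Assumption: exchange_rate = functional Currency units per one
    transaction Currency unit.
    """
    return (amount * Decimal(exchange_rate)).quantize(CENT, rounding=ROUND_HALF_UP)


# Hypothetical line: 250 pieces at 12.50 per 100 pieces, exchange rate 1.10.
trx = transactional_amount("250", "12.50", price_unit="100")
func = functional_amount(trx, "1.10")
print(trx, func)
```

Showing the inputs of the calculation in the report allows a user to recompute an amount and find out whether a difference comes from the price, the quantity or the rate.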


Dates

Typically, the Document Date (e.g. Invoice Date, Order Date) and the Document Creation Date should be included, together with the other Dates important for the business or for data analysis (e.g. Need By Date, GL Date, Value Date). In general, the Document Date or Document Creation Date, and for financial reports the GL Date, should be mandatory attributes because they can be used to segment (partition) a data set into time units (e.g. days, weeks, months, periods, years).
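A minimal sketch of such a segmentation, assuming hypothetical invoice rows with an Invoice Date and an amount, groups the amounts by calendar month; a GL Date or fiscal period could be used in the same way.

```python
from collections import defaultdict
from datetime import date
from decimal import Decimal

# Hypothetical invoice rows; field names and values are illustrative only.
invoices = [
    {"invoice_number": "INV-1", "invoice_date": date(2021, 1, 15), "amount": Decimal("100.00")},
    {"invoice_number": "INV-2", "invoice_date": date(2021, 1, 31), "amount": Decimal("50.00")},
    {"invoice_number": "INV-3", "invoice_date": date(2021, 2, 1), "amount": Decimal("75.50")},
]


def total_by_month(rows, date_field="invoice_date", amount_field="amount"):
    """Sum the amounts per (year, month) of the chosen date attribute."""
    totals = defaultdict(Decimal)  # Decimal() is 0, so missing months start at zero
    for row in rows:
        d = row[date_field]
        totals[(d.year, d.month)] += row[amount_field]
    return dict(totals)


monthly = total_by_month(invoices)
print(monthly)
```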


Statuses

The various record statuses and document statuses should also be mandatory attributes in reports. Record statuses show whether a record is active, was cancelled or was marked as deleted, while document statuses show a document’s processing status, often being associated with a workflow (e.g. approval or processing workflows). The record statuses could be synchronized and even merged with the document statuses.

Whether expressed as flags or as lists of values, statuses are essential in delimiting the data set that needs to be considered for further calculations, because unapproved documents or cancelled records often have low or no relevance for the business. Unapproved documents are typically not considered in the various calculations until they are approved, while cancelled records are associated with mistakes or with a need that no longer exists. Not being able to identify the active records can mess things up pretty badly because, for example, some reports show only the active records, while others show all the data available in a system. Therefore, showing the statuses in reports can be important in reconciling the differences between reports, especially when dealing with calculations.
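The effect can be sketched with a few hypothetical rows: one total considers all the records, the other only the active records of approved documents, and the statuses are what explains the gap between the two. The status values and field names are assumptions for illustration.

```python
# Hypothetical PO rows; status values and field names are illustrative only.
rows = [
    {"po_number": "PO-1", "record_status": "ACTIVE", "document_status": "APPROVED", "amount": 100},
    {"po_number": "PO-2", "record_status": "CANCELLED", "document_status": "APPROVED", "amount": 40},
    {"po_number": "PO-3", "record_status": "ACTIVE", "document_status": "IN PROCESS", "amount": 25},
]


def total_amount(rows, record_statuses=None, document_statuses=None):
    """Sum the amounts of the rows whose statuses are in the given sets.

    A filter left as None means 'no restriction on that status'.
    """
    total = 0
    for row in rows:
        if record_statuses is not None and row["record_status"] not in record_statuses:
            continue
        if document_statuses is not None and row["document_status"] not in document_statuses:
            continue
        total += row["amount"]
    return total


total_all = total_amount(rows)
total_relevant = total_amount(rows, {"ACTIVE"}, {"APPROVED"})
print(total_all, total_relevant, total_all - total_relevant)
```

With the statuses available as attributes, a user comparing two such reports can reproduce either total instead of guessing which records were left out.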

It’s advisable to also be able to see the cancelled records, for example in order to analyze the amount of waste expressed as extra work, or to identify the records that were cancelled by mistake.

In reports with multiple levels of detail, it can be useful to show the statuses from all levels, as statuses might not be in sync or might have different meanings. In theory, when the statuses are in sync, and especially when considering cancellations, it should be enough to consider the status from the lowest level of detail of each logical entity (e.g. PO Shipment Status when considering POs, Invoice Line Status when considering Invoices, both statuses when considering POs together with Invoices), though reality can prove to be a tough world for statuses, as programming errors and other business scenarios need to be considered.

Action Owners

Include Requestors, Document Preparers, Buyers, Managers or any other type of action owners, so a user can track the direct or indirect issues back to them.

Such attributes can be used as a basis to calculate/reflect action owners’ performance, a fact that can infringe country or organization regulations, so you need to check whether there are any constraints in this direction and which set of attributes might be impacted. For example, it might be no problem to show the Buyer, though it might be a problem to show who created/modified the record. If performance still needs to be calculated at action owner level, substitute any attribute that can be used to identify a person with a random value; however, if the mapping between the action owner and the value used as substitute is known (in case unique identifiers are used) or easy to get (by checking records in the system), the data might still be misused.
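A minimal sketch of such a substitution is shown below, assuming hypothetical rows with a buyer attribute. Each distinct action owner receives a random token, so counts per owner remain possible without showing the names. The function returns the mapping only so the example can be inspected; as noted above, whoever can access that mapping can reverse the substitution, so in practice it would need to be protected or discarded.

```python
import secrets


def pseudonymize(rows, field, prefix="OWNER-"):
    """Replace the values of `field` with random tokens, one per distinct value.

    Returns the masked rows and the mapping used (to be kept secret or dropped).
    """
    mapping = {}
    used_tokens = set()
    masked_rows = []
    for row in rows:
        owner = row[field]
        if owner not in mapping:
            token = prefix + secrets.token_hex(4)
            while token in used_tokens:  # regenerate in the unlikely case of a collision
                token = prefix + secrets.token_hex(4)
            used_tokens.add(token)
            mapping[owner] = token
        masked_rows.append({**row, field: mapping[owner]})
    return masked_rows, mapping


# Hypothetical report rows; names are invented for illustration.
rows = [
    {"po_number": "PO-1", "buyer": "J. Smith"},
    {"po_number": "PO-2", "buyer": "A. Jones"},
    {"po_number": "PO-3", "buyer": "J. Smith"},
]

masked, mapping = pseudonymize(rows, "buyer")
print(masked)
```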

03 March 2009

🛢DBMS: Attribute (Definitions)

"A qualifier of an entity or a relation describing its character quantity, quality, degree, or extent. In database design, tables represent entities and columns represent attributes of those entities. For example, the title column represents an attribute of the entity titles." (Microsoft Corporation, "SQL Server 7.0 System Administration Training Kit", 1999)

"A column (field) in a dimension table." (Ralph Kimball & Margy Ross, "The Data Warehouse Toolkit" 2nd Ed., 2002)

"An attribute is the lowest level of information relating to any entity. It models a specific piece of information or a property of a specific entity. Dimensional modeling has a more restrictive definition; it refers to information that describes the characteristics of a dimension." (Claudia Imhoff et al, "Mastering Data Warehouse Design", 2003)

"A data item that has been 'attached' to an entity. By doing this, a distinction can be made between the generic characteristics of the data item itself (for instance, data type and default documentation) and the entity-specific characteristics (for example, identifying and entity-specific documentation). It’s a distinct characteristic of an entity for which data is maintained. An attribute is a value that describes or identifies an entity, and an entity contains one or more attributes that characterize the entity as a whole. An entity example is Employee, and an attribute example is Employee Last Name." (Sharon Allen & Evan Terry, "Beginning Relational Data Modeling" 2nd Ed., 2005)

"A property that can assume values for entities or relationships. Entities can be assigned several attributes (for example, a tuple in a relationship consists of values). Some systems also allow relationships to have attributes as well." (William H Inmon, "Building the Data Warehouse", 2005)

"Information about a specific dimension member." (Reed Jacobsen & Stacia Misner, "Microsoft SQL Server 2005 Analysis Services Step by Step", 2006)

"The equivalent of a relational database field, used more often to describe a similar low-level structure in object structures." (Gavin Powell, "Beginning Database Design", 2006)

"The differing data items within a relation. An attribute is a named column of a relation." (S. Sumathi & S. Esakkirajan, "Fundamentals of Relational Database Management Systems", 2007)

"The formal database term for column." (Rod Stephens, "Beginning Database Design Solutions", 2008)

"Individual data element that is represented and stored in a dimension. Each attribute contains data relating to that dimension." (Laura Reeves, "A Manager's Guide to Data Warehousing", 2009)

"A primitive data element that provides descriptive detail about an entity; a data field or data item in a record. For example, lastname would be an attribute for the entity customer. Attributes may also be used as descriptive elements for certain relationships among entities." (Toby J Teorey, "Database Modeling and Design 4th Ed", 2010)

"A characteristic of an entity or object. An attribute has a name and a data type." (Carlos Coronel et al, "Database Systems: Design, Implementation, and Management" 9th Ed., 2011)

"Characteristic describing an entity. Also known as a field." (Linda Volonino & Efraim Turban, "Information Technology for Management 8th Ed", 2011)

"A single characteristic or additional piece of information (financial or non-financial) that exists in a database." (Microsoft, "SQL Server 2012 Glossary", 2012)

"An inherent fact, property, or characteristic describing an entity. Every attribute does one of three things: describes, identifies, or relates." (Craig S Mullins, "Database Administration", 2012)

"In modeling, an attribute represents a characteristic of an entity. Because of this use, attribute is sometimes understood as a data element (which is a component piece of a data used to represent an entity), or a field (which is part of a system used to display or intake data), or a column (which is a place in a table to store a defined characteristic of a represented entity, that is, to store values associated with data elements)." (Laura Sebastian-Coleman, "Measuring Data Quality for Ongoing Improvement ", 2012)

"A data element that describes an entity or a relationship. Each attribute applies to every occurrence of its entity or relationship." (James Robertson et al, "Complete Systems Analysis: The Workbook, the Textbook, the Answers", 2013)

"In the context of information, a descriptor that is not usually associated with a numerical value. Some examples are bad, excellent, red, green, tall, small, wide, far, heavy, fast, portrait, and scenic." (Kenneth A Shaw, "Integrated Management of Processes and Information", 2013)

"A value of data that is distinguishable from other values" (Daniel Linstedt & W H Inmon, "Data Architecture: A Primer for the Data Scientist", 2014)

"The property or characteristic of an object that can be distinguished quantitatively or qualitatively by human or automated means." (David Sutton, "Information Risk Management: A practitioner’s guide", 2014)

"Characteristics of an object we capture in a catalog or model for data management purposes. Example: last name is an attribute of a person." (Gregory Lampshire, "The Data and Analytics Playbook", 2016)

03 April 2006

♯OOP: Attribute (Definitions)

"Additional characteristics or information defined for an entity." (Owen Williams, "MCSE TestPrep: SQL Server 6.5 Design and Implementation", 1998)

"A named characteristic or property of a class." (Craig Larman, "Applying UML and Patterns", 2004)

"A characteristic, quality, or property of an entity class. For example, the properties 'First Name' and 'Last Name' are attributes of entity class 'Person'." (Danette McGilvray, "Executing Data Quality Projects", 2008)

"Another name for a field, used by convention in many object-oriented programming languages. Scala follows Java’s convention of preferring the term field over attribute." (Dean Wampler & Alex Payne, "Programming Scala", 2009)

"1. (UML diagram) A descriptor of a kind of information captured about an object class. 2. (Relational theory) The definition of a descriptor of a relation." (David C Hay, "Data Model Patterns: A Metadata Map", 2010)

"A fact type element (specifically a characteristic assignment) that is a descriptor of an entity class." (David C Hay, "Data Model Patterns: A Metadata Map", 2010)

"A characteristic of an object." (Requirements Engineering Qualifications Board, "Standard glossary of terms used in Requirements Engineering", 2011)

"An inherent characteristic, an accidental quality, an object closely associated with or belonging to a specific person, place, or office; a word ascribing a quality." (DAMA International, "The DAMA Dictionary of Data Management", 2011)


About Me

Koeln, NRW, Germany
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.