
17 September 2024

#️⃣Software Engineering: Mea Culpa (Part V: All-Knowing Developers are Back in Demand?)

Software Engineering Series

I’ve been reading many job descriptions lately related to my experience, and curiously or not I observed that many organizations look for developers with Microsoft Dynamics experience in the CRM, respectively Finance and Operations (F&O) and Business Central (BC) areas. It’s a good sign that the adoption of Microsoft solutions for CRM and ERP is increasing, especially when one considers the progress made in the BI and AI areas with the introduction of Microsoft Fabric, which gives Microsoft a considerable boost. Conversely, it seems that the "developers are good for everything" syntagma is back, at least judging by what one reads in job descriptions.

Of course, it’s useful to have an in-house developer who can address all the aspects of an implementation, though that’s a lot to ask considering the different non-programming areas that need to be addressed. It’s true that an experienced developer can handle Requirements, Data and Process Management, respectively Data Migrations and Business Intelligence topics, though each of these topics can easily become a full-time job before, during and after project implementations. I’ve been there and I (hopefully) know what the jobs imply. Even if an experienced programmer can easily handle the different aspects, there will also be times when all the topics combined are too much for one person!

It's not a novelty that job descriptions are treated like Christmas lists, but it’s difficult to differentiate between essential and nonessential skills. I’ve read many job descriptions lately in which, among a huge list of demands, one of the requirements is to program in the F&O framework - a sign that D365 programmers are in high demand. I worked for many years as a programmer and Software Engineer, respectively in the BI area, where SQL and non-SQL code is needed. Even if I can understand the code in F&O, does it make sense to learn now to program in X++ and the whole framework?

It's never too late to learn new tricks, respectively another programming language and/or framework. It even helps to provide better solutions in other areas, though frankly I would invest my time elsewhere; AI-related topics like AI prompting or Data Science seem more interesting in the long term, especially as they are already in demand!

There seems to be a tendency for Data Science professionals to do everything themselves, building their own solutions and ignoring the experience accumulated, respectively the data models built, in the BI and Data Analytics areas, as if the topics and data models were unrelated! It’s also true that AI modeling comes with its own data modeling requirements (e.g. translating non-numeric to numeric values), though I believe that common ground can be found!
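
To illustrate the point about translating non-numeric to numeric values, here is a minimal sketch in Python using pandas; the data and column names are hypothetical, and the one-hot encoding shown is just one of several possible techniques:

import pandas as pd

# Hypothetical customer data with a non-numeric (categorical) attribute
df = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "segment": ["Retail", "Wholesale", "Retail"],
})

# One-hot encoding: each category becomes a 0/1 indicator column,
# making the attribute consumable by numeric ML algorithms
encoded = pd.get_dummies(df, columns=["segment"], prefix="segment")
print(encoded)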

Similarly, notebook-based programming seems to replicate logic in each solution, which occasionally makes sense, though personally I wouldn’t recommend it as a practice! The other day, I was looking at code developed in Python to mimic the joining of tables, when a view with the same logic could be more easily (re)used, maintained and read, and would probably be more efficient, even if different engines are used. It will be interesting to see how this mix of spaghetti solutions evolves over time. There are already developers complaining about the number of objects created in the process by building logic for each layer of the medallion architecture! Even if it makes sense from architectural considerations, it will become a nightmare in time.
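
As a hypothetical sketch of the difference (table and column names invented for illustration), the pandas merge below re-implements in the notebook what a single database view could define once and reuse across solutions:

import pandas as pd

# Base tables pulled into the notebook (hypothetical data)
orders = pd.DataFrame({"order_id": [1, 2], "customer_id": [10, 20], "amount": [100.0, 250.0]})
customers = pd.DataFrame({"customer_id": [10, 20], "name": ["Alpha", "Beta"]})

# Joining logic re-implemented locally; every notebook doing this
# duplicates what could be centralized at the database level, e.g.:
#   CREATE VIEW vw_OrderDetails AS
#   SELECT o.order_id, o.amount, c.name
#   FROM Orders o LEFT JOIN Customers c ON c.customer_id = o.customer_id;
enriched = orders.merge(customers, on="customer_id", how="left")
print(enriched)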

One can also wonder about the nomenclature used – Data Engineering or Prompt Engineering for the simple manipulation of data between structures in data transformations, respectively for structuring the prompts for AI. I believe that engineering involves more than this, no matter the context!

Previous Post <<||>> Next Post

22 August 2024

🧭Business Intelligence: Perspectives (Part XV: From Data to Storytelling III)

Business Intelligence Series

As children we heard or later read many stories, and even if few remained imprinted in memory, we can still recognize some of the metaphors and ideas used. Stories prepared us for life, and one can suppose that the business stories we hear nowadays have a similar intent, charge and impact. However, if we dig deeper into each story and dissect it, we may be disappointed by its simplicity and its resemblance to other stories we've heard over time. Moreover, stories can also carry negative connotations that can impact any other story we hear.

From the scores or hundreds of distinct stories that have been told, few reach a magnitude beyond the stories themselves, few become a catalyst for the auditorium, and even then they tend to manipulate. Conversely, well-written transformative stories can move mountains when they resonate with the auditorium. In a leader’s motivational speech such stories can become a catalyst that moves people in the intended direction.

Children’s stories are quite simple and apparently don’t need special constructs, even if the choice of words, structure and messages is important. Moving further into organizations, storytelling becomes more complex; depending on the case, structures and messages need to follow certain conventions within politically correct scripts. Facts become important to the degree they serve the story, though the purposes they serve change with time, becoming secondary to the story. Storytelling thus becomes just a way of changing the facts as seems fit to the storyteller.

Storytelling has its role in organizations in channeling the multitude of messages across various structures. However, the more one hears the word storytelling, the more likely one is closer to fiction than to business decision-making. It's also true that the word itself carries a power we all tasted during childhood, and why not much later. The word has a magic that appeals to our memories, our feelings, our expectations. However, as soon as one's expectations are not met, the fight with the chimeras turns into a battle of our own. Yes, storytelling has great power when used right, when there's a story to tell, when the business narratives are worth telling.

The problem with stories is that no matter how much they are based on real facts or happenings, they become fictitious in time, to the degree that they lose some of the most important facts they were based on. That’s especially valid when there’s no written track of the story, though even then various versions of the story can multiply outside the standard channels and boundaries.

Even if the author tried to keep the story as close to the facts as possible, the way stories are understood, remembered and retold depends on too many factors - the words used, the degree to which metaphors and similar elements are understood, remembered and transmitted correctly, the language used, the mental structures existing in the auditorium, the associations of words, ideas or metaphors, etc.

Unfortunately, the effect of stories can be negative too, especially when stories are designed to manipulate the auditorium beyond any ethical norms. When they don’t resonate with the crowd or are repeated unnecessarily, the narratives may have adverse effects; the messages can get lost in the crowd or create resistance. Moreover, stories may have manifold and opposite effects within different segments of the auditorium.

Storytelling can make hearts and minds resonate with the carried messages, though misdirected, improper or poorly conceived stories also have the power to destroy all that has been built over the years. Between the two extremes there’s a small space in which to send the messages across!

07 August 2024

🧭Business Intelligence: Perspectives (Part XII: From Data to Data Models)

Business Intelligence Series

A data model can be defined as an abstract, self-contained, logical definition of the data structures available in a database or similar repository. It’s typically an abstraction of the data structures underpinning a set of processes, procedures and business logic used for a predefined purpose. A data model can also be formed of unrelated micromodels, depicting various aspects of a business.

The association between data and data models is bidirectional. Given a set of data, a data model can be built to underpin the respective data. Conversely, one can create or generate data based on a data model. In business setups, a bidirectional relationship between data and the data model(s) underpinning them is the more realistic picture, as the business evolves. In extremis, the data model can be used to reflect a business’ needs, at least when the respective needs are addressed accordingly by extending the data model(s).

Given a set of data (e.g. the data stored in one or more spreadsheets or other types of files), in theory multiple data models can be defined to reflect the respective data. Within a data model, the fields (aka attributes) are partitioned into a set of data entities, where a data entity is a nonunique grouping of attributes that attempt to define together one unitary aspect of the world. Customers, Vendors, Products, Invoices or Sales Orders are examples of such data entities, though entities can have a broader granularity (e.g. Customers can be modeled over several tables like Entity, Addresses, Contact Information, etc.).

From an operational database’s perspective, a data entity is based on one or more tables, though several entities can share some of the tables. From a BI artifact’s perspective, an entity should be easy to create from the underlying tables, with a minimal set of transformations. Ideally, the BI data model should be as close as possible to the entities needed for reporting, however an optimal solution usually lies somewhere in between. In this resides the complexity of modeling BI solutions - providing an optimal data model which can be easily built on the source tables, and which allows addressing all or at least most of the BI requirements.
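
As a small sketch of such an entity (using SQLite in Python purely for illustration; the tables and names are hypothetical), the Customers entity below spans two operational tables and is exposed through a view with a minimal set of transformations:

import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Customer (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE CustomerAddress (customer_id INTEGER, city TEXT, country TEXT);
INSERT INTO Customer VALUES (1, 'Alpha'), (2, 'Beta');
INSERT INTO CustomerAddress VALUES (1, 'Koeln', 'DE'), (2, 'Paris', 'FR');

-- The entity: underlying tables plus a minimal set of transformations
CREATE VIEW CustomerEntity AS
SELECT c.customer_id, c.name, a.city, a.country
FROM Customer c
LEFT JOIN CustomerAddress a ON a.customer_id = c.customer_id;
""")
for row in con.execute("SELECT * FROM CustomerEntity"):
    print(row)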

In other words, we deal with two optimization problems over two distinct data models. On one side, the business data model must be flexible enough to provide fast read/write operations while keeping the referential data’s granularity efficient. Conversely, a BI data model needs to abstract these entities and provide a fast way of processing the data, making data reads extremely efficient. These perspectives must apply when we move to Microsoft Fabric too.

The operational data layer must provide this abstraction, and in this resides the complexity of building optimal BI solutions. This is the layer at which the modeling problems need to be tackled. The challenge of BI and Analytics resides in finding an optimal data model that allows us to address most, or ideally all, of the BI requirements. Several overlapping layers of abstraction may be built in the process.

Looking at the data modeling techniques used in notebooks and other similar solutions, data modeling risks becoming a redundant practice prone to errors. Moreover, data models tend to be multilayered and based on certain perspectives into the processes they model. Providing reliable, flexible models involves finding the right view into the data for modeling aspects of the business. Database views allow us to easily model such perspectives, often in a unique way. Moving away from them just shifts the burden onto the multiple solutions built around the base data, which can create other important challenges.

Previous Post <<||>>  Next Post

06 August 2024

🧭Business Intelligence: Perspectives (Part XVI: On the Cusps of Complexity)

Business Intelligence Series

We live in a complex world, which makes it difficult to model and work with the complex models that attempt to represent it. Thus, we try to simplify it to the degree that it becomes processable and understandable for us, while further simplification is needed when we try to depict it by digital means, making it processable by machines as well as by us. Whenever we simplify something, we lose some aspects, which might be acceptable in many cases, but can create issues in a broader number of ways.

Each layer of simplification results in a model that addresses some parts of the problem while ignoring others, which can restrict the model’s usability to the degree that it becomes unusable. The more one moves toward the extremes of oversimplification or complexification, the higher the chances for models to become unusable.

This aspect is relevant also to the business processes we deal with. Many processes are oversimplified to the degree that we track only the entry and exit points, respectively the quantitative aspects we are interested in. In theory this information should be enough for answering some business questions, though it might be insufficient when one dives deeper into processes. One can try to approximate, however there are high chances that such approximations deviate too much from the values approximated, which can lead to strange outcomes.

Therefore, when a date or other value is important, organizations consider adding more fields to reflect the implemented process with higher accuracy. Unfortunately, unless we save a history of all the important changes in the data, it becomes challenging to derive the snapshots we need for our analyses. Moreover, it is even more challenging to obtain consistent snapshots. There are systems which attempt to obtain such snapshots through the implementation of the processes, though this approach also involves some complexity and other challenges.
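
A common way to address this, sketched below under hypothetical names and using SQLite in Python purely for illustration, is to keep validity intervals for the important changes so that a consistent snapshot can be derived for any date:

import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE PriceHistory (
    product_id INTEGER,
    price REAL,
    valid_from TEXT,
    valid_to TEXT   -- '9999-12-31' marks the currently valid row
);
INSERT INTO PriceHistory VALUES
 (1, 9.99,  '2024-01-01', '2024-06-30'),
 (1, 11.49, '2024-07-01', '9999-12-31');
""")

# Snapshot as of a given date: the row whose validity interval covers it
as_of = "2024-05-15"
row = con.execute(
    "SELECT product_id, price FROM PriceHistory WHERE ? BETWEEN valid_from AND valid_to",
    (as_of,),
).fetchone()
print(row)  # (1, 9.99)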

Looking at the way business processes are implemented (see ERP, CRM and other similar systems), the systems track the creation, modification and a few other dates, which allow only limited perspectives. These fields typically provide the perspectives we need for data analysis. For many processes, it would be interesting to track other events and maybe the other values taken in between.

There is theoretical potential in tracking more detailed data, but also a complexity that’s difficult to transpose into useful information about the processes themselves. Despite tracking more data and the effort involved in such activities, processes can still behave like black boxes, especially when we have little or no information about the processes implemented in Information Systems.

There’s another important aspect - even if systems provide similar implementations of similar processes, the behavior of users can make an important difference. The best example is people entering the relevant data only when a process closes, ignoring the steps happening in between (dates, price or quantity changes).

There is a lot of data/information not tracked by such systems, especially in what concerns users’ behavior. It’s true that such behavior can be tracked to some degree, though that happens only when data are physically modified. One can suppose that there are many activities happening outside of the system.

The data gathered capture only the projection of certain events, which might not represent the processes or users’ behavior accurately and completely. We have the illusion of transparency, though we work with black boxes. A lot of effort can happen outside of these borders.

Fortunately, we can handle oversimplified processes and data maintenance, though one can but wonder how many important things lie beyond the oversimplifications we work with, respectively what we miss in the process.

Previous Post <<||>>  Next Post

10 April 2024

🧭Business Intelligence: Perspectives (Part XI: Ways of Thinking about Data)

Business Intelligence Series

One can sometimes observe the tendency of data professionals to move from a business problem directly to data and data modeling, without trying to understand the processes behind the data. One could say that this behavior is driven by the eagerness to explore the data, though even later questions about the processes themselves are seldom raised. One can argue that maybe the processes are self-explanatory, though that’s seldom the case.

Conversely, looking at the datasets available on the web, usually there’s a fact table and the associated dimensions, the data describing only one process. It’s natural to presume that there are data professionals who don’t think much about, or better said in terms of, processes. A similar big jump can be observed in blog posts on dashboards and/or reports, with bloggers moving from the data directly to the data model.

In the world of complex systems like Enterprise Resource Planning (ERP) systems, thinking in terms of processes is mandatory, because a fact table can hold the data for different processes, while processes can span multiple fact-like tables and thus have multiple levels of detail. Moreover, processes are broken down into sub-processes and procedures that have a counterpart in the data as well.
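
A minimal, hypothetical illustration of the first point (names invented): one fact-like table carrying documents from several processes, each process being just a slice of the table:

import pandas as pd

# One fact-like table holding documents from different processes
facts = pd.DataFrame({
    "document_type": ["SalesOrder", "Invoice", "SalesOrder", "CreditNote"],
    "amount": [100.0, 100.0, 250.0, -50.0],
})

# Each process is a slice of the same table; aggregating without
# thinking in terms of processes mixes their semantics
print(facts.groupby("document_type")["amount"].sum())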

Moreover, within a process there can be multiple perspectives, which are usually module or role dependent. A perspective is a role’s orientation toward the world to which the data belong, and it’s slightly different from what the data professional considers a view: the perspective is a projection over a set of processes within the data, while a view is a projection of the perspectives into the data structure.

For example, in the order-to-cash process there are several sub-processes like order fulfillment, invoicing, and payment collection, though several other processes can be involved, like credit management or production and manufacturing. Creating, updating or canceling an order are examples of procedures.

The sales representative, the shop worker and the accountant will have different perspectives projected into the data, focusing on the projection of the data onto the modules they work with. Thinking in terms of modules is probably the easiest way to identify the boundaries of the perspectives, though the rules are occasionally more complex than this.

When defining and/or attempting to understand a problem, it’s important to understand which perspective needs to be considered. For example, the sales volume can be projected based on Sales orders, on invoiced Sales orders, respectively on the General ledger postings, and the three views can result in different numbers. Moreover, there are partitions within these perspectives based on business rules that determine what to include or exclude from the logic.
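
The sketch below (hypothetical data and flags) shows how the same question - the sales volume - answered from three perspectives can yield three different numbers:

import pandas as pd

orders = pd.DataFrame({
    "order_id": [1, 2, 3],
    "amount": [100.0, 250.0, 80.0],
    "invoiced": [True, True, False],
    "posted_to_gl": [True, False, False],
})

print(orders["amount"].sum())                              # all Sales orders: 430.0
print(orders.loc[orders["invoiced"], "amount"].sum())      # invoiced orders only: 350.0
print(orders.loc[orders["posted_to_gl"], "amount"].sum())  # General ledger postings: 100.0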

One can define a business rule as conditional logic that constrains some part of the data in the data structures by specifying what is allowed or not, though usually we refer to a special type, the selection business rule, which determines what data are selected (e.g. open Purchase orders, Products with Inventory, etc.). When building the data model we need to consider such business rules as well, though we might also need to check whether they are enforced.
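
Selection business rules can be made explicit as named, reusable predicates, as in this hypothetical sketch (column names invented):

import pandas as pd

def open_purchase_orders(df: pd.DataFrame) -> pd.DataFrame:
    # Selection rule: only Purchase orders still open
    return df[df["status"] == "Open"]

def products_with_inventory(df: pd.DataFrame) -> pd.DataFrame:
    # Selection rule: only Products with stock on hand
    return df[df["on_hand_qty"] > 0]

purchase_orders = pd.DataFrame({"po_id": [1, 2], "status": ["Open", "Closed"]})
products = pd.DataFrame({"product_id": [10, 11], "on_hand_qty": [5, 0]})
print(open_purchase_orders(purchase_orders))
print(products_with_inventory(products))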

Moreover, it’s useful to think also in terms of (data) entities and sub-entities, in which a data entity is an abstraction from the physical implementation of database tables. A data entity encapsulates (hides the internal details of) a business concept and/or perspective into an abstraction (a simplified representation) that makes development, integration, and data processing easier. In certain systems like Dynamics 365, it’s important to think at this level because data entities can simplify data modeling considerably.

Previous Post <<||>>  Next Post

27 February 2018

🔬Data Science: Data Modeling (Definitions)

"The task of developing a data model that represents the persistent data of some enterprise." (Keith Gordon, "Principles of Data Management", 2007)

"An analysis and design method, building data models to 
a) define and analyze data requirements,
b) design logical and physical data structures that support these requirements, and
c) define business and technical meta-data." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"The process of creating a data model by applying formal data model descriptions using data modeling techniques." (Christian Galinski & Helmut Beckmann, "Concepts for Enhancing Content Quality and eAccessibility: In General and in the Field of eProcurement", 2012)

"The process of creating the abstract representation of a subject so that it can be studied more cheaply (a scale model of an airplane in a wind tunnel), at a particular moment in time (weather forecasting), or manipulated, modified, and altered without disrupting the original (economic model)." (George Tillmann, "Usage-Driven Database Design: From Logical Data Modeling through Physical Schmea Definition", 2017)

"A method used to define and analyze the data requirements needed to support an entity’s business processes, defining the relationship between data elements and structures." (Solutions Review)

"A method used to define and analyze data requirements needed to support the business functions of an enterprise. These data requirements are recorded as a conceptual data model with associated data definitions. Data modeling defines the relationships between data elements and data structures. (Microstrategy)

"A method used to define and analyze data requirements needed to support the business functions of an enterprise. These data requirements are recorded as a conceptual data model with associated data definitions. Data modeling defines the relationships between data elements and structures." (Information Management)

"Refers to the process of defining, analyzing, and structuring data within data models." (Insight Software)

"Data modeling is a way of mapping out and visualizing all the different places that a software or application stores information, and how these sources of data will fit together and flow into one another." (Sisense) [source]

"Data modeling is the process of documenting a complex software system design as an easily understood diagram, using text and symbols to represent the way data needs to flow. The diagram can be used to ensure efficient use of data, as a blueprint for the construction of new software or for re-engineering a legacy application." (Techtarget) [source]

10 February 2018

🔬Data Science: Data Mining (Definitions)

"The non-trivial extraction of implicit, previously unknown, and potentially useful information from data" (Frawley et al., "Knowledge discovery in databases: An overview", 1991)

"Data mining is the efficient discovery of valuable, nonobvious information from a large collection of data." (Joseph P Bigus,"Data Mining with Neural Networks: Solving business problems from application development to decision support", 1996)

"Data mining is the process of examining large amounts of aggregated data. The objective of data mining is to either predict what may happen based on trends or patterns in the data or to discover interesting correlations in the data." (Microsoft Corporation, "Microsoft SQL Server 7.0 Data Warehouse Training Kit", 2000)

"A data-driven approach to analysis and prediction by applying sophisticated techniques and algorithms to discover knowledge." (Paulraj Ponniah, "Data Warehousing Fundamentals", 2001)

"A class of undirected queries, often against the most atomic data, that seek to find unexpected patterns in the data. The most valuable results from data mining are clustering, classifying, estimating, predicting, and finding things that occur together. There are many kinds of tools that play a role in data mining. The principal tools include decision trees, neural networks, memory- and cased-based reasoning tools, visualization tools, genetic algorithms, fuzzy logic, and classical statistics. Generally, data mining is a client of the data warehouse." (Ralph Kimball & Margy Ross, "The Data Warehouse Toolkit" 2nd Ed., 2002)

"The discovery of information hidden within data." (William A Giovinazzo, "Internet-Enabled Business Intelligence", 2002)

"the process of extracting valid, authentic, and actionable information from large databases." (Seth Paul et al. "Preparing and Mining Data with Microsoft SQL Server 2000 and Analysis", 2002)

"Advanced analysis or data mining is the analysis of detailed data to detect patterns, behaviors, and relationships in data that were previously only partially known or at times totally unknown." (Margaret Y Chu, "Blissful Data", 2004)

"Analysis of detail data to discover relationships, patterns, or associations between values." (Margaret Y Chu, "Blissful Data ", 2004)

"An information extraction activity whose goal is to discover hidden facts contained in databases. Using a combination of machine learning, statistical analysis, modeling techniques, and database technology, data mining finds patterns and subtle relationships in data and infers rules that allow the prediction of future results." (Sharon Allen & Evan Terry, "Beginning Relational Data Modeling" 2nd Ed., 2005)

"the process of analyzing large amounts of data in search of previously undiscovered business patterns." (William H Inmon, "Building the Data Warehouse", 2005)

"A type of advanced analysis used to determine certain patterns within data. Data mining is most often associated with predictive analysis based on historical detail, and the generation of models for further analysis and query." (Jill Dyché & Evan Levy, "Customer Data Integration", 2006)

"Refers to the process of identifying nontrivial facts, patterns and relationships from large databases. The databases have often been put together for a different purpose from the data mining exercise." (Glenn J Myatt, "Making Sense of Data: A Practical Guide to Exploratory Data Analysis and Data Mining", 2006)

"Data mining is the process of discovering implicit patterns in data stored in data warehouse and using those patterns for business advantage such as predicting future trends." (S. Sumathi & S. Esakkirajan, "Fundamentals of Relational Database Management Systems", 2007)

"Digging through data (usually in a data warehouse or data mart) to identify interesting patterns." (Rod Stephens, "Beginning Database Design Solutions", 2008)

"Intelligently analyzing data to extract hidden trends, patterns, and information. Commonly used by statisticians, data analysts and Management Information Systems communities." (Craig F Smith & H Peter Alesso, "Thinking on the Web: Berners-Lee, Gödel and Turing", 2008)

"The process of extracting valid, authentic, and actionable information from large databases." (Darril Gibson, "MCITP SQL Server 2005 Database Developer All-in-One Exam Guide", 2008)

"The process of retrieving relevant data to make intelligent decisions." (Robert D Schneider & Darril Gibson, "Microsoft SQL Server 2008 All-in-One Desk Reference For Dummies", 2008)

"A process that minimally has four stages: (1) data preparation that may involve 'data cleaning' and even 'data transformation', (2) initial exploration of the data, (3) model building or pattern identification, and (4) deployment, which means subjecting new data to the 'model' to predict outcomes of cases found in the new data." (Robert Nisbet et al, "Handbook of statistical analysis and data mining applications", 2009)

"Automatically searching large volumes of data for patterns or associations." (Mark Olive, "SHARE: A European Healthgrid Roadmap", 2009)

"The use of machine learning algorithms to find faint patterns of relationship between data elements in large, noisy, and messy data sets, which can lead to actions to increase benefit in some form (diagnosis, profit, detection, etc.)." (Robert Nisbet et al, "Handbook of statistical analysis and data mining applications", 2009)

"A data-driven approach to analysis and prediction by applying sophisticated techniques and algorithms to discover knowledge." (Paulraj Ponniah, "Data Warehousing Fundamentals for IT Professionals", 2010) 

"A way of extracting knowledge from a database by searching for correlations in the data and presenting promising hypotheses to the user for analysis and consideration." (Toby J Teorey, "Database Modeling and Design" 4th Ed., 2010)

"The process of using mathematical algorithms (usually implemented in computer software) to attempt to transform raw data into information that is not otherwise visible (for example, creating a query to forecast sales for the future based on sales from the past)." (Ken Withee, "Microsoft Business Intelligence For Dummies", 2010)

"A process that employs automated tools to analyze data in a data warehouse and other sources and to proactively identify possible relationships and anomalies." (Carlos Coronel et al, "Database Systems: Design, Implementation, and Management" 9th Ed., 2011)

"Process of analyzing data from different perspectives and summarizing it into useful information (e.g., information that can be used to increase revenue, cuts costs, or both)." (Linda Volonino & Efraim Turban, "Information Technology for Management" 8th Ed., 2011)

"The process of sifting through large amounts of data using pattern recognition, fuzzy logic, and other knowledge discovery statistical techniques to identify previously unknown, unsuspected, and potentially meaningful data content relationships and trends." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"Data mining, a branch of computer science, is the process of extracting patterns from large data sets by combining statistical analysis and artificial intelligence with database management. Data mining is seen as an increasingly important tool by modern business to transform data into business intelligence giving an informational advantage." (T T Wong & Loretta K W Sze, "A Neuro-Fuzzy Partner Selection System for Business Social Networks", 2012)

"Field of analytics with structured data. The model inference process minimally has four stages: data preparation, involving data cleaning, transformation and selection; initial exploration of the data; model building or pattern identification; and deployment, putting new data through the model to obtain their predicted outcomes." (Gary Miner et al, "Practical Text Mining and Statistical Analysis for Non-structured Text Data Applications", 2012)

"The process of identifying commercially useful patterns or relationships in databases or other computer repositories through the use of advanced statistical tools." (Microsoft, "SQL Server 2012 Glossary", 2012)

"The process of exploring and analyzing large amounts of data to find patterns." (Marcia Kaufman et al, "Big Data For Dummies", 2013)

"An umbrella term for analytic techniques that facilitate fast pattern discovery and model building, particularly with large datasets." (Meta S Brown, "Data Mining For Dummies", 2014)

"Analysis of large quantities of data to find patterns such as groups of records, unusual records, and dependencies" (Daniel Linstedt & W H Inmon, "Data Architecture: A Primer for the Data Scientist", 2014)

"The practice of analyzing big data using mathematical models to develop insights, usually including machine learning algorithms as opposed to statistical methods."(Brenda L Dietrich et al, "Analytics Across the Enterprise", 2014)

"Data mining is the analysis of data for relationships that have not previously been discovered." (Piyush K Shukla & Madhuvan Dixit, "Big Data: An Emerging Field of Data Engineering", Handbook of Research on Security Considerations in Cloud Computing, 2015)

"A methodology used by organizations to better understand their customers, products, markets, or any other phase of the business." (Adam Gordon, "Official (ISC)2 Guide to the CISSP CBK" 4th Ed., 2015)

"Extracting information from a database to zero in on certain facts or summarize a large amount of data." (Faithe Wempen, "Computing Fundamentals: Introduction to Computers", 2015)

"It refers to the process of identifying and extracting patterns in large data sets based on artificial intelligence, machine learning, and statistical techniques." (Hamid R Arabnia et al, "Application of Big Data for National Security", 2015)

"The process of exploring and analyzing large amounts of data to find patterns." (Judith S Hurwitz, "Cognitive Computing and Big Data Analytics", 2015)

"Term used to describe analyzing large amounts of data to find patterns, correlations, and similarities." (Brittany Bullard, "Style and Statistics", 2016)

"The process of extracting meaningful knowledge from large volumes of data contained in data warehouses." (K  N Krishnaswamy et al, "Management Research Methodology: Integration of Principles, Methods and Techniques", 2016)

"A class of analytical applications that help users search for hidden patterns in a data set. Data mining is a process of analyzing large amounts of data to identify data–content relationships. Data mining is one tool used in decision support special studies. This process is also known as data surfing or knowledge discovery." (Daniel J Power & Ciara Heavin, "Decision Support, Analytics, and Business Intelligence" 3rd Ed., 2017)

"The process of collecting, searching through, and analyzing a large amount of data in a database to discover patterns or relationships." (Jonathan Ferrar et al, "The Power of People: Learn How Successful Organizations Use Workforce Analytics To Improve Business Performance", 2017)

"Data mining involves finding meaningful patterns and deriving insights from large data sets. It is closely related to analytics. Data mining uses statistics, machine learning, and artificial intelligence techniques to derive meaningful patterns." (Amar Sahay, "Business Analytics" Vol. I, 2018)

"The analysis of the data held in data warehouses in order to produce new and useful information." (Shon Harris & Fernando Maymi, "CISSP All-in-One Exam Guide" 8th Ed., 2018)

"The process of collecting critical business information from a data source, correlating the information, and uncovering associations, patterns, and trends." (Sybase, "Open Server Server-Library/C Reference Manual", 2019)

"The process of discovering patterns in large data sets involving methods at the intersection of machine learning, statistics, and database systems." (Dmitry Korzun et al, "Semantic Methods for Data Mining in Smart Spaces", 2019)

"A technique using software tools geared for the user who typically does not know exactly what he's searching for, but is looking for particular patterns or trends. Data mining is the process of sifting through large amounts of data to produce data content relationships. It can predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions. This is also known as data surfing." (Information Management)

"An analytical process that attempts to find correlations or patterns in large data sets for the purpose of data or knowledge discovery." (NIST SP 800-53)

"Extracting previously unknown information from databases and using that data for important business decisions, in many cases helping to create new insights." (Solutions Review)

"is the process of collecting data, aggregating it according to type and sorting through it to identify patterns and predict future trends." (Accenture)

"the process of analyzing large batches of data to find patterns and instances of statistical significance. By utilizing software to look for patterns in large batches of data, businesses can learn more about their customers and develop more effective strategies for acquisition, as well as increase sales and decrease overall costs." (Insight Software)

"The process of identifying commercially useful patterns or relationships in databases or other computer repositories through the use of advanced statistical tools." (Microsoft)

"The process of pulling actionable insight out of a set of data and putting it to good use. This includes everything from cleaning and organizing the data; to analyzing it to find meaningful patterns and connections; to communicating those connections in a way that helps decision-makers improve their product or organization." (KDnuggets)

"Data mining is the process of analyzing hidden patterns of data according to different perspectives for categorization into useful information, which is collected and assembled in common areas, such as data warehouses, for efficient analysis, data mining algorithms, facilitating business decision making and other information requirements to ultimately cut costs and increase revenue. Data mining is also known as data discovery and knowledge discovery." (Techopedia)

"Data mining is an automated analytical method that lets companies extract usable information from massive sets of raw data. Data mining combines several branches of computer science and analytics, relying on intelligent methods to uncover patterns and insights in large sets of information." (Sisense) [source]

"Data mining is the process of analyzing data from different sources and summarizing it into relevant information that can be used to help increase revenue and decrease costs. Its primary purpose is to find correlations or patterns among dozens of fields in large databases." (Logi Analytics) [source]

"Data mining is the process of analyzing massive volumes of data to discover business intelligence that helps companies solve problems, mitigate risks, and seize new opportunities." (Talend) [source]

"Data Mining is the process of collecting data, aggregating it according to type and sorting through it to identify patterns and predict future trends." (Accenture)

"Data mining is the process of discovering meaningful correlations, patterns and trends by sifting through large amounts of data stored in repositories. Data mining employs pattern recognition technologies, as well as statistical and mathematical techniques." (Gartner)

"Data mining is the process of extracting relevant patterns, deviations and relationships within large data sets to predict outcomes and glean insights. Through it, companies convert big data into actionable information, relying upon statistical analysis, machine learning and computer science." (snowflake) [source]

"Data mining is the work of analyzing business information in order to discover patterns and create predictive models that can validate new business insights. […] Unlike data analytics, in which discovery goals are often not known or well defined at the outset, data mining efforts are usually driven by a specific absence of information that can’t be satisfied through standard data queries or reports. Data mining yields information from which predictive models can be derived and then tested, leading to a greater understanding of the marketplace." (Informatica) [source]

01 February 2018

🔬Data Science: Data Analysis (Definitions)

"Obtaining information from measured or observed data." (Ildiko E  Frank & Roberto Todeschini, "The Data Analysis Handbook", 1994)

"Refers to the process of organizing, summarizing and visualizing data in order to draw conclusions and make decisions." (Glenn J Myatt, "Making Sense of Data: A Practical Guide to Exploratory Data Analysis and Data Mining", 2006)

"A combination of human activities and computer processes that answer a research question or confirm a research hypotheses. It answers the question from data files, using empirical methods such as correlation, t-test, content analysis, or Mill’s method of agreement." (Jens Mende, "Data Flow Diagram Use to Plan Empirical Research Projects", 2009)

"The study and presentation of data to create information and knowledge." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"Process of applying statistical techniques to evaluate data." (Sally-Anne Pitt, "Internal Audit Quality", 2014)

"Research phase in which data gathered from observing participants are analysed, usually with statistical procedures." (K  N Krishnaswamy et al, "Management Research Methodology: Integration of Principles, Methods and Techniques", 2016)

"Data analysis is the process of creating meaning from data. […] Data analysis is the process of creating information from data through the creation of data models and mathematics to find patterns." (Michael Heydt, "Learning Pandas" 2nd Ed, 2017)

"Data analysis is the process of organizing, cleaning, transforming, and modeling data to obtain useful information and ultimately, new knowledge." (John R. Hubbard, Java Data Analysis, 2017)

"Techniques used to organize, assess, and evaluate data and information." (Project Management Institute, "A Guide to the Project Management Body of Knowledge (PMBOK® Guide )", 2017)

"This is a class of statistical methods that make it possible to process a very large volume of data and identify the most interesting aspects of its structure. Some methods help to extract relations between different sets of data, and thus, draw statistical information that makes it possible to describe the most important information contained in the data in the most succinct manner possible. Other techniques make it possible to group data in order to identify its common denominators clearly, and thereby understand them better." (Soraya Sedkaoui, "Big Data Analytics for Entrepreneurial Success", 2019)

"The process and techniques for transforming and evaluating information using qualitative or quantitative tools to discover findings or inform conclusions." (Tiffany J Cresswell-Yeager & Raymond J Bandlow, "Transformation of the Dissertation: From an End-of-Program Destination to a Program-Embedded Process", 2020)

"Data Analysis is a process of gathering and extracting information from the data already present in different ways and order to study the pattern occurs." (Kirti R Bhatele, "Data Analysis on Global Stratification", 2020)

"A data lifecycle stage that involves the techniques that produce synthesized knowledge from organized information. A process of inspecting, cleaning, transforming, and modeling data with the goal of highlighting useful information suggesting conclusions, and supporting decision making. Data analysis has multiple facets and approaches, encompassing diverse techniques under a variety of names, in different business, science, and social science domains." (CODATA)

"is the process of inspecting, cleansing, transforming, and modeling data to discover useful information, and support decision-making. The many different types of data analysis include data mining, a predictive technique used for modeling and knowledge discovery, and business intelligence, which relies on aggregation and focuses on business information." (Accenture)

"This discipline is the little brother of data science. Data analysis is focused more on answering questions about the present and the past. It uses less complex statistics and generally tries to identify patterns that can improve an organization." (KDnuggets)

"Data Analysis is the process of inspecting, cleansing, transforming, and modeling data to discover useful information, and support decision-making. The many different types of data analysis include data mining, a predictive technique used for modeling and knowledge discovery, and business intelligence, which relies on aggregation and focuses on business information." (Accenture)


20 March 2017

⛏️Data Management: Data Structure (Definitions)

"A logical relationship among data elements that is designed to support specific data manipulation functions (trees, lists, and tables)." (William H Inmon, "Building the Data Warehouse", 2005)

"Data stored in a computer in a way that (usually) allows efficient retrieval of the data. Arrays and hashes are examples of data structures." (Michael Fitzgerald, "Learning Ruby", 2007)

"A data structure in computer science is a way of storing data to be used efficiently." (Sahar Shabanah, "Computer Games for Algorithm Learning", 2011)

"Data structure is a general term referring to how data is organized. In modeling, it refers more specifically to the model itself. Tables are referred to as 'structures'." (Laura Sebastian-Coleman, "Measuring Data Quality for Ongoing Improvement ", 2012)

[probabilistic data structure:] "A data structure which exploits randomness to boost its efficiency, for example skip lists and Bloom filters. In the case of Bloom filters, the results of certain operations may be incorrect with a small probability." (Wei-Chih Huang & William J Knottenbelt, "Low-Overhead Development of Scalable Resource-Efficient Software Systems", 2014)

"A collection of methods for storing and organizing sets of data in order to facilitate access to them. More formally data structures are concise implementations of abstract data types, where an abstract data type is a set of objects together with a collection of operations on the elements of the set." (Ioannis Kouris et al, "Indexing and Compressing Text", 2015)

"A representation of the logical relationship between elements of data." (Adam Gordon, "Official (ISC)2 Guide to the CISSP CBK" 4th Ed., 2015)

"Is a schematic organization of data and relationship to express a reality of interest, usually represented in a diagrammatic form." (Maria T Artese  Isabella Gagliardi, "UNESCO Intangible Cultural Heritage Management on the Web", 2015)

"The implementation of a composite data field in an abstract data type" (Nell Dale & John Lewis, "Computer Science Illuminated" 6th Ed., 2015)

"A way of organizing data so that it can be efficiently accessed and updated." (Vasileios Zois et al, "Querying of Time Series for Big Data Analytics", 2016)

"A particular way of storing information, allowing to a high level approach on the software implementation." (Katia Tannous & Fillipe de Souza Silva, "Particle Shape Analysis Using Digital Image Processing", 2018)

"It is a particular way of organizing data in a computer so that they can be used efficiently." (Edgar C Franco et al, "Implementation of an Intelligent Model Based on Machine Learning in the Application of Macro-Ergonomic Methods...", 2019)

"Way information is represented and stored." (Shalin Hai-Jew, "Methods for Analyzing and Leveraging Online Learning Data", 2019)

"A physical or logical relationship among a collection of data elements." (IEEE 610.5-1990)

15 February 2017

⛏️Data Management: Data Architecture (Definitions)

"Data Architecture is the design of data for use in defining the target state and the subsequent planning needed to hit the target state. Data architecture includes topics such as database design, information integration, metadata management, business semantics, data modeling, metadata workflow management, and archiving." (Martin Oberhofer et al, "Enterprise Master Data Management", 2008)

"Describes how data is organized and structured to support the development, maintenance, and use of the data by application systems. This includes guidelines and recommendations for historical retention of the data, and how the data is to be used and accessed." (Laura Reeves, "A Manager's Guide to Data Warehousing", 2009)

"the organized arrangement of components to optimize the function, performance, feasibility, cost, and/or aesthetics of an overall structure." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"The logical-data architecture describes the specific data elements held by the team in a platform-agnostic and business-friendly manner. It plots out the specific tables, fields, and relationships within the team’s data assets and is usually fully normalized to minimize redundancy and represents the highest level of design efficiency possible." (Evan Stubbs, "Delivering Business Analytics: Practical Guidelines for Best Practice", 2013)

"The physical-data architecture is the lowest level of detail in data architecture. It describes how the logical architecture is actually implemented within the data mart and describes elements by their technical (rather than business) names." (Evan Stubbs, "Delivering Business Analytics: Practical Guidelines for Best Practice", 2013)

"Defines a company-wide, uniform model of corporate data (the corporate data model). It also describes the architecture for the distribution and retention of data. This describes which data will be stored in which systems, which systems are single sources of truth for which data objects or attributes and the flow of data between the systems." (Boris Otto & Hubert Österle, "Corporate Data Quality", 2015)

"One of the layers of the enterprise architecture (EA) that focuses on the IT data architecture side, both for transactional and business intelligence IT data architecture." (David K Pham, "From Business Strategy to Information Technology Roadmap", 2016)

"The discipline, methods, and outputs related to understanding data, its meaning and relationships." (Gregory Lampshire et al, "The Data and Analytics Playbook", 2016)

"Models, policies, and guidelines that structure how data are collected, stored, used, managed, and integrated within an organization." (Jonathan Ferrar et al, "The Power of People", 2017)

"Data architecture is the structure that enables the storage, transformation, exploitation, and governance of data." (Pradeep Menon, "Data Lakehouse in Action", 2022)

"Data architecture is the process of designing and building complex data platforms. This involves taking a comprehensive view, which includes not only moving and storing data but also all aspects of the data platform. Building a well-designed data ecosystem can be transformative to a business." (Brian Lipp, "Modern Data Architectures with Python", 2023)

"A data architecture defines a high-level architectural approach and concept to follow, outlines a set of technologies to use, and states the flow of data that will be used to build your data solution to capture big data. [...] Data architecture refers to the overall design and organization of data within an information system." (James Serra, "Deciphering Data Architectures", 2024)

"Data architecture encompasses the rules, policies, models, and standards that govern data collection and how that data is then stored, managed, processed, and used within an organization’s databases and data systems." (snowflake) [source]

"Data architecture is the process by which an organization aligns its data environment with its operational goals." (Xplenty) [source]

09 June 2009

🛢DBMS: Data Modeling (Definitions)

"A method of representing a database using a logical and graphical view. Data modeling can be performed using something as simple as pencil and paper or as involved as sophisticated software. The purpose of data modeling is to bridge the gap between the actual business process and the physical database implementation. The output of data modeling is usually a graphical representation of the data structures." (Microsoft Corporation, "Microsoft SQL Server 7.0 Data Warehouse Training Kit", 2000)

"A process of defining the entities, attributes, and relationships between the entities in preparation for creating the physical database." (Bob Bryla, "Oracle Database Foundations", 2004)

"The activity wherein subject areas of data and relationships between them are depicted in a diagram." (Margaret Y Chu, "Blissful Data ", 2004)

"A structured approach used to identify major components of an information system’s specifications. Data modeling enables you to promote data as a corporate asset to share across the enterprise, provide business professionals with a graphical display of their business rules and requirements, bridge the gap between business experts and technical experts, establish consensus/agreement, and build a stable data foundation." (Sharon Allen & Evan Terry, "Beginning Relational Data Modeling 2nd Ed.", 2005)

[Evolutionary data modeling:] "A process in which you model the data aspects of a system iteratively and incrementally, to ensure that the database schema evolves in step with the application code." (Pramod J Sadalage & Scott W Ambler, "Refactoring Databases: Evolutionary Database Design", 2006)

[Evolutionary data modeling:] "Methodologies to iteratively and incrementally model database systems so that schema and applications evolve in a parallel way." (Vincenzo Deufemia et al, "Evolutionary Database: State of the Art and Issues", 2009)

[E-R Data Modeling:] "A popular data modeling technique used for representing business entities and the relationships among them." (Paulraj Ponniah, "Data Warehousing Fundamentals for IT Professionals", 2010)

"1.An analysis and design method, building data models to a) define and analyze data requirements, b) design logical and physical data structures that support these requirements, and c) define business and technical meta-data. 2.The act of creating a data model." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

[enterprise data modeling:] "The development of a common consistent view and understanding of data entities and attributes, and their relationships across the enterprise." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"Data modeling is the ability and process of specifying and constructing complex data structures that represent specific semantics. In SQL, this can be performed with the ANSI-92 LEFT outer join operation that can inherently define and process complex data structures." (Michael M David & Lee Fesperman, "Advanced SQL Dynamic Data Modeling and Hierarchical Processing", 2013)

"A model that is used to either logically or physically organize the data elements in a database, including the definition of the data elements and of the relationships among the data elements for a specific industry, such as banking." (Jim Davis & Aiman Zeid, "Business Transformation: A Roadmap for Maximizing Organizational Insights", 2014)

"Considers data independently of the way the data are processed and of the components that process the data. A process used to define and analyze data requirements needed to support the business processes." (Adam Gordon, "Official (ISC)2 Guide to the CISSP CBK" 4th Ed., 2015)

"The process of architecting data objects and structures as they relate to a business or other context." (Jason Williamson, "Getting a Big Data Job For Dummies", 2015)

"The process of identifying and representing the definition, usage, and/or storage of data." (George Tillmann, "Usage-Driven Database Design: From Logical Data Modeling through Physical Schmea Definition", 2017)

"With dimensional data modeling or denormalization, data is collapsed, combined, or grouped together. Within dimensional data modeling, the concepts of facts (measures) and dimensions (context) are used. If dimensions are collapsed into single structures, the data model is also often called a star schema. If the dimensions are not collapsed, the data model is called snowflake. The dimensional models are typically seen within data warehouse systems." (Piethein Strengholt, "Data Management at Scale", 2020)

"A method that is used to define and analyze the data requirements that are needed in order to support the business functions of an enterprise. These data requirements are recorded as a conceptual data model with associated data definitions. Data modeling defines the relationships between data elements and structures." (Genesys) 

"The analysis of data objects using data modelling techniques to create insights from the data." (Analytics Insight)

About Me

Koeln, NRW, Germany
IT Professional with more than 24 years of experience in IT in the areas of full life-cycle Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.