Showing posts with label domain. Show all posts

12 March 2024

🏭🗒️Microsoft Fabric: OneLake [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 12-Mar-2024

Microsoft Fabric & OneLake

[Microsoft Fabric] OneLake

  • a single, unified, logical data lake for the whole organization [2]
    • designed to be the single place for all an organization's analytics data [2]
    • provides a single, integrated environment for data professionals and the business to collaborate on data projects [1]
    • stores all data in a single open format [1]
    • its data is governed by default
    • combines storage locations across different regions and clouds into a single logical lake, without moving or duplicating data
      • similar to how Office applications are prewired to use OneDrive
      • saves time by eliminating the need to move and copy data 
  • comes automatically with every Microsoft Fabric tenant [2]
    • automatically provisions with no extra resources to set up or manage [2]
    • used as the native store without needing any extra configuration [1]
  • accessible by all analytics engines in the platform [1]
    • all the compute workloads in Fabric are preconfigured to work with OneLake
      • compute engines have their own security models (aka compute-specific security) 
        • always enforced when accessing data using that engine [3]
        • the conditions may not apply to users in certain Fabric roles when they access OneLake directly [3]
  • built on top of ADLS Gen2 [1]
    • supports the same ADLS Gen2 APIs and SDKs to be compatible with existing ADLS Gen2 applications [2]
    • inherits its hierarchical structure
    • provides a single-pane-of-glass file-system namespace that spans across users, regions and even clouds
  • data can be stored in any format
    • incl. Delta, Parquet, CSV, JSON
    • data can be addressed in OneLake as if it's one big ADLS storage account for the entire organization [2]
  • uses a layered security model built around the organizational structure of experiences within MF [3]
    • derived from Microsoft Entra authentication [3]
    • compatible with user identities, service principals, and managed identities [3]
    • using Microsoft Entra ID and Fabric components, one can build robust security mechanisms across OneLake that keep the data safe while reducing copies and minimizing complexity [3]
  • hierarchical in nature 
    • {benefit} simplifies management across the organization
    • its data is divided into manageable containers for easy handling
    • can have one or more capacities associated with it
      • different items consume different capacity at a certain time
      • offered through Fabric SKU and Trials
  • {component} OneCopy
    • allows data to be read from a single copy, without moving or duplicating it [1]
  • {concept} Fabric tenant
    • a dedicated space for organizations to create, store, and manage Fabric items.
      • there's often a single instance of Fabric for an organization, and it's aligned with Microsoft Entra ID [1]
        • ⇒ one OneLake per tenant
      • maps to the root of OneLake and is at the top level of the hierarchy [1]
    • can contain any number of workspaces [2]
  • {concept} capacity
    • a dedicated set of resources that is available at a given time to be used [1]
    • defines the ability of a resource to perform an activity or to produce output [1]
  • {concept} domain
    • a way of logically grouping together the workspaces in an organization that are relevant to a particular area or field [1]
    • can have multiple [subdomains]
      • {concept} subdomain
        • a way of fine-tuning the logical grouping of the data
  • {concept} workspace 
    • a collection of Fabric items that brings together different functionality in a single tenant [1]
      • different data items appear as folders within those containers [2]
      • always lives directly under the OneLake namespace [4]
      • {concept} data item
        • a subtype of item that allows data to be stored within it using OneLake [4]
        • all Fabric data items store their data automatically in OneLake in Delta Parquet format [2]
      • {concept} Fabric item
        • a set of capabilities bundled together into a single component [4] 
        • can have permissions configured separately from the workspace roles [3]
        • permissions can be set by sharing an item or by managing the permissions of an item [3]
    • acts as a container that leverages capacity for the work that is executed [1]
      • provides controls for who can access the items in it [1]
        • security can be managed through Fabric workspace roles
      • enables different parts of the organization to distribute ownership and access policies [2]
      • part of a capacity that is tied to a specific region and is billed separately [2]
      • the primary security boundary for data within OneLake [3]
    • represents a single domain or project area where teams can collaborate on data [3]
  • [encryption] encrypted at rest by default using Microsoft-managed key [3]
    • the keys are rotated appropriately per compliance requirements [3]
    • data is encrypted and decrypted transparently using 256-bit AES encryption, one of the strongest block ciphers available, and it is FIPS 140-2 compliant [3]
    • {limitation} encryption at rest using customer-managed key is currently not supported [3]
  • {general guidance} write access
    • users must be part of a workspace role that grants write access [4] 
    • rule applies to all data items, so scope workspaces to a single team of data engineers [4] 
  • {general guidance} lake access
    • users must be part of the Admin, Member, or Contributor workspace roles, or share the item with ReadAll access [4] 
  • {general guidance} general data access 
    • any user with Viewer permissions can access data through the warehouses, semantic models, or the SQL analytics endpoint for the Lakehouse [4] 
  • {general guidance} object-level security
    • give users access to a warehouse or lakehouse SQL analytics endpoint through the Viewer role and use SQL DENY statements to restrict access to certain tables [4]
  • {feature|preview} trusted workspace access
    • allows firewall-enabled Storage accounts to be accessed securely by creating OneLake shortcuts to them and then using the shortcuts in Fabric items [5]
    • based on [workspace identity]
    • {benefit} provides secure seamless access to firewall-enabled Storage accounts from OneLake shortcuts in Fabric workspaces, without the need to open the Storage account to public access [5]
    • {limitation} available for workspaces in Fabric capacities F64 or higher
  • {concept} workspace identity
    • a unique identity that can be associated with workspaces that are in Fabric capacities
    • enables OneLake shortcuts in Fabric to access Storage accounts that have [resource instance rules] configured
    • {operation} creating a workspace identity
      • Fabric creates a service principal in Microsoft Entra ID to represent the identity [5]
  • {concept} resource instance rules
    • a way to grant access to specific resources based on the workspace identity or managed identity [5] 
    • {operation} create resource instance rules 
      • created by deploying an ARM template with the resource instance rule details [5]
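
The hierarchy described above (tenant → workspace → item) and the ADLS Gen2 compatibility show up directly in how OneLake data is addressed. The following is only a minimal sketch, assuming the documented OneLake URI pattern (workspace, then item name plus item type, then the path inside the item). The workspace, item and file names are hypothetical, and the helper functions are illustrative and not part of any SDK. Names containing spaces or special characters are not handled here.

```python
# Minimal sketch: building OneLake paths from the workspace/item hierarchy.
# Assumption: the OneLake endpoint and the <workspace>/<item>.<itemtype>/<path>
# layout; all workspace, item and file names below are hypothetical.

ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


def _item_path(item: str, item_type: str, path: str = "") -> str:
    """Join '<item>.<itemtype>' with the optional path inside the item."""
    segments = [f"{item}.{item_type}"]
    # Drop empty segments so leading/trailing slashes do not matter.
    segments += [s for s in path.split("/") if s]
    return "/".join(segments)


def onelake_https_url(workspace: str, item: str, item_type: str, path: str = "") -> str:
    """HTTPS form, as used by ADLS Gen2-compatible REST clients."""
    return f"https://{ONELAKE_HOST}/{workspace}/{_item_path(item, item_type, path)}"


def onelake_abfss_uri(workspace: str, item: str, item_type: str, path: str = "") -> str:
    """ABFSS form, as used by Spark and other ABFS-aware engines."""
    return f"abfss://{workspace}@{ONELAKE_HOST}/{_item_path(item, item_type, path)}"


if __name__ == "__main__":
    # Hypothetical workspace and lakehouse names, for illustration only.
    print(onelake_https_url("SalesWorkspace", "SalesLakehouse", "lakehouse", "Files/raw/orders.csv"))
    print(onelake_abfss_uri("SalesWorkspace", "SalesLakehouse", "lakehouse", "Tables/orders"))
```

In both forms the workspace takes the place an ADLS Gen2 container would have, which is why existing ADLS Gen2 tools can treat OneLake as one big storage account.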
https://sql-troubles.blogspot.com/2024/03/microsoft-fabric-medallion-architecture.html
Acronyms:
ADLS - Azure Data Lake Storage
AES - Advanced Encryption Standard 
ARM - Azure Resource Manager
FIPS - Federal Information Processing Standard
MF - Microsoft Fabric
SKU - Stock Keeping Unit

References:
[1] Microsoft Learn (2023) Administer Microsoft Fabric (link)
[2] Microsoft Learn (2023) OneLake, the OneDrive for data (link)
[3] Microsoft Learn (2023) OneLake security (link)
[4] Microsoft Learn (2023) Get started securing your data in OneLake (link)
[5] Microsoft Fabric Updates Blog (2024) Introducing Trusted Workspace Access for OneLake Shortcuts, by Meenal Srivastva (link)



13 December 2018

🔭Data Science: Bayesian Networks (Just the Quotes)

"The best way to convey to the experimenter what the data tell him about theta is to show him a picture of the posterior distribution." (George E P Box & George C Tiao, "Bayesian Inference in Statistical Analysis", 1973)

"In the design of experiments, one has to use some informal prior knowledge. How does one construct blocks in a block design problem for instance? It is stupid to think that use is not made of a prior. But knowing that this prior is utterly casual, it seems ludicrous to go through a lot of integration, etc., to obtain 'exact' posterior probabilities resulting from this prior. So, I believe the situation with respect to Bayesian inference and with respect to inference, in general, has not made progress. Well, Bayesian statistics has led to a great deal of theoretical research. But I don't see any real utilizations in applications, you know. Now no one, as far as I know, has examined the question of whether the inferences that are obtained are, in fact, realized in the predictions that they are used to make." (Oscar Kempthorne, "A conversation with Oscar Kempthorne", Statistical Science, 1995)

"Bayesian methods are complicated enough, that giving researchers user-friendly software could be like handing a loaded gun to a toddler; if the data is crap, you won't get anything out of it regardless of your political bent." (Brad Carlin, "Bayes offers a new way to make sense of numbers", Science, 1999)

"Bayesian inference is a controversial approach because it inherently embraces a subjective notion of probability. In general, Bayesian methods provide no guarantees on long run performance." (Larry A Wasserman, "All of Statistics: A concise course in statistical inference", 2004)

"Bayesian inference is appealing when prior information is available since Bayes’ theorem is a natural way to combine prior information with data. Some people find Bayesian inference psychologically appealing because it allows us to make probability statements about parameters. […] In parametric models, with large samples, Bayesian and frequentist methods give approximately the same inferences. In general, they need not agree." (Larry A Wasserman, "All of Statistics: A concise course in statistical inference", 2004)

"The Bayesian approach is based on the following postulates: (B1) Probability describes degree of belief, not limiting frequency. As such, we can make probability statements about lots of things, not just data which are subject to random variation. […] (B2) We can make probability statements about parameters, even though they are fixed constants. (B3) We make inferences about a parameter θ by producing a probability distribution for θ. Inferences, such as point estimates and interval estimates, may then be extracted from this distribution." (Larry A Wasserman, "All of Statistics: A concise course in statistical inference", 2004)

"The important thing is to understand that frequentist and Bayesian methods are answering different questions. To combine prior beliefs with data in a principled way, use Bayesian inference. To construct procedures with guaranteed long run performance, such as confidence intervals, use frequentist methods. Generally, Bayesian methods run into problems when the parameter space is high dimensional." (Larry A Wasserman, "All of Statistics: A concise course in statistical inference", 2004) 

"Bayesian networks can be constructed by hand or learned from data. Learning both the topology of a Bayesian network and the parameters in the CPTs in the network is a difficult computational task. One of the things that makes learning the structure of a Bayesian network so difficult is that it is possible to define several different Bayesian networks as representations for the same full joint probability distribution." (John D Kelleher et al, "Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, worked examples, and case studies", 2015) 

"Bayesian networks provide a more flexible representation for encoding the conditional independence assumptions between the features in a domain. Ideally, the topology of a network should reflect the causal relationships between the entities in a domain. Properly constructed Bayesian networks are relatively powerful models that can capture the interactions between descriptive features in determining a prediction." (John D Kelleher et al, "Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, worked examples, and case studies", 2015) 

"Bayesian networks use a graph-based representation to encode the structural relationships - such as direct influence and conditional independence - between subsets of features in a domain. Consequently, a Bayesian network representation is generally more compact than a full joint distribution (because it can encode conditional independence relationships), yet it is not forced to assert a global conditional independence between all descriptive features. As such, Bayesian network models are an intermediary between full joint distributions and naive Bayes models and offer a useful compromise between model compactness and predictive accuracy." (John D Kelleher et al, "Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, worked examples, and case studies", 2015)

"Bayesian networks inhabit a world where all questions are reducible to probabilities, or (in the terminology of this chapter) degrees of association between variables; they could not ascend to the second or third rungs of the Ladder of Causation. Fortunately, they required only two slight twists to climb to the top." (Judea Pearl & Dana Mackenzie, "The Book of Why: The new science of cause and effect", 2018)

"The main differences between Bayesian networks and causal diagrams lie in how they are constructed and the uses to which they are put. A Bayesian network is literally nothing more than a compact representation of a huge probability table. The arrows mean only that the probabilities of child nodes are related to the values of parent nodes by a certain formula (the conditional probability tables) and that this relation is sufficient. That is, knowing additional ancestors of the child will not change the formula. Likewise, a missing arrow between any two nodes means that they are independent, once we know the values of their parents. [...] If, however, the same diagram has been constructed as a causal diagram, then both the thinking that goes into the construction and the interpretation of the final diagram change." (Judea Pearl & Dana Mackenzie, "The Book of Why: The new science of cause and effect", 2018)

"The transparency of Bayesian networks distinguishes them from most other approaches to machine learning, which tend to produce inscrutable 'black boxes'. In a Bayesian network you can follow every step and understand how and why each piece of evidence changed the network’s beliefs." (Judea Pearl & Dana Mackenzie, "The Book of Why: The new science of cause and effect", 2018)

"With Bayesian networks, we had taught machines to think in shades of gray, and this was an important step toward humanlike thinking. But we still couldn’t teach machines to understand causes and effects. [...] By design, in a Bayesian network, information flows in both directions, causal and diagnostic: smoke increases the likelihood of fire, and fire increases the likelihood of smoke. In fact, a Bayesian network can’t even tell what the 'causal direction' is." (Judea Pearl & Dana Mackenzie, "The Book of Why: The new science of cause and effect", 2018)

08 March 2018

🔬Data Science: Semantic Network [SN] (Definitions)

"We define a semantic network as 'the collection of all the relationships that concepts have to other concepts, to percepts, to procedures, and to motor mechanisms' of the knowledge." (John F Sowa, "Conceptual Structures", 1984)

"A graph for knowledge representation where concepts are represented as nodes in a graph and the binary semantic relations between the concepts are represented by named and directed edges between the nodes. All semantic networks have a declarative graphical representation that can be used either to represent knowledge or to support automated systems for reasoning about knowledge." (László Kovács et al, "Ontology-Based Semantic Models for Databases", 2009)

"A graph structure useful to represent the knowledge of a domain. It is composed of a set of objects, the graph nodes, which represent the concepts of the domain, and relations among such objects, the graph arches, which represent the domain knowledge. The semantic networks are also a reasoning tool as it is possible to find relations among the concepts of a semantic network that do not have a direct relation among them. To this aim, it is enough 'to follow the arrows' of the network arches that exit from the considered nodes and find in which node the paths meet." (Mario Ceresa, "Clinical and Biomolecular Ontologies for E-Health", Handbook of Research on Distributed Medical Informatics and E-Health, 2009)

"A form of visualization consisting of vertices (concepts) and directed or undirected edges (relationships)." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"A term used in computer language processing and in RF and OWL to refer to concepts linked by relationships. Memory maps are an informal example of a semantic network." (Kate Taylor, "A Common Sense Approach to Interoperability", 2011)

"nodes, encapsulating data and information, are connected by edges which include information about how these nodes are related to one another." (Simon Boese et al, "Semantic Document Networks to Support Concept Retrieval", 2014)

"A knowledge representation technique that represents the relationships among objects" (Nell Dale & John Lewis, "Computer Science Illuminated" 6th Ed., 2015)

"A knowledge base that represents semantic relations between concepts. Formally, the underlying representation model is a directed graph consisting of nodes, which represent concepts, and links, which represent semantic relations between concepts, mapping or connecting semantic fields." (Dmitry Korzun et al, "Semantic Methods for Data Mining in Smart Spaces", 2019)

"A knowledge base that represents semantic relations between concepts in a network. The model of knowledge representation is based on a directed or undirected graph consisting of vertices, which represent concepts, and edges, which represent semantic relations between concepts, mapping or connecting semantic fields." (Svetlana E Yalovitsyna et al, "Smart Museum: Semantic Approach to Generation and Presenting Information of Museum Collections", 2020)

16 December 2013

🎓Knowledge Management: Domains (Just the Quotes)

"Great discoveries which give a new direction to currents of thoughts and research are not, as a rule, gained by the accumulation of vast quantities of figures and statistics. These are apt to stifle and asphyxiate and they usually follow rather than precede discovery. The great discoveries are due to the eruption of genius into a closely related field, and the transfer of the precious knowledge there found to his own domain." (Theobald Smith, Boston Medical and Surgical Journal Volume 172, 1915)

"Learning is any change in a system that produces a more or less permanent change in its capacity for adapting to its environment. Understanding systems, especially systems capable of understanding problems in new task domains, are learning systems." (Herbert A Simon, "The Sciences of the Artificial", 1968)

"A cognitive system is a system whose organization defines a domain of interactions in which it can act with relevance to the maintenance of itself, and the process of cognition is the actual (inductive) acting or behaving in this domain. Living systems are cognitive systems, and living as a process is a process of cognition. This statement is valid for all organisms, with and without a nervous system." (Humberto R Maturana, "Biology of Cognition", 1970)

"No theory ever agrees with all the facts in its domain, yet it is not always the theory that is to blame. Facts are constituted by older ideologies, and a clash between facts and theories may be proof of progress. It is also a first step in our attempt to find the principles implicit in familiar observational notions." (Paul K Feyerabend, "Against Method: Outline of an Anarchistic Theory of Knowledge", 1975)

"A cognitive map is a specific way of representing a person's assertions about some limited domain, such as a policy problem. It is designed to capture the structure of the person's causal assertions and to generate the consequences that follow front this structure. […]  a person might use his cognitive map to derive explanations of the past, make predictions for the future, and choose policies in the present." (Robert M Axelrod, "Structure of Decision: The cognitive maps of political elites", 1976)

"The thinking person goes over the same ground many times. He looks at it from varying points of view - his own, his arch-enemy’s, others’. He diagrams it, verbalizes it, formulates equations, constructs visual images of the whole problem, or of troublesome parts, or of what is clearly known. But he does not keep a detailed record of all this mental work, indeed could not. […] Deep understanding of a domain of knowledge requires knowing it in various ways. This multiplicity of perspectives grows slowly through hard work and sets the state for the re-cognition we experience as a new insight." (Howard E Gruber, "Darwin on Man", 1981)

"Metaphor [is] a pervasive mode of understanding by which we project patterns from one domain of experience in order to structure another domain of a different kind. So conceived metaphor is not merely a linguistic mode of expression; rather, it is one of the chief cognitive structures by which we are able to have coherent, ordered experiences that we can reason about and make sense of. Through metaphor, we make use of patterns that obtain in our physical experience to organise our more abstract understanding." (Mark Johnson, "The Body in the Mind", 1987)

"There is no coherent knowledge, i.e. no uniform comprehensive account of the world and the events in it. There is no comprehensive truth that goes beyond an enumeration of details, but there are many pieces of information, obtained in different ways from different sources and collected for the benefit of the curious. The best way of presenting such knowledge is the list - and the oldest scientific works were indeed lists of facts, parts, coincidences, problems in several specialized domains." (Paul K Feyerabend, "Farewell to Reason", 1987)

"[…] a mental model is a mapping from a domain into a mental representation which contains the main characteristics of the domain; a model can be ‘run’ to generate explanations and expectations with respect to potential states. Mental models have been proposed in particular as the kind of knowledge structures that people use to understand a specific domain […]" (Helmut Jungermann, Holger Schütz & Manfred Thuering, "Mental models in risk assessment: Informing people about drugs", Risk Analysis 8 (1), 1988)

"Algorithmic complexity theory and nonlinear dynamics together establish the fact that determinism reigns only over a quite finite domain; outside this small haven of order lies a largely uncharted, vast wasteland of chaos." (Joseph Ford, "Progress in Chaotic Dynamics: Essays in Honor of Joseph Ford's 60th Birthday", 1988)

"When partitioning a domain, we divide the information model so that the clusters remain intact. [...] Each section of the information model then becomes a separate subsystem. Note that when the information model is partitioned into subsystems, each object is assigned to exactly one subsystem."  (Stephen J Mellor, "Object-Oriented Systems Analysis: Modeling the World In Data", 1988) 

"While a small domain (consisting of fifty or fewer objects) can generally be analyzed as a unit, large domains must be partitioned to make the analysis a manageable task. To make such a partitioning, we take advantage of the fact that objects on an information model tend to fall into clusters: groups of objects that are interconnected with one another by many relationships. By contrast, relatively few relationships connect objects in different clusters." (Stephen J Mellor, "Object-Oriented Systems Analysis: Modeling the World In Data", 1988) 

"A law explains a set of observations; a theory explains a set of laws. […] a law applies to observed phenomena in one domain (e.g., planetary bodies and their movements), while a theory is intended to unify phenomena in many domains. […] Unlike laws, theories often postulate unobservable objects as part of their explanatory mechanism." (John L Casti, "Searching for Certainty: How Scientists Predict the Future", 1990)

"Generally speaking, problem knowledge for solving a given problem may consist of heuristic rules or formulas that comprise the explicit knowledge, and past-experience data that comprise the implicit, hidden knowledge. Knowledge represents links between the domain space and the solution space, the space of the independent variables and the space of the dependent variables." (Nikola K Kasabov, "Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering", 1996)

"Inference is the process of matching current facts from the domain space to the existing knowledge and inferring new facts. An inference process is a chain of matchings. The intermediate results obtained during the inference process are matched against the existing knowledge. The length of the chain is different. It depends on the knowledge base and on the inference method applied." (Nikola K Kasabov, "Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering", 1996)

"An individual understands a concept, skill, theory, or domain of knowledge to the extent that he or she can apply it appropriately in a new situation." (Howard Gardner, "The Disciplined Mind", 1999)

"Knowledge maps are node-link representations in which ideas are located in nodes and connected to other related ideas through a series of labeled links. They differ from other similar representations such as mind maps, concept maps, and graphic organizers in the deliberate use of a common set of labeled links that connect ideas. Some links are domain specific (e.g., function is very useful for some topic domains...) whereas other links (e.g., part) are more broadly used. Links have arrowheads to indicate the direction of the relationship between ideas." (Angela M. O’Donnell et al, "Knowledge Maps as Scaffolds for Cognitive Processing", Educational Psychology Review Vol. 14 (1), 2002) 

"We build models to increase productivity, under the justified assumption that it's cheaper to manipulate the model than the real thing. Models then enable cheaper exploration and reasoning about some universe of discourse. One important application of models is to understand a real, abstract, or hypothetical problem domain that a computer system will reflect. This is done by abstraction, classification, and generalization of subject-matter entities into an appropriate set of classes and their behavior." (Stephen J Mellor, "Executable UML: A Foundation for Model-Driven Architecture", 2002)

"A domain model is not a particular diagram; it is the idea that the diagram is intended to convey. It is not just the knowledge in a domain expert’s head; it is a rigorously organized and selective abstraction of that knowledge." (Eric Evans, "Domain-Driven Design: Tackling complexity in the heart of software", 2003)

"Domain experts are usually not aware of how complex their mental processes are as, in the course of their work, they navigate all these rules, reconcile contradictions, and fill in gaps with common sense. Software can’t do this. It is through knowledge crunching in close collaboration with software experts that the rules are clarified, fleshed out, reconciled, or placed out of scope." (Eric Evans, "Domain-Driven Design: Tackling complexity in the heart of software", 2003)

"Effective domain modelers are knowledge crunchers. They take a torrent of information and probe for the relevant trickle. They try one organizing idea after another, searching for the simple view that makes sense of the mass. Many models are tried and rejected or transformed. Success comes in an emerging set of abstract concepts that makes sense of all the detail. This distillation is a rigorous expression of the particular knowledge that has been found most relevant." (Eric Evans, "Domain-Driven Design: Tackling complexity in the heart of software", 2003)

"Perception and memory are imprecise filters of information, and the way in which information is presented, that is, the frame, influences how it is received. Because too much information is difficult to deal with, people have developed shortcuts or heuristics in order to come up with reasonable decisions. Unfortunately, sometimes these heuristics lead to bias, especially when used outside their natural domains." (Lucy F Ackert & Richard Deaves, "Behavioral Finance: Psychology, Decision-Making, and Markets", 2010)

"This is always the case in analogical reasoning: Relations between two dissimilar domains never map completely to one another. In fact, it is often the salient similarities between the base and target domains that provoke thought and increase the usefulness of an analogy as a problem-solving tool." (Robbie T Nakatsu, "Diagrammatic Reasoning in AI", 2010)

"Conceptual models are best thought of as design-tools - a way for designers to straighten out and simplify the design and match it to the users’ task-domain, thereby making it clearer to users how they should think about the application. The designers’ responsibility is to devise a conceptual model that seems natural to users based on the users’ familiarity with the task domain. If designers do their job well, the conceptual model will be the basis for users’ mental models of the application." (Jeff Johnson & Austin Henderson, "Conceptual Models", 2011)

"A model or conceptual model is a schematic or representation that describes how something works. We create and adapt models all the time without realizing it. Over time, as you gain more information about a problem domain, your model will improve to better match reality." (James Padolsey, "Clean Code in JavaScript", 2020)

"Knowledge graphs use an organizing principle so that a user (or a computer system) can reason about the underlying data. The organizing principle gives us an additional layer of organizing data (metadata) that adds connected context to support reasoning and knowledge discovery. […] Importantly, some processing can be done without knowledge of the domain, just by leveraging the features of the property graph model (the organizing principle)." (Jesús Barrasa et al, "Knowledge Graphs: Data in Context for Responsive Businesses", 2021)

09 July 2013

🎓Knowledge Management: Mental Model (Definitions)

"A mental model is a cognitive construct that describes a person's understanding of a particular content domain in the world." (John Sown, "Conceptual Structures: Information Processing in Mind and Machine", 1984)

"A mental model is a data structure, in a computational system, that represents a part of the real world or of a fictitious world." (Alan Granham, "Mental Models as Representations of Discourse and Text", 1987)

"[…] a mental model is a mapping from a domain into a mental representation which contains the main characteristics of the domain; a model can be ‘run’ to generate explanations and expectations with respect to potential states. Mental models have been proposed in particular as the kind of knowledge structures that people use to understand a specific domain […]" (Helmut Jungermann, Holger Schütz & Manfred Thuering, "Mental models in risk assessment: Informing people about drugs", Risk Analysis 8 (1), 1988)

 "A mental model is a knowledge structure that incorporates both declarative knowledge (e.g., device models) and procedural knowledge (e.g., procedures for determining distributions of voltages within a circuit), and a control structure that determines how the procedural and declarative knowledge are used in solving problems (e.g., mentally simulating the behavior of a circuit)." (Barbara Y White & John R Frederiksen, "Causal Model Progressions as a Foundation for Intelligent Learning Environments", Artificial Intelligence 42, 1990)

"'Mental models' are deeply ingrained assumptions, generalizations, or even pictures or images that influence how we understand the world and how we take action. [...] Mental models are deeply held internal images of how the world works, images that limit us to familiar ways of thinking and acting." (Peter Senge, "The Fifth Discipline", 1990)

"[A mental model] is a relatively enduring and accessible, but limited, internal conceptual representation of an external system (historical, existing, or projected) [italics in original] whose structure is analogous to the perceived structure of that system." (James K Doyle & David N Ford, "Mental models concepts revisited: Some clarifications and a reply to Lane", System Dynamics Review 15 (4), 1999)

"In broad terms, a mental model is to be understood as a dynamic symbolic representation of external objects or events on the part of some natural or artificial cognitive system. Mental models are thought to have certain properties which make them stand out against other forms of symbolic representations." (Gert Rickheit & Lorenz Sichelschmidt, "Mental Models: Some Answers, Some Questions, Some Suggestions", 1999)

"A mental model is conceived […] as a knowledge structure possessing slots that can be filled not only with empirically gained information but also with ‘default assumptions’ resulting from prior experience. These default assumptions can be substituted by updated information so that inferences based on the model can be corrected without abandoning the model as a whole. Information is assimilated to the slots of a mental model in the form of ‘frames’ which are understood here as ‘chunks’ of knowledge with a well-defined meaning anchored in a given body of shared knowledge." (Jürgen Renn, “Before the Riemann Tensor: The Emergence of Einstein’s Double Strategy", 2005)

"A mental model is a mental representation that captures what is common to all the different ways in which the premises can be interpreted. It represents in 'small scale' how 'reality' could be - according to what is stated in the premises of a reasoning problem. Mental models, though, must not be confused with images." (Carsten Held et al, "Mental Models and the Mind", 2006)

"'Mental models' are deeply ingrained assumptions, generalizations, or even pictures or images that influence how we understand the world and how we take action." (Jossey-Bass Publishers, "The Jossey-Bass Reader on Educational Leadership", 2nd Ed. 2007)

"A mental model is an internal representation with analogical relations to its referential object, so that local and temporal aspects of the object are preserved." (Gert Rickheit et al, "The concept of communicative competence" [in "Handbook of Communication Competence"], 2008)

"Internal representations constructed on the spot when required by demands of an external task or by a self-generated stimulus. It enables activation of relevant schemata, and allows new knowledge to be integrated. It specifies causal actions among concepts that take place within it, and it can be interacted with in the mind." (Daniel Churchill, "Mental Models" [in "Encyclopedia of Information Technology Curriculum Integration"] , 2008)

"Mental models are representations of reality built in people’s minds. These models are based on arrangements of assumptions, judgments, and values. A main weakness of mental models is that people’s assumptions and judgments change over time and are applied in inconsistent ways when building explanations of the world." (Luis F Luna-Reyes, "System Dynamics to Understand Public Information Technology", 2008)

"A mental model is the collection of concepts and relationships about the image of real world things we carry in our heads" (Hassan Qudrat-Ullah, "System Dynamics Based Technology for Decision Support", 2009)

"A mental recreation of the states of the world reproduced cognitively in order to offer itself as a basis for reasoning." (Eshaa M Alkhalifa, "Open Student Models", 2009)

[Shared Mental Model:] "A mental model that is shared among team members, and may include: 1) task-specific knowledge, 2) task-related knowledge, 3) knowledge of teammates and 4) attitudes/beliefs." (Rosemarie Reynolds et al, "Measuring Shared Mental Models in Unmanned Aircraft Systems", 2015) 

"A network of knowledge content, as well as the relationships among the content." (Rosemarie Reynolds et al, "Measuring Shared Mental Models in Unmanned Aircraft Systems", 2015)

"A mental model (aka mental representation/image/picture) is a mental structure that attempts to model (depict, imagine) how real or imaginary things look like, work or fit together." (The Web of Knowledge) [source]

Resources:
Quotes on "Mental Models" at the-web-of-knowledge.blogspot.com.

06 June 2013

🎓Knowledge Management: Ontology (Definitions)

"A data model that represents the entities that are defined and evaluated by its own attributes, and organized according to a hierarchy and a semantic. Ontologies are used for representing knowledge on the whole of a specific domain or on part of it." (Gervásio Iwens et al, "Programming Body Sensor Networks", 2008)

"An ontology specifies a conceptualization, that is, a structure of related concepts for a given domain." (Troels Andreasen & Henrik Bulskov, "Query Expansion by Taxonomy", 2008)

"A semantic structure useful to standardize and provide rigorous definitions of the terminology used in a domain and to describe the knowledge of the domain. It is composed of a controlled vocabulary, which describes the concepts of the considered domain, and a semantic network, which describes the relations among such concepts. Each concept is connected to other concepts of the domain through semantic relations that specify the knowledge of the domain. A general concept can be described by several terms that can be synonyms or characteristic of different domains in which the concept exists. For this reason the ontologies tend to have a hierarchical structure, with generic concepts/terms at the higher levels of the hierarchy and specific concepts/terms at the lower levels, connected by different types of relations." (Mario Ceresa, "Clinical and Biomolecular Ontologies for E-Health", Handbook of Research on Distributed Medical Informatics and E-Health, 2009)

"In the context of knowledge sharing, the chapter uses the term ontology to mean a specification of conceptual relations. An ontology is the concepts and relationships that can exist for an agent or a community of agents. The chapter refers to designing ontologies for the purpose of enabling knowledge sharing and re-use." (Ivan Launders, "Socio-Technical Systems and Knowledge Representation", 2009)

 "The systematic description of a given phenomenon, which often includes a controlled vocabulary and relationships, captures nuances in meaning and enables knowledge sharing and reuse. Typically, ontology defines data entities, data attributes, relations and possible functions and operations." (Mark Olive, "SHARE: A European Healthgrid Roadmap", 2009)

"Those things that exist are those things that have a formal representation within the context of a machine. Knowledge commits to an ontology if it adheres to the structure, vocabulary and semantics intrinsic to a particular ontology i.e. it conforms to the ontology definition. A formal ontology in computer science is a logical theory that represents a conceptualization of real world concepts." (Philip D. Smart, "Semantic Web Rule Languages for Geospatial Ontologies", 2009)

"A formal representation of a set of concepts within a domain and the relationships between those concepts. It is used to reason about the properties of that domain, and may be used to define the domain." (Yong Yu et al, "Social Tagging: Properties and Applications", 2010)

"Is set of well-defined concepts describing a specific domain." (Hak-Lae Kim et al, "Representing and Sharing Tagging Data Using the Social Semantic Cloud of Tags", 2010)

"An ontology is a 'formal, explicit specification of a shared conceptualisation'. It is composed of concepts and relations structured into hierarchies (i.e. they are linked together by using the Specialisation/Generalisation relationship). A heavyweight ontology is a lightweight ontology (i.e. an ontology simply based on a hierarchy of concepts and a hierarchy of relations) enriched with axioms used to fix the semantic interpretation of concepts and relations." (Francky Trichet et al, "OSIRIS: Ontology-Based System for Semantic Information Retrieval and Indexation Dedicated to Community and Open Web Spaces", 2011)

"The set of the things that can be dealt with in a particular domain, together with their relationships." (Steven Woods et al, "Knowledge Dissemination in Portals", 2011) 

"In semantic web and related technologies, an ontology (aka domain ontology) is a set of taxonomies together with typed relationships connecting concepts from the taxonomies and, possibly, sets of integrity rules and constraints defining classes and relationships." (Marcus Spies & Said Tabet, "Emerging Standards and Protocols for Governance, Risk, and Compliance Management", 2012)

"High-level knowledge and data representation structure. Ontologies provide a formal frame to represent the knowledge related with a complex domain, as a qualitative model of the system. Ontologies can be used to represent the structure of a domain by means of defining concepts and properties that relate them." (Lenka Lhotska et al, "Interoperability of Medical Devices and Information Systems", 2013)

"(a) In computer science and information science, an ontology formally represents knowledge as a set of concepts within a domain, and the relationships between pairs of concepts. It can be used to model a domain and support reasoning about concepts. (b) In philosophy, ontology is the study of the nature of being, becoming, existence , or reality , as well as the basic categories of being and their relations. Traditionally listed as a part of the major branch of philosophy known as metaphysics, ontology deals with questions concerning what entities exist or can be said to exist, and how such entities can be grouped, related within a hierarchy, and subdivided according to similarities and differences." (Ronald J Lofaro, "Knowledge Engineering Methodology with Examples", 2015)

"It is a shared structure which classify and organizes all the entities of a given domain." (T R Gopalakrishnan Nair, "Intelligent Knowledge Systems", 2015)

"The study of how things relate. Used in big data to analyze seemingly unrelated data to discover insights." (Jason Williamson, "Getting a Big Data Job For Dummies", 2015)

"An ontology is a formal, explicit specification of a shared conceptualization." (Fu Zhang et al, "A Review of Answering Queries over Ontologies Based on Databases", 2016)

18 January 2010

🗄️Data Management: Data Quality Dimensions (Part VI: Referential Integrity)

  
Data Management Series

Referential integrity, when considered as a data quality dimension, refers to the degree to which the values of a key in one table (aka the reference value) match the values of a key in a related table (aka the referenced value). Typically, that's assured by design in Database Management Systems (DBMS) through a feature also called referential integrity, which defines an explicit relationship between the two tables and makes sure that the values remain valid during database changes. Thus, when a record is inserted or updated and a value is provided for the reference value, the system checks that the referenced value exists; otherwise it throws a referential integrity error. A similar error is thrown when one attempts to delete the record with the referenced value as long as it is still referenced by a table on which the relationship was explicitly defined.
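The behaviour described above can be tried out with a minimal sketch, assuming SQLite via Python's built-in sqlite3 module as the DBMS. The customers and orders tables and the attempt helper are hypothetical names chosen for illustration; note that SQLite enforces foreign keys only after the pragma is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces declared foreign keys only when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical tables: orders.customer_id is the reference value,
# customers.customer_id the referenced value.
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute(
    """
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers (customer_id)
    )
    """
)
conn.execute("INSERT INTO customers VALUES (1, 'Contoso')")
conn.execute("INSERT INTO orders VALUES (100, 1)")


def attempt(sql):
    """Run a statement and report whether the DBMS accepted or rejected it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


# Insert pointing to a customer that does not exist.
insert_result = attempt("INSERT INTO orders VALUES (101, 999)")
# Delete of a customer that is still referenced by an order.
delete_result = attempt("DELETE FROM customers WHERE customer_id = 1")

print(insert_result)
print(delete_result)
```

Both statements are expected to be rejected, and the customer record remains in place. Other DBMS report the violation with their own error codes and messages.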

Using referential integrity is a recommended technique for assuring the overall integrity of the data in a database, though there are cases in which it's not enforced for any of the tables (e.g. data warehouses) or is left out only for certain tables or attributes (e.g. interface tables in which records are imported as they are, attributes whose values reference data from multiple tables). Therefore, even if referential integrity is enforced on some tables, don't assume that it applies to all tables!

Relational DBMS distinguish three types of integrity: entity, referential and domain integrity. Entity integrity demands that every table has a primary key that contains no Null values. Referential integrity demands that each non-null value of a foreign key matches the value of a primary key [1], while domain integrity demands that the values of an attribute are restricted to a certain data type and that their format is restricted by using constraints, rules or ranges of possible values [2].
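Entity and domain integrity can be illustrated in the same way. The following is only a sketch, again assuming SQLite through Python's sqlite3 module; the products table, its columns and the is_accepted helper are hypothetical. The primary key is declared NOT NULL explicitly because SQLite, unlike most DBMS, otherwise tolerates Null values in non-integer primary keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE products (
        -- entity integrity: a primary key without Null values
        product_code TEXT PRIMARY KEY NOT NULL,
        -- domain integrity: a range and a list of allowed values
        unit_price   REAL NOT NULL CHECK (unit_price >= 0),
        status       TEXT NOT NULL CHECK (status IN ('active', 'retired'))
    )
    """
)


def is_accepted(row):
    """Return True if the row passes all declared constraints."""
    try:
        conn.execute("INSERT INTO products VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


results = {
    "valid row": is_accepted(("P-001", 9.99, "active")),
    "null key": is_accepted((None, 9.99, "active")),
    "duplicate key": is_accepted(("P-001", 5.0, "active")),
    "negative price": is_accepted(("P-002", -1.0, "active")),
    "unknown status": is_accepted(("P-003", 1.0, "draft")),
}

for label, accepted in results.items():
    print(f"{label}: {'accepted' if accepted else 'rejected'}")
```

Only the first row should be accepted; the others violate either the primary key or one of the CHECK constraints.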

Even if not mandatory, all three types of integrity are quintessential for reliable relational databases. When referential integrity is not enforced at database level, or at least in code, a record can be deleted from a table while a foreign key is still pointing to it, a fact that could lead to the unexpected disappearance of records from the system’s UI even if the records are still available in the database.

During conversions or data migrations it is important to assure that the various datasets loaded respect the referential and domain integrity of the database in which the data will be loaded, otherwise the records that violate them will be rejected. The rejection itself might not be a problem for several records, though when it happens at large scale the situation changes dramatically, especially when the system gives no adequate messages about the cause of the rejection. A recommended approach is to assure that the scope is synchronized between the various data elements, and that the referential integrity of the datasets is validated before the data are loaded into the destination database.
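A validation of this kind can be sketched in a few lines of plain Python, assuming the staged records are available as dictionaries and the valid keys of the referenced table have already been collected. The function name, the field names and the sample data are hypothetical.

```python
def split_by_reference(rows, key, valid_keys):
    """Split staged rows into loadable rows and rows with a dangling reference.

    Rows whose reference value is None are kept as loadable, because
    referential integrity applies only to non-null foreign key values.
    """
    loadable, orphans = [], []
    for row in rows:
        value = row.get(key)
        if value is None or value in valid_keys:
            loadable.append(row)
        else:
            orphans.append(row)
    return loadable, orphans


# Hypothetical data: customer keys in scope and staged orders.
customer_keys = {1, 2, 3}
staged_orders = [
    {"order_id": 100, "customer_id": 1},
    {"order_id": 101, "customer_id": 7},     # customer not in scope
    {"order_id": 102, "customer_id": None},  # no reference provided
    {"order_id": 103, "customer_id": 3},
]

loadable, orphans = split_by_reference(staged_orders, "customer_id", customer_keys)
print("loadable:", [row["order_id"] for row in loadable])
print("rejected before load:", [row["order_id"] for row in orphans])
```

Reporting the dangling references before the load gives the migration team an explicit list of records to fix or to bring into scope, instead of relying on the target system's error messages.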

There are several sources (e.g. [3]) that consider Codd’s referential integrity constraint as a type of consistency. In support of this idea one could mention that referential integrity can be used to solve data consistency issues by bringing the various lists of values (LOV) into the systems. Still, referential integrity is mainly an architectural concept, even if it involves the 'consistency' of foreign key/primary key pairs.

Note:
Expect the unforeseeable! It’s always a good idea to check whether referential integrity is kept by a system – there are so many things that could go wrong! In data migration solutions, data warehouses and, more generally, analytical solutions it is a good idea to have mechanisms in place that check for this kind of issue.
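One possible check mechanism is a query that looks for foreign key values without a matching primary key. The sketch below assumes SQLite through Python's sqlite3 module and hypothetical customers and orders tables on which no relationship was declared, so the deletion of a referenced customer goes through unnoticed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables without a declared foreign key, as often found in
# interface tables or data warehouses.
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Contoso'), (2, 'Fabrikam');
    INSERT INTO orders VALUES (100, 1), (101, 2), (102, 2);
    -- nothing prevents the deletion of a customer that is still referenced
    DELETE FROM customers WHERE customer_id = 2;
    """
)

# What a UI based on an inner join would still show.
visible = conn.execute(
    """
    SELECT o.order_id
    FROM orders o
    JOIN customers c ON c.customer_id = o.customer_id
    ORDER BY o.order_id
    """
).fetchall()

# The check: orders whose non-null reference has no match.
orphans = conn.execute(
    """
    SELECT o.order_id, o.customer_id
    FROM orders o
    LEFT JOIN customers c ON c.customer_id = o.customer_id
    WHERE o.customer_id IS NOT NULL
      AND c.customer_id IS NULL
    ORDER BY o.order_id
    """
).fetchall()

print("visible orders:", visible)
print("orphaned orders:", orphans)
```

The orders of the deleted customer are still stored but no longer appear in the joined result. Run on a schedule, the second query can serve as a simple monitor for such records.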


Written: Jan-2010, Last Reviewed: Mar-2024

References:
[1] Halpin T. (2001) "Information Modeling and Relational Databases: From Conceptual Analysis to Logical Design", Morgan Kaufmann Publishers. ISBN: 1-55860-672-6
[2] MSDN. 2009. Data Integrity. [Online] Available from: http://msdn.microsoft.com/en-us/library/ms184276.aspx (Accessed: 18 January 2009)
[3] Lee Y.W., Pipino L.L., Funk J.D., Wang R.Y. (2006) "Journey to Data Quality", MIT Press. ISBN: 0-262-12287-1

16 September 2008

W3: Cyberspace (Definitions)

"A term used to describe the nonphysical, virtual world of computers." (Andy Walker, "Absolute Beginner’s Guide To: Security, Spam, Spyware & Viruses", 2005)

"A metaphoric abstraction for a virtual reality existing inside computers and on computer networks." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"The online world of computer networks where people can interact with others without physically being with them. People commonly interact with cyberspace via the Internet." (Darril Gibson, "Effective Help Desk Specialist Skills", 2014)

"The interdependent network of information technology infrastructures, which includes the Internet, telecommunications networks, computer systems, and embedded processors and controllers." (Olivera Injac & Ramo Šendelj, "National Security Policy and Strategy and Cyber Security Risks", 2016)

"A complex hyper-dimensional space involving the state of many mutually dependent computer and network systems with complex and often surprising properties as compared to physical space." (O Sami Saydjari, "Engineering Trustworthy Systems: Get Cybersecurity Design Right the First Time", 2018)

"Artifacts based on or dependent on computer and communications technology; the information that these artifacts use, store, handle, or process; and the interconnections among these various elements." (William Stallings, "Effective Cybersecurity: A Guide to Using Best Practices and Standards", 2018)

"Refers to a physical and non-physical terrain created by and/or composed of some or all of the following: computers, computer systems, networks, and their computer programs, computer data, content data, traffic data, and users." (Thokozani I Nzimakwe, "Government's Dynamic Approach to Addressing Challenges of Cybersecurity in South Africa", 2018)

"Cyberspace, is supposedly 'virtual' world/network created by links between computers, Internet-enabled devices, servers, routers, and other components of the Internet’s infrastructure." (Sanjeev Rao et al, "Online Social Networks Misuse, Cyber Crimes, and Counter Mechanisms", 2021)

16 November 2007

🏗️Software Engineering: Domains (Just the Quotes)

"[Object-oriented analysis is] the challenge of understanding the problem domain and then the system's responsibilities in that light." (Edward Yourdon, "Object-Oriented Design", 1991) 

"To us, analysis is the study of a problem domain, leading to a specification of externally observable behavior; a complete, consistent, and feasible statement of what is needed; a coverage of both functional and quantified operational characteristics (e.g. reliability, availability, performance)." (Edward Yourdon, "Object-Oriented Design", 1991)

"As the size of software systems increases, the algorithms and data structures of the computation no longer constitute the major design problems. When systems are constructed from many components, the organization of the overall system - the software architecture - presents a new set of design problems. This level of design has been addressed in a number of ways including informal diagrams and descriptive terms, module interconnection languages, templates and frameworks for systems that serve the needs of specific domains, and formal models of component integration mechanisms." (David Garlan & Mary Shaw, "An introduction to software architecture", Advances in software engineering and knowledge engineering Vol 1, 1993)

"Design patterns are not about designs such as linked lists and hash tables that can be encoded in classes and reused as is. Nor are they complex, domain-specific designs for an entire application or subsystem. The design patterns [...] are descriptions of communicating objects and classes that are customized to solve a general design problem in a particular context." (Erich Gamma et al, "Design Patterns: Elements of Reusable Object-Oriented Software", 1994)

"Domain-driven design is both a way of thinking and a set of priorities, aimed at accelerating software projects that have to deal with complicated domains." (Eric Evans, "Domain-Driven Design: Tackling complexity in the heart of software", 2003)

"If the architecture isolates the domain-related code in a way that allows a cohesive domain design loosely coupled to the rest of the system, then that architecture can probably support domain-driven DESIGN." (Eric Evans, "Domain-Driven Design: Tackling complexity in the heart of software", 2003)

"If the design, or some central part of it, does not map to the domain model, that model is of little value, and the correctness of the software is suspect. At the same time, complex mappings between models and design functions are difficult to understand and, in practice, impossible to maintain as the design changes. A deadly divide opens between analysis and design so that insight gained in each of those activities does not feed into the other." (Eric Evans, "Domain-Driven Design: Tackling complexity in the heart of software", 2003)

"The technical model that drives the software development process must be strictly pared down to the necessary minimum to fulfill its functions. An explanatory model can include aspects of the domain that provide context that clarifies the more narrowly scoped model. Explanatory models offer the freedom to create much more communicative styles tailored to a particular topic. Visual metaphors used by the domain experts in a field often present clearer explanations, educating developers and harmonizing experts. Explanatory models also present the domain in a way that is simply different, and multiple, diverse explanations help people learn." (Eric Evans, "Domain-Driven Design: Tackling complexity in the heart of software", 2003)

"Every system is built from a domain-specific language designed by the programmers to describe that system. Functions are the verbs of that language, and classes are the nouns."  (Robert C Martin, "Clean Code: A Handbook of Agile Software Craftsmanship", 2008)

"Enterprise architecture [is] a coherent whole of principles, methods, and models that are used in the design and realisation of an enterprise's organisational structure, business processes, information systems, and infrastructure. […] The most important characteristic of an enterprise architecture is that it provides a holistic view of the enterprise. […] To achieve this quality in enterprise architecture, bringing together information from formerly unrelated domains necessitates an approach that is understood by all those involved from those different domains." (Marc Lankhorst, "Enterprise Architecture at Work: Modelling, Communication and Analysis", 2009)

"Making domain concepts explicit in your code means other programmers can gather the intent of the code much more easily than by trying to retrofit an algorithm into what they understand about a domain. It also means that when the domain model evolves - which it will, as your understanding of the domain grows - you are in a good position to evolve the code. Coupled with good encapsulation, the chances are good that the rule will exist in only one place, and that you can change it without any of the dependent code being any the wiser." (Dan North [in Kevlin Henney’s "97 Things Every Programmer Should Know", 2010]) 

"Trying to determine the cognitive load of software using simple measures such as lines of code, number of modules, classes, or methods is misguided. […] When measuring cognitive load, what we really care about is the domain complexity - how complex is the problem that we’re trying to solve with software? A domain is a more largely applicable concept than software size." (Matthew Skelton, "Team Topologies: Organizing Business and Technology Teams for Fast Flow", 2019)

"Knowledge graphs use an organizing principle so that a user (or a computer system) can reason about the underlying data. The organizing principle gives us an additional layer of organizing data (metadata) that adds connected context to support reasoning and knowledge discovery. […] Importantly, some processing can be done without knowledge of the domain, just by leveraging the features of the property graph model (the organizing principle)." (Jesús Barrasa et al, "Knowledge Graphs: Data in Context for Responsive Businesses", 2021)


About Me

Koeln, NRW, Germany
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.