
07 November 2020

DBMS: Event Streaming Databases (More of a Kafka Story)

Database Management

Event streaming architectures are architectures in which data are generated by different sources and then processed, stored, analyzed, and acted upon in real time by the different applications tapped into the data streams. An event streaming database is then a database that keeps its data continuously up to date, providing specific functionality like the management of connectors, materialized views and the running of queries on data-in-motion (rather than on static data).
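
As a minimal sketch of what tapping into a stream looks like, the snippet below publishes one event and consumes it as it arrives. The kafka-python client, the broker at localhost:9092 and the 'orders' topic are assumptions made for illustration, not part of the original text.

from kafka import KafkaProducer, KafkaConsumer
import json

# Publish one event into the stream (broker address and topic are assumed)
producer = KafkaProducer(
    bootstrap_servers='localhost:9092',
    value_serializer=lambda v: json.dumps(v).encode('utf-8'))
producer.send('orders', {'order_id': 1, 'amount': 9.99})
producer.flush()

# Tap into the stream and act on each event as it arrives
consumer = KafkaConsumer(
    'orders',
    bootstrap_servers='localhost:9092',
    auto_offset_reset='earliest',
    value_deserializer=lambda b: json.loads(b.decode('utf-8')))
for event in consumer:
    print(event.value)  # an application could e.g. update a materialized view here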

Reading about this type of technology, one can easily start fantasizing about the Web as a database in which intelligent agents process streams of data in real time, in which knowledge is derived and propagated over the networks in an infinite and ever-growing flow whose limits are hardly perceptible, in which the agents act as a mind disconnected in the end from the human intent. One is struck by the fusion of elements of realism with the fantastic, much like in a Kafka story in which the metamorphosis of technologies and social aspects can easily lead to absurd implications.

The link to Kafka was somehow suggested by Apache Kafka, an open-source distributed event streaming platform, which seems to lead the trends within this newly developing market. Kafka provides database functionality and guarantees the ACID (atomicity, consistency, isolation, durability) properties of transactions while tapping into data streams.
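
A hedged sketch of those transactional guarantees, using the confluent-kafka Python client (the client choice, the broker address, the transactional id and the 'payments' topic are all assumptions): the two events become visible to read-committed consumers together, or not at all.

from confluent_kafka import Producer

# Transactional producer: either all sends commit or none do (atomicity)
producer = Producer({
    'bootstrap.servers': 'localhost:9092',
    'transactional.id': 'demo-tx-1'})
producer.init_transactions()

producer.begin_transaction()
try:
    producer.produce('payments', key='order-1', value='debit 9.99')
    producer.produce('payments', key='order-1', value='credit 9.99')
    producer.commit_transaction()
except Exception:
    producer.abort_transaction()  # roll back both events together
    raise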

Data streaming is an appealing concept, though it comes with important challenges: data overload or over-flooding, and the complexity of building the specific business and integrity rules needed for processing the data and for keeping data consistent and truthful within ever-growing and ever-changing flows.

Data overload or over-flooding occurs when applications cannot keep pace with the volume of data or events fired with each change. Imagine raindrops falling on a wide surface in which each millimeter or micrometer has its own rules for handling the raindrops, and this at large scale. The raindrops must infiltrate the surface to be processed and find their way to the water flows beneath, aggregating into streams that could nurture seas or even oceans. The same metaphor can be applied to data events, in which the data pervade applications, accumulating in streams processed by engines to derive value. However, heavy rains can easily lead to floods in which the water aggregates at the surface.
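
The over-flooding scenario is easy to reproduce with only the Python standard library: a bounded queue plays the role of the surface, and once the slower consumer falls behind, new raindrops have nowhere to go. The buffer size and the rates are arbitrary assumptions.

import queue
import threading
import time

buffer = queue.Queue(maxsize=100)  # the finite capacity of the 'surface'
dropped = 0

def consume(events_per_second=200):
    # A slower engine deriving value from the stream
    while True:
        buffer.get()
        time.sleep(1 / events_per_second)

threading.Thread(target=consume, daemon=True).start()

# Fire events faster than they can be absorbed
for i in range(5000):
    try:
        buffer.put_nowait(i)
    except queue.Full:
        dropped += 1  # the 'flood': events accumulating at the surface
    time.sleep(1 / 1000)

print(f'dropped events: {dropped}')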

Business applications rely on predefined paths in which the flow of data is tidily linked to specific actions, themselves arranged in process sequences that reflect the material or cash flow. Any deviation of the data flow from expectations will lead to inefficiencies and ultimately to chaos. Some benefit might be derived from data integration between business applications; however, the applications must be designed for this purpose and must handle extreme behaviors like over-flooding.

Data streams are maybe ideal for social media networks, in which one broadcasts data through the network and any consumer that can tap into it can further use the respective data. We can see, however, the problems of today's social media: data, better said information, flow through the networks and are changed to fit purposes that can easily diverge from the initial intent. Moreover, information gets denatured, misused and overused to the degree that it overloads the networks, making it more and more difficult to distinguish between reliable and unreliable information. If common sense helps in handling such information, the same cannot be said about machines or applications.

It will be challenging to deal with the vastness, vagueness, uncertainty, inconsistency, and deceit of the networks of the future; however, data streaming will more likely have a future as long as it can address such issues by design.


27 August 2019

Information Security: Distributed Denial of Service (Definitions)

"An electronic attack perpetrated by a person who controls legions of hijacked computers. On a single command, the computers simultaneously send packets of data across the Internet at a target computer. The attack is designed to overwhelm the target and stop it from functioning." (Andy Walker, "Absolute Beginner’s Guide To: Security, Spam, Spyware & Viruses", 2005)

"A type of DoS attack in which many (usually thousands or millions) of systems flood the victim with unwanted traffic. Typically perpetrated by networks of zombie Trojans that are woken up specifically for the attack." (Mark Rhodes-Ousley, "Information Security: The Complete Reference" 2nd Ed., 2013)

"A denial of service (DoS) attack that comes from multiple sources at the same time. Attackers often enlist computers into botnets after infecting them with malware. Once infected, the attacker can then direct the infected computers to attack other computers." (Darril Gibson, "Effective Help Desk Specialist Skills", 2014)

"A denial of service technique using numerous hosts to perform the attack. For example, in a network flooding attack, a large number of co-opted computers (e.g., a botnet) send a large volume of spurious network packets to disable a specified target system. See also denial of service; botnet." (O Sami Saydjari, "Engineering Trustworthy Systems: Get Cybersecurity Design Right the First Time", 2018)

"A DoS attack in which multiple systems are used to flood servers with traffic in an attempt to overwhelm available resources (transmission capacity, memory, processing power, and so on), making them unavailable to respond to legitimate users." (William Stallings, "Effective Cybersecurity: A Guide to Using Best Practices and Standards", 2018)

"DDoS stands for distributed denial of service. In this type of an attack, an attacker tends to overwhelm the targeted network in order to make the services unavailable to the intended or legitimate user." (Kirti R Bhatele et al, "The Role of Artificial Intelligence in Cyber Security", Countering Cyber Attacks and Preserving the Integrity and Availability of Critical Systems, 2019)

"In DDoS attack, the incoming network traffic affects a target (e.g., server) from many different compromised sources. Consequently, online services are unavailable due to the attack. The target's resources are affected with different malicious network-based techniques (e.g., flood of network traffic packets)." (Ana Gavrovska & Andreja Samčović, "Intelligent Automation Using Machine and Deep Learning in Cybersecurity of Industrial IoT", 2020)

"This refers to malicious attacks or threats on computer systems to disrupt or break computing activities so that their access and availability is denied to the consumers of such systems or activities." (Heru Susanto et al, "Data Security for Connected Governments and Organisations: Managing Automation and Artificial Intelligence", 2021)

"A denial of service technique that uses numerous hosts to perform the attack." (CNSSI 4009-2015)

"A distributed denial-of-service (DDoS) attack is a malicious attempt to disrupt normal traffic on a targeted server, service or network by overwhelming the target or its surrounding infrastructure with a flood of Internet traffic." (proofpoint) [source]

26 August 2019

Information Security: Denial of Service (Definitions)

"A type of attack on a computer system that ties up critical system resources, making the system temporarily unusable." (Tom Petrocelli, "Data Protection and Information Lifecycle Management", 2005)

"Any attack that affects the availability of a service. Reliability bugs that cause a service to crash or hang are usually potential denial-of-service problems." (Mark S Merkow & Lakshmikanth Raghavan, "Secure and Resilient Software Development", 2010)

"This is a technique for overloading an IT system with a malicious workload, effectively preventing its regular service use." (Martin Oberhofer et al, "The Art of Enterprise Information Architecture", 2010)

"Occurs when a server or Web site receives a flood of traffic - much more traffic or requests for service than it can handle, causing it to crash." (Linda Volonino & Efraim Turban, "Information Technology for Management 8th Ed", 2011)

"Causing an information resource to be partially or completely unable to process requests. This is usually accomplished by flooding the resource with more requests than it can handle, thereby rendering it incapable of providing normal levels of service." (Mark Rhodes-Ousley, "Information Security: The Complete Reference, Second Edition" 2nd Ed., 2013)

"Attacks designed to disable a resource such as a server, network, or any other service provided by the company. If the attack is successful, the resource is no longer available to legitimate users." (Darril Gibson, "Effective Help Desk Specialist Skills", 2014)

"An attack from a single attacker designed to disrupt or disable the services provided by an IT system. Compare to distributed denial of service (DDoS)." (Darril Gibson, "Effective Help Desk Specialist Skills", 2014)

"A coordinated attack in which the target website or service is flooded with requests for access, to the point that it is completely overwhelmed." (Faithe Wempen, "Computing Fundamentals: Introduction to Computers", 2015)

"An attack that can result in decreased availability of the targeted system." (Mike Harwood, "Internet Security: How to Defend Against Attackers on the Web" 2nd Ed., 2015)

"An attack that generally floods a network with traffic. A successful DoS attack renders the network unusable and effectively stops the victim organization’s ability to conduct business." (Weiss, "Auditing IT Infrastructures for Compliance" 2nd Ed., 2015)

"A type of cyberattack to degrade the availability of a target system." (O Sami Saydjari, "Engineering Trustworthy Systems: Get Cybersecurity Design Right the First Time", 2018)

"Any action, or series of actions, that prevents a system, or its resources, from functioning in accordance with its intended purpose." (Shon Harris & Fernando Maymi, "CISSP All-in-One Exam Guide" 8th Ed., 2018)

"The prevention of authorized access to resources or the delaying of time-critical operations." (William Stallings, "Effective Cybersecurity: A Guide to Using Best Practices and Standards", 2018)

"An attack shutting down running of a service or network in order to render it inaccessible to its users (whether human person or a processing device)." (Wissam Abbass et al, "Internet of Things Application for Intelligent Cities: Security Risk Assessment Challenges", 2021)

"Actions that prevent the NE from functioning in accordance with its intended purpose. A piece of equipment or entity may be rendered inoperable or forced to operate in a degraded state; operations that depend on timeliness may be delayed." (NIST SP 800-13)

"The prevention of authorized access to resources or the delaying of time-critical operations. (Time-critical may be milliseconds or it may be hours, depending upon the service provided)." (NIST SP 800-12 Rev. 1)

"The prevention of authorized access to a system resource or the delaying of system operations and functions." (NIST SP 800-82 Rev. 2)


25 August 2019

Information Security: Attack (Definitions)

[active attack:] "Any network-based attack other than simple eavesdropping (i.e., a passive attack)." (Mark S Merkow & Lakshmikanth Raghavan, "Secure and Resilient Software Development", 2010)

"Unauthorized activity with malicious intent that uses specially crafted code or techniques." (Mark Rhodes-Ousley, "Information Security: The Complete Reference" 2nd Ed., 2013)

"An attempt to destroy, expose, alter, disable, steal or gain unauthorised access to or make unauthorised use of an asset," (David Sutton, "Information Risk Management: A practitioner’s guide", 2014)

[active attack:] "Attack where the attacker does interact with processing or communication activities." (Adam Gordon, "Official (ISC)2 Guide to the CISSP CBK" 4th Ed., 2015)

[passive attack:] "Attack where the attacker does not interact with processing or communication activities, but only carries out observation and data collection, as in network sniffing." (Adam Gordon, "Official (ISC)2 Guide to the CISSP CBK" 4th Ed., 2015)

"An attempt to gain unauthorized access to system services, resources, or information, or an attempt to compromise system integrity." (Olivera Injac & Ramo Šendelj, "National Security Policy and Strategy and Cyber Security Risks", 2016)

"A sequence of actions intended to have a specified effect favorable to an actor that is adversarial to the owners of that system." (O Sami Saydjari, "Engineering Trustworthy Systems: Get Cybersecurity Design Right the First Time", 2018)

"An attempt to bypass security controls in a system with the mission of using that system or compromising it. An attack is usually accomplished by exploiting a current vulnerability." (Shon Harris & Fernando Maymi, "CISSP All-in-One Exam Guide" 8th Ed., 2018)

"Any kind of malicious activity that attempts to collect, disrupt, deny, degrade, or destroy information system resources or information itself." (William Stallings, "Effective Cybersecurity: A Guide to Using Best Practices and Standards", 2018)

"an aggressive action against a person, an organisation or an asset intended to cause damage or loss." (ISO/IEC 27000:2014)

23 August 2019

Information Security: Cybercrime (Definitions)

 "A variety of offenses related to information technology, including extortion, boiler-room investment and gambling fraud, and fraudulent transfers of funds." (Robert McCrie, "Security Operations Management" 2nd Ed., 2006)

"Any type of crime that targets computers, or uses computer networks or devices, and violates existing laws. Cybercrime includes cyber vandalism, cyber theft, and cyber-attacks." (Darril Gibson, "Effective Help Desk Specialist Skills", 2014)

"Any crime that is facilitated through the use of computers and networks. This can include crimes that are dependent on computers or networks in order to take place, as well as those whose impact and reach are increased by their use." (Hamid R Arabnia et al, "Application of Big Data for National Security", 2015)

"Cybercrime is defined as any illegal activity that uses a computer either as the object of the crime OR as a tool to commit an offense." (Sanjukta Pookulangara, "Cybersecurity: What Matters to Consumers - An Exploratory Study", 2016)

"Any crime that is facilitated or committed using a computer, network, or hardware device." (Anisha B D Gani & Yudi Fernando, "Concept and Practices of Cyber Supply Chain in Manufacturing Context", 2018)

"Is all illegal acts, the commission of which involves the use of information and communication technologies. It is generally thought of as any criminal activity involving a computer system."  (Thokozani I Nzimakwe, "Government's Dynamic Approach to Addressing Challenges of Cybersecurity in South Africa", 2018)

"Any criminal action perpetrated primarily through the use of a computer." (Christopher T Anglim, "Cybersecurity Legislation", 2020)

"Criminal activity involving computer systems, networks, and/or the internet." (Boaventura DaCosta & Soonhwa Seok, "Cybercrime in Online Gaming", 2020)

07 August 2019

Information Security: Certificate (Definitions)

"An asymmetric key, usually issued by a certificate authority, that contains the public key of a public/private key pair as well as identifying information, expiration dates, and other information and that provides the ability to authenticate its holder. Certificates are used in SQL Server 2005 to secure logins or other database objects." (Victor Isakov et al, "MCITP Administrator: Microsoft SQL Server 2005 Optimization and Maintenance (70-444) Study Guide", 2007)

"A certificate is an electronic document consisting of an asymmetric key with additional metadata such as an expiration date and a digital signature that allows it to be verified by a third-party like a certificate authority (CA)." (Michael Coles, "Pro T-SQL 2008 Programmer's Guide", 2008)

"A certificate is an electronic document that uses a digital signature to bind an asymmetric key with a public identity. In its simplest form, a certificate is essentially an asymmetric key which can have additional metadata, like a certificate name, subject, and expiration date. A certificate can be selfsigned or issued by a certificate authority." (Michael Coles & Rodney Landrum, , "Expert SQL Server 2008 Encryption", 2008)

"A data object that binds information about a person or some other entity to a public key. The binding is generally done using a digital signature from a trusted third party (a certification authority)." (Mark S Merkow & Lakshmikanth Raghavan, "Secure and Resilient Software Development", 2010)

"(1) A token of authorization or authentication. (2) In data security, a computer data security object that includes identity information, validity specification, and a key." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"A digital document that is commonly used for authentication and to help secure information on a network. A certificate binds a public key to an entity that holds the corresponding private key. Certificates are digitally signed by the certification authority that issues them, and they can be issued for a user, a computer, or a service." (Microsoft, "SQL Server 2012 Glossary", 2012)

"A bundle of information containing the encrypted public key of the server, and the identification of the key provider." (Manish Agrawal, "Information Security and IT Risk Management", 2014)

"An electronic document used to identify an individual, a system, a server, a company, or some other entity, and to associate a public key with the entity. A digital certificate is issued by a certification authority and is digitally signed by that authority." (IBM, "Informix Servers 12.1", 2014)

"A representation of a sender’s authenticated public key used to minimize malicious forgeries" (Nell Dale & John Lewis, "Computer Science Illuminated" 6th Ed., 2015)

"A small electronic file that serves to validate or encrypt a message or browser session. Digital certificates are often used to create a digital signature which offers non-repudiation of a user or a Web site." (Mike Harwood, "Internet Security: How to Defend Against Attackers on the Web" 2nd Ed., 2015)

"An electronic document consisting of an asymmetric key with additional metadata such as an expiration date and a digital signature that allows it to be verified by a third party like a certificate authority (CA)." (Miguel Cebollero et al, "Pro T-SQL Programmer’s Guide 4th Ed", 2015)

"Cryptography-related electronic documents that allow for node identification and authentication. Digital certificates require more administrative work than some other methods but provide greater security." (Weiss, "Auditing IT Infrastructures for Compliance" 2nd Ed., 2015)

"Digital identity used within a PKI. Generated and maintained by a certificate authority and used for authentication." (Adam Gordon, "Official (ISC)2 Guide to the CISSP CBK" 4th Ed., 2015)

"A cryptographic binding between a user identifier and their public key as signed by a recognized authority called a certificate authority." (O Sami Saydjari, "Engineering Trustworthy Systems: Get Cybersecurity Design Right the First Time", 2018)

"In computer security, a digital document that binds a public key to the identity of the certificate owner, thereby enabling the certificate owner to be authenticated. A certificate is issued by a certificate authority and is digitally signed by that authority." (Sybase, "Open Server Server-Library/C Reference Manual", 2019)

"An electronic document using a digital signature to assert the identity of a person, group, or organization. Certificates attest to the identity of a person or group and contain that organization’s public key. A certificate is signed by a certificate authority with its digital signature." (Daniel Leuck et al, "Learning Java" 5th Ed., 2020)

30 July 2019

IT: Network (Definitions)

"Mathematically defined structure of a computing system where the operations are performed at specific locations (nodes) and the flow of information is represented by directed arcs." (Guido Deboeck & Teuvo Kohonen (Eds), "Visual Explorations in Finance with Self-Organizing Maps 2nd Ed.", 2000)

"A system of interconnected computing resources (computers, servers, printers, and so on)." (Sharon Allen & Evan Terry, "Beginning Relational Data Modeling 2nd Ed.", 2005)

"A system of connected computers. A local area network (LAN) is contained within a single company, in a single office. A wide area network (WAN) is generally distributed across a geographical area — even globally. The Internet is a very loosely connected network, meaning that it is usable by anyone and everyone." (Gavin Powell, "Beginning Database Design", 2006)

"A system of interconnected devices that provides a means for data to be transmitted from point to point." (Janice M Roehl-Anderson, "IT Best Practices for Financial Managers", 2010)

"1.Visually, a graph of nodes and connections where more than one entry point for each node is allowed. 2.In architecture, a topological arrangement of hardware and connections to allow communication between nodes and access to shared data and software." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"The connection of computer systems (nodes) by communications channels and appropriate software. |" (Marcia Kaufman et al, "Big Data For Dummies", 2013)

"The means by which electronic communications occurs between two or more nodes" (Daniel Linstedt & W H Inmon, "Data Architecture: A Primer for the Data Scientist", 2014)

"Two or more computers connected to share data and resources." (Faithe Wempen, "Computing Fundamentals: Introduction to Computers", 2015)

"People working towards a common purpose or with common interests where there is no requirement for members of the network to have a work relationship with others, and there is no requirement for mutuality as there is with a team." (Catherine Burke et al, "Systems Leadership, 2nd Ed,", 2018)

28 July 2019

IT: Internet of Things (Definitions)

"A term used to describe the community or collection of people and items that use the Internet to communicate with other." (Kenneth A Shaw, "Integrated Management of Processes and Information", 2013)

"The embedding of objects with sensors, coupled with the ability of objects to communicate, driving an explosion in the growth of big data." (Brenda L Dietrich et al, "Analytics Across the Enterprise", 2014)

"The Internet of Things entails the aim of all physical or uniquely identifiable objects being connected through wired and wireless networks. In this notion, every object would be virtually represented. Connecting objects in this way offers a whole new universe of possibilities. Real-time analysis of big data streams could enhance productivity and safety of systems (for example, roadways and cars being part of the Internet of Things could help to manage traffic flow). It can also make everyday life more convenient and sustainable (such as connecting all household devices to save electricity)." (Martin Hoegl et al, "Using Thematic Thinking to Achieve Business Success, Growth, and Innovation", 2014)

"IOT refers to a network of machines that have sensors and are interconnected enabling them to collect and exchange data. This interconnection enables devices to be controlled remotely resulting in process efficiencies and lower costs." (Saumya Chaki, "Enterprise Information Management in Practice", 2015)

"An interconnected network of physical devices, vehicles, buildings, and other items embedded with sensors that gather and share data." (Jonathan Ferrar et al, "The Power of People: Learn How Successful Organizations Use Workforce Analytics To Improve Business Performance", 2017)

"Ordinary devices that are connected to the Internet at any time, anywhere, via sensors." (Jason Williamson, "Getting a Big Data Job For Dummies", 2015)

"Also referred to as IoT. Term that describes the connectivity of objects to the Internet and the ability for these objects to send and receive data from each other." (Brittany Bullard, "Style and Statistics", 2016)

"computing or 'smart' devices often with ­sensor capability and the ability to collect, share, and transfer data using the Internet." (Daniel J. Power & Ciara Heavin, "Data-Based Decision Making and Digital Transformation", 2018)

"The wide-scale deployment of small, low-power computing devices into everyday devices, such as thermostats, refrigerators, clothing, and even into people themselves to continuously monitor health." (O Sami Saydjari, "Engineering Trustworthy Systems: Get Cybersecurity Design Right the First Time", 2018)

"A network of physical objects that have, like cell phones and laptops, internet connectivity enabling automatic communication between them and any other machine connected to the internet without human intervention." (Sue Milton, "Data Privacy vs. Data Security", 2021)

"Integration of various processes such as identifying, sensing, networking, and computation." (Revathi Rajendran et al, "Convergence of AI, ML, and DL for Enabling Smart Intelligence: Artificial Intelligence, Machine Learning, Deep Learning, Internet of Things", 2021)

"It is an interdisciplinary field who is associated with the electronics and computer science. Electronics deals with the development of new sensors or hardware for IoT device and computer science deals with the development of software, protocols and cloud based solution to store the data generated form these IoT devices."  (Ajay Sharma, "Smart Agriculture Services Using Deep Learning, Big Data, and IoT", 2021)

"IoT is a network of real-world objects which consists of sensors, software, and other technologies to exchange data with the other systems over the internet." (Hari K Kondaveeti et al, "Deep Learning Applications in Agriculture: The Role of Deep Learning in Smart Agriculture", 2021)

"This refers to a system of inter-connected computing and smart devices, that are provided with unique identifiers and the ability to transfer data over a network without requiring human interaction." (Wissam Abbass et al, "Internet of Things Application for Intelligent Cities: Security Risk Assessment Challenges", 2021)

"describes the network where sensing elements such as sensors, cameras, and devices are increasingly linked together via the internet to connect, communicate and exchange information." (Accenture)

"ordinary devices that are connected to the internet at any time anywhere via sensors." (Analytics Insight)

"Technologies that enable objects and infrastructure to interact with monitoring, analytics, and control systems over internet-style networks." (Forrester)

27 July 2019

IT: Cloud (Definitions)

"A set of computers, typically maintained in a data center, that can be allocated dynamically and accessed remotely. Unlike a cluster, cloud computers are typically managed by a third party and may host multiple applications from different, unrelated users." (Michael McCool et al, "Structured Parallel Programming", 2012)

"A network that delivers requested virtual resources as a service." (IBM, "Informix Servers 12.1", 2014)

"A secure computing environment accessed via the Internet." (Faithe Wempen, "Computing Fundamentals: Introduction to Computers", 2015)

"Products and services managed by a third-party company and made available through the Internet." (David K Pham, "From Business Strategy to Information Technology Roadmap", 2016)

"It has the ability to offer and to assist any kind of useful information without any limitations for users." (Shigeki Sugiyama. "Human Behavior and Another Kind in Consciousness: Emerging Research and Opportunities", 2019)

"Remote server and distributed computing environment used to store data and provision computing related services as and when needed on a pay-as-you-go basis." (Wissam Abbass et al, "Internet of Things Application for Intelligent Cities: Security Risk Assessment Challenges", 2021)

"The virtual world in which information technology tools and services are available for hire, use and storage via the internet, Wi-Fi and physical attributes ranging from IT components to data storage." (Sue Milton, "Data Privacy vs. Data Security", 2021)

"uses a network of remote servers hosted on the internet to store, manage, and process data, rather than requiring a local server or a personal computer." (Accenture)

24 July 2019

IT: Virtualization (Definitions)

"Creation of a virtual, as opposed to a real, instance of an entity, such as an operating system, server, storage, or network." (David G Hill, "Data Protection: Governance, Risk Management, and Compliance", 2009)

"The process of partitioning a computer so that multiple operating system instances can run at the same time on a single physical computer." (John Goodson & Robert A Steward, "The Data Access Handbook", 2009)

"A concept that separates business applications and data from hardware resources, allowing companies to pool hardware resources, rather than dedicate servers to application and assign those resources to applications as needed." (Linda Volonino & Efraim Turban, "Information Technology for Management" 8th Ed, 2011)

"A technique that creates logical representations of computing resources that are independent of the underlying physical computing resources." (Carlos Coronel et al, "Database Systems: Design, Implementation, and Management" 9th Ed., 2011)

"A method for managing hardware assets used at the same time by different users or processes, or both, that makes the part assigned to each user or process appear to act as if it was running on a separate piece of equipment." (Kenneth A Shaw, "Integrated Management of Processes and Information", 2013)

"Virtual memory is the use of a disk to store active areas of memory to make the available memory appear larger. In a virtual environment, one computer runs software that allows it to emulate another machine. This kind of emulation is commonly known as virtualization." (Marcia Kaufman et al, "Big Data For Dummies", 2013)

"A technique common in computing, consisting in the creation of virtual (rather than actual) instance of any element, so it can be managed and used independently. Virtualization has been one of the key tools for resource sharing and software development, and now it is beginning to be applied to the network disciplines." (Diego R López & Pedro A. Aranda, "Network Functions Virtualization: Going beyond the Carrier Cloud", 2015)

"Creation of a simulated environment (hardware platform, operating system, storage, etc.) that allows for central control and scalability." (Adam Gordon, "Official (ISC)2 Guide to the CISSP CBK 4th Ed.", 2015)

"The creation of a virtual version of actual services, applications, or resources." (Mike Harwood, "Internet Security: How to Defend Against Attackers on the Web" 2nd Ed., 2015)

"The process of creating a virtual version of a resource, such as an operating system, hardware platform, or storage device." (Andrew Pham et al, "From Business Strategy to Information Technology Roadmap", 2016)

"A base component of the cloud that consists of software that emulates physical infrastructure." (Richard Ehrhardt, "Cloud Build Methodology", 2017)

"The process of presenting an abstraction of hardware resources to give the appearance of dedicated access and control to hardware resources, while, in reality, those resources are being shared." (O Sami Saydjari, "Engineering Trustworthy Systems: Get Cybersecurity Design Right the First Time", 2018)

16 July 2019

IT: Quality of Service (Definitions)

"The guaranteed performance of a network connection." (Tom Petrocelli, "Data Protection and Information Lifecycle Management", 2005)

"QoS (Quality of Service) is a metric for quantifying desired or delivered degree of service reliability, priority, and other measures of interest for its quality." (Bo Leuf, "The Semantic Web: Crafting infrastructure for agency", 2006)

"a criterion of performance of a service or element, such as the worst-case execution time for an operation." (Bruce P Douglass, "Real-Time Agility: The Harmony/ESW Method for Real-Time and Embedded Systems Development", 2009)

"The QoS describes the non-functional aspects of a service such as performance." (Martin Oberhofer et al, "The Art of Enterprise Information Architecture", 2010)

"QoS (Quality of Service) Networking technology that enables network administrators to manage bandwidth and give priority to desired types of application traffic as it traverses the network." (Mark Rhodes-Ousley, "Information Security: The Complete Reference" 2nd Ed., 2013)

"A negotiated contract between a user and a network provider that renders some degree of reliable capacity in the shared network." (Gartner)

"Quality of service (QoS) is the description or measurement of the overall performance of a service, especially in terms of the user’s experience. Typically it is used in reference to telephony or computer networks, or to online and cloud-hosted services." (Barracuda) [source]

"The measurable end-to-end performance properties of a network service, which can be guaranteed in advance by a Service Level Agreement between a user and a service provider, so as to satisfy specific customer application requirements. Note: These properties may include throughput (bandwidth), transit delay (latency), error rates, priority, security, packet loss, packet jitter, etc." (CNSSI 4009-2015)

13 July 2019

IT: Extranet (Definitions)

"A secure Internet site available only to a company’s internal staff and approved third-party partners. Extranets are flourishing in B2B environments where suppliers can have ready access to updated information from their business customers, and vice versa." (Evan Levy & Jill Dyché, "Customer Data Integration", 2006)

"Semi-public TCP/IP network used by several collaborating partners." (Martin J Eppler, "Managing Information Quality 2nd Ed.", 2006)

"Enterprise network using Web technologies for collaboration of internal users and selected external business partners." (Paulraj Ponniah, "Data Warehousing Fundamentals for IT Professionals", 2010)

"An internal network or intranet opened to selected business partners. Suppliers, distributors, and other authorized users can connect to a company’s network over the Internet or through private networks." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"Private, company-owned network that uses IP technology to securely share part of a business's information or operations with suppliers, vendors, partners, customers, or other businesses." (Linda Volonino & Efraim Turban, "Information Technology for Management 8th Ed", 2011)

"A network that is outside the control of the company. Extranets are usually connections to outside companies, service providers, customers, and business partners." (Mark Rhodes-Ousley, "Information Security: The Complete Reference" 2nd Ed., 2013)

"A special network set up by a business for its customers, staff, and business partners to access from outside the office network; may be used to share marketing assets and other non-sensitive items." (Faithe Wempen, "Computing Fundamentals: Introduction to Computers", 2015)

"An extension of the corporate intranet over the Internet so that vendors, business partners, customers, and others can have access to the intranet." (James R Kalyvas & Michael R Overly, "Big Data: A Businessand Legal Guide", 2015)

12 July 2019

IT: Intranet (Definitions)

"This is a network technology similar to the Internet that has been constructed by a company for its own benefit. Usually access to a company's intranet is limited to its employees, customers, and vendors." (Dale Furtwengler, "Ten Minute Guide to Performance Appraisals", 2000)

"A private network that uses web technology to distribute information. Usually used to make information available inside a company among employees." (Andy Walker, "Absolute Beginner’s Guide To: Security, Spam, Spyware & Viruses", 2005)

"An organization’s internal system of connected networks built on Internet-standard protocols and usually connected to the Internet via a firewall." (Sharon Allen & Evan Terry, "Beginning Relational Data Modeling 2nd Ed.", 2005)

"Internal company networks designed to provide a secure forum for sharing information, often in a web-browser type interface." (Martin J Eppler, "Managing Information Quality 2nd Ed.", 2006)

"The enterprise network using Web technologies for collaboration of internal users." (Paulraj Ponniah, "Data Warehousing Fundamentals for IT Professionals", 2010)

"A subset of the Internet used internally by an organization. Unlike the larger Internet, intranets are private and accessible only from within the organization. The use of Internet technologies over a private network." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"Network designed to serve the internal informational needs of a company, using Internet tools." (Linda Volonino & Efraim Turban, "Information Technology for Management" 8th Ed., 2011)

"a private web site available only to those within a company or organization." (Bill Holtsnider & Brian D Jaffe, "IT Manager's Handbook" 3rd Ed., 2012)

"A computer network designed to be used within a business or company. An intranet is so named because it uses much of the same technology as the Internet. Web browsers, email, newsgroups, HTML documents, and websites are all found on intranets.  In addition, the method for transmitting information on these networks is TCP/IP (Transmission Control Protocol/Internet Protocol). See Internet." (James R Kalyvas & Michael R Overly, "Big Data: A Businessand Legal Guide", 2015)

"A special network that only staff within the company network can access. For security reasons an intranet can only be accessed onsite and not remotely." (Faithe Wempen, "Computing Fundamentals: Introduction to Computers", 2015)

 "A trusted digital source of corporate communication and content designed to educate and empower employees and improve their workplace experiences." (Forrester)

08 July 2019

IT: Grid Computing (Definitions)

"A grid is an architecture for distributed computing and resource sharing. A grid system is composed of a heterogeneous collection of resources connected by local-area and/or wide-area networks (often the Internet). These individual resources are general and include compute servers, storage, application servers, information services, or even scientific instruments. Grids are often implemented in terms of Web services and integrated middleware components that provide a consistent interface to the grid. A grid is different from a cluster in that the resources in a grid are not controlled through a single point of administration; the grid middleware manages the system so control of resources on the grid and the policies governing use of the resources remain with the resource owners." (Beverly A Sanders, "Patterns for Parallel Programming", 2004)

"Clusters of cheap computers, perhaps distributed on a global basis, connected using even something as loosely connected as the Internet." (Gavin Powell, "Beginning Database Design", 2006)

"A step beyond distributed processing. Grid computing involves large numbers of networked computers, often geographically dispersed and possibly of different types and capabilities, that are harnessed together to solve a common problem." (Judith Hurwitz et al, "Service Oriented Architecture For Dummies" 2nd Ed., 2009)

"A web-based operation allowing companies to share computing resources on demand." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"The use of networks to harness the unused processing cycles of all computers in a given network to create powerful computing capabilities." (Linda Volonino & Efraim Turban, "Information Technology for Management" 8th Ed., 2011)

"A distributed set of computers that can be allocated dynamically and accessed remotely. A grid is distinguished from a cloud in that a grid may be supported by multiple organizations and is usually more heterogeneous and physically distributed." (Michael McCool et al, "Structured Parallel Programming", 2012)

"the use of multiple computing resources to leverage combined processing power. Usually associated with scientific applications." (Bill Holtsnider & Brian D Jaffe, "IT Manager's Handbook" 3rd Ed., 2012)

"A step beyond distributed processing, involving large numbers of networked computers (often geographically dispersed and possibly of different types and capabilities) that are harnessed to solve a common problem. A grid computing model can be used instead of virtualization in situations that require real time where latency is unacceptable." (Marcia Kaufman et al, "Big Data For Dummies", 2013)

"A named set of interconnected replication servers for propagating commands from an authorized server to the rest of the servers in the set." (IBM, "Informix Servers 12.1", 2014)

"A type of computing in which large computing tasks are distributed among multiple computers on a network." (Jim Davis & Aiman Zeid, "Business Transformation: A Roadmap for Maximizing Organizational Insights", 2014)

"Connecting many computer system locations, often via the cloud, working together for the same purpose." (Jason Williamson, "Getting a Big Data Job For Dummies", 2015)

"A computer network that enables distributed resource management and on-demand services." (Forrester)

"A computing architecture that coordinates large numbers of servers and storage to act as a single large computer." (Oracle, "Oracle Database Concepts")

"connecting different computer systems from various location, often via the cloud, to reach a common goal." (Analytics Insight)

06 July 2019

IT: Latency (Definitions)

"The fixed cost of servicing a request, such as sending a message or accessing information from a disk. In parallel computing, the term most often is used to refer to the time it takes to send an empty message over the communication medium, from the time the send routine is called to the time the empty message is received by the recipient. Programs that generate large numbers of small messages are sensitive to the latency and are called latency-bound programs." (Beverly A Sanders, "Patterns for Parallel Programming", 2004)

"The amount of time it takes a system to deliver data in response to a request. For mass storage devices, it is the time it takes to place the read or write heads over the desired spot on the media. In networks, it is a function of the electrical and software properties of the network connection." (Tom Petrocelli, "Data Protection and Information Lifecycle Management", 2005)

"The time delay it takes for a network packet to travel from one destination to another." (John Goodson & Robert A Steward, "The Data Access Handbook", 2009)

"The time it takes for a system to respond to an input." (W Roy Schulte & K Chandy, "Event Processing: Designing IT Systems for Agile Companies", 2009)

"A period of time that the computer must wait while a disk drive is positioning itself to read a particular block of data." (Rod Stephens, "Start Here!™ Fundamentals of Microsoft® .NET Programming", 2011)

"The measure of time between two events, such as the initiation and completion of an event, or the read on one system and the write to another system." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"The time period from start to completion of a unit of work." (Max Domeika, "Software Development for Embedded Multi-core Systems", 2011)

"The time it takes to complete a task - that is, the time between when the task begins and when it ends. Latency has units of time. The scale can be anywhere from nanoseconds to days. Lower latency is better in general." (Michael McCool et al, "Structured Parallel Programming", 2012)

"The amount of time lag before a service executes in an environment. Some applications require less latency and need to respond in near real time, whereas other applications are less time-sensitive." (Marcia Kaufman et al, "Big Data For Dummies", 2013)

"A delay. Can apply to the sending, processing, transmission, storage, or receiving of information." (Mike Harwood, "Internet Security: How to Defend Against Attackers on the Web" 2nd Ed., 2015)

"A period of waiting for another component to deliver data needed to proceed." (Faithe Wempen, "Computing Fundamentals: Introduction to Computers", 2015)

"The time it takes for the specified sector to be in position under the read/write head" (Nell Dale & John Lewis, "Computer Science Illuminated" 6th Ed., 2015)

"The delay between when an action such as transmitting data is taken and when it has an effect." (O Sami Saydjari, "Engineering Trustworthy Systems: Get Cybersecurity Design Right the First Time", 2018)

05 July 2019

IT: Gateway (Definitions)

"A network software product that allows computers or networks running dissimilar protocols to communicate, providing transparent access to a variety of foreign database management systems (DBMSs). A gateway moves specific database connectivity and conversion processing from individual client computers to a single server computer. Communication is enabled by translating up one protocol stack and down the other. Gateways usually operate at the session layer." (Microsoft Corporation, "SQL Server 7.0 System Administration Training Kit", 1999)

"Connectivity software that allows two or more computer systems with different network architectures to communicate." (Sybase, "Glossary", 2005)

"A generic term referring to a computer system that routes data or merges two dissimilar services together." (Paulraj Ponniah, "Data Warehousing Fundamentals for IT Professionals", 2010)

"A software product that allows SQL-based applications to access relational and non-relational data sources." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"An entrance point that allows users to connect from one network to another." (Linda Volonino & Efraim Turban, "Information Technology for Management" 8th Ed., 2011)

[database gateway:] "Software required to allow clients to access data stored on database servers over a network connection." (Craig S Mullins, "Database Administration: The Complete Guide to DBA Practices and Procedures" 2nd Ed., 2012)

"A connector box that enables you to connect two dissimilar networks." (Faithe Wempen, "Computing Fundamentals: Introduction to Computers", 2015)

"A node that handles communication between its LAN and other networks" (Nell Dale & John Lewis, "Computer Science Illuminated, 6th Ed.", 2015)

"A system or device that connects two unlike environments or systems. The gateway is usually required to translate between different types of applications or protocols." (Shon Harris & Fernando Maymi, "CISSP All-in-One Exam Guide" 8th Ed., 2018)

"An application that acts as an intermediary for clients and servers that cannot communicate directly. Acting as both client and server, a gateway application passes requests from a client to a server and returns results from the server to the client." (Sybase, "Open Server Server-Library/C Reference Manual", 2019)

24 December 2018

Data Science: Randomness (Just the Quotes)

"If the number of experiments be very large, we may have precise information as to the value of the mean, but if our sample be small, we have two sources of uncertainty: (I) owing to the 'error of random sampling' the mean of our series of experiments deviates more or less widely from the mean of the population, and (2) the sample is not sufficiently large to determine what is the law of distribution of individuals." William S Gosset, "The Probable Error of a Mean", Biometrika, 1908)

"The most important application of the theory of probability is to what we may call 'chance-like' or 'random' events, or occurrences. These seem to be characterized by a peculiar kind of incalculability which makes one disposed to believe - after many unsuccessful attempts - that all known rational methods of prediction must fail in their case. We have, as it were, the feeling that not a scientist but only a prophet could predict them. And yet, it is just this incalculability that makes us conclude that the calculus of probability can be applied to these events." (Karl R Popper, "The Logic of Scientific Discovery", 1934)

"The definition of random in terms of a physical operation is notoriously without effect on the mathematical operations of statistical theory because so far as these mathematical operations are concerned random is purely and simply an undefined term." (Walter A Shewhart & William E Deming, "Statistical Method from the Viewpoint of Quality Control", 1939)

"The first attempts to consider the behavior of so-called 'random neural nets' in a systematic way have led to a series of problems concerned with relations between the 'structure' and the 'function' of such nets. The 'structure' of a random net is not a clearly defined topological manifold such as could be used to describe a circuit with explicitly given connections. In a random neural net, one does not speak of 'this' neuron synapsing on 'that' one, but rather in terms of tendencies and probabilities associated with points or regions in the net." (Anatol Rapoport, "Cycle distributions in random nets", The Bulletin of Mathematical Biophysics 10(3), 1948)

"Time itself will come to an end. For entropy points the direction of time. Entropy is the measure of randomness. When all system and order in the universe have vanished, when randomness is at its maximum, and entropy cannot be increased, when there is no longer any sequence of cause and effect, in short when the universe has run down, there will be no direction to time - there will be no time." (Lincoln Barnett, "The Universe and Dr. Einstein", 1948)

"We must emphasize that such terms as 'select at random', 'choose at random', and the like, always mean that some mechanical device, such as coins, cards, dice, or tables of random numbers, is used." (Frederick Mosteller et al, "Principles of Sampling", Journal of the American Statistical Association Vol. 49 (265), 1954)

"[…] random numbers should not be generated with a method chosen at random. Some theory should be used." (Donald E Knuth, "The Art of Computer Programming" Vol. II, 1968)

"It appears to be a quite general principle that, whenever there is a randomized way of doing something, then there is a nonrandomized way that delivers better performance but requires more thought." (Edwin T Jaynes, "Probability Theory: The Logic of Science", 1979)

"From a purely operational point of viewpoint […] the concept of randomness is so elusive as to cease to be viable." (Mark Kac, 1983)

"Randomness is a difficult notion for people to accept. When events come in clusters and streaks, people look for explanations and patterns. They refuse to believe that such patterns - which frequently occur in random data - could equally well be derived from tossing a coin. So it is in the stock market as well." (Burton G Malkiel, "A Random Walk Down Wall Street", 1989)

"The term chaos is used in a specific sense where it is an inherently random pattern of behaviour generated by fixed inputs into deterministic (that is fixed) rules (relationships). The rules take the form of non-linear feedback loops. Although the specific path followed by the behaviour so generated is random and hence unpredictable in the long-term, it always has an underlying pattern to it, a 'hidden' pattern, a global pattern or rhythm. That pattern is self-similarity, that is a constant degree of variation, consistent variability, regular irregularity, or more precisely, a constant fractal dimension. Chaos is therefore order (a pattern) within disorder (random behaviour)." (Ralph D Stacey, "The Chaos Frontier: Creative Strategic Control for Business", 1991)

"Chaos demonstrates that deterministic causes can have random effects […] There's a similar surprise regarding symmetry: symmetric causes can have asymmetric effects. […] This paradox, that symmetry can get lost between cause and effect, is called symmetry-breaking. […] From the smallest scales to the largest, many of nature's patterns are a result of broken symmetry; […]" (Ian Stewart & Martin Golubitsky, "Fearful Symmetry: Is God a Geometer?", 1992)

"Probability theory is an ideal tool for formalizing uncertainty in situations where class frequencies are known or where evidence is based on outcomes of a sufficiently long series of independent random experiments. Possibility theory, on the other hand, is ideal for formalizing incomplete information expressed in terms of fuzzy propositions." (George Klir, "Fuzzy sets and fuzzy logic", 1995)

"We use mathematics and statistics to describe the diverse realms of randomness. From these descriptions, we attempt to glean insights into the workings of chance and to search for hidden causes. With such tools in hand, we seek patterns and relationships and propose predictions that help us make sense of the world."  (Ivars Peterson, "The Jungles of Randomness: A Mathematical Safari", 1998)

"Events may appear to us to be random, but this could be attributed to human ignorance about the details of the processes involved." (Brain S Everitt, "Chance Rules", 1999)

"The self-similarity of fractal structures implies that there is some redundancy because of the repetition of details at all scales. Even though some of these structures may appear to teeter on the edge of randomness, they actually represent complex systems at the interface of order and disorder."  (Edward Beltrami, "What is Random?: Chaos and Order in Mathematics and Life", 1999)

"Most physical systems, particularly those complex ones, are extremely difficult to model by an accurate and precise mathematical formula or equation due to the complexity of the system structure, nonlinearity, uncertainty, randomness, etc. Therefore, approximate modeling is often necessary and practical in real-world applications. Intuitively, approximate modeling is always possible. However, the key questions are what kind of approximation is good, where the sense of 'goodness' has to be first defined, of course, and how to formulate such a good approximation in modeling a system such that it is mathematically rigorous and can produce satisfactory results in both theory and applications." (Guanrong Chen & Trung Tat Pham, "Introduction to Fuzzy Sets, Fuzzy Logic, and Fuzzy Control Systems", 2001)

"[…] we would like to observe that the butterfly effect lies at the root of many events which we call random. The final result of throwing a dice depends on the position of the hand throwing it, on the air resistance, on the base that the die falls on, and on many other factors. The result appears random because we are not able to take into account all of these factors with sufficient accuracy. Even the tiniest bump on the table and the most imperceptible move of the wrist affect the position in which the die finally lands. It would be reasonable to assume that chaos lies at the root of all random phenomena." (Iwo Białynicki-Birula & Iwona Białynicka-Birula, "Modeling Reality: How Computers Mirror Life", 2004)

"Chance is just as real as causation; both are modes of becoming. The way to model a random process is to enrich the mathematical theory of probability with a model of a random mechanism. In the sciences, probabilities are never made up or 'elicited' by observing the choices people make, or the bets they are willing to place. The reason is that, in science and technology, interpreted probability exactifies objective chance, not gut feeling or intuition. No randomness, no probability." (Mario Bunge, "Chasing Reality: Strife over Realism", 2006)

"Complexity arises when emergent system-level phenomena are characterized by patterns in time or a given state space that have neither too much nor too little form. Neither in stasis nor changing randomly, these emergent phenomena are interesting, due to the coupling of individual and global behaviours as well as the difficulties they pose for prediction. Broad patterns of system behaviour may be predictable, but the system's specific path through a space of possible states is not." (Steve Maguire et al, "Complexity Science and Organization Studies", 2006)

"A Black Swan is a highly improbable event with three principal characteristics: It is unpredictable; it carries a massive impact; and, after the fact, we concoct an explanation that makes it appear less random, and more predictable, than it was. […] The Black Swan idea is based on the structure of randomness in empirical reality. [...] the Black Swan is what we leave out of simplification." (Nassim N Taleb, "The Black Swan", 2007)

"[myth:] Random errors can always be determined by repeating measurements under identical conditions. […] this statement is true only for time-related random errors ." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)

"To fulfill the requirements of the theory underlying uncertainties, variables with random uncertainties must be independent of each other and identically distributed. In the limiting case of an infinite number of such variables, these are called normally distributed. However, one usually speaks of normally distributed variables even if their number is finite." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)

"While in theory randomness is an intrinsic property, in practice, randomness is incomplete information." (Nassim N Taleb, "The Black Swan", 2007)

"Regression toward the mean. That is, in any series of random events an extraordinary event is most likely to be followed, due purely to chance, by a more ordinary one." (Leonard Mlodinow, "The Drunkard’s Walk: How Randomness Rules Our Lives", 2008)

"The key to understanding randomness and all of mathematics is not being able to intuit the answer to every problem immediately but merely having the tools to figure out the answer." (Leonard Mlodinow,"The Drunkard’s Walk: How Randomness Rules Our Lives", 2008)

"Data always vary randomly because the object of our inquiries, nature itself, is also random. We can analyze and predict events in nature with an increasing amount of precision and accuracy, thanks to improvements in our techniques and instruments, but a certain amount of random variation, which gives rise to uncertainty, is inevitable." (Alberto Cairo, "The Functional Art", 2011)

"Randomness might be defined in terms of order - its absence, that is. […] Everything we care about lies somewhere in the middle, where pattern and randomness interlace." (James Gleick, "The Information: A History, a Theory, a Flood", 2011)

"The storytelling mind is allergic to uncertainty, randomness, and coincidence. It is addicted to meaning. If the storytelling mind cannot find meaningful patterns in the world, it will try to impose them. In short, the storytelling mind is a factory that churns out true stories when it can, but will manufacture lies when it can't." (Jonathan Gottschall, "The Storytelling Animal: How Stories Make Us Human", 2012)

"When some systems are stuck in a dangerous impasse, randomness and only randomness can unlock them and set them free." (Nassim N Taleb, "Antifragile: Things That Gain from Disorder", 2012)

"Although cascading failures may appear random and unpredictable, they follow reproducible laws that can be quantified and even predicted using the tools of network science. First, to avoid damaging cascades, we must understand the structure of the network on which the cascade propagates. Second, we must be able to model the dynamical processes taking place on these networks, like the flow of electricity. Finally, we need to uncover how the interplay between the network structure and dynamics affects the robustness of the whole system." (Albert-László Barabási, "Network Science", 2016)

"Too little attention is given to the need for statistical control, or to put it more pertinently, since statistical control (randomness) is so rarely found, too little attention is given to the interpretation of data that arise from conditions not in statistical control." (William E Deming)

More quotes on "Randomness" at the-web-of-knowledge.blogspot.com

13 December 2018

Data Science: Bayesian Networks (Just the Quotes)

"The best way to convey to the experimenter what the data tell him about theta is to show him a picture of the posterior distribution." (George E P Box & George C Tiao, "Bayesian Inference in Statistical Analysis", 1973)

"In the design of experiments, one has to use some informal prior knowledge. How does one construct blocks in a block design problem for instance? It is stupid to think that use is not made of a prior. But knowing that this prior is utterly casual, it seems ludicrous to go through a lot of integration, etc., to obtain 'exact' posterior probabilities resulting from this prior. So, I believe the situation with respect to Bayesian inference and with respect to inference, in general, has not made progress. Well, Bayesian statistics has led to a great deal of theoretical research. But I don't see any real utilizations in applications, you know. Now no one, as far as I know, has examined the question of whether the inferences that are obtained are, in fact, realized in the predictions that they are used to make." (Oscar Kempthorne, "A conversation with Oscar Kempthorne", Statistical Science, 1995)

"Bayesian methods are complicated enough, that giving researchers user-friendly software could be like handing a loaded gun to a toddler; if the data is crap, you won't get anything out of it regardless of your political bent." (Brad Carlin, "Bayes offers a new way to make sense of numbers", Science, 1999)

"Bayesian inference is a controversial approach because it inherently embraces a subjective notion of probability. In general, Bayesian methods provide no guarantees on long run performance." (Larry A Wasserman, "All of Statistics: A concise course in statistical inference", 2004)

"Bayesian inference is appealing when prior information is available since Bayes’ theorem is a natural way to combine prior information with data. Some people find Bayesian inference psychologically appealing because it allows us to make probability statements about parameters. […] In parametric models, with large samples, Bayesian and frequentist methods give approximately the same inferences. In general, they need not agree." (Larry A Wasserman, "All of Statistics: A concise course in statistical inference", 2004)

"The Bayesian approach is based on the following postulates: (B1) Probability describes degree of belief, not limiting frequency. As such, we can make probability statements about lots of things, not just data which are subject to random variation. […] (B2) We can make probability statements about parameters, even though they are fixed constants. (B3) We make inferences about a parameter θ by producing a probability distribution for θ. Inferences, such as point estimates and interval estimates, may then be extracted from this distribution." (Larry A Wasserman, "All of Statistics: A concise course in statistical inference", 2004)

"The important thing is to understand that frequentist and Bayesian methods are answering different questions. To combine prior beliefs with data in a principled way, use Bayesian inference. To construct procedures with guaranteed long run performance, such as confidence intervals, use frequentist methods. Generally, Bayesian methods run into problems when the parameter space is high dimensional." (Larry A Wasserman, "All of Statistics: A concise course in statistical inference", 2004) 

"Bayesian networks can be constructed by hand or learned from data. Learning both the topology of a Bayesian network and the parameters in the CPTs in the network is a difficult computational task. One of the things that makes learning the structure of a Bayesian network so difficult is that it is possible to define several different Bayesian networks as representations for the same full joint probability distribution." (John D Kelleher et al, "Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, worked examples, and case studies", 2015) 

"Bayesian networks provide a more flexible representation for encoding the conditional independence assumptions between the features in a domain. Ideally, the topology of a network should reflect the causal relationships between the entities in a domain. Properly constructed Bayesian networks are relatively powerful models that can capture the interactions between descriptive features in determining a prediction." (John D Kelleher et al, "Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, worked examples, and case studies", 2015) 

"Bayesian networks use a graph-based representation to encode the structural relationships - such as direct influence and conditional independence - between subsets of features in a domain. Consequently, a Bayesian network representation is generally more compact than a full joint distribution (because it can encode conditional independence relationships), yet it is not forced to assert a global conditional independence between all descriptive features. As such, Bayesian network models are an intermediary between full joint distributions and naive Bayes models and offer a useful compromise between model compactness and predictive accuracy." (John D Kelleher et al, "Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, worked examples, and case studies", 2015)

"Bayesian networks inhabit a world where all questions are reducible to probabilities, or (in the terminology of this chapter) degrees of association between variables; they could not ascend to the second or third rungs of the Ladder of Causation. Fortunately, they required only two slight twists to climb to the top." (Judea Pearl & Dana Mackenzie, "The Book of Why: The new science of cause and effect", 2018)

"The main differences between Bayesian networks and causal diagrams lie in how they are constructed and the uses to which they are put. A Bayesian network is literally nothing more than a compact representation of a huge probability table. The arrows mean only that the probabilities of child nodes are related to the values of parent nodes by a certain formula (the conditional probability tables) and that this relation is sufficient. That is, knowing additional ancestors of the child will not change the formula. Likewise, a missing arrow between any two nodes means that they are independent, once we know the values of their parents. [...] If, however, the same diagram has been constructed as a causal diagram, then both the thinking that goes into the construction and the interpretation of the final diagram change." (Judea Pearl & Dana Mackenzie, "The Book of Why: The new science of cause and effect", 2018)

"The transparency of Bayesian networks distinguishes them from most other approaches to machine learning, which tend to produce inscrutable 'black boxes'. In a Bayesian network you can follow every step and understand how and why each piece of evidence changed the network’s beliefs." (Judea Pearl & Dana Mackenzie, "The Book of Why: The new science of cause and effect", 2018)

"With Bayesian networks, we had taught machines to think in shades of gray, and this was an important step toward humanlike thinking. But we still couldn’t teach machines to understand causes and effects. [...] By design, in a Bayesian network, information flows in both directions, causal and diagnostic: smoke increases the likelihood of fire, and fire increases the likelihood of smoke. In fact, a Bayesian network can’t even tell what the 'causal direction' is." (Judea Pearl & Dana Mackenzie, "The Book of Why: The new science of cause and effect", 2018)

12 December 2018

Data Science: Neural Networks (Just the Quotes)

"The terms 'black box' and 'white box' are convenient and figurative expressions of not very well determined usage. I shall understand by a black box a piece of apparatus, such as four-terminal networks with two input and two output terminals, which performs a definite operation on the present and past of the input potential, but for which we do not necessarily have any information of the structure by which this operation is performed. On the other hand, a white box will be similar network in which we have built in the relation between input and output potentials in accordance with a definite structural plan for securing a previously determined input-output relation." (Norbert Wiener, "Cybernetics: Or Control and Communication in the Animal and the Machine", 1948)

"A neural network is a massively parallel distributed processor that has a natural propensity for storing experiential knowledge and making it available for use. It resembles the brain in two respects: 1. Knowledge is acquired by the network through a learning process. 2. Interneuron connection strengths known as synaptic weights are used to store the knowledge." (Igor Aleksander, "An introduction to neural computing", 1990) 

"Neural Computing is the study of networks of adaptable nodes which through a process of learning from task examples, store experiential knowledge and make it available for use." (Igor Aleksander, "An introduction to neural computing", 1990)

"A neural network is characterized by (1) its pattern of connections between the neurons (called its architecture), (2) its method of determining the weights on the connections (called its training, or learning, algorithm), and (3) its activation function." (Laurene Fausett, "Fundamentals of Neural Networks", 1994)

"An artificial neural network is an information-processing system that has certain performance characteristics in common with biological neural networks. Artificial neural networks have been developed as generalizations of mathematical models of human cognition or neural biology, based on the assumptions that: (1) Information processing occurs at many simple elements called neurons. (2) Signals are passed between neurons over connection links. (3) Each connection link has an associated weight, which, in a typical neural net, multiplies the signal transmitted. (4) Each neuron applies an activation function (usually nonlinear) to its net input (sum of weighted input signals) to determine its output signal." (Laurene Fausett, "Fundamentals of Neural Networks", 1994)

"An artificial neural network (or simply a neural network) is a biologically inspired computational model that consists of processing elements (neurons) and connections between them, as well as of training and recall algorithms." (Nikola K Kasabov, "Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering", 1996)

"Many of the basic functions performed by neural networks are mirrored by human abilities. These include making distinctions between items (classification), dividing similar things into groups (clustering), associating two or more things (associative memory), learning to predict outcomes based on examples (modeling), being able to predict into the future (time-series forecasting), and finally juggling multiple goals and coming up with a good- enough solution (constraint satisfaction)." (Joseph P Bigus,"Data Mining with Neural Networks: Solving business problems from application development to decision support", 1996)

"More than just a new computing architecture, neural networks offer a completely different paradigm for solving problems with computers. […] The process of learning in neural networks is to use feedback to adjust internal connections, which in turn affect the output or answer produced. The neural processing element combines all of the inputs to it and produces an output, which is essentially a measure of the match between the input pattern and its connection weights. When hundreds of these neural processors are combined, we have the ability to solve difficult problems such as credit scoring." (Joseph P Bigus,"Data Mining with Neural Networks: Solving business problems from application development to decision support", 1996)

"Neural networks are a computing model grounded on the ability to recognize patterns in data. As a consequence, they have many applications to data mining and analysis." (Joseph P Bigus,"Data Mining with Neural Networks: Solving business problems from application development to decision support", 1996)

"Neural networks are a computing technology whose fundamental purpose is to recognize patterns in data. Based on a computing model similar to the underlying structure of the human brain, neural networks share the brains ability to learn or adapt in response to external inputs. When exposed to a stream of training data, neural networks can discover previously unknown relationships and learn complex nonlinear mappings in the data. Neural networks provide some fundamental, new capabilities for processing business data. However, tapping these new neural network data mining functions requires a completely different application development process from traditional programming." (Joseph P Bigus, "Data Mining with Neural Networks: Solving business problems from application development to decision support", 1996)

"The most familiar example of swarm intelligence is the human brain. Memory, perception and thought all arise out of the nett actions of billions of individual neurons. As we saw earlier, artificial neural networks (ANNs) try to mimic this idea. Signals from the outside world enter via an input layer of neurons. These pass the signal through a series of hidden layers, until the result emerges from an output layer. Each neuron modifies the signal in some simple way. It might, for instance, convert the inputs by plugging them into a polynomial, or some other simple function. Also, the network can learn by modifying the strength of the connections between neurons in different layers." (David G Green, "The Serendipity Machine: A voyage of discovery through the unexpected world of computers", 2004)

"A neural network is a particular kind of computer program, originally developed to try to mimic the way the human brain works. It is essentially a computer simulation of a complex circuit through which electric current flows." (Keith J Devlin & Gary Lorden, "The Numbers behind NUMB3RS: Solving crime with mathematics", 2007)

 "Neural networks are a popular model for learning, in part because of their basic similarity to neural assemblies in the human brain. They capture many useful effects, such as learning from complex data, robustness to noise or damage, and variations in the data set. " (Peter C R Lane, Order Out of Chaos: Order in Neural Networks, 2007)

"A network of many simple processors ('units' or 'neurons') that imitates a biological neural network. The units are connected by unidirectional communication channels, which carry numeric data. Neural networks can be trained to find nonlinear relationships in data, and are used in various applications such as robotics, speech recognition, signal processing, medical diagnosis, or power systems." (Adnan Khashman et al, "Voltage Instability Detection Using Neural Networks", 2009)

"An artificial neural network, often just called a 'neural network' (NN), is an interconnected group of artificial neurons that uses a mathematical model or computational model for information processing based on a connectionist approach to computation. Knowledge is acquired by the network from its environment through a learning process, and interneuron connection strengths (synaptic weighs) are used to store the acquired knowledge." (Larbi Esmahi et al, "Adaptive Neuro-Fuzzy Systems", 2009)

"Generally, these programs fall within the techniques of reinforcement learning and the majority use an algorithm of temporal difference learning. In essence, this computer learning paradigm approximates the future state of the system as a function of the present state. To reach that future state, it uses a neural network that changes the weight of its parameters as it learns." (Diego Rasskin-Gutman, "Chess Metaphors: Artificial Intelligence and the Human Mind", 2009)

"The simplest basic architecture of an artificial neural network is composed of three layers of neurons - input, output, and intermediary (historically called perceptron). When the input layer is stimulated, each node responds in a particular way by sending information to the intermediary level nodes, which in turn distribute it to the output layer nodes and thereby generate a response. The key to artificial neural networks is in the ways that the nodes are connected and how each node reacts to the stimuli coming from the nodes it is connected to. Just as with the architecture of the brain, the nodes allow information to pass only if a specific stimulus threshold is passed. This threshold is governed by a mathematical equation that can take different forms. The response depends on the sum of the stimuli coming from the input node connections and is 'all or nothing'." (Diego Rasskin-Gutman, "Chess Metaphors: Artificial Intelligence and the Human Mind", 2009)

"Neural networks can model very complex patterns and decision boundaries in the data and, as such, are very powerful. In fact, they are so powerful that they can even model the noise in the training data, which is something that definitely should be avoided. One way to avoid this overfitting is by using a validation set in a similar way as with decision trees.[...] Another scheme to prevent a neural network from overfitting is weight regularization, whereby the idea is to keep the weights small in absolute sense because otherwise they may be fitting the noise in the data. This is then implemented by adding a weight size term (e.g., Euclidean norm) to the objective function of the neural network." (Bart Baesens, "Analytics in a Big Data World: The Essential Guide to Data Science and Its Applications", 2014)

"A neural network consists of a set of neurons that are connected together. A neuron takes a set of numeric values as input and maps them to a single output value. At its core, a neuron is simply a multi-input linear-regression function. The only significant difference between the two is that in a neuron the output of the multi-input linear-regression function is passed through another function that is called the activation function." (John D Kelleher & Brendan Tierney, "Data Science", 2018)

"Just as they did thirty years ago, machine learning programs (including those with deep neural networks) operate almost entirely in an associational mode. They are driven by a stream of observations to which they attempt to fit a function, in much the same way that a statistician tries to fit a line to a collection of points. Deep neural networks have added many more layers to the complexity of the fitted function, but raw data still drives the fitting process. They continue to improve in accuracy as more data are fitted, but they do not benefit from the 'super-evolutionary speedup'."  (Judea Pearl & Dana Mackenzie, "The Book of Why: The new science of cause and effect", 2018)

"A neural-network algorithm is simply a statistical procedure for classifying inputs (such as numbers, words, pixels, or sound waves) so that these data can mapped into outputs. The process of training a neural-network model is advertised as machine learning, suggesting that neural networks function like the human mind, but neural networks estimate coefficients like other data-mining algorithms, by finding the values for which the model’s predictions are closest to the observed values, with no consideration of what is being modeled or whether the coefficients are sensible." (Gary Smith & Jay Cordes, "The 9 Pitfalls of Data Science", 2019)

"Deep neural networks have an input layer and an output layer. In between, are “hidden layers” that process the input data by adjusting various weights in order to make the output correspond closely to what is being predicted. [...] The mysterious part is not the fancy words, but that no one truly understands how the pattern recognition inside those hidden layers works. That’s why they’re called 'hidden'. They are an inscrutable black box - which is okay if you believe that computers are smarter than humans, but troubling otherwise." (Gary Smith & Jay Cordes, "The 9 Pitfalls of Data Science", 2019)

"Neural-network algorithms do not know what they are manipulating, do not understand their results, and have no way of knowing whether the patterns they uncover are meaningful or coincidental. Nor do the programmers who write the code know exactly how they work and whether the results should be trusted. Deep neural networks are also fragile, meaning that they are sensitive to small changes and can be fooled easily." (Gary Smith & Jay Cordes, "The 9 Pitfalls of Data Science", 2019)

"The label neural networks suggests that these algorithms replicate the neural networks in human brains that connect electrically excitable cells called neurons. They don’t. We have barely scratched the surface in trying to figure out how neurons receive, store, and process information, so we cannot conceivably mimic them with computers." (Gary Smith & Jay Cordes, "The 9 Pitfalls of Data Science", 2019)

More quotes on "Neural Networks" at the-web-of-knowledge.blogspot.com.

22 November 2018

Data Science: On Signals (Just the Quotes)

"If statistical graphics, although born just yesterday, extends its reach every day, it is because it replaces long tables of numbers and it allows one not only to embrace at glance the series of phenomena, but also to signal the correspondences or anomalies, to find the causes, to identify the laws." (Émile Cheysson, cca. 1877)

"The term closed loop-learning process refers to the idea that one learns by determining what s desired and comparing what is actually taking place as measured at the process and feedback for comparison. The difference between what is desired and what is taking place provides an error indication which is used to develop a signal to the process being controlled." (Harold Chestnut, 1984) 

"Complexity is not an objective factor but a subjective one. Supersignals reduce complexity, collapsing a number of features into one. Consequently, complexity must be understood in terms of a specific individual and his or her supply of supersignals. We learn supersignals from experience, and our supply can differ greatly from another individual's. Therefore there can be no objective measure of complexity." (Dietrich Dorner, "The Logic of Failure: Recognizing and Avoiding Error in Complex Situations", 1989)

"An artificial neural network is an information-processing system that has certain performance characteristics in common with biological neural networks. Artificial neural networks have been developed as generalizations of mathematical models of human cognition or neural biology, based on the assumptions that: (1) Information processing occurs at many simple elements called neurons. (2) Signals are passed between neurons over connection links. (3) Each connection link has an associated weight, which, in a typical neural net, multiplies the signal transmitted. (4) Each neuron applies an activation function (usually nonlinear) to its net input (sum of weighted input signals) to determine its output signal." (Laurene Fausett, "Fundamentals of Neural Networks", 1994)

"Data are generally collected as a basis for action. However, unless potential signals are separated from probable noise, the actions taken may be totally inconsistent with the data. Thus, the proper use of data requires that you have simple and effective methods of analysis which will properly separate potential signals from probable noise." (Donald J Wheeler, "Understanding Variation: The Key to Managing Chaos" 2nd Ed., 2000)

"No matter what the data, and no matter how the values are arranged and presented, you must always use some method of analysis to come up with an interpretation of the data.
While every data set contains noise, some data sets may contain signals. Therefore, before you can detect a signal within any given data set, you must first filter out the noise." (Donald J Wheeler," Understanding Variation: The Key to Managing Chaos" 2nd Ed., 2000)

"While all data contain noise, some data contain signals. Before you can detect a signal, you must filter out the noise." (Donald J Wheeler, "Understanding Variation: The Key to Managing Chaos" 2nd Ed., 2000)

"The most familiar example of swarm intelligence is the human brain. Memory, perception and thought all arise out of the nett actions of billions of individual neurons. As we saw earlier, artificial neural networks (ANNs) try to mimic this idea. Signals from the outside world enter via an input layer of neurons. These pass the signal through a series of hidden layers, until the result emerges from an output layer. Each neuron modifies the signal in some simple way. It might, for instance, convert the inputs by plugging them into a polynomial, or some other simple function. Also, the network can learn by modifying the strength of the connections between neurons in different layers." (David G Green, "The Serendipity Machine: A voyage of discovery through the unexpected world of computers", 2004)

"Data analysis is not generally thought of as being simple or easy, but it can be. The first step is to understand that the purpose of data analysis is to separate any signals that may be contained within the data from the noise in the data. Once you have filtered out the noise, anything left over will be your potential signals. The rest is just details." (Donald J Wheeler," Myths About Data Analysis", International Lean & Six Sigma Conference, 2012)

"Finding patterns is easy in any kind of data-rich environment […] The key is in determining whether the patterns represent signal or noise." (Nate Silver, "The Signal and the Noise: Why So Many Predictions Fail-but Some Don’t", 2012)

"The signal is the truth. The noise is what distracts us from the truth." (Nate Silver, "The Signal and the Noise: Why So Many Predictions Fail-but Some Don't", 2012)

"A signal is a useful message that resides in data. Data that isn’t useful is noise. […] When data is expressed visually, noise can exist not only as data that doesn’t inform but also as meaningless non-data elements of the display (e.g. irrelevant attributes, such as a third dimension of depth in bars, color variation that has no significance, and artificial light and shadow effects)." (Stephen Few, "Signal: Understanding What Matters in a World of Noise", 2015)

"To find signals in data, we must learn to reduce the noise - not just the noise that resides in the data, but also the noise that resides in us. It is nearly impossible for noisy minds to perceive anything but noise in data. […] Signals always point to something. In this sense, a signal is not a thing but a relationship. Data becomes useful knowledge of something that matters when it builds a bridge between a question and an answer. This connection is the signal." (Stephen Few, "Signal: Understanding What Matters in a World of Noise", 2015)

"Information theory leads to the quantification of the information content of the source, as denoted by entropy, the characterization of the information-bearing capacity of the communication channel, as related to its noise characteristics, and consequently the establishment of the relationship between the information content of the source and the capacity of the channel. In short, information theory provides a quantitative measure of the information contained in message signals and help determine the capacity of a communication system to transfer this information from source to sink over a noisy channel in a reliable fashion." (Ali Grami, "Information Theory", 2016)