SQL Troubles: operations

Showing posts with label operations. Show all posts

26 March 2025

💠🏭🗒️Microsoft Fabric: Polaris SQL Pool [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources and may deviate from them. Please consult the sources for the exact content!

Unfortunately, besides the references papers, there's almost no material that could be used to enhance the understanding of the concepts presented.

Last updated: 26-Mar-2025

Read and Write Operations in Polaris [2]

[Microsoft Fabric] Polaris SQL Pool

{def} distributed SQL query engine that powers Microsoft Fabric's data warehousing capabilities

designed to unify data warehousing and big data workloads while separating compute and state for seamless cloud-native operations
based on a robust DCP

designed to execute read-only queries in a scalable, dynamic and fault-tolerant way [1]
a highly-available micro-service architecture with well-defined responsibilities [2]

data and query processing is packaged into units (aka tasks)

can be readily moved across compute nodes and re-started at the task level

widely-partitioned data with a flexible distribution model [2]
a task-level "workflow-DAG" that is novel in spanning multiple queries [2]
a framework for fine-grained monitoring and flexible scheduling of tasks [2]

{component} SQL Server Front End (SQL-FE)

responsible for

compilation
authorization
authentication
metadata

used by the compiler to

{operation} generate the search space (aka MEMO) for incoming queries
{operation} bind metadata to data cells
leveraged to ensure the durability of the transaction manifests at commit [2]

only transactions that successfully commit need to be actively tracked to ensure consistency [2]
any manifests and data associated with aborted transactions are systematically garbage-collected from OneLake through specialized system tasks [2]

{component} SQL Server Backend (SQL-BE)

used to perform write operations on the LST [2]

inserting data into a LST creates a set of Parquet files that are then recorded in the transaction manifest [2]
a transaction is represented by a single manifest file that is modified concurrently by (one or more) SQL BEs [2]

SQL BE leverages the Block Blob API provided by ADLS to coordinate the concurrent writes [2]
each SQL BE instance serializes the information about the actions it performed, either adding a Parquet file or removing it [2]

the serialized information is then uploaded as a block to the manifest file
uploading the block does not yet make any visible changes to the file [2]

each block is identified by a unique ID generated on the writing SQL BE [2]

after completion, each SQL BE returns the ID of the block(s) it wrote to the Polaris DCP [2]

the block IDs are then aggregated by the Polaris DCP and returned to the SQL FE as the result of the query [2]

the SQL FE further aggregates the block IDs and issues a Commit Block operation against storage with the aggregated block IDs [2]

at this point, the changes to the file on storage will become effective [2]

changes to the manifest file are not visible until the Commit operation on the SQL FE

the Polaris DCP can freely restart any part of the operation in case there is a failure in the node topology [2]

the IDs of any blocks written by previous attempts are not included in the final list of block IDs and are discarded by storage [2]

[read operations] SQL BE is responsible for reconstructing the table snapshot based on the set of manifest files managed in the SQL FE

the result is the set of Parquet data files and deletion vectors that represent the snapshot of the table [2]

queries over these are processed by the SQL Server query execution engine [2]
the reconstructed state is cached in memory and organized in such a way that the table state can be efficiently reconstructed as of any point in time [2]

enables the cache to be used by different operations operating on different snapshots of the table [2]
enables the cache to be incrementally updated as new transactions commit [2]

{feature} supports explicit user transactions

can execute multiple statements within the same transaction in a consistent way

the manifest file associated with the current transaction captures all the (reconciled) changes performed by the transaction [2]

changes performed by prior statements in the current transaction need to be visible to any subsequent statement inside the transaction (but not outside of the transaction) [2]

[multi-statement transactions] in addition to the committed set of manifest files, the SQL BE reads the manifest file of the current transaction and then overlays these changes on the committed manifests [1]
{write operations} the behavior of the SQL BE depends on the type of the operation.

insert operations

only add new data and have no dependency on previous changes [2]
the SQL BE can serialize the metadata blocks holding information about the newly created data files just like before [2]
the SQL FE, instead of committing only the IDs of the blocks written by the current operation, will instead append them to the list of previously committed blocks

⇐ effectively appends the data to the manifest file [2]

{update|delete operations}

handled differently

⇐ since they can potentially further modify data already modified by a prior statement in the same transaction [2]

e.g. an update operation can be followed by another update operation touching the same rows

the final transaction manifest should not contain any information about the parts from the first update that were made obsolete by the second update [2]

SQL BE leverages the partition assignment from the Polaris DCP to perform a distributed rewrite of the transaction manifest to reconcile the actions of the current operation with the actions recorded by the previous operation [2]

the resulting block IDs are sent again to the SQL FE where the manifest file is committed using the (rewritten) block IDs [2]

{concept} Distributed Query Processor (DQP)

responsible for

distributed query optimization
distributed query execution
query execution topology management

{concept} Workload Management (WLM)

consists of a set of compute servers that are, simply, an abstraction of a host provided by the compute fabric, each with a dedicated set of resources (disk, CPU and memory) [2]

each compute server runs two micro-services

{service} Execution Service (ES)

responsible for tracking the life span of tasks assigned to a compute container by the DQP [2]

{service} SQL Server instance

used as the back-bone for execution of the template query for a given task [2]

⇐ holds a cache on top of local SSDs

in addition to in-memory caching of hot data

data can be transferred from one compute server to another

via dedicated data channels

the data channel is also used by the compute servers to send results to the SQL FE that returns the results to the user [2]
the life cycle of a query is tracked via control flow channels from the SQL FE to the DQP, and the DQP to the ES [2]

{concept} cell data abstraction

the key building block that enables to abstract data stores

abstracts DQP from the underlying store [1]
any dataset can be mapped to a collection of cells [1]
allows distributing query processing over data in diverse formats [1]
tailored for vectorized processing when the data is stored in columnar formats [1]
further improves relational query performance

2-dimenstional

distributions (data alignment)
partitions (data pruning)

each cell is self-contained with its own statistics [1]

used for both global and local QO [1]
cells can be grouped physically in storage [1]
queries can selectively reference either cell dimension or even individual cells depending on predicates and type of operations present in the query [1]

{concept} distributed query processing (DQP) framework

operates at the cell level
agnostic to the details of the data within a cell

data extraction from a cell is the responsibility of the (single node) query execution engine, which is primarily SQL Server, and is extensible for new data types [1], [2]

{concept} dataset

logically abstracted as a collection of cells [1]
can be arbitrarily assigned to compute nodes to achieve parallelism [1]
uniformly distributed across a large number of cells

[scale-out processing] each dataset must be distributed across thousands of buckets or subsets of data objects,
such that they can be processed in parallel across nodes

{concept} session

supports a spectrum of consumption models, ranging from serverless ad-hoc queries to long-standing pools or clusters [1]
all data are accessible from any session [1]

multiple sessions can access all underlying data concurrently [1]

{concept} Physical Metadata layer

new layer introduced in the SQL Server storage engine [2]

🏭🗒️Microsoft Fabric: External Data Sharing [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)!

Last updated: 26-Mar-2025

External data sharing [1]

[Microsoft Fabric] External data sharing

{def} feature that enables Fabric users to share data from their tenant with users in another Fabric tenant (aka cross-tenant sharing) [1]

the data is shared in-place from OneLake storage locations in the sharer's tenant [1]

⇒ no data is actually copied to the other tenant [1]
creates a OneLake shortcut in the other tenant that points back to the original data in the sharer's tenant [1]
data is exposed as read-only [1]
data can be consumed by any OneLake compatible Fabric workload in that tenant [1]

{benefit} allows for efficient and secure data sharing without duplicating data

the shared data remains read-only for the consumer, ensuring data integrity and consistency [2]
multiple tables and folders can be shared at once [2]

{prerequisite} Fabric admins must turn on external data sharing both in the sharer's tenant and in the external tenant

by specifying who can create and accept external data shares [1]
users can share data residing in tables or files within supported Fabric items [1]

require standard Fabric read and reshare permissions for the item being shared [1]
the user creating the share invites a user from another tenant with a link to accept the external data share [1]

upon accepting the share, the recipient chooses a lakehouse where a shortcut to the shared data will be created [1]
the links work only for users in external tenants

for sharing data within the same OneLake storage accounts with users in the same tenant, use OneLake shortcuts [1]

{limitation} shortcuts contained in folders that are shared via external data sharing won't resolve in the consumer tenant [1]

access is enabled via a dedicated Fabric-to-Fabric authentication mechanism
⇐ doesn’t require Entra B2B guest user access [1]

{operation} create an external data share in the provider tenant)

external data shares can be created for tables or files in lakehouses and warehouses, and in KQL, SQL, mirrored databases [1]
{limitation} the sharer can't control who has access to the data in the consumer's tenant [1]

{operation} accept an external data share in consuming tenant)

only lakehouses can be chosen for the operation
the consumer can grant access to the data to anyone [1]

incl. guest users from outside the consumer's organization [1]

data can be transferred across geographic boundaries when it's accessed within the consumer's tenant [1]

{operation} revoke extern data shares

any user in the sharing tenant with read and reshare permissions on an externally shared item can revoke the external data share at any time [1]

via Manage permissions >> External data shares tab
can be performed of any item the user has read and reshare permissions on [3]
{warning}a revoked external data share can't be restored [3]

irreversibly severs all access from the receiving tenant to the shared data [3]
a new external data share can be created instead [3]

applies to

lakehouse

an entire lakehouse schema can be shared [2]

shares all the tables in the schema [2]
any changes to the schema are immediately reflected in the consumer’s lakehouse [2]

mirrored database
KQL database
OneLake catalog

can be consumed via

Spark workloads

notebooks or Spark

lakehouse SQL Analytics Endpoint
semantic models
⇐ data can be shared from a provider and consumed in-place via SQL queries or in a Power BI report [2]

{feature} external data sharing APIs

support service principals for admin and user operations

can be used to automate the creation or management of shares [2]

supports service principals or managed identity [2]

{planned} data warehouse support

share schemas or tables from a data warehouse [2]

{planned} shortcut sharing

share OneLake shortcuts using external data sharing [2]
⇒ data residing outside of Fabric will be externally shareable

S3, ADLS Gen2, or any other supported shortcut location [2]

{planned} consumer APIs

consumer activities, including viewing share details and accepting shares, will soon be available via API [2]

Previous Post <<||>> Next Post

References:

[1] Microsoft Learn (2024) External data sharing in Microsoft Fabric [link]

[2] Microsoft Fabric Updates Blog (2025) External data sharing enhancements out now [link]
[3] Microsoft Learn (2024) Fabric: Manage external data shares [link]

Resources:

[R1] Analytics on Azure Blog (2024) External Data Sharing With Microsoft Fabric [link]
[R2] Microsoft Learn (2025) Fabric: What's new in Microsoft Fabric? [link]

Acronyms:

ADLS - Azure Data Lake Storage
API - Application Programming Interface
B2B - business-to-business
KQL - Kusto Query Language

09 February 2025

🏭🗒️Microsoft Fabric: Sharding [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)!

Last updated: 9-Feb-2024

[Microsoft Fabric] Data Partitioning (aka Sharding)

{definition} "a process where small chunks of the database are isolated and can be updated independently of other shards" [2]
allows a logical database to be partitioned across multiple physical servers [1]

each partition is referred to as a shard
the largest tables are partitioned across multiple database servers [1]

when operating on a record, the application must determine which shard will contain the data and then send the SQL to the appropriate server [1]

partitioning is based on a Key Value

e.g. such as a user ID

proven technique for achieving data processing on a massive scale [1]

solution used at the largest websites

e.g. Facebook, Twitter
usually associated with rapid growth

⇒ the approach needs to be dynamic [1]

the only way to scale a relational database to massive web use [1]

together with caching and replication [1]

{drawback} involves significant operational complexities and compromises [1]

the application must contain logic that understands the location of any particular piece of data and the logic to route requests to the correct shard [1]
requests that can only be satisfied by accessing more than one shard thus need complex coding as well, whereas on a nonsharded database a single SQL statement might suffice.

{drawback} high operational costs [1]
{drawback} application complexity

it’s up to the application code to route SQL requests to the correct shard [1]

⇒ a dynamic routing layer must be implemented

⇐ most massive websites are adding shards as they grow [1]
layer required to maintain Memcached object copies and to differentiate between the master database and read-only replicas [1]

{drawback} crippled SQL

[sharded database] it is not possible to issue a SQL statement that operates across shards [1]

⇒ usually SQL statements are limited to row-level access [1]
⇒ only programmers can query the database as a whole [1]
joins across shards cannot be implemented, nor can aggregate GROUP BY operations [1]

{drawback} loss of transactional integrity

ACID transactions against multiple shards are not possible and/or not practical [1]

⇐ {exception} there are database systems that support 2PC

involves considerable troubleshooting as conflicts and bottlenecks can occur [1]

{drawback} operational complexity.

load balancing across shards becomes extremely problematic

adding new shards requires a complex rebalancing of data [1]
changing the database schema requires a rolling operation across all the shards [1]

⇒ can lead to transitory inconsistencies in the schema [1]

a sharded database entails a huge amount of operational effort and administrator skill [1]

{concept} CAP (Consistency, Availability, and Partition) theorem

in a distributed database system, one can have at most only two of CAP tolerance [1]
consistency

every user of the database has an identical view of the data at any given instant [1]

availability

in the event of a failure, the database remains operational [1]

partition tolerance

the database can maintain operations in the event of the network’s failing between two segments of the distributed system [1]

{concept} partitioning

{def} core pattern of building scalable services by dividing state (data) and compute into smaller accessible units to improve scalability and performance [5]

⇐ determines that a particular service partition is responsible for a portion of the complete state of the service.

a partition is a set of replicas)

{type} [stateless services] a logical unit that contains one or more instances of a service [5]

partitioning a stateless service is a very rare scenario
scalability and availability are normally achieved by adding more instances
{subtype} externally persisted state

persists its state externally [5]

e.g. databases in Azure SQL Database

{subtype} computation-only services

service that do not manage any persistent state e.g. calculator or image thumbnailing [5]

{type} scalable stateful services

partition state (data)
a partition of a stateful service as a scale unit that is highly reliable through replicas that are distributed and balanced across the nodes in a cluster
the state must be accessed and stored

⇒ bound by

network bandwidth limits
system memory limits
disk storage limits

{scenario} run into resource constraints in a running cluster

{recommendation} scale out the cluster to accommodate the new requirements [4]

{concept}distributed systems platform used to build hyper-scalable, reliable and easily managed applications for the cloud [6]

⇐ addresses the significant challenges in developing and managing cloud applications
places the partitions on different nodes [5]

allows partitions to grow to a node's resource limit

⇐ partitions are rebalances across nodes [5]

{benefit} ensures the continued efficient use of hardware resources [5]

{default} makes sure that there is about the same number of primary and secondary replicas on each node

⇒ nodes that hold replicas can serve more traffic and others that serve less traffic [5]
hot and cold spots may appear in a cluster

⇐ it should be preferably avoided

{recommendation} partition the state so is evenly distributed across all partitions [5]
{recommendation} report load from each of the replicas for the service [5]

provides the capability to report load consumed by services [5]

e.g. amount of memory, number of records
detects which partitions server higher loads than others [5]

⇐ based on the metrics reported

rebalances the cluster by moving replicas to more suitable nodes, so that overall no node is overloaded [5]
⇐ it's not always possible to know how much data will be in a given partition

{recommendation} adopt a partitioning strategy that spreads the data evenly across the partitions [5]

{benefit} prevents situations described in the voting example [5]

{recommendation} report load

{benefit} helps smooth out temporary differences in access or load over time [5]

{recommendation} choose an optimal number of partitions to begin with

⇐ there's nothing that prevents from starting out with a higher number of partitions than anticipated [5]

⇐ assuming the maximum number of partitions is a valid approach [5]

⇒ one may end up needing more partitions than initially considered [5]

⇐ {constraint} the partition count can't be changed after the fact [5]

⇒ apply more advanced partition approaches

e.g. creating a new service instance of the same service type
e.g. implement client-side logic that routes the requests to the correct service instance

Previous Post <<||>> Next Post

References:

[1] Guy Harrison (2015) Next Generation Databases: NoSQL, NewSQL, and Big Data

[2] DAMA International (2017) "The DAMA Guide to the Data Management Body of Knowledge" 2nd Ed

[3] Microsoft Fabric (2024) External data sharing in Microsoft Fabric [link]
[4] Microsoft Fabric (2024) Data sharding policy [link]
[5] Microsoft Fabric (2024) Partition Service Fabric reliable services [link]
[6] MSDN (2015) Microsoft Azure - Azure Service Fabric and the Microservices Architecture [link]

Resources:

[R1] Microsoft Learn (2025) Fabric: What's new in Microsoft Fabric? [link]

Acronyms:

ACID - atomicity, consistency, isolation, durability

2PC - Two Phase Commit

CAP - Consistency, Availability, Partition

05 March 2024

🧭Business Intelligence: Data Culture (Part I: Generative AI - No Silver Bullet)

Business Intelligence Series

Talking about holy grails in Data Analytics, another topic of major importance for an organization’s "infrastructure" is data culture, that can be defined as the collective beliefs, values, behaviors, and practices of an organization’s employees in harnessing the value of data for decision-making, operations, or insight. Rooted in data literacy, data culture is an extension of an organization’s culture in respect to data that acts as enabler in harnessing the value of data. It’s about thinking critically about data and how data is used to create value.

The current topic was suggested by PowerBI.tips’s webcast from today [3] and is based on Brent Dykes’ article from Forbes ‘Why AI Isn’t Going to Solve All Your Data Culture Problems’ [1]. Dykes’ starting point for the discussion is Wavestone's annual data executive survey based on which the number of companies that reported they had "created a data-driven organization" rose sharply from 23.9 percent in 2023 to 48.1 percent in 2024 [2]. The report’s authors concluded that the result is driven by the adoption of Generative AI, the capabilities of OpenAI-like tools to generate context-dependent meaningful text, images, and other content in response to prompts.

I agree with Dykes that AI technologies can’t be a silver bullet for an organization data culture given that AI either replaces people’s behaviors or augments existing ones, being thus a substitute and not a cure [1]. Even for a disruptive technology like Generative AI, it’s impossible to change so much employees’ mindset in a so short period of time. Typically, a data culture matures over years with sustained effort. Therefore, the argument that the increase is due to respondent’s false perception is more than plausible. There’s indeed a big difference between thinking about an organization as being data-driven and being data-driven.

The three questions-based evaluation considered in the article addresses this difference, thinking vs. being. Changes in data culture don’t occur just because some people or metrics say so, but when people change their mental models based on data, when the interpersonal relations change, when the whole dynamics within the organization changes (positively). If people continue the same behavior and practices, then there are high chances that no change occurred besides the Brownian movement in a confined space of employees, that’s just chaotic motion.

Indeed, a data culture should encourage the discovery, exploration, collaboration, discussions [1] respectively knowledge sharing and make people more receptive and responsive about environmental or circumstance changes. However, just involving leadership and having things prioritized and funded is not enough, no matter how powerful the drive. These can act as enablers, though more important is to awaken and guide people’s interest, working on people’s motivation and supporting the learning process through mentoring. No amount of brute force can make a mind move and evolve freely unless the mind is driven by an inborn curiosity!

Driving a self-driving car doesn’t make one a better driver. Technology should challenge people and expand their understanding of how data can be used in different contexts rather than give solutions based on a mass of texts available as input. This is how people grow meaningfully and how an organization’s culture expands. Readily available answers make people become dull and dependent on technology, which in the long-term can create more problems. Technology can solve problems when used creatively, when problems and their context are properly understood, and the solutions customized accordingly.

Unfortunately, for many organizations data culture will be just a topic to philosophy about. Data culture implies a change of mindset, perception, mental models, behavior, and practices based on data and not only consulting the data to confirm one’s biases on how the business operates!

Previous Post <<||>> Next Post

Resources:
[1] Forbes (2024) Why AI Isn’t Going To Solve All Your Data Culture Problems, by Brent Dykes (link)
[2] Wavestone (2024) 2024 Data and AI Leadership Executive Survey (link)
[3] Power BI tips (2024) Ep.299: AI & Data Culture Problems (link)

21 December 2016

♟️Strategic Management: Operations (Just the Quotes)

"Strategy, or the art of properly directing masses upon the theater of war, either for defense or for invasion. […] Strategy is the art of making war upon the map, and comprehends the whole theater of operations. Grand Tactics is the art of posting troops upon the battle-field according to the accidents of the ground, of bringing them into action, and the art of fighting upon the ground, in contradistinction to planning upon a map. Its operations may extend over a field of ten or twelve miles in extent. Logistics comprises the means and arrangements which work out the plans of strategy and tactics. Strategy decides where to act; logistics brings the troops to this point; grand tactics decides the manner of execution and the employment of the troops." (Antoine-Henri Jomini, "The Art of War", 1838)

"A policy therefore might be likened to strategy, the broad, overall, long term conception which gives direction and purpose to the tactics of immediately daily operations and decisions." (Lawrence K. Frank, "National Policy for the Family", 1948)

"Essential to organization planning, then, is the search for an ideal form of organization to reflect the basic goals of the enterprise. This entails not only charting the main lines of organization and reflecting the organizational philosophy of the enterprise leaders (e.g., shall authority be as centralized as possible, or should the company try to break its operations down into semiautonomous product or territorial divisions?), but also a sketching out of authority relationships throughout the structure." (Harold Koontz & Cyril O Donnell, "Principles of Management", 1955)

"The concern of OR with finding an optimum decision, policy, or design is one of its essential characteristics. It does not seek merely to define a better solution to a problem than the one in use; it seeks the best solution... [It] can be characterized as the application of scientific methods, techniques, and tools to problems involving the operations of systems so as to provide those in control of the operations with optimum solutions to the problems." (C West Churchman et al, "Introduction to Operations Research", 1957)

"The essential task of management is to arrange organizational conditions and methods of operations so that people can achieve their own goals best by directing their own efforts toward organizational objectives." (Douglas McGregor, "The Human Side of Enterprise", 1960)

"Now we are looking for another basic outlook on the world - the world as organization. Such a conception - if it can be substantiated - would indeed change the basic categories upon which scientific thought rests, and profoundly influence practical attitudes. This trend is marked by the emergence of a bundle of new disciplines such as cybernetics, information theory, general system theory, theories of games, of decisions, of queuing and others; in practical applications, systems analysis, systems engineering, operations research, etc. They are different in basic assumptions, mathematical techniques and aims, and they are often unsatisfactory and sometimes contradictory. They agree, however, in being concerned, in one way or another, with ‘systems’, ‘wholes’ or ‘organizations’; and in their totality, they herald a new approach." (Ludwig von Bertalanffy, "General System Theory", 1968)

"The dogma of delegation is simple - the Sixth Truth of Management again: either the delegatee is capable of running the operation successfully by himself or he isn't. This handy formula relieves the top executive of any responsibility except that of finding, supervising, and (at the appropriate time) moving the men who are doing all the work. He Can then truly manage by exception: he does not get worked up over operations that are going well, but concentrates on the plague spots, where everything, including the management, is going badly." (Robert Heller, "The Naked Manager: Games Executives Play", 1972)

"Perhaps the fault [for the poor implementation record for models] lies in the origins of managerial model-making - the translation of methods and principles of the physical sciences into wartime operations research. [...] If hypothesis, data, and analysis lead to proof and new knowledge in science, shouldn’t similar processes lead to change in organizations? The answer is obvious-NO! Organizational changes (or decisions or policies) do not instantly pow from evidence, deductive logic, and mathematical optimization." (Edward B Roberts, "Interface", 1977)

"Management science [operations research] also involves the philosophy of approaching a problem in a logical manner (i.e., a scientific approach). The logical, consistent, and systematic approach to problem solving can be as useful (and valuable) as the knowledge of the mechanics of the mathematical techniques themselves." (Bernard W. Taylor III, "Introduction to Management Science, 1986)

"Strategy means abstract thinking and planning, as opposed to tactics, which are the individual operations used to implement strategy. Tactics are specific; strategy is general. Tactics tend to be immediate, strategy long-term." (Bruce Pandolfini, "Weapons of Chess: An omnibus of chess strategy", 1989)

"At the heart of reengineering is the notion of discontinuous thinking - of recognizing and breaking away from the outdated rules and fundamental assumptions that underlie operations. Unless we change these rules, we are merely rearranging the deck chairs on the Titanic. We cannot achieve breakthroughs in performance by cutting fat or automating existing processes. Rather, we must challenge old assumptions and shed the old rules that made the business underperform in the first place." (Michael M Hammer, "Reengineering Work: Don't Automate, Obliterate", Magazine, 1990)

"Even with simple and usable models, most organizations will need to upgrade their analytical skills and literacy. Managers must come to view analytics as central to solving problems and identifying opportunities - to make it part of the fabric of daily operations." (Dominic Barton & David Court, "Making Advanced Analytics Work for You", 2012)

"DevOps recognizes the importance of culture. The acronym CAMS (culture, automation, measurement, and sharing) is used to encapsulate its key themes. Culture is acknowledged as all important in making development and IT operations work together effectively. But what is culture in this context? It is not so much about an informal dress code, flexible hours, or a free in-house cafeteria as it is about how decisions are taken, norms of behavior, protocols of communication, and the ways of navigating hierarchy and bureaucracy to get things done." (Sriram Narayan, "Agile IT Organization Design: For Digital Transformation and Continuous Delivery", 2015)

11 June 2016

♜Strategic Management: Resilience (Definitions)

"The ability to recover from challenges or to overcome obstacles. In a social-ecological context this refers to the innovation capacity of the organization to successfully address societal and environmental challenges." (Rick Edgeman & Jacob Eskildsen, "Social-Ecological Innovation", 2014)

"The quality of being able to absorb systemic 'shocks' without being destroyed even if recovery produces an altered state to that of the status quo ante." (Philip Cooke, "Regional Innovation Systems in Centralised States: Challenges, Chances, and Crossovers", 2015)

"The ability of an organization to quickly adapt to disruptions while maintaining continuous business operations and safeguarding people, assets, and overall brand equity. Business resilience goes a step beyond disaster recovery, by offering post-disaster strategies to avoid costly downtime, shore up vulnerabilities, and maintain business operations in the face of additional, unexpected breaches." (William Stallings, "Effective Cybersecurity: A Guide to Using Best Practices and Standards", 2018)

"A capability to anticipate, prepare for, respond to, and recover from significant multi-hazard threats with minimum damage to social well-being, the economy, and the environment." (Carolyn N Stevenson, "Addressing the Sustainable Development Goals Through Environmental Education", 2019)

"The ability of a project to readily resume from unexpected events, threats or actions." (Phil Crosby, "Shaping Mega-Science Projects and Practical Steps for Success", 2019)

"The ability of an infrastructure to resist, respond and overcome adverse events" (Konstantinos Apostolou et al, "Business Continuity of Critical Infrastructures for Safety and Security Incidents", 2020)

"The capacity to respond to, adapt and learn from stressors and changing conditions." (Naomi Borg & Nader Naderpajouh, "Strategies for Business Sustainability in a Collaborative Economy", 2020)

"The word resilience refers to the ability to overcome critical moments and adapt after experiencing some unusual and unexpected situation. It also indicates return to normal." (José G Vargas-Hernández, "Urban Socio-Ecosystems Green Resilience", 2021)

"Operational resilience is a set of techniques that allow people, processes and informational systems to adapt to changing patterns. It is the ability to alter operations in the face of changing business conditions. Operationally resilient enterprises have the organizational competencies to ramp up or slow down operations in a way that provides a competitive edge and enables quick and local process modification." (Gartner)

[Operational resilience:] "The ability of an organization to absorb the impact of any unexpected event without failing to deliver on its brand promise." (Forrester)

[Business resilience:] "The ability to thrive in the face of unpredictable events and circumstances without deteriorating customer experience or sacrificing the long-term viability of the company." (Forrester)

29 March 2016

♜Strategic Management: Decision-Making (Definitions)

[decision-making:] "The process of making choices in a project team environment. Several types of decision-making are useful in projects: consensus, leader-imposed, delegated, voting, and scoring models." (Timothy J Kloppenborg et al, "Project Leadership", 2003)

[semistructured decisions:] "Decisions in which only some of the phases are structured; require a combination of standard solution procedures and individual judgment." (Linda Volonino & Efraim Turban, "Information Technology for Management 8th Ed", 2011)

[strategic decision:] "refers to a decision that exhibits the following characteristics: it is made in a situation of uncertainty, of incomplete information, in a complex environment, variable/mutating environment (as opposed to 'all things being otherwise equal'); it is not recurrent, therefore the decision maker is relatively deprived; it may have far-reaching (favorable or adverse) consequences that could jeopardize the survivability of the enterprise; it is systemic (many elements with many relationships among them); the decision maker does not have experience-proven models (we cannot resort to 'turnkey' mechanisms). " (Humbert Lesca & Nicolas Lesca, "Weak Signals for Strategic Intelligence: Anticipation Tool for Managers", 2011)

[strategic decisions:] "Decisions for sustained enterprise success and business growth." (Linda Volonino & Efraim Turban, "Information Technology for Management 8th Ed", 2011)

[tactical decisions:] "Decisions ensuring that existing operations and processes are in alignment with business objectives and strategies." (Linda Volonino & Efraim Turban, "Information Technology for Management 8th Ed", 2011)

[decision-making processes:] "Management processes that define objectives, study alternatives, analyze available data, and reflect on intuitive beliefs. They interpret findings and compare alternates to form a conclusion or make a choice upon which the organization may act." (Carl F Lehmann, "Strategy and Business Process Management", 2012)

[microdecision:] "A small decision made many times by many workers at the front line of the organization. They usually have a significant impact on organizational performance due to their sheer volume." (Evan Stubbs, "Delivering Business Analytics: Practical Guidelines for Best Practice", 2013)

[decision-making:] "How decisions are made, based on what types of resources, information, and specific processes are available." (Jim Davis & Aiman Zeid, "Business Transformation: A Roadmap for Maximizing Organizational Insights", 2014)

[decision-making] "the process of making choices or reaching conclusions, especially on important political or business matters." (Ken Sylvester, "Negotiating in the Leadership Zone", 2015)

[tactical decisions:] "broader decision questions than operational decisions, semistructured in nature, some but not all information necessary to make the decision is available, primarily internally focused and made by middle-level managers." (Daniel J. Power & Ciara Heavin, "Data-Based Decision Making and Digital Transformation", 2018)

[operating or function-specific decisions:] "day-to-day, routine decisions with a concise decision question and a clear, well-defined, and structured algorithm to make a choice among alternatives." (Daniel J. Power & Ciara Heavin, "Data-Based Decision Making and Digital Transformation", 2018)

[strategic decisions:] "complex, nonroutine, unstructured decisions involving many different and connected parts. Some variables may not be well understood, often information required to make the decision may be unavailable, incomplete, and in some situations information may be known to be flawed or inaccurate. These decisions usually involve a high degree of uncertainty about outcomes. If implemented, strategic decisions often result in major changes in an organization." (Daniel J. Power & Ciara Heavin, "Data-Based Decision Making and Digital Transformation", 2018)

17 June 2011

💎SQL Reloaded: Pulling the Strings of SQL Server VII (List of Values)

Introduction

Lists are one of the basic structures in Mathematics, the term referring to an (ordered) set of elements separated by comma, space or any other delimiter (e.g. “:”, “;”). The elements of a list can be numbers, words, functions, or any other type of objects. In the world of databases, a list is typically formed out of the values of a given column or a given record, however it could span also a combination of rows and records, is such cases two delimiters being needed – one for column and one for row. From here comes probably the denomination of list of values. In a more general accept a list of values could be regarded as a delimited/concatenated subset. Such lists are formed when needed to send the data between the layers of an application or applications, this type of encoding being quite natural. In fact, also the data in a database are stored in similar tabular delimited structure, more complex though.

An useful example in which the list of values are quite handy is the passing of multiple values within the parameter of stored procedure or function (see example). This supposes first building the list and then use the values in a dynamic build query (like in the before mentioned example) or by building a table on the fly. We can call the two operations composition, respectively decomposition of list of values.

Composition

Composition, whether on vertical or horizontal is nothing but a concatenation in which the values alternate with one or more delimiters. Let’s reconsider the concatenation based on the values of a Person.AddressType AdventureWorks table. As the logic for concatenating for one or more attributes is the same, the below example concatenates a list based on a single attribute, namely AddressTypeID in SingleList, respectively two attributes, AddressTypeID and Name.

-- concatenation of values across a table 
;WITH CTE (AddressTypeID, Name, Ranking) 
AS (--preparing the data       
     SELECT AddressTypeID  
     , Name 
     , ROW_NUMBER () OVER(ORDER BY Name) Ranking 
     FROM Person.AddressType 
     -- WHERE ... 
) 
, DAT (SingleList, DoubleList, Ranking) 
AS ( -- concatenating the values 
     SELECT Cast(AddressTypeID as varchar(max)) SingleList 
     , Cast('('+ Cast(AddressTypeID as varchar(10)) + ',''' + Name + ''')' as varchar(max)) DoubleList 
     , Ranking 
     FROM CTE 
     WHERE Ranking = 1 
     UNION ALL 
     SELECT DAT.SingleList + ',' + Cast(CTE.AddressTypeID as varchar(20)) SingleList 
    , Cast(DAT.DoubleList + ', ('+ Cast(CTE.AddressTypeID as varchar(10)) + ',''' + CTE.Name + ''')' as varchar(max)) DoubleList 
    , CTE.Ranking  
     FROM CTE          
       JOIN DAT           
          ON CTE.Ranking = DAT.Ranking + 1       
)  

-- the lists 
SELECT SingleList 
, DoubleList 
FROM DAT 
WHERE Ranking = (SELECT MAX(Ranking) FROM DAT)

The second example is based on atypical delimiters, resembling to the structure built for a batch insert or table value constructor-based statement, and as we’ll see later, ideal to be used in a dynamically-built query

Decomposition

Decomposition follows the inverse path, though it’s much easier to exemplify. In fact it’s used the same technique introduced in the last example from the previous post belonging to the same cycle, Subparts of a String, in which a space was used as delimiter. Another example is the dbo.SplitList function which decomposes a string using a loop.

-- decomposition of a string to a table using CTE 
CREATE FUNCTION dbo.StringToTable( 
 @str varchar(500) 
,@Delimiter char(1)) 
RETURNS @Temp TABLE ( 
Id int NOT NULL 
,Value varchar(50)) 
AS 
BEGIN  
     ;WITH CTE (PrevString, Position, Word)  
     AS (  
     SELECT LTrim(RTrim( CASE  
           WHEN CharIndex(@Delimiter, @str)>;0 THEN Right(@str, Len(@str)-CharIndex(@Delimiter, @str))  
           ELSE ''  
      END)) PrevString  
     , 1 Position  
     , LTrim(RTrim(CASE  
           WHEN CharIndex(@Delimiter, @str)>0 THEN LEFT(@str, CharIndex(@Delimiter, @str)-1)  
           ELSE @str  
       END)) Word  
      UNION ALL  
      SELECT LTrim(RTrim(CASE  
            WHEN CharIndex(@Delimiter, PrevString)>0 THEN Right(PrevString, Len(PrevString)-CharIndex(@Delimiter, PrevString))  
             ELSE ''  
       END)) PrevString  
      , Position + 1 Position  
      , LTrim(RTrim(CASE  
           WHEN CharIndex(@Delimiter, PrevString)>0 THEN LEFT(PrevString, CharIndex(@Delimiter, PrevString)-1)  
          ELSE PrevString  
      END)) Word      FROM CTE  
     WHERE Len(PrevString)>0  
    )  
     INSERT @Temp(Id, Value) 
     SELECT Position  
     , Word      FROM CTE  
     OPTION (maxrecursion 100)  
     RETURN 
END

Here are two examples based on the single list created above and another one based on alphabet:

-- decomposing a list
SELECT Id 
, value 
FROM dbo.StringToTable('6,1,2,3,4,5', ',')     


-- decomposing the "alphabet" 
SELECT Id 
, value 
FROM dbo.StringToTable('a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z', ',')

Even if the function deals only with a delimiter, it could be used to decompose lists involving multiple delimiters, as long the list is adequately built:

-- decomposing double list 
SELECT Id 
, value 
, Left(value, CHARINDEX(',', value)-1) LeftValue 
, Right(value, len(value)-CHARINDEX(',', value)) RightValue 
FROM dbo.StringToTable('6,Archive;1,Billing;2,Home;3,Main Office;4,Primary;5,Shipping', ';')

The tables built thus from list of values can be further used in queries when needed to create a table on the fly. It would be interesting maybe to show that the composition and decomposition are inverse functions, however that’s out of scope, at least for current set of posts.

11 March 2011

💎SQL Reloaded: Pulling the Strings of SQL Server VIII (Insertions, Deletions and Replacements)

Until now, the operations with strings resumed to concatenation and its reverse operation(s) - extracting a substring or splitting a string into substrings. It was just the warm up! There are several other important operations that involve the internal manipulation of strings – insertion, deletion and replacement of a substring in a given string, operations performed using the Replace and Stuff functions.

Replace function, as its name denotes, replaces all occurrences of a specified string value with another. Several scenarios in which the function is quite useful: the replacement of delimiters, special characters, correcting misspelled words or any other chunks of text. Here are some basic simple examples, following to consider the before mentioned applications in other posts:

-- examples with replace 
DECLARE @str varchar(30) 
SET @str = 'this is a test string' 
SELECT replace(@str, ' ', ',') Example1 
, replace(@str, ' ', ' ') Example2 
, replace(@str, ' ', '') Example3 
, replace(@str, 'is', 'as') Example4

Output:

Example1	Example2	Example3	Example4
this,is,a,test,string	this is a test string	thisisateststring	thas as a test string

When there are good chances that the searched string won’t appear in the “searched” string, and especially when additional logic is depending on the replacement, logic that could be included in the same expression with the replacement, then maybe it makes sense to check first if the searched character is present:

-- replacement with check 
DECLARE @str varchar(30) 
DECLARE @search varchar(30) 
DECLARE @replacememt varchar(30) 
SET @str = 'this is a test string' 
SET @search = 'this string' 
SET @replacememt = 'other string' 
SELECT CASE            
    WHEN CharIndex(@search, @str)>0 THEN Replace(@str, @search, @replacememt)             
    ELSE @str        
END result

Unfortunately the function doesn’t have the flexibility of the homonym functions provided by the languages from the family of VB (VBScript, VB.NET), which allow to do the replacement starting with a given position, and/or for a given number of occurrences. This type of behavior could be obtained with a simple trick – splitting the string into two other strings, performing the replacement on the second string, and then concatenating the first string and the result of the replacement:

-- replacement starting with a given position 
DECLARE @str varchar(30) 
DECLARE @search varchar(30) 
DECLARE @replacememt varchar(30) 
DECLARE @start int 
SET @str = 'this is a test string' 
SET @search = 's' 
SET @replacememt = 'x' 
SET @start = 7 
SELECT Left(@str, @start-1) FirstPart 
, RIGHT(@str, Len(@str)-@start+1) SecondPart 
, CASE        
    WHEN @start <= LEN(@str) THEN Left(@str, @start-1) + Replace(RIGHT(@str, Len(@str)-@start+1), @search, @replacememt)        
     ELSE @str  
END Replacement

Output:

FirstPart	SecondPart	Replacement
this i	s a test string	this ix a text xtring

The logic can be encapsulated in a function together with additional validation logic.

Stuff function inserts a string into another string starting with a given position and deleting a specified number of characters. Even if seldom used, the function it’s quite powerful allowing to insert a string in another, to remove a part of a string or more general, to replace a single occurrence of a string with another string, as can be seen from the below examples:

-- Stuff-based examples 
DECLARE @str varchar(30) 
SET @str = 'this is a test string' 
SELECT STUFF(@str, 6, 2, 'was ') Example1 
, STUFF(@str, 1, 0, 'and ') Example2 
, STUFF(@str, 1, 0, 'that') Example3 
, STUFF(@str, LEN(@str) + 1, 0, '!') Example4

Output:

Example1	Example2	Example3	Example4
this was a test string	and this is a test string	thatthis is a test string	NULL

If in the first example is done a replacement of a text from a fix position, in the next examples are attempted insert on a first, middle respectively end position. As can be seen, the last example doesn’t work as expected, this because the insert position can’t go over the length of the target string. Actually, if the insert needs to be done at the beginning, respectively the end of a string, a concatenation can be much easier to use. A such example is the padding of strings with leading or trailing characters, typically in order to arrive to a given length. SQL Server doesn’t provide such a function, however the function is quite easy to build.

-- left/right padding

DECLARE @str varchar(30) 
DECLARE @length int  
DECLARE @padchar varchar(1) 
SET @str = '12345'  
SET @length = 10 
SET @padchar = '0' 
SELECT @str StringToPad  
, CASE  
     WHEN LEN(@str)<@length THEN Replicate(@padchar, @length-LEN(@str)) + @str       
     ELSE @str  
END LeftPadding  
, CASE  
     WHEN LEN(@str)<@length THEN @str + Replicate(@padchar, @length-LEN(@str))      
     ELSE @str  
END RightPadding

Output:

StringToPad	LeftPadding	RightPadding
12345	0000012345	1234500000

Note:
The queries work also in SQL databases in Microsoft Fabric.

Happy Coding!

SQL Troubles

Pages

26 March 2025

💠🏭🗒️Microsoft Fabric: Polaris SQL Pool [Notes]

🏭🗒️Microsoft Fabric: External Data Sharing [Notes]

09 February 2025

🏭🗒️Microsoft Fabric: Sharding [Notes]

05 March 2024

🧭Business Intelligence: Data Culture (Part I: Generative AI - No Silver Bullet)

21 December 2016

♟️Strategic Management: Operations (Just the Quotes)

11 June 2016

♜Strategic Management: Resilience (Definitions)

29 March 2016

♜Strategic Management: Decision-Making (Definitions)

17 June 2011

💎SQL Reloaded: Pulling the Strings of SQL Server VII (List of Values)

Introduction

Composition

11 March 2011

💎SQL Reloaded: Pulling the Strings of SQL Server VIII (Insertions, Deletions and Replacements)

About Me