
26 March 2025

💠🏭🗒️Microsoft Fabric: Polaris SQL Pool [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources and may deviate from them. Please consult the sources for the exact content!

Unfortunately, besides the referenced papers, there's almost no material that could be used to enhance the understanding of the concepts presented. 

Last updated: 26-Mar-2025

Read and Write Operations in Polaris [2]

[Microsoft Fabric] Polaris SQL Pool

  • {def} distributed SQL query engine that powers Microsoft Fabric's data warehousing capabilities
    • designed to unify data warehousing and big data workloads while separating compute and state for seamless cloud-native operations
    • based on a robust DCP 
      • designed to execute read-only queries in a scalable, dynamic and fault-tolerant way [1]
      • a highly-available micro-service architecture with well-defined responsibilities [2]
        • data and query processing is packaged into units (aka tasks) 
          • can be readily moved across compute nodes and re-started at the task level
        • widely-partitioned data with a flexible distribution model [2]
        • a task-level "workflow-DAG" that is novel in spanning multiple queries [2]
        • a framework for fine-grained monitoring and flexible scheduling of tasks [2]
  • {component} SQL Server Front End (SQL-FE)
    • responsible for 
      • compilation
      • authorization
      • authentication
      • metadata
        • used by the compiler to 
          • {operation} generate the search space (aka MEMO) for incoming queries
          • {operation} bind metadata to data cells
          • leveraged to ensure the durability of the transaction manifests at commit [2]
            • only transactions that successfully commit need to be actively tracked to ensure consistency [2]
            • any manifests and data associated with aborted transactions are systematically garbage-collected from OneLake through specialized system tasks [2]
  • {component} SQL Server Backend (SQL-BE)
    • used to perform write operations on the LST [2]
      • inserting data into a LST creates a set of Parquet files that are then recorded in the transaction manifest [2]
      • a transaction is represented by a single manifest file that is modified concurrently by (one or more) SQL BEs [2]
        • SQL BE leverages the Block Blob API provided by ADLS to coordinate the concurrent writes  [2]
        • each SQL BE instance serializes the information about the actions it performed, either adding a Parquet file or removing it [2]
          • the serialized information is then uploaded as a block to the manifest file
          • uploading the block does not yet make any visible changes to the file [2]
            • each block is identified by a unique ID generated on the writing SQL BE [2]
        • after completion, each SQL BE returns the ID of the block(s) it wrote to the Polaris DCP [2]
          • the block IDs are then aggregated by the Polaris DCP and returned to the SQL FE as the result of the query [2]
      • the SQL FE further aggregates the block IDs and issues a Commit Block operation against storage with the aggregated block IDs [2]
        • at this point, the changes to the file on storage will become effective [2]
      • changes to the manifest file are not visible until the Commit operation on the SQL FE
        • the Polaris DCP can freely restart any part of the operation in case there is a failure in the node topology [2]
      • the IDs of any blocks written by previous attempts are not included in the final list of block IDs and are discarded by storage [2]
    • [read operations] SQL BE is responsible for reconstructing the table snapshot based on the set of manifest files managed in the SQL FE
      • the result is the set of Parquet data files and deletion vectors that represent the snapshot of the table [2]
        • queries over these are processed by the SQL Server query execution engine [2]
        • the reconstructed state is cached in memory and organized in such a way that the table state can be efficiently reconstructed as of any point in time [2]
          • enables the cache to be used by different operations operating on different snapshots of the table [2]
          • enables the cache to be incrementally updated as new transactions commit [2]
  • {feature} supports explicit user transactions (see the T-SQL sketch after these notes)
    • can execute multiple statements within the same transaction in a consistent way
      • the manifest file associated with the current transaction captures all the (reconciled) changes performed by the transaction [2]
        • changes performed by prior statements in the current transaction need to be visible to any subsequent statement inside the transaction (but not outside of the transaction) [2]
    • [multi-statement transactions] in addition to the committed set of manifest files, the SQL BE reads the manifest file of the current transaction and then overlays these changes on the committed manifests [1]
    • {write operations} the behavior of the SQL BE depends on the type of the operation.
      • insert operations 
        • only add new data and have no dependency on previous changes [2]
        • the SQL BE can serialize the metadata blocks holding information about the newly created data files just like before [2]
        • the SQL FE, instead of committing only the IDs of the blocks written by the current operation, will instead append them to the list of previously committed blocks
          • ⇐ effectively appends the data to the manifest file [2]
    • {update|delete operations} 
      • handled differently 
        • ⇐ since they can potentially further modify data already modified by a prior statement in the same transaction [2]
          • e.g. an update operation can be followed by another update operation touching the same rows
        • the final transaction manifest should not contain any information about the parts from the first update that were made obsolete by the second update [2]
      • SQL BE leverages the partition assignment from the Polaris DCP to perform a distributed rewrite of the transaction manifest to reconcile the actions of the current operation with the actions recorded by the previous operation [2]
        • the resulting block IDs are sent again to the SQL FE where the manifest file is committed using the (rewritten) block IDs [2]
  • {concept} Distributed Query Processor (DQP)
    • responsible for 
      • distributed query optimization
      • distributed query execution
      • query execution topology management
  • {concept} Workload Management (WLM)
    •  consists of a set of compute servers that are, simply, an abstraction of a host provided by the compute fabric, each with a dedicated set of resources (disk, CPU and memory) [2]
      • each compute server runs two micro-services
        • {service} Execution Service (ES) 
          • responsible for tracking the life span of tasks assigned to a compute container by the DQP [2]
        • {service} SQL Server instance
          • used as the back-bone for execution of the template query for a given task  [2]
            • ⇐ holds a cache on top of local SSDs 
              • in addition to in-memory caching of hot data
            • data can be transferred from one compute server to another
              • via dedicated data channels
              • the data channel is also used by the compute servers to send results to the SQL FE that returns the results to the user [2]
              • the life cycle of a query is tracked via control flow channels from the SQL FE to the DQP, and the DQP to the ES [2]
  • {concept} cell data abstraction
    • the key building block that enables abstracting data stores
      • abstracts DQP from the underlying store [1]
      • any dataset can be mapped to a collection of cells [1]
      • allows distributing query processing over data in diverse formats [1]
      • tailored for vectorized processing when the data is stored in columnar formats [1] 
      • further improves relational query performance
    • 2-dimensional
      • distributions (data alignment)
      • partitions (data pruning)
    • each cell is self-contained with its own statistics [1]
      • used for both global and local QO [1]
      • cells can be grouped physically in storage [1]
      • queries can selectively reference either cell dimension or even individual cells depending on predicates and type of operations present in the query [1]
    • {concept} distributed query processing (DQP) framework
      • operates at the cell level 
      • agnostic to the details of the data within a cell
        • data extraction from a cell is the responsibility of the (single node) query execution engine, which is primarily SQL Server, and is extensible for new data types [1], [2]
  • {concept} dataset
    • logically abstracted as a collection of cells [1] 
    • can be arbitrarily assigned to compute nodes to achieve parallelism [1]
    • uniformly distributed across a large number of cells 
      • [scale-out processing] each dataset must be distributed across thousands of buckets or subsets of data objects, such that they can be processed in parallel across nodes
  • {concept} session
    • supports a spectrum of consumption models, ranging from serverless ad-hoc queries to long-standing pools or clusters [1]
    • all data are accessible from any session [1]
      • multiple sessions can access all underlying data concurrently  [1]
  • {concept} Physical Metadata layer
    • new layer introduced in the SQL Server storage engine [2]
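
The multi-statement behavior described under {feature} explicit user transactions can be illustrated with a minimal T-SQL sketch against a Fabric warehouse (table, columns and values are my own assumptions, not from the papers): each statement sees the changes made by prior statements of the same transaction, while other sessions see nothing until the commit publishes the reconciled manifest.

-- illustrative multi-statement transaction (table, columns and values assumed)
BEGIN TRANSACTION;

INSERT INTO dbo.SalesOrders (OrderID, OrderStatus)
VALUES (1001, 'New');

-- the update sees the row inserted above, though only inside this transaction
UPDATE dbo.SalesOrders
SET OrderStatus = 'Processed'
WHERE OrderID = 1001;

-- only at commit do the reconciled changes become visible to other sessions
COMMIT TRANSACTION;
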
See also: Polaris

References:
[1] Josep Aguilar-Saborit et al (2020) POLARIS: The Distributed SQL Engine in Azure Synapse, Proceedings of the VLDB Endowment PVLDB 13(12) [link]
[2] Josep Aguilar-Saborit et al (2024), Extending Polaris to Support Transactions [link]
[3] Gjnana P Duvvuri (2024) Microsoft Fabric Warehouse Deep Dive into Polaris Analytic Engine [link]

Resources:
[R1] Microsoft Learn (2025) Fabric: What's new in Microsoft Fabric? [link]
[R2] Patrick Pichler (2023) Data Warehouse (Polaris) vs. Data Lakehouse (Spark) in Microsoft Fabric [link]
[R3] Tiago Balabuch (2023) Microsoft Fabric Data Warehouse - The Polaris engine [link]

Acronyms:
CPU - Central Processing Unit
DAG - Directed Acyclic Graph
DB - Database
DCP - Distributed Computation Platform 
DQP - Distributed Query Processing 
DWH - Data Warehouses 
ES - Execution Service
LST - Log-Structured Table
SQL BE - SQL Backend
SQL FE - SQL Frontend
SSD - Solid State Disk
WAL - Write-Ahead Log
WLM - Workload Management

🏭🗒️Microsoft Fabric: External Data Sharing [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 26-Mar-2025

External data sharing [1]

[Microsoft Fabric] External data sharing 

  • {def} feature that enables Fabric users to share data from their tenant with users in another Fabric tenant (aka cross-tenant sharing) [1]
    • the data is shared in-place from OneLake storage locations in the sharer's tenant [1]
      • ⇒ no data is actually copied to the other tenant [1]
      • creates a OneLake shortcut in the other tenant that points back to the original data in the sharer's tenant [1]
      • data is exposed as read-only [1]
      • data can be consumed by any OneLake compatible Fabric workload in that tenant [1]
    • {benefit} allows for efficient and secure data sharing without duplicating data
      • the shared data remains read-only for the consumer, ensuring data integrity and consistency [2]
      • multiple tables and folders can be shared at once [2]
  • {prerequisite} Fabric admins must turn on external data sharing both in the sharer's tenant and in the external tenant
    • by specifying who can create and accept external data shares [1]
    • users can share data residing in tables or files within supported Fabric items [1]
      • require standard Fabric read and reshare permissions for the item being shared [1]
      • the user creating the share invites a user from another tenant with a link to accept the external data share [1]
        • upon accepting the share, the recipient chooses a lakehouse where a shortcut to the shared data will be created [1]
        • the links work only for users in external tenants
          • for sharing data within the same OneLake storage accounts with users in the same tenant, use OneLake shortcuts [1]
        • {limitation} shortcuts contained in folders that are shared via external data sharing won't resolve in the consumer tenant [1]
    • access is enabled via a dedicated Fabric-to-Fabric authentication mechanism 
    • ⇐ doesn’t require Entra B2B guest user access [1]
  • {operation} create an external data share (in the provider tenant)
    • external data shares can be created for tables or files in lakehouses, warehouses, KQL databases, SQL databases, and mirrored databases [1]
    • {limitation} the sharer can't control who has access to the data in the consumer's tenant [1]
  • {operation} accept an external data share (in the consuming tenant)
    • only lakehouses can be chosen for the operation
    • the consumer can grant access to the data to anyone [1]
      • incl. guest users from outside the consumer's organization [1]
    • data can be transferred across geographic boundaries when it's accessed within the consumer's tenant [1]
  • {operation} revoke external data shares
    • any user in the sharing tenant with read and reshare permissions on an externally shared item can revoke the external data share at any time [1]
      • via Manage permissions >> External data shares tab
      • can be performed on any item the user has read and reshare permissions on [3]
      • {warning} a revoked external data share can't be restored [3]
        • irreversibly severs all access from the receiving tenant to the shared data [3]
        • a new external data share can be created instead [3]
  • applies to
    • lakehouse
      • an entire lakehouse schema can be shared [2]
        • shares all the tables in the schema [2]
        • any changes to the schema are immediately reflected in the consumer’s lakehouse [2]
    • mirrored database
    • KQL database
    • OneLake catalog
  • can be consumed via 
    • Spark workloads
      • notebooks or Spark
    • lakehouse SQL Analytics Endpoint
    • semantic models
    • ⇐ data can be shared from a provider and consumed in-place via SQL queries or in a Power BI report [2] (see the query sketch after these notes)
  • {feature} external data sharing APIs
    • support service principals for admin and user operations
      • can be used to automate the creation or management of shares [2]
        • supports service principals or managed identity [2]
    • {planned} data warehouse support
      • share schemas or tables from a data warehouse [2]
    • {planned} shortcut sharing
      • share OneLake shortcuts using external data sharing [2]
      • ⇒ data residing outside of Fabric will be externally shareable
        • S3, ADLS Gen2, or any other supported shortcut location [2]
    • {planned} consumer APIs
      • consumer activities, including viewing share details and accepting shares, will soon be available via API [2]
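
As mentioned above, once a share was accepted into a lakehouse, the shared data can be queried in-place like any other table, e.g. via the lakehouse SQL Analytics Endpoint. A minimal sketch (the lakehouse, schema and table names are my own assumptions):

-- querying externally shared data via the SQL Analytics Endpoint (names assumed)
SELECT TOP 100 *
FROM ConsumerLakehouse.dbo.SharedCustomers;
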

References:
[1] Microsoft Learn (2024) External data sharing in Microsoft Fabric [link]
[2] Microsoft Fabric Updates Blog (2025) External data sharing enhancements out now [link]
[3] Microsoft Learn (2024) Fabric: Manage external data shares [link]

Resources:
[R1] Analytics on Azure Blog (2024) External Data Sharing With Microsoft Fabric [link]
[R2] Microsoft Learn (2025) Fabric: What's new in Microsoft Fabric? [link]

Acronyms:
ADLS - Azure Data Lake Storage
API - Application Programming Interface
B2B - business-to-business
KQL - Kusto Query Language

09 February 2025

🏭🗒️Microsoft Fabric: Sharding [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 9-Feb-2025

 [Microsoft Fabric] Data Partitioning (aka Sharding)

  • {definition} "a process where small chunks of the database are isolated and can be updated independently of other shards" [2]
  • allows a logical database to be partitioned across multiple physical servers [1]
    • each partition is referred to as a shard
    • the largest tables are partitioned across multiple database servers [1]
      • when operating on a record, the application must determine which shard will contain the data and then send the SQL to the appropriate server [1]
    • partitioning is based on a Key Value
      • e.g. a user ID
  • proven technique for achieving data processing on a massive scale [1]
    • solution used at the largest websites 
      • e.g. Facebook, Twitter
      • usually associated with rapid growth
        •  ⇒ the approach needs to be dynamic [1]
    • the only way to scale a relational database to massive web use [1]
      • together with caching and replication [1]
  • {drawback} involves significant operational complexities and compromises [1]
    • the application must contain logic that understands the location of any particular piece of data and the logic to route requests to the correct shard [1] (see the routing sketch after these notes)
    • requests that can only be satisfied by accessing more than one shard thus need complex coding as well, whereas on a nonsharded database a single SQL statement might suffice.
  • {drawback} high operational costs [1]
  • {drawback} application complexity
    • it’s up to the application code to route SQL requests to the correct shard [1]
      • ⇒ a dynamic routing layer must be implemented
        • ⇐ most massive websites are adding shards as they grow [1]
        • layer required to maintain Memcached object copies and to differentiate between the master database and read-only replicas [1]
  • {drawback} crippled SQL
    • [sharded database] it is not possible to issue a SQL statement that operates across shards [1]
      • ⇒ usually SQL statements are limited to row-level access [1]
      • ⇒ only programmers can query the database as a whole [1]
      • joins across shards cannot be implemented, nor can aggregate GROUP BY operations [1]
  • {drawback} loss of transactional integrity
    • ACID transactions against multiple shards are not possible and/or not practical [1]
      • ⇐ {exception} there are database systems that support 2PC
        • involves considerable troubleshooting as conflicts and bottlenecks can occur [1]
  • {drawback} operational complexity. 
    • load balancing across shards becomes extremely problematic
      • adding new shards requires a complex rebalancing of data [1]
      • changing the database schema requires a rolling operation across all the shards [1]
        • ⇒ can lead to transitory inconsistencies in the schema [1]
    • a sharded database entails a huge amount of operational effort and administrator skill [1]
  • {concept} CAP (Consistency, Availability, Partition tolerance) theorem 
    • in a distributed database system, one can have at most two of the three properties [1]
    • consistency
      • every user of the database has an identical view of the data at any given instant [1]
    • availability
      • in the event of a failure, the database remains operational [1]
    • partition tolerance
      • the database can maintain operations in the event of the network’s failing between two segments of the distributed system [1]
  • {concept} partitioning
    • {def} core pattern of building scalable services by dividing state (data) and compute into smaller accessible units to improve scalability and performance [5]
      • ⇐ determines that a particular service partition is responsible for a portion of the complete state of the service.
        • ⇐ a partition is a set of replicas
    • {type} [stateless services] a logical unit that contains one or more instances of a service [5]
      • partitioning a stateless service is a very rare scenario
      • scalability and availability are normally achieved by adding more instances
      • {subtype} externally persisted state
        • persists its state externally [5]
          • e.g. databases in Azure SQL Database
      • {subtype} computation-only services
        • services that do not manage any persistent state, e.g. a calculator or image thumbnailing [5]
    • {type} scalable stateful services
      • partition state (data)
      • a partition of a stateful service is a scale unit that is highly reliable through replicas that are distributed and balanced across the nodes in a cluster [5]
      • the state must be accessed and stored
        • ⇒ bound by
          • network bandwidth limits
          • system memory limits
          • disk storage limits
        • {scenario} run into resource constraints in a running cluster
          • {recommendation} scale out the cluster to accommodate the new requirements [4]
  • {concept} [Azure Service Fabric] distributed systems platform used to build hyper-scalable, reliable and easily managed applications for the cloud [6]
    • ⇐ addresses the significant challenges in developing and managing cloud applications
    • places the partitions on different nodes [5]
      • allows partitions to grow to a node's resource limit
        • ⇐ partitions are rebalanced across nodes [5]
      • {benefit} ensures the continued efficient use of hardware resources [5]
    • {default} makes sure that there is about the same number of primary and secondary replicas on each node
      • ⇒ some nodes may end up serving more traffic than others [5]
        • hot and cold spots may appear in a cluster
          • ⇐ this should preferably be avoided
        • {recommendation} partition the state so it is evenly distributed across all partitions [5]
        • {recommendation} report load from each of the replicas for the service [5]
    • provides the capability to report load consumed by services [5]
      • e.g.  amount of memory, number of records
      • detects which partitions serve higher loads than others [5]
        • ⇐ based on the metrics reported
      • rebalances the cluster by moving replicas to more suitable nodes, so that overall no node is overloaded  [5]
      • ⇐ it's not always possible to know how much data will be in a given partition
        • {recommendation} adopt a partitioning strategy that spreads the data evenly across the partitions [5]
          • {benefit} prevents situations described in the voting example [5]
        • {recommendation} report load
          • {benefit} helps smooth out temporary differences in access or load over time [5]
    • {recommendation} choose an optimal number of partitions to begin with
      • ⇐ there's nothing that prevents from starting out with a higher number of partitions than anticipated [5]
        • ⇐ assuming the maximum number of partitions is a valid approach [5]
      • ⇒ one may end up needing more partitions than initially considered [5]
        • ⇐ {constraint} the partition count can't be changed after the fact [5]
          •  ⇒ apply more advanced partitioning approaches
            • e.g. creating a new service instance of the same service type
            • e.g. implement client-side logic that routes the requests to the correct service instance
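
As a minimal illustration of the key-based routing mentioned above, the application-side logic boils down to mapping the sharding key to a shard number, e.g. by hashing. The sketch below (my own, not from the sources) computes which of @ShardCount shards a given user ID lands on; the application would then route the SQL statement to the corresponding server:

-- illustrative hash-based shard routing (names and shard count assumed)
DECLARE @UserID int = 12345;
DECLARE @ShardCount int = 8;

SELECT ABS(CHECKSUM(@UserID)) % @ShardCount ShardNumber;
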

References:
[1] Guy Harrison (2015) Next Generation Databases: NoSQL, NewSQL, and Big Data
[2] DAMA International (2017) "The DAMA Guide to the Data Management Body of Knowledge" 2nd Ed
[3] Microsoft Learn (2024) External data sharing in Microsoft Fabric [link]
[4] Microsoft Learn (2024) Data sharding policy [link]
[5] Microsoft Learn (2024) Partition Service Fabric reliable services [link]
[6] MSDN (2015) Microsoft Azure - Azure Service Fabric and the Microservices Architecture [link]

Resources:
[R1] Microsoft Learn (2025) Fabric: What's new in Microsoft Fabric? [link]

Acronyms:
ACID - atomicity, consistency, isolation, durability
2PC - Two Phase Commit
CAP - Consistency, Availability, Partition tolerance

05 March 2024

🧭Business Intelligence: Data Culture (Part I: Generative AI - No Silver Bullet)

Business Intelligence
Business Intelligence Series

Talking about holy grails in Data Analytics, another topic of major importance for an organization’s "infrastructure" is data culture, which can be defined as the collective beliefs, values, behaviors, and practices of an organization’s employees in harnessing the value of data for decision-making, operations, or insight. Rooted in data literacy, data culture is an extension of an organization’s culture with respect to data, acting as an enabler in harnessing the value of data. It’s about thinking critically about data and about how data is used to create value. 

The current topic was suggested by PowerBI.tips’s webcast from today [3] and is based on Brent Dykes’ article from Forbes ‘Why AI Isn’t Going to Solve All Your Data Culture Problems’ [1]. Dykes’ starting point for the discussion is Wavestone's annual data executive survey, according to which the number of companies that reported they had "created a data-driven organization" rose sharply from 23.9 percent in 2023 to 48.1 percent in 2024 [2]. The report’s authors concluded that the result is driven by the adoption of Generative AI, i.e. the capability of OpenAI-like tools to generate context-dependent meaningful text, images, and other content in response to prompts. 

I agree with Dykes that AI technologies can’t be a silver bullet for an organization’s data culture given that AI either replaces people’s behaviors or augments existing ones, being thus a substitute and not a cure [1]. Even for a disruptive technology like Generative AI, it’s impossible to change employees’ mindsets so much in such a short period of time. Typically, a data culture matures over years with sustained effort. Therefore, the argument that the increase is due to respondents’ false perception is more than plausible. There’s indeed a big difference between thinking about an organization as being data-driven and being data-driven. 

The three-question evaluation considered in the article addresses this difference between thinking and being. Changes in data culture don’t occur just because some people or metrics say so, but when people change their mental models based on data, when interpersonal relations change, when the whole dynamics within the organization changes (positively). If people continue the same behaviors and practices, then chances are high that no change occurred besides the Brownian movement of employees in a confined space, which is just chaotic motion.  

Indeed, a data culture should encourage discovery, exploration, collaboration, and discussion [1], respectively knowledge sharing, and should make people more receptive and responsive to environmental or circumstantial changes. However, just involving leadership and having things prioritized and funded is not enough, no matter how powerful the drive. These can act as enablers, though more important is to awaken and guide people’s interest, working on people’s motivation and supporting the learning process through mentoring. No amount of brute force can make a mind move and evolve freely unless the mind is driven by an inborn curiosity!

Driving a self-driving car doesn’t make one a better driver. Technology should challenge people and expand their understanding of how data can be used in different contexts rather than give solutions based on a mass of texts available as input. This is how people grow meaningfully and how an organization’s culture expands. Readily available answers make people dull and dependent on technology, which in the long term can create more problems. Technology can solve problems when used creatively, when problems and their context are properly understood, and the solutions customized accordingly.

Unfortunately, for many organizations data culture will be just a topic to philosophize about. Data culture implies a change of mindset, perception, mental models, behavior, and practices based on data, and not only consulting the data to confirm one’s biases on how the business operates!


Resources:
[1] Forbes (2024) Why AI Isn’t Going To Solve All Your Data Culture Problems, by Brent Dykes (link)
[2] Wavestone (2024) 2024 Data and AI Leadership Executive Survey (link)
[3] Power BI tips (2024) Ep.299: AI & Data Culture Problems (link)

21 December 2016

♟️Strategic Management: Operations (Just the Quotes)

"Strategy, or the art of properly directing masses upon the theater of war, either for defense or for invasion. […] Strategy is the art of making war upon the map, and comprehends the whole theater of operations. Grand Tactics is the art of posting troops upon the battle-field according to the accidents of the ground, of bringing them into action, and the art of fighting upon the ground, in contradistinction to planning upon a map. Its operations may extend over a field of ten or twelve miles in extent. Logistics comprises the means and arrangements which work out the plans of strategy and tactics. Strategy decides where to act; logistics brings the troops to this point; grand tactics decides the manner of execution and the employment of the troops." (Antoine-Henri Jomini, "The Art of War", 1838)

"A policy therefore might be likened to strategy, the broad, overall, long term conception which gives direction and purpose to the tactics of immediately daily operations and decisions." (Lawrence K. Frank, "National Policy for the Family", 1948)

"Essential to organization planning, then, is the search for an ideal form of organization to reflect the basic goals of the enterprise. This entails not only charting the main lines of organization and reflecting the organizational philosophy of the enterprise leaders (e.g., shall authority be as centralized as possible, or should the company try to break its operations down into semiautonomous product or territorial divisions?), but also a sketching out of authority relationships throughout the structure." (Harold Koontz & Cyril O Donnell, "Principles of Management", 1955)

"The concern of OR with finding an optimum decision, policy, or design is one of its essential characteristics. It does not seek merely to define a better solution to a problem than the one in use; it seeks the best solution... [It] can be characterized as the application of scientific methods, techniques, and tools to problems involving the operations of systems so as to provide those in control of the operations with optimum solutions to the problems." (C West Churchman et al, "Introduction to Operations Research", 1957)

"The essential task of management is to arrange organizational conditions and methods of operations so that people can achieve their own goals best by directing their own efforts toward organizational objectives." (Douglas McGregor, "The Human Side of Enterprise", 1960)

"Now we are looking for another basic outlook on the world - the world as organization. Such a conception - if it can be substantiated - would indeed change the basic categories upon which scientific thought rests, and profoundly influence practical attitudes. This trend is marked by the emergence of a bundle of new disciplines such as cybernetics, information theory, general system theory, theories of games, of decisions, of queuing and others; in practical applications, systems analysis, systems engineering, operations research, etc. They are different in basic assumptions, mathematical techniques and aims, and they are often unsatisfactory and sometimes contradictory. They agree, however, in being concerned, in one way or another, with ‘systems’, ‘wholes’ or ‘organizations’; and in their totality, they herald a new approach." (Ludwig von Bertalanffy, "General System Theory", 1968)

"The dogma of delegation is simple - the Sixth Truth of Management again: either the delegatee is capable of running the operation successfully by himself or he isn't. This handy formula relieves the top executive of any responsibility except that of finding, supervising, and (at the appropriate time) moving the men who are doing all the work. He Can then truly manage by exception: he does not get worked up over operations that are going well, but concentrates on the plague spots, where everything, including the management, is going badly." (Robert Heller, "The Naked Manager: Games Executives Play", 1972)

"Perhaps the fault [for the poor implementation record for models] lies in the origins of managerial model-making - the translation of methods and principles of the physical sciences into wartime operations research. [...] If hypothesis, data, and analysis lead to proof and new knowledge in science, shouldn’t similar processes lead to change in organizations? The answer is obvious-NO! Organizational changes (or decisions or policies) do not instantly pow from evidence, deductive logic, and mathematical optimization." (Edward B Roberts, "Interface", 1977)

"Management science [operations research] also involves the philosophy of approaching a problem in a logical manner (i.e., a scientific approach). The logical, consistent, and systematic approach to problem solving can be as useful (and valuable) as the knowledge of the mechanics of the mathematical techniques themselves." (Bernard W. Taylor III, "Introduction to Management Science, 1986)

"Strategy means abstract thinking and planning, as opposed to tactics, which are the individual operations used to implement strategy. Tactics are specific; strategy is general. Tactics tend to be immediate, strategy long-term." (Bruce Pandolfini, "Weapons of Chess: An omnibus of chess strategy", 1989)

"At the heart of reengineering is the notion of discontinuous thinking - of recognizing and breaking away from the outdated rules and fundamental assumptions that underlie operations. Unless we change these rules, we are merely rearranging the deck chairs on the Titanic. We cannot achieve breakthroughs in performance by cutting fat or automating existing processes. Rather, we must challenge old assumptions and shed the old rules that made the business underperform in the first place." (Michael M Hammer, "Reengineering Work: Don't Automate, Obliterate", Magazine, 1990)

"Even with simple and usable models, most organizations will need to upgrade their analytical skills and literacy. Managers must come to view analytics as central to solving problems and identifying opportunities - to make it part of the fabric of daily operations." (Dominic Barton & David Court, "Making Advanced Analytics Work for You", 2012)

"DevOps recognizes the importance of culture. The acronym CAMS (culture, automation, measurement, and sharing) is used to encapsulate its key themes. Culture is acknowledged as all important in making development and IT operations work together effectively. But what is culture in this context? It is not so much about an informal dress code, flexible hours, or a free in-house cafeteria as it is about how decisions are taken, norms of behavior, protocols of communication, and the ways of navigating hierarchy and bureaucracy to get things done." (Sriram Narayan, "Agile IT Organization Design: For Digital Transformation and Continuous Delivery", 2015)

11 June 2016

♜Strategic Management: Resilience (Definitions)

"The ability to recover from challenges or to overcome obstacles. In a social-ecological context this refers to the innovation capacity of the organization to successfully address societal and environmental challenges." (Rick Edgeman & Jacob Eskildsen, "Social-Ecological Innovation", 2014)

"The quality of being able to absorb systemic 'shocks' without being destroyed even if recovery produces an altered state to that of the status quo ante." (Philip Cooke, "Regional Innovation Systems in Centralised States: Challenges, Chances, and Crossovers", 2015)

"The ability of an organization to quickly adapt to disruptions while maintaining continuous business operations and safeguarding people, assets, and overall brand equity. Business resilience goes a step beyond disaster recovery, by offering post-disaster strategies to avoid costly downtime, shore up vulnerabilities, and maintain business operations in the face of additional, unexpected breaches." (William Stallings, "Effective Cybersecurity: A Guide to Using Best Practices and Standards", 2018)

"A capability to anticipate, prepare for, respond to, and recover from significant multi-hazard threats with minimum damage to social well-being, the economy, and the environment." (Carolyn N Stevenson, "Addressing the Sustainable Development Goals Through Environmental Education", 2019)

"The ability of a project to readily resume from unexpected events, threats or actions." (Phil Crosby, "Shaping Mega-Science Projects and Practical Steps for Success", 2019)

"The ability of an infrastructure to resist, respond and overcome adverse events" (Konstantinos Apostolou et al, "Business Continuity of Critical Infrastructures for Safety and Security Incidents", 2020)

"The capacity to respond to, adapt and learn from stressors and changing conditions." (Naomi Borg & Nader Naderpajouh, "Strategies for Business Sustainability in a Collaborative Economy", 2020)

"The word resilience refers to the ability to overcome critical moments and adapt after experiencing some unusual and unexpected situation. It also indicates return to normal." (José G Vargas-Hernández, "Urban Socio-Ecosystems Green Resilience", 2021)

"Operational resilience is a set of techniques that allow people, processes and informational systems to adapt to changing patterns. It is the ability to alter operations in the face of changing business conditions. Operationally resilient enterprises have the organizational competencies to ramp up or slow down operations in a way that provides a competitive edge and enables quick and local process modification." (Gartner)

[Operational resilience:] "The ability of an organization to absorb the impact of any unexpected event without failing to deliver on its brand promise." (Forrester)

[Business resilience:] "The ability to thrive in the face of unpredictable events and circumstances without deteriorating customer experience or sacrificing the long-term viability of the company." (Forrester)

29 March 2016

♜Strategic Management: Decision-Making (Definitions)

[decision-making:] "The process of making choices in a project team environment. Several types of decision-making are useful in projects: consensus, leader-imposed, delegated, voting, and scoring models." (Timothy J  Kloppenborg et al, "Project Leadership", 2003)

[semistructured decisions:] "Decisions in which only some of the phases are structured; require a combination of standard solution procedures and individual judgment." (Linda Volonino & Efraim Turban, "Information Technology for Management 8th Ed", 2011)

[strategic decision:] "refers to a decision that exhibits the following characteristics: it is made in a situation of uncertainty, of incomplete information, in a complex environment, variable/mutating environment (as opposed to 'all things being otherwise equal'); it is not recurrent, therefore the decision maker is relatively deprived; it may have far-reaching (favorable or adverse) consequences that could jeopardize the survivability of the enterprise; it is systemic (many elements with many relationships among them); the decision maker does not have experience-proven models (we cannot resort to 'turnkey' mechanisms). " (Humbert Lesca & Nicolas Lesca, "Weak Signals for Strategic Intelligence: Anticipation Tool for Managers", 2011)

[strategic decisions:] "Decisions for sustained enterprise success and business growth." (Linda Volonino & Efraim Turban, "Information Technology for Management 8th Ed", 2011)

[tactical decisions:] "Decisions ensuring that existing operations and processes are in alignment with business objectives and strategies." (Linda Volonino & Efraim Turban, "Information Technology for Management 8th Ed", 2011)

[decision-making processes:] "Management processes that define objectives, study alternatives, analyze available data, and reflect on intuitive beliefs. They interpret findings and compare alternates to form a conclusion or make a choice upon which the organization may act." (Carl F Lehmann, "Strategy and Business Process Management", 2012)

[microdecision:] "A small decision made many times by many workers at the front line of the organization. They usually have a significant impact on organizational performance due to their sheer volume." (Evan Stubbs, "Delivering Business Analytics: Practical Guidelines for Best Practice", 2013)

[decision-making:] "How decisions are made, based on what types of resources, information, and specific processes are available." (Jim Davis & Aiman Zeid, "Business Transformation: A Roadmap for Maximizing Organizational Insights", 2014)

[decision-making] "the process of making choices or reaching conclusions, especially on important political or business matters." (Ken Sylvester, "Negotiating in the Leadership Zone", 2015)

[tactical decisions:] "broader decision questions than operational ­decisions, semistructured in nature, some but not all information ­necessary to make the decision is available, primarily internally focused and made by middle-level managers." (Daniel J. Power & Ciara Heavin, "Data-Based Decision Making and Digital Transformation", 2018)

[operating or function-specific decisions:] "day-to-day, routine ­decisions with a concise decision question and a clear, well-defined, and structured algorithm to make a choice among alternatives." (Daniel J. Power & Ciara Heavin, "Data-Based Decision Making and Digital Transformation", 2018)

[strategic decisions:] "complex, nonroutine, unstructured decisions involving many different and connected parts. Some variables may not be well understood, often information required to make the decision may be unavailable, incomplete, and in some situations information may be known to be flawed or inaccurate. These decisions usually involve a high degree of uncertainty about outcomes. If implemented, strategic ­decisions often result in major changes in an organization." (Daniel J. Power & Ciara Heavin, "Data-Based Decision Making and Digital Transformation", 2018)

17 June 2011

💎SQL Reloaded: Pulling the Strings of SQL Server VII (List of Values)

Introduction

    Lists are one of the basic structures in Mathematics, the term referring to an (ordered) set of elements separated by comma, space or any other delimiter (e.g. “:”, “;”). The elements of a list can be numbers, words, functions, or any other type of objects. In the world of databases, a list is typically formed out of the values of a given column or a given record, however it could also span a combination of rows and records, in such cases two delimiters being needed – one for the column and one for the row. From here comes probably the denomination of list of values. In a more general sense, a list of values could be regarded as a delimited/concatenated subset. Such lists are formed when data needs to be sent between the layers of an application or between applications, this type of encoding being quite natural. In fact, the data in a database are stored in a similar tabular delimited structure, though a more complex one.  

    A useful example in which lists of values are quite handy is the passing of multiple values within the parameter of a stored procedure or function (see example). This supposes first building the list and then using the values in a dynamically built query (like in the before-mentioned example) or by building a table on the fly. We can call the two operations composition, respectively decomposition of lists of values.

Composition 

Composition, whether on vertical or horizontal, is nothing but a concatenation in which the values alternate with one or more delimiters. Let’s reconsider the concatenation based on the values of the Person.AddressType table from the AdventureWorks database. As the logic for concatenating one or more attributes is the same, the below example concatenates a list based on a single attribute, namely AddressTypeID, in SingleList, respectively on two attributes, AddressTypeID and Name, in DoubleList.

-- concatenation of values across a table 
;WITH CTE (AddressTypeID, Name, Ranking) 
AS (--preparing the data       
     SELECT AddressTypeID  
     , Name 
     , ROW_NUMBER () OVER(ORDER BY Name) Ranking 
     FROM Person.AddressType 
     -- WHERE ... 
) 
, DAT (SingleList, DoubleList, Ranking) 
AS ( -- concatenating the values 
     SELECT Cast(AddressTypeID as varchar(max)) SingleList 
     , Cast('('+ Cast(AddressTypeID as varchar(10)) + ',''' + Name + ''')' as varchar(max)) DoubleList 
     , Ranking 
     FROM CTE 
     WHERE Ranking = 1 
     UNION ALL 
     SELECT DAT.SingleList + ',' + Cast(CTE.AddressTypeID as varchar(20)) SingleList 
    , Cast(DAT.DoubleList + ', ('+ Cast(CTE.AddressTypeID as varchar(10)) + ',''' + CTE.Name + ''')' as varchar(max)) DoubleList 
    , CTE.Ranking  
     FROM CTE          
       JOIN DAT           
          ON CTE.Ranking = DAT.Ranking + 1       
)  

-- the lists 
SELECT SingleList 
, DoubleList 
FROM DAT 
WHERE Ranking = (SELECT MAX(Ranking) FROM DAT) 

 List of values - concatenation
 

   The second example is based on atypical delimiters, resembling the structure built for a batch insert or a table value constructor-based statement, and, as we’ll see later, ideal to be used in a dynamically-built query.

Decomposition

Decomposition follows the inverse path, though it’s much easier to exemplify. In fact, it uses the same technique introduced in the last example from the previous post belonging to the same cycle, Subparts of a String, in which a space was used as delimiter. Another example is the dbo.SplitList function which decomposes a string using a loop.


-- decomposition of a string to a table using CTE 
CREATE FUNCTION dbo.StringToTable( 
 @str varchar(500) 
,@Delimiter char(1)) 
RETURNS @Temp TABLE ( 
Id int NOT NULL 
,Value varchar(50)) 
AS 
BEGIN  
     ;WITH CTE (PrevString, Position, Word)  
     AS (  
     SELECT LTrim(RTrim( CASE  
           WHEN CharIndex(@Delimiter, @str)>0 THEN Right(@str, Len(@str)-CharIndex(@Delimiter, @str))  
           ELSE ''  
      END)) PrevString  
     , 1 Position  
     , LTrim(RTrim(CASE  
           WHEN CharIndex(@Delimiter, @str)>0 THEN LEFT(@str, CharIndex(@Delimiter, @str)-1)  
           ELSE @str  
       END)) Word  
      UNION ALL  
      SELECT LTrim(RTrim(CASE  
            WHEN CharIndex(@Delimiter, PrevString)>0 THEN Right(PrevString, Len(PrevString)-CharIndex(@Delimiter, PrevString))  
             ELSE ''  
       END)) PrevString  
      , Position + 1 Position  
      , LTrim(RTrim(CASE  
           WHEN CharIndex(@Delimiter, PrevString)>0 THEN LEFT(PrevString, CharIndex(@Delimiter, PrevString)-1)  
          ELSE PrevString  
      END)) Word  
     FROM CTE  
     WHERE Len(PrevString)>0  
    )  
     INSERT @Temp(Id, Value) 
     SELECT Position  
     , Word  
     FROM CTE  
     OPTION (maxrecursion 100)  
     RETURN 
END    

Here are two examples, one based on the single list created above and another based on the alphabet:


-- decomposing a list
SELECT Id 
, value 
FROM dbo.StringToTable('6,1,2,3,4,5', ',')     


-- decomposing the "alphabet" 
SELECT Id 
, value 
FROM dbo.StringToTable('a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z', ',') 

List of values - simple decomposition    

List of values - alphabet decomposition
   
Even if the function deals with only one delimiter, it can be used to decompose lists involving multiple delimiters, as long as the list is adequately built:


-- decomposing double list 
SELECT Id 
, value 
, Left(value, CHARINDEX(',', value)-1) LeftValue 
, Right(value, len(value)-CHARINDEX(',', value)) RightValue 
FROM dbo.StringToTable('6,Archive;1,Billing;2,Home;3,Main Office;4,Primary;5,Shipping', ';')     

List of values - double decomposition 

 The tables thus built from lists of values can be further used in queries when a table needs to be created on the fly. It would be interesting maybe to show that composition and decomposition are inverse functions, however that’s out of scope, at least for the current set of posts. 
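
As a side note, on newer versions of SQL Server the two operations no longer require hand-written recursion: STRING_AGG (SQL Server 2017 and higher) covers the composition, while STRING_SPLIT (SQL Server 2016 and higher) covers the simple decomposition case. A minimal sketch based on the same data (single list only, as each call handles one delimiter):

-- composition via STRING_AGG (SQL Server 2017+) 
SELECT STRING_AGG(Cast(AddressTypeID as varchar(10)), ',') WITHIN GROUP (ORDER BY Name) SingleList 
FROM Person.AddressType 

-- decomposition via STRING_SPLIT (SQL Server 2016+) 
SELECT value 
FROM STRING_SPLIT('6,1,2,3,4,5', ',') 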

11 March 2011

💎SQL Reloaded: Pulling the Strings of SQL Server VIII (Insertions, Deletions and Replacements)

Until now, the operations with strings were limited to concatenation and its reverse operation(s) - extracting a substring or splitting a string into substrings. It was just the warm up! There are several other important operations that involve the internal manipulation of strings – the insertion, deletion and replacement of a substring in a given string, operations performed using the Replace and Stuff functions.

The Replace function, as its name denotes, replaces all occurrences of a specified string value with another. There are several scenarios in which the function is quite useful: the replacement of delimiters or special characters, the correction of misspelled words or of any other chunks of text. Here are some basic examples, the before-mentioned applications following to be considered in other posts:

-- examples with replace 
DECLARE @str varchar(30) 
SET @str = 'this is a test string' 
SELECT replace(@str, ' ', ',') Example1 
, replace(@str, ' ', ' ') Example2 
, replace(@str, ' ', '') Example3 
, replace(@str, 'is', 'as') Example4  
Output:
Example1: this,is,a,test,string
Example2: this is a test string
Example3: thisisateststring
Example4: thas as a test string

When there are good chances that the searched string won’t appear in the target string, and especially when additional logic depends on the replacement, logic that could be included in the same expression as the replacement, then it makes sense to check first whether the searched string is present:

-- replacement with check 
DECLARE @str varchar(30) 
DECLARE @search varchar(30) 
DECLARE @replacememt varchar(30) 
SET @str = 'this is a test string' 
SET @search = 'this string' 
SET @replacememt = 'other string' 
SELECT CASE            
    WHEN CharIndex(@search, @str)>0 THEN Replace(@str, @search, @replacememt)             
    ELSE @str        
END result 

Unfortunately the function doesn’t have the flexibility of the homonym functions provided by the languages of the VB family (VBScript, VB.NET), which allow doing the replacement starting with a given position and/or for a given number of occurrences. This type of behavior can be obtained with a simple trick – splitting the string into two other strings, performing the replacement on the second string, and then concatenating the first string with the result of the replacement:

-- replacement starting with a given position 
DECLARE @str varchar(30) 
DECLARE @search varchar(30) 
DECLARE @replacememt varchar(30) 
DECLARE @start int 
SET @str = 'this is a test string' 
SET @search = 's' 
SET @replacememt = 'x' 
SET @start = 7 
SELECT Left(@str, @start-1) FirstPart 
, RIGHT(@str, Len(@str)-@start+1) SecondPart 
, CASE        
    WHEN @start <= LEN(@str) THEN Left(@str, @start-1) + Replace(RIGHT(@str, Len(@str)-@start+1), @search, @replacememt)        
     ELSE @str  
END Replacement 
Output:
FirstPart: this i
SecondPart: s a test string
Replacement: this ix a text xtring

The logic can be encapsulated in a function together with additional validation logic.
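
A minimal sketch of such a function is given below (the function name and the simple validation via the CASE are my own choices):

-- replacement starting with a given position, encapsulated in a function (sketch) 
CREATE FUNCTION dbo.ReplaceFrom( 
 @str varchar(500) 
,@search varchar(50) 
,@replacement varchar(50) 
,@start int) 
RETURNS varchar(500) 
AS 
BEGIN 
     RETURN CASE 
         WHEN @start BETWEEN 1 AND Len(@str) THEN Left(@str, @start-1) + Replace(Right(@str, Len(@str)-@start+1), @search, @replacement) 
         ELSE @str 
     END 
END 

-- example call 
-- SELECT dbo.ReplaceFrom('this is a test string', 's', 'x', 7) 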

The Stuff function inserts a string into another string starting with a given position and deleting a specified number of characters. Even if seldom used, the function is quite powerful, allowing to insert a string into another, to remove a part of a string or, more generally, to replace a single occurrence of a string with another string, as can be seen from the below examples:
 
-- Stuff-based examples 
DECLARE @str varchar(30) 
SET @str = 'this is a test string' 
SELECT STUFF(@str, 6, 2, 'was ') Example1 
, STUFF(@str, 1, 0, 'and ') Example2 
, STUFF(@str, 1, 0, 'that') Example3 
, STUFF(@str, LEN(@str) + 1, 0, '!') Example4
Output:
Example1: this was a test string
Example2: and this is a test string
Example3: thatthis is a test string
Example4: NULL

While in the first example a replacement is done from a fixed position, in the next examples inserts are attempted at the first, a middle, respectively the end position. As can be seen, the last example doesn’t work as expected, because the insert position can’t go over the length of the target string. Actually, if the insert needs to be done at the beginning, respectively at the end of a string, a concatenation can be much easier to use. Such an example is the padding of strings with leading or trailing characters, typically in order to arrive at a given length. SQL Server doesn’t provide such a function, however one is quite easy to build.
 
-- left/right padding 
DECLARE @str varchar(30) 
DECLARE @length int  
DECLARE @padchar varchar(1) 
SET @str = '12345'  
SET @length = 10 
SET @padchar = '0' 
SELECT @str StringToPad  
, CASE  
     WHEN LEN(@str)<@length THEN Replicate(@padchar, @length-LEN(@str)) + @str       
     ELSE @str  
END LeftPadding  
, CASE  
     WHEN LEN(@str)<@length THEN @str + Replicate(@padchar, @length-LEN(@str))      
     ELSE @str  
END RightPadding 
 
Output:
StringToPad: 12345
LeftPadding: 0000012345
RightPadding: 1234500000

Note:
The queries work also in SQL databases in Microsoft Fabric.

Happy Coding!

About Me

Koeln, NRW, Germany
IT Professional with more than 25 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.