21 January 2024

🛠️🗒️SQL Server: Clustered & Non-clustered Indexes (Notes)

Disclaimer: This is work in progress intended to consolidate information from various sources. It considers only on-premise SQL Server, for other platforms please refer to the documentation.
Columnstore indexes will be considered separately.

Last updated: 22-Jan-2024

Indexes

  • {definition} a database object associated with a table or view that provides efficient access path to data in the base object, based on one or more columns
    • contains keys built from one or more columns stored physically as B-trees
      • ⇒ provides the path that is necessary to get to the data in the quickest and most efficient manner possible [7]
      • ⇒ can dramatically speed up the process of finding data as well as the process of joining tables together [2]
    • on-disk structure
      • index placement 
        • placed on either 
          • particular filegroup 
          • partitioned according to a predefined partition scheme
        • {default} placed on the same filegroup as the base object [5]
    • must be maintained and updated as data are added or changed in the base object
      • each index brings with it a certain amount of overhead
        • ⇐ the updates, inserts, and deletes are slower
  • {type} clustered index (CI)
    • based on a clustering key
      • tells SQL Server how to order the table's data pages [11]
      • static columns (aka unchanging columns, non-volatile columns)
        • the clustering key is never updated [11]
      • nonstatic columns 
        • when the clustering key value is updated, the data may need to be moved elsewhere (aka record relocation) in the clustered index so that the clustering order is maintained [11]
          • triggers a nonclustered index update as well 
            • so that the value of the clustering key is correct for the relevant nonclustered index rows [5]
              • wastes time and space
              • causes page splits and fragmentation 
              • adds unnecessary overhead to every modification of the column(s) that make up the clustering key [5]
          • updating the clustering key is clearly more expensive than updating a non-key column [11]
      • can be either
        • a single column key
        • a composite key
          • [SQL Server 2005] up to 16 columns
          • [SQL Server 2016] up to 32 columns can be combined into a single composite index key
          • all the columns in a composite index key must be in the same table or view [8]
          • the most redundant data within the entire table
    • {restriction}a table can have only one clustered index [5]
    • {restriction} maximum allowable size of the combined index values is 900 bytes 
    • {restriction} LOB data types can’t be specified as columns
    • {subtype} unique clustered index (UCI)
      • created with the UNIQUE keyword
    • {subtype} non-unique clustered indexes
      • created without the UNIQUE keyword
      • SQL Server does not require a clustered index to be unique
        • ⇒ must have some means of uniquely identifying every row
        • forces the index to be unique by appending a 4-byte value (aka uniqueifier) to key values as necessary to differentiate identical key values from one another  [2]
          • stored in every level of the B-tree [11]
            • in both clustered and non-clustered indexes
          • stored as a variable-length column [11]
            • if a table does not already contain any other variable-length columns, each duplicate value is actually consuming 8-bytes of overhead: 
              • 4 bytes for the uniqueifier value 
              • 4 bytes to manage variable-length columns on the row [11]
              • the overhead for rows with a NULL uniqueifier value is zero bytes [11]
          • added everywhere the clustering key is stored [11]
          • if there are many rows using the same clustering key value, the operation can become quite expensive [11]
          • is not readily visible and cannot be queried [11]
          • it is impossible to estimate how much additional storage overhead will result from the addition of a uniqueifier, without first having a thorough understanding of the data being stored [11]
          • meaningless operation in the context of the data [11]
          • all clustered indexes must be unique so that nonclustered index entries can point to exactly one specific row [5]
  • {type} nonclustered index (NCI)
    • the index is a completely separate structure from the data itself [5]
      • ⇒ its presence or absence doesn’t affect how the data pages are organized [5]
      • heap or a table with a clustered index affects the structure of nonclustered indexes [5]
      • best at singleton selects
        • ⇐ queries that return a single row. 
        • once the nonclustered B-tree is navigated, the actual data can be accessed with just one page I/O, that is, the read of the page from the underlying table
    •  each table can have up to 999 nonclustered indexes [8]
    • {subtype} nonclustered indexes with included columns
      • uses the INCLUDE keyword along with a list of one or more columns to indicate the columns to be saved in the leaf level of an index [1]
        • the data for these columns are duplicated in index leaf-level pages
          • ⇒ increases the disk space requirement [1]
          • ⇒ fewer index rows can fit on index pages [1]
            • ⇒ might increase the I/O and decrease the database cache efficiency [1]
          • ⇒ data is stored and updated twice   [1]
            • ⇒ index maintenance may increase in terms of the time that it takes  [1]
      • can be specified up to 1,023 columns [1]
      • the included columns are not part of the index key
        • ⇒ keeps the key size small and efficient [1]
      • having the columns available at the leaf pages with the index means avoid querying the base table if the included columns can satisfy the query [1]
        • ⇒ increases the chances that the query will find all it needs in the index pages itself, without any further lookups [1]
      • provides the benefits of a covering index  [1]
      • results in smaller, efficient, and narrow keys [1]
      • removes the 900 bytes/16 columns key size restriction [1]
      • {restriction} columns of data type text, ntext, and image are not allowed as non-key included columns [1]
  • {option} FILLFACTOR
    • its value determines the percentage of space on each leaf-level page to be filled with data, reserving the remainder on each page as free space for future growth [16]
    • it's applied when the index is first created but is not enforced afterward
      • ⇒ isn't maintained over time
    • affects only the leaf-level pages in the index 
      • normally reserves enough empty space on intermediate index pages to store at least one row of the index's maximum size [2]
    • the empty space is reserved between the index rows rather than at the end of the index [16]
    • its value is a percentage from 1 to 100
      • {default} 0 which means that the leaf-level pages are filled to capacity [16]
    • provided for fine-tuning index data storage and performance [16]
    • affects performance 
      • creating an index with a relatively low fillfactor 
        • helps avoid page splits during inserts
          • with pages only partially full, the potential for needing to split one of them in order to insert new rows is lower than it would be with completely full pages
          • can be good for performance if the new data is evenly distributed throughout the table [16]
            • if all the data is added to the end of the table, the empty space in the index pages will not be filled [16]
        •  the index will require more storage space and can decrease read performance [16]
          • ⇒ can decrease database read performance by an amount inversely proportional to the fill-factor setting [16]
      • a high fillfactor can help compact pages so that less I/O is required to service a query
        • common technique with data warehouses
        • retrieving pages that are only partially full wastes I/O bandwidth
  • {option} PAD_INDEX option
    • instructs SQL Server to apply the FILLFACTOR to the intermediate-level pages of the index [2]
    • if FILLFACTOR setting is so high that there isn't room on the intermediate pages for even a single row (e.g., a FILLFACTOR of 100%),  the percentage is overridden so that at least one row fits [2]
    • if FILLFACTOR setting is so low that the intermediate pages cannot store at least two rows, SQL Server will override the fillfactor percentage on the intermediate pages so that at least two rows fit on each page [2]
  • {option} IGNORE_DUP_KEY
    • {enabled} when a UNIQUE index is created the option is used a duplicate key error on a multiple-row INSERT won’t cause the entire statement to be rolled back [5]
      • ⇒ the nonunique row is discarded, and all other rows are inserted [5]
        • ⇒ it makes a violation in a multiple-row data modification nonfatal to all the nonviolating rows [5]
    • {disabled} during changes even if one row is found that would cause duplicates of keys defined as unique, the entire statement is aborted and no rows are affected [5]
  • {option} STATISTICS_NORECOMPUTE 
    • determines whether the statistics on the index should be updated automatically [5]
    • typically enabled
    • used to set a specific statistic or index to not update automatically [5]
    • overrides an ON value for the AUTO_UPDATE_STATISTICS database option [5]
    • if the database option is set to OFF, that behavior for a particular index can’t be overridden 
      •  all statistics in the database must be updated manually 
        • via UPDATE STATISTICS or sp_updatestats [5]
  • {option} MAXDOP
    • controls the maximum number of processors that can be used for index creation [5]
    • can override the server configuration option [max degree of parallelism for index building] [5]
    • allows multiple processors to be used for index creation can greatly enhance the performance of index build operations [5]
      • each processor builds an equal-sized chunk of the index in parallel
        • each parallel thread builds a separate tree
          • the math used to determine the theoretical minimum number of required pages varies from the actual number [5]
          • the tree might not be perfectly balanced
          • after each thread is completed, the trees are essentially concatenated [5]
            • extra page space reserved during this parallel process can be used for later modifications [5]
  • {operation} index tuning 
    • iterative process 
    • {operation}index consolidation
      • typically performed when indexes overlap 
        • includes duplicated indexes
      • {caution} consider the various aspects when modifying indexes
      • the number and design of indexes impact performance
        • ⇐ the goal is to obtain the best performance with the smallest number of indexes
  • {operation} creation (aka index creation)
    • [clustered index] 
      • the data becomes the leaf level of the clustered index [5]
        • data in the table is moved to new pages and ordered by the clustering key
        • maintained logically rather than physically
          • the order is maintained through a doubly linked list (aka page chain)
            • based on the definition of the clustered index
              • same as the order of rows on the data pages
            • deciding on which column(s) to cluster is an important performance consideration [5] 
          • can be ordered in only one way
    • although heap pages do have pointers to next and previous pages, the ordering is arbitrary and doesn’t represent any ordering in the table itself [5]
  • {operation} [SQL Server 2005] rebuilding (aka rebuilding indexes)
    • {goal} generate new statistics [18]
    • {goal} change the number of pages used by the index (fill factor) [18]
    • {goal} reorganize the data on database pages in relative proximity to each other [18]
    • introduced mainly for SQL Server's own internal purposes when applying upgrades and service packs [6]
    • drops and re-creates the index
      • always rebuilds the entire index [3]
        • regardless of the extent of fragmentation [3]
        • an index rebuild for a lightly fragmented index is really overkill [3]
      • there must be enough free space in the database to accommodate the new index [3]
        • otherwise the database will grow to provide the required free space [3]
          • this can be problematic for large indexes [3]
      • reclaims disk space by compacting the pages based on the specified or existing fill factor setting [15]
      • reorders the index rows in contiguous pages [15]
        • removes fragmentation
      • automatically rebuilds all index statistics [18]
        • columns statistics aren’t touched 
        • usually faster than reorganizing indexes if the logical fragmentation is high [18]
      • when ALL is specified, all indexes on the table are dropped and rebuilt in a single transaction
      • indexes with more than 128 extents are rebuilt in two separate phases [15]
        • logical phase
          • the existing allocation units used by the index are marked for deallocation [15]
          • the data rows are copied and sorted [15]
          • then moved to new allocation units created to store the rebuilt index [15]
        • physical phase
          • the allocation units previously marked for deallocation are physically dropped in short transactions that happen in the background, and do not require many locks [15]
      • serial index rebuild
        • forces an allocation order scan, ignoring the key placement and scanning the object from first IAM to last IAM order [13]
          • via WITH(NOLOCK, INDEX=0)
        • takes longer than a [parallel index rebuild]
      • parallel index rebuild 
        • during rebuild a portion of the data (range) is assigned to each of the parallel workers [13]
          • for each parallel worker is assigned its own CBulkAllocator when saving the final index pages [13]
        • a leap frog behavior of values, across extents occurs as the workers copy the final keys into place [13]
          • each worker spreads 1/nth of the data across the entire file instead of packing the key values together in a specific segment of the file [13]
      • can use minimal-logging to reduce transaction log growth [3]
      • may require long-term locks on the table [3]
        • ⇒ can limit concurrent operations [3]
      • replaces
        • DBCC DBREINDEX command 
        • DROP_EXISTING option of CREATE INDEX command
      • can be executed 
        • online (aka online operation)
          • table and indexes are available while the operation is performed 
        • offline (aka offline operation)
      • permissions requirements
        • ALTER permission on the table or view
          • users must be members of either:
            • sysadmin
            • db_ddladmin 
            • db_owner 
    • {operation} reorganize (aka reorganize indexes, index reorganization)
        • doesn’t update statistics at all [3]
          • always single-threaded [3]
            • always fully logged [3]
              • ⇐ doesn’t prevent transaction log clearing [3]
            • doesn’t hold blocking locks
              • always executed online
              • won’t update statistics at all [3]
              • uses minimal system resources 
                • requires 8KB of additional space in the database [3]
                • doesn’t hold blocking locks [3]
                • takes care of the existing fragmentation [3]
                  • ⇒ makes it a better choice for removing fragmentation from a lightly fragmented index  [3]
                  • ⇒ makes it a poor choice for removing fragmentation from a heavily fragmented index [3]
              • {limitation} can’t use index options 
              • replaces the DBCC INDEXDEFRAG command [6]
              • removes some of the fragmentation from an index
                • it’s not guaranteed to remove all the fragmentation
                  • just like DBCC INDEXDEFRAG 
              • doesn’t have a corresponding option in the CREATE INDEX command [6]
                • because when creating an index there is nothing to reorganize [6]
              • it defragments the leaf level of clustered and nonclustered indexes on tables and views by physically reordering the leaf-level pages to match the logical, left to right, order of the leaf nodes [15]
              • compacts the index pages
                • based on the existing fill factor value
            • {operation} disabling (aka index disabling)
              • makes an index completely unavailable
                • ⇒ it can't be used for finding rows for any operations
                • ⇒ the index won't be maintained as changes to the data are made
              • disables one or ALL indexes with a single command
              • indexes must be completely rebuilt to make them useful again
                • ⇐ there is no ENABLE option
                  • ⇐ because no maintenance is done while an index is disabled
              • {warning} disabling a [clustered index] on a table, makes the  data unavailable [6]
                • ⇐ because the data is stored in the clustered index leaf level [6]
            • {operation} dropping (aka index dropping)
            • {operation} changing index options 
              • via ALTER INDEX command
            • {warning} regular rebuilding of all the indexes in a database is a poor way to manage indexes
              • leads to resources waste by rebuilding indexes that are not fragmented
              • it can affect the availability of a system when the indexes can’t be rebuilt online
          • {concept} index extent
            • a group of eight index pages
          • {concept} narrow index
            • in terms of the number of bytes it stores [11]
              • usually has few columns in the index key
              • can accommodate more rows into an 8KB index page than a wide index [1]
                • ⇒ {benefit} reduces I/O
                • ⇒ {benefit} reduces storage requirements
                • ⇒ {benefit} improves database caching
              • {benefit} reduces maintenance overhead
            • small indexes might not have an intermediate level at all
            • the goal when defining indexes isn’t to have only very narrow indexes [5]
              • extremely narrow indexes usually have fewer uses than slightly wider indexes [5]
          • {concept} wide index
            • typically an index with many columns in the index key
              • wider keys cause more I/O and permit fewer key rows to fit on each B-tree page
                • indexes require a larger number of pages than it otherwise would and causes it to take up more disk space a lot of space (and potentially buffer pool memory) is wasted
              • large indexes often have more than one intermediate level
            • {benefit} covers more queries
            • {downside} increases maintenance overhead 
            • {downside} can adversely affect disk space
          • {concept} ever increasing index
            • e.g. a numeric value can be continuously incremented by the value defined at creation
              • e.g. identity integers
            • {benefit} avoids fragmentation
              • results in sequential IO and maximizes the amount of data stored per page
                • the most efficient use of system resources
            • {benefit} improve write performance
              • SQL Server can much more efficiently write data if it knows the row will always be added to the most recently allocated, or last, page [11]
            • non-sequential key column can result in a much higher overhead during insertion [11]
              • SQL Server has to find the correct page to write to and pull it into memory
                • if the page is full, SQL Server will need to perform a page split to create more space [11] 
                • during a page split, a new page is allocated, and half the records are moved from the old page to the newly-allocated page [11]
                • each page has a pointer to the previous and next page in the index, so those pages will also need to be updated [11]
          • {concept} fragmentation results from data modifications 
            • can take the form of gaps in data pages, so wasting space, and a logical ordering of the data that no longer matches the physical ordering [11]
          • {concept} covering index
            • nonclustered index that contains all the columns requested by the query without going to the base table 
              • allows it to skip the bookmark lookup step and simply return the data the query seeks from its own B-tree [2]
              • when a clustered index is present, a query can be covered using a combination of nonclustered and clustered key columns since the clustered key is the nonclustered index's bookmark [2]
              • the next best thing to having multiple clustered indexes on the same table [2]
            • included columns 
              • columns added in the INCLUDE clause of the CREATE INDEX command
              • allow exceeding the 900-byte or 16-key column limits in the leaf level of a nonclustered index [5]
              • appear only in the leaf level 
              • don’t affect the sort order of the index rows in any way [5]
              • SQL Server can silently add an included column to indexes [5]
                • e.g. when an index is created on a partitioned table and no ON filegroup or ON partition_scheme_name is specified [5]
                  • when partitioning a non-unique, nonclustered index, the Database Engine adds the partitioning column as a non-key (included) column of the index, if it is not already specified [8]
              • column names cannot be repeated in the list [8]
              • column namescannot be used simultaneously as both key and non-key columns [8]
              • {restriction} all data types are allowed except text, ntext, and image [8
              • {restriction} index must be created or rebuilt offline if any one of the specified non-key columns are varchar(max), nvarchar(max), or varbinary(max) data types [8]
              • [computed columns] that are deterministic and either precise or imprecise can be included columns [8]
              • [computed columns] derived from image, ntext, text, varchar(max), nvarchar(max), varbinary(max), and xml data types can be included in non-key columns as long as the computed column data types is allowable as an included column [8]
            • [SQL Server 2005] can be stored in the same filegroup as the underlying table, or on a different filegroup [12]
          • {concept} [SQL Server 2008] filtered indexes 
            • indexes with simple predicates that restrict the set of rows included in the index [4]
            • leaf level contains an index row only for keys that match the filter definition [5]
            • created using a new WHERE clause on a CREATE INDEX statement
            • alternative to indexed views
              • which are more expensive to use and maintain
              • their matching capability is not supported in all editions
              • tend to be more useful for the more classical relational query precomputation scenarios
            • {benefit} have fewer rows and are also narrower than the base table
              • require fewer pages as well
              • specific constraints that are used on queries with large tables where space is an issue, this kind of index can be quite useful
            • usage scenarios
              • fields are used only occasionally
                • resulting in many NULL entries for that column
                  • a traditional index stores a lot of NULLs and wastes a lot of storage space
                  • updates to the table have to maintain this index for every row
                • querying a table with a small number of distinct values and are using a multicolumn predicate where some of the elements are fixed
                  • this might be useful for a regular report with special purpose
                    • it speeds up a small set of queries while not slowing down updates as much for everyone else
                • when there is a known query condition on an expensive query on a large table
          • {concept} indexed views
            • similar to materialized views in Oracle
            • without indexes, views are purely logical
              • ⇒ the data involved has no physical storage [5]
            • the first index built must be a clustered index
              • data is stored in the leaf level of the index [5]
              • further nonclustered indexes can be built
            • {constraint} view’s columns must be deterministic
            • {constraint} session-level restrictions
              • options must be set to a specific value (see )
                • if these options aren’t set as specified, an error message is raised when attempted to create an indexed view [5]
            • {constraint} only deterministic functions are allowed
              • check via SELECT OBJECTPROPERTY (object_id('<function_name>'), 'IsDeterministic')
              • imprecise values (float or real values) can be used only if the computed value is persisted [5]
            • {constraint} the definition of any underlying object’s schema can’t change [5]
              • must be enforced through WITH SCHEMABINDING option
            • {constraint} if a view's definition contains GROUP BY
              • the SELECT list must include the aggregate COUNT_BIG (*)
                • COUNT_BIG returns a BIGINT, a 8-byte integer
              • can’t contain HAVING, CUBE, ROLLUP, or GROUP BY ALL
              • all GROUP BY columns must appear in the SELECT list
              • constraints apply only to view definition, not to the queries that might use the indexed views [5]
            • {benefit}materialize summary aggregates of large tables [5]
              • eliminates the need of scanning the underlying, large tables [5]
            • {activity} check whether a view is indexable
              • via SELECT OBJECTPROPERTY (OBJECT_ID ('<view_name>'), 'IsIndexable');
            • {activity} check whether a view is indexed
              • via SELECT OBJECTPROPERTY (OBJECT_ID ('<view_name>'), 'IsIndexed');
            • alternative solutions
              • temporary tables
              • persistent tables 
            • [Query Optimizer] doesn’t always choses an indexed view for query’s execution plan [5]
              • {restriction} considered only in [Enterprise], [Developer] and [Evaluation] editions
              • the base tables might be accessed directly [5]
              • use NOEXPAND hint in FROM clause to make sure that the indexed view isn’t expanded into its underlying SELECT statement [5]
                • {recommendation} NOEXPAND hint should be used only when improves query’s performance [5]
          • indexes on computed columns
            • without indexes, computed columns are purely logical
              • the data involved has no physical storage [5]
                • recomputed every time a row is accessed 
                  • {exception}the computed column is marked as PERSISTED [5]
              • {restriction} see the requirements for indexed views 
                • {exception} table-level restrictions don’t apply [5]
              • {restriction} allowed on deterministic, precise (and persisted imprecise) computed columns where the resulting data type is otherwise indexable [5]
            • {activity} check whether a column is deterministic
              • via SELECT COLUMNPROPERTY (OBJECT_ID('<tabelle>'), '<column>', 'IsDeterministic');
            • {activity} check whether a column is precise
              • via SELECT COLUMNPROPERTY (OBJECT_ID('<tabelle>'), '<column>', 'IsPrecise');
            • {activity} check if a column is persisted 
              • SELECT is_persisted
                FROM sys.computed_columns
                WHERE object_id = object_Id('<tabelle>')
                AND name = '<column_name>'
          • {concept} missing index
            • index that would have lead to a optimal query plan
            • [query optimizer] when generating a query plan, it analyzes what are the best indexes for a particular filter condition [17]
              • if the best indexes do not exist it generates a suboptimal query plan [17]
              • stores information about the missing indexes
              • {activity} review missing indexes
                • via sys.dm_db_missing_index_group_stats
                  • when table metadata changes, all missing index information about the table is deleted [17]
                    • e.g. column added/dropped, index adde
                • via  MissingIndexes element in XML Showplans
                  • correlate indexes that the query optimizer considers missing with the queries for which they are missing [17]
                  • turn on via SET STATISTICS XML ON option
              • it’s not intended to fine tune an indexing configuration [17]
                • it cannot gather statistics for more than 500 missing index groups [17]
                  • after this threshold is reached, no more missing index group data is gathered [17]
                    • the threshold is not a tunable parameter and cannot be changed [17]
                • it does not specify an order for columns to be used in an index [17]
                • for queries involving only inequality predicates, it returns less accurate cost information [17]
                • it reports only include columns for some queries
                  • index key columns must be manually selected [17]
                • it returns only raw information about columns on which indexes might be missing [17]
                  • information returned might require additional processing before being used to create an index [17]
                • it does not suggest [filtered indexes] [17]
                • it can return different costs for the same missing index group that appears multiple times in XML Showplans [17]
                  • it can occur when different parts of a single query benefit differently from the same missing index group [17]
                • it does not consider [trivial query plans] [17]
          • {concept} fragmented index: 
            • index that is not in the same logic order as it is stored in memory or on storage [19]
            • caused by 
              • modifications to data can cause the information in the index to become scattered in the database (fragmented) [15]
              • autoshrink option
            • heavily fragmented indexes can degrade query performance and cause your application to respond slowly [15]
            • index fragmentation 
              • exists when indexes have pages in which the logical ordering, based on the key value, does not match the physical ordering inside the data file [15]
              • {type} extent fragmentation (aka external index fragmentation)
                • the pages get out of physical order, as a result of data modifications [11]
                  • can result in random I/O
                    • does not perform as well as sequential I/O [11]
                • is truly bad only when SQL Server is doing an ordered scan of all or part of a table or an index [6]
                  • only when the pages need to be fetched in logical order is needed to follow the page chain.
                    • if the pages are heavily fragmented, this operation is more expensive than if there were no fragmentation [6]
                  •  if seeking individual rows through an index, it doesn't matter where those rows are physically located [6]
              • {type} page fragmentation (aka internal index fragmentation)
                • occurs when there are gaps in the data pages
                  • reduces the amount of data that can be stored on each page
                    • increases the overall amount of space needed to store the data [11]
                  • [scanning] the entire table or index involves more read operations than if no free space were available on your pages [6]
                • some fragmentation is desirable in order to minimize the [page splits] [6]
            • fragmentation on small indexes is often not controllable
              • the pages of small indexes are stored on mixed extents [15]
                • mixed extents are shared by up to eight objects [15]
                  • the fragmentation in a small index might not be reduced after reorganizing or rebuilding the index [15]
            • {affect} primarily impacts range-scan queries
              • singleton queries would not notice much impact [11]
              • can benefit from routine defragmentation efforts [11]
            • {affect} disk performance and the effectiveness of the SQL Server read-ahead manager [10]
            • disk latency
              • [small-scale environment] only the highest fragmentation levels had a significant negative impact on disk latency [10]
              • [large-scale environment] significantly lower and never became an issue 
                • due to increased I/O performance provided by the SAN [10]
            • {action} detect fragmentation
              • analyze the index to determine the degree of fragmentation
                • via sys.dm_db_index_physical_stats 
              • controlled through
                • fillfactor setting 
                • regular defrag operations
            • {remedy} reorganize indexes
            • {remedy} rebuild indexes
          • {concept} page splits
            • when a new row is added to a full index page, half the rows are moved to a new page to make room for the new row
              • the new page must be linked into the indexes page chain, and usually the new page is not contiguous to the page being split [6]
            • resource intensive operation
            • occurs frequently
            • can lead to external fragmentation [6]
          • {concept} interleaving 
            • occurs when index extents are not completely contiguous within the data file, leaving extents from one or more indexes intermingled in the file [18]
            • can occur even when there is no logical fragmentation
              • because all index pages are not necessarily contiguous, even when logical ordering matches physical ordering [18]
          • {concept} leap frog behavior 
            • behavior occurs during [parallel index rebuild]
            • when the key range is leap frogged the fragmentation limits [SQL Server]’s I/O size to 160K instead of 508K and drives the number of I/O requests much higher [13]
            • {workaround} [partitioned table] partition the table on separate files matching the DOP used to build the index
              • allows better alignment of parallel workers to specific partitions, avoiding the [leap frog behavior]
            • {workaround} [non-partitioned table] aligning the number of files with the DOP may be helpful [13]
              • with reasonably even distribution of free space in each file the allocation behavior is such that alike keys will be placed near each other [13]
            • {workaround} for single partition rebuild operations consider serial index building behaviors to minimize fragmentation behaviors [13]
          • {concept} defragmentation
            • does not yield performance gains in every case [10]
            • steps
              • {activity} ensure there are no resource issues
                • common resource issues are related to
                  • I/O subsystem performance
                  • memory usage
                  • CPU utilization
                • physical disk fragmentation [10]
                  • {recommendation}  [small-scale environments] correct disk fragmentation before running index defragmentation [10]
                    • not necessary on SAN environments [10]
                • inappropriate schemas [10]
                • out-of-date statistics
                  • {recommendation} resolve any out-of-date statistics before defragmenting [10]
              • {activity} determine the amount of index fragmentation 
                • focus on the larger indexes 
                  • their pages are less likely to be cached [10]
                  • {best practice} monitor regularly the fragmentation levels on indexes [10]
              • {activity} determine workload type 
                • not all workload types benefit from defragmenting [10]
                  • read-intensive workload types that do significant disk I/O benefit the most [10]
                  • DSS workload types benefit much more than OLTP workload types [10]
              • {activity} determine queries that perform poorly [10]
                • determine the amount of I/O performed by a query [10]
                • queries that scan large ranges of index pages are affected most by fragmentation and gain the most from defragmenting [10]
              • {activity} understand the effect of fragmentation on disk throughput and the read-ahead manager [10]
                • for queries that scan one or more indexes, the read-ahead manager is responsible for scanning ahead through the index pages and bringing additional data pages into the SQL Server data cache [10] 
                  • the read-ahead manager dynamically adjusts the size of reads it performs based on the physical ordering of the underlying pages [10]
                  • when there is low fragmentation, the read-ahead manager can read larger blocks of data at a time, more efficiently using the I/O subsystem [10] 
                  • as the data becomes fragmented, the read-ahead manager must read smaller blocks of data [10]
                  • the amount of read-ahead operations that can be issued is independent of the physical ordering of the data; however, smaller read requests take more CPU resources per block, resulting in less overall disk throughput [10]
                    • when fragmentation exists and the read-ahead manager is unable to read the larger block sizes, it can lead to a decrease in overall disk throughput [10]
          • {constraint}UNIQUE 
            • allows the columns that make up the constraint to allow NULLs, but it doesn’t allow all key columns to be NULL for more than one row [5]
            • unique indexes created using the CREATE INDEX command are no different from indexes created to support constraints [5]
              • [Query Optimizer] makes decisions based on the presence of the unique index rather than on whether the column was declared as a constraint [5]
                • ⇐ is irrelevant how the index was created [5]
          • {constraint} PRIMARY KEY 
            • all the columns involved in the PRIMARY KEY don’t allow NULL values
              • if any of the columns allow NULL values, the PRIMARY KEY constraint can’t be created [5]
            • its value is unique within the table
              • uniqueness defined via the UNIQUE constraint
              • ⇒ a unique index is created on the columns that make up the constraint [5]
            • when a PRIMARY KEY or UNIQUE constraint is created, the underlying index structure that’s created is the same as though you had used the CREATE INDEX command directly [5]
              • ⇐ however usage and features have some differences
                • a constraint-based index can’t have other features added (e.g. included columns, filters), while a UNIQUE index can have these features while still enforcing uniqueness over the key definition of the index [5]
                • when referencing a UNIQUE index - which doesn’t support a constraint - through a FOREIGN KEY constraint, indexes can’t be referenced with filters [5]
                  • ⇐ an index that doesn’t use filters or an index that uses included columns can be referenced [5]
            • ⇐ the names of the indexes built to support these constraints are the same as the constraint names [5]
          • {concept} index design
            • designing and using proper indexes is one of the keys to maximizing query performance [1]
              • over-indexing 
                • can be worse than under-indexing
                • extreme case: index every column
              • under-indexing
                • extreme case: having no index
            • very complex, time-consuming, and error-prone process
              • even for moderately complex databases and workloads [12]
            • requires knowing
              • the data
              • the workload
              • how SQL Server works [5]
                • index internals
                • statistics
                • query optimization
                • maintenance
            • {activity} understand the characteristics of the database itself [12]
              • e.g. characteristics of OLAP vs OLTP
            • {activity}understand the characteristics of the most frequently used queries [12]
            • {activity} understand the characteristics of the columns used in the queries [12]
              • data type
              • column uniqueness
                • unique indexes are more useful for the query optimizer than nonunique indexes [12]
              • data distribution in the column
              • NOT NULL values percentage 
              • columns with categories of values
              • columns with distinct ranges of values
              • sparse columns 
              • computed columns
            • {activity} determine which index options might enhance performance when the index is created or maintained [12]
              • e.g. creating a clustered index on an existing large table would benefit from the ONLINE index option 
            • {activity} determine the optimal storage location for the index [12]
              • storage location of indexes can improve query performance by increasing disk I/O performance [12]
                • ⇐ storing a nonclustered index on a filegroup that is on a different disk than the table filegroup can improve performance because multiple disks can be read at the same time [12] 
              • [partitioning] clustered and nonclustered indexes can use a partition scheme across multiple filegroups
                • ⇐ [partitioning] makes large tables or indexes more manageable by letting you access or manage subsets of data quickly and efficiently, while maintaining the integrity of the overall collection
                • {default} indexes created on a partitioned table will also use the same partitioning scheme and partitioning column [9]
                  • ⇐ the index is aligned with the table [9]
                  • allows for easier management and administration, particularly with the sliding-window scenario [9]
            • {guideline} avoid over-indexing heavily updated tables 
              • large numbers of indexes on a table affect the CRUD statements [12]
            • {guideline} use many indexes to improve query performance on tables with low update requirements, but large volumes of data [12]
            • {guideline} indexing small tables may not be optimal 
              • because it can take the query optimizer longer to traverse the index searching for data than to perform a simple table scan [12]
            • {guideline} indexes on views can provide significant performance gains when the view contains
              • aggregations
              • table joins
              • combination of aggregations and joins
              • views does not have to be explicitly referenced in the query for the query optimizer to use it [12]
            • {guideline} keep your indexes as compact and narrow as possible while still meeting the business needs your system was designed to address [2]
              • ⇐ the index needs to be efficient
            • {guideline}use the Database Engine Tuning Advisor to analyze your database and make index recommendations [12]
            • {guideline} create nonclustered indexes on the columns that are frequently used in predicates and join conditions in queries [12]
            • {guideline} write queries that insert or modify as many rows as possible in a single statement
              • instead of using multiple queries to update the same rows
              • exploits optimized index maintenance
            • {guideline} consider using filtered indexes on columns that have well-defined subsets [12]
              • e.g. sparse columns
              • e.g. columns with mostly NULL values
              • e.g. columns with categories of values
              • e.g. columns with distinct ranges of values
              • e.g. consider indexing computed columns
            • {recommendation} use a couple of relatively narrow columns that, together, can form a unique key [11]
              • saves the cost of the uniqueifier in the data pages of the leaf level of your clustered index [11]
              • still results in an increase increase of the row size for your clustering key in the index pages of both your clustered and nonclustered indexes
            • {recommendation} carefully design and determine when and what columns should be included [1]
              • determine whether the gains in query performance outweigh the effect on performance during data modification and the additional disk space requirements [1]
            • {recommendation} keep the index size small
              • tends to be efficient [1]
              • features such as INCLUDE and filtered indexes can profoundly affect the index in both size and usefulness [5]
            • {recommendation} have all the data required by the query obtained from the covering index without touching the clustered index or table data page 
              • results in lesser I/O and better query throughput
              • poorly designed indexes and a lack of indexes are primary sources of database application bottlenecks [12]
              • the right index, created for the right query, can reduce query execution time from hours down to seconds [5]
                • without a useful index the entire table is scanned to retrieve the data
                • in joins some tables may be scanned multiple times to find all the data needed to satisfy the query [2]
            • {recommendation} create a clustered index on all user tables [14]
              • the performance benefits of having a clustered index on a table outweigh the negatives ones [14]
                • tables with clustered indexes have a higher 'page splits/sec' values [14]
                • heaps can have higher throughput (rows/sec) for individual or batch CRUD operations [14]
                • concurrent insert operations performed are often susceptible to encountering hot spots [14]
                  •  the contention at the particular insert location in the table adversely affects performance [14]
              • disk space required is smaller than for a heap [14]
              • deleting rows frees up more disk space when compared to a heap table
                • [clustered indexes] empty extents are automatically deallocated from the object [14]
                • [heaps] extents are not deallocated automatically [14]
                  • extents are held on for later reuse [14]
                    • results in table size bloating [14]
                • the residual disk space for heap tables could be reclaimed by using maintenance tasks [14]
                  • ⇐ requires additional work and processing [14]
            • {recommendation} choose the optimum sort order by examining the individual workload queries [17]
            • {recommendation} limit changes to no more than 1-2 indexes
          • metadata
            • {recommendation}for the data not persisted make periodic copies to use the data for further analysis
            • [SQL Server 2000] sys.sysindexes
              • system table that stores system-level information about indexes
              • a pointer to each index's root node is stored in sysindexes' root column
              • {limitation} doesn't support partition tables
                • use sys.indexes instead
              • {warning} susceptible to be removed in further versions 
            • [SQL Server 2005] sys.indexes
              • contains a row per index or heap of a tabular object, such as a table, view, or table-valued function
              • metadata's visibility is limited to the securables a user either owns or to which has been granted permissions
            • [SQL Server 2005] sys.index_columns
              • contains one row per column that is part of an index or unordered table (heap).
            • [SQL Server 2005] sys.dm_db_index_physical_stats
              • returns the size and fragmentation information for the data and indexes of the specified table or view
              • useful for determining the size and health of indexes [5]
              • replaces DBCC SHOWCONTIG 
              • when the function is called, SQL Server traverses the page chains for the allocated pages for the specified partitions of the table or index [5]
              • has similar parameters as sys.dm_db_index_physical_stats
            • [SQL Server 2008] sys.dm_db_missing_index_group_stats
              • returns summary information about missing index groups [17]
              • updated by every query execution
                • ⇒ not by every query compilation or recompilation
              • not persisted
            • [SQL Server 2008] sys.dm_db_missing_index_groups 
              • returns information about a specific group of missing indexes [17]
                • ⇐ the group identifier and the identifiers of all missing indexes that are contained in that group [17]
              • not persisted
              • {required permissions} VIEW SERVER STATE
              • {required permissions}[SQL Server 2022] VIEW SERVER PERFORMANCE
            • [SQL Server 2008] sys.dm_db_missing_index_details 
              • DMV that returns detailed information about missing indexes [17]
              • updated when a query is optimized by the query optimizer
              • not persisted
              • {limitation} the result set is limited to 600 rows
                • ⇒  address the missing indexes issues to see further values
              • {required permissions} VIEW SERVER STATE
              • {required permissions}[SQL Server 2022] VIEW SERVER PERFORMANCE STATE on the server
            • [SQL Server 2008] sys.dm_db_missing_index_columns 
              • returns information about the database table columns that are missing an index [17]
              • not persisted
              • {required permissions} VIEW SERVER STATE
              • {required permissions}[SQL Server 2022] VIEW SERVER PERFORMANCE
            • {undocumented} [SQL Server 2012] sys.dm_db_database_page_allocations 
              • returns a row for every page used or allocated based on the given parameters [5]
              • replaces the DBCC IND undocumented command [5]
            • [SQL Server 2019] sys.dm_db_page_info
              • function that returns information about a page in a database
                • returns one row that contains the header information from the page, including the object_id, index_id, and partition_id
              • replaces  DBCC PAGE in most cases
          • {command} DBCC SHOWCONTIG 
            • determines how full the pages in a table and/or index really are
            • shows three types of fragmentation: 
              • extent scan fragmentation
              • logical scan fragmentation
                • the percentage of out-of-order pages in an index [10]
                • not relevant on heaps and text indexes [10]
                • high values for logical scan fragmentation can lead to degraded performance of index scans [10]
            • scan density
              • not valid when indexes span multiple [10]
            • {concept} average page density
              • measure of fullness for leaf pages of an index [10]
              • low values for average page density can result in more pages that must be read to satisfy a query [10]
                • reorganizing the pages so they have a higher page density can result in less I/O to satisfy the same query [10]
                • generally, tables have a high page density after initial loading of data [10]
                  • page density may decrease over time as data is inserted, resulting in splits of leaf pages [10]
                • value dependent on the fillfactor specified when the table was created [10]
            • key indicators
              • Logical Scan Fragmentation 
              • Avg. Page Density
            • {option} WITH FAST 
              • allows DBCC SHOWCONTIG to avoid scanning the leaf pages of an index [10]
                •  it cannot report page density numbers [10]
              • consider using it on a busy server [10]
          • {command} DBCC INDEXDEFRAG
            • defragments index pages
            • defragments only individual indexes [10]
              •  cannot rebuild all indexes with a single statement [10]
            • performs an in-place update of the pages to defragment them [20]
              • ⇐ does not require locking the table for the entirety of the process, [20]
                •  allows users to access the table [20]
              • only defragments the pages within the extents [20]
                • ⇐ it does not defragment the extents [20]
                  • ⇐ doesn’t affect extent scan fragmentation
                •   reduces logical scan fragmentation
            • online operation 
            • skips locked pages as it encounters them
              •  may not be able to completely eliminate fragmentation [18] [6]
            • each unit of work performed occurs as a separate transaction [18]
              •   can be stopped and restarted without losing any work [18]
            • {drawback} does not help to untangle indexes that have become interleaved within a data file [10]
              • even if indexes can be rebuilt so that there is minimal interleaving, this does not have a significant effect on performance [10]
            • {drawback} does not correct extent fragmentation on indexes [10]
            • {drawback} can produce large transaction log files [20]
              • {workaround} decrease the backup interval for transaction backups while DBCC INDEXDEFRAG is running [20]
              • ⇐ keeps the transaction log files from growing too large [20]
            • {drawback} can make extent switching worse [20]
              • because a page order scan, which might be in order after DBCC INDEXDEFRAG, might switch between more extents [20]
              • {phase: 1} compacts the pages and attempt to adjust the page density to the fillfactor that was specified when the index was created
                • attempts to raise the page-density level of pages to the original fillfactor [10]
              • {phase: 2} defragments the index by shuffling the pages so that the physical ordering matches the logical ordering of the leaf nodes of the index [10]
                • ⇐ performed as a series of small discrete transactions
                  • has a small impact to overall system performance [10]
          • {command} DBCC DBREINDEX
            • rebuilds an index or all indexes on a table 
              • ⇐ builds indexes the same way as CREATE INDEX [10]
                • restores the page density levels to the original fillfactor or to the provided value [10]
                • rebuilds statistics automatically during the rebuild 
                  • have dramatic improvements on workload performance [10]
                • can take advantage of multiple-processor computers [10]
                  • can be significantly faster when rebuilding large or heavily fragmented indexes [10]
              • rearranges the extents and the pages within a table so that they are in index order [19]
                • ⇐ pages scan faster when they have been written in heap memory in the same order as their next page's pointers (index order) [19]
                • extents can be paged from disk faster during a scan operation when they're written to storage in the same order as their next extent pointers (index order) [19]
            • offline operation
            • its effectiveness can be influenced by the available free space [10]
              • ⇐ without large enough contiguous blocks of free space, DBREINDEX may be forced to reuse other areas of free space within the data files, resulting in indexes being rebuilt with a small amount of logical fragmentation [10]
              • the amount of free space needed varies and is dependent on the number of indexes being created in the transaction [10]
                • {guideline} [clustered indexes] required free space = 1.2 * (average row size) * (number of rows) [10]
                • [nonunique clustered index] when rebuilt is also need free space for both the clustered and any nonclustered indexes
                  • implicitly rebuilt because SQL Server must generate new unique identifiers for the rows [10]
                  • {guideline} [nonclustered indexes] average row size *number of rows
                    • average row size of each row in the nonclustered index considers the length of the nonclustered key plus the length of clustering key or row ID [10]
            • runs as one atomic transaction [18]
              • when stopped the changes are rolled back [18]
            • [clustered indexes][locking] puts an exclusive table lock on the table [19]
              • prevents any users from accessing the table [19]
            • [nonclustered indexes][locking] puts a shared table lock
              • prevents all but SELECT operations from being performed on it [19]
            • {good practice} specify individual indexes for defragmentation [10]
              • ⇐ gives more control over the operations being performed [10]
              • ⇐ can help to avoid unnecessary work [10]
            • {limitation} does not reduce page density levels on pages that currently have a higher page density than the original fillfactor [10]
            • {advantage} as the fragmentation level and size of the indexes increase, DBCC DBREINDEX can rebuild the indexes much faster than DBCC INDEXDEFRAG [10]
            • via DBCC DBREINDEX (‘<table_name>')

          Acronyms:
          CI - clustered index
          CRUD - Create, Read, Update, Delete
          DBCC - Database Console Command
          DMV - Dynamic Management View
          IAM - Index Allocation Map
          NCI - nonclustered index
          UCI - unique clustered index

          Resources:
          [1] Scalability Experts (2005) "Microsoft® SQL Server 2005: Changing the Paradigm",
          [2] Ken Henderson (2003) "Guru's Guide to SQL Server Architecture and Internals"
          [3] Paul S Randal (2011) SQL Q&A: The Lore of Logs (link)
          [4] Kalen Delaney et al (2009) "Microsoft® SQL Server® 2008 Internals"
          [5] Kalen Delaney (2013) Microsoft SQL Server 2012 Internals
          [6] Kalen Delaney (2006) "Inside Microsoft® SQL Server™ 2005: The Storage Engine"
          [7] Jason Strate & Grant Fritchey (2015) "Expert Performance Indexing in SQL Server"
          [8] Microsoft Learn (2023) CREATE INDEX (link)
          [9] Kimberly L Tripp (2005) Partitioned Tables and Indexes in SQL Server 2005 (link)
          [10] Mike Ruthruff (2009) Microsoft SQL Server 2000 Index Defragmentation Best Practices [white paper]
          [11] Michelle Ufford (2011) Effective Clustered Indexes, (link)
          [12] Technet (2016) SQL Server Index Design Guide (link)
          [13] Bob Dorr (2015) How It Works: MAX DOP Level and Parallel Index Builds (old link)
          [14] Stuart Ozer et al (2007) SQL Server Best Practices Article (link)  
          [15] Microsoft Learn (2023) Optimize index maintenance to improve query performance and reduce resource consumption (link)
          [16] MSDN (2015) Specify Fill Factor for an Index (link)
          [17] Technet (2015) Finding Missing Indexes (link)
          [18] MSDN Blogs (2010) Notes - SQL Server Index Fragmentation, Types and Solutions, by Pankaj Mittal (old link)
          [19] Wayne W Berry (2010) The DBCC DBREINDEX Command
          [20] 
          Wayne W Berry (2010) The DBCC INDEXDEFRAG Command
          [21] Microsoft Learn (2023) SQL Server and Azure SQL index architecture and design guide (link)


          14 January 2024

          🛠️🗒️SQL Server: The Query Plan (Notes)

          Disclaimer: This is work in progress intended to consolidate information from various sources (some unfortunately not available anymore).

          Query plan (aka query execution plan, execution plan)

          • read-only reentrant structure that is shared by multiple users [1]
            • optimized version of the query tree [7]
          • defines
            • the sequence in which the source tables are accessed [7]
            • the methods used to extract data from each table [7]
              • whether it uses an index or a table scan [7]
            • the methods used to compute calculations [7]
            • the methods used to filter, aggregate, and sort data from each table [7]
          • built by the Query Optimizer
            • it chooses a query plan using a set of heuristics to balance compilation time and plan optimality in order to find a good query plan [7]
          • stored in the plan cache
            • remain in the plan cache as long as there is enough memory to store them
              • under memory pressure exists, a cost-based approach is used to determine which execution plans to remove from the plan cache [7]
                • execution plans are examined multiple times until enough have been removed to satisfy memory requirements [7]
                  • ⇒ an execution plan may have its cost increased and decreased more than once [7]
                • the database engine uses the resource monitor and user worker threads to free memory from the plan cache in response to memory pressure [7]
                  • examine plans run concurrently to decrease the current cost for each unused execution plan [7]
                  • the resource monitor removes execution plans from the plan cache when global memory pressure exists [7]
                    • it frees memory to enforce policies for system memory, process memory, resource pool memory, and maximum size for all caches [7]
                  • the user worker threads remove execution plans from the plan cache when single cache memory pressure exists [7]
                    • it enforce policies for maximum single cache size and maximum single cache entries [7]
              • when memory pressure no longer exists, the decreasing of current cost of unused execution plans is stopped [7]
                • all execution plans remain in the plan cache, even if their cost is zero [7]
            • one or all plans can be removed from the plan cache [7]
              • via DBCC FREEPROCCACHE
              • via DBCC FREESYSTEMCACHE can also be used to clear any cache, including plan cache
              • [SQL Server 2016] via ALTER DATABASE SCOPED CONFIGURATION CLEAR PROCEDURE_CACHE
                • clears the procedure (plan) cache for the database in scope [7]
            • changes in some configuration settings via sp_configure and reconfigure cause plans to be removed from plan cache [7]
          • agnostic to both parameter values and user-specific information
            • no user context is stored in the query plan [5]
          • the text of each newly submitted query is hashed and cached in memory, along with the original query text and a link to the execution context and the execution plan that the original query resulted in [2]
            • allows avoid redundantly parsing and optimizing queries [2]
          • must first go through a conversion to be passed to the LPE process to be carried out
            • ⇐ regardless of the stage that actually produces the execution plan
            • this is handled by the conversion stage of the optimization process
            • each optimized query plan is processed by this stage before being returned for execution [3]
          • [SQL Server 2005] is put in the plan cache when a cacheable batch is submitted for execution and compiled [1]
            • a portion from the buffer pool is used for caching query plans (aka plan cache) [3]
          • [SQL Server 2005] there are at most two instances of a query plan at any time in plan cache [1]
            • one for all of the serial executions 
            • one for all of the parallel executions
              • common for all parallel executions, regardless of their degrees of parallelism [5]
              • a necessary condition to compile a parallel query plan is that the minimum of the number of processors must be more than one after having survived:
                • the processor affinity mask
                • the 'max degree of parallelism' server-wide option (possibly set using the "sp_configure" stored procedure)
                • [SQL Server 2008] the MAXDOP limits imposed on the resource pool that the query is executing within, by the Resource Governor
                • any MAXDOP hint on the query
            • if two identical queries posed by the same user using two different sessions with the same session options arrive simultaneously, two query plans exists while they execute [3]
              •  at the end of their executions, the plan for only one of them is retained in the plan cache [3]
          • can be cached (aka query plan caching)
            • query plans are cashed for many types of batches [3]
              • an exact text match for two ad-hoc queries before reuse can occur [3]
                • the text match is both case- and space-sensitive, even on case-insensitive servers [3]
            • if a query plan is not cached, its reuse opportunity is zero [3] 
              • such a plan will be compiled every time it is executed, potentially resulting in poor performance [3]
              • in some cases, non-caching is a desirable outcome [3]
          • can be reused (aka query plan reuse)
            • when the plan cache is searched for plan reuse opportunities, the comparisons are against query plans, not against execution contexts [3]
            • once a reusable query plan is found, an available execution context is found (causing execution context reuse) or freshly generated [3]
            • does not necessarily imply execution context reuse [3]
            • query plans should be reused whenever possible and optimal to avoid unnecessary query compilation costs [3]
            • factors that affect plan-reuse:
              • [stored procedures and triggers] the execution plan is executed separately from the execution plan for the batch calling the stored procedure or firing the trigger [7]
                • allows for greater reuse of their execution plans [7]
                • a query plan for a stored procedure is not reused when executed across different databases
                  • doesn’t apply to ad-hoc queries, prepared queries, or dynamic SQL
                • trigger execution
                  • the number of rows affected by that execution (1 versus n) is a distinguishing factor in determining a plan cache hit
                  • ⇐ specific to triggers, and does not apply to stored procedures.
                  • [SQL Server 2005] INSTEAD OF triggers
                    • the "1-plan" is shared by executions that affect both 0 and 1 row
                  • [SQL Server 2005] non-INSTEAD OF ("after") triggers
                    • "1-plan" is only used by executions that affect 1 row and "n-plan" is used by executions that affect both 0 and n rows (n > 1)
                  • triggers associated with bulk inserts are cached.
              • bulk insert statements are never cached 
              • a batch that contains any one literal longer than 8 KB is not cached
                • a literal's length is measured after constant folding is applied
              • batches flagged with the "replication flag" associated with a replication user are not matched with batches without that flag.
              • batch called from CLR is not matched with the same batch submitted from outside of CLR
                •  ⇐ applies to CLR triggers and non-CLR triggers
                • two CLR-submitted batches can reuse the same plan. 
              • notification queries and non-notification queries
              • query plans for queries submitted via sp_resyncquery are not cached
                • if the query is resubmitted (via sp_resyncquery or otherwise), it needs to be compiled again
              • batches within a  cursor with non-cursor batches 
              • certain SET options (see [3])
                • {best practice} to avoid SET option-related recompilations, establish SET options at connection time, and ensure that they do not change for the duration of the connection [3]
              • batches with unqualified object names result in non-reuse of query plans. 
              • stored procedure created with "CREATE PROCEDURE ...WITH RECOMPILE
              • stored procedure executed using "EXEC ... WITH RECOMPILE"
          • trace flag 8675
            • determines when the optimizer times out while generating a query plan [2]
            • tells when the optimizer reaches the memory limit to which SQL Server restricts it [2]
              • ⇐ about 80% of the BPool
          • {concept} trivial plan 
            • results when the query optimizer determines that given the tables referenced in the query and the indexes existing on them, only one plan is possible [3]
              • usually a single table is referenced
              • doesn't require any cost estimation or comparisons between plans [2]
            • in XML plan will appear statementOptmLevel="TRIVIAL"
            • a query that has generated a trivial plan may not always generate a trivial plan
              • e.g. new indexes might be created on the underlying tables, and so multiple access paths become available to the query optimizer
            • may be generated for certain types of covered queries, joins between tables with constraint relationships, DML statements [2]
            • a query that contains a subquery is not eligible for a trivial plan [2]
            • using a variable in the query doesn't impair the optimizer's ability to select a trivial plan [12]
              • regardless of variable's value at runtime, a simple equality operator is being used to compare the variable with a uniquely indexed column, and the optimizer intrinsically knows that a trivial plan is the best option [2]
            • 8759 trace flag
              • allows detecting when a trivial plan has been chosen [2]
              • {enabled} the optimizer writes the first portion of a query for which a trivial plan is created to the error log [2]
          • {concept} execution context
            • data structure that holds parameter values and user-specific information specific to the executed query
            • derived from a query plan 'on the fly' [3]
              • before a batch execution begins, a skeleton execution context is generated
                • as execution proceeds, the necessary execution context pieces are generated and put into the skeleton
                  • ⇒ two execution contexts need not be identical even after user-specific information and query parameters are deleted from them
                • because structures of execution contexts derived from the same query plan can differ from one another, the execution context used for a particular execution has slight impact on performance [3]
                  • the impact of the differences diminishes over time as the plan cache gets 'hot' and as a steady state is reached [3]
              • holds the values that are needed for a specific execution of a query plan [3]
              • multiple execution contexts may exist for a query plan
              • ⇒ it cannot exist in the plan cache without an associated query plan [3]
                • ⇒ whenever a query plan is removed from plan cache, all of the associated execution contexts are also removed along with it [3]
            • cached (aka execution context caching)
            • can be reused (aka execution context reuse)
            • each user concurrently executing a batch will have an execution context 
            • is not reentrant 
              • ⇐ at any point of time, an execution context can execute only one batch submitted by a session
                • ⇐ while the execution is happening, the context is not given to any other session or user [3]
            • if a batch execution generates an error of severity 11 or higher, the execution context is destroyed [3]
              • ⇒ even in the absence of memory pressure the number of execution contexts (for a given query plan) cached in plan cache can go up and down [3]
            • if a parallel query plan is compiled, a serial execution context may be derived from the parallel plan at execution time by disabling parallelism-specific plan operators [3]
            • execution contexts that are associated with parallel branches of the parallel plan are not cached, however serial branches of the parallel plan are cached [3]
          • {concept} plan cost
            • associated with query plans and execution contexts
            • partially controls how long the plan or context will live in the plan cache
            • the cost of an ad-hoc query is zero
            • [SQL Server 2000] for a query plan, the cost is a measure of the server resources (CPU time and I/O) it takes the query optimizer to optimize the batch
              • for ad-hoc queries, the cost is zero
            • [SQL Server 2000] for an execution context, the cost is a measure of the server resources (CPU time and I/O) it takes for the server to initialize the execution context so that the individual statements become ready for execution
              • doesn't include costs (CPU and I/O) incurred during batch executions
              • typically, execution context costs are lower than query plan costs 
            • [SQL Server 2005] the cost of a query plan is a measure of the amount of resources required to produce it
              • the cost is calculated in 'number of ticks' with a maximum value of 31
                • the individual parts of the cost are calculated as follows [3]
                  • two I/Os cost 1 tick, with a maximum of 19 ticks
                  • two context switches cost 1 tick, with a maximum of 8 ticks
                  • sixteen memory pages (128 KB) cost 1 tick, with a maximum of 4 ticks
                • as soon as the size of the plan cache reaches 50% of the buffer pool size, the next plan cache access decrements the ticks of all of the plans by 1 each [3]
                  • because this decrement is piggybacked on a thread that accesses the plan cache for plan lookup purpose, the decrement can be considered to occur in a lazy fashion [3]
                • if the sum of the sizes of all of the caches reaches or exceeds 75% of the buffer pool size, a dedicated resource monitor thread gets activated, and it decrements tick counts of all of the objects in all of the caches
                  • this thread's behavior approximates that of the lazy-writer thread in SQL Server 2000
              • cost = I/O cost + context switch cost (a measure of CPU cost) + memory cost
              • the plan cache is distinct from the data cache
              • there are other functionality-specific caches
              • the lazy-writer process does not decrement costs 
            •  query plan reuse causes the query plan cost to be reset to its initial value [3]
          • {concept} query optimization
            • the multi-phased process of selecting a 'good-enough' query execution plan from potentially many possible plans  [7]
            • involves the overhead of selecting the query and selecting a plan
              • the overhead is typically saved several-fold when the query optimizer picks an efficient execution plan [7]
          • {concept} [SQL Server 2022] optimized plan forcing 
            • part of intelligent query processing family of features
            • reduces compilation overhead for repeating forced queries [7]
              • requires the Query Store to be enabled and in 'read write' mode
              • once the query execution plan is generated, specific compilation steps are stored for reuse as an optimization replay script [7]
                • stored as part of the compressed showplan XML in Query Store, in a hidden OptimizationReplay attribute [7]
          • {concept} estimated execution plan
            • the compiled plan, as produced by the query optimizer [7]
          • {concept} actual execution plan
            • the compiled plan plus its execution context [7]
            • includes runtime information available after the execution completes [7]
              • e.g. execution warnings, or in newer versions of the Database Engine, the elapsed and CPU time used during execution [7]
          • {concept} Live Query Statistics
            • is the same as the compiled plan plus its execution context
            • includes runtime information during execution progress [7]
              • e.g. the actual number of rows flowing through the operators
            • updated every second. Runtime information includes for example [7]
          • {concept} query plan hash
            • a binary hash value calculated on the execution plan for a given query [7]
            • used to uniquely identify similar execution plans [7]
          • {concept} query hash
            • a binary hash value calculated on the Transact-SQL text of a query [7]
            • used to uniquely identify queries [7]
          • {concept} plan cache 
            • part of the memory pool that is used to store execution plans [7]
            • has two stores for all compiled plans
              • Object Plans cache store (OBJCP) 
                • used for plans related to persisted objects (stored procedures, functions, and triggers) [7]
              • SQL Plans cache store (SQLCP) 
                • used for plans related to auto-parameterized, dynamic, or prepared queries [7]
            • has two additional stores that don't store compiled plans
              • Bound Trees cache store (PHDR) 
                • used for data structures (aka bound trees, algebraizer trees) used during plan compilation for views, constraints, and defaults [7]
              • Extended Stored Procedures cache store (XPROC) 
                • used for predefined system procedures that are defined using a DLL, not using Transact-SQL statements [7]
                  • e.g. sp_executeSql or xp_cmdshell
                    • the cached structure contains only the function name and the DLL name in which the procedure is implemented [7]
          • {concept} plan handle
            • transient identifier that allows to retrieve a compiled plan from the plan cache [7] 
              • remains constant only while the plan remains in the cache [7]
            • hash value derived from the compiled plan of the entire batch [7] 
            • remains the same even if one or more statements in the batch get recompiled [7]
            • if a plan was compiled for a batch instead of a single statement, the plan for individual statements in the batch can be retrieved using the plan handle and statement offsets [7]
          • {concept} recompile execution plans
            • certain changes in a database can cause an execution plan to be either inefficient or invalid, based on the new state of the database [7]
              • the changes that invalidate an execution plan are detected and the plan are marked as not valid [7]
                •  ⇒ a new plan must then be recompiled for the next connection that executes the query [7]
              • conditions that invalidate a plan include the following:
                • changes made to a table or view referenced by the query [7]
                  • via ALTER TABLE or ALTER VIEW
                • changes made to a single procedure, which would drop all plans for that procedure from the cache [7]
                  • via ALTER PROCEDURE
                • changes to any indexes used by the execution plan [7]
                • updates on statistics used by the execution plan, generated either explicitly from a statement or generated automatically.
                • dropping an index used by the execution plan.
                • an explicit call to sp_recompile.
                • large numbers of changes to keys 
                  • generated by INSERT or DELETE statements from other users that modify a table referenced by the query [7]
                • for tables with triggers, if the number of rows in the inserted or deleted tables grows significantly [7]
                • executing a stored procedure using the WITH RECOMPILE option [7]
            • most recompilations are required either for statement correctness or to obtain potentially faster query execution plans [7]


          Acronyms
          MAXDOP - Maximum Degree of Parallelism

          Resources:
          [1] Scalability Experts, "Microsoft® SQL Server 2005: Changing the Paradigm", 2005
          [2] Ken Henderson (2004) The Guru's Guide to SQL Server Architecture and Internals 55/12
          [3] Greg Low, Plan Caching and Recompilation in SQL Server 2012, 2013 [3]
          [4] MCITP Self-Paced Training Kit Exam 70-442
          [5] Microsoft Learn (2012) Execution Plan Caching and Reuse (link)
          [6] Microsoft Learn (2023) Execution plan overview (link)
          [7] Microsoft Learn (2023) Optimized plan forcing with Query Store (link)
          [8] Microsoft Learn (2023) Query processing architecture guide (link)


          10 January 2024

          📦Data Migrations (DM): Conceptualization (Part VIII: Create a Data Migration Strategy for Dynamics 365)

          Data Migration
          Data Migration Series

          Microsoft made available a module on creating a Data Migration strategy (DMS) for Dynamics 365 solutions [1] and provides a template for the DMS workshop on GitHub. Despite the all-encompassing title, the module focuses mainly on the latter. Here are several points to consider in addition.

          The workshop allows to gather most important aspects related to the DM, however it's useful to make the strategy explicit as part of a document and share it with the team. The DMS document should follow the general elaboration of a strategy by providing information about the current and targeted state, goals and objectives, assumptions, dependencies, risks and opportunities, the architecture (including alternatives that need to be explored), direct and interfacing processes, roles involved and their responsibilities, communication paths (including escalation), etc. 

          Even if the template discusses general aspects, the treatment of best practices is minimal (see some technical and architectural aspects).

          The DMS should describe when possible the high-level architecture planned for DM. The architecture section can be updated as the project progresses. 

          One can plan for 3-4 dry-runs plus the UAT, though further iterations might be needed depending on the data quality, respectively the complexity of the logic needed. Each dry-run will have its own scope (in terms of entities and number of records) and objectives (in terms of passed tests), while the UAT attempts to mimic the same conditions the as DM for Go-Live.

          Several iterations are usually needed to identify the entities in scope. Moreover, the scope will suffer several changes as new entities and migration areas are identified. There are high the chances that some data entities will be forgotten or postponed after Go Live. Upon case, the business might approach a phased implementation, each with their own DM,

          It's useful to have a list of the entities available in the organization and map them to the entities available in the target system(s). This allows to identify the gaps on the DM side. There will be entities for which data need to be provided, entities for which there's no counterpart in the target system, respectively entities with different levels of detail.

          An important step of the process is defining the business rules that apply - what subset of data is migrated (e.g. active referenced master data, open transactions, etc.). The definitions will suffer further changes as business perception about what's needed changes.

          As they'll go through multiple changes, it's useful to have such lists in Excel or similar repositories and reference them in the strategy and further documents. 

          The strategy can also target the post-Go-Live migration activities, though the process and architecture might be slightly different, even if many of the requirements still apply. One can devise the further aspects as part of the DMS or consider them as part of the Data Management strategy and operations.

           Despite the considerable overlapping of the DM with Data Management, at least until now the DM is not considered as part of the latter. Big organizations with periodic DMs might need to consider a broader strategy, following to address punctually the deviations from each project. 

          The DM is usually considered as a subproject of the implementation and should be managed as such. Depending on its scope and characteristics (principles, acceptance criteria) the DM might use a different methodology and timeline than the main project. Defining project's goals/objectives, outcomes, stakeholders, timeline, phases, and milestones, respectively the high-level assumptions, dependencies, risks, opportunities and constraints is Project Management 101.

          The DM is critical for an implementation's success. It would be useful to describe the architectural alternatives, as well to provide a series of architectural recommendations.

          Previous Post <<||>> Next Post

          Resources:
          [1] Microsoft learn (2023) Create a data migration strategy for Dynamics 365 solutions (link)

          08 January 2024

          🗒️Dynamics 365: Fixed Assets (Notes)

          Disclaimer: This is work in progress intended to consolidate information from various sources.

          Fixed Assets (aka FA)

          • assets that a company owns and uses in the daily operations of the company and are not intended for resale to customers [1]
            • ⇐ the same asset might be classified as a FA in one company and as a current asset or inventory item in another [1]
          • represent permanent value and not just expenditures in the year of acquisition [1]
          • typically depreciated, or expensed, over their useful life
            • other adjustments might also be necessary
            • their useful lives span multiple years [1]
          • treated as balance sheet transactions in the year that they are acquired, and are posted as an asset to a balance sheet account [1]
          • {action} review asset value in the balance sheet (net book value) 
            • should be reviewed at least one time each year  [1]
              • it can be reviewed monthly, quarterly, semi-annually, or annually [1]
            • an adjustment of the asset value in the balance sheet (write-down or write-up) might be necessary  [1]
              • the adjustments are usually caused by some extraordinary occurrences in the market that affect the price if the company were to reacquire the asset. [1]
              • ⇐ accounting principles in some countries or regions prohibit the write-up of assets [1]
          • {action} remove asset from the accounting books
            • when a company no longer has use for the asset [1]
              • ⇐ because it is either being sold or scrapped [1]
            • the original acquisition price and accumulated depreciation of the asset are reversed, and any surplus or loss from the disposal is posted to the profit and loss statement [1]
          • {action} dispose asset
            • the fixed asset transactions that were previously posted are reversed out of the original accounts [1]
            • the net amounts are then moved to the appropriate account for gain and loss for asset disposal [1]
          • Fixed asset groups 
            • allow specifying default attributes for every FA that is assigned to a group [1]
              • include the service life and the depreciation convention.
              • FAs are assigned to a group when they are created [1]
            • the only required field when creating a FA [1]
          • Books 
            • track the financial value of a fixed asset over time by using the depreciation configuration that is defined in the depreciation profile
              • ⇐ each book tracks an independent financial lifecycle of an asset [1]
            • assigned to fixed asset groups
              • {default} the books assigned to the FA group are then assigned to the FA [1]
            •  books that are configured to post to the GL are associated with a posting profile
              • GL accounts are defined for each book in the posting profile and are used when fixed asset transactions are posted [1]
              • ⇐ the configuration is the default setting because it's typically used for corporate financial reporting [1]
            • books that don't post to the GL post only to the Fixed asset subledger and are typically used for tax reporting purposes [1]
              • this approach allows the flexibility to delete historical transactions for the asset book because they haven't been committed to GL [1]
              • {field} Post to general ledger
                • {No}disables posting to the general ledger for the book
                  • sets the Posting layer field automatically to None
            • each book that is attached to a FA is set up for a particular posting layer that has an overall depreciation objective [1] 
            • derived books
              • setup in addition to the standard books [1]
              • simplify the posting of FA book transactions that are planned for regular intervals
                • one book is chosen as the primary book, which is usually the one used for accounting depreciation [1]
                  • then attach to it other books that are set up to post transactions in the same intervals as the primary book [1]
                    • ⇐ tax depreciation books are often set up as derived books [1]
                • allow posting transactions to different posting layers at the same time [1]
                • acquisitions, acquisition adjustments and disposals can be set up with it
                  • use acquisition when the book and the derived book should be applied to the FA from the time of the fixed asset acquisition [1]
                  • if the primary book and the derived books have the same intervals regarding sale or disposal, all FA transaction types are available for the setup of a derived book [1]
                  • depreciation posted in the derived book will be the same amount as was posted for the primary book [1]
                    • if the depreciation methods are different between the books, one should not generate depreciation transactions by using the derived process [1]
                      • ⇐ the books can then be attached to specific fixed assets [1]
              • the specified derived transactions are posted against the derived books as an exact copy of the primary transaction [1]
                • ⇒ derived transactions are typically set up for acquisitions and disposals, not for depreciation transactions [1]
              • when posting transactions for a book that contains derived books, the derived book transactions are posted automatically in journals, purchase orders, or free text invoices [1]
              • when preparing the primary book transactions in the FA journal, one can view and modify the amounts of the derived transactions before posting them [1]
              • certain accounts are updated only once by postings of the primary book [1]
                • e.g. sales tax and customer or vendor accounts
              • derived book transactions are posted to the accounts that have been defined for the derived book in the FA posting profiles page [1]
              • {exception} books that are set up to post transactions at intervals other than the primary book intervals must be attached to the FA as separate books and not as derived books [1]
          • depreciation
            • an entry that expenses the part of the asset's original purchase price that was used during the year [1]
            • FAs are depreciated in different ways for different purposes
              • depreciation for tax purposes 
                • calculated by using current tax rules to achieve the highest possible depreciation before taxes [1]
              • depreciation for reporting purposes
                • calculated according to accounting laws and standards [1]
              • the various kinds of depreciation are calculated and recorded separately in the posting layers [1]
          • depreciation methods
            • straight line
              • computed by taking the costs of the acquisition and dividing those costs by the expected service life of the asset [1]
                • the rules are usually defined in the local legislation [1]
          • depreciation profiles 
            • allow configuring via rules how the value of an asset is depreciated over time
              • it's needed to define the method of depreciation, the depreciation year (calendar year or fiscal year), and the frequency of depreciation [1]
            • a primary depreciation profile is assigned to every book
              • ⇐ books also have an alternative (aka switchover depreciation) profile, if this type of profile is applicable [1]
                • the depreciation proposal will switch to this profile when the alternative profile calculates a depreciation amount that is equal to or greater than the default depreciation profile [1]
              • ⇐ enable the Calculate depreciation option to automatically include the FA book in depreciation runs [1]
            • usually refers to tangible assets, but intangible assets can also use depreciation profiles [1]
            • financial assets are considered non-depreciable [1]
              •  e.g. shares of stocks or bonds 
            • extraordinary depreciation profile
              • used for additional depreciation of an asset in unusual circumstances [1]
                • e.g. record depreciation that results from a natural disaster
            • depreciation calculation methods:
              • Straight line service life
              • Reducing balance
              • Manual
              • Factor
              • Consumption
              • Straight line life remaining
              • 200% reducing balance
              • 175% reducing balance
              • 150% reducing balance
              • 125% reducing balance
              • ⇐ most companies use one or more of the straight-line methods, one or more of the reducing balance methods, or the manual method [1]
              • ⇐ the rationale for all methods is to allocate or accrue the depreciable value of the asset into accounting periods [1]
                • the depreciable value of the asset is the acquisition price reduced by a scrap value, if any
              • ⇐ many depreciation profiles with the same depreciation method can be created for different calculations [1]
          • posting profile 
            • must be defined for each book that will post to GL, but it can also be defined at a more detailed level [1]
              • e.g. can be defined for the combination of a book and a FA group, or even for an individual FA book [1]
              • Set up a posting profile record general ledger
            • for each posting profile, one must select the relevant transaction type and book, and then designate the ledger accounts [1]
          • {option} Create depreciation adjustments with basis adjustments 
            • {selected} depreciation adjustments will be automatically created when the value of the asset is updated [1]
            • {not selected} the updated asset value will only affect depreciation calculations going forward [1]
            • {not enabled} for an asset, the depreciation proposal skips the asset [1]
          • {optional} define special depreciation allowances, or bonus depreciation, for a specific combination of a fixed asset group and a book [1]
            • assign a priority to the special depreciation allowance to specify the order in which allowances are calculated when multiple allowances are assigned to a book [1]
          • Journal names 
            • create the journal names that should be used with the FA journal.
              • set the Journal type field to 'Post fixed assets'
              • set the Voucher series field so that the journal names are used for the FA journal [1]
              • FA journals should not use the One voucher number only setting
                • ⇐ because a unique voucher number is required for several automated processes, such as transfers and splits [1]
            • each journal used for depreciation is defined by its journal name for only one posting layer [1]
              • the posting layer in the journal can't be changed [1]
                • the restriction helps guarantee that transactions for each posting layer are kept separate [1]
              • at least one journal name must be created for each posting layer [1]
                • for books that don't post to GL create a journal where the posting layer is set to None [1]
          • {parameter} Capitalization threshold
            • determines the assets that are depreciated
            • if a purchase line is selected as a fixed asset, but it doesn't meet the specified capitalization threshold, a fixed asset is still created or updated, but the Calculate prorated depreciation option is set to No. Therefore, the asset won't be automatically depreciated as part of the depreciation proposals.
          • {option} Automatically create depreciation adjustment amounts with disposal
            • {Yes}, the asset depreciation is automatically adjusted, based on the depreciation settings at the time of asset disposal [1]
          • [Purchase orders] 
            • {option} Allow asset acquisition from Purchasing
              • {Yes} asset acquisition occurs when the invoice is posted [1]
              • {No} the FA can be put on a PO and invoice, but the acquisition won't be posted [1]
                • the posting must be done as a separate step, from the FA journal
            • {option} Create asset during product receipt or invoice posting 
              • {enabled} a new asset is created while posting
                • ⇒ the asset doesn't have to be set up as a fixed asset before the transaction
          • [Purchase requisitions] {option} Check for fixed assets creation during line entry
          • Fixed asset number sequence 
            • can be overridden by the FA group number sequence if it has been specified [1]
          • depreciation conventions for groups of assets that are part of the same book
            • {action} updating the depreciation convention
              • delete all depreciation transactions that exist for those assets [1]
              • delete all transactions for depreciation adjustments, transactions for bonus depreciation, and transactions for extraordinary depreciation for those assets [1]
            • {action} update the depreciation convention for assets that have already been disposed
              • delete first the existing disposal transactions [1]
              • delete all transactions that were generated because of the disposal process [1]
            • {action} after updating the depreciation convention for assets
              • process depreciation and extraordinary depreciation for each asset [1]
              • make manual depreciation adjustments, if any adjustments are required [1]
          • [General Ledger] main accounts 
            • {default} the ledger accounts that are defined are used for your FA transactions [1]
            • {recommendation} define the ledger accounts that are used during the disposal processes, both disposal sales and disposal scraps [1]
            • {recommendation} to help guarantee that transactions are correctly reversed, set up accounts for each type of transaction that you use in your business [1]
              • the main account should be the original account set on the posting profile for the transaction type, and the offset account should the gain and loss for disposal account [1]
              • {exception} for the net book value, the main account and the offset account should be set to the gain and loss for disposal account [1]
              • various combinations of main accounts can be created depending on the level of detail needed in GL [1]
            • can be based on transaction types, books, and other main accounts [1]
            • the value of all FAs is typically summarized in multiple main accounts that are required for financial reporting [1]
            • each time a transaction is posted for a FA, the appropriate main accounts are updated
              • ⇐ the main accounts always show the updated value [1]
            • the main accounts that are used for posting are determined by the options for ledger integration that are specified for the item model group [1]
              • the main accounts that are used vary, depending on whether an asset is assigned to the purchase order line [1]
              • the accounts are derived from the posting profile for each item group [1]
          • [Inventory management]
            • inventory items can be transferred to FAs either as an acquisition or as part of an acquisition [1]
              • an inventory item becomes a FA acquisition through 
                • inventory journal
                  • one can enter the acquisition of FA that the LE has produced or constructed for itself [1]
                • PO
                  • one PO line creates one fixed asset, regardless of the quantity [1]
                  • items can be set up for both resale and internal use [1]
                    • using specific receipt and issue accounts on item groups for FAs, one can use the same inventory item both for internal purchases and as stock for resale [1]
                • acquisition proposal
                • ⇐ with it a FA book acquisition transaction is created [1]
                  • if a book acquisition includes a derived book, the derived book acquisition transaction is also created [1]
            • the effect the acquisition of fixed assets has on inventory depends on LE's setup [1]
              • posting rules control the decrease in inventory when an acquisition is posted [1]
                •  the inventory isn't always decreased when posting invoices related to FA [1]
            • FAs for internal use must have an account type of 
              • 'Fixed asset receipt'
                • used to track FA's receipt
                • when posting a Vendor invoice, use the FA receipt account if any of conditions hold:
                  • the invoice line contains an existing FA for internal purposes [1]
                  • the New fixed asset? check box is selected for the product receipt line that is posted [1]
                  • the 'Create a new fixed asset' check box is selected for the vendor invoice line [1]
                • typically, this account is an expense account
              • 'Fixed asset issue' 
                • used to track the issuing of the FA to the recipient 1[]
                • when an asset is acquired by using a PO, the FA issue account offsets the FA debit account [1]
                • the asset acquisition can be posted either when posting the Vendor invoice or when posting the asset acquisition in the FA journal, possibly by using an acquisition proposal [1]
              • ⇐ setup for either an item group or an individual item by using the Purchase order tab on the Item group or Posting page [1]
            • if an inventory reservation exists when product receipts are posted, one can’t assign or create a FA from the line [1]
            • the accounts that FA transactions are posted to depend on two factors:
              • whether the assets are purchased or constructed by the LE [1]
              • the transaction type of the asset
                • connects the inventory transaction to the posting profile in FAs [1]
                • because the posting profile in FAs defines which accounts are updated, the selection of a transaction type for a FA is also, indirectly, the selection of the main accounts that the transaction is posted to [1]
                • for both constructed and purchased fixed assets, the transaction type is typically Acquisition or Acquisition adjustment [1]
          • [Accounts receivable]
            • the integration uses posting profiles that are set up in FAs
              • activated when a FA, book, and FA transaction type is selected for a Customer invoice before the customer invoice is posted [1]
              • selling a FA must be done via Free text invoices
                • ⇐ because FAs aren’t part of Inventory management [1]
          • [Accounts payable]
            • typically, FAs are acquired from external vendors
          • asset acquisitions are posted either
            • when posting the Vendor invoices
              • FA accounts are updated whenever a Vendor invoice for a FA acquisition is posted [1]
              • the transaction is posted according to the posting profiles that are set up in FAs for the various FA transaction types [1]
            • from FAs
          • [Project management and accounting]
            • a project can be associated with a FA asset that is affected by the project
              • each phase, task, or subproject can be associated to a different asset [1]
              • one FA can be associated with each project record [1]
              • the association is created via the Fixed asset number field on the Projects page [1]
                • typically, applies to projects related to work, maintenance, or improvements for the FA [1]
              • the project type must be either Internal or Cost project
              • clear the Fixed asset number field on the Projects page to delete the association [1]
            • when the project is completed, a write-up adjustment for the asset isn’t created automatically [1]
              • ⇒ must be created manually [1] 
            • one can designate a FA created or manufactured as part of an estimate project [1] 
              • at the end of an estimate project, one can automatically post a FA acquisition transaction [1]

          Acronyms:
          FA - Fixed Asset
          GL - General Ledger
          LE - Legal Entity
          PO - Purchase Order

          Resources:
          [1] Microsoft Learn (2023) Configure fixed assets management in Dynamics 365 Finance [link]

          02 January 2024

          🕸Systems Engineering: Never-Ending Stories in Praxis (Quote of the Day)

          Systems Engineering
          Systems Engineering Cycle

          "[…] the longer one works on […] a project without actually concluding it, the more remote the expected completion date becomes. Is this really such a perplexing paradox? No, on the contrary: human experience, all-too-familiar human experience, suggests that in fact many tasks suffer from similar runaway completion times. In short, such jobs either get done soon or they never get done. It is surprising, though, that this common conundrum can be modeled so simply by a self-similar power law." (Manfred Schroeder, "Fractals, Chaos, Power Laws Minutes from an Infinite Paradise", 1990)

          I found the above quote while browsing through Manfred Schroeder's book on fractals, chaos and power laws, book that also explores similar topics like percolation, recursion, randomness, self-similarity, determinism, etc. Unfortunately, when one goes beyond the introductory notes of each chapter, the subjects require more advanced knowledge of Mathematics, respectively further analysis and exploration of the models behind. Despite this, the book is still an interesting read with ideas to ponder upon.

          I found myself a few times in the situation described above - working on a task that didn't seem to end, despite investing more effort, respectively approaching the solution from different angles. The reasons residing behind such situations were multiple, found typically beyond my direct area of influence and/or decision. In a systemic setup, there are parts of a system that find themselves in opposition, different forces pulling in distinct directions. It can be the case of interests, goals, expectations or solutions which compete or make subject to politics. 

          For example, in Data Analytics or Data Science there are high chances that no progress can be made beyond a certain point without addressing first the quality of data or design/architectural issues. The integrations between applications, data migrations and other solutions which heavily rely on data are sensitive to data quality and architecture's reliability. As long the source of variability (data, data generators) is not stabilized, providing a stable solution has low chances of success, no matter how much effort is invested, respectively how performant the tools are. 

          Some of the issues can be solved by allocating resources to handle their implications. Unfortunately, some organizations attempt to solve such issues by allocating the resources in the wrong areas or by addressing the symptoms instead of taking a step back and looking systemically at the problem, analyzing and modeling it accordingly. Moreover, there are organizations which refuse to recognize they have a problem at all! In the blame game, it's much easier to shift the responsibility on somebody else's shoulders. 

          Defining the right problem to solve might prove more challenging than expected and usually this requires several iterations in which the knowledge obtained in the process is incorporated gradually. Other times, one attempts to solve the correct problem by using the wrong methodology, architecture and/or skillset. The difference between right and wrong depends on the context, and even between similar problems and factors the context can make a considerable difference.

          The above quote can be corroborated with situations in which perfection is demanded. In IT and management setups, excellence is often confounded with perfection, the latter being impossible to achieve, though many managers take it as the norm. There's a critical point above which the effort invested outweighs solution's plausibility by an exponential factor.  

          Another source for unending effort is when requirements change frequently in a swift manner - e.g. the rate with which changes occur outweighs the progress made for finding a solution. Unless the requirements are stabilized, the effort spirals towards the outside (in an exponential manner). 

          Finally, there are cases with extreme character, in which for example the complexity of the task outweighs the skillset and/or the number of resources available. Moreover, there are problems which accept plausible solutions, though there are also problems (especially systemic ones) which don't have stable or plausible solutions. 

          Behind most of such cases lie factors that tend to have chaotic behavior that occurs especially when the environments are far from favorable. The models used to depict such relations are nonlinear, sometimes expressed as power laws - one quantity varying as a power of another, with the variation increasing with each generation. 

          Previous Post <<||>> Next Post

          Resources:
          [1] Manfred Schroeder, "Fractals, Chaos, Power Laws Minutes from an Infinite Paradise", 1990 (quotes)

          31 December 2023

          🛠️🗒️SQL Server: The Buffer Pool (Notes)

          Disclaimer: This is work in progress intended to consolidate information from various sources (some unfortunately not available anymore).

          Buffer Pool  [aka BP, BPool]

          • main memory component in SQL Server [2]
            • ⇐ preferred memory allocator for the whole server [1]
            • ⇐ the biggest consumer of SQL Server 
          • area of memory that SQL Server used to store
              • cached data (aka data pages)
              • cached execution plans 
              • algebrizer trees for views, constraints and defaults
              • lock memory
              • ⇐ anything with a non-zero value in sys.dm_os_memory_clerks.single_pages_kb
            • ⇐ caching data in the buffer pool reduces the load on the I/O subsystem and improves performance [6]
            • ⇐ all pages must be copied into the buffer pool before they can be used in a query
              •  ⇒ it is needed to scan the buffer pool for a number of operations [3]
            • single-page allocator
              • it commits and decommits memory blocks of 8KB granularity only
            • manages disk I/O functions for bringing data and index pages into the data cache so data can be shared among users [2]
            • all memory not used by another memory component remains in the buffer pool to be used as a data cache for pages read in from the database files on disk [2]
              • when other components require memory, they can request a buffer from the buffer pool [2]
                • most of the buffers taken from the buffer pool for other memory components go to other kinds of memory caches [2]
              • ⇐ they can only get blocks of 8KB in size [1]
                • these blocks are not continues in memory[1]
                • allocation for large buffers will be satisfied by memory node's multi-page allocator or by virtual allocator [1]
                  • ⇒ memory will be allocated outside of Buffer Pool [1]
                • BP can be used as underneath memory manager for SQL Server components as long as they allocate buffers of 8KB [1]
                  • pages allocated from BP are referred as stolen[1]
            • first decides how much of VAS it needs to reserve for its usage (aka target memory)
              • decision based on internal memory requirements and external memory state
              • it calculates its target amount of memory it thinks it should commit before it can get into memory pressure [1]
                • ⇐ to keep the system out of paging the target memory is constantly recalculated [1]
                • {restriction} the target memory can't exceed max memory that represents max server memory settings [1]
                • even if the min server memory equals the max server memory BP commits its memory on demand [1]
                  • ⇐ monitor the corresponding profiler event to observe this behavior [1]
            • reserves all of it right a way
                • ⇐ check monitor SQL Server's virtual bytes from perfmon or use vasummary view to observe this behavior 
                • ⇐ normally can't get all the needed memory in one region
                  • ⇒ several large regions reserved ← by design behavior 
            • commits pages on demand
          • during startup the memory manager configures BP to be SQLOS's single page allocator
            • from that point on all dynamic single page allocations are provided by BP[1]
          • has its own memory clerk
            • is leveraged to respond to external and VAS memory pressure only [47]
          • leverages only Virtual and AWE SQLOS's interfaces
            • never uses any type of page allocator from SQLOS [1]
          • {operation} dropping buffers 
            • likely results in some degree of performance degradation
              • ⇐ any subsequent query executions will have to reread the data from the database files increasing I/O
            • via: DBCC DROPCLEANBUFFERS
          • {concept} page
            • the fundamental unit of data storage in SQL Server
          • {concept} buffer
            • page in memory that's the same size as a data or index page [2]
              • ⇐ page frame that can hold one page from a database [2]
          • {concept} buffer pool scan
            • common internal operation that iterates through the entire buffer descriptor array to find any buffers that belong to a specific database [3]
            • [<SQL Server 2022] serial operation
              • on large memory machines operations that require scanning the buffer pool can be slow [3]
                • ⇐ the larger the machine, the greater the impact 
                • ⇐ it doesn’t necessarily matter about the size of the operation [3]
            • [SQL Server 2022] {feature}Buffer Pool Parallel Scan
              • parallelized by utilizing multiple cores
                • adds processing power to scan the buffer pool more efficiently [3]
                  • ⇐ benefits both small and large database operations on larger memory machines
                    • 10-30x improvement in executions
                    • customers running mission-critical OLTP, hosted service providers, and data warehouse environments will witness the most improvements in overall processing speed [3]
                  • uses one task per 8 million buffers (64 GB) 
                    • ⇐ a serial scan will still be used if there are less than 8 million buffers [3]
                • uses buffer pool scan diagnostics to improve supportability and insights with new buffer pool scan events [3]
              • can significantly improve the performance of database workloads
                • operations benefiting from it
                  • database startup/shutdown
                  • creating a new database
                  • file drop operations
                  • backup/restore operations
                  • Always On failover events
                  • DBCC CHECKDB and DBCC Check Table
                  • log restore operations
                  • other internal operations (e.g., checkpoint) 
          References:
          [1] Slava Oaks (2005) SQLOS's memory manager and SQL Server's Buffer Pool (link) [something similar seems to be available here]
          [2] Kalen Delaney (2006) Inside Microsoft® SQL Server™ 2005: The Storage Engine
          [3] Microsoft SQL Server Blog (2022) Improve scalability with Buffer Pool Parallel Scan in SQL Server 2022, by David Pless (link)
          [4] Microsoft learn (2022) Operations that trigger a buffer pool scan may run slowly on large-memory computers (link)
          [5] Microsoft Docs (2023) DBCC TRACEON - Trace Flags (link)
          [6] Dmitri Korotkevitch (2015) Expert SQL Server: In-Memory OLTP
          [7] Slava Oaks (2005) SQLOS's memory manager: responding to memory pressure (link)

          30 December 2023

          💫ERP Systems: Microsoft Dynamics 365's Invoice Capture (Features)

          Disclaimer: This is work in progress intended to consolidate information from the various sources (what I considered as important during my exploration of the documentation and further communication) and not to provide an overview of all the features. Please refer to the documentation for a complete overview!

          In what concerns the releases see [2].
          Last updated: 23-May-2024

          Invoice Capture - Main Components
          Invoice Capture - Main Components [3]

          AI Model

          • {feature} prebuilt model (aka Invoice processing model) [by design]
            • can handle the most common invoices in various languages
            • owned by Microsoft and cannot be trained by the customers
          • {feature} custom prebuilt models [planned]
            • built on top of the prebuilt model to handle more complex invoice layouts
            • only requires to train the exceptional invoices
            • after a model is published, additional mapping is required to map the model fields to the invoice files
          • {feature} custom model [by design]
            • requires training the model from scratch 
              • costs time and effort
            • supports the Charges and Sales Tax Amount fields [1.5.0.2]
            • support lookup lists on custom fields [1.6.0.x]

          Channels

          • flows that collect the invoices into one location 
            • there's always a 1:1 relationship between flows and channels
          • multiple channels can be defined using different triggers based on Power Automated connectors
          • {feature}default channel for Upload files [1.0.1.0]
          • {feature} supports multiple sources via flow templates [by design]
            • Outlook.com
            • Microsoft Outlook 365
            • Microsoft Outlook 365 shared mailbox [1.1.0.10]
              • to achieve similar behavior one had to modify the first steps of the generated flow with the "When a new email arrives in a shared mailbox (V2)" trigger
            • SharePoint
            • OneDrive
            • OneDrive for business [1.0.1.0]
          • {feature} assign legal entity on the channel
            • the LE value is automatically assigned without applying additional derivation logic [1]
          • invoices sent via predefined channels are captured and appear on the 'Received files' page

          Configuration groups 

          • allow managing the list of invoice fields and the manual review settings 
          • can be assigned for each LE or Vendors. 
            • all the legal entities in the same configuration group use the same invoice fields and manual review setting
          • default configuration group 
            • created after deployment, can't be changed or deleted

          Fields

          • {feature} standard fields 
            • Legal entity 
              • organizations registered with legal authorities in D365 F&O and selected for Invoice capture
              • {feature} allow to enforce role-based security model
              • {feature} synchronized from D365 F&O to CRM 
              • {missing feature} split the invoices between multiple LEs
            • Vendor master 
              • individuals or organizations that supply goods or services
              • used to automatically derive the Vendor account
              • {feature} synchronized from D365 F&O to CRM
                • synchronization issues solved [1.6.0.x]
              • {feature}Vendor account can be derived from tax number [1.0.1.0]
            • Invoice header
              • {feature} Currency code can be derived from the currency symbol [1.0.1.0]
            • Invoice lines
            • Item master
              • {feature} Item number can be derived from External item number [1.0.1.0]
              • {feature} default the item description for procurement category item by the value from original document [1.1.0.32/10.0.39]
            • Expense types (aka Procurement categories) 
            • Charges
              • amounts added to the lines or header
              • {feature} header-level charges [1.1.0.10]
              • {missing feature}line-level charges 
            • {feature}Financial dimensions 
              • header level [1.1.0.32/10.0.39]
              • line level [1.3.0.x]
            • Purchase orders
              • {feature} PO formatting based on the number sequence settings
              • {missing feature}PO details
              • {missing feature} multiple POs, multiple receipts 
            • {missing feature} Project information integration
              • {workaround} the Project Id can be added as custom field on Invoice line for Cost invoices and with this the field should be mapped with the corresponding Data entity for transferring the value for D365 F&O
          • {feature} custom fields [1.1.0.32/10.0.39]

          File filter

          • applies additional filtering to incoming files at the application level
          • with the installation a default updatable global file filter is provided
          • can be applied at different channel levels
          • {event} an invoice document is received
            • the channel is checked first for a file filter
            • if no file filter is assigned to the channel level, the file filter at the system level is used [1]
          • {configuration} maximum files size; 20 MB
          • {configuration} supported files types: PDF, PNG, JPG, JPEG, TIF, TIFF
          • {configuration} supported file names 
            • filter out files that aren't relevant to invoices [1]
            • rules can be applied to accept/exclude files whose name contains predefined strings [1]
          Actions 

          • Import Invoice
            • {feature} Channel for file upload
              • a default channel is provided for directly uploading the invoice files
              • maximum 20 files can be uploaded simultaneously
          • Capture Invoice
            • {feature} Invoice capture processing
              • different derivation rules are applied to ensure that the invoices are complete and correct [1]
            • {missing feature} differentiate between relevant and non-relevant content}
            • {missing feature} merging/splitting files
              • {workaround} export the file(s), merge/split them, and import the result
          • Void files [1.0.1.0]
            • once the files voided, it is allowed to be deleted from Dataverse
              • saves storage costs
          • Classify Invoice
            • {feature} search for LEs
            • {feature} assign LE to Channel
              • AP clerks view only the invoices under the LE which are assigned to them
            • {feature} search for Vendor accounts by name or address 
          • Maintaining Headers
            • {feature} multiple sales taxes [1.1.0.26]
            • {missing feature} rounding-off [only in D365 F&O]
            • {feature} support Credit Notes [1.5.0.2]
          • Maintaining Lines
            • {feature} Add/remove lines
            • {feature} "Remove all" option for deleting all the invoice lines in Side-by-Side Viewer [1.1.0.26]
              • Previously it was possible to delete the lines one by one, which by incorrectly formatted big invoices would lead to considerable effort. Imagine the invoices from Microsoft or other cloud providers that contain 5-10 pages of lines. 
            • {feature}Aggregate multiple lines [requested]
              • Now all lines from an invoice have the same importance. Especially by big invoices, it would be useful to aggregate the amounts from multiple lines under one. 
            • {feature} Show the total amount across the lines [requested]
              • When removing/adding lines, it would be useful to compare the total amount across the lines with the one from the header. 
            • Check the UoM consistency between invoice line and linked purchase order line
              • For the Invoice to be correctly processed, the two values must match. 
            • Support for discounts [requested]
              • as workaround, discounts can be entered as separate lines
          • Transfer invoice

          • Automation
            • {parameter} Auto invoice cleanup
              •  automatically cleans up the transferred invoices and voided invoices older than 180 days every day [1]
            • Use continuous learning 
              • select this option to turn on the continuous learning feature
              • learns from the corrections made by the AP clerk on a previous instance of the same invoice [1]
                • records the mapping relationship between the invoice context and the derived entities [1]
                • the entities are automatically derived for the next time a similar invoice is captured [1]
              • {missing feature} standard way to copy the continuous learning from UAT to PROD
              • {feature} migration tool for continuous learning knowledge
                • allows us to transfer the learning knowledge from one environment to another [1.6.0.x]
            • Confidence score check
              • for the prebuilt model the confidence score is always the same 
                • its value is returned by the AI Builder service
                • confidence score can only be improved only when the customer prebuilt model is used 
                •  it can be increased by uploading more samples and do the tagging accordingly
                • a low confidence score is caused by the fact that not enough samples with the same pattern have been trained
              • {parameter} control the confidence score check [1.1.0.32/10.0.39]
          • Manage file filters
          User Interface
          • Side-by-side view [by design]

          • History logs [1.0.1.0]
            • supported in Received files and Captured invoices
            • help AP clerks know the actions and results in each step during invoice processing 
          • Navigation to D365 F&O [1.0.1.0]
            • once the invoice is successfully transferred to D365 F&O, a quick link is provided for the AP clerk to open the Pending vendor invoice list in F&O
          • Reporting
            • {missing feature} consolidated overview across multiple environments (e.g. for licensing needs evaluation)
            • {missing features} metrics by Legal entity, Processing status, Vendor and/or Invoice type
              • {workaround} a paginated report can be built based on Dataverse

          Data Validation

          • [Invoice Capture] derivation rules 
            • applied to ensure that the invoices are complete and correct [1]
            • {missing feature} derive vendor using the associated email address
            • {parameter} Format purchase order 
              • used to check the number sequence settings in D365 F&O to format the PO number [1]
            • {parameter} Derive currency code for cost invoice
              • used to derive the value from invoice master data in D365 F&O [1]
              • <-- the currency code on PO Invoices must be identical to the one on PO
            • {parameter} Validate total sales tax amount 
              • validates the consistency between the sales tax amount on the Sales tax card and the total sales tax amount, when there's a sales tax line [1]
            • {parameter} Validate total amount 
              • confirm alignment between the calculated total invoice amount and the captured total amount [1]
          • [D365 F&O] before workflow submissions [requested]
            • it makes sense to have out-of-box rules 
            • currently this can be done by implementing extensions
          • [D365 F&O] during workflow execution [by design]
          Dynamics 365 for Finance [aka D365 F&O]
          • Attachments 
            • {parameter} control document type for persisting the invoice attachment in D365 F&O [1.1.0.32/10.0.39]
          • Invoice Capture
            • {parameter} select the entities in scope
            • {parameter}differentiate by Invoice type whether Vendor invoice or Invoice journal is used to book the invoices
            • {parameter} Transfer attachment
          • Fixed Assets
            • {missing feature} create Fixed asset automatically during the time the invoice is imported
          • Vendor Invoice Journal 
            • {missing feature} configure what journal the invoice is sent to
              • the system seems to pick the first journal available
              • {parameter} control journal name for creating 'Invoice journal'  [1.1.0.32/10.0.39]
            • {missing feature}grouping multiple invoices together in a journal (e.g., vendor group, payment terms, payment of method)
          •  Approval Workflow
            • {missing feature} involve Responsible person for the Vendor [requested]
            • {missing feature} involve Buyer for Cost invoices [requested]
            • {missing feature} differentiator between the various types of invoices [requested]
            • {missing feature} amount-based approval [requested]
          • Billing schedule
            • {feature} integration with Billing schedules
            • {feature} modify or cancel Billing schedules
          • Reporting
            • {missing feature} Vendor invoices hanging in the approval workflow (incl. responsible person for current action, respectively error message)
            • {missing feature} Report for GL reconciliation between Vendor invoice and GL via Billing schedules
            • {missing feature} Overview of the financial dimensions used (to identify whether further setup is needed)

          Previous post <<||>> Next post

          Resources:
          [1] Microsoft Learn (2023) Invoice capture overview (link)
          [2] Yammer (2023) Invoice Capture for Dynamics 365 Finance (link)
          [3] Microsoft (2023) Invoice Capture for Dynamics 365 Finance - Implementation Guide

          Acronyms:
          AI - Artificial Intelligence 
          AP - Accounts Payable
          F&O - Finance & Operations
          LE - Legal Entity
          PO - Purchase Order
          UoM- Unit of Measure

          25 December 2023

          💫ERP Systems: Microsoft Dynamics 365's Vendor Invoices Processing (Setup Areas)

          ERP Systems

          Invoices can be set up to be imported automatically in Dynamics 365 for Finance and Operation (D365 F&O) via Invoice Capture (see process). Besides the Invoice Capture-related set up, there are several areas related to Vendor invoice's processing:

          General Ledger

          Accounts for automatic transactions need to be defined at least for the 'Vendor invoice rounding off' and 'Rounding variance', respectively for the 'Exchange rate gain' and 'Exchange rate loss', if this wasn't done already.

          To facilitate accounts' reconciliation and reporting, one can enable a Batch transfer rule for the Source document type 'Vendor invoice' and thus one journal is used for each Vendor invoice. This also makes sure that the Description from Invoice will be taken over in the Journal, which facilitates accounts' reconciliation.

          Subscription Billing

          In case one needs to defer the amounts over a time interval (e.g. x months), as it's the case for prepayments, the integration with Subscription Billing on the Vendor side in F&O seems to work with minimal configuration. A Billing schedule is created for each Invoice distribution and can be modified or cancelled after the Invoice is posted, if needed.

          If your organization uses Subscription Billing also for Accounts Receivables (AR), one might need to compromise on the setup because the same parameters are used for both modules. 

          In what concerns accounts' reconciliation, there seems to be a 1:1 mapping between the Schedule line and the General Ledger (GL) posting, a table with the mapping between the records being available. One can build thus a Paginated report to display the mapping between AP and GL, if the infrastructure is in place (e.g. by building a data lakehouse/warehouse based on F&O, see post). 

          Fixed Assets

          One can enable the creation of Fixed assets by checking the "Create asset during product receipt or invoice posting" radio button in Fixed assets parameters.

          The feature "create the fixed asset automatically during the time of invoice import" for PO-based Invoices (with "Create Fixed Asset" flag set at line level) doesn't seem to be supported yet.

          Vendor Invoice Journals

          Before Invoice Capture, Vendor invoice journals were helpful for posting summarized cost invoices that are not associated with POs (e.g. expenses for supplies or services). One can still use this approach, however if the line-based details are important, then it makes sense to use Pending vendor invoice with Service items or Procurement categories.  

          A single Vendor invoice is created as one Vendor invoice journal. 

          Organization Administration

          When using workflows, setting the same language for all LEs can reduce the amount of redundant information maintained in the workflow(s), otherwise the texts need to be provided for each language. 

          Default descriptions can be enabled for Purchase orders' invoice ledger and vendor to carry the same description entered Pending Vendor Invoices in Invoices and Journals. The functionality works also for Cost invoices.

          Ideally, Invoice's description should have been maintained in Invoice capture and/or Microsoft should have provided a default description also for it. 

          System

          Unless there's a requirement to post manually the journals from subledger to GL, a batch transfer for subledger journals must be created for each LE.

          One can enable email notifications for the users participating in workflows and use workflow delegations for the intervals the respective users are on leave. 

          All users participating in the workflow must be available also as active employees. It makes sense to do latest when the integration goes Live. Moreover, the Users need to have the appropriate permissions for the roles they have in the process.

          Vendor Invoice Automation

          There's further functionality available under the 'Vendor Invoice Automation' label (see [1], [2]), though the following are the most important ones: 

          • Automatically apply prepayments to vendor invoices
          • Automatically submit imported invoices to the workflow system.
          • Match product receipts to pending vendor invoice lines.

          Recurring Vendor Invoice Templates

          Microsoft introduced with 10.0.38 PU a new feature called 'Vendor invoice templates', which allows creating recurring vendor invoices without the need to enter all the vendor invoice information for each separate invoice (see [3]). There seems to be no information available whether any integration between this feature and Invoice Capture will be supported.

          E-Invoicing

          EU countries need to enforce the Directive 2014/55/EU for Vendor and Customer invoices. The directive requires that the electronic exchange of invoice documents between suppliers and buyers to occur over government-held third-party solutions. Each country has its own system(s) and regulations with different scope and timelines. Some countries have already requirements for 2024, respectively 2025 and there are similar projects in US and other countries. 

          Even if Microsoft started this year (2023) to provide country-specific integrations for e-Invoicing, for the moment there seem no information available on how e-Invoicing will integrate with Invoice Capture.

          Previous post <<||>> Next post

          Resources:
          [1] Microsoft Learn (2023) Vendor Invoice Automation (link)
          [2]Microsoft Dynamics 365 (2023) The Future of Finance: Unlocking the Benefits of Accounts Payable Automation (link)
          [3] Hylke Britstra (2023) Recurring vendor invoice templates (link)
          [4] Dan Edwards (2023) AP Automation at DynamicsCon (link)

          Related Posts Plugin for WordPress, Blogger...

          About Me

          My photo
          Koeln, NRW, Germany
          IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.