SQL Troubles: SQL Server 2022


31 December 2023

💠🛠️🗒️SQL Server: Buffer Pool [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources (some unfortunately not available anymore).

Buffer Pool [aka BP, BPool]

main memory component in SQL Server [2]

⇐ preferred memory allocator for the whole server [1]
⇐ the biggest memory consumer in SQL Server

area of memory that SQL Server uses to store

cached data (aka data pages)
cached execution plans
algebrizer trees for views, constraints and defaults
lock memory
⇐ anything with a non-zero value in sys.dm_os_memory_clerks.single_pages_kb (the column was replaced by pages_kb starting with SQL Server 2012)

⇐ caching data in the buffer pool reduces the load on the I/O subsystem and improves performance [6]
⇐ all pages must be copied into the buffer pool before they can be used in a query

⇒ a number of operations need to scan the buffer pool [3]
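To see what the data cache currently holds, the buffers can be counted per database. The query below is a minimal sketch based on the sys.dm_os_buffer_descriptors DMV; it assumes the VIEW SERVER STATE permission, and the aliases are only illustrative. On servers with a lot of memory the query itself can take a while.

```sql
-- cached data pages per database (each buffer holds one 8 KB page)
SELECT CASE BFD.database_id
         WHEN 32767 THEN 'ResourceDb' -- id reserved for the resource database
         ELSE DB_NAME(BFD.database_id)
       END AS database_name
, COUNT_BIG(*) AS cached_pages
, COUNT_BIG(*) * 8 / 1024 AS cached_mb
FROM sys.dm_os_buffer_descriptors BFD
GROUP BY BFD.database_id
ORDER BY cached_pages DESC;
```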

single-page allocator

it commits and decommits memory blocks at 8KB granularity only

manages disk I/O functions for bringing data and index pages into the data cache so data can be shared among users [2]
all memory not used by another memory component remains in the buffer pool to be used as a data cache for pages read in from the database files on disk [2]

when other components require memory, they can request a buffer from the buffer pool [2]

most of the buffers taken from the buffer pool for other memory components go to other kinds of memory caches [2]

⇐ they can only get blocks of 8KB in size [1]

these blocks are not contiguous in memory [1]
allocations for large buffers will be satisfied by the memory node's multi-page allocator or by the virtual allocator [1]

⇒ memory will be allocated outside of Buffer Pool [1]

BP can be used as the underlying memory manager for SQL Server components as long as they allocate buffers of 8KB [1]

pages allocated from BP are referred to as stolen [1]

first decides how much of VAS it needs to reserve for its usage (aka target memory)

decision based on internal memory requirements and external memory state
it calculates the target amount of memory it thinks it should commit before it gets into memory pressure [1]

⇐ to keep the system from paging, the target memory is constantly recalculated [1]
{restriction} the target memory can't exceed the max memory, which is given by the max server memory setting [1]
even if the min server memory equals the max server memory BP commits its memory on demand [1]

⇐ monitor the corresponding profiler event to observe this behavior [1]
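Besides the profiler event, the target and the committed memory can also be compared via the performance counters exposed in sys.dm_os_performance_counters. This is a minimal sketch: reading 'Total Server Memory (KB)' as the memory committed so far is my interpretation, and the aliases are illustrative.

```sql
-- target vs. total (committed) server memory
SELECT RTRIM(PFC.counter_name) AS counter_name
, PFC.cntr_value / 1024 AS value_mb
FROM sys.dm_os_performance_counters PFC
WHERE PFC.object_name LIKE '%Memory Manager%'
  AND PFC.counter_name IN ('Target Server Memory (KB)', 'Total Server Memory (KB)');

-- configured limits for comparison
SELECT CNF.name
, CNF.value_in_use AS value_mb
FROM sys.configurations CNF
WHERE CNF.name IN ('min server memory (MB)', 'max server memory (MB)');
```

If the total stays below the target while the workload grows, that is consistent with the memory being committed on demand.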

reserves all of it right away

⇐ monitor SQL Server's virtual bytes in perfmon or use the vasummary view to observe this behavior
⇐ normally can't get all the needed memory in one region

⇒ several large regions reserved ← by design behavior

commits pages on demand

during startup the memory manager configures BP to be SQLOS's single page allocator

from that point on all dynamic single page allocations are provided by BP [1]

has its own memory clerk

is leveraged to respond to external and VAS memory pressure only [7]

leverages only SQLOS's Virtual and AWE interfaces

never uses any type of page allocator from SQLOS [1]

{operation} dropping buffers

likely results in some degree of performance degradation

⇐ any subsequent query executions will have to reread the data from the database files increasing I/O

via: DBCC DROPCLEANBUFFERS
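A minimal sketch of how the buffers are typically dropped when testing queries with a cold cache. DBCC DROPCLEANBUFFERS removes only clean buffers, so a CHECKPOINT is issued first to write the dirty pages of the current database to disk. Given the performance impact noted above, this is meant for test environments only.

```sql
-- test environments only: start the next query with a cold data cache
CHECKPOINT;            -- writes the dirty pages of the current database to disk
DBCC DROPCLEANBUFFERS; -- removes the clean buffers from the buffer pool
```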

{concept} page

the fundamental unit of data storage in SQL Server

{concept} buffer

page in memory that's the same size as a data or index page [2]

⇐ page frame that can hold one page from a database [2]

{concept} buffer pool scan

common internal operation that iterates through the entire buffer descriptor array to find any buffers that belong to a specific database [3]
[<SQL Server 2022] serial operation

on large memory machines operations that require scanning the buffer pool can be slow [3]

⇐ the larger the machine, the greater the impact
⇐ the size of the operation doesn't necessarily matter [3]

[SQL Server 2022] {feature} Buffer Pool Parallel Scan

parallelized by utilizing multiple cores

adds processing power to scan the buffer pool more efficiently [3]

⇐ benefits both small and large database operations on larger memory machines

10-30x improvement in executions
customers running mission-critical OLTP, hosted service providers, and data warehouse environments will witness the most improvements in overall processing speed [3]

uses one task per 8 million buffers (64 GB)

⇐ a serial scan will still be used if there are fewer than 8 million buffers [3]

uses buffer pool scan diagnostics to improve supportability and insights with new buffer pool scan events [3]

can significantly improve the performance of database workloads

operations benefiting from it

database startup/shutdown
creating a new database
file drop operations
backup/restore operations
Always On failover events
DBCC CHECKDB and DBCC CHECKTABLE
log restore operations
other internal operations (e.g., checkpoint)
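The threshold of one task per 8 million buffers mentioned above can be related to the memory of an instance with a bit of arithmetic. The sketch below assumes that the committed memory from sys.dm_os_sys_info (SQL Server 2012+) approximates the number of buffers to scan and takes "8 million" literally; both are assumptions, so the result is a rough indication and not what the engine actually decides.

```sql
-- rough estimate: does the instance reach the parallel scan threshold?
SELECT OSI.committed_kb / 8 AS committed_buffers -- 8 KB per buffer
, OSI.committed_kb / 1024.0 / 1024.0 AS committed_gb
, CASE
    WHEN OSI.committed_kb / 8 < 8000000 THEN 'serial scan'
    ELSE 'parallel scan'
  END AS expected_scan_type -- assumption based on the threshold from the notes
FROM sys.dm_os_sys_info OSI;
```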


References:
[1] Slava Oks (2005) SQLOS's memory manager and SQL Server's Buffer Pool (link) [something similar seems to be available here]
[2] Kalen Delaney (2006) Inside Microsoft® SQL Server™ 2005: The Storage Engine
[3] Microsoft SQL Server Blog (2022) Improve scalability with Buffer Pool Parallel Scan in SQL Server 2022, by David Pless (link)
[4] Microsoft Learn (2022) Operations that trigger a buffer pool scan may run slowly on large-memory computers (link)
[5] Microsoft Docs (2023) DBCC TRACEON - Trace Flags (link)
[6] Dmitri Korotkevitch (2015) Expert SQL Server: In-Memory OLTP
[7] Slava Oks (2005) SQLOS's memory manager: responding to memory pressure (link)

20 October 2023

💎🏭SQL Reloaded: Extended LTrim/RTrim in SQL Server 2022 (Before and After)

In SQL Server 2022, the LTrim (left trimming) and RTrim (right trimming) functions were extended with an optional second string parameter. When provided, the engine removes from the start (for LTrim), respectively from the end (for RTrim), of the first parameter all the characters found in the second parameter, the same way the space character char(32) was removed previously:

-- prior behavior of LTrim/RTrim
DECLARE @text as nvarchar(50) = '  123  '
SELECT '(' + LTrim(@text) + ')' LeftTrimming
, '(' + RTrim(@text) + ')' RightTrimming
, '(' + LTrim(RTrim(@text)) + ')' Trimming1 -- prior to SQL Server 2017
, '(' + Trim(@text) + ')' Trimming2 -- starting with SQL 2017

LeftTrimming	RightTrimming	Trimming1	Trimming2
(123  )	(  123)	(123)	(123)

Here's the new behavior:

-- extended behavior of LTrim/RTrim (SQL Server 2022+)
DECLARE @text as nvarchar(50) = '123abc123abc'
SELECT LTrim(@text , '123') LeftTrimming
, RTrim(@text , 'abc') RightTrimming;

LeftTrimming	RightTrimming
abc123abc	123abc123
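A quick way to check how the second parameter is interpreted is to pass the characters in a different order than they appear in the text. The sketch below uses made-up values; the characters are matched individually and repeatedly, independently of their order.

```sql
-- the second parameter acts as a set of characters (SQL Server 2022+)
DECLARE @text as nvarchar(50) = N'321123abc123';
SELECT LTrim(@text, N'123') LeftTrimming -- abc123
, RTrim(N'abc123cba', N'abc') RightTrimming; -- abc123
```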

Previously, to obtain the same result for this input, one could write something like:

-- prior solution via Left/Right for the same (SQL Server 2000+)
DECLARE @text as nvarchar(50) = '123abc123abc'
SELECT CASE WHEN Left(@text, 3) = '123' THEN Right(@text,Len(@text)-3) ELSE @text END LeftTrimming
, CASE WHEN Right(@text, 3) = 'abc' THEN Left(@text,Len(@text)-3) ELSE @text END  RightTrimming

-- prior solution via "LIKE" for the same (SQL Server 2000+)
DECLARE @text as nvarchar(50) = '123abc123abc'
SELECT CASE WHEN @text LIKE '123%' THEN Right(@text,Len(@text)-3) ELSE @text END LeftTrimming
, CASE WHEN @text LIKE '%abc' THEN Left(@text,Len(@text)-3) ELSE @text END  RightTrimming

As can be seen, the syntax is considerably simplified. However, there are only a few situations in which it is needed. In the past I had to write code to remove parentheses, quotes or similar characters:

-- removing parentheses
DECLARE @text as nvarchar(50) = '(testing)'
SELECT LTrim(@text , '(') LeftTrimming
, RTrim(@text , ')') RightTrimming
, RTrim(LTrim(Trim(@text), '('), ')') Trimming 

-- removing double quotes
DECLARE @text as nvarchar(50) = '"testing"'
SELECT LTrim(@text , '"') LeftTrimming
, RTrim(@text , '"') RightTrimming
, RTrim(LTrim(Trim(@text), '"'), '"') Trimming

The Trim for the 3rd value in both queries was used to remove any spaces before and after the characters to be removed:

-- removing parentheses with leading/trailing spaces
SELECT RTrim(LTrim(Trim('   (testing)   '), '('), ')');

Then I thought, maybe I could use the same to remove the tags from an XML element. I tried the following code and unfortunately it doesn't seem to work:

-- attempting to remove the start/end tags from xml elements
DECLARE @text as nvarchar(50) = '<string>testing</string>'
SELECT LTrim(@text , '<string>') LeftTrimming
, RTrim(@text , '</string>') RightTrimming
, RTrim(LTrim(Trim(@text), '<string>'), '</string>') Trimming

LeftTrimming	RightTrimming	Trimming
esting</string>	<string>te	e

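That's quite an unpleasant surprise! The second parameter is treated as a set of characters and not as a prefix or suffix, so all the leading, respectively trailing, characters found in the set are removed, including the "t", "s", "i", "n" and "g" from "testing". Instead, the value can be declared as XML and the following code used to obtain the needed result: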

-- extracting the value from a tag element
DECLARE @text XML = '<string>testing</string>'
SELECT @text.query('data(/string)') as value
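If the value has to stay a string, the tags can be removed as an exact prefix and suffix rather than as sets of characters. The following is a minimal sketch in the spirit of the earlier Left/Right workaround; the @prefix and @suffix variables are introduced only for illustration.

```sql
-- removing an exact prefix/suffix from a string
DECLARE @text as nvarchar(50) = N'<string>testing</string>';
DECLARE @prefix as nvarchar(20) = N'<string>';
DECLARE @suffix as nvarchar(20) = N'</string>';

SELECT CASE
    WHEN Len(@text) >= Len(@prefix) + Len(@suffix)
     AND Left(@text, Len(@prefix)) = @prefix
     AND Right(@text, Len(@suffix)) = @suffix
    THEN Substring(@text, Len(@prefix) + 1, Len(@text) - Len(@prefix) - Len(@suffix))
    ELSE @text -- leave the value untouched when the tags don't match
  END Trimming; -- testing
```

Unlike the XML approach, this doesn't validate the content, it only strips the two literals when both are present.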

Notes:

The queries work also in SQL databases in Microsoft Fabric.

Happy coding!

25 January 2023

💎SQL Reloaded: Documenting External Tables

When using a serverless SQL pool, CETAS (aka Create External Table as Select) are the main mechanism for making data from the Data Lake available for queries. When they need to be documented, the sys.external_tables catalog view can be used to export their metadata, much like sys.tables or sys.views can be used for tables and views:

-- CETAS metadata
SELECT TOP (50) ext.object_id
, ext.name
, schema_name(ext.schema_id) [schema_name]
, ext.type_desc
, ext.location
, ext.data_source_id
, ext.file_format_id
, ext.max_column_id_used
, ext.uses_ansi_nulls
, ext.create_date
, ext.modify_date
FROM sys.external_tables ext
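For documentation purposes the columns are usually needed as well. The sketch below joins sys.external_tables with sys.columns and sys.types; it assumes that both catalog views are accessible in the serverless SQL pool the same way as in SQL Server, and the aliases are illustrative.

```sql
-- CETAS columns' metadata
SELECT schema_name(ext.schema_id) [schema_name]
, ext.name [table_name]
, col.column_id
, col.name [column_name]
, typ.name [data_type]
, col.max_length
, col.precision
, col.scale
, col.is_nullable
FROM sys.external_tables ext
     JOIN sys.columns col
       ON ext.object_id = col.object_id
     JOIN sys.types typ
       ON col.user_type_id = typ.user_type_id
ORDER BY [schema_name]
, [table_name]
, col.column_id;
```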

It's interesting that CETAS have type_desc = 'USER_TABLE' in sys.all_objects, the same as user-defined tables in SQL Server:

-- CETAS' metadata via sys.all_objects
SELECT *
FROM sys.all_objects
WHERE object_id = object_id('<schema_name>.<CETAS name>')

The data source and file format can be retrieved via the sys.external_data_sources and sys.external_file_formats catalog views. Moreover, it's useful to encapsulate the logic in a view, like the one below:

-- drop view
--DROP VIEW IF EXISTS dbo.vAdminExternalTables

-- create view
CREATE VIEW dbo.vAdminExternalTables
AS
-- external tables - metadata 
SELECT ext.object_id
, sch.name + '.' + ext.name [unique_identifier]
, sch.name [schema]
, ext.name [object]
, ext.type_desc [type]
, ext.max_column_id_used 
, ext.location
, eds.name data_source 
, eff.name file_format 
, ext.create_date 
, ext.modify_date
FROM sys.external_tables ext
     JOIN sys.schemas sch
       ON ext.schema_id = sch.schema_id 
     JOIN sys.external_data_sources eds 
       ON ext.data_source_id = eds.data_source_id
     JOIN sys.external_file_formats eff
       ON ext.file_format_id = eff.file_format_id

-- testing the view
SELECT top 10 *
FROM dbo.vAdminExternalTables

The view can then be used for further queries, for example to check the CETAS created or modified starting with a given date:

-- external tables created or modified since a certain date
SELECT *
FROM dbo.vAdminExternalTables ext
WHERE ext.create_date >= '20230101'
  OR ext.modify_date >= '20230101';

Or, when the CETAS are deployed from one environment to another, one can compare the datasets returned by the same view between environments, as in the query below:

-- comparison external tables metadata between two databases
SELECT *
FROM (
    SELECT *
    FROM <prod_database>.dbo.vAdminExternalTables
    WHERE [schema] = '<schema_name>'
    ) PRD
    FULL OUTER JOIN (
    SELECT *
    FROM <test_database>.dbo.vAdminExternalTables
    WHERE [schema] = '<schema_name>'
    ) UAT
    ON PRD.[unique_identifier] = UAT.[unique_identifier]
-- WHERE PRD.[unique_identifier] IS NULL OR UAT.[unique_identifier] IS NULL
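When only the differences matter, the same comparison can be sketched with EXCEPT, which returns the CETAS whose definition in one database has no identical counterpart in the other. The column list is my choice; location or data source may legitimately differ between environments, in which case they should be left out.

```sql
-- CETAS from test without an identical definition in prod
SELECT [unique_identifier], location, data_source, file_format
FROM <test_database>.dbo.vAdminExternalTables
WHERE [schema] = '<schema_name>'
EXCEPT
SELECT [unique_identifier], location, data_source, file_format
FROM <prod_database>.dbo.vAdminExternalTables
WHERE [schema] = '<schema_name>';
```

Swapping the two SELECT statements gives the opposite direction of the comparison.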

The definitions for multiple CETAS can be exported from the source database in one step via the Object Explorer Details >> Tables >> External tables >> (select CETAS) >> Script Table as >> ... .

Happy coding!


20 January 2023

💎SQL Reloaded: Monitoring the Synapse serverless SQL pool with Dynamic Management Views II

Identifying the SQL Server DMVs which are accessible for the serverless SQL pool (see previous post) allowed me to identify, besides sys.dm_exec_requests_history, three more DMVs with statistics on the statements executed on the server: sys.dm_request_phases, sys.dm_request_phases_task_group_stats and sys.dm_request_phases_exec_task_stats. Unfortunately, there seems to be no documentation available on these DMVs and, at the time the post was written, searches on google.com and bing.com returned no further hits either.

sys.dm_request_phases

sys.dm_request_phases provides insights into the phases a statement's execution goes through, and seems to summarize the other two views:

-- Azure Serverless SQL pool: request phases
SELECT TOP (100) dist_statement_id
, RPH.dist_request_id
, TRY_CAST(RPH.id as bigint) id
, TRY_CAST(RPH.parent_ids as bigint) parent_ids
, RPH.start_time
, RPH.end_time
--, RPH.total_elapsed_time_ms
--, RPH.total_elapsed_time_ms/1000.0 total_elapsed_time_sec
--, RPH.min_time_ms
--, RPH.min_time_ms/1000.0 min_time_sec
--, RPH.max_time_ms
--, RPH.max_time_ms/1000.0 max_time_sec
--, RPH.avg_time_ms
, RPH.avg_time_ms/1000.0 avg_time_sec
--, RPH.stdev_time_ms
--, RPH.stdev_time_ms/1000.0 stdev_time_sec -- it has no values
--, RPH.min_rows
--, RPH.max_rows
--, RPH.avg_rows
--, RPH.stdev_rows -- it has no values
, RPH.total_rows
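--, RPH.total_bytes_processed
, RPH.total_bytes_processed/1024.0 total_kb_processed -- 1 KB = 1024 bytes (the snapshot below was taken with 1028 as divisor, hence its values are slightly lower)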
, RPH.state_desc
, RPH.operation_type
, RPH.input_dop
, RPH.output_dop
, RPH.task_retries
, RPH.error_id
FROM sys.dm_request_phases RPH
ORDER BY Id

dist_statement_id	dist_request_id	id	parent_ids	start_time	end_time	avg_time_sec	total_rows	total_kb_processed	state_desc	operation_type	input_dop	output_dop	task_retries	error_id
8C4386DC...	820E9FC6...	1	2	...09:58:34.213	...09:58:36.337	1.031	2030	2343.310311	succeeded	Shuffle	1	1	0	0
8C4386DC...	820E9FC6...	2	0	...09:58:36.447	...09:58:39.713	1.891	9	7145.193579	succeeded	Return	1	1	0	0
C9524971...	680DCB55...	3	4	...10:05:46.747	...10:05:47.057	0.203	2030	2343.310311	succeeded	Shuffle	1	1	0	0
C9524971...	680DCB55...	4	0	...10:05:47.057	...10:05:48.480	1.406	0	6630.101167	succeeded	Return	1	1	0	0
FD2D17AD...	C9453EF2...	5	6	...11:58:54.060	...11:58:55.297	0.547	10	1534.098249	succeeded	ComputeToControlNode	1	1	0	0
FD2D17AD...	C9453EF2...	6	0	...11:58:55.297	...11:58:55.420	0.125	10	4.074902	succeeded	Return	1	1	0	0
9FB0A268...	CAA533DE...	7	8	...11:59:16.483	...11:59:16.700	0.203	2030	2343.310311	succeeded	Shuffle	1	1	0	0
9FB0A268...	CAA533DE...	8	0	...11:59:16.700	...11:59:18.640	1.922	6	7143.673151	succeeded	Return	1	1	0	0
1732AB0D...	AC1A4F10...	9	10	...11:59:25.950	...11:59:26.140	0.172	2030	2343.310311	succeeded	Shuffle	1	1	0	0
1732AB0D...	AC1A4F10...	10	0	...11:59:26.140	...11:59:27.450	1.297	9	6635.185797	succeeded	Return	1	1	0	0

Notes:
1) The foreign keys and dates (in the above and below queries) were truncated to accommodate all the important attributes in the snapshot of the values returned.
2) Based on the existing queries, there are two records for each executed statement, a Shuffle or ComputeToControlNode followed by a Return (see operation_type). In more complex scenarios there are several Shuffles and Broadcasts, followed by a Return. According to the Microsoft team, even if for serverless SQL pools there's no Data Movement Service (DMS), there's a similar algorithm responsible for moving the data between the nodes.
3) Because in serverless SQL pool each query has its own distributed statement id, the min, max, avg and total values will have the same values across the columns. Therefore, the columns with redundant values were commented out.
4) The Id of the request phase seems to have numeric values despite being defined as alphanumeric. I tried to cast the values to bigint for sorting purposes.

sys.dm_request_phases_task_group_stats

sys.dm_request_phases_task_group_stats stores metadata about the breakdown of requests at task group level:

-- Azure Serverless SQL pool: request phases breakdown at task group
SELECT TOP (100) RPT.dist_request_id
, TRY_CAST(RPT.id as bigint) id
, TRY_CAST(RPT.parent_ids as bigint) parent_ids
, RPT.dist_statement_id
, RPT.state_desc
, RPT.start_time
, RPT.end_time
, RPT.input_dop
, RPT.output_dop
, RPT.operation_type
, RPT.task_retries
FROM sys.dm_request_phases_task_group_stats RPT
ORDER BY id

dist_request_id	id	parent_ids	dist_statement_id	state_desc	start_time	end_time	input_dop	output_dop	operation_type	task_retries
820E9FC6...	1	2	8C4386DC...	succeeded	638098055142132551	638098055163382693	1	1	Shuffle	0
820E9FC6...	2	0	8C4386DC...	succeeded	638098055164476163	638098055197133001	1	1	Return	0
680DCB55...	3	4	C9524971...	succeeded	638098059467450021	638098059470574953	1	1	Shuffle	0
680DCB55...	4	0	C9524971...	succeeded	638098059470574953	638098059484793682	1	1	Return	0
C9453EF2...	5	6	FD2D17AD...	succeeded	638098127340607112	638098127352951067	1	1	ComputeToControlNode	0
C9453EF2...	6	0	FD2D17AD...	succeeded	638098127352951067	638098127354202970	1	1	Return	0
CAA533DE...	7	8	9FB0A268...	succeeded	638098127564826084	638098127567013504	1	1	Shuffle	0
CAA533DE...	8	0	9FB0A268...	succeeded	638098127567013504	638098127586388549	1	1	Return	0
AC1A4F10...	9	10	1732AB0D...	succeeded	638098127659513620	638098127661388514	1	1	Shuffle	0
AC1A4F10...	10	0	1732AB0D...	succeeded	638098127661388514	638098127674513601	1	1	Return	0

Notes:
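1) The DMV seems to return the same number of records as sys.dm_request_phases.
2) Observe the format of start_time and end_time: the values were not translated into a SQL Server date/time data type. They look like .NET ticks (100-nanosecond intervals since 0001-01-01), as 638098055142132551 corresponds to the ...09:58:34.213 shown by sys.dm_request_phases for the same phase.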
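The start_time and end_time values of this DMV can be made readable under the assumption that they are .NET ticks, i.e. 100-nanosecond intervals counted from 0001-01-01. This is not documented, it's only what the values suggest. Because DATEADD takes an integer, the sketch below splits the value into days, seconds and the remaining fraction; the variable names are illustrative.

```sql
-- converting ticks into datetime2 (assumption: .NET ticks since 0001-01-01)
DECLARE @ticks bigint = 638098055142132551; -- first start_time from the above snapshot
DECLARE @ticks_per_day bigint = 864000000000; -- 24 * 60 * 60 * 10,000,000

SELECT DATEADD(NANOSECOND, CAST((@ticks % 10000000) * 100 AS int) -- fraction of the second
     , DATEADD(SECOND, CAST((@ticks % @ticks_per_day) / 10000000 AS int) -- seconds within the day
     , DATEADD(DAY, CAST(@ticks / @ticks_per_day AS int) -- whole days
     , CAST('0001-01-01' AS datetime2(7))))) AS start_time; -- 2023-01-20 09:58:34.2132551
```

The result matches the start_time returned by sys.dm_request_phases for the same phase, which supports the assumption.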

sys.dm_request_phases_exec_task_stats

sys.dm_request_phases_exec_task_stats stores metadata about the breakdown of requests at task level:

-- Azure Serverless SQL pool: request phases breakdown at task
SELECT TOP (100) RPE.dist_request_id
, TRY_CAST(RPE.id as bigint) id
--, RPE.min_time_ms
--, RPE.max_time_ms
, RPE.avg_time_ms/1000.0 avg_time_sec
--, RPE.stdev_time_ms
, RPE.total_bytes_processed/1024.0 total_kb_processed -- 1 KB = 1024 bytes (the snapshot below was taken with 1028 as divisor)
--, RPE.min_rows
--, RPE.max_rows
--, RPE.avg_rows
--, RPE.stdev_rows
, RPE.total_rows
, RPE.error_id
FROM sys.dm_request_phases_exec_task_stats RPE
ORDER BY id

dist_request_id	id	avg_time_sec	total_kb_processed	total_rows	error_id
820E9FC6...	1	1.031	2343.310311	2030	0
820E9FC6...	2	1.891	7145.193579	9	0
680DCB55...	3	0.203	2343.310311	2030	0
680DCB55...	4	1.406	6630.101167	0	0
C9453EF2...	5	0.547	1534.098249	10	0
C9453EF2...	6	0.125	4.074902	10	0
CAA533DE...	7	0.203	2343.310311	2030	0
CAA533DE...	8	1.922	7143.673151	6	0
AC1A4F10...	9	0.172	2343.310311	2030	0
AC1A4F10...	10	1.297	6635.185797	9	0

What does all this mean?

The lack of documentation makes it challenging to interpret the values of the views beyond the data and metadata they offer. In a paper on POLARIS, the code name given to the serverless SQL pool engine, a task is defined as "a careful packaging of data and query processing into units [...] that can be readily moved across compute nodes and re-started at the task level" [1]. Therefore, one can assume that this is the level targeted by the sys.dm_request_phases_exec_task_stats DMV. Further on, the tasks are grouped at phase level according to sys.dm_request_phases_task_group_stats, the metadata from the two DMVs being further combined into the sys.dm_request_phases DMV.

If the meaning is kept from dedicated SQL pools, a Shuffle operation indicates that data is moved between the frontend and backend nodes to satisfy a request, while a Return represents the operation of returning the result set to the client. The "ComputeToControlNode" operation involves a simple select (e.g. SELECT TOP 10) from a CETAS and therefore no "Shuffle" is needed.

Requests' history

Further on, one can use the distributed statement id to join the request phases with the request history; however, matches will be found only for a small subset of the records (probably the executions since the pool started):

-- Azure Serverless SQL pool: requests history with request phase info
SELECT top 100 ERH.status
, ERH.transaction_Id
, ERH.distributed_statement_Id 
, ERH.query_hash 
--, ERH.login_name 
, ERH.start_time
, ERH.end_time 
, ERH.command 
, ERH.query_text 
--, ERH.total_elapsed_time_ms
, ERH.total_elapsed_time_ms/1000.0 total_elapsed_time_sec
, ERH.data_processed_mb
, RPH.avg_time_ms/1000.0 avg_time_sec
, RPH.total_rows
, RPH.total_bytes_processed/1024.0/1024.0 total_mb_processed -- 1 MB = 1024 * 1024 bytes (the snapshot below was taken with 1028 as divisor, hence its values are slightly lower)
, RPH.state_desc
, RPH.operation_type
, RPH.input_dop
, RPH.output_dop
, RPH.task_retries
, RPH.error_id
, ERH.error
, ERH.error_code 
FROM sys.dm_exec_requests_history ERH
     JOIN sys.dm_request_phases RPH
       ON ERH.distributed_statement_Id = RPH.dist_statement_id
      --AND RPH.parent_ids = 0 -- only the parent
ORDER BY TRY_CAST(RPH.id as bigint) DESC -- the id is alphanumeric, cast for numeric sorting

Here's a subset of the result set focusing only on the statistical values:

distr_statement_Id	start_time	end_time	total_elapsed_time_sec	data_processed_mb	avg_time_sec	total_rows	total_mb_processed	operation_type	id	parent_ids
{8C4386D...	...8:24.4300000	...8:39.8266666	15.396	10	1.031	2030	2.279484738326	Shuffle	1	2
{8C4386D...	...8:24.4300000	...8:39.8266666	15.396	10	1.891	9	6.950577411478	Return	2	0
{C952497...	...5:45.2100000	...5:48.4933333	3.283	10	0.203	2030	2.279484738326	Shuffle	3	4
{C952497...	...5:45.2100000	...5:48.4933333	3.283	10	1.406	0	6.449514753891	Return	4	0
{FD2D17A...	...8:52.1400000	...8:55.4166666	3.276	10	0.547	10	1.492313471789	ComputeToControlNode	5	6
{FD2D17A...	...8:52.1400000	...8:55.4166666	3.276	10	0.125	10	0.003963912451	Return	6	0
{9FB0A26...	...9:15.1300000	...9:18.6366666	3.506	10	0.203	2030	2.279484738326	Shuffle	7	8
{9FB0A26...	...9:15.1300000	...9:18.6366666	3.506	10	1.922	6	6.949098395914	Return	8	0
{1732AB0...	...9:24.6900000	...9:27.4500000	2.76	10	0.172	2030	2.279484738326	Shuffle	9	10
{1732AB0...	...9:24.6900000	...9:27.4500000	2.76	10	1.297	9	6.454460892023	Return	10	0

Notes:
As can be seen, the volume of data processed and the elapsed time values don't match between the two tables, though they are close. The differences probably result from further steps occurring in the process.
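To compare the two sources at the same level, the phases can first be aggregated per statement and only then joined with the history. The sketch below uses only the columns from the queries above. Because the DMVs are undocumented, the data types are assumptions: total_bytes_processed is taken as numeric and start_time/end_time as date/time values. The aliases are illustrative.

```sql
-- Azure Serverless SQL pool: request history vs. phases aggregated per statement
SELECT ERH.distributed_statement_Id
, ERH.total_elapsed_time_ms/1000.0 total_elapsed_time_sec
, ERH.data_processed_mb
, RPH.phases
, RPH.phases_elapsed_sec
, RPH.total_mb_processed
FROM sys.dm_exec_requests_history ERH
     JOIN ( -- one record per statement
        SELECT dist_statement_id
        , COUNT(*) phases
        , DATEDIFF(millisecond, MIN(start_time), MAX(end_time))/1000.0 phases_elapsed_sec
        , SUM(total_bytes_processed)/1024.0/1024.0 total_mb_processed
        FROM sys.dm_request_phases
        GROUP BY dist_statement_id
     ) RPH
       ON ERH.distributed_statement_Id = RPH.dist_statement_id
ORDER BY ERH.start_time DESC;
```

With one record per statement, the remaining gap between the two elapsed times hints at how much time is spent outside the recorded phases.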

Happy coding!


References:
[1] Josep Aguilar-Saborit, Raghu Ramakrishnan et al, "POLARIS: The Distributed SQL Engine in Azure Synapse", VLDB Conferences. PVLDB, 13(12): 3204 – 3216, 2020, DOI: https://doi.org/10.14778/3415478.3415545

30 October 2022

💎SQL Reloaded: The WINDOW Clause in SQL Server 2022 (Part III: Ranking) 🆕

In two previous posts I showed how to use the newly introduced WINDOW clause in SQL Server 2022 for simple aggregations, respectively running totals, by providing some historical context on what it took to implement simple aggregations such as SUM or AVG in previous versions of SQL Server. Let's look at another scenario based on the previously created Sales.vSalesOrders view: ranking records within a partition.

There are 4 ranking functions that work across partitions: Row_Number, Rank, Dense_Rank and NTile. However, in SQL Server 2000 only Row_Number could be easily implemented, and this only if there was a unique identifier (otherwise one had to be created on the fly):

-- ranking based on correlated subquery (SQL Server 2000+)
SELECT SOL.SalesOrderId 
, SOL.ProductId
, SOL.OrderDate
, SOL.[Year]
, SOL.[Month]
, SOL.OrderQty
, (-- correlated subquery
  SELECT count(SRT.SalesOrderId)
  FROM Sales.vSalesOrders SRT
  WHERE SRT.ProductId = SOL.ProductId 
    AND SRT.[Year] = SOL.[Year]
	AND SRT.[Month] = SOL.[Month]
    AND SRT.SalesOrderId <= SOL.SalesOrderId
   ) RowNumberByDate
FROM Sales.vSalesOrders SOL
WHERE SOL.ProductId IN (745)
  AND SOL.[Year] = 2012
  AND SOL.[Month] BETWEEN 1 AND 3
ORDER BY SOL.[Year]
, SOL.[Month]
, SOL.OrderDate ASC

As an alternative for implementing the other ranking functions, one could use procedural code for looping, though this approach was not recommended given the performance concerns.

SQL Server 2005 introduced all 4 ranking functions, as they are still in use today:

-- ranking functions (SQL Server 2005+)
SELECT SOL.SalesOrderId 
, SOL.ProductId
, SOL.OrderDate
, SOL.[Year]
, SOL.[Month]
, SOL.OrderQty
-- rankings
, Row_Number() OVER (PARTITION BY SOL.ProductId, SOL.[Year], SOL.[Month] ORDER BY SOL.OrderQty DESC) RowNumberQty
, Rank() OVER (PARTITION BY SOL.ProductId, SOL.[Year], SOL.[Month] ORDER BY SOL.OrderQty DESC) AS RankQty
, Dense_Rank() OVER (PARTITION BY SOL.ProductId, SOL.[Year], SOL.[Month] ORDER BY SOL.OrderQty DESC) AS DenseRankQty
, NTile(4) OVER (PARTITION BY SOL.ProductId, SOL.[Year], SOL.[Month] ORDER BY SOL.OrderQty DESC) AS NTileQty
FROM Sales.vSalesOrders SOL
WHERE SOL.ProductId IN (745)
  AND SOL.[Year] = 2012
  AND SOL.[Month] BETWEEN 1 AND 3
ORDER BY SOL.[Year]
, SOL.[Month]
, SOL.OrderQty DESC

Now, in SQL Server 2022 the WINDOW clause allows simplifying the query as follows by defining the partition only once:

-- ranking functions (SQL Server 2022+)
SELECT SOL.SalesOrderId 
, SOL.ProductId
, SOL.OrderDate
, SOL.[Year]
, SOL.[Month]
, SOL.OrderQty
-- rankings
, Row_Number() OVER SalesByMonth AS RowNumberQty
, Rank() OVER SalesByMonth AS RankQty
, Dense_Rank() OVER SalesByMonth AS DenseRankQty
, NTile(4) OVER SalesByMonth AS NTileQty
FROM Sales.vSalesOrders SOL
WHERE SOL.ProductId IN (745)
  AND SOL.[Year] = 2012
  AND SOL.[Month] BETWEEN 1 AND 3
WINDOW SalesByMonth AS (PARTITION BY SOL.ProductId, SOL.[Year], SOL.[Month] ORDER BY SOL.OrderQty DESC)
ORDER BY SOL.[Year]
, SOL.[Month]
, SOL.OrderQty DESC

Forward (and backward) referencing of one window into the other can be used with ranking functions as well:

-- ranking functions with ascending/descending sorting (SQL Server 2022+)
SELECT SOL.SalesOrderId 
, SOL.ProductId
, SOL.OrderDate
, SOL.[Year]
, SOL.[Month]
, SOL.OrderQty
-- rankings (descending)
, Row_Number() OVER SalesByMonthSortedDESC AS DescRowNumberQty
, Rank() OVER SalesByMonthSortedDESC AS DescRankQty
, Dense_Rank() OVER SalesByMonthSortedDESC AS DescDenseRankQty
, NTile(4) OVER SalesByMonthSortedDESC AS DescNTileQty
-- rankings (ascending)
, Row_Number() OVER SalesByMonthSortedASC AS AscRowNumberQty
, Rank() OVER SalesByMonthSortedASC AS AscRankQty
, Dense_Rank() OVER SalesByMonthSortedASC AS AscDenseRankQty
, NTile(4) OVER SalesByMonthSortedASC AS AscNTileQty
FROM Sales.vSalesOrders SOL
WHERE SOL.ProductId IN (745)
  AND SOL.[Year] = 2012
  AND SOL.[Month] BETWEEN 1 AND 3
WINDOW SalesByMonth AS (PARTITION BY SOL.ProductId, SOL.[Year], SOL.[Month])
, SalesByMonthSortedDESC AS (SalesByMonth ORDER BY SOL.OrderQty DESC)
, SalesByMonthSortedASC AS (SalesByMonth ORDER BY SOL.OrderQty ASC)
ORDER BY SOL.[Year]
, SOL.[Month]
, SOL.OrderQty DESC
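A named window can also be used inside a common table expression, for example to keep only the top records of each partition. The query below is a minimal sketch built on the same Sales.vSalesOrders view; the limit of 3 records is an arbitrary choice.

```sql
-- top 3 orders by quantity per product and month (SQL Server 2022+)
WITH CTE
AS (
  SELECT SOL.SalesOrderId 
  , SOL.ProductId
  , SOL.OrderDate
  , SOL.[Year]
  , SOL.[Month]
  , SOL.OrderQty
  , Row_Number() OVER SalesByMonth AS RowNumberQty
  FROM Sales.vSalesOrders SOL
  WHERE SOL.ProductId IN (745)
    AND SOL.[Year] = 2012
    AND SOL.[Month] BETWEEN 1 AND 3
  WINDOW SalesByMonth AS (PARTITION BY SOL.ProductId, SOL.[Year], SOL.[Month] ORDER BY SOL.OrderQty DESC)
)
SELECT *
FROM CTE
WHERE RowNumberQty <= 3
ORDER BY [Year]
, [Month]
, RowNumberQty;
```

Replacing Row_Number with Rank or Dense_Rank would keep the ties, at the price of possibly returning more than 3 records per partition.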

Happy coding!
