SQL Troubles

14 December 2010

💎🏭SQL Reloaded: Pulling the Strings of SQL Server III: (Concatenation)

Typically, a database in general, and a table in particular, that follows the normalization rules, is designed to have the columns contain the smallest semantic chunks of data, it could be a Street, a Zip Code, City, a person’s First or Last Name, but also a large chunk of text like a Description or a Comment. No matter how well designed is a database, there will always be the need to do various operations with strings, typically concatenation, extraction of subpart of a string, insertion or deletion of characters, rearangement of string’s characters, trimming, splitting it in substrings, or of getting various numeric values: length, position of a given text, number of not empty characters, on whether the text represents a valid numeric or date values, etc. In the following posts I will attempt to address the respective operations in the context of select statements, and let’s start with concatenation.

Concatenation is the operation of joining two or more string values within an expression. In SQL Server the “+” operator is used for concatenation, and it could be used to concatenate two or more members. In order to concatenate two members from which one of them is a string, the second term needs to be explicitly converted to a string data type, otherwise an error will occur. For readability or post-processing purposes, the strings are concatenated using a delimiter in order to delimit the boundaries of the initial value, it could be used a space, a comma, a semicolon, a pipe, a tab or any other character that could be used typically in delimiting columns.

-- concatenation of strings 
SELECT 'a string' + ' and ' + 'a second string' Example1  
, 'a string' + ',' + 'a second string' Example2  
, '1245' + '67890' Example3  
, '1245' + '.' + '67890' Example4

The concatenation of string variables or columns functions based on the same principles:

-- concatenating string variables 
DECLARE @string1 varchar(50) 
DECLARE @string2 varchar(50) 
DECLARE @string3 varchar(50) 

SET @string1 = 'this is a string'  
SET @string2 = 'this is another string'  

SELECT @string1 + ' and ' + @string2 Example1 
, @string1 + char(31) + @string2 Example2 
, @string1 + ', ' + @string2 Example3 
, @string1 + ' ' + @string3 Example4 
, @string1 + IsNull(@string3, '!') Example5

Here’s another example based on the concatenation of columns coming from two joined tables from AdventureWorks database:

-- concatenating columns of joined tables 
SELECT PAD.AddressID 
, IsNull(PAD.AddressLine1, '')  
+ IsNull(', ' + PAD.AddressLine2, '') 
+ IsNull(', ' + PAD.City, '') 
+ IsNull(', ' + PAD.PostalCode, '') 
+ IsNull(', ' + PSP.Name, '') Address 
FROM Person.Address PAD 
      JOIN Person.StateProvince PSP 
         ON PAD.StateProvinceID = PSP.StateProvinceID

As stressed in the previous post, the NULL values need to be adequately handled either by initializing values or by using the IsNull or COALESCE functions. The concatenation of strings combined with IsNull function could be used creatively in order to add a comma only when a value is not null, as in the above example.

There are scenarios in which is needed to concatenate the values belonging to the same column but from different records, for example concatenating the email values in order to send a single email to all the users in one single action. Before the introduction of common table expressions (CTE), wasn’t possible to concatenate the string values belonging to different records, at least not in a query, this functionality being achieved by using cursors or loops, or simply performed on client or intermediate layers. As I already gave an example on how to use cursor in order to loop through the values of a table and concatenate them (see “Cursors and Lists” post), I will focus on the use of loops and simple CTEs.

Loops are one of the basic functionality in programming languages, no matter of their complexity or type. Either if are talking about WHILE, FOR, foreach or do … until loops, the principles are the same: perform a list of actions until one or more conditions are met. In this case the actions performed is reduced to a set of concatenations based on the letters of the (English) alphabet:

-- concatenation within a loop 
DECLARE @list varchar(max) 
DECLARE @index int  
SET @list = '' 
SET @index = ASCII('a') WHILE (@index<ASCII('z')) 
BEGIN 
     SET @list = @list + ', ' + Char(@index) 
     SET @index = @index + 1 
END 
SELECT @list Result

There is more work that needs to be performed in order to remove the leading comma from the output, but that’s a topic for the next post, when discussing about decomposition of strings.

CTEs are a little more complex to use than the loops, though the concatenation could be achieved across records and this in one query and not in procedural language as in the above example. In order delimit the two components of a CTE, I made use of a second CTE which simulates the existence of a given table:

-- concatenation across records 
;WITH Data(Column1, Ranking)  
AS  
( -- preparing test data 
       SELECT 'A' Column1, 0 Ranking  
       UNION ALL  
       SELECT 'B' Column1, 1 Ranking  
       UNION ALL  
       SELECT 'C' Column1, 2 Ranking 
) 
, Result(Column1, Ranking)  
AS 
(  -- performing the actual concatenation 
      SELECT Cast(Column1 as nvarchar(max)) Column1 , Ranking

      FROM Data

      WHERE Ranking = 0  
      UNION ALL 
      SELECT Cast(B.Column1 + ',' + A.Column1 as nvarchar(max)) Column1 , A.Ranking  
      FROM Data A  
         JOIN Result  B  
             ON A.Ranking - 1 = B.Ranking  
) SELECT Column1  
FROM Result  
WHERE Ranking IN (SELECT MAX(Ranking) FROM Result)

The logic for doing a simple concatenation seems maybe complicated, though the volume of work is not so big if we ignore the first CTE. On the other side I introduced an “index” within the Ranking column, fact that allows processing easier the records. When dealing with the records coming from a table it’s probably much easier to use one of the ranking functions that suits best.

Notes:
The queries work also in SQL databases in Microsoft Fabric

Happy coding!

11 December 2010

💎🏭SQL Reloaded: Pulling the Strings of SQL Server II (Creation and Selection)

It doesn’t make sense to talk about the creation of string data types without talking about their “selection” as people typically want to see also examples at work, and not only learn about theoretical facts. I’m saying that because often programming seems to be like putting together the pieces of a puzzle without knowing how the final image would look like. Anyway, coming back to the topic, what I find great about SQL Server is that it allows to “create” and select a string with a minimum overhead within a simple select statement:

-- selecting a string value 
SELECT 'this is a string' -- string 1 
, 'and this is a second string' --string2 

-- selecting a string value (named columns)  
SELECT 'this is a string' String1  
, 'and this is a second string' String2

Pretty simple, isn’t it? This kind of scripts could be useful especially when debugging logic or testing an expression (e.g. functions at work), the advantage residing in the fact that is not necessary to built a table in order to test simple things. When performing multiple operations with the same values could be handy to store the values in declared variables:

-- selecting string variables 
DECLARE @string1 varchar(50)

DECLARE @string2 char(50) 
DECLARE @string3 char(50) = 'this is a string'  

SET @string1 = 'this is a string'  

SELECT @string1 String1 , @string2 String2 
, @string3 String3

As can be seen a not initialized variable has by default the value NULL, fact that could lead to unexpected behavior problems when used in expressions involving multiple variables. Therefore it’s recommended to initialize the values with the empty string or at least to handle the nulls in expressions. Starting with SQL Server 2008 it’s possible to declare and initialize variables within the same statement, so the above code won’t work in previous versions, unless the third declaration is broken into two pieces as per definition of first string.

It looks like that’s all, at least in what concerns the direct declaration of a string, be it implicit or explicit. However a basic introduction into strings’ creation is incomplete without mentioning the creation of strings from other data types and the various functions that could be used for this purpose. Ignoring the Cast and Convert functions used to explicitly convert the other data types to string data types, there are several other functions for this purpose, namely Char, Space and Replicate.

Probably some of you are already familiar with the ASCII (American Standard Code for Information Interchange) character-encoding scheme (vs. binary encoding) used to encode files. ASCII code represents a mapping between number and characters, SQL Server supporting the transformation between two sets through the ASCII and Char functions. If ASCII translates a non-unicode character into an integer, the Char function translates an integer value into a non-unicode character. The integer values range between 0 and 255, they encompassing the 0-9 digits, the characters of English together with the diacritics of other important languages, punctuation signs and several graphical characters. The mapping between the two sets is unique, and as can be seen from the below example based on a common table expression, the functions are inverse:

-- ASCII character values 
WITH CTE 
AS (    
    SELECT 0 [Index]  
    UNION ALL 
    SELECT [Index]+1 [Index]  
    FROM CTE 
    WHERE [Index]<255 
) SELECT [Index] 
, CHAR([Index]) [Character] 
, ASCII(CHAR([Index])) [ASCII] 
FROM CTE 
OPTION (MAXRECURSION 0)

There is not much to say about the Space and Replicate functions, the Space function returns a string of repeated spaces, while the Replicate function forms a string as a repeated sequence of values. The definition of the Space function could be considered as redundant as long the same output could be obtained by using the space as repeating sequence.

-- Space & Replicate at work 
SELECT Space(1) Example1 
, Space(5) Example2 
, Replicate(' ', 1) Example3 
, Replicate(' ', 5) Example4 
, Replicate('10', 1) Example5 
, Replicate('10', 2) Example6 
, Replicate('10', 3) Example7

As per the last statement, the first and third examples, respectively the second and fourth example will return the same values, unfortunately the output of the respective examples it’s not so easy to see. For exemplification, it could have been enriched by comparing or concatenating the strings, though that’s a topic for the next post.

Notes:
The queries work also in SQL databases in Microsoft Fabric

Happy coding!

09 December 2010

♟️Strategic Management: Interaction (Just the Quotes)

"Social structures are the products of social synergy, i.e., of the interaction of different social forces, all of which, in and of themselves, are destructive, but whose combined effect, mutually checking, constraining, and equilibrating one another, is to produce structures. The entire drift is toward economy, conservatism, and the prevention of waste. Social structures are mechanisms for the production of results, and the results cannot be secured without them. They are reservoirs of power." (James Q Dealey & Lester F Ward, "A Text-book of Sociology", 1905)

"The true nature of the universal principle of synergy pervading all nature and creating all the different kinds of structure that we observe to exist, must now be made clearer. Primarily and essentially it is a process of equilibration, i.e., the several forces are first brought into a state of partial equilibrium. It begins in collision, conflict, antagonism, and opposition, and then we have the milder phases of antithesis, competition, and interaction, passing next into a modus vivendi, or compromise, and ending in collaboration and cooperation. […] The entire drift is toward economy, conservatism, and the prevention of waste." (James Q Dealey & Lester F Ward, "A Text-book of Sociology", 1905)

"The leadership and other processes of the organization must be such as to ensure a maximum probability that in all interactions and all interactions and all relationships with the organization each member will, in the light of his background, values, and expectations, view the experience as supportive and one which builds and maintains his sense of personal worth and importance." (Rensis Likert, "New patterns of management", 1961)

"The unique feature of the decision tree is that it allows management to combine analytical techniques such as discounted cash flow and present value methods with a clear portrayal of the impact of future decision alternatives and events. Using the decision tree, management can consider various courses of action with greater ease and clarity. The interactions between present decision alternatives, uncertain events, and future choices and their results become more visible." (John F Magee, "Decision Trees for Decision Making", Harvard Business Review, 1964)

"Interaction and decision making relies heavily on group processes." (Rensis Likert, "The Human Organization", 1967)

"The systems approach to problems focuses on systems taken as a whole, not on their parts taken separately. Such an approach is concerned with total - system performance even when a change in only one or a few of its parts is contemplated because there are some properties of systems that can only be treated adequately from a holistic point of view. These properties derive from the relationship between parts of systems: how the parts interact and fit together." (Russell L Ackoff, "Towards a System of Systems Concepts", 1971)

"A company is a multidimensional system capable of growth, expansion, and self-regulation. It is, therefore, not a thing but a set of interacting forces. Any theory of organization must be capable of reflecting a company's many facets, its dynamism, and its basic orderliness. When company organization is reviewed, or when reorganizing a company, it must be looked upon as a whole, as a total system." (Albert Low, "Zen and Creative Management", 1976)

"Managers are not confronted with problems that are independent of each other, but with dynamic situations that consist of complex systems of changing problems that interact with each other. I call such situations messes. Problems are extracted from messes by analysis. Managers do not solve problems, they manage messes." (Russell L Ackoff, "The future of operational research is past", 1979)

"The manager must decide what type of group is wanted. If cooperation, teamwork, and synergy really matter, then one aims for high task interdependence. One structures the jobs of group members so that they have to interact frequently [...] to get their jobs done. Important outcomes are made dependent on group performance. The outcomes are distributed equally. If frenzied, independent activity is the goal, then one aims for low task interdependence and large rewards are distributed competitively and unequally." (Gregory P Shea & Richard A Guzzo, Sloan Management Review, 1987)

"There are several world view assumptions present in enterprise engineering. The first assumption is that the enterprise can be viewed as a complex system. This is necessary because systems in organizations are systems of organized complexity. Complexity is the result of the multiplicity and intricacy of man’s interaction with other components of the system. Secondly, the enterprise is to be viewed as a system of processes. These processes are engineered both individually and holistically. The final assumption is the use of engineering rigor in transforming the enterprise. The enterprise engineering paradigm views the enterprise as a complex system of processes that can be engineered to accomplish specific organizational objectives. In the Enterprise Engineering paradigm, the enterprise is viewed as a complex system of processes that can be engineered to accomplish specific organizational objectives." (Donald H Liles, "Enterprise modeling within an enterprise engineering framework", 1996)

"Senior management needed to step in and make some very tough moves. [...] we also realized then that there must be a better way to formulate strategy. What we needed was a balanced interaction between the middle managers, with their deep knowledge but narrow focus, and senior management, whose larger perspective could set a context." (Andrew Grove, Only the Paranoid Survive, 1998)

"True systems thinking, on the other hand, studies each problem as it relates to the organization’s objectives and interaction with its entire environment, looking at it as a whole within its universe. Taking your organization from a partial systems to a true systems state requires effective strategic management and backward thinking." (Stephen G Haines, "The Systems Thinking Approach to Strategic Planning and Management", 2000)

"The whole way of thinking focuses attention, for most, on the designed system, but it never proves sufficient, and they [the managers] have to 'get things done anyway', almost despite the system. What they are not encouraged to do, by this very way of thinking itself, is to pay attention to the detailed interactions between them, through which they "get things done." [This] is a thoroughly stressful daily experience for people." (Ralph D. Stacey et al, "Complexity and Management: Fad or Radical Challenge to Systems Thinking?", 2000)

"The key element of an organization is not a building or a set of policies and procedures; organizations are made up of people and their relationships with one another. An organization exists when people interact with one another to perform essential functions that help attain goals." (Richard Daft, "The Leadership Experience" , 2002)

"Organizations are not systems but the ongoing patterning of interactions between people. Patterns of human interaction produce further patterns of interaction, not some thing outside of the interaction. We call this perspective complex responsive processes of relating." (Ralph Stacey, 2005)

"Synergy occurs when organizational parts interact to produce a joint effect that is greater than the sum of the parts acting alone. As a result the organization may attain a special advantage with respect to cost, market power, technology, or employee." (Richard L Daft, "The Leadership Experience" 4th Ed., 2008)

"Strategy is concerned with an organisation's direction for the future; its purpose, its ambitions, its resources and how it interacts with the environment in which it operates." (Peter Lake & Robert Drake, "Information Systems Management in the Big Data Era", 2014)

"One way of managing complexity is to constrain the freedom of the parts: to hold some of those nonlinear interactions still. Businesses accomplish this with tight rules, processes, hierarchies, policies, and rigid strategies. Gathering people together under a corporate roof reduces complexity by constraining individual autonomy. The upside, of course, is collaboration, alignment of goals, and faster exchange of information." (Paul Gibbons, "The Science of Successful Organizational Change", 2015)

"Data from the customer interactions is the lifeblood for any organisation to view, understand and optimise the customer experience both remotely and on the front line! In the same way that customer experience experts understand that it’s the little things that count, it’s the small data that can make all the difference." (Alan Pennington, "The Customer Experience Book", 2016)

"[…] deliver a customer experience where the customer sees real value from how you use the data that they share with you and they will keep interacting/sharing that data and their consent for you to use it!" (Alan Pennington, "The Customer Experience Book", 2016)

"A system is a framework that orders and sequences activity within the organisation to achieve a purpose within a band of variance that is acceptable to the owner of the system. Systems are the organisational equivalent of behaviour in human interaction. Systems are the means by which organisations put policies into action. It is the owner of a system who has the authority to change it, hence his or her clear acceptance of the degree of variation generated by the existing system." (Catherine Burke et al, "Systems Leadership" 2nd Ed., 2018)

"Organizations that rely too heavily on org charts and matrixes to split and control work often fail to create the necessary conditions to embrace innovation while still delivering at a fast pace. In order to succeed at that, organizations need stable teams and effective team patterns and interactions. They need to invest in empowered, skilled teams as the foundation for agility and adaptability. To stay alive in ever more competitive markets, organizations need teams and people who are able to sense when context changes and evolve accordingly." (Matthew Skelton & Manuel Pais, "Team Topologies: Organizing Business and Technology Teams for Fast Flow", 2019)

08 December 2010

💎🏭SQL Reloaded: Pulling the Strings of SQL Server IX (Special Characters)

Under special characters denomination are categorized typically the characters that don’t belong to the alphabet of a given language, typically English (a-z, A-Z), or the numeric digits (0-9). Characters like umlauts (e.g. ä, ë, ö, ü, ÿ), accents (e.g. é, í, ó, ú) and other type of characters used in specific languages like German, French, Turkish, Hungarian, Romanian, Dutch or Swedish, together with punctuation signs or graphical characters are falling under the designation of special characters. It’s true that SQL Server, like many other databases, supports Unicode, a standard designed to encode such characters, though not all database designs are taking Unicodes into account.

Preponderantly the old database and other software solutions use the non-unicode string data types (char, varchar, text) fact that makes the special characters to be displayed inadequately, sometimes undecipherable. In order to avoid this behavior could be decided to port the data types to unicode or use only the standard characters of English language, both solutions with their positive and negative aspects. In such cases, especially during migration project or ETL tasks, eventually as part of a Data Quality initiative, it’s preferred to identify and replace the special characters in a manual, automatic or semi-automatic fashion. In addition, there are also cases in which, from various reasons, some attributes are not allowed or should not include special characters, and also this aspect could be included in a Data Quality initiative.

During assessment of Data Quality, in an organized or less organized manner, a first step resides in understanding the quality of the data. In the current case this resumes primarily in identifying how many records contain special characters, how could be the data cleaned, and the “cost” for this activity. Actually, before going this far, must be defined the character sets included in special characters, the definition could vary from case to case. For example in some cases could be considered also the space or the important punctuation signs as valid characters, while in others they may not be allowed.

There could be identified thus multiple scenarios, though I found out that the range of characters a-z, A-Z, 0-9 and the space are considered as valid character in most of the cases. For this purpose could be built a simple function that iterates through all the characters of a string and identifies if there is any character not belonging to the before mentioned range of valid characters. In order to address this, a few years back I built a function similar with the below one:

-- checks if a string has special characters 
CREATE FUNCTION dbo.HasSpecialCharacters( 
@string nvarchar(1000)) 
RETURNS int 
AS  
    BEGIN 
         DECLARE @retval int 
         DECLARE @index int 
         DECLARE @char nchar(1)  

         SET @retval = 0 
         SET @index = 1  

         WHILE (@index <= IsNull(len(@string), 0) AND @retval=0) 
        BEGIN  
           SET @char = Substring(@string, @index, @index+1) 
           IF NOT (ASCII(@char) BETWEEN 48 AND 57 -- numeric value 
             OR ASCII(@char) BETWEEN 65 AND 90 -- capital letters 
            OR ASCII(@char) BETWEEN 97 AND 122 -- small letters 
            OR ASCII(@char) = 32) --space 
           BEGIN 
                SET @retval = @index 
           END 
           ELSE 
               SET @index = @index + 1  
     END  

    RETURN (@retval) 
END

Function’s logic is based on the observation that the ASCII of numeric values could be found in the integer interval between 48 and 57, the capital letters between 65 and 90, while the small letters between 97 and 122. By adding the ASCII for space and eventually several other characters, the check on whether an character is valid resuming thus to only 4 constraints. Here’s the function at work:

-- testing HasSpecialCharacters function 
SELECT dbo.HasSpecialCharacters('kj324h5kjkj3245k2j3hkj342jj4') Example1 
, dbo.HasSpecialCharacters('Qualität') Example2 
, dbo.HasSpecialCharacters('Änderung') Example3 
, dbo.HasSpecialCharacters('') Example4 
, dbo.HasSpecialCharacters(NULL) Example5 
, dbo.HasSpecialCharacters('Ä') Example6 
, dbo.HasSpecialCharacters('ä') Example7 
, dbo.HasSpecialCharacters('a') Example8

As can be seen, the function returns the position where a special character is found, fact that enables users to identify the character that causes the problem. A similar function could be built also in order to count the number of special characters found in a string, the change residing in performing a counter rather then returning the position at the first occurrence of a special character.

The function might not be perfect though it solves the problem. There are also other alternatives, for example of storing the special characters in a table and performing a simple join against the target table. Another solution could be based on the use RegEx functionality, either by using OLE automation or directly CLR functionality. There could be done variations on the above solution too by limiting to check on whether the characters of a string are falling in the range projected by the ASCII function. That’s what the following function does:

-- checks if a string has special characters falling in an interval  
CREATE FUNCTION dbo.HasCharNotInASCIIRange( 
@string nvarchar(1000) 
, @start int 
, @end int) 
RETURNS int 
AS  
BEGIN      
    DECLARE @retval int 
    DECLARE @index int 
    DECLARE @char nchar(1)

 
    SET @retval = 0 
    SET @index = 1

    WHILE (@index <= IsNull(len(@string), 0) AND @retval=0) 
    BEGIN  
         SET @char = Substring(@string, @index, @index+1) 
         IF NOT (ASCII(@char) BETWEEN @start AND @end)  
        BEGIN 
              SET @retval = @index 
        END 
        ELSE 
             SET @index = @index + 1  
     END 
     RETURN (@retval) 
END

With this function are necessary 4 calls in order to identify if a string contains special characters, though we loose the flexibility of identifying the first character that is invalid. We could still identify the first occurrence by taking the minimum value returned by the 4 calls, however, unlike Oracle (see Least function), SQL Server doesn’t have such a function, so we’ll have eventually to built it. Anyway, here’s the above function at work:

-- testing HasCharNotInASCIIRange function 
SELECT dbo.HasCharNotInASCIIRange('k12345', 48, 57) Example1 
, dbo.HasCharNotInASCIIRange('12k345', 48, 57) Example2 
, dbo.HasCharNotInASCIIRange('12345', 48, 57) Example3 
, dbo.HasCharNotInASCIIRange(' 12345', 48, 57) Example4 
, dbo.HasCharNotInASCIIRange('12345 ', 48, 57) Example5 
, dbo.HasCharNotInASCIIRange('', 48, 57) Example6 
, dbo.HasCharNotInASCIIRange(NULL, 48, 57) Example7 
, dbo.HasCharNotInASCIIRange('', 48, 57) Example8 
, dbo.HasCharNotInASCIIRange('a', 48, 57) Example9 
, dbo.HasCharNotInASCIIRange('Ä', 48, 57) Example10 
, dbo.HasCharNotInASCIIRange('ä', 32, 32) Example11

Notes:
The queries work also in SQL databases in Microsoft Fabric

Happy coding!

07 December 2010

💎SQL Reloaded: Pulling the Strings of SQL Server I 7- Introduction

The (character) string or simply the character data type, how is named by the MSDN documentation, is one of the primary data types available in a database and probably the most complex given the fact that it can encompass any combination of numeric, literal, date or any other primary data type. In fact it could include any type of chunk of text that could be written in any language as SQL Server supports Unicode and thus most of the (written) languages. In this post I’m not intending to make a complete anthology of strings, and neither to retake the whole volume of information available in MSDN or other important books. My intent is to look at strings from a slightly different perspective, considering the various functions involving strings and how they could be used in order to provide various functionality, in fact the cornerstone of everyday developer.

A few things to remember:

1. there were mainly two types of non-unicode strings: the char (or character) of fixed length, respectively the varchar of variable length (varying character)

2. if initially both types of strings were having a maximum length of 8000 of characters, with SQL Server it’s possible to have a varchar with maximum storage size, declared as varchar(max).

3. if in the past there were some strict recommendations in what concerns the use of char or varchar, nowadays the varchar tends to be used almost everywhere, even for the strings of length 1.

4. talking about length, it denotes the number of chracters a string stores.

5. the trailing spaces, the spaces found at the right extremity of a string are typically ignored, while the leading spaces, the spaces found at the left extremity, are not ignored.

6. starting with SQL Server 2000, for the two character data types were introduced the corresponding unicode data types prefixed with n (national): nchar, respectively nvarchar.

7. given the fact that a unicode character needs more space to store the same non-unicode string, actually the number of bits doubles, the maximum length for an unicode string is only 4000.

8. there is also a non-unicode text, respectively ntext unicode data type, designed to store maximum length, though as it seems they could become soon deprecated, so might be a good idea to avoid it.

9. not-initialized variables, including strings, have the value NULL, referred also the NULL string, therefore it’s always a good idea to initialize your variables.

10. by empty string string is designated the string containing no character “’’”, and has the length 0.

11. there are several specific functions available for the creation, manipulation and conversion of strings from and to other data types.

12. not all of the aggregated functions work with string values (specifically the ones requesting a number value like SUM, AVG, STDV).

13. the operations performed on strings of different data types are generally not impacted by this aspect, though there are some exceptions.

14. there are several (basic) operations with strings, typically concatenation, extraction of subpart of a string, insertion, replacement or deletion of characters, rearrangement of string’s characters, trimming, splitting it in substrings (decomposition), etc.

15. there are several numeric values based on strings: length, position of a given text in a text, number of not empty characters, encoding of a character, on whether the text represents a valid numeric or date values, etc.

05 December 2010

🔦Process Management: Business Process Management (Definitions)

"The general term for the services and tools that support explicit process management (such as process analysis, definition, execution, monitoring, and administration), including support for human and application-level interaction." (Tilak Mitra et al, "SOA Governance", 2008)

"Software that models an enterprise's human and machine tasks and the interactions between them as processes and can monitor these tasks in real time in order to trigger a unit of work or set off an alert when specified time limits are exceeded or a response is not received within a specified time." (Janice M Roehl-Anderson, "IT Best Practices for Financial Managers", 2010)

"Software products that support the design, execution, and monitoring of repetitive, day-to-day business processes. Can create data used in diagnostic or interactive control systems." (Leslie G Eldenburg & Susan K Wolcott, "Cost Management 2nd Ed", 2011)

"A popular management technique that includes methods and tools to support the design, analysis, implementation, management, and optimization of operational business processes." (Linda Volonino & Efraim Turban, "Information Technology for Management 8th Ed", 2011)

"Management approach focused on aligning all aspects of an organization with the wants and needs of clients. Business process management attempts to improve processes continuously and may be therefore described as a “process optimization process.” Business process management activities can be grouped into five categories: design, modeling, execution, monitoring, and optimization." (IQBBA, "Standard glossary of terms used in Software Engineering", 2011)

"A managerial discipline that is focused on execution. It is the art and science of how organizations do things and how they can do them better. BPM attempts to optimize process performance to achieve strategic business objectives consistently while adapting, when necessary, to change or to new opportunity." (Carl F Lehmann, "Strategy and Business Process Management", 2012)

"Managing the work steps and business activities of an organization's workers in an automated way." (Robert F Smallwood, "Information Governance: Concepts, Strategies, and Best Practices", 2014)

"A discipline focused on continuous improvement and transformation of end-to-end cross-functional business processes." (Forrester)

"Business process management (BPM) is a discipline that uses various methods to discover, model, analyze, measure, improve and optimize business processes. A business process coordinates the behavior of people, systems, information and things to produce business outcomes in support of a business strategy. Processes can be structured and repeatable, or unstructured and variable." (Gartner)

"Business process management (BPM) is the discipline of improving a business process from end to end by analyzing it, modelling how it works in different scenarios, executing improvements, monitoring the improved process and continually optimizing it." (Techtarget)

"Business process management (BPM) involves the use of appropriate tools and techniques to design, analyse, and manage operational business processes and, where possible, to improve those processes. The term business process refers to repetitive activities performed in the context of an organization’s normal, everyday operations." (Institute for Competitive Intelligence)

💎🏭SQL Reloaded: A Look into Reverse

Some time back I was mentioning Stuff as one of the functions rarely used in SQL Server. Here’s another one: Reverse. I remember I used such a function in other programming languages and maybe 1-2 times in SQL Server, and I can’t say I seen it in action in other posts. As it name states and the MSDN documentation confirms, Reverse returns the reverse of a string value. The function takes as parameter any string value or value that could be implicitly conversed to a string, otherwise you’ll have to do an explicit conversion. As can be seen from the below first example the reverse of the “string” string is “gnirts”, and, as per the second example, the reverse of the revers of a string is the initial string itself. The third and fourth example use in exchange a numeric value.

-- reversing a string using Reverse function 
SELECT Reverse('string') Example1 
, Reverse(Reverse('string')) Example2 
, Reverse(1234567890) Example3 
, Reverse(Reverse(1234567890)) Example4

Example1	Example2	Example3	Example4
gnirts	string	987654321	1234567890

It seems there is not much to be said about Reverse function, and in the few situations that you need it, it’s great to have it, otherwise you would have to built it yourself. Let’s try to imagine there is no Reverse function in SQL Server, how would we provide such functionality by using existing functionality? In general such an exercise might look like lost time, though it could be useful in order to present several basic techniques. In this case the Reverse functionality could be provided by using a simple loop that iterates through string’s characters changing their order. The below piece of code should return the same output as above:

-- reversing a string using looping 
DECLARE @String varchar(50) 
DECLARE @Retval varchar(50) 
DECLARE @Index int 
SET @String = 'string' 
SET @Index = 1 
SET @Retval = '' 
WHILE @Index <= Len(@String) 
BEGIN  
   SET @Retval = Substring(@String, @Index, 1) + @Retval   
   SET @Index = @Index + 1  
END 
SELECT @Retval

Some type of problems whose solution involve the sequential processing of values within loops could be solved also using recursive techniques (recursion) that involve the sequential processing of values in which the output of a given intermediary step is based on the previous step. Recursion imply a slightly different view of the same problem, it could prove itself to be maybe less efficient in some cases and quite a useful technique in the others. SQL Server provides two types of recursion, one at function level involving functions, stored procedures are triggers, respectively at query level implemented in common table expressions. Recursive functions, functions that call themselves, have been for quite a while in SQL Server though few of the database books I read really talk about them. Probably I will detail the topic in another post, though for the moment I will show only the technique at work. In this case the recursive approach is quite simple:

-- reversing a string using a recursive function 
CREATE FUNCTION dbo.ReverseString( 
@string varchar(50)) 
RETURNS varchar(50) 
AS 
BEGIN           
     RETURN (CASE WHEN Len(@string)>1 THEN dbo.ReverseString( Right(@string, len(@string)-1)) + Left(@string, 1) ELSE @string END) 
END

-- testing 
SELECT dbo.ReverseString('string') Example1 
, dbo.ReverseString(dbo.ReverseString('string')) Example2

Common table expressions (CTE), since their introduction with SQL Server 2005, became quite an important technique in processing hierarchical structures like BOMs, which involve the traversal of the whole tree. A sequence is actually a tree having the width of 1 node, so it seems like a CTE’s job:

-- reversing a string using common table expression 
DECLARE @String varchar(50) 
SET @String = 'string' 
;WITH CTE(String, [Index]) 
AS (     SELECT Cast('' as varchar(50)) String 
    , 1 [Index] 
    UNION ALL 
    SELECT Cast(Substring(@String, [Index], 1) + String as varchar(50)) String 
    , [Index] + 1 [Index] 
    FROM CTE 
    WHERE [Index]<=Len(@String) 
)

SELECT @String Reversed 
FROM CTE 
WHERE [Index]-1=Len(@String)

Unlike the previous two solutions, the CTE approach involves more work, work that maybe isn’t required just for the sake of using the technique. When multiple alternative approaches exist, I prefer using the (simplest) solution that offers the highest degree of flexibility. In this case is the loop, though in others it could be a recursive function or a CTE.

Notes:

The queries work also in SQL databases in Microsoft Fabric.

Happy coding!

30 November 2010

🧭Business Intelligence: Enterprise Reporting (Part IX: How Many Reporting Systems Do You Need?)

Business Intelligence Series

I hope nobody’s questioning the fact that an organization needs at least a reporting system in order to take advantage of the huge amount of data it has collected over time, in order to have an overview on what’s happening in the organization, take better decision supported by data, etc. In one way or another many organizations arrive to have in place two or more reporting systems and all the consequences deriving from it. From experience, I would expect that many professionals have put themselves at least once the above question, often they having to live with their decisions and they are not always happy about them.

On one side we have the demand, the needs of the various departments and groups existing in an organization, while on the other side of the balance we have the various types of data and the technologies existing on the market (could be regarded as the supply), and the various constraints an organization might have: resources (financial, human, processing power, time), politics and philosophies, geography, complexity, etc. If they don’t seem so many you have to consider that the respective constraints could be further split into a long chain of causality, the books on Project Management, Architecture, Methodologies and other related topics treating them in detail, so no need to go there.

From practical reasons, I’d like to reformulate the above question as follows:
1. Is it enough only one reporting system or do we need more than one?
2. What’s the optimum number of reporting systems existing in an organization?

A blunt answer for the first question would be that an organization might need as many reporting systems as it’s needed in order to satisfy the existing reporting needs, in other words the demand. Straightforward answer but not really acceptable for a Manager, in particular, and an IT Professional, in general. Another blunt answer, this time for the second question, would be that an organization needs only “one and good” reporting system, though that’s contradicting the various constraints existing in IT world. So, what should it be?

Typically there are two views, the bottom up view, in this case starting with the data and building up to the reporting demands, and top down view, starting from the reporting requirements to data. In this play-set the technologies, in general, the reporting systems, in particular, are playing the role of middle point, the two views applying in respect to data vs. reporting technologies, reporting technologies vs. reporting requirements, increasing the complexity of the view. Sometimes is enough to focus only on the reporting requirements, why we consider then also the data in this scenario?! Remember, reporting it’s all about harnessing the value of your data, not only in finding the right reporting solution.

Often, in order to understand the requirements of an organization and understanding the value of data it’s easier to look at the various types of data it stores – the bottom up approach. Typically we could make distinction between master vs. transactional data, raw vs. aggregated data, structured vs. semi-structured vs. non-structured data, cleaned vs. not cleaned data, historical vs. live data, mined, distributed or heterogenous data, to mention just a few of the important ways of categorizing the data.

The respective types of data are important because they require special types of techniques or tools to process, report and harness them. The master and transactional data are at the heart of business applications, the “fluid” that flows and nourishes them, with an incredible potential when harnessed at its real value. Such unaltered data are typically structured at macro level, though they could contain also semi-structured or non-structured chunks of data, could have different levels of quality, could be distributed or heterogenous, etc. The range of data repositories range from text or document systems to specialized databases, any mixture being in theory possible.

So we have a huge volume of data collected over time, how could we harness them in order to reveal their real value? Excepting the operational use, the usefulness of raw data, data found in their unaltered form, stops usually at the point where their complexity falls beyond people’s limits of understanding them. Thus we arrived to aggregate, recombine or extract chunks from the data, bringing the data to a more comprehensible form. We even arrived to extract more information out of the data using specific techniques known as data mining methods, algorithms that allow us to identify associations, cluster or interpolate the data, the complexity of such algorithms evolving in considerable in the past two decades.

There are few software applications which provide the whole range of types of data processing, most of them resuming in recombining and presenting the data. Also data presentation offers a considerable range of formats (e.g. tabular, text, XML, Excel, charts and other diagrams) with complex navigational functionality (slice-and-dice, drill-down, drill-through, click-through), different ways of making the data available for consumption (direct-access, cached, web services), different security levels, etc.

When it comes to managers and users they want the data at the level of detail and form that allows the easiest/lowest proximate understanding level, and, “yesterday”, to use a word that reflects the urgency of requirements. This could be done in theory by coming with a whole set of reports covering all possible requirements, though that’s not so efficient as investment and not always possible with a button click, as probably all we’d like to.

For the ones familiarized to Manufacturing, it’s actually a disguised push vs. pull scenario, pushing the reports to the user without expecting their demands vs. waiting for the user to forward the reporting requirements, from which to derive eventually new reports or make changes to existing ones. In Manufacturing it’s all about finding the right balance between push and demand, and even if the respective field has been found some (best) practices that leads to the middle way, in IT we still have to dig for it. The road seems to lead nowhere… but it helps getting a rough understanding of what a report involves, at least in what concerns the important non-programming stuff.

In the top down approach, as the data are remaining relatively constant, it’s easier to focus on the set of requirements and the reporting tool(s) used. It makes sense to choose the reporting tool that covers at least the most important requirements while for the other you have the choice to use workarounds or address them using two or more reporting solutions, attempting again to cover the most important requirements, and so on. The problem with having more than one reporting solution is that often data, logic and reports arrive to be replicated, the whole reporting infrastructure becoming more difficult to manage. On the other side as long you are having a clear (partitioned) delimitation between the reporting frameworks, logic and data are replicated at minimum, having two or more reporting solutions seems to be an acceptable trade. Examples of such delimitations are for example the OLTP vs. OLAP solutions, to which adds the data mining reporting solution(s).

There are also attempts to extend an existing reporting framework above the initial functionality, though often people arrived to reinvent the wheel or create little monsters they arrive to live with. Also this is an acceptable approach, as long the reporting framework allows that. Some attempts might succeed to provide what they were designed for, others not. There are also some good news from this perspective, the appearance of mash-ups and semantic technologies could in the future allow the integration of reporting systems, making possible things that nowadays are quite challenging. The future is open of unlimited possibilities, but better stick to the present! For that stay tuned for a second post!

Previous Post <<||>> Next Post

SQL Troubles

Pages

14 December 2010

💎🏭SQL Reloaded: Pulling the Strings of SQL Server III: (Concatenation)

11 December 2010

💎🏭SQL Reloaded: Pulling the Strings of SQL Server II (Creation and Selection)

09 December 2010

♟️Strategic Management: Interaction (Just the Quotes)

08 December 2010

💎🏭SQL Reloaded: Pulling the Strings of SQL Server IX (Special Characters)

07 December 2010

💎SQL Reloaded: Pulling the Strings of SQL Server I 7- Introduction

05 December 2010

🔦Process Management: Business Process Management (Definitions)

💎🏭SQL Reloaded: A Look into Reverse

30 November 2010

🧭Business Intelligence: Enterprise Reporting (Part IX: How Many Reporting Systems Do You Need?)

About Me