31 December 2010

Meta-Blogging: Past, Present and Future

Introduction

   Even if I started to blog 3-4 years ago, only this year (still 2010) I started to allocate more time for blogging, having two blogs on which I try to post something periodically: SQL Troubles and The Web of Knowledge plus a homonym Facebook supporting group (for the later blog). As a parenthesis, the two blogs are approaching related topics from different perspective, the first focusing on data related topics, while the second approaching data from knowledge and web perspective; because several posts qualify for both blogs, I was thinking to merge the two blogs, though given the different perspectives and types of domains that deal with them, at least for the moment I’ll keep them apart. Closing the parenthesis, I would like to point out that I would love to allocate more time though I have to balance between blogging, my professional and personal life, and even if the three have many points in common, some delimitation it’s necessary. Because it’s the end of a year, I was thinking that it’s maybe the best time to draw the line and analyze the achievements of the previous year and the expectations for the next year(s), for each of the two blogs. So here are my thoughts:

Past and Present

    There are already more than 10 years since I started to work with the various database systems, my work ranging from data modeling to database development, reporting, ERP systems, etc. I can’t consider myself an expert, though I’ve accumulated experience in a whole range of areas, fact that I think entitles me to say that I have something to write about, even if the respective themes are not rocket science. In addition, it’s the human endeavor of learning something new each day, and in IT that’s quite an imperative, the evolvement of various technologies requesting those who are working in this domain to spend extra hours in learning new things or of consolidating or reusing knowledge in new ways. I considered at that time, and I still do, that blogging helps the learning process, allowing me to externalize the old or new knowledge, clear my thoughts, have also some kind of testimony of what I know or at least a repository of information I could reuse when needed, and eventually receive some feedback. These are few of the reasons for which this blog was born, and I hope the information presented in here are useful also for other people.

  During the past year I made it to post on my blog more than 100 entries on various topics, the thematic revolving around strings, hierarchical queries, CLR functionality, Data Quality, SSIS, ERPs, Reports, troubleshooting, best practices, joins, etc. Not all the posts rose to my expectations, though that’s a start, hoping that I will find a personal style and the quality of the posts will increase. I can’t say I received lot of feedback, however based on the user access’ statistics provided by Clustrmaps and Google the number of visitors this year was somewhere around 8500, close to my expectations. Talking about the number of visitors, it’s nice to have also some visualization, so for this year’s statistics I’ll use Clustrmaps visualization, which provides a more detailed geographical overview than Google’s Stats, while for trending I show below Google’s Stats (contains data from May until today):

The Web of Knowledge - Clustrmaps 2010 statistics

The Web of Knowledge - Google 2010 statistics

     What I find great about Google’s Stats is that it provides also an overview of the most accessed posts and the traffic sources. There are also some statistics of the audience per browsers and OS, though they are less important for my blogging requirements, at least for the moment.

The Web of Knowledge - pageviews by OS The Web of Knowledge - pageviews by browsers

  What I find interesting is that most visited posts and searched keywords were targeting SSIS and Oracle vs. SQL Server-related topics. So, if for the future I want more traffic than maybe I should diversify my topics in this direction.

Future

  I realize that I started many topics, having in the next year to continue posting on the same, but also targeting new topics like Relational Theory, Business Intelligence, Data Mining, Data Management, Statistics, SQL Server internals, data technologies, etc. Many of the posts will be an extension of my research on the above topics, and I was thinking to post also my learning notes with the hope that I will receive more feedback. I realized that I need to be more active and provide more feedback to other blogs, using the respective comments as gateways to my blog and try to build a network around it. I was thinking also to start a Facebook “support group”, posting the links I discovered, quotes or impressions in a more condensed form, but again this will take me more time, so I’m not sure if it makes sense to do that. Maybe I should post them directly on the blog, however I wanted my posts to be a little more consistent than that. Anyway I know also that I won’t manage to post more than an average of one post per week though per current expectations is ok.

  Right now all the posts are following a push model, in other words I push the content independently of whether there is a demand or not for it. It’s actually natural because the blog is having a personal note. In the future I’m expecting to move in the direction of a pull model, in other words to write on topics requested by readers, however for this I need more feedback from you, the reader. So please let me know what topics you’d like to read!

  I close here, hoping that the coming year (2011) will be much better than the current one. I wish to all of you, a Happy New Year!

14 December 2010

💎🏭SQL Reloaded: Pulling the Strings of SQL Server III: (Concatenation)

Typically, a database in general, and a table in particular, that follows the normalization rules, is designed to have the columns contain the smallest semantic chunks of data, it could be a Street, a Zip Code, City, a person’s First or Last Name, but also a large chunk of text like a Description or a Comment. No matter how well designed is a database, there will always be the need to do various operations with strings, typically concatenation, extraction of subpart of a string, insertion or deletion of characters, rearangement of string’s characters, trimming, splitting it in substrings, or of getting various numeric values: length, position of a given text, number of not empty characters, on whether the text represents a valid numeric or date values, etc. In the following posts I will attempt to address the respective operations in the context of select statements, and let’s start with concatenation. 

Concatenation is the operation of joining two or more string values within an expression. In SQL Server the “+” operator is used for concatenation, and it could be used to concatenate two or more members. In order to concatenate two members from which one of them is a string, the second term needs to be explicitly converted to a string data type, otherwise an error will occur. For readability or post-processing purposes, the strings are concatenated using a delimiter in order to delimit the boundaries of the initial value, it could be used a space, a comma, a semicolon, a pipe, a tab or any other character that could be used typically in delimiting columns.  

-- concatenation of strings 
SELECT 'a string' + ' and ' + 'a second string' Example1  
, 'a string' + ',' + 'a second string' Example2  
, '1245' + '67890' Example3  
, '1245' + '.' + '67890' Example4 

The concatenation of string variables or columns functions based on the same principles: 

-- concatenating string variables 
DECLARE @string1 varchar(50) 
DECLARE @string2 varchar(50) 
DECLARE @string3 varchar(50) 

SET @string1 = 'this is a string'  
SET @string2 = 'this is another string'  

SELECT @string1 + ' and ' + @string2 Example1 
, @string1 + char(31) + @string2 Example2 
, @string1 + ', ' + @string2 Example3 
, @string1 + ' ' + @string3 Example4 
, @string1 + IsNull(@string3, '!') Example5 
 
concatenation 2 

Here’s another example based on the concatenation of columns coming from two joined tables from AdventureWorks database:
 
-- concatenating columns of joined tables 
SELECT PAD.AddressID 
, IsNull(PAD.AddressLine1, '')  
+ IsNull(', ' + PAD.AddressLine2, '') 
+ IsNull(', ' + PAD.City, '') 
+ IsNull(', ' + PAD.PostalCode, '') 
+ IsNull(', ' + PSP.Name, '') Address 
FROM Person.Address PAD 
      JOIN Person.StateProvince PSP 
         ON PAD.StateProvinceID = PSP.StateProvinceID 
 
concatenation 3

As stressed in the previous post, the NULL values need to be adequately handled either by initializing values or by using the IsNull or COALESCE functions. The concatenation of strings combined with IsNull function could be used creatively in order to add a comma only when a value is not null, as in the above example.

There are scenarios in which is needed to concatenate the values belonging to the same column but from different records, for example concatenating the email values in order to send a single email to all the users in one single action. Before the introduction of common table expressions (CTE), wasn’t possible to concatenate the string values belonging to different records, at least not in a query, this functionality being achieved by using cursors or loops, or simply performed on client or intermediate layers. As I already gave an example on how to use cursor in order to loop through the values of a table and concatenate them (see “Cursors and Lists” post), I will focus on the use of loops and simple CTEs.

Loops are one of the basic functionality in programming languages, no matter of their complexity or type. Either if are talking about WHILE, FOR, foreach or do … until loops, the principles are the same: perform a list of actions until one or more conditions are met. In this case the actions performed is reduced to a set of concatenations based on the letters of the (English) alphabet:

-- concatenation within a loop 
DECLARE @list varchar(max) 
DECLARE @index int  
SET @list = '' 
SET @index = ASCII('a') WHILE (@index<ASCII('z')) 
BEGIN 
     SET @list = @list + ', ' + Char(@index) 
     SET @index = @index + 1 
END 
SELECT @list Result 

There is more work that needs to be performed in order to remove the leading comma from the output, but that’s a topic for the next post, when discussing about decomposition of strings.

CTEs are a little more complex to use than the loops, though the concatenation could be achieved across records and this in one query and not in procedural language as in the above example. In order delimit the two components of a CTE, I made use of a second CTE which simulates the existence of a given table:
 
-- concatenation across records 
;WITH Data(Column1, Ranking)  
AS  
( -- preparing test data 
       SELECT 'A' Column1, 0 Ranking  
       UNION ALL  
       SELECT 'B' Column1, 1 Ranking  
       UNION ALL  
       SELECT 'C' Column1, 2 Ranking 
) 
, Result(Column1, Ranking)  
AS 
(  -- performing the actual concatenation 
      SELECT Cast(Column1 as nvarchar(max)) Column1 , Ranking 
      FROM Data 
      WHERE Ranking = 0  
      UNION ALL 
      SELECT Cast(B.Column1 + ',' + A.Column1 as nvarchar(max)) Column1 , A.Ranking  
      FROM Data A  
         JOIN Result  B  
             ON A.Ranking - 1 = B.Ranking  
) SELECT Column1  
FROM Result  
WHERE Ranking IN (SELECT MAX(Ranking) FROM Result) 
  
The logic for doing a simple concatenation seems maybe complicated, though the volume of work is not so big if we ignore the first CTE. On the other side I introduced an “index” within the Ranking column, fact that allows processing easier the records. When dealing with the records coming from a table it’s probably much easier to use one of the ranking functions that suits best.

Notes:
The queries work also in SQL databases in Microsoft Fabric

Happy coding!

11 December 2010

💎🏭SQL Reloaded: Pulling the Strings of SQL Server II (Creation and Selection)

It doesn’t make sense to talk about the creation of string data types without talking about their “selection” as people typically want to see also examples at work, and not only learn about theoretical facts. I’m saying that because often programming seems to be like putting together the pieces of a puzzle without knowing how the final image would look like. Anyway, coming back to the topic, what I find great about SQL Server is that it allows to “create” and select a string with a minimum overhead within a simple select statement:

-- selecting a string value 
SELECT 'this is a string' -- string 1 
, 'and this is a second string' --string2 

-- selecting a string value (named columns)  
SELECT 'this is a string' String1  
, 'and this is a second string' String2 
 
Strings - example1

Pretty simple, isn’t it? This kind of scripts could be useful especially when debugging logic or testing an expression (e.g. functions at work), the advantage residing in the fact that is not necessary to built a table in order to test simple things. When performing multiple operations with the same values could be handy to store the values in declared variables:  

-- selecting string variables 
DECLARE @string1 varchar(50) 
DECLARE @string2 char(50) 
DECLARE @string3 char(50) = 'this is a string'  

SET @string1 = 'this is a string'  

SELECT @string1 String1 , @string2 String2 
, @string3 String3 
   
Strings - example2

As can be seen a not initialized variable has by default the value NULL, fact that could lead to unexpected behavior problems when used in expressions involving multiple variables. Therefore it’s recommended to initialize the values with the empty string or at least to handle the nulls in expressions. Starting with SQL Server 2008 it’s possible to declare and initialize variables within the same statement, so the above code won’t work in previous versions, unless the third declaration is broken into two pieces as per definition of first string.

It looks like that’s all, at least in what concerns the direct declaration of a string, be it implicit or explicit. However a basic introduction into strings’ creation is incomplete without mentioning the creation of strings from other data types and the various functions that could be used for this purpose. Ignoring the Cast and Convert functions used to explicitly convert the other data types to string data types, there are several other functions for this purpose, namely Char, Space and Replicate.

Probably some of you are already familiar with the ASCII (American Standard Code for Information Interchange) character-encoding scheme (vs. binary encoding) used to encode files. ASCII code represents a mapping between number and characters, SQL Server supporting the transformation between two sets through the ASCII and Char functions. If ASCII translates a non-unicode character into an integer, the Char function translates an integer value into a non-unicode character. The integer values range between 0 and 255, they encompassing the 0-9 digits, the characters of English together with the diacritics of other important languages, punctuation signs and several graphical characters. The mapping between the two sets is unique, and as can be seen from the below example based on a common table expression, the functions are inverse:

-- ASCII character values 
WITH CTE 
AS (    
    SELECT 0 [Index]  
    UNION ALL 
    SELECT [Index]+1 [Index]  
    FROM CTE 
    WHERE [Index]<255 
) SELECT [Index] 
, CHAR([Index]) [Character] 
, ASCII(CHAR([Index])) [ASCII] 
FROM CTE 
OPTION (MAXRECURSION 0) 

There is not much to say about the Space and Replicate functions, the Space function returns a string of repeated spaces, while the Replicate function forms a string as a repeated sequence of values. The definition of the Space function could be considered as redundant as long the same output could be obtained by using the space as repeating sequence.

-- Space & Replicate at work 
SELECT Space(1) Example1 
, Space(5) Example2 
, Replicate(' ', 1) Example3 
, Replicate(' ', 5) Example4 
, Replicate('10', 1) Example5 
, Replicate('10', 2) Example6 
, Replicate('10', 3) Example7 
 
Replicate

As per the last statement, the first and third examples, respectively the second and fourth example will return the same values, unfortunately the output of the respective examples it’s not so easy to see. For exemplification, it could have been enriched by comparing or concatenating the strings, though that’s a topic for the next post. 

Notes:
The queries work also in SQL databases in Microsoft Fabric
 
Happy coding!

09 December 2010

♟️Strategic Management: Interaction (Just the Quotes)

"Social structures are the products of social synergy, i.e., of the interaction of different social forces, all of which, in and of themselves, are destructive, but whose combined effect, mutually checking, constraining, and equilibrating one another, is to produce structures. The entire drift is toward economy, conservatism, and the prevention of waste. Social structures are mechanisms for the production of results, and the results cannot be secured without them. They are reservoirs of power." (James Q Dealey & Lester F Ward, "A Text-book of Sociology", 1905)

"The true nature of the universal principle of synergy pervading all nature and creating all the different kinds of structure that we observe to exist, must now be made clearer. Primarily and essentially it is a process of equilibration, i.e., the several forces are first brought into a state of partial equilibrium. It begins in collision, conflict, antagonism, and opposition, and then we have the milder phases of antithesis, competition, and interaction, passing next into a modus vivendi, or compromise, and ending in collaboration and cooperation. […] The entire drift is toward economy, conservatism, and the prevention of waste." (James Q Dealey & Lester F Ward, "A Text-book of Sociology", 1905)

"The leadership and other processes of the organization must be such as to ensure a maximum probability that in all interactions and all interactions and all relationships with the organization each member will, in the light of his background, values, and expectations, view the experience as supportive and one which builds and maintains his sense of personal worth and importance." (Rensis Likert, "New patterns of management", 1961)

"The unique feature of the decision tree is that it allows management to combine analytical techniques such as discounted cash flow and present value methods with a clear portrayal of the impact of future decision alternatives and events. Using the decision tree, management can consider various courses of action with greater ease and clarity. The interactions between present decision alternatives, uncertain events, and future choices and their results become more visible." (John F Magee, "Decision Trees for Decision Making", Harvard Business Review, 1964)

"Interaction and decision making relies heavily on group processes." (Rensis Likert, "The Human Organization", 1967)

"The systems approach to problems focuses on systems taken as a whole, not on their parts taken separately. Such an approach is concerned with total - system performance even when a change in only one or a few of its parts is contemplated because there are some properties of systems that can only be treated adequately from a holistic point of view. These properties derive from the relationship between parts of systems: how the parts interact and fit together." (Russell L Ackoff, "Towards a System of Systems Concepts", 1971) 

"A company is a multidimensional system capable of growth, expansion, and self-regulation. It is, therefore, not a thing but a set of interacting forces. Any theory of organization must be capable of reflecting a company's many facets, its dynamism, and its basic orderliness. When company organization is reviewed, or when reorganizing a company, it must be looked upon as a whole, as a total system." (Albert Low, "Zen and Creative Management", 1976)

"Managers are not confronted with problems that are independent of each other, but with dynamic situations that consist of complex systems of changing problems that interact with each other. I call such situations messes. Problems are extracted from messes by analysis. Managers do not solve problems, they manage messes." (Russell L Ackoff, "The future of operational research is past", 1979)

"The manager must decide what type of group is wanted. If cooperation, teamwork, and synergy really matter, then one aims for high task interdependence. One structures the jobs of group members so that they have to interact frequently [...] to get their jobs done. Important outcomes are made dependent on group performance. The outcomes are distributed equally. If frenzied, independent activity is the goal, then one aims for low task interdependence and large rewards are distributed competitively and unequally." (Gregory P Shea & Richard A Guzzo, Sloan Management Review, 1987)

"There are several world view assumptions present in enterprise engineering. The first assumption is that the enterprise can be viewed as a complex system. This is necessary because systems in organizations are systems of organized complexity. Complexity is the result of the multiplicity and intricacy of man’s interaction with other components of the system. Secondly, the enterprise is to be viewed as a system of processes. These processes are engineered both individually and holistically. The final assumption is the use of engineering rigor in transforming the enterprise. The enterprise engineering paradigm views the enterprise as a complex system of processes that can be engineered to accomplish specific organizational objectives. In the Enterprise Engineering paradigm, the enterprise is viewed as a complex system of processes that can be engineered to accomplish specific organizational objectives." (Donald H Liles, "Enterprise modeling within an enterprise engineering framework", 1996)

"Senior management needed to step in and make some very tough moves. [...] we also realized then that there must be a better way to formulate strategy. What we needed was a balanced interaction between the middle managers, with their deep knowledge but narrow focus, and senior management, whose larger perspective could set a context." (Andrew Grove, Only the Paranoid Survive, 1998)

"True systems thinking, on the other hand, studies each problem as it relates to the organization’s objectives and interaction with its entire environment, looking at it as a whole within its universe. Taking your organization from a partial systems to a true systems state requires effective strategic management and backward thinking." (Stephen G Haines, "The Systems Thinking Approach to Strategic Planning and Management", 2000)

"The whole way of thinking focuses attention, for most, on the designed system, but it never proves sufficient, and they [the managers] have to 'get things done anyway', almost despite the system. What they are not encouraged to do, by this very way of thinking itself, is to pay attention to the detailed interactions between them, through which they "get things done." [This] is a thoroughly stressful daily experience for people." (Ralph D. Stacey et al, "Complexity and Management: Fad or Radical Challenge to Systems Thinking?", 2000)

"The key element of an organization is not a building or a set of policies and procedures; organizations are made up of people and their relationships with one another. An organization exists when people interact with one another to perform essential functions that help attain goals." (Richard Daft, "The Leadership Experience" , 2002)

"Organizations are not systems but the ongoing patterning of interactions between people. Patterns of human interaction produce further patterns of interaction, not some thing outside of the interaction. We call this perspective complex responsive processes of relating." (Ralph Stacey, 2005)

"Synergy occurs when organizational parts interact to produce a joint effect that is greater than the sum of the parts acting alone. As a result the organization may attain a special advantage with respect to cost, market power, technology, or employee." (Richard L Daft, "The Leadership Experience" 4th Ed., 2008)

"Strategy is concerned with an organisation's direction for the future; its purpose, its ambitions, its resources and how it interacts with the environment in which it operates." (Peter Lake & Robert Drake, "Information Systems Management in the Big Data Era", 2014)

"One way of managing complexity is to constrain the freedom of the parts: to hold some of those nonlinear interactions still. Businesses accomplish this with tight rules, processes, hierarchies, policies, and rigid strategies. Gathering people together under a corporate roof reduces complexity by constraining individual autonomy. The upside, of course, is collaboration, alignment of goals, and faster exchange of information." (Paul Gibbons, "The Science of Successful Organizational Change",  2015)

"Data from the customer interactions is the lifeblood for any organisation to view, understand and optimise the customer experience both remotely and on the front line! In the same way that customer experience experts understand that it’s the little things that count, it’s the small data that can make all the difference." (Alan Pennington, "The Customer Experience Book", 2016)

"[…] deliver a customer experience where the customer sees real value from how you use the data that they share with you and they will keep interacting/sharing that data and their consent for you to use it!" (Alan Pennington, "The Customer Experience Book", 2016)

"A system is a framework that orders and sequences activity within the organisation to achieve a purpose within a band of variance that is acceptable to the owner of the system.  Systems are the organisational equivalent of behaviour in human interaction. Systems are the means by which organisations put policies into action.  It is the owner of a system who has the authority to change it, hence his or her clear acceptance of the degree of variation generated by the existing system." (Catherine Burke et al, "Systems Leadership" 2nd Ed., 2018)

"Organizations that rely too heavily on org charts and matrixes to split and control work often fail to create the necessary conditions to embrace innovation while still delivering at a fast pace. In order to succeed at that, organizations need stable teams and effective team patterns and interactions. They need to invest in empowered, skilled teams as the foundation for agility and adaptability. To stay alive in ever more competitive markets, organizations need teams and people who are able to sense when context changes and evolve accordingly." (Matthew Skelton & Manuel Pais, "Team Topologies: Organizing Business and Technology Teams for Fast Flow", 2019)

08 December 2010

💎🏭SQL Reloaded: Pulling the Strings of SQL Server IX (Special Characters)

Under special characters denomination are categorized typically the characters that don’t belong to the alphabet of a given language, typically English (a-z, A-Z), or the numeric digits (0-9). Characters like umlauts (e.g. ä, ë, ö, ü, ÿ), accents (e.g. é, í, ó, ú) and other type of characters used in specific languages like German, French, Turkish, Hungarian, Romanian, Dutch or Swedish, together with punctuation signs or graphical characters are falling under the designation of special characters. It’s true that SQL Server, like many other databases, supports Unicode, a standard designed to encode such characters, though not all database designs are taking Unicodes into account. 

Preponderantly the old database and other software solutions use the non-unicode string data types (char, varchar, text) fact that makes the special characters to be displayed inadequately, sometimes undecipherable. In order to avoid this behavior could be decided to port the data types to unicode or use only the standard characters of English language, both solutions with their positive and negative aspects. In such cases, especially during migration project or ETL tasks, eventually as part of a Data Quality initiative, it’s preferred to identify and replace the special characters in a manual, automatic or semi-automatic fashion. In addition, there are also cases in which, from various reasons, some attributes are not allowed or should not include special characters, and also this aspect could be included in a Data Quality initiative.

During assessment of Data Quality, in an organized or less organized manner, a first step resides in understanding the quality of the data. In the current case this resumes primarily in identifying how many records contain special characters, how could be the data cleaned, and the “cost” for this activity. Actually, before going this far, must be defined the character sets included in special characters, the definition could vary from case to case. For example in some cases could be considered also the space or the important punctuation signs as valid characters, while in others they may not be allowed. 

There could be identified thus multiple scenarios, though I found out that the range of characters a-z, A-Z, 0-9 and the space are considered as valid character in most of the cases. For this purpose could be built a simple function that iterates through all the characters of a string and identifies if there is any character not belonging to the before mentioned range of valid characters. In order to address this, a few years back I built a function similar with the below one:

-- checks if a string has special characters 
CREATE FUNCTION dbo.HasSpecialCharacters( 
@string nvarchar(1000)) 
RETURNS int 
AS  
    BEGIN 
         DECLARE @retval int 
         DECLARE @index int 
         DECLARE @char nchar(1)  

         SET @retval = 0 
         SET @index = 1  

         WHILE (@index <= IsNull(len(@string), 0) AND @retval=0) 
        BEGIN  
           SET @char = Substring(@string, @index, @index+1) 
           IF NOT (ASCII(@char) BETWEEN 48 AND 57 -- numeric value 
             OR ASCII(@char) BETWEEN 65 AND 90 -- capital letters 
            OR ASCII(@char) BETWEEN 97 AND 122 -- small letters 
            OR ASCII(@char) = 32) --space 
           BEGIN 
                SET @retval = @index 
           END 
           ELSE 
               SET @index = @index + 1  
     END  

    RETURN (@retval) 
END 

Function’s logic is based on the observation that the ASCII of numeric values could be found in the integer interval between 48 and 57, the capital letters between 65 and 90, while the small letters between 97 and 122. By adding the ASCII for space and eventually several other characters, the check on whether an character is valid resuming thus to only 4 constraints. Here’s the function at work:  

-- testing HasSpecialCharacters function 
SELECT dbo.HasSpecialCharacters('kj324h5kjkj3245k2j3hkj342jj4') Example1 
, dbo.HasSpecialCharacters('Qualität') Example2 
, dbo.HasSpecialCharacters('Änderung') Example3 
, dbo.HasSpecialCharacters('') Example4 
, dbo.HasSpecialCharacters(NULL) Example5 
, dbo.HasSpecialCharacters('Ä') Example6 
, dbo.HasSpecialCharacters('ä') Example7 
, dbo.HasSpecialCharacters('a') Example8 
 
special characters 1

As can be seen, the function returns the position where a special character is found, fact that enables users to identify the character that causes the problem. A similar function could be built also in order to count the number of special characters found in a string, the change residing in performing a counter rather then returning the position at the first occurrence of a special character.

The function might not be perfect though it solves the problem. There are also other alternatives, for example of storing the special characters in a table and performing a simple join against the target table. Another solution could be based on the use RegEx functionality, either by using OLE automation or directly CLR functionality. There could be done variations on the above solution too by limiting to check on whether the characters of a string are falling in the range projected by the ASCII function. That’s what the following function does:  

-- checks if a string has special characters falling in an interval  
CREATE FUNCTION dbo.HasCharNotInASCIIRange( 
@string nvarchar(1000) 
, @start int 
, @end int) 
RETURNS int 
AS  
BEGIN      
    DECLARE @retval int 
    DECLARE @index int 
    DECLARE @char nchar(1) 
 
    SET @retval = 0 
    SET @index = 1 
    WHILE (@index <= IsNull(len(@string), 0) AND @retval=0) 
    BEGIN  
         SET @char = Substring(@string, @index, @index+1) 
         IF NOT (ASCII(@char) BETWEEN @start AND @end)  
        BEGIN 
              SET @retval = @index 
        END 
        ELSE 
             SET @index = @index + 1  
     END 
     RETURN (@retval) 
END 

With this function are necessary 4 calls in order to identify if a string contains special characters, though we loose the flexibility of identifying the first character that is invalid. We could still identify the first occurrence by taking the minimum value returned by the 4 calls, however, unlike Oracle (see Least function), SQL Server doesn’t have such a function, so we’ll have eventually to built it. Anyway, here’s the above function at work:  

-- testing HasCharNotInASCIIRange function 
SELECT dbo.HasCharNotInASCIIRange('k12345', 48, 57) Example1 
, dbo.HasCharNotInASCIIRange('12k345', 48, 57) Example2 
, dbo.HasCharNotInASCIIRange('12345', 48, 57) Example3 
, dbo.HasCharNotInASCIIRange(' 12345', 48, 57) Example4 
, dbo.HasCharNotInASCIIRange('12345 ', 48, 57) Example5 
, dbo.HasCharNotInASCIIRange('', 48, 57) Example6 
, dbo.HasCharNotInASCIIRange(NULL, 48, 57) Example7 
, dbo.HasCharNotInASCIIRange('', 48, 57) Example8 
, dbo.HasCharNotInASCIIRange('a', 48, 57) Example9 
, dbo.HasCharNotInASCIIRange('Ä', 48, 57) Example10 
, dbo.HasCharNotInASCIIRange('ä', 32, 32) Example11 

special characters 2


Notes:
The queries work also in SQL databases in Microsoft Fabric

Happy coding!

07 December 2010

💎SQL Reloaded: Pulling the Strings of SQL Server I 7- Introduction

    The (character) string or simply the character data type, how is named by the MSDN documentation, is one of the primary data types available in a database and probably the most complex given the fact that it can encompass any combination of numeric, literal, date or any other primary data type. In fact it could include any type of chunk of text that could be written in any language as SQL Server supports Unicode and thus most of the (written) languages. In this post I’m not intending to make a complete anthology of strings, and neither to retake the whole volume of information available in MSDN or other important books. My intent is to look at strings from a slightly different perspective, considering the various functions involving strings and how they could be used in order to provide various functionality, in fact the cornerstone of everyday developer.

   A few things to remember:

1. there were mainly two types of non-unicode strings: the char (or character) of fixed length, respectively the varchar of variable length (varying character)

2. if initially both types of strings were having a maximum length of 8000 of characters, with SQL Server it’s possible to have a varchar with maximum storage size, declared as varchar(max).

3. if in the past there were some strict recommendations in what concerns the use of char or varchar, nowadays the varchar tends to be used almost everywhere, even for the strings of length 1.

4. talking about length, it denotes the number of chracters a string stores.

5. the trailing spaces, the spaces found at the right extremity of a string are typically ignored, while the leading spaces, the spaces found at the left extremity, are not ignored.

6. starting with SQL Server 2000, for the two character data types were introduced the corresponding unicode data types prefixed with n (national): nchar, respectively nvarchar.

7. given the fact that a unicode character needs more space to store the same non-unicode string, actually the number of bits doubles, the maximum length for an unicode string is only 4000.

8. there is also a non-unicode text, respectively ntext unicode data type, designed to store maximum length, though as it seems they could become soon deprecated, so might be a good idea to avoid it.

9. not-initialized variables, including strings, have the value NULL, referred also the NULL string, therefore it’s always a good idea to initialize your variables.

10. by empty string string is designated the string containing no character “’’”, and has the length 0.

11. there are several specific functions available for the creation, manipulation and conversion of strings from and to other data types.

12. not all of the aggregated functions work with string values (specifically the ones requesting a number value like SUM, AVG, STDV).

13. the operations performed on strings of different data types are generally not impacted by this aspect, though there are some exceptions.

14. there are several (basic) operations with strings, typically concatenation, extraction of subpart of a string, insertion, replacement or deletion of characters, rearrangement of string’s characters, trimming, splitting it in substrings (decomposition), etc.

15. there are several numeric values based on strings: length, position of a given text in a text, number of not empty characters, encoding of a character, on whether the text represents a valid numeric or date values, etc.

05 December 2010

🔦Process Management: Business Process Management (Definitions)

"The general term for the services and tools that support explicit process management (such as process analysis, definition, execution, monitoring, and administration), including support for human and application-level interaction." (Tilak Mitra et al, "SOA Governance", 2008)

"Software that models an enterprise's human and machine tasks and the interactions between them as processes and can monitor these tasks in real time in order to trigger a unit of work or set off an alert when specified time limits are exceeded or a response is not received within a specified time." (Janice M Roehl-Anderson, "IT Best Practices for Financial Managers", 2010)

"Software products that support the design, execution, and monitoring of repetitive, day-to-day business processes. Can create data used in diagnostic or interactive control systems." (Leslie G Eldenburg & Susan K Wolcott, "Cost Management 2nd Ed", 2011)

"A popular management technique that includes methods and tools to support the design, analysis, implementation, management, and optimization of operational business processes." (Linda Volonino & Efraim Turban, "Information Technology for Management 8th Ed", 2011)

"Management approach focused on aligning all aspects of an organization with the wants and needs of clients. Business process management attempts to improve processes continuously and may be therefore described as a “process optimization process.” Business process management activities can be grouped into five categories: design, modeling, execution, monitoring, and optimization." (IQBBA, "Standard glossary of terms used in Software Engineering", 2011)

"A managerial discipline that is focused on execution. It is the art and science of how organizations do things and how they can do them better. BPM attempts to optimize process performance to achieve strategic business objectives consistently while adapting, when necessary, to change or to new opportunity." (Carl F Lehmann, "Strategy and Business Process Management", 2012)

"Managing the work steps and business activities of an organization's workers in an automated way." (Robert F Smallwood, "Information Governance: Concepts, Strategies, and Best Practices", 2014)

"A discipline focused on continuous improvement and transformation of end-to-end cross-functional business processes." (Forrester)

"Business process management (BPM) is a discipline that uses various methods to discover, model, analyze, measure, improve and optimize business processes. A business process coordinates the behavior of people, systems, information and things to produce business outcomes in support of a business strategy. Processes can be structured and repeatable, or unstructured and variable." (Gartner)

"Business process management (BPM) is the discipline of improving a business process from end to end by analyzing it, modelling how it works in different scenarios, executing improvements, monitoring the improved process and continually optimizing it." (Techtarget)

"Business process management (BPM) involves the use of appropriate tools and techniques to design, analyse, and manage operational business processes and, where possible, to improve those processes. The term business process refers to repetitive activities performed in the context of an organization’s normal, everyday operations." (Institute for Competitive Intelligence)

💎🏭SQL Reloaded: A Look into Reverse

Some time back I was mentioning Stuff as one of the functions rarely used in SQL Server. Here’s another one: Reverse. I remember I used such a function in other programming languages and maybe 1-2 times in SQL Server, and I can’t say I seen it in action in other posts. As it name states and the MSDN documentation confirms, Reverse returns the reverse of a string value. The function takes as parameter any string value or value that could be implicitly conversed to a string, otherwise you’ll have to do an explicit conversion. As can be seen from the below first example the reverse of the “string” string is “gnirts”, and, as per the second example, the reverse of the revers of a string is the initial string itself. The third and fourth example use in exchange a numeric value.
 
-- reversing a string using Reverse function 
SELECT Reverse('string') Example1 
, Reverse(Reverse('string')) Example2 
, Reverse(1234567890) Example3 
, Reverse(Reverse(1234567890)) Example4 
Example1 Example2 Example3 Example4
gnirts string 987654321 1234567890

It seems there is not much to be said about Reverse function, and in the few situations that you need it, it’s great to have it, otherwise you would have to built it yourself. Let’s try to imagine there is no Reverse function in SQL Server, how would we provide such functionality by using existing functionality? In general such an exercise might look like lost time, though it could be useful in order to present several basic techniques. In this case the Reverse functionality could be provided by using a simple loop that iterates through string’s characters changing their order. The below piece of code should return the same output as above:

-- reversing a string using looping 
DECLARE @String varchar(50) 
DECLARE @Retval varchar(50) 
DECLARE @Index int 
SET @String = 'string' 
SET @Index = 1 
SET @Retval = '' 
WHILE @Index <= Len(@String) 
BEGIN  
   SET @Retval = Substring(@String, @Index, 1) + @Retval   
   SET @Index = @Index + 1  
END 
SELECT @Retval 

Some type of problems whose solution involve the sequential processing of values within loops could be solved also using recursive techniques (recursion) that involve the sequential processing of values in which the output of a given intermediary step is based on the previous step. Recursion imply a slightly different view of the same problem, it could prove itself to be maybe less efficient in some cases and quite a useful technique in the others. SQL Server provides two types of recursion, one at function level involving functions, stored procedures are triggers, respectively at query level implemented in common table expressions. Recursive functions, functions that call themselves, have been for quite a while in SQL Server though few of the database books I read really talk about them. Probably I will detail the topic in another post, though for the moment I will show only the technique at work. In this case the recursive approach is quite simple:

-- reversing a string using a recursive function 
CREATE FUNCTION dbo.ReverseString( 
@string varchar(50)) 
RETURNS varchar(50) 
AS 
BEGIN           
     RETURN (CASE WHEN Len(@string)>1 THEN dbo.ReverseString( Right(@string, len(@string)-1)) + Left(@string, 1) ELSE @string END) 
END 
 
-- testing 
SELECT dbo.ReverseString('string') Example1 
, dbo.ReverseString(dbo.ReverseString('string')) Example2 

Common table expressions (CTE), since their introduction with SQL Server 2005, became quite an important technique in processing hierarchical structures like BOMs, which involve the traversal of the whole tree. A sequence is actually a tree having the width of 1 node, so it seems like a CTE’s job:
  
-- reversing a string using common table expression 
DECLARE @String varchar(50) 
SET @String = 'string' 
;WITH CTE(String, [Index]) 
AS (     SELECT Cast('' as varchar(50)) String 
    , 1 [Index] 
    UNION ALL 
    SELECT Cast(Substring(@String, [Index], 1) + String as varchar(50)) String 
    , [Index] + 1 [Index] 
    FROM CTE 
    WHERE [Index]<=Len(@String) 
) 
SELECT @String Reversed 
FROM CTE 
WHERE [Index]-1=Len(@String) 

Unlike the previous two solutions, the CTE approach involves more work, work that maybe isn’t required just for the sake of using the technique. When multiple alternative approaches exist, I prefer using the (simplest) solution that offers the highest degree of flexibility. In this case is the loop, though in others it could be a recursive function or a CTE.

Notes:
The queries work also in SQL databases in Microsoft Fabric.

Happy coding!
Related Posts Plugin for WordPress, Blogger...

About Me

My photo
Koeln, NRW, Germany
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.