20 November 2005

DBMS: Who's gonna rule the world?

    This morning I started to read an article from Information Week about the future of database administrators, as it seems they will rule the world. Is not a joke, their role will become more and more important over the next years, why this? 

 1. Databases become bigger and more complex      
The size and complexity of databases is increasing from year to year, same their importance in making faster decision based on current or historical data. Bigger the size and complexity, bigger also the need for ad-hoc or one mouse click reports, and who’s the person who has the knowledge and resources to do this?! 

2. Data mining      
Historical data doesn't resume only to simple reporting, but data mining is becoming more popular, having as purpose the discovery of trends in the business, facilitating somehow predictions for business' future.     
Let’s not forget that the big players on the database market started to offer data mining solutions, incorporated or not in their database platforms, the data mining features are a start in providing new views over the data.      
Big companies will opt probably for data analysts, but for medium-sized and small companies, the role of a data analyst could be easy taken by a database administrator. 

3. The knowledge that matters      
Even if databases are backend for complex UI, in time, if not immediately, the database administrator becomes aware of underlaying overall structure of the database and, why not, leaded by curiosity or business' requirements he becomes aware of the content of the data. Even if is a paradox, the database administrator has sometimes more data knowledge than a Manager and the skills to do with data that magic that is important for Managers. 

4. Data Cleansing      
More and more companies are becoming aware of the outliers from their data, so data cleansing is becoming a priority, and again the database administrator has an important role in this process. 

5. Database Security      
Database platforms are growing, unfortunately same thing happens with their security holes and the number of attacks. A skilled database administrator must have the knowledge to overcome these problems and solve the issues as soon as possible. 

6. Training      
Even if can be considered a consequence of the previous enumerated facts, it deserves to be considered separately. Companies start to realize how important it to have a skilled database administrator, so probably they will provide the requested training or help the DBA to achieve this. 

 7. On site/off shore compromise      
If in the past years, has been en vogue the concept of off shoring people, including database administrators; probably soon companies will become aware that reducing direct maintenance costs doesn’t mean a reduction of overall costs, which may contain also indirect costs. The delayed response time generated by communication issues, availability or resources plays an important role in the increase indirect costs, most of the big companies will be forced then to opt for a compromise between on site and off shore.      
In this case the direct benefit for a database administrator is not so obvious, except the financial benefit there is also an image advantage, even if is disregarded, the image and the fact a person is not anymore isolated is important.

19 November 2005

SQL Reloaded: Cursors and Lists

    I found cursors really useful when the set-based logic provided by a query make it difficult to solve special types of problems (e.g. concatenate in a list the values returned by a select, multiple updates/insert/deletions based on the success or failure of previous logic). Cursors are relatively easy to write though lot of code is redundant from one solution to another. Therefore, I prefer to have the simplest code and then modify it according to new requirements.   

   Here's a simple example of a function that returns a list of emails:

--creating the sample table
CREATE TABLE EmailAddresses(ID int, Email varchar(50))

--insert test records
INSERT EmailAddresses
VALUES (1, 'John.Travolta@star.com')
INSERT EmailAddresses
VALUES (2, 'Robert.DeNiro@star.com')
INSERT EmailAddresses
VALUES (3, 'MegRyan@star.com')
INSERT EmailAddresses
VALUES (4, 'Helen.Hunt@star.com')
INSERT EmailAddresses
VALUES (5, 'Jodie.Foster@star.com')

-- creating the function 
CREATE FUNCTION dbo.GetEmails()
RETURNS varchar(1000)
/*
Purpose: returns a list of Emails
Parameters:
Notes:
Sample: SELECT dbo.GetEmails() AS ListEmails
*/
AS
BEGIN
   DECLARE @Email varchar(50)
   DECLARE @Emails varchar(1000)

   SET @Emails = ''

   -- Create Emails Cursor
   DECLARE Emails CURSOR FOR
   SELECT Email
   FROM EmailAddresses
   ORDER BY Email

   OPEN Emails -- Open Emails Cursor

   --fetch first set of records from Emails Cursor
   FETCH NEXT FROM Emails
   INTO @Email

   WHILE @@FETCH_STATUS = 0 --if the fatch was successful
   BEGIN
      SET @Emails = @Emails + @Email + ','

      --fetch next set of records from Emails Cursor
      FETCH NEXT FROM Emails
      INTO @Email
   END

   CLOSE Emails -- close Emails cursor
   DEALLOCATE Emails --deallocate Emails cursor

   --remove the extra comma
   IF Len(@Email)>0
      SET @Emails = Left(@Emails, Len(@Emails)-1)

   RETURN @Emails
END
 
--testing the function
SELECT dbo.GetEmails()

Notes:
The code was tested on SQL Server 2000 till 2017.
The logic from dbo.GetList function can use as source any other table as long the length of the target column is less than 50 characters as the @Result was defined as varchar(50).

     Exploring the above idea, what if the content of the query is not known until runtime and all is known is that only one column of data is returned? This could be achieved with the help of a table data type:

--creating the source table
CREATE TABLE Countries(ID int, Country varchar(50))

--insert test records
INSERT Countries
VALUES (1, 'US')
INSERT Countries
VALUES (2, 'UK')
INSERT Countries
VALUES (3, 'Germany')
INSERT Countries
VALUES (4, 'Spain')
INSERT Countries
VALUES (5, 'France')

 
-- creating the function 
CREATE FUNCTION dbo.GetList(
@Query varchar(250))
RETURNS varchar(1000)
/*
Purpose: returns a list of List
Parameters:
Notes:
Sample: SELECT dbo.GetList('SELECT Country FROM Countries ORDER BY Country') AS List
*/

AS
BEGIN
   DECLARE @Result varchar(50)
   DECLARE @List varchar(1000)
   DECLARE @Temp TABLE (Val varchar(10))

   SET @List = ''

   --insert in a table data type the results returned by query
   INSERT @Temp
   EXEC (@Query)

   -- Create List Cursor
   DECLARE List CURSOR FOR
   SELECT Val
   FROM @Temp

   OPEN List -- Open List Cursor

   --fetch first set of records from List Cursor
   FETCH NEXT FROM List
   INTO @Result

   WHILE @@FETCH_STATUS = 0 --if the fatch was successful
   BEGIN
      SET @List = @List + @Result + ','

      --fetch next set of records from List Cursor
      FETCH NEXT FROM List
      INTO @Result
   END

   CLOSE List -- close List cursor
   DEALLOCATE List --deallocate List cursor

   --remove the extra comma
   IF Len(@Result)>0
      SET @List = Left(@List, Len(@List)-1)

   RETURN @List
END

--testing the function 
SELECT dbo.GetList('SELECT Country FROM Countries ORDER BY Country') AS List

Result: List --------------------------
France,Germany,Spain,UK,US (1 row(s) affected)   

   Instead of using a query one can use a stored procedure as well. Here's the stored procedure and the example that calls it:

--creating the stored procedure
CREATE PROCEDURE dbo.pListCountries
AS
SELECT Country
FROM Countries
ORDER BY Country
 
--testing the function
SELECT dbo.GetList('dbo.pListCountries') AS List

Result: List
--------------------------
France,Germany,Spain,UK,US (1 row(s) affected)  

Notes:
As it seems the above code stopped working between the next editions of SQL Server.

SQL Server Troubleshooting: Problems in SQL Server 2000 DTS Packages

    There are cases in which a problem can not be solved with a query, additional data processing being requested. In this category of cases can be useful to use a temporary table or the table data type. Once the logic built you want to export the data using a simple DTS project having as Source the built query.

    While playing with SQL Server 2005 (Beta 2) I observed that the DTS Packages have problem in handling temporary tables as well table data types.

1. Temporary table Sample:

CREATE PROCEDURE dbo.pTemporaryTable
AS
CREATE TABLE #Temp(ID int, Country varchar(10)) -- create a temporary table

-- insert a few records
INSERT #Temp VALUES (1, 'US')
INSERT #Temp VALUES (2, 'UK')
INSERT #Temp VALUES (3, 'Germany')
INSERT #Temp VALUES (4, 'France')   

SELECT * FROM #Temp -- select records

DROP TABLE #Temp  drop the temporary table
 
-- testing the stored procedure
EXEC dbo.pTemporaryTable
Output:
ID Country
    ----------- ----------
1 US
2 UK
3 Germany
4 France
(4 row(s) affected)

Trying to use the stored procedure as Source in a DTS package brought the following error message: Error Source: Microsoft OLE DB Provider for SQL Server Error Description: Invalid object name ‘#Temp’   

 I hoped this has been fixed in SQL Server 2005 (Beta 3) version, but when I tried to use the stored procedure as a Source I got the error:
Error at Data Flow Task [OLE DB Source [1]]: An OLE DB error has occurred. Error code: 0x80004005 An OLE DB record is available. Source: "Microsoft OLE DB Provider for SQL Server" Hresult: 0x80004005 Description: "Invalid object name '#Temp'."
Error at Data Flow Task [OLE DB Source [1]]: Unable to retrieve column information from the data source. Make sure your target table in the database is available. ADDITIONAL INFORMATION: Exception from HRESULT: 0xC020204A (Microsoft.SqlServer.DTSPipelineWrap)

2. Table data type sample:

CREATE PROCEDURE dbo.pTableDataType
AS
DECLARE @Temp TABLE (ID int, Country varchar(10)) -- create table

-- insert a few records
INSERT @Temp VALUES (1, 'US')
INSERT @Temp VALUES (2, 'UK')
INSERT @Temp VALUES (3, 'Germany')
INSERT @Temp VALUES (4, 'France')   

SELECT * FROM @Temp -- select records

-- testing the stored procedure
EXEC dbo.pTableDataType

Output:
ID Country
   ----------- ----------
1 US
2 UK
3 Germany
4 France
(4 row(s) affected)    

This time the error didn't appear when the Source was provided, but when the package was run: Error message: Invalid Pointer    

In Server 2005 (Beta 3) it raised a similar error:
SSIS package "Package6.dtsx" starting. Information: 0x4004300A at Data Flow Task, DTS.Pipeline: Validation phase is beginning Information: 0x4004300A at Data Flow Task, DTS.Pipeline: Validation phase is beginning Information: 0x40043006 at Data Flow Task, DTS.Pipeline: Prepare for Execute phase is beginning Information: 0x40043007 at Data Flow Task, DTS.Pipeline: Pre-Execute phase is beginning SSIS package "Package6.dtsx" finished: Canceled.    

 I saw there will be problems because in Flat File Connection Manager’s Preview no data were returned. I hope these problems were solved in the last SQL Server Release, if not we are in troubles!

Possible Solutions:    
If the Stored Procedures provided above are run in SQL Query Analyzer or using ADO, they will work without problems. This behavior is frustrating, especially when the logic is really complicated and you put lot of effort in it; however there are two possible solutions, both involving affordable drawbacks:
1. Create a physical table manually or at runtime, save the data in it and later remove the table. This will not work for concurrent use (it works maybe if the name of the table will be time stamped), but for ad-hoc reports might be acceptable.
2. Built an ADO based component which exports the data to a file in the format you want. The component is not difficult to implement, but additional coding might be requested for each destination type. The component can be run from a UI or from a DTS package.
3. In SQL Server Yukon is possible to use the CLR and implement the logic in a managed stored procedure or table valued function.

Happy Coding!

18 November 2005

SQL Reloaded: Prime Numbers

I was just thinking about prime numbers and why people had such a fascination about them. In mysticism every number has a powerful meaning, a power coming from inside, and therewith attracts certain powers from exterior, is strange, hard to understand all this, especially how all numbers can be reduced to a number between 1 and 9.  Every philosophy has a little truth in it. 
 
   I was trying to find a nice article on number reduction, for example the one from sourceryforge could be useful. Last week I implemented a function which returns the reduction of any sequence, could be date or number. I was intending to apply it on prime numbers, but even if can be observed certain similarities between different chains, is difficult to believe this will lead anywhere. CREATE FUNCTION dbo.NumberReduction( @text varchar(50)) RETURNS smallint /* Purpose: reduces to number a sequence of digits Parameters: @text varchar(50) - digits sequence Notes: the function works with any sequnce of alphanumeric, not numeric values being ignored Sample: SELECT dbo.NumberReduction('11/17/2005') SELECT dbo.NumberReduction('1234') SELECT dbo.NumberReduction('1234 8993 90003 3456') */ BEGIN    DECLARE @index smallint    DECLARE @total int    SET @index = 1    SET @total = 0    --for each char in alphanumeric sequence    WHILE (@index<=len(@text))    BEGIN          IF IsNumeric(Substring(@text, @index, 1)) = 1          SET @total = @total + Cast(Substring(@text, @index, 1) as smallint)       SET @index = @index+1    END    --if the number is not in 1..9 range, it calls further the NumberReduction for the new total    IF len(@total)>1       SET @total = dbo.NumberReduction(@total)    RETURN @total END Now, returning to prime numbers, I tried to generate the prime numbers between 1 and 1000000 and analize how many prime numbers are for each thousand. SELECT PrimeNumber/1000 DataSet , count(*) NumberPrimeNumbers FROM PrimeNUmbers GROUP BY PrimeNumber/1000 ORDER BY DataSet As it seems the maximum number of prime numbers is reached for the first thousand and decreases. DataSet*1000 Number Prime Numbers -------------------- ------------------ 0 (2..1000) 168 1 (1001..2000) 135 2 (2001..3000) 127 3 ........... 120 4 ........... 119 5 ........... 114 6 ........... 117 7 ........... 107 8 ........... 110 9 ........... 112 10 .......... 106 11 .......... 103 12 .......... 109 13 .......... 105 14 .......... 102 15 .......... 108 16 .......... 98 17 .......... 104 18 .......... 94 19 .......... 104 20 .......... 98 ........................... 980 ......... 67 981 ......... 75 982 ......... 70 983 ......... 70 984 ......... 70 985 ......... 76 986 ......... 76 987 ......... 63 988 ......... 71 989 ......... 72 990 ......... 71 991 ......... 79 992 ......... 65 993 ......... 68 994 ......... 78 995 ......... 69 996 ......... 69 997 ......... 83 998 ......... 74 999 ......... 65 That's all for today, is hard to believe somebody will read this :) today, anyway a nice weekend to everybody!

SQL Reloaded: To be or not to be a Prime

   According to Wolfram a prime number is "any positive integer p>1 that has no positive integer divisors other than 1 and p itself. In other words there is no positive integer numbers p>m>1 that p = m*n, n>1. So, to check whether p is prime all we have to do is to check if any number m between 2 and p-1 is not a divisor for p.    

   If m is not a divisor for p then we could write: p/m = n + r, r> 0, this is in mathematics, but in programming even if is SQL, C++ or any programming language, when working with integers, by p/m is understood n while by p mod m is r.

   Respecting the order of operations (p/m)*m = p only if m is a divisor for p. Now, is it really necessary to check all the numbers between 2 and p-1 if they are or not divisors for p? Probably not, when m increases, p/m decreases, and also the range of possible divisors, which is between m+1 and p/m-1 until m becomes greater or equal with p/m. m=p/m only when m is the square from p, so is necessary and sufficient to check only the values between 2 and √p.    

    To summarize, if all the numbers m betwen 2 and √p are not divisors of p then p is prime. This is the test approach used in IsPrime function:

CREATE FUNCTION dbo.IsPrime(
 @number bigint)
RETURNS smallint
/*
   Purpose: checks if a positive integer is prime number
   Parameters: @number bigint - the number to be checked if prime
   Notes: 
   Sample: SELECT dbo.IsPrime(1000)
  SELECT dbo.IsPrime(947)
  SELECT dbo.IsPrime(111913)
  SELECT dbo.IsPrime(113083)
*/
BEGIN
    DECLARE @index bigint
 DECLARE @result bit

 SET @index = 2
    SET @result = 1

 WHILE (@index<=sqrt(@number))
 BEGIN
  IF (@number = (@number/@index)*@index)
   SET @result = 0 

  SET @index = @index+1
 END

 RETURN @result
END

The pListPrimeNumbers will be used to test the function:

CREATE PROCEDURE dbo.pListPrimeNumbers(
  @startInterval bigint
, @endInterval bigint)
AS
/*
   Purpose: lists the prime numbers from a the [@startInterval, @endInterval] interval 
   Parameters: @startInterval bigint - the left side of the interval in which will be searched the prime numbers
      @endInterval bigint - the right side of the interval in which will be searched the prime numbers
   Notes: 
   Sample: EXEC dbo.pListPrimeNumbers 900, 1000
           EXEC dbo.pListPrimeNumbers 111900, 112900
*/
DECLARE @index bigint
SET @index = @startInterval

--for each integer in interval
WHILE (@index<=@endInterval)
BEGIN  
    -- list a number if prime 
 IF (dbo.IsPrime(@index)=1)
  SELECT @index [Prime Numbers]

 SET @index = @index+1 
END
 
   One can use a simple table to store the results, which will allow further analyssis of the data:

-- creating the table 
CREATE TABLE PrimeNumbers (PrimeNumber bigint)

-- inserting the data 
INSERT PrimeNumbers
EXEC dbo.pListPrimeNumbers 1, 10000

Notes:    
The utility of IsPrime function is relative, primarily because there are not many the cases when such a function is needed in SQL, and secondarily because the performance gets poorer with the number tested. Programming languages like VB.Net or C# are more appropriate for such tasks. Initially I wanted to test the performance of implementing IsPrime in SQL vs implementing it as a managed function, but an error stopped me from the attempt:
Msg 6505, Level 16, State 1, Procedure IsPrimeNumber, Line 1 Could not find Type 'UserDefinedFunctions' in assembly 'MyLibrary'.    

3 months ago the same piece of code worked without problems, but since then I had to reinstall the SQL Server because I couln't use DTS packages after I installed LINQ preview. Welcome to the "real" world of Microsoft :(.

Happy Coding!

14 November 2005

SQL Reloaded: Out of Space

   Some time ago I found a post in which somebody was annoyed of writing code like this:

DECLARE @zipcode char(5)

SET @zipcode = '90'

SELECT CASE len(@zipcode)
 WHEN 1 THEN '0000' + @zipcode
 WHEN 2 THEN '000' + @zipcode
 WHEN 3 THEN '00' + @zipcode
 WHEN 4 THEN '0' + @zipcode
 WHEN 5 THEN @zipcode
END

   I remembered I met the same situation, but instead of writing a line of code for each case, I prefered to write a simple function, which used the Space and Replace built-in functions to add first a number of spaces and then replace them with a defined character. Maybe somebody will argue about the performance of such a function, but I prefer in many cases the reusability against performance, when the difference is not that big. Additionally, this enhances code’s readability.    

So, here is the function:

CREATE FUNCTION dbo.FillBefore(
   @target varchar(50)
 , @length int
 , @filler char(1))
/*
   Purpose: enlarges a string to a predefiend length by puting in front of it the same character
   Parameters: @target varchar(50) - the string to be transformed
  , @length int - the length of output string
  , @filler char(1) - the character which will be used as filler 
   Notes: the function works if the length of @length is greater or equal with the one of the @target, otherwise will return a NULL
*/
RETURNS varchar(50)
AS
BEGIN
   RETURN Replace(Space(@length-len(IsNull(@target, ''))), ' ', @filler) 
       + IsNull(@target, '')
END

-- testing the function
SELECT dbo.FillBefore('1234', 5, '0')
SELECT dbo.FillBefore('1234', 10, ' ')
SELECT dbo.FillBefore(1234, 10, ' ')
SELECT dbo.FillBefore(Cast(1234 as varchar(10)), 10, '0')

  Another situation, when Space function was useful, was when I had to save the result of a query to a text file in fixed width format. It resumes to the same logic used in FillBefore function, except the spaces must be added at the end of the string. For this I just needed to modify the order of concatenation and create a new function:

CREATE FUNCTION dbo.FillAfter(
   @target varchar(50)
 , @length int
 , @filler char(1))
/*
   Purpose: enlarges a string to a predefiend length by adding at the end the same character
   Parameters: @target varchar(50) - the string to be transformed
  , @length int - the length of output string
  , @filler char(1) - the character which will be used as filler 
   Notes: the function works if the length of @length is greater or equal with the one of the @target, otherwise will return a NULL

*/
RETURNS varchar(50)
AS
BEGIN
   RETURN IsNull(@target, '') 
        + Replace(Space(@length-len(IsNull(@target,''))), ' ', @filler)
END

  -- testing the function 
  SELECT dbo.FillAfter('1234', 5, '0')
  SELECT dbo.FillAfter('1234', 10, ' ')
  SELECT dbo.FillAfter(1234, 10, '0')
  SELECT dbo.FillAfter(Cast(1234 as varchar(10)), 10, '0')
  SELECT dbo.FillAfter(NULL, 10, '0')

Notes:
1.    In SQL Server 2005, the output of a query can be saved using a DTS Package with fix format directly to a text file using ‘fixed width’ (the columns are defined by fix widths) or ‘ragged right’ (the columns are defined by fix widths, except the last one which is delimited by the new line character) format for the destination file.
2.    To document or not to document. I know that many times we don't have the time to document the database objects we created, but I found that a simple comment saved more time later, even if was about my or other developer's time.
3.    If is really needed to use the CASE function, is good do a analysis of the data and put as first evaluation the case with the highest probability to occur, as second evaluation the case with the second probability to occur, and so on. The reason for this is that each option is evaluated until the comparision operation evaluates to TRUE. For example, if the zip codes are made in most of the cases of 5 characters, then it makes sense to put it in the first WHEN position.

08 November 2005

Metablogging: Introduction

A warm welcome to my blog! 

I started this blog with the idea of documenting some of the issues I found when working with SQL Server, Oracle and other DBMS, respectively to share some of the techniques used in manipulating data via SQL or other means (including .Net, MS Access or Excel). Therefore, it was intended to be a blog on SQL, data, database, data processing and related topics, though with time I arrived to share my ideas on Business Intelligence, Data Warehousing, Data Management, ERP Systems, MS OfficeProject Management, Strategic Management, Performance Management, Programming, Data Science, Software Engineering, Project Management, and other topics directly or indirectly related to the work of a data professional. 

Even if I was planning to post periodically, life took over, the professional as well the personal life made it almost impossible do to that. The unavailability of an easy-to-use editor made blogging more complex than it should be, spending more time in formatting than in writing the code. In the end I found a mix of solutions I can work with.

In time, feeling the lack of meaningful and workable definitions, I felt the need to look also in how people define the various concepts related to the above mentioned topics. I started then to collect definitions and quotes that would help me better understand the respective concepts, definitions and quotes I will try to share with you over time. 

As for myself I have more than 20 years of professional experience in IT in the above mentioned areas.

Happy and errorless coding! 

Best regards,
                        Adrian
Related Posts Plugin for WordPress, Blogger...