
10 April 2024

💎🏭SQL Reloaded: Microsoft Fabric's Delta Tables in Action - Import data from CSV Files into Delta Table

Microsoft provides a set of labs and exercises for learning to work with data in Fabric; however, the real learning comes when one considers an example that introduces something new. Some time ago I downloaded from the Kaggle website an archive with several datasets on sales forecasting, so I tried importing the Features dataset in different ways to see how it goes.

Store,Date,Temperature,Fuel_Price,MarkDown1,MarkDown2,MarkDown3,MarkDown4,MarkDown5,CPI,Unemployment,IsHoliday
1,2010-02-05,42.31,2.572,NA,NA,NA,NA,NA,211.0963582,8.106,FALSE
1,2010-02-12,38.51,2.548,NA,NA,NA,NA,NA,211.2421698,8.106,TRUE
1,2010-02-19,39.93,2.514,NA,NA,NA,NA,NA,211.2891429,8.106,FALSE
1,2010-02-26,46.63,2.561,NA,NA,NA,NA,NA,211.3196429,8.106,FALSE
1,2010-03-05,46.5,2.625,NA,NA,NA,NA,NA,211.3501429,8.106,FALSE

Within an existing lakehouse, one can import the CSV file as-is via 'Files/Upload' and, once the data is imported, navigate to the file and use 'Load to Tables/New Table' to import the data into a managed table. Unfortunately, because some of the numeric fields also contain the literal value "NA" for NULLs, their data type is inferred as varchar(8000), which is not ideal for calculations:

-- table created via Load to Tables
CREATE TABLE [dbo].[walmart_features](
	[Store] [int] NULL,
	[Date] [date] NULL,
	[Temperature] [float] NULL,
	[Fuel_Price] [float] NULL,
	[MarkDown1] [varchar](8000) NULL,
	[MarkDown2] [varchar](8000) NULL,
	[MarkDown3] [varchar](8000) NULL,
	[MarkDown4] [varchar](8000) NULL,
	[MarkDown5] [varchar](8000) NULL,
	[CPI] [varchar](8000) NULL,
	[Unemployment] [varchar](8000) NULL,
	[IsHoliday] [bit] NULL
) ON [PRIMARY]

This can be fixed by replacing the NA values with empty values, which I did, using the corrected file for the next steps.
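Alternatively, when the file was already loaded as-is, the 'NA' literals can be handled at query time. Below is a minimal T-SQL sketch against the table created above, assuming TRY_CAST is available in the SQL analytics endpoint (only a few of the columns shown):

-- casting the varchar columns to numbers ('NA' becomes NULL)
SELECT Store
, [Date]
, TRY_CAST(NULLIF(MarkDown1, 'NA') as decimal(13,2)) MarkDown1
, TRY_CAST(NULLIF(CPI, 'NA') as decimal(18,6)) CPI
, TRY_CAST(NULLIF(Unemployment, 'NA') as decimal(13,2)) Unemployment
FROM dbo.walmart_features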

I then tried using Spark to import the data, though this time all the fields were defined as varchar(8000):

-- table created via Spark
CREATE TABLE [dbo].[walmart_features2](
	[Store] [varchar](8000) NULL,
	[Date] [varchar](8000) NULL,
	[Temperature] [varchar](8000) NULL,
	[Fuel_Price] [varchar](8000) NULL,
	[MarkDown1] [varchar](8000) NULL,
	[MarkDown2] [varchar](8000) NULL,
	[MarkDown3] [varchar](8000) NULL,
	[MarkDown4] [varchar](8000) NULL,
	[MarkDown5] [varchar](8000) NULL,
	[CPI] [varchar](8000) NULL,
	[Unemployment] [varchar](8000) NULL,
	[IsHoliday] [varchar](8000) NULL
) ON [PRIMARY]
GO

So, the schema needs to be defined explicitly. However, I had to import IsHoliday as a string and cast the value explicitly to a Boolean using a second data frame (see the comments below for alternatives):

from pyspark.sql.types import *
from pyspark.sql.functions import *

#define schema
featuresSchema = StructType([
      StructField("Store", IntegerType())
    , StructField("Date", DateType())
    , StructField("Temperature",  DecimalType(13,2))
    , StructField("Fuel_Price", DecimalType(13,2))
    , StructField("MarkDown1", DecimalType(13,2))
    , StructField("MarkDown2", DecimalType(13,2))
    , StructField("MarkDown3", DecimalType(13,2))
    , StructField("MarkDown4", DecimalType(13,2))
    , StructField("MarkDown5", DecimalType(13,2))
    , StructField("CPI", DecimalType(18,6))
    , StructField("Unemployment", DecimalType(13,2))
    , StructField("IsHoliday", StringType())
])

# Load a file into a dataframe
df = spark.read.load('Files/OpenSource/features2.csv'
    , format='csv'
    , schema = featuresSchema
    , header=True)

# do the conversion for isHoliday
df2 = df.withColumn("IsHoliday", df.IsHoliday.cast(BooleanType()))

# Save the dataframe as a delta table
df2.write.format("delta").saveAsTable("walmart_features3")

Now the table's definition looks much better:

-- table created via Spark with explicit schema
CREATE TABLE [dbo].[walmart_features3](
	[Store] [int] NULL,
	[Date] [date] NULL,
	[Temperature] [decimal](13, 2) NULL,
	[Fuel_Price] [decimal](13, 2) NULL,
	[MarkDown1] [decimal](13, 2) NULL,
	[MarkDown2] [decimal](13, 2) NULL,
	[MarkDown3] [decimal](13, 2) NULL,
	[MarkDown4] [decimal](13, 2) NULL,
	[MarkDown5] [decimal](13, 2) NULL,
	[CPI] [decimal](18, 6) NULL,
	[Unemployment] [decimal](13, 2) NULL,
	[IsHoliday] [bit] NULL
) ON [PRIMARY]
GO

Comments:
(1) I tried to apply the schema change directly on the initial data frame, though the schema didn't change. The reason is that withColumn returns a new data frame rather than modifying the existing one, so the result needs to be assigned (as done above via df2):

df.withColumn("IsHoliday", df.IsHoliday.cast(BooleanType())) # result is discarded, df stays unchanged

(2) For the third method one could have left the NA values in, because the conversion turns them into NULLs anyway. Conversely, it might be needed to check whether there are other values that fail the conversion.
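A minimal T-SQL sketch for such a check against the varchar-based table from the first attempt (again assuming TRY_CAST is available in the SQL analytics endpoint):

-- values other than 'NA' that would fail the numeric conversion
SELECT DISTINCT MarkDown1
FROM dbo.walmart_features
WHERE TRY_CAST(MarkDown1 as decimal(13,2)) IS NULL
  AND MarkDown1 IS NOT NULL
  AND MarkDown1 <> 'NA'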

(3) The following warning in the Notebook when running the above code is a hint that something went wrong during the conversion (e.g. decimals were truncated):

"Your file(s) might include corrupted records"

(4) Especially for the transformed values it makes sense to look at the data (at least when the dataset isn't too big):

-- validating the values for the Boolean data field
SELECT IsHoliday
, count(*) NoRecords
FROM dbo.walmart_features3
GROUP BY IsHoliday
ORDER BY 1

(5) The tables can be deleted directly in the lakehouse or via PySpark (observe the catalog.table_name):

#dropping the table
spark.sql('DROP TABLE IF EXISTS UAT.walmart_features3') 

In the beginning it probably makes sense to remove the "IF EXISTS" clause, so that an error is raised when the table isn't available.

(6) For those who run into a similar issue: the SQL endpoint for the lakehouse is read-only, therefore attempting to delete a table via SSMS will result in an error like this:

Drop failed for Table 'dbo.walmart_features2'.  (Microsoft.SqlServer.Smo)
The external policy action 'Microsoft.Sql/Sqlservers/Databases/Schemas/Tables/Drop' was denied on the requested resource.
Cannot drop the table 'walmart_features2', because it does not exist or you do not have permission.

Happy coding!

30 December 2020

🧊Data Warehousing: ETL (Part V: The Transform Subprocess)

Data Warehousing

As part of the ETL process, the Transform subprocess is responsible for bridging the gap between source and destination by leveraging SQL or the rich set of (data) transformations available in ETL tools, either to enable the implicit or explicit conversion between source and destination data types, or to transform the data as needed. 

Transformations act on data as operators, the challenge being to transform the data in the smallest number of steps in the most efficient way. Some of the transformations available in the ETL tools (e.g. conversions, sorting, sampling, joins, lookups, aggregation, pivoting, unpivoting) can be replaced by SQL-based logic. One can easily prepare the data directly in the extraction query, thus taking advantage of the power provided by the database engines. Moreover, the logic can be encapsulated in views or other objects and called as required by the extraction logic, when the source database allows it. This approach allows maintaining the logic independently of the ETL packages.
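For instance, a hypothetical extraction view based on the AdventureWorks schema, encapsulating conversions, NULL handling and simple cleansing directly in the source query:

-- hypothetical extraction view encapsulating simple transformations
CREATE VIEW dbo.vETL_SalesOrders
AS
SELECT Cast(SOH.SalesOrderID as varchar(10)) SalesOrderID -- explicit type conversion
, Cast(SOH.OrderDate as date) OrderDate -- datetime reduced to date
, IsNull(SOH.PurchaseOrderNumber, 'n/a') PurchaseOrderNumber -- NULL handling
, UPPER(LTRIM(RTRIM(CST.AccountNumber))) AccountNumber -- simple cleansing
FROM Sales.SalesOrderHeader SOH
     JOIN Sales.Customer CST
       ON SOH.CustomerID = CST.CustomerID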

Unfortunately, SQL can replace only the transformations that address sequential logic, not workflow-related logic (e.g. conditional splits, merges, multicasts, slowly changing dimensions) or logic that involves a certain computational complexity (e.g. fuzzy groupings or lookups). Such gaps need to be filled by the ETL tools via the built-in transformations, by allowing developers to build custom logic, or by simply using COTS solutions when they prove capable of filling the gap.

Copying the data 1:1 at table or entity level from the source system(s) involves in theory the simplest transformations, revolving mainly around conversions between data types. The casual troublemakers are the numeric and date values, which can be found in different formats or precisions in the various environments. As this can apply to the ETL environment itself, it’s important to consider environment-agnostic data types when possible (e.g. strings).
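To illustrate, a minimal T-SQL sketch (TRY_CONVERT and TRY_CAST require SQL Server 2012 or higher) parsing date and numeric values arriving in different regional formats:

-- hypothetical conversions for values in different regional formats
SELECT TRY_CONVERT(date, '25/03/2009', 103) BritishDate -- dd/mm/yyyy
, TRY_CONVERT(date, '2009-03-25', 126) ISODate -- yyyy-mm-dd
, TRY_CAST(Replace(Replace('1.234,56', '.', ''), ',', '.') as decimal(13,2)) GermanNumber -- comma as decimal separator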

Other sources of concern are the user-defined data types which don’t have equivalents between the systems, thus needing additional transformations for further handling, respectively the invalid values which need to be handled accordingly. Besides the data from the source system(s) and the derived values, upon case one needs to consider also the parameter-based or hardcoded metadata created in the process.

Independently of the purpose of the ETL packages, it is usually required to document the data flow associated with them and the rules applied in the transformations in what is known as a mapping document. Such a document needs to be understandable by the business, as it can serve for Data Management, projects, or other purposes. Even if it’s almost impossible to document everything, at a minimum the source and destination tables, the attributes considered in the mappings, respectively the most important rules the business should be aware of, need to be provided. Otherwise, the technical people can always turn back to the SQL queries, when needed.

Some sources consider each non-trivial transformation a business rule. Even if the rules used in transformations constrain the (business) data, not every rule is relevant for the business to the degree that it constrains some part of the business.

Data Migrations involve transformations between (database) schemas. Therefore, the logic required to move the data could in theory be handled with a few well-designed packages, though there are considerations like logic complexity, transparency, flexibility, performance or auditability which can be better addressed by other techniques (e.g. saving the data in intermediary tables, breaking down the logic into several steps, as sketched below). Such considerations can apply also to simple ETL packages. Therefore, it’s important to recognize such scenarios, weigh the choices and choose what fits best. However, unless one knows what one’s doing, it’s recommended to use the methods one knows best.
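As a sketch of the intermediary-table technique, a hypothetical staging step (table and column names are illustrative) that persists an intermediate result so it can be inspected between steps:

-- hypothetical intermediary (staging) table for transparency and auditability
SELECT CST.CustomerID
, UPPER(LTRIM(RTRIM(CST.AccountNumber))) AccountNumber -- cleansed value
INTO dbo.stg_Customer -- can be audited and reused by the next step
FROM Sales.Customer CST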


24 February 2018

💎SQL Reloaded: Misusing Views and Pseudo-Constants

Views, as virtual tables, can be misused to replace tables in certain circumstances, by storing values within one or multiple rows, like in the below examples:

-- parameters for a BI solution
CREATE VIEW dbo.vLoV_Parameters
AS
SELECT Cast('ABC' as nvarchar(20)) AS DataAreaId
 , Cast(GetDate() as Date) AS CurrentDate 
 , Cast(100 as int) AS BatchCount 

GO

SELECT *
FROM dbo.vLoV_Parameters

GO

-- values for a dropdown 
 CREATE VIEW dbo.vLoV_DataAreas
 AS
 SELECT Cast('ABC' as nvarchar(20)) AS DataAreaId
 , Cast('Company ABC' as nvarchar(50)) AS Description 
 UNION ALL
 SELECT 'XYZ' DataAreaId 
 , 'Company XYZ'

GO

SELECT *
FROM dbo.vLoV_DataAreas

GO

These solutions aren’t elegant, and typically not recommended, because they go against one of the principles of good database design, namely “data belong in tables”, though they do the trick when needed. Personally, I used them only in a handful of cases, e.g. when it wasn’t allowed to create tables, when something had to be tested for a short period of time, or when creating a table for 2-3 values would have meant unnecessary overhead. Because of their scarce use, I hadn’t given them much thought, not until I discovered Jared Ko’s blog posting on pseudo-constants. He considers the values from the first view as pseudo-constants, and advocates for their use especially for easier dependency tracking, easier code refactoring, avoiding implicit data conversion and easier maintenance of values.

All these are good reasons to consider them, therefore I tried to take the idea further and see if it survives a reality check. For this I took Dynamics AX as testing environment, as it makes extensive use of enumerations (aka base enums) to store lists of values needed all over the application. Behind each table there are one or more enumerations, the tables storing master data abounding in them. For exemplification let’s consider InventTrans, the table that stores the inventory transactions, whose receipt and issue logic is governed by three enumerations: StatusIssue, StatusReceipt and Direction.

-- Status Issue Enumeration 
 CREATE VIEW dbo.vLoV_StatusIssue
 AS
 SELECT cast(0 as int) AS None
 , cast(1 as int) AS Sold
 , cast(2 as int) AS Deducted
 , cast(3 as int) AS Picked
 , cast(4 as int) AS ReservPhysical
 , cast(5 as int) AS ReservOrdered
 , cast(6 as int) AS OnOrder
 , cast(7 as int) AS QuotationIssue

GO

-- Status Receipt Enumeration 
 CREATE VIEW dbo.vLoV_StatusReceipt
 AS
SELECT cast(0 as int) AS None
 , cast(1 as int) AS Purchased
 , cast(2 as int) AS Received
 , cast(3 as int) AS Registered
 , cast(4 as int) AS Arrived
 , cast(5 as int) AS Ordered
 , cast(6 as int) AS QuotationReceipt

GO

-- Inventory Direction Enumeration 
 CREATE VIEW dbo.vLoV_InventDirection
 AS
 SELECT cast(0 as int) AS None
 , cast(1 as int) AS Receipt
 , cast(2 as int) AS Issue

GO

To see these views at work, let’s construct the InventTrans table on the fly:

-- creating an ad-hoc table  
 SELECT *
 INTO  dbo.InventTrans
 FROM (VALUES (1, 1, 0, 2, -1, 'A0001')
 , (2, 1, 0, 2, -10, 'A0002')
 , (3, 2, 0, 2, -6, 'A0001')
 , (4, 2, 0, 2, -3, 'A0002')
 , (5, 3, 0, 2, -2, 'A0001')
 , (6, 1, 0, 1, 1, 'A0001')
 , (7, 0, 1, 1, 50, 'A0001')
 , (8, 0, 2, 1, 100, 'A0002')
 , (9, 0, 3, 1, 30, 'A0003')
 , (10, 0, 3, 1, 20, 'A0004')
 , (11, 0, 1, 2, 10, 'A0001')
 ) A(TransId, StatusIssue, StatusReceipt, Direction, Qty, ItemId)


 Here are two sets of examples using literals vs. pseudo-constants:

--example issued with literals 
SELECT top 100 ITR.*
 FROM dbo.InventTrans ITR
 WHERE ITR.StatusIssue = 1 
   AND ITR.Direction = 2

GO
 --example issued with pseudo-constants
 SELECT top 100 ITR.*
 FROM dbo.InventTrans ITR
      JOIN dbo.vLoV_StatusIssue SI
        ON ITR.StatusIssue = SI.Sold
      JOIN dbo.vLoV_InventDirection ID
        ON ITR.Direction = ID.Issue

GO

--example receipt with literals 
 SELECT top 100 ITR.*
 FROM dbo.InventTrans ITR
 WHERE ITR.StatusReceipt= 1
   AND ITR.Direction = 1

GO

--example receipt with pseudo-constants
 SELECT top 100 ITR.*
 FROM dbo.InventTrans ITR
      JOIN dbo.vLoV_StatusReceipt SR
        ON ITR.StatusReceipt= SR.Purchased
      JOIN dbo.vLoV_InventDirection ID
        ON ITR.Direction = ID.Receipt

 
As can be seen, the queries using pseudo-constants make the code somewhat more readable, though the gain is only relative, each enumeration implying an additional join. In addition, when further business tables are added to the logic (e.g. items, purchase or sales orders), it complicates the logic, making it more difficult to separate the essential from the nonessential. Imagine a translation of the following query:

-- complex query 
  SELECT top 100 ITR.*
  FROM dbo.InventTrans ITR
              <several tables here>
  WHERE ((ITR.StatusReceipt<=3 AND ITR.Direction = 1)
    OR (ITR.StatusIssue<=3 AND ITR.Direction = 2))
    AND (<more constraints here>)


The more difficult the constraints in the WHERE clause, the more improbable a translation of the literals into pseudo-constants becomes. Considering that an average query contains 5-10 tables, each of them with 1-3 enumerations, the queries would become impracticable using pseudo-constants, and their execution plans quite difficult to troubleshoot.

The more I think about it, an enumeration data type available as a global variable in SQL Server (like the ones available in VB) would be more than welcome, especially because the values are used over and over again throughout the queries. Imagine, for example, the possibility of writing code as follows:

-- hypothetical query
SELECT top 100 ITR.*
FROM dbo.InventTrans ITR
WHERE ITR.StatusReceipt = @@StatusReceipt.Purchased
  AND ITR.Direction = @@InventDirection.Receipt

From my point of view this would make the code more readable and easier to maintain. Instead, in order to make the code more readable, one’s usually forced to add some comments in the code. This works as well, though the code can become full of comments.

-- query with commented literals
SELECT top 100 ITR.*
FROM dbo.InventTrans ITR
WHERE ITR.StatusReceipt <= 3 -- Purchased, Received, Registered 
   AND ITR.Direction = 1 -- Receipt

In conclusion, pseudo-constants’ usefulness is limited, and their usage goes against developers’ common sense; however, a data type in SQL Server with similar functionality would make code more readable and easier to maintain.


PS: It is possible to simulate an enumeration data type in tables’ definition by using a CHECK constraint.
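For example, a minimal sketch (the table name is illustrative) that restricts Direction to the values of the enumeration defined above:

-- hypothetical table simulating an enumeration via a CHECK constraint
CREATE TABLE dbo.InventTransTest(
  TransId int NOT NULL
, Direction int NOT NULL CHECK (Direction IN (0, 1, 2)) -- None, Receipt, Issue
)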

15 March 2017

⛏️Data Management: Data Conversion (Definitions)

"The function to translate data from one format to another" (Yang Xiang & Daxin Tian, "Multi-Core Supported Deep Packet Inspection", 2010)

"1.In systems, the migration from the use of one application to another. 2.In data management, the process of preparing, reengineering, cleansing and transforming data and loading it into a new target data structure. Typically, the term is used to describe a one-time event as part of a new database implementation. However, it is sometimes used to describe an ongoing operational procedure." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"(1)The process of changing data structure, format, or contents to comply with some rule or measurement requirement. (2)The process of changing data contents stored in one system so that it can be stored in another system, or used by an application." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"The process of automatically reading data in one file format and emitting the same data in a different format, thus making the data accessible to a wider range of applications." (Open Data Handbook)

"To change data from one form of representation to another; for example, to convert data from an ASCII representation to an EBCDIC representation." (IEEE 610.5-1990)

15 February 2010

💎SQL Reloaded: Oracle vs. SQL Server (Handling Missing Dates)

What do we do with missing values? How do they influence the data analysis? Two questions each developer should ask users when creating or modifying a report. In general, for numbers things are pretty simple: just replace the NULL values with a 0, which impacts the report minimally. With dates it’s a little more complicated, because it might be needed to remove the records having a certain date NULL, to replace the date with another one representing a certain event, or maybe with the current date.

As the NULLIF, COALESCE and CASE functions are available in both systems, they can be used to handle missing dates or other data types, especially when the code is intended to be portable between the two platforms. In addition, SQL Server provides the IsNull function, its equivalent in Oracle being NVL, extended by NVL2. Oracle also provides an equivalent for the simple CASE function, namely DECODE. Here are some examples based on the Product table from the AdventureWorks database:
  
-- SQL Server NULL handling 
SELECT ProductID, SellStartDate, SellEndDate, DiscontinuedDate, GetDate() CurrentDate 
, IsNull(SellEndDate, GetDate()) Example1 
, COALESCE(SellEndDate, GetDate()) Example3 
, COALESCE(DiscontinuedDate, SellEndDate, GetDate()) Example4 
, CASE 
     WHEN SellEndDate IS NULL THEN GetDate() 
     ELSE SellEndDate 
END Example6 
, NullIf(SellEndDate, SellStartDate) Example7 
FROM Production.Product 

-- Oracle NULL handling 
SELECT PRODUCTID, SELLSTARTDATE, SELLENDDATE, DISCONTINUEDDATE, SYSDATE 
, NVL(SELLENDDATE, SYSDATE) EXAMPLE1 
, NVL2(DISCONTINUEDDATE, SELLENDDATE, SYSDATE) EXAMPLE2 
, COALESCE(SELLENDDATE, SYSDATE) EXAMPLE3 
, COALESCE(DISCONTINUEDDATE, SELLENDDATE, SYSDATE) EXAMPLE4 
, DECODE(SELLENDDATE, NULL, SYSDATE, SELLENDDATE) EXAMPLE5 
, CASE 
    WHEN SELLENDDATE IS NULL THEN SYSDATE 
    ELSE SELLENDDATE 
END EXAMPLE6 
, NULLIF(SELLENDDATE, SELLSTARTDATE) EXAMPLE7 
FROM PRODUCT 

The simple CASE function can’t be used to check for NULL values unless the NULL value is implied within the ELSE branch, because the IS NULL clause needs to be used for checking whether a value is missing. An exception from this rule seems to be the DECODE function, as can be seen in the 5th example from the Oracle-based query.
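To make the behavior visible, a minimal sketch contrasting the simple and the searched CASE over the same attribute (the simple CASE branch never matches, because NULL = NULL evaluates to UNKNOWN):

-- simple vs. searched CASE over NULL values
SELECT CASE SellEndDate WHEN NULL THEN 'missing' ELSE 'present' END SimpleCase -- always 'present'
, CASE WHEN SellEndDate IS NULL THEN 'missing' ELSE 'present' END SearchedCase -- behaves as expected
FROM Production.Product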

Aggregate functions ignore NULL values in both systems, therefore any of the above techniques can be used to handle the missing values when another behavior is expected. The selection of the date used when the target date is missing depends on the requirements: usually the current date is used when the report focuses on today’s status, while for the calculation of lead/cycle times it’s preferred to use the closest date to the target date.

-- SQL Server: handling nulls in aggregations 
SELECT AVG(DateDiff(d, SELLSTARTDATE, IsNull(SELLENDDATE, GETDATE()))) -- handling null values 
, AVG(DateDiff(d,SELLSTARTDATE, SELLENDDATE)) -- ignoring null values 
FROM Production.Product 

-- Oracle: handling nulls in aggregations 
SELECT AVG(TRUNC(NVL(SELLENDDATE, SYSDATE)) - TRUNC(SELLSTARTDATE)) -- handling null values 
, AVG(TRUNC(SELLENDDATE) - TRUNC(SELLSTARTDATE)) -- ignoring null values 
FROM PRODUCT 

Another scenario in which missing date values need to be handled is joins; often in such cases a missing value denotes that the record is still active, like in the below example:

-- SQL Server: getting the Start Date for current List Price 
SELECT ITM.ProductID 
, ITM.ProductNumber 
, ITM.ListPrice 
, PPH.StartDate 
FROM Production.Product ITM 
JOIN Production.ProductListPriceHistory PPH 
ON ITM.ProductID = PPH.ProductID 
AND PPH.EndDate IS NULL 
   
The above query can also be rewritten in the following form, useful for getting the record in use at the time of a certain event:

-- SQL Server: getting the List Price when the product was last time modified 
SELECT ITM.ProductID 
, ITM.ProductNumber 
, PPH.ListPrice 
, PPH.StartDate 
, PPH.EndDate 
, ITM.ModifiedDate 
FROM Production.Product ITM 
JOIN Production.ProductListPriceHistory PPH 
ON ITM.ProductID = PPH.ProductID 
AND DateDiff(d, ITM.ModifiedDate, IsNull(PPH.EndDate, GETDATE()))>=0 
AND DateDiff(d, PPH.StartDate, ITM.ModifiedDate)>=0 

In Oracle special attention must be given to the LEAST and GREATEST functions, which return the smallest/greatest value from a list of attributes: if one of the values is NULL, then the function’s value is NULL too. Therefore, if one of the attributes can take NULL values, it should be wrapped in NVL with a NOT NULL value already used in the function (e.g. SYSDATE), like in the 3rd and 4th examples of the query below, this change having no impact on the functions’ output.

-- Oracle Greatest/Least example 
SELECT PRODUCTID, SELLSTARTDATE, SELLENDDATE, SYSDATE 
, GREATEST(SELLSTARTDATE, SELLENDDATE, SYSDATE) EXAMPLE1 
, LEAST(SELLSTARTDATE, SELLENDDATE, SYSDATE) EXAMPLE2 
, GREATEST(SELLSTARTDATE, NVL(SELLENDDATE, SYSDATE), SYSDATE) EXAMPLE3 
, LEAST(SELLSTARTDATE, NVL(SELLENDDATE, SYSDATE), SYSDATE) EXAMPLE4 
FROM PRODUCT

💎SQL Reloaded: Oracle vs. SQL Server (Date Conversion)

During data conversions, data migrations, or even simple data processing, it is often required to format dates to a given format, to extract a given time unit, or to convert a string to a date data type. Even if Oracle and SQL Server provide several functions for this purpose, there are small techniques that can help make things easier.

In SQL Server the DatePart and DateName functions can be used to extract the various types of time units, the first function always returning an integer, while the second returns a character string, thus allowing one to get the name of the current month or of the current day of the week; otherwise the output is quite similar.
 
-- SQL Server DatePart 
SELECT GETDATE() CurrentDate 
, DatePart(ss, GETDATE()) SecondPart 
, DatePart(mi, GETDATE()) MinutePart 
, DatePart(hh, GETDATE()) HourPart 
, DatePart(d, GETDATE()) DayPart 
, DatePart(wk, GETDATE()) WeekPart 
, DatePart(mm, GETDATE()) MonthPart 
, DatePart(q, GETDATE()) QuarterPart 
, DatePart(yyyy, GETDATE()) YearPart  
-- SQL Server DateName 
SELECT GETDATE() CurrentDate 
, DateName(ss, GETDATE()) SecondPart 
, DateName(mi, GETDATE()) MinutePart 
, DateName(hh, GETDATE()) HourPart 
, DateName(d, GETDATE()) DayPart 
, DateName(wk, GETDATE()) WeekPart 
, DateName(mm, GETDATE()) MonthPart 
, DateName(q, GETDATE()) QuarterPart 
, DateName(yyyy, GETDATE()) YearPart 
, DateName(mm, GETDATE()) MonthName 
, DateName(dw, GETDATE()) DayName 

    SQL Server provides three quite useful functions for getting the Day, Month or Year of a given date: 
 
-- SQL Server alternative functions 
SELECT DAY(GetDate()) DayPart 
, MONTH(GetDate()) MonthPart 
, YEAR(GetDate()) YearPart 

Oracle provides a more flexible alternative to the DateName function, namely the TO_CHAR function, which allows not only the extraction of the different time units, but also the conversion of a date to a specified format.
  
-- Oracle Date parts 
SELECT to_char(SYSDATE, 'dd-mon-yyyy hh24:mi:ss') CurrentDate 
, to_char(SYSDATE, 'SS') SecondPart 
, to_char(SYSDATE, 'MI') MinutePart 
, to_char(SYSDATE, 'HH') HourPart 
, to_char(SYSDATE, 'DD') DayPart 
, to_char(SYSDATE, 'IW') WeekPart 
, to_char(SYSDATE, 'MM') MonthPart 
, to_char(SYSDATE, 'Q') QuarterPart 
, to_char(SYSDATE, 'YYYY') YearPart 
, to_char(SYSDATE, 'MONTH') MonthName 
, to_char(SYSDATE, 'DAY') DayName 
FROM DUAL 
-- Oracle Date formatting 
SELECT to_char(SYSDATE, 'yyyy-mm-dd hh24:mi:ss') CurrentDate 
, to_char(SYSDATE, 'Mon dd yyyy hh24:mi') DefaultDateFormat 
, to_char(SYSDATE, 'mm/dd/yyyy') USDateFormat 
, to_char(SYSDATE, 'yyyy.mm.dd') ANSIDateFormat 
, to_char(SYSDATE, 'dd/mm/yyyy') BritishDateFormat 
, to_char(SYSDATE, 'dd-mm-yyyy') ItalianDateFormat 
, to_char(SYSDATE, 'yyyy/mm/dd') JapanDateFormat 
, to_char(SYSDATE, 'yyyymmdd') ISODateFormat 
, to_char(SYSDATE, 'dd Mon yyyy hh24:mi:ss') EuropeDateFormat 
, to_char(SYSDATE, 'yyyy-mm-dd hh24:mi:ss') ODBCDateFormat 
, Replace(to_char(SYSDATE, 'yyyy-mm-dd hh24:mi:ss'), ' ', 'T') ISO8601DateFormat 
FROM DUAL 


Even if more combinations are possible, the above examples can be used as a starting point, having been chosen to match the similar functionality provided by SQL Server via the CONVERT function and its styles.
-- SQL Server date formatting 
SELECT GETDATE() CurrentDate 
, CONVERT(varchar(20), GETDATE(), 100) DefaultDateFormat 
, CONVERT(varchar(20), GETDATE(), 101) USDateFormat 
, CONVERT(varchar(20), GETDATE(), 102) ANSIDateFormat 
, CONVERT(varchar(20), GETDATE(), 103) BritishDateFormat 
, CONVERT(varchar(20), GETDATE(), 105) ItalianDateFormat 
, CONVERT(varchar(20), GETDATE(), 111) JapanDateFormat 
, CONVERT(varchar(20), GETDATE(), 112) ISODateFormat 
, CONVERT(varchar(20), GETDATE(), 113) EuropeDateFormat 
, CONVERT(varchar(20), GETDATE(), 120) ODBCDateFormat 
, CONVERT(varchar(20), GETDATE(), 126) ISO8601DateFormat 
 
The use of the CONVERT function with styles is not the best approach, though it saves the day. When the same formatting is used in multiple objects, it makes sense to encapsulate the date conversions in a function, making their use easier and their maintenance simpler in case the formatting changes.
 
CREATE FUNCTION dbo.GetDateAsString( @date datetime) 
RETURNS varchar(10) 
AS 
BEGIN 
    RETURN CONVERT(varchar(10), @date, 103) 
END 
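For illustration, a minimal call of the function defined above:

SELECT dbo.GetDateAsString(GetDate()) -- returns the current date formatted with style 103 (dd/mm/yyyy)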
 
The inverse problem is the conversion of a string to a date, Oracle providing for this purpose the TO_DATE, CAST, TO_TIMESTAMP and TO_TIMESTAMP_TZ functions, the first two being the most used.
--Oracle String to Date Conversion 
SELECT TO_DATE('25-03-2009', 'DD-MM-YYYY') 
, TO_DATE('25-03-2009 18:30:23', 'DD-MM-YYYY HH24:MI:SS') 
, Cast('25-MAR-2009' as Date) 
FROM DUAL    

Besides the CONVERT function mentioned above, SQL Server provides a CAST function too, both allowing the conversion of strings to dates.
 
SELECT CAST('2009-03-25' as date) 
, CONVERT(date, '2009-03-25') 


When saving dates into text attributes in SQL Server, one should target the ISO format, which is independent of the format set by DATEFORMAT. Otherwise, in case the format of the stored date is known, the string can be translated to the ISO format like in the below function:
 
--SQL Server: DD/MM/YYYY String to Date function 
CREATE FUNCTION dbo.GetStringDate(
  @date varchar(10))
RETURNS datetime 
AS 
BEGIN 
     RETURN Cast(Right(@date, 4) + '/' + Substring(@date, 4, 2) + '/' + Left(@date, 2) as datetime) 
END 

GO

SELECT dbo.GetStringDate('25/09/2009') 


Another approach I found quite useful in several cases is based on the VBScript DateSerial function, which allows the creation of a date from its constituents:

-- SQL Server: DateSerial 
CREATE FUNCTION dbo.DateSerial(
  @year int
, @month smallint
, @day smallint)
RETURNS date 
AS 
BEGIN 
    RETURN (Cast(@year as varchar(4)) + '-' + Cast(@month as varchar(2)) + '-' + Cast(@day as varchar(2))) 
END 

GO

SELECT dbo.DateSerial(2009, 10, 24) 

GO

-- SQL Server: DateTimeSerial 
CREATE FUNCTION dbo.DateTimeSerial(
  @year int
, @month smallint
, @day smallint
, @hour smallint
, @minute smallint
, @second smallint)
RETURNS datetime 
AS 
BEGIN 
    RETURN (Cast(@year as varchar(4)) + '-' + Cast(@month as varchar(2)) + '-' + Cast(@day as varchar(2)) 
       + ' ' + Cast(@hour as varchar(2)) + ':' + Cast(@minute as varchar(2)) + ':' + Cast(@second as varchar(2))) 
END 

Given the TO_DATE function’s flexibility, none of the three functions above (GetStringDate, DateSerial and DateTimeSerial) is really needed in Oracle.
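For the record, a one-line sketch of how a date could be built from its constituents in Oracle (the numbers are implicitly converted to strings by the concatenation):

-- hypothetical Oracle equivalent of DateSerial
SELECT TO_DATE(2009 || '-' || 10 || '-' || 24, 'YYYY-MM-DD') DateSerial
FROM DUAL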
