24 July 2010

Some Notes on Dates

   The Date is one of the most difficult data types to handle, given the fact that it might take different formats (e.g. DD/MM/YYYY, MM/DD/YYYY, DD/MM/YY, DD-MON-YYYY, etc.), and even if the various formats make a limited list, the fact that each database, machine or application could work with a different date format or set of date formats, this makes developers’ life a nightmare. The representational complexity resides also in the fact that each country adopts one or more standard date formats, a mixture between day, month, year and the various delimiters used (e.g. “.”, “/”, “-”). For example the SQL Server Books Online lists several country-specific date formats that could be used with Convert function, while in Date and Time Formats tutorial could be found a list with the formats and literals used in Oracle. Even .

     The various frameworks have attempted to introduce modalities of handling dates (e.g. culture info), though none of them is bulletproof given the interaction/integration between the various layers. Sometimes I prefer to store the date as a (standard) text string and transmit it thus between layers, following to transform (convert) it as needed by the various data consumers (UI, components, databases, etc.). What format to choose is depends entirely on habitude, special requests/constraints or on the flexibility of converting a given format to other formats. A flexible approach is the ISO 8601 format that combines date and time representations (<date>T<time>) as YYYY-MM-DDThh:mi:ss.mmm (e.g. 2010-12-22T23:10:29.300). It’s important to note that in SQL Server this format is not affected by the SET DATEFORMAT or SET LANGUAGE settings (see ISO 8601 Format), fact that makes the ISO format a good candidate for representing dates.

   The ISO format allows also to sort lexicographically a date, fact that makes it valuable when combining strings with dates, for example when naming files, though then some of the delimiters need to be removed. I prefer to remove all the delimiters thus storing a file with the timestamp in the format <meaningful_name><timestamp>.<extension> (e.g. ‘Report 20101222T231029.rpt’).

    When the time is not tracked, the date could be stored also as an integer in YYYYMMDD format, thus for example 2010-12-22 could be stored as 20101222. I’ve seen old databases making use of this technique, I found it awkward at the beginning though I discovered its performance benefits when doing comparisons, the difference of days between two dates reducing at a simple subtraction. Actually storing the date as integer is just a simplification of the way on how dates are stored, for example in SQL Server the date is stored as two integers,  in which the first integer stores the date itself and the second the time in clock ticks after midnight (see here). The danger comes when attempting to work with the integer format in order to express the difference between two dates in higher time units than the day (e.g. weeks, months, years), in such cases being recommended to work with built-in date functions. Given the fact that dates on SQL Server could be expressed also as float values (see Joe Celko’s article on Temporal Data Types in SQL Server), the same danger occurs when using the numeric values in calculations in the detriment of date functions.

    In ERP systems or other “activity-based” systems is useful to store a calendar together with other calendar specific information – whether it’s a working date, the period in which the date falls, the fiscal week, month, years, etc. If in OLTP systems such structures are used to store the time dimension, thus each time record being referenced directly, in OLAP systems there could be a mix between between direct referencing of calendar records or of storing directly the date, following in developers’ attributions to retrieve additional information about the date when needed. As it doesn’t make sense to argue in here the benefits of the two approaches, I limit myself to note that the integer value of the date could be used as a foreign key when the time component is not needed to be stored. By storing a date as integer or to a shorter date format  (e.g. YYYYMM, YYYYWW, YYYY) could be impacted applications’ flexibility. Now it depends, there are cases and cases, different problems requiring different solutions. Just think twice when choosing on how to store a date, think not only about the current requirements but also how the requirements could change given a change in the process/business dynamics!

No comments: