01 April 2024

R Language: Data Transformations (Part I: Temperatures' comparison between F° and C°)

The time series used for weather analysis use either Fahrenheit (F°) or Celsius (C°) for the temperature values. Looking at the A and B plots below that represent the values of the same dataset in F°, respectively C°, there seems to be no difference between the two plots independently on whether one works with F° or C°, however the scales are different. Once one uses the same scale for both values (see C) the plots are distorted according to the formula used for transformation.

Comments:
(1) Typically, it makes sense to adapt the temperature scale to the audience, though on the Web there will be always a mix of audiences (and that's why weather websites allow to choose one of the values). 
(2) Not starting from 0 might show in the end the same trend at same scale, though the behavior can change occasionally. As long as the Y-axis is correctly labeled, this shouldn't be a problem. Conversely, it's better to control the scale and provide the min-max values for the axis accordingly.
(3) When creating such plots, it's important to be aware of the distortion that might be introduced by transformations. For linear transformations of the type a*x+b, the value of the "a" coefficient tells how much the resulting values are stretched or contracted.

I used as exemplification the airquality dataset which contains data for 1973, the temperature being given in F°. Unfortunately, the dataset contains only the day and the month, so the date must be constructed and added to the dataset. For simplification, I've added the calculated temperature in C° as column as well:

#reviewing the data
help("airquality")

#preparing the data
head(airquality)
airquality$date <- with(airquality, as.Date(ISOdate(1973, Month, Day))) #adding the date
airquality$TempC <- with(airquality, (Temp - 32) * 5/9) #adding the temperature in C°
head(airquality)

And, here's the code used to generate the plots:

#Temperatures' comparison between F° and C°
par(mfrow = c(2,2)) #1x2 matrix display

plot(airquality$date, airquality$Temp, ylab="Temperature (F°)", xlab="date", type="l", col="blue", main="A")

plot(airquality$date, airquality$TempC, ylab="Temperature (C°)", xlab="date", type="l", col="brown", main="B")

plot(airquality$date, airquality$Temp, ylab="Temperature (F°) vs (C°)", xlab="date", ylim=c(0,100), type="l", col="blue", main="C")
lines(airquality$date, airquality$TempC, col="brown")

# using inline formula
plot(airquality$date, (airquality$Temp - 32) * 5/9, ylab="(Temp-32)*5/9", xlab="date", ylim=c(0,100), type="l", col="brown", main="D")

mtext("© sql-troubles@blogspot.com @sql_troubles, 2024", side = 1, line = 4, adj = 1, col = "dodgerblue4", cex = .7)
title("Temperatures' comparison between F° and C°", line = -1, outer = TRUE)

In the fourth plot I directly used the formula for transforming the values from F° and C°. If the values based on the formula need to be used repeatedly, it's probably better to add a column to the dataset.

Unfortunately, the standard library has its limitations when creating visualizations. While writing this post I tried to work also with the plotly library, which offers a richer set of tools and can be used to create wonderful visualizations (though it proves also more complex to use). 

install.packages("plotly")
library("plotly")

Here's the code used to plot the below graphic (the points have labels, much like in Power BI):

fig <- plot_ly(airquality, type = 'scatter', mode = 'lines+markers')%>%
  add_trace(x = ~date, y = ~Temp, name = 'Temp (F)')%>%
  add_trace(x = ~date, y = ~TempC, name = 'Temp (C)')%>%
  layout(showlegend = F, title="Temperatures' comparison between K° and C°")

fig
The temperatures via Plotly

Happy coding!

No comments:

Related Posts Plugin for WordPress, Blogger...

About Me

My photo
Koeln, NRW, Germany
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.