SQL Troubles: 📊R Language: Data Summaries without Using a DataFrame

26 February 2024


Coming back to the R language after several years and trying to remember some basic functions proved a bit challenging, even though the syntax is quite simple. Therefore, I put together a few calls as a refresher, based on the Youden-Beale data. To run the code below you'll need R installed; RStudio is convenient but optional.

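If you don't have the ACSWR package installed yet, run the first line below (a one-time step); the second line loads the package in each session:

install.packages("ACSWR") #install the ACSWR package, which includes the Youden-Beale data (yb)
library(ACSWR)	#load the package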

str(yb)		#display the dataset's structure

'data.frame': 8 obs. of 2 variables:
$ Preparation_1: int 31 20 18 17 9 8 10 7
$ Preparation_2: int 18 17 14 11 10 7 5 6
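If ACSWR can't be installed, a minimal sketch is to type the eight observations in by hand as two plain vectors, which also fits the idea of summarizing data without a data frame. The names p1 and p2 (and yb_manual) are hypothetical shorthand for Preparation_1 and Preparation_2; the values are copied from the str(yb) output above. A data frame can still be rebuilt from them with data.frame() if needed.

```r
# Youden-Beale values typed in from str(yb); p1/p2 are hypothetical names
p1 <- as.integer(c(31, 20, 18, 17, 9, 8, 10, 7))   # Preparation_1
p2 <- as.integer(c(18, 17, 14, 11, 10, 7, 5, 6))   # Preparation_2

str(p1)   # int [1:8] 31 20 18 17 9 8 10 7

# Optional: rebuild a data frame with the same layout as yb (yb_manual is hypothetical)
yb_manual <- data.frame(Preparation_1 = p1, Preparation_2 = p2)
str(yb_manual)
```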

yb		#display the dataset

  Preparation_1 Preparation_2
1            31            18
2            20            17
3            18            14
4            17            11
5             9            10
6             8             7
7            10             5
8             7             6

summary(yb) 	#display the summary for the whole dataset

 Preparation_1   Preparation_2
 Min.   : 7.00   Min.   : 5.00
 1st Qu.: 8.75   1st Qu.: 6.75
 Median :13.50   Median :10.50
 Mean   :15.00   Mean   :11.00
 3rd Qu.:18.50   3rd Qu.:14.75
 Max.   :31.00   Max.   :18.00

summary(yb$Preparation_1)	#display the summary for the first column

Min. 1st Qu. Median Mean 3rd Qu. Max.
7.00 8.75 13.50 15.00 18.50 31.00

summary(yb$Preparation_2)	#display the summary for the second column

Min. 1st Qu. Median Mean 3rd Qu. Max.
5.00 6.75 10.50 11.00 14.75 18.00
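For a numeric vector, summary() reports the 0%, 25%, 50%, 75% and 100% quantiles with the mean inserted in the middle. A minimal sketch that rebuilds those six numbers by hand, assuming Preparation_1 is typed in as a plain vector p1 (a hypothetical name):

```r
p1 <- as.integer(c(31, 20, 18, 17, 9, 8, 10, 7))  # Preparation_1, typed in from yb

# summary() = quantiles at 0, .25, .5, .75, 1 with the mean placed after the median
q <- quantile(p1, c(0, .25, .5, .75, 1), names = FALSE)
manual <- c(Min. = q[1], `1st Qu.` = q[2], Median = q[3],
            Mean = mean(p1), `3rd Qu.` = q[4], Max. = q[5])
manual
all.equal(manual, c(summary(p1)))   # c() drops the summaryDefault class, keeps names
```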

min(yb)	#display the minimum value for the whole dataset

[1] 5

min(yb$Preparation_1)	#display the minimum of the first column

[1] 7

min(yb$Preparation_2)	#display the minimum of the second column

[1] 5
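min() also accepts several vectors at once, and a couple of related base R functions cover the neighbouring questions: range() returns the minimum and maximum together, and which.min()/which.max() return the position of the extreme value. A short sketch with the data typed in as hypothetical vectors p1 and p2:

```r
p1 <- c(31, 20, 18, 17, 9, 8, 10, 7)   # Preparation_1 (typed in from yb)
p2 <- c(18, 17, 14, 11, 10, 7, 5, 6)   # Preparation_2 (typed in from yb)

min(p1, p2)    # overall minimum across both vectors, like min(yb): 5
range(p1)      # minimum and maximum in one call: 7 31
which.min(p2)  # observation number holding the minimum of p2: 7
which.max(p1)  # observation number holding the maximum of p1: 1
```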

sum(yb)	#display the sum of all values

[1] 208

sum(yb$Preparation_1)	#display the sum of the first column

[1] 120

sum(yb$Preparation_2)	#display the sum of the second column

[1] 88
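The totals above can be cross-checked without a data frame: sum() accepts several vectors, the mean is just the sum divided by the count, and colSums() gives per-column totals for a plain matrix built with cbind(). A minimal sketch, again with hypothetical vectors p1 and p2:

```r
p1 <- c(31, 20, 18, 17, 9, 8, 10, 7)   # Preparation_1 (typed in from yb)
p2 <- c(18, 17, 14, 11, 10, 7, 5, 6)   # Preparation_2 (typed in from yb)

sum(p1, p2)              # grand total, like sum(yb): 208
sum(p1) / length(p1)     # mean rebuilt from sum and count: 15
colSums(cbind(p1, p2))   # per-column totals of a matrix: p1 = 120, p2 = 88
```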

#display the quartiles (0%, 25%, 50%, 75%, 100%) of the first column
quantile(yb$Preparation_1,seq(0,1,.25))

0% 25% 50% 75% 100%
7.00 8.75 13.50 18.50 31.00

#display the quartiles of the second column
quantile(yb$Preparation_2,seq(0,1,.25))

0% 25% 50% 75% 100%
5.00 6.75 10.50 14.75 18.00
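quantile() uses R's default interpolation (type 7): for n sorted values, the p-th quantile sits at position h = (n - 1) * p + 1 and is interpolated linearly between the two neighbouring sorted values. For Preparation_1, the 25% position is h = 7 * 0.25 + 1 = 2.75, giving 8 + 0.75 * (9 - 8) = 8.75, as in the output above. The helper quantile7() is a hypothetical name for a minimal sketch of that rule, checked against quantile():

```r
# Hypothetical helper: R's default (type 7) quantile by linear interpolation
quantile7 <- function(x, p) {
  x <- sort(x)
  n <- length(x)
  h <- (n - 1) * p + 1    # fractional position in the sorted data
  lo <- floor(h)          # sorted value just below (or at) h
  hi <- ceiling(h)        # sorted value just above (or at) h
  x[lo] + (h - lo) * (x[hi] - x[lo])
}

p1 <- c(31, 20, 18, 17, 9, 8, 10, 7)   # Preparation_1 (typed in from yb)
probs <- seq(0, 1, .25)

quantile7(p1, probs)                   # 7.00 8.75 13.50 18.50 31.00
all.equal(quantile7(p1, probs), quantile(p1, probs, names = FALSE))
```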

#display the deciles of the first column
quantile(yb$Preparation_1,seq(0,1,.1))

0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
7.0 7.7 8.4 9.1 9.8 13.5 17.2 17.9 19.2 23.3 31.0

#display the deciles of the second column
quantile(yb$Preparation_2,seq(0,1,.1))

0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
5.0 5.7 6.4 7.3 9.4 10.5 11.6 13.7 15.8 17.3 18.0
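Instead of calling quantile() once per column, sapply() can apply it to a list of plain vectors and collect the results side by side. A minimal sketch with hypothetical vectors p1 and p2; extra arguments such as probs are passed through to quantile():

```r
p1 <- c(31, 20, 18, 17, 9, 8, 10, 7)   # Preparation_1 (typed in from yb)
p2 <- c(18, 17, 14, 11, 10, 7, 5, 6)   # Preparation_2 (typed in from yb)

# One column per vector, one row per decile (0%, 10%, ..., 100%)
deciles <- sapply(list(Preparation_1 = p1, Preparation_2 = p2),
                  quantile, probs = seq(0, 1, .1))
deciles
deciles["50%", ]   # the medians: 13.5 and 10.5
```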

length(yb) 	#display the number of elements (for a data frame, the number of columns)

[1] 2

ncol(yb) 	#display the number of columns

[1] 2
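The distinction matters once you move between data frames and plain vectors: a data frame is a list of columns, so length() counts columns, while on a vector it counts observations. A short sketch, with yb rebuilt from hypothetical vectors p1 and p2 into a hypothetical df:

```r
p1 <- c(31, 20, 18, 17, 9, 8, 10, 7)   # Preparation_1 (typed in from yb)
p2 <- c(18, 17, 14, 11, 10, 7, 5, 6)   # Preparation_2 (typed in from yb)
df <- data.frame(Preparation_1 = p1, Preparation_2 = p2)  # hypothetical copy of yb

length(p1)   # 8: number of values in a vector
length(df)   # 2: number of columns in a data frame
nrow(df)     # 8: number of rows (observations)
dim(df)      # 8 2: rows and columns together
```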

sort(yb$Preparation_1) #display the values sorted in ascending order

[1] 7 8 9 10 17 18 20 31

sort(yb$Preparation_1, decreasing = TRUE) #display the values sorted in descending order

[1] 31 20 18 17 10 9 8 7
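sort() returns the values only. When the position of each value matters, for example to keep each Preparation_1 value paired with its Preparation_2 value, order() returns the indices that would sort the vector. A minimal sketch with hypothetical vectors p1 and p2:

```r
p1 <- c(31, 20, 18, 17, 9, 8, 10, 7)   # Preparation_1 (typed in from yb)
p2 <- c(18, 17, 14, 11, 10, 7, 5, 6)   # Preparation_2 (typed in from yb)

o <- order(p1)        # indices that sort p1 ascending: 8 6 5 7 4 3 2 1
p1[o]                 # same result as sort(p1)
cbind(p1, p2)[o, ]    # rows reordered by p1, so each pair stays together
rank(p1)              # rank of each value in its original position
```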

#display a vertical boxplot
boxplot(yb, notch=FALSE)
title("A: Vertical Boxplot for Youden-Beale Data")

#display a horizontal boxplot
boxplot(yb, horizontal = TRUE)
title("B: Horizontal Boxplot for Youden-Beale Data")
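The box drawn by boxplot() is not built from the quartiles reported by summary(): boxplot() uses boxplot.stats(), which relies on Tukey's hinges from fivenum(), and with only eight observations the two differ slightly. The whiskers reach the most extreme points lying no more than 1.5 times the box length away from the box; anything beyond is drawn as an outlier. A short sketch with hypothetical vectors p1 and p2:

```r
p1 <- c(31, 20, 18, 17, 9, 8, 10, 7)   # Preparation_1 (typed in from yb)
p2 <- c(18, 17, 14, 11, 10, 7, 5, 6)   # Preparation_2 (typed in from yb)

fivenum(p1)                 # Tukey's five numbers: 7.0 8.5 13.5 19.0 31.0
quantile(p1, c(.25, .75))   # type-7 quartiles used by summary(): 8.75 18.50
boxplot.stats(p1)$stats     # values the box and whiskers are drawn from
boxplot.stats(p2)$out       # points beyond the whiskers; empty for these data
```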

plot(yb) #scatter diagram

title("Scatter diagram")
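A scatter diagram is usually read together with the correlation coefficient. A minimal sketch using cor(), checked against the textbook definition (covariance divided by the product of the standard deviations), plus the rank-based Spearman variant; p1 and p2 are hypothetical vectors typed in from yb:

```r
p1 <- c(31, 20, 18, 17, 9, 8, 10, 7)   # Preparation_1 (typed in from yb)
p2 <- c(18, 17, 14, 11, 10, 7, 5, 6)   # Preparation_2 (typed in from yb)

r <- cor(p1, p2)                              # Pearson correlation, about 0.899
r_manual <- cov(p1, p2) / (sd(p1) * sd(p2))   # same quantity from its definition
all.equal(r, r_manual)
cor(p1, p2, method = "spearman")              # correlation of the ranks
```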

lsfit(yb$Preparation_1, yb$Preparation_2)$coefficients #least squares fit coefficients

Intercept         X
2.8269231 0.5448718
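For a single predictor, the least squares coefficients have a closed form: slope b = Sxy / Sxx and intercept a = mean(y) - b * mean(x), where Sxy and Sxx are the sums of cross-products and of squares of deviations from the means. For these data Sxy = 255 and Sxx = 468, so b = 255/468, about 0.5449. A minimal sketch reproducing lsfit() by hand, with lm() as a cross-check; p1, p2, sxy and sxx are hypothetical names:

```r
p1 <- c(31, 20, 18, 17, 9, 8, 10, 7)   # Preparation_1 (x, typed in from yb)
p2 <- c(18, 17, 14, 11, 10, 7, 5, 6)   # Preparation_2 (y, typed in from yb)

sxy <- sum((p1 - mean(p1)) * (p2 - mean(p2)))   # sum of cross-products: 255
sxx <- sum((p1 - mean(p1))^2)                   # sum of squares of x: 468
b <- sxy / sxx                                  # slope
a <- mean(p2) - b * mean(p1)                    # intercept
c(Intercept = a, X = b)                         # 2.8269231 0.5448718
coef(lm(p2 ~ p1))                               # same fit via lm()
```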

lsfit(yb$Preparation_1, yb$Preparation_2)$residuals #least squares fit residuals

[1] -1.7179487  3.2756410  1.3653846 -1.0897436  2.2692308 -0.1858974
[7] -3.2756410 -0.6410256
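Each residual is the observed Preparation_2 value minus the value predicted by the fitted line. Because the fit includes an intercept, the residuals sum to zero (up to rounding), and their sum of squares is the quantity least squares minimizes. A minimal sketch, with p1 and p2 as hypothetical vectors typed in from yb:

```r
p1 <- c(31, 20, 18, 17, 9, 8, 10, 7)   # Preparation_1 (x, typed in from yb)
p2 <- c(18, 17, 14, 11, 10, 7, 5, 6)   # Preparation_2 (y, typed in from yb)

fit <- lsfit(p1, p2)
cf <- as.vector(fit$coefficients)        # intercept, slope
res <- p2 - (cf[1] + cf[2] * p1)         # observed minus fitted
all.equal(res, as.vector(fit$residuals)) # matches lsfit()'s residuals
sum(res)                                 # essentially 0 (floating-point noise)
sum(res^2)                               # residual sum of squares, about 33.06
```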

Happy coding!
