26 February 2024

R Language: Data Summaries without Using a DataFrame

Coming back to the R language after several years and trying to remember some basic functions proved to be a bit challenging, even if the syntax is quite simple. Therefore, I considered putting together a few calls as refresher based on Youden-Beale data. To run the below code you'll need to install the R language and RStudio.

In case you don't have the package installed, run the next two lines:

install.packages("ACSWR") #install the Youden-Beale Experiment package
library(ACSWR)	#load the library
 
str(yb)		#display datasets' structure

  'data.frame': 8 obs. of 2 variables:
$ Preparation_1: int  31  20  18  17  9  8 10  7
$ Preparation_2: int  18  17  14  11 10 7   5  6

yb		#display the dataset

Preparation_1 Preparation_2
1          31                  18
2          20                  17
3          18                  14
4          17                  11
5            9                  10
6            8                   7
7          10                   5
8            7                   6

summary(yb) 	#display the summary for whole dataset

Preparation_1     Preparation_2
Min. : 7.00          Min. : 5.00
1st Qu.: 8.75       1st Qu.: 6.75
Median :13.50     Median :10.50
Mean :15.00        Mean :11.00
3rd Qu.:18.50      3rd Qu.:14.75
Max. :31.00         Max. :18.00

summary(yb$Preparation_1)	#display the summary for first column

Min. 1st Qu. Median   Mean   3rd Qu.   Max.
7.00      8.75     13.50   15.00     18.50    31.00

summary(yb$Preparation_2)	#display the summary for second column

Min. 1st Qu. Median    Mean   3rd Qu.  Max.
5.00     6.75      10.50    11.00     14.75   18.00

min(yb)	#display the minimum value for the whole dataset

[1] 5

min(yb$Preparation_1)	#display the mininun of first column

[1] 7

min(yb$Preparation_2)	#display the minimum of second column

[1] 5

sum(yb)	#display the sum of all values

[1] 208

sum(yb$Preparation_1)	#display the sum of first column

[1] 120

sum(yb$Preparation_2)	#display the sum of second column

[1] 88

#display the percentiles 
quantile(yb$Preparation_1,seq(0,1,.25))

0%    25%   50%   75%   100%
7.00  8.75  13.50  18.50  31.00

#display the percentiles 
quantile(yb$Preparation_2,seq(0,1,.25))

0%   25%   50%   75%   100%
5.00  6.75 10.50  14.75   18.00

#display the percentiles 
quantile(yb$Preparation_2,seq(0,1,.25))

0%  10%  20%  30%  40%  50%  60%  70%  80%  90%  100%
7.0    7.7     8.4    9.1     9.8  13.5   17.2  17.9  19.2   23.3   31.0

quantile(yb$Preparation_2,seq(0,1,.1))

0%   10%   20%  30%   40% 50%  60% 70%  80%  90% 100%
5.0     5.7     6.4      7.3     9.4 10.5   11.6 13.7  15.8   17.3  18.0

length(yb) 	#display the number of items 
ncol(yb) 	#display the number of columns

[1] 2

sort(yb$Preparation_1) #display the sorted values ascendingly 

[1] 7 8 9 10 17 18 20 31

sort(yb$Preparation_1, decreasing = TRUE)

[1] 31 20 18 17 10 9 8 7

#display a vertical poxplot
boxplot(yb, notch=FALSE)
title("A: Vertical Boxplot for Youden-Beale Data")

#display an horizontal poxplot
boxplot(yb, horizontal = TRUE)
title("B: Horizontal Boxplot for Youden-Beale Data")


 
plot(yb) #scatter diagram
title("Scatter diagram")

lsfit(yb$Preparation_1, yb$Preparation_2)$coefficients #list square fit coefficients 

Intercept         X 
2.8269231 0.5448718 
 
lsfit(yb$Preparation_1, yb$Preparation_2)$residuals #list square fit residuals

[1] -1.7179487  3.2756410  1.3653846 -1.0897436  2.2692308 -0.1858974
[7] -3.2756410 -0.6410256

  Happy coding!

No comments:

Related Posts Plugin for WordPress, Blogger...

About Me

My photo
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.