25 March 2024

R Language: Regression Analysis with Simulated & Real Data

Before doing regression on a real dataset, one can use as minimum a set of simulated data to test the steps (code adapted after [1]):

# define the model with simulated data
n <- 100
x <- c(1:n)
error <- rnorm(n,0,10)
y <- 1+2*x+error
fit <- lm(y~x)

# plotting the values
plot(x, y, ylab="1+2*x+error")
lines(x, fit$fitted.values)

#using anova (analysis of variance)
anova(fit)

In the first step is created the data model, while in the second the data are plotted, while in the third the analysis of variance is run. For the y variable, can be used any linear function that represents a line in the plane. 

rnorm() function generates multivariate normal random variates based on the parameters given, therefore the output will vary between the runs of the above code. The bigger the value of the third parameter, the more dispersed the data is.

To test the code on real data, one can use the Sleuth3 library with the data from [2] (see RPubs):

install.packages ("Sleuth3")
library("Sleuth3")

Let's look at the data from the first case, which represent an experiment concerning the effects of intrinsic and extrinsic motivation on creativity run by the psychologist Teresa Amabile (see [2]):

attach(case0101)
case0101
summary(case0101)  

The regression can be applied to all the data:

# case 0101 (all data)
x <- c(1:47)
y <- case0101$Score
fit <- lm(y~x)
plot(x, y, ylab="Score")
lines(x, fit$fitted.values)

Though, a more appropriate analysis should be based on each questionnaire:

# case 0101 (extrinsic vs intrinsic treatments)
extrinsic <- subset(case0101, Treatment %in% "Extrinsic")
intrinsic <- subset(case0101, Treatment %in% "Intrinsic")

par(mfrow = c(1,2)) #1x2 matrix display
x <- c(1:length(extrinsic$Score))
y <- extrinsic$Score
fit <- lm(y~x)
plot(x, y, ylab="Extrinsic Score")
lines(x, fit$fitted.values)

x <- c(1:length(intrinsic$Score))
y <- intrinsic$Score
fit <- lm(y~x)
plot(x, y, ylab="Intrinsic Score")
lines(x, fit$fitted.values)

title("Extrinsic vs. Intrinsic Motivation on Creativity", line = -2, outer = TRUE)

And, here's the output:

Case 0101 Extrinsic vs. Intrinsic Motivation on Creativity

Happy coding!

References:
[1] DeWayne R Derryberry (2014) Basic Data Analysis for Time Series with R 1st Ed.
[2] Fred L Ramsey & Daniel W Schafer (2013) The Statistical Sleuth: A Course in Methods of Data Analysis 3rd Ed.

No comments:

Related Posts Plugin for WordPress, Blogger...

About Me

My photo
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.