Independent Samples T Tests with R

The data we shall use here were collected from students in my introductory statistics classes

from 1983 through Spring, 2015. Here is a description of the survey.

The data were in an SPSS file, but I wrote them from SPSS to a csv file. A csv file is a plain

text file that uses a comma as the delimiter. At first R did not want to work with this csv file. I

discovered that this was because I had SPSS set to use Unicode, but R was assuming the file was in

locale code. I went back into SPSS and changed the encoding to locale and all was well after that.

Below, on the left, is a snapshot of the first few lines of the csv file. On the right is how the data

appear in R.

Using commas as delimiters has the advantage of making it easier to deal with missing data.

Look at the data for the first two subjects, above, left. There is just white space between the commas

marking off the scores for SATM. Those two subjects were missing data on SATM. On the right,

notice that R replaces missing values with its missing values code, “NA.”

Here is the code used to read in the csv file:

introq <- read.table("C:/Users/Vati/Documents/StatData/IntroQ/IntroQ.csv", header=TRUE, sep=",")

I intent to use the “psych” package, so I activate it: library(psych)

Now for some basic descriptive statistics comparing men with women on height of ideal mate:

describeBy(introq$Ideal, introq$Gender)

group: 1

vars n mean sd median trimmed mad min max range skew kurtosis se

1 1 539 71.43 3.25 72 71.67 2.97 55 80 25 -0.97 2.82 0.14

------------------------------------------------------------

group: 2

vars n mean sd median trimmed mad min max range skew kurtosis se

1 1 180 66.59 3.27 66 66.42 2.97 55 78 23 0.44 1.23 0.24

Not surprisingly, the mean height of female students’ ideal mates is greater than that of male

students’ ideal mates.

Now for an independent samples t test, comparing the two genders on height of ideal mate.

t.test(introq$Ideal ~ introq$Gender)

Welch Two Sample t-test

data: Ideal by Gender

t = 17.218, df = 305.4, p-value < 2.2e-16

alternative hypothesis: true difference in means is not equal to 0

95 percent confidence interval:

4.282968 5.388255

sample estimates:

mean in group 1 mean in group 2

71.43006 66.59444

It would be easier to deal with the output if I did not need to remember which numeric code

stands for male and which for female. Fortunately, SPSS will, upon request, write value labels to the

csv file, so I went back to SPSS and exported to csv with that request.

Gender Ideal Eye Statoph Nucoph SATM Year

1 Female 68.0 Other 8.0 50 NA 2105

2 Male 64.0 Other 4.0 40 NA 2105

3 Male 68.0 Other 5.0 30 430 2105

4 Female 72.0 Green 10.0 100 540 2105

As you can see, above, the csv file now has value labels, rather than numeric values, for the

categorical variables. When I read in the new csv file and run the t test again, I get

Welch Two Sample t-test

data: Ideal by Gender

t = 17.218, df = 305.4, p-value < 2.2e-16

alternative hypothesis: true difference in means is not equal to 0

95 percent confidence interval:

4.282968 5.388255

sample estimates:

mean in group Female mean in group Male

71.43006 66.59444

Notice that, by default, R does a separate variances t test. This is, IMHO, a good idea, but if

you want a pooled variances t test, you can get it this way. Even though the sample sizes here differ

quite a bit, the sample variances are nearly identical, so I am comfortable with the pooled test.

t.test(introq$Ideal ~ introq$Gender, var.equal=TRUE)

Two Sample t-test

data: Ideal by Gender

t = 17.274, df = 717, p-value < 2.2e-16

alternative hypothesis: true difference in means is not equal to 0

95 percent confidence interval:

4.286033 5.385189

sample estimates:

mean in group Female mean in group Male

71.43006 66.59444

Since the unit of measure for our scores is well known to folks in the US, it is perfectly

reasonable to use the simple difference between means (4.84 inches) as the effect size estimate.

We might want to convert that to cm for folks more familiar with the metric system. Suppose,

however, that our scores were estimates of political conservativism from the Überarschloch scale.

Unless we are very familiar with that scale, we are not going to know whether a difference of 4.84 is a

small difference or a large difference. In that case, it would be best to use Cohen’s d, the

standardized difference between group means. Here I shall use the “lsr” package to get Cohen’s d

library(lsr)

cohensD(introq$Ideal~ introq$Gender)

[1] 1.487092

To put a confidence interval around d, I am going to use the “compute.es” package.

install.packages("compute.es")

Installing package into ‘C:/Users/Vati/Documents/R/win-library/3.2’

(as ‘lib’ is unspecified)

trying URL 'http://cran.rstudio.com/bin/windows/contrib/3.2/compute.es_0.2-4.zip'

Content type 'application/zip' length 272294 bytes (265 KB)

downloaded 265 KB

package ‘compute.es’ successfully unpacked and MD5 sums checked

library(compute.es)

des(d=1.487092, n.1=539, n.2=180)

Mean Differences ES:

d [ 95 %CI] = 1.49 [ 1.3 , 1.67 ]

var(d) = 0.01

p-value(d) = 0

Presenting the Results

Students in Professor Karl’s undergraduate statistics classes completed a brief survey. They

were asked to indicate their sex/gender and the height, in inches, of their ideal mate. Mean height of

ideal mate was significantly greater for female students (M = 71.43, SD = 3.25, n = 539) than for male

students (M = 66.59, SD = 3.27, n = 180), t(717) = 17.27, p < .001, d = 1.49, 95% CI [1.30. 1.67].

 Wuensch’s R Lessons