tabulate twoway — Two-way table of frequencies


Syntax Menu Description Options

Remarks and examples Stored results Methods and formulas References

Also see

Syntax

Two-way table

    tabulate varname1 varname2 [if] [in] [weight] [, options]

Two-way table for all possible combinations (a convenience tool)

    tab2 varlist [if] [in] [weight] [, options]

Immediate form of two-way tabulations

    tabi #11 #12 [...] \ #21 #22 [...] [\ ...] [, options]



options             Description

Main
  chi2              report Pearson’s χ²
  exact[(#)]        report Fisher’s exact test
  gamma             report Goodman and Kruskal’s gamma
  lrchi2            report likelihood-ratio χ²
  taub              report Kendall’s τ_b
  V                 report Cramér’s V
  cchi2             report Pearson’s χ² in each cell
  column            report relative frequency within its column of each cell
  row               report relative frequency within its row of each cell
  clrchi2           report likelihood-ratio χ² in each cell
  cell              report the relative frequency of each cell
  expected          report expected frequency in each cell
  nofreq            do not display frequencies
  missing           treat missing values like other values
  wrap              do not wrap wide tables
  [no]key           report/suppress cell contents key
  nolabel           display numeric codes rather than value labels
  nolog             do not display enumeration log for Fisher’s exact test
∗ firstonly         show only tables that include the first variable in varlist

Advanced
  matcell(matname)  save frequencies in matname; programmer’s option
  matrow(matname)   save unique values of varname1 in matname; programmer’s option
  matcol(matname)   save unique values of varname2 in matname; programmer’s option
‡ replace           replace current data with given cell frequencies

  all               equivalent to specifying chi2 lrchi2 V gamma taub

∗ firstonly is available only for tab2.
‡ replace is available only for tabi.
by is allowed with tabulate and tab2; see [D] by.
fweights, aweights, and iweights are allowed by tabulate. fweights are allowed by tab2. See [U] 11.1.6 weight.
all does not appear in the dialog box.

Menu

tabulate

Statistics > Summaries, tables, and tests > Frequency tables > Two-way table with measures of association

tab2

Statistics > Summaries, tables, and tests > Frequency tables > All possible two-way tables

tabi

Statistics > Summaries, tables, and tests > Frequency tables > Table calculator

Description

tabulate produces a two-way table of frequency counts, along with various measures of association,
including the common Pearson’s χ², the likelihood-ratio χ², Cramér’s V, Fisher’s exact test, Goodman
and Kruskal’s gamma, and Kendall’s τ_b.

Line size is respected. That is, if you resize the Results window before running tabulate,
the resulting two-way tabulation will take advantage of the available horizontal space. Stata for
Unix (console) users can instead use the set linesize command to take advantage of this feature.

tab2 produces all possible two-way tabulations of the variables specified in varlist.

tabi displays the r × c table, using the values specified; rows are separated by ‘\’. If no options
are specified, it is as if exact were specified for a 2 × 2 table and chi2 were specified otherwise.
See [U] 19 Immediate commands for a general description of immediate commands. See Tables with
immediate data below for examples using tabi.

See [R] tabulate oneway if you want a one-way table of frequencies. See [R] table and [R] tabstat
if you want one-, two-, or n-way tables of frequencies and a wide variety of summary statistics. See
[R] tabulate, summarize() for a description of tabulate with the summarize() option; it produces a
table (breakdowns) of means and standard deviations. table is better than tabulate, summarize(),
but tabulate, summarize() is faster. See [ST] epitab for a 2 × 2 table with statistics of interest
to epidemiologists.


Options



 

Main



chi2 calculates and displays Pearson’s χ² for the hypothesis that the rows and columns in a two-way
table are independent. chi2 may not be specified if aweights or iweights are specified.

exact[(#)] displays the significance calculated by Fisher’s exact test and may be applied to r × c as
well as to 2 × 2 tables. For 2 × 2 tables, both one- and two-sided probabilities are displayed. For
r × c tables, two-sided probabilities are displayed. The optional positive integer # is a multiplier on
the amount of memory that the command is permitted to consume. The default is 1. This option
should not be necessary for reasonable r × c tables. If the command terminates with error 910,
try exact(2). The maximum row or column dimension allowed when computing Fisher’s exact
test is the maximum row or column dimension for tabulate (see [R] limits).

gamma displays Goodman and Kruskal’s gamma along with its asymptotic standard error. gamma is
appropriate only when both variables are ordinal. gamma may not be specified if aweights or
iweights are specified.

lrchi2 displays the likelihood-ratio χ² statistic. lrchi2 may not be specified if aweights or
iweights are specified.

taub displays Kendall’s τ_b along with its asymptotic standard error. taub is appropriate only when
both variables are ordinal. taub may not be specified if aweights or iweights are specified.

V (note capitalization) displays Cramér’s V. V may not be specified if aweights or iweights are
specified.

cchi2 displays each cell’s contribution to Pearson’s chi-squared in a two-way table.

column displays the relative frequency of each cell within its column in a two-way table.

row displays the relative frequency of each cell within its row in a two-way table.

clrchi2 displays each cell’s contribution to the likelihood-ratio chi-squared in a two-way table.

cell displays the relative frequency of each cell in a two-way table.

expected displays the expected frequency of each cell in a two-way table.

nofreq suppresses the printing of the frequencies.

missing requests that missing values be treated like other values in calculations of counts, percentages,

and other statistics.

wrap requests that Stata take no action on wide, two-way tables to make them readable. Unless wrap

is speciﬁed, wide tables are broken into pieces to enhance readability.





[no]key suppresses or forces the display of a key above two-way tables. The default is to display the

key if more than one cell statistic is requested, and otherwise to omit it. key forces the display

of the key. nokey suppresses its display.

nolabel causes the numeric codes to be displayed rather than the value labels.

nolog suppresses the display of the log for Fisher’s exact test. Using Fisher’s exact test requires

counting all tables that have a probability exceeding that of the observed table given the observed

row and column totals. The log counts down each stage of the network computations, starting from

the number of columns and counting down to 1, displaying the number of nodes in the network

at each stage. A log is not displayed for 2 × 2 tables.

firstonly, available only with tab2, restricts the output to only those tables that include the first
variable in varlist. Use this option to interact one variable with a set of others.




 

Advanced



matcell(matname) saves the reported frequencies in matname. This option is for use by programmers.

matrow(matname) saves the numeric values of the r × 1 row stub in matname. This option is for

use by programmers. matrow() may not be speciﬁed if the row variable is a string.

matcol(matname) saves the numeric values of the 1 × c column stub in matname. This option is

for use by programmers. matcol() may not be speciﬁed if the column variable is a string.
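The three matrix options can be sketched together as follows. This is a minimal illustration, not part of the original entry: it builds a small dataset with tabi, replace (which, as shown under Tables with immediate data, leaves the variables row, col, and pop in memory), and the matrix names F, R, and C are arbitrary names chosen for the example.

```stata
* Build a small dataset from a 2 x 2 table; this replaces any data in memory
clear
tabi 30 18 \ 38 14, replace

* Save the cell frequencies and the row and column stubs
* (F, R, and C are hypothetical matrix names used only in this sketch)
tabulate row col [fweight=pop], matcell(F) matrow(R) matcol(C)

* F is the 2 x 2 matrix of frequencies
matrix list F
* R is the 2 x 1 row stub; C is the 1 x 2 column stub
matrix list R
matrix list C
```

A program can then pick out individual cells with expressions such as F[2,1].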

replace indicates that the immediate data speciﬁed as arguments to the tabi command be left as

the current data in place of whatever data were there.

The following option is available with tabulate but is not shown in the dialog box:

all is equivalent to specifying chi2 lrchi2 V gamma taub. Note the omission of exact. When
all is specified, no may be placed in front of the other options. all noV requests all association
measures except Cramér’s V (and Fisher’s exact). all exact requests all association measures,
including Fisher’s exact test. all may not be specified if aweights or iweights are specified.

Limits

Two-way tables may have a maximum of 1,200 rows and 80 columns (Stata/MP and Stata/SE),

300 rows and 20 columns (Stata/IC), or 160 rows and 20 columns (Small Stata). If larger tables are

needed, see [R] table.

Remarks and examples

Remarks are presented under the following headings:

tabulate

Measures of association

N-way tables

Weighted data

Tables with immediate data

tab2

Video examples

For each value of a speciﬁed variable (or a set of values for a pair of variables), tabulate

reports the number of observations with that value. The number of times a value occurs is called its

frequency.

tabulate

Example 1

tabulate will make two-way tables if we specify two variables following the word tabulate.

In our highway dataset, we have a variable called rate that divides the accident rate into three

categories: below 4, 4 – 7, and above 7 per million vehicle miles. Let’s make a table of the speed

limit category and the accident-rate category:


. use http://www.stata-press.com/data/r13/hiway2

(Minnesota Highway Data, 1973)

. tabulate spdcat rate

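     Speed |   Accident rate per million
     Limit |         vehicle miles
  Category |   Below 4        4-7    Above 7 |     Total
-----------+---------------------------------+----------
  40 to 50 |         3          5          3 |        11
  55 to 50 |        19          6          1 |        26
  Above 60 |         2          0          0 |         2
-----------+---------------------------------+----------
     Total |        24         11          4 |        39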

The table indicates that three stretches of highway have an accident rate below 4 and a speed limit of

40 to 50 miles per hour. The table also shows the row and column sums (called the marginals). The

number of highways with a speed limit of 40 to 50 miles per hour is 11, which is the same result

we obtained in our previous one-way tabulations.

Stata can present this basic table in several ways—16, to be precise—and we will show just a

few below. It might be easier to read the table if we included the row percentages. For instance, of

11 highways in the lowest speed limit category, three are also in the lowest accident-rate category.

Three-elevenths amounts to some 27.3%. We can ask Stata to ﬁll in this information for us by using

the row option:

. tabulate spdcat rate, row

Key

frequency

row percentage

Speed Accident rate per million

Limit vehicle miles

Category Below 4 4-7 Above 7 Total

40 to 50 3 5 3 11

27.27 45.45 27.27 100.00

55 to 50 19 6 1 26

73.08 23.08 3.85 100.00

Above 60 2 0 0 2

100.00 0.00 0.00 100.00

Total 24 11 4 39

61.54 28.21 10.26 100.00

The number listed below each frequency is the percentage of cases that each cell represents out of

its row. That is easy to remember because we see 100% listed in the “Total” column. The bottom

row is also informative. We see that 61.54% of all the highways in our dataset fall into the lowest

accident-rate category, that 28.21% are in the middle category, and that 10.26% are in the highest.

tabulate can calculate column percentages and cell percentages, as well. It does so when we

specify the column or cell options, respectively. We can even specify them together. Below is a

table that includes everything:


. tabulate spdcat rate, row column cell

Key

frequency

row percentage

column percentage

cell percentage

Speed Accident rate per million

Limit vehicle miles

Category Below 4 4-7 Above 7 Total

40 to 50 3 5 3 11

27.27 45.45 27.27 100.00

12.50 45.45 75.00 28.21

7.69 12.82 7.69 28.21

55 to 50 19 6 1 26

73.08 23.08 3.85 100.00

79.17 54.55 25.00 66.67

48.72 15.38 2.56 66.67

Above 60 2 0 0 2

100.00 0.00 0.00 100.00

8.33 0.00 0.00 5.13

5.13 0.00 0.00 5.13

Total 24 11 4 39

61.54 28.21 10.26 100.00

100.00 100.00 100.00 100.00

61.54 28.21 10.26 100.00

The number at the top of each cell is the frequency count. The second number is the

row percentage — they sum to 100% going across the table. The third number is the column

percentage—they sum to 100% going down the table. The bottom number is the cell percentage— they

sum to 100% going down all the columns and across all the rows. For instance, highways with a

speed limit above 60 miles per hour and in the lowest accident rate category account for 100% of

highways with a speed limit above 60 miles per hour; 8.33% of highways in the lowest accident-rate

category; and 5.13% of all our data.
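The three percentages quoted for that cell can be checked by hand from the margins of the table. The lines below are only an arithmetic sketch using display: the cell holds 2 highways, its row total is 2, its column total is 24, and the table total is 39.

```stata
* Row percentage: share of the "Above 60" row that falls in this cell
display %6.2f 100 * 2 / 2

* Column percentage: share of the "Below 4" column that falls in this cell
display %6.2f 100 * 2 / 24

* Cell percentage: share of all 39 highways that fall in this cell
display %6.2f 100 * 2 / 39
```

The three lines print 100.00, 8.33, and 5.13, matching the last block of the table.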

A fourth option, nofreq, tells Stata not to print the frequency counts. To construct a table consisting

of only row percentages, we type

. tabulate spdcat rate, row nofreq

Speed Accident rate per million

Limit vehicle miles

Category Below 4 4-7 Above 7 Total

40 to 50 27.27 45.45 27.27 100.00

55 to 50 73.08 23.08 3.85 100.00

Above 60 100.00 0.00 0.00 100.00

Total 61.54 28.21 10.26 100.00


Measures of association

Example 2

tabulate will calculate the Pearson χ² test for the independence of the rows and columns if we
specify the chi2 option. Suppose that we have 1980 census data on 956 cities in the United States
and wish to compare the age distribution across regions of the country. Assume that agecat is the
median age in each city and that region denotes the region of the country in which the city is
located.

. use http://www.stata-press.com/data/r13/citytemp2

(City Temperature Data)

. tabulate region agecat, chi2

Census agecat

Region 19-29 30-34 35+ Total

NE 46 83 37 166

N Cntrl 162 92 30 284

South 139 68 43 250

West 160 73 23 256

Total 507 316 133 956

Pearson chi2(6) = 61.2877 Pr = 0.000

We obtain the standard two-way table and, at the bottom, a summary of the χ² test. Stata informs us
that the χ² associated with this table has 6 degrees of freedom and is 61.29. The reported
significance level, Pr = 0.000, means that the p-value is below 0.0005, so the observed differences
are statistically significant.

The table is, perhaps, easier to understand if we suppress the frequencies and print just the row

percentages:

. tabulate region agecat, row nofreq chi2

Census agecat

Region 19-29 30-34 35+ Total

NE 27.71 50.00 22.29 100.00

N Cntrl 57.04 32.39 10.56 100.00

South 55.60 27.20 17.20 100.00

West 62.50 28.52 8.98 100.00

Total 53.03 33.05 13.91 100.00

Pearson chi2(6) = 61.2877 Pr = 0.000
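To see which cells drive the statistic, one possible follow-up is to add the expected and cchi2 options described above, and to recompute the p-value from the stored results. This is a sketch rather than part of the original example; the degrees of freedom are derived from r(r) and r(c) because the stored results listed for tabulate do not include them.

```stata
use http://www.stata-press.com/data/r13/citytemp2, clear

* Frequencies, expected frequencies, and each cell's contribution to chi-squared
tabulate region agecat, chi2 expected cchi2

* Upper-tail chi-squared probability with (rows - 1)*(columns - 1) degrees of freedom
display chi2tail((r(r) - 1) * (r(c) - 1), r(chi2))
```

The displayed probability is far smaller than 0.0005, which is why the table reports Pr = 0.000.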

Example 3

We have data on dose level and outcome for a set of patients and wish to evaluate the association

between the two variables. We can obtain all the association measures by specifying the all and

exact options:


. use http://www.stata-press.com/data/r13/dose

. tabulate dose function, all exact

Enumerating sample-space combinations:

stage 3: enumerations = 1

stage 2: enumerations = 9

stage 1: enumerations = 0

Function

Dosage < 1 hr 1 to 4 4+ Total

1/day 20 10 2 32

2/day 16 12 4 32

3/day 10 16 6 32

Total 46 38 12 96

Pearson chi2(4) = 6.7780 Pr = 0.148

likelihood-ratio chi2(4) = 6.9844 Pr = 0.137

Cramér’s V = 0.1879

gamma = 0.3689 ASE = 0.129

Kendall’s tau-b = 0.2378 ASE = 0.086

Fisher’s exact = 0.145

We ﬁnd evidence of association but not enough to be truly convincing.

If we had not also speciﬁed the exact option, we would not have obtained Fisher’s exact test.

Stata can calculate this statistic both for 2 × 2 tables and for r × c. For 2 × 2 tables, the calculation

is almost instant. On more general tables, however, the calculation can take longer.

We carefully constructed our example so that all would be meaningful. Kendall’s τ_b and Goodman
and Kruskal’s gamma are relevant only when both dimensions of the table can be ordered, say, from
low to high or from worst to best. The other statistics, however, are always applicable.

Technical note

Be careful when attempting to compute the p-value for Fisher’s exact test because the number of

tables that contribute to the p-value can be extremely large and a solution may not be feasible. The

errors that are indicative of this situation are errors 910, exceeded memory limitations, and 1401,

integer overﬂow due to large row-margin frequencies. If execution terminates because of memory

limitations, use exact(2) to permit the algorithm to consume twice the memory, exact(3) for three

times the memory, etc. The default memory usage should be sufﬁcient for reasonable tables.
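In a do-file, the retry described in this note could be automated along the following lines. This is a sketch under the assumption that a single doubling of memory is enough; with the dose data from Example 3, the first call succeeds and the retry is never executed.

```stata
use http://www.stata-press.com/data/r13/dose, clear

* Try the default memory allowance first; capture traps any error in _rc
capture noisily tabulate dose function, exact

* Error 910 means the memory limit was exceeded, so retry with twice the memory
if _rc == 910 {
    tabulate dose function, exact(2)
}
```

Error 1401 (integer overflow) is not helped by a larger multiplier, so it is deliberately left unhandled here.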

N-way tables

If you need more than two-way tables, your best alternative is to use table, not tabulate; see
[R] table.

The technical note below shows you how to use tabulate to create a sequence of two-way tables

that together form, in effect, a three-way table, but using table is easy and produces prettier results:


. use http://www.stata-press.com/data/r13/birthcat

(City data)

. table birthcat region agecat, c(freq)

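          |                       agecat and Census Region
          |               19-29                               30-34
 birthcat |       NE  N Cntrl    South     West       NE  N Cntrl    South     West
----------+------------------------------------------------------------------------
   29-136 |       11       23       11       11       34       27       10        8
  137-195 |       31       97       65       46       48       58       45       42
  196-529 |        4       38       59       91        1        3       12       21

          |      agecat and Census Region
          |                35+
 birthcat |       NE  N Cntrl    South     West
----------+------------------------------------
   29-136 |       34       26       27       18
  137-195 |        3        4        7        4
  196-529 |                          4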

Technical note

We can make n-way tables by combining the by varlist: preﬁx with tabulate. Continuing with

the dataset of 956 cities, say that we want to make a table of age category by birth-rate category by

region of the country. The birth-rate category variable is named birthcat in our dataset. To make

separate tables for each age category, we would type

. by agecat, sort: tabulate birthcat region

-> agecat = 19-29

Census Region

birthcat NE N Cntrl South West Total

29-136 11 23 11 11 56

137-195 31 97 65 46 239

196-529 4 38 59 91 192

Total 46 158 135 148 487

-> agecat = 30-34

Census Region

birthcat NE N Cntrl South West Total

29-136 34 27 10 8 79

137-195 48 58 45 42 193

196-529 1 3 12 21 37

Total 83 88 67 71 309


-> agecat = 35+

Census Region

birthcat NE N Cntrl South West Total

29-136 34 26 27 18 105

137-195 3 4 7 4 18

196-529 0 0 4 0 4

Total 37 30 38 22 127

Weighted data

Example 4

tabulate can process weighted as well as unweighted data. As with all Stata commands, we

indicate the weight by specifying the [weight] modiﬁer; see [U] 11.1.6 weight.

Continuing with our dataset of 956 cities, we also have a variable called pop, the population of

each city. We can make a table of region by age category, weighted by population, by typing

. tabulate region agecat [freq=pop]

Census agecat

Region 19-29 30-34 35+ Total

NE 4,721,387 10,421,387 5,323,610 20,466,384

N Cntrl 16,901,550 8,964,756 4,015,593 29,881,899

South 13,894,254 7,686,531 4,141,863 25,722,648

West 16,698,276 7,755,255 2,375,118 26,828,649

Total 52,215,467 34,827,929 15,856,184 102899580

If we specify the cell, column, or row options, they will also be appropriately weighted. Below we

repeat the table, suppressing the counts and substituting row percentages:

. tabulate region agecat [freq=pop], nofreq row

Census agecat

Region 19-29 30-34 35+ Total

NE 23.07 50.92 26.01 100.00

N Cntrl 56.56 30.00 13.44 100.00

South 54.02 29.88 16.10 100.00

West 62.24 28.91 8.85 100.00

Total 50.74 33.85 15.41 100.00


Tables with immediate data

Example 5

tabi ignores the dataset in memory and uses as the table the values that we specify on the

command line:

. tabi 30 18 \ 38 14

col

row 1 2 Total

1 30 18 48

2 38 14 52

Total 68 32 100

Fisher’s exact = 0.289

1-sided Fisher’s exact = 0.179

We may specify any of the options of tabulate and are not limited to 2 × 2 tables:

. tabi 30 18 38 \ 13 7 22, chi2 exact

Enumerating sample-space combinations:

stage 3: enumerations = 1

stage 2: enumerations = 3

stage 1: enumerations = 0

col

row 1 2 3 Total

1 30 18 38 86

2 13 7 22 42

Total 43 25 60 128

Pearson chi2(2) = 0.7967 Pr = 0.671

Fisher’s exact = 0.707

. tabi 30 13 \ 18 7 \ 38 22, all exact col

Key

frequency

column percentage

Enumerating sample-space combinations:

stage 3: enumerations = 1

stage 2: enumerations = 3

stage 1: enumerations = 0

col

row 1 2 Total

1 30 13 43

34.88 30.95 33.59

2 18 7 25

20.93 16.67 19.53

3 38 22 60

44.19 52.38 46.88

Total 86 42 128

100.00 100.00 100.00


Pearson chi2(2) = 0.7967 Pr = 0.671

likelihood-ratio chi2(2) = 0.7985 Pr = 0.671

Cramér’s V = 0.0789

gamma = 0.1204 ASE = 0.160

Kendall’s tau-b = 0.0630 ASE = 0.084

Fisher’s exact = 0.707

For 2 × 2 tables, both one- and two-sided Fisher’s exact probabilities are displayed; this is true of

both tabulate and tabi. See Cumulative incidence data and Case–control data in [ST] epitab for

more discussion on the relationship between one- and two-sided probabilities.

Technical note

tabi, as with all immediate commands, leaves any data in memory undisturbed. With the replace

option, however, the data in memory are replaced by the data from the table:

. tabi 30 18 \ 38 14, replace

col

row 1 2 Total

1 30 18 48

2 38 14 52

Total 68 32 100

Fisher’s exact = 0.289

1-sided Fisher’s exact = 0.179

. list

row col pop

1. 1 1 30

2. 1 2 18

3. 2 1 38

4. 2 2 14

With this dataset, you could re-create the above table by typing

. tabulate row col [freq=pop], exact

col

row 1 2 Total

1 30 18 48

2 38 14 52

Total 68 32 100

Fisher’s exact = 0.289

1-sided Fisher’s exact = 0.179
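If an unweighted dataset with one observation per case is preferred, the frequency variable can be expanded. The following is a sketch of one way to do that with expand; it is an illustration added here and not a step taken in the original note.

```stata
* Recreate the four-record dataset; this replaces any data in memory
clear
tabi 30 18 \ 38 14, replace

* Turn the 4 weighted records into 100 unweighted observations
expand pop
drop pop

* Same table and tests as before, now without a weight
tabulate row col, exact
```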


tab2

tab2 is a convenience tool. Typing

. tab2 myvar thisvar thatvar, chi2

is equivalent to typing

. tabulate myvar thisvar, chi2

. tabulate myvar thatvar, chi2

. tabulate thisvar thatvar, chi2
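With the firstonly option, only the tables that involve the first variable are produced. The sketch below assumes the auto dataset that ships with Stata, which is not used elsewhere in this entry; any three categorical variables would serve.

```stata
sysuse auto, clear

* Produces foreign by rep78 and foreign by headroom only;
* the rep78 by headroom table is skipped because it omits the first variable
tab2 foreign rep78 headroom, chi2 firstonly
```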

Video examples

Pearson’s chi2 and Fisher’s exact test in Stata

Tables and cross-tabulations in Stata

Immediate commands in Stata: Cross-tabulations and chi-squared tests from summary data

Stored results

tabulate, tab2, and tabi store the following in r():

Scalars
    r(N)          number of observations
    r(r)          number of rows
    r(c)          number of columns
    r(chi2)       Pearson’s χ²
    r(p)          significance of Pearson’s χ²
    r(gamma)      gamma
    r(p1_exact)   one-sided Fisher’s exact p
    r(p_exact)    Fisher’s exact p
    r(chi2_lr)    likelihood-ratio χ²
    r(p_lr)       significance of likelihood-ratio χ²
    r(CramersV)   Cramér’s V
    r(ase_gam)    ASE of gamma
    r(ase_taub)   ASE of τ_b
    r(taub)       τ_b

r(p1_exact) is defined only for 2 × 2 tables. Also, the matrow(), matcol(), and matcell() options allow you to
obtain the row values, column values, and frequencies, respectively.
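A short sketch of how these results might be inspected after a command, using the 2 × 2 table from Example 5 so that no dataset is needed:

```stata
tabi 30 18 \ 38 14, chi2 exact

* List everything the command left behind in r()
return list

* Use individual results in expressions
display "N = " r(N) ", rows = " r(r) ", columns = " r(c)
display "Two-sided Fisher's exact p = " %5.3f r(p_exact)
display "One-sided Fisher's exact p = " %5.3f r(p1_exact)
```

Results that were not requested (here, for instance, r(gamma)) are simply absent from the list.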

Methods and formulas

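Let $n_{ij}$, $i = 1, \ldots, I$ and $j = 1, \ldots, J$, be the number of observations in the $i$th row and $j$th
column. If the data are not weighted, $n_{ij}$ is just a count. If the data are weighted, $n_{ij}$ is the sum of
the weights of all data corresponding to the $(i, j)$ cell.

Define the row and column marginals as

$$n_{i\cdot} = \sum_{j=1}^{J} n_{ij} \qquad\qquad n_{\cdot j} = \sum_{i=1}^{I} n_{ij}$$

and let $n = \sum_i \sum_j n_{ij}$ be the overall sum. Also, define the concordance and discordance as

$$A_{ij} = \sum_{k>i} \sum_{l>j} n_{kl} + \sum_{k<i} \sum_{l<j} n_{kl}$$

$$D_{ij} = \sum_{k>i} \sum_{l<j} n_{kl} + \sum_{k<i} \sum_{l>j} n_{kl}$$

along with twice the number of concordances $P = \sum_i \sum_j n_{ij} A_{ij}$ and twice the number of discordances
$Q = \sum_i \sum_j n_{ij} D_{ij}$.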

The Pearson χ² statistic with $(I - 1)(J - 1)$ degrees of freedom (so called because it is based
on Pearson (1900); see Conover [1999, 240] and Fienberg [1980, 9]) is defined as

$$X^2 = \sum_i \sum_j \frac{(n_{ij} - m_{ij})^2}{m_{ij}}$$

where $m_{ij} = n_{i\cdot} n_{\cdot j} / n$.

The likelihood-ratio χ² statistic with $(I - 1)(J - 1)$ degrees of freedom (Fienberg 1980, 40) is
defined as

$$G^2 = 2 \sum_i \sum_j n_{ij} \ln(n_{ij} / m_{ij})$$
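Both statistics can be recomputed directly from a table of counts. The Mata sketch below is an illustration added here, not the code tabulate itself runs; it uses the 2 × 3 table from Example 5, and the scalar names X2_check and G2_check are hypothetical names chosen for the example. It assumes every cell is positive, because a zero cell would need special handling in the logarithm.

```stata
mata:
// Observed 2 x 3 table from Example 5 (tabi 30 18 38 \ 13 7 22)
N = (30, 18, 38 \ 13, 7, 22)
n = sum(N)

// Expected frequencies: outer product of the margins divided by n
M = rowsum(N) * colsum(N) / n

// Pearson and likelihood-ratio chi-squared statistics
X2 = sum((N - M):^2 :/ M)
G2 = 2 * sum(N :* ln(N :/ M))
(X2, G2)

// Copy the results to Stata scalars so they can be used outside Mata
st_numscalar("X2_check", X2)
st_numscalar("G2_check", G2)
end
```

The two values agree with the 0.7967 and 0.7985 reported in Example 5 for this table.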

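Cramér’s V (Cramér 1946) is a measure of association designed so that the attainable upper bound
is 1. For 2 × 2 tables, $-1 \le V \le 1$, and otherwise, $0 \le V \le 1$.

$$V = \begin{cases}
(n_{11} n_{22} - n_{12} n_{21}) / (n_{1\cdot} n_{2\cdot} n_{\cdot 1} n_{\cdot 2})^{1/2} & \text{for } 2 \times 2 \\
\{(X^2 / n) / \min(I - 1, J - 1)\}^{1/2} & \text{otherwise}
\end{cases}$$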
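As an arithmetic check of both branches, the values from Example 5 can be plugged in with display. This is only a worked sketch: the first line uses the 3 × 2 table with χ² = 0.7967 and n = 128, and the second uses the 2 × 2 table 30 18 \ 38 14, whose margins are 48, 52, 68, and 32.

```stata
* r x c branch: sqrt of (chi-squared / n) / min(I - 1, J - 1)
display %6.4f sqrt((0.7967 / 128) / min(3 - 1, 2 - 1))

* 2 x 2 branch: signed, so the result can be negative
display %7.4f (30*14 - 18*38) / sqrt(48 * 52 * 68 * 32)
```

The first line prints 0.0789, the value shown in Example 5. The second prints -0.1133, which illustrates that V carries a sign for 2 × 2 tables.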

Gamma (Goodman and Kruskal 1954, 1959, 1963, 1972; also see Agresti [2010, 186–188])
ignores tied pairs and is based only on the number of concordant and discordant pairs of observations,
$-1 \le \gamma \le 1$,

$$\gamma = (P - Q) / (P + Q)$$

with asymptotic variance

$$16 \sum_i \sum_j n_{ij} (Q A_{ij} - P D_{ij})^2 / (P + Q)^4$$

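Kendall’s $\tau_b$ (Kendall 1945; also see Agresti 2010, 188–189), $-1 \le \tau_b \le 1$, is similar to gamma,
except that it uses a correction for ties,

$$\tau_b = (P - Q) / (w_r w_c)^{1/2}$$

with asymptotic variance

$$\frac{\sum_i \sum_j n_{ij} \left\{ 2 (w_r w_c)^{1/2} d_{ij} + \tau_b v_{ij} \right\}^2 - n^3 \tau_b^2 (w_r + w_c)^2}{(w_r w_c)^2}$$

where

$$w_r = n^2 - \sum_i n_{i\cdot}^2$$

$$w_c = n^2 - \sum_j n_{\cdot j}^2$$

$$d_{ij} = A_{ij} - D_{ij}$$

$$v_{ij} = n_{i\cdot} w_c + n_{\cdot j} w_r$$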
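The concordance-based measures can be traced through on a small table. The Mata sketch below is an added illustration and not the routine tabulate uses; it applies the definitions of A, D, P, and Q to the 3 × 2 table from Example 5. The variance expressions in the code are the standard forms for gamma and τ_b written in terms of P, Q, w_r, and w_c, and the scalar names g_check and tb_check are hypothetical.

```stata
mata:
// Observed 3 x 2 table from Example 5 (tabi 30 13 \ 18 7 \ 38 22)
N  = (30, 13 \ 18, 7 \ 38, 22)
nr = rows(N)
nc = cols(N)

// Concordance A[i,j] and discordance D[i,j] for every cell
A = J(nr, nc, 0)
D = J(nr, nc, 0)
for (i = 1; i <= nr; i++) {
    for (j = 1; j <= nc; j++) {
        for (k = 1; k <= nr; k++) {
            for (l = 1; l <= nc; l++) {
                if ((k > i & l > j) | (k < i & l < j)) A[i, j] = A[i, j] + N[k, l]
                if ((k > i & l < j) | (k < i & l > j)) D[i, j] = D[i, j] + N[k, l]
            }
        }
    }
}

// Twice the number of concordant and discordant pairs
P = sum(N :* A)
Q = sum(N :* D)

// Tie corrections for tau-b
n  = sum(N)
wr = n^2 - sum(rowsum(N):^2)
wc = n^2 - sum(colsum(N):^2)

// Point estimates
g  = (P - Q) / (P + Q)
tb = (P - Q) / sqrt(wr * wc)

// Asymptotic standard errors
ase_g  = sqrt(16 * sum(N :* (Q*A - P*D):^2) / (P + Q)^4)
v      = rowsum(N) * J(1, nc, wc) + J(nr, 1, wr) * colsum(N)
ase_tb = sqrt((sum(N :* (2*sqrt(wr*wc)*(A - D) + tb*v):^2) - n^3 * tb^2 * (wr + wc)^2) / (wr*wc)^2)

(g, ase_g, tb, ase_tb)
st_numscalar("g_check", g)
st_numscalar("tb_check", tb)
end
```

For this table P = 2532 and Q = 1988, so gamma is 544/4520, about 0.1204, and τ_b is about 0.0630; the standard errors come out near 0.160 and 0.084, in line with the output in Example 5.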

Fisher’s exact test (Fisher 1935; Finney 1948; see Zelterman and Louis [1992, 293–301] for
the 2 × 2 case) yields the probability of observing a table that gives at least as much evidence of
association as the one actually observed under the assumption of no association. Holding row and
column marginals fixed, the hypergeometric probability of every possible table is computed,
and the significance level is

$$P = \sum_{T \in A} \Pr(T)$$

where $A$ is the set of all tables with the same marginals as the observed table, $T^*$, such that
$\Pr(T) \le \Pr(T^*)$. For 2 × 2 tables, the one-sided probability is calculated by further restricting $A$ to
tables in the same tail as $T^*$. The first algorithm extending this calculation to r × c tables was Pagano
and Halvorsen (1981); the one implemented here is the FEXACT algorithm by Mehta and Patel (1986).
This is a search-tree clipping method originally published by Mehta and Patel (1983) with further
refinements by Joe (1988) and Clarkson, Fan, and Joe (1993). Fisher’s exact test is a permutation
test. For more information on permutation tests, see Good (2005 and 2006) and Pesarin (2001).
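For a 2 × 2 table the sum can be written out by brute force, because a table with fixed margins is determined by its upper-left cell. The sketch below is an added illustration, not the network algorithm described above. It uses the table 30 18 \ 38 14 from Example 5 (n = 100, first row total 48, first column total 68) together with Stata's hypergeometricp() and hypergeometric() functions; the small relative tolerance in the comparison is an assumption added to guard against floating-point ties.

```stata
* Probability of the observed table: upper-left cell equal to 30
scalar p_obs = hypergeometricp(100, 68, 48, 30)

* Two-sided p-value: add up every table no more probable than the observed one.
* With these margins the upper-left cell can range from 16 to 48.
scalar p2 = 0
forvalues a = 16/48 {
    if hypergeometricp(100, 68, 48, `a') <= scalar(p_obs) * (1 + 1e-7) {
        scalar p2 = scalar(p2) + hypergeometricp(100, 68, 48, `a')
    }
}
display %5.3f scalar(p2)

* One-sided p-value: the observed count lies below its expectation of 32.64,
* so the relevant tail is the lower one
display %5.3f hypergeometric(100, 68, 48, 30)
```

The two displayed values should correspond to the two-sided 0.289 and one-sided 0.179 reported for this table in Example 5.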

References

Agresti, A. 2010. Analysis of Ordinal Categorical Data. 2nd ed. Hoboken, NJ: Wiley.

Campbell, M. J., D. Machin, and S. J. Walters. 2007. Medical Statistics: A Textbook for the Health Sciences. 4th

ed. Chichester, UK: Wiley.

Clarkson, D. B., Y.-A. Fan, and H. Joe. 1993. A remark on Algorithm 643: FEXACT: An algorithm for performing

Fisher’s exact test in r×c contingency tables. ACM Transactions on Mathematical Software 19: 484–488.

Conover, W. J. 1999. Practical Nonparametric Statistics. 3rd ed. New York: Wiley.

Cox, N. J. 1996. sg57: An immediate command for two-way tables. Stata Technical Bulletin 33: 7–9. Reprinted in

Stata Technical Bulletin Reprints, vol. 6, pp. 140–143. College Station, TX: Stata Press.

Cox, N. J. 1999. sg113: Tabulation of modes. Stata Technical Bulletin 50: 26–27. Reprinted in Stata Technical Bulletin
Reprints, vol. 9, pp. 180–181. College Station, TX: Stata Press.

Cox, N. J. 2003. sg113_1: Software update: Tabulation of modes. Stata Journal 3: 211.

Cox, N. J. 2009. Speaking Stata: I. J. Good and quasi-Bayes smoothing of categorical frequencies. Stata Journal 9:
306–314.

Cramér, H. 1946. Mathematical Methods of Statistics. Princeton: Princeton University Press.

Fienberg, S. E. 1980. The Analysis of Cross-Classiﬁed Categorical Data. 2nd ed. Cambridge, MA: MIT Press.

Finney, D. J. 1948. The Fisher–Yates test of signiﬁcance in 2 × 2 contingency tables. Biometrika 35: 145–156.

Fisher, R. A. 1935. The logic of inductive inference. Journal of the Royal Statistical Society 98: 39–82.

Good, P. I. 2005. Permutation, Parametric, and Bootstrap Tests of Hypotheses: A Practical Guide to Resampling

Methods for Testing Hypotheses. 3rd ed. New York: Springer.

Good, P. I. 2006. Resampling Methods: A Practical Guide to Data Analysis. 3rd ed. Boston: Birkhäuser.

Goodman, L. A., and W. H. Kruskal. 1954. Measures of association for cross classiﬁcations. Journal of the American

Statistical Association 49: 732–764.

Goodman, L. A., and W. H. Kruskal. 1959. Measures of association for cross classifications II: Further discussion and
references. Journal of the American Statistical Association 54: 123–163.

Goodman, L. A., and W. H. Kruskal. 1963. Measures of association for cross classifications III: Approximate sampling
theory. Journal of the American Statistical Association 58: 310–364.

Goodman, L. A., and W. H. Kruskal. 1972. Measures of association for cross classifications IV: Simplification of
asymptotic variances. Journal of the American Statistical Association 67: 415–421.

Harrison, D. A. 2006. Stata tip 34: Tabulation by listing. Stata Journal 6: 425–427.

Jann, B. 2008. Multinomial goodness-of-ﬁt: Large-sample tests with survey design correction and exact tests for small

samples. Stata Journal 8: 147–169.

Joe, H. 1988. Extreme probabilities for contingency tables under row and column independence with application to

Fisher’s exact test. Communications in Statistics, Theory and Methods 17: 3677–3685.

Judson, D. H. 1992. sg12: Extended tabulate utilities. Stata Technical Bulletin 10: 22–23. Reprinted in Stata Technical

Bulletin Reprints, vol. 2, pp. 140–141. College Station, TX: Stata Press.

Kendall, M. G. 1945. The treatment of ties in rank problems. Biometrika 33: 239–251.

Longest, K. C. 2012. Using Stata for Quantitative Analysis. Thousand Oaks, CA: Sage.

Mehta, C. R., and N. R. Patel. 1983. A network algorithm for performing Fisher’s exact test in r×c contingency

tables. Journal of the American Statistical Association 78: 427–434.

Mehta, C. R., and N. R. Patel. 1986. Algorithm 643 FEXACT: A FORTRAN subroutine for Fisher’s exact test on unordered
r×c contingency tables. ACM Transactions on Mathematical Software 12: 154–161.

Newson, R. B. 2002. Parameters behind “nonparametric” statistics: Kendall’s tau, Somers’ D and median differences.

Stata Journal 2: 45–64.

Pagano, M., and K. T. Halvorsen. 1981. An algorithm for ﬁnding the exact signiﬁcance levels of r×c contingency

tables. Journal of the American Statistical Association 76: 931–934.

Pearson, K. 1900. On the criterion that a given system of deviations from the probable in the case of a correlated

system of variables is such that it can be reasonably supposed to have arisen from random sampling. Philosophical

Magazine, Series 5 50: 157–175.

Pesarin, F. 2001. Multivariate Permutation Tests: With Applications in Biostatistics. Chichester, UK: Wiley.

Weesie, J. 2001. dm91: Patterns of missing values. Stata Technical Bulletin 61: 5–7. Reprinted in Stata Technical

Bulletin Reprints, vol. 10, pp. 49–51. College Station, TX: Stata Press.

Wolfe, R. 1999. sg118: Partitions of Pearson’s χ² for analyzing two-way tables that have ordered columns. Stata
Technical Bulletin 51: 37–40. Reprinted in Stata Technical Bulletin Reprints, vol. 9, pp. 203–207. College Station,
TX: Stata Press.

Zelterman, D., and T. A. Louis. 1992. Contingency tables in medical studies. In Medical Uses of Statistics, 2nd ed,

ed. J. C. Bailar III and C. F. Mosteller, 293–310. Boston: Dekker.


Also see

[R] table — Flexible table of summary statistics

[R] tabstat — Compact table of summary statistics

[R] tabulate oneway — One-way table of frequencies

[R] tabulate, summarize() — One- and two-way tables of summary statistics

[D] collapse — Make dataset of summary statistics

[ST] epitab — Tables for epidemiologists

[SVY] svy: tabulate oneway — One-way tables for survey data

[SVY] svy: tabulate twoway — Two-way tables for survey data

[XT] xttab — Tabulate xt data

[U] 12.6.3 Value labels

[U] 25 Working with categorical data and factor variables