Cross tabulation is a way of representing how categories of one variable
(independent variable) are distributed across the categories of another
variable (dependent variable). Thus one can see if there are patterns
of association between two variables in a cross tabulation matrix.
The variables can be nominal, ordinal, or grouped-interval
data. Like regression, cross tabulation has specific
statistics associated with it that tell us something about the degree
to which variables are related (called a measure of association) and
the likelihood that the patterns (or lack of patterns) represented
by the sample data did not occur by chance (test of significance).
Cramer's V is a measure of association for cross tabs.
It ranges from 0 to 1 (one indicating a strong relationship between
variables and 0 indicating none). Chi-square, on the other hand,
is a test of statistical significance. It will not
tell you how closely the variables are related, but rather indicates
whether the pattern in the sample is likely to reflect
the larger population rather than the result of chance. For
this tutorial it is assumed that a computer is generating chi-square and Cramer's
V. Refer to a statistics textbook for details on how to compute
these statistics. This tutorial will focus on what the statistics
mean.
Reading cross tabulation matrices
Raw frequencies: These tell you the number of cases that fall into each cell.
For example, 4 men prefer pizza and no women prefer pizza.

Percentage of column: Read this as "of all the men surveyed, 50% prefer
pizza and 50% prefer pasta."

Percentage of row: Read this as "100% of those who
prefer pizza are men. Of those who prefer pasta, half are
men and half are women."

Percentages of total sample: Read this as "of everybody surveyed, 33.3% are
men who prefer pizza, no women prefer pizza, 33.3% are men who prefer
pasta, and 33.3% are women who prefer pasta."

Most statistical packages allow you to define how cell percentages are reported.
Make sure the format you select makes sense for the questions you
are asking of your data.
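As a minimal sketch of where the three percentage formats come from, the Python below rebuilds them from the raw frequencies in the example above (4 men and no women prefer pizza; 4 men and 4 women prefer pasta). The `counts` layout and the `percentages` helper are illustrative names chosen for this sketch, not part of any statistics package.

```python
# Raw frequencies from the example: rows are food choices, columns are gender.
counts = {
    "pizza": {"men": 4, "women": 0},
    "pasta": {"men": 4, "women": 4},
}


def percentages(table, mode):
    """Return cell percentages based on the 'column', 'row', or 'total' count."""
    row_totals = {row: sum(cells.values()) for row, cells in table.items()}
    col_totals = {}
    for cells in table.values():
        for col, n in cells.items():
            col_totals[col] = col_totals.get(col, 0) + n
    grand_total = sum(row_totals.values())

    result = {}
    for row, cells in table.items():
        result[row] = {}
        for col, n in cells.items():
            if mode == "column":
                base = col_totals[col]
            elif mode == "row":
                base = row_totals[row]
            elif mode == "total":
                base = grand_total
            else:
                raise ValueError(f"unknown mode: {mode}")
            result[row][col] = round(100 * n / base, 1)
    return result


column_pct = percentages(counts, "column")  # share of each gender per food
row_pct = percentages(counts, "row")        # share of each food per gender
total_pct = percentages(counts, "total")    # share of the whole sample

print(column_pct)
print(row_pct)
print(total_pct)
```

The only difference between the three formats is the denominator: the column total, the row total, or the size of the whole sample.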
Statistical significance and measuring association between variables
Sample size is important for establishing whether the cross tabulation is a
reasonable representation of reality for the population. If the
sample is too small, the chi-square value will usually be too small to reach
statistical significance. Below you will
see examples of cross tabulations using a fairly large sample and
then a small sample.
Relatively large sample
This cross tab has two variables. Gender, the independent variable, has two
categories: male and female. Food, the dependent variable in this case, has three categories:
pizza, pasta, and sandwich. Variables may have multiple categories,
but be cautious about having too many, as there may not be enough
entries in the cells to produce statistical significance.

Here chi-square is 54.973 and there are 2 degrees of freedom.
Degrees of freedom (df) = (number of rows - 1)(number of columns - 1). For this
cross tab there are three rows and two columns, so df = (3 - 1)(2 - 1) = 2.
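The degrees-of-freedom formula is simple enough to check by hand, but a short Python sketch makes the arithmetic explicit. The function name `degrees_of_freedom` is an illustrative choice, not something a statistics package provides under that name.

```python
def degrees_of_freedom(n_rows, n_cols):
    """df = (number of rows - 1) * (number of columns - 1)."""
    if n_rows < 2 or n_cols < 2:
        raise ValueError("a cross tab needs at least two rows and two columns")
    return (n_rows - 1) * (n_cols - 1)


# Three food rows by two gender columns, as in the cross tab above.
df_large = degrees_of_freedom(3, 2)

# A table with two rows and two columns for comparison.
df_two_by_two = degrees_of_freedom(2, 2)

print(df_large, df_two_by_two)
```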
Using a chi-square table: Virtually any statistics
textbook will contain a table of critical values for chi-square.
Most give you two alpha levels to choose from: 0.05 and 0.01.
To find the critical value, first determine your alpha level (0.05 is usually
fine for most research purposes; it means you accept a 5% risk of
concluding that a pattern exists when it is only the result of chance). Then locate the degrees
of freedom for your cross tab on the table. The number you find is the critical
value for chi-square. If the value computed for your cross
tab meets or exceeds the critical value, then there is statistical
significance. For the above cross tab, the chi-square of 54.973 well exceeds
the critical value of 5.991 for alpha = .05.
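For readers without a printed table at hand, the table lookup can be sketched in Python using only the standard library. This sketch relies on two closed forms that hold only for small degrees of freedom: with df = 2 the upper-tail probability of chi-square is exp(-x/2), and with df = 1 it is erfc(sqrt(x/2)). The function names are illustrative, and the sketch deliberately refuses other df values rather than guessing.

```python
import math


def chi_square_p_value(chi2, df):
    """Upper-tail probability of the chi-square distribution (df 1 or 2 only)."""
    if df == 1:
        return math.erfc(math.sqrt(chi2 / 2))
    if df == 2:
        return math.exp(-chi2 / 2)
    raise ValueError("this sketch only covers df = 1 or df = 2")


def critical_value_df2(alpha):
    """Critical chi-square value for df = 2, from solving exp(-x/2) = alpha."""
    return -2 * math.log(alpha)


chi2_large = 54.973  # value reported for the large-sample cross tab
critical = critical_value_df2(0.05)
p_value = chi_square_p_value(chi2_large, 2)
is_significant = chi2_large >= critical

print(round(critical, 3), p_value, is_significant)
```

Comparing the computed chi-square to the critical value and comparing the p-value to alpha are two views of the same decision: the statistic meets or exceeds the critical value exactly when the p-value is at or below alpha.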
Cramer's V = .643
The statistics package calculated a Cramer's V of .643. This means that there
is a sizable link between one's gender and food choice
(remember that these data are made up strictly for demonstration
purposes and do not reflect reality). While there are no strict
standards for interpreting V, generally speaking, if it is less
than 0.10 then there is a weak relationship between variables.
Between 0.10 and 0.30 there is a moderate relationship, and more
than 0.30 indicates a strong relationship. Therefore, this
example, where V = .643, shows a strong relationship between gender
and food preference.
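The rule of thumb above can be written down as a small Python helper. This is only a sketch of the guideline quoted in this tutorial, not a formal standard; the function name is illustrative, and treating a value of exactly 0.30 as "moderate" is an assumption, since the text does not say which side the boundary falls on.

```python
def interpret_cramers_v(v):
    """Label Cramer's V using the rough guideline from this tutorial."""
    if not 0 <= v <= 1:
        raise ValueError("Cramer's V must lie between 0 and 1")
    if v < 0.10:
        return "weak"
    if v <= 0.30:  # assumption: the boundary value counts as moderate
        return "moderate"
    return "strong"


label = interpret_cramers_v(0.643)
print(label)
```

A label like this is only worth reporting once chi-square has shown the result to be statistically significant.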
Relatively small sample
Here is a cross tabulation with a small sample size. In order for a measure
of association to mean something with cross tabulation, there
must be statistical significance. In order for statistical
significance to occur, the sample must be large enough. Here
is a case where the sample is not large enough to yield statistical
significance. Note that this example has only two categories
for food, so the degrees of freedom, obtained by the above-mentioned
formula, is 1.

N = 12
Chi-square = 3.00 and df = 1
Cramer's V = .5
This cross tab is not statistically significant because at 1 df for alpha = .05 the
critical value of chi-square = 3.841. Thus, even though Cramer's
V is 0.5, which would indicate a strong relationship
between variables if chi-square were significant, chi-square in this
case does not meet the critical value for our desired alpha and
we cannot say that the data tell us anything about a relationship
between variables.
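To make the small-sample numbers less of a black box, here is a rough Python sketch that recomputes them. It assumes the small-sample table holds the same counts as the reading example earlier in this tutorial (4 men and no women prefer pizza; 4 men and 4 women prefer pasta), which is consistent with N = 12. It uses the plain Pearson chi-square with no continuity correction and the usual formula V = sqrt(chi-square / (N * min(rows - 1, columns - 1))). The function names are illustrative.

```python
import math

# Assumed counts: rows are food (pizza, pasta), columns are gender (men, women).
observed = [
    [4, 0],
    [4, 4],
]


def chi_square(table):
    """Pearson chi-square (no continuity correction) and the sample size."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (obs - expected) ** 2 / expected
    return chi2, n


def cramers_v(chi2, n, n_rows, n_cols):
    """Cramer's V from chi-square, sample size, and table dimensions."""
    return math.sqrt(chi2 / (n * min(n_rows - 1, n_cols - 1)))


chi2, n = chi_square(observed)
v = cramers_v(chi2, n, len(observed), len(observed[0]))

# Standard tabulated critical value for df = 1 at alpha = .05.
significant = chi2 >= 3.841

print(round(chi2, 2), n, round(v, 2), significant)
```

Under these assumptions the sketch gives a chi-square of about 3.00 and a V of about 0.5, matching the figures reported above. Some packages apply a continuity correction to 2x2 tables by default, which would give a smaller chi-square than the uncorrected value used here.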
Cautionary note:
Even though you may get a value for chi-square which exceeds the
critical value for the alpha you want, if any cells have an expected
frequency of less than 5 you should be careful about putting too
much stock into any measure of association. Most statistics packages
will indicate, along with the reporting of statistics, whether any
of the cells have an expected count of less than 5. If you
want to learn about expected counts (also called expected frequencies),
consult a statistics textbook.
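As a brief illustration of the check a package performs, the Python sketch below computes expected counts as (row total x column total) / N and counts how many fall below 5. The table is the assumed 12-person example used for the reading exercise earlier (4 men and no women prefer pizza; 4 men and 4 women prefer pasta), and the function names are illustrative.

```python
# Assumed counts: rows are food (pizza, pasta), columns are gender (men, women).
observed = [
    [4, 0],
    [4, 4],
]


def expected_counts(table):
    """Expected count per cell: row total * column total / sample size."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def cells_below(expected, threshold=5):
    """Number of cells whose expected count is under the threshold."""
    return sum(1 for row in expected for value in row if value < threshold)


expected = expected_counts(observed)
low_cells = cells_below(expected)

print(expected, low_cells)
```

With only 12 cases, three of the four cells have an expected count below 5, which is exactly the situation the cautionary note warns about.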