|
Note: These definitions
are intentionally informal in an attempt to help students understand
the concepts. We suggest that once students have a grasp of a
concept, they consult a more formal definition in their course
textbook.
Alternative
hypothesis: What you accept if you determine that the null hypothesis
is not supported.
Average: A single
number that represents a group of data.
ANOVA (analysis of variance): A type
of statistical test used to determine if several population means
are significantly different.
Chi-square:
A test of significance that tests for relationships between two
variables in a cross tabulation.
Coefficient
of variation: A number which expresses the standard deviation as
a percentage of the mean.
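As a rough illustration of the coefficient of variation, the sketch below divides the standard deviation by the mean and multiplies by 100. The function name and the data are made up for this example, and it assumes the sample standard deviation is the one you want.

```python
import statistics

def coefficient_of_variation(data):
    """Return the sample standard deviation as a percentage of the mean."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # sample standard deviation (n - 1 in the denominator)
    return sd / mean * 100

scores = [18, 21, 24]  # made-up data: mean 21, standard deviation 3
cv = coefficient_of_variation(scores)
print(round(cv, 2))  # 3 is about 14.29 percent of 21
```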
Confidence interval:
An interval which is computed from a sample but which represents
a range that you are a certain percent sure the population parameter
will fall into.
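One way to picture a confidence interval for a population mean is the sketch below. It assumes the sample is large enough to use the normal distribution (with a small sample you would use the t-distribution instead), and the function name and the sleep data are invented for illustration.

```python
import math
import statistics

def mean_confidence_interval(sample, confidence=0.95):
    """Approximate confidence interval for a population mean (normal approximation)."""
    n = len(sample)
    mean = statistics.mean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    # Critical z value, e.g. about 1.96 for 95% confidence
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * standard_error, mean + z * standard_error

# Made-up data: hours of sleep reported by 30 students
sample = [6, 7, 8, 7, 6, 9, 7, 8, 6, 7] * 3
low, high = mean_confidence_interval(sample)
print(low, high)
```

Asking for more confidence (say 0.99) gives a wider interval from the same sample.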
Correlation
coefficient (a.k.a. Pearson's product moment correlation coefficient):
A number between -1 and 1 which represents how well a line
of best fit (Least Squares Line) describes a relationship between
two variables. An r value of 0 implies that the line does not describe
this relationship. An r value of 1 indicates that the data values
fall directly on the line of best fit and hence the line is a good
description of the relationship between these two variables. Similarly,
an r value of -1 indicates that the data values fall directly
on the line of best fit but indicates that there is an inverse relationship
between the two variables (an increase in one occurs as the other
decreases).
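The sketch below computes the correlation coefficient by hand for two small made-up data sets, one where the values rise together and one where they move in opposite directions. The function name is only illustrative.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
r_direct = pearson_r(x, [2, 4, 6, 8, 10])   # points fall on a rising line
r_inverse = pearson_r(x, [10, 8, 6, 4, 2])  # points fall on a falling line
print(r_direct, r_inverse)
```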
Cramer's V:
A measure of the strength of association for cross tabulations. It
is typically used with chi-square (a test of significance). It ranges
in value from 0 to 1.
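To make the link between chi-square and Cramer's V concrete, here is a minimal sketch. It assumes the common formula V = sqrt(chi-square / (n * (k - 1))), where n is the total count and k is the smaller of the number of rows and columns, and it applies no continuity correction. The table and function names are made up.

```python
import math

def chi_square(table):
    """Chi-square statistic for a cross tabulation given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

def cramers_v(table):
    """Cramer's V: strength of association on a 0 to 1 scale."""
    n = sum(sum(row) for row in table)
    smaller_dim = min(len(table), len(table[0]))
    return math.sqrt(chi_square(table) / (n * (smaller_dim - 1)))

table = [[30, 10], [20, 40]]  # made-up 2 x 2 cross tabulation
chi2 = chi_square(table)
v = cramers_v(table)
print(chi2, v)
```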
Dependent
samples: Two samples for which there is a natural pairing of
the data (for example, if you are testing whether or not a weight
gain program was effective, your data might represent 10 people's
weights both before and after they participated in the program).
When working with dependent samples you are interested in the difference
between each pair of data values.
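For dependent samples, the first step is usually to compute the difference within each pair. The sketch below does this for made-up before and after weights of five people.

```python
import statistics

# Made-up weights (in pounds) for the same five people
before = [150, 162, 171, 145, 158]
after = [153, 165, 170, 149, 161]

# One difference per person: after minus before
differences = [a - b for b, a in zip(before, after)]
mean_difference = statistics.mean(differences)
print(differences, mean_difference)
```

Any later analysis of dependent samples then works with this single list of differences rather than the two original lists.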
Descriptive
statistics: Statistics that describe characteristics of a given set
of data. Examples include the mean, median, mode, and standard deviation.
Frequency:
The number of times data values fall within a specified range.
Independent
samples: Samples for which there is no attempt to pair the data
values (for example if you went to two schools and asked 25 students
at each school how many hours per week they sleep).
Inferential
statistics: The use of a sample to make projections about characteristics
of a population. Examples include r and chi-square.
Interval data:
Numerical data where you can measure the difference between data
points. Examples include temperature (in Fahrenheit or Celsius)...
Least
squares line: For paired data, this is a line which best models
the data. That is, it is the line that comes "closest"
to all the data values.
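A least squares line can be found from the standard slope and intercept formulas, as in the sketch below. The data and the function name are made up for illustration.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least squares line for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = least_squares_line([1, 2, 3, 4], [2, 4, 5, 7])
print(slope, intercept)  # the fitted line is y = slope * x + intercept
```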
Level of significance
(alpha): The probability with which you are willing to risk rejecting
the null hypothesis when it is in fact true. An informal way
of describing the level of significance is that it is the chance you
accept that your sample leads you to the wrong conclusion about
the actual population from which it is drawn. This means that when
alpha = .05 there is a one in 20 chance of rejecting a null hypothesis
that is actually true.
Median:
A type of average which is computed by ranking the data in numerical
order and then determining the "middle" value.
Mean: A type
of average that is computed by adding up all the data values, and
dividing that sum by the number of data values.
Measure of association:
A number which measures the degree of the relationship between variables.
Mode: A type
of average which represents the data value(s) which occur most often.
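The three averages defined in this glossary (mean, median, and mode) can be compared on one small made-up data set using Python's standard `statistics` module.

```python
import statistics

data = [2, 3, 3, 5, 7]  # made-up data

mean = statistics.mean(data)        # sum of the values divided by how many there are
median = statistics.median(data)    # middle value once the data is ranked
modes = statistics.multimode(data)  # list of the value(s) that occur most often
print(mean, median, modes)
```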
Multivariate
statistic: A statistic which involves two or more variables. Examples
of a multivariate statistic include the correlation coefficient,
chi-square, F, and the coefficient of determination.
Nominal
data: Data which can be grouped into categories that can be named
but not put into rank order. Examples include religious denomination,
make of a vehicle, political party.
Normal distribution:
A type of distribution which follows a symmetrical, bell-shaped
curve which is centered about the mean.
Null hypothesis:
A hypothesis set up for the purpose of seeing whether or not it
can be supported by the data.
Ordinal
data: Data where the categories have an order but the differences
between the categories are not quantified. Examples include scales of
like-dislike, strongly agree-strongly disagree, and degrees of liberalism
for political parties.
Outlier: A data
point on a scatterplot that is relatively far from the line of best fit
(regression line).
Pearson's
r (correlation coefficient): A number between -1 and 1 which
represents how well a line of best fit (Least Squares Line) describes
a relationship between two variables. An r value of 0 implies that
the line does not describe this relationship. An r value of 1 indicates
that the data values fall directly on the line of best fit and hence
the line is a good description of the relationship between these
two variables. Similarly, an r value of -1 indicates that the
data values fall directly on the line of best fit but indicates
that there is an inverse relationship between the two variables
(an increase in one occurs as the other decreases).
Percentile:
A number which represents what percentage of the data falls at or
below a given score (for example, if you score in the 80th percentile
on a test it means that 80% of those who took the test scored the
same as or below your score).
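Following the informal "at or below" definition used here, a percentile can be sketched as a simple count. The function name and the test scores are made up, and textbooks sometimes use slightly different formulas.

```python
def percentile_rank(scores, score):
    """Percentage of scores that are at or below the given score."""
    at_or_below = sum(1 for s in scores if s <= score)
    return 100 * at_or_below / len(scores)

scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]  # made-up test scores
rank = percentile_rank(scores, 90)
print(rank)  # 8 of the 10 scores are at or below 90
```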
Population:
All measurements or observations of interest (e.g., heights of all
NBA basketball players).
Population parameter:
A numerical descriptive measure of an entire population.
Power of a test:
The probability with which you will reject the null hypothesis when
it is in fact not true.
p-value: The
smallest level of significance for which the statistic tells us
to reject the null hypothesis. (p is computed from a given statistic
such as a z-score. If p is less than the level of significance,
you would reject the null hypothesis. If p is greater than the
level of significance, you would fail to reject the null hypothesis.)
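The decision rule above can be sketched for a z-score as follows. This assumes a two-tailed test and a standard normal distribution, and the z value of 2.1 and the function name are made up for illustration.

```python
from statistics import NormalDist

def two_tailed_p_value(z):
    """p-value for a z-score, assuming a two-tailed test."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

alpha = 0.05                 # level of significance chosen in advance
p = two_tailed_p_value(2.1)  # made-up z-score
reject_null = p < alpha      # reject only when p is below alpha
print(round(p, 4), reject_null)
```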
Random
sample: A sample which has been formed without bias (every member
of the population has an equal chance of being chosen in the sample).
Range: A number
which represents how many units the data is spread over. It
is computed by taking the largest data value minus the smallest
data value.
Ratio data:
Interval data that includes a true zero starting point for all measurements,
so it is possible to make ratio comparisons between data values (for
example, one value being twice another). Examples include GPA, age, and income.
Regression:
A type of statistical analysis which allows us to describe a set
of data with a mathematical equation. The most common type of regression
studied in an introductory statistics class is linear regression.
The advantage of regression is that it affords you a formula which
could be used to make predictions of the outcome of one variable
when given the value of the other variables.
R-squared (coefficient
of determination): A number between 0 and 1 which represents
the proportion of the variation in one variable that is explained by
a line of best fit (Least Squares Line). For a line fitted to two
variables it equals the square of the correlation coefficient r. An
R-squared value of 0 implies that the line does not describe the
relationship. An R-squared value of 1 indicates that the data values
fall directly on the line of best fit. Unlike r, R-squared is never
negative, so it does not tell you whether the relationship is direct
or inverse.
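As a hedged sketch, R-squared for a least squares line can be computed as 1 minus the ratio of the unexplained variation (squared distances from the line) to the total variation (squared distances from the mean). The data and function name are made up.

```python
def r_squared(xs, ys):
    """R-squared for a least squares line fitted to paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    # Variation left over after fitting the line
    ss_residual = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    # Total variation of y around its mean
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_residual / ss_total

r2 = r_squared([1, 2, 3, 4], [2, 4, 5, 7])
print(round(r2, 4))
```

Data that fall exactly on a falling line still give a value of 1, since R-squared ignores the direction of the relationship.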
Sample:
A part of the population.
Standard deviation:
A number which represents, on average, how much the individual data
values differ from the mean.
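The sketch below works through a standard deviation step by step on made-up data. It shows both the population version (divide by n) and the sample version (divide by n - 1); which one you need depends on your course and your data.

```python
import math

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data
mean = sum(data) / len(data)

# Squared distance of each value from the mean
squared_deviations = [(x - mean) ** 2 for x in data]

population_sd = math.sqrt(sum(squared_deviations) / len(data))
sample_sd = math.sqrt(sum(squared_deviations) / (len(data) - 1))
print(population_sd, sample_sd)
```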
Standard error:
The amount, on average, that individual data values lie from the
line of best fit (standard error is to the line of best fit as standard
deviation is to the mean).
Statistic: A
numerical descriptive measure of a sample.
Statistics:
The study of how to collect, organize, analyze, and interpret data.
Statistical
hypothesis: An assumption about a population parameter.
Strength of
association: This term refers to the degree to which variables
are related. Note: this is different from the level of significance,
which reflects how willing you are to risk concluding that a relationship
exists in the population when it does not.
Beginning statistics students often confuse these two terms. It
is important to remember that if a result is not statistically
significant, then the strength of association, no matter how high
it is, is not reliable evidence of a relationship between the two variables.
t-distribution:
A type of distribution used when your population is known to have
a normal distribution but your sample size is small. This distribution
differs in shape depending on your sample size, but as the sample
size gets large, the t-distribution becomes closer to that of a
normal distribution.
t-test: A test
of significance for population means when you have a small sample
size.
Trimmed mean:
A type of average for which a percentage of the low and high data
are omitted before computing the mean (e.g. this is used in Olympic
scoring where the low and high rankings are omitted and then a mean
is computed on the remaining scores).
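A trimmed mean can be sketched as follows. The function name, the trimming proportion, and the judges' scores are all made up, and the code assumes the same proportion is dropped from each end of the ranked data.

```python
def trimmed_mean(data, proportion):
    """Mean after dropping the given proportion of values from each end."""
    ordered = sorted(data)
    k = int(len(ordered) * proportion)  # number of values to drop at each end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

# Made-up judges' scores: drop the lowest and highest 10% (one score each)
scores = [9.5, 8.0, 9.0, 9.2, 6.5, 9.1, 8.8, 9.9, 9.0, 8.5]
result = trimmed_mean(scores, 0.10)
print(round(result, 4))
```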
Type 1 error:
An error where you reject the null hypothesis when it should not
have been rejected.
Type 2 error:
An error where you fail to reject the null hypothesis but it should
have been rejected.
Univariate
statistic: A statistic which involves a single variable. Examples
of a univariate statistic include mean, median, mode, standard deviation,
and z (or t) score.
Variables: Units
that describe characteristics of the object of statistical analysis.
Examples would include age, temperature, attitudes on a scale, etc.
z-score:
A standard score which represents how many standard deviations away
a given data value is from the mean (e.g., if the mean of a set of
data is 21 and the standard deviation is 3, then the number 24 would
have a z-score of 1 since it is 1 standard deviation above the
mean).
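The z-score example above (mean 21, standard deviation 3, data value 24) can be checked with a short sketch. The function name is only illustrative.

```python
def z_score(value, mean, standard_deviation):
    """Number of standard deviations a value lies from the mean."""
    return (value - mean) / standard_deviation

z = z_score(24, 21, 3)
print(z)  # 24 is one standard deviation above 21
```

A value below the mean gives a negative z-score; for instance, 15 would be two standard deviations below 21.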
|