
Sociology & Anthropology - Statistics Definitions

Note: These definitions are intentionally informal to help students understand the concepts. Once the student has a grasp of a concept, we suggest consulting a more formal definition in his or her course textbook.


Alternative hypothesis: What you accept if you determine that the null hypothesis is not supported.

Average: A single number that represents a group of data.

ANOVA (analysis of variance): A type of statistical test used to determine whether several population means are significantly different.

Chi-square: A test of significance that tests for relationships between two variables in a cross tabulation.

Coefficient of variation: A number which expresses the standard deviation as a percentage of the mean.
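
As a rough illustration of the coefficient of variation, the short Python sketch below divides the standard deviation by the mean and multiplies by 100. The data values and the function name are made up for this example, and it assumes the sample standard deviation (dividing by n - 1) is the one wanted.

```python
import statistics


def coefficient_of_variation(values):
    """Return the sample standard deviation as a percentage of the mean."""
    mean = statistics.mean(values)
    std_dev = statistics.stdev(values)  # sample standard deviation (n - 1)
    return std_dev / mean * 100


# Hypothetical heights in centimeters, used only for illustration.
heights_cm = [150, 160, 170, 180, 190]
cv = coefficient_of_variation(heights_cm)
print(f"Coefficient of variation: {cv:.1f}%")
```

A value of about 9% here would mean the typical spread of the heights is roughly 9% of the average height.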

Confidence interval: An interval that is computed from a sample and that represents a range you are a certain percent sure contains the population parameter.
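
One common way to compute a confidence interval for a population mean can be sketched as follows. This is only an illustration under stated assumptions: the sample is large enough to use the normal distribution (a small sample would normally call for the t-distribution instead), and the sleep data and function name are invented for the example.

```python
import math
import statistics


def mean_confidence_interval(values, confidence=0.95):
    """Approximate confidence interval for a mean, using the normal distribution."""
    n = len(values)
    mean = statistics.mean(values)
    # Standard error of the mean: sample standard deviation over sqrt(n).
    std_err = statistics.stdev(values) / math.sqrt(n)
    # z is about 1.96 for a 95% interval.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * std_err, mean + z * std_err


# Hypothetical hours of sleep per night reported by 40 students.
sleep_hours = [6, 7, 8, 7, 6, 8, 7, 7, 9, 5] * 4
low, high = mean_confidence_interval(sleep_hours)
print(f"95% confidence interval: {low:.2f} to {high:.2f} hours")
```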

Correlation coefficient (a.k.a. Pearson’s product moment correlation coefficient): A number between –1 and 1 which represents how well a line of best fit (Least Squares Line) describes a relationship between two variables. An r value of 0 implies that the line does not describe this relationship. An r value of 1 indicates that the data values fall directly on the line of best fit and hence the line is a good description of the relationship between these two variables. Similarly, an r value of –1 indicates that the data values fall directly on the line of best fit but indicates that there is an inverse relationship between the two variables (an increase in one occurs as the other decreases).
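
The behavior of r at its extremes can be checked with a small Python sketch. The function below follows the usual textbook formula for Pearson's r; the function name and the data are made up for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson's product moment correlation coefficient for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


hours = [1, 2, 3, 4, 5]
rises_with_hours = [2, 4, 6, 8, 10]   # falls exactly on a rising line
falls_with_hours = [10, 8, 6, 4, 2]   # falls exactly on a falling line

print(pearson_r(hours, rises_with_hours))  # r = 1: direct relationship
print(pearson_r(hours, falls_with_hours))  # r = -1: inverse relationship
```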

Cramer's V: A measure of strength of association for cross tabulations. It is typically used with Chi-square (a test of significance). It ranges in value from 0 to 1.
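
To show how Chi-square and Cramer's V fit together, here is a minimal Python sketch for a cross tabulation stored as a list of rows. The table values and function names are invented for the example, and the Cramer's V formula used is the standard one: the square root of chi-square divided by n times (the smaller of the number of rows and columns, minus 1).

```python
import math


def chi_square(table):
    """Chi-square statistic for a cross tabulation given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def cramers_v(table):
    """Cramer's V: strength of association, from 0 (none) to 1 (perfect)."""
    n = sum(sum(row) for row in table)
    smaller_dimension = min(len(table), len(table[0]))
    return math.sqrt(chi_square(table) / (n * (smaller_dimension - 1)))


# Hypothetical cross tabulation: rows are two groups, columns are yes/no answers.
table = [[30, 10],
         [10, 30]]
print(chi_square(table), cramers_v(table))
```

Chi-square answers whether the relationship is statistically significant; Cramer's V answers how strong it is.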

Dependent samples: Two samples for which there is a natural pairing of the data (for example, if you were testing whether or not a weight-gain program was effective, your data might represent 10 people's weights both before and after they participated in the program). When working with dependent samples you are interested in the difference between each pair of data values.

Descriptive statistics: Statistics that describe characteristics of a given set of data (for example, mean, median, mode, and standard deviation).

Frequency: The number of times data values fall within a specified range.

Independent samples: Samples for which there is no attempt to pair the data values (for example if you went to two schools and asked 25 students at each school how many hours per week they sleep).

Inferential statistics: Statistics in which a sample is used to make projections about characteristics of a population (for example, tests based on r or Chi-square).

Interval data: Numerical data where you can measure the difference between data points. Examples include temperature (in Fahrenheit or Celsius)....

Least squares line: For paired data, this is the line which best models the data. That is, it is the line that comes "closest" to all the data values.
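
"Closest" here means the line that makes the sum of the squared vertical distances from the data points as small as possible. A minimal Python sketch of the usual slope and intercept formulas follows; the function name and data are made up for illustration.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least squares line for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The least squares line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical paired data that happen to lie exactly on y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = least_squares_line(xs, ys)
print(f"y = {slope}x + {intercept}")
```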

Level of significance (alpha): The probability with which you are willing to risk rejecting the null hypothesis when it is in fact true. An informal way of describing the level of significance is that it is the chance you are willing to take that a result as strong as the one in your sample came from sampling luck alone rather than from a real pattern in the population. This means that when alpha = .05 you accept a one in 20 chance of rejecting a null hypothesis that is actually true.

Median: A type of average which is computed by ranking the data in numerical order and then determining the "middle" value.

Mean: A type of average that is computed by adding up all the data values, and dividing that sum by the number of data values.

Measure of association: A number which measures the degree of the relationship between variables.

Mode: A type of average which represents the data value(s) which occur most often.
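
The three averages defined in this glossary (mean, median, and mode) can give different answers for the same data. The sketch below uses Python's standard `statistics` module on a made-up set of scores.

```python
import statistics

# Hypothetical scores, used only for illustration.
scores = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(scores)        # (2 + 3 + 3 + 5 + 7 + 10) / 6
median = statistics.median(scores)    # middle of the ranked data: (3 + 5) / 2
modes = statistics.multimode(scores)  # list of the value(s) that occur most often

print(mean, median, modes)
```

With an even number of data values there is no single middle value, so the median is taken halfway between the two middle values.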

Multivariate statistic: A statistic which involves two or more variables. Examples of a multivariate statistic include the correlation coefficient, chi-square, F, and the coefficient of determination.

Nominal data: Data which can be grouped into categories that can be named but not put into rank order. Examples include religious denomination, make of a vehicle, political party.

Normal distribution: A type of distribution which follows a symmetrical, bell shaped curve which is centered about the mean.

Null hypothesis: A hypothesis set up for the purpose of seeing whether or not it can be supported by the data.

Ordinal data: Data where the categories have an order but the differences between the categories are not quantified. Examples include scales of like-dislike, strongly agree-strongly disagree, degrees of liberalness for political parties.

Outlier: A data point on a scatterplot that is relatively far from the line of best fit (regression line).

Pearson's r (correlation coefficient): A number between –1 and 1 which represents how well a line of best fit (Least Squares Line) describes a relationship between two variables. An r value of 0 implies that the line does not describe this relationship. An r value of 1 indicates that the data values fall directly on the line of best fit and hence the line is a good description of the relationship between these two variables. Similarly, an r value of –1 indicates that the data values fall directly on the line of best fit but indicates that there is an inverse relationship between the two variables (an increase in one occurs as the other decreases).

Percentile: A number which represents what percentage of the data falls at or below a given score (for example, if you score in the 80th percentile on a test it means that 80% of those who took the test scored the same as or below your score).
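
Following the "at or below" definition used here, a percentile can be sketched in Python as below. The function name and test scores are invented for the example; note that textbooks and software packages sometimes use slightly different percentile formulas.

```python
def percentile_rank(scores, score):
    """Percentage of the data values that fall at or below the given score."""
    at_or_below = sum(1 for s in scores if s <= score)
    return 100 * at_or_below / len(scores)


# Hypothetical test scores for ten students.
test_scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]
print(percentile_rank(test_scores, 90))  # 8 of the 10 scores are at or below 90
```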

Population: All measurements or observations of interest (ex. heights of all NBA basketball players).

Population parameter: A numerical descriptive measure of an entire population.

Power of a test: The probability with which you will reject the null hypothesis when it is in fact not true. (Power is one minus the probability of a Type 2 error.)

p-value: The smallest level of significance for which the statistic tells us to reject the null hypothesis (p is computed from a given statistic such as a z-score, and if p is less than the level of significance, you would reject the null hypothesis. If p were greater than the level of significance you would fail to reject the null hypothesis.)
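
The decision rule for a p-value can be sketched in Python as follows. This is an illustration only: it assumes a two-sided test whose statistic is a z-score from the standard normal distribution, and the z value of 2.1 and the function name are made up.

```python
from statistics import NormalDist


def two_sided_p_value(z):
    """p-value for a z-score, counting both tails of the standard normal curve."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


alpha = 0.05                 # level of significance chosen before testing
p = two_sided_p_value(2.1)   # hypothetical z-score from a sample
reject_null = p < alpha

print(f"p = {p:.4f}, reject the null hypothesis: {reject_null}")
```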

Random sample: A sample which has been formed without bias (every member of the population has an equal chance of being chosen in the sample).

Range: A number which represents how many units the data is spread over. It is computed by taking the largest data value minus the smallest data value.

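Ratio data: Interval data that includes a true zero point for all measurements, so it is possible to make definitive comparisons between data (for example, that one value is twice another). Examples include GPA, age, and income. (A calendar year, such as year graduated from high school, has no true zero and is better treated as interval data.)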

Regression: A type of statistical analysis which allows us to describe a set of data with a mathematical equation. The most common type of regression studied in an introductory statistics class is linear regression. The advantage of regression is that it affords you a formula which could be used to make predictions of the outcome of one variable when given the value of the other variables.

R-squared (coefficient of determination): A number between 0 and 1 which represents the proportion of the variation in one variable that is explained by a line of best fit (Least Squares Line) relating it to another variable. For a line fitted to two variables, it is the square of the correlation coefficient r. An R-squared value of 0 implies that the line does not describe the relationship. An R-squared value of 1 indicates that the data values fall directly on the line of best fit. Because it is a squared value, R-squared does not show whether the relationship is direct or inverse; for that, look at the sign of r.

Sample: A part of the population.

Standard deviation: A number which represents, on average, how much the individual data values differ from the mean.
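
As a worked illustration, the Python sketch below computes the standard deviation step by step. The data and function name are made up; the `sample` flag is an assumption of this example that switches between dividing by n - 1 (for a sample) and dividing by n (for a whole population).

```python
import math


def standard_deviation(values, sample=True):
    """Standard deviation; divides by n - 1 for a sample, by n for a population."""
    n = len(values)
    mean = sum(values) / n
    # Add up the squared distance of each data value from the mean.
    sum_of_squares = sum((v - mean) ** 2 for v in values)
    divisor = n - 1 if sample else n
    return math.sqrt(sum_of_squares / divisor)


# Hypothetical data with a mean of 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(standard_deviation(data))                # treating the data as a sample
print(standard_deviation(data, sample=False))  # treating the data as a population
```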

Standard error (of the estimate): The amount, on average, that individual data values lie from the line of best fit (standard error is to the line of best fit as standard deviation is to the mean).

Statistic: A numerical descriptive measure of a sample.

Statistics: The study of how to collect, organize, analyze, and interpret data.

Statistical hypothesis: An assumption about a population parameter.

Strength of association: This term refers to the degree to which variables are related. Note: this is different from level of significance, which concerns how likely it is that a relationship seen in the sample also holds in the population. Beginning statistics students often confuse these two terms. It is important to remember that if a result is not statistically significant, then the strength of association, no matter how high it is, is not indicative of a relationship between the two variables.

t-distribution: A type of distribution used when your population is known to have a normal distribution but your sample size is small. This distribution differs in shape depending on your sample size, but as the sample size gets large, the t-distribution becomes closer to a normal distribution.

t-test: A test of significance for population means when you have a small sample size.

Trimmed mean: A type of average for which a percentage of the low and high data are omitted before computing the mean (e.g. this is used in Olympic scoring where the low and high rankings are omitted and then a mean is computed on the remaining scores).
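
The Olympic-style trimmed mean described above can be sketched in a few lines of Python. The judges' scores and the function name are invented for illustration, and this version drops a fixed number of values from each end rather than a percentage.

```python
def trimmed_mean(values, trim_each_end=1):
    """Mean after dropping the lowest and highest `trim_each_end` values."""
    ordered = sorted(values)
    kept = ordered[trim_each_end:len(ordered) - trim_each_end]
    if not kept:
        raise ValueError("too few values left after trimming")
    return sum(kept) / len(kept)


# Hypothetical scores from seven judges; 6.5 and one 9.5 are dropped.
judge_scores = [9.5, 8.0, 9.0, 9.0, 6.5, 9.5, 9.0]
print(trimmed_mean(judge_scores))
```

Trimming keeps a single unusually low or high score from pulling the average toward it.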

Type 1 error: An error where you reject the null hypothesis when it should not have been rejected.

Type 2 error: An error where you fail to reject the null hypothesis but it should have been rejected.

Univariate statistic: A statistic which involves a single variable. Examples of a univariate statistic include mean, median, mode, standard deviation, and z (or t) score.

Variables: Units that describe characteristics of the object of statistical analysis. Examples would include age, temperature, attitudes on a scale, etc.

z-score: A standard score which represents how many standard deviations away a given data value is from the mean (for example, if the mean of a set of data is 21 and the standard deviation is 3, then the number 24 would have a z-score of 1 since it is 1 standard deviation away from the mean).
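
The z-score example above (mean 21, standard deviation 3) can be checked with a one-line formula: subtract the mean from the data value and divide by the standard deviation. The function name in this Python sketch is chosen only for illustration.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations a data value lies from the mean."""
    return (value - mean) / std_dev


print(z_score(24, 21, 3))  # 1.0: one standard deviation above the mean
print(z_score(15, 21, 3))  # -2.0: two standard deviations below the mean
```

A negative z-score simply means the data value is below the mean.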

 

 

 

 
