The goal of every statistical study is to collect data and then use that data to make a decision. The results of the study will only be as good as the process used to obtain the data. If the data collection is flawed or biased, the results of the study will be questionable, thus diminishing the validity of the study.

Collecting Data

There are several ways to collect data. The following are four popular ways to obtain data. The appropriate choice will be based upon the nature of the study being undertaken.

1. Take a total count (a census): count, or measure, an entire population. While a census yields valuable information, it is often difficult (and costly) to reach an entire population (be it people, animals, plants, brands, etc.).
2. Collect a sample: count, or measure, only a part of a population. This is easier to accomplish than a census, and the results are used to predict the results for the entire population. Beware of using biased data.
3. Create a simulation: reproduce an event, often on a computer, that would be impractical or dangerous to create in real life.
4. Perform an experiment: an action is performed, and responses observed and recorded. A control group (to which the action is not applied) may be used for comparison purposes.
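As a rough illustration of item 3, the sketch below simulates an event that would be tedious to carry out by hand: rolling two dice 100,000 times and recording how often the sum is 7. The function name `simulate_sum_of_seven`, the number of rolls, and the seed are all assumptions chosen for this example.

```python
import random

def simulate_sum_of_seven(num_rolls, seed=None):
    """Simulate rolling two six-sided dice num_rolls times.

    Returns the fraction of rolls whose sum is 7.
    """
    rng = random.Random(seed)  # a seed makes the simulation repeatable
    sevens = 0
    for _ in range(num_rolls):
        total = rng.randint(1, 6) + rng.randint(1, 6)
        if total == 7:
            sevens += 1
    return sevens / num_rolls

# The exact probability of a sum of 7 is 6/36 = 1/6, about 0.167.
estimate = simulate_sum_of_seven(100_000, seed=1)
print(f"Estimated P(sum = 7): {estimate:.3f}")
```

With this many rolls, the estimate should land close to 1/6, though the exact value printed depends on the seed.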


Random Sampling

When conducting statistical studies,
having a "random" sample of data helps to avoid the creation of bias.

Definition: A simple random sample is a subset of a statistical population in which each member of the population has an equal chance of being chosen.

For example, you conduct a survey to determine the favorite TV program viewed by freshman students, but you only survey the football team. Your survey then shows the favorite TV program to be a sports program.

Your survey is biased. The students answering your survey were a "select" group of students, not a random group of students. A random group would give you a result more representative of ALL freshman students.
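The effect described above can be sketched with made-up numbers. In the example below, the class size, the team size, and every TV preference are invented purely for illustration; `favorite` is a hypothetical helper that returns the most common answer in a list.

```python
import random
from collections import Counter

def favorite(programs):
    """Return the most frequently named program type."""
    return Counter(programs).most_common(1)[0][0]

# Invented data: 40 football players and 360 other freshmen.
football_team = ["sports"] * 36 + ["comedy"] * 4
other_freshmen = ["sports"] * 90 + ["comedy"] * 150 + ["drama"] * 120
all_freshmen = football_team + other_freshmen

# Surveying only the football team (a "select" group).
print("Football team only:", favorite(football_team))

# Surveying every freshman (the whole population).
print("All freshmen:", favorite(all_freshmen))

# Surveying 40 freshmen chosen purely by chance.
rng = random.Random(3)
random_group = rng.sample(all_freshmen, 40)
print("Random group of 40:", favorite(random_group))
```

In this invented data set, the football team favors sports programs while the class as a whole favors comedy, so the team-only survey misrepresents the freshman class. The random group's answer can vary from draw to draw, but it is not tied to any one subgroup.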

Random Samples: No favoritism is shown. The selection is purely by chance.
Random sampling makes it likely that the sample chosen is representative of the population
and ensures that the sample is selected in an unbiased way.

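A more representative sample for the TV survey mentioned above could be obtained by
surveying freshman students as they enter the school building, provided that no particular group of freshmen is more likely to be asked than any other.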

Other methods of random sampling may include drawing names from a hat, assigning and drawing numbers, using random number generators, and using a random number table.
Using a random number table is discussed in Algebra 2.
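As a minimal sketch of the random number generator approach, the example below draws a simple random sample from a class roster using Python's standard `random` module. The roster of 200 numbered students, the sample size of 20, and the function name `simple_random_sample` are assumptions made for this example.

```python
import random

def simple_random_sample(population, sample_size, seed=None):
    """Choose sample_size members so that every member of the
    population has an equal chance of being chosen (no repeats)."""
    rng = random.Random(seed)  # a seed makes the draw repeatable
    return rng.sample(population, sample_size)

# Hypothetical roster of 200 freshman students.
freshmen = [f"Student {i}" for i in range(1, 201)]

survey_group = simple_random_sample(freshmen, 20, seed=42)
print(survey_group)
```

This is the computer equivalent of drawing names from a hat: each name can be drawn only once, and no name is favored over another.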



Population Data and Sample Data

When dealing with statistical data, it is important to distinguish between
"population" data sets and "sample" data sets.

Definition: A population data set contains all members of a specified group (the entire list of possible data values).
Example: The population may be "ALL people living in the US."

In statistics, a "population" is the entire set of items possible, be it people, objects, animals, plants, clubs, phones, etc. Unlike the everyday meaning of "population", which refers to people, in statistics the term refers to all of the "items/people" that could possibly belong to the data set.

Population:
All students in your high school who have blonde hair.

Collecting this data is a doable task. The students who make up the population are in one local area, which makes collecting information from them possible.

Population: All high school students in the United States who have blonde hair.
Collecting this data is likely a problem. The students of the population are scattered across the United States, making their availability questionable.

CONCLUSION: It is easy to see why securing data from a broad population might be difficult, if not impossible.

Use "population data" only when you know you are working with the ENTIRE population.

 

When a population is extremely large and it is difficult (if not impossible) to secure information from each participant, statisticians choose to work with a representative "sample" of the population.

Definition: A sample data set contains a part, or a subset, of a population. The size of a sample is always less than the size of the population from which it is taken.
Example: The sample may be "SOME people living in the US."

A "sample" is a smaller, more manageable portion of the larger population.
When choosing a "sample", it is important to make sure that the sample reflects the characteristics of the population. Avoid choosing a sample that reflects only one specific portion of the population, thus creating a biased data set.

Population: All high school students in the United States.

Viable Sample: All students in a large diverse high school in your state.
This "sample" may be a good choice for a statistical study.
The size and the diversity lend themselves to a good unbiased sample data set.

Biased Sample: All students in a small private high school in your state.

This "sample" will most likely not be a good representative choice for a statistical study.
The small size and the association with only a private school lend themselves to a biased sample data set.

CONCLUSION: When choosing a "sample" data set, it is crucial that the choice reflects the characteristics of the population. Avoid a biased data set, as it will destroy the validity of the study.

Use "sample data" when you have a sample of a larger population, and you wish to generalize your findings from this sample to the entire larger population from which this sample was taken. The sample will be used as an estimate of the population.
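The idea of using a sample as an estimate of the population can be sketched as follows. The population here is invented data (weekly hours of TV watched by 5,000 students, generated at random), and the sample size of 200 is an arbitrary choice for this example.

```python
import random
import statistics

rng = random.Random(7)

# Invented population: weekly TV hours for 5,000 students.
population = [rng.randint(0, 30) for _ in range(5000)]

# A random sample of 200 students drawn from that population.
sample = rng.sample(population, 200)

population_mean = statistics.mean(population)  # uses ALL of the data
sample_mean = statistics.mean(sample)          # uses only a part of the data

print(f"Population mean: {population_mean:.2f} hours")
print(f"Sample mean (estimate): {sample_mean:.2f} hours")
```

In a real study the population mean is usually unknown, which is why the sample mean is used to estimate it. Here both are computed only so that the two values can be compared.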


There is one situation that changes these rules:
If you have a "sample" of a larger population, but you are only interested in this sample (and you will not be generalizing your findings to the entire larger population), you should consider your "sample" as the "population data" for this one case.

 

Remember: When working with statistics, it is important to know if you are working with an entire population (where you have ALL of the possible data), or if you are working with only a sample (a part) of the data.


Some questions will clearly state whether you are working with a population or a sample. If no statement is present, ask yourself if the statistical findings will be used to describe a larger group.
If the answer is yes, you are working with a sample.
Real-world statisticians primarily work with samples,
since real-world populations can be overwhelmingly large.

For more information and formulas on population data and sample data,
see "Population" Versus "Sample" Data




