Sampling Theory

Lecture 9

Dave Brocker

Farmingdale State College

Standard Normal Distribution

The 4 properties of a standard normal distribution (mean = 0, standard deviation = 1) are:

It is shaped like a bell (“bell curve”).
It is symmetric.
It is unimodal.
Its mean = median = mode.
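These properties can be checked numerically. The Python sketch below uses the standard library's `statistics.NormalDist` (Python 3.8+) for the standard normal curve. The particular x-values tested are arbitrary choices for illustration.

```python
from statistics import NormalDist

# The standard normal distribution: mean 0, standard deviation 1.
z = NormalDist(mu=0, sigma=1)

# Symmetric: the curve has the same height at -x and +x.
symmetric = all(z.pdf(-x) == z.pdf(x) for x in (0.5, 1.0, 1.5, 2.0))

# Median equals the mean: exactly half of the area lies below 0.
area_below_mean = z.cdf(0)   # 0.5
median = z.inv_cdf(0.5)      # 0.0

# Unimodal with the mode at the mean: the peak at 0 is higher than points on either side.
peak_at_mean = z.pdf(0) > z.pdf(0.1) and z.pdf(0) > z.pdf(-0.1)

print(symmetric, area_below_mean, median, peak_at_mean)
```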

Statistics

Descriptive Statistics:
- Goal: Describe the sample
  - Examples: Mean, Standard Deviation
Inferential Statistics:
- Goal: Use the sample to make inferences about the population
  - Examples: t-test, ANOVA, Regression

Sampling theory

Professor Brocker still wants to know how much adults in the US enjoy the Netflix Original, Dark. He has unlimited funds to study this very important research question. He hires his 24 students to collect the data. Each student has to collect 500 responses to the following question:

Sampling theory

On a scale of 1 (I hate it with my entire being) to 10 (I believe in my soul that Dark is the best show ever made), how much do you enjoy Dark?

Sampling Theory

Each one of you asks 500 people how much they enjoy the Netflix Original Dark.
This is you on the street asking people:

Sampling Theory

How many people did each of you ask?

S1	S2	S3	S4	S5	S6	S7	S8	S9	S10	S11	S12	S13	S14	S15	S16	S17	S18	S19	S20	S21	S22	S23	S24	S25	S26	S27	S28	S29	S30
7	3	4	1	4	7	2	6	7	4	2	4	2	5	5	2	3	7	6	4	5	6	7	4	3	7	2	3	1	1
6	5	2	1	3	3	5	6	3	6	4	3	4	3	5	7	7	7	5	5	2	7	1	4	3	4	7	5	1	5
7	2	2	5	4	5	6	6	4	7	3	5	3	4	7	4	6	7	7	1	1	1	2	6	4	2	1	3	3	7
4	1	2	5	3	1	2	5	7	3	5	5	2	3	3	7	4	2	5	1	4	7	1	1	5	5	1	4	4	4
4	6	3	6	7	1	6	6	3	7	4	1	7	6	3	1	6	1	4	7	2	7	2	5	7	6	4	3	6	3
5	5	3	5	3	7	1	4	4	7	2	5	7	7	3	6	6	5	4	4	4	7	1	6	6	1	7	3	7	2

Sampling theory

Once each of you has collected 500 responses, I ask you to calculate the average answer. So you calculate the mean: you add up all 500 responses, then divide that total by 500.

\[M = \frac{\sum_{i=1}^{500} x_i}{500} = \frac{x_1 + x_2 + \cdots + x_{500}}{500}\]
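A minimal Python sketch of this formula, assuming the responses are stored in a plain list. `sample_mean` is a hypothetical helper name. With a real sample the list would hold 500 responses; the demo uses the six responses listed under S1 in the response table earlier in this lecture.

```python
def sample_mean(responses):
    """Sample mean M: add up every response, then divide by the number of responses."""
    if not responses:
        raise ValueError("a sample needs at least one response")
    return sum(responses) / len(responses)

# Column S1 from the example table (6 responses instead of 500).
s1 = [7, 6, 7, 4, 4, 5]
m_s1 = sample_mean(s1)
print(m_s1)  # 5.5
```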

Sampling theory

These are the means from each of your samples.
They are called Sample Means.

Sampling theory

We can do lots of cool things with Sample Means. We could calculate the Mean of the Sample Means. We could calculate the standard deviation of the Sample Means.
But…it’s theoretical. It doesn’t really exist, but we imagine its existence for the sake of Sampling Theory.
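To make “the Mean of the Sample Means” and “the standard deviation of the Sample Means” concrete, this Python sketch treats each column of the example response table (S1 to S30) as one student's sample. Keep the assumption in mind: each column holds only 6 responses with values from 1 to 7, not 500 responses on the 1 to 10 scale, so the numbers only illustrate the mechanics, and 30 samples are a tiny slice of the theoretical sampling distribution.

```python
from statistics import mean, stdev

# The six rows of the example table; each column (S1..S30) is one student's sample.
rows = [
    [7, 3, 4, 1, 4, 7, 2, 6, 7, 4, 2, 4, 2, 5, 5, 2, 3, 7, 6, 4, 5, 6, 7, 4, 3, 7, 2, 3, 1, 1],
    [6, 5, 2, 1, 3, 3, 5, 6, 3, 6, 4, 3, 4, 3, 5, 7, 7, 7, 5, 5, 2, 7, 1, 4, 3, 4, 7, 5, 1, 5],
    [7, 2, 2, 5, 4, 5, 6, 6, 4, 7, 3, 5, 3, 4, 7, 4, 6, 7, 7, 1, 1, 1, 2, 6, 4, 2, 1, 3, 3, 7],
    [4, 1, 2, 5, 3, 1, 2, 5, 7, 3, 5, 5, 2, 3, 3, 7, 4, 2, 5, 1, 4, 7, 1, 1, 5, 5, 1, 4, 4, 4],
    [4, 6, 3, 6, 7, 1, 6, 6, 3, 7, 4, 1, 7, 6, 3, 1, 6, 1, 4, 7, 2, 7, 2, 5, 7, 6, 4, 3, 6, 3],
    [5, 5, 3, 5, 3, 7, 1, 4, 4, 7, 2, 5, 7, 7, 3, 6, 6, 5, 4, 4, 4, 7, 1, 6, 6, 1, 7, 3, 7, 2],
]

# One Sample Mean per column (per student).
sample_means = [mean(column) for column in zip(*rows)]

# The Mean of the Sample Means and the standard deviation of the Sample Means.
mean_of_means = mean(sample_means)
sd_of_means = stdev(sample_means)

print(len(sample_means), round(mean_of_means, 3), round(sd_of_means, 3))
```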

Sampling distribution

A Sampling Distribution is the theoretical distribution of means across every single possible sample.

For example, all the Dark samples.
If we took samples of the population until we got every single person in the population, then calculated the mean for each sample, those Sample Means would be distributed like this:

Sampling theory

Except now each value in this distribution no longer represents 1 person/participant.
Each value in this distribution represents the average of 1 sample, a Sample Mean.

What does the Stats Cat say?


Review

What is a sampling distribution?
What does each X value in a sampling distribution represent?
What does \(\mu\) represent?
What does \(\sigma\) represent?

Probability

What percentage of participants rated Dark with a z-score of 2 or HIGHER?

Probability

This is the distribution of sample means from adults in the US.
What is the probability of any sample having a mean that is a z-score of 2 or HIGHER?

THIS IS the key point

We assume that the distribution of sample means is normal.
We use that assumption to gauge the probability of getting a particular mean from a single sample.
We can literally find that probability, the same way we did with x-values in a normal distribution.
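Here is that probability worked out in Python with `statistics.NormalDist`. The first part answers the z ≥ 2 question (about 2.3%). The second part uses made-up numbers (a sampling distribution centred at \(\mu = 6\) with a standard error of 0.5, and one sample mean M = 7) to show how a sample mean is turned into a z-score, \(z = \frac{M - \mu}{SE}\), before looking up the probability.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, SD 1

# P(z >= 2): the area in the upper tail beyond z = 2.
p_z_2_or_higher = 1 - standard_normal.cdf(2)  # about 0.0228

# Hypothetical values for illustration only.
mu, se, m = 6.0, 0.5, 7.0
z = (m - mu) / se                   # convert the sample mean to a z-score: 2.0
p_mean_this_high = 1 - standard_normal.cdf(z)

print(round(p_z_2_or_higher, 4), z, round(p_mean_this_high, 4))
```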

Example

We want to know how much people like pizza.

There are 12,500 people in our population. Each of the 25 of us collects a sample of 500.
\(500 \times 25 = 12,500\)
Each of us calculates the mean response from our sample of 500 people.
We plot those means and it gives us a normal distribution.
What do we call those means?
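One property of this setup can be checked exactly. In the Python sketch below, the population of 12,500 ratings is made up (random integers from 1 to 10), and the code splits it into 25 non-overlapping samples of 500. Because every sample is the same size and together they cover everyone, the mean of the 25 Sample Means equals the population mean \(\mu\). Twenty-five means are only a small picture of the sampling distribution, so their histogram will not necessarily look perfectly normal.

```python
import random
from statistics import mean

random.seed(9)  # fixed seed so the sketch is reproducible

# Hypothetical population: 12,500 pizza ratings on a 1-10 scale (made-up data).
population = [random.randint(1, 10) for _ in range(12_500)]

# Shuffle a copy, then split it into 25 non-overlapping samples of 500 people each.
shuffled = population[:]
random.shuffle(shuffled)
samples = [shuffled[i:i + 500] for i in range(0, len(shuffled), 500)]

# One Sample Mean per sample.
sample_means = [mean(s) for s in samples]

print(len(samples), round(mean(population), 3), round(mean(sample_means), 3))
```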

Pizza Plot

Visual Example

Example

I randomly choose one mean from our distribution of sample means about how much people like pizza.

What is the probability of picking a mean with a z-score of -1 or less?
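A quick Python check of this one, again using the standard normal from `statistics.NormalDist`: the area to the left of z = -1 is about 15.9%, and by symmetry it equals the area to the right of z = +1.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, SD 1

# P(z <= -1): the area under the curve to the LEFT of z = -1.
p_z_minus_1_or_less = standard_normal.cdf(-1)

# Symmetry check: the same area sits to the RIGHT of z = +1.
p_z_1_or_more = 1 - standard_normal.cdf(1)

print(f"{p_z_minus_1_or_less:.4f}")  # 0.1587
```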

Sampling theory

Sampling Theory is the body of principles underlying the drawing of infinite samples that accurately represent the population from which they are taken and to which inferences can be made.

Sampling theory as the basis for inferential statistics

Independent variable:
- Experimental Group: Super secret limitless drug
- Control Group: Placebo
Dependent variable: IQ

Sampling theory

Sampling Theory is the idea that if we took infinite samples of a population, the means of those samples would form a normal distribution.

And because those Sample Means form a normal distribution, we can estimate the probability of getting any specific mean.
We will use this probability to gauge the significance of our inferential statistics.

Sampling theory

The mean of a sample is notated as M.
The mean of a population is notated as \(\mu\).

Sampling theory

The dispersion of a sample is known as the standard deviation, or SD, or sometimes just s.
The dispersion of a population is also a standard deviation, written \(\sigma\) (sigma), but we almost never know it directly. The dispersion of a sampling distribution (how much the Sample Means vary from sample to sample) is called the Standard Error, or SE.
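For reference, the standard error of the mean is linked to the population standard deviation by \(SE = \frac{\sigma}{\sqrt{n}}\), where n is the size of each sample. This Python sketch computes that value for a hypothetical population (\(\mu = 5\), \(\sigma = 2\), samples of n = 400; all made-up numbers) and then checks it by simulation: draw 2,000 samples, take each Sample Mean, and measure how spread out those means are. `standard_error` is a hypothetical helper name.

```python
import math
import random
from statistics import stdev

def standard_error(sigma, n):
    """Standard error of the mean: population SD divided by the square root of the sample size."""
    return sigma / math.sqrt(n)

# Hypothetical population: normally distributed ratings with mu = 5 and sigma = 2.
MU, SIGMA, N = 5.0, 2.0, 400
se = standard_error(SIGMA, N)  # 0.1

# Simulation check: draw many samples of size N and record each Sample Mean.
random.seed(1)
sample_means = []
for _ in range(2_000):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    sample_means.append(sum(sample) / N)

# The SD of the Sample Means should land close to the formula's standard error.
simulated_se = stdev(sample_means)

print(se, round(simulated_se, 3))
```

Because n sits under a square root in the denominator, quadrupling the sample size cuts the standard error in half.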

Sampling theory, but candy

This is a jar of 600 pieces of candy.

If I took scoops of 25 pieces of candy at a time, on average how many orange pieces would I get per scoop?
I have a scoop of 25 pieces of candy, and 6 of them are orange.

Sampling theory, but candy

[Insert two peaks with MM]

Did this scoop come from the jar of M&M’s?
- Probably!
I have a scoop of 25 pieces of candy, and all 25 of them are orange.
Did this scoop come from the jar of M&M’s?
- Probably not
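The slides don't say how many orange pieces are in the jar, so this Python sketch assumes a hypothetical 120 of the 600 (20%). Scooping 25 pieces without putting any back is a hypergeometric draw, so `p_orange(k)` (a hypothetical helper) gives the chance of exactly k orange pieces in one scoop. Under that assumption the average scoop has 5 orange pieces, a scoop with 6 orange pieces is common, and a scoop of 25 orange pieces is astronomically unlikely, which matches the “Probably!” and “Probably not” answers.

```python
from math import comb

JAR_SIZE = 600
SCOOP = 25
ORANGE_IN_JAR = 120  # hypothetical: assumes 20% of the jar is orange

def p_orange(k, jar=JAR_SIZE, orange=ORANGE_IN_JAR, scoop=SCOOP):
    """Hypergeometric probability that a scoop (no replacement) has exactly k orange pieces."""
    return comb(orange, k) * comb(jar - orange, scoop - k) / comb(jar, scoop)

expected_orange = SCOOP * ORANGE_IN_JAR / JAR_SIZE  # 5.0
p_six = p_orange(6)       # a scoop with 6 orange pieces
p_all_25 = p_orange(25)   # a scoop that is entirely orange

print(expected_orange, round(p_six, 3), p_all_25)
```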

P-values

### Hypothesis testing, Pt 1

Sampling theory

There are about 10,000 students at Farmingdale State College (FSC).
Each of the 25 of us recruits a sample of 400 students.

We ask every single FSC student to rate their sense of belonging on the FSC campus on a scale of 1 (I don’t belong at all) to 10 (I belong completely).

We each calculate the average response from our own sample of 400.

Sampling theory

Now we have 25 samples of 400 FSC students each, which equals the full student population of 10,000.
We take 25 samples of 400 students each from another college…John Jacob Jinglehymer Smith University. JJJSMU also has 10,000 students.
We ask the JJJSMU students the same question about sense of belonging.
We find the mean of each of the 25 samples from JJJSMU.

Sampling Theory as The Basis for Inferential Statistics

What is a hypothesis?

A hypothesis is a testable prediction of what will happen in our experiment that:

Names the variables (independent and dependent)
Clearly contrasts the groups

Hypothesis:

Example

Professor Brocker wants to know if Millennials enjoy the Netflix Original, Dark significantly more than Gen Z. He recruits 500 Millennials and 500 Zoomers and asks them to rate Dark on a scale of 1 to 10 (10 being fantastic).

Hypothesis: Millennials will rate their enjoyment of Dark as significantly higher than their Gen Z peers.


What is a Hypothesis?

An Alternative Hypothesis is a testable prediction of what will happen in our experiment that:

Names the variables (independent and dependent)
Clearly contrasts the groups.

Alternative hypothesis

An Alternative Hypothesis is a testable prediction of what will happen in our experiment that names the variables (independent and dependent) and clearly contrasts the groups.

The Alternative Hypothesis is written as \(H_1\)

Null Hypothesis

The Null Hypothesis, written \(H_0\), states that nothing will happen.

Because Null means zero, nothing, nada.

The Null Hypothesis states that nothing will happen while also:

Naming the variables (independent and dependent)

Alternative & Null Hypotheses:

Example

Professor Brocker’s Dark Experiment:

Alternative Hypothesis: Millennials will rate their enjoyment of Dark as significantly higher than their Gen Z peers.
Null Hypothesis: Millennials and Gen Z will not differ in their rating of enjoyment of Dark.

Hypothesis testing

Null Hypothesis: There is no difference in the DV between the IV groups.
Alternative Hypothesis: The experimental group is significantly different from the control group on the DV.

Hypothesis Testing

Example 1

Dr. Apriceno wants to know if giving her students coffee will improve their exam scores. She randomly assigns 13 of her 26 students to drink a doubleshot; she calls this the experimental group. The other 13 students drink decaf (a placebo); she calls this the control group.

\(H_0\):

\(H_1\):

Hypothesis Testing

Example 2

Esmeralda gives an antidepressant to 100 individuals suffering from depression. She gives another 100 individuals a placebo. After 2 months, she measures their depression.

\(H_0\):

\(H_1\):

Hypothesis Testing

Practice

Jonas assigns half of the participants to engage in aerobic exercise for one hour a day 5 days a week for 6 months. The other half of the participants do not exercise for 6 months. At the end of the 6 months, Jonas measures the participants’ working memory capacity.

\(H_0\):

\(H_1\):

[Make big text]
## Better to be a reject than a failure

Rejects and Failures

In science, we do not say that we proved anything.
Nothing is ever really proven.
Our findings will be stated in terms of the Null Hypothesis.
The Null Hypothesis is that there are no differences between the groups.

Hypothesis testing practice

Jonas assigns half of the participants to engage in aerobic exercise for one hour a day 5 days a week for 6 months. The other half of the participants do not exercise for 6 months. At the end of the 6 months, Jonas measures the participants’ working memory capacity.

\(H_0\): There are no differences in working memory capacity between the experimental and control groups.

Rejects and Failures

In science, we do not say that we proved anything.
Our findings will be stated in terms of the Null Hypothesis.
If there are significant differences between the groups, we Reject \(H_0\).
- Rejecting is good.
If there are NO significant differences between the groups, we Fail to Reject \(H_0\).

Hypothesis testing

[Include YODA picture or something similar]

Hypothesis Testing

When reporting your findings, you must state them in terms of the Null Hypothesis. We do not mention the Alternative Hypothesis.

If the groups differ significantly, we reject the null hypothesis: Reject \(H_0\).
If the groups do NOT differ significantly, we fail to reject it: Fail to Reject \(H_0\).
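The reporting rule can be written as a tiny decision function. This Python sketch assumes the statistical test hands you a p-value and uses a significance level of .05, a common convention that these slides have not introduced yet; `report` is a hypothetical helper name.

```python
ALPHA = 0.05  # assumed significance level (common convention, not set on these slides)

def report(p_value, alpha=ALPHA):
    """State a finding in terms of the null hypothesis, given the test's p-value."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p-value must be between 0 and 1")
    if p_value < alpha:
        return "Reject H0"          # the groups differ significantly
    return "Fail to Reject H0"      # no significant difference; never "accept" or "prove"

print(report(0.01))  # Reject H0
print(report(0.30))  # Fail to Reject H0
```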

Practice

Jonas assigns half of the participants to engage in aerobic exercise for one hour a day 5 days a week for 6 months. The other half of the participants do not exercise for 6 months. At the end of the 6 months, Jonas measures the participants’ working memory capacity. His test is not significant.

State Jonas’s findings in terms of the null hypothesis:

Hypothesis Practice

Practice

Brendan assigns half of the participants to view a picture of a face on a mortuary table (control condition). The other half of the participants view an image of their own face made to look old using an aging filter (experimental condition). Brendan then measures all participants’ anxiety about dying and runs a t-test, which is statistically significant.

State Brendan’s findings in terms of the null hypothesis:
