3 New Material: Mean, Standard Deviation, Variance, Z-Score, Summation, and Stem and Leaf Plots

Welcome back!

How did everyone find the exercises? Were they too difficult? Too easy? Were there any parts you didn’t understand? Each week, when an exercise is turned in, I would like to take time at the beginning of lab to make sure everyone is on the same page. Statistics can be quite difficult, especially when you are using an unfamiliar program like R. If you fall behind, it is essential that you let me know so I know what to stress.

First, let us do some review.

What would I need to type into R to have it display a histogram of the dataset: 1,2,3,4,5,6,7,8,9,10?
How could I rename the x-axis on this plot so that it carried more information than it currently does?

Uh-oh! I just finished typing a really long function call and accidentally left out a comma. Is there any way to get back everything I typed without retyping it?
Given the following dataset, create a plot that shows the relationship between x and y, y and x, and z and y:

x = {2,4,6,10,16,26,42} y = {1,3,5,7,9,11,13} z = {5,10,15,20,25,30,35}

Below are two plots. Of the two, which one do you think shows a more apparent relationship or pattern?

Any questions?

Mean

You will see the formula for the arithmetic mean written in several different ways. Here are two of them.

\(\mu = \frac{\sum x}{n}\) or \(M = \frac{x_1 + x_2 + \cdots + x_N}{N}\)

In other words, add all the numbers in the set together and divide the total by the number of items in the set.
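In R, the built-in mean() function does this in one step. The short sketch below checks it against the formula written out with sum() and length(). The vector x is made up purely for illustration.

```r
# Made-up example data (not from the lab)
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# Built-in arithmetic mean
m_builtin <- mean(x)

# The formula written out: add all values, then divide by how many there are
m_formula <- sum(x) / length(x)

print(m_builtin)  # 5
print(m_formula)  # 5
```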

Standard Deviation

\(\sigma = \sqrt{\frac{\sum (x - \mu)^2}{n}}\)
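One point to watch in R: the built-in sd() divides by n - 1 (the sample standard deviation), while the formula above divides by n (the population standard deviation). The following sketch computes both on a made-up vector so you can see the difference. It also shows how to convert sd() to the population version.

```r
# Made-up example data (not from the lab)
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)
mu <- mean(x)

# Population SD, exactly as in the formula: divide by n
sigma_pop <- sqrt(sum((x - mu)^2) / n)

# R's built-in sd() divides by n - 1 (sample SD)
s_sample <- sd(x)

# Rescale the sample SD to the population SD
sigma_from_sd <- s_sample * sqrt((n - 1) / n)

print(sigma_pop)      # 2
print(s_sample)       # about 2.138
print(sigma_from_sd)  # 2
```

With large datasets the two versions are nearly identical. With small ones, such as the practice problems below, the difference is noticeable.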

Variance

Variance is simply the square of the standard deviation, \(\sigma^2\).
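The same n versus n - 1 distinction applies to variance: R's var() returns the sample variance, which equals sd(x)^2. A minimal sketch on a made-up vector:

```r
# Made-up example data (not from the lab)
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)

# Population variance: the average squared deviation from the mean (divide by n)
var_pop <- sum((x - mean(x))^2) / n

# Population SD, whose square is the population variance
sigma_pop <- sqrt(var_pop)

# R's var() divides by n - 1 (sample variance)
var_sample <- var(x)

print(var_pop)       # 4
print(sigma_pop^2)   # 4
print(var_sample)    # about 4.571, the same as sd(x)^2
```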

Z-Score

A z-score tells you how many standard deviations a raw score lies above or below the mean. To find it, subtract the mean \(\mu\) from the raw score \(x\), then divide the result by the standard deviation \(\sigma\).

\(z =\frac{x-\mu}{\sigma}\)

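Base R has no function named for z-scores, but we can build one from the built-in functions for each part of the equation: mean() for \(\mu\) and sd() for \(\sigma\). Note that sd() divides by n - 1 rather than n. The built-in scale() function performs the same sd()-based calculation in a single step.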
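Here is one way to put the pieces together, shown on a made-up vector rather than the practice data. It computes z-scores three ways: with mean() and sd(), with a population SD that matches the formula above, and with scale(). The scale() function returns a one-column matrix, so the sketch converts it to a plain vector.

```r
# Made-up example data (not from the lab)
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)

# z-scores from the built-in pieces (sd() is the sample SD, dividing by n - 1)
z_sample <- (x - mean(x)) / sd(x)

# z-scores matching the population formula, where sigma divides by n
sigma_pop <- sqrt(sum((x - mean(x))^2) / n)
z_pop <- (x - mean(x)) / sigma_pop

# scale() centers with mean() and divides by sd(), giving the same result as z_sample
z_scale <- as.vector(scale(x))

print(z_pop)  # -1.5 -0.5 -0.5 -0.5 0.0 0.0 1.0 2.0
```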

Here are a few practice problems:

Find the mean, standard deviation, and variance for the following dataset: dataset1 = {10,63,51,24,87,42}

Find the z-scores for the same dataset.

Perform the following summations on this dataset:

x = (5,8,1,16,4,11)

\(\sum X^2\)
\((\sum X)^2\)
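The two summations differ only in the order of operations. \(\sum X^2\) squares each value first and then adds, while \((\sum X)^2\) adds first and then squares the total. Parentheses in R work the same way. The sketch below uses a small made-up vector, v, so the exercise itself is left for you.

```r
# Illustrative values (not the exercise data)
v <- c(1, 2, 3)

sum_of_squares <- sum(v^2)  # square each value, then add: 1 + 4 + 9
square_of_sum  <- sum(v)^2  # add first, then square: 6^2

print(sum_of_squares)  # 14
print(square_of_sum)   # 36
```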

Stem and Leaf Plots

Suppose that a psychologist wants to examine the scores of the students in his statistics class. He might create a stem and leaf plot to visualize them.

Here is how he might do this for the following data:

{85,90,88,95,90,91,85,94,83,86,90,90,88,94,90}
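In R this takes a single call to stem(). This is a minimal sketch that stores the scores in a vector we have chosen to call scores. It also prints two counts that back up the observations below.

```r
# Scores from the statistics-class example
scores <- c(85, 90, 88, 95, 90, 91, 85, 94, 83, 86, 90, 90, 88, 94, 90)

# Print the stem and leaf plot to the console
stem(scores)

# Counts that support the observations about the plot
print(sum(scores >= 90))                # scores in the 90s: 9
print(sum(scores >= 80 & scores < 90))  # scores in the 80s: 6
```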

From this output we can infer a few things:

Most of the students (9 of 15) scored in the 90s.
No students failed.
The 80s and the 90s each appear on two lines. This is because, for each stem, the first line holds the leaves 0 to 4 and the second line holds the leaves 5 to 9.

Just as we can change how a histogram is binned with hist(x, breaks = ), we can change how the stem and leaf plot is displayed with stem(x, scale = ).

Watch what happens when we use the different scale values.

The first stem and leaf plot is the default you get by typing stem(x), which is the same as scale = 1. The second spreads the leaves over more lines, making the display roughly twice as long. The third packs the leaves onto fewer lines, so the display is about half as long.
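To try this yourself, pass different values to the scale argument. The values 2 and 0.25 below are illustrative choices. For these scores, the larger value gives more rows and the smaller value gives fewer. The exact number of rows depends on the data as well as on scale.

```r
# Scores from the statistics-class example
scores <- c(85, 90, 88, 95, 90, 91, 85, 94, 83, 86, 90, 90, 88, 94, 90)

stem(scores)                # default, same as scale = 1
stem(scores, scale = 2)     # larger scale: more rows, longer display
stem(scores, scale = 0.25)  # smaller scale: fewer rows, shorter display
```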