Lab 2: Exploring Data
Farmingdale State College
This week we are exploring the Billboard Hot 100 Number Ones Database. This workbook contains substantial data about every song to ever top the Billboard Hot 100 between August 4, 1958 and January 11, 2025. It was compiled by Chris Dalla Riva as he wrote the book Uncharted Territory: What Numbers Tell Us about the Biggest Hit Songs and Ourselves.
This dataset has 105 columns (wow!) and they might not all be needed. Let’s focus on the ones that can be useful!
# Load dplyr for group_by(), count(), ungroup(), and top_n()
library(dplyr)
# Step 1: Read data
billboard <-
  readr::read_csv(
    'https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-08-26/billboard.csv',
    show_col_types = FALSE
  )
# Step 2: Group by Artist
billboard |>
  group_by(artist) |>
  # Step 3: Count, ungroup, and select top 10
  count(sort = TRUE) |>
  ungroup() |>
  # top_n() keeps ties, so more than 10 rows can come back
  top_n(10)
artist | Frequency |
---|---|
The Beatles | |
Mariah Carey | |
Madonna | |
Michael Jackson | |
Whitney Houston | |
Janet Jackson | |
Taylor Swift | |
The Supremes | |
Bee Gees | |
Stevie Wonder | |
The Rolling Stones | |
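The table lists eleven artists even though the code asks for the top 10. One likely reason is that `top_n()` keeps tied rows. The sketch below illustrates that behavior on a small made-up data frame (the artists `A` to `D` and their counts are hypothetical, not Billboard values) and uses `slice_max()`, which the dplyr documentation recommends in place of `top_n()`.

```r
library(dplyr)

# Hypothetical toy data (NOT real Billboard values): one row per number-one song
toy <- tibble(
  artist = c("A", "A", "A", "B", "B", "C", "C", "D")
)

# Count songs per artist; name the count column "Frequency" instead of "n"
counts <- toy |>
  count(artist, sort = TRUE, name = "Frequency")

# Ask for the top 2: B and C tie for second place, so all tied rows are kept
top_with_ties <- counts |>
  slice_max(Frequency, n = 2)

# with_ties = FALSE drops the extra tied rows and returns exactly 2
top_exact <- counts |>
  slice_max(Frequency, n = 2, with_ties = FALSE)

nrow(top_with_ties)
nrow(top_exact)
```

Whether to keep ties is a judgment call: keeping them is fairer to the tied artists, while dropping them guarantees a fixed number of rows.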
Given the following dataset, find the mean:
\(\{1,3,5,4,3,2,1,4,5\}\)
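As a way to check your work, here is a minimal sketch in R. The mean is the sum of the values divided by how many values there are, \(\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i\). The variable names `x`, `total`, `n`, and `x_bar` are chosen here for illustration only.

```r
# The dataset from the question
x <- c(1, 3, 5, 4, 3, 2, 1, 4, 5)

total <- sum(x)     # add up all the values
n <- length(x)      # count how many values there are
x_bar <- total / n  # the mean, computed by hand

# The built-in mean() should agree with the by-hand result
all.equal(x_bar, mean(x))
```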