Create a treemap visualisation (kinda like a pie chart but better, in my opinion).
[Image: a treemap from Information is Beautiful, with rectangles sized to show the differences between very large numbers.]
Treemaps are great for visualising hierarchical data. I was inspired by the informative and beautiful "The Billion Dollar-o-Gram 2009" treemap at Information is Beautiful.
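The data comes from the TidyTuesday project, in particular Week 11 of 2021.
It is data about the Bechdel Test. As per the TidyTuesday README, a movie passes the test if it meets all three of these criteria:
1. It has at least two (named) women in it,
2. who talk to each other,
3. about something besides a man.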
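# packages used throughout this post
library(tidyverse) # dplyr, tidyr, stringr, readr and the pipe
library(treemap)   # static treemaps
library(d3treeR)   # interactive treemaps via d3tree()
raw_bechdel <- readr::read_csv(glue::glue('https://raw.githubusercontent.com',
                                          '/rfordatascience/tidytuesday/master/data',
                                          '/2021/2021-03-09/raw_bechdel.csv'))
movies <- readr::read_csv(glue::glue('https://raw.githubusercontent.com',
                                     '/rfordatascience/tidytuesday/master/data',
                                     '/2021/2021-03-09/movies.csv'))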
Let’s get a feel for the data we’re working with.
# make the output reproducible
set.seed(2187)
movie_sample <- movies %>%
  slice_sample(n = 10)
movie_sample %>%
  DT::datatable(filter = 'top',
                options = list(scrollX = TRUE, pageLength = 5),
                caption = htmltools::tags$caption(
                  style = 'caption-side: bottom; text-align: center;',
                  'Table Name: ', htmltools::em('Movie Data')
                ))
movies %>%
  count(clean_test, binary, sort = TRUE) %>%
  gt::gt()
clean_test | binary | n |
---|---|---|
ok | PASS | 803 |
notalk | FAIL | 514 |
men | FAIL | 194 |
dubious | FAIL | 142 |
nowomen | FAIL | 141 |
movies %>%
  count(genre, binary, clean_test) %>%
  DT::datatable(filter = 'top',
                options = list(scrollX = TRUE, pageLength = 5),
                caption = htmltools::tags$caption(
                  style = 'caption-side: bottom; text-align: center;',
                  'Table Name: ', htmltools::em('Test criteria for pass/fail')
                ))
raw_bechdel %>%
  count(rating) %>%
  gt::gt()
rating | n |
---|---|
0 | 894 |
1 | 1940 |
2 | 896 |
3 | 5109 |
movies %>%
  inner_join(raw_bechdel, by = "imdb_id") %>%
  select(imdb_id, "title" = "title.x",
         "year" = "year.x",
         country, genre, rating) %>%
  distinct() %>%
  drop_na() %>%
  slice_sample(n = 50) %>%
  DT::datatable(filter = 'top',
                options = list(scrollX = TRUE, pageLength = 5),
                caption = htmltools::tags$caption(
                  style = 'caption-side: bottom; text-align: center;',
                  'Table Name: ', htmltools::em('Ratings for movies')
                ))
For a treemap you need to provide:
dtf: a data frame.
index: the column(s) in your dataset that represent your group(s), ordered from the top level of the hierarchy down.
vSize: the column that determines the size of each rectangle.
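To make the three arguments concrete, here is a minimal sketch on made-up data. The data frame `toy_counts` and its numbers are hypothetical and only there for illustration; the real data is used further down.

```r
library(treemap)

# Hypothetical toy data: one row per group plus a numeric size column
toy_counts <- data.frame(
  genre = c("Drama", "Comedy", "Action"),
  n = c(50, 30, 20),
  stringsAsFactors = FALSE
)

# dtf is the data, index names the grouping column,
# vSize names the column that sets each rectangle's area
toy_tm <- treemap(
  dtf = toy_counts,
  index = "genre",
  vSize = "n",
  type = "index",
  title = "Toy example: movies per genre"
)
```

Besides drawing the plot, treemap() returns a list, which is why the result can be stored and reused later.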
Perhaps we’re interested in seeing how many films each genre has. A movie may be categorised under multiple genres, so let’s give each genre its own row (a tip from David Robinson’s screencasts). For example, the movie The Reader is listed as Drama, Romance, so after this step it will have two rows instead of one, one for each genre.
movies %>%
  # split each into a separate row with each genre listed separately
  # creates multiple rows for a movie broken down by each genre it
  # belongs to.
  separate_rows(genre, sep = ",") %>%
  mutate(genre = str_squish(genre)) %>%
  filter(!is.na(genre)) %>%
  count(genre, sort = TRUE) %>%
  DT::datatable(filter = 'top',
                options = list(scrollX = TRUE, pageLength = 8),
                caption = htmltools::tags$caption(
                  style = 'caption-side: bottom; text-align: center;',
                  'Table Name: ', htmltools::em('Number of movies per Genre')
                ))
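To see the row-splitting step on its own, here is a small sketch on a made-up two-movie data frame. The name `toy_movies` and the second film are hypothetical; only The Reader and its genres come from the example above.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical toy data (not the real TidyTuesday data)
toy_movies <- data.frame(
  title = c("The Reader", "Toy Film"),
  genre = c("Drama, Romance", "Comedy"),
  stringsAsFactors = FALSE
)

toy_long <- toy_movies %>%
  # one row per genre: "Drama, Romance" becomes "Drama" and " Romance"
  separate_rows(genre, sep = ",") %>%
  # trim the leading space left behind by the split
  mutate(genre = str_squish(genre))

toy_long
# The Reader now appears twice (Drama, Romance); Toy Film once (Comedy)
```

Splitting on "," and then squishing keeps the separator simple while still cleaning up the whitespace after each comma.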
movies %>%
  # split each into a separate row with each genre listed separately
  # creates multiple rows for a movie broken down by each genre it
  # belongs to.
  separate_rows(genre, sep = ",") %>%
  mutate(genre = str_squish(genre)) %>%
  filter(!is.na(genre)) %>%
  count(genre) %>%
  treemap(
    # simple treemap with one group
    index = "genre",
    # size of rect = number of movies in each category
    vSize = "n",
    type = "index",
    # make size of title and labels larger
    fontsize.title = 30,
    # specify size of labels in order of group, sub group etc.
    fontsize.labels = c(24)
  )
That’s nice, clean and pretty! But we can also change the title etc. to make it more appealing.
Let’s say we just want to see each genre split by pass and fail. To achieve this we give index the two groups we’re interested in, in order. In the code below it is this line:
index = c("genre", "binary")
which specifies that each rectangle first represents a genre and is then subdivided by whether the movies passed or failed the Bechdel test.
# package to allow me to add the font I want to use
library(showtext)
font_add(family = "segoeui", regular = r"(C:\Windows\Fonts\segoeui.ttf)")
showtext_auto()
# install palettes
# devtools::install_github("sciencificity/werpals")
# I am using a palette I created
palette_prov <- werpals::nature_palettes[["provence"]]
movies %>%
  separate_rows(genre, sep = ",") %>%
  mutate(genre = str_squish(genre)) %>%
  filter(!is.na(genre)) %>%
  count(genre, binary, sort = TRUE) %>%
  DT::datatable(filter = 'top',
                options = list(scrollX = TRUE, pageLength = 5),
                caption = htmltools::tags$caption(
                  style = 'caption-side: bottom; text-align: center;',
                  'Table Name: ', htmltools::em('Number of Pass / Fail per Genre')
                ))
movies %>%
  separate_rows(genre, sep = ",") %>%
  mutate(genre = str_squish(genre)) %>%
  filter(!is.na(genre)) %>%
  count(genre, binary) %>%
  treemap(
    # treemap with a level of hierarchy
    index = c("genre", "binary"),
    vSize = "n",
    type = "index",
    title = "Number of Pass vs Fail of Bechdel test in each Genre",
    fontsize.title = 30,
    # specify size of labels in order of group, sub group etc.
    fontsize.labels = c(24, 20),
    # what font to use (must match the family registered with font_add)
    fontfamily.title = "segoeui",
    fontfamily.labels = "segoeui",
    # align the labels again in order
    align.labels = list(
      # group 1
      c("center", "top"),
      # sub group 1
      c("center", "center")
    ),
    lowerbound.cex.labels = 0.5, # multiplier for when labels drawn
    palette = palette_prov
  )
Let’s have a look at the gross income. The field we’ll use is:
Name of Field | Type | Description |
---|---|---|
intgross_2013 | character | International gross normalized to 2013 |
movies_summ <-
  # set up our dataset for the treemap
  movies %>%
  # it is a character field so let's convert it
  mutate(intgross_2013 = parse_number(intgross_2013)) %>%
  separate_rows(genre, sep = ",") %>%
  mutate(genre = str_squish(genre)) %>%
  filter(!is.na(genre)) %>%
  group_by(genre, binary) %>%
  # add median gross income normalised to 2013, in millions,
  # rounded to 2 decimal places
  mutate(median_inc = round(median(intgross_2013, na.rm = TRUE) / 1e6, 2)) %>%
  ungroup() %>%
  count(genre, binary, median_inc)
movies_summ %>%
  DT::datatable(filter = 'top',
                options = list(scrollX = TRUE, pageLength = 5),
                caption = htmltools::tags$caption(
                  style = 'caption-side: bottom; text-align: center;',
                  'Table Name: ', htmltools::em('Median Income for each Genre based on whether the movie passed / failed the test')
                ))
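To check the median step in isolation, here is a minimal sketch on made-up numbers. The data frame `toy_gross` and its values are hypothetical, and treating "#N/A" as the missing-value marker is an assumption for illustration. It also swaps in summarise() for the mutate() plus count() combination used above, which should give the same one-row-per-group shape.

```r
library(dplyr)
library(readr)

# Hypothetical toy data: gross stored as text, with one missing value
toy_gross <- data.frame(
  genre = c("Drama", "Drama", "Drama", "Comedy", "Comedy"),
  binary = c("PASS", "PASS", "FAIL", "FAIL", "FAIL"),
  intgross_2013 = c("10000000", "30000000", "50000000", "#N/A", "8000000"),
  stringsAsFactors = FALSE
)

toy_summ <- toy_gross %>%
  # convert text to numbers; "#N/A" is assumed to mean missing
  mutate(intgross_2013 = parse_number(intgross_2013,
                                      na = c("", "NA", "#N/A"))) %>%
  group_by(genre, binary) %>%
  # one row per genre / binary pair: movie count and median in millions
  summarise(
    n = n(),
    median_inc = round(median(intgross_2013, na.rm = TRUE) / 1e6, 2),
    .groups = "drop"
  )

toy_summ
```

Note that the second argument of round() sits inside the round() call; if it slips outside the parentheses, mutate() would treat it as a separate column instead of the number of decimal places.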
movie_summ_map <- movies_summ %>%
  treemap(
    # treemap with a level of hierarchy
    index = c("genre", "median_inc", "binary"),
    vSize = "median_inc",
    type = "index",
    title = "Income of each Genre based on whether the movie passed / failed test",
    fontsize.title = 26,
    # specify size of labels in order of group, sub group etc.
    # (a size of 0 hides the median_inc labels)
    fontsize.labels = c(24, 0, 20),
    # what font to use (must match the family registered with font_add)
    fontfamily.title = "segoeui",
    fontfamily.labels = "segoeui",
    # align the labels again in order
    align.labels = list(
      # group 1
      c("center", "top"),
      # sub group 1 - we're not showing this though
      c("center", "bottom"),
      # sub group 2
      c("center", "center")
    ),
    palette = "Set3",
    lowerbound.cex.labels = 0.5, # multiplier for when labels drawn
    force.print.labels = FALSE
  )
On the viz below you can click on a block to enter it. Click again to exit.
# Make it interactive
d3tree(movie_summ_map, rootname = "Revenue per Genre per Pass/Fail Bechdel")
Judging by the treemap, the movies that fail the test seem to have a higher median international gross (normalised to 2013) than those that pass.
You can save the widget using the code below (courtesy of R Graph Gallery).
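# save the widget
library(htmlwidgets)
# saveWidget() needs the htmlwidget returned by d3tree(), not the treemap list
movie_summ_widget <- d3tree(movie_summ_map,
                            rootname = "Revenue per Genre per Pass/Fail Bechdel")
# build the path with here::here() so the folder and file name are separated
saveWidget(movie_summ_widget,
           file = here::here("HtmlWidget", "interactiveTreemap-bechdel.html"))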
For attribution, please cite this work as
Naidoo (2021, Nov. 21). Sciencificity's Blog: Treemap for Visualising data. Retrieved from https://sciencificity-blog.netlify.app/posts/2021-11-21-tree-map/
BibTeX citation
@misc{naidoo2021treemap,
  author = {Naidoo, Vebash},
  title = {Sciencificity's Blog: Treemap for Visualising data},
  url = {https://sciencificity-blog.netlify.app/posts/2021-11-21-tree-map/},
  year = {2021}
}