Create a Treemap visualisation (kinda like a pie-chart but better, in my opinion). Treemaps are great for hierarchical data.
Figure: a treemap visual from Information is Beautiful, where rectangles show the differences between several big numbers.
Treemaps are great for hierarchical data visualisation. I was inspired by the informative and beautiful The Billion Dollar-o-Gram 2009 treemap at Information is Beautiful.
library(tidyverse)
library(treemap)
# install.packages("devtools")
# devtools::install_github("timelyportfolio/d3treeR")
library(d3treeR)
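The data comes from the TidyTuesday Project, in particular Week 11 of 2021.
It is data about the Bechdel Test. As per the TidyTuesday Readme, a movie has to meet three criteria to pass the test: it has to have at least two (named) women in it, who talk to each other, about something besides a man.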
raw_bechdel <- readr::read_csv(glue::glue('https://raw.githubusercontent.com',
'/rfordatascience/tidytuesday/master/data',
'/2021/2021-03-09/raw_bechdel.csv'))
movies <- readr::read_csv(glue::glue('https://raw.githubusercontent.com',
'/rfordatascience/tidytuesday/master/data',
'/2021/2021-03-09/movies.csv'))
Let’s get a feel for the data we’re working with.
# make the random sample (and so the output) reproducible
set.seed(2187)
movie_sample <- movies %>%
slice_sample(n = 10)
movie_sample %>%
DT::datatable(filter = 'top', list(scrollX = TRUE,
pageLength = 5),
caption = htmltools::tags$caption(
style = 'caption-side: bottom; text-align: center;',
'Table Name: ', htmltools::em('Movie Data')
))
| | year | imdb | title | test | clean_test | binary | budget | domgross | intgross | code | budget_2013 | domgross_2013 | intgross_2013 | period_code | decade_code | imdb_id | plot | rated | response | language | country | writer | metascore | imdb_rating | director | released | actors | genre | awards | runtime | type | poster | imdb_votes | error |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 1995 | tt0113416 | The Incredibly True Adventures of Two Girls in Love | ok | ok | PASS | 250000 | 2210408 | 2477155 | 1995PASS | 382195 | 3379226 | 3787024 | 4 | 3 | 0113416 | ||||||||||||||||||
2 | 2008 | tt0866439 | Made of Honor | men | men | FAIL | 40000000 | 46012734 | 106548738 | 2008FAIL | 43290252 | 49797572 | 115313044 | 2 | 2 | 0866439 | A guy in love with an engaged woman tries to win her over after she asks him to be her maid of honor. | PG-13 | true | English | USA, UK | Adam Sztykiel (screenplay), Deborah Kaplan (screenplay), Harry Elfont (screenplay), Adam Sztykiel (story) | 37 | 5.8 | Paul Weiland | 02 May 2008 | Patrick Dempsey, Michelle Monaghan, Kevin McKidd, Kadeem Hardison | Comedy, Romance | 1 nomination. | 101 min | movie | http://ia.media-imdb.com/images/M/MV5BMTk1MzA5MjEzMF5BMl5BanBnXkFtZTcwNTk0MjU1MQ@@._V1_SX300.jpg | 42509 | |
3 | 1992 | tt0104797 | Malcolm X | dubious | dubious | FAIL | 35000000 | 48169910 | 48169910 | 1992FAIL | 58112153 | 79978777 | 79978777 | 5 | 3 | 0104797 | Biographical epic of the controversial and influential Black Nationalist leader, from his early life and career as a small-time gangster, to his ministry as a member of the Nation of Islam and his assassination. | PG-13 | true | English | USA, Japan | Alex Haley (book), Malcolm X (book), Arnold Perl (screenplay), Spike Lee (screenplay) | 72 | 7.7 | Spike Lee | 18 Nov 1992 | Denzel Washington, Angela Bassett, Albert Hall, Al Freeman Jr. | Biography, Drama, History | Nominated for 2 Oscars. Another 19 wins & 8 nominations. | 202 min | movie | http://ia.media-imdb.com/images/M/MV5BMTYyNzg4MTMxM15BMl5BanBnXkFtZTcwMzE5MzIyMQ@@._V1_SX300.jpg | 50949 | |
4 | 2006 | tt0454921 | The Pursuit of Happyness | nowomen-disagree | nowomen | FAIL | 55000000 | 162586036 | 307325633 | 2006FAIL | 63568799 | 187916346 | 355205844 | 2 | 2 | 0454921 | A struggling salesman takes custody of his son as he's poised to begin a life-changing professional endeavor. | PG-13 | true | English, Cantonese | USA | Steve Conrad | 64 | 7.9 | Gabriele Muccino | 15 Dec 2006 | Will Smith, Jaden Smith, Thandie Newton, Brian Howe | Biography, Drama | Nominated for 1 Oscar. Another 11 wins & 18 nominations. | 117 min | movie | http://ia.media-imdb.com/images/M/MV5BMTQ5NjQ0NDI3NF5BMl5BanBnXkFtZTcwNDI0MjEzMw@@._V1_SX300.jpg | 245551 | |
5 | 2001 | tt0265086 | Black Hawk Down | nowomen-disagree | nowomen | FAIL | 95000000 | 108638745 | 159691085 | 2001FAIL | 125005366 | 142951853 | 210128869 | 3 | 2 | 0265086 | 123 elite U.S. soldiers drop into Somalia to capture two top lieutenants of a renegade warlord and find themselves in a desperate battle with a large force of heavily-armed Somalis. | R | true | English, Somali | USA, UK | Mark Bowden (book), Ken Nolan (screenplay) | 74 | 7.7 | Ridley Scott | 18 Jan 2002 | Josh Hartnett, Ewan McGregor, Tom Sizemore, Eric Bana | Drama, History, War | Won 2 Oscars. Another 8 wins & 28 nominations. | 144 min | movie | http://ia.media-imdb.com/images/M/MV5BMTQxODgzMjYyN15BMl5BanBnXkFtZTgwNDU4NTYxMTE@._V1_SX300.jpg | 225412 |
raw_bechdel %>%
filter(imdb_id %in% c(movie_sample %>% pull(imdb_id))) %>%
DT::datatable(filter = 'top', list(scrollX = TRUE,
pageLength = 5),
caption = htmltools::tags$caption(
style = 'caption-side: bottom; text-align: center;',
'Table Name: ', htmltools::em('Bechdel Data')
))
| | year | id | imdb_id | title | rating |
|---|---|---|---|---|---|
1 | 1992 | 4391 | 0104797 | Malcolm X | 3 |
2 | 1995 | 2931 | 0113416 | The Incredibly True Adventures of Two Girls in Love | 3 |
3 | 1997 | 3917 | 0124819 | Orgazmo | 1 |
4 | 2000 | 1843 | 0168629 | Dancer in the Dark | 3 |
5 | 2001 | 698 | 0265086 | Black Hawk Down | 0 |
movies %>%
count(clean_test, binary, sort=TRUE) %>%
gt::gt()
clean_test | binary | n |
---|---|---|
ok | PASS | 803 |
notalk | FAIL | 514 |
men | FAIL | 194 |
dubious | FAIL | 142 |
nowomen | FAIL | 141 |
movies %>%
count(genre, binary, clean_test) %>%
DT::datatable(filter = 'top', list(scrollX = TRUE,
pageLength = 5),
caption = htmltools::tags$caption(
style = 'caption-side: bottom; text-align: center;',
'Table Name: ', htmltools::em('Test criteria for pass/fail')
))
raw_bechdel %>%
count(rating) %>%
gt::gt()
rating | n |
---|---|
0 | 894 |
1 | 1940 |
2 | 896 |
3 | 5109 |
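The rating column appears to count how many of the three criteria a movie meets. A small lookup table can make the counts above easier to read. This is a sketch that assumes the bechdeltest.com convention (0 = fewer than two women, 1 = the women don’t talk to each other, 2 = they only talk about a man, 3 = passes); `rating_labels` and the toy `ratings_demo` data are hypothetical.

```r
library(dplyr)

# Assumed meaning of raw_bechdel$rating (bechdeltest.com convention):
# each point is one more criterion met, so 3 means the movie passes
rating_labels <- tibble(
  rating = c(0, 1, 2, 3),
  meaning = c(
    "fewer than two (named) women",
    "the women don't talk to each other",
    "they only talk about a man",
    "passes the test"
  )
)

# Hypothetical ratings standing in for raw_bechdel
ratings_demo <- tibble(title = c("A", "B", "C"), rating = c(3, 0, 2))

# attach the human-readable meaning to each rating
ratings_labelled <- ratings_demo %>%
  left_join(rating_labels, by = "rating")
ratings_labelled
```

On the real data, something like `raw_bechdel %>% count(rating) %>% left_join(rating_labels, by = "rating")` should attach the labels to the counts above.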
movies %>%
inner_join(raw_bechdel, by = "imdb_id") %>%
select(imdb_id, title = title.x,
year = year.x,
country, genre, rating) %>%
distinct() %>%
drop_na() %>%
slice_sample(n = 50) %>%
DT::datatable(filter = 'top', list(scrollX = TRUE,
pageLength = 5),
caption = htmltools::tags$caption(
style = 'caption-side: bottom; text-align: center;',
'Table Name: ', htmltools::em('Ratings for movies')
))
| | imdb_id | title | year | country | genre | rating |
|---|---|---|---|---|---|---|
1 | 0483726 | Man of the Year | 2006 | USA | Comedy, Drama, Romance | 1 |
2 | 0296572 | The Chronicles of Riddick | 2004 | USA | Action, Adventure, Sci-Fi | 2 |
3 | 1201167 | Funny People | 2009 | USA | Comedy, Drama | 1 |
4 | 0472033 | 9 | 2009 | USA | Animation, Action, Adventure | 0 |
5 | 0271027 | Kiss of the Dragon | 2001 | France, USA | Action, Crime, Thriller | 1 |
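Because both `movies` and `raw_bechdel` have `title` and `year` columns, the join above produces `title.x` / `title.y` and `year.x` / `year.y`, which is why they get renamed in `select()`. An alternative sketch uses the `suffix` argument of `inner_join()` so the columns from `movies` keep their original names. The two demo tibbles are hypothetical stand-ins built from rows in the sample tables above.

```r
library(dplyr)

# Hypothetical stand-ins for movies and raw_bechdel; both have title and year
movies_demo <- tibble(
  imdb_id = c("0104797", "0265086"),
  title = c("Malcolm X", "Black Hawk Down"),
  year = c(1992, 2001),
  genre = c("Biography, Drama, History", "Drama, History, War")
)
bechdel_demo <- tibble(
  imdb_id = c("0104797", "0265086"),
  title = c("Malcolm X", "Black Hawk Down"),
  year = c(1992, 2001),
  rating = c(3, 0)
)

# suffix = c("", "_bechdel") leaves the movies columns untouched and
# tags only the duplicated columns coming from the Bechdel table
joined_all <- movies_demo %>%
  inner_join(bechdel_demo, by = "imdb_id", suffix = c("", "_bechdel"))

joined <- joined_all %>%
  select(imdb_id, title, year, genre, rating)
joined
```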
For a treemap you need to provide:
dtf
: A dataset
index
: The column(s) in your dataset that define your group(s), listed from the top level down.
vSize
: The column that determines the size (area) of each rectangle.
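Here is a minimal sketch of those three arguments on a hypothetical three-genre data frame (`genre_counts` is made up for illustration). Besides drawing the plot, `treemap()` invisibly returns a list, including a `tm` data frame that describes each rectangle. That returned object is what gets passed to `d3tree()` later on.

```r
library(treemap)

# Hypothetical counts, only to illustrate the three required arguments
genre_counts <- data.frame(
  genre = c("Drama", "Comedy", "Action"),
  n = c(10, 6, 4)
)

tm_out <- treemap(
  dtf = genre_counts,  # the dataset
  index = "genre",     # one rectangle per genre
  vSize = "n",         # rectangle area proportional to n
  type = "index"       # colour rectangles by their group
)

# the returned list describes each rectangle that was drawn
tm_out$tm
```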
Perhaps we’re interested in seeing how many films in each genre are represented. Each movie may be categorised under several genres, so let’s give each genre its own row (a tip from David Robinson’s screencasts). For example, The Reader is listed as “Drama, Romance”, so after this step it has two rows, one for Drama and one for Romance.
movies %>%
# split each into a separate row with each genre listed separately
# creates multiple rows for a movie broken down by each genre it
# belongs to.
separate_rows(genre, sep = ",") %>%
mutate(genre = str_squish(genre)) %>%
filter(!is.na(genre)) %>%
count(genre, sort = TRUE) %>%
DT::datatable(filter = 'top', list(scrollX = TRUE,
pageLength = 8),
caption = htmltools::tags$caption(
style = 'caption-side: bottom; text-align: center;',
'Table Name: ', htmltools::em('Number of movies per Genre')
))
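To see what `separate_rows()` and `str_squish()` do on their own, here is a sketch on a hypothetical two-movie tibble. Splitting on `","` leaves a leading space on every genre after the first, which is why `str_squish()` comes right after it.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical two-movie table; each genre cell holds a comma-separated list
genre_demo <- tibble(
  title = c("The Reader", "Black Hawk Down"),
  genre = c("Drama, Romance", "Drama, History, War")
)

genre_long <- genre_demo %>%
  # one row per (movie, genre) pair: 2 + 3 = 5 rows
  separate_rows(genre, sep = ",") %>%
  # drop the leading space left over from ", "
  mutate(genre = str_squish(genre))

# Drama is now counted once for each movie it appears in
genre_n <- genre_long %>% count(genre, sort = TRUE)
genre_n
```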
movies %>%
# split each into a separate row with each genre listed separately
# creates multiple rows for a movie broken down by each genre it
# belongs to.
separate_rows(genre, sep = ",") %>%
mutate(genre = str_squish(genre)) %>%
filter(!is.na(genre)) %>%
count(genre) %>%
treemap(
# simple treemap with one group
index = "genre",
# size of rect = number of movies in each category
vSize = "n",
type = "index",
# make size of title and labels larger
fontsize.title = 30,
# specify size of labels in order of group, sub group etc.
fontsize.labels = c(24)
)
That’s nice, clean and pretty! We can also change the title, fonts and colours to make it more appealing.
Let’s say we want to see each genre split by pass and fail. To achieve this, we give index the two groups we’re interested in, from the top level down. In the code below that is:
index = c("genre", "binary")
which specifies that each rectangle first represents a genre, and each genre is then subdivided by whether its movies passed or failed the Bechdel test.
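Under the hood, `treemap()` sizes each parent rectangle by aggregating its children (its `fun.aggregate` argument defaults to `"sum"`). This sketch reproduces the genre-level sizes from hypothetical `count(genre, binary)` output.

```r
library(dplyr)

# Hypothetical output of count(genre, binary): these are the leaf rectangles
genre_binary <- tibble(
  genre  = c("Drama", "Drama", "Comedy", "Comedy"),
  binary = c("PASS", "FAIL", "PASS", "FAIL"),
  n      = c(5, 7, 4, 2)
)

# With index = c("genre", "binary"), each genre rectangle's area is the sum
# of its PASS and FAIL children, which this aggregation reproduces
genre_totals <- genre_binary %>%
  group_by(genre) %>%
  summarise(n = sum(n), .groups = "drop")
genre_totals
```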
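# package to allow me to add the font I want to use
library(showtext)
# register Segoe UI under the family name "segoeui"; use this exact name
# in fontfamily.title / fontfamily.labels later on
font_add(family = "segoeui", regular = r"(C:\Windows\Fonts\segoeui.ttf)")
showtext_auto()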
# install palettes
# devtools::install_github("sciencificity/werpals")
# I am using a palette I created
palette_prov <- werpals::nature_palettes[["provence"]]
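The family name given to `font_add()` has to match `fontfamily.title` and `fontfamily.labels` exactly. If you run this on a machine without Segoe UI, a hypothetical helper like `register_title_font()` below registers the font only when the file exists and otherwise falls back to R’s generic `"sans"` family. Passing its return value to both fontfamily arguments keeps the names in sync. This is a sketch, not part of the original workflow.

```r
# Hypothetical helper: register Segoe UI if the font file exists, otherwise
# fall back to R's generic "sans" family. The returned name is meant to be
# passed to fontfamily.title / fontfamily.labels so they always match.
register_title_font <- function(path = "C:/Windows/Fonts/segoeui.ttf",
                                family = "segoeui") {
  if (file.exists(path)) {
    sysfonts::font_add(family = family, regular = path)
    showtext::showtext_auto()
    return(family)
  }
  "sans"
}

# With a path that doesn't exist, the helper falls back to "sans"
plot_font <- register_title_font(path = tempfile(fileext = ".ttf"))
plot_font
```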
movies %>%
separate_rows(genre, sep = ",") %>%
mutate(genre = str_squish(genre)) %>%
filter(!is.na(genre)) %>%
count(genre, binary, sort = TRUE) %>%
DT::datatable(filter = 'top', list(scrollX = TRUE,
pageLength = 5),
caption = htmltools::tags$caption(
style = 'caption-side: bottom; text-align: center;',
'Table Name: ', htmltools::em('Number of Pass / Fail per Genre')
))
movies %>%
separate_rows(genre, sep = ",") %>%
mutate(genre = str_squish(genre)) %>%
filter(!is.na(genre)) %>%
count(genre, binary) %>%
treemap(
# treemap with a level of hierarchy
index = c("genre", "binary"),
vSize = "n",
type = "index",
title = "Number of Pass vs Fail of Bechdel test in each Genre",
fontsize.title = 30,
# specify size of labels in order of group, sub group etc.
fontsize.labels = c(24, 20),
# what font to use
fontfamily.title = "segoeui",
fontfamily.labels = "segoeui",
# align the labels again in order
align.labels=list(
# group 1
c("center", "top"),
# sub group 1
c("center", "center")
),
lowerbound.cex.labels = 0.5, # labels may shrink to 50% of fontsize.labels to fit; smaller ones are not drawn
palette = palette_prov
)
Let’s have a look at the gross income. The field we’ll use is:
Name of Field | Type | Description |
---|---|---|
intgross_2013 | character | International gross normalized to 2013 |
movies_summ <-
# setup our dataset for the treemap
movies %>%
# it is a character field so let's convert it
mutate(intgross_2013 = parse_number(intgross_2013)) %>%
separate_rows(genre, sep = ",") %>%
mutate(genre = str_squish(genre)) %>%
filter(!is.na(genre)) %>%
group_by(genre, binary) %>%
# add median gross income normalised to 2013 year
# in millions, rounded to 2 decimal places
mutate(median_inc = round(median(intgross_2013, na.rm = TRUE) / 1e6, 2)) %>%
ungroup() %>%
count(genre, binary, median_inc)
movies_summ %>%
DT::datatable(filter = 'top', list(scrollX = TRUE,
pageLength = 5),
caption = htmltools::tags$caption(
style = 'caption-side: bottom; text-align: center;',
'Table Name: ', htmltools::em('Median Income for each Genre based on whether the movie passed / failed the test')
))
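The `group_by()` + `mutate()` + `count()` combination above does the job. A more direct alternative, sketched here, is a single `summarise()` that computes both the median (in millions) and the number of movies for each genre and result. It also tells `parse_number()` to treat a placeholder such as `"#N/A"` as missing, which is an assumption about how missing values are stored in the text column. `gross_demo` is hypothetical.

```r
library(dplyr)
library(readr)

# Hypothetical rows; intgross_2013 is text, with "#N/A" assumed for missing
gross_demo <- tibble(
  genre = c("Drama", "Drama", "Drama", "Comedy"),
  binary = c("FAIL", "FAIL", "FAIL", "PASS"),
  intgross_2013 = c("3787024", "115313044", "#N/A", "79978777")
)

gross_summ <- gross_demo %>%
  # treat the placeholder as NA so parse_number() doesn't flag it
  mutate(intgross_2013 = parse_number(intgross_2013, na = c("", "NA", "#N/A"))) %>%
  group_by(genre, binary) %>%
  summarise(
    # median in millions, rounded to 2 decimal places
    median_inc = round(median(intgross_2013, na.rm = TRUE) / 1e6, 2),
    # number of movies, including those with a missing gross
    n = n(),
    .groups = "drop"
  )
gross_summ
```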
movie_summ_map <- movies_summ %>%
treemap(
# treemap with a level of hierarchy
index = c("genre", "median_inc", "binary"),
vSize = "median_inc",
type = "index",
title = "Income of each Genre based on whether the movie passed / failed test",
fontsize.title = 26,
# specify size of labels in order of group, sub group etc.
fontsize.labels = c(24, 0, 20),
# what font to use
fontfamily.title = "segoeui",
fontfamily.labels = "segoeui",
# align the labels again in order
align.labels=list(
# group 1
c("center", "top"),
# sub group 1 - we're not showing this though
c("center", "bottom"),
# sub group 2
c("center", "center")
),
palette = "Set3",
lowerbound.cex.labels = 0.5, # labels may shrink to 50% of fontsize.labels to fit; smaller ones are not drawn
force.print.labels = FALSE # don't force labels that still don't fit
)
On the viz below you can click on a block to enter it. Click again to exit.
# Make it interactive
d3tree(movie_summ_map, rootname = "Revenue per Genre per Pass/Fail Bechdel")
Movies that fail the test seem to have a higher median international gross (normalised to 2013) than those that pass.
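To check that impression across genres rather than by eye, here is a sketch that puts the FAIL and PASS medians side by side for each genre. It uses a tibble shaped like `movies_summ`, but the values in `summ_demo` are hypothetical. Genres with only one of the two results are dropped, since they can’t be compared. On the real data you would start from `movies_summ` instead.

```r
library(dplyr)
library(tidyr)

# Hypothetical rows shaped like movies_summ (genre, binary, median_inc, n)
summ_demo <- tibble(
  genre      = c("Action", "Action", "Drama", "Drama", "Horror"),
  binary     = c("FAIL", "PASS", "FAIL", "PASS", "FAIL"),
  median_inc = c(120.5, 98.2, 40.1, 45.3, 30.0),
  n          = c(30L, 12L, 50L, 44L, 9L)
)

fail_vs_pass <- summ_demo %>%
  select(genre, binary, median_inc) %>%
  # one row per genre, one column per test result
  pivot_wider(names_from = binary, values_from = median_inc) %>%
  # genres missing either result can't be compared
  filter(!is.na(FAIL), !is.na(PASS)) %>%
  mutate(fail_higher = FAIL > PASS)

# share of comparable genres where failing movies have the higher median
mean(fail_vs_pass$fail_higher)
```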
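You can save the widget using the code below (courtesy of R Graph Gallery). Note that saveWidget() needs the htmlwidget returned by d3tree(), not the treemap object itself.
# save the widget
library(htmlwidgets)
treemap_widget <- d3tree(movie_summ_map, rootname = "Revenue per Genre per Pass/Fail Bechdel")
# make sure the output folder exists, then save inside it
dir.create(here::here("HtmlWidget"), showWarnings = FALSE)
saveWidget(treemap_widget,
file = here::here("HtmlWidget", "interactiveTreemap-bechdel.html"))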
For attribution, please cite this work as
Naidoo (2021, Nov. 21). Sciencificity's Blog: Treemap for Visualising data. Retrieved from https://sciencificity-blog.netlify.app/posts/2021-11-21-tree-map/
BibTeX citation
@misc{naidoo2021treemap, author = {Naidoo, Vebash}, title = {Sciencificity's Blog: Treemap for Visualising data}, url = {https://sciencificity-blog.netlify.app/posts/2021-11-21-tree-map/}, year = {2021} }