Sciencificity's Blog: Treemap for Visualising data

Vebash Naidoo

Treemaps are great for visualising hierarchical data. I was inspired by The Billion Dollar-o-Gram 2009, an informative and beautiful treemap at Information is Beautiful.

Read in data

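The data is about the Bechdel Test. As per the TidyTuesday Readme, these are the criteria needed to pass the test:

1. It has to have at least two (named) women in it,
2. who talk to each other,
3. about something besides a man.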
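The criteria are checked in order, and the Bechdel ratings data explored below scores each movie from 0 to 3: the number of criteria met before the first one fails, so only a 3 is a pass. Here is a small sketch of that scoring, where bechdel_score() is a hypothetical helper written only for illustration (it is not part of any package):

```r
library(dplyr)

# Hypothetical helper: score = number of criteria met, checked in order
bechdel_score <- function(two_women, talk_to_each_other, not_about_a_man) {
  case_when(
    !two_women          ~ 0L,  # fewer than two (named) women
    !talk_to_each_other ~ 1L,  # the women don't talk to each other
    !not_about_a_man    ~ 2L,  # they only talk about a man
    TRUE                ~ 3L   # all three criteria met: passes
  )
}

scores <- bechdel_score(
  two_women          = c(FALSE, TRUE,  TRUE,  TRUE),
  talk_to_each_other = c(FALSE, FALSE, TRUE,  TRUE),
  not_about_a_man    = c(FALSE, FALSE, FALSE, TRUE)
)
scores
```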

EDA

	year	imdb	title	test	clean_test	binary	budget	domgross	intgross	code	budget_2013	domgross_2013	intgross_2013	period_code	decade_code	imdb_id	plot	rated	response	language	country	writer	metascore	imdb_rating	director	released	actors	genre	awards	runtime	type	poster	imdb_votes	error

1	1995	tt0113416	The Incredibly True Adventures of Two Girls in Love	ok	ok	PASS	250000	2210408	2477155	1995PASS	382195	3379226	3787024	4	3	0113416
2	2008	tt0866439	Made of Honor	men	men	FAIL	40000000	46012734	106548738	2008FAIL	43290252	49797572	115313044	2	2	0866439	A guy in love with an engaged woman tries to win her over after she asks him to be her maid of honor.	PG-13	true	English	USA, UK	Adam Sztykiel (screenplay), Deborah Kaplan (screenplay), Harry Elfont (screenplay), Adam Sztykiel (story)	37	5.8	Paul Weiland	02 May 2008	Patrick Dempsey, Michelle Monaghan, Kevin McKidd, Kadeem Hardison	Comedy, Romance	1 nomination.	101 min	movie	http://ia.media-imdb.com/images/M/MV5BMTk1MzA5MjEzMF5BMl5BanBnXkFtZTcwNTk0MjU1MQ@@._V1_SX300.jpg	42509
3	1992	tt0104797	Malcolm X	dubious	dubious	FAIL	35000000	48169910	48169910	1992FAIL	58112153	79978777	79978777	5	3	0104797	Biographical epic of the controversial and influential Black Nationalist leader, from his early life and career as a small-time gangster, to his ministry as a member of the Nation of Islam and his assassination.	PG-13	true	English	USA, Japan	Alex Haley (book), Malcolm X (book), Arnold Perl (screenplay), Spike Lee (screenplay)	72	7.7	Spike Lee	18 Nov 1992	Denzel Washington, Angela Bassett, Albert Hall, Al Freeman Jr.	Biography, Drama, History	Nominated for 2 Oscars. Another 19 wins & 8 nominations.	202 min	movie	http://ia.media-imdb.com/images/M/MV5BMTYyNzg4MTMxM15BMl5BanBnXkFtZTcwMzE5MzIyMQ@@._V1_SX300.jpg	50949
4	2006	tt0454921	The Pursuit of Happyness	nowomen-disagree	nowomen	FAIL	55000000	162586036	307325633	2006FAIL	63568799	187916346	355205844	2	2	0454921	A struggling salesman takes custody of his son as he's poised to begin a life-changing professional endeavor.	PG-13	true	English, Cantonese	USA	Steve Conrad	64	7.9	Gabriele Muccino	15 Dec 2006	Will Smith, Jaden Smith, Thandie Newton, Brian Howe	Biography, Drama	Nominated for 1 Oscar. Another 11 wins & 18 nominations.	117 min	movie	http://ia.media-imdb.com/images/M/MV5BMTQ5NjQ0NDI3NF5BMl5BanBnXkFtZTcwNDI0MjEzMw@@._V1_SX300.jpg	245551
5	2001	tt0265086	Black Hawk Down	nowomen-disagree	nowomen	FAIL	95000000	108638745	159691085	2001FAIL	125005366	142951853	210128869	3	2	0265086	123 elite U.S. soldiers drop into Somalia to capture two top lieutenants of a renegade warlord and find themselves in a desperate battle with a large force of heavily-armed Somalis.	R	true	English, Somali	USA, UK	Mark Bowden (book), Ken Nolan (screenplay)	74	7.7	Ridley Scott	18 Jan 2002	Josh Hartnett, Ewan McGregor, Tom Sizemore, Eric Bana	Drama, History, War	Won 2 Oscars. Another 8 wins & 28 nominations.	144 min	movie	http://ia.media-imdb.com/images/M/MV5BMTQxODgzMjYyN15BMl5BanBnXkFtZTgwNDU4NTYxMTE@._V1_SX300.jpg	225412

First 5 of 10 rows shown.

	year	id	imdb_id	title	rating

1	1992	4391	0104797	Malcolm X	3
2	1995	2931	0113416	The Incredibly True Adventures of Two Girls in Love	3
3	1997	3917	0124819	Orgazmo	1
4	2000	1843	0168629	Dancer in the Dark	3
5	2001	698	0265086	Black Hawk Down	0

First 5 of 10 rows shown.

clean_test	binary	n
ok	PASS	803
notalk	FAIL	514
men	FAIL	194
dubious	FAIL	142
nowomen	FAIL	141
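The code for this table isn't shown, but counting over the two outcome columns produces this shape of output. A minimal sketch on hypothetical toy rows (only clean_test and binary matter here; the titles are placeholders):

```r
library(dplyr)

# Hypothetical toy rows standing in for the movies data
toy_movies <- tibble(
  title      = c("A", "B", "C", "D", "E", "F"),
  clean_test = c("ok", "ok", "notalk", "men", "ok", "nowomen"),
  binary     = c("PASS", "PASS", "FAIL", "FAIL", "PASS", "FAIL")
)

# One row per (clean_test, binary) combination, most frequent first
outcome_counts <- toy_movies %>%
  count(clean_test, binary, sort = TRUE)

outcome_counts
```

In the real counts above, only ok pairs with PASS; every other cleaned outcome (notalk, men, dubious, nowomen) is a FAIL.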

	genre	binary	clean_test	n

1	Action	FAIL	notalk	2
2	Action, Adventure	FAIL	notalk	2
3	Action, Adventure	FAIL	nowomen	3
4	Action, Adventure	PASS	ok	1
5	Action, Adventure, Comedy	FAIL	notalk	9

First 5 of 571 rows shown.

rating	n
0	894
1	1940
2	896
3	5109
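Since a rating of 3 means all three criteria are met, the rating counts above can be collapsed into pass and fail. A sketch using those counts typed in by hand:

```r
library(dplyr)

# Counts copied from the rating table above
rating_counts <- tibble(
  rating = 0:3,
  n      = c(894, 1940, 896, 5109)
)

pass_summary <- rating_counts %>%
  # only a rating of 3 means all three criteria are met
  mutate(binary = if_else(rating == 3, "PASS", "FAIL")) %>%
  group_by(binary) %>%
  summarise(n = sum(n), .groups = "drop") %>%
  # share of all rated movies in each outcome
  mutate(share = n / sum(n))

pass_summary
```

That works out to 5109 of 8839 rated movies, or roughly 58%, passing.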

	imdb_id	title	year	country	genre	rating

1	0483726	Man of the Year	2006	USA	Comedy, Drama, Romance	1
2	0296572	The Chronicles of Riddick	2004	USA	Action, Adventure, Sci-Fi	2
3	1201167	Funny People	2009	USA	Comedy, Drama	1
4	0472033	9	2009	USA	Animation, Action, Adventure	0
5	0271027	Kiss of the Dragon	2001	France, USA	Action, Crime, Thriller	1

First 5 of 50 rows shown.

Treemap

Simple treemap

Perhaps we’re interested in seeing how many films in each genre are represented. Each movie may be categorised under several genres, so let’s give each genre its own row (a tip from David Robinson’s screencasts). For example, the movie The Reader is listed as Drama, Romance, so after splitting it has two rows, one for Drama and one for Romance.
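Here is a minimal sketch of that split on a one-row tibble built by hand for The Reader (only the title and genre columns; this toy data is an assumption, not the real movies data):

```r
library(dplyr)
library(tidyr)
library(stringr)

# One movie listed under two genres, as in the example above
the_reader <- tibble(title = "The Reader", genre = "Drama, Romance")

split_genres <- the_reader %>%
  # split on the comma: gives "Drama" and " Romance" (note the leading space)
  separate_rows(genre, sep = ",") %>%
  # remove the whitespace left over from the split
  mutate(genre = str_squish(genre))

split_genres
```

The str_squish() step matters: without it, " Romance" and "Romance" would be counted as different genres.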

library(tidyverse)

movies %>% 
  # split each into a separate row with each genre listed separately
  # creates multiple rows for a movie broken down by each genre it
  # belongs to.
  separate_rows(genre, sep = ",") %>% 
  mutate(genre = str_squish(genre)) %>%
  filter(!is.na(genre)) %>% 
  count(genre, sort = TRUE) %>% 
  DT::datatable(filter = 'top', list(scrollX = TRUE,
                     pageLength = 8),
    caption = htmltools::tags$caption(
      style = 'caption-side: bottom; text-align: center;',
     'Table Name: ', htmltools::em('Number of movies per Genre')
  ))

	genre	n

1	Drama	739
2	Comedy	560
3	Action	443
4	Adventure	359
5	Thriller	304
6	Crime	259
7	Romance	238
8	Sci-Fi	204

First 8 of 21 rows shown.

library(treemap)

movies %>% 
  # split each into a separate row with each genre listed separately
  # creates multiple rows for a movie broken down by each genre it
  # belongs to. 
  separate_rows(genre, sep = ",") %>% 
  mutate(genre = str_squish(genre)) %>% 
  filter(!is.na(genre)) %>% 
  count(genre) %>% 
  treemap(
    # simple treemap with one group
    index = "genre",
    # size of rect = number of movies in each category
    vSize = "n",
    type = "index",
    # make size of title and labels larger
    fontsize.title = 30,
    # specify size of labels in order of group, sub group etc.
    fontsize.labels = c(24)
  )

That’s nice, clean and pretty! We can also customise the title, fonts, label alignment and colours to make it more appealing.

Treemap with some hierarchy

Let’s say we want to see each genre broken down by pass and fail. To achieve this, we give index the two groups we’re interested in, in order of the hierarchy. In the code below it is:

index = c("genre", "binary")

which specifies that each rectangle should first represent a genre, and then be split by whether the movies passed or failed the Bechdel test.
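In a treemap, the area of a parent rectangle is the total of its sub-rectangles, so each genre rectangle keeps the size it had in the simple one-level treemap and is just split into PASS and FAIL. A sketch of that aggregation with dplyr, using the Drama, Comedy and Action counts from the tables in this post:

```r
library(dplyr)

# Genre x pass/fail counts taken from the tables in this post
genre_binary <- tibble(
  genre  = c("Drama", "Drama", "Comedy", "Comedy", "Action", "Action"),
  binary = c("FAIL", "PASS", "FAIL", "PASS", "FAIL", "PASS"),
  n      = c(375, 364, 287, 273, 315, 128)
)

# Parent rectangle size = sum of the child sizes within each genre
genre_totals <- genre_binary %>%
  group_by(genre) %>%
  summarise(n = sum(n), .groups = "drop") %>%
  arrange(desc(n))

genre_totals
```

The totals match the one-level genre counts (Drama 739, Comedy 560, Action 443).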

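# package to allow me to add the font I want to use
library(showtext)
# register Segoe UI under the family name "segoeui"; the path is for Windows,
# and the raw string r"(...)" (R >= 4.0) avoids escaping the backslashes
font_add(family = "segoeui", regular = r"(C:\Windows\Fonts\segoeui.ttf)")
# use showtext to render text in subsequent plots
showtext_auto()

# install palettes
# devtools::install_github("sciencificity/werpals")
# I am using a palette I created
palette_prov <- werpals::nature_palettes[["provence"]]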

movies %>% 
  separate_rows(genre, sep = ",") %>% 
  mutate(genre = str_squish(genre)) %>%
  filter(!is.na(genre)) %>% 
  count(genre, binary, sort = TRUE) %>% 
  DT::datatable(filter = 'top', list(scrollX = TRUE,
                     pageLength = 5),
    caption = htmltools::tags$caption(
      style = 'caption-side: bottom; text-align: center;',
     'Table Name: ', htmltools::em('Number of Pass / Fail per Genre')
  ))

	genre	binary	n

1	Drama	FAIL	375
2	Drama	PASS	364
3	Action	FAIL	315
4	Comedy	FAIL	287
5	Comedy	PASS	273

First 5 of 42 rows shown.

movies %>% 
  separate_rows(genre, sep = ",") %>% 
  mutate(genre = str_squish(genre)) %>% 
  filter(!is.na(genre)) %>% 
  count(genre, binary) %>% 
  treemap(
    # treemap with a level of hierarchy
    index = c("genre", "binary"),
    vSize = "n",
    type = "index",
    title = "Number of Pass vs Fail of Bechdel test in each Genre",
    fontsize.title = 30,
    # specify size of labels in order of group, sub group etc.
    fontsize.labels = c(24, 20), 
    # what font to use (must match the family registered with font_add())
    fontfamily.title = "segoeui",
    fontfamily.labels = "segoeui",
    # align the labels again in order
    align.labels=list(
        # group 1
        c("center", "top"),
        # sub group 1
        c("center", "center")
    ),
    # only draw labels that fit at no less than 0.5x their font size
    lowerbound.cex.labels = 0.5,
    palette = palette_prov
  )

Make it interactive with {d3treeR}

movies_summ <- 
  # setup our dataset for the treemap
  movies %>%
  # it is a character field so let's convert it
  mutate(intgross_2013 = parse_number(intgross_2013)) %>% 
  separate_rows(genre, sep = ",") %>% 
  mutate(genre = str_squish(genre)) %>%
  filter(!is.na(genre)) %>% 
  group_by(genre, binary) %>% 
  # add median international gross, normalised to 2013 dollars,
  # in millions (rounded to the nearest million)
  mutate(median_inc = round(median(intgross_2013, na.rm = TRUE) / 1e6)) %>% 
  ungroup() %>% 
  count(genre, binary, median_inc) 
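The mutate() plus count() combination works because median_inc is constant within each genre and binary group. An equivalent way to get the same columns is a single summarise(); here is a sketch on hypothetical toy rows (the gross values are made up and stored as character, as in the real data):

```r
library(dplyr)
library(readr)

# Hypothetical toy rows: international gross (2013 dollars) as character
toy <- tibble(
  genre         = rep("Action", 5),
  binary        = c("FAIL", "FAIL", "FAIL", "PASS", "PASS"),
  intgross_2013 = c("100000000", "300000000", "200000000", "140000000", NA)
)

toy_summ <- toy %>%
  mutate(intgross_2013 = parse_number(intgross_2013)) %>%
  group_by(genre, binary) %>%
  summarise(
    # median gross in millions, rounded to the nearest million
    median_inc = round(median(intgross_2013, na.rm = TRUE) / 1e6),
    # number of movies in the group, including any with a missing gross
    n = n(),
    .groups = "drop"
  )

toy_summ
```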

movies_summ %>% 
  DT::datatable(filter = 'top', list(scrollX = TRUE,
                     pageLength = 5),
    caption = htmltools::tags$caption(
      style = 'caption-side: bottom; text-align: center;',
     'Table Name: ', htmltools::em('Median Income for each Genre based on whether the movie passed / failed the test')
  ))

	genre	binary	median_inc	n

1	Action	FAIL	200	315
2	Action	PASS	140	128
3	Adventure	FAIL	261	246
4	Adventure	PASS	268	113
5	Animation	FAIL	264	74

First 5 of 42 rows shown.

movie_summ_map <- movies_summ %>% 
  treemap(
    # treemap with a level of hierarchy
    index = c("genre", "median_inc", "binary"),
    vSize = "median_inc",
    type = "index",
    title = "Income of each Genre based on whether the movie passed / failed test",
    fontsize.title = 26,
    # specify size of labels in order of group, sub group etc.
    fontsize.labels = c(24, 0, 20), 
    # what font to use (must match the family registered with font_add())
    fontfamily.title = "segoeui",
    fontfamily.labels = "segoeui",
    # align the labels again in order
    align.labels=list(
        # group 1
        c("center", "top"),
        # sub group 1 - we're not showing this though
        c("center", "bottom"),
        # sub group 2
        c("center", "center")
    ),
    palette = "Set3",
    # only draw labels that fit at no less than 0.5x their font size
    lowerbound.cex.labels = 0.5,
    force.print.labels = FALSE
  )
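The treemap object saved in movie_summ_map is what {d3treeR} can turn into an interactive, zoomable widget. That call isn't shown above, so here is a minimal sketch, assuming the treemap and d3treeR packages are installed. It uses four rows from the median income table and draw = FALSE to build the treemap without plotting it:

```r
library(treemap)
library(d3treeR)

# Four rows from the median income table (millions, 2013 dollars)
toy_summ <- data.frame(
  genre      = c("Action", "Action", "Adventure", "Adventure"),
  binary     = c("FAIL", "PASS", "FAIL", "PASS"),
  median_inc = c(200, 140, 261, 268),
  stringsAsFactors = FALSE
)

# draw = FALSE builds the treemap object without plotting it
toy_map <- treemap(
  toy_summ,
  index = c("genre", "binary"),
  vSize = "median_inc",
  type  = "index",
  draw  = FALSE
)

# Convert the static treemap into a zoomable htmlwidget
toy_widget <- d3tree2(toy_map, rootname = "Genre")
toy_widget
```

With the post's data the equivalent would be d3tree2(movie_summ_map, rootname = "Genre"); the rootname label is just an illustrative choice.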

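Movies that fail the test seem to have a higher median international gross (normalised to 2013 dollars) than those that pass, although not in every genre: in the table above, Adventure movies that pass have a slightly higher median (268 vs 261 million).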
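To compare the two outcomes genre by genre rather than by eye, the FAIL and PASS medians can be placed side by side. A sketch using the four rows shown in the median income table:

```r
library(dplyr)
library(tidyr)

# Median international gross (millions, 2013 dollars) from the table above
summ_rows <- tibble(
  genre      = c("Action", "Action", "Adventure", "Adventure"),
  binary     = c("FAIL", "PASS", "FAIL", "PASS"),
  median_inc = c(200, 140, 261, 268)
)

# One row per genre, with FAIL and PASS medians as columns
fail_vs_pass <- summ_rows %>%
  pivot_wider(names_from = binary, values_from = median_inc) %>%
  # positive means movies that fail earn more in that genre
  mutate(fail_minus_pass = FAIL - PASS)

fail_vs_pass
```

Running the same pivot on the full movies_summ gives the comparison for every genre; a genre with no PASS (or no FAIL) movies would show NA.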
