EDC101: Week 4 Chart Types

<!----- Make a pdf using:

decktape generic --key=ArrowRight --load-pause 1800 --slides '1-47' --size '1216x684' --url-load-timeout 80000 --page-load-timeout 40000 "week-04/slides/smi105-week-4.html" week-04/slides/smi105-week-4.pdf

----->

# Chart Types
#### EDC101: Week 4

**Dr. Calum Webb**<br>
Sheffield Methods Institute, the University of Sheffield<br>
[c.j.webb@sheffield.ac.uk](mailto:c.j.webb@sheffield.ac.uk)

<div>
<style type="text/css">.xaringan-extra-logo {
width: 180px;
height: 128px;
z-index: 0;
background-image: url(header/smi-logo-white.png);
background-size: contain;
background-repeat: no-repeat;
position: absolute;
top:1em;right:2em;
}
</style>
<script>(function () {
  let tries = 0
  function addLogo () {
    if (typeof slideshow === 'undefined') {
      tries += 1
      if (tries < 10) {
        setTimeout(addLogo, 100)
      }
    } else {
      document.querySelectorAll('.remark-slide-content:not(.inverse):not(.hide_logo)')
        .forEach(function (slide) {
          const logo = document.createElement('div')
          logo.classList = 'xaringan-extra-logo'
          logo.href = null
          slide.appendChild(logo)
        })
    }
  }
  document.addEventListener('DOMContentLoaded', addLogo)
})()</script>
</div>

---

# Sign in

---

## Learning outcomes

By the end of this week you will know:

* Key terminology for working with quantitative data

* Two different ways to help us think about which kind of data visualisation we should use:
  * The "classic" approach, where the type of data informs the visualisation
  * Some more modern approaches, where the data story helps inform the choice of the visualisation

---

# How do I choose what kind of data visualisation is appropriate for my data?

---

#### Part I

# Some key terminology

???

Datasets, variables, types of variables

---

### This is a **dataset**

---

### Each **row** is an **observation**, e.g. a song, person, measurement at a specific time

---

### Each **column** is a **variable**, e.g. the name of the song, the tempo, etc.

---

### Each **cell** is a **value**, e.g. the duration of the song *for this observation*
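
A minimal sketch in R of the same idea, assuming a toy dataset (the song names and numbers below are invented purely for illustration): a data frame stores observations as rows, variables as columns, and one value in each cell.

```r
# A toy dataset: every value here is made up for illustration
songs <- data.frame(
  track_name = c("Song A", "Song B", "Song C"),
  tempo      = c(120, 96, 140),   # beats per minute
  duration   = c(215, 187, 242)   # seconds
)

songs[2, ]             # one row: a single observation (the second song)
songs$tempo            # one column: a single variable
songs[2, "duration"]   # one cell: the duration *for this observation*
```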

---

# Variables (columns) can be put into three different categories: categorical (nominal), ordinal, and continuous

---

### Examples of **categorical** variables might be track name, whether it is explicit, or album name

---

### Examples of **ordinal** variables might be time signature or track number

---

### Examples of **continuous** variables might be duration or tempo

---

### If you're struggling to recognise the difference, it can help to think about the runners in a race:

---

### One variable we might have for runners is finish time. This is continuous: if Runner 2 was 30 seconds slower than Runner 1, and Runner 3 was 1 minute slower than Runner 2, we know that the second gap is twice as large as the first. One second means the same thing anywhere on the scale.

---

### Another variable might be the placement of the runners: 1st, 2nd, 3rd. We know that Runner 2 was slower than Runner 1, but we don't know *by how much*. 1st, 2nd, and 3rd also mean different things depending on the *race*: 3rd in a Park Run ≠ 3rd in the Olympics. This is ordinal.

---

### Lastly, another variable we could have is the country the runner is representing. There is no inherent ordering to countries; any ordering would depend on some other external variable (such as alphabetical order). This is categorical (nominal).
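
One way to see the three types side by side is how they might be stored in R. This is only a sketch: the runners, countries, and times are invented, and using `factor()` with `ordered = TRUE` is one common convention rather than the only option.

```r
# Invented race results, used only to illustrate the three variable types
race <- data.frame(
  country     = factor(c("Kenya", "Japan", "Brazil")),   # categorical (nominal)
  placement   = factor(c("1st", "2nd", "3rd"),
                       levels = c("1st", "2nd", "3rd"),
                       ordered = TRUE),                    # ordinal
  finish_time = c(1500, 1530, 1590)                        # continuous (seconds)
)

diff(race$finish_time)                  # gaps between runners can be measured
race$placement[1] < race$placement[2]   # placements can be ranked
is.ordered(race$country)                # countries carry no inherent order
```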

---

#### Part II

# The 'classic' approach: visualisation determined by the type of variables

.footnote[As found in many, many introductory quantitative research textbooks, but largely based on the work of John Tukey (1977) *Exploratory Data Analysis*]

---

## Classic approaches to data visualisation

These are data visualisations that are often driven by:

* The types of the variables being visualised (how many, categorical, continuous, ordinal).
* The computing power available at the time or even the effort to draw the visualisation by hand.


---

### In this approach, depending on the type of variable, we can identify an appropriate chart type:

### One variable

| Variable Type | Visualisation |
|--------------:|--------------:|
| **Nominal** | Bar Chart |
| **Ordinal** | Bar Chart |
| **Continuous** | Histogram |

### Two variables

| Variable Type | Nominal | Ordinal | Continuous |
|--------------:|--------:|--------:|-----------:|
| **Nominal** | Bivariate Bar Chart | | |
| **Ordinal** | Bivariate Bar Chart | Bivariate Bar Chart | |
| **Continuous** | Boxplot | Boxplot | Scatterplot |
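
The two lookup tables can be written down as a small R function. This is a sketch only: `variable_type()` and `classic_chart()` are hypothetical helper names, and treating character and logical columns as nominal is an assumption made for the example.

```r
# Hypothetical helper: classify a column using R's own data types
variable_type <- function(x) {
  if (is.ordered(x)) {
    "ordinal"
  } else if (is.factor(x) || is.character(x) || is.logical(x)) {
    "nominal"
  } else if (is.numeric(x)) {
    "continuous"
  } else {
    stop("Unsupported variable type")
  }
}

# Hypothetical helper: look up the classic chart for one or two variables
classic_chart <- function(x, y = NULL) {
  tx <- variable_type(x)
  if (is.null(y)) {
    return(if (tx == "continuous") "Histogram" else "Bar Chart")
  }
  ty <- variable_type(y)
  # In the two-variable table, the chart depends on how many
  # of the two variables are continuous
  n_continuous <- sum(c(tx, ty) == "continuous")
  switch(as.character(n_continuous),
    "0" = "Bivariate Bar Chart",
    "1" = "Boxplot",
    "2" = "Scatterplot"
  )
}

classic_chart(c(3.5, 2, 6))               # one continuous variable
classic_chart(c("a", "b"), c(1.5, 2.5))   # nominal and continuous
```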

---

## Some examples:

* Favourite messaging platform (categorical)


---

## Some examples:

* Favourite messaging platform (categorical)
* Number of hours spent on social media per day (continuous)


---

## Some examples:

* Favourite messaging platform (categorical)
* Number of hours spent on social media per day (continuous)
* Do students who use some instant messaging platforms meet up in person less often than others? (categorical × ordinal)
* Do students spend different amounts of time on instant messaging platforms depending on the platform they use? (categorical × continuous)


---

## Classic approaches to data visualisation

* Easy to apply
* Easy to remember
* Quite restrictive, doesn't provide multiple options
* Not always very appealing aesthetically
* Some redundancy in visual presentation
* Sometimes doesn't work so well with very large datasets (e.g. scatterplots)


---

#### Part III

# Story-driven choice of chart type

.footnote[Using Andy Kirk's (2019) *Data Visualisation: A Handbook for Data Driven Design* and the FT's *Visual Vocabulary* (Smith et al. 2019)]

---

<br>

## Andy Kirk's CHRTS

While the 'data-driven' approach to selecting a visualisation works well, there are now many different types of visualisation we could choose for the same types of data. It can make more sense to start by thinking about what kind of story we want to tell.

* **Categorical**: Comparing categories and distributions of quantitative values.

* **Hierarchical**: Revealing part-to-whole relationships and hierarchies.

* **Relational**: Exploring correlations and connections.

* **Temporal**: Plotting trends and intervals over time.

* **Spatial**: Mapping spatial patterns through overlays and distortions.


---

<br>

## Andy Kirk's CHRTS

* **Categorical**: Comparing categories and distributions of quantitative values.

* .grey[Hierarchical: Revealing part-to-whole relationships and hierarchies.]

* .grey[Relational: Exploring correlations and connections.]

* .grey[Temporal: Plotting trends and intervals over time.]

* .grey[Spatial: Mapping spatial patterns through overlays and distortions.]

Bar chart: `geom_col` / `geom_bar`
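
A minimal sketch of a bar chart in ggplot2, assuming invented counts of students' favourite messaging platforms (the data frame and its column names are illustrative only):

```r
library(ggplot2)

# Invented counts of favourite messaging platform
platforms <- data.frame(
  platform = c("WhatsApp", "Messenger", "Snapchat", "Discord"),
  students = c(42, 18, 25, 15)
)

# geom_col() draws bars whose heights are values already in the data;
# geom_bar() would instead count the rows in each category
p <- ggplot(platforms, aes(x = platform, y = students)) +
  geom_col()
p
```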

---

<br>

## Andy Kirk's CHRTS

* **Categorical**: Comparing categories and distributions of quantitative values.

* .grey[Hierarchical: Revealing part-to-whole relationships and hierarchies.]

* .grey[Relational: Exploring correlations and connections.]

* .grey[Temporal: Plotting trends and intervals over time.]

* .grey[Spatial: Mapping spatial patterns through overlays and distortions.]

Sunburst chart: `geom_col` / `geom_bar` + `coord_polar`
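
As a rough sketch of the recipe named on this slide, adding `coord_polar()` to an ordinary bar chart wraps the bars around a circle. The counts are invented, and this simple version has a single ring only.

```r
library(ggplot2)

# Invented counts of favourite messaging platform
platforms <- data.frame(
  platform = c("WhatsApp", "Messenger", "Snapchat", "Discord"),
  students = c(42, 18, 25, 15)
)

# The same bars as a bar chart, drawn in polar coordinates:
# each category becomes a wedge whose radius reflects its value
p <- ggplot(platforms, aes(x = platform, y = students, fill = platform)) +
  geom_col() +
  coord_polar()
p
```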

---

<br>

## Andy Kirk's CHRTS

* **Categorical**: Comparing categories and distributions of quantitative values.

* .grey[Hierarchical: Revealing part-to-whole relationships and hierarchies.]

* .grey[Relational: Exploring correlations and connections.]

* .grey[Temporal: Plotting trends and intervals over time.]

* .grey[Spatial: Mapping spatial patterns through overlays and distortions.]

Dumbbell chart: `geom_point` x2 + `geom_segment`
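
A sketch of how the pieces might fit together, assuming invented before and after scores for three groups: one `geom_segment()` layer draws the connecting bar and two `geom_point()` layers draw the ends.

```r
library(ggplot2)

# Invented scores for three groups at two time points
scores <- data.frame(
  group  = c("A", "B", "C"),
  before = c(52, 61, 47),
  after  = c(68, 66, 59)
)

p <- ggplot(scores) +
  # the bar of the dumbbell, running from the first value to the second
  geom_segment(aes(x = before, xend = after, y = group, yend = group)) +
  # one point layer for each end of the dumbbell
  geom_point(aes(x = before, y = group), colour = "grey40", size = 3) +
  geom_point(aes(x = after, y = group), colour = "darkorange", size = 3)
p
```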

---

<br>

## Andy Kirk's CHRTS

* .grey[Categorical: Comparing categories and distributions of quantitative values.]

* **Hierarchical**: Revealing part-to-whole relationships and hierarchies.

* .grey[Relational: Exploring correlations and connections.]

* .grey[Temporal: Plotting trends and intervals over time.]

* .grey[Spatial: Mapping spatial patterns through overlays and distortions.]

100% Stacked Bar Chart: `geom_col/bar` + `position = "fill"`
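
A minimal sketch with invented counts: `position = "fill"` rescales every bar to the same total height, so the segments show each platform's share within a year group rather than raw counts.

```r
library(ggplot2)

# Invented counts of platform use in two year groups
usage <- data.frame(
  year_group = rep(c("Year 1", "Year 2"), each = 3),
  platform   = rep(c("WhatsApp", "Snapchat", "Discord"), times = 2),
  students   = c(30, 15, 5, 20, 10, 20)
)

# Each bar is stacked and stretched to fill the full height (100%)
p <- ggplot(usage, aes(x = year_group, y = students, fill = platform)) +
  geom_col(position = "fill")
p
```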

---

<br>

## Andy Kirk's CHRTS

* .grey[Categorical: Comparing categories and distributions of quantitative values.]

* **Hierarchical**: Revealing part-to-whole relationships and hierarchies.

* .grey[Relational: Exploring correlations and connections.]

* .grey[Temporal: Plotting trends and intervals over time.]

* .grey[Spatial: Mapping spatial patterns through overlays and distortions.]

Donut Chart: Horizontal 100% Stacked Bar (`geom_col/bar`) + `coord_polar`
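
One common way to build this in ggplot2, sketched here with invented counts: draw a single 100% stacked bar, bend it into a ring with `coord_polar(theta = "y")`, and leave empty space on the x axis to form the hole. The x position of 2 and the axis limits are arbitrary choices for the example.

```r
library(ggplot2)

# Invented counts of favourite messaging platform
platforms <- data.frame(
  platform = c("WhatsApp", "Messenger", "Snapchat", "Discord"),
  students = c(42, 18, 25, 15)
)

p <- ggplot(platforms, aes(x = 2, y = students, fill = platform)) +
  geom_col(position = "fill", width = 1) +   # one 100% stacked bar
  coord_polar(theta = "y") +                 # wrap the bar into a ring
  xlim(0.5, 2.5) +                           # empty space below the bar becomes the hole
  theme_void()
p
```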

---

<br>

## Andy Kirk's CHRTS

* .grey[Categorical: Comparing categories and distributions of quantitative values.]

* **Hierarchical**: Revealing part-to-whole relationships and hierarchies.

* .grey[Relational: Exploring correlations and connections.]

* .grey[Temporal: Plotting trends and intervals over time.]

* .grey[Spatial: Mapping spatial patterns through overlays and distortions.]

Dendrogram using `ggdendro`
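
A short sketch, assuming the `ggdendro` package is installed and using `USArrests`, a dataset built into R. The choice of average linkage here is arbitrary.

```r
library(ggplot2)
library(ggdendro)

# Cluster the US states by their arrest rates (hierarchical clustering)
hc <- hclust(dist(USArrests), method = "average")

# ggdendrogram() turns the clustering result into a ggplot object
p <- ggdendrogram(hc, rotate = TRUE)
p
```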

---

<br>

## Andy Kirk's CHRTS

* .grey[Categorical: Comparing categories and distributions of quantitative values.]

* .grey[Hierarchical: Revealing part-to-whole relationships and hierarchies.]

* **Relational**: Exploring correlations and connections.

* .grey[Temporal: Plotting trends and intervals over time.]

* .grey[Spatial: Mapping spatial patterns through overlays and distortions.]

Scatterplot: `geom_point`
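
A minimal sketch using `mtcars`, a dataset built into R, with two continuous variables: car weight and fuel efficiency.

```r
library(ggplot2)

# Each point is one observation (one car), placed by its two values
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()
p
```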

---

<br>

## Andy Kirk's CHRTS

* .grey[Categorical: Comparing categories and distributions of quantitative values.]

* .grey[Hierarchical: Revealing part-to-whole relationships and hierarchies.]

* **Relational**: Exploring correlations and connections.

* .grey[Temporal: Plotting trends and intervals over time.]

* .grey[Spatial: Mapping spatial patterns through overlays and distortions.]

Heatmap: `geom_tile`
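
As one possible example, a heatmap of the correlations between four variables in the built-in `mtcars` dataset. The choice of variables and the column names `var_x`, `var_y`, and `correlation` are assumptions made for this sketch.

```r
library(ggplot2)

# Correlation matrix reshaped to one row per pair of variables
cors <- as.data.frame(as.table(cor(mtcars[, c("mpg", "wt", "hp", "disp")])))
names(cors) <- c("var_x", "var_y", "correlation")

# One tile per pair, coloured by the strength of the correlation
p <- ggplot(cors, aes(x = var_x, y = var_y, fill = correlation)) +
  geom_tile()
p
```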

---

<br>

## Andy Kirk's CHRTS

* .grey[Categorical: Comparing categories and distributions of quantitative values.]

* .grey[Hierarchical: Revealing part-to-whole relationships and hierarchies.]

* .grey[Relational: Exploring correlations and connections.]

* **Temporal**: Plotting trends and intervals over time.

* .grey[Spatial: Mapping spatial patterns through overlays and distortions.]

Line plot: `geom_line` + `group`/`colour` `aes`
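
A sketch using `Orange`, a dataset built into R that records the growth of five orange trees over time. The `group` aesthetic tells ggplot2 to draw one line per tree, and `colour` tells the lines apart.

```r
library(ggplot2)

# One line per tree, tracing its trunk circumference as it ages
p <- ggplot(Orange, aes(x = age, y = circumference,
                        group = Tree, colour = Tree)) +
  geom_line()
p
```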

---

<br>

## Andy Kirk's CHRTS

* .grey[Categorical: Comparing categories and distributions of quantitative values.]

* .grey[Hierarchical: Revealing part-to-whole relationships and hierarchies.]

* .grey[Relational: Exploring correlations and connections.]

* **Temporal**: Plotting trends and intervals over time.

* .grey[Spatial: Mapping spatial patterns through overlays and distortions.]

Line plot: `geom_line` + `group`/`colour` `aes`

---

<br>

## Andy Kirk's CHRTS

* .grey[Categorical: Comparing categories and distributions of quantitative values.]

* .grey[Hierarchical: Revealing part-to-whole relationships and hierarchies.]

* .grey[Relational: Exploring correlations and connections.]

* **Temporal**: Plotting trends and intervals over time.

* .grey[Spatial: Mapping spatial patterns through overlays and distortions.]

Moving plot: `gganimate`

---

<br>

## Andy Kirk's CHRTS

* .grey[Categorical: Comparing categories and distributions of quantitative values.]

* .grey[Hierarchical: Revealing part-to-whole relationships and hierarchies.]

* .grey[Relational: Exploring correlations and connections.]

* .grey[Temporal: Plotting trends and intervals over time.]

* **Spatial**: Mapping spatial patterns through overlays and distortions.

Choropleth: `geom_sf`
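
A minimal sketch, assuming the `sf` package is installed. It uses the North Carolina counties shapefile that ships with `sf`, and fills each county by `BIR74`, a count of births included in that file.

```r
library(ggplot2)
library(sf)

# Read the example shapefile bundled with the sf package
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# geom_sf() draws each county's boundary; fill maps a variable to colour
p <- ggplot(nc) +
  geom_sf(aes(fill = BIR74))
p
```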

---

<br>

## Andy Kirk's CHRTS

* .grey[Categorical: Comparing categories and distributions of quantitative values.]

* .grey[Hierarchical: Revealing part-to-whole relationships and hierarchies.]

* .grey[Relational: Exploring correlations and connections.]

* .grey[Temporal: Plotting trends and intervals over time.]

* **Spatial**: Mapping spatial patterns through overlays and distortions.

Dorling Cartogram: `cartogram` package & `geom_sf`

---

class: hide_logo
background-image: url("images/poster-1.png")
background-position: center
background-size: contain

???

Easy example: Lollipop

---

## We can use both the types of data that we have and the type of story we are trying to tell to identify appropriate ways to visualise our data.

---

# The rest of this week:

**This week's workshop**:

Let’s extend the graphs that show the relationship between a continuous and a categorical variable: box plots, density curves, histograms, and so on. What’s the easiest way to understand this information? Why would we use one of these graphs over another?

**Before week 5**:

Core tasks:

* **Before lecture**: Read at least one of these:
  * On the basics of formatting charts: https://analysisfunction.civilservice.gov.uk/policy-store/data-visualisation-charts/#section-6 (ignore the last section, ‘Communicating quality and uncertainty in charts’)
  * On working with text: https://blog.datawrapper.de/text-in-data-visualizations/
  * On working with colour: https://blog.datawrapper.de/emphasize-with-color-in-data-visualizations/
* **Before workshop**: Read the assessment brief for assessment 1
* **Before workshop**: Work through the task at the end of the workshop handout

Supplementary tasks:

* **Before workshop**: Read chapter 4, ‘Show the right numbers’, of the Healy book