class: middle, title background-size: contain <!----- Make a pdf using: decktape generic --key=ArrowRight --load-pause 1800 --slides '1-47' --size '1216x684' --url-load-timeout 80000 --page-load-timeout 40000 "week-04/slides/smi105-week-4.html" week-04/slides/smi105-week-4.pdf -----> <br><br> # Chart Types #### EDC101: Week 4 <br><br> **Dr. Calum Webb**<br> Sheffield Methods Institute, the University of Sheffield<br> [c.j.webb@sheffield.ac.uk](mailto:c.j.webb@sheffield.ac.uk)
--- class: middle, inverse # Sign in --- class: middle ## Learning outcomes .panelset[ .panel[.panel-name[What will I learn?] By the end of this week you will know: * Key terminology for working with quantitative data * Two different ways to help us think about which kind of data visualisation we should use: * The "classic" approach, where the type of data informs the visualisation * Some more modern approaches, where the data story helps inform the choice of the visualisation ] ] --- class: inverse, middle # How do I choose what kind of data visualisation is appropriate for my data? --- class: inverse, middle #### Part I # Some key terminology ??? Datasets, variables, types of variables --- ### This is a **dataset** .center[ <img src="images/terminology-1-dataset.png" width="90%" /> ] --- ### Each **row** is an **observation**, e.g. a song, person, measurement at a specific time .center[ <img src="images/terminology-2-observations.png" width="90%" /> ] --- ### Each **column** is a **variable**, e.g. the name of the song, the tempo, etc. .center[ <img src="images/terminology-3-variables.png" width="90%" /> ] --- ### Each **cell** is a **value**, e.g. 
the duration of the song *for this observation* .center[ <img src="images/terminology-4-values.png" width="90%" /> ] --- class: inverse, middle # Variables (columns) can be put into three different categories: categorical (nominal), ordinal, and continuous --- class: middle background-color: white .center[ <img src="images/continuous-ordinal-categorical.png" width="80%" /> ] .footnote[Art by Allison Horst] --- ### Examples of **categorical** variables might be track name, whether it is explicit, or album name .center[ <img src="images/terminology-5-categorical.png" width="90%" /> ] --- ### Examples of **ordinal** variables might be time signature or track number .center[ <img src="images/terminology-6-ordinal.png" width="90%" /> ] --- ### Examples of **continuous** variables might be duration or tempo .center[ <img src="images/terminology-7-continuous.png" width="90%" /> ] --- ### If you're struggling to recognise the difference, here's something that helps: ### <br> .center[ <img src="images/three-runners-1.svg" width="60%" /> ] --- ### One variable we might have for runners is finish time: this is continuous because we know that if Runner 2 was 30 seconds slower than Runner 1, and Runner 3 was 1 minute slower than Runner 2, the second gap is twice as large as the first. One second means the same thing wherever it falls on the scale. .center[ <img src="images/three-runners-2.svg" width="60%" /> ] --- ### Another variable might be the placement of the runners: 1st, 2nd, 3rd. We know that Runner 2 was slower than Runner 1, but we don't know *how much by*. 1st, 2nd, and 3rd also mean different things depending on the *race*. 3rd in a Park Run ≠ 3rd in the Olympics. This is ordinal. .center[ <img src="images/three-runners-3.svg" width="60%" /> ] --- ### Lastly, another variable we could have is the country the runner is representing. There is no inherent ordering to countries; any ordering would depend on some other external variable (such as alphabetical order). This is categorical.
.center[ <img src="images/three-runners-4.svg" width="60%" /> ] --- class: inverse, middle #### Part II # The 'classic' approach: visualisation determined by the type of variables .footnote[As found in many, many introductory quantitative research textbooks, but largely based on the work of John Tukey (1977) *Exploratory data analysis*] --- class: middle .pull-left[ <br><br><br> ## Classic approaches to data visualisation These are data visualisations that are often driven by: * The types of the variables being visualised (how many, categorical, continuous, ordinal). * The computing power available at the time, or even the effort needed to draw the visualisation by hand. ] .pull-right[ .center[ <img src="images/tukey.jpg" width="70%" /> ] ] --- class: middle ### In this approach, depending on the type of variable, we can identify an appropriate chart type: ### One variable
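
A minimal sketch of the one-variable case in R with ggplot2, the package whose `geom_*` functions appear later in these slides. The `survey` data frame and its column names are invented for illustration and are not the module's dataset; a bar chart for a categorical variable and a histogram for a continuous one are common textbook choices, assumed here rather than taken from the slide.

```r
library(ggplot2)

# Hypothetical survey data: one categorical and one continuous variable
survey <- data.frame(
  platform = factor(c("WhatsApp", "Snapchat", "WhatsApp",
                      "Messenger", "Snapchat", "WhatsApp")),
  hours_social = c(2.5, 4.0, 1.5, 3.0, 5.5, 2.0)
)

# One categorical variable: bar chart counting observations per category
p_bar <- ggplot(survey, aes(x = platform)) +
  geom_bar()

# One continuous variable: histogram showing the distribution of values
p_hist <- ggplot(survey, aes(x = hours_social)) +
  geom_histogram(binwidth = 1)
```

Printing either object (for example `p_bar`) draws the chart.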
### Two variables
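
A sketch of the two-variable case under the same assumptions: the data are made up, and the pairings below (stacked bar, box plot, scatterplot) are one common set of textbook choices rather than the only valid ones.

```r
library(ggplot2)

# Hypothetical data: platform (categorical), meet-up frequency (ordinal),
# and two continuous measures of hours per day
survey <- data.frame(
  platform = factor(c("WhatsApp", "Snapchat", "WhatsApp",
                      "Messenger", "Snapchat", "WhatsApp")),
  meet_up = factor(c("Weekly", "Monthly", "Daily",
                     "Weekly", "Monthly", "Daily"),
                   levels = c("Monthly", "Weekly", "Daily"),
                   ordered = TRUE),
  hours_messaging = c(2.5, 4.0, 1.5, 3.0, 5.5, 2.0),
  hours_in_person = c(3.0, 1.5, 4.5, 2.5, 1.0, 4.0)
)

# categorical x ordinal: stacked bar chart of counts
p_stacked <- ggplot(survey, aes(x = platform, fill = meet_up)) +
  geom_bar()

# categorical x continuous: one box plot per category
p_box <- ggplot(survey, aes(x = platform, y = hours_messaging)) +
  geom_boxplot()

# continuous x continuous: scatterplot
p_scatter <- ggplot(survey, aes(x = hours_messaging, y = hours_in_person)) +
  geom_point()
```

Declaring `meet_up` with `ordered = TRUE` keeps the categories in their meaningful order instead of alphabetical order.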
--- .pull-left[ <br><br><br> ## Some examples: * Favourite messaging platform (categorical) ] .pull-right[ <br> <img src="smi105-week-4_files/figure-html/unnamed-chunk-17-1.png" width="500" height="500" /> ] --- .pull-left[ <br><br><br> ## Some examples: * Favourite messaging platform (categorical) * Number of hours spent on social media per day (continuous) ] .pull-right[ <br> <img src="smi105-week-4_files/figure-html/unnamed-chunk-18-1.png" width="500" height="500" /> ] --- .pull-left[ <br><br><br> ## Some examples: * Favourite messaging platform (categorical) * Number of hours spent on social media per day (continuous) * Do students who use some instant messaging platforms meet up in person less often than others? (categorical ╳ ordinal) ] .pull-right[ <br> <img src="smi105-week-4_files/figure-html/unnamed-chunk-19-1.png" width="500" height="500" /> ] --- .pull-left[ <br><br><br> ## Some examples: * Favourite messaging platform (categorical) * Number of hours spent on social media per day (continuous) * Do students who use some instant messaging platforms meet up in person less often than others? (categorical ╳ ordinal) * Do students spend different amounts of time on instant messaging platforms depending on the platform they use? (categorical ╳ continuous) ] .pull-right[ <br> <img src="smi105-week-4_files/figure-html/unnamed-chunk-20-1.png" width="500" height="500" /> ] --- .pull-left[ <br><br><br> ## Some examples: * Favourite messaging platform (categorical) * Number of hours spent on social media per day (continuous) * Do students who use some instant messaging platforms meet up in person less often than others? (categorical ╳ ordinal) * Do students spend different amounts of time on instant messaging platforms depending on the platform they use? (categorical ╳ continuous) * Do students who spend more time on instant messaging platforms spend less time in-person with people they don't live with? 
(continuous ╳ continuous) ] .pull-right[ <br> <img src="smi105-week-4_files/figure-html/unnamed-chunk-21-1.png" width="500" height="500" /> ] --- class: middle .pull-left[ <br><br><br> ## Classic approaches to data visualisation * Easy to apply * Easy to remember * Quite restrictive, doesn't provide multiple options * Not always very appealing aesthetically * Some redundancy in visual presentation * Sometimes doesn't work so well with very large datasets (e.g. scatterplots) ] .pull-right[ .center[ <img src="images/tukey.jpg" width="70%" /> ] ] --- class: inverse, middle #### Part III # Story-driven choice of chart type .footnote[Using Andy Kirk's (2019) *Data Visualisation: A Handbook for Data Driven Design* & the FT's *Visual Vocabulary* (Smith et al., 2019)] --- .pull-left[ <br> ## Andy Kirk's CHRTS While the 'data-driven' approach to selecting a visualisation works well, we now have many, many, different types of visualisation we could choose for the same types of data. It can make more sense to start thinking about what kind of story we want to tell. * **Categorical**: Comparing categories and distributions of quantitative values. * **Hierarchical**: Revealing part-to-whole relationships and hierarchies. * **Relational**: Exploring correlations and connections. * **Temporal**: Plotting trends and intervals over time. * **Spatial**: Mapping spatial patterns through overlays and distortions. ] .pull-right[ <br> .center[ <div class="figure"> <img src="images/kirk.jpg" alt="Kirk (2019)" width="70%" /> <p class="caption">Kirk (2019)</p> </div> ] ] --- .pull-left[ <br> ## Andy Kirk's CHRTS While the 'data-driven' approach to selecting a visualisation works well, we now have many, many, different types of visualisation we could choose for the same types of data. It can make more sense to start thinking about what kind of story we want to tell. * **Categorical**: Comparing categories and distributions of quantitative values.
* .grey[Hierarchical: Revealing part-to-whole relationships and hierarchies.] * .grey[Relational: Exploring correlations and connections.] * .grey[Temporal: Plotting trends and intervals over time.] * .grey[Spatial: Mapping spatial patterns through overlays and distortions.] ] .pull-right[ <br> bar chart: `geom_col` / `geom_bar` .center[ <img src="smi105-week-4_files/figure-html/unnamed-chunk-24-1.png" width="500" height="500" /> ] ] --- .pull-left[ <br> ## Andy Kirk's CHRTS While the 'data-driven' approach to selecting a visualisation works well, we now have many, many, different types of visualisation we could choose for the same types of data. It can make more sense to start thinking about what kind of story we want to tell. * **Categorical**: Comparing categories and distributions of quantitative values. * .grey[Hierarchical: Revealing part-to-whole relationships and hierarchies.] * .grey[Relational: Exploring correlations and connections.] * .grey[Temporal: Plotting trends and intervals over time.] * .grey[Spatial: Mapping spatial patterns through overlays and distortions.] ] .pull-right[ <br> sunburst chart: `geom_col` / `geom_bar` + `coord_polar` .center[ <img src="smi105-week-4_files/figure-html/unnamed-chunk-25-1.png" width="500" height="500" /> ] ] --- .pull-left[ <br> ## Andy Kirk's CHRTS While the 'data-driven' approach to selecting a visualisation works well, we now have many, many, different types of visualisation we could choose for the same types of data. It can make more sense to start thinking about what kind of story we want to tell. * **Categorical**: Comparing categories and distributions of quantitative values. * .grey[Hierarchical: Revealing part-to-whole relationships and hierarchies.] * .grey[Relational: Exploring correlations and connections.] * .grey[Temporal: Plotting trends and intervals over time.] * .grey[Spatial: Mapping spatial patterns through overlays and distortions.] 
] .pull-right[ <br> dumbbell chart: `geom_point` x2 + `geom_segment` .center[ <img src="smi105-week-4_files/figure-html/unnamed-chunk-26-1.png" width="500" height="450" /> ] ] --- .pull-left[ <br> ## Andy Kirk's CHRTS While the 'data-driven' approach to selecting a visualisation works well, we now have many, many, different types of visualisation we could choose for the same types of data. It can make more sense to start thinking about what kind of story we want to tell. * .grey[Categorical: Comparing categories and distributions of quantitative values.] * **Hierarchical**: Revealing part-to-whole relationships and hierarchies. * .grey[Relational: Exploring correlations and connections.] * .grey[Temporal: Plotting trends and intervals over time.] * .grey[Spatial: Mapping spatial patterns through overlays and distortions.] ] .pull-right[ <br> 100% Stacked Bar Chart: `geom_col/bar` + `position = "fill"` .center[ <img src="smi105-week-4_files/figure-html/unnamed-chunk-27-1.png" width="500" height="450" /> ] ] --- .pull-left[ <br> ## Andy Kirk's CHRTS While the 'data-driven' approach to selecting a visualisation works well, we now have many, many, different types of visualisation we could choose for the same types of data. It can make more sense to start thinking about what kind of story we want to tell. * .grey[Categorical: Comparing categories and distributions of quantitative values.] * **Hierarchical**: Revealing part-to-whole relationships and hierarchies. * .grey[Relational: Exploring correlations and connections.] * .grey[Temporal: Plotting trends and intervals over time.] * .grey[Spatial: Mapping spatial patterns through overlays and distortions.] 
] .pull-right[ <br> Donut Chart: Horizontal 100% Stacked Bar `geom_col/bar` + `coord_polar` .center[ <img src="smi105-week-4_files/figure-html/unnamed-chunk-28-1.png" width="500" height="450" /> ] ] --- .pull-left[ <br> ## Andy Kirk's CHRTS While the 'data-driven' approach to selecting a visualisation works well, we now have many, many, different types of visualisation we could choose for the same types of data. It can make more sense to start thinking about what kind of story we want to tell. * .grey[Categorical: Comparing categories and distributions of quantitative values.] * **Hierarchical**: Revealing part-to-whole relationships and hierarchies. * .grey[Relational: Exploring correlations and connections.] * .grey[Temporal: Plotting trends and intervals over time.] * .grey[Spatial: Mapping spatial patterns through overlays and distortions.] ] .pull-right[ <br> Dendrogram using `ggdendro` .center[ <img src="smi105-week-4_files/figure-html/unnamed-chunk-29-1.png" width="500" height="500" /> ] ] --- .pull-left[ <br> ## Andy Kirk's CHRTS While the 'data-driven' approach to selecting a visualisation works well, we now have many, many, different types of visualisation we could choose for the same types of data. It can make more sense to start thinking about what kind of story we want to tell. * .grey[Categorical: Comparing categories and distributions of quantitative values.] * .grey[Hierarchical: Revealing part-to-whole relationships and hierarchies.] * **Relational**: Exploring correlations and connections. * .grey[Temporal: Plotting trends and intervals over time.] * .grey[Spatial: Mapping spatial patterns through overlays and distortions.] 
] .pull-right[ <br> Scatterplot: `geom_point` .center[ <img src="smi105-week-4_files/figure-html/unnamed-chunk-30-1.png" width="500" height="500" /> ] ] --- .pull-left[ <br> ## Andy Kirk's CHRTS While the 'data-driven' approach to selecting a visualisation works well, we now have many, many, different types of visualisation we could choose for the same types of data. It can make more sense to start thinking about what kind of story we want to tell. * .grey[Categorical: Comparing categories and distributions of quantitative values.] * .grey[Hierarchical: Revealing part-to-whole relationships and hierarchies.] * **Relational**: Exploring correlations and connections. * .grey[Temporal: Plotting trends and intervals over time.] * .grey[Spatial: Mapping spatial patterns through overlays and distortions.] ] .pull-right[ <br><br><br> Heatmap: `geom_tile` .center[ <img src="smi105-week-4_files/figure-html/unnamed-chunk-31-1.png" width="500" height="450" /> ] ] --- .pull-left[ <br> ## Andy Kirk's CHRTS While the 'data-driven' approach to selecting a visualisation works well, we now have many, many, different types of visualisation we could choose for the same types of data. It can make more sense to start thinking about what kind of story we want to tell. * .grey[Categorical: Comparing categories and distributions of quantitative values.] * .grey[Hierarchical: Revealing part-to-whole relationships and hierarchies.] * .grey[Relational: Exploring correlations and connections.] * **Temporal**: Plotting trends and intervals over time. * .grey[Spatial: Mapping spatial patterns through overlays and distortions.] 
] .pull-right[ <br><br><br> Line plot: `geom_line` + `group`/`colour` `aes` .center[ <img src="smi105-week-4_files/figure-html/unnamed-chunk-32-1.png" width="500" height="450" /> ] ] --- .pull-left[ <br> ## Andy Kirk's CHRTS While the 'data-driven' approach to selecting a visualisation works well, we now have many, many, different types of visualisation we could choose for the same types of data. It can make more sense to start thinking about what kind of story we want to tell. * .grey[Categorical: Comparing categories and distributions of quantitative values.] * .grey[Hierarchical: Revealing part-to-whole relationships and hierarchies.] * .grey[Relational: Exploring correlations and connections.] * **Temporal**: Plotting trends and intervals over time. * .grey[Spatial: Mapping spatial patterns through overlays and distortions.] ] .pull-right[ <br><br><br> Line plot: `geom_line` + `group`/`colour` `aes` .center[ <img src="smi105-week-4_files/figure-html/unnamed-chunk-33-1.png" width="500" height="450" /> ] ] --- .pull-left[ <br> ## Andy Kirk's CHRTS While the 'data-driven' approach to selecting a visualisation works well, we now have many, many, different types of visualisation we could choose for the same types of data. It can make more sense to start thinking about what kind of story we want to tell. * .grey[Categorical: Comparing categories and distributions of quantitative values.] * .grey[Hierarchical: Revealing part-to-whole relationships and hierarchies.] * .grey[Relational: Exploring correlations and connections.] * **Temporal**: Plotting trends and intervals over time. * .grey[Spatial: Mapping spatial patterns through overlays and distortions.] 
] .pull-right[ <br><br><br> Moving plot: `gganimate` .center[ <img src="smi105-week-4_files/figure-html/unnamed-chunk-34-1.gif" width="500" height="450" /> ] ] --- .pull-left[ <br> ## Andy Kirk's CHRTS While the 'data-driven' approach to selecting a visualisation works well, we now have many, many, different types of visualisation we could choose for the same types of data. It can make more sense to start thinking about what kind of story we want to tell. * .grey[Categorical: Comparing categories and distributions of quantitative values.] * .grey[Hierarchical: Revealing part-to-whole relationships and hierarchies.] * .grey[Relational: Exploring correlations and connections.] * .grey[Temporal: Plotting trends and intervals over time.] * **Spatial**: Mapping spatial patterns through overlays and distortions. ] .pull-right[ <br><br><br> Choropleth: `geom_sf` .center[ <img src="smi105-week-4_files/figure-html/unnamed-chunk-35-1.png" width="500" height="450" /> ] ] --- .pull-left[ <br> ## Andy Kirk's CHRTS While the 'data-driven' approach to selecting a visualisation works well, we now have many, many, different types of visualisation we could choose for the same types of data. It can make more sense to start thinking about what kind of story we want to tell. * .grey[Categorical: Comparing categories and distributions of quantitative values.] * .grey[Hierarchical: Revealing part-to-whole relationships and hierarchies.] * .grey[Relational: Exploring correlations and connections.] * .grey[Temporal: Plotting trends and intervals over time.] * **Spatial**: Mapping spatial patterns through overlays and distortions. ] .pull-right[ <br><br><br> Dorling Cartogram: `cartogram` package & `geom_sf` .center[ <img src="smi105-week-4_files/figure-html/unnamed-chunk-36-1.png" width="500" height="450" /> ] ] --- class: hide-logo background-image: url("images/poster-1.png") background-position: center background-size: contain ??? 
Easy example: Lollipop --- class: inverse, middle ## We can use both the types of data that we have and the type of story we are trying to tell to identify appropriate ways to visualise our data. --- # The rest of this week: .pull-left[ <br><br> **This week's workshop**: We'll extend the graphs that show the relationship between a continuous and a categorical variable: box plots, density curves, histograms, and so on. Which makes the information easiest to understand? Why would we use one of these graphs over another? ] -- .pull-right[ **Before week 5**: Core tasks: * **Before lecture, read at least one of these**: * On the basics of formatting charts: https://analysisfunction.civilservice.gov.uk/policy-store/data-visualisation-charts/#section-6 (ignore the last section ‘Communicating quality and uncertainty in charts’) * On working with text: https://blog.datawrapper.de/text-in-data-visualizations/ * On working with colour: https://blog.datawrapper.de/emphasize-with-color-in-data-visualizations/ **Before workshop**: Read the assessment brief for assessment 1 **Before workshop**: Work through the task at the end of the workshop handout **Supplementary tasks**: Before workshop: Read chapter 4 ('Show the right numbers') of the Healy book ]
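
---

### Sketch: one continuous variable across categories

The workshop comparison of box plots, density curves, and histograms could look something like the sketch below. The data and column names are hypothetical, and the `binwidth` and `alpha` values are arbitrary starting points rather than recommendations.

```r
library(ggplot2)

# Hypothetical data: hours per day on messaging apps for two platforms
survey <- data.frame(
  platform = rep(c("WhatsApp", "Snapchat"), each = 5),
  hours = c(1, 2, 2.5, 3, 4,
            2, 3.5, 4, 5, 6)
)

# Box plot: compact summary (median, quartiles) for each category
p_box <- ggplot(survey, aes(x = platform, y = hours)) +
  geom_boxplot()

# Density curves: overlaid smoothed distributions, one per category
p_density <- ggplot(survey, aes(x = hours, fill = platform)) +
  geom_density(alpha = 0.5)

# Histograms: one panel per category, stacked vertically for comparison
p_hist <- ggplot(survey, aes(x = hours)) +
  geom_histogram(binwidth = 1) +
  facet_wrap(~ platform, ncol = 1)
```

All three show the same two variables; they differ in how much of the distribution's shape they reveal and how easily the categories can be compared side by side.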