forked from clauswilke/dataviz
-
Notifications
You must be signed in to change notification settings - Fork 0
/
no_3d.Rmd
281 lines (219 loc) · 20 KB
/
no_3d.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
```{r echo = FALSE, message = FALSE, warning = FALSE}
# run setup script
source("_common.R")
library(tidyr)
```
# Don't go 3D {#no-3d}
3D plots are quite popular, in particular in business presentations but also among academics. They are also almost always inappropriately used. It is rare that I see a 3D plot that couldn't be improved by turning it into a regular 2D figure. In this chapter, I will explain why 3D plots have problems, why they generally are not needed, and in what limited circumstances 3D plots may be appropriate.
## Avoid gratuitous 3D
Many visualization softwares enable you to spruce up your plots by turning the plots' graphical elements into three-dimensional objects. Most commonly, we see pie charts turned into disks rotated in space, bar plots turned into columns, and line plots turned into bands. Notably, in none of these cases does the third dimension convey any actual data. 3D is used simply to decorate and adorn the plot. I consider this use of 3D as gratuitous. It is unequivocally bad and should be erased from the visual vocabulary of data scientists.
The problem with gratuitous 3D is that the projection of 3D objects into two dimensions for printing or display on a monitor distorts the data. The human visual system tries to correct for this distortion as it maps the 2D projection of a 3D image back into a 3D space. However, this correction can only ever be partial. As an example, let's take a simple pie chart with two slices, one representing 25% of the data and one 75%, and rotate this pie in space (Figure \@ref(fig:rotated-pie)). As we change the angle at which we're looking at the pie, the size of the slices seems to change as well. In particular, the 25% slice, which is located in the front of the pie, looks much bigger than 25% when we look at the pie from a flat angle (Figure \@ref(fig:rotated-pie)a).
(ref:rotated-pie) The same 3D pie chart shown from four different angles. Rotating a pie into the third dimension makes pie slices in the front appear larger than they really are and pie slices in the back appear smaller. Here, in parts (a), (b), and (c), the blue slice corresponding to 25% of the data visually occupies more than 25% of the area representing the pie. Only part (d) is an accurate representation of the data.
```{r rotated-pie, fig.asp = 5.1/6, fig.cap = '(ref:rotated-pie)'}
ggdraw() + draw_image("figures/3d/3d-pie-assembled.png")
```
Similar problems arise for other types of 3D plot. Figure \@ref(fig:titanic-3d) shows the breakdown of Titanic passengers by class and gender using 3D bars. Because of the way the bars are arranged relative to the axes, the bars all look shorter than they actually are. For example, there were 322 passengers total traveling in 1st class, yet Figure \@ref(fig:titanic-3d) suggests that the number was less than 300. This illusion arises because the columns representing the data are located at a distance from the two back surfaces on which the gray horizontal lines are drawn. To see this effect, consider extending any of the bottom edges of one of the columns until it hits the lowest gray line, which represents 0. Then, imagine doing the same to any of the top edges, and you'll see that all columns are taller than they appear at first glance. (See Figure \@ref(fig:titanic-passengers-by-class-sex) in Chapter \@ref(visualizing-amounts) for a more reasonable 2D version of this figure.)
(ref:titanic-3d) Numbers of female and male passengers on the Titanic traveling in 1st, 2nd, and 3rd class, shown as a 3D stacked bar plot. The total numbers of passengers in 1st, 2nd, and 3rd class are 322, 279, and 711, respectively (see Figure \@ref(fig:titanic-passengers-by-class-sex)). Yet in this plot, the 1st class bar appears to represent fewer than 300 passengers, the 3rd class bar appears to represent fewer than 700 passengers, and the 2nd class bar seems to be closer to 210--220 passengers than the actual 279 passengers. Furthermore, the 3rd class bar visually dominates the figure and makes the number of passengers in 3rd class appear larger than it actually is.
```{r titanic-3d, fig.asp = 4.5/6, fig.cap = '(ref:titanic-3d)'}
stamp_bad(ggdraw() + draw_image("figures/3d/titanic-3d-bars-assembled.png"))
```
## Avoid 3D position scales
While visualizations with gratuitous 3D can easily be dismissed as bad, it is less clear what to think of visualizations using three genuine position scales (*x*, *y*, and *z*) to represent data. In this case, the use of the third dimension serves an actual purpose. Nevertheless, the resulting plots are frequently difficult to interpret, and in my mind they should be avoided.
Consider a 3D scatter plot of fuel efficiency versus displacement and power for 32 cars. We have seen this dataset previously in Chapter \@ref(aesthetic-mapping), Figure \@ref(fig:mtcars-five-scale). Here, we plot displacement along the *x* axis, power along the *y* axis, and fuel efficiency along the *z* axis, and we represent each car with a dot (Figure \@ref(fig:mtcars-3d)). Even though this 3D visualization is shown from four different perspectives, it is difficult to envision how exactly the points are distributed in space. I find part (d) of Figure \@ref(fig:mtcars-3d) particularly confusing. It almost seems to show a different dataset, even though nothing has changed other than the angle from which we look at the dots.
(ref:mtcars-3d) Fuel efficiency versus displacement and power for 32 cars (1973–74 models). Each dot represents one car, and the dot color represents the number of cylinders of the car. The four panels (a)--(d) show exactly the same data but use different perspectives. Data source: *Motor Trend,* 1974.
```{r mtcars-3d, fig.asp = 1.1, fig.cap = '(ref:mtcars-3d)'}
library(plot3D)
library(cowplot)
set_null_device("png")
colors <- c("#0072B2", "#CC79A7", "#E69F00")
cyls <- data.frame(cyl = factor(c(4, 6, 8)))
p <- ggplot(cyls, aes(cyl, cyl, color = cyl)) + geom_point() +
scale_color_manual(values = colors, name = "cylinders") +
theme_dviz_open(font_size = 12, font_family = dviz_font_family) +
theme(legend.position = "top",
legend.justification = "right")
legend <- get_legend(p)
pfun <- function(theta = 30, phi = 20) {
function() {
par(xpd = NA,
bg = "transparent",
mai = c(0, 0.1, 0, 0),
family = dviz_font_family_condensed
)
scatter3D(mtcars$disp, mtcars$hp, mtcars$mpg, colvar = mtcars$cyl,
col = colors,
pch = 19, bty ="b2", theta = theta, phi = phi, colkey = FALSE,
xlab = "displacement (cu. in.)", ylab ="power (hp)", zlab = "efficiency (mpg)",
cex.lab = 1) #0.857)
}
}
plot_grid(pfun(30, 20), pfun(-30, 20),
NULL, legend,
pfun(30, 40), pfun(-30, 40),
rel_heights = c(1, 0.1, 1), ncol = 2,
labels = c("a", "b", "", "", "c", "d"),
label_fontface = "plain", label_fontfamily = dviz_font_family)
```
The fundamental problem with such 3D visualizations is that they require two separate, successive data transformations. The first transformation maps the data from the data space into the 3D visualization space, as discussed in Chapters \@ref(aesthetic-mapping) and \@ref(coordinate-systems-axes) in the context of position scales. The second one maps the data from the 3D visualization space into the 2D space of the final figure. (This second transformation obviously does not occur for visualizations shown in a true 3D environment, such as when shown as physical sculptures or 3D-printed objects. My primary objection here is to 3D visualizations shown on 2D displays.) The second transformation is non-invertible, because each point on the 2D display corresponds to a line of points in the 3D visualization space. Therefore, we cannot uniquely determine where in 3D space any particular data point lies.
Our visual system nevertheless attempts to invert the 3D to 2D transformation. However, this process is unreliable, fraught with error, and highly dependent on appropriate cues in the image that convey some sense of three-dimensionality. When we remove these cues the inversion becomes entirely impossible. This can be seen in Figure \@ref(fig:mtcars-3d-no-axes), which is identical to Figure \@ref(fig:mtcars-3d) except all depth cues have been removed. The result is four random arrangements of points that we cannot interpret at all and that aren't even easily relatable to each other. Could you tell which points in part (a) correspond to which points in part (b)? I certainly cannot.
(ref:mtcars-3d-no-axes) Fuel efficiency versus displacement and power for 32 cars (1973–74 models). The four panels (a)--(d) correspond to the same panels in Figure \@ref(fig:mtcars-3d), only that all grid lines providing depth cues have been removed. Data source: *Motor Trend,* 1974.
```{r mtcars-3d-no-axes, fig.asp = 1.1, fig.cap = '(ref:mtcars-3d-no-axes)'}
pfun2 <- function(theta = 30, phi = 20) {
function() {
par(xpd = NA,
bg = "transparent",
mai = c(0, 0.1, 0, 0),
family = dviz_font_family_condensed
)
scatter3D(mtcars$disp, mtcars$hp, mtcars$mpg, colvar = mtcars$cyl,
col = colors,
pch = 19, axes = FALSE, theta = theta, phi = phi, colkey = FALSE, box = FALSE,
cex.lab = 1) #0.857)
}
}
plot_grid(pfun2(30, 20), pfun2(-30, 20),
NULL, legend,
pfun2(30, 40), pfun2(-30, 40),
rel_heights = c(1, 0.1, 1), ncol = 2,
labels = c("a", "b", "", "", "c", "d"),
label_fontface = "plain", label_fontfamily = dviz_font_family)
```
Instead of applying two separate data transformations, one of which is non-invertible, I think it is generally better to just apply one appropriate, invertible transformation and map the data directly into 2D space. It is rarely necessary to add a third dimension as a position scale, since variables can also be mapped onto color, size, or shape scales. For example, in Chapter \@ref(aesthetic-mapping), I plotted five variables of the fuel-efficency dataset at once yet used only two position scales (Figure \@ref(fig:mtcars-five-scale)).
Here, I want to show two alternative ways of plotting exactly the variables used in Figure \@ref(fig:mtcars-3d). First, if we primarily care about fuel efficiency as the response variable, we can plot it twice, once against displacement and once against power (Figure \@ref(fig:mtcars-2d-multiple)). Second, if we are more interested in how displacement and power relate to each other, with fuel efficiency as a secondary variable of interest, we can plot power versus displacement and map fuel efficiency onto the size of the dots (Figure \@ref(fig:mtcars-2d-size)). Both figures are more useful and less confusing than Figure \@ref(fig:mtcars-3d).
(ref:mtcars-2d-multiple) Fuel efficiency versus displacement (a) and power (b). Data source: *Motor Trend,* 1974.
```{r mtcars-2d-multiple, fig.asp = .45, fig.cap = '(ref:mtcars-2d-multiple)'}
p1 <- ggplot(mtcars, aes(x = disp, y = mpg, color = factor(cyl))) +
geom_point(size = 1.5) +
scale_color_manual(values = colors, name = "cylinders", guide = "none") +
xlab("displacement (cu. in.)") +
ylab("efficiency (mpg)") +
theme_dviz_open(12)
p2 <- ggplot(mtcars, aes(x = hp, y = mpg, color = factor(cyl))) +
geom_point(size = 1.5) +
scale_color_manual(values = colors, name = "cylinders") +
xlab("power (hp)") +
ylab("efficiency (mpg)") +
theme_dviz_open(12) +
theme(
legend.position = c(1, 1),
legend.justification = c(1, 1),
legend.spacing.y = grid::unit(3, "pt")
)
plot_grid(p1, p2)
```
(ref:mtcars-2d-size) Power versus displacement for 32 cars, with fuel efficiency represented by dot size. Data source: *Motor Trend,* 1974.
```{r mtcars-2d-size, fig.width = 5.5, fig.asp = 0.75, fig.cap = '(ref:mtcars-2d-size)'}
p <- ggplot(mtcars, aes(x = disp, y = hp, size = mpg, fill = factor(cyl))) +
geom_point(color = "white", pch = 21) +
scale_fill_manual(
values = colors, name = "cylinders",
guide = guide_legend(override.aes = list(size = 3))
) +
scale_size(
name = " mpg ",
range = c(1, 8),
limits = c(8, 40),
breaks = c(5, 10, 20, 40),
guide = guide_legend(override.aes = list(fill = "gray50"))
) +
xlab("displacement (cu. in.)") +
ylab("power (hp)") +
theme_dviz_open() +
theme(
legend.title.align = 0.5,
legend.spacing.y = grid::unit(4, "pt")
)
ggdraw(align_legend(p))
```
You may wonder whether the problem with 3D scatter plots is that the actual data representation, the dots, do not themselves convey any 3D information. What happens, for example, if we use 3D bars instead? Figure \@ref(fig:VA-death-rates-3d) shows a typical dataset that one might visualize with 3D bars, the mortality rates in 1940 Virginia stratified by age group and by gender and housing location. We can see that indeed the 3D bars help us interpret the plot. It is unlikely that one might mistake a bar in the foreground for one in the background or vise versa. Nevertheless, the problems discussed in the context of Figure \@ref(fig:titanic-3d) exist here as well. It is difficult to judge exactly how tall the individual bars are, and it is also difficult to make direct comparisons. For example, was the mortality rate of urban females in the 65--69 age group higher or lower than that of urban males in the 60--64 age group?
(ref:VA-death-rates-3d) Mortality rates in Virginia in 1940, visualized as a 3D bar plot. Mortality rates are shown for four groups of people (urban and rural females and males) and five age categories (50--54, 55--59, 60--64, 65--69, 70--74), and they are reported in units of deaths per 1000 persons. This figure is labeled as "bad" because the 3D perspective makes the plot difficult to read. Data source: @Molyneaux-et-al-1947
```{r VA-death-rates-3d, fig.width = 4.64, fig.asp = 0.8, fig.cap = '(ref:VA-death-rates-3d)'}
pfun3 <- function() {
par(xpd = NA,
bg = "transparent",
mai = c(0.1, 0.1, 0, 0),
family = dviz_font_family_condensed
)
hist3D(x = 1:5, y = 1:4, z = VADeaths,
bty = "b2", phi = 20, theta = -65,
xlab = "", ylab = "", zlab = "deaths / 1000",
col = "#56B4E9", border = darken("#56B4E9", .5),
shade = 0.2,
ticktype = "detailed", space = .3, d = 2, cex.axis = 1e-9
)
# Use text3D to label x axis
text3D(x = 1:5, y = rep(0.5, 5), z = rep(3, 5),
labels = rownames(VADeaths),
add = TRUE, adj = -0.2)
# Use text3D to label y axis
text3D(x = rep(0., 3), y = rep(5, 3), z = 20*(1:3),
labels = 20*(1:3),
add = TRUE, adj = -1.2)
# Use text3D to label z axis
text3D(x = rep(1, 4), y = 1:4, z = rep(0, 4),
labels = colnames(VADeaths),
add = TRUE, adj = 1)
}
# Doesn't currently work, due to bug in gridGraphics. Revisit later.
# For now, we do a work-around by rendering to png.
#stamp_bad(pfun3)
png("figures/VA-deaths.png", width = 4.64, height = 0.618*6, units = "in",
res = 600)
pfun3()
null <- dev.off()
stamp_bad(ggdraw() + draw_image("figures/VA-deaths.png"))
```
In general, it is better to use Trellis plots (Chapter \@ref(multi-panel-figures)) instead of 3D visualizations. The Virginia mortality dataset requires only four panels when shown as Trellis plot (Figure \@ref(fig:VA-death-rates-Trellis)). I consider this figure clear and easy to interpret. It is immediately obvious that mortality rates were higher among men than among women, and also that urban males seem to have had higher mortality rates than rural males whereas no such trend is apparent for urban and rural females.
(ref:VA-death-rates-Trellis) Mortality rates in Virginia in 1940, visualized as a Trellis plot. Mortality rates are shown for four groups of people (urban and rural females and males) and five age categories (50--54, 55--59, 60--64, 65--69, 70--74), and they are reported in units of deaths per 1000 persons. Data source: @Molyneaux-et-al-1947
```{r VA-death-rates-Trellis, fig.cap = '(ref:VA-death-rates-Trellis)'}
df <- data.frame(VADeaths)
df$age <- row.names(df)
row.names(df) <- NULL
df_long <- gather(df, type, rate, -age) %>%
mutate(type = case_when(type == "Urban.Male" ~ "urban male",
type == "Urban.Female" ~ "urban female",
type == "Rural.Male" ~ "rural male",
type == "Rural.Female" ~ "rural female"))
ggplot(df_long, aes(age, rate)) +
geom_col(fill = "#56B4E9D0") +
facet_wrap(~type) +
scale_y_continuous(name = "deaths / 1000", expand = c(0, 0)) +
scale_x_discrete(name = "age group") +
theme_dviz_hgrid()
```
## Appropriate use of 3D visualizations
Visualizations using 3D position scales can sometimes be appropriate, however. First, the issues described in the preceding section are of lesser concern if the visualization is interactive and can be rotated by the viewer, or alternatively, if it is shown in a VR or augmented reality environment where it can be inspected from multiple angles. Second, even if the visualization isn't interactive, showing it slowly rotating, rather than as a static image from one perspective, will allow the viewer to discern where in 3D space different graphical elements reside. The human brain is very good at reconstructing a 3D scene from a series of images taken from different angles, and the slow rotation of the graphic provides exactly these images.
Finally, it makes sense to use 3D visualizations when we want to show actual 3D objects and/or data mapped onto them. For example, showing the topographic relief of a mountainous island is a reasonable choice (Figure \@ref(fig:corsica-relief)). Similarly, if we want to visualize the evolutionary sequence conservation of a protein mapped onto its structure, it makes sense to show the structure as a 3D object (Figure \@ref(fig:protein-3d)). In either case, however, these visualizations would still be easier to interpret if they were shown as rotating animations. While this is not possible in traditional print publications, it can be done easily when posting figures on the web or when giving presentations.
(ref:corsica-relief) Relief of the Island of Corsica in the Mediterranean Sea. Data source: Copernicus Land Monitoring Service
```{r corsica-relief, fig.cap = '(ref:corsica-relief)'}
# figure rendered with rayshader; it's too slow to do while rendering the R markdown
knitr::include_graphics("figures/3d/Corsica.png", auto_pdf = FALSE)
```
(ref:protein-3d) Patterns of evolutionary variation in a protein. The colored tube represents the backbone of the protein Exonuclease III from the bacterium *Escherichia coli* (Protein Data Bank identifier: 1AKO). The coloring indicates the evolutionary conservation of the individual sites in this protein, with dark coloring indicating conserved amino acids and light coloring indicating variable amino acids. Data source: @Marcos-Echave-2015
```{r protein-3d, fig.width = 4.5, fig.asp = (16+4)/19, fig.cap = '(ref:protein-3d)'}
# Make legend via ggplot2
df <- data.frame(x = 1:10,
fill = runif(10))
p <- ggplot(df, aes(x, y = 1, fill = fill)) + geom_tile() +
scale_fill_gradient2(low = darken("#A6522B", .07), mid = darken("#FFFF00", .05),
high = darken("#FFFFFF", .02),
midpoint = .5,
limits = c(0, 1),
breaks = c(0, 1),
labels = c("highly\nconserved", "highly\nvariable"),
name = "sequence conservation",
guide = guide_colorbar(direction = "horizontal",
label.position = "bottom",
title.position = "top",
ticks = FALSE,
barwidth = grid::unit(3.5, "in"),
barheight = grid::unit(0.2, "in"))) +
theme_dviz_open(12) +
theme(legend.title.align = 0.5,
legend.background = element_blank(),
legend.box.background = element_blank(),
legend.justification = "center")
legend <- get_legend(p)
plot_grid(ggdraw() + draw_image("figures/3d/1AKO-cropped.png"),
legend, ncol = 1, rel_heights = c(16, 4))
```