-
Notifications
You must be signed in to change notification settings - Fork 0
/
NMDS.Rmd
137 lines (110 loc) · 8.04 KB
/
NMDS.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
---
title: "NMDS"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
library(vegan)
library(BiodiversityR)
library(viridis)
library(ggvegan)
library(ggpubr)
library(kableExtra)
library(knitr)
```
```{r data, echo=FALSE, message=TRUE, warning=TRUE, include=TRUE}
ShaSPP <- read.csv2("https://raw.githubusercontent.com/cmlglvz/datasets/master/Data/eAnalisis/ShaSPP.csv", header = TRUE, sep = ";", dec = ".", skip = 0)
rownames(ShaSPP) <- ShaSPP[, 1]
ShaSPP <- ShaSPP[, -1]
```
From [GUSTA.ME](https://mb3is.megx.net/gustame/reference/transformations)
## Data transformation
*Occasionally, the variables in a "raw" data set have properties that violate an assumption of a statistical procedure (e.g. normally distributed values) or which cannot be compared to other variables due to differences in scale or variability. For example, principal components analysis (PCA) requires that variables be linearly related to one another and on roughly the same scale or will perform poorly. Rather than abandoning an analysis due to inappropriate data structure, it may be possible to transform the variables so they satisfy the conditions in question. A transformation involves the application of a mathematical procedure to every value of a given variable or set of variables to create a new set of values. The new values of the transformed variables should still represent the data, but will be more amenable to analysis or comparison.*
*The sections below first describe some basic transformations and then discuss transformations specifically geared towards comparing variables. A set of ecologically-motivated transformations intended to allow Euclidean representation of ecological dissimilarities by methods such as PCA and redundancy analysis (RDA) are also summarised.*
__*Before you begin transforming your data, ensure there is a defined and well-supported reason to do so. Common rationale includes linearising, normalising, or standardising data in order to respect a method's assumptions.*__
### Ecologically motivated transformations
*Presented in Legendre and Gallagher (2001), the transformations listed below are closely related to several (dis)similarity and distance measures and have their collective basis in ecological theory. These transformations may be applied prior to analyses such as principal components analysis (PCA) or redundancy analysis (RDA) of, for example, abundance data. These analyses use simple Euclidean distances in their ordinations which are often not appropriate for count data. Hence, these transformations may improve the effectiveness of many analyses in representing ecological relationships. Formulae, further explanation, and examples are available in Legendre and Gallagher (2001).*
#### Hellinger
*Particularly suited to species abundance data, this transformation gives low weights to variables with low counts and many zeros. The transformation itself comprises dividing each value in a data matrix by its row sum, and taking the square root of the quotient.*
This transformation will be performed using `vegan` `decostand`function
```{r data transformation, echo=TRUE, message=TRUE, warning=TRUE, include=TRUE}
hel.trans <- decostand(ShaSPP, method = "hellinger")
```
We now have hellinger transformed abundance for the 52 species from **shared PPE-ASV** dataframe. We will select 20 species to perform the **NMDS analysis**. Species will be selected according to **species abundance rank order** with the `rankabundance()` function from `BiodiversityR` package
```{r ranking, echo=TRUE, message=TRUE, warning=TRUE}
rank <- rankabundance(hel.trans)
```
```{r ranking table, echo=FALSE, message=TRUE, warning=TRUE}
kbl(rank) %>% kable_minimal() %>% scroll_box(width = "900px", height = "300px")
```
With these 20 selected species we will do the **Non-metric multidimensional scaling** analysis
```{r selection by rank, echo=TRUE}
ranked <- rownames(rank)[1:20]
ranked.hel <- select(hel.trans, all_of(ranked))
nmds <- metaMDS(ranked.hel, autotransform = FALSE) #We already transformed the data, but as you can tell we can do it from this function
```
We can visualize the NMDS with `ordiplot()`function from `vegan` or `autoplot()` from `ggvegan`. The latter package implement **ggplot**-versions of the plots produced by `vegan`
```{r ordiplot, echo=TRUE, fig.align='center', fig.width=8, fig.cap="Grafico 1. NMDS para especies (en rojo) y las muestras correspondientes representando los 4 sitios de interes"}
ordiplot(nmds, type = "t")
```
```{r autoplot, echo=TRUE, fig.align='center', fig.width=10, fig.cap="Grafico 2. NMDS producido mediante `autoplot`. Sitios se representan con circulos en tono rojo, y especies mediante triangulos en tono verde"}
autoplot(nmds)
```
`ggplot2`give us more options to work with and easily improve the data visualization. We can use the `fortify()` function to transform the **NMDS** object to a fitter dataframe to plot to. Then we will use the `ggpubr`package to take advantage from the power of these plot tools
```{r fort, echo=TRUE, message=TRUE, warning=TRUE}
fort <- fortify(nmds)
```
```{r plots, echo=FALSE, warning=TRUE, message=TRUE, include=FALSE}
plot1 <- ggplot() +
geom_point(data = subset(fort, Score == 'sites'),
mapping = aes(x = NMDS1, y = NMDS2),
colour = c(rep("#E68431", 12), rep("#D8A64F", 12), rep("#183D84", 12), rep("#80E0FB", 5)),
alpha = 0.8,
shape = c(rep(15, 12), rep(17, 12), rep(16, 12), rep(18, 5)),
size = 2.5) +
geom_segment(data = subset(fort, Score == 'species'),
mapping = aes(x = 0, y = 0, xend = NMDS1, yend = NMDS2),
arrow = arrow(length = unit(0.015, "npc"),
type = "closed"),
colour = "darkgray",
size = 0,
alpha = 0) +
geom_text(data = subset(fort, Score == 'sites'),
mapping = aes(label = c(rep("Cha", 12), rep("Fla", 12), rep("Hu", 12), rep("Pc", 5)),
x = NMDS1 * 1.05,
y = NMDS2 * 1.15),
check_overlap = TRUE,
alpha = 0) +
geom_abline(intercept = 0, slope = 0, linetype = "dashed", size = 0.8, colour = "gray") +
geom_vline(aes(xintercept = 0), linetype = "dashed", size = 0.8, colour = "gray") +
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_blank(),
axis.line = element_line(colour = "black"))
plot2 <- ggplot() +
geom_point(data = subset(fort, Score == 'sites'),
mapping = aes(x = NMDS1, y = NMDS2),
colour = "black",
alpha = 0) +
geom_segment(data = subset(fort, Score == 'species'),
mapping = aes(x = 0, y = 0, xend = NMDS1, yend = NMDS2),
arrow = arrow(length = unit(0.015, "npc"),
type = "closed"),
colour = "#F71B46",
size = 0.8) +
geom_text(data = subset(fort, Score == 'species'),
mapping = aes(label = Label, x = NMDS1 * 1.1, y = NMDS2 * 1.1),
check_overlap = TRUE) +
geom_abline(intercept = 0, slope = 0, linetype = "dashed", size = 0.8, colour = "gray") +
geom_vline(aes(xintercept = 0), linetype = "dashed", size = 0.8, colour = "gray") +
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_blank(),
axis.line = element_line(colour = "black"))
```
```{r ggpubr, echo=TRUE, message=TRUE, warning=TRUE, fig.height=8, fig.align='center', fig.width=7, fig.cap= "Figura 1. Se presentan los gráficos 3 y 4 (superior e inferior respectivamente) de forma apilada. Grafico 3 corresponde a NMDS para los sitios representados por las muestras correspondientes sitios (formas y colores). Grafico 4 muestra NMDS para las especies respectivas a las secuencias de PPE-ASV compartidas en los sitios de interes"}
ggarrange(plot1, plot2, ncol = 1, align = "hv")
```
Notice that we call the objects corresponding to the plots for **sites** and **species**, this way we can facilitate the plots interpretation
#### END