This tutorial guides you through the workflow of conducting a spatial
phylogenetics analysis in R, using the very recently developped R
package phylospatial
. This tutorial relies heavily on the
vignettes of this package written by Matthew Kling. You
can find more details about the functions and the package by visiting
the repository
https://matthewkling.github.io/phylospatial/
. One of the
most important developments of this package is the fact that it is able
to deal with different types of input occurrence data, from
presence/abscence, to abundances, to probabilities of occurrence derived
from SDMs (Species distribution models).
Phylogenetic diversity (PD) is a measurement of the relatedeness of a set of taxa occurring in a region. In practice, measuring phylogenetic diversity involves adding the branch lengths that connect a set of taxa in a phylogeny. Like other diversity metrics, measurments of PD across the geographic space allows us to understand patterns of diversity. For example we aim to understand questions like: how much does PD varies across pixels (subsets of equal size) of a given region? Are there pixels with significantly higher or lower PD? The answer to these questions can lead both to studies on the processes that generate such patterns, but also to direct applications on conservation.
While studying PD has been used in multiple contexts, in this tutorial we will focus on a specific approach named “spatial phylogenetics”. This approach focuses on the study of the patterns of PD and other derived metrics across the geographic space, treating it as a set of pixels.
In this tutorial we will use a dataset of California mosses that is a subset of the data used for in the recent publication “Spatial Phylogenetics with Continuous Data: an Application to California Bryophytes” where we studied the phylogenetic diversity patterns of bryophytes in California.
Let’s install the packages we will need.
# Let's make sure we have the libraries we need
# if you haven't installed them, uncomment the line below
# install.packages('phylospatial')
library(phylospatial)
library(terra)
library(ape)
library(sf)
library(tmap)
We will use the dataset that is distributed with the library
phylospatial
. First let’s read a raster data that contains
the community data
moss_comm <- rast(system.file("extdata", "moss_comm.tif", package = "phylospatial"))
# check the structure of this object by uncommenting and running the line below
# moss_comm
# plot it!
plot(moss_comm)
Now we read the phylogeny and plot it.
moss_tree <- read.tree(system.file("extdata", "moss_tree.nex", package = "phylospatial"))
# check what the object looks like by uncommenting the line below
# moss_tree
# plot it
plot(moss_tree, type = "fan", cex = 0.15)
Now let’s use the function phylospatial()
to create an
phylospatial object. This function requires two arguments: the community
data set, and a phylogeny of the terminals (usually species) in such
dataset.
ps <- phylospatial(moss_comm, moss_tree)
plot(ps, "comm")
comm
, moss_tree
,
and ps
?ps
?
Use the console to explore what information is stored un this
object.data_type=
when
constructing the phylospatial object.Now that we have our spatiaphylo object that contains information on
taxa occurrences in each pixel and a phylogeny with information on the
phylogenetic relantionships of those taxa, we might naturally want to
know what is the PD of each pixel. The package phylospatial
computes all the alpha-phylogenetic diversity metrics with the function
ps_diversity()
. In the argument metric=
we can
specify which metrics we want to compute. Today, we will compute taxon
diversity (TD), phylogenetic diversity (PD), and phylogenetic endemism
(PE) as describe below (but notice that there are many others metrics
that can be calculated, you can explore that by looking up the function
documentation ?ps_diversity()
):
# compute the alpha diversity metrics
div <- ps_diversity(ps, metric = c("TD","TE", "PD", "PE"))
Use the console to explore the structure of the object
div
. And let’s plot it!
# plot diversity metrics
#PD
TD_plot <- tm_shape(div$TD) +
tm_raster(palette = "inferno", style = "cont") +
tm_layout(legend.outside = TRUE) + tm_title("Taxon diversity, TD")
TE_plot <- tm_shape(div$TE) +
tm_raster(palette = "inferno", style = "cont") +
tm_layout(legend.outside = TRUE) + tm_title("Taxon endemism, TE")
PD_plot <- tm_shape(div$PD) +
tm_raster(palette = "inferno", style = "cont") +
tm_layout(legend.outside = TRUE) + tm_title("Phylogenetic diversity, PD")
PE_plot <- tm_shape(div$PE) +
tm_raster(palette = "inferno", style = "cont") +
tm_layout(legend.outside = TRUE) + tm_title("Phylogenetic endemism, PE")
# plots
tmap_arrange(TD_plot, TE_plot, PD_plot, PE_plot, nrow = 2, ncol = 2)
In the figure above, we observe that the phylogenetic diversity is highly correlated with the taxon diversity. The more terminals occur in a pixel, the more branch lengths we add when computing PD. If we are interested in knowing wether a pixel has relatively longer branches while correcting for the number of taxa in that pixel, we can use “relative metrics”. Let’s visualize how PD and relative phylogenetic diversity look like for our dataset.
# calculate pd an rpd
pd_rpd <- ps_diversity(ps, metric = c("PD","RPD"))
# plot them
pd_plot <- tm_shape(pd_rpd$PD) +
tm_raster(palette = "inferno", style = "cont") +
tm_layout(legend.outside = TRUE) + tm_title("Phylogenetic diversity, PD")
rpd_plot <- tm_shape(pd_rpd$RPD) +
tm_raster(palette = "inferno", style = "cont") +
tm_layout(legend.outside = TRUE) + tm_title("Relative phylogenetic diversity, RPD")
tmap_arrange(pd_plot, rpd_plot, ncol = 2)
So far we have explored the spatial patterns of diversity metrics.
Our observations allow us to compare the diversity across pixels.
Nevertheless, we would like to identify regions that have significantly
different values than what we would expect, i.e., we want to assign a
significance value to our observations. For that purpose, we will use a
randomization test. A randomization test involves comparing the observed
data to a null expectation. So in this case, we will need to generate a
null expectation of what the values of the diversity metrics would be
for each pixel. The function ps_rand()
internally applies
the quantize()
algorithm to generate a null hypothesis
based on running multiple stratified randomizations.
To run this function, we need to pass on a phylospatial object, indicate the number of randomizations we want to use to create our null expectation, and indicate the metrics that we want to compare. Notice this step might take some time.
# Run the randomizations for our moss dataset, you can increase the number of randomizations modifying the argument n_rand
rand <- ps_rand(ps, n_rand = 50, progress = F,
metric = c("PD", "PE", "CE", "RPE"))
If you investigate the output of this function, you will observe that it consists of one raster per metric. Let’s plot one of these rasters.
tm_shape(rand$qPE) +
tm_raster(palette = "inferno", style = "cont") +
tm_layout(legend.outside = TRUE)
The values that are plotted indicate, for each cell, the proportion of the randomizations in which the observed value of the metric is larger than the randomization value. For example in the figure above, if a cell has a value of 0.8 this means that in 80% of the randomizations the observed value of PE was greater than the simulated value. In science we often use significance values of alpha = 0.5. If we were to apply such criterium in this scenario, we would only consider the cells with values greater than 0.95 as having significantly higher PE than expected.
Researches may be interested in testing the significance of particular metrics of phylodiversity, and they can use variants of the code above to do so. But one analysis that has become standard in spatial phylogenetics is CANAPE (a categorical analysis of neo- and paleo-endemism, described in Mishler et al. 2014. CANAPE consists on two steps. First, the pixels that have significantly high phylogenetic endemism (PE) are identified. And second, each of these pixels with significantly high PE are assigned into categories of endemism based on RPE, which as we discussed earlier reflects the relative length of the branches in a pixel in comparison with other pixels.
# run CANAPE
# notice we can modify the value of alpha
cp <- ps_canape(rand, alpha = .05)
# plot the results
terra::plot(cp)
The categories of significant endemism to which the pixels may be assigned are:
neoendemism: pixels that have significantly high endemism due to significantly relative short branches as measured in RPE.
paleoendemism: pixels that have significantly high endemism due to significantly relative long branches indicated by RPE
mixed-endemism: pixels that have significantly high endemism but no significance is detected in RPE, which is interpreted as a mix of short and long branches.
super-endemism: its a subcategory of mixed-endemism with particularly high significant endemism.
Another interesting aspect of diversity studies is to understand the spatial distribution of individual taxa, answering questions like how similar are communities across space? Is there an evident turnover of communities?. We will also address this question from the phylogenetic diversity perspective.
For this, we will start by computing a pairwise phylogenetic
dissimilarity index based on the Bray-Curtis distance, also known as
quantitative Sorensen’s dissimilarity. The function
ps_add_dissim()
will add a dissimilarity matrix to the
spatialphylo object.
ps <- ps_add_dissim(ps, method = "sorensen", endemism = TRUE, normalize = TRUE)
ps
## `phylospatial` object
## - 884 lineages across 1116 sites
## - community data type: probability
## - spatial data class: SpatRaster
## - dissimilarity data: sorensen
In the code above setting endemism = TRUE
will scale
every lineage’s occurrence values to sum to 1 across all sites, giving
greater weight to narrowly distributed taxa. Setting
normalize = TRUE
scales every site’s total occurrence value
to sum to 1 across taxa, which results in a distance matrix that
emphasizes proportional differences in composition rather than alpha
diversity gradients.
Once we have computed the phylogenetic dissimilarity matrix, we might
want to visualize how phylogenetically similar are the pixels. For this,
we will use the function ps_rbg()
. Internally, this
function will perform an ordination analysis—in this case we will use a
CMDS (classical multidimensional scaling)—that reduces the variation in
the communities to three components, which are then used to define
colors (using the RBG bands of color space). As a result, the more
similar the colors, the more phylogenetically similar are the
communities.
ps %>% # take the phylospatial object
ps_rgb(method = "cmds") %>% # declare what ordination algorithm we want to use
tm_shape() + # and make a plot
tm_rgb(max.value = 1, interpolate = FALSE) +
tm_layout(title = "Phylogenetic community ordination")
While this ordination visualization helps to visualize similarity, it
might be difficult to decide where to divide the map if we are trying to
do a regionalization of the geographic. For that purpose, we can use a
cluster analysis with the function ps_regions()
. This
function will require us to decide the number of regions that we want to
cluster the space in. While there is no perfect answer to define the
number of regions and the decision will be somewhat arbitrary, a common
approach in clustering methods is to select the least amount of
categories k
that explain most of the data. To find out
that value for our data we will use the function
ps_regions_eval()
.
ps_regions_eval(ps, k = 1:15, plot = TRUE, method = "average")
The plot above suggests that K=3
might be a sensitive
value for our clustering analyses. So now we can run a regionalization
with k=3, and subsequently plot it.
ps %>%
ps_regions(k = 3, method = "average") %>% # regionalization
tm_shape() + # plot
tm_raster(style = "cat", palette = "Dark2",
title = "phylogenetic region") +
tm_layout(legend.outside = TRUE)
It is important to mention that there are multiple algorithms that can be used to generate the clusters, and the regionalization might vary depending on which one is selected, we recommend to read more about clustering methods before making a decision on which algorithm to use.
By now, I hope you have an idea of the type of analyses you can do with this R package. This tutorial is in no way exhaustive, as there are many other functionalities to explore. I encourage you to use the vignettes of the packages to learn more. In particular an aspect that might be of interest to many is the direct application in conservation with the included prioritization algorithm.