Tutorial R : Spatial Analysis
14 pages
English

Tutorial R : Spatial Analysis

Le téléchargement nécessite un accès à la bibliothèque YouScribe
Tout savoir sur nos offres
14 pages
English
Le téléchargement nécessite un accès à la bibliothèque YouScribe
Tout savoir sur nos offres

Description

Tutorial using the free softwarePerforming a spatial Principal ComponentAnalysis using the R softwareT. JOMBARTNovember 2006For any questions or comments, please send an email to:jombart@biomserv.univ-lyon1.fr.Contents1 Introduction 22 Building a connection network 43 Performing the analyses 53.1 Testing for the existence of spatial structures . . . . . . . . . . . . . 63.2 Identifying and displaying the spatial structures . . . . . . . . . . . 83.2.1 The Principal Component Analysis . . . . . . . . . . . . . . 83.2.2 The spatial Principal Component Analysis . . . . . . . . . . 101T. JOMBART, A sPCA tutorial using R1 IntroductionThistutorialshowshowtoperformaspatial Principal Component Analysis (sPCA,Jombart et al., submitted), starting from a data set of allelic frequencies of geo-referenced individuals or groups of individuals. The second illustration presentedinthepreviouslycitedarticleisreproducedbelow. Itinvolvesgroupsofindividuals(cat colonies) data, but the process is exactly the same using individuals data.Before starting, please download the files Nancy cats.rda and map.ppm athttp://pbil.univ-lyon1.fr/R/html/additifs.html. The first file is a dataset with R format (’rda’), and the second one is a picture of the sampled area.We assume that both files were saved in your R working directory (which can beknown and set using the commands getwd() and setwd()). Several R packagesare also used to perform the analyses. These can be installed using ...

Informations

Publié par
Nombre de lectures 101
Langue English

Extrait

Tutorial using the free software
Performing a spatial Principal Component
Analysis using the R software
T. JOMBART
November 2006
For any questions or comments, please send an email to:
jombart@biomserv.univ-lyon1.fr.
Contents
1 Introduction 2
2 Building a connection network 4
3 Performing the analyses 5
3.1 Testing for the existence of spatial structures . . . . . . . . . . . . . 6
3.2 Identifying and displaying the spatial structures . . . . . . . . . . . 8
3.2.1 The Principal Component Analysis . . . . . . . . . . . . . . 8
3.2.2 The spatial Principal Component Analysis . . . . . . . . . . 10
1T. JOMBART, A sPCA tutorial using R
1 Introduction
Thistutorialshowshowtoperformaspatial Principal Component Analysis (sPCA,
Jombart et al., submitted), starting from a data set of allelic frequencies of geo-
referenced individuals or groups of individuals. The second illustration presented
inthepreviouslycitedarticleisreproducedbelow. Itinvolvesgroupsofindividuals
(cat colonies) data, but the process is exactly the same using individuals data.
Before starting, please download the files Nancy cats.rda and map.ppm at
http://pbil.univ-lyon1.fr/R/html/additifs.html. The first file is a data
set with R format (’rda’), and the second one is a picture of the sampled area.
We assume that both files were saved in your R working directory (which can be
known and set using the commands getwd() and setwd()). Several R packages
are also used to perform the analyses. These can be installed using the following
commands:
> install.packages("ade4", dep = TRUE)
> install.packages("spdep", dep = TRUE)
> install.packages("pixmap", dep = TRUE)
We first load the data object:
> load("Nancy_cats.rda")
> names(Nancy_cats)
[1] "xy" "tab" "sampsizes"
This list contains a matrix of geographic coordinates of the 17 stray cats colonies
(Nancy_cats$xy), a table of allelic frequencies for 9 microsatellite markers
(Nancy_cats$tab), and the number of cats sampled for each colonies
(Nancy_cats$sampsizes, not used in this tutorial). We rename these objects
respectively to xy and tab. We also load the map of the sampled area, stored as
a ppm file, using the pixmap library.
> xy = Nancy_cats$xy
> colnames(xy) <- c("x", "y")
> rownames(xy) <- 1:nrow(xy)
> head(xy)
x y
1 263.3498 171.10939
2 183.5028 122.40790
3 391.1050 254.70148
4 458.6121 41.72336
5 182.7769 219.08398
6 335.2121 344.83557
2T. JOMBART, A sPCA tutorial using R
> tab = Nancy_cats$tab
> tab[1:5, 1:6]
L1.01 L1.02 L1.03 L1.04 L1.05 L1.06
P01 0.00000000 0.00000000 0.1000000 0.3000000 0.10000000 0.05000000
P02 0.00000000 0.27272727 0.0000000 0.5454545 0.04545455 0.00000000
P03 0.04166667 0.16666667 0.1666667 0.2083333 0.00000000 0.08333333
P04 0.02173913 0.08695652 0.1304348 0.4130435 0.00000000 0.06521739
P05 0.00000000 0.00000000 0.0000000 0.5333333 0.00000000 0.10000000
> library(pixmap)
> map <- read.pnm("map.ppm")
> plot(map, main = "Sampled area and positions \nof the colonies")
At this step, we need to express the spatial relationships between the colonies
in order to include spatial information in further analyses. This is achieved by
building a connection network (also called ’neighbouring graph’), which defines
which colonies are neighbours or not.
3T. JOMBART, A sPCA tutorial using R
2 Building a connection network
As we have no information about the real connectivity between these stray cat
colonies, we use an algorithm to build their connection network. The R software
offers many tools to perform this task (see Table 1). A remaining difficulty is
that these tools are spread throughout different packages, the main of which are
presented below. Note that each package produces different output types (called
classes in R framework); however, this difficulty is easily circumvented by ’trans-
lation’ functions. For instance, having an object of the class titi, it will often be
possible to get an object of the class toto using a function titi2toto().
We choose Gabriel criterion (Gabriel and Sokal, 1969). The connection
network is built using the function gabrielneigh, whose argument is a matrix of
the geographic coordinates of the points.
> library(spdep)
Package SparseM (0.71) loaded. To cite, see citation("SparseM")
> gab.graph <- gabrielneigh(Nancy_cats$xy)
> class(gab.graph)
[1] "Graph" "Gabriel"
Note that we could add or remove connections of the Gabriel graph using the
function fix.nb of the spdep package. The presentation of usual algorithms other
than Gabriel criterion (Table 1) is beyond the scope of this document. For more
details, a starting point can be found in (Legendre and Legendre, 1998, p.
752-756), although R documentation may be sufficient.
Table 1: Usual algorithms used to build connection networks, and the corresponding func-
tions in the R software, precising their library, class and the associated conversion functions.
Name of the algorithm Function Library Class Useful conversion functions
Delaunay triangulation tri2nb tripack nb nb2neig
Gabriel graph gabrielneigh spdep graph graph2nb, nb2neig
Relative neighbours relativeneigh spdep graph graph2nb, nb2neig
K nearest neighbours knearneigh spdep knn knn2nb, nb2neig
Neighbourhood by distance dnearneigh spdep nb nb2neig
Minimum spanning tree mstree ade4 neig neig2nb
Among the many R classes related to the connection networks, we are partic-
ularly interested in the the following: nb, which can be viewed as the reference
4T. JOMBART, A sPCA tutorial using R
class for connection networks, listw, which is used in computations (Moran’s I
and sPCA), and neig which is used for plotting the results in ade4 graphical func-
tions. As said previously, many functions allow to convert one class into another
(Table 1).
In this example, we need graph2nb to convert a graph object into the a nb
object, required for further operations. We call the resulting object cn.gab.nb.
Note that the argument sym=TRUE specifies that spatial connections are symetrical
(i is connected to j implies that j is connected to i). We also proceed to other
conversions, latter involved in computations and graphical display.
> library(spdep)
> library(ade4)
> cn.gab.nb <- graph2nb(gab.graph, sym = TRUE)
> class(cn.gab.nb)
[1] "nb"
> cn.gab.listw <- nb2listw(cn.gab.nb)
> class(cn.gab.listw)
[1] "listw" "nb"
> cn.gab.neig <- nb2neig(cn.gab.nb)
> class(cn.gab.neig)
[1] "neig"
3 Performing the analyses
Once the connection network is defined, we can proceed to the analyses, which
can be detailed into two main steps. First, the existence of spatial structures is
assessed in the raw data, using a spatial autocorrelation test. If this test detects
the presence of spatial patterns, then these patterns are searched and displayed
using the sPCA.
5T. JOMBART, A sPCA tutorial using R
3.1 Testing for the existence of spatial structures
The procedure proposed by Smouse and Peakall (1999) is used to test the
existence of spatial autocorrelation. Basically, as this test is based on Moran’s
I, it would not be able to differenciate between the absence of structures, and a
mixture of global and local structures. As such a case is less likely to occur in a
single locus, we perform the test for each locus separately.
Smouse and Peakall (1999) procedure was generalized by Ollier (2004),
and is available in the function multispati.randtest (package: ade4). We have
to split the main allelic frequencies table in order to have a table for each locus.
Here are the numbers of alleles per marker:
> table(substr(colnames(tab), 1, 2))
L1 L2 L3 L4 L5 L6 L7 L8 L9
11 18 10 9 12 8 16 12 12
> blocs <- as.numeric(table(substr(colnames(tab), 1, 2)))
Then we use the ade4 function ktab.data.frame to create one table per marker.
> tab_loc <- ktab.data.frame(tab, blocs, tabnames = paste("loc",
+ 1:9, sep = "."))
> names(tab_loc)[1:9]
[1] "loc.1" "loc.2" "loc.3" "loc.4" "loc.5" "loc.6" "loc.7" "loc.8" "loc.9"
Now, we perform the spatial autocorrelation test for each of the nine tables. Note
that here, the function dudi.pca is only used by the procedure to get the inertia
of each table.
> SPtest <- lapply(1:9, function(i) multispati.randtest(dudi.pca(tab_loc[[i]],
+ center = T, scale = F, scannf = F), cn.gab.listw, nrep = 9999))
> names(SPtest) <- names(tab_loc)[1:9]
> SPtest
$loc.1
Monte-Carlo test
Observation: -0.1228694
Call: multispati.randtest(dudi = dudi.pca(tab_loc[[i]], center = T,
scale = F, scannf = F), listw = cn.gab.listw, nrepet = 9999)
Based on 9999 replicates
Simulated p-value: 0.7608
$loc.2
Monte-Carlo test
Observation: 0.1051725
Call: multispati.randtest(dudi = dudi.pca(tab_loc[[i]], center = T,
scale = F, scannf = F), listw = cn.gab.listw, nrepet = 9999)
Based on 9999 replicates
6T. JOMBART, A sPCA tutorial using R
Simulated p-value: 0.0421
$loc.3

  • Univers Univers
  • Ebooks Ebooks
  • Livres audio Livres audio
  • Presse Presse
  • Podcasts Podcasts
  • BD BD
  • Documents Documents