pcadapt performs principal component analysis and computes p-values to test
for outliers. The test for outliers is based on the correlations between
genetic variation and the first K principal components. pcadapt also handles
Pool-seq data, for which the statistical analysis is performed on the allele
frequencies of the genetic markers. Returns an object of class pcadapt.
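The idea behind the test can be illustrated with a minimal base-R sketch. It is not pcadapt's implementation. The helper pca_zscores is hypothetical. Genotypes are assumed to be coded 0..ploidy and are centered and scaled with binomial moments. The MAF cut-off is treated as inclusive, and a full svd() stands in for the truncated RSpectra::svds() solver. Each SNP is regressed on the first K PC scores. Each regression coefficient divided by its standard error then serves as a z-score, measuring how strongly that SNP relates to that PC.

```r
# Hypothetical sketch (not pcadapt's code): binomial scaling of a genotype
# matrix G (individuals x SNPs, coded 0..ploidy), PCA via svd(), and
# z-scores from regressing each SNP on the first K PC scores.
pca_zscores <- function(G, K = 2, ploidy = 2, min.maf = 0.05) {
  af <- colMeans(G) / ploidy                   # allele frequency per SNP
  maf <- pmin(af, 1 - af)
  pass <- maf >= min.maf                       # assumed inclusive threshold
  Gp <- G[, pass, drop = FALSE]
  p <- af[pass]
  # Binomial(ploidy, p): mean ploidy * p, sd sqrt(ploidy * p * (1 - p))
  X <- sweep(Gp, 2, ploidy * p, "-")
  X <- sweep(X, 2, sqrt(ploidy * p * (1 - p)), "/")
  s <- svd(X, nu = K, nv = 0)
  U <- s$u                                     # n x K orthonormal PC scores
  B <- crossprod(U, X)                         # K x SNPs regression coefficients
  R <- X - U %*% B                             # residuals of each SNP
  res_sd <- sqrt(colSums(R^2) / (nrow(X) - K)) # residual sd per SNP
  list(scores = U, zscores = t(B) / res_sd, pass = pass)
}

set.seed(1)
G <- matrix(rbinom(60 * 30, size = 2, prob = 0.3), nrow = 60)
res <- pca_zscores(G, K = 2)
dim(res$zscores)  # one row per retained SNP, one column per PC
```

Because the scores are orthonormal, the regression coefficients reduce to crossprod(U, X). This avoids fitting lm() once per SNP.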
pcadapt(
input,
K = 2,
method = "mahalanobis",
min.maf = 0.05,
ploidy = 2,
LD.clumping = NULL,
pca.only = FALSE,
tol = 1e-04
)
# S3 method for pcadapt_matrix
pcadapt(
input,
K = 2,
method = c("mahalanobis", "componentwise"),
min.maf = 0.05,
ploidy = 2,
LD.clumping = NULL,
pca.only = FALSE,
tol = 1e-04
)
# S3 method for pcadapt_bed
pcadapt(
input,
K = 2,
method = c("mahalanobis", "componentwise"),
min.maf = 0.05,
ploidy = 2,
LD.clumping = NULL,
pca.only = FALSE,
tol = 1e-04
)
# S3 method for pcadapt_pool
pcadapt(
input,
K = (nrow(input) - 1),
method = "mahalanobis",
min.maf = 0.05,
ploidy = NULL,
LD.clumping = NULL,
pca.only = FALSE,
tol
)
input: the output of function read.pcadapt.
K: an integer specifying the number of principal components to retain.
method: a character string specifying the method used to compute the
p-values. Two statistics are currently available: "mahalanobis" and
"componentwise".
min.maf: threshold of minor allele frequency above which p-values are
computed. Default is 0.05.
ploidy: number of trials, i.e. the size parameter of the binomial
distribution. Default is 2 (NULL for the pcadapt_pool method), which
corresponds to diploid organisms such as humans.
LD.clumping: default is NULL, which applies no SNP thinning. To use SNP
thinning, provide a named list with elements $size and $thr, which
correspond respectively to the window radius and the squared correlation
threshold. A good default value is list(size = 500, thr = 0.1).
pca.only: a logical value indicating whether only the PCA results should be
returned (before any test statistic is computed).
tol: convergence criterion of RSpectra::svds(). Default is 1e-4.
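To make the size and thr parameters of LD.clumping concrete, here is a simplified greedy clumping sketch. The function clump_snps is hypothetical. Two details are assumptions: SNPs are visited in order of decreasing MAF, and the window radius is counted in number of SNPs. pcadapt's own thinning may differ in both respects.

```r
# Hypothetical helper (not part of pcadapt): greedy LD clumping of a
# genotype matrix G (individuals in rows, SNPs in columns, coded 0..ploidy).
#   size: window radius, counted here in number of SNPs (assumption)
#   thr:  squared-correlation threshold
# SNPs are visited by decreasing MAF (assumed priority). Each kept SNP
# discards the not-yet-kept, not-yet-discarded SNPs in its window whose
# r^2 with it exceeds thr.
clump_snps <- function(G, size = 500, thr = 0.1, ploidy = 2) {
  n_snp <- ncol(G)
  af <- colMeans(G) / ploidy
  maf <- pmin(af, 1 - af)
  keep <- rep(FALSE, n_snp)
  removed <- rep(FALSE, n_snp)
  for (j in order(maf, decreasing = TRUE)) {
    if (removed[j]) next
    keep[j] <- TRUE
    window <- setdiff(max(1, j - size):min(n_snp, j + size), j)
    window <- window[!removed[window] & !keep[window]]
    if (length(window) > 0) {
      # r^2 between SNP j and each candidate; NA for monomorphic SNPs
      r2 <- suppressWarnings(cor(G[, j], G[, window, drop = FALSE]))^2
      removed[window[!is.na(r2) & r2 > thr]] <- TRUE
    }
  }
  keep
}

# Toy data: SNP 2 duplicates SNP 1, SNP 3 is uncorrelated with SNP 1.
snp1 <- c(0, 1, 2, 0, 1, 2)
snp3 <- c(0, 0, 0, 2, 2, 2)
G <- cbind(snp1, snp1, snp3)
kept <- clump_snps(G, size = 2, thr = 0.1)
kept
```

Exactly one of the two duplicated SNPs survives, while the uncorrelated SNP is always kept.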
The returned value is an object of class pcadapt.
First, a principal component analysis is performed on the scaled and
centered genotype data. Depending on the specified method, different test
statistics can be used.
mahalanobis (default): the robust Mahalanobis distance is computed for each
genetic marker, using a robust estimate of both the mean and the covariance
matrix of the K vectors of z-scores.
communality: the communality statistic measures the proportion of variance
explained by the first K PCs. Deprecated since version 4.0.0.
componentwise: returns a matrix of z-scores.
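A squared Mahalanobis distance on the K columns of z-scores can be sketched with stats::mahalanobis(). The helper mahalanobis_stat is hypothetical. For brevity, it pairs a coordinate-wise median centre with the classical covariance matrix, whereas pcadapt uses a robust estimate of both. Treat it as an illustration of the statistic, not a drop-in replacement. With method = "componentwise", the z-score matrix itself would be the output instead.

```r
# Hypothetical sketch: squared Mahalanobis distance of each SNP's vector of
# K z-scores (rows of Z). Centre = coordinate-wise median; covariance =
# classical cov(). pcadapt uses robust estimates; a robust estimator such
# as MASS::cov.rob() could be swapped in for cov() here.
mahalanobis_stat <- function(Z) {
  center <- apply(Z, 2, median)
  S <- cov(Z)
  mahalanobis(Z, center = center, cov = S)
}

set.seed(2)
Z <- matrix(rnorm(500 * 2), ncol = 2)  # simulated z-scores, K = 2
Z[1, ] <- c(8, -8)                     # one planted outlier SNP
stat <- mahalanobis_stat(Z)
which.max(stat)
```

A robust covariance matters in practice: a handful of strong outliers inflates the classical estimate and so shrinks their own distances.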
To compute p-values, the test statistics (stat) are divided by a genomic
inflation factor (gif) when method="mahalanobis". With method="mahalanobis",
the scaled statistics (chi2_stat) should follow a chi-squared distribution
with K degrees of freedom. With method="componentwise", the squared z-scores
should follow a chi-squared distribution with 1 degree of freedom. For
Pool-seq data, pcadapt provides p-values based on the Mahalanobis distance
for each SNP.
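The p-value step can be sketched as follows. The sketch assumes the usual median-based genomic inflation factor (observed median statistic divided by the median of a chi-squared distribution with K degrees of freedom). The text above only states that stat is divided by gif. The helper names are hypothetical. Because the text mentions gif only for method="mahalanobis", the componentwise helper leaves the squared z-scores unscaled.

```r
# Hypothetical sketch of the p-value step for method = "mahalanobis",
# assuming gif = median(stat) / median of a chi-squared(K) variable.
pvalues_from_stat <- function(stat, K) {
  gif <- median(stat, na.rm = TRUE) / qchisq(0.5, df = K)
  chi2_stat <- stat / gif
  list(gif = gif,
       chi2_stat = chi2_stat,
       pvalues = pchisq(chi2_stat, df = K, lower.tail = FALSE))
}

# componentwise case: each squared z-score is compared with a chi-squared(1)
# distribution, which equals a two-sided normal test on the z-score.
componentwise_pvalues <- function(Z) {
  pchisq(Z^2, df = 1, lower.tail = FALSE)
}

set.seed(3)
K <- 2
stat <- 1.5 * rchisq(1000, df = K)  # simulated, deliberately inflated
out <- pvalues_from_stat(stat, K)
out$gif                             # estimated inflation of the simulated statistics
```

After rescaling, the median of chi2_stat equals the median of the reference chi-squared distribution by construction. This is what calibrates the bulk of the p-values.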