R/group_feature_methods.R
groupFeatures-eic-correlation.Rd
Group features based on the correlation of their extracted ion chromatograms
(EICs). The correlation is calculated separately for each sample; the
per-sample correlation coefficients are then aggregated across samples (as
their 75% quantile) and the aggregated value is compared against parameter
threshold.
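The aggregation step can be illustrated with a minimal base R sketch. The correlation values, the feature pair names and the variable names are invented for illustration and are not taken from the package; only the 75% quantile rule and the default threshold of 0.9 come from this page. Whether the comparison with threshold is inclusive is an assumption.

```r
## Hypothetical per-sample correlation coefficients for three feature pairs
## (rows) in four samples (columns). All values are invented.
cors <- rbind(
    "FT1-FT2" = c(0.95, 0.97, 0.91, 0.60),
    "FT1-FT3" = c(0.40, 0.55, 0.92, 0.30),
    "FT2-FT3" = c(0.85, 0.93, 0.94, 0.96)
)
threshold <- 0.9

## Aggregate across samples with the 75% quantile, ignoring missing values.
agg <- apply(cors, 1, quantile, probs = 0.75, na.rm = TRUE)

## Feature pairs whose aggregated correlation reaches the threshold
## (assumption: ">=" is used for the comparison).
grouped <- agg >= threshold
```

In this toy example a single poorly correlated sample (the fourth one for FT1-FT2) does not prevent grouping, because the upper quantile is robust against a minority of low values.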
This feature grouping should be called after an initial feature grouping by
retention time (see SimilarRtimeParam()). The feature groups defined in column
"feature_group" of featureDefinitions(object) (for features matching msLevel)
will be used and refined by this method. Features with a value of NA in
featureDefinitions(object)$feature_group are skipped, i.e. not considered for
feature grouping.
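How pre-defined feature groups and NA entries interact can be sketched in base R. The data frame below is a made-up stand-in for featureDefinitions(object), and the use of split() is an illustration of the idea, not the package's internal code.

```r
## Hypothetical stand-in for featureDefinitions(object): four features, one of
## which has no feature group assigned (NA).
fdef <- data.frame(
    feature_group = c("FG.001", "FG.001", NA, "FG.002"),
    row.names = c("FT1", "FT2", "FT3", "FT4")
)

## split() drops elements whose grouping value is NA, so FT3 is not part of
## any group that would be passed on to the refinement step.
groups <- split(rownames(fdef), fdef$feature_group)
```

Each element of groups would then be refined independently, which keeps the number of pairwise correlations small compared to correlating all features against each other.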
Although the method can be run on the full data set without prior feature
grouping, this is not recommended for two reasons: I) the selection of the
top n samples with the highest signal for the feature group will be biased by
very abundant compounds, because it is performed on the full data set (i.e.
the samples with the highest overall intensities are used for the correlation
of all features), and II) it is computationally much more expensive, because a
pairwise correlation between all features has to be calculated.
It is also suggested to perform the correlation on the subset of samples in
which the peaks of a feature have the highest intensities, although the
correlation can also be run on all samples by setting n equal to the total
number of samples in the data set. Ideally, however, EIC correlation should be
performed on samples in which the original compound is highly abundant, to
avoid correlating missing values or noisy peak shapes as much as possible.
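The selection of the n samples with the highest signal for a feature group can be sketched as follows. The intensity matrix, the sample names and the summation across features are assumptions made for illustration; the package may rank samples differently.

```r
## Hypothetical peak intensities of the three features of one feature group
## (rows) in four samples (columns); NA marks a sample without a detected peak.
ints <- rbind(
    FT1 = c(S1 = 1200, S2 = 8500, S3 = NA, S4 = 300),
    FT2 = c(S1 = 900, S2 = 7000, S3 = 150, S4 = NA),
    FT3 = c(S1 = 400, S2 = 3000, S3 = NA, S4 = 100)
)
n <- 2

## Sum the intensities of all features of the group per sample and keep the
## n samples with the highest total signal.
total <- colSums(ints, na.rm = TRUE)
top_samples <- names(sort(total, decreasing = TRUE))[seq_len(n)]
```

Here the EIC correlation would be restricted to samples S2 and S1, in which the features of the group are most abundant.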
By default, signal outside of identified chromatographic peaks is also
excluded from the correlation (parameter clean).
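The effect of excluding signal outside of the identified peak can be illustrated with two invented EICs. The intensities, the in_peak flag and the use of Pearson correlation on pairwise complete observations are assumptions for this sketch, not a description of the package internals.

```r
## Hypothetical EIC intensities of two features along the same retention times.
eic_a <- c(5, 3, 10, 50, 120, 60, 12, 4, 6)
eic_b <- c(9, 1, 22, 98, 245, 118, 20, 7, 2)

## Hypothetical flag marking the data points inside the identified peak.
in_peak <- c(FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE)

## Analogous to clean = TRUE: set signal outside the peak to NA.
clean_a <- ifelse(in_peak, eic_a, NA_real_)
clean_b <- ifelse(in_peak, eic_b, NA_real_)

## Correlation on all data points versus only the data points within the peak.
cor_all <- cor(eic_a, eic_b)
cor_clean <- cor(clean_a, clean_b, use = "pairwise.complete.obs")
```

With the cleaned signal only the five data points within the peak contribute to the coefficient, so baseline noise before and after the peak cannot influence the result.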
EicCorrelationParam(
  threshold = 0.9,
  n = 1,
  clean = TRUE,
  value = c("maxo", "into"),
  inclusive = FALSE
)

# S4 method for XCMSnExp,EicCorrelationParam
groupFeatures(object, param, msLevel = 1L)
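Argument | Description |
---|---|
threshold | Value against which the aggregated (75% quantile) correlation is compared. |
n | Number of samples (those with the highest signal for the feature group) on which the correlation is performed. |
clean | Whether signal outside of identified chromatographic peaks is excluded from the correlation. |
value | |
inclusive | |
object | XCMSnExp object with feature definitions. |
param | EicCorrelationParam object with the settings for the method. |
msLevel | MS level of the features to group. |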
The input XCMSnExp with feature groups added (i.e. in column
"feature_group" of its featureDefinitions data frame).
See feature-grouping for a general overview.
Other feature grouping methods: groupFeatures-abundance-correlation,
groupFeatures-similar-rtime
Johannes Rainer
## Performing a quick preprocessing of a test data set.
library(xcms)
library(faahKO)
fls <- c(system.file('cdf/KO/ko15.CDF', package = "faahKO"),
         system.file('cdf/KO/ko16.CDF', package = "faahKO"),
         system.file('cdf/WT/wt19.CDF', package = "faahKO"))
od <- readMSData(fls, mode = "onDisk")
xod <- findChromPeaks(
    od, param = CentWaveParam(noise = 10000, snthresh = 40,
                              prefilter = c(3, 10000)))

## Correspondence is needed to define the features that are grouped below.
## The sample groups (two KO files, one WT file) are an assumption, since this
## step is missing from the extracted example.
pdp <- PeakDensityParam(sampleGroups = c(1, 1, 2))
xodg <- groupChromPeaks(xod, param = pdp)

## Performing a feature grouping based on EIC correlation on a single
## sample
xodg_grp <- groupFeatures(xodg, param = EicCorrelationParam(n = 1))

table(featureDefinitions(xodg_grp)$feature_group)
#> 
#> FG.001 FG.002 FG.003 FG.004 FG.005 FG.006 FG.007 FG.008 FG.009 FG.010 FG.011 
#>      2      2      3      2      4      2      2      2      2      2      2 
#> FG.012 FG.013 FG.014 FG.015 FG.016 FG.017 FG.018 FG.019 FG.020 FG.021 FG.022 
#>      3      3      3      4      2      3      2      2      2      2      2 
#> FG.023 FG.024 FG.025 FG.026 FG.027 FG.028 FG.029 FG.030 FG.031 FG.032 FG.033 
#>      2      2      2      2      1      1      1      1      1      1      1 
#> FG.034 FG.035 FG.036 FG.037 FG.038 FG.039 FG.040 FG.041 FG.042 FG.043 FG.044 
#>      1      1      1      1      1      1      1      1      1      1      1 
#> FG.045 FG.046 FG.047 FG.048 FG.049 FG.050 FG.051 FG.052 FG.053 FG.054 FG.055 
#>      1      1      1      1      1      1      1      1      1      1      1 
#> FG.056 FG.057 FG.058 FG.059 FG.060 FG.061 FG.062 FG.063 FG.064 FG.065 FG.066 
#>      1      1      1      1      1      1      1      1      1      1      1 
#> FG.067 FG.068 FG.069 FG.070 FG.071 FG.072 FG.073 FG.074 FG.075 FG.076 FG.077 
#>      1      1      1      1      1      1      1      1      1      1      1 
#> FG.078 FG.079 FG.080 FG.081 FG.082 FG.083 FG.084 FG.085 FG.086 FG.087 FG.088 
#>      1      1      1      1      1      1      1      1      1      1      1 
#> FG.089 FG.090 FG.091 FG.092 FG.093 FG.094 FG.095 FG.096 FG.097 FG.098 FG.099 
#>      1      1      1      1      1      1      1      1      1      1      1 
#> FG.100 FG.101 FG.102 FG.103 FG.104 FG.105 FG.106 FG.107 FG.108 FG.109 FG.110 
#>      1      1      1      1      1      1      1      1      1      1      1 
#> FG.111 FG.112 FG.113 FG.114 FG.115 FG.116 FG.117 FG.118 FG.119 FG.120 FG.121 
#>      1      1      1      1      1      1      1      1      1      1      1 

## Usually it is better to perform this correlation on pre-grouped features
## e.g. based on similar retention time.
xodg_grp <- groupFeatures(xodg, param = SimilarRtimeParam(diffRt = 4))
xodg_grp <- groupFeatures(xodg_grp, param = EicCorrelationParam(n = 1))

table(featureDefinitions(xodg_grp)$feature_group)
#> 
#> FG.001.001 FG.001.002 FG.001.003 FG.002.001 FG.002.002 FG.002.003 FG.003.001 
#>          1          1          1          2          1          1          2 
#> FG.004.001 FG.005.001 FG.005.002 FG.006.001 FG.007.001 FG.008.001 FG.009.001 
#>          2          3          1          2          3          2          2 
#> FG.010.001 FG.010.002 FG.011.001 FG.012.001 FG.012.002 FG.013.001 FG.013.002 
#>          4          1          4          2          1          1          1 
#> FG.014.001 FG.015.001 FG.016.001 FG.017.001 FG.017.002 FG.018.001 FG.018.002 
#>          2          2          2          2          1          2          1 
#> FG.019.001 FG.019.002 FG.020.001 FG.020.002 FG.021.001 FG.021.002 FG.022.001 
#>          1          1          1          1          1          1          2 
#> FG.022.002 FG.022.003 FG.023.001 FG.023.002 FG.023.003 FG.024.001 FG.024.002 
#>          1          1          1          1          1          2          1 
#> FG.025.001 FG.025.002 FG.026.001 FG.026.002 FG.027.001 FG.028.001 FG.029.001 
#>          1          1          1          1          2          2          2 
#> FG.030.001 FG.031.001 FG.031.002 FG.032.001 FG.032.002 FG.033.001 FG.033.002 
#>          2          1          1          1          1          1          1 
#> FG.034.001 FG.034.002 FG.035.001 FG.035.002 FG.036.001 FG.036.002 FG.037.001 
#>          1          1          1          1          1          1          2 
#> FG.038.001 FG.038.002 FG.039.001 FG.040.001 FG.040.002 FG.041.001 FG.042.001 
#>          1          1          2          1          1          1          1 
#> FG.043.001 FG.044.001 FG.045.001 FG.046.001 FG.047.001 FG.048.001 FG.049.001 
#>          1          1          1          1          1          1          1 
#> FG.050.001 FG.051.001 FG.052.001 FG.053.001 FG.054.001 FG.055.001 FG.056.001 
#>          1          1          1          1          1          1          1 
#> FG.057.001 FG.058.001 FG.059.001 FG.060.001 FG.061.001 FG.062.001 FG.063.001 
#>          1          1          1          1          1          1          1 
#> FG.064.001 FG.065.001 FG.066.001 FG.067.001 FG.068.001 FG.069.001 FG.070.001 
#>          1          1          1          1          1          1          1 
#> FG.071.001 FG.072.001 FG.073.001 FG.074.001 FG.075.001 FG.076.001 FG.077.001 
#>          1          1          1          1          1          1          1 
#> FG.078.001 FG.079.001 FG.080.001 FG.081.001 FG.082.001 FG.083.001 FG.084.001 
#>          1          1          1          1          1          1          1 
#> FG.085.001 FG.086.001 FG.087.001 FG.088.001 FG.089.001 FG.090.001 FG.091.001 
#>          1          1          1          1          1          1          1 
#> FG.092.001 FG.093.001 FG.094.001 FG.095.001 FG.096.001 FG.097.001 FG.098.001 
#>          1          1          1          1          1          1          1 