R/summarized_experiment.R
chris-SummarizedExperiment.Rd
The SummarizedExperiment class is a container for data from large-scale assays in biological experiments. In contrast to Textual Dataset File (TDF) data, a SummarizedExperiment organizes samples in columns and measurements in rows.
The data, labels and groups methods extract information from such objects in a TDF-compliant format (structure):
columns are variables, rows are individuals (samples).
variable IDs (labels) follow a standard format (e.g. "x0pt001").
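The difference between the two layouts can be illustrated with base R alone. The sketch below is not the package's implementation: the object names (assay_mat, tdf_like) are hypothetical, and the IDs are built without zero-padding as an assumption. It transposes an assay-style matrix (rows are variables, columns are samples) into a data.frame with one row per sample and one column per variable.

```r
## Assay-style layout: 3 variables (rows) measured in 4 samples (columns).
assay_mat <- matrix(1:12, nrow = 3, ncol = 4)

## TDF-style layout: transpose so that rows are samples and columns variables.
tdf_like <- as.data.frame(t(assay_mat))

## Assign illustrative variable IDs: a prefix followed by the row index of
## the variable in the original matrix (assumed scheme, no zero-padding).
colnames(tdf_like) <- paste0("x0xx", seq_len(nrow(assay_mat)))

## Add a sample identifier column (here simply the column index).
tdf_like <- cbind(aid = seq_len(ncol(assay_mat)), tdf_like)
tdf_like
```

Each row of tdf_like now holds all measurements of one sample, which is the orientation described for the TDF-compliant format.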
The available methods are:
data: exports the (quantitative) assay data from a SummarizedExperiment as a data.frame with columns representing variables and rows representing samples (study participants). The assayNames. parameter specifies which assays of the SummarizedExperiment should be extracted. Each variable (row) in the SummarizedExperiment is assigned a new variable ID (called label), which consists of the labelPrefix followed by an integer giving the index of the variable in the SummarizedExperiment (i.e. its row number). A letter is appended to the IDs of variables from assays other than the first one. Thus, "x0xx001" corresponds to the first row in the first assay, while "x0xx001a" represents the first row in the second assay. The colnames of the SummarizedExperiment are used as sample identifiers and are returned in column "aid" of the result data.frame.
groups: retrieves a data.frame that specifies the grouping of the variables returned by data from a SummarizedExperiment. Columns (variables) containing data from the same assay of the SummarizedExperiment are placed in the same group. In the example output, variables derived from the same row are additionally grouped per analyte across assays.
labels: extracts label annotations for the data extracted with data from a SummarizedExperiment. The returned data.frame is in the labels format of the TDF but contains additional columns with the available annotations from the SummarizedExperiment's rowData(). The rownames of the SummarizedExperiment are returned in column "description".
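The ID scheme described for data, and the per-assay grouping described for groups, can be sketched in base R. make_labels below is a hypothetical helper, not a function of the package, and the assay names "values" and "double" are illustrative. The sketch assumes IDs without zero-padding, which matches the example output of this page rather than the "x0xx001" form quoted above, and it supports at most 27 assays because it draws the suffixes from letters.

```r
## Hypothetical helper, not part of the package: build variable IDs for
## `n_rows` variables in each of `n_assays` assays.
make_labels <- function(n_rows, n_assays, labelPrefix = "x0xx") {
    ## No suffix for the first assay, then "a", "b", ... for later assays.
    suffixes <- c("", letters)[seq_len(n_assays)]
    unlist(lapply(suffixes, function(s) {
        paste0(labelPrefix, seq_len(n_rows), s)
    }))
}

lbls <- make_labels(n_rows = 6, n_assays = 2)
lbls

## Group the labels by assay: one group per assay, named after the assay.
grp <- data.frame(
    group = rep(paste0("assay_", c("values", "double")), each = 6),
    label = lbls)
head(grp)
```

With two assays of six rows each, the first six IDs carry no suffix and the next six end in "a", so a label alone identifies both the row and the assay a variable came from.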
a SummarizedExperiment object.
character defining the names of the assays in object from which data should be extracted.
character(1) defining the prefix for the variable IDs (labels) of the object.
Further arguments passed to the downstream groups method.
A SummarizedExperiment object.
a data.frame with the data.
## Create a simple SummarizedExperiment with some random data as one assay
## and a second assay with all values multiplied by 2. In a
## SummarizedExperiment, columns represent samples and rows measurements
## (variables).
mat <- matrix(rnorm(60), ncol = 10, nrow = 6)
## A SummarizedExperiment can also store column and row annotations along
## with the data. We thus define below a data.frame with some annotations
## for the variables.
rowd <- data.frame(analyte_id = paste0("id", 1:6), analyte_name = letters[1:6])
rownames(rowd) <- rowd$analyte_id
library(SummarizedExperiment)
## Pass both matrices as named assays, with the row annotations as rowData.
se <- SummarizedExperiment(
    assays = list(values = mat, double = 2 * mat),
    rowData = rowd)
se
#> class: SummarizedExperiment
#> dim: 6 10
#> metadata(0):
#> assays(2): values double
#> rownames(6): id1 id2 ... id5 id6
#> rowData names(2): analyte_id analyte_name
#> colnames: NULL
#> colData names(0):
## What assays are available?
assayNames(se)
#> [1] "values" "double"
## Get a data.frame with all variables
data(se)
#> aid x0xx1 x0xx2 x0xx3 x0xx4 x0xx5 x0xx6
#> 1 1 -0.8965242 0.90966433 -0.1288219 -0.013031450 0.3597483 1.1864404
#> 2 2 2.2266181 -0.24722918 -0.5157435 -0.816109299 1.4736489 0.1180701
#> 3 3 0.1875334 -0.12046762 -0.6131071 1.118765882 -1.6989048 0.3422766
#> 4 4 0.3422617 -0.92282700 -0.7719974 0.193640410 1.0732061 0.6174077
#> 5 5 0.1515003 -0.71151095 -0.1881688 -0.774229785 -0.8924119 0.7953266
#> 6 6 0.3369002 -0.76522830 -1.2260711 -0.009805471 -0.4122893 1.0963385
#> 7 7 0.9444024 0.14786264 1.2996096 1.306019228 -0.1200253 0.5310052
#> 8 8 0.9207570 -1.22420575 1.3915198 -0.163163564 -0.7596888 -0.3985823
#> 9 9 -1.1429766 -0.87863854 1.4535518 -0.710723674 0.5847751 -0.8690617
#> 10 10 0.3760858 -0.04179954 1.0281860 -1.614637827 0.4474193 -1.2262790
#> x0xx1a x0xx2a x0xx3a x0xx4a x0xx5a x0xx6a
#> 1 -1.7930484 1.81932866 -0.2576439 -0.02606290 0.7194965 2.3728809
#> 2 4.4532362 -0.49445836 -1.0314870 -1.63221860 2.9472977 0.2361401
#> 3 0.3750669 -0.24093523 -1.2262142 2.23753176 -3.3978095 0.6845531
#> 4 0.6845233 -1.84565400 -1.5439949 0.38728082 2.1464122 1.2348153
#> 5 0.3030007 -1.42302190 -0.3763377 -1.54845957 -1.7848239 1.5906533
#> 6 0.6738003 -1.53045660 -2.4521422 -0.01961094 -0.8245786 2.1926771
#> 7 1.8888048 0.29572529 2.5992192 2.61203846 -0.2400506 1.0620104
#> 8 1.8415140 -2.44841150 2.7830395 -0.32632713 -1.5193776 -0.7971645
#> 9 -2.2859533 -1.75727709 2.9071036 -1.42144735 1.1695503 -1.7381233
#> 10 0.7521716 -0.08359908 2.0563720 -3.22927565 0.8948385 -2.4525579
## Get the label information
labels(se)
#> label unit type min max missing description
#> x0xx1 x0xx1 float -1.142977 2.2266181 -89 id1
#> x0xx2 x0xx2 float -1.224206 0.9096643 -89 id2
#> x0xx3 x0xx3 float -1.226071 1.4535518 -89 id3
#> x0xx4 x0xx4 float -1.614638 1.3060192 -89 id4
#> x0xx5 x0xx5 float -1.698905 1.4736489 -89 id5
#> x0xx6 x0xx6 float -1.226279 1.1864404 -89 id6
#> x0xx1a x0xx1a float -2.285953 4.4532362 -89 id1 assay double
#> x0xx2a x0xx2a float -2.448412 1.8193287 -89 id2 assay double
#> x0xx3a x0xx3a float -2.452142 2.9071036 -89 id3 assay double
#> x0xx4a x0xx4a float -3.229276 2.6120385 -89 id4 assay double
#> x0xx5a x0xx5a float -3.397810 2.9472977 -89 id5 assay double
#> x0xx6a x0xx6a float -2.452558 2.3728809 -89 id6 assay double
#> analyte_id analyte_name
#> x0xx1 id1 a
#> x0xx2 id2 b
#> x0xx3 id3 c
#> x0xx4 id4 d
#> x0xx5 id5 e
#> x0xx6 id6 f
#> x0xx1a id1 a
#> x0xx2a id2 b
#> x0xx3a id3 c
#> x0xx4a id4 d
#> x0xx5a id5 e
#> x0xx6a id6 f
## Get the variable grouping
groups(se)
#> group label
#> 1 assay_values x0xx1
#> 2 assay_values x0xx2
#> 3 assay_values x0xx3
#> 4 assay_values x0xx4
#> 5 assay_values x0xx5
#> 6 assay_values x0xx6
#> 7 assay_double x0xx1a
#> 8 assay_double x0xx2a
#> 9 assay_double x0xx3a
#> 10 assay_double x0xx4a
#> 11 assay_double x0xx5a
#> 12 assay_double x0xx6a
#> 13 analyte_x0xx1 x0xx1
#> 14 analyte_x0xx1 x0xx1a
#> 15 analyte_x0xx2 x0xx2
#> 16 analyte_x0xx2 x0xx2a
#> 17 analyte_x0xx3 x0xx3
#> 18 analyte_x0xx3 x0xx3a
#> 19 analyte_x0xx4 x0xx4
#> 20 analyte_x0xx4 x0xx4a
#> 21 analyte_x0xx5 x0xx5
#> 22 analyte_x0xx5 x0xx5a
#> 23 analyte_x0xx6 x0xx6
#> 24 analyte_x0xx6 x0xx6a