A Data Module provides the data of one specific module, which can be the interview, clinical blood parameters or the metabolomics or proteomics data sets. The actual data from a module is stored in the Textual Dataset Format (TDF - see TDF for more details).
The tidyfr package represents a data module with the DataModule object, which provides all necessary functionality to import data of a module and to format it properly for R.
data(..., aidAsRownames = TRUE)
data_module(name = character(), version = character(), path = data_path())
grp_labels(object)
# S4 method for DataModule
labels(object)
groups(x, ...)
# S3 method for DataModule
groups(x, ...)
moduleName(object)
modulePath(object, base = FALSE)
moduleVersion(object)
moduleDescription(object)
moduleDate(object)
...: further arguments passed to the downstream groups method.
aidAsRownames: optional parameter for data; if TRUE (the default), the AIDs provided by the data module are used as row names of the returned data.frame (unless they are not unique).
name: for data_module, character(1) defining the name of the module to load.
version: for data_module, character(1) defining the version of the module to load.
path: for data_module, character(1) defining the path where data modules are stored.
object: a DataModule object.
x: a DataModule object.
base: for modulePath, logical(1) defining whether the base folder or the actual data folder should be returned. The base folder (returned with base = TRUE) is the folder of the module, which may contain multiple versions of it. The data folder (returned with base = FALSE, the default) is the actual folder containing the data for the selected version of the module.
See the individual function description.
Available data modules in a certain path can be listed using the list_data_modules() function.
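Since a path can hold several versions of the same module, it can be handy to pick the newest version of each module from such a listing. The following is a minimal sketch in base R, not part of tidyfr: the data.frame is rebuilt by hand from the name and version columns that list_data_modules() reports for the package's test data, and latest_version is a hypothetical helper introduced only for illustration.

```r
## Name and version columns as reported for the tidyfr test data,
## rebuilt by hand so that the sketch runs without tidyfr installed.
mods <- data.frame(
    name = c("db_example1", "db_example2", "db_example2"),
    version = c("1.0.0", "1.0.0", "1.0.1"),
    stringsAsFactors = FALSE
)

## Hypothetical helper (not part of tidyfr): newest version per module.
## numeric_version() compares versions component-wise, not as strings.
latest_version <- function(x) {
    vapply(split(x$version, x$name), function(v) {
        as.character(max(numeric_version(v)))
    }, character(1))
}

latest <- latest_version(mods)
latest
```

The resulting named character vector could then be used to fill the name and version parameters of data_module.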
data_module: loads a specific data module. The name and version of the data module to load need to be specified with the parameters name and version, respectively. The parameter path can be used to set the base path where the data module can be found. The function returns an instance of DataModule.
data: returns the data of a module as a data.frame. By default (with parameter aidAsRownames = TRUE), the AIDs provided by the data module are used as row names of the returned data.frame. For aidAsRownames = FALSE, or if the AIDs provided by the module are not unique, a column "aid" is added (as first column) to the returned data.frame.
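The row name rule described above can be sketched in base R as follows. This is only an illustration of the documented behaviour, not the code used by tidyfr; attach_aids is a hypothetical helper and the input values are toy data.

```r
## Hypothetical helper (not part of tidyfr) mimicking the documented rule:
## unique AIDs become row names, otherwise an "aid" column is added first.
attach_aids <- function(x, aids, aidAsRownames = TRUE) {
    if (aidAsRownames && anyDuplicated(aids) == 0L) {
        rownames(x) <- aids
        x
    } else {
        data.frame(aid = aids, x, stringsAsFactors = FALSE)
    }
}

vals <- data.frame(x0_ager = c(20L, 54L))

## Unique AIDs: used as row names
a <- attach_aids(vals, c("0010100001", "0010200002"))
rownames(a)

## Duplicated AIDs: reported in a first column "aid" instead
b <- attach_aids(vals, c("0010100001", "0010100001"))
colnames(b)
```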
Columns (variables) in the returned data.frame are correctly formatted (i.e. as factors, integers, numeric, character or date/time formats) according to the labels information of the data module. Use the labels function to retrieve variable information (annotation) from the data module.
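To illustrate what formatting according to the labels information can look like, the sketch below converts raw character columns based on a small label table with type and missing columns. It is written in base R under several assumptions: format_column is a hypothetical helper, the input is toy data, the handling of the missing value codes is a guess at a sensible rule, and categorical variables are turned into plain factors without any level mapping. The actual conversion performed by tidyfr may differ.

```r
## Hypothetical helper (not part of tidyfr): convert one raw character
## column according to its 'type' and 'missing' entries.
format_column <- function(x, type, missing) {
    x[x %in% missing] <- NA            # assumed: missing codes become NA
    switch(type,
           float = as.numeric(x),
           integer = as.integer(x),
           categorical = factor(x),    # simplification: no level mapping
           date = as.POSIXct(x, tz = "UTC", format = "%Y-%m-%d"),
           x)                          # anything else stays character
}

## Toy input: all values stored as text
raw <- data.frame(x0_age = c("19.61465", "-89"),
                  x0_ager = c("20", "54"),
                  x0_examd = c("2016-01-02", "0000-00-00"),
                  stringsAsFactors = FALSE)
lbl <- data.frame(label = c("x0_age", "x0_ager", "x0_examd"),
                  type = c("float", "integer", "date"),
                  missing = c("-89", "-89", "0000-00-00"),
                  stringsAsFactors = FALSE)

res <- raw
for (i in seq_len(nrow(lbl))) {
    v <- lbl$label[i]
    res[[v]] <- format_column(raw[[v]], lbl$type[i], lbl$missing[i])
}

## First class of each converted column
vapply(res, function(z) class(z)[1], character(1))
```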
groups: returns a data.frame with the optional grouping of variables. The group descriptions are provided by the grp_labels function.
grp_labels: returns a data.frame with a description for each defined variable group.
labels: returns a data.frame with the description and annotation of the individual variables (labels).
moduleName: returns the name of a module.
modulePath: returns the (full) file path to the data module.
moduleVersion: returns the version of the data module.
moduleDescription: returns the description of the module.
moduleDate: returns the date of the module.
Data maintainers can use the functions listed here to manage existing data resources. See also export_tdf() for information on how to create new data modules in TDF format.
remove_participants(): creates a new version of the current module by removing participant data for individuals with the specified AIDs.
## List available test data modules provided by the tidyfr package
pth <- system.file("txt", package = "tidyfr")
list_data_modules(pth)
#> name version
#> 1 db_example1 1.0.0
#> 2 db_example2 1.0.0
#> 3 db_example2 1.0.1
#> description
#> 1 CHRIS baseline dataset: General information (test version)
#> 2 CHRIS baseline dataset: General information (test version)
#> 3 CHRIS baseline dataset: General information (test version)
## Load one data module
mdl <- data_module(name = "db_example2", version = "1.0.1", path = pth)
mdl
#> Object of class DataModule
#> o name: CHRIS baseline
#> o version: 1.0.1
#> o description: CHRIS baseline dataset: General information (test version)
#> o date: 2021-07-01
## Get the name, description and version of the module
moduleName(mdl)
#> [1] "CHRIS baseline"
moduleDescription(mdl)
#> [1] "CHRIS baseline dataset: General information (test version)"
moduleVersion(mdl)
#> [1] "1.0.1"
## Get the data from the module
d <- data(mdl)
d
#> x0_sex x0_age x0_ager x0_birthd x0_birthpc x0_residpc
#> 0010100001 Female 19.61465 20 1983-01-04 Vinschgau district MALS
#> 0010200002 Male 54.44558 54 1956-03-08 Vinschgau district LATSCH
#> x0_examd x0_workf x0_note x0_notesaliva x0_noteint x0_noteself
#> 0010100001 2016-01-02 G something something else <NA> <NA>
#> 0010200002 2012-02-01 E <NA> <NA> <NA> <NA>
#> x0_notespiro x0_birthm x0_birthy x0_examm x0_examy
#> 0010100001 <NA> 2 1983 2 2016
#> 0010200002 <NA> 3 1956 1 2012
## Variables are correctly formatted:
## categorical variables (factors):
d$x0_sex
#> [1] Female Male
#> Levels: Male Female
## Dates:
d$x0_examd
#> [1] "2016-01-02 UTC" "2012-02-01 UTC"
## Numeric:
d$x0_age
#> [1] 19.61465 54.44558
## Get information on all variables
labels(mdl)
#> label unit type min max missing
#> x0_age x0_age year float 0 100 -89
#> x0_ager x0_ager year integer 0 100 -89
#> x0_sex x0_sex categorical NA NA -89
#> x0_birthpc x0_birthpc categorical NA NA -89
#> x0_residpc x0_residpc categorical NA NA -89
#> x0_examd x0_examd date date NA NA 0000-00-00
#> x0_workf x0_workf character NA NA missing
#> x0_birthd x0_birthd date NA NA 0000-00-00
#> x0_note x0_note character NA NA missing
#> x0_notesaliva x0_notesaliva character NA NA missing
#> x0_noteint x0_noteint character NA NA missing
#> x0_noteself x0_noteself character NA NA missing
#> x0_notespiro x0_notespiro character NA NA missing
#> x0_birthm x0_birthm month integer 1 12 -89
#> x0_birthy x0_birthy year integer 1900 2021 -89
#> x0_examm x0_examm month integer 1 12 -89
#> x0_examy x0_examy year integer 2000 2021 -89
#> description
#> x0_age Age at examination (years)
#> x0_ager Age at examination (years)
#> x0_sex Sex
#> x0_birthpc Birthplace (offical) - coded
#> x0_residpc Place of recidence (official)
#> x0_examd Date of examination
#> x0_workf Workflow
#> x0_birthd Date of birth
#> x0_note Participation notes
#> x0_notesaliva Notes saliva collection
#> x0_noteint Notes interview
#> x0_noteself Notes self-admin
#> x0_notespiro Notes spiralography
#> x0_birthm Month of birth
#> x0_birthy Year of birth
#> x0_examm Month of examination
#> x0_examy Year of examination
## Get information of variable grouping
groups(mdl)
#> group label
#> 1 person x0_sex
#> 2 person x0_age
#> 3 person x0_ager
#> 4 person x0_birthd
#> 5 person x0_birthm
#> 6 person x0_birthy
#> 7 person x0_birthpc
#> 8 person x0_residpc
#> 9 age x0_age
#> 10 age x0_ager
#> 11 birthdate x0_birthd
#> 12 birthdate x0_birthm
#> 13 birthdate x0_birthy
#> 14 participation x0_examd
#> 15 participation x0_examm
#> 16 participation x0_examy
#> 17 participation x0_workf
#> 18 participation x0_note
#> 19 participation x0_notesaliva
#> 20 participation x0_noteint
#> 21 participation x0_noteself
#> 22 participation x0_notespiro
#> 23 examdate x0_examd
#> 24 examdate x0_examm
#> 25 examdate x0_examy
#> 26 notes x0_note
#> 27 notes x0_notesaliva
#> 28 notes x0_noteint
#> 29 notes x0_noteself
#> 30 notes x0_notespiro
## Get the corresponding group description
grp_labels(mdl)
#> group description
#> person person Personal data
#> participation participation Participation-related data
#> age age Age
#> birthdate birthdate Birth date
#> examdate examdate Examination date
#> notes notes Notes
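The group and label columns returned by groups can be combined with the descriptions from grp_labels, or used to look up the variables of one group. The following base R sketch shows both. It uses a hand-built subset of the output shown above so that it runs without tidyfr; with a loaded module, the two data frames would come from groups(mdl) and grp_labels(mdl) instead.

```r
## Subset of the groups(mdl) and grp_labels(mdl) output shown above,
## rebuilt by hand for illustration.
grp <- data.frame(
    group = c("age", "age", "examdate", "examdate", "examdate"),
    label = c("x0_age", "x0_ager", "x0_examd", "x0_examm", "x0_examy"),
    stringsAsFactors = FALSE
)
grp_desc <- data.frame(
    group = c("age", "examdate"),
    description = c("Age", "Examination date"),
    stringsAsFactors = FALSE
)

## Names of the variables belonging to one group
exam_vars <- grp$label[grp$group == "examdate"]
exam_vars

## Attach the group description to each variable
annotated <- merge(grp, grp_desc, by = "group")
annotated
```

With the data.frame d from the example above, d[, exam_vars] would then restrict the data to the variables of that group.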