Data Modules

A Data Module provides the data of one specific module, such as the interview, clinical blood parameters, or the metabolomics or proteomics data sets. The actual data of a module is stored in the Textual Dataset Format (TDF; see TDF for more details).

The tidyfr package represents a data module with a DataModule object, which provides all functionality needed to import the data of a module and to format it properly for R.

data(..., aidAsRownames = TRUE)

data_module(name = character(), version = character(), path = data_path())

grp_labels(object)

# S4 method for DataModule
labels(object)

groups(x, ...)

# S3 method for DataModule
groups(x, ...)

moduleName(object)

modulePath(object, base = FALSE)

moduleVersion(object)

moduleDescription(object)

moduleDate(object)

Arguments

...: Further arguments passed to downstream groups method
aidAsRownames: optional parameter for data: if TRUE (the default) the AIDs provided by the data module are used as row names of the returned data.frame (unless they are not unique).
name: For data_module: character(1) defining the name of the module to load.
version: For data_module: character(1) defining the version of the module to load.
path: For data_module: character(1) defining the path where data modules are stored.
object: A DataModule object.
x: A DataModule object.
base: For modulePath: logical(1) whether the base folder or the actual data folder should be returned. The base folder (returned with base = TRUE) is the folder of the module, which may contain multiple versions of it. The data folder (returned with base = FALSE, the default) is the folder containing the data for the selected version of the module.

Value

See the individual function descriptions.

Loading a module

Available data modules in a certain path can be listed using the list_data_modules() function.

data_module: loads a specific data module. The name and version of the data module to load need to be specified with the parameters name and version, respectively. Parameter path can be used to set the base path where the data module can be found. The function returns an instance of DataModule.
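Because data_module requires an explicit version, it can help to pick the most recent one from the module listing first. The following is a minimal sketch, not part of tidyfr: latest_version is a hypothetical helper, and it assumes that list_data_modules() returns a data.frame with columns name and version, as its printed output in the Examples suggests. The tidyfr calls are guarded so the block also runs without the package installed.

```r
## Hypothetical helper (NOT part of tidyfr): return the highest version
## available for a module, given a listing with columns "name" and "version".
latest_version <- function(modules, name) {
    vers <- as.character(modules$version[modules$name == name])
    if (!length(vers))
        stop("No data module named '", name, "' found")
    ## package_version() compares versions component-wise, so "1.0.10" ranks
    ## above "1.0.9" (plain string comparison would get this wrong).
    as.character(max(package_version(vers)))
}

## Listing that mirrors the list_data_modules() output shown in the Examples
mods <- data.frame(
    name = c("db_example1", "db_example2", "db_example2"),
    version = c("1.0.0", "1.0.0", "1.0.1"))
latest_version(mods, "db_example2")

## With tidyfr installed, the same helper can drive data_module()
if (requireNamespace("tidyfr", quietly = TRUE)) {
    pth <- system.file("txt", package = "tidyfr")
    mods <- tidyfr::list_data_modules(pth)
    mdl <- tidyfr::data_module(name = "db_example2",
                               version = latest_version(mods, "db_example2"),
                               path = pth)
}
```

Comparing versions with package_version rather than as strings is a design choice of this sketch; it assumes module versions follow the dotted numeric scheme seen in the test data.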

Accessing properties and data from a module

data: returns the data of a module as a data.frame. By default (aidAsRownames = TRUE), the AIDs provided by the data module are used as row names of the returned data.frame. With aidAsRownames = FALSE, or if the AIDs provided by the module are not unique, a column "aid" is instead added as the first column of the returned data.frame. Columns (variables) in the returned data.frame are correctly formatted (i.e. as factors, integers, numeric, character or date/time formats) according to the labels information of the data module. Use the labels function to retrieve variable information (annotation) from the data module.
groups: returns a data.frame with the optional grouping of variables. The group descriptions are provided by the grp_labels function.
grp_labels: returns a data.frame with a description for each defined variable group.
labels: returns a data.frame with the description and annotation of the individual variables (labels).
moduleName: returns the name of a module.
modulePath: returns the (full) file path to the data module.
moduleVersion: returns the version of the data module.
moduleDescription: returns the description of the module.
moduleDate: returns the date of the module.
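The accessors above can be combined, for example to restrict the data to the variables of one group. The sketch below is only an illustration under stated assumptions: select_group is a hypothetical helper (not part of tidyfr), and it assumes that groups returns a data.frame with columns group and label, as in the printed output in the Examples. The toy inputs copy a few values from that output, and the tidyfr calls are guarded so the block runs without the package installed.

```r
## Hypothetical helper (NOT part of tidyfr): subset a data.frame to the
## variables of one group, given a table with columns "group" and "label".
select_group <- function(d, grp, group) {
    vars <- as.character(grp$label[grp$group == group])
    if (!length(vars))
        stop("No variables defined for group '", group, "'")
    ## Keep only variables present in the data; drop = FALSE keeps a
    ## data.frame even if a single column is selected.
    d[, intersect(vars, colnames(d)), drop = FALSE]
}

## Toy inputs that mirror part of the data(mdl) and groups(mdl) output
d <- data.frame(
    x0_sex = c("Female", "Male"),
    x0_age = c(19.61465, 54.44558),
    x0_ager = c(20L, 54L),
    row.names = c("0010100001", "0010200002"))
grp <- data.frame(
    group = c("person", "person", "person", "age", "age"),
    label = c("x0_sex", "x0_age", "x0_ager", "x0_age", "x0_ager"))
select_group(d, grp, "age")

## With tidyfr installed: extract the "notes" variables of a test module
if (requireNamespace("tidyfr", quietly = TRUE)) {
    mdl <- tidyfr::data_module(name = "db_example2", version = "1.0.1",
                               path = system.file("txt", package = "tidyfr"))
    notes <- select_group(tidyfr::data(mdl), tidyfr::groups(mdl), "notes")
}
```

Row names (the AIDs) are preserved by the subsetting, so the result can still be matched back to the full data.frame.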

Managing data modules

Data maintainers can use the functions listed here to manage existing data resources. See also export_tdf() for information on how to create new data modules in TDF format.

remove_participants(): create a new version of the current module by removing participant data for individuals with the specified aids.

Author

Johannes Rainer

Examples


## List available test data modules provided by the tidyfr package
pth <- system.file("txt", package = "tidyfr")
list_data_modules(pth)
#>          name version
#> 1 db_example1   1.0.0
#> 2 db_example2   1.0.0
#> 3 db_example2   1.0.1
#>                                                  description
#> 1 CHRIS baseline dataset: General information (test version)
#> 2 CHRIS baseline dataset: General information (test version)
#> 3 CHRIS baseline dataset: General information (test version)

## Load one data module
mdl <- data_module(name = "db_example2", version = "1.0.1", path = pth)
mdl
#> Object of class DataModule 
#>  o name:	CHRIS baseline
#>  o version:	1.0.1
#>  o description:	CHRIS baseline dataset: General information (test version)
#>  o date:	2021-07-01

## Get the name, description and version of the module
moduleName(mdl)
#> [1] "CHRIS baseline"
moduleDescription(mdl)
#> [1] "CHRIS baseline dataset: General information (test version)"
moduleVersion(mdl)
#> [1] "1.0.1"

## Get the data from the module
d <- data(mdl)
d
#>            x0_sex   x0_age x0_ager  x0_birthd         x0_birthpc x0_residpc
#> 0010100001 Female 19.61465      20 1983-01-04 Vinschgau district       MALS
#> 0010200002   Male 54.44558      54 1956-03-08 Vinschgau district     LATSCH
#>              x0_examd x0_workf   x0_note  x0_notesaliva x0_noteint x0_noteself
#> 0010100001 2016-01-02        G something something else       <NA>        <NA>
#> 0010200002 2012-02-01        E      <NA>           <NA>       <NA>        <NA>
#>            x0_notespiro x0_birthm x0_birthy x0_examm x0_examy
#> 0010100001         <NA>         2      1983        2     2016
#> 0010200002         <NA>         3      1956        1     2012

## Variables are correctly formatted:
## categorical variables (factors):
d$x0_sex
#> [1] Female Male  
#> Levels: Male Female

## Dates:
d$x0_examd
#> [1] "2016-01-02 UTC" "2012-02-01 UTC"

## Numeric:
d$x0_age
#> [1] 19.61465 54.44558

## Get information on all variables
labels(mdl)
#>                       label  unit        type  min  max    missing
#> x0_age               x0_age  year       float    0  100        -89
#> x0_ager             x0_ager  year     integer    0  100        -89
#> x0_sex               x0_sex       categorical   NA   NA        -89
#> x0_birthpc       x0_birthpc       categorical   NA   NA        -89
#> x0_residpc       x0_residpc       categorical   NA   NA        -89
#> x0_examd           x0_examd  date        date   NA   NA 0000-00-00
#> x0_workf           x0_workf         character   NA   NA    missing
#> x0_birthd         x0_birthd              date   NA   NA 0000-00-00
#> x0_note             x0_note         character   NA   NA    missing
#> x0_notesaliva x0_notesaliva         character   NA   NA    missing
#> x0_noteint       x0_noteint         character   NA   NA    missing
#> x0_noteself     x0_noteself         character   NA   NA    missing
#> x0_notespiro   x0_notespiro         character   NA   NA    missing
#> x0_birthm         x0_birthm month     integer    1   12        -89
#> x0_birthy         x0_birthy  year     integer 1900 2021        -89
#> x0_examm           x0_examm month     integer    1   12        -89
#> x0_examy           x0_examy  year     integer 2000 2021        -89
#>                                 description
#> x0_age           Age at examination (years)
#> x0_ager          Age at examination (years)
#> x0_sex                                  Sex
#> x0_birthpc     Birthplace (offical) - coded
#> x0_residpc    Place of recidence (official)
#> x0_examd                Date of examination
#> x0_workf                           Workflow
#> x0_birthd                     Date of birth
#> x0_note                 Participation notes
#> x0_notesaliva       Notes saliva collection
#> x0_noteint                  Notes interview
#> x0_noteself                Notes self-admin
#> x0_notespiro            Notes spiralography
#> x0_birthm                    Month of birth
#> x0_birthy                     Year of birth
#> x0_examm               Month of examination
#> x0_examy                Year of examination

## Get information of variable grouping
groups(mdl)
#>            group         label
#> 1         person        x0_sex
#> 2         person        x0_age
#> 3         person       x0_ager
#> 4         person     x0_birthd
#> 5         person     x0_birthm
#> 6         person     x0_birthy
#> 7         person    x0_birthpc
#> 8         person    x0_residpc
#> 9            age        x0_age
#> 10           age       x0_ager
#> 11     birthdate     x0_birthd
#> 12     birthdate     x0_birthm
#> 13     birthdate     x0_birthy
#> 14 participation      x0_examd
#> 15 participation      x0_examm
#> 16 participation      x0_examy
#> 17 participation      x0_workf
#> 18 participation       x0_note
#> 19 participation x0_notesaliva
#> 20 participation    x0_noteint
#> 21 participation   x0_noteself
#> 22 participation  x0_notespiro
#> 23      examdate      x0_examd
#> 24      examdate      x0_examm
#> 25      examdate      x0_examy
#> 26         notes       x0_note
#> 27         notes x0_notesaliva
#> 28         notes    x0_noteint
#> 29         notes   x0_noteself
#> 30         notes  x0_notespiro

## Get the corresponding group description
grp_labels(mdl)
#>                       group                description
#> person               person              Personal data
#> participation participation Participation-related data
#> age                     age                        Age
#> birthdate         birthdate                 Birth date
#> examdate           examdate           Examination date
#> notes                 notes                      Notes