The tidyfr package provides convenience and utility functions to import data stored in the new Textual Dataset Format (TDF) into R. These functions ensure that the data is properly formatted, including the correct encoding of categorical variables (factors) and of missing values.

TDF data consists of the following elements:

  • data: contains the values of the individual variables. Columns are variables, rows are study participants. Column "aid" contains the ID of the study participants.

  • labels: provides information on the variables in data. Columns are "label" (the name of the column in data), "unit" (unit of the measured value), "type" (the data type), "min" (the minimum value), "max" (the maximum value), "missing" (the value with which missing values in data are encoded) and "description" (a name/description of the variable). Optional additional columns (annotations) might be available depending on the data module.

  • groups: provides optional grouping of variables in data.

  • grp_labels: contains descriptions for the groups.

See the official TDF definition for a complete description of the format.
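
The role of labels in formatting data can be illustrated with a minimal base R sketch. It is not the package's implementation: the helper format_tdf, the variable names, the missing-value code "-89" and the type names "integer" and "categorical" are assumptions made for illustration only.

```r
# Hypothetical "labels" table; column names follow the description above,
# the content (variable names, type names, missing code) is made up.
labels <- data.frame(
  label = c("aid", "x0_age", "x0_sex"),
  unit = c(NA, "years", NA),
  type = c("character", "integer", "categorical"),
  min = c(NA, 18, NA),
  max = c(NA, 100, NA),
  missing = c(NA, "-89", "-89"),
  description = c("Participant ID", "Age", "Sex"),
  stringsAsFactors = FALSE
)

# Hypothetical raw "data" table, kept as text so nothing is converted early.
raw <- data.frame(
  aid = c("0010001", "0010002", "0010003"),
  x0_age = c("34", "-89", "58"),
  x0_sex = c("Female", "Male", "-89"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: recode missing values and convert each column
# according to the "missing" and "type" entries of its row in labels.
format_tdf <- function(data, labels) {
  for (i in seq_len(nrow(labels))) {
    col <- labels$label[i]
    if (!col %in% colnames(data)) next
    values <- data[[col]]
    miss <- labels$missing[i]
    if (!is.na(miss))
      values[values %in% miss] <- NA   # replace the missing-value code
    data[[col]] <- switch(
      labels$type[i],
      integer = as.integer(values),
      categorical = factor(values),    # categorical variables become factors
      values                           # any other type is left unchanged
    )
  }
  data
}

formatted <- format_tdf(raw, labels)
str(formatted)
```

After the call, x0_age is an integer column and x0_sex a factor, and the entries encoded as "-89" are NA in both.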

Accessing Data

The following functions can be used to import CHRIS data from one of the available data modules.

  • data: retrieves the data.

  • groups: retrieves the grouping information for variables.

  • labels: retrieves additional annotation/information for specific variables of a module.

Which data is retrieved depends on the parameter object. If object is a SummarizedExperiment(), see chris-SummarizedExperiment.
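
As a rough sketch of how grouping information can be combined with the data, the base R example below uses plain data frames as stand-ins for the retrieved tables. The column names "group" and "label" in the grouping table, as well as all variable and group names, are assumptions and are not taken from the TDF definition.

```r
# Hypothetical stand-ins for the tables; all content is made up.
dat <- data.frame(
  aid = c("0010001", "0010002"),
  x0_age = c(34L, 58L),
  x0_sex = factor(c("Female", "Male")),
  x0_hgb = c(13.9, 15.2)
)
groups <- data.frame(
  group = c("demographics", "demographics", "blood"),
  label = c("x0_age", "x0_sex", "x0_hgb"),
  stringsAsFactors = FALSE
)
grp_labels <- data.frame(
  group = c("demographics", "blood"),
  description = c("Demographic information", "Blood parameters"),
  stringsAsFactors = FALSE
)

# Variables that belong to one group.
vars <- groups$label[groups$group == "demographics"]

# Restrict the data to the participant ID and the variables of that group.
demo <- dat[, c("aid", vars), drop = FALSE]

# Look up the description of the group.
grp_desc <- grp_labels$description[match("demographics", grp_labels$group)]
```

Keeping the grouping in a separate table means that variables can be selected by group without encoding the group in the column names of data.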

Author

Johannes Rainer