The export_tdf function exports the provided data in the TDF format. The
function first creates all required folders, checks the input files and then
exports the data in the TDF format (see below for more information on this
format).
The data is organized in the following way:
- Within the base directory (parameter path) a folder named after the data set (parameter name) is created.
- Within that folder, a subfolder named after the version of the data set (parameter version) is created, and within it the two folders data and docs. The actual data files are stored in the data folder, while the docs folder can hold any documentation files (of any type) related to the data set. The docs folder also contains a file docs.txt that is supposed to contain information on each added documentation file (this information needs to be added manually).
- Within the base folder (with the name of the data set) a NEWS.md file is created, which is supposed to be edited manually to add information or a change log for the currently exported version of the data.
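The folder layout described above can be pictured with a few lines of base R. This is only a sketch and not the code of export_tdf; the variable names base_path, ds_name and ds_version are placeholders for the parameters path, name and version.

```r
## Hypothetical sketch of the folder layout; NOT the package's own code.
base_path <- tempdir()        # stands in for parameter path
ds_name <- "test_data"        # stands in for parameter name
ds_version <- "1.0.0"         # stands in for parameter version

version_dir <- file.path(base_path, ds_name, ds_version)
data_dir <- file.path(version_dir, "data")
docs_dir <- file.path(version_dir, "docs")

## recursive = TRUE also creates the intermediate folders
dir.create(data_dir, recursive = TRUE, showWarnings = FALSE)
dir.create(docs_dir, recursive = TRUE, showWarnings = FALSE)

## Empty placeholder files that are meant to be edited manually
file.create(file.path(docs_dir, "docs.txt"))
file.create(file.path(base_path, ds_name, "NEWS.md"))

## List the files below the data set folder
list.files(file.path(base_path, ds_name), recursive = TRUE)
```

Keeping the version as its own folder level means several versions of the same data set can sit side by side below one data set folder.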
Automatic conversions performed by the export function are:
- Columns in data that are of data type factor are automatically converted to the expected format (i.e. their categories are added to the mapping data.frame and the values are replaced with the indices).
- If not specified in labels, the columns "min" and "max" in labels are calculated on the provided data.
- Missing values in data are automatically converted and the respective encoding is specified in labels.
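As an illustration of the last two conversions, the following base R sketch computes a range and encodes missing values for a single numeric column. It is not the package's implementation; it assumes that the range is taken over the non-missing values and that missing values are replaced by the value of na (here the default -89).

```r
## Hypothetical sketch in base R; not the package's implementation.
na_code <- -89                      # same default as parameter na
x0_age <- c(45, NA, 33, 36, 66)

## Assumption: "min" and "max" are computed on the non-missing values
rng <- range(x0_age, na.rm = TRUE)

## Missing values are replaced by the numeric code
encoded <- x0_age
encoded[is.na(encoded)] <- na_code

rng       # 33 66
encoded   # 45 -89 33 36 66
```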
The labels_from_data function creates a template labels data.frame from the
provided data. The function retrieves information such as the data type of
each column from the provided data and adds the corresponding values to the
data.frame. Other columns, such as "description" or "unit", need to be filled
out manually.
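To make the idea of a labels template concrete, here is a rough base R sketch that derives such a table from a small data.frame. It is an assumption-laden illustration, not the code of labels_from_data: it only distinguishes the two types that appear in the example on this page ("categorical" for factors, "float" otherwise), and the name labels_sketch is made up.

```r
## Hypothetical sketch; the real labels_from_data may differ.
d <- data.frame(
    aid = c("00101", "00102", "00103"),
    x0_sex = factor(c("Male", "Female", NA)),
    x0_age = c(45, 54, 33))

vars <- setdiff(colnames(d), "aid")    # "aid" holds IDs, not a variable
is_cat <- vapply(d[vars], is.factor, logical(1))

labels_sketch <- data.frame(
    label = vars,
    unit = NA_character_,              # to be filled in manually
    type = ifelse(is_cat, "categorical", "float"),
    min = NA_real_,
    max = NA_real_,
    missing = -89,
    description = NA_character_,       # to be filled in manually
    row.names = vars)

## Range only for the non-categorical columns
num <- vars[!is_cat]
labels_sketch[num, "min"] <- vapply(d[num], min, numeric(1), na.rm = TRUE)
labels_sketch[num, "max"] <- vapply(d[num], max, numeric(1), na.rm = TRUE)
labels_sketch
```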
The mapping_from_data function creates a mapping data.frame for all
categorical variables in data (i.e. columns in data with data type factor).
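The relation between a factor column and its mapping can be sketched in base R as follows. This is not the code of mapping_from_data; the name mapping_sketch is made up, and the sketch assumes that the code of a category is its position among the factor levels.

```r
## Hypothetical sketch; the real mapping_from_data may differ.
x0_sex <- factor(c("Male", "Female", "Female", NA, "Male"))

## One row per factor level: the integer code and the category it stands for
mapping_sketch <- data.frame(
    label = "x0_sex",
    code = seq_along(levels(x0_sex)),
    value = levels(x0_sex))

## The column itself can then be stored as integer codes (NA stays NA here)
codes <- as.integer(x0_sex)

mapping_sketch
codes   # 2 1 1 NA 2
```

Factor levels are sorted alphabetically by default, which is why "Female" receives code 1 and "Male" code 2.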
export_tdf(
  name = character(),
  description = character(),
  version = character(),
  date = character(),
  path = ".",
  data = data.frame(),
  groups = data.frame(),
  grp_labels = data.frame(),
  labels = labels_from_data(data),
  mapping = mapping_from_data(data),
  na = -89
)
labels_from_data(data, na = -89)
mapping_from_data(data)
name: required character(1) with the name of the data module.
description: character(1) providing a description of the data.
version: required character(1) with the version of the data.
date: character(1) providing the date of the data.
path: character(1) with the base path where the folders and data files should be created. Defaults to path = ".".
data: data.frame with the data to export. The required column "aid" is expected to contain the unique identifiers of the participants. All additional columns are expected to contain the data of additional variables.
groups: data.frame with an optional grouping of labels (variables) in data. Expected columns are "group" and "label". See the TDF definition for details.
grp_labels: data.frame with the names (descriptions) of the groups defined in groups.
labels: data.frame with annotations to the variables (labels) in data. See the TDF definition for details. Columns "min", "max" and "missing" will be filled by the export_tdf function if not already provided. Defaults to labels = labels_from_data(data), which creates a labels data.frame from the provided data. Any annotations that are not part of the pre-defined fixed set of columns are stored in a separate file labels_additional_info.txt.
mapping: data.frame with the definition of the levels (categories) of the categorical variables in labels. Expected columns are "label", "code" and "value". The default is mapping = mapping_from_data(data), so a mapping data.frame is generated from the provided data.
na: the value used to represent missing values in data. Defaults to na = -89.
export_tdf: (invisibly) returns a character(1) with the path to the folder
where the data was stored. labels_from_data returns a labels data.frame based
on the data in data.
See the official Textual Dataset Format definition for a complete description of the format.
data: contains the data of the various variables. Columns are variables, rows study participants. Column "aid" is mandatory and contains the ID of the study participants.
labels: provides information on the variables in data. Columns are "label" (the name of the column in data), "unit" (unit of the measured value), "type" (the data type), "min" (the minimal value), "max" (the maximal value), "missing" (the value with which missing values in data are encoded) and "description" (a name/description of the variable).
mapping: contains the encoding of categorical variables (factors) in data. Required columns are: "label" (the name of the column in data), "code" (the value of the category in data) and "value" (the category, i.e. the level of the factor).
groups: allows variables in data to be grouped optionally. Expected columns are "group" (the name of the group) and "label" (the name of the column in data).
grp_labels: contains descriptions for the groups. Expected columns are "group" (the name of the group) and "description" (the name/description of the group).
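The relations between these tables can be checked before an export with a small helper. The function check_tdf_inputs below is hypothetical and not part of the package; it only tests the column requirements listed above and returns a character vector of the problems it finds (empty if there are none).

```r
## Hypothetical helper, not part of the package.
check_tdf_inputs <- function(data, labels, mapping, groups) {
    problems <- character()
    if (!"aid" %in% colnames(data))
        problems <- c(problems, "data lacks the mandatory column 'aid'")
    vars <- setdiff(colnames(data), "aid")
    if (!all(labels$label %in% vars))
        problems <- c(problems, "labels refers to columns not in data")
    if (!all(c("label", "code", "value") %in% colnames(mapping)))
        problems <- c(problems, "mapping lacks a required column")
    if (!all(groups$label %in% vars))
        problems <- c(problems, "groups refers to columns not in data")
    problems
}

d <- data.frame(aid = c("00101", "00102"), x0_age = c(45, 54))
l <- data.frame(label = "x0_age")
m <- data.frame(label = character(), code = integer(), value = character())
g <- data.frame(group = "ginfo", label = "x0_age")

ok <- check_tdf_inputs(d, l, m, g)                          # no problems
bad <- check_tdf_inputs(d[, "x0_age", drop = FALSE], l, m, g)
ok
bad
```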
## Exporting a test data set. Creating a *data* data.frame with data on
## 5 individuals.
d <- data.frame(
    aid = c("00101", "00102", "00103", "00104", "00105"),
    x0_sex = factor(c("Male", "Female", "Female", NA, "Male")),
    x0_age = c(45, 54, 33, 36, 66),
    x0_weight = c(78.5, 57.2, 55.2, 67.9, 84.2))
## Generate a *labels* data.frame from the data
l <- labels_from_data(d)
l
#>               label unit        type  min  max missing description
#> x0_sex       x0_sex      categorical   NA   NA     -89
#> x0_age       x0_age            float 33.0 66.0     -89
#> x0_weight x0_weight            float 55.2 84.2     -89
## Fill in the missing information in labels
l$unit <- c(NA, "Year", "kg")
l$description <- c("Sex", "Age", "Weight")
## Generate a *mapping* data.frame from data
m <- mapping_from_data(d)
m
#>           label code  value
#> x0_sex.1 x0_sex    1 Female
#> x0_sex.2 x0_sex    2   Male
## Create a simple grouping of all variables into a "general information"
## group
g <- data.frame(
    group = c("ginfo", "ginfo", "ginfo"),
    label = c("x0_sex", "x0_age", "x0_weight"))
## Define a description for the group
gl <- data.frame(group = "ginfo", description = "General information")
## Now export all data to a temporary folder
path <- tempdir()
## Export the data specifying the name of the module, the version and other
## information
export_tdf(name = "test_data", description = "Simple test data.",
           version = "1.0.0", date = date(), path = path, data = d,
           groups = g, grp_labels = gl, labels = l, mapping = m)
#> Data set was written to: /tmp/RtmprNgPNi/test_data/1.0.0/data