Match features based on their retention time and m/z values

matchRtMz matches elements (features) in x with elements in table based on the similarity of their m/z values and retention times. With parameter duplicates = "closest" (the default), the function returns the index of the best match, i.e. the feature with the smallest m/z difference among the features in table that are within the acceptable retention time difference (defined by rt_tolerance). With duplicates = "keep", the indices of all matching rows in table are returned.

matchRtMz(
  x,
  table,
  nomatch = NA_integer_,
  rt_tolerance = 2,
  tolerance = 0,
  ppm = 20,
  duplicates = c("closest", "keep"),
  mzcol = "mz",
  rtcol = "rt"
)

Arguments

x	`data.frame`, `matrix` or `DataFrame` with feature definitions (i.e. m/z values and retention times) that should be matched against features in `table`.
table	`data.frame`, `matrix` or `DataFrame` with feature definitions to match features in `x` against.
nomatch	value that should be returned if no match for a feature is found.
rt_tolerance	`numeric(1)` with the largest acceptable difference in retention time.
tolerance	`numeric(1)` with a constant acceptable difference of m/z values for features to be considered matching.
ppm	`numeric(1)` with an m/z-dependent, relative acceptable difference of m/z values (in parts per million).
duplicates	`character(1)` defining whether the best match (`duplicates = "closest"`, the default) or all matches (`duplicates = "keep"`) should be returned.
mzcol	`character(1)` with the name of the column containing the m/z ratios.
rtcol	`character(1)` with the name of the column containing the retention times.
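The `tolerance` and `ppm` arguments together define how far apart two m/z values may be and still be considered matching. Assuming they are combined additively (a constant part plus `ppm` parts per million of the m/z value, as is common for `closest()`-style matching), the acceptable difference can be sketched as below. `mz_tolerance()` is a hypothetical helper for illustration, not part of the package.

```r
## Hypothetical helper (not part of the package): acceptable m/z difference,
## ASSUMING the constant and the ppm-based tolerance are simply added.
mz_tolerance <- function(mz, tolerance = 0, ppm = 20) {
    tolerance + mz * ppm / 1e6
}

## Acceptable m/z difference at m/z 76.5 for the ppm values used in the examples
mz_tolerance(76.5, ppm = c(20, 5, 2))
```

At m/z 76.5 this gives about 0.0015 for ppm = 20, 0.00038 for ppm = 5 and 0.00015 for ppm = 2. The last value is smaller than the standard deviation (0.0002) of the noise added to the m/z values in the examples, which is consistent with the matches lost at ppm = 2.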

Value

For duplicates = "closest": an integer vector of length equal to nrow(x) with the index of the row in table that matches each row in x (e.g. c(3, 4) means that the first feature in x matches the 3rd feature in table and the second feature in x matches the 4th). For duplicates = "keep": a list of length equal to nrow(x), each element containing the indices of all rows in table that match the corresponding row in x. Features without a match are reported as nomatch (NA by default).
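The list returned with duplicates = "keep" is often easier to work with as a table of matched index pairs. A minimal sketch in base R follows; `keep_to_pairs()` is a hypothetical helper, and it assumes unmatched features are reported as NA (the default nomatch).

```r
## Example result shaped like the output of duplicates = "keep":
## one integer vector of table row indices per row in x (NA = no match).
res <- list(c(1L, 2L), NA_integer_, 5L)

## Hypothetical helper (not part of the package): flatten the list into
## one row per (x index, table index) pair, dropping unmatched features.
keep_to_pairs <- function(res) {
    pairs <- data.frame(
        x_idx = rep(seq_along(res), lengths(res)),
        table_idx = as.integer(unlist(res, use.names = FALSE))
    )
    pairs[!is.na(pairs$table_idx), , drop = FALSE]
}

keep_to_pairs(res)
```

Each row of the result links one feature in x to one matching feature in table, so a feature with several matches appears several times.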

Note

The function first finds, for each feature in x, the features in table whose retention time differs by at most rt_tolerance, and then matches these candidates by m/z using the closest() function.
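To make the two-step strategy concrete, here is a minimal base-R sketch. It is not the package's implementation: the name `match_rt_mz_sketch()` is hypothetical, it assumes that retention times match when their absolute difference is at most rt_tolerance, and it assumes the m/z tolerance is `tolerance + ppm * mz / 1e6` computed from the m/z of the feature in x. It also swaps the call to closest() for a plain linear scan over table.

```r
## Minimal base-R sketch of the matching strategy described above. This is NOT
## the package's implementation; it ASSUMES that
##  - retention times match if |rt_x - rt_table| <= rt_tolerance, and
##  - m/z values match if |mz_x - mz_table| <= tolerance + ppm * mz_x / 1e6.
match_rt_mz_sketch <- function(x, table, nomatch = NA_integer_,
                               rt_tolerance = 2, tolerance = 0, ppm = 20,
                               duplicates = c("closest", "keep"),
                               mzcol = "mz", rtcol = "rt") {
    duplicates <- match.arg(duplicates)
    res <- lapply(seq_len(nrow(x)), function(i) {
        ## 1) candidate rows in table within the retention time tolerance
        cand <- which(abs(table[, rtcol] - x[i, rtcol]) <= rt_tolerance)
        ## 2) keep only candidates within the (assumed) m/z tolerance
        dmz <- abs(table[cand, mzcol] - x[i, mzcol])
        ok <- dmz <= tolerance + ppm * x[i, mzcol] / 1e6
        if (!any(ok)) return(nomatch)
        if (duplicates == "keep") return(cand[ok])
        ## 3) "closest": the candidate with the smallest m/z difference
        cand[ok][which.min(dmz[ok])]
    })
    if (duplicates == "closest") unlist(res) else res
}

feats <- data.frame(mz = c(100, 200), rt = c(10, 20))
ref <- data.frame(mz = c(100.001, 99.9995, 200.5), rt = c(11, 10, 20))
match_rt_mz_sketch(feats, ref)  # returns c(2L, NA)
```

The linear scan keeps the logic easy to follow, but its cost grows with nrow(x) * nrow(table); a sorted search such as closest() scales better for large feature tables.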

Author

Johannes Rainer

Examples


x <- data.frame(mz = c(23.4, 45.6, 56.9, 76.5, 76.5, 76.5, 80.1),
    rt = c(12, 34, 59, 34, 67, 65, 67))

set.seed(123)
y <- rbind(x, x)
y$mz <- y$mz + rnorm(nrow(y), sd = 0.0002)
y$rt[1:nrow(x)] <- x$rt + 2
y <- y[order(y$mz), ]

matchRtMz(x, y)
#> [1]  2  4  5  7  8  9 13

## Keeping all matches
matchRtMz(x, y, duplicates = "keep")
#> [[1]]
#> [1] 1 2
#> 
#> [[2]]
#> [1] 3 4
#> 
#> [[3]]
#> [1] 5 6
#> 
#> [[4]]
#> [1]  7 11
#> 
#> [[5]]
#> [1]  8  9 10 12
#> 
#> [[6]]
#> [1]  9 10 12
#> 
#> [[7]]
#> [1] 13 14
#> 

## Lower ppm
matchRtMz(x, y, duplicates = "keep", ppm = 5)
#> [[1]]
#> [1] 2
#> 
#> [[2]]
#> [1] 3 4
#> 
#> [[3]]
#> [1] 5
#> 
#> [[4]]
#> [1]  7 11
#> 
#> [[5]]
#> [1]  8  9 10 12
#> 
#> [[6]]
#> [1]  9 10 12
#> 
#> [[7]]
#> [1] 13 14
#> 

## even lower
matchRtMz(x, y, duplicates = "keep", ppm = 2)
#> [[1]]
#> [1] NA
#> 
#> [[2]]
#> [1] 4
#> 
#> [[3]]
#> [1] 5
#> 
#> [[4]]
#> [1] 7
#> 
#> [[5]]
#> [1]  8  9 10
#> 
#> [[6]]
#> [1]  9 10
#> 
#> [[7]]
#> [1] 13 14
#> 

matchRtMz(x, y, ppm = 0)
#> [1] NA NA NA NA NA NA NA