
The procedure is largely the same for both rounds, with a few differences. The cleaning and transformation steps leading up to the first round differ substantially from those for the second round: the second round essentially reuses the data frame built up through Graph 5, whereas the first round requires its own preparation steps.

The threshold is estimated as follows:

  1. Prepare the segment data, depending on second_round. The first round works only with the large segments. The second round initially works with all (large + small) segments, then with only the large segments in step 4.

  2. Get the differences between ratio_medians for all pairs of segments.

  3. Run the simulations (see the RunThresholdSimulations docs for more detail). Each simulation samples a batch of the differences and finds the local minima/maxima of that batch via Kernel Density Estimation (KDE). This is repeated num_simulations times, yielding a list of local minima. The threshold is the average of these values.

  4. If second_round == TRUE, run a few extra steps that estimate the threshold using only the large segments.

  5. For QC purposes, the final threshold is clamped to the range [0.025, 0.045]: min(max(0.025, thr), 0.045).
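Steps 2, 3 and 5 can be sketched in a few lines of R. This is a minimal illustration, not the package's implementation: the helper names (pairwise_differences, local_minima, estimate_threshold, clamp_threshold), the batch_size parameter, the use of absolute differences, and sampling with replacement are all assumptions made for the example. RunThresholdSimulations may differ in each of these respects.

```r
# Hypothetical helpers for illustration only; not the package's own code.

# Step 2: differences between ratio_median values for all pairs of segments.
# Assumption: absolute differences, each unordered pair counted once.
pairwise_differences <- function(ratio_median) {
  d <- abs(outer(ratio_median, ratio_median, "-"))
  d[upper.tri(d)]
}

# Step 3 (one simulation): estimate the density of a batch with KDE and
# return the x positions where the slope turns from negative to positive.
local_minima <- function(x) {
  dens <- stats::density(x)
  slope_sign <- sign(diff(dens$y))
  idx <- which(diff(slope_sign) == 2) + 1
  dens$x[idx]
}

# Step 3 (all simulations): pool the local minima found in every batch
# and average them. batch_size is an assumed parameter.
estimate_threshold <- function(differences, num_simulations = 100,
                               batch_size = 200, seed = 1) {
  set.seed(seed)
  minima <- unlist(lapply(seq_len(num_simulations), function(i) {
    batch <- sample(differences, size = batch_size, replace = TRUE)
    local_minima(batch)
  }))
  mean(minima)
}

# Step 5: keep the final threshold inside [0.025, 0.045].
clamp_threshold <- function(thr) {
  min(max(0.025, thr), 0.045)
}
```

With these helpers, clamp_threshold(estimate_threshold(pairwise_differences(segments$ratio_median))) would give a first-round style estimate, assuming segments holds only the large segments.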

Usage

FindThreshold(
  granges_obj,
  segments,
  num_simulations = 1e+05,
  second_round,
  seed
)

Arguments

granges_obj

A GRanges object: the genomic data used as a reference.

segments

A data frame: segment data. For the first round, these segments have been gathered by ratio_median. For the second round, the small segments have already been re-inserted and LGAs have already been called.

num_simulations

An integer: the number of simulations to run to estimate the critical points.

second_round

A logical: whether to run the second-round variant of the algorithm (TRUE) or the first-round variant (FALSE).

seed

A seed to use for random number generation. Ensures reproducibility between runs.
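As a small illustration of why the seed matters, the sketch below draws a batch twice with the same seed and checks that the two draws are identical. The draw_batch helper and its arguments are hypothetical; how FindThreshold applies the seed internally is not shown here.

```r
# Hypothetical helper: fix the RNG state, then sample a batch.
draw_batch <- function(values, size, seed) {
  set.seed(seed)
  sample(values, size = size, replace = TRUE)
}

values <- seq(0, 0.1, by = 0.001)

# Same seed, same inputs: the two batches match exactly.
same <- identical(
  draw_batch(values, 10, seed = 42),
  draw_batch(values, 10, seed = 42)
)
```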