At each simulation:

  • Takes a random number of samples from all_ratio_differences (the batch)

  • Passes this batch to GetCriticalPoints to estimate (via KDE) the batch's density minima/maxima

  • Adds a minimum as a possible threshold if it meets certain criteria. Specifically, a local minimum from a batch is a valid possible threshold if its distance to the first maximum and its distance to the second maximum are 'close' to each other, i.e. the two distances differ by less than a tolerance: 0.10 in the first round, 0.05 if second_round is set. In the second round, then, the minimum must be much more nearly equidistant from its two neighboring maxima. These tolerances guard against picking a 'low' and 'flat' local minimum, which corresponds to a ratio-median difference that is not very common; such a threshold would affect the rest of the algorithm, as not many segments would meet it.

Once the simulations are done and we have a list of possible thresholds, the function returns their mean as the threshold.
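The steps above can be sketched roughly as follows. This is an illustration only, not the package's implementation: critical_points_sketch is a hypothetical stand-in for GetCriticalPoints (whose signature is not shown here), and the batch-size range, sampling without replacement, and measuring distances along the ratio-difference axis between a minimum and its nearest maximum on each side are all assumptions.

```r
# Hypothetical stand-in for GetCriticalPoints: estimate the density with
# stats::density() and locate extrema where the slope changes sign.
critical_points_sketch <- function(batch) {
  d <- density(batch)
  turn <- diff(sign(diff(d$y)))
  list(
    maxima = d$x[which(turn < 0) + 1],  # slope goes from rising to falling
    minima = d$x[which(turn > 0) + 1]   # slope goes from falling to rising
  )
}

# Hypothetical sketch of the simulation loop (not the package function).
threshold_simulation_sketch <- function(all_ratio_differences,
                                        num_simulations = 200,
                                        second_round = FALSE,
                                        seed = 1) {
  set.seed(seed)
  tolerance <- if (second_round) 0.05 else 0.10
  n <- length(all_ratio_differences)
  stopifnot(n >= 10)
  candidates <- numeric(0)

  for (i in seq_len(num_simulations)) {
    # Assumption: batch size drawn uniformly from 10..n, without replacement.
    batch_size <- sample.int(n - 9, 1) + 9
    batch <- sample(all_ratio_differences, batch_size)
    cp <- critical_points_sketch(batch)

    for (m in cp$minima) {
      left  <- cp$maxima[cp$maxima < m]
      right <- cp$maxima[cp$maxima > m]
      if (length(left) == 0 || length(right) == 0) next
      gap_left  <- m - max(left)    # distance to nearest maximum on the left
      gap_right <- min(right) - m   # distance to nearest maximum on the right
      if (abs(gap_left - gap_right) < tolerance) {
        candidates <- c(candidates, m)
      }
    }
  }

  # Mean of all accepted minima; NaN if no minimum passed the check.
  mean(candidates)
}

# Synthetic, made-up input: two clusters of differences around 0 and 0.5.
set.seed(42)
toy_differences <- c(rnorm(500, 0, 0.05), rnorm(500, 0.5, 0.05))
toy_threshold <- threshold_simulation_sketch(toy_differences,
                                             num_simulations = 50, seed = 1)
```

Tightening the tolerance from 0.10 to 0.05 only changes which minima are accepted into the candidate list; the final averaging step is the same in both rounds.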

Usage

RunThresholdSimulations(
  num_simulations = 1e+05,
  all_ratio_differences,
  second_round = FALSE,
  seed
)

Arguments

num_simulations

An integer: the number of simulations to run on all_ratio_differences. Defaults to 100,000 (1e+05).

all_ratio_differences

An array with all the differences of ratio medians between all segment-pairs in the data set

second_round

A boolean. FALSE for the first round of simulations, TRUE for the second. Set this to TRUE once the simulations have already been run and there is a possible THR to re-calibrate with new differences of ratio medians (i.e. with updated segments, such as post-merging). Defaults to FALSE.

seed

A seed to use for random number generation. Ensures reproducibility between runs.