Find best fitting solutions for each sample — FindRascalSolutions • utanos

FindRascalSolutions() finds the best-fitting ploidy and cellularity pairs from the relative copy number profile of each sample. It uses the find_best_fit_solutions() function from the rascal package.

Usage

FindRascalSolutions(
  cnobj,
  min_ploidy = 1.5,
  max_ploidy = 5.5,
  ploidy_step = 0.01,
  min_cellularity = 0.2,
  max_cellularity = 1,
  cellularity_step = 0.01,
  distance_function = c("MAD", "RMSD"),
  distance_filter_scale_factor = 1.25,
  max_proportion_zero = 0.05,
  min_proportion_close_to_whole_number = 0.5,
  max_distance_from_whole_number = 0.15,
  solution_proximity_threshold = 5,
  keep_all = FALSE
)
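
To give a feel for the search space implied by the defaults above, the following base R sketch builds the ploidy and cellularity grid and counts the candidate pairs. It assumes both endpoints of each range are included and that every ploidy is paired with every cellularity; the variable names are illustrative and are not part of utanos or rascal.

```r
# Default ranges and step sizes taken from the Usage signature.
ploidies      <- seq(1.5, 5.5, by = 0.01)
cellularities <- seq(0.2, 1.0, by = 0.01)

# Assumption: the grid is the full cross product of the two sequences.
grid <- expand.grid(ploidy = ploidies, cellularity = cellularities)

n_ploidy      <- length(ploidies)
n_cellularity <- length(cellularities)
n_grid        <- nrow(grid)
```

Under these assumptions the default grid holds on the order of 32,000 candidate pairs per sample, so coarser steps (for example 0.05) may be worth considering for a quick first pass.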

Arguments

cnobj: An S4 object of type QDNAseqCopyNumbers.
min_ploidy, max_ploidy: the lower and upper bounds of the ploidy range to search.
ploidy_step: the stepwise increment of ploidies along the grid.
min_cellularity, max_cellularity: the lower and upper bounds of the cellularity range to search.
cellularity_step: the stepwise increment of cellularities along the grid.
distance_function: the distance function to use, either "MAD" for the mean absolute difference or "RMSD" for the root mean square difference, where differences are between the fitted absolute copy number values and the nearest whole number.
distance_filter_scale_factor: the distance threshold above which solutions will be discarded, expressed as a multiple of the smallest distance found among the solutions.
max_proportion_zero: the maximum proportion of fitted absolute copy number values in the zero copy number state.
min_proportion_close_to_whole_number: the minimum proportion of fitted absolute copy number values sufficiently close to a whole number.
max_distance_from_whole_number: the maximum distance from a whole number that a fitted absolute copy number can be to be considered sufficiently close.
solution_proximity_threshold: how close two solutions can be before one of them is filtered out; reduces the number of best fit solutions where there are many minima in close proximity.
keep_all: set to TRUE to return all solutions, with an additional best_fit column indicating which are the local minima that are acceptable solutions (may be useful to avoid computing the distance grid twice).
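
The distance and filter arguments above can be illustrated with a small base R sketch that scores a single candidate (ploidy, cellularity) pair. This is not rascal's implementation: the helper names are hypothetical, the relative-to-absolute conversion assumes tumour cells mixed with diploid normal cells, and every segment is weighted equally, which the real function may not do.

```r
# Assumed conversion from relative to absolute copy number for one candidate,
# modelling a mix of tumour cells and diploid (copy number 2) normal cells.
relative_to_absolute <- function(relative_cn, ploidy, cellularity) {
  ploidy + (relative_cn - 1) * (ploidy + 2 / cellularity - 2)
}

# Distance between fitted absolute copy numbers and the nearest whole number.
whole_number_distance <- function(absolute_cn, method = c("MAD", "RMSD")) {
  method <- match.arg(method)
  d <- abs(absolute_cn - round(absolute_cn))
  if (method == "MAD") mean(d) else sqrt(mean(d^2))
}

# Hypothetical acceptance check mirroring the three filter arguments.
is_acceptable <- function(absolute_cn,
                          max_proportion_zero = 0.05,
                          min_proportion_close_to_whole_number = 0.5,
                          max_distance_from_whole_number = 0.15) {
  d <- abs(absolute_cn - round(absolute_cn))
  prop_zero  <- mean(round(absolute_cn) <= 0)
  prop_close <- mean(d <= max_distance_from_whole_number)
  prop_zero <= max_proportion_zero &&
    prop_close >= min_proportion_close_to_whole_number
}

# Toy relative copy numbers, made up for illustration.
relative_cn <- c(0.75, 1.00, 1.00, 1.25, 1.50)

# Candidate A (ploidy 2, cellularity 0.5) lands exactly on whole numbers.
abs_a <- relative_to_absolute(relative_cn, ploidy = 2, cellularity = 0.5)
# Candidate B (ploidy 2, cellularity 1) leaves two values halfway between states.
abs_b <- relative_to_absolute(relative_cn, ploidy = 2, cellularity = 1)

mad_a  <- whole_number_distance(abs_a, "MAD")
mad_b  <- whole_number_distance(abs_b, "MAD")
rmsd_b <- whole_number_distance(abs_b, "RMSD")
ok_a   <- is_acceptable(abs_a)
```

In this toy case candidate A has the smaller distance, so it would be preferred over candidate B. RMSD penalises large deviations more heavily than MAD, which is why the two distance functions can rank candidates differently.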

Value

A data frame containing the calculated solutions. Each solution includes: sample_id, ploidy, cellularity, and distance.
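
As a sketch of how the returned data frame might be post-processed, the base R snippet below keeps the lowest-distance solution for each sample. The `solutions` data frame is a made-up stand-in with the documented columns, not real output, and choosing the minimum distance is only one simple heuristic; other criteria may be more appropriate for a given dataset.

```r
# Mock result with the documented columns (all values are invented).
solutions <- data.frame(
  sample_id   = c("S1", "S1", "S2", "S2", "S2"),
  ploidy      = c(2.10, 3.95, 1.80, 2.75, 3.60),
  cellularity = c(0.62, 0.35, 0.90, 0.55, 0.41),
  distance    = c(0.031, 0.045, 0.052, 0.027, 0.060)
)

# Sort by sample and then by distance, so the best solution comes first
# within each sample.
ordered <- solutions[order(solutions$sample_id, solutions$distance), ]

# Keep only the first (lowest-distance) row for each sample.
best <- ordered[!duplicated(ordered$sample_id), ]
rownames(best) <- NULL
```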