Fit Mixture Models for each CN-Feature — FitMixtureModels • utanos

Perform mixture modelling on CN-features using either a mixture of Gaussians or a mixture of Poissons.

Usage

FitMixtureModels(
  CN_features,
  seed = 77777,
  min_comp = 2,
  max_comp = 10,
  min_prior = 0.001,
  model_selection = "BIC",
  nrep = 1,
  niter = 1000,
  cores = 1,
  featsToFit = seq(1, 6),
  multi_seed = FALSE,
  num_seed = 100
)

Arguments

CN_features: A list. The output from either the ExtractRelativeCopyNumberFeatures or ExtractCopyNumberFeatures functions.
seed: Integer. (flexmix param) The random seed to use while modelling.
min_comp: Integer. (flexmix param) The minimum number of components for each CN-feature to consider.
max_comp: Integer. (flexmix param) The maximum number of components for each CN-feature to consider.
min_prior: Numeric. (flexmix param) Minimum prior probability of clusters. Components falling below this threshold are removed during the iteration.
model_selection: Integer or character. (flexmix param) Which fitted model to return, chosen either by number or by the name of an information criterion (for example "BIC").
nrep: Integer. (flexmix param) The number of times flexmix is run for each k (number of components).
niter: Integer. (flexmix param) The maximum number of iterations for the EM-algorithm.
cores: Integer. The number of cores to use for parallel processing.
featsToFit: Integer vector. The CN-features to fit.
multi_seed: Logical. If TRUE, the function is run multiple times over different seeds to find the best mixtures. It is highly recommended to use multiple cores.
num_seed: Integer. The number of seeds to use when multi_seed = TRUE.

Value

A list containing flexmix objects for each CN-feature. If multi_seed = TRUE, the function instead returns a nested list with two components: one containing the flexmix objects and the other containing BIC values for each CN-feature. In the BIC values, rows represent the different seeds and columns represent the number of components, so the best seed and component count can be read off directly.
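As a minimal sketch of how such a BIC table could be read, the base R snippet below finds the seed and component count with the lowest BIC (lower BIC indicates a better fit). The matrix `bic`, its values, and its row and column labels are hypothetical stand-ins. The names of the list elements actually returned by FitMixtureModels are not shown here, so extracting the table from the output is left out.

```r
# Hypothetical BIC table for one CN-feature:
# rows are seeds, columns are numbers of components.
bic <- matrix(
  c(1520.4, 1498.7, 1503.2,
    1519.8, 1490.1, 1507.9,
    1521.0, 1495.3, 1501.6),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("101", "102", "103"), c("2", "3", "4"))
)

# Position of the smallest BIC (NAs are ignored by which.min),
# converted from a linear index to a (row, column) pair.
idx <- arrayInd(which.min(bic), dim(bic))

best_seed <- rownames(bic)[idx[1, 1]]
best_k    <- colnames(bic)[idx[1, 2]]

cat("Best seed:", best_seed, "with", best_k, "components\n")
```

The same lookup would be repeated for each CN-feature, since each feature has its own table.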

Details

The segment size, changepoint copy number, and segment copy-number value CN-features are modelled with a mixture of Gaussians. The breakpoint count per 10MB, length of segments with oscillating copy-number, and breakpoint count per chromosome are modelled with a mixture of Poissons instead. Mixture modelling is done using the flexmix package.
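The Gaussian and Poisson cases described above can be illustrated directly with flexmix. The sketch below is not taken from the utanos source. It assumes that `min_comp`/`max_comp` correspond to the range passed as `k`, that `min_prior` and `niter` correspond to the flexmix control settings `minprior` and `iter.max`, and that `model_selection` is handed to `getModel()`. The vectors `seg_size` and `bp_count` are simulated toy data, not real CN-features.

```r
library(flexmix)

set.seed(77777)

# Simulated stand-ins for a continuous feature and a count feature (hypothetical data).
seg_size <- c(rnorm(150, mean = 2, sd = 0.5), rnorm(150, mean = 8, sd = 1))
bp_count <- c(rpois(150, lambda = 1), rpois(150, lambda = 12))

# Assumed mapping of min_prior and niter onto flexmix control settings.
ctrl <- list(minprior = 0.001, iter.max = 1000)

# Continuous feature: mixture of univariate Gaussians, fitted for k = 2..4.
gauss_fits <- stepFlexmix(seg_size ~ 1, k = 2:4, nrep = 1,
                          model = FLXMCnorm1(), control = ctrl,
                          verbose = FALSE)

# Count feature: mixture of Poissons, fitted for k = 2..4.
pois_fits <- stepFlexmix(bp_count ~ 1, k = 2:4, nrep = 1,
                         model = FLXMCmvpois(), control = ctrl,
                         verbose = FALSE)

# Keep the model with the lowest BIC from each set of fits.
gauss_best <- getModel(gauss_fits, which = "BIC")
pois_best  <- getModel(pois_fits, which = "BIC")

# Component parameters: means and standard deviations, or Poisson rates.
parameters(gauss_best)
parameters(pois_best)
```

Because components whose prior falls below `minprior` are dropped during fitting, the selected model can end up with fewer components than the `k` it was started with.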