Available Optimizers in the optimizers module#

This submodule contains Optimizer classes to learn an MHN from mutation data.

class mhn.optimizers.Optimizer(mhn_type: MHNType = MHNType.oMHN)#

A dynamic wrapper for optimizer classes (e.g., oMHNOptimizer, cMHNOptimizer) that provides access to all methods and attributes of the wrapped optimizer instance.

Parameters:: mhn_type (MHNType, optional) – Type of MHN trained by this optimizer class. Defaults to MHNType.oMHN, the most recent type.
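
The "dynamic wrapper" behaviour described above can be illustrated with plain attribute delegation. The following is only a sketch of the general pattern, not the library's implementation; every class and enum name in it (ModelKind, DelegatingWrapper, and the two backends) is a hypothetical stand-in.

```python
from enum import Enum


class ModelKind(Enum):  # hypothetical stand-in for MHNType
    CLASSICAL = 1
    WITH_BIAS_CORRECTION = 2


class _ClassicalBackend:  # hypothetical stand-in for cMHNOptimizer
    def describe(self) -> str:
        return "classical"


class _BiasCorrectedBackend:  # hypothetical stand-in for oMHNOptimizer
    def describe(self) -> str:
        return "bias-corrected"


class DelegatingWrapper:
    """Forwards unknown attribute lookups to the wrapped backend instance."""

    _BACKENDS = {
        ModelKind.CLASSICAL: _ClassicalBackend,
        ModelKind.WITH_BIAS_CORRECTION: _BiasCorrectedBackend,
    }

    def __init__(self, kind: ModelKind = ModelKind.WITH_BIAS_CORRECTION):
        self._wrapped = self._BACKENDS[kind]()

    def __getattr__(self, name):
        # __getattr__ is only called when normal lookup fails,
        # so self._wrapped itself is resolved the usual way.
        return getattr(self._wrapped, name)


wrapper = DelegatingWrapper(ModelKind.CLASSICAL)
default_wrapper = DelegatingWrapper()
```

With this pattern, any method of the selected backend can be called directly on the wrapper object.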

class mhn.optimizers.cMHNOptimizer#

Optimizes a cMHN for given cross-sectional data.

get_data_properties() → dict#

Retrieves properties of the loaded training data.

Returns:: A dictionary with information about the training data, including sample and event statistics.
Return type:: dict

lambda_from_cv(lambda_min: float | None = None, lambda_max: float | None = None, steps: int = 9, nfolds: int = 5, lambda_vector: ndarray | None = None, show_progressbar: bool = False, return_lambda_scores: bool = False, pick_1se: bool = True) → float | tuple[float, DataFrame]#

Finds the best value for lambda according to either the maximal average test set likelihood or the “one-standard-error-rule” through n-fold cross-validation.

You can specify the lambda values that should be tested in cross-validation by setting the lambda_vector parameter accordingly.

Alternatively, you can specify the minimum, the maximum and the number of steps for the lambda values to test. This method will then create a range of candidate lambdas with logarithmic grid-spacing, e.g. (0.0001, 0.0010, 0.0100, 0.1000) for lambda_min=0.0001, lambda_max=0.1 and steps=4.

If you set neither lambda_vector nor lambda_min and lambda_max, the default range (0.1/#datasamples, 10/#datasamples) will be used.
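
As a rough illustration of the logarithmic grid and the default range, the candidate values can be reproduced with NumPy. This is a sketch, not the library's code: the helper name log_spaced_lambdas and the sample count n_samples are assumptions, and using the default steps=9 for the default range is also an assumption.

```python
import numpy as np


def log_spaced_lambdas(lambda_min, lambda_max, steps):
    """Return `steps` values from lambda_min to lambda_max, evenly spaced on a log scale."""
    return np.geomspace(lambda_min, lambda_max, num=steps)


# The example from the text: four values between 0.0001 and 0.1.
grid = log_spaced_lambdas(1e-4, 0.1, 4)

# Default range (0.1/#datasamples, 10/#datasamples) for a hypothetical data set.
n_samples = 200  # assumed number of samples, for illustration only
default_grid = log_spaced_lambdas(0.1 / n_samples, 10 / n_samples, 9)
```

An array built this way could also be passed as lambda_vector when an explicit grid is preferred.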

By default, the largest lambda that performed within one standard error of the best-performing lambda is returned as the preferred lambda (“one-standard-error-rule”). When setting pick_1se=False, the function will simply return the best-performing lambda instead.
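
The selection rule can be sketched in a few lines of NumPy. This is an illustration under stated assumptions rather than the library's implementation: the function name pick_lambda is hypothetical, the scores are made up, and the standard error is assumed to be the sample standard deviation across folds divided by the square root of the number of folds.

```python
import numpy as np


def pick_lambda(lambdas, fold_scores, pick_1se=True):
    """Pick a lambda from per-fold test scores (higher score = better)."""
    lambdas = np.asarray(lambdas, dtype=float)
    fold_scores = np.asarray(fold_scores, dtype=float)  # shape (n_lambdas, n_folds)
    mean = fold_scores.mean(axis=1)
    best = int(np.argmax(mean))
    if not pick_1se:
        return float(lambdas[best])
    # Assumed definition of the standard error of the mean score.
    se = fold_scores.std(axis=1, ddof=1) / np.sqrt(fold_scores.shape[1])
    threshold = mean[best] - se[best]
    # Largest lambda whose mean score is within one standard error of the best.
    return float(lambdas[mean >= threshold].max())


lams = [0.001, 0.01, 0.1, 1.0]
scores = [
    [-10.2, -10.0, -9.8],   # mean -10.0
    [-9.8, -9.5, -9.2],     # mean -9.5 (best)
    [-9.7, -9.6, -9.5],     # mean -9.6, within one standard error of the best
    [-12.1, -12.0, -11.9],  # mean -12.0
]
one_se_choice = pick_lambda(lams, scores)
best_choice = pick_lambda(lams, scores, pick_1se=False)
```

Here the one-standard-error rule prefers the stronger regularization (0.1) over the best-scoring value (0.01).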

Use np.random.seed() to make results reproducible.

Parameters:

lambda_min (float, optional) – Minimum lambda value to test. Will be ignored if lambda_vector is set.
lambda_max (float, optional) – Maximum lambda value to test. Will be ignored if lambda_vector is set.
steps (int, optional) – Number of steps between lambda_min and lambda_max. Defaults to 9. Will be ignored if lambda_vector is set.
nfolds (int, optional) – Number of folds for cross-validation. Defaults to 5.
lambda_vector (np.ndarray, optional) – Specific lambda values to test.
show_progressbar (bool, optional) – Whether to show a progress bar during cross-validation. Defaults to False.
return_lambda_scores (bool, optional) – Whether to return lambda scores along with the best lambda. Defaults to False.
pick_1se (bool, optional) – If True (default), applies the one-standard-error-rule to pick the returned lambda value. If False, returns the best-performing lambda.

Returns:

Best lambda value, or, if return_lambda_scores is set to True, a tuple with the best lambda and a DataFrame containing the mean scores for each lambda.

Return type:

float | tuple[float, pd.DataFrame]

load_data_from_csv(src: str, delimiter: str = ',', **kwargs)#

Load mutation data from a CSV file. The rows have to represent samples and the columns represent genes. Mutations of genes are represented by 1s, intact genes are represented by 0s.

Parameters:

src (str) – Path to the CSV file.
delimiter (str, optional) – Delimiter used in the CSV file (default is ‘,’).
kwargs – Additional keyword arguments passed to pandas’ read_csv() function.

Returns:

This optimizer object.

Return type:

cMHNOptimizer

load_data_matrix(data_matrix: ndarray | DataFrame)#

Loads mutation data stored in a numpy array or pandas DataFrame.

Parameters:: data_matrix (np.ndarray | pd.DataFrame) – Data matrix where rows represent samples and columns represent genes. Mutations of genes are represented by 1s, intact genes by 0s.
Returns:: This optimizer object.
Return type:: cMHNOptimizer
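
Before loading, it can help to verify that a matrix follows the layout described above (rows are samples, columns are genes, entries are 0 or 1). The helper check_binary_matrix below is a hypothetical pre-check, not part of the library, and the cast to int32 is an arbitrary choice made for this sketch.

```python
import numpy as np


def check_binary_matrix(data):
    """Validate a samples-by-genes matrix of 0/1 entries and return it as an int array."""
    arr = np.asarray(data)
    if arr.ndim != 2:
        raise ValueError("expected a 2-D matrix (samples x genes)")
    if not np.isin(arr, (0, 1)).all():
        raise ValueError("entries must be 0 (intact) or 1 (mutated)")
    return arr.astype(np.int32)


data = check_binary_matrix([
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
    [1, 0, 1],
])
n_samples, n_genes = data.shape
mutation_frequency = data.mean(axis=0)  # fraction of samples with each gene mutated
```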

property penalty#

property result: cMHN#

Property for retrieving the training result.

Returns:: The resulting MHN model after training.
Return type:: model.cMHN

save_progress(steps: int = -1, always_new_file: bool = False, filename: str = 'theta_backup.npy') → _Optimizer#

Configures periodic saving of training progress.

Parameters:

steps (int) – Number of training iterations between progress saves. Defaults to -1 (disabled).
always_new_file (bool) – Whether to save each backup as a new file. Defaults to False.
filename (str) – Name of the backup file. Defaults to ‘theta_backup.npy’.

Returns:

The optimizer instance.

Return type:

_Optimizer
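
The .npy extension of the default backup file suggests a plain NumPy array file. Assuming that is the case, a saved theta matrix could be read back with np.load, as in this round-trip sketch. The matrix values are made up, and the file is written by the sketch itself rather than by the optimizer.

```python
import os
import tempfile

import numpy as np

theta = np.array([[-1.0, 0.5], [0.0, -2.0]])  # made-up log-theta matrix

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "theta_backup.npy")
    np.save(path, theta)      # stands in for the optimizer's periodic backup
    restored = np.load(path)  # read the matrix back as an ndarray
```

Under the same assumption, a restored matrix might be handed to set_init_theta to continue from an interrupted run.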

set_callback_func(callback=None) → _Optimizer#

Sets a callback function to be invoked after each iteration of the BFGS algorithm.

Parameters:: callback (Callable) – A function that takes a single argument (theta matrix computed in the last iteration). Defaults to None.
Raises:: ValueError – If the provided callback is not callable.
Returns:: The optimizer instance.
Return type:: _Optimizer
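
A callback only needs to be a callable that accepts the current theta matrix. The sketch below uses a hypothetical ThetaRecorder class that stores a copy of each matrix it receives; the loop simulates three iterations by hand instead of running an actual optimization.

```python
import numpy as np


class ThetaRecorder:
    """Callable that keeps a copy of every theta matrix it is given."""

    def __init__(self):
        self.history = []

    def __call__(self, theta):
        # Copy, so later in-place changes by the caller do not alter the record.
        self.history.append(np.array(theta, copy=True))


recorder = ThetaRecorder()

# Simulated iterations; in real use the optimizer would invoke the callback.
for step in range(3):
    recorder(np.full((2, 2), float(step)))
```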

set_device(device: Device) → cMHNOptimizer#

Sets the computational device for training.

You have three options:

Device.AUTO: (default) automatically select the device that is likely to match the data
Device.CPU: use the CPU implementations to compute the scores and gradients
Device.GPU: use the GPU/CUDA implementations to compute the scores and gradients

The Device enum is part of this optimizer class.

Parameters:: device (Device) – The device to use (AUTO, CPU, or GPU).
Returns:: The optimizer instance.
Return type:: cMHNOptimizer
Raises:: ValueError – If the given device is not an instance of Device.

set_init_theta(init: ndarray | None) → _Optimizer#

Sets the initial theta matrix for learning a new MHN.

Parameters:: init (np.ndarray | None) – Initial theta matrix in logarithmic form. If None, uses an independence model where the baseline hazard Theta_ii of each event is set to its empirical odds and the hazard ratios (off-diagonal entries) are set to exactly 1.
Returns:: The optimizer instance.
Return type:: _Optimizer
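
The independence model described for init=None can be written out explicitly. In logarithmic form the diagonal holds the log of each event's empirical odds, and the off-diagonal entries are log(1) = 0. The function name independence_log_theta is hypothetical, and a square genes-by-genes matrix is assumed; events that occur in none or all of the samples would give infinite values and are not handled here.

```python
import numpy as np


def independence_log_theta(data):
    """Build a log-theta matrix with log empirical odds on the diagonal, zeros elsewhere."""
    data = np.asarray(data, dtype=float)
    freq = data.mean(axis=0)     # empirical frequency of each event
    odds = freq / (1.0 - freq)   # empirical odds of each event
    n_events = data.shape[1]
    log_theta = np.zeros((n_events, n_events))  # log(1) = 0 off the diagonal
    np.fill_diagonal(log_theta, np.log(odds))
    return log_theta


samples = [
    [1, 0, 1],
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
]
init = independence_log_theta(samples)
```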

set_penalty(penalty: Penalty | tuple[Callable[[ndarray], float], Callable[[ndarray], ndarray]]) → _Optimizer#

Sets the penalty type for training.

You have four options:

Penalty.L1: (default) uses the L1 penalty as regularization
Penalty.L2: uses the L2 penalty as regularization
Penalty.SYM_SPARSE: uses a penalty which induces sparsity and soft symmetry
(penalty, penalty_deriv): a custom penalty function and its derivative. They must both take the log_theta matrix in the form of a numpy array as input and return a float and a numpy array, respectively.

The Penalty enum is part of this optimizer class.

Parameters:

penalty (Penalty | tuple[Callable[[np.ndarray], float], Callable[[np.ndarray], np.ndarray]]) – The penalty to use (L1, L2, SYM_SPARSE, or a custom (penalty, penalty_deriv) tuple).

Returns:

The optimizer instance.

Return type:

_Optimizer

Raises:

ValueError – If the given penalty is not an instance of Penalty or a tuple of two functions.
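
A custom penalty pair has to match the shapes described above: the first function maps log_theta to a float, the second maps it to an array of the same shape. The sketch below uses a squared penalty on the off-diagonal entries purely as an example; the function names are hypothetical and a square matrix is assumed.

```python
import numpy as np


def offdiag_l2_penalty(log_theta):
    """Sum of squared off-diagonal entries of log_theta, returned as a float."""
    mask = np.eye(*log_theta.shape, dtype=bool)
    off = np.where(mask, 0.0, log_theta)
    return float(np.sum(off ** 2))


def offdiag_l2_penalty_deriv(log_theta):
    """Gradient of offdiag_l2_penalty with respect to log_theta (same shape)."""
    mask = np.eye(*log_theta.shape, dtype=bool)
    return 2.0 * np.where(mask, 0.0, log_theta)


# The tuple form expected for a custom penalty: (penalty, penalty_deriv).
custom_penalty = (offdiag_l2_penalty, offdiag_l2_penalty_deriv)

example_theta = np.array([[1.0, 2.0], [3.0, 4.0]])
penalty_value = custom_penalty[0](example_theta)
penalty_grad = custom_penalty[1](example_theta)
```

Checking the derivative against a finite-difference approximation before training is a cheap way to catch sign or indexing mistakes.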

train(lam: float | None = None, maxit: int = 5000, trace: bool = False, reltol: float = 1e-07, round_result: bool = True) → cMHN#

Trains a new MHN model using the loaded data.

Parameters:

lam (float, optional) – Regularization parameter. Defaults to 1/(number of samples).
maxit (int) – Maximum number of training iterations. Defaults to 5000.
trace (bool) – Whether to print convergence messages. Defaults to False.
reltol (float) – Gradient norm tolerance for termination (see “gtol” in scipy.optimize.minimize). Defaults to 1e-7.
round_result (bool) – Whether to round the result to two decimal places. Defaults to True.

Returns:

The trained MHN model.

Return type:

model.cMHN

Raises:

ValueError – If no data has been loaded.

property training_data: ndarray#

Returns the data used to train the cMHN model.

Returns:: The data matrix used for training.
Return type:: np.ndarray

class mhn.optimizers.oMHNOptimizer#

Optimizer for the oMHN model.

get_data_properties() → dict#

Retrieves properties of the loaded training data.

Returns:: A dictionary with information about the training data, including sample and event statistics.
Return type:: dict

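lambda_from_cv(lambda_min: float | None = None, lambda_max: float | None = None, steps: int = 9, nfolds: int = 5, lambda_vector: ndarray | None = None, show_progressbar: bool = False, return_lambda_scores: bool = False, pick_1se: bool = True) → float | tuple[float, DataFrame]#

Finds the best value for lambda according to either the maximal average test set likelihood or the “one-standard-error-rule” through n-fold cross-validation.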

You can specify the lambda values that should be tested in cross-validation by setting the lambda_vector parameter accordingly.

If you set neither lambda_vector nor lambda_min and lambda_max, the default range (0.1/#datasamples, 10/#datasamples) will be used.

Use np.random.seed() to make results reproducible.

Parameters:

lambda_min (float, optional) – Minimum lambda value to test. Will be ignored if lambda_vector is set.
lambda_max (float, optional) – Maximum lambda value to test. Will be ignored if lambda_vector is set.
steps (int, optional) – Number of steps between lambda_min and lambda_max. Defaults to 9. Will be ignored if lambda_vector is set.
nfolds (int, optional) – Number of folds for cross-validation. Defaults to 5.
lambda_vector (np.ndarray, optional) – Specific lambda values to test.
show_progressbar (bool, optional) – Whether to show a progress bar during cross-validation. Defaults to False.
return_lambda_scores (bool, optional) – Whether to return lambda scores along with the best lambda. Defaults to False.
pick_1se (bool, optional) – If True (default), applies the one-standard-error-rule to pick the returned lambda value. If False, returns the best-performing lambda.

Returns:

Best lambda value, or, if return_lambda_scores is set to True, a tuple with the best lambda and a DataFrame containing the mean scores for each lambda.

Return type:

float | tuple[float, pd.DataFrame]

load_data_from_csv(src: str, delimiter: str = ',', **kwargs)#

Load mutation data from a CSV file. The rows have to represent samples and the columns represent genes. Mutations of genes are represented by 1s, intact genes are represented by 0s.

Parameters:

src (str) – Path to the CSV file.
delimiter (str, optional) – Delimiter used in the CSV file (default is ‘,’).
kwargs – Additional keyword arguments passed to pandas’ read_csv() function.

Returns:

This optimizer object.

Return type:

oMHNOptimizer

load_data_matrix(data_matrix: ndarray | DataFrame)#

Loads mutation data stored in a numpy array or pandas DataFrame.

Parameters:: data_matrix (np.ndarray | pd.DataFrame) – Data matrix where rows represent samples and columns represent genes. Mutations of genes are represented by 1s, intact genes by 0s.
Returns:: This optimizer object.
Return type:: oMHNOptimizer

property penalty#

property result: oMHN#

Property for retrieving the training result.

Returns:: The resulting MHN model after training.
Return type:: model.oMHN

save_progress(steps: int = -1, always_new_file: bool = False, filename: str = 'theta_backup.npy') → _Optimizer#

Configures periodic saving of training progress.

Parameters:

steps (int) – Number of training iterations between progress saves. Defaults to -1 (disabled).
always_new_file (bool) – Whether to save each backup as a new file. Defaults to False.
filename (str) – Name of the backup file. Defaults to ‘theta_backup.npy’.

Returns:

The optimizer instance.

Return type:

_Optimizer

set_callback_func(callback=None) → _Optimizer#

Sets a callback function to be invoked after each iteration of the BFGS algorithm.

Parameters:: callback (Callable) – A function that takes a single argument (theta matrix computed in the last iteration). Defaults to None.
Raises:: ValueError – If the provided callback is not callable.
Returns:: The optimizer instance.
Return type:: _Optimizer

set_device(device: Device)#

Sets the computational device for training.

You have three options:

Device.AUTO: (default) automatically select the device that is likely to match the data
Device.CPU: use the CPU implementations to compute the scores and gradients
Device.GPU: use the GPU/CUDA implementations to compute the scores and gradients

The Device enum is part of this optimizer class.

Parameters:: device (Device) – The device to use (AUTO, CPU, or GPU).
Returns:: The optimizer instance.
Return type:: _Optimizer
Raises:: ValueError – If the given device is not an instance of Device.

set_init_theta(init: ndarray | None) → _Optimizer#

Sets the initial theta matrix for learning a new MHN.

Parameters:: init (np.ndarray | None) – Initial theta matrix in logarithmic form. If None, uses an independence model where the baseline hazard Theta_ii of each event is set to its empirical odds and the hazard ratios (off-diagonal entries) are set to exactly 1.
Returns:: The optimizer instance.
Return type:: _Optimizer

set_penalty(penalty: Penalty | tuple[Callable[[ndarray], float], Callable[[ndarray], ndarray]])#

Sets the penalty type for training.

You have four options:

Penalty.L1: (default) uses the L1 penalty as regularization
Penalty.L2: uses the L2 penalty as regularization
Penalty.SYM_SPARSE: uses a penalty which induces sparsity and soft symmetry
(penalty, penalty_deriv): a custom penalty function and its derivative. They must both take the log_theta matrix in the form of a numpy array as input and return a float and a numpy array, respectively.

The Penalty enum is part of this optimizer class.

Parameters:

penalty (Penalty | tuple[Callable[[np.ndarray], float], Callable[[np.ndarray], np.ndarray]]) – The penalty to use (L1, L2, SYM_SPARSE, or a custom (penalty, penalty_deriv) tuple).

Returns:

The optimizer instance.

Return type:

_Optimizer

Raises:

ValueError – If the given penalty is not an instance of Penalty or a tuple of two functions.

train(lam: float | None = None, maxit: int = 5000, trace: bool = False, reltol: float = 1e-07, round_result: bool = True) → oMHN#

Trains a new oMHN model using the loaded data.

Parameters:

lam (float, optional) – Regularization parameter. Defaults to 1/(number of samples).
maxit (int) – Maximum number of training iterations. Defaults to 5000.
trace (bool) – Whether to print convergence messages. Defaults to False.
reltol (float) – Gradient norm tolerance for termination (see “gtol” in scipy.optimize.minimize). Defaults to 1e-7.
round_result (bool) – Whether to round the result to two decimal places. Defaults to True.

Returns:

The trained MHN model.

Return type:

model.oMHN

Raises:

ValueError – If no data has been loaded.

property training_data: ndarray#

Returns the data used to train the oMHN model.

Returns:: The data matrix used for training.
Return type:: np.ndarray

class mhn.optimizers.Device(value)#

Enum of device types.

AUTO#

Automatically selects the device based on the number of active events in each sample.

Type:: int

CPU#

Executes all computations on the CPU.

Type:: int

GPU#

Executes score and gradient computations on the GPU.

Type:: int

class mhn.optimizers.Penalty(value)#

Enumeration of penalty functions.

L1#

Applies L1 regularization during training.

Type:: int

L2#

Applies L2 regularization during training.

Type:: int

SYM_SPARSE#

Induces sparsity and soft symmetry (see Schill et al., 2024).

Type:: int

class mhn.optimizers.MHNType(value)#

Enum representing the types of MHN models that can be trained.

cMHN#: Classical MHN as proposed by Schill et al. (2019).

oMHN#: MHN with observation bias correction as proposed by Schill et al. (2024).
