Available Optimizers in the optimizers module#
This submodule contains Optimizer classes to learn an MHN from mutation data.
- class mhn.optimizers.Optimizer(mhn_type: MHNType = MHNType.oMHN)#
A dynamic wrapper for optimizer classes (e.g., oMHNOptimizer, cMHNOptimizer) that provides access to all methods and attributes of the wrapped optimizer instance.
- Parameters:
mhn_type (MHNType, optional) – Type of MHN trained by this optimizer class. Defaults to MHNType.oMHN, the most recent type.
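The "dynamic wrapper" behaviour described above can be pictured with a generic attribute-forwarding pattern. The following is only a sketch of that general technique, not the library's implementation: the classes WrappedA, WrappedB and DynamicWrapper and the string keys "A" and "B" are hypothetical names chosen for illustration.

```python
class WrappedA:
    """Hypothetical stand-in for one concrete optimizer class."""

    def train(self):
        return "trained by A"


class WrappedB:
    """Hypothetical stand-in for another concrete optimizer class."""

    def train(self):
        return "trained by B"


class DynamicWrapper:
    """Forwards attribute access to an instance chosen at construction time."""

    _registry = {"A": WrappedA, "B": WrappedB}

    def __init__(self, kind="A"):
        # The wrapped instance is selected once, based on the requested kind.
        self._wrapped = self._registry[kind]()

    def __getattr__(self, name):
        # __getattr__ runs only when normal lookup fails, so every method or
        # attribute not defined on the wrapper is taken from the wrapped object.
        return getattr(self._wrapped, name)


wrapper = DynamicWrapper("B")
result = wrapper.train()  # resolved on the wrapped WrappedB instance
```

With such a pattern, code written against the wrapper does not need to know which concrete class it holds.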
- class mhn.optimizers.cMHNOptimizer#
Optimizes a cMHN for given cross-sectional data.
- get_data_properties() dict#
Retrieves properties of the loaded training data.
- Returns:
A dictionary with information about the training data, including sample and event statistics.
- Return type:
dict
- lambda_from_cv(lambda_min: float | None = None, lambda_max: float | None = None, steps: int = 9, nfolds: int = 5, lambda_vector: ndarray | None = None, show_progressbar: bool = False, return_lambda_scores: bool = False, pick_1se: bool = True) float | tuple[float, DataFrame]#
Finds the best value for lambda according to either the maximal average test set likelihood or the “one-standard-error-rule” through n-fold cross-validation.
You can specify the lambda values that should be tested in cross-validation by setting the lambda_vector parameter accordingly.
Alternatively, you can specify the minimum, the maximum and the number of steps for potential lambda values. This method will then create a range of possible lambdas with logarithmic grid spacing, e.g. (0.0001, 0.0010, 0.0100, 0.1000) for lambda_min=0.0001, lambda_max=0.1 and steps=4.
If you set neither lambda_vector nor lambda_min and lambda_max, the default range (0.1/#datasamples, 10/#datasamples) will be used.
By default, the largest lambda that performed within one standard error of the best-performing lambda is returned as the preferred lambda (“one-standard-error-rule”). When setting pick_1se=False, the function will simply return the best-performing lambda instead.
Use np.random.seed() to make results reproducible.
- Parameters:
lambda_min (float, optional) – Minimum lambda value to test. Will be ignored if lambda_vector is set.
lambda_max (float, optional) – Maximum lambda value to test. Will be ignored if lambda_vector is set.
steps (int, optional) – Number of lambda values on the logarithmic grid from lambda_min to lambda_max, both endpoints included. Defaults to 9. Will be ignored if lambda_vector is set.
nfolds (int, optional) – Number of folds for cross-validation. Defaults to 5.
lambda_vector (np.ndarray, optional) – Specific lambda values to test.
show_progressbar (bool, optional) – Whether to show a progress bar during cross-validation. Defaults to False.
return_lambda_scores (bool, optional) – Whether to return lambda scores along with the best lambda. Defaults to False.
pick_1se (bool, optional) – If True (default), applies the one-standard-error-rule to pick the returned lambda value. If False, returns the best-performing lambda.
- Returns:
Best lambda value, or, if return_lambda_scores is set to True, a tuple with the best lambda and a DataFrame containing the mean scores for each lambda.
- Return type:
float | tuple[float, pd.DataFrame]
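The logarithmic grid described for lambda_from_cv can be reproduced by hand. This is a minimal sketch of that spacing using only the standard library, assuming both endpoints are part of the grid as in the documented example. The helper names log_spaced_lambdas and default_lambda_range are made up here and are not part of the mhn package.

```python
import math


def log_spaced_lambdas(lambda_min, lambda_max, steps):
    """Return `steps` values from lambda_min to lambda_max, evenly spaced in log10."""
    if steps < 2:
        return [lambda_min]
    log_min = math.log10(lambda_min)
    log_max = math.log10(lambda_max)
    width = (log_max - log_min) / (steps - 1)
    return [10 ** (log_min + i * width) for i in range(steps)]


def default_lambda_range(n_samples):
    """Default search interval as documented: (0.1/#datasamples, 10/#datasamples)."""
    return 0.1 / n_samples, 10 / n_samples


# Documented example: lambda_min=0.0001, lambda_max=0.1, steps=4
grid = log_spaced_lambdas(0.0001, 0.1, 4)

# Default interval for a hypothetical data set with 100 samples
low, high = default_lambda_range(100)
```

A grid built this way can also be passed explicitly through the lambda_vector parameter if finer control over the tested values is needed.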
- load_data_from_csv(src: str, delimiter: str = ',', **kwargs)#
Load mutation data from a CSV file. The rows have to represent samples and the columns represent genes. Mutations of genes are represented by 1s, intact genes are represented by 0s.
- Parameters:
src (str) – Path to the CSV file.
delimiter (str, optional) – Delimiter used in the CSV file (default is ‘,’).
kwargs – Additional keyword arguments passed to pandas’ read_csv() function.
- Returns:
This optimizer object.
- Return type:
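As an illustration of the file layout that load_data_from_csv describes, the sketch below writes a small table with one row per sample and one column per gene, using 1 for a mutation and 0 for an intact gene. The gene names and the presence of a header row are assumptions made for this example, and is_binary_table is a hypothetical helper, not a function of the mhn package.

```python
import csv
import io

# Header with (made-up) gene names, followed by one row per sample.
rows = [
    ["TP53", "KRAS", "APC"],
    [1, 0, 1],
    [0, 0, 1],
    [1, 1, 0],
]

buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerows(rows)
csv_text = buffer.getvalue()


def is_binary_table(text, delimiter=","):
    """Check that every data row matches the header width and holds only 0/1."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    return all(
        len(row) == len(header) and all(cell in ("0", "1") for cell in row)
        for row in reader
    )


table_ok = is_binary_table(csv_text)
```

Running a check like this before loading can catch stray values such as empty cells or counts larger than 1.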
- load_data_matrix(data_matrix: ndarray | DataFrame)#
Loads mutation data stored in a numpy array or pandas DataFrame.
- Parameters:
data_matrix (np.ndarray | pd.DataFrame) – Data matrix where rows represent samples and columns represent genes. Mutations of genes are represented by 1s, intact genes by 0s.
- Returns:
This optimizer object.
- Return type:
- property penalty#
- property result: cMHN#
Property for retrieving the training result.
- Returns:
The resulting MHN model after training.
- Return type:
cMHN
- save_progress(steps: int = -1, always_new_file: bool = False, filename: str = 'theta_backup.npy') _Optimizer#
Configures periodic saving of training progress.
- Parameters:
steps (int) – Number of training iterations between progress saves. Defaults to -1 (disabled).
always_new_file (bool) – Whether to save each backup as a new file. Defaults to False.
filename (str) – Name of the backup file. Defaults to ‘theta_backup.npy’.
- Returns:
The optimizer instance.
- Return type:
_Optimizer
- set_callback_func(callback=None) _Optimizer#
Sets a callback function to be invoked after each iteration of the BFGS algorithm.
- Parameters:
callback (Callable) – A function that takes a single argument (theta matrix computed in the last iteration). Defaults to None.
- Raises:
ValueError – If the provided callback is not callable.
- Returns:
The optimizer instance.
- Return type:
_Optimizer
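A callback of the documented shape takes the theta matrix of the last iteration as its only argument. The sketch below shows one possible callback that records every iterate, plus a check that mirrors the documented ValueError. ThetaRecorder and check_callback are hypothetical names, and the optimizer itself is not invoked here.

```python
import copy


class ThetaRecorder:
    """Callable that stores a copy of every theta matrix it receives."""

    def __init__(self):
        self.history = []

    def __call__(self, theta):
        # Copy the matrix in case the caller reuses the same buffer later.
        self.history.append(copy.deepcopy(theta))


def check_callback(callback):
    """Reject anything that is neither None nor callable."""
    if callback is not None and not callable(callback):
        raise ValueError("callback has to be a callable or None")
    return callback


recorder = check_callback(ThetaRecorder())

# Simulate two iterations with placeholder matrices.
recorder([[0.0, 0.0], [0.0, 0.0]])
recorder([[0.1, 0.0], [0.0, -0.2]])
```

An object like recorder is what would be handed to set_callback_func, since any callable with a single parameter fits the description.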
- set_device(device: Device) cMHNOptimizer#
Sets the computational device for training.
- You have three options:
- Device.AUTO: (default) automatically select the device that is likely to match the data
- Device.CPU: use the CPU implementations to compute the scores and gradients
- Device.GPU: use the GPU/CUDA implementations to compute the scores and gradients
The Device enum is part of this optimizer class.
- Parameters:
device (Device) – The device to use (AUTO, CPU, or GPU).
- Returns:
The optimizer instance.
- Return type:
cMHNOptimizer
- Raises:
ValueError – If the given device is not an instance of Device.
- set_init_theta(init: ndarray | None) _Optimizer#
Sets the initial theta matrix for learning a new MHN.
- Parameters:
init (np.ndarray | None) – Initial theta matrix in logarithmic form. If None, uses an independence model where the baseline hazard Theta_ii of each event is set to its empirical odds and the hazard ratios (off-diagonal entries) are set to exactly 1.
- Returns:
The optimizer instance.
- Return type:
_Optimizer
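The independence model used when init is None can be written out for a small data set. This is a sketch of the idea only, assuming a square matrix with one row and column per event and events that are neither always nor never observed. The function independence_log_theta is a hypothetical helper, and its nested-list result would still need to be converted to a numpy array before being passed to set_init_theta.

```python
import math


def independence_log_theta(data):
    """Log-theta of an independence model for a binary samples-by-events table."""
    n_samples = len(data)
    n_events = len(data[0])
    # Off-diagonal hazard ratios are exactly 1, which is 0 in logarithmic form.
    log_theta = [[0.0] * n_events for _ in range(n_events)]
    for i in range(n_events):
        freq = sum(row[i] for row in data) / n_samples
        # Baseline hazard = empirical odds of the event, stored as its log.
        log_theta[i][i] = math.log(freq / (1 - freq))
    return log_theta


# Four samples, two events: event 0 occurs in 3/4 of samples, event 1 in 1/4.
data = [[1, 0], [1, 1], [0, 0], [1, 0]]
init = independence_log_theta(data)
```

Here the diagonal holds log(3) and log(1/3), the log-odds of the two events, while all interaction terms start at zero.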
- set_penalty(penalty: Penalty | tuple[Callable[[ndarray], float], Callable[[ndarray], ndarray]]) _Optimizer#
Sets the penalty type for training.
- You have four options:
- Penalty.L1: (default) uses the L1 penalty as regularization
- Penalty.L2: uses the L2 penalty as regularization
- Penalty.SYM_SPARSE: uses a penalty which induces sparsity and soft symmetry
- (penalty, penalty_deriv): a custom penalty function and its derivative. They must both take the log_theta matrix in the form of a numpy array as input and return a float and a numpy array, respectively.
The Penalty enum is part of this optimizer class.
- Parameters:
penalty (Penalty | tuple[Callable[[np.ndarray], float], Callable[[np.ndarray], np.ndarray]]) – The penalty to use (Penalty.L1, Penalty.L2, Penalty.SYM_SPARSE, or a tuple of a custom penalty function and its derivative).
- Returns:
The optimizer instance.
- Return type:
_Optimizer
- Raises:
ValueError – If the given penalty is not an instance of Penalty or a tuple of two functions.
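A custom penalty is given as a pair of functions with the documented signatures. The sketch below defines a simple squared penalty over all entries purely as an illustration. It is not one of the built-in penalties, the function names are made up, and the assumption that the derivative has the same shape as log_theta is not stated in this reference.

```python
import numpy as np


def squared_penalty(log_theta: np.ndarray) -> float:
    """Half the sum of squared entries of log_theta."""
    return float(0.5 * np.sum(log_theta ** 2))


def squared_penalty_deriv(log_theta: np.ndarray) -> np.ndarray:
    """Entry-wise derivative of squared_penalty, which is log_theta itself."""
    return log_theta.copy()


# The pair in the order (penalty, penalty_deriv) described above.
custom_penalty = (squared_penalty, squared_penalty_deriv)

log_theta = np.array([[1.0, -2.0], [0.5, 0.0]])
value = squared_penalty(log_theta)
gradient = squared_penalty_deriv(log_theta)

# With an optimizer instance at hand, the tuple would be passed as:
# optimizer.set_penalty(custom_penalty)
```

Keeping the function and its derivative side by side makes it easier to verify that they agree, for example with a finite-difference check.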
- train(lam: float | None = None, maxit: int = 5000, trace: bool = False, reltol: float = 1e-07, round_result: bool = True) cMHN#
Trains a new MHN model using the loaded data.
- Parameters:
lam (float, optional) – Regularization parameter. Defaults to 1/(number of samples).
maxit (int) – Maximum number of training iterations. Defaults to 5000.
trace (bool) – Whether to print convergence messages. Defaults to False.
reltol (float) – Gradient norm tolerance for termination (see “gtol” in scipy.optimize.minimize). Defaults to 1e-7.
round_result (bool) – Whether to round the result to two decimal places. Defaults to True.
- Returns:
The trained MHN model.
- Return type:
cMHN
- Raises:
ValueError – If no data has been loaded.
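Putting the methods of this class together, a training run might look like the following sketch. It uses only calls documented on this page. The function name fit_cmhn, the seed value and the chosen arguments are assumptions made for illustration, and the function needs the mhn package and a suitable CSV file to actually run.

```python
def fit_cmhn(csv_path, seed=0):
    """Load data, pick lambda by cross-validation and train a cMHN."""
    # Imported here so that merely defining the function does not require
    # numpy or mhn to be installed.
    import numpy as np
    from mhn.optimizers import cMHNOptimizer

    # Fix the seed so that the cross-validation result is reproducible.
    np.random.seed(seed)

    optimizer = cMHNOptimizer()
    optimizer.load_data_from_csv(csv_path)

    # One-standard-error rule (pick_1se=True) is the documented default.
    lam = optimizer.lambda_from_cv(nfolds=5, steps=9)

    # train() raises ValueError if no data has been loaded beforehand.
    return optimizer.train(lam=lam)
```

Calling train() without lam would fall back to the documented default of 1/(number of samples) instead of the cross-validated value.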
- property training_data: ndarray#
Returns the data used to train the cMHN model.
- Returns:
The data matrix used for training.
- Return type:
np.ndarray
- class mhn.optimizers.oMHNOptimizer#
Optimizer for the oMHN model.
- get_data_properties() dict#
Retrieves properties of the loaded training data.
- Returns:
A dictionary with information about the training data, including sample and event statistics.
- Return type:
dict
- lambda_from_cv(lambda_min: float | None = None, lambda_max: float | None = None, steps: int = 9, nfolds: int = 5, lambda_vector: ndarray | None = None, show_progressbar: bool = False, return_lambda_scores: bool = False, pick_1se: bool = True) float | tuple[float, DataFrame]#
Finds the best value for lambda according to either the maximal average test set likelihood or the “one-standard-error-rule” through n-fold cross-validation.
You can specify the lambda values that should be tested in cross-validation by setting the lambda_vector parameter accordingly.
Alternatively, you can specify the minimum, the maximum and the number of steps for potential lambda values. This method will then create a range of possible lambdas with logarithmic grid spacing, e.g. (0.0001, 0.0010, 0.0100, 0.1000) for lambda_min=0.0001, lambda_max=0.1 and steps=4.
If you set neither lambda_vector nor lambda_min and lambda_max, the default range (0.1/#datasamples, 10/#datasamples) will be used.
By default, the largest lambda that performed within one standard error of the best-performing lambda is returned as the preferred lambda (“one-standard-error-rule”). When setting pick_1se=False, the function will simply return the best-performing lambda instead.
Use np.random.seed() to make results reproducible.
- Parameters:
lambda_min (float, optional) – Minimum lambda value to test. Will be ignored if lambda_vector is set.
lambda_max (float, optional) – Maximum lambda value to test. Will be ignored if lambda_vector is set.
steps (int, optional) – Number of lambda values on the logarithmic grid from lambda_min to lambda_max, both endpoints included. Defaults to 9. Will be ignored if lambda_vector is set.
nfolds (int, optional) – Number of folds for cross-validation. Defaults to 5.
lambda_vector (np.ndarray, optional) – Specific lambda values to test.
show_progressbar (bool, optional) – Whether to show a progress bar during cross-validation. Defaults to False.
return_lambda_scores (bool, optional) – Whether to return lambda scores along with the best lambda. Defaults to False.
pick_1se (bool, optional) – If True (default), applies the one-standard-error-rule to pick the returned lambda value. If False, returns the best-performing lambda.
- Returns:
Best lambda value, or, if return_lambda_scores is set to True, a tuple with the best lambda and a DataFrame containing the mean scores for each lambda.
- Return type:
float | tuple[float, pd.DataFrame]
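The "one-standard-error-rule" described for lambda_from_cv can be sketched on a table of cross-validation results. The scores and standard errors below are invented numbers, pick_lambda is a hypothetical helper, and the use of the best lambda's own standard error as the margin follows the common form of the rule rather than anything stated on this page.

```python
def pick_lambda(lambdas, mean_scores, std_errors, pick_1se=True):
    """Select a lambda from per-lambda mean test scores (higher is better)."""
    best = max(range(len(lambdas)), key=lambda i: mean_scores[i])
    if not pick_1se:
        # Plain choice: the lambda with the highest mean score.
        return lambdas[best]
    # One-standard-error rule: accept every lambda whose mean score is within
    # one standard error of the best, then take the largest of them.
    threshold = mean_scores[best] - std_errors[best]
    candidates = [lam for lam, score in zip(lambdas, mean_scores) if score >= threshold]
    return max(candidates)


lambdas = [0.001, 0.01, 0.1]
mean_scores = [-5.00, -5.02, -5.30]   # invented average test log-likelihoods
std_errors = [0.05, 0.05, 0.05]       # invented standard errors

chosen_1se = pick_lambda(lambdas, mean_scores, std_errors)
chosen_best = pick_lambda(lambdas, mean_scores, std_errors, pick_1se=False)
```

In this toy table the rule prefers 0.01 over the best-scoring 0.001, trading a negligible loss in score for stronger regularization.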
- load_data_from_csv(src: str, delimiter: str = ',', **kwargs)#
Load mutation data from a CSV file. The rows have to represent samples and the columns represent genes. Mutations of genes are represented by 1s, intact genes are represented by 0s.
- Parameters:
src (str) – Path to the CSV file.
delimiter (str, optional) – Delimiter used in the CSV file (default is ‘,’).
kwargs – Additional keyword arguments passed to pandas’ read_csv() function.
- Returns:
This optimizer object.
- Return type:
- load_data_matrix(data_matrix: ndarray | DataFrame)#
Loads mutation data stored in a numpy array or pandas DataFrame.
- Parameters:
data_matrix (np.ndarray | pd.DataFrame) – Data matrix where rows represent samples and columns represent genes. Mutations of genes are represented by 1s, intact genes by 0s.
- Returns:
This optimizer object.
- Return type:
- property penalty#
- property result: oMHN#
Property for retrieving the training result.
- Returns:
The resulting MHN model after training.
- Return type:
oMHN
- save_progress(steps: int = -1, always_new_file: bool = False, filename: str = 'theta_backup.npy') _Optimizer#
Configures periodic saving of training progress.
- Parameters:
steps (int) – Number of training iterations between progress saves. Defaults to -1 (disabled).
always_new_file (bool) – Whether to save each backup as a new file. Defaults to False.
filename (str) – Name of the backup file. Defaults to ‘theta_backup.npy’.
- Returns:
The optimizer instance.
- Return type:
_Optimizer
- set_callback_func(callback=None) _Optimizer#
Sets a callback function to be invoked after each iteration of the BFGS algorithm.
- Parameters:
callback (Callable) – A function that takes a single argument (theta matrix computed in the last iteration). Defaults to None.
- Raises:
ValueError – If the provided callback is not callable.
- Returns:
The optimizer instance.
- Return type:
_Optimizer
- set_device(device: Device)#
Sets the computational device for training.
- You have three options:
- Device.AUTO: (default) automatically select the device that is likely to match the data
- Device.CPU: use the CPU implementations to compute the scores and gradients
- Device.GPU: use the GPU/CUDA implementations to compute the scores and gradients
The Device enum is part of this optimizer class.
- Parameters:
device (Device) – The device to use (AUTO, CPU, or GPU).
- Returns:
The optimizer instance.
- Return type:
_Optimizer
- Raises:
ValueError – If the given device is not an instance of Device.
- set_init_theta(init: ndarray | None) _Optimizer#
Sets the initial theta matrix for learning a new MHN.
- Parameters:
init (np.ndarray | None) – Initial theta matrix in logarithmic form. If None, uses an independence model where the baseline hazard Theta_ii of each event is set to its empirical odds and the hazard ratios (off-diagonal entries) are set to exactly 1.
- Returns:
The optimizer instance.
- Return type:
_Optimizer
- set_penalty(penalty: Penalty | tuple[Callable[[ndarray], float], Callable[[ndarray], ndarray]])#
Sets the penalty type for training.
- You have four options:
- Penalty.L1: (default) uses the L1 penalty as regularization
- Penalty.L2: uses the L2 penalty as regularization
- Penalty.SYM_SPARSE: uses a penalty which induces sparsity and soft symmetry
- (penalty, penalty_deriv): a custom penalty function and its derivative. They must both take the log_theta matrix in the form of a numpy array as input and return a float and a numpy array, respectively.
The Penalty enum is part of this optimizer class.
- Parameters:
penalty (Penalty | tuple[Callable[[np.ndarray], float], Callable[[np.ndarray], np.ndarray]]) – The penalty to use (Penalty.L1, Penalty.L2, Penalty.SYM_SPARSE, or a tuple of a custom penalty function and its derivative).
- Returns:
The optimizer instance.
- Return type:
_Optimizer
- Raises:
ValueError – If the given penalty is not an instance of Penalty or a tuple of two functions.
- train(lam: float | None = None, maxit: int = 5000, trace: bool = False, reltol: float = 1e-07, round_result: bool = True) oMHN#
Trains a new oMHN model using the loaded data.
- Parameters:
lam (float, optional) – Regularization parameter. Defaults to 1/(number of samples).
maxit (int) – Maximum number of training iterations. Defaults to 5000.
trace (bool) – Whether to print convergence messages. Defaults to False.
reltol (float) – Gradient norm tolerance for termination (see “gtol” in scipy.optimize.minimize). Defaults to 1e-7.
round_result (bool) – Whether to round the result to two decimal places. Defaults to True.
- Returns:
The trained MHN model.
- Return type:
oMHN
- Raises:
ValueError – If no data has been loaded.
- property training_data: ndarray#
Returns the data used to train the oMHN model.
- Returns:
The data matrix used for training.
- Return type:
np.ndarray
- class mhn.optimizers.Device(value)#
Enum of device types.
- AUTO#
Automatically selects the device based on the number of active events in each sample.
- Type:
int
- CPU#
Executes all computations on the CPU.
- Type:
int
- GPU#
Executes score and gradient computations on the GPU.
- Type:
int
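The documented check in set_device, which raises ValueError for anything that is not a Device member, can be pictured with a stand-in enum. DeviceStandIn and validate_device are hypothetical names, and the integer values are placeholders, since the actual values of the library's enum are not listed here.

```python
from enum import Enum


class DeviceStandIn(Enum):
    """Stand-in with the three documented members; values are placeholders."""

    AUTO = 0
    CPU = 1
    GPU = 2


def validate_device(device):
    """Accept only enum members, mirroring the documented ValueError."""
    if not isinstance(device, DeviceStandIn):
        raise ValueError("device has to be an instance of the Device enum")
    return device


selected = validate_device(DeviceStandIn.CPU)
```

Passing a plain string such as "CPU" would be rejected by a check of this kind, so the enum member itself has to be used.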