model: A submodule containing the MHN classes
This submodule contains classes to represent Mutual Hazard Networks.
- mhn.model.bits_fixed_n(n: int, k: int) Iterator[int]#
Generator over integers whose binary representation has a fixed number of 1s, in lexicographical order.
From https://graphics.stanford.edu/~seander/bithacks.html#NextBitPermutation
- Parameters:
n (int) – How many 1s there should be.
k (int) – How many bits the integer should have.
- Yields:
Iterator[int] – Integers with the specified binary properties.
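The next-bit-permutation trick from the page linked above can be sketched in plain Python as follows. This is an illustrative re-implementation under assumptions, not the library's code: the function name next_bit_permutations and the parameter names ones and width are hypothetical stand-ins for n (number of 1s) and k (number of bits).

```python
from typing import Iterator


def next_bit_permutations(ones: int, width: int) -> Iterator[int]:
    """Yield every `width`-bit integer with exactly `ones` set bits, ascending.

    Hypothetical helper; `ones` and `width` mirror the documented n and k.
    """
    if ones < 0 or ones > width:
        return
    if ones == 0:
        yield 0
        return
    v = (1 << ones) - 1   # smallest candidate: the lowest `ones` bits are set
    limit = 1 << width    # first value that no longer fits into `width` bits
    while v < limit:
        yield v
        # Next lexicographic permutation of the bits of v:
        t = (v | (v - 1)) + 1  # fill the zeros below the lowest 1, then carry
        v = t | ((((t & -t) // (v & -v)) >> 1) - 1)  # re-pack leftover 1s at the bottom


if __name__ == "__main__":
    print([format(x, "04b") for x in next_bit_permutations(2, 4)])
```

For two set bits in four positions the generator visits 3, 5, 6, 9, 10 and 12, i.e. all six combinations in increasing numerical order.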
- class mhn.model.cMHN(log_theta: array, events: list[str] | None = None, meta: dict | None = None)#
Represents a classical Mutual Hazard Network (cMHN) (see Schill et al. (2019)).
- log_theta#
Logarithmic values of the theta matrix representing the cMHN.
- Type:
np.ndarray
- events#
Names of the events considered by the cMHN.
- Type:
list[str] | None
- meta#
Metadata for the cMHN, e.g., parameters used to train the model.
- Type:
dict | None
- compute_marginal_likelihood(state: ndarray) float#
Computes the likelihood of observing a given state. We consider the observation time to be an exponential random variable with mean 1.
- Parameters:
state (np.ndarray) – Binary array (dtype=np.int32) representing the presence (1) or absence (0) of events.
- Returns:
Likelihood of the given state.
- Return type:
float
- Raises:
ValueError – If the given state array contains anything but 0s and 1s.
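To make the quantity concrete, the following brute-force sketch computes the same kind of likelihood for a tiny model by building the full transition rate matrix Q over all 2^n states and solving (I - Q) p = p0, which is the state distribution at an exponentially distributed observation time with mean 1. It assumes the usual MHN convention that log_theta[i, i] is the log base rate of event i and log_theta[i, j] is the log effect of event j on event i. The helper names build_q and marginal_likelihood are hypothetical, and the approach is only feasible for a handful of events; it is not how the library computes the value.

```python
import numpy as np


def build_q(log_theta: np.ndarray) -> np.ndarray:
    """Dense rate matrix over all 2^n states (bit i of a state index = event i)."""
    n = log_theta.shape[0]
    size = 1 << n
    q = np.zeros((size, size))
    for x in range(size):
        present = [j for j in range(n) if (x >> j) & 1]
        for i in range(n):
            if (x >> i) & 1:
                continue  # event i is already present in state x
            # Rate of event i in state x: base rate times effects of present events.
            log_rate = log_theta[i, i] + sum(log_theta[i, j] for j in present)
            rate = np.exp(log_rate)
            q[x | (1 << i), x] = rate  # column = source state, row = target state
            q[x, x] -= rate            # diagonal holds minus the total outflow
    return q


def marginal_likelihood(log_theta: np.ndarray, state: np.ndarray) -> float:
    """Probability of `state` at an Exp(1)-distributed observation time."""
    if not np.all((state == 0) | (state == 1)):
        raise ValueError("state must contain only 0s and 1s")
    n = log_theta.shape[0]
    p0 = np.zeros(1 << n)
    p0[0] = 1.0  # start with no events present
    p = np.linalg.solve(np.eye(1 << n) - build_q(log_theta), p0)
    index = sum(int(bit) << i for i, bit in enumerate(state))
    return float(p[index])


if __name__ == "__main__":
    # Two independent events with base rate 1 (all log values are 0).
    print(marginal_likelihood(np.zeros((2, 2)), np.array([1, 1], dtype=np.int32)))
```

With two independent events of rate 1, the state with both events present is observed exactly when the observation is the last of three identically distributed exponential clocks, so the sketch returns 1/3 up to floating-point error.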
- compute_next_event_probs(state: ndarray, as_dataframe: bool = False, allow_observation: bool = False) ndarray | DataFrame#
Computes probabilities for each event to be the next to occur.
- Parameters:
state (np.ndarray) – Binary array (dtype=np.int32) representing the presence (1) or absence (0) of events.
as_dataframe (bool, optional) – Whether to return the probabilities as a DataFrame. Defaults to False.
allow_observation (bool, optional) – Whether to include an observation event in the probabilities. Defaults to False.
- Returns:
Probabilities for the next event, in the specified format.
- Return type:
np.ndarray | pd.DataFrame
- Raises:
ValueError – If the number of events in state does not align with the number of events modeled by this cMHN object.
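A minimal sketch of the underlying calculation, assuming competing exponential clocks: each absent event fires at a rate given by its base rate times the effects of the events already present, and the next event is chosen proportionally to these rates. The sketch further assumes that the observation of a classical MHN has rate 1 and that, when it is allowed, it only enlarges the normalizing constant. How the library reports the observation probability is not stated here, so the function name next_event_probs and that detail are assumptions.

```python
import numpy as np


def next_event_probs(log_theta: np.ndarray, state: np.ndarray,
                     allow_observation: bool = False) -> np.ndarray:
    """Probability of each event being the next one to occur (hypothetical helper)."""
    n = log_theta.shape[0]
    if state.shape[0] != n:
        raise ValueError("state length does not match the number of events")
    present = state.astype(bool)
    base = np.diag(log_theta)
    effects = log_theta - np.diag(base)  # off-diagonal part only
    # Log rate of event i: base rate plus log effects of all present events j.
    log_rates = base + effects[:, present].sum(axis=1)
    rates = np.where(present, 0.0, np.exp(log_rates))  # present events cannot recur
    total = rates.sum() + (1.0 if allow_observation else 0.0)  # assumed observation rate 1
    if total == 0.0:
        return rates  # every event is present and observation is excluded
    return rates / total


if __name__ == "__main__":
    state = np.array([1, 0, 0], dtype=np.int32)
    print(next_event_probs(np.zeros((3, 3)), state))
    print(next_event_probs(np.zeros((3, 3)), state, allow_observation=True))
```

In this sketch the entries sum to 1 without the observation event and to less than 1 with it; the missing mass is the probability that the observation comes first.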
- get_restr_diag(state: array) array#
Get the diagonal of the state-space-restricted Q_Theta matrix.
- Parameters:
state (np.array) – State (binary, dtype int32) which should be considered for the state space restriction. Shape (n,) with n the number of total events.
- Returns:
Diagonal of the state-space-restricted Q_Theta matrix. Shape (2^k,) with k the number of 1s in state
- Return type:
np.array
- likeliest_order(state: array, normalize: bool = False) tuple[float, array]#
Returns the likeliest order in which a given state accumulated according to the MHN.
- Parameters:
state (np.array) – State (binary, dtype int32), shape (n,) with n the number of total events.
normalize (bool, optional) – Whether to normalize among all possible accumulation orders. Defaults to False.
- Returns:
Likelihood of the likeliest accumulation order and the order itself.
- Return type:
tuple[float, np.ndarray]
- classmethod load(filename: str, events: list[str] | None = None) cMHN#
Loads a cMHN object from a CSV file.
- Parameters:
filename (str) – Name of the CSV file.
events (list[str], optional) – List of event names considered by the cMHN. Defaults to None.
- Returns:
Loaded cMHN object.
- Return type:
cMHN
- m_likeliest_orders(state: array, m: int, normalize: bool = False) tuple[array, array]#
Returns the m likeliest orders in which a given state accumulated according to the MHN.
- Parameters:
state (np.array) – State (binary, dtype int32), shape (n,) with n the number of total events.
m (int) – Number of likeliest orders to compute.
normalize (bool, optional) – Whether to normalize among all possible accumulation orders. Defaults to False.
- Returns:
Array of the likelihoods of the m likeliest accumulation orders and array of the orders themselves.
- Return type:
tuple[np.ndarray, np.ndarray]
- order_likelihood(sigma: tuple[int]) float#
Computes the marginal likelihood of an order of events.
- Parameters:
sigma (tuple[int]) – Tuple of integers where the integers represent the events.
- Returns:
Marginal likelihood of observing sigma.
- Return type:
float
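One way to read the order-related methods is as a product of jump probabilities. The sketch below assumes that the likelihood of an order is the probability that the events occur in exactly this sequence and that the observation (assumed rate 1) then happens before any further event. Under that assumption, the likeliest order of a state can be found by brute force over all permutations of its events. The function names are hypothetical and the enumeration is only practical for states with few events; the library may use a different algorithm.

```python
from itertools import permutations

import numpy as np


def event_rate(log_theta: np.ndarray, event: int, present: list) -> float:
    """Rate of `event` given the events already present."""
    return float(np.exp(log_theta[event, event]
                        + sum(log_theta[event, j] for j in present)))


def order_likelihood(log_theta: np.ndarray, sigma: tuple) -> float:
    """Probability of accumulating the events in `sigma` in order, then being observed."""
    n = log_theta.shape[0]
    present = []
    likelihood = 1.0
    for event in sigma:
        rates = {i: event_rate(log_theta, i, present)
                 for i in range(n) if i not in present}
        # `event` must win the race against all other events and the observation.
        likelihood *= rates[event] / (1.0 + sum(rates.values()))
        present.append(event)
    remaining = sum(event_rate(log_theta, i, present)
                    for i in range(n) if i not in present)
    return likelihood / (1.0 + remaining)  # observation beats every remaining event


def likeliest_order(log_theta: np.ndarray, state: np.ndarray) -> tuple:
    """Brute-force search over all orders of the events present in `state`."""
    events = [i for i, bit in enumerate(state) if bit]
    return max(((order_likelihood(log_theta, s), s) for s in permutations(events)),
               key=lambda pair: pair[0])


if __name__ == "__main__":
    # Event 1 increases the rate of event 0 fourfold; everything else is neutral.
    log_theta = np.array([[0.0, np.log(4.0)], [0.0, 0.0]])
    print(likeliest_order(log_theta, np.array([1, 1], dtype=np.int32)))
```

In the example, the order (1, 0) has likelihood 1/3 * 4/5 = 4/15, while (0, 1) has 1/3 * 1/2 = 1/6, so (1, 0) is returned. Summing the likelihoods over all orders of a state gives the marginal likelihood of that state under the same assumptions.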
- plot(cmap_thetas: str | matplotlib.colors.Colormap = 'RdBu_r', cmap_brs: str | matplotlib.colors.Colormap = 'Greens', colorbar: bool = True, annot: float | bool = 0.1, ax: np.arraymatplotlib.axes.Axes | None = None, logarithmic: bool = True) tuple[matplotlib.image.AxesImage, matplotlib.image.AxesImage, matplotlib.colorbar.Colorbar, matplotlib.colorbar.Colorbar] | tuple[matplotlib.image.AxesImage, matplotlib.image.AxesImage]#
Plots the theta matrix.
- Parameters:
cmap_thetas (Union[str, matplotlib.colors.Colormap], optional) – Colormap to use for thetas. Defaults to “RdBu_r”.
cmap_brs (Union[str, matplotlib.colors.Colormap], optional) – Colormap to use for the base rates. Defaults to “Greens”.
colorbar (bool, optional) – Whether to display the colorbars. Defaults to True.
annot (Union[float, bool], optional) – If boolean, either all or no annotations are displayed. If numerical, displays annotations for all effects greater than this threshold in the logarithmic theta matrix. Defaults to 0.1.
ax (Optional[matplotlib.axes.Axes], optional) – Matplotlib axes to plot on. Defaults to None.
logarithmic (bool, optional) – If set to True, plots the logarithmic theta matrix, else plots the exponential theta matrix. Defaults to True.
- Returns:
If colorbar is True, returns the two heatmaps and the two colorbars. Else, returns only the two axes images.
- Return type:
tuple[matplotlib.image.AxesImage, matplotlib.image.AxesImage, matplotlib.colorbar.Colorbar, matplotlib.colorbar.Colorbar] | tuple[matplotlib.image.AxesImage, matplotlib.image.AxesImage]
- plot_order_tree(orderings: list[tuple[int]] | None = None, states: array | None = None, max_event_num: int = 4, min_line_width: int = 1, max_line_width: int = 10, ax: Axes | None = None, inner_circle_radius: float = 2.0, circle_radius_diff: float = 1.0, markers: tuple[str] = ('o', 's', 'D', '^', 'p', 'P', '>'), min_symbol_size: float = 30.0, min_number_of_occurrence: int = 3, legend: bool | dict | None = None) Axes#
Plots a tree representing the most probable chronological orders of events according to this MHN. Each path from the root of the tree (white circle) to a leaf illustrates a possible cancer progression within the given dataset. The symbols along each path denote events whose most probable chronological order was derived from this MHN model. Each ordering / state corresponds to a terminal node in the tree or an internal node with a black outline. The size of the edges and symbols along a path scales with the total number of patients with that cancer state.
- Parameters:
orderings (list[tuple[int]], optional) – A list where each element represents an ordering of events that should be added to the tree. The elements of each ordering should be the indices of the events in this object's events list. If None is given, states is used instead.
states (np.ndarray, optional) – An array of states (binary, dtype int32, each of shape (n,) with n the total number of events) to compute the orders from. If None, orderings must be provided.
max_event_num (int) – Maximum number of events of a single state that should be plotted in the tree. If a state has more than this number of active events, only the first active events up to this number are plotted.
min_line_width (int) – Minimum line width of the lines connecting the events in the tree.
max_line_width (int) – Maximum line width of the lines connecting the events in the tree.
ax (matplotlib.axes.Axes, optional) – Axis on which the tree is plotted. If None, a new axis is created.
inner_circle_radius (float) – Distance between the tree root and the first event in the tree.
circle_radius_diff (float) – Difference in radius between the circles on which consecutive events lie.
markers (tuple[str]) – A tuple of markers to use for plotting the events. Defaults to (“o”, “s”, “D”, “^”, “p”, “P”, “>”).
min_symbol_size (float) – Minimum size of the markers representing events in the tree.
min_number_of_occurrence (int) – Minimum number of occurrences of a state / ordering for it to be plotted in the tree. Used to avoid clutter.
legend (Optional[Union[bool, dict]]) – If True, a legend is added to the plot. If a dictionary is provided, it is passed to the legend function.
- Returns:
The axis with the plotted tree.
- Return type:
matplotlib.axes.Axes
- plot_orders(*, orders: np.array | None = None, states: np.array | None = None, ax: np.arraymatplotlib.axes.Axes | None = None, cmap: str | matplotlib.colors.Colormap = 'hsv', markers: list[str] = ['o', 's', 'D', '^', 'p', 'P'], names: list[str] | None = None) matplotlib.image.AxesImage#
Plots a given order of events or, if states are provided, plots the most likely order in which the state accumulated its events.
- Parameters:
orders (np.ndarray, optional) – An array of orders to plot. If None, states must be provided to compute the orders. Defaults to None.
states (np.ndarray, optional) – An array of states to compute the orders from. If None, orders must be provided. Defaults to None.
ax (np.array[matplotlib.axes.Axes], optional) – An array of matplotlib axes to plot on. Should have shape (i + 1,) where i is the number of orders to plot. If None, new axes will be created. Defaults to None.
cmap (str | matplotlib.colors.Colormap, optional) – The colormap to use for plotting the events. Defaults to “hsv”.
markers (list[str], optional) – A list of markers to use for plotting the events. Defaults to [“o”, “s”, “D”, “^”, “p”, “P”].
names (list[str], optional) – An optional list of names for orders to be plotted as titles. Defaults to None.
- Returns:
The axes with the plotted orders.
- Return type:
matplotlib.image.AxesImage
- Raises:
ValueError – If neither orders nor states are provided, or if both are provided.
- sample_artificial_data(sample_num: int, as_dataframe: bool = False) ndarray | DataFrame#
Samples artificial data from the cMHN. Use np.random.seed() to make results reproducible.
- Parameters:
sample_num (int) – Number of samples to generate.
as_dataframe (bool, optional) – Whether to return the data as a pandas DataFrame. Defaults to False.
- Returns:
Samples as rows and events as columns, in the specified format.
- Return type:
np.ndarray | pd.DataFrame
- sample_trajectories(trajectory_num: int, initial_state: ndarray | list[str] | None = None, output_event_names: bool = False, timed: float | Literal[False] = False, return_event_times: bool = False) tuple[list[list[int | str]], list[list[float]] | ndarray | list[list[int | str]], list[list[float]], ndarray]#
Simulates event accumulation using the Gillespie algorithm. Use np.random.seed() to make results reproducible.
- Parameters:
trajectory_num (int) – Number of trajectories to simulate.
initial_state (np.ndarray | list[str], optional) – Starting state for the trajectories. Can be either a numpy array containing 0s and 1s, where each entry represents an event being present (1) or not (0), or a list of strings, where each string is the name of an event. The latter can only be used if events were specified during creation of the cMHN object. If None, a vector of zeros is used as initial_state. Defaults to None.
output_event_names (bool, optional) – Whether to return event names instead of indices. Defaults to False.
timed (float, optional) – If a float is given, only sample trajectories until this (abstract, unitless) timepoint. May also be np.inf. If False, sample trajectories until the observation event; in this case the observation times are returned as the last element of the tuple. Defaults to False.
return_event_times (bool, optional) – If True, returns the accumulation times of all events in the trajectories as the second element of the tuple. Accumulation times of events already present in initial_state are given as None. Defaults to False.
- Returns:
- A tuple with 1-3 elements containing:
List of all trajectories,
if return_event_times is True, a list of lists of accumulation times for all trajectories’ events,
if timed is False, a numpy array of all simulated samples’ observation times.
- Return type:
tuple[list[list[int | str]] | list[list[int | str]], (list[list[float]] | np.ndarray) | list[list[int | str]], list[list[float]], np.ndarray]
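The Gillespie algorithm mentioned above can be sketched for a single trajectory as follows. The sketch covers only the default case (empty initial state, sampling until the observation event) and assumes an observation rate of 1, as for a classical MHN. It uses a NumPy Generator for reproducibility instead of np.random.seed(); the function name sample_trajectory and its return layout are hypothetical and do not mirror the library's signature.

```python
import numpy as np


def sample_trajectory(log_theta: np.ndarray, rng: np.random.Generator):
    """Simulate one trajectory until observation; returns (events, event times, observation time)."""
    n = log_theta.shape[0]
    base = np.diag(log_theta)
    effects = log_theta - np.diag(base)  # off-diagonal log effects
    present = np.zeros(n, dtype=bool)    # start with no events present
    trajectory, event_times = [], []
    t = 0.0
    while True:
        # Rates of all absent events in the current state.
        log_rates = base + effects[:, present].sum(axis=1)
        rates = np.where(present, 0.0, np.exp(log_rates))
        total = rates.sum() + 1.0        # + 1.0: observation event (assumed rate 1)
        t += rng.exponential(1.0 / total)  # waiting time until the next jump
        # Pick which clock fired; index n stands for the observation.
        choice = rng.choice(n + 1, p=np.append(rates, 1.0) / total)
        if choice == n:
            return trajectory, event_times, t
        present[choice] = True
        trajectory.append(int(choice))
        event_times.append(t)


if __name__ == "__main__":
    rng = np.random.default_rng(42)  # fixed seed for reproducible output
    print(sample_trajectory(np.zeros((3, 3)), rng))
```

Each loop iteration draws an exponential waiting time with the total exit rate of the current state and then selects the event in proportion to its rate, which is the standard Gillespie scheme for a continuous-time Markov chain.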
- save(filename: str)#
Saves the cMHN to a CSV file. Metadata is stored in a separate JSON file if provided.
- Parameters:
filename (str) – Name of the CSV file. JSON metadata file is named accordingly.
- class mhn.model.oMHN(log_theta: array, events: list[str] | None = None, meta: dict | None = None)#
Represents a Mutual Hazard Network that additionally models the observation event (oMHN) (see Schill et al. (2024)).
- log_theta#
Logarithmic values of the theta matrix representing the oMHN.
- Type:
np.ndarray
- events#
Names of the events considered by the oMHN.
- Type:
list[str] | None
- meta#
Metadata for the oMHN, e.g., parameters used to train the model.
- Type:
dict | None
- compute_marginal_likelihood(state: ndarray) float#
Computes the likelihood of observing a given state. We consider the observation time to be an exponential random variable with mean 1.
- Parameters:
state (np.ndarray) – Binary array (dtype=np.int32) representing the presence (1) or absence (0) of events.
- Returns:
Likelihood of the given state.
- Return type:
float
- Raises:
ValueError – If the given state array contains anything but 0s and 1s.
- compute_next_event_probs(state: ndarray, as_dataframe: bool = False, allow_observation: bool = False) ndarray | DataFrame#
Computes probabilities for each event to be the next to occur.
- Parameters:
state (np.ndarray) – Binary array (dtype=np.int32) representing the presence (1) or absence (0) of events.
as_dataframe (bool, optional) – Whether to return the probabilities as a DataFrame. Defaults to False.
allow_observation (bool, optional) – Whether to include an observation event in the probabilities. Defaults to False.
- Returns:
Probabilities for the next event, in the specified format.
- Return type:
np.ndarray | pd.DataFrame
- Raises:
ValueError – If the number of events in state does not align with the number of events modeled by this cMHN object.
- get_equivalent_classical_mhn() cMHN#
Converts this oMHN into an equivalent classical cMHN object representing the same distribution.
- Returns:
Equivalent cMHN object.
- Return type:
cMHN
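The idea behind such a conversion can be sketched with NumPy under explicit assumptions that are not stated on this page: the oMHN log_theta is taken to have shape (n + 1, n), with the first n rows describing the events and the last row holding the log effects of each event on the observation rate, whose base rate is taken to be 1. Dividing every event rate by the state-dependent observation rate leaves the probability of each next step unchanged, which amounts to subtracting the last row from every off-diagonal entry of the matching column. The function name below is hypothetical and the layout should be checked against the library before relying on it.

```python
import numpy as np


def equivalent_classical_log_theta(omhn_log_theta: np.ndarray) -> np.ndarray:
    """Sketch: fold observation effects into the event-event effects.

    Assumes shape (n + 1, n) with the observation effects in the last row.
    """
    n = omhn_log_theta.shape[1]
    event_rows = omhn_log_theta[:n]
    obs_row = omhn_log_theta[n]
    # Column j of every row loses the log effect of event j on the observation.
    classical = event_rows - obs_row
    # Base rates (the diagonal) are not affected by the rescaling.
    idx = np.arange(n)
    classical[idx, idx] = event_rows[idx, idx]
    return classical


if __name__ == "__main__":
    omhn = np.array([[0.5, 1.0],
                     [-1.0, 0.2],
                     [0.3, -0.4]])  # last row: assumed observation effects
    print(equivalent_classical_log_theta(omhn))
```

For the example matrix, the entry for the effect of event 1 on event 0 becomes 1.0 - (-0.4) = 1.4, the effect of event 0 on event 1 becomes -1.0 - 0.3 = -1.3, and the diagonal entries 0.5 and 0.2 stay as they are.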
- get_restr_diag(state: array) array#
Get the diagonal of the state-space-restricted Q_Theta matrix.
- Parameters:
state (np.array) – State (binary, dtype int32) which should be considered for the state space restriction. Shape (n,) with n the number of total events.
- Returns:
Diagonal of the state-space-restricted Q_Theta matrix. Shape (2^k,) with k the number of 1s in state
- Return type:
np.array
- likeliest_order(state: array, normalize: bool = False) tuple[float, array]#
Returns the likeliest order in which a given state accumulated according to the MHN.
- Parameters:
state (np.ndarray) – State (binary, dtype int32), shape (n,) with n the number of total events.
normalize (bool, optional) – Whether to normalize among all possible accumulation orders. Defaults to False.
- Returns:
Likelihood of the likeliest accumulation order and the order itself.
- Return type:
tuple[float, Any]
- classmethod load(filename: str, events: list[str] | None = None) cMHN#
Loads a cMHN object from a CSV file.
- Parameters:
filename (str) – Name of the CSV file.
events (list[str], optional) – List of event names considered by the cMHN. Defaults to None.
- Returns:
Loaded cMHN object.
- Return type:
cMHN
- m_likeliest_orders(state: array, m: int, normalize: bool = False) tuple[array, array]#
Returns the m likeliest orders in which a given state accumulated according to the MHN.
- Parameters:
state (np.ndarray) – State (binary, dtype int32), shape (n,) with n the number of total events.
m (int) – Number of likeliest orders to compute.
normalize (bool, optional) – Whether to normalize among all possible accumulation orders. Defaults to False.
- Returns:
Array of the likelihoods of the m likeliest accumulation orders and array of the orders themselves.
- Return type:
tuple[np.ndarray, np.ndarray]
- order_likelihood(sigma: tuple[int]) float#
Computes the marginal likelihood of an order of events.
- Parameters:
sigma (tuple[int]) – Tuple of integers where the integers represent the events.
- Returns:
Marginal likelihood of observing sigma.
- Return type:
float
- plot(cmap_thetas: str | matplotlib.colors.Colormap = 'RdBu_r', cmap_brs: str | matplotlib.colors.Colormap = 'Greens', colorbar: bool = True, annot: float | bool = 0.1, ax: np.arraymatplotlib.axes.Axes | None = None, logarithmic: bool = True) tuple[matplotlib.image.AxesImage, matplotlib.image.AxesImage, matplotlib.colorbar.Colorbar, matplotlib.colorbar.Colorbar] | tuple[matplotlib.image.AxesImage, matplotlib.image.AxesImage]#
Plots the theta matrix.
- Parameters:
cmap_thetas (Union[str, matplotlib.colors.Colormap], optional) – Colormap to use for thetas. Defaults to “RdBu_r”.
cmap_brs (Union[str, matplotlib.colors.Colormap], optional) – Colormap to use for the base rates. Defaults to “Greens”.
colorbar (bool, optional) – Whether to display the colorbars. Defaults to True.
annot (Union[float, bool], optional) – If boolean, either all or no annotations are displayed. If numerical, displays annotations for all effects greater than this threshold in the logarithmic theta matrix. Defaults to 0.1.
ax (Optional[matplotlib.axes.Axes], optional) – Matplotlib axes to plot on. Defaults to None.
logarithmic (bool, optional) – If set to True, plots the logarithmic theta matrix, else plots the exponential theta matrix. Defaults to True.
- Returns:
If colorbar is True, returns the two heatmaps and the two colorbars. Else, returns only the two axes images.
- Return type:
tuple[matplotlib.image.AxesImage, matplotlib.image.AxesImage, matplotlib.colorbar.Colorbar, matplotlib.colorbar.Colorbar] | tuple[matplotlib.image.AxesImage, matplotlib.image.AxesImage]
- plot_order_tree(orderings: list[tuple[int]] | None = None, states: array | None = None, max_event_num: int = 4, min_line_width: int = 1, max_line_width: int = 10, ax: Axes | None = None, inner_circle_radius: float = 2.0, circle_radius_diff: float = 1.0, markers: tuple[str] = ('o', 's', 'D', '^', 'p', 'P', '>'), min_symbol_size: float = 30.0, min_number_of_occurrence: int = 3, legend: bool | dict | None = None) Axes#
Plots a tree representing the most probable chronological orders of events according to this MHN. Each path from the root of the tree (white circle) to a leaf illustrates a possible cancer progression within the given dataset. The symbols along each path denote events whose most probable chronological order was derived from this MHN model. Each ordering / state corresponds to a terminal node in the tree or an internal node with a black outline. The size of the edges and symbols along a path scales with the total number of patients with that cancer state.
- Parameters:
orderings (list[tuple[int]], optional) – A list where each element represents an ordering of events that should be added to the tree. The elements of each ordering should be the indices of the events in this object's events list. If None is given, states is used instead.
states (np.ndarray, optional) – An array of states (binary, dtype int32, each of shape (n,) with n the total number of events) to compute the orders from. If None, orderings must be provided.
max_event_num (int) – Maximum number of events of a single state that should be plotted in the tree. If a state has more than this number of active events, only the first active events up to this number are plotted.
min_line_width (int) – Minimum line width of the lines connecting the events in the tree.
max_line_width (int) – Maximum line width of the lines connecting the events in the tree.
ax (matplotlib.axes.Axes, optional) – Axis on which the tree is plotted. If None, a new axis is created.
inner_circle_radius (float) – Distance between the tree root and the first event in the tree.
circle_radius_diff (float) – Difference in radius between the circles on which consecutive events lie.
markers (tuple[str]) – A tuple of markers to use for plotting the events. Defaults to (“o”, “s”, “D”, “^”, “p”, “P”, “>”).
min_symbol_size (float) – Minimum size of the markers representing events in the tree.
min_number_of_occurrence (int) – Minimum number of occurrences of a state / ordering for it to be plotted in the tree. Used to avoid clutter.
legend (Optional[Union[bool, dict]]) – If True, a legend is added to the plot. If a dictionary is provided, it is passed to the legend function.
- Returns:
The axis with the plotted tree.
- Return type:
matplotlib.axes.Axes
- plot_orders(*, orders: np.array | None = None, states: np.array | None = None, ax: np.arraymatplotlib.axes.Axes | None = None, cmap: str | matplotlib.colors.Colormap = 'hsv', markers: list[str] = ['o', 's', 'D', '^', 'p', 'P'], names: list[str] | None = None) matplotlib.image.AxesImage#
Plots a given order of events or, if states are provided, plots the most likely order in which the state accumulated its events.
- Parameters:
orders (np.ndarray, optional) – An array of orders to plot. If None, states must be provided to compute the orders. Defaults to None.
states (np.ndarray, optional) – An array of states to compute the orders from. If None, orders must be provided. Defaults to None.
ax (np.array[matplotlib.axes.Axes], optional) – An array of matplotlib axes to plot on. Should have shape (i + 1,) where i is the number of orders to plot. If None, new axes will be created. Defaults to None.
cmap (str | matplotlib.colors.Colormap, optional) – The colormap to use for plotting the events. Defaults to “hsv”.
markers (list[str], optional) – A list of markers to use for plotting the events. Defaults to [“o”, “s”, “D”, “^”, “p”, “P”].
names (list[str], optional) – An optional list of names for orders to be plotted as titles. Defaults to None.
- Returns:
The axes with the plotted orders.
- Return type:
matplotlib.image.AxesImage
- Raises:
ValueError – If neither orders nor states are provided, or if both are provided.
- sample_artificial_data(sample_num: int, as_dataframe: bool = False) ndarray | DataFrame#
Samples artificial data from the cMHN. Use np.random.seed() to make results reproducible.
- Parameters:
sample_num (int) – Number of samples to generate.
as_dataframe (bool, optional) – Whether to return the data as a pandas DataFrame. Defaults to False.
- Returns:
Samples as rows and events as columns, in the specified format.
- Return type:
np.ndarray | pd.DataFrame
- sample_trajectories(trajectory_num: int, initial_state: ndarray | list[str] | None = None, output_event_names: bool = False, timed: float | Literal[False] = False, return_event_times: bool = False) tuple[list[list[int | str]], list[list[float]] | ndarray | list[list[int | str]], list[list[float]], ndarray]#
Simulates event accumulation using the Gillespie algorithm. Use np.random.seed() to make results reproducible.
- Parameters:
trajectory_num (int) – Number of trajectories to simulate.
initial_state (np.ndarray | list[str], optional) – Starting state for the trajectories. Can be either a numpy array containing 0s and 1s, where each entry represents an event being present (1) or not (0), or a list of strings, where each string is the name of an event. The latter can only be used if events were specified during creation of the oMHN object. If None, a vector of zeros is used as initial_state. Defaults to None.
output_event_names (bool, optional) – Whether to return event names instead of indices. Defaults to False.
timed (float, optional) – If a float is given, only sample trajectories until this (abstract, unitless) timepoint. May also be np.inf. If False, sample trajectories until the observation event; in this case the observation times are returned as the last element of the tuple. Defaults to False.
return_event_times (bool, optional) – If True, returns the accumulation times of all events in the trajectories as the second element of the tuple. Accumulation times of events already present in initial_state are given as None. Defaults to False.
- Returns:
- A tuple with 1-3 elements containing:
List of all trajectories,
if return_event_times is True, a list of lists of accumulation times for all trajectories’ events,
if timed is False, a numpy array of all simulated samples’ observation times.
- Return type:
tuple[list[list[int | str]] | list[list[int | str]], (list[list[float]] | np.ndarray) | list[list[int | str]], list[list[float]], np.ndarray]
- save(filename: str)#
Saves the oMHN to a CSV file. Metadata is stored in a separate JSON file if provided.
- Parameters:
filename (str) – Name of the CSV file. JSON metadata file is named accordingly.