Data Input

src.data_input.load_attributes(csv_file: str, subgroup_information: dict, image_path_column: str = 'Path', id_column: str | None = None, missing_information: str = 'raise', info_format: str = 'categorical', rel_path: str | None = None, n_processes: int | None = None) → DataFrame

Loads image filepaths and patient attributes from the provided csv file.

Parameters:

csv_file (str) – Filepath to summary csv.
subgroup_information (dict) – Subgroup attributes in the format Group:[subgroups] (ex. {"Sex":["Male","Female"]}).
image_path_column (str) – Name of column in csv_file listing image filepaths.
id_column (str | None) – Name of column in csv_file listing unique patient/sample identifiers.
missing_information (str) – How to handle samples missing information; 'raise': raise an exception, 'remove': remove samples missing information.
info_format (str) – Format to return patient information in.
rel_path (str | None) – Declare a relative path for image file paths.
n_processes (int | None) – Number of processes to use while checking image file paths; if None, uses the number of available cores.

Returns:

DataFrame of ids, image filepaths, and subgroup information.

Return type:

pandas.DataFrame
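
The two missing_information options can be illustrated with a minimal sketch. It is not the library's implementation: filter_missing is a hypothetical helper, the csv contents are made up, and it assumes an empty cell is what counts as "missing".

```python
import csv
import io


def filter_missing(rows, required_columns, missing_information="raise"):
    """Return only the rows that have a value in every required column.

    Hypothetical helper for illustration; not part of src.data_input.
    """
    complete = []
    for row in rows:
        missing = [col for col in required_columns if not row.get(col)]
        if not missing:
            complete.append(row)
        elif missing_information == "raise":
            raise ValueError(f"Sample is missing values for: {missing}")
        elif missing_information != "remove":
            raise ValueError(f"Unknown option: {missing_information!r}")
        # With 'remove', the incomplete row is simply skipped.
    return complete


# Made-up summary csv in which the second sample lacks a "Sex" value.
summary_csv = io.StringIO(
    "Path,Sex\n"
    "images/a.png,Male\n"
    "images/b.png,\n"
    "images/c.png,Female\n"
)
rows = list(csv.DictReader(summary_csv))
kept = filter_missing(rows, ["Path", "Sex"], missing_information="remove")
```

With 'remove' the incomplete sample is dropped and the other two are kept; with 'raise' the same input stops at the first incomplete sample.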

src.data_input.load_image(image_path, mode: str = 'RGB', scale: int | tuple[int] | None = None) → array

Loads image in the specified image mode.

Parameters:

image_path – File path to image.
mode (str) – PIL.Image mode to use.
scale (int | tuple[int] | None) – Scale to resize image; if int, will resize to (scale, scale); if None, will not resize.

Returns:

Image array.

Return type:

numpy.array

Raises:

Exception – The provided scale is not in a supported format.
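
The documented handling of scale (int, tuple, or None) could look roughly like the sketch below. normalize_scale is a hypothetical name, and treating only two-element integer tuples as valid is an assumption; the actual check inside load_image may differ.

```python
def normalize_scale(scale):
    """Turn a scale argument into a (width, height) tuple, or None for no resize.

    Hypothetical helper; mirrors the behavior described for load_image.
    """
    if scale is None:
        return None  # no resizing requested
    if isinstance(scale, int):
        return (scale, scale)  # a single int means a square target size
    if (
        isinstance(scale, tuple)
        and len(scale) == 2
        and all(isinstance(side, int) for side in scale)
    ):
        return scale
    raise Exception(f"Unsupported scale format: {scale!r}")
```

The resulting tuple is in the form accepted by PIL.Image.Image.resize.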

Decision Region Generation

class src.decision_region_generation.triplet_manager.TripletManager(input_csv, classes, triplets_per_group, subgroup_attributes=None, image_rel_path=None, sample_id_column=None, mix_subgroups=False, mix_classes=False, random_seed=None)

Wrapper class to generate vicinal distributions for specified groups.

Parameters:

input_csv (str) – File path for input csv; passed to data_input.load_attributes.
classes (dict) – All potential output classes, organized by task.
triplets_per_group (int) – Number of triplets to generate for each group of samples.
subgroup_attributes (dict, optional) – All subgroup attribute options, organized by attribute; ex. {'Sex':['F','M']}.
image_rel_path (str, optional) – Relative image path; passed to data_input.load_attributes.
sample_id_column (str, optional) – ID column for input csv; passed to data_input.load_attributes.
mix_subgroups (bool, optional) – If true, will not separate groups by subgroup attributes.
mix_classes (bool, optional) – If true, will not separate groups by class.

__getitem__(key)

Returns:

The triplet, group, key, and images for the specified triplet.

Return type:

dict
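
How mix_subgroups and mix_classes change the grouping can be sketched as follows. This is an illustration only: sample_triplets and the sample dictionaries are hypothetical, and the real TripletManager may group, sample, and index triplets differently.

```python
import random
from collections import defaultdict


def sample_triplets(samples, triplets_per_group, mix_subgroups=False,
                    mix_classes=False, random_seed=None):
    """Group samples, then draw triplets of distinct sample ids per group.

    Hypothetical helper; not the TripletManager implementation.
    """
    rng = random.Random(random_seed)
    groups = defaultdict(list)
    for sample in samples:
        # A mixed dimension is collapsed to None so it no longer splits groups.
        key = (
            None if mix_subgroups else sample["subgroup"],
            None if mix_classes else sample["class"],
        )
        groups[key].append(sample["id"])

    triplets = {}
    for key, ids in groups.items():
        if len(ids) < 3:
            continue  # not enough samples to form a triplet
        triplets[key] = [
            tuple(rng.sample(ids, 3)) for _ in range(triplets_per_group)
        ]
    return triplets


# Made-up samples: 2 subgroups x 2 classes, 4 samples in each combination.
samples = [
    {
        "id": f"s{i}",
        "subgroup": "F" if i % 2 else "M",
        "class": "Positive" if i % 4 < 2 else "Negative",
    }
    for i in range(16)
]
triplets = sample_triplets(samples, triplets_per_group=2, random_seed=0)
mixed = sample_triplets(samples, triplets_per_group=2, mix_subgroups=True,
                        mix_classes=True, random_seed=0)
```

With both flags left False there is one group per (subgroup, class) pair; with both set to True all samples fall into a single group.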

src.decision_region_generation.generate.generate_decision_regions(input_csv_path: str, onnx_model_path: str, output_path: str, batch_size: int, manager_kwargs={}, vicinal_kwargs={}, overwrite=True)

Generates, evaluates, and saves decision regions.

Parameters:

input_csv_path (str) – File path to the input csv; passed to load_attributes.
onnx_model_path (str) – Onnx model file path.
output_path (str) – Name and path for output file.
batch_size (int) – Batch size for plane_loader.
manager_kwargs (dict) – Keyword arguments to be passed to TripletManager.
vicinal_kwargs (dict) – Keyword arguments to be passed to plane_dataset.
overwrite (bool) – If True, will overwrite existing file at output_path.
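
As a rough illustration of how the two keyword dictionaries line up with the parameters documented on this page, the sketch below builds manager_kwargs and vicinal_kwargs. All paths, class names, and values are placeholders, and the task-to-classes layout of classes is an assumption. The call itself is left commented out because it needs the package, a csv file, and an ONNX model.

```python
# Keyword arguments forwarded to TripletManager; keys are its documented parameters.
manager_kwargs = {
    "classes": {"Finding": ["Negative", "Positive"]},  # assumed layout: task -> classes
    "triplets_per_group": 10,
    "subgroup_attributes": {"Sex": ["F", "M"]},
    "random_seed": 0,
}

# Keyword arguments forwarded to plane_dataset (these are its documented defaults).
vicinal_kwargs = {"steps": 5, "expand": 0, "shape": "rectangle"}

# from src.decision_region_generation.generate import generate_decision_regions
# generate_decision_regions(
#     input_csv_path="summary.csv",         # placeholder path
#     onnx_model_path="model.onnx",         # placeholder path
#     output_path="decision_regions.hdf5",  # placeholder path
#     batch_size=32,
#     manager_kwargs=manager_kwargs,
#     vicinal_kwargs=vicinal_kwargs,
# )
```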

src.decision_region_generation.vicinal_distribution.get_plane(img1, img2, img3)

Calculates the plane (basis vectors) spanned by three images.

Parameters:

img1 (numpy.array) – First image array; all three images must be the same size.
img2 (numpy.array) – Second image array; all three images must be the same size.
img3 (numpy.array) – Third image array; all three images must be the same size.

Returns:

a, b_orthog (numpy.array) – Two orthogonal basis vectors for the plane spanned by the input images.
b (numpy.array) – The second basis vector, before being made orthogonal.
coords (list) – Coordinates of img1, img2, and img3.
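
One way to obtain two orthogonal basis vectors and the image coordinates is a Gram-Schmidt step, sketched below with NumPy. This is an assumption about the approach, not the package's code: plane_basis is a hypothetical name, and normalizing both vectors to unit length is a choice made here.

```python
import numpy as np


def plane_basis(img1, img2, img3):
    """Gram-Schmidt sketch of the plane through three same-sized images.

    Hypothetical helper; get_plane itself may differ in details.
    """
    # Work in float so that uint8 differences cannot wrap around.
    a = img2.astype(float) - img1.astype(float)
    b = img3.astype(float) - img1.astype(float)

    a_norm = np.linalg.norm(a)
    a = a / a_norm  # first unit basis vector

    # Remove the component of b along a, then normalize the remainder.
    first_coef = np.dot(a.ravel(), b.ravel())
    b_orthog = b - first_coef * a
    b_orthog = b_orthog / np.linalg.norm(b_orthog)
    second_coef = np.dot(b.ravel(), b_orthog.ravel())

    # Coordinates of img1, img2, img3 in the (a, b_orthog) basis, origin at img1.
    coords = [(0.0, 0.0), (a_norm, 0.0), (first_coef, second_coef)]
    return a, b_orthog, b, coords


rng = np.random.default_rng(0)
img1, img2, img3 = (rng.random((4, 4, 3)) for _ in range(3))
a, b_orthog, b, coords = plane_basis(img1, img2, img3)
```

In this construction each input image can be recovered as img1 + x * a + y * b_orthog, where (x, y) is its entry in coords.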

class src.decision_region_generation.vicinal_distribution.plane_dataloader(dataset: plane_dataset, batch_size: int, output_dtype=None, channel_idx=-1, output_channel_idx=0)

Dataloader to be used alongside the plane_dataset class.

Parameters:

dataset (plane_dataset) – plane_dataset to be loaded.
batch_size (int) – Number of images to include in each batch.
output_dtype (optional) – Data type of returned arrays.
channel_idx (int, optional) – Dimension index of the images’ channel dimension.
output_channel_idx (int, optional) – Desired output dimension index of the images’ channel dimension.
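
The batching and channel reordering described by these parameters could be sketched with NumPy as below. iter_batches is a hypothetical stand-in, and it assumes that channel_idx and output_channel_idx refer to axes of a single image (so non-negative indices shift by one once a batch axis is added); the real plane_dataloader may interpret them differently.

```python
import numpy as np


def iter_batches(images, batch_size, output_dtype=None,
                 channel_idx=-1, output_channel_idx=0):
    """Yield stacked batches with the channel axis moved to a new position.

    Hypothetical helper; not the plane_dataloader implementation.
    """
    for start in range(0, len(images), batch_size):
        batch = np.stack(images[start:start + batch_size])
        # Axis 0 is now the batch axis, so non-negative per-image indices shift by one.
        src = channel_idx if channel_idx < 0 else channel_idx + 1
        dst = output_channel_idx if output_channel_idx < 0 else output_channel_idx + 1
        batch = np.moveaxis(batch, src, dst)
        if output_dtype is not None:
            batch = batch.astype(output_dtype)
        yield batch


# Five made-up channels-last images, batched two at a time as channels-first float32.
images = [np.zeros((8, 8, 3), dtype=np.uint8) for _ in range(5)]
batches = list(iter_batches(images, batch_size=2, output_dtype=np.float32))
```

Under these assumptions the defaults turn channels-last images into channels-first batches, the layout many ONNX image models expect.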

class src.decision_region_generation.vicinal_distribution.plane_dataset(img1, img2, img3, steps=5, expand=0, shape='rectangle')

Generates a vicinal distribution from the input images.

Parameters:

img1 (numpy.array) – First image from which to construct the vicinal distribution; all three images should be the same shape and data type.
img2 (numpy.array) – Second image from which to construct the vicinal distribution; all three images should be the same shape and data type.
img3 (numpy.array) – Third image from which to construct the vicinal distribution; all three images should be the same shape and data type.
steps (int, optional) – Number of steps to take between images; affects the number of virtual images generated.
expand (float, optional) – How far beyond the original images to expand the plane; only works with shape='rectangle'.
shape (str, optional) – Shape of the region of the plane to generate samples for.
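
A rectangular vicinal distribution can be pictured as a grid of points on the plane, each mapped back to a virtual image. The sketch below makes several assumptions that the documentation does not confirm: plane_grid is a hypothetical helper, steps is taken to be the number of grid points per axis, and expand is taken to be a fraction of the bounding-box side added on each end.

```python
import numpy as np


def plane_grid(base, a, b_orthog, coords, steps=5, expand=0.0):
    """Sample a rectangular grid on a plane and build one virtual image per point.

    Hypothetical helper; not the plane_dataset implementation.
    """
    xs = [c[0] for c in coords]
    ys = [c[1] for c in coords]
    # Pad the bounding box of the three images by a fraction of its side length.
    x_pad = expand * (max(xs) - min(xs))
    y_pad = expand * (max(ys) - min(ys))
    alphas = np.linspace(min(xs) - x_pad, max(xs) + x_pad, steps)
    betas = np.linspace(min(ys) - y_pad, max(ys) + y_pad, steps)

    grid = [(alpha, beta) for beta in betas for alpha in alphas]
    images = np.stack([base + alpha * a + beta * b_orthog for alpha, beta in grid])
    return grid, images


# Toy 2x2 "images": two orthonormal directions and made-up plane coordinates.
base = np.zeros((2, 2))
a = np.array([[1.0, 0.0], [0.0, 0.0]])
b_orthog = np.array([[0.0, 1.0], [0.0, 0.0]])
coords = [(0.0, 0.0), (2.0, 0.0), (1.0, 3.0)]

grid, images = plane_grid(base, a, b_orthog, coords, steps=5)
wide_grid, _ = plane_grid(base, a, b_orthog, coords, steps=5, expand=0.5)
```

Under these assumptions steps=5 yields 25 virtual images, and a larger expand widens the sampled rectangle without changing the number of images.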

Composition Analysis

src.composition_analysis.get_compositions(filepath: str, tasks: dict, output_function: str | None = None, thresholds: list = [0.5], aggregate: str | None = None) → DataFrame

Gets the compositions of all decision regions in a decision region file.

Parameters:

filepath (str) – File path of decision region hdf5 file.
tasks (dict) – Model classification tasks.
output_function (str | None) – Function to be applied to model output scores.
thresholds (list) – Thresholds for each task.
aggregate (str | None) – How to aggregate the compositions.

Returns:

DataFrame of decision region compositions.

Return type:

pandas.DataFrame
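
The idea of a decision region's composition, for a single binary task, can be sketched as the fraction of virtual images assigned to each class after thresholding. The helper below is hypothetical and simplified: region_composition, the (negative, positive) ordering of class_names, and "sigmoid" as an output_function value are all assumptions, and the real function reads scores from an hdf5 file and returns a DataFrame.

```python
import numpy as np


def region_composition(scores, class_names, threshold=0.5, output_function=None):
    """Fraction of a region's virtual images assigned to each of two classes.

    Hypothetical helper; class_names is assumed to be (negative, positive).
    """
    scores = np.asarray(scores, dtype=float)
    if output_function == "sigmoid":
        # Convert raw model outputs (logits) to probabilities first.
        scores = 1.0 / (1.0 + np.exp(-scores))
    positive_fraction = float((scores >= threshold).mean())
    return {
        class_names[0]: 1.0 - positive_fraction,
        class_names[1]: positive_fraction,
    }


# Made-up scores for the virtual images of one decision region.
composition = region_composition(
    [0.1, 0.4, 0.6, 0.9, 0.7], ("Negative", "Positive"), threshold=0.5
)
```

Here three of the five scores reach the threshold, so the region is 60% "Positive" and 40% "Negative".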

src.composition_analysis.plot_decision_regions(filepath: str, save_loc: str, n_per_group: int | None = None, threshold: int = 1, palette: str = 'Set2')

Plot decision regions from decision region files.

Notes

TODO: - threshold support: legend + palette

Parameters:

filepath (str) – File path to decision region hdf5 file.
save_loc (str) – Save location.
n_per_group (int | None) – Number per group to plot; if None, plots all.
threshold (int) – Output score threshold; if None, does not threshold.
palette (str) – Color palette to use.

Returns:

Plot figure.

Return type:

matplotlib.pyplot.figure

src.composition_analysis.plot_figures(df, plot: str, save_loc: str, tasks: dict, palette: str | dict | list = 'Set2', aggregate: str = 'group', show_percent: bool = True, errorbar: bool = True, show: bool = False, save: bool = True, save_dpi: int = 800, filepath: str | None = None, n_per_group: int | None = None, threshold: float | None = None, output_formats=['.svg'])

Plots composition, performance, or region figures.

Parameters:

df (pandas.DataFrame) – DataFrame containing decision region composition information; used with plot = composition or performance.
plot (str) – Type of plot to create, options: composition, performance, region.
save_loc (str) – Folder in which to save plot images.
tasks (dict) – Model classification tasks.
palette (str | dict | list) – Color palette to use. (TODO: expand description)
aggregate (str) – Method by which to aggregate results, must match the aggregation used during composition analysis.
show_percent (bool) – If True, exact percent values will be included on composition/performance plots.
errorbar (bool) – If True, error bars will be included on composition/performance plots.
show (bool) – If True, will display plots.
save (bool) – If True, will save plot files to save_loc.
save_dpi (int) – DPI of saved output image(s).
filepath (str | None) – File path of decision region hdf5 file; only used for region figures.
n_per_group (int | None) – Number of triplets per group to create plots for; only used for region figures.
threshold (float | None) – The threshold applied to convert output scores to a binary classification; if None, no threshold is applied. Only used for region figures.
output_formats – The image formats in which to save the output figures.

src.composition_analysis.save_compositions(compositions: DataFrame, save_loc: str, overwrite: bool = False, aggregate: str | None = None)

Saves the composition analysis in a csv file.

Parameters:

compositions (DataFrame) – Decision region compositions, as output by get_compositions().
save_loc (str) – Save location.
overwrite (bool) – Whether to overwrite existing files.
aggregate (str | None) – The aggregation method; used as part of the file naming convention.

src.composition_analysis.set_params(plot: str)

Sets figure style parameters.

Parameters:

plot (str) – The type of plot being created (composition, performance, or region).
