Data Input

src.data_input.load_attributes(csv_file: str, subgroup_information: dict, image_path_column: str = 'Path', id_column: str | None = None, missing_information: str = 'raise', info_format: str = 'categorical', rel_path: str | None = None, n_processes: int | None = None) DataFrame

Loads image filepaths and patient attributes from provided csv file.

Parameters:
  • csv_file (str) – Filepath to summary csv.

  • subgroup_information (dict) – Subgroup attributes in the format Group:[subgroups] (ex. {"Sex":["Male","Female"]}).

  • image_path_column (str) – Name of column in csv_file listing image filepaths.

  • id_column (str | None) – Name of column in csv_file listing unique patient/sample identifiers.

  • missing_information (str) – How to handle samples missing information: ‘raise’ raises an exception; ‘remove’ removes the samples missing information.

  • info_format (str) – Format to return patient information in.

  • rel_path (str | None) – Declare a relative path for image file paths.

  • n_processes (int | None) – Number of processes to use while checking image file paths; if None, uses the number of available cores.

Returns:

DataFrame of ids, image filepaths, and subgroup information.

Return type:

pandas.DataFrame
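
As a rough illustration of the missing_information options described above, the sketch below applies them to a small in-memory csv using only the standard library. It is not the load_attributes implementation: the helper name filter_rows and the sample data are hypothetical, and it assumes a sample counts as missing information when one of its subgroup columns is empty or holds a value that is not listed in subgroup_information.

```python
import csv
import io


def filter_rows(csv_text, subgroup_information, missing_information="raise"):
    """Hypothetical helper: keep rows whose subgroup values are all listed."""
    if missing_information not in ("raise", "remove"):
        raise ValueError(f"unsupported option: {missing_information!r}")
    kept = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # A row is complete when every attribute holds one of its listed subgroups.
        complete = all(
            row.get(attribute) in subgroups
            for attribute, subgroups in subgroup_information.items()
        )
        if complete:
            kept.append(row)
        elif missing_information == "raise":
            raise ValueError(f"sample missing information: {row}")
        # With 'remove', the incomplete row is simply dropped.
    return kept


# Made-up example data; the second sample has no Sex value.
CSV_TEXT = "Path,Sex\nimg_001.png,Male\nimg_002.png,\nimg_003.png,Female\n"
SUBGROUPS = {"Sex": ["Male", "Female"]}

rows = filter_rows(CSV_TEXT, SUBGROUPS, missing_information="remove")
print([row["Path"] for row in rows])
```

With missing_information="raise", the same input would stop at the second sample instead of dropping it.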

src.data_input.load_image(image_path, mode: str = 'RGB', scale: int | tuple[int] | None = None) array

Loads image in the specified image mode.

Parameters:
  • image_path – File path to image.

  • mode (str) – PIL.Image mode to use.

  • scale (int | tuple[int] | None) – Scale to resize image; if int, will resize to (scale, scale); if None, will not resize.

Returns:

Image array.

Return type:

numpy.array

Raises:

Exception – The provided scale is not in a supported format
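
The handling of the scale argument can be pictured with the following sketch. The helper normalize_scale is hypothetical and is not part of src.data_input; it also assumes that a tuple scale holds exactly two integers, which the signature above does not state explicitly.

```python
def normalize_scale(scale):
    """Hypothetical helper: map the documented scale argument to a size or None."""
    if scale is None:
        return None  # no resizing requested
    if isinstance(scale, int):
        return (scale, scale)  # a single int is used for both dimensions
    if (
        isinstance(scale, tuple)
        and len(scale) == 2
        and all(isinstance(value, int) for value in scale)
    ):
        return scale
    # Mirrors the documented behaviour of raising on an unsupported format.
    raise Exception("The provided scale is not in a supported format")


print(normalize_scale(224))
print(normalize_scale((320, 240)))
print(normalize_scale(None))
```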

Decision Region Generation

class src.decision_region_generation.triplet_manager.TripletManager(input_csv, classes, triplets_per_group, subgroup_attributes=None, image_rel_path=None, sample_id_column=None, mix_subgroups=False, mix_classes=False, random_seed=None)

Wrapper class to generate vicinal distributions for specified groups.

Parameters:
  • input_csv (str) – File path for input csv; passed to data_input.load_attributes.

  • classes (dict) – All potential output classes, organized by task.

  • triplets_per_group (int) – Number of triplets to generate for each group of samples.

  • subgroup_attributes (dict, optional) – All subgroup attribute options, organized by attribute; ex. {‘Sex’:[‘F’,‘M’]}

  • image_rel_path (str, optional) – Relative image path; passed to data_input.load_attributes.

  • sample_id_column (str, optional) – ID column for input csv; passed to data_input.load_attributes.

  • mix_subgroups (bool, optional) – If true, will not separate groups by subgroup attributes.

  • mix_classes (bool, optional) – If true, will not separate groups by class.

__getitem__(key)
Returns:

The triplet, group, key and images for the specified triplet

Return type:

dict
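
The effect of the mix_subgroups and mix_classes flags of TripletManager can be sketched as a grouping step. How TripletManager forms its groups internally is not described here, so the helper group_samples, the sample records, and the "all" placeholder key are assumptions made purely for illustration.

```python
from collections import defaultdict


def group_samples(samples, mix_subgroups=False, mix_classes=False):
    """Hypothetical helper: bucket sample ids by (class, subgroup)."""
    groups = defaultdict(list)
    for sample in samples:
        key = (
            # Mixing collapses that part of the key into a single bucket.
            "all" if mix_classes else sample["class"],
            "all" if mix_subgroups else sample["subgroup"],
        )
        groups[key].append(sample["id"])
    return dict(groups)


# Made-up samples for a single task with one subgroup attribute.
samples = [
    {"id": "s1", "class": "Positive", "subgroup": "F"},
    {"id": "s2", "class": "Positive", "subgroup": "M"},
    {"id": "s3", "class": "Negative", "subgroup": "F"},
    {"id": "s4", "class": "Positive", "subgroup": "F"},
]

separate = group_samples(samples)
by_class_only = group_samples(samples, mix_subgroups=True)
everything = group_samples(samples, mix_subgroups=True, mix_classes=True)
print(len(separate), len(by_class_only), len(everything))
```

Under these assumptions, triplets_per_group would then apply to each resulting bucket, so mixing reduces the number of groups and therefore the total number of triplets.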

src.decision_region_generation.generate.generate_decision_regions(input_csv_path: str, onnx_model_path: str, output_path: str, batch_size: int, manager_kwargs={}, vicinal_kwargs={}, overwrite=True)

Generates, evaluates and saves decision regions.

Parameters:
  • input_csv_path (str) – File path to the input csv; passed to load_attributes.

  • onnx_model_path (str) – Onnx model file path.

  • output_path (str) – Name and path for output file.

  • batch_size (int) – Batch size for plane_loader.

  • manager_kwargs (dict) – Keyword arguments to be passed to TripletManager.

  • vicinal_kwargs (dict) – Keyword arguments to be passed to plane_dataset.

  • overwrite (bool) – If True, will overwrite existing file at output_path.

src.decision_region_generation.vicinal_distribution.get_plane(img1, img2, img3)

Calculates the plane (basis vectors) spanned by three images.

Parameters:
  • img1 (numpy.array) – First image array; all three images must be the same size.

  • img2 (numpy.array) – Second image array; all three images must be the same size.

  • img3 (numpy.array) – Third image array; all three images must be the same size.

Returns:

  • a, b_orthog (numpy.array) – 2 orthogonal basis vectors for the plane spanned by the input images

  • b (numpy.array) – The second basis vector, before being made orthogonal

  • coords (list) – Coordinates of img1, img2, and img3
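
One standard way to obtain two orthogonal basis vectors and the in-plane coordinates of the three images is a Gram-Schmidt step, sketched below with NumPy. This is an illustration under stated assumptions rather than the get_plane source: the function name plane_basis is hypothetical, the basis vectors are normalized to unit length here, and the images are cast to float first so that unsigned integer pixel values do not wrap around when subtracted.

```python
import numpy as np


def plane_basis(img1, img2, img3):
    """Hypothetical Gram-Schmidt sketch of a plane through three images."""
    img1, img2, img3 = (np.asarray(img, dtype=float) for img in (img1, img2, img3))
    a = img2 - img1
    b = img3 - img1
    a_norm = np.linalg.norm(a)            # 2-norm over all pixels
    a = a / a_norm
    first_coef = np.vdot(a, b)            # component of b along a
    b_orthog = b - first_coef * a         # remove that component
    b_orthog = b_orthog / np.linalg.norm(b_orthog)
    second_coef = np.vdot(b_orthog, b)    # component of b along b_orthog
    # img1 is the origin; img2 lies on the first axis.
    coords = [(0.0, 0.0), (a_norm, 0.0), (first_coef, second_coef)]
    return a, b_orthog, b, coords


rng = np.random.default_rng(0)
img1, img2, img3 = (rng.integers(0, 256, size=(4, 4, 3), dtype=np.uint8) for _ in range(3))
a, b_orthog, b, coords = plane_basis(img1, img2, img3)
print(abs(np.vdot(a, b_orthog)) < 1e-9)
```

Each input image can be recovered as img1 + x * a + y * b_orthog, where (x, y) is its entry in coords.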

class src.decision_region_generation.vicinal_distribution.plane_dataloader(dataset: plane_dataset, batch_size: int, output_dtype=None, channel_idx=-1, output_channel_idx=0)

Dataloader to be used alongside the plane_dataset class.

Parameters:
  • dataset (plane_dataset) – The plane_dataset to be loaded.

  • batch_size (int) – Number of images to include in each batch.

  • output_dtype (optional) – Data type of returned arrays.

  • channel_idx (int, optional) – Dimension index of the images’ channel dimension.

  • output_channel_idx (int, optional) – Desired output dimension index of the images’ channel dimension.
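
The roles of batch_size, output_dtype, channel_idx and output_channel_idx can be illustrated with a small generator. This is a sketch, not the plane_dataloader class: the function iter_batches is hypothetical, and it assumes the channel indices refer to the dimensions of a single image before the batch dimension is added.

```python
import numpy as np


def iter_batches(images, batch_size, output_dtype=None, channel_idx=-1, output_channel_idx=0):
    """Hypothetical sketch: yield batches with the channel axis relocated."""
    for start in range(0, len(images), batch_size):
        chunk = images[start:start + batch_size]
        # Move each image's channel axis, then stack along a new batch axis.
        batch = np.stack(
            [np.moveaxis(image, channel_idx, output_channel_idx) for image in chunk]
        )
        if output_dtype is not None:
            batch = batch.astype(output_dtype)
        yield batch


# Five made-up channel-last images (height, width, channels).
images = [np.zeros((8, 8, 3), dtype=np.uint8) for _ in range(5)]
batches = list(iter_batches(images, batch_size=2, output_dtype=np.float32))
print([batch.shape for batch in batches])
```

With the defaults shown, channel-last images come out channel-first, which is the layout many ONNX image models expect; the last batch is smaller when the number of images is not a multiple of batch_size.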

class src.decision_region_generation.vicinal_distribution.plane_dataset(img1, img2, img3, steps=5, expand=0, shape='rectangle')

Generates a vicinal distribution from the input images.

Parameters:
  • img1 (numpy.array) – Images from which to construct the vicinal distribution, should all be the same shape and data type.

  • img2 (numpy.array) – Images from which to construct the vicinal distribution, should all be the same shape and data type.

  • img3 (numpy.array) – Images from which to construct the vicinal distribution, should all be the same shape and data type.

  • steps (int, optional) – Number of steps to take between images, affects the number of virtual images generated.

  • expand (float, optional) – How far beyond the original images to expand the plane; only works with shape=’rectangle’

  • shape (str, optional) – Shape of the region of plane to generate samples for.
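
To give a feel for how steps and expand shape a vicinal distribution, the sketch below builds virtual images as combinations of three inputs on a regular grid. It is not the plane_dataset implementation: the function virtual_images is hypothetical, it treats steps as the number of grid points per axis (an assumption), and it works directly with the non-orthogonal directions img2 - img1 and img3 - img1 rather than an orthogonalized basis.

```python
import numpy as np


def virtual_images(img1, img2, img3, steps=5, expand=0.0):
    """Hypothetical sketch: sample a rectangular grid on the plane of three images."""
    img1, img2, img3 = (np.asarray(img, dtype=float) for img in (img1, img2, img3))
    # Coefficients run from -expand to 1 + expand along each direction.
    coefs = np.linspace(-expand, 1.0 + expand, steps)
    grid = [(s, t) for t in coefs for s in coefs]
    images = np.stack(
        [img1 + s * (img2 - img1) + t * (img3 - img1) for s, t in grid]
    )
    return images, grid


# Three tiny made-up single-channel images.
img1 = np.zeros((2, 2, 1))
img2 = np.full((2, 2, 1), 10.0)
img3 = np.full((2, 2, 1), -4.0)

images, grid = virtual_images(img1, img2, img3, steps=5, expand=0.0)
print(images.shape)
```

In this sketch the original images sit at three corners of the grid when expand is 0, and a positive expand pushes the grid beyond them.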

Composition Analysis

src.composition_analysis.get_compositions(filepath: str, tasks: dict, output_function: str | None = None, thresholds: list = [0.5], aggregate: str | None = None) DataFrame

Gets the compositions of all decision regions in a decision region file.

Parameters:
  • filepath (str) – File path of decision region hdf5 file.

  • tasks (dict) – Model classification tasks.

  • output_function (str | None) – Function to be applied to model output scores.

  • thresholds (list) – Thresholds for each task.

  • aggregate (str | None) – How to aggregate the compositions.

Returns:

Dataframe of decision region compositions.

Return type:

pandas.DataFrame
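
The composition of a single decision region can be thought of as the percentage of its virtual images assigned to each class. The sketch below shows that calculation for one binary task using only the standard library. The helper region_composition, the class names, the use of a sigmoid as the output function, and the choice of >= for the threshold comparison are all assumptions for illustration and are not taken from get_compositions.

```python
import math
from collections import Counter


def sigmoid(score):
    """Example output function mapping a raw score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-score))


def region_composition(scores, class_names, threshold=0.5, output_function=None):
    """Hypothetical sketch: percent of a region assigned to each of two classes."""
    if output_function is not None:
        scores = [output_function(score) for score in scores]
    labels = [class_names[1] if score >= threshold else class_names[0] for score in scores]
    counts = Counter(labels)
    return {name: 100.0 * counts[name] / len(labels) for name in class_names}


# Made-up raw model outputs for the virtual images of one region.
raw_scores = [-2.0, -0.5, 0.3, 1.2]
classes = ["Negative", "Positive"]

with_sigmoid = region_composition(raw_scores, classes, 0.5, sigmoid)
without_function = region_composition(raw_scores, classes, 0.5)
print(with_sigmoid, without_function)
```

The two results differ because the same threshold is applied to different score scales, which is why the output function and thresholds need to be chosen together.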

src.composition_analysis.plot_decision_regions(filepath: str, save_loc: str, n_per_group: int | None = None, threshold: int = 1, palette: str = 'Set2')

Plot decision regions from decision region files.

Notes

TODO: - threshold support: legend + palette

Parameters:
  • filepath (str) – File path to decision region hdf5 file.

  • save_loc (str) – Save location.

  • n_per_group (int | None) – Number per group to plot; if None, plots all.

  • threshold (int) – Output score threshold; if None, does not threshold.

  • palette (str) – Color palette to use.

Returns:

Plot figure.

Return type:

matplotlib.pyplot.figure

src.composition_analysis.plot_figures(df, plot: str, save_loc: str, tasks: dict, palette: str | dict | list = 'Set2', aggregate: str = 'group', show_percent: bool = True, errorbar: bool = True, show: bool = False, save: bool = True, save_dpi: int = 800, filepath: str | None = None, n_per_group: int | None = None, threshold: float | None = None, output_formats=['.svg'])

Plots composition, performance or region figures.

Parameters:
  • df (pandas.DataFrame) – DataFrame containing decision region composition information; used with plot = composition or performance.

  • plot (str) – Type of plot to create, options: composition, performance, region.

  • save_loc (str) – Folder in which to save plot images.

  • tasks (dict) – Model classification tasks.

  • palette (str | dict | list) – Color palette to use. (TODO: expand description)

  • aggregate (str) – Method by which to aggregate results, must match the aggregation used during composition analysis.

  • show_percent (bool) – If True, exact percent values will be included on composition/performance plots.

  • errorbar (bool) – If True, error bars will be included on composition/performance plots.

  • show (bool) – If True, will display plots.

  • save (bool) – If True, will save plot files to save_loc.

  • save_dpi (int) – DPI of saved output image(s).

  • filepath (str | None) – File path of decision region hdf5 file; only used for region figures.

  • n_per_group (int | None) – Number of triplets per group to create plots for; only used for region figures.

  • threshold (float | None) – The threshold applied to convert output scores to a binary classification; if None, no threshold is applied. Only used for region figures.

  • output_formats – The image formats in which to save the output figures.

src.composition_analysis.save_compositions(compositions: DataFrame, save_loc: str, overwrite: bool = False, aggregate: str | None = None)

Saves the composition analysis to a csv file.

Parameters:
  • compositions (DataFrame) – Decision region compositions, as output by get_compositions().

  • save_loc (str) – Save location

  • overwrite (bool) – Whether to overwrite existing files.

  • aggregate (str | None) – The aggregation method; used as part of naming convention.
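
The interplay of overwrite and aggregate can be sketched with the standard library as follows. None of this is taken from save_compositions: the helper save_rows, the file name pattern compositions_<aggregate>.csv, and the choice to leave an existing file untouched (rather than raise) when overwrite is False are assumptions made for illustration.

```python
import csv
import os
import tempfile


def save_rows(rows, save_loc, overwrite=False, aggregate=None):
    """Hypothetical sketch: write rows to a csv named after the aggregation."""
    suffix = f"_{aggregate}" if aggregate else ""
    path = os.path.join(save_loc, f"compositions{suffix}.csv")
    if os.path.exists(path) and not overwrite:
        return path, False  # existing file is left untouched
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
    return path, True


# Made-up composition rows.
rows = [{"group": "F", "Positive": 40.0, "Negative": 60.0}]

with tempfile.TemporaryDirectory() as folder:
    path, first_write = save_rows(rows, folder, aggregate="group")
    _, second_write = save_rows(rows, folder, aggregate="group")
    _, forced_write = save_rows(rows, folder, overwrite=True, aggregate="group")
    file_name = os.path.basename(path)

print(file_name, first_write, second_write, forced_write)
```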

src.composition_analysis.set_params(plot: str)

Sets figure style parameters.

Parameters:

plot (str) – The type of plot being created (composition, performance or region).