Data Input¶
- src.data_input.load_attributes(csv_file: str, subgroup_information: dict, image_path_column: str = 'Path', id_column: str | None = None, missing_information: str = 'raise', info_format: str = 'categorical', rel_path: str | None = None, n_processes: int | None = None) DataFrame ¶
Loads image filepaths and patient attributes from provided csv file.
- Parameters:
csv_file (str) – Filepath to summary csv.
subgroup_information (dict) – Subgroup attributes in the format Group:[subgroups] (ex. {"Sex":["Male","Female"]}).
image_path_column (str) – Name of column in csv_file listing image filepaths.
id_column (str | None) – Name of column in csv_file listing unique patient/sample identifiers.
missing_information (str) – How to handle samples missing information; ‘raise’: raise an exception, ‘remove’: remove samples missing information
info_format (str) – Format to return patient information in.
rel_path (str | None) – Declare a relative path for image file paths.
n_processes (int | None) – Number of processes to use while checking image file paths; if None, uses number of available cores.
- Returns:
DataFrame of ids, image filepaths, and subgroup information.
- Return type:
pandas.DataFrame
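The two missing_information modes can be illustrated with a small pandas sketch. This is not the library's implementation: the helper name handle_missing, the in-memory table, and the rule that a value outside the listed subgroups counts as "missing" are all assumptions made for illustration.

```python
import pandas as pd

def handle_missing(df, subgroup_information, missing_information="raise"):
    """Hypothetical helper mirroring the documented 'raise' / 'remove' modes."""
    # Assumption: a sample is "missing information" if any subgroup column
    # is empty or holds a value that is not one of the listed subgroups.
    valid = pd.Series(True, index=df.index)
    for column, subgroups in subgroup_information.items():
        valid &= df[column].isin(subgroups)

    if missing_information == "raise":
        if not valid.all():
            raise ValueError(f"{int((~valid).sum())} sample(s) missing information")
        return df
    if missing_information == "remove":
        return df[valid].reset_index(drop=True)
    raise ValueError(f"Unsupported option: {missing_information!r}")

# Toy table standing in for the summary csv (file names are placeholders).
table = pd.DataFrame({
    "Path": ["a.png", "b.png", "c.png"],
    "Sex": ["Male", None, "Female"],
})
subgroups = {"Sex": ["Male", "Female"]}
cleaned = handle_missing(table, subgroups, missing_information="remove")
```

With 'remove', the row lacking a Sex value is dropped; with 'raise', the same table triggers an exception instead.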
- src.data_input.load_image(image_path, mode: str = 'RGB', scale: int | tuple[int] | None = None) array ¶
Loads image in the specified image mode.
- Parameters:
image_path – File path to image.
mode (str) – PIL.Image mode to use.
scale (int | tuple[int] | None) – Scale to resize image; if int, will resize to (scale, scale); if None, will not resize.
- Returns:
Image array.
- Return type:
numpy.array
- Raises:
Exception – The provided scale is not in a supported format
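The scale options described above can be sketched as a small normalization step. The helper normalize_scale is hypothetical and only mirrors the documented behaviour (int, tuple, or None, otherwise an exception); how load_image performs the check internally is not stated here.

```python
def normalize_scale(scale):
    """Hypothetical helper: map the documented `scale` options to a size tuple."""
    if scale is None:
        return None                      # no resizing requested
    if isinstance(scale, int) and not isinstance(scale, bool):
        return (scale, scale)            # square resize
    if (
        isinstance(scale, tuple)
        and len(scale) == 2
        and all(isinstance(s, int) for s in scale)
    ):
        return scale                     # explicit size, used as given
    raise ValueError(f"Unsupported scale format: {scale!r}")

square = normalize_scale(224)
explicit = normalize_scale((320, 240))
unchanged = normalize_scale(None)
```

A size produced this way could then be handed to a resize call such as PIL's Image.resize, which is an assumption about usage rather than a description of the function's internals.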
Decision Region Generation¶
- class src.decision_region_generation.triplet_manager.TripletManager(input_csv, classes, triplets_per_group, subgroup_attributes=None, image_rel_path=None, sample_id_column=None, mix_subgroups=False, mix_classes=False, random_seed=None)¶
Wrapper class to generate vicinal distributions for specified groups.
- Parameters:
input_csv (str) – File path for input csv; passed to data_input.load_attributes.
classes (dict) – All potential output classes, organized by task.
triplets_per_group (int) – Number of triplets to generate for each group of samples.
subgroup_attributes (dict, optional) – All subgroup attribute options, organized by attribute; ex. {'Sex':['F','M']}.
image_rel_path (str, optional) – Relative image path; passed to data_input.load_attributes.
sample_id_column (str, optional) – ID column for input csv; passed to data_input.load_attributes.
mix_subgroups (bool, optional) – If True, will not separate groups by subgroup attributes.
mix_classes (bool, optional) – If True, will not separate groups by class.
- __getitem__(key)¶
- Returns:
The triplet, group, key and images for the specified triplet
- Return type:
dict
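One way to picture the grouping controlled by mix_subgroups and mix_classes is the sketch below. It is not TripletManager's code: the function sample_triplets, the record fields ("id", "label", "subgroup"), and the sampling strategy are assumptions chosen only to show how the flags could change the groups that triplets are drawn from.

```python
import random
from collections import defaultdict

def sample_triplets(samples, triplets_per_group, mix_subgroups=False,
                    mix_classes=False, random_seed=None):
    """Hypothetical sketch: draw triplets of sample ids within each group."""
    rng = random.Random(random_seed)
    groups = defaultdict(list)
    for sample in samples:
        # A flag set to True collapses that dimension into a single group.
        key = (
            None if mix_classes else sample["label"],
            None if mix_subgroups else sample["subgroup"],
        )
        groups[key].append(sample["id"])
    # Groups with fewer than three samples cannot form a triplet and are skipped.
    return {
        key: [tuple(rng.sample(ids, 3)) for _ in range(triplets_per_group)]
        for key, ids in groups.items()
        if len(ids) >= 3
    }

# Twelve toy samples: two classes crossed with two subgroups.
samples = [
    {"id": i,
     "label": "Positive" if i % 2 == 0 else "Negative",
     "subgroup": "F" if i < 6 else "M"}
    for i in range(12)
]
separated = sample_triplets(samples, triplets_per_group=2, random_seed=0)
mixed = sample_triplets(samples, triplets_per_group=2, mix_subgroups=True,
                        mix_classes=True, random_seed=0)
```

With both flags left False the toy data yields four groups (class by subgroup); with both set to True all samples fall into one group.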
- src.decision_region_generation.generate.generate_decision_regions(input_csv_path: str, onnx_model_path: str, output_path: str, batch_size: int, manager_kwargs={}, vicinal_kwargs={}, overwrite=True)¶
Generates, evaluates and saves decision regions.
- Parameters:
input_csv_path (str) – File path to the input csv; passed to load_attributes.
onnx_model_path (str) – Onnx model file path.
output_path (str) – Name and path for output file.
batch_size (int) – Batch size for plane_loader.
manager_kwargs (dict) – Keyword arguments to be passed to TripletManager.
vicinal_kwargs (dict) – Keyword arguments to be passed to plane_dataset.
overwrite (bool) – If True, will overwrite existing file at output_path.
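A call might be assembled as in the sketch below, using only parameter names that appear in the signatures on this page. Every path and value is a placeholder, the task name "Diagnosis" is hypothetical, and it is an assumption that classes and triplets_per_group are supplied through manager_kwargs.

```python
# Keyword arguments forwarded to TripletManager (names taken from its signature).
manager_kwargs = {
    "classes": {"Diagnosis": ["Negative", "Positive"]},  # hypothetical task
    "triplets_per_group": 5,
    "subgroup_attributes": {"Sex": ["F", "M"]},
    "random_seed": 0,
}

# Keyword arguments forwarded to plane_dataset (its documented defaults).
vicinal_kwargs = {"steps": 5, "expand": 0, "shape": "rectangle"}

call_kwargs = {
    "input_csv_path": "data/summary.csv",            # placeholder path
    "onnx_model_path": "models/classifier.onnx",     # placeholder path
    "output_path": "output/decision_regions.hdf5",   # placeholder path
    "batch_size": 32,
    "manager_kwargs": manager_kwargs,
    "vicinal_kwargs": vicinal_kwargs,
    "overwrite": False,  # keep any existing output file
}

# With the package importable, the call would then look like:
# from src.decision_region_generation.generate import generate_decision_regions
# generate_decision_regions(**call_kwargs)
```

Passing fresh dictionaries for manager_kwargs and vicinal_kwargs, rather than relying on the {} defaults, keeps each run's configuration explicit.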
- src.decision_region_generation.vicinal_distribution.get_plane(img1, img2, img3)¶
Calculates the plane (basis vectors) spanned by three images.
- Parameters:
img1, img2, img3 (numpy.array) – Three numpy arrays of images; must all be the same size.
- Returns:
a, b_orthog (numpy.array) – 2 orthogonal basis vectors for the plane spanned by the input images.
b (numpy.array) – The second basis vector, before being made orthogonal.
coords (list) – Coordinates of img1, img2, and img3.
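The return values described above are consistent with a Gram-Schmidt construction, sketched here in NumPy. The function plane_basis is a hypothetical stand-in, not the library's get_plane; in particular, normalizing both basis vectors and placing img1 at the origin are assumptions.

```python
import numpy as np

def plane_basis(img1, img2, img3):
    """Gram-Schmidt sketch: orthonormal basis of the plane through three images."""
    shape = img1.shape
    # Cast before subtracting so unsigned integer images do not wrap around.
    a = (img2.astype(np.float64) - img1.astype(np.float64)).ravel()
    b = (img3.astype(np.float64) - img1.astype(np.float64)).ravel()

    a_norm = np.linalg.norm(a)          # zero if img1 == img2 (degenerate case)
    a_unit = a / a_norm
    first_coef = np.dot(a_unit, b)      # component of b along a
    b_orthog = b - first_coef * a_unit  # remove that component
    second_coef = np.linalg.norm(b_orthog)  # zero if the images are collinear
    b_orthog_unit = b_orthog / second_coef

    # Plane coordinates of img1, img2, img3 in the (a, b_orthog) basis.
    coords = [(0.0, 0.0), (a_norm, 0.0), (first_coef, second_coef)]
    return (a_unit.reshape(shape), b_orthog_unit.reshape(shape),
            b.reshape(shape), coords)

rng = np.random.default_rng(0)
img1, img2, img3 = (rng.random((4, 4, 3)) for _ in range(3))
a, b_orthog, b, coords = plane_basis(img1, img2, img3)
```

In this sketch each input image can be recovered as img1 plus its two coordinates times the basis vectors, which is what makes the coordinates usable for sampling points in between.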
- class src.decision_region_generation.vicinal_distribution.plane_dataloader(dataset: plane_dataset, batch_size: int, output_dtype=None, channel_idx=-1, output_channel_idx=0)¶
Dataloader to be used alongside the plane_dataset class.
- Parameters:
dataset (plane_dataset) – The plane_dataset to be loaded.
batch_size (int) – Number of images to include in each batch.
output_dtype (optional) – Data type of returned arrays.
channel_idx (int, optional) – Dimension index of the images' channel dimension.
output_channel_idx (int, optional) – Desired output dimension index of the images' channel dimension.
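The batching and channel reordering described by these parameters might look like the following NumPy sketch. The generator iter_batches is hypothetical, and it assumes the channel indices refer to each image before the batch dimension is added.

```python
import numpy as np

def iter_batches(images, batch_size, output_dtype=None,
                 channel_idx=-1, output_channel_idx=0):
    """Hypothetical sketch: yield stacked batches with the channel axis moved."""
    for start in range(0, len(images), batch_size):
        chunk = images[start:start + batch_size]
        # Move each image's channel axis, e.g. (H, W, C) -> (C, H, W).
        moved = [np.moveaxis(img, channel_idx, output_channel_idx) for img in chunk]
        batch = np.stack(moved)  # adds the leading batch dimension
        if output_dtype is not None:
            batch = batch.astype(output_dtype)
        yield batch

# Five channels-last toy images, batched two at a time.
images = [np.zeros((8, 6, 3)) for _ in range(5)]
batches = list(iter_batches(images, batch_size=2, output_dtype=np.float32))
```

The last batch is smaller when the number of images is not a multiple of batch_size; whether the library pads or drops such a remainder is not stated here.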
- class src.decision_region_generation.vicinal_distribution.plane_dataset(img1, img2, img3, steps=5, expand=0, shape='rectangle')¶
Generates a vicinal distribution from the input images.
- Parameters:
img1, img2, img3 (numpy.array) – Images from which to construct the vicinal distribution, should all be the same shape and data type.
steps (int, optional) – Number of steps to take between images, affects the number of virtual images generated.
expand (float, optional) – How far beyond the original images to expand the plane; only works with shape='rectangle'.
shape (str, optional) – Shape of the region of plane to generate samples for.
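For the 'rectangle' shape, virtual images can be thought of as points on a grid covering the bounding box of the three images' plane coordinates, optionally widened by expand. The sketch below is an illustration under assumptions: rectangle_grid, virtual_image, and the n_points argument are hypothetical, and how steps maps to the number of grid points in the library is not specified here.

```python
import numpy as np

def rectangle_grid(coords, n_points=5, expand=0.0):
    """Hypothetical: coefficient pairs covering the (expanded) bounding rectangle."""
    xs = [c[0] for c in coords]
    ys = [c[1] for c in coords]
    # Widen each side by a fraction `expand` of the rectangle's extent.
    x_pad = expand * (max(xs) - min(xs))
    y_pad = expand * (max(ys) - min(ys))
    grid_x = np.linspace(min(xs) - x_pad, max(xs) + x_pad, n_points)
    grid_y = np.linspace(min(ys) - y_pad, max(ys) + y_pad, n_points)
    xx, yy = np.meshgrid(grid_x, grid_y)
    return np.stack([xx.ravel(), yy.ravel()], axis=1)

def virtual_image(base, a, b_orthog, coef):
    """Combine the base image with the two basis vectors at one grid point."""
    return base + coef[0] * a + coef[1] * b_orthog

# Toy plane: a 2x2 "image" space with two orthonormal basis images.
base = np.zeros((2, 2))
a = np.array([[1.0, 0.0], [0.0, 0.0]])
b_orthog = np.array([[0.0, 1.0], [0.0, 0.0]])
coords = [(0.0, 0.0), (2.0, 0.0), (1.0, 3.0)]  # plane coordinates of three images

grid = rectangle_grid(coords, n_points=3)
expanded = rectangle_grid(coords, n_points=3, expand=0.5)
second_image = virtual_image(base, a, b_orthog, coords[1])
```

Each row of the grid is one virtual image's coordinates; evaluating a model on all of them gives the decision region for that triplet.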
Composition Analysis¶
- src.composition_analysis.get_compositions(filepath: str, tasks: dict, output_function: str | None = None, thresholds: list = [0.5], aggregate: str | None = None) DataFrame ¶
Gets the compositions of all decision regions in a decision region file.
- Parameters:
filepath (str) – File path of decision region hdf5 file.
tasks (dict) – Model classification tasks.
output_function (str | None) – Function to be applied to model output scores.
thresholds (list) – Thresholds for each task.
aggregate (str | None) – How to aggregate the compositions.
- Returns:
Dataframe of decision region compositions.
- Return type:
pandas.DataFrame
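Conceptually, a region's composition is the share of its virtual images assigned to each class after any output function and threshold are applied. The sketch below shows this for a single binary task; region_composition, the "sigmoid" option, and the class names are assumptions for illustration and do not describe get_compositions itself.

```python
import numpy as np

def region_composition(scores, class_names, threshold=0.5, output_function=None):
    """Hypothetical sketch: percentage of a region predicted as each class."""
    scores = np.asarray(scores, dtype=np.float64)
    if output_function == "sigmoid":
        # Map raw model outputs to the (0, 1) range before thresholding.
        scores = 1.0 / (1.0 + np.exp(-scores))
    predicted_positive = scores >= threshold
    positive_pct = 100.0 * float(predicted_positive.mean())
    negative, positive = class_names
    return {negative: 100.0 - positive_pct, positive: positive_pct}

# Raw scores for the virtual images of one toy decision region.
raw_scores = [-2.0, -1.0, 0.5, 3.0]
composition = region_composition(
    raw_scores, ("Negative", "Positive"), threshold=0.5, output_function="sigmoid"
)
```

Here two of the four scores exceed the threshold after the sigmoid, so the toy region is split evenly between the two classes.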
- src.composition_analysis.plot_decision_regions(filepath: str, save_loc: str, n_per_group: int | None = None, threshold: int = 1, palette: str = 'Set2')¶
Plot decision regions from decision region files.
Notes
TODO: - threshold support: legend + palette
- Parameters:
filepath (str) – File path to decision region hdf5 file.
save_loc (str) – Save location.
n_per_group (int | None) – Number per group to plot; if None, plots all.
threshold (int) – Output score threshold; if None, does not threshold.
palette (str) – Color palette to use.
- Returns:
Plot figure.
- Return type:
matplotlib.pyplot.figure
- src.composition_analysis.plot_figures(df, plot: str, save_loc: str, tasks: dict, palette: str | dict | list = 'Set2', aggregate: str = 'group', show_percent: bool = True, errorbar: bool = True, show: bool = False, save: bool = True, save_dpi: int = 800, filepath: str | None = None, n_per_group: int | None = None, threshold: float | None = None, output_formats=['.svg'])¶
Plots composition, performance or region figures.
- Parameters:
df (pandas.DataFrame) – DataFrame containing decision region composition information; used when plot is composition or performance.
plot (str) – Type of plot to create, options: composition, performance, region.
save_loc (str) – Folder in which to save plot images.
tasks (dict) – Model classification tasks.
palette (str | dict | list) – Color palette to use. (TODO: expand description)
aggregate (str) – Method by which to aggregate results, must match the aggregation used during composition analysis.
show_percent (bool) – If True, exact percent values will be included on composition/performance plots.
errorbar (bool) – If True, error bars will be included on composition/performance plots.
show (bool) – If True, will display plots.
save (bool) – If True, will save plot files to save_loc.
save_dpi (int) – DPI of saved output image(s).
filepath (str | None) – File path of decision region hdf5 file; only used for region figures.
n_per_group (int | None) – Number of triplets per group to create plots for; only used for region figures.
threshold (float | None) – The threshold to be applied to convert output scores to a binary classification, if None, no threshold is applied; only used for region figures.
output_formats – The image formats in which to save the output figures.
- src.composition_analysis.save_compositions(compositions: DataFrame, save_loc: str, overwrite: bool = False, aggregate: str | None = None)¶
Saves the composition analysis in a csv file.
- Parameters:
compositions (DataFrame) – Decision region compositions, as output by get_compositions().
save_loc (str) – Save location.
overwrite (bool) – Whether or not to overwrite existing files.
aggregate (str | None) – The aggregation method; used as part of naming convention.
- src.composition_analysis.set_params(plot: str)¶
Sets figure style parameters.
- Parameters:
plot (str) – The type of plot being created (composition, performance or region).