domid.dsets package

Submodules

domid.dsets.a_dset_mnist_color_rgb_solo module

Color MNIST with single color

class domid.dsets.a_dset_mnist_color_rgb_solo.ADsetMNISTColorRGBSolo(ind_color, path, subset_step=100, color_scheme='both', label_transform=<function mk_fun_label2onehot.<locals>.fun_label2onehot>, list_transforms=None, raw_split='train', flag_rand_color=False, inject_variable=None, args=None)

Bases: Dataset

Color MNIST with single color
  1. nominal domains: color palettes/range/spectrum

  2. subdomains: color (foreground, background)

  3. structure: each subdomain contains a combination of foreground and background colors

abstract get_foreground_color(ind)
abstract get_background_color(ind)
abstract get_num_colors()
__init__(ind_color, path, subset_step=100, color_scheme='both', label_transform=<function mk_fun_label2onehot.<locals>.fun_label2onehot>, list_transforms=None, raw_split='train', flag_rand_color=False, inject_variable=None, args=None)
Parameters:
  • ind_color – index of a color palette

  • path – disk storage directory

  • color_scheme – one of 'num' (paint only the digit, i.e. the foreground), 'back' (paint only the background), or 'both' (paint the background and the foreground)

  • list_transforms – torch transformations

  • raw_split – which part of MNIST to use; defaults to the training part ('train')

  • flag_rand_color – if True, paint each image with a random color (deprecated)

  • label_transform – label transformation, e.g. from a class index to a one-hot vector
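The three color_scheme options can be illustrated with a minimal pure-Python sketch. It is not the package's implementation: the helper name colorize, the split of a grayscale digit into foreground and background by a brightness threshold, and the 0.5 threshold value are all assumptions. Pixels that a scheme does not paint keep their gray value in all three channels.

```python
from typing import List, Sequence, Tuple

RGB = Tuple[float, float, float]


def colorize(gray: Sequence[Sequence[float]], fg: RGB, bg: RGB,
             scheme: str = "both", threshold: float = 0.5) -> List[List[RGB]]:
    """Hypothetical helper: paint a grayscale digit image (values in [0, 1]).

    Pixels brighter than `threshold` are treated as the digit (foreground);
    all other pixels are treated as background. This split is an assumption.
    """
    if scheme not in ("num", "back", "both"):
        raise ValueError(f"unknown color_scheme: {scheme!r}")
    paint_fg = scheme in ("num", "both")   # 'num' and 'both' paint the digit
    paint_bg = scheme in ("back", "both")  # 'back' and 'both' paint the background
    out: List[List[RGB]] = []
    for row in gray:
        new_row: List[RGB] = []
        for v in row:
            if v > threshold:
                new_row.append(tuple(fg) if paint_fg else (v, v, v))
            else:
                new_row.append(tuple(bg) if paint_bg else (v, v, v))
        out.append(new_row)
    return out


if __name__ == "__main__":
    digit = [[0.0, 1.0], [1.0, 0.0]]
    red, blue = (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)
    print(colorize(digit, red, blue, "num"))
```

With 'num' the background stays black here because the unpainted background pixels keep their original intensity of 0.0.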

generate_dataframe()

domid.dsets.dset_her2 module

class domid.dsets.dset_her2.DsetHER2(class_num, path, d_dim, inject_variable=None, metadata_path=None, transform=None)

Bases: Dataset

Dataset of HER2-stained digital microscopy images. As currently implemented, the subdomains are the HER2 diagnostic classes 1, 2, and 3. There are also 4 data collection site/machine combinations.

__init__(class_num, path, d_dim, inject_variable=None, metadata_path=None, transform=None)
Parameters:
  • class_num – an integer value from 0 to 2; only images of this class will be kept. Note that the actual HER2 classes are 1 to 3, so 1 is added to class_num internally.

  • path – path to data storage directory (typically passed through args.dpath)

  • d_dim – number of clusters for the clustering task

  • inject_variable – name of the variable to be injected for CDVaDE

  • metadata_path – path to the CSV file containing the to-be-injected variable for CDVaDE (typically passed through args.meta_data_csv); if not specified, it defaults to “dataframe.csv” in the directory given by the “path” argument

  • transform – torch transformations
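The documented class offset and metadata default can be summarized in a small sketch. resolve_her2_args is a hypothetical helper, not part of DsetHER2, and building the default path with os.path.join is an assumption.

```python
import os
from typing import Optional, Tuple


def resolve_her2_args(class_num: int, path: str,
                      metadata_path: Optional[str] = None) -> Tuple[int, str]:
    """Hypothetical helper mirroring the documented DsetHER2 argument rules."""
    if class_num not in (0, 1, 2):
        raise ValueError("class_num must be 0, 1, or 2")
    her2_class = class_num + 1  # the actual HER2 diagnostic classes are 1-3
    if metadata_path is None:
        # documented default: dataframe.csv inside the data directory
        metadata_path = os.path.join(path, "dataframe.csv")
    return her2_class, metadata_path


if __name__ == "__main__":
    print(resolve_her2_args(0, "data"))
```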

domid.dsets.dset_mnist module

MNIST

class domid.dsets.dset_mnist.DsetMNIST(digit, args, list_transforms=None, raw_split='train')

Bases: Dataset

MNIST dataset loading.
  1. subdomains: MNIST digit value

  2. structure: each subdomain contains all images of a given digit

__init__(digit, args, list_transforms=None, raw_split='train')
Parameters:
  • digit – an integer value from 0 to 9; only images of this digit will be kept.

  • path – disk storage directory

  • subset_step – subsampling step; only a fraction of 1/subset_step of the images is kept

  • list_transforms – torch transformations

  • raw_split – which part of MNIST to use; defaults to the training part ('train')
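Assuming the subsampling simply keeps every subset_step-th image (the package may select images differently), the arithmetic looks like this; subsample_indices is a hypothetical helper.

```python
from typing import List


def subsample_indices(num_images: int, subset_step: int) -> List[int]:
    """Keep every subset_step-th index, i.e. about 1/subset_step of the images.

    Strided selection is an assumption; the package may choose indices differently.
    """
    if subset_step < 1:
        raise ValueError("subset_step must be a positive integer")
    return list(range(0, num_images, subset_step))


if __name__ == "__main__":
    kept = subsample_indices(60000, 100)  # 60,000 = size of the MNIST training split
    print(len(kept))  # → 600
```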

domid.dsets.dset_mnist_color_solo_default module

class domid.dsets.dset_mnist_color_solo_default.DsetMNISTColorSoloDefault(ind_color, path, subset_step=100, color_scheme='both', label_transform=<function mk_fun_label2onehot.<locals>.fun_label2onehot>, list_transforms=None, raw_split='train', flag_rand_color=False, inject_variable=None, args=None)

Bases: ADsetMNISTColorRGBSolo

property palette
get_num_colors()
get_background_color(ind)
get_foreground_color(ind)

domid.dsets.dset_unittest module

class domid.dsets.dset_unittest.DsetUnitTest(digit, args, subset_step=1, list_transforms=None)

Bases: Dataset

This dataset is used solely for unit testing of loss values. Each image is a tensor of ones with shape 1x16x16, and each label is a random integer.

__init__(digit, args, subset_step=1, list_transforms=None)
create_the_dataset(dpath)
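A pure-Python sketch of one sample with the shape described for DsetUnitTest, using nested lists in place of a torch tensor. The helper name make_unit_test_sample and the label range 0 to num_classes - 1 (with num_classes=10) are assumptions; a seeded random.Random keeps the labels reproducible.

```python
import random
from typing import List, Tuple

Image = List[List[List[float]]]  # nested lists standing in for a 1x16x16 tensor


def make_unit_test_sample(rng: random.Random, num_classes: int = 10) -> Tuple[Image, int]:
    """Hypothetical helper: one (image, label) pair shaped like DsetUnitTest data."""
    image = [[[1.0] * 16 for _ in range(16)]]  # 1 channel, 16 rows, 16 columns of ones
    label = rng.randrange(num_classes)         # label range is an assumption
    return image, label


if __name__ == "__main__":
    img, lab = make_unit_test_sample(random.Random(0))
    print(len(img), len(img[0]), len(img[0][0]), lab)
```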

domid.dsets.dset_usps module

class domid.dsets.dset_usps.DsetUSPS(digit, args, subset_step=1, list_transforms=None)

Bases: Dataset

__init__(digit, args, subset_step=1, list_transforms=None)
get_original_indicies()

domid.dsets.dset_wsi module

class domid.dsets.dset_wsi.DsetWSI(class_num, path, args, path_to_domain=None, transform=None)

Bases: Dataset

Dataset of WEAH stained digital microscopy images. As currently implemented, the subdomains are the HER2 diagnostic classes 1, 2, and 3. There are also 4 data collection site/machine combinations.

__init__(class_num, path, args, path_to_domain=None, transform=None)
Parameters:
  • class_num – an integer value from 0 to 2; only images of this class will be kept. Note that the actual classes are 1 to 3, so 1 is added to class_num internally.

  • path – path to the root storage directory

  • d_dim – number of clusters for the clustering task

  • path_to_domain – path to the directory of previously predicted domain labels, required when injecting them; the directory must contain domain_labels.txt with the to-be-injected labels.

  • transform – torch transformations
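A sketch of reading the injected labels, assuming domain_labels.txt holds one integer label per line (the file format is not documented here); load_injected_domain_labels is a hypothetical helper.

```python
import os
from typing import List, Optional


def load_injected_domain_labels(path_to_domain: Optional[str]) -> Optional[List[int]]:
    """Hypothetical loader for <path_to_domain>/domain_labels.txt.

    Assumes one integer label per line; returns None when nothing is injected.
    """
    if path_to_domain is None:
        return None
    label_file = os.path.join(path_to_domain, "domain_labels.txt")
    with open(label_file, encoding="utf-8") as f:
        # skip blank lines, e.g. a trailing newline at the end of the file
        return [int(line) for line in f if line.strip()]
```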

domid.dsets.generate_dataset_dataframe_her2 module

domid.dsets.generate_dataset_dataframe_her2.get_jpg_folders(path)

Keep only the folders of .jpg images; by convention, the names of these folders end in "jpg".
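The folder convention can be sketched as follows. list_jpg_folders is a hypothetical stand-in for get_jpg_folders, and inspecting only the direct subdirectories of path and returning sorted names are assumptions.

```python
import os
from typing import List


def list_jpg_folders(path: str) -> List[str]:
    """Sketch of the documented convention: keep subfolders whose names end in 'jpg'.

    Only direct subdirectories of `path` are inspected (an assumption).
    """
    return sorted(
        name
        for name in os.listdir(path)
        if os.path.isdir(os.path.join(path, name)) and name.endswith("jpg")
    )
```

Checking os.path.isdir keeps stray files whose names happen to end in "jpg" out of the result.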

domid.dsets.generate_dataset_dataframe_her2.total_count_images(path)
domid.dsets.generate_dataset_dataframe_her2.parse_machine_labels(image_names)
domid.dsets.generate_dataset_dataframe_her2.mean_scores_per_experiment(scores, img_locs)

Parser that gets the mean score per image from the CSV file. The image names in the folders differ slightly from the names in the CSV file.

domid.dsets.make_graph module

class domid.dsets.make_graph.GraphConstructor(graph_method, topk=7)

Bases: object

Class to construct a graph from features. It is used only when training the SDCN model.

__init__(graph_method, topk=7)

Initializer of GraphConstructor.

Parameters:
  • graph_method – the method used to calculate the distance between features; one of ‘heat’, ‘cos’, ‘ncos’

  • topk – number of connections per image

sparse_mx_to_torch_sparse_tensor(sparse_mx)

Convert a scipy sparse matrix to a torch sparse tensor.

get_features_labels(dataset)

This function is used to get features and labels from the dataset.

Parameters:
  • dataset – image dataset that can be batched or unbatched

Returns:
  • X – features from the images (flattened images)

  • labels – domain labels

  • region_labels – region labels, if the dataset consists of WSI images

normalize(mx)

Row-normalize a sparse matrix; this is used to calculate the distance for the normalized cosine method.

Parameters:
  • mx – sparse matrix

Returns:
  row-normalized sparse matrix
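A minimal sketch of row normalization, assuming it means dividing each entry by its row sum (as in the SDCN reference code). A dict of (row, col) entries stands in for a scipy sparse matrix, and the helper name row_normalize is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Tuple

Sparse = Dict[Tuple[int, int], float]  # (row, col) -> value; a stand-in for scipy.sparse


def row_normalize(mx: Sparse) -> Sparse:
    """Divide every stored entry by the sum of its row.

    Rows whose sum is zero are left unchanged to avoid division by zero.
    """
    row_sums: Dict[int, float] = defaultdict(float)
    for (i, _), v in mx.items():
        row_sums[i] += v
    return {
        (i, j): (v / row_sums[i] if row_sums[i] != 0 else v)
        for (i, j), v in mx.items()
    }
```

After normalization each nonzero row sums to 1, so a dot product between two normalized binary feature rows is bounded by 1.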

distance_calc(features)

This function is used to calculate the distance between features.

Parameters:
  • features – the batch of features from the dataset

Returns:
  distance matrix between the features of the batch of images, with shape (num_img, num_img)
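For the ‘heat’ method, the SDCN reference code uses the heat-kernel similarity exp(-0.5 * ||x_i - x_j||^2), which grows as features get closer; whether distance_calc uses exactly this form is an assumption. A pure-Python sketch with a hypothetical helper name:

```python
import math
from typing import List, Sequence


def heat_kernel_matrix(features: Sequence[Sequence[float]]) -> List[List[float]]:
    """Pairwise heat-kernel similarity exp(-0.5 * ||x_i - x_j||^2).

    Larger values mean more similar images; the diagonal is always 1.0.
    """
    n = len(features)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
            sim[i][j] = math.exp(-0.5 * sq_dist)
    return sim
```

The result has the documented (num_img, num_img) shape and is symmetric.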

connection_calc(features)

This function is used to calculate the connection pairs between images for all the batches of the dataset.

Parameters:
  • features – flattened images from the batch of the dataset

Returns:
  indices of the top k connections for each image in the batch (shape: (num_img*self.topk, 2))
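Given a similarity matrix such as the heat kernel (larger means closer), the top-k connection pairs can be sketched as below. The helper name, the exclusion of self-pairs, and the tie-breaking rule are assumptions; if the matrix holds true distances, the smallest values should be picked instead.

```python
from typing import List, Sequence, Tuple


def topk_connection_pairs(sim: Sequence[Sequence[float]], topk: int) -> List[Tuple[int, int]]:
    """Pair each image i with the topk most similar other images j.

    Returns n * topk (i, j) pairs, matching the documented
    (num_img * topk, 2) shape. Self-pairs are excluded; ties go to the lower index.
    """
    n = len(sim)
    if not 0 < topk < n:
        raise ValueError("topk must be between 1 and num_img - 1")
    pairs: List[Tuple[int, int]] = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        others.sort(key=lambda j: -sim[i][j])  # stable sort keeps lower index on ties
        pairs.extend((i, j) for j in others[:topk])
    return pairs
```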

mk_adj_mat(n, connection_pairs)

This function is used to make the adjacency matrix of the graph for each batch of the dataset.

Parameters:
  • n – batch size

  • connection_pairs – top k connections for each image in the batch (shape: (num_img*self.topk, 2))
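A sketch of turning connection pairs into a dense adjacency matrix. Symmetrizing the edges and the optional self-loops follow the graph loading in the SDCN reference code, but whether mk_adj_mat does the same is an assumption; make_adjacency is a hypothetical name.

```python
from typing import Iterable, List, Tuple


def make_adjacency(n: int, connection_pairs: Iterable[Tuple[int, int]],
                   add_self_loops: bool = False) -> List[List[int]]:
    """Dense, symmetric 0/1 adjacency matrix for a batch of n images.

    An edge (i, j) also sets (j, i), so the graph is undirected. Whether the
    package symmetrizes or adds self-loops is an assumption.
    """
    adj = [[0] * n for _ in range(n)]
    for i, j in connection_pairs:
        adj[i][j] = 1
        adj[j][i] = 1  # symmetrize
    if add_self_loops:
        for i in range(n):
            adj[i][i] = 1
    return adj
```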

construct_graph(dataset, experiment_folder)

This function is used to construct the graph for all the batches of the dataset. It is called in the trainer function of the SDCN model. The graph construction method is the graph_method passed to the initializer.

Parameters:
  • dataset – dataset containing all the batches of data (or unbatched data)

Returns:
  the adjacency matrix for all the batches of data

domid.dsets.make_graph_wsi module

class domid.dsets.make_graph_wsi.GraphConstructorWSI(graph_method, topk=7)

Bases: GraphConstructor

Class to construct a graph from the features of WSI images. It is used only when training the SDCN model on the WSI dataset.

__init__(graph_method, topk=7)

Initializer of GraphConstructorWSI.

Parameters:
  • graph_method – the method used to calculate the distance between features; one of ‘heat’, ‘cos’, ‘ncos’, ‘patch_distance’

  • topk – number of connections per image

distance_calc_wsi(features=None, coordinates=None)

This function is used to calculate the distance between features.

Parameters:
  • features – the batch of features from the dataset

  • coordinates – if the images (patches in the batch) have coordinates specified, the distance between them can be calculated from the coordinates

Returns:
  distance matrix between the features of the batch of images, with shape (num_img, num_img)

connection_calc(features, region_labels)

This function is used to calculate the connection pairs between images for all the batches of the dataset.

Parameters:
  • features – flattened images from the batch of the dataset

  • region_labels – spatial information about the patches, used to calculate the distance between them (e.g. the string ‘1Carcinoma_coord_39100_39573_patchnumber_98_xy_0_0.png’)

Returns:
  indices of the top k connections for each image in the batch (shape: (num_img*self.topk, 2))
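The region label string suggests how a ‘patch_distance’ could be computed. This sketch assumes the two integers after "_coord_" are the patch's x and y coordinates and uses the Euclidean distance between them; both the parsing rule and the helper names are hypothetical.

```python
import math
import re
from typing import List, Sequence, Tuple

# Assumed layout: "<region>_coord_<x>_<y>_patchnumber_<k>_xy_<a>_<b>.png"
_COORD_RE = re.compile(r"_coord_(\d+)_(\d+)_")


def parse_patch_coordinates(region_label: str) -> Tuple[int, int]:
    """Extract the two integers following '_coord_' (assumed to be x and y)."""
    match = _COORD_RE.search(region_label)
    if match is None:
        raise ValueError(f"no coordinates found in {region_label!r}")
    return int(match.group(1)), int(match.group(2))


def patch_distance_matrix(region_labels: Sequence[str]) -> List[List[float]]:
    """Euclidean distance between patch coordinates, shape (num_img, num_img)."""
    coords = [parse_patch_coordinates(label) for label in region_labels]
    return [[math.dist(p, q) for q in coords] for p in coords]
```

Because these are true distances, a top-k step built on this matrix would select the smallest values rather than the largest. math.dist requires Python 3.8 or later.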

construct_graph(features, img_ids, experiment_folder)

This function is used to construct the graph for all the batches of the dataset. It is called in the trainer function of the SDCN model.

Parameters:
  • features – flattened images from the batch of the dataset

  • img_ids

  • experiment_folder

Returns:
  the adjacency matrix for one batch of data

Module contents