domid.dsets package¶
Submodules¶
domid.dsets.a_dset_mnist_color_rgb_solo module¶
Color MNIST with single color
- class domid.dsets.a_dset_mnist_color_rgb_solo.ADsetMNISTColorRGBSolo(ind_color, path, subset_step=100, color_scheme='both', label_transform=<function mk_fun_label2onehot.<locals>.fun_label2onehot>, list_transforms=None, raw_split='train', flag_rand_color=False, inject_variable=None, args=None)[source]¶
Bases:
Dataset
- Color MNIST with single color
nominal domains: color palettes/range/spectrum
subdomains: color(foreground, background)
structure: each subdomain contains a combination of foreground+background color
- __init__(ind_color, path, subset_step=100, color_scheme='both', label_transform=<function mk_fun_label2onehot.<locals>.fun_label2onehot>, list_transforms=None, raw_split='train', flag_rand_color=False, inject_variable=None, args=None)[source]¶
- Parameters:
ind_color – index of a color palette
path – disk storage directory
color_scheme – num (paint according to number), back (only paint background), both (background and foreground)
list_transforms – torch transformations
raw_split – which raw split to use; defaults to the training part of MNIST
flag_rand_color – whether to randomly paint each image (deprecated)
label_transform – e.g. index to one hot vector
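As a rough illustration of the `color_scheme='both'` and `label_transform` parameters above, the sketch below paints a grayscale digit with one foreground and one background color and converts a class index to a one-hot vector. The helper names `paint_both` and `label2onehot` and the example colors are hypothetical; this is not DomId's implementation, only one plausible reading of the parameter descriptions.

```python
import numpy as np


def paint_both(gray, fg_rgb, bg_rgb):
    """Blend a grayscale image (H, W) in [0, 1] into an RGB image (H, W, 3).

    Hypothetical helper: intensity 1 maps to the foreground color,
    intensity 0 to the background color.
    """
    gray = np.asarray(gray, dtype=np.float32)[..., None]  # (H, W, 1)
    fg = np.asarray(fg_rgb, dtype=np.float32)
    bg = np.asarray(bg_rgb, dtype=np.float32)
    return gray * fg + (1.0 - gray) * bg


def label2onehot(label, num_classes=10):
    """Hypothetical stand-in for a label transform: index -> one-hot vector."""
    onehot = np.zeros(num_classes, dtype=np.float32)
    onehot[label] = 1.0
    return onehot


digit = np.zeros((28, 28), dtype=np.float32)
digit[10:18, 12:16] = 1.0  # a crude stroke standing in for an MNIST digit
rgb = paint_both(digit, fg_rgb=(1.0, 0.0, 0.0), bg_rgb=(0.0, 0.0, 1.0))
target = label2onehot(3)
```

Here stroke pixels become red and everything else blue; a palette indexed by `ind_color` would presumably supply such color pairs.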
domid.dsets.dset_her2 module¶
- class domid.dsets.dset_her2.DsetHER2(class_num, path, d_dim, inject_variable=None, metadata_path=None, transform=None)[source]¶
Bases:
Dataset
Dataset of HER2 stained digital microscopy images. As currently implemented, the subdomains are the HER2 diagnostic classes 1, 2, and 3. There are also 4 data collection site/machine combinations.
- __init__(class_num, path, d_dim, inject_variable=None, metadata_path=None, transform=None)[source]¶
- Parameters:
class_num – an integer value from 0 to 2; only images of this class will be kept. Note that the actual classes are 1 to 3 (therefore, 1 is added in line 28 of the source code)
path – path to data storage directory (typically passed through args.dpath)
d_dim – number of clusters for the clustering task
inject_variable – name of the variable to be injected for CDVaDE
metadata_path – path to the CSV file containing the to-be-injected variable for CDVaDE (typically passed through args.meta_data_csv); if not specified, defaults to “dataframe.csv” in the directory given by the “path” argument
transform – torch transformations
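The two conventions described in the parameters above (the zero-based `class_num` and the default metadata file) can be sketched as follows. The function names `resolve_metadata_path` and `her2_class_from_index` are hypothetical and only mirror the documented behavior; the actual logic lives inside `DsetHER2`.

```python
import os


def resolve_metadata_path(path, metadata_path=None):
    """Return the metadata CSV path, defaulting to "dataframe.csv" in `path`."""
    if metadata_path is None:
        return os.path.join(path, "dataframe.csv")
    return metadata_path


def her2_class_from_index(class_num):
    """Map the zero-based class_num (0..2) to the HER2 class (1..3)."""
    if class_num not in (0, 1, 2):
        raise ValueError("class_num must be 0, 1, or 2")
    return class_num + 1


default_csv = resolve_metadata_path("data/her2")
explicit_csv = resolve_metadata_path("data/her2", "meta/injected.csv")
her2_class = her2_class_from_index(0)
```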
domid.dsets.dset_mnist module¶
MNIST
- class domid.dsets.dset_mnist.DsetMNIST(digit, args, list_transforms=None, raw_split='train')[source]¶
Bases:
Dataset
MNIST dataset loading
subdomains: MNIST digit value
structure: each subdomain contains all images of a given digit
- __init__(digit, args, list_transforms=None, raw_split='train')[source]¶
- Parameters:
digit – an integer value from 0 to 9; only images of this digit will be kept.
path – disk storage directory
subset_step – used to subsample the dataset; a fraction of 1/subset_step images is kept
list_transforms – torch transformations
raw_split – which raw split to use; defaults to the training part of MNIST
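The combined effect of `digit` and `subset_step` can be illustrated on synthetic labels. This is a minimal sketch under an assumption: the order of operations (subsample first, then filter by digit) is a guess, and `select_digit_subset` is a hypothetical helper rather than part of `DsetMNIST`.

```python
import numpy as np


def select_digit_subset(labels, digit, subset_step=100):
    """Return indices of the images kept for one digit subdomain.

    Assumption: every subset_step-th image is kept first, then only
    images of the requested digit. The real order may differ.
    """
    labels = np.asarray(labels)
    kept = np.arange(len(labels))[::subset_step]  # keep 1/subset_step of the data
    return kept[labels[kept] == digit]


labels = np.arange(1000) % 10  # synthetic labels cycling through 0..9
idx = select_digit_subset(labels, digit=0, subset_step=3)
```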
domid.dsets.dset_mnist_color_solo_default module¶
- class domid.dsets.dset_mnist_color_solo_default.DsetMNISTColorSoloDefault(ind_color, path, subset_step=100, color_scheme='both', label_transform=<function mk_fun_label2onehot.<locals>.fun_label2onehot>, list_transforms=None, raw_split='train', flag_rand_color=False, inject_variable=None, args=None)[source]¶
Bases:
ADsetMNISTColorRGBSolo
- property palette¶
domid.dsets.dset_unittest module¶
domid.dsets.dset_usps module¶
domid.dsets.dset_wsi module¶
- class domid.dsets.dset_wsi.DsetWSI(class_num, path, args, path_to_domain=None, transform=None)[source]¶
Bases:
Dataset
Dataset of WEAH stained digital microscopy images. As currently implemented, the subdomains are the HER2 diagnostic classes 1, 2, and 3. There are also 4 data collection site/machine combinations.
- __init__(class_num, path, args, path_to_domain=None, transform=None)[source]¶
- Parameters:
class_num – an integer value from 0 to 2; only images of this class will be kept. Note that the actual classes are 1 to 3 (therefore, 1 is added in line 28)
path – path to root storage directory
d_dim – number of clusters for the clustering task
path_to_domain – path to a directory with previously predicted domain labels; must be specified if such labels are to be injected. The directory must contain domain_labels.txt with the to-be-injected labels.
transform – torch transformations
domid.dsets.generate_dataset_dataframe_her2 module¶
domid.dsets.make_graph module¶
- class domid.dsets.make_graph.GraphConstructor(graph_method, topk=7)[source]¶
Bases:
object
Class to construct graph from features. This is only used in training for SDCN model.
- __init__(graph_method, topk=7)[source]¶
Initializer of GraphConstructor.
- Parameters:
graph_method – the method to calculate distance between features; one of ‘heat’, ‘cos’, ‘ncos’.
topk – number of connections per image
- sparse_mx_to_torch_sparse_tensor(sparse_mx)[source]¶
Convert a scipy sparse matrix to a torch sparse tensor.
- get_features_labels(dataset)[source]¶
Get features and labels from the dataset.
- Parameters:
dataset – image dataset that can be batched or unbatched
- Returns:
X: features from the image (flattened images); labels: domain labels; region_labels: region labels if the dataset is WSI images
- normalize(mx)[source]¶
Row-normalize a sparse matrix; used to calculate the distance for the normalized cosine method.
- Parameters:
mx – sparse matrix
- Returns:
row-normalized sparse matrix
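Row normalization of a sparse matrix is commonly written as below. This is a sketch of the general technique using `scipy.sparse`, not a copy of `GraphConstructor.normalize`; the function name `row_normalize` and the handling of all-zero rows are assumptions.

```python
import numpy as np
import scipy.sparse as sp


def row_normalize(mx):
    """Scale each row of a sparse matrix so that it sums to 1.

    Rows that sum to zero are left as all zeros (assumed behavior).
    """
    rowsum = np.asarray(mx.sum(axis=1)).flatten().astype(np.float64)
    r_inv = np.zeros_like(rowsum)
    nonzero = rowsum != 0
    r_inv[nonzero] = 1.0 / rowsum[nonzero]
    # Left-multiplying by a diagonal matrix rescales the rows.
    return sp.diags(r_inv).dot(mx)


mx = sp.csr_matrix(np.array([[1.0, 1.0, 0.0],
                             [0.0, 0.0, 0.0],
                             [2.0, 0.0, 2.0]]))
normalized = row_normalize(mx)
```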
- distance_calc(features)[source]¶
Calculate the distance between features.
- Parameters:
features – the batch of features from the dataset
- Returns:
distance matrix between features of the batch of images, with shape (num_img, num_img)
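To make the (num_img, num_img) output concrete, the sketch below computes two pairwise matrices with NumPy. It assumes that ‘heat’ denotes a Gaussian (heat) kernel on Euclidean distances and ‘cos’ denotes cosine similarity; DomId's actual formulas and any preprocessing may differ, ‘ncos’ is omitted, and `similarity_matrix` is a hypothetical name. Note that for both variants larger values mean more similar images.

```python
import numpy as np


def similarity_matrix(features, method="heat"):
    """Return a (num_img, num_img) pairwise matrix for flattened features."""
    features = np.asarray(features, dtype=np.float64)
    if method == "heat":
        # Squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b
        sq = np.sum(features ** 2, axis=1)
        sq_dist = sq[:, None] + sq[None, :] - 2.0 * features @ features.T
        sq_dist = np.maximum(sq_dist, 0.0)  # guard against rounding below zero
        return np.exp(-0.5 * sq_dist)
    if method == "cos":
        norms = np.linalg.norm(features, axis=1, keepdims=True)
        unit = features / np.maximum(norms, 1e-12)
        return unit @ unit.T
    raise ValueError(f"unsupported method in this sketch: {method!r}")


rng = np.random.default_rng(0)
feats = rng.random((5, 4)) + 0.1  # 5 synthetic "images" with 4 features each
heat = similarity_matrix(feats, "heat")
cos = similarity_matrix(feats, "cos")
```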
- connection_calc(features)[source]¶
Calculate the connection pairs between images for all batches of the dataset.
- Parameters:
features – flattened images from the batch of the dataset
- Returns:
indices of the top k connections for each image in the batch (shape: (num_img*self.topk, 2))
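The shape (num_img*self.topk, 2) arises when each image contributes `topk` (source, neighbor) index pairs. A minimal sketch of such a top-k selection from a similarity matrix follows; excluding self-connections is an assumption, and `topk_connections` is a hypothetical helper, not the library method.

```python
import numpy as np


def topk_connections(sim, topk=7):
    """Return an array of shape (num_img * topk, 2) of (source, neighbor) pairs.

    Assumes larger values in `sim` mean more similar and topk < num_img.
    """
    sim = np.array(sim, dtype=np.float64)  # copy so the input is not modified
    np.fill_diagonal(sim, -np.inf)  # never pick an image as its own neighbor
    n = sim.shape[0]
    neighbors = np.argsort(-sim, axis=1)[:, :topk]  # (n, topk), most similar first
    sources = np.repeat(np.arange(n), topk)
    return np.stack([sources, neighbors.ravel()], axis=1)


sim = np.array([[1.0, 0.9, 0.1],
                [0.9, 1.0, 0.2],
                [0.1, 0.2, 1.0]])
pairs = topk_connections(sim, topk=1)
```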
- mk_adj_mat(n, connection_pairs)[source]¶
Make the adjacency matrix of the graph for each batch of the dataset.
- Parameters:
n – batch size
connection_pairs – top k connections for each image in the batch (shape: (num_img*self.topk, 2))
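Turning connection pairs into an adjacency matrix can be sketched as follows. A dense NumPy array is used here for readability, whereas the library works with sparse matrices; symmetrizing the graph is an assumption, and `adjacency_from_pairs` is a hypothetical name.

```python
import numpy as np


def adjacency_from_pairs(n, connection_pairs):
    """Build a dense (n, n) 0/1 adjacency matrix from (source, neighbor) pairs."""
    pairs = np.asarray(connection_pairs, dtype=np.int64)
    adj = np.zeros((n, n), dtype=np.float32)
    adj[pairs[:, 0], pairs[:, 1]] = 1.0
    # Assumption: treat the graph as undirected by mirroring every edge.
    return np.maximum(adj, adj.T)


adj = adjacency_from_pairs(3, [[0, 1], [1, 0], [2, 1]])
```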
- construct_graph(dataset, experiment_folder)[source]¶
Construct the graph for all batches of the dataset. This is called in the trainer function of the SDCN model.
- Parameters:
dataset – dataset containing all the batches of data (or unbatched data)
graph_method – graph construction method
- Returns:
the adjacency matrix for all the batches of data
domid.dsets.make_graph_wsi module¶
- class domid.dsets.make_graph_wsi.GraphConstructorWSI(graph_method, topk=7)[source]¶
Bases:
GraphConstructor
Class to construct graph from features from WSI images. This is only used in training for SDCN model and for WSI dataset.
- __init__(graph_method, topk=7)[source]¶
Initializer of GraphConstructor.
- Parameters:
graph_method – the method to calculate distance between features; one of ‘heat’, ‘cos’, ‘ncos’, ‘patch_distance’.
topk – number of connections per image
- distance_calc_wsi(features=None, coordinates=None)[source]¶
Calculate the distance between features.
- Parameters:
features – the batch of features from the dataset
coordinates – if the images (patches in the batch) have coordinates specified, the distance between them can be calculated from the coordinates
- Returns:
distance matrix between features of the batch of images, with shape (num_img, num_img)
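When coordinates are available, a coordinate-based distance matrix of shape (num_img, num_img) could look like the sketch below. Plain Euclidean distance between patch coordinates is an assumption about what ‘patch_distance’ means, and `coordinate_distance` is a hypothetical helper.

```python
import numpy as np


def coordinate_distance(coordinates):
    """Pairwise Euclidean distances between (x, y) patch coordinates."""
    coords = np.asarray(coordinates, dtype=np.float64)  # (num_img, 2)
    diff = coords[:, None, :] - coords[None, :, :]  # (num_img, num_img, 2)
    return np.sqrt((diff ** 2).sum(axis=-1))


dist = coordinate_distance([[0, 0], [3, 4], [6, 8]])
```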
- connection_calc(features, region_labels)[source]¶
Calculate the connection pairs between images for all batches of the dataset.
- Parameters:
features – flattened images from the batch of the dataset
region_labels – spatial information about the patches, used to calculate the distance between them (e.g. the string ‘1Carcinoma_coord_39100_39573_patchnumber_98_xy_0_0.png’)
- Returns:
indices of the top k connections for each image in the batch (shape: (num_img*self.topk, 2))
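Spatial information can be recovered from a region label such as the example string above with a regular expression. The sketch below only extracts the numeric fields; what `coord` and `xy` mean (for instance region origin versus patch position) is not stated in this page, so the dictionary keys simply reuse the tokens from the file name, and `parse_region_label` is a hypothetical helper.

```python
import re

# Matches the numeric fields of names like
# "1Carcinoma_coord_39100_39573_patchnumber_98_xy_0_0.png".
_PATTERN = re.compile(r"coord_(\d+)_(\d+)_patchnumber_(\d+)_xy_(\d+)_(\d+)")


def parse_region_label(name):
    """Extract the integer fields encoded in a region label string."""
    match = _PATTERN.search(name)
    if match is None:
        raise ValueError(f"unrecognised region label: {name!r}")
    c0, c1, patch, x, y = (int(group) for group in match.groups())
    return {"coord": (c0, c1), "patchnumber": patch, "xy": (x, y)}


info = parse_region_label("1Carcinoma_coord_39100_39573_patchnumber_98_xy_0_0.png")
```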
- construct_graph(features, img_ids, experiment_folder)[source]¶
Construct the graph for all batches of the dataset. This is called in the trainer function of the SDCN model.
- Parameters:
features – flattened images from the batch of the dataset
img_ids – (undocumented)
experiment_folder – (undocumented)
- Returns:
the adjacency matrix for one batch of data