bias.myti.report GUI
- class src.AboutWindow
Class for the “about me” window. Includes the GitHub link, version number, and other necessary information.
- load_GUI()
Creates widget components for the window.
- class src.ClickLabel
Clickable QLabel object.
- mousePressEvent(event)
Mouse click event for the QLabel.
- class src.FinalPage(parent, page_number=3)
Class to build the third page (report display). This page shows the bias results as figures with corresponding text descriptions, and allows the user to save the report in .png or .pdf format.
- UIComponents()
Widgets for the page.
- check_conditions()
Sanity check to decide whether the third page can be loaded appropriately.
- fig_select(figure_number)
Update the figure selected by the user.
- run_background()
Generate figures and descriptions.
- save_fig()
Save the figure and description.
- class src.InitialPage(parent)
Class for the first page (user input) of the tool. This page allows users to upload a .csv data file and specify the bias amplification type and study type.
- Parameters:
parent – The parent widget.
- UIComponents()
Creates the widgets for the page.
- approach_type()
Update the bias amplification type selected by the user.
- study_update()
Update the study type selected by the user.
- upload_csv()
Get the .csv file from user input, and display the amplification type selection.
- class src.MainWindow(*args, **kwargs)
Class for the main window, including the main pages, logo, side menus, and navigation bars.
- about_info()
Show the “about me” window.
- change_page(page_number: int, *args, **kwargs)
Navigate between the three pages.
- load_GUI()
Creates widgets for the main window.
Adds navigation buttons.
- make_sidebar()
Set up the side menu bars.
- next_page()
Navigate to the next page.
- prev_page()
Navigate to the previous page.
- set_pages()
Set up the three pages.
Adjust the page navigation buttons for each page.
- class src.Page(parent, page_number)
Parent class for all the pages.
- UIComponents()
Placeholder for child classes.
- check_conditions(*conditions)
Conditions are variables that must be provided in order for the page to load. This is the default for pages with no conditions.
- load()
Run background processes and load UI components, provided the conditions are met.
- run_background()
Runs any background work that must complete before the page can be loaded. This is the default for pages with no background work.
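The load sequence described for Page can be sketched without Qt as a plain template-method class. This is a minimal illustration under assumptions: check_conditions is taken to return a bool, load is assumed to run check_conditions, then run_background, then UIComponents, and VariablePage (with its fixed column list) is a hypothetical stand-in for a page such as SecondPage, not the tool's actual class.

```python
class Page:
    """Hypothetical, Qt-free sketch of the documented page lifecycle."""

    def __init__(self, page_number):
        self.page_number = page_number
        self.loaded = False

    def check_conditions(self, *conditions):
        # Default: a page with no conditions can always load; otherwise
        # every required input must be set (truthy).
        return all(conditions)

    def run_background(self):
        # Default: no background work.
        pass

    def UIComponents(self):
        # Placeholder for child classes.
        pass

    def load(self):
        # Only build the page when its required inputs are present.
        if not self.check_conditions():
            return False
        self.run_background()
        self.UIComponents()
        self.loaded = True
        return True


class VariablePage(Page):
    """Hypothetical page that needs an uploaded CSV before it can load."""

    def __init__(self, csv_path=None):
        super().__init__(page_number=2)
        self.csv_path = csv_path
        self.columns = []

    def check_conditions(self):
        # The variable-selection step requires a CSV path.
        return super().check_conditions(self.csv_path)

    def run_background(self):
        # Stand-in for reading the CSV header (hypothetical fixed columns).
        self.columns = ["patient_id", "sex", "score"]
```

Calling load() on a VariablePage without a csv_path returns False and leaves the page unloaded; setting csv_path first lets the background work and widget creation run.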
- class src.SecondPage(parent)
Class to build the second page (variable selection). This page allows the user to select the columns corresponding to the variables required for the bias report. Additional information about each variable is provided for clarification.
- Parameters:
parent – The parent widget.
- UIComponents()
Creates the widgets for the second page.
- add_info()
Update the additional information for the variable clicked by the user.
- check_boxes()
Update the selected variables.
- check_conditions()
Sanity check to decide whether the second page can be loaded appropriately.
- get_columns()
Get the list of columns from the input .csv file.
- src.bias_plots_generation(variables, csv_path, exp_type, study_type, colors: list, set_colors: dict, name_mapping: dict)
Function to load the inputs from the user and generate the report figures and descriptions.
- Parameters:
variables – dictionary that contains the user-specified column name for every variable
csv_path – path to the csv file that contains the data
exp_type – string indicating the bias amplification type
study_type – string indicating the study type
colors – Color palette.
set_colors – Colors assigned to specific subgroups.
name_mapping – Maps subgroup labels to names in the plot legend.
- Returns:
m_list (list) – list containing the plotted metrics
info_list (list) – list containing the report description text corresponding to each metric
- src.calculate_CI(df, mean_col='Mean', std_col='Std', confidence_level=0.95, sample_size=25)
Function to calculate the confidence interval from the standard deviation, confidence level, and sample size.
- Parameters:
df – dataframe that contains the data to calculate the CI
mean_col – name of the column with the mean value
std_col – name of the column with the standard deviation
confidence_level – confidence level; currently supports 0.9, 0.95 and 0.99
sample_size – number of samples for each experiment in the data
- Returns:
dataframe with the computed lower and upper confidence interval bounds added
- Return type:
pandas.DataFrame
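A minimal sketch of the interval computation, assuming a normal approximation: each bound is Mean ± z · Std / sqrt(sample_size), with z looked up for the three supported confidence levels. Whether the tool uses z or Student's t critical values is not stated here, so treat that choice as an assumption. The sketch also works on a list of dict rows rather than a pandas.DataFrame, and borrows the lower_CI/upper_CI column names from figure_plotting's defaults; calculate_ci_rows is a hypothetical name.

```python
import math

# Two-sided z critical values for the supported confidence levels (assumed).
Z_VALUES = {0.90: 1.645, 0.95: 1.960, 0.99: 2.576}


def calculate_ci_rows(rows, mean_col="Mean", std_col="Std",
                      confidence_level=0.95, sample_size=25):
    """Return copies of the rows with normal-approximation CI bounds added."""
    if confidence_level not in Z_VALUES:
        raise ValueError(f"unsupported confidence level: {confidence_level}")
    z = Z_VALUES[confidence_level]
    out = []
    for row in rows:
        # Half-width = z * standard error of the mean.
        half_width = z * row[std_col] / math.sqrt(sample_size)
        new_row = dict(row)  # leave the input row untouched
        new_row["lower_CI"] = row[mean_col] - half_width
        new_row["upper_CI"] = row[mean_col] + half_width
        out.append(new_row)
    return out
```

For a mean of 0.80 with a standard deviation of 0.05 over 25 samples at the 0.95 level, the half-width is 1.96 × 0.05 / 5 = 0.0196, giving bounds of 0.7804 and 0.8196.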
- src.create_report(metrics, img_path, study_type, exp_type, save_path)
Create and save the report.
- Parameters:
metrics – list containing the metrics to save in the report
img_path – list containing the paths of the saved figures
study_type – string indicating the study type
exp_type – string indicating the bias amplification type
save_path – path at which to save the final report
- src.figure_plotting(data, x_col, s_col, hue_col, study_type, ylim=(0, 1), y_label=None, x_label=None, style_col=None, style_dict={}, color_dict={}, mean_col='Mean', lower_CI_col='lower_CI', upper_CI_col='upper_CI', plot_section=[], name_map={})
Function to generate subplots from the input plot sections and parameters.
- Parameters:
data – dataframe that contains the data for plotting
x_col – name of the column that contains the x-axis ticks
s_col – name of the column that contains the sub-sections in the figure
hue_col – name of the column that contains the subgroups mapped to different colors during plotting
study_type – string indicating the study type
ylim – y-limits for all axes
y_label – y-axis label
x_label – x-axis label
style_col – name of the column that determines line styles by positive-associated subgroup
style_dict – dictionary that determines plotting styles
color_dict – dictionary that determines plotting colors
mean_col – name of the column that contains the metric mean value
lower_CI_col – name of the column that contains the lower bound of the confidence interval
upper_CI_col – name of the column that contains the upper bound of the confidence interval
plot_section – list of all the sub-sections to plot (including the legend section)
name_map – Maps subgroup labels to display names in the plot legend.
- src.on_page(canvas, doc, pagesize=(612.0, 792.0))
Set up the page header and footer.
Bias Amplification Implementation
- class src.utils.Dataset(list_file, train_flag=True, default_out_class='Yes', default_patient_id='patient_id', default_path='Path')
Class for a customized dataset.
- Parameters:
list_file (str) – The file path to the file containing the input information; to be read into a pandas.DataFrame.
train_flag (bool) – Training process indicator.
default_out_class (str) – The name of the column in list_file that indicates the model’s output class.
default_patient_id (str) – The name of the column in list_file that indicates the sample’s patient id.
default_path (str) – The name of the column in list_file that indicates the sample’s file path.
- src.utils.adjust_comp(in_df, random_seed, split_frac=1, split_num=None)
Adjusts the composition of in_df to match the specified composition.
- Parameters:
in_df (pandas.DataFrame) – input dataframe (by-patient).
random_seed (int) – The random state to use when selecting patients in each subgroup.
split_frac (float) – The portion of the available data to return.
split_num (int) – The number of patients overall to return.
- Returns:
Dataframe adjusted to match the specified composition.
- Return type:
pandas.DataFrame
- src.utils.adjust_subgroups(in_df)
Adjusts the subgroup information displayed in the dataframe to reflect only the attributes that are specified as important in equal_stratification_groups.
- Parameters:
in_df (pandas.DataFrame) – The sample data to be balanced.
- Returns:
The sample data with filtered subgroups.
- Return type:
pandas.DataFrame
- src.utils.analysis(args)
Main script to load test results, measure bias, and generate plots.
- Parameters:
args (argparse.Namespace) – The input arguments to the python script.
- src.utils.apply_custom_transfer_learning__resnet18(net)
Freezes a certain number of the first layers of the ResNet18 model.
- Parameters:
net – PyTorch model to freeze.
- Return type:
net
- src.utils.bootstrapping(args)
Partitions patient data into training, validation, validation_2, and test sets. Within each set, patients can be partitioned equally or by a customized ratio across race, sex, image modality, and COVID status.
- Parameters:
args (argparse.Namespace) – The input arguments to the python script.
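The stratified partitioning described above can be illustrated with a small sketch. It assumes each patient record carries a single combined subgroup label (for example a race/sex combination) and splits patients rather than images, so no patient lands in two partitions. Applying the same fractions within every subgroup keeps each subgroup's share of the input in every partition. partition_patients and its record keys are hypothetical, and the tool's customized-ratio options are not modeled.

```python
import random
from collections import defaultdict


def partition_patients(patients, fractions, seed=0):
    """Split patient ids into named partitions, stratified by subgroup.

    patients: list of dicts with hypothetical "patient_id" and "subgroup" keys.
    fractions: mapping of partition name -> fraction of each subgroup.
    """
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for p in patients:
        by_group[p["subgroup"]].append(p["patient_id"])

    names = list(fractions)
    parts = {name: [] for name in names}
    for _, ids in sorted(by_group.items()):
        ids = sorted(ids)  # deterministic order before shuffling
        rng.shuffle(ids)
        start = 0
        for i, name in enumerate(names):
            if i == len(names) - 1:
                stop = len(ids)  # last partition takes the remainder
            else:
                stop = min(len(ids), start + round(fractions[name] * len(ids)))
            parts[name].extend(ids[start:stop])
            start = stop
    return parts
```

With 10 patients in each of two subgroups and fractions of 0.6/0.1/0.1/0.2, the training partition receives 6 patients from each subgroup.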
- src.utils.convert_from_summary(df, conversion_table, min_img, max_img, selection_mode, random_state)
Converts the summary data from json into an overall csv-style dataframe, and samples the images per patient according to the image selection inputs.
- Parameters:
df (pandas.DataFrame) – The data read from the summary json file.
conversion_table (pandas.DataFrame) – The information from the data_conversion json file.
min_img (int) – The minimum number of images to be included per patient.
max_img (int) – The maximum number of images to be included per patient.
selection_mode (str) – Specifies how to select images for patients.
random_state (int) – The random state used during patient selection.
- Returns:
Patient information and the selected images for each patient.
- Return type:
pandas.DataFrame
- src.utils.convert_to_csv(df, tasks)
Converts the partitioned datasets into a format suitable for saving as csv files.
- Parameters:
df (pandas.DataFrame) – The dataset information to be saved.
tasks (list) – The list of classes which the model will be trained to classify.
- Returns:
The input dataset in a training-friendly format.
- Return type:
pandas.DataFrame
- src.utils.ethnicity_lookup(ethnicity_info)
Converts the patient’s ethnicity information to a standard format.
- Parameters:
ethnicity_info – The patient’s ethnicity info.
- Returns:
ethnicity_info – The ethnicity information in standardized terminology.
- Return type:
str
- src.utils.get_stats(df_dict)
Generates a statistical summary of the partitioned data, including subgroup information.
- Parameters:
df_dict (dict) – Maps the datasets (keys) to data csv files (values).
- Returns:
Summary that contains the number of patients/images in each dataset.
- Return type:
pandas.DataFrame
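A sketch of the kind of summary get_stats describes, assuming each dataset has already been loaded as a list of per-image records (the documented function instead reads csv files listed in df_dict). Patients are counted once even when they contribute several images. summarize_partitions and the patient_id/subgroup keys are hypothetical.

```python
from collections import Counter


def summarize_partitions(datasets):
    """Count images, patients, and patients per subgroup for each dataset.

    datasets: mapping of dataset name -> list of image records (dicts with
    hypothetical "patient_id" and "subgroup" keys), one record per image.
    """
    summary = {}
    for name, rows in datasets.items():
        patients = {}
        for row in rows:
            # One entry per patient, regardless of how many images they have.
            patients[row["patient_id"]] = row["subgroup"]
        summary[name] = {
            "num_images": len(rows),
            "num_patients": len(patients),
            "patients_per_subgroup": dict(Counter(patients.values())),
        }
    return summary
```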
- src.utils.get_subgroup(row)
Gets the subgroup string for a patient.
- Parameters:
row (pandas.Series) – A row from the DataFrame (representing a single patient).
- Returns:
The patient’s subgroup.
- Return type:
str
- src.utils.inference_onnx(args)
Main script to run model deployment.
- Parameters:
args (argparse.Namespace) – The input arguments to the python script.
- src.utils.info_pred_mapping(info: DataFrame, pred: DataFrame) → DataFrame
Map patient attribute information (e.g., sex, race) to prediction scores and labels according to the patient id.
- Parameters:
info – Dataframe containing patient attribute information.
pred – Dataframe containing model prediction scores.
- Returns:
Dataframe combining patient attributes and predictions.
- Return type:
pandas.DataFrame
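The mapping amounts to a join on the patient id. The sketch below uses plain dicts with inner-join semantics (an assumption, since the behavior for unmatched ids is not documented); with pandas, the analogous call would be info.merge(pred, on="patient_id"). map_info_to_predictions and the patient_id column name are hypothetical.

```python
def map_info_to_predictions(info_rows, pred_rows, id_col="patient_id"):
    """Attach patient attributes to each prediction row by patient id.

    Both inputs are lists of dicts. Predictions whose patient id has no
    attribute row are dropped (inner-join semantics).
    """
    info_by_id = {row[id_col]: row for row in info_rows}
    merged = []
    for pred in pred_rows:
        info = info_by_id.get(pred[id_col])
        if info is None:
            continue
        # Attribute columns first, then prediction columns (score, label).
        merged.append({**info, **pred})
    return merged
```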
- src.utils.load_custom_checkpoint(ckpt_path, base_dcnn, gpu_ids, num_channels)
Loads a customized pre-trained model from the given weight file.
- Parameters:
ckpt_path (str) – File path of the pre-trained model.
base_dcnn (str) – Name of the model architecture (e.g. resnet18, densenet121).
gpu_ids (int) – Current GPU ID.
num_channels (int) – Number of channels (classes).
- Returns:
Loaded pre-trained model.
- Return type:
model
- src.utils.manufacturer_lookup(manufacturer_info)
Converts the sample’s manufacturer information to a standard format.
- Parameters:
manufacturer_info – The sample’s manufacturer info.
- Returns:
manufacturer_info – The manufacturer information in standardized terminology.
- Return type:
str
- src.utils.metric_calculation(result_df, info_pred, test_list, positive_group, prev_diff, threshold=0.5)
Calculate performance and bias measurements for subgroups.
- Parameters:
info_pred – Dataframe containing patient attributes and predictions.
test_list – List of subgroups for the bias measurement calculation.
output_file – File path at which to store the calculated performance and bias measurements.
threshold – Decision threshold applied to the prediction scores.
- Returns:
Dataframe containing the calculated performance and bias measurements.
- Return type:
pandas.DataFrame
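As an illustration of per-subgroup performance and bias measurement at a threshold, the sketch below binarizes scores at threshold, computes the true positive rate (TPR) and false positive rate (FPR) for each subgroup, and reports the TPR difference between the first two subgroups. Which metrics and bias measures metric_calculation actually reports is not specified here, so the metric choice, subgroup_metrics, and the record keys are all assumptions.

```python
def subgroup_metrics(rows, subgroups, threshold=0.5, group_col="subgroup"):
    """Per-subgroup TPR/FPR at a decision threshold, plus the TPR gap.

    rows: list of dicts with hypothetical "score" (model output),
    "label" (0/1 ground truth) and group_col keys.
    subgroups: at least two subgroup names; the gap uses the first two.
    """
    results = {}
    for group in subgroups:
        tp = fp = tn = fn = 0
        for row in rows:
            if row[group_col] != group:
                continue
            pred = 1 if row["score"] >= threshold else 0
            if pred == 1 and row["label"] == 1:
                tp += 1
            elif pred == 1:
                fp += 1
            elif row["label"] == 1:
                fn += 1
            else:
                tn += 1
        results[group] = {
            "TPR": tp / (tp + fn) if tp + fn else float("nan"),
            "FPR": fp / (fp + tn) if fp + tn else float("nan"),
        }
    # One simple bias measure: TPR difference between the first two subgroups.
    a, b = subgroups[:2]
    results["TPR_diff"] = results[a]["TPR"] - results[b]["TPR"]
    return results
```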
- src.utils.modify_classification_layer_v1(model, num_channels)
Modify the last fully connected layer to have the given output size.
- Parameters:
model – PyTorch model to be modified.
num_channels (int) – Number of outputs for the fully connected layer (the number of classes).
- Returns:
Modified model.
- Return type:
model
- src.utils.prevent_data_leakage(base_df, df_list: list)
Prevent duplicate patients/images across different datasets.
- Parameters:
base_df (pandas.DataFrame) – A dataframe containing all of the patient information.
df_list (list) – A list of the dataframes of the existing partitions.
- Returns:
A dataframe containing all of the samples that are not already contained in an existing partition.
- Return type:
pandas.DataFrame
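Preventing leakage can be done as an anti-join on the patient id: collect every id already assigned to a partition and keep only the remaining rows. Filtering by patient rather than by image keeps all images of one patient together. This sketch assumes de-duplication by a patient_id key (a hypothetical name) over lists of dicts; in pandas the same filter could be written base_df[~base_df["patient_id"].isin(used_ids)].

```python
def exclude_used_patients(base_rows, partitions, id_col="patient_id"):
    """Return the rows of base_rows whose patient is in no existing partition."""
    used = set()
    for part in partitions:
        used.update(row[id_col] for row in part)
    return [row for row in base_rows if row[id_col] not in used]
```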
- src.utils.process_convert_dicom_to_jpeg(args)
Converts and crops DICOM files and saves them as JPEG files, where the name of each JPEG is patient_id_#.
- Parameters:
args (argparse.Namespace) – The input arguments to the python script; includes the input summary file and the directory in which to save the converted JPEGs.
- src.utils.process_convert_image_loop(img_info: list)
Reads a DICOM file, applies histogram equalization, crops the image, and saves it as a JPEG file.
- Parameters:
img_info – List containing the file path of the DICOM file and the file path at which to store the resulting JPEG file.
- src.utils.race_lookup(race_info)
Converts the patient’s recorded race to a standard format.
- Parameters:
race_info – The patient’s recorded race.
- Returns:
race_info – The race information in standardized terminology.
- Return type:
str
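The lookup functions (race_lookup, and likewise ethnicity_lookup and manufacturer_lookup) can be pictured as a normalization table keyed on a cleaned-up version of the raw value. The mapping entries, the "Unknown" fallback, and the name normalize_race below are hypothetical examples, not the tool's actual terminology.

```python
# Hypothetical mapping from raw recorded values (lower-cased) to standard terms.
RACE_MAP = {
    "white": "White",
    "caucasian": "White",
    "black": "Black or African American",
    "african american": "Black or African American",
    "black or african american": "Black or African American",
    "asian": "Asian",
}


def normalize_race(race_info, default="Unknown"):
    """Map a raw race value to standardized terminology."""
    if not isinstance(race_info, str):
        return default  # e.g. missing values read in as None or NaN
    key = race_info.strip().lower()
    return RACE_MAP.get(key, default)
```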
- src.utils.read_jpg(imageName)
Reads a JPEG image and applies the transforms listed in the file (default: rotation).
- Parameters:
imageName (str) – The file path of the input image.
- Return type:
Tensor
- src.utils.read_open_A1(args)
Filters and summarizes the downloaded Open-A1 data set using the imaging data and the associated MIDRC tsv files. The summary file will be saved as a json file.
- Info on supporting files:
…all_Cases.tsv: get patient-level info (submitter_id, sex, age, race, COVID_status)
…all_Imaging_Series.tsv: get image series info
- Parameters:
args (argparse.Namespace) – A collection of the input arguments to the python script; includes input and output file names.
- src.utils.results_plotting(data, x_col, x_label, hue_col, style_col, s_col, value_col, exp_type, color_dict={'F': '#dd337c', 'M': '#0fb5ae'}, style_dict={'F': '-', 'M': '--'}, y_lim=None)
Generates subplots based on the input plot sections and parameters.
- Parameters:
data – dataframe that contains the data for plotting
x_col – name of the column that contains the x-axis ticks
x_label – x-axis label
hue_col – name of the column that contains the subgroups mapped to different colors during plotting
style_col – name of the column that determines line styles by positive-associated subgroup
s_col – name of the column that contains the sub-sections in the figure
value_col – name of the column that contains the metric value
exp_type – indicates which bias amplification approach is used
style_dict – dictionary that determines plotting styles
color_dict – dictionary that determines plotting colors
y_lim – range of the y-axis, set according to the metric
- src.utils.run_deploy_onnx(data_loader, args)
Deploys the model on the validation data loader, calculates sample-based AUC and saves the scores in a tsv file.
- Parameters:
data_loader (torch.utils.data.DataLoader) – The validation dataloader.
args (argparse.Namespace) – The input arguments to the python script.
- src.utils.run_train(train_loader, model, criterion, optimizer, epoch_progress_bar=None)
Function that runs training for one epoch.
- Parameters:
train_loader (torch.utils.data.DataLoader) – Loaded training data set.
model – The model to be trained.
criterion – Loss function to be minimized.
optimizer – Training optimizer.
epoch_progress_bar (ipywidgets.IntProgress) – (Optional) The progress bar which displays the progress of the current epoch in the jupyter notebook.
- Returns:
Average loss for the current epoch.
- Return type:
float
- src.utils.run_validate(val_loader, model, args)
Deploys the model on the input data loader, calculates sample-based AUC and saves the scores in a tsv file.
- Parameters:
val_loader (torch.utils.data.DataLoader) – Validation data.
model – The model to deploy.
args (argparse.Namespace) – Input arguments to the python script.
- Returns:
The calculated AUC value.
- Return type:
float
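For reference, ROC AUC can be computed directly from scores and binary labels as the fraction of positive/negative pairs in which the positive sample scores higher, counting ties as one half (the Mann-Whitney formulation). This is an illustrative sketch only; run_validate may well rely on a library routine such as scikit-learn's roc_auc_score, and auc_score is a hypothetical name.

```python
def auc_score(labels, scores):
    """ROC AUC: probability that a random positive outscores a random negative.

    Ties count as half a win. This O(P * N) pairwise version is meant for
    illustration, not for large evaluation sets.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For labels [0, 0, 1, 1] and scores [0.1, 0.4, 0.35, 0.8], three of the four positive/negative pairs are ordered correctly, so the AUC is 0.75.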
- src.utils.save_checkpoint(state, filename='checkpoint.pth.tar')
Save the model checkpoint.
- Parameters:
state – The checkpoint to save.
filename (str) – The file name under which to save the checkpoint.
- src.utils.save_to_file(data: list, filepath: str)
Saves the list of dictionaries to a json file, with each dictionary on its own line. Created to be used in read_open_A1.
- Parameters:
data – A list of dictionaries, to be saved in json format.
filepath – The json file in which to save the dictionaries.
- src.utils.train(args)
Main script for model training, including model selection, pre-trained weights loading, training/validation data loading, and model fine-tuning.
- Parameters:
args (argparse.Namespace) – The input arguments to the python script.
- src.utils.train_split(args)
Manipulate prevalence in race or sex subgroups. The given subgroup will be sampled to the specified prevalence, while the opposite subgroup will be sampled to (1-prevalence).
- Parameters:
args (argparse.Namespace) – The input arguments to the python script.
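The prevalence manipulation can be sketched as sampling a fixed number of patients per subgroup, with the number of disease-positive patients set by the target prevalence: prevalence for the given subgroup and 1 - prevalence for the opposite one. This assumes that "prevalence" means the fraction of disease-positive patients within a subgroup, that there are exactly two subgroups with a binary label key, and that each pool is large enough (random.sample raises ValueError otherwise). sample_to_prevalence and its parameters are hypothetical.

```python
import random


def sample_to_prevalence(patients, positive_group, prevalence, n_per_group, seed=0):
    """Sample patients so positive_group has the given disease prevalence
    and the opposite subgroup has 1 - prevalence.

    patients: list of dicts with hypothetical "patient_id", "subgroup" and
    "label" (1 = disease-positive) keys; exactly two subgroups are assumed.
    """
    rng = random.Random(seed)
    groups = sorted({p["subgroup"] for p in patients})
    if len(groups) != 2 or positive_group not in groups:
        raise ValueError("expected exactly two subgroups including positive_group")
    selected = []
    for group in groups:
        target = prevalence if group == positive_group else 1 - prevalence
        n_pos = round(target * n_per_group)
        n_neg = n_per_group - n_pos
        pos = [p for p in patients if p["subgroup"] == group and p["label"] == 1]
        neg = [p for p in patients if p["subgroup"] == group and p["label"] == 0]
        # random.sample raises ValueError if a pool is smaller than requested.
        selected += rng.sample(pos, n_pos) + rng.sample(neg, n_neg)
    return selected
```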