bias.myti.report GUI

class src.AboutWindow

Class for the “about me” window. Includes the GitHub link, version number and other necessary information.

load_GUI()

Creates widget components for the window.

class src.ClickLabel

Clickable QLabel object.

mousePressEvent(event)

Mouse click event for the QLabel.

class src.FinalPage(parent, page_number=3)

Class to build the third page (report display). This page shows bias results as figures with corresponding text descriptions, and allows the user to save the report in .png or .pdf format.

UIComponents()

Widgets for the page.

check_conditions()

Sanity check to decide if the third page can be appropriately loaded.

fig_select(figure_number)

Update the figure selected by the user.

run_background()

Generate figures and descriptions.

save_fig()

Save the figure and description.

class src.InitialPage(parent)

Class for the first page (user input) of the tool. This page allows the user to upload a .csv data file and to specify the bias amplification type and study type.

Parameters:

parent – The parent widget.

UIComponents()

Creates the widgets for the page.

approach_type()

Update bias amplification type selected by user.

study_update()

Update study type selected by user.

upload_csv()

Get the csv file from user input, and display amplification type selection.

class src.MainWindow(*args, **kwargs)

Class for the main window, including the main pages, logo, side menus and navigation bars.

about_info()

Shows the “about me” window.

change_page(page_number: int, *args, **kwargs)

Navigate between the three pages.

load_GUI()

Creates widgets for the main window.

make_navbar()

Add navigation buttons.

make_sidebar()

Set up side menu bars.

next_page()

Navigate to the next page.

prev_page()

Navigate to the previous page.

set_pages()

Set up the three pages.

update_navbar()

Adjust page navigation buttons for each page.

class src.Page(parent, page_number)

Parent class for all the pages.

UIComponents()

Placeholder for child classes.

check_conditions(*conditions)

Conditions are variables that must be input in order for the page to load. This is the default for page(s) with no conditions.

load()

Run background processes and load UI components, provided the conditions are met.

run_background()

Runs any background work that needs to be run before the page can be loaded. This is the default for page(s) with no background work.
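
The way load() ties check_conditions(), run_background() and UIComponents() together can be sketched without Qt. This is only an illustration under stated assumptions, not the tool's implementation: PageSketch and ReportPageSketch are hypothetical names, and treating a condition as "met" when it is not None, as well as having load() return a boolean, are assumptions made for the example.

```python
class PageSketch:
    """Hypothetical, Qt-free stand-in for src.Page."""

    def __init__(self, parent, page_number):
        self.parent = parent
        self.page_number = page_number
        self.loaded = False

    def check_conditions(self, *conditions):
        # Default for pages with no conditions: all(()) is True,
        # so an empty argument list always passes.
        return all(condition is not None for condition in conditions)

    def run_background(self):
        # Default for pages with no background work.
        pass

    def UIComponents(self):
        # Placeholder for child classes.
        pass

    def load(self):
        # Do the work only if the required inputs are present.
        if not self.check_conditions():
            return False
        self.run_background()
        self.UIComponents()
        self.loaded = True
        return True


class ReportPageSketch(PageSketch):
    """Hypothetical child page that needs a csv path before it can load."""

    def __init__(self, parent, csv_path=None):
        super().__init__(parent, page_number=3)
        self.csv_path = csv_path
        self.figures = []

    def check_conditions(self):
        # Override: this page requires the csv path to be set.
        return super().check_conditions(self.csv_path)

    def run_background(self):
        # Stand-in for figure generation.
        self.figures = [f"figure for {self.csv_path}"]
```

A child page therefore only overrides the hooks it needs; load() itself stays in the parent class.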

class src.SecondPage(parent)

Class to build the second page (variable selection). This page allows the user to select the columns that correspond to the variables required for the bias report. Additional information about each variable is available to help the user choose.

Parameters:

parent – The parent widget.

UIComponents()

Creates the widgets for the second page.

add_info()

Update additional information for the variable clicked by user.

check_boxes()

Update selected variables.

check_conditions()

Sanity check to decide if the second page can be appropriately loaded.

get_columns()

Get the list of columns from the input csv file.

src.bias_plots_generation(variables, csv_path, exp_type, study_type, colors: list, set_colors: dict, name_mapping: dict)

Function to load inputs from the user, generate report figures and descriptions.

Parameters:
  • variables – dictionary that contains the user-specified column name for each variable

  • csv_path – path for the csv file which contains the data

  • exp_type – string to indicate the bias amplification type

  • study_type – string to indicate the study type

  • colors – Color palette.

  • set_colors – Colors assigned to a specific subgroup.

  • name_mapping – Maps subgroup labels to names in the plot legend.

Returns:

  • m_list (list) – list of the plotted metrics

  • info_list (list) – list of the report description text corresponding to each metric

src.calculate_CI(df, mean_col='Mean', std_col='Std', confidence_level=0.95, sample_size=25)

Function to calculate confidence interval according to standard deviation, confidence level and sample size.

Parameters:
  • df – dataframe that contains the data to calculate CI

  • mean_col – name for column with mean value

  • std_col – name for column with standard deviation

  • confidence_level – confidence level; currently supports 0.9, 0.95 and 0.99

  • sample_size – number of samples for each experiment in the data

Returns:

The input dataframe with the computed lower and upper confidence interval bounds added.

Return type:

pandas.DataFrame
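
A confidence interval built from a mean, a standard deviation and a sample size is commonly computed as mean ± z · std / sqrt(n). The sketch below illustrates that calculation under the assumption of a normal approximation; whether calculate_CI uses z-values or another critical value is not stated here. The helper names ci_bounds and add_ci_columns are hypothetical, and plain dictionaries stand in for dataframe rows so that the example runs with the standard library only. The output keys follow the lower_CI and upper_CI defaults used by figure_plotting.

```python
import math
from statistics import NormalDist


def ci_bounds(mean, std, confidence_level=0.95, sample_size=25):
    """Return (lower, upper) for a normal-approximation confidence interval."""
    # Two-sided critical value, e.g. about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + confidence_level / 2)
    # Standard error of the mean over `sample_size` repeated experiments.
    half_width = z * std / math.sqrt(sample_size)
    return mean - half_width, mean + half_width


def add_ci_columns(rows, mean_col="Mean", std_col="Std",
                   confidence_level=0.95, sample_size=25):
    """Copy each row (a dict) and add 'lower_CI' and 'upper_CI' keys."""
    out = []
    for row in rows:
        lower, upper = ci_bounds(row[mean_col], row[std_col],
                                 confidence_level, sample_size)
        out.append({**row, "lower_CI": lower, "upper_CI": upper})
    return out
```

With the defaults, a mean of 0.8 and a standard deviation of 0.1 give an interval of roughly 0.761 to 0.839.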

src.create_report(metrics, img_path, study_type, exp_type, save_path)

Create and save the report.

Parameters:
  • metrics – list of the metrics to save in the report

  • img_path – list of the paths to the saved figures

  • study_type – string to indicate study type

  • exp_type – string to indicate bias amplification type

  • save_path – path to save the final report

src.figure_plotting(data, x_col, s_col, hue_col, study_type, ylim=(0, 1), y_label=None, x_label=None, style_col=None, style_dict={}, color_dict={}, mean_col='Mean', lower_CI_col='lower_CI', upper_CI_col='upper_CI', plot_section=[], name_map={})

Function to generate subplots with input plot sections and parameters.

Parameters:
  • data – dataframe that contains the data for plotting

  • x_col – name for column that contains x-axis ticks

  • s_col – name for column that contains sub-sections in the figure

  • hue_col – name for column that contains subgroups mapped with different colors during plotting

  • ylim – set the y-limits for all axes

  • y_label – set the y label name

  • x_label – set the x label name

  • style_col – name for column that determines line styles by positive-associated subgroup

  • style_dict – dictionary that determines plotting style

  • color_dict – dictionary that determines plotting colors

  • mean_col – name for column that contains metric mean value

  • lower_CI_col – name for column that contains lower bound of confidence interval

  • upper_CI_col – name for column that contains upper bound of confidence interval

  • plot_section – list that has all the sub-sections for plotting (including legends section)

  • name_map – Maps subgroup labels to display names in the plot legend.

src.on_page(canvas, doc, pagesize=(612.0, 792.0))

Set up page header and footer.

Bias Amplification Implementation

class src.utils.Dataset(list_file, train_flag=True, default_out_class='Yes', default_patient_id='patient_id', default_path='Path')

Class for a customized dataset.

Parameters:
  • list_file (str) – The file path to the file containing the input information; to be read into a pandas.DataFrame.

  • train_flag (bool) – Training process indicator.

  • default_out_class (str) – The name of the column in list_file that indicates the model’s output class.

  • default_patient_id (str) – The name of the column in list_file that indicates the sample’s patient id.

  • default_path (str) – The name of the column in list_file that indicates the sample’s file path.

src.utils.adjust_comp(in_df, random_seed, split_frac=1, split_num=None)

Adjusts the composition of in_df to match the composition specified.

Parameters:
  • in_df (pandas.DataFrame) – input dataframe (by-patient).

  • random_seed (int) – The random state to use when selecting patients in each subgroup.

  • split_frac (float) – The portion of the available data to return.

  • split_num (int) – The number of patients overall to return.

Returns:

Dataframe adjusted to match the specified composition.

Return type:

pandas.DataFrame

src.utils.adjust_subgroups(in_df)

Adjusts the subgroup information displayed in the dataframe to reflect only the attributes that are specified as important in equal_stratification_groups.

Parameters:

in_df (pandas.DataFrame) – The sample data to be balanced.

Returns:

The sample data with filtered subgroups.

Return type:

pandas.DataFrame

src.utils.analysis(args)

Main script to load test results, measure bias and do plotting.

Parameters:

args (argparse.Namespace) – The input arguments to the python script.

src.utils.apply_custom_transfer_learning__resnet18(net)

Freezes a certain number of the first layers of the ResNet18 model.

Parameters:

net – Pytorch model to freeze.

Return type:

net

src.utils.bootstrapping(args)

Partitions patient data into train, validation, validation_2 and testing datasets. Within each dataset, patients can be partitioned equally or by a customized ratio across race, sex, image modality and COVID status.

Parameters:

args (argparse.Namespace) – The input arguments to the python script.

src.utils.convert_from_summary(df, conversion_table, min_img, max_img, selection_mode, random_state)

Converts the overall dataframe from json to csv format and samples the images for each patient according to the image selection inputs.

Parameters:
  • df (pandas.DataFrame) – The data read from the summary json file.

  • conversion_table (pandas.DataFrame) – The information from the data_conversion json file.

  • min_img (int) – The minimum number of images to be included per patient.

  • max_img (int) – The maximum number of images to be included per patient.

  • selection_mode (str) – Specifies how to select images for patients.

  • random_state (int) – The random state used during patient selection.

Returns:

Patient information and the selected images for each patient.

Return type:

pandas.DataFrame
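
One way to read the min_img, max_img and random_state parameters is sketched below. This is an assumption-laden illustration, not the documented behavior: it covers only a random selection mode, it assumes that patients with fewer than min_img images are dropped, and select_images and the patient_id key are hypothetical names. Plain dictionaries stand in for dataframe rows.

```python
import random
from collections import defaultdict


def select_images(rows, min_img, max_img, random_state, id_col="patient_id"):
    """Keep between min_img and max_img images per patient (random mode)."""
    # Group image records by patient.
    by_patient = defaultdict(list)
    for row in rows:
        by_patient[row[id_col]].append(row)

    rng = random.Random(random_state)
    selected = []
    # Iterate in sorted order so the result is reproducible for a given seed.
    for patient_id in sorted(by_patient):
        images = by_patient[patient_id]
        if len(images) < min_img:
            # Assumption: patients with too few images are excluded.
            continue
        if len(images) > max_img:
            # Randomly keep at most max_img images for this patient.
            images = rng.sample(images, max_img)
        selected.extend(images)
    return selected
```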

src.utils.convert_to_csv(df, tasks)

Convert the partitioned datasets to save as csv files.

Parameters:
  • df (pandas.DataFrame) – The dataset information to be saved.

  • tasks (list) – The list of classes which the model will be trained to classify.

Returns:

The input dataset in a training-friendly format.

Return type:

pandas.DataFrame

src.utils.ethnicity_lookup(ethnicity_info)

Converts the patient’s ethnicity information to a standard format.

Parameters:

ethnicity_info – The patient’s ethnicity info.

Returns:

ethnicity_info – The ethnicity information in standardized terminology.

Return type:

str

src.utils.get_stats(df_dict)

Generate statistical summary of partitioned data, including subgroup information.

Parameters:

df_dict (dict) – Maps the datasets (keys) to data csv files (values).

Returns:

Summary that contains number of patients/images in each dataset.

Return type:

pandas.DataFrame

src.utils.get_subgroup(row)

Get subgroup strings.

Parameters:

row (pandas.Series) – A row from the DataFrame (representing a single patient).

Returns:

The patient’s subgroup.

Return type:

str

src.utils.inference_onnx(args)

Main script to run model deployment.

Parameters:

args (argparse.Namespace) – The input arguments to the python script.

src.utils.info_pred_mapping(info: DataFrame, pred: DataFrame) DataFrame

Maps patient attribute information (e.g. sex, race) to prediction scores and labels according to the patient id.

Parameters:
  • info – Dataframe containing the patient attribute information.

  • pred – Dataframe containing the model prediction scores.

Returns:

Dataframe combining the patient attributes and the predictions.

Return type:

pandas.DataFrame
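
Mapping attributes to predictions by patient id amounts to a join on the id column. The sketch below shows the idea with lists of dictionaries in place of dataframes. The function name map_info_to_pred is hypothetical, the patient_id key is borrowed from the Dataset defaults, and the choice of an inner join (unmatched rows are dropped) is an assumption.

```python
def map_info_to_pred(info_rows, pred_rows, id_col="patient_id"):
    """Inner-join two lists of dicts on `id_col`."""
    # Index the patient attributes by id for constant-time lookup.
    info_by_id = {row[id_col]: row for row in info_rows}
    merged = []
    for pred in pred_rows:
        info = info_by_id.get(pred[id_col])
        if info is not None:
            # Combine the attribute fields with the prediction fields.
            merged.append({**info, **pred})
    return merged
```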

src.utils.load_custom_checkpoint(ckpt_path, base_dcnn, gpu_ids, num_channels)

Load customized pre-trained models with given weight file.

Parameters:
  • ckpt_path (str) – File path of the pre-trained model.

  • base_dcnn (str) – Name of the model architecture (e.g. resnet18, densenet121).

  • gpu_ids (int) – Current GPU ID.

  • num_channels (int) – Number of channels (classes).

Returns:

Loaded pre-trained model.

Return type:

model

src.utils.manufacturer_lookup(manufacturer_info)

Converts the sample’s manufacturer information to a standard format.

Parameters:

manufacturer_info – The sample’s manufacturer info.

Returns:

manufacturer_info – The manufacturer information in standardized terminology.

Return type:

str

src.utils.metric_calculation(result_df, info_pred, test_list, positive_group, prev_diff, threshold=0.5)

Calculate performance and bias measurements for subgroups.

Parameters:
  • info_pred – Dataframe containing patient attributes and predictions.

  • test_list – List of subgroups for bias measurements calculation.

  • output_file – File path to store calculated performance and bias measurements.

  • threshold – Threshold value (default 0.5).

Returns:

Dataframe containing the calculated performance and bias measurements.

Return type:

pandas.DataFrame

src.utils.modify_classification_layer_v1(model, num_channels)

Modify the last fully connected layer to have given output size.

Parameters:
  • model – Pytorch model to be modified.

  • num_channels (int) – Number of outputs for the fully connected layer (the number of classes).

Returns:

Modified model.

Return type:

model

src.utils.prevent_data_leakage(base_df, df_list: list)

Prevents duplicate patients/images in different datasets.

Parameters:
  • base_df (pandas.DataFrame) – A dataframe containing all of the patient information.

  • df_list (list) – A list of the dataframes of the existing partitions.

Returns:

A dataframe containing all of the samples that are not already contained in an existing partition.

Return type:

pandas.DataFrame
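
The filtering described above can be sketched as a set difference on identifiers. This is a minimal illustration, assuming that leakage is checked at the patient level through a patient_id key; remaining_samples is a hypothetical name and lists of dictionaries stand in for the dataframes.

```python
def remaining_samples(base_rows, partitions, id_col="patient_id"):
    """Return the rows of base_rows whose id is in none of the partitions."""
    # Collect every id that is already used by an existing partition.
    used_ids = {row[id_col] for part in partitions for row in part}
    # Keep only samples from patients that have not been assigned yet.
    return [row for row in base_rows if row[id_col] not in used_ids]
```

Drawing each new partition only from the returned rows keeps a patient from appearing in, say, both the training and the testing data.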

src.utils.process_convert_dicom_to_jpeg(args)

Convert and crop dicom files and save as jpeg files, where the name of the jpeg is patient_id_#.

Parameters:

args (argparse.Namespace) – The input arguments to the python script; includes the input summary file and the directory in which to save the converted jpegs.

src.utils.process_convert_image_loop(img_info: list)

Reads a dicom file, does histogram equalization, crops the image and saves as jpeg file.

Parameters:

img_info – List containing the file path of the dicom file and the file path at which to store the resulting jpeg file.

src.utils.race_lookup(race_info)

Converts the patient’s recorded race to a standard format.

Parameters:

race_info – The patient’s recorded race.

Returns:

race_info – The race information in standardized terminology.

Return type:

str

src.utils.read_jpg(imageName)

Reads jpg image and applies the transforms listed in the file (default=rotation).

Parameters:

imageName (str) – The file path of the input image.

Return type:

Tensor

src.utils.read_open_A1(args)

Filters and summarizes the downloaded Open-A1 data set using the imaging data and the associated MIDRC tsv files. The summary is saved as a json file.

Info on supporting files:

  • …all_Cases.tsv: get patient-level info (submitter_id, sex, age, race, COVID_status)

  • …all_Imaging_Series.tsv: get image series info

Parameters:

args (argparse.Namespace) – A collection of the input arguments to the python script; includes input and output file names.

src.utils.results_plotting(data, x_col, x_label, hue_col, style_col, s_col, value_col, exp_type, color_dict={'F': '#dd337c', 'M': '#0fb5ae'}, style_dict={'F': '-', 'M': '--'}, y_lim=None)

Generates subplots based on input plot sections and parameters.

Parameters:
  • data – dataframe that contains the data for plotting

  • x_col – name for column that contains x-axis ticks

  • x_label – set the x label name

  • hue_col – name for column that contains subgroups mapped with different colors during plotting

  • style_col – name for column that determines line styles by positive-associated subgroup

  • s_col – name for column that contains sub-sections in the figure

  • value_col – name for column that contains metric value

  • exp_type – indicate which bias amplification approach

  • style_dict – dictionary that determines plotting style

  • color_dict – dictionary that determines plotting colors

  • y_lim – set range for y axis according to metric

src.utils.run_deploy_onnx(data_loader, args)

Deploys the model on the validation data loader, calculates sample-based AUC and saves the scores in a tsv file.

Parameters:
  • data_loader (torch.utils.data.DataLoader) – The validation dataloader.

  • args (argparse.Namespace) – The input arguments to the python script.

src.utils.run_train(train_loader, model, criterion, optimizer, epoch_progress_bar=None)

Runs the training.

Parameters:
  • train_loader (torch.utils.data.DataLoader) – Loaded training data set.

  • model – The model to be trained.

  • criterion – Loss function to be minimized.

  • optimizer – Training optimizer.

  • epoch_progress_bar (ipywidgets.IntProgress) – (Optional) The progress bar which displays the progress of the current epoch in the jupyter notebook.

Returns:

Average loss for the current epoch.

Return type:

float

src.utils.run_validate(val_loader, model, args)

Deploys the model on the input data loader, calculates sample-based AUC and saves the scores in a tsv file.

Parameters:
  • val_loader (torch.utils.data.DataLoader) – Validation data.

  • model – The model to deploy.

  • args (argparse.Namespace) – Input arguments to the python script.

Returns:

The calculated AUC value.

Return type:

float

src.utils.save_checkpoint(state, filename='checkpoint.pth.tar')

Save the model checkpoint.

Parameters:
  • state – The checkpoint to save.

  • filename (str) – The file name under which to save the checkpoint.

src.utils.save_to_file(data: list, filepath: str)

Saves the list of dictionaries to a json file, with each dictionary on its own line. Created to be used in read_open_A1.

Parameters:
  • data – A list of dictionaries, to be saved in json format.

  • filepath – The json file in which to save the dictionaries.
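
Writing one dictionary per line produces what is often called the JSON Lines layout. The sketch below shows one way to write and read such a file with the standard json module. The names save_json_lines and load_json_lines are hypothetical and are not part of src.utils.

```python
import json


def save_json_lines(data, filepath):
    """Write each dictionary in `data` as one JSON object per line."""
    with open(filepath, "w", encoding="utf-8") as f:
        for record in data:
            f.write(json.dumps(record) + "\n")


def load_json_lines(filepath):
    """Read a file written by save_json_lines back into a list of dicts."""
    with open(filepath, encoding="utf-8") as f:
        # Skip blank lines so a trailing newline does not cause an error.
        return [json.loads(line) for line in f if line.strip()]
```

Because each line is an independent JSON object, the file can be processed record by record without loading the whole summary into memory.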

src.utils.train(args)

Main script for model training, including model selection, pre-trained weights loading, training/validation data loading, and model fine-tuning.

Parameters:

args (argparse.Namespace) – The input arguments to the python script.

src.utils.train_split(args)

Manipulate prevalence in race or sex subgroups. The given subgroup will be sampled to the specified prevalence, while the opposite subgroup will be sampled to (1-prevalence).

Parameters:

args (argparse.Namespace) – The input arguments to the python script.
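
The prevalence manipulation can be illustrated with a small sampling sketch. It is not the train_split implementation: the function names, the label and sex keys, the fixed number of samples per subgroup and the use of the standard random module are all assumptions made for the example, with lists of dictionaries standing in for the patient dataframes.

```python
import random


def sample_to_prevalence(rows, prevalence, n_total, label_col="label", seed=0):
    """Pick n_total rows so that a fraction `prevalence` of them is positive."""
    rng = random.Random(seed)
    positives = [r for r in rows if r[label_col] == 1]
    negatives = [r for r in rows if r[label_col] == 0]
    n_pos = round(n_total * prevalence)
    n_neg = n_total - n_pos
    if n_pos > len(positives) or n_neg > len(negatives):
        raise ValueError("not enough samples to reach the requested prevalence")
    return rng.sample(positives, n_pos) + rng.sample(negatives, n_neg)


def split_by_prevalence(rows, group_col, target_group, prevalence,
                        n_per_group, seed=0):
    """Sample the target subgroup to `prevalence`, the rest to 1 - prevalence."""
    target = [r for r in rows if r[group_col] == target_group]
    other = [r for r in rows if r[group_col] != target_group]
    return (sample_to_prevalence(target, prevalence, n_per_group, seed=seed)
            + sample_to_prevalence(other, 1 - prevalence, n_per_group,
                                   seed=seed + 1))
```

For example, with a prevalence of 0.75 and 20 samples per subgroup, the target subgroup ends up with 15 positive and 5 negative samples, and the opposite subgroup with 5 positive and 15 negative samples.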