ML Assessment

assessment.uncertainty module

Title: ValidPath Toolbox - Uncertainty Analysis module

Description: This is the Uncertainty Analysis module of the ValidPath toolbox. It includes the Uncertainty_Analysis class and several methods.

Classes: Uncertainty_Analysis

Methods: get_report, auc_keras_, ci_, Delong_CI, compute_midrank, compute_midrank_weight, calc_pvalue, compute_ground_truth_statistics, delong_roc_variance, bootstrapping

class assessment.uncertainty.Uncertainty_Analysis[source]

Bases: object

Delong_CI(y_pred, y_truth)[source]

A Python implementation of an algorithm for computing the statistical significance of a comparison between two sets of predictions by ROC AUC. It can also compute the variance of a single ROC AUC estimate. X. Sun and W. Xu, “Fast Implementation of DeLong’s Algorithm for Comparing the Areas Under Correlated Receiver Operating Characteristic Curves,” in IEEE Signal Processing Letters, vol. 21, no. 11, pp. 1389-1393, Nov. 2014, doi: 10.1109/LSP.2014.2337313.

Parameters: y_truth: ground truth, np.array of 0 and 1; y_pred: predictions, np.array of floats giving the probability of being class 1
Returns: auc, ci, lower_upper_q, auc_cov, auc_std
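The step from an AUC estimate and its standard deviation to a confidence interval can be sketched as follows. This is a minimal illustration that assumes a normal approximation; the function name auc_confidence_interval and its alpha argument (treated here as the confidence level) are hypothetical and not part of the toolbox.

```python
from statistics import NormalDist


def auc_confidence_interval(auc, auc_std, alpha=0.95):
    """Normal-approximation interval for an AUC estimate (illustrative only).

    alpha is the confidence level here, so 0.95 gives a 95% interval.
    auc_std must be strictly positive.
    """
    tail = (1.0 - alpha) / 2.0
    dist = NormalDist(mu=auc, sigma=auc_std)
    lower = dist.inv_cdf(tail)
    upper = dist.inv_cdf(1.0 - tail)
    # AUC is bounded, so clip the interval to [0, 1].
    return max(lower, 0.0), min(upper, 1.0)


lower, upper = auc_confidence_interval(0.80, 0.05)
print(f"95% CI: [{lower:.3f}, {upper:.3f}]")
```

Clipping keeps the interval inside the valid AUC range when the estimate is close to 1.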

auc_keras_(fpr_keras, tpr_keras)[source]

Computes the area under the ROC curve (AUC) from false positive rate and true positive rate values.

Parameters: fpr_keras: false positive rate values; tpr_keras: true positive rate values
Returns: AUC: area under the ROC curve
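Computing an area from false positive rate and true positive rate values is commonly done with the trapezoidal rule. The sketch below shows that rule in plain Python; the name trapezoid_auc is hypothetical, and whether the toolbox uses exactly this rule is an assumption.

```python
def trapezoid_auc(fpr, tpr):
    """Area under a ROC curve by the trapezoidal rule (illustrative only)."""
    # ROC points are monotone in both coordinates, so sorting the pairs
    # puts them in curve order even if the input is unordered.
    points = sorted(zip(fpr, tpr))
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


fpr = [0.0, 0.0, 0.5, 1.0]
tpr = [0.0, 0.5, 1.0, 1.0]
auc = trapezoid_auc(fpr, tpr)
print(auc)  # 0.875
```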

bootstrapping(y_true, y_pred)[source]

Computes ROC AUC variance for a single set of predictions

Parameters: y_true: np.array of 0 and 1; y_pred: np.array of floats giving the probability of being class 1
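For readers unfamiliar with bootstrapping an AUC, a percentile bootstrap can be sketched as below. This is an assumption about the general technique, not a description of what the toolbox method returns; the names pairwise_auc and bootstrap_auc_ci and the parameters n_resamples, alpha and seed are hypothetical.

```python
import random


def pairwise_auc(y_true, y_pred):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [p for t, p in zip(y_true, y_pred) if t == 1]
    neg = [p for t, p in zip(y_true, y_pred) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(y_true, y_pred, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (illustrative only)."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    while len(scores) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        labels = [y_true[i] for i in idx]
        if len(set(labels)) < 2:
            continue  # AUC is undefined when only one class is drawn
        scores.append(pairwise_auc(labels, [y_pred[i] for i in idx]))
    scores.sort()
    lo_idx = max(int((alpha / 2.0) * n_resamples), 0)
    hi_idx = min(int((1.0 - alpha / 2.0) * n_resamples), n_resamples - 1)
    return scores[lo_idx], scores[hi_idx]


y_true = [0, 0, 1, 0, 1, 1, 0, 1]
y_pred = [0.1, 0.4, 0.35, 0.2, 0.8, 0.7, 0.6, 0.9]
print(bootstrap_auc_ci(y_true, y_pred, n_resamples=500))
```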

calc_pvalue(aucs, sigma)[source]

Computes log10 of the p-value.

Parameters: aucs: 1D array of AUCs; sigma: AUC DeLong covariances
Returns: log10(pvalue)
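The calculation can be sketched for the two-classifier case as a two-sided z-test on the AUC difference. The function name log10_pvalue is hypothetical, and the sketch assumes aucs holds exactly two values and sigma is their 2x2 covariance matrix.

```python
import math
from statistics import NormalDist


def log10_pvalue(aucs, sigma):
    """log10 of the two-sided p-value for AUC_1 - AUC_2 (illustrative only)."""
    diff = aucs[0] - aucs[1]
    # Variance of the difference: var_1 + var_2 - 2 * cov_12.
    var = sigma[0][0] + sigma[1][1] - sigma[0][1] - sigma[1][0]
    z = abs(diff) / math.sqrt(var)
    p = 2.0 * (1.0 - NormalDist().cdf(z))
    if p == 0.0:
        # The tail probability underflows for very large z; a log survival
        # function would be needed to keep precision in that regime.
        return float("-inf")
    return math.log10(p)


sigma = [[0.0025, 0.0], [0.0, 0.0025]]
print(10 ** log10_pvalue([0.90, 0.80], sigma))
```

Working in log10 is useful because p-values for large test sets can be too small to represent directly.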

ci_(tp, n, alpha=0.05)[source]

Estimates confidence interval for Bernoulli p

Parameters: tp: number of positive outcomes, TP in this case; n: number of attempts, TP+FP for precision, TP+FN for recall; alpha: significance level (the confidence level is 1 - alpha)
Returns: Tuple[float, float]: lower and upper bounds of the confidence interval
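One common way to build such an interval is the Wilson score interval, sketched below. Which interval the toolbox actually uses is not stated here, so the choice of Wilson and the name wilson_interval are assumptions made for illustration.

```python
import math
from statistics import NormalDist


def wilson_interval(successes, n, alpha=0.05):
    """Wilson score interval for a Bernoulli proportion (illustrative only)."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # about 1.96 for alpha=0.05
    p_hat = successes / n
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2.0 * n)) / denom
    half = z * math.sqrt(p_hat * (1.0 - p_hat) / n + z * z / (4.0 * n * n)) / denom
    return center - half, center + half


# Precision example: 50 true positives out of 100 positive predictions.
lower, upper = wilson_interval(50, 100)
print(f"[{lower:.4f}, {upper:.4f}]")
```

Unlike the plain normal approximation, the Wilson interval stays inside [0, 1] and behaves sensibly when the proportion is near 0 or 1.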

compute_ground_truth_statistics(ground_truth, sample_weight)[source]

compute_midrank(x)[source]

Computes midranks.

Parameters: x: a 1D numpy array
Returns: array of midranks
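A midrank is the 1-based rank of a value in sorted order, with tied values sharing the average of the ranks they occupy. The sketch below illustrates the idea on a plain list; the name midranks is hypothetical.

```python
def midranks(x):
    """1-based ranks of x, with ties assigned their average rank."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    ranks = [0.0] * len(x)
    i = 0
    while i < len(x):
        j = i
        # Extend j over the run of values tied with x[order[i]].
        while j < len(x) and x[order[j]] == x[order[i]]:
            j += 1
        mid = 0.5 * (i + j - 1) + 1.0  # mean of ranks i+1 .. j
        for k in range(i, j):
            ranks[order[k]] = mid
        i = j
    return ranks


print(midranks([10, 20, 20, 30]))  # [1.0, 2.5, 2.5, 4.0]
```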

compute_midrank_weight(x, sample_weight)[source]

Computes weighted midranks.

Parameters: x: a 1D numpy array; sample_weight: per-sample weights
Returns: array of midranks

delong_roc_variance(ground_truth, predictions, sample_weight=None)[source]

Computes ROC AUC variance for a single set of predictions

Parameters: ground_truth: np.array of 0 and 1; predictions: np.array of floats giving the probability of being class 1
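To make the quantity concrete, the DeLong variance of a single AUC can be written directly from its definition using structural components, at O(m*n) cost. This is an unweighted sketch for illustration, not the toolbox code; the name delong_variance_direct is hypothetical, and at least two positives and two negatives are required.

```python
from statistics import mean, variance


def delong_variance_direct(ground_truth, predictions):
    """AUC and its DeLong variance from the definition (illustrative only)."""
    pos = [p for t, p in zip(ground_truth, predictions) if t == 1]
    neg = [p for t, p in zip(ground_truth, predictions) if t == 0]

    def psi(a, b):
        # Pair kernel: 1 if the positive outranks the negative, 0.5 on ties.
        return 1.0 if a > b else 0.5 if a == b else 0.0

    # Structural components: one per positive and one per negative example.
    v_pos = [mean([psi(a, b) for b in neg]) for a in pos]
    v_neg = [mean([psi(a, b) for a in pos]) for b in neg]

    auc = mean(v_pos)
    # statistics.variance is the sample variance (divides by k - 1).
    var = variance(v_pos) / len(pos) + variance(v_neg) / len(neg)
    return auc, var


auc, var = delong_variance_direct([1, 1, 0, 0], [0.8, 0.4, 0.5, 0.1])
print(auc, var)  # 0.75 0.125
```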

fastDeLong(predictions_sorted_transposed, label_1_count, sample_weight)[source]

fastDeLong_no_weights(predictions_sorted_transposed, label_1_count)[source]

The fast version of DeLong’s method for computing the covariance of unadjusted AUC.

Parameters:

predictions_sorted_transposed: a 2D numpy.array[n_classifiers, n_examples], sorted such that the examples with label “1” are first

Returns:

(AUC value, DeLong covariance)

Reference:

@article{sun2014fast,
  title={Fast Implementation of DeLong’s Algorithm for Comparing the Areas Under Correlated Receiver Operating Characteristic Curves},
  author={Xu Sun and Weichao Xu},
  journal={IEEE Signal Processing Letters},
  volume={21},
  number={11},
  pages={1389--1393},
  year={2014},
  publisher={IEEE}
}
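The idea behind the fast method is to replace the pairwise comparisons with midranks, which only require sorting. The sketch below handles a single classifier on a plain list rather than a 2D array; the names midranks and fast_delong_single are hypothetical, and the code is an unweighted illustration rather than the toolbox implementation.

```python
from statistics import variance


def midranks(x):
    """1-based ranks of x, with ties assigned their average rank."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    ranks = [0.0] * len(x)
    i = 0
    while i < len(x):
        j = i
        while j < len(x) and x[order[j]] == x[order[i]]:
            j += 1
        for k in range(i, j):
            ranks[order[k]] = 0.5 * (i + j - 1) + 1.0
        i = j
    return ranks


def fast_delong_single(scores, label_1_count):
    """AUC and DeLong variance via midranks (illustrative only).

    scores must list the label-1 examples first, then the label-0 examples.
    """
    m = label_1_count
    n = len(scores) - m
    tx = midranks(scores[:m])  # ranks among positives only
    ty = midranks(scores[m:])  # ranks among negatives only
    tz = midranks(scores)      # ranks in the pooled sample
    auc = sum(tz[:m]) / (m * n) - (m + 1.0) / (2.0 * n)
    v01 = [(tz[i] - tx[i]) / n for i in range(m)]
    v10 = [1.0 - (tz[m + j] - ty[j]) / m for j in range(n)]
    var = variance(v01) / m + variance(v10) / n
    return auc, var


print(fast_delong_single([0.8, 0.4, 0.5, 0.1], 2))
```

For the four scores in the usage line, the result agrees with the pairwise definition: an AUC of 0.75 and a variance of 0.125.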

fastDeLong_weights(predictions_sorted_transposed, label_1_count, sample_weight)[source]

The fast version of DeLong’s method for computing the covariance of unadjusted AUC.

Parameters:

predictions_sorted_transposed: a 2D numpy.array[n_classifiers, n_examples], sorted such that the examples with label “1” are first

Returns:

(AUC value, DeLong covariance)

Reference:

@article{sun2014fast,
  title={Fast Implementation of DeLong’s Algorithm for Comparing the Areas Under Correlated Receiver Operating Characteristic Curves},
  author={Xu Sun and Weichao Xu},
  journal={IEEE Signal Processing Letters},
  volume={21},
  number={11},
  pages={1389--1393},
  year={2014},
  publisher={IEEE}
}

get_report(y_pred, y_truth)[source]

This method receives the machine learning prediction output and the ground truth and reports several metrics. It is the main method of the Uncertainty_Analysis class and calls the other methods to produce the results.

Parameters: y_truth: ground truth, np.array of 0 and 1; y_pred: predictions, np.array of floats giving the probability of being class 1
Returns: precision; precision confidence interval; recall; recall confidence interval; AUC based on the DeLong method with its confidence interval and COV; false positive rate; true positive rate; AUC; confusion matrix
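The point metrics in such a report derive from the confusion matrix. The sketch below shows how precision and recall follow from thresholded predictions; the 0.5 threshold and the name confusion_counts are assumptions made for illustration and are not taken from the toolbox.

```python
def confusion_counts(y_truth, y_pred, threshold=0.5):
    """Return (tp, fp, tn, fn) for probabilities cut at threshold."""
    tp = fp = tn = fn = 0
    for truth, prob in zip(y_truth, y_pred):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and truth == 1:
            tp += 1
        elif predicted == 1 and truth == 0:
            fp += 1
        elif predicted == 0 and truth == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


tp, fp, tn, fn = confusion_counts([1, 1, 0, 0, 1], [0.9, 0.4, 0.6, 0.2, 0.8])
precision = tp / (tp + fp)  # TP out of all positive predictions
recall = tp / (tp + fn)     # TP out of all actual positives
print(tp, fp, tn, fn, precision, recall)
```

The denominators TP+FP and TP+FN are the counts that serve as n when a confidence interval is attached to precision or recall.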