📈 Evaluate Performance

Focus question: How accurate and reliable are my model’s predictions?

1 Performance Metrics

Model Error (Training)

(Regression accuracy metrics)
Metric    output_1    output_2
R2            0.99        0.94
RMSE          4.09        8.69
MAE           2.08        5.56
MSE          16.74       75.48
MEDAE         0.68        3.67

Table 1: Regression accuracy metrics calculated on the training dataset.
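
For reference, the metrics above can be reproduced with scikit-learn. The sketch below assumes the training observations and model predictions for a single output are available as NumPy arrays `y_true` and `y_pred` (placeholder names, not part of the report's tooling):

```python
import numpy as np
from sklearn.metrics import (
    mean_absolute_error,
    mean_squared_error,
    median_absolute_error,
    r2_score,
)

def regression_metrics(y_true, y_pred):
    """Compute the accuracy metrics reported in Table 1 for one output."""
    mse = mean_squared_error(y_true, y_pred)
    return {
        "R2": r2_score(y_true, y_pred),
        "RMSE": float(np.sqrt(mse)),   # RMSE is the square root of MSE
        "MAE": mean_absolute_error(y_true, y_pred),
        "MSE": mse,
        "MEDAE": median_absolute_error(y_true, y_pred),
    }
```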

Model Uncertainty (Training)

(Uncertainty calibration metrics)
Metric    output_1    output_2
PICP          1.00        1.00
MPIW         28.91       77.42

Table 2: Uncertainty calibration metrics calculated on the training dataset.
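
PICP (Prediction Interval Coverage Probability) is the fraction of observed values that fall inside their prediction intervals; MPIW (Mean Prediction Interval Width) is the average width of those intervals. A minimal sketch, assuming per-point interval bounds `lower` and `upper` and observations `y` (illustrative names):

```python
import numpy as np

def picp(y, lower, upper):
    """Prediction Interval Coverage Probability: fraction of observations
    that fall inside their prediction interval."""
    return float(np.mean((y >= lower) & (y <= upper)))

def mpiw(lower, upper):
    """Mean Prediction Interval Width: average width of the intervals."""
    return float(np.mean(upper - lower))
```

A PICP of 1.00, as reported for both outputs, means every training observation lies inside its interval; the MPIW values indicate how wide the intervals are, on average, to achieve that coverage.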

Model Error (Cross-validated)

(Regression accuracy metrics with confidence intervals)
Metric             output_1             output_2
R2            0.614 ± 0.560        0.668 ± 0.715
RMSE        19.284 ± 18.748      18.170 ± 19.353
MSE       451.914 ± 672.890    416.823 ± 862.163
MAE        12.498 ± 11.715      12.105 ± 10.642
MEDAE       3.223 ± 4.098        6.068 ± 4.849

Table 3: Regression accuracy metrics calculated using cross-validation.

Model Uncertainty (Cross-validated)

(Uncertainty calibration metrics with confidence intervals)
Metric            output_1            output_2
PICP         0.828 ± 0.745       0.903 ± 0.585
MPIW        44.807 ± 70.132     79.239 ± 74.415

Table 4: Uncertainty calibration metrics calculated using cross-validation.
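
The report does not state exactly how the ± values in Tables 3 and 4 are constructed. One common approach, sketched below, scores the model on each held-out fold and reports each metric as mean ± standard deviation across folds; `model`, `X`, and `y` are placeholders, and the same fold-wise pattern applies to PICP and MPIW for a model that produces predictive intervals:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import KFold

def cv_metric_summary(model, X, y, n_splits=5, random_state=0):
    """Score the model on each held-out fold and summarize as mean ± std."""
    r2s, rmses = [], []
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=random_state)
    for train_idx, test_idx in cv.split(X):
        model.fit(X[train_idx], y[train_idx])
        pred = model.predict(X[test_idx])
        r2s.append(r2_score(y[test_idx], pred))
        rmses.append(np.sqrt(mean_squared_error(y[test_idx], pred)))
    return {
        "R2": (np.mean(r2s), np.std(r2s)),
        "RMSE": (np.mean(rmses), np.std(rmses)),
    }
```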

2 Residual Analysis

Residuals Scatter by Predicted Value

(Residuals plotted against predicted values; two panels: (a) output_1, (b) output_2)

Figure 1: Residuals plotted against predicted values. This plot helps identify patterns or biases in the residuals, such as systematic errors or heteroscedasticity, which can indicate problems with the model's predictions.
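
A sketch of how such a residual scatter can be drawn for one output with matplotlib, assuming arrays `y_true` and `y_pred` (illustrative names):

```python
import matplotlib.pyplot as plt

def plot_residuals_vs_predicted(y_true, y_pred, ax=None):
    """Scatter residuals (observed − predicted) against predicted values."""
    if ax is None:
        ax = plt.gca()
    residuals = y_true - y_pred
    ax.scatter(y_pred, residuals, alpha=0.6)
    ax.axhline(0.0, color="black", linestyle="--")  # zero-error reference line
    ax.set_xlabel("Predicted value")
    ax.set_ylabel("Residual (observed − predicted)")
    return ax
```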

Residual Distribution

(KDE of residuals; two panels: (a) output_1, (b) output_2)

Figure 2: Density curve of residuals (observed − predicted) with rug marks to show individual errors; used to assess bias, spread, and normality of prediction errors.
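
A corresponding density plot can be drawn with seaborn's KDE and rug utilities; the sketch below again assumes `y_true` and `y_pred` arrays and is not the report's own plotting code:

```python
import matplotlib.pyplot as plt
import seaborn as sns

def plot_residual_density(y_true, y_pred, ax=None):
    """KDE of residuals (observed − predicted) with a rug of individual errors."""
    if ax is None:
        ax = plt.gca()
    residuals = y_true - y_pred
    sns.kdeplot(x=residuals, fill=True, ax=ax)      # smoothed error distribution
    sns.rugplot(x=residuals, ax=ax)                 # one tick per individual residual
    ax.axvline(0.0, color="black", linestyle="--")  # an unbiased model centers here
    ax.set_xlabel("Residual (observed − predicted)")
    return ax
```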

3 Model Uncertainty

Observed vs. Predicted (Training)

(Model predictions compared to true values with posterior uncertainty; two panels: (a) output_1, (b) output_2)

Figure 3: Observed vs. Predicted plot with error bars (posterior standard deviation) for the training dataset.
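
A sketch of this plot type, assuming per-point predictive means `mu`, posterior standard deviations `sigma`, and observations `y` (illustrative names); the same code applies to the cross-validated version in Figure 4:

```python
import matplotlib.pyplot as plt
import numpy as np

def plot_observed_vs_predicted(y, mu, sigma, ax=None):
    """Observed vs. predicted values with ±1σ error bars and a y = x reference."""
    if ax is None:
        ax = plt.gca()
    ax.errorbar(y, mu, yerr=sigma, fmt="o", alpha=0.6, capsize=2)
    lims = [min(np.min(y), np.min(mu)), max(np.max(y), np.max(mu))]
    ax.plot(lims, lims, "k--")  # perfect-prediction (y = x) line
    ax.set_xlabel("Observed")
    ax.set_ylabel("Predicted (±1σ)")
    return ax
```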

Observed vs. Predicted (Cross-validated)

(Model predictions compared to true values with posterior uncertainty; two panels: (a) output_1, (b) output_2)

Figure 4: Observed vs. Predicted plot with error bars (posterior standard deviation) using cross-validation.

Prediction Intervals (Training)

(Predictions ordered by output with confidence intervals; two panels: (a) output_1, (b) output_2)

Figure 5: Prediction intervals plot with confidence bounds calculated on the training dataset.
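
A sketch of this interval plot for one output, assuming predictive means `mu`, interval bounds `lower` and `upper`, and observations `y`, with samples ordered by observed value (illustrative names; the same code covers the cross-validated version in Figure 6):

```python
import matplotlib.pyplot as plt
import numpy as np

def plot_prediction_intervals(y, mu, lower, upper, ax=None):
    """Prediction intervals and observations, ordered by observed output value."""
    if ax is None:
        ax = plt.gca()
    order = np.argsort(y)          # sort samples by observed value
    idx = np.arange(len(y))
    ax.fill_between(idx, lower[order], upper[order], alpha=0.3,
                    label="prediction interval")
    ax.plot(idx, mu[order], label="predicted mean")
    ax.scatter(idx, y[order], s=10, color="black", label="observed")
    ax.set_xlabel("Sample (ordered by observed value)")
    ax.set_ylabel("Output")
    ax.legend()
    return ax
```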

Prediction Intervals (Cross-validated)

(Predictions ordered by output with confidence intervals; two panels: (a) output_1, (b) output_2)

Figure 6: Prediction intervals plot with confidence bounds calculated using cross-validation.

Prediction Uncertainty Distribution

(KDE of predictive uncertainty; two panels: (a) output_1, (b) output_2)

Figure 7: Density curve of model predictive uncertainty values (e.g., standard deviation or interval half-width) with rug marks; summarizes how confident the model is across predictions.

Uncertainty Calibration

(Predicted vs. observed confidence; two panels: (a) output_1, (b) output_2)

Figure 8: Calibration curve comparing nominal coverage of symmetric prediction intervals (mean ± z·σ) to the empirical proportion of observed values within those intervals. The shaded area quantifies total miscalibration.
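
A sketch of the underlying calculation, assuming Gaussian predictive distributions with per-point mean `mu` and standard deviation `sigma` (illustrative names): for each nominal coverage level, count how often the observation falls within mean ± z·σ, then integrate the absolute gap between empirical and nominal coverage:

```python
import numpy as np
from scipy.integrate import trapezoid
from scipy.stats import norm

def coverage_calibration(y, mu, sigma, levels=np.linspace(0.01, 0.99, 50)):
    """Nominal vs. empirical coverage of mean ± z·σ intervals, plus miscalibration area."""
    z = norm.ppf(0.5 + levels / 2.0)   # half-width multiplier for each nominal level
    inside = np.abs(y[:, None] - mu[:, None]) <= z[None, :] * sigma[:, None]
    empirical = inside.mean(axis=0)    # observed coverage at each level
    miscalibration_area = trapezoid(np.abs(empirical - levels), levels)
    return levels, empirical, miscalibration_area
```

A perfectly calibrated model tracks the diagonal; empirical coverage above the nominal level indicates conservative (overly wide) intervals, while coverage below it indicates overconfidence.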