Note: For both tasks, we place a hard limit on the compute that solutions can use. The proposed solution should yield predictions within 800 ms per input sample. Submitted solutions that break this limit will not be considered in the final scoring. See Rules for more info.


Track 1 

Link to GitHub repo

We provide a 3D U-Net baseline based on the work by [La Rosa et al., 2020].

The model is trained for 300 epochs with early stopping on Dev-in. At training time, 32 patches of 96x96x96 voxels, each centred on a lesion, are sampled from the input volume. At inference time, patches overlapping by 25% are selected across the whole 3D volume. Gaussian-weighted averaging is used for the final prediction of each voxel that belongs to multiple patches.
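As an illustrative sketch only (not necessarily the exact code used in the baseline), this kind of overlapping, Gaussian-weighted patch inference can be written with MONAI's sliding_window_inference utility; the variable names, sw_batch_size, and the use of MONAI itself are assumptions here:

import torch
from monai.inferers import sliding_window_inference

def predict_volume(model, volume):
    # volume: (1, C, D, H, W) tensor covering the whole 3D scan (assumed layout)
    model.eval()
    with torch.no_grad():
        probs = sliding_window_inference(
            inputs=volume,
            roi_size=(96, 96, 96),   # patch size matching training
            sw_batch_size=4,         # number of patches scored at once (assumed)
            predictor=model,
            overlap=0.25,            # 25% overlap between neighbouring patches
            mode="gaussian",         # Gaussian-weighted averaging of overlapping patches
        )
    return probs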

The segmentation map is obtained by thresholding the predicted probabilities, with the threshold tuned on Dev-in. Deep ensembles are formed by averaging the output probabilities of 3 distinct models trained from different seed initialisations.
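As a minimal sketch of this step (assuming each model's output is a per-voxel probability tensor of the same shape; names and the threshold value are illustrative):

import torch

def ensemble_segmentation(probs_list, threshold):
    # probs_list: per-voxel probability maps from the 3 independently trained models
    mean_probs = torch.stack(probs_list, dim=0).mean(dim=0)  # average over members
    seg_map = (mean_probs >= threshold).long()               # threshold tuned on Dev-in
    return mean_probs, seg_map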

As each single model yields a per-voxel probabilistic prediction, ensemble-based uncertainty measures [Malinin, 2019; Malinin & Gales] are available for uncertainty quantification. Our ensembled models use reverse mutual information [Malinin et al., 2021] as the uncertainty measure.
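For a per-voxel binary prediction, reverse mutual information can be computed as the expected KL divergence from the ensemble-averaged distribution to each member's distribution. The sketch below follows that definition under the stated assumption and is not code taken from the baseline:

import torch

def reverse_mutual_information(member_probs, eps=1e-12):
    # member_probs: (M, ...) tensor of per-voxel foreground probabilities from M members
    p = torch.stack([member_probs, 1.0 - member_probs], dim=-1).clamp(min=eps)  # (M, ..., 2)
    p_bar = p.mean(dim=0)                                                        # (..., 2)
    # Reverse MI: average over members of KL( p_bar || p_m )
    kl = (p_bar.unsqueeze(0) * (p_bar.unsqueeze(0).log() - p.log())).sum(dim=-1)  # (M, ...)
    return kl.mean(dim=0)  # per-voxel uncertainty map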


Track 2

Link to GitHub repo

We provide a deep ensemble of 10 Monte-Carlo dropout neural networks as a baseline. The ensemble yields both improved robustness and interpretable estimates of uncertainty. Each ensemble member predicts the mean and standard deviation of a conditional normal distribution over the target (power) given the input features.

The deep ensemble members share the same architecture: 2 hidden layers with 50 and 20 nodes and softplus activation functions. The output layer has 2 nodes and a linear activation function. To satisfy the constraint of a positive standard deviation, the second output is fed through a softplus function and a constant of 10^(-6) is added for numerical stability, as proposed by Lakshminarayanan et al., 2017. For optimization, we use the negative log-likelihood loss function and the Adam optimizer with a learning rate of 10^(-4). The number of epochs is determined by early stopping, monitoring the mean absolute error (MAE) on the dev_in set. The models are implemented in PyTorch.
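A minimal PyTorch sketch of one ensemble member and the negative log-likelihood loss, following the description above; the dropout rate and placement, the class name, and n_features are assumptions not specified in the text:

import torch
import torch.nn as nn

class ProbabilisticMLP(nn.Module):
    # One ensemble member: 2 hidden layers (50 and 20 nodes, softplus) and a
    # linear output layer with 2 nodes (mean and pre-softplus standard deviation).
    def __init__(self, n_features, p_drop=0.1):  # dropout rate is an assumption
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 50), nn.Softplus(), nn.Dropout(p_drop),
            nn.Linear(50, 20), nn.Softplus(), nn.Dropout(p_drop),
            nn.Linear(20, 2),
        )

    def forward(self, x):
        out = self.net(x)
        mean = out[..., 0]
        # Softplus plus a small constant keeps the standard deviation positive.
        std = nn.functional.softplus(out[..., 1]) + 1e-6
        return mean, std

def nll_loss(mean, std, target):
    # Negative log-likelihood of the target under the predicted normal distribution.
    return -torch.distributions.Normal(mean, std).log_prob(target).mean()

# Training uses Adam with a learning rate of 1e-4, e.g.:
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)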

During inference, we sample each member of the ensemble 10 times (100 samples in total) to estimate the epistemic uncertainty. The variance of the predicted means across the members of the ensemble corresponds to the epistemic uncertainty, and the mean of the predicted variances across the members is a measure of the aleatoric uncertainty [Malinin et al., 2021]. As the measure of uncertainty we use the total uncertainty, that is, the sum of the epistemic and aleatoric uncertainty.
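A sketch of this decomposition, assuming the predicted means and variances from all 100 samples (10 dropout samples per member, with dropout kept active at test time) are stacked along the first dimension:

import torch

def total_uncertainty(means, variances):
    # means, variances: (S, N) tensors with one row per ensemble/dropout sample
    epistemic = means.var(dim=0)       # variance of predicted means across samples
    aleatoric = variances.mean(dim=0)  # mean of predicted variances across samples
    return epistemic + aleatoric       # total uncertainty per data point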


References

[Lakshminarayanan et al., 2017] B. Lakshminarayanan, A. Pritzel, and C. Blundell, “Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles,” in Proc. Conference on Neural Information Processing Systems (NIPS), 2017.

[Malinin et al., 2021] Andrey Malinin, Neil Band, Yarin Gal, Mark Gales, Alexander Ganshin, German Chesnokov, Alexey Noskov, Andrey Ploskonosov, Liudmila Prokhorenkova, Ivan Provilkov, Vatsal Raina, Vyas Raina, Denis Roginskiy, Mariya Shmatova, Panagiotis Tigas, and Boris Yangel, “Shifts: A dataset of real distributional shift across multiple large-scale tasks,” in Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2), 2021.