Evaluation:

Classical semantic segmentation metrics, in this case the Dice Score (DSC) and the Average Symmetric Surface Distance (ASSD), will be used to assess different aspects of segmentation performance for the region of interest: DSC measures the volumetric overlap between the prediction and the ground truth, while ASSD measures the average distance between their boundaries. These metrics are implemented here. They were chosen for their simplicity, their popularity, their rank stability, and their ability to assess the accuracy of the predictions.
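For readers who want to see what the two metrics compute, the following is a minimal, independent sketch in Python using NumPy and SciPy. It is not the challenge's reference implementation. It assumes binary masks stored as NumPy arrays of equal shape, defines surface voxels as foreground voxels removed by a single binary erosion, and takes an optional voxel `spacing`. The function names `dice_score` and `assd`, and the conventions for empty masks, are illustrative choices.

```python
import numpy as np
from scipy import ndimage


def dice_score(pred: np.ndarray, gt: np.ndarray) -> float:
    """DSC = 2|P ∩ G| / (|P| + |G|) for binary masks P (prediction) and G (ground truth)."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    denom = pred.sum() + gt.sum()
    if denom == 0:
        return 1.0  # assumption: two empty masks count as perfect agreement
    return float(2.0 * np.logical_and(pred, gt).sum() / denom)


def _surface(mask: np.ndarray) -> np.ndarray:
    # Surface voxels: foreground voxels that disappear after one erosion step.
    return mask & ~ndimage.binary_erosion(mask)


def assd(pred: np.ndarray, gt: np.ndarray, spacing=None) -> float:
    """Mean distance from each surface voxel to the other mask's surface, in both directions."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    if not pred.any() or not gt.any():
        raise ValueError("ASSD is undefined when either mask is empty")
    surf_pred, surf_gt = _surface(pred), _surface(gt)
    # Euclidean distance from every voxel to the nearest surface voxel of each mask
    # (surface voxels are the zeros of the inverted surface map).
    dist_to_gt = ndimage.distance_transform_edt(~surf_gt, sampling=spacing)
    dist_to_pred = ndimage.distance_transform_edt(~surf_pred, sampling=spacing)
    d_pred_to_gt = dist_to_gt[surf_pred]
    d_gt_to_pred = dist_to_pred[surf_gt]
    total = d_pred_to_gt.sum() + d_gt_to_pred.sum()
    return float(total / (d_pred_to_gt.size + d_gt_to_pred.size))
```

For example, two 4x4 squares offset by one column on a 10x10 grid overlap in 12 voxels, so `dice_score` returns 2·12 / (16 + 16) = 0.75, while `assd` returns a small positive distance. Passing the scan's voxel spacing (e.g. in millimetres) makes ASSD physically meaningful for anisotropic images.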

Participating teams are ranked for each target test subject and for each measure (namely DSC and ASSD). The final ranking score for each team is then calculated in two steps: first, the individual rankings (one per measure) are averaged for each patient to give that patient's Cumulative Rank; second, these cumulative ranks are averaged across all patients for each participating team.
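The two-step ranking can be sketched as follows. This is an illustrative sketch under stated assumptions, not the official scoring code: scores are assumed to be arranged as arrays of shape (number of teams, number of patients), a higher DSC and a lower ASSD are treated as better, and ties are assumed to receive the average of the tied ranks (`scipy.stats.rankdata` with `method="average"`). The source does not specify tie handling or how missing predictions are treated. The function name `final_ranking` is hypothetical.

```python
import numpy as np
from scipy.stats import rankdata


def final_ranking(dsc: np.ndarray, assd: np.ndarray, teams: list[str]) -> dict[str, float]:
    """dsc, assd: arrays of shape (n_teams, n_patients). Lower final score is better."""
    # Rank teams within each patient (each column); rank 1 is best.
    dsc_rank = rankdata(-dsc, method="average", axis=0)   # higher DSC is better
    assd_rank = rankdata(assd, method="average", axis=0)  # lower ASSD is better
    # Step 1: per-patient Cumulative Rank = mean of the per-measure ranks.
    cumulative = (dsc_rank + assd_rank) / 2.0
    # Step 2: average the cumulative ranks over all patients for each team.
    final = cumulative.mean(axis=1)
    # Return teams ordered from best (lowest score) to worst.
    return dict(sorted(zip(teams, final.tolist()), key=lambda kv: kv[1]))
```

With three hypothetical teams and two patients:

```python
dsc = np.array([[0.90, 0.80], [0.85, 0.85], [0.70, 0.60]])
assd = np.array([[1.0, 1.0], [1.5, 2.0], [3.0, 4.0]])
print(final_ranking(dsc, assd, ["A", "B", "C"]))
# {'A': 1.25, 'B': 1.75, 'C': 3.0}
```

Team A wins patient 1 on both measures (cumulative rank 1) but is second on DSC and first on ASSD for patient 2 (cumulative rank 1.5), giving a final score of 1.25.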