Whole Slide Image Set
The WSI set is divided into Training, Validation and Testing subsets. An .xlsx file is also provided, which lists the label of each WSI (RoI), the corresponding patient ID and the reference set (training/validation/test). Moreover, each subset is divided into three main groups:
Group_BT
currently contains sets of normal tissue images and of two histopathologically distinct subtypes of benign breast lesions: Type_N, Type_PB and Type_UDH, which include WSIs annotated as Normal (N), Pathological Benign (PB) and Usual Ductal Hyperplasia (UDH), respectively;
Group_AT
includes Type_FEA and Type_ADH subsets containing, respectively, Flat Epithelial Atypia (FEA) and Atypical Ductal Hyperplasia (ADH) lesion subtypes;
Finally, Group_MT is divided into two subsets, Type_DCIS and Type_IC, which include WSIs annotated as Ductal Carcinoma in Situ (DCIS) and Invasive Carcinoma (IC) lesion subtypes, respectively.
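The group/subtype hierarchy described above can be written down as a small lookup table. The following is a minimal sketch, not part of the dataset tooling: the names `GROUPS`, `SUBTYPE_TO_GROUP` and `group_of` are hypothetical, and the subtype keys are assumed to be the abbreviations given in parentheses above.

```python
# Groups and their subtypes, as listed in the description above.
# The dictionary and function names are illustrative, not an official API.
GROUPS = {
    "Group_BT": ["N", "PB", "UDH"],
    "Group_AT": ["FEA", "ADH"],
    "Group_MT": ["DCIS", "IC"],
}

# Inverted view: subtype abbreviation -> group name.
SUBTYPE_TO_GROUP = {
    subtype: group
    for group, subtypes in GROUPS.items()
    for subtype in subtypes
}


def group_of(subtype: str) -> str:
    """Return the group a subtype abbreviation belongs to."""
    try:
        return SUBTYPE_TO_GROUP[subtype]
    except KeyError:
        raise ValueError(f"unknown subtype: {subtype!r}") from None


if __name__ == "__main__":
    for abbreviation in ("N", "FEA", "IC"):
        print(abbreviation, "->", group_of(abbreviation))
```

Such a mapping is convenient when a task uses the three groups rather than the seven subtypes as labels.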
Table 1 shows the number of WSIs per group/subtype for the Training, Validation and Testing subsets.
Subset | Type_N (Group_BT) | Type_PB (Group_BT) | Type_UDH (Group_BT) | Type_FEA (Group_AT) | Type_ADH (Group_AT) | Type_DCIS (Group_MT) | Type_IC (Group_MT)
---|---|---|---|---|---|---|---
Training | 27 | 120 | 56 | 24 | 28 | 40 | 100
Validation | 10 | 11 | 9 | 6 | 8 | 9 | 12
Testing | 7 | 16 | 9 | 11 | 12 | 12 | 20
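As a quick sanity check on the split sizes, the counts in Table 1 can be summed per subset and per subtype. This is only a sketch: the numbers are copied from the table, while the names `WSI_COUNTS`, `subset_totals` and `subtype_totals` are hypothetical.

```python
# WSI counts copied from Table 1; keys are the subtype abbreviations.
WSI_COUNTS = {
    "Training":   {"N": 27, "PB": 120, "UDH": 56, "FEA": 24, "ADH": 28, "DCIS": 40, "IC": 100},
    "Validation": {"N": 10, "PB": 11,  "UDH": 9,  "FEA": 6,  "ADH": 8,  "DCIS": 9,  "IC": 12},
    "Testing":    {"N": 7,  "PB": 16,  "UDH": 9,  "FEA": 11, "ADH": 12, "DCIS": 12, "IC": 20},
}


def subset_totals(counts):
    """Total number of images in each subset (row sums)."""
    return {subset: sum(row.values()) for subset, row in counts.items()}


def subtype_totals(counts):
    """Total number of images of each subtype across subsets (column sums)."""
    totals = {}
    for row in counts.values():
        for subtype, n in row.items():
            totals[subtype] = totals.get(subtype, 0) + n
    return totals


if __name__ == "__main__":
    print(subset_totals(WSI_COUNTS))
    # {'Training': 395, 'Validation': 65, 'Testing': 87}
    print(sum(subset_totals(WSI_COUNTS).values()))
    # 547
```

The column sums make the class imbalance visible: for example, Type_PB has 147 WSIs in total while Type_FEA has 41.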
Whole-slide images are stored in the .svs file format as multi-resolution pyramid structures (the size of the highest-resolution image can easily exceed 100,000 by 100,000 pixels). For some WSIs, a file in the .qpdata file format with the same filename as the WSI is provided for viewing the annotations inside the WSI.
Libraries and open source platforms that can open these file formats are listed in the Software page.
Regions of Interest Set
The RoI set follows the same organization as the WSI set. Table 2 shows the number of RoIs per group/subtype for the Training, Validation and Testing subsets of the RoI set.
Subset | Type_N (Group_BT) | Type_PB (Group_BT) | Type_UDH (Group_BT) | Type_FEA (Group_AT) | Type_ADH (Group_AT) | Type_DCIS (Group_MT) | Type_IC (Group_MT)
---|---|---|---|---|---|---|---
Training | 357 | 714 | 389 | 624 | 387 | 665 | 521
Validation | 46 | 43 | 46 | 49 | 41 | 40 | 47
Testing | 81 | 79 | 82 | 83 | 79 | 85 | 81
The Regions of Interest are provided in the .png file format. The filename of a RoI includes the filename of the corresponding WSI as well as the subtype of the RoI (e.g., BRACS_010_PB_32.png is RoI number 32, extracted from the WSI named BRACS_010.svs and labeled as Pathological Benign). The resolution of each RoI is 40×, and its dimensions can easily exceed 4,000 by 4,000 pixels.
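The naming convention above can be unpacked with a regular expression. The sketch below is based only on the single example given here: it assumes that every WSI name has the form `BRACS_<digits>` and that the subtype is an uppercase abbreviation. The names `ROI_PATTERN`, `RoiName` and `parse_roi_filename` are hypothetical.

```python
import re
from typing import NamedTuple

# Assumed pattern: <WSI name>_<subtype>_<RoI number>.png,
# e.g. BRACS_010_PB_32.png (inferred from the one example in the text).
ROI_PATTERN = re.compile(
    r"^(?P<wsi>BRACS_\d+)_(?P<subtype>[A-Z]+)_(?P<number>\d+)\.png$"
)


class RoiName(NamedTuple):
    wsi_filename: str   # filename of the source whole-slide image
    subtype: str        # subtype abbreviation, e.g. "PB"
    roi_number: int     # RoI number taken from the filename


def parse_roi_filename(name: str) -> RoiName:
    """Split a RoI filename into WSI filename, subtype and RoI number."""
    match = ROI_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a recognised RoI filename: {name!r}")
    return RoiName(
        wsi_filename=match["wsi"] + ".svs",
        subtype=match["subtype"],
        roi_number=int(match["number"]),
    )


if __name__ == "__main__":
    print(parse_roi_filename("BRACS_010_PB_32.png"))
    # RoiName(wsi_filename='BRACS_010.svs', subtype='PB', roi_number=32)
```

Grouping RoIs by `wsi_filename` is one way to keep all regions from the same slide together when building custom splits.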