Surpassing the Human Accuracy: Detecting Gallbladder Cancer from USG Images with Curriculum Learning

Soumen Basu^a Mayank Gupta^a Pratyaksha Rana^b Pankaj Gupta^b Chetan Arora^a

^a Indian Institute of Technology, Delhi
^b Post Graduate Institute of Medical Education and Research, Chandigarh

In this work, we explore the potential of CNN-based models for gallbladder cancer (GBC) detection from ultrasound sonography (USG) images, a problem for which no prior study is known. USG is the most common diagnostic modality for GBC detection because of its low cost and accessibility. However, USG images are challenging to analyze because of low image quality, noise, and varying viewpoints caused by the handheld nature of the sensor. Our exhaustive study of state-of-the-art (SOTA) image classification techniques for the problem reveals that they often fail to learn the salient GB region because of shadows in the USG images. SOTA object detection techniques also achieve low accuracy because of spurious textures caused by noise or adjacent organs. We propose GBCNet to tackle these challenges. GBCNet first extracts the regions of interest (ROIs) by detecting the GB (and not the cancer), and then uses a new multi-scale, second-order pooling architecture specialized for classifying GBC. To handle spurious textures effectively, we propose a curriculum inspired by human visual acuity, which reduces the texture biases in GBCNet. Experimental results demonstrate that GBCNet significantly outperforms SOTA CNN models and even expert radiologists.
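The visual-acuity curriculum mentioned above can be illustrated with a small sketch. It assumes one plausible reading of the idea: training images are blurred strongly in early epochs (hiding fine texture) and the blur is reduced as training proceeds. The schedule, the parameter names (sigma_start, halve_every, sigma_min), and the helper functions are hypothetical and are not taken from the GBCNet code.

```python
import math

def sigma_for_epoch(epoch, sigma_start=16.0, halve_every=5, sigma_min=0.5):
    """Hypothetical schedule: halve the blur every `halve_every` epochs.

    Returns 0.0 (no blur) once the value would drop below `sigma_min`.
    """
    sigma = sigma_start / (2 ** (epoch // halve_every))
    return sigma if sigma >= sigma_min else 0.0

def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur_1d(values, kernel):
    """Convolve one row or column with the kernel, clamping at the borders."""
    radius = len(kernel) // 2
    n = len(values)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)
            acc += w * values[j]
        out.append(acc)
    return out

def gaussian_blur(image, sigma):
    """Separable Gaussian blur of a grayscale image given as a list of rows."""
    if sigma <= 0:
        return [[float(v) for v in row] for row in image]
    kernel = gaussian_kernel(sigma)
    rows = [blur_1d(row, kernel) for row in image]            # horizontal pass
    cols = [blur_1d(list(col), kernel) for col in zip(*rows)]  # vertical pass
    return [list(row) for row in zip(*cols)]                   # transpose back

# Usage: blur a toy 5x5 image with the sigma chosen for the current epoch.
toy = [[0.0] * 5 for _ in range(5)]
toy[2][2] = 1.0
blurred = gaussian_blur(toy, sigma_for_epoch(20))
```

In a real training loop the blur would be applied to each batch before the forward pass, and a library routine would normally replace the pure-Python convolution used here for self-containment.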

Paper Supplementary

Dataset

Source code and pre-trained models

Overview

(a), (b), and (c) Normal, benign, and malignant GB samples in USG images, respectively. While normal and benign GBs have regular anatomy, a clear boundary is absent in a malignant GB.
(d) A malignant (biopsy-proven) GB sample. (e) Shadows with the visual traits of a GB lead to localization errors in ResNet50. (f) GBCNet handles shadow artifacts well.
(g) Another sample of a malignant GB. (h) The radiologist incorrectly diagnosed the GB as benign based on the stone and wall thickening. (i) GBCNet helps the radiologist to identify the salient region with liver infiltration by the GB, a critical feature of GBC, and correct the prediction.

Grad-CAM Visuals

VGG16, ResNet50, and Inception-V3 focus on the shadow or the echogenic area, and mostly fail to detect GBC. GBCNet accurately focuses on the malignant GB region and detects GBC.

Key Results

The dataset comprises 1255 images collected from 218 patients. The test set contains 122 images. We perform 10-fold cross-validation to assess the generalizability of the results.
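Because the 1255 images come from only 218 patients, a cross-validation split is typically built at the patient level so that images of one patient never appear in both the training and test folds. The sketch below shows one way to do this, assuming such a patient-wise protocol. The text above does not state how the folds were formed, and the function names and toy identifiers are hypothetical.

```python
import random
from collections import defaultdict

def patient_level_folds(image_to_patient, n_folds=10, seed=0):
    """Assign whole patients to folds so no patient spans two folds.

    `image_to_patient` maps an image identifier to a patient identifier.
    Returns a list of `n_folds` lists of image identifiers.
    """
    by_patient = defaultdict(list)
    for image_id, patient_id in image_to_patient.items():
        by_patient[patient_id].append(image_id)

    patients = sorted(by_patient)              # deterministic starting order
    random.Random(seed).shuffle(patients)      # reproducible shuffle

    folds = [[] for _ in range(n_folds)]
    for i, patient in enumerate(patients):     # round-robin over patients
        folds[i % n_folds].extend(by_patient[patient])
    return folds

def train_test_split(folds, test_index):
    """Use one fold as the test set and the remaining folds for training."""
    test = list(folds[test_index])
    train = [img for i, fold in enumerate(folds) if i != test_index
             for img in fold]
    return train, test

# Usage with toy identifiers: 40 images spread over 20 patients.
mapping = {f"img_{i}": f"patient_{i % 20}" for i in range(40)}
folds = patient_level_folds(mapping, n_folds=10)
train_ids, test_ids = train_test_split(folds, test_index=0)
```

Round-robin assignment balances the number of patients per fold, not the number of images, so fold sizes can differ when patients contribute unequal numbers of images.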