Development and Validation of a Convolutional Neural Network Model to Predict a Pathologic Fracture in the Proximal Femur Using Abdomen and Pelvis CT Images of Patients With Advanced Cancer.
Published: 2023 Aug 23
Authors:
Min Wook Joo, Taehoon Ko, Min Seob Kim, Yong-Suk Lee, Seung Han Shin, Yang-Guk Chung, Hong Kwon Lee
Source:
CLINICAL ORTHOPAEDICS AND RELATED RESEARCH
Abstract:
Improvement in survival in patients with advanced cancer is accompanied by an increased probability of bone metastasis and related pathologic fractures, especially in the proximal femur. The few systems proposed for diagnosing impending fractures caused by metastasis, and ultimately for preventing future fractures, have practical limitations; thus, novel screening tools are essential. A CT scan of the abdomen and pelvis is a standard modality for staging and follow-up in patients with cancer, and radiologic assessment of the proximal femur is possible with CT-based digitally reconstructed radiographs. Deep-learning models, such as convolutional neural networks (CNNs), may be able to predict pathologic fractures from digitally reconstructed radiographs, but to our knowledge, they have not been tested for this application.
(1) How accurate is a CNN model for predicting a pathologic fracture in a proximal femur with metastasis using digitally reconstructed radiographs of abdomen and pelvis CT images in patients with advanced cancer? (2) Do CNN models perform better than clinicians with varying backgrounds and experience levels in predicting a pathologic fracture on abdomen and pelvis CT images, given no knowledge of the patients' histories other than metastasis in the proximal femur?
A total of 392 patients received radiation treatment of the proximal femur at three hospitals from January 2011 to December 2021. These patients underwent 2945 CT scans of the abdomen and pelvis for systemic evaluation and follow-up of their primary cancer. For 33% of the CT scans (974), it was impossible to determine whether a pathologic fracture developed within 3 months after image acquisition, and these scans were excluded. Finally, 1971 cases with a mean age of 59 ± 12 years were included in this study. Pathologic fractures developed within 3 months after CT in 3% (60 of 1971) of cases. A total of 47% (936 of 1971) were women.
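A digitally reconstructed radiograph (DRR) is a synthetic radiograph computed from a CT volume by integrating attenuation along rays that run from a virtual X-ray source to a virtual detector. The Methods state only that perspective projection was used, so the stdlib-Python sketch below is a hypothetical illustration, not the authors' pipeline. The function names, the on-axis point source and flat detector, unit isotropic voxels, nearest-voxel sampling, and the use of linear attenuation values (CT Hounsfield units would first need converting) are all assumptions. Production DRR code usually relies on exact ray-voxel traversal such as Siddon's algorithm, or on interpolated sampling.

```python
import math


def drr_perspective(volume, source_distance, detector_distance,
                    det_rows, det_cols, det_spacing=1.0, n_samples=200):
    """Hypothetical perspective-projection DRR sketch (not the study's code).

    volume[d][r][c] holds linear attenuation values on a unit voxel grid, with
    axis d along the beam (a CT volume would be reordered so that, e.g., the
    anteroposterior axis comes first). An assumed point source sits
    source_distance voxels in front of the volume on its central axis, and an
    assumed flat detector sits detector_distance voxels behind it.
    Returns one attenuation line integral per detector pixel.
    """
    depth, rows, cols = len(volume), len(volume[0]), len(volume[0][0])
    cr, cc = rows / 2.0, cols / 2.0                 # central axis in (row, col)
    sd, sr, sc = -float(source_distance), cr, cc    # point-source position
    det_d = depth + float(detector_distance)        # detector plane along d
    image = []
    for i in range(det_rows):
        line = []
        for j in range(det_cols):
            # Detector pixel center, with the detector centered on the axis.
            pr = cr + (i + 0.5 - det_rows / 2.0) * det_spacing
            pc = cc + (j + 0.5 - det_cols / 2.0) * det_spacing
            vd, vr, vc = det_d - sd, pr - sr, pc - sc  # source-to-pixel vector
            ray_len = math.sqrt(vd * vd + vr * vr + vc * vc)
            # Only the stretch of the ray with 0 <= d <= depth can hit voxels.
            t0, t1 = (0.0 - sd) / vd, (depth - sd) / vd
            step = (t1 - t0) * ray_len / n_samples   # path length per sample
            total = 0.0
            for k in range(n_samples):
                t = t0 + (k + 0.5) * (t1 - t0) / n_samples  # midpoint samples
                d = math.floor(sd + t * vd)
                r = math.floor(sr + t * vr)
                c = math.floor(sc + t * vc)
                if 0 <= d < depth and 0 <= r < rows and 0 <= c < cols:
                    total += volume[d][r][c] * step  # nearest-voxel lookup
            line.append(total)
        image.append(line)
    return image


def to_intensity(line_integrals, i0=1.0):
    """Beer-Lambert law: I = I0 * exp(-line integral) for each pixel."""
    return [[i0 * math.exp(-v) for v in row] for row in line_integrals]


if __name__ == "__main__":
    # Toy 4 x 4 x 4 volume of unit attenuation (not CT data).
    vol = [[[1.0] * 4 for _ in range(4)] for _ in range(4)]
    img = drr_perspective(vol, source_distance=100.0, detector_distance=10.0,
                          det_rows=4, det_cols=4)
    print([[round(v, 3) for v in row] for row in img])
    print(to_intensity(img)[0][0])
```

For a uniform 4 × 4 × 4 volume of ones, each ray that crosses the volume integrates to roughly its 4-voxel path length, and a ray that misses the volume returns 0. `to_intensity` then applies the Beer-Lambert law to turn line integrals into detector intensities.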
Sixty cases had an established pathologic fracture within 3 months after the CT scan, and the remaining 1911 cases had no established pathologic fracture within 3 months after the CT scan. The mean ages of the former and latter groups were 64 ± 11 years and 59 ± 12 years, respectively, and 32% (19 of 60) and 53% (1016 of 1911) of cases, respectively, were female. Digitally reconstructed radiographs were generated with perspective projections of three-dimensional CT volumes onto two-dimensional planes. Then, 1557 images from one hospital were used as the training set. To verify that the deep-learning models could operate consistently even in hospitals with a different medical environment, 414 images from the other hospitals were used for external validation. Using data augmentation methods, which are known to be an effective way to boost the performance of deep-learning models, the number of images in the groups without and with a pathologic fracture within 3 months after each CT scan was increased from 1911 to 22,932 and from 60 to 720, respectively (a 12-fold increase in each group). Three CNNs (VGG16, ResNet50, and DenseNet121) were fine-tuned using the digitally reconstructed radiographs. As performance measures, the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, specificity, precision, and F1 score were determined. The AUC was the main measure used to compare the three CNN models, and the optimal accuracy, sensitivity, and specificity were calculated at the cutoff selected with the Youden J statistic. Accuracy is the proportion of all cases, with or without a pathologic fracture within 3 months after the CT scan, that the CNN model classified correctly. Sensitivity is the proportion of cases with a pathologic fracture within 3 months that were correctly predicted to fracture, and specificity is the proportion of cases without such a fracture that were correctly predicted not to fracture. Precision is the proportion of predicted fractures that were true fractures, so it reflects how few false-positives the model produces.
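The measures defined above, together with the F1 score defined next, can all be computed from the counts of true and false positives and negatives at a chosen cutoff. The following stdlib-Python sketch is illustrative only and is not taken from the authors' repository; the function names and toy data are hypothetical. It estimates the AUC with the pairwise (Mann-Whitney) formulation, picks the cutoff that maximizes Youden's J = sensitivity + specificity - 1 over the observed scores, and reports the other measures at that cutoff.

```python
def confusion_counts(labels, scores, threshold):
    """Return (TP, FP, TN, FN), calling a fracture when score >= threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):      # labels: 1 = fracture, 0 = none
        predicted = s >= threshold
        if predicted and y == 1:
            tp += 1
        elif predicted:
            fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def _ratio(num, den):
    return num / den if den else 0.0


def metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity, precision, and F1 from counts."""
    sens = _ratio(tp, tp + fn)        # fractures correctly predicted
    spec = _ratio(tn, tn + fp)        # non-fractures correctly predicted
    prec = _ratio(tp, tp + fp)        # predicted fractures that were real
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": sens,
        "specificity": spec,
        "precision": prec,
        "f1": _ratio(2 * prec * sens, prec + sens),  # harmonic mean
    }


def roc_auc(labels, scores):
    """AUC as P(random positive scores above random negative); ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def youden_cutoff(labels, scores):
    """Observed score maximizing J = sensitivity + specificity - 1."""
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        m = metrics(*confusion_counts(labels, scores, t))
        j = m["sensitivity"] + m["specificity"] - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


if __name__ == "__main__":
    labels = [0, 0, 0, 0, 1, 0, 1, 1]            # toy data, not from the study
    scores = [0.1, 0.2, 0.3, 0.35, 0.4, 0.6, 0.7, 0.9]
    print(round(roc_auc(labels, scores), 3))     # 0.933
    cutoff, j = youden_cutoff(labels, scores)
    print(cutoff, round(j, 2))                   # 0.4 0.8
    print(metrics(*confusion_counts(labels, scores, cutoff)))
```

With 60 fractures among 1971 cases (about 3%), a model that never predicted a fracture would still be about 97% accurate, which is why sensitivity, precision, and F1 are more informative than accuracy here. As a rough consistency check, the harmonic mean of DenseNet121's reported mean precision (0.72) and sensitivity (0.22) is about 0.34, in line with its reported F1; because the reported values are averages, the match need not be exact.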
The F1 score is the harmonic mean of sensitivity and precision, two measures that trade off against each other. Gradient-weighted class activation mapping (Grad-CAM) images were created to check whether the CNN model correctly focused on potential pathologic fracture regions. The CNN model with the best performance was compared with clinicians.
DenseNet121 showed the best performance in identifying pathologic fractures; its area under the receiver operating characteristic curve was larger than those of VGG16 (0.77 ± 0.07 [95% CI 0.75 to 0.79] versus 0.71 ± 0.08 [95% CI 0.69 to 0.73]; p = 0.001) and ResNet50 (0.77 ± 0.07 [95% CI 0.75 to 0.79] versus 0.72 ± 0.09 [95% CI 0.69 to 0.74]; p = 0.001). Specifically, DenseNet121 scored the highest in sensitivity (0.22 ± 0.07 [95% CI 0.20 to 0.24]), precision (0.72 ± 0.19 [95% CI 0.67 to 0.77]), and F1 score (0.34 ± 0.10 [95% CI 0.31 to 0.37]), and it focused accurately on the region of the expected pathologic fracture. Further, DenseNet121 was less likely than clinicians to mispredict cases in which there was no pathologic fracture; it outperformed clinicians in specificity (0.98 ± 0.01 [95% CI 0.98 to 0.99] versus 0.86 ± 0.09 [95% CI 0.81 to 0.91]; p = 0.01), precision (0.72 ± 0.19 [95% CI 0.67 to 0.77] versus 0.11 ± 0.10 [95% CI 0.05 to 0.17]; p = 0.0001), and F1 score (0.34 ± 0.10 [95% CI 0.31 to 0.37] versus 0.17 ± 0.15 [95% CI 0.08 to 0.26]; p = 0.0001).
CNN models may be able to accurately predict, from digitally reconstructed radiographs of abdomen and pelvis CT images, impending pathologic fractures that clinicians may not anticipate; this could assist medical, radiation, and orthopaedic oncologists in clinical practice. To achieve better performance, ensemble-learning models that incorporate the patients' histories should be developed and validated.
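Grad-CAM, used in the Methods to check where the CNN was looking, weights each feature map of a convolutional layer by the spatial average of the gradient of the class score with respect to that map, sums the weighted maps, and keeps only positive values. The stdlib-Python sketch below shows only that combination step, with hypothetical names; it is not the authors' implementation. Obtaining the activations and gradients (by backpropagation in a deep-learning framework) and upsampling the map onto the radiograph are omitted, and the rescaling to [0, 1] is a display convention assumed here.

```python
def grad_cam(activations, gradients):
    """Grad-CAM combination step for one convolutional layer (sketch).

    activations[k][i][j]: value of feature map k at position (i, j)
    gradients[k][i][j]:   d(class score) / d(activations[k][i][j])
    Returns a heat map rescaled to [0, 1] (all zeros if nothing is positive).
    """
    n_maps = len(activations)
    h, w = len(activations[0]), len(activations[0][0])
    # alpha_k: global average of the gradients over all positions of map k.
    alphas = [sum(sum(row) for row in gradients[k]) / (h * w)
              for k in range(n_maps)]
    # Weighted sum of the feature maps, then ReLU to keep positive evidence.
    cam = [[max(0.0, sum(alphas[k] * activations[k][i][j]
                         for k in range(n_maps)))
            for j in range(w)]
           for i in range(h)]
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]  # display scaling
    return cam


if __name__ == "__main__":
    # Two toy 2 x 2 feature maps: map 0 raises the score, map 1 lowers it.
    acts = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]]
    grads = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
    print(grad_cam(acts, grads))  # [[1.0, 0.0], [0.0, 0.0]]
```

In the toy example, the feature map that raises the class score lights up where it is active, while the map with a negative weight is suppressed by the ReLU.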
The code for our model is publicly available online at https://github.com/taehoonko/CNN_path_fx_prediction.
Level III, diagnostic study.
Copyright © 2023 by the Association of Bone and Joint Surgeons.