Assessment of the Diagnostic Performance of a Commercially Available Artificial Intelligence Algorithm for Risk Stratification of Thyroid Nodules on Ultrasound.
Published: 2024 Oct 15
Authors:
Jeffrey Ashton, Samantha Morrison, Alaattin Erkanli, Benjamin Wildman-Tobriner
Source:
THYROID
Abstract:
Background: Thyroid nodules are challenging to accurately characterize on ultrasound (US), though the emergence of risk stratification systems and, more recently, artificial intelligence (AI) algorithms has improved nodule classification. The purpose of this study was to evaluate the performance of a recent Food and Drug Administration (FDA)-cleared AI tool for detection of malignancy in thyroid nodules on US. Methods: One year of consecutive thyroid US examinations with ≥1 nodule from Duke University Hospital and its affiliate community hospital (649 nodules from 347 patients) was retrospectively evaluated. Included nodules had ground truth diagnoses by surgical pathology, fine needle aspiration (FNA), or three-year follow-up US showing stability. An FDA-cleared AI tool (Koios DS Thyroid) analyzed each nodule to generate (i) American College of Radiology Thyroid Imaging Reporting and Data System (ACR TI-RADS) descriptors, scores, and follow-up recommendations and (ii) an AI-adapter score to further adjust risk assessments and recommendations. Four groups were then compared: (i) Koios with AI-adapter, (ii) Koios without AI-adapter, (iii) clinical radiology report, and (iv) radiology report combined with AI-adapter. Performance of the final recommendations (FNA or no FNA) was determined based on ground truth, and the four groups were compared using sensitivity, specificity, and receiver operating characteristic (ROC) curve analysis. Results: Of 649 nodules, 32 were malignant and 617 were benign. Performance of Koios with AI-adapter enabled was similar to that of radiologists (area under the curve [AUC] 0.70 for both; CI: 0.60-0.81 and 0.60-0.79, respectively). Koios with AI-adapter had improved specificity compared to radiologists (0.63 [CI: 0.59-0.67] versus 0.43 [CI: 0.38-0.48]) but decreased sensitivity (0.69 [CI: 0.50-0.83] versus 0.81 [CI: 0.61-0.92]). Highest performance was seen when the radiology interpretation was combined with the AI-adapter (AUC 0.76 [CI: 0.67-0.85]). Combined with the AI-adapter, radiologist specificity improved from 0.43 (CI: 0.38-0.48) to 0.53 (CI: 0.49-0.58) (McNemar's test, p < 0.001), resulting in 17% fewer FNA recommendations, with unchanged sensitivity (0.81, p = 1). Conclusion: Koios DS demonstrated standalone performance similar to radiologists, though with lower sensitivity and higher specificity. Performance was best when radiologist interpretations were combined with the AI-adapter component, with improved specificity and reduced unnecessary FNA recommendations.
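The comparison described in the abstract rests on two standard calculations: sensitivity and specificity of a binary FNA recommendation against ground truth, and McNemar's test on paired recommendations for the same nodules. The Python sketch below illustrates both on a small toy dataset. It is not the authors' analysis code: the function names, variable names, and the ten toy nodules are all hypothetical, an exact binomial form of McNemar's test is assumed, and the clustering of multiple nodules within one patient is ignored.

```python
from math import comb


def sensitivity_specificity(recommend_fna, malignant):
    """Return (sensitivity, specificity) of binary FNA recommendations.

    recommend_fna[i] is True when FNA was recommended for nodule i;
    malignant[i] is the ground-truth label for the same nodule.
    """
    pairs = list(zip(recommend_fna, malignant))
    tp = sum(r and m for r, m in pairs)              # malignant, FNA recommended
    fn = sum((not r) and m for r, m in pairs)        # malignant, FNA not recommended
    tn = sum((not r) and (not m) for r, m in pairs)  # benign, FNA not recommended
    fp = sum(r and (not m) for r, m in pairs)        # benign, FNA recommended
    return tp / (tp + fn), tn / (tn + fp)


def mcnemar_exact_p(b, c):
    """Two-sided exact McNemar p-value from the two discordant-pair counts."""
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Toy data (hypothetical, NOT from the study): 4 malignant and 6 benign nodules.
malignant = [True, True, True, True, False, False, False, False, False, False]
reader = [True, True, True, False, True, True, True, False, False, False]
reader_plus_ai = [True, True, True, False, True, False, False, False, False, False]

sens_reader, spec_reader = sensitivity_specificity(reader, malignant)
sens_combined, spec_combined = sensitivity_specificity(reader_plus_ai, malignant)

# Discordant pairs among benign nodules, i.e. the pairs that drive the
# paired comparison of specificity between the two reading strategies.
benign_pairs = [(r, a) for r, a, m in zip(reader, reader_plus_ai, malignant) if not m]
b = sum(r and not a for r, a in benign_pairs)  # FNA dropped once the AI is added
c = sum(a and not r for r, a in benign_pairs)  # FNA added once the AI is added
p_value = mcnemar_exact_p(b, c)
```

In this toy example the added AI input removes two FNA recommendations for benign nodules without changing any recommendation for a malignant nodule, which mirrors the direction of the reported effect (higher specificity, unchanged sensitivity) but says nothing about its size.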