Research Updates
Articles below are published ahead of final publication in an issue. Please cite articles in the following format: authors, (year), title, journal, DOI.


Auto-delineation of treatment target volume for radiation therapy using large language model-aided multimodal learning.

Publication date: 2024 Aug 06
Authors: Praveenbalaji Rajendran, Yizheng Chen, Liang Qiu, Thomas Niedermayr, Wu Liu, Mark Buyyounouski, Hilary Bagshaw, Bin Han, Yong Yang, Nataliya Kovalchuk, Xuejun Gu, Steven Hancock, Lei Xing, Xianjin Dai
Source: Int J Radiat Oncol

Abstract:

Artificial intelligence (AI)-aided methods have made significant progress in the auto-delineation of normal tissues. However, these approaches struggle with the auto-contouring of radiotherapy target volumes. Our goal is to model the delineation of the target volume as a clinical decision-making problem, resolved by leveraging large language model-aided multimodal learning.

A vision-language model, termed Medformer, has been developed, employing a hierarchical vision transformer as its backbone and incorporating large language models to extract rich textual features. Through a visual-language attention module, the contextually embedded linguistic features are seamlessly integrated into the visual features for language-aware visual encoding. Metrics including the Dice similarity coefficient (DSC), intersection over union (IOU), and 95th percentile Hausdorff distance (HD95) were used to quantitatively evaluate the model's performance. The evaluation was conducted on an in-house prostate cancer dataset and a public oropharyngeal carcinoma (OPC) dataset, totaling 668 subjects.

For delineation of the gross tumor volume (GTV) on the prostate cancer dataset, Medformer achieved a DSC of 0.81 ± 0.10 versus 0.72 ± 0.10, an IOU of 0.73 ± 0.12 versus 0.65 ± 0.09, and an HD95 of 9.86 ± 9.77 mm versus 19.13 ± 12.96 mm. Similarly, on the OPC dataset, it achieved a DSC of 0.77 ± 0.11 versus 0.72 ± 0.09, an IOU of 0.70 ± 0.09 versus 0.65 ± 0.07, and an HD95 of 7.52 ± 4.8 mm versus 13.63 ± 7.13 mm, representing significant improvements (p < 0.05). For delineating the clinical target volume (CTV), Medformer achieved a DSC of 0.91 ± 0.04, an IOU of 0.85 ± 0.05, and an HD95 of 2.98 ± 1.60 mm, comparable to other state-of-the-art algorithms.

Auto-delineation of the treatment target based on multimodal learning outperforms conventional approaches that rely purely on visual features. Our method could be adopted into routine practice to rapidly contour the CTV/GTV.

Copyright © 2024. Published by Elsevier Inc.
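The abstract describes the fusion mechanism only at a high level. As an illustration, below is a minimal PyTorch sketch of a cross-attention block in the spirit of the described visual-language attention module, in which visual tokens attend to contextual language embeddings. The class name VisualLanguageAttention, the dimensions, and the residual fusion are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of language-aware visual encoding via cross-attention.
# Names and dimensions are illustrative assumptions, not the Medformer code.
import torch
import torch.nn as nn

class VisualLanguageAttention(nn.Module):
    def __init__(self, vis_dim: int, txt_dim: int, n_heads: int = 8):
        super().__init__()
        # Project language-model embeddings to the visual feature width.
        self.proj_txt = nn.Linear(txt_dim, vis_dim)
        self.attn = nn.MultiheadAttention(vis_dim, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(vis_dim)

    def forward(self, vis_tokens: torch.Tensor, txt_tokens: torch.Tensor) -> torch.Tensor:
        # vis_tokens: (B, N_vis, vis_dim), e.g. flattened image feature tokens
        #             from a hierarchical vision transformer stage.
        # txt_tokens: (B, N_txt, txt_dim), contextual embeddings from a
        #             large language model.
        txt = self.proj_txt(txt_tokens)
        # Visual queries attend to language keys/values (cross-attention).
        fused, _ = self.attn(query=vis_tokens, key=txt, value=txt)
        # Residual connection preserves the original visual information.
        return self.norm(vis_tokens + fused)

# Usage sketch with random tensors standing in for real features.
block = VisualLanguageAttention(vis_dim=256, txt_dim=768)
v = torch.randn(2, 1024, 256)
t = torch.randn(2, 32, 768)
out = block(v, t)  # (2, 1024, 256), language-aware visual tokens
```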
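The three evaluation metrics are standard in segmentation work. For reference, here is a minimal NumPy/SciPy sketch of how DSC, IOU, and HD95 are commonly computed on binary 3D masks; the function names and voxel-spacing handling are illustrative, and the masks are assumed to be non-empty.

```python
# Common definitions of the reported metrics (not the authors' code):
#   DSC  = 2|A ∩ B| / (|A| + |B|)
#   IOU  = |A ∩ B| / |A ∪ B|
#   HD95 = 95th percentile of symmetric surface distances, in mm
import numpy as np
from scipy.ndimage import binary_erosion, distance_transform_edt

def dice(pred: np.ndarray, gt: np.ndarray) -> float:
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum())

def iou(pred: np.ndarray, gt: np.ndarray) -> float:
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return inter / np.logical_or(pred, gt).sum()

def hd95(pred: np.ndarray, gt: np.ndarray, spacing=(1.0, 1.0, 1.0)) -> float:
    pred, gt = pred.astype(bool), gt.astype(bool)

    def surface(mask):
        # Surface voxels: the mask minus its binary erosion.
        return mask & ~binary_erosion(mask)

    def one_way(a, b):
        # Distance (mm) from every surface voxel of a to the surface of b.
        dt = distance_transform_edt(~surface(b), sampling=spacing)
        return dt[surface(a)]

    d = np.concatenate([one_way(pred, gt), one_way(gt, pred)])
    return float(np.percentile(d, 95))

# Example on synthetic overlapping cubes.
pred = np.zeros((64, 64, 64), bool); pred[20:40, 20:40, 20:40] = True
gt   = np.zeros((64, 64, 64), bool); gt[22:42, 22:42, 22:42] = True
print(dice(pred, gt), iou(pred, gt), hd95(pred, gt, spacing=(1.5, 1.0, 1.0)))
```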