Identification of a serum proteomic biomarker panel using diagnosis specific ensemble learning and symptoms for early pancreatic cancer detection.
Publication date: 2024 Aug 29
Authors:
Alexander Ney, Nuno R Nené, Eva Sedlak, Pilar Acedo, Oleg Blyuss, Harry J Whitwell, Eithne Costello, Aleksandra Gentry-Maharaj, Norman R Williams, Usha Menon, Giuseppe K Fusai, Alexey Zaikin, Stephen P Pereira
Source:
PLoS Computational Biology
Abstract:
The grim survival rates for pancreatic ductal adenocarcinoma (PDAC) (<10% at 5 years) are attributed to its complex intrinsic biology and to detection that most often occurs at a late stage. The overlap of early-stage symptoms with those of benign gastrointestinal conditions further complicates timely detection. The suboptimal diagnostic performance of carbohydrate antigen (CA) 19-9 and its elevation in benign hyperbilirubinaemia undermine its reliability, leaving a notable absence of accurate diagnostic biomarkers. Using a selected patient cohort with benign pancreatic and biliary tract conditions, we aimed to develop a data analysis protocol leading to a biomarker signature capable of distinguishing patients with non-specific yet concerning clinical presentations from those with PDAC.
A total of 539 patient serum samples collected under the Accelerated Diagnosis of neuro Endocrine and Pancreatic TumourS (ADEPTS) study (benign disease controls and PDACs) and the UK Collaborative Trial of Ovarian Cancer Screening (UKCTOCS, healthy controls) were screened using the Olink Oncology II panel, supplemented with five in-house markers. Sixteen specialized base-learner classifiers were stacked to select biomarkers and enhance their performance and robustness in blinded samples. Each base-learner was constructed through cross-validation and recursive feature elimination in a discovery set comprising approximately two thirds of the ADEPTS and UKCTOCS samples, and contrasted a specific diagnosis with PDAC.
The signature developed using diagnosis-specific ensemble learning demonstrated predictive capabilities outperforming CA19-9, the only biomarker currently accepted by the FDA and the National Comprehensive Cancer Network guidelines for pancreatic cancer, as well as other individual biomarkers and combinations, in both discovery and held-out validation sets.
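The stacking of diagnosis-specific base learners described in the methods can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the study: the data are synthetic, there are three hypothetical control diagnoses rather than sixteen base learners, and logistic regression with scikit-learn's `RFECV` stands in for whichever classifiers and feature-elimination settings the authors actually used.

```python
import numpy as np
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the study data): label 0 = PDAC, labels 1..3 are
# three hypothetical control diagnoses; 20 made-up protein features per sample.
n_per_class, n_features = 60, 20
diagnoses = (1, 2, 3)
labels = np.repeat(np.arange(4), n_per_class)
X = rng.normal(size=(labels.size, n_features))
X[labels == 0, :4] += 1.5              # PDAC shifts features 0-3
for d in diagnoses:
    X[labels == d, 3 + d] += 1.0       # each control diagnosis has its own marker


def make_base_learner():
    """Scaled logistic regression with cross-validated recursive feature elimination."""
    return make_pipeline(
        StandardScaler(),
        RFECV(LogisticRegression(max_iter=1000), step=1, cv=3, scoring="roc_auc"),
    )


def fit_base_learners(X, labels):
    """Fit one PDAC-vs-diagnosis classifier per control diagnosis."""
    learners = []
    for d in diagnoses:
        mask = (labels == 0) | (labels == d)
        y = (labels[mask] == 0).astype(int)   # 1 = PDAC, 0 = that diagnosis
        learners.append(make_base_learner().fit(X[mask], y))
    return learners


def meta_features(learners, X):
    """One column per base learner: its predicted probability of PDAC."""
    return np.column_stack([m.predict_proba(X)[:, 1] for m in learners])


# Hold out roughly one third of the samples, stratified by diagnosis.
X_disc, X_val, lab_disc, lab_val = train_test_split(
    X, labels, test_size=1 / 3, stratify=labels, random_state=0
)
y_disc, y_val = (lab_disc == 0).astype(int), (lab_val == 0).astype(int)

# Out-of-fold base predictions on the discovery set, so the meta-learner is
# never trained on predictions for samples a base learner has already seen.
oof = np.zeros((len(y_disc), len(diagnoses)))
folds = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
for train_idx, test_idx in folds.split(X_disc, lab_disc):
    fold_learners = fit_base_learners(X_disc[train_idx], lab_disc[train_idx])
    oof[test_idx] = meta_features(fold_learners, X_disc[test_idx])

meta = LogisticRegression().fit(oof, y_disc)

# Refit the base learners on the full discovery set, then score held-out samples.
base_learners = fit_base_learners(X_disc, lab_disc)
val_scores = meta.predict_proba(meta_features(base_learners, X_val))[:, 1]
val_auc = roc_auc_score(y_val, val_scores)
print(f"held-out AUC on synthetic data: {val_auc:.2f}")
```

Training the meta-learner on out-of-fold predictions is one common way to keep a stacked model from simply rewarding base learners that overfit; whether the study used this exact scheme is not stated in the abstract.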
With the ensemble method, an AUC of 0.98 (95% CI 0.98-0.99) and a sensitivity of 0.99 (95% CI 0.98-1) at 90% specificity were achieved in the discovery set, significantly higher than the AUC of 0.79 (95% CI 0.66-0.91) and sensitivity of 0.67 (95% CI 0.50-0.83), also at 90% specificity, for CA19-9 (p = 0.0016 and p = 0.00050, respectively). During ensemble signature validation in the held-out set, an AUC of 0.95 (95% CI 0.91-0.99) and a sensitivity of 0.86 (95% CI 0.68-1) were attained, compared to an AUC of 0.80 (95% CI 0.66-0.93) and a sensitivity of 0.65 (95% CI 0.48-0.56) at 90% specificity for CA19-9 alone (p = 0.0082 and p = 0.024, respectively). When validated only on the benign disease controls and PDACs collected from ADEPTS, the diagnosis-specific signature achieved an AUC of 0.96 (95% CI 0.92-0.99) and a sensitivity of 0.82 (95% CI 0.64-0.95) at 90% specificity, still significantly higher than the performance of CA19-9 taken as a single predictor, with an AUC of 0.79 (95% CI 0.64-0.93) and a sensitivity of 0.18 (95% CI 0.03-0.69) (p = 0.013 and p = 0.0055, respectively).
Our ensemble modelling technique outperformed CA19-9, individual biomarkers and indices developed with prevailing algorithms in distinguishing patients with non-specific but concerning symptoms from those with PDAC, with implications for improving early detection of PDAC in individuals at risk.
Copyright: © 2024 Ney et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
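The reported metrics (AUC, sensitivity at 90% specificity, and 95% confidence intervals) can be computed from classifier scores roughly as follows. This is a sketch under stated assumptions: the scores are invented, the helper names are hypothetical, and a stratified percentile bootstrap is used for the interval, which may differ from the interval method used in the study.

```python
import math
import random


def auc(pos, neg):
    """AUC as the probability that a case outscores a control (ties count 0.5)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_at_specificity(pos, neg, specificity=0.90):
    """Sensitivity at the lowest cutoff keeping >= `specificity` of controls negative."""
    ranked = sorted(neg)
    # Index of the smallest control score with at least `specificity` of
    # controls at or below it; the small tolerance guards against float error.
    k = math.ceil(specificity * len(ranked) - 1e-9) - 1
    threshold = ranked[k]
    return sum(p > threshold for p in pos) / len(pos)


def bootstrap_ci(stat, pos, neg, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval, resampling cases and controls separately."""
    rng = random.Random(seed)
    values = sorted(
        stat(rng.choices(pos, k=len(pos)), rng.choices(neg, k=len(neg)))
        for _ in range(n_boot)
    )
    lower = values[int((alpha / 2) * n_boot)]
    upper = values[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Invented scores for illustration only: higher means "more likely PDAC".
case_scores = [0.9, 0.8, 0.75, 0.6, 0.4]
control_scores = [0.1, 0.2, 0.3, 0.35, 0.5, 0.15, 0.25, 0.05, 0.45, 0.55]

point_auc = auc(case_scores, control_scores)                          # 0.94
point_sens = sensitivity_at_specificity(case_scores, control_scores)  # 0.8
auc_lower, auc_upper = bootstrap_ci(auc, case_scores, control_scores)
print(point_auc, point_sens, (auc_lower, auc_upper))
```

Fixing specificity at 90% and comparing sensitivities puts the ensemble signature and CA19-9 at the same false-positive rate, which is why the abstract reports sensitivity at that operating point alongside AUC.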