External validation of an artificial intelligence model for Gleason grading of prostate cancer on prostatectomy specimens.
Published: 2024 Jul 11
Authors:
Bogdana Schmidt, Simon John Christoph Soerensen, Hriday P Bhambhvani, Richard E Fan, Indrani Bhattacharya, Moon Hyung Choi, Christian A Kunder, Chia-Sui Kao, John Higgins, Mirabela Rusu, Geoffrey A Sonn
Source:
BJU INTERNATIONAL
Abstract:
To externally validate the performance of the DeepDx Prostate artificial intelligence (AI) algorithm (Deep Bio Inc., Seoul, South Korea) for Gleason grading on whole-mount prostate histopathology. AI models trained on biopsy samples may perform differently on radical prostatectomy (RP) specimens because of inherent differences in tissue representation and sample size.

The commercially available DeepDx Prostate AI algorithm is an automated Gleason grading system that was previously trained on 1133 prostate core biopsy images and validated on 700 biopsy images from two institutions. We assessed the performance of the algorithm, which outputs Gleason patterns (3, 4, or 5), on 500 tiles of 1 mm² created from 150 whole-mount RP specimens from a third institution. These patterns were then grouped into grade groups (GGs) for comparison with expert pathologist assessments. The reference standard was the International Society of Urological Pathology (ISUP) GG as established by two experienced uropathologists, with a third expert adjudicating discordant cases. The main metric was agreement with the reference standard, measured with Cohen's kappa.

Agreement between the two experienced pathologists in determining tile-level GGs had a quadratically weighted Cohen's kappa of 0.94. Agreement between the AI algorithm and the reference standard in differentiating cancerous from non-cancerous tissue had an unweighted Cohen's kappa of 0.91, and agreement in classifying tiles into GGs had a quadratically weighted Cohen's kappa of 0.89.
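The abstract reports tile-level agreement as Cohen's kappa: unweighted for the binary cancer/non-cancer call, and quadratically weighted for grade groups, so that a GG 1 vs GG 5 disagreement costs more than a GG 4 vs GG 5 one. The Python sketch below illustrates both under stated assumptions. Grade groups are coded as integers 0 (non-cancerous) to 5. `grade_group` and `cohen_kappa` are hypothetical helpers, not part of DeepDx or the authors' analysis code. The abstract does not say how the algorithm's per-tile pattern outputs were combined into a GG, so `grade_group` simply takes a primary and secondary pattern and applies the standard ISUP mapping, ignoring tertiary-pattern rules. The labels are toy values, not study data.

```python
from collections import Counter


def grade_group(primary=None, secondary=None):
    """ISUP grade group for a (primary, secondary) Gleason pattern pair.

    Hypothetical helper: returns 0 for a non-cancerous tile (no patterns).
    """
    if primary is None and secondary is None:
        return 0
    if primary not in (3, 4, 5) or secondary not in (3, 4, 5):
        raise ValueError("Gleason patterns must be 3, 4 or 5")
    score = primary + secondary
    if score == 6:
        return 1                           # 3+3
    if score == 7:
        return 2 if primary == 3 else 3    # 3+4 -> GG 2, 4+3 -> GG 3
    if score == 8:
        return 4                           # 4+4, 3+5, 5+3
    return 5                               # 4+5, 5+4, 5+5


def cohen_kappa(rater_a, rater_b, categories, weights=None):
    """Cohen's kappa for two equal-length label lists.

    weights=None: unweighted (any disagreement costs 1).
    weights="quadratic": a disagreement costs the squared distance between
    the ordinal positions of the two categories in `categories`.
    """
    if not rater_a or len(rater_a) != len(rater_b):
        raise ValueError("need two non-empty label lists of equal length")
    index = {c: i for i, c in enumerate(categories)}
    k, n = len(categories), len(rater_a)

    # Observed cross-tabulation and each rater's marginal counts.
    observed = Counter((index[a], index[b]) for a, b in zip(rater_a, rater_b))
    marg_a = Counter(index[a] for a in rater_a)
    marg_b = Counter(index[b] for b in rater_b)

    def cost(i, j):
        if weights is None:
            return 0 if i == j else 1
        if weights == "quadratic":
            return (i - j) ** 2
        raise ValueError(f"unsupported weights: {weights!r}")

    cells = [(i, j) for i in range(k) for j in range(k)]
    # Mean observed disagreement vs. disagreement expected by chance
    # (raters independent, each keeping its own marginal distribution).
    obs = sum(cost(i, j) * observed[(i, j)] for i, j in cells) / n
    exp = sum(cost(i, j) * marg_a[i] * marg_b[j] for i, j in cells) / n**2
    if exp == 0:
        raise ValueError("kappa is undefined when chance disagreement is zero")
    return 1 - obs / exp


# Toy tile labels for illustration only (not data from the study).
reference = [grade_group(), grade_group(), grade_group(3, 3), grade_group(3, 3),
             grade_group(3, 4), grade_group(4, 3), grade_group(4, 4), grade_group(4, 5)]
ai = [0, 1, 1, 1, 2, 3, 5, 5]

gg_kappa = cohen_kappa(reference, ai, range(6), weights="quadratic")
cancer_kappa = cohen_kappa([int(g > 0) for g in reference],
                           [int(g > 0) for g in ai], [0, 1])
print(round(gg_kappa, 3), round(cancer_kappa, 3))  # 0.96 0.6
```

Leaving the quadratic weights unnormalised is a deliberate simplification: dividing every cost by (k − 1)² cancels in the ratio `obs / exp`. As a cross-check, scikit-learn's `sklearn.metrics.cohen_kappa_score` with `weights="quadratic"` should return the same value for these lists.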
In distinguishing cancerous from non-cancerous tissue, the AI algorithm achieved a sensitivity of 0.997 and a specificity of 0.88; in classifying GG ≥2 vs GG 1 and non-cancerous tissue, it achieved a sensitivity of 0.98 and a specificity of 0.85.

The DeepDx Prostate AI algorithm showed excellent agreement with expert uropathologists and excellent performance in cancer identification and grading on RP specimens, despite having been trained on biopsy specimens from an entirely different patient population.

© 2024 The Author(s). BJU International published by John Wiley & Sons Ltd on behalf of BJU International.
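The sensitivity and specificity figures in the results come from dichotomising grade groups at two cut-offs: any cancer (GG ≥1) versus non-cancerous tissue, and GG ≥2 versus GG 1 plus non-cancerous tissue. A minimal Python sketch, assuming per-tile grade groups coded as integers 0 (non-cancerous) to 5; the function name `sensitivity_specificity` and the toy labels are hypothetical and are not the study's data or code.

```python
def sensitivity_specificity(reference, predicted, threshold):
    """Sensitivity and specificity after dichotomising grade groups.

    A tile is 'positive' when its grade group is >= threshold, with 0 meaning
    non-cancerous: threshold=1 is cancer vs non-cancer, threshold=2 is
    GG >= 2 vs GG 1 plus non-cancerous tissue.
    """
    if len(reference) != len(predicted):
        raise ValueError("label lists must have equal length")
    tp = fn = tn = fp = 0
    for ref, pred in zip(reference, predicted):
        ref_pos, pred_pos = ref >= threshold, pred >= threshold
        if ref_pos and pred_pos:
            tp += 1          # both call the tile positive
        elif ref_pos:
            fn += 1          # reference positive, model missed it
        elif pred_pos:
            fp += 1          # model positive, reference negative
        else:
            tn += 1          # both call the tile negative
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# Toy grade groups for illustration only (not data from the study).
reference = [0, 0, 1, 1, 2, 3, 4, 5]
ai = [0, 1, 1, 1, 2, 3, 5, 5]

print(sensitivity_specificity(reference, ai, threshold=1))  # (1.0, 0.5)
print(sensitivity_specificity(reference, ai, threshold=2))  # (1.0, 1.0)
```

Raising the threshold from 1 to 2 moves GG 1 tiles into the negative class, which is why the same predictions can yield different sensitivity and specificity at the two cut-offs.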