Detection of disease-specific signatures in B cell repertoires of lymphomas using machine learning.

Published: 2024 Jul 02

Authors: Paul Schmidt-Barbo, Gabriel Kalweit, Mehdi Naouar, Lisa Paschold, Edith Willscher, Christoph Schultheiß, Bruno Märkl, Stefan Dirnhofer, Alexandar Tzankov, Mascha Binder, Maria Kalweit

Source: BIOMEDICINE & PHARMACOTHERAPY

Abstract:

The classification of B cell lymphomas, which is based mainly on light microscopy evaluation by a pathologist, requires many years of training. Since the B cell receptor (BCR) of the lymphoma clonotype and the microenvironmental immune architecture are important features discriminating different lymphoma subsets, we asked whether BCR repertoire next-generation sequencing (NGS) of lymphoma-infiltrated tissues, in conjunction with machine learning algorithms, could have diagnostic utility in the subclassification of these cancers. We trained a random forest and a linear classifier via logistic regression based on patterns of clonal distribution, VDJ gene usage and physico-chemical properties of the top-n most frequently represented clonotypes in the BCR repertoires of 620 paradigmatic lymphoma samples, namely nodular lymphocyte predominant B cell lymphoma (NLPBL), diffuse large B cell lymphoma (DLBCL) and chronic lymphocytic leukemia (CLL), alongside 291 control samples. For DLBCL and CLL, the models performed best when only the most prevalent clonotype was used for classification, whereas in NLPBL, which has a dominant background of non-malignant bystander cells, a broader array of clonotypes improved model accuracy. Surprisingly, the straightforward logistic regression model performed best in this seemingly complex classification problem, suggesting linear separability in our chosen dimensions. It achieved a weighted F1-score of 0.84 on a test cohort including 125 samples from all three lymphoma entities and 58 samples from healthy individuals. Together, we provide proof-of-concept that at least the three studied lymphoma entities can be differentiated from each other by a trained machine learning model using BCR repertoire NGS on lymphoma-infiltrated tissues.

Copyright: © 2024 Schmidt-Barbo et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
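
The abstract reports model performance as a weighted F1-score. As a minimal sketch of how that metric is commonly defined (per-class F1 averaged with weights proportional to each class's number of true samples), the following pure-Python function may help readers interpret the 0.84 figure. The function name `weighted_f1` and the toy labels are illustrative assumptions. They are not the authors' code or data, and the abstract does not state which implementation was used.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores (illustrative sketch)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    support = Counter(y_true)  # number of true samples per class
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        # True positives: samples of this class that were predicted as this class.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = sum(1 for p in y_pred if p == label)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        # Each class contributes in proportion to its share of the true labels.
        score += (n_true / total) * f1
    return score


# Toy labels (hypothetical, for illustration only).
y_true = ["CLL", "CLL", "DLBCL", "DLBCL", "NLPBL", "control", "control", "control"]
y_pred = ["CLL", "CLL", "DLBCL", "NLPBL", "NLPBL", "control", "control", "DLBCL"]
print(round(weighted_f1(y_true, y_pred), 3))
```

Because the weights follow class frequencies, larger classes dominate the score; with imbalanced cohorts it can differ noticeably from an unweighted (macro) average.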