Utilizing machine learning for early screening of thyroid nodules: a dual-center cross-sectional study in China.
Publication date: 2024
Authors:
Shuwei Weng, Chen Ding, Die Hu, Jin Chen, Yang Liu, Wenwu Liu, Yang Chen, Xin Guo, Chenghui Cao, Yuting Yi, Yanyi Yang, Daoquan Peng
Source:
Disease Models & Mechanisms
Abstract:
Thyroid nodules are increasingly prevalent worldwide and carry a risk of malignant transformation. Early screening is crucial for management, yet current models focus mainly on ultrasound features. This study explores machine learning for screening using demographic and biochemical indicators. Analyzing data from 6,102 individuals and 61 variables, we identified 17 key variables and used them to build models with six machine learning classifiers: Logistic Regression, SVM, Multilayer Perceptron, Random Forest, XGBoost, and LightGBM. Performance was evaluated by accuracy, precision, recall, F1 score, specificity, kappa statistic, and AUC, with internal and external validation used to assess generalizability. Shapley values were used to determine feature importance, and Decision Curve Analysis to evaluate clinical benefit. Random Forest showed the highest internal validation accuracy (78.3%) and AUC (89.1%). LightGBM demonstrated robust external validation performance. Key predictors included age, gender, and urinary iodine levels. Clinical benefit was observed across various risk thresholds, particularly for the ensemble models. Machine learning, particularly ensemble methods, can accurately predict the presence of thyroid nodules using demographic and biochemical data. This cost-effective strategy offers valuable insights for thyroid health management, aiding early detection and potentially improving clinical outcomes. These findings enhance our understanding of the key predictors of thyroid nodules and underscore the potential of machine learning in public health applications for early disease screening and prevention.
Copyright © 2024 Weng, Ding, Hu, Chen, Liu, Liu, Chen, Guo, Cao, Yi, Yang and Peng.
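For readers unfamiliar with the evaluation measures named in the abstract, the following is a minimal illustrative sketch of how the threshold-based metrics (accuracy, precision, recall, specificity, F1, Cohen's kappa) and the net benefit used in Decision Curve Analysis are commonly computed. It is not the authors' code: the names `Confusion`, `screening_metrics`, and `net_benefit` are hypothetical, the numbers in the usage note are invented toy values rather than study data, and the formulas are the standard textbook definitions, which the study is assumed (not confirmed) to follow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Binary confusion-matrix counts; 'positive' means nodule present."""
    tp: int
    fp: int
    tn: int
    fn: int

    @property
    def n(self) -> int:
        return self.tp + self.fp + self.tn + self.fn


def screening_metrics(c: Confusion) -> dict:
    """Standard threshold-based metrics.

    Assumes every denominator is non-zero (both classes occur and the
    model predicts both classes); no guarding is done in this sketch.
    """
    accuracy = (c.tp + c.tn) / c.n
    precision = c.tp / (c.tp + c.fp)
    recall = c.tp / (c.tp + c.fn)          # sensitivity
    specificity = c.tn / (c.tn + c.fp)
    f1 = 2 * precision * recall / (precision + recall)

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_yes = ((c.tp + c.fp) / c.n) * ((c.tp + c.fn) / c.n)
    p_no = ((c.tn + c.fn) / c.n) * ((c.tn + c.fp) / c.n)
    p_chance = p_yes + p_no
    kappa = (accuracy - p_chance) / (1 - p_chance)

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "kappa": kappa,
    }


def net_benefit(y_true, risk, threshold: float) -> float:
    """Net benefit at one risk threshold, as used in decision curves.

    A person is flagged when predicted risk >= threshold. False positives
    are weighted by the odds of the threshold: pt / (1 - pt).
    Requires 0 < threshold < 1.
    """
    n = len(y_true)
    tp = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 1)
    fp = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)
```

As a toy check, `screening_metrics(Confusion(tp=40, fp=10, tn=40, fn=10))` gives 0.8 for accuracy, precision, recall, specificity, and F1, and a kappa of 0.6. A decision curve is obtained by evaluating `net_benefit` over a grid of thresholds and comparing the result with the "screen everyone" and "screen no one" strategies.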