A comprehensive evaluation framework for benchmarking multi-objective feature selection in omics-based biomarker discovery.
Published: 2024 Oct 14
Authors:
Luca Cattelani, Arindam Ghosh, Teemu Rintala, Vittorio Fortino
Source:
Ieee Acm T Comput Bi
Abstract:
Machine learning algorithms have been used extensively to classify cancer subtypes accurately from gene expression-based biomarkers. However, biomarker models that combine multiple gene expression signatures are often not reproducible on external validation datasets, and their feature set size is often not optimized, which jeopardizes their translation into cost-effective clinical tools. We investigated how to solve the multi-objective problem of finding the best trade-offs between classification performance and feature set size by applying seven algorithms for machine learning-driven feature subset selection, and we analysed how they perform in a benchmark of eight large-scale cancer transcriptome datasets covering both training and external validation sets. The benchmark includes evaluation metrics that assess individual biomarkers and solution sets according to accuracy, diversity, and the stability of the constituent genes. Moreover, we propose a new evaluation metric for cross-validation studies that generalizes the hypervolume, a measure commonly used to assess the performance of multi-objective optimization algorithms. We obtained biomarkers with a balanced accuracy of 0.8 on the external dataset for breast, kidney, and ovarian cancer using 4, 2, and 7 features, respectively. Genetic algorithms often outperformed the other algorithms considered, and the recently proposed NSGA2-CH and NSGA2-CHS were the best-performing methods in most cases.
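To make the trade-off between classification performance and feature set size concrete, the following is a minimal sketch of the standard two-objective hypervolume, not the generalized cross-validation metric proposed in the paper. The function names, the reference point, and the candidate values are all hypothetical and chosen only for illustration; they are not taken from the paper's results.

```python
from typing import List, Tuple

# A candidate biomarker: (number of features, balanced accuracy).
Solution = Tuple[int, float]


def pareto_front(solutions: List[Solution]) -> List[Solution]:
    """Return the non-dominated solutions, sorted by increasing size.

    Fewer features and higher balanced accuracy are both preferred.
    """
    front: List[Solution] = []
    best_acc = float("-inf")
    # Sort by size ascending, then accuracy descending, and sweep:
    # a solution is kept only if it beats every smaller-or-equal one.
    for size, acc in sorted(set(solutions), key=lambda s: (s[0], -s[1])):
        if acc > best_acc:
            front.append((size, acc))
            best_acc = acc
    return front


def hypervolume(solutions: List[Solution], ref_size: float,
                ref_acc: float = 0.0) -> float:
    """Area dominated by the front relative to the reference point.

    The reference point (ref_size, ref_acc) is an assumed worst case:
    the largest tolerated set size and the lowest accuracy of interest.
    """
    usable = [(s, a) for s, a in solutions if s <= ref_size and a >= ref_acc]
    front = pareto_front(usable)
    area = 0.0
    for i, (size, acc) in enumerate(front):
        # Each front point contributes a rectangle reaching to the next
        # larger front point (or to the reference size for the last one).
        next_size = front[i + 1][0] if i + 1 < len(front) else ref_size
        area += (next_size - size) * (acc - ref_acc)
    return area


# Hypothetical candidates, for illustration only.
candidates = [(2, 0.70), (4, 0.80), (7, 0.82), (5, 0.75), (4, 0.78)]
front = pareto_front(candidates)
hv = hypervolume(candidates, ref_size=10)
```

A larger hypervolume means the solution set covers more of the objective space, so it rewards sets that are both accurate and small. In this sketch the set size is treated as a continuous axis, which is a simplifying assumption.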