Raman spectroscopy combined with convolutional neural network for the sub-types classification of breast cancer and critical feature visualization.

Publication date: 2024 Aug 03

Authors: Juan Li, Xiaoting Wang, Shungeng Min, Jingjing Xia, Jinyao Li

Source: Comput Meth Prog Bio

Abstract:

Raman spectroscopy has emerged as an effective technique for noninvasive breast cancer analysis. However, current Raman prediction models do not cover all molecular sub-types of breast cancer and lack model visualization. In this study, Raman spectroscopy was combined with a convolutional neural network (CNN) to construct a prediction model for the known molecular sub-types of breast cancer, and critical peaks were selected through visualization strategies in order to mine specific biomarker information. Because the CNN has multiple parameters, the sparrow search algorithm (SSA) was used to optimize the network parameters and improve the prediction performance of the model. To guard against chance results, multiple data sets were generated by Monte Carlo sampling and used to train the model, thereby improving the credibility of the results. Once the model predicted accurately, the spectral regions that contributed to the classification were visualized using Gradient-weighted Class Activation Mapping (Grad-CAM), so that the characteristic peaks could be visualized. Compared with other algorithms, the optimized CNN obtained the highest accuracy and the lowest standard error. There was no significant difference between using the full spectra and the fingerprint region (within 2%), indicating that the fingerprint region contributed most to sub-type classification. Based on the classification results from the fingerprint region, model performance across the sub-types ranked as follows: CNN (95.34% ± 2.18%) > SVM (94.90% ± 1.88%) > PLS-DA (94.52% ± 2.22%) > KNN (80.00% ± 5.27%). The critical features visualized by Grad-CAM matched well with immunohistochemistry (IHC) information, allowing a more distinct differentiation of sub-types in their spatial positions. Raman spectroscopy combined with a CNN could achieve accurate and rapid identification of breast cancer molecular sub-types. The proposed visualization strategy was supported by both biochemical information and spatial location, suggesting that it might be used for biomarker mining in the future.

Copyright © 2024 Elsevier B.V. All rights reserved.
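
The abstract reports each accuracy as a mean with a spread over repeated Monte Carlo resampling. The sketch below illustrates that evaluation pattern only. It is not the authors' pipeline: the spectra are synthetic, a simple nearest-centroid classifier stands in for the CNN, and every function name, peak position, and parameter value is a hypothetical choice made for illustration.

```python
import math
import random
import statistics


def make_synthetic_spectra(n_per_class=40, n_points=50, seed=0):
    """Build toy 'spectra': each class has one Gaussian peak at a different position."""
    rng = random.Random(seed)
    peak_centers = [10, 20, 30, 40]  # hypothetical peak positions, one per class
    spectra, labels = [], []
    for label, center in enumerate(peak_centers):
        for _ in range(n_per_class):
            spectrum = [
                math.exp(-((i - center) ** 2) / 8.0) + rng.gauss(0.0, 0.3)
                for i in range(n_points)
            ]
            spectra.append(spectrum)
            labels.append(label)
    return spectra, labels


def fit_centroids(X, y):
    """Compute the mean spectrum of each class (stand-in for model training)."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Assign the class whose mean spectrum is closest in squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(centroids[label], x))


def monte_carlo_accuracy(X, y, n_repeats=20, test_fraction=0.3, seed=1):
    """Repeat random train/test splits and return (mean, std, per-split accuracies)."""
    rng = random.Random(seed)
    indices = list(range(len(X)))
    n_test = int(len(X) * test_fraction)
    accuracies = []
    for _ in range(n_repeats):
        rng.shuffle(indices)  # a fresh random partition on every repeat
        test_idx, train_idx = indices[:n_test], indices[n_test:]
        centroids = fit_centroids([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(centroids, X[i]) == y[i] for i in test_idx)
        accuracies.append(correct / n_test)
    return statistics.mean(accuracies), statistics.stdev(accuracies), accuracies


if __name__ == "__main__":
    X, y = make_synthetic_spectra()
    mean_acc, std_acc, _ = monte_carlo_accuracy(X, y)
    print(f"accuracy: {100 * mean_acc:.2f}% ± {100 * std_acc:.2f}%")
```

Averaging over many random partitions, rather than relying on one split, is what lets a result be stated as a mean with a spread. Note that the sketch reports the sample standard deviation across splits; the abstract does not say exactly how its ± values were computed.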