Deep reinforcement learning control of combined chemotherapy and anti-angiogenic drug delivery for cancerous tumor treatment

COMPUTERS IN BIOLOGY AND MEDICINE

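Impact factor: 6.3

Journal zone: Medicine, Zone 2 / Mathematical and Computational Biology, Zone 1; Biology, Zone 2; Computer Science, Interdisciplinary Applications, Zone 2; Engineering, Biomedical, Zone 2

Publication date: October 2024

Authors: Vahid Reza Niazmand, Mohammad Ali Raheb, Navid Eqra, Ramin Vatankhah, Amirmohammad Farrokhi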

DOI: 10.1016/j.compbiomed.2024.109041

Abstract

Because cancer is a chronic and dangerous disease, researchers have explored many novel treatment methods for managing the abnormal cell growth associated with it. This study introduces a control system based on normalized advantage function (NAF) reinforcement learning that aims to boost the body's immune response against cancer cell proliferation. For the first time, this control approach delivers a combination of chemotherapy and anti-angiogenic drugs without requiring a complex, predefined mathematical model. It employs a model-free reinforcement learning technique that adapts to individual patients and determines an optimal drug administration schedule with minimal injection rates. A comprehensive and realistic simulation and training environment is used, whose state variables are the concentrations of normal cells, cancer cells, and endothelial cells, together with the levels of the chemotherapy and anti-angiogenic agents. High levels of disturbance are included in the simulation to investigate the robustness of the proposed method against likely uncertainties in the treatment process or patient parameters. A practical reward function, aligned with medical objectives, is designed to ensure effective and safe treatment outcomes. The results demonstrate robustness and superior performance compared with existing methods. Simulations indicate that the proposed approach is a dependable strategy for reducing the concentration of cancer cells in the shortest time using minimal doses of chemotherapy and anti-angiogenic drugs.
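
For readers unfamiliar with the normalized advantage function, the sketch below shows how its Q-value is usually assembled in the standard formulation: a state value plus a quadratic advantage term that peaks at the greedy action. The abstract gives no network architecture, action dimension, or numerical values, so everything here is an assumption for illustration only: the function name `naf_q_value`, the two-dimensional action (a chemotherapy rate and an anti-angiogenic rate), and all numbers are hypothetical, and in a trained agent `mu`, `l_entries`, and `value` would come from neural-network heads rather than constants.

```python
import numpy as np


def naf_q_value(action, mu, l_entries, value):
    """Standard NAF form: Q(s, a) = V(s) - 0.5 * (a - mu)^T P (a - mu).

    P = L @ L.T, where L is lower triangular with a strictly positive
    diagonal, so P is positive definite and Q is maximised at a = mu.
    """
    n = mu.shape[0]
    L = np.zeros((n, n))
    # Fill the lower triangle row by row from the flat parameter vector.
    L[np.tril_indices(n)] = l_entries
    # Exponentiate the diagonal to keep it strictly positive.
    L[np.diag_indices(n)] = np.exp(np.diag(L))
    P = L @ L.T
    diff = action - mu
    advantage = -0.5 * diff @ P @ diff
    return value + advantage


# Hypothetical outputs for one state (not taken from the paper).
mu = np.array([0.3, 0.1])              # greedy action: [chemo rate, anti-angiogenic rate]
l_entries = np.array([0.0, 0.5, 0.0])  # flat lower-triangular entries of L
value = 2.0                            # state value V(s)

q_greedy = naf_q_value(mu, mu, l_entries, value)
q_other = naf_q_value(np.array([0.5, 0.1]), mu, l_entries, value)
print(q_greedy, q_other)
```

Because the advantage term is never positive, the greedy action can be read directly from `mu` without a separate optimisation over the continuous action space, which is the usual reason this formulation is chosen for continuous dosing problems.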