A guide to large data sets for population-based cancer research: Strengths, limitations, and pitfalls.
Published: 2024 Aug 19
Authors:
Allison N Martin, Norine W Chan, Dillon C Cheung, Zhi Ven Fong
Source:
CANCER
Abstract:
With the proliferation of cancer research based on large databases, misalignment of research questions and data set capabilities is inevitable. Nationally maintained databases are appealing to cancer researchers because of the ease of access to large amounts of patient data available for analysis and risk estimation. Data sets that are commonly used in cancer research include the National Cancer Database, the SEER (Surveillance, Epidemiology, and End Results) program of the National Cancer Institute, the SEER-Medicare database, the American College of Surgeons National Surgical Quality Improvement Program, and the Healthcare Cost and Utilization Project databases, among others. Each data set has pros and cons with respect to variable availability and the ability to analyze cancer-specific outcomes. It is critical for researchers to understand the strengths and limitations of each database. Changing variable definitions, the length of postoperative data collection, and the availability of patient-reported outcomes or social determinants of health data are examples of factors that researchers must consider when selecting a data set for research purposes. For the current review, the authors summarized the advantages and disadvantages of various national data sets for cohort studies in cancer populations. © 2024 American Cancer Society.