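A Study on the Cleaning of Real-World Medical Insurance Data: Taking the "Cost of Medicines Included in the Payment Scope" of National Negotiated Drugs as an Example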

刘雨欣, 杨莹, 侯宜坦, 宋世鸿, 付熙媛, 罗毅, 毛宗福, 左后娟

China Health Insurance (中国医疗保险), 2025, Issue 7: 23-31. DOI: 10.19546/j.issn.1674-3830.2025.7.003
Observation and Reflection

Abstract

Objective: To address quality problems in real-world medical insurance data, this study constructs a systematic data cleaning strategy and evaluates its effectiveness, using statistics on the "cost of medicines included in the payment scope" of national negotiated drugs as an example. Methods: The study used the raw medical insurance settlement records of City A for January to September 2024 covering antineoplastic drugs and immunomodulators admitted through national medical insurance negotiation. The data cleaning strategy comprised three components: building a verification dataset, structuring free text, and handling outliers. With total drug cost and the cost of medicines included in the payment scope as effect indicators, the study compared the indicators before and after cleaning across different populations and medication scenarios. Results: After cleaning, total drug cost remained highly stable (deviation < 0.3%), whereas the cost of medicines included in the payment scope fell markedly, by about 2 million yuan in absolute terms, a decrease of about 5%. The size of the deviation differed across populations (enrollees in employee insurance versus urban and rural resident insurance) and across medication scenarios (outpatient, inpatient, and retail pharmacy; local versus non-local). Conclusion: The study verifies that the data cleaning strategy is effective in improving the quality of medical insurance data and supports the scientific use of real-world medical insurance data. It also shows that data quality problems are distributed heterogeneously, so data governance should attend to quality differences across populations and medication scenarios. Going forward, medical insurance information systems need further improvement and cross-platform data governance needs strengthening to address the systemic challenges of medical insurance data governance.
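The abstract names three cleaning components (a verification dataset, text structuring, outlier handling) and a before/after comparison of cost indicators, but gives no implementation details. The following is a minimal Python sketch of how such a pipeline could look, not the authors' actual method. The field names (`drug_code`, `drug_name`, `total_cost`, `covered_cost`), the `REFERENCE` table, the sample records, and the outlier rule (covered cost must lie between zero and the total cost) are all hypothetical assumptions made for illustration.

```python
import re

# Hypothetical verification dataset: drug code -> standardized drug name.
REFERENCE = {"D001": "drug a", "D002": "drug b"}


def normalize_name(raw):
    """Text structuring: drop bracketed notes, collapse whitespace, lowercase."""
    text = re.sub(r"[\((].*?[\))]", "", raw)
    return re.sub(r"\s+", " ", text).strip().lower()


def clean_record(rec):
    """Return a cleaned copy of one record, or None if it fails verification."""
    ref_name = REFERENCE.get(rec["drug_code"])
    if ref_name is None or normalize_name(rec["drug_name"]) != ref_name:
        return None  # not verifiable against the reference table
    # Assumed outlier rule: covered cost is clamped to the range [0, total cost].
    covered = min(max(rec["covered_cost"], 0.0), rec["total_cost"])
    return {**rec, "drug_name": ref_name, "covered_cost": covered}


def relative_change(before, after):
    """Percentage change of an indicator after cleaning."""
    return (after - before) / before * 100.0


# Made-up sample records, for illustration only.
raw = [
    {"drug_code": "D001", "drug_name": " Drug A (injection)",
     "total_cost": 1000.0, "covered_cost": 1000.0},
    {"drug_code": "D001", "drug_name": "drug a",
     "total_cost": 500.0, "covered_cost": 650.0},   # covered > total
    {"drug_code": "D002", "drug_name": "Drug  B",
     "total_cost": 250.0, "covered_cost": -20.0},   # negative covered cost
    {"drug_code": "D999", "drug_name": "unknown",
     "total_cost": 10.0, "covered_cost": 10.0},     # not in reference table
]

cleaned = [c for c in (clean_record(r) for r in raw) if c is not None]

for field in ("total_cost", "covered_cost"):
    before = sum(r[field] for r in raw)
    after = sum(r[field] for r in cleaned)
    print(f"{field}: {before:.1f} -> {after:.1f} "
          f"({relative_change(before, after):+.2f}%)")
```

Grouping the same sums by insurance type or care setting before calling `relative_change` would give the kind of subgroup comparison the abstract describes. The figures printed here come from the made-up sample and have no relation to the study's results.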

Key words

data cleaning / real-world medical insurance data / cost of medicines included in the payment scope

Cite this article

刘雨欣, 杨莹, 侯宜坦, 宋世鸿, 付熙媛, 罗毅, 毛宗福, 左后娟. A Study on the Cleaning of Real-World Medical Insurance Data: Taking the "Cost of Medicines Included in the Payment Scope" of National Negotiated Drugs as an Example[J]. China Health Insurance, 2025(7): 23-31. https://doi.org/10.19546/j.issn.1674-3830.2025.7.003
Chinese Library Classification (CLC) number: F840.684; C913.7

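Funding

National Natural Science Foundation of China project "Economic and health effects of multi-tiered outpatient medication coverage for chronic diseases from the perspective of strategic purchasing by medical insurance" (72404098); China Postdoctoral Science Foundation project "Value-oriented outpatient medication coverage for chronic diseases and health outcomes of older adults: mechanisms and policy optimization" (2024M761028); Hubei Provincial Postdoctoral Innovative Talent Training Program (2024HBBHCXB019)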
