A Study on the Cleaning of Real-World Medical Insurance Data: Taking the “Cost of Medicines Included in the Payment Scope” of National Negotiated Drugs as an Example

China Health Insurance, 2025, Vol. 0, Issue (7): 23-31. DOI: 10.19546/j.issn.1674-3830.2025.7.003
Observation & Discussion

Abstract

Objective: To address quality problems in real-world medical insurance data, this study constructs a systematic data cleaning strategy and evaluates its effectiveness, using statistics on the "cost of medicines included in the payment scope" of national negotiated drugs as an example. Methods: The research object is the original medical insurance settlement records database for anti-tumor drugs and immunomodulators admitted through national medical insurance negotiation, covering City A from January to September 2024. The cleaning strategy comprises three steps: establishing a verification dataset, structuring text fields, and handling outliers. With the total cost of medicines and the cost of medicines included in the payment scope as effect indicators, the study compares both indicators before and after cleaning across populations and medication scenarios. Results: After cleaning, the total cost of medicines remained highly stable (deviation < 0.3%), while the cost of medicines included in the payment scope decreased markedly, by approximately 2 million yuan, or about 5%. The size of the deviation differed across populations (employee versus urban and rural resident medical insurance) and across medication scenarios (outpatient, inpatient, and retail pharmacy; local versus non-local medical treatment). Conclusion: The study verifies that the data cleaning strategy improves the quality of medical insurance data and supports the scientific use of real-world medical insurance data. It also shows that data quality problems are distributed unevenly, so medical insurance data governance should attend to quality differences across populations and medication scenarios. Going forward, the medical insurance information system needs further improvement and cross-platform data governance needs strengthening to meet the systemic challenges of medical insurance data governance.
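The before-and-after comparison described in the abstract can be sketched as a relative deviation computed per group. This is a minimal illustration only, not the authors' implementation: the record layout, the field names (`in_scope_cost_raw`, `in_scope_cost_clean`, `scenario`), and the toy values are all hypothetical.

```python
from collections import defaultdict


def relative_deviation(before, after):
    """Return the change from `before` to `after` as a percentage of `before`."""
    if before == 0:
        raise ValueError("baseline total is zero; relative deviation is undefined")
    return (after - before) / before * 100.0


def deviation_by_group(records, key, field):
    """Sum an indicator before and after cleaning within each group.

    Each record is a dict holding the grouping value under `key` and the
    indicator under f"{field}_raw" (before cleaning) and f"{field}_clean"
    (after cleaning). These names are assumptions made for this sketch.
    """
    totals = defaultdict(lambda: [0.0, 0.0])
    for record in records:
        totals[record[key]][0] += record[f"{field}_raw"]
        totals[record[key]][1] += record[f"{field}_clean"]
    return {
        group: relative_deviation(before, after)
        for group, (before, after) in totals.items()
    }


# Hypothetical toy records, not data from the study.
records = [
    {"scenario": "outpatient", "in_scope_cost_raw": 1000.0, "in_scope_cost_clean": 950.0},
    {"scenario": "outpatient", "in_scope_cost_raw": 1000.0, "in_scope_cost_clean": 1000.0},
    {"scenario": "inpatient", "in_scope_cost_raw": 500.0, "in_scope_cost_clean": 450.0},
]

result = deviation_by_group(records, "scenario", "in_scope_cost")
for scenario, deviation in sorted(result.items()):
    print(f"{scenario}: {deviation:.2f}%")
```

Grouping by a different key (for example, a hypothetical insurance-type or local/non-local field) gives the other comparison dimensions the abstract mentions. As a rough consistency check on the reported figures, a reduction of about 2 million yuan corresponding to about 5% implies a pre-cleaning in-scope cost on the order of 40 million yuan.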

Key words

data cleaning / real-world medical insurance data / cost of medicines included in the payment scope

Cite this article

A Study on the Cleaning of Real-World Medical Insurance Data: Taking the “Cost of Medicines Included in the Payment Scope” of National Negotiated Drugs as an Example[J]. China Health Insurance, 2025, 0(7): 23-31. https://doi.org/10.19546/j.issn.1674-3830.2025.7.003

