Objective: Access to commercial health insurance for people with pre-existing conditions is a growing concern. Taking cardiovascular disease as an example, this study predicts hospitalization costs and analyzes the factors that influence them, in order to inform commercial health insurance strategies for people with pre-existing conditions. Methods: Six mainstream machine learning algorithms were evaluated, and their hyperparameters were tuned with Bayesian optimization to improve the accuracy of hospitalization cost prediction for patients with cardiovascular disease. The study then analyzed how customer characteristics, disease characteristics, and geographical attributes relate to hospitalization costs. Results: The gradient boosting decision tree-Gaussian process (GBDF-GP) model performed best in predicting hospitalization costs for patients with cardiovascular disease. Customer attributes were the main influence on cost, followed by disease attributes; geographical factors had a significant effect only for patients aged 91 to 100. Insurance companies are advised to apply the GBDF-GP model to broaden coverage and to carry out refined risk assessment for different customer groups. The results also provide an empirical basis for innovation in insurance products and diversification of the insurance market.
Key words
cardiovascular disease /
hospitalization expenses /
machine learning /
insurance participation of people with pre-existing conditions /
commercial health insurance