A Study on the Performance of ChatGPT in the Simulated Examination for Clinical Practitioner Qualification in China
Abstract

Objective: To evaluate the performance of the chat generative pre-trained transformer (ChatGPT) on a simulated Chinese clinical practitioner licensing examination, and to explore its strengths and limitations as a reference for medical education and knowledge assessment.

Methods: The study was conducted from July 1 to September 1, 2023. ChatGPT's answering performance was evaluated on a set of simulated multiple-choice questions for the Chinese clinical practitioner licensing examination, covering multiple question types and specialties. All questions were drawn from a test-preparation question bank commonly used by medical students and were designed to match the style, content, and difficulty of the Chinese medical licensing examination. The 300 multiple-choice questions were grouped by question type and specialty, and further subdivided into higher-order and lower-order thinking questions. Performance was assessed by answer accuracy.

Results: Across all questions, ChatGPT's answer accuracy was 70.3%. Its accuracy on lower-order thinking questions (78.3%) was higher than on higher-order thinking questions (66.0%), and the difference was statistically significant (P < 0.05). Accuracy on clinical-medicine and non-clinical-medicine questions was 71.0% and 68.7%, respectively, and the difference was not statistically significant (P > 0.05). Across the four question types, accuracy was 69.1%, 64.3%, 73.9%, and 70.8%, respectively, and the differences were not statistically significant (P > 0.05). Even when incorrect, ChatGPT consistently used confident language (100%).

Conclusion: ChatGPT can achieve the goal of passing a simulated Chinese clinical practitioner licensing examination, which indicates its great potential in medical education and medical practice. However, its limitations must also be recognized, such as its confident phrasing even when its answers are inaccurate.
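The lower-order vs. higher-order comparison reported in the Results can be sketched as a standard 2×2 chi-square test of correct/incorrect counts per group. The abstract does not state the per-group item counts, so the figures below (106 lower-order and 194 higher-order questions) are illustrative reconstructions chosen to be consistent with the reported percentages (78.3%, 66.0%, and 211/300 ≈ 70.3% overall), not values taken from the paper.

```python
# Hedged sketch of the group comparison in the abstract: a Pearson
# chi-square test on a 2x2 contingency table (group x correct/incorrect).
# The counts are assumed reconstructions consistent with the reported
# percentages; the paper does not state them explicitly.

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (df = 1, no continuity correction)
    for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator

# Rows: lower-order, higher-order; columns: correct, incorrect.
correct_low, wrong_low = 83, 23      # 83/106  = 78.3% (assumed counts)
correct_high, wrong_high = 128, 66   # 128/194 = 66.0% (assumed counts)

chi2 = chi_square_2x2(correct_low, wrong_low, correct_high, wrong_high)

# The critical value for df = 1 at alpha = 0.05 is 3.841, so
# chi2 > 3.841 corresponds to the abstract's P < 0.05.
print(f"chi2 = {chi2:.3f}, significant at 0.05 = {chi2 > 3.841}")
```

With these reconstructed counts the statistic comes out near 5, above the 3.841 cutoff, matching the reported P < 0.05 for the lower- vs. higher-order contrast; the clinical vs. non-clinical contrast (71.0% vs. 68.7%) would fall well below the cutoff, matching its reported P > 0.05.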
Authors: ZHANG Li, ZHANG Xue, ZHOU Haiyan, WEN Xin, JIANG Jiuming, LI Jianwei, LI Tantan, LI Meng (Department of Diagnostic Radiology, Department of Education, and Department of Radiation Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100021, China)
Source: China Continuing Medical Education (《中国继续医学教育》), 2024, No. 15, pp. 157-162 (6 pages).
Keywords: artificial intelligence; natural language processing; ChatGPT; Chinese practicing physician licensing examination; continuing education; medicine