Journal of Psychological Science (心理科学), 2024, Vol. 47, Issue (2): 485-493. DOI: 10.16719/j.cnki.1671-6981.20240227

• Clinical and Counseling Psychology •

Thematic Apperception Test for Suicide Risk Identification: An Audio and Text-Based Machine Learning Study*

Yang Jinying1, Wu Wen1,2, Li Shijia1, Zhang Ya**1

  1. Shanghai Key Laboratory of Mental Health and Psychological Crisis Intervention, School of Psychology and Cognitive Science, East China Normal University, Shanghai, 200062;
  2. School of Computer Science and Technology, East China Normal University, Shanghai, 200062
  • Publication date: 2024-03-20; Release date: 2024-02-29
  • Corresponding author: **Zhang Ya, E-mail: yzhang@psy.ecnu.edu.cn
  • Funding:
    *This research was supported by the Young Scientists Fund of the National Natural Science Foundation of China (31900767), "The neural basis of the client-counselor relationship in psychological counseling: a study based on brain synchrony between clients and counselors"; the Shanghai Science and Technology Program (20dz2260300); and the Fundamental Research Funds for the Central Universities.

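Abstract (Chinese version): Suicide risk identification is an important part of suicide prevention, but traditional screening with self-report scales is limited by high rates of false reports and missed cases. Through two consecutive experiments, this study adapted the Thematic Apperception Test (TAT) into a self-administered protocol delivered through a mini program, and collected audio and text data for machine learning modeling to build a suicide risk identification model targeting suicidal ideation. The results showed that, with a shorter test time, the model achieved a better composite index than models in previous studies. Word frequency analysis and keyword co-occurrence network analysis showed that subjects in the high suicide risk group mentioned more words and themes related to suicide and self-injury in their narratives, and used more exclusion words. The adapted TAT mini program protocol is standardized and convenient to administer; more high-quality samples can subsequently be collected to build a model with better generalization performance for use in the auxiliary assessment of suicide risk.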

Keywords: suicide risk identification, Thematic Apperception Test, machine learning, speech recognition, text analysis

Abstract: Suicide is not only a personal tragedy; it also has far-reaching effects. Identifying suicide risk is an important part of suicide prevention. Because traditional screening methods based on self-report scales have high rates of false reports and missed cases, it is important to find an objective and effective identification tool.
Although previous studies that built suicide risk identification models from audio data have yielded good results, the test materials they used lacked theoretical support and were time-consuming to administer. In addition, the lack of a standardized procedure made it difficult to collect enough data to train a model suitable for practical use. This study therefore adapts the widely used Thematic Apperception Test (TAT) in two steps: first, adapting the test materials for online administration and building a model; second, developing a WeChat mini program to collect high-quality audio data through a standardized procedure and build a suicide risk model.
Study 1 first adapted the TAT into a standardized procedure for online administration using Tencent Meeting. Audio from the 64 subjects who completed the test (High Risk Group: 34; Low Risk Group: 30) was included in the analysis. After preprocessing, speech and text features were extracted for machine learning modeling, and four classifiers (SVM, LR, RF, KNN) were used to build the models. The results showed the following. (1) Three of the TAT pictures yielded the best-performing classification models. For Picture 5, for example, the LR model achieved an average ACC = .80 and an average AUC = .90. The best-performing classifiers were LR and SVM. (2) Analysis of narrative duration showed that subjects in the crisis (high-risk) group generally produced longer narratives. (3) Word frequency analysis of the full texts using KH Coder found that subjects in the crisis group mentioned more words related to suicide, self-injury, and negative emotions in their narratives, and keyword co-occurrence network analysis found more themes about suicide and self-injury in their narratives. The results of Study 1 confirm the feasibility of administering the TAT online and of using speech and text features to identify suicide risk. However, the test is still time-consuming and requires an experimenter to administer it, so experimenter bias may be present.
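The keyword co-occurrence analysis in finding (3) can be illustrated with a minimal Python sketch. It counts how often pairs of keywords appear in the same narrative, which is the basic idea behind a co-occurrence network. It is not KH Coder's implementation, which may measure edge strength differently. The function name `cooccurrence_counts` is hypothetical, and the token lists and keywords are made-up toy data, not material from the study.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents, keywords):
    """Count how often each pair of keywords appears in the same document.

    documents: list of token lists (one list per narrative).
    keywords:  iterable of words to track.
    Returns a Counter mapping sorted (word_a, word_b) tuples to counts.
    """
    keyword_set = set(keywords)
    pair_counts = Counter()
    for tokens in documents:
        # Keep each tracked keyword at most once per document.
        present = sorted(keyword_set.intersection(tokens))
        # Every unordered pair of present keywords co-occurs once.
        pair_counts.update(combinations(present, 2))
    return pair_counts


# Toy, made-up token lists; not data from the study.
toy_narratives = [
    ["woman", "door", "alone", "sad"],
    ["man", "alone", "sad", "night"],
    ["woman", "door", "family", "dinner"],
]
toy_keywords = ["alone", "sad", "door", "woman"]

edges = cooccurrence_counts(toy_narratives, toy_keywords)
for pair, weight in edges.most_common():
    print(pair, weight)
```

Each counted pair would correspond to an edge in the network, with the count as a raw edge weight. Comparing such networks between two groups is one way to surface themes that appear together more often in one group's narratives.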
To further standardize the procedure, shorten the test, and make it more convenient, and thus improve the applicability of the adapted TAT, we conducted Study 2. In this study, a WeChat mini program was designed and implemented, and two pictures from Study 1 (Picture 5 and Picture 10) were used as test materials, with subjects administering the test themselves. Audio from a total of 58 subjects was included in the analysis (High Risk Group: 29; Low Risk Group: 29). After feature extraction, four classifiers were trained and their performance was evaluated. The LR model trained on features extracted from the combined audio of Picture 5 and Picture 10 achieved the best ACC of all models (mean ACC = .83, mean AUC = .89). These results suggest that models built on audio from a self-administered test can also perform satisfactorily. Despite the shorter test time, the constructed model achieved a better composite index than models in previous studies. The short administration time, ease of administration, and standardized procedure of the adapted TAT mini program should also facilitate the collection of more high-quality samples, which can be used to build a model that generalizes better and can serve as an aid in identifying suicide risk.
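The two metrics reported above, ACC and AUC, can be sketched for a binary high-risk versus low-risk classifier as follows. This is a minimal illustration under stated assumptions, not the study's evaluation code: the labels and scores are made up, the 0.5 decision threshold is an assumption, and the function names `accuracy` and `roc_auc` are hypothetical. The abstract does not say how the reported means were obtained; averaging over cross-validation folds or repeated runs is one common approach.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up labels (1 = high risk, 0 = low risk) and model scores.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
# Assumed decision threshold of 0.5 for turning scores into labels.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

acc = accuracy(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"ACC = {acc:.2f}, AUC = {auc:.2f}")
```

ACC depends on the chosen threshold, while AUC summarizes ranking quality across all thresholds, which is why the two can diverge for the same model.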

Key words: suicide risk identification, thematic apperception test, machine learning, speech recognition, text analysis