Using Demographic Information, Psychological Assessment Data and Machine Learning to Predict Students’ Academic Performance

DING Xin-Fang; ZHE Jing; ZHANG Bin

Journal of Psychological Science, 2021, Vol. 44, Issue 2: 330-339.

Abstract

Tracking college students’ academic performance and identifying students who are likely to fail courses are important for providing early intervention and increasing retention rates. Previous studies have found that many psychological factors are correlated with academic marks, including personality, coping styles, mental health, and academic and social motivational constructs. However, the traditional approach of studying correlated factors often fails to yield an early prediction model, because the mechanisms underlying poor academic performance are generally complicated and the patterns are sometimes implicit. Machine learning detects implicit patterns in large datasets by means of algorithms and statistical models; it can optimize exploratory analysis by providing internal cross-validation and is more robust to outliers. The present study used a machine learning approach, with demographic information and the results of psychological assessments as input, to distinguish students who failed courses in their first year at college from those who did not. Six hundred and fifty-three participants from five universities in northern China were recruited. They completed a demographic information survey, the Symptom Checklist 90, the Rotter Internal-External Locus of Control Scale, the Trait Coping Style Questionnaire and the Big-Five Personality Inventory-10. These questionnaires measured mental health, generalized expectations of internal versus external control, coping styles and personality, respectively. Academic performance information was collected one year later. Low-performing students were defined as those who failed at least one course in their first year at college.
Five machine learning algorithms, Random Forests (RF), K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Naïve Bayes (NB) and Decision Tree (DT), were trained to build a dichotomous classification model to detect low-performing students. The highest F1 score was obtained by the RF algorithm, with accuracy = 99.00%, precision = 95.86%, recall = 91.83% and F1 score = 93.80%. The feature importance analysis revealed that features extracted from both the demographic information and the psychological assessment questionnaires were important in predicting a college student’s academic performance. The 10 most important features in the RF model were age, gender, whether the student is an only child, internal-external locus of control, neuroticism, positive coping, agreeableness, general symptomatic index, openness and anxiety level. To check for overfitting, which occurs when a model fits the peculiarities of the training dataset too closely and fails to find a general predictive rule, a new dataset (n = 166) was collected and used to test the generalization performance of the model. The model generalized well to this new dataset, which was collected one year later, with F1 score = 90.90%, accuracy = 97.84%, precision = 92.60% and recall = 89.26%. The study shows the potential of machine learning approaches for predicting which students are likely to fail courses from demographic and psychological assessment information. The results demonstrate that the RF algorithm can be used effectively to build a classification model that identifies low-performing students, pointing to future applications in which early intervention for such students is possible.
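
The F1 score used above to rank the classifiers is the harmonic mean of precision and recall. As a minimal sketch (not the authors' code), the Python snippet below recomputes F1 from the precision and recall values reported in the abstract, so a reader can see how the three figures relate. The function name `f1_from_precision_recall` and the dictionary labels are illustrative assumptions; only the numeric values come from the abstract.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0  # convention when both are zero, to avoid division by zero
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract, converted from percentages to fractions.
# The dictionary keys are labels chosen for this sketch.
reported = {
    "original sample (RF)": {"precision": 0.9586, "recall": 0.9183, "f1": 0.9380},
    "new dataset (n=166)": {"precision": 0.9260, "recall": 0.8926, "f1": 0.9090},
}

for name, m in reported.items():
    f1 = f1_from_precision_recall(m["precision"], m["recall"])
    print(f"{name}: recomputed F1 = {f1:.4f}, reported F1 = {m['f1']:.4f}")
```

Under this definition, the reported F1 values agree with the reported precision and recall to the rounding shown in the abstract.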

Key words

Academic performance / Machine learning / Prediction / Psychological factors / Classification prediction model

Cite this article

Using Demographic Information, Psychological Assessment Data and Machine Learning to Predict Students’ Academic Performance[J]. Journal of Psychological Science. 2021, 44(2): 330-339