Generalizability Analysis of Teaching Level Evaluation for College Teachers

YU Nie-Jun; LI Guang-Meng; ZHANG Min-Jiang; JIANG Xin; LIANG Zheng-Yan; CHU Xiao-Yi

Journal of Psychological Science, 2016, Vol. 39, Issue 1, pp. 90-96.

Abstract

To improve the quality of college teaching, colleges in China evaluate the teaching level of their teachers every semester. Generally, students are asked to evaluate each teacher of the courses they take. Because each teacher teaches several classes, each teacher receives several separate results, and no consistent conclusion about that teacher can be drawn. Moreover, the results are influenced by multiple factors. For instance, too few students taking part in the evaluation leads to low reliability of the result; students of different majors attach different importance to the indexes; different kinds of courses receive disproportionate ratings; and, because students focus on different issues at different periods, the time point of the evaluation also affects the results. Based on generalizability theory, this study offers a method to solve the problem described above and discusses the factors that affect the results of Teaching Level Evaluation for college teachers. Data were collected with the Teachers' Teaching Level Evaluation scale from 19 courses, including 7 liberal arts courses and 11 science courses. A total of 558 records were collected at the beginning of a semester (in March) and 566 at the end of a semester (in December), covering 5 liberal arts majors, 10 science majors and 4 engineering majors. All data were saved in txt format and analyzed with mGENOVA. According to generalizability theory, an evaluation made by a sufficient number of students is reliable enough to measure an individual teacher's teaching level. Generalizability theory judges the reliability of results with the index of dependability (Φ) instead of the reliability coefficient used in classical test theory. Regarding the number of evaluators needed, the D study shows that reliability increases as the number of student raters increases, and that it is appropriate to have 20 students evaluate each teacher.
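The G-study and D-study logic can be sketched for the simplest univariate case. The sketch below assumes a balanced one-facet design in which each teacher is rated by their own students (students nested within teachers, written r:p) on a single score. The study itself ran a multivariate analysis in mGENOVA across the five indexes, so this is not the authors' procedure, and the function names `variance_components` and `phi` are hypothetical. Variance components are estimated from the one-way random-effects ANOVA expected mean squares, and for this design Φ = σ²_p / (σ²_p + σ²_{r:p} / n'), where n' is the number of student raters per teacher in the D study.

```python
from statistics import fmean


def variance_components(scores):
    """Estimate G-study variance components for a balanced r:p design.

    scores[i] holds the ratings that the n_r students of teacher i gave.
    Returns (var_p, var_rp): the teacher (universe-score) variance and the
    confounded rater-within-teacher plus residual variance.
    """
    n_p = len(scores)
    n_r = len(scores[0])
    if n_p < 2 or n_r < 2:
        raise ValueError("need at least 2 teachers and 2 raters per teacher")
    if any(len(row) != n_r for row in scores):
        raise ValueError("design must be balanced (same number of raters per teacher)")

    teacher_means = [fmean(row) for row in scores]
    grand_mean = fmean(teacher_means)  # equals the overall mean when balanced

    # One-way random-effects ANOVA mean squares.
    ms_p = n_r * sum((m - grand_mean) ** 2 for m in teacher_means) / (n_p - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(scores, teacher_means) for x in row
    ) / (n_p * (n_r - 1))

    # Expected mean squares: E[MS_within] = var_rp, E[MS_p] = var_rp + n_r * var_p.
    var_rp = ms_within
    var_p = max((ms_p - ms_within) / n_r, 0.0)  # truncate negative estimates at 0
    return var_p, var_rp


def phi(var_p, var_rp, n_raters):
    """D-study index of dependability for n_raters students per teacher (r:p)."""
    return var_p / (var_p + var_rp / n_raters)


if __name__ == "__main__":
    # Illustrative (made-up) variance components, not the paper's estimates.
    var_p, var_rp = 0.10, 1.00
    for n in (5, 10, 20, 40):
        print(f"n' = {n:2d}  Phi = {phi(var_p, var_rp, n):.3f}")
```

With these made-up components Φ reduces to n'/(n' + 10), so it rises from about 0.33 with 5 raters to about 0.67 with 20 and 0.80 with 40: reliability keeps increasing with more raters, but each extra student adds less than the one before.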
The study also finds that engineering students, who pay more attention to practical issues, yield higher reliability on the five indexes of the scale than liberal arts or science students do. In addition, students' evaluations of science teachers are more reliable than their evaluations of liberal arts teachers. Finally, the evaluation conducted at the beginning of the semester is more reliable than the one conducted at the end of the semester. The conclusions are as follows: (1) Compared with the result obtained at the end of a semester, the result obtained at the beginning of the next semester has higher reliability. (2) Students of different majors pay different attention to the five indexes, which affects the reliability of the evaluation. (3) Evaluation reliability is higher for science courses than for liberal arts courses. (4) To ensure the reliability of the evaluation, 20 students are needed to evaluate each teacher.
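Conclusion (4) can be restated as a search for the smallest number of student raters whose Φ reaches a chosen target. The sketch below keeps the same simplifying assumption of a univariate, balanced r:p design with Φ = σ²_p / (σ²_p + σ²_{r:p} / n'). The target of 0.80, the function name `min_raters`, and the cap `max_raters` are hypothetical choices for illustration; the abstract does not state which cutoff the study used to settle on 20 students.

```python
def phi(var_p, var_rp, n_raters):
    """Index of dependability for n_raters students per teacher (r:p design)."""
    return var_p / (var_p + var_rp / n_raters)


def min_raters(var_p, var_rp, target=0.80, max_raters=1000):
    """Smallest number of raters whose Phi reaches target (hypothetical helper).

    Returns None when no rater count up to max_raters reaches the target,
    e.g. when teachers do not differ at all (var_p == 0).
    """
    if not 0 < target < 1:
        raise ValueError("target must lie strictly between 0 and 1")
    if var_p <= 0:
        return None  # no number of raters can separate the teachers
    for n in range(1, max_raters + 1):
        if phi(var_p, var_rp, n) >= target:
            return n
    return None


if __name__ == "__main__":
    # Illustrative (made-up) variance components, not the paper's estimates.
    for target in (0.60, 0.70, 0.80):
        print(target, min_raters(0.10, 1.00, target))
```

A simple loop is used instead of solving n' ≥ target · σ²_{r:p} / ((1 − target) · σ²_p) in closed form, because rounding error in that division can push a ceiling operation off by one when the exact answer is a whole number.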