Evaluating Large Language Models in Analyzing Student Sentiments: A Course Feedback Case Study
Michael E. Auer; Peter Toth (Eds.). Innovation via Collaborative Learning in Engineering Education: Proceedings of the 28th International Conference on Interactive Collaborative Learning (ICL2025). Cham: Springer International Publishing, 2025, pp. 28-35
Year of publication: 2025
Publication type: Miscellaneous (conference paper)
Language: English
DOI/URN: 10.1007/978-3-032-18888-5_3
| Verified: | Library |
Abstract
The emergence of Large Language Models (LLMs) has revolutionized Natural Language Processing (NLP), with significant implications for educational sentiment analysis. Instructors and institutions increasingly rely on student feedback to enhance teaching effectiveness, but manual analysis of qualitative comments is resource-intensive. This study investigates the potential of general-purpose LLMs, specifically GPT-3.5, for automated sentiment analysis of student course evaluations. We compare its performance against fine-tuned transformer models, including BERT, XLNet, BART-large-MNLI, and RoBERTa-large-MNLI, using an open-access dataset of student course feedback. Sentiment classification was conducted using both a three-label (negative, neutral, positive) and a more granular five-label (very negative to very positive) scheme. To assess GPT-3.5’s interpretive capacity, we applied various prompting strategies, such as Zero-shot, One-shot, Few-shot, Chain-of-Thought (CoT), and Role-Playing (RP). Our findings indicate that while fine-tuned models generally outperform GPT-3.5 in five-label classification, GPT-3.5 performs competitively in three-label settings when guided by effective prompts. These results suggest that LLMs, despite certain limitations, can be effectively deployed in educational contexts for scalable and cost-efficient sentiment analysis, contributing to improved responsiveness and personalized learning environments.