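Language models in education: Generative Artificial Intelligence to optimize the analysis of teacher performance (original title: Modelos de lenguaje en educación: Inteligencia Artificial Generativa para optimizar el análisis del desempeño docente)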

Roberto E. Ramos-Rivera; Pedro César Santana Mancilla; Jesus Garcia-Mancilla; Laura S. Gaytán-Lugo

doi:10.29105/innoacad.v1i2.36

Authors

Roberto E. Ramos-Rivera University of Colima https://orcid.org/0009-0007-9588-3115
Pedro César Santana Mancilla University of Colima https://orcid.org/0000-0002-4184-0116
Jesus Garcia-Mancilla AI & Digital Solutions https://orcid.org/0000-0002-2104-8033
Laura S. Gaytán-Lugo University of Colima https://orcid.org/0000-0002-7007-7500

Keywords:

generative artificial intelligence, large language models (LLM), teacher performance assessment

Abstract

This work explores the use of Generative Artificial Intelligence, specifically Large Language Models (LLMs), to analyze open-ended responses in teacher performance assessments. Although LLMs offer advanced capabilities for interpreting and classifying textual data, their tendency to generate "hallucinations" presents challenges in contexts where precision is crucial. Three approaches are presented to mitigate these risks: domain-specific LLMs, fine-tuned with educational data to enhance their relevance; Small Language Models (SLMs), lighter models designed to optimize efficiency and reduce errors; and cloud-based models using few-shot learning, which allow rapid adaptation from a handful of representative examples but pose privacy concerns when processing sensitive educational data. Finally, the benefits of these tools for academic institutions are discussed, including improved decision-making, technological accessibility, and ecological sustainability.
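As an illustration of the few-shot approach mentioned in the abstract, the following is a minimal sketch and not the authors' implementation. It only assembles a classification prompt from a few labeled comments and checks a model's answer against the allowed categories, which is one simple guard against hallucinated labels. The category names, the `LabeledComment` type, and both function names are hypothetical, and no model is actually called.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabeledComment:
    """A representative comment paired with its (hypothetical) category."""
    text: str
    label: str


def build_few_shot_prompt(examples, labels, new_comment):
    """Assemble a few-shot classification prompt from labeled examples."""
    lines = [
        "Classify each student comment about a teacher into one of: "
        + ", ".join(labels) + ".",
        "",
    ]
    for ex in examples:
        lines.append(f"Comment: {ex.text}")
        lines.append(f"Category: {ex.label}")
        lines.append("")
    # The unlabeled comment goes last, leaving the category for the model.
    lines.append(f"Comment: {new_comment}")
    lines.append("Category:")
    return "\n".join(lines)


def parse_label(model_output, labels):
    """Return the matching allowed label, or None for any off-list answer."""
    answer = model_output.strip().lower()
    for label in labels:
        if answer == label.lower():
            return label
    return None  # answers outside the label set are rejected, not trusted
```

In practice the prompt string would be sent to whichever model an institution selects. Because the abstract flags privacy concerns for cloud-based models, comments would presumably need to be anonymized before leaving the institution.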
