Language Models in Education: Generative Artificial Intelligence to Optimize the Analysis of Teacher Performance

Roberto E. Ramos-Rivera; Pedro César Santana Mancilla; Jesus Garcia-Mancilla; Laura S. Gaytán-Lugo

doi:10.29105/innoacad.v1i2.36

Authors

Roberto E. Ramos-Rivera Universidad de Colima https://orcid.org/0009-0007-9588-3115
Pedro César Santana Mancilla Universidad de Colima https://orcid.org/0000-0002-4184-0116
Jesus Garcia-Mancilla AI & Digital Solutions https://orcid.org/0000-0002-2104-8033
Laura S. Gaytán-Lugo Universidad de Colima https://orcid.org/0000-0002-7007-7500

Keywords:

generative artificial intelligence, large language models (LLM), teacher performance evaluation

Abstract

This article explores the use of Generative Artificial Intelligence, specifically Large Language Models (LLMs), to analyze open-ended responses in teacher performance evaluations. Although LLMs offer advanced capabilities for interpreting and classifying textual data, their tendency to generate "hallucinations" poses challenges in contexts where accuracy is crucial. To mitigate these risks, three approaches are presented: domain-specific LLMs, trained on educational data to improve their relevance; Small Language Models (SLMs), lighter models that optimize efficiency and reduce the likelihood of errors; and cloud-hosted models with few-shot training, which allow rapid adjustment through representative examples, albeit with implications for privacy and data protection. Finally, the benefits of these tools for educational institutions are described, including improved decision-making, technological accessibility, and ecological sustainability.
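As an illustration of the few-shot approach mentioned in the abstract, the following minimal sketch assembles a classification prompt from a handful of labeled example comments and checks that a model's reply is one of the allowed labels. It is not the authors' method: the label set, the example comments, and the names `LabeledComment`, `build_few_shot_prompt`, and `parse_label` are all assumptions made for illustration, and no particular model or cloud API is called.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class LabeledComment:
    """One representative example shown to the model (hypothetical data)."""
    text: str
    label: str


# Assumed label set and examples; a real deployment would define its own.
LABELS = ("positive", "negative", "suggestion")

EXAMPLES = [
    LabeledComment("Explains every topic clearly and answers questions.", "positive"),
    LabeledComment("Often arrives late and does not return graded work.", "negative"),
    LabeledComment("More practical exercises would help before the exams.", "suggestion"),
]


def build_few_shot_prompt(
    comment: str,
    examples: Sequence[LabeledComment] = EXAMPLES,
    labels: Sequence[str] = LABELS,
) -> str:
    """Build a prompt: instruction, labeled examples, then the new comment."""
    lines = [
        "Classify the student comment about a teacher into exactly one of: "
        + ", ".join(labels) + ".",
        "Answer with the label only.",
        "",
    ]
    for example in examples:
        lines.append(f"Comment: {example.text}")
        lines.append(f"Label: {example.label}")
        lines.append("")
    lines.append(f"Comment: {comment.strip()}")
    lines.append("Label:")
    return "\n".join(lines)


def parse_label(model_output: str, labels: Sequence[str] = LABELS) -> Optional[str]:
    """Accept the reply only if it is an allowed label; otherwise return None.

    Rejecting anything outside the label set is a simple guard against
    replies that invent a category.
    """
    candidate = model_output.strip().lower().rstrip(".")
    return candidate if candidate in labels else None
```

The prompt string would be sent to whichever model an institution chooses. Because the comment text leaves the institution when a cloud model is used, removing names and other identifying details beforehand is one way to address the privacy concern the abstract raises.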
