Automated Scoring of Speaking and Writing: Starting to Hit its Stride

Daniel Marc Jones; Liying Cheng; Gregory Tweedie

doi:10.21432/cjlt28241

Auteurs-es

Daniel Marc Jones Queen's University
Liying Cheng Queen’s University https://orcid.org/0000-0002-4458-5085
Gregory Tweedie University of Calgary https://orcid.org/0000-0003-0497-4577

DOI :

https://doi.org/10.21432/cjlt28241

Mots-clés :

Notation automatisée de la langue , revue de littérature, rétroaction sur la notation, technologie dans l’évaluation et enseignement des langues

Résumé

Cet article examine la littérature récente (2011jusqu’à présent) sur la notation automatisée (NA) de l'expression écrite et de l’expression orale. Son objectif est d'abord d'examiner les recherches actuelles sur la notation automatisée de la langue, puis de mettre en évidence l'impact de la notation automatisée sur le présent et l'avenir de l'évaluation, de l'enseignement et de l'apprentissage. L'article commence par décrire le contexte général des problèmes de notation automatisée dans l'évaluation et les tests linguistiques. Il positionne ensuite la recherche sur la NA par rapport aux avancées technologiques. La deuxième section décrit en détail le processus de recherche de la revue de la littérature et les critères d'inclusion des articles. Dans la troisième section, les trois principaux thèmes qui se dégagent de l’analyse sont présentés : considérations relatives à la conception de la notation automatisée; le rôle des humains et de l'intelligence artificielle; et la précision de la notation automatisée avec différents groupes. Deux tableaux montrent comment des articles spécifiques ont contribué à chacun des thèmes. Ensuite, chacun des trois thèmes est présenté plus en détail, avec un accent séquentiel sur l'expression écrite, l’expression orale et un bref résumé. La quatrième section aborde la mise en œuvre des NA en ce qui concerne les questions actuelles d'évaluation, d'enseignement et d'apprentissage. La cinquième section présente les possibilités de recherche futures liées à la recherche et aux utilisations actuelles de la NA, avec des implications sur le contexte canadien en ce qui concerne les prochaines étapes de la NA.

Bibliographies de l'auteur-e

Daniel Marc Jones, Queen's University

Daniel Marc Jones is a PhD student in the Faculty of Education, Queen’s University. His research focuses on the use of games to teach language and literacies with the pedagogy of multiliteracies. That is, games are framed as rich cultural artifacts that can support active learning and participatory opportunities.

Liying Cheng, Queen’s University

Liying Cheng is Professor and Director of Assessment and Evaluation Group (AEG) at the Faculty of Education, Queen’s University. Her seminal research on washback illustrates the global impact of large-scale testing on instruction, the relationships between assessment and instruction, and the academic and professional acculturation of international and new immigrant students, workers, and professionals in Canada.

Gregory Tweedie, University of Calgary

M. Gregory Tweedie is Associate Professor in Language & Literacy at the Werklund School of Education, University of Calgary, Alberta. His teaching and research, in the field of applied linguistics, focuses on what happens when people from different first language backgrounds use English as a communicative vehicle in international professional contexts.

Références

Aluthman, E. S. (2016). The effect of using automated essay evaluation on ESL undergraduate students’ writing skill. International Journal of English Linguistics, 6(5), 54-67. https://doi.org/10.5539/ijel.v6n5p54

Attali, Y. (2011). Automated subscores for TOEFL iBT® independent essays. (ED525308). ETS Research Report Series, 2011(2), i-16. https://doi.org/10.1002/j.2333-8504.2011.tb02275.x

Attali, Y., Lewis, W., & Steier, M. (2012). Scoring with the computer: Alternative procedures for improving the reliability of holistic essay scoring. Language Testing, 30(1), 125-141. https://doi.org/10.1177/0265532212452396

Bejar, I. I., VanWinkle, W., Madnani, N., Lewis, W., & Steier, M. (2013). Length of textual response as a construct-irrelevant response strategy: The case of shell language. ETS Research Report Series, 2013(1), i-39. https://doi.org/10.1002/j.2333-8504.2013.tb02314.x

Braun, V., & Clarke, V. (2006). Using thematic analysis in psychology. Qualitative Research in Psychology, 3(2), 77-101.

Bridgeman, B., Powers, D., Stone, E., & Mollaun, P. (2012a). TOEFL iBT speaking test scores as indicators of oral communicative language proficiency. Language Testing, 29(1), 91-108. https://doi.org/10.1177/0265532211411078

Bridgeman, B., Trapani, C., & Attali, Y. (2012b). Comparison of human and machine scoring of essays: Differences by gender, ethnicity, and country. Applied Measurement in Education, 25(1), 27-40. https://doi.org/10.1080/08957347.2012.635502

Burstein, J., LaFlair, G. T., Kunnan, A. J., & von Davier, A. A. (2021). A theoretical assessment ecosystem for a digital-first assessment—The Duolingo English test. http://duolingo-papers.s3.amazonaws.com/other/det-assessment-ecosystem.pdf

Cahill, A., & Evanini, K. (2020). Natural language processing for speaking and writing. In D. Yan, A. A. Rupp, & P. W. Foltz (Eds.), Handbook of automated scoring: Theory into practice (pp. 69-92). CRC Press, Taylor & Francis Group. https://doi.org/10.1201/9781351264808

Chapelle, C. A., Cotos, E., & Lee, J. (2015). Validity arguments for diagnostic assessment using automated writing evaluation. Language testing, 32(3), 385-405. https://doi.org/10.1177/0265532214565386

Cheng, J., Chen, X., & Metallinou, A. (2015). Deep neural network acoustic models for spoken assessment applications. Speech Communication, 73, 14-27. https://doi.org/10.1016/j.specom.2015.07.006

D’Mello, S. (2020). Multimodal analytics for automated assessment. In D. Yan, A. A. Rupp, & P. W. Foltz (Eds.), Handbook of automated scoring: Theory into practice (pp. 93-111). CRC Press, Taylor & Francis Group. https://doi.org/10.1201/9781351264808

d’Orville, H. (2020). COVID-19 causes unprecedented educational disruption: Is there a road towards a new normal? Prospects, 49, 11-15. https://doi.org/10.1007/s11125-020-09475-0

DiCerbo, K., Lai, E., & Ventura, M. (2020). Assessment design with automated scoring in mind. In D. Yan, A. A. Rupp, & P. W. Foltz (Eds.), Handbook of automated scoring: Theory into practice (pp. 29-47). CRC Press, Taylor & Francis Group. https://doi.org/10.1201/9781351264808

Douglas, D. (2013). Technology and language testing. In C. A. Chapelle (Eds.), The encyclopedia of applied linguistics (pp. 1-7). Wiley-Blackwell. https://doi.org/10.1002/9781405198431.wbeal1182

Foltz, P. W., Yan, D., & Rupp, A. A. (2020). The past, present, and future of automated scoring for complex tasks. In D. Yan, A. A. Rupp, & P. W. Foltz (Eds.), Handbook of automated scoring: Theory into practice (pp. 1-11). CRC Press, Taylor & Francis Group. https://doi.org/10.1201/9781351264808

Fu, J., Chiba, Y., Nose, T., & Ito, A. (2020). Automatic assessment of English proficiency for Japanese learners without reference sentences based on deep neural network acoustic models. Speech Communication, 116, 86-97. https://doi.org/10.1016/j.specom.2019.12.002

Golkova, D., & Hubackova, S. (2014). Productive skills in second language learning. Procedia-Social and Behavioral Sciences, 143, 477-481. https://doi.org/10.1016/j.sbspro.2014.07.520

Gu, L., Davis, L., Tao, J., & Zechner, K. (2021). Using spoken language technology for generating feedback to prepare for the TOEFL iBT® test: A user perception study. Assessment in Education: Principles, Policy & Practice, 28(1), 58-76. https://doi.org/10.1080/0969594X.2020.1735995

Higgins, D., Xi, X., Zechner, K., & Williamson, D. (2011). A three-stage approach to the automated scoring of spontaneous spoken responses. Computer Speech & Language, 25(2), 282-306. https://doi.org/10.1016/j.csl.2010.06.001

Hussein, M. A., Hassan, H., & Nassef, M. (2019). Automated language essay scoring systems: A literature review. PeerJ Computer Science, 5, e208. https://doi.org/10.7717/peerj-cs.208

Kaushik, V., & Drolet, J. (2018). Settlement and integration needs of skilled immigrants in Canada. Social Sciences, 7(5), 76. https://doi.org/10.3390/socsci7050076

Latifi, S., & Gierl, M. (2021). Automated scoring of junior and senior high essays using Coh-Metrix features: Implications for large-scale language testing. Language Testing, 38(1), 62-85. https://doi.org/10.1177/0265532220929918

Litman, D., Strik, H., & Lim, G. S. (2018). Speech technologies and the assessment of second language speaking: Approaches, challenges, and opportunities. Language Assessment Quarterly, 15(3), 294-309. https://doi.org/10.1080/15434303.2018.1472265

Loewen, S., Crowther, D., Isbell, D. R., Kim, K. M., Maloney, J., Miller, Z. F., & Rawal, H. (2019). Mobile-assisted language learning: A Duolingo case study. ReCALL, 31(3), 293-311. https://doi.org/10.1017/S0958344019000065

McNamara, T. (2005). 21st century shibboleth: Language tests, identity and intergroup conflict. Language Policy, 4(4), 351-370. https://doi.org/10.1007/s10993-005-2886-0

Powers, D. E., Escoffery, D. S., & Duchnowski, M. P. (2015). Validating automated essay scoring: A (modest) refinement of the “gold standard.” Applied Measurement in Education, 28(2), 130-142. https://doi.org/10.1080/08957347.2014.1002920

Ricker-Pedley, K., Hines, S., & Connolley, C. (2020). Operational human scoring at scale. In D. Yan, A. A. Rupp, & P. W. Foltz (Eds.), Handbook of automated scoring: Theory into practice (pp. 171-193). CRC Press, Taylor & Francis Group. https://doi.org/10.1201/9781351264808

Rupp, A., Foltz, P., & Yan, D. (2020). Theory into practice: Reflections on the handbook. In D. Yan, A. A. Rupp, & P. W. Foltz (Eds.), Handbook of automated scoring: Theory into practice (pp. 475-487). CRC Press, Taylor & Francis Group. https://doi.org/10.1201/9781351264808

Sackett, P. R., Schmitt, N., Ellingson, J. E., & Kabin, M. B. (2001). High-stakes testing in employment, credentialing, and higher education: Prospects in a post-affirmative-action world. American Psychologist, 56(4), 302. https://doi.org/10.1037/0003-066X.56.4.302

Schmidgall, J. E., & Powers, D. E. (2017). Technology and high-stakes language testing. In C. A. Chapelle, & S. Sauro (Eds.), The handbook of technology and second language teaching and learning (pp. 317-331). Wiley Blackwell. https://doi.org/10.1002/9781118914069.ch21

Schneider, C., & Boyer, M. (2020). Design and implementation for automated scoring systems. In D. Yan, A. A. Rupp, & P. W. Foltz (Eds.), Handbook of automated scoring: Theory into practice (pp. 217-239). CRC Press, Taylor & Francis Group. https://doi.org/10.1201/9781351264808

Settles, B., LaFlair, G. T., & Hagiwara, M. (2020). Machine learning–driven language assessment. Transactions of the Association for Computational Linguistics, 8, 247-263. https://doi.org/10.1162/tacl_a_00310

Shermis, M. D., & Burstein, J. (2013). Handbook of automated essay evaluation: Current applications and new directions. Routledge Academic.

Shin, J., & Gierl, M. J. (2021). More efficient processes for creating automated essay scoring frameworks: A demonstration of two algorithms. Language Testing, 38(2), 247-272. https://doi.org/10.1177/0265532220937830

Shohamy, E. (2013). The discourse of language testing as a tool for shaping national, global, and transnational identities. Language and Intercultural Communication, 13(2), 225-236. https://doi.org/10.1080/14708477.2013.770868

Voogt, J., & Knezek, G. (2021). Teaching and learning with technology during the COVID-19 pandemic: Highlighting the need for micro-meso-macro alignments. Canadian Journal of Learning and Technology, 47(4). https://doi.org/10.21432/cjlt28150

Wang, Y. (2021). Detecting pronunciation errors in spoken English tests based on multifeature fusion algorithm. Complexity, 2021, 1-11. https://doi.org/10.1155/2021/6623885

Wang, Z., & von Davier, A. A. (2014). Monitoring of scoring using the e‐rater® automated scoring system and human raters on a writing test. ETS Research Report Series, 2014(1), 1-21. https://doi.org/10.1002/ets2.12005

Wang, Z., Zechner, K., & Sun, Y. (2018). Monitoring the performance of human and automated scores for spoken responses. Language Testing, 35(1), 101-120. https://doi.org/10.1177/0265532216679451

Williamson, D. M., Xi, X., & Breyer, F. J. (2012). A framework for evaluation and use of automated scoring. Educational Measurement: Issues and Practice, 31(1), 2–13. https://doi.org/10.1111/j.1745-3992.2011.00223.x

Wind, S. A., Wolfe, E. W., Engelhard Jr, G., Foltz, P., & Rosenstein, M. (2018). The influence of rater effects in training sets on the psychometric quality of automated scoring for writing assessments. International Journal of Testing, 18(1), 27-49. https://doi.org/10.1080/15305058.2017.1361426

Wood, S. (2020). Public perception and communication around automated essay scoring. In D. Yan, A. A. Rupp, & P. W. Foltz (Eds.), Handbook of automated scoring: Theory into practice (pp. 133-150). CRC Press, Taylor & Francis Group. https://doi.org/10.1201/9781351264808

Xi, X., Higgins, D., Zechner, K., & Williamson, D. (2012). A comparison of two scoring methods for an automated speech scoring system. Language Testing, 29(3), 371-394. https://doi.org/10.1177/0265532211425673

Yan, D., & Bridgeman, B. (2020). Validation of automated scoring systems. In D. Yan, A. A. Rupp, & P. W. Foltz (Eds.), Handbook of automated scoring: Theory into practice (pp. 297-318). CRC Press, Taylor & Francis Group. https://doi.org/10.1201/9781351264808

Yoon, S. Y., & Zechner, K. (2017). Combining human and automated scores for the improved assessment of non-native speech. Speech Communication, 93, 43-52. https://doi.org/10.1016/j.specom.2017.08.001

Zechner, K., Chen, L., Davis, L., Evanini, K., Lee, C. M., Leong, C. W., Wang, X., & Yoon, S. Y. (2015). Automated scoring of speaking tasks in the Test of English-for-Teaching (TEFT™). ETS Research Report Series, 2015(2), 1-17. https://doi.org/10.1002/ets2.12080

Zechner, K., Yoon, S. Y., Bhat, S., & Leong, C. W. (2017). Comparative evaluation of automated scoring of syntactic competence of non-native speakers. Computers in Human Behavior, 76, 672-682. https://doi.org/10.1016/j.chb.2017.01.060

Zhang, M., Breyer, F. J., & Lorenz, F. (2013). Investigating the suitability of implementing the E‐Rater® scoring engine in a large-scale English language testing program. ETS Research Report Series, 2013(2), i-60. https://doi.org/10.1002/j.2333-8504.2013.tb02343.x

Notation automatisée de l'expression orale et écrite : Un début prometteur

Auteurs-es

DOI :

Mots-clés :

Résumé

Bibliographies de l'auteur-e

Daniel Marc Jones, Queen's University

Liying Cheng, Queen’s University

Gregory Tweedie, University of Calgary

Références

Téléchargements

Publié-e

Numéro

Rubrique

Licence

Langue

Les éditeurs

about

submitelerts

cnie