How effective is speech recognition software for improving pronunciation skills?

01 Feb 2021

On Saturday 19th June, Bindi Clements (Instructional Design and Efficacy Manager, Wall Street English) presented a research paper (Artieda and Clements, 2019) at the 54th IATEFL conference on the effectiveness of automatic speech recognition (ASR) for providing pronunciation practice. The research was carried out by the Wall Street English product team and used study and survey data from learners using Wall Street English multimedia activities. It was published in the peer-reviewed collection of papers ‘CALL and complexity – short papers from EUROCALL 2019’ (Meunier et al., 2019).

How can automatic speech recognition help language learners?

Automatic speech recognition (ASR) software can be effective for helping language learners improve their pronunciation in a number of ways. These include:  

  • Building proficiency: ASR-powered activities give learners more opportunities for speaking practice outside the classroom.
  • Personalising learning: Students can go at their own pace and practise according to their own specific needs.
  • Giving objective feedback: Learners find it difficult to hear their own pronunciation errors. ASR-powered feedback can help to pinpoint those errors and show learners where they need to improve.
  • Empowering students to learn: ASR-powered activities enable learners to practise on their own without a teacher.
  • Providing a low-anxiety environment: Learners do not feel nervous or judged when practising pronunciation with ASR feedback.

How effective is automatic speech recognition?

The effectiveness of ASR for pronunciation training depends on the technology used. Off-the-shelf speech-to-text software (such as software used for dictation) provides a low-cost and flexible tool that teachers can use in the classroom. However, this technology is not specifically designed for language learners, and it does not always give accurate feedback or tell learners where they need to improve. Automatic Speech Assessment technology, such as that integrated into the Wall Street English learning platform, has been designed specifically to give learners pinpointed feedback on where they need to improve. When incorporated into a learning programme, this technology can provide accurate, actionable feedback to help learners with their pronunciation.

However, whether speech recognition technology is effective also depends on learners’ attitudes towards pronunciation and on their beliefs about the effectiveness of the technology. If learners do not believe they can improve their pronunciation, and/or do not believe the technology ‘works’, then they will not be motivated to use the online pronunciation activities. Other factors, such as nationality and age, can also have an impact on how learners use the pronunciation activities. Our research, which was based on a pilot programme involving over 2,800 Wall Street English students in four countries, set out to answer the following questions:

  1. Do students think pronunciation activities with ASR help them improve their pronunciation?
  2. Do students in four countries (China, Vietnam, Italy and Saudi Arabia) make different use of ASR activity features?
  3. Are there differences between age groups and nationalities in students’ beliefs and perceptions on learning pronunciation using ASR?

The research found that Wall Street English learners were overwhelmingly positive about using ASR-powered pronunciation activities to improve their pronunciation. There were, however, significant differences in how learners of different nationalities used these activities, and learners’ beliefs about pronunciation instruction had an effect on how the activities were used.

Find out more about the results and implications of this research in Bindi’s IATEFL conference presentation below, or view the research paper (Artieda and Clements, 2019).

IATEFL is one of the best-recognised events in the English training market, attended by 3,000+ professionals from over 100 countries. Its 2021 International Conference took place from June 19th to 21st and featured a varied programme of talks and workshops.


Artieda, Gemma; Clements, Bindi. (2019). A comparison of learner characteristics, beliefs, and usage of ASR-CALL systems. In Meunier, Fanny; Van de Vyver, Julie; Bradley, Linda; Thouësny, Sylvie (Eds), CALL and complexity – short papers from EUROCALL 2019 (pp. 19-25).

Meunier, Fanny; Van de Vyver, Julie; Bradley, Linda; Thouësny, Sylvie (Eds). (2019). CALL and complexity – short papers from EUROCALL 2019.