How English Learners Use Speech Recognition to Improve Their Pronunciation

On 29 August 2019 Dr Gemma Artieda and Dr Bindi Clements presented the latest research from Wall Street English into how learners use ASR-driven pronunciation activities at the prestigious international EUROCALL 2019 conference.

Following the highly successful launch of ASR-driven pronunciation activities as part of the Wall Street English core course earlier this year, the Product team at Wall Street English are committed to ongoing research into using ASR (Automatic Speech Recognition) technology for improving English pronunciation. The EUROCALL conference brings together renowned academic researchers and practitioners from around the world and is a key opportunity for sharing innovative research into using technology for learning and teaching languages.

So what has our research uncovered about how our students are improving their pronunciation using this technology?

Pronunciation can be challenging for anyone learning another language because it is often difficult to hear and copy sounds that are different to those from your own language. Research has shown that well-designed ASR-powered activities can be effective for helping language learners improve their pronunciation. Not only do ASR-powered self-study activities offer many opportunities for much-needed individual pronunciation practice, but a new generation of speech assessment technology, designed specifically for giving pronunciation feedback to language learners, can provide the accurate and pinpointed personal feedback that students need to improve.

Following overwhelmingly positive feedback from a pilot held in February this year, activities powered by the latest in speech assessment technology are currently being rolled out to all Wall Street English students studying their core English course. The latest research from Wall Street English presents some of the findings from the pilot, and aims to shed light on student engagement with ASR-powered pronunciation activities. Students from centres in China, Vietnam, Saudi Arabia and Italy completed ASR-powered pronunciation activities as part of their core course studies. The study records of these 2,867 students were analysed, and 482 of the students completed a survey to give specific feedback on the use of the activities.

The current research answers the following three questions.

Do students think pronunciation activities with ASR help them improve their pronunciation?

If learners believe that the pronunciation activities are effective, they are more likely to use these activities effectively. The results from our study were very encouraging. 95% of students believed that pronunciation activities with ASR help improve their pronunciation. Vietnamese students were the most enthusiastic (98.8%), closely followed by students in Italy (98.5%), then in Saudi Arabia (95.2%), and, finally, students in China (91.5%).

Do students in different countries make different use of ASR pronunciation activity features?

Students in different countries may use the activities in different ways for a number of reasons, such as how difficult they find English pronunciation (how different the sounds of their own language are from those of English may be a factor), or their beliefs about how to improve pronunciation skills. The study found remarkable differences in usage of specific activity features across countries.

ASR is used to assess words and sentences recorded by students to give ‘traffic light’ feedback on their pronunciation, with ‘green’ showing the learner that their recorded answer would be understood by another speaker of English. For any answers shown in yellow or red, students have the chance to retry and re-record their answers.

The research found that students from Vietnam and China use more retries than students from Saudi Arabia, and students from Italy use the fewest retries. Students can also listen to a model audio to help them refine their pronunciation. Students in China and Vietnam reported using this feature the most (82%), followed by students in Saudi Arabia (57%), while students in Italy reported remarkably lower usage (41%). Lastly, students have the opportunity to listen to their own recordings to compare them to the model. Interestingly, while a similar proportion of students in Vietnam reported using this feature (80%) as reported listening to the model audio, students in China reported using it far less (66%), and students from Saudi Arabia (61%) and Italy (57%) reported lower usage still.

Are there differences between student nationalities and/or student age groups in terms of their beliefs and perceptions on learning pronunciation using ASR?

Students hold different beliefs and perceptions about learning pronunciation, such as whether they want to (or believe they can) sound like a native speaker, and whether technology can be successfully used to improve pronunciation. The analyses for age revealed statistically significant differences between age groups for most beliefs and perceptions about learning pronunciation, with responses suggesting that older learners have less overall confidence in their ability to acquire pronunciation skills. The differences between nationalities were even more remarkable, and may help to explain some of the differences in feature usage reported for the second question. For example, although we saw above that 98.5% of students from Italy believe that the ASR-powered activities help them to improve their pronunciation, they also showed the lowest self-belief in their ability to speak English very well and placed the lowest importance on being able to speak with excellent pronunciation. Interestingly, it was these students who reported the lowest instances of listening to both the model audio and their own recordings.

This study points to the importance of considering differences between students of different ages and nationalities, and suggests that successful learner engagement with ASR-powered activities will depend not only on the effectiveness of the technology, but also on learner beliefs and perceptions.

With over 420 learning centres in 28 countries and a total enrolment of more than 180,000 students, Wall Street English has a large and diverse student base. Uncovering learner beliefs about improving pronunciation and understanding how students use ASR technology can help us to continue to design best-in-class pronunciation activities and to give effective and personalised study advice to each of our students.

For more details and results, view the slides from the presentation.

Presentation: "The effects of learner characteristics and beliefs on usage of ASR-CALL systems" by Bindi Clements