Does Alexa understand your toddler? Western University researchers are looking into it

Olivia Daub's two-year-old son is obsessed with “trinkets.” He talks about them, and shouts about them, at 5 a.m. every day.

Daub said most people have no idea what her son is talking about, but she knows how to get him the tiny dark blue fruit he really wants: blueberries.

“We were all children once and experienced adults not understanding us,” Daub said. “[Adults] all find it very difficult to understand children because they produce speech and language differently than adults do.”

Daub, an associate professor in the School of Communication Sciences and Disorders at Western University in London, Ont., says understanding toddler speech is even more difficult for artificial intelligence (AI). That's why she's leading a new study on how AI can better understand how toddlers talk.

Daub said that while automatic speech recognition software, such as the automatic captioning in Zoom meetings and Amazon's virtual assistant Alexa, has become good at recognizing adult speech, it still struggles to accurately capture what young children are saying.

Sudeh Nikan is an assistant professor in the Department of Electrical and Computer Engineering at Western University who studies artificial intelligence. She is working on training an AI model to better understand toddler speech. (Submitted by Sudeh Nikan)

“I think we've all seen videos on YouTube where a kid asks Alexa to play a song and gets something completely different and totally inappropriate,” she said. “This research is trying to understand how we can use the principles of artificial intelligence and machine learning to improve speech recognition for toddlers and preschoolers.”

To do this, she is working with Sudeh Nikan, a Western assistant professor of electrical and computer engineering, to train an artificial intelligence model on toddlers' common speech patterns and combinations.

“Most of the speech models we have are trained on adult speech, so they're not very good at recognizing toddler speech, especially the mistakes toddlers make,” Nikan said.

“You have to provide the AI with examples [for it] to be able to understand and distinguish between common speech errors and problems.”

How the study will be carried out

Daub plans to recruit 30 children to play, tell stories and talk with research assistants. Each session will be recorded and transcribed by humans, who will also collect data on the children's speech patterns.

One common pattern, Daub says, is that many English-speaking toddlers have trouble pronouncing the “r” sound and use the “w” sound instead.

That data will then be handed over to Nikan, who will feed it into a private artificial intelligence model for training.

“We can fine-tune these models using data specifically annotated for this purpose,” Nikan said, adding that the AI model will also be trained using some data from OpenAI's pre-existing online service.

Daub and her team have met with nine toddlers so far and are continuing to look for more participants for the study.

Clinical and everyday use of the AI model

Although the research is in its early stages, Daub and Nikan said their goal is to train an artificial intelligence model that can be used in clinical settings to help speech therapists analyze and decipher what children say.

“I don't think we'll ever be 100 percent accurate unless we monitor kids 24 hours a day… but I think we can get a lot closer than we are now,” Daub said.

Going forward, Daub said, if AI can better understand preschoolers, it could improve tools like captioning and voice-activated accessibility software, as well as give kids more room to play with technology.

“We can be creative about how these little people can contribute to society. They are not just consumers of the world around them. Giving them access to technology is also an important factor,” Daub said.
