AI Weekly: These researchers are improving AI’s ability to understand different accents

The pandemic appears to have supercharged voice app usage, which was already on the rise. According to a study by NPR and Edison Research, the percentage of voice-enabled device owners who use commands at least once a day increased between early 2020 and early April. Just over a third of smart speaker owners say they listen to more music, entertainment, and news from their devices than they did before, and owners report requesting an average of 10.8 tasks per week from their assistants this year, compared with 9.4 different tasks in 2019. According to a new report from Juniper Research, consumers will interact with voice assistants on 8.4 billion devices by 2024.

But despite their growing popularity, assistants like Alexa, Google Assistant, and Siri still struggle to understand different regional accents. According to a study by the Life Science Center, 79% of people with accents alter their voices to make sure digital assistants understand them. And in a recent survey commissioned by the Washington Post, popular smart speakers made by Google and Amazon were 30% less likely to understand non-American accents than those of native-born users.

Traditional approaches to narrowing the accent gap require collecting and labeling large datasets in different languages, a time-consuming and resource-intensive process. That's why researchers at MLCommons, a nonprofit related to MLPerf, the industry-standard set of benchmarks for machine learning performance, have embarked on a project called 1000 Words in 1000 Languages. It involves creating a freely available pipeline that can take recorded speech and automatically generate clips to train compact speech recognition models.

“In the context of consumer electronics, for instance, you don’t want to have to go out and build a new language dataset, because that’s costly, tedious, and error-prone,” Vijay Janapa Reddi, an associate professor at Harvard University and a contributor to the project, told VentureBeat in a phone interview. “We’re developing a pipeline where you can plug in different sources [of speech] and then specify [the words] you want for training.”

The pipeline will be limited in scope, in that it only creates training datasets for small, low-power models that continuously listen for specific keywords (such as “OK Google” or “Alexa”), but it could be an important step toward speech recognition systems that are truly accent-agnostic. Conventionally, training a new keyword-spotting model requires manually collecting thousands of audio clips labeled with the keyword. Once the pipeline is released, developers will only need to provide a list of keywords they want to detect along with a speech recording, and the pipeline will automate extraction, training, and validation of the model without requiring any labeling.

“We’re not really creating a dataset; we’re just training on the dataset that results from searching a larger corpus,” Reddi explained. “It’s like doing a Google search. What you’re trying to do is find a needle in a haystack. You end up with a subset of results with different accents and whatever else is in there.”
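The article doesn't say how the pipeline searches the corpus. As a minimal sketch, assuming the recordings have already been transcribed and force-aligned to word-level timestamps, the "needle in a haystack" step might scan the alignments for the requested keywords and return fixed-length clip windows around each hit. The names AlignedWord, Clip, and find_keyword_clips, the input layout, and the 1-second default window are all hypothetical illustrations, not part of the MLCommons project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignedWord:
    """One word from a (hypothetical) forced alignment, with times in seconds."""
    text: str
    start: float
    end: float


@dataclass(frozen=True)
class Clip:
    """A window of one recording that contains a target keyword."""
    keyword: str
    utterance_id: str
    start: float
    end: float


def find_keyword_clips(utterances, keywords, clip_len=1.0):
    """Search word-aligned recordings for target keywords (hypothetical helper).

    utterances: dict mapping an utterance ID to (duration_seconds, [AlignedWord, ...]).
    keywords: the words to detect; matching is case-insensitive.
    clip_len: window length in seconds (1 s is an assumed default).
    """
    targets = {k.lower() for k in keywords}
    clips = []
    for utt_id, (duration, words) in utterances.items():
        for w in words:
            kw = w.text.lower()
            if kw not in targets:
                continue
            # Center the window on the word...
            center = (w.start + w.end) / 2
            start = max(0.0, center - clip_len / 2)
            # ...then shift it back so it stays inside the recording.
            end = min(duration, start + clip_len)
            start = max(0.0, end - clip_len)
            clips.append(Clip(kw, utt_id, start, end))
    return clips


# Usage with toy, made-up alignments:
corpus = {
    "utt-001": (4.0, [AlignedWord("hey", 0.1, 0.4), AlignedWord("firefox", 0.5, 1.1)]),
    "utt-002": (1.5, [AlignedWord("yes", 1.2, 1.45)]),
}
for clip in find_keyword_clips(corpus, ["Firefox", "yes"]):
    print(clip)
```

In a fuller pipeline, each window would then be cut from the audio (sample index roughly start × sample rate) and used as a training example, which is how search results could stand in for hand-labeled clips.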

The 1000 Words in 1000 Languages project builds on existing efforts to make speech recognition models more accessible and fair. Mozilla’s Common Voice, an open source, annotated speech dataset, consists of voice snippets along with voluntarily supplied metadata useful for training speech engines, such as speakers’ ages, genders, and accents. As part of Common Voice, Mozilla maintains a target segment that aims to collect voice data for specific purposes and use cases, including the digits “zero” through “nine” and the words “yes,” “no,” “hey,” and “Firefox.” For its part, MLCommons in December released the first iteration of a public 86,000-hour dataset for AI researchers, with later versions set to branch out into more languages and accents.

“The organizations that have a lot of speech data are often large organizations, but speech has many uses,” Reddi said. “The question is how to get this into the hands of smaller organizations that don’t have the scale of big entities like Google and Microsoft. If they have a pipeline, they can just focus on what they’re building.”

For AI coverage, send news tips to Khari Johnson and Kyle Wiggers, and be sure to subscribe to the AI Weekly newsletter and bookmark our AI channel, The Machine.

Thanks for reading,

Kyle Wiggers

AI staff writer
