Google Chirp 3, A Speech-to-Text Model Integrated With Vertex AI

Google adds its voice model Chirp 3 to its Vertex AI platform

Much of the focus in generative AI has revolved around text-based interfaces used to produce text, images, and more. However, the next significant advancement appears to be in voice technology, and it is progressing rapidly.

In a recent development, Google has announced that Chirp 3—its speech-to-text and HD text-to-speech models—will be integrated into its Vertex AI development platform starting next week.

Last week, Google discreetly revealed that Chirp 3 would introduce eight new voices across 31 languages. Potential applications for the model include developing voice assistants, producing audiobooks, and creating support agents or voice-overs for videos. This announcement was made at an event held at Google’s DeepMind offices in London.

This move comes at a time when other companies are making significant strides in voice AI. Just last week, Sesame—the startup known for the highly realistic “Maya” and “Miles” AI voice apps—launched its own model, allowing developers to build custom applications and services using its technology.

Notably, there will be certain restrictions on the use of Chirp 3 to prevent potential misuse. “We’re just working through some of these things with our safety team,” said Thomas Kurian, CEO of Google Cloud, at a recent news event.

One of the major players in AI voice services, ElevenLabs, has also secured hundreds of millions of dollars in funding to further its advancements in this field.

With this latest development, Chirp 3 joins a lineup that includes newer versions of Gemini, Google’s flagship large language model (currently being tested), the image-generation model Imagen, and the high-cost Veo 2 video generation tool.


It remains to be seen whether Chirp 3 will deliver voice generation as realistic as some of the leading AI efforts, such as those from Sesame. However, as Demis Hassabis, CEO of DeepMind, pointed out, this technological race is a marathon rather than a sprint.

“In the near term … this idea that [AI is] a silver bullet to everything in the next couple of years, I don’t see that happening just yet. Think we’re still quite a few years away from something like AGI happening,” he said. “It’s going to change things … over the next decade, so the medium to longer term. It’s one of those interesting moments in time.”

Google initially launched Vertex AI in 2021 as a cloud-based platform for developers to build machine learning services. This was well before the surge of interest in AI—particularly generative AI—following the launch of OpenAI’s GPT models.

Since then, Google has been leveraging Vertex AI as it strives to catch up with competitors like Microsoft and Amazon, both of which are also expanding their generative AI offerings.

Besides offering generative AI through Gemini, Vertex AI enables developers to classify data, train models, and deploy them for production use. It remains to be seen whether Google will extend its ecosystem to accommodate models beyond those developed in-house.

Google has been working on “Chirp” voice technology for years, initially using it as a code name in its early attempts to compete with Amazon’s Alexa service.