ElevenLabs CEO Mati Staniszewski on Darth Vader, Competition and Preventing Misuse

What is the difference between your individual and corporate clients?

Today it is close to 60/40, 50/50, with enterprise [previously] the smaller share. At the beginning of 2024 it was 90/10. A lot of the growth is on the corporate side, especially now that we are collaborating and working more closely with companies like Deutsche Telekom or Epic Games.

ElevenLabs creates both conversational AI chatbots and creative tools that generate sound based on prompts. What moves faster?

Conversational AI. The self-service part that uses conversational AI is mostly developers. There's an interesting push from very traditional spaces to move into this conversational segment. Epic Games is a great example. It was probably one of the largest deployments we've ever done: bringing to Fortnite an experience where everyone could interact with Darth Vader. This was done in partnership with the estate of James Earl Jones. Millions of players had a live Darth Vader to play with on the fly. It wasn't something you could do traditionally, going from static pre-built lines to dynamic characters. It was huge. Now we see the customer support [field] heading toward huge upheaval [thanks to] conversational AI.

How do you stay ahead of the big players?

I think we have some of the best people. With text to speech [and] with speech to text, the next big problem that everyone is trying to solve is whether you can train an omni model, i.e. a combination of LLMs [large language models] and speech, which can improve the quality of conversation while keeping it not only emotional and fast, but also stable. We have an internal prototype, and that's the big thing we're trying to build later this year. But the goal of conversational AI as a product, and of the research that will go into creating that product, is to effectively pass the Turing test for conversation with an AI agent, so that it feels like a real conversation. That is the North Star.

I thought we'd already passed the Turing test…

Pure voice interaction, like customer support, would probably pass the Turing test, and I think, hopefully, we were one of the first to do that with some of the things that we do. But the remaining bar is emotional and contextual awareness, and [there is a] higher threshold of intelligence in conversation.

How do you deal with people abusing your technology?

We have created security measures. One of them is transparency, or provenance: every [piece of] content is traced back to the account. The second is moderation, where we moderate both text and voice. So we moderate scams, and we moderate for child safety. For voice messages, we moderate them to ensure they are not misused. The final question is how we can bring this technology to people so that they know they are interacting with AI. [There is a] classifier that allows people to upload audio content and get information about [whether it's] AI or not. We are collaborating with Oxford, Berkeley, Reality Defender, and the AI Security Institutes in the US and UK to make the classifier available to other organizations. How we can get all the technology into the hands of good actors while preventing bad actors is very important. That is the balance.