How Companies Are Screwing Their Voice Interfaces

Aug 10, 2018

https://thealeph.com/wp-content/uploads/2018/03/image.jpeg

This story was first published in The Aleph Report. If you want to read the latest reports, please subscribe to our newsletter and follow us on Twitter.

Designing good voice conversational interfaces has been on my mind for a while now. I’ve been toying with my Amazon Echo since January, and it’s been very enlightening.

One of the first posts I wrote here was about voice interfaces and how they’re becoming a big thing. What I didn’t have at the time was constant, direct experience.

After several months of daily Alexa use, I have to say I’m very impressed. The first thought that comes to mind is that it just works. I know it sounds lame, but it’s impressive that it works. You talk to Alexa, and she catches what you mean.

After 20 years in computer science, it’s the first time I’ve seen a functional voice interface. I must hand it to Amazon for their fantastic work behind Alexa and the Echo.

Another takeaway from using Alexa is how dependent on it I’m becoming. It reminds me of the first iPhone touchscreen. Once you tried it, you couldn’t go back. You expected every surface to be a multi-touch screen. The same is happening to me with Alexa. I expect all my devices to respond to voice commands. And they don’t. And it’s frustrating.

I’ve discovered not only my most common use cases but also how my behavior has been changing because of them. At first, you goof around with Alexa, but as time goes by, you start using it because it’s more convenient.

Personal use cases

I have a particular use case, one that has probably driven my adoption of voice interfaces. I recently had a baby. As I feed my little girl, I tend to have both hands occupied but not much to do for the next 20 minutes. I find myself interacting with Alexa during those moments.

The surprising thing is that I’ve become so used to it that even when I’m not with my girl, I ask Alexa. It’s just become a much easier way to access specific information.

Three use cases are gold for me. The first one is Spotify’s integration. I play a lot of music at home and being able to do it via voice is so much easier. This is especially true if you have kids.

The second one is listening to the news. I love being able to sit down with my girl, fire up the news, and get a quick glimpse of what’s going on in the world. This doesn’t replace my daily reading, but it serves as an entry point to it.

Last, I use the calendar integration every day. I tend to use it, especially at night, when talking with my wife about the next day’s schedule. Sometimes I don’t remember what time I had this or that meeting, so I ask Alexa. I could check my phone, but Alexa is way quicker.

I pair this last use with constant checking of the weather. I check it every morning before getting my kids ready for school, so I know how to dress them.

Setting alarms is also a big thing for me. I use them daily to avoid getting sucked into work and missing an appointment. It’s so easy that I’ve stopped setting them on my phone altogether.

Interface frustrations

I do have several frustrations; things I know will go away with time, but we aren’t quite there yet. There is an evident chasm between Alexa’s own interface design and that of most skills (Alexa apps). And it’s very frustrating. For most skills, you need to be very strict with the way you trigger them. This adds friction that shouldn’t be there in the first place.

Many skill designers don’t understand the voice use case at all. I have a feeling that most of the skills in the Alexa marketplace are crude, simplified copies of the mobile app version. It reminds me of the shift between offline and online and how many publishers flunked their transition.
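To make the friction concrete, here is a minimal sketch of the difference between strict and tolerant triggering. It is not how Alexa or any real skill works internally; the intents, the sample phrases, the filler-word list, and the 0.6 cutoff are all made up for illustration, and the fuzzy matching relies only on Python's standard `difflib` module.

```python
import difflib
import re

# Hypothetical sample phrases for two made-up intents (assumptions, not real skill data).
SAMPLES = {
    "play_news": ["play the news", "what is in the news", "give me the headlines"],
    "check_calendar": ["what is on my calendar", "when is my next meeting"],
}

# Hypothetical filler words a speaker might add around the actual request.
FILLERS = {"please", "alexa", "hey", "um", "uh"}


def normalize(utterance):
    """Lowercase the utterance and drop punctuation and filler words."""
    words = re.findall(r"[a-z']+", utterance.lower())
    return " ".join(w for w in words if w not in FILLERS)


def strict_match(utterance):
    """Return an intent only if the utterance equals a sample phrase exactly."""
    for intent, phrases in SAMPLES.items():
        if utterance in phrases:
            return intent
    return None


def tolerant_match(utterance, cutoff=0.6):
    """Return the intent whose sample phrase is most similar, if similar enough."""
    text = normalize(utterance)
    best_intent, best_score = None, 0.0
    for intent, phrases in SAMPLES.items():
        for phrase in phrases:
            score = difflib.SequenceMatcher(None, text, phrase).ratio()
            if score > best_score:
                best_intent, best_score = intent, score
    return best_intent if best_score >= cutoff else None
```

With the strict version, only the exact phrase "play the news" triggers the news intent; add a "please" or a stray exclamation mark and nothing happens. The tolerant version normalizes the request first and accepts a close match, which is closer to how people actually talk.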

Conversational interfaces require a unique design, one that has nothing to do with any other design scheme done before. A simple redesign of an existing app won’t cut it.

Another frustration is the lack of support for significant local voice use cases. There are two reasons for this. One is the fact that the Echo is heavily US-based. This makes all skills very US-centric, and few have any European support. The other one is the lack of foresight from most European operators. Yes, it’s US-centric right now, but nothing prevents them from making their skills work in Europe too. The reason why they don’t do it is that European Echo users are a tiny niche. This is the kind of anti-strategic move that pisses me off. The classic innovator’s dilemma mistake.

Two use cases that are missing are restaurant reservations and food delivery. I’m surprised that companies like The Fork or Deliveroo have zero Alexa presence.

Thoughts on conversational design patterns

While conversational interfaces cover a wide range of new apps, it’s crucial to differentiate text-based interfaces from their voice counterparts. Text-based ones, while sharing some traits, are inherently different.

Building voice conversational interfaces is hard. It’s hard precisely because we have a hefty inheritance from text-based interfaces. The design of voice applications implies not just a different interface but a different backend to support them.

For example, trying to find a specific song on Spotify via Alexa is a pain. You either know the song’s name and the artist, or you’ll have a hard time getting it to play. Spotify should be smart enough to learn from the user (context) and even ask them to sing to it, so it gets an idea of what they want. Think of Spotify meets Alexa meets Shazam. This certainly isn’t easy to pull off, but it’s what’s required to make voice apps work.

Another problem is the lack of thought about the user’s journey within a conversational interface. Each user is different. This translates to multiple potential user paths through the interface. Nonetheless, most voice apps only work with one or two different paths.

It’s one thing to offer various voice commands, and quite another to weave the main flow of the conversation around the most common use cases. The skill should also be able to learn the user’s preferences and lock onto their usual habits. This is something very few voice apps do.
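As a rough illustration of what locking onto a user’s habits could look like, here is a small sketch. It is an assumption-laden toy, not something any existing skill is known to do: the `HabitTracker` class, the three time-of-day slots, and the `min_uses` threshold are all hypothetical names and values chosen for the example.

```python
from collections import Counter, defaultdict


class HabitTracker:
    """Toy model: count what a user asks for in each part of the day."""

    def __init__(self, min_uses=3):
        # Hypothetical threshold: how often a request must recur before we suggest it.
        self.min_uses = min_uses
        self._counts = defaultdict(Counter)

    @staticmethod
    def _slot(hour):
        """Map an hour (0-23) to a coarse, made-up time-of-day slot."""
        if 5 <= hour < 12:
            return "morning"
        if 12 <= hour < 18:
            return "afternoon"
        return "evening"

    def record(self, hour, request):
        """Remember that the user made this request at this hour."""
        self._counts[self._slot(hour)][request] += 1

    def suggest(self, hour):
        """Return the user's usual request for this time of day, if there is one."""
        counts = self._counts.get(self._slot(hour))
        if not counts:
            return None
        request, uses = counts.most_common(1)[0]
        return request if uses >= self.min_uses else None
```

After a few mornings of asking for the weather, `suggest` would return "weather" for a morning hour, so the skill could offer it up front instead of waiting for the exact command. A real implementation would need far more context than the hour of the day, but the principle is the same: the conversation adapts to the user, not the other way around.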

Moving into the future

The speed at which a user gets used to the new interfaces is breathtaking. Not only is it easy to engage with them; they also create dependency in no time.

Voice interactions are far superior to any mobile or text-based ones for specific operations. For anything that requires a fast information request, voice will trump text anytime. This gives an opening for highly specialized voice operations. Trying to build a one-stop shop for voice is a terrible idea.

I feel many businesses are missing the voice opportunity. The worst part isn’t that they’re failing to grasp the opportunity. The problem is that when they do, it will be too late. Companies should start experimenting now. They won’t have many users. They will lose money. It will be a cost center, not a profit one. But that is precisely the hallmark of disruptive technologies.

In the absence of useful voice apps for everyday tasks, new entrants will start offering them. Their offerings will be inferior to those of the traditional text-only players, but when these players finally move to voice interfaces, the entrants will be entrenched. It will be tough to steal market share from them. The moment to experiment is now. The question is, how many organizations have the expertise and the resources to invest in this?

If you like this article, please share it and invite others to follow the newsletter. It really helps us grow!

Written by Alex Barrera
