A modified version of this article originally appeared on ITProPortal.
Talking to our virtual assistants is almost like science fiction. We love the idea of literally telling them what to do. We don’t have to bother with a keyboard or a touch screen; we just get to order them around like electronic servants. The technology isn’t new: we’ve had simple voice interfaces for decades, allowing us to give basic commands or dictate documents. But in the last few years, voice has taken a huge leap forward. Google Home, Alexa, Siri, and Cortana have begun the transition towards true AI, to the point where they’re not just executing set commands but actually understand what they’re being told. You can ask them questions, and they’ll do their best to answer. We’re reaching the era of “do what I mean, not what I say.”
But it’s still early days, and the tech has a long way to go. We talked to users in both the UK and the US who had tried voice shopping, and we weren’t entirely surprised by the range of reactions. We heard everything from “it was a disaster” and “it’s horrible, I turned it off” to “I love it” and “we use it all the time.” The most common reaction was simply, “I don’t see the point.”
So why does it work well for some people and not for others? The reason wasn’t, as we initially expected to find, that the tech geeks loved it and the rest didn’t. At this point, almost everyone who has Alexa is an early adopter, so they’re all geeks to some extent. In fact, the most vehement negative reactions often came from the most tech-savvy users. The Americans definitely loved it more than the British, but that still doesn’t explain the range of reactions we saw.
What it came down to was two key factors: what people were trying to buy, and the amount of time they were prepared to invest in training both themselves and the system.
Layers of technology
To understand why this is, we need to look a little deeper into how voice shopping actually works. It’s a massively complicated problem to solve, requiring many different layers of technology and AI.
First, it has to understand speech. It has to take the sounds it hears and convert them into words. It needs a huge vocabulary for every language, and it has to use context to distinguish between homophones such as “time” and “thyme”. It also needs to understand different accents and dialects, male and female voices, and both children and the elderly. Given the difficulty that humans sometimes have understanding each other even when they’re speaking the same language, it’s a major challenge for a computer.
As if that wasn’t enough, an effective voice-controlled AI needs to be able to operate in a noisy environment. There may be music playing, a TV show in the background, a washing machine running, dogs barking, and traffic rumbling outside. It has to be able to tune all that out and pick up the relevant sounds that make up speech. To add extra complication, it needs to know that the character on the TV who just spoke to their AI isn't real.
Then, like any good butler, it needs to learn discretion. It’s always hovering in the background, listening to every word you say, but it needs to be aware of when it’s being addressed and when it should ignore you. In a shared house, it needs to know that it’s not okay for your children or your guests to order whatever they want. And, if you’ve ordered a gift for your partner, it needs to understand that it should be charged to your account, not to theirs.
Finally, we get to the actual shopping part of the problem. This requires the AI to know what I want almost as well as I do. If my wife asks me to pick up some coffee on my way home, we don’t need to have a long conversation about what kind of coffee, what size pack, or how finely to grind it. (Interestingly, almost everyone we spoke to used coffee to illustrate their first voice shopping experience.) The system doesn’t work effectively until it has a lot of data about your shopping habits and preferences. Until it gets that information, it has to keep asking you questions to reach the right level of detail, which is considerably slower and more frustrating than shopping via web or mobile.
The learning curve goes two ways. The more you order something like cat food, the more the AI learns about what kind of cat food you like. And you learn that if you say “Alexa, buy wet cat food ocean whitefish” instead of just “buy cat food,” you’ll get what you want a lot quicker. The UX sucks at first, but if you’re prepared to invest the time to train both the system and yourself, it’s ideal for simple repeat purchases, just like Dash buttons. And for hands-free situations, such as while you’re cooking, doing housework, or driving, it’s much easier than having to get to a computer or mobile device.
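To make that two-way learning curve a little more concrete, here is a deliberately simple Python sketch. It is not how Alexa or any real assistant works; the `PreferenceModel` class, its method names, and the `min_orders` threshold are all made up for illustration. The idea is only that the system counts what you actually picked for each spoken request, and stops asking follow-up questions once it has seen the same choice often enough.

```python
from collections import Counter


class PreferenceModel:
    """Toy illustration (hypothetical): remember what a shopper chose for each request."""

    def __init__(self, min_orders=2):
        # Assumed threshold: how many times a product must have been chosen
        # before the system picks it without asking questions.
        self.min_orders = min_orders
        self.history = {}  # spoken request (lowercased) -> Counter of chosen products

    def record(self, request, product):
        """Store the product the shopper ended up buying for this request."""
        self.history.setdefault(request.lower(), Counter())[product] += 1

    def resolve(self, request):
        """Return the usual product, or None if the system still has to ask."""
        counts = self.history.get(request.lower())
        if not counts:
            return None
        product, times_ordered = counts.most_common(1)[0]
        return product if times_ordered >= self.min_orders else None


model = PreferenceModel()
print(model.resolve("cat food"))  # no history yet, so the system has to ask
model.record("cat food", "wet cat food ocean whitefish")
model.record("cat food", "wet cat food ocean whitefish")
print(model.resolve("cat food"))  # enough history to pick without asking
```

In this sketch, a vague request like “cat food” only becomes useful after a couple of orders, which is roughly the early frustration users described. A very specific request skips the guessing altogether, which is the half of the training that falls to the shopper.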
Voice hits a real limitation when it comes to browsing or comparison shopping. For one-off or complex purchases, the audio interface is inadequate. You wouldn’t want to buy something like a sofa, a camera, clothes, or even a teddy bear without being able to see it, read reviews, and check the specifications. For that, you need a screen. That’ll doubtless happen in the next evolution: the AI will fire up the nearest screen or projection display and show you suggested items.
On the other hand, even if it’s not great at helping you buy these kinds of items, it should be able to help you with issues like delivery tracking, regardless of what device you used to make the purchase. “Where are my new shoes?” is a perfectly reasonable question.
It's only a matter of time
However, here’s the bottom line. AIs are like humans in one vitally important respect. They learn by failing. They listen to billions of words a day and gradually learn to understand us: look at how much voice input has improved since it became a standard feature on mobile devices. AIs undertake millions of shopping transactions and learn about what we really want. And, just like kids, they observe everything we do so that they can fit in with us better. Like kids, they do their best, they make mistakes, and we teach them to do better. Voice shopping may be limited and clunky now, but it won’t take long before it’s just another way to do business.
There’s one final issue that voice shopping developers need to address. A good shopping AI shouldn’t be linked to just one vendor. It needs to be focused around you, not the store. When you want coffee, it should search across multiple stores and see what’s available from all of them. It should retain your preferences even when you’re visiting a store for the first time.
To gain widespread acceptance in the home, AIs will have to work for the benefit of their customers, not just act as agents for a single retailer.