Designing In-store Voice Assistants

Ross Malpass
December 11, 2018

Just a couple of years ago, when we first started working on this technology, the idea of talking to a screen in a store seemed like science fiction. But since then, we’ve seen a huge rise in smart speakers, and people are getting more used to talking to Siri and Google on their phones or engaging with voice-driven customer service bots.

According to eMarketer, over 36% of millennials use voice assistants regularly, and comScore claims that by 2020, half of all searches will be done by voice. Voice is rapidly becoming a natural and instinctive UI: we’re getting closer to that Star Trek world where the main way we interface with machines is simply by speaking to them, and we expect the machines to talk back. Putting voice-enabled devices in stores isn’t as strange or far-fetched as it used to be. In fact, we’ve done it already. Twice.

There are many benefits to voice-enabled systems in retail environments. Most importantly, they don’t require the user to learn the interface. They don’t have to figure out where to point and click, and they don’t need to type anything. All they have to do is talk, using natural language.

Voice-driven systems are also far easier to operate than traditional touch-screens in many real-world contexts. For example, they’re completely hands-free, so you can use them if your hands are full of shopping bags or if you’re holding onto kids. And in places like Sweden, where it gets bitterly cold in winter, you can operate them without having to take off your gloves first. Little things like that don’t seem like much, but they all contribute to creating a better customer experience.

One other major advantage of voice-centred systems is that they’re proactive when it comes to customer engagement. If designed the right way, they’re not passive devices, waiting patiently for shoppers to notice them. Instead, they detect the presence of passersby and initiate a conversation, actively attracting their attention. That gives them what we call “stopping power” - the ability to make a shopper literally stop and discover a new way to engage with a brand.
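To make the idea of "stopping power" concrete, here is a minimal sketch of how a device could greet a passerby rather than wait to be noticed. The sensor API, distance threshold, and cooldown are all illustrative assumptions, not details from our actual installations:

```python
# Hypothetical sketch: a device greets a shopper who steps within range,
# with a cooldown so it doesn't pester everyone walking past.
# Thresholds and function names are illustrative, not from a real deployment.

GREETING_RANGE_M = 1.5   # assumed distance at which to start a conversation
COOLDOWN_S = 30          # assumed minimum gap between greetings

def maybe_greet(distance_m, last_greeting_ts, now, speak):
    """Greet a nearby shopper at most once per cooldown window.

    Returns the timestamp of the most recent greeting.
    """
    if distance_m <= GREETING_RANGE_M and now - last_greeting_ts >= COOLDOWN_S:
        speak("Hi! Want to try the selfie mirror? Just say 'yes' to start.")
        return now
    return last_greeting_ts
```

In practice the "distance" input might come from a depth camera or proximity sensor, and the cooldown keeps the device from feeling pushy in a busy aisle.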

It’s not hard to imagine a store where we will talk to AIs throughout our visit, just as we talk to human staff now. As we enter, we ask where to find the items we’re looking for or what’s on offer. Once in-store, we ask for more information about products or get recommendations based on the things we’ve already browsed. The technology is already there, and customers are ready for it. And - let’s be honest about it - it makes shopping more fun.

Designing for voice

Designing for a voice system is completely different to designing for traditional screen-based devices. That’s mostly because voice interfaces have to be much simpler than touch-screens or apps, which means offering much less functionality. The huge difficulty with purely voice-based interfaces is that they don’t provide any visual cues.

A visual UI will display something like a cart icon, a list of items, or a range of options the user can click, which makes it clear what’s going on. A voice-based system, however, has to literally tell the user what they can do. This has an important implication: voice is only useful for simple interactions, and anything more complicated won’t work.
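The point above - that a voice UI must speak its own cues and keep the option count tiny - can be sketched in a few lines. This is a hypothetical illustration; the function names and the limit of three options are our assumptions, not a spec from any voice platform:

```python
# Hypothetical sketch: with no screen, every prompt must enumerate what the
# user can say next, and the menu has to stay short to remain usable.

def build_prompt(question, options):
    """Turn a short list of choices into a spoken prompt that states every cue."""
    if len(options) > 3:
        # Spoken menus degrade quickly; an illustrative hard limit.
        raise ValueError("too many options for a spoken menu")
    if len(options) == 1:
        spoken = f"'{options[0]}'"
    else:
        spoken = ", ".join(f"'{o}'" for o in options[:-1]) + f" or '{options[-1]}'"
    return f"{question} You can say {spoken}."

# build_prompt("Ready for your photo?", ["yes", "no", "help"])
# -> "Ready for your photo? You can say 'yes', 'no' or 'help'."
```

Notice how much longer the spoken equivalent of a three-button menu is than the buttons themselves - which is exactly why anything more complicated than a handful of choices falls apart in a voice-only interface.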

But there’s another, more subtle reason why voice systems are different - personality. According to Google, 41% of people who own a smart speaker say that it feels like talking to a friend or another person. For a screen-based system to be successful, the most important thing is to deliver the required information efficiently and cleanly. For a voice-based system, however, the emotional connection is equally important. How you communicate with the user matters just as much as what you communicate - not just the words, but the tone and the actual voice.

We’ve faced some interesting technical challenges over the last year. The initial version of our voice-controlled “selfie mirror” was deployed in the spring. Then in November, we rolled out the new, improved version in Dubai. This had one major new feature - it’s bilingual in English and Arabic.

In addition, it’s much more accurate and faster, which makes it better able to cope with a busy, noisy shopping mall filled with background noise, multiple people speaking at the same time, and uncooperative acoustics. To do this, we had to integrate a whole new voice engine.
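One common tactic for coping with a noisy venue - not necessarily what our engine does internally, just an illustrative assumption - is to act only on transcriptions the recogniser is confident about, and politely re-prompt otherwise:

```python
# Hypothetical sketch: reject low-confidence speech results rather than act on
# garbage picked up from mall noise. The threshold is illustrative and would
# be tuned per venue in a real deployment.

MIN_CONFIDENCE = 0.75

def handle_result(transcript, confidence, act, speak):
    """Dispatch confident transcriptions; re-prompt the shopper for the rest."""
    if confidence >= MIN_CONFIDENCE:
        act(transcript)
    else:
        speak("Sorry, it's a bit loud in here - could you say that again?")
```

The trade-off is familiar from any noisy-channel design: a higher threshold means fewer false triggers from background chatter, but more "could you say that again?" moments for legitimate requests.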

Shoppers in both Dubai and Manhattan love these devices. For H&M and Max Fashion, they offer customers an exciting new experience that brings them back to the store and gets them talking on social media.

We already have new clients eager to start using our next generation of voice-enabled devices. It’s certain that over the next couple of years, we’ll see voice moving from the realms of experimental concept installations to mainstream retail.

Let us help

Improving your Omnichannel journeys, Visitor Management or Customer Experiences?

Looking to deploy IoT, Digital Signage or Mobile apps?

Reach out by e-mail or use the form here and we'll be happy to help!
