10 essential things to consider when designing for voice

Voice design is incredibly popular right now, with more and more users adopting the technology in their homes and on the go. This is not only evident in the West: markets like China are taking on voice design more wholesale, as voice is a more natural fit for users who are not digital natives and are coming online for the first time.

There is no doubt that the use of voice is on the rise, and with big players in voice like Google and Amazon probing markets like hospitality for opportunities, designers need to be prepared to bring their tools to the table and design delightful experiences through Voice User Interfaces (or VUIs, for short).

In this blog, Pimento member Mike Stead – UX Architect at award-winning digital agency twentysix – shares his top tips for designing for voice.

1. Voice is not meant to replace screen-based browsing, but rather to supplement it.

It’s worth noting that while voice is on the rise, it is not meant to replace familiar physical devices and interface models, but rather to enhance the overall experience of a service.

Voice is one of the most natural ways of communicating, so we should be leveraging it for its ease of use where conversations are the right fit.

2. Make sure voice is the right fit for the users

While it’s exciting to offer voice to our users, it can be easy to fall into the trap of building a voice product just because you can. Before rushing to the drawing board, here are a few things to consider.

Does a human-to-human version of this interaction already exist? Is it a natural topic of conversation, like, say, ordering an item of food from a menu? If the interaction is conducive to conversation and is brief, it is usually a good fit for a VUI.

3. The conversation should let the user speak freely

VUIs perform best in a private or familiar shared space. Avoid taking new payment details, asking for other personal information, or asking the user to perform any task they would feel uncomfortable performing in an unfamiliar public space.

4. Natural language breaks the traditional screen-based navigation models

Will users need to navigate through multiple steps to access the information they need, potentially through multiple apps or through hard-to-reach content on the screen?

Conversation naturally defeats this barrier by allowing a flat navigation through your app, meaning users won’t need to search for an action on screen to find what they are looking for. Voice can also be leveraged to enhance an experience on screen where visual aids are needed to inform choice, like in retail or for recipes.

5. The Maxim of Quantity

What makes a conversation delightful and engaging? There are principles that give us an insight into how cooperative conversations are formed and how we can leverage them in our VUIs.

The principle, or maxim, we will look at is known as the Maxim of Quantity. In this context, the Maxim of Quantity refers to giving the user exactly the amount of information they need to carry on a conversation cooperatively. For example:

User: “Where is the nearest pub to me?”

VUI: “Not far from here, really.”

You can argue that the VUI is answering the user’s question, but by violating the maxim it isn’t behaving in a cooperative (or useful) way. Let’s try something a little more in line with our maxim:

User: “Where is the nearest pub to me?”

Our VUI: “The nearest pub, The Rolling Stone, is 50 yards down the road, first left.”

Better. Our system is adhering to this maxim by giving the user relevant, co-operative information and trying to pre-empt any further questions the user might have by looking at the context of the user’s question.
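To make this concrete, here is a minimal Python sketch of how a response like the one above could be assembled. The `Place` class, its fields and the `cooperative_answer` function are hypothetical names invented for illustration; they do not come from any particular voice platform.

```python
from dataclasses import dataclass


@dataclass
class Place:
    """Hypothetical record for a nearby venue (illustrative only)."""
    name: str            # e.g. "The Rolling Stone"
    distance_yards: int  # distance from the user
    directions: str      # short route hint, e.g. "first left"


def cooperative_answer(place: Place) -> str:
    # Put the name, distance and route into one sentence, pre-empting the
    # follow-ups "what is it called?", "how far?" and "how do I get there?".
    return (
        f"The nearest pub, {place.name}, is {place.distance_yards} yards "
        f"down the road, {place.directions}."
    )


pub = Place(name="The Rolling Stone", distance_yards=50, directions="first left")
print(cooperative_answer(pub))
```

The point is not the string formatting itself, but that the response is built from every piece of information the user is likely to need next, and nothing more.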

6. Cater to the co-operative user

This maxim shines with the ideal user for voice interface design, the informative user. The informative user is a cooperative user who is looking for verbal recognition from the product, much like a friendly co-worker or family member who is genuinely engaging in a conversation. In order to build a successful VUI, we need to make sure that at the most basic level our responses are delightful and informative. This saves us a lot of trouble down the line too when it comes to handling errors.

7. Avoid learned commands

One of the hardest obstacles voice designers will face now is reversing current expectations of voice interaction. Users’ expectations have been slowly changing in the voice landscape because of the abundance of mediocre voice apps, where misrecognition is a very common problem that leads to uncooperative styles of speech.

Our biggest culprit is learned commands. A whole host of VUIs rely on users learning specific commands to work the features of their app, making users learn (mostly through trial and error) how to speak the app’s language. Users’ natural speech patterns don’t lend themselves particularly well to learned commands, and if they start to feel like they are talking to a flowchart, they will naturally disengage from your app.

8. Don’t force users down a ‘neat flowchart’

Utilities and other self-service phone applications will use a neat flowchart to funnel a user to the correct contact point by explicitly telling the user what to say at each step. Once the user is locked into a choice, it becomes harder and harder to recover from errors, whether that’s going down a rabbit hole and reaching the end without having their query answered, or making a mistake along the way and having to start the whole process from the beginning. This whole process is very disingenuous but is often replicated in a lot of VUIs.

9. Pre-empt errors by understanding the superficial meanings of spoken language

Understanding the capability of the technology is a big part of designing your VUI and, according to Google, their assistant reached a 95% speech recognition success rate back in 2017 for US English. So why are people still struggling with VUIs, with error messages cropping up during most sessions?

Well, recognition is not the issue with error handling. The issue is understanding the language and the superficial meanings behind the user’s words, and making sure the questions we pose are purposely unambiguous and leave little room for error.

On a screen, users usually get instant feedback when something has gone wrong, with a message letting them know how to recover, and they can do this at their own pace. Voice doesn’t have this luxury, however, especially when users are halfway through a sentence and make a mistake. The user can’t unsay spoken words, and they are likely to get flustered before finishing their sentence. People expect our VUI to respond much like a human would in this scenario, and an app that follows the co-operative principle is well placed to do so.

Let’s look at an example of a co-operative response for error handling. In this fictional VUI, a user is trying to book a table at a restaurant: 

VUI: “Got it, how many people would you like to book for?”

User: “Me and my mate Jess”

In this case, the VUI is expecting a numeric response. It would be almost impossible to account for all the possible responses in the code; however, we could still keep the conversation co-operative:

VUI: “Sorry, how many was that?”

Here we are prompting the user for a numeric answer in an organic way. We want to avoid something like this:

VUI: “Sorry, I don’t understand. To book a table, you must give me a number. For example, you could say 2, or 2 people.”

The response here is way too verbose, overly formal and is explicitly telling the user to use a specific command to communicate. This will lead to the user losing confidence in the app and remind them that they are talking to a robot. 

As a side note, words like ‘sorry’ are used in error messages for this exact reason. The VUI isn’t genuinely feeling apologetic in this instance, but the user will understand the superficial meaning of the word in this context: ‘sorry’ signals that something hasn’t gone quite right, setting the user up for some error recovery.
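The booking example can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the `NUMBER_WORDS` vocabulary and the `parse_party_size` and `respond` functions are hypothetical, and a real VUI would lean on its platform’s language understanding rather than a hand-rolled word list.

```python
import re
from typing import Optional

# Small, hypothetical vocabulary of number words (illustrative only).
NUMBER_WORDS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
}


def parse_party_size(utterance: str) -> Optional[int]:
    """Return the first number found in the utterance, or None if there is none."""
    # Split into whole words and digit runs so "someone" is not read as "one".
    for token in re.findall(r"[a-z]+|\d+", utterance.lower()):
        if token.isdigit():
            return int(token)
        if token in NUMBER_WORDS:
            return NUMBER_WORDS[token]
    return None


def respond(utterance: str) -> str:
    size = parse_party_size(utterance)
    if size is None:
        # Short, natural reprompt rather than a lecture on valid commands.
        return "Sorry, how many was that?"
    return f"Got it, a table for {size}."


print(respond("Me and my mate Jess"))
print(respond("Just the two of us"))
```

When the parser finds no number, the fallback stays brief and conversational. It does not try to enumerate every possible phrasing, and it does not teach the user a command.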

10. Make good use of the Pareto Principle

Finally, it’s worth noting that not all features on the VUI will be used evenly. It’s a good idea to try and recognise early on the key feature of your application that is core to the experience, even if it only represents the smallest part of the application.

Designing 20% of the app to suit 80% of the use cases will leave us in a good place to polish that feature into a delightful, cooperative experience.
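One way to spot that core feature is to count how often each request type turns up in usage logs. The Python sketch below assumes a hypothetical list of intent names, one per user request; the intent names and counts are made up purely for illustration and are not real usage data.

```python
from collections import Counter
from typing import Iterable, List


def core_intents(intent_log: Iterable[str], coverage: float = 0.8) -> List[str]:
    """Return the most-used intents that together reach the coverage share."""
    counts = Counter(intent_log)
    total = sum(counts.values())
    covered = 0
    core = []
    # Walk intents from most to least used until enough requests are covered.
    for intent, n in counts.most_common():
        if covered >= coverage * total:
            break
        core.append(intent)
        covered += n
    return core


# Made-up log: 8 bookings, 1 opening-hours query, 1 cancellation.
log = ["book_table"] * 8 + ["opening_hours"] + ["cancel_booking"]
print(core_intents(log))
```

In this made-up log a single intent accounts for 80% of requests, so that is the feature to polish first.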

Want to be part of the voice revolution? We’ll happily put you in touch with the team at twentysix – get in touch.
