A couple days ago, I wrote about a research paper comparing broad vs. deep menu designs for a phone menu. One quirk of the research was the uniquely user-centered design process which led to some rather (ahem) unique decisions.
The application was a system for browsing e-mail over the phone, and had eleven different functions for dealing with each message: Next, Previous, Repeat, Reply, Reply to All, Forward, Delete, List Recipients, Add Sender (to address book), Mark Unread, and Time and Date. The researchers were interested in comparing the usability of giving the caller all eleven options in a single menu (a "broad" design) vs. dividing the options into submenus (a "deep" design).
The menu structure they came up with for the "deep" design was:
Listen to Messages: Next, Previous, Repeat
Respond: Reply, Reply to All, Forward, Delete
Distribution: List Recipients, Add Sender
Message Details: Mark Unread, Time and Date
So if you wanted to go to the next message, you would first have to say, "Listen to Messages" and then choose the "Next" option from that submenu.
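To make that two-step lookup concrete, here is a minimal Python sketch of the "deep" design as a simple mapping from submenu label to functions. The names DEEP_MENU and path_to are mine, for illustration only; nothing here comes from the paper's actual implementation.

```python
# The "deep" design from the paper: submenu label -> functions it contains.
DEEP_MENU = {
    "Listen to Messages": ["Next", "Previous", "Repeat"],
    "Respond": ["Reply", "Reply to All", "Forward", "Delete"],
    "Distribution": ["List Recipients", "Add Sender"],
    "Message Details": ["Mark Unread", "Time and Date"],
}


def path_to(function, menu=DEEP_MENU):
    """Return what a caller has to say, in order, to reach a function."""
    for label, functions in menu.items():
        if function in functions:
            return [label, function]
    raise KeyError(function)


print(path_to("Next"))    # ['Listen to Messages', 'Next']
print(path_to("Delete"))  # ['Respond', 'Delete']
```

Every one of the eleven functions costs two selections in this tree, and the caller has to guess the right label first.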
My immediate reaction to this design was along the lines of, "WTF? Why is 'Delete' in a submenu called 'Respond'? I would never look for the 'Delete' option under the 'Respond' menu, and that's probably one of the most used functions. What sane VUI designer would build this menu tree?"
It turns out that this menu structure wasn't built by any VUI designer (sane or otherwise), but rather through a uniquely user-centric process which illustrates a hazard of blindly applying user data to the design process. Here's what they did:
Step 1: 26 users were asked to organize the eleven functions into logical groupings of five or fewer functions. Each user's groupings were analyzed, and an aggregated grouping was generated with the following groups:
1. Delete, Forward, Reply, Reply to All
2. Repeat, Next, Previous
3. Mark Unread, Time and Date
4. List Recipients, Add Sender
So far so good, though there's no reason why some functions (especially heavily-used ones like "delete") can't be included in multiple groups.
Step 2: 101 users (not the same ones as in Step 1) were given the four groups from Step 1 and asked to suggest a label for each group. The responses were compiled, and the researchers identified the most common suggestions for each group:
1. Delete, Forward, Reply, Reply to All: "Action" (volunteered 22 times), "Respond" (volunteered 15 times)
2. Repeat, Next, Previous: "Navigate" (volunteered 22 times), "Listen to Messages" (volunteered 15 times)
3. Mark Unread, Time and Date: "Message Details" (volunteered 10 times), "Status" (volunteered 10 times), "Miscellaneous" (volunteered 5 times), "Options" (volunteered 5 times)
4. List Recipients, Add Sender: "Address Book" (volunteered 9 times), "Distribution" (volunteered 9 times)
Here we start to see the beginnings of trouble. First, while the paper's authors don't disclose the exact wording of the instructions to the survey participants, it appears that they asked participants to "label" the group; in other words, offer a short description. However, that's the opposite of what a user of the application needs to do: a user needs to take a set of labels and guess which label contains a given function, rather than take a group of functions and describe them with a label.
As descriptive terms for the groups, the labels are fine. As guideposts to the functions contained in each group, many of the labels fall short.
The other problem is that none of the labels were volunteered by more than one in four participants. This should have been a red flag that the labeling isn't obvious, has no user consensus, and needs to be treated with some care. There's no evidence that the paper's authors did anything more than accept the survey results at face value.
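The "one in four" figure is easy to check from the Step 2 counts. The sketch below assumes each count represents one distinct participant out of the 101 surveyed (the paper's summary doesn't say otherwise); the variable names are illustrative.

```python
RESPONDENTS = 101  # Step 2 participants

# Times each label was volunteered in Step 2.
volunteered = {
    "Action": 22, "Respond": 15,
    "Navigate": 22, "Listen to Messages": 15,
    "Message Details": 10, "Status": 10, "Miscellaneous": 5, "Options": 5,
    "Address Book": 9, "Distribution": 9,
}

# Share of participants who volunteered each label (assumes one vote each).
shares = {label: count / RESPONDENTS for label, count in volunteered.items()}

for label, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{label:20s} {share:.1%}")

# Even the most popular suggestions fall short of 25%.
print(max(shares.values()) < 0.25)  # True
```

The best-supported labels ("Action" and "Navigate") come in at 22 of 101, or about 21.8%.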
Step 3: 155 users (not the same as in Step 1 or Step 2) were given the most common labels for each group, and asked to choose the most appropriate label for each group. The most popular label was used in the application:
1. Delete, Forward, Reply, Reply to All: "Respond" (66%) beat "Action" (34%)
2. Repeat, Next, Previous: "Listen to Messages" (83%) beat "Navigate" (17%)
3. Mark Unread, Time and Date: "Message Details" (40%) beat "Status" (25%), "Options" (26%) and "Miscellaneous" (10%)
4. List Recipients, Add Sender: "Distribution" (64%) beat "Address Book" (36%)
This is where "Delete" managed to get into a menu called "Respond": because it was grouped with two other functions which are variations on "Reply," most survey respondents thought "Respond" was a more "appropriate" label for the group than "Action."
Just as in Step 2, part of the problem is that the survey participants were asked to do the wrong thing: they were asked to choose the "most appropriate" label, not which label they thought a given function belonged under. The end result is perverse, even though none of the steps sound (on the surface) unreasonable.
User Centric Design
The lesson we take from this is that while user input and surveys are critical to the design process, they cannot replace it. Surveys are valuable tools, but they are not perfect, and they require some intelligence and interpretation. Most importantly, they require a healthy skepticism.
In this situation, while I applaud the effort to gather lots of data about user preferences, nobody ever stopped and asked questions like "Does this result make sense?" "Did we apply the survey properly?" and "Did we ask the right questions?"
There were also some artificial constraints applied to the design, such as requiring every function to belong to exactly one submenu. A human designer would probably have elevated the most common functions to top-level status and pushed others into logical subgroups, maybe something like this:
Respond: Reply, Reply to All, Forward
Info: Add Sender, List Recipients, Mark Unread, Time and Date
(This particular grouping mildly violates the "no more than five options" limit at the top menu, but only mildly, since "Next" and "Previous" aren't offered when listening to the last or first message. A professional could probably dream up something equally functional which doesn't violate the "five options" rule.)
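Here is a rough Python sketch of how that alternative compares on selections per function. It rests on an assumption: the four functions that don't appear in the two submenus above (Next, Previous, Repeat, Delete) sit directly in the top menu, which the listing implies but doesn't spell out. TOP_LEVEL, SUBMENUS and choices_needed are illustrative names.

```python
# Assumption: functions not listed under a submenu are offered at the top level.
TOP_LEVEL = ["Next", "Previous", "Repeat", "Delete"]
SUBMENUS = {
    "Respond": ["Reply", "Reply to All", "Forward"],
    "Info": ["Add Sender", "List Recipients", "Mark Unread", "Time and Date"],
}


def choices_needed(function):
    """Number of menu selections a caller makes to reach a function."""
    if function in TOP_LEVEL:
        return 1
    for functions in SUBMENUS.values():
        if function in functions:
            return 2
    raise KeyError(function)


# Top menu = top-level functions plus the submenu labels themselves.
top_menu_size = len(TOP_LEVEL) + len(SUBMENUS)
print(top_menu_size)              # 6
print(choices_needed("Delete"))   # 1
print(choices_needed("Forward"))  # 2
```

Under that assumption the top menu has six entries, one over the limit, and "Delete" drops from two selections to one.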
It would be interesting to know how the broad vs. deep applications would have compared with a more reasonable "deep" design. Unfortunately, we may never know the answer, since that would require redoing much of the work behind this paper.