Choosing the Right Test Method
By Peter Leppik
"You can't manage what you can't measure," and measuring how well a customer service operation meets callers' needs can be difficult. Measuring performance is a critical component of any plan to improve customer service. The wrong test method can be worse than not testing at all, since it can lead to misleading data and a false sense that everything is working well when it isn't.
The first thing to decide is what you want to measure. Everything from capacity under load to caller preferences for recorded voices can be accurately measured, but not always with the same test. Some methods produce reproducible statistics which can be used to compare different systems, while others produce only qualitative data, useful for a general sense of system quality, but not for comparison purposes.
Contacting Past Customers
Contacting past customers, or doing a follow-up survey, is a common method for quality monitoring in call centers. Either during or shortly after a call, a caller is asked if he or she wants to participate in a survey. The survey can be either automated (through an IVR system) or administered by a trained agent.
This can be a powerful method for collecting customer satisfaction data, but it has some serious drawbacks. The most important is the difficulty of getting enough caller responses for a statistically meaningful sample. We believe 500 survey responses should be a minimum, but given the expense and difficulty of getting callers to participate, many companies stop at around 100. Statistically, a sample of 100 responses has a margin of error of up to roughly ten percentage points in either direction at 95% confidence, so it can't reliably distinguish satisfaction levels that differ by less than that.
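To see how sample size limits precision, here is a minimal sketch of the standard margin-of-error calculation for a survey percentage. It assumes a simple random sample and the normal approximation at 95% confidence. The function name and the 50% worst-case satisfaction rate are illustrative choices, not part of any particular survey method.

```python
import math

def margin_of_error(satisfaction_rate, responses, z=1.96):
    """Approximate margin of error for a surveyed proportion.

    Assumes a simple random sample and the normal approximation;
    z=1.96 corresponds to 95% confidence.
    """
    return z * math.sqrt(satisfaction_rate * (1 - satisfaction_rate) / responses)

# Illustrative comparison of the two sample sizes discussed above,
# using a 50% rate (the worst case for this formula).
for n in (100, 500):
    moe = margin_of_error(0.5, n)
    print(f"{n} responses: +/- {moe * 100:.1f} percentage points")
```

Under these assumptions, 100 responses give a margin of roughly ten points in either direction, while 500 responses cut it to a little over four.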
Capacity (Load) Testing
Capacity testing uses an automated dialer to place hundreds or thousands of simultaneous calls to an automated system, to verify that the system can handle the expected call volume.
Load testing is a critical part of rolling out any new application. It is best performed when the application is close to its final form, after usability and caller preference testing is complete, because significant later changes, especially in a speech recognition application, can affect the system's ability to handle the expected load.
"Wizard of Oz" Testing
"Wizard of Oz" testing is common with speech recognition applications in the early stages of design. This involves using a human to play the part of the speech recognition computer, as a way of testing design prototypes before any actual programming is done.
Wizard of Oz testing can provide good qualitative data to guide refinement of the application design, but it cannot provide statistical data, given the very small number of callers typically used (generally 25 or fewer). A Wizard of Oz test will provide ideas for improving an application, but it can't tell you if the new application is better or worse than an earlier version, or how it stacks up against industry norms.
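The limit on comparing versions with so few callers can be illustrated with a pooled two-proportion z-test. This is only a sketch: the task-completion counts below are hypothetical, the function name is our own, and the normal approximation is rough at samples this small.

```python
import math

def two_proportion_p_value(successes_a, n_a, successes_b, n_b):
    """Two-sided p-value for the difference between two proportions,
    using the pooled two-proportion z-test (normal approximation)."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # erfc(|z| / sqrt(2)) is the two-sided tail area of the standard normal.
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical data: 72% vs. 84% of callers complete their task.
small = two_proportion_p_value(18, 25, 21, 25)      # 25 callers per version
large = two_proportion_p_value(360, 500, 420, 500)  # 500 callers per version
print(f"25 callers each:  p = {small:.3f}")
print(f"500 callers each: p = {large:.6f}")
```

With these made-up numbers, the same 12-point difference cannot be distinguished from chance with 25 callers per version, but is clear with 500.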
Technical Benchmarking
Technical benchmarking, or comparing statistics generated by a call center system against published industry norms, is another common technique for evaluating a system. It has the advantage of using data that existing systems already generate, such as average hold times and call abandon rates, so the only additional expense is buying benchmark statistics from a third party.
Unfortunately, these statistics are often promoted as a measure of caller satisfaction, when they are really proxies at best. For example, it is clearly good if the average hold time is shorter than industry norms, but that doesn't mean customers are being well served.
In the worst case, relying too heavily on technical benchmarking can lead to a customer service operation managing to the numbers, rather than managing to customer service. For example, call center agents under pressure to reduce their average call time have been known to abruptly hang up on callers with difficult problems. That certainly will reduce average call length, but at the expense of customer satisfaction.
In addition, while technical benchmarking can help decide if a system is performing poorly, it can't tell you if a replacement system will be any better, since it is only meaningful once a system is rolled out.
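The "managing to the numbers" problem is easy to see with a little arithmetic. The sketch below uses entirely hypothetical call lengths for one agent's shift, and the helper name is our own. It compares the average call length when difficult calls are handled fully against the average when the agent hangs up on them.

```python
from statistics import mean

# Hypothetical call lengths in minutes for one agent's shift.
routine_calls = [4] * 8      # eight routine calls, all resolved
difficult_calls = [15] * 2   # two difficult calls, resolved when handled fully
hung_up_calls = [1] * 2      # the same two calls if the agent hangs up early

def summarize(call_lengths, resolved):
    """Return (average call length, fraction of callers whose problem was resolved)."""
    return mean(call_lengths), resolved / len(call_lengths)

honest = summarize(routine_calls + difficult_calls, resolved=10)
gamed = summarize(routine_calls + hung_up_calls, resolved=8)

print(f"Handled fully: {honest[0]:.1f} min average, {honest[1]:.0%} resolved")
print(f"Hanging up:    {gamed[0]:.1f} min average, {gamed[1]:.0%} resolved")
```

In this made-up example the average call length falls from 6.2 to 3.4 minutes while the share of callers actually served drops from 100% to 80%: the benchmark improves as the service gets worse.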
Focus Groups
Focus groups, intensive interviews with small numbers of customers, are similar in many ways to Wizard of Oz testing. This method can generate a lot of ideas for improvement and qualitative feedback about a new or existing customer service operation, but it can't generate statistically valid or comparative data.
Employee Test Calls
Employee test calls are a very common method for testing new automated systems. This involves having employees call into an application and provide feedback for improvement (often through a survey). Its only advantages are that it is fast and cheap.
Unless the system is intended to be used by employees (for example, an HR hotline), this method can actually be worse than doing no testing at all.
The problem is that employees are a very different group of people from customers. Employees are familiar with the jargon and processes of the company and industry, whereas customers generally are not. We have experience with several companies that successfully tested new applications using employee calls, yet found that the expensive new system completely failed to serve the needs of real customers.
As a result, we strongly recommend that companies not rely on employee calls to test a new system.
Pilot Testing
Pilot testing, or rolling out a new application to a limited number of live customer calls, can be a good way to understand application performance in the real world, but unfortunately, it comes too late in the project to make major design changes.
In addition, unless a concerted effort is made to follow up with callers, there is no way to gather satisfaction data with this method. We recommend that pilot testing be viewed more as a roll-out strategy than a testing strategy, and that other methods be used earlier in the development process.
VocaLabs Survey
A VocaLabs survey has a number of important advantages, including a large number of live callers, the ability to directly compare live and automated operations, the ability to test systems at any stage of development, the ability to gather both quantitative and qualitative data, and the ability to benchmark against industry norms.
Often, a VocaLabs survey is comparable in cost and turnaround time to other test methods, even ones which have serious flaws. While we admit to being biased, our survey method was designed from the ground up to address many of the limitations in other techniques, and we believe we have succeeded in developing a more cost-effective and meaningful way to evaluate customer service operations than any other method.
We have included a chart in a PDF file which summarizes the strengths and weaknesses of each of these test methods. While no test can do everything, it is important to make sure the method you use is appropriate for what you are trying to measure.