1530.0 - ABS Forms Design Standards Manual, 2010  
ARCHIVED ISSUE Released at 11:30 AM (CANBERRA TIME) 25/01/2010  First Issue

INTERACTIVE VOICE RESPONSE

While aspects of these standards will be of interest to those outside the ABS, they were developed for internal use. As such, some information contained in these standards will not be applicable to an external audience. ABS staff should refer to the Corporate Manuals database for the most recent version of these documents, as some details (names, phone numbers etc.) have been removed from the online version.

Introduction

Interactive Voice Response (IVR) is a computerised telephony system that has the ability to interact on a basic level with telephone users. A prerecorded voice lists a series of options and prompts. The user chooses their desired option by either pressing a number on a telephone keypad, or by speaking a simple answer that the system is programmed to recognise. IVR is commonly used by companies such as banks, airlines and cinemas, to deal with common or basic customer requests (e.g. requests for account balances, flight times, or movie sessions). IVR is also popular with tax authorities in the US, allowing simple tax returns to be filed by telephone (Electronic Commerce Best Practices, 2000).

There are two main types of IVR: Telephone Data Entry (TDE) and Automated Speech Recognition (ASR). These vary in terms of how the user inputs their response.

In a TDE system, the user is generally asked a question, and then provided with a series of options that correspond to the keys on a "dual-tone multi-frequency" (DTMF) telephone keypad (see Figure 1). Most telephones are limited to 12 keys, and therefore produce 12 signals that the system can recognise (0-9, * and #). The user responds to the questions by pressing the relevant key on the keypad; the corresponding tone then inputs the response to the system. TDE can also be used to enter letters as input (see paragraph 73); however, entering information this way can be tedious, so its use should be minimised. Telephone Data Entry is sometimes referred to as "Touchtone Data Entry", and when used in a survey data collection context, "Telephone Audio Computer-Assisted Self-Interviewing (T-ACASI)".

    1         2 ABC     3 DEF
    4 GHI     5 JKL     6 MNO
    7 PQRS    8 TUV     9 WXYZ
    *         0         #

Figure 1 - Layout of a standard DTMF telephone keypad

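The keypad in Figure 1 can be modelled in a few lines, which shows why TDE is limited to 12 signals and why letter entry is tedious. This is a minimal sketch: the frequency pairs follow the standard 4 x 3 DTMF layout (one low "row" tone plus one high "column" tone per key), and the "multi-tap" letter scheme (press a key once for its first letter, twice for its second, and so on) is an assumption, since the standards do not say which letter-entry method an ABS system would use.

```python
# Sketch of the standard DTMF keypad shown in Figure 1.
# Each key sends one tone made of a low (row) and a high (column) frequency.

KEYPAD_ROWS = [
    ["1", "2", "3"],
    ["4", "5", "6"],
    ["7", "8", "9"],
    ["*", "0", "#"],
]
ROW_HZ = [697, 770, 852, 941]   # low-group frequency for each row
COLUMN_HZ = [1209, 1336, 1477]  # high-group frequency for each column

# Letter groups printed on the keys (Figure 1)
KEY_LETTERS = {
    "2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
    "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ",
}


def dtmf_frequencies(key: str) -> tuple:
    """Return the (low, high) frequency pair in Hz for one keypad key."""
    for r, row in enumerate(KEYPAD_ROWS):
        if key in row:
            return ROW_HZ[r], COLUMN_HZ[row.index(key)]
    raise ValueError(f"not a DTMF key: {key!r}")


def multitap(word: str) -> list:
    """Encode a word as multi-tap key presses (assumed scheme), e.g. 'B' -> '22'.

    One letter can need up to four presses, which is why letter entry
    via the keypad should be kept to a minimum.
    """
    presses = []
    for letter in word.upper():
        for key, letters in KEY_LETTERS.items():
            if letter in letters:
                presses.append(key * (letters.index(letter) + 1))
                break
        else:  # no key carries this letter
            raise ValueError(f"no key for {letter!r}")
    return presses


print(dtmf_frequencies("5"))  # (770, 1336)
print(multitap("ABS"))        # ['2', '22', '7777']
```

Spelling the three letters "ABS" already takes seven key presses, which illustrates the cost of letter entry compared with numeric answers.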
In an ASR system, the user responds to the system's options by saying a word (or words) that the system has been programmed to recognise. The system may repeat the user's responses so that the user can confirm or amend them (Schneider, Cantor, Heller & Brick, 2002). ASR is sometimes referred to as "Guided Speech IVR", "Natural Language IVR" (NLIVR), "Voice User Interface" (VUI) and "Voice Recognition Entry" (VRE).

Both TDE and ASR can be used in a single IVR system - this is known as a "multi-mode" application (Standards Australia, 2005). For example, the user may be given the option to either say "Help", or to press 0 for help. Allowing choice in the mode of input can be desirable; for example, users may feel more comfortable inputting private information such as passwords using the keypad rather than verbally, and keypad entry can be a useful alternative if the user has encountered difficulty or errors using the ASR system (Standards Australia, 2005). In addition, as TDE systems can only be used by users who have a DTMF telephone, providing an ASR alternative allows a wider range of users to use the system.
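In a multi-mode application, one way to keep the two input modes consistent is to map both a keypress and a recognised word onto the same internal command. The sketch below is hypothetical: the `InputEvent` format and the command tables are assumptions for illustration, with "Help" / 0 taken from the example above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InputEvent:
    mode: str   # "dtmf" (keypad) or "speech" (ASR); assumed event format
    value: str  # the key pressed, or the word returned by the recogniser


# Hypothetical command tables: the same "help" command is reachable by
# pressing 0 or by saying "Help", as in the example in the text.
DTMF_COMMANDS = {"0": "help"}
SPEECH_COMMANDS = {"help": "help"}


def to_command(event: InputEvent) -> Optional[str]:
    """Map a keypress or a recognised word to a single internal command.

    Returns None when the input is not a recognised command, so the caller
    can re-prompt (for example, suggesting the keypad after a speech error).
    """
    if event.mode == "dtmf":
        return DTMF_COMMANDS.get(event.value)
    if event.mode == "speech":
        return SPEECH_COMMANDS.get(event.value.strip().lower())
    return None


print(to_command(InputEvent("dtmf", "0")))       # help
print(to_command(InputEvent("speech", "Help")))  # help
```

Because the rest of the system only ever sees the command, a user who has trouble with speech recognition can switch to the keypad without the dialogue logic changing.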

The potential use of IVR in survey data collection and related survey activities is being increasingly recognised. However, the literature on this aspect of its use is limited, and to date, there has generally been more interest in TDE as a mode of data collection compared with ASR.
This document

This document provides guidelines on designing an effective, user-friendly IVR system for survey-related activities at the ABS. Design standards are crucial, as most problems with IVR systems are the result of poor planning of menu structure and wording, rather than being due to technical failures (Dortmans & Angus, 2001). Systems that are not logical and intuitive will cause user frustration and a reluctance to use the system.

It is important that, from the respondent's perspective, the IVR system is as consistent as possible across ABS collections. This means ensuring factors like terminology, menu structure, voice, language, and error messages are implemented consistently (these factors are discussed below). A consistent, predictable system will lead to faster learning, greater productivity, fewer errors, and greater satisfaction (Standards Australia/Standards New Zealand, 2003).

This document focuses on what the user hears while using the IVR system (the system output) and on considerations for system input. Technical issues such as hardware and software requirements are outside the scope of this document.

These guidelines are based on international research and practice, and include advice from the Australian and New Zealand IVR design standards (Standards Australia, 2005; Standards Australia/Standards New Zealand, 2003). Note that at the time of finalising this document the Australian standards on ASR were still in draft form and may be subject to change. The standards are available online (see reference list for details).

The IVR standards that are currently available in the literature are generally intended for "traditional" IVR systems (i.e. those that provide a service to the user, such as providing information or routing them to an appropriate party). Not all of the recommendations in such standards are likely to be appropriate for IVRs for data collection; however, some standards are universally encouraged (Mundt, Searles, Perrine & Walter, 1997), and relevant guidelines have been included here.

These standards were produced before IVR systems were developed at the ABS. They are likely to be updated after pilot testing to include more specific guidelines, so that IVR is applied consistently across the ABS collections that use this technology.
User identification in inbound and outbound systems

In a survey context, one distinction between types of IVR is whether the user or the survey organisation initiates the telephone call. In "inbound IVR", the user dials directly into the IVR system (after being recruited to the survey through some other means). In "outbound IVR", the survey organisation contacts the user. Usually an interviewer makes the initial recruitment call, then switches the user to an automated system after asking some initial questions and providing instructions on using the system (Couper, Singer & Tourangeau, 2004). However, outbound calls may be made with no operator intervention; that is, a pre-recorded message may provide the information that would otherwise be provided by the interviewer. This latter type of outbound call may be applicable for pre-recorded Intensive Follow-Up (IFU).

Whether it is an inbound or outbound system, it is important that users are identified properly at the start of the call. Users of inbound systems are generally prompted at the start of the call to enter a unique reference number. In outbound calls, the interviewer may confirm the respondent's identity through standard interviewing questions, similar to those used in Computer Assisted Telephone Interviews (CATIs). The "Introducing users to the system" section below contains more details on identifying users.

Users should not be required to identify themselves more than once in a call (Dortmans & Angus, 2001). This also applies when calls are transferred from one location to another (e.g. when being transferred from the IVR system to an operator; Standards Australia/Standards New Zealand, 2003). Repeated requests for the same information are extremely frustrating for users, and represent "the single biggest barrier to the perception of efficiency" of the system (Cohen, Giangola & Balogh, 2004, p. 206). Perceived inefficiency can lead to a greater number of hang-ups, longer call durations, decreased user satisfaction, and a reduced level of user attention (Cohen et al., 2004).
Possible applications of IVR in the ABS

This section outlines the potential uses of IVR at the ABS. Much of this section is based on knowledge about how IVR has been used in international organisations, such as the ONS (Lewis, 2006). TDE has been used more commonly in survey organisations than ASR. It is unrealistic to use ASR to recognise unrestricted speech for survey data collection; however, a mixed TDE/ASR system that can recognise a limited vocabulary may be appropriate (Lewis, 2006).
Directing queries

IVR can be used as the first point of contact for all callers, to filter and direct incoming calls from respondents based on the option the respondent selects. The PCU currently uses a sophisticated telephony system which performs functions such as routing calls to interviewers (based on factors such as interviewer skill level), and providing recorded updates to respondents waiting for an interviewer. There are also particular "1800 freecall" telephone numbers for respondents in different surveys to ring for assistance. The phone number (which appears on the front page of the form) identifies the incoming caller as being part of a particular survey, and automatically routes them to an appropriate interviewer. This functionality could be incorporated into an IVR system: once the relevant survey has been identified from the phone number called, additional options for that survey (such as those listed below) can be offered to the respondent.
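The routing idea above can be sketched as two lookups: dialled number to survey, then survey to the options offered. Everything here is hypothetical; the numbers are placeholders, the survey names and option lists are illustrative, and a real system would take this configuration from the existing telephony platform.

```python
# Placeholder numbers and survey names for illustration only.
SURVEY_BY_DIALLED_NUMBER = {
    "1800 000 001": "Survey A",
    "1800 000 002": "Survey B",
}

OPTIONS_BY_SURVEY = {
    "Survey A": ["Request a duplicate form", "Register a nil response"],
    "Survey B": ["Request a duplicate form", "Request an extension", "Submit data"],
}

OPERATOR_OPTION = "Speak to an ABS staff member"


def route_call(dialled_number):
    """Identify the survey from the number dialled and list its options.

    The operator option is always offered; an unrecognised number falls
    back to the operator only.
    """
    survey = SURVEY_BY_DIALLED_NUMBER.get(dialled_number)
    if survey is None:
        return None, [OPERATOR_OPTION]
    return survey, OPTIONS_BY_SURVEY[survey] + [OPERATOR_OPTION]


survey, options = route_call("1800 000 001")
print(survey)   # Survey A
print(options)
```

Since the survey is known from the number dialled, the caller never has to state it, and each survey can offer only the options that apply to it.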
Requests for duplicate forms

For surveys where duplicate forms are not provided in the initial dispatch or as part of the reminder process, respondents can use the IVR system to request a duplicate form.
Requests for extensions

Respondents who are unable to return their form by the due date, or who receive reminder letters after not returning their form on time, can use the IVR system to request an extension to the due date. This option should be considered carefully, as in practice it may prove difficult to implement. There are several reasons for this, including:
  • The rules that govern whether extensions are granted to respondents differ across surveys;
  • Judgement is required as to whether an extension is warranted for an individual, based on their individual circumstances; and
  • A large number of callers may decide to take the option of an extension if it is offered, even if they were initially calling for a different reason.

Registering a nil response

Respondents who wish to register a nil response to a survey can do so via the IVR system, by ringing the freecall number, keying their reference number, and confirming that they wish to register a nil response. This option can be provided as a stand-alone option for surveys where a high number of nil responses is common (e.g. CapEx); that is, the option of submitting actual data via IVR may not need to be offered for these surveys.
Data collection

Respondents can be offered the option to submit their survey data via IVR. Longitudinal surveys in which the same respondents participate on a regular basis, and which have a small number of simple data items, are ideal. Long, complex surveys with numerous explanatory notes and definitions are unsuitable. Attempting to administer a complex survey via IVR would place excessive cognitive demands on respondents, as they would have to remember not only each question, but the permitted responses, and associated explanatory notes, definitions etc. This could lead to errors, frustration, and the provision of poor quality data. As a general "rule of thumb", an IVR transaction should be able to be completed in ten minutes or less (Electronic Commerce Best Practices, 2000).
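The ten-minute rule of thumb can be checked during design by adding up estimated prompt and response times. This is a rough planning sketch; the per-question timings and the 60-second allowance for the introduction and identification are assumed values, which would in practice come from timing recorded prompts in pre-testing.

```python
MAX_CALL_SECONDS = 10 * 60  # rule of thumb: ten minutes or less


def estimated_call_seconds(steps, overhead_seconds=60):
    """Estimate total call time.

    steps: list of (prompt_seconds, response_seconds) pairs, one per question.
    overhead_seconds: introduction, identification and closing (assumed value).
    """
    return overhead_seconds + sum(prompt + response for prompt, response in steps)


def within_rule_of_thumb(steps, overhead_seconds=60):
    return estimated_call_seconds(steps, overhead_seconds) <= MAX_CALL_SECONDS


short_survey = [(20, 10), (25, 10)]  # two simple data items
long_survey = [(20, 10)] * 30        # thirty questions of the same length
print(estimated_call_seconds(short_survey), within_rule_of_thumb(short_survey))  # 125 True
print(estimated_call_seconds(long_survey), within_rule_of_thumb(long_survey))    # 960 False
```

The second case shows how quickly a longer questionnaire exceeds the limit, even before allowing for help requests or corrections.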

Submitting data via IVR should never be mandatory; the paper form should always be provided as an alternative, as there will always be some respondents who do not want to submit data via IVR (Lewis, 2006). Given the Multi Modal Data Collection (MMDC) and Standard Business Reporting (SBR) projects, the ABS may end up offering a range of reporting options (e.g. paper, email, fax, IVR, CATI), and let respondents choose which mode they would prefer for a particular collection.

An option is to allow automatic validation of data against previous data, whereby if the respondent provides data that differs greatly from that previously provided, the system asks them to explain this via a recorded message. The message can be played back by survey processing staff at a later time, reducing the need for staff to follow up respondents with queries about their data. However, to avoid introducing a modal bias, this should only be provided as an option if an interviewer or editor would query the respondent later anyway.
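A minimal sketch of such a check is shown below. The 50% tolerance is an assumed example only; in line with the caveat above, the real tolerance should mirror whatever edit rule would already cause an interviewer or editor to query the value.

```python
def needs_explanation(current, previous, tolerance=0.5):
    """Return True if `current` differs from `previous` by more than
    `tolerance`, expressed as a proportion of the previous value.

    When True, the IVR would ask the respondent to record an explanation
    for later review by processing staff.
    """
    if previous is None:  # no earlier data: nothing to compare against
        return False
    if previous == 0:     # any change away from zero is flagged
        return current != 0
    return abs(current - previous) / abs(previous) > tolerance


print(needs_explanation(120, 100))  # False (a 20% change)
print(needs_explanation(200, 100))  # True (a 100% change)
```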
User requirements

Depending on which of the above activities the IVR system is being used for, some or all of the following should be dispatched to respondents:
  • a paper form or telephone reporting card (if survey data is to be collected)
  • standard instructions on completing the form
  • information about the option to use IVR
  • instructions on how to use the IVR system
  • the freecall telephone number to ring to access the system
  • the respondent's unique reference number (see section below for more details on reference numbers)
  • details of the data required

Even if respondents are provided with written instructions, they may not read them before attempting to use the system. It is therefore important that the system is self-explanatory, so that an inexperienced user can get all the help they may require during the call. Related to this, an option is to provide new IVR respondents with a practice ID number that allows them to try out the system before providing real data (Clayton & Winter, 1992).
Introducing users to the system

The introductory information that users hear when accessing the system is important, as it helps set their expectations for the system (Schneider et al., 2002). It is important to welcome the user to the system and put them at ease through clear instructions. The introduction should explain the purpose of the call; approximately how long it will take; and how to obtain further information or help.
User identification

Users of inbound IVR systems should be provided with a unique reference number in the survey material they receive (see "User requirements" section above). They should use this number to identify themselves when they call the IVR system. Stakeholder discussion is required to determine an appropriate method of allocating unique reference numbers. One option may be to identify each respondent with a number that is a combination of their unit ID and their telephone number. However, if a respondent is calling about more than one survey, this method may not work, so discussion is required to determine an appropriate solution for uniquely identifying respondents.

As mentioned above, it is important not to ask the user to provide the same information more than once in a call; therefore, once they have identified themselves, they should not be required to do this again within the IVR session.
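One way to guarantee that users identify themselves only once is to store the verified reference number on the call session and hand it over with any transfer. The sketch below is hypothetical: `CallSession`, the `ask` callable and the hand-over dictionary are illustrative assumptions, and the reference number format is deliberately left open because it is still subject to stakeholder discussion.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CallSession:
    reference_number: Optional[str] = None

    def identify(self, ask: Callable[[str], str]) -> str:
        """Return the caller's reference number, asking only if not yet known.

        `ask` is a hypothetical callable that plays a prompt and returns
        the digits the caller entered.
        """
        if self.reference_number is None:
            self.reference_number = ask("Please enter your reference number")
        return self.reference_number

    def transfer_to_operator(self) -> dict:
        """Pass the identity already collected to the operator
        (assumed hand-over format), so the caller is not asked again."""
        return {"reference_number": self.reference_number}


prompts = []

def fake_ask(prompt):
    prompts.append(prompt)
    return "12345678"

session = CallSession()
session.identify(fake_ask)
session.identify(fake_ask)             # already identified: no second prompt
print(len(prompts))                    # 1
print(session.transfer_to_operator())  # {'reference_number': '12345678'}
```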
Explaining the technology

While the recorded voice should be welcoming, it should be made clear to users that they are interacting with a computer and not a human (Noonan, 2006; Harris, 2005). Failing to do so will lead to "... assumptions of capabilities beyond what the system can fulfil" (Cohen et al., 2004, p. 210). This is particularly important for ASR systems, where the belief that a human is on the other end of the phone may lead to requests that are too long and complicated for the system to recognise.

Whether or not to use personalising language, such as the first person, is not clear-cut in the literature (e.g. Schumacher, Hardzinski & Schwartz, 1995; Harris, 2005), and will depend in part on the type of IVR system being implemented. For example, a conversational approach, where personalising language is used freely, may be suited to IVR systems that provide information on restaurants, social events etc. A non-personalising approach, in which terms like "I", "me" etc. are avoided, may be more appropriate for ABS IVR systems. Non-personalising language may also be more effective when the survey questions are of a sensitive nature (Tourangeau, Couper & Steiger, 2003).

Examples of personalising vs non-personalising language:
Personalising: "Please tell me your reference number"
Non-personalising: "Please say your reference number"

Personalising: "I will now read out a list of five possible responses"
Non-personalising: "Please listen to the following list of five possible responses"

Instructions for navigating and obtaining help should be explained at the beginning of the call (e.g. "To access further information at any time, say 'Help', or press 1"). These options should be available throughout the entire call, and should always be accessible using the same command (Noonan, 2006). See the "Help" section below for more detailed information on help in IVR systems.

It is important that the user has the option at all times to speak to an operator (during normal business hours), and that this is made clear at the start of the call. If they are calling outside normal business hours, speaking to an operator may not be possible; again, this should be made clear up front. In this case, they should be given the option to leave a message and have someone call them back during business hours. (In the future, respondents may also be able to leave an email address for an email response).
Example introductions

The type of introduction that is appropriate will depend on whether it is an inbound or outbound call.
Example introduction for inbound system
    Hello. Welcome to the Australian Bureau of Statistics' automated telephone system.
    Your x-digit reference number is located at the top right hand side of the letter you received. At the tone, please enter your reference number, using the numbers on your telephone keypad. [Tone]
    Thank you.
    ** Our hours of business are 9-5 EST. During these hours, you may speak to an ABS staff member at any time by pressing zero. Outside these hours you may leave a message by pressing zero.
    The letter you received contains information on how to use the system, such as how to get help.
    To hear this information, press 1. To skip this step and begin entering data, press 2.
    [If 1 pressed]
    To access help at any time during the call, please press 1.
    To repeat the current question, press star.
    To correct an entry you have made, press hash.
    To start the system over again, press 2.
    To speak to an ABS staff member, press zero.
    To submit your data and end the call, press 3 or simply hang up.
    [If 2 pressed]
    Begin survey questions

Example introduction for outbound system
    Interviewer:
    Hello, this is XX from the Australian Bureau of Statistics. I'm calling about the XX Survey. Could I speak to [name] please?
    [If a survey form or telephone reporting card was received]: Check for survey ID
    [If no survey form or telephone reporting card was received]: Check for ABN
    As explained in the letter you received, data for this survey can now be reported via our automated telephone system. A recorded voice will ask the questions, which you can respond to by pressing the relevant numbers on your telephone keypad. There are only two data items to report [briefly describe data items], and the call should only take about two minutes.
    If you need assistance at any time during the call, press 1.
    Do you have any questions before I transfer you to the system?
    [Revert to ** in inbound example above]

Compared to other interviewing techniques, it is particularly easy for users to abandon IVR interviews prematurely, as they can simply hang up the phone without risking offending anyone. Leaving the system before completing the interview is known as "break-off", and can be a particular problem for outbound systems, where the user may leave the system after the interviewer switches them to the IVR system (Couper et al., 2004). The system should be set up so that if the user ends the call before completing the session, all data provided up to that point is saved (see paragraph 82). Having the interviewer ask a few basic questions (e.g. demographic information such as age and educational attainment, for household surveys) before switching the user may reduce the number of break-offs (Tourangeau et al., 2003).
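Saving each answer as soon as it is captured, rather than at the end of the call, is one straightforward way to make sure a break-off loses nothing already entered. In this minimal sketch an in-memory dictionary stands in for whatever persistent store a real system would use (an assumption), and `get_answer` returning None represents the caller hanging up.

```python
class ResponseStore:
    """Stand-in for a persistent store (assumed); keyed by reference number."""

    def __init__(self):
        self._saved = {}  # reference number -> {question id: answer}

    def save_answer(self, reference, question_id, answer):
        self._saved.setdefault(reference, {})[question_id] = answer

    def saved(self, reference):
        return dict(self._saved.get(reference, {}))


def run_interview(store, reference, questions, get_answer):
    """Ask each question in turn; get_answer returns None on hang-up.

    Returns True if the interview was completed, False on break-off.
    """
    for qid in questions:
        answer = get_answer(qid)
        if answer is None:  # break-off: stop, earlier answers are already kept
            return False
        store.save_answer(reference, qid, answer)  # save immediately
    return True


store = ResponseStore()
replies = {"q1": "12", "q2": None}  # caller hangs up at q2
completed = run_interview(store, "R001", ["q1", "q2", "q3"], replies.get)
print(completed)            # False
print(store.saved("R001"))  # {'q1': '12'}
```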

The IVR "voice"

Overview

Deciding on the voice that users will hear when they use the IVR system is one of the most important decisions to make when designing the system (Noonan, 2006; Harris, 2005). The voice is one of the most easily identifiable features of the system, and voice characteristics can have a great impact on users' experience of the system. The appropriate voice depends on the type of application and the organisation it represents; testing beforehand should uncover which voice users prefer (Harris, 2005). Whatever type of voice is used, it is important for consistency that the same voice is used throughout the entire IVR system (Dortmans & Angus, 2001; but see Harris, 2005). This section contains some factors to consider when choosing an appropriate voice.
Digitised or synthesised

IVR voices can be either "digitised" or "synthesised". A digitised voice is a recording of a live human voice that has been converted to a digital format. Depending on factors such as the quality of the recording devices used, the final voice may sound more or less like the original. Most IVR applications use digitised speech. A synthesised voice is computer-generated, usually by a text-to-speech (TTS) system (Couper et al., 2004). Typically, tiny segments of real human speech are pre-recorded, and these segments are later concatenated into sequences and played to the user (Cohen et al., 2004). Synthesised speech is becoming more popular due to its flexibility and cost-saving potential (Couper et al., 2004). It is particularly useful and economical when the information presented to users is lengthy or frequently updated (i.e. where pre-recording is too expensive) (Standards Australia, 2005).

Synthesised speech varies in how "human-like" or "machine-like" it sounds; however, newer synthesised systems are sounding more natural and less mechanical (Couper, 2002). Users tend to prefer the more realistic sound of digitised speech (Harris, 2005; Couper, 2004), and non-native speakers and elderly users in particular can find synthesised output difficult to comprehend (Cohen et al., 2004). However, comprehension of synthesised speech "... may improve with exposure to and experience with the system" (Phipps & Tupek, 1991, p. 5), so this type of speech may be appropriate for repeat users of the system.

Couper et al. (2004) investigated whether the "humanness" of the voice had an effect on users' reporting of sensitive information. The voice was either a real human voice, a human-sounding synthesised voice, or a machine-sounding synthesised voice, and each of these voice types was crossed with voice gender (male or female). They found that the voice had little effect on factors including the break-off rate, responses to sensitive questions, and respondents' ratings of the degree to which the interaction was like participating in an "ordinary conversation".
Gender

There is much discussion in the literature about the effect that voice gender has on listeners' impressions; however, there has been little examination of the effect of voice gender for IVR systems.

Social psychological research about the effect that the gender of a voice has on people's perceptions of a speaker has suggested that male voices are often rated higher on influence, persuasiveness, and authority, while female voices are rated as more friendly, pleasant and helpful (e.g. Eagly, 1983; Standards Australia/Standards New Zealand, 2003). Harris (2005) argues that the gender of the IVR voice, whether digital or synthesised, can trigger these stereotypes. Similarly, Cohen et al. (2004) claim that "... we humans cannot help inferring personality traits and social information from the voices we hear, even if we encounter them as brief, recorded samples" (p. 75).

However, not all studies have found gender to have a significant effect on perceptions, and in a controlled experiment examining the effect of the IVR voice, Couper et al. (2004) found that voice gender did not affect users' perceptions of a range of factors.

Despite the scarcity of studies investigating the effect of voice gender in IVR systems, most IVR systems use female voices (Standards Australia/Standards New Zealand, 2003; Couper et al., 2004). Most systems are optimised for strong, clear voices that are of an average pitch (not too high or too low); therefore a contralto female voice is suitable (Noonan, 2006). Furthermore, female modulation is often received more easily by people using hearing aids (Standards Australia/Standards New Zealand, 2003). In contrast, Noonan (2006) notes that most of the better synthetic speech currently available is male, because the vocal characteristics of the male voice are less complex and therefore easier to synthesise.

Voice gender should be considered when designing the IVR system, and users' preference for different voices should be tested.
Tone

The tone of the speaker's voice should not sound bored or dull (Noonan, 2006). Rather, it should be expressive, with voice inflection indicating things such as when a list of options is complete (Standards Australia/Standards New Zealand, 2003). Words should be clearly enunciated, particularly those that sound similar (e.g. "five" and "nine", "F" and "S") (Standards Australia/Standards New Zealand, 2003). Radio announcers and radio "actors" are often used as IVR speakers, as they have good control over their voices and are able to emote effectively (Harris, 2005; Withers, 2001). "Warm", "friendly", "agreeable", and "assertive" are some of the adjectives that researchers have suggested as being important for IVR voices (Schneider et al., 2002).
IVR menus

Overview

The IVR systems that most people are familiar with are structured hierarchically, and contain one or more "menus". A menu contains a set of options or prompts, each describing an available choice, and the action that is required to make that choice (Standards Australia/Standards New Zealand, 2003). An example of a prompt is "To update your contact details, press 1". (Prompts are described in more detail in the "Prompts and questions" section below).

Menus may be prefaced with a title or brief explanation before the options are read out (e.g. "Main menu"). When the user makes a selection, the system offers another menu, until the system reaches the item or action that the user wants. An example of a simple IVR structure for a typical system that provides a service to the caller is shown below. The options in the first level menu (i.e. the "main menu") are those at the smallest indent below the greeting. Lower level menu options appear beneath the main menu items, with items of the same menu level having the same size indent (the greater the indent, the lower the menu level of the item).
    Example of a basic structure for a typical IVR system
      Hello, welcome to Cinema Australia's automated telephone service.
      For session times, press 1
        If you know the name of the film you would like to see, press 1
          Please say the name of the film now
          Please say the day of the week that you would like to see the film
            (Session times are listed)
        To find out which films are playing on a particular date, press 2
          Please say the day of the week that you would like to see the film
            (Films and session times are listed)
        To hear a list of films that are currently showing, press 3
          To skip to the next film, press 1
          To hear further details about the film, press 2
          To go to the main menu, press 3
      To buy tickets, press 2
        Please say the name of the film now
        Please say the session day and time
        Please say the number of tickets you would like to buy
        Please enter your credit card details
        etc.
      To hear a film review, press 3
        If you know the name of the film you would like to hear reviewed, press 1
          Please say the name of the film now
            (Review is played)
        To hear a list of films that are currently showing, press 2
          To skip to the next film, press 1
          When you hear the name of the film you're interested in, press 2 and a review will be played
      To hear a list of our current prices, press 4
        (Current prices are listed)
      To speak to an operator, press 0
        (Caller is transferred to an operator)

Users may be allowed to make a selection from a menu without having to listen to all of the options (Noonan, 2006). This way, frequent users can override the menu and go straight to their preferred option quickly. This is known as "key-ahead" (Standards Australia/Standards New Zealand, 2003). As respondents in regular (e.g. monthly) IVR surveys will become very familiar with the options provided, allowing key-ahead may be appropriate in some instances.

Allowing users to interrupt some types of system output (such as the current message being played) with valid input (including requests for help) is known as "interrupt capability" for TDE systems (Standards Australia/Standards New Zealand, 2003) or "barge-in" for ASR systems. For most IVR systems, non-interruptible messages should be limited to those considered essential for all users to hear in full. However, IVR systems for survey data collection generally do not enable interrupt capability or barge-in, since users should hear each question in its entirety (Schneider et al., 2002). Another reason to be careful with allowing barge-in for ASR systems is that extraneous noise (e.g. coughing, throat clearing) may lead to inappropriate interruption if the system is too sensitive (Standards Australia, 2005).
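The policy above can be expressed as a per-prompt flag: menus that regular users know well may be interruptible (allowing key-ahead), while survey questions are not. This is a hypothetical sketch; `Prompt`, the returned action names, and the choice to ignore invalid early input are assumptions, not features of any particular IVR platform.

```python
from dataclasses import dataclass


@dataclass
class Prompt:
    text: str
    interruptible: bool  # False for survey questions, which must be heard in full


def handle_early_input(prompt, key, valid_keys):
    """Decide what to do with a key pressed before `prompt` has finished playing."""
    if not prompt.interruptible:
        return "ignore"  # keep playing: the whole question must be heard
    if key in valid_keys:
        return "accept"  # interrupt capability / key-ahead: act immediately
    return "ignore"      # only valid input may interrupt (assumed handling)


question = Prompt("How many employees did the business have?", interruptible=False)
menu = Prompt("Main menu. You now have three choices...", interruptible=True)
print(handle_early_input(question, "1", {"1", "2", "3"}))  # ignore
print(handle_early_input(menu, "2", {"1", "2", "3"}))      # accept
```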
Developing a menu structure

IVR systems that are used for "form filling" generally do not contain menus that users would recognise as such. Rather than the user navigating through the interface by responding to prompts, the system directs them through the application, asking them a series of questions as if they were completing a form (Cohen et al., 2004; Standards Australia, 2005).

However, it is still important in the early stages of developing the IVR system to specify how the system will be structured. This includes determining how user errors will be handled (see the "Errors" section below). The overview of the structure could take the form of a list, as in the example above, or be displayed in a flow chart that shows all possible paths of the system.
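Whether the structure is written as a list or drawn as a flow chart, it helps to be able to enumerate every path a caller can take, so that each one can be reviewed and tested. The sketch below represents part of the cinema example above as a nested dictionary (an assumed representation, with each option's key number in brackets) and lists every path from the main menu to an end point.

```python
# Part of the cinema example as a nested dictionary: each key is an option,
# each value is its submenu (an empty dict marks an end point).
MENU = {
    "Session times (1)": {
        "Know the film name (1)": {},
        "Films on a date (2)": {},
        "Films now showing (3)": {},
    },
    "Buy tickets (2)": {},
    "Film review (3)": {
        "Know the film name (1)": {},
        "Films now showing (2)": {},
    },
    "Current prices (4)": {},
    "Operator (0)": {},
}


def all_paths(menu, prefix=()):
    """Yield every path from the main menu down to an end point."""
    for option, submenu in menu.items():
        path = prefix + (option,)
        if submenu:
            yield from all_paths(submenu, path)
        else:
            yield path


for path in all_paths(MENU):
    print(" > ".join(path))
```

Each printed line corresponds to one route through the flow chart, which gives a checklist for testing.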
Number of items

A mistake that many IVR designers make is to try to do too much. The end result is often a menu system with many levels and many choices at each level. It is important to restrict the number of menu levels, and the number of choices at each level (Dortmans & Angus, 2001). The ideal number of menu items specified in the literature ranges from about 3 to 7. Noonan (2006) advises including 3 to 5 items in a menu, as this is all that can be held in the short-term memory of most users. Gerber (2000) claims that people will only wait through 7 menu items before they go to the operator. Dortmans and Angus (2001) suggest that a 4 by 4 menu, i.e. 4 main items, each of which has 4 or fewer options, is plenty for most applications. The Australian standard (Standards Australia/Standards New Zealand, 2003) recommends limiting menus to 4 options. Note that this suggested maximum does not include the recommended "universal" options (see "Help" section below). Schumacher et al. (1995) suggest that if it is not possible to limit the number of options to 4 or fewer, the less frequently chosen options can be put in a fifth general category, e.g. "For more options, press 5".
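The Schumacher et al. (1995) approach can be sketched as a simple rule: keep the first four options and fold the rest into a fifth "More options" item. The sketch assumes the options are already ordered from most to least frequently chosen, so that the least frequently chosen ones are moved; universal options such as help are not counted, consistent with the Australian standard.

```python
MAX_OPTIONS = 4  # recommended limit, excluding universal options such as help


def fit_menu(options, max_options=MAX_OPTIONS):
    """Return (options read out in this menu, options moved to the submenu).

    `options` must be ordered from most to least frequently chosen, so the
    options moved into the "More options" submenu are the least used ones.
    """
    if len(options) <= max_options:
        return list(options), []
    return list(options[:max_options]) + ["More options"], list(options[max_options:])


print(fit_menu(["A", "B", "C"]))
# (['A', 'B', 'C'], [])
print(fit_menu(["A", "B", "C", "D", "E", "F"]))
# (['A', 'B', 'C', 'D', 'More options'], ['E', 'F'])
```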

At each level of the menu, where there are several options, it is useful to tell users how many choices they have before telling them what the choices are. For example, "You now have three choices. To update your contact details, press 1..." This lets users know what is coming up, encourages them to listen to the entire menu before making a selection, and reduces the likelihood of users selecting the wrong response (Dortmans & Angus, 2001; Harris, 2005; Schumacher et al., 1995).
Order of items

When a person processes information that is presented on paper or a computer screen, they can look at any part of the information displayed as much or as little as they choose. For example, when you read a restaurant menu, you are able to "...take in all the items at a glance, search through them, look back at ones you've forgotten, skip over several to look at the fourth one in the list, and so on..." (Harris, 2005, p. 215). However, since the information presented in a typical IVR system is serial (only one word can be heard at a time), the order in which material is presented is crucial (Noonan, 2006).

Menu options should be in ascending numerical order (Standards Australia/Standards New Zealand, 2003), with the most important or commonly selected items (as identified in pre-testing) presented at the beginning of each menu. This means that most users will not have to listen to the entire list before making their selection (Noonan, 2006).
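The advice on announcing how many choices there are and ordering options by how often they are chosen can be combined in a simple prompt builder. The Python sketch below assumes that selection counts from pre-testing are available as a dictionary; `build_menu_prompt` and the example options are hypothetical, and each option follows the "To update your contact details, press 1" pattern used above.

```python
from typing import Dict

NUMBER_WORDS = {1: "one", 2: "two", 3: "three", 4: "four", 5: "five"}


# Hypothetical helper: builds the prompt text only; playback is out of scope.
def build_menu_prompt(selection_counts: Dict[str, int]) -> str:
    """Build a TDE menu prompt from option -> number of times chosen in pre-testing.

    The most frequently chosen option gets key 1, keys run in ascending
    order, the number of choices is announced first, and each option names
    the function before the key to press.
    """
    # sorted() is stable, so options with equal counts keep their given order.
    ordered = sorted(selection_counts, key=selection_counts.get, reverse=True)
    count = len(ordered)
    noun = "choice" if count == 1 else "choices"
    parts = [f"You now have {NUMBER_WORDS.get(count, str(count))} {noun}."]
    for key, option in enumerate(ordered, start=1):
        parts.append(f"To {option}, press {key}.")
    return " ".join(parts)


print(build_menu_prompt({
    "report a change of address": 40,
    "update your contact details": 120,
    "request a paper form": 75,
}))
# Prints, on one line: "You now have three choices. To update your contact
# details, press 1. To request a paper form, press 2. To report a change of
# address, press 3."
```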
Prompts and questions

General guidelines

Prompts, questions and instructions should be clear, concise, and unambiguous (Standards Australia/Standards New Zealand, 2003). They should be kept as brief as possible, without sounding terse and without sacrificing clarity (Noonan, 2006; Cohen et al., 2004). Tone and wording should encourage users to remedy situations when they get lost, make errors, or are not responding quickly enough, without seeming to chastise them (Noonan, 2006).

Simple language and terms should be used; avoid jargon or technical terms that users may not be familiar with (Noonan, 2006). For example, in the case of an error occurring, the second error message below is a more effective, jargon-free message than the first message:
"An error has been generated. Returning to Main" (jargon).

"Sorry, there was a technical problem, so we'll have to go back to the main menu" (jargon-free).
Always announce the function first, followed by the response required to activate it (Noonan, 2006), e.g. "To transfer to an operator, press 0". Stating the required response first ("Press 0 to transfer to an operator") will lead some users to jump in early and enter a response before the instruction is completed (Dortmans & Angus, 2001). Also, it is less taxing on users' memory if the information they need to remember is the last thing they hear (Cohen et al., 2004).
Each prompt should include some prominent key words. The most significant key word should appear early in the sentence, but not as the first word (Standards Australia/Standards New Zealand, 2003).

The use of polite terms such as "please", "sorry", "thank you", and "goodbye" should be considered carefully: overuse of these terms can sound tedious, while underuse can sound terse. For example, in a list of menu options, it is unnecessary to include "please" in each one ("For sales, please press 1", "For shipping, please press 2"...). However, the use of "please" at the start of a prompt (e.g. "Please enter your account number") helps to positively dispose the user to the system (Standards Australia/Standards New Zealand, 2003). It is recommended that "sorry" is included at the beginning of error messages, e.g. "Sorry, that term was not recognised by the system" (see the "Errors" section below). This is not to give the user the false impression that they are interacting with a human, but to "... facilitate comprehension, thereby reducing cognitive load..." (Cohen et al., 2004, p. 147).
Prompts should be moderately paced. There should be a short pause between menu items, and a slightly longer pause between different menus. Avoid long silences, as this can cause confusion (Noonan, 2006).

The vocabulary that is used should be consistent throughout the system, to prevent confusion, and to help users learn what to expect. The table below shows the preferred terminology for common terms used in TDE output, as recommended by Standards Australia/Standards New Zealand (2003). Note that the first time users are prompted to enter the * or # keys, it is a good idea to tell them where the keys are located, e.g. "The hash key is located on the lower right of the keypad" (Schumacher et al., 1995).
Table 1: Common terminology for output used in TDE systems - preferred and non-preferred terms (Standards Australia/Standards New Zealand, 2003)
| Description of item | Preferred term | Examples of non-preferred terms |
| # key | Hash | Pound, square, number sign |
| * key | Star | Asterisk, aster |
| 0 key | Zero | Nought, oh |
| Ending a call | End, disconnect | Terminate |
| Ending or exiting the system | End, leave, exit | Disconnect, terminate |
| Entering DTMF data (e.g. a password, telephone number) | Enter | Dial, key-in, type |
| Inputting commands and menu choices | Press | Dial, key, push, touch |
| User personal code | Password, Personal Identification Code (PIC), ID code, Access code* | Security code |
| When spoken input is expected | Say (when responding to a prompt), Record (when a message is being recorded) | Speak |
| DTMF | Touchtone, tone | Push button, DTMF |

* The standards do not explicitly advise for or against using the term "Personal Identification Number (PIN)". Because PINs are strongly associated with financial transactions at ATMs, financial institutions are sensitive about the use of the term in any situation other than in association with a financial transaction card, and therefore avoid using it in their IVR systems. The use of the term PIN in non-financial organisations' IVR systems is unlikely to be problematic; however, consistency across IVR systems is preferred.
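Consistent terminology lends itself to an automated check over the prompt scripts. The Python sketch below is a hypothetical review helper, not part of the standard: it flags some of the non-preferred terms from Table 1, and it leaves out terms such as "dial", "key" and "type" whose preferred replacement depends on context or which have other everyday meanings. Flagged prompts would still need a human decision.

```python
import re
from typing import List, Tuple

# Non-preferred -> preferred wording for TDE output, from Table 1.
# Hypothetical subset chosen for this sketch; context-dependent terms omitted.
NON_PREFERRED = {
    "pound": "hash",
    "number sign": "hash",
    "asterisk": "star",
    "nought": "zero",
    "terminate": "end",
    "key-in": "enter",
    "push": "press",
    "touch": "press",
    "speak": "say",
    "security code": "password or access code",
}


def check_prompt(prompt: str) -> List[Tuple[str, str]]:
    """Return (term found, preferred term) pairs for non-preferred wording."""
    findings = []
    for term, preferred in NON_PREFERRED.items():
        # Whole-word match, so "touchtone" is not mistaken for "touch".
        if re.search(rf"\b{re.escape(term)}\b", prompt, flags=re.IGNORECASE):
            findings.append((term, preferred))
    return findings


print(check_prompt("Please key-in your security code, then press the pound key."))
# [('pound', 'hash'), ('key-in', 'enter'), ('security code', 'password or access code')]
```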
Wording of survey questions

When converting an existing self-administered survey to IVR, many standard ABS economic questions will need to be rephrased, as they are not worded as "questions" as such. They are really just titles designed to match sections in our respondents' accounts, e.g. "Interest income". When converting such questions into IVR, these types of items need to be reworded to be actual questions, e.g. "What was this business's interest income for the period?".
Simple questions with only one or two short notes may be appropriate for IVR; however, the notes must be incorporated into the question itself. For example:
    "What was the total gross income of this business during the financial period?"
    with the separate explanation:
      Excluding
      • Extraordinary items
      • Goods and Services Tax (GST)
becomes the IVR question:
    "What was the total gross income of this business during the financial period, excluding extraordinary income items and GST?"

Measurement units, which usually appear to the side of the response box in ABS paper forms, must also be incorporated into the question, e.g. "How long in years..." or "What was your business's income... in thousands of dollars".
Questions that require the respondent to select one or more options from a list will need to be reworded so that each option is asked about separately, requiring a yes or no answer. This is because reading out even a short list over the phone for the respondent to choose from is prone to bias due to the cognitive difficulty of remembering all the items while making a decision. For example:
    "Did this business use any of the following business practices during the year ended 30 June 2007?"
      • Written strategic or business plans
      • Budget forecasts
      • Formal networking with other businesses
      • Comparison of performance with other businesses
      • Export market plans
      • None of the above

becomes the IVR questions:
    "Did your business use written strategic or business plans during the financial year?" (yes/no)
    "Did your business use budget forecasts during the financial year?" (yes/no)
    etc.

In interviewer-administered surveys, once it is clear that the respondent understands the intent of the question, the interviewer often omits the stem (in the example above, "Did your business use..." is the stem). However, this may lead to respondent confusion in IVR; therefore, the entire question must be recited each time. This, combined with the fact that it is important that respondents hear each question in its entirety, can lead to the questions sounding repetitive, artificial and regimented (Schneider et al., 2002). Therefore, this type of questioning should be limited to a small number of items.
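The conversion from a multiple-response list to individual yes/no questions is mechanical enough to sketch in code. The Python below is a hypothetical helper (`to_yes_no_questions` is not an ABS tool). It repeats the full stem in every question, as required above, and drops the "None of the above" option, which is implied by answering "no" to every question. Options are assumed to be supplied already worded to read naturally mid-sentence.

```python
from typing import List

# Catch-all options are dropped: answering "no" to every item implies them.
CATCH_ALL = {"none of the above"}


# Hypothetical helper for drafting IVR question wording.
def to_yes_no_questions(stem: str, options: List[str], period: str) -> List[str]:
    """Rewrite a 'select all that apply' item as separate yes/no IVR questions.

    The full stem is repeated in every question. Options are assumed to be
    worded so that they read naturally mid-sentence (e.g. lower-case).
    """
    return [
        f"{stem} {option} {period}? (yes/no)"
        for option in options
        if option.strip().lower() not in CATCH_ALL
    ]


for question in to_yes_no_questions(
    "Did your business use",
    ["written strategic or business plans", "budget forecasts", "None of the above"],
    "during the financial year",
):
    print(question)
# Did your business use written strategic or business plans during the financial year? (yes/no)
# Did your business use budget forecasts during the financial year? (yes/no)
```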

As with all ABS business surveys, respondents must be asked if they have any comments at the end of the survey. An adapted version of the paper form question is recommended:
    "If you would like to make any comments about the information you have provided during this call, or about the automated interview process, please press 1 (or say yes) now".
    [If yes] - "Please provide your comments now".
Similarly, collecting "Time taken" is mandatory for IVR surveys. The time for the call itself should be automatically recorded by the instrument so there is no need to ask the respondent this. However, for a full picture of the provider load and better comparison with paper form equivalents, it is important to ask about any other time the provider may have spent on the survey. The recommended wording is:
    "Have you or any other employees spent any time on this survey, apart from this interview?" (No/Yes)
    [If yes] "Please estimate how much time was spent, including reading"

Answer options

TDE systems

While the majority of this chapter has focussed on recommendations for system output, this section briefly covers guidelines on the input that IVR users provide. This information is from the Australian Standards (Standards Australia/Standards New Zealand, 2003; Standards Australia, 2005). Anyone seeking more detailed information should refer to the standards.

The # key should be used as an input field delimiter to end variable-length input (e.g. passwords, phone numbers) and move to the next step. The # key should not be required to terminate fixed-length input (e.g. "For sales, press 1"); however, if the user does press # in this situation, the system should ignore it and not treat it as an error.

The * key should generally be used either to stop the current action and go back one or more steps in the menu, or, if the user has commenced entering data, to disregard the data already entered and allow them to re-enter it.

For prompts that request a "Yes" or "No" response, 1 = Yes and 2 = No.
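The key conventions above (# to end variable-length input, no # needed for fixed-length input, * to clear or go back, 1 = Yes and 2 = No) can be sketched as a small input collector. This Python sketch assumes keys arrive as a sequence of single-character strings and ignores timing; the `GO_BACK` sentinel and the function names are hypothetical.

```python
from typing import Iterable, Optional

GO_BACK = "GO_BACK"                 # hypothetical signal: return to the previous step
YES_NO = {"1": True, "2": False}    # 1 = Yes, 2 = No


def collect_variable(keys: Iterable[str]) -> Optional[str]:
    """Collect variable-length input (e.g. a phone number) ended by '#'.

    Returns None if the keys run out before '#' (a time-out in a real system).
    """
    buffer = []
    for key in keys:
        if key == "#":
            return "".join(buffer)
        if key == "*":
            if not buffer:
                return GO_BACK      # nothing entered yet: go back a step
            buffer.clear()          # discard what was entered so far
        elif key.isdigit():
            buffer.append(key)
    return None


def collect_fixed(keys: Iterable[str], length: int) -> Optional[str]:
    """Collect fixed-length input (e.g. a menu choice); '#' is not required."""
    buffer = []
    for key in keys:
        if key == "#":
            continue                # a stray delimiter is ignored, not an error
        if key == "*":
            if not buffer:
                return GO_BACK
            buffer.clear()
        elif key.isdigit():
            buffer.append(key)
            if len(buffer) == length:
                return "".join(buffer)
    return None


print(collect_variable("0262*0299#"))  # '*' cleared "0262", so prints 0299
print(YES_NO[collect_fixed("#1", 1)])  # True
```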

Alpha characters can be entered using a keypad, by a method similar to that used to compose an SMS message on a mobile phone. The key on which the letter appears is pressed as many times as the letter's position on that key, e.g. to enter the letter "B", the 2 key would be pressed twice (see Figure 1). This is not an ideal input method for long strings and should be limited to short strings of characters. Furthermore, not every user will have a keypad with an alpha layout, and some may have keypads whose alpha layout differs from that shown in Figure 1. Harris (2005) suggests that alpha entry can be used as a backup option for ASR systems, if the system repeatedly does not recognise what the user is saying.
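Multi-tap decoding can be sketched as below. The Python assumes the ITU-T E.161 letter layout found on most keypads (2 = ABC, ..., 7 = PQRS, 9 = WXYZ), which may differ from Figure 1 and from some users' keypads, and it assumes the telephony layer has already split the presses into one group per letter using pauses. `decode_multitap` is a hypothetical name.

```python
from typing import List

# Assumed ITU-T E.161 letter layout; Figure 1 and some keypads may differ.
KEY_LETTERS = {
    "2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
    "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ",
}


# Hypothetical helper; grouping presses by pauses is assumed done upstream.
def decode_multitap(groups: List[str]) -> str:
    """Decode multi-tap entry given one group of repeated presses per letter.

    "22" means the 2 key was pressed twice, i.e. the second letter on it: "B".
    """
    letters = []
    for group in groups:
        key = group[:1]
        if key not in KEY_LETTERS or group != key * len(group):
            raise ValueError(f"not a multi-tap group: {group!r}")
        options = KEY_LETTERS[key]
        if len(group) > len(options):
            raise ValueError(f"too many presses on key {key}: {group!r}")
        letters.append(options[len(group) - 1])
    return "".join(letters)


print(decode_multitap(["2", "22", "7777"]))  # ABS
```

This sketch rejects extra presses on a key rather than cycling back to its first letter.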
ASR systems

A selection of preferred terminology for common actions from the draft Australian standards for ASR (Standards Australia, 2005) is shown in Table 2. As the draft Australian standards may change, the guidelines should be interpreted with care. The preferred terminology refers to both user input and system output; that is, terminology used by the system, and user responses that the system should recognise.

Users should be instructed which terms are acceptable, e.g. "Please say 'yes' or 'no'". This is important for numerical values, e.g. "If your employment is four hundred and twelve, say 'four one two'" (Clayton & Winter, 1992).
Table 2: Preferred terminology for ASR systems (Standards Australia, 2005)

| Action | Recommended terminology |
| For reading prompt again | Repeat, say that again, pardon, sorry |
| For help | Help, instructions, user tips |
| For transferring to a human operator | Operator, agent, consultant |
| For going to the top of the main menu | Main menu, start, go to start, home |
| For listing commands/functions | Options |
| For terminating the service | Goodbye, bye, end, exit, quit, hang up |
| For going back to previous menu | Previous menu, go back |
| Cancel last action | Cancel |
| Discard last entry | Clear entry, clear input |
| End of input | Input complete, complete, done, finished |
| To stop current operation | Stop |
| For saying digit 0 | Zero, oh |
| For 100-999 | One hundred, one hundred and one, one hundred and two etc., or a hundred, a hundred and one... |
| 1,000 | One thousand, a thousand |
| 1,000,000 | One million, a million |
| Time of day between 12.00am and 11.59am | <time> am; <time> o'clock; <time> in the morning |
| Time of day between 12.00pm and 11.59pm | <time> pm; <time> o'clock; <time> in the evening, at night, in the afternoon |
| 12.00pm | Twelve midday, midday, twelve noon, noon |
| 12.00am | Twelve midnight, midnight |
| Date | <day of the week>, the <ordinal> of <month>, <year>, e.g. Monday, the twenty third of July, two thousand and seven |
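When users are asked to say numbers digit by digit (e.g. "four one two"), the recogniser output must be converted back to digits. The Python sketch below assumes the recogniser returns the words as plain text; it accepts both "zero" and "oh" for 0, as listed in Table 2, and returns None for anything else so that the dialogue can reprompt. `parse_digit_string` is a hypothetical name.

```python
from typing import Optional

# Accepted words for each digit; "zero" and "oh" both mean 0 (Table 2).
DIGIT_WORDS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}


# Hypothetical helper; assumes plain-text recogniser output.
def parse_digit_string(utterance: str) -> Optional[str]:
    """Convert a digit-by-digit utterance such as "four one two" into "412".

    Returns None if the utterance is empty or contains a non-digit word, so
    the dialogue can reprompt rather than guess.
    """
    words = utterance.lower().split()
    if not words or any(word not in DIGIT_WORDS for word in words):
        return None
    return "".join(DIGIT_WORDS[word] for word in words)


print(parse_digit_string("four one two"))             # 412
print(parse_digit_string("four hundred and twelve"))  # None
```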

Feedback and verification

It is important to provide the user with feedback each time they provide input to the system, so that they know their input has been received (Standards Australia/Standards New Zealand, 2003). A common type of feedback is to present the next set of options immediately after the user has responded to the previous set. Another type is to play a short announcement (e.g. "Please wait while we access your account details"), or music, during any waiting periods, so that the user knows that their request is being processed. Note that radio should not be used to provide music, as the content cannot be guaranteed.

Users should have the opportunity to verify the data they provide. After they have entered the information, the system should repeat it back to them and allow them to confirm or amend it (e.g. "You entered 12345678. If correct, press 1 now. If incorrect, press 2"). This can help reduce errors. However, verification of every answer is unnecessary and tedious for users, and will increase the length of the interview, especially if there are more than a few data items. Requiring verification of selected menu choices (e.g. "You selected the Personal Details menu. If correct, press 1 now. If incorrect, press 2") is unnecessary and will frustrate users.

When confirming numeric information that the user has entered, the Australian Standards recommend that a string of more than 4 digits read back by the system be broken into logical groups of no more than 4 digits (e.g. the receipt number 12345678 would be read as "1234" (slight pause), then "5678"). The dollar amount $152.56 should be read back as "one hundred and fifty two dollars and fifty six cents". The date 25.4.93 should be read as "twenty-fifth of April, nineteen ninety-three" (Standards Australia/Standards New Zealand, 2003).
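The read-back rules above can be sketched as follows. This Python is illustrative only: it splits long digit strings into groups of at most 4 that are as even as possible (an assumption about what "logical groups" means; 12345678 becomes 1234 / 5678 as in the standard's example, and a 10-digit phone number becomes 4-3-3), and it words dollar amounts in the "one hundred and fifty two" style shown. Dates are not covered, and the function names are hypothetical.

```python
import math
from decimal import Decimal
from typing import List

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def group_digits(digits: str, max_size: int = 4) -> List[str]:
    """Split a digit string into as-even-as-possible groups of at most max_size."""
    if len(digits) <= max_size:
        return [digits]
    n_groups = math.ceil(len(digits) / max_size)
    base, extra = divmod(len(digits), n_groups)
    groups, start = [], 0
    for i in range(n_groups):
        size = base + (1 if i < extra else 0)  # earlier groups take the remainder
        groups.append(digits[start:start + size])
        start += size
    return groups


def under_thousand(n: int) -> str:
    """Words for 1-999, e.g. 152 -> 'one hundred and fifty two'."""
    hundreds, rest = divmod(n, 100)
    parts = []
    if hundreds:
        parts.append(f"{ONES[hundreds]} hundred")
    if rest:
        tens, ones = divmod(rest, 10)
        if rest < 20:
            parts.append(ONES[rest])
        else:
            parts.append(TENS[tens] + (f" {ONES[ones]}" if ones else ""))
    return " and ".join(parts)


def number_to_words(n: int) -> str:
    """Words for 0-999,999 (a limit of this sketch)."""
    if not 0 <= n < 1_000_000:
        raise ValueError("this sketch only handles 0 to 999,999")
    if n == 0:
        return "zero"
    thousands, rest = divmod(n, 1000)
    words = f"{under_thousand(thousands)} thousand" if thousands else ""
    if rest:
        joiner = "" if not words else (" and " if rest < 100 else ", ")
        words += joiner + under_thousand(rest)
    return words


def dollars_to_words(amount: Decimal) -> str:
    """e.g. Decimal("152.56") -> 'one hundred and fifty two dollars and fifty six cents'."""
    dollars, cents = divmod(int((amount * 100).to_integral_value()), 100)
    text = f"{number_to_words(dollars)} dollar{'s' if dollars != 1 else ''}"
    if cents:
        text += f" and {number_to_words(cents)} cent{'s' if cents != 1 else ''}"
    return text


print(" (pause) ".join(group_digits("12345678")))  # 1234 (pause) 5678
print(dollars_to_words(Decimal("152.56")))
```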

When the system requires data in a particular format, having the system read back the information provided in the format requested can alert respondents to data entry errors. For example, if a respondent mistakenly enters the figure $123,456 in whole dollars (i.e. 123456) rather than thousands of dollars (i.e. 123), having the system say "You reported 123 million, 456 thousand dollars" alerts the respondent to their error, and allows them to re-enter the correct response.
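The unit check described above can be sketched as a read-back that expands a figure reported in thousands of dollars into full units, so that an extra factor of a thousand becomes audible. In this hypothetical Python helper the numbers are left as numerals for the speech engine to voice, as in the example above; the "If correct, press 1" confirmation would follow it.

```python
# Hypothetical helper for reading back figures reported in thousands of dollars.
def readback_thousands(value_in_thousands: int) -> str:
    """Read back a figure entered in thousands of dollars in full units.

    A respondent reporting $123,456 should enter 123; if they enter 123456
    by mistake, the read-back makes the extra factor of a thousand audible.
    """
    if value_in_thousands < 0:
        raise ValueError("negative amounts are not handled in this sketch")
    if value_in_thousands == 0:
        return "You reported zero dollars"
    billions, remainder = divmod(value_in_thousands, 1_000_000)
    millions, thousands = divmod(remainder, 1000)
    parts = []
    if billions:
        parts.append(f"{billions} billion")
    if millions:
        parts.append(f"{millions} million")
    if thousands:
        parts.append(f"{thousands} thousand")
    return "You reported " + ", ".join(parts) + " dollars"


print(readback_thousands(123))     # You reported 123 thousand dollars
print(readback_thousands(123456))  # You reported 123 million, 456 thousand dollars
```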
Help

Users should be informed as early as possible in the call how to access help, and context-specific help information should be easily accessible at all times. Help information can be used to: repeat menus or prompts; describe how to perform a task; explain how to correct an error; describe menu choices in more detail; and explain generic system commands (Standards Australia/Standards New Zealand, 2003).

Users who encounter technical difficulties, or cannot find the information they require, should always have the option to speak to a real person (within normal business hours), and instructions for doing so should be made clear. However, some users prefer not to interact with a human when using IVR systems (Noonan, 2006). This may be for a variety of reasons, e.g. they may not want to wait for an operator to become available, or they might prefer to attempt to remedy the problem on their own at a later time. You should therefore always notify the user when they are being transferred to an operator, so that they can make the choice to continue or discontinue the call.

In their discussion of ASR, Cohen et al. (2004) recommend the following six "universal" (or generic) commands, which are available at all times regardless of where the user is in the system. These commands are designed to allow users to recover from problems and to receive context-specific help. The same universal commands are relevant for a TDE system. For example, * could be used to go back one or more steps in the menu, and 0 could be used to transfer to an operator (Standards Australia/Standards New Zealand, 2003). Whichever keys or words are used to carry out the commands, they should be consistent throughout the application.
  • Help: Provide help or additional instructions about the current state of the application
  • Repeat: Repeat the most recently played prompt
  • Main menu / start over: Return to the beginning of the application
  • Go back: Go back up to the previous step
  • Operator: Transfer to an operator
  • Goodbye: Allow the user to say "Goodbye" and respond appropriately so that they are comfortable hanging up. If the user simply hangs up, the data they have provided should still be saved. However, users tend to be more comfortable saying goodbye or pressing a key to end the call, rather than just hanging up, due to the perception that simply hanging up might cause their data to be lost.
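One way to keep the universal commands consistent is to check every recognised utterance or key press against a single shared table before any prompt-specific handling. In the Python sketch below, the spoken synonyms come from Table 2 and the * and 0 keys follow AS/NZS 4263; the command names, the function and the restriction of the key table to menu prompts are assumptions (in a data-entry prompt, 0 is a digit rather than the operator key).

```python
from typing import Optional

# Spoken synonyms for the six universal commands, taken from Table 2
# (plus "start over", from Cohen et al.'s command name). Names are hypothetical.
UNIVERSAL_PHRASES = {
    "help": ["help", "instructions", "user tips"],
    "repeat": ["repeat", "say that again", "pardon", "sorry"],
    "main_menu": ["main menu", "start", "go to start", "home", "start over"],
    "go_back": ["previous menu", "go back"],
    "operator": ["operator", "agent", "consultant"],
    "goodbye": ["goodbye", "bye", "end", "exit", "quit", "hang up"],
}

# TDE keys: '*' and '0' as suggested in AS/NZS 4263. Menu prompts only:
# in data entry, '0' is a digit.
UNIVERSAL_KEYS = {"*": "go_back", "0": "operator"}

PHRASE_TO_COMMAND = {
    phrase: command
    for command, phrases in UNIVERSAL_PHRASES.items()
    for phrase in phrases
}


def match_universal(response: str) -> Optional[str]:
    """Return the universal command for a recognised utterance or key, else None."""
    text = " ".join(response.lower().split())  # normalise case and spacing
    if text in UNIVERSAL_KEYS:
        return UNIVERSAL_KEYS[text]
    return PHRASE_TO_COMMAND.get(text)


print(match_universal("Say that again"))  # repeat
print(match_universal("four one two"))    # None
```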

Errors

Errors may occur in an IVR system when the user keys in an invalid response to a prompt (e.g. entering 9 digits for their password instead of the required 10). ASR systems are particularly prone to recognition errors caused by factors such as the following (from Standards Australia, 2005 and Harris, 2005):
  • The user's speech not being recognised as being part of the system's vocabulary
  • Environmental noise
  • Coughing / throat clearing
  • Long pauses in the user's speech
  • The user beginning to speak too early, before recognition has started
  • The user continuing to speak after recognition has finished
  • The system "hearing" a word or phrase differently from what the user has said (e.g. "five" instead of "nine").

Errors can cause frustration and confusion, and can be indications that the user requires help. It is important to ensure that users are not left unsure about what to do next, or unable to extract themselves from an erroneous response (Noonan, 2006). Therefore, error handling must be carefully considered when designing the system to ensure that users can move forward from errors with minimal frustration (Standards Australia, 2005).

If a user enters invalid data, the message they receive should be brief, simple, and polite (Standards Australia, 2005), and clearly inform them what they should do to progress. Words such as "error", "wrong", and "invalid" should be avoided, as they may make the user feel foolish, which may reduce their desire to use the system.
If the user says a term that is not recognised in an ASR system, Harris (2005) says that the follow-up prompt should include an apology, explain that it does not understand, and tell the user what sort of response is expected, e.g. "Sorry, that term was not recognised. You can say [list valid response options]". For a TDE system, a prompt might be "You have entered an incorrect account number - please enter again" (Standards Australia/Standards New Zealand, 2003). If a user is encountering difficulties using ASR in a multi-mode IVR system, the option to use their keypad to enter their response should be mentioned in the message, e.g. "Please say the number again, or enter it using your telephone keypad" (Standards Australia, 2005).

If the user again enters an invalid response, the next prompt should be different, and more detailed, than the first. Having the system repeatedly provide the same error message in response to a repeated error is unhelpful and can be extremely irritating for users (Harris, 2005). The technique of providing increasingly detailed information in response to consecutive errors is known as the "escalating detail" strategy (Cohen et al., 2004). However, Cohen et al. (2004) note this method can be frustrating for users who know what to say to the ASR system, and just need another chance to enter their response. This may be the case for frequent users of the system. In this case, a more appropriate error handling technique can be for the system to reply with a brief prompt such as "I'm sorry?" or "What was that?" (known as the "rapid reprompt" approach). If another error occurs after this, the system can then revert to the strategy of providing more detailed instructions, as shown in Table 3.
Table 3: Example of "escalating detail" versus "rapid reprompt" methods of dealing with errors (from Cohen et al., 2004)

| Event | Escalating detail | Rapid reprompt |
| Initial prompt | What is your account number? | What is your account number? |
| First error | Sorry, I didn't understand. Please say your 10-digit account number. | I'm sorry? |
| Second error | Sorry, I still didn't understand. Your 10-digit account number appears on your monthly statement at the top right corner. Please say your account number now, or for more information, say "Help". | Sorry, I didn't understand. Please say your 10-digit account number. |
| Third error | Sorry, I still didn't understand. Please key in your account number, or say "I don't know", and I'll connect you to someone who can help you. | Sorry, I still didn't understand. Your 10-digit account number appears on your monthly statement at the top right corner. Please say your account number now, or for more information, say "Help". |
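The two strategies in Table 3 differ only in which prompt is played for each consecutive error, so they can share one selection function. The Python sketch below reuses the Table 3 wording; `error_prompt` and the strategy names are hypothetical, and what happens after the third error would in practice be governed by an error limit such as the one discussed below.

```python
# Error prompts from Table 3 (after Cohen et al., 2004).
ESCALATING = [
    "Sorry, I didn't understand. Please say your 10-digit account number.",
    "Sorry, I still didn't understand. Your 10-digit account number appears on "
    "your monthly statement at the top right corner. Please say your account "
    "number now, or for more information, say \"Help\".",
    "Sorry, I still didn't understand. Please key in your account number, or "
    "say \"I don't know\", and I'll connect you to someone who can help you.",
]
# Rapid reprompt: a brief first retry, then fall back to the escalating prompts.
RAPID_REPROMPT = ["I'm sorry?"] + ESCALATING[:2]

STRATEGIES = {"escalating": ESCALATING, "rapid": RAPID_REPROMPT}


# Hypothetical helper: selects prompt text only.
def error_prompt(error_count: int, strategy: str = "escalating") -> str:
    """Return the prompt for the nth consecutive error (counting from 1)."""
    if error_count < 1:
        raise ValueError("error_count counts from 1")
    prompts = STRATEGIES[strategy]
    # Past the last scripted prompt, keep the most detailed one.
    return prompts[min(error_count, len(prompts)) - 1]


for n in (1, 2):
    print(error_prompt(n, "rapid"))
```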

In considering error handling, you need to determine the maximum number of consecutive errors that will be allowed before the system takes other action, e.g. transferring the user to an operator (Cohen et al., 2004). A typical number is 3: after 2 or 3 consecutive errors, it is unlikely that the user will recover on their own, and break-off rates are high after a few recognition errors. The Australian Standards suggest that the system should do one of the following, depending on the number of repeated errors or help requests recorded (Standards Australia/Standards New Zealand, 2003):
  • request alternative data input;
  • offer the user a choice between continuing the call or ending it;
  • play a transfer message and transfer the user to an operator; or
  • play a closing message and disconnect the user. This is usually done once the user exceeds a predetermined error limit (Standards Australia/Standards New Zealand, 2003). The closing message should include an explanation of the problem that caused the error, suggestions for possible resolution of the problem (where appropriate), and a polite termination.
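A consecutive-error limit can be tracked with a small counter that resets whenever valid input is received. The Python sketch below uses the typical limit of 3; the action names, and the assumption that the fallback depends on whether an operator is available (e.g. within business hours), are hypothetical.

```python
from dataclasses import dataclass

MAX_CONSECUTIVE_ERRORS = 3  # typical limit (Cohen et al., 2004)


# Hypothetical tracker; action names are illustrative labels only.
@dataclass
class ErrorTracker:
    """Counts consecutive errors or help requests at one prompt."""
    limit: int = MAX_CONSECUTIVE_ERRORS
    consecutive: int = 0

    def record_error(self, operator_available: bool = True) -> str:
        """Register an error and return the next action."""
        self.consecutive += 1
        if self.consecutive < self.limit:
            return "reprompt"  # e.g. a more detailed prompt or alternative input
        # Limit reached: stop looping and hand over, or close politely.
        return "transfer" if operator_available else "closing_message"

    def record_success(self) -> None:
        """Valid input received: the count starts again."""
        self.consecutive = 0


tracker = ErrorTracker()
print([tracker.record_error() for _ in range(3)])  # ['reprompt', 'reprompt', 'transfer']
```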

If there is a problem with the system that means that the user cannot carry out their desired action, they should be played a message informing them of the problem, and what alternatives they have (Standards Australia/Standards New Zealand, 2003).

Example
    "The system is currently experiencing technical difficulties. Please stay on the line, and you will be transferred to an operator. Otherwise, please try the system again later".

Time-outs

"A time-out is a change in the state of the system in response to a period of no detected user input" (Schumacher et al., 1995, p. 260). Time-out errors can occur in both TDE and ASR systems, when the user does not provide a response to a prompt in the time allowed by the system. The time-out period is the time the system waits for data input to start in response to a prompt, before continuing to the next prompt.

The appropriate time-out for TDE systems in the literature varies, ranging from as little as 3 seconds (Standards Australia/Standards New Zealand, 2003 - see Table 4), to 10-15 seconds (Dortmans & Angus, 2001), depending on the task being carried out. Note that these times may be extended after the first time-out is triggered.

The time-out for ASR systems is generally less than for TDE, as submitting a response verbally usually takes less time than entering a response using a keypad. For ASR, the time-out is usually about 2 seconds, ranging from around 1.5-3 seconds (Harris, 2005). Harris (2005) points out that "three seconds is a long time in a conversation, especially over the phone..." (p. 369), but acknowledges that some tasks may require a longer time-out.

When considering appropriate time-out values, you should take into account the task being performed, as well as the speaking speeds and response times of all users (including those with a disability). Questions that require a short answer will require a shorter time-out than questions requiring a longer answer, or an answer that requires the user to look up information (Standards Australia, 2005). Also, consider giving a longer time-out for the first few questions, and for questions that require a long string of digits to be entered (Phipps & Tupek, 1991).
Table 4: TDE time-out values suggested by Standards Australia

| Circumstance | Time-out value (seconds) |
| Simple task, such as making a selection from a basic menu (e.g. "To update your contact details, press 1") | 3-8 |
| Data input task involving looking up information prior to entry (such as retrieving an account code or credit card number): | |
| - Entry of initial digit | >10 |
| - Entry of subsequent digits | 3-8 |
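The time-out values can be kept in one table so that they are applied consistently. In the Python sketch below, the TDE ranges follow Table 4 and the ASR values follow Harris (2005). Using the low end by default and the high end for extended prompts (first few questions, long digit strings, after a first time-out) is an assumption, as are the 12 second figure chosen to satisfy the ">10" entry and the 15 second upper bound borrowed from Dortmans and Angus (2001).

```python
# (system, task) -> (default, extended) time-out in seconds.
# Illustrative picks within the published ranges; task names are hypothetical.
TIMEOUTS = {
    ("tde", "menu"): (3.0, 8.0),                  # Table 4: 3-8 s
    ("tde", "lookup_first_digit"): (12.0, 15.0),  # Table 4: >10 s
    ("tde", "lookup_next_digit"): (3.0, 8.0),     # Table 4: 3-8 s
    ("asr", "short_answer"): (2.0, 3.0),          # Harris (2005): about 2 s, up to 3 s
}


def choose_timeout(system: str, task: str, extended: bool = False) -> float:
    """Return the time-out for a prompt.

    Set extended=True for the first few questions, for long digit strings,
    or after a time-out has already occurred on this prompt.
    """
    default, longer = TIMEOUTS[(system, task)]
    return longer if extended else default


print(choose_timeout("tde", "menu"))                 # 3.0
print(choose_timeout("tde", "menu", extended=True))  # 8.0
```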

If a user does not enter a response in the time expected, the system should repeat the current message or menu. If a second consecutive time-out occurs, a rephrased or more detailed explanation of the required input should be provided. If another consecutive time-out occurs, one of the actions listed in the "Errors" section above (e.g. transferring the user to an operator) should be taken. Because silence may indicate that the user is not sure what to say, the rapid reprompt strategy may not be as effective for time-outs as it is for recognition errors (Cohen et al., 2004).
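The time-out escalation in the paragraph above can be expressed as a simple mapping from the number of consecutive time-outs to an action. In this Python sketch the action names are hypothetical; it deliberately has no rapid-reprompt step, since silence may mean the user does not know what to say.

```python
# Hypothetical helper; action names are illustrative labels only.
def timeout_action(consecutive_timeouts: int) -> str:
    """Action for the nth consecutive time-out at the same prompt (counting from 1)."""
    if consecutive_timeouts < 1:
        raise ValueError("consecutive_timeouts counts from 1")
    if consecutive_timeouts == 1:
        return "repeat_prompt"       # replay the current message or menu
    if consecutive_timeouts == 2:
        return "detailed_prompt"     # rephrase or explain the required input
    return "error_limit_action"      # e.g. transfer to an operator or close


print([timeout_action(n) for n in (1, 2, 3)])
# ['repeat_prompt', 'detailed_prompt', 'error_limit_action']
```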
References

Clayton RL & Winter DLS (1992). Speech data entry: Results of a test of voice recognition for survey data collection. Journal of Official Statistics, 8(3), 377-388.

Cohen MH, Giangola JP & Balogh J (2004). Voice user interface design. Addison-Wesley: Boston.

Couper MP (2002). New technologies and survey data collection: Challenges and opportunities. Paper presented at the International Conference on Improving Surveys, Copenhagen, August.

Couper MP, Singer E & Tourangeau R (2004). Does Voice Matter? An interactive voice response (IVR) experiment. Journal of Official Statistics, 20(3), 551-570.

Dortmans H & Angus I (2001). Stamp out Awful IVR! Telemanagement, Issue 189, October, 10-12.

Eagly AH (1983). Gender and social influence - A social psychological analysis. American Psychologist, 38, 971-981.

Electronic Commerce Best Practices (2000). A guide for taxing authorities. www.taxadmin.org/fta/edi/newecbp.html. Accessed 28 August 2007.

Harris RA (2005). Voice interaction design: Crafting the new conversational speech systems. Morgan Kaufmann: San Francisco.

Lewis A (2006). Using telephone data entry as an alternative to paper questionnaires. Office for National Statistics, United Kingdom.

Mundt JC, Searles JS, Perrine MW & Walter D (1997). Conducting longitudinal studies of behavior using interactive voice response technology. International Journal of Speech Technology, 2, 21-31.

Noonan T (2006). Building user-friendly voice systems. From: http://www.timnoonan.com.au/ivrpap98.htm (last updated January 2006).

Phipps PA & Tupek AR (1991). Assessing measurement errors in a touchtone recognition survey. Survey Methodology, 17(1), 15-26.

Schneider SJ, Cantor D, Heller TH & Brick PD (2002). Pretesting Interactive Voice Response / Automated Speech Recognition Surveys. Paper presented at the International Conference on Questionnaire Development, Evaluation and Testing Methods, Charleston, South Carolina, November 14-17.

Schumacher Jr. RM, Hardzinski ML & Schwartz AL (1995). Increasing the usability of interactive voice response systems: Research and guidelines for phone-based interfaces. Human Factors, 37(2), 251-264.

Standards Australia (2005). Draft for public comment. Interactive voice response systems user interfaces - Speech recognition. DR 05470. Available online at http://www.saiglobal.com/online/autologin.asp - direct hyperlink to standards is unavailable - search for keyword "DR 05470".

Standards Australia/Standards New Zealand (2003). Interactive voice response systems - User interface - Dual tone multi frequency (DTMF) signalling. AS/NZS 4263. Available online at http://www.saiglobal.com/online/autologin.asp - direct hyperlink to standards is unavailable - search for keyword "4263:2003".

Tourangeau R, Couper MP & Steiger DM (2003). Humanizing self-administered surveys: Experiments on social presence in web and IVR surveys. Computers in Human Behavior, 19, 1-24.

Withers S (2001). Voicenet. Marketing & eBusiness, February, 50-51.
