Lesson#30

EVALUATION-2

The aim of this lecture is to introduce you to the study of Human Computer Interaction,
so that after studying this you will be able to:

. Understand the DECIDE evaluation framework

30.1 DECIDE: A framework to guide evaluation

Well-planned evaluations are driven by clear goals and appropriate questions (Basili
et al., 1994). To guide our evaluations we use the DECIDE framework, which
provides the following checklist to help novice evaluators:
1. Determine the overall goals that the evaluation addresses.
2. Explore the specific questions to be answered.
3. Choose the evaluation paradigm and techniques to answer the questions.
4. Identify the practical issues that must be addressed, such as selecting participants.
5. Decide how to deal with the ethical issues.
6. Evaluate, interpret, and present the data.

Determine the goals

What are the high-level goals of the evaluation? Who wants it and why? An
evaluation to help clarify user needs has different goals from an evaluation to
determine the best metaphor for a conceptual design, or to fine-tune an interface, or to
examine how technology changes working practices, or to inform how the next
version of a product should be changed.
Goals should guide an evaluation, so determining what these goals are is the first step
in planning an evaluation. For example, we can restate the general goal statements
just mentioned more clearly as:

. Check that the evaluators have understood the users’ needs.

. Identify the metaphor on which to base the design.

. Check to ensure that the final interface is consistent.

. Investigate the degree to which technology influences working practices.

. Identify how the interface of an existing product could be engineered to improve
its usability.
These goals influence the evaluation approach, that is, which evaluation paradigm
guides the study. For example, engineering a user interface involves a quantitative
engineering style of working in which measurements are used to judge the quality of
the interface. Hence usability testing would be appropriate. Exploring how children
talk together in order to see if an innovative new groupware product would help them
to be more engaged would probably be better informed by a field study.

Explore the questions

In order to make goals operational, questions that must be answered to satisfy them
have to be identified. For example, the goal of finding out why many customers prefer
to purchase paper airline tickets over the counter rather than e-tickets can be broken
down into a number of relevant questions for investigation. What are customers’
attitudes to these new tickets? Perhaps they don't trust the system and are not sure that
they will actually get on the flight without a ticket in their hand. Do customers have
adequate access to computers to make bookings? Are they concerned about security?
Does this electronic system have a bad reputation? Is the user interface to the ticketing
system so poor that they can't use it? Maybe very few people managed to complete
the transaction.
Questions can be broken down into very specific sub-questions to make the evaluation
even more specific. For example, what does it mean to ask, "Is the user interface
poor?": Is the system difficult to navigate? Is the terminology confusing because it is
inconsistent? Is response time too slow? Is the feedback confusing or maybe
insufficient? Sub-questions can, in turn, be further decomposed into even finer-grained
questions, and so on.
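
The decomposition of a goal into questions and sub-questions can be pictured as a small
tree. The following is only an illustrative sketch: the class name Question and the
helper leaf_questions are assumptions introduced here and are not part of the DECIDE
framework. The question texts are taken from the airline e-ticket example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Question:
    """A question in the evaluation plan, with optional sub-questions."""
    text: str
    sub_questions: List["Question"] = field(default_factory=list)


def leaf_questions(q: Question) -> List[str]:
    """Return the most specific questions, i.e. those with no sub-questions."""
    if not q.sub_questions:
        return [q.text]
    leaves: List[str] = []
    for sub in q.sub_questions:
        leaves.extend(leaf_questions(sub))
    return leaves


# The goal sits at the root; each level below it is more specific.
goal = Question(
    "Why do many customers prefer paper tickets to e-tickets?",
    [
        Question("What are customers' attitudes to these new tickets?"),
        Question("Are they concerned about security?"),
        Question(
            "Is the user interface poor?",
            [
                Question("Is the system difficult to navigate?"),
                Question("Is the terminology confusing?"),
                Question("Is response time too slow?"),
            ],
        ),
    ],
)

# The leaves are the questions an evaluation technique has to answer directly.
for text in leaf_questions(goal):
    print(text)
```

Listing only the leaves is a design choice in this sketch: they are the questions
specific enough to be matched to a technique in the next step.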

Choose the evaluation paradigm and techniques

Having identified the goals and main questions, the next step is to choose the evaluation
paradigm and techniques. As discussed in the previous section, the evaluation
paradigm determines the kinds of techniques that are used. Practical and ethical issues
(discussed next) must also be considered and trade-offs made. For example, what
seems to be the most appropriate set of techniques may be too expensive, or may take
too long, or may require equipment or expertise that is not available, so compromises
are needed.

Identify the practical issues

There are many practical issues to consider when doing any kind of evaluation and it
is important to identify them before starting. Some issues that should be considered
include users, facilities and equipment, schedules and budgets, and evaluators'
expertise. Depending on the availability of resources, compromises may involve
adapting or substituting techniques.

Users

It goes without saying that a key aspect of an evaluation is involving appropriate
users. For laboratory studies, users must be found and screened to ensure that they
represent the user population to which the product is targeted. For example, usability
tests often need to involve users with a particular level of experience, e.g., novices or
experts, or users with a range of expertise. The number of men and women within a
particular age range, cultural diversity, educational experience, and personality
differences may also need to be taken into account, depending on the kind of product
being evaluated. In usability tests participants are typically screened to ensure that
they meet some predetermined characteristic. For example, they might be tested to
ensure that they have attained a certain skill level or fall within a particular
demographic range. Questionnaire surveys require large numbers of participants so
ways of identifying and reaching a representative sample of participants are needed.
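
Screening against predetermined characteristics can be sketched as a simple filter.
This is a minimal illustration under stated assumptions: the Candidate fields, the
"novice"/"expert" labels, the age range, and the sample pool are all hypothetical and
would be replaced by whatever characteristics matter for the product being evaluated.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    code: str         # participant code rather than a real name
    age: int
    experience: str   # "novice" or "expert" (hypothetical labels)


def screen(candidates: List[Candidate], min_age: int, max_age: int,
           experience: str) -> List[Candidate]:
    """Keep only candidates who meet the predetermined characteristics."""
    return [
        c for c in candidates
        if min_age <= c.age <= max_age and c.experience == experience
    ]


# Hypothetical pool of volunteers.
pool = [
    Candidate("P1", 23, "novice"),
    Candidate("P2", 41, "expert"),
    Candidate("P3", 35, "novice"),
    Candidate("P4", 67, "novice"),
]

selected = screen(pool, min_age=18, max_age=40, experience="novice")
print([c.code for c in selected])
```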

For field studies to be successful, an appropriate and accessible site must be found
where the evaluator can work with the users in their natural setting.
Another issue to consider is how the users will be involved. The tasks used in a
laboratory study should be representative of those for which the product is designed.
However, there are no written rules about the length of time that a user should be
expected to spend on an evaluation task. Ten minutes is too short for most tasks and
two hours is a long time, but what is reasonable? Task times will vary according to
the type of evaluation, but when tasks go on for more than 20 minutes, consider
offering breaks. It is accepted that people using computers should stop, move around
and change their position after every 20 minutes spent at the keyboard to
avoid repetitive strain injury. Evaluators also need to put users at ease so they are not
anxious and will perform normally. Even when users are paid to participate, it is
important to treat them courteously. At no time should users be treated
condescendingly or made to feel uncomfortable when they make mistakes. Greeting
users, explaining that it is the system that is being tested and not them, and planning
an activity to familiarize them with the system before starting the task all help to put
users at ease.

Facilities and equipment

There are many practical issues concerned with using equipment in an evaluation. For
example, when using video you need to think about how you will do the recording:
how many cameras and where do you put them? Some people are disturbed by having
a camera pointed at them and will not perform normally, so how can you avoid
making them feel uncomfortable? Spare film and batteries may also be needed.

Schedule and budget constraints

Time and budget constraints are important considerations to keep in mind. It might
seem ideal to have 20 users test your interface, but if you need to pay them, then it
could get costly. Planning evaluations that can be completed on schedule is also important,
particularly in commercial settings. There is never enough time to do
evaluations as you would ideally like, so you have to compromise and plan to do a
good job with the resources and time available.

Expertise

Does the evaluation team have the expertise needed to do the evaluation? For example,
if no one has used models to evaluate systems before, then basing an evaluation
on this approach is not sensible. It is no use planning to use experts to review
an interface if none are available. Similarly, running usability tests requires expertise.
Analyzing video can take many hours, so someone with appropriate expertise and
equipment must be available to do it. If statistics are to be used, then a statistician
should be consulted before starting the evaluation and then again later for analysis, if
appropriate.

Decide how to deal with the ethical issues

The Association for Computing Machinery (ACM) and many other professional organizations
provide ethical codes that they expect their members to uphold,
particularly if their activities involve other human beings. For example, people's
privacy should be protected, which means that their name should not be associated
with data collected about them or disclosed in written reports (unless they give
permission). Personal records containing details about health, employment, education,
financial status, and where participants live should be confidential. Similarly, it
should not be possible to identify individuals from comments written in reports. For
example, if a focus group involves nine men and one woman, the pronoun “she”
should not be used in the report because it will be obvious to whom it refers.
Most professional societies, universities, government and other research offices
require researchers to provide information about activities in which human
participants will be involved. This documentation is reviewed by a panel and the researchers
are notified whether their plan of work, particularly the details about how
human participants will be treated, is acceptable.
People give their time and their trust when they agree to participate in an evaluation
study and both should be respected. But what does it mean to be respectful to users?
What should participants be told about the evaluation? What are participants’ rights?
Many institutions and project managers require participants to read and sign an
informed consent form. This form explains the aim of the tests or research and promises
participants that their personal details and performance will not be made public and
will be used only for the purpose stated. It is an agreement between the evaluator and
the evaluation participants that helps to confirm the professional relationship that
exists between them. If your university or organization does not provide such a form
it is advisable to develop one, partly to protect yourself in the unhappy event of
litigation and partly because the act of constructing it will remind you what you
should consider.
The following guidelines will help ensure that evaluations are done ethically and that
adequate steps to protect users' rights have been taken.

. Tell participants the goals of the study and exactly what they should expect if
they participate. The information given to them should include outlining the
process, the approximate amount of time the study will take, the kind of data
that will be collected, and how that data will be analyzed. The form of the
final report should be described and, if possible, a copy offered to them. Any
payment offered should also be clearly stated.

. Be sure to explain that demographic, financial, health, or other sensitive information
that users disclose or that is discovered from the tests is confidential. A
coding system should be used to record each user and, if a user must be identified
for a follow-up interview, the code and the person's demographic details
should be stored separately from the data. Anonymity should also be promised
if audio and video are used.

. Make sure users know that they are free to stop the evaluation at any time if
they feel uncomfortable with the procedure.

. Pay users when possible because this creates a formal relationship in which
mutual commitment and responsibility are expected.

. Avoid including quotes or descriptions that inadvertently reveal a person's
identity, as in the example mentioned above, of avoiding use of the pronoun
"she" in the focus group. If quotes need to be reported, e.g., to justify conclusions,
then it is convention to replace words that would reveal the source
with representative words, in square brackets. Ask users' permission in
advance to quote them, promise them anonymity, and offer to show them a
copy of the report before it is distributed.
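
Two of the guidelines above, the coding system and the square-bracket convention for
quotes, can be sketched in a few lines. This is an illustration only: the function
names, the "P01" code format, and the sample names and quote are assumptions made
here, and a real study would follow its own institution's data-handling rules.

```python
import re
from typing import Dict, List, Tuple


def assign_codes(names: List[str]) -> Dict[str, str]:
    """Map each participant name to an opaque code such as 'P01'.

    This mapping is the sensitive part and should be stored separately
    from the evaluation data.
    """
    return {name: f"P{i:02d}" for i, name in enumerate(names, start=1)}


def code_sessions(sessions: List[Tuple[str, str]],
                  codes: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return session data keyed by code only, with names removed."""
    return [(codes[name], observation) for name, observation in sessions]


def mask_quote(quote: str, replacements: Dict[str, str]) -> str:
    """Replace identifying words with representative words in square brackets."""
    for word, generic in replacements.items():
        quote = re.sub(rf"\b{re.escape(word)}\b", f"[{generic}]", quote)
    return quote


# Hypothetical participants and observations.
codes = assign_codes(["Alice Example", "Bob Example"])
data = code_sessions(
    [("Alice Example", "completed booking"), ("Bob Example", "abandoned booking")],
    codes,
)
masked = mask_quote(
    "She said the Acme page was confusing",
    {"She": "The participant", "Acme": "airline"},
)
print(data)
print(masked)
```

Keeping the name-to-code mapping in its own structure mirrors the guideline that the
code and the person's details are stored apart from the data itself.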

The general rule to remember when doing evaluations is do unto others only what you
would not mind being done to you.
The recent explosion in Internet and web usage has resulted in more research on how
people use these technologies and their effects on everyday life. Consequently, there
are many projects in which developers and researchers are logging users' interactions,
analyzing web traffic, or examining conversations in chat rooms, bulletin boards, or
on email. Unlike most previous evaluations in human-computer interaction, these
studies can be done without users knowing that they are being studied. This raises
ethical concerns, chief among which are issues of privacy, confidentiality, informed
consent, and appropriation of others’ personal stories (Sharf, 1999). People often say
things online that they would not say face to face. Furthermore, many people are
unaware that personal information they share online can be read by someone with
technical know-how years later, even after they have deleted it from their personal
mailbox (Erickson et al., 1999).

Evaluate, interpret, and present the data

Choosing the evaluation paradigm and techniques to answer the questions that satisfy
the evaluation goal is an important step. So is identifying the practical and ethical
issues to be resolved. However, decisions are also needed about what data to
collect, how to analyze it, and how to present the findings to the development team.
To a great extent the technique used determines the type of data collected, but there
are still some choices. For example, should the data be treated statistically? If
qualitative data is collected, how should it be analyzed and represented? Some general
questions also need to be asked (Preece et al., 1994): Is the technique reliable? Will
the approach measure what is intended, i.e., what is its validity? Are biases creeping
in that will distort the results? Are the results generalizable, i.e., what is their scope?
Is the evaluation ecologically valid or is the fundamental nature of the process being
changed by studying it?

Reliability

The reliability or consistency of a technique is how well it produces the same results
on separate occasions under the same circumstances. Different evaluation processes
have different degrees of reliability. For example, a carefully controlled experiment
will have high reliability. Another evaluator or researcher who follows exactly the
same procedure should get similar results. In contrast, an informal, unstructured
interview will have low reliability: it would be difficult if not impossible to repeat
exactly the same discussion.
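
One common way to put a number on consistency is to repeat a measurement and
correlate the two sets of results. The text does not prescribe this method, so the
sketch below is an assumption: it computes a Pearson correlation by hand, and the task
times are hypothetical values invented for illustration.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long lists of measurements."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length, at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical task times (seconds) for the same five users on two occasions.
first_run = [62.0, 75.0, 58.0, 90.0, 70.0]
second_run = [60.0, 78.0, 55.0, 93.0, 69.0]

r = pearson_r(first_run, second_run)
print(f"test-retest correlation: {r:.2f}")
```

A value close to 1 suggests the procedure gives similar results on separate
occasions; how close is close enough is a judgment for the evaluation team.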

Validity

Validity is concerned with whether the evaluation technique measures what it is
supposed to measure. This encompasses both the technique itself and the way it is
performed. If, for example, the goal of an evaluation is to find out how users use a new
product in their homes, then it is not appropriate to plan a laboratory experiment. An
ethnographic study in users' homes would be more appropriate. If the goal is to find
average performance times for completing a task, then counting only the number of
user errors would be invalid.

Biases

Bias occurs when the results are distorted. For example, expert evaluators performing
a heuristic evaluation may be much more sensitive to certain kinds of design flaws
than others. Evaluators collecting observational data may consistently fail to notice
certain types of behavior because they do not deem them important.
Put another way, they may selectively gather data that they think is important.
Interviewers may unconsciously influence responses from interviewees by their tone
of voice, their facial expressions, or the way questions are phrased, so it is important
to be sensitive to the possibility of biases.

Scope

The scope of an evaluation study refers to how much its findings can be generalized.
For example, some modeling techniques, like the keystroke model, have a narrow,
precise scope. The model predicts expert, error-free behavior so, for example, the
results cannot be used to describe novices learning to use the system.
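
To make the narrow scope concrete, here is a minimal sketch of a keystroke-style
prediction. The operator times are commonly quoted averages for the keystroke-level
model and are treated here as assumptions, as are the operator letters and the sample
method. The function simply adds up fixed times, which is why it says nothing about
errors or learning.

```python
# Commonly quoted average operator times in seconds (assumed here; real
# values vary with the user and the device).
OPERATOR_SECONDS = {
    "K": 0.2,   # press a key or button
    "P": 1.1,   # point with a mouse at a target
    "H": 0.4,   # move hands between keyboard and mouse
    "M": 1.35,  # mentally prepare for an action
}


def predict_seconds(operators: str) -> float:
    """Sum operator times for an expert, error-free method, e.g. 'MHPK'."""
    return sum(OPERATOR_SECONDS[op] for op in operators)


# Hypothetical method: think, reach for the mouse, point at a field,
# click, return to the keyboard, then type four characters.
method = "MHPKH" + "K" * 4
print(f"predicted time: {predict_seconds(method):.2f} s")
```

Because every operator has one fixed time, the prediction cannot reflect a novice who
hesitates, makes mistakes, or explores the interface.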

Ecological validity

Ecological validity concerns how the environment in which an evaluation is
conducted influences or even distorts the results. For example, laboratory experiments
are strongly controlled and are quite different from workplace, home, or leisure
environments. Laboratory experiments therefore have low ecological validity because
the results are unlikely to represent what happens in the real world. In contrast,
ethnographic studies do not impact the environment, so they have high ecological
validity.
Ecological validity is also affected when participants are aware of being studied. This
is sometimes called the Hawthorne effect after a series of experiments at the Western
Electric Company's Hawthorne factory in the US in the 1920s and 1930s. The studies
investigated changes in length of working day, heating, lighting etc., but eventually it
was discovered that the workers were reacting positively to being given special
treatment rather than just to the experimental conditions.
