Usability evaluation of online terminology databases

Mari-Carmen Marcos et al.

Recommended citation: Mari-Carmen Marcos et al. Usability evaluation of online terminology databases [online]. "Hipertext.net", no. 4, 2006. <http://www.hipertext.net>

  1. Introduction
  2. Results of the heuristic evaluation
  3. Results of the user studies
    3.1. What the user does: the usability test
    3.2. What the user says: the interview and the questionnaire
  4. Conclusions
  5. Final note
  6. References


1. Introduction

In their day-to-day work with information retrieval tools, users are faced with systems that are difficult to understand and use. This is the case for linguists and translators, experts in language but not necessarily in the use of reference tools, which are often designed from the viewpoint of their developers without considering the end-users. This gap between developers and users leads to incorrect, or at least inefficient, use of electronic dictionaries, terminology databases, and other digital resources used by these professionals. In our daily work with linguists and translators we have found these tools difficult to use, and we believe this is because usability is not considered during design and development. There are as yet no definitive guidelines for the presentation of these types of tools, only a few general usability recommendations applied to online terminology databases, such as those proposed by Marcos and Gómez (2006).

In this article we present a study carried out between December of 2005 and March of 2006 by a group of researchers from the Pompeu Fabra University in which we established and applied a methodology to analyze the quality of terminology databases, with usability as the fundamental criterion. The ISO defines usability as the extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use (ISO 9241-11). This definition proves to be quite illuminating, as it divides usability into the different aspects to be considered: effectiveness, efficiency and satisfaction. It also makes a distinction with respect to the goal, the user, and the context of use.

Applied to the case of terminology databases, effectiveness refers to the capability of a system to offer the features for which it was designed. Efficiency relates to the effort required to use these features, and satisfaction, the most subjective aspect, is the feeling the user experiences during and after use of the system. From this explanation it follows that usability is not an absolute value intrinsic to a system, but rather a factor that can vary depending on the context. The ISO definition also clarifies that a product is designed for specified users and specified goals. A system cannot in itself be said to be usable or not; the variables mentioned must be specified.

In our case, we considered as prototypical users of terminology databases the two following profiles: on one side, language professionals (translators, terminologists, correctors, etc.) and on the other, specialists in a given field. In the case of language professionals, these types of resources are very useful tools when searching for conceptual equivalences in other languages, variants of a term, etc. In the case of specialists or trainees in a given field, the use of terminology databases can help them acquire or confirm the terminology that will advance their specialized knowledge. Once we had defined the potential users and features, we were able to begin to evaluate the system in terms of usability.

The usability of a website can be evaluated using various methods and techniques. We began with a heuristic evaluation, followed by a card sorting study, a user protocol test, an interview, a questionnaire, and a focus group. Here we will present the findings of some of the more significant tests. The reason for making a preliminary evaluation before testing with users was to familiarize ourselves with the website and make a detailed and ordered assessment of all of the aspects of interest to usability. This first element must necessarily be complemented by the user observation tests. Without such tests we would be left with biased conclusions about the real problems of the systems.

Our investigation took as a sample ten terminology databases that can be searched free of charge on the web. All of them offer multilingual search options and pertain to different thematic areas. Our work is based on the following databases:

  • Base de Terminologie

  • Cercaterm

  • Eurodicautom

  • Euskalterm

  • OncoTerm

  • Terminobanque

  • TIS: Terminological Information System

  • UBTerm

  • UNTerm

  • WTOTerm

2. Results of the heuristic evaluation

Among the usability evaluation methods that fall under the heading of "inspection" is the heuristic evaluation, which consists of having evaluators assess the interface against recognized usability principles (heuristics). The assessments are carried out individually, with each evaluator assuming the role of the user, and the evaluators are not allowed to exchange or combine results until the evaluation has been fully completed.

We undertook this part of the analysis taking as reference some of the more outstanding works on user-centered design (Simpson, 1985; Nielsen and Molich, 1990; Preece et al. 1994; Shneiderman, 1997) and in particular the heuristic principles set out by Jakob Nielsen (1994) and Bruce Tognazzini (2003). Taking into account the type of system under evaluation, we created a template that compiled analysis indicators relative to the following aspects: navigation, functionality, user control, language use, online help and user guide, system feedback, accessibility, consistency, error prevention and correction, and the architectural and visual clarity of the system.

To achieve a complete analysis with uniform parameters for all of the evaluators, we designed a questionnaire in the form of a template in which each of the variables could be assessed on the basis of three options: "yes, always", "sometimes", and "no, never", which were later scored as 2, 1, and 0, respectively (with 2 as the best score). Once the experts had given their responses, we recorded the positive, negative, and intermediate ("sometimes") answers. The responses of the evaluators were thus converted into numerical data (percentages) used to evaluate individually the usability aspects of each of the terminology databases analyzed in this study (Table 1).


Aspect                | Yes  | Sometimes | No
Navigation            | 54.2 | 37.5      | 8.1
Functionality         | 71.6 | 18.3      | 6.6
User control          | 37.5 | 62.4      | 0.0
Language and content  | 64.6 | 26.6      | 8.7
Online help           | 65.3 | 24.6      | 10.0
System feedback       | 24.9 | 69.9      | 5.0
Accessibility         | 47.3 | 45.3      | 7.3
Consistency           | 86.6 | 10.8      | 2.5
Error prevention      | 37.1 | 57.1      | 5.7
Architectural clarity | 65.5 | 25.0      | 10.8

Table 1. Percentages obtained for the various aspects evaluated.
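
As an illustration, the conversion from the evaluators' answers to the percentages of Table 1 can be sketched in a few lines of Python. This is only a minimal sketch of the procedure described above; the aspect name and the sample answers are hypothetical.

    from collections import Counter

    # Answers are scored as described above:
    # "yes, always" = 2, "sometimes" = 1, "no, never" = 0 (2 is best).
    SCORES = {"yes, always": 2, "sometimes": 1, "no, never": 0}

    def answer_percentages(answers):
        """Share (in %) of each answer type for one aspect."""
        counts = Counter(answers)
        total = sum(counts.values())
        return {label: 100 * counts[label] / total for label in SCORES}

    # Hypothetical answers from four evaluators for the "Navigation" aspect:
    navigation = ["yes, always", "sometimes", "yes, always", "no, never"]
    print(answer_percentages(navigation))
    # {'yes, always': 50.0, 'sometimes': 25.0, 'no, never': 25.0}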

In cases where an established variable was not applicable to a certain database, the percentages were calculated over the reduced number of applicable variables. The sum of the positive, negative, and intermediate values for each database also made it possible to compare the databases with each other and determine which are the most and least usable.
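
Assuming that non-applicable variables are simply excluded from the denominator, the comparison between databases could be computed as in the following sketch, where the database names and scores are hypothetical:

    # Per-variable scores (2 = yes, 1 = sometimes, 0 = no);
    # "n/a" marks variables that do not apply to a database.
    def usability_score(responses):
        applicable = [r for r in responses if r != "n/a"]
        return sum(applicable) / len(applicable)

    databases = {
        "DB-A": [2, 1, 2, "n/a", 0],
        "DB-B": [1, 1, 2, 2, 1],
        "DB-C": [0, "n/a", 1, 1, 2],
    }

    # Rank from most to least usable by the mean score
    # over the applicable variables.
    ranking = sorted(databases,
                     key=lambda db: usability_score(databases[db]),
                     reverse=True)
    print(ranking)  # ['DB-B', 'DB-A', 'DB-C']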

Following the analysis methodology outlined above, the resulting ranking (from most to least usable) is the following:

  1. Eurodicautom

  2. Euskalterm

  3. UBTerm

  4. WTOTerm

  5. Base de Terminologie

  6. Cercaterm

  7. Terminobanque

  8. TIS: Terminological Information System

  9. UNTerm

  10. OncoTerm

The general overview that we gained from the analysis of these websites shows that all of the databases need improvement, for example in the application of web standards to the source code of the sites. None of the interfaces indicates that its source code has been validated by the W3C. Every site available on the Internet should conform to web standards, and even more so those websites that intend to offer a quality service to a varied group of users. As we stated above, terminology databases are used by a wide variety of users from very diverse workstations. We believe that code validation for these interfaces should be made a priority.

We also observed a very limited implementation of the mechanisms that improve the accessibility of the database user interfaces. Very few of the systems we evaluated systematically present alternative text for the images present in the interface. The same can be said of the titles and alternative texts of links.

The use of icons and visual metaphors in this type of interface should always assist in identifying and rapidly accessing the most common functions of the database. In this way the site will be able to achieve the highest levels of satisfaction among the individuals that use these systems to assist in their work: translating, writing, validating terminology, etc.

In spite of the fact that the level of satisfaction of the different databases we analyzed tended to have more to do with the content of the database than the way in which the data were accessed, we believe that a successful system of this type should combine both elements. Access to the content of a database will always be tied to the manner in which the content is organized.

Taking Nielsen's heuristics as a model and analyzing the results obtained through the expert evaluation, we conclude that terminological information retrieval systems should give special attention to the following aspects:

  1. Navigation. The load on the user's memory when querying a database should be minimized; it is always better to recognize than to remember. A well-designed navigation system aids information retrieval in these databases.

  2. Functionality. The functions should be explicitly stated in the description of the database: languages treated, profiles of targeted users, thematic areas covered, etc.

  3. User control. The users should feel that they control the tool and that they are free to navigate within it. It is very important that the interface and method of information retrieval be flexible depending on the user's familiarity with the database.

  4. Language and content. The information conveyed in the form of text should be understandable to the targeted users of the application, typically linguists in a broad sense (translators, correctors, technical writers, etc.) or specialists in a specific thematic area.

  5. Online help. The system should incorporate means for recognizing, diagnosing, and solving problems.

  6. System feedback. The users should be aware at all times whether the interface allows searches or simply displays results, so that they always know what actions are possible.

  7. Accessibility. The database should consider accessibility guidelines, both for users with disabilities as well as those with technological limitations.

  8. Consistency. The pages within a website should follow the same criteria in terms of graphic design (use of color, fonts, etc.), position of the elements on the pages, means of operation, etc. Web pages that conform to the W3C standards already meet this criterion, at least in part.

  9. Error prevention. The system should be designed to avoid user errors and should employ necessary means to keep users from committing them.

  10. Architectural clarity. The user interface should follow clear and minimalist design principles that help users find information quickly.

3. Results of the user studies

The evaluation of a product is not complete without performing tests with the end-users. Many authors have considered how to carry out such tests when dealing with information retrieval systems. In particular we have followed Su (1992, 1998) and Johnson, Griffiths and Hartley (2003), as well as Morgan (2006), Marcos and Cañada (2003) and Granollers, Lorés and Cañas (2005) for their techniques in working with users. Once we had received the evaluator reports we began the user tests, hoping to detect more possible problems in the use of terminology databases and at the same time to compare the results of the expert evaluation with the observations of the users. For the study of each website we chose 6 individuals: 3 experts in linguistics and 3 non-experts in the field. All of the users had to have a basic knowledge of Internet use, and none of them could have used the database before, or at most only sporadically.

3.1. What the user does: the usability test

The technique that yielded the most results was the usability test. It consisted of a set of concrete tasks that the users performed while saying out loud what they were doing and thinking, as an observer took notes. In this case we did not use any recording devices. We prepared a series of 8 tasks and carried out the tests individually with the 6 users in sessions of 40 minutes. The tasks were adapted for each database, as not all of the systems provided the same features (adaptable sections are marked in italics):

  1. Visit the website and find out which source and target languages are available.

  2. You live in France and the doctor has diagnosed you with amygdalite pultacée. Find the equivalent term in Spanish.

  3. Search for the collocations allowed by the word bolsa ('bag') in a technical field and in different languages.

  4. Find out in which other fields the term bolsa is used and which words other languages use to name the different types of bolsas.

  5. Search for the term repetición ('repetition') in the fields of sports and computing. (The objective of this task is for the user to try to restrict the search fields.)

  6. Find information about bacterias ('bacteria'). (In this case we were not looking for equivalent words but rather information about a theme.)

  7. Search for the words with the radical econom- in the different languages offered by the system.

  8. Send your opinion about the search service on this website to the webmaster.

We gave the tasks that were not completed a score of 0, the tasks completed with difficulty a score of 1, and the tasks easily completed a score of 2. Table 2 shows which tasks were performed most successfully in the user test, combining expert and non-expert results.

Task | Type of task                                            | Score
5    | Searching for a term in two fields                      | 80.55
1    | Clarity of the presentation of information (languages)  | 78.35
3    | Searching for collocations                              | 66.65
4    | Clarity of results                                      | 65.75
8    | Feedback                                                | 62.95
2    | Searching for a term                                    | 61.65
7    | Searching for a lemma                                   | 50.95
6    | Searching for non-linguistic information                | 50.85

Table 2. Scores of task completion expressed in percentages.
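
If we assume that each task's percentage is the sum of the six users' scores divided by the maximum attainable score (2 points per user), the figures of Table 2 can be reproduced with a sketch like this one; the individual scores shown are hypothetical:

    def task_percentage(scores):
        """Share of the maximum attainable score (2 per user), in percent."""
        return 100 * sum(scores) / (2 * len(scores))

    # Hypothetical scores of the 6 users (3 experts + 3 non-experts) on one
    # task: 0 = not completed, 1 = completed with difficulty, 2 = easy.
    task_scores = [2, 2, 1, 2, 1, 2]
    print(f"{task_percentage(task_scores):.2f}")  # 83.33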

Although we will not detail the problems that users encountered in completing the tasks, we did detect a difference in the way that experts and non-experts in terminology used the systems (Figure 1). The tasks were completed with scores ranging from 50.85% to 80.55%.


Figure 1. Results of the user test with experts and non-experts in terminology

3.2. What the user says: the interview and the questionnaire

Once the tests had been carried out, the observers interviewed each one of the users. The interview helped to raise subjective issues that did not always appear during the user test and let us know what the user experienced during the test. The interview was further supported by a questionnaire that users were to complete outside of the test site and return at a later date. In these tasks the evaluators raised questions intended to refine the results obtained during the user test relating to the usefulness, ease of use, speed, effectiveness and satisfaction of the system.

The aspects rated most highly were the ease of use of the databases, the clarity of vocabulary and of explanatory texts, and the speed of use of the resource. In contrast, the aspects which were given the lowest scores were the clarity of icons, menu texts, and structure of the pages, and finally the usefulness of the help page (Table 3).

Aspect evaluated            | Score
Speed                       | 68.45
Ease of use                 | 67.95
Satisfaction                | 56.60
Effectiveness               | 55.65
Usefulness of the help page | 33.50

Table 3. Results of the interview in percentages.

Analyzing the results generally, we observe that the speed and ease of use are ranked highly, while the effectiveness and usefulness of the help page are the aspects most in need of improvement.

The majority of the users acknowledged that if they had used the database outside of work they would not have spent as much time and energy on the completion of the tasks, and that they would have resorted to other terminology databases or different types of linguistic tools. It is very revealing that the difficulty in using a system is directly related to the perception that a system is not useful. In other words, the usefulness is doubted when the usability is poor.

The most pronounced differences between the two types of users, expert and non-expert, are found in the aspects relating to speed and satisfaction. While the non-expert users felt satisfied with the databases analyzed, especially in terms of the results obtained, for the expert users the element of satisfaction received some of the lowest scores. In contrast, with respect to the speed, the expert users gave very high scores while the non-experts rated this aspect poorly. We believe that this is due to the fact that the non-expert users, lacking experience in the use of terminology databases, are slower in performing the tasks; nonetheless, they feel satisfied with the results obtained because they are unable to judge whether they could have been better.


Database             | User test | Interview | Average score | Difference
Cercaterm            | 73.95     | 83.35     | 78.65         | 9.40
Euskalterm           | 73.95     | 81.90     | 77.92         | 7.95
Base de Terminologie | 77.40     | 53.55     | 65.47         | 23.85
TIS                  | 72.90     | 55.20     | 64.05         | 17.70
Eurodicautom         | 63.90     | 58.35     | 61.12         | 5.55
Terminobanque        | 64.30     | 55.20     | 59.75         | 9.10
WTOTerm              | 48.95     | 63.55     | 56.25         | 14.60
OncoTerm             | 58.35     | 51.20     | 54.77         | 7.15
UBTerm               | 53.15     | 50.00     | 51.57         | 3.15
UNTerm               | 52.40     | 45.85     | 49.12         | 6.55

Table 4. Scores (in percentages) of the databases in the user tests and interviews.

Some of the interview results may seem surprising when compared with those of the user tests (Table 4). Some of the databases, such as Cercaterm and Euskalterm, are consistently among the best; others, specifically Base de Terminologie, TIS and WTOTerm, are not rated in the same way in both. The differences between the task completion scores and the degree of satisfaction expressed in the interview (as well as in the questionnaire) lead us to believe that the latter is very subjective, and that users do not always respond in a way consistent with how they performed the tasks. For this reason, the user test remains an indispensable technique.
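
Under the assumption that the "Average score" column of Table 4 is the plain mean of the user-test and interview percentages and that "Difference" is their absolute gap, the derived columns can be recomputed from the table's own figures:

    # (user test %, interview %) per database, taken from Table 4.
    results = {
        "Cercaterm": (73.95, 83.35),
        "TIS": (72.90, 55.20),
        "WTOTerm": (48.95, 63.55),
    }

    for db, (test, interview) in results.items():
        average = (test + interview) / 2
        difference = abs(test - interview)
        print(f"{db}: average {average:.2f}, difference {difference:.2f}")
    # Cercaterm: average 78.65, difference 9.40
    # TIS: average 64.05, difference 17.70
    # WTOTerm: average 56.25, difference 14.60

For entries whose mean falls exactly on a half cent (e.g. Euskalterm's 77.925), the table appears to truncate rather than round.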


Figure 2. Overall results obtained from the user tests


4. Conclusions

According to the analysis methodology presented, we obtained the following classification (from most to least usable) from the various tests carried out:

Database             | Heuristic evaluation | User tests | Difference
Euskalterm           | 2                    | 2          | 0
Eurodicautom         | 1                    | 5          | 4
Cercaterm            | 6                    | 1          | 5
Base de Terminologie | 5                    | 3          | 2
WTOTerm              | 4                    | 6          | 2
UBTerm               | 3                    | 8          | 5
OncoTerm             | 10                   | 4          | 6
TIS                  | 8                    | 7          | 1
Terminobanque        | 7                    | 9          | 2
UNTerm               | 9                    | 10         | 1

Table 5. Comparison of the classification resulting from the expert evaluation and the user tests.

As Table 5 shows, the results of performing a heuristic evaluation and those of working directly with the end-users present some differences. The average difference seen in the comparison above was 2.8 places in the classification. One of the most outstanding cases was OncoTerm, which was ranked fourth according to the user tests but last in the heuristic evaluation. Three further cases, at the opposite extreme, were UBTerm, ranked third in the heuristic evaluation and eighth by the users; Cercaterm, occupying the top position according to the user tests but sixth according to the expert evaluation; and Eurodicautom, ranked first by the heuristics and fifth by the users. Except for these four cases, the rest of the databases show similar classifications, with a difference of no more than two places.
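
Assuming the "Difference" column of Table 5 is the absolute displacement of each database between the two rankings, the comparison can be summarized with a short sketch (the rankings are those reported above):

    # Rankings from most to least usable, as reported in this article.
    heuristic = ["Eurodicautom", "Euskalterm", "UBTerm", "WTOTerm",
                 "Base de Terminologie", "Cercaterm", "Terminobanque",
                 "TIS", "UNTerm", "OncoTerm"]
    user_tests = ["Cercaterm", "Euskalterm", "Base de Terminologie",
                  "OncoTerm", "Eurodicautom", "WTOTerm", "TIS",
                  "UBTerm", "Terminobanque", "UNTerm"]

    # Absolute displacement of each database between the two rankings.
    displacement = {db: abs(heuristic.index(db) - user_tests.index(db))
                    for db in heuristic}

    mean_shift = sum(displacement.values()) / len(displacement)
    print(displacement["OncoTerm"], mean_shift)  # 6 2.8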

We highlight the following as conclusions of the study:

  • Usability evaluation is a crucial step in the website design process. Only the application of methodologies with this specific focus allows serious usability errors to be corrected before this type of product is made public. Evaluation, as an intermediate step in all design processes, is thus a quality control that lets us release usable products confidently.

  • Terminology databases are products that will be employed by real users with different needs and in the most diverse circumstances. As such, the users must be involved in the design process.

  • Some of the problems detected in the heuristic evaluation did not surface during the user tests, in part because of the lack of a real context and a natural, concrete need, and in part because the prestige of the authors of a resource causes some problems to be disregarded. The opinion of the non-expert users differs even more from what is observed in the test. Their inability to discern the quality of the results leads them to think that the system has performed correctly, which is not always the case. Likewise, some of the features of the system cannot be easily accessed due to usability problems.

  • Websites that provide terminological search tools should offer their interface in each of the source and target languages handled by the database.

  • The information should be presented clearly, using a vocabulary that will not cause confusion.

  • Those responsible for online terminological tools should pay attention to the principles of usability. Not doing so results in a negative perception of the resource's content on the part of the users.

In general we assert that terminology databases, in spite of being accessible on the web, still show some of the characteristics common to information retrieval systems that predate the web, such as the complexity of making queries or the lack of flexibility in the presentation of results. Today's users have adapted their mental model to the new reference tools and are less willing to put up with difficulties, because the web offers many alternatives to which they will turn without hesitation should they encounter problems of comprehension or flexibility, or should the resources prove not to be usable.


5. Final note

We would like to express our gratitude to the institutions that have participated in our study in some capacity:

The Institute for Applied Linguistics of the Pompeu Fabra University, for its support in the completion of this study and for the grants given to Albert Morales and Juan Manuel Pérez (full-time professor at the University of Antioquia, Colombia).

The Ministry of Education and Science, for the project "Semantic web and document information systems" (2004-2007, project number HUM2004-03162/FILO), on which Mari-Carmen Marcos works as a researcher.

Fundación Caixa Galicia/Claudio San Martín, who collaborated in giving a grant to Paulo Malvar.

The Alban Program (the European Union High Level Scholarship Program for Latin America), which collaborated in giving grant number E05M059270MX to Fernanda López.

The Spanish Agency for International Cooperation of the Ministry of Foreign Affairs, which gave a grant to Hajar Benmakhlouf.

The Ford Foundation, which gave a grant to Pedro Hernández.


6. References

Abadal Falgueras, E. (2002). Elementos para la evaluación de interfaces de consulta de bases de datos web. El Profesional de la Información, 11, 349-360.

Borgman, C. (2001). Evaluating digital libraries for teaching and learning in undergraduate education: a case study of the Alexandria Digital Earth ProtoType (ADEPT), Library Trends, 49, 228-50.

Cabré, T., Codina, L., y Estopà, R. (eds.) (2001). Terminologia i documentació. Barcelona: Institut Universitari de Lingüística Aplicada.

Espelt, C. (1998). Improving subject retrieval: user-friendly interfaces and effectiveness. BiD: Biblioteconomia i Documentació, 1, <http://www.ub.es/biblio/bid/01espel1.htm> [Retrieved: 04/04/06]

Floria, A. (2000). Evaluación Heurística [online]. <http://www.entrelinea.com/usabilidad/inspeccion/Heur.htm> [Retrieved: 04/04/06]

Granollers, T., Lorés, J., y Cañas, J. J. (2005). Diseño de sistemas interactivos centrados en el usuario. Barcelona: UOC.

ISO (1997). ISO 9241: Ergonomic Requirements for Office Work with Visual Display Terminals. Geneva: International Organization for Standardization.

Johnson, F. C., Griffiths, J. R., y Hartley, R. J. (2003). Task dimensions of user evaluations of information retrieval systems. Information Research, 8 (4), paper 157, <http://informationr.net/ir/8-4/paper157.html> [Retrieved: 04/04/06]

Knapp, A. et al. (2002). La experiencia del usuario. Madrid: Anaya Multimedia.

Krug, S. (2001). No me hagas pensar: una aproximación a la usabilidad. Madrid: Pearson Educación.

Manchón, E. (2002). ¿Qué es la usabilidad? Definición [online]. <http://www.ainda.info/que_es_usabilidad.htm> [Retrieved: 04/04/06]

Manchón, E. (2002). Evaluación por criterios o heurística [online]. <http://www.ainda.info/evaluacion_heuristica.html> [Retrieved: 04/04/06]

Manchón, E. (2002). Principios generales de usabilidad en sitios web [online]. <http://www.ainda.info/principios_generales.html> [Retrieved: 04/04/06]

Marchionini, G. (1995). Information Seeking in an Electronic Environment. Cambridge: Cambridge University Press.

Marcos, M. C. (2004). Interacción en interfaces de recuperación de información: conceptos, metáforas y visualización. Gijón: Trea.

Marcos, M. C.; Gómez, M. (2006). Idoneidad de las interfaces de léxicos y terminologías en la web. GLAT 2006: Aspects méthodologiques pour l'élaboration de lexiques unilingues et multilingues. Bertinoro (Italy), 17-20 May.

Marcos, M. C.; Cañada, J. (2003). Cómo medir la usabilidad: técnicas y métodos para evaluar el uso de sitios web. In C. Rovira y Ll. Codina (dirs.), Documentación digital. Barcelona: Sección Científica de Ciencias de la Documentación, Departamento de Ciencias Políticas y Sociales, Universidad Pompeu Fabra.

Morgan, E. L. et al. (2006). User-centered design. In Designing, Implementing, and Maintaining Digital Library Services and Collections with MyLibrary, chapter V, <http://dewey.library.nd.edu/mylibrary/manual/ch/pt05.html> [Retrieved: 04/04/06]

Nielsen, J. (1994). Heuristic evaluation. In J. Nielsen & R. Mack (eds.), Usability Inspection Methods. New York: John Wiley & Sons.

Nielsen, J. (2002). How to Conduct a Heuristic Evaluation [online]. <http://www.useit.com/papers/heuristic/heuristic_evaluation.html> [Retrieved: 04/04/06]

Nielsen, J. (2002). Ten Usability Heuristics [online]. <http://www.useit.com/papers/heuristic/heuristic_list.html> [Retrieved: 04/04/06]

Nielsen, J. (2000). Usabilidad. Diseño de sitios web. Madrid: Prentice Hall.

Nielsen, J., & Molich, R. (1990). Heuristic evaluation of user interfaces, Proc. ACM CHI'90 Conf. (Seattle, WA, 1-5 April), 249-256.

Preece, J. et al. (1994). Human-Computer Interaction. Harlow: Addison-Wesley.

Shneiderman, B. (1997). Designing the User Interface. 3rd ed. Reading, MA: Addison-Wesley.

Simpson, H. (1985). Design of User-Friendly Programs for Small Computers. New York: McGraw-Hill.

Su, L. (1992). Evaluation measures for interactive information retrieval. Information Processing and Management, 28 (4), 503-516.

Su, L. (1998). Value of search results as a whole as the best single measure of information retrieval performance. Information Processing and Management, 34 (5), 557-579.

Tognazzini, B. (2003). First principles of interaction design [online]. <http://www.asktog.com/basics/firstPrinciples.html> [Retrieved: 04/04/06]


