Positioning evaluation of online terminological information systems

Mari-Carmen Marcos et al.

Recommended citation: Mari-Carmen Marcos et al. Positioning evaluation of online terminological information systems [online]. "Hipertext.net", no. 4, 2006. <http://www.hipertext.net>

  1. Introduction
  2. Methodology of the study
  3. Choosing keywords for web positioning
  4. Analysis of web positioning factors
    4.1. Keyword position and frequency
    4.2. Metadata
    4.3. Popularity, anchor texts and visitor traffic
  5. Conclusions
  6. Final note
  7. References

 

1. Introduction

Today, the majority of websites are accessed through search engines. It is therefore fundamental that their managers and designers ensure that they are well positioned within the search results, from the point of view of marketing and in order to provide the best service to their users.

The position that a website takes in the list of results when a user carries out a search is thus a very relevant aspect for persons responsible for websites. The actions necessary for improvement are known as "web positioning" (Codina, 2004; Codina; Marcos, 2005; Arbildi, 2005). The centerpiece is asking yourself which words the potential users are likely to use in their searches; once this has been determined, you can make legitimate use of the optimization techniques so that a specific website appears in a good position when the users search for information related to the contents of that website.

The websites that host terminology databases can, and should, make use of positioning improvement techniques to help potential users find these resources using search engines. Our study took as a sample ten terminology databases that offer free online access, present multilingual data, and pertain to different thematic areas (Table 1).

Database                                 URL
Base de Terminologie                     http://www.cilf.org/bt.fr.html
Cercaterm                                http://www.termcat.net
Eurodicautom                             http://europa.eu.int/eurodicautom
Euskalterm                               http://www1.euskadi.net/euskalterm
OncoTerm                                 http://www.ugr.es/~oncoterm/
Terminobanque                            http://www.cfwb.be/franca/bd/bd.htm
TIS: Terminological Information System   http://tis.consilium.eu.int
UBTerm                                   http://www.ub.edu/slc/ubterm
UNTerm                                   http://unterm.un.org
WTOTerm                                  http://wtoterm.wto.org

Table 1. Terminology databases studied

In this article we present the major results found in our analysis of web positioning and the improvement proposals, which can be applied to other linguistic search tools such as online dictionaries, thesauri, etc. We employed an analysis methodology involving an empirical observation of the factors considered by search engines (see the following section) and a multivariate statistical analysis to relate the various aspects analyzed with the positioning obtained for each keyword.

 

2. Methodology of the study

To be easily found by search engines, websites should improve all of the aspects that these search engines consider when ranking the results. Without explaining in detail the techniques of website optimization, we will cite some of the factors that the search engines appear to be using in their rankings (Codina; Marcos, 2005):

  • Frequency: the rate of incidence (absolute and relative) of the search term in the web page (which we call the "keyword"), keeping in mind that abusive repetition will be considered spam by the search engines. In this study we chose three possible keywords for each website.

  • Position: the location of the term within the page. The metadata and first paragraph of a site are more heavily weighted than other parts of the page.

  • Metadata: besides the metadata in the head section (principally title, description and keywords), other tags also provide descriptive information used by search engines, such as the titles of links and images, as well as the alternative (alt) text of the images.

  • Popularity: the number of incoming links that a web page receives. This factor is related to the PageRank determined by the Google toolbar.

  • Anchors: the texts that serve as links to reach the web page.

  • Traffic: the number of visits that a web page receives, considering also the length of the visit. This is the basis of the TrafficRank in the Alexa toolbar.
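The frequency factor listed above can be made concrete with a small sketch. The function name `keyword_frequency`, the sample text, and the definition of relative frequency used here (the share of the page's words that belong to an occurrence of the keyword phrase) are assumptions for illustration only; search engines do not publish how they compute this factor.

```python
import re

def keyword_frequency(text, keyword):
    """Return (absolute, relative) frequency of a keyword phrase in a text.

    Absolute: number of times the phrase occurs.
    Relative: words covered by those occurrences divided by all words
    (an assumed definition, chosen only for this illustration).
    """
    words = re.findall(r"\w+", text.lower())
    phrase = re.findall(r"\w+", keyword.lower())
    if not words or not phrase:
        return 0, 0.0
    n = len(phrase)
    absolute = sum(
        1 for i in range(len(words) - n + 1) if words[i:i + n] == phrase
    )
    relative = absolute * n / len(words)
    return absolute, relative

# Hypothetical page text, not taken from any of the sites studied.
sample = "Terminology database for trade. This terminology database is free."
print(keyword_frequency(sample, "terminology database"))
```

A very high relative frequency would be a warning sign rather than a goal, since abusive repetition may be treated as spam.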

Of these factors, our study was fundamentally centered on metadata and popularity. The other factors were studied as well, but they did not present the results that would have helped us determine their importance in the positioning of the sites analyzed.

The study made use of two types of analysis. The first was an empirical descriptive analysis of the aspects that the web positioning literature indicates should be kept in mind. Based on the results obtained during this phase, we proposed and tested some hypotheses using an ANOVA statistical analysis, a multivariable analysis that studies how a dependent variable is affected by the values of some independent variables.

To apply the statistical analysis we used the program Statgraphics Plus 5.0. The dependent variable was the positioning of a website, and the independent variables were the websites studied, the search engines used, the keywords searched, and the title, description, and keywords that appear in the metadata. The analysis gives as a result the p-value, which is the empirical level of significance of a comparison of hypotheses. A p-value less than 0.05 rejects the null hypothesis (that the factor has no effect) in favor of the alternative hypothesis, while a value above 0.05 means that the null hypothesis cannot be rejected.
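As a rough illustration of this decision rule, the sketch below computes a one-way ANOVA F statistic by hand and estimates its p-value with a permutation test. This is not the procedure run in Statgraphics: the permutation test is substituted here for the F distribution so that the example needs only the Python standard library, the study's analysis involved several factors rather than one, and the three groups of scores are hypothetical values invented for the example.

```python
import random

def f_statistic(groups):
    """One-way ANOVA F: between-group variance over within-group variance."""
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

def permutation_p_value(groups, n_permutations=2000, seed=0):
    """Estimate the p-value as the share of random regroupings whose F
    is at least as large as the observed F."""
    rng = random.Random(seed)
    observed = f_statistic(groups)
    pooled = [v for g in groups for v in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if f_statistic(shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_permutations + 1)

# Hypothetical positioning scores for three keywords (not the study's data).
scores_by_keyword = [
    [100.0, 91.6, 100.0, 83.3, 91.6],
    [50.0, 41.6, 58.3, 33.3, 50.0],
    [25.0, 33.3, 16.6, 41.6, 25.0],
]
p_value = permutation_p_value(scores_by_keyword)
if p_value < 0.05:
    print("Reject the null hypothesis: the keyword affects positioning.")
else:
    print("The null hypothesis cannot be rejected.")
```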

 

3. Choosing keywords for web positioning

Before assessing the positioning factors in these websites, and given that this element constitutes the dependent variable of the statistical analysis, we compared the positioning results for three search terms for which we believed the websites should appear well positioned. We considered good positioning to be within the first ten results of the search engines that are currently widely used. We gave a high score (two points) to keywords for which the website appeared in the first or second position, a partial score (one point) if it appeared between the third and tenth positions, and no score (zero points) if it appeared after the first ten results.
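The scoring rule can be sketched as follows. The function names and the example positions are hypothetical. The normalization to a percentage of the maximum points over all search engines is an assumption, although it is consistent with values in Table 3 such as 91.6 (11 of a possible 12 points over six search engines).

```python
def points_for_position(position):
    """Return 2 for positions 1-2, 1 for positions 3-10, and 0 otherwise.

    `position` is the 1-based rank of the site in the result list,
    or None if the site was not found at all.
    """
    if position is None or position > 10:
        return 0
    return 2 if position <= 2 else 1

def positioning_score(positions):
    """Percentage of the maximum points over all engines (assumed normalization)."""
    max_points = 2 * len(positions)
    total = sum(points_for_position(p) for p in positions)
    return 100.0 * total / max_points

# Hypothetical positions of one site in six search engines for one keyword.
example_positions = [1, 2, 1, 5, None, 14]
print(round(positioning_score(example_positions), 1))  # 7 of 12 points: 58.3
```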

The search terms chosen were translated to the principal language of the interface of each terminological database using the following criteria:

  • Keyword 1 (KW1): the name of the database.

  • Keyword 2 (KW2): a phrase describing the type of terminological resource in each case.

  • Keyword 3 (KW3): a phrase describing a specific quality of the terminological resource in each case.

KW1                    KW2                                 KW3
Base de Terminologie   CILF terminologie                   Terminologie specialisée
Cercaterm              Base de dades terminològica         Terminologia catalana
WTOTerm                Terminology database                World trade terminology
Terminobanque          Banque de donées                    Terminologie spécialisée
Eurodicautom           Terminology database                Multilingual specialized terminology
Euskalterm             Banco terminológico                 Terminología euskera
TIS                    Terminological information system   Consilium terminological database
UBTerm                 Base de dades terminològica         Terminologia catalana
UNTerm                 Terminology database                United Nations terminology
OncoTerm               Base terminológica de oncología     Terminología oncológica

Table 2. Keywords searched using six search engines

In January of 2006 we performed the three searches in six of the search engines most widely used today (Google, Yahoo! Search, MSN Search, Altavista, Teoma and Vivísimo). The results obtained were very different for each of the keywords (Table 3):

  • The searches for KW1 were most likely to list the website in the first ten results, especially using Yahoo! Search and Google. This was not the case for Teoma and Vivísimo, which were only able to find these sites in 50% of the searches.

  • The results of KW2 differed in that two of the websites did not appear within the first ten results, and the others did not appear in the first positions. Google and Yahoo! Search positioned these websites higher than the other search engines. In contrast, Teoma only placed four of the ten websites studied within the first ten results.

  • The search for KW3 also showed different results from the previous ones: although similar to KW2 in that some of the search engines did not list these websites in the first ten results, in this case Teoma gave the best placement, followed by Vivísimo and Google.

Database               KW1     KW2     KW3     Average of each database
Base de Terminologie   50.0    100.0   100.0   83.3
Eurodicautom           100.0   50.0    100.0   83.3
TIS                    33.0    91.6    100.0   74.8
WTOTerm                100.0   25.0    91.6    72.2
Euskalterm             100.0   83.3    0.0     61.1
Cercaterm              91.6    0.0     50.0    47.2
OncoTerm               41.6    100.0   0.0     47.2
UNTerm                 91.6    25.0    16.6    44.4
UBTerm                 75.0    33.0    0.0     36.0
Terminobanque          100.0   0.0     0.0     33.3
Average of each KW     72.3    50.8    45.8    83.3

Table 3. Ranking of the databases based on their average positioning across the three keywords KW1, KW2, and KW3. The maximum value of 100 indicates a placement in the first two results. Intermediate values, near 50, indicate placement between three and ten, and a value of zero indicates that it was not in the first ten results.

The statistical analysis confirms that there is a significant difference in the positioning of the websites studied depending on the keyword used in the search; the positioning obtained using KW1 was much better than that obtained using KW2 and KW3 (Figure 1). This also shows that the use of one search engine or another does not cause significant differences in the positioning results, although Google, Yahoo! Search and Vivísimo place these websites higher than the other three search engines used.


Figure 1. Web positioning in relation to KW1, KW2 and KW3

 

4. Analysis of web positioning factors

On the basis of the factors indicated in Section 2, we present the most relevant results obtained through the empirical descriptive analysis and the statistical analysis.

4.1. Keyword position and frequency

Although we agree that these are important factors in improving web positioning, we decided not to consider them in this study for two reasons. First, the query interfaces of searchable terminological resources do not include enough text to be able to determine ideal frequencies. Second, except for the name of the database, the other keywords established (KW2 and KW3) tend not to appear in the systems we studied, and would not allow us to make comparisons.

4.2. Metadata

Regarding the metadata section of the websites studied, we found that 90% include information in the title tag, 40% in the keywords tag, and only 20% in the description tag (Table 4). We did not consider the image title and alt tags, as the number of such tags in the sites we studied was very low if they had any at all, and they would not affect the positioning of these websites.

Database               Title   Keywords   Description
Base de Terminologie   X       -          -
Cercaterm              X       X          -
Eurodicautom           X       X          -
Euskalterm             X       X          X
OncoTerm               X       X          -
Terminobanque          X       -          -
TIS                    X       -          X
UBTerm                 X       -          -
UNTerm                 -       -          -
WTOTerm                X       -          -

Table 4. Use of metadata in the websites of the databases studied
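A presence check like the one summarized in Table 4 could be automated along the following lines. This is a minimal sketch using Python's standard `html.parser` module; the class and function names and the sample HTML are hypothetical, and the check only tests whether each tag carries some text, not whether that text is well chosen.

```python
from html.parser import HTMLParser

class HeadMetadataParser(HTMLParser):
    """Collect the title text and the keywords/description meta contents."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.found = {"title": "", "keywords": "", "description": ""}

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True
        elif tag == "meta":
            attributes = dict(attrs)
            name = (attributes.get("name") or "").lower()
            if name in ("keywords", "description"):
                self.found[name] = (attributes.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.found["title"] += data.strip()

def metadata_presence(html):
    """Return True/False for each of title, keywords and description."""
    parser = HeadMetadataParser()
    parser.feed(html)
    return {field: bool(value) for field, value in parser.found.items()}

# Hypothetical page, not one of the sites studied.
sample_html = """<html><head>
<title>Example term bank</title>
<meta name="description" content="A multilingual terminology database">
</head><body></body></html>"""
print(metadata_presence(sample_html))
```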

In the statistical analysis, we found that the only metadata tag that significantly influenced the ranking of a website was the description tag (Figure 2), while the keywords tag did not have much effect on the results obtained (Figure 3). We were not able to assess the influence of the title tag, as nine out of the ten websites included it, and we did not have enough information to establish comparisons.


Figure 2. Web positioning depending on the presence (2) or absence (0) of the meta tag "description"


Figure 3. Web positioning depending on the presence (2) or absence (0) of the meta tag "keywords"

4.3. Popularity, anchor texts and visitor traffic

Popularity, understood as the number of incoming links that a website receives, can be determined partially by performing a search with the "link:" feature offered by some search engines. Here we show the values given by Yahoo! Search (Table 5), which are much higher than those given by Google.

Database               Incoming links (Yahoo! Search)
Eurodicautom           2,340
Cercaterm              1,290
Base de Terminologie   1,030
TIS                    458
Euskalterm             243
Terminobanque          163
WTOTerm                104
UNTerm                 79
UBTerm                 49
OncoTerm               9

Table 5. Database ranking by the number of incoming links a page receives according to Yahoo! Search

Comparing the results obtained in Tables 3 and 5, we observe a fairly direct relationship between the number of incoming links of each website and the average of the positioning results based on the three keywords. The databases that are cited most frequently are those that are best positioned for these search terms. At the other extreme are the sites UBTerm and UNTerm, which do not have many incoming links and are poorly positioned for the keywords. The case of Cercaterm, which is not well positioned but receives a lot of incoming links, is explained by the fact that most of the incoming links do not refer directly to the database, but rather to its institution (TermCat). To access the main page of the database the user has to enter information in the home page of the institution, making it difficult for the search engines to access it directly. In this case, unless the mode of accessing the database is changed, the home page of the institution should be optimized to improve the positioning in searches related to its terminological resource.
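The relationship between Tables 3 and 5 can also be checked numerically. The sketch below computes a Spearman rank correlation between the incoming links and the average positioning of each database, using the figures from the two tables. The choice of Spearman's coefficient is an assumption made for this illustration, since it only compares orderings and the link counts are very unevenly distributed; it is not a calculation reported in the study.

```python
def average_ranks(values):
    """Rank values from 1 (smallest), giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# (incoming links from Table 5, average positioning from Table 3)
databases = {
    "Eurodicautom": (2340, 83.3),
    "Cercaterm": (1290, 47.2),
    "Base de Terminologie": (1030, 83.3),
    "TIS": (458, 74.8),
    "Euskalterm": (243, 61.1),
    "Terminobanque": (163, 33.3),
    "WTOTerm": (104, 72.2),
    "UNTerm": (79, 44.4),
    "UBTerm": (49, 36.0),
    "OncoTerm": (9, 47.2),
}
links = [pair[0] for pair in databases.values()]
positioning = [pair[1] for pair in databases.values()]
print(round(spearman(links, positioning), 2))
```

A value close to 1 would indicate that databases with more incoming links also tend to be better positioned; outliers such as Cercaterm pull the coefficient down.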

As for the anchor texts used in these links, we reviewed the first five in the list of results offered by Google. We found that, besides KW1 (the name of the database), the keywords we chose in this study were not used. As such, this factor did not serve as a basis for making comparisons or assessing its relevance in web positioning.

We also recorded the PageRank value given by Google, although we concluded that it was not of interest to this study because the method of calculating this value is unclear. For the same reason we chose not to consider the TrafficRank given by Alexa, which furthermore is not specific to the web page of the database analyzed but rather to its home page.

 

5. Conclusions

Although search engine optimization techniques tend to fall under the category of marketing, we have a broader vision of this concept. We believe that it is not only a matter of publicity, but also of service to the users. Even services that are free to use, such as the databases we studied, should utilize the necessary (and legitimate) positioning techniques; this is the first step towards being usable.

As far as the methodology we employed, the empirical descriptive analysis proved useful in confirming that none of the websites studied has carried out a positioning campaign. In fact, in the majority of cases they have not even included metadata information. Of the three basic metadata tags that we mentioned, 'title' was the most frequent (used in 90% of the websites analyzed), followed by 'keywords' (40%) and finally by 'description' (20%). They have not followed a link policy to achieve the best visibility, neither for potential users nor for search engines. On the other hand, the multivariable statistical analysis (ANOVA) proved valuable in confirming the observations made and verifying hypotheses. The joint application of the two methodologies leads us to affirm that the metadata tag 'description' plays an important role in web positioning, while there is no such evidence in support of the 'keywords' tag (Figure 4). We assert that the use of metadata tags is not a decisive factor for web positioning, as seen by cases of well-positioned sites without this information. Thus we should not neglect the other factors that influence positioning, such as popularity.

In fact, we observed that there is a correlation between the popularity of the websites analyzed (the number of incoming links they receive) and their positioning for the keywords chosen. This demonstrates that the popularity of a site plays an important role in its placement among the results given by the search engines. Popularity is influenced by the appropriate use of anchor texts, which strengthen the keywords chosen by the institution for its website's positioning. In the case of the websites we studied no such policy has been followed.

Regarding the search engines used in this study, Google displayed the most regular behavior in its use of the metadata provided by websites, and Google, Yahoo! Search and Vivísimo are the three search engines that showed the best positioning for the keywords chosen for these ten websites. The majority of the search engines place the websites higher when the search term is very specific (KW1). The exceptions are Teoma and Vivísimo, which, on the contrary, position sites higher when using more general search terms (KW2 and KW3). For example, in the case of Eurodicautom, the website does not appear in the first results when KW3 is searched using Yahoo!, whereas in Teoma it is placed second.


Figure 4. Web positioning using the following six search engines for sites with the meta tag 'description' (red line) and without (blue line): Google (1), Yahoo! Search (2), MSN Search (3), Altavista (4), Teoma (5) and Vivísimo (6)

 

6. Final note

We would like to express our gratitude to the institutions that have participated in our study in some capacity:

The Institute for Applied Linguistics of the Pompeu Fabra University, for its support in the completion of this study and for the grants given to Albert Morales and Juan Manuel Pérez (full-time professor at the University of Antioquia, Colombia).

The Ministry of Education and Science (2004-2007), "Semantic web and document information systems" project number HUM2004-03162/FILO on which Mari-Carmen Marcos works as a researcher.

Fundación Caixa Galicia/Claudio San Martín, who collaborated in giving a grant to Paulo Malvar.

Program Alban (European Union High Level Scholarship Program for Latin America) which collaborated in giving grant number E05M059270MX to Fernanda López.

The Spanish Agency of International Cooperation of the Ministry of External Affairs, which gave a grant to Hajar Benmakhlouf.

The Ford Foundation, which gave a grant to Pedro Hernández.

 

7. References

Arbildi, Iñigo (2005). "Posicionamiento en buscadores: una metodología práctica de optimización de sitios web". El profesional de la información, v. 14, n. 2, pp. 108-124.

Azlor, S. (2003). Posicionamiento en buscadores: guía básica, < http://www.guia-buscadores.com/posicionamiento > [Retrieved: 11/04/06].

Bruce Clay Inc. Internet Business Consultants. Search engine optimization, < http://www.bruceclay.com/web_rank.ht > [Retrieved: 11/04/06]

Calishain, T.; Dornfest, R. (2004). Google: los mejores trucos. Madrid: Anaya.

Codina, Lluís (2004). "Posicionamiento web: conceptos y ciclo de vida". Anuario Hipertext.net, < http://www.hipertext.net > [Retrieved: 11/04/06].

Codina, Lluís; Marcos, Mari Carmen (2005). "Posicionamiento web: conceptos y herramientas". El profesional de la información, v. 14, n. 2, pp. 84-99.

Gonzalo, Carlos (2004). "La selección de palabras clave para el posicionamiento en buscadores: conceptos y herramientas de estudio" Anuario Hipertext.net < http://www.hipertext.net > [Retrieved: 11/04/06].

Kent, Peter (2004). Search engine optimization for dummies. Hoboken, Wiley.

Sullivan, Danny. Search Engine Watch (an authority on the subject), < http://www.searchenginewatch.com > [Retrieved: 11/04/06].

Thurow, Shari (2003). Search Engine Visibility. Indianapolis: New Riders.



Creative Commons License
Last updated 05-06-2012
© Universitat Pompeu Fabra, Barcelona