Alicia García de León; Jorge Caldera Serrano
Recommended citation: Alicia García de León; Jorge Caldera Serrano. Internal Site Searches: An approach to quality criteria [online]. "Hipertext.net", no. 5, 2007. <http://www.hipertext.net>
1. Introduction: information retrieval on the Web
2. Internal website information retrieval
2.1. User behaviour when in a website
2.1.1. Hierarchical navigation
2.1.2. The use of internal and local site searches
2.1.3. Hierarchical navigation versus internal searches
3. Internal Site Searches
4. Possible solutions for including an internal site search
5. Factors to consider relative to the quality of an internal site search
5.1. Factors to consider when choosing a search engine
5.2. Factors to consider when installing an internal site search
5.2.1. Presentation and location
5.2.2. The access and user interface
5.2.3. Results presentation
5.2.4. Help, support documentation and tool information
Every day we witness the growth in the volume of information on the Web. Retrieving pertinent information online poses enormous challenges, which reflect the exponential growth of scientific and technical information and its cumulative nature. Human reading capacity is limited: users can only read what fits their needs, especially when faced with a volume of information that surpasses anyone's ability to take it in and understand it.
There are many reasons for this difficulty, including the way information is presented on the Web. There is no generally adopted standard for creating websites; there is an enormous number of them; most are not structured; and they are developed without any central governing body or regulating mechanism.
Both website creators and users lack a critical perspective, a common problem for young tools that have developed quickly, growing rapidly without pausing to study and reflect on that growth.
Meanwhile, the ease of publishing online, a basic advantage of the tool, is at the same time detrimental to its quality and to proper retrieval, since it allows sites on the Web to contain documents of dubious rigour that usually lack standardisation.
After an initial period of relentless growth, and even in spite of that growth, considerable efforts have been made by different agents (website and tool creators, information professionals and users) to facilitate the retrieval of pertinent content on the Web.
Among these efforts we can distinguish:
the constant improvement of search tools and algorithms;
strategies for optimising positioning in these tools' responses;
general and specialised directories that gather and select Web information, sometimes with added value;
different metadata structures that aim to add information on a website attributable to its text;
guidelines for standardising websites;
quality criteria lists;
growing interest in institutional, university and research fields for a greater optimisation.
Just as efforts have been made for proper information retrieval on the Web, we should work towards guaranteeing that, once on a website, access to the information is fluid and efficient.
What happens when the user is faced with a website? How does s/he procure the information s/he needs?
A user arrives at a website either from knowing the address (URL); through a link provided by another source; or from the information provided by a search engine or directory. It could be a homepage or a site's secondary page, but once there: How does a user find the information required in that structure?
There are various positions on how the user behaves when looking for information within a website.
Some researchers claim that users prefer to surf within the site rather than use an internal search engine; others consider that users are inclined to use internal search engines.
Researchers such as Jared M. Spool, drawing on user behaviour studies, describe users' behaviour within a website as hierarchical navigation.
The methodological rigour of Spool's research was questioned by his peers. We must also understand that his research was based on the behaviour of users on commercial websites, so we cannot generalise it as universal behaviour. Other users have different attitudes, demands, strategies and behaviours.
Whether it prevails or is in the minority, hierarchical navigation is one behaviour users can adopt when visiting a website in search of specific information.
A website should be a structure of information. This view must be present at its conception and creation. Effective "internal" navigation mechanisms must be planned in advance with the aim of optimising access to the information.
A suitable information structure is as important as the content itself. It is not enough that the information exists; it must be organised so as to facilitate access and be presented as a whole. Reading on the Web is not sequential, and adequate forms of reading must be provided for direct and useful browsing.
The concept of hierarchical navigation presupposes that the information presented in a website is structured and hierarchical, which is not always the case. Tools like sitemaps are often impossible to create because the site lacks an organisation that can be expressed graphically.
There are different resources in a website that facilitate internal navigation:
Site menus with keys or links;
Side bar with links;
Icons that orientate a user within the whole site and within each page.
Those who maintain that this type of navigation predominates attribute it to the poor quality of internal search engines.
Also on the basis of behavioural studies, Jakob Nielsen and other researchers consider that more than half of website users use the internal search engine, one fifth use hierarchical navigation and the rest use a combination of behaviours.
This view, held by Eduardo Manchón among others, considers that "the majority of users receive their first impressions of websites through internal searches," and therefore "search engines are crucial tools" for users to find the information they demand.
They also maintain that users usually resort to search tools because of the lack of structure in the majority of websites.
However, we cannot deny that the mere existence of an internal search engine does not guarantee effective information retrieval. Users do not always find user-friendly internal search tools or proper help; at times they do not use adequate terms in their searches, or they do not know how to formulate query expressions that bring them closer to their aims.
As we have seen, researchers disagree about which behaviour predominates when users look for information.
Hierarchical navigation and the use of internal search engines are not opposites, but complementary. Their use depends both on the user's characteristics and experience and on the circumstances of each search and the type of content being sought.
The quality of a website is sustained by the development of a good structure, which must always be present. Philippe Vanhoolandt, in his article "Les moteurs de recherche internes", proposes this vision: "In other words, the existence of an internal search engine does not exempt the webmaster from an in-depth analysis of the structure and tree of the documents."
Moreover, the presence of an internal search engine is a great support that allows easy and fast access to any specific content. When a navigation system is badly structured, following the navigational logic becomes almost impossible. In these situations, it is essential that information retrieval is facilitated not through the structure's logic or the sitemap, but through an internal site search.
Therefore, there is a key difference that helps us understand this debate: hierarchical navigation allows us to surf the structure and search for specific information within the pages of a website, whereas an internal search engine finds the occurrences of a term or set of terms. As a consequence, each behaviour provides a beneficial yet different product, of a different nature and for different aims. One type of behaviour does not substitute for the other.
We consider a search engine to be any computer programme that indexes and stores information after examining a public domain website. With this material it generates databases that permit word searches for information retrieval.
Just as there are Web search engines for the whole public Internet, search engines that index and store only a single website's information are called internal or local site searches.
A website's internal site search is a tool to help site users quickly find the pages with specific words in a limited environment.
Some authors consider that an internal search engine is required when a website has more than 150 pages; others set the cut-off at 200 pages. Toni Vicens Arasanz states: "its usability significantly conditions a website's total usability, and should be paid special attention to."
The existence of an internal search is indicative of the website's quality and of its creators' effort and rigour. As Vicens claims, it has an "emotional impact" on users, who experience its presence as a token of professionalism, user-friendliness and confidence.
The quality of a search engine is what makes it a success; just having one is not enough. A simple, deficient and improvised solution may be a reason for users to give up. This is how Vanhoolandt interprets the results of various search behaviour studies: "Contrary to a common belief, the presence of a search engine on a site does not necessarily make access to information easier... The reasons? Mainly typing errors, unfamiliarity with the query language (operators, truncation, differences between plural and singular), the little interest paid to online help and the lack of context in the results displayed."
Many users judge the quality of a website's content by the results returned for their search queries. If a website's search is not adequate, the user's first impression of the site will be negative, and they will stop using it.
An internal search engine may also help creators to evaluate and better understand information retrieval on their site. As a consequence, it can become a turning point for making changes in content and design, based on an analysis of the search engine's transactions in which queries are recorded for later study.
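As a minimal sketch of the kind of query analysis described above, the following Python fragment counts the most frequent queries and the queries that returned nothing. It assumes that the search tool records each query together with the number of results it produced; the log format, the function name `summarise_query_log` and the sample data are hypothetical and are not taken from any particular search engine.

```python
from collections import Counter


def summarise_query_log(log, top=3):
    """Summarise a log of (query, result_count) pairs.

    Returns the most frequent queries and the most frequent queries
    that produced no results at all.
    """
    queries = Counter()
    zero_hits = Counter()
    for query, result_count in log:
        # Normalise case and spacing so that "Enrolment" and "enrolment " match.
        normalised = " ".join(query.lower().split())
        queries[normalised] += 1
        if result_count == 0:
            zero_hits[normalised] += 1
    return queries.most_common(top), zero_hits.most_common(top)


# Hypothetical log entries: (query typed by the user, number of results).
sample_log = [
    ("Enrolment", 12),
    ("enrolment", 12),
    ("library hours", 4),
    ("scholarships", 0),
    ("scholarships ", 0),
    ("timetable", 0),
]
top_queries, failed_queries = summarise_query_log(sample_log)
```

Frequent queries with no results are a useful signal: they may point to content that is missing from the site or that is described with terms the users do not employ.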
When including an internal site search on a website, Vanhoolandt distinguishes between three possible options:
using an internal site search tool supplied by those who host the website;
using the remote indexing services provided by many organisations. Here all activities are performed externally to the site, which becomes a visited object;
installing a tool in the server that will be controlled by the site administrators.
The first solution involves contracting a web hosting service that provides quality service at a reasonable cost. This generates a dependency on the service providers, and the suitability and features of these tools vary. If the tool does not fulfil the expectations of the site creators or the users, it can serve as a temporary solution.
The second solution requires external indexing. For this, a schedule of site visits must be agreed upon with an external search engine running on another server.
An interface is created on the site for searching the database. The query is formulated on the indexed site and launches a search of the externally created and hosted database. The user receives the response and can interact with the search, knowing that it is operated externally. The provider of this kind of service is called an Application Service Provider (ASP).
This option has some advantages:
many services are free of charge;
some services are provided by well known companies and organisations;
it can be a high quality tool;
these tools have their years of experience backing them up;
there is a large body of critique for these services;
large installations or configurations are avoided;
the access interface may be locally customised;
the human and material resources required are limited;
time is saved by the administrators who transfer the activities;
the updates and maintenance are provided by the service providers;
they do not take up space on the website's server.
However, there are some disadvantages and difficulties:
one depends on an external server and shares its fate;
the tool may be only partly customised;
the interface may be only partly customised;
the service provider's demands may vary;
the updates may not fit the site's features.
Some organisations use more than one type of indexing. They use these external services alongside an internal search hosted on the site itself.
Figure 1. The World Meteorological Organisation's (WMO) website, where two types of indexing are offered http://www.wmo.ch/index-sp.html
The third possible solution requires the installation of a search engine within the site, controlled by the administrators themselves. This requires extra effort and an investment of time, but at the same time it allows one to develop custom solutions. In this article we will focus mainly on this last option.
Without a doubt, the factors discussed below may also be useful when choosing between the first two options.
Hassan Montero states: "The level of utility and need for complexity on an internal search varies widely from site to site."
Each website is different, with its own mission and specific audience. It is not an infinite amount of power or unlimited storage capacity that makes a good search. Instead, it depends on its audience's needs. The selection and implementation of an internal site search should pay special attention to this aspect.
For the sake of categorising our research, we distinguish between:
factors to consider relative to the programme's quality as a tool allowing for an internal search. We will call this programme a search engine or custom software.
factors to consider when installing the search engine.
Many of these aspects belong to both groups or, while assigned to one group, may also appear in the other.
Cost: This is one of the first points to consider. We must know whether we can and want to spend money, understand the payment options, consider free-of-charge or free (open-source) software, and analyse the pros and cons of an institutional contract with an existing or desired corporate service.
Technical requirements: It is important to understand the programme's technical requirements, in terms of the operating system and the equipment, installation and maintenance. If using free software, one must be aware of the advantages as well as the commitment involved in developing this type of software.
Support: It is important to know whether or not it has the support of an institution or company that guarantees responsibility, new versions, maintenance and updates. Depending on an organisation that is cut short and does not follow the evolution of these tools could be a very serious mistake.
Customer service: Part of a tool's support is customer service. One must find out whether it offers good customer service or tools for communicating with users or between users themselves, such as mailing lists, blogs or news services. Customer service includes manuals and help along with updates.
Installation complexity: One must know how much work is involved in installing the programme chosen, and the resources it requires in terms of both people and time.
Ability to customise: A programme's ability to be customised is key, including whether it can fulfil the requirements of the website's users.
Robustness: This term, from Serrano Cobos and Quintero Orta, describes the tool's ability to produce the fewest possible errors.
Default features: One should know its functions, which features come with it and the possibility of customising them to other areas, including:
does it provide an interface capable of simple and advanced searches;
are Boolean functions permitted;
are other functions permitted;
are proximity operators permitted;
are truncation and stemming permitted;
can synonyms and polysemies be excluded;
can compound terms be specified (with quotes or literal phrases);
are foreign language characters recognised (e.g. "ñ" or "ç");
does it handle clustering;
can uppercase and lowercase letters be distinguished;
can "empty words" (stop words) be excluded; is a stop-word list incorporated and adaptable, or can one be created;
are all file types indexed (.txt, .doc, .html or .pdf), and what must be done to index them;
can scoped searches be performed;
can searches be limited by fields;
can searches be limited by the document's date?
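To make some of the features in this checklist concrete, the sketch below builds a very small index in Python and shows stop-word exclusion, case and accent folding, right truncation with `*` and a Boolean AND over the query terms. It is only an illustration under our own assumptions: the stop-word list, the `*` truncation symbol, the sample pages and all function names are hypothetical, and a real search engine will implement these features differently. Note that folding accents also turns "ñ" into "n", which a site in Spanish may prefer to avoid.

```python
import re
import unicodedata
from collections import defaultdict

# Illustrative stop-word list; a real installation would adapt it to the site.
STOPWORDS = {"the", "of", "and", "a", "in", "for"}


def fold(text):
    """Lowercase and strip accents so 'Información' matches 'informacion'."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def tokenize(text):
    """Split text into folded words, leaving out stop words."""
    return [t for t in re.findall(r"\w+", fold(text)) if t not in STOPWORDS]


def build_index(pages):
    """Map each term to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for term in tokenize(text):
            index[term].add(url)
    return index


def lookup(index, term):
    """Exact match, or right truncation when the term ends in '*'."""
    term = fold(term)
    if term.endswith("*"):
        prefix = term[:-1]
        hits = set()
        for indexed_term, urls in index.items():
            if indexed_term.startswith(prefix):
                hits |= urls
        return hits
    return set(index.get(term, set()))


def search_all(index, query):
    """Boolean AND: return pages containing every non-stop-word query term."""
    terms = [t for t in query.split() if fold(t) not in STOPWORDS]
    if not terms:
        return set()
    results = lookup(index, terms[0])
    for term in terms[1:]:
        results &= lookup(index, term)
    return results


# Hypothetical pages of a small site.
pages = {
    "/admissions": "Información for the admission of students",
    "/library": "Library information and opening hours",
    "/sports": "Student sports clubs",
}
index = build_index(pages)
```

With this index, `search_all(index, "inform*")` retrieves both the admissions and the library pages, while `search_all(index, "student* sports")` retrieves only the sports page.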
We must also know:
The number of pages that can be indexed, along with the time indexing takes, its frequency and the complexity of the operation.
Regarding the ranking algorithm(s): how are term occurrences weighted for positioning purposes? For example, are the results ordered by the number of occurrences of the term(s), or are they weighted according to their location in the document?
If indexing metadata, must specific standards be followed?
Can images be indexed?
Does it have a spell-checker? Users often make spelling mistakes when writing their queries, and it is useful to help them by suggesting another term.
Can the option of viewing other formats like PDF in html be included?
Do the references offer the option to view similar pages?
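The spelling assistance mentioned in this list can be sketched with Python's standard `difflib` module, which offers approximate string matching. We assume here that the vocabulary is simply the list of terms found in the site's index; the function name `suggest`, the similarity cut-off of 0.75 and the sample vocabulary are our own choices for illustration.

```python
import difflib


def suggest(term, vocabulary, cutoff=0.75):
    """Return the indexed term closest to `term`, or None.

    None is returned both when the term is already in the vocabulary
    and when nothing in the vocabulary is similar enough.
    """
    if term in vocabulary:
        return None
    matches = difflib.get_close_matches(term, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Hypothetical vocabulary taken from the site's index.
vocabulary = ["library", "enrolment", "scholarship", "timetable"]
```

A query for "libary" would then lead to the suggestion "library", which the results page can offer to the user as an alternative term.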
It is important to keep in mind: How are the search results presented? What does the default results page look like? Can this page be adapted according to the user's demands? What does each returned reference look like? What information is included? Does it indicate which words were included in the query? By what criteria are the results ordered? Are there different positioning options?
Regarding the search interface, it is best if it can be designed freely and if plenty of help is offered to the user (in their native language if possible), along with the option of refining a search from the results.
The quality of an internal site search does not exclusively depend on the programme or search engine selected.
The decisions adopted when installing are key for the tool to be successful. The use of a quality programme does not always necessarily correspond to the creation of a quality search. That is, the same programme can give way to very different products.
Work must be done to suit the programme to the site's aims and the user's needs.
It is best if the internal search appears on the website's homepage or initial access.
Since its mission is to give the user a better orientation of the site's structure, the user must know it exists and be able to find it easily. It is not recommended to have a search and not display it clearly, or force people to look for it.
Within the page, the ideal position is in an upper hierarchical area such as: the upper-right corner, or mid-upper area.
Figure 2. The internal site search in Alicante University's homepage http://www.ua.es/es/index.html
It is convenient to choose a standard location for the search, facilitating it being found, and allowing the user to associate the tool with a physical location, thus reinforcing its corporate visual image.
Some sites do not place a "search box" on the homepage but a link to another page with the search, for example:
The magnifying glass icon in the homepage's navigation menu indicates the access to the internal site search at the Carlos III University of Madrid. http://www.uc3m.es/
This icon provides access to the search interface on the homepage of the 4th Pan American Health Library Sciences Congress http://www.bireme.br/crics4w/progtx.htm.
However, users do not always enter a website through its homepage, so it would be best for the internal site search to be available on all pages of the website. That is, it is important that internal searches can be made from any location within the structure.
For example, in the World Bank's website, access to the internal search is in the footer of each page.
The access interface is the system of commands and menus through which the users communicate with the programme. It is the structure that makes communication possible, informing us what actions are possible, the current state of the object and the changes produced, allowing us to act with or on the tool. It is the border or common area between both fronts: the search engine and the user's searches.
Larson defines it as "one of the most important components of any computational system, since it works as the link between the human and the machine. The user interface is a set of protocols and techniques for exchanging information between the computing application and the user."
The access interface is the user's first contact with the tool, and should present it to the user. If the interface is not suitable, the capabilities of the tool or search engine are not properly presented, and its other features (advanced searches, results list, error messages, help documents) are silenced and their potential is lost.
1. Location, identification and clarity
It is not only important that an internal site search exists, and that it can be located easily, it is important to identify it as such.
Yusef Hassan Montero states: "A search must present itself in a standard format so that the user, with his/her experience surfing the web, can perceive its function clearly..."
A basic usability standard is that a tool should have a simple and clear interface that users can easily identify. The most standard format for internal site searches is a text field with a button alongside it reading "Search," or other similar formats.
Unclear words such as "go", "trouver" or "allons-y" should not be used, except together with the words "find" or "search".
2. The size of the box
The text box should be large enough to see the words entered in the search, or at least part of them. This will enable the user to see any possible spelling mistakes or typos.
Internal site search text box from the French online newspaper Libération http://www.liberation.fr/
Figure 4. Advanced search page for Bologna University (Italy) http://www.unibo.it/, with a larger text box.
A larger area makes it possible to verify the query and revise it. This directly influences the quality of the results obtained, and may diminish the level of disappointment often associated with these experiences.
Wider text boxes are especially useful for advanced searches, as on Bologna University's page.
3. Simple search
Researchers in this field maintain that the user's first contact with this tool should be a simple, direct and intuitive search.
The majority of users do not use operators when formulating queries, nor do they attempt to constrain them to a part of the website. Upon first contact, they also do not attempt to limit the documents by date, language or type of file.
These options should be provided as an advanced search; a complex interface at first contact would discourage users and harm future visits.
As Steve Krug states: "... Why doesn't the search engine take what I have introduced and present me with its results without first asking me to define what I am looking for, without having to specify what I am looking for...?"
Users feel comfortable with simple searches, which does not mean that more complicated and precise searches shouldn't be available too.
Figure 5. Montreal University's site provides the advanced search from the simple search: http://www.umontreal.ca/
Searches by sections, or scoped searches, allow users to limit their query to a specific area of the site. This is something that the user may not know how to handle, overwhelming him/her on their first contact with the website, where they do not know how the information structure works and its different sections or areas. On the other hand, it is possible that they want to know the existence of content beyond its areas and sections.
Figure 6. The search page on Madrid's Complutense University webpage http://www.ucm.es/info/ucmp/index.php provides the option to limit the simple search by areas.
In a similar fashion, some sites present a search by area from the simple search, finding it useful and pertinent to its content and users.
Following the same idea of always presenting a simple form, providing access to an Internet search from the internal site search is deemed inappropriate.
4. Advanced search
As we have already mentioned, it is not recommended that the advanced search be presented at first contact with an internal site search. That is, the first contact should be through the simple search.
The advanced search should be an option, usually on a new page which can be accessed from the initial presentation interface, as well as on the page with the simple search results.
An advanced search must show users all the possible ways of limiting and specifying the query. This implies the use of operators and of limiting fields such as date, section within the website, language, type of file, etc.
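The limits an advanced search offers can be thought of as optional filters applied to the pages retrieved. The following Python sketch assumes that each indexed page carries its section, file type, language and last modification date; the `Page` record, the function `apply_limits` and the sample data are hypothetical and serve only to illustrate the idea.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Page:
    url: str
    section: str
    file_type: str
    language: str
    modified: date


def apply_limits(pages, section=None, file_type=None, language=None,
                 modified_since=None):
    """Keep only the pages that satisfy every limit the user has set.

    A limit left as None is ignored, so a call without limits behaves
    like a simple search and returns every page.
    """
    kept = []
    for page in pages:
        if section is not None and page.section != section:
            continue
        if file_type is not None and page.file_type != file_type:
            continue
        if language is not None and page.language != language:
            continue
        if modified_since is not None and page.modified < modified_since:
            continue
        kept.append(page)
    return kept


# Hypothetical pages retrieved by a query.
pages = [
    Page("/library/hours.html", "library", "html", "en", date(2005, 11, 2)),
    Page("/library/rules.pdf", "library", "pdf", "en", date(2004, 3, 15)),
    Page("/admissions/guia.pdf", "admissions", "pdf", "es", date(2005, 9, 1)),
]
recent_pdfs = apply_limits(pages, file_type="pdf",
                           modified_since=date(2005, 1, 1))
```

Because every limit is optional, the same function can serve both the simple search (no limits) and the advanced search (any combination of limits).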
The interface must clearly provide all of the necessary elements for formulating and operating a complex search. It is not used by the majority of users, who usually do not have the experience or knowledge of its capabilities.
The advanced search interface should be on the same page as the help page, helping the user and facilitating the dialogue through its design. Ergonomic considerations must especially be kept in mind. If the design is inappropriate, all of the search's capabilities, the search engine's power and the efforts in installing it would be lost.
Since the user does not have to be an expert in formulating Boolean searches, some interfaces substitute words for the operators. For example, the operator "and" is replaced by "all words."
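The substitution of plain words for Boolean operators can be sketched as a simple mapping from interface labels to matching rules. In the Python fragment below only the label "all words" comes from the text above; the labels "any word" and "none of the words", the function name `matches` and the word-splitting rule are our own assumptions.

```python
def matches(text, words, mode="all words"):
    """Check a page's text against the query words.

    The interface labels stand in for the Boolean operators:
    "all words" for AND, "any word" for OR and
    "none of the words" for NOT.
    """
    tokens = set(text.lower().split())
    wanted = [word.lower() for word in words]
    if mode == "all words":
        return all(word in tokens for word in wanted)
    if mode == "any word":
        return any(word in tokens for word in wanted)
    if mode == "none of the words":
        return not any(word in tokens for word in wanted)
    raise ValueError(f"unknown search mode: {mode}")


page_text = "Library opening hours and loan rules"
```

The user chooses a label from a menu and never has to type an operator, while the tool still applies the corresponding Boolean logic.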
The advanced search interface is associated with a search's help page. A help text can be created on a separate page, or help can be provided on the interface's graphics. Many searches provide both forms. A help text is provided and specific help is provided on the interface's functions. In both cases, the texts should be brief, simple, clear and with plenty of examples. Access to the help must be clearly presented.
It is important for the advanced search pages to clearly indicate that a simple search is available, with access to it. We must also remember that the user may first come across the advanced interface, and s/he should know that other search options are available.
Ergonomic and technical specifications for user/computer dialogue systems should always be kept in mind, both in advanced search pages and in all of the website's pages. For example: the contrast between the colour of the letters and the background; font; margins and clear spacing; the existence of a single corporate visual identity; the presentation of a homogeneous and constant image (always using the same icons in the same place), facilitating users' associations and favouring quick downloading by taking advantage of the browser's cache.
The results page of a search is as important as the access interface. This page presents the tool's product.
It is important that it presents the total number of items retrieved and the query expression that generated this result. It should be clear and contain the necessary information so that users can easily decide whether they are interested in the results or not. It should be neither so extensive as to overwhelm nor so short as to limit conclusions. When nothing can be retrieved, messages such as "No results found" should be presented. These messages should be accompanied by a proposal for a new query.
Figure 7. Example of the information provided when no results are found on Spain's Rediris webpage http://rediris.es
Even when results exist, many search engines let the user refine the search, presenting, along with the list of references, a new text box with the previous query (which can be edited), while other searches even suggest new terms or spelling corrections.
2. Results content and presentation
In comparison with search engines that try to retrieve information from the whole Web, an internal search may offer greater clarity, both in presenting the results and in ordering them. The key is that, being responsible for creating the webpages included in the search engine, we can design them so that they can be read and evaluated by the search tool used. In return, the tool selected should accommodate the unique aspects of the site's documents without straining their structure.
This is the stage at which the decisions made when choosing a search engine (discussed in section 5.1) show their effect, and at which it becomes evident whether the users' requirements were kept in mind when installing and designing the search. To the extent that we control the documents, we can choose meaningful titles and use metadata with pertinent descriptions.
We see here what data each reference can present:
Title (which is usually the document link)
Last modification date
The section the document is in
The level of pertinence the search engine gives it with regards to the query
Similar documents, sometimes using "more pages like this."
The custom description taken from the Meta Description in the metadata is an enormous help to the quality of each reference. A custom-made text cannot be replaced by the first lines of a page or by random lines of the complete text.
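The preference for a custom description over the first lines of a page could be sketched as follows, using the `html.parser` module from Python's standard library. The class `DescriptionParser`, the function `reference_summary` and the 160-character limit for the fallback text are assumptions made for this example.

```python
from html.parser import HTMLParser


class DescriptionParser(HTMLParser):
    """Collect the meta description and the visible text of a page."""

    def __init__(self):
        super().__init__()
        self.description = None
        self.text_parts = []
        self._skip = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if tag == "meta" and name == "description":
            self.description = attributes.get("content")
        elif tag in ("script", "style", "title"):
            # Text inside these elements is not part of the page body.
            self._skip = True

    def handle_endtag(self, tag):
        if tag in ("script", "style", "title"):
            self._skip = False

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.text_parts.append(data.strip())


def reference_summary(html_text, max_length=160):
    """Prefer the author's description; fall back to the first words."""
    parser = DescriptionParser()
    parser.feed(html_text)
    if parser.description:
        return parser.description.strip()
    return " ".join(parser.text_parts)[:max_length]


with_meta = (
    '<html><head><title>Admissions</title>'
    '<meta name="description" content="How to apply for undergraduate courses.">'
    '</head><body><p>Welcome to our site.</p></body></html>'
)
without_meta = (
    '<html><head><title>Admissions</title></head>'
    '<body><p>Welcome to our site.</p><p>Menu</p></body></html>'
)
```

For the first page the reference shows the text written by the author; for the second it can only show "Welcome to our site. Menu", which tells the user very little about the content.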
Many searches allow a user to select the way of viewing them. Some of them let you choose between complete or abbreviated forms.
Figure 8. Example of the long RAUdo reference list in the RAU search engine (http://www.rau.edu.uy/raudo/buscador.htm), including: pertinence indicator (#), title, the page's first words, URL, last update and page size.
Figure 9. Example of the short RAUdo reference list. Here the RAU search engine only presents the title, with a link to the long form.
It is useful that the terms in the query equation appear highlighted in each reference.
Figure 10. Reference from the internal site search at McGill University, Canada http://www.mcgill.ca/, where the search term is highlighted.
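Highlighting the query terms in each reference can be sketched in a few lines of Python. We assume that the reference text is plain text to be shown inside an HTML results page and that `<strong>` is used for emphasis; both are choices made for this example, as is the function name `highlight`.

```python
import html
import re


def highlight(snippet, query_terms):
    """Wrap each whole-word occurrence of a query term in <strong> tags.

    The snippet is treated as plain text and escaped for HTML output.
    """
    terms = [re.escape(term) for term in query_terms if term]
    if not terms:
        return html.escape(snippet)
    pattern = re.compile(r"\b(" + "|".join(terms) + r")\b", re.IGNORECASE)
    # With one capturing group, split() alternates plain text and matches.
    pieces = pattern.split(snippet)
    output = []
    for position, piece in enumerate(pieces):
        safe = html.escape(piece)
        if position % 2 == 1:
            output.append(f"<strong>{safe}</strong>")
        else:
            output.append(safe)
    return "".join(output)


marked = highlight("Library opening hours & services", ["library", "hours"])
```

The original capitalisation of the text is preserved, so "Library" is highlighted even though the user typed "library".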
3. Results order criteria
The ordering criteria add quality to the result, since they are directly related to pertinence and relevance. If the first references in the results are not very meaningful, the user will get a negative impression of the tool: without the option of viewing more results, the user gives up on the query, and often on the search engine too. Here we see the results of the choice of search engine and its installation. There are different types of ordering:
by term occurrence;
by term occurrence with respect to total terms on a page;
by term location in specific document areas;
by last update;
by algorithms combining these and other aspects.
Many search engines allow users to select the order priority once the query has been launched.
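The ordering criteria listed above can be compared with a small Python sketch. It assumes that each document is a record with a URL, a title, a body and a last update date, and that an occurrence in the title is worth five occurrences in the body; this weight, the record layout, the criterion names and the sample documents are all hypothetical.

```python
import re
from datetime import date


def tokens(text):
    return re.findall(r"\w+", text.lower())


def occurrences(doc, term):
    """Number of times the term appears in the body."""
    return tokens(doc["body"]).count(term)


def density(doc, term):
    """Occurrences with respect to the total number of words in the body."""
    words = tokens(doc["body"])
    return words.count(term) / len(words) if words else 0.0


def location_weighted(doc, term, title_weight=5):
    """Occurrences in the title count more than occurrences in the body."""
    return title_weight * tokens(doc["title"]).count(term) + occurrences(doc, term)


def rank(docs, term, criterion):
    """Return the URLs ordered by the chosen criterion, best first."""
    term = term.lower()
    if criterion == "last_update":
        key = lambda doc: doc["updated"]
    else:
        scorer = {
            "occurrences": occurrences,
            "density": density,
            "location": location_weighted,
        }[criterion]
        key = lambda doc: scorer(doc, term)
    return [doc["url"] for doc in sorted(docs, key=key, reverse=True)]


# Hypothetical documents retrieved for the query "library".
docs = [
    {"url": "/a", "title": "Campus news",
     "body": "library library library closed today for works on the roof and other things",
     "updated": date(2005, 1, 10)},
    {"url": "/b", "title": "Library",
     "body": "opening hours of the library",
     "updated": date(2005, 6, 1)},
    {"url": "/c", "title": "Contact",
     "body": "library desk",
     "updated": date(2005, 12, 1)},
]
```

Each criterion places a different document first for the same query: raw occurrences favour `/a`, density favours `/c`, location favours `/b` and the last update favours `/c`. This is why the choice of criterion, or of the algorithm that combines them, deserves attention when selecting and installing the tool.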
We have already mentioned the role of help in a search. Besides textual and graphic help on the interface, there are other possible ways: introductory texts and search assistance.
It is important that these texts remain brief and clear, easy to find and with examples.
Figure 11. Internal site search help page on the Centre Hospitalier Universitaire de Rouen (CHU de Rouen) webpage http://www.chu-rouen.fr/. Information is provided from the simple search box on the site's homepage.
Other forms of assistance include messages provided throughout the operation, such as error messages, suggestions or alternative text on keys and icons. The tool's quality is also shown when a search engine provides technical information such as the name and version of the programme used, the names of those involved in creating it and even information on design decisions. Other data concerning maintenance, like the regular update schedule and the date of the last update, give the user an indication of its quality.
We are dealing with a vital tool for optimising internal information retrieval in a website. Because of its unquestionable impact, it deserves to be thought of as a crucial, high quality project. Since we are responsible for creating the pages and content in our website, we are responsible for designing ways for this information to be accessible.
This involves joining the ideas of "finding webpages" and "search engine" and making sure that the mechanisms we provide to view our content do them justice.
This joining must be facilitated by two factors:
Since we create our pages, we can find them, and customise them to some extent.
On the other hand, we choose the search engine and the way it is installed.
It is dangerous to tailor documents so that they can be read by certain search engines, since we risk distorting them. It is equally dangerous to install a powerful search engine improperly, so that it is underused when faced with the particular features of our documents. However, as we have previously mentioned, the development of an internal site search is a contribution to quality, and not a guarantee. A website must be a structure of information conceived and created so as to facilitate information retrieval. Behind all this effort lies our attempt to understand the user, know him/her and facilitate their access. This effort goes beyond tools and algorithms. There is still a long way to go in terms of search engines, implementation and the generation of documents, as well as work to be done in standardising, training and dissemination.
Spool, Jared M. Are There Users Who Always Search? [online]. 2001. http://www.useit.com/alertbox/9707b.html. 18 Dec. 2005.
Nielsen, Jakob. Search and You May Find [online]. 1997. http://www.useit.com/alertbox/9707b.html. 18 Dec. 2005.
Manchón, Eduardo. La conducta de navegación de los usuarios, sus características [online]. 2005. http://www.wikilearning.com/la_lectura-wkccp-959-1.htm. 18 Dec. 2005.
Hassan Montero, Yusef. Buscador Interno [online]. December 2002. http://www.nosolousabilidad.com/articulos/buscador_interno.htm. 18 Dec. 2005.
Instone, Keith. Site Usability Heuristics for the Web [online]. 1997. http://www.webreview.com/1997/10_10/strategists/10_10_97_2.shtml. 18 Dec. 2005.
Vicens Arasanz, Toni. Normas básicas de usabilidad para buscadores internos [online]. 2002. http://www.evolucy.com/esp/columns/20021024_usabilidad_buscadores.html. 18 Dec. 2005.
Serrano Cobos, Jorge; Quintero Orta, Ana. Elección de un motor de búsqueda: Pasos a seguir [online]. Hipertext.net, no. 1, 2003. http://www.hipertext.net/web/pag189.htm. 18 Dec. 2005.
Larson, James. Interactive software. New Jersey: Yourdon Press, 1992.
Krug, Steve. No me hagas pensar. Madrid: Pearson Educación, 2001.