Taxonomies for categorisation and organisation in Web sites

Miquel Centelles

Recommended citation: Miquel Centelles. Taxonomies for categorisation and organisation in Web sites [online]. "Hipertext.net", no. 3, 2005. <http://www.hipertext.net> [Consulted: 31 Jan. 2010].

  1. The concept of Taxonomy
  2. Construction of taxonomy
    2.1. Processes for the construction of taxonomies
    2.2. Automation of the processes of construction of taxonomies
  3. Resource categorisation
  4. Application of taxonomy in the development of information search systems
  5. Bibliography
  6. Notes

 

1. The concept of Taxonomy

By the time this article is published, an event will have taken place that should mark a turning point in the evolution of taxonomies as content organisation systems: the appearance of the final draft of the revision of the ANSI/NISO Z39.19-1993 standard, Guidelines for the construction, format, and management of monolingual thesauri [1]. The revision was carried out between 2002 and 2004 by the Thesaurus Advisory Group (hereinafter, TAG), set up within the National Information Standards Organization, with three aims: to introduce more user-friendly language into the standard, to update its scope to the current digital information environment, and to extend its scope to the wide range of organisations that produce and organise content.

We do not have the draft of the revised standard, but we do have a summary of its contents and the notes from the TAG meetings. These documents show that one of the global modifications proposed is a change of title, from Guidelines for the construction, format, and management of monolingual thesauri to Construction, format and management of monolingual controlled vocabularies. Controlled vocabularies comprise four main types: lists, synonym rings, taxonomies and thesauri. The revision of ANSI/NISO Z39.19 proposes a standardised definition of the four types and establishes the essential elements for the construction and management of all of them. Specifically, the "TAG Conference Call, June 30, 2003" (2003) included the following provisional definitions:

  • List: "A set of words or phrases displayed in an organized series."

  • Synonym rings: "A set of words or phrases that are considered to be equivalent for the purposes of retrieval. Synonym rings are not used during input."

  • Taxonomy: "An organized set of words or phrases used for organising information and primarily intended for browsing."

  • Thesaurus: "A controlled vocabulary that indicates preferred terms, variant terms, and term relationship. Usually considered to be the most complex of controlled vocabularies." From the modifications proposed by the TAG, the final definition is the following: "A set of words or phrases with equivalent terms explicitly identified and with ambiguous words or phrases (e.g. homographs) made unique. This set of terms also may include broader-narrower or other relationships."

According to this definition, a taxonomy does not require its components to be connected by a specific type of relationship; it simply requires them to be organised. Its defining characteristics are its purpose (prioritising browsing) and, therefore, its application environment (the digital environment).

Nevertheless, in some documents relating to the revision of the ANSI/NISO Z39.19 standard, the difference between the four types of controlled vocabulary is determined by their lesser or greater structural complexity. At one end, lists and synonym rings include only the equivalence relationship; at the other, thesauri include equivalence, hierarchical and associative relationships. In between, taxonomies include equivalence and hierarchical relationships.

Until the work of the TAG provides a normative definition of taxonomy, it should be stressed that there is currently no universally accepted concept of the term.

Etymologically, taxonomy comes from the Greek terms "taxis", ordering, and "nomos", rule. Aristotle was one of the first, in the fourth century BC, to use hierarchical schemes of this kind for the classification of scientific objects. The botanist Carl Linnaeus (1707-1778) used the term taxonomy for the classification of living beings in hierarchical groups, ordered from the most generic to the most specific (kingdom, class, order, genus and species). From this classical concept, taxonomy developed as a subfield of biology dedicated to the classification of organisms according to their differences and similarities. According to Grove (2003, p. 2774), the principles providing a strict guide for the construction of taxonomies were a logical basis, empirical observation, a hierarchical structure based on the inheritance of features, evolutionary history and pragmatic use. General-language reference sources still include the meaning specifically oriented to the experimental sciences, as shown by the entry in the latest print edition of the Diccionario de la lengua española (2001), the dictionary of the Spanish language:

"1. f. Science dealing with the principles, methods and purposes of classification. It is applied in particular, within biology, to the hierarchical and systematic ordering, with their names, of the groups of animals and plants.

2. f. classification (‖ action and effect of classifying)."

In its basic concept, linked to the experimental sciences, taxonomy applies a mono-hierarchical criterion when establishing classification systems; that is, each of the groups or types that make it up can occupy one place, and only one, in the hierarchical structure.
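
The mono-hierarchical criterion can be illustrated with a short sketch. The representation of the hierarchy as (parent, child) pairs, the function name and the sample categories are assumptions made only for this illustration; the code simply checks that no category has more than one parent.

```python
from collections import defaultdict

def is_mono_hierarchical(links):
    """Return True if no category has more than one parent.

    `links` is an iterable of (parent, child) pairs; this representation
    is an assumption made for the illustration.
    """
    parents = defaultdict(set)
    for parent, child in links:
        parents[child].add(parent)
    return all(len(found) == 1 for found in parents.values())

# Hypothetical biological example: every group has a single place.
linnaean = [
    ("Animalia", "Mammalia"),
    ("Mammalia", "Carnivora"),
    ("Carnivora", "Felis"),
]

# Hypothetical web example: "Exhibitions" hangs from two parents,
# so the structure is no longer mono-hierarchical.
web_categories = [
    ("Culture", "Exhibitions"),
    ("Events", "Exhibitions"),
]

print(is_mono_hierarchical(linnaean))
print(is_mono_hierarchical(web_categories))
```

A structure that fails this check is not necessarily wrong; it merely departs from the classical model described here.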

In the early 1990s the concept of taxonomy was taken up in other fields of knowledge, such as psychology, the social sciences and information technology, to name almost any information access system that attempts to match the terminology of the user with that of the system. The first specialists to develop web content organisation systems belonged to the knowledge management consultancy area and came from fields close to information technology and engineering (content management and information architecture); unaware of the tradition of documentary languages in the information sciences, they used the term taxonomy for the systems they developed. The term is now used for content organisation systems in the Internet context, even though the theory and practice of documentary languages have been applied intensively in that context.

Before proposing a definition of taxonomy in line with its current areas of development, we identified and compared the semantic features with which it is defined. For this purpose, we carried out an extensive search for definitions in all the areas where the term is studied, developed and/or applied. Initially we placed no limitation on the origin of the definitions; we only discarded those based on the classical definition of the term. The result was 36 definitions published between 2000 and 2005 in various types of sources [2].

The analysis of the definitions shows that they give importance to four variables: the place taxonomy occupies among knowledge organisation systems (hereinafter KOS); the information context where taxonomy is applied; the purposes taxonomy serves; and the structural model by which the elements of the taxonomy are interrelated.

On the basis of the documentation drafted by the NISO TAG, and in view of the properties most widely accepted in the definitions formulated in the areas of study, development and/or application, the following definition is proposed:

Taxonomy is a type of controlled vocabulary in which all the terms are connected by means of some structural model (hierarchical, tree, faceted, ...) and which is specially oriented to the browsing, organisation and search systems for the contents of web sites.

Four points in this definition need to be specified:

  • The terms (or categories) represent some aspect of the contents, context or structure of the information resources, and not only the contents.

  • The structural models do not usually appear in a pure state; it is possible (and, in the real world, usual) for a single taxonomy to present structures that mix several models.

  • The documents reflecting the discussions in the TAG show a lack of agreement on the applications and preferred uses of taxonomies. Some of the meeting notes (for example, "TAG Conference Call, May 19, 2003" (2003)) show that the concept of taxonomy was initially oriented to browsing to the detriment of retrieval ("searching"); in the final version of the definition, its application also includes this last mechanism.

  • The folksonomies or distributed classifications are excluded from the concept of taxonomy (Mathes, 2004).

Once the definition of taxonomy is established, we shall briefly review the processes of taxonomy construction and the application of taxonomy to the categorisation of resources and to the development of information search systems for web sites. Both should be preceded by strategic planning that determines what characteristics the taxonomy should have, based on an analysis of the context (which will identify the priorities of the corporation in organising and presenting information on the web site), of the audience (which will identify the needs, the search behaviour and the use of information of the various user segments) and of the content (which will identify content patterns).

 

2. Construction of taxonomy

2.1. Processes for the construction of taxonomies

The construction of corporate taxonomies involves four processes:

1. Delimitation of the reality (entity, knowledge area, industrial sector, etc.) that the taxonomy will represent.

2. Extraction of the set of terms or categories that represent that reality.

To carry out this process it is first necessary to establish the priority sources and the most suitable extraction mechanisms for each of them. There are three types of source: personal sources, made up of the users of the web site and specialists in its domain; documentary sources, made up of documents representative of the types of content identified at the strategic planning stage; and existing taxonomies or knowledge representation instruments (from the nomenclatures of an entity's units and resources to administrative classification schemes).

The extraction mechanisms must be identified for each source; for personal sources, interviews with web site users and the analysis of search transaction logs are especially useful.

The result of this process is a register of representative terms or categories.

3. Terminological control of the terms or categories.

This process involves two tasks. First, the terms that express the same concept are identified; where there are two or more, it is necessary to specify which is the preferred term and which are non-preferred. Second, all the taxonomy elements must be given a correct and consistent form, whether they are preferred or not.

The result of this process is the establishment of the equivalence relationship between all the taxonomy terms.
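
The two tasks of terminological control can be sketched as follows. This is a minimal illustration, not a prescribed procedure: the normalisation rule (trimmed, single-spaced, lower case), the function names and the sample terms are all assumptions.

```python
def normalise(term):
    """Give a term a consistent form: trimmed, single-spaced, lower case."""
    return " ".join(term.split()).lower()

def build_equivalences(concepts):
    """Map every variant, and the preferred term itself, to the preferred term.

    `concepts` maps each preferred term to its non-preferred variants.
    """
    lookup = {}
    for preferred, variants in concepts.items():
        preferred_form = normalise(preferred)
        lookup[preferred_form] = preferred_form
        for variant in variants:
            lookup[normalise(variant)] = preferred_form
    return lookup

# Hypothetical terms gathered during the extraction process.
concepts = {
    "Exhibitions": ["Expositions", "  Shows "],
    "Cultural heritage": ["Heritage", "Patrimony"],
}

equivalences = build_equivalences(concepts)
print(equivalences["shows"])
print(equivalences[normalise("Cultural   Heritage")])
```

The resulting lookup table is one possible concrete form of the equivalence relationship: any variant entered by an indexer or a user leads to a single preferred term.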

4. Establishment of the scheme and organisation structure of the terms or categories.

The organisation scheme comprises the criteria used to divide and group the categories. In principle, the criteria are limitless and their suitability depends on the object the taxonomy is to represent. The most widely used criteria include: subjects, topics and/or disciplines; people; audiences; processes, tasks and/or functions; types of documents; etc.

The structural model defines the type of relationship established between the groups of categories derived from the organisation scheme. The general tendency has been to apply the hierarchical model (based on the "type of" relationship) and the tree model (based on the "part of" relationship); indeed, the international and national standards for thesaurus design that have been applied to corporate taxonomies favour these two structural models. A third model, the faceted one, is a good alternative for the hypertext environment, where it is key to break down the various perspectives from which the same concept or item can be viewed. In fact, this model is being used more and more frequently on certain types of web sites. Nevertheless, the documentation we have on the revision of ANSI/NISO Z39.19 does not seem to indicate that this alternative has been included.
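
To make the faceted model more concrete, the following sketch describes each resource from several independent perspectives and selects resources by combining facet values. The facets, categories, resource identifiers and function names are hypothetical and serve only as an illustration.

```python
# Hypothetical facets (organisation criteria) and their categories.
FACETS = {
    "subject": {"Cultural heritage", "Information management"},
    "document_type": {"Report", "Catalogue"},
    "audience": {"Researchers", "General public"},
}

# Each resource is described once per facet instead of being filed
# in a single place of a hierarchy.
RESOURCES = {
    "doc-1": {"subject": "Cultural heritage", "document_type": "Catalogue",
              "audience": "General public"},
    "doc-2": {"subject": "Cultural heritage", "document_type": "Report",
              "audience": "Researchers"},
    "doc-3": {"subject": "Information management", "document_type": "Report",
              "audience": "Researchers"},
}

def is_valid(description):
    """Check that every value belongs to the facet it is filed under."""
    return all(value in FACETS.get(facet, set())
               for facet, value in description.items())

def select(**criteria):
    """Return the ids of the resources matching every requested facet value."""
    return sorted(
        resource_id
        for resource_id, description in RESOURCES.items()
        if all(description.get(facet) == value
               for facet, value in criteria.items())
    )

print(select(subject="Cultural heritage"))
print(select(document_type="Report", audience="Researchers"))
```

The same resource can be reached through any combination of perspectives, which is the property that makes this model attractive in a hypertext environment.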

Traditionally, two techniques for developing the structure of a taxonomy have been distinguished: the top-down technique and the bottom-up technique.

  • The top-down technique involves first identifying a limited number of top-level categories and then grouping the remaining categories in successive levels of subordination until the most specific levels are reached. This technique can be oriented to a hierarchical (and/or tree) structural model as well as to a faceted one. The possibility of exercising prior control over the main categories makes this technique suitable for taxonomies whose exclusive or priority purpose is the development of browsing systems.

  • The bottom-up technique starts from the identification of the most specific categories, which are grouped in successive levels until the top-level categories are reached. Generally, this technique is mainly oriented to a hierarchical (and/or tree) structural model although, as in the previous case, it can support the analysis needed to decide which structural model is most suitable. In any event, this is the technique applied in methods that involve representatives of actual and potential users in establishing the structure of taxonomies (for example, card sorting).

2.2. Automation of the processes of construction of taxonomies

A critical factor in the construction of a taxonomy is the degree of automation applied to the processes described above. The degree of automation can be seen as a continuum: at one end are the manual (or intellectual) systems and at the other the automatic ones, with the semi-automatic systems in between.

We should highlight that, currently, fully manual systems are rarely used in the creation of taxonomies.

At the minimum level of automation there are two types of solution: taxonomy templates, specialised in a particular industrial sector, which have to be adapted to the specific conditions of each organisation [3], and taxonomy editing tools. This second type of solution offers taxonomy administrators a repository for term management, a user-friendly environment for establishing relationships between terms, and various ways of presenting and viewing the results. Many of these applications already existed as thesaurus management tools and have not added significant innovations for their new function in the context of taxonomies. Examples are the products Multites 2005 (http://www.multites.com) and Term Tree (http://www.termtree.com.au).

At the maximum level of automation, we find programs that analyse the corpus of digital resources of a web site and extract categories (in fact, clusters of resources) by means of statistical analysis and/or linguistic processing. Generally, the construction of the taxonomy and the categorisation of resources are one and the same process; in some cases the result can even be published directly as a browsing system. An extreme option of this kind of automation gives rise to the so-called dynamic taxonomies: groupings of the resources returned by a search in a search engine, which usually rely on a statistical analysis of frequencies rather than on linguistic processing. In automatic systems the possibilities of establishing equivalence and hierarchical relationships between categories are very limited; the result is usually a flat taxonomy, closer to a clustering of resources than to a classification proper. An example of these solutions is the Automatic Taxonomy Generation module of IDOL Server (http://www.autonomy.com/content/Products/IDOL).

Fully automatic solutions have not so far offered satisfactory results in taxonomy construction. Consequently, semi-automatic alternatives are being developed which, like Ultraseek Topic Advisor (http://www.verity.com/products/ultraseek/index.html), assist in the creation and maintenance of the taxonomy while providing an interface for reviewing and approving categories. These systems include a statistical algorithm that analyses a corpus of resources and suggests terms and relationships between terms to the system administrator, who accepts or rejects them, all within a user-friendly working environment.
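
The general idea of a semi-automatic workflow can be sketched in a few lines. This is not how any of the products mentioned works; it is a deliberately simple stand-in in which the "statistical algorithm" is a document-frequency count and the administrator's decision is simulated by a function. The stop-word list, the threshold and all names are assumptions.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption, not a standard one).
STOP_WORDS = {"the", "of", "and", "a", "in", "to", "for", "on"}

def suggest_terms(corpus, min_docs=2):
    """Suggest words that occur in at least `min_docs` documents."""
    document_frequency = Counter()
    for text in corpus:
        words = set(re.findall(r"[a-z]+", text.lower())) - STOP_WORDS
        document_frequency.update(words)
    return sorted(word for word, count in document_frequency.items()
                  if count >= min_docs)

def review(candidates, approve):
    """Keep only the candidates that the administrator approves."""
    return [term for term in candidates if approve(term)]

corpus = [
    "Exhibitions of cultural heritage in the museum",
    "Management of cultural events and exhibitions",
    "Heritage management for the museum archive",
]

candidates = suggest_terms(corpus)
# The administrator's decision is simulated: here "museum" is rejected.
accepted = review(candidates, approve=lambda term: term != "museum")

print(candidates)
print(accepted)
```

The point of the sketch is the division of labour: the algorithm proposes and the person decides.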

 

3. Resource categorisation

Categorisation can be defined as the process of representing the content, context and/or structure of information resources by assigning terms from a documentary language (categorisation by assignment) or by extracting terms from the resources themselves (categorisation by extraction).

The most efficient categorisation model currently available is the one based on metadata. According to Méndez and Senso (2004), we can define metadata as:

"all the descriptive information on the context, quality, condition or characteristics of a resource, data item or object whose purpose is to facilitate its retrieval, authentication, evaluation, preservation and/or interoperability".

There are various metadata models. They differ essentially in two respects:

  • Which aspects of the resources are represented (the elements).

  • How those elements are represented (the syntax).

For example, Dublin Core, one of the most widely used models for the description of all the types of information resources, includes, in its simplest format (simple level), fifteen elements [4]. The syntax of each element usually includes three components:

  • Identification of the element. For example, in Dublin Core, the Key Words element is identified by means of the meta tag DC.Subject.

  • One or more qualifiers that specify a particular attribute of the element. For example, one qualifier of the DC.Subject meta tag is SCHEME, which identifies the name of the controlled vocabulary applied in categorising the element.

  • The value or values of the element assigned to the resource being described. For example, the terms taken from the controlled language used to categorise the element.

In a web page coded in HTML, the syntax of the Key Words element would look as follows:

<META NAME="DC.Subject" SCHEME="TAGS" CONTENT="Cultural heritage; Cultural events; Exhibitions; Administration documentation management; Internet; Files; Information Management ">

In a categorisation model based on metadata, the taxonomy is a type of controlled vocabulary that is very useful as a source of values: the terms that will be assigned to the elements describing the information resources. As previously indicated, the application of taxonomies should not be limited to the elements expressing the content of the resources or, more precisely, their topic, subject or discipline. The elements relating to the context and structure of the resource can also be expressed by means of categories taken from a taxonomy.
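
As an illustration of how a DC.Subject element and a taxonomy can work together, the sketch below reads the element from a page with Python's standard `html.parser` module and checks each value against a set of taxonomy terms. The sample page, the term set and the class name are assumptions; only the element name, the SCHEME qualifier and the semicolon-separated values follow the example given in the text.

```python
from html.parser import HTMLParser

class SubjectParser(HTMLParser):
    """Collect the scheme and the values of DC.Subject meta elements."""

    def __init__(self):
        super().__init__()
        self.subjects = []  # list of (scheme, [values]) pairs

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "dc.subject":
            content = attributes.get("content") or ""
            values = [v.strip() for v in content.split(";") if v.strip()]
            self.subjects.append((attributes.get("scheme"), values))

# Hypothetical taxonomy terms, stored in lower case for comparison.
TAXONOMY_TERMS = {"cultural heritage", "exhibitions", "internet"}

# Hypothetical page; "Intranet" is deliberately not a taxonomy term.
page = ('<META NAME="DC.Subject" SCHEME="TAGS" '
        'CONTENT="Cultural heritage; Exhibitions; Intranet ">')

parser = SubjectParser()
parser.feed(page)
scheme, values = parser.subjects[0]
unknown = [v for v in values if v.lower() not in TAXONOMY_TERMS]

print(scheme, values)
print(unknown)
```

A check of this kind is one simple way of keeping the values assigned to resources aligned with the controlled vocabulary.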

The use of taxonomies in the categorisation of information resources offers the general strengths of controlled languages, such as: the treatment of the semantic and syntactic aspects of language; the representation of implicit concepts; the creation of a global view of the domains being represented; exhaustiveness in indexing; and a solution to the problems posed by multilingual contexts. From the point of view of web site management, the use of taxonomies in the categorisation of resources offers two important additional benefits:

  • On the one hand, it makes the effort of building and maintaining the taxonomy and of categorising the resources pay off, as the same tool can be re-used in the development of various search, browsing and personalisation applications.

  • On the other hand, it maintains conceptual and design consistency in the representation of the elements of a single domain, which gives users an image of consistency across the whole web site and of the entity that creates and maintains it.

The categorisation model applied by a given organisation should answer four essential questions: which information resources will be categorised? For what purpose? Who will categorise them? How will this be done?

The last two questions are closely related to the degree of automation applied in assigning values to the metadata. From this point of view, categorisation systems can be conceived as a continuum: at one end are the manual (or intellectual) systems and at the other the automatic ones.

In the first case, an expert analyses the content, context and/or structure of a resource and assigns it the appropriate categories from a controlled language (categorisation by assignment) or from the text of the resource itself (categorisation by extraction). The strengths of intellectual categorisation are a high level of accuracy in the description of resources and the capacity to include contextual meaning in the description. It also facilitates the categorisation of non-textual documents (images, applications, etc.). Its weaknesses are limited scalability, high cost in human resources and a lack of consistency and exhaustiveness.

Automatic categorisation is based on algorithms that statistically analyse the word sequences of documents, identify patterns of word behaviour from variables such as collocation, order, proximity and frequency, and group together the documents that behave similarly. The results are clusters of resources with similar behaviour patterns, labelled with the word sequences, extracted from the resources themselves, that best represent the similarity.

A clustering system should be able to carry out the following tasks: statistically analyse the word sequences of the resources; calculate a value that numerically represents the content of a document; and compare the values of two documents (or sub-documents) to determine their degree of similarity.
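
These three tasks can be sketched with a very common choice of representation: a vector of word frequencies compared by cosine similarity. The choice of this measure, the tokenisation rule and the sample texts are assumptions made for the illustration; real systems use considerably richer features.

```python
import math
import re
from collections import Counter

def vectorise(text):
    """Represent a document numerically as a vector of word frequencies."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Degree of similarity between two frequency vectors (0.0 to 1.0)."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

doc_a = vectorise("Exhibitions of cultural heritage")
doc_b = vectorise("Cultural heritage exhibitions in the museum")
doc_c = vectorise("Server maintenance schedule")

# doc_a shares three words with doc_b and none with doc_c.
print(round(cosine(doc_a, doc_b), 3))
print(cosine(doc_a, doc_c))
```

Documents whose similarity exceeds some chosen threshold would then be placed in the same cluster.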

Currently, the algorithms designed for frequency analysis use one of the following families of methods, or a combination of several: probabilistic methods (Bayesian method, Rocchio method, ...); vector-based methods (K-Nearest Neighbor method, Support Vector Machines, ...); and decision trees and decision lists.
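
As an example of the probabilistic family, the following sketch implements a small naive Bayes classifier with add-one smoothing. It is a textbook simplification, not the method of any product cited in this article, and the training examples, category names and function names are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

def tokens(text):
    return re.findall(r"[a-z]+", text.lower())

def train(examples):
    """Count documents and words per category from (text, category) pairs."""
    doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, category in examples:
        doc_counts[category] += 1
        for word in tokens(text):
            word_counts[category][word] += 1
            vocabulary.add(word)
    return doc_counts, word_counts, vocabulary

def classify(text, model):
    """Return the category with the highest log-probability for the text."""
    doc_counts, word_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best, best_score = None, -math.inf
    for category in sorted(doc_counts):
        score = math.log(doc_counts[category] / total_docs)  # prior
        total_words = sum(word_counts[category].values())
        for word in tokens(text):
            if word not in vocabulary:
                continue  # words never seen in training are ignored
            # Add-one smoothing avoids zero probabilities.
            score += math.log((word_counts[category][word] + 1)
                              / (total_words + len(vocabulary)))
        if score > best_score:
            best, best_score = category, score
    return best

examples = [
    ("exhibition of paintings in the museum", "Culture"),
    ("museum opens new exhibition", "Culture"),
    ("server records and file management", "Administration"),
    ("management of administrative records", "Administration"),
]
model = train(examples)

print(classify("new museum exhibition", model))
print(classify("file records", model))
```

With so few examples the result is only indicative; the accuracy of such methods depends heavily on the size and quality of the training set.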

Examples of automatic categorisation systems are the Automatic Categorization module of IDOL Server (http://www.autonomy.com/content/Products/IDOL), based on the Bayesian probability method, and Lotus Discovery Server (http://www.lotus.com), based on the vector method [5].

The strengths of automatic categorisation are its efficiency and processing speed, its high scalability and its high level of consistency; its biggest weakness is the low level of accuracy it usually provides, which is why these systems are very often used as a basis for decisions taken by human categorisation experts.

The semi-automatic or hybrid categorisation systems combine human intelligence, which can identify the various levels of meaning present in documents, with the efficiency of automation. Four families of semi-automatic categorisation systems can be identified.

  • Systems that statistically analyse the resources and present recommended categorisation terms to human experts for review and approval. An example of this type of system is the Ultraseek Advanced Classifier (http://www.verity.com/products/ultraseek/index.html).

  • Categorisation systems based on search rules. These allow each taxonomy category to be linked to a search equation designed by specialists using advanced options (search rules). By means of an algorithm, the system analyses the documents and determines which equation or equations they match best. The document is then assigned to the category or categories linked to those search rules. Examples of this type of system are K2 Enterprise [6] (http://www.verity.com/products/k2_enterprise/index.html) and Ultraseek Content Classification Engine (http://www.verity.com/products/ultraseek/cce.html), both from Verity.

  • Categorisation systems based on sets of training or example documents. These allow each taxonomy category to be linked to a limited number of documents, selected by specialists, that are considered the most representative. By means of an algorithm, the system analyses each new document to be categorised and determines which of the example documents it most resembles. The document is then assigned to the category or categories of the most similar examples. An example of this type of system is Mohomine Classifier (http://www.kofax.com/products/mohomine/classifier.asp), from Mohomine.

  • Categorisation systems based on linguistic analysis. An example of this type of system is Smart Discovery [7] from InXight.
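
The second family, categorisation by search rules, lends itself to a brief sketch. Here each category is linked to a simplified search equation with three parts: terms that must all appear, terms of which at least one must appear, and terms that must not appear. This rule format is an assumption chosen for brevity, and matching is a plain yes/no test rather than a degree of coincidence; the commercial systems cited offer far richer rule languages.

```python
import re

# Hypothetical search rules written by a specialist, one per category.
RULES = {
    "Exhibitions": {
        "all": {"exhibition"},
        "any": {"museum", "gallery"},
        "none": {"trade"},
    },
    "Records management": {
        "all": {"records"},
        "any": {"archive", "file", "retention"},
        "none": set(),
    },
}

def categorise(text, rules=RULES):
    """Return the categories whose search rule is satisfied by the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    assigned = []
    for category, rule in rules.items():
        has_all = rule["all"] <= words
        has_any = not rule["any"] or bool(rule["any"] & words)
        has_none = not (rule["none"] & words)
        if has_all and has_any and has_none:
            assigned.append(category)
    return assigned

print(categorise("The museum opens a new exhibition"))
print(categorise("Trade exhibition at the gallery"))
print(categorise("Retention schedule for personnel records"))
```

The human contribution lies in writing and refining the rules; the system only applies them consistently to every document.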

The strengths of semi-automatic categorisation systems are a good balance between efficiency and accuracy, the fact that the process is guided by human reasoning, and the capacity to accumulate knowledge and learn. Among the weaknesses, we should highlight the knowledge, skills and effort required for management and maintenance.

In a survey carried out by Delphi Research [8], the managers of 300 large companies from all over the world (60% North American) gave the following answers to the question on the type of taxonomy implementation: hybrid, 36%; automatic, 26%; manual, 23%; the remainder chose other options or gave no answer.

 

4. Application of taxonomy in the development of information search systems

As previously indicated, keeping separate the processes of creating the taxonomy, categorising resources with its categories, and applying the taxonomy offers multiple benefits. The objective of construction is to represent a reality (an area of knowledge, the scope of an organisation's activity, etc.) in the way best suited to the purposes and interests of the entity that will exploit that representation. It should also express the image and corporate interests of the entity itself.

The applications of a taxonomy in the web site context can be diverse; within information architecture, the same taxonomy can become a basic or auxiliary tool for the various browsing, organisation, content search, labelling and personalisation systems. Re-using one taxonomy for several information architecture tools offers various benefits:

  • First, it makes the initial effort of creating the taxonomy and the subsequent maintenance effort pay off.

  • Second, it facilitates the management of the functionalities to which the taxonomy is applied: a change in the categories, or in the relationships between categories, can be propagated uniformly and consistently to all of them.

  • Third, it improves the usability of the web site as a whole, as it considerably reduces the cognitive, memory and learning load.

  • Fourth, it facilitates interaction with the web site and the creation of a consistent image of the organisation that creates and applies the taxonomy.

There are various taxonomy presentation options.

  • Integral presentation of the taxonomy, with all its categories and relationships interconnecting them (equivalence relationship, hierarchical or faceted structural model, etc.).

  • Partial presentation of the original taxonomy, in order to highlight contents according to time-based or usage criteria.

  • Reduction of the taxonomy to the equivalence relationship, so that the taxonomy takes the form of a synonym ring.

  • Reduction of the taxonomy to the hierarchical relationship, for use as a category browsing system. This usually involves reducing the breadth and depth of the levels to adjust the taxonomy to the recommendations derived from the cognitive, visual and memory limitations of the typical user.

  • Alternative presentations, such as the alphabetical ordering of the categories, or tree, graphic and metaphorical presentations.
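
One of these options, the reduction of depth for browsing, can be sketched as a simple pruning operation. The nested-dictionary representation, the function name and the sample categories are assumptions made for the illustration.

```python
def prune(tree, max_depth):
    """Return a copy of `tree` keeping only `max_depth` levels of categories.

    `tree` maps each category name to a dictionary of its subcategories.
    """
    if max_depth <= 0:
        return {}
    return {name: prune(children, max_depth - 1)
            for name, children in tree.items()}

# Hypothetical three-level taxonomy.
taxonomy = {
    "Culture": {
        "Heritage": {"Museums": {}, "Monuments": {}},
        "Events": {"Exhibitions": {}},
    },
    "Administration": {
        "Records": {},
    },
}

# A browsing menu limited to two levels; the full taxonomy is untouched.
menu = prune(taxonomy, 2)
print(menu)
```

Because the pruned copy is derived from the full taxonomy, the browsing presentation can be regenerated whenever the taxonomy changes.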

The selection of an option depends on various factors: the functionality to which it is applied, the users it is aimed at, etc. Generally, combining several presentations within the same functionality gives good results.

One of the web site functionalities in which taxonomy plays an important role is the search for information. The systems that allow content to be searched in the web environment can be classified into three main groups: browsing, searching and filtering.

Browsing systems offer users an organised structure of categories in which the information resources are placed, and a mechanism for moving through those categories to find the resources relevant to their information need. These systems are especially suitable for situations in which users cannot specify their information need in much detail (exploratory search). The browsing system can be:

  • The original hierarchical or faceted structure of the taxonomy, complete or reduced.

  • One of the alternative presentations previously indicated: alphabetical, tree, graphic or metaphorical.

  • A combination of two or more presentations, so that the user can select the one most suitable for the information need.

Search systems offer users the possibility of building a search equation from a word or combination of words. These systems are especially suitable for situations where users can specify their information need in enough detail (known-item search). The taxonomy is incorporated into the search system to help the user identify relevant terms for building the search equation, and also to improve the presentation of results and the reformulation of the search. Both browsing and search systems imply real-time interaction between the user and the search mechanism.
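
One way in which a taxonomy can help to build the search equation is by expanding the user's term with its equivalent and narrower terms. The sketch below assumes a very small taxonomy held in two dictionaries and an OR-based query syntax; both are hypothetical and would depend on the search engine actually used.

```python
# Hypothetical fragments of a taxonomy, stored in lower case.
SYNONYMS = {"exhibitions": {"expositions", "shows"}}
NARROWER = {"cultural events": {"exhibitions", "concerts"}}

def expand(term):
    """Return the term with its synonyms, narrower terms and their synonyms."""
    term = term.lower()
    expanded = {term} | SYNONYMS.get(term, set())
    for narrower in NARROWER.get(term, set()):
        expanded |= {narrower} | SYNONYMS.get(narrower, set())
    return expanded

def to_search_equation(term):
    """Build an OR equation from the expanded terms (assumed syntax)."""
    return " OR ".join(f'"{t}"' for t in sorted(expand(term)))

print(to_search_equation("Cultural events"))
print(to_search_equation("Exhibitions"))
```

The same expansion could equally be offered to the user as a list of suggested terms rather than applied automatically.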

The third modality, filtering systems, offers the user the possibility of creating and declaring an information need (user profile) and receiving an automatic response after a certain period of time, or whenever the system identifies resources relevant to that need. In this case, the taxonomy lets the user select relevant terms with which to specify the profile.
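
A filtering system of this kind can be reduced, for illustration, to a comparison between the categories assigned to a new resource and the categories selected in each profile. The profiles, user names and function name below are hypothetical.

```python
# Hypothetical user profiles expressed with taxonomy categories.
PROFILES = {
    "anna": {"exhibitions", "cultural heritage"},
    "ben": {"records management"},
}

def users_to_notify(resource_categories, profiles=PROFILES):
    """Return the users whose profile shares a category with the resource."""
    categories = set(resource_categories)
    return sorted(user for user, wanted in profiles.items()
                  if wanted & categories)

# A newly categorised resource triggers the comparison.
print(users_to_notify(["exhibitions", "internet"]))
print(users_to_notify(["internet"]))
```

Because profiles and resources draw on the same taxonomy, the comparison needs no further interpretation of the user's wording.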

 

5. Bibliography

Bennett, Paul. (2002). Introduction to text categorization. Consulted: 1-03-2005, http://www.softlab.ece.ntua.gr/facilities/public/AD/Text%20Categorization/Introduction%20to%20Text%20Categorization.ppt


Diccionario de la lengua española (2001). Consulted: 22-03-2005, http://buscon.rae.es/diccionario/drae.htm


Fast, Karl; Leise, Fred; Steckel, Mike (2003). "Controlled vocabularies: a glosso-thesaurus". In: Boxes & arrows, October 27, 2003. http://www.boxesandarrows.com/archives/controlled_vocabularies_a_glossothesaurus.php


Gilchrist, Alan; Kibby, Peter; Mahon, Barry. (2000). Taxonomies for business: access and connectivity in a wired world. London: TFPL. ISBN: 1-870-889-83-5


Grove, Andrew (2003). "Taxonomy". In: Encyclopedia of library and information science. 2nd ed., rev. and enl. New York [etc.]: Marcel Dekker, p. 2770-2777


IDOL Server. (2005). Consulted: 13-03-2005, http://www.autonomy.com/content/Products/IDOL


Information intelligence: content classification and the enterprise taxonomy practice (2004). Consulted: 25-01-2005, http://www.delphigroup.com/research/whitepapers/20040601-taxonomy-WP.pdf


K2 Enterprise. (2005). Consulted: 13-03-2005, http://www.verity.com/products/k2_enterprise/index.html


Lotus Discovery Server. (2004). Consulted: 1-09-2004, http://www.lotus.com



Mathes, Adam. (2004). Folksonomies: cooperative classification and communication through shared metadata. Consulted: 26-01-2005, http://www.adammathes.com/academic/computer-mediated-communication/folksonomies.html


Méndez, Eva; Senso, José A. (2004). Introducción a los metadatos. Consulted: 14-01-2004, http://www.sedic.es/autoformacion/metadatos/introduccion.htm


Metainformación: Dublin Core. (2003). Consulted: 13-03-2005, http://www.rediris.es/metadata


Mohomine Classifier. (2005). Consulted: 13-03-2005, http://www.kofax.com/products/mohomine/classifier.asp


Multites 2005. (2005). Consulted: 13-03-2005, http://www.multites.com


National Information Standards Organization. (2005). ANSI/NISO Z39.19-2003: guidelines for the construction, format, and management of monolingual thesauri. Consulted: 9-03-2005, http://www.niso.org/standards/standard_gather.cfm?pdflink=http://www.niso.org/standards/resources/Z39-19.pdf&std_id=518


Ruiz, Miguel E.; Srinivasan, Padmini (1999). "Combining machine learning and hierarchical indexing structures for text categorization". In: ASIS/SIGCR Workshop on Classification Research (10th: Washington: 1999). Advances in classification research: proceedings of the ASIS SIG/CR Classification Research Workshop, v. 10 (1999), p. 107-124


Smart Discovery. (2005). Consulted: 13-03-2005, http://www.inxight.com/products/smartdiscovery


"TAG Conference Call, May 19, 2003" (2003). In: National Information Standards Organization. (2004). Developing the next generation of standards for controlled vocabularies and thesauri. Consulted: 23-04-2004. http://www.niso.org/committees/MTinfo.html


"TAG Conference Call, June 30, 2003" (2003). In: National Information Standards Organization. (2004). Developing the next generation of standards for controlled vocabularies and thesauri. Consulted: 23-04-2004. http://www.niso.org/committees/MTinfo.html


"TAG Notes November 1, 2004" (2004). In: National Information Standards Organization. (2004). Developing the next generation of standards for controlled vocabularies and thesauri. Consulted: 23-04-2004. http://www.niso.org/committees/MTinfo.html


Taxonomy strategies. Consulted: 25-01-2005, http://www.taxonomystrategies.com/index.htm


Taxonomy warehouse. Consulted: 22-02-2005, http://www.taxonomywarehouse.com


Term Tree. (2005). Consulted: 13-03-2005, http://www.termtree.com.au



Ultraseek Advanced Classifier. (2005). Consulted: 22-02-2005, http://www.verity.com/products/ultraseek/index.html


Ultraseek Content Classification Engine (CCE). (2005). Consulted: 13-03-2005, http://www.verity.com/products/ultraseek/cce.html


Ultraseek Topic Advisor. (2005). Consulted: 22-02-2005, http://www.verity.com/products/ultraseek/index.html


Webopedia. Consulted: 28-01-2005, http://www.pcwebopedia.com/TERM/t/taxonomy.html

 

6. Notes

[1] According to "TAG Notes November 1, 2004" (2004), the final draft should be ready by January 2005.

[2] A copy of the references can be obtained by sending an e-mail message to the author of this article ([email protected]). The reason for the request should be stated.

[3] An example of this option is Semio Taxonomy from Entrieva. More information at: http://www.entrieva.com/entrieva/products/scts.asp?Hdr=scts [Consulted: 13-03-2005].

[4] Information taken from the web site Metainformación: Dublin Core (2003), maintained by RedIRIS.

[5] According to the report Information intelligence: content classification and the enterprise taxonomy practice (2004, p. 38), Autonomy has a market share of 14% and Lotus Discovery Server one of 7%.

[6] According to the report Information intelligence: content classification and the enterprise taxonomy practice (2004, p. 38), K2 has a market share of 15%.

[7] According to the report Information intelligence: content classification and the enterprise taxonomy practice (2004, p. 38), Smart Discovery has a market share of 4%.

[8] Information intelligence: content classification and the enterprise taxonomy practice (2004, p. 26).



Creative Commons License
Last updated 05-06-2012
© Universitat Pompeu Fabra, Barcelona