To use this software and produce an auto-annotated
dataset you need the GATE system, Apache Ant, and Perl.
The software has been tested under Cygwin, but it can be adapted
for Unix and Linux, because all it really needs is a jar file.
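
A quick way to see which of these prerequisites are reachable from
the shell is sketched below. It is only an illustration: the function
name missing_tools is made up here, java is assumed because a jar file
has to be run, and GATE is not checked, since it may not provide a
command on PATH.

```shell
#!/bin/sh
# Print the name of every command in the argument list that is not on PATH.
# missing_tools is a hypothetical helper, not part of this software.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# ant and perl are named in this README; java is an assumption.
missing=$(missing_tools ant perl java)
if [ -n "$missing" ]; then
    echo "Not found on PATH:" $missing
else
    echo "ant, perl and java found on PATH"
fi
```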

******

YOU MUST ADAPT THE SCRIPTS TO YOUR OWN ENVIRONMENT.
In particular, you will have to change the paths in

run_experiments.xml 

and in

process_all.sh

******



The software follows the structure:

-- configs
-- data
-- output
-- software
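
As a sanity check after unpacking, the layout above could be verified
with a small sketch like the following. The function name missing_dirs
and the idea of passing the root directory as an argument are
assumptions made for illustration.

```shell
#!/bin/sh
# Print each expected top-level directory that is missing under the given root.
# The four names come from the list above; missing_dirs is a hypothetical helper.
missing_dirs() {
    root=${1:-.}
    for d in configs data output software; do
        [ -d "$root/$d" ] || printf '%s\n' "$d"
    done
}

# Check the current directory; no output means all four directories exist.
missing_dirs .
```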


The configs directory contains config files (config.xml) for both
English and Spanish, each in its own directory. These config files
are used by the SVM classification system from GATE.

The data directory contains data for one of the tested domains,
aviation. It is under the directory eng/airplane for
English and esp/airplane for Spanish.

You can examine the files under these directories using GATE; the *xml
files are in GATE format. The files contain linguistic annotations and
also gold-standard annotations (under the annotation set "sms").

The output directory is empty but will be populated by the software.
In particular, a "mapped" directory will be created where the documents
with induced concepts will be written. The files will have names 1-*xml
and are also GATE documents, containing an annotation set called "sms1"
with the automatically induced concepts.
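
After a run, the induced documents could be listed with a sketch such
as the one below. Where exactly "mapped" is created is not stated here,
so the sketch searches for it below the output directory instead of
hard-coding a path; the function name list_mapped is hypothetical.

```shell
#!/bin/sh
# List files named 1-*xml inside any directory called "mapped" below the
# given directory (default: output). list_mapped is a hypothetical helper.
list_mapped() {
    find "${1:-output}" -type f -path '*/mapped/*' -name '1-*xml' 2>/dev/null | sort
}

# Prints nothing if the pipeline has not been run yet.
list_mapped output
```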

Finally, the software directory contains a jar file, a creole.xml file
(for GATE), a grammar, and programs to run the whole learning pipeline.

The program to execute for inducing the concepts from the "raw" data is:

process_all.sh DOMAIN LANG

where DOMAIN is airplane
and LANG is eng or esp.

For example:

./process_all.sh airplane eng

Note that "airplane" is the name of the directory containing domain-specific data.
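
A thin wrapper that rejects an unknown LANG before starting the
pipeline might look like the sketch below. The helper names valid_lang
and run_pipeline are invented for illustration, and the sketch assumes
process_all.sh is in the current directory, as in the example above.

```shell
#!/bin/sh
# Return 0 if the language code is one documented in this README.
valid_lang() {
    case "$1" in
        eng|esp) return 0 ;;
        *)       return 1 ;;
    esac
}

# Validate LANG, then hand both arguments to process_all.sh unchanged.
# Returns 2 for an unknown language, otherwise the script's own exit status.
run_pipeline() {
    domain=$1
    lang=$2
    valid_lang "$lang" || { echo "LANG must be eng or esp" >&2; return 2; }
    ./process_all.sh "$domain" "$lang"
}
```

With these definitions loaded, run_pipeline airplane eng behaves like
the command above, while run_pipeline airplane fra stops with an error
before anything is run.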


Good luck!
