Collocation resources
The currently available collocation dataset is a list of about 10,000 English collocations, collected and tagged with Lexical Functions (LFs) by I. Mel’čuk. To facilitate the use of this dataset in downstream NLP applications, we disambiguated the collocation bases (the “keywords” in LF terminology) with respect to BabelNet synsets.
Please consult the Readme for a precise description of the dataset.
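The Readme defines the actual file layout. As a rough illustration only, the sketch below assumes a hypothetical tab-separated layout (base, collocate, LF label, BabelNet synset ID) and shows how such entries could be loaded and grouped by Lexical Function. The column order, the `Collocation` class, and the `bn:PLACEHOLDER_*` synset IDs are assumptions made for this example, not the dataset's real format or identifiers. The LF examples (Magn for intensifiers, Oper1 for support verbs) are standard LF illustrations.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Collocation:
    base: str         # the LF keyword, e.g. "rain"
    collocate: str    # the LF value, e.g. "heavy"
    lf: str           # Lexical Function label, e.g. "Magn"
    base_synset: str  # BabelNet synset ID of the base (placeholder here)


def parse_line(line: str) -> Collocation:
    # HYPOTHETICAL layout: base<TAB>collocate<TAB>LF<TAB>synset
    base, collocate, lf, synset = line.rstrip("\n").split("\t")
    return Collocation(base, collocate, lf, synset)


def group_by_lf(entries):
    # Map each LF label to the list of collocations tagged with it.
    groups = defaultdict(list)
    for entry in entries:
        groups[entry.lf].append(entry)
    return dict(groups)


# Illustrative records only; the synset IDs are not real BabelNet IDs.
sample = [
    "rain\theavy\tMagn\tbn:PLACEHOLDER_1",
    "attention\tpay\tOper1\tbn:PLACEHOLDER_2",
    "criticism\tharsh\tMagn\tbn:PLACEHOLDER_3",
]
entries = [parse_line(line) for line in sample]
by_lf = group_by_lf(entries)

print(sorted(by_lf))                          # ['Magn', 'Oper1']
print([e.collocate for e in by_lf["Magn"]])   # ['heavy', 'harsh']
```

Grouping by LF is just one convenient view; since each base carries a synset ID, the same records could equally be indexed by synset to look up all collocates of a disambiguated keyword.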
A subset of this dataset was used in the experiments described in: L. Espinosa-Anke, L. Wanner, and S. Schockaert. “Collocation Classification with Unsupervised Relation Vectors.” ACL 2019, Short Paper Track, Florence, Italy.
Files:
Links:
A dataset of comparable size for French is in preparation and will be published soon.