ENGLISH DISAMBIGUATED LEXICAL FUNCTIONS DATASET
This dataset is a collection of 10,077 English lexical function instances in which each keyword has been disambiguated with its Babelnetid.
Only the most productive lexical functions from I. Mel'cuk's original dataset have been included here: every selected lexical function has more than 30 instances.
All the information in BLACK refers to the Keyword (= the base of the collocation), while the information in RED refers to the Value (= the collocate of the collocation).
The dataset includes:
1. The name of the lexical function
2. The Keyword
3. The Keyword's part of speech
4. The original comments by I. Mel'cuk about the Keyword, which have been classified into different categories: SEM (semantic information), SX (syntactic information), USE (if the comment refers to its literal or metaphoric use) and EX (if it is an example).
SEM:[pronunc.] AntiMagn(accent) = imperceptible
SX:[with N-II] QSyn(agree) = _go along_
USE:[metaphoric] AntiBon(atmosphere) = evil
5. The original comments by I. Mel'cuk about the Value, also classified into: REG (if it refers to the formal/informal, old-fashioned, childish, literary or slang register), VAR (information about British or US American use), and POSIT (when the comment states whether the value is anteposed or postposed with respect to the keyword, whenever this is not the unmarked position). Other non-classified comments are restrictions imposed on the lexical function.
REG:old-fashioned AntiMagn(crazy) = dotty
VAR:British Qsyn(commotion) = kerfuffle
POSIT:anteposed Magn(appreciate) = very much, meaning that this lexical function is instantiated as "very much appreciated", with the Value (very much) anteposed to the Keyword (appreciate).
POSIT:postposed (Keyword=battle, Value=royal) indicates that the lexical function is instantiated as "battle royal", with the Value (royal) postposed to the Keyword (battle).
Other restrictions: Son(bird) = chirps. In this case, the Value_comments column includes the restriction "B.is small", indicating that this value is only valid if the bird producing the sound is small.
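The comment categories above can be separated from the comment text with a short script. This is only a sketch: it assumes that comments are stored as "TAG:text" exactly as in the examples of this README, and the function name split_comment is hypothetical, not part of the dataset.

```python
import re

# Categories named in this README; how they are stored in the file is an assumption.
KEYWORD_TAGS = {"SEM", "SX", "USE", "EX"}
VALUE_TAGS = {"REG", "VAR", "POSIT"}

_TAG = re.compile(r"^(?P<tag>[A-Z]+):\s*(?P<body>.*)$")


def split_comment(comment):
    """Return (tag, text). The tag is None for unclassified comments,
    which this README describes as restrictions on the lexical function."""
    text = comment.strip()
    match = _TAG.match(text)
    if match and match.group("tag") in KEYWORD_TAGS | VALUE_TAGS:
        body = match.group("body").strip()
        # Keyword comments in the examples are wrapped in square brackets.
        if body.startswith("[") and body.endswith("]"):
            body = body[1:-1]
        return match.group("tag"), body
    return None, text


print(split_comment("SX:[with N-II]"))
print(split_comment("B.is small"))
```

Unclassified comments are returned unchanged so that restrictions such as "B.is small" are not lost.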
6. The value of the Lexical Function
7. Subcategorisation pattern
This information has been split into two columns, one to the left of the column "Value" (if the arguments are anteposed to the value) and one to the right of the column "Value" (if the arguments are postposed to the value or intraposed in the value). In the vast majority of cases, the arguments are postposed. If the arguments are inserted in the value, they are substituted by "[...]" in the column Value and the whole expression (value + arguments) is made explicit in the Subcategorisation column.
anteposed subcategorisation: Magn(try) = [N-I(gen)] best
postposed subcategorisation: CausOper1(available) = make [N-I ~]
intraposed subcategorisation: Anti(answer) = _give [N-II] the runaround_
where N-I is the first argument, N-II is the second argument and ~ is the keyword.
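The notation used in these examples, LF(keyword) = value, and the "~" placeholder can be handled as in the sketch below. It assumes that an instance is written on one line exactly as in this README; the names parse_instance and expand_keyword are hypothetical helpers, not part of the dataset.

```python
import re

# LF name, keyword in parentheses, then the value after "=".
_INSTANCE = re.compile(r"^(?P<lf>\w+)\((?P<keyword>[^)]*)\)\s*=\s*(?P<value>.+)$")


def parse_instance(line):
    """Split 'CausOper1(available) = make [N-I ~]' into (lf, keyword, value)."""
    match = _INSTANCE.match(line.strip())
    if match is None:
        raise ValueError(f"not an LF instance: {line!r}")
    return match.group("lf"), match.group("keyword"), match.group("value").strip()


def expand_keyword(pattern, keyword):
    """Replace the '~' placeholder in a subcategorisation pattern with the keyword."""
    return pattern.replace("~", keyword)


lf, keyword, value = parse_instance("CausOper1(available) = make [N-I ~]")
print(lf, keyword, expand_keyword(value, keyword))
```

The argument labels (N-I, N-II) are left untouched, since filling them requires the actual arguments of a sentence.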
8. Fusion (//)
If the value of a lexical function expresses both the sense of the keyword and the sense of the lexical function, it is marked with '//'. Although the lexical function is syntagmatic by nature, in these cases it behaves as a paradigmatic one.
Magn(agree) = _there is no daylight_ between NX and NY //
in comparison to the normal operation of Magn
Magn(agree) = completely
9. Quasi-synonymy (< <<)
The operator < indicates the degree of synonymy.
Magn(amazing) = jaw-dropping <<
Magn(amazing) = stunning <
In this example, the operators '<<' and '<' mean that "jaw-dropping" is even more intense than "stunning".
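Values of the same lexical function can be ordered by these operators, as in the sketch below. It assumes that the operator trails the value as in the two examples above, and it treats a value without an operator as the lowest rank, which is an assumption rather than something stated in this README. The names split_marker and sort_by_intensity are hypothetical.

```python
# Rank of each operator; treating "no operator" as rank 0 is an assumption.
INTENSITY_RANK = {"": 0, "<": 1, "<<": 2}


def split_marker(value):
    """Split 'jaw-dropping <<' into ('jaw-dropping', '<<')."""
    text = value.rstrip()
    # Check the longer operator first so '<<' is not read as '<'.
    for marker in ("<<", "<"):
        if text.endswith(marker):
            return text[: -len(marker)].rstrip(), marker
    return text, ""


def sort_by_intensity(values):
    """Return (value, operator) pairs ordered from least to most intense."""
    pairs = [split_marker(v) for v in values]
    return sorted(pairs, key=lambda pair: INTENSITY_RANK[pair[1]])


print(sort_by_intensity(["jaw-dropping <<", "stunning <"]))
```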
10. Columns 'Babelnetid' and 'Babelnetid2' have the Babelnet identifiers that correspond to the sense of the keyword in each particular instance of the lexical function.
The first one ('Babelnetid') is the main or preferred sense. A second or alternative id ('Babelnetid2') is proposed whenever the sense choices in Babelnet cannot be disambiguated without context, when two synsets are practically identical, or when two senses are possible (for example, a literal sense and a metaphorical sense).
A0(biology) = biological keyw_babelnetId=bn:00010549n keyw_babelnetId_2=bn:00010543n
CausFact0(bomb) = explode keyw_babelnetId=bn:00011917n keyw_babelnetId_2=bn:01237332n
This keyword has a semantic comment indicating that it refers to "airplanes". The first babelnetId is a more general sense and the second Id is specifically an aerial bomb, so both are possible.
Magn(thirst) = unquenchable keyw_babelnetId=bn:00076968n keyw_babelnetId_2=bn:00045230n
The first babelnetId refers to a physical need ("A physiological need to drink"), while the second id is a more metaphorical use of the keyword ("Strong desire for something (not food or drink)").
The lexical function Magn could be applied to both senses with the same value.
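When reading the two id columns, it can help to check the shape of each id and to collect the preferred and alternative senses together. The sketch below assumes that every id has the form seen in this README ("bn:", eight digits, one part-of-speech letter); the letter set n/v/a/r is an assumption, and babelnet_pos and candidate_senses are hypothetical helper names.

```python
import re

# Shape inferred from the ids in this README, e.g. bn:00010549n.
_BN_ID = re.compile(r"^bn:(\d{8})([nvar])$")


def babelnet_pos(bn_id):
    """Return the part-of-speech letter at the end of a Babelnet id."""
    match = _BN_ID.match(bn_id.strip())
    if match is None:
        raise ValueError(f"unexpected Babelnet id: {bn_id!r}")
    return match.group(2)


def candidate_senses(primary, alternative=None):
    """List the preferred id first, followed by the alternative id if present."""
    senses = [primary.strip()]
    if alternative and alternative.strip():
        senses.append(alternative.strip())
    return senses


print(candidate_senses("bn:00076968n", "bn:00045230n"))
print(babelnet_pos("bn:00082409v"))
```

Comparing the letter returned by babelnet_pos with the Keyword's part of speech is one way to spot the mismatches described in 11b.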
11. Comments on the Babelnetids.
Extra information is recorded:
11a. if some multiwords have their own proper Babelnetid (in red if they are collocations, and in black if the keyword is a multiword or contains one)
Magn(thin) = _as a rake_
Babelnet has a specific id for the whole expression and it is noted in this column for comments (Bid for "thin as a rake" bn:13662369a)
11b. if there is a mismatch between the keyword's part of speech and the synset's part of speech (most cases are past participles).
Magn(staked (pos=A)) = heavily
Although the Keyword_pos is A (for adjective), Babelnet only includes an id for the corresponding verb (to stake). Staked is in fact the past participle of "to stake", so this verbal id is noted in the keyw_babelnetid column (bn:00082409v) and a comment "past participle" is written in the Comments Bid column.
11c. in a few cases, a comment is included if the keyword (for example: "bift") or the specific sense in which the keyword is used (for example: "conscience" in the sense of perception) doesn't have a Babelnetid.
Abbreviations used in the dataset:
pl = plural
gen = genitive
Vger = gerund
Vinf = infinitive
Aposs = possessive adjective