Expanded accomplishments

[Describe in greater detail the progress achieved during the current reporting period and include the significance of data/results. You are encouraged to include graphs, charts, and photos. No word limit.]

Evaluating complement/modifier distinctions

During the previous phase of the TFlex project, we harvested a lexicon of 2,770 verbs from the FrameNet semantically annotated corpus. The focus of much of our subsequent work has been to "improve" this lexicon, with the aim of making it more compatible with the kinds of lexicon typically used by deep parsers, for example the TRIPS parser.

The process of harvesting verb entries from the FrameNet corpus has involved two stages: (a) harvesting subcategorisation frames directly from corpus annotations; and (b) filtering out spurious subcategorisation frames.

The FrameNet corpus consists of 160,000 semantically annotated sentences. Each annotation is done with respect to some target word, usually a verb. Take for example the following sentence:

  • Mathilde fried the catfish in an iron skillet.
Here the obvious target word to annotate is the verb "fried". The annotation itself involves identifying the 'dependents' of the target word, i.e. the words and phrases which refer to the objects and entities that play some important role in the activity denoted by the verb:
  • [Mathilde] fried [the catfish] [in an iron skillet].
The annotation is tripartite, in that each identified dependent is then labelled with three distinct kinds of information:
  • the grammatical role it plays with respect to the target verb in the structure of the sentence (e.g. subject or object)
  • the grammatical category it belongs to, e.g. noun phrase (NP) or prepositional phrase (PP)
  • the semantic role it plays in the activity itself.
Thus, a complete FrameNet annotation can be represented as a table:

  Mathilde | FRIED | the catfish | in an iron skillet
  subject  |       | object      | dependent
  NP       |       | NP          | PP
  cook     |       | food        | cooking implement
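
The same annotation can also be pictured as a simple data structure. The following is a minimal sketch in Python; the class and field names are our own illustrative invention, not the actual FrameNet file format:

  from dataclasses import dataclass
  from typing import Tuple

  @dataclass(frozen=True)
  class Dependent:
      """One annotated dependent of the target verb."""
      text: str              # surface string, e.g. "the catfish"
      grammatical_role: str  # e.g. "subject", "object", "dependent"
      category: str          # e.g. "NP", "PP"
      semantic_role: str     # e.g. "cook", "food", "cooking implement"

  @dataclass(frozen=True)
  class Annotation:
      """A FrameNet-style annotation of one sentence around a target verb."""
      target_verb: str            # lemma of the target word, e.g. "fry"
      semantic_type: str          # semantic class of the verb, e.g. "cook"
      dependents: Tuple[Dependent, ...]

  example = Annotation(
      target_verb="fry",
      semantic_type="cook",
      dependents=(
          Dependent("Mathilde", "subject", "NP", "cook"),
          Dependent("the catfish", "object", "NP", "food"),
          Dependent("in an iron skillet", "dependent", "PP", "cooking implement"),
      ),
  )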

The first part of the lexical harvesting process from the FrameNet corpus simply involves taking each annotated sentence and converting it into a "lexical subcategorisation frame" for the target verb. A bit of background -- the main components of a lexical entry in a lexicon for deep linguistic processing are generally assumed to be as follows:

  • the part-of-speech of the word (e.g. verb or adjective)
  • the semantic class of the word (e.g. verbs like "fry", "bake", "sauté" can be assigned to the same semantic class 'cook')
  • a set of subcategorisation frames, each one representing two things: (a) the kinds of sentence in which the word can occur grammatically; and (b) the legal ways that semantic roles can be mapped on to syntactic positions in those sentence types. For example, verbs of cooking are generally assigned a transitive subcategorisation frame which can be schematised as involving all and only the following TWO syntactic dependents:
  • subject - NP - cook
  • object - NP - food
A parser can use this information to construct the appropriate semantic representation for a sentence like "Mathilde fried the catfish" such that "Mathilde" is identified as the cook and "the catfish" is identified as the food, rather than vice versa. On the other hand, such verbs are not generally allowed to appear in an intransitive subcategorisation frame where only one dependent is specified, since the following sentences are neither grammatical nor meaningful in English (at least as descriptions of cooking events):
  • Mathilde fried.
  • The catfish fried.
It is essential for efficient, accurate language interpretation that a parser have access to a sound and complete list of which subcategorisation frames go with which verbs.

With this in mind, the first phase of the process which harvests verb entries from the FrameNet corpus simply converts each annotated sentence into a subcategorisation frame, before collating all the individual frames into a single entry and removing duplicates. Thus, the annotated sentence above is converted into the following subcategorisation frame (a minimal code sketch of this harvesting step follows the frame):

  • verb = "fry"
  • meaning = "cook"
  • dependents
    • subject - NP - cook
    • object - NP - food
    • dependent - PP - cooking implement
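
This harvesting step can be sketched as follows. The sketch uses a deliberately flat encoding of the annotations (verb lemma, semantic type, list of dependent triples) rather than the real FrameNet format; identical frames for the same verb collapse automatically because each frame is stored as a hashable set of triples:

  from collections import defaultdict

  # Each annotated sentence, flattened to (verb, semantic type, dependent triples),
  # where a triple is (grammatical role, syntactic category, semantic role).
  ANNOTATIONS = [
      ("fry", "cook", [("subject", "NP", "cook"),
                       ("object", "NP", "food"),
                       ("dependent", "PP", "cooking implement")]),
      ("fry", "cook", [("subject", "NP", "cook"),
                       ("object", "NP", "food")]),
  ]

  def harvest_lexicon(annotations):
      """Convert each annotated sentence into a subcategorisation frame and
      collate the frames into one entry per (verb, semantic type) pair."""
      lexicon = defaultdict(set)
      for verb, semantic_type, dependents in annotations:
          frame = frozenset(dependents)     # duplicate frames disappear on insertion
          lexicon[(verb, semantic_type)].add(frame)
      return lexicon

  print(harvest_lexicon(ANNOTATIONS))
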
After this first phase of lexical harvesting, we had extracted a lexicon of 2,770 verbs from the FrameNet corpus, with an average of 9 distinct subcategorisation frames per verb. Subsequent work has mainly been geared towards improving the quality of this lexicon, by identifying and eliminating spurious subcategorisation frames, whose existence in the lexicon would cause large numbers of undesirable analyses to be constructed, thereby hampering both efficiency and accuracy. This work was reported in McConville & Dzikovska (2007).

One of the major sources of spurious subcategorisation frames we encountered in this process involved the fact that FrameNet annotates both "complements" and "modifiers" of verbs, and hence both sets of dependents ended up in our harvested lexicon. Linguistic theory generally assumes a fundamental distinction between the characteristic "complements" of a verb and the free "modifiers" that can occur with all or most verbs. Complements should be listed in lexical subcategorisation frames; modifiers should not be. An example from the FrameNet corpus itself is as follows, for the verb "eclipse":

  • Kokonin eclipsed him in power in recent months.
The annotated dependents are as follows:
  • "Kokonin" - subject - NP - item
  • "him" - object - NP - standard
  • "in power" - dependent - PP - attribute
  • "in recent months" - dependent - PP - time
Of these four annotated dependents, only the first three should be listed in the relevant lexical subcategorisation frame. The time adverbial "in recent months" is NOT characteristic of the verb "eclipse" - it can appear with almost any verb in the lexicon (Kokonin died/ate spaghetti/seemed happy in recent months) - and hence should be omitted from the lexical entry. Lexical specification of dependents is thus only relevant for the subject of the verb and its "complements", i.e. generally those phrases which play a central role in the semantics of the verb.

The process of harvesting the verb lexicon directly from FrameNet annotated sentences meant that a large number of modifiers had made their way into subcategorisation frames, leading to a lexicon which was massively redundant. We needed to find a simple, systematic way of weeding these out, whilst retaining the complements proper. In order to distinguish between complements and modifiers in the harvested lexicon (and to eliminate the latter), we chose a simple expedient --- we would identify complements with those dependents whose semantic role was specified as being 'semantically core' in the ontology underlying FrameNet, and modifiers as all the others. Doing so meant that we were able to reduce the number of subcategorisation frames listed across the lexicon by 45%, resulting in a much slimmer lexicon (down to just 3.4 subcategorisation frames per verb).
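
The weeding-out step is then a simple filter over each harvested frame. The sketch below assumes a lookup table listing which semantic roles count as core for each verbal semantic type; the table shown is a hypothetical stand-in for the coreness information actually recorded in the FrameNet ontology:

  # Hypothetical coreness table: semantic type -> set of core semantic roles.
  CORE_ROLES = {
      "eclipse": {"item", "standard", "attribute"},   # 'time' is non-core
  }

  def prune_frame(semantic_type, frame):
      """Keep only the dependents whose semantic role is core for this type;
      everything else is treated as a modifier and dropped from the frame."""
      core = CORE_ROLES.get(semantic_type, set())
      return frozenset(dep for dep in frame if dep[2] in core)

  frame = frozenset({("subject", "NP", "item"),
                     ("object", "NP", "standard"),
                     ("dependent", "PP", "attribute"),
                     ("dependent", "PP", "time")})
  print(prune_frame("eclipse", frame))   # the 'time' dependent is filtered out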

We decided to evaluate this expedient to see how accurate our harvested lexicon was. We selected a random, weighted sample of around 500 dependents from across the lexicon and classified these manually into complements and modifiers, using the VerbNet verb lexicon as our gold standard wherever possible. We found that the agreement between the standard notion of syntactic complementhood and FrameNet's notion of semantic coreness was 0.85. Thus, by simply equating the two notions and eliminating all and only the non-core dependents from the harvested lexicon (the arithmetic is sketched after this list):

  • we lose 13% of complements
  • 9% of the dependents remaining in the lexicon are actually modifiers
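
To make the relationship between these three figures explicit, they can all be read off a simple confusion matrix over the manually classified sample. The counts in the sketch below are invented purely so that the arithmetic comes out close to the figures reported above; they do not reflect the actual composition of our sample:

  # Each sampled dependent pairs a manual gold label ('complement' or 'modifier')
  # with the automatic decision (True if FrameNet marks its semantic role as core).
  # Invented counts, for illustration only.
  sample = ([("complement", True)] * 302 + [("complement", False)] * 45 +
            [("modifier", False)] * 123 + [("modifier", True)] * 30)

  agreement = sum((gold == "complement") == core for gold, core in sample) / len(sample)
  lost_complements = (sum(gold == "complement" and not core for gold, core in sample)
                      / sum(gold == "complement" for gold, _ in sample))
  residual_modifiers = (sum(gold == "modifier" and core for gold, core in sample)
                        / sum(core for _, core in sample))

  print(round(agreement, 2), round(lost_complements, 2), round(residual_modifiers, 2))
  # -> 0.85 0.13 0.09
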
The results of this evaluation were published as McConville and Dzikovska (2008b).

Using inheritance to improve the lexicon

The verb lexicon we harvested from FrameNet during the first phase of the TFlex project contained 2,770 entries, each of which specified a range of subcategorisation frames, as discussed above. Each subcategorisation frame is abstracted as a set of arguments (i.e. subjects and complements), and each argument has three parts: (a) a syntactic role (e.g. subject or object); (b) a syntactic category (e.g. noun phrase, clause, prepositional phrase); and (c) a semantic role (e.g. agent, mover, goal). For example, the subcategorisation frame underlying the example sentence "The key opened the door" can be represented as follows:

  • subject - noun phrase - instrument (i.e. the key)
  • object - noun phrase - container-portal (i.e. the door)
This harvested lexicon contained subcategorised-for arguments bearing 440 distinct semantic role labels, distributed over 362 verbal semantic types, i.e. an average of 1.2 roles per type.

It should be noted here that the number of distinct semantic roles in the harvested lexicon is undesirably large, from the perspective of an efficient deep parser. In other words, the vocabulary of semantic roles underlying the FrameNet corpus is much more fine-grained than we need for our purposes, making many detailed distinctions that are unnecessary (and indeed counterproductive) for a practical parser. For example, with the verb "open" FrameNet has different semantic roles for different kinds of object which can be opened -- you can open a "container" itself (e.g. "John opened the box"), or you can open a "container portal" (e.g. "John opened the lid"); in some cases both kinds of role appear in the same sentence -- "John opened the lid of the box". From the perspective of deep parsing, we just need to specify a single role for the entity that can be opened.

The proliferation of semantic roles in FrameNet also stands out with respect to comparable lexical resources. For example, the VerbNet verb lexicon has just 33 roles distributed over 395 semantic types, which is the kind of ratio we are looking for in a lexicon for practical deep parsing. The aim of the next phase of the project was to improve the verb lexicon we had harvested from the FrameNet corpus by looking at ways to "generalise" the specified semantic roles in subcategorisation frames.

We first of all investigated whether we could use the inheritance structure inherent in the FrameNet ontology to reduce the vocabulary of semantic role labels in our harvested lexicon to a manageable level, in essence by automatically resolving specific semantic roles like 'cook' and 'food' to more general 'proto'-roles such as 'agent' and 'patient'. Doing so we were able to reduce the size of the vocabulary by 21%, though further investigation showed that this figure increases to 39% if we concentrate only on the more 'connected' sections of the FrameNet ontology.
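
The resolution step can be sketched as a walk up the role inheritance hierarchy until one of a small set of general proto-roles is reached. The inheritance edges and role names below are hypothetical stand-ins for the frame-element inheritance relations actually recorded in FrameNet:

  # Hypothetical fragment of the role inheritance hierarchy: child role -> parent role.
  PARENT = {"cook": "agent", "food": "patient", "item": "theme"}

  # The small set of general proto-roles we want to resolve everything to.
  PROTO_ROLES = {"agent", "patient", "theme", "goal", "source", "path"}

  def generalise(role):
      """Follow inheritance links upwards until a proto-role is reached;
      roles with no path to a proto-role are left unchanged."""
      seen, current = set(), role
      while current not in PROTO_ROLES:
          if current in seen or current not in PARENT:
              return role           # disconnected from the proto-roles: keep as-is
          seen.add(current)
          current = PARENT[current]
      return current

  print(generalise("cook"))      # -> 'agent'
  print(generalise("duration"))  # no path upwards, so stays 'duration'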

The results of this part of the project were published in McConville and Dzikovska (2008a).

Using coreness sets to improve the lexicon

We next turned to another aspect of the FrameNet ontology which we had previously ignored --- the organisation of related semantic roles (such as the 'source', 'goal' and 'path' roles of motion verbs) into 'coreness sets'. Our aim here was to investigate whether it was feasible to automatically resolve all roles which appear in some coreness set to a more general abstract role, such as a 'trajectory' role subsuming more specific instances like 'source', 'goal' and 'path'. Doing so would allow us to have a more concise, general lexicon with both a smaller vocabulary of distinct semantic role labels and significantly fewer (i.e. 16% fewer) subcategorisation frames which need to be listed in verb entries.
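
A sketch of the collapsing step is given below. The coreness set and the abstract 'trajectory' label are illustrative only; the point is that once member roles are rewritten to the abstract role, previously distinct subcategorisation frames become identical and can be merged:

  # Hypothetical coreness sets, each mapped to an abstract role subsuming its members.
  CORENESS_SETS = {frozenset({"source", "goal", "path"}): "trajectory"}

  ABSTRACT_ROLE = {member: abstract
                   for members, abstract in CORENESS_SETS.items()
                   for member in members}

  def collapse_frame(frame):
      """Replace any semantic role belonging to a coreness set with the
      abstract role for that set."""
      return frozenset((gram, cat, ABSTRACT_ROLE.get(sem, sem))
                       for (gram, cat, sem) in frame)

  frame_a = frozenset({("subject", "NP", "mover"), ("dependent", "PP", "goal")})
  frame_b = frozenset({("subject", "NP", "mover"), ("dependent", "PP", "source")})
  print(collapse_frame(frame_a) == collapse_frame(frame_b))   # True: the frames merge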

We conducted a manual evaluation of this process, involving an unbiased sample of 100 cases where a semantic role would be collapsed into a coreness set. We found that around 10% of these were problematic, the most common reason being that FrameNet annotators had not been sufficiently careful in assigning uses of target verbs to particular semantic types.

The results of this part of the project were published in McConville and Dzikovska (2008a).

Integration with CMU team

One aim of the joint work we are undertaking with the CMU team is to investigate to what extent the 'deep' output representation of the TRIPS parser can improve the kinds of classification technologies they are working on. As a preliminary step in this direction, we parsed one of the dialogues collected by the CMU team using TRIPS, and sent them a brief report listing the features inherent in the TRIPS output which we believe will be of use to them.

Deep grammatical relations

One of the key components of the kind of language-enhanced educational technology that forms the motivation for the TFlex project is a language interpretation module. For example, this part of the system is responsible for analysing what the student says to an automated tutorial dialogue system so as to allow the domain reasoner to plan an appropriate response. In the field of computer-supported collaborative learning, the language interpreter is responsible for scanning both the documents created by students and the record of their interactions to diagnose the appropriate time for tutor intervention.

The kind of language interpretation system we have in mind needs to be:

  • wide-coverage, i.e. able to handle all the words and grammatical structures that occur in educational domains
  • robust, i.e. able to deal with fragmentary, misspelled or ungrammatical input, or with words and idioms the system has not seen before
  • deep, i.e. able to output semantically-transparent representations which can interface straightforwardly with a symbolic reasoning system.
A language interpretation system typically involves the following modules:
  • a lexicon - a comprehensive, accurate list of all the words and idioms in the language, along with some kind of representation of their meanings
  • a grammar - a comprehensive, accurate list of all the rules which form phrases and sentences from words, including information about how to combine word meanings to form phrasal and sentential meanings
  • a parser - an abstract machine which consults both a lexicon and a grammar in order to work out the possible meanings of some input sentence
  • an oracle - where a given input sentence is ambiguous (i.e. has been assigned more than one possible meaning by the parser/grammar/lexicon), the oracle decides which reading is the best one, based on linguistic, contextual and/or world knowledge.
For example, take the input sentence "The man saw the rabbit with a telescope", and assume that: (a) all the relevant words are contained within our lexicon; and (b) the grammar contains the necessary rules to combine verbs, nouns, prepositions and articles. Since this sentence is formally ambiguous, the parser should construct a set of potential interpretations, including the following four:
  1. the man used a telescope to see the rabbit
  2. the rabbit was holding a telescope when the man saw it
  3. the rabbit was accompanied by a telescope when the man saw it
  4. the man was accompanied by a telescope when he saw the rabbit
A successful oracle should then be able to examine information about the linguistic (and preferably extralinguistic) context in order to decide between these candidate interpretations - in most cases (but not all) the preferred interpretation will be (1).
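
To illustrate the division of labour between parser and oracle, the sketch below represents two of the four candidate readings as sets of grammatical dependencies and has a trivial oracle choose between them using hand-assigned plausibility scores. A real oracle would of course derive such preferences from linguistic, contextual and/or world knowledge rather than from a fixed table:

  # Two candidate readings of "The man saw the rabbit with a telescope", each a
  # set of (relation, head, dependent) triples; only the attachment of the
  # PP "with a telescope" differs.
  readings = {
      "man uses telescope to see": {("subject", "saw", "man"),
                                    ("object", "saw", "rabbit"),
                                    ("instrument", "saw", "telescope")},
      "rabbit holds telescope":    {("subject", "saw", "man"),
                                    ("object", "saw", "rabbit"),
                                    ("accompanier", "rabbit", "telescope")},
  }

  # Hand-assigned plausibility scores standing in for the oracle's knowledge.
  plausibility = {"man uses telescope to see": 0.9, "rabbit holds telescope": 0.1}

  def oracle(candidates, scores):
      """Pick the candidate reading with the highest plausibility score."""
      return max(candidates, key=lambda name: scores[name])

  print(oracle(readings, plausibility))   # -> 'man uses telescope to see'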

The lexicon/grammar underlying the TRIPS system has been developed painstakingly by hand, over a number of years. One advantage of this mode of grammar development is that the resultant system will have very high accuracy - if the parser returns a particular analysis of an input sentence, you can be reasonably sure that this analysis is not a spurious one. On the other hand, the major disadvantage of constructing lexicons and grammars by hand involves the problem of coverage - at the start of the TFlex project, the TRIPS lexicon contained just 400 verbs, less than one tenth of the total number in common use in contemporary English.

The main thrust of the TFlex project (at least from the Edinburgh perspective) has been to attempt to semi-automatically harvest verbs from existing lexical resources (first VerbNet, then FrameNet), in order to deliver an order-of-magnitude improvement in the coverage of the TRIPS lexicon, whilst maintaining both the high levels of accuracy and the "deepness" of the semantic analyses. In this we have had considerable success, adding X brand new verbs from VerbNet, and an additional 1,100 from FrameNet. At the same time, we have become interested in an alternative approach to improving the coverage and robustness of a language interpretation system for educational applications, whilst remaining true to our ideology of reusing existing resources wherever possible - harnessing the potential of existing wide-coverage dependency parsers to interpret student language.

Over the last 15 years, research in (English) parser technology has been dominated by the Penn Treebank - a large, syntactically analysed corpus of English, based predominantly around Wall Street Journal newswire text. The existence of this resource has had a number of important effects on grammar and parser development:

  • wide-coverage lexicons and grammars need not be constructed by hand, but can be extracted automatically from the treebank
  • highly accurate statistical disambiguation modules can be trained on the parsed corpus.
First generation statistical parsers were developed mainly with evaluation in mind - the parser itself produced the same kind of syntactic tree structure as was found in the treebank and its success was measured based on how good it was at choosing the best analysis for ambiguous sentences. In recent years however, the focus has turned to applications, i.e. evaluating statistical parsers based on how useful they are as the preliminary stage of typical information extraction tasks such as extracting facts about the interactions among biological entities (e.g. genes and proteins) from relevant research papers.

This second phase of statistical parsing research has given rise to a new generation of parsers ("dependency parsers") which output representations based on sets of grammatical dependencies rather than syntactic tree structures. Such representations are preferred for information extraction tasks since they are closer to the underlying semantic structures which allow symbolic reasoning to take place. Some notable wide-coverage dependency parsers are:

  • Ted Briscoe and John Carroll's RASP parser - a wide-coverage CFG parser, trained on the SUSANNE treebank
  • Steve Clark and James Curran's C&C parser - a wide-coverage CCG parser trained on an improved version of the Penn Treebank
  • The Stanford Parser - a wide-coverage CFG parser, trained directly on the Penn Treebank
We are interested in investigating the potential for using these "off-the-shelf", wide-coverage dependency parsers in educational applications, for example for interpreting student input to tutorial dialogue systems or computer-supported collaborative learning environments. The fact that the grammatical dependency representation is (at least superficially) similar to one of the levels of representation used in the grammar that underlies the TRIPS parser has provided encouragement for this kind of work.

Our work to date has involved analysing the systems of grammatical dependencies utilised by the aforementioned parsers, and comparing them with the kinds of dependencies we would want a parser to output in order to successfully interpret student input and connect to a symbolic reasoning system. We started by drafting a list of desiderata for a system of 'deep' grammatical relations, which are as close to the level of semantic structure as it is possible to go while still remaining straightforwardly computable by a parser. We then decided to use this proposal as the basis of our participation in the shared task at the Cross-Domain and Cross-Framework Parser Evaluation workshop at COLING. The aim of this task was to compare and contrast different systems of grammatical dependency annotation based on various practical tasks, in our case tutorial dialogue. We concluded that no one system provided for all of our desiderata, but every feature we desired was covered by at least one of the competing systems.
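
The shape of this comparison can be sketched as follows; the desiderata and the coverage claims in the sketch are invented placeholders, not our actual findings, but the final two checks mirror the conclusion just stated: no single scheme covers every desideratum, yet their union does:

  # Hypothetical desiderata for a 'deep' grammatical relation scheme, and equally
  # hypothetical claims about which annotation schemes mark each of them.
  desiderata = {"passive subjects", "control relations", "meaningful prepositions"}

  scheme_covers = {
      "scheme A": {"passive subjects", "control relations"},
      "scheme B": {"passive subjects", "meaningful prepositions"},
  }

  # No single scheme provides for all of the desiderata ...
  print(any(desiderata <= covered for covered in scheme_covers.values()))   # False
  # ... but every desideratum is covered by at least one scheme.
  print(desiderata <= set().union(*scheme_covers.values()))                 # True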

The results of this investigation were published as McConville and Dzikovska (2008c).

Other work we have considered in this theme involves: (a) creating a gold-standard evaluation corpus for dependency parsers, based on our deep grammatical dependency system and the dialogues between an automatic tutorial system and human learners collected as part of the BEETLE project; and (b) evaluating the various competing dependency parsers based on how good they are at parsing this evaluation corpus.

Syntactic alignment of the harvested verb lexicon with TRIPS notation

We completed the alignment of the lexicon we had harvested from FrameNet with the TRIPS lexicon, at least as far as the encoding of syntactic information is concerned. This involved handling the following issues (a schematic sketch of the first two mappings follows the list):

  • mapping FrameNet syntactic roles onto TRIPS syntactic roles
  • mapping FrameNet syntactic categories onto TRIPS syntactic categories and features
  • approximating PFORM features for preposition phrase complements
  • automatically distinguishing verb-particle constructions from multi-word expressions
  • distinguishing meaningful from non-meaningful prepositions
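
The first two of these mappings can be pictured as a pair of translation tables, along the lines of the sketch below; every label name and table entry here is a hypothetical placeholder rather than the actual FrameNet or TRIPS inventory:

  # Hypothetical mappings from FrameNet-style syntactic labels to TRIPS-style ones.
  ROLE_MAP = {"subject": "SUBJ", "object": "OBJ", "dependent": "COMP"}
  CATEGORY_MAP = {"NP": ("NP", {}),
                  "PP": ("PP", {"PFORM": None})}   # PFORM approximated in a later step

  def align_dependent(gram_role, category, sem_role):
      """Translate one harvested dependent into a TRIPS-flavoured triple:
      (TRIPS syntactic role, (TRIPS category, feature dict), semantic role)."""
      return (ROLE_MAP[gram_role], CATEGORY_MAP[category], sem_role)

  print(align_dependent("object", "NP", "food"))
  print(align_dependent("dependent", "PP", "cooking implement"))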

-- MarkMcConville - 05 Aug 2008
