Harvesting verb entries from FrameNet

The first step involved harvesting the basic verb entries from FrameNet, converting annotations into subcategorisation frames. Recent versions harvest all lexical categories, including nouns and adjectives. There is a TFlex Java package devoted to this -- framenet.lupos. The main() method is in framenet.lupos.LexUnitAnnotation. Important auxiliary classes are in the TFlex packages framenet.ontology (parsing the FrameNet ontology) and framenet.tflex (parsing and processing TFlex lexicon files).
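As a rough illustration of what this conversion step does, here is a minimal sketch. The Constituent record and subcatFrame method are invented for illustration; they do not reflect the actual framenet.lupos API, whose classes are not detailed in this note.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: collapsing one FrameNet annotation set into a
// subcategorisation frame. Names here are illustrative, not TFlex's.
public class Harvest {
    // A labelled constituent from one annotated sentence:
    // phrase type (e.g. "NP", "PP") plus grammatical function (e.g. "Ext", "Obj").
    record Constituent(String phraseType, String gramFunc) {}

    // Concatenate the labelled constituents into a frame string,
    // e.g. [NP.Ext, NP.Obj, PP.Dep] -> "NP.Ext NP.Obj PP.Dep".
    static String subcatFrame(List<Constituent> layers) {
        return layers.stream()
                     .map(c -> c.phraseType() + "." + c.gramFunc())
                     .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        List<Constituent> sent = List.of(
            new Constituent("NP", "Ext"),
            new Constituent("NP", "Obj"),
            new Constituent("PP", "Dep"));
        System.out.println(subcatFrame(sent)); // prints "NP.Ext NP.Obj PP.Dep"
    }
}
```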

Initial filtering

Initial filtering involved removing non-canonical subcategorisation frames --- subjectless frames, frames including modifiers, and passive and passive-related frames --- as well as identifying and dealing with control. We evaluated the effectiveness of equating non-Core FrameNet dependents with syntactic modifiers for the LREC paper.
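The filtering criteria above can be sketched as a simple predicate. This is a hypothetical reconstruction, assuming frames are represented as lists of "PT.GF" argument strings with a passive flag; the real TFlex representation and its feature names are not shown in this note.

```java
import java.util.List;

// Hypothetical sketch of the canonicality filter: keep a frame only if it
// has a subject, is not passive, and contains no modifier dependents.
public class Filter {
    // Illustrative frame representation: argument labels plus a passive flag.
    record Frame(List<String> args, boolean passive) {}

    static boolean isCanonical(Frame f) {
        return !f.passive()
            // subjectless frames are removed: require an external argument
            && f.args().stream().anyMatch(a -> a.endsWith(".Ext"))
            // frames including modifiers are removed
            && f.args().stream().noneMatch(a -> a.endsWith(".Mod"));
    }
}
```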

Role inheritance and coreness sets

Role names have been resolved to their most general level. I investigated using coreness sets to reduce the inventory of semantic roles but decided against using this in the harvested lexicon, apart from path arguments. This work was evaluated for the LAW paper, from the perspective of how great a reduction in the vocabulary of semantic roles we attain.
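Resolving a role name to its most general level amounts to following frame-element inheritance links upwards until no parent remains. The sketch below assumes the hierarchy is available as a child-to-parent map and is acyclic; the parent map is invented for illustration, whereas the real hierarchy comes from the FrameNet ontology parsed by framenet.ontology.

```java
import java.util.Map;

// Hypothetical sketch of role-name generalisation over an inheritance map.
public class Roles {
    // Follow inheritance links upwards until a role has no parent.
    // Assumes the hierarchy is acyclic.
    static String mostGeneral(String role, Map<String, String> parent) {
        while (parent.containsKey(role)) {
            role = parent.get(role);
        }
        return role;
    }
}
```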

Public release

The entire pipeline up to this point is appropriate for public release, since it basically provides a syntactically cleaned-up version of the FrameNet lexicon. I was having problems getting the ANT file properly configured, but I know someone who can help me debug it.

Syntactic mapping

Syntactic categories were converted from FrameNet format to TRIPS format, including features. One step which needs to be done next is to eliminate all path arguments and add some marker to the verb denoting that it expresses motion over a trajectory, thereby licensing path PP modifiers in the syntax. I'm not exactly sure how TRIPS does this.
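The category conversion is essentially a table lookup. The following is a minimal sketch; the TRIPS-side strings are placeholders invented for illustration and should not be read as the actual TRIPS category or feature inventory.

```java
import java.util.Map;

// Hypothetical sketch of the FrameNet -> TRIPS category conversion table.
// The right-hand-side strings are illustrative placeholders only.
public class CatMap {
    static final Map<String, String> CATS = Map.of(
        "NP",   "np",
        "PP",   "pp",
        "AJP",  "adjp",
        "AVP",  "advp");

    // Look up the TRIPS-format category for a FrameNet phrase type,
    // failing loudly on anything unmapped so gaps in the table surface early.
    static String toTrips(String fnCat) {
        String t = CATS.get(fnCat);
        if (t == null) throw new IllegalArgumentException("no mapping: " + fnCat);
        return t;
    }
}
```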

Prepositions

The harvested lexicon contains a number of PP arguments in subcategorisation frames, even after non-Core dependents have been eliminated. In order to harmonise these with the format required for the TRIPS lexicon, a number of steps are necessary:

  • I added a 'pform1' attribute to each PP argument, representing the first word in the PP
  • I also added a 'pform2' attribute to each PP argument, representing the first word in the PP which has a recognised prepositional POS tag (e.g. IN in the PTB vocabulary)
  • I did an informal evaluation of the first method of getting PFORM features (i.e. first word in PP). Accuracy appeared to be about 90%. I have yet to evaluate the second method (using POS tags). A third method also springs to mind: use the first word, but ignore a small set of well-known preposition intensifiers (straight/right/directly, etc.)
  • In order to check inter-annotator agreement on the task of partitioning preposition occurrences into 'meaningful' and 'non-meaningful' classes, I extracted a small, unbiased sample of 100 preposition occurrences from the harvested lexicon. Both MD and MM annotated this. It lives in doc/TflexReports/fn2tflx-pforms-sample.xml for future reference.
  • there is a procedure for identifying frames which have both transitive and prepositional verbs (e.g. trust X vs rely on X). The idea was maybe to use this as a criterion for non-meaningfulness. We never got round to evaluating this, but it has become clear from my other work on prepositions that it is not a valid criterion, cf. go around vs. circumnavigate. Currently work is underway on a set of alternative criteria we can use, along with an evaluation.
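The three PFORM heuristics described above can be sketched as follows. The token representation (word plus PTB-style POS tag) is invented for illustration; the actual TFlex code for this is not shown in this note.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch of the three PFORM extraction heuristics.
public class Pform {
    record Token(String word, String pos) {}

    // Method 1 (pform1): simply the first word of the PP.
    static String pform1(List<Token> pp) {
        return pp.get(0).word();
    }

    // Method 2 (pform2): the first word bearing a recognised prepositional
    // POS tag (PTB "IN", plus "TO" for infinitival/prepositional "to").
    static String pform2(List<Token> pp) {
        return pp.stream()
                 .filter(t -> t.pos().equals("IN") || t.pos().equals("TO"))
                 .findFirst()
                 .map(Token::word)
                 .orElse(null);
    }

    // Method 3: first word, skipping well-known preposition intensifiers.
    static final Set<String> INTENSIFIERS = Set.of("straight", "right", "directly");
    static String pform3(List<Token> pp) {
        return pp.stream()
                 .map(Token::word)
                 .filter(w -> !INTENSIFIERS.contains(w.toLowerCase()))
                 .findFirst()
                 .orElse(null);
    }
}
```

For a PP like "right out of the box", method 1 wrongly yields "right", while methods 2 and 3 both yield "out" -- which is the kind of case motivating the second and third heuristics.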

Semantic mapping

The next step involves converting the FrameNet frames and roles in each lexical entry into TRIPS LF types and LF roles. Currently:

  • there is a mappings file from frames to LF types, presenting all of the relevant intersections and complements
  • there is a nice Java GUI for viewing and editing this mappings file (this is contained within the TFlex framenet.mappings package)
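At its simplest, consuming the mappings file amounts to a frame-to-LF-type lookup, which can be sketched as below. The mapping entries and the "ont::" type names are invented for illustration, and the real file format used by the framenet.mappings package (including its intersections and complements) is not shown here.

```java
import java.util.Map;

// Hypothetical sketch of a frame -> LF-type lookup. Entries are illustrative;
// the real data comes from the mappings file edited via framenet.mappings.
public class LfMap {
    static final Map<String, String> FRAME_TO_LF = Map.of(
        "Motion", "ont::move",
        "Giving", "ont::giving");

    // Fall back to a generic root type for frames not yet mapped.
    static String lfType(String frame) {
        return FRAME_TO_LF.getOrDefault(frame, "ont::situation-root");
    }
}
```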

Syntactic dependencies

Back in December last year we kicked off some work on:

  • developing a system of deep grammatical relations which can be used as the output of a standard dependency parser and as the input for a TRIPS-like reasoning system
  • annotating a couple of BEETLE dialogues with these relations, computing IAA figures, and then evaluating RASP and C&C based on how close they get to our desired representation. This work culminated in our COLING paper, where we focused on the bits that we really agreed on, and agreed to defer details like copulas, apposition, etc. [Files are in src/xml/beetleDialogues]
This is one aspect of the work which it would be really interesting to progress, since it is recognised as an important topic in a range of areas. Parsing people are devoting more and more effort to dependency-based approaches (since they can be evaluated in a more neutral fashion), and information extraction people are starting to look at using deeper features to train their systems for relation and fact extraction. The original idea we had of trying to take an off-the-shelf dependency parser, systematically tweak its output to give deeper relations, and then feed it into the TRIPS reasoner is still a good one, I think.

-- MarkMcConville - 26 Sep 2008

Topic revision: r4 - 30 Sep 2008 - 11:17:14 - MarkMcConville
 