
Shallow versus deep syntactic dependencies for relation extraction tasks

The aim of this project is to evaluate three systems of syntactic dependency representation for English, differing in terms of the relative "depth" of the underlying linguistic analyses. The evaluation will measure how effective each representation format is as input to typical relation extraction tasks.

Syntactic dependency representation systems

Recent years have seen growing research interest in syntactic parsers which output, not labelled bracketings corresponding to syntactic phrase structure trees, but rather sets of labelled dependencies between heads and dependents. It has been argued that parser output based on syntactic dependencies is a better option for two main reasons: (a) this format is more theory-neutral, allowing a more level playing field for parser evaluation; and (b) syntactic dependencies are more appropriate for information extraction tasks than labelled bracketings, since they are closer to the underlying predicate-argument structure.

A number of different systems of syntactic dependency representation have been proposed for English, which can be seen as varying according to the "depth" of the linguistic analyses they presuppose. The most basic, "surface-y" systems (e.g. the Link parser, Minipar, the CoNLL shared tasks 2006-2008) assume that the syntactic representation of a sentence constitutes a TREE - in other words, EVERY word (apart from the one which functions as the "root") is a dependent of exactly ONE other word. More sophisticated systems (e.g. Stanford typed dependencies, RASP grammatical relations) allow for "reentrant" structures, where a word may simultaneously be a dependent of two distinct heads, permitting a better analysis of phenomena like control, relativisation and coordination. McConville and Dzikovska (2008) argue in favour of taking this trend to its logical conclusion - a system of "deep" syntactic dependencies, involving full normalisation of well-known syntactic alternations such as passive and dative shift, and of the distinctions between meaningful and non-meaningful prepositions and between predicative and attributive adjectives.

In this project, three distinct syntactic dependency representation systems will be evaluated:

  • CoNLL dependencies (unordered trees)
  • Stanford typed dependencies (limited reentrancy/normalisation)
  • Deep syntactic dependencies (full reentrancy/normalisation)
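To make the contrast concrete, here is a sketch in Python of how the same passive sentence might be encoded at the three depths, as sets of (head, relation, dependent) triples. The relation labels are simplified assumptions for illustration, not the official CoNLL, Stanford or deep label inventories; the helper at the end tests the defining TREE property.

```python
# The sentence "The cake was eaten by Mary." at three depths of analysis,
# each encoded as a set of (head, relation, dependent) triples.
# NB: relation labels here are simplified assumptions, not official inventories.

# 1. Tree-style (CoNLL-like): every word except the root has exactly one head;
#    the auxiliary and the preposition appear as surface attachments.
conll = {
    ("eaten", "SBJ", "cake"),
    ("cake", "NMOD", "The"),
    ("eaten", "VC", "was"),
    ("eaten", "ADV", "by"),
    ("by", "PMOD", "Mary"),
}

# 2. Stanford-like (limited normalisation): the passive subject and agent are
#    marked as such, and the non-meaningful preposition "by" is collapsed away.
stanford = {
    ("eaten", "nsubjpass", "cake"),
    ("cake", "det", "The"),
    ("eaten", "auxpass", "was"),
    ("eaten", "agent", "Mary"),
}

# 3. Deep dependencies (full normalisation): the passive is normalised, so the
#    triples match the active "Mary ate the cake" - "Mary" is the logical subject.
deep = {
    ("eaten", "subj", "Mary"),
    ("eaten", "obj", "cake"),
    ("cake", "det", "The"),
}

# Reentrancy under control: in "Mary promised to leave", a deep analysis makes
# "Mary" the subject of both "promised" and "leave" - the graph is not a tree.
deep_control = {
    ("promised", "subj", "Mary"),
    ("promised", "xcomp", "leave"),
    ("leave", "subj", "Mary"),
}

def is_tree(deps):
    """True iff every dependent in the graph has exactly one head."""
    heads = {}
    for head, _rel, dep in deps:
        heads.setdefault(dep, set()).add(head)
    return all(len(hs) == 1 for hs in heads.values())
```

The tree-style encoding satisfies the one-head-per-word constraint, while the control example does not, which is exactly the structural difference the three systems trade on.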

Relation extraction tasks

[Short introduction to relation extraction]

Three distinct relation extraction tasks will be undertaken, from three contrasting domains:

  • biomedical - protein-protein interactions and tissue expressions in the ITI TXM corpora
  • educational - some relation extraction task using the Beetle corpus?
  • cultural heritage - some relation extraction task using Kate Byrne's corpus?

Workplan

Given some corpus C which has already been annotated with respect to named entities and relations:

  1. annotate C for CoNLL dependencies, Stanford typed dependencies and deep syntactic dependencies
  2. train three relation extractors on C's training set, using each of the syntactic dependency annotations as features
  3. determine the accuracy of each of the three relation extractors on C's test set
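As an illustration of step 2, one standard way to turn a dependency annotation into a relation-extraction feature is the shortest dependency path between the two entity mentions. The sketch below shows such a path extracted by breadth-first search over (head, relation, dependent) triples; the example sentence and labels are hypothetical and not drawn from any of the corpora above.

```python
from collections import deque

def dependency_path(deps, start, end):
    """Shortest undirected path between two words in a dependency graph,
    returned as a list of words - a standard relation-extraction feature."""
    # Build an undirected adjacency map over the (head, relation, dependent)
    # triples, ignoring relation labels for simplicity.
    adj = {}
    for head, _rel, dep in deps:
        adj.setdefault(head, set()).add(dep)
        adj.setdefault(dep, set()).add(head)
    # Breadth-first search from start towards end.
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == end:
            return path
        for nxt in adj.get(path[-1], ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

# Toy dependency graph for the hypothetical sentence "MDM2 inhibits p53":
deps = {
    ("inhibits", "subj", "MDM2"),
    ("inhibits", "obj", "p53"),
}
print(dependency_path(deps, "MDM2", "p53"))  # ['MDM2', 'inhibits', 'p53']
```

The intuition behind the evaluation is that deeper, normalised representations should yield shorter and more uniform paths between entity pairs (e.g. the same path for active and passive variants), which in turn should make the extractor's features more informative.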

References

Mark McConville and Myroslava O. Dzikovska (2008). 'Deep' Grammatical Relations for Semantic Interpretation. In: Proceedings of the COLING'08 Workshop on Cross-Framework and Cross-Domain Parser Evaluation.

-- MarkMcConville - 12 Aug 2008

Topic revision: r4 - 14 Aug 2008 - 14:28:41 - MarkMcConville