The Stanford typed dependencies representation

Marie-Catherine de Marneffe and Christopher D. Manning (2008). "The Stanford typed dependencies representation." Proceedings of the COLING'08 Workshop on Cross-Framework and Cross-Domain Parser Evaluation, Manchester, England.

Desiderata for a system of grammatical relations:

  1. the system should concentrate just on the grammatical relations required for PRACTICAL information extraction tasks, providing SEMANTICALLY CONTENTFUL information
  2. the system should be SIMPLE enough to be understood and used by people without linguistic expertise who want to extract textual relations (e.g. biologists, lawyers, market analysts); excessive detail is viewed as a defect, detracting from uptake and usability
  3. there should be an automatic procedure for EXTRACTING the relations from PTB-style phrase structure parser output

Dependency representations are more intuitive for non-experts than PTB-style phrase structure representations: the "widespread use of MiniPar and the Link Parser ... clearly shows that ... it is very easy for a non-linguist thinking in relation extraction terms to see how to make use of a dependency representation (whereas a phrase structure representation seems much more foreign and forbidding)".

Detailed design principles:

  1. each datum is uniformly represented as some BINARY RELATION between two sentence words; this representation maps straightforwardly to common representations of potential users like RDF triples or directed graphs
  2. relations should be SEMANTICALLY CONTENTFUL and USEFUL to applications; less commonly used details like tense and number should be ignored; the argument/adjunct distinction, which is 'largely useless in practice', should be ignored; there should be a detailed ontology of NP-internal relations ('an inherent part of corpus texts and critical in real-world applications'), distinguishing between different types of modifiers (e.g. numbers, appositives, attributive adjectives etc.)
  3. where possible, relations should use notions of TRADITIONAL GRAMMAR for easier comprehension by users
  4. UNDERSPECIFIED relations should be available to deal with the complexities of real text (i.e. relations should be organised into a type hierarchy)
  5. where possible, relations should be between CONTENT WORDS, not indirectly mediated via function words; prepositions and conjunctions should be 'collapsed out' of the representation (e.g. converted into relations); typically, content words should be heads, with complementisers being dependents of them (relations between content words are key to extracting the 'gist of the sentence semantics', and it is important for applications to be able to retrieve them easily)
  6. the representation should be SPARTAN rather than overwhelming with linguistic details
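
Principles 1 and 5 can be illustrated with a minimal Python sketch. It stores each dependency as a (relation, head, dependent) triple and folds a preposition into the relation name so that the two content words are linked directly. The `Dep` type and the `collapse_prepositions` function are hypothetical names introduced here, not part of any Stanford tool, and words are identified by their surface form only (a real converter would also carry token positions and handle conjunctions).

```python
from typing import NamedTuple


class Dep(NamedTuple):
    """One typed dependency: a binary relation between two sentence words."""
    rel: str
    head: str
    dependent: str


def collapse_prepositions(deps):
    """Replace prep(X, P) + pobj(P, Y) pairs with a single prep_P(X, Y) triple."""
    # Map each preposition to the objects it governs.
    objects = {}
    for d in deps:
        if d.rel == "pobj":
            objects.setdefault(d.head, []).append(d.dependent)
    # Prepositions that are themselves attached to a head via "prep".
    attached = {d.dependent for d in deps if d.rel == "prep"}

    collapsed = []
    for d in deps:
        if d.rel == "prep" and d.dependent in objects:
            for obj in objects[d.dependent]:
                collapsed.append(Dep("prep_" + d.dependent.lower(), d.head, obj))
        elif d.rel == "pobj" and d.head in attached:
            continue  # already folded into a prep_* relation
        else:
            collapsed.append(d)
    return collapsed


# Hand-written triples for "Bell, based in Los Angeles, makes ... products"
# (illustrative input, not actual parser output).
basic = [
    Dep("nsubj", "makes", "Bell"),
    Dep("partmod", "Bell", "based"),
    Dep("prep", "based", "in"),
    Dep("pobj", "in", "Angeles"),
    Dep("nn", "Angeles", "Los"),
    Dep("dobj", "makes", "products"),
]
collapsed = collapse_prepositions(basic)
for d in collapsed:
    print(f"{d.rel}({d.head}, {d.dependent})")
```

Because every datum is a plain triple, the same list could be loaded into a graph library or serialised as RDF without further restructuring.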

The Stanford dependencies (SD) representation:

  • based on LFG-style systems like RASP or PARC DepBank
  • 56 relations, organised into a type hierarchy
  • prepositions and conjunctions are 'collapsed out' (at least in the simplified form of the representation), occasionally sacrificing 'linguistic fidelity'
  • there is a (limited) tool to extract grammatical relations (GRs) from PTB trees, but it does not handle long-distance dependencies
  • has proved effective in: the PASCAL Recognising Textual Entailment (RTE) challenges; bioinformatic text mining (extracting relations between genes and proteins from text); sentiment analysis; biomedical domain parser evaluation (e.g. the BioInfer corpus)
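
The type hierarchy, and the way it supports underspecified relations, can be sketched as a child-to-parent map. The fragment below covers only a handful of relation names and is an assumed simplification, not the full SD inventory; the helper names `ancestors`, `is_a` and `common_relation` are hypothetical.

```python
# Child -> parent links for a small, illustrative fragment of the hierarchy.
# "dep" is the most general (fully underspecified) relation.
PARENT = {
    "arg": "dep",
    "mod": "dep",
    "subj": "arg",
    "comp": "arg",
    "nsubj": "subj",
    "csubj": "subj",
    "obj": "comp",
    "dobj": "obj",
    "iobj": "obj",
    "pobj": "obj",
    "amod": "mod",
    "appos": "mod",
    "num": "mod",
}


def ancestors(rel):
    """Return rel followed by its successively more general relations."""
    chain = [rel]
    while rel in PARENT:
        rel = PARENT[rel]
        chain.append(rel)
    return chain


def is_a(rel, general):
    """True if rel equals general or is a more specific kind of it."""
    return general in ancestors(rel)


def common_relation(a, b):
    """Most specific relation that covers both a and b, or None."""
    covers_b = set(ancestors(b))
    for rel in ancestors(a):
        if rel in covers_b:
            return rel
    return None


print(ancestors("dobj"))
print(common_relation("dobj", "nsubj"))
```

An application that cannot decide between two specific labels can back off to their common ancestor rather than guess, which is the role underspecified relations play in principle 4.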

Although we believe that 'extrinsic', task-based evaluation of parsers is more valuable than any kind of 'intrinsic' evaluation, the fact that the Stanford typed dependency representation has proved useful in information extraction tasks means that using it as the basis for intrinsic evaluation is a 'useful surrogate'.
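
As a rough illustration of what intrinsic evaluation over such a representation could look like, the sketch below scores a predicted set of dependency triples against a gold set using labelled precision, recall and F1. This is a generic scoring scheme assumed for illustration, not the evaluation procedure described in the paper; the `score` function and the example triples are hypothetical.

```python
def score(gold, predicted):
    """Labelled precision, recall and F1 over (relation, head, dependent) triples."""
    gold, predicted = set(gold), set(predicted)
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Illustrative, hand-written triples (not real parser output).
gold = {
    ("nsubj", "makes", "Bell"),
    ("dobj", "makes", "products"),
    ("prep_in", "based", "Angeles"),
    ("nn", "Angeles", "Los"),
}
predicted = {
    ("nsubj", "makes", "Bell"),
    ("dobj", "makes", "products"),
    ("prep_in", "makes", "Angeles"),  # wrong attachment
}
precision, recall, f1 = score(gold, predicted)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```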

-- MarkMcConville - 21 Jul 2008

Topic revision: r5 - 24 Jul 2008 - 15:10:15 - MarkMcConville