
The Stanford typed dependencies representation

Marie-Catherine de Marneffe and Christopher D. Manning (2008). "The Stanford typed dependencies representation." Proceedings of the COLING'08 Workshop on Cross-Framework and Cross-Domain Parser Evaluation, Manchester, England.

Desiderata for a system of grammatical relations:

  1. the system should concentrate just on the grammatical relations required for PRACTICAL information extraction tasks, providing SEMANTICALLY CONTENTFUL information
  2. the system should be SIMPLE enough to be understood and used by people without linguistic expertise who want to extract textual relations (e.g. biologists, lawyers, market analysts); excessive detail is viewed as a defect, detracting from uptake and usability
  3. there should be an automatic procedure for EXTRACTING the relations from PTB-style phrase structure parser output

Dependency representations are more intuitive for non-experts than PTB-style phrase structure: the 'widespread use of MiniPar and the Link Parser ... clearly shows that ... it is very easy for a non-linguist thinking in relation extraction terms to see how to make use of a dependency representation (whereas a phrase structure representation seems much more foreign and forbidding)'.

Detailed design principles:

  1. each datum is uniformly represented as some BINARY RELATION between two sentence words; this representation maps straightforwardly to common representations of potential users like RDF triples or directed graphs
  2. relations should be SEMANTICALLY CONTENTFUL and USEFUL to applications; less commonly used details like tense and number should be ignored; the argument/adjunct distinction, which is 'largely useless in practice', should be ignored [MM: but SD distinguishes 'mod' and 'comp']; there should be a detailed ontology of NP-internal relations ('an inherent part of corpus texts and critical in real-world applications'), distinguishing between different types of modifiers (e.g. numbers, appositives, attributive adjectives etc.)
  3. where possible, relations should use notions of TRADITIONAL GRAMMAR for easier comprehension by users
  4. UNDERSPECIFIED relations should be available to deal with the complexities of real text (i.e. relations should be organised into a type hierarchy)
  5. where possible, relations should be between CONTENT WORDS, not indirectly mediated via function words; prepositions and conjunctions should be 'collapsed out' of the representation (e.g. converted into relations); typically, content words should be heads, with complementisers being dependents of them (relations between content words are key to extracting the 'gist of the sentence semantics', and it is important for applications to be able to retrieve them easily)
  6. the representation should be SPARTAN rather than overwhelming with linguistic details
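
Principle 1 can be made concrete with a minimal sketch. The `Dep` type, the helper names and the example dependencies below are illustrative assumptions, not part of the SD tool; a real implementation would also index words by sentence position rather than by string.

```python
from collections import defaultdict
from typing import NamedTuple


class Dep(NamedTuple):
    """One binary relation between two sentence words (hypothetical type)."""
    rel: str        # relation name, e.g. "nsubj"
    head: str       # governing word
    dependent: str  # dependent word


# Illustrative uncollapsed dependencies for "Bill went through the woods".
deps = [
    Dep("nsubj", "went", "Bill"),
    Dep("prep", "went", "through"),
    Dep("pobj", "through", "woods"),
    Dep("det", "woods", "the"),
]


def to_graph(deps):
    """View the relations as a directed graph: head -> [(rel, dependent)]."""
    graph = defaultdict(list)
    for d in deps:
        graph[d.head].append((d.rel, d.dependent))
    return dict(graph)


def to_triples(deps):
    """View the relations as subject-predicate-object triples (RDF-like)."""
    return [(d.head, d.rel, d.dependent) for d in deps]


graph = to_graph(deps)
triples = to_triples(deps)
```

Both views are derived from the same flat list, which is the point of the principle: no tree traversal is needed to get at a relation.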

The Stanford dependencies (SD) representation:

  • based on LFG-style systems like Carroll's GR and the PARC DepBank format
  • 56 relations, organised into a type hierarchy
  • prepositions and conjuncts are 'collapsed out' (at least in the simplified form of the representation), occasionally sacrificing 'linguistic fidelity'
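
A type hierarchy lets a consumer back off from a specific relation to a more general one. The sketch below encodes only a small fragment, and its exact shape is an assumption made for illustration rather than the full 56-relation inventory.

```python
# Fragment of a relation hierarchy, stored as child -> parent.
# The selection of relations here is illustrative, not the complete SD set.
PARENT = {
    "arg": "dep",
    "mod": "dep",
    "subj": "arg",
    "comp": "arg",
    "nsubj": "subj",
    "csubj": "subj",
    "obj": "comp",
    "dobj": "obj",
    "iobj": "obj",
    "amod": "mod",
    "appos": "mod",
}


def ancestors(rel):
    """Return rel followed by its increasingly general supertypes."""
    chain = [rel]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def subsumes(general, specific):
    """True if `general` is `specific` itself or one of its supertypes."""
    return general in ancestors(specific)
```

With this, a parser that cannot decide between `dobj` and `iobj` can emit the underspecified `obj`, and a matcher can still treat it as compatible with either.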

Conjuncts are also collapsed (unlike in GR or PARC) - 'makes electronic, computer and building products' = dobj(makes, products), amod(products,electronic), amod(products,computer), amod(products,building), conj_and(electronic,computer), conj_and(electronic,building). [but: what about collective readings of conjoined NPs?]
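
The collapsing step for this example can be sketched as follows. The uncollapsed input (with `conj` and `cc` relations) and the function name are assumptions made for illustration; the propagation rule is a simplification that copies the first conjunct's incoming relation to the other conjuncts.

```python
def collapse_conjunctions(deps):
    """Collapse cc/conj pairs into conj_<word> and propagate shared relations.

    deps is a list of (rel, head, dependent) tuples over word strings.
    """
    # Coordinating word attached to each first conjunct, e.g. electronic -> and.
    cc = {head: dep for rel, head, dep in deps if rel == "cc"}

    out = []
    for rel, head, dep in deps:
        if rel == "cc":
            continue  # the coordinator is folded into the relation name
        if rel == "conj":
            name = "conj_" + cc[head] if head in cc else "conj"
            out.append((name, head, dep))
        else:
            out.append((rel, head, dep))

    # Each later conjunct inherits the relation that points at the first one.
    incoming = [d for d in deps if d[0] not in ("cc", "conj")]
    for rel, first, other in deps:
        if rel != "conj":
            continue
        for r, h, d in incoming:
            if d == first:
                out.append((r, h, other))
    return out


# Assumed uncollapsed analysis of "makes electronic, computer and building products".
basic = [
    ("dobj", "makes", "products"),
    ("amod", "products", "electronic"),
    ("conj", "electronic", "computer"),
    ("cc", "electronic", "and"),
    ("conj", "electronic", "building"),
]
collapsed = collapse_conjunctions(basic)
```

The result contains exactly the six relations listed above. Because every conjunct receives its own copy of the shared relation, the output is inherently distributive, which is why the collective reading raised in the bracketed note has no representation.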

Treating prepositions as relations (though "useful for 98% of users 98% of the time") sacrifices 'linguistic fidelity' for usability. e.g. modifiers of prepositions have to be expressed with respect to the verb rather than the preposition (e.g. Bill went RIGHT through the woods). Also, PP conjunction is problematic and must be treated as VP coordination - 'Bill went over the river and right through the woods' = "Bill went over the river and went right through the woods" - 'Not collapsing the relations in such a way would prevent the alteration of the semantics, but would lead to a non-uniform treatment of prepositions. Uniformity is key for readability and user convenience'. [but: these prepositions are meaningful and hence are allowed a different treatment!].
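
A minimal sketch of preposition collapsing shows where the fidelity is lost. The input analysis (in particular `advmod(through, right)`) and the function name are assumptions made for illustration, not the tool's actual rules.

```python
def collapse_prepositions(deps):
    """Turn prep(h, p) + pobj(p, o) into prep_<p>(h, o).

    deps is a list of (rel, head, dependent) tuples over word strings.
    Any other dependent of the preposition is re-attached to h, because
    the preposition no longer exists as a node after collapsing.
    """
    pobj = {head: dep for rel, head, dep in deps if rel == "pobj"}
    governor = {dep: head for rel, head, dep in deps
                if rel == "prep" and dep in pobj}

    out = []
    for rel, head, dep in deps:
        if rel == "prep" and dep in pobj:
            out.append(("prep_" + dep, head, pobj[dep]))
        elif rel == "pobj" and head in governor:
            continue  # absorbed into the collapsed relation
        elif head in governor:
            out.append((rel, governor[head], dep))  # modifier moves to the verb
        else:
            out.append((rel, head, dep))
    return out


# Assumed uncollapsed analysis of "Bill went right through the woods".
basic = [
    ("nsubj", "went", "Bill"),
    ("prep", "went", "through"),
    ("advmod", "through", "right"),
    ("pobj", "through", "woods"),
    ("det", "woods", "the"),
]
collapsed = collapse_prepositions(basic)
```

After collapsing, 'right' is recorded as a modifier of 'went', since 'through' survives only as part of the relation name `prep_through`.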

The formalism and the tool

The Stanford parser includes a tool to extract GRs from (PTB) phrase structure trees, with structural relations used to define GRs.

But the tool is limited, e.g. it cannot handle long-distance dependencies.
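
The general idea of defining GRs by structural relations over a phrase structure tree can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the tuple tree encoding, the three head rules and the two patterns are far simpler than, and not taken from, the actual Stanford tool.

```python
# A tree is (label, child, ...); a leaf is (POS, word).
def is_leaf(tree):
    return isinstance(tree[1], str)


# Toy head rules: the label prefix of the child that supplies the head word.
HEAD_PREFIX = {"S": "VP", "VP": "VB", "NP": "NN"}


def head_word(tree):
    """Find the lexical head of a constituent using the toy head rules."""
    if is_leaf(tree):
        return tree[1]
    label, children = tree[0], tree[1:]
    prefix = HEAD_PREFIX.get(label)
    matches = [c for c in children if prefix and c[0].startswith(prefix)]
    if matches:
        # Noun phrases are treated as head-final, other phrases as head-initial.
        return head_word(matches[-1] if label == "NP" else matches[0])
    return head_word(children[-1])


def extract(tree, out=None):
    """Collect (rel, head, dependent) triples from two structural patterns."""
    out = [] if out is None else out
    if is_leaf(tree):
        return out
    label, children = tree[0], tree[1:]
    if label == "S":  # pattern: NP and VP as sisters under S -> nsubj
        nps = [c for c in children if c[0] == "NP"]
        vps = [c for c in children if c[0] == "VP"]
        if nps and vps:
            out.append(("nsubj", head_word(vps[0]), head_word(nps[0])))
    if label == "VP":  # pattern: NP directly under VP -> dobj
        for c in children:
            if c[0] == "NP":
                out.append(("dobj", head_word(tree), head_word(c)))
    for c in children:
        extract(c, out)
    return out


tree = ("S",
        ("NP", ("NNP", "Bell")),
        ("VP", ("VBZ", "makes"),
               ("NP", ("JJ", "electronic"), ("NNS", "products"))))
relations = extract(tree)
```

Purely local patterns like these only see a node and its immediate children, which gives some intuition for why long-distance dependencies are out of reach for this style of extraction.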

Stanford dependencies in practice

The PASCAL Recognising Textual Entailment (RTE) challenges - the number of entries using SD is increasing.

Bioinformatics - using SD as an output representation appears to deliver state-of-the-art results in extracting relations (e.g. between genes and proteins) from text.

SD is also used to evaluate parsers for the biomedical domain (e.g. the BioInfer corpus).

SD has also been used in other information extraction tasks and in sentiment analysis.

Suitability for parser evaluation

Task-based evaluation is more important than other types of evaluation. Our simple scheme reflects only relations important for NLP tasks. Therefore our scheme approximates a task-based evaluation scheme and hence is best.

Intrinsic evaluation

Extrinsic (task-based) evaluation

Extrinsic evaluation is more valuable than intrinsic evaluation. However, because the SD representation is of more practical use than phrase structure representations, intrinsic evaluation over SD GRs appears to be close to typical user tasks and is thus a useful surrogate.
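
One common way to run an intrinsic evaluation over GRs is to score the parser's relation triples against a gold standard with precision, recall and F1. The sketch below assumes exact matching of labelled triples; the function name and the data are made up for illustration, and the paper's own evaluation setup may differ.

```python
def precision_recall_f1(gold, predicted):
    """Score predicted (rel, head, dependent) triples against gold triples."""
    gold, predicted = set(gold), set(predicted)
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


gold = [
    ("nsubj", "makes", "Bell"),
    ("dobj", "makes", "products"),
    ("amod", "products", "electronic"),
    ("amod", "products", "computer"),
]
predicted = [
    ("nsubj", "makes", "Bell"),
    ("dobj", "makes", "products"),
    ("amod", "products", "electronic"),
    ("dobj", "makes", "computer"),  # wrong attachment and label
]
p, r, f = precision_recall_f1(gold, predicted)
```

Here three of the four predicted triples are correct and three of the four gold triples are recovered, so precision, recall and F1 all come out at 0.75.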

Criticisms:

  • no distinction between meaningful and non-meaningful prepositions
  • no distinction between collective and distributive NP conjunctions
  • POS and GR information fused together
  • better treatment of NP-internal GRs is conceivable - why is 'appos' useful, as opposed to a basic distinction between restrictive (i.e. cyclic) and non-restrictive modifiers?

-- MarkMcConville - 21 Jul 2008

Topic revision: r4 - 22 Jul 2008 - 09:53:14 - MarkMcConville
 