The Stanford typed dependencies representation

Marie-Catherine de Marneffe and Christopher D. Manning (2008). "The Stanford typed dependencies representation." Proceedings of the COLING'08 Workshop on Cross-Framework and Cross-Domain Parser Evaluation, Manchester, England.

Desiderata for a system of grammatical relations:

  1. the system should concentrate on the grammatical relations required for PRACTICAL information extraction tasks
  2. the system should be SIMPLE enough to be understood and used by people without linguistic expertise who want to extract textual relations (e.g. biologists, lawyers, market analysts) [cf. TFlex]
  3. there should be an automatic procedure for EXTRACTING the relations from PTB-style phrase structure parser output

  • provide semantically contentful information

[i.e. simplicity is more important than expressivity]

The 'widespread use of MiniPar and the Link Parser . . . clearly shows that . . . it is very easy for a non-linguist thinking in relation extraction terms to see how to make use of a dependency representation (whereas a phrase structure representation seems much more foreign and forbidding)'.

Simplicity:

  • all information is represented as binary relations (maps straightforwardly to common representations of potential users, e.g. RDF triples, DAGs)
  • excessive detail is viewed as a defect, detracting from uptake and usability
  • favours relations between content words, ignoring 'less used' features such as tense and agreement
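
To illustrate the first point, here is a minimal sketch of how binary relations can be stored as triples and as a labelled graph. The `Dep` type and `to_triple` helper are hypothetical names introduced for illustration, and the head-relation-dependent ordering of the triple is an assumption, not something fixed by the paper.

```python
from typing import NamedTuple


class Dep(NamedTuple):
    """One binary grammatical relation: rel(head, dependent)."""
    rel: str
    head: str
    dependent: str


def to_triple(dep):
    # Assumed mapping to a (subject, predicate, object) triple:
    # head as subject, relation name as predicate, dependent as object.
    return (dep.head, dep.rel, dep.dependent)


deps = [
    Dep("dobj", "makes", "products"),
    Dep("amod", "products", "electronic"),
]

triples = [to_triple(d) for d in deps]

# The same information as a labelled directed graph (adjacency lists).
graph = {}
for head, rel, dependent in triples:
    graph.setdefault(head, []).append((rel, dependent))
```

Because every relation has exactly two arguments, no extra structure is needed to move between the list, triple, and graph views.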

Task-based evaluation is more important than other types of evaluation. Our simple scheme reflects only relations important for NLP tasks. Therefore our scheme approximates a task-based evaluation scheme and hence is best.

Design choices and their implications

Starting point - the LFG-style schemes GR and PARC. Aim - a more practical model of sentence representation for relation extraction tasks.

Certain words are 'collapsed out' of the representation (in the simplified SD representation discussed here) - e.g. prepositions are turned into relations.

Six design principles:

  • everything is uniformly represented as some binary relation between two sentence words

  • relations should be semantically contentful and useful to applications

  • where possible, relations should use notions of traditional grammar for easier comprehension by users

  • underspecified relations should be available to deal with the complexities of real text (i.e. relations should be organised into a type hierarchy)

  • where possible, relations should be between content words, not indirectly mediated via function words

  • the representation should be spartan rather than overwhelming with linguistic details

SD:

56 relations, organised into a type hierarchy:

dependent
    aux(iliary)
    arg(ument)
        subj(ect)
        comp(lement)
    mod(ifier)

etc.
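
A rough sketch of how a type hierarchy supports underspecified relations: a query for a general relation matches any more specific relation beneath it. The `PARENT` table is a small, simplified fragment and is an assumption based on the published SD hierarchy rather than on these notes (for brevity `dobj` is placed directly under `comp`); `is_a` and `match` are hypothetical helper names.

```python
# Child -> parent links for a small fragment of the hierarchy (assumed).
PARENT = {
    "aux": "dep", "arg": "dep", "mod": "dep",
    "subj": "arg", "comp": "arg",
    "nsubj": "subj", "dobj": "comp", "amod": "mod",
}


def is_a(rel, ancestor):
    """True if rel equals ancestor or lies below it in the hierarchy."""
    while rel is not None:
        if rel == ancestor:
            return True
        rel = PARENT.get(rel)  # None once we step past the root
    return False


def match(deps, ancestor):
    """Select the (rel, head, dependent) tuples whose relation is_a ancestor."""
    return [d for d in deps if is_a(d[0], ancestor)]


deps = [("dobj", "makes", "products"), ("amod", "products", "electronic")]
arguments = match(deps, "arg")   # only the dobj relation
everything = match(deps, "dep")  # every relation is a kind of dep
```

A parser that cannot decide on a specific label can likewise fall back to a more general one such as `dep`.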

Detailed ontology of NP-internal relations ('an inherent part of corpus texts and critical in real-world applications') - appos(itive modifier), nn (noun compound), num(eric modifier), number (element of compound number), abbrev(iation), amod (adjectival modifier). GR and PARC are less fine-grained in this respect.

Not concerned with the argument/adjunct distinction, which is 'largely useless in practice'. [but SD GRs distinguish 'arg' from 'mod'!]

Content words are heads; auxiliaries, complementisers etc. are dependents of them. Relations between content words are key to extracting the 'gist of the sentence semantics', and it is important for applications to be able to retrieve them easily. In particular, prepositions are converted into relations, e.g. 'a workout of the Suns' = prep_of(workout,Suns). [cf. prepositions are role markers].
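
The preposition collapsing can be sketched as follows. This is a simplified illustration, not the Stanford tool's implementation: it assumes the uncollapsed input uses `prep` and `pobj` relations, and it identifies words by plain strings, which would break if the same word occurred twice in a sentence. `collapse_prepositions` is a hypothetical name.

```python
def collapse_prepositions(deps):
    """Turn prep(head, P) + pobj(P, obj) into prep_P(head, obj).

    deps is a list of (rel, head, dependent) tuples.
    """
    # Map each preposition to its object.
    pobj_of = {head: dep for rel, head, dep in deps if rel == "pobj"}
    # Prepositions that are attached to a head and also have an object.
    collapsible = {dep for rel, head, dep in deps
                   if rel == "prep" and dep in pobj_of}

    collapsed = []
    for rel, head, dep in deps:
        if rel == "prep" and dep in collapsible:
            collapsed.append(("prep_" + dep.lower(), head, pobj_of[dep]))
        elif rel == "pobj" and head in collapsible:
            continue  # absorbed into the prep_P relation above
        else:
            collapsed.append((rel, head, dep))
    return collapsed


basic = [
    ("det", "workout", "a"),
    ("prep", "workout", "of"),
    ("pobj", "of", "Suns"),
    ("det", "Suns", "the"),
]
result = collapse_prepositions(basic)
```

After collapsing, the preposition no longer appears as a node, so `workout` and `Suns` are directly connected.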

Conjuncts are also collapsed (unlike in GR or PARC) - 'makes electronic, computer and building products' = dobj(makes, products), amod(products,electronic), amod(products,computer), amod(products,building), conj_and(electronic,computer), conj_and(electronic,building). [but: what about collective readings of conjoined NPs?]
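
One way to picture how the conjuncts end up sharing a relation is the sketch below: whatever relation the first conjunct bears to its head is copied to each further conjunct. This is an assumption about the general idea, not the Stanford tool's actual procedure, and `propagate_conjuncts` is a hypothetical name. Words are again plain strings.

```python
def propagate_conjuncts(deps):
    """Copy the first conjunct's incoming relations to the other conjuncts.

    deps is a list of (rel, head, dependent) tuples in which coordination
    is already written as conj_X(first_conjunct, other_conjunct).
    """
    out = list(deps)
    for crel, first, other in deps:
        if not crel.startswith("conj_"):
            continue
        for rel, head, dep in deps:
            # Relations in which the first conjunct is the dependent.
            if dep == first and not rel.startswith("conj_"):
                new = (rel, head, other)
                if new not in out:
                    out.append(new)
    return out


before = [
    ("dobj", "makes", "products"),
    ("amod", "products", "electronic"),
    ("conj_and", "electronic", "computer"),
    ("conj_and", "electronic", "building"),
]
after = propagate_conjuncts(before)
```

Note that this copying treats each conjunct distributively, which is exactly where the question about collective readings arises.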

Unlike in PARC, non-binary relations are ignored (e.g. tense, number, person, adjectival degree) - 'this kind of information is often less used in practice'. It 'impedes readability and convenience'.

Treating prepositions as relations (though "useful for 98% of users 98% of the time") sacrifices 'linguistic fidelity' for usability. For example, modifiers of prepositions have to be expressed with respect to the verb rather than the preposition (e.g. Bill went RIGHT through the woods). PP conjunction is also problematic and must be treated as VP coordination: 'Bill went over the river and right through the woods' is analysed as 'Bill went over the river and went right through the woods'. In the authors' words: 'Not collapsing the relations in such a way would prevent the alteration of the semantics, but would lead to a non-uniform treatment of prepositions. Uniformity is key for readability and user convenience'. [but: these prepositions are meaningful and hence are allowed a different treatment!].

The formalism and the tool

The Stanford parser includes a tool to extract GRs from (PTB) phrase structure trees, with structural relations used to define GRs.

But the tool is limited, e.g. it cannot handle long-distance dependencies.
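
As a toy illustration of defining GRs by structural relations over a phrase structure tree, the sketch below reads `nsubj` and `dobj` off tree configurations. Everything here is an assumption made for illustration: the tuple encoding of trees, the drastically simplified head rules, the two patterns, and the example sentence are all hypothetical and are not the Stanford parser's actual rules.

```python
# Phrases are (label, child, child, ...); leaves are (tag, word).

def is_leaf(node):
    return len(node) == 2 and isinstance(node[1], str)


def head_word(node):
    """Very rough head finding (hypothetical rules)."""
    if is_leaf(node):
        return node[1]
    label, children = node[0], node[1:]
    if label == "S":
        matches = [c for c in children if c[0] == "VP"]
    elif label == "VP":
        matches = [c for c in children if c[0].startswith("VB")]
    else:
        matches = [c for c in children if c[0].startswith("NN")]
    if not matches:
        return head_word(children[-1])
    # Verbal projections take the first match, nominal ones the last.
    return head_word(matches[0] if label in ("S", "VP") else matches[-1])


def extract(node, out=None):
    """Collect (rel, head, dependent) tuples from structural patterns."""
    if out is None:
        out = []
    if is_leaf(node):
        return out
    label, children = node[0], node[1:]
    child_labels = [c[0] for c in children]
    for child in children:
        if child[0] == "NP":
            if label == "S" and "VP" in child_labels:
                # NP directly under S with a VP sister: subject.
                out.append(("nsubj", head_word(node), head_word(child)))
            elif label == "VP":
                # NP directly under VP: direct object.
                out.append(("dobj", head_word(node), head_word(child)))
        extract(child, out)
    return out


tree = ("S",
        ("NP", ("NNP", "Bill")),
        ("VP", ("VBD", "saw"),
               ("NP", ("DT", "the"), ("NNS", "woods"))))
relations = extract(tree)
```

Patterns of this purely local kind also make the limitation concrete: a dependency whose two ends are not in a fixed local configuration is not picked up.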

Stanford dependencies in practice

The PASCAL Recognising Textual Entailment (RTE) challenges - the number of entries using SD is increasing.

Bioinformatics - using SD as an output representation appears to deliver state-of-the-art results in extracting relations (e.g. between genes and proteins) from text.

SD is also used to evaluate parsers for the biomedical domain (e.g. the BioInfer corpus).

SD has also been used in other information extraction tasks and in sentiment analysis.

Suitability for parser evaluation

Intrinsic evaluation

Extrinsic (task-based) evaluation

Extrinsic evaluation is more valuable than intrinsic evaluation. However, because the SD representation is of more practical use than phrase structure representations, intrinsic evaluation over SD GRs appears to be close to typical user tasks and is thus a useful surrogate.
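
Intrinsic evaluation over GRs is commonly scored by comparing the set of relations a parser outputs with a gold-standard set. The sketch below assumes precision, recall and F1 over exact-match relation tuples; the notes above do not specify a metric, and `score` and the example data are hypothetical.

```python
def score(gold, predicted):
    """Precision, recall and F1 over sets of (rel, head, dependent) tuples."""
    gold, predicted = set(gold), set(predicted)
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


gold = [
    ("dobj", "makes", "products"),
    ("amod", "products", "electronic"),
    ("amod", "products", "computer"),
    ("amod", "products", "building"),
]
predicted = [
    ("dobj", "makes", "products"),
    ("amod", "products", "electronic"),
    ("amod", "products", "computer"),
    ("nn", "products", "building"),  # wrong label, so counted as an error
]
precision, recall, f1 = score(gold, predicted)
```

Exact match is the strictest choice; a hierarchy-aware variant could give partial credit when the predicted label is an ancestor of the gold label.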

-- MarkMcConville - 21 Jul 2008

Topic revision: r3 - 21 Jul 2008 - 16:09:08 - MarkMcConville
 