de Marneffe, MacCartney and Manning (2006)

Marie-Catherine de Marneffe, Bill MacCartney, and Christopher D. Manning (2006). "Generating typed dependency parses from phrase structure trees." Proceedings of LREC'06.

Motivation: "there has been increasing interest in using dependency parses for a range of NLP tasks, from machine translation to question answering. Such applications benefit particularly from having access to dependencies between words typed with grammatical relations, since these provide information about predicate-argument structure which are not readily available from phrase structure parses."

Grammatical relations:

  • motivated by practical rather than theoretical concerns
  • based on LFG-style systems like RASP or PARC DepBank
  • 48 relations, organised into a type hierarchy
  • relatively fine-grained ontology of NP-internal relations

Two-stage extraction method (from PTB trees):

  • dependency extraction - in general, content words are preferred as heads
  • dependency typing - using tregex pattern matching
Each word (aside from the root) is now the dependent of exactly one other word (i.e. the graph is a tree). Subsequently, pairs of dependencies can be optionally 'collapsed' into one, thus treating prepositions, conjunctions and possessive clitics as relations.
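The tree property can be checked mechanically. The following is a minimal sketch, assuming dependencies are held as hypothetical (relation, governor, dependent) tuples and that each word form occurs only once in the sentence; it is not part of the Stanford tools.

```python
def is_tree(deps, words):
    """True if exactly one word has no governor, every other word has
    exactly one governor, and following governors never loops."""
    heads = {}
    for _rel, gov, dep in deps:
        if dep in heads:
            return False  # a word with two governors: not a tree
        heads[dep] = gov
    roots = [w for w in words if w not in heads]
    if len(roots) != 1:
        return False
    for w in words:
        seen = set()
        while w in heads:  # climb towards the root
            if w in seen:
                return False  # cycle
            seen.add(w)
            w = heads[w]
    return True


words = ["Bills", "on", "ports", "and", "immigration"]
basic = [("prep", "Bills", "on"), ("pobj", "on", "ports"),
         ("cc", "ports", "and"), ("conj", "ports", "immigration")]
collapsed = [("prep_on", "Bills", "ports"), ("prep_on", "Bills", "immigration"),
             ("cc_and", "ports", "immigration")]
```

Here `is_tree(basic, words)` holds, while the collapsed set fails the check because 'immigration' ends up with two governors (and the function words no longer appear as dependents at all).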

Conjunctions:

  • h d1 and d2 - source (e.g. loves Mary and Sue)
  • X(h,d1), cc(d1,and), conj(d1,d2) - after extraction
  • X(h,d1), cc_and(d1,d2) - after first phase of optional collapsing
  • X(h,d1), cc_and(d1,d2), X(h,d2) - after second phase of optional collapsing
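The two collapsing phases for conjunctions can be sketched as follows. This is an illustration only, assuming (relation, governor, dependent) tuples and assuming that the collapsed relation links the two conjuncts, as in cc_and(ports,immigration) in the conjunction examples later in these notes. The function name and the `prefix` parameter are hypothetical.

```python
def collapse_conjunctions(deps, propagate=False, prefix="cc"):
    """Phase 1: fold cc(d1,and) + conj(d1,d2) into <prefix>_and(d1,d2).
    Phase 2 (propagate=True): copy X(h,d1) to X(h,d2)."""
    conjunction_of = {gov: dep for rel, gov, dep in deps if rel == "cc"}
    out = []
    for rel, gov, dep in deps:
        if rel == "cc":
            continue  # the conjunction word becomes part of a relation name
        if rel == "conj" and gov in conjunction_of:
            out.append((f"{prefix}_{conjunction_of[gov]}", gov, dep))
        else:
            out.append((rel, gov, dep))
    if propagate:
        pairs = [(g, d) for rel, g, d in out if rel.startswith(prefix + "_")]
        for d1, d2 in pairs:
            for rel, gov, dep in list(out):
                if dep == d1 and not rel.startswith(prefix + "_"):
                    out.append((rel, gov, d2))
    return out


source = [("dobj", "loves", "Mary"), ("cc", "Mary", "and"), ("conj", "Mary", "Sue")]
```

Calling `collapse_conjunctions(source)` gives the first-phase output; adding `propagate=True` also yields dobj(loves,Sue).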

Prepositions:

  • h p d - source (e.g. jump on Mary)
  • prep(h,p), pobj(p,d) - after extraction
  • prep_p(h,d) - after optional collapsing
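A minimal sketch of the preposition collapsing step, assuming (relation, governor, dependent) tuples and assuming each preposition word form occurs only once in the sentence. The function name is hypothetical.

```python
def collapse_prepositions(deps):
    """Rewrite prep(h,p) + pobj(p,d) as prep_p(h,d)."""
    object_of = {gov: dep for rel, gov, dep in deps if rel == "pobj"}
    attached = {dep for rel, _gov, dep in deps if rel == "prep"}
    out = []
    for rel, gov, dep in deps:
        if rel == "prep" and dep in object_of:
            out.append((f"prep_{dep}", gov, object_of[dep]))
        elif rel == "pobj" and gov in attached:
            continue  # absorbed into the collapsed relation
        else:
            out.append((rel, gov, dep))
    return out


source = [("prep", "jump", "on"), ("pobj", "on", "Mary")]
```

For the source example, `collapse_prepositions(source)` leaves the single relation prep_on(jump,Mary).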

Relative clauses:

  • n wh v - source (e.g. man who jumped)
  • rcmod(n,v), ref(n,wh), X(v,wh) - after extraction
  • rcmod(n,v), X(v,n), rel(v,wh) - after optional collapsing
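The relative clause rewrite can be sketched in the same style. Again this assumes (relation, governor, dependent) tuples and unique word forms; the function name is hypothetical.

```python
def collapse_relative_clauses(deps):
    """Rewrite ref(n,wh) + X(v,wh) as X(v,n) + rel(v,wh)."""
    antecedent_of = {dep: gov for rel, gov, dep in deps if rel == "ref"}
    out = []
    for rel, gov, dep in deps:
        if rel == "ref":
            continue  # replaced by rel(v,wh) below
        if dep in antecedent_of:
            out.append((rel, gov, antecedent_of[dep]))  # X(v,n)
            out.append(("rel", gov, dep))               # rel(v,wh)
        else:
            out.append((rel, gov, dep))
    return out


source = [("rcmod", "man", "jumped"), ("ref", "man", "who"), ("nsubj", "jumped", "who")]
```

For "man who jumped", the relative pronoun's grammatical role is transferred to the head noun, giving nsubj(jumped,man).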

Differences with Link parser representation:

  • the Link parser assumes that the head of an embedded clause is its subject rather than its main verb
  • the Link parser assumes a much more fine-grained vocabulary of relations (106 types versus just 48 for SD) - many of the extra ones are purely structural rather than semantic; in comparison, the RASP system is coarse-grained (just 23 labels)

The representation defined here was the foundation for Stanford's entry in the 2005/6 PASCAL Recognising Textual Entailment (RTE) challenges. This system "attained the highest confidence-weighted score of all entrants in the 2005 competition by a significant margin".

de Marneffe and Manning (2008)

Marie-Catherine de Marneffe and Christopher D. Manning (2008). "The Stanford typed dependencies representation." Proceedings of the COLING'08 Workshop on Cross-Framework and Cross-Domain Parser Evaluation.

Desiderata for a system of grammatical relations:

  1. the system should concentrate just on the grammatical relations required for PRACTICAL information extraction tasks, providing SEMANTICALLY CONTENTFUL information
  2. the system should be SIMPLE enough to be understood and used by people without linguistic expertise who want to extract textual relations (e.g. biologists, lawyers, market analysts); excessive detail is viewed as a defect, detracting from uptake and usability
  3. there should be an automatic procedure for EXTRACTING the relations from PTB-style phrase structure parser output

Dependency parsers are more intuitive than PTB-style parsers for non-experts - the "widespread use of MiniPar and the Link Parser ... clearly shows that ... it is very easy for a non-linguist thinking in relation extraction terms to see how to make use of a dependency representation (whereas a phrase structure representation seems much more foreign and forbidding)".

Detailed design principles:

  1. each datum is uniformly represented as some BINARY RELATION between two sentence words; this representation maps straightforwardly to common representations of potential users like RDF triples or directed graphs
  2. relations should be SEMANTICALLY CONTENTFUL and USEFUL to applications; less commonly used details like tense and number should be ignored; the argument/adjunct distinction, which is 'largely useless in practice', should be ignored; there should be a detailed ontology of NP-internal relations ('an inherent part of corpus texts and critical in real-world applications'), distinguishing between different types of modifiers (e.g. numbers, appositives, attributive adjectives etc.)
  3. where possible, relations should use notions of TRADITIONAL GRAMMAR for easier comprehension by users
  4. UNDERSPECIFIED relations should be available to deal with the complexities of real text (i.e. relations should be organised into a type hierarchy)
  5. where possible, relations should be between CONTENT WORDS, not indirectly mediated via function words; prepositions and conjunctions should be 'collapsed out' of the representation (e.g. converted into relations); typically, content words should be heads, with complementisers being dependents of them (relations between content words are key to extracting the 'gist of the sentence semantics', and it is important for applications to be able to retrieve them easily)
  6. the representation should be SPARTAN rather than overwhelming with linguistic details
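Principle 1 (uniform binary relations) is what makes the output easy to load into a triple store or graph. A minimal sketch, assuming the dependencies arrive as strings in the rel(governor,dependent) notation used in these notes; the function names and the regular expression are illustrative, not part of any Stanford API.

```python
import re
from collections import defaultdict

# relation name, then "(governor,dependent)"; words are assumed to
# contain no commas or parentheses
DEP_PATTERN = re.compile(r"([^()\s]+)\(([^,()]+),\s*([^,()]+)\)")


def parse_dependency(text):
    """Turn 'rel(gov,dep)' into a (governor, relation, dependent) triple."""
    match = DEP_PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a dependency: {text!r}")
    relation, governor, dependent = match.groups()
    return governor, relation, dependent


def to_graph(triples):
    """Directed graph as governor -> list of (relation, dependent) edges."""
    graph = defaultdict(list)
    for governor, relation, dependent in triples:
        graph[governor].append((relation, dependent))
    return dict(graph)


triples = [parse_dependency(s) for s in ("nsubj(makes,Bell)", "dobj(makes,products)")]
graph = to_graph(triples)
```

The triple order (governor, relation, dependent) mirrors the subject-predicate-object order of RDF.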

The Stanford dependencies (SD) representation:

  • based on LFG-style systems like RASP or PARC DepBank
  • 56 relations, organised into a type hierarchy
  • prepositions and conjuncts are 'collapsed out' (at least in the simplified form of the representation), occasionally sacrificing 'linguistic fidelity'
  • there is a (limited) tool to extract GRs from PTB trees; it doesn't handle long-distance dependencies though
  • has proved effective in: the PASCAL Recognising Textual Entailment (RTE) challenges; bioinformatic text mining (extracting relations between genes and proteins from text); sentiment analysis; biomedical domain parser evaluation (e.g. the BioInfer corpus)

Although we believe that 'extrinsic', task-based evaluation of parsers is more valuable than any kind of 'intrinsic' evaluation, the fact that the Stanford typed dependency representation has proved useful in information extraction tasks means that using it as the basis for intrinsic evaluation is a 'useful surrogate'.

The complete hierarchy

Taken from the javadoc files in the Stanford Parser distribution:

DEP(endent)
  > AUX - link between a content verb and an auxiliary verb, e.g. 'Reagan has died' - aux(died,has)
      > AUXPASS - link between a passive participle and a passive auxiliary, e.g. 'Kennedy has been killed' - auxpass(killed,been)
      > COP - link between a predicative content word and its copula, e.g. 'Bill is big' - cop(big,is), 'Bill is an honest man' - cop(man,is)
  > CONJ - link between two (content) words connected by a conjunction, e.g. 'Bill is big and honest' - conj(big,honest)
  > CC - link between a content word and a conjunction, e.g. 'Bill is big and honest' - cc(big,and)
  > ARG - link between a verb and one of its arguments, e.g. 'Clinton defeated Dole' - arg(defeated,Clinton), arg(defeated,Dole)
      > SUBJ - link between a verb and its subject, e.g. 'Clinton defeated Dole' - subj(defeated,Clinton), 'what she said is untrue' - subj(is,what she said)
          > NSUBJ - link between a verb and an NP subject, e.g. 'Clinton defeated Dole' - nsubj(defeated,Clinton)
              > NSUBJPASS - link between a passive participle and an NP surface subject, e.g. 'Dole was defeated by Clinton' - nsubjpass(defeated,Dole)
          > CSUBJ - link between a verb and a CP subject, e.g. 'what she said makes sense' - csubj(makes,said), 'what she said is untrue' - csubj(untrue,said)
              > CSUBJPASS - link between a passive participle and a CP surface subject, e.g. 'that she lied was suspected by everyone' - csubjpass(suspected,lied)
      > COMP - link between a verb and its complement, e.g. 'she gave me a raise' - comp(gave,me), comp(gave,raise); 'I like to swim' - comp(like,swim)
          > OBJ - link between a verb and one of its objects, e.g. 'she gave me a raise' - obj(gave,me), obj(gave,raise)
              > DOBJ - link between a verb and one of its accusative objects, e.g. 'she gave me a raise' - dobj(gave,raise)
              > IOBJ - link between a verb and its dative object, e.g. 'she gave me a raise' - iobj(gave,me)
              > POBJ - link between a preposition and its object, e.g. 'on the chair' - pobj(on,chair)
          > PCOMP - link between a preposition and a verb which heads its complement CP or VP, e.g. 'information on whether users are at risk' - pcomp(on,are), 'they heard about you missing classes' - pcomp(about,missing)
          > ATTR - link between a verb like 'be/seem/appear' and its complement
          > CCOMP - link between a verb or adjective and a CP complement (finite or remnant subjunctive), e.g. 'he says that you like to swim' - ccomp(says,like), 'I am certain that he did it' - ccomp(certain,did)
          > XCOMP - link between a verb or adjective and a (controlled) VP complement, e.g. 'I like to swim' - xcomp(like,swim), 'I am ready to leave' - xcomp(ready,leave)
          > COMPL(m) - link between a subordinate verb in a complement clause and the 'that' complementiser that introduces it, e.g. 'he says that you like to swim' - complm(like,that)
          > MARK - link between a subordinate verb in an adverbial clause and the subordinating conjunction (i.e. marker) that introduces it, e.g. 'after insurgents launched simultaneous attacks' - mark(launched,after)
          > REL - link between a verb in a relative clause and the head of the relative pronoun phrase which introduces it, e.g. 'the man that you love' - rel(love,that), 'the man whose wife you love' - rel(love,wife)
          > ACOMP - link between a verb and an adjective complement, e.g. 'she looks very beautiful' - acomp(looks,beautiful)
      > AGENT - link between a passive participle and the by-PP introducing its agent, e.g. 'The man has been killed by the police' - agent(killed,police)
  > REF - link between a noun and a relative pronoun introducing a relative clause, e.g. 'the book which you bought' - ref(book,which)
  > EXPL - link between a verb and an expletive 'there' subject, e.g. 'there is a statue in the corner' - expl(is,there)
  > MOD - link between a verb and one of its modifiers, e.g. 'last night I swam in the pool' - mod(swam,in the pool), mod(swam,last night)
  > SDEP - semantic dependent
      > XSUBJ - link between a controlled verb and its controlling subject, e.g. 'Tom likes to eat fish' - xsubj(eat,Tom)
  > PRED - link between a subject and its predicate, e.g. 'Reagan died' - pred(Reagan,died)
  > PUNC - link between a word and a punctuation marker, e.g. 'Go home!' - punc(Go,!)
Modifiers:
MOD
  > ADVCL - link between a verb and a verb heading a modifier CP (temporal clause, consequence, conditional clause etc.), e.g. 'the accident happened as the night was falling' - advcl(happened,falling), 'if you know who did it, you should tell the teacher' - advcl(tell,know)
  > PURPCL - link between a verb and the verb in an '(in order) to' purpose VP modifier, e.g. 'he talked to the president in order to secure the account' - purpcl(talked,secure)
  > TMOD - link from a verb or adjective to any constituent which modifies it by specifying a time, e.g. 'last night, I swam in the pool' - tmod(swam,night)
  > RCMOD - link from a noun to the verb which heads a relative clause, e.g. 'I saw the man you love' - rcmod(man,love), 'the book which you bought' - rcmod(book,bought)
  > AMOD - link from a noun to an adjective modifier, e.g. 'Sam eats red meat' - amod(meat,red)
  > INFMOD - link from a noun to a verb of a VPto postmodifier, e.g. 'points to establish are ...' - infmod(points,establish)
  > PARTMOD - link from a noun or verb to a participle postmodifier, e.g. 'truffles picked during the spring are tasty' - partmod(truffles,picked), 'Bill picked Fred for the team, demonstrating his incompetence' - partmod(picked,demonstrating)
  > NUM - link from a noun to a number premodifier, e.g. 'three sheep' - num(sheep,three)
  > NUMBER - link from one part of a number phrase or currency unit to a number, e.g. '$ 3 billion' - number($,billion)
  > APPOS - link from a noun to the head of an appositive NP, possibly in parentheses, e.g. 'Sam, my brother, eats red meat' - appos(Sam,brother), 'Bill (John's cousin)' - appos(Bill,cousin)
  > NN - link from a noun to a noun premodifier, e.g. 'oil price futures' - nn(price,oil), nn(futures,price)
  > ABBREV - link from a noun to a noun abbreviation, e.g. 'the Australian Broadcasting Corporation (ABC)' - abbrev(Corporation,ABC)
  > ADVMOD - link from a word to an adverb, e.g. 'genetically modified' - advmod(modified,genetically), 'less often' - advmod(often,less)
      > NEG - link from a predicative word to a negative word, e.g. 'Bill is not a scientist' - neg(scientist,not), 'Bill doesn't drive' - neg(drive,n't)
  > POSS - link from a noun to a possessive adjective or noun, e.g. 'their offices' - poss(offices,their), 'Bill's clothes' - poss(clothes,Bill)
  > POSSESSIVE - link from a noun to its genitive suffix, e.g. 'John's book' - possessive(John,'s)
  > PRT - link from a (phrasal) verb to its particle, e.g. 'they shut down the station' - prt(shut,down)
  > DET - link from a noun to its determiner, e.g. 'the man' - det(man,the), 'which man' - det(man,which)
  > PREP - link from a verb, adjective or noun to a PP, e.g. 'a cat in a hat' - prep(cat,in), 'I saw a cat with a telescope' - prep(saw,with), 'responsible for meals' - prep(responsible,for)
  > QUANTMOD - link between a quantifier and a modifier, e.g. 'about 200 people' - quantmod(200,about)
  > MEASURE - link from an adjective to a measure word, e.g. 'he is 65 years old' - measure(old,years)
  > PREDET - link from a noun to a predeterminer, e.g. 'all the boys' - predet(boys,all)
  > PRECONJ - link from a noun to a preconjunct, e.g. 'both the boys and the girls' - preconj(boys,both)
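
The hierarchy allows underspecified queries: asking for 'subj' should also match nsubj, nsubjpass and so on. A minimal sketch, encoding only a fragment of the hierarchy listed above as a child-to-parent map; the names `PARENT`, `is_a` and `select` are hypothetical.

```python
# Fragment of the hierarchy above, as child -> parent.
PARENT = {
    "aux": "dep", "auxpass": "aux", "cop": "aux",
    "arg": "dep", "subj": "arg", "nsubj": "subj", "nsubjpass": "nsubj",
    "csubj": "subj", "csubjpass": "csubj",
    "comp": "arg", "obj": "comp", "dobj": "obj", "iobj": "obj", "pobj": "obj",
    "mod": "dep", "amod": "mod", "advmod": "mod", "neg": "advmod",
}


def is_a(relation, ancestor):
    """True if `relation` equals `ancestor` or lies below it."""
    while relation is not None:
        if relation == ancestor:
            return True
        relation = PARENT.get(relation)
    return False


def select(deps, ancestor):
    """Keep the (relation, governor, dependent) tuples subsumed by `ancestor`."""
    return [d for d in deps if is_a(d[0], ancestor)]


deps = [("nsubjpass", "defeated", "Dole"), ("auxpass", "defeated", "was")]
```

With this fragment, `select(deps, "subj")` returns only the nsubjpass tuple.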

Constructions

Passive

e.g. "Bills were submitted by Brownback."

auxpass(submitted,were)
nsubjpass(submitted,Bills)
agent(submitted,Brownback)     [via prep_by(submitted,Brownback)]
i.e. not normalised, but identified as passive.
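
Because the passive is marked explicitly, an application can normalise it itself if needed. The sketch below is an assumption about how a user might do this, not something the SD representation or the Stanford tools do: it maps nsubjpass to dobj and agent to nsubj for verbs carrying an auxpass, using (relation, governor, dependent) tuples.

```python
def normalise_passive(deps):
    """Rewrite passive relations into their active-voice counterparts."""
    passive_verbs = {gov for rel, gov, _dep in deps if rel == "auxpass"}
    out = []
    for rel, gov, dep in deps:
        if rel == "auxpass":
            continue  # drop the passive auxiliary
        if gov in passive_verbs and rel == "nsubjpass":
            out.append(("dobj", gov, dep))   # surface subject = logical object
        elif gov in passive_verbs and rel == "agent":
            out.append(("nsubj", gov, dep))  # by-phrase agent = logical subject
        else:
            out.append((rel, gov, dep))
    return out


passive = [("auxpass", "submitted", "were"),
           ("nsubjpass", "submitted", "Bills"),
           ("agent", "submitted", "Brownback")]
```

For the example sentence this yields dobj(submitted,Bills) and nsubj(submitted,Brownback).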

Conjunction

e.g. "Bills on ports and immigration ..."

prep_on(Bills,ports)
prep_on(Bills,immigration)
cc_and(ports,immigration)

e.g. "neither the Bush administration nor arms-control experts seem ..."

nsubj(seem,administration)
nsubj(seem,experts)
conj_nor(administration,experts)
preconj(administration,neither)

Unsaturated conjunction:

e.g. "Bell makes and distributes products."

conj_and(makes,distributes)
nsubj(makes,Bell)
nsubj(distributes,Bell)
dobj(makes,products)
i.e. only partially compiled out - no direct link between 'distributes' and 'products'.

e.g. "He commissions scores and does some conducting"

conj_and(commissions,does)
nsubj(commissions,He)
dobj(commissions,scores)
nsubj(does,He)
dobj(does,conducting)

e.g. "makes electronic, computer and building products"

dobj(makes,products)
amod(products,electronic)
conj_and(electronic,computer)
conj_and(electronic,building)
amod(products,computer)
amod(products,building)

e.g. "plays Mozart and Strauss concertos"

dobj(plays,concertos)
nn(concertos,Mozart)
conj_and(Mozart,Strauss)  --- note no nn(concertos,Strauss)

e.g. "my companion, who was the only member and whose knuckles were white"

dep(companion,member)  -- rcmod??
dep(companion,white)
conj_and(member,white)
nsubj(member,who)
nsubj(white,knuckles)
poss(knuckles,whose)

Conjunction of dependents (including noun modifiers):

h d1, d2, ..., dn-1 and dn

X(h,d1)
X(h,d2)   -- but not if X=nn?
...
X(h,dn-1)
X(h,dn)

conj_and(d1,d2)
...
conj_and(d1,dn-1)
conj_and(d1,dn)

Conjunction of heads:

h1, h2, ..., hn-1 and hn d

X(h1,d)
X(h2,d)  -- but not if X=dobj??
...
X(hn-1,d)
X(hn,d)

conj_and(h1,h2)
...
conj_and(h1,hn-1)
conj_and(h1,hn)
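
The two schemas above (conjoined dependents and conjoined heads) can be written as small generators. This is an idealised sketch with hypothetical function names; as the question marks in the schemas indicate, the observed output does not always distribute the relation (e.g. nn and dobj in the examples above).

```python
def conjoined_dependents(relation, head, dependents, conjunction="and"):
    """X(h,d1) ... X(h,dn) plus conj_and(d1,di) for every i > 1."""
    deps = [(relation, head, d) for d in dependents]
    deps += [(f"conj_{conjunction}", dependents[0], d) for d in dependents[1:]]
    return deps


def conjoined_heads(relation, heads, dependent, conjunction="and"):
    """X(h1,d) ... X(hn,d) plus conj_and(h1,hi) for every i > 1."""
    deps = [(relation, h, dependent) for h in heads]
    deps += [(f"conj_{conjunction}", heads[0], h) for h in heads[1:]]
    return deps


modifiers = conjoined_dependents("amod", "products", ["electronic", "computer", "building"])
subjects = conjoined_heads("nsubj", ["makes", "distributes"], "Bell")
```

`modifiers` reproduces the amod and conj_and relations listed for "electronic, computer and building products"; `subjects` reproduces the nsubj and conj_and relations for "Bell makes and distributes products".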

Non-restrictive relative clauses

e.g. "a salesman, who has stopped ..."

rcmod(salesman,stopped)
rel(stopped,who)
nsubj(stopped,salesman)

e.g. "the office of disclosure policy, which proposed the changes"

rcmod(office,proposed)
rel(proposed,which)
nsubj(proposed,office)

e.g. "my companion, whose knuckles were white"

dep(companion,white)   -- rcmod??
nsubj(white,knuckles)
poss(knuckles,whose)

e.g. "Bell, based in LA, ..." [reduced]

partmod(Bell,based)

Restrictive relative clauses

e.g. "man who loves you"

rcmod(man,loves)
nsubj(loves,man)
dobj(loves,you)
rel(loves,who)

e.g. "not all those who wrote"

rcmod(those,wrote)
rel(wrote,who)
nsubj(wrote,those)

e.g. "a repertoire that ranges from ..."

rcmod(repertoire,ranges)
rel(ranges,that)
nsubj(ranges,repertoire)

e.g. "everyone else you talk to"

rcmod(everyone,talk)

e.g. "the executives who are really calling the shots"

rcmod(executives,calling)
rel(calling,who)
nsubj(calling,who)

e.g. "the filings required ...", "investors following transactions" [reduced]

partmod(filings,required)   -- no subject/object specified with partmod 
partmod(investors,following)
dobj(following,transactions)

Attributive adjectives

The adjective is an amod of the head noun. No distinction between restrictive and non-restrictive.

e.g. "electronic products", "a hard line", "fearsome contemporary scores"

amod(products,electronic)
amod(line,hard)
amod(scores,fearsome)
amod(scores,contemporary)

e.g. "the pixie-like clarinetist", "a gleeful Alex de Castro" [non-restrictive]

amod(clarinetist,pixie-like)
amod(Castro,gleeful)

Attributive participles

e.g. "the proposed rules", "an ingeniously chosen potpourri", "the only other English-speaking member", "state-funded programs", "its bottling franchisee", "leading indicators"

amod(rules,proposed)
amod(potpourri,chosen)
advmod(chosen,ingeniously)
amod(member,English-speaking)
amod(programs,state-funded)
amod(franchisee,bottling)
amod(indicators,leading)

Apposition

Multiword proper names

The last word (e.g. surname) is the head of the NP. Preceding names or titles are nn dependents of it.

e.g. "Los Angeles", "Rolls-Royce Motor Cars Inc.", "Coca-Cola Co.", "National Security Agency"

nn(Angeles,Los)
nn(Inc.,Rolls-Royce)
nn(Inc.,Motor)
nn(Inc.,Cars)
nn(Co.,Coca-Cola)
nn(Agency,National)
nn(Agency,Security)

e.g. "Mr. Lane", "President Bush"

nn(Lane,Mr.)
nn(Bush,President)

e.g. "Heinz Hollinger", "Alex de Castro", "Edward F. Peduzzi"

nn(Hollinger,Heinz)
nn(Castro,de)
nn(Castro,Alex)
nn(Peduzzi,Edward)
nn(Peduzzi,F.)

e.g. "Fraser & Neave Ltd."

nn(Ltd.,Fraser)
nn(Ltd.,Neave)
conj_&(Fraser,Neave)
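
The head-final rule for multiword names stated at the start of this section is simple enough to sketch directly. The function name is hypothetical, and the sketch ignores the conjunction case ("Fraser & Neave Ltd."), where the '&' token would need separate handling.

```python
def name_dependencies(tokens):
    """The last token heads the name; every earlier token is an nn dependent."""
    head = tokens[-1]
    return [("nn", head, token) for token in tokens[:-1]]


agency = name_dependencies(["National", "Security", "Agency"])
```

`agency` contains nn(Agency,National) and nn(Agency,Security), matching the listing above.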

Comparatives

e.g. "the filings will be at least as effective, if not more so, for investors"

nsubj(effective,filings)
prep_for(effective,investors)
advmod(effective,least)
dep(least,as)
advmod(least,at)
advmod(effective,more)
advmod(more,so)
mark(more,if)
neg(more,not)

e.g. "these problems may take more time to thrash out than President Bush has allowed"

nsubj(take,problems)
dobj(take,time)
amod(time,more)
xcomp(take,thrash)
prt(thrash,out)
advcl(take,allowed)
mark(allowed,than)
nsubj(allowed,Bush)

e.g. "they yielded as much as 20% more corn than naturally pollinated plants"

dobj(yielded,corn)
amod(corn,more)
measure(more,%)
num(%,20)
quantmod(20,as)
quantmod(20,much)
quantmod(20,as)
prep_than(corn,plants)
amod(plants,pollinated)
advmod(pollinated,naturally)

e.g. "the CIA were more interested than they let on to Mr. Stoll"

nsubj(interested,CIA)
advmod(interested,more)
ccomp(interested,let)
prt(let,on)
mark(let,than)
nsubj(let,they)
prep_to(let,Stoll)

e.g. "More stable industries were to build . . ."

xsubj(build,industries)
amod(industries,stable)
advmod(stable,more)

-- MarkMcConville - 24 Jul 2008

Topic revision: r9 - 05 Aug 2008 - 15:09:30 - MarkMcConville
 