Ryan McDonald, Fernando Pereira, Kiril Ribarov and Jan Hajic (2005). "Non-projective Dependency Parsing using Spanning Tree Algorithms". Proceedings of HLT-EMNLP'05.
Paper focuses on UNTYPED, NON-PROJECTIVE (i.e. containing crossing dependencies) dependency TREES in CZECH (although there is one result reported for English). Mention of paper by Yamada and Matsumoto (IWPT'03) who define rules for converting PTB-style trees into PROJECTIVE (i.e. no crossing dependencies) dependency TREES.
- insight: dependency parsing involves searching for the maximum (i.e. highest-scoring) spanning tree in a directed graph
- the score of a dependency tree T given input sentence x is simply the SUM of the scores of its edges (i.e. branches): Σ_{(i,j)∈T} w·f(i,j), where w is a weight vector and f(i,j) is a high-dimensional feature representation of the edge from word i to word j (the paper terms this "edge-based factorisation")
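A minimal sketch of edge-based factorisation; the sparse dict representation of f and w here is an assumption for illustration, not the paper's implementation:

```python
def tree_score(tree, w, feat):
    """Edge-factored tree score: the sum over edges (i, j) of w . f(i, j).
    tree: list of (head, dependent) index pairs.
    feat(i, j): hypothetical function returning a sparse feature dict
    for the edge i -> j; w: dict of feature weights."""
    return sum(w.get(name, 0.0) * value
               for (i, j) in tree
               for name, value in feat(i, j).items())


# Toy usage: a 3-word sentence with simple head/dependent word features.
words = ["<root>", "John", "saw", "dog"]
feat = lambda i, j: {f"head={words[i]} dep={words[j]}": 1.0}
w = {"head=saw dep=John": 1.5, "head=saw dep=dog": 0.5}
tree = [(0, 2), (2, 1), (2, 3)]
```

Because the score factors over edges, finding the best tree reduces to the spanning-tree search described below, rather than requiring global tree features.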
- the initial search space is the directed graph containing an edge from every word to every other word
- we can use Eisner's (1996) algorithm (a bottom-up, dynamic programming algorithm) to find the maximum spanning PROJECTIVE TREE in O(n^3) time
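A score-only sketch of Eisner's dynamic programme (no backpointers, so it returns the best projective tree's score rather than the tree itself); the four-chart formulation over complete/incomplete spans is standard, but the variable names are mine:

```python
def eisner_best_score(score):
    """Eisner's O(n^3) algorithm over a dense score matrix.
    score[h][d] = score of edge h -> d; word 0 is the artificial root.
    Returns the score of the best PROJECTIVE dependency tree."""
    n = len(score)
    NEG = float("-inf")
    CR = [[NEG] * n for _ in range(n)]  # complete span, head at left end
    CL = [[NEG] * n for _ in range(n)]  # complete span, head at right end
    IR = [[NEG] * n for _ in range(n)]  # incomplete span, edge s -> t
    IL = [[NEG] * n for _ in range(n)]  # incomplete span, edge t -> s
    for s in range(n):
        CR[s][s] = CL[s][s] = 0.0
    for k in range(1, n):               # span length
        for s in range(n - k):
            t = s + k
            # join two complete half-spans, then add the spanning edge
            best = max(CR[s][r] + CL[r + 1][t] for r in range(s, t))
            IR[s][t] = best + score[s][t]
            IL[s][t] = best + score[t][s]
            # extend an incomplete span into a complete one
            CR[s][t] = max(IR[s][r] + CR[r][t] for r in range(s + 1, t + 1))
            CL[s][t] = max(CL[s][r] + IL[r][t] for r in range(s, t))
    return CR[0][n - 1]
```

The cubic cost comes from the three nested indices s, t, r; bottom-up span lengths give the dynamic-programming order.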
- we can use Chu-Liu-Edmonds MST algorithm (a greedy, recursive algorithm) to find the maximum spanning NON-PROJECTIVE TREE in O(n^2) time; this dispels the notion that non-projective dependency parsing is harder than projective dependency parsing
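A sketch of the Chu-Liu-Edmonds recursion on a dense score matrix (greedy head selection, cycle contraction, recursion, expansion); the list-based representation and helper names are mine, not the paper's:

```python
def chu_liu_edmonds(score):
    """Maximum spanning arborescence. score[h][d] = score of edge h -> d;
    word 0 is the artificial root. Returns head[d] for every non-root
    word d (head[0] is unused)."""
    n = len(score)
    # Greedy step: every non-root word picks its highest-scoring head.
    head = [0] * n
    for d in range(1, n):
        head[d] = max((h for h in range(n) if h != d),
                      key=lambda h: score[h][d])
    cycle = _find_cycle(head, n)
    if cycle is None:
        return head
    # Contract the cycle to a single node, recurse on the smaller graph.
    in_cycle = set(cycle)
    rest = [v for v in range(n) if v not in in_cycle]  # rest[0] == 0 (root)
    c = len(rest)                                      # contracted node's index
    idx = {v: i for i, v in enumerate(rest)}
    NEG = float("-inf")
    new = [[NEG] * (c + 1) for _ in range(c + 1)]
    enter, leave = {}, {}
    for h in rest:
        for d in rest:
            if h != d:
                new[idx[h]][idx[d]] = score[h][d]
        # entering the cycle: gain of swapping d's cycle edge for h -> d
        best_d = max(cycle, key=lambda d: score[h][d] - score[head[d]][d])
        enter[h] = best_d
        new[idx[h]][c] = score[h][best_d] - score[head[best_d]][best_d]
    for d in rest[1:]:                   # leaving the cycle (never to the root)
        best_h = max(cycle, key=lambda h: score[h][d])
        leave[d] = best_h
        new[c][idx[d]] = score[best_h][d]
    sub = chu_liu_edmonds(new)
    # Expand: map the contracted graph's choices back to original words.
    for i in range(1, c + 1):
        if i == c:                       # edge into the contracted cycle
            h = rest[sub[c]]
            head[enter[h]] = h           # breaks exactly one cycle edge
        elif sub[i] == c:                # edge out of the contracted cycle
            head[rest[i]] = leave[rest[i]]
        else:
            head[rest[i]] = rest[sub[i]]
    return head


def _find_cycle(head, n):
    """Return one cycle under the head function as a node list, or None."""
    seen = [0] * n                       # 0 = new, 1 = on current path, 2 = done
    for start in range(1, n):
        v, path = start, []
        while v != 0 and seen[v] == 0:
            seen[v] = 1
            path.append(v)
            v = head[v]
        if v != 0 and seen[v] == 1:      # walked back onto the current path
            cycle, u = [v], head[v]
            while u != v:
                cycle.append(u)
                u = head[u]
            return cycle
        for u in path:
            seen[u] = 2
    return None
```

The greedy step is O(n^2) and each contraction removes at least one node, which is the source of the quadratic bound for dense graphs.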
- we can use the Margin Infused Relaxed Algorithm (MIRA) to estimate the edge weights from a training corpus. There are two different approaches to make this task computationally efficient: (a) single-best MIRA; and (b) factored MIRA.
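Single-best MIRA reduces each update to one margin constraint, which has a closed-form solution; a sketch using dense feature vectors as plain lists (a simplification of the paper's sparse setting):

```python
def mira_update(w, feat_gold, feat_pred, loss):
    """Single-best MIRA step: the smallest change to w that makes the
    gold tree outscore the current best prediction by a margin of
    `loss` (e.g. the number of wrongly-attached words).
    All vectors are plain lists of floats."""
    diff = [g - p for g, p in zip(feat_gold, feat_pred)]
    norm_sq = sum(d * d for d in diff)
    if norm_sq == 0.0:                      # identical feature vectors
        return w
    margin = sum(wi * di for wi, di in zip(w, diff))  # current score gap
    tau = max(0.0, (loss - margin) / norm_sq)
    return [wi + tau * di for wi, di in zip(w, diff)]
```

If the gold tree already wins by at least `loss`, tau is 0 and the weights are untouched; otherwise the update is just large enough to satisfy the constraint.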
Experiments:
We trained FIVE different dependency parsers on TWO subsets of the Prague Dependency Treebank (Czech): three other state-of-the-art systems, plus two versions of our non-projective MST method, (a) trained using single-best MIRA and (b) trained using factored MIRA. The two data sets were: (a) the full PDT; and (b) the subset of PDT sentences with at least one non-projective dependency (23% of sentences).
Results: our MIRA-trained MST systems are slightly (1%) more accurate than the other systems. Also, factored MIRA is 0.3% more accurate than single-best MIRA.
We then trained three different dependency parsers on the Penn Treebank (using Yamada and Matsumoto's rules for converting trees into dependencies). We found that training and testing with Chu-Liu-Edmonds is worse than with Eisner's algorithm, since the latter builds in the a priori knowledge that the converted trees are all projective, usefully restricting the search space.
Ryan McDonald and Fernando Pereira (2006). "Online Learning of Approximate Dependency Parsing Algorithms". Proceedings of EACL'06.
Extends treatment to rooted ACYCLIC dependency GRAPHS, where a word can have more than one parent (e.g. for subjects in VP coordination or for relative clauses). Reference made to Kromann's annotation of the Danish Dependency Treebank. Results presented for English (projective trees trained on the PTB), Czech (non-projective trees trained on the Prague DepBank), and Danish (non-projective, acyclic dependency graphs). All results assume untyped dependencies.
- extends previous parsing approach from "first-order" factorisation (the score of a dependency tree is the SUM of the scores of its edges) to "second-order" factorisation (the score of the tree is factored into the sum of adjacent edge-pair scores, i.e. a small amount of parsing history is remembered)
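A sketch of how a second-order ("adjacent sibling") factorisation sums scores; the s2 scoring function and the deps representation are assumptions for illustration:

```python
def second_order_score(deps, s2):
    """Second-order tree score: the sum of adjacent edge-pair scores.
    deps: head -> (left_deps, right_deps), each listed nearest-to-head
    first; s2(h, prev, d) is a hypothetical scoring function for
    attaching d to h given the adjacent sibling prev already attached
    (prev is None for the first dependent on each side)."""
    total = 0.0
    for h, (left, right) in deps.items():
        for side in (left, right):
            prev = None                 # a small amount of parsing history
            for d in side:
                total += s2(h, prev, d)
                prev = d
    return total
```

With s2 ignoring its prev argument this collapses back to the first-order, edge-factored score, so the first-order model is a special case.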
- extends the previous parser to handle reentrant structures (i.e. rooted acyclic graphs in which a word can be a dependent of more than one head)
- the Eisner algorithm can be used to find the maximum spanning projective tree using second-order factorisation, with time complexity O(n^3)
- finding the maximum spanning NON-PROJECTIVE tree using second-order factorisation is NP-hard; therefore we designed an "approximate" algorithm based on the Eisner algorithm, which starts from the maximum spanning projective tree and greedily rearranges its edges to approximate the maximum spanning non-projective tree
- we then adapted this approximate algorithm to find the maximum spanning non-projective reentrant rooted acyclic dependency structure using second-order factorisation
- we then used MIRA to train the parsers on the various training corpora
- we trained an English parser (from the Penn Treebank - projective trees) using second-order factorisation - the results were better than using first-order factorisation
- we trained a Czech parser (from the PDT - non-projective trees) using second-order factorisation - the results were better than using first-order factorisation
- we trained a Danish parser (from the Danish Dependency Treebank - non-projective, rooted, acyclic structures)
-- MarkMcConville - 13 Aug 2008