
Unsupervised Adaptation

Data

  • PTB Gold.
  • BIO unlabelled.

Requirements

  • Script to randomly add a domain label to each dependency label: add_domains(G,X)
  • Script to fix the domain labels of dependencies: fix_domains(G,X)
  • Script to filter out incorrect system dependencies by comparing them with a gold standard: filter
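A minimal Python sketch of the three scripts. It assumes a hypothetical format in which each parse is a list of (head, label, domain) triples, one per token, with domain None until one is assigned. The domain letters follow the procedures and the results table on this page, where G, N and B appear to stand for General, News and Bio. `filter_parse` is a hypothetical name (chosen to avoid shadowing Python's built-in `filter`), and treating a dependency as incorrect when either its head or its label disagrees with gold is also an assumption.

```python
import random
from typing import List, Optional, Tuple

# Hypothetical format: one (head, label, domain) triple per token;
# domain is None when no domain label has been assigned yet.
Dep = Tuple[int, str, Optional[str]]
Parse = List[Dep]


def add_domains(parse: Parse, domains: Tuple[str, ...], rng: random.Random) -> Parse:
    """Attach a randomly chosen domain (e.g. from ("G", "N")) to every dependency."""
    return [(head, label, rng.choice(domains)) for head, label, _ in parse]


def fix_domains(parse: Parse, domains: Tuple[str, ...], rng: random.Random) -> Parse:
    """Keep system domain labels that are allowed; redraw the others at random."""
    return [
        (head, label, dom if dom in domains else rng.choice(domains))
        for head, label, dom in parse
    ]


def filter_parse(system: Parse, gold: Parse, domains: Tuple[str, ...],
                 rng: random.Random) -> Parse:
    """Replace system dependencies whose head or label disagrees with gold.

    Correct dependencies keep their system domain; replaced ones get a random
    allowed domain, because gold dependencies carry no domain label.
    """
    fixed = []
    for (s_head, s_label, s_dom), (g_head, g_label, _) in zip(system, gold):
        if (s_head, s_label) == (g_head, g_label):
            fixed.append((s_head, s_label, s_dom))
        else:
            fixed.append((g_head, g_label, rng.choice(domains)))
    return fixed
```

For example, `add_domains(parse, ("G", "N"), rng)` corresponds to add_domains(G,N) in step 1 of the initial procedure, and `fix_domains(parse, ("G", "B"), rng)` to fix_domains(G,B).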

Initial Procedure

  1. PTB -> add_domains(G,N) -> PTB-D
  2. PTB-D -> train parser
  3. BIO -> run parser -> BIO-D-noisy
  4. BIO-D-noisy -> fix_domains(G,B) -> BIO-D
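The initial procedure can be sketched as a data flow under the same hypothetical (head, label, domain) format. The parser is abstracted as a `train` callable that returns a `parse` function; this interface and `toy_train` (a trivial stand-in that chains each word to the previous one) are illustrations only, not the parser actually used here.

```python
import random
from typing import Callable, List, Optional, Tuple

# Hypothetical formats: a sentence is a list of words; a parse has one
# (head, label, domain) triple per word, with domain None if unassigned.
Sentence = List[str]
Parse = List[Tuple[int, str, Optional[str]]]
Corpus = List[Tuple[Sentence, Parse]]
# Hypothetical parser interface: train(corpus) returns a parse(words) function.
Trainer = Callable[[Corpus], Callable[[Sentence], Parse]]


def initial_procedure(ptb: Corpus, bio: List[Sentence], train: Trainer,
                      rng: random.Random) -> Tuple[Corpus, Corpus]:
    # 1. PTB -> add_domains(G, N) -> PTB-D
    ptb_d = [(words, [(h, lab, rng.choice(("G", "N"))) for h, lab, _ in gold])
             for words, gold in ptb]
    # 2. PTB-D -> train parser
    parse = train(ptb_d)
    # 3. BIO -> run parser -> BIO-D-noisy
    bio_noisy = [(words, parse(words)) for words in bio]
    # 4. BIO-D-noisy -> fix_domains(G, B) -> BIO-D: keep G/B, redraw anything else
    bio_d = [(words, [(h, lab, d if d in ("G", "B") else rng.choice(("G", "B")))
                      for h, lab, d in p])
             for words, p in bio_noisy]
    return ptb_d, bio_d


def toy_train(corpus: Corpus) -> Callable[[Sentence], Parse]:
    """Stand-in parser for illustration: chains each word to the previous one
    (the first word to the root, 0) and always predicts the News domain."""
    return lambda words: [(i, "DEP", "N") for i in range(len(words))]
```

Usage: `ptb_d, bio_d = initial_procedure(ptb, bio, toy_train, random.Random(0))`. With the toy parser every bio domain comes out as N, so step 4 redraws all of them from G and B.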

This gives us the following data:

  • PTB-D: gold dependencies with randomly assigned domain labels.
  • BIO-D: system dependencies with a mix of system and randomly assigned domain labels.

Iterative Phase

  1. PTB-D + BIO-D -> train parser
  2. PTB + BIO -> run parser -> PTB-D-noisy + BIO-D-noisy
  3. PTB-D-noisy -> fix_domains(G, N) -> PTB-D
  4. PTB-D + PTB-Gold -> filter -> PTB-D
  5. BIO-D-noisy -> fix_domains(G, B) -> BIO-D
  6. If not converged: goto step 1.
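A sketch of the iterative phase, again with a hypothetical parser interface and the (head, label, domain) format. The page does not say how convergence is measured, so the stopping test here (the fraction of domain labels that changed since the previous round falls below `tol`, or `max_iters` is reached) is an assumption, as are the helper names.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical formats: one (head, label, domain) triple per word.
Sentence = List[str]
Parse = List[Tuple[int, str, Optional[str]]]
Corpus = List[Tuple[Sentence, Parse]]
Trainer = Callable[[Corpus], Callable[[Sentence], Parse]]


def fix_domains(parse: Parse, allowed: Sequence[str], rng: random.Random) -> Parse:
    """Keep allowed domain labels; redraw any other domain at random."""
    return [(h, lab, d if d in allowed else rng.choice(allowed)) for h, lab, d in parse]


def filter_parse(system: Parse, gold: Parse, allowed: Sequence[str],
                 rng: random.Random) -> Parse:
    """Swap in the gold head and label wherever the system got either wrong."""
    return [s if s[:2] == g[:2] else (g[0], g[1], rng.choice(allowed))
            for s, g in zip(system, gold)]


def domain_labels(corpus: Corpus) -> List[Optional[str]]:
    return [d for _, parse in corpus for _, _, d in parse]


def iterate(ptb_gold: Corpus, ptb_d: Corpus, bio_d: Corpus, train: Trainer,
            rng: random.Random, max_iters: int = 10,
            tol: float = 0.01) -> Tuple[Corpus, Corpus]:
    news, bio = ("G", "N"), ("G", "B")
    for _ in range(max_iters):
        before = domain_labels(ptb_d) + domain_labels(bio_d)
        # 1. PTB-D + BIO-D -> train parser
        parse = train(ptb_d + bio_d)
        # 2. PTB + BIO -> run parser -> PTB-D-noisy + BIO-D-noisy
        ptb_noisy = [(w, parse(w)) for w, _ in ptb_gold]
        bio_noisy = [(w, parse(w)) for w, _ in bio_d]
        # 3-4. fix_domains(G, N), then filter against PTB-Gold -> PTB-D
        ptb_d = [(w, filter_parse(fix_domains(p, news, rng), g, news, rng))
                 for (w, p), (_, g) in zip(ptb_noisy, ptb_gold)]
        # 5. fix_domains(G, B) -> BIO-D
        bio_d = [(w, fix_domains(p, bio, rng)) for w, p in bio_noisy]
        # 6. Assumed convergence test: few domain labels changed this round.
        after = domain_labels(ptb_d) + domain_labels(bio_d)
        changed = sum(a != b for a, b in zip(before, after)) / max(len(after), 1)
        if changed < tol:
            break
    return ptb_d, bio_d
```

Only PTB-D goes through the filter, since PTB is the only data with gold dependencies.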

Alternatively

  1. Define a loss function l_d1(gold-structure, guessedgold, guess) for domain d1, for which we have gold data. It penalizes out-of-domain labels and wrong structure.
  2. Define a loss function l_d2(guessedgold, guess) for domain d2, which has no gold structure. It only penalizes out-of-domain labels.
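One possible reading of the two losses as per-token error counts. This is an assumption throughout. Structure errors are counted as wrong heads against the gold structure. guessedgold is taken to be the guess with its out-of-domain labels repaired (e.g. by fix_domains), so an out-of-domain label shows up as a domain mismatch between guess and guessedgold. The unweighted sum in `loss_d1` is also a placeholder.

```python
from typing import List, Optional, Tuple

# Hypothetical format: one (head, label, domain) triple per token.
Dep = Tuple[int, str, Optional[str]]
Parse = List[Dep]


def domain_errors(guessed_gold: Parse, guess: Parse) -> int:
    """Tokens whose domain label differs from the guessed gold (assumed to be
    the guess with out-of-domain labels repaired)."""
    return sum(g[2] != gg[2] for gg, g in zip(guessed_gold, guess))


def structure_errors(gold_structure: List[int], guess: Parse) -> int:
    """Tokens attached to the wrong head (unlabelled structure errors)."""
    return sum(head != g[0] for head, g in zip(gold_structure, guess))


def loss_d1(gold_structure: List[int], guessed_gold: Parse, guess: Parse) -> int:
    # Domain with gold data: penalize wrong structure and out-of-domain labels.
    return structure_errors(gold_structure, guess) + domain_errors(guessed_gold, guess)


def loss_d2(guessed_gold: Parse, guess: Parse) -> int:
    # Domain without gold structure: only out-of-domain labels are penalized.
    return domain_errors(guessed_gold, guess)
```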

Preliminary Results

                   General    News     Bio
PTB                 112175  112223       0
Bio TB               35682       0   11568
Initial             147857  112223   11568
Iteration 1 noisy   207354   63264    1030
Iteration 1 fixed   209201   58799    3648
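A quick arithmetic check of the table: the Initial row is the column-wise sum of PTB and Bio TB, and the row total stays at 271648 for Initial, Iteration 1 noisy and Iteration 1 fixed. That is consistent with the same set of items (presumably dependencies) being relabelled from row to row rather than added or removed.

```python
# Domain-label counts copied from the table: (General, News, Bio).
rows = {
    "PTB": (112175, 112223, 0),
    "Bio TB": (35682, 0, 11568),
    "Initial": (147857, 112223, 11568),
    "Iteration 1 noisy": (207354, 63264, 1030),
    "Iteration 1 fixed": (209201, 58799, 3648),
}

# The Initial row is the column-wise sum of PTB and Bio TB.
combined = tuple(a + b for a, b in zip(rows["PTB"], rows["Bio TB"]))
assert combined == rows["Initial"]

# Row totals: the combined data keeps the same size across iterations.
totals = {name: sum(counts) for name, counts in rows.items()}
for name in ("Initial", "Iteration 1 noisy", "Iteration 1 fixed"):
    print(name, totals[name])  # each prints 271648
```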

Filter output for iteration 1:

Parses fixed: 2708
Domains added: 2708
Domains changed: 5244

New Direction

  • Train on PTB with label N.
  • Run PTB model on PBIOTB.
  • Relabel PBIOTB with label B.
  • Combine PTB and PBIOTB data.
  • Train model on gold PTB and news-gold PBIOTB.
  • Run model on training data.
  • Compare the unlabelled accuracy of the output against our gold PTB and the news-gold PBIOTB.
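Helpers for the relabelling and comparison steps, sketched with the same hypothetical (head, label, domain) format. `relabel` overwrites every domain label (as in relabelling PBIOTB with B), and `uas` computes the unlabelled attachment score, i.e. the fraction of tokens assigned the correct head, ignoring both dependency and domain labels. Both names are illustrative.

```python
from typing import List, Optional, Tuple

# Hypothetical format: one (head, label, domain) triple per token.
Dep = Tuple[int, str, Optional[str]]
Parse = List[Dep]


def relabel(parse: Parse, domain: str) -> Parse:
    """Overwrite every domain label, e.g. with B for the PBIOTB data."""
    return [(head, label, domain) for head, label, _ in parse]


def uas(system: List[Parse], gold: List[Parse]) -> float:
    """Unlabelled attachment score: fraction of tokens with the correct head."""
    correct = total = 0
    for sys_parse, gold_parse in zip(system, gold):
        for s, g in zip(sys_parse, gold_parse):
            correct += s[0] == g[0]  # compare heads only
            total += 1
    return correct / total if total else 0.0
```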

At first we want to see a high accuracy on the PTB and a low accuracy on the PBIOTB. This would mean that edges in our fool's gold PBIOTB have changed, so we are not creating the same edges the baseline parser (trained on PTB) would create. If the accuracy of the output against the news-gold PBIOTB is low, take the output on PBIOTB to be our fool's gold.
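The decision in the paragraph above, written as a predicate over the two unlabelled accuracies. The page only says "high" and "low", so the thresholds `high=0.9` and `low=0.8`, like the function name, are hypothetical placeholders.

```python
def use_as_fools_gold(uas_ptb: float, uas_pbiotb_news: float,
                      high: float = 0.9, low: float = 0.8) -> bool:
    """True if the model's PBIOTB output should become the new fool's gold.

    Requires high accuracy against gold PTB (the model still parses news well)
    and low accuracy against news-gold PBIOTB (its bio edges have moved away
    from what the PTB-only baseline produced). Thresholds are assumptions.
    """
    return uas_ptb >= high and uas_pbiotb_news < low
```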

Topic revision: r4 - 26 Feb 2007 - 14:37:43 - JamesClarke
 