English to ASL Gloss Machine Translation

Low-resource languages, including sign languages, are a challenge for machine translation research. Because large parallel corpora are lacking, researchers must settle for a small parallel corpus in a narrow domain when training a system. For this thesis, we obtained a small parallel corpus of English text and American Sign Language gloss from The Church of Jesus Christ of Latter-day Saints. We cleaned the corpus by loading it into an open-source translation memory tool, where we removed markup and split the large chunks of text into sentences and phrases, yielding a total of 14,247 sentence pairs. We randomly partitioned the corpus into three sections: 70% for a training set, 10% for a development set, and 20% for a test set. After downloading and installing the open-source Moses toolkit, we went through several iterations of training, translating, and evaluating the system. The final evaluation on unseen data yielded a state-of-the-art score for a low-resource language.
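
The random 70/10/20 partition described above can be sketched as follows. This is only an illustration, not the procedure used in the thesis: the function name `partition_corpus`, the fixed seed, and the choice to round the training and development sizes down (leaving the remainder to the test set) are all assumptions made for the example.

```python
import random


def partition_corpus(pairs, train_frac=0.7, dev_frac=0.1, seed=0):
    """Randomly split sentence pairs into train, dev, and test lists.

    Hypothetical helper: the name, the seed, and the floor rounding
    are assumptions for illustration, not details from the thesis.
    """
    shuffled = list(pairs)                  # copy so the input is not mutated
    random.Random(seed).shuffle(shuffled)   # seeded shuffle for repeatability

    n_train = int(len(shuffled) * train_frac)  # rounds down
    n_dev = int(len(shuffled) * dev_frac)      # rounds down

    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]       # whatever remains (about 20%)
    return train, dev, test


# Toy usage with placeholder (English, gloss) pairs, not real corpus data.
toy_pairs = [(f"english {i}", f"GLOSS {i}") for i in range(10)]
train, dev, test = partition_corpus(toy_pairs)
print(len(train), len(dev), len(test))
```

Shuffling at the level of sentence pairs keeps each English sentence aligned with its gloss, and fixing the seed makes the split repeatable across training iterations.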

Thesis Author: Bonham, Mary Elizabeth

Year Completed: 2015

Thesis Chair: Deryle W. Lonsdale
