Synchronous Combinatory Categorial Grammar
- 14:00 23rd April 2014 (week 0, Trinity Term 2014), Room 051
Statistical machine translation has been very successful, resulting in a thriving industry highlighted by products like Google Translate. Yet translation systems still often fail to capture many linguistic phenomena, because they model translation as simple substitution and permutation of word tokens, sometimes informed by syntax. Formally, these models are probabilistic relations on regular or context-free sets, a poor fit for many of the world's languages. If we are to build translation systems that adequately capture linguistic phenomena, we must model those phenomena. Computational linguists have developed expressive mathematical models of language that exhibit high empirical coverage of annotated language data, correctly predict a variety of important linguistic phenomena in many languages, and can be processed with efficient algorithms. I will describe a new formal model of translation based on one of these formalisms, combinatory categorial grammar (CCG). I will describe a synchronous CCG that generates a relation on sentence pairs with provably equivalent semantics. I will then give a solution for the crucial problem of recognition—the basis of any probabilistic translation algorithm—derived from a view of parsing as language intersection.