Syntactic Analysis of Natural Languages

This project studies syntactic analysis of natural languages, with a focus on semi-supervised approaches that require very limited amounts of training data. One line of work has developed highly efficient methods for learning lexical representations from unlabeled data; these representations can then be used in a variety of natural language processing problems. We have derived a new algorithm for word clustering that is significantly more efficient than previous approaches and has strong theoretical guarantees. In other work, we have investigated methods for part-of-speech tagging (the problem of assigning a part of speech to each word in a sentence) using minimal amounts of training data. Our results show that a few hundred words of labeled data are sufficient for high accuracy. A final piece of work has focused on efficient dependency parsing for multiple languages, with example applications in machine translation and information extraction.

Michael Collins

Computer Science
Vikram S. Pandit Professor

Michael Collins is the Vikram S. Pandit Professor of Computer Science at Columbia University. He completed a PhD in computer science at the University of Pennsylvania in December 1998. From January 1999 to November 2002 and from January 2003 until December 2010, he was a researcher at AT&T Labs-Research.
