How DARPA Plans to Decrypt the Languages That Computers Still Don't Understand

Published on Vice.com

In the weeks that followed the 2010 earthquake in Haiti, it wasn’t just money that locals needed to rebuild, but people with whom they could speak. Even when medicine and clean water were available, foreign troops and aid workers couldn’t converse with locals about where those supplies were needed most. With far too few human translators available, hopes for more effective disaster relief fell to machine translation, but the Haitian Creole spoken by many of the country’s displaced people was largely unknown to computational linguistics.

About 10 million people speak Haitian Creole, but in the parlance of linguistics it is still a “low resource” language. These languages are mostly absent from the cross-referenced linguistic databases that feed modern translation software, have few written texts available for study, and aren’t widely used online. And yet they make up the vast majority of the world’s more than 7,000 linguistic divisions, and often dominate the most conflict-ridden nations on Earth.