Description:
Language occupies a central role on the web: most content is expressed in a given language, and most access takes place via natural language input and interfaces. Today, investigation of human language in all its forms depends on access to this vast store of language data. In particular, linguists and language technologists annotate and analyze this data and develop new language resources, such as grammars and dictionaries, along with a raft of new technologies for automatic translation, information extraction, question answering, and so forth. As this new documentation is disseminated on the web, and as the new technologies are in turn deployed on the web, a further round of collection and processing is enabled, closing the loop. For instance, a collection of Japanese text with an aligned English translation can be used for translation studies, for adding examples to bilingual dictionaries, and for developing translation systems. These resources can then be used for new purposes, e.g. to give English speakers access to content stored in Japanese text, or to provide Japanese learners of English with more authentic example sentences.

In the first five years of the web, English content was dominant. Then, in mid-2000, the combined content from all other languages exceeded English for the first time, and the growth of this non-English content continues to outstrip the growth of English content. Most striking of all has been the emergence