Building a Large Multilingual Resource with Natural Language Processing Techniques

1 The problems of language identification within hugely multilingual data sets

Fei Xia…