Reflections on the development of corpora for Zimbabwe’s understudied languages


Emmanuel Chabata, University of Zimbabwe


Keywords: corpus, linguistic data, understudied language, Zimbabwe


Linguistic corpora are among the primary research tools in modern linguistics. Their centrality derives from the view that corpus data is more accurate, observable, objective, reliable and verifiable than data from other sources. However, very little has been done to develop corpora for understudied languages. Readily available corpora are particularly important for these languages, since most researchers have restricted physical access to them because most are found in remote areas. This article examines issues of corpus design, compilation and querying, and calls for the development of corpora for Zimbabwe’s understudied languages. Taking a cue from some of the challenges encountered in the development of Shona and Ndebele language corpora, the article focuses on issues that need special consideration when developing corpora for these languages. Some of these issues relate to the languages’ level of development, the scarcity of written and electronic materials in them, and the sociolinguistic context in which they are found. The article argues that corpora should be developed for these languages so that they become a foundation on which other linguistic resources can be built.

