What is CorpuScript?

Ghent University has long encouraged innovation in second language learning. One example is an educational innovation project that aimed to support students of “Language and literature: two languages” during their writing proficiency courses in both languages. The project resulted in a multilingual website focused on writing skills, supplemented by an online correction tool: the editor. This online application enables teachers to correct writing assignments digitally. In doing so, it builds a database not only of learners’ written production, but also of the mistakes our students typically make.


The aim of the online correction tool is to create a large database of learner production for English, French, Italian and Swedish as foreign languages and, at a later stage, also for German, Spanish and Dutch as mother tongues. While texts are corrected online, the mistakes are fed into the database in a uniform way. For this we use an improved version of a standard correction module that allows teachers to choose from a set of predefined categories. Following James (1998), we distinguish four types of mistakes: overinclusion, omission, selection, and misorder. These types can occur in three paradigms: lexical, grammatical and discursive. To these categories we have added spelling mistakes. We have not only implemented this matrix of mistakes in our online correction application, but have also extended it so that we can add comments to a correction or insert links to relevant pages on the companion website. The system is thus uniform and systematic, while still allowing for some flexibility.
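
As an illustration only, the matrix of mistakes described above could be modelled along the following lines. This is a minimal sketch, not the actual data model of the editor: the names `ErrorType`, `Paradigm`, `Correction`, `CATEGORIES` and `tally`, the `paradigm/type` label format, the character offsets and the example link are all assumptions introduced here.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class ErrorType(Enum):
    """The four types of mistakes distinguished following James (1998)."""
    OVERINCLUSION = "overinclusion"
    OMISSION = "omission"
    SELECTION = "selection"
    MISORDER = "misorder"


class Paradigm(Enum):
    """The three paradigms in which each type can occur."""
    LEXICAL = "lexical"
    GRAMMATICAL = "grammatical"
    DISCURSIVE = "discursive"


def build_categories() -> List[str]:
    """Cross every paradigm with every type, then add spelling separately."""
    labels = [f"{p.value}/{t.value}" for p in Paradigm for t in ErrorType]
    labels.append("spelling")
    return labels


# 13 categories: 4 types x 3 paradigms, plus spelling.
CATEGORIES = build_categories()


@dataclass
class Correction:
    """One teacher correction on a span of learner text (hypothetical record)."""
    start: int                      # assumed: character offset where the span begins
    end: int                        # assumed: character offset where the span ends
    category: str                   # must be one of CATEGORIES (the uniform part)
    comment: Optional[str] = None   # free-text remark (the flexible part)
    link: Optional[str] = None      # optional pointer to a companion-website page

    def __post_init__(self) -> None:
        # Reject labels outside the predefined set so the database stays uniform.
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category}")


def tally(corrections: List[Correction]) -> Counter:
    """Count how often each category occurs in a list of corrections."""
    return Counter(c.category for c in corrections)


if __name__ == "__main__":
    sample = [
        Correction(0, 3, "grammatical/omission", comment="article missing"),
        Correction(10, 18, "spelling"),
        # The URL below is a placeholder, not a real companion-website page.
        Correction(25, 31, "lexical/selection", link="https://example.org/page"),
        Correction(40, 44, "spelling"),
    ]
    print(len(CATEGORIES))
    print(tally(sample).most_common())
```

Restricting `category` to a fixed list while leaving `comment` and `link` optional mirrors the balance described above: every correction is stored uniformly, yet the teacher keeps room for individual feedback.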


The importance of research into second language learning, and more specifically into the characteristics of interlanguage, based on empirical analyses of corpora, has already been emphasized extensively. Thanks to the online correction tool we can create, in a structured and systematic way, a large, tagged multilingual learner corpus. In the long term this corpus will open up research on the effect of innovative approaches to written language learning on the learning process, with an emphasis on the longitudinal perspective: were identified problems sufficiently remedied? Our aim is not just to determine quantitatively in which contexts and languages certain structures are wrong or non-idiomatic, but also to analyse to what extent factors such as age, sex, previous training, level of study and chosen language combination influence the learning process. This analysis, based on learners’ profiles, will give us a nuanced picture of the learning process, enabling us to determine exactly where and when more differentiated training is necessary.