COREFL: Corpus of English as a Foreign Language


Oct. 2021

What is COREFL?

COREFL stands for CORpus of English as a Foreign Language.

COREFL is a large database containing the language produced by learners of English as a second/foreign language (L2). Such a database is known as a language ‘corpus’. This is the first release of COREFL (version 1). COREFL follows the design principles and philosophy of the CEDEL2 corpus.

COREFL contains written and spoken data. An important feature of the spoken texts is that each one can be matched to a written text produced by the same participant, who did the same task twice: first in writing and then, at least 15 days later (to avoid task-repetition effects), in speech. This lets researchers investigate the effects of medium (spoken vs. written language) while holding the learner and the task constant.

COREFL currently amounts to a total of XXXX participants and XXXX words. COREFL currently holds data from learners of English with two different L1 backgrounds (where ‘L1’ means the learners’ mother tongue and ‘L2’ their foreign language):

For comparative purposes, COREFL also contains two ‘control’ subcorpora, i.e., data from the mother tongue (L1) of the learners:

A third control subcorpus, of native speakers of German (L1), is intended but is currently unavailable.

Can I use/download COREFL?

Can I participate in COREFL?

You can contribute to the corpus in two ways:



Open Data Science

COREFL follows the Open Data Science philosophy. COREFL is publicly available, fully searchable and freely downloadable. It is licensed under a Creative Commons license (CC BY-NC-ND 3.0 ES). You can use COREFL data for your research/teaching purposes provided you cite the corpus appropriately (‘About’ > ‘How to cite COREFL’).

Further info


COREFL has been publicly funded by the Spanish Research Agency (Agencia Estatal de Investigación, Ministerio de Ciencia e Innovación), which we gratefully acknowledge, through research project FFI2016-75106-P ‘ANACOR’ (PI: Cristóbal Lozano).
