What is COREFL?
COREFL stands for CORpus of English as a Foreign Language.
COREFL is a large database containing the language produced by learners of English as a second/foreign language (L2). Such a database is called a language ‘corpus’. This is the first release of COREFL (version 1). COREFL has been designed following the corpus principles and philosophy of the CEDEL2 corpus.
COREFL contains written and spoken data. Every spoken text can be matched to a written text, because both were produced by the same participant performing the same task twice: the written text first and then, at least 15 days later (to avoid task-repetition effects), the spoken text. Researchers can thus investigate the effects of medium (spoken vs. written language) while holding the learner and the task constant.
COREFL currently comprises XXXX participants and XXXX words. It holds data from learners of English with two different L1 backgrounds (where ‘L1’ means the learners’ mother tongue and ‘L2’ their foreign language):
- L1 Spanish-L2 English
- L1 German-L2 English
For comparative purposes, COREFL also contains two ‘control’ subcorpora, i.e., data produced by native speakers in their mother tongue (L1):
- L1 Spanish natives (Spanish and Latin American varieties)
- L1 English natives (American, British and other varieties)
A third control subcorpus, of L1 German natives, is intended but is currently unavailable.
Can I use/download COREFL?
- To search or download the corpus straight away, click on the tab ‘Search/Download’.
- Click on the tab ‘User guide’ for instructions and details.
- If you use COREFL, please cite the corpus appropriately: ‘About’ > ‘How to cite COREFL’.
Can I participate in COREFL?
You can contribute to the corpus in two ways:
IF YOU ARE A LEARNER OF ENGLISH OR A NATIVE SPEAKER:
- We are still collecting data from learners and natives for the future COREFL (version 2).
- If you participate as a learner of English, you will receive your score on the English placement test for free, plus a certificate of participation.
- Participate here: learnercorpora.com
IF YOU ARE A RESEARCHER OR A TEACHER:
- If you are a teacher/researcher of English, you can collaborate in the data collection.
- There are many ways of doing this and your learners can benefit from it. Please get in touch with the COREFL director (Cristóbal Lozano, Universidad de Granada) by clicking on the tab ‘Contact’.
Open Data Science
COREFL follows the Open Data Science philosophy. COREFL is publicly available, fully searchable and freely downloadable. It is licensed under a Creative Commons license (CC BY-NC-ND 3.0 ES). You can use COREFL data for your research/teaching purposes provided you cite the corpus appropriately (‘About’ > ‘How to cite COREFL’).
Further info
- For contact and further details, please get in touch with the COREFL director (Cristóbal Lozano, Universidad de Granada) by clicking on the tab ‘Contact’.
- For details on how to cite COREFL, click on the tab ‘About’ > ‘How to cite COREFL’.
Funding
COREFL has been publicly funded by the Spanish Research Agency (Agencia Estatal de Investigación, Ministerio de Ciencia e Innovación), which we gratefully acknowledge: research project FFI2016-75106-P ‘ANACOR’, PI Cristóbal Lozano.