This page provides general information on the size of the corpus and gives a list of documents incorporated in ParCoLab.

Size of the corpus

The parallel corpus contains a total of 29,000,000 words across all four languages. The data collected so far are distributed as follows:

List of included documents

The content is predominantly literary, with texts originally written in French, Serbian, English, Spanish and Occitan. Efforts to diversify the corpus are ongoing, particularly through the inclusion of legal texts, subtitles and web content. ParCoLab also contains various journalistic and philosophical texts.

To date, the following documents have been integrated into the parallel corpus.