This page provides general information on the size of the corpus and gives a list of documents incorporated in ParCoLab.

Size of the corpus

The parallel corpus contains a total of 25,000,000 words across all four languages. The data collected so far are distributed as follows:

List of included documents

The content is predominantly literary, with texts originally written in French, Serbian, English and Spanish. Efforts to diversify the corpus are ongoing, in particular by adding legal texts, subtitles and web content.

To date, the following documents have been integrated into the parallel corpus: