{"id":284,"date":"2015-05-18T07:13:26","date_gmt":"2015-05-18T06:13:26","guid":{"rendered":"https:\/\/parcolab.univ-tlse2.fr\/onama\/ressources\/"},"modified":"2023-04-30T12:16:24","modified_gmt":"2023-04-30T11:16:24","slug":"resursi","status":"publish","type":"page","link":"https:\/\/parcolab.univ-tlse2.fr\/sr\/onama\/resursi\/","title":{"rendered":"Resursi"},"content":{"rendered":"<p class=\"lead\">This page aims at collecting the linguistic resources developed in the framework of the ParCoLab project.<\/p>\n<h3>ParCoTrain-Synt &#8211; Syntactic analysis of Serbian<\/h3>\n<p>ParCoTrain-Synt is a training and evaluation corpus for\u00a0the POS-tagging, fine-grained morphosyntactic annotation, lemmatisation and parsing of Serbian. The corpus contains 81 000 tokens annotated manually on all levels. The source texts for the corpus are contemporary Serbian novels from the 2nd half of the 20th century.<\/p>\n<p>For each token, the corpus indicates the lemma, POS-tag, detailed morphosyntactic description, morphosyntactic traits important for parsing, syntactic governor and syntactic function. The syntactic annotation is done in the dependency-based approach.<\/p>\n<p>This resource was developed by Aleksandra Miletic, Dejan Stosic and C\u00e9cile Fabre (CLLE, CNRS &amp; University of Toulouse).<\/p>\n<p><strong>Licence<br \/>\n<\/strong>Some rights are reserved. ParCoTrain-Synt is distributed under a Creative Commons BY-NC-SA 3.0 licence. 
Please read the licence carefully before using the corpus in your work.<\/p>\n<p><strong>Contact<br \/>\n<\/strong>Aleksandra Miletic,\u00a0aleksandra.miletic@univ-tlse2.fr<\/p>\n<p><strong>Download<br \/>\n<\/strong><a href=\"https:\/\/parcolab.univ-tlse2.fr\/wp-content\/uploads\/2017\/07\/ParCoTrain-Synt-v0.1.zip\">Corpus<br \/>\n<\/a>PDF documentation (forthcoming)<\/p>\n<h3>ParCoJour &#8211; an MSD-tagged, lemmatized and parsed Serbian news corpus<\/h3>\n<p><strong>Description<\/strong><\/p>\n<p><strong>ParCoJour <\/strong>is a Serbian news corpus containing 34,000 tokens. There are 37 articles from one daily (<em>Danas<\/em>) and one weekly (<em>NIN<\/em>) newspaper published between 2003 and 2017. The corpus indicates the lemma, POS-tag, detailed morphosyntactic traits important for parsing, syntactic governor and syntactic function for each token. The linguistic annotation of the corpus follows the guidelines of the ParCoTrain-Synt corpus.<\/p>\n<p><strong>Download<br \/>\n<a href=\"https:\/\/parcolab.univ-tlse2.fr\/wp-content\/uploads\/2020\/05\/ParCoJour_v0.1.zip\">ParCoJour_v0.1<\/a><br \/>\n<\/strong><\/p>\n<p><strong>Licence<br \/>\n<\/strong><a href=\"http:\/\/creativecommons.org\/licenses\/by-nc-sa\/3.0\/\" rel=\"license\"><img decoding=\"async\" src=\"https:\/\/i.creativecommons.org\/l\/by-nc-sa\/3.0\/80x15.png\" alt=\"Creative Commons License\" \/><\/a> This work is licensed under a <a href=\"http:\/\/creativecommons.org\/licenses\/by-nc-sa\/3.0\/\" rel=\"license\">Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License<\/a>.<\/p>\n<p><strong>Contact<br \/>\n<\/strong>Dusica Terzic, dusica.terzic@fil.bg.ac.rs<\/p>\n<p><strong>References<br \/>\n<\/strong><strong>Terzic, Dusica. (2019). <\/strong>Parsing des textes journalistiques en serbe par le logiciel Talismane. <em>Proceedings of TALN-RECITAL 2019, pp. 591-604. <\/em>Toulouse, France. 
[<a href=\"https:\/\/www.irit.fr\/pfia2019\/wp-content\/uploads\/2019\/07\/actes_TALN-RECITAL-recital_CH_PFIA2019-2.pdf\">PDF<\/a>]<\/p>\n<h3>ParCoLab &#8211; files available for download<\/h3>\n<p><strong>Description<\/strong><\/p>\n<p>A part of ParCoLab&#8217;s content is free of copyright and available for download. The portion of the corpus that is currently available contains 588 000 tokens in total (63 000 in Serbian, 260 000 in French, and 265 000 in English). The description of the texts included along with their size in tokens is given below.<\/p>\n<table border=\"1\" width=\"616\" cellpadding=\"6\">\n<tbody>\n<tr>\n<td rowspan=\"2\" width=\"116\">Source<\/td>\n<td rowspan=\"2\" width=\"206\">Type<\/td>\n<td colspan=\"3\" width=\"201\">Tokens per language<\/td>\n<td rowspan=\"2\" width=\"73\">Total<\/td>\n<\/tr>\n<tr>\n<td width=\"68\">Serbian<\/td>\n<td width=\"66\">French<\/td>\n<td width=\"66\">English<\/td>\n<\/tr>\n<tr>\n<td width=\"116\">French Embassy in Canada<\/td>\n<td width=\"206\">Web content<br \/>\n(short texts)<\/td>\n<td width=\"68\">&#8211;<\/td>\n<td width=\"66\">28\u00a0297<\/td>\n<td width=\"66\">28\u00a0288<\/td>\n<td width=\"73\">56\u00a0585<\/td>\n<\/tr>\n<tr>\n<td width=\"116\">TV series <em>Bref<\/em><\/td>\n<td width=\"206\">Subtitles<br \/>\n(spoken language)<\/td>\n<td width=\"68\">13\u00a0305<\/td>\n<td width=\"66\">15\u00a0168<\/td>\n<td width=\"66\">&#8211;<\/td>\n<td width=\"73\">28\u00a0473<\/td>\n<\/tr>\n<tr>\n<td width=\"116\">Web magazine <em>Pescanik<\/em><\/td>\n<td width=\"206\">Web content<br \/>\n(socio-political articles)<\/td>\n<td width=\"68\">31 151<\/td>\n<td width=\"66\">&#8211;<\/td>\n<td width=\"66\">34\u00a0275<\/td>\n<td width=\"73\">65\u00a0426<\/td>\n<\/tr>\n<tr>\n<td width=\"116\">JRC-Acquis<\/td>\n<td width=\"206\">Legislation<br \/>\n(legislative texts from EU)<\/td>\n<td width=\"68\">&#8211;<\/td>\n<td width=\"66\">195\u00a0095<\/td>\n<td width=\"66\">181\u00a0290<\/td>\n<td 
width=\"73\">376\u00a0385<\/td>\n<\/tr>\n<tr>\n<td width=\"116\">TED talks<\/td>\n<td width=\"206\">Subtitles<br \/>\n(short talks on various subjects)<\/td>\n<td width=\"68\">18\u00a0933<\/td>\n<td width=\"66\">21\u00a0105<\/td>\n<td width=\"66\">21\u00a0410<\/td>\n<td width=\"73\">61\u00a0448<\/td>\n<\/tr>\n<tr>\n<td colspan=\"2\" width=\"321\">Total<\/td>\n<td width=\"68\">63\u00a0389<\/td>\n<td width=\"66\">259\u00a0665<\/td>\n<td width=\"66\">265\u00a0263<\/td>\n<td width=\"73\"><strong>588\u00a0317<\/strong><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<div>\n<p><strong>Contact person:\u00a0<\/strong><a href=\"http:\/\/clle.univ-tlse2.fr\/accueil\/miletic-aleksandra-414135.kjsp?RH=1458287996569\">Aleksandra Miletic (CLLE-ERSS)<\/a>, aleksandra.miletic@univ-tlse2.fr<\/p>\n<\/div>\n<div>\n<p><strong>Licence:\u00a0<\/strong>Some rights are reserved. ParCoLab is distributed under a <a href=\"http:\/\/creativecommons.org\/licenses\/by-nc-sa\/3.0\/deed.fr\">Creative Commons BY-NC-SA 3.0<\/a> licence.<\/p>\n<\/div>\n<div>\n<p><strong>Download<\/strong><\/p>\n<p><a href=\"https:\/\/parcolab.univ-tlse2.fr\/wp-content\/uploads\/2017\/04\/parcolab-copyrightfree.zip\">zip archive with XML files<\/a><\/p>\n<\/div>\n<h3><span style=\"color: #666699;\">ParCoTrain &#8211; morphosyntactic analysis and lemmatisation of Serbian<\/span><\/h3>\n<p><strong>Description<\/strong><\/p>\n<p>ParCoTrain is a training and evaluation corpus for tools that automatically identify parts of speech and lemmatise Serbian. The lemmatised part of the corpus contains 95 585 manually annotated tokens, while the POS-annotated part contains 153 625 tokens in total, of which 95 585 were annotated manually and 57 977 were annotated automatically and then manually checked and corrected. 
The corpus is based on the text of 3 contemporary Serbian novels from the second half of the 20th century.<\/p>\n<p>The POS annotation includes the main category and the subcategory; for adjectives and adverbs, the degree of comparison is also given. A detailed overview of the tags used in the annotation is given in the PDF documentation, which can be downloaded from the links below.<\/p>\n<p>This resource was developed by <a href=\"http:\/\/clle.univ-tlse2.fr\/accueil\/miletic-aleksandra-414135.kjsp?RH=1458287996569\">Aleksandra Mileti\u0107<\/a> (CLLE-ERSS research team, University of Toulouse &#8211; Jean Jaur\u00e8s), <a href=\"http:\/\/stl.recherche.univ-lille3.fr\/sitespersonnels\/balvet\/page_balvet\/page_Balvet.html\">Antonio Balvet<\/a> (STL research team, University of Lille 3) and <a href=\"http:\/\/clle.univ-tlse2.fr\/accueil\/actualites\/annuaire\/stosic-dejan-327542.kjsp?RH=1458287996569\">Dejan Sto\u0161i\u0107<\/a> (CLLE-ERSS research team, University of Toulouse &#8211; Jean Jaur\u00e8s) as part of the <a href=\"https:\/\/parcolab.univ-tlse2.fr\/\">ParCoLab<\/a> project.<\/p>\n<p><strong>Contact:<\/strong> Aleksandra Mileti\u0107 (CLLE-ERSS), aleksandra.miletic@univ-tlse2.fr<\/p>\n<p><strong>Licence:<\/strong> Some rights are reserved. ParCoTrain is distributed under a <a href=\"http:\/\/creativecommons.org\/licenses\/by-nc-sa\/3.0\/deed.fr\">Creative Commons BY-NC-SA 3.0<\/a> licence. 
Please read the licence carefully.<\/p>\n<p><strong>Files available for download:<\/strong><br \/>\n<a href=\"https:\/\/parcolab.univ-tlse2.fr\/wp-content\/uploads\/2015\/05\/ParCoTrain.zip\">Training and evaluation corpus<\/a><br \/>\n<a href=\"https:\/\/parcolab.univ-tlse2.fr\/wp-content\/uploads\/2016\/09\/ParCoTrain-Documentation-en.pdf\">Documentation in English<\/a><br \/>\n<a href=\"https:\/\/parcolab.univ-tlse2.fr\/wp-content\/uploads\/2016\/09\/ParCoTrain-Documentation-fr.pdf\">Documentation in French<\/a><\/p>\n<p><strong>References:<\/strong><\/p>\n<p>Balvet, A., Stosic, D., &amp; Miletic, A. (2014). TALC-sef, Un corpus \u00e9tiquet\u00e9 de traductions litt\u00e9raires en serbe, anglais et fran\u00e7ais. In SHS Web of Conferences (Vol. 8, pp. 2551-2563). EDP Sciences. [<a href=\"cmlf2014.pdf\">PDF<\/a>] [<a href=\"https:\/\/scholar.google.fr\/scholar.bib?q=info:r0AbJiIbPQcJ:scholar.google.com\/&amp;output=citation&amp;scisig=AAGBfm0AAAAAVmbuU90ZoQ7_Ce1OX20cIhZFeRj6ggth&amp;scisf=4&amp;hl=fr\">BibTex<\/a>]<\/p>\n<p>Balvet, A., Stosic, D., &amp; Miletic, A. (2014, May). TALC-Sef a Manually-revised POS-Tagged Literary Corpus in Serbian, English and French. In LREC 2014. [<a href=\"lrec2014.pdf\">PDF<\/a>] [<a href=\"https:\/\/halshs.archives-ouvertes.fr\/halshs-01077767v1\/bibtex\">BibTex<\/a>]<\/p>\n<p>Miletic, A. (2013). Annotation semi-automatique en parties du discours d&#8217;un corpus litt\u00e9raire serbe. M\u00e9moire de Master. Universit\u00e9 Charles de Gaulle Lille 3, France.<\/p>\n<div>\n<h3>wikimorph-sr &#8211; a lexicon for POS-tagging and parsing of Serbian<\/h3>\n<\/div>\n<p><strong>Description<\/strong><\/p>\n<p><b>wikimorph-sr<\/b>\u00a0is a morphosyntactic lexicon for Serbian that can be used for POS-tagging, parsing and lemmatisation. 
It was mainly extracted from the Serbo-Croatian edition of Wiktionary (<a href=\"https:\/\/sh.wiktionary.org\/\" target=\"_blank\" rel=\"noopener noreferrer\">sh.wiktionary.org<\/a>).<\/p>\n<p>The lexicon contains 1 222 486 distinct wordforms corresponding to 117 445 distinct lemmas and to 3 061 616 unique <i>wordform, lemma, morphosyntactic description<\/i> combinations. Each morphosyntactic description contains a POS indication, a subcategory and a set of relevant morphosyntactic traits: case, number and gender for nouns, adjectives and pronouns; verb form, person, gender and number for verbs; degree of comparison for adjectives and adverbs. More details are available in the PDF documentation of the lexicon.<\/p>\n<p>This resource was developed as part of the <a href=\"https:\/\/parcolab.univ-tlse2.fr\/\" target=\"_blank\" rel=\"noopener noreferrer\">ParCoLab project<\/a> by <a href=\"http:\/\/clle.univ-tlse2.fr\/accueil\/miletic-aleksandra-414135.kjsp\" target=\"_blank\" rel=\"noopener noreferrer\">Aleksandra Miletic<\/a> (UMR 5263 CLLE-ERSS, CNRS &amp; Universit\u00e9 Toulouse &#8211; Jean Jaur\u00e8s, France).<\/p>\n<div>\n<p><strong>Contact person<br \/>\n<\/strong><a href=\"http:\/\/clle.univ-tlse2.fr\/accueil\/miletic-aleksandra-414135.kjsp?RH=1458288097865\" target=\"_blank\" rel=\"noopener noreferrer\">Aleksandra Miletic<\/a><br \/>\nContact: <a href=\"mailto:aleksandra.miletic@univ-tlse2.fr\">aleksandra.miletic@univ-tlse2.fr<\/a><\/p>\n<\/div>\n<p><strong>Licence<br \/>\n<\/strong>Some rights are reserved. 
<b>wikimorph-sr<\/b>\u00a0is distributed under a <a href=\"http:\/\/creativecommons.org\/licenses\/by-sa\/3.0\/deed.fr\" target=\"_blank\" rel=\"noopener noreferrer\">Creative Commons BY-SA 3.0<\/a> licence.<\/p>\n<p><strong>Downloads<\/strong><\/p>\n<p><a href=\"http:\/\/redac.univ-tlse2.fr\/lexiques\/wikimorph-sr\/wikimorph-sr_1.0.zip\">Lexicon<\/a><br \/>\n<a href=\"https:\/\/parcolab.univ-tlse2.fr\/wp-content\/uploads\/2017\/04\/wikimorph-sr_documentation-en-v1.1.pdf\">PDF documentation in English<\/a><\/p>\n<p><strong>References<\/strong><\/p>\n<p><b>Miletic, Aleksandra. (2017)<\/b>. Building a morphosyntactic lexicon for Serbian from Wiktionary. <i>Actes de la 6e \u00e9dition des Journ\u00e9es d&#8217;\u00e9tude toulousaines (J\u00e9Tou2017)<\/i>. Toulouse, France.<\/p>\n<p><strong>Acknowledgements<\/strong><\/p>\n<div>\n<p>Many thanks to Franck Sajous (UMR 5263 CLLE, CNRS &amp; Universit\u00e9 de Toulouse &#8211; Jean Jaur\u00e8s) for sharing his experience in working on the Wiktionary.<\/p>\n<\/div>\n<h5 style=\"text-align: center;\">[<a href=\"https:\/\/parcolab.univ-tlse2.fr\/sr\/onama\/\">O nama<\/a>] \u00a0 \u00a0[<a href=\"https:\/\/parcolab.univ-tlse2.fr\/sr\/onama\/ekipa\/\">Ekipa<\/a>] \u00a0\u00a0[<a href=\"https:\/\/parcolab.univ-tlse2.fr\/sr\/onama\/doc\/\">Dokumentacija<\/a>] \u00a0 [<a href=\"https:\/\/parcolab.univ-tlse2.fr\/sr\/onama\/sadrzaj\/\">Sadr\u017eaj<\/a>] \u00a0 \u00a0[<a href=\"https:\/\/parcolab.univ-tlse2.fr\/sr\/onama\/radovi\/\">Objavljeni radovi<\/a>] \u00a0 [<a href=\"https:\/\/parcolab.univ-tlse2.fr\/sr\/onama\/hvala\/\">Zahvaljujemo<\/a>]<\/h5>\n","protected":false},"excerpt":{"rendered":"<p>This page aims at collecting the linguistic resources developed in the framework of the ParCoLab project. ParCoTrain-Synt &#8211; Syntactic analysis of Serbian ParCoTrain-Synt is a training and evaluation corpus for\u00a0the POS-tagging, fine-grained morphosyntactic annotation, lemmatisation and parsing of Serbian. 
The corpus contains 81 000 tokens annotated manually on all levels. The source texts for the&#8230;  <a href=\"https:\/\/parcolab.univ-tlse2.fr\/sr\/onama\/resursi\/\" class=\"more-link\" title=\"Read Resursi\">Read more &raquo;<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":40,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-284","page","type-page","status-publish","hentry"],"post_mailing_queue_ids":[],"_links":{"self":[{"href":"https:\/\/parcolab.univ-tlse2.fr\/sr\/wp-json\/wp\/v2\/pages\/284","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/parcolab.univ-tlse2.fr\/sr\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/parcolab.univ-tlse2.fr\/sr\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/parcolab.univ-tlse2.fr\/sr\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/parcolab.univ-tlse2.fr\/sr\/wp-json\/wp\/v2\/comments?post=284"}],"version-history":[{"count":8,"href":"https:\/\/parcolab.univ-tlse2.fr\/sr\/wp-json\/wp\/v2\/pages\/284\/revisions"}],"predecessor-version":[{"id":3125,"href":"https:\/\/parcolab.univ-tlse2.fr\/sr\/wp-json\/wp\/v2\/pages\/284\/revisions\/3125"}],"up":[{"embeddable":true,"href":"https:\/\/parcolab.univ-tlse2.fr\/sr\/wp-json\/wp\/v2\/pages\/40"}],"wp:attachment":[{"href":"https:\/\/parcolab.univ-tlse2.fr\/sr\/wp-json\/wp\/v2\/media?parent=284"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}