Workshops:
The workshop is intended for educators, translators, and researchers interested in using language corpora in foreign-language teaching and translation, regardless of their prior experience with corpus linguistics. Participants will explore the methodological foundations of corpus linguistics and how they can be applied across various research areas.
We will examine cross-linguistic lexical mismatches, such as concepts that are lexicalized in one language but not in another, which arise from cultural differences and linguistic anisomorphism (including polysemy and lexical gaps). Through hands-on exercises, participants will learn to search parallel corpora effectively, progressing from basic to complex queries, including searches for named entities (toponyms, anthroponyms, etc.).
The second half of the workshop is devoted to practical work: participants will independently extract translation equivalents and named entities from It-Sr-NER and SerbItaCor3_sr, two bilingual corpora of literary texts.
Participants will need a computer to take an active part in the workshop.
The workshop will give participants insight into the concepts and techniques of automatic Named Entity Recognition (NER). Participants will learn how to compare models that identify people, places, and organizations in literary works, and how to link the identified entities to the corresponding Wikipedia entries.
The practical part of the workshop will use tools and models, among them models based on vector representations of words, developed within the project TESLA (Text Embeddings - Serbian Language Applications), funded by the Science Fund of the Republic of Serbia. We will work with the tools and services available at https://ners.jerteh.rs/, the jerteh-355-tesla model, the INCEpTION annotation tool, and Wikidata.
Participants will need a computer to take an active part in the workshop.
The workshop is intended for anyone interested in modern techniques and methods of natural language processing. Participants will first be introduced to the concepts and methods of textometric analysis implemented in the TXM tool, and then to the models and resources developed for the Serbian language by the Society for Language Resources and Technologies ЈеРТех.
The goal of the workshop is to show participants how to apply textometric analysis to ready-made ЈеРТех corpora. The second part of the workshop is devoted to building participants' own corpora and analyzing them textometrically with TXM. Texts from SrpELTeC, the corpus of Serbian novels from the period 1840–1920, will be prepared for the exercises.
Participants will need a computer to take an active part in the workshop.
Overview:
This workshop aims to familiarize participants with techniques for using large language models (GPT-4) to automatically extract synonyms and antonyms and to build lexical networks. During the workshop, participants will learn how to formulate effective prompts for language models, define lexical relations, and use the results to visualize and analyze semantic structures.
Main workshop topics:
Objectives:
Duration:
The workshop will last 3 to 4 hours, including breaks, discussion, and practical exercises.
Required prior knowledge:
The workshop is intended for researchers and practitioners working in linguistic analysis, lexicography, or natural language processing (NLP); no in-depth technical background is required.