Location: Faculty of Philology, University of Belgrade

Registration

Workshops:

  • Corpus Query Language (CQL): Lexical gaps in bilingual corpora
  • Named-entity recognition (NER) and linking with Wikidata
  • Corpus analysis: textometry, TXM and other tools
  • Creating Lexical Networks using Large Language Models (LLM) with a Focus on Synonym extraction

Corpus Query Language (CQL): Lexical gaps in bilingual corpora


The workshop is designed for educators, translators, and researchers interested in utilizing language corpora for foreign language teaching and translation, regardless of their prior experience with corpus linguistics. Participants will explore the methodological foundations of corpus linguistics and how these can be applied across various research areas.

We will explore cross-linguistic lexical inconsistencies, such as concepts that exist in one language but not in another, shaped by cultural differences and linguistic anisomorphism (including polysemy and lexical gaps). Through hands-on exercises, participants will learn to effectively search parallel corpora, progressing from basic to complex queries, including Named Entity Recognition (toponyms, anthroponyms, etc.).

The second half of the workshop will be dedicated to practical application, where participants will independently extract translation equivalents and named entities from the bilingual corpora of literary texts It-Sr-NER and SerbItaCor3_sr.
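
As a rough illustration of the progression from basic to more complex queries, the short Python sketch below assembles a few CQL expressions as strings. The attribute names `lemma` and `tag`, the structure name `s`, and the helper functions `token` and `sequence` are assumptions made for this example; the attributes actually available depend on how each corpus is annotated.

```python
def token(**attrs):
    """Build one CQL token expression such as [lemma="kuća" & tag="N.*"]."""
    if not attrs:
        return "[]"  # an empty pair of brackets matches any single token
    conditions = " & ".join(f'{name}="{value}"' for name, value in attrs.items())
    return f"[{conditions}]"


def sequence(*tokens, within=None):
    """Join token expressions into one query, optionally limited to a structure."""
    query = " ".join(tokens)
    if within:
        query += f" within <{within}/>"
    return query


# Basic: every form of one lemma.
basic = token(lemma="kuća")
# Intermediate: a lemma restricted by a regular expression on the tag.
by_pos = token(lemma="dati", tag="V.*")
# More complex: adjective, any one token, then a noun, all inside one sentence.
pattern = sequence(token(tag="A.*"), token(), token(tag="N.*"), within="s")

for query in (basic, by_pos, pattern):
    print(query)
```

The printed strings can be pasted into the query box of a CQL-capable corpus interface, after adjusting the attribute names to the corpus at hand.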

For active participation in the workshop, it is necessary to have a computer.

Organisers: Assistants:
  • Miloš Utvić (University of Belgrade, Faculty of Philology)
  • Milica Ikonić Nešić (University of Belgrade, Faculty of Philology)
Date:
  • Date: 23.11.2024., Time: 10:00 - 13:00
Venue:
  • Venue: Faculty of Philology, University of Belgrade, Studentski trg 3, Meeting Hall. Attendance is also possible online.

Named-entity recognition (NER) and Linking with Wikidata


The workshop will give participants an insight into the concepts and techniques of automatic Named Entity Recognition (NER). Participants will learn how to compare models that identify people, places, and organizations in literary works and link them to the corresponding entities in Wikidata.

The practical part of the workshop will cover tools and models, including those based on vector representations of words, created as part of the TESLA (Text Embeddings - Serbian Language Applications) project funded by the Science Fund of the Republic of Serbia. Participants will use the tools and services available at https://ners.jerteh.rs/, the jerteh-355-tesla model, the INCEpTION tool, and Wikidata.
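
To give a sense of what linking a recognised mention to Wikidata involves, here is a minimal Python sketch built around the public `wbsearchentities` action of the Wikidata API. It is not the workflow of the tools named above: the helper names `search_url` and `candidates` are hypothetical, the sample response is hand-written for illustration, and no network request is made.

```python
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def search_url(mention, language="sr", limit=5):
    """Build a wbsearchentities request URL for one recognised mention."""
    params = {
        "action": "wbsearchentities",
        "search": mention,
        "language": language,
        "limit": limit,
        "format": "json",
    }
    return f"{WIKIDATA_API}?{urlencode(params)}"


def candidates(response):
    """Extract (QID, label, description) triples from a parsed JSON response."""
    return [
        (hit["id"], hit.get("label", ""), hit.get("description", ""))
        for hit in response.get("search", [])
    ]


# Hand-written sample in the general shape of an API response (illustrative only).
sample = {
    "search": [
        {"id": "Q3711", "label": "Belgrade", "description": "capital of Serbia"}
    ]
}

print(search_url("Beograd"))
print(candidates(sample))
```

In practice the candidate list still has to be disambiguated, for example by comparing the entity descriptions with the context in which the mention appears.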

For active participation in the workshop, it is necessary to have a computer.

Organisers:
  • TESLA project team and Language Resources and Technologies Society JeRTeh:
    msr Milica Ikonić Nešić, dr Mihailo Škorić, Saša Palinkar
Date:
  • Date: 22.11.2024., Time: 16:00 - 19:00
Venue:
  • Venue: Faculty of Philology, University of Belgrade, Studentski trg 3, Meeting Hall. Attendance is also possible online.

Corpus analysis: textometry, TXM and other tools


The workshop is intended for everyone interested in modern techniques and methods of natural language processing. Participants will first become acquainted with the concept and methods of textometric analysis built into the TXM tool, and then with the models and resources developed for the Serbian language by the Language Resources and Technologies Society JeRTeh.

The goal of the workshop is to show participants how to apply textometric analysis to ready-made JeRTeh corpora and then how to create their own corpora. The second part of the workshop will be devoted to building participants' own corpora and analysing them textometrically with the TXM tool. Texts from SrpELTeC, the corpus of Serbian novels (1840–1920), will be prepared for the exercises.
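
The contrastive counting that underlies textometric analysis can be illustrated with a few lines of Python. This is only a sketch of the idea, not of TXM itself, whose measures are more refined; the tokenizer is deliberately naive, the function names are hypothetical, and the two toy "novels" are invented sentences rather than SrpELTeC texts.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase word forms; a deliberately naive tokenizer for illustration."""
    return re.findall(r"\w+", text.lower())


def relative_frequencies(parts):
    """Map each part name to {word form: occurrences per 1000 tokens}."""
    table = {}
    for name, text in parts.items():
        tokens = tokenize(text)
        if not tokens:
            table[name] = {}  # an empty part has no frequencies
            continue
        counts = Counter(tokens)
        table[name] = {w: 1000 * c / len(tokens) for w, c in counts.items()}
    return table


# Toy two-part corpus (invented sentences).
parts = {
    "novel_a": "the river runs and the river sings",
    "novel_b": "the town sleeps and the town waits",
}
freqs = relative_frequencies(parts)

# Compare how often one word form occurs in each part.
print(freqs["novel_a"].get("river", 0.0), freqs["novel_b"].get("river", 0.0))
```

Normalising by part size is what makes parts of different lengths comparable; a word that is frequent in one part and absent in another is a candidate for closer inspection.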

For active participation in the workshop, it is necessary to have a computer.

Organisers:
  • TESLA project team and Language Resources and Technologies Society JeRTeh:
    prof. dr Ranka Stanković, prof. dr Cvetana Krstev and prof. dr Duško Vitas
Date:
  • Date: 21.11.2024., Time: 16:00 - 19:00
Venue:
  • Venue: Faculty of Philology, University of Belgrade, Meeting Hall (Sala za sednice), first floor

Creating Lexical Networks using Large Language Models (LLM) with a Focus on Synonym extraction


Overview:
This workshop aims to familiarize participants with techniques for using large language models (GPT-4) to automatically extract synonyms and antonyms and to build lexical networks. During the workshop, participants will learn how to formulate prompts for language models, define lexical relationships, and use the results to visualize and analyse semantic structures.

Main workshop topics:

  • Introduction to the concepts of lexical networks and relations (synonymy, antonymy, hierarchical relations)
  • Using LLMs to extract lexical relations
  • Practical demonstrations of prompt-engineering to obtain precise lexical data
  • Analysis of results and visualization of lexical networks using a graph model
  • Practical applications in NLP and lexicography
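
The graph model mentioned in the topics above can be sketched in a few lines of Python. The example assumes that the model's answers have already been parsed into (word, relation, word) triples; the triples shown are hypothetical, as are the function names, and no language model is called.

```python
from collections import defaultdict


def build_network(relations):
    """Build an undirected, labelled graph: word -> {neighbour: relation}."""
    graph = defaultdict(dict)
    for source, relation, target in relations:
        graph[source][target] = relation
        graph[target][source] = relation  # both relations are treated as symmetric
    return graph


def synonym_cluster(graph, start):
    """Collect every word reachable from `start` through synonym edges only."""
    seen, stack = {start}, [start]
    while stack:
        word = stack.pop()
        for neighbour, relation in graph[word].items():
            if relation == "synonym" and neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen


# Hypothetical triples of the kind one might parse from a model's answer.
relations = [
    ("big", "synonym", "large"),
    ("large", "synonym", "huge"),
    ("big", "antonym", "small"),
]
graph = build_network(relations)
print(sorted(synonym_cluster(graph, "big")))
```

Following synonym edges transitively is a simplification, since near-synonyms do not always chain; evaluating such clusters against lexicographic judgement is part of what the workshop addresses.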

Objectives:

  • Increasing accuracy and speed of lexical relations extraction through the use of GPT
  • Developing prompt-engineering skills for the needs of lexicographic research
  • Creating and evaluating semantic graphs based on the obtained data

Workshop languages:
The workshop will be held in Croatian/Serbian to suit the target audience. Given the nature of the work, English may be used for specific technical terms, but the main communication will be in Croatian.

Duration:
The workshop will last between 3 and 4 hours, including breaks for discussion and practical exercises.

Required prior knowledge:
The workshop is intended for researchers and practitioners working in linguistic analysis, lexicography, and natural language processing (NLP); a deep technical background is not required.

Date:
  • 23.11.2024, Time: 14:30 - 17:30
Venue:
  • Venue: Faculty of Philology, University of Belgrade, Meeting Hall (Sala za sednice), first floor