LectureChat: Hybrid RAG over Wikipedia and Multilingual Lectures

Dimitsas, Markos; Leidner, Jochen L. (2026)

The 48th European Conference on Information Retrieval (ECIR 2026), Delft, The Netherlands, March 28-April 3, 2026.


Peer Reviewed
 

LectureChat extends the WikiChat conversational AI system by integrating multilingual university lecture transcripts alongside Wikipedia content. The demo showcases a dual retrieval architecture that combines structured encyclopedic knowledge with academic lecture material, leveraging multiple segmentation strategies and cross index reconciliation to improve retrieval quality. The system maintains separate citation spaces for Wikipedia (numeric) and lectures (alphabetic) and preserves temporal provenance for direct video navigation. We present the overall architecture, interaction flow, implementation details, and a reproducibility plan.
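The separate citation spaces described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Source` class, the function names, the bracketed label format and the `#t=` timestamp fragment are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple
import string


@dataclass
class Source:
    """Hypothetical retrieved passage; field names are illustrative only."""
    kind: str                              # "wikipedia" or "lecture"
    title: str
    start_seconds: Optional[float] = None  # lecture timestamp, if any


def alpha_label(n: int) -> str:
    """Map 0 -> 'a', 25 -> 'z', 26 -> 'aa' (bijective base 26)."""
    label = ""
    n += 1
    while n > 0:
        n, rem = divmod(n - 1, 26)
        label = string.ascii_lowercase[rem] + label
    return label


def assign_citations(sources: List[Source]) -> List[Tuple[str, Source]]:
    """Give Wikipedia passages numeric labels and lecture passages
    alphabetic labels, each from its own independent counter."""
    wiki_count = 0
    lecture_count = 0
    labelled = []
    for source in sources:
        if source.kind == "wikipedia":
            wiki_count += 1
            labelled.append((f"[{wiki_count}]", source))
        else:
            labelled.append((f"[{alpha_label(lecture_count)}]", source))
            lecture_count += 1
    return labelled


def video_link(video_url: str, start_seconds: float) -> str:
    """Build a link that jumps to the cited moment, assuming the player
    honours a '#t=<seconds>' fragment (an assumption, not from the paper)."""
    return f"{video_url}#t={int(start_seconds)}"
```

Because the two counters are independent, a reader can tell from the label alone whether a claim is backed by an encyclopedia article or by a lecture segment.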


Learner Models: Design, Components, Structure, and Modelling - A Systematic Literature Review

Böck, Felix; Ochs, Michaela; Henrich, Andreas; Landes, Dieter; Leidner, Jochen L....

User Modeling and User-Adapted Interaction 35, 15.


Open Access Peer Reviewed
 

Learning is at the heart of all progress the human species makes. It is most effective when it considers who we are as individuals, what learning approach we prefer and what we already know to begin with. In the digital age, we strive to capture such information in the form of a digital representation, the so-called learner model, to tailor learning-related systems to this information and build upon it to create more personalised learning experiences. Over recent years, the proliferation of diverse models across various educational applications and disciplines has made it challenging to access targeted research.

In this survey, we aim to address this gap, reviewing the latest advances in learner modelling and conducting a comprehensive analysis of the existing approaches, focusing on developments from 2014 to 2023. With the help of a systematic literature review, we want to provide designers and developers of learner models with a structured overview and simplified entrance into the topic and the field of learner models. We investigate the question: What do learner models look like and how are they filled, kept up-to-date, and used?

To this end, we analyse and classify existing approaches. Our findings provide a comprehensive and structured overview of the field of learner modelling, allowing researchers to navigate and understand the diverse approaches more easily and providing developers of learner models or adaptive systems with a practical tool to access relevant information according to their needs.



Second AI4AI Learning 2024 Workshop, Würzburg

Schmid, Ute; Leidner, Jochen L.; Wolter, Diedrich; Kohlhase, Michael (2025)

Proceedings of the Second Workshop on Artificial Intelligence for Artificial Intelligence Education 45.
DOI: 10.20378/irb-107661


Open Access Peer Reviewed

From Toponym Resolution to Advanced Models of Spatial Grounding: Past, Present and (One Possible) Future

Leidner, Jochen L. (2025)

Third International Workshop on Geographic Information Extraction from Texts (GeoExt) to be held at the 47th European Conference on Information Retrieval (ECIR 2025) in Lucca, Italy, April 10th, 2025.


 

The textual realm and the geographic/spatial realm intersect when we use human language to talk about geographic space. Various terms have been used to talk about this intersection (“geoparsing”, “georeferencing”, “toponym resolution”, “spatial grounding” etc.) and related applications such as geographic information retrieval. In this keynote, I will review some things that the community has accomplished since 2003, what occupies people’s minds at the moment, and I will raise a few research questions that would be interesting to answer, or that would unlock the potential for new kinds of applications. I conclude with some personal conjectures about what one version of the future might look like.


Requirements for Machine Learning Process Software Tooling

Leidner, Jochen L.; Reiche, Michael (2024)

Development Methodologies for Big Data Analytics Systems.


Peer Reviewed
 

A number of machine learning process models (SEMMA, KDD, CRISP-DM, CRISP-ML, Data-to-Value etc.) have recently been proposed to facilitate the development of machine learning models in their organizational context. While the existing proposals vary with respect to complexity and suitability for particular tasks, it would be desirable to have software tools that embody or support these process models, and make it easier for project teams to capture, share among team members and stakeholders, and preserve the relevant project information pertaining to the various process stages. In particular, recorded past statistics may be applied to predict the duration of stages or the overall project effort.

Presently, to the best of our knowledge, no requirements analysis exists that stipulates the detailed needs. To this end, we present a first collection and analysis of a requirements document for the software tooling for machine learning process models. We describe the functional and non-functional requirements of a Computer-Aided Machine Learning Modeling (CAMLM) tool, the soft-computing world’s counterpart to a CASE (Computer Aided Software Engineering) tool.

Various software products cover sub-areas such as team management and communication management (Confluence, Jira, Slack, Zoom...) or project management (CRISP-DM, Scrum, Kanban-Board...) or data and information management (model management [Weber, Christian; Hirmer, Pascal; Reimann, Peter; Schwarz, Holger (2019): A New Process Model for the Comprehensive Management of Machine Learning Models. In: Proceedings of the 21st International Conference on Enterprise Information Systems: SCITEPRESS - Science and Technology Publications.]). What is not available to our knowledge, however, is software that covers all of these sub-areas and the entire life cycle of machine learning projects in detail.


Second Workshop on AI for AI Education (AI4AILearning)

Kohlhase, Michael; Leidner, Jochen L.; Schmid, Ute; Wolter, Diedrich (2024)

held at KI 2024: 47. Deutsche Jahrestagung für Künstliche Intelligenz, Würzburg, 23.09. - 27.09.2024.


Peer Reviewed

Artificial Intelligence. ECAI 2023 International Workshops - XAI³, TACTIFUL, XI-ML, SEDAMI, RAAIT, AI4S, HYDRA, AI4AI, Kraków, Poland, September 30 - October 4, 2023, Proceedings, Part I

Nowaczyk, Slawomir; Biecek, Przemyslaw; Chung, Neo Christopher; Vallati, Mauro...

Communications in Computer and Information Science (CCIS) 1947.


Peer Reviewed

Language-Model Assisted Learning How to Program?

Leidner, Jochen L.; Reiche, Michael (2023)

Workshop on AI for AI Learning Held at ECAI 2023, Kraków, Poland, September 30, 2023.


Peer Reviewed

Bridging the Programming Skill Gap with ChatGPT: A Machine Learning Project with Business Students

Reiche, Michael; Leidner, Jochen L. (2023)

Workshop on AI for AI Learning Held at ECAI 2023, Kraków, Poland, September 30, 2023.


Peer Reviewed

Topic Segmentation of Educational Video Lectures Using Audio and Text

Dimitsas, Markos; Leidner, Jochen L. (2023)

Workshop on AI for AI Learning Held at ECAI 2023, Kraków, Poland, September 30, 2023.


Peer Reviewed

Artificial Intelligence. ECAI 2023 International Workshops - XAI³, TACTIFUL, XI-ML, SEDAMI, RAAIT, AI4S, HYDRA, AI4AI, Kraków, Poland, September 30 - October 4, 2023, Proceedings, Part II

Nowaczyk, Slawomir; Biecek, Przemyslaw; Chung, Neo Christopher; Vallati, Mauro...

Communications in Computer and Information Science (CCIS) 1948.
DOI: 10.1007/978-3-031-56066-8_22


Peer Reviewed

Extractive Summarization of Financial Earnings Call Transcripts

Nugent, Timothy; Leidner, Jochen L.; Gkotsis, George (2023)

Advances in Information Retrieval: Proceedings of the 45th European Conference on Information Retrieval (ECIR 2023), Dublin, Ireland, April 2-6, 2023 2, 3-15.
DOI: 10.1007/978-3-031-28238-6_1


Peer Reviewed
 

To date, automatic summarization methods have been mostly developed for (and applied to) general news articles, whereas other document types have been neglected. In this paper, we introduce the task of summarizing financial earnings call transcripts, and we present a method for summarizing this text type essential for the financial industry. Earnings calls are briefing events common for public companies in many countries, typically in the form of conference calls held between company executives and analysts that consist of a spoken monologue part followed by moderated questions and answers.

We show that traditional methods work less well in this domain, and we present a method suitable for summarizing earnings calls. Our large-scale evaluation on a new human-annotated corpus of summary-worthy sentences shows that this method outperforms a set of strong baselines, including a new one that we propose specifically for earnings calls. To the best of our knowledge, this is the first application of summarization to financial earnings call transcripts, a primary source of information for financial professionals.


Which Country Is This? Automatic Country Ranking of Street View Photos

Menzner, T.; Mittag, Florian; Leidner, Jochen L. (2023)

Advances in Information Retrieval: Proceedings of the 45th European Conference on Information Retrieval (ECIR 2023), Dublin, Ireland, April 2-6, 2023 3, 275–280.
DOI: 10.1007/978-3-031-28241-6_26


Peer Reviewed
 

In this demonstration, we present Country Guesser, a live system that guesses the country in which a photo was taken. In particular, given a Google Street View image, our federated ranking model uses a combination of computer vision, machine learning and text retrieval methods to compute a ranking of likely countries for the location shown in the image. Interestingly, using text-based features to probe large pre-trained language models can help provide cross-modal supervision. We are not aware of previous country guessing systems informed by visual and textual features.


DocFin: Multimodal Financial Prediction and Bias Mitigation using Semi-structured Documents

Mathur, Puneet; Goyal, Mihir; Sawhney, Ramit; Mathur, Ritik; Leidner, Jochen L....

Findings of the Association for Computational Linguistics: EMNLP 2022 (Empirical Methods in Natural Language Processing), December 2022, Abu Dhabi, United Arab Emirates, 1933-1940.


Open Access Peer Reviewed
 

Financial prediction is complex due to the stochastic nature of the stock market. Semi-structured financial documents present comprehensive financial data in tabular formats, such as earnings, profit-loss statements, and balance sheets, and can often contain rich technical analysis along with a textual discussion of corporate history, management analysis, compliance, and risks. Existing research focuses on the textual and audio modalities of financial disclosures from company conference calls to forecast stock volatility and price movement, but ignores the rich tabular data available in financial reports. Moreover, the economic realm is still plagued with a severe under-representation of various communities spanning diverse demographics, gender, and native speakers. In this work, we show that combining tabular data from financial semi-structured documents with text transcripts and audio recordings not only improves stock volatility and price movement prediction by 5-12% but also reduces gender bias caused by audio-based neural networks by over 30%.


The University of Sheffield at CheckThat! 2020: Claim Identification and Verification on Twitter

McDonald, Thomas; Dong, Ziqing; Zhang, Yingji; Hampson, Rebekah; Young, James...

Cross Language Evaluation Forum (CLEF) Working Notes 2020: Working Notes of CLEF 2020 - Conference and Labs of the Evaluation Forum, Thessaloniki, Greece, September 22-25, 2020. 2696, 162.


Open Access Peer Reviewed

Data to Value: An 'Evaluation-First' Methodology for Natural Language Projects

Leidner, Jochen L. (2022)

Proceedings of the 27th International Conference on Applications of Natural Language to Information Systems (NLDB 2022), Valencia, Spain, June 15-17, 2022, 517-523.


Peer Reviewed
 

While some methodologies have already been developed for data mining projects (for example in the context of e-commerce), e.g. CRISP-DM, SEMMA and KDD, these do not account for (1) early evaluation in order to de-risk a project, (2) dealing with text corpora (“unstructured” data) and associated natural language processing processes, and (3) non-technical considerations (e.g. legal, ethical, project management aspects). To address these three shortcomings, a new methodology, called “Data to Value”, is introduced, which is guided by a detailed catalog of questions in order to avoid a disconnect of large-scale NLP project teams with the topic when facing rather abstract box-and-arrow diagrams commonly associated with methodologies.


Detecting Environmental, Social and Governance (ESG) Topics Using Domain-Specific Language Models and Data Augmentation

Nugent, Tim; Stelea, Nicole; Leidner, Jochen L. (2021)

Proceedings of the 14th International Conference on Flexible Query Answering Systems (FQAS 2021), Bratislava, Slovakia, September 19–24, 2021, 157-169.
DOI: 10.1007/978-3-030-86967-0_12


Peer Reviewed
 

Despite recent advances in deep learning-based language modelling, many natural language processing (NLP) tasks in the financial domain remain challenging due to the paucity of appropriately labelled data. Other issues that can limit task performance are differences in word distribution between the general corpora – typically used to pre-train language models – and financial corpora, which often exhibit specialized language and symbology. Here, we investigate two approaches that can help to mitigate these issues. Firstly, we experiment with further language model pre-training using large amounts of in-domain data from business and financial news. We then apply augmentation approaches to increase the size of our data-set for model fine-tuning. We report our findings on an Environmental, Social and Governance (ESG) controversies data-set and demonstrate that both approaches are beneficial to accuracy in classification tasks.


A Survey of Textual Data & Geospatial Technology

Leidner, Jochen L. (2021)

Handbook of Big Geospatial Data, 429–457.
DOI: 10.1007/978-3-030-55462-0_16



Text Meets Space: Geographic Content Extraction, Resolution and Information Retrieval

Leidner, Jochen L.; Martins, Bruno; McDonough, Katherine; Purves, Ross S. (2020)

Proceedings of the 42nd European Conference on Information Retrieval Research (ECIR 2020), Lisbon, Portugal, April 14–17, 2020 II, 669-673.
DOI: 10.1007/978-3-030-45442-5_89


Peer Reviewed
 

In this half-day tutorial, we will review the basic concepts of, methods for, and applications of geographic information retrieval, also showing some possible applications in fields such as the digital humanities. The tutorial is organized in four parts. First we introduce some basic ideas about geography, and demonstrate why text is a powerful way of exploring relevant questions. We then introduce a basic end-to-end pipeline discussing geographic information in documents, spatial and multi-dimensional indexing [19], and spatial retrieval and spatial filtering. After showing a range of possible applications, we conclude with suggestions for future work in the area.


Prof. Dr. Jochen L. Leidner


Hochschule Coburg

Fakultät Wirtschaftswissenschaften (FW)

T +49 9561 317 422
Jochen.Leidner[at]hs-coburg.de