Bartel, Holger; Kraft, Mirko; Leidner, Jochen L. (2023)
Zeitschrift für die gesamte Versicherungswissenschaft 112 (1), 1-30.
Mathur, Puneet; Goyal, Mihir; Sawhney, Ramit; Mathur, Ritik; Leidner, Jochen L.; Dernoncourt, Franck; Manocha, Dinesh (2022)
Findings of the Association for Computational Linguistics: EMNLP 2022 (Empirical Methods in Natural Language Processing), December 2022, Abu Dhabi, United Arab Emirates, 1933-1940.
Financial prediction is complex due to the stochastic nature of the stock market. Semi-structured financial documents present comprehensive financial data in tabular formats, such as earnings, profit-loss statements, and balance sheets, and often contain rich technical analysis along with a textual discussion of corporate history, management analysis, compliance, and risks. Existing research focuses on the textual and audio modalities of financial disclosures from company conference calls to forecast stock volatility and price movement, but ignores the rich tabular data available in financial reports. Moreover, the economic realm is still plagued by a severe under-representation of various communities spanning diverse demographics, gender, and native speakers. In this work, we show that combining tabular data from financial semi-structured documents with text transcripts and audio recordings not only improves stock volatility and price movement prediction by 5-12% but also reduces the gender bias caused by audio-based neural networks by over 30%.
Leidner, Jochen L. (2022)
Invited Talk, Joint Webinar of the CFA organization (French Chapter) and the London Stock Exchange Group (LSEG), London/Paris/online, November 18, 2022.
McDonald, Thomas; Dong, Ziqing; Zhang, Yingji; Hampson, Rebekah; Young, James; Cao, Qianyu; Leidner, Jochen L.; Stevenson, Mark (2022)
Cross Language Evaluation Forum (CLEF) Working Notes 2020: Working Notes of CLEF 2020 - Conference and Labs of the Evaluation Forum, Thessaloniki, Greece, September 22-25, 2020. 2696, 162.
Leidner, Jochen L. (2022)
Proceedings of the 27th International Conference on Applications of Natural Language to Information Systems (NLDB 2022), Valencia, Spain, June 15-17, 2022, 517-523.
While some methodologies have already been developed for data mining projects, for example in the context of e-commerce (e.g. CRISP-DM, SEMMA, KDD), these do not account for (1) early evaluation in order to de-risk a project, (2) dealing with text corpora (“unstructured” data) and associated natural language processing processes, and (3) non-technical considerations (e.g. legal, ethical, project management aspects). To address these three shortcomings, a new methodology, called “Data to Value”, is introduced. It is guided by a detailed catalog of questions in order to avoid a disconnect between large-scale NLP project teams and the topic when they face the rather abstract box-and-arrow diagrams commonly associated with methodologies.
Holtorf, Christian; Leidner, Jochen L. (2022)
Held as part of the “Coburg liest” literature days, 2022.
Nugent, Tim; Stelea, Nicole; Leidner, Jochen L. (2021)
Proceedings of the 14th International Conference on Flexible Query Answering Systems (FQAS 2021), Bratislava, Slovakia, September 19–24, 2021, 157-169.
DOI: 10.1007/978-3-030-86967-0_12
Despite recent advances in deep learning-based language modelling, many natural language processing (NLP) tasks in the financial domain remain challenging due to the paucity of appropriately labelled data. Other issues that can limit task performance are differences in word distribution between the general corpora – typically used to pre-train language models – and financial corpora, which often exhibit specialized language and symbology. Here, we investigate two approaches that can help to mitigate these issues. Firstly, we experiment with further language model pre-training using large amounts of in-domain data from business and financial news. We then apply augmentation approaches to increase the size of our data-set for model fine-tuning. We report our findings on an Environmental, Social and Governance (ESG) controversies data-set and demonstrate that both approaches are beneficial to accuracy in classification tasks.
Leidner, Jochen L. (2021)
Handbook of Big Geospatial Data, 429–457.
DOI: 10.1007/978-3-030-55462-0_16
Leidner, Jochen L.; Martins, Bruno; McDonough, Katherine; Purves, Ross S. (2020)
Proceedings of the 42nd European Conference on Information Retrieval Research (ECIR 2020), Lisbon, Portugal, April 14–17, 2020, Part II, 669-673.
DOI: 10.1007/978-3-030-45442-5_89
In this half-day tutorial, we will review the basic concepts, methods, and applications of geographic information retrieval, and show some possible applications in fields such as the digital humanities. The tutorial is organized in four parts. First, we introduce some basic ideas about geography and demonstrate why text is a powerful way of exploring relevant questions. We then introduce a basic end-to-end pipeline, discussing geographic information in documents, spatial and multi-dimensional indexing [19], and spatial retrieval and spatial filtering. After showing a range of possible applications, we conclude with suggestions for future work in the area.
Fakultät Wirtschaftswissenschaften (FW)
T +49 9561 317 422 Jochen.Leidner[at]hs-coburg.de