On creating benchmark datasets

In the Odeuropa project, we are creating several benchmark datasets. These datasets are used to evaluate how well our computer vision and language technology tools perform on a given task. Last week, Stefano Menini presented our text benchmark, covering six languages, at the LChange workshop in Dublin, held in conjunction with the Association for Computational Linguistics' Annual Meeting.

The dataset contains excerpts of texts in English, Italian, French, German, Dutch, and Slovene that have been annotated by humans for references to smell. For each sentence, we mark not only the fact that there is a reference to a smell, but also which emotions the smell evokes, the location where the smell is perceived, any qualitative remarks (did the perceiver like the smell or not?), and so on. The annotation format was previously presented in the paper "FrameNet-like Annotation of Olfactory Information in Texts".
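
To make the structure more concrete, here is a minimal sketch of what one such annotation could look like, written as a Python data class. The field names are illustrative assumptions chosen for readability, not the actual Odeuropa schema, which is defined in the paper cited above.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical annotation record: field names are assumptions for illustration,
# not the schema used in the Odeuropa benchmark.
@dataclass
class OlfactoryAnnotation:
    sentence: str                        # the annotated sentence
    smell_word: str                      # the word that evokes the smell frame
    smell_source: Optional[str] = None   # what emits the smell
    location: Optional[str] = None       # where the smell is perceived
    quality: Optional[str] = None        # qualitative judgement, e.g. "sweet", "foul"
    evoked_emotion: Optional[str] = None # emotion the smell evokes

example = OlfactoryAnnotation(
    sentence="The sweet scent of roses filled the garden.",
    smell_word="scent",
    smell_source="roses",
    location="garden",
    quality="sweet",
)
```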

The benchmark contains over 20,000 annotations and spans four centuries (1620–1920) across 10 different text genres. This allows us to investigate how smells are referenced in different settings over time. Historians worked with computational linguists to provide historical background for the linguistic aspects of smell under investigation. We hope that linguists and historians alike will find it useful.

You can find the paper at: https://aclanthology.org/2022.lchange-1.1/

The dataset is openly available, so other researchers can also evaluate their olfactory information extraction tools against it.
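
As a rough illustration of how such an evaluation could work, the sketch below computes span-level precision, recall, and F1 against gold annotations. The exact-match criterion and the span representation are assumptions made for this example; the benchmark's own evaluation protocol is the one described in the paper.

```python
# Minimal sketch of span-level evaluation against gold annotations.
# The (start, end, label) representation and exact-match criterion are
# assumptions for illustration, not the benchmark's official protocol.
def span_f1(gold_spans, predicted_spans):
    """Exact-match precision, recall, and F1 over (start, end, label) spans."""
    gold = set(gold_spans)
    pred = set(predicted_spans)
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Example: one correct smell-source span, one missed location, one spurious quality.
gold = [(4, 9, "Smell_Source"), (24, 30, "Location")]
pred = [(4, 9, "Smell_Source"), (40, 45, "Quality")]
print(span_f1(gold, pred))  # (0.5, 0.5, 0.5)
```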