Understanding the Olfactory Lexicon

Linguists have observed in several studies that languages tend to have a smaller vocabulary for describing smells than for the other senses. Odours are often described by borrowing terms from other senses, for example “sweet” or “fresh”, or by referring to qualities of objects, as in “musky” or “metallic”. By contrast, specialised domains such as perfumery and oenology rely on extremely precise and structured repositories of terms and qualities that professionals use to describe perfumes and wines from an olfactory perspective. One of the goals of the Odeuropa text processing team is to understand these phenomena and to analyse whether languages differ in the way they describe odours. Is olfactory vocabulary more prominent in some languages than in others? For example, does Slovene, a South Slavic language, differ in its olfactory vocabulary from Romance and Germanic languages such as Italian and English? If so, are there historical or cultural reasons for this?

We aim to address these questions with text mining techniques, processing large amounts of digitised texts covering four centuries and automatically extracting the terminology pertaining to smell. To this end, we are collecting freely available texts published between 1650 and 1925 in the seven project languages (English, German, French, Latin, Dutch, Slovene and Italian), covering domains that range from travel literature to scientific texts and medical records. This is a lengthy process: after a detailed list of available sources has been compiled, the data must be downloaded, cleaned, standardised and accompanied by the correct metadata. While the Odeuropa multilingual corpus is being completed, we are testing different approaches to terminology extraction. Our testbed is the Google Ngram repository, a large collection of n-grams (i.e. word sequences) extracted from Google Books and divided by year of publication.

The n-grams cover the period of interest for Odeuropa, allowing us to carry out preliminary analyses comparing terminology across languages and over time. In this analysis, we start from a small list of smell-related words, such as “odour”, “smelly” and “reek”, provided by Odeuropa domain experts. For different time periods, we then extract the terms with the highest association strength with these smell words, i.e. the terms that appear together with them more frequently than would be expected by chance. These co-occurring terms give a concise overview of the semantic domains associated with odours over time and make comparisons across languages possible. For example, we can analyse the terms related to “odor” (English), “odore” (Italian) and “reuk” (Dutch) in the n-grams from 1900 to 1925. They are displayed in the figure below, where the size of each bubble is proportional to its association strength. Some concepts associated with smell appear in all three languages, for example flowers, tobacco and sanctity. Medical terms, on the other hand, are more prominent in English; food and beverages appear in Italian (see also “sapore”, ‘taste’); and fishing seems to play a role in the Dutch associations. Our results are still too preliminary to draw conclusions about olfactory terminology, but we are really looking forward to learning what texts from the past tell us about odours and their story.

Terms extracted from Google Ngram that are most strongly associated with “odor”, “odore” and “reuk” in n-grams from 1900 to 1925.
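The post does not say which association measure is used. As a minimal sketch, assuming pointwise mutual information (PMI) as the measure and counting a word as co-occurring with a smell word when both appear in the same n-gram, the per-period scoring could look like this. The record format loosely mirrors the per-year match counts in the Google Books Ngram files; the function name, parameters and toy records are hypothetical and invented for illustration, not real counts.

```python
import math
from collections import Counter
from collections.abc import Iterable

# (ngram text, year of publication, match count); toy format, assumed for this sketch
Record = tuple[str, int, int]


def pmi_scores(
    records: Iterable[Record],
    seeds: set[str],          # lowercase smell words, e.g. {"odour"}
    start: int,
    end: int,
    stopwords: frozenset[str] = frozenset(),
    min_joint: int = 1,
) -> dict[str, float]:
    """PMI between each word and the seed smell words, for n-grams
    published between start and end (inclusive)."""
    word_freq: Counter = Counter()   # n-gram occurrences containing the word
    joint_freq: Counter = Counter()  # ...that also contain a seed word
    seed_freq = 0                    # n-gram occurrences containing a seed word
    total = 0                        # all n-gram occurrences in the period

    for text, year, count in records:
        if not start <= year <= end:
            continue
        tokens = set(text.lower().split()) - stopwords
        total += count
        has_seed = bool(tokens & seeds)
        if has_seed:
            seed_freq += count
        for tok in tokens - seeds:
            word_freq[tok] += count
            if has_seed:
                joint_freq[tok] += count

    scores = {}
    for word, joint in joint_freq.items():
        if joint < min_joint:
            continue
        # PMI = log2( P(word, seed) / (P(word) * P(seed)) )
        p_joint = joint / total
        p_word = word_freq[word] / total
        p_seed = seed_freq / total
        scores[word] = math.log2(p_joint / (p_word * p_seed))
    return scores


toy = [
    ("the odour of flowers", 1910, 40),
    ("odour of tobacco smoke", 1912, 10),
    ("the colour of flowers", 1910, 60),
    ("the price of tobacco", 1915, 90),
    ("a faint odour", 1920, 30),
    ("the odour of flowers", 1890, 500),  # outside the period, ignored
]
scores = pmi_scores(toy, {"odour"}, 1900, 1925, stopwords=frozenset({"the", "of", "a"}))
for word, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{word}: {score:.2f}")
```

A positive score means the word appears with the smell words more often than chance would predict. PMI is known to favour rare words (in the toy data, "smoke" and "faint" only ever occur next to "odour"), so on real n-gram data a frequency threshold such as `min_joint`, or a different association measure, would likely be needed.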

Finding references to smell in artworks

Identifying visual references to olfactory phenomena in artworks is an important way to uncover how Europe may have smelled in the past and how smell was represented. The computer vision team of the Odeuropa project is currently working on methods to automatically extract these references from various large collections of European artworks by applying, modifying and extending state-of-the-art object detection methods. Before these olfactory references can be collected and extracted automatically, we first need to identify how smell is visually represented in historical artworks.

To illustrate how this works, we used the print Smell (1581-1656) by Nicolaes de Bruyn, which is held at the Rijksmuseum in Amsterdam.

Smell (1581-1656) by Nicolaes de Bruyn

In the sixteenth century, the pairing of a woman with a dog served as a visual personification of the sense of smell. Because the object detector identified both the dog and the woman, the approach might seem effective. However, this kind of detection raises several challenges. First, not every pairing of a person with a dog is ‘olfactory’: in other centuries, a dog on a woman’s lap or at her feet represents fidelity, as in Jan van Eyck’s Arnolfini Portrait (1434).

Jan van Eyck’s Arnolfini Portrait (1434)

A detection system therefore has to distinguish when a dog is ‘olfactory’ and when it is not. Second, the detector did not recognise the olfactory gesture of the woman smelling the flowers, which further limits the detection of olfactory elements in paintings.

The Bible also contains many olfactory narratives, for example the Sacrifice of Noah (Genesis 8:20). The print Sacrifice of Noah after the Flood by Casper Luyken shows Noah making a burnt offering of animals, with the customary “Covenant of the Rainbow” in the background.

Sacrifice of Noah after the Flood by Casper Luyken

Narratives like this reveal further limitations of existing object detectors: the people and animals were easily detected, but the rainbow and the cloud of smoke were not, so the olfactory element of the artwork was overlooked. This could be because object detection systems are limited by the data they were trained on, which leads to two problems. First, because the detectors are trained on photographs, their effectiveness decreases on images in an artistic style, such as historical paintings and prints. Second, certain objects (such as smoke and rainbows) may be underrepresented in, or entirely absent from, the detector’s training data.

To tackle these issues, we will apply and modify domain adaptation techniques to improve detection performance on artistic images. Once a working object detection system is in place, we plan to incorporate art-historical knowledge so that the system can also recognise complex and context-specific olfactory references.
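The post does not describe how art-historical knowledge will be encoded. As a purely hypothetical illustration of the idea, simple context rules could be applied on top of a detector's output, so that a woman with a dog is only flagged as olfactory when flowers or a smelling gesture are also detected, while smoke counts as a smell source on its own. The `Detection` type, the `olfactory_cues` function, the label names (including a "smelling gesture" class that, as noted above, current detectors did not pick up), the threshold and the scores are all assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    label: str    # class name predicted by an object detector (hypothetical labels)
    score: float  # detector confidence in [0, 1]


def olfactory_cues(detections: list[Detection], threshold: float = 0.5) -> list[str]:
    """List reasons why an artwork might contain an olfactory reference.

    Hypothetical context rules, for illustration only.
    """
    labels = {d.label for d in detections if d.score >= threshold}
    cues = []
    # A woman with a dog personifies smell only in the right context;
    # on its own the pairing may stand for something else (e.g. fidelity).
    if {"woman", "dog"} <= labels and labels & {"flowers", "smelling gesture"}:
        cues.append("personification of smell: woman, dog and flowers or smelling gesture")
    # Smoke is treated as a smell source on its own (e.g. a burnt offering).
    if "smoke" in labels:
        cues.append("smell source: smoke")
    return cues


# Detections loosely modelled on the artworks discussed above (made-up scores).
de_bruyn = [Detection("woman", 0.95), Detection("dog", 0.90), Detection("flowers", 0.70)]
fidelity = [Detection("woman", 0.97), Detection("dog", 0.88)]
print(olfactory_cues(de_bruyn))
print(olfactory_cues(fidelity))
```

Hand-written rules like these only show where contextual knowledge could plug in; they would not by themselves solve the detection gaps for smoke, rainbows or gestures described above.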

Submit your work: First International Workshop on Multisensory Data & Knowledge

Together with the Polifonia team, we are organising a workshop on Multisensory Data & Knowledge, to be held in conjunction with the Language, Data and Knowledge conference in September. The goal of the workshop is to advance our understanding of how smells and music are represented in texts and structured data. The topics we want to address include extracting references to smells, music, context and visual information from text, together with data describing their cultural, historical and political context, and modelling them as interlinked knowledge graphs. Because this research is strongly interdisciplinary, the workshop aims to attract researchers from the social sciences and humanities as well as from computer science. Its potential impact extends to many application areas, including the preservation and valorisation of cultural heritage, data-driven policy making in cultural heritage, urban planning, artistic performances, applications for scholars in musicology and history, applications for museums, innovation in teaching, and the maintenance and exploitation of large catalogues, archives and libraries.

We invite long papers of 10 to 15 pages and short papers of 6 to 8 pages. Note that this workshop follows the computer science conference and publication culture, so initial submissions are expected to be in a near-publishable state; each will be reviewed by three reviewers. Accepted papers will be published through ceur-ws.org.

Submission deadline: 23 April 2021. The workshop will take place on 1 September 2021.

More information on the workshop: https://odeuropa.github.io/mdk21/

Opening the fragrant conversation

When we announced Odeuropa last November, we couldn’t have dreamed that it would be received with so much enthusiasm. As the research teams kick off the programme, we’d like to share an overview of the international response to Odeuropa’s launch from the media, the research community and the general public, which is already developing into a wider conversation about the relevance of smell in our lives and about how the idea of olfactory heritage resonates with a wide variety of audiences.

Not only in Europe but around the globe, the project was widely covered in feature articles and in-depth interviews, for instance by the Spanish scientific news agency, Le Monde, La Stampa, La Repubblica, Deutschland Radio, NPO Radio 1, Delo, El País, CNET and CNN. The Guardian, The New York Times, The Sunday Times and the BBC reported on our research aims and innovative approach.

From a video by the Indonesian YouTube channel The Shiny Peanut (more than 12 million subscribers), which animated members of our research team, to a thoughtful reflection in the Times of Israel on the power of scent to overcome ignorance and to communicate about a subject as complex as the Holocaust, we were delighted to see that so many people across the world share our curiosity to explore the potential of olfactory heritage.

The Shiny Peanut, YouTube.

In the comments, people shared the scents they considered meaningful, recalling nostalgic memories and fragrant experiences, and even reflecting on Covid-related loss of smell. Inspired by Odeuropa, The New York Times invited its readers to imagine a museum of smells and worked with artist Janie Korn to produce a series of one-off candles based on the project.

We are honoured and delighted to engage with these and the many other responses we received. In the coming weeks and months, we will start to explore how we can accommodate new collaborations and keep expanding our network. Stay tuned for updates here on our website and on Twitter (@odeuropa).

Paper: Towards Olfactory Information Extraction from Text – A Case Study on Detecting Smell Experiences in Novels

This weekend, Marieke van Erp presented a paper on extracting olfactory information from English text at the 4th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, organised in conjunction with COLING 2020. The paper was presented as a poster, sadly not in Barcelona but in a gather.town session.

For this paper, we carried out a first set of experiments on how best to recognise references to smell in texts, an important task in Odeuropa’s Work Package 3. We first created an annotated dataset, i.e. a set of texts in which humans (Odeuropa team members) marked whether the text contained a reference to a smell. We then created patterns, such as ‘smells like X’ and ‘a Y fragrance’ (where X and Y can stand for nouns and adjectives), based on a set of smell-related words from the Cambridge dictionary of English. We ran the patterns over a large set of texts to test whether they found more expressions referring to smells than the dictionary smell keywords alone, and our experiments showed that the patterns indeed worked better than the keywords. In Odeuropa, we will build further on this work and also try out other methods (such as machine learning) to recognise references to smells in Latin, English, Italian, German, French, Dutch and Slovene texts from 1600 to 1920 across different genres.
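The exact patterns and word lists from the paper are not reproduced in this post. As a minimal sketch, assuming a small hypothetical set of smell words rather than the Cambridge dictionary list used in the paper, the two pattern shapes mentioned above can be written as regular expressions that capture the X and Y slots. The names `SMELL_VERBS`, `SMELL_NOUNS` and `find_smell_references` are illustrative only.

```python
import re

# Hypothetical sample word lists; the paper used smell-related words
# from the Cambridge dictionary of English.
SMELL_VERBS = ["smell", "smells", "smelled", "smelt"]
SMELL_NOUNS = ["fragrance", "odour", "scent", "stench"]

# 'smells like X': a smell verb, "like", an optional article, then the X slot.
SMELLS_LIKE = re.compile(
    r"\b(?:" + "|".join(SMELL_VERBS) + r")\s+like\s+(?:an?\s+|the\s+)?(\w+)",
    re.IGNORECASE,
)
# 'a Y fragrance': an indefinite article, the Y slot, then a smell noun.
A_Y_NOUN = re.compile(
    r"\ban?\s+(\w+)\s+(?:" + "|".join(SMELL_NOUNS) + r")\b",
    re.IGNORECASE,
)


def find_smell_references(text: str) -> list[tuple[str, str]]:
    """Return (pattern name, slot filler) pairs for every match in the text.

    No part-of-speech check is done here, so the slot accepts any word,
    not only nouns and adjectives.
    """
    hits = [("smells like X", m.group(1)) for m in SMELLS_LIKE.finditer(text)]
    hits += [("a Y fragrance", m.group(1)) for m in A_Y_NOUN.finditer(text)]
    return hits


print(find_smell_references("The room smelt like roses and a sweet fragrance filled the air."))
```

A plain keyword search would only flag that a smell word occurs; the captured slot words (here a smell source and a smell quality) are candidates for expressions that a fixed keyword list does not contain.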

This paper is based on Ryan Brate’s MSc thesis, which he wrote for the University of Amsterdam’s Data Science degree programme under the supervision of prof. dr. Paul Groth and dr. Marieke van Erp. Full citation:

Brate, Ryan, Paul Groth, and Marieke van Erp. “Towards Olfactory Information Extraction from Text: A Case Study on Detecting Smell Experiences in Novels.” In Proceedings of the 4th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, pp. 147-155. 2020.