Announcing the Odeuropa Network

We are delighted to share a new initiative with you today: the Odeuropa Network. From the moment our project was announced in November last year, we’ve been excited and honoured to receive a steady stream of emails (to date, over 150) from people who wanted to explore opportunities for collaboration, share their own work and research, or who simply wished to learn more about our project. Over time, these messages began to reveal fascinating new clusters of olfactory expertise to us – from archaeobotanists and architects, to linguists, perfumers, artists, chemists, and historians. Frustratingly, we simply don’t have the resources to pursue even a fraction of these opportunities in the context of the already carefully planned and budgeted Odeuropa research project. But happily, we are in a position where we can enable our contacts to learn of each other’s interests and expertise, connect with each other, and create their own olfactory partnerships and collaborations. Hence the Odeuropa Network, a searchable, public directory hosted on the Odeuropa website, with basic information on individuals and institutions interested in olfactory heritage and sensory data mining.

To become a member of the network, please fill out this online form, where you can list your basic information, interests, and expertise in the public members’ directory and also sign up for our upcoming newsletter to keep up to date with Odeuropa’s events and activities. The online form will remain open, and the directory will be curated and updated on an ongoing basis by the Odeuropa project team. Our goal is to publish the first version of the directory in just a week or two. We hope you will take advantage of this opportunity!

Update (22/6/21): You can now view current members of the Odeuropa Network on an interactive map.

Understanding the Olfactory Lexicon

Linguists have observed in several studies that languages seem to have a smaller vocabulary for describing smells than for the other senses. Odours are often described by borrowing terms from other senses, for example “sweet” or “fresh”, or by relying on qualities of objects, like “musky” or “metallic”. On the other hand, specialised domains such as perfumery and oenology rely on extremely precise and structured repositories of terms and qualities, which professionals use to describe perfumes and wines from an olfactory perspective. One of the goals of the Odeuropa text processing team is to understand these phenomena and analyse whether languages differ in the way odours are described. Is the smell-related part of the vocabulary more prominent in some languages than in others? For example, does Slovene, which is a Balto-Slavic language, have different characteristics in terms of olfactory vocabulary compared to Romance or Germanic languages like Italian and English? If so, are there historical or cultural reasons for this?

We aim to address these questions using text mining techniques, processing large amounts of digitised texts covering four centuries and automatically extracting the terminology pertaining to smell. To this end, we are collecting freely available texts published between 1650 and 1925 and covering different domains, in the seven project languages (English, German, French, Latin, Dutch, Slovene, and Italian). These texts range from travel literature to scientific texts and medical records. This process takes a long time because, after preparing a detailed list of available sources, the data need to be downloaded, cleaned, standardised and accompanied by the correct metadata. While the Odeuropa multilingual corpus is being completed, we are testing different approaches to terminology extraction. Our testbed is the Google Ngram repository, a large collection of n-grams (i.e. word sequences) extracted from Google Books and divided by year of publication.

The n-grams cover the period of interest for Odeuropa, allowing us to perform preliminary analyses aimed at comparing terminology in multiple languages over time. In this analysis, we start from a small list of smell-related words provided by Odeuropa domain experts, such as “odour”, “smelly” and “reek”. We then extract, for different time periods, the terms that have the highest association strength with the smell words, meaning that they appear together with them more frequently than would otherwise be expected. Terms co-occurring with the smell words provide a concise overview of the semantic domains associated with odours over time, and make comparisons across languages possible. For example, we can analyse terms related to “odor” (English), “odore” (Italian) and “reuk” (Dutch) for the n-grams between 1900 and 1925. These are displayed in the picture below, where the bubble size is proportional to the association strength. Some concepts mentioned in relation to smell seem to be present in all three languages, for example flowers, tobacco and sanctity. On the other hand, medical terms are more prominent in English, while food and beverages are mentioned in Italian (see also “sapore” / “taste”), and fishing seems to play a role in the word associations for Dutch. For now, our results are too preliminary to draw conclusions on olfactory terminology, but we are really looking forward to understanding what texts from the past tell us about odours and their history.

Google ngram visualisation
Terms extracted from Google N-grams that are most strongly associated with “odor”, “odore” and “reuk”.
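
For readers curious about how an association strength of this kind can be computed, here is a minimal sketch in Python. It is an illustration only: the text above does not say which association measure the team uses, so the sketch assumes pointwise mutual information (PMI), a common choice. The function name `association_scores`, the `min_count` parameter and the toy counts are all made up for the example and do not come from the Google n-gram data.

```python
import math
from collections import Counter


def association_scores(ngrams, seed, min_count=1):
    """Rank terms by PMI with `seed` (PMI is an assumed measure, see above).

    ngrams: iterable of (tokens, count) pairs, where tokens is a tuple of
    words and count is how often that n-gram occurs in the chosen period.
    """
    word_counts = Counter()  # weighted number of n-grams containing each word
    pair_counts = Counter()  # weighted number of n-grams containing word and seed
    total = 0
    for tokens, count in ngrams:
        unique = set(tokens)
        total += count
        for word in unique:
            word_counts[word] += count
        if seed in unique:
            for word in unique - {seed}:
                pair_counts[word] += count

    scores = {}
    for word, joint in pair_counts.items():
        if joint < min_count:
            continue  # skip very rare pairs, whose PMI is unreliable
        # PMI = log2( P(word, seed) / (P(word) * P(seed)) )
        scores[word] = math.log2(
            joint * total / (word_counts[word] * word_counts[seed])
        )
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Invented toy counts, purely for illustration.
toy_ngrams = [
    (("odor", "of", "tobacco"), 40),
    (("odor", "of", "flowers"), 30),
    (("smoke", "of", "tobacco"), 10),
    (("price", "of", "flowers"), 60),
    (("price", "of", "bread"), 60),
]

ranking = association_scores(toy_ngrams, "odor")
for term, score in ranking:
    print(f"{term}: {score:.2f}")
```

In this toy data, “tobacco” ranks first because it mostly appears next to “odor”, while a function word like “of” scores zero: it occurs everywhere, so seeing it next to “odor” is no more frequent than expected. Running the same ranking separately for each language and time period would give lists that can be compared side by side, in the spirit of the bubbles in the picture.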