Beyond tagging

According to Wikipedia, a tag is "a relevant keyword or term associated with or assigned to a piece of information". Tags are indeed very useful for associating concepts with digital objects such as images, audio and video files, widgets, and applets.

There are two advantages to using tags. The first is the ability to categorize digital content that search engines cannot easily index, because there is no simple way to extract concepts from the file itself. Take a movie, for example a documentary on animals. It has dialogue, a soundtrack, and a sequence of clips. With sophisticated software you could generate transcripts of the dialogue, identify the title and author of the soundtrack, and even recognize the subject of some clips, but doing so would be expensive and time-consuming, and you could easily miss some important aspect of the movie that is evident to human beings but hidden from machines. Tagging both the whole movie and the individual clips therefore makes it easier to find what you are looking for with a search engine.

The second advantage is the ability to associate with digital content other pieces of information that are logically related to it but cannot be deduced from it. Consider, for example, the following passage from William Shakespeare's "Julius Caesar" (Act 3, Scene 2):

Friends, Romans, countrymen, lend me your ears; I come to bury Caesar, not to praise him. The evil that men do lives after them; The good is oft interred with their bones: So let it be with Caesar.

It is part of Antony’s famous funeral oration. Nothing in the text itself leads you to Shakespeare or to Brutus. So you might want to associate William Shakespeare and Brutus with this text, as well as funeral oration or tragedy. By adding tags to this passage we make it more likely to be found by someone looking for a Shakespeare tragedy, for example.

However, tagging is useful not only to content authors, who can add metadata to their work, but also to visitors, who can share opinions, rate the quality of content, and provide any kind of feedback. Folksonomies, social bookmarking, and other forms of social software are all based on tags. Tags can be classified, connected to each other, and measured in terms of usage, frequency, and popularity.
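As a rough illustration of measuring tags by frequency and of connecting them to each other, the sketch below counts how often each tag is used and how often two tags appear on the same item. The function name `tag_statistics` and the sample data are assumptions made only for this example.

```python
from collections import Counter
from itertools import combinations


def tag_statistics(tagged_items):
    """Return (frequency, cooccurrence) counters for a {item: [tags]} mapping."""
    frequency = Counter()
    cooccurrence = Counter()
    for tags in tagged_items.values():
        # Count each tag once per item, in a stable order so pairs are comparable.
        unique = sorted(set(tags))
        frequency.update(unique)
        # Every unordered pair of tags on the same item is one connection.
        cooccurrence.update(combinations(unique, 2))
    return frequency, cooccurrence


# Hypothetical sample data: three images and the tags visitors gave them.
items = {
    "img1": ["baby", "cute"],
    "img2": ["baby", "teddy bear"],
    "img3": ["baby", "cute", "joy"],
}
frequency, cooccurrence = tag_statistics(items)
print(frequency.most_common(2))
print(cooccurrence[("baby", "cute")])
```

Counts like these say how popular a tag is and which tags tend to travel together, but nothing about what a tag means.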

Many people associate tagging with the idea of the Semantic Web, but is that association justified? Is the tagging mechanism really a semantic layer for digital content?

Let us consider the following image, and let us associate with it the following tags: baby, teddy bear, cute, joy.

Associated tags: baby, teddy bear, cute, joy.

The first two terms deal with the content of the image, that is, its subject. Note that a computer cannot easily decide which is the main subject of the image, since both the baby and the teddy bear are in the foreground, but if you ask human beings, most will tell you that the subject is the baby, not the toy. Therefore, even when two tags refer to the same aspect, they can differ in relevance.

So, what about the third one, cute? Isn’t that baby cute? I think he is. It does not matter if you think differently, because if you allow everybody to tag your content, it is up to them to decide how to tag it. This implies that you may have two contrasting tags associated with the same content. In that case, you have to develop some mechanism to decide which tag to keep and which to drop, or how to manage inconsistencies. For example, you may want to implement a rating mechanism (one to five stars, or thumbs up and down) so that people use it rather than tags to rate your content. You can also make a tag visible only if it is confirmed by a minimum number of people, hiding tags that only a few individuals have specified, but this democratic mechanism may remove useful tags assigned by experts, for example. And what about joy? Isn’t that baby happy? Maybe this image evokes a feeling of happiness in you. So, is that tag related to the image, or rather to you?
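The visibility rule described above, together with one possible way around its weakness, can be sketched as follows. This is only an illustration: the function name `visible_tags`, the `min_votes` threshold, and the choice to keep any tag assigned by a listed expert are assumptions, not a recommended policy.

```python
def visible_tags(assignments, min_votes=2, experts=frozenset()):
    """Return the tags to show, given (user, tag) pairs.

    A tag is shown if at least `min_votes` distinct users assigned it,
    or if at least one user in `experts` assigned it (an assumed exception).
    """
    users_by_tag = {}
    for user, tag in assignments:
        # Normalize spelling so "Cute" and "cute " count as the same tag.
        users_by_tag.setdefault(tag.strip().lower(), set()).add(user)

    shown = []
    for tag, users in users_by_tag.items():
        if len(users) >= min_votes or users & set(experts):
            shown.append(tag)
    return sorted(shown)


# Hypothetical visitors tagging the same image.
assignments = [
    ("ann", "cute"),
    ("bob", "Cute"),
    ("carl", "ugly"),
    ("dora", "joy"),
]
print(visible_tags(assignments))
print(visible_tags(assignments, experts={"dora"}))
```

With the default threshold only "cute" survives; declaring dora an expert also keeps "joy". The contrasting tag "ugly" is hidden in both cases, which shows how a purely numeric rule settles a disagreement without deciding who is right.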

So a tag can complete or extend content, associate attributes or logically related information with a digital object, or even be entirely independent of the content itself and related instead to subjective and contingent aspects, such as feelings and associations of ideas. Hence it is wrong to claim that tagging content, by itself, adds semantics to digital objects.

You can speak of semantics only when you contextualize tags within an ontology. What does that tag mean? Is it related to that content or to other objects which are related to that content? Does it reflect the author’s or the visitors’ point of view? How should I interpret it? Is it relevant? OK, but for whom?

Let us take the tag "blue". We could associate it with many different concepts: a color, of course, but also a musical genre (the blues) and, in English, a melancholic mood. So we could tag any of the following images as blue:

Color, music, and mood… all blue!

Without an ontology we cannot speak of semantics. Therefore we can use tags to add semantics to a digital object only if we can relate those tags to one or more formal ontologies, described in a language such as OWL. A great tool for creating ontologies visually (and much more) is Protégé, a free, open-source platform that provides a suite of tools for building domain models and knowledge-based applications with ontologies.
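To make the idea of relating a tag to an ontology a little more concrete, here is a minimal sketch in plain Python. It does not use OWL or Protégé: the `SENSES` table and the `ex:` identifiers are hypothetical stand-ins for concepts that a real ontology would define.

```python
# Hypothetical concept identifiers; a real system would use IRIs
# taken from one or more formal ontologies.
SENSES = {
    "blue": {
        "color": "ex:Color/Blue",
        "music": "ex:MusicGenre/Blues",
        "mood": "ex:Mood/Melancholy",
    },
}


def resolve(tag, domain):
    """Return the concept a tag denotes within a domain, or None if unknown."""
    return SENSES.get(tag.strip().lower(), {}).get(domain)


print(resolve("blue", "color"))
print(resolve("Blue", "music"))
print(resolve("blue", "food"))
```

The bare string "blue" is ambiguous; only the pair of tag and domain picks out a single concept, and a pair the table does not know resolves to nothing.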

A formal ontology is important, but not always sufficient. Semantics deals with meaning, and any meaning has to be interpreted within a culture, and cultures express themselves through natural languages. Some concepts have no counterpart in other cultures, others have a different meaning or interpretation, and others are expressed by different terms. Black is the color of mourning in many Western countries, whereas white is the color of weddings, but in several Asian cultures it is white that is used for mourning. Similarly, the color blue is not associated with any mood in Italian culture. So translating the tag is not enough, just as specifying the ontology is not enough. The cultural context is the third element of the game. Many researchers tend to underestimate this aspect because a significant part of the web, especially the scientific part, is in English, but even English is used by many different cultures in many different ways.

So, if we want to add semantics to the tagging mechanism, then in addition to a dictionary of terms we need an ontology and a cultural context, which may or may not be associated with a specific natural language. These are the three axes of a 3D semantic space. Note that the focus is on culture, not just language: it is not simply a matter of translation. We can probably use existing formal languages such as OWL and RDF to provide these axes, but without them, tagging gives a digital object no more meaning than what can be indexed by analysing the object itself.
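The three axes can be pictured as a lookup key with three components. The sketch below is only an illustration under assumed names: the `SemanticTag` class, the ontology labels, and the culture labels are hypothetical, and a real system would presumably express them in OWL or RDF rather than in a Python dictionary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SemanticTag:
    term: str      # axis 1: the word from the dictionary of terms
    ontology: str  # axis 2: the ontology the term is read in (hypothetical label)
    culture: str   # axis 3: the cultural context (hypothetical label)


# A tiny, hand-written table of meanings based on the examples in the text.
MEANINGS = {
    SemanticTag("blue", "color", "english-speaking"): "the color blue",
    SemanticTag("blue", "color", "italian"): "the color blue",
    SemanticTag("blue", "mood", "english-speaking"): "melancholy",
    # No entry for ("blue", "mood", "italian"): the association does not exist there.
}


def interpret(term: str, ontology: str, culture: str) -> Optional[str]:
    """Return the meaning at a point of the semantic space, or None if empty."""
    return MEANINGS.get(SemanticTag(term, ontology, culture))


print(interpret("blue", "mood", "english-speaking"))
print(interpret("blue", "mood", "italian"))
```

The same term and the same ontology give a meaning in one culture and nothing in another, which is why translating the term alone does not carry the tag across.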

