Upon whom can we rely?



Abstract

Nowadays, the reliability of content is extremely important on the web. We can find almost any kind of information on the Internet, but how reliable is a source? How accurate is the information it provides? Several feed aggregators and site directories have defined various criteria to rate sites and blogs, but most of them attempt to automate the process, so the resulting rankings are questionable. At present, the best approach is still based on human reviews, but this way we are just shifting the problem: how reliable is a reviewer? How do I measure his/her trustworthiness? The purpose of this article is to propose an alternative solution to define the reliability of content published on the web and to measure the reputation of reviewers.

A matter of trust

There is no irrefutable evidence of the size of the web. Some sources report 60 billion indexed pages, others up to a trillion. However, whereas on the one hand a significant percentage of the web is not indexed at all, on the other hand we have to take into account a lot of duplicates and the proliferation of variants of the same piece of information. So it is difficult to provide a reliable picture. In any case, one thing is sure: the web is big.

That is about the size, but what about the content? How reliable is a single piece of information? How accurate is it? Generally speaking, there is no way to know. It is a matter of trust. Assuming you know for sure who uploaded that piece of information, your rating depends on how much you trust him/her. Of course, if you know the author directly, there is no problem, but in most cases you have no way to know the individual who published a piece of content on the web. In some cases, you do not even know his/her real name. So what? Well, in the real world each of us bases trust on personal experience and on other people's opinions, that is, on a reputation network. If A trusts B and B trusts C, then A might trust C. It is not an unfailing approach, but it usually works.
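To make the idea concrete, here is a minimal sketch of such a reputation network, under my own assumption (not a standard) that trust is a value between 0 and 1 and decays multiplicatively along a chain. All names and weights are hypothetical.

# Minimal sketch of transitive trust in a reputation network.
# Assumption (mine, not a standard): trust is a value in (0, 1] and
# decays multiplicatively along a chain, so A's derived trust in C
# through B is trust(A, B) * trust(B, C). Names and weights invented.

def derived_trust(network, source, target, visited=None):
    """Best (max-product) trust the source can derive for the target
    by following chains of direct trust, avoiding cycles."""
    if visited is None:
        visited = {source}
    direct = network.get(source, {})
    best = direct.get(target, 0.0)
    for intermediary, weight in direct.items():
        if intermediary == target or intermediary in visited:
            continue
        onward = derived_trust(network, intermediary, target,
                               visited | {intermediary})
        best = max(best, weight * onward)
    return best

# A trusts B, B trusts C; A never met C, yet derives some trust in C.
network = {"A": {"B": 0.9}, "B": {"C": 0.8}}
print(derived_trust(network, "A", "C"))  # 0.72, weaker than direct trust

Note how the derived trust is always weaker than any direct trust along the chain, which matches the intuition that second-hand confidence is never as strong as first-hand experience.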

Source reliability

If we focus on traditional information sources, such as newspapers, news channels, enterprises, and governments, the scenario may vary from country to country. In certain countries some media can be considered mostly trustworthy, but this is not necessarily true in every land and for every medium. Even in democratic and liberal countries there is always the risk that some information is manipulated because of cultural, religious, or political pressures.

In the virtual world some sites have earned a good reputation: Google, Amazon, or Wikipedia, for example. However, even in such cases we have to be really careful. Let us consider Wikipedia. The fact that each article is potentially the result of the contributions of many authors might be considered a guarantee, but since each Wikipedia site is somehow related to a specific culture, and sometimes to a specific country, it might represent just one point of view. To counter that, for example, the English Wikipedia hosts a WikiProject on countering systemic bias, but it is still a proposal. Similarly, most Wikipedia articles are required to provide references supporting any questionable statement in the text, but how can we rely on a reference? For example, what if the reference points to a publication in another language because none exists in the language of the article? Some Wikipedians take a very strict approach to references in foreign languages, but in some cases such an otherwise reasonable approach might discriminate against articles about individuals, events, and concepts belonging to other countries. It is not easy to define a safe rule about this.

Blogosphere

One of the most popular areas of the web to which rating and ranking methods are applied is the blogosphere. At the beginning, rating mechanisms were based on the number of times a certain blog was linked to by other blogs. However, blogrolls and similar gadgets have completely nullified that approach. Therefore, some sites began to count links and trackbacks toward a blog's rating only if they were enclosed within the articles of other blogs. However, this approach says nothing about how good the referenced post is. Ad absurdum, if an article is "bad and unreliable" and thousands of blogs link to it precisely to point that out, it will still get a great rating, as the toy ranker below shows.
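The sketch below is a toy version of such a ranker: it only counts how many posts reference a URL and never looks at whether the reference is an endorsement or a warning. All posts and URLs are invented for illustration.

# Toy in-article link counter: the rating ignores whether a link is an
# endorsement or a warning. All posts and URLs below are invented.
from collections import Counter

posts = [
    {"links": ["https://goodblog.example/post"],
     "text": "Great analysis, recommended."},
    {"links": ["https://badblog.example/hoax"],
     "text": "Beware, this article is bad and unreliable."},
    {"links": ["https://badblog.example/hoax"],
     "text": "Another debunking of the same hoax."},
]

score = Counter(link for post in posts for link in post["links"])
# The debunked hoax outranks the recommended post: 2 links vs 1.
for url, links in score.most_common():
    print(url, links)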

A third approach is based on social bookmarking and direct votes; Digg uses this mechanism, for example. It is not a bad idea, since it relies on human reviews rather than algorithms to rate content. However, even this approach is questionable, since it does not really measure the reliability of the content but the popularity of the blog, which is a different thing. If an article covers some appealing gossip or deals with a popular personality, product, or event, it will probably get a high ranking even if its content is trivial. Furthermore, this approach discriminates among blogs in different languages, since the ranking is proportional to the potential audience: the most widely spoken languages are favored over minor ones and dialects. Placing blogs written in different languages in the same list makes no sense, in fact.

Last but not least, the fact that you can read a specific piece of information does not imply that you have enough skill to understand it. This is particularly true for scientific and technical articles. One of the defects of voting is that all votes count the same, even though not all voters have the same competency, as illustrated below. Prejudices, ideologies, beliefs, and other non-objective human factors might bias the final result. Furthermore, a group of bloggers might form a real digital lobby to sustain a specific point of view or simply to support each other.
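To see how much the equal-weight assumption matters, compare a plain majority with a competency-weighted one. Both the votes and the competency scores below are invented for illustration; how to estimate competency fairly is precisely the part this article leaves open.

# Illustration only: a verdict can flip when votes are weighted by a
# competency score instead of counting equally. All values are invented.

votes = [
    # (voter, vote on "is this article reliable?", competency in [0, 1])
    ("casual_reader_1", True,  0.1),
    ("casual_reader_2", True,  0.1),
    ("casual_reader_3", True,  0.2),
    ("domain_expert_1", False, 0.9),
    ("domain_expert_2", False, 0.8),
]

plain_yes = sum(1 for _, v, _ in votes if v) / len(votes)
weighted_yes = (sum(w for _, v, w in votes if v)
                / sum(w for _, _, w in votes))

print(f"unweighted: {plain_yes:.0%} say reliable")    # 60% -> "reliable"
print(f"weighted:   {weighted_yes:.0%} say reliable")  # 19% -> "unreliable"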

Conclusions

To summarize, most rating mechanisms do not measure reliability but popularity, and popularity depends on how famous the subject of the article is, how many people can read the article given its language, how many readers can understand it given their skills, and how honest and trustworthy the reputation network around that specific site or author is. So automated mechanisms and socially based approaches are both questionable.

The problem of trust is not peculiar to the web. It has always existed, and it has already been partially solved in specific sectors of human society. A valuable approach used by enterprises is based on certification authorities. This method usually fails only when big economic interests are at stake; otherwise it is reliable enough. A certification authority is usually specialized in specific contents, processes, and competencies. It also has to conform to regulations, often established by law. So, for example, if a company needs to demonstrate that it complies with ISO 9000, it has to be certified by an independent auditor. The same approach might be used on the web. Of course, being certified as a "reliable source of information" cannot be mandatory. All the mechanisms I previously described in this article may still apply and, in any case, "not certified" does not mean "not reliable". However, independent certification mechanisms are already used on the web. An example is the ICRA labeling system by FOSI. ICRA is not a rating mechanism but a labeling one; anyway, the principle is similar.

Just a proposal

What we have to do is define a rating system that measures content reliability only. Rating values should be readable by both humans and machines through a semantic language, for example, RDF. More than one rating might be applied to the same site if it deals with several subjects, while a single rating value should be used for a specific article or piece of information. Site certification should not last forever but should periodically expire, so that it has to be renewed. On the other hand, article certificates can be considered valid indefinitely unless the content is changed.

A snippet related to a sample blog might look like this:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rcl="https://www.certauth.biz/rcl#">
  <!-- the rcl namespace URI is a placeholder for this proposal -->
  <rdf:Rating about="https://www.myblog.biz" certifiedBy="https://www.certauth.biz" expireOn="20081023">
    <rcl:rate for="astronomy">AAA+</rcl:rate>
    <rcl:rate for="physics">AA</rcl:rate>
    <rcl:rate for="cuisine">C</rcl:rate>
    <rcl:rate for="gardening">F</rcl:rate>
  </rdf:Rating>
  <!-- some signature to validate the snippet -->
</rdf:RDF>
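
Such a snippet is meant to be consumed by machines as well. As a minimal sketch under the proposal's own terms, the following code parses the example above with Python's standard library and checks whether the site certificate has expired; the vocabulary (Rating, rate, expireOn) and the rcl namespace URI are the hypothetical ones of this proposal, not an existing standard.

# Minimal sketch of a client consuming the snippet above. The vocabulary
# (Rating, rate, expireOn) and the rcl namespace URI are hypothetical,
# taken from the example in this proposal.
from datetime import date, datetime
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RCL = "https://www.certauth.biz/rcl#"  # placeholder namespace

snippet = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rcl="https://www.certauth.biz/rcl#">
  <rdf:Rating about="https://www.myblog.biz"
              certifiedBy="https://www.certauth.biz" expireOn="20081023">
    <rcl:rate for="astronomy">AAA+</rcl:rate>
    <rcl:rate for="physics">AA</rcl:rate>
  </rdf:Rating>
</rdf:RDF>
"""

root = ET.fromstring(snippet)
rating = root.find(f"{{{RDF}}}Rating")

# Site certificates periodically expire and must be renewed.
expires = datetime.strptime(rating.get("expireOn"), "%Y%m%d").date()
status = "EXPIRED" if date.today() > expires else f"valid until {expires}"
print(f"{rating.get('about')} certified by {rating.get('certifiedBy')}: {status}")

# One rate per subject the site deals with.
for rate in rating.findall(f"{{{RCL}}}rate"):
    print(f"  {rate.get('for')}: {rate.text}")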

Of course, this is just an example. A real implementation would probably require a deeper analysis of the problem and some security mechanism to prevent fraud. In any case, the idea of using independent auditors, whose reliability is based on conformance to well-defined regulations, might work. Of course, most bloggers will not care about being rated by a certification authority, but if they wish to be considered reliable by other reliable sources, this is probably the most effective approach. In my humble opinion, at least.
