Digital ashes

A couple of years ago, the former CEO of Google, Eric Schmidt, said at a conference in California that every two days we create as much data as we did from the dawn of civilization up until 2003, something like five exabytes of data [Techonomy]. For the record, one exabyte is ten to the eighteenth bytes, that is, a billion gigabytes. It has also been said that all current human knowledge could be collected in about 12 exabytes of data.
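As a quick sanity check of these figures, a few lines of Python put the two claims side by side; the rate and the 12-exabyte estimate are simply the claims quoted above, not verified data:

```python
# Back-of-the-envelope check of the figures quoted above (claims, not verified data).
EXABYTE = 10 ** 18  # one exabyte in bytes (decimal SI prefix "exa")

data_per_two_days = 5 * EXABYTE   # Schmidt's claimed production rate
human_knowledge = 12 * EXABYTE    # the "all human knowledge" estimate

# At that rate, how many days to produce 12 exabytes?
days = 2 * human_knowledge / data_per_two_days
print(days)  # 4.8
```

If both figures were right, the web would churn out the informational equivalent of all human knowledge roughly every five days, which is exactly why the quantitative parameter alone tells us so little.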

However, whether such figures are accurate, still current, or either under- or overestimated does not really matter. It is a fact that nowadays we produce a remarkable amount of data every day, most of it generated by network users, but that is not all. Much of it is also produced automatically by various activities and operations. Think, for example, of all the logs generated inside every node of the network. Furthermore, not all data is indexed by search engines or publicly available to everyone. Just think of the millions of databases of companies and research institutes made available only to certain communities, or not accessible without appropriate permissions.

In quantitative terms, then, humanity is producing an unprecedented volume of information, but is quantity really the only parameter we have to consider? In fact, there are four factors to weigh if we want a complete picture of the situation.

The first is redundancy, the second is relevance, the third is reliability, and the fourth is volatility.


Digital content is easy to copy. A simple "copy & paste" and the same text published by one website ends up on hundreds of other sites, either verbatim or with any number of variations and modifications. In theory, the web was designed to encourage people to use links when they want to refer to someone else's text, but many people do not use them. There are various reasons for that.

The first, and perhaps most important, is that the users able to produce truly original content, especially texts and articles, are a small minority. Most people simply copy, hoping to gain some visibility on the web. Ironically, the very people who create original material are often ignored or little known, while others, perhaps already known for other reasons, are credited for publishing material that was actually taken from other sources.

A second reason is that many administrators of blogs, forums, and digital newspapers do not allow links in whatever visitors may write on their sites. In some cases this is done to avoid links to inappropriate content over which the site administrator obviously has no control; often, however, it is only a matter of power: allowing people to add a link in a comment, for example, means allowing them to redirect visitors elsewhere, and that is something some forum administrators really do not want.

In general, then, a lot of material on the web is actually produced by few and duplicated by many. Photos and messages that people generate, especially with mobile phones, through services such as Instagram, Foursquare, Pinterest, and Facebook are an exception. In those cases we can truly say we are dealing with original material but, and here we come to the second parameter, how much of this content is really relevant?


Of course, the concept of relevance is subjective: what is interesting to someone may not be of interest to others, and vice versa. Moreover, material published from mobile phones is often intended for a few friends or family members, so it is normal that most people ignore it. On the other hand, a good article on a blog, for example a valid review of a book or a movie, has its own intrinsic importance. We may say that when data or information is relevant to a large number of people, it has some intrinsic value.

Indeed, there are two parameters for assessing the relevance of an article: the intrinsic value of the content and the quality of the text. The former is obviously more important than the latter, since even a poorly written article that reports information of a certain value can be quite interesting. However, if an article is well written and readable, the message it contains will spread more easily; that is, it will be more effective.


The fact that a subject is relevant, or even well treated, does not mean it is reliable. Whoever spreads disinformation, or rather, whoever makes skillful use of false information, also knows how to prepare material for publication so that it achieves the widest possible dissemination. Photomontages, false statements, statistics and data whose source is unknown or not vouched for by any authority we can trust, even a few lines suggesting that a certain event took place or that someone said something, are extremely common on the web.

The web's substantial capacity for amplification eventually becomes instrumental in spreading false information. The mechanism is almost always the same: someone posts something plausible, almost always something capable of arousing interest or anger (negative feelings are the strongest lever exploited by disinformers), and publishes it on a site, a social network, or a page read by people who have little critical spirit in relation to that topic. How?

It is not so hard to do. Each of us generally holds a position on almost every topic. Often this position is influenced by our political or religious convictions, in any case by our own scale of values. If we read something that comforts us because we recognize in it a "truth" we already believe, we will be much less critical of the entire article than if we read a text containing statements that differ from or contradict our beliefs.

At that point the game is over: the article or image in question will be quickly shared, duplicated, and spread across the entire network, making it difficult to figure out where it originated. Once that information has reached a certain critical mass, more and more people, including those of different political or religious persuasions, will begin to think those statements might be true, precisely because they are so widespread. In practice, if many say something, then it should be true, shouldn't it? That is not all: very often such information is associated with secret forces or hidden powers, plots, and injustices, so anyone trying to prove it false would end up branded as a conspirator or a reactionary, dominated by this or that power. In practice, they would achieve the opposite effect.


If we consider these first three parameters, namely redundancy, relevance, and reliability, it is evident that the amount of original information that has real value and is actually true is far smaller than those famous five exabytes of data Schmidt was talking about. There is, however, a fourth parameter to consider, one often ignored by most: volatility.

It is said that once something is published on the web, there is no way to delete it, precisely because of the mechanisms of duplication and sharing, in many cases automated by more or less sophisticated applications. But is that true? How volatile is a piece of information on the world wide web?

Consider this article. Today I posted it on my blog, and in a short time it will be indexed by the many search engines constantly sending out their data-gathering spiders. Furthermore, someone will probably share it on some social network, or perhaps duplicate it, in whole or in part, on another blog. But what will happen in one year, or in five, ten, fifty years?

Well, a year from now this article will no longer be on the front page of my blog, or of any other site where it has been published. Furthermore, all its shares on social networks will probably have been archived, even if it remains accessible through a web search. In five or ten years you may still be able to find it, but it is unlikely to appear among the top results of any search, unless you use very specific criteria.

Then, in fifty years, I will most likely be dead, or otherwise completely senile, although some friends think the latter is already true. What will happen to my blog at my death? Perhaps my daughter or some friends will maintain it for a while, or maybe not. It may not even take fifty years, since the web itself will probably be so different in twenty or thirty years as to make this blog obsolete long before then.

What will happen then to these words, when my domain is deleted and all the contents are removed from the hosting storage? Some pieces will likely survive here and there, but even those systems will sooner or later be changed, canceled, or renewed. Not being famous, probably nothing I have written will end up on a website with a persistence greater than my own, such as Wikipedia; but how do we know that what is currently the most famous encyclopedia on the web will not disappear in a few decades? After all, famous sites that made the history of the web have disappeared, and they are often unknown even to the new generations.

Therefore, not only is the quantity of valuable material published on the web smaller than you may think, but it probably has a much greater volatility than other forms of communication that have been used for centuries, such as paper. Today we can read scrolls and sheets a thousand years old and more, but in a thousand years, how much of what is now the web will still be available? How many of the words you are reading now will be just digital ashes?

