The multiplication of social websites and semantic web technologies

Andrew recently invited me to Dopplr, where I can see his travels and, of course, add my own planned trips. Now why would I want to add Dopplr, yet another Web 2.0 site, to LinkedIn, Flickr, Frappr, Twitter and all the others (including this blog) where I already tell things about myself? Also, shortly after I accepted Andrew's invitation, Ugo added me to his Dopplr contacts. Why would I want to build, once again, a network on that site with the same people I'm already linked to on other sites?

Web 2.0 allows people to publish things about themselves and to build their network. But the incredible explosion in the number and variety of social networking websites makes it a real pain for connected people to maintain their personal information everywhere.

So, rather than users entering and maintaining their data on every site they want to appear on, the opposite should happen. I should describe myself and my network only once, and the websites I find useful and want to register with should pull my information, or at least the parts of it I choose to show them.

Now how can we do this? What common format should I use so that all these websites can understand my information? How can I describe my connections to other people? The answer is simple: the semantic web. There are RDF vocabularies for most of the things you usually tell about yourself on social networking websites, such as FOAF for people and their relationships, RDFCal for events such as planned trips, and SIOC for online community content.
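As an illustration, here is a minimal Python sketch (standard library only) that writes a tiny FOAF description in N-Triples: a name plus foaf:knows links to contacts. The FOAF and RDF namespace URIs are real; the example.org URIs, the name and the helper functions are hypothetical placeholders. A real tool would more likely rely on an RDF library and a serialization such as RDF/XML or Turtle.

```python
# Real namespace URIs for FOAF and rdf:type.
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def uri(u):
    """Render an IRI as an N-Triples IRI reference."""
    return f"<{u}>"


def literal(s):
    """Render a plain string literal, escaping backslashes, quotes and line breaks."""
    escaped = (
        s.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def describe_person(me, name, knows):
    """Return N-Triples describing `me` as a foaf:Person who knows each URI in `knows`.

    Hypothetical helper for illustration only.
    """
    triples = [
        (uri(me), uri(RDF_TYPE), uri(FOAF + "Person")),
        (uri(me), uri(FOAF + "name"), literal(name)),
    ]
    for friend in knows:
        triples.append((uri(me), uri(FOAF + "knows"), uri(friend)))
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


# Hypothetical URIs: one description of me, linking to my contacts once.
doc = describe_person(
    "http://example.org/me#i",
    "Example Person",
    ["http://example.org/andrew#me", "http://example.org/ugo#me"],
)
print(doc)
```

Any site that understands FOAF could then read my contacts from this one document instead of asking me to rebuild my network.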

Now this radically changes the social web as we know it today. We first need semantic publishing tools (e.g. RDFa-enabled blog engines) so that creating and publishing our information is easy and natural. Registering with a social website will then simply mean giving the URL of our semantic information (or a subset of it, which the semantic blog tool should help us define), and of course our OpenID URL (or not even that, since it can be included in our metadata). We move from a web of websites hosting descriptions of people to a web of people publishing their own descriptions.
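Giving a site only a subset of our information could be as simple as a per-site policy listing which properties that site may see. The sketch below assumes the profile is held as a list of (subject, predicate, object) tuples; the site names, the policy table and the view_for helper are hypothetical. It includes foaf:openid, a real FOAF property, to show how the OpenID can travel with the metadata.

```python
FOAF = "http://xmlns.com/foaf/0.1/"

# A tiny in-memory profile as (subject, predicate, object) tuples; values are hypothetical.
PROFILE = [
    ("http://example.org/me#i", FOAF + "name", "Example Person"),
    ("http://example.org/me#i", FOAF + "mbox", "mailto:me@example.org"),
    ("http://example.org/me#i", FOAF + "knows", "http://example.org/andrew#me"),
    ("http://example.org/me#i", FOAF + "openid", "http://example.org/me"),
]

# Hypothetical per-site sharing policy: the predicates each site is allowed to see.
POLICY = {
    "travel.example.com": {FOAF + "name", FOAF + "knows", FOAF + "openid"},
}


def view_for(site, profile=PROFILE, policy=POLICY):
    """Return only the triples whose predicate the given site may see.

    Sites with no policy entry get nothing.
    """
    allowed = policy.get(site, set())
    return [t for t in profile if t[1] in allowed]


for triple in view_for("travel.example.com"):
    print(triple)
```

Filtering on predicates is the coarsest useful policy; a real tool might also filter on objects, for example to hide some contacts from some sites.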

But this also comes with some problems: if you don't own your domain name, the hosting provider will "own" your data, and moving to a different host will be a pain since your URL changes. Sure, we can use abstract URIs and a naming service that resolves them to the possibly changing URL, but this introduces a level of complexity for average users, and who will own the naming service?

This will also be a great opportunity for marketers and spammers, since they could learn a lot about you by harvesting your data and that of the people in your network, something the current balkanization of data across many websites makes difficult. But our semantic blog engine can check the OpenID of robots that want to harvest our data, using the dedicated HTTP authentication scheme.

So, in my opinion, the time is approaching when using semantic technologies to represent our personal information only once will be worth it, because of the multiplication of social websites that do bring some value but require us to enter the same information again and again. But this is a chicken-and-egg problem: websites won't harvest our data if we don't publish it, and we won't publish it if no website harvests it. Piggy Bank and GRDDL extract RDF from existing websites; we now need tools that work the other way around, posting our RDF data to social websites until those sites start harvesting it themselves.
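Until sites harvest RDF themselves, such a tool could translate our FOAF data into whatever a site's profile form expects. A minimal sketch, assuming a site whose form uses fields named display_name and website (both hypothetical, as is the FOAF-to-field mapping); it only builds the URL-encoded form body with the standard library and leaves sending it, and authenticating, to the caller.

```python
from urllib.parse import urlencode

FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical mapping from FOAF properties to one site's profile-form field names.
FIELD_MAP = {
    FOAF + "name": "display_name",
    FOAF + "homepage": "website",
}


def form_body(triples, field_map=FIELD_MAP):
    """Build an application/x-www-form-urlencoded body from (s, p, o) triples.

    Unmapped predicates are ignored; the first value seen for a field wins.
    """
    fields = {}
    for _subject, predicate, obj in triples:
        field = field_map.get(predicate)
        if field is not None and field not in fields:
            fields[field] = obj
    return urlencode(fields)


print(form_body([
    ("http://example.org/me#i", FOAF + "name", "Example Person"),
    ("http://example.org/me#i", FOAF + "homepage", "http://example.org/"),
]))
```

Keeping the mapping as data rather than code means supporting one more site is a matter of adding one more table, which fits the idea of describing ourselves once and pushing that description everywhere.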

And once people are used to expressing semantically rich information about themselves, they will hopefully understand the value of semantically rich documents.