Thoughts on the Semantic Web

Saturday 28 January 2012

Recently, as part of my job, I have been setting up an installation of the VIVO Semantic Web application. This has led me to investigate what exactly the Semantic Web is. And, for some reason, I find it very exciting.

The Semantic Web has been hailed as the successor to the World Wide Web (“web”) as we know it. What exists today is a “document web”, where information on pages is described through prose and images, and concepts are informally linked together through hyperlinks. This web has served us well for many years and has enabled the sharing of information on an unprecedented scale. As it stands, however, the web is very hard for computers to understand (as computers can’t read very well). It is therefore left to humans to piece together the disparate bits of knowledge scattered across the web into an answer to whatever they are trying to find out. Wouldn’t it be great, though, if this could all be done automatically?

Well, the Semantic Web is a system that aims to implement this idea. The way it basically works is that, possibly alongside current human-readable formats, information on the web is represented in a machine-understandable format known as triples: basically just sentences in the form “subject, predicate, object”, such as “John lives in Atlanta”. In the Semantic Web system, however, the subjects and predicates, and usually the objects too, are uniquely identified through the use of URIs (basically like URLs, but they don’t need to actually go anywhere; they just need to be unique). In this fashion, triples where the object of one is the subject of another can be chained together to form more complex statements, such as “John lives in Atlanta; Atlanta is in Georgia; Georgia is in the United States; the United States is in the Northern Hemisphere”, from which we can conclude that John lives in the Northern Hemisphere. The point of the Semantic Web, however, is that this can be done as much as we want, across as many data sources as we need, because everything is uniquely identified. No longer are we restricted to gaining information from one data source at a time: the entire web is essentially linked into one giant graph. Imagine the power! We could theoretically find out anything we wanted to, no matter how complex the reasoning was.
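To make the triple idea concrete, here is a minimal sketch in plain Python (no RDF library), assuming hypothetical URIs under http://example.org/ and made-up predicates `livesIn` and `locatedIn`. Each fact is a (subject, predicate, object) tuple, and the chaining step simply follows `livesIn` once and then `locatedIn` as far as it goes. A real deployment would store the data as RDF and leave this kind of chaining to a query engine or reasoner.

```python
# Hypothetical URIs for illustration only; example.org is reserved for examples.
EX = "http://example.org/"

# Each fact is a (subject, predicate, object) triple of URIs.
triples = {
    (EX + "John", EX + "livesIn", EX + "Atlanta"),
    (EX + "Atlanta", EX + "locatedIn", EX + "Georgia"),
    (EX + "Georgia", EX + "locatedIn", EX + "UnitedStates"),
    (EX + "UnitedStates", EX + "locatedIn", EX + "NorthernHemisphere"),
}


def objects(graph, subject, predicate):
    """Return every object o such that (subject, predicate, o) is in the graph."""
    return {o for (s, p, o) in graph if s == subject and p == predicate}


def places_lived_in(graph, person):
    """Follow livesIn once, then locatedIn repeatedly, collecting every place reached."""
    frontier = objects(graph, person, EX + "livesIn")
    reached = set()
    while frontier:
        place = frontier.pop()
        if place in reached:
            continue  # guard against cycles in the data
        reached.add(place)
        frontier |= objects(graph, place, EX + "locatedIn")
    return reached


print(EX + "NorthernHemisphere" in places_lived_in(triples, EX + "John"))  # True
```

Nothing in the code knows about geography; the conclusion falls out purely because the same URIs appear at both ends of neighbouring triples.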

This is how the aforementioned VIVO application works. Its aim is to link up the researcher databases of many universities using the Semantic Web system. In this manner, it is possible to search across all the databases at once and, for example, see that John from Cornell wrote Article A and that Mark from the University of Florida also wrote Article A, and then conclude that Mark and John are collaborators: something that may not have been obvious from looking at either database alone.
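The collaborator example can be sketched the same way. This assumes hypothetical URIs and a made-up `authorOf` predicate (it is not VIVO's actual ontology), and treats each university's data as a separate set of triples. Because both sources use the same URI for Article A, merging them is just a set union, and the collaboration only appears in the merged graph. The inferred facts are themselves returned as triples, so they could be fed back into further reasoning.

```python
from itertools import combinations

# Hypothetical URIs for illustration only.
EX = "http://example.org/"

# Two independent data sources, one per university (made-up data).
cornell = {(EX + "John", EX + "authorOf", EX + "ArticleA")}
florida = {
    (EX + "Mark", EX + "authorOf", EX + "ArticleA"),
    (EX + "Mark", EX + "authorOf", EX + "ArticleB"),
}


def collaborators(graph):
    """Infer a collaboratesWith triple for every pair of people who authored the same article."""
    authors_by_article = {}
    for s, p, o in graph:
        if p == EX + "authorOf":
            authors_by_article.setdefault(o, set()).add(s)
    inferred = set()
    for authors in authors_by_article.values():
        # sorted() gives each pair a stable order, so (a, b) and (b, a) are not both stored
        for a, b in combinations(sorted(authors), 2):
            inferred.add((a, EX + "collaboratesWith", b))
    return inferred


# Neither source on its own reveals the link...
print(collaborators(cornell), collaborators(florida))  # set() set()
# ...but the merged graph yields one inferred triple: John collaboratesWith Mark.
print(collaborators(cornell | florida))
```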

As a set of W3C standards, the Semantic Web has actually been around for ages; in fact, the first of them, the RDF specification, became a W3C Recommendation back in 1999. Outside of a few specific applications, however, the Semantic Web has not really seen the widespread adoption required to make the sort of universal reasoning it envisions possible. Perhaps it is just too much effort for websites, especially those not driven by a single organised data source. The lack of widely shared ontologies also makes it hard: while one site may say “John likes cake”, another may say “cake is liked by John”, and without something to tell it, a computer would never know that they mean the same thing. This is why projects like VIVO, which have developed a consistent and well-designed ontology for their specific area (i.e., research information), are so important.
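One common remedy for the “likes” versus “is liked by” problem is to publish a mapping declaring that one property is the inverse of the other; OWL provides `owl:inverseOf` for exactly this. The sketch below hand-applies such a mapping in plain Python rather than using a real reasoner. It assumes hypothetical site-specific predicate URIs, and it assumes both sites already agree on the URIs for John and cake, which in practice is half the battle.

```python
# Hypothetical URIs for illustration only.
EX = "http://example.org/"

# Two sites state the same fact using different (made-up) predicates.
site_one = {(EX + "John", EX + "site1/likes", EX + "Cake")}
site_two = {(EX + "Cake", EX + "site2/isLikedBy", EX + "John")}

# A tiny shared "ontology": each entry says one predicate is the inverse of another,
# in the spirit of OWL's owl:inverseOf.
INVERSE_OF = {EX + "site2/isLikedBy": EX + "site1/likes"}


def normalise(graph):
    """Rewrite every triple that uses a mapped predicate into its inverse form."""
    result = set()
    for s, p, o in graph:
        if p in INVERSE_OF:
            result.add((o, INVERSE_OF[p], s))  # swap subject and object
        else:
            result.add((s, p, o))
    return result


print(site_one == site_two)                        # False: different triples as published
print(normalise(site_one) == normalise(site_two))  # True: same fact once mapped
```

Without the one-line mapping, a program comparing the two sites sees two unrelated triples; with it, both reduce to the same statement.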

Well, in the end I just wanted to say that I hope more people implement the Semantic Web in the future. Then we will have unlimited knowledge! Maybe. Do you think the Semantic Web is a good idea?