Not so long ago, Tim Berners-Lee came up with the notion of
a World Wide Web revolutionizing the way people exchange information and
documents. We now live in an economy where data plays a central role in decision
making. As the implications of utilizing large data sets in research and
enterprises are being realized, there is also a certain degree of underlying frustration
when it comes to acquiring quality data sets. The Internet has proved to be a phenomenal source of information. However, most of it is unstructured and
scattered. Linked
Data provides a publishing paradigm in which not only documents, but also data,
can be a first class citizen of the Web, thereby enabling the extension of the
Web with a global data space based on open standards - the Web of Data.
Partial Snapshot of the Linked Data Graph |
The difference between the current web and a web of data is
best explained with an example. Nowadays a typical online store would consist
of pages describing different products. Such a page contains all the
information a human reader requires. But this information is represented in a
way that makes automatic processing of it hard. In a web of data, the online
store would actually publish data resources to the web, which represent the
products. These resources can be retrieved in different representations, such
that a browser would allow you to view a product page while a web crawler would
get a machine understandable representation of the product.
Now comes the part which makes it even more exciting for research.
Consider a case of a researcher developing a drug to treat Alzheimer’s. Suppose, she wants to find the proteins which
are involved in signal transduction AND are related to pyramidal neurons. (The
question probably makes sense to her).
Searching for the same on Google returns about 2,240,000 results
not one of which leads to an answer. Why? Because no one has ever had that idea
before! There exists no single webpage
on the web with the result. Querying the same on the Linked healthcare
database pulls in data from two distinct data sets and produces 32 hits, EACH of which is a protein which has those properties.
The vision of Linked Data is the liberation of raw knowledge
and making it (literally) accessible to
the world. Indeed, a Web of Ideas.
Let us now take a deeper look into the principles involved
and how the same can be applied in a Complex Network domain. Broadly, it entails
four basic principles:
- Use URIs as names for things. These may include tangible things such as people,
places and cars, or those that are more abstract, such as the relationship
type of knowing somebody, the set of
all green cars in the world, or the color green itself.
- Use HTTP URIs, so that people can look up those names.
- When someone looks up a URI, provide useful information, using the
standards (RDF,
SPARQL).
- Include links to other URIs, so that they can discover more things.
For
example, a hyperlink of the type friend of may be set between two people,
or a hyperlink of the type based near may be set between a person and
a place.
The power here lies in the links. There are three important types of RDF links:
- Relationship Links : point at related things in other data sources
- Identity Links point at URI aliases used by other data sources to identify the same real-world object or abstract concept
- Vocabulary Links point from data to the definitions of the vocabulary terms that are used to represent the data
Having
a good idea of the basic concepts, we are now ready to see how we can harness
Linked Data for research involving Complex Networks.
As I see it, there are two interesting areas
here:
As described in this old article,
Linked Data is growing at an exponential rate since its modest beginnings back
in 2007. Over the years, different data sets have started linking to the global
database and a clear emergence of genres can be seen. The Linked
Data graph may offer a good opportunity to study the temporal behavior of
the largest structured knowledge database ever witnessed by the world.
Then again, as mentioned before, the data offered itself has
huge potential in terms of network research. A good question therefore is how
to get our hands on this data.
One approach is to use existing systems to get data. Linked
data is accessible through browsers like the Disco Hyperdata
browser, the Tabulator browser
or the more recent LinkSailor. We can also
make use of search engines such as Sig.ma, Falcons and SWSE.
For example, research on Citation Analysis may be
supplemented with data available on authors (example) or subject areas (example). Linguistic research
stands to gain from the relationships relating words and objects with actions, events,
facts, etc. (example,
example).
As with all generic applications however, the problem is that
deployed applications offer little control over how the data is accessed. For specific research, therefore, it is
best to develop applications and crawlers from scratch. This website would be a good
place to start. It explains in detail about RDF and architectures involving linked
data applications.
To conclude, Linked Data provides a more generic, more
flexible publishing paradigm which makes it easier for data consumers to
discover and integrate data from a large numbers of data sources.
Though, still in its infancy period, it has come a long way since its
inception. Coupled with the onset of an Internet of Things, Linked Datasets will
encompass the physical world, identifying social relationships, behavioral
patterns and events. What we do with this information is entirely up to us.
USEFUL LINKS
Regarding indexing of text and metadata
information, the Solr system:
Regarding navigation and search interfaces:
http://searchuserinterfaces.com/book/sui_ch8_navigation_and_search.html
http://searchuserinterfaces.com/book/sui_ch8_navigation_and_search.html
REFERENCES:
- Linked data on the web (LDOW2008), Christian Bizer, Tom Heath, Kingsley Idehen, Tim Berners-Lee
- http://www.w3.org/standards/semanticweb/data
- http://linkeddata.org/
- The Next Web, a TED talk by Tim Berners Lee (http://www.ted.com/talks/tim_berners_lee_on_the_next_web.html)
- Linked Data, Structured Data on the Web (D. Wood, M. Zaidman, L. Ruth)
- http://www.web2society.com/webtrends/already-linked/
- http://en.wikipedia.org/wiki/Linked_data
- Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
- http://linkeddatabook.com/editions/1.0/
No comments:
Post a Comment