Saturday, November 12, 2005

The Database Community and the Semantic Web

The PODS/SIGMOD/VLDB database community is developing an interest in the Semantic Web. Enrico Franconi will give an invited tutorial on the Semantic Web at the PODS 2006 conference. Also, the article "From Databases to Dataspaces: A New Abstraction for Information Management" by Michael Franklin, Alon Halevy and David Maier (SIGMOD Record, Dec. 2005) contains the following quote in section 5.1 (Relationships to Other Field): "Recent developments in the field of knowledge representation (and the Semantic Web) offer two main benefits as we try to make sense of heterogeneous collections of data in a dataspace: simple but useful formalisms for representing ontologies, and the concept of URI (uniform resource identifiers) as a mechanism for referring to global constants on which there exists some agreement among multiple data providers."
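To make the second benefit in that quote concrete, here is a minimal sketch (my own illustration, not taken from the paper) of how a shared URI lets statements from two independent providers be combined without any prior schema agreement. All URIs, provider names and values below are hypothetical.

```python
from collections import defaultdict

# Hypothetical URIs acting as "global constants" that providers agree on.
LOWELL = "http://example.org/place/Lowell_MA"
BOSTON = "http://example.org/place/Boston_MA"
POPULATION = "http://example.org/prop/population"
STATE = "http://example.org/prop/state"

# Each provider publishes (subject, property, value) statements on its own.
# The values are made up for illustration.
provider_a = [
    (LOWELL, POPULATION, 100000),
]
provider_b = [
    (LOWELL, STATE, "Massachusetts"),
    (BOSTON, STATE, "Massachusetts"),
]


def merge_by_uri(*sources):
    """Group statements from all sources by their subject URI.

    A later statement with the same subject and property overwrites
    an earlier one; a real system would need a conflict policy.
    """
    merged = defaultdict(dict)
    for source in sources:
        for subject, prop, value in source:
            merged[subject][prop] = value
    return dict(merged)


merged = merge_by_uri(provider_a, provider_b)
for subject, props in merged.items():
    print(subject, props)
```

Because both providers use the same identifier for Lowell, their statements end up in one record with no mapping step. Without shared identifiers, the same merge would first require matching "Lowell", "Lowell, MA" and similar strings against each other.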

It is nice that these developments are acknowledged in the database community. However, the database community is very heavily invested in the XML stack, and this paper is no exception. So I am curious how the database community plans to integrate ontologies and URIs into the XML stack and, at the same time, reach global consensus on that integration when there is already an alternative stack (based on RDF) that builds on URIs and ontologies. Interestingly enough, query processing and data management questions relating to the RDF stack have so far been ignored by the database community (with a few notable exceptions). As a result, data management solutions for RDF, including query languages, are mostly developed inside the Semantic Web community without much involvement from database people. But maybe this will change now.

In that respect it is also insightful to read the transcripts of the Lowell Database Research Self-Assessment Meeting (May 2003), whose conclusions were recently published in the Communications of the ACM, since they show how senior database researchers understand their field in relation to the Semantic Web. Here are some extracts:

  • Bruce Croft - IR & structured data.
    semantic web - "if you made the web a database" - this is make the web into a knowledge base and that won't happen - we've had a debate for decades about manual vs. automatic representations of what documents "mean" and both work better than either one but creating the manual versions is very hard. That's the lesson from the IR work
    go for knowledge or statistics?
  • Ullman - Re semantic web - you talk about semantics but when you have to do something you do syntax. If you take the temperature in Lowell thing you ought to be able just to say "temperature Lowell" - How much more is there to do? Crawlers are bad at this because it is timely. History in Lowell would work better on Google. I’m curious as to what you think is the advantage of focusing on deep understanding rather than giving people tools to use?
  • Widom - When did you add semantic web? I'm not responsible for that.
  • Abiteboul - All this is syntax. Makes Ullman happy; the most fundamental difference from relational DB to web is that you don't know the semantics.

These statements indicate to me that senior database researchers are mostly not interested in the Semantic Web. However, the final report then has the following paragraph (in section 3.11: New User Interfaces):

"Perhaps most interesting is the research opportunities suggested by the term “semantic Web.” While it may be unclear what the concept truly entails, much of the recent work has centered on “ontologies.” An ontology characterizes a field or domain of discourse by identifying concepts and relationships between them, usually in a formal language. We mentioned in Section 2.2 how this work may support information integration, since a fundamental problem in that area is the inability to combine databases that at a deep level are talking about the same thing, but do so in different terminology. Work on ontologies may likewise enable users of databases and other resources to use speech or natural language to query in their own terminology. The database community should be looking for opportunities to exploit these developments in future database management systems."

This paragraph indicates some interest, although the section does not acknowledge that the Semantic Web is built on the RDF stack of technologies, and it treats the results of the Semantic Web as relevant only to user interfaces.