Linked Data

Linked data refers to structuring data on the Web with a standard technology (RDF) so that different data sources can be connected. It is considered the basis for getting the Semantic Web to work as defined.

Linked data is a set of best practices for publishing and connecting structured data on the Web. Key technologies that support Linked Data are URIs (a generic means to identify entities or concepts in the world), HTTP (a simple yet universal mechanism for retrieving resources, or descriptions of resources), and RDF (a generic graph-based data model with which to structure and link data that describes things in the world). (Source: LinkedData.org)

In a talk at the TED 2009 Conference, Tim Berners-Lee emphasized the importance of making structured data publicly available in an open format. In his note Linked Data – Design Issues, published on the w3.org website, he describes four rules that should be followed when publishing data on the Web:

  1. Use URIs as names for things;
  2. Use HTTP URIs so that people can look up those names;
  3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL);
  4. Include links to other URIs so that they can discover more things.
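
The four rules can be illustrated with a small, self-contained Python sketch. It is only a toy under stated assumptions: the WEB dictionary stands in for real HTTP lookups, every example.org URI is hypothetical, and dereference and discover are illustrative names, not part of any library.

```python
from collections import deque

FOAF = "http://xmlns.com/foaf/0.1/"

# Toy stand-in for the Web: each HTTP URI (rules 1 and 2) maps to the
# RDF-style (subject, predicate, object) triples that would be returned
# when the URI is looked up (rule 3). The example.org URIs are hypothetical.
WEB = {
    "http://example.org/people/alice": [
        ("http://example.org/people/alice", FOAF + "name", "Alice"),
        ("http://example.org/people/alice", FOAF + "knows",
         "http://example.org/people/bob"),
    ],
    "http://example.org/people/bob": [
        ("http://example.org/people/bob", FOAF + "name", "Bob"),
    ],
}


def dereference(uri):
    """Simulate an HTTP lookup: return the triples that describe `uri`."""
    return WEB.get(uri, [])


def discover(start_uri):
    """Follow links to other URIs (rule 4), breadth-first."""
    seen = []
    queue = deque([start_uri])
    while queue:
        uri = queue.popleft()
        if uri in seen:
            continue
        seen.append(uri)
        for _, _, obj in dereference(uri):
            # Only objects that are themselves known URIs can be followed.
            if obj in WEB:
                queue.append(obj)
    return seen


visited = discover("http://example.org/people/alice")
print(visited)
```

Starting from one URI, the sketch reaches the second description only because the first one links to it, which is the point of the fourth rule.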

Following these rules helps data be more easily connected and integrated with other data sources. Many documents suggest how to publish linked data. One of them, How to Publish Linked Data on the Web, describes practical recipes for publishing data on the Web. It gives an overview of publishing linked data and discusses questions such as "what should be returned as RDF description for a URI," "how to set RDF links to other data sources," and "how to serve information as linked data."
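
To make the idea of an RDF description and an RDF link a bit more concrete, here is a minimal sketch that serializes a few triples in the N-Triples syntax. It is not taken from the tutorial: the example.org URI is hypothetical, term and to_ntriples are illustrative helper names, and deciding "URI or literal" by looking at the prefix is a simplification. The owl:sameAs triple is one common way to link a local resource to a resource in another data set.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def term(value):
    """Render a term in N-Triples syntax: <uri> or "literal".

    Simplification: anything that starts with http:// or https:// is
    treated as a URI, everything else as a plain string literal.
    """
    if value.startswith(("http://", "https://")):
        return f"<{value}>"
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples, one statement per line."""
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)


# Description returned for a hypothetical local URI, including an RDF link
# (owl:sameAs) that points to another data source.
description = [
    ("http://example.org/cities/berlin", RDFS_LABEL, "Berlin"),
    ("http://example.org/cities/berlin", OWL_SAME_AS,
     "http://dbpedia.org/resource/Berlin"),
]

ntriples = to_ntriples(description)
print(ntriples)
```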

Reading or watching the following resources will help you understand why linked data matters and how to publish it.

Linked Data – Design Issues
http://www.w3.org/DesignIssues/LinkedData.html

How to Publish Linked Data on the Web
http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/

TED 2009 Conference – Tim Berners-Lee
http://www.ted.com/index.php/talks/lang/eng/tim_berners_lee_on_the_next_web.html

Linked Data – The Story So Far (2009)
Christian Bizer, Tom Heath, and Tim Berners-Lee
http://tomheath.com/papers/bizer-heath-berners-lee-ijswis-linked-data.pdf

Hermeneus: A Framework for Information Seeking and Retrieval

I have just published a paper about my doctoral research, coauthored with my advisors, Frederico Fonseca and Roberto Pacheco. The paper presents Hermeneus, a framework built to provide an interactive environment where users can develop their ideas while browsing the information and the concepts that represent it. Hermeneus is based on concepts from the areas of information retrieval, information seeking, ontology, and the hermeneutic circle. Hermeneus: A Framework for Information Seeking and Retrieval was published in the DataGramaZero journal. Below you can read the paper’s abstract; there is also a prototype with more information.

Even with the advancement of information retrieval research, extracting valuable information from information retrieval systems is still very time consuming. Common problems affecting users are uncertainty, deficient query definition, and poor systems interactivity. Based on such questions, we propose a framework to information seeking and retrieval called Hermeneus. We resorted to the hermeneutic circle to provide the principles of such a framework. In our implementation of the hermeneutic circle in an information retrieval system, users develop their ideas while browsing the information and the concepts that represent the information. We chose ontologies to implement the hermeneutic circle. Hermeneus works as an intermediary that facilitates the user to move from the initial state of information need to the goal state of resolution. Our framework intends to be the bridge between the user’s question and the answer to be found while she/he navigates in the ontology concepts and the instances of these concepts in a back and forth way.

Information retrieval, ontologies, and Web 3.0

The term “Web 2.0” has only just been coined and we are already creating another one, “Web 3.0.” Basically, Web 3.0 means a new layer of semantic content (the Semantic Web) that uses Web 2.0 technologies. Ora Lassila and James Hendler state in the paper “Embracing Web 3.0” that “semantic web is the symbiosis of web technologies and knowledge representation.” Web 3.0 means that Semantic Web methods, methodologies, and technologies can take advantage of Web 2.0 technologies and therefore provide more interactive and autonomous tools. Success depends on the knowledge representation (ontologies) and on powerful tools to manipulate such knowledge (e.g., information retrieval systems that use ontologies to understand the meaning of information and thus provide better answers).

David Teten, in a presentation at the “Web 2.0 NYC Conference 2007,” suggests some possible ideas to define Web 3.0: “The Web as a database; The Web + Artificial Intelligence; The Web as a space; and Space as a Web.” There can be different definitions for the same term, but it is interesting to pay attention to the examples used to describe the importance of the tools or services available in this new Web. Teten, for instance, describes “a hotel application that understands concepts, such as room, temperature, bed comfort, and hotel price and can distinguish between concepts, such as great, almost great, and mostly OK, to provide useful direct answers.” Clearly, the concepts described by Teten should be defined through ontologies, and an intelligent (ontology-based) information retrieval system can provide the required answers.

How important can information retrieval research be in this new “semantic era”? John Markoff gives another, very concrete example in his article “Entrepreneurs See a Web Guided by Common Sense” for The New York Times that shows the importance of information retrieval technology. He says “the Holy Grail for developers of the semantic Web is to build a system that can give a reasonable and complete response to a simple question like: ‘I am looking for a warm place to vacation and I have a budget of $3,000. Oh, and I have an 11-year-old child.’” Here, once more, we can see some concepts (warm place, vacation, budget, child) that could be defined by an ontology in order to enable information retrieval systems to provide more useful and intelligent answers. Do I still need to emphasize the importance of new information retrieval models that take advantage of ontologies in the Web 3.0 era? I don’t think so!
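
As a rough illustration of what "concepts defined by an ontology" could mean for the vacation question, the sketch below treats each concept as a machine-checkable condition. Everything in it is an assumption made for the example: the concept definitions (including the 24 °C threshold for "warm place"), the destination data, and the function name answer are all made up, and a real ontology would be far richer than a dictionary of predicates.

```python
# Hypothetical concept definitions: each concept is a condition that a
# system can evaluate against a destination and the user's query.
CONCEPTS = {
    "warm place": lambda dest, query: dest["avg_temp_c"] >= 24,
    "within budget": lambda dest, query: dest["price_usd"] <= query["budget_usd"],
    "suitable for child": lambda dest, query: dest["min_age"] <= query["child_age"],
}

# Made-up destinations, for illustration only.
DESTINATIONS = [
    {"name": "Beach resort", "avg_temp_c": 29, "price_usd": 2800, "min_age": 0},
    {"name": "Ski lodge", "avg_temp_c": -3, "price_usd": 2500, "min_age": 6},
    {"name": "Island cruise", "avg_temp_c": 27, "price_usd": 3600, "min_age": 0},
    {"name": "Desert spa", "avg_temp_c": 31, "price_usd": 2000, "min_age": 18},
]


def answer(query):
    """Return the destinations that satisfy every concept in CONCEPTS."""
    return [
        dest["name"]
        for dest in DESTINATIONS
        if all(check(dest, query) for check in CONCEPTS.values())
    ]


# "A warm place to vacation, a budget of $3,000, and an 11-year-old child."
result = answer({"budget_usd": 3000, "child_age": 11})
print(result)
```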

Social Information Retrieval

Two weeks ago I received an email describing an upcoming workshop on Social Information Retrieval (SIR): the Workshop on Social Information Retrieval in Technology-Enhanced Learning (SIRTEL’07). On the workshop’s web site, I found some topics of interest that are directly related to the comments I wrote in my previous post. According to SIRTEL’s web site:

"Social information retrieval (SIR) refers to a family of techniques that assist users in obtaining information to meet their information needs by harnessing the knowledge or experience of other users. Examples of SIR techniques include sharing of queries, collaborative filtering, social network analysis, social navigation, social bookmarking and the use of subjective relevance judgements such as tags, annotations, ratings and evaluations."

If we consider that users can assist others in a social environment, maybe SIR can help find a way to create information retrieval (IR) systems in which users have a more active role, as producers of information and even knowledge instead of just consumers. Looking for more information about SIR, I found a thesis on the subject. In his work, Sebastian Marius Kirsch describes two techniques for IR in social environments that improve search effectiveness through the incorporation of information about social networks and relationships into the information retrieval process. SIR addresses a new dimension in IR systems: people. According to the author, a SIR system is characterized by the presence of three entities: documents, queries, and individuals. Adding this new dimension directly affects the way we design IR systems, especially because this dimension (i.e., people) does not behave predictably.
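
The three entities (documents, queries, and individuals) can be sketched in a few lines of Python. This is not one of the techniques from Kirsch's thesis, only a minimal illustration under assumptions: the documents, the FRIENDS network, the word-overlap score, and the fixed boost value are all hypothetical.

```python
# Documents and their authors (made-up data).
DOCUMENTS = {
    "d1": {"text": "ontology based information retrieval", "author": "carol"},
    "d2": {"text": "information retrieval evaluation", "author": "dave"},
    "d3": {"text": "cooking with tomatoes", "author": "carol"},
}

# Individuals and the people they are connected to (made-up network).
FRIENDS = {"alice": {"carol"}, "bob": {"dave"}}


def text_score(query, text):
    """Number of query words that also occur in the document text."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def social_search(query, user, boost=1.0):
    """Rank matching documents, boosting those written by the user's friends."""
    results = []
    for doc_id, doc in DOCUMENTS.items():
        score = text_score(query, doc["text"])
        if score == 0:
            continue  # the document does not match the query at all
        if doc["author"] in FRIENDS.get(user, set()):
            score += boost
        results.append((score, doc_id))
    # Highest score first; ties are broken by document id.
    results.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in results]


print(social_search("information retrieval", "alice"))
print(social_search("information retrieval", "bob"))
```

The same query returns a different ranking for each person, which is what the third entity adds to the classical document/query picture.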

When applied specifically to the Internet, SIR goes by a different name: social search engine (SSE). Broadly speaking, an SSE uses the interactions or contributions of people to determine the relevance of documents on the Internet. Eurekster, for instance, implements some SSE concepts "that harness the knowledge and behavior of online communities to increase search relevance and value for site visitors, site publishers, and advertisers." According to ZDNet, "Eurekster takes the concept of social networking a step further. Instead of simply making connections between individuals, it helps people locate information that their friends and colleagues already find interesting. It also takes search engine technology to a new level by personalizing results." Collarity is another example of an SSE; it combines search terms, URLs, and the selections of users and visitors to rank search results based on users’ interests. On Collarity’s web site we can read: "Collarity is a search companion that understands your interest areas, preferences and search experiences. This information is distilled from the complex interaction of keywords and URLs. Collarity then connects you with the successful results of ‘like minded’ communities of searchers and provides contextually relevant results. At the same time, your experience is used to implicitly guide and help others."

Some research areas (e.g., SIR, SSE, information seeking, Web 2.0) show a concern with the role of people in the process of searching for information. Traditional IR models worry about documents, queries, recall, and precision; however, they are not concerned with human behavior or with how interactive tools can help users find the desired information more quickly. It is my opinion that IR systems, besides retrieving documents related to a query, should also provide tools that help users move from the state of information need to the goal state of resolution. In the post "Why research about search?" I wrote some comments about problems we still face when searching for information. Perhaps the perfect search is not possible; nevertheless, information retrieval systems should be developed to take advantage of social environments and interactive tools and thus stimulate people to engage in active information-seeking behavior, so that they not only satisfy their information needs but also extend their state of knowledge.
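
For readers who do not work with these measures every day: precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that were retrieved. A minimal sketch follows; the function name and the document ids are only illustrative.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = |retrieved ∩ relevant| / |retrieved|
    recall    = |retrieved ∩ relevant| / |relevant|
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# The system returned four documents; three documents are actually relevant.
precision, recall = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5"])
print(precision, recall)
```

Neither number says anything about how much effort the user needed to formulate the query, which is exactly the gap discussed above.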

Information retrieval and Web 2.0

I have been asking myself how information retrieval (IR) models and applications can take advantage of Web 2.0. One of the principles of Web 2.0, according to Tim O’Reilly, describes the Web as a platform: users can work with more interactive applications entirely through a browser. This has completely changed the role of users, from passive consumers to active producers. The new kinds of applications designed for Web 2.0 encourage users to participate, for instance, by giving their comments, expressing their opinions, interacting through social networks, forming virtual communities, and sharing their documents, pictures, movies, etc. But how can IR applications take advantage of users and this collaborative process in order to enhance retrieval mechanisms and provide more accurate results?

One technique that some applications use to categorize content collaboratively is known as folksonomy. Basically, users employ open-ended labels called tags to add explicit meaning to information and objects. This tagging process happens in a social environment that is usually shared and open. Taxonomies, by contrast, are hierarchical structures that describe such categories. It is my opinion that using taxonomies can impoverish this process, because in most cases knowledge (i.e., the semantic content, the tags) cannot be modeled hierarchically. However, this can be the beginning of a process in which information has more well-defined meaning and IR systems provide more interactive tools and more precise answers.
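
A folksonomy can be modeled very simply as a set of (user, resource, tag) assignments, with no hierarchy imposed on the tags. The sketch below is only an illustration: the class and method names, the users, and the resources are all hypothetical.

```python
from collections import Counter


class Folksonomy:
    """A flat collection of (user, resource, tag) assignments."""

    def __init__(self):
        self.assignments = set()

    def tag(self, user, resource, label):
        # Tags are open-ended; we only normalize case and whitespace.
        self.assignments.add((user, resource, label.strip().lower()))

    def top_tags(self, resource, n=3):
        """Most frequently used tags for a resource, as (tag, count) pairs."""
        counts = Counter(t for _, r, t in self.assignments if r == resource)
        return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]

    def search(self, label):
        """Resources carrying a tag, most frequently tagged first."""
        label = label.strip().lower()
        counts = Counter(r for _, r, t in self.assignments if t == label)
        return [r for r, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]


folk = Folksonomy()
folk.tag("ana", "paper1", "Ontology")
folk.tag("ben", "paper1", "ontology")
folk.tag("ben", "paper1", "IR")
folk.tag("ana", "paper2", "ir")

print(folk.top_tags("paper1"))
print(folk.search("IR"))
```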

Because users create and publish new content using tools developed for Web 2.0, folksonomy can work. Nevertheless, we still have a problem with the large amount of public information that is not categorized. The big challenge is how to develop an application that helps users satisfy their information needs and also provides mechanisms that encourage them, for instance, to categorize such information. For example, I find a paper about IR and ontology, and after reading it I describe the paper in a specific structure that helps IR systems provide better answers to the next queries, issued by new users. In my opinion there can be a way to create an interactive IR system that uses ontologies instead of taxonomies and helps users tag content without requiring much effort. I have not developed this idea yet…

Ontology and Information Retrieval

The information retrieval (IR) field still requires research in order to create models capable of providing more precise answers. Trying to enhance IR systems, several researchers have been using ontologies because of their capacity to be “understood” by computer systems (a definition of ontology can be read in my previous post). There are some systems available, but we still do not have the killer application.

Using an ontology means narrowing the scope of research to a specific domain, i.e., an ontology represents the knowledge about a domain. This corresponds to the view of the computer science community. Philosophers, however, believe that there is only one ontology, which should describe the one possible reality. This philosophical view, in my opinion, seems impractical, at least in my area.

How can we define and represent the only possible reality? Even if we think about a simple domain, we can clearly see that (domain) specialists may have different opinions about concepts and about how to transcribe them into an ontology; imagine creating an ontology to represent all domains. But if we do not develop practical applications and build our ontologies, we will discuss this forever and will not be able to verify whether ontologies are useful for practical purposes.

Again, we still do not have the killer application in IR using ontologies, but at least there are many researchers working on it by developing practical IR systems. Many papers describe the (great) importance of ontology technology for enhancing IR systems. All the work I have read reports advantages in using ontologies over the classical approaches, including my current work (to be available soon). Imagine how many people could benefit if this kind of application were available for any domain! Let’s keep working…

Why research about search?

There is a simple answer to this question: we still do not have the perfect search! Some researchers, though, suggest that such a thing is impossible because there will always be the user, and specifying an information need is not an easy task. This means that even if we develop a system that provides precise answers, we still do not have the “perfect search” when users do not know how to describe their needs. For example, someone is trying to find a document that s/he knows exists on his/her computer, but s/he is not able to describe the document, its content, or its location. Therefore, uncertainty or incompleteness is a variable that should be considered in information retrieval (IR) research.

Helping users understand their information needs is a goal that IR systems must pursue. Researchers are combining techniques from different areas in order to create more interactive IR systems. If we accept that one of the main problems in retrieving the desired information is related to the user’s state of incompleteness, we first need to develop IR systems that help users understand their needs. Information seeking, for instance, is one of these areas. It focuses on user behavior and on how the user searches for information (such information can be related to human and technological contexts). Basically, the main difference between information seeking and IR is that the former is more human-oriented whilst the latter is technology-oriented. There is much work in information seeking that can help IR research develop tools and mechanisms that support users during the whole search process.

Another research area that has been used to enhance IR systems is ontologies, mainly after the advent of the Semantic Web. There are many definitions of ontology in computer science and artificial intelligence (ontology is also a term used in philosophy, but that is not the focus here). We can think of an ontology as a structure described in a formal language that allows computer systems to understand its (semantic) content. This simple definition suggests that IR research can take advantage of ontologies to implement more intelligent systems. The premise here is: if a system is able to understand a document’s content, it will be able to provide more precise answers.
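
One simple way to picture this premise is query expansion: the system uses the relations between concepts to match documents that never mention the query word itself. The sketch below is only a toy under stated assumptions. The ONTOLOGY dictionary (a concept mapped to its narrower concepts), the documents, and the function names are all hypothetical, and real ontologies are described in formal languages rather than in a Python dictionary.

```python
# Hypothetical mini-ontology: concept -> narrower concepts.
ONTOLOGY = {
    "vehicle": ["car", "bicycle"],
    "car": ["sedan", "hatchback"],
}

# Made-up documents.
DOCUMENTS = {
    "d1": "used sedan for sale",
    "d2": "bicycle repair guide",
    "d3": "vehicle registration rules",
    "d4": "garden furniture",
}


def expand(concept):
    """Return the concept followed by all of its narrower concepts."""
    terms = [concept]
    for narrower in ONTOLOGY.get(concept, []):
        terms.extend(expand(narrower))
    return terms


def keyword_search(word):
    """Plain keyword matching: the word must occur in the document."""
    return sorted(d for d, text in DOCUMENTS.items() if word in text.split())


def ontology_search(concept):
    """Match documents that mention the concept or any narrower concept."""
    terms = set(expand(concept))
    return sorted(d for d, text in DOCUMENTS.items() if terms & set(text.split()))


print(keyword_search("vehicle"))
print(ontology_search("vehicle"))
```

With plain keyword matching only the document containing the word "vehicle" is found; with the expanded query, the documents about a sedan and a bicycle are found as well.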

All of the above subjects are part of my current research, and I hope to present here in the near future how these techniques can be used together. Search research still requires a lot of work, mainly because the amount of information is increasing exponentially (and I am not talking about just the Internet; it is a reality even on our personal computers). In my next posts I will try to show how new technologies are being developed to help users understand their information needs and then find the desired information more quickly.

Welcome to my blog!

Welcome to my blog. I intend to use this space to write about general topics related to my research subjects. I am currently interested in information retrieval, information seeking, ontologies, the Semantic Web, and Web 2.0 and 3.0. My current work is related to information retrieval systems that can take advantage of ontologies and information seeking research to help users during the search process. If I had to choose one word to represent my work, I would say “Search”. All the subjects I intend to write about will be closely related to search, semantics, and the Web. I hope you enjoy my personal opinions.