
Information retrieval, ontologies, and Web 3.0

August 4, 2007

The term “Web 2.0” was just coined and we are already creating another one, called “Web 3.0.” Basically, Web 3.0 means a new layer of semantic content (Semantic Web) that uses Web 2.0 technologies. Ora Lassila and James Hendler state in the paper “Embracing Web 3.0” that “semantic web is the symbiosis of web technologies and knowledge representation.” Web 3.0 means that semantic web methods, methodologies, and technologies can take advantage of Web 2.0 technologies and therefore provide more interactive and autonomous tools. Success depends on the knowledge representation (ontologies) and on powerful tools to manipulate such knowledge (e.g., information retrieval systems that use ontologies to understand the meaning of information and thus provide better answers).

David Teten, in a presentation at “Web 2.0 NYC Conference 2007,” suggests some possible ideas to define Web 3.0: “The Web as a database; The Web + Artificial Intelligence; The Web as a space; and Space as a Web.” There can be different definitions for the same term, but it is interesting to pay attention to the examples used to describe the importance of the tools or services available in this new Web. Teten, for instance, describes “a hotel application that understands concepts, such as room, temperature, bed comfort, and hotel price and can distinguish between concepts, such as great, almost great, and mostly OK, to provide useful direct answers.” Clearly, the concepts described by Teten should be defined through ontologies, and an intelligent (ontology-based) information retrieval system can provide the required answers.

How important can information retrieval research be in this new “semantic era”? John Markoff gives another, very objective example that shows the importance of information retrieval technology in his article “Entrepreneurs See a Web Guided by Common Sense” for The New York Times. He says “the Holy Grail for developers of the semantic Web is to build a system that can give a reasonable and complete response to a simple question like: ‘I am looking for a warm place to vacation and I have a budget of $3,000. Oh, and I have an 11-year-old child.’” Here, once more, we can see some concepts - warm place, vacation, budget, child - that could be defined by an ontology in order to enable information retrieval systems to provide more useful and intelligent answers. Do I still need to emphasize the importance of new information retrieval models that take advantage of ontologies in the Web 3.0 era? I don’t think so!

Research, Information Retrieval, Ontology - 0 Comments

Social Information Retrieval

July 29, 2007

Two weeks ago I received an email describing an upcoming workshop about Social Information Retrieval (SIR) - Workshop on Social Information Retrieval in Technology-Enhanced Learning (SIRTEL’07). On the workshop’s web site, I found some topics of interest that are directly related to the comments I wrote in my previous post. According to SIRTEL’s web site:

"Social information retrieval (SIR) refers to a family of techniques that assist users in obtaining information to meet their information needs by harnessing the knowledge or experience of other users. Examples of SIR techniques include sharing of queries, collaborative filtering, social network analysis, social navigation, social bookmarking and the use of subjective relevance judgements such as tags, annotations, ratings and evaluations."

If we consider that users can assist others in a social environment, maybe SIR can help find a way to create information retrieval (IR) systems where users can have a more active role - as producers of information and even knowledge instead of just consumers. Looking for more information about SIR, I found a thesis about this subject. In his work, Sebastian Marius Kirsch describes two techniques for IR in social environments that improve search effectiveness through the incorporation of information about social networks and relationships into the information retrieval process. SIR addresses a new dimension in IR systems - people. According to the author, a SIR system must be characterized by the presence of three entities: documents, queries, and individuals. Therefore, inserting a new dimension directly affects the way we design IR systems, mainly when this dimension (i.e., people) does not have a predictable behavior.
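To make the three entities concrete, here is a minimal sketch in Python. It is not one of the techniques from Kirsch's thesis; it only assumes, for illustration, that a document's rank can mix a simple term-count score with the number of the searcher's friends who endorsed that document. All names (`social_search`, `friends`, `endorsements`, `alpha`) and the sample data are hypothetical.

```python
from collections import Counter

def text_score(query, text):
    """Sum the occurrences of each query term in the document text."""
    words = Counter(text.lower().split())
    return sum(words[term] for term in query.lower().split())

def social_score(user, doc_id, friends, endorsements):
    """Count how many of the user's friends endorsed the document."""
    return sum(1 for f in friends.get(user, ()) if doc_id in endorsements.get(f, ()))

def social_search(query, user, docs, friends, endorsements, alpha=0.5):
    """Rank matching documents by text score plus alpha times social score."""
    ranked = []
    for doc_id, text in docs.items():
        t = text_score(query, text)
        if t == 0:
            continue  # documents that do not match the query are never returned
        s = social_score(user, doc_id, friends, endorsements)
        ranked.append((t + alpha * s, doc_id))
    # Highest score first; ties are broken by document id for a stable order.
    ranked.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in ranked]

# Hypothetical data: documents, individuals, and their endorsements.
docs = {
    "d1": "ontology based information retrieval",
    "d2": "information retrieval with social networks",
    "d3": "cooking recipes",
}
friends = {"ana": {"bob", "carl"}}
endorsements = {"bob": {"d2"}, "carl": {"d2"}}

print(social_search("information retrieval", "ana", docs, friends, endorsements))
```

The same query gives a different ranking for a user with no friends in the network, which is exactly the "people" dimension: the result depends on who is asking, not only on the documents and the query.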

When applied specifically to the Internet, SIR goes by a different name: Social Search Engine (SSE). Broadly speaking, an SSE uses the interactions or contributions of people to determine the relevance of documents on the Internet. Eurekster, for instance, implements some concepts defined by SSE "that harness the knowledge and behavior of online communities to increase search relevance and value for site visitors, site publishers, and advertisers." According to ZDNet, "Eurekster takes the concept of social networking a step further. Instead of simply making connections between individuals, it helps people locate information that their friends and colleagues already find interesting. It also takes search engine technology to a new level by personalizing results." Collarity is another example of an SSE; it combines search terms, URLs, and the selections of users and visitors to rank search results based on users’ interests. On Collarity’s web site we can read: "Collarity is a search companion that understands your interest areas, preferences and search experiences. This information is distilled from the complex interaction of keywords and URLs. Collarity then connects you with the successful results of ‘like minded’ communities of searchers and provides contextually relevant results. At the same time, your experience is used to implicitly guide and help others."

Some research areas (e.g., SIR, SSE, information seeking, Web 2.0) show a preoccupation with people in the process of searching for information. Traditional IR models worry about documents, queries, recall, and precision; however, they are not concerned with human behavior and how interactive tools can help users find the desired information more quickly. It is my opinion that IR systems, besides retrieving documents related to a query, should also provide tools that help users move from the state of information need to the goal state of resolution. In the post "Why research about search?" I wrote some comments about problems we still face when searching for information. Perhaps the perfect search is not possible; nevertheless, information retrieval systems should be developed to take advantage of social environments and interactive tools and thus stimulate people to engage in active information-seeking behavior, not only satisfying their information needs but also extending their state of knowledge.

Research, Information Retrieval - 0 Comments

Information retrieval and Web 2.0

July 10, 2007

I have been asking myself how information retrieval (IR) models/applications can take advantage of Web 2.0. One of the principles of Web 2.0, according to Tim O’Reilly, describes the Web as a platform - users can use more interactive applications entirely through a browser. This completely changed the role of users from passive consumers to active producers. These new kinds of applications designed for Web 2.0 encourage users to participate, for instance, by giving their comments, expressing their opinions, interacting through social networks, forming virtual communities, and sharing their documents, pictures, movies, etc. But how can IR applications take advantage of users and this collaborative process in order to enhance retrieval mechanisms and provide more accurate results?

One of the techniques that has been used in some applications to categorize content collaboratively is known as folksonomy. Basically, users employ open-ended labels called tags to add explicit meaning to information and objects. This process of tagging happens in a social environment that is usually shared and open. Taxonomies are the structures that describe such categories. It is my opinion that using taxonomies can impoverish this process, because in most cases knowledge (i.e., the semantic content - tags) cannot be modeled hierarchically. However, this can be the beginning of a process where information can have a more well-defined meaning and IR systems can provide more interactive tools and precise answers.
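As a rough illustration of how tags could feed a retrieval mechanism, the following Python sketch keeps a flat (non-hierarchical) index of tags. The class name `Folksonomy`, its methods, and the ranking rule (more distinct taggers first) are assumptions made for this example, not a description of any existing tagging service.

```python
from collections import defaultdict

class Folksonomy:
    """A flat tag index: tag -> resource -> set of users who applied the tag."""

    def __init__(self):
        self._index = defaultdict(lambda: defaultdict(set))

    def tag(self, user, resource, *tags):
        """Record that a user labeled a resource with one or more free-form tags."""
        for t in tags:
            # Tags are normalized so that "Ontology" and "ontology" are the same label.
            self._index[t.strip().lower()][resource].add(user)

    def search(self, tag):
        """Return resources carrying the tag, most widely tagged first."""
        by_resource = self._index.get(tag.strip().lower(), {})
        return sorted(by_resource, key=lambda r: (-len(by_resource[r]), r))

# Hypothetical users and resources.
f = Folksonomy()
f.tag("ana", "paper-a", "ontology", "IR")
f.tag("bob", "paper-a", "Ontology")
f.tag("bob", "paper-b", "ontology")

print(f.search("ontology"))
print(f.search("ir"))
```

Note that nothing here relates one tag to another; that missing structure is where a taxonomy, or an ontology, would come in.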

Because users create and publish new content using tools developed for Web 2.0, folksonomy can work. Nevertheless, we still have a problem with the amount of publicly available information that is not categorized. The big challenge is how to develop an application that can help users satisfy their information needs and that provides mechanisms to encourage users, for instance, to categorize such information. For example, I find a paper about IR and ontology, and after reading it I describe this paper in a specific structure that helps IR systems provide better answers to later queries from new users. In my opinion there can be a way to create an interactive IR system that uses ontologies instead of taxonomies and helps users tag content without requiring hard work. I have not developed this idea yet…

Research, Information Retrieval, Ontology - 4 Comments

Ontology and Information Retrieval

June 30, 2007

The information retrieval (IR) field still requires research in order to create models capable of providing more precise answers. Trying to enhance IR systems, several researchers have been using ontologies due to their capacity to be “understood” by computer systems (a definition of ontology can be read in my previous post). There are some systems available but we still do not have the killer application.

Using an ontology means narrowing the scope of research to a specific domain, i.e., an ontology represents the knowledge about a domain. This corresponds to the view of the computer science community. However, philosophers believe that there is only one ontology, which should describe the only possible reality. This philosophical view, in my opinion, seems to be impractical, at least in my area.

How can we define and represent the only possible reality? If we think about a simple domain, we can clearly see that (domain) specialists may have different opinions about concepts and how to transcribe them into an ontology - imagine creating an ontology to represent all domains. But if we do not develop practical applications and build our ontologies, we will discuss it forever and will not be able to verify whether ontologies can be useful or not - for practical purposes.

Again, we still do not have the killer application in IR using ontologies, but at least there are many researchers working on it - developing practical IR systems. Many papers describe the (great) importance of using ontology technology to enhance IR systems. All the works I have read report advantages in using ontologies over the classical approaches - including my current work (to be available soon). Imagine how many people could take advantage of that if we had this kind of application available for any domain! Let’s keep working…

Research, Information Retrieval, Ontology - 0 Comments

Why research about search?

June 25, 2007

There is a simple answer to this question: we still do not have the perfect search! There are some researchers, though, who suggest that such a thing is impossible because there will always be the user - specifying an information need is not an easy task. This means that even if we develop a system that provides precise answers, we still do not have the “perfect search” if the user does not know how to describe their needs. For example, someone is trying to find a document that s/he knows exists on his/her computer but s/he is not able to describe the document, its content, or its location. Therefore, uncertainty or incompleteness is a variable that should be considered in information retrieval (IR) research.

Helping users understand their information needs is a goal that IR systems must pursue. Researchers are combining techniques from different areas so as to create more interactive IR systems. If we think that one of the main problems in retrieving the desired information is related to the user’s state of incompleteness, we first need to develop IR systems that help users understand their needs. Information seeking, for instance, is one of the areas researchers draw on. It focuses on the user’s behavior and how s/he searches for information (such information can be related to human and technological contexts). Basically, the main difference between information seeking and IR is that the former is more human-oriented whilst the latter is technology-oriented. There are many works in information seeking that can help IR research develop tools/mechanisms that support users during the whole search process.

Another research area that has been used in order to enhance IR systems is ontologies - mainly after the advent of the Semantic Web. There are many definitions of ontology in computer science - artificial intelligence (ontology is also a term used in philosophy - but this is not the focus here). We can think of an ontology as a structure described in a formal language that allows computer systems to understand its (semantic) content. This simple definition suggests that IR research can take advantage of this and implement more intelligent systems. The premise here is: if a system is able to understand a document’s content, it will be able to provide more precise answers.
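One simple way to picture this premise is query expansion: the system uses the relations in an ontology to match documents that never mention the query term itself. The Python sketch below assumes a toy ontology stored as a dictionary from a concept to its narrower concepts; the concepts, the documents, and the function names `expand` and `search` are all hypothetical, and a real ontology language would express far more than this single relation.

```python
# Toy ontology (an assumption for this example): concept -> narrower concepts.
ONTOLOGY = {
    "vehicle": ["car", "bicycle"],
    "car": ["sedan"],
}

def expand(term, ontology):
    """Return the term together with every narrower concept reachable from it."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(ontology.get(current, []))
    return seen

def search(term, docs, ontology):
    """Return ids of documents containing the term or any narrower concept."""
    terms = expand(term, ontology)
    return sorted(d for d, text in docs.items() if terms & set(text.lower().split()))

docs = {
    "d1": "a red sedan for sale",
    "d2": "bicycle repair tips",
    "d3": "fresh bread recipe",
}

print(search("vehicle", docs, ONTOLOGY))  # uses the ontology
print(search("vehicle", docs, {}))        # plain keyword match, no ontology
```

With the ontology, a search for "vehicle" reaches the documents about a sedan and a bicycle; without it, the same keyword matches nothing, because no document contains the word "vehicle".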

All of the above subjects are part of my current research and I hope to present here, in the near future, how these techniques can be used together. Search research still requires a lot of work, mainly because the amount of information is increasing exponentially (and I am not talking about just the Internet - even on our personal computers it is a reality). In my next posts I will try to show how new technologies are being developed in order to help users understand their information needs and then find the desired information more quickly.

Research - 0 Comments

Welcome to my blog!

Welcome to my blog. I intend to use this space to write about general topics related to my research subjects. I am currently interested in information retrieval, information seeking, ontologies, and the semantic web. My current work is related to information retrieval systems that can take advantage of ontologies and information seeking research so as to help users during the search process. If I had to choose one word to represent my work, I would say “Search”. All the subjects I intend to write about will have a close relationship with the search subject. I hope you enjoy my personal opinions.

Diverse - 0 Comments