Information Retrieval Agent

 

This agent uses text information retrieval techniques to find the documents most similar to a query, or to find similarities between documents. A document can be a URL or a local file on the machine where the agent is running.

An information retrieval ontology was defined for this agent. Its terms are specific to information retrieval, but some of them might be of interest to other agents. Further discussion will be needed to settle on a common ontology.
For this agent a "document" is an Identifier (interface) in the Dublin Core sense. In MAS it can be a URLIdentifier (term) or an ISBNIdentifier (term); only the first is implemented.

Before using the agent to retrieve information, the author/user must tell the agent the context of the document. In IR, the similarity and relevance of a document are mostly determined against other documents. So a Collection (term) of documents, or rather of Identifiers, must be defined with InCollection (predicate).
For similarity matching the agent pre-processes the information, since it would be too expensive to calculate similarity/relevance in real time. To do this, the IR Agent builds indexes that it can use later on. So the author/user, after selecting all the Identifiers and building a Collection, must create an AtomicIndex (term) using IsCollectionIndexed (predicate). To add or remove individual Identifiers afterwards, the author/user can use IsIdentifierIndexed (predicate). Both predicates are implementations of IsIndexed (interface).
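
The collect-then-index workflow can be illustrated with a minimal, self-contained sketch. It does not use the MAS classes: DocCollection, SimpleIndex and their methods are hypothetical stand-ins named after the ontology terms, and storing raw term counts per Identifier is an assumption, since this page does not say what the agent's indexes contain.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch only: these are NOT the MAS ontology classes.
public class IndexingSketch {

    /** Stand-in for a Collection: identifier string -> document text. */
    static class DocCollection {
        final Map<String, String> documents = new LinkedHashMap<>();

        // Plays the role of the InCollection predicate.
        void add(String identifier, String text) {
            documents.put(identifier, text);
        }
    }

    /** Stand-in for an AtomicIndex: identifier -> term counts (assumed content). */
    static class SimpleIndex {
        final Map<String, Map<String, Integer>> termFrequencies = new HashMap<>();

        // Plays the role of IsIdentifierIndexed: index or re-index one document.
        void indexIdentifier(String identifier, String text) {
            Map<String, Integer> counts = new HashMap<>();
            for (String token : text.toLowerCase().split("[^a-z0-9]+")) {
                if (!token.isEmpty()) {
                    counts.merge(token, 1, Integer::sum);
                }
            }
            termFrequencies.put(identifier, counts);
        }

        // Plays the role of IsCollectionIndexed: index every document at once.
        void indexCollection(DocCollection collection) {
            collection.documents.forEach(this::indexIdentifier);
        }

        // Removing a single Identifier from the index.
        void removeIdentifier(String identifier) {
            termFrequencies.remove(identifier);
        }
    }

    public static void main(String[] args) {
        DocCollection collection = new DocCollection();
        collection.add("http://example.org/a", "Agents index documents. Agents retrieve documents.");
        collection.add("http://example.org/b", "A local file about cooking.");

        SimpleIndex index = new SimpleIndex();
        index.indexCollection(collection);   // pre-processing step, done once
        index.removeIdentifier("http://example.org/b");

        System.out.println(index.termFrequencies);
    }
}
```

The point of the sketch is the split between the two steps: the expensive tokenising and counting happens once, when the index is built, so that later queries only read the stored counts.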

After this, the IR Agent can be used to retrieve information.
The predicate that will probably be used most often is MostSimilar (predicate). With it we can find similarity/relevance by specifying a Query (interface), which can be a String, StringQuery (term), or an Identifier, IdentifierQuery (term), and an Index (interface), which can be a single index, AtomicIndex (term), or a collection of indexes, CompoundIndex (term). A CompoundIndex is built with ComposedOf (predicate). The results are retrieved by inspecting Relevant (term), which associates each Identifier with a relevance/similarity value and a ranking.
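
As an illustration of what a MostSimilar lookup might compute, here is a small self-contained sketch. It is not the agent's implementation: the page does not say which similarity measure is used, so cosine similarity over raw term counts is an assumption, and MostSimilarSketch, termCounts, cosine and mostSimilar are hypothetical names. The nested Relevant class only mirrors the shape of the Relevant term (identifier, relevance value, ranking), and a list of per-index maps stands in for a CompoundIndex.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only: these are NOT the MAS ontology classes.
public class MostSimilarSketch {

    /** Mirrors the shape of the Relevant term: identifier, relevance, ranking. */
    static class Relevant {
        final String identifier;
        final double relevance;
        final int rank;

        Relevant(String identifier, double relevance, int rank) {
            this.identifier = identifier;
            this.relevance = relevance;
            this.rank = rank;
        }
    }

    /** Turns text into term counts; used for documents and for a string query. */
    static Map<String, Integer> termCounts(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    static double norm(Map<String, Integer> v) {
        double sum = 0;
        for (int c : v.values()) {
            sum += (double) c * c;
        }
        return Math.sqrt(sum);
    }

    /** Cosine similarity between two term-count vectors (assumed measure). */
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0);
        }
        double na = norm(a);
        double nb = norm(b);
        return (na == 0 || nb == 0) ? 0 : dot / (na * nb);
    }

    /** Scores every document in every index and returns the top results, ranked from 1. */
    static List<Relevant> mostSimilar(List<Map<String, Map<String, Integer>>> indexes,
                                      Map<String, Integer> query, int limit) {
        List<Map.Entry<String, Double>> scored = new ArrayList<>();
        for (Map<String, Map<String, Integer>> index : indexes) {
            for (Map.Entry<String, Map<String, Integer>> doc : index.entrySet()) {
                scored.add(new AbstractMap.SimpleEntry<>(doc.getKey(), cosine(query, doc.getValue())));
            }
        }
        scored.sort(Map.Entry.<String, Double>comparingByValue().reversed());

        List<Relevant> results = new ArrayList<>();
        for (int i = 0; i < Math.min(limit, scored.size()); i++) {
            results.add(new Relevant(scored.get(i).getKey(), scored.get(i).getValue(), i + 1));
        }
        return results;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Integer>> index = new HashMap<>();
        index.put("doc:a", termCounts("information retrieval agents index documents"));
        index.put("doc:b", termCounts("cooking recipes for pasta"));

        // A string query; an identifier query would pass index.get("doc:a") instead.
        List<Relevant> results =
                mostSimilar(Collections.singletonList(index), termCounts("retrieval of documents"), 10);
        for (Relevant r : results) {
            System.out.println(r.rank + " " + r.identifier + " " + r.relevance);
        }
    }
}
```

In this sketch a query by Identifier needs no extra machinery: the document's own stored term counts are passed in as the query vector, which is one plausible reading of how IdentifierQuery could relate to StringQuery.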

Terms:

public class Collection extends Term implements Identifier (StringLiteral)

public interface Index extends Identifier
public class AtomicIndex extends Term implements Index (StringLiteral)
public class CompoundIndex extends Term implements Index (StringLiteral)

public interface Query extends Identifier
public class StringQuery extends Term implements Query (StringLiteral)
public class IdentifierQuery extends Term implements Query (Identifier)

public class Relevant extends Term (Identifier, FloatLiteral, IntegerLiteral)

 

Predicates:

public class MostSimilar extends Term implements Predicate (Index, Query, Relevant)

public interface IsIndexed extends Predicate
public class IsIdentifierIndexed extends Term implements IsIndexed (Identifier, Index)
public class IsCollectionIndexed extends Term implements IsIndexed (Collection, Index)

public class ComposedOf extends Term implements Predicate (CompoundIndex, Index)

public class InCollection extends Term implements Predicate (Collection, Identifier)

 

For now, the only operation the agent performs is the pre-processing of the information, that is, the IsIndexed predicates.
MostSimilar, perhaps the most useful predicate of this agent, is not yet functional, but support will come soon. Please check this web page later for further news.

 

mas.users.rm95r.ontology.*;

mas.users.rm95r.agents.ir.IRAgent;
mas.users.rm95r.agents.ir.IRClient;

 

 


Rui Miguel Neto Marinheiro

Multimedia Research Group
Department of Electronics & Computer Science,
The University of Southampton,
Highfield,
Southampton,
SO17 1BJ
United Kingdom



Telephone: +44 1703 595415
Fax: +44 1703 592865
Email: rm95r@ecs.soton.ac.uk
Url: http://www.ecs.soton.ac.uk/~rm95r/

 

" Was it worth while?
It is worth while, all,
if the soul is not small.
" - Fernando Pessoa