Information Retrieval on the Internet


What is SPIRIT?

The main aim of the project is to create tools and techniques that help people find information relating to specified geographical locations. The intention is to meet the needs of two broad groups of people. The first group consists of those who wish to find information about a topic of interest, or about a phenomenon that occurs in, or is associated with, a particular place. They might be interested in services such as shops, garages, museums or sporting facilities. Alternatively, they might be interested in documents or images that refer to some aspect of the history, culture or environment of a place of interest.

The second group consists of those searching for some type of digital geo-information, such as maps or terrain models, relating to some aspect of the environment of a place or region. This latter group are likely to be users of geographical information systems (GIS) technology that can exploit geographical datasets, and as such they form a much smaller group than the former.

In both cases it is imperative that the search tools they use are flexible with regard to textual and graphical modes of input, and intelligent in the sense of recognising place names and the spatial terminology employed to describe location. Thus if the user specifies a place name, the system should be aware of multiple uses of the name (several places may share it), alternative informal names for the same place, and the administrative and topographic structure of the area to which it refers. The system must also be able to interpret spatial qualifiers such as inside, near, north of and between.
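Each of these qualifiers can be given a crisp geometric reading once a place has a footprint. The following is a minimal sketch, assuming every place is represented by a rectangular footprint in a projected coordinate system measured in kilometres; the class `BBox`, the four functions and the 10 km and 5 km thresholds are hypothetical choices for illustration, not part of the SPIRIT design.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Hypothetical rectangular footprint in projected coordinates (x = easting, y = northing, km)."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    def centre(self) -> tuple[float, float]:
        return ((self.min_x + self.max_x) / 2, (self.min_y + self.max_y) / 2)


def inside(p: tuple[float, float], region: BBox) -> bool:
    """True if point p falls within the region's footprint."""
    x, y = p
    return region.min_x <= x <= region.max_x and region.min_y <= y <= region.max_y


def near(p: tuple[float, float], region: BBox, threshold_km: float = 10.0) -> bool:
    """True if p lies within threshold_km of the footprint (distance is 0 when inside)."""
    x, y = p
    dx = max(region.min_x - x, 0.0, x - region.max_x)
    dy = max(region.min_y - y, 0.0, y - region.max_y)
    return math.hypot(dx, dy) <= threshold_km


def north_of(p: tuple[float, float], region: BBox) -> bool:
    """True if p lies beyond the northern edge of the footprint."""
    return p[1] > region.max_y


def between(p: tuple[float, float], a: BBox, b: BBox, corridor_km: float = 5.0) -> bool:
    """True if p lies within corridor_km of the segment joining the centres of a and b."""
    (ax, ay), (bx, by) = a.centre(), b.centre()
    px, py = p
    seg_len2 = (bx - ax) ** 2 + (by - ay) ** 2
    if seg_len2 == 0:
        return math.hypot(px - ax, py - ay) <= corridor_km
    # Project p onto the segment a-b and clamp the projection to the segment's end points.
    t = ((px - ax) * (bx - ax) + (py - ay) * (by - ay)) / seg_len2
    t = max(0.0, min(1.0, t))
    cx, cy = ax + t * (bx - ax), ay + t * (by - ay)
    return math.hypot(px - cx, py - cy) <= corridor_km
```

Fixed thresholds are the simplest possible choice; in practice the meaning of "near" depends on the scale and type of the places involved, so a graded (fuzzy) membership score would likely be more appropriate than a true/false answer.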

Specifically, the project will develop a spatially-aware web search engine with a geographically-intelligent user interface. It will also develop techniques that automatically extract geographical context from web documents, and from digital map datasets, and annotate the resources with metadata that will assist web search engines in classifying the resources geographically. There are six major aspects to the work:

  1. Study user requirements to search for geographically-specific resources on the web and specify the search engine functionality to meet these requirements. 
  2. Implement an efficient architecture for a spatially-aware search engine, with regard to indexing and ranking methods. 
  3. Create geographical and application-specific ontologies to support query term expansion, information description and relevance ranking of retrieved information. 
  4. Design and implement a multi-modal user interface to the search engine that supports query specification and user feedback using textual geographical descriptions, an interactive multi-scale map and sketching. 
  5. Develop procedures to extract the geographical semantics of web documents and geo-datasets, and annotate these resources with the resulting metadata. 
  6. Evaluate the techniques and tools developed.

User requirements. Although the project starts from the premise that people require information that is geographically specific, there is a need to identify and meet the needs of particular user groups. The project will therefore start by determining the different types of geographical information that people require and the ways in which they describe geographical location. The results will feed into a specification of system functionality.

A spatially-aware search engine. To meet the user requirements, the search engine will support several modes of access that are sympathetic to different users' models of geographic space. In general, users will specify the nature of the things they are interested in and a geographical location. The multi-modal user interface will provide facilities for text input, interactive maps and freehand sketching. The text input will include place names, non-spatial concepts and spatial relationships. The interactive map will enable the user to graphically specify a region on a map that represents the geographical extent of the search.
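A text query of this kind can be viewed as a triple of a non-spatial concept, a spatial relationship and one or more place names. The sketch below is deliberately naive and assumes queries follow the pattern `<concept> <relation> <place>`; the relation list, `GeoQuery` and `parse_query` are hypothetical, and a real interface would need a gazetteer to recognise place names rather than relying on word order.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical vocabulary of spatial relations recognised in free-text queries.
# Relations are tried in this order, so multi-word ones come before "in".
RELATIONS = ["north of", "south of", "east of", "west of", "inside", "between", "near", "in"]


@dataclass
class GeoQuery:
    concept: str       # the non-spatial "what", e.g. "museums"
    relation: str      # the spatial relationship, e.g. "near"
    places: list[str]  # one place name, or two for "between"


def parse_query(text: str) -> Optional[GeoQuery]:
    """Split a '<concept> <relation> <place>' query into its parts, or return None."""
    text = text.strip()
    for rel in RELATIONS:
        m = re.match(rf"(.+?)\s+{re.escape(rel)}\s+(.+)$", text, flags=re.IGNORECASE)
        if m:
            concept, rest = m.group(1).strip(), m.group(2).strip()
            if rel == "between":
                # "between A and B" names two reference places.
                places = [p.strip() for p in re.split(r"\s+and\s+", rest, maxsplit=1)]
            else:
                places = [rest]
            return GeoQuery(concept, rel, places)
    return None  # no recognised spatial relationship
```

For example, `parse_query("hotels between Bath and Bristol")` yields the concept `"hotels"`, the relation `"between"` and the places `["Bath", "Bristol"]`.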

The search engine will "expand" specified query terms to equivalent and nearby locations, including those referred to by alternative names and in languages other than that of the query. Following execution of the query against a local cache of web resources, the relevant resources will be ranked with respect to geographical location and application-specific concepts. These ranked resources will be presented to the user in various formats, including a sorted list and a geographical map that presents hyperlinks to resources as symbols located according to their geographical context. Cartographic symbology will be employed to indicate the relevance ranking of the retrieved resources.
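Query expansion of this kind can be pictured as a gazetteer lookup followed by a proximity filter. The following is a minimal sketch, assuming a tiny in-memory gazetteer with approximate planar coordinates in kilometres and Welsh-language alternative names; `GAZETTEER`, `resolve`, `expand` and the 20 km radius are hypothetical, and a fixed radius merely stands in for whatever notion of "nearby" the geographical ontology supports.

```python
import math
from typing import Optional

# Hypothetical gazetteer: canonical name -> (alternative names, (easting_km, northing_km)).
# Coordinates are approximate and for illustration only.
GAZETTEER = {
    "Cardiff": ({"Caerdydd"}, (318.0, 176.0)),
    "Penarth": (set(), (318.0, 171.0)),
    "Newport": ({"Casnewydd"}, (331.0, 188.0)),
    "Swansea": ({"Abertawe"}, (265.0, 193.0)),
}


def resolve(name: str) -> Optional[str]:
    """Map a canonical or alternative (e.g. other-language) name to its canonical entry."""
    for canonical, (alts, _) in GAZETTEER.items():
        if name == canonical or name in alts:
            return canonical
    return None


def expand(name: str, radius_km: float = 20.0) -> set[str]:
    """Return every name of the place, plus every name of places within radius_km of it."""
    canonical = resolve(name)
    if canonical is None:
        return {name}  # unknown place: fall back to the literal query term
    cx, cy = GAZETTEER[canonical][1]
    terms: set[str] = set()
    for other, (alts, (x, y)) in GAZETTEER.items():
        if math.hypot(x - cx, y - cy) <= radius_km:
            terms.add(other)
            terms.update(alts)
    return terms
```

With these values, a query for "Caerdydd" expands to Cardiff, Caerdydd, Penarth, Newport and Casnewydd, while Swansea (roughly 55 km away on this grid) is excluded.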

Geographical and conceptual ontologies, term expansion and relevance ranking. To mediate and rank the results of geographical queries, a geographical ontology covering a region of interest within Europe will be designed and implemented. The geographical ontology will build upon the concepts of gazetteers and geographical thesauri, and will provide the knowledge structure for expanding a user's locational query terms to include alternative names and nearby places. The model will include both formal administrative names and informal or imprecise names that are in common use. Place names will be linked by spatial relationships that encode hierarchy, adjacency and overlap, and each will be associated with a spatial footprint that ties it to a geographical coordinate system. Techniques will be developed to construct footprints for imprecise regions for which no exact boundary exists.
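One simple way to give an imprecise region a footprint, assumed here for illustration rather than taken from the project, is to gather points believed to lie inside it (for instance, locations of places that documents describe as being in the region) and take their convex hull. The sketch below uses Andrew's monotone chain algorithm; the function names are hypothetical, and a convex hull will overestimate regions with irregular or concave shapes.

```python
def cross(o, a, b):
    """Z component of (a - o) x (b - o); positive when o -> a -> b turns counter-clockwise."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order, starting bottom-left."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    # Build the lower hull left to right, discarding points that do not make a left turn.
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    # Build the upper hull right to left in the same way.
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other, so drop it.
    return lower[:-1] + upper[:-1]
```

For the sample points `[(0, 0), (4, 0), (4, 4), (0, 4), (2, 2), (1, 3)]` the interior points are discarded and the footprint is the square `[(0, 0), (4, 0), (4, 4), (0, 4)]`.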

In addition to providing intelligence with regard to geographical space, the system will use application-specific ontologies to interpret general (non-spatial) conceptual terms that users may employ to specify their interests. However, the major research effort in ontology construction will focus on the design and exploitation of the geographical ontology. Both geographical and application-specific ontologies will be exploited to develop ranking procedures that measure the semantic distance between query and document terminology. The geographical ontology will provide hybrid measures that combine quantitative coordinates with qualitative spatial relations between places.
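A hybrid measure of this kind might, for example, blend a normalised Euclidean distance between representative points with the number of "part of" steps separating two places in the administrative hierarchy. The sketch below is one such blend under stated assumptions: the hierarchy fragment, the coordinates, the equal weighting and the normalisation constants (`scale_km`, `max_steps`) are all hypothetical.

```python
import math

# Hypothetical fragment of a place hierarchy: child -> parent ("part of").
PART_OF = {
    "Cardiff": "South Wales",
    "Newport": "South Wales",
    "Swansea": "South Wales",
    "South Wales": "Wales",
    "Wales": "United Kingdom",
}
# Hypothetical representative points (planar km) for places that have one.
CENTROID = {"Cardiff": (318.0, 176.0), "Newport": (331.0, 188.0), "Swansea": (265.0, 193.0)}


def ancestors(place):
    """Return [place, parent, grandparent, ...] up to the root of the hierarchy."""
    chain = [place]
    while chain[-1] in PART_OF:
        chain.append(PART_OF[chain[-1]])
    return chain


def hierarchy_distance(a, b):
    """Number of part-of steps from a and from b up to their lowest common ancestor."""
    chain_a, chain_b = ancestors(a), ancestors(b)
    for i, node in enumerate(chain_a):
        if node in chain_b:
            return i + chain_b.index(node)
    return len(chain_a) + len(chain_b)  # no common ancestor


def hybrid_distance(a, b, w_euclid=0.5, scale_km=100.0, max_steps=6):
    """Weighted mix of quantitative (coordinate) and qualitative (hierarchy) distance, in [0, 1]."""
    (ax, ay), (bx, by) = CENTROID[a], CENTROID[b]
    d_euclid = min(math.hypot(ax - bx, ay - by) / scale_km, 1.0)
    d_hier = min(hierarchy_distance(a, b) / max_steps, 1.0)
    return w_euclid * d_euclid + (1 - w_euclid) * d_hier
```

Here Cardiff and Newport are both two hierarchy steps from their common parent, as are Cardiff and Swansea, so the coordinate term is what ranks Newport as closer to Cardiff than Swansea.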

Metadata extraction and encoding. Successful retrieval of relevant web pages and digital map data will depend both upon the intelligence of the search engine and upon the quality of the metadata associated with the resources. Because of the critical importance of metadata, part of the project will be dedicated to the problem of automated generation of geographical metadata. This will result in tools for resource providers to ensure that their web pages and datasets are clearly "visible" to and exploitable by spatially-aware search engines.
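The simplest starting point for automated geographical metadata is to look up gazetteer names in a document's text. The following is a minimal sketch, assuming a small hypothetical table `SURFACE_FORMS` that maps lower-case name variants to canonical places; `extract_places` only counts mentions and does nothing to resolve ambiguous names such as "Newport", which a real tagger would have to disambiguate from context.

```python
import re

# Hypothetical gazetteer fragment: lower-case surface form -> canonical place name.
SURFACE_FORMS = {
    "cardiff": "Cardiff",
    "caerdydd": "Cardiff",
    "newport": "Newport",
    "brecon beacons": "Brecon Beacons",
}


def extract_places(text: str) -> dict[str, int]:
    """Count whole-word, case-insensitive mentions of each canonical place in text."""
    counts: dict[str, int] = {}
    lowered = text.lower()
    # Try longer surface forms first so that multi-word names are matched whole.
    for form in sorted(SURFACE_FORMS, key=len, reverse=True):
        pattern = rf"\b{re.escape(form)}\b"
        hits = len(re.findall(pattern, lowered))
        if hits:
            place = SURFACE_FORMS[form]
            counts[place] = counts.get(place, 0) + hits
            # Blank out matched text so shorter forms cannot re-match inside it.
            lowered = re.sub(pattern, " ", lowered)
    return counts
```

Applied to "Walks from Cardiff (Caerdydd) to the Brecon Beacons, returning via Newport. Cardiff Castle is open daily.", this counts three mentions of Cardiff (two English, one Welsh) and one each of the Brecon Beacons and Newport.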

Metadata will be extracted from text documents and geometric datasets and encoded in a markup language based on RDF/XML, following and building upon existing initiatives within the Open GIS Consortium and ISO/TC 211. The research emphasis here is upon metadata extraction methods. This will require development of techniques to identify geographical context in text documents and spatial analytical tools for the interpretation of spatial properties of digital map datasets.
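To give a flavour of what an RDF/XML annotation might look like, the sketch below serialises a title, the place names found in a resource and a representative point. It is an illustration only: it borrows Dublin Core `dc:title` and `dc:coverage` and the W3C Basic Geo `geo:lat` and `geo:long` properties for convenience, which is an assumption and not the OGC and ISO/TC 211 based schema the project describes; `encode_metadata` and the example URL are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"

# Use readable prefixes (rdf:, dc:, geo:) when serialising.
ET.register_namespace("rdf", RDF)
ET.register_namespace("dc", DC)
ET.register_namespace("geo", GEO)


def encode_metadata(url: str, title: str, places: list[str], lat: float, lon: float) -> str:
    """Serialise extracted geographical metadata for one resource as an RDF/XML string."""
    root = ET.Element(f"{{{RDF}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": url})
    ET.SubElement(desc, f"{{{DC}}}title").text = title
    for place in places:
        # One dc:coverage element per place name extracted from the resource.
        ET.SubElement(desc, f"{{{DC}}}coverage").text = place
    # Representative point as WGS84 latitude/longitude, rounded to 4 decimal places.
    ET.SubElement(desc, f"{{{GEO}}}lat").text = f"{lat:.4f}"
    ET.SubElement(desc, f"{{{GEO}}}long").text = f"{lon:.4f}"
    return ET.tostring(root, encoding="unicode")
```

A call such as `encode_metadata("http://example.org/castle.html", "Cardiff Castle", ["Cardiff", "Wales"], 51.4816, -3.1791)` produces a single `rdf:Description` that a spatially-aware crawler could read without re-analysing the page text.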

Evaluation of the various aspects of the demonstrator. Evaluation will be conducted with regard to the major research issues addressed, including ease of use of the search engine interface, semantic appropriateness of retrieved results, quality of relevance ranking, quality of automatically generated metadata and speed of retrieval from the search engine.
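For the quality of relevance ranking, standard information retrieval measures such as precision at rank k and average precision could be computed against human relevance judgements. The sketch below shows both; the choice of these particular metrics is an assumption, not the project's stated evaluation protocol.

```python
def precision_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k ranked results that were judged relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def average_precision(ranked: list[str], relevant: set[str]) -> float:
    """Mean of precision@i over every rank i at which a relevant result appears."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for i, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / i
    # Relevant documents that were never retrieved contribute zero.
    return total / len(relevant)
```

For a ranking `["d1", "d2", "d3", "d4"]` with `d1` and `d3` judged relevant, precision at rank 2 is 0.5 and average precision is (1/1 + 2/3) / 2 = 5/6.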


Further information about innovative aspects of the SPIRIT project can be found in the article "Spatial Information Retrieval and Geographical Ontologies: An Overview of the SPIRIT project" (PDF).




© All material is copyright