Active: 2003-2008

DOSE (Distributed Open Semantic Elaboration platform) is an open-source platform for managing semantic indexing and retrieval of web resources according to predefined ontologies, with support for multilinguality, autonomic management, ease of integration within existing web portals, databases, learning management systems, etc.


Architecture

The H-DOSE platform

H-DOSE stands for Holistic Distributed Open Semantic Elaboration; it is an enhanced semantic platform in which the joint adoption of Web Services and Multi Agent Systems enables the provision of easy to access, autonomic and semantic indexing, searching and deep-searching functionalities.

More specifically, Web Services allow easy integration of provided functionalities into the current web development workflow, by offering semantic functionalities through a well defined interface, with a well defined interaction paradigm and protocol such as SOAP. Agents, instead, allow code-to-information mobility, natural distribution and work balancing and contribute to the platform maintenance process by implementing principles of autonomic computing. Moreover the agent deployment of semantic classification services allows performing deep searching by directly interfacing web applications and their knowledge bases in order to extract and index relevant information.

Design Principles

The H-DOSE platform aims at providing semantic functionalities for web applications through an easy to access interface, allowing rapid inclusion of services into the existing development workflow and trying to maximize the benefit/cost ratio for semantics inclusion in web applications. In particular, H-DOSE is focused on semantic search and indexing services, providing means for classifying a web resource with respect to a conceptual model given by an ontology, for storing conceptual descriptors of indexed resources (annotations) and for retrieving such resources in response to user queries, according to the semantic similarity between queries and resource descriptors.

H-DOSE is an evolution of a previous version of the same platform, called DOSE. The basic principles of DOSE (modularity, external annotations, synsets, and semantic mapper) have been preserved, but the external interfaces and the deployment architecture were redesigned to meet the fast-integration goals. The platform's conceptual organization is shown in Figure 1.

 

dose logical architecture
Figure 1. The DOSE/H-DOSE conceptual organization

 

The H-DOSE platform is logically organized as a layered architecture (Figure 1) in which each service is located at a specific level depending on the task it accomplishes in the platform specific processes (indexing and search).

 

dose conceptual organization
Figure 2. The DOSE/H-DOSE conceptual organization.

 

Services are deployed on three different layers: the Service layer, the Kernel layer and the Wrapper layer.

Each service could be implemented either as a web service or as an agent system exposing web services, depending on the task to be performed.

The Service layer

The Service layer groups all services that require interaction with external applications, i.e., the services that are accessible to web application developers wishing to include semantics in their work. These are the Search service, the Indexing service and the Gateway service. The Search service allows external applications to perform semantic searches on the platform knowledge base, i.e., to match queries, expressed either as a string of terms or as a weighted set of concepts, against the conceptual descriptors of the web resources indexed by the platform. Results consist of the set of URIs of the web resources relevant to the query, ordered by estimated relevance: the most relevant resources appear first in the result set.
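The ordering of results can be illustrated with a minimal sketch. It assumes that both the query and each annotation are maps from concept names to weights, and that relevance is measured by cosine similarity; the text does not state which similarity measure H-DOSE actually uses, so the measure, the class name and the method names are all illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Ranks resource URIs against a query expressed as a weighted set of concepts. */
class SemanticRankingSketch {

    /** Cosine similarity between two concept-weight maps (an assumed measure). */
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Double w = b.get(e.getKey());
            if (w != null) {
                dot += e.getValue() * w;
            }
        }
        for (double w : b.values()) {
            normB += w * w;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // an empty descriptor matches nothing
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns the URIs with non-zero similarity, most relevant first. */
    static List<String> rank(Map<String, Double> query,
                             Map<String, Map<String, Double>> annotations) {
        final Map<String, Double> scores = new HashMap<String, Double>();
        for (Map.Entry<String, Map<String, Double>> e : annotations.entrySet()) {
            double s = cosine(query, e.getValue());
            if (s > 0.0) {
                scores.put(e.getKey(), s);
            }
        }
        List<String> uris = new ArrayList<String>(scores.keySet());
        // Sort by descending score; ties are broken by URI for determinism.
        Collections.sort(uris, (x, y) -> {
            int byScore = Double.compare(scores.get(y), scores.get(x));
            return byScore != 0 ? byScore : x.compareTo(y);
        });
        return uris;
    }
}
```

With a query weighted only on one concept, a resource annotated with exactly that concept ranks above one annotated with that concept plus others, and resources sharing no concept with the query are left out of the result set.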

The Indexing service offers functionality complementary to that of the Search service, allowing external applications to specify URIs or resources to be semantically classified. Unlike the Search service, the Indexing service is asynchronous: it requires no communication back to the calling application other than a request acknowledgement.
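The asynchronous interaction can be sketched as follows, assuming a simple in-process work queue: the caller receives only an acknowledgement, while classification happens later in the background. The class and method names are hypothetical and do not reflect the actual H-DOSE API; the classification step is a placeholder.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Accepts indexing requests and processes them in the background. */
class AsyncIndexingSketch {

    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final AtomicInteger indexed = new AtomicInteger();

    /** Queues the URI and returns at once; the result is only an acknowledgement. */
    boolean requestIndexing(String uri) {
        if (uri == null || uri.isEmpty()) {
            return false; // request rejected, nothing queued
        }
        // Placeholder for the real semantic classification of the resource.
        worker.execute(() -> indexed.incrementAndGet());
        return true;
    }

    /** Stops accepting requests, waits for queued work and reports how many URIs were processed. */
    int shutdownAndCount() {
        worker.shutdown();
        try {
            worker.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return indexed.get();
    }
}
```

The caller never waits for the classification itself; `shutdownAndCount` exists only so that the sketch can be observed to have done its work.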

Finally, the Gateway services are created at run time and are used by the platform for the dynamic publication of agent-based web services, such as platform management. They are built dynamically in response to specific actions taken by agents in the platform.

The Kernel layer

The Kernel layer groups the services that perform most of the semantic elaboration provided by H-DOSE. They are not accessible to external applications; instead they serve the Service layer entities, performing high-complexity tasks such as the semantic classification of web resources.

The Kernel layer uses agent-based services more extensively than the Service layer: agents become useful when complex information-dependent tasks are required (semantic indexing) or where their proactive behavior can produce significant advantages (autonomic maintenance). The kernel services of H-DOSE are the Annotation Storage service, the Expander service, the Indexing service and the Autonomic Maintenance service.

The Annotation Storage service gives persistence to the semantic annotations extracted by the Indexing service, allowing searches on the knowledge base composed of the set of stored semantic descriptors and the platform ontology.

The Expander service allows the Search service to extract the semantic equivalent of user queries and to perform semantic comparisons between such queries and the annotations managed by the Annotation Storage service. Both the Expander and the Storage services are deployed as simple, static, web services.

Conversely, the Indexing service is deployed as a set of collaborative agents having different goals such as syntactic to semantic translation, semantic descriptor creation, etc.

The novel aspect of the Indexing service is that, rather than being a single service, it is an aggregate service provided by several collaborating agents and exposed as a single web service through a gateway framework. This arrangement allows focused indexing, i.e., indexing of resources with respect to small subsets of the platform ontology, performed on the servers publishing the resources to be indexed. It thus exploits both the work-balancing and the deep-search capabilities of agents.

Finally, the Maintenance service is a somewhat atypical service: rather than being passive, it is proactive and constantly monitors the state of the Annotation Storage and Search services to find out whether there have been any "no data" responses and to recover the platform from this state. In more detail, the Maintenance service on the one hand acts on the Storage service, inspecting stored data with respect to the platform ontology and searching for conceptual areas for which there are too few semantic descriptors (self-management); on the other hand, it checks the state of the Search service to detect alerts about "no data" responses given to user requests (self-healing).

If an alert is found or a poorly covered conceptual area is detected, the Maintenance service coordinates a suitable set of agents to discover new knowledge on the web and triggers a focused indexing cycle. This increases the number of resources classified by the platform and possibly recovers the platform from the failure state, so that future requests similar to those that failed can be answered successfully.
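The decision taken by the Maintenance service can be sketched as below. The sketch assumes that "too few semantic descriptors" means a count under a fixed threshold per concept; the threshold, the class name and the method names are assumptions made for illustration, not the platform's actual criterion or API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Finds poorly covered concepts and decides whether a focused indexing cycle is needed. */
class CoverageCheckSketch {

    /** Returns, in alphabetical order, the concepts with fewer than minDescriptors annotations. */
    static List<String> poorlyCovered(List<String> ontologyConcepts,
                                      Map<String, Integer> descriptorCounts,
                                      int minDescriptors) {
        List<String> result = new ArrayList<String>();
        for (String concept : ontologyConcepts) {
            Integer count = descriptorCounts.get(concept);
            // A concept with no stored descriptor at all counts as zero.
            if (count == null || count < minDescriptors) {
                result.add(concept);
            }
        }
        Collections.sort(result);
        return result;
    }

    /** Self-healing ("no data" alerts) or self-management (coverage gaps) both trigger indexing. */
    static boolean shouldTriggerIndexing(int noDataAlerts, List<String> coverageGaps) {
        return noDataAlerts > 0 || !coverageGaps.isEmpty();
    }
}
```

The list returned by `poorlyCovered` would identify the small subset of the ontology on which a focused indexing cycle concentrates.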

The Wrapper layer

The Wrapper layer includes all the utilities and wrappers needed by platform operations that do not require publication as Web Services but that are included in the platform services as libraries. At this level we find data-persistence wrappers allowing H-DOSE to be deployed on different database servers such as PostgreSQL, MySQL and Oracle, ontology wrappers for accessing the conceptual model that drives the platform operations according to different definition languages such as RDF/S, DAML, OWL, and data-manipulation libraries useful for performing syntax to semantics conversion (stemmer, language recognition, and so on), etc.

It is important to note that, even though this layer introduces no novelty, it still plays a critical role in the overall platform deployment: it allows the platform to be moved across different hardware devices and to be targeted at different conceptual domains simply by changing the ontology used.


OBSOLETE

Installation HOW-TO

Disclaimer

The H-DOSE platform is developed independently by the e-Lite research group at the Politecnico di Torino. Since the main goals of the e-Lite group are research-related, the platform is provided as is, without any formal or official support service. A set of HOW-TOs and research papers is made available by the development team, together with the complete Javadocs for the platform APIs. There are also some examples included in the published documentation.

For any problems concerning the installation of the H-DOSE platform, any undetected bugs, etc., please post a message to the H-DOSE users mailing list. For developers there is another active mailing list, which is more concerned with the development of platform components and modules.
For urgent questions, please feel free to contact the authors directly at their e-mail addresses.

Hardware requirements

The H-DOSE platform is developed entirely with Java technologies, so it can run on any device running a Java Runtime Environment v1.4.2 or greater. However, since some of the processes performed by the platform are computationally expensive, we recommend systems with computational power at least comparable to a Pentium III machine equipped with at least 256 MBytes of RAM. We have some evidence that the platform can run on less powerful machines, but some processes, the indexing process in particular, become too slow to be useful.
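A quick way to check the Java requirement is sketched below. The class and method names are illustrative only. Note that `Runtime.maxMemory()` reports the heap available to the JVM, not the physical RAM of the machine, so it is only a rough indication with respect to the 256 MBytes recommendation.

```java
/** Checks the running JRE against the v1.4.2 minimum stated for H-DOSE. */
class JreCheckSketch {

    /** Compares a "java.version" string such as "1.6.0_45" or "17.0.2" with 1.4.2. */
    static boolean meetsMinimum(String javaVersion) {
        String[] parts = javaVersion.split("[^0-9]+");
        int[] v = new int[3];
        for (int i = 0; i < 3 && i < parts.length; i++) {
            if (!parts[i].isEmpty()) {
                v[i] = Integer.parseInt(parts[i]);
            }
        }
        int[] min = {1, 4, 2};
        // Compare component by component; the first difference decides.
        for (int i = 0; i < 3; i++) {
            if (v[i] != min[i]) {
                return v[i] > min[i];
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.version");
        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("Java version " + version
                + (meetsMinimum(version) ? " meets" : " does not meet") + " the 1.4.2 minimum");
        System.out.println("Maximum JVM heap: " + maxHeapMb + " MB");
    }
}
```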

Getting started

The H-DOSE platform is mainly composed of a jar file, a couple of web services deployed as *.jws files by means of the Apache Axis framework, and a set of maintenance and management JavaServer Pages. In order to get started with the platform you need to:

  1. If not already installed, download a Java Development Kit (JDK 1.6 is recommended).
  2. Download the Tomcat Servlet container from the Apache Jakarta web site (any of the 5.0.x releases should let H-DOSE work; newer versions have not yet been thoroughly tested).
  3. Download the Apache Axis framework from the Apache Axis web site.
  4. Get access to an account on a PostgreSQL database server (alternatively you may install PostgreSQL on your machine).
  5. Download the H-DOSE installer from this web site (download section).

Once you have downloaded all required resources, you can start the installation process.

Getting a Tomcat running on your machine.

You have probably downloaded the installation files for Tomcat from the Apache Jakarta web site.
The Tomcat servlet container requires a Java Runtime Environment on the machine where it will run, so if you have not installed the JDK yet, now is the time to do it.
Please follow the installer's instructions to get a fully functional Java environment on your machine.
You can then proceed with the installation of Tomcat.

Win32

In the Win32 version, Tomcat comes as a self-installing package that can be run by double-clicking the installer file. All versions of Tomcat from 4.0 onwards are suitable for deploying the H-DOSE platform. However, all our tests are run on Tomcat 5.0 (version 5.5 is currently under testing, too; the newest jaxrpc and xerces libraries seem to cause some conflicts).
The only two things you have to pay attention to during the installation process are the location in which Tomcat will be installed and the kind of server startup required. To allow H-DOSE to work properly on your new Tomcat server, you need to specify a location name that does not contain spaces or special characters like -§,°,£,&,... This is mainly due to the agent-based part of the platform, which is deployed using the Jade framework.

Secondly, depending on how you intend to use the platform, you should choose the appropriate type of installation for Tomcat, which can be either as a service or as a standalone application. In general, if you are deploying a set of web applications and also plan to use the H-DOSE platform as semantic middleware, we recommend installing Tomcat as a service; if instead you are developing applications and need frequent restarts of the servlet container, a standalone installation is likely to be more useful.
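The constraint on the installation location can be checked before running the installer. The sketch below uses a deliberately conservative whitelist (ASCII letters, digits, underscore, dot and path separators); the exact set of characters that Jade tolerates is not documented here, so this whitelist, like the class and method names, is an assumption and may reject some paths that would in fact work.

```java
import java.util.regex.Pattern;

/** Conservative check of a candidate Tomcat installation path. */
class InstallPathCheckSketch {

    // Letters, digits, '_', '.', and the separators '/', '\' and ':' (drive letter).
    private static final Pattern SAFE = Pattern.compile("[A-Za-z0-9_./\\\\:]+");

    /** Returns true only if the path contains no spaces and no characters outside the whitelist. */
    static boolean isSafe(String path) {
        return path != null && SAFE.matcher(path).matches();
    }

    public static void main(String[] args) {
        for (String path : args) {
            System.out.println(path + " -> " + (isSafe(path) ? "looks safe" : "avoid"));
        }
    }
}
```

For example, a location such as C:\Tomcat5 passes the check, while the default C:\Program Files location is rejected because of the space.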

Anyway you can find more information about the Java installation and the Tomcat installation at their respective sites:

  1. The JDK Installation HOW-TO
  2. The Tomcat Installation HOW-TO

Linux

First, check whether a Tomcat package is available for your specific distribution; if so, make sure that the Java virtual machine active on your system is the one from Sun. As an example, with Debian GNU/Linux (and derivatives, like Ubuntu) the Tomcat packages depend on a JVM and a Java compiler, but Sun's JVM is not present in Debian: without user intervention, the kaffe JVM or GNU gcj/gij will be installed and used. We have not tested whether H-DOSE works on these configurations. On Debian you can either add a repository for a Sun JVM (see blackdown.org for instructions), create a Debian package, or install the Sun JVM directly. Then, using the update-alternatives mechanism, make the system use the Sun JVM.

If there are no packages for your distribution, you can download the Tomcat servlet container from the Jakarta web site.

Install Apache Axis

For a complete guide on Apache Axis Installation, please check the official installation page at:

http://ws.apache.org/axis/java/install.html.

Once you have downloaded the Axis archive from this site, it should be sufficient to copy the axis directory from the downloaded archive into the Tomcat webapps directory.

You can verify that Axis is operative by running the Tomcat server and then opening the http://localhost:8080/axis/happyaxis.jsp page. If any library is missing, we recommend downloading it and installing it into the shared directory of Tomcat:

%CATALINA_HOME%/shared/lib/

As we will soon see, all the libraries required by H-DOSE will be installed (copied) into the same shared folder. Be warned that a few libraries may be overwritten. This should guarantee compatibility between H-DOSE and Axis/Tomcat.
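The check that Axis is being served can also be scripted. The following sketch only tests whether the happyaxis.jsp page answers with HTTP 200; it does not parse the page, so the list of missing libraries still has to be read in a browser. The class and method names are illustrative, and the host and port are the defaults used in the text.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

/** Checks whether the Axis validation page is reachable. */
class AxisCheckSketch {

    /** Builds the address of the Axis validation page for a given host and port. */
    static String happyAxisUrl(String host, int port) {
        return "http://" + host + ":" + port + "/axis/happyaxis.jsp";
    }

    /** Returns true if the page answers with HTTP 200; false on any other status or I/O error. */
    static boolean isReachable(String address) {
        try {
            HttpURLConnection conn = (HttpURLConnection) new URL(address).openConnection();
            conn.setConnectTimeout(3000);
            conn.setReadTimeout(3000);
            conn.setRequestMethod("GET");
            try {
                return conn.getResponseCode() == HttpURLConnection.HTTP_OK;
            } finally {
                conn.disconnect();
            }
        } catch (IOException e) {
            // Covers malformed addresses, refused connections and timeouts.
            return false;
        }
    }

    public static void main(String[] args) {
        String address = happyAxisUrl("localhost", 8080);
        System.out.println(address + " reachable: " + isReachable(address));
    }
}
```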

Getting a PostgreSQL running on your machine

Win32

For some months now, Windows users have also been able to run a PostgreSQL server natively, without the heavy burden of setting up PostgreSQL under Cygwin. There is, in fact, a version of PostgreSQL v8.0 that comes with a fully functional, user-friendly Windows installer, so if you are a Windows geek you can run PostgreSQL in the usual Windows style, possibly without crashes.

You only need to download the installer from the PostgreSQL site.

H-DOSE installation

The installation of the H-DOSE platform is organized as follows.

  1. Stop the execution of your tomcat server;
  2. Launch the H-DOSE installer:

    java -jar install.jar

  3. Follow the installer instructions;
  4. Once the installer has finished, make sure that Tomcat has write access to the H-DOSE installation directory;
  5. Restart Tomcat;
  6. Point your browser at the H-DOSE location, something similar to:

    http://localhost:8080/axis/hdose

  7. Follow the indications reported in the visualized pages for finalizing the installation;
  8. For further information see the README file that comes with the installer file.
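The write-access requirement of step 4 can be verified with a small check such as the one below. It is only meaningful when run as the same user that runs Tomcat; the directory is passed on the command line because the actual installation location depends on the choices made in the installer. The class and method names are illustrative.

```java
import java.io.File;

/** Checks that a directory exists and is writable by the current user. */
class WriteAccessCheckSketch {

    /** True if dir exists, is a directory and can be written to by this process. */
    static boolean isWritableDirectory(File dir) {
        return dir.isDirectory() && dir.canWrite();
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.out.println("usage: java WriteAccessCheckSketch <installation directory>");
            return;
        }
        File dir = new File(args[0]);
        System.out.println(dir.getAbsolutePath() + " writable: " + isWritableDirectory(dir));
    }
}
```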

The DoseAgency module

To start up the Dose Agency module, open your preferred browser and navigate to the DoseAgency.jsp page.
This page initializes the Dose Agency and requires the specification of some functional parameters:

Central Agency Host
The server that will host the Dose Agency. The full name must be specified here, e.g. myserver.mydomain.net instead of localhost. Under Windows you may wish to add a row to the file C:\WINDOWS\System32\Drivers\etc\hosts specifying the server IP and its (real or fictitious) full name.
AnnotationWS endpoint
The URL of the Annotation Repository Web Service, "http://hdosehost:8080/axis/AnnoRepositoryWS.jws".
ExpanderWS endpoint
The URL of the Expander Web Service, "http://hdosehost:8080/axis/ExpanderWS.jws".
SearchWS endpoint
The URL of the Search Engine Web Service, "http://hdosehost:8080/axis/SearchEngineWS.jws".
Autonomic Delay
The minimum interval of time (in minutes) that can elapse between two successive starts of the platform auto-check processes.
StopWords file URL
The location where the file of language-dependent stopwords is accessible, usually "http://hdosehost:8080/axis/stopwordDictionary.xml".
Ontology file URL
The location where the H-DOSE ontology can be accessed, usually "http://hdosehost:8080/axis/ontology.owl".
DB uri
The URI of the database in which data should be stored, usually "jdbc:postgresql://dbservername/dbname/" (see also the notes above about the initialization of the PostgreSQL server).
DB user
The user you configured on PostgreSQL to own the H-DOSE database tables.
DB password
The password of the user that owns the H-DOSE tables.
DB driver
The JDBC driver used to access PostgreSQL; if you are not a developer, leave it as "org.postgresql.Driver".
EndPoint
The servers running H-DOSE or providing containers for the deployment of platform agents, usually in the form "http://agencyhost:8080/axis/ContainerProviderWS.jws".

When all the parameters have been inserted, the Start Agency button launches the agent platform of H-DOSE, and, by doing so, the platform becomes ready to provide answers to automatic indexing requests...
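The usual parameter values listed above follow a common pattern, which the sketch below reproduces: it derives the default service addresses from a host name and port, and checks that the agency host is a full name rather than localhost. The class and method names are hypothetical; the paths are the "usual" values quoted in the parameter list and may differ in a given installation. The check is purely syntactic, so it also accepts dotted IP addresses.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Derives the usual Dose Agency parameter values from a host name and port. */
class AgencyParametersSketch {

    /** Rejects "localhost" and names without a domain part, since a full host name is required. */
    static boolean isFullHostName(String host) {
        return host != null
                && !host.equalsIgnoreCase("localhost")
                && host.indexOf('.') > 0
                && !host.endsWith(".");
    }

    /** Builds the usual addresses for a host running H-DOSE under Axis. */
    static Map<String, String> defaultEndpoints(String hdoseHost, int port) {
        String base = "http://" + hdoseHost + ":" + port + "/axis/";
        Map<String, String> params = new LinkedHashMap<String, String>();
        params.put("AnnotationWS endpoint", base + "AnnoRepositoryWS.jws");
        params.put("ExpanderWS endpoint", base + "ExpanderWS.jws");
        params.put("SearchWS endpoint", base + "SearchEngineWS.jws");
        params.put("StopWords file URL", base + "stopwordDictionary.xml");
        params.put("Ontology file URL", base + "ontology.owl");
        params.put("EndPoint", base + "ContainerProviderWS.jws");
        return params;
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "myserver.mydomain.net";
        System.out.println("Full host name: " + isFullHostName(host));
        for (Map.Entry<String, String> e : defaultEndpoints(host, 8080).entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```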

We hope you will enjoy our effort. If you find any bugs or have comments or suggestions, please write to the H-DOSE users mailing list.


Download

H-DOSE, latest version


Documentation

This document contains links to some useful documents about DOSE. Please see the contact links to send suggestions or comments about these documents.


Javadoc

Here you can find the documentation of the H-DOSE Java source files. Please note that this documentation is still being refined.

Additional information about the H-DOSE architecture can be found in the documentation page.

Download the Javadoc

  • The documentation of the H-DOSE Java source files in a JAR archive.
  • The documentation of the H-DOSE Java source files in a ZIP archive.

Browse the Javadoc on-line

Open the H-DOSE Javadoc in another window.


FAQ

This document contains the most frequently asked questions about DOSE. Please see the contact links to send suggestions or comments about this FAQ.


General questions about DOSE

What does DOSE stand for?

The Distributed Open Semantic Elaboration platform:

distributed:
it can exploit the computing power of a computer network;
open:
freely available (open source);
semantic:
it is based on concepts for indexing and searching of web pages;
elaboration:
elaborates the web pages to update the underlying database for semantic search;
platform:
a heterogeneous set of programs handles the database, the web pages and the user interface.

For further details see the documentation section.

What is the difference between DOSE and H-DOSE?

The H-DOSE (Holistic-DOSE, pron. "High-DOSE") platform is the most recent version of DOSE. This new name "emphasizes the importance of the whole and the interdependence of its parts".

What can I use DOSE for?

The DOSE platform can be used as a search engine for web resources. It uses ad-hoc strategies for adding new pages into its database. Then one can search the indexed pages for a concept by indicating words semantically related to such concept. The most relevant pages (or fragments) are displayed.

Why is it free?

This project has been developed as a university research program. Currently, the platform is mature enough, i.e. stable and functional, to be shared with the web community. We think both web users and programmers may be interested in either using the DOSE platform or improving it, providing the necessary feedback.


 

Technical questions

What do I need to install DOSE?

You can download DOSE from the download page. However, you also need to install some other programs before you can run DOSE. DOSE has been written almost entirely in Java, so you will need a Java Runtime Environment. All indexing data is stored and retrieved through a database, so an SQL server is necessary, too. In addition, a number of web services have been written, so you will also need a web server. Free implementations of such programs are available on the web. See the download section for further details.

How do I install DOSE?

At present, the procedure is not as easy as we would like. However, a detailed explanation is available in the how-to section.


Contact Information

The DOSE project started in 2003 as a research program at the Polytechnic of Turin, carried out by the PhD student Dario Bonino under the supervision of Prof. Fulvio Corno and Laura Farinetti. The team soon grew to include the PhD students Alessio Bosca and Federico Pescarmona, and the research assistant Michele Debandi. Very recently the PhD student Paolo Pellegrino has joined the team, too.
Since the spring of 2005 Intellisemantic s.r.l. has been actively collaborating with this project on performance optimization and future developments, and has developed commercial applications based on internal evolutions of this library.

Please check our mailing list or contact us for DOSE-related questions.