Architecture

The H-DOSE platform

H-DOSE stands for Holistic Distributed Open Semantic Elaboration. It is an enhanced semantic platform in which the joint adoption of Web Services and Multi Agent Systems provides easy-to-access, autonomic semantic indexing, searching and deep-searching functionalities.

More specifically, Web Services allow easy integration of the provided functionalities into the current web development workflow, by offering semantic functionalities through a well-defined interface, with a well-defined interaction paradigm and protocol such as SOAP. Agents, instead, allow code-to-information mobility, natural distribution and workload balancing, and contribute to the platform maintenance process by implementing principles of autonomic computing. Moreover, deploying semantic classification services as agents makes deep searching possible: agents interface directly with web applications and their knowledge bases in order to extract and index relevant information.

Design Principles

The H-DOSE platform aims at providing semantic functionalities for web applications through an easy to access interface, allowing rapid inclusion of services into the existing development workflow and trying to maximize the benefit/cost ratio for semantics inclusion in web applications. In particular, H-DOSE is focused on semantic search and indexing services, providing means for classifying a web resource with respect to a conceptual model given by an ontology, for storing conceptual descriptors of indexed resources (annotations) and for retrieving such resources in response to user queries, according to the semantic similarity between queries and resource descriptors.

H-DOSE is an evolution of a previous version of the same platform, called DOSE. The basic principles of DOSE (modularity, external annotations, synsets, and semantic mapper) have been preserved, but the external interfaces and the deployment architecture were redesigned in order to meet the fast integration goals. The platform conceptual organization is shown in Figure 1.

 

[Image: DOSE logical architecture]
Figure 1. The DOSE/H-DOSE conceptual organization.

The H-DOSE platform is logically organized as a layered architecture (Figure 1) in which each service is located at a specific level, depending on the task it accomplishes in the platform-specific processes (indexing and search).

 

[Image: DOSE conceptual organization]
Figure 2. The DOSE/H-DOSE conceptual organization.

Services are deployed on three different layers: the Service layer, the Kernel layer and the Wrapper layer.

Each service could be implemented either as a web service or as an agent system exposing web services, depending on the task to be performed.

The Service layer

The Service layer groups all services that interact with external applications, i.e., the services accessible to web application developers who wish to include semantics in their work. These are the Search service, the Indexing service and the Gateway service. The Search service allows external applications to perform semantic searches on the platform knowledge base, i.e., searches for semantic matches between queries, expressed either as a string of terms or as a weighted set of concepts, and the conceptual descriptors of the web resources indexed by the platform. Results consist of the set of URIs of the web resources relevant to the query, ordered by estimated relevance: the most relevant resources appear first in the result set.
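The ranking step can be illustrated with a minimal sketch. The text does not say which similarity measure H-DOSE uses, so cosine similarity over concept weights is an assumption here, and the `cosine` and `search` functions and the sample annotations are hypothetical names invented for illustration.

```python
from math import sqrt

def cosine(query, descriptor):
    """Cosine similarity between two concept -> weight mappings (assumed measure)."""
    shared = set(query) & set(descriptor)
    dot = sum(query[c] * descriptor[c] for c in shared)
    norm_q = sqrt(sum(w * w for w in query.values()))
    norm_d = sqrt(sum(w * w for w in descriptor.values()))
    if norm_q == 0 or norm_d == 0:
        return 0.0
    return dot / (norm_q * norm_d)

def search(query, annotations):
    """Return the URIs with non-zero relevance, most relevant first."""
    scored = [(cosine(query, desc), uri) for uri, desc in annotations.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [uri for score, uri in scored if score > 0]

# Hypothetical conceptual descriptors (annotations) of three indexed resources.
ANNOTATIONS = {
    "http://example.org/a": {"Dog": 0.9, "Animal": 0.4},
    "http://example.org/b": {"Cat": 0.8, "Animal": 0.5},
    "http://example.org/c": {"Car": 1.0},
}

results = search({"Dog": 1.0, "Animal": 0.5}, ANNOTATIONS)
```

In this toy data the first resource shares both query concepts, the second shares only one, and the third shares none, so the third is left out of the result set.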

The Indexing service offers functionality complementary to that of the Search service, allowing external applications to specify URIs or resources to be semantically classified. Unlike the Search service, the Indexing service is asynchronous: it requires no communication back to the calling application other than a request acknowledgement.
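The asynchronous interaction can be sketched as a queue served by a background worker. This is only an illustration of the acknowledge-then-process pattern, not the H-DOSE interface: the `IndexingService` class, its `submit` method and the `classify` callback are all hypothetical.

```python
import queue
import threading

class IndexingService:
    """Toy asynchronous indexing front end (hypothetical, for illustration)."""

    def __init__(self, classify):
        self._classify = classify        # hypothetical classification callback
        self._pending = queue.Queue()
        self.indexed = {}                # uri -> conceptual descriptor
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()

    def submit(self, uri):
        """Queue a URI and return immediately with an acknowledgement only."""
        self._pending.put(uri)
        return "ACK"

    def _run(self):
        # Classification happens later, decoupled from the caller.
        while True:
            uri = self._pending.get()
            self.indexed[uri] = self._classify(uri)
            self._pending.task_done()

    def wait_idle(self):
        """Block until all queued URIs are processed (demonstration only)."""
        self._pending.join()

service = IndexingService(lambda uri: {"Animal": 1.0})
ack = service.submit("http://example.org/a")
service.wait_idle()
```

The caller never receives the classification result; it only learns that the request was accepted, which matches the behavior described above.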

Finally, the Gateway services are services created at run time, which the platform uses for the dynamic publication of agent-based web services, platform management being one example. Such services are built dynamically in response to actions taken by agents in the platform.

The Kernel layer

The Kernel layer groups the services that perform most of the semantic elaboration provided by H-DOSE. They are not accessible to external applications; instead, they serve the Service layer entities by performing high-complexity tasks such as the semantic classification of web resources.

The Kernel layer uses agent-based services more extensively than the Service layer: agents become useful when complex, information-dependent tasks are required (semantic indexing) or where their proactive behavior can produce significant advantages (autonomic maintenance). The kernel services of H-DOSE are the Annotation Storage service, the Expander service, the Indexing service and the Autonomic Maintenance service.

The Annotation Storage service gives persistence to the semantic annotations extracted by the Indexing service, allowing searches on the knowledge base composed of the set of stored semantic descriptors and the platform ontology.

The Expander service allows the Search service to extract the semantic equivalent of user queries and to perform semantic comparisons between such queries and the annotations managed by the Annotation Storage service. Both the Expander and the Storage services are deployed as simple, static web services.
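As a rough illustration of what extracting the "semantic equivalent" of a term query might look like, the sketch below maps query terms to ontology concepts through synsets and weights each concept by its share of the matched terms. The synset table, the `expand` function and the weighting scheme are assumptions made for this example; the text does not describe how the Expander computes its output.

```python
# Hypothetical synsets: each ontology concept is linked to a set of terms.
SYNSETS = {
    "Dog": {"dog", "puppy", "hound"},
    "Cat": {"cat", "kitten"},
}

def expand(query_text, synsets):
    """Turn a string of terms into a weighted set of concepts.

    Each concept is weighted by the fraction of matched terms that fall
    in its synset (an assumed scheme, chosen only for simplicity).
    """
    hits = {}
    for term in query_text.lower().split():
        for concept, words in synsets.items():
            if term in words:
                hits[concept] = hits.get(concept, 0) + 1
    total = sum(hits.values())
    if total == 0:
        return {}
    return {concept: count / total for concept, count in hits.items()}

weighted_query = expand("puppy and dog kitten", SYNSETS)
```

The resulting concept weights could then be compared with stored annotations, whatever similarity measure the platform actually applies.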

Conversely, the Indexing service is deployed as a set of collaborating agents with different goals, such as syntactic-to-semantic translation and semantic descriptor creation.

The novel aspect of the Indexing service is that, rather than being a single service, it is an aggregate service provided by several collaborating agents and exposed as a single web service through a gateway framework. This arrangement allows focused indexing, i.e., indexing of resources with respect to small subsets of the platform ontology, performed on the servers that publish the resources to be indexed. It thus exploits both the workload balancing and the deep search capabilities of agents.
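One way to picture focused indexing is as a planning step that groups the resources to be indexed by publishing server and attaches the ontology subset of interest, so that one task per server can be handed to an agent. The sketch below is a hypothetical illustration; `plan_focused_indexing` and the task layout are invented names, and agent dispatch itself is not modeled.

```python
from collections import defaultdict
from urllib.parse import urlparse

def plan_focused_indexing(uris, concept_subset):
    """Build one indexing task per publishing host (hypothetical task format)."""
    by_host = defaultdict(list)
    for uri in uris:
        by_host[urlparse(uri).netloc].append(uri)
    return [
        {"host": host, "uris": sorted(group), "concepts": sorted(concept_subset)}
        for host, group in sorted(by_host.items())
    ]

plan = plan_focused_indexing(
    [
        "http://a.example.org/x",
        "http://b.example.org/y",
        "http://a.example.org/z",
    ],
    {"Dog", "Animal"},
)
```

Each task names a single host, which reflects the idea that indexing runs where the resources are published and that work is spread across servers.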

Finally, the Maintenance service is somewhat atypical because it is proactive rather than passive: it constantly monitors the state of the Annotation Storage and Search services to find out whether any "no data" responses have occurred and to recover the platform from this state. In more detail, the Maintenance service, on the one hand, acts on the Storage service by inspecting stored data with respect to the platform ontology and searching for conceptual areas that have too few semantic descriptors (self-management); on the other hand, it checks the state of the Search service to detect alerts about "no data" responses given to user requests (self-healing).

If an alert is found or a poorly covered conceptual area is detected, the Maintenance service coordinates a suitable set of agents to discover new knowledge on the web and triggers a focused indexing cycle. This cycle increases the number of resources classified by the platform and may recover the platform from the failure state, so that future requests similar to those that failed can be answered successfully.
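The two checks performed by the Maintenance service can be sketched as follows. The coverage threshold, the function names and the data layout are assumptions made for illustration; the text does not specify how "too few" descriptors is defined or how alerts are represented.

```python
from collections import Counter

def low_coverage_concepts(ontology_concepts, annotations, min_descriptors):
    """Self-management check: concepts with fewer descriptors than the threshold."""
    counts = Counter(c for descriptor in annotations.values() for c in descriptor)
    return {c for c in ontology_concepts if counts[c] < min_descriptors}

def maintenance_targets(ontology_concepts, annotations, no_data_queries,
                        min_descriptors=1):
    """Concepts that should drive the next focused indexing cycle."""
    targets = low_coverage_concepts(ontology_concepts, annotations, min_descriptors)
    # Self-healing check: concepts of queries that received a "no data" response.
    for query in no_data_queries:
        targets.update(c for c in query if c in ontology_concepts)
    return sorted(targets)

# Hypothetical platform state.
ontology = {"Animal", "Dog", "Cat", "Car", "Bird"}
stored = {
    "http://example.org/a": {"Dog": 0.9, "Animal": 0.4},
    "http://example.org/b": {"Cat": 0.8, "Animal": 0.5},
    "http://example.org/c": {"Car": 1.0},
}
failed_queries = [{"Car": 1.0, "Engine": 0.5}]

targets = maintenance_targets(ontology, stored, failed_queries)
```

Here "Bird" is selected because no stored descriptor mentions it, and "Car" because a query on it returned no data; "Engine" is ignored since it is not part of the assumed ontology.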

The Wrapper layer

The Wrapper layer includes all the utilities and wrappers needed by platform operations that do not require publication as Web Services but are included in the platform services as libraries. At this level we find data-persistence wrappers, which allow H-DOSE to be deployed on different database servers such as PostgreSQL, MySQL and Oracle; ontology wrappers for accessing the conceptual model that drives platform operations, according to different definition languages such as RDF/S, DAML and OWL; and data-manipulation libraries for performing syntax-to-semantics conversion (stemmer, language recognition, and so on).

It is important to note that, even though this layer introduces no novelty, it still plays a critical role in the overall platform deployment: it allows the platform to move across different hardware devices and to be targeted at different conceptual domains simply by changing the ontology used.
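The role of an ontology wrapper can be illustrated with a small interface sketch: platform code depends only on an abstract wrapper, so swapping the ontology changes the conceptual domain without touching that code. The `OntologyWrapper` protocol, the `InMemoryOntology` backend and `validate_descriptor` are hypothetical names; real wrappers would parse RDF/S, DAML or OWL, which is not shown here.

```python
from typing import Dict, Protocol, Set

class OntologyWrapper(Protocol):
    """Minimal interface the rest of the platform is assumed to rely on."""

    def concepts(self) -> Set[str]:
        ...

class InMemoryOntology:
    """Stand-in backend; a real wrapper would load an ontology file instead."""

    def __init__(self, concepts):
        self._concepts = set(concepts)

    def concepts(self) -> Set[str]:
        return set(self._concepts)

def validate_descriptor(descriptor: Dict[str, float],
                        ontology: OntologyWrapper) -> Dict[str, float]:
    """Keep only the concepts that exist in the currently loaded ontology."""
    known = ontology.concepts()
    return {c: w for c, w in descriptor.items() if c in known}

animals = InMemoryOntology({"Dog", "Cat"})
vehicles = InMemoryOntology({"Car"})
descriptor = {"Dog": 0.9, "Car": 0.3}
```

Passing `animals` or `vehicles` to the same `validate_descriptor` function yields descriptors for two different domains, which is the kind of retargeting the paragraph above describes.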