From Digital Objects to Content across eInfrastructures
DILIGENT: Deploying Virtual Research Environments on-demand
Donatella Castelli, Pasquale Pagano (ISTI-CNR)
Yannis Ioannidis (Univ. of Athens)
Rome, 29-30 October 2007. European Information Space: Infrastructures, Services and Applications Workshop
Outline
Motivations & overview
Achievements
  DL related services
  DILIGENT Infrastructure
  ImpECt application
D4Science
Motivations: from DLs to VREs
DLs are evolving into “Virtual Research Environments” (Collaboratoria)
Distributed frameworks for carrying out cooperative activities like “in silico experiments”, data analysis and processing, production of new knowledge using specialised tools
Largely based on retrieval and access of always updated knowledge from diverse heterogeneous content sources
The knowledge produced is preserved and made available for other usages inside and outside the VRE
VREs trend
Highly dynamic, created and dismissed on-demand
Based on specialised tools which support the generation of new knowledge
[Chart, M26. Vertical axis: 0 to 1.2. Legend: Prototype, Available, Build. Components shown: Information Service, Broker & Matchmaker, Keeper, DVOS, VDL Generator, Content Management, Wrapper & Monitor, Content Security, Metadata Broker, Annotation, Metadata Management, Data Fusion, CS, DS, Personalization, Index Service, Search Service, Feature Extraction Service, Process Design & Verification, Process Execution & Reliability, Process Optimization, Arte Portal, ImpECt Portal]
VRE system
[Diagram labels: VRE; VRE System; Content Sources; Dedicated Resources; Services; Computing & storage elements; Management and Orchestration …]
Distribution #1: Information Sources
THE CHALLENGE
  Characterizing and indexing a diversity of sources
  Selecting the appropriate sources
  Fusing/Merging the results in meaningful lists
Indexing for Content Based Search
[Diagram. Indexing path: Content & Metadata, Feature Extraction Service, Feed / Build Index (Index Mgt), Feature Index. Query path: Query at the Portal, Extract features (Feature Extraction), Query Index (sample values shown at the Index: 203 236 172 210 78), Access metadata & create ResultSet (Metadata & Content Mgt, MD), Present results.]
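The two paths on this slide (feed and build a feature index, then extract features from a query and look them up) can be sketched in a few lines of Python. This is only an illustration: the histogram extractor, the `FeatureIndex` class and the Euclidean nearest-neighbour lookup are assumptions made for the example, not the DILIGENT Feature Extraction or Index services.

```python
import math

def extract_features(pixels, bins=4):
    """Toy extractor (an assumption): normalised histogram of 0..255 intensities."""
    hist = [0] * bins
    for p in pixels:
        hist[min(p * bins // 256, bins - 1)] += 1
    total = len(pixels) or 1
    return [count / total for count in hist]

class FeatureIndex:
    """Hypothetical in-memory feature index: document id -> feature vector."""

    def __init__(self):
        self._vectors = {}

    def feed(self, doc_id, features):
        """Store the feature vector extracted from one piece of content."""
        self._vectors[doc_id] = tuple(features)

    def query(self, features, k=3):
        """Return the ids of the k stored vectors closest to the query vector."""
        q = tuple(features)
        ranked = sorted(self._vectors.items(),
                        key=lambda item: (math.dist(q, item[1]), item[0]))
        return [doc_id for doc_id, _ in ranked[:k]]

# Indexing path: extract features from content and feed the index.
index = FeatureIndex()
index.feed("dark", extract_features([0, 10, 20, 30]))
index.feed("bright", extract_features([250, 240, 230, 220]))
index.feed("mixed", extract_features([0, 100, 150, 250]))

# Query path: extract features from the query object and look them up.
print(index.query(extract_features([5, 15, 25, 200]), k=2))
```

In a real deployment the returned ids would then be used to access metadata and build the ResultSet presented by the portal.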
Selecting Sources and Fusing Results
[Diagram labels. Services: Search, Content Source Description, Content Source Selection, Data Fusion, Index, Metadata Manager, Content Manager, External Sources. Data: Metadata Collections, Content Collections, External Repositories, Indices, Index Statistics, Source Descriptions. Steps: Describe, Select Sources, Query Sources, Acquire Results, Reranked Lists.]
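The steps named on this slide (Describe, Select Sources, Query Sources, Acquire Results, Reranked Lists) can be illustrated with a small Python sketch. The scoring by document frequency and the min-max score normalisation are assumptions chosen for brevity; the slides do not say which selection or fusion algorithms DILIGENT uses, and all function names here are hypothetical.

```python
from collections import Counter

def describe(source_docs):
    """Source description (assumed form): term -> number of documents containing it."""
    description = Counter()
    for text in source_docs:
        description.update(set(text.lower().split()))
    return description

def select_sources(query, descriptions, top_n=2):
    """Keep the top_n sources whose descriptions mention the query terms most often."""
    terms = query.lower().split()
    scores = {name: sum(desc[t] for t in terms) for name, desc in descriptions.items()}
    ranked = sorted((n for n in scores if scores[n] > 0),
                    key=lambda n: (-scores[n], n))
    return ranked[:top_n]

def fuse(result_lists):
    """Merge per-source (doc_id, score) lists after min-max normalising each list."""
    fused = []
    for source, results in result_lists.items():
        if not results:
            continue
        scores = [score for _, score in results]
        lo, hi = min(scores), max(scores)
        for doc_id, score in results:
            norm = 1.0 if hi == lo else (score - lo) / (hi - lo)
            fused.append((doc_id, source, norm))
    # Highest normalised score first; ties broken by document id.
    return sorted(fused, key=lambda r: (-r[2], r[0]))

descriptions = {
    "books": describe(["air pollution report", "european air quality"]),
    "images": describe(["satellite image of sea"]),
    "maps": describe(["air map"]),
}
print(select_sources("air pollution", descriptions))
print(fuse({"a": [("d1", 10.0), ("d2", 5.0)], "b": [("d3", 0.9), ("d4", 0.3)]}))
```

Normalising before merging matters because sources score on different scales; without it, source "a" above would always outrank source "b".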
Distribution #2: Information Retrieval
Numerous Search services, for info retrieval & processing
  Structured data and XML processing (scanners, sorters, joiners, filterers, transformers, retrievers)
  Lookups (indices, FT indices, XML indices, Geo indices)
  Content-based searches
  External source probes
  Fusion / Merging of results
Query language (internal) for interfacing
Workflow language (BPEL) for execution
Data transport mechanism (ResultSet) for communication
Query and Workflow Management
project by 'title', 'description', 'subject'
on (keeptop 20
  on (sort ASC by 'DocID'
    on (merge
      on (fieldedsearch by 'title' contains '*woman' in 'ENGLISH' on 'CollectionOfMedicalImages' as 'dc')
      and (fieldedsearch by 'description' contains '*term*' in 'ENGLISH' on 'CollectionOfMedicalBooks' as 'dc')
    ))
)
[Diagram labels: Query; Produce & Execute BPEL Workflow; Optimization (Complex Cost Calculation, Profiling / Monitoring, Resource selection “hinting”, Domain specific planning, …); Parallelization; Active Planning]
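To make the nesting of the query on this slide concrete, here is a minimal Python sketch that mirrors its operators as generator functions over in-memory records. The operator semantics are assumptions inferred from the operator names (for instance, wildcards are reduced to a plain substring match), and the sample records are invented; the real system compiles the query into a BPEL workflow instead.

```python
from itertools import islice

def fielded_search(collection, field, pattern):
    """Yield records whose field contains the pattern (wildcards simplified away)."""
    needle = pattern.strip("*").lower()
    for record in collection:
        if needle in record.get(field, "").lower():
            yield record

def merge(*streams):
    """Concatenate several result streams into one."""
    for stream in streams:
        yield from stream

def sort_asc(stream, key):
    """Sort a stream in ascending order of one field."""
    return iter(sorted(stream, key=lambda record: record[key]))

def keeptop(stream, n):
    """Keep only the first n records of a stream."""
    return islice(stream, n)

def project(stream, *fields):
    """Keep only the named fields of each record."""
    for record in stream:
        yield {f: record.get(f) for f in fields}

# Invented sample data standing in for the two collections.
images = [
    {"DocID": 3, "title": "Portrait of a woman", "description": "x-ray", "subject": "art"},
    {"DocID": 1, "title": "Landscape", "description": "hills", "subject": "nature"},
]
books = [
    {"DocID": 2, "title": "Anatomy", "description": "medical term list", "subject": "medicine"},
]

plan = project(
    keeptop(
        sort_asc(
            merge(fielded_search(images, "title", "*woman"),
                  fielded_search(books, "description", "*term*")),
            "DocID"),
        20),
    "title", "description", "subject")
for row in plan:
    print(row)
```

Because every operator consumes and produces a stream, each one could in principle run as a separate service, which is the property a workflow engine needs in order to distribute and parallelise the plan.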
Queries & Workflows: It can get complex…
project by 'title', 'date' on
(sort ASC by 'DocID' on
  (merge on
    // MAP reports
    keeptop 8 on
      (sort ASC by 'RankID' on
        (join inner by 'DocID' on
          (fulltextsearch by 'Mediterranean' in 'ENGLISH' on 'd369b3e0-fa4c-11db-a297-9c01d805f283')
          and
          (fulltextsearch by 'Environmental' in 'ENGLISH' on 'd369b3e0-fa4c-11db-a297-9c01d805f283')))
    // EEA reports
    keeptop 8 on
      (sort ASC by 'RankID' on
        (fieldedsearch by 'date' contains '*1999*' on
          (join inner by 'DocID' on
            (fulltextsearch by 'air polution' in 'ENGLISH' on '25ad3c50-fa41-11db-a270-9c01d805f283')
            and
            (fulltextsearch by 'european' in 'ENGLISH' on '25ad3c50-fa41-11db-a270-9c01d805f283')
          ))
      ))
)
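In this query, `join inner by 'DocID'` over two full-text searches on the same collection behaves like a conjunction: only documents returned by both searches survive. A minimal hash-join sketch in Python shows the idea; the record layout and the way fields are combined are assumptions, since the slides do not define the join's exact semantics.

```python
from itertools import islice

def join_inner(left, right, key="DocID"):
    """Hash join: yield left records whose key also occurs on the right, with merged fields."""
    right_by_key = {}
    for record in right:
        # Keep the first right-hand record seen for each key.
        right_by_key.setdefault(record[key], record)
    for record in left:
        match = right_by_key.get(record[key])
        if match is not None:
            # Left-hand fields win when both sides carry the same field name.
            yield {**match, **record}

# Invented hit lists standing in for the two fulltextsearch results.
mediterranean = [{"DocID": 1, "RankID": 2}, {"DocID": 2, "RankID": 1}, {"DocID": 5, "RankID": 3}]
environmental = [{"DocID": 2, "title": "Coastal report"}, {"DocID": 5, "title": "Sea survey"}]

joined = join_inner(mediterranean, environmental)
top = list(islice(sorted(joined, key=lambda r: r["RankID"]), 8))  # sort ASC, keeptop 8
print(top)
```

Building the hash table on one input and streaming the other keeps the join linear in the size of the two hit lists, which helps when the lists arrive as large result sets.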
Optimal Utilization of Resources
Pre-query optimization: Monitoring and adaptation of VRE layout for optimal resource use
Content Source Selection: Filtering of collections unlikely to contain useful data
  Query terms and automatically pre-constructed Content Source Descriptors
Query Planning:
  Cost based optimization
  Heuristics and space-search
Process Execution: Process optimization selects and allocates appropriate resources for tasks
On-The-Spot processing: ResultSet mechanism to allow local filtering of large XML chunks of data
Further mechanisms to facilitate efficient searches:
  Indices
  ResultSet transport mechanism
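The "On-The-Spot processing" point can be illustrated with a small Python sketch: records are filtered where the result set lives, so only the matching XML has to travel further. The `ResultSet` class below is a hypothetical stand-in written for this example and does not reproduce the actual gCube ResultSet interface.

```python
import xml.etree.ElementTree as ET

class ResultSet:
    """Hypothetical stand-in: a paged collection of XML record strings."""

    def __init__(self, records, page_size=2):
        self._records = list(records)
        self._page_size = page_size

    def pages(self):
        """Yield the records page by page, as a consumer would fetch them."""
        for start in range(0, len(self._records), self._page_size):
            yield self._records[start:start + self._page_size]

    def filtered(self, predicate):
        """Build a new, smaller ResultSet locally from records matching the predicate."""
        kept = [xml for page in self.pages() for xml in page
                if predicate(ET.fromstring(xml))]
        return ResultSet(kept, self._page_size)

records = [
    '<doc id="1"><date>1999</date></doc>',
    '<doc id="2"><date>2003</date></doc>',
    '<doc id="3"><date>1999</date></doc>',
]
reports_1999 = ResultSet(records).filtered(lambda doc: doc.findtext("date") == "1999")
for page in reports_1999.pages():
    print([ET.fromstring(xml).get("id") for xml in page])
```

The consumer of `reports_1999` fetches one page instead of two, which is the saving local filtering is meant to provide on large XML payloads.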
Information Retrieval: How it Works
[Diagram labels. Search Service: SearchMaster, Query Preprocessing, Query Parser, Personalization, Linguistics, Planner, Active Planning. Inputs and outputs: Query, Environment info, Workflow (bpel4ws), Results. Other services: PES, CSS, DIS. Operators: XML Sorter, XML Merger, XML Transformer, XML Joiner, XML Processor, External Source, FTI Lookup, Geo Index Lookup, Feature Index Lookup, Data Fusion, Metadata Catalog. Workflow nodes: S1, S2, S3, F1, F2, F3, F4, J1, M1, C, T. Edge labels: Q, E, P.]
from theory ... to reality
Next step: DILIGENT for Science
Provide and operate a production D4Science e-Infrastructure
Consolidate and extend gCube
Build VREs serving Environmental Monitoring and Fishery Resources Management domains
Provide and operate a production D4Science e-Infrastructure
  Define the operational procedures for sites (sites include content and service sites)
Consolidate and extend gCube
  Extend the Data Kit to deal with very large and heterogeneous content sources (e.g. textual repositories, satellite images, statistical databases) and other content-related resources (e.g. gazetteers, ontologies, thesauri)
Build VREs serving Environmental Monitoring and Fishery Resources Management domains
  Serve the needs of a multitude of researchers and decision-makers from many disciplines (biologists, climatologists, GIS experts, socio-economists, fishery managers, etc.) operating with many different tools
http://www.diligentproject.org
http://www.d4science.org/
Thank you! Questions?
Additional Slides
gCube System
An application framework for the development of services that can be outsourced to a grid-enabled infrastructure
An advanced container for the hosting of WS on the grid
A runtime environment for
  the provision of information about shared resources
  management of services and applications
  execution of VRE built-in services: content and