Using Lucene for Search within XIS

Post on 11-May-2015

DESCRIPTION

Allex Lyons, a programmer at Access Innovations, Inc., talks about the company's decision to search XIS docsets with a Lucene index, which is faster, more reliable, and more efficient than the random access file search used previously.

Transcript

XIS Lucene Indexing and Search

What is XIS?
- XIS is an XML schema-based database system used to store user data
- All records are stored in individual XML files
- Option to zip XML files available

XIS Project DTD

How XIS Data Is Stored
Docsets
- Store records with multiple fields (similar to an SQL table)
- Can also have subfields and lists of field values nested within a record
- Can look up values from other fields in other Docsets or other tables
Tables
- Store a single list of values
- Can be referenced by other Docsets
- Can be directly accessible for editing or kept hidden from user view

How to Create a XIS Project
- Create DTD file for XIS project
  - Specify MAI Thesaurus to link to project
  - Create Docset and Tables
  - Specify ID lengths for each Docset
  - Create fields for Docsets
- Save DTD to dhserver/projects/projects/xml folder
- Create XIS Project folder under dhserver/data
- Create subfolders for each Docset under XIS Project folder as well as Tables directory
- XIS Projects can only be created by administrators
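The folder steps above can be sketched as a short script. This is only an illustration under stated assumptions: the two `dhserver` paths come from the slides, while the project name, the Docset names, the DTD file name, and the `tables` folder name are hypothetical placeholders.

```python
from pathlib import Path


def create_project_layout(root, project, docsets, dtd_text):
    """Create the folders described above under `root` and save the DTD.

    Only `dhserver/projects/projects/xml` and `dhserver/data` come from the
    slides; the DTD file name and the `tables` folder name are assumptions.
    """
    root = Path(root)

    # Save the DTD to dhserver/projects/projects/xml.
    dtd_dir = root / "dhserver" / "projects" / "projects" / "xml"
    dtd_dir.mkdir(parents=True, exist_ok=True)
    dtd_path = dtd_dir / f"{project}.dtd"  # hypothetical file name
    dtd_path.write_text(dtd_text, encoding="utf-8")

    # Create the project folder under dhserver/data.
    project_dir = root / "dhserver" / "data" / project
    project_dir.mkdir(parents=True, exist_ok=True)

    # One subfolder per Docset, plus a directory for Tables.
    for name in docsets:
        (project_dir / name).mkdir(exist_ok=True)
    (project_dir / "tables").mkdir(exist_ok=True)  # hypothetical folder name

    return dtd_path, project_dir
```

Calling `create_project_layout("/tmp/demo", "demo_project", ["articles", "authors"], dtd_text)` would build the whole tree in one step; on a real server this would have to be run by an administrator, as noted above.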

Starting a XIS Project
- Start Data Harmony server where project is located
- Log in to Admin module
  - Start MAI Thesaurus
  - Start XIS Project
  - Index XIS Project, especially if just created
- Run startXis program
- Enter server, port, thesaurus, username, and password to log in

Indexing a XIS Project

XIS Login Screen

XIS Project View

XIS Docset View

XIS Table View

XIS Record Format
- Saved in XML file
- Starts with tag to represent Docset name along with ID as attribute
- Fields are listed within Docset tag along with values
- Subfields are nested within their parent fields
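A record in the format described above might look like the sample below. The slides only state that the root tag is the Docset name with the ID as an attribute, so the element names (`article`, `title`, `author`, `first`, `last`) and the attribute name `id` are hypothetical. The sketch parses the record and flattens subfields into dotted paths.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: the Docset name ("article") is the root tag and the
# record ID is an attribute; the field and attribute names are made up.
RECORD_XML = """\
<article id="000042">
  <title>Indexing with Lucene</title>
  <author>
    <first>Jane</first>
    <last>Doe</last>
  </author>
</article>
"""


def flatten_record(xml_text):
    """Return (docset, record_id, fields), with subfields as dotted paths."""
    root = ET.fromstring(xml_text)
    fields = {}

    def walk(element, prefix):
        for child in element:
            path = f"{prefix}.{child.tag}" if prefix else child.tag
            if len(child):   # has subfields: recurse into them
                walk(child, path)
            else:            # leaf field: keep its text value
                fields.setdefault(path, []).append((child.text or "").strip())

    walk(root, "")
    return root.tag, root.get("id"), fields
```

Each field maps to a list of values, so repeated fields within one record are kept rather than overwritten.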

XIS Search View

XIS Search Results

Current XIS Indexing and Search
- Uses text-based indexes
- Creates large number of index files (one for each field)
- Generates temporary files for results
- Uses less reliable RandomAccessFile search
- Has a limited number of search operands
- Does not take into account numerical values
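The last point, that numerical values are not taken into account, matters most for range searches. The following is a small illustration of the general problem, not XIS code: when values are compared as text, the ordering is lexicographic, so "100" falls between "10" and "25". The field values are invented for the example.

```python
def text_range(values, low, high):
    """Range filter that compares values as strings (lexicographic order)."""
    return [v for v in values if low <= v <= high]


def numeric_range(values, low, high):
    """Range filter that compares values as integers."""
    return [v for v in values if low <= int(v) <= high]


page_counts = ["9", "10", "25", "100"]  # made-up field values

# As text, "100" sorts between "10" and "25"; as numbers it does not.
print(text_range(page_counts, "10", "25"))   # ['10', '25', '100']
print(numeric_range(page_counts, 10, 25))    # ['10', '25']
```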

Lucene vs. Current XIS Index
- Fewer index files needed
- Allows for broader searches
  - Fuzzy matching
  - Start and end wildcard searches
- Recognizes numerical and date fields as such
- Can be utilized to remove stopwords
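To make these search features concrete, here is a plain-Python sketch of what fuzzy matching, wildcard matching, and stopword removal do. It is a stand-in for illustration only and does not use Lucene: the stopword list, the helper names, and the sample text are assumptions. In Lucene's classic query syntax, a fuzzy term is written with a trailing `~` and wildcards with `*` or `?`.

```python
from fnmatch import fnmatchcase

STOPWORDS = {"a", "an", "and", "of", "the", "to"}  # small illustrative list


def remove_stopwords(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return [w for w in text.lower().split() if w not in STOPWORDS]


def edit_distance(a, b):
    """Plain Levenshtein distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def fuzzy_match(term, vocabulary, max_edits=1):
    """Return vocabulary words within `max_edits` edits of `term`."""
    return [w for w in vocabulary if edit_distance(term, w) <= max_edits]


def wildcard_match(pattern, vocabulary):
    """Return vocabulary words matching a `*` / `?` wildcard pattern."""
    return [w for w in vocabulary if fnmatchcase(w, pattern)]


terms = remove_stopwords("The Taxonomy of the Thesaurus and Indexing")
print(terms)                            # ['taxonomy', 'thesaurus', 'indexing']
print(fuzzy_match("thesarus", terms))   # ['thesaurus']
print(wildcard_match("*ing", terms))    # ['indexing']  (start wildcard)
print(wildcard_match("tax*", terms))    # ['taxonomy']  (end wildcard)
```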

New Lucene Search Process
- Establish index reader to perform search
- Submit query string containing fields and parameters
- Return results
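The three steps above can be mimicked with a toy inverted index. This is a conceptual sketch in plain Python, not the Lucene API and not the XIS implementation; in Lucene itself the steps correspond roughly to opening an `IndexReader`, wrapping it in an `IndexSearcher`, and parsing the query string with a `QueryParser`. The record contents and the AND-only query semantics are assumptions made for brevity.

```python
from collections import defaultdict


def build_index(records):
    """Map (field, term) -> set of record IDs; `records` is {id: {field: text}}."""
    index = defaultdict(set)
    for record_id, fields in records.items():
        for field, text in fields.items():
            for term in text.lower().split():
                index[(field, term)].add(record_id)
    return index


def search(index, query):
    """Evaluate a query such as 'title:lucene author:doe'.

    Every 'field:term' clause must match (AND semantics, chosen for
    simplicity in this sketch).
    """
    result = None
    for clause in query.split():
        field, _, term = clause.partition(":")
        hits = index.get((field, term.lower()), set())
        result = hits if result is None else result & hits
    return sorted(result or [])


# Made-up records standing in for one Docset.
records = {
    "000001": {"title": "Lucene indexing basics", "author": "Jane Doe"},
    "000002": {"title": "Thesaurus construction", "author": "John Doe"},
    "000003": {"title": "Lucene search", "author": "Ann Lee"},
}
index = build_index(records)                    # step 1: something to read from
print(search(index, "title:lucene author:doe")) # steps 2 and 3: ['000001']
```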

Other Lucene Functions
- Will be used for adding, updating, and deleting XIS records
- Indexes will be housed on Data Harmony server
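Keeping an index in step with record changes can be sketched as follows. The class is a toy in-memory index with hypothetical names, not Lucene's API; in Lucene the equivalent operations are performed through an `IndexWriter` (`addDocument`, `updateDocument`, `deleteDocuments`). The sketch treats an update as a delete followed by an add so that stale terms never remain searchable.

```python
class RecordIndex:
    """Toy in-memory index keyed by record ID (not Lucene's API)."""

    def __init__(self):
        self._terms = {}     # (field, term) -> set of record IDs
        self._records = {}   # record ID -> {field: text}

    def add(self, record_id, fields):
        """Index a new record; assumes `record_id` is not already present."""
        self._records[record_id] = dict(fields)
        for field, text in fields.items():
            for term in text.lower().split():
                self._terms.setdefault((field, term), set()).add(record_id)

    def delete(self, record_id):
        """Remove a record and every term entry that points to it."""
        fields = self._records.pop(record_id, {})
        for field, text in fields.items():
            for term in text.lower().split():
                ids = self._terms.get((field, term))
                if ids is not None:
                    ids.discard(record_id)
                    if not ids:
                        del self._terms[(field, term)]

    def update(self, record_id, fields):
        """Replace a record: delete the old version, then add the new one."""
        self.delete(record_id)
        self.add(record_id, fields)

    def lookup(self, field, term):
        """Return the sorted IDs of records whose `field` contains `term`."""
        return sorted(self._terms.get((field, term.lower()), set()))
```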

Any Questions?
