Search features and architecture in DNN 7.1

Posted on 24-May-2015


DESCRIPTION

7.1 Search and Lucene.Net: Lucene.Net was the obvious choice of technology for Search in DNN 7.1. Lucene is a general-purpose search engine, so integrating it with the intricacies of DNN wasn't trivial. Ash was instrumental in the design and development of the new Search in 7.1. Join Ash to hear all about DNN Search and Lucene.Net, and what the future looks like.

Transcript

@DNNCon @ashishprasad

Don’t forget to include #DNNCon in your tweets!

7.1 Search and Lucene.Net

Ash Prasad

Agenda

• History and New Objectives
• Architecture
• Lucene / Lucene.Net
• Crawlers, Entities, Controllers
• Ranking, Synonyms, Ignore Words, Stemming
• Security Trimming
• Module Integration, New Crawler

History of Search

• Platform Edition
  • SQL Server
  • ISearchable
• Commercial Edition
  • Lucene 2.9.2
  • URL and Files

[Diagram: previous search architecture, with components Module, ISearchable, Scheduler, Scheduler Module, SQL and Lucene]

Objectives of New Search

• Handle diverse content (CMS, Social, Localized, 3rd Party Modules)
• Consistent User Experience
• Simple for Module Developers
• Uniform Architecture
• Feature-based differentiation

Architecture

What’s Lucene?

• Java-based indexing and search technology
• Managed by the Apache Software Foundation
• Behaves like a NoSQL (schemaless) document store
• Near real-time search, spellchecking, highlighting, ranking, synonyms
• Many companies use Lucene directly or customize it
• Facebook’s Graph Search uses a similar ‘inverted index’
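The ‘inverted index’ idea mentioned above can be sketched in a few lines of Python. This is a toy illustration and not Lucene's actual data structure. It assumes whitespace tokenization and lower-casing. Each term maps to the set of document IDs that contain it, so a multi-term AND query becomes a set intersection.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lower-cased term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search_all(index, query):
    """Return the IDs of documents containing every query term (boolean AND)."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Copy the first posting set so the index itself is never modified.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    1: "DNN search uses Lucene",
    2: "Lucene is a search library",
    3: "DNN modules and pages",
}
index = build_inverted_index(docs)
print(sorted(search_all(index, "lucene search")))  # [1, 2]
```

The lookup cost depends on the size of the posting sets for the query terms rather than on the total amount of text. That property is what makes this layout attractive for search.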

What’s Lucene.Net?

• Line-by-line port of Lucene from Java to C#
• Maintains Lucene's high-performance characteristics
• Lags somewhat behind the Java releases
• Who uses Lucene.Net
  • Products: RavenDB, Orchard, Umbraco, SubText
  • Commercial sites: BBC UK Top Gear, Autodesk, Koders.com

Lucene – A Document Store

• Flexible schema
• Consists of Documents, which are collections of Fields
• Documents can have different sets of Fields, e.g.
  • Field("ID", "xxx-yyy-999"), Field("Title", "My best doc")
  • Field("Owner", "Ash"), Field("Locale", "en-US")
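One rough way to picture the flexible schema is shown below. Each document is a plain Python dict mapping field names to values, and the two documents carry different fields. The second document's ID is made up for illustration. `find_by_field` is a hypothetical helper, not a Lucene API.

```python
# Each document is a bag of named fields; documents in the same store
# do not have to share the same set of fields (flexible schema).
store = [
    {"ID": "xxx-yyy-999", "Title": "My best doc"},
    {"ID": "aaa-bbb-111", "Owner": "Ash", "Locale": "en-US"},  # hypothetical ID
]


def find_by_field(documents, field, value):
    """Return documents whose `field` equals `value`; documents without the field are skipped."""
    return [doc for doc in documents if doc.get(field) == value]


print([doc["ID"] for doc in find_by_field(store, "Owner", "Ash")])  # ['aaa-bbb-111']
```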

Lucene – A Document Store (Contd.)

• Denormalized (no referential integrity)
• Deletion is done through a flag
  • Compaction reclaims the space of deleted documents
• Update is Delete + Insert
• Boost = Ranking
• Unicode compliant
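The delete flag, compaction, and update-as-delete-plus-insert behaviours can be modeled with a toy store. `TinyDocStore` and its methods are hypothetical names used only for illustration. In Lucene itself, deleted documents are only marked, and their space is reclaimed later when index segments are merged.

```python
class TinyDocStore:
    """Toy model of Lucene-style deletes: mark now, reclaim space later."""

    def __init__(self):
        self.docs = []        # append-only list of (key, fields) slots
        self.deleted = set()  # slot numbers flagged as deleted

    def add(self, key, fields):
        self.docs.append((key, fields))

    def delete(self, key):
        # Deletion only sets a flag; the slot still occupies space.
        for slot, (k, _) in enumerate(self.docs):
            if k == key:
                self.deleted.add(slot)

    def update(self, key, fields):
        # No in-place update: delete the old version, then insert the new one.
        self.delete(key)
        self.add(key, fields)

    def live(self):
        """Documents not flagged as deleted."""
        return [d for slot, d in enumerate(self.docs) if slot not in self.deleted]

    def compact(self):
        # Rewrite storage without the flagged slots, reclaiming their space.
        self.docs = self.live()
        self.deleted.clear()


store = TinyDocStore()
store.add("doc-1", {"Title": "Draft"})
store.update("doc-1", {"Title": "Final"})
print(len(store.docs))  # 2: the old version still takes a slot
store.compact()
print(len(store.docs))  # 1
```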

Book Consulted for Search

• Book on Lucene version 3.0
• ~500 pages
• Very useful

Search Phases

• Content Acquisition
  • Crawling
  • ISearchable
  • ModuleSearchBase
  • URL
  • Doc / PDF
• Content Indexing
  • Text Analysis
  • Ranking
  • Synonyms
  • Ignore Words
  • Stemming
• Content Search
  • Querying
  • Sorting
  • Security Trimming
  • Boolean Search
  • Highlighting

Crawlers

• Platform
  • Site Crawler
    • Module and Tab metadata
    • Module content (ModuleSearchBase/ISearchable)
• Commercial Edition
  • File Crawler
    • Uses IFilter to extract text from PDF/Office files
  • URL Crawler
    • Internal and external URLs

Search Entities

• SearchType
  • Distinguishes crawlers
• SearchDocument
  • Properties of a piece of content
  • Stored in the index
• SearchQuery
  • Parameters to execute a query
• SearchResult
  • Derived from SearchDocument
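The four entities can be pictured as plain data classes. This Python sketch is a deliberate simplification. The real DNN classes are C# and carry many more properties, and the field names here (`unique_key`, `snippet`, `score`, and so on) are hypothetical. The sketch mainly shows that a SearchResult is a SearchDocument plus query-time extras.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SearchType:
    search_type_id: int
    name: str                 # which crawler produced the content (hypothetical values)
    is_private: bool = False  # private types are hidden from site search


@dataclass
class SearchDocument:
    unique_key: str           # identifies the content within its search type
    search_type_id: int
    title: str
    body: str = ""
    tags: List[str] = field(default_factory=list)


@dataclass
class SearchQuery:
    keywords: str
    search_type_ids: List[int] = field(default_factory=list)  # empty = all types
    page_index: int = 1
    page_size: int = 10


@dataclass
class SearchResult(SearchDocument):
    # Inherits every indexed property and adds data computed at query time.
    snippet: str = ""
    score: float = 0.0


hit = SearchResult(unique_key="m-1", search_type_id=1, title="Hello",
                   snippet="<b>Hello</b>", score=1.5)
print(isinstance(hit, SearchDocument))  # True
```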

Search Entities – Indexing vs. Querying

Controllers

• SearchController
  • For querying
• InternalSearchController
  • For adding / updating / deleting
• LuceneController
  • Interacts with Lucene
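A minimal sketch of the controller layering follows, assuming a Python dict stands in for the Lucene index. Method names such as `add_search_document` and `site_search` are hypothetical snake_case stand-ins, not the DNN API. The point is that only `LuceneController` touches the index, while writes and reads go through separate controllers.

```python
class LuceneController:
    """Only this layer touches the index (a dict standing in for Lucene here)."""

    def __init__(self):
        self._index = {}

    def add(self, key, doc):
        self._index[key] = doc

    def delete(self, key):
        self._index.pop(key, None)

    def search(self, predicate):
        return [doc for doc in self._index.values() if predicate(doc)]


class InternalSearchController:
    """Write side: crawlers add, update and delete documents through here."""

    def __init__(self, lucene):
        self._lucene = lucene

    def add_search_document(self, key, doc):
        self._lucene.add(key, doc)  # re-adding a key replaces the old version

    def delete_search_document(self, key):
        self._lucene.delete(key)


class SearchController:
    """Read side: the UI and modules issue queries through here."""

    def __init__(self, lucene):
        self._lucene = lucene

    def site_search(self, keyword):
        kw = keyword.lower()
        return self._lucene.search(lambda doc: kw in doc["title"].lower())


lucene = LuceneController()
writer = InternalSearchController(lucene)
reader = SearchController(lucene)
writer.add_search_document("tab-1", {"title": "Lucene.Net in DNN"})
print(reader.site_search("lucene"))  # [{'title': 'Lucene.Net in DNN'}]
```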

Ranking = Boosting

• Documents and/or Fields can be boosted in Lucene
• DNN uses Field boosts (default: 10)
  • Title (50)
  • Tag (40)
  • Keyword (35)
  • Description (20)
  • Author (15)
• Configured manually through HostSettings
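To see how the field boosts above affect ordering, here is a deliberately simplified additive model: a document's score for a term is the sum of the boosts of the fields that contain it. Real Lucene scoring also factors in term statistics, so treat this only as an illustration. The boosts are hard-coded here rather than read from HostSettings, and whitespace tokenization is assumed.

```python
# Field boosts as listed on the slide; any other field falls back to the default.
FIELD_BOOSTS = {"title": 50, "tag": 40, "keyword": 35, "description": 20, "author": 15}
DEFAULT_BOOST = 10


def score(doc, term):
    """Sum the boosts of every field whose tokens include `term` (case-insensitive)."""
    term = term.lower()
    total = 0
    for name, text in doc.items():
        if term in text.lower().split():
            total += FIELD_BOOSTS.get(name, DEFAULT_BOOST)
    return total


docs = {
    "a": {"title": "Lucene basics", "body": "intro"},
    "b": {"title": "Intro", "body": "lucene details", "description": "about lucene"},
}
ranked = sorted(docs, key=lambda k: score(docs[k], "lucene"), reverse=True)
print(ranked)  # ['a', 'b']
```

Document "a" scores 50 from a single title hit. Document "b" matches twice but scores only 10 + 20 = 30, which is the effect field boosting is meant to have.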

Synonyms and Ignore Words

• Synonyms are injected into the index
• Ignore words are removed from the index
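Index-time handling of ignore words and synonyms can be sketched as a small analysis step. The word lists below are hypothetical examples, not DNN's defaults. Because the synonym is written into the index next to the original word, a later query for either word matches the document.

```python
IGNORE_WORDS = {"the", "a", "of", "and"}                    # hypothetical list
SYNONYMS = {"car": ["automobile"], "automobile": ["car"]}  # hypothetical groups


def analyze(text):
    """Tokenize, drop ignore words, and inject synonyms at index time."""
    tokens = []
    for word in text.lower().split():
        if word in IGNORE_WORDS:
            continue  # ignore words never reach the index
        tokens.append(word)
        tokens.extend(SYNONYMS.get(word, []))  # synonyms indexed alongside the original
    return tokens


print(analyze("The history of the car"))  # ['history', 'car', 'automobile']
```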

Stemming

• Converts words to their root form
• PorterStemFilter is used
  • country, countries = countri
  • breathe, breathes, breathing, breathed = breath
  • fishing, fished = fish (the Porter algorithm leaves "fisher" unchanged)

Security Trimming

• Done through Collectors (callbacks)
• Each document found is sent to the Collector
• The Collector accepts or rejects it based on permissions
  • Site Crawler: Module / Tab permission
  • File Crawler: Folder permission
  • User Crawler: Profile permission
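Collector-based trimming can be sketched as a callback that the query loop invokes for every hit. The role-based check, the `view_roles` field, and the role names are assumptions for illustration. DNN's actual checks depend on the crawler (module/tab, folder, or profile permissions).

```python
from typing import Callable, List, Set


def make_permission_collector(user_roles: Set[str], results: List[dict]) -> Callable[[dict], None]:
    """Return a callback that keeps a hit only if the user holds one of its view roles."""
    def collect(doc: dict) -> None:
        if user_roles & set(doc["view_roles"]):
            results.append(doc)  # accepted
        # otherwise the hit is silently rejected (trimmed)
    return collect


def run_query(hits: List[dict], collector: Callable[[dict], None]) -> None:
    """Stand-in for the search engine: every matching document goes to the collector."""
    for doc in hits:
        collector(doc)


hits = [
    {"id": "public-page", "view_roles": ["All Users"]},
    {"id": "admin-page", "view_roles": ["Administrators"]},
]
visible: List[dict] = []
run_query(hits, make_permission_collector({"All Users", "Registered Users"}, visible))
print([d["id"] for d in visible])  # ['public-page']
```

Trimming inside the callback means unauthorized hits are discarded as they are found, rather than after a full result page has been built.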

Module Integration

• ModuleSearchBase
  • New abstract class with just one method
  • Implemented by the module's BusinessControllerClass
  • GetModifiedSearchDocuments
    • Returns new, changed and deleted content
    • Delta based
    • Granular permissions, localization, etc.
• ISearchable continues to work (no delta)
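The delta-based contract of GetModifiedSearchDocuments can be approximated as a filter over content modified since the last indexing run. This Python version is a hypothetical stand-in for the C# method. The `Item` and `SearchDoc` types are invented for the sketch, and it assumes deletions are reported as documents flagged inactive (`is_active=False`).

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List


@dataclass
class Item:  # hypothetical module content row
    item_id: int
    title: str
    modified_utc: datetime
    is_deleted: bool = False


@dataclass
class SearchDoc:  # simplified search document
    unique_key: str
    title: str
    is_active: bool  # assumed: False tells the indexer to remove the entry


def get_modified_search_documents(items: List[Item], begin_utc: datetime) -> List[SearchDoc]:
    """Delta crawl: report only content that is new, changed, or deleted since begin_utc."""
    return [
        SearchDoc(unique_key=f"item-{it.item_id}", title=it.title, is_active=not it.is_deleted)
        for it in items
        if it.modified_utc >= begin_utc
    ]


last_run = datetime(2013, 7, 1, tzinfo=timezone.utc)
items = [
    Item(1, "Old news", datetime(2013, 6, 1, tzinfo=timezone.utc)),
    Item(2, "Fresh post", datetime(2013, 7, 2, tzinfo=timezone.utc)),
    Item(3, "Removed post", datetime(2013, 7, 3, tzinfo=timezone.utc), is_deleted=True),
]
docs = get_modified_search_documents(items, last_run)
print([(d.unique_key, d.is_active) for d in docs])  # [('item-2', True), ('item-3', False)]
```

Unchanged content (item 1) is never re-sent, which is what distinguishes this from the older ISearchable approach of returning everything.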

New Crawler (How To)

• Define a new SearchType
  • Optionally set IsPrivate to hide it from site search
• Implement BaseResultController (2 methods)
  • HasViewPermission
  • GetDocUrl
• Create a Scheduled Task
  • Call AddSearchDocuments to inject content
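The three steps for a new crawler can be strung together in one sketch. Everything here is hypothetical: a ticket-tracking crawler, in-memory registries, and snake_case functions whose names merely echo IsPrivate, HasViewPermission, GetDocUrl and AddSearchDocuments from the slide. None of it is the DNN API.

```python
from typing import Dict, List

# Hypothetical in-memory registries standing in for DNN's storage.
SEARCH_TYPES: Dict[str, dict] = {}
INDEX: List[dict] = []


def register_search_type(name: str, result_controller, is_private: bool = False) -> None:
    """Step 1: define a new search type; private types stay out of site search."""
    SEARCH_TYPES[name] = {"controller": result_controller, "is_private": is_private}


class TicketResultController:
    """Step 2: the two methods a result controller supplies."""

    def has_view_permission(self, doc: dict, user: str) -> bool:
        return doc["owner"] == user  # assumed rule: only the owner may see a ticket

    def get_doc_url(self, doc: dict) -> str:
        return f"/tickets/{doc['id']}"  # hypothetical URL scheme


def add_search_documents(docs: List[dict]) -> None:
    INDEX.extend(docs)


def scheduled_task(pending: List[dict]) -> None:
    """Step 3: a scheduled job tags pending content with its type and indexes it."""
    add_search_documents([{**doc, "search_type": "ticket"} for doc in pending])


def results_for(user: str, site_search: bool = True) -> List[str]:
    """Resolve indexed hits to URLs, trimming by permission and privacy."""
    urls = []
    for doc in INDEX:
        entry = SEARCH_TYPES[doc["search_type"]]
        if site_search and entry["is_private"]:
            continue
        if entry["controller"].has_view_permission(doc, user):
            urls.append(entry["controller"].get_doc_url(doc))
    return urls


register_search_type("ticket", TicketResultController())
scheduled_task([{"id": 7, "owner": "ash"}, {"id": 8, "owner": "bob"}])
print(results_for("ash"))  # ['/tickets/7']
```

Keeping permission checks and URL building in the result controller lets the shared search pipeline stay generic. Each crawler only answers "may this user see it?" and "where does it live?".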

Demo

Recap

• New Search uses Lucene.Net
• Platform has the Site Crawler
• Commercial has the URL and File Crawlers
• Modules should implement ModuleSearchBase
• A new crawler implements BaseResultController

THANKS TO ALL OF OUR GENEROUS SPONSORS!
