Search features and architecture in DNN 7.1

@DNNCon @ashishprasad

Don’t forget to include #DNNCon in your tweets!

7.1 Search and Lucene.Net

Ash Prasad

Agenda

• History and New Objectives
• Architecture
• Lucene / Lucene.Net
• Crawlers, Entities, Controllers
• Ranking, Synonyms, Ignore Words, Stemming
• Security Trimming
• Module Integration, New Crawler

History of Search

• Platform Edition
  • SQL Server
  • ISearchable
• Commercial Edition
  • Lucene 2.9.2
  • URL and Files

[Diagram showing Module, ISearchable, Scheduler, SQL and Lucene]

Objectives of New Search

• Handle diverse content (CMS, Social, Localized, 3rd-party modules)
• Consistent user experience
• Simple for module developers
• Uniform architecture
• Feature-based differentiation

Architecture

What’s Lucene

• Java-based indexing and search technology
• Managed by the Apache Software Foundation
• Works like a NoSQL document database
• Near real-time search, spell checking, highlighting, ranking, synonyms
• Many companies use Lucene directly or customize it
• Facebook’s Graph Search uses a similar ‘inverted index’
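To make the ‘inverted index’ idea concrete, here is a minimal Python sketch. It is not how Lucene stores data (Lucene also keeps term positions and frequencies in compressed segments); it only shows the core structure: each term maps to the set of documents containing it, so a query looks up terms instead of scanning every document.

```python
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Naive analyzer: split on whitespace, trim punctuation, lowercase.
    tokens = (t.strip(".,!?;:'\"").lower() for t in text.split())
    return [t for t in tokens if t]


class InvertedIndex:
    def __init__(self) -> None:
        # term -> set of ids of the documents that contain the term
        self.postings: dict[str, set[str]] = defaultdict(set)

    def add(self, doc_id: str, text: str) -> None:
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query: str) -> set[str]:
        # AND semantics: return documents that contain every query term.
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


index = InvertedIndex()
index.add("d1", "Lucene is a search library")
index.add("d2", "DNN search is built on Lucene")
index.add("d3", "Graph search at Facebook")
print(sorted(index.search("search lucene")))  # ['d1', 'd2']
```

The lookup cost depends on the number of query terms and the size of their posting sets, not on the total number of documents, which is why this structure scales to large sites.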

What’s Lucene.Net

• Line-by-line port from Java to C#
• Maintains high-performance requirements
• A bit behind the Java releases
• Who uses Lucene.Net
  • Products: RavenDB, Orchard, Umbraco, SubText
  • Commercial sites: BBC UK Top Gear, Autodesk, Koders.com

Lucene – A Document Store

• Flexible schema
• Consists of Documents, which are collections of Fields
• Documents can have different sets of Fields, for example:
  • Field("ID", "xxx-yyy-999"), Field("Title", "My best doc")
  • Field("Owner", "Ash"), Field("Locale", "en-US")
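A minimal Python sketch of this flexible schema, assuming a document is simply an ordered list of named string fields. The Field and Document classes here are illustrative only and are not the Lucene.Net API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Field:
    name: str
    value: str


@dataclass
class Document:
    # No fixed schema: each document carries whatever fields it was given.
    fields: list[Field] = field(default_factory=list)

    def add(self, name: str, value: str) -> "Document":
        self.fields.append(Field(name, value))
        return self

    def get(self, name: str) -> Optional[str]:
        # Return the first value stored under this field name, if any.
        for f in self.fields:
            if f.name == name:
                return f.value
        return None


# Two documents in the same store with entirely different sets of fields.
doc_a = Document().add("ID", "xxx-yyy-999").add("Title", "My best doc")
doc_b = Document().add("Owner", "Ash").add("Locale", "en-US")
store = [doc_a, doc_b]

for doc in store:
    print([f.name for f in doc.fields])
```

Asking a document for a field it does not have is not an error; it simply yields no value, which is what lets different content types share one index.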

Lucene – A Document Store (Contd.)

• Denormalized (no referential integrity)
• Deletion is done through a flag; compaction reclaims deleted space
• Update is Delete + Insert
• Boost = Ranking
• Unicode compliant
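The deletion and update rules above can be sketched as a toy append-only store. This is a simplified model that assumes documents are keyed by a unique "id"; it does not reflect how Lucene actually lays out segments on disk.

```python
class SegmentStore:
    """Toy append-only store: deletes set a flag, compaction reclaims space."""

    def __init__(self) -> None:
        self.docs: list[dict] = []      # append-only storage
        self.deleted: set[int] = set()  # positions flagged as deleted

    def insert(self, doc: dict) -> None:
        self.docs.append(doc)

    def delete(self, doc_id: str) -> None:
        # Deleting only flags matching documents; nothing is removed yet.
        for pos, doc in enumerate(self.docs):
            if doc["id"] == doc_id:
                self.deleted.add(pos)

    def update(self, doc: dict) -> None:
        # No in-place update: flag the old version, then append the new one.
        self.delete(doc["id"])
        self.insert(doc)

    def live_docs(self) -> list[dict]:
        return [d for pos, d in enumerate(self.docs) if pos not in self.deleted]

    def compact(self) -> None:
        # Rewrite storage without flagged documents, reclaiming their space.
        self.docs = self.live_docs()
        self.deleted = set()


store = SegmentStore()
store.insert({"id": "1", "title": "Draft"})
store.update({"id": "1", "title": "Final"})
print(len(store.docs), [d["title"] for d in store.live_docs()])  # 2 ['Final']
store.compact()
print(len(store.docs))  # 1
```

Until compaction runs, the old version still occupies space even though searches no longer see it.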

Book consulted for Search

• Book on version 3.0
• ~500 pages
• Very useful

Search Phases

• Content Acquisition: Crawling, ISearchable, ModuleSearchBase, URL, Doc / PDF
• Content Indexing: Text Analysis, Ranking, Synonyms, Ignore Words, Stemming
• Content Search: Querying, Sorting, Security Trimming, Boolean Search, Highlighting

Crawlers

• Platform
  • Site Crawler
    • Module and Tab metadata
    • Module content (ModuleSearchBase / ISearchable)
• Commercial Edition
  • File Crawler: uses IFilter to extract text from PDF/Office files
  • URL Crawler: internal and external URLs

Search Entities

• SearchType: distinguishes crawlers
• SearchDocument: properties of a piece of content, stored in the index
• SearchQuery: parameters used to execute a query
• SearchResult: derived from SearchDocument
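The relationships between these entities can be sketched as Python dataclasses. The property names below (unique_key, search_type_id, keywords, url, snippet and so on) are hypothetical placeholders chosen for illustration, not the exact members of DNN's C# classes. The point is the shape: SearchResult extends SearchDocument, while SearchQuery only carries query parameters.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SearchType:
    # Distinguishes crawlers; ids and names here are hypothetical.
    search_type_id: int
    name: str


@dataclass
class SearchDocument:
    # Properties of a piece of content, as stored in the index (simplified).
    unique_key: str
    search_type_id: int
    title: str
    body: str = ""
    tags: list[str] = field(default_factory=list)


@dataclass
class SearchQuery:
    # Parameters used to execute a query; never stored in the index.
    keywords: str
    search_type_ids: list[int] = field(default_factory=list)
    page_index: int = 1
    page_size: int = 10


@dataclass
class SearchResult(SearchDocument):
    # Derived from SearchDocument, plus values that only exist at query time.
    url: Optional[str] = None
    snippet: str = ""


module_type = SearchType(1, "module")
result = SearchResult(unique_key="m-42", search_type_id=module_type.search_type_id,
                      title="Welcome", url="/home", snippet="...Welcome to...")
print(isinstance(result, SearchDocument))  # True
```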

Search Entities – Indexing vs. Querying

Controllers

• SearchController: for querying
• InternalSearchController: for adding / updating / deleting
• LuceneController: interacts with Lucene
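One way to read this split is as read/write separation over a single index gateway. The following is a hypothetical Python model of that layering: the class names mirror the slide, but every method name and signature is invented for illustration, and the "index" is an in-memory dict rather than Lucene.

```python
class LuceneController:
    """Only this layer touches the index (here: a hypothetical in-memory dict)."""

    def __init__(self) -> None:
        self._index: dict[str, dict] = {}

    def add(self, doc: dict) -> None:
        self._index[doc["key"]] = doc

    def delete(self, key: str) -> None:
        self._index.pop(key, None)

    def search(self, term: str) -> list[dict]:
        term = term.lower()
        return [d for d in self._index.values() if term in d["title"].lower()]


class InternalSearchController:
    """Write side: adding, updating and deleting documents."""

    def __init__(self, lucene: LuceneController) -> None:
        self._lucene = lucene

    def add_or_update(self, doc: dict) -> None:
        # Update = delete + insert, following Lucene's model.
        self._lucene.delete(doc["key"])
        self._lucene.add(doc)

    def delete(self, key: str) -> None:
        self._lucene.delete(key)


class SearchController:
    """Read side: querying only."""

    def __init__(self, lucene: LuceneController) -> None:
        self._lucene = lucene

    def search(self, term: str) -> list[str]:
        return sorted(d["key"] for d in self._lucene.search(term))


lucene = LuceneController()
writer = InternalSearchController(lucene)
reader = SearchController(lucene)
writer.add_or_update({"key": "a", "title": "Lucene basics"})
writer.add_or_update({"key": "b", "title": "DNN search"})
print(reader.search("lucene"))  # ['a']
```

Keeping writes out of the query-facing controller means callers that only search never get a handle that can modify the index.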

Ranking = Boosting

• A Document and/or a Field can be boosted in Lucene
• DNN uses Field boosts (default: 10)
  • Title (50)
  • Tag (40)
  • Keyword (35)
  • Description (20)
  • Author (15)
• Configured manually through HostSettings
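A simplified sketch of how these field boosts change ranking, using the boost values listed above. The scoring formula (sum over fields of boost × number of term occurrences) is an assumption chosen for clarity; Lucene's real scoring also factors in term-frequency normalization, inverse document frequency and field length.

```python
# Boost values from the slide; any other field falls back to the default.
FIELD_BOOSTS = {"Title": 50, "Tag": 40, "Keyword": 35, "Description": 20, "Author": 15}
DEFAULT_BOOST = 10


def score(doc: dict[str, str], term: str) -> int:
    # Simplified scoring: each occurrence of the term contributes the field's boost.
    term = term.lower()
    total = 0
    for field_name, text in doc.items():
        hits = text.lower().split().count(term)
        total += hits * FIELD_BOOSTS.get(field_name, DEFAULT_BOOST)
    return total


docs = {
    "in_title": {"Title": "Lucene tips", "Body": "indexing notes"},
    "in_body": {"Title": "Indexing notes", "Body": "lucene tips"},
    "in_author": {"Title": "Notes", "Author": "lucene"},
}
ranked = sorted(docs, key=lambda k: score(docs[k], "lucene"), reverse=True)
print(ranked)  # ['in_title', 'in_author', 'in_body']
```

The same word scores 50 in a title, 15 as an author name and only the default 10 in an unlisted field such as the body, so title matches float to the top.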

Synonyms and Ignore Words

• Synonyms are injected into the index
• Ignore words are removed from the index
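Both rules can be sketched as a single token filter applied at indexing time. The synonym map and ignore-word list below are made-up examples, not DNN's defaults.

```python
# Hypothetical example lists; real lists are configured by the site.
SYNONYMS = {"car": ["automobile"], "automobile": ["car"]}
IGNORE_WORDS = {"the", "a", "of", "and"}


def analyze(text: str) -> list[str]:
    tokens = []
    for word in text.lower().split():
        if word in IGNORE_WORDS:
            continue  # ignore words never reach the index
        tokens.append(word)
        # Synonyms are injected next to the original term, so a query for
        # either word matches this document.
        tokens.extend(SYNONYMS.get(word, []))
    return tokens


print(analyze("The car of the year"))  # ['car', 'automobile', 'year']
```

Because the synonyms are written into the index, queries need no extra expansion, at the cost of a larger index and a re-index whenever the synonym list changes.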

Stemming

• Converts words to their root (stem)
• PorterStemFilter is used
• country, countries = countri
• breathe, breathes, breathing, breathed = breath
• fishing, fished, fishes = fish
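PorterStemFilter applies the full multi-step Porter algorithm. The toy suffix stripper below is deliberately much cruder (a handful of hard-coded rules, not Porter) and only illustrates the idea that different inflections collapse to one indexed term. It happens to reproduce the country and breathe examples above.

```python
def toy_stem(word: str) -> str:
    # Toy rules for illustration only; this is NOT the Porter algorithm.
    w = word.lower()
    # Rule 1: normalize trailing "ies"/"y" to "i" (country, countries -> countri).
    if w.endswith("ies"):
        w = w[:-3] + "i"
    elif w.endswith("y"):
        w = w[:-1] + "i"
    # Rule 2: strip one common suffix, keeping a stem of at least 3 letters.
    for suffix in ("ing", "ed", "es", "s", "e"):
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


for group in (["country", "countries"],
              ["breathe", "breathes", "breathing", "breathed"],
              ["fishing", "fished"]):
    print({toy_stem(w) for w in group})
```

Since the same stemmer runs on both indexed text and query text, a search for "breathing" finds a page that only says "breathed".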

Security Trimming

• Done through Collectors (callbacks)
• Each document found is sent to the Collector
• The Collector accepts or rejects it based on permissions
  • Site Crawler: Module / Tab permission
  • File Crawler: Folder permission
  • User Crawler: Profile permission
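The collector pattern can be sketched as a callback that receives every matching document and keeps only those the current user may see. The permission model here (a set of role names per document, checked against the user's roles) is a hypothetical simplification of DNN's module, tab and folder permissions.

```python
from typing import Callable, Iterable


class TrimmingCollector:
    """Receives each hit from the searcher and keeps only permitted ones."""

    def __init__(self, can_view: Callable[[dict], bool]) -> None:
        self._can_view = can_view
        self.accepted: list[dict] = []

    def collect(self, doc: dict) -> None:
        # Called once per matching document (the callback).
        if self._can_view(doc):
            self.accepted.append(doc)


def run_search(hits: Iterable[dict], collector: TrimmingCollector) -> list[dict]:
    # Stand-in for the searcher: pushes every hit through the collector.
    for doc in hits:
        collector.collect(doc)
    return collector.accepted


user_roles = {"Registered Users"}
hits = [
    {"key": "page-1", "view_roles": {"All Users"}},
    {"key": "page-2", "view_roles": {"Administrators"}},
    {"key": "file-3", "view_roles": {"Registered Users", "Administrators"}},
]
collector = TrimmingCollector(
    lambda d: "All Users" in d["view_roles"] or bool(user_roles & d["view_roles"])
)
print([d["key"] for d in run_search(hits, collector)])  # ['page-1', 'file-3']
```

Trimming inside the collector, rather than after paging, keeps page sizes correct: a page of ten results contains ten documents the user can actually open.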

Module Integration

• ModuleSearchBase
  • New abstract class with just one method
  • Implemented in the module's BusinessControllerClass
  • GetModifiedSearchDocuments
    • Returns new, changed and deleted content
    • Delta based
    • Supports granular permissions, localization, etc.
• ISearchable continues to work (no delta)
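The delta idea behind GetModifiedSearchDocuments can be sketched in Python: given the time of the last indexing run, return only items changed since then, and mark deleted ones so the indexer removes them. This is a hypothetical analog, not the real C# override; the ModuleItem structure and the is_active flag used to signal deletion are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class ModuleItem:
    # Hypothetical content item stored by a module.
    item_id: int
    title: str
    modified_utc: datetime
    deleted: bool = False


@dataclass
class SearchDoc:
    unique_key: str
    title: str
    is_active: bool  # assumed convention: False tells the indexer to remove it


def get_modified_search_documents(items: list[ModuleItem],
                                  begin_utc: datetime) -> list[SearchDoc]:
    # Delta based: only items touched since the last run are returned.
    return [
        SearchDoc(f"item-{i.item_id}", i.title, is_active=not i.deleted)
        for i in items
        if i.modified_utc > begin_utc
    ]


last_run = datetime(2013, 9, 1, tzinfo=timezone.utc)
items = [
    ModuleItem(1, "Old post", datetime(2013, 8, 1, tzinfo=timezone.utc)),
    ModuleItem(2, "New post", datetime(2013, 9, 5, tzinfo=timezone.utc)),
    ModuleItem(3, "Removed post", datetime(2013, 9, 6, tzinfo=timezone.utc), deleted=True),
]
for doc in get_modified_search_documents(items, last_run):
    print(doc.unique_key, doc.is_active)
```

Unchanged item 1 is skipped entirely, which is the advantage over ISearchable: each run only re-indexes what changed.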

New Crawler (How to)

• Define a new SearchType
  • Optionally use IsPrivate to hide it from site search
• Implement BaseResultController (2 methods)
  • HasViewPermission
  • GetDocUrl
• Create a Scheduled Task
  • Call AddSearchDocuments to inject content
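The three steps can be sketched in Python as below. BaseResultController is modeled as an abstract base class with the two methods named above; the "ticket" search type, the in-memory INDEX, and the add_search_documents helper standing in for AddSearchDocuments are all hypothetical, and the scheduled task is reduced to a plain function call.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Doc:
    unique_key: str
    search_type: str
    url_key: str
    owner: str


class BaseResultController(ABC):
    @abstractmethod
    def has_view_permission(self, doc: Doc, user: str) -> bool: ...

    @abstractmethod
    def get_doc_url(self, doc: Doc) -> str: ...


# Step 1: a hypothetical new search type; is_private=True would hide it from site search.
SEARCH_TYPES = {"ticket": {"is_private": False}}


# Step 2: the result controller for the new search type.
class TicketResultController(BaseResultController):
    def has_view_permission(self, doc: Doc, user: str) -> bool:
        return doc.owner == user  # only the ticket owner may see it

    def get_doc_url(self, doc: Doc) -> str:
        return f"/tickets/{doc.url_key}"


INDEX: list[Doc] = []


def add_search_documents(docs: list[Doc]) -> None:
    # Hypothetical stand-in for DNN's AddSearchDocuments.
    INDEX.extend(docs)


# Step 3: the body of a scheduled task that pushes content into the index.
def ticket_crawler_task() -> None:
    add_search_documents([
        Doc("ticket-1", "ticket", "1", owner="ash"),
        Doc("ticket-2", "ticket", "2", owner="sam"),
    ])


ticket_crawler_task()
controller = TicketResultController()
visible = [controller.get_doc_url(d) for d in INDEX
           if controller.has_view_permission(d, "ash")]
print(visible)  # ['/tickets/1']
```

The crawler only decides what goes into the index; permission checks and URL building stay in the result controller, which is consulted at query time.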

Demo

Recap

• New Search uses Lucene.Net
• Platform has the Site Crawler
• Commercial has the URL and File Crawlers
• Modules should implement ModuleSearchBase
• A new crawler implements BaseResultController

THANKS TO ALL OF OUR GENEROUS SPONSORS!

