@DNNCon @ashishprasad Don’t forget to include #DNNCon in your tweets! 7.1 Search and Lucene.Net Ash Prasad

Search features and architecture in DNN 7.1

May 24, 2015


ashishpd

7.1 Search and Lucene.Net
Lucene.Net was the obvious choice of technology for Search in 7.1. Lucene is a general-purpose search engine, and integrating it with the intricacies of DNN wasn't trivial. Ash was instrumental in the design and development of the new Search in 7.1. Join Ash to hear all about DNN Search and Lucene.Net, and what the future looks like.
Transcript
Page 1: 7.1 Search and Lucene.Net

Ash Prasad

Page 2: Agenda

• History and New Objectives
• Architecture
• Lucene / Lucene.Net
• Crawlers, Entities, Controllers
• Ranking, Synonyms, Ignore Words, Stemming
• Security Trimming
• Module Integration, New Crawler

Page 3: History of Search

• Platform Edition
  • SQL Server
  • ISearchable
• Commercial Edition
  • Lucene 2.9.2
  • URL and Files

[Diagram: Module, ISearchable, Scheduler, SQL, and Lucene components]

Page 4: Objectives of New Search

• Handle diverse Content (CMS, Social, Localized, 3rd-Party Modules)
• Consistent User Experience
• Simple for Module Developers
• Uniform Architecture
  • Feature-based differentiation

Page 5: Architecture

Page 6: What’s Lucene

• Java-based indexing and search technology
• Managed by the Apache Software Foundation
• NoSQL-style document store
• Near real-time search, spellchecking, highlighting, ranking, synonyms
• Many companies use Lucene directly or customize it
• Facebook’s Graph Search uses a similar ‘inverted index’
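
The ‘inverted index’ idea is easy to show in miniature. The following Python sketch is not Lucene's API; it assumes a naive tokenizer (lowercase, split on non-alphanumeric characters) and maps each term to the set of document IDs that contain it, so an AND query becomes a set intersection.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Naive analysis (an assumption): lowercase, split on anything not a letter or digit.
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


class InvertedIndex:
    def __init__(self) -> None:
        # term -> set of document IDs whose text contains that term
        self.postings: dict[str, set[str]] = defaultdict(set)

    def add(self, doc_id: str, text: str) -> None:
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search_all(self, query: str) -> set[str]:
        # AND semantics: intersect the posting sets of every query term.
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


index = InvertedIndex()
index.add("d1", "DNN search uses Lucene.Net")
index.add("d2", "Lucene is a Java search library")
print(sorted(index.search_all("lucene search")))  # ['d1', 'd2']
print(sorted(index.search_all("java")))           # ['d2']
```

Lucene keeps richer per-term data on disk (such as frequencies and positions), but the lookup direction, from term to documents, is the same.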

Page 7: What’s Lucene.Net

• Line-by-line port from Java to C#
• Maintains high-performance requirements
• A bit behind Java releases
• Who uses Lucene.Net
  • Products: RavenDB, Orchard, Umbraco, SubText
  • Commercial Sites: BBC UK Top Gear, Autodesk, Koders.com

Page 8: Lucene – A Document Store

• Flexible Schema
• Consists of Documents
  • Each Document is a collection of Fields
• Documents can have different sets of Fields
  • Field("ID", "xxx-yyy-999"), Field("Title", "My best doc")
  • Field("Owner", "Ash"), Field("Locale", "en-US")
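
A flexible schema means each document carries its own set of fields. The sketch below models documents as plain Python dicts rather than Lucene's Document/Field classes; the `find` helper and the two extra sample documents are made up for illustration.

```python
# Each document is a collection of (name, value) fields;
# documents need not share the same set of fields.
documents = [
    {"ID": "xxx-yyy-999", "Title": "My best doc"},
    {"ID": "xxx-yyy-1000", "Owner": "Ash", "Locale": "en-US"},
    {"ID": "xxx-yyy-1001", "Title": "Release notes", "Locale": "en-US"},
]


def find(docs: list[dict[str, str]], field: str, value: str) -> list[str]:
    # Hypothetical exact-match lookup on one field.
    # Documents that lack the field are skipped: there is no fixed schema to enforce.
    return [d["ID"] for d in docs if d.get(field) == value]


print(find(documents, "Locale", "en-US"))  # ['xxx-yyy-1000', 'xxx-yyy-1001']
print(find(documents, "Owner", "Ash"))     # ['xxx-yyy-1000']
```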

Page 9: Lucene – A Document Store (Contd.)

• Denormalized (No Referential Integrity)
• Deletion is done through a flag
  • Compacting reclaims deleted space
• Update is Delete + Insert
• Boost = Ranking
• Unicode compliant
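
The delete-flag, update and compaction rules can be modeled with a toy store. This Python sketch is a simplification, not Lucene's segment format: deletion only flips a flag, an update is a delete plus an insert, and a hypothetical `compact` method rewrites storage without the flagged documents.

```python
class SegmentStore:
    """Toy model (not Lucene's format): append-only slots plus a deleted flag per slot."""

    def __init__(self) -> None:
        self.slots: list[dict] = []    # documents in insertion order
        self.deleted: list[bool] = []  # deletion flag for each slot

    def insert(self, doc: dict) -> None:
        self.slots.append(doc)
        self.deleted.append(False)

    def delete(self, key: str) -> None:
        # Deletion only flips a flag; the space is not reclaimed yet.
        for i, doc in enumerate(self.slots):
            if doc["ID"] == key:
                self.deleted[i] = True

    def update(self, doc: dict) -> None:
        # No in-place update: flag the old version, then append the new one.
        self.delete(doc["ID"])
        self.insert(doc)

    def live_docs(self) -> list[dict]:
        return [d for d, gone in zip(self.slots, self.deleted) if not gone]

    def compact(self) -> None:
        # Rewrite storage without flagged documents, reclaiming their space.
        self.slots = self.live_docs()
        self.deleted = [False] * len(self.slots)


store = SegmentStore()
store.insert({"ID": "1", "Title": "Old title"})
store.update({"ID": "1", "Title": "New title"})
print(len(store.slots), len(store.live_docs()))  # 2 1
store.compact()
print(len(store.slots))  # 1
```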

Page 10: Book Consulted for Search

• Book on Lucene version 3.0
• ~500 pages
• Very useful

Page 11: Search Phases

• Content Acquisition
  • Crawling
  • ISearchable
  • ModuleSearchBase
  • URL
  • Doc / PDF
• Content Indexing
  • Text Analysis
  • Ranking
  • Synonyms
  • Ignore Words
  • Stemming
• Content Search
  • Querying
  • Sorting
  • Security Trimming
  • Boolean Search
  • Highlighting

Page 12: Crawlers

• Platform
  • Site Crawler
    • Module and Tab Metadata
    • Module Content (ModuleSearchBase/ISearchable)
• Commercial Edition
  • File Crawler
    • Uses IFilter to extract text from PDF/Office files
  • URL Crawler
    • Internal and External URLs

Page 13: Search Entities

• SearchType
  • Distinguishes Crawlers
• SearchDocument
  • Properties for a piece of Content
  • Stored in the Index
• SearchQuery
  • Parameters to execute a Query
• SearchResult
  • Derived from SearchDocument
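
To see how the four entities relate, here is a Python sketch using dataclasses. DNN's real classes are C# and carry many more properties; every field name below (`unique_key`, `search_type_id`, `score`, `snippet`, and so on) is a hypothetical simplification. The point it illustrates is that SearchResult extends SearchDocument with query-time data.

```python
from dataclasses import dataclass, field


@dataclass
class SearchType:
    # Distinguishes crawlers; IDs and names here are illustrative.
    search_type_id: int
    name: str


@dataclass
class SearchDocument:
    # Properties describing one piece of content; this is what gets stored in the index.
    unique_key: str
    search_type_id: int
    title: str
    body: str = ""
    tags: list[str] = field(default_factory=list)
    culture_code: str = ""


@dataclass
class SearchQuery:
    # Parameters used to execute a query (hypothetical subset).
    keywords: str
    search_type_ids: list[int] = field(default_factory=list)
    page_index: int = 1
    page_size: int = 10


@dataclass
class SearchResult(SearchDocument):
    # A result is a SearchDocument plus query-time extras.
    score: float = 0.0
    snippet: str = ""


doc = SearchDocument("module-42-1", 1, "Welcome", body="Hello DNN")
hit = SearchResult(**vars(doc), score=1.5, snippet="Hello <b>DNN</b>")
print(isinstance(hit, SearchDocument), hit.title)  # True Welcome
```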

Page 14: Search Entities – Indexing vs. Querying

Page 15: Controllers

• SearchController
  • For Querying
• InternalSearchController
  • For Adding / Updating / Deleting
• LuceneController
  • Interacts with Lucene
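
The split between querying, writing and the Lucene layer can be sketched as three small classes. The class names follow the slide, but every method name is hypothetical and a Python dict stands in for the Lucene index; only LuceneController touches it.

```python
class LuceneController:
    """Only this layer touches the index (a dict standing in for Lucene)."""

    def __init__(self) -> None:
        self._index: dict[str, str] = {}  # unique key -> indexed text

    def add_or_replace(self, key: str, text: str) -> None:
        self._index[key] = text

    def remove(self, key: str) -> None:
        self._index.pop(key, None)

    def match(self, term: str) -> list[str]:
        term = term.lower()
        return sorted(k for k, text in self._index.items() if term in text.lower().split())


class InternalSearchController:
    """Write side: adding, updating and deleting content."""

    def __init__(self, lucene: LuceneController) -> None:
        self._lucene = lucene

    def add_search_document(self, key: str, text: str) -> None:
        self._lucene.add_or_replace(key, text)

    def delete_search_document(self, key: str) -> None:
        self._lucene.remove(key)


class SearchController:
    """Read side: querying only."""

    def __init__(self, lucene: LuceneController) -> None:
        self._lucene = lucene

    def search(self, term: str) -> list[str]:
        return self._lucene.match(term)


lucene = LuceneController()
writer = InternalSearchController(lucene)
reader = SearchController(lucene)
writer.add_search_document("tab-1", "Welcome to DNN")
writer.add_search_document("tab-2", "DNN search with Lucene")
writer.delete_search_document("tab-1")
print(reader.search("dnn"))  # ['tab-2']
```

Keeping reads and writes in separate controllers means code that only queries never needs indexing operations, and only one class depends on the search engine itself.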

Page 16: Ranking = Boosting

• A Document and/or a Field can be boosted in Lucene
• DNN uses Field boosts (default: 10)
  • Title (50)
  • Tag (40)
  • Keyword (35)
  • Description (20)
  • Author (15)
• Configured manually through HostSettings
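
Field boosting can be illustrated with a deliberately simplified score: count the query term in each field and multiply by that field's boost. The boost values are the ones on this slide; the scoring formula is an assumption for illustration and is much cruder than Lucene's actual relevance scoring.

```python
# Field boosts from the slide; any other field falls back to the default.
FIELD_BOOSTS = {"Title": 50, "Tag": 40, "Keyword": 35, "Description": 20, "Author": 15}
DEFAULT_BOOST = 10


def score(doc: dict[str, str], term: str) -> int:
    # Simplified ranking: per field, count term occurrences and weight by the field's boost.
    term = term.lower()
    total = 0
    for field_name, text in doc.items():
        hits = text.lower().split().count(term)
        total += hits * FIELD_BOOSTS.get(field_name, DEFAULT_BOOST)
    return total


docs = {
    "a": {"Title": "Lucene basics", "Body": "intro"},
    "b": {"Title": "Intro", "Body": "lucene lucene lucene"},
    "c": {"Description": "about lucene", "Author": "ash"},
}
ranked = sorted(docs, key=lambda k: score(docs[k], "lucene"), reverse=True)
print(ranked)  # ['a', 'b', 'c']
```

With these numbers, a single title match (50) outranks three body matches (3 × 10).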

Page 17: Synonyms and Ignore Words

• Synonyms are injected into the Index
• Ignore Words are removed from the Index
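
Because synonyms are added at index time and ignore words are dropped, both happen during text analysis. The Python sketch below assumes made-up synonym and ignore-word lists and whitespace tokenization; it returns the terms that would be written to the index for a piece of text.

```python
# Sample lists only; the entries are made up for illustration.
SYNONYMS = {"car": ["automobile", "vehicle"], "automobile": ["car", "vehicle"]}
IGNORE_WORDS = {"the", "a", "an", "of"}


def analyze(text: str) -> list[str]:
    """Turn text into the terms that would be written to the index."""
    terms: list[str] = []
    for token in text.lower().split():
        if token in IGNORE_WORDS:
            continue  # ignore words never reach the index
        terms.append(token)
        terms.extend(SYNONYMS.get(token, []))  # synonyms are injected next to the original term
    return terms


print(analyze("The car of the year"))  # ['car', 'automobile', 'vehicle', 'year']
```

Since "vehicle" is stored alongside "car", a query for either word finds the document without any query-time expansion.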

Page 18: Stemming

• Converts words to their root
• PorterStemFilter is used
• Country and Countries = countri
• breathe, breathes, breathing, breathed = breath
• fishing, fished = fish (the Porter algorithm leaves "fisher" unchanged)
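
PorterStemFilter implements the Porter algorithm, whose ordered rule steps are too long to reproduce here. The sketch below is a deliberately tiny suffix-stripping stemmer, not Porter, with rules chosen only to reproduce the examples on this slide; it shows how different inflections collapse to one indexed term.

```python
def toy_stem(word: str) -> str:
    """Toy suffix stripper (NOT the Porter algorithm); rules tuned to the slide's examples."""
    w = word.lower()
    if w.endswith("ies") and len(w) > 4:
        return w[:-3] + "i"  # countries -> countri
    for suffix in ("ing", "ed", "es", "s"):
        # Strip at most one inflectional suffix, keeping a stem of 3+ letters.
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            w = w[: -len(suffix)]
            break
    if w.endswith("e") and len(w) > 4:
        w = w[:-1]  # breathe -> breath
    if w.endswith("y") and len(w) > 3:
        w = w[:-1] + "i"  # country -> countri
    return w


for w in ["Country", "Countries", "breathes", "breathing", "fished"]:
    print(w, "->", toy_stem(w))
```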

Page 19: Security Trimming

• Done through Collectors (Callback)
• Each Document found is sent to the Collector
• The Collector accepts or rejects it based on Permission
  • Site Crawler: Module / Tab Permission
  • File Crawler: Folder Permission
  • User Crawler: Profile Permission
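
Collector-based trimming means the search loop hands each hit to a callback that runs a permission check. In this Python sketch the role-set model (`view_roles`, `"All Users"`) is a made-up stand-in for DNN's module/tab, folder and profile permissions, and Lucene.Net's real Collector API differs.

```python
from typing import Callable


class PermissionCollector:
    """Receives each matching document and keeps it only if the user may view it."""

    def __init__(self, can_view: Callable[[dict], bool], limit: int) -> None:
        self.can_view = can_view
        self.limit = limit
        self.accepted: list[dict] = []

    def collect(self, doc: dict) -> None:
        if len(self.accepted) < self.limit and self.can_view(doc):
            self.accepted.append(doc)


def run_search(matches: list[dict], collector: PermissionCollector) -> list[dict]:
    # Every hit goes through the collector (the callback) instead of being returned directly.
    for doc in matches:
        collector.collect(doc)
    return collector.accepted


user_roles = {"Registered Users"}
hits = [
    {"key": "tab-1", "view_roles": {"All Users"}},
    {"key": "tab-2", "view_roles": {"Administrators"}},
    {"key": "file-3", "view_roles": {"Registered Users"}},
]
collector = PermissionCollector(
    can_view=lambda d: "All Users" in d["view_roles"] or bool(d["view_roles"] & user_roles),
    limit=10,
)
print([d["key"] for d in run_search(hits, collector)])  # ['tab-1', 'file-3']
```

Trimming while collecting, rather than after paging, means a page of results is filled only with documents the user can actually see.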

Page 20: Module Integration

• ModuleSearchBase
  • New abstract class with just one method
  • Implemented by the module's BusinessControllerClass
  • GetModifiedSearchDocuments
    • Returns New, Changed and Deleted content
    • Delta-based
    • Granular Permission, Localization, etc.
• ISearchable continues to work (no delta)
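
The delta idea behind GetModifiedSearchDocuments can be mirrored in Python. In DNN the real method is C# code in the module's business controller; here `Item`, `SearchDoc`, the key format, and marking deletions with `is_active=False` are all assumptions for illustration. Only items modified after the last indexing run are returned, so unchanged content costs nothing.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Item:
    # Hypothetical module content row.
    item_id: int
    title: str
    modified_utc: datetime
    is_deleted: bool = False


@dataclass
class SearchDoc:
    unique_key: str
    title: str
    is_active: bool  # assumption: False tells the indexer to remove this key


def get_modified_search_documents(items: list[Item], begin_utc: datetime) -> list[SearchDoc]:
    """Return only content created, changed or deleted since the last indexing run."""
    return [
        SearchDoc(unique_key=f"item-{it.item_id}", title=it.title, is_active=not it.is_deleted)
        for it in items
        if it.modified_utc > begin_utc
    ]


last_run = datetime(2015, 5, 1, tzinfo=timezone.utc)
items = [
    Item(1, "Unchanged", datetime(2015, 4, 1, tzinfo=timezone.utc)),
    Item(2, "Edited", datetime(2015, 5, 10, tzinfo=timezone.utc)),
    Item(3, "Removed", datetime(2015, 5, 12, tzinfo=timezone.utc), is_deleted=True),
]
delta = get_modified_search_documents(items, last_run)
print([(d.unique_key, d.is_active) for d in delta])  # [('item-2', True), ('item-3', False)]
```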

Page 21: New Crawler (How to)

• Define a new SearchType
  • Optionally use IsPrivate to hide it from site search
• Implement BaseResultController (2 methods)
  • HasViewPermission
  • GetDocUrl
• Create a Scheduled Task
  • Call AddSearchDocuments to inject content
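
The three steps for a new crawler can be sketched in Python. BaseResultController, HasViewPermission, GetDocUrl, IsPrivate and AddSearchDocuments come from the slide, but the Python names, the wiki example, its permission rule and the in-memory index are all hypothetical.

```python
from abc import ABC, abstractmethod


class BaseResultController(ABC):
    """Python stand-in for the two methods a new crawler must provide."""

    @abstractmethod
    def has_view_permission(self, doc: dict, user_roles: set[str]) -> bool: ...

    @abstractmethod
    def get_doc_url(self, doc: dict) -> str: ...


class WikiResultController(BaseResultController):
    # Hypothetical crawler for wiki pages; permission and URL rules are invented.
    def has_view_permission(self, doc: dict, user_roles: set[str]) -> bool:
        return not doc.get("private", False) or "Administrators" in user_roles

    def get_doc_url(self, doc: dict) -> str:
        return f"/wiki/{doc['slug']}"


# Step 1: a new search type; is_private=True would hide it from site search.
WIKI_SEARCH_TYPE = {"name": "wiki", "is_private": False}
wiki_index: list[dict] = []


def add_search_documents(docs: list[dict]) -> None:
    # Stand-in for AddSearchDocuments.
    wiki_index.extend(docs)


def scheduled_task() -> None:
    # Step 3: a scheduled task gathers content and injects it.
    pages = [{"slug": "home", "title": "Home"}, {"slug": "ops", "title": "Runbook", "private": True}]
    add_search_documents([{**p, "search_type": WIKI_SEARCH_TYPE["name"]} for p in pages])


scheduled_task()
controller = WikiResultController()
visible = [
    controller.get_doc_url(d)
    for d in wiki_index
    if controller.has_view_permission(d, {"Registered Users"})
]
print(visible)  # ['/wiki/home']
```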

Page 22: Demo

Page 23: Recap

• New Search uses Lucene.Net
• Platform has the Site Crawler
• Commercial has URL and File Crawlers
• Modules should implement ModuleSearchBase
• New Crawlers implement BaseResultController

Page 24: THANKS TO ALL OF OUR GENEROUS SPONSORS!