
Extending the Google Search Appliance to Crawl Valuable Data Behind the Firewall

Nov 15, 2014


The Google Search Appliance is an on-premise hardware and software solution that brings Google search into the enterprise, so users can find content quickly and securely. In this session, learn how partners today are plugging enterprise data sources into the GSA through Connectors and displaying results using OneBox.

Watch a video at http://www.bestechvideos.com/2009/06/09/google-i-o-2009-extending-the-google-search-appliance-to-crawl-valuable-data-behind-the-firewall
Transcript
Page 1: Extending the Google Search Appliance to Crawl Valuable Data Behind the Firewall
Page 2

Google Search Appliance

Extending the Google Search Appliance to Crawl Valuable Data Behind the Firewall

Nitin Mangtani, May 27, 2009

Page 3

Search is the starting point to the world’s information

Page 4

Google Enterprise Search

More than 20,000 enterprise search customers

Dedicated team of enterprise engineers focused on solving enterprise search problems.

Backed by Google’s core research and development

Bringing Google.com search experience to businesses

Our Search Products

Page 5

Universal Search

Employee Directory

Content Management

Wikis

Intranet

File share

SharePoint

Page 6

Google’s Search Philosophy

User: intuitive, unified results; highly relevant; user-friendly innovation

Reach: all information; ‘real-time’ data; customizable and extendable

Security: highly secure architecture; standards-based; leverage existing security

Scale: large corpus search; cross-enterprise management; flexible infrastructure

Page 7

Personalized Search Experience

Marketing

Engineering

Page 8

Advanced Biasing Controls

Administrators can create multiple biasing policies.

Source biasing

Date biasing

Metadata biasing New!

Front-end biasing New!

Simple setup: no complex coding or scripts.

Page 9

Metadata Biasing New!

Determine the influence of a metadata parameter, based on a specific metadata name and content

Biasing based on metadata attribute and value

“Boost all documents that have author as Larry Page”

Administrators control influence (positive or negative) on metadata attribute/value pairs
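Metadata biasing can be pictured as a re-ranking step applied after retrieval. The following is a minimal sketch, not the appliance's actual ranking algorithm: it assumes each result has a base relevance score and a metadata dictionary, and it models a biasing policy as a hypothetical list of (attribute, value, weight) rules in which a positive weight boosts and a negative weight demotes matching documents.

```python
from dataclasses import dataclass, field

@dataclass
class Result:
    url: str
    score: float                                   # base relevance score (assumed)
    metadata: dict = field(default_factory=dict)   # attribute -> value

# Hypothetical policy: (attribute, value, weight). Positive weights boost,
# negative weights demote. The multiplicative model is an assumption.
POLICY = [
    ("author", "Larry Page", 0.5),
    ("status", "draft", -0.3),
]

def apply_metadata_bias(results, policy):
    """Return results re-ranked after applying metadata bias rules."""
    def biased_score(result):
        factor = 1.0
        for attr, value, weight in policy:
            # A rule applies only when the attribute/value pair matches exactly.
            if result.metadata.get(attr) == value:
                factor *= 1.0 + weight
        return result.score * factor
    return sorted(results, key=biased_score, reverse=True)
```

Because each rule multiplies the score by (1 + weight), a document that matches no rule keeps its original score, and the "author is Larry Page" rule from the slide simply lifts those documents relative to the rest.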

Page 10

Embedding a Search Box in Your Application

<form method="GET" action="http://search.mycompany.com/search">
  <!-- q: the user's query -->
  <input type="text" name="q" size="32" maxlength="256" value="query string">
  <input type="submit" name="btnG" value="Google Search">
  <!-- site: collection to search; client: front end to apply -->
  <input type="hidden" name="site" value="default_collection">
  <input type="hidden" name="client" value="default_frontend">
  <!-- output: XML results without a DTD reference -->
  <input type="hidden" name="output" value="xml_no_dtd">
  <!-- proxystylesheet: render the XML as HTML using this front end's stylesheet -->
  <input type="hidden" name="proxystylesheet" value="default_frontend">
</form>

Such forms are the most familiar way to generate GET requests, but there are many others.

A web application may also make an HTTP GET request directly:

GET /search?q=query+string&site=default_collection&client=default_frontend&output=xml_no_dtd&proxystylesheet=default_frontend HTTP/1.0
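The same request can be assembled programmatically. This Python sketch reuses the host and parameter values from the form above; the function names are hypothetical, and `fetch_results` assumes the appliance is reachable over plain HTTP at that address.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Host taken from the form example; replace with your appliance's address.
GSA_HOST = "http://search.mycompany.com"

def build_search_url(query, collection="default_collection",
                     frontend="default_frontend", output="xml_no_dtd",
                     stylesheet=None):
    """Build a search URL with the same parameters as the HTML form."""
    params = {
        "q": query,            # the user's query string
        "site": collection,    # collection to search
        "client": frontend,    # front end to apply
        "output": output,      # xml_no_dtd returns raw XML results
    }
    if stylesheet is not None:
        # Ask the appliance to render HTML with this front end's stylesheet.
        params["proxystylesheet"] = stylesheet
    # urlencode escapes values, e.g. spaces become '+'.
    return f"{GSA_HOST}/search?{urlencode(params)}"

def fetch_results(url, timeout=10):
    """Issue the GET request and return the raw response body as bytes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()
```

Omitting `stylesheet` leaves out `proxystylesheet`, so an application that parses the XML itself gets raw results instead of rendered HTML.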

Page 11

Leverage users’ input

Page 12

Do-It-Yourself KeyMatch

Page 13

Search-as-you-Type

Page 14

Universal Search: Powered by Google Search Appliance

Content management: Documentum, SharePoint, FileNet, Livelink, any other system

File shares: over 200 file formats (MS Office, PDF, HTML, etc.)

Intranets: web servers, portals

Databases: Oracle, SQL Server, MySQL, DB2, Sybase

Enterprise applications: ERP systems, business intelligence systems

Page 15

Architecture

Page 16

Secure, real-time access to business information

Real-Time Access to Business Applications

“The Google Search Appliance with OneBox is our command line interface to our world … adding more content and additional OneBox interfaces will only increase the value to our organization” – Danny Perri, BOC Gases

[Chart: Access to real-time business data with OneBox, Q1 2007 – Q4 2008]

Page 17

[Diagram: request to the provider server at https://provider…; XML response]

Google OneBox for Enterprise

1. User enters a query.

2. The OneBox “trigger” determines whether the query is relevant to a OneBox module.

3. The appliance makes a secure REST call (an HTTPS GET request) to the predefined OneBox provider, passing security credentials and other parameters.

4. The provider uses the information to determine appropriate, user-specific, secure results for the query, and passes those results back to the appliance as XML.

5. The XML is transformed into HTML based on the XSL template provided in the OneBox module and presented to the user inline with their search results.
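Steps 3 and 4 can be sketched from the provider's side. This is a hypothetical handler, not a documented API: it assumes the appliance sends the search terms in a `query` parameter and the authenticated user in a `userName` parameter (verify the exact names against the OneBox documentation for your appliance version), and it uses a made-up in-memory `ORDERS` table as the business system.

```python
from urllib.parse import parse_qs

# Hypothetical back-end data: each record is visible only to its owner.
ORDERS = [
    {"id": "A-100", "owner": "alice", "status": "shipped"},
    {"id": "A-101", "owner": "alice", "status": "pending"},
    {"id": "B-200", "owner": "bob", "status": "shipped"},
]

def handle_onebox_request(query_string):
    """Answer a OneBox GET request with user-specific results.

    Assumes the query string carries `query` and `userName` parameters.
    Returns the matching records that this user is allowed to see.
    """
    params = parse_qs(query_string)
    query = params.get("query", [""])[0].lower()
    user = params.get("userName", [""])[0]
    if not user:
        return []  # no credentials: return nothing rather than guess
    return [
        order for order in ORDERS
        if order["owner"] == user  # per-user security check
        and (query in order["id"].lower() or query in order["status"])
    ]
```

Filtering by owner before matching the query is what makes the results user-specific; serializing them into the XML the appliance expects is a separate step.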

Page 18

Google OneBox for Enterprise

Real-time, secure access to information from the search box

Triggers: configurable conditions for showing OneBox results:

Always On: the module is invoked for every query

Keyword(s): the module is invoked in response to specific keywords

Regular Expression: the module is invoked when the query matches a regular expression

Providers:

Internal: specialized search content in a separate appliance collection

External: modules from the OneBox module gallery

External: API enables you to create your own modules
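Trigger evaluation amounts to a small decision function. The sketch below models the three trigger types listed above; the mode names and the `Trigger` class are hypothetical, and its keyword rule (fire if any query term equals a keyword) is a simplifying assumption that may differ from the appliance's own matching.

```python
import re
from dataclasses import dataclass

@dataclass
class Trigger:
    mode: str          # "always", "keyword", or "regex" (hypothetical labels)
    pattern: str = ""  # space-separated keywords, or a regular expression

def should_invoke(trigger, query):
    """Decide whether a OneBox module runs for this query."""
    if trigger.mode == "always":
        return True
    if trigger.mode == "keyword":
        # Whole-term, case-insensitive comparison against the keyword list.
        terms = query.lower().split()
        return any(k in terms for k in trigger.pattern.lower().split())
    if trigger.mode == "regex":
        return re.search(trigger.pattern, query) is not None
    raise ValueError(f"unknown trigger mode: {trigger.mode}")
```

A regular-expression trigger suits structured identifiers such as order or ticket numbers, where a fixed keyword list would be impractical.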

Page 19

OneBox Results Schema

<OneBoxResults>
  <resultCode>result_code</resultCode>
  <Diagnostics>failure_reason</Diagnostics>
  <provider>provider_name</provider>
  <searchTerm>query_escape</searchTerm>
  <totalResults>total_results_escape</totalResults>
  <title>
    <urlText>results_title</urlText>
    <urlLink>results_uri</urlLink>
  </title>
  <IMAGE_SOURCE>image_uri</IMAGE_SOURCE>
  <MODULE_RESULT>
    <U>uri</U>
    <Title>title</Title>
    <Field name="name1">value1</Field>
    <Field name="name2">value2</Field>
    <Field name="nameN">valueN</Field>
  </MODULE_RESULT>
</OneBoxResults>
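A provider can emit this format with the standard library's ElementTree, which also escapes the query and field values. This is a minimal sketch: the function name and its tuple-based input are hypothetical, `success` as the result code is an assumption to check against the OneBox documentation, and `Diagnostics` and `IMAGE_SOURCE` are left out for brevity.

```python
import xml.etree.ElementTree as ET

def build_onebox_xml(provider, query, title, title_url, results):
    """Serialize results into the OneBoxResults format.

    `results` is a list of (uri, title, fields) tuples, where `fields`
    is a dict of name -> value pairs emitted as <Field> elements.
    """
    root = ET.Element("OneBoxResults")
    ET.SubElement(root, "resultCode").text = "success"  # assumed code value
    ET.SubElement(root, "provider").text = provider
    ET.SubElement(root, "searchTerm").text = query      # escaped on output
    ET.SubElement(root, "totalResults").text = str(len(results))
    title_el = ET.SubElement(root, "title")
    ET.SubElement(title_el, "urlText").text = title
    ET.SubElement(title_el, "urlLink").text = title_url
    for uri, result_title, fields in results:
        module = ET.SubElement(root, "MODULE_RESULT")
        ET.SubElement(module, "U").text = uri
        ET.SubElement(module, "Title").text = result_title
        for name, value in fields.items():
            ET.SubElement(module, "Field", name=name).text = str(value)
    return ET.tostring(root, encoding="unicode")
```

The appliance then applies the module's XSL template to this XML, so the field names chosen here are the ones the template must reference.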

Page 20

Common Security Protocols

HTTP-Basic

NTLM (v1, v2)

LDAP

Advanced Security

Kerberos New!

SSO: Oracle (Oblix), CA/SiteMinder

X.509 certificates

Custom authentication and authorization: support for SAML SPI

Document-level security: provide the right users with access to the right documents

Security

“Zero” Sign-on

Page 21

Access Control (NTLM, HTTP Basic, SSO, etc.)

1. User executes search for public and secure content (access=a)

2. User is prompted for credentials (if NTLM/Basic Auth & SSO, user is prompted for both sets of credentials)

3. The user’s credentials are sent securely to the search appliance

4. Google Search Appliance queries index for all possible results

5. Search appliance makes ‘authorization’ requests of the host content servers with user’s credential set

6. Host servers respond with success or failure

7. Secure results that the user is not authorized to see are filtered out of the search results

8. Final search results (filtered) are presented to the user

[Diagram: index entries with security types none, HTTP Basic, NTLM and SSO (for example http://corp…/preso.ppt and http://int…/customer.jsp) are checked against the host content servers (database, file shares, content management), which answer with HTTP 200 or 401.]
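Steps 4 through 8 boil down to filtering at query time: retrieve every candidate, then keep only the ones the content servers approve. This sketch stubs out the index lookup and the per-URL authorization request as caller-supplied functions (both hypothetical); it checks each secured result one at a time, whereas a production system would likely batch or cache these checks.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class IndexEntry:
    url: str
    auth: Optional[str]  # None for public content, else e.g. "basic", "ntlm", "sso"

def secure_search(index: List[IndexEntry],
                  matches_query: Callable[[IndexEntry], bool],
                  check_access: Callable[[str, dict], int],
                  credentials: dict) -> List[str]:
    """Return only the result URLs this user may see.

    matches_query stands in for the index lookup; check_access stands in
    for the authorization request and returns an HTTP status code.
    """
    candidates = [e for e in index if matches_query(e)]  # all possible results
    visible = []
    for entry in candidates:
        if entry.auth is None:
            visible.append(entry)  # public content needs no check
        elif check_access(entry.url, credentials) == 200:
            visible.append(entry)  # host server granted access
        # any other status (e.g. 401) drops the result
    return [e.url for e in visible]
```

Public entries skip the check entirely, and any status other than 200 removes the result, so a 401 from a host server keeps that document out of the user's results.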

Page 22

Traditional search technology for millions of docs

+ Disaster Recovery Server

+ Patch Deployment Management Server

+ Volume License Management Server

Page 23

Google Architecture: 10M documents in a box

Page 25

Health Vine Simplicity

Patients

Immediate Family

Community

Page 29

Where’s your GSA?

The State of Missouri’s use of Google GSA

Page 30

Where was Missouri?

16 Executive Agencies

No common web search

No unified way for citizens or businesses to find information about State Government

Page 31

Where is Missouri?

Centrally managed Google GSA

Front ends and collections provided to all State Government entities

Common search across all State Government web content

Reliable information now easily found by citizens and businesses
