Extending the Google Search Appliance to Crawl Valuable Data Behind the Firewall

Post on 15-Nov-2014

DESCRIPTION

The Google Search Appliance is an on-premise hardware and software solution that brings Google search into the enterprise, so users can find content quickly and securely. In this session, learn how partners today are plugging enterprise data sources into the GSA through Connectors and displaying results using OneBox. Watch a video at http://www.bestechvideos.com/2009/06/09/google-i-o-2009-extending-the-google-search-appliance-to-crawl-valuable-data-behind-the-firewall

Transcript

Google Search Appliance

Extending the Google Search Appliance to Crawl Valuable Data Behind the Firewall

Nitin Mangtani

May 27, 2009

Search is the starting point to the world’s information

Google Enterprise Search

More than 20,000 enterprise search customers

Dedicated team of enterprise engineers focused on solving enterprise search problems.

Backed by Google’s core research and development

Bringing Google.com search experience to businesses

Our Search Products

Universal Search

Employee Directory

Content Management

Wikis

Intranet

File share

SharePoint

Google’s Search Philosophy

User
Intuitive, unified results
Highly relevant
User-friendly innovation

Reach
All information
‘Real-time’ data
Customizable and extendable

Security
Highly secure architecture
Standards-based
Leverage existing security

Scale
Large corpus search
Cross-enterprise management
Flexible infrastructure

Personalized Search Experience

Marketing

Engineering

Advanced Biasing Controls

Administrators can create multiple biasing policies.

Source biasing

Date biasing

Metadata biasing New!

Front-end biasing New!

Simple setup - No complex coding or scripts.

Metadata Biasing New!

Determine influence of metadata parameter

On specific metadata name, content

Biasing based on metadata attribute and value

“Boost all documents that have author as Larry Page”

Administrators control influence (positive or negative) on metadata attribute/value pairs

Embedding Search Box in your application

<form method="GET" action="http://search.mycompany.com/search">
  <input type="text" name="q" size="32" maxlength="256" value="query string">
  <input type="submit" name="btnG" value="Google Search">
  <input type="hidden" name="site" value="default_collection">
  <input type="hidden" name="client" value="default_frontend">
  <input type="hidden" name="output" value="xml_no_dtd">
  <input type="hidden" name="proxystylesheet" value="default_frontend">
</form>

Such forms are the most recognizable way to generate GET requests, but there are numerous other ways.

A web application may make an HTTP GET request directly:

GET /search?q=query+string&site=default_collection&client=default_frontend&output=xml_no_dtd&proxystylesheet=default_frontend HTTP/1.0
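
As a minimal sketch of how an application might assemble that request, the Python below builds the same query string from the form's fields. The host name and parameter values are the placeholders from the slide; the function name `build_search_url` is hypothetical and only for illustration.

```python
from urllib.parse import urlencode

# Placeholder host taken from the form on the slide; replace with your appliance.
SEARCH_ENDPOINT = "http://search.mycompany.com/search"


def build_search_url(query, site="default_collection", frontend="default_frontend"):
    """Return the GET URL equivalent to submitting the embedded search form."""
    params = {
        "q": query,                   # the user's query text
        "site": site,                 # collection to search
        "client": frontend,           # front end that serves the request
        "output": "xml_no_dtd",       # ask for XML output
        "proxystylesheet": frontend,  # stylesheet used to render the XML
    }
    # urlencode escapes spaces as "+", matching the request shown on the slide.
    return SEARCH_ENDPOINT + "?" + urlencode(params)


url = build_search_url("query string")
print(url)
```

The resulting URL can be fetched with any HTTP client; the sketch stops at building it so that it runs without a live appliance.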

Leverage users’ input

Do-It-Yourself KeyMatch

Search-as-you-Type

Google Search Appliance

File shares, Intranets, Databases, Enterprise applications, Content Management

Universal Search: Powered by Google Search Appliance

Documentum

SharePoint

FileNet

Livelink

Any other system

Over 200 file formats

MS Office, PDF, HTML, etc.

Web servers

Portals

Oracle

SQL Server

MySQL

DB2

Sybase

ERP systems

Business intelligence systems

Architecture

Secure, real-time access to business information

Real-Time Access to Business Applications

“The Google Search Appliance with OneBox is our command line interface to our world … adding more content and additional OneBox interfaces will only increase the value to our organization” – Danny Perri, BOC Gases

Access to real-time business data with OneBox

[Chart: Q1 2007 – Q4 2008]

[Diagram, steps ② and ③: appliance → https://provider… → Provider Server, which returns XML]

Google OneBox for Enterprise

1. User enters a query.

2. OneBox “trigger” determines if the query is relevant to a OneBox module.

3. The appliance makes a secure REST call (HTTPS GET request) to the predefined OneBox provider, passing security credentials and other parameters.

4. The provider uses the information to determine appropriate, user-specific, secure results to the query, and passes those results back to the appliance in XML.

5. The XML is transformed into HTML based on the XSL template provided in the OneBox module and presented to the user inline with their search results.

Google OneBox for Enterprise

Real-time, secure access to information from the search box

Triggers - Configurable to show OneBox results:

Always On: the module is invoked for every query

Keyword(s): the module is invoked in response to specific keywords

Regular Expression: invoked when query matches a regular expression
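
The three trigger types can be illustrated with a small Python sketch. This is not the appliance's implementation: the function name `make_trigger` is hypothetical, and the keyword rule used here (any whole word of the query matches a configured keyword, case-insensitively) is an assumption, since the slide does not say how keywords are matched.

```python
import re


def make_trigger(kind, pattern=None):
    """Return a function that says whether a query should invoke a module."""
    if kind == "always":
        # Always On: every query invokes the module.
        return lambda query: True
    if kind == "keyword":
        # Assumption: a keyword matches when it equals any word in the query.
        keywords = {k.lower() for k in pattern}
        return lambda query: any(w in keywords for w in query.lower().split())
    if kind == "regex":
        # Regular Expression: invoke when the pattern is found in the query.
        compiled = re.compile(pattern)
        return lambda query: compiled.search(query) is not None
    raise ValueError("unknown trigger kind: " + kind)


always = make_trigger("always")
by_keyword = make_trigger("keyword", ["directory", "phone"])
by_regex = make_trigger("regex", r"^\d{5}$")  # e.g. a five-digit order number

print(always("anything"), by_keyword("employee directory"), by_regex("12345"))
```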

Providers

Internal: Specialized search content in a separate appliance collection

External: Modules from OneBox module gallery

External: API enables you to create your own modules

OneBox Results Schema

<OneBoxResults>
  <resultCode>result_code</resultCode>
  <Diagnostics>failure_reason</Diagnostics>
  <provider>provider_name</provider>
  <searchTerm>query_escape</searchTerm>
  <totalResults>total_results_escape</totalResults>
  <title>
    <urlText>results_title</urlText>
    <urlLink>results_uri</urlLink>
  </title>
  <IMAGE_SOURCE>image_uri</IMAGE_SOURCE>
  <MODULE_RESULT>
    <U>uri</U>
    <Title>title</Title>
    <Field name="name1">value1</Field>
    <Field name="name2">value2</Field>
    <Field name="nameN">valueN</Field>
  </MODULE_RESULT>
</OneBoxResults>
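
A provider has to emit a document of this shape. The sketch below shows one way to do that in Python with the standard library; it fills only a subset of the schema's elements, and the function name `build_onebox_results`, the provider name, the URLs, and the field values are all hypothetical examples rather than anything defined by the appliance.

```python
import xml.etree.ElementTree as ET


def build_onebox_results(provider, title, link, results):
    """Build a OneBoxResults document (subset of the schema) as a string."""
    root = ET.Element("OneBoxResults")
    ET.SubElement(root, "provider").text = provider

    # <title> wraps the heading text and the link shown above the results.
    title_el = ET.SubElement(root, "title")
    ET.SubElement(title_el, "urlText").text = title
    ET.SubElement(title_el, "urlLink").text = link

    # One <MODULE_RESULT> per hit, with arbitrary named <Field> values.
    for result in results:
        item = ET.SubElement(root, "MODULE_RESULT")
        ET.SubElement(item, "U").text = result["url"]
        ET.SubElement(item, "Title").text = result["title"]
        for name, value in result["fields"].items():
            ET.SubElement(item, "Field", name=name).text = value

    return ET.tostring(root, encoding="unicode")


# Hypothetical employee-directory result, for illustration only.
xml_out = build_onebox_results(
    provider="directory_provider",
    title="Employee directory",
    link="http://directory.example.com/",
    results=[{
        "url": "http://directory.example.com/jdoe",
        "title": "Jane Doe",
        "fields": {"email": "jane@example.com", "phone": "555-0100"},
    }],
)
print(xml_out)
```

Using an XML library rather than string concatenation means characters such as `&` and `<` in field values are escaped automatically.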

Common Security Protocols

HTTP-Basic

NTLM (v1, v2)

LDAP

Advanced Security

Kerberos New!

SSO - Oracle (Oblix), CA/SiteMinder

X509 Certificates

Custom Authentication & Authorization

Support for SAML SPI

Document Level Security

Provide the right users with access to the right documents

Security

“Zero” Sign-on

Access Control (NTLM, HTTP Basic, SSO, etc.)

1. User executes search for public and secure content (access=a)

2. User is prompted for credentials (if NTLM/Basic Auth & SSO, user is prompted for both sets of credentials)

3. User’s credentials are sent securely to the search appliance

4. Google Search Appliance queries index for all possible results

5. Search appliance makes ‘authorization’ requests of the host content servers with user’s credential set

6. Host servers respond with success or failure

7. Secure results restricted to user are filtered from search results

8. Final search results (filtered) are presented to the user
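
Steps 4 to 8 can be sketched in Python as a filter over candidate results. This is an illustration under stated assumptions, not the appliance's code: the function names, the document dictionaries, the example URLs, and the stubbed `fake_authorize` check are all hypothetical, and a real check would be a request to the content server using the user's credentials.

```python
def filter_results(candidates, credentials, authorize):
    """Keep public documents and secure ones the content server authorizes."""
    visible = []
    for doc in candidates:
        if doc["secure"] == "none":
            # Public content needs no authorization request.
            visible.append(doc)
        elif authorize(doc["url"], credentials) == 200:
            # Host server answered with success for this user.
            visible.append(doc)
        # Any other status (for example 401) drops the document.
    return visible


def fake_authorize(url, credentials):
    """Stub for a content server: this user may not read the presentation."""
    return 401 if url.endswith("preso.ppt") else 200


# Hypothetical index entries: URL plus the protection mechanism on each.
index_hits = [
    {"url": "http://corp.example.com/preso.ppt", "secure": "ntlm"},
    {"url": "http://corp.example.com/policy", "secure": "http basic"},
    {"url": "http://corp.example.com/welcome/", "secure": "none"},
]

final = filter_results(index_hits, {"user": "jdoe"}, fake_authorize)
print([doc["url"] for doc in final])
```

Passing the authorization check in as a function keeps the filtering logic independent of the protocol (NTLM, HTTP Basic, SSO) used by each content server.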

Index

#   URL                        Secure
1   http://corp…/preso.ppt     ntlm
2   http://corp…/policyhtml    http basic
    http://corp…/welcome/…     none
n   http://int…/customer.jsp   sso

Host server responses: 401, 200, 200

Results

Database, File shares, Content Mgmt.

Traditional search technology for millions of docs

+ Disaster Recovery Server

+ Patch Deployment Management Server

+ Volume License Management Server

Google Architecture: 10M documents in a box

Health Vine Simplicity

Patients

Immediate Family

Community

Where’s your GSA??

The State of Missouri’s use of Google GSA

Where was Missouri?

16 Executive Agencies

No common web search

No unified way for citizens or businesses to find information about State Government

Where is Missouri??

Centrally Managed Google GSA

Front Ends and Collections provided to all State Government entities

Common search across all State Government web content

Reliable information now easily found by citizens and businesses