
HackU IIT Kgp 2013 BOSS + CA

Nov 22, 2014


Technology

souridatta

Presentation talks about BOSS and Content Analysis along with Dapper.
Transcript
Page 1: HackU IIT Kgp 2013 BOSS + CA

BOSS around the web

Saurabh Sahni YDN Developer, Hacker, Evangelist

Souri Datta, Structured Data Extraction Team

http://www.flickr.com/photos/sumrow/1267682594/sizes/l/

Page 2: HackU IIT Kgp 2013 BOSS + CA

BOSS stands for Build your Own Search Service

http://developer.yahoo.com/search/boss/

Page 3: HackU IIT Kgp 2013 BOSS + CA

Provides APIs

To our Search database

Page 4: HackU IIT Kgp 2013 BOSS + CA

To build your own powerful

Search applications

Page 5: HackU IIT Kgp 2013 BOSS + CA

BOSS allows you to search over

Web, images, news & Blogs

Page 6: HackU IIT Kgp 2013 BOSS + CA

What can be done on top of BOSS?

• Blend and re-rank search results

• Your own look and feel

• Mix it with other APIs
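Blending and re-ranking can be sketched in a few lines. The example below is only an illustration, not BOSS code: it assumes each result is a dict with a `url` and a `title` (hypothetical field names), interleaves two ranked lists while dropping duplicate URLs, and then moves results from a preferred domain to the front.

```python
from itertools import zip_longest


def blend(*result_lists):
    """Interleave several ranked result lists, dropping duplicate URLs."""
    seen = set()
    blended = []
    # zip_longest walks the lists rank by rank; exhausted lists yield None.
    for tier in zip_longest(*result_lists):
        for item in tier:
            if item is None or item["url"] in seen:
                continue
            seen.add(item["url"])
            blended.append(item)
    return blended


def rerank(results, boost_domain):
    """Move results from a preferred domain to the front."""
    # sorted() is stable, so all other results keep their relative order.
    return sorted(results, key=lambda r: boost_domain not in r["url"])


# Hypothetical result lists, standing in for a web search and a news search.
web = [
    {"url": "http://imdb.com/a", "title": "A"},
    {"url": "http://example.com/b", "title": "B"},
]
news = [
    {"url": "http://example.com/b", "title": "B"},
    {"url": "http://example.org/c", "title": "C"},
]
merged = blend(web, news)
boosted = rerank(merged, "example.org")
```

A substring test on the URL is a deliberately crude way to pick the preferred domain; a real hack would likely parse the host name first.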

Page 7: HackU IIT Kgp 2013 BOSS + CA

BOSS Pricing

Page 8: HackU IIT Kgp 2013 BOSS + CA

Free for building your hacks!!

Page 9: HackU IIT Kgp 2013 BOSS + CA

BOSS uses OAuth for security

Code: https://github.com/sourind/hacku/
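The slide only says that BOSS uses OAuth. As a rough sketch, assuming OAuth 1.0 with the HMAC-SHA1 signature method and no access token, request signing could look like the code below. The function names, the endpoint `https://api.example.com/search`, and the key and secret values are placeholders invented for illustration; the sample code linked above remains the reference.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def _enc(value):
    # OAuth 1.0 percent-encoding: only unreserved characters stay literal.
    return quote(str(value), safe="-._~")


def sign_request(method, url, params, consumer_secret, token_secret=""):
    """Return the base64 HMAC-SHA1 signature for an OAuth 1.0 request."""
    # 1. Normalize the parameters: encode them, then sort by key and value.
    pairs = sorted((_enc(k), _enc(v)) for k, v in params.items())
    normalized = "&".join(f"{k}={v}" for k, v in pairs)
    # 2. Signature base string: METHOD & encoded URL & encoded parameters.
    base_string = "&".join([method.upper(), _enc(url), _enc(normalized)])
    # 3. Signing key: consumer secret & token secret (empty without a token).
    key = f"{_enc(consumer_secret)}&{_enc(token_secret)}"
    digest = hmac.new(key.encode(), base_string.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()


params = {
    "q": "The Dark Knight Rises",
    "oauth_consumer_key": "my-key",       # placeholder, not a real key
    "oauth_nonce": "abc123",              # fixed here; random per request in practice
    "oauth_signature_method": "HMAC-SHA1",
    "oauth_timestamp": "1385000000",      # fixed here; normally the current time
    "oauth_version": "1.0",
}
signature = sign_request("GET", "https://api.example.com/search", params, "my-secret")
```

The resulting value would be sent as the `oauth_signature` parameter alongside the others.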

Page 10: HackU IIT Kgp 2013 BOSS + CA

Get a FREE consumer key and secret

http://hackyourworld.org/hacku/

Page 11: HackU IIT Kgp 2013 BOSS + CA

http://developer.yahoo.com/yql/console/

Page 12: HackU IIT Kgp 2013 BOSS + CA
Page 13: HackU IIT Kgp 2013 BOSS + CA

1. Select YQL query

2. Select output format

3. Copy this URL
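The URL that the console produces from these three steps can also be assembled by hand. The sketch below assumes the public YQL REST endpoint `https://query.yahooapis.com/v1/public/yql` with `q` and `format` parameters; the slides show only the console, so treat the endpoint and the `yql_url` helper as assumptions, and note that the service may no longer be reachable.

```python
from urllib.parse import urlencode

# Assumed public YQL REST endpoint; the slides only show the console.
YQL_ENDPOINT = "https://query.yahooapis.com/v1/public/yql"


def yql_url(query, fmt="json"):
    """Return the REST URL for a YQL query (step 1) and an output format (step 2)."""
    return YQL_ENDPOINT + "?" + urlencode({"q": query, "format": fmt})


# Step 3: this is the URL to copy into an application.
url = yql_url('select * from contentanalysis.analyze where url="http://www.cnn.com/"')
print(url)
```

Passing `fmt="xml"` would select the other output format mentioned later in the deck.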

Page 14: HackU IIT Kgp 2013 BOSS + CA
Page 15: HackU IIT Kgp 2013 BOSS + CA

Finding images of “The Dark Knight Rises”

select * from boss.search where q="The Dark Knight Rises" and service="images" and ck="..." and secret="..."

Page 16: HackU IIT Kgp 2013 BOSS + CA

Finding “The Dark Knight Rises” in IMDB, movies.yahoo.com

select * from boss.search where q="The Dark Knight Rises" and sites="imdb.com,movies.yahoo.com" and ck="..." and secret="..."

Page 17: HackU IIT Kgp 2013 BOSS + CA

Spell Check and Correction

select * from boss.search where q="The Dirk Knight Rises" and service="spelling" and ck="..." and secret="..."

Page 18: HackU IIT Kgp 2013 BOSS + CA

Finding news on “The Dark Knight Rises”

select * from boss.search where q="The Dark Knight Rises" and service="news" and ck="..." and secret="..."

Page 19: HackU IIT Kgp 2013 BOSS + CA

Finding interesting objects: Content Analysis

select * from contentanalysis.analyze where text="Sachin Tendulkar is batting very well"

Page 20: HackU IIT Kgp 2013 BOSS + CA

Content Analysis from a URL

select * from contentanalysis.analyze where url="http://www.cnn.com/"

Page 21: HackU IIT Kgp 2013 BOSS + CA

Let's see it in action!

Page 22: HackU IIT Kgp 2013 BOSS + CA

Query Cheatsheet

• Find images of “The Dark Knight Rises”

select * from boss.search where q="The Dark Knight Rises" and service="images" and ck="..." and secret="..."

• Find reviews of “The Dark Knight Rises”

select * from boss.search where q="reviews intitle:The Dark Knight Rises" and service="web" and ck="..." and secret="..."

• Search for Avatar but not the movie

select * from boss.search where q="Avatar -movie" and ck="..." and secret="..."

• Search PDFs of “The Dark Knight Rises”

select * from boss.search where q="The Dark Knight Rises" and type="pdf" and ck="..." and secret="..."

Page 23: HackU IIT Kgp 2013 BOSS + CA

Query Cheatsheet

• Find all the news of “The Dark Knight Rises”

select * from boss.search where q="The Dark Knight Rises" and service="news" and ck="..." and secret="..."

• Get long abstracts in the results

select * from boss.search where q="The Dark Knight Rises" and abstract="long" and ck="..." and secret="..."

• Retrieve results 51-100 of the query

select * from boss.search where q="The Dark Knight Rises" and start=51 and ck="..." and secret="..."
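Since every cheatsheet entry follows the same pattern, the statements can be generated rather than typed. The helper below is a minimal sketch: `boss_query` is a hypothetical name, the option names (`service`, `sites`, `type`, `abstract`, `start`) are the ones used on the slides, and the backslash escaping of double quotes is an assumption about YQL string literals. The `"..."` defaults stand for the consumer key and secret, exactly as on the slides.

```python
def boss_query(q, ck="...", secret="...", **options):
    """Assemble a boss.search YQL statement from keyword options."""

    def literal(value):
        # Numbers such as start=51 are written bare; everything else is quoted.
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            return str(value)
        return '"' + str(value).replace('"', '\\"') + '"'

    # Keep the slide order: q first, then the options, then the credentials.
    clauses = [("q", q), *options.items(), ("ck", ck), ("secret", secret)]
    where = " and ".join(f"{name}={literal(value)}" for name, value in clauses)
    return f"select * from boss.search where {where}"


images = boss_query("The Dark Knight Rises", service="images")
later_results = boss_query("The Dark Knight Rises", start=51)
pdfs = boss_query("The Dark Knight Rises", type="pdf")
```

Keyword arguments keep their insertion order, so the generated text matches the cheatsheet entries word for word.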

Page 24: HackU IIT Kgp 2013 BOSS + CA

EXAMPLES

Page 25: HackU IIT Kgp 2013 BOSS + CA

duckduckgo.com

Page 26: HackU IIT Kgp 2013 BOSS + CA
Page 27: HackU IIT Kgp 2013 BOSS + CA

Data Extraction

Page 28: HackU IIT Kgp 2013 BOSS + CA

Why is extraction difficult?

• The Internet has a lot of information

• Not all of it can be processed by machines

– Unstructured data

– E.g. DiscountedPrice and ReducedPrice of a product (both mean the same)

• The ultimate aim is to publish data in a structured format

• Simplest way: XML, JSON
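The DiscountedPrice / ReducedPrice example can be made concrete. The sketch below is not Dapper or any Yahoo tool; it assumes a small hand-written synonym table (`CANONICAL`, with invented canonical keys `sale_price` and `name`) and shows how records scraped from two sites with different labels end up as the same structured JSON.

```python
import json

# Hypothetical synonym table: each site-specific label maps to one canonical key.
CANONICAL = {
    "discountedprice": "sale_price",
    "reducedprice": "sale_price",
    "name": "name",
    "title": "name",
}


def normalize(record):
    """Rename known fields to canonical keys and drop unrecognized ones."""
    out = {}
    for label, value in record.items():
        key = CANONICAL.get(label.strip().lower())
        if key is not None:
            out[key] = value
    return out


# Two sites describe the same product with different labels.
site_a = normalize({"Title": "Camera", "DiscountedPrice": "199.00"})
site_b = normalize({"Name": "Camera", "ReducedPrice": "199.00"})
as_json = json.dumps(site_a, sort_keys=True)
print(as_json)
```

Maintaining such a table by hand does not scale to the whole web, which is the difficulty the slide points at.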

Page 29: HackU IIT Kgp 2013 BOSS + CA

Web Scraping

• Demo: Dapper

Page 30: HackU IIT Kgp 2013 BOSS + CA

More Resources

• Yahoo! BOSS: http://developer.yahoo.com/boss

• BOSS Technical Documentation: http://developer.yahoo.com/search/boss/boss_api_guide/

• Content Analysis: http://developer.yahoo.com/contentanalysis/

• OAuth sample code: https://github.com/sourind/hacku/

Page 31: HackU IIT Kgp 2013 BOSS + CA

Questions?

http://www.flickr.com/photos/reem_unique/4119729692/

Page 32: HackU IIT Kgp 2013 BOSS + CA

• http://slideshare.net/souridatta

• https://github.com/sourind/

Thanks!!