Predicting the News of Tomorrow Using Patterns in Web Search Queries
Kira Radinsky, Sagie Davidovich, Shaul Markovitch
Computer Science Department, Technion – Israel Institute of Technology

Dec 17, 2015


Page 1: Kira Radinsky, Sagie Davidovich, Shaul Markovitch Computer Science Department Technion – Israel Institute of technology.

Predicting the News of Tomorrow Using Patterns in Web Search Queries

Kira Radinsky, Sagie Davidovich, Shaul Markovitch
Computer Science Department, Technion – Israel Institute of Technology

Page 2

Goal

“We find that changes in oil prices strongly predict future stock market returns in many countries in the world... The impact of this predictability on stock returns tends to be large.” (“Striking Oil: Another Puzzle?”, Gerben Driesprong, Benjamin Maat and Ben Jacobsen)

Oil Peaks and Stock Market Crashes

NEW YORK – Crude-oil futures shot up as commodities markets benefited from a surge in investor confidence. Light, sweet crude for January delivery settled $4.57, or 9.2%, higher at $54.50 a barrel on the New York Mercantile Exchange. January Brent crude on the ICE futures exchange settled $4.74, or 9.6%, higher at $53.93 a barrel.

Humans can predict events. Can it be done automatically?

Page 3

Solution Outline

Identify events that occur today
• More than 0.5 billion daily searches on the web (2008)
• Many queries are related to current events

Analyze what events tended to follow today’s events in the past
• History repeats itself
• Query log archives

Page 4

Knowledge Sources

• Google Hot Trends
• Technorati
• Online news (Newzingo)

[Figure: time axis with ticks July 08, Aug 08, Sep 08]

Page 5

Identifying Events

[Figure: search-volume time series with peaks labeled Hurricane Ivan, Hurricane Wilma, Hurricane Dean, Hurricane Gustav, and Hurricane Katrina; axis ticks July 08, Aug 08, Sep 08]

Peak Detection Algorithm

Each maximum point $m_y$ has at most two neighboring minimum points. We consider a maximum point a peak if:

1. The local maximum satisfies $m_y > \Delta_1$ (high-pass filter).
2. The difference between $m_y$ and the lowest of its neighboring minimum points is above $\Delta_2$.
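The two conditions above can be sketched in Python. This is a minimal illustration, not the authors' implementation: the function name `detect_peaks`, the treatment of plateaus, and the way neighboring minima are found (by walking downhill from the maximum on each side) are assumptions made for the example.

```python
from typing import List, Sequence


def detect_peaks(series: Sequence[float], delta1: float, delta2: float) -> List[int]:
    """Return indices of local maxima that satisfy both peak conditions."""
    n = len(series)
    peaks = []
    for i in range(n):
        # A local maximum rises from the left and does not rise to the right.
        rises_from_left = i == 0 or series[i] > series[i - 1]
        falls_to_right = i == n - 1 or series[i] >= series[i + 1]
        if not (rises_from_left and falls_to_right):
            continue
        # Condition 1: the maximum must exceed delta1 (high-pass filter).
        if series[i] <= delta1:
            continue
        # Find the neighboring minima by walking downhill on each side
        # (assumed interpretation of "neighboring minimum points").
        neighbors = []
        j = i
        while j > 0 and series[j - 1] <= series[j]:
            j -= 1
        if j != i:
            neighbors.append(series[j])
        j = i
        while j < n - 1 and series[j + 1] <= series[j]:
            j += 1
        if j != i:
            neighbors.append(series[j])
        # Condition 2: height above the lowest neighboring minimum exceeds delta2.
        if neighbors and series[i] - min(neighbors) > delta2:
            peaks.append(i)
    return peaks


# Made-up daily search volumes, for illustration only.
volume = [1, 2, 9, 3, 2, 4, 3, 1, 8, 1]
print(detect_peaks(volume, delta1=5, delta2=4))
```

With these made-up values, the small bump at index 5 is rejected by the first condition, while the two tall maxima pass both.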

Page 6

Prediction

Indication Weight

1. How many of the peaks of $w_2$ (the future candidate) appeared $k$ days after a peak of $w_1$ (today’s term).
2. Saliency of $w_1$: the significance of the peak in the search volume.
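The first component can be illustrated with a short Python sketch. It assumes that peaks have already been detected and are given as lists of day indices; the function name `followed_fraction` and the normalization by the number of peaks of the candidate term are assumptions, since the slide does not state how the count is normalized or how it is combined with saliency.

```python
from typing import Iterable


def followed_fraction(peaks_w1: Iterable[int], peaks_w2: Iterable[int], k: int) -> float:
    """Fraction of w2's peak days that fall exactly k days after a peak day of w1."""
    w1_days = set(peaks_w1)
    w2_days = list(peaks_w2)
    if not w2_days:
        return 0.0
    hits = sum(1 for day in w2_days if day - k in w1_days)
    return hits / len(w2_days)


# Made-up peak days, for illustration only.
hurricane_peaks = [10, 50, 90]
gas_peaks = [12, 52, 70, 92]
print(followed_fraction(hurricane_peaks, gas_peaks, k=2))  # 0.75
```

Here three of the four hypothetical "gas" peaks occur two days after a "hurricane" peak.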

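[Figure: graph of terms (hurricane, Storm, Flood, Weather, Evacuation, Gas, Economics, Taliban, War, South Asia, china, pope, texans) linked by weighted edges; weights shown: 0.85, 0.40, 0.10, 0.36, 0.12, 0.30, 0.05, 0.01, 0.08. The pairing of weights with edges is not recoverable from the transcript.]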

Goal: For each candidate term, evaluate the likelihood that it will appear in the future, given today’s terms.

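[Figure labels: today’s salient terms; future candidate terms; indication weight on the candidate (values shown: 0.9, 0.7); likelihood to appear in $k$ days. The accompanying formula is garbled in the transcript; its recoverable part is the conditional likelihood $P(w_2^{t+k} \mid w_1^{t})$ of candidate $w_2$ appearing $k$ days after today’s term $w_1$.]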

Page 7

Hurricane

Gas

Oil, Gas May Soar as Storm Shuts U.S. Gulf Production
Crude-oil and natural-gas prices may soar after Hurricane Katrina moved into production regions of the Gulf of Mexico, forcing companies including Exxon Mobil Corp. and Chevron Corp. to close operations

Gas Prices Rise as Industry Assesses Storm Damage
HOUSTON – Gasoline prices rose Saturday by an average of five cents a gallon across the country as the oil industry anticipated disruptions at several refineries along the Texas coast because of Hurricane Ike.

Hurricane

Page 8

Empirical Methodology

• Testing on an aggregation of 4,500 online news sources
• What does it mean “to appear in the news”? The term appears significantly more times than its average in the past year
• Precision at 100
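A minimal Python sketch of the two evaluation ingredients follows. The slide does not say how "significantly more" is measured, so the threshold of `z` standard deviations above the past-year mean is an assumption, as are the function names `appears_in_news` and `precision_at_n`.

```python
from statistics import mean, pstdev
from typing import Sequence, Set


def appears_in_news(today_count: float, past_year_counts: Sequence[float], z: float = 2.0) -> bool:
    """Assumed test: today's count exceeds the past-year mean by z standard deviations."""
    mu = mean(past_year_counts)
    sigma = pstdev(past_year_counts)
    return today_count > mu + z * sigma


def precision_at_n(ranked_terms: Sequence[str], appeared: Set[str], n: int = 100) -> float:
    """Fraction of the top-n predicted terms that actually appeared."""
    top = ranked_terms[:n]
    if not top:
        return 0.0
    return sum(1 for term in top if term in appeared) / len(top)


# Made-up data, for illustration only.
print(appears_in_news(30, [10, 12, 8, 10]))
print(precision_at_n(["gas", "flood", "pope", "storm"], {"gas", "storm"}, n=4))
```

Note that `precision_at_n` divides by the number of predictions actually made when fewer than `n` are available; dividing by `n` instead is an equally plausible convention.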

Page 9

Empirical Evaluation

• Baseline method - What happens today happens tomorrow
• Each point is how many of the 100 appeared
• A total of 30 days of experiments
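The baseline described above can be sketched in a few lines of Python. The function names `persistence_baseline` and `count_hits` are hypothetical, and the slide does not say how today's terms are ordered, so the sketch simply takes the first `n`.

```python
from typing import List, Sequence, Set


def persistence_baseline(todays_terms: Sequence[str], n: int = 100) -> List[str]:
    """Predict that the first n of today's terms appear again tomorrow."""
    return list(todays_terms[:n])


def count_hits(predicted: Sequence[str], appeared_tomorrow: Set[str]) -> int:
    """How many of the predicted terms actually appeared."""
    return sum(1 for term in predicted if term in appeared_tomorrow)


# Made-up terms, for illustration only.
prediction = persistence_baseline(["hurricane", "gas", "pope"], n=2)
print(count_hits(prediction, {"gas", "storm"}))  # 1
```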

Page 10

Empirical Evaluation

• Baseline method - What happens today happens tomorrow
• Each point is an average of results from 30 days of tests

Page 11

Empirical Evaluation

• Baseline-related – 100 terms which are related to today’s terms are selected randomly
• Each point is how many of the 100 appeared
• A total of 30 days of experiments

[Figure legend: Baseline - Related]
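A hedged Python sketch of the related-terms baseline is given below. The slide does not define how relatedness is determined, so the `related` mapping is a hypothetical input, and the function name `related_baseline` and the optional `seed` parameter are assumptions added for reproducibility.

```python
import random
from typing import Dict, List, Optional, Sequence


def related_baseline(
    todays_terms: Sequence[str],
    related: Dict[str, Sequence[str]],
    n: int = 100,
    seed: Optional[int] = None,
) -> List[str]:
    """Randomly pick up to n distinct terms related to any of today's terms."""
    # Pool every related candidate once; sorting keeps sampling reproducible.
    pool = sorted({cand for term in todays_terms for cand in related.get(term, ())})
    rng = random.Random(seed)
    return rng.sample(pool, min(n, len(pool)))


# Made-up relatedness data, for illustration only.
related = {"hurricane": ["storm", "flood", "gas"], "pope": ["vatican"]}
print(related_baseline(["hurricane", "pope"], related, n=3, seed=0))
```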

Page 12

Empirical Evaluation

• Cross-Correlation - Not using indication weights
• Each point is how many of the 100 appeared
• A total of 30 days of experiments
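For readers unfamiliar with cross-correlation, the sketch below computes a Pearson correlation between one term's series and another term's series shifted by $k$ days. This is one common way to define lagged cross-correlation; the slide does not specify which variant was used, and the function name `lagged_correlation` is an assumption.

```python
from math import sqrt
from typing import Sequence


def lagged_correlation(x: Sequence[float], y: Sequence[float], k: int) -> float:
    """Pearson correlation between x[t] and y[t + k] over the overlapping days."""
    if len(x) != len(y):
        raise ValueError("series must have equal length")
    if k < 0 or k >= len(x):
        raise ValueError("k must satisfy 0 <= k < len(series)")
    a = x[: len(x) - k]
    b = y[k:]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # A constant series has no defined correlation; report 0.
    return cov / sqrt(var_a * var_b)


# Made-up series: y repeats the spikes of x two days later.
x = [0, 5, 0, 0, 6, 0, 0]
y = [0, 0, 0, 5, 0, 0, 6]
print(lagged_correlation(x, y, k=2))
```

In this toy example the correlation at lag 2 is close to 1, while at lag 0 it is negative, because the spikes never coincide on the same day.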

Page 13

Conclusions

A new method for prediction of global future events using their patterns in the past.

A novel application of an aggregated collection of search queries, with each search term represented as a time series.

A testing methodology for evaluating such news prediction algorithms.