Collecting useful information from the web with open source tools

Collecting web information with open source tools

May 13, 2015



Sammy Fung

My lightning talk slides at COSCUP 2011, Taipei.
Transcript
Page 1: Collecting web information with open source tools

Collecting useful information from the web with open source tools

Page 2

@sammyfung

Page 3

Hong Kong

Page 4

First chairman of the Hong Kong Linux User Group

Page 5

opensource.hk webmaster

Page 6

How do programmers solve problems in daily life?

Page 7

Coding! Just write a program!

Page 8

A lot of popular websites in Hong Kong run on II$ (Microsoft IIS).

Page 9

They are very slow just when you're using them!

Page 10

Visiting websites manually and repeatedly to check for the latest updates.
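Rather than reloading a page by hand, a script can fetch it on a timer and react only when the content actually changes. A minimal sketch, assuming the page body has already been downloaded as bytes (fetching and scheduling are left out); has_changed is a hypothetical helper that compares SHA-256 fingerprints of successive fetches:

```python
import hashlib
from typing import Optional, Tuple


def digest(body: bytes) -> str:
    """Return a SHA-256 fingerprint of a page body."""
    return hashlib.sha256(body).hexdigest()


def has_changed(body: bytes, last_digest: Optional[str]) -> Tuple[bool, str]:
    """Compare a freshly fetched body with the previous fingerprint.

    Returns (changed, new_digest). The first fetch (last_digest is None)
    counts as a change so the caller can record the initial state.
    """
    current = digest(body)
    return current != last_digest, current


if __name__ == "__main__":
    seen = None
    # Hypothetical sequence of fetches of the same page.
    for body in [b"<p>update 1</p>", b"<p>update 1</p>", b"<p>update 2</p>"]:
        changed, seen = has_changed(body, seen)
        print(changed)  # prints True, False, True on separate lines
```

In practice, pages often embed timestamps or ads that change on every load, so it is usually better to hash only the extracted part you care about.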

Page 11

Would you still be addicted to Plurk/Twitter without automatic alerts for new responses and replies?

Page 12

What do you need?

Page 13

Regular Expression
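For simple, predictable markup, a regular expression can pull a value straight out of the page source. A minimal sketch with Python's re module; the sample HTML, the "Air temperature" wording and extract_temperature are made up for illustration:

```python
import re
from typing import Optional

# Hypothetical pattern for a line such as "Air temperature: 28 degrees Celsius".
TEMP_RE = re.compile(r"Air temperature:\s*(?P<temp>-?\d+)\s*degree")


def extract_temperature(html: str) -> Optional[int]:
    """Return the first temperature reading found in the page, or None."""
    match = TEMP_RE.search(html)
    return int(match.group("temp")) if match else None


sample = """
<p>Air temperature: 28 degrees Celsius</p>
<p>Relative humidity: 75 per cent</p>
"""
print(extract_temperature(sample))  # 28
```

Patterns like this break as soon as the site changes its wording or markup, which is where an HTML parser helps.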

Page 14

HTML Parser
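An HTML parser walks the document's tags instead of matching raw text, so it copes better with attributes, whitespace and entities. A minimal sketch using the standard library's html.parser.HTMLParser; TagTextCollector and extract_text are hypothetical names that collect the text inside every occurrence of one tag:

```python
from html.parser import HTMLParser
from typing import List


class TagTextCollector(HTMLParser):
    """Collect the text inside each occurrence of one tag, e.g. <h1>."""

    def __init__(self, tag: str) -> None:
        super().__init__()  # convert_charrefs=True, so &amp; arrives as "&"
        self.tag = tag
        self.inside = False
        self.texts: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lowercase.
        if tag == self.tag:
            self.inside = True
            self.texts.append("")

    def handle_endtag(self, tag):
        if tag == self.tag:
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.texts[-1] += data


def extract_text(html: str, tag: str) -> List[str]:
    """Return the stripped text of every <tag> element in html."""
    parser = TagTextCollector(tag.lower())
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.texts]


print(extract_text("<h1>Hello World</h1>", "h1"))  # ['Hello World']
```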

Page 15

Web Crawling Framework
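A crawling framework takes care of the loop around the parsing step: keep a queue of URLs, fetch each page once, extract links and stop at a limit. A minimal breadth-first sketch of that loop, assuming a caller-supplied fetch function; here a hypothetical in-memory SITE dict stands in for HTTP so the example runs offline. Frameworks such as Scrapy add concurrency, download delays, retries and robots.txt handling on top of this.

```python
import re
from collections import deque
from typing import Callable, List, Optional
from urllib.parse import urldefrag, urljoin

# Crude link extractor; a real crawler would use an HTML parser instead.
HREF_RE = re.compile(r'href="([^"]+)"')


def crawl(start_url: str,
          fetch: Callable[[str], Optional[str]],
          max_pages: int = 50) -> List[str]:
    """Breadth-first crawl that stays under start_url; returns visited URLs in order."""
    seen = {start_url}           # URLs already queued, so each is fetched once
    queue = deque([start_url])   # frontier of URLs waiting to be fetched
    visited: List[str] = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        body = fetch(url)
        if body is None:         # fetch failed or page does not exist
            continue
        visited.append(url)
        for href in HREF_RE.findall(body):
            # Resolve relative links and drop "#fragment" parts.
            link, _ = urldefrag(urljoin(url, href))
            if link.startswith(start_url) and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


# Hypothetical in-memory "site" standing in for real HTTP fetches.
SITE = {
    "http://example.com/": '<a href="/a">A</a> <a href="b">B</a> <a href="http://other.org/">X</a>',
    "http://example.com/a": '<a href="/">Home</a> <a href="/missing">?</a>',
    "http://example.com/b": "<p>No links here</p>",
}

print(crawl("http://example.com/", SITE.get))
# ['http://example.com/', 'http://example.com/a', 'http://example.com/b']
```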

Page 16

scrapy.org

Page 17

About Scrapy

Written in Python

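from scrapy.selector import HtmlXPathSelector
# inside a spider's parse(self, response) callback (Scrapy 0.x-era API)
x = HtmlXPathSelector(response)
torrent = TorrentItem()  # an Item subclass declaring 'url' and 'name' fields
torrent['url'] = response.url
torrent['name'] = x.select("//h1/text()").extract()  # [u'Hello World'] for the page below
# newer Scrapy releases: response.xpath("//h1/text()").extract()

<h1>Hello World</h1>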

Page 18

All of the above are available as open source!

Page 19

Problem #1: A lot of popular websites in Hong Kong run on II$.

Page 20

Developed a schedule of football matches broadcast live on cable TV.
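The talk does not say how the schedule was built. One way to avoid waiting on a slow server every time is to fetch its page on your own schedule and serve a saved copy. A minimal sketch of that idea, assuming a caller-supplied fetch function; TTLCache is a hypothetical helper, and the injectable clock exists only to make it testable:

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Keep fetched pages for ttl seconds so a slow site is hit at most once per interval."""

    def __init__(self, fetch: Callable[[str], str], ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.fetch = fetch
        self.ttl = ttl
        self.clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}  # url -> (fetched_at, body)

    def get(self, url: str) -> str:
        now = self.clock()
        cached = self._store.get(url)
        if cached is not None and now - cached[0] < self.ttl:
            return cached[1]          # still fresh: skip the slow request
        body = self.fetch(url)        # missing or stale: fetch again
        self._store[url] = (now, body)
        return body
```

Readers of your own list then never wait on the original server; only the background refresh does.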

Page 21

Problem #2: Some websites don't provide a data API.

Page 22

Hong Kong Weather Info

Page 23

@weatherhk
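Since the weather site offers no API, a bot has to turn scraped readings into a short status message itself. A minimal sketch, assuming the readings have already been scraped into a dict; format_status and the field names are hypothetical, and the 140-character default reflects Twitter's limit at the time of the talk (it has since been raised to 280):

```python
from typing import Dict

TWEET_LIMIT = 140  # Twitter's limit in 2011; later raised to 280


def format_status(readings: Dict[str, str], limit: int = TWEET_LIMIT) -> str:
    """Join scraped readings into one status line, trimmed to the limit."""
    status = " | ".join(f"{name}: {value}" for name, value in readings.items())
    if len(status) > limit:
        # Leave room for the ellipsis so the result never exceeds the limit.
        status = status[: limit - 3].rstrip() + "..."
    return status


print(format_status({"Temperature": "28C", "Humidity": "75%"}))
# Temperature: 28C | Humidity: 75%
```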

Page 24

Alerts for Tropical Cyclones in the Northwest Pacific Ocean

@tctrack @tropicalhk
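An alert account should post only when a new cyclone appears, not every time the scraper runs. A minimal sketch, assuming each scrape yields a list of cyclone identifiers; new_alerts is a hypothetical helper, the identifiers below are made up, and a real bot would persist the already-alerted set between runs:

```python
from typing import Iterable, List, Set


def new_alerts(current: Iterable[str], already_alerted: Set[str]) -> List[str]:
    """Return identifiers not alerted before (in scrape order) and record them."""
    fresh: List[str] = []
    for cyclone_id in current:
        if cyclone_id not in already_alerted:
            already_alerted.add(cyclone_id)
            fresh.append(cyclone_id)
    return fresh


alerted: Set[str] = set()
print(new_alerts(["01W", "02W"], alerted))  # ['01W', '02W']
print(new_alerts(["02W", "03W"], alerted))  # ['03W']
```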

Page 25

Path and Forecast of Active Tropical Cyclones
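With a forecast track scraped as latitude/longitude points, a bot can estimate how close a cyclone is expected to come to Hong Kong. A minimal sketch using the haversine great-circle formula; the Hong Kong coordinates are approximate, the track format is an assumption, and only the listed forecast points are checked (no interpolation between them):

```python
import math
from typing import Iterable, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

# Approximate position of Hong Kong as (lat, lon) in degrees; an assumption.
HONG_KONG = (22.3, 114.2)


def haversine_km(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def closest_approach(track: Iterable[Tuple[float, float]],
                     place: Tuple[float, float] = HONG_KONG) -> float:
    """Smallest distance (km) from place to any forecast point; track must be non-empty."""
    return min(haversine_km(point, place) for point in track)
```

A bot could, for example, post an extra warning when closest_approach falls below a threshold of its choosing.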

Page 26

Let's solve our own problems with open source tools: make good use of open source software to tackle the problems we run into in daily life.

Thank you!

Page 27

Solving problems with open source.

Thank you.