Web Scraping @alvaro_aguirre Saturday, November 5, 2011
Web Scraping using Diazo!

May 19, 2015

Technology

pythonchile

Talk given at the StarTechConf 2011
Santiago, Chile
www.startechconf.com
Transcript
Page 1: Web Scraping using Diazo!

Web Scraping

@alvaro_aguirre

Saturday, November 5, 2011

Page 2: Web Scraping using Diazo!

In search of our cosmic origins...

Page 3: Web Scraping using Diazo!

Page 4: Web Scraping using Diazo!

Page 5: Web Scraping using Diazo!

Page 6: Web Scraping using Diazo!

Page 7: Web Scraping using Diazo!

Page 8: Web Scraping using Diazo!

Page 9: Web Scraping using Diazo!

Data Scraping vs. Web Scraping

Page 10: Web Scraping using Diazo!

<html>
<head></head>
<body>
.....
</body>
</html>

Data Scraping

Page 11: Web Scraping using Diazo!

Web Scraping

Page 12: Web Scraping using Diazo!

Page 13: Web Scraping using Diazo!

Page 14: Web Scraping using Diazo!

Deliverance

XDV

Diazo

Page 15: Web Scraping using Diazo!

Diazo

Page 16: Web Scraping using Diazo!

Page 17: Web Scraping using Diazo!

<replace css:content="h1" css:theme="#main" />
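
To make the effect of this rule concrete, here is a minimal sketch in plain Python of what a replace does conceptually: a node selected from the content takes the place of a node selected in the theme. This is not how Diazo is implemented (Diazo compiles rules to XSLT). The `THEME` and `CONTENT` snippets and the `replace_node` helper are hypothetical, and the sketch only handles the first matching content element.

```python
import xml.etree.ElementTree as ET

# Hypothetical theme and content documents, kept well-formed so the
# standard-library XML parser can read them.
THEME = (
    '<html><body><div id="main">placeholder</div>'
    '<div id="footer">theme footer</div></body></html>'
)
CONTENT = "<html><body><h1>Scraped title</h1><p>other</p></body></html>"


def replace_node(theme_root, theme_id, content_root, content_tag):
    """Swap the theme element with id `theme_id` for the first
    `content_tag` element found in the content."""
    replacement = content_root.find(f".//{content_tag}")
    if replacement is None:
        return theme_root  # nothing matched in the content: theme unchanged
    for parent in theme_root.iter():
        for index, child in enumerate(list(parent)):
            if child.get("id") == theme_id:
                parent.remove(child)
                parent.insert(index, replacement)
                return theme_root
    return theme_root


theme = ET.fromstring(THEME)
content = ET.fromstring(CONTENT)
result = ET.tostring(replace_node(theme, "main", content, "h1"), encoding="unicode")
print(result)
```

The printed document keeps the theme's footer but carries the content's h1 where the theme's #main element used to be.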

Page 18: Web Scraping using Diazo!

<drop css:content="h1" />

<drop css:theme="breadcrumbs" />

Page 19: Web Scraping using Diazo!

<replace css:theme="#header" css:content="#header-element" if-content="" />

Page 20: Web Scraping using Diazo!

<drop css:theme="#info-box" if-path="/news"/>

Page 21: Web Scraping using Diazo!

<theme/> <notheme/> <replace/> <before/> <after/> <drop/> <strip/> <merge/> <copy/>

Page 22: Web Scraping using Diazo!

<replace css:theme="#details">
    <dl id="details">
        <xsl:for-each css:select="table#details > tr">
            <dt><xsl:copy-of select="td[1]/text()" /></dt>
            <dd><xsl:copy-of select="td[2]/node()"/></dd>
        </xsl:for-each>
    </dl>
</replace>

<table id="details">
    <tr>
        <td>One</td>
        <td>1</td>
    </tr>
    <tr>
        <td>Two</td>
        <td>2</td>
    </tr>
</table>

<dl id="details">
    <dt>One</dt>
    <dd>1</dd>
    <dt>Two</dt>
    <dd>2</dd>
</dl>
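
For readers less familiar with XSLT, the same table-to-definition-list mapping can be sketched in plain Python with the standard library. This only illustrates what the rule on this slide produces, not how Diazo evaluates it; the `table_to_dl` helper is a hypothetical name, and the input is the table from the slide.

```python
import xml.etree.ElementTree as ET

# The content table shown on the slide.
TABLE = """<table id="details">
  <tr><td>One</td><td>1</td></tr>
  <tr><td>Two</td><td>2</td></tr>
</table>"""


def table_to_dl(table):
    """Turn each two-cell row into a <dt>/<dd> pair, keeping the table's id."""
    dl = ET.Element("dl", {"id": table.get("id", "")})
    for row in table.findall("tr"):
        cells = row.findall("td")
        if len(cells) < 2:
            continue  # skip rows that do not have a term and a value
        dt = ET.SubElement(dl, "dt")
        dt.text = cells[0].text           # roughly td[1]/text()
        dd = ET.SubElement(dl, "dd")
        dd.text = cells[1].text           # roughly td[2]/node(): leading text...
        dd.extend(list(cells[1]))         # ...plus any nested child elements
    return dl


dl = table_to_dl(ET.fromstring(TABLE))
html = ET.tostring(dl, encoding="unicode")
print(html)
```

The output has the same structure as the definition list on the slide, serialized on a single line.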

Page 23: Web Scraping using Diazo!

Page 24: Web Scraping using Diazo!

Page 25: Web Scraping using Diazo!

Page 26: Web Scraping using Diazo!

Tools

Page 27: Web Scraping using Diazo!

External Content

Page 28: Web Scraping using Diazo!

Page 29: Web Scraping using Diazo!

• development of web & mobile interfaces

• legacy app integration

• prototypes

• low coupling

Page 30: Web Scraping using Diazo!

from lxml import etree
from diazo.compiler import compile_theme

absolute_prefix = "/static"

rules = "rules.xml"
theme = "theme.html"

# Compile the rules file and the theme into an XSLT document.
compiled_theme = compile_theme(rules, theme, absolute_prefix=absolute_prefix)

# some_content is the source document to transform (a path or file-like object).
transform = etree.XSLT(compiled_theme)
content = etree.parse(some_content)
transformed = transform(content)

output = etree.tostring(transformed)
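
When the content is a scraped web page rather than well-formed XML, it usually needs lxml's HTML parser before it can be transformed. The sketch below wraps the steps from this slide in a function under that assumption. The name `apply_theme` and its parameters are hypothetical, and `diazo` and `lxml` are imported inside the function so the definition loads even where they are not installed.

```python
def apply_theme(rules_path, theme_path, html_text, absolute_prefix="/static"):
    """Compile a Diazo theme and apply it to a string of (possibly messy) HTML.

    Assumes the third-party packages `diazo` and `lxml` are installed.
    """
    # Imported lazily so that defining this function has no dependencies.
    from lxml import etree
    from diazo.compiler import compile_theme

    # Rules + theme -> XSLT document, as on the slide.
    compiled_theme = compile_theme(
        rules_path, theme_path, absolute_prefix=absolute_prefix
    )
    transform = etree.XSLT(compiled_theme)

    # Scraped pages are rarely valid XML, so parse them leniently as HTML.
    content = etree.fromstring(html_text, parser=etree.HTMLParser())

    # Returns the themed page as bytes.
    return etree.tostring(transform(content))


# Example call (the file names are the ones used on the slide):
# themed = apply_theme("rules.xml", "theme.html", "<html><body><h1>Hi</h1></body></html>")
```

Compiling the theme on every call keeps the sketch short; a long-running service would more likely compile once and reuse the resulting transform.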

Page 31: Web Scraping using Diazo!

github/aaguirre

Page 32: Web Scraping using Diazo!

diazo.org

Page 33: Web Scraping using Diazo!

plone.org

Page 34: Web Scraping using Diazo!

Thank you!
