ENTER 2015 Research Track Slide Number 1 A practical approach to big data in tourism: a low cost Raspberry Pi cluster Mariano d’Amore, Rodolfo Baggio, and Enrico Valdani Bocconi University, Italy
A practical approach to big data in tourism: a low cost Raspberry Pi cluster

Jul 15, 2015

Transcript
Page 1: A practical approach to big data in tourism: a low cost Raspberry Pi cluster

ENTER 2015 Research Track Slide Number 1

A practical approach to big data in tourism:

a low cost Raspberry Pi cluster

Mariano d’Amore, Rodolfo Baggio, and Enrico Valdani

Bocconi University, Italy

Page 2: A practical approach to big data in tourism: a low cost Raspberry Pi cluster

Dedicated to all those who loved messing around with:

Page 3: A practical approach to big data in tourism: a low cost Raspberry Pi cluster

OSN & Big Data

Page 4: A practical approach to big data in tourism: a low cost Raspberry Pi cluster

OSN & Big Data

• Online Social Networks (OSNs) contain innumerable trails of people’s activities

• Big Data

– an incredible opportunity for its supposed capacity to provide answers to practically any question that could be asked about people’s behaviors, views and feelings

Page 5: A practical approach to big data in tourism: a low cost Raspberry Pi cluster

Big Data: what

• Size, but mainly fragmentation, variability, and mixture of structured and unstructured data

• Tend to give rigorous answers to ambiguous questions

– overrate irrelevant phenomena, unjustified clusterings etc.

• Risky correlations

– significant but meaningless correlations, correlation/causation …

• Can be influenced and biased (Google bombing)

– watch out for propagation of wrong results

• Semantic analysis still too «clumsy»

– language and cultural difficulties

• Hype: as for all fashions: many excesses …

• Good complement to «traditional» research, but no total replacement

• Still need clear objectives and rigorous data collection
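
The "risky correlations" point above can be illustrated with a small experiment. This is a minimal sketch, not part of the original talk: it generates series of pure random noise (the series count, length, and seed are arbitrary assumptions) and reports the strongest correlation found between any two of them. With enough variables and few observations, some pairs look strongly related purely by chance.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def strongest_chance_correlation(n_series=50, n_points=10, seed=0):
    """Build unrelated random series and return the largest |r| over all pairs."""
    rng = random.Random(seed)
    series = [[rng.gauss(0, 1) for _ in range(n_points)]
              for _ in range(n_series)]
    return max(abs(pearson(a, b)) for a, b in combinations(series, 2))


if __name__ == "__main__":
    r = strongest_chance_correlation()
    print(f"largest |r| among unrelated series: {r:.2f}")
```

None of these series has anything to do with any other, which is why a clear research question and a rigorous collection plan are still needed before reading meaning into a correlation.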

Page 6: A practical approach to big data in tourism: a low cost Raspberry Pi cluster

Big Data: how

• Data collection from OSNs via APIs:

– from providers (Gnip, Datasift, Topsy)

– plugins for other SW applications (NodeXL, Gephi)

– autonomous equipment

• some programming skills (open source libraries)

– Python

• dedicated hardware

– long elapsed times due to API constraints

– NB: better to use more machines in parallel to avoid IP blocking
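
The trade-off between API constraints and the number of machines can be made concrete with a little arithmetic. The sketch below is illustrative only: the function names, the rate limit (15 requests per 15-minute window), and the request count are hypothetical, and it assumes the limit applies to each machine separately.

```python
from math import ceil


def partition_round_robin(tasks, n_nodes):
    """Assign tasks to nodes in round-robin order; returns one list per node."""
    return [tasks[i::n_nodes] for i in range(n_nodes)]


def estimated_hours(n_requests, requests_per_window, window_minutes, n_nodes):
    """Rough elapsed time if every node is limited to
    `requests_per_window` requests per window (assumed per-node limit)."""
    per_node = ceil(n_requests / n_nodes)
    windows = ceil(per_node / requests_per_window)
    return windows * window_minutes / 60


if __name__ == "__main__":
    # Hypothetical workload: 7 search queries spread over 3 nodes.
    queries = [f"query-{i}" for i in range(7)]
    for node, share in enumerate(partition_round_robin(queries, 3)):
        print(f"node {node}: {share}")

    # Hypothetical limit: 15 requests per 15-minute window, 10 000 requests.
    for nodes in (1, 4):
        hours = estimated_hours(10_000, 15, 15, nodes)
        print(f"{nodes} node(s): about {hours:.2f} hours")
```

With these made-up numbers, going from one machine to four cuts the estimate from roughly 167 hours to roughly 42, which is the motivation for a small cluster rather than a single collector.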

Page 7: A practical approach to big data in tourism: a low cost Raspberry Pi cluster

Big Data research

• The research approach remains the same; the use of technology makes the process different

• Without a strong background in research methodology, the use of technology alone is useless

Page 8: A practical approach to big data in tourism: a low cost Raspberry Pi cluster

Our work: objective

• Build a system for accessing OSNs Big Data autonomously

– scalable open source technology

– make the collection of OSN data manageable while focused on specific objectives

– functionality demonstration via a simple mapping ‘exercise’

• A solution:

– cluster of Raspberry Pi machines running Linux with Python programs for accessing OSN APIs

Page 9: A practical approach to big data in tourism: a low cost Raspberry Pi cluster

Page 10: A practical approach to big data in tourism: a low cost Raspberry Pi cluster

Page 11: A practical approach to big data in tourism: a low cost Raspberry Pi cluster

A credit-card-sized, low-cost single-board computer developed by the Raspberry Pi Foundation (www.raspberrypi.org)

• Embedded Single Board Computer (SBC)

– small size: 85.60 mm x 53.98 mm (credit-card size)

• Project started in 2006

– objective: build a cheap open source computer for educational purposes (cost = $35)

• Officially available on 29 February 2012 at 6.00 UTC

– original plan: 10 000 pieces

– sold (mid 2014) > 3 million machines

Page 12: A practical approach to big data in tourism: a low cost Raspberry Pi cluster

Technicalities

• SoC (System on Chip) Broadcom BCM2835

– CPU 700 MHz ARM11 ARM1176JZF-S (ARMv6)

– No real time clock

– GPU Broadcom VideoCore IV, OpenGL ES 2.0, OpenVG 1080p30 H.264 high-profile encode/decode

– SDRAM 256/512MB partially shared with GPU

• SDcard as mass memory (4GB; Class 4)

• GPIO & UART 26 pin connector

• HDMI + RCA Video Composite + 3.5mm stereo jack audio

• DSI (Display Serial Interface) & CSI (Camera Serial Interface) connectors

• 5 status LEDs, Powered DC 5V, 1A via Micro USB connector

• Two models

– A: 256MB SDRAM, 1 USB2 port via BCM2835, NO ethernet

– B: 512MB SDRAM, 2 USB2 ports via LAN9512, Ethernet 10/100Mbps

– new models arriving…

• Linux based OSs (Raspbian)

– Python, etc.

• More info on Embedded Linux Wiki (elinux.org/R-Pi_Hub)

Page 13: A practical approach to big data in tourism: a low cost Raspberry Pi cluster

RasPi cluster

Page 14: A practical approach to big data in tourism: a low cost Raspberry Pi cluster

RasPi cluster

Page 15: A practical approach to big data in tourism: a low cost Raspberry Pi cluster

RasPi cluster

Page 16: A practical approach to big data in tourism: a low cost Raspberry Pi cluster

RasPi cluster

Page 17: A practical approach to big data in tourism: a low cost Raspberry Pi cluster

RasPi cluster

Page 18: A practical approach to big data in tourism: a low cost Raspberry Pi cluster

An exercise in geolocation

Page 19: A practical approach to big data in tourism: a low cost Raspberry Pi cluster

Python programs

Page 20: A practical approach to big data in tourism: a low cost Raspberry Pi cluster

Geolocation exercise

• Retrieved all geotagged elements

– Python OSN API libraries

– Area: Lugano 5 km

– Sources: Facebook, Twitter, Instagram, Foursquare

– Time period: 2 weeks

– Heatmap: heatmap.js on OpenStreetMap

– Markers map: Google Maps API

in 2 weeks, collected:
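
Restricting collected items to the study area can be sketched as a simple distance filter. This is not the authors' program: it assumes "Lugano 5 km" means a 5 km radius around the city centre, uses approximate centre coordinates, and expects each item as a dictionary with hypothetical `lat` and `lon` keys.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0
# Approximate coordinates of Lugano city centre (assumption for illustration).
LUGANO = (46.0037, 8.9511)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def within_radius(items, centre=LUGANO, radius_km=5.0):
    """Keep only items whose coordinates lie within radius_km of centre."""
    return [
        item for item in items
        if haversine_km(centre[0], centre[1], item["lat"], item["lon"]) <= radius_km
    ]


if __name__ == "__main__":
    # Made-up sample items; the second one is in Milan, far outside the area.
    sample = [
        {"source": "Twitter", "lat": 46.0050, "lon": 8.9520},
        {"source": "Instagram", "lat": 45.4642, "lon": 9.1900},
    ]
    for item in within_radius(sample):
        print(item["source"], item["lat"], item["lon"])
```

Filtering locally like this is useful even when an API accepts a radius parameter, because different sources may interpret the search area differently.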

Page 21: A practical approach to big data in tourism: a low cost Raspberry Pi cluster

Heatmap
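
A heatmap layer needs point densities rather than raw points. The sketch below shows one common way to prepare them, counting geotagged points per grid cell; it is an assumption about the general technique, not the code used for this slide. The cell size and the `lat`/`lng`/`count` key names are illustrative and would have to be mapped to whatever the chosen heatmap layer expects.

```python
from collections import Counter


def bin_points(points, cell_deg=0.005):
    """Count (lat, lon) points per grid cell of cell_deg degrees.

    Returns one record per non-empty cell, located at the cell centre,
    ordered from the most to the least populated cell.
    """
    counts = Counter()
    for lat, lon in points:
        cell = (int(lat // cell_deg), int(lon // cell_deg))
        counts[cell] += 1
    return [
        {"lat": (i + 0.5) * cell_deg, "lng": (j + 0.5) * cell_deg, "count": n}
        for (i, j), n in counts.most_common()
    ]


if __name__ == "__main__":
    # Made-up coordinates: two points close together, one further away.
    points = [(46.0031, 8.9512), (46.0032, 8.9513), (46.0201, 8.9702)]
    for cell in bin_points(points):
        print(cell)
```

A smaller `cell_deg` gives a finer map but sparser counts, so the cell size is a choice to make against the volume of data collected.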

Page 22: A practical approach to big data in tourism: a low cost Raspberry Pi cluster

Markers map

round solid=Facebook

round light=Twitter

square light=Instagram

square solid=Foursquare

Page 23: A practical approach to big data in tourism: a low cost Raspberry Pi cluster

Concluding remarks

• Big Data

– good opportunity, but a number of drawbacks & issues

– one problem is resources needed (mainly for SMEs)

• Raspberry Pi

– a usable low-cost computing system

• Field test: Raspberry Pi cluster

– low-cost (both acquisition & operations)

– «easy» implementation

• open source libraries for accessing OSNs make the task relatively simple and affordable, also for SMEs

– now fully functional