
Jordi Nin – Hermes: Distributed social network monitoring system - NoSQL matters Barcelona 2014

Jul 31, 2015


NoSQLmatters

Hermes

Distributed social network monitoring system

Daniel Cea and Jordi Nin

Barcelona Supercomputing Center (BSC)

Universitat Politècnica de Catalunya (UPC)

{dcea, nin}@ac.upc.edu


Index

1.  Introduction

2.  Technologies

3.  Implementation

4.  Results

5.  Conclusions


1. Introduction

Problem formulation

Objectives


Problem formulation

Platform to build social relations among people who share interests, activities, backgrounds or real-life connections.

New issues arise: privacy, child safety, addiction.


Problem formulation

•  Rise of social networks -> large amounts of social data.

•  Two main problems: multiple sources + hardware limitations.

•  Solution: implement a distributed, scalable social media analyser that gathers data from multiple sources and shows the aggregated results in real time.


Objectives

Input web interface:

•  Start a new query.

•  Control the data collection.

•  Query history.


Objectives

Backend:

•  Render interfaces.

•  Gather data from external APIs.

•  Enrich and store data in a NoSQL database.


Objectives

Output web interface:

•  See aggregated results.

•  Filter results.

•  Customize how the results are displayed.


2. Current Technologies

Data Access

Data Process

Data Storage


Data Access

Twitter Streaming API (ready to add other sources)


Data Process

JavaScript (client and server side)

•  Platform: Node.js

•  Web framework: Express

•  Sentiment analysis: dictionaries obtained from Amazon Mechanical Turk*

* Amy Beth Warriner, Victor Kuperman, Marc Brysbaert. "Norms of valence, arousal, and dominance for 13,915 English lemmas". December 2013, Ghent University.


Data Storage

Couchbase (storage) + Elasticsearch (indexing)


3. Implementation

Description

Data access layer

Business logic layer

Enrichers


Description

Implementation structured in 3 layers, following a Model View Controller pattern:

•  Data access -> storage and indexing of documents (JSON) and queries.

•  Business logic -> start queries, manage the data stream, process and enrich tweets, send them to storage.

•  User interface -> lets the user control the system.
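
The business-logic flow described above (process and enrich a tweet, then send it to storage) could be sketched roughly as follows. This is only an illustration, not the Hermes code: the function names, the two toy enrichers and the in-memory Map standing in for the storage layer are all assumptions.

```javascript
// Hypothetical sketch: enrich a tweet, then hand it to the data access layer.
// None of these names come from Hermes; the Map stands in for the document store.

// An enricher takes a document and returns a new document with extra fields.
const addLength = (doc) => ({ ...doc, textLength: doc.text.length });
const addHashtags = (doc) => ({ ...doc, hashtags: doc.text.match(/#\w+/g) || [] });

// Apply every enricher in order, feeding each one the previous result.
function enrich(doc, enrichers) {
  return enrichers.reduce((current, enricher) => enricher(current), doc);
}

// Business logic: enrich the incoming tweet and store the result by id.
function handleTweet(tweet, enrichers, store) {
  const enriched = enrich(tweet, enrichers);
  store.set(enriched.id, enriched);
  return enriched;
}

const store = new Map(); // stand-in for the storage + indexing layer
const saved = handleTweet(
  { id: "1", text: "Voting today #9N" },
  [addLength, addHashtags],
  store
);
console.log(saved);
```

Keeping each enricher as a small function that returns a new document makes it easy to add or reorder enrichers without touching the storage code.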


Enrichers

Stream slots implement the following data enrichers:

•  Device enricher: determines the device used to write the message.

•  Geo enricher: filters messages by geo-location.

•  Spain enricher: for messages coming from Spain, determines the autonomous community.
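
As a rough illustration of the device and geo enrichers listed above, the sketch below assumes a tweet object whose `source` field holds an HTML anchor naming the client and whose `coordinates` field is a GeoJSON point or null. The function names, the device categories and the bounding box values are hypothetical and not taken from Hermes.

```javascript
// Device enricher sketch: assumes `source` looks like
// '<a href="...">Twitter for iPhone</a>' and maps the label to a coarse category.
function deviceEnricher(tweet) {
  const label = (tweet.source || "").replace(/<[^>]*>/g, "").trim();
  let device = "other";
  if (/iphone|ipad|ios/i.test(label)) device = "ios";
  else if (/android/i.test(label)) device = "android";
  else if (/web/i.test(label)) device = "web";
  return { ...tweet, device };
}

// Geo filter sketch: assumes `coordinates` is a GeoJSON point
// ({ type: "Point", coordinates: [longitude, latitude] }) or null.
function insideBox(tweet, box) {
  if (!tweet.coordinates) return false;
  const [lon, lat] = tweet.coordinates.coordinates;
  return lon >= box.west && lon <= box.east && lat >= box.south && lat <= box.north;
}

// Rough, hand-picked bounding box around Catalonia (illustrative values only).
const CATALONIA = { west: 0.1, south: 40.5, east: 3.4, north: 42.9 };

const tweet = {
  text: "Hola!",
  source: '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>',
  coordinates: { type: "Point", coordinates: [2.17, 41.39] }, // Barcelona
};
console.log(deviceEnricher(tweet).device, insideBox(tweet, CATALONIA));
```

A bounding box is only a coarse filter; mapping a point to a specific autonomous community, as the Spain enricher does, would need region boundaries that are not shown here.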


Enrichers

•  Stopwords enricher: removes stop words from the text.

•  Stemmer enricher: stems the words that remain after stop-word filtering.

•  Sentiment enricher: determines the sentiment and arousal of the stemmed message.
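
The three text enrichers above form a small chain: stop-word removal, stemming, then a dictionary lookup. The sketch below shows one plausible shape for that chain. It is not the Hermes implementation: the stop-word list is a tiny sample, the stemmer is a naive suffix stripper rather than a real stemming algorithm, and the dictionary scores are made up (only the 1 to 9 scale follows the Warriner et al. norms).

```javascript
// Tiny illustrative stop-word list; a real system would load a full one.
const STOP_WORDS = new Set(["the", "a", "is", "and", "of", "to", "in"]);

// Made-up scores on a 1-9 scale, keyed by the stems that naiveStem() produces.
const NORMS = new Map([
  ["happy", { valence: 8.0, arousal: 6.0 }],
  ["vot", { valence: 5.5, arousal: 4.5 }],
]);

// Lowercase, split on anything that is not an ASCII letter, drop stop words.
function removeStopWords(text) {
  return text
    .toLowerCase()
    .split(/[^a-z]+/)
    .filter((word) => word && !STOP_WORDS.has(word));
}

// Naive suffix stripping; a placeholder, NOT a real stemming algorithm.
function naiveStem(word) {
  return word.replace(/(ing|ed|es|s)$/, "");
}

// Average valence and arousal over the stems found in the dictionary.
function sentiment(stems) {
  const hits = stems.map((stem) => NORMS.get(stem)).filter(Boolean);
  if (hits.length === 0) return null;
  const mean = (key) => hits.reduce((sum, hit) => sum + hit[key], 0) / hits.length;
  return { valence: mean("valence"), arousal: mean("arousal") };
}

const stems = removeStopWords("The voting is happy").map(naiveStem);
console.log(stems, sentiment(stems));
```

Because the lookup happens after stemming, the dictionary keys have to be stemmed with the same function, which is why the sample key is "vot" rather than "vote".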


4. Results

Use case: 9N referendum


Use case: 9N referendum

•  What? -> The 9N unofficial Catalan independence referendum.

•  When? -> From 7 Nov. 2014 to 11 Nov. 2014.

•  Where? -> Catalonia.


Use case: 9N referendum

•  How? -> Storing all tweets that match these filters:

    •  Location: none.

    •  Language: none.

    •  Text: contains "9N".

    •  Time: from Nov 7 at 00:00 to Nov 11 at 23:59.

•  Why? -> Analyse the reactions around the world before, during and after the referendum.
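
The filters above amount to a text match plus a time window. A minimal sketch of such a predicate follows; the field names `text` and `createdAt`, the UTC time zone and the case-sensitive match are assumptions, since the slides do not specify them.

```javascript
// Capture window from the slide; UTC is an assumption (no time zone is given).
// The end bound covers the whole 23:59 minute.
const WINDOW_START = Date.parse("2014-11-07T00:00:00Z");
const WINDOW_END = Date.parse("2014-11-11T23:59:59Z");

// `text` and `createdAt` (an ISO 8601 string) are hypothetical field names.
// No location or language check is applied, matching the "none" filters.
function matchesQuery(message) {
  const time = Date.parse(message.createdAt);
  const inWindow = time >= WINDOW_START && time <= WINDOW_END;
  return inWindow && message.text.includes("9N"); // case-sensitive match
}

const messages = [
  { text: "Queue at the polling station #9N", createdAt: "2014-11-09T10:15:00Z" },
  { text: "Looking forward to #9N", createdAt: "2014-11-05T08:00:00Z" },
  { text: "Nice weather in Barcelona", createdAt: "2014-11-09T12:00:00Z" },
];
console.log(messages.filter(matchesQuery).length);
```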


5. Conclusions

General conclusions

Future work


General conclusions

•  NoSQL technologies are crucial for the project. Couchbase + Elasticsearch + Kibana work perfectly together.

•  Elasticsearch is flexible enough to allow fast development and real-time queries.

•  Kibana lets us create fancy plots with little effort.


Future work

•  More data sources.

•  Better data enrichment.

•  Add user data context.

•  Percolation queries.


Hermes

Thank you for your attention