Visualizing Relationships: Journalistic Problems in a Digital Age

Jan 13, 2015

3Pillar Global

A presentation by Marcos Vanetta, technical lead and web developer at 3Pillar Global, and Mariano Blejman of the Spanish-language newspaper Página/12, given at the 2012 Mozilla Festival in London, England.
Transcript
Page 1: Visualizing Relationships: Journalistic Problems in a Digital Age

Visualizing Relationships: Journalistic Problems in a Digital Age

Page 2: Visualizing Relationships: Journalistic Problems in a Digital Age

Summary

1. Introduction

2. The Problem we are solving

3. Involved issues

4. Problems we found

5. The Challenge

© Copyright 2014. 3Pillar | All rights reserved Strictly Confidential

Page 3: Visualizing Relationships: Journalistic Problems in a Digital Age

WHO ARE WE?

• Mariano Blejman is technology editor and youth editor at the Argentine newspaper Página/12, and a co-founder of Hacks/Hackers Buenos Aires. @blejmanevel

• Marcos Vanetta is a biomedical engineer, a software developer at 3Pillar Global, and a hacker at Hacks/Hackers Buenos Aires. @malev

Page 4: Visualizing Relationships: Journalistic Problems in a Digital Age

HACKS/HACKERS BUENOS AIRES

Page 5: Visualizing Relationships: Journalistic Problems in a Digital Age

THE PROBLEM

• 1976: A dictatorship began in Argentina.

• 30,000 people were kidnapped and disappeared.

• 1985: The first trials took place in Argentina. The bad guys were judged, but the trials had to stop.

• 2003: The courts started judging the bad guys again.

• 2012: There is a large volume of judicial documents.

No one can read all of them.

Page 6: Visualizing Relationships: Journalistic Problems in a Digital Age

INVOLVED ISSUES

• Semantic Analytics

• Ontology

• Data Mining

• Social Network Analysis

• Visualizations

Who was already dealing with documents?

DocumentCloud, Overview, Open Calais, NLTK, GATE

Page 7: Visualizing Relationships: Journalistic Problems in a Digital Age

FIRST APPROACH

Read all the documents

A software solution based on regular expressions, built with Ruby, Padrino, and a MySQL database.

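require 'tmpdir'
require 'docsplit'

# Extracts the plain text of a document with Docsplit (OCR disabled)
# and returns the cleaned result.
def self.extract_plain_text(path)
  # File name without its last extension, e.g. "report.v2.pdf" -> "report.v2"
  basename = File.basename(path).split('.')[0..-2].join('.')
  tmp_dir = Dir.tmpdir
  Docsplit.extract_text(path, :output => tmp_dir, :ocr => false)
  text = File.read(File.join(tmp_dir, "#{basename}.txt"))
  self.clean_text(text)
end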
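
To give a feel for what a regular-expression approach might look like once the plain text is available, here is a minimal sketch. It is not the project's actual code: the patterns, the constant names, and the helpers extract_dates and extract_names are all assumptions made for illustration. It pulls Spanish-style dates and sequences of capitalized words (a rough stand-in for person names) out of a string.

```ruby
# Hypothetical sketch: regex-based extraction of dates and candidate names.
# Patterns and method names are illustrative assumptions, not the project's code.

SPANISH_MONTHS = %w[enero febrero marzo abril mayo junio julio agosto
                    septiembre octubre noviembre diciembre].freeze

# Matches dates written as "24 de marzo de 1976".
DATE_PATTERN = /\b(\d{1,2}) de (#{SPANISH_MONTHS.join('|')}) de (\d{4})\b/i

# Matches two or more consecutive capitalized words, a rough proxy for names.
# The lookarounds keep the match from starting or ending in the middle of a word.
NAME_PATTERN = /(?<!\p{L})\p{Lu}\p{Ll}+(?: \p{Lu}\p{Ll}+)+(?!\p{L})/

# Returns every date found in the text as an ISO 8601 string.
def extract_dates(text)
  text.scan(DATE_PATTERN).map do |day, month, year|
    month_number = SPANISH_MONTHS.index(month.downcase) + 1
    format('%04d-%02d-%02d', year.to_i, month_number, day.to_i)
  end
end

# Returns the distinct candidate names found in the text, in order of appearance.
def extract_names(text)
  text.scan(NAME_PATTERN).uniq
end

sample = 'El 24 de marzo de 1976 Juan Perez fue visto con Ana Lopez.'
p extract_dates(sample)
p extract_names(sample)
```

A pattern like this is easy to write but brittle: it misses single-word names, names with particles such as "de la", and any date written in another format, which is consistent with the entity-extraction and date-parsing problems listed later in the talk.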

Page 8: Visualizing Relationships: Journalistic Problems in a Digital Age

THE PROBLEMS WE FOUND

• Converting PDF files to text

• Extracting entities from documents

• Parsing dates and addresses

• Resolving co-references between names

• How to store relations

• Contextual information about the documents

• Confidence in the data on a crowdsourcing platform

Visualizing relationships over time
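
One way to think about storing relations so they can later be visualized over time is to attach a date range and a source document to each relation. The sketch below is an assumption made for illustration, not the data model used by the project: Relation, RelationStore, and all field names and sample values are hypothetical placeholders.

```ruby
require 'date'

# Hypothetical record: two entities linked by a kind of relation during a period,
# together with the document the relation was extracted from.
Relation = Struct.new(:source, :target, :kind, :from, :to, :document) do
  # True when the relation was in effect on the given date.
  # A nil end date means the relation is open-ended.
  def active_on?(date)
    from <= date && (to.nil? || date <= to)
  end
end

# Minimal in-memory store; a real system would persist this in a database.
class RelationStore
  def initialize
    @relations = []
  end

  def add(relation)
    @relations << relation
    self
  end

  # All relations that involve the given entity on the given date.
  def involving(entity, on:)
    @relations.select do |r|
      [r.source, r.target].include?(entity) && r.active_on?(on)
    end
  end
end

# Placeholder data, invented purely to exercise the sketch.
store = RelationStore.new
store.add(Relation.new('Person A', 'Place X', :detained_at,
                       Date.new(1976, 5, 1), Date.new(1976, 9, 30), 'doc-001.pdf'))
store.add(Relation.new('Person A', 'Person B', :seen_with,
                       Date.new(1977, 1, 10), nil, 'doc-002.pdf'))

store.involving('Person A', on: Date.new(1976, 6, 15)).each do |r|
  puts "#{r.source} #{r.kind} #{r.target} (#{r.document})"
end
```

Keeping the source document on every relation also speaks to the confidence problem: a claim can always be traced back to the text it came from.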

Page 9: Visualizing Relationships: Journalistic Problems in a Digital Age

WHAT DO WE HAVE NOW?

Prototype for a single (and local) use case: mapa76

Platform for different use cases: analice.me

Page 10: Visualizing Relationships: Journalistic Problems in a Digital Age

THE VISUALIZATIONS THAT WE IMAGINED

Page 11: Visualizing Relationships: Journalistic Problems in a Digital Age

Page 12: Visualizing Relationships: Journalistic Problems in a Digital Age

THE VISUALIZATIONS THAT WE FOUND

Page 13: Visualizing Relationships: Journalistic Problems in a Digital Age

Page 14: Visualizing Relationships: Journalistic Problems in a Digital Age

THE #MOZFEST CHALLENGE

Find a big journalistic issue that involves:

• Lots of documents with unstructured data

• Lots of data to find inside them

• Relationships that you want to find
