Page 1: Visual Analysis and Historical Discovery

VISUAL ANALYSIS AND HISTORICAL DISCOVERY

Summer School on Big Data Information Visualisation

Chandan Kumar (University of Oldenburg) Julia Juergens (University of Hildesheim) Percy Perez (University of St. Andrews) Victoria Hore (University of Oxford)

BRIGHTSOLID: NEWSPAPER DATASET

Page 2: Visual Analysis and Historical Discovery

Data description

• Newspapers
  • Fife Herald 1833-1878
  • The Dundee Courier & Argus 1890-1899
• Data set
  • 154 GB of XML files
  • 16 048 issues (1 METS file per issue)
  • 77 954 pages (1 ALTO file per page)
  • no images
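As an illustration of the "1 ALTO file for 1 page" layout, here is a minimal sketch of pulling OCR text out of an ALTO page. It relies on the ALTO convention that each recognised word is a `String` element with a `CONTENT` attribute, grouped under `TextLine` elements. The sample XML is hand-written for this example and is not taken from the dataset; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Tiny hand-written ALTO-like page, for illustration only (not from the dataset).
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout>
    <Page>
      <PrintSpace>
        <TextBlock>
          <TextLine>
            <String CONTENT="DUNDEE"/>
            <SP/>
            <String CONTENT="JUTE"/>
          </TextLine>
          <TextLine>
            <String CONTENT="FLEET"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def alto_lines(xml_text):
    """Return the OCR text of one ALTO page as a list of lines."""
    root = ET.fromstring(xml_text)
    lines = []
    for element in root.iter():
        if local_name(element.tag) == "TextLine":
            # Each word is a String child; SP (space) elements are skipped.
            words = [
                child.get("CONTENT", "")
                for child in element
                if local_name(child.tag) == "String"
            ]
            lines.append(" ".join(words))
    return lines


print(alto_lines(SAMPLE_ALTO))
```

Matching on the local tag name keeps the sketch independent of which ALTO schema version a given file declares.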

Page 3: Visual Analysis and Historical Discovery

Data files

Title MET

- OCR errors
- No meaning

ALTO

Page 4: Visual Analysis and Historical Discovery

Methodology

Page 5: Visual Analysis and Historical Discovery

Architectural overview

Page 6: Visual Analysis and Historical Discovery

Data processing

• 20 years of data analysed
  • 12 years have complete titles
  • 8 years do not have complete titles
  • 6189 files analysed
  • 314 meta files per year (avg.)
• 12 years => 3754 issues
• Word counting, formatting files to/from XML, D3 and Jigsaw
• Hadoop processing was impressive
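The word-counting step can be sketched in the map/reduce style that Hadoop uses. This is a plain-Python illustration of the idea, not the job the team actually ran; the function names, the tokenisation rule (lowercase ASCII letters only) and the sample lines are assumptions made for the example.

```python
import re
from collections import defaultdict
from itertools import chain


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in one line of text."""
    for word in re.findall(r"[a-z]+", line.lower()):
        yield word, 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted counts for each word."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


def word_count(lines):
    """Run the map step over every line, then reduce all emitted pairs."""
    return reduce_counts(chain.from_iterable(map_words(line) for line in lines))


# Made-up sample lines, for illustration only.
sample_lines = ["Jute from Calcutta", "The jute fleet"]
print(word_count(sample_lines))
```

On a cluster the map and reduce steps would run on different machines over different files; here they run in one process so the logic is easy to follow.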

Page 7: Visual Analysis and Historical Discovery

Idea generation

• What happened in the 19th century?
  • Find interesting stories
• Where were events happening?
  • Overview of mentioned locations
• What were the most common topics?
  • Overview of frequent words
  • Categorization of words
• Who was mentioned?
  • Entity recognition of names
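One simple way to get an overview of mentioned locations is to match tokens against a list of known place names (a gazetteer). The sketch below assumes that approach; the slides do not say how locations were actually extracted. The gazetteer, the function name and the sample sentences are all hypothetical, and only single-word place names are handled.

```python
import re
from collections import Counter

# Hypothetical, hand-picked gazetteer; a real one would be much larger.
GAZETTEER = {"dundee", "calcutta", "fife", "edinburgh"}


def count_locations(texts, gazetteer=GAZETTEER):
    """Count how often each gazetteer entry appears across the given texts."""
    counts = Counter()
    for text in texts:
        for token in re.findall(r"[a-z]+", text.lower()):
            if token in gazetteer:
                counts[token] += 1
    return counts


# Made-up sample sentences, for illustration only.
sample = ["Jute arrived in Dundee from Calcutta.", "Dundee mills were busy."]
print(count_locations(sample).most_common())
```

Noisy OCR text would need fuzzy matching on top of this, since a misread letter is enough to miss an exact lookup.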

Page 8: Visual Analysis and Historical Discovery

Visualization (overview)

Page 9: Visual Analysis and Historical Discovery

Visualization (overview)

Page 10: Visual Analysis and Historical Discovery

Visual Exploration with Jigsaw

• Jigsaw already has good functions and visualizations!

Page 11: Visual Analysis and Historical Discovery

Visualisations (Beyond Jigsaw)

• More numerical analysis
• User-selected dimensions and exploration
• Dynamic visualization
  • topics, locations, entities
• Pattern analysis

Page 12: Visual Analysis and Historical Discovery

Interactive visualisation

Page 13: Visual Analysis and Historical Discovery

Dynamic exploration

Page 14: Visual Analysis and Historical Discovery

Insights

• Industrial revolution in Dundee
  • Frequency analysis, cluster overview, positive sentiments
• LATEST MOVEMENTS OF DUNDEE JUTE FLEET
  • Entity relations, bigram analysis
• Calcutta, Indian subcontinent?
  • Location-commercial significance
• Baxter Brothers was the world's largest linen manufacturer (1840-1890)
  • Family names-organization
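The bigram analysis mentioned above can be sketched as counting adjacent word pairs. This is a minimal illustration, not the team's pipeline: the function name and tokenisation are assumptions, the first headline is the one quoted on the slide, and the second is made up for the example.

```python
import re
from collections import Counter


def bigram_counts(texts):
    """Count adjacent word pairs (bigrams) within each text."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())
        # zip pairs every token with its successor in the same text.
        counts.update(zip(tokens, tokens[1:]))
    return counts


headlines = [
    "LATEST MOVEMENTS OF DUNDEE JUTE FLEET",  # quoted on the slide
    "Dundee jute trade with Calcutta",        # made-up example
]
print(bigram_counts(headlines).most_common(1))
```

Pairs are counted per text, so the last word of one headline is never paired with the first word of the next.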

Page 15: Visual Analysis and Historical Discovery

Conclusions

• A really steep learning curve
• Big data is BIG
• Distributed computing is important
• Data wants to tell interesting stories (we just need to interact)
• Visualisation is powerful
• Jigsaw is awesome
• Lots of useful visualisation tools are ready to be used
• Generalizations and Interactions (future work)

Page 16: Visual Analysis and Historical Discovery

THANK YOU FOR THE COOL (SCHOOL) EXPERIENCE

Big thanks to BRIGHTSOLID for providing the interesting dataset

Chandan Kumar Julia Juergens Percy Perez Victoria Hore