Professor Andrew Prescott, Theme Leader Fellow AHRC Digital Transformations Strategic Theme Big Data: Some Initial Reflections
Big Data: Some Initial Reflections

Nov 30, 2014

Andrew Prescott

Slides from presentation to AHRC internal staff seminar, April 2014
Transcript
Page 1: Big Data: Some Initial Reflections

Professor Andrew Prescott, Theme Leader Fellow

AHRC Digital Transformations Strategic Theme

Big Data: Some Initial Reflections

Page 2: Big Data: Some Initial Reflections

• The Met Office currently generates about 20TB of data each day

• ‘The problems which confront the meteorologist today will be faced by the humanities scholar within ten years’

Page 3: Big Data: Some Initial Reflections

• Large Hadron Collider: 600 million ‘collision events’ per second

• One million jobs run by servers each day, with over 10 GB of data per second transferred at peak times

• Approx. 20 petabytes of data produced annually

• Over 70 universities involved in processing the data
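
To put the Met Office and Large Hadron Collider figures on a common scale, a rough back-of-envelope calculation may help. The sketch below is illustrative only: it assumes decimal SI units (1 TB = 10^12 bytes) and a 365-day year, neither of which is stated in the slides, and the function names are hypothetical.

```python
# Back-of-envelope comparison of the data volumes quoted on the slides.
# Assumptions (not from the slides): decimal SI units and a 365-day year.
GB = 10**9
TB = 10**12
PB = 10**15
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def met_office_pb_per_year(tb_per_day=20):
    """Convert the quoted daily Met Office volume into petabytes per year."""
    return tb_per_day * TB * 365 / PB


def lhc_average_gb_per_second(pb_per_year=20):
    """Average transfer rate implied by the quoted annual LHC volume."""
    return pb_per_year * PB / SECONDS_PER_YEAR / GB


if __name__ == "__main__":
    print(f"Met Office: about {met_office_pb_per_year():.1f} PB per year")
    print(f"LHC: about {lhc_average_gb_per_second():.2f} GB per second on average")
```

Under these assumptions, the Met Office's 20 TB per day comes to roughly 7 PB per year, the same order of magnitude as the LHC's annual output, while the LHC's quoted peak of 10 GB per second is well above the average rate implied by its annual total.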

Page 6: Big Data: Some Initial Reflections

Some working definitions of big data

• Big data exceeds the capacity of existing desktop machines and networks: you need help to deal with it

• Data that is so large that existing methods of analysis simply don’t work: you have to change your methodology (probably to something quantitative)

• Gartner definition: “Big data” is high-volume, -velocity and -variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.

Page 7: Big Data: Some Initial Reflections

Examples of everyday big data of research value

• Retail data generated by supermarkets

• Online retail data: Amazon

• Transport information: Oyster card

• Hospital data

• Data from utility companies

• Social media

Page 8: Big Data: Some Initial Reflections

Visualisation of languages used in tweets in London in Summer 2012: Centre for Advanced Spatial Analysis, UCL:

http://mappinglondon.co.uk/2012/londons-twitter-tongues/

Page 9: Big Data: Some Initial Reflections

Wolfram Alpha analytics of my Facebook friends

Page 10: Big Data: Some Initial Reflections

Analytics of my friend network

Page 11: Big Data: Some Initial Reflections

Does Big Data Yet Exist for the Humanities?

Page 12: Big Data: Some Initial Reflections

Letter of Gladstone to Disraeli, 1878: British Library, Add. MS. 44457, f. 166

The political and literary papers of Gladstone preserved in the British Library comprise 762 volumes containing approx. 160,000 documents.

Page 13: Big Data: Some Initial Reflections

George W. Bush Presidential Library: 200 million e-mails

4 million photographs

Page 15: Big Data: Some Initial Reflections

‘Big data’ has already been an issue for linguists for many years

Page 16: Big Data: Some Initial Reflections

Another familiar example of big data in the humanities: censuses

Page 17: Big Data: Some Initial Reflections

Moving images and sound present some of the most challenging big data issues for the arts and humanities

Page 18: Big Data: Some Initial Reflections

Archives and library catalogues as big data: Visible Archive browser: visiblearchive.blogspot.com

Page 19: Big Data: Some Initial Reflections

Visualisation by Jon Orwant of Google of Library of Congress subject categorisations of books published between 1600 and 2010: winedarksea.org

Page 20: Big Data: Some Initial Reflections

Commons Explorer: experimental interface to allow exploration of large quantities of images in Flickr Commons: http://mtchl.net/cex/

Page 21: Big Data: Some Initial Reflections

The Anglo-American Legal Tradition: web site holding seven million images of medieval legal records in the National Archives: www.aalt.law.uh.edu

Page 22: Big Data: Some Initial Reflections

Fabio Lattanzi Antinori, The Obelisk (2012): Open Data Institute: http://www.theodi.org/culture/obelisk-2012

Page 23: Big Data: Some Initial Reflections
Page 24: Big Data: Some Initial Reflections

Asia Trend Map: predicting popularity of games, manga and anime: www.asiatrendmap.jp

Page 25: Big Data: Some Initial Reflections

Some Big Data Issues

• Research has historically been hypothesis-driven; is a more data-driven approach to research required?

• How valid are predictive and probabilistic techniques in arts and humanities research?

• Data quality issues: do we lose a sense of the context and stratigraphy of the data?

• Danger of thinking that data = truth

Page 26: Big Data: Some Initial Reflections

Digital Transformations theme and Big Data

• Theme seeks to promote new research methods: using digital tools and materials to develop completely new types of scholarship

• Additional funding of £4m has been allocated to work on big data

• Following this workshop, a call for big data projects will be issued

• Smaller projects (up to £100k)

• Larger projects (up to £600k)