Supporting Big Data, Open Data, Data Analytics and Data Science
Dr Simon Price, Research IT Manager
• Bristol is a research-intensive university
• 6 Faculties: Social Science & Law, Science, Engineering, Arts and two Medical Faculties
• Employs 2000+ researchers (excluding PhDs)
• Each year (approximately):
  • 1500 research funding applications
  • £100M research income
  • 4500 research outputs
Outline
1. Big Data
2. Open Data
3. Data Analytics
4. Data Science
5. Implications for IT support
Big Data
• Lots and lots of technology buzzwords!
• Some important ones:
  • MapReduce
  • The Hadoop stack
  • Distributed file systems
  • Query languages & programming languages
  • NoSQL databases (column, document, graph, ...)
MapReduce in a nutshell
Image source: https://developers.google.com/appengine/docs/python/dataprocessing/
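Since the diagram is not reproduced here, the idea can be sketched in a few lines of code. The following is a minimal single-machine illustration of the map, shuffle and reduce phases, using word counting as the task. It is not how Hadoop or App Engine implement it; the function names and the sample documents are assumptions made for illustration only.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # In a real cluster each document (or block) is mapped on a different node;
    # here the phases simply run one after another in a single process.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input documents, for illustration only.
counts = word_count(["big data open data", "data science"])
print(counts)  # {'big': 1, 'data': 3, 'open': 1, 'science': 1}
```

Because each map call and each reduce call is independent of the others, a framework can spread them across many machines; only the shuffle step needs to move data between nodes.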
Big Data
• Trends in Hadoop stack
  • Near realtime analytics
  • Streaming analytics
  • In-memory
• Trends in NoSQL
  • Relational and NoSQL moving closer together
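To make the contrast with batch processing concrete, here is a small sketch of a streaming, in-memory calculation: a sliding-window mean that is updated as each value arrives, rather than recomputed over a stored dataset. The class name, window size and sample stream are assumptions for illustration and are not taken from any particular streaming framework.

```python
from collections import deque


class SlidingWindowMean:
    """Keep a running mean over the most recent `size` values of a stream."""

    def __init__(self, size):
        if size <= 0:
            raise ValueError("size must be positive")
        self.window = deque(maxlen=size)  # oldest value is dropped automatically
        self.total = 0.0

    def update(self, value):
        # If the window is full, the oldest value is about to be evicted,
        # so remove it from the running total first.
        if len(self.window) == self.window.maxlen:
            self.total -= self.window[0]
        self.window.append(value)
        self.total += value
        return self.total / len(self.window)


# Hypothetical stream of readings, for illustration only.
monitor = SlidingWindowMean(size=3)
means = [monitor.update(x) for x in [10, 20, 30, 40]]
print(means)  # [10.0, 15.0, 20.0, 30.0]
```

Each update costs constant time and the state held in memory is bounded by the window size, which is what makes near realtime results feasible on an unbounded stream.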
Open Data
Open Data - data.bris
• Each PI allocated 5TB "forever"
• Research Data Management
• Open Data Publication
Open Data - public data
Bristol is Open - datasets
• 140+ datasets live on opendata.bristol.gov.uk
• Some real time data
• Transport API repository now available
• Examples:
  • Government: Elections since 2007
  • Community: Quality of Life survey
  • Education: School Results
  • Energy: Installed PV, Energy Use in Council Buildings
  • Environment: Real time & Historic Air Quality, Flood Alerts (EA)
  • Land use: 2013 Planning applications
  • Health: Life expectancy / Mortality, Obesity, NHS Spend
Data Analytics
• Operational focus
  • variables are "known knowns and known unknowns"
• Descriptive
  • summarisation of known variables and alerting
• Predictive
  • correlations between known variables
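The two flavours above can be illustrated with a short sketch: a descriptive summary with threshold alerting on one known variable, and a Pearson correlation between two known variables as the starting point for prediction. The function names, the threshold and the sample figures are assumptions made purely for illustration.

```python
from math import sqrt
from statistics import mean


def describe(values, alert_above):
    """Descriptive: summarise a known variable and flag values over a threshold."""
    return {
        "mean": mean(values),
        "min": min(values),
        "max": max(values),
        "alerts": [v for v in values if v > alert_above],
    }


def pearson(xs, ys):
    """Predictive building block: Pearson correlation between two known variables."""
    mean_x, mean_y = mean(xs), mean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Made-up readings, for illustration only.
hours = [1, 2, 3, 4]
load = [2, 4, 6, 8]

summary = describe(load, alert_above=5)
print(summary["alerts"])  # [6, 8]
print(round(pearson(hours, load), 3))
```

A correlation close to 1 or -1 suggests one variable may be useful for predicting the other, but on its own it says nothing about causation.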
Data Science
• Multidisciplinary data-intensive research
• Focus on research insights, causation and prediction
• Usually involves Machine Learning and Statistics
• Different perspectives:
  • Computer Scientists view DS as a research domain
  • Statisticians view DS as a research domain
  • Other academics view DS as a service
3 May 2023
Implications for IT support
• Governance
  • Shift from IT-owned to academic-owned (Shadow IT)
• Skills
  • IT experts need to train and trust academics
  • Nurture internal skills pipeline (interns, postgrads)
• Systems
  • Mixed economy of internal and external