Big Challenges for Big Data Security
Prof. Oliver Popov
PhD Student: Irvin Homem
PhD Student: Spyridon Dossis
• Two major research areas
– Natural Sciences
– Humanities and Social Sciences
• >60000 students
• 2000 postgraduate students
• 5000 faculty and employees
• 4 Nobel Prize winners
• Participation in international networks
– EUA, IMHE, NCI, UNICA etc.
2013-10-10 Department of Computer and System Sciences/SU
• Established in 1965: the first and largest CS
department in Sweden
• Part of the Faculty of Social Sciences
• Located in Kista Science City, third largest ICT
cluster in the world
• 5400 students annually
• 300 employees
• Major Profile Areas
– e-Government, TEL, ICT4D, RATS
Agenda
• Everything BIG
– Data
– Opportunities
– Challenges
– Security Problems
What makes Big Data…BIG?
• NIST: “The set of technical capabilities and management
processes for converting vast, fast and varied data into
useful knowledge”. (one of the myriad definitions)
• The point at which traditional data management tools,
analysis techniques and practices no longer apply.
“We are drowning in
information, while
starving for wisdom”
E.O. Wilson (Harvard
University)
Characteristics of Big Data
• Large scale analytics
• Distributed redundant data storage
• Parallel task processing
• Fast data insertion
• Central management and orchestration
• Hardware agnostic
• Accessible
• Extensible
• Cost effective
The Three V’s of Big Data
• Volume
– Humankind had produced 5 exabytes of data up to 2003
– Today, it produces 5 exabytes every 10 min
– 90% of all data was produced in the last 2 years
• Variety
– Structured (data warehouses)
– Semi-structured (XML, graph, pdf)
– Unstructured (natural text, video)
• Velocity
– Batch processing
– Stream processing
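The difference between the two velocity modes can be illustrated with a minimal sketch. The function names and the sample readings are hypothetical, chosen only for illustration: the batch version needs the complete data set before it starts, while the stream version updates its result as each value arrives.

```python
from typing import Iterable, Iterator


def batch_mean(values: list[float]) -> float:
    """Batch: the whole data set is available before processing starts."""
    return sum(values) / len(values)


def stream_mean(values: Iterable[float]) -> Iterator[float]:
    """Stream: update the mean incrementally as each value arrives."""
    count = 0
    mean = 0.0
    for x in values:
        count += 1
        # Incremental update: no need to keep earlier values in memory.
        mean += (x - mean) / count
        yield mean


# Hypothetical sensor readings used only for illustration.
readings = [4.0, 8.0, 6.0, 2.0]
running = list(stream_mean(readings))
```

The last value yielded by the stream version equals the batch result, but the stream version holds only a counter and the current mean in memory.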
Current Limitations in Large Scale Analytics
• Computing and storage: Insufficient
CPU power and disk I/O random access
speeds.
• Data traffic jam: Difficulties in mass
network transfers.
• Scarcity and potential: Available
resources are not enough for current
power and space needs
“Smarter, not faster is the future of
computing research” E. Lazowska
(University of Washington)
Sources of Big Data
• Scientific experimental measurements
– SETI, LHC, SKTA, Genome projects
• Computer simulations
• E-commerce and search engines
• Social media
– Facebook, Twitter, Google
• Internet of Services / Things
Big Data Impact Areas
• Natural Sciences
• Biological and Medical Research
• Telecommunications and Networking
• Social Network Analysis
• National (Cyber-)Intelligence, and counting…
Big Data Opportunities
• Completeness
• Personalization
• Real-time
• Data Relationships
• Exploration
Big Data Building Blocks
• Hardware parallelism
• Scalable and elastic computing and storage
infrastructures (such as cloud computing,
cluster-based systems)
• Parallel programming frameworks (such as
MapReduce, workflows)
• Service-oriented architectures
• Distributed database systems
• Federated security mechanisms
• Models for information representation
(ontologies) and data mining
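As an illustration of the MapReduce style of parallel programming named above, here is a minimal single-machine sketch of a word count. It is not the API of Hadoop or any other framework: the function names are hypothetical, and a thread pool stands in for the cluster nodes that would run the map tasks.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(document: str) -> list[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(mapped: list[list[tuple[str, int]]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for pairs in mapped:
        for word, count in pairs:
            groups[word].append(count)
    return groups


def reduce_phase(word: str, counts: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return word, sum(counts)


def word_count(documents: list[str]) -> dict[str, int]:
    # Map tasks are independent, so they can run in parallel.
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_phase, documents))
    groups = shuffle(mapped)
    return dict(reduce_phase(w, c) for w, c in groups.items())


result = word_count(["big data", "big security"])
```

Because each map task touches only its own document and each reduce task only its own key, both phases can be spread over many nodes without coordination beyond the shuffle.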
Harnessing Big Data
• (Semi)automated data-driven decision making
• Better planning and forecasting
• Risk quantification (thus avoiding elusiveness)
• Consolidation of government data
• Adaptive e-services
Big Challenges for BIG DATA
• Highly distributed sources
• Authenticity & provenance
• Velocity and heterogeneity
• Systems diversity
• Parallel, distributed scalable algorithms
• Security and integrity
• Sharing and integration
• Massive visualization
• Governance & curation
• Privacy, retention & compliance
Big Data Concerns
• Legality
– Collection, disclosure, consolidation/correlation
– Data ownership, control and rights
• Data quality
– Accuracy, relevance, timeliness
• Disparate data meanings
– Semantic coherence
• Overconfidence in data and models
– Consistent justification, analytical integrity
• Privacy
– Generalization in science vs. Particularization in
business
Mind the Big Data – InfoSec Gap
• Lack of a clear definition of Big Data, and limited
maturity and awareness of related products
• Knowledge gap among security practitioners
on the value Big Data can provide
• Slow adoption by organizations, even though Big
Data is regarded as a strategic priority for IT
• Need for speed and prioritization of high risk,
low-frequency security events
Adding Value to InfoSec
Source: Gartner (March 2012)
Big Data Security Myths
• Big data security is no different from traditional
security
– Storage, query and processing models are different
• Big data is not used in production systems
– The same was said of the Internet in the late 1990s
• Existing security tools work with big data
– Security products may affect deployment, scalability and
communication protocols, limiting big data capabilities
• No sensitive data is stored in big data clusters
– Correlation may result in sensitive data
• Security is only needed on the back-end
– Big data systems have links to every tier, not only the back-end
• Security must be built in rather than added as an
afterthought.
• Plugging commonplace security mechanisms
into big data applications is usually non-trivial,
sometimes intractable, and hence not sufficient.
• Separation of concerns / duties for
administration and management
• Need for strong federated identification /
authentication solutions
• Distributed nodes create a complicated
environment that lacks the traditional security
“choke-point”, since one would impede scalability
• Data “sharding” cancels the traditional data
security model
• Granular Authentication, Authorization &
Accountability on inter-node communication
• Current tools have not been thoroughly security
reviewed (e.g. OWASP)
• Security on metadata and transaction logs
• Data mining & Analytics ignore privacy
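To see why sharding breaks a perimeter-style data security model, consider a minimal sketch of hash-based sharding. The scheme, function names and sample records are assumptions for illustration; real systems use their own partitioning strategies. The point is that records belonging to one logical data set end up on several nodes, so access control and auditing have to be enforced on every node rather than at one gateway.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a record key to a shard using a stable hash (illustrative scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def distribute(records: dict[str, str], num_shards: int) -> dict[int, list[tuple[str, str]]]:
    """Spread records over shards; each shard would live on a different node."""
    shards: dict[int, list[tuple[str, str]]] = {i: [] for i in range(num_shards)}
    for key, value in records.items():
        shards[shard_for(key, num_shards)].append((key, value))
    return shards


# Hypothetical records used only for illustration.
records = {f"user-{i}": f"profile-{i}" for i in range(100)}
shards = distribute(records, num_shards=4)
```

Every shard holds only a fragment of the data set, yet any fragment may contain sensitive records, so each node needs its own authentication, authorization and logging.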
• File layer encryption and key management
• Automated configuration and patch management
• Monitoring & filtering (Distributed & Real time)
– Avoid introducing a single point of failure
• Audit and logging (meta-Big-Data)
• Harden the infrastructure
– Node authentication (e.g. Kerberos)
– Traffic encryption (e.g. TLS)
– Protect the management plane
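For the traffic encryption point, a minimal sketch of a server-side TLS configuration for inter-node links, using Python's standard `ssl` module, might look as follows. The function name and the certificate, key and CA parameters are hypothetical; requiring client certificates (mutual TLS) is one possible way to authenticate nodes to each other and is an assumption here, not something the slides prescribe.

```python
import ssl
from typing import Optional


def make_internode_server_context(
    certfile: Optional[str] = None,
    keyfile: Optional[str] = None,
    cafile: Optional[str] = None,
) -> ssl.SSLContext:
    """Build a TLS context for a node that accepts connections from peers."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Refuse legacy protocol versions.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    # Require the connecting node to present a certificate (mutual TLS).
    context.verify_mode = ssl.CERT_REQUIRED
    if certfile:
        # This node's own certificate and private key (hypothetical paths).
        context.load_cert_chain(certfile, keyfile)
    if cafile:
        # CA that signs the certificates of trusted cluster nodes.
        context.load_verify_locations(cafile=cafile)
    return context


# Without file paths this only builds the policy; a real deployment
# would pass the cluster's certificate, key and CA files.
context = make_internode_server_context()
```

The resulting context can be handed to `context.wrap_socket(sock, server_side=True)` once real certificate files are supplied.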
• Performance gains over traditional SIEM tools for
log/network event aggregation, correlation and
search
• Network flow and packet analysis for anomaly
detection (e.g. botnets, e-crime syndicates)
• Behavior profiling for detecting “low-noise”
Advanced Persistent Threats
• Community-based reputation scoring and
malware detection
• Identity and access intelligence
• Threat-intelligence networks
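The anomaly detection and behavior profiling ideas above can be sketched in their simplest form as a z-score test against a baseline. This is an illustrative stand-in, not the method of any particular product: the function name, threshold and flow counts are all hypothetical, and real systems use far richer features and models.

```python
from statistics import mean, pstdev


def flag_anomalies(
    baseline: list[float],
    observed: dict[str, float],
    threshold: float = 3.0,
) -> list[str]:
    """Return hosts whose value deviates strongly from the baseline profile."""
    mu = mean(baseline)
    sigma = pstdev(baseline)
    if sigma == 0:
        # Degenerate baseline: anything different from it is unusual.
        return [host for host, value in observed.items() if value != mu]
    return [
        host
        for host, value in observed.items()
        if abs(value - mu) / sigma > threshold
    ]


# Hypothetical flows-per-minute figures used only for illustration.
baseline_flows = [100, 110, 90, 105, 95]
current_flows = {"host-a": 102, "host-b": 480, "host-c": 97}
suspects = flag_anomalies(baseline_flows, current_flows)
```

With these sample figures only `host-b` exceeds the threshold. Detecting “low-noise” threats needs longer baselines and per-entity profiles, which is where large-scale storage and processing become useful.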
• Need for transparency in data collection, tools
and techniques
• Abuse of the “wealth of data” to influence,
manipulate and restrict the “Quantified Self”
• Power balance between data producers and
inference/decision makers
• Development of “Big Data Ethics”
– Data-driven but not data-ruled
Systems Analysis & Security (SAS) Unit
Key activities
• decision and risk analysis
• big data, innovation and eGov services
• data mining
• simulation of complex systems
• security, privacy and trust
• digital and cyber forensics
Projects:
• ICT NG (Formas), EnRiMa (EU), STORK 2(EU), IRIS (Vinnova), DEDAL (VR),
iMENTORS (EU), Multimodal communication, SSL (SU), e-SENS (EU),
SENS4US (EU) and DFET (EU)
• eGov lab including the Cyber Systems Security (CS2) lab with platform for
simulation of security, privacy, events and forensics analysis
Cooperation
• SE government (local and national), EU, Sida, UAS, UCL, IIASA, leading
universities in Europe, North America and Asia
Systems Analysis & Security (SAS) Unit
• STORK 2.0
– Federated, cross-border authentication and authorization
• e-SENS
– Secure infrastructure for interoperable public services in Europe
• DFET
– Cloud-based cybercrime training environment to include real life
simulation and scenario analysis
• Scalable and Automated Aggregation of Forensic
Evidence from the Internet of Things
• Semantic Integration and Analysis of Digital Security &
Forensic Data
Open for collaboration and cooperation
• Analyzing the Digital Past to Improve the
Digital Future (such as e-Discovery for Business
Intelligence)
• Digital services that preserve the security, privacy
and integrity of access and usage, while remaining
aware of accountability and responsibility, the
impact of surveillance and the issue of data
retention; in short, building digital trust and
trustworthiness
Thank you
International networks
Contact: Oliver Popov ([email protected])