August 2013 Institute for Big Data Analytics – Dalhousie University Big Data Analytics and Advanced Computer Networking Scenarios: Research Challenges and Opportunities Stenio Fernandes CIn/UFPE, Recife, Brazil

Big Data Analytics and Advanced Computer Networking Scenarios

Jan 15, 2015

  • 1. August 2013 Institute for Big Data Analytics Dalhousie University Big Data Analytics and Advanced Computer Networking Scenarios: Research Challenges and Opportunities Stenio Fernandes CIn/UFPE, Recife, Brazil

2. Agenda. A bit of technical background: Measurements and Analysis in Computer Networks. Advanced Networking Architectures: Software-Defined Networking (SDN), Information-Centric Networking (ICN), Network Virtualization (NV). Tools and Techniques for High-Performance Network Traffic Analysis: Visual Analytics, GPU, MapReduce. Applied Research on Computer Networking: Opportunities and Directions. Research agenda: CIn/UFPE and Dalhousie University.
3. TECHNICAL BACKGROUND: Measurements and Analysis in Computer Networks
4. Essential (Core) motivation. Profiling Internet traffic is an essential task for precise network management, at both access and backbone networks. It provides useful information for: proper (re)configuration of networks; deployment of accurate policies (security, routing, throttling, capping, etc.); optimization of network resources; support for network design and planning; counteracting abnormal behavior.
5. Why do operators need Internet profiling? Network-wide: Reporting (generating basic information about usage and reliability); Performance/reliability troubleshooting (detecting and diagnosing anomalous events); Security (detecting, diagnosing, and blocking security problems); Traffic engineering (adjusting network configuration to the prevailing traffic); Capacity planning (deciding where and when to install new equipment).
6. Reporting. Examples: total volume of traffic sent to/from each private peer; mixture of traffic by application (e.g., Web, streaming, P2P, spam); mixture of traffic to/from individual customers; usage, loss, and reliability trends for each link. Requirements: network-wide view of basic traffic statistics; ability to have different views (by application, by customer, by peer, by link type); real-time and offline monitoring of high-speed links.
7. Core Network Troubleshooting. Detecting and diagnosing problems; recognizing and explaining anomalous events. Why is a backbone link suddenly overloaded? Why are DNS queries failing with high probability?
Why does a router processor have high CPU utilization? Why can a customer not reach certain networks?
8. Core Security. Detecting and diagnosing problems; recognizing suspicious traffic or disruptions. Examples: denial-of-service attack on a customer or service; spread of a worm or virus through the network; router hijack. Requirements: detailed measurements from multiple places, including payload inspection in some cases; online analysis of the data; installing filters to block the offending traffic.
9. Core Traffic Engineering. Active queue management and link scheduling; green networking; resource allocation policies. Examples: divert traffic from congested links; balance load on peering links; set link-scheduling weights to reduce delay for premium traffic. Requirements: network-wide view of the traffic carried in the backbone; timely view of the network topology; analytical models to assess and predict the performance of control operations.
10. Core Capacity Planning. Deploying new equipment: What? Where? When? Examples: where to put the next backbone router; when to upgrade a link to higher capacity; whether to add/remove a particular peer; whether the network can accommodate a new customer; whether to install a caching proxy. Requirements: projections of future traffic patterns from measurements; cost estimates for buying/deploying the new equipment; model of the potential impact of the change (e.g., latency reduction and bandwidth savings).
11. TECHNICAL BACKGROUND: Measurements, Analysis, and Modeling
12. Technical Background: Measurements. Packet: more detailed, from link to application layer (with timestamps); huge storage and processing requirements; header or payload (full or partial). Flow: flow summaries (connection info, number of packets, duration, volume); IPFIX / Cisco NetFlow v5/v9 records. Aggregate: SNMP counts.
13. Measurements: Packets
14.
Measurements: Flows. [Figure: flow monitoring with sampling. Labels: live network; network packets; router: flow building; flow collector; collector: flow storage; GUI: flow analysis and reporting; flow monitoring tool; sampling technique (on-line sampling, off-line sampling); collected, classified flows (F1 to F4); representative flow sample (F3, F4); traffic management and analysis.]
15. Technical Background: Analysis of Packet Traces. IP header: traffic volume by IP addresses or ASes; burstiness of the stream of packets; packet properties (e.g., sizes, out-of-order). Transport header: traffic breakdown by protocol; TCP congestion and flow control; number of bytes and packets per session. Application header: URLs, HTTP headers, file type; DNS queries and responses; mobile devices.
16. Core Modelling. Exploratory Data Analysis: maximize insight into the data set; extract important variables; detect outliers and anomalies; develop parsimonious models. Statistical Inference: Does the data follow a particular PDF? Maximum likelihood estimation; hypothesis testing.
17. FUNDAMENTAL RESEARCH CHALLENGES
18. Research Challenges: Measurements. Network-wide view: crucial for evaluating control actions; multiple kinds of data from multiple locations. Large scale: large number of high-speed links and routers; large volume of measurement data. The "do no harm" principle (passive measurements): don't degrade router performance; don't require disabling key router features; don't overload the network with measurement data.
19. Research Challenges: Packet Measurements. Building efficient DPI engines: 1 packet every 5 ns! Based on DFA/NFA built from regular expressions that express application signatures; for hardware-based or commodity platforms; updating the application signature database. Inspection of encrypted traffic is not possible. Analysis of packet payload is forbidden in a number of countries.
20.
High-Performance Traffic Monitoring Systems. Large number of application signatures; complexity of the signature patterns; unpredictability of signature location in the network flow, as well as within the packet payload; performance bottlenecks at OS and hardware levels; visual analytics.
21. Research Challenges: Flow-Level Analysis. Tries to identify applications or classes of applications without looking at the payload. May extract high-level models for unsupervised classification and learning. Less data volume to analyse. Still tough to do in real time on high-speed links at 40 Gbps and beyond. Must address privacy issues for lawful interception.
22. EVOLUTION OF COMPUTING SERVICES
23. Server, OS, Programming Platforms. Several abstraction layers in programming, databases, etc., but not in networking.
24. Networking Services
25. NEW NETWORKING ARCHITECTURES: Software-Defined Networking (SDN)
26. SDN Motivation. Current networks cannot support this growth! Not service-oriented; static configuration; status not available to apps/users; cannot provide dynamic negotiation to users.
27. Motivation: economics
28. The Need for a New Network Architecture (the ONF view). Key computing trends. Changing traffic patterns: in contrast to client-server applications, today's apps access different services; access to content and applications from any type of device, anywhere, at any time. The rise of cloud services: agility to access applications, infrastructure, and other IT resources on demand and à la carte. Big data means more bandwidth: mega datasets are fueling a constant demand for additional network capacity in the data center.
29.
Limitations of Current Networking Technologies (the ONF view). Meeting current market requirements using device-level management tools and manual processes. Complexity that leads to stasis: the static nature of networks is in stark contrast to the dynamic nature of today's environment. Inconsistent policies: to implement a network-wide policy, thousands of devices and mechanisms must be configured. Inability to scale: traffic patterns are dynamic and unpredictable; users have different apps and performance needs.
30. SDN (the ONF view). Emerging network architecture where network control is decoupled from forwarding and is directly programmable. Migration of control into accessible computing devices enables the underlying infrastructure to be abstracted for applications and network services, which can treat the network as a logical or virtual entity. Network intelligence is (logically) centralized: SDN controllers maintain a global view of the network; the network appears to the applications and policy engines as a single, logical switch; infrastructure gains vendor-independent control over the entire network from a single logical point.
31. SDN Architecture
32. Motivation: what drives SDN research and development? Reduced network costs (CAPEX/OPEX); support for innovative new products (applications, services); synergy with cloud computing services and infrastructure; and, most importantly, real-time network programmability. This is the quest for networks with improved performance while keeping them simple, scalable, and smart.
33. Innovation Roadblocks vs.
Enablers for Big Data Analytics. Roadblocks from the network layer: proprietary software in network devices; developers have to rely on the network as is; support for data-intensive science and applications; one-size-fits-all approach to network data flows. Enablers from the network layer: let developers communicate with and program the network itself; allow developers to optimize the network for specific applications; support for data-intensive science and applications; allow special solutions for high-performance data flows; include support for network programmability.
34. Internet2 SDN use case
35. Internet2 SDN infrastructure
36. A Simplified View of SDN. 1. A network in which the control plane is physically separate from the forwarding (data) plane. A single control plane controls several forwarding devices.
37. Consequences of SDN adoption. 1. Hardware and software from different vendors. 2. Simplified programmability. 3. Enables application-level control/programming of the network. 4. Enables centralized control, which implies simplification of network operations. 5. Prospective integration with Network Virtualization technologies (cf. next section).
38. Supporting SDN with OpenFlow. First standard communications interface for SDN between the control and forwarding layers. It allows direct access to and manipulation of the forwarding plane of network devices, both physical and virtual (hypervisor-based). OpenFlow IS NOT SDN!
39. SDN: Challenges. North (apps) to South (devices). Traffic pattern: needs precise classification systems; needs model building at high speed and in real time; must adapt to abrupt and long-term changes; must cope with millions to billions of flows in the short term (e.g., mice flows in a 5 min time window). Core challenge: decide which service policy is to be applied to a flow (a classification and optimization problem).
40.
OF-based SDN Benefits (1/2). Centralized control of multi-vendor environments: use SDN-based orchestration and management tools to quickly deploy, configure, and update devices across the entire network. Reduced complexity through automation: develop tools that automate many management tasks. Higher rate of innovation: operators can program and reprogram the network in real time to meet specific business needs and user requirements.
41. OF-based SDN Benefits (2/2). Increased network reliability and security: define high-level configuration and policy statements. More granular network control: apply policies at a very granular level (session, user, device, and application levels). Better user experience: centralized network control and state information available to higher-level applications; infrastructure can better adapt to dynamic user needs, e.g., adaptive video streaming.
42. SDN: Virtual Cloud
43. SDN: Research Challenges (1/2). SDN architecture design: accommodating consistency, dependability, and scalability requirements; control plane: centralized or distributed processing? Controller placement problem: How many? Where to place them? How to distribute tasks? Maximizing fault tolerance and a dependable infrastructure to support high-performance intra-DC data exchange for Big Data Analytics. Optimized policy framework: automatic policy transformation.
44. SDN Challenges (2/2). Resiliency to security and DoS attacks: vulnerability in the control plane. Multi-dimensional aggregation of rules: use multi-dimensional tags; ensure policy consistency. Example: mobile infrastructure.
45. NEW NETWORKING ARCHITECTURES: Network Virtualization
46. NV: concepts. What is NV? Decoupling of the services provided by a (virtualized) network from the physical network. A virtual network is a container of network services (L2 to L7) provisioned by software. Faithful reproduction of the services provided by the physical network. Analogy to a VM: complete reproduction of a physical machine (CPU, memory, I/O, etc.).
47.
NV: concepts
48. Business Model for NV. Players: 1. InP: Infrastructure Provider. 2. Virtual Network Provider/Operator. 3. SP: Service Provider. 4. End-user.
49. NV: Mapping problem
50. NEW NETWORKING ARCHITECTURES: Information-Centric Networking (ICN)
51. ICN: Motivation. The traditional Internet communication model is based on end-to-end communication. There is a growing need for highly scalable and efficient distribution of content. CDNs are a success, although they might be seen as a patch. Information-driven communication breaks the traditional packet-based model, allowing content-centric communication. ICN architectures take advantage of: in-network storage; multiparty communication; interaction models (e.g., publish-subscribe).
52. ICN: Technical Background. New location-independent approach to communication, more suitable for content distribution. ICN architectures are replacing "where" with "what". Ruled by the consumers of data. Interest and Data packets: i) a content consumer asks for some content by broadcasting its Interest to all nodes it can reach; ii) any node that receives the Interest packet and has the content responds with a Data packet.
53. ICN: Technical Background. The basic operation of an ICN node is similar to that of an IP host: a packet arrives on an interface; a longest-match lookup is performed on its name. Building blocks for ICN architectures: information objects; content naming; security; content forwarding; in-network caching; routing and transport.
54. ICN: Technical Background. Information Objects (IO): an IO represents content without taking into consideration its storage location and physical representation; an IO can have multiple copies of itself. Content naming: treat content as a network primitive; uniqueness, persistence, scalability; hierarchical or flat naming.
55. ICN: Technical Background. Security: content validation; name persistence; owner authentication and identification. Content forwarding.
56.
ICN: Technical Background. In-network caching: temporarily store content in the network core elements; small but popular content generates most Internet traffic; heavy-tailed nature of Internet traffic. Routing and transport: IO identifiers are not bound to a specific location; common topology-based routing and forwarding algorithms are not effective for routing IOs. Current architectures: CCN; Publish-Subscribe Internet Routing Paradigm (PSIRP); 4WARD-NetInf; DONA; CCNx.
57. ICN: challenges. Scalability: to be effective, routers should be able to keep TBs of information in cache. Security: a naming scheme that allows both self-certification and human-friendly identification while avoiding the use of a PKI is an open issue. Privacy: ICN makes information visible and identifiable at the network level. Economic model: adoption of ICN depends not only on technical aspects.
58. TOOLS AND TECHNIQUES FOR HIGH-PERFORMANCE NETWORK TRAFFIC ANALYSIS: Visual Analytics
59. VA: Motivation. Effectively use the immense wealth of data and information acquired, computed, and stored; analysts can get lost in irrelevant or inappropriately processed or presented information. For computer networks, acquisition of raw data is no longer a problem. Visualization techniques might be very effective, but for some analyses pure visualization does not completely expose insights hidden in the data.
60. VA: definition. Science of analytical reasoning supported by highly interactive visual interfaces, transcending simple and direct data visualization, and requiring active user participation.
61. VA: supporting technologies
62. VA example
63. VA: Challenges. Challenges for visualization systems for computer network data: limited scalability; knowledge discovery; appropriateness to perform data transformation; data presentation; interaction with the visualization system; hardware bottlenecks; multi-attribute visualization.
64. TOOLS AND TECHNIQUES FOR HIGH-PERFORMANCE NETWORK TRAFFIC ANALYSIS: Graphics Processing Units (GPU)
65.
TOOLS AND TECHNIQUES FOR HIGH-PERFORMANCE NETWORK TRAFFIC ANALYSIS: MapReduce
66. Research Challenges and Opportunities. Cloud computing services are driving huge changes in the computer networking field. Distributed and hybrid clouds will be a reality soon: massive amounts of data to be moved. SDN seems to be a smart solution to address scalability and other issues for Big Data. NV is available as the supporting technology. CCN is a paradigm shift and might face barriers to full deployment. Opportunities for advanced research are everywhere in those new scenarios. Content is becoming king in networking.
67. About. Center for Informatics (CIn), Federal University of Pernambuco (UFPE), Recife, Brazil
68. CIn/UFPE. UFPE: ~42K students, ~1K PhD professors. Recognition: top 5 CS graduate program in Brazil (evaluation: CAPES level 6, on a scale of 1 to 7); top 10 most important CS research centers in Latin America. Faculty: 80+ PhD professors, ~25% CNPq Research Chairs. Programs: Computer Science, Computer Engineering, Information Systems.
69. CIn/UFPE. 2000+ students. International collaboration: Europe, Asia, and North America. Research projects (privately and publicly funded): CNPq, CAPES, FACEPE; Samsung, Ericsson, Motorola, Nokia, LG, HP, etc. Recipient of a number of awards: 2011 Most Innovative Brazilian Research Center; Microsoft Imagine Cup (since 2005); ACM Intl. Programming Marathon. Recruitment: Google, Microsoft, Facebook.
70.
[Figure: companies listed per year, with the yearly count in parentheses.]
2002: Motorola (1)
2003: Leucotron, Mecaf, Itautec, Motorola (4)
2004: Waytec, Ericsson, Leucotron, Mecaf, Itautec, Motorola (6)
2005: Engetron, Samsung, Ericsson, Leucotron, Mecaf, Itautec, Motorola (7)
2006: Epson, Engetron, Samsung, Ericsson, Leucotron, Mecaf, Itautec, Motorola (8)
2007: Positivo, Epson, Engetron, Samsung, Ericsson, Leucotron, Mecaf, Itautec, Motorola (9)
2008: Siemens, Positivo, Epson, Engetron, Samsung, Ericsson, HP, Mecaf, Itautec, Motorola (10)
2009: Sankwang, Positivo, Epson, Engetron, Samsung, Ericsson, HP, Celestica, Itautec, Motorola (10)
2010: Megaware, Elcoma, Foxconn, Sankwang, Positivo, Epson, Engetron, Samsung, Ericsson, HP, Celestica, Itautec, Motorola (13)
71. Research Agenda with Dalhousie. International Science & Technology Partnership (ISTP) and Pernambuco State Research Funding Agency (FACEPE). UFPE, Dalhousie University; GSTS, Neurotech. ~CAD 2 million over 2 years. New R&D program. Further collaboration: open to new ideas and interests.
72. Recife, Pernambuco, Brazil
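Illustrative note on slide 12 (flow-level measurements): the slide describes flow summaries as connection info, number of packets, duration, and volume. A minimal Python sketch of how such summaries could be built from packet records follows. It is not the author's tool and not the IPFIX or NetFlow record format; the `Packet` fields, the `build_flows` helper, and the sample addresses are all assumptions made for illustration, and real exporters also apply timeouts and sampling that are left out here.

```python
from collections import namedtuple

# Hypothetical packet record; field names are assumptions, not a real capture format.
Packet = namedtuple("Packet", "ts src dst sport dport proto size")


def build_flows(packets):
    """Aggregate packets into per-flow summaries keyed by the 5-tuple."""
    flows = {}
    for p in packets:
        key = (p.src, p.dst, p.sport, p.dport, p.proto)  # connection info
        f = flows.get(key)
        if f is None:
            # First packet of the flow: start a new summary record.
            flows[key] = {"first": p.ts, "last": p.ts,
                          "packets": 1, "bytes": p.size}
        else:
            f["first"] = min(f["first"], p.ts)
            f["last"] = max(f["last"], p.ts)
            f["packets"] += 1        # number of packets
            f["bytes"] += p.size     # volume
    for f in flows.values():
        f["duration"] = f["last"] - f["first"]  # seconds between first and last packet
    return flows


# Made-up sample traffic: three packets of one TCP flow and one UDP packet.
packets = [
    Packet(0.00, "10.0.0.1", "10.0.0.2", 40000, 80, "TCP", 60),
    Packet(0.05, "10.0.0.1", "10.0.0.2", 40000, 80, "TCP", 1500),
    Packet(0.10, "10.0.0.3", "10.0.0.2", 53000, 53, "UDP", 80),
    Packet(0.25, "10.0.0.1", "10.0.0.2", 40000, 80, "TCP", 1500),
]
flows = build_flows(packets)
for key, summary in flows.items():
    print(key, summary)
```

Keeping only one small record per 5-tuple is what makes flow data so much lighter than packet traces, at the cost of losing per-packet detail such as payload and exact inter-arrival times.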