On the Internet Delay Space Dimensionality


DESCRIPTION

We investigate the dimensionality properties of the Internet delay space, i.e., the matrix of measured round-trip latencies between Internet hosts. Previous work on network coordinates has indicated that this matrix can be embedded, with reasonably low distortion, into a 4- to 9-dimensional Euclidean space. The application of Principal Component Analysis (PCA) reveals the same dimensionality values. Our work addresses the question: to what extent is the dimensionality an intrinsic property of the delay space, defined without reference to a host metric such as Euclidean space? Is the intrinsic dimensionality of the Internet delay space approximately equal to the dimension determined using embedding techniques or PCA? If not, what explains the discrepancy? What properties of the network contribute to its overall dimensionality? Using datasets obtained via the King [14] method, we study different measures of dimensionality to establish the following conclusions. First, based on its power-law behavior, the structure of the delay space can be better characterized by fractal measures. Second, the intrinsic dimension is significantly smaller than the value predicted by the previous studies; in fact by our measures it is less than 2. Third, we demonstrate a particular way in which the AS topology is reflected in the delay space; subnetworks composed of hosts which share an upstream Tier-1 autonomous system in common possess lower dimensionality than the combined delay space. Finally, we observe that fractal measures, due to their sensitivity to non-linear structures, display higher precision for measuring the influence of subtle features of the delay space geometry.

Transcript

Bruno Abrahao Robert Kleinberg On the Internet Delay Space Dimensionality

ACM/SIGCOMM Internet Measurement Conference, 2008

InetDim: Characterizing the Internet Delay Space Dimensionality

Bruno Abrahao Robert Kleinberg

Cornell University

InetDim Project

Develop a geometric framework for exploring the properties of networks

Understanding these properties allows us to predict the consequences of the network's growth and evolution and to guide the design of distributed systems.

Current Investigation

Concerned with the properties of the Internet that contribute to the overall dimensionality of its latency space

B. Abrahao and R. Kleinberg, On the Internet Delay Space Dimensionality, In Proc. of ACM/SIGCOMM Internet Measurement Conference (IMC 2008)

Definitions

Internet Delay Space

― Matrix of all-pairs round-trip times between Internet hosts

Dimensionality

― We’ll consider several definitions in this talk

― A value that abstracts the notion of network complexity

Question

Why study the Internet delay space dimensionality?

Coordinate-based positioning systems

Network embedding (GNP, Vivaldi)

― Models the network as contained in a vector space

― Estimates real distances with geometric distances

Elegant, compact, relatively successful

BUT they suffer from

― inherent embedding distortion

― disappointing accuracy [Lua et al., IMC’05] [Ledlie-Gardner-Seltzer, NSDI’07]

Measurement-based positioning systems

Meridian [Wong et al., SIGCOMM’04]

iPlane [Madhyastha et al., OSDI’06]

Relatively more accurate

BUT

― Measurement intensive

― Strong scaling assumptions

Motivation 1: Coordinate-based positioning systems

The dimensionality of the target space is a tunable parameter

― A critical parameter influencing accuracy, convergence, and stability [Dabek et al., SIGCOMM’04] [Ledlie-Gardner-Seltzer, NSDI’07]

Question: What is the optimal value?

― Too low: high distortion

― Too high: inefficiency

Motivation 1: Coordinate-based positioning systems

Prior work: the delay space can be embedded into a 5- to 9-dimensional Euclidean space with “reasonably” low distortion [Ng-Zhang’01; Tang-Crovella’03]

Is this value optimal?

Is it invariant with scaling?

― For worst-case metric spaces, the dimensionality increases logarithmically with the cardinality of the metric [Bourgain 75]
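One standard way to make the worst-case statement above precise is Bourgain's embedding theorem. It is stated here as background and is not taken from the talk. Combining it with the Johnson-Lindenstrauss lemma gives a Euclidean embedding whose dimension grows only logarithmically in the number of points n:

```latex
% Bourgain's theorem: every finite metric embeds into Euclidean space with O(log n) distortion
\text{For every metric } (X, \delta) \text{ with } |X| = n \text{ there is } f : X \to \mathbb{R}^{k} \text{ with}
\quad \delta(x, y) \le \lVert f(x) - f(y) \rVert_2 \le O(\log n)\, \delta(x, y) \quad \forall x, y \in X.
% Applying the Johnson-Lindenstrauss lemma to f(X) keeps the distortion O(\log n)
% while reducing the target dimension to k = O(\log n).
```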

Motivation 2: Measurement-based positioning systems

Scaling assumptions underlying Meridian

― The latency space is a metric of bounded doubling dimension D
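The doubling dimension assumed by Meridian has a standard definition. A metric has doubling dimension D if every ball of radius r can be covered by 2^D balls of radius r/2.

The sketch below gives a rough empirical estimate of this quantity on a finite delay matrix, assuming only a symmetric NumPy distance array. The function names, the use of greedy set cover, and the hand-picked list of radii are illustrative choices, not Meridian's method. Greedy cover can use more balls than necessary, and only the listed radii are checked, so the result is an approximation rather than the exact value.

```python
import numpy as np

def greedy_cover_size(dist, members, radius):
    """Size of a greedy cover of `members` by balls of `radius` centred on members.

    Greedy set cover: repeatedly choose the member whose ball contains the most
    still-uncovered members. The result is a valid cover, so it upper-bounds the
    minimum number of balls needed for this particular set.
    """
    within = dist[np.ix_(members, members)] <= radius  # within[i, j]: j lies in ball(i)
    uncovered = np.ones(len(members), dtype=bool)
    centers = 0
    while uncovered.any():
        best = np.argmax((within & uncovered).sum(axis=1))
        uncovered &= ~within[best]
        centers += 1
    return centers

def doubling_dimension_estimate(dist, radii):
    """log2 of the largest greedy cover of a ball B(x, r) by balls of radius r/2,
    taken over every point x and every r in `radii` (hypothetical helper)."""
    worst = 1
    for r in radii:
        for x in range(dist.shape[0]):
            ball = np.flatnonzero(dist[x] <= r)
            worst = max(worst, greedy_cover_size(dist, ball, r / 2))
    return float(np.log2(worst))
```

For evenly spaced points on a line, `doubling_dimension_estimate` returns 1.0 with radii such as [4, 8, 16], because every ball is covered by two half-radius balls.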

Motivation 3: Internet properties

Question 1: What properties of the Internet contribute to its dimensionality? And to what extent?

Question 2: Is the Internet delay space dimensionality homogeneous, or is it made up of many lower-dimensional pieces?

― Hierarchical embedding [Zhang ’06]

Motivation 4: Synthetic delay space generation

How can we generate realistic synthetic delay spaces?

Previous work [Zhang, IMC’06]

― Statistical properties preserved

Generative models?

Outline

Part I: Studies different methods for characterizing the Internet delay space geometry

Part II: Demonstrates how to use these tools to predict dimensionality shifts caused by structural properties

Datasets 1/2

Uses raw measurements collected via the King method by Meridian (5200 hosts) and P2PSim (1953 hosts)

― Filter out pairs with fewer than 10 measurements in total across both directions

― Latency = median of the measurements in both directions

― Eliminate the ambiguity that arises from missing values

― Approximate the largest clique with the remaining pairs

― After filtering

   ― Meridian: 2385 hosts

   ― P2PSim: 298 hosts
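The filtering steps above can be sketched as follows. This is a minimal illustration, not the authors' code. Three parts of it are assumptions:

- the input format (a dict from directed host pair to a list of RTT samples);
- the function name `build_delay_matrix`;
- the greedy clique heuristic, which repeatedly drops the host with the most missing pairs. The slide only says that the largest clique is approximated.

```python
from collections import defaultdict
from statistics import median

def build_delay_matrix(samples, min_samples=10):
    """Turn raw directional RTT samples into a complete delay matrix on a host subset.

    samples: dict mapping (src, dst) -> list of RTT measurements for that direction
             (hypothetical input format).
    Returns (hosts, latency): a sorted host list and a dict frozenset({a, b}) -> RTT
    that is defined for every pair of returned hosts.
    """
    # Pool the measurements of both directions of each unordered pair.
    pooled = defaultdict(list)
    for (a, b), rtts in samples.items():
        if a != b:
            pooled[frozenset((a, b))].extend(rtts)

    # Drop pairs with too few measurements; latency = median over both directions.
    latency = {p: median(r) for p, r in pooled.items() if len(r) >= min_samples}

    # Approximate the largest clique greedily (assumption): while some pair is
    # missing, drop the host that is missing the most pairs.
    hosts = set().union(*latency) if latency else set()
    while True:
        missing = {h: sum(frozenset((h, g)) not in latency for g in hosts if g != h)
                   for h in sorted(hosts)}
        worst = max(missing, key=missing.get, default=None)
        if worst is None or missing[worst] == 0:
            break
        hosts.remove(worst)

    kept = {p: v for p, v in latency.items() if p <= hosts}
    return sorted(hosts), kept
```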

Datasets 2/2

The major limitation is the availability of datasets

Other publicly available datasets: DIMES, PlanetLab, DS2, King (Harvard), …

― Small cliques, collected over long periods of time

Combined with geolocation obtained by querying hostip.info


Meridian geolocation visualization

Part I: Dimensionality measures

Question: How can we estimate the Internet delay space dimensionality?

What methods were applied in prior work? What are their assumptions?

― The data can be approximately embedded into a low-dimensional Euclidean space

― The distance matrix can be accurately approximated by a low-rank matrix

Embedding Dimension

The dimensionality can be estimated using an embedding algorithm, such as Vivaldi

Benefit: recovers the coordinates of the points

Suggests a dimensionality value between 4 and 7

[Figure: Meridian dataset]
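To make the embedding-dimension procedure concrete, here is a simplified, centralized sketch in the spirit of Vivaldi [Dabek et al., SIGCOMM’04]. Each node repeatedly samples a peer and moves along the spring between them in proportion to the prediction error.

This is a simplification under stated assumptions. The real Vivaldi uses an adaptive timestep weighted by per-node error estimates and can add height vectors, whereas this version uses a constant `delta`. The parameter values and helper names are hypothetical.

```python
import numpy as np

def vivaldi_embed(rtt, dim, rounds=200, delta=0.05, seed=0):
    """Simplified, centralized Vivaldi-style embedding (hypothetical helper).

    rtt: symmetric (n, n) array of measured latencies.
    Each node repeatedly samples a random peer and moves along the spring
    connecting them by delta * (measured - predicted distance).
    Uses a constant timestep and no height vectors (simplifications).
    """
    rng = np.random.default_rng(seed)
    n = rtt.shape[0]
    x = rng.normal(scale=1e-3, size=(n, dim))  # start near the origin
    for _ in range(rounds):
        for i in range(n):
            j = int(rng.integers(n - 1))
            j += j >= i                          # random peer j != i
            diff = x[i] - x[j]
            dist = np.linalg.norm(diff)
            if dist < 1e-12:                     # coincident points: pick a random direction
                diff = rng.normal(size=dim)
            unit = diff / np.linalg.norm(diff)
            # Positive error (too close) pushes i away from j; negative pulls it closer.
            x[i] += delta * (rtt[i, j] - dist) * unit
    return x

def median_relative_error(rtt, x):
    """Median of |predicted - measured| / measured over all distinct pairs."""
    iu = np.triu_indices(rtt.shape[0], k=1)
    pred = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)[iu]
    return float(np.median(np.abs(pred - rtt[iu]) / rtt[iu]))
```

To trace an error-versus-dimension curve, one would run `vivaldi_embed(rtt, d)` for d = 1, 2, 3, ... on the measured matrix and plot `median_relative_error` against d. The dimension at which the error stops improving serves as the embedding-dimension estimate.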

Embedding Dimension Shortcomings (1/2)

Embeddings using 8D and beyond are worse than 7D!

Curse of dimensionality: the embedding algorithm is overwhelmed by so many degrees of freedom

[Figure: Meridian dataset]

Embedding Dimension Shortcomings (2/2)

Existing network embedding algorithms are suboptimal and slow to converge!

Finding an embedding that minimizes distortion is intractable [Matoušek-Sidiropoulos ’08]

Embedding fails if the measured distances reflect a metric other than Euclidean distance

The algorithm often produces a lot of empty space, unnecessarily inflating the dimensionality estimate

Principal Component Analysis

Approximates the distance matrix by a low-rank matrix

Rotates the axes so that the data variance is better captured by the components

[Figure: Meridian dataset]
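Here is a minimal sketch of applying PCA to a delay matrix, assuming each host's row of latencies is treated as one sample. The squared singular values of the column-centered matrix give the fraction of variance captured by each component.

The helper names and the 90% default threshold in `components_needed` are illustrative assumptions. Choosing that cut-off is exactly the part PCA leaves open.

```python
import numpy as np

def pca_explained_variance(dist):
    """Fraction of total variance captured by each principal component.

    Each row of the delay matrix (one host's latencies to every host) is treated
    as a sample, and each column as a feature.
    """
    centered = dist - dist.mean(axis=0)            # center every column
    s = np.linalg.svd(centered, compute_uv=False)  # singular values, descending
    var = s ** 2
    return var / var.sum()

def components_needed(ratios, threshold=0.9):
    """Smallest k whose first k components reach `threshold` of the variance
    (the 0.9 default is a hypothetical choice)."""
    return int(np.searchsorted(np.cumsum(ratios), threshold) + 1)
```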

Principal Component Analysis

Assumes that the dimensions are independent

― e.g., the surface of a sphere is reported as a 3D object

It is not obvious where to place the dimensionality cut-off

[Figure: Meridian dataset]
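The sphere example can be checked directly. The sketch below assumes NumPy and uniformly sampled points on the unit sphere (a hypothetical test set, not delay data). PCA assigns roughly one third of the variance to each of the three axes, so a linear method reports a 3D object even though the surface is locally 2-dimensional.

```python
import numpy as np

def sample_sphere(n, seed=0):
    """Uniform samples on the unit sphere S^2, a 2-dimensional surface in R^3."""
    v = np.random.default_rng(seed).normal(size=(n, 3))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def pca_variance_ratios(points):
    """Fraction of variance along each principal axis, in descending order."""
    centered = points - points.mean(axis=0)
    s = np.linalg.svd(centered, compute_uv=False)
    return s ** 2 / np.sum(s ** 2)

# All three components carry roughly a third of the variance, so no axis can be
# dropped even though the sphere's surface is locally 2-dimensional.
ratios = pca_variance_ratios(sample_sphere(2000))
```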

Intrinsic notion of dimensionality

Question: How can we avoid assuming a specific host metric space or assuming linearity?

If a dataset exhibits power-law behavior in its statistical or structural properties, one can model and measure its dimensionality using fractal measures

Intrinsic dimensionality metrics

Correlation Fractal Dimension (pair-count plot)

[Figure: pair-count plots for the Meridian and P2PSim datasets; distances in usec]

Power-law behavior over two decimal orders of magnitude

― Includes almost all intra-continental distances

Fractal measures indicate dimensionality values less than 2

Correlation Dimension

[Figure: sample point x and a ball of radius r around it]

# samples within distance r of x is proportional to r

Correlation Dimension

[Figure: sample point x and a ball of radius r around it]

# samples within distance r of x is proportional to r^2


Sampling from a unit square
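The pair-count idea can be turned into an estimator. Count the pairs within distance r for a range of radii, then fit the slope of log(count) versus log(r). If the count grows like r^D, the slope is D.

This is a minimal NumPy sketch, assuming a symmetric distance matrix. The fitting range [r_min, r_max] has to be chosen by hand where the plot is a straight line, and the helper names are illustrative.

```python
import numpy as np

def pair_counts(dist, radii):
    """Number of distinct pairs (i < j) at distance <= r, for each r in radii."""
    iu = np.triu_indices(dist.shape[0], k=1)
    d = np.sort(dist[iu])
    return np.searchsorted(d, radii, side="right")

def correlation_dimension(dist, r_min, r_max, num=20):
    """Slope of log(pair count) versus log(r) over [r_min, r_max].

    The range must lie where every radius has at least one pair (log of 0 is undefined)
    and where the pair-count plot is approximately a straight line.
    """
    radii = np.geomspace(r_min, r_max, num)
    counts = pair_counts(dist, radii)
    slope, _intercept = np.polyfit(np.log(radii), np.log(counts), 1)
    return float(slope)
```

On uniform samples from a unit square, the fitted slope comes out a little below 2, because edge effects flatten the curve at larger r. On samples from a line segment it comes out a little below 1.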

Fractal Dimension Family

There is an infinite family of fractal dimensions, parameterized by q

Some practical dimensions are

― Hausdorff Dimension

― Shannon’s Entropy

― Correlation Dimension

For manifolds, i.e., spaces locally modeled on ℝ^d, the fractal dimension coincides with d

[Figure: 0D, 1D, 2D, and d-dimensional examples]

Easy to measure

Sensitive to non-linear behavior
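The slide does not spell out the family, but the standard formulation is the generalized (Rényi) dimension. Cover the data with cells of side r and let p_i(r) be the fraction of points in cell i. The members named above then correspond to particular values of q. Here the slide's "Hausdorff" is read as the box-counting dimension that is usually computed in practice, which is an interpretation.

```latex
% p_i(r): fraction of the points that fall in cell i of a grid with cell side r
D_q = \frac{1}{q-1} \lim_{r \to 0} \frac{\log \sum_i p_i(r)^{q}}{\log r} \quad (q \neq 1),
\qquad
D_1 = \lim_{r \to 0} \frac{\sum_i p_i(r) \log p_i(r)}{\log r}
% q = 0: box-counting dimension, since \sum_i p_i(r)^0 = N(r), the number of occupied cells
% q = 1: information dimension (Shannon entropy of the cell occupancies)
% q = 2: correlation dimension, since \sum_i p_i(r)^2 is the probability that two randomly drawn points share a cell
```

For a smooth d-dimensional manifold with a well-behaved density, all of these values agree and equal d.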

Correlation Dimension of Embedded Matrix

How much does the embedded structure deviate from the original space?

Fractals and Internet Models

Question: What does the fractal behavior imply about the Internet? Is it self-similar? Recursive?

― [Li, Alderson, Willinger, Doyle, 2004]

― [Mitzenmacher, 2004]

Fractal behavior may also arise from non-recursive structures

― e.g., snowflakes, coastlines, surface of the human brain, etc.

Part II: Dimensionality Components

Questions

What features contribute to the delay space behavior? To what extent?

Why is it useful to study the Internet through the lens of fractal measures?

Geographic Location

Power law extending over two orders of magnitude

Geolocation does not account for the whole delay space dimensionality, although it is a strong component

Dimensionality Reducing Decomposition

Transit links between Tier-1 ASes are contained in a significant fraction of Internet routes

Decomposition into overlapping subsets, each rooted at a Tier-1 network

Notice: not a partition, due to multihomed networks

Superposition of these pieces may inflate the dimensionality
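The decomposition can be sketched as building one host subset per Tier-1 AS. Several parts are hypothetical here:

- the input mapping from each host to the Tier-1 ASes upstream of it (the talk derives it from an inferred AS-topology snapshot);
- the function names;
- the AS labels.

The point is that a multihomed host lands in more than one subset, so the pieces overlap.

```python
from collections import defaultdict

import numpy as np

def tier1_decomposition(upstream_tier1):
    """Overlapping host subsets, one per Tier-1 AS.

    upstream_tier1: dict host -> set of Tier-1 ASes reachable upstream of that host
                    (hypothetical input). A multihomed host is placed in every
                    matching subset, so the pieces form a cover, not a partition.
    """
    pieces = defaultdict(set)
    for host, tier1s in upstream_tier1.items():
        for asn in tier1s:
            pieces[asn].add(host)
    return dict(pieces)

def piece_submatrix(dist, hosts, index):
    """Delay submatrix of one piece; index maps host -> row/column of dist."""
    rows = sorted(index[h] for h in hosts)
    return dist[np.ix_(rows, rows)]
```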

Dimensionality Reducing Decomposition

Approximation to the ground-truth decomposition using a snapshot of the inferred AS topology [Oliveira et al., SIGMETRICS’08]

Dimensionality Reducing Decomposition

We measured each piece separately using

― Embedding dimension

― PCA

― Correlation dimension

― Hausdorff dimension (see paper)

Dimensionality Reducing Decomposition Results

Embedding dimension and PCA were insensitive to the decomposition

Dimensionality Reducing Decomposition Results

Fractal measures capture the structural change and report a dimensionality reduction

The power law is preserved over the same range for the subsets, but with different exponents

Sanity Check 1

• Is the reduction in dimensionality due to other side effects of the decomposition?

― Pieces of smaller diameter?

― Decompose the network according to geographic and latency-based clustering

Distance-based Clustering

• Causes the clusters to deviate from the power-law behavior

• Not comparable to the topology-based decomposition

Sanity Check 2

• Is the reduction in dimensionality due to other side effects of the decomposition?

― Pieces of smaller cardinality [Bourgain ’75]?

― Generated 139 random subnetworks of smaller cardinality (i.e., the median size of the Tier-1 subnetworks)
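The random baseline for Sanity Check 2 can be sketched as follows, assuming hosts are indexed 0..n-1 and the Tier-1 piece sizes are already known. The count of 139 subsets comes from the slide. The function name, the seed, and rounding a fractional median down to an integer are illustrative assumptions.

```python
import numpy as np

def random_baseline_subsets(n_hosts, piece_sizes, n_subsets=139, seed=0):
    """Random host subsets whose size equals the median Tier-1 piece size.

    Each subset is drawn uniformly without replacement, so it matches a typical
    Tier-1 piece in cardinality but not in how its hosts were chosen.
    (Hypothetical helper; a fractional median is rounded down.)
    """
    size = int(np.median(piece_sizes))
    rng = np.random.default_rng(seed)
    return [np.sort(rng.choice(n_hosts, size=size, replace=False))
            for _ in range(n_subsets)]
```

Each subset's delay submatrix can then be measured with the same dimensionality estimators as the Tier-1 pieces. Any remaining difference then cannot be attributed to cardinality alone.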


Random subsets with lower cardinality

• Random subsets of lower cardinality do not explain the dimensionality reduction

Dimensionality Paradox

• The fractal tools report a dimensionality shift that was not captured by the embedding dimension or PCA

• What explains the discrepancy?

Isomap

The space is rich in non-linearity

• The dimensionality-reducing decomposition is real and not an artifact of the fractal methodology
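Isomap replaces straight-line distances with shortest-path distances through a k-nearest-neighbor graph and then applies classical MDS. This lets it follow curved structure that PCA cannot.

The minimal NumPy sketch below follows that standard recipe. The value of `k`, the helper name, and the semicircle test data are illustrative assumptions. The neighbor graph must be connected for the geodesic distances to be finite. On a semicircle, PCA splits the variance across two axes, while Isomap's largest eigenvalue carries almost all of it.

```python
import numpy as np

def isomap_eigenvalues(points, k=5):
    """Minimal Isomap: k-NN graph, graph geodesics, then classical MDS.

    Returns the MDS eigenvalues in descending order; the share carried by the
    largest ones indicates how many dimensions the geodesic structure needs.
    Assumes no duplicate points and a connected k-NN graph.
    """
    n = len(points)
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    # Keep edges to each point's k nearest neighbours (column 0 is the point itself).
    graph = np.full((n, n), np.inf)
    nn = np.argsort(d, axis=1)[:, 1:k + 1]
    rows = np.repeat(np.arange(n), k)
    graph[rows, nn.ravel()] = d[rows, nn.ravel()]
    graph = np.minimum(graph, graph.T)  # symmetrise the neighbour graph
    np.fill_diagonal(graph, 0.0)
    # Floyd-Warshall shortest paths approximate geodesic distances.
    for m in range(n):
        graph = np.minimum(graph, graph[:, m:m + 1] + graph[m:m + 1, :])
    # Classical MDS: double-centre the squared geodesic distances.
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (graph ** 2) @ j
    return np.sort(np.linalg.eigvalsh(b))[::-1]
```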

Conclusion

• Dimensionality is a critical aspect influencing the performance of algorithms

• Fractal nature and non-linear behavior: the Internet delay space dimensionality is better characterized by fractal measures

― Lightweight

― Sensitive to Internet features and structural changes

• Properties influencing dimensionality

• The Internet delay space dimensionality is not homogeneous

Further Info

• InetDim Project Website

― King datasets annotated with IP addresses, code, more info

― http://www.cs.cornell.edu/~abrahao/inetdim
