Scalability, Accountability and Instant Information Access for Network Centric Warfare
Yair Amir, Claudiu Danilov, Jon Kirsch, John Lane, Jonathan Shapiro, Ciprian Tutu (Department of Computer Science, Johns Hopkins University)
Chi-Bun Chan, Cristina Nita-Rotaru, David Zage (Department of Computer Science, Purdue University)
28 Jan 05, http://www.cnds.jhu.edu
Network Centric Warfare Applications
• Operate in wide-area network settings; communication is often conducted over unreliable channels.
• Require timely decisions based on available information, although intermittent network connectivity means the information may not be the most recent or consistent.
• Weaker update semantics may be sufficient for several applications.
• Critical information is often not large.
• Every piece of information is usually generated by a unique source.
Fits many non-military applications as well
Dealing with Insider Threats
Project goals:
• Scaling survivable replication to wide area networks.
– Performance, performance, performance.
• Dealing with malicious clients.
– Compromised clients can inject authenticated but incorrect data.
– May be hard to detect on the fly.
– Malicious or just an honest error? The same tools are useful for both.
• Exploiting application update semantics for replication speedup in malicious environments.
– Will not be discussed today.
A Distributed Systems Service Model
• Message-passing system.
• Clients issue requests to servers, then wait for answers.
• Replicated servers process the request, then provide answers to clients.
[Figure: a site with server replicas 1 through N answering requests from clients]
Outline
• Introduction.
• Client Accountability.
  – Concept.
  – Performance viability.
  – Applications.
• Scaling wide-area intrusion-tolerant replication.
  – BFT: the current state of the art.
  – A new hierarchical approach.
  – Constructing a trusted entity in the local site (threshold cryptography based approach).
• Integration.
• Summary.
Compromised Clients
• Hard problem: compromised clients can inject authenticated but incorrect data into the system, misleading honest clients.
  – Authentication and access control are not sufficient.
  – An almost ignored problem.
• A new goal: generic tools for accountability enforcement.
  – Causality tracking of updates and dependencies, to facilitate instant analysis and regeneration of a clean state once corrupt data is flagged.
  – While not reinventing the wheel: corrupt updates are detected via external intrusion detection, application-specific knowledge, or a human in the loop.
Real-time Accountability Graph
• Solution: an Accountability Graph (also called the A-DAG): accountability enforcement and causality tracking of updates and dependencies in a Directed Acyclic Graph, with periodic snapshots.
[Figure: A-DAG of updates from clients C1 through C8 over time; corrupt and clean updates marked]
Real-time Accountability Graph in Action
• Marking: Upon detection of incorrect data, trace it to the corrupt update and from that, mark all causally-dependent updates as corrupted or suspected.
• Regeneration: real-time state regeneration based on the last good snapshot and the non-corrupted, non-suspected updates.
• Also useful for online what-if scenarios and for offline damage assessment.
[Figure: A-DAG of updates from clients C1 through C8 over time; corrupt, suspicious, and clean updates marked]
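The marking and regeneration steps above amount to a forward traversal of the dependency DAG. A minimal sketch follows; the talk does not show the A-DAG implementation, so the class and method names here are illustrative:

```python
from collections import defaultdict, deque

class ADag:
    """Illustrative accountability DAG: updates plus their causal dependencies."""

    def __init__(self):
        self.dependents = defaultdict(set)  # update -> updates that depend on it

    def add_update(self, uid, depends_on=()):
        self.dependents[uid]  # touch the entry so the node exists even with no dependents
        for d in depends_on:
            self.dependents[d].add(uid)

    def mark(self, corrupt_uid):
        """Mark the corrupt update and every causally-dependent update as suspected."""
        suspected, queue = set(), deque([corrupt_uid])
        while queue:
            u = queue.popleft()
            if u not in suspected:
                suspected.add(u)
                queue.extend(self.dependents[u])
        return suspected

    def regenerate(self, update_log, corrupt_uid):
        """Clean state = last good snapshot replayed with the unmarked updates."""
        bad = self.mark(corrupt_uid)
        return [u for u in update_log if u not in bad]

dag = ADag()
dag.add_update("u1")
dag.add_update("u2", depends_on=["u1"])
dag.add_update("u3", depends_on=["u1"])
dag.add_update("u4", depends_on=["u2", "u3"])
print(sorted(dag.mark("u2")))                          # ['u2', 'u4']
print(dag.regenerate(["u1", "u2", "u3", "u4"], "u2"))  # ['u1', 'u3']
```

Only the updates causally after u2 are flagged; u3, which does not depend on u2, survives into the regenerated state.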
Real-time Accountability Graph Optimization
• Premise: Non-compromised clients can be trusted with update dependency reporting.
• Result: Dramatically reduce false-positive rate of suspicious updates by eliminating FIFO links of non-compromised clients.
• Performance: Ability to track and traverse millions of updates within a couple of seconds.
[Figure: A-DAG of updates from clients C1 through C8 over time; corrupt, suspicious, and clean updates marked]
A-DAG Performance
• Quick enough for many applications.
[Chart: traversal time (0-3 sec) as a function of the number of updates (0-4.5 million), Accountability Graph data structure]
A-DAG Performance (cont.)
• Quick enough for many applications.
[Chart: traversal time (0-3.5 sec) as a function of the number of clients (0-120,000), keeping 4,000,000 updates constant, Accountability Graph data structure]
A-DAG Performance (cont.)
• Quick enough for many applications.
[Chart: traversal time (0-9 sec) as a function of the number of dependencies per update (0-35), keeping 4,000,000 updates and 100,000 clients constant, Accountability Graph data structure]
Application to the Common Operational Picture (COP)
Real-time Marking of Corrupt and Suspected Data
Application to Open Source Software Development
• Hard problem: vulnerability to life-cycle attacks.
• A new goal:
  – Quick analysis of the impact of a discovered life-cycle vulnerability (or vulnerabilities).
  – Insight into where to invest limited resources to monitor against future life-cycle attacks.
• Actually applicable to any software (not just open source).
Capability Dependencies in Red Hat Linux (1997-2004)
[Chart: cumulative distribution function of dependent capabilities; percentage of capabilities (0-100%) vs. number of dependent capabilities (0-50,000)]
15 distributions: Red Hat 4.1 to Fedora 2.
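A CDF like the one charted above can be computed directly from per-capability dependent counts. A small sketch; the counts here are made up for illustration, not the Red Hat data:

```python
def dependents_cdf(counts, thresholds):
    """Percentage of capabilities whose dependent count is <= each threshold."""
    n = len(counts)
    return {t: 100.0 * sum(1 for c in counts if c <= t) / n for t in thresholds}

# Hypothetical counts: most capabilities have few dependents, a handful have many.
counts = [0, 1, 1, 2, 3, 5, 8, 40, 90, 45000]
print(dependents_cdf(counts, [10, 100, 50000]))  # {10: 70.0, 100: 90.0, 50000: 100.0}
```

The heavy tail (one capability with 45,000 dependents) is exactly what makes a life-cycle attack on a widely depended-upon capability so damaging.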
Capability Dependencies in Red Hat Linux - Zooming In
[Chart: cumulative distribution function of dependent capabilities, zoomed in; percentage of capabilities (0-100%) vs. number of dependent capabilities (0-100)]
15 distributions: Red Hat 4.1 to Fedora 2.
State Machine Replication
• Main challenge: ensuring coordination between servers.
  – Requires agreement on the request to be processed and a consistent order of requests.
• Benign faults: Paxos [Lam98, Lam01] must contact f+1 out of 2f+1 servers and uses 2 rounds to allow consistent progress.
• Byzantine faults: BFT [CL99] must contact 2f+1 out of 3f+1 servers and uses 3 rounds to allow consistent progress.
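The quorum arithmetic above fits in a few lines. A sketch; the function name and return shape are illustrative, not from either protocol's codebase:

```python
def replication_parameters(f, byzantine):
    """Servers needed, and the quorum contacted for consistent progress.

    Benign faults (Paxos):   n = 2f+1, progress quorum f+1, 2 rounds.
    Byzantine faults (BFT):  n = 3f+1, progress quorum 2f+1, 3 rounds.
    """
    if byzantine:
        return {"n": 3 * f + 1, "quorum": 2 * f + 1, "rounds": 3}
    return {"n": 2 * f + 1, "quorum": f + 1, "rounds": 2}

print(replication_parameters(1, byzantine=False))  # {'n': 3, 'quorum': 2, 'rounds': 2}
print(replication_parameters(1, byzantine=True))   # {'n': 4, 'quorum': 3, 'rounds': 3}
```

Tolerating even one Byzantine fault already costs an extra server, a larger quorum, and an extra round compared with the benign case.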
A Replicated Server System
• Maintaining consistent servers [Sch90]:
  – To tolerate f benign faults, 2f+1 servers are needed.
  – To tolerate f malicious faults, 3f+1 servers are needed.
• Responding to read-only client requests [Sch90]:
  – If the servers suffer only benign faults, 1 answer is enough.
  – If the servers can be malicious, the client must wait for f+1 identical answers, f being the number of malicious servers.
Peer Byzantine Replication Limitations
• Limited scalability due to the 3-round all-peer exchange.
• Strong connectivity is required.
  – 2f+1 (out of 3f+1) to allow progress, and f+1 to get an answer.
• Partitions are a real issue.
• Clients depend on remote information.
  – Bad news: provably optimal, so we need to pay something to get something else.
• Constructs a consistent total order.
• Focus is solely on replica protection.
Peer Byzantine Replication: BFT [CL99]
The state of the art in Byzantine replication.
Symmetric Wide Area Network
• Synthetic network used for analysis and understanding.
• 5 sites, each connected to all other sites by equal-latency links.
• Each site has 4 replicas (except one site with 3 replicas due to current BFT setup).
• Total – 19 replicas in the system.
• Each wide area link has a 10Mbits/sec capacity.
• Varied wide area latencies between 10ms - 400ms.
Practical Wide-Area Network
• A real experimental network (CAIRN), modeled in the Emulab facility.
• Capacity of the wide-area links was modified to 10 Mbits/sec to better reflect current realities.
[Diagram: CAIRN topology. Nodes ISIPC, ISIPC4, TISWPC, ISEPC3, ISEPC, UDELPC, MITPC, spanning Virginia, Delaware, Boston, San Jose, and Los Angeles; wide-area links of 38.8 ms / 1.86 Mbits/sec, 4.9 ms / 9.81 Mbits/sec, 3.6 ms / 1.42 Mbits/sec, and 1.4 ms / 1.47 Mbits/sec; local links of 100 Mb/s at under 1 ms]
BFT Wide Area Performance
• BFT used almost out of the box (a very good prototype).
• 19 replicas.
• Does not write to disk.
[Chart: update latency (0-3000 ms) as a function of network diameter (0-400 ms), symmetric topology, with 1 client and with 5 clients]
BFT Wide Area Performance (Cont).
• BFT used almost out of the box (a very good prototype).
• 19 replicas.
• Does not write to disk.
[Chart: update latency (0-1400 ms) as a function of the CAIRN network multiple (0-5; the original CAIRN network has a diameter of 45 ms), with 1 to 5 clients]
BFT Wide Area Performance (cont.)
• Note: A 50ms Symmetric network vs. a Native CAIRN network.
[Chart: throughput (0-9 updates per second) as a function of the number of clients (0-6), CAIRN (45 ms diameter) vs. symmetric (50 ms diameter)]
A New Approach: Hierarchical Architecture
• Each site acts as a trusted logical unit that can crash or partition.
• Between sites:
  – Fault-tolerant protocols.
  – Alternatively, Byzantine protocols between sites as well.
• There is no free lunch – we pay with more hardware…
[Figure: a site with server replicas 1 through 3f+1 serving clients]
Constructing a Trusted Entity in the Local Site
• No trust between participants in a site.
  – A site acts as one unit that can only crash if the assumptions are met.
• Initial idea: use BFT-like [CL99, YMVAD03] protocols to mask local Byzantine replicas.
• How to make sure that local Byzantine replicas cannot misrepresent the site on the wide area network?
  – Threshold cryptography seems a good direction.
  – Also appealing in terms of management.
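The requirement that Byzantine replicas cannot misrepresent their site reduces to a counting argument: a wide-area peer accepts a site message only when more than f distinct local replicas stand behind it. With threshold cryptography, this check becomes verifying a single combined signature; the sketch below shows only the underlying rule, with illustrative names:

```python
def site_message_valid(endorsing_replicas, f):
    """Accept a site's wide-area message only when more than f distinct
    local replicas endorsed it; f Byzantine replicas alone cannot
    fabricate such a message."""
    return len(set(endorsing_replicas)) >= f + 1

# f = 1: a lone compromised replica cannot speak for the site.
print(site_message_valid({"r1", "r2"}, f=1))  # True
print(site_message_valid({"r1"}, f=1))        # False
```

A (f+1)-out-of-N threshold signature enforces exactly this rule cryptographically, without the receiver ever seeing the individual endorsements.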
Hierarchical Architecture Details
[Diagram: hierarchical architecture. Each server replica (1 through 3f+1) in the local site runs Local Area Byzantine Replication, a Monitor, and Wide Area Fault Tolerant Replication; replica 1 is the wide-area representative and the others are wide-area standbys. Clients connect within the local site; the replicas share a local area network, and the representative speaks for the site on the wide area network]
BFT As a Potential Building Block
• Fault model:
  – Fewer than a third of the replicas can have faults of any kind, including benign faults; not practical in our opinion.
  – Will need to write to disk to protect against partial amnesia under benign faults.
• Consequence:
  – Current numbers underestimate the baseline cost.
  – Real latency will be higher due to disk writes in each round.
• In addition:
  – A very good implementation to demonstrate the concept.
  – But not a building block for us going forward (some stability and robustness issues).
    • Can be solved with a new implementation.
BFT Local Area Performance
• BFT in a local area network, 19 replicas, no disk writes.
[Chart: throughput (0-180 updates per second) as a function of the number of clients (0-12) on a 100 Mbits LAN]
BFT Local Area Performance
• BFT in a local area network, 19 replicas, no disk writes.
[Chart: update latency (0-70 ms) as a function of the number of clients (0-12) on a 100 Mbits LAN]
Threshold Digital Signatures
• Problem: N entities authenticate a message by generating one signature, such that any k entities can create a valid signature but k-1 cannot.
• Solution: threshold digital signatures (no party knows the secret key, only its individual share).
• Issues with threshold digital signatures:
  – Trusted dealer vs. decentralized key generation.
  – An insider can submit a 'bad' share; this requires verifiable secret sharing, but when should it be applied?
  – Highly interactive key generation and share verification.
  – The size of the signature increases linearly with the number of players.
RSA Threshold Digital Signatures
• Our choice: the RSA threshold signature proposed by Shoup in 1999 (Practical Threshold Signatures, Eurocrypt 2000).
• Provides verifiable secret sharing.
• Size of the signature is bounded by a constant multiple of n, where n is the RSA modulus.
• Security proof in the random oracle model.
• Signature share generation and verification are completely non-interactive.
• Accepts schemes where the number of required shares can be greater than f+1 (fits the agreement problem).
• Used by other projects: COCA (Cornell), SINTRA (IBM Zurich).
Threshold RSA: Signing
• A trusted dealer generates the public (e, n) and private (d) RSA keys, then splits the private key d into N shares, such that any k out of N are enough to reconstruct the secret.
• Select a random polynomial of degree k-1 (as in Shamir's secret sharing).
• Optimistic case: the combiner can check that the signature is correct by using the public key. Proofs of correctness and share verification are not needed in this case, while all cryptographic guarantees are maintained.
• Malicious case: the signature does not verify.
  – Detect which share(s) are incorrect using verifiable secret sharing; this requires proofs of correctness and share verification.
  – Potentially create a correct threshold signature by using shares other than the incorrect ones.
• The overall scheme may add a signature on the original share (instead of the proof).
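The Shamir-style splitting the dealer performs can be sketched over a prime field. This is a toy sketch of plain Shamir secret sharing only, not Shoup's actual scheme (which works modulo the RSA-derived secret modulus and combines shares in the exponent); all names and constants are illustrative:

```python
import random

PRIME = 2**127 - 1  # field large enough for the demo secret

def make_shares(secret, k, n):
    """Split `secret` into n shares; any k of them reconstruct it."""
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(k - 1)]
    return [(x, sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME)
            for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation of the polynomial at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

shares = make_shares(123456789, k=3, n=7)    # N=7, threshold k=3
print(reconstruct(shares[:3]) == 123456789)  # True: any 3 shares suffice
print(reconstruct(shares[2:5]) == 123456789) # True
```

Fewer than k shares reveal nothing about the secret, which is what lets a site tolerate f compromised replicas when k = f+1.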
Threshold RSA: Signing (cont.)
[Chart: time (0-250 ms) to generate a 1024-bit threshold RSA signature as a function of f (N=3f+1, k=f+1, f from 1 to 5); one curve combines k=f+1 partial signatures with proof verification, the other generates a threshold RSA signature with handling of proofs]
• Compared with f+1 regular RSA signatures:
  – Better than vector RSA if used inside a more sophisticated protocol.
• Issues to consider:
  – Rate of malicious behavior, ease of management, message size overhead, computation overhead.
• Current thinking:
  – May be used beyond the initial goal.
  – Some of its properties can help construct an overall better, single protocol, compared with the BFT and threshold crypto combination.
Overall Architecture
[Diagram: overall architecture. Each server replica (1 through 3f+1) in the local site runs Local Area Byzantine Replication, a Monitor, Wide Area Fault Tolerant Replication, and the A-DAG; replica 1 is the wide-area representative and the others are wide-area standbys. Clients connect within the local site; the replicas share a local area network, and the representative speaks for the site on the wide area network]
Scalability, Accountability and Instant Information Access for Network-Centric Warfare
http://www.cnds.jhu.edu/funding/srs/

New ideas:
• First scalable wide-area intrusion-tolerant replication architecture.
• Providing accountability for authorized but malicious client updates.
• Exploiting update semantics to provide instant and consistent information access.

Impact:
• Resulting systems with at least 3 times higher throughput, lower latency, and high availability for updates over wide area networks.
• Clear path for technology transitions into military C3I systems such as the Army Future Combat System.

Schedule (June 04 to Dec 05):
• C3I model, baseline and demo.
• Component analysis & design.
• Component implementation.
• System integration & evaluation.
• Final C3I demo and baseline evaluation.