Scalability, Accountability and Instant Information Access for Network Centric Warfare
Yair Amir, Claudiu Danilov, Jon Kirsch, John Lane, Jonathan Shapiro, Ciprian Tutu (Department of Computer Science, Johns Hopkins University)
Chi-Bun Chan, Cristina Nita-Rotaru, David Zage (Department of Computer Science, Purdue University)
28 Jan 05, http://www.cnds.jhu.edu
Network Centric Warfare Applications
• Operate in wide-area network settings; communication is often conducted over unreliable channels.
• Require timely decisions based on available information, although intermittent network connectivity means the information may not be the most recent or consistent.
• Weaker update semantics may be sufficient for several applications.
• Critical information is often not large.
• Every piece of information is usually generated by a unique source.
Fits many non-military applications as well
Dealing with Insider Threats
Project goals:
• Scaling survivable replication to wide area networks.
– Performance, performance, performance.
• Dealing with malicious clients.
– Compromised clients can inject authenticated but incorrect data.
– May be hard to detect on the fly.
– Malicious or just an honest error? The same tools are useful for both.
• Exploiting application update semantics for replication speedup in malicious environments.
– Will not be discussed today.
A Distributed Systems Service Model
• Message-passing system.
• Clients issue requests to servers, then wait for answers.
• Replicated servers process the request, then provide answers to clients.
[Figure: a site with server replicas 1 through N answering requests from clients]
Outline
• Introduction.
• Client Accountability.
  – Concept.
  – Performance viability.
  – Applications.
• Scaling wide-area intrusion-tolerant replication.
  – BFT: the current state of the art.
  – A new hierarchical approach.
  – Constructing a trusted entity in the local site (threshold cryptography based approach).
• Integration.
• Summary.
Compromised Clients
• Hard problem: compromised clients can inject authenticated but incorrect data into the system, misleading honest clients.
  – Authentication and access control are not sufficient.
  – An almost ignored problem.
• A new goal: generic tools for accountability enforcement.
  – Causality tracking of updates and dependencies, to facilitate instant analysis and regeneration of a clean state once corrupt data is flagged.
  – While not reinventing the wheel: corrupt updates are detected via external intrusion detection, application-specific knowledge, or a human in the loop.
Real-time Accountability Graph
• Solution: an Accountability Graph (also called the A-DAG): accountability enforcement and causality tracking of updates and dependencies in a Directed Acyclic Graph, with periodic snapshots.
[Figure: A-DAG of updates from clients C1 through C8 over time; corrupt and clean updates marked]
Real-time Accountability Graph in Action
• Marking: Upon detection of incorrect data, trace it to the corrupt update and from that, mark all causally-dependent updates as corrupted or suspected.
• Regeneration: real-time state regeneration based on the last good snapshot and the non-corrupted, non-suspected updates.
• Also useful for online what-if scenarios and for offline damage assessment.
[Figure: A-DAG of updates from clients C1 through C8 over time; corrupt, suspicious, and clean updates marked]
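The marking and regeneration steps above amount to a forward traversal of the dependency DAG. A minimal sketch follows; the talk does not show the A-DAG implementation, so the class and method names here are illustrative:

```python
from collections import defaultdict, deque

class ADag:
    """Illustrative accountability DAG: updates plus their causal dependencies."""

    def __init__(self):
        self.dependents = defaultdict(set)  # update -> updates that depend on it

    def add_update(self, uid, depends_on=()):
        self.dependents[uid]  # touch the entry so the node exists even with no dependents
        for d in depends_on:
            self.dependents[d].add(uid)

    def mark(self, corrupt_uid):
        """Mark the corrupt update and every causally-dependent update as suspected."""
        suspected, queue = set(), deque([corrupt_uid])
        while queue:
            u = queue.popleft()
            if u not in suspected:
                suspected.add(u)
                queue.extend(self.dependents[u])
        return suspected

    def regenerate(self, update_log, corrupt_uid):
        """Clean state = last good snapshot replayed with the unmarked updates."""
        bad = self.mark(corrupt_uid)
        return [u for u in update_log if u not in bad]

dag = ADag()
dag.add_update("u1")
dag.add_update("u2", depends_on=["u1"])
dag.add_update("u3", depends_on=["u1"])
dag.add_update("u4", depends_on=["u2", "u3"])
print(sorted(dag.mark("u2")))                          # ['u2', 'u4']
print(dag.regenerate(["u1", "u2", "u3", "u4"], "u2"))  # ['u1', 'u3']
```

Only the updates causally after u2 are flagged; u3, which does not depend on u2, survives into the regenerated state.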
Real-time Accountability Graph Optimization
• Premise: Non-compromised clients can be trusted with update dependency reporting.
• Result: Dramatically reduce false-positive rate of suspicious updates by eliminating FIFO links of non-compromised clients.
• Performance: Ability to track and traverse millions of updates within a couple of seconds.
[Figure: A-DAG of updates from clients C1 through C8 over time; corrupt, suspicious, and clean updates marked]
A-DAG Performance
• Quick enough for many applications.
[Chart: traversal time (0-3 sec) as a function of the number of updates (0-4.5 million), Accountability Graph data structure]
A-DAG Performance (cont.)
• Quick enough for many applications.
[Chart: traversal time (0-3.5 sec) as a function of the number of clients (0-120,000), keeping 4,000,000 updates constant, Accountability Graph data structure]
A-DAG Performance (cont.)
• Quick enough for many applications.
[Chart: traversal time (0-9 sec) as a function of the number of dependencies per update (0-35), keeping 4,000,000 updates and 100,000 clients constant, Accountability Graph data structure]
Application to the Common Operational Picture (COP)
Real-time Marking of Corrupt and Suspected Data
Application to Open Source Software Development
• Hard problem: vulnerability to life-cycle attacks.
• A new goal:
  – Quick analysis of the impact of a discovered life-cycle vulnerability (or vulnerabilities).
  – Insight into where to invest limited resources to monitor against future life-cycle attacks.
• Actually applicable to any software (not just open source).
Capability Dependencies in Red Hat Linux (1997-2004)
[Chart: cumulative distribution function of dependent capabilities; percentage of capabilities (0-100%) vs. number of dependent capabilities (0-50,000)]
15 distributions: Red Hat 4.1 to Fedora 2.
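A CDF like the one charted above can be computed directly from per-capability dependent counts. A small sketch; the counts here are made up for illustration, not the Red Hat data:

```python
def dependents_cdf(counts, thresholds):
    """Percentage of capabilities whose dependent count is <= each threshold."""
    n = len(counts)
    return {t: 100.0 * sum(1 for c in counts if c <= t) / n for t in thresholds}

# Hypothetical counts: most capabilities have few dependents, a handful have many.
counts = [0, 1, 1, 2, 3, 5, 8, 40, 90, 45000]
print(dependents_cdf(counts, [10, 100, 50000]))  # {10: 70.0, 100: 90.0, 50000: 100.0}
```

The heavy tail (one capability with 45,000 dependents) is exactly what makes a life-cycle attack on a widely depended-upon capability so damaging.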
Capability Dependencies in Red Hat Linux - Zooming In
[Chart: cumulative distribution function of dependent capabilities, zoomed in; percentage of capabilities (0-100%) vs. number of dependent capabilities (0-100)]
15 distributions: Red Hat 4.1 to Fedora 2.
State Machine Replication
• Main challenge: ensuring coordination between servers.
  – Requires agreement on the request to be processed and a consistent order of requests.
• Benign faults: Paxos [Lam98, Lam01] must contact f+1 out of 2f+1 servers and uses 2 rounds to allow consistent progress.
• Byzantine faults: BFT [CL99] must contact 2f+1 out of 3f+1 servers and uses 3 rounds to allow consistent progress.
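The quorum arithmetic above fits in a few lines. A sketch; the function name and return shape are illustrative, not from either protocol's codebase:

```python
def replication_parameters(f, byzantine):
    """Servers needed, and the quorum contacted for consistent progress.

    Benign faults (Paxos):   n = 2f+1, progress quorum f+1, 2 rounds.
    Byzantine faults (BFT):  n = 3f+1, progress quorum 2f+1, 3 rounds.
    """
    if byzantine:
        return {"n": 3 * f + 1, "quorum": 2 * f + 1, "rounds": 3}
    return {"n": 2 * f + 1, "quorum": f + 1, "rounds": 2}

print(replication_parameters(1, byzantine=False))  # {'n': 3, 'quorum': 2, 'rounds': 2}
print(replication_parameters(1, byzantine=True))   # {'n': 4, 'quorum': 3, 'rounds': 3}
```

Tolerating even one Byzantine fault already costs an extra server, a larger quorum, and an extra round compared with the benign case.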
A Replicated Server System
• Maintaining consistent servers [Sch90]:
  – To tolerate f benign faults, 2f+1 servers are needed.
  – To tolerate f malicious faults, 3f+1 servers are needed.
• Responding to read-only client requests [Sch90]:
  – If the servers suffer only benign faults, 1 answer is enough.
  – If the servers can be malicious, the client must wait for f+1 identical answers, f being the number of malicious servers.
Peer Byzantine Replication Limitations
• Limited scalability due to the 3-round all-peer exchange.
• Strong connectivity is required.
  – 2f+1 (out of 3f+1) to allow progress, and f+1 to get an answer.
• Partitions are a real issue.
• Clients depend on remote information.
  – Bad news: provably optimal, so we need to pay something to get something else.
• Constructs a consistent total order.
• Focus is solely on replica protection.
Peer Byzantine Replication: BFT [CL99]
The state of the art in Byzantine replication.
Symmetric Wide Area Network
• Synthetic network used for analysis and understanding.
• 5 sites, each connected to all other sites by equal-latency links.
• Each site has 4 replicas (except one site with 3 replicas due to current BFT setup).
• Total – 19 replicas in the system.
• Each wide area link has a 10Mbits/sec capacity.
• Varied wide area latencies between 10ms - 400ms.
Practical Wide-Area Network
• A real experimental network (CAIRN), modeled in the Emulab facility.
• Capacity of the wide-area links was modified to 10 Mbits/sec to better reflect current realities.
[Diagram: CAIRN topology. Nodes ISIPC, ISIPC4, TISWPC, ISEPC3, ISEPC, UDELPC, MITPC, spanning Virginia, Delaware, Boston, San Jose, and Los Angeles; wide-area links of 38.8 ms / 1.86 Mbits/sec, 4.9 ms / 9.81 Mbits/sec, 3.6 ms / 1.42 Mbits/sec, and 1.4 ms / 1.47 Mbits/sec; local links of 100 Mb/s at under 1 ms]
BFT Wide Area Performance
• BFT used almost out of the box (a very good prototype).
• 19 replicas.
• Does not write to disk.
[Chart: update latency (0-3000 ms) as a function of network diameter (0-400 ms), symmetric topology, with 1 client and with 5 clients]
BFT Wide Area Performance (Cont).
• BFT used almost out of the box (a very good prototype).
• 19 replicas.
• Does not write to disk.
[Chart: update latency (0-1400 ms) as a function of the CAIRN network multiple (0-5; the original CAIRN network has a diameter of 45 ms), with 1 to 5 clients]
BFT Wide Area Performance (cont.)
• Note: A 50ms Symmetric network vs. a Native CAIRN network.
[Chart: throughput (0-9 updates per second) as a function of the number of clients (0-6), CAIRN (45 ms diameter) vs. symmetric (50 ms diameter)]
A New Approach: Hierarchical Architecture
• Each site acts as a trusted logical unit that can crash or partition.
• Between sites:
  – Fault-tolerant protocols.
  – Alternatively, Byzantine protocols between sites as well.
• There is no free lunch – we pay with more hardware…
[Figure: a site with server replicas 1 through 3f+1 serving clients]
Constructing a Trusted Entity in the Local Site
• No trust between participants in a site.
  – A site acts as one unit that can only crash if the assumptions are met.
• Initial idea: use BFT-like [CL99, YMVAD03] protocols to mask local Byzantine replicas.
• How to make sure that local Byzantine replicas cannot misrepresent the site on the wide area network?
  – Threshold cryptography seems a good direction.
  – Also appealing in terms of management.
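The requirement that Byzantine replicas cannot misrepresent their site reduces to a counting argument: a wide-area peer accepts a site message only when more than f distinct local replicas stand behind it. With threshold cryptography, this check becomes verifying a single combined signature; the sketch below shows only the underlying rule, with illustrative names:

```python
def site_message_valid(endorsing_replicas, f):
    """Accept a site's wide-area message only when more than f distinct
    local replicas endorsed it; f Byzantine replicas alone cannot
    fabricate such a message."""
    return len(set(endorsing_replicas)) >= f + 1

# f = 1: a lone compromised replica cannot speak for the site.
print(site_message_valid({"r1", "r2"}, f=1))  # True
print(site_message_valid({"r1"}, f=1))        # False
```

A (f+1)-out-of-N threshold signature enforces exactly this rule cryptographically, without the receiver ever seeing the individual endorsements.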
Hierarchical Architecture Details
[Diagram: hierarchical architecture. Each server replica (1 through 3f+1) in the local site runs Local Area Byzantine Replication, a Monitor, and Wide Area Fault Tolerant Replication; replica 1 is the wide-area representative and the others are wide-area standbys. Clients connect within the local site; the replicas share a local area network, and the representative speaks for the site on the wide area network]
BFT As a Potential Building Block
• Fault model:
  – Fewer than a third of the replicas can have faults of any kind, including benign faults; not practical in our opinion.
  – Will need to write to disk to protect against partial amnesia under benign faults.
• Consequence:
  – Current numbers underestimate the baseline cost.
  – Real latency will be higher due to disk writes in each round.
• In addition:
  – A very good implementation to demonstrate the concept.
  – But not a building block for us going forward (some stability and robustness issues).
    • Can be solved with a new implementation.
BFT Local Area Performance
• BFT in a local area network, 19 replicas, no disk writes.
[Chart: throughput (0-180 updates per second) as a function of the number of clients (0-12) on a 100 Mbits LAN]
BFT Local Area Performance
• BFT in a local area network, 19 replicas, no disk writes.
[Chart: update latency (0-70 ms) as a function of the number of clients (0-12) on a 100 Mbits LAN]
Threshold Digital Signatures
• Problem: N entities authenticate a message by generating one signature, such that any k entities can create a valid signature but k-1 cannot.
• Solution: threshold digital signatures (no party knows the secret key, only its individual share).
• Issues with threshold digital signatures:
  – Trusted dealer vs. decentralized key generation.
  – An insider can submit a 'bad' share; this requires verifiable secret sharing, but when should it be applied?
  – Highly interactive key generation and share verification.
  – The size of the signature increases linearly with the number of players.
RSA Threshold Digital Signatures
• Our choice: the RSA threshold signature proposed by Shoup in 1999 (Practical Threshold Signatures, Eurocrypt 2000).
• Provides verifiable secret sharing.
• Size of the signature is bounded by a constant multiple of n, where n is the RSA modulus.
• Security proof in the random oracle model.
• Signature share generation and verification are completely non-interactive.
• Accepts schemes where the number of required shares can be greater than f+1 (fits the agreement problem).
• Used by other projects: COCA (Cornell), SINTRA (IBM Zurich).
Threshold RSA: Signing
• A trusted dealer generates the public (e, n) and private (d) RSA keys, then splits the private key d into N shares, such that any k out of N are enough to reconstruct the secret.
• Select a random polynomial of degree k-1 (as in Shamir's secret sharing).
• Optimistic case: the combiner can check that the signature is correct by using the public key. Proofs of correctness and share verification are not needed in this case, while all cryptographic guarantees are maintained.
• Malicious case: the signature does not verify.
  – Detect which share(s) are incorrect using verifiable secret sharing; this requires proofs of correctness and share verification.
  – Potentially create a correct threshold signature by using shares other than the incorrect ones.
• The overall scheme may add a signature on the original share (instead of the proof).
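The Shamir-style splitting the dealer performs can be sketched over a prime field. This is a toy sketch of plain Shamir secret sharing only, not Shoup's actual scheme (which works modulo the RSA-derived secret modulus and combines shares in the exponent); all names and constants are illustrative:

```python
import random

PRIME = 2**127 - 1  # field large enough for the demo secret

def make_shares(secret, k, n):
    """Split `secret` into n shares; any k of them reconstruct it."""
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(k - 1)]
    return [(x, sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME)
            for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation of the polynomial at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

shares = make_shares(123456789, k=3, n=7)    # N=7, threshold k=3
print(reconstruct(shares[:3]) == 123456789)  # True: any 3 shares suffice
print(reconstruct(shares[2:5]) == 123456789) # True
```

Fewer than k shares reveal nothing about the secret, which is what lets a site tolerate f compromised replicas when k = f+1.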
Threshold RSA: Signing (cont.)
[Chart: time (0-250 ms) to generate a 1024-bit threshold RSA signature as a function of f (N=3f+1, k=f+1, f from 1 to 5); one curve combines k=f+1 partial signatures with proof verification, the other generates a threshold RSA signature with handling of proofs]
• Compared with f+1 regular RSA signatures:
  – Better than vector RSA if used inside a more sophisticated protocol.
• Issues to consider:
  – Rate of malicious behavior, ease of management, message size overhead, computation overhead.
• Current thinking:
  – May be used beyond the initial goal.
  – Some of its properties can help construct an overall better, single protocol, compared with the BFT and threshold crypto combination.
Overall Architecture
[Diagram: overall architecture. Each server replica (1 through 3f+1) in the local site runs Local Area Byzantine Replication, a Monitor, Wide Area Fault Tolerant Replication, and the A-DAG; replica 1 is the wide-area representative and the others are wide-area standbys. Clients connect within the local site; the replicas share a local area network, and the representative speaks for the site on the wide area network]
Scalability, Accountability and Instant Information Access for Network-Centric Warfare
http://www.cnds.jhu.edu/funding/srs/

New ideas:
• First scalable wide-area intrusion-tolerant replication architecture.
• Providing accountability for authorized but malicious client updates.
• Exploiting update semantics to provide instant and consistent information access.

Impact:
• Resulting systems with at least 3 times higher throughput, lower latency, and high availability for updates over wide area networks.
• Clear path for technology transitions into military C3I systems such as the Army Future Combat System.

Schedule (June 04 to Dec 05):
• C3I model, baseline and demo.
• Component analysis & design.
• Component implementation.
• System integration & evaluation.
• Final C3I demo and baseline evaluation.