Design & Implementation of LH*RS: a Highly-Available Distributed Data Structure
Rim Moussa, [email protected], http://ceria.dauphine.fr/rim/rim.html
Thomas J.E. Schwarz, [email protected], http://www.cse.scu.edu/~tschwarz/homepage/thomas_schwarz.html
Workshop in Distributed Data & Structures, July 2004
Ethernet Network: max bandwidth of 1 Gbps
Operating System: Windows 2K Server
Tested configuration
1 Client
A group of 4 Data Buckets
k Parity Buckets, k ∈ {0, 1, 2}
LH*RS
File Creation
File Creation
Client Operation
Propagation of each Insert/Update/Delete on a Data Record to the Parity Buckets
Data Bucket Split
Splitting Data Bucket
PBs: (Records that remain) N Deletes from old rank & N Inserts at new rank + (Records that move) N Deletes
New Data Bucket
PBs: N Inserts (Moved Records)
All updates are gathered in the same buffer and transferred (TCP/IP) simultaneously to the respective Parity Buckets of the Splitting DB & New DB.
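The parity traffic generated by a data bucket split, as listed on this slide, can be sketched as follows. This is only an illustration under stated assumptions, not the LH*RS code: the function name `split_updates`, the `(operation, rank, key)` tuple layout, and the hash function `key mod 2**j` with a single initial bucket are all hypothetical choices made for the example. A record's rank is taken to be its position in the bucket.

```python
from typing import List, Tuple

Update = Tuple[str, int, int]  # (operation, rank, key); layout is an assumption


def split_updates(keys: List[int], bucket: int, level: int) -> Tuple[List[Update], List[Update]]:
    """Collect the parity updates caused by splitting one data bucket.

    keys[r] is the key stored at rank r of the splitting bucket.
    Assumes linear hashing with h_j(key) = key mod 2**j (hypothetical).
    """
    new_modulus = 2 ** (level + 1)
    old_pb_updates: List[Update] = []  # buffer for the PBs of the splitting DB
    new_pb_updates: List[Update] = []  # buffer for the PBs of the new DB
    stay_rank = 0
    move_rank = 0
    for old_rank, key in enumerate(keys):
        if key % new_modulus == bucket:
            # Record remains: delete from old rank, insert at new rank.
            old_pb_updates.append(("delete", old_rank, key))
            old_pb_updates.append(("insert", stay_rank, key))
            stay_rank += 1
        else:
            # Record moves: delete in the splitting DB's group,
            # insert in the new DB's group.
            old_pb_updates.append(("delete", old_rank, key))
            new_pb_updates.append(("insert", move_rank, key))
            move_rank += 1
    return old_pb_updates, new_pb_updates


old_batch, new_batch = split_updates([0, 1, 2, 3, 4, 5, 6, 7], bucket=0, level=0)
print(len(old_batch), len(new_batch))  # prints "12 4"
```

Each returned list stands for one buffer that would be shipped in a single transfer to the corresponding parity buckets, rather than one message per record.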
File Creation Perf.
Experiments Set-up
File of 25 000 data records; 1 data record = 104 B
Finally, we improved the processing time of the RS decoding process by 4% to 8%
1 DB is recovered in half a second
Conclusion
LH*RS
Mature Implementation
Many Optimization Iterations
Only SDDS with Scalable Availability
Future Work
Better Parity Update Propagation Strategy to PBs
Investigation of faster Encoding/Decoding processes
References
[Patterson et al., 88] D. A. Patterson, G. Gibson & R. H. Katz, A Case for Redundant Arrays of Inexpensive Disks, Proc. of ACM SIGMOD Conf., pp. 109-116, June 1988.
[ISI, 81] Information Sciences Institute, RFC 793: Transmission Control Protocol (TCP) – Specification, Sept. 1981, http://www.faqs.org/rfcs/rfc793.html
[MacDonald & Barkley, 00] D. MacDonald, W. Barkley, MS Windows 2000 TCP/IP Implementation Details, http://secinf.net/info/nt/2000ip/tcpipimp.html
[Jacobson, 88] V. Jacobson, M. J. Karels, Congestion Avoidance and Control, Computer Communication Review, Vol. 18, No. 4, pp. 314-329.
[Xu et al., 99] L. Xu & J. Bruck, X-Code: MDS Array Codes with Optimal Encoding, IEEE Trans. on Information Theory, 45(1), pp. 272-276, 1999.
[Corbett et al., 04] P. Corbett, B. English, A. Goel, T. Grcanac, S. Kleiman, J. Leong, S. Sankar, Row-Diagonal Parity for Double Disk Failure Correction, Proc. of the 3rd USENIX Conf. on File and Storage Technologies, April 2004.
[Rabin, 89] M. O. Rabin, Efficient Dispersal of Information for Security, Load Balancing and Fault Tolerance, Journal of the ACM, Vol. 36, No. 2, April 1989, pp. 335-348.
[White, 91] P.E. White, RAID X tackles design problems with existing design RAID schemes, ECC Technologies, ftp://members.aol.com.mnecctek.ctr1991.pdf
[Blomer et al., 95] J. Blomer, M. Kalfane, R. Karp, M. Karpinski, M. Luby & D. Zuckerman, An XOR-Based Erasure-Resilient Coding Scheme, ICSI Tech. Rep. TR-95-048, 1995.
References (Contd.)
[Litwin & Schwarz, 00] W. Litwin & T. Schwarz, LH*RS: A High-Availability Scalable Distributed Data Structure using Reed Solomon Codes, pp. 237-248, Proceedings of the ACM SIGMOD 2000.
[Karlsson et al., 96] J. Karlsson, W. Litwin & T. Risch, LH*LH: A Scalable high performance data structure for switched multicomputers, EDBT 96, Springer Verlag.
[Reed & Solomon, 60] I. Reed & G. Solomon, Polynomial codes over certain Finite Fields, Journal of the Society for Industrial and Applied Mathematics, 1960.
[Plank, 97] J. S. Plank, A Tutorial on Reed-Solomon Coding for Fault-Tolerance in RAID-like Systems, Software: Practice & Experience, 27(9), Sept. 1997, pp. 995-1012.
[Diène, 01] A.W. Diène, Contribution à la Gestion de Structures de Données Distribuées et Scalables, PhD Thesis, Nov. 2001, Université Paris Dauphine.
[Bennour, 00] F. Sahli Bennour, Contribution à la Gestion de Structures de Données Distribuées et Scalables, PhD Thesis, June 2000, Université Paris Dauphine.
1st parity bucket executes XOR calculus instead of RS calculus: gain in encoding performance of 20%
1st line of '1's: each PB executes XOR calculus for any update from the 1st DB of any group: gain in performance of