Initial Data Access Module & Lustre Deployment

Tan Li

Dec 21, 2015


Outline

• Disk I/O test for netqos03 and netqos04

• Initial design for file I/O module
  – Data read with different functions and buffer sizes
  – Data read with fread() with different waiting times and buffer sizes
  – Some conclusions

• Intro to Lustre setup

• Lustre deployment for the new servers


Initial Design for Data Access

Current data access module (block sizes: 100K, 1M, 10M, 100M and 500M for a 100G file)
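The block sizes above imply very different numbers of I/O calls for the same 100G file. The short C sketch below makes that arithmetic explicit. It assumes K, M and G are binary units (1K = 1024 bytes), which the slide does not state, and blocks_needed() and print_block_counts() are illustrative names, not part of the module.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: K/M/G on the slide are binary units (1K = 1024 bytes). */
#define KIB (UINT64_C(1) << 10)
#define MIB (UINT64_C(1) << 20)
#define GIB (UINT64_C(1) << 30)

/* Number of block-sized I/O calls needed to cover file_size bytes.
 * Ceiling division: a final partial block still costs one call. */
uint64_t blocks_needed(uint64_t file_size, uint64_t block_size)
{
    assert(block_size > 0); /* guard against division by zero */
    return (file_size + block_size - 1) / block_size;
}

/* Print the call count and the size of the final (possibly partial)
 * block for each block size used with the 100G test file. */
void print_block_counts(void)
{
    const uint64_t file_size = 100 * GIB;
    const uint64_t sizes[] = {100 * KIB, 1 * MIB, 10 * MIB, 100 * MIB, 500 * MIB};

    for (size_t i = 0; i < sizeof sizes / sizeof sizes[0]; i++) {
        uint64_t tail = file_size % sizes[i];
        printf("block %10" PRIu64 " B: %8" PRIu64 " calls, last block %" PRIu64 " B\n",
               sizes[i], blocks_needed(file_size, sizes[i]),
               tail ? tail : sizes[i]);
    }
}
```

Under the binary-unit assumption, 100K blocks need 1,048,576 calls, while 500M blocks need only 205 calls, and the last of those moves 400M rather than a full block.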


Initial design for file I/O module

1. Header file: ftp_io.h
2. Data access functions:

    int ftp_open(char *path, int block_size, int mode);
    int ftp_read(int infile_fd, char *out_buf, int block_size);
    int ftp_write(int outfile_fd, char *in_buf, int block_size);
    int ftp_close(int close_fd, int block);

Usage of ftp_open(): the block size passed to the function decides the open method (open, fopen, or open with O_DIRECT), and the close method used by ftp_close() must match the method chosen by ftp_open(). mode=0 opens the file for reading, and mode=1 opens it for writing.
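A minimal sketch of how ftp_read(), ftp_write() and ftp_close() could be built on POSIX calls, using the signatures listed above. It assumes every path through ftp_open() yields a plain file descriptor (an fopen() path would need a FILE * or fileno() instead), and the retry-on-short-read/write behavior is an illustrative choice, not part of the original design.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Sketch only: assumes ftp_open() always returns a POSIX file descriptor. */

/* Read up to block_size bytes, retrying short reads and EINTR.
 * Returns the number of bytes read (0 at end of file) or -1 on error. */
int ftp_read(int infile_fd, char *out_buf, int block_size)
{
    int total = 0;

    assert(block_size >= 0);
    while (total < block_size) {
        ssize_t n = read(infile_fd, out_buf + total, (size_t)(block_size - total));
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: try again */
            return -1;
        }
        if (n == 0)             /* end of file */
            break;
        total += (int)n;
    }
    return total;
}

/* Write exactly block_size bytes. Returns block_size or -1 on error. */
int ftp_write(int outfile_fd, char *in_buf, int block_size)
{
    int total = 0;

    assert(block_size >= 0);
    while (total < block_size) {
        ssize_t n = write(outfile_fd, in_buf + total, (size_t)(block_size - total));
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        total += (int)n;
    }
    return total;
}

/* Every open path in this sketch yields a plain descriptor, so close()
 * suffices; block is kept only to match the slide's signature. */
int ftp_close(int close_fd, int block)
{
    (void)block;
    return close(close_fd);
}
```

Note that with O_DIRECT a short read in the middle of a file would leave the offset unaligned; the loop above is only safe for that case when short reads happen at end of file.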


Initial design for file I/O module

Flow of ftp_open():

• Block size > 400K?
  – No: open with open/fopen
  – Yes: open with O_DIRECT
• Mode = 0 or 1?
  – 0: read only
  – 1: write only
• Return the file descriptor
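The flow above could be sketched in C as follows. This is an illustration under stated assumptions: 400K is taken as 400 * 1024 bytes, the small-block branch uses open() rather than fopen() so that the function can return an int descriptor, and write mode is assumed to create or truncate the file with 0644 permissions. FTP_DIRECT_THRESHOLD is a hypothetical name.

```c
#define _GNU_SOURCE /* exposes O_DIRECT on Linux; must precede all includes */
#include <assert.h>
#include <fcntl.h>

/* Hypothetical threshold from the flowchart; assumes 400K = 400 * 1024. */
#define FTP_DIRECT_THRESHOLD (400 * 1024)

/* Sketch of the ftp_open() flowchart: block size picks buffered vs.
 * O_DIRECT access, mode picks read-only (0) vs. write-only (1).
 * Returns a file descriptor, or -1 on error (errno set by open()). */
int ftp_open(char *path, int block_size, int mode)
{
    int flags;

    assert(mode == 0 || mode == 1);
    flags = (mode == 0) ? O_RDONLY : (O_WRONLY | O_CREAT | O_TRUNC);
    if (block_size > FTP_DIRECT_THRESHOLD)
        flags |= O_DIRECT;  /* bypass the page cache for large blocks */
    return open(path, flags, 0644);
}
```

Keeping the decision inside ftp_open() means callers only pass a block size; the trade-off is that a descriptor opened with O_DIRECT then imposes alignment rules on every later ftp_read()/ftp_write() call.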


Initial design for file I/O module: problem with O_DIRECT when writing data

When writing data with O_DIRECT, each block must be a multiple of 512 bytes on our platform. So we will have a problem writing the last few bytes of a file whose size is not a multiple of 512 bytes.

Possible solutions:
1. Use the regular write() to output the remaining data.
2. Integrate the open function into the read and write functions.
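Solution 1 can be sketched as follows, under assumptions not stated on the slide: the buffer and the current file offset are already 512-byte aligned, and O_DIRECT is switched off on the open descriptor with fcntl(F_SETFL) before the tail is written (Linux allows F_SETFL to change O_DIRECT). split_direct() and write_with_tail() are hypothetical helper names.

```c
#define _GNU_SOURCE /* exposes O_DIRECT on Linux; must precede all includes */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

#define DIRECT_ALIGN 512 /* alignment reported for the slides' platform */

/* Split len into the largest multiple of DIRECT_ALIGN and the remainder. */
void split_direct(size_t len, size_t *aligned, size_t *tail)
{
    *tail = len % DIRECT_ALIGN;
    *aligned = len - *tail;
}

/* Write the aligned bulk through the (possibly O_DIRECT) descriptor, then
 * clear O_DIRECT and write the remaining bytes with a regular write().
 * Assumes buf and the current file offset are DIRECT_ALIGN-aligned.
 * A short aligned write is treated as an error, since retrying from an
 * unaligned offset would fail under O_DIRECT. Returns 0 or -1. */
int write_with_tail(int fd, const char *buf, size_t len)
{
    size_t aligned, tail;
    int flags;

    assert(buf != NULL || len == 0);
    split_direct(len, &aligned, &tail);
    if (aligned > 0 && write(fd, buf, aligned) != (ssize_t)aligned)
        return -1;
    if (tail == 0)
        return 0;

    /* The tail ends the file, so O_DIRECT is not restored afterwards. */
    flags = fcntl(fd, F_GETFL);
    if (flags < 0 || fcntl(fd, F_SETFL, flags & ~O_DIRECT) < 0)
        return -1;
    if (write(fd, buf + aligned, tail) != (ssize_t)tail)
        return -1;
    return 0;
}
```

In a real caller the buffer would come from an aligned allocator such as posix_memalign() with 512-byte alignment.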


Data reading test on fread()

1. Test results from the Linux time tool
2. Test results from nmon (recording data every two seconds)
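The fread() experiment can be approximated with a small harness like the one below. It is a sketch, not the tool used for the measurements: the "waiting time" is assumed to be a sleep between successive fread() calls, and fread_bench() is a hypothetical name. Bandwidth is the returned byte count divided by the elapsed seconds.

```c
#define _POSIX_C_SOURCE 200809L /* clock_gettime(), nanosleep() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Read path with fread() using buf_size-byte buffers, sleeping wait_us
 * microseconds after each non-empty read. Returns the bytes read and
 * stores elapsed wall-clock seconds in *secs, or returns -1 on error. */
long long fread_bench(const char *path, size_t buf_size, long wait_us, double *secs)
{
    struct timespec start, end;
    struct timespec delay = { .tv_sec = wait_us / 1000000,
                              .tv_nsec = (wait_us % 1000000) * 1000 };
    long long total = 0;
    size_t n;
    char *buf;
    FILE *fp;

    assert(buf_size > 0 && wait_us >= 0);
    fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;
    buf = malloc(buf_size);
    if (buf == NULL) {
        fclose(fp);
        return -1;
    }

    clock_gettime(CLOCK_MONOTONIC, &start);
    while ((n = fread(buf, 1, buf_size, fp)) > 0) {
        total += (long long)n;
        if (wait_us > 0)
            nanosleep(&delay, NULL); /* simulated per-block processing delay */
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    *secs = (double)(end.tv_sec - start.tv_sec)
          + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
    if (ferror(fp))
        total = -1;
    free(buf);
    fclose(fp);
    return total;
}
```

Sweeping buf_size (for example 100K against 1000K) and wait_us reproduces the shape of the experiment; the numbers will depend on the disk, the page cache and the kernel.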


Data reading test on fread(): some conclusions

The bandwidth grows as the buffer size increases, especially when the buffer size changes from 100K to 1000K (a 3-times increase in bandwidth).

The bandwidth is insensitive to the wait time until the wait time reaches some threshold, and the larger the buffer size, the less sensitive the bandwidth is to the delay.

The CPU utilization is 0% when the buffer size is below 100K, and it grows as the buffer size increases.


iWARP and InfiniBand

Feature | InfiniBand | iWARP
Hardware | Specialized I/O structure | A set of mechanisms over Ethernet that move data management and network protocol processing to the RNIC card
Transport method | Point-to-point | End to end
Compatibility | Specialized infrastructure | Fully compatible with existing Ethernet switching
Vendors | Only two: Mellanox and QLogic | A broad range of vendors


RoCEE

RoCEE = InfiniBand over Ethernet (IBoE)

The RDMA over Converged Enhanced Ethernet (RoCEE) protocol proposal is designed to allow RDMA semantics to be deployed on a Converged Enhanced Ethernet fabric by running the IB transport protocol over Ethernet frames. In other words, it takes the InfiniBand transport layer and packages it into Ethernet frames, instead of using the iWARP protocol for Ethernet-based high-performance cluster networking.


RoCEE problems

Problem 1: iWARP already achieves the performance benefits that RoCEE promises.
Problem 2: RoCEE is hard to implement.
Problem 3: RoCEE depends on the deployment of 10GbE CEE infrastructure; currently only one vendor (Cisco) offers CEE switches, and they are relatively expensive.


Thanks & Questions