Transcript
  • CT 320: Network and System Administration, Fall 2014*

    Dr. Indrajit Ray Email: [email protected]

    Department of Computer Science

    Colorado State University Fort Collins, CO 80528, USA

    Dr. Indrajit Ray, Computer Science Department, CT 320 Network and Systems Administration, Fall 2014

    * Thanks to Dr. James Walden, NKU and Russ Wakefield, CSU for the contents of these slides

  • Disks


  • Topics


    1. Disk components
    2. Disk interfaces
    3. Lifecycle of a disk
    4. Performance
    5. Reliability
    6. RAID
    7. Adding a disk
    8. Logical volumes
    9. Filesystems

  • Hard Drive Components


  • Physical Disk Geometry

    One head for each surface

    All tracks at the same radius (r = d_n) form a cylinder

    Each sector holds 512+ bytes of information

    One surface is dedicated to positioning and synchronization

    Not all portions of the disk are addressable by the OS


  • Hard Drive Components

    Actuator: Moves the arm across the disk to read/write data. The arm has multiple read/write heads (often 2 per platter).

    Platters: Rigid substrate material. A thin coating of magnetic material stores the data. The coating type determines areal density (Gbits/in²).

    Spindle Motor: Spins the platters at 3600-15,000 rpm. Speed determines disk latency.

    Cache: 2-16 MB of cache memory, often more. Reliability concern: write-back vs. write-through.


  • Disk Information: hdparm


    # hdparm -i /dev/hde

    /dev/hde:
     Model=WDC WD1200JB-00CRA1, FwRev=17.07W17, SerialNo=WD-WMA8C4533667
     Config={ HardSect NotMFM HdSw>15uSec SpinMotCtl Fixed DTR>5Mbs FmtGapReq }
     RawCHS=16383/16/63, TrkSize=57600, SectSize=600, ECCbytes=40
     BuffType=DualPortCache, BuffSize=8192kB, MaxMultSect=16, MultSect=off
     CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=234441648
     IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
     PIO modes: pio0 pio1 pio2 pio3 pio4
     DMA modes: mdma0 mdma1 mdma2
     UDMA modes: udma0 udma1 udma2 udma3 udma4 *udma5
     AdvancedPM=no WriteCache=enabled
     Drive conforms to: device does not report version

    * signifies the current active mode
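    The LBAsects field above implies the drive's usable capacity: LBA sector count × 512 bytes. A quick sanity check of the figures from that output (plain awk arithmetic, no device access needed):

```shell
# Capacity implied by the hdparm output above: LBAsects x 512-byte sectors.
awk 'BEGIN { sects = 234441648; printf "%.1f GB\n", sects * 512 / 1e9 }'
# prints: 120.0 GB (consistent with the WD1200 model number, a 120 GB drive)
```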

  • Disk Performance

    Seek Time: Time to move the head to the desired track (3-8 ms)

    Rotational Delay: Time until the head is over the desired block (~8 ms for 7200 rpm)

    Latency: Seek Time + Rotational Delay

    Throughput: Data transfer rate (20-80 MB/s)
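    These figures combine into an average access latency. A minimal back-of-the-envelope sketch (the 5 ms seek and 7200 rpm are assumed example values, not a measurement):

```shell
# Average latency = seek time + average rotational delay (half a revolution).
# One full revolution at 7200 rpm takes 60000/7200 = 8.33 ms, which matches
# the ~8 ms worst case quoted above; the average wait is half of that.
awk 'BEGIN { seek = 5; rpm = 7200; rot = 60000 / rpm / 2; printf "%.2f ms\n", seek + rot }'
# prints: 9.17 ms
```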


  • Latency vs. Throughput

    Which is more important? Depends on the type of load.

    Sequential access favors throughput: e.g., multimedia on a single-user PC

    Random access favors latency: e.g., most servers

    How to improve performance: Faster disks. Caching. More spindles (disks). More disk controllers.


  • Disk Performance: hdparm


    # hdparm -tT /dev/hde

    /dev/hde:
     Timing cached reads:        876 MB in 2.00 seconds = 437.41 MB/sec
     Timing buffered disk reads:  88 MB in 3.08 seconds =  28.60 MB/sec
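    The reported rates are just the quotient of data moved and elapsed time; e.g., for the buffered disk read above:

```shell
# Buffered disk read rate from the hdparm run above: 88 MB in 3.08 s.
awk 'BEGIN { printf "%.1f MB/s\n", 88 / 3.08 }'
# prints: 28.6 MB/s
```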

  • Reliability

    MTBF: Average time between failures (often quoted as >1,000,000 hours).

    Real failure curves: Early phase: high failure rate from defects. Constant-failure-rate phase: MTBF valid. Wearout phase: high failure rate from wear.

    Failures are more likely on traumatic events, such as power on/off.

    Systems often wear out before MTBF. The average life span of a disk is about 5 years.


  • Solid State Drives

    Flash-memory-based solid state drives: No moving parts. Much higher I/O performance than hard disks; random reads in particular are very fast. Less prone to failure (more reliable).

    Higher cost. Uses NAND memory.


  • NAND Flash Constraints (1)

    A flash module is divided into blocks, pages, and sectors. E.g., 1 GB = 8K blocks of 64 pages of 4 sectors of 512 bytes.
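    The example geometry multiplies out to exactly 1 GiB, which is an easy way to check such figures:

```shell
# 8K blocks x 64 pages/block x 4 sectors/page x 512 bytes/sector = 1 GiB.
awk 'BEGIN { printf "%d bytes\n", 8192 * 64 * 4 * 512 }'
# prints: 1073741824 bytes
```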

    Read/write happens at page granularity (as with disks). Writes are more time- and energy-consuming than reads (by a factor of 3 to 10).

    Pages must be written sequentially within a block. Erase happens at block granularity.

    Erase-before-rewrite constraint: an erase is ~10 times more costly than a page write.

    A block wears out after about 10^6 write/erase cycles.


  • NAND Flash Constraints (2)

    These hardware constraints usually force updates to be made out of place.

    A Flash Translation Layer (FTL) is required for: address translation, wear leveling, garbage collection.

    The FTL is a main source of unpredictability: it is very badly adapted to random writes and provides no guarantee against read/write failures.


  • Disk Interfaces


    SCSI: Standard interface for servers.

    IDE: Standard interface for PCs.

    Fibre Channel: High bandwidth. Can run SCSI or IP.

    USB: Fast enough for slow devices on PCs.

  • SCSI

    Small Computer Systems Interface. Fast, reliable, expensive.

    A bus, not a simple PC to device interface. Each device has a target # ranging 0-7 or 0-15. Devices can communicate directly w/o CPU.

    Many versions. Original: SCSI-1 (1979), 5 MB/s. Current: SCSI-3 (2001), 320 MB/s.

    Serial Attached SCSI (SAS): Up to 128 devices. Up to 2 GB/s full duplex.


  • IDE

    Integrated Drive Electronics / AT Attachment. Slower, less reliable, cheap. Only allows 2 devices per interface. The ATAPI standard added removable devices.

    Many versions. Original: IDE / ATA (1984). Current: Ultra-ATA/133, 133 MB/s.

    Serial ATA: Up to 128 devices. 1.5 Gbit/s; newer standard up to 6 Gbit/s.


  • IDE vs. SCSI

    SCSI offers better performance and scalability: Faster bus. Faster hard drives (up to 15,000 rpm). Lower CPU usage. Better handling of multiple requests.

    Cheaper IDE is often best for workstations. Convergence:

    SATA2 and SAS converging on a single standard.


  • Other Host Interfaces

    PCI Express Speeds up to 2.0 GB/s

    Fibre Channel: Very high speed achievable. Can support a variety of network communication protocols such as SCSI / IP.

    Almost exclusively used for servers. USB, Firewire:

    Generally much slower and hence not used for internal disks

    USB 3.0 promises speeds of up to 5 Gbit/s


  • RAID

    Redundant Array of Independent Disks. Can be implemented in hardware or software. Hardware RAID controllers add:

    Caching. Automated rebuilding of arrays.

    Advantages: Capacity, reliability, fault-tolerance, throughput.


  • RAID Levels


    RAID 0: Striped evenly for performance. MTBF = (avg disk MTBF) / (# disks).
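    The MTBF formula is easy to apply; e.g., with four disks each rated at 1,000,000 hours (hypothetical figures):

```shell
# RAID 0: any single disk failure loses the whole array, so
# array MTBF = per-disk MTBF / number of disks.
awk 'BEGIN { mtbf = 1000000; n = 4; printf "%d hours\n", mtbf / n }'
# prints: 250000 hours
```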

  • RAID Levels (contd)


    RAID 1: Mirrored for reliability. Every write goes to each disk of the set.

    Seek time is effectively halved, as reads are split between the disks.

    RAID 0 + 1: Striped + mirrored

  • RAID Levels


    RAID 5: Striped with distributed parity. Block striping, not disk striping. Can lose one disk of the set without losing data.
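    Distributed parity costs one disk's worth of capacity. A quick example with five hypothetical 2 TB disks:

```shell
# RAID 5 usable capacity = (n - 1) x disk size; one disk's worth holds parity.
awk 'BEGIN { n = 5; size = 2; printf "usable %d TB of %d TB raw\n", (n - 1) * size, n * size }'
# prints: usable 8 TB of 10 TB raw
```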

  • RAID Levels

    JBOD: Concatenated for capacity. Only the data on the bad disk is lost; no performance penalty.

    RAID 3 and 4 exist but are not popular. RAID 3 uses byte-level striping with a dedicated parity disk.

    RAID 4 uses block-level striping with a dedicated parity disk.

    RAID 6 extends RAID 5 by using two parity blocks


  • Lifecycle of a HDD

    1. Blank media
    2. Low-level format (performed at the factory)
    3. Partition
    4. High-level format
    5. Operating system install
    6. Systems operation


  • Blank Magnetic Media

    For simplicity we will use a linear model of the magnetic media

    Unless we are performing electron microscopy, the exact media geometry is not significant

    The blank media has only geometric structure and raw magnetic storage

    [Diagram: blank media drawn as a linear strip running from Beginning to End]


  • Read / Write Process (simplified)

    Write process: Digital signals are encoded (for timing recovery) and transformed into analog signals that drive the magnetic field on the write head.

    Read process: The analog magnetic field is sensed, timing is recovered, and the sampled signal is converted into digital data.

    [Diagram: read/write head positioned over the linear strip of media, Beginning to End]


  • Low Level Format

    Low-level formatting adds indivisible units of storage called sectors. Most modern HDDs use 512+ byte sectors.

    The "+" accounts for sector overhead bytes (they differ by manufacturer). Overhead bytes provide error correction and timing recovery functions.

    Bad sectors are automatically remapped to redundant sectors by the HDD controller.

    [Diagram: a run of sectors, each 512 bytes plus overhead; redundant sectors visible only to the HDD controller; one individual sector shown with its sector overhead]


  • Partitioning

    The Master Boot Record (MBR) is created; it includes the Master Boot Code (MBC) and the Master Partition Table (MPT), and sits at sector 1 on any bootable media

    The MBC is executed at boot if the HDD is designated as the boot device

    The MPT contains information about logical volumes, including the active partition: the partition whose Volume Boot Code (VBC) will be executed

    Each partition has a Disk Parameter Block (DPB) that stores information about the partition: file system type, date and time last mounted, etc.

    Inter-partition gaps are a collection of unused sectors. Some sectors are unused due to addressing issues.

    [Diagram: Master Boot Record (MBC + MPT), inter-partition gap, Partition #1 with its Volume Boot Record (VBC + DPB), unused sectors, Partition #2 with its VBC + DPB]
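    One concrete detail of the MBR layout can be demonstrated without a real disk: the last two bytes of the 512-byte boot sector must hold the 0x55 0xAA boot signature. A sketch on a throwaway image file (mbr.img is a hypothetical name, not from the slides):

```shell
# Build a blank 512-byte "sector" and stamp the boot signature at offset 510,
# exactly where an MBR-partitioned disk carries it (octal \125\252 = 0x55 0xAA).
dd if=/dev/zero of=mbr.img bs=512 count=1 2>/dev/null
printf '\125\252' | dd of=mbr.img bs=1 seek=510 conv=notrunc 2>/dev/null
od -An -tx1 -j 510 -N 2 mbr.img     # dumps the two signature bytes: 55 aa
rm -f mbr.img
```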


  • High Level Format (File System)

    The MPT now contains the file system type and cluster size. Cluster sizes are in increments of 512 bytes (one sector). The cluster becomes the indivisible allocation unit for the operating system.

    A file system structure is created: FAT creates a file allocation table (a simple table). NTFS creates a master file table (a database). Linux EXT2/EXT3/EXT4 creates a virtual file system.

    [Diagram: MBR (MBC + MPT), file system structures, and free space divided into cluster blocks]
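    Because the cluster is the indivisible allocation unit, small files get rounded up to a whole cluster. A quick illustration with an assumed 4 KB cluster size:

```shell
# A 100-byte file still occupies one full 4096-byte cluster on disk.
awk 'BEGIN { cluster = 4096; file = 100; printf "%d bytes on disk\n", int((file + cluster - 1) / cluster) * cluster }'
# prints: 4096 bytes on disk
```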


  • Operating System Install

    Operating system code, application code, configuration data and application data are installed

    A swap file is created for NTFS and UNIX variants (Linux, Unix, FreeBSD, etc.)

    Boot code is written to the MBC (or the VBC if a boot loader is used)

    [Diagram: MBR (MBC + MPT), file system structures, operating system code / data, swap space, free space]


  • Adding a Disk

    Install the new hardware. Verify the disk is recognized by the BIOS.

    Boot. Verify the device exists in /dev.

    Partition: fdisk /dev/sdb

    Create a filesystem: mkfs -v -t ext3 /dev/sdb1

    Add an entry to /etc/fstab: /dev/sdb1 /proj ext3 defaults 0 2

    Mount it: mount -a


  • When don't you need a filesystem?

    Swap space: mkswap -v /dev/sdb1

    Server applications: Oracle, VMware Server


  • Logical Volumes

    What are logical volumes? They appear to the user as a physical volume, but can span multiple partitions and/or disks.

    Why logical volumes? Aggregate disks for performance/reliability. Grow and shrink logical volumes on the fly. Move logical volumes between physical devices. Replace volumes without interrupting service.


  • LVM


  • LVM Components

    Logical Volume Group (LVG): A set of physical volumes (partitions or disks). May be divided into logical volumes (LVs).

    LVs are made up of fixed-size logical extents (LEs). Each LE is 4 MB. Physical extents are the same size.
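    With 4 MB extents, sizes map directly to extent counts; e.g., the 100 GB volume created later in these slides:

```shell
# A 100 GB logical volume built from 4 MB logical extents.
awk 'BEGIN { lv_mb = 100 * 1024; le_mb = 4; printf "%d extents\n", lv_mb / le_mb }'
# prints: 25600 extents
```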


  • Mapping Modes


    Linear Mapping: LVs are assigned to contiguous areas of PV space.

    Striped Mapping: LEs are interleaved across PVs to improve performance.

  • Setting up a LVG and LV


    1. Initialize physical volumes: pvcreate /dev/hda1; pvcreate /dev/hdb1

    2. Initialize a volume group: vgcreate nku_proj /dev/hda1 /dev/hdb1 (use vgextend to add more PVs later)

    3. Create logical volumes: lvcreate -n nku1 --size 100G nku_proj

    4. Create a filesystem: mkfs -v -t ext3 /dev/nku_proj/nku1

  • Extending a LV

    Set an absolute size: lvextend -L 120G /dev/nku_proj/nku1

    Or set a relative size: lvextend -L +20G /dev/nku_proj/nku1

    Expand the filesystem without unmounting: ext2online -v /dev/nku_proj/nku1

    Check the size: df -k


  • Swap


    Can use a swap file instead of a swap partition: dd if=/dev/zero of=/swapfile bs=1024k count=512

    mkswap /swapfile

    Enable swap: swapon /swapfile; swapon /dev/sda2

    Disable swap: swapoff /swapfile; swapoff /dev/sda2

    Check swap resource usage: cat /proc/swaps

  • Filesystems

    ext4: Gaining popularity. Can support volumes with sizes up to 1 exbibyte (2^60 bytes) and files up to 16 tebibytes (a tebibyte is 2^40 bytes).

    ext3: Currently the most common Linux filesystem. Journaling eliminates the need for fsck.

    ext2: Older Linux non-fragmenting fast filesystem. Can be converted to ext3 by adding a journal: tune2fs -j /dev/sda1


  • Mounting

    To use a filesystem: mount /dev/sda1 /mnt; df /mnt

    Automatic mounting: add an entry in /etc/fstab

    Unmount: umount /dev/sda1. You cannot unmount a volume that is in use.


  • fstab


    # /etc/fstab: static file system information.
    #
    proc            /proc            proc     defaults  0  0
    /dev/hdc1       /                ext3     defaults  0  1
    /dev/hdc5       /win             vfat     user,rw   0  0
    /dev/hdc7       none             swap     sw        0  0
    /dev/hdc8       /var             ext3     defaults  0  2
    /dev/hdc9       /home            ext3     defaults  0  2
    /dev/hda        /media/cdrom0    iso9660  ro,user   0  0
    /dev/fd0        /media/floppy0   auto     rw,user   0  0
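    Each fstab line has six whitespace-separated fields: device, mount point, filesystem type, options, dump flag, and fsck pass number. That makes entries easy to pick apart with awk; a hypothetical one-liner over one of the entries above:

```shell
# Split an fstab entry into its six fields.
echo '/dev/hdc8 /var ext3 defaults 0 2' |
  awk '{ printf "dev=%s mnt=%s type=%s opts=%s dump=%s pass=%s\n", $1, $2, $3, $4, $5, $6 }'
# prints: dev=/dev/hdc8 mnt=/var type=ext3 opts=defaults dump=0 pass=2
```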

  • fsck: check + repair fs

    Filesystem corruption sources: power failure, system crash.

    Types of corruption: Unreferenced inodes. Bad superblocks. Unused data blocks not recorded in the block maps. Data blocks listed as free that are used in files.

    fsck can fix these and more. It asks the user to make more complex decisions, and stores unfixable files in lost+found.
