MyPower Switch Technical Manual - Intelek · 2020. 5. 4.

Maipu Confidential & Proprietary Information Page 1 of 628

MyPower Switch Technical Manual

Maipu Communication Technology Co., Ltd
No. 16, Jiuxing Avenue
Hi-tech Park
Chengdu, Sichuan Province
People’s Republic of China - 610041
Tel: (86) 28-85148850, 85148041
Fax: (86) 28-85148948, 85148139
URL: http://www.maipu.com
Email: [email protected]





All rights reserved. Printed in the People’s Republic of China.

No part of this document may be reproduced, transmitted, transcribed, stored in a retrieval system, or translated into any language or computer language, in any form or by any means, electronic, mechanical, magnetic, optical, chemical, manual or otherwise, without the prior written consent of Maipu Communication Technology Co., Ltd.

Maipu makes no representations or warranties with respect to the contents of this document and specifically disclaims any implied warranties of merchantability or fitness for any particular purpose. Further, Maipu reserves the right to revise this document and to make changes to its content from time to time without obligation to notify any person of such revisions or changes.

Maipu values and appreciates comments you may have concerning our products or this document. Please address comments to:

Maipu Communication Technology Co., Ltd
No. 16, Jiuxing Avenue
Hi-tech Park
Chengdu, Sichuan Province
People’s Republic of China - 610041
Tel: (86) 28-85148850, 85148041
Fax: (86) 28-85148948, 85148139
URL: http://www.maipu.com
Email: [email protected]

All other products or services mentioned herein may be registered trademarks, trademarks, or service marks of their respective manufacturers, companies, or organizations.





Maipu Feedback Form

Your opinion helps us improve the quality of our product documentation and offer better services. Please fax your comments and suggestions to (86) 28-85148948, 85148139 or email them to [email protected].

Document Title MyPower Switch Technical Manual

Product Version

Document Revision Number

3.0

Evaluate this document

Presentation:

(Introductions, procedures, illustrations, completeness, arrangement, appearance)

Good Fair Average Poor

Accessibility:

(Contents, index, headings, numbering)


Editorial:

(Language, vocabulary, readability, clarity, technical accuracy, content)


Your suggestions to improve the document

Please check suggestions to improve this document:

Improve introduction

Make more concise

Improve contents

Add more step-by-step procedures/tutorials

Improve arrangement

Add more technical information

Include images

Make it less technical

Add more detail

Improve index

If you wish to be contacted, complete the following:

Name Company

Postcode Address

Telephone E-mail




Contents

Overview ................................................................................................ 16

OSI Model ............................................................................................................ 16

Physical Layer .................................................................................................................... 17

Data Link Layer .................................................................................................................. 17

Network Layer .................................................................................................................... 18

Transport Layer ............................................................................................................. 19

Session Layer ..................................................................................................................... 19

Presentation Layer .......................................................................................................... 19

Application Layer ................................................................................................................ 20

Application of OSI Model ....................................................................................... 20

Use Ping Command .............................................................................................. 21

Simple Ping ........................................................................................................................ 21

Expanded Ping ................................................................................................................... 22

System Displayed Information ............................................................................... 22

show process ..................................................................................................................... 22

show cpu ........................................................................................................................... 24

show stack ......................................................................................................................... 26

show semaphore ................................................................................................................ 27

show memory .................................................................................................................... 28

show arp............................................................................................................................ 29

show ip socket.................................................................................................................... 29

show pool .......................................................................................................................... 30

netstat -m.......................................................................................................................... 38

show ip statistics ................................................................................................................ 39

show ip icmpstate ............................................................................................................... 40

Switch Principles ................................................................................... 41

Development of the Switching Technology .............................................................. 41

Basic Working Principle of the Switch ...................................................................... 42

Frame Forwarding .............................................................................................................. 43

Address Learning Process .................................................................................................... 44

Multiple Layer Switching Technology ...................................................................... 46

Comparison Between the Switch and Other Network Communication Products .......... 47

Switch and the Switch Hub .................................................................................................. 47



Switch and Router .............................................................................................................. 48

VLAN Technology ................................................................................... 50

Overview and Principle .......................................................................................... 50

Overview ........................................................................................................................... 50

VLAN Principle .................................................................................................................... 51

VLAN Division ....................................................................................................... 51

Port-Based VLAN ................................................................................................................ 52

MAC-based VLAN................................................................................................................ 53

IP subnet-based VLAN ........................................................................................................ 53

Protocol-based VLAN ........................................................................................................... 54

Typical Application ................................................................................................ 54

Link Aggregation ................................................................................... 55

Link Aggregation .................................................................................................. 55

Terms of the Link Aggregation ............................................................................................. 55

Functions of the Link Aggregation ........................................................................................ 56

LACP Protocol ..................................................................................................................... 56

Classification of Link Aggregation ........................................................................... 57

Manual Aggregation ............................................................................................................ 57

LACP Protocol Aggregation .................................................................................................. 58


MSTP ...................................................................................................... 62

STP Overview ....................................................................................................... 62

RSTP Overview ..................................................................................................... 63

MSTP Protocol ...................................................................................................... 64

Terms ................................................................................................................................ 64

Introduction to the Protocol ................................................................................................. 65

MSTP Protection Function ...................................................................................... 67

BPDU Protection ................................................................................................................. 67

Root Protection ................................................................................................................... 68

Loop Protection .................................................................................................................. 68

MSTP Typical Application ....................................................................................... 69

QinQ Technology .................................................................................... 71

New Requirements of Service Development ............................................................ 71

QinQ Supports Multiple Services............................................................................. 72

Realizing Modes of QinQ ........................................................................................ 73

Introduction to QinQ Application Scene ................................................................... 74

L2 Protocol Control Technology............................................................. 76

L2 Protocol Control Theory ..................................................................................... 76

L2 Protocol Tunnel .............................................................................................................. 76

L2 Protocol Discard ............................................................................................................. 77



L2 Protocol Peer ................................................................................................................. 77

L2 Protocol Control Supports EVC Application ........................................................................ 77

Realize L2 Protocol Control ..................................................................................... 78

Realize L2 Protocol Tunnel ................................................................................................... 78

Realize L2 Protocol Discard .................................................................................................. 78

Realize L2 Protocol Peer ...................................................................................................... 78


L2 Multicast ............................................................................................ 80

Public Part of L2 Multicast ...................................................................................... 80

Terms ................................................................................................................................ 80

Introduction ....................................................................................................................... 81

L2 Static Multicast and Its Application ..................................................................... 82

Terms ................................................................................................................................ 82

Introduction ....................................................................................................................... 82

Typical Application .............................................................................................................. 83

IGMP Snooping and Its Application ......................................................................... 83

Terms ................................................................................................................................ 84

Introduction ....................................................................................................................... 84

IGMP Proxy and Its Application .............................................................................. 87

Terms ................................................................................................................................ 88

Introduction ....................................................................................................................... 88


MVR and Its Application......................................................................................... 89

Terms ................................................................................................................................ 90

Introduction ....................................................................................................................... 90


MVP and Its Application ......................................................................................... 92

Terms ................................................................................................................................ 92

Introduction ....................................................................................................................... 92


Security Technology .............................................................................. 95

802.1X Protocol and Application ............................................................................. 95

Related Terms .................................................................................................................... 96

Introduction ....................................................................................................................... 96

Typical Application ............................................................................................................ 106

DHCP Snooping and Its Application ...................................................................... 108

Related Terms .................................................................................................................. 109

Introduction ..................................................................................................................... 109


IP Source Guard and Its Application ..................................................................... 113

Related Terms .................................................................................................................. 114



Introduction ..................................................................................................................... 114

Key Points for Realization .................................................................................................. 114


Dynamic ARP Detection and Application ................................................................ 116

Related Terms .................................................................................................................. 117

Introduction ..................................................................................................................... 117


Port Security ...................................................................................................... 120

Introduction ..................................................................................................................... 121


Port Monitoring ................................................................................................... 122

Introduction ..................................................................................................................... 122


Port Isolation ...................................................................................................... 123

Related Terms .................................................................................................................. 124

Introduction ..................................................................................................................... 124


SPAN Technology ................................................................................. 126

SPAN Technology ............................................................................................... 126

Related Terms of SPAN Technology .................................................................................... 126


IPv4 Unicast Routing ........................................................................... 132

Introduction to the IPv4 Unicast Routing ............................................................... 132

Static Routing Protocol ........................................................................................ 133

Introduction to the Static Route ......................................................................................... 134

Typical Application of the Static Route ................................................................................ 135

Troubleshooting of the Static Route .................................................................................... 136

M-VRF ............................................................................................................... 137

Terms of M-VRF ............................................................................................................... 137

Introduction to M-VRF ....................................................................................................... 138

Load Balancing ................................................................................................... 138

Types of Load Balancing .................................................................................................... 138

Modes of Load Balancing ................................................................................................... 139

Switching Types and Load Balancing .................................................................................. 139

RIP Dynamic Routing Protocol .............................................................................. 139

Terms of RIP Protocol........................................................................................................ 140

Introduction to the RIP Protocol ......................................................................................... 140

IRMP Dynamic Routing Protocol ........................................................................... 151

Related Terms of IRMP Protocol ......................................................................................... 151

Introduction to IRMP Protocol ............................................................................................ 151

IRMP Types ...................................................................................................................... 152



Different TLV Defined in IRMP ............................................................................................ 152

IRMP Unicast and Multicast Sending (Multicast Address 224.0.0.10) ..................................... 152

IRMP Packet Format (Take One IP Packet with IRMP Data as an Example) ............................ 153

OSPF Dynamic Routing Protocol ........................................................................... 153

Terms of OSPF Protocol ..................................................................................................... 154

Introduction to OSPF ......................................................................................................... 156

OSPF Features .................................................................................................................. 176

IS-IS Dynamic Routing Protocol ........................................................................... 179

Terms of IS-IS Protocol ..................................................................................................... 179

Introduction to the IS-IS Protocol ....................................................................................... 180

Typical Application of the IS-IS Protocol .............................................................................. 189

BGP Dynamic Routing Protocol ............................................................................. 192

Terms of BGP Protocol ...................................................................................................... 192

Introduction to the BGP Protocol ........................................................................................ 193

ACL Technology ................................................................................... 209

ACL Introduction and Application .......................................................................... 209

Basic Concepts of ACL ....................................................................................................... 209

ACL Classification .............................................................................................................. 211


Introduction to Action Group ................................................................................ 214

Introduction to IP+MAC Binding ........................................................................... 214

Introduction to Traffic Meter ................................................................................ 214

Related Terms .................................................................................................................. 214

Introduction to Traffic Meter .............................................................................................. 215

Introduction to Time Domain ............................................................................... 215

Related Terms .................................................................................................................. 215

Introduction to Time Domain ............................................................................................. 216

QoS Technology ................................................................................... 217

Priority Mapping ................................................................................................. 217

Related Terms .................................................................................................................. 217

Introduction to Priority Mapping ......................................................................................... 219

Queue Scheduling Mode ...................................................................................... 219

Related Terms .................................................................................................................. 219

Introduction to Queue Scheduling Mode ............................................................................. 220


Drop Mode ......................................................................................................... 221

Related Terms .................................................................................................................. 221

Introduction to Drop Mode ................................................................................................ 222


Speed Restriction ................................................................................................ 222

Flow Shaping...................................................................................................... 223



VLAN-based Traffic Shaping ................................................................................. 223

AAA Technology ................................................................................... 225

AAA Terms ......................................................................................................... 225

Basic Theory of AAA ............................................................................................ 226

Introduction to RADIUS ....................................................................................... 227

Introduction to TACACS....................................................................................... 229

Introduction to ID Authentication Mechanism ........................................................ 231

Login Authentication ......................................................................................................... 231

Authenticate in Privileged Mode ......................................................................................... 232

EIPS Technology .................................................................................. 233

Sub Ring Mode EIPS ........................................................................................... 233

Basic Concepts of EIPS ...................................................................................................... 233

EIPS Packet Format .......................................................................................................... 237

Basic Theory of EIPS ......................................................................................................... 240

EIPS Typical Application .................................................................................................... 245

Hierarchical EIPS ................................................................................................ 246

Basic Concepts and Abbreviations ...................................................................................... 246

Basic Network Topology of EIPS ......................................................................................... 248

Port and Protocol Packets on Ring ...................................................................................... 253

EIPS Protocol Mechanism .................................................................................................. 257

Extended Functions ............................................................................................. 262

Payload Balance Function .................................................................................................. 263

Topology Auto Collection Function ...................................................................................... 264

Networking Mode of Not Sending HELLO ............................................................................ 268

Uni-directional Detection Function ...................................................................................... 268

Reliability Realization ........................................................................................................ 270

ULFD Technology ................................................................................. 273

ULFD Protocol and Application .............................................................................. 273

Related Terms of ULFD Protocol ......................................................................................... 273

Introduction to ULFD Protocol ............................................................................................ 275


OAM Technology .................................................................................. 281

CFM Protocol and Application ............................................................................... 281

Terms of Ethernet CFM ..................................................................................................... 281

Introduction to Ethernet CFM Protocol ................................................................................ 282

E-LMI Protocol and Application ............................................................................. 292

Terms of E-LMI Protocol .................................................................................................... 293

Introduction to E-LMI Protocol ........................................................................................... 293

Definition of E-LMI Protocol................................................................................................ 293

Relation between E-LMI Protocol and 802.1ag ...................................................................... 296

UNI-N End of E-LMI .......................................................................................................... 296



UNI-C of E-LMI ................................................................................................................. 298

Typical Applications ........................................................................................................... 298

Ethernet OAM Protocol and Application ................................................................. 299

Related Terms of Ethernet OAM Protocol ............................................................................ 299

Introduction to Ethernet OAM Protocol ................................................................................ 299

EVC Technology ................................................................................... 311

Related Terms .................................................................................................... 311

Application Description ........................................................................................ 312

Typical Application .............................................................................................. 315

LLDP Technology ................................................................................. 316

Overview ........................................................................................................... 316

LLDP Working Mechanism ................................................................................... 316

LLDPDU Transmitting Mechanism ....................................................................................... 317

LLDPDU Receiving Mechanism ........................................................................................... 317

TLV Information Type ......................................................................................... 318

Basic Management TLV ..................................................................................................... 318

TLV Defined by Organization.............................................................................................. 319

Related TLV of LLDP-MED .................................................................................................. 319

Neighbor Storage Capability of LLDP ..................................................................... 320

Typical Application of LLDP .................................................................................. 320

MAC Address Table Management Technology..................................... 322

Management and Application of MAC Address Table............................................... 322

Related Terms .................................................................................................................. 322

Introduction ..................................................................................................................... 323

PWE3 Technology (Only for S3400/S3900) ....................................... 325

Basic Concepts ................................................................................................... 325

Background of TDM Circuit Emulation Technology ............................................................... 326

Related Technology Standards ........................................................................................... 326

Commonly-used Terms ..................................................................................................... 327

Technical Theory ................................................................................................ 327

TDM PWE3 Technical Scheme ............................................................................................ 328

Other Technical Schemes .................................................................................................. 331

Key Technologies .............................................................................................................. 331

Realizing Methods ............................................................................................... 334

PWE3 Packet Format......................................................................................................... 334

SAToP Protocol ................................................................................................................. 336

CESoPSN Protocol ............................................................................................................. 337

HDLC Mode ...................................................................................................................... 339

Technology of Recovering Clock from Circuit Emulation packet ............................................. 340

PWE3 Typical Application ..................................................................................... 342

Performance Test Result ................................................................................................... 343



Loopback Detection Technology .......................................................... 344

Introduction to Loopback Detection ...................................................................... 344

Related Terms of Loopback Detection Protocol .................................................................... 344

Introduction to Loopback Detection Protocol ....................................................................... 344


Super VLAN Technology ...................................................................... 348

Super-VLAN Theory ............................................................................................ 348

Super-VLAN Realization ....................................................................................... 349


L3 Multicast Technology ...................................................................... 352

Introduction to Multicast ...................................................................................... 352

Related Terms of IP Multicast ............................................................................................. 353

IP Multicast Address .......................................................................................................... 354

IP Multicast Features ......................................................................................................... 355

IP Multicast Routing Protocol .............................................................................................. 356

IP Multicast Application...................................................................................................... 359

Related Terms of IGMP Protocol ........................................................................... 359

Introduction to IGMP Protocol .............................................................................. 360

IGMP Protocol Theory........................................................................................................ 360

IGMP V1 .......................................................................................................................... 361

IGMP V2 .......................................................................................................................... 362

Inter-operation of V1 and V2 ............................................................................................. 364

IGMP V3 .......................................................................................................................... 365

Related Terms of PIM-SM Protocol ........................................................................ 371

Introduction to PIM-SM Protocol ........................................................................... 371

Basic Hierarchy of PIM-SM in TCP/IP Protocol Stack ............................................................. 372

PIM-SM Protocol ............................................................................................................... 372

Introduction to PIM-DM Protocol .......................................................................... 376

PIM-DM Protocol ............................................................................................................... 377

Introduction to MSDP Protocol ............................................................................. 380

Overview ......................................................................................................................... 380

Setup of MSDP peer .......................................................................................................... 381

Sending of Source Active Message ..................................................................................... 381

MSDP Application.............................................................................................................. 381

MPLS Technology ................................................................................. 385

Terms of MPLS Protocol ....................................................................................... 385

Introduction to MPLS ........................................................................................... 386

MPLS Architecture ............................................................................................... 386

Separation of Control and Forwarding................................................................................. 386

Forwarding Equivalence Class ............................................................................................ 387

Label Encapsulation and Label Operation ............................................................................ 388



MPLS Network Structure and Forwarding Process ................................................................ 389

Penultimate Hop Popping Mechanism ................................................................................. 390

Introduction to the LDP Protocol ........................................................................... 391

Basic Concepts of LDP ....................................................................................................... 391

LDP Working Process ........................................................................................................ 392

LDP Message Type and Format .......................................................................................... 397

BGP/MPLS VPN ................................................................................................... 408

Concepts and Terms of BGP/MPLS VPN .............................................................................. 408

BGP/MPLS VPN Network Structure ..................................................................................... 409

BGP/MPLS VPN Cross-Domain ........................................................................................... 410

MPLS VPN User Accesses Internet ...................................................................................... 413

Introduction to CSC .......................................................................................................... 417

MPLS L2VPN ....................................................................................................... 420

Terms .............................................................................................................................. 420

Basic Concepts ................................................................................................................. 420

VPWS .............................................................................................................................. 421

Point-to-Multipoint Connection (VPLS) ................................................................................ 423

Comparison between VPLS and VPWS ................................................................................ 434

MPLS Traffic Engineering ..................................................................................... 435

Ground of MPLS Traffic Engineering .................................................................................... 436

Releasing MPLS-TE Network Topology Information .............................................................. 437

MPLS-TE Tunnel Path Calculation (CSPF) ............................................................................ 439

Creating MPLS-TE Tunnel Path ........................................................................................... 439

Forwarding Traffic on MPLS-TE Tunnel ................................................................................ 441

Tunnel Protection .............................................................................................................. 442

Graceful Restart ............................................................................................................... 445

MPLS OAM ......................................................................................................... 446

Introduction to MPLS OAM ................................................................................................. 446

MPLS OAM Technology ...................................................................................................... 447

IPv6 Network Protocol Technology ..................................................... 450

Overview ........................................................................................................... 450

IPv6 Packet Format............................................................................................. 451

ICMPv6 Protocol ................................................................................................. 452

IPv6 Address Discovery Protocol .......................................................................... 454

Functions of Neighbor Discovery Protocol ............................................................................ 459

IPv6 Address ...................................................................................................... 462

IPv6 Addressing Model ........................................................................................ 463

IPv6 Address Type .............................................................................................. 464

Unicast ............................................................................................................................ 464

Multicast .......................................................................................................................... 469

Any-cast .......................................................................................................................... 470



IPv6 Extension Header ........................................................................................ 471

Extension Header ............................................................................................................. 471

Usage of Extension Header ................................................................................................ 472

Extension Header ID ......................................................................................................... 473

Extension Header Order .................................................................................................... 473

Options ............................................................................................................................ 475

Hop-by-hop Extension Header ........................................................................................... 476

Routing Extension Header ................................................................................................. 476

Fragment Extension Header .............................................................................................. 478

Destination Extension Header ............................................................................................ 479

GRE Technology ................................................................................... 480

Terms ................................................................................................................ 480

Introduction to the Protocol ................................................................................. 480

Location of GRE in the TCP/IP Protocol Stack ...................................................................... 481

Structure of the GRE Packet .............................................................................................. 481

Work Flow of the GRE ....................................................................................................... 483

Advantage and Disadvantage of GRE ................................................................................. 485


Transition Technology ......................................................................... 487

Introduction to the Transition Technology ............................................................. 487

Tunnel Technology .............................................................................................. 488

SLA Technology ................................................................................... 490

Introduction to SLA ............................................................................................. 490

SLA Terms ....................................................................................................................... 490

Introduction to SLA ........................................................................................................... 491

RTR Entity ........................................................................................................................ 492

RTR Group ....................................................................................................................... 504

RTR Schedule ................................................................................................................... 504

Debug Commands and Debug Information ........................................................... 505

show rtr entity .................................................................................................................. 505

show rtr group ................................................................................................................. 510

show rtr schedule ............................................................................................................. 510

show rtr history ................................................................................................................ 511

SLA Debug Commands ..................................................................................................... 513

VRRP Technology ................................................................................. 515

Related Terms of VRRP Protocol ........................................................................... 515

Introduction to VRRP Protocol .............................................................................. 515

Basic Hierarchy of VRRP in TCP/IP ...................................................................................... 516

Structure of VRRP Packet .................................................................................................. 516

VRRP Workflow ................................................................................................................ 517

VRRP Features ................................................................................................................. 520



Debug Commands and Debug Information ........................................................... 520

VBRP Technology ................................................................................. 522

VBRP Protocol Terms .......................................................................................... 522

Introduction to VBRP Protocol .............................................................................. 522

Basic Hierarchy of VBRP in TCP/IP ...................................................................................... 523

VBRP Packet Format ......................................................................................................... 523

VBRP Workflow ................................................................................................................ 525

VBRP Functions ................................................................................................................ 527

Debug Command and Debug Information ............................................................. 527

IPFIX Technology ................................................................................ 531

Overview ........................................................................................................... 531

Terms ................................................................................................................ 531

Introduction to the Principle ................................................................................. 532

IPFIX Working Flow .......................................................................................................... 532

IPFIX Restrictions ............................................................................................................. 533

IPFIX Packet Structure ...................................................................................................... 533

Port Isolation Technology ................................................................... 538

Configure Port Isolation ....................................................................................... 538

Introduction to Port Isolation ............................................................................................. 538

Port Isolation Application ................................................................................................... 539

IPv6 Unicast Routing ........................................................................... 540

IPv6 RIPng Dynamic Routing Protocol ................................................................... 540

Terms of IPv6 RIPng Protocol ............................................................................................ 540

Introduction to IPv6 RIPng Protocol .................................................................................... 541

Basic Work Principle of IPv6 RIPng Protocol ........................................................................ 544

Status Transition of IPv6 RIPng Protocol Route Entry and Related Timer ............................... 548

Avoidance of IPv6 RIPng Protocol Route Loop ..................................................................... 549

IPv6 OSPFv3 Dynamic Routing Protocol ................................................................ 551

Terms of OSPFv3 Protocol ................................................................................................. 551

Introduction to the OSPFv3 Protocol ................................................................................... 553

IPv6 IS-IS Dynamic Routing Protocol .................................................................... 577

Terms of IPv6 IS-IS Protocol ............................................................................................. 578

Introduction to IPv6 IS-IS Protocol ..................................................................................... 579

Route Learning of IPv6 IS-IS Protocol in Single-Topology ..................................................... 579

IS-IS Multi-Topology ......................................................................................................... 580

IPv6 BGP4+ Dynamic Routing Protocol ................................................................. 584

Terms of IPv6 BGP4+ Protocol ........................................................................................... 584

Introduction to IPv6 BGP4+ Protocol .................................................................................. 584

GVRP Technology................................................................................. 601

GVRP Overview and GARP Principle ...................................................................... 601

GVRP Overview ................................................................................................................ 601



GARP Principle .................................................................................................................. 602

Implementation of GVRP ..................................................................................... 605


Private VLAN Technology .................................................................... 608

Related Terms of Private VLAN Protocol .............................................................................. 608

Introduction to Private VLAN Protocol ................................................................................. 609

Typical Application of Private VLAN ..................................................................................... 610

Voice VLAN Technology ....................................................................... 612

Related Terms of Voice VLAN Protocol ................................................................................ 612

Introduction to Voice VLAN ................................................................................................ 612

Ports Cooperating with IP Telephone Sending tagged Voice Flow .......................................... 614

Ports Cooperating with IP Telephone Sending untagged Voice Flow ...................................... 615

Precautions ...................................................................................................................... 616

Typical Application of Voice VLAN ....................................................................................... 617

Neighbor Discovery Technology .......................................................... 618

NDSP and Relevant Terms ................................................................................... 618

Introduction to NDSP .......................................................................................... 618


MFF Technology ................................................................................... 620

MFF Technology .................................................................................................. 620

MFF Terms ....................................................................................................................... 621


PPPoE+ Technology ............................................................................. 626

PPPoE+ Principle ................................................................................................. 626

PPPOE+ Typical Application ................................................................................. 628



Overview

This document describes the basic principles and major functions of the protocol modules, and analyzes debugging information through specific examples. Because the implementation is based on the Open Systems Interconnection (OSI) reference model, this chapter introduces the OSI model to help you understand the following chapters.

Main contents:

OSI model

Application of OSI model

Use ping command

System displayed information

OSI Model

The OSI model consists of seven layers: the physical layer, data link layer, network layer, transport layer, session layer, presentation layer, and application layer (see Figure 1-1). Each layer handles specific communication tasks and exchanges data with the next layer of the protocol stack through protocol-based communication. Communication between two network devices is implemented by passing data through the protocol stacks of the devices. For example, when a workstation communicates with a server, the data starts at the application layer of the workstation, is formatted by each lower layer in turn, and reaches the physical layer. The data is then transmitted to the server over the network. The server receives the data at the physical layer of its protocol stack and passes it up to the higher layers, each of which interprets it, until it reaches the application layer. Each layer can be referred to by its name or identified by its position in the protocol stack. For example, the bottom layer, the physical layer, is also called layer 1.

Application layer (layer 7)

Presentation layer (layer 6)

Session layer (layer 5)

Transport layer (layer 4)

Network layer (layer 3)

Data link layer (layer 2)

Physical layer (layer 1)

Figure 1-1 OSI model

The functions of the bottom layers relate to physical communication, for example, creating frames and transmitting the signals that carry packets. The middle layers coordinate network communication between nodes, for example, keeping sessions uninterrupted and communication error-free. The highest layers deal with applications and the representation of software data, including data formats, encryption, and data and file transfer management. Together, these layers are generally called the protocol stack.
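The top-down and bottom-up flow described in this section can be illustrated with a minimal Python sketch. It models encapsulation only, under simplifying assumptions: every layer on the sending side prepends a hypothetical text header such as `[L4]` to the data it receives from the layer above, and the receiving side strips the headers in reverse order. Real layers add binary headers (the data link layer also adds a trailer), and the physical layer transmits signals rather than adding a header.

```python
def send_down(payload: str) -> str:
    """Sending side: wrap the payload once per layer, from layer 7 down to layer 1."""
    data = payload
    for layer in range(7, 0, -1):
        # Hypothetical header "[L<n>]" stands in for the header added by layer n.
        data = f"[L{layer}]" + data
    return data


def receive_up(signal: str) -> str:
    """Receiving side: strip the headers from layer 1 up to layer 7."""
    data = signal
    for layer in range(1, 8):
        header = f"[L{layer}]"
        if not data.startswith(header):
            raise ValueError(f"missing layer {layer} header")
        data = data[len(header):]
    return data


wire = send_down("GET /index.html")
print(wire)              # [L1][L2][L3][L4][L5][L6][L7]GET /index.html
print(receive_up(wire))  # GET /index.html
```

The outermost header belongs to the lowest layer, which is why the receiving side processes layer 1 first and the application data only becomes visible again at layer 7.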

Physical Layer

The bottom layer of the OSI model is the physical layer. It covers the following items:

Data transmission media (cable, fiber, radio, and microwave)

Network connectors

Network topology

Signaling and coding methods

Data transmission devices

Signaling error detection

Physical layer devices transmit and receive the signals that carry data, and they generate, carry, and check voltages. During signal transmission, the physical layer handles the data transmission rate, monitors the data error rate, and handles voltages and electrical levels.
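Manchester encoding, used on the wire by classic 10 Mbit/s Ethernet, is one example of the signaling and coding methods listed above. The Python sketch below follows the IEEE 802.3 convention (a 0 is a high-to-low transition in the middle of the bit period, a 1 is a low-to-high transition) and represents the two voltage levels abstractly as 1 (high) and 0 (low). Which coding a given port actually uses depends on its media type and speed; the sketch is illustrative only.

```python
def manchester_encode(bits: list[int]) -> list[int]:
    """Encode each bit as two half-bit levels (IEEE 802.3 convention)."""
    levels = []
    for bit in bits:
        if bit == 1:
            levels += [0, 1]  # low-to-high transition in mid-bit
        elif bit == 0:
            levels += [1, 0]  # high-to-low transition in mid-bit
        else:
            raise ValueError("bits must be 0 or 1")
    return levels


def manchester_decode(levels: list[int]) -> list[int]:
    """Recover the bits; every valid bit period must contain a transition."""
    if len(levels) % 2:
        raise ValueError("odd number of half-bit levels")
    bits = []
    for first, second in zip(levels[0::2], levels[1::2]):
        if (first, second) == (0, 1):
            bits.append(1)
        elif (first, second) == (1, 0):
            bits.append(0)
        else:
            # No mid-bit transition: a signaling error the receiver can detect.
            raise ValueError("invalid Manchester symbol")
    return bits


signal = manchester_encode([1, 0, 1, 1])
print(signal)                     # [0, 1, 1, 0, 0, 1, 0, 1]
print(manchester_decode(signal))  # [1, 0, 1, 1]
```

Because every bit period contains a transition, the receiver can recover the timing from the signal itself, and a period without a transition is detected as a signaling error.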

Data Link Layer

In a LAN, the function of the data link layer is to create frames. Each frame is formatted in a specific way so that data transmission can be synchronized and the frames can be identified. The data link layer of the sending node converts the formatted frames into electrical signals through encoding and sends them, and the receiving node decodes the data and detects errors. A data link frame consists of fields containing address and control information, as follows:

Start of frame (SOF) marker

Address of the device sending the frame (source address)

Address of the device receiving the frame (destination address)

Management or communication control information

Data

Error detection information (CRC)

End of frame (EOF) marker

When communication is established between two nodes, their data link layers are connected both physically (through the physical layer) and logically (through the protocol). Communication begins with the transmission of a short set of signals used to synchronize the timing of the data stream. Once the link is established, the data link layer at the receiving end decodes the signals into individual frames. The data link layer checks the received data to avoid accepting duplicate, incorrect, or incomplete data. If an error is detected, it is handled accordingly: the receiving end discards the packet, or the sending end retransmits it. The data link layer detects errors by using a Cyclic Redundancy Check (CRC), an error detection method that calculates a value over the contents of the frame (such as the address, control information, and data fields). The data link layer of the sending node inserts this value at the end of the frame, before the EOF marker. The receiving node recalculates the value and compares it with the received one before passing the frame to the upper layer; a match indicates that the frame content is the same as the content that was sent.
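The CRC principle can be sketched in Python with CRC-32, the polynomial Ethernet uses for its frame check sequence, computed with the standard library's `zlib.crc32`. The frame layout (6-byte destination and source addresses, a 2-byte type field, the data, and a 4-byte CRC) is modeled on an Ethernet II frame, but the preamble, start-of-frame delimiter, and minimum-length padding are omitted; this illustrates the technique and is not the switch's hardware implementation.

```python
import struct
import zlib
from typing import Optional


def build_frame(dst: bytes, src: bytes, ether_type: int, payload: bytes) -> bytes:
    """Sending node: assemble address, control and data fields, then append the CRC."""
    if len(dst) != 6 or len(src) != 6:
        raise ValueError("MAC addresses must be 6 bytes long")
    body = dst + src + struct.pack("!H", ether_type) + payload
    fcs = zlib.crc32(body) & 0xFFFFFFFF
    # The CRC value is inserted at the end of the frame (least significant byte first).
    return body + struct.pack("<I", fcs)


def check_frame(frame: bytes) -> Optional[bytes]:
    """Receiving node: recompute the CRC; return the body if it matches, else None."""
    if len(frame) < 18:
        return None  # too short to hold two addresses, a type field and a CRC
    body = frame[:-4]
    (received,) = struct.unpack("<I", frame[-4:])
    if zlib.crc32(body) & 0xFFFFFFFF != received:
        return None  # error detected: the frame is discarded
    return body


frame = build_frame(bytes.fromhex("ffffffffffff"), bytes.fromhex("001122334455"),
                    0x0800, b"hello")
print(check_frame(frame) is not None)  # True

damaged = bytearray(frame)
damaged[14] ^= 0x01                    # flip one bit in the data field
print(check_frame(bytes(damaged)))     # None
```

CRC-32 detects every single-bit error, so the damaged frame fails the check and would be discarded rather than passed to the upper layer.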

Network Layer

The network layer is the third layer from the bottom of the protocol stack. Every network consists of physical routes (cable paths) and logical routes (software paths). The network layer reads the protocol address information in each packet and forwards the packet along the best path (both physical and logical) so that data is transmitted efficiently. At this layer, routers forward packets from one network to another. The network layer controls the path of packets much like a traffic controller, routing them over the most efficient path. To determine the best path, the network layer must collect information about network and node addresses; this process is called discovery.
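Choosing the best path is usually a routing-table lookup. The Python sketch below uses longest-prefix match, the rule IP routers apply when several routes cover the same destination, built on the standard `ipaddress` module. The routing table entries and next-hop names are hypothetical; a real router also compares metrics and route sources when it builds the table, and uses much faster data structures than a linear scan.

```python
import ipaddress
from typing import Optional

# Hypothetical routing table learned through discovery: prefix -> next hop.
ROUTES = {
    "0.0.0.0/0": "isp-gateway",      # default route
    "10.0.0.0/8": "core-router",
    "10.1.2.0/24": "branch-router",
}


def lookup(destination: str) -> Optional[str]:
    """Return the next hop of the most specific route that covers destination."""
    address = ipaddress.ip_address(destination)
    best_net, best_hop = None, None
    for prefix, hop in ROUTES.items():
        network = ipaddress.ip_network(prefix)
        if address in network and (best_net is None
                                    or network.prefixlen > best_net.prefixlen):
            best_net, best_hop = network, hop
    return best_hop


print(lookup("10.1.2.7"))   # branch-router
print(lookup("10.9.9.9"))   # core-router
print(lookup("192.0.2.1"))  # isp-gateway
```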

The network layer can route data over different paths by creating virtual (logical) circuits. A virtual circuit is a logical communication path for sending and receiving data, and it is visible only to the network layer. Because the network layer manages data carried over multiple virtual circuits, the data may arrive out of sequence. The network layer therefore checks the data sequence and, if necessary, corrects it before passing the data to the next layer. The network layer also adjusts the packet size to meet the requirements of the receiving network.
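The sequence check described above can be sketched as a reordering buffer in Python: units that arrive early are held until the missing ones arrive, and only an unbroken run is passed to the next layer. The `Resequencer` class and its integer sequence numbers are hypothetical; they illustrate the idea rather than any specific protocol's numbering scheme.

```python
import heapq


class Resequencer:
    """Hold out-of-order units and release them strictly in sequence order."""

    def __init__(self, first_seq: int = 0) -> None:
        self.expected = first_seq  # next sequence number to pass to the next layer
        self.pending = []          # min-heap of (seq, data) that arrived early

    def receive(self, seq: int, data: str) -> list[str]:
        """Accept one unit; return the data that can now be delivered in order."""
        if seq < self.expected:
            return []              # duplicate of a unit already delivered
        heapq.heappush(self.pending, (seq, data))
        delivered = []
        while self.pending and self.pending[0][0] <= self.expected:
            seq_no, ready = heapq.heappop(self.pending)
            if seq_no == self.expected:
                delivered.append(ready)
                self.expected += 1
            # A unit whose number is now below `expected` is a duplicate; drop it.
        return delivered


r = Resequencer()
print(r.receive(1, "B"))  # []
print(r.receive(2, "C"))  # []
print(r.receive(0, "A"))  # ['A', 'B', 'C']
```

A min-heap keeps the lowest pending sequence number at the top, so checking whether the next expected unit has arrived is a constant-time look at `pending[0]`.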



Transport Layer

Like the data link layer and the network layer, the transport layer ensures reliable transmission of data from the sending node to the destination node. For example, the transport layer ensures that data is received in the same sequence in which it was sent, and the receiving node returns an acknowledgment after receiving the data. When virtual circuits are used in the network, the transport layer is also responsible for tracking the ID assigned to each circuit. This ID, called a port, connection ID, or socket, is assigned by the session layer. The transport layer also determines the level of packet error detection. The highest level ensures that packets are transmitted from one node to another without errors within an acceptable time.
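A simple way to see how acknowledgments and retransmission provide reliable, in-order delivery is stop-and-wait ARQ with an alternating-bit sequence number. This scheme is chosen only for illustration (real transport protocols such as TCP use sliding windows), and the loss pattern below is a hypothetical simulation rather than real network behavior.

```python
def stop_and_wait(messages, lost_data=(), lost_acks=()):
    """Simulate stop-and-wait ARQ with a 1-bit alternating sequence number.

    lost_data / lost_acks hold transmission attempt numbers (0-based, counted over
    the whole run) whose data packet or acknowledgment is lost in transit.
    Returns (messages delivered to the receiver's upper layer, attempts used).
    """
    delivered = []
    expected = 0      # receiver: sequence bit it expects next
    seq = 0           # sender: sequence bit of the current message
    attempt = 0
    for msg in messages:
        while True:
            n = attempt
            attempt += 1
            if n in lost_data:
                continue              # no ACK comes back: sender times out, resends
            if seq == expected:
                delivered.append(msg) # new data: pass it to the upper layer
                expected ^= 1
            # Otherwise it is a duplicate (an earlier ACK was lost): drop it, ACK again.
            if n in lost_acks:
                continue              # ACK lost: sender times out, resends
            break                     # ACK received: move on to the next message
        seq ^= 1
    return delivered, attempt


# Attempt 1 loses a data packet and attempt 3 loses an ACK (hypothetical pattern).
print(stop_and_wait(["a", "b", "c"], lost_data={1}, lost_acks={3}))
# (['a', 'b', 'c'], 5)
```

Even though one data packet and one acknowledgment are lost, each message is delivered exactly once and in order; the duplicate caused by the lost ACK is recognized by its sequence bit and discarded.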

Another function of the transport layer is to divide messages into smaller units when the networks along the path use different protocols with different packet sizes. The transport layer on the sending node divides the messages into units that the network layer can carry, and the transport layer on the receiving node reassembles the units into the original message.
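Segmentation and reassembly can be sketched as follows, assuming each unit carries a sequence index so that the receiver can restore the original order. The 16-byte limit is an arbitrary example value, not the packet size of any real network.

```python
def segment(message: bytes, max_size: int):
    """Split a message into (index, chunk) units no larger than max_size bytes."""
    if max_size <= 0:
        raise ValueError("max_size must be positive")
    return [(i, message[off:off + max_size])
            for i, off in enumerate(range(0, len(message), max_size))]


def reassemble(segments):
    """Rebuild the message, tolerating units that arrive in any order."""
    return b"".join(chunk for _, chunk in sorted(segments))


msg = b"The quick brown fox jumps over the lazy dog"   # 43 bytes
segs = segment(msg, 16)
print([len(chunk) for _, chunk in segs])               # [16, 16, 11]
print(reassemble(list(reversed(segs))) == msg)         # True
```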

Session Layer

The session layer is responsible for establishing and maintaining the communication link between two nodes. It also determines the correct order of communication between the nodes; for example, it can determine which node transmits first. The session layer can also determine how long a node can transmit and how to recover from transmission errors. If the connection is interrupted at a lower layer, the session layer tries to re-establish the communication.

Presentation Layer

This layer handles data formatting. Different software applications use different data formatting schemes, so data format conversion is necessary. To some degree, the presentation layer is similar to a syntax checker: it ensures that numbers and text are sent in a format that the receiving node can recognize. For example, data sent from an IBM mainframe may use the EBCDIC character format. For a workstation running Windows 95 or Windows 98 to read the information, the data must be expressed in the ASCII character format.

The presentation layer is also responsible for data encryption and data compression.
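The EBCDIC-to-ASCII conversion in the example above can be sketched with Python's built-in codecs. Code page 037 (EBCDIC US/Canada) is an assumption; mainframes may use other EBCDIC code pages, and characters with no ASCII equivalent raise `UnicodeEncodeError`.

```python
def ebcdic_to_ascii(data: bytes, codepage: str = "cp037") -> bytes:
    """Decode EBCDIC bytes with the given code page and re-encode them as ASCII."""
    return data.decode(codepage).encode("ascii")


mainframe_bytes = "HELLO 123".encode("cp037")   # what the mainframe would send
print(mainframe_bytes.hex())                     # c8c5d3d3d640f1f2f3
print(ebcdic_to_ascii(mainframe_bytes))          # b'HELLO 123'
```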

Application Layer

The application layer is the highest layer of the OSI model. It controls access to application programs and network services. Network services include file transfer, file management, remote access to files and printers, email, and terminal emulation. Programmers use this layer to connect workstations to network services, for example, to link an application to email or to provide database access over the network.

Application of OSI Model

The following example demonstrates how the layers communicate. Assume that a workstation needs to access the shared drives of a server in another network. On the workstation, the redirector at the application layer locates the shared drives. The presentation layer determines that the workstation and the server use the ASCII data format. The session layer creates the link between the two computers and ensures that the link is not interrupted until the workstation finishes accessing the shared drive. The transport layer prevents packet errors and ensures that the data can be interpreted in the order in which it was sent. The network layer ensures that the packets are sent along the fastest route to minimize delay. The data link layer creates frames and ensures that the frames reach the proper workstation. Finally, the physical layer converts the information into electrical signals that can be placed onto the network cable, making data transmission possible. After the frame is formed, it can be carried over WAN links between LANs through encapsulation or LAN emulation.
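The encapsulation described above, where each layer adds its own header on the way down and removes it on the way up, can be sketched with strings standing in for real headers. The bracketed header names are placeholders, not real protocol formats, and only three layers add headers in this simplified sketch.

```python
# Placeholder headers for the layers that add one in this simplified sketch,
# listed from the top of the stack down.
LAYERS = ["transport", "network", "data link"]
TRAILER = "[data link trailer]"   # stands in for the frame's CRC field


def encapsulate(payload: str) -> str:
    """Going down the stack, each layer wraps the unit handed down from above."""
    unit = payload
    for layer in LAYERS:
        unit = f"[{layer} hdr]" + unit
    return unit + TRAILER


def decapsulate(frame: str) -> str:
    """Going up the stack, each layer strips its own header in reverse order."""
    if not frame.endswith(TRAILER):
        raise ValueError("missing data link trailer")
    unit = frame[: -len(TRAILER)]
    for layer in reversed(LAYERS):
        header = f"[{layer} hdr]"
        if not unit.startswith(header):
            raise ValueError(f"missing {layer} header")
        unit = unit[len(header):]
    return unit


frame = encapsulate("GET file")
print(frame)  # [data link hdr][network hdr][transport hdr]GET file[data link trailer]
print(decapsulate(frame))  # GET file
```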

The OSI model also applies to network hardware and software. To comply with the standard, network hardware and software must implement the functions of the OSI layers at which they operate. The following table lists the network hardware and software that corresponds to each OSI layer.

OSI Layer             Matching Network Hardware or Software
Application layer     Application program interfaces and gateways
Presentation layer    Data conversion software and gateways
Session layer         Network device software drivers, computer name lookup software, and gateways
Transport layer       Network device software drivers and gateways
Network layer         Gateways, routers, and source-routing bridges
Data link layer       Network interface cards, intelligent hubs, bridges, and gateways
Physical layer        Cabling, cable sockets, multiplexers, transmitters, receivers, transceivers, passive hubs, passive cable connectors, repeaters, and gateways

Table 1-1 Network hardware and software related to OSI model layers

A gateway performs limited and often proprietary functions in the network. Standalone gateway implementations are becoming less common (except for email gateway software), because other devices such as bridges, routers, and switches provide a wide range of functions. Historically, a gateway could be a device operating at any layer of the OSI model.

At its core, a successful LAN implements the communication standards defined by the OSI model. Two basic attributes of a LAN, the network type and the network transmission method, are the basis for compliance with the communication standards.

Use Ping Command

Ping is a common tool for testing the connection between two IP hosts. It uses the ICMP protocol to send a series of test packets (echo requests) to the destination. The replies return to the source, and ping displays whether the destination is reachable, together with timing and timeout statistics.
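The test packets that ping sends are ICMP echo requests (type 8, code 0) protected by the Internet checksum defined in RFC 1071. The sketch below only builds and verifies the packet bytes; actually sending them requires a raw socket and administrator privileges, which is outside the scope of this example. The identifier, sequence number, and payload are arbitrary values.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """One's-complement Internet checksum (RFC 1071), as used by ICMP."""
    if len(data) % 2:
        data += b"\x00"                       # pad to whole 16-bit words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:                        # fold carries into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(identifier: int, sequence: int, payload: bytes) -> bytes:
    """ICMP echo request: type 8, code 0, checksum, identifier, sequence, data."""
    header = struct.pack("!BBHHH", 8, 0, 0, identifier, sequence)  # checksum 0 first
    checksum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, checksum, identifier, sequence) + payload


packet = build_echo_request(identifier=0x1234, sequence=1, payload=b"abcdefgh")
print(packet.hex())
print(internet_checksum(packet))  # 0: a packet with a valid checksum verifies to zero
```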

Simple Ping

The simple ping command can be used in both the common user mode and the privileged user mode of the Maipu switch. For example:

Switch>ping 131.199.130.3

The response characters returned are as follows:

!  Successful response

.  Timeout while waiting for a response

U  Destination unreachable

&  TTL timeout

The command sends 5 packets and then summarizes the results, including the success rate. A successful ping indicates that the network works normally up to the network layer, that is, the two hosts can reach each other at the network layer.
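To make the response characters concrete, the following sketch tallies a string of them and computes the success rate. The function name, the tuple it returns, and the integer rounding are hypothetical; they do not reproduce the switch's actual summary line.

```python
# Meanings of the response characters listed above.
MEANINGS = {"!": "reply received", ".": "timeout", "U": "destination unreachable",
            "&": "TTL timeout"}


def summarize_ping(responses: str):
    """Return (replies, packets sent, success percent) for a string such as '!!.!!'."""
    unknown = set(responses) - set(MEANINGS)
    if unknown:
        raise ValueError(f"unexpected response characters: {sorted(unknown)}")
    sent = len(responses)
    replies = responses.count("!")
    percent = 100 * replies // sent if sent else 0   # integer percent (assumption)
    return replies, sent, percent


print(summarize_ping("!!.!!"))   # (4, 5, 80)
print(summarize_ping("UUUUU"))   # (0, 5, 0)
```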



Expanded Ping

Sometimes the simple ping command cannot provide the tests needed to diagnose some faults. In this case, use the expanded ping command, which is available in privileged mode of the Maipu switch. Expanded ping is interactive: it prompts you for parameters such as the packet count, packet size, timeout value, and data pattern. To start it, enter ping without arguments and press Enter:

Switch# ping <CR>

You are then prompted to set each parameter. You can also read the help for the command.

System Displayed Information

The following commands display system information:

show process

show cpu

show stack

show semaphore

show memory

show device

show arp

netstat -r

show ip socket

show pool

netstat -m

show ip statistics

show ip icmpstate

show process

This command displays the major tasks and their running status.

switch#show process

Displayed content:

NAME       ENTRY  TID     PRI STATUS PC     SP      ERRNO  DELAY
---------- ------ ------- --- ------ ------ ------- ------ -----
tExcTask   2a2aa8 2ffe458   0 PEND   2b8b38 2ffe368 3d0001     0
tLogTask   2ad798 2ffbad0   0 PEND   2b8b38 2ffb9f0      0     0
tExcTrace  103050 2fe98b8  10 PEND   2bf428 2fe9450      0     0
tSysWdog   2fc2e8 2ff7178  15 DELAY  2cc8e8 2ff70f8      0     3
tShell1    1291f0 13280c0  20 PEND   2bf428 1327840  c0002     0
tSysLog    43ebdc 16173e8  40 PEND   2bf428 1617318 3d0001     0
tFwdTask   356a18 235fd78  45 PEND   2bf428 235fcd8      0     0
tMonDscc   3e9fac 1e638f0  45 DELAY  2cc8e8 1e63848      0    66
tNetTask   356984 23626a0  50 PEND   2bf428 23625e8      0     0
tSysTimer  122f88 235d410  50 PEND   2bf428 235d378      0     0
tActive    2fb32c 16087c8  55 DELAY  2cc8e8 1608738      0     8
tSysTask   449a54 2f43d30  60 PEND   2bf428 2f43c80      0     0
tTnd00     4f774c 2feead8  90 PEND+T 2bf428 2fee0e8 3d0004   179
tSh00      4fab9c 12f8098  90 READY  2ccf34 12f6a40  d0001     0
tTffsPTask 5609b0 2ff7e88 100 DELAY  2cc8e8 2ff7e00      0     3
tTelnetd   4f83fc 16066f0 120 PEND   2bf428 16065f0      0     0
tSnmpd     4cf0f8 1322c90 125 PEND   2bf428 1322258 3d0001     0
tSnmpTmr   4cee20 1323ea8 200 PEND   2b8b38 1323da8      0     0

Meaning of each item:

NAME      Task name
ENTRY     Entry address of the task
TID       Task ID
PRI       Task priority
STATUS    Task status in the system
PC        Program counter, the address of the instruction the task is currently executing
SP        Stack address (stack pointer) of the task
ERRNO     Error code of the task
DELAY     Task delay time

System task status:

READY     The task is ready to run
PEND      The task is pended (blocked, waiting for a resource)
DELAY     The task is delayed
SUSPEND   The task is suspended
DELAY+S   The task is delayed and suspended
PEND+S    The task is pended and suspended
PEND+T    The task is pended with a timeout value
PEND+S+T  The task is pended and suspended with a timeout value
State+I   The state has an inherited priority
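When processing show process output in a script, the compound STATUS values in the table above can be split into a base state and modifiers. The parser below is a hypothetical helper based only on the values listed in the table; other firmware versions may print additional states.

```python
# Base states and modifier suffixes, as listed in the task status table above.
BASE_STATES = {"READY", "PEND", "DELAY", "SUSPEND"}
MODIFIERS = {"S": "suspended", "T": "with timeout", "I": "inherited priority"}


def decode_status(status: str) -> dict:
    """Split a STATUS value from show process, e.g. 'PEND+S+T', into its parts."""
    base, *flags = status.split("+")
    if base not in BASE_STATES:
        raise ValueError(f"unknown base state: {base!r}")
    unknown = [flag for flag in flags if flag not in MODIFIERS]
    if unknown:
        raise ValueError(f"unknown modifiers: {unknown}")
    return {"state": base, "modifiers": [MODIFIERS[flag] for flag in flags]}


print(decode_status("PEND+T"))   # {'state': 'PEND', 'modifiers': ['with timeout']}
print(decode_status("READY"))    # {'state': 'READY', 'modifiers': []}
```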

Major functions of each task (common or configured):

tExcTask    Exception task; provides the VxWorks exception-handling package and performs functions that cannot be performed at interrupt level. The task must have the highest priority. Do not suspend or delete the task or change its priority.
tLogTask    Log task, used by VxWorks to record system information.
tExcTrace   Displays system kernel information.
tSysWdog    Watchdog task; automatically restarts the switch when it encounters a major fault.
tShell1     Shell task.
tSysLog     Prints output information and writes specific information into the log file.
tFwdTask    Core forwarding task of the system.
tNetTask    Task-level processing of the VxWorks network stack.
tSysTimer   System timer.
tActive     Switching status detection.
tSysTask    Background system task; processes non-real-time system functions.
tTnd00      Forwarding task of Telnet.
tSh00       Shell task of Telnet.
tTffsPTask  File system management task.
tTelnetd    Telnet receiving task; detects connection requests from clients.
tSnmpd      Core task of network management (SNMP).
tSnmpTmr    Timer task related to network management (SNMP).

show cpu

Displays the CPU usage of each task. Before using show cpu, run spy cpu to start CPU monitoring:

switch#spy cpu

switch#show cpu

Displayed content:

System monitor result:
NAME ENTRY TID PRI total % (ticks) delta % (ticks)
-------- -------- ----- --- --------------- ---------------
tExcTask      3f9bb68    0    0% (   0)    0% (   0)
tLogTask      3f98f90    0    0% (   0)    0% (   0)
tRlimit       353bf80    5    0% (   0)    0% (   0)
tKmemReapd    3f742a0   10    0% (   0)    0% (   0)
tExcTrace     3555e30   10    0% (   0)    0% (   0)
tFmmHdle      2c56238   10    0% (   0)    0% (   0)
tCPUMonitor   3f90ac0   10    0% (   0)    0% (   0)
tShell1       2b41248   20    0% (   1)    0% (   0)
tMbufTask     2e047b0   40    0% (   0)    0% (   0)
tSysLog       2cb67c8   40    0% (   0)    0% (   0)
tLocalStat    34ff8b8   45    0% (   0)    0% (   0)
systimerhigh  34083a8   50    0% (   0)    0% (   0)
tNetTask      2def410   50    0% (   0)    0% (   0)
tFwdTask      2dec8a8   50    0% (   0)    0% (   0)
tRtrSched     2c6a968   50    0% (   0)    0% (   0)
tRtrIcmpRcv   2c67bf8   50    0% (   0)    0% (   0)
tRtrJitter    2c64e88   50    0% (   0)    0% (   0)
tRtrWdog      2c620a8   50    0% (   0)    0% (   0)
tConMSig      2d404e0   55    0% (   0)    0% (   0)
tActive       2b3a650   55    0% (   0)    0% (   0)
tSysTask      3411928   60    0% (   0)    0% (   0)
tAaaRecv      2c46f80   80    0% (   0)    0% (   0)
systimer      3409cf8   90    0% (   0)    0% (   0)
tGTL          2de7c00   90    0% (   0)    0% (   0)
tLogHash      2d9d7e0   90    0% (   0)    0% (   0)
tELD          2c4be58   90    0% (   0)    0% (   0)
tTffsPTask    3f97478  100    0% (   0)    0% (   0)
tStaticRt     2dc8c70  100    0% (   0)    0% (   0)
tRtrSta       2c5ede0  100    0% (   0)    0% (   0)
tAclTask      2d6eb60  110    0% (   0)    0% (   0)
tPmtud        2df1dc0  120    0% (   0)    0% (   0)
tTelnetd      2b39258  120    0% (   0)    0% (   0)
tTelnetd6     2b35448  120    0% (   0)    0% (   0)
tFmmDtct      2c50d98  220    0% (   0)    0% (   0)
tDcacheUpd    34a8138  250    0% (   0)    0% (   0)
tIdle         3f8f268  255    0% (   1)    0% (   0)
KERNEL                        0% (   1)    0% (   0)
INTERRUPT                     0% (   0)    0% (   0)
IDLE                         99% ( 447)  100% (  13)
TOTAL                        99% ( 450)  100% (  13)

Note

total%      Percentage of CPU usage from the start of monitoring to the current show cpu
delta%      Percentage of CPU usage from the previous show cpu to now
current%    Current CPU usage of each task
KERNEL      System kernel
INTERRUPT   Interrupt handling
IDLE        Idle time of the CPU
TOTAL       Total time

To show the CPU usage over several time periods, run monitor cpu and then show cpu monitor:

switch# monitor cpu

switch#show cpu monitor

Displayed content:

CPU utilization for five seconds: 2%; one minute: 1%; five minutes: 1%
CPU utilization per second in the past 60 seconds:
0% 0% 0% 9% 0% 0% 0% 0% 0% 0% 0% 0% 0% 9% 0% 0% 0% 0% 0% 0% 0% 0% 0% 9% 0% 0% 0% 0% 0% 0% 0% 0% 0% 9% 0% 0% 0% 0% 0% 0% 0% 0% 0% 9% 0% 0% 0% 0% 0% 0% 0% 0% 0% 9% 0% 0% 0% 0% 0% 0%
CPU utilization per minute in the past 60 minutes:
1% 1% 1% 1% 1% 1% 1% 1% 1% 1% 1% 1% 1% 1% 1% 1% 1% 1% 1% 1% 1% 2% 1% 1% 1% 1% 1% 2% - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
CPU utilization per quarter in the past 96 quarters:
1% - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Note

CPU utilization for five seconds: CPU usage in the most recent 5 seconds

one minute: CPU usage in the most recent minute

five minutes: CPU usage in the most recent 5 minutes

CPU utilization per second in the past 60 seconds: CPU usage for each second in the most recent 60 seconds

CPU utilization per minute in the past 60 minutes: CPU usage for each minute in the most recent 60 minutes

CPU utilization per quarter in the past 96 quarters: CPU usage for each quarter hour in the most recent 96 quarters (24 hours)

-: The period has not elapsed yet, so no value is available

show stack

Displays the task stacks in the system.

switch#show stack

Displayed content:

NAME         ENTRY        TID      SIZE  CUR   HIGH  MARGIN
------------ ------------ -------- ----- ----- ----- ------
tExcTask     0x00002a2aa8 2ffe458   7984   240   472   7512
tLogTask     0x00002ad798 2ffbad0   4984   224   376   4608
tExcTrace    0x0000103050 2fe98b8   7984  1128  1360   6624
tMonitor     0x0000102198 12f1438   2032   136   200   1832
tSysWdog     0x00002fc2e8 2ff7178   3984   128   360   3624
tShell1      0x00001291f0 13280c0  16376  2176  3552  12824
tSysLog      0x000043ebdc 16173e8   5112   208  1088   4024
tFwdTask     0x0000356a18 235fd78   9984   160  1384   8600
tMonDscc     0x00003e9fac 1e638f0   7984   168  1048   6936
tNetTask     0x0000356984 23626a0   9984   184  1064   8920
tSysTimer    0x0000122f88 235d410  10224   152   328   9896
tCheckCpu    0x00004f14dc 12f0008   8176   176  4544   3632
tActive      0x00002fb32c 16087c8   3992   144   424   3568
tSysTask     0x0000449a54 2f43d30   9984   176   240   9744
tTnd00       0x00004f774c 2feead8  10232  2544  3448   6784
tSh00        0x00004fab9c 12f8098  20472  2600  5864  14608
tTffsPTask   0x00005609b0 2ff7e88   2032   136   416   1616
tTelnetd     0x00004f83fc 16066f0  10224   256   976   9248
tSnmpd       0x00004cf0f8 1322c90  28664  2616  4800  23864
tSnmpTmr     0x00004cee20 1323ea8   4080   256   536   3544
tIdle        0x0000102304 12f0a20   2040   128   408   1632
INTERRUPT                           5000     0   800   4200

Note

Meaning of each item:

NAME     Task name
ENTRY    Entry address of the task
TID      Task ID
SIZE     Stack size
CUR      Stack space currently in use
HIGH     Peak stack usage (the most stack space the task has used so far)
MARGIN   Stack space that has never been used (SIZE minus HIGH)
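In the sample output, MARGIN equals SIZE minus HIGH for every task (for example, 7984 - 472 = 7512 for tExcTask). The sketch below recomputes the margin and flags tasks whose unused stack falls below a threshold. The 10 percent threshold and the helper name are arbitrary assumptions, not a Maipu recommendation.

```python
def stack_report(rows, min_margin_ratio=0.10):
    """rows: (name, SIZE, HIGH) tuples taken from show stack.

    Returns (name, margin, low) where margin = SIZE - HIGH (stack never used so far)
    and low is True when the margin is below min_margin_ratio of the stack size.
    """
    report = []
    for name, size, high in rows:
        margin = size - high
        report.append((name, margin, margin < size * min_margin_ratio))
    return report


# (NAME, SIZE, HIGH) values copied from the sample output.
sample = [("tExcTask", 7984, 472), ("tCheckCpu", 8176, 4544), ("tShell1", 16376, 3552)]
for name, margin, low in stack_report(sample):
    print(f"{name:<10} margin={margin:>6} {'LOW' if low else 'ok'}")
```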

show semaphore

Displays the major semaphores used in the system and their status.

switch#show semaphore all

Displayed content:

===== SEMLIST [Checksum : 0xd954] =====
Semaphore Id        : 0x3ede4d8
Semaphore Type      : BINARY
Task Queuing        : FIFO
Pended Tasks        : 0
State               : EMPTY
Owner               : 0x23e0478 (tShell1)
Options             : 0x0       SEM_Q_FIFO

VxWorks Events
--------------
Registered Task     : NONE
Event(s) to Send    : N/A
Options             : N/A

===== SysMemMechSem [Checksum : 0x79a2] =====
Semaphore Id        : 0x3ede508
Semaphore Type      : MUTEX
Task Queuing        : PRIORITY
Pended Tasks        : 0
Owner               : NONE
Options             : 0x9       SEM_Q_PRIORITY
                                SEM_INVERSION_SAFE

VxWorks Events
--------------
Registered Task     : NONE
Event(s) to Send    : N/A
Options             : N/A

Note

Semaphore Type: MUTEX, BINARY, or COUNTING.

Task Queuing: PRIORITY or FIFO.

The show semaphore command accepts the following parameters:

show semaphore _STRING_: displays information about the specified semaphore.

show semaphore list: displays the list of current semaphores.

show semaphore binary | counting | mutex any | pended | unpended detail | summary: displays information about semaphores of the specified type. any displays all semaphores of that type; pended displays semaphores on which tasks are pended (blocked); unpended displays semaphores on which no task is pended; detail displays detailed information; summary displays summary information.

show memory

Displays the memory usage in the system, including the memory management mechanisms, their types, and usage.

switch#show memory

Displayed content:

SUMMARY
-------
Type  Used bytes  Free bytes  Total bytes  Used percent
----  ----------  ----------  -----------  ------------
HEAP    21291496    28001744     49293240        43.19%
CODE    17810592           /     17810592             /
SLAB      539040      349792       888832        60.65%
MBUF      755936    16081824     16837760         4.49%
Note: The space of all such memory types exclude CODE is part of the HEAP's used memory,for example:MBUF,SLAB,and FPSS if exists.

(The memory of all memory management mechanisms except the CODE segment, such as MBUF, SLAB, and FPSS if it exists, is part of the used memory of HEAP.)

STATISTICS
----------
Used bytes  Free bytes  Total bytes  Used percent
----------  ----------  -----------  ------------
  22670472    44433360     67103832        33.78%

Note

HEAP   Heap memory, the most basic memory area in the system. The other memory management mechanisms obtain their memory from this area.
CODE   Code segment memory, the area used to store the code segment
SLAB   A secondary memory management mechanism that allocates from the heap
MBUF   A secondary memory management mechanism that allocates from the heap
FPSS   A secondary memory management mechanism that allocates from the heap; exists on the MP3700, MP7200, and MP7500

The show memory command accepts the following parameters:

show memory FPSS|HEAP|MBUF|SLAB: displays the memory usage of the specified memory management mechanism.

show memory FPSS|MBUF|SLAB _POOLNAME_: displays the usage of a memory pool in the specified memory management mechanism.

show memory detail: displays details of system memory usage.

show memory detail FPSS|HEAP|MBUF|SLAB: displays detailed memory usage of the specified memory management mechanism.

show memory detail FPSS|HEAP|MBUF|SLAB _POOLNAME_: displays detailed usage of a memory pool in the specified memory management mechanism.
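In the sample show memory output, Used percent matches used bytes divided by total bytes, and used plus free equals total for every type except CODE. The sketch below reproduces those figures from the sample values. Rounding to two decimal places is inferred from the output rather than documented, and the helper name is hypothetical.

```python
def used_percent(used: int, total: int) -> float:
    """Used percent as shown by show memory (two-decimal rounding is inferred)."""
    return round(100 * used / total, 2)


# (used, free, total) bytes copied from the SUMMARY section of the sample output.
summary = {
    "HEAP": (21291496, 28001744, 49293240),
    "SLAB": (539040, 349792, 888832),
    "MBUF": (755936, 16081824, 16837760),
}
for name, (used, free, total) in summary.items():
    assert used + free == total           # holds for every type except CODE
    print(f"{name}: {used_percent(used, total):.2f}%")
# Prints HEAP: 43.19%, SLAB: 60.65%, MBUF: 4.49% (one per line)
```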

show arp

Displays the ARP cache of the system.

switch#show arp

Displayed content:

Protocol  Address         Age (min)  Hardware Addr   Type  Interface
Internet  128.255.41.40   2          0022.153b.55e4  ARPA  vlan1
Internet  128.255.41.47   -          0001.7a5c.004a  ARPA  vlan1
Internet  128.255.43.254  0          0001.7a58.19ba  ARPA  vlan1

Note

If Age is displayed as -, the entry is a static ARP entry.

show ip socket

Displays information about the sockets in the active state.

switch#show ip socket

Displayed content:

Active Internet connections (including servers)
PCB      Proto Recv-Q Send-Q Local Address          Foreign Address        vrf     (state)
-------- ----- ------ ------ ---------------------- ---------------------- ------- -------
2f50cb0  TCP        0      0 0.0.0.0.23             0.0.0.0.0              all     LISTEN
2f50ba8  UDP        0      0 0.0.0.0.520            0.0.0.0.0              kernel
2f50aa0  UDP        0      0 0.0.0.0.0              0.0.0.0.0              kernel
2f50a1c  UDP        0      0 0.0.0.0.0              0.0.0.0.0              kernel
2f5080c  UDP        0      0 0.0.0.0.0              0.0.0.0.0              kernel

Note

PCB              Address of the socket's protocol control block (PCB)
Proto            Protocol type of the socket
Recv-Q           Amount of data in the socket's receive buffer
Send-Q           Amount of data in the socket's send buffer
Local Address    Local IP address and port number bound to the socket (0.0.0.0.23 means any local IP address, port 23)
Foreign Address  Remote IP address and port number of the socket
vrf              VPN routing and forwarding (VRF) instance
state            Status of the socket (applies to TCP only)
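The Local Address and Foreign Address columns append the port to the IPv4 address with a dot, so 0.0.0.0.23 is address 0.0.0.0, port 23. A small hypothetical helper for splitting such values when processing the output in a script (it handles IPv4 only, which is an assumption about the output):

```python
def split_bsd_address(addr: str):
    """Split 'a.b.c.d.port' (IPv4 address followed by '.port') into (address, port)."""
    host, sep, port = addr.rpartition(".")
    if not sep or host.count(".") != 3 or not port.isdigit():
        raise ValueError(f"not an IPv4 address.port string: {addr!r}")
    return host, int(port)


print(split_bsd_address("0.0.0.0.23"))    # ('0.0.0.0', 23)
print(split_bsd_address("0.0.0.0.520"))   # ('0.0.0.0', 520)
```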

show pool

Displays the current buffer (cache) pools. The command has three forms:

show pool: displays a summary of the pools

show pool detail: displays details of the pools

show pool information: displays the actual state of the buffer chains

Example:

Switch# sh pool

Displayed content:

Driver pool

__________________

CLUSTER POOL TABLE

_______________________________________________________________________________

size clusters free usage

-------------------------------------------------------------------------------

1884 11008 10496 3906

-------------------------------------------------------------------------------

Size: 21247488 bytes

Data pool

__________________

CLUSTER POOL TABLE



_______________________________________________________________________________

size clusters free usage

-------------------------------------------------------------------------------

64 18000 17983 1611

128 36000 35943 175

256 3424 3422 40

512 2400 2394 20

1024 180 180 0

2048 300 300 0

-------------------------------------------------------------------------------


All MBUF pool size : 35689728 bytes

Note

*** pool  The name of the buffer pool. For example, the data pool is the buffer pool used by upper-layer protocols, and the driver pool is the buffer pool used by drivers.

In CLUSTER POOL TABLE, the meaning of each item is as follows:

size: size of each cluster (data block) in the pool, in bytes

clusters: number of clusters

free: number of unused clusters

usage: number of times clusters of this size have been used

Size (below CLUSTER POOL TABLE): memory occupied by the pool, in bytes

All MBUF pool size: memory occupied by all buffer pools, in bytes
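From a CLUSTER POOL TABLE, the number of clusters currently allocated for each size is clusters minus free, and a size whose free count reaches 0 is exhausted. The sketch below applies this to the data pool values from the sample show pool output; the helper names are hypothetical.

```python
def clusters_in_use(table):
    """table: (size, clusters, free, usage) rows. Returns {size: clusters allocated}."""
    return {size: clusters - free for size, clusters, free, _usage in table}


def exhausted_sizes(table):
    """Cluster sizes that have no free clusters left."""
    return [size for size, _clusters, free, _usage in table if free == 0]


# Data pool rows copied from the sample output.
data_pool = [(64, 18000, 17983, 1611), (128, 36000, 35943, 175), (256, 3424, 3422, 40),
             (512, 2400, 2394, 20), (1024, 180, 180, 0), (2048, 300, 300, 0)]
print(clusters_in_use(data_pool))
# {64: 17, 128: 57, 256: 2, 512: 6, 1024: 0, 2048: 0}
print(exhausted_sizes(data_pool))  # []
```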

Switch# sh pool information

Displayed content:

Driver pool

free mBlk number : 5500, fact free number : 5500

free clBlk number : 5504

__________________

CLUSTER POOL TABLE

_______________________________________________________________________________

size clusters free usage fact

-------------------------------------------------------------------------------

1884 11008 10496 2872 10496

-------------------------------------------------------------------------------


Data pool

free mBlk number : 69918, fact free number : 69918



free clBlk number : 59198

__________________

CLUSTER POOL TABLE

_______________________________________________________________________________

size clusters free usage fact

-------------------------------------------------------------------------------

64 18000 17983 1133 17983

128 36000 35943 151 35943

256 3424 3422 10 3422

512 2400 2394 18 2394

1024 180 180 0 180

2048 300 300 0 300

-------------------------------------------------------------------------------


All MBUF pool size : 35689728 bytes

Note

free mBlk number: number of unused mBlks

fact free number: actual number of free mBlks counted by traversing the mBlk chain

free clBlk number: number of unused clBlks

In CLUSTER POOL TABLE, fact indicates the number of free clusters counted by traversing the cluster chain.

switch#show pool detail

Displayed content:

fastethernet pool
Statistics for the network stack
mbuf type  number
---------  ------
FREE     : 1022
DATA     : 2
HEADER   : 0
SOCKET   : 0
PCB      : 0
RTABLE   : 0
HTABLE   : 0
ATABLE   : 0
SONAME   : 0
ZOMBIE   : 0
SOOPTS   : 0
FTABLE   : 0
RIGHTS   : 0
IFADDR   : 0
CONTROL  : 0
OOBDATA  : 0
IPMOPTS  : 0
IPMADDR  : 0
IFMADDR  : 0
MRTABLE  : 0
DRVSCC   : 0
DRV8SA   : 0
DRV8S    : 0
DRV16A   : 0
DRV4M336 : 0
DRVEXTSCC: 0
TOTAL    : 1024
number of mbufs: 1024
number of times failed to find space: 0
number of times waited for space: 0
number of times drained protocols for space: 0
__________________
CLUSTER POOL TABLE
_______________________________________________________________________________
size     clusters  free      usage
-------------------------------------------------------------------------------
1556     512       256       599
-------------------------------------------------------------------------------

Link pool
Statistics for the network stack
mbuf type  number
---------  ------
FREE     : 1640
DATA     : 0
HEADER   : 0
SOCKET   : 0
PCB      : 0
RTABLE   : 0
HTABLE   : 0
ATABLE   : 0
SONAME   : 0
ZOMBIE   : 0
SOOPTS   : 0
FTABLE   : 0
RIGHTS   : 0
IFADDR   : 0
CONTROL  : 0
OOBDATA  : 0
IPMOPTS  : 0
IPMADDR  : 0
IFMADDR  : 0
MRTABLE  : 0
DRVSCC   : 0
DRV8SA   : 0
DRV8S    : 0
DRV16A   : 0
DRV4M336 : 0
DRVEXTSCC: 0
TOTAL    : 1732
number of mbufs: 1732
number of times failed to find space: 0
number of times waited for space: 0
number of times drained protocols for space: 0
__________________
CLUSTER POOL TABLE
_______________________________________________________________________________
size     clusters  free      usage
-------------------------------------------------------------------------------
64       1600      1600      0
128      10        10        0
256      10        10        0
512      10        10        0
1024     10        10        0
2048     100       100       0
-------------------------------------------------------------------------------
Size: 461120 bytes

sys pool
Statistics for the network stack
mbuf type  number
---------  ------
FREE     : 11560
DATA     : 1
HEADER   : 0
SOCKET   : 2
PCB      : 3
RTABLE   : 22
HTABLE   : 0
ATABLE   : 0
SONAME   : 0
ZOMBIE   : 0
SOOPTS   : 0
FTABLE   : 0
RIGHTS   : 0
IFADDR   : 8
CONTROL  : 0
OOBDATA  : 0
IPMOPTS  : 0
IPMADDR  : 4
IFMADDR  : 0
MRTABLE  : 0
DRVSCC   : 0
DRV8SA   : 0
DRV8S    : 0
DRV16A   : 0
DRV4M336 : 0
DRVEXTSCC: 0
TOTAL    : 38400
number of mbufs: 38400
number of times failed to find space: 0
number of times waited for space: 0
number of times drained protocols for space: 0
__________________
CLUSTER POOL TABLE
_______________________________________________________________________________
size     clusters  free      usage
-------------------------------------------------------------------------------
64       8000      7973      28
128      16000     15959     59
256      3200      3199      1
512      3200      3192      26
-------------------------------------------------------------------------------
Size: 7801600 bytes

Data pool
Statistics for the network stack
mbuf type  number
---------  ------
FREE     : 7999
DATA     : 0
HEADER   : 0
SOCKET   : 0
PCB      : 0
RTABLE   : 0
HTABLE   : 0
ATABLE   : 0
SONAME   : 0
ZOMBIE   : 1
SOOPTS   : 0
FTABLE   : 0
RIGHTS   : 0
IFADDR   : 0
CONTROL  : 0
OOBDATA  : 0
IPMOPTS  : 0
IPMADDR  : 0
IFMADDR  : 0
MRTABLE  : 0
DRVSCC   : 0
DRV8SA   : 0
DRV8S    : 0
DRV16A   : 0
DRV4M336 : 0
DRVEXTSCC: 0
TOTAL    : 8000
number of mbufs: 8000
number of times failed to find space: 0
number of times waited for space: 0
number of times drained protocols for space: 0
__________________
CLUSTER POOL TABLE
_______________________________________________________________________________
size     clusters  free      usage
-------------------------------------------------------------------------------
64       800       800       4
128      200       199       27520
256      200       200       0
512      100       100       0
1024     80        80        0
2048     50        50        0
-------------------------------------------------------------------------------
Size: 767000 bytes

Driver pool
Statistics for the network stack
mbuf type  number
---------  ------
FREE     : 1388
DATA     : 112
HEADER   : 0
SOCKET   : 0
PCB      : 0
RTABLE   : 0
HTABLE   : 0
ATABLE   : 0
SONAME   : 0
ZOMBIE   : 0
SOOPTS   : 0
FTABLE   : 0
RIGHTS   : 0
IFADDR   : 0
CONTROL  : 0
OOBDATA  : 0
IPMOPTS  : 0
IPMADDR  : 0
IFMADDR  : 0
MRTABLE  : 0
DRVSCC   : 56
DRV8SA   : 0
DRV8S    : 0
DRV16A   : 0
DRV4M336 : 4
DRVEXTSCC: 4
TOTAL    : 6000
number of mbufs: 6000
number of times failed to find space: 0
number of times waited for space: 0
number of times drained protocols for space: 0
__________________
CLUSTER POOL TABLE
_______________________________________________________________________________
size     clusters  free      usage
-------------------------------------------------------------------------------
1600     6000      5936      2446
-------------------------------------------------------------------------------

All MBUF pool size : 19971928 bytes

Note

*** pool  The name of the buffer pool. For example, the fastethernet pool is the buffer pool of the 100M Ethernet interface.

The following describes the fastethernet pool:

Statistics for the network stack (mbuf type, number): statistics of the various mbuf types

FREE     : 1022   Number of free mbufs
DATA     : 2      Number of mbufs storing data
HEADER   : 0      Number of mbufs storing protocol headers
SOCKET   : 0      Number of mbufs used for sockets
PCB      : 0      Number of mbufs used for protocol control blocks (PCBs)
RTABLE   : 0      Number of mbufs used for routing tables
HTABLE   : 0      Number of mbufs used for IMP host tables
ATABLE   : 0      Number of mbufs used for address resolution tables
SONAME   : 0      Number of mbufs storing socket names
ZOMBIE   : 0      Number of mbufs storing zombie status
SOOPTS   : 0      Number of mbufs storing socket options
FTABLE   : 0      Number of mbufs used for IP fragment reassembly
RIGHTS   : 0      Number of mbufs storing access rights
IFADDR   : 0      Number of mbufs used for interface addresses
CONTROL  : 0      Number of mbufs storing control messages
OOBDATA  : 0      Number of mbufs storing TCP out-of-band data
IPMOPTS  : 0      Number of mbufs storing multicast options
IPMADDR  : 0      Number of mbufs storing multicast addresses
IFMADDR  : 0      Number of mbufs storing Ethernet multicast addresses
MRTABLE  : 0      Number of mbufs storing multicast routing tables
DRVSCC   : 0      Number of mbufs used by the SCC driver
DRV8SA   : 0      Number of mbufs used by the 8SA driver
DRV8S    : 0      Number of mbufs used by the 8S driver
DRV16A   : 0      Number of mbufs used by the 16A driver
DRV4M336 : 0      Number of mbufs used by the 4M336 driver
DRVEXTSCC: 0      Number of mbufs used by the expansion card driver
TOTAL    : 1024   Sum of the preceding statistics

number of mbufs: 1024: number of mBlks in the current pool

number of times failed to find space: 0: number of failed mbuf requests

number of times waited for space: 0: number of times an mbuf request had to wait

number of times drained protocols for space: 0: number of times mbufs were reclaimed from protocols

CLUSTER POOL TABLE: statistics of the cluster pool of the current mbuf pool

size clusters free usage: cluster size, total number of clusters, free clusters, and usage count, for example:

1556 512 256 599

netstat -m

Displays the statistics of the system data pool.

switch#netstat -m

Displayed content:

Statistics for the network stack
mbuf type  number
FREE     : 7999
DATA     : 0
HEADER   : 0
SOCKET   : 0
PCB      : 0
RTABLE   : 0
HTABLE   : 0
ATABLE   : 0
SONAME   : 0
ZOMBIE   : 1
SOOPTS   : 0
FTABLE   : 0
RIGHTS   : 0
IFADDR   : 0
CONTROL  : 0
OOBDATA  : 0
IPMOPTS  : 0
IPMADDR  : 0
IFMADDR  : 0
MRTABLE  : 0
DRVSCC   : 0
DRV8SA   : 0
DRV8S    : 0
DRV16A   : 0
DRV4M336 : 0
DRVEXTSCC: 0
TOTAL    : 8000
number of mbufs: 8000
number of times failed to find space: 0
number of times waited for space: 0
number of times drained protocols for space: 0
__________________
CLUSTER POOL TABLE
_______________________________________________________________________________
size     clusters  free      usage
-------------------------------------------------------------------------------
64       800       800       9
128      200       199       20
256      200       200       0
512      100       100       0
1024     80        80        0
2048     50        50        0
-------------------------------------------------------------------------------

Note

The command displays the statistics of the system data pool. The display format and content are the same as those of the data pool in the show pool detail output, which also includes the statistics of the system data pool.

show ip statistics

Displays the statistics of IP packets.

switch#show ip statistics

Displayed content:

Statistics for the IP protocol
total              1434
badsum             0
tooshort           0
toosmall           0
badhlen            0
badlen             0
infragments        0
fragdropped        0
fragtimeout        0
forward            0
cantforward        1403
redirectsent       0
unknownprotocol    0
toupper            31
nobuffers          0
reassembled        0
outfragments       0
noroute            0
rawsockout         0
badaddress         0
fastforwardtotal   0
fastforward        0
cannotfastforward  0

Note

total The number of received and sent packets.

bandsum The number of packets with incorrect checksum.

tooshort The length of the received packets is shorter than actual length (the length filed in the IP header).

toosmal The length of the received packets is shorter than the IP header length (20 bytes)

badhlen The IP header filed is smaller than the IP length (20 bytes)

badlen The value of the IP header is smaller than the IP header length

infragments The number of received fragments

fragdropped The number of dropped fragments

fragtimeout The number of timeout dropped fragments

forward The number of forwarded packets

cantforward The number of packets that cannot be forwarded

redirectsent The number of redirected packets

unknownprotocol The number of unknown protocol packets

toupper The number of sent to the upper layer



nobuffers The times of no buffer

reassembled The number of reassembled packets

outfragments The number of sent fragments

noroute The times of route failure

rawsockout The number of original IP packets

badaddress The number of the packets with the illegal address

fastforwardtotal The total of fast forwarded packets

fastforward The number of fast forwarded packets

cannotfastforward The number of packets that cannot be fast forwarded
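The header checks behind the badhlen, badlen, tooshort, toosmall, and badsum counters can be illustrated with a short Python sketch. It assumes a BSD-style receive path (the counter names above match the classic BSD IP statistics) and applies the checks to a raw IPv4 header in that order. The functions classify_ipv4, checksum, and build_header are hypothetical and do not describe the switch's actual implementation.

```python
import struct

IP_MIN_HLEN = 20  # minimum IPv4 header length in bytes (no options)


def checksum(data: bytes) -> int:
    """Internet checksum (RFC 1071); returns 0 when a header's checksum is valid."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF


def classify_ipv4(packet: bytes) -> str:
    """Return the statistics counter a received packet would hit (hypothetical)."""
    if len(packet) < IP_MIN_HLEN:
        return "toosmall"   # shorter than a minimal IP header
    hlen = (packet[0] & 0x0F) * 4  # header length field, converted to bytes
    if hlen < IP_MIN_HLEN or hlen > len(packet):
        return "badhlen"    # header length field is invalid
    if checksum(packet[:hlen]) != 0:
        return "badsum"     # header checksum mismatch
    (total_len,) = struct.unpack("!H", packet[2:4])
    if total_len < hlen:
        return "badlen"     # total length smaller than the header length
    if len(packet) < total_len:
        return "tooshort"   # fewer bytes received than the total length field says
    return "ok"


def build_header(total_len: int, ihl: int = 5) -> bytes:
    """Hypothetical helper: a 20-byte IPv4 header with a valid checksum."""
    hdr = bytearray(struct.pack("!BBHHHBBH4s4s", 0x40 | ihl, 0, total_len,
                                0, 0, 64, 1, 0, bytes(4), bytes(4)))
    hdr[10:12] = struct.pack("!H", checksum(bytes(hdr)))
    return bytes(hdr)


print(classify_ipv4(build_header(20)))  # ok
print(classify_ipv4(build_header(40)))  # tooshort
```

The order of the checks matters: a packet is only tested against its total length once its header length and checksum are known to be valid.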

show ip icmpstate Display the statistics of the ICMP packets:

switch#show ip icmpstate

Displayed content:

Statistics for ICMP protocol
6929 calls to icmp_error
0 error not generated because old message was icmp
Output histogram:
        echo reply: 5
        destination unreachable: 24
0 message with bad code fields
0 message < minimum length
0 bad checksum
0 message with bad length
Input histogram:
        echo: 5
        #10: 2
5 message responses generated

Note

calls to icmp_error The number of times ICMP was invoked to send an ICMP error packet

error not generated because old message was icmp The number of ICMP error packets not generated because the original packet was itself an ICMP packet

Output histogram The histogram of sent ICMP packets, by type

echo reply The number of ICMP echo reply packets sent

destination unreachable The number of ICMP destination unreachable packets sent

message with bad code fields The number of ICMP packets discarded because of an invalid code

message < minimum length The number of ICMP packets discarded because the ICMP header is too short

bad checksum The number of ICMP packets discarded because of a bad checksum

message with bad length The number of ICMP packets discarded because of an invalid ICMP message length

Input histogram The histogram of received ICMP packets, by type

echo The number of ICMP echo packets received

#10: 2 Two received ICMP packets had ICMP type 10 (Router Solicitation in the standard ICMP type numbering)

message responses generated The number of response packets generated
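The "error not generated because old message was icmp" counter reflects the general ICMP rule (RFC 1122) that an ICMP error message is never sent in response to an ICMP error message. The following is a minimal Python sketch, assuming BSD-style counting in which ICMP queries such as echo requests may still trigger errors. The class IcmpStats and the function icmp_error are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

IPPROTO_ICMP = 1
# ICMP error types (RFC 792): destination unreachable, source quench,
# redirect, time exceeded, parameter problem. Echo (8) and echo reply (0) are queries.
ICMP_ERROR_TYPES = {3, 4, 5, 11, 12}


@dataclass
class IcmpStats:
    calls_to_icmp_error: int = 0
    old_icmp: int = 0  # "error not generated because old message was icmp"
    output_histogram: Dict[int, int] = field(default_factory=dict)


def icmp_error(stats: IcmpStats, orig_proto: int,
               orig_icmp_type: Optional[int], error_type: int) -> bool:
    """Decide whether to send an ICMP error about an offending packet (hypothetical)."""
    stats.calls_to_icmp_error += 1
    if orig_proto == IPPROTO_ICMP and orig_icmp_type in ICMP_ERROR_TYPES:
        stats.old_icmp += 1  # never answer an ICMP error with another error
        return False
    stats.output_histogram[error_type] = stats.output_histogram.get(error_type, 0) + 1
    return True


stats = IcmpStats()
icmp_error(stats, 17, None, 3)  # UDP packet to a closed port: error is sent
icmp_error(stats, 1, 8, 3)      # offending packet is an echo request: error is sent
icmp_error(stats, 1, 3, 3)      # offending packet is itself an error: suppressed
print(stats)
```

Each call increments calls_to_icmp_error, so that counter is always the sum of the errors sent and the errors suppressed.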



Switch Principles

This chapter describes switch principles to help users understand the later chapters.

Main contents:

The development of the switching technology

The basic working principle of the switch

Multiple layer switching technology

Comparison between the switch and other network communication products

Development of the Switching Technology

The following describes the development of the LAN.

The combination of computer technology and communication technology has driven the rapid development of the LAN. From the 1960s to the 1990s, LAN technology evolved from ALOHA to 1000 Mbps switched Ethernet. Over these thirty years, it progressed from simplex to duplex, from shared to switched, from low speed to high speed, from simple to complex, and from expensive to widespread.

In the late 1980s, rapid growth in network traffic drove further development, and LAN performance improved steadily. The 1 Mbps rate was superseded by 100BASE-T and 100VG-AnyLAN. However, these technologies still used the traditional media access method, CSMA/CD, in which many stations share a common transmission medium.

In the early 1990s, as computer performance improved and traffic continued to grow, traditional LANs became overloaded. Switched Ethernet emerged and significantly improved LAN performance. Compared with shared-media LAN topologies based on bridges and routers, a switched network provides more bandwidth. Switching makes it possible to build a distributed network in which the ports of a LAN switch transmit data in parallel, securely, and simultaneously, so the LAN can be expanded considerably.

The development of LAN switching technology goes back to the two-port bridge. A bridge is a store-and-forward device that connects LANs of the same type. In terms of internetwork structure, a bridge provides a DCE-class point-to-point connection. In terms of protocol layers, a bridge stores and forwards data frames at the data link layer, analogous to a repeater at L1 and a router at L3. The two-port bridge developed alongside Ethernet.

Ethernet switching technology was developed in the 1990s on the basis of the multiport bridge. It implements the lower two protocol layers and is closely related to the bridge; some professionals even call a switch "many connected bridges". Therefore, switching is not a new standard but a new application of existing technology, in effect an improved LAN bridge. Compared with a traditional bridge, a switch provides more ports, better performance, more powerful management functions, and a lower price.

Basic Working Principle of the Switch

LAN switching technology works at L2 (the data link layer) of the OSI model. Here, "switching" means forwarding frames. In data communication, all switching devices (that is, switches) perform two basic tasks:

Frame forwarding: forward frames received from an input medium to the corresponding output medium.

Address learning: build and maintain the switching address table that the switch relies on for forwarding.

The following describes these two basic operations in detail.

Frame Forwarding

The switch forwards frames according to MAC addresses, observing the following rules:

1. If the destination MAC address of the frame is a broadcast or multicast address, the frame is forwarded to all ports of the switch except the port on which it was received.

2. If the destination address of the frame is a unicast address that is not in the switch's address table, the frame is forwarded to all ports except the port on which it was received.

3. If the destination address of the frame is in the switch's address table, the frame is forwarded to the port associated with that address in the table.

4. If the destination address is associated with the same port on which the frame was received, that is, the source and destination are in the same network segment, the frame is discarded and not switched.
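The four rules can be condensed into a small Python function. This is a sketch, not the switch's implementation. It assumes MAC addresses written as xxxx.xxxx.xxxx strings, as in this manual, and it floods multicast frames exactly as rule 1 states (that is, without IGMP snooping or static multicast entries). The function names, port names, and table contents are hypothetical values modeled on the Figure 2-1 example that follows.

```python
from typing import Dict, List


def is_group_address(mac: str) -> bool:
    """True for broadcast/multicast: the I/G bit (LSB of the first octet) is set."""
    first_octet = int(mac.replace(".", "")[:2], 16)
    return bool(first_octet & 0x01)


def forward(dst_mac: str, in_port: str, mac_table: Dict[str, str],
            ports: List[str]) -> List[str]:
    """Return the output ports for a frame, following rules 1-4 (sketch)."""
    flood = [p for p in ports if p != in_port]
    if is_group_address(dst_mac):      # rule 1: broadcast or multicast
        return flood
    out_port = mac_table.get(dst_mac)
    if out_port is None:               # rule 2: unknown unicast
        return flood
    if out_port == in_port:            # rule 4: same segment, discard
        return []
    return [out_port]                  # rule 3: known unicast


PORTS = ["E0", "E1", "E2", "E3", "E4"]
# Hypothetical table contents modeled on the Figure 2-1 example
TABLE = {"0260.8c01.1111": "E0", "0260.8c01.6666": "E3"}
print(forward("0260.8c01.1111", "E3", TABLE, PORTS))  # ['E0']
```

Rule 4 is checked before rule 3 in the code because it is a special case of a known destination: the table lookup succeeds, but the result is the receiving port.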

The following figure illustrates the frame switching.

Figure 2-1 Frame forwarding



When host D sends a broadcast frame, the switch receives a frame with the destination address ffff.ffff.ffff on port E3 and forwards it to ports E0, E1, E2, and E4.

When host D communicates with host E, the switch receives frames with the destination address 0260.8c01.5555 on port E3. It searches the address table and finds that 0260.8c01.5555 is not in the table, so it forwards the frames to ports E0, E1, E2, and E4.

When host D communicates with host F, the switch receives frames with the destination address 0260.8c01.6666 on port E3. It searches the address table and finds that 0260.8c01.6666 is on port E3, meaning that the destination and source addresses are in the same network segment. Therefore, the switch does not forward the frames and discards them directly.

When host D communicates with host A, the switch receives frames with the destination address 0260.8c01.1111 on port E3. It searches the address table and finds that 0260.8c01.1111 is on port E0, so it forwards the frames to port E0, where host A receives them.

If host B is sending data to host C while host D communicates with host A, the switch also forwards the frames from host B to port E2, which connects host C. In this case, the hardware switching circuit in the switch creates two links: one between E1 and E2, and one between E3 and E0. Traffic on the two links does not interfere, so no network conflicts occur: the communication between host D and host A occupies one link exclusively, and the communication between host B and host C occupies the other. Such a link is created only when both parties need to communicate and is removed when the data transmission ends. This is the major feature of a switch.

As the switching process above shows, frame forwarding is based on the MAC address table in the switch. The following describes how the address table is created and maintained.

Address Learning Process

Each entry in the switch's address table consists of a MAC address and the switch port on which that address resides. The whole address table is generated through dynamic self-learning: when the switch receives a frame, it records the frame's source address and input port in the switching address table. Figure 2-2 illustrates the forwarding and learning of received frames.

When a frame arrives on a given port, say port X, the switch concludes from these two items that the station identified by the frame's source address can be reached through port X, and it updates the forwarding database entry for that MAC address accordingly. To accommodate changes in the network topology, each database entry has an aging timer, which starts when the entry is added. The default value of the timer is 30 seconds; when it expires, the entry is removed. Each time a frame is received, the switch searches the database for an entry whose address field matches the frame's source address. If such an entry exists, the switch updates its contents and resets the timer. If no such entry exists, the switch adds a new entry: the address is the source MAC address of the received frame, the port number is the port on which the frame was received, and the timer is set to its initial value.



Figure 2-2 Bridge forwarding and address learning
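The learning and aging behavior described above can be sketched as a small Python class. The sketch uses the 30-second default timer stated in this section and takes an explicit now argument instead of reading a real clock, so the timing is easy to follow. The class MacTable and its methods are hypothetical names.

```python
from typing import Dict, Optional, Tuple


class MacTable:
    """Dynamic MAC address table with per-entry aging timers (sketch)."""

    def __init__(self, aging_time: float = 30.0):
        self.aging_time = aging_time
        self._entries: Dict[str, Tuple[str, float]] = {}  # MAC -> (port, expiry time)

    def learn(self, src_mac: str, in_port: str, now: float) -> None:
        # Existing entry: update the port and restart the timer.
        # New address: add an entry with a fresh timer. One assignment covers both.
        self._entries[src_mac] = (in_port, now + self.aging_time)

    def expire(self, now: float) -> None:
        # Remove every entry whose timer has run out.
        self._entries = {mac: e for mac, e in self._entries.items() if e[1] > now}

    def lookup(self, mac: str, now: float) -> Optional[str]:
        self.expire(now)
        entry = self._entries.get(mac)
        return entry[0] if entry else None


table = MacTable()
table.learn("0260.8c01.1111", "E0", now=0.0)
table.learn("0260.8c01.1111", "E0", now=20.0)    # refreshed: now expires at 50 s
print(table.lookup("0260.8c01.1111", now=45.0))  # E0
print(table.lookup("0260.8c01.1111", now=50.0))  # None (aged out)

table.learn("0260.8c01.2222", "E1", now=60.0)
table.learn("0260.8c01.2222", "E2", now=61.0)    # host moved: the port is updated
print(table.lookup("0260.8c01.2222", now=62.0))  # E2
```

Because every received frame refreshes its source entry, only stations that stay silent for the whole aging period are removed from the table.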

Multiple Layer Switching Technology

LAN switching is implemented in hardware. In a LAN frame, the destination MAC address is at a fixed position, so checking the header is simple and well suited to hardware switching. Therefore, traditional LAN switching means L2 switching, that is, switching based on L2 information: the destination MAC address.

Before switching a frame, the switch must receive enough of it to examine the fields that determine forwarding. If the amount of data examined is increased, L2 switching can be extended to L3 or even L4 switching.

In L3 switching, the examined data extends to the IP packet header, and switching is performed by checking the IP address; in effect, it is hardware-based routing. L4 switching also checks the protocol type in the IP header and the port numbers in the TCP or UDP header, so it can be regarded as application-based switching. The widely used multilayer switching technology combines L2, L3, and L4 switching to implement "route once, switch many times".
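To make the difference between L2, L3, and L4 switching concrete, the following Python sketch extracts the fields each layer examines from a raw, untagged Ethernet II frame carrying IPv4. These fields are the destination MAC address for L2, the destination IP address for L3, and the protocol type plus destination port for L4. Which fields a real device uses for a given lookup varies; the function name switching_keys and its return format are hypothetical.

```python
import struct
from typing import Dict, Tuple, Union

Key = Union[str, Tuple[int, int]]


def switching_keys(frame: bytes) -> Dict[str, Key]:
    """Return the L2/L3/L4 lookup keys of an untagged Ethernet II frame (sketch)."""
    keys: Dict[str, Key] = {"l2": frame[0:6].hex(":")}  # destination MAC
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != 0x0800:           # not IPv4: only the L2 key applies
        return keys
    ip = frame[14:]
    ihl = (ip[0] & 0x0F) * 4          # IP header length in bytes
    protocol = ip[9]                  # protocol field of the IP header
    keys["l3"] = ".".join(str(b) for b in ip[16:20])  # destination IP address
    if protocol in (6, 17):           # TCP or UDP: ports follow the IP header
        _src_port, dst_port = struct.unpack("!HH", ip[ihl:ihl + 4])
        keys["l4"] = (protocol, dst_port)
    return keys


frame = bytes.fromhex(
    "001122334455" "66778899aabb" "0800"        # DA, SA, Type
    "4500002800000000400600000a000001c0a8010a"  # IPv4 header, protocol 6 (TCP)
    "30390050" + "00" * 16                      # TCP ports 12345 -> 80
)
print(switching_keys(frame))
```

The deeper the switch looks into the packet, the more specific the key becomes, which is exactly the trade-off the section describes.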

Comparison Between the Switch and Other Network Communication Products

Switch and the Switch Hub

A switching hub can provide terminals with dedicated bandwidth, automatically create and maintain a station table, and create switching paths between input and output ports according to the station table.

The switch evolved from the switching hub. In addition to the preceding functions, it provides the functions required by current networks: traffic priority, class of service, virtual networks, RMON, automatic flow control, and an embedded network management agent. These functions make it possible to build high-speed, flexible, intelligent, reliable, and scalable networks. With high-speed data transmission and good QoS, the switch extends data networks into new fields such as multimedia applications and real-time data transmission.



Switch and Router

The traditional switch evolved from the bridge and works at L2 of the OSI model. It addresses frames by MAC address and selects forwarding paths through the station table, which the switch creates and maintains automatically. The router works at L3 of the OSI model. It addresses packets by IP address and selects routes according to the routing table, which is generated by routing protocols. The advantage of the switch is speed: it only needs to identify the MAC address of a frame and select the output port accordingly. The algorithm is simple and easy to implement in an ASIC, so forwarding is fast, and line-rate forwarding can be achieved on 100 Mbit/s and Gbit/s links. The working mechanism of the switch also brings problems such as loops, load concentration, and broadcast storms; these problems have been solved as the technology has developed.

With the emergence of L3 switches, switches have become increasingly important. Compared with a traditional router, an L3 switch has the following advantages:

Subnet bandwidth is not limited by a single interface: on a router, each interface connects one subnet, so the transmission rate of a subnet through the router is limited by the bandwidth of that interface. An L3 switch is different. It can assign multiple ports to one virtual network, and the virtual port formed by these ports serves as the interface of the virtual network. Traffic in the virtual network reaches the L3 switch through any of its member ports. Because the number of member ports can be specified, the bandwidth between subnets is not restricted.

Information resources are allocated reasonably: accessing resources inside a subnet and accessing resources across the global network run at the same rate, so a separate server for each subnet is unnecessary. Server clusters can be deployed for the global network to save cost and allocate information resources reasonably.

Cost is reduced: in conventional network design, subnets are built from switches and interconnected by routers. In current network design, an L3 switch is used instead. It can divide the network into any number of virtual networks and provide inter-subnet communication through its own L3 routing function, saving the cost of expensive routers.

Connections between switches are flexible: at L2, loops are not allowed between switches, whereas a router can use multiple paths to improve reliability and balance load. An L3 switch uses the spanning tree algorithm to block ports that would cause loops, but when it selects routes, the blocked paths remain available as options.

A router is more powerful in function than a switch, but its forwarding rate is lower and its price higher.

L3 switches are widely used because they combine the line-rate forwarding capability of a switch with the good control functions of a router.



VLAN Technology

This chapter describes the VLAN technology and its application.

Main contents:

Overview and principle

VLAN division

Typical application

Overview and Principle

This section describes the VLAN concept and principle.

Main contents:

Overview

VLAN principle

Overview

In Ethernet networks with a large number of hosts, problems such as serious collisions, broadcast flooding, and performance degradation may occur. VLAN technology was introduced to solve these problems. Each VLAN is a broadcast domain. Hosts in the same VLAN can communicate with each other, but hosts in different VLANs cannot communicate directly at L2. As a result, broadcast packets are confined to one VLAN.

A VLAN divides a physical network into logical networks. VLAN division is not restricted by physical location: hosts in different locations can belong to the same VLAN. VLANs limit broadcast domains: L2 unicast, broadcast, and multicast frames are forwarded and propagated only within their own VLAN and cannot enter other VLANs. L2 packets in different VLANs are isolated, so users in different VLANs cannot communicate with each other at L2.

VLAN Principle

To identify packets of different VLANs, a VLAN tag is added to the packets. The encapsulation format of VLAN packets complies with IEEE 802.1Q, as shown in the following figure.

DA: destination MAC address; SA: source MAC address; Type: protocol type of the packet. IEEE 802.1Q specifies that a four-byte VLAN tag is inserted after the destination and source MAC addresses to identify the VLAN. The VLAN tag contains four fields: Tag Protocol Identifier (TPID), Priority, Canonical Format Indicator (CFI), and VLAN ID.

TPID: identifies the frame as VLAN-tagged. It is 16 bits long, and its value is 0x8100.

Priority: indicates the 802.1p priority of the packet. It is 3 bits long.

CFI: indicates whether MAC addresses are encapsulated in the standard (canonical) format across different transmission media. It is 1 bit long. The value 0 indicates that MAC addresses are encapsulated in the standard format; the value 1 indicates a non-standard format. The default value is 0.

VLAN ID: identifies the VLAN of the packet. It is 12 bits long, giving a range of 0-4095. Because 0 and 4095 are reserved by the protocol, the usable VLAN ID range is 1-4094.
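The tag layout above can be checked with a short Python sketch that packs and unpacks the four fields. It covers only the single-tag format described here (TPID 0x8100); the functions build_tag, parse_tag, and add_tag are hypothetical names.

```python
import struct
from typing import Tuple

TPID_8021Q = 0x8100


def build_tag(priority: int, cfi: int, vlan_id: int) -> bytes:
    """Build the 4-byte tag: TPID (16 bits) + Priority (3) + CFI (1) + VLAN ID (12)."""
    if not 0 <= priority <= 7:
        raise ValueError("priority is a 3-bit field (0-7)")
    if cfi not in (0, 1):
        raise ValueError("CFI is a 1-bit field")
    if not 1 <= vlan_id <= 4094:
        raise ValueError("usable VLAN IDs are 1-4094; 0 and 4095 are reserved")
    tci = (priority << 13) | (cfi << 12) | vlan_id
    return struct.pack("!HH", TPID_8021Q, tci)


def parse_tag(tag: bytes) -> Tuple[int, int, int]:
    """Return (priority, cfi, vlan_id) from a 4-byte 802.1Q tag."""
    tpid, tci = struct.unpack("!HH", tag)
    if tpid != TPID_8021Q:
        raise ValueError("not an 802.1Q tag")
    return tci >> 13, (tci >> 12) & 0x1, tci & 0x0FFF


def add_tag(frame: bytes, priority: int, vlan_id: int) -> bytes:
    """Insert the tag after DA (6 bytes) and SA (6 bytes), before the Type field."""
    return frame[:12] + build_tag(priority, 0, vlan_id) + frame[12:]


print(build_tag(5, 0, 100).hex())  # 8100a064
```

Priority 5 occupies the top three bits of the second 16-bit word (0xA000), and VLAN 100 (0x064) fills the low twelve bits.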

VLAN Division

VLANs can be divided in several ways. The common types are as follows:

Port-based VLAN

MAC-based VLAN

IP subnet-based VLAN

Protocol-based VLAN

By default, the priority of the four VLAN types, from high to low, is: MAC-based VLAN, IP subnet-based VLAN, protocol-based VLAN, and port-based VLAN. On a given port, the VLAN division methods take effect in order of priority, and only one division result takes effect for a packet.

Port-Based VLAN

In a port-based VLAN, a port is added to the VLAN as a member of the VLAN. The port can then forward packets of that VLAN.

Port Types

Ports can be classified into three types according to how they process packet tags.

Access:

The port belongs to only one VLAN, and its default VLAN ID is the same as the VLAN it belongs to. An Access port is normally connected to user devices. The default port type is Access.

Trunk:

The port can belong to multiple VLANs and can receive and send packets of multiple VLANs. Only packets of the default VLAN are allowed without a tag. A Trunk port is used to interconnect network devices.

Hybrid:

The port can belong to multiple VLANs and can receive and send packets of multiple VLANs. Packets of multiple VLANs are allowed without a tag. A Hybrid port can connect both user devices and network devices.



Default VLAN of the Port

Untagged packets received on a port are assigned to the default VLAN of the port. The default VLAN of a port is VLAN 1, and the user can change it as required.

The default VLAN of an Access port is the VLAN the port belongs to and cannot be configured separately.

Trunk and Hybrid ports can belong to multiple VLANs, and their default VLAN can be configured.

MAC-based VLAN

A MAC-based VLAN assigns VLAN IDs to packets according to the source MAC address of the received packets.

Untagged packets received on the port are processed as follows:

1. If the source MAC address matches a MAC address configured for a MAC-based VLAN, and the ingress port belongs to the VLAN with the corresponding VLAN ID, the packet is assigned to the VLAN ID of that MAC VLAN entry.

2. If the source MAC address does not match any configured MAC VLAN entry, the packet is assigned to the default VLAN ID of the port.

IP subnet-based VLAN

An IP subnet-based VLAN assigns VLAN IDs to packets according to the source IP address of the received packets.

Untagged packets received on the port are processed as follows:

1. If the source IP address is in a network segment configured for an IP subnet-based VLAN, and the ingress port belongs to the VLAN with the corresponding VLAN ID, the packet is assigned to the VLAN ID of that network segment.

2. If the source IP address is not in any configured network segment, the packet is assigned to the default VLAN ID of the port.



Protocol-based VLAN

A protocol-based VLAN assigns VLAN IDs to packets according to the encapsulation format and protocol type of the received packets.

A protocol VLAN is defined by protocol templates. A protocol template consists of a frame encapsulation format and a protocol type, and one port can be configured with multiple protocol templates. When protocol VLAN is enabled on a port and the port is configured with protocol templates, untagged packets received on the port are processed as follows:

1. If the packet matches a protocol template, and the ingress port belongs to the VLAN with the corresponding VLAN ID, the packet is assigned to the VLAN ID associated with that protocol template on the port.

2. If the packet matches no protocol template, it is assigned to the default VLAN ID of the port.
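The default precedence (MAC, then IP subnet, then protocol, then port) and the per-method rules above can be combined into one Python sketch for untagged packets. Two points are assumptions because the text does not spell them out. First, a match whose VLAN the ingress port does not belong to is treated as no match, so classification falls through to the next method. Second, a protocol template is modeled as a hypothetical (encapsulation, protocol type) pair. All names are illustrative.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple

Template = Tuple[str, int]  # hypothetical (encapsulation format, protocol type)


@dataclass
class PortVlanConfig:
    pvid: int                  # default VLAN of the port
    member_vlans: Set[int]     # VLANs the port belongs to
    mac_vlans: Dict[str, int] = field(default_factory=dict)
    subnet_vlans: Dict[ipaddress.IPv4Network, int] = field(default_factory=dict)
    protocol_vlans: Dict[Template, int] = field(default_factory=dict)


def classify_untagged(port: PortVlanConfig, src_mac: str,
                      src_ip: Optional[str] = None,
                      template: Optional[Template] = None) -> int:
    """Assign a VLAN ID to an untagged packet, highest-priority method first."""
    candidates = [port.mac_vlans.get(src_mac)]                  # 1. MAC-based
    if src_ip is not None:                                      # 2. IP subnet-based
        addr = ipaddress.ip_address(src_ip)
        candidates.append(next((vid for net, vid in port.subnet_vlans.items()
                                if addr in net), None))
    if template is not None:                                    # 3. protocol-based
        candidates.append(port.protocol_vlans.get(template))
    for vid in candidates:
        # Assumption: a match only counts if the ingress port belongs to that VLAN.
        if vid is not None and vid in port.member_vlans:
            return vid
    return port.pvid                                            # 4. port default VLAN


port = PortVlanConfig(
    pvid=1,
    member_vlans={1, 10, 20, 30},
    mac_vlans={"0000.0c00.0001": 10},
    subnet_vlans={ipaddress.ip_network("192.168.20.0/24"): 20},
    protocol_vlans={("ethernetii", 0x0800): 30},
)
print(classify_untagged(port, "0000.0c00.0001", "192.168.20.5", ("ethernetii", 0x0800)))  # 10
```

Because the candidates are checked in a fixed order, only one division result ever takes effect for a packet, as the text requires.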

Typical Application

In an enterprise, members of the same department must be able to communicate with each other even when they are located in different places, while different departments must not be able to communicate with each other. The networking diagram is as follows:

Figure: VLAN networking diagram (two locations, each with Section A in VLAN 10 and Section B in VLAN 20)

For the detailed configuration of VLAN, see chapter 4 VLAN Configuration.



Link Aggregation

This chapter describes the link aggregation technology and its application.

Main contents:

Link aggregation

Classification of link aggregation

Typical application

Link Aggregation

This section describes the concept of link aggregation.

Main contents:

Terms of the link aggregation

Functions of the link aggregation

LACP protocol

Terms of the Link Aggregation

Link aggregation: multiple physical links are bound together to form one logical link, which increases the link bandwidth. The member links of the aggregation also back each other up dynamically, providing higher reliability.

LAC: Link Aggregation Control.

LACP: Link Aggregation Control Protocol, defined in IEEE 802.3ad.

LACPDU: Link Aggregation Control Protocol Data Unit.

LAG: Link Aggregation Group.

LAG ID: Link Aggregation Group Identifier.

Key: a 16-bit integer that describes the aggregation capability of a port. It is determined by the rate, duplex mode, and administrative key (aggregation group ID) of the port.

Administrative Key: the key set by the administrator.

Operational Key: the key that reflects the actual aggregation capability of the port.

Functions of the Link Aggregation

A link aggregation is an aggregation group composed of multiple ports. Upper-layer entities that use the link aggregation service treat the multiple physical links in the same aggregation group as one logical link. Link aggregation shares the incoming and outgoing load among the member ports to increase link bandwidth. At the same time, the member ports of the aggregation group back each other up dynamically, providing higher reliability.

LACP Protocol

LACP, based on IEEE 802.3ad, is a protocol for dynamic link aggregation. LACP communicates with the peer through Link Aggregation Control Protocol Data Units (LACPDUs).

After LACP is enabled on a port, the port advertises its system priority, system MAC address, port priority, port number, and operational key to the peer by sending LACPDUs. On receiving this information, the peer compares it with the information saved for its other ports to select the ports that can be aggregated. In this way, the two ends agree on which ports join or leave a dynamic aggregation group.

The operational key is a configuration combination that LACP generates from the port configuration (rate, duplex mode, and administrative key).



Classification of Link Aggregation

Link aggregation can be classified into two types according to the aggregation mode:

1. Manual aggregation

2. LACP protocol aggregation

Manual Aggregation

1. Overview

A manual aggregation group is configured manually by the user. LACP is disabled on the ports of a manual aggregation group.

2. Port status in the manual aggregation group

In a manual aggregation group, a port can be in the Selected or Unselected state. Only Selected ports can receive and send user service packets; Unselected ports cannot.

The system sets the port status (Selected or Unselected) according to the following principles:

If any port in the aggregation group is Up, the Up port with the highest priority is selected as the root port of the group.

Up ports with the same operational key as the root port become candidates for the Selected state. All other ports are Unselected.

The number of Selected ports in a manual aggregation group is limited. If the number of candidate ports does not exceed the limit, all candidate ports are Selected and the other ports are Unselected. If the number of candidate ports exceeds the limit, the system keeps candidate ports Selected in ascending order of port number up to the limit, and the ports with larger port numbers become Unselected.

3. Configuration requirements for the manual aggregation

In a manual aggregation group, only ports with the same configuration as the root port can become Selected. The configuration covers the rate, duplex mode, and up/down status. Users need to keep the basic configuration of all ports consistent through manual configuration.

When the configuration of a port in an aggregation group changes, the system does not re-create the aggregation; instead, it resets the Selected/Unselected status of each port and re-selects the root port.
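The manual-aggregation selection principles can be sketched in Python. The sketch assumes that a smaller priority value means a higher priority (the same convention the LACP comparison below uses) and that the port number breaks priority ties. The class MemberPort, the function select_manual, and the example key values are hypothetical, and the operational key is passed in rather than derived from rate and duplex.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class MemberPort:
    number: int
    priority: int   # assumption: a smaller value means a higher priority
    up: bool
    oper_key: int   # stands in for rate, duplex, and administrative key


def select_manual(members: List[MemberPort],
                  max_selected: int) -> Tuple[Optional[int], Set[int]]:
    """Return (root port number, Selected port numbers) for a manual group."""
    up_ports = [m for m in members if m.up]
    if not up_ports:
        return None, set()                      # no Up port: nothing is Selected
    # Root port: the Up port with the highest priority (port number breaks ties).
    root = min(up_ports, key=lambda m: (m.priority, m.number))
    # Candidates: Up ports whose operational key matches the root port's.
    candidates = sorted((m for m in up_ports if m.oper_key == root.oper_key),
                        key=lambda m: m.number)
    # Over the limit: keep the smallest port numbers; the rest are Unselected.
    return root.number, {m.number for m in candidates[:max_selected]}


group = [
    MemberPort(1, 32768, True, 100),
    MemberPort(2, 32768, True, 100),
    MemberPort(3, 32768, False, 100),  # Down
    MemberPort(4, 32768, True, 200),   # different operational key
    MemberPort(5, 32768, True, 100),
]
print(select_manual(group, max_selected=2))  # (1, {1, 2})
```

With a limit of two, port 5 is a valid candidate but loses to the smaller port numbers 1 and 2.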

LACP Protocol Aggregation

1. Overview

An LACP aggregation group is also configured manually by the user. When a port joins an LACP aggregation group, LACP is automatically enabled on the port.

2. Port status in the LACP aggregation group

In an LACP aggregation group, a port can be in the Selected or Unselected state.

Both Selected and Unselected ports that are Up can receive and send LACP packets.

Only Selected ports can receive and send user service packets; Unselected ports cannot.

The system sets the port status (Selected or Unselected) according to the following principles:

The local and peer systems negotiate, and the status of the ports at both ends is determined by the port IDs at the end whose device ID has priority. The negotiation procedure is as follows:

Compare the device IDs of the two ends (device ID = system priority + system MAC address). First compare the system priorities; if they are the same, compare the system MAC addresses. The end with the smaller device ID has priority (a smaller system priority value, and then a smaller system MAC address, gives a smaller device ID).

At the end with the prior device ID, compare the port IDs (port ID = port priority + port number). First compare the port priorities; if they are the same, compare the port numbers. The port with the smallest port ID becomes the root port of the aggregation group (a smaller port priority value, and then a smaller port number, gives a smaller port ID).

A port becomes a candidate for the Selected state when it has the same operational key as the root port, it is Up, and its peer port has the same configuration as the peer port of the root port. Otherwise, the port is Unselected.

The number of Selected ports in an LACP aggregation group is limited. If the number of candidate ports does not exceed the limit, all candidate ports are Selected and the other ports are Unselected. If the number of candidate ports exceeds the limit, the system keeps ports Selected in ascending order of port ID up to the limit, and the ports with larger port IDs become Unselected. The peer device detects the status change and changes the status of its corresponding ports accordingly.

3. Configuration requirements for the LACP aggregation

In an LACP aggregation group, only ports with the same configuration as the root port can become Selected. The configuration covers the rate, duplex mode, and up/down status. Users need to keep the basic configuration of all ports consistent through manual configuration.

When the configuration of a port in an aggregation group changes, the system does not re-create the aggregation; instead, it resets the Selected/Unselected status of each port and re-selects the root port.

The following figure illustrates LACP aggregation. Device S has a higher priority than device T. The member ports of aggregation group 1 are A, B, C, D, E, F, and G. Port F is Down. The rate of port E is 10M and the rate of the other ports is 100M. One aggregation group supports at most three Selected ports.

LACP aggregation

1. Port A has the highest priority and is set to Selected first. Therefore, port A is the root port of aggregation group 1.

2. The peer of port G belongs to aggregation group 8, which differs from the aggregation group of the port connected to root port A. Therefore, port G is set to Unselected.

3. The link of port F is Down, so port F is set to Unselected.

4. The rate of port E differs from that of root port A, so port E is set to Unselected.

5. The rate and duplex mode of port D are the same as those of root port A, but its priority is lower than those of ports B and C, and the group can hold only three Selected ports. Therefore, port D is set to Unselected.

As a result, of the seven member ports of aggregation group 1, only ports A, B, and C are Selected. These ports are actually aggregated and written into the TRUNK_BITMAP table. The spanning tree status of ports D, E, F, and G is set to Blocking/Disabled.
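The LACP negotiation and selection steps can be sketched in Python and checked against the ports A-G example above. The sketch makes three assumptions. Priorities compare numerically, with smaller values preferred, as stated above. "Same configuration as the opposite root port" is approximated by comparing the partner's operational key, so port G's peer in aggregation group 8 is modeled as a different partner key. The caller has already decided which end has the prior device ID. All names and key values are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


def mac_to_int(mac: str) -> int:
    """Convert xxxx.xxxx.xxxx (or colon/hyphen separated) MAC text to an integer."""
    return int(mac.replace(".", "").replace(":", "").replace("-", ""), 16)


def local_is_prior(local_sys_prio: int, local_mac: str,
                   peer_sys_prio: int, peer_mac: str) -> bool:
    """Device ID = system priority + system MAC; the smaller ID has priority."""
    return (local_sys_prio, mac_to_int(local_mac)) < (peer_sys_prio, mac_to_int(peer_mac))


@dataclass
class LacpPort:
    name: str
    priority: int      # port priority (smaller value = higher priority)
    number: int        # port number
    up: bool
    oper_key: int      # local operational key
    partner_key: int   # key reported for the opposite port (simplified model)

    @property
    def port_id(self) -> Tuple[int, int]:
        return (self.priority, self.number)  # port ID = port priority + port number


def lacp_select(ports: List[LacpPort],
                max_selected: int) -> Tuple[Optional[str], Set[str]]:
    """Select ports using the port IDs of the end whose device ID has priority."""
    up_ports = [p for p in ports if p.up]
    if not up_ports:
        return None, set()
    root = min(up_ports, key=lambda p: p.port_id)  # smallest port ID
    candidates = sorted(
        (p for p in up_ports
         if p.oper_key == root.oper_key and p.partner_key == root.partner_key),
        key=lambda p: p.port_id)
    return root.name, {p.name for p in candidates[:max_selected]}


# Ports of device S in the example (S is assumed to have the prior device ID).
ports = [
    LacpPort("A", 32768, 1, True, 1, 1),
    LacpPort("B", 32768, 2, True, 1, 1),
    LacpPort("C", 32768, 3, True, 1, 1),
    LacpPort("D", 32768, 4, True, 1, 1),
    LacpPort("E", 32768, 5, True, 2, 1),   # 10M: different operational key
    LacpPort("F", 32768, 6, False, 1, 1),  # link Down
    LacpPort("G", 32768, 7, True, 1, 8),   # peer port is in aggregation group 8
]
root, selected = lacp_select(ports, max_selected=3)
print(root, sorted(selected))  # A ['A', 'B', 'C']
```

Port D passes every compatibility check and is excluded only by the three-port limit, which matches step 5 of the example.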

Typical Application

Networking diagram of link aggregation



As shown in the preceding figure, ports 0/0/1-0/0/3 of switch A and switch B are connected through 10/100/1000M links. The same configuration is used at both ends: the three ports on each device are added to the aggregation group of that device.

The LACP aggregation mode is used. The configuration procedure is as follows:

Switch A:

link-aggregation 1 mode lacp    Create LACP aggregation group 1.

port 0/0/1-0/0/3                Enter port mode for ports 0/0/1 to 0/0/3.

link-aggregation 1 active       Add the ports to aggregation group 1 in the active state.

Switch B:

link-aggregation 1 mode lacp    Create LACP aggregation group 1.

port 0/0/1-0/0/3                Enter port mode for ports 0/0/1 to 0/0/3.

link-aggregation 1 active       Add the ports to aggregation group 1 in the active state.

With the preceding configuration, an aggregated link is created between the two switches. For detailed configuration commands, see the chapter Configuring Link Aggregation.



MSTP

In an L2 switching network, a physical loop can cause packets to circulate and multiply, generating a broadcast storm. As a result, all available bandwidth is consumed and the network becomes unavailable. STP was developed to solve this problem. STP is an L2 management protocol that selectively blocks redundant links to eliminate L2 loops in the network, while keeping those links available as backups.

Like other protocols, STP has evolved rapidly. IEEE 802.1D STP was widely used first; IEEE 802.1w RSTP and IEEE 802.1s MSTP were later developed on this basis.

This chapter describes the STP family of protocols, focusing on MSTP.

Main contents:

STP

RSTP

MSTP protocol

MSTP protection function

MSTP typical application

STP Overview
The basic idea of STP is simple: trees in nature contain no loops, so a network that grows like a tree will not contain loops either. STP defines the Root Bridge, Root Port, Designated Port, and Path Cost. Using these concepts, it builds a tree that prunes redundant loops while keeping backup links and optimizing paths. The algorithm that constructs the tree is the Spanning Tree Algorithm.



STP bridges exchange BPDUs. First, the root bridge is elected based on the bridge ID, which consists of the bridge priority and the MAC address; the bridge with the smallest ID becomes the root bridge of the network. All ports of the root bridge connect to downstream bridges, so all of them become designated ports. Next, each downstream bridge connected to the root bridge selects its best path toward the root bridge, and the corresponding port becomes its root port. This process is repeated out to the edge of the network. Once the designated ports and root ports are determined, the tree is complete. After 30 seconds (the default), the designated ports and root ports enter the Forwarding state, and all other ports enter the Blocking state. Each bridge periodically sends STP BPDUs from its designated ports to maintain the link state. If the network topology changes, the spanning tree is recalculated and the port states change accordingly. This is the basic principle of the spanning tree.
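The election order described above can be illustrated with a short Python sketch. It is a simplified model, not the switch's implementation: the `BridgeId` class, the candidate tuple layout, and all values are hypothetical. The comparison order (bridge priority, then MAC address; for the root port, lowest root path cost first, then the sending bridge's ID, then port ID) follows IEEE 802.1D in simplified form.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BridgeId:
    """Hypothetical bridge ID: priority plus MAC address (smaller is better)."""
    priority: int
    mac: str  # e.g. "00:01:7a:00:00:01"

    def key(self):
        # Compare the priority first, then the MAC address as a 48-bit number.
        return (self.priority, int(self.mac.replace(":", ""), 16))


def elect_root(bridges):
    """The bridge with the smallest bridge ID becomes the root bridge."""
    return min(bridges, key=lambda b: b.key())


def select_root_port(candidates):
    """candidates: (root_path_cost, sending_bridge_id, port_id) per local port.

    Lowest root path cost wins; ties are broken by the sending bridge ID,
    then by the port ID (a simplification of the full 802.1D tie-breakers).
    """
    return min(candidates, key=lambda c: (c[0], c[1].key(), c[2]))
```

For example, among bridges with priorities 32768, 4096, and 32768, the one with priority 4096 is elected regardless of its MAC address; the MAC address only decides between equal priorities.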

As STP was more widely deployed and network technology developed, its shortcomings became apparent, chiefly its slow convergence. When the topology changes, the new configuration messages take a certain time, called the Forward Delay, to propagate through the entire network. The default Forward Delay is 15 seconds. If, after all bridges have received the change, a port that was forwarding in the old topology has not yet learned that it must stop forwarding in the new topology, a temporary loop may form. STP solves this with a timer policy: it inserts a Learning state between the Blocking and Forwarding states. In this state a port only learns MAC addresses and forwards no packets, and each state transition takes one Forward Delay. As a result, no loop forms when the topology changes, but convergence takes twice the Forward Delay. This delay is unacceptable for some real-time services, such as audio and video services.
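The "double Forward Delay" figure can be checked with a small schedule. This sketch models the IEEE 802.1D sequence Listening, Learning, Forwarding, in which each intermediate state lasts one Forward Delay; the text above mentions only the Learning state, so treat the extra Listening step as an assumption drawn from IEEE 802.1D. The function name is hypothetical.

```python
def stp_transition_schedule(forward_delay=15):
    """Return (state, seconds_after_start) for a port that must move to Forwarding.

    Each intermediate state lasts one Forward Delay (default 15 s), so the
    port forwards only after 2 * forward_delay seconds.
    """
    schedule = [("listening", 0)]
    for state in ("learning", "forwarding"):
        schedule.append((state, schedule[-1][1] + forward_delay))
    return schedule
```

With the default timer, `stp_transition_schedule()` ends with `("forwarding", 30)`, which matches the 30-second default given for STP convergence.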

RSTP Overview
To address the slow convergence of STP, IEEE defined RSTP in IEEE 802.1w in 2001. RSTP improves on STP in the following three ways, speeding up convergence to within one second at best:

1. An Alternate Port and a Backup Port are defined for the root port and the designated port respectively. When the root port fails, the alternate port becomes the new root port and enters the Forwarding state without delay. When the designated port fails, the backup port becomes the new designated port and enters the Forwarding state without delay.

2. On a point-to-point link between two switch ports, the designated port can enter the Forwarding state without delay after a handshake with the downstream bridge. On a shared link connecting more than two bridges, the downstream bridges do not respond to handshake requests from the upstream designated port, so the designated port must wait twice the Forward Delay before entering the Forwarding state.

3. A port connected to terminals rather than to other bridges is defined as an Edge Port. An edge port can enter the Forwarding state without delay.
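A toy model of item 1 (root port failover) is sketched below, assuming the bridge keeps its alternate ports ordered best-first. The class and port names are hypothetical, and real RSTP also runs topology-change handling that is omitted here.

```python
class RstpPortRoles:
    """Hypothetical model of RSTP root-port failover on one bridge."""

    def __init__(self, root_port, alternates):
        self.root_port = root_port
        self.alternates = list(alternates)  # best candidate first
        self.forwarding = {root_port}       # alternate ports stay discarding

    def on_root_port_failure(self):
        self.forwarding.discard(self.root_port)
        if not self.alternates:
            self.root_port = None           # no remaining path to the root
            return None
        # The alternate port takes over at once, without a Forward Delay wait.
        self.root_port = self.alternates.pop(0)
        self.forwarding.add(self.root_port)
        return self.root_port
```

The point of the model is the contrast with STP: the promoted port goes straight into `forwarding` instead of spending two Forward Delays in intermediate states.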

RSTP is a significant improvement over STP and is backward compatible with it, so the two can coexist in a hybrid network. However, RSTP and STP are both Single Spanning Tree (SST) protocols, which have the following drawbacks:

1. The entire switching network has only one spanning tree, so convergence takes a long time in a large network.

2. Because RSTP is a single spanning tree protocol, all VLANs share one spanning tree. For traffic within a VLAN to flow normally, every VLAN in the network must be laid out along the spanning tree paths. Otherwise, some VLANs are partitioned because their internal links are blocked, and communication within those VLANs fails.

3. A blocked link carries no traffic, so load cannot be balanced and bandwidth is wasted.

A single spanning tree cannot overcome these drawbacks, which led to MSTP, a VLAN-aware spanning tree protocol.

MSTP Protocol

Terms
Multiple Spanning Tree Region (MST Region)
An MST region consists of multiple MSTP-enabled devices and the network segments between them. All devices in the region have the same region name, the same revision level, and the same VLAN-to-spanning-tree mapping configuration.

VLAN Mapping Table

The VLAN mapping table is an attribute of an MST region. It describes which spanning tree instance each VLAN is mapped to. For example, VLAN 1 may be mapped to spanning tree instance 1, VLAN 2 to instance 2, and all other VLANs to the CIST.
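The region definition and the example mapping table can be expressed as data. This is an illustrative sketch: the class, the region name, and the revision values are hypothetical, and a real MSTP implementation compares a digest of the mapping table carried in BPDUs rather than the table itself.

```python
from dataclasses import dataclass

CIST = 0  # instance 0: VLANs not listed in the mapping table belong to the CIST


@dataclass(frozen=True)
class MstRegionConfig:
    """Hypothetical model of the three attributes that define an MST region."""
    name: str
    revision: int
    mapping: frozenset  # (vlan, instance) pairs for VLANs not on the CIST

    @classmethod
    def build(cls, name, revision, vlan_to_instance):
        # Drop explicit instance-0 entries so equivalent tables compare equal.
        pairs = frozenset((v, i) for v, i in vlan_to_instance.items() if i != CIST)
        return cls(name, revision, pairs)

    def instance_for(self, vlan):
        """Look up the spanning tree instance that carries a VLAN."""
        return dict(self.mapping).get(vlan, CIST)


def same_region(a, b):
    """Bridges share a region only if name, revision, and mapping all match."""
    return (a.name, a.revision, a.mapping) == (b.name, b.revision, b.mapping)
```

With the example mapping (VLAN 1 to instance 1, VLAN 2 to instance 2), any other VLAN such as VLAN 100 resolves to instance 0, the CIST.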

Internal Spanning Tree

The IST is the spanning tree inside an MST region; it is instance 0 of the region. Together with the CST, the ISTs form the CIST of the entire switching network.

Common Spanning Tree

The CST is the single spanning tree that connects all MST regions in the switching network. If each MST region is treated as a single device, the CST is the spanning tree that MSTP computes among these devices.

Common and Internal Spanning Tree

The CIST consists of the ISTs and the CST. It is a single spanning tree connecting all devices in the switching network.

Multiple Spanning Tree Instance

Multiple spanning trees can be generated in an MST region, each independent of the others. Each of these spanning trees is called an MSTI.

Introduction to the Protocol
MSTP is a spanning tree protocol defined in IEEE 802.1s. Compared with STP and RSTP, it has clear advantages. Its main features are as follows:

1. MSTP introduces the concept of regions. A switching network can be divided into multiple regions, and multiple independent spanning trees are generated within each region. Between regions, MSTP uses the CIST to ensure that the global topology is loop-free.

2. MSTP introduces the concept of instances. Multiple VLANs are mapped to one instance, which reduces communication overhead and resource usage. Each MSTP instance (each instance corresponds to one spanning tree) is calculated independently, and VLAN traffic load can be shared among the instances.

3. MSTP provides fast port state transitions similar to RSTP.

4. MSTP is compatible with STP and RSTP.

MSTP uses the VLAN mapping table to associate VLANs with spanning trees, and it divides a switching network into multiple regions, each containing multiple independent spanning trees. MSTP prunes a looped network into a loop-free tree, preventing packets from multiplying and circulating indefinitely. At the same time, it provides multiple redundant paths for data forwarding and balances VLAN traffic across them.

For example, the following network contains four bridges (A, B, C, and D) and VLANs 10, 20, 30, 40, 50, and 60. All four bridges run MSTP. Bridges B, C, and D are in the same MST region, and bridge A can be regarded as a separate region. On bridges B, C, and D, VLAN 10 and VLAN 20 are mapped to instance 1, VLAN 30 and VLAN 40 to instance 2, and VLAN 50 and VLAN 60 to instance 0.

The CIST connections are shown as blue links in the following figure. Frames of VLAN 50 and 60 are forwarded along these active links. Bridge A is the common root of the entire CIST, bridge B is the CIST regional root, and port 1 of bridge B is the root port of the CIST regional root.

Figure 5-1 CIST topology

The connections of instance 1 are shown as red links in the following figure. Frames of VLAN 10 and 20 are forwarded along these active links. Bridge C is the regional root of instance 1, and port 1 of bridge B is the master port.



Figure 5-2 Instance 1 topology

The connections of instance 2 are shown as red links in the following figure. Frames of VLAN 30 and 40 are forwarded along these active links. Bridge D is the regional root of instance 2, and port 1 of bridge B is the master port.

Figure 5-3 Instance 2 topology

MSTP Protection Function

BPDU Protection
On an access-layer device, access ports are usually connected to user terminals or file servers. These ports are configured as edge ports so that they can transition quickly. When such a port receives a BPDU, the system automatically changes it to a non-edge port, the spanning tree is recalculated, and the network topology changes.



Normally, these ports never receive BPDUs. If an attacker sends forged BPDUs to the device, the network may oscillate.

MSTP provides the BPDU Guard function to prevent this attack. When BPDU protection is enabled, a port whose AdminEdge is TRUE is shut down if it receives a BPDU, and a log message is generated to notify the user. A port that has been shut down can be restored manually by the network administrator or automatically by the port management module.
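The BPDU Guard decision reduces to a check on the port's AdminEdge setting. The sketch below is a hypothetical model (class and attribute names are illustrative, not the device's CLI or internals); it also shows the unguarded behavior described at the start of this section, where the port simply loses its edge status.

```python
import logging


class EdgePort:
    """Hypothetical access port with an optional BPDU Guard check."""

    def __init__(self, name, admin_edge, bpdu_guard):
        self.name = name
        self.admin_edge = admin_edge  # configured (AdminEdge) edge setting
        self.oper_edge = admin_edge   # current operational edge status
        self.bpdu_guard = bpdu_guard  # whether BPDU protection is enabled
        self.shutdown = False

    def on_bpdu(self):
        if self.bpdu_guard and self.admin_edge:
            self.shutdown = True
            logging.warning("BPDU received on edge port %s; port shut down", self.name)
            return "shutdown"
        # Without BPDU Guard the port stops being an edge port and the
        # spanning tree is recalculated.
        self.oper_edge = False
        return "recalculate"

    def restore(self):
        """Recovery by the administrator or the port management module."""
        self.shutdown = False
```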

Root Protection
The root bridge and backup root bridge of a spanning tree should be in the same region; this is especially important for the CIST root bridge and its backup. In network design, the CIST root bridge and backup root bridge are usually placed in a high-bandwidth core region. However, due to misconfiguration or malicious attacks, the legitimate root bridge may receive a BPDU with a higher priority. The legitimate root bridge then loses its root role and the network topology changes. Such an illegitimate change may divert traffic from high-speed links to low-speed links and congest the network.

A port with Root Guard enabled can only act as a designated port in all instances. If the port receives a BPDU with a higher priority for an instance, the port is blocked. Once it no longer receives configuration information with a higher priority, the port is restored to its original state.
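A minimal model of Root Guard for one instance is sketched below. As an assumption, each BPDU's priority vector is reduced to a comparable tuple (the advertised root bridge priority and MAC address), where a smaller tuple means higher priority; the class and method names are hypothetical.

```python
class RootGuardPort:
    """Hypothetical Root Guard state for one port in one instance."""

    def __init__(self, own_root_vector):
        self.own_root_vector = own_root_vector  # root this bridge currently advertises
        self.role = "designated"                # Root Guard keeps the port designated
        self.blocked = False

    def on_bpdu(self, received_root_vector):
        if received_root_vector < self.own_root_vector:
            # Superior BPDU: instead of becoming a root port, the port is blocked.
            self.blocked = True
        return self.blocked

    def on_superior_info_aged_out(self):
        # No higher-priority configuration information any more: restore the port.
        self.blocked = False
```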

Loop Protection
By receiving BPDUs from upstream devices, a device maintains the state of its root port and other blocked ports. If link congestion or a unidirectional link failure prevents these ports from receiving upstream BPDUs, the spanning tree information on the port times out and the downstream device reselects port roles. The downstream port that can no longer receive BPDUs becomes a designated port, and the previously blocked port transitions to the Forwarding state. A loop then forms in the switching network.

The Loop Guard function prevents such loops. On a port configured with Loop Guard, if BPDUs from the upstream device stop arriving and the spanning tree information times out, the port is set to the Blocking state in all instances when port roles are recalculated and no longer participates in the spanning tree calculation. When the port receives BPDUs again, it rejoins the spanning tree calculation.
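The difference Loop Guard makes on a BPDU timeout can be sketched as follows. The class, its attributes, and the simplified "jump straight to forwarding" behavior of the unguarded case are illustrative assumptions; real MSTP would move the port through its normal state transitions.

```python
class LoopGuardPort:
    """Hypothetical model of a root or alternate port with optional Loop Guard."""

    def __init__(self, states, loop_guard=True):
        self.state = dict(states)  # instance ID -> "forwarding" / "blocking"
        self.loop_guard = loop_guard
        self.role = "root"
        self.in_stp = True         # whether the port takes part in the calculation

    def on_bpdu_timeout(self):
        if self.loop_guard:
            # Block the port in every instance and keep it out of the calculation.
            self.state = {i: "blocking" for i in self.state}
            self.in_stp = False
        else:
            # Without Loop Guard the port becomes designated and starts forwarding,
            # which is how the loop described above forms.
            self.role = "designated"
            self.state = {i: "forwarding" for i in self.state}

    def on_bpdu_received(self):
        # The port rejoins the spanning tree calculation; roles are recomputed there.
        self.in_stp = True
```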

MSTP Typical Application
With MSTP, packets of different VLANs in the same network can be forwarded along different spanning trees, providing load sharing and redundant backup for the traffic of different VLANs. In the following figure, Switch A and Switch B are aggregation-layer devices, and Switch C and Switch D are access-layer devices. To balance the traffic on each link, configure the devices as follows:

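All devices belong to the same MST region.

VLAN 10 packets are forwarded along instance 1; the root bridge of instance 1 is Switch A.

Packets of the VLAN mapped to instance 2 are forwarded along instance 2; the root bridge of instance 2 is Switch B.

Packets of the VLAN mapped to instance 3 are forwarded along instance 3; the root bridge of instance 3 is Switch A.

Packets of the VLAN mapped to instance 4 are forwarded along instance 4; the root bridge of instance 4 is Switch B.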

Figure 5-4 MSTP networking

After MSTP calculation, the forwarding paths of the different VLANs are as shown in Figure 5-5. The load on each link is reduced, and each VLAN has a redundant backup link. When the working link fails, the redundant link takes over immediately, reducing the traffic loss caused by the link failure.

Figure 5-5 MSTP forwarding path
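One common way to obtain the root assignment listed above is to give Switch A the lowest bridge priority in instances 1 and 3 and Switch B the lowest in instances 2 and 4. The manual does not state how the roots are configured, so the priority values and MAC addresses below are assumptions; the sketch only shows that a per-instance election (priority first, then MAC address) yields the intended roots.

```python
DEFAULT_PRIORITY = 32768  # common IEEE default bridge priority

# Hypothetical per-instance priorities and MAC addresses (not from the manual).
BRIDGES = {
    "A": {"mac": 0x00017A000001, "priority": {1: 4096, 2: 8192, 3: 4096, 4: 8192}},
    "B": {"mac": 0x00017A000002, "priority": {1: 8192, 2: 4096, 3: 8192, 4: 4096}},
    "C": {"mac": 0x00017A000003, "priority": {}},  # default priority everywhere
    "D": {"mac": 0x00017A000004, "priority": {}},
}


def instance_root(bridges, instance):
    """Elect the root of one MSTI: lowest per-instance priority, then lowest MAC."""

    def bridge_id(name):
        entry = bridges[name]
        return (entry["priority"].get(instance, DEFAULT_PRIORITY), entry["mac"])

    return min(bridges, key=bridge_id)


roots = {i: instance_root(BRIDGES, i) for i in (1, 2, 3, 4)}
```

Alternating the roots between the two aggregation switches is what spreads the instances, and therefore the VLANs, across different uplinks.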



QinQ Technology

This chapter describes QinQ technology and its applications.

Main contents:

New requirements of service development

QinQ supports multiple services

Realizing modes of QinQ

Application scene of QinQ

New Requirements of Service Development
As technology develops, users want to divide their internal networks into VLANs as they see fit to make the internal network secure and reliable. However, network providers place specific limits on the number of VLANs and the VLAN IDs that users may use. Because the VLAN ID ranges needed by different users may overlap, users are restricted in how they can divide their internal networks. As services develop, more and more VLANs are also needed to identify and separate services. A network provider has at most 4K VLAN IDs, and in practice, when there are many users, the VLAN IDs run out and can no longer meet demand.

QinQ technology was developed to address this. QinQ extends VLAN technology by using two layers of tags, which increases the number of VLANs to 4K × 4K.
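The "4K × 4K" figure follows from the 12-bit VLAN ID field in each tag. The reserved VID values 0 and 4095 (a standard 802.1Q convention not stated in this manual) reduce the usable count slightly:

```latex
\begin{aligned}
N_{\text{single}} &= 2^{12} = 4096 \quad (\text{VIDs 1 to 4094 usable}) \\
N_{\text{QinQ}} &= 2^{12} \times 2^{12} = 2^{24} = 16\,777\,216 \\
N_{\text{QinQ, usable}} &= 4094 \times 4094 = 16\,760\,836
\end{aligned}
```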



QinQ Supports Multiple Services
What is QinQ?
QinQ is also known as VLAN dot1q tunnel, 802.1Q tunnel, or VLAN Stacking. It is standardized in IEEE 802.1ad as an extension of the 802.1Q protocol. QinQ adds a second 802.1Q tag (VLAN tag) in front of the original 802.1Q tag in the packet header. With two layers of tags, the number of VLANs increases to 4K × 4K. QinQ encapsulates the user's private network VLAN tag inside the public network VLAN tag, so the packet carries two VLAN tags as it crosses the operator's backbone (public) network. In the public network, the packet is forwarded according to the outer VLAN tag (the public network VLAN tag), and the user's private network VLAN tag is hidden.

The following figure shows the format of a common 802.1Q packet with one VLAN tag and of a QinQ packet with two VLAN tags:

The formats of common VLAN packet and QinQ packet

Two layers of VLAN tags can support 4K × 4K VLANs, which meets most requirements.

QinQ features:

1. It provides a simple L2 VPN tunnel for the user.

2. It requires no protocol or signaling support and is implemented through static configuration.

QinQ mainly solves the following problems:

1. It hides the user's VLAN IDs, saving the service provider's public network VLAN ID resources.

2. Users can plan their own private network VLAN IDs without conflicting with the public network or with other users' VLAN IDs.

3. It provides a simple L2 VLAN solution.

The QinQ process is as follows:

QinQ diagram

An upstream packet from switch CE1 carries one VLAN tag. When the packet reaches the QinQ port of switch PE1, an outer VLAN tag is added according to the QinQ port configuration. The packet, now carrying two VLAN tags, is forwarded across the public network to PE2. On the QinQ port of PE2, the outer VLAN tag is removed, and the packet, again carrying one VLAN tag, is forwarded to CE2.
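The PE1 and PE2 steps amount to inserting and removing a 4-byte tag right after the source MAC address. The sketch below operates on raw Ethernet frames as Python `bytes`; the function names are hypothetical, the frame is assumed to carry no FCS, and the `tpid` parameter defaults to the 0x8100 value this manual gives for the outer tag.

```python
import struct


def push_outer_tag(frame: bytes, outer_vid: int, tpid: int = 0x8100, pcp: int = 0) -> bytes:
    """PE ingress: insert an outer VLAN tag after the 12 bytes of MAC addresses."""
    if not 1 <= outer_vid <= 4094:
        raise ValueError("outer VLAN ID must be in the range 1-4094")
    tci = (pcp << 13) | outer_vid  # PCP (3 bits), DEI (1 bit, left 0), VID (12 bits)
    return frame[:12] + struct.pack("!HH", tpid, tci) + frame[12:]


def pop_outer_tag(frame: bytes, tpid: int = 0x8100):
    """PE egress: remove the outer tag and return (outer_vid, inner_frame)."""
    found_tpid, tci = struct.unpack("!HH", frame[12:16])
    if found_tpid != tpid:
        raise ValueError("frame does not start with the expected outer TPID")
    return tci & 0x0FFF, frame[:12] + frame[16:]
```

The customer's own tag is never parsed: it simply moves 4 bytes further into the frame, which is why the private VLAN ID is invisible to the public network.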

Realizing Modes of QinQ
QinQ comes in two types: basic QinQ and selective QinQ.

Basic QinQ: When the QinQ port receives a packet, it adds the tag of the port's default VLAN to the packet, whether or not the packet already carries a VLAN tag. Before a packet is sent out of the QinQ port, the outer tag is removed. The drawback of this method is that the outer VLAN cannot be chosen according to the packet's own VLAN tag.



Selective QinQ: Selective QinQ overcomes this drawback of basic QinQ. When the QinQ port receives a packet, it adds the outer VLAN tag specified for the packet's VLAN tag. If no outer VLAN tag is specified for that VLAN, the tag of the port's default VLAN is added.

QinQ extension: Mapping entries configured on the QinQ port replace the packet's VLAN tag with a specified VLAN tag, converting one VLAN tag into another. This function is called VLAN Mapping.
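The three ways of choosing a tag can be compared side by side. These functions are a hypothetical sketch: the dictionaries stand in for the port's selective QinQ and VLAN Mapping entries, `None` stands for an untagged packet, and keeping the VLAN ID unchanged when no mapping entry matches is an assumption the manual does not state.

```python
def outer_vid_basic(inner_vid, port_default_vid):
    """Basic QinQ: always the port's default VLAN, tagged packet or not."""
    return port_default_vid


def outer_vid_selective(inner_vid, selective_entries, port_default_vid):
    """Selective QinQ: outer VLAN chosen by the packet's VLAN ID.

    inner_vid is None for an untagged packet; VLANs without an entry
    fall back to the port's default VLAN.
    """
    return selective_entries.get(inner_vid, port_default_vid)


def vlan_mapping(vid, mapping_entries):
    """VLAN Mapping: replace the VLAN ID when an entry matches.

    Leaving unmatched VLAN IDs unchanged is an assumption of this sketch.
    """
    return mapping_entries.get(vid, vid)
```

`outer_vid_basic` deliberately ignores `inner_vid`; that ignored argument is exactly the limitation selective QinQ removes.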

TPID (Tag Protocol Identifier): a field in the VLAN tag that indicates the protocol type of the tag. IEEE 802.1Q defines its value as 0x8100. The default outer TPID for QinQ is 0x8100, but some manufacturers' devices use 0x9100 or 0x9200 as the outer TPID of QinQ packets. To interoperate with devices from other manufacturers, the user can modify the TPID of the public network port.

Introduction to QinQ Application Scene
Configure selective QinQ entries on the ports of a QinQ-capable switch so that the outer tag is chosen according to the packet's VLAN tag; different VLAN tags can be encapsulated with different outer VLAN tags. An enterprise divides its network into VLANs by service to keep the private network separated and secure. On the enterprise access port, an outer VLAN tag is added to the enterprise's packets, using a VLAN ID provided by the service operator. With this simple VLAN solution provided by QinQ, the enterprise's sites in different locations can communicate, and different services remain securely separated.



QinQ service division and flow diagram



L2 Protocol Control Technology

This chapter describes L2 protocol control technology and its applications.

Main contents:

L2 protocol control theory

Realize L2 protocol control

Typical application

L2 Protocol Control Theory
L2 protocol control determines how a port handles the L2 protocol packets it receives. It supports three modes: L2 protocol tunnel, L2 protocol discard, and L2 protocol peer.

L2 Protocol Tunnel
L2 protocol tunnel allows the L2 protocol packets of a customer network (such as BPDUs and LACPDUs) to be transmitted transparently across the operator's network.

In the following figure, the upper part is the operator's network and the lower part is the user network, which consists of user network A and user network B. With the L2 Protocol Tunnel function configured on the ingress and egress devices on both sides of the operator's network, the BPDUs and LACPDUs of the user network are transmitted transparently across the operator's network. This allows spanning tree calculation and link aggregation to work across the entire user network.



L2 protocol tunnel network

L2 Protocol Discard
With L2 protocol discard, the port discards received BPDUs and LACPDUs directly, so these packets do not take part in the corresponding protocol processing.

L2 Protocol Peer
With L2 protocol peer, the port does not process received BPDUs and LACPDUs itself but forwards them directly to the upper-layer protocol module for processing. This is the default behavior.

L2 Protocol Control Supports EVC Application
The L2 protocol pass-to-evc function works together with the EVC application: the control type that the EVC configures for BPDUs and LACPDUs determines the L2 protocol control action (discard or tunnel).



Realize L2 protocol control

Realize L2 Protocol Tunnel
When the L2 Protocol Tunnel function is enabled, the edge device at the ingress of the operator's network replaces the destination MAC address of an incoming L2 protocol packet with a special multicast MAC address, turning it into a tunnel packet. Devices inside the operator's network do not process the tunnel packet; they forward it like an ordinary packet. When the tunnel packet reaches the edge device at the egress, that device restores the original destination MAC address, recovering the L2 protocol packet, and forwards it to the device in the user network. This implements the L2 protocol tunnel.

The default special multicast MAC address is 01-00-0c-cd-cd-d0. Other commonly used special multicast MAC addresses are 01-00-0c-cd-cd-d1, 01-00-0c-cd-cd-d2, and 01-0f-e2-00-00-03. Enable the L2 protocol tunnel function on both edge ports to tunnel the L2 protocol.

Currently, the bmga, dot1x, gmrp, gvrp, lacp, and stp (mstp) protocols support the L2 protocol tunnel function.
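The destination MAC rewrite can be sketched for the two packet types named in this chapter. Assumptions: frames are untagged and passed as raw `bytes`; the egress device identifies the protocol from the frame itself (EtherType 0x8809 for LACPDUs, an 802.3 length field with LLC DSAP 0x42 for STP BPDUs); the function names are hypothetical. 01-80-c2-00-00-00 and 01-80-c2-00-00-02 are the standard IEEE destination addresses for these protocols.

```python
TUNNEL_DMAC = bytes.fromhex("01000ccdcdd0")          # default tunnel MAC in this manual
STP_DMAC = bytes.fromhex("0180c2000000")             # IEEE bridge group address (BPDUs)
SLOW_PROTOCOLS_DMAC = bytes.fromhex("0180c2000002")  # IEEE slow protocols (LACPDUs)


def tunnel_encap(frame: bytes, tunnel_dmac: bytes = TUNNEL_DMAC) -> bytes:
    """Ingress edge port: swap the protocol DMAC for the tunnel multicast MAC."""
    return tunnel_dmac + frame[6:]


def tunnel_decap(frame: bytes, tunnel_dmac: bytes = TUNNEL_DMAC) -> bytes:
    """Egress edge port: restore the original DMAC of a tunnel packet."""
    if frame[:6] != tunnel_dmac:
        return frame  # not a tunnel packet; leave it alone
    type_or_length = int.from_bytes(frame[12:14], "big")
    if type_or_length == 0x8809:  # Slow Protocols EtherType: LACPDU
        original = SLOW_PROTOCOLS_DMAC
    elif type_or_length <= 1500 and frame[14] == 0x42:  # 802.3 + LLC DSAP 0x42: BPDU
        original = STP_DMAC
    else:
        raise ValueError("protocol not handled by this sketch")
    return original + frame[6:]
```

Devices inside the operator network see only `TUNNEL_DMAC`, an ordinary multicast address, so they flood the packet instead of trapping it to their own STP or LACP modules.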

Realize L2 Protocol Discard
When L2 protocol discard is configured on a port, the L2 protocol control module discards the L2 protocol packets it has identified, so they do not take part in protocol module processing.

Realize L2 Protocol Peer
The L2 protocol control module does not process the packet itself but passes it to the upper-layer protocol module for processing.
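The per-port choice among tunnel, discard, and peer is a simple dispatch. The enum, the per-port configuration dictionary, and the return values below are hypothetical; the only behavior taken from the text is that peer is the default and that discarded packets never reach the protocol module.

```python
from enum import Enum


class L2ProtocolAction(Enum):
    PEER = "peer"        # default: pass the packet to the protocol module
    TUNNEL = "tunnel"    # send it across the operator network as a tunnel packet
    DISCARD = "discard"  # drop it before protocol processing


def handle_l2_protocol_packet(port_config, protocol, packet):
    """Decide what a port does with a received L2 protocol packet.

    port_config maps a protocol name (e.g. "stp", "lacp") to an action;
    protocols without an entry use the default, PEER.
    """
    action = port_config.get(protocol, L2ProtocolAction.PEER)
    if action is L2ProtocolAction.DISCARD:
        return ("dropped", None)
    if action is L2ProtocolAction.TUNNEL:
        return ("tunnelled", packet)  # the DMAC would be rewritten at this point
    return ("to_protocol_module", packet)
```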

Typical Application
PE1 and PE2 are devices in the operator's network. Customer A and Customer B are devices in the user network.



Networking requirement: To tunnel STP packets between Customer A and Customer B, an L2 protocol tunnel for STP packets must be set up between PE1 and PE2.

Networking

The user enables the L2 protocol tunnel function for STP packets on edge port 0/0/2 of PE1 and edge port 0/0/2 of PE2. The network between PE1 and PE2 must be able to forward the tunnel packets.



L2 Multicast

This chapter describes the public part of L2 multicast, L2 static multicast, IGMP Snooping, IGMP Proxy, MVR, MVP, and their applications.

Main contents:

Public part of L2 multicast

L2 static multicast and its application

IGMP Snooping and its application

IGMP Proxy and its application

MVR and its application

MVP and its application (this function applies only to MyPower3400 and S4100 series switches)

Public Part of L2 Multicast
This section describes the principles of the L2 multicast public part.

Main contents:

Terms

Introduction

Terms
1. L2 multicast comprehensive table: a table that integrates the L2 multicast information obtained through static configuration and dynamic learning. Each entry contains the VLAN, the multicast MAC address, and the output port list obtained through static configuration and dynamic learning.



2. L2 multicast forwarding table: similar to the L2 multicast comprehensive table. The output port list of each entry is derived from the corresponding comprehensive table entry by filtering its ports by VLAN and converting aggregation groups into their member ports. The entries determine the forwarding port lists for L2 multicast.

Introduction
The L2 multicast public part is the middle layer between the underlying chips and the L2 multicast applications. It integrates the information from the L2 multicast applications (for example, entries configured through L2 static multicast and entries learned dynamically through IGMP Snooping) into the L2 multicast forwarding table and delivers the entries to the underlying chips, which form the hardware forwarding table.

Entry Maintenance
The L2 multicast public part integrates the L2 multicast information from static configuration and dynamic learning into the L2 multicast comprehensive table. During this integration, static configuration takes precedence. For example, if static configuration forbids an L2 multicast group (identified by [VLAN, MAC]) from being forwarded on a port, that port does not become an output port even if dynamic L2 multicast learns a member of the group on it, so the port does not replicate and forward the group's L2 multicast packets.

Starting from the L2 multicast comprehensive table, the L2 multicast public part builds each forwarding port list by filtering the output ports by VLAN and converting aggregation-group output ports into the member ports of the aggregation group. The result is the L2 multicast forwarding table, which is used for L2 multicast forwarding. Finally, the forwarding table is written into the hardware forwarding table.
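The two-step build (static configuration first, then VLAN filtering and aggregation expansion) can be sketched with Python sets. The function name, port names, the aggregation group name `ag1`, and the assumption that VLAN membership is recorded for the aggregation group as a whole are all illustrative.

```python
def build_forwarding_ports(static_members, static_forbidden, dynamic_members,
                           vlan_ports, aggregation_groups):
    """Derive the forwarding port list of one (VLAN, MAC) entry.

    static_members, static_forbidden, dynamic_members: sets of output ports,
        where an aggregation group appears under its group name.
    vlan_ports: ports (or group names) that belong to the entry's VLAN.
    aggregation_groups: {group name: list of member ports}.
    """
    # Comprehensive table: static configuration takes precedence, so a
    # statically forbidden port is removed even if it was learned dynamically.
    comprehensive = (static_members | dynamic_members) - static_forbidden

    forwarding = set()
    for port in comprehensive:
        if port not in vlan_ports:
            continue  # VLAN filtering
        # Convert an aggregation group into its member ports.
        forwarding.update(aggregation_groups.get(port, [port]))
    return forwarding
```

For instance, a port that is both statically forbidden and dynamically learned ends up absent from the result, matching the precedence rule described above.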

L2 Multicast Forwarding
When the device receives an L2 multicast packet on a port, it first looks up the hardware forwarding table. If no matching entry is found, the packet is flooded in its VLAN (except on the port where the packet was received); however, if the device is configured to discard unknown multicast, the packet is discarded instead. If a matching entry is found, the packet is replicated and forwarded on all output ports specified by the entry (except the port where the packet was received). The lookup key for L2 multicast forwarding is the two-tuple of VLAN and multicast MAC address, and the forwarding port list is the set of ports to which the L2 multicast packets are replicated and forwarded.
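The lookup rules reduce to a dictionary keyed by the (VLAN, multicast MAC) two-tuple. In this hypothetical sketch, the hardware table and VLAN membership are plain dictionaries and `drop_unknown` stands in for the unknown-multicast discard setting.

```python
def forward_l2_multicast(hw_table, vlan, mac, ingress_port, vlan_ports,
                         drop_unknown=False):
    """Return the set of ports a received L2 multicast frame is copied to.

    hw_table: {(vlan, multicast_mac): set of output ports}
    vlan_ports: {vlan: set of ports in that VLAN}
    """
    entry = hw_table.get((vlan, mac))
    if entry is None:
        if drop_unknown:
            return set()  # unknown multicast is discarded
        return set(vlan_ports[vlan]) - {ingress_port}  # flood within the VLAN
    return set(entry) - {ingress_port}  # replicate to the entry's output ports
```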

L2 Static Multicast and Its Application
This section describes the principles and application of L2 static multicast.

Main contents:

Terms

Introduction

Typical Application

Terms
L2 static multicast table: the table maintained by L2 static multicast. Each entry holds L2 static multicast information generated by static configuration, including the VLAN, multicast MAC address, member port list, and forbidden port list.

Introduction
L2 static multicast generates L2 multicast information through static configuration, in which the VLAN, multicast MAC address, member port list, and forbidden port list are specified. Each L2 static multicast entry produces the related entries through the L2 multicast public part, and the entries are finally delivered to the hardware forwarding table.

Member Port List
If a port belongs to the member port list of an L2 static multicast entry, the matching L2 multicast packets that the device receives are replicated and forwarded on that port.



Forbidden Port List
If a port belongs to the forbidden port list of an L2 static multicast entry, the matching L2 multicast packets that the device receives are not replicated or forwarded on that port. Because the L2 multicast public part gives static configuration precedence, this holds even if dynamic L2 multicast learns a member on the port.

Typical Application

Figure 8-1 Application of L2 static multicast

As shown in the preceding figure, a video server that sends multicast video programs is connected to the switch, as are receivers PC1, PC2, and PC3. The ports connected to the video server and to the receiver PCs belong to the same VLAN. Create an L2 static multicast entry for the VLAN and multicast MAC address. Then configure the port connected to PC1 as a member port and the port connected to PC2 as a forbidden port, and leave the port connected to PC3 unconfigured. PC1 can receive the video programs; PC2 and PC3 cannot.

IGMP Snooping and Its Application
This section describes the principles of IGMP Snooping.



Main contents:

Terms

Introduction

Terms
1. IGMP: Internet Group Management Protocol, which hosts use to advertise their multicast group memberships to routers or switches.

2. IGMP Snooping: Internet Group Management Protocol Snooping.

3. Dynamic router port: a switch port that receives IGMP query packets or L3 multicast protocol packets (such as PIM hello).

4. Dynamic member port: a switch port that receives IGMP membership reports.

5. IP multicast L2 forwarding: forwarding L2 multicast traffic based on the VLAN, the multicast source IP address (0.0.0.0 for (*, G) entries; this depends on the switch chip), and the multicast destination address.

Introduction
IGMP establishes and maintains multicast group memberships between hosts and routers, and it runs between hosts and their directly connected multicast routers. On one side, a host uses IGMP to notify the multicast router that it wants to join and receive traffic from a specific multicast group (or specific multicast source). On the other side, the router uses IGMP to query whether any group members are active on the local network segment and collects the membership information of that segment. A multicast router only cares whether any member of a multicast group exists on the local segment, not how many members there are. If there is at least one member, the router forwards the traffic of that multicast group (or multicast source) to the segment.

IGMP has three versions: IGMPv1, IGMPv2, and IGMPv3, of which IGMPv2 is the most common. IGMPv1 is defined in RFC 1112 and describes general queries and membership reports. IGMPv2, defined in RFC 2236, adds a fast leave mechanism for group members and querier election on top of IGMPv1. IGMPv3, defined in RFC 3376, adds source filtering on top of IGMPv2: a host can request a multicast group's traffic only from specific sources, or exclude specific sources.

IGMP Snooping (Internet Group Management Protocol Snooping) is used on switches that do not run IGMP themselves to limit the scope of multicast transmission, so that multicast packets are not sent to network segments that do not need them. It listens to and analyzes IGMP packets to build and maintain mappings between multicast MAC or IP addresses and the receiving ports and VLANs, and it forwards multicast traffic according to these mappings.

As shown in the following figure, when IGMP Snooping is not running on an L2 device, multicast data is flooded to all ports in the VLAN. When IGMP Snooping is running, known multicast data is not flooded in the VLAN but is forwarded only to the specified multicast member ports.

Figure 8-2 Before and after IGMP Snooping is used

Snoop IGMP Packets to Create Multicast Information
IGMP Snooping obtains multicast information and creates the related entries by snooping IGMP packets. A port that receives query packets is a router port, and a port that receives membership reports is a multicast member port. The switch records all of its member ports in the IGMP Snooping multicast forwarding table, which contains the multicast group MAC or IP address, the VLAN, and the port list.

Forward the Snooped IGMP Packets
IGMP Snooping traps IGMP packets to the CPU and then forwards them as required. Received query packets are forwarded to the other ports in the VLAN. For a group-specific query without the suppression flag set, the group timer is adjusted to the LMQT (Last Member Query Time). Received membership reports are forwarded to the router ports; if report suppression is enabled, not all IGMPv1 and IGMPv2 reports are forwarded to the router ports.

Dynamic Port Aging Timer

The aging timer applies to dynamic ports only. When the aging timer expires, the port is deleted from the related table entry.

After receiving an IGMP general query packet, the switch forwards it through all ports in the VLAN except the receiving port, and processes the receiving port as follows:

1. If the receiving port is already in the router port list as a dynamic router port, reset its aging timer.

2. If it is not, add the port to the router port list as a dynamic router port and start its aging timer.

After receiving an IGMP membership report packet, the switch forwards it through all router ports in the VLAN. It parses the packet for the multicast group address that the host wants to join and processes the receiving port as follows:

1. If no forwarding table entry exists for the multicast group, create one, add the port to its output port list as a dynamic member port, and start the aging timer.

2. If the forwarding table entry for the multicast group exists but its port list does not contain the port, add the port to the output port list as a dynamic member port and start the aging timer.

3. If the forwarding table entry for the multicast group exists and its port list already contains the port, restart the aging timer.
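The query, report, and aging rules above can be sketched as a small table object. This is a minimal model, assuming time is passed in explicitly and that an entry with no ports left is deleted; the class and method names are hypothetical, and the aging time is a constructor parameter rather than a documented default.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SnoopingTable:
    aging_time: float  # seconds; supplied by the caller, not a documented default
    # (vlan, port) -> expiry time of a dynamic router port
    router_ports: dict[tuple[int, str], float] = field(default_factory=dict)
    # (vlan, group) -> {port: expiry time} for dynamic member ports
    members: dict[tuple[int, str], dict[str, float]] = field(default_factory=dict)

    def on_general_query(self, vlan: int, port: str, now: float) -> None:
        # Rule 1 refreshes an existing dynamic router port, rule 2 adds a new one;
        # both end with the port's aging timer (re)started.
        self.router_ports[(vlan, port)] = now + self.aging_time

    def on_report(self, vlan: int, group: str, port: str, now: float) -> None:
        # setdefault creates the group entry if missing (rule 1); assigning the
        # port either adds it (rule 2) or restarts its timer (rule 3).
        self.members.setdefault((vlan, group), {})[port] = now + self.aging_time

    def expire(self, now: float) -> None:
        """Delete dynamic ports whose aging timer has run out."""
        self.router_ports = {k: t for k, t in self.router_ports.items() if t > now}
        for key in list(self.members):
            alive = {p: t for p, t in self.members[key].items() if t > now}
            if alive:
                self.members[key] = alive
            else:
                del self.members[key]
```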



IP Multicast L2 Forwarding

L2 multicast forwarding is performed based on the VLAN and MAC address. For details, refer to the L2 multicast forwarding section in the common L2 multicast part.

The entries formed by IGMP Snooping can be used for IP multicast L2 forwarding. Two kinds of forwarding entries are formed, (VLAN, *, G) and (VLAN, S, G), each of which specifies a member port list. In these entries, VLAN is the VLAN of the multicast packet, * matches any multicast source IP address, S is the multicast source IP address, and G is the multicast destination address.

When IP multicast L2 forwarding is used and a multicast packet arrives at a port, the switch first checks whether a matching (VLAN, S, G) entry exists. If it does, the packet is duplicated and forwarded to the member ports specified by that entry, and forwarding ends. Otherwise, the switch checks whether a matching (VLAN, *, G) entry exists. If it does, the packet is duplicated and forwarded to the member ports specified by that entry, and forwarding ends. If neither entry exists, the packet is duplicated and forwarded to all member ports of the VLAN, unless the switch is configured to discard unknown multicast, in which case the packet is discarded.

In all of the preceding cases, a packet is never forwarded back out of the port on which it was received.
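The lookup order described above can be sketched as one function. The table shapes (dictionaries keyed by `(vlan, src, group)` and `(vlan, group)`) and the `drop_unknown` flag are hypothetical stand-ins for the switch's internal tables and its "discard unknown multicast" setting.

```python
from __future__ import annotations


def egress_ports(vlan: int, src: str, group: str, in_port: str,
                 sg_table: dict[tuple[int, str, str], set[str]],
                 starg_table: dict[tuple[int, str], set[str]],
                 vlan_ports: dict[int, set[str]],
                 drop_unknown: bool = False) -> set[str]:
    """Return the set of ports a multicast packet is copied to."""
    if (vlan, src, group) in sg_table:        # 1. exact (VLAN, S, G) match
        ports = sg_table[(vlan, src, group)]
    elif (vlan, group) in starg_table:        # 2. fall back to (VLAN, *, G)
        ports = starg_table[(vlan, group)]
    elif drop_unknown:                        # 3a. unknown multicast, discard configured
        return set()
    else:                                     # 3b. unknown multicast, flood in the VLAN
        ports = vlan_ports[vlan]
    return set(ports) - {in_port}             # never send back out the ingress port
```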

IGMP Proxy and Its Application

This section describes the principles of IGMP Proxy.

Main contents:

Terms

Introduction



Terms

IGMP Proxy: The switch is logically divided into two parts. One part acts as an IGMP group member and sends IGMP membership reports to the router. The other part acts as a multicast router: it sends IGMP queries to the downstream port list and collects member information to form the member database. Unlike IGMP Snooping, IGMP Proxy aggregates the member information of its ports into its own IGMP membership reports.

Introduction

Figure 8-3 Working principle of IGMP proxy

The preceding figure shows the working principle of IGMP Proxy. An L2 switch running IGMP Proxy is logically divided into two parts: the IGMP group member and the multicast router. The multicast router part makes the switch act as a multicast router toward the downstream hosts: it sends IGMP queries, collects IGMP member information, and aggregates the group member information into the IGMP Proxy member database. The IGMP group member part reports IGMP member information to the real multicast router based on the IGMP Proxy member database. Unlike with IGMP Snooping, the IGMP membership reports and leave messages of the downstream receiving hosts terminate at the switch running IGMP Proxy, and so do the queries sent by the multicast router. IGMP Proxy generates IGMP queries, membership reports, and leave messages itself, whereas IGMP Snooping only forwards these messages.
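The aggregation performed by the member database can be sketched as follows. This is a simplified model, assuming that only the first downstream member of a group triggers an upstream report and that the last leave triggers an upstream leave. A real proxy would also send its own group-specific queries downstream before removing a member. All names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class IgmpProxy:
    # group -> downstream ports that currently have members
    database: dict[str, set[str]] = field(default_factory=dict)
    # messages the proxy sends to the real multicast router
    upstream: list[tuple[str, str]] = field(default_factory=list)

    def downstream_report(self, group: str, port: str) -> None:
        is_new = group not in self.database
        self.database.setdefault(group, set()).add(port)
        if is_new:
            # Later reports for the same group are absorbed by the proxy.
            self.upstream.append(("report", group))

    def downstream_leave(self, group: str, port: str) -> None:
        ports = self.database.get(group)
        if ports is None:
            return
        ports.discard(port)
        if not ports:
            del self.database[group]
            self.upstream.append(("leave", group))

    def upstream_query(self) -> None:
        # The router's query terminates here and is answered from the database.
        for group in sorted(self.database):
            self.upstream.append(("report", group))
```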

Typical Application

Figure 8-4 Typical application of IGMP proxy

As shown in the preceding figure, when IGMP Proxy is not running on the switch, the switch forwards the IGMP reports of terminals A, B, and C to the router and forwards the router's query packets to the downstream terminals. After the switch runs IGMP Proxy, upstream queries are no longer forwarded to the downstream terminals, and the IGMP reports of terminals A, B, and C are no longer forwarded to the router. Instead, the switch sends query packets downstream, aggregates the reports of terminals A, B, and C into its member database, builds reports from the group records in the member database, and sends them to the router. For the router, receiving the proxy's reports has the same effect as receiving the reports of terminals A, B, and C directly, but the number of IGMP report packets the router receives is reduced, which relieves the load on the router.

MVR and Its Application

This section describes the principles and applications of MVR.

Main contents:



Terms

Introduction

Terms

MVR: Multicast VLAN Registration.

Introduction

In the traditional multicast VOD mode, when users in different VLANs select programs, the multicast data is duplicated in each VLAN. This wastes a large amount of bandwidth and increases the load on layer 3 equipment. To solve this problem, you can configure the multicast VLAN function on the switch: add the user interfaces belonging to different VLANs to the multicast VLAN and enable IGMP Snooping. Through VLAN translation, the IGMP join and leave packets received on the multicast VLAN member interfaces carry the multicast VLAN tag, so the forwarding table of the multicast VLAN is generated on the switch. As a result, the multicast data only needs to be sent once in the multicast VLAN, and users in different VLANs can all receive it. This mode, in which the user interfaces that need to receive multicast data join the multicast VLAN and become its members, is called Multicast VLAN Registration (MVR).

Figure 8-5 Before and after enabling MVR



Forwarding Table of the Multicast VLAN

The multicast forwarding table formed by IGMP Snooping contains the multicast group MAC or IP address, the VLAN, and the port list. After the MVR function is enabled, the switch analyzes the membership report packets received on the member ports of the multicast VLAN. If the VLAN tag of a packet is not the multicast VLAN, the switch converts it to the multicast VLAN and then processes the packet to build the forwarding table of the multicast VLAN. Consequently, the multicast traffic only needs to be duplicated once, in the multicast VLAN.
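The VLAN translation step can be sketched as follows: a report arriving on an MVR member port in a user VLAN is re-tagged with the multicast VLAN before it updates the forwarding table. The function name, the `receiver_ports` set, and the table layout are hypothetical.

```python
from __future__ import annotations


def mvr_learn(table: dict[tuple[int, str], set[str]], mvlan: int,
              receiver_ports: set[str], vlan: int, port: str, group: str) -> int:
    """Record a membership report and return the VLAN it was learned in.

    table:          (vlan, group) -> member ports
    mvlan:          the multicast VLAN ID
    receiver_ports: ports configured as members of the multicast VLAN
    """
    if port in receiver_ports and vlan != mvlan:
        vlan = mvlan  # VLAN translation: treat the report as if tagged with the multicast VLAN
    table.setdefault((vlan, group), set()).add(port)
    return vlan
```

Reports from three user VLANs on three member ports all land in one multicast VLAN entry, so a single copy of the stream in that VLAN reaches every port.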

Typical Application

MVR improves multicast applications: it saves bandwidth, reduces the load on L3 devices, and can be used in any multicast application environment. The following figure shows live web broadcasting.

Figure 8-6 MVR application in live web broadcasting

Using multicast, router A transmits video traffic to the video terminals connected to switch A and switch B. As shown in the preceding figure, the six ports connected to video terminals on the two switches belong to different VLANs. If MVR is not enabled, router A needs to send three copies of the video traffic to each switch. If MVR is enabled, it only needs to send one copy to each switch. This reduces network traffic, saves bandwidth, and relieves the load on router A. MVR is especially valuable in bandwidth-intensive applications such as live web broadcasting.



MVP and Its Application

This section describes the principle and application of MVP.

Main contents:

Terms

Introduction

Terms

MVP: Multicast VLAN Plus.

Introduction

In the traditional multicast distribution mode, when users belong to different VLANs, the upstream device duplicates the multicast data for each VLAN. This occupies a large amount of bandwidth and adds extra load to the L3 device. To solve this problem, you can configure the MVP function on the switch: the home VLAN of each receiver joins the multicast VLAN as a sub-VLAN, so receivers in both the primary VLAN and the sub-VLANs of the multicast VLAN can receive the multicast data flow. Compared with the traditional multicast forwarding mode, the upstream device only needs to send one copy of the data to the multicast VLAN, which saves bandwidth and relieves the upstream load. Unlike MVR, MVP does not require all receivers to join the multicast VLAN. Multicast data is duplicated across VLANs while users in different VLANs remain isolated, which ensures security.



Figure 8-7 Before and after enabling MVP

Forwarding Table of the Multicast VLAN

The switch forms group records in the multicast VLAN and in each sub-VLAN through IGMP Snooping, and MVP builds the MVP multicast forwarding table from these group records. Each MVP forwarding entry contains the multicast primary VLAN, the source IP address, the group IP address, and the forwarding port list of the primary VLAN, as well as each sub-VLAN and its forwarding port list. When multicast data enters the primary VLAN of the multicast VLAN, the switch looks up the entry by primary VLAN, source IP, and group IP. If an entry matches, the data is forwarded to the forwarding ports of the primary VLAN and to the forwarding ports of each sub-VLAN. If no entry matches, the data is discarded or flooded in the primary VLAN, according to the configured policy. Ordinary L2 multicast can only be forwarded within the local VLAN. With MVP enabled, the switch forwards the multicast traffic received in the multicast VLAN to the multicast VLAN and its sub-VLANs according to the MVP forwarding table, so the traffic can be duplicated to receivers in different VLANs.
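The MVP lookup described above can be sketched as follows. `MvpEntry`, `mvp_forward`, and the `flood_unknown` flag (standing in for the "discard or flood" policy) are hypothetical names. Copies are returned as `(vlan, port)` pairs to show that one incoming packet is replicated into several VLANs.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MvpEntry:
    primary_ports: set[str] = field(default_factory=set)
    sub_vlan_ports: dict[int, set[str]] = field(default_factory=dict)  # sub-VLAN -> ports


def mvp_forward(table: dict[tuple[int, str, str], MvpEntry], primary_vlan: int,
                src: str, group: str, flood_unknown: bool,
                primary_vlan_ports: set[str]) -> list[tuple[int, str]]:
    """Return (vlan, port) copies for a packet arriving in the primary VLAN."""
    entry = table.get((primary_vlan, src, group))
    if entry is None:
        # No match: discard or flood in the primary VLAN, per the configured policy.
        if not flood_unknown:
            return []
        return [(primary_vlan, p) for p in sorted(primary_vlan_ports)]
    copies = [(primary_vlan, p) for p in sorted(entry.primary_ports)]
    for sub_vlan, ports in sorted(entry.sub_vlan_ports.items()):
        copies += [(sub_vlan, p) for p in sorted(ports)]
    return copies
```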

Typical Application

MVP saves bandwidth and reduces the load on L3 devices. The following figure shows live web broadcasting:



Figure 8-8 MVP application in live web broadcasting

Using multicast, router A transmits video traffic to the video terminals connected to switch A and switch B. As shown in the preceding figure, the six ports connected to video terminals on the two switches belong to different VLANs. If MVP is not enabled, router A needs to send three copies of the video traffic to each switch. If MVP is enabled, it only needs to send one copy to each switch. This reduces network traffic, saves bandwidth, and relieves the load on router A. Unlike with MVR, the terminals remain in different VLANs, and VLAN isolation ensures security. MVP is especially valuable in bandwidth-intensive applications such as live web broadcasting.



Security Technology

This chapter describes L2 security technologies and their applications.

Main contents:

802.1X technology

DHCP Snooping technology

IP Source Guard technology

Dynamic ARP detection technology

Port security

Port monitoring

Port isolation

802.1X Protocol and Application

This section describes the principles, implementation, and application of 802.1X.

Main contents:

Related terms

Introduction

Typical application



Related Terms

Supplicant system: The client, an entity at one end of the LAN link that is authenticated by the device at the other end. The client is a user terminal; the user initiates 802.1X authentication by starting the client software.

Authenticator system: The device side, the other entity on the LAN link, which authenticates the connected client. The device side is a network device that supports the 802.1X protocol and provides LAN access ports for the client.

Authentication server system: The authentication server, the entity that provides authentication service for the device side. It performs authentication, authorization, and accounting for users and is usually a RADIUS (Remote Authentication Dial-In User Service) server. The server stores user information such as the user name, password, and VLAN.

PAE (Port Access Entity): The entity that executes the algorithms and protocol operations of 802.1X.

Uncontrolled port and controlled port: The device side provides LAN access ports for the client, and each port is divided into two logical ports, an uncontrolled port and a controlled port. The uncontrolled port is always in the bidirectional connected state and is mainly used to transmit EAPOL protocol frames, ensuring that the client can always send and receive authentication packets. The controlled port is in the bidirectional connected state only in the authorized state, when it transmits service packets; in the unauthorized state it prohibits receiving any packet from the client.

Introduction

802.1X Authentication System Structure



Figure 9-1 802.1X architecture

The 802.1X system is a typical client/server structure with three entities: the Supplicant system, the Authenticator system, and the Authentication server system. The 802.1X authentication system uses the EAP protocol to exchange authentication information among these three entities. EAP packets between the Supplicant PAE and the Authenticator PAE use the EAPOL encapsulation format, and the Authenticator system uses the uncontrolled port to receive and send EAPOL frames. The Authenticator PAE and the Authentication server carry EAP packets over a higher-layer protocol (usually RADIUS) to exchange authentication information. The Authenticator PAE changes the authorization state of the controlled port according to the authentication result returned by the Authentication server, thereby permitting or denying the Supplicant system access to network resources.

EAPOL Message Encapsulation

1. EAPOL Message Format

Figure 9-2 EAPOL message format

PAE Ethernet Type: the protocol type, 0x888E.

Protocol Version: the protocol version supported by the sender of the EAPOL frame.

Type: the EAPOL frame type, including EAP-Packet (0x00, authentication frame), EAPOL-Start (0x01, authentication initiation frame), and EAPOL-Logoff (0x02, logout request frame).

Length: the length of the Packet Body. A value of 0 means that there is no Packet Body.

Packet Body: the data contents, which vary with the Type.
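The EAPOL fields after the 0x888E EtherType can be packed and parsed with the standard `struct` module, as in the sketch below. The default Protocol Version of 2 is an assumed value for illustration only; the manual does not state which version the switch sends.

```python
from __future__ import annotations

import struct

EAPOL_ETHERTYPE = 0x888E
EAP_PACKET, EAPOL_START, EAPOL_LOGOFF = 0x00, 0x01, 0x02


def build_eapol(packet_type: int, body: bytes = b"", version: int = 2) -> bytes:
    """Build Protocol Version, Type, Length (big-endian), then the Packet Body."""
    return struct.pack("!BBH", version, packet_type, len(body)) + body


def parse_eapol(data: bytes) -> tuple[int, int, bytes]:
    version, packet_type, length = struct.unpack("!BBH", data[:4])
    body = data[4:4 + length]  # Length = 0 means there is no Packet Body
    if len(body) != length:
        raise ValueError("truncated EAPOL frame")
    return version, packet_type, body
```

An EAPOL-Start frame therefore carries just four bytes after the EtherType: version, type 0x01, and a zero length.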

2. EAP Message Format

When the Type of an EAPOL message is EAP-Packet, the Packet Body is an EAP packet with the following structure:

Figure 9-3 EAP encapsulation format

Code: the EAP packet type: Request, Response, Success, or Failure. Success and Failure packets have no Data field, so their Length is 4. For Request and Response packets, the Data field has the following format, where Type is the EAP authentication type and the contents of Type data depend on Type.

Figure 9-4 The Data field format of Request and Response

Identifier: used to match Response messages to Request messages.

Length: the length of the EAP packet, including the Code, Identifier, Length, and Data fields.

Data: the contents of the EAP packet, which depend on the Code.
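The EAP layout above can be sketched as a pair of helpers. The numeric Code values (Request 1, Response 2, Success 3, Failure 4) and Type values (Identity 1, MD5-Challenge 4) come from RFC 3748, not from this manual; the helper names are hypothetical.

```python
from __future__ import annotations

import struct

REQUEST, RESPONSE, SUCCESS, FAILURE = 1, 2, 3, 4   # EAP Code values (RFC 3748)
TYPE_IDENTITY, TYPE_MD5_CHALLENGE = 1, 4           # EAP Type values (RFC 3748)


def build_eap(code: int, identifier: int, eap_type: int | None = None,
              type_data: bytes = b"") -> bytes:
    if code in (SUCCESS, FAILURE):
        data = b""                              # Success/Failure have no Data field
    else:
        data = bytes([eap_type]) + type_data    # Request/Response: Type + Type-data
    length = 4 + len(data)                      # covers Code, Identifier, Length, Data
    return struct.pack("!BBH", code, identifier, length) + data


def parse_eap(packet: bytes) -> tuple[int, int, int | None, bytes]:
    code, identifier, length = struct.unpack("!BBH", packet[:4])
    data = packet[4:length]
    if code in (REQUEST, RESPONSE):
        return code, identifier, data[0], data[1:]
    return code, identifier, None, b""
```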

3. Encapsulation of EAP Attributes

To support EAP authentication, RADIUS adds two attributes: EAP-Message and Message-Authenticator.

EAP-Message

Figure 9-5 EAP-Message encapsulation

As shown in Figure 9-5, this attribute encapsulates the EAP packet. Its type code is 79, and its String field holds at most 253 bytes. If an EAP packet is longer than 253 bytes, it is split into fragments that are carried in multiple EAP-Message attributes.
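The fragmentation rule can be sketched as follows: the EAP packet is cut into pieces of at most 253 bytes, and each piece becomes one Type 79 attribute whose Length byte counts the two header bytes plus the fragment. The function names are hypothetical.

```python
from __future__ import annotations

EAP_MESSAGE = 79
MAX_STRING = 253  # one RADIUS attribute carries at most 253 bytes of String


def eap_message_attributes(eap_packet: bytes) -> bytes:
    """Split an EAP packet across consecutive EAP-Message attributes."""
    out = bytearray()
    for i in range(0, len(eap_packet), MAX_STRING):
        chunk = eap_packet[i:i + MAX_STRING]
        out += bytes([EAP_MESSAGE, len(chunk) + 2]) + chunk  # Type, Length, String
    return bytes(out)


def reassemble(attrs: bytes) -> bytes:
    """Concatenate the String fields of all EAP-Message attributes, in order."""
    out, i = bytearray(), 0
    while i < len(attrs):
        attr_type, attr_len = attrs[i], attrs[i + 1]
        if attr_type == EAP_MESSAGE:
            out += attrs[i + 2:i + attr_len]
        i += attr_len
    return bytes(out)
```

A 512-byte EAP packet, for example, becomes three attributes carrying 253, 253, and 6 bytes.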

Message-Authenticator

Figure 9-6 EAP-Authenticator attribute

As shown in Figure 9-6, this attribute is used to sign Access-Request packets so that they cannot be spoofed when EAP or CHAP authentication is used. A packet that contains the EAP-Message attribute must also contain Message-Authenticator; otherwise, the packet is considered invalid and discarded.
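Per RFC 3579, the Message-Authenticator value (attribute type 80, 16 bytes) is an HMAC-MD5 over the whole RADIUS packet, computed with the value field set to zeros and keyed with the shared secret. The sketch below signs and verifies an Access-Request under that rule. It assumes Message-Authenticator is the last attribute, and the function names are hypothetical.

```python
from __future__ import annotations

import hashlib
import hmac
import os
import struct

ACCESS_REQUEST = 1
MESSAGE_AUTHENTICATOR = 80


def sign_access_request(identifier: int, attributes: bytes, secret: bytes) -> bytes:
    """Return an Access-Request ending with a filled-in Message-Authenticator."""
    request_auth = os.urandom(16)
    placeholder = bytes([MESSAGE_AUTHENTICATOR, 18]) + bytes(16)  # zeroed while hashing
    body = attributes + placeholder
    header = struct.pack("!BBH", ACCESS_REQUEST, identifier, 20 + len(body)) + request_auth
    mac = hmac.new(secret, header + body, hashlib.md5).digest()
    return header + attributes + bytes([MESSAGE_AUTHENTICATOR, 18]) + mac


def verify(packet: bytes, secret: bytes) -> bool:
    # Assumes Message-Authenticator is the last attribute, as built above.
    received = packet[-16:]
    expected = hmac.new(secret, packet[:-16] + bytes(16), hashlib.md5).digest()
    return hmac.compare_digest(received, expected)
```

Any change to the packet, such as an altered user name, makes verification fail, which is what prevents spoofed requests.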

802.1X Authentication

Authentication can be initiated by either the Supplicant system or the Authenticator system. The Authenticator system can actively send an EAP-Request/Identity packet to the Supplicant system to initiate authentication, or the Supplicant system can send an EAPOL-Start packet to the Authenticator system through its client software. The following example uses authentication initiated by the Supplicant system. EAP supports multiple authentication methods; the following uses EAP-MD5 as an example to describe the basic service flow.



Figure 9-7 Service flow of 802.1X authentication system

The authentication process is as follows:

1. When the user needs to access the network, the user starts the 802.1X client program, enters the registered user name and password, and initiates a connection request (EAPOL-Start packet). The client program sends this authentication request packet to the device side, starting an authentication session.

2. After receiving the authentication request frame, the device side sends a request frame (EAP-Request/Identity packet) asking the client program to send the user name that was entered.

3. The client program answers the request and sends the user name to the device side in a data frame (EAP-Response/Identity packet). The device side encapsulates this frame in a RADIUS Access-Request packet and sends it to the authentication server for processing.

4. After receiving the user name forwarded by the device side, the RADIUS server looks it up in the user name table of its database, finds the corresponding password, and generates a random challenge. It sends the challenge to the device side in a RADIUS Access-Challenge packet, and the device side forwards it to the client program.

5. After receiving the challenge (EAP-Request/MD5 Challenge packet) from the device side, the client program uses it together with the password to compute an irreversible MD5 digest, builds an EAP-Response/MD5 Challenge packet, and sends it to the authentication server through the device side.

6. The RADIUS server compares the received digest (carried in a RADIUS Access-Request packet) with the result of applying the same algorithm to the locally stored password. If they are the same, the user is regarded as legitimate, and the server returns an authentication success message (a RADIUS Access-Accept packet and an EAP-Success packet).

7. After receiving the authentication success message, the device changes the port to the authorized state and permits the user to access the network through the port.

8. The client can also send an EAPOL-Logoff packet to the device side to log out actively. The device side then changes the port from the authorized state to the unauthorized state and sends an EAP-Failure packet to the client.
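Steps 5 and 6 describe the challenge as "encrypting" the password. In EAP-MD5 (RFC 3748, which reuses the CHAP computation of RFC 1994), the response value is the one-way MD5 digest of the EAP Identifier, the password, and the challenge, and the server checks it by repeating the same computation. The sketch below shows that check; the function names are hypothetical.

```python
from __future__ import annotations

import hashlib
import hmac


def md5_challenge_response(identifier: int, password: bytes, challenge: bytes) -> bytes:
    """EAP-MD5 response value: MD5 over Identifier, password, and challenge."""
    return hashlib.md5(bytes([identifier]) + password + challenge).digest()


def server_check(identifier: int, stored_password: bytes,
                 challenge: bytes, received: bytes) -> bool:
    # The server repeats the one-way computation with its stored password.
    expected = md5_challenge_response(identifier, stored_password, challenge)
    return hmac.compare_digest(expected, received)
```

The password itself never crosses the network; only the 16-byte digest does.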

Technologies Cooperating with 802.1X

Auto Vlan:

In the port-based access control mode, Auto Vlan is valid only on ACCESS ports. In the MAC-based access control mode, Auto Vlan is valid only on HYBRID ports. Auto Vlan is invalid in other access control modes.

Auto Vlan is also called Assigned Vlan. When an 802.1X user passes authentication on the server, the server delivers the authorized VLAN information to the device side. If the delivered VLAN is illegal (the VLAN ID is wrong or the VLAN does not exist), authentication fails. Otherwise, the authenticated port is added to the delivered VLAN. After the user logs out, the port returns to the unauthorized state and is removed from the Auto Vlan, and the default VLAN of the port reverts to the previously configured VLAN.



The delivered Auto Vlan does not change or affect the port configuration, but it takes priority over the VLAN configured by the user (the Config Vlan). That is, after the user passes authentication, the effective VLAN is the delivered Auto Vlan, and the Config Vlan takes effect again after the user logs out.

The three associated RADIUS attributes are:

– [64] Tunnel-Type = Vlan

– [65] Tunnel-Medium-Type = 802

– [81] Tunnel-Private-Group-ID = Vlan name or Vlan Id
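The VLAN check can be sketched as follows. The sketch assumes the RADIUS attributes have already been decoded into a dictionary keyed by attribute number, and it only inspects attribute 81. RFC 3580 expects attributes 64 and 65 to carry the values VLAN (13) and IEEE-802 (6), but how the switch validates them is not stated here. `vlans_by_name` is a hypothetical map of the VLANs that exist on the device.

```python
from __future__ import annotations

TUNNEL_PRIVATE_GROUP_ID = 81


class AuthFailed(Exception):
    """Raised when the delivered VLAN is illegal, so authentication fails."""


def assigned_vlan(attrs: dict[int, str], vlans_by_name: dict[str, int]) -> int | None:
    """Resolve the delivered Auto Vlan, or None if the server delivered none."""
    group_id = attrs.get(TUNNEL_PRIVATE_GROUP_ID)
    if group_id is None:
        return None  # no VLAN delivered: the Config Vlan stays in effect
    if group_id.isdigit():                 # delivered as a VLAN ID
        vlan_id = int(group_id)
        if vlan_id in vlans_by_name.values():
            return vlan_id
    elif group_id in vlans_by_name:        # delivered as a VLAN name
        return vlans_by_name[group_id]
    raise AuthFailed(f"illegal VLAN {group_id!r}")  # wrong ID or VLAN does not exist
```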

Guest Vlan:

In the port-based access control mode, Guest Vlan takes effect only on ACCESS ports. In the MAC-based access control mode, Guest Vlan takes effect only on HYBRID ports. It does not take effect in other access control modes.

The Guest Vlan function permits unauthenticated users to access certain specified resources. Before the user passes 802.1X authentication, the authenticated port belongs to a default VLAN, the Guest Vlan. The user can access the resources in the Guest Vlan without authentication but cannot access other network resources. After the user passes authentication, the port leaves the Guest Vlan and the user can access other network resources.

Users in the Guest Vlan can obtain the 802.1X client software, upgrade the client, or run other upgrade programs (such as antivirus software updates and operating system patches).

After 802.1X is enabled and the Guest Vlan is configured, the port is added to the Guest Vlan in untagged mode. Users on the port then initiate authentication from the Guest Vlan. If authentication fails, the port stays in the Guest Vlan. If authentication succeeds, there are two cases:

1. If the authentication server delivers a VLAN, the port leaves the Guest Vlan and is added to the delivered VLAN. After the user logs out, the port returns to the Guest Vlan.

2. If the authentication server does not deliver a VLAN, the port leaves the Guest Vlan and is added to the Config Vlan. After the user logs out, the port returns to the Guest Vlan.
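The VLAN precedence described for Auto Vlan, Config Vlan, and Guest Vlan can be condensed into one decision function. This is a sketch of the rules in this section, assuming a port-based ACCESS port. When no Guest Vlan is configured, the port simply stays in its Config Vlan with the controlled port unauthorized. The function name is hypothetical.

```python
from __future__ import annotations


def effective_vlan(authenticated: bool, config_vlan: int,
                   auto_vlan: int | None = None, guest_vlan: int | None = None) -> int:
    """Pick the VLAN an 802.1X port should be in."""
    if authenticated:
        # A delivered Auto Vlan takes priority; the Config Vlan itself is not changed.
        return auto_vlan if auto_vlan is not None else config_vlan
    # Before authentication, after a failure, or after logout.
    return guest_vlan if guest_vlan is not None else config_vlan
```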

802.1X Extensions

User-based authentication:

The standard 802.1X protocol is port-based: as long as one user on a port passes authentication, the other users on the port can use network resources without authentication, but once that user logs out, the other users are also denied network access. Maipu switches also support user-based (MAC address-based) authentication. When a port is configured for user-based authentication, each user on the port must be authenticated separately, and only users that pass authentication can use network resources. When one user logs out, only that user loses network access; other authenticated users can still use the network.
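The difference between port-based and user-based authorization can be sketched with one class. `AuthPort` and its methods are hypothetical names that model only the authorization decision, not the EAP exchange.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AuthPort:
    mac_based: bool
    authorized_macs: set[str] = field(default_factory=set)

    def login(self, mac: str) -> None:
        self.authorized_macs.add(mac)

    def logout(self, mac: str) -> None:
        self.authorized_macs.discard(mac)

    def permits(self, mac: str) -> bool:
        if self.mac_based:
            return mac in self.authorized_macs  # every user is authenticated separately
        return bool(self.authorized_macs)       # one authenticated user opens the port
```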

EAP termination mode:

The standard 802.1X protocol defines that the client and the server interact through EAP packets, with the device acting as an "EAP relay": the device encapsulates the EAP data sent by the authentication server in EAPOL packets and sends it to the client. This interaction mode is called EAP relay, and it requires the authentication server to support EAP; otherwise, the server cannot interact with the client using EAP. In real deployments, a previously deployed authentication server may not support EAP, so Maipu switches extend the standard with an EAP termination mode. In this mode, the client's EAP data is not sent directly to the authentication server. Instead, the device completes the EAP exchange with the client, extracts the user's authentication information from the EAP data, and sends it to the authentication server for authentication. In EAP termination mode, only MD5-based EAP authentication is supported.

When EAP termination mode is used, the service interaction flow is as follows:

Figure 9-8 The service flow of the EAP termination mode of the 802.1X authentication system

Comparing Figure 9-8 with Figure 9-7 shows that in EAP termination mode, EAP packets are not sent to the authentication server but terminate at the device side. The device extracts enough information from the EAP packets and sends it to the authentication server for authentication.
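The manual does not say which RADIUS attributes carry the extracted information. One common approach, assumed here, is to map the EAP-MD5 exchange onto standard RADIUS CHAP attributes from RFC 2865 (User-Name 1, CHAP-Password 3, CHAP-Challenge 60). This works because the EAP-MD5 response uses the same MD5 computation as CHAP, so a server without EAP support can verify it. All function names are hypothetical.

```python
from __future__ import annotations

import hashlib

USER_NAME, CHAP_PASSWORD, CHAP_CHALLENGE = 1, 3, 60


def attr(attr_type: int, value: bytes) -> bytes:
    return bytes([attr_type, len(value) + 2]) + value


def terminate_eap_md5(user_name: bytes, eap_identifier: int,
                      challenge: bytes, md5_value: bytes) -> bytes:
    """Assumed mapping of an EAP-MD5 exchange to RADIUS CHAP attributes."""
    return (attr(USER_NAME, user_name)
            + attr(CHAP_PASSWORD, bytes([eap_identifier]) + md5_value)  # CHAP Ident + response
            + attr(CHAP_CHALLENGE, challenge))


def parse_attrs(data: bytes) -> dict[int, bytes]:
    out, i = {}, 0
    while i < len(data):
        t, length = data[i], data[i + 1]
        out[t] = data[i + 2:i + length]
        i += length
    return out


def chap_verify(attrs: dict[int, bytes], password: bytes) -> bool:
    """Server side: plain CHAP verification, no EAP support needed."""
    chap = attrs[CHAP_PASSWORD]
    ident, response = chap[0], chap[1:]
    return hashlib.md5(bytes([ident]) + password + attrs[CHAP_CHALLENGE]).digest() == response
```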

EAP over UDP mode:

In standard 802.1X, the client and the authentication device exchange information through EAPOL (EAP over LAN) packets. In real networks, the user to be authenticated and the authentication device may be separated by intermediate switches. If those switches do not forward EAPOL packets transparently, user authentication cannot proceed normally. In such environments, you can use EAPOU (EAP over UDP) mode so that the authentication packets (EAP packets) can traverse the intermediate switches. EAPOU encapsulates the original EAP packet in a UDP packet for forwarding. Compared with EAPOL, the packet header changes from an Ethernet header to an Ethernet header plus an IP header plus a UDP header, while the EAP contents remain the same. Because EAPOU packets are not restricted by intermediate switches, EAPOU mode enables 802.1X authentication across switches.

Non-client user authentication:

In real networks, besides the many PC users, there are network terminals (such as network printers) that do not have, or cannot be installed with, an 802.1X client program. Authentication for such users is called non-client user authentication, also known as MAC address authentication. This method does not require the user to install any client software. When the device detects a user's MAC address for the first time, it immediately starts authentication for that user, without requiring the user to enter a user name and password. After passing authentication, the user can access the network. This method suits terminals without client software, as well as PC users who do not want to install client software or enter a user name and password.

When performing MAC address authentication, you can select the user name type. There are usually two modes:

MAC address user name: The user's MAC address is used as both the user name and the password for authentication.

Fixed user name: Regardless of the user's MAC address, all users authenticate with the user name and password pre-configured locally on the device.
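The two user name modes can be sketched as one helper that returns the credentials sent for authentication. The MAC normalization (lower-case hex digits without separators) is an assumed format for illustration; the manual does not specify how the switch renders the MAC address as a user name.

```python
from __future__ import annotations


def mac_auth_credentials(mac: str, fixed: tuple[str, str] | None = None) -> tuple[str, str]:
    """Return (user name, password) for non-client MAC authentication."""
    if fixed is not None:
        return fixed  # fixed user name mode: the same pre-configured account for every MAC
    digits = "".join(c for c in mac.lower() if c in "0123456789abcdef")
    if len(digits) != 12:
        raise ValueError(f"not a MAC address: {mac!r}")
    return digits, digits  # MAC address mode: the MAC is both user name and password
```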

Dynamically delivered ACL:

In an 802.1X authentication environment that uses a RADIUS server, you can configure an ACL name on the RADIUS server. When a user passes authentication, the server delivers the ACL name to the authentication device, which binds the user to that ACL so that the user's subsequent traffic is controlled by the ACL. The ACL must be pre-configured on the device: when a user passes authentication, the device only looks up the ACL and binds it. If the lookup or binding fails, the user cannot go online.
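The lookup-and-bind step can be sketched as follows. `configured_acls` stands in for the ACLs pre-configured on the device, and the rule strings used in the usage check are placeholders. Which RADIUS attribute carries the ACL name is not specified here, so the sketch takes the name as a plain argument.

```python
from __future__ import annotations


class BindError(Exception):
    """Raised when the delivered ACL cannot be bound, so the user stays offline."""


def bind_acl(user_mac: str, acl_name: str,
             configured_acls: dict[str, list[str]],
             bindings: dict[str, list[str]]) -> list[str]:
    """Bind a delivered ACL name to an authenticated user."""
    if acl_name not in configured_acls:
        # The ACL must already exist on the device; it is only looked up, never created.
        raise BindError(f"ACL {acl_name!r} is not configured")
    bindings[user_mac] = configured_acls[acl_name]
    return bindings[user_mac]
```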



Typical Application

802.1X Client Authentication

The Supplicant is connected to the network via 802.1X authentication, and the authentication server is a RADIUS server. Port 0/0/1, connected to the Supplicant, is in Vlan 1. The authentication server is in Vlan 2. The Update Server, used to download and upgrade the client software, is in Vlan 10. Port 0/0/2 of the switch, connected to the Internet, is in Vlan 5.

Figure 9-9 Network topology (Supplicant, Switch, Radius Server, Update Server, and Internet; switch ports 0/1 to 0/4; Vlans 1, 2, 5, and 10)

Enable 802.1X authentication on Port 0/1, set the authentication mode to port-based, and set Vlan 10 as the Guest Vlan of the port.

Port 0/1 is then added to the Guest Vlan. Supplicant and Update Server are both in Vlan 10, so Supplicant can access Update Server and download the 802.1X client.




Figure 9-10 Port 0/1 in Guest Vlan 10 before authentication

After the user passes authentication and goes online, the authentication server delivers Vlan 5. Supplicant and Port 0/2 are now both in Vlan 5, so Supplicant can access the Internet.


Figure 9-11 Port 0/1 in the delivered Vlan 5 after authentication

Non-client MAC Address Authentication

As shown in the following figure, a user (Client) is connected to Port 0/1 of the device. The administrator wants to perform MAC address authentication for user access on this port in order to control Internet access. When the device detects the MAC address 0001.7a11.2233 of Client, it starts the corresponding authentication. If authentication passes, Client can access the Internet; otherwise, it cannot.

Figure 9-12

DHCP Snooping and Its Application

This section describes the principles and implementation of DHCP Snooping, as well as its application.

Main contents:

Related terms

Introduction

Typical application



Related Terms

Trust port: DHCP Snooping divides ports into trusted and untrusted ports and restricts how DHCP packets received on untrusted ports are processed, in order to enforce the security policy.

Option 82: A DHCP option that records the location information of a DHCP client. The administrator can locate the DHCP client from this option and apply security controls accordingly.

Dynamic binding table: A table obtained by snooping DHCP packet exchanges. It contains the bindings between IP addresses and MAC addresses, together with related information.

Introduction

DHCP Snooping is a DHCP security feature. It ensures that clients obtain IP addresses from legitimate servers, preventing spoofing attacks. It also records the mapping between the IP address and the MAC address of each DHCP client, for the administrator to view and for other security modules to use.

Record Corresponding Relat ion of IP Add ress and MAC Address Considering the security, the network administrator may need to record

the IP addresses used by the users for Internet and ensure the

corresponding relation of the IP address got by the user from the DHCP

server and the MAC address of the user supplicant.

DHCP Snooping records the MAC address of each DHCP client and the IP address it obtained by snooping the DHCP-REQUEST and DHCP-ACK broadcast packets received on trust ports. The administrator can use the show dhcp-snooping command to view the IP addresses obtained by DHCP clients.
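The binding logic can be sketched in Python as follows. This is a minimal illustration, not the device's implementation: the class and method names are hypothetical, and it assumes that the client's VLAN and access port are taken from its DHCP-REQUEST while the IP address is taken from a DHCP-ACK that must arrive on a trust port.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Binding:
    """One dynamic binding entry (field layout is illustrative)."""
    mac: str
    ip: str
    vlan: int
    port: str


class BindingTable:
    """Hypothetical DHCP Snooping dynamic binding table."""

    def __init__(self, trust_ports: Set[str]):
        self.trust_ports = set(trust_ports)
        self.pending: Dict[str, Tuple[int, str]] = {}  # MAC -> (VLAN, client port)
        self.entries: Dict[str, Binding] = {}          # MAC -> binding

    def on_request(self, mac: str, vlan: int, port: str) -> None:
        # Assumption: remember where the client sits when it sends DHCP-REQUEST.
        self.pending[mac] = (vlan, port)

    def on_ack(self, mac: str, ip: str, ingress_port: str) -> Optional[Binding]:
        # A DHCP-ACK is believed only if it arrives on a trust port.
        if ingress_port not in self.trust_ports or mac not in self.pending:
            return None
        vlan, client_port = self.pending.pop(mac)
        entry = Binding(mac, ip, vlan, client_port)
        self.entries[mac] = entry
        return entry

    def show(self) -> List[Binding]:
        # Roughly what an administrator would inspect, sorted by IP string.
        return sorted(self.entries.values(), key=lambda b: b.ip)
```

A DHCP-ACK that arrives on any other port is ignored, so a rogue server cannot create bindings.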

Ensure that Client Gets IP Address from Legal Server

If an unauthorized, privately deployed DHCP server exists in the network, users may obtain wrong IP addresses. To ensure that users obtain IP addresses from the legitimate DHCP server, DHCP Snooping allows each port to be configured as a trust port or an un-trust port.



A trust port is a port connected directly or indirectly to the legitimate DHCP server. A trust port forwards received DHCP packets normally, so that DHCP clients obtain correct IP addresses.

An un-trust port is a port that is not connected to the legitimate DHCP server. DHCP-OFFER and DHCP-ACK packets (server responses) received on an un-trust port are discarded, which prevents DHCP clients from obtaining wrong IP addresses.
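The trust-port rule reduces to a simple per-packet check. In this sketch the function and set names are hypothetical; the message-type numbers are the standard DHCP option 53 values from RFC 2131, and only DHCP-OFFER and DHCP-ACK are blocked on un-trust ports, as the text describes.

```python
# DHCP message-type values (option 53) from RFC 2131.
DHCPDISCOVER, DHCPOFFER, DHCPREQUEST, DHCPACK = 1, 2, 3, 5

# Server replies that an un-trust port must not deliver to clients.
BLOCKED_ON_UNTRUST = {DHCPOFFER, DHCPACK}


def admit_dhcp(msg_type: int, ingress_port: str, trust_ports: set) -> bool:
    """Return True to forward the DHCP packet, False to discard it."""
    if ingress_port in trust_ports:
        return True                          # trust port: forward normally
    return msg_type not in BLOCKED_ON_UNTRUST
```

Client-originated messages such as DHCP-DISCOVER and DHCP-REQUEST still pass on un-trust ports, which is what lets clients on access ports reach the legitimate server.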

Support Option 82

Option 82 is a DHCP option that records the location of the DHCP client. Using this option, the administrator can locate the DHCP client and apply security controls, such as limiting the number of IP addresses assigned to a port or VLAN.

Option 82 can contain at most 255 sub-options. The SM4100 series switch supports only two of them: sub-option 1 (Circuit ID) and sub-option 2 (Remote ID).
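On the wire, Option 82 and each of its sub-options are code/length/value fields, as defined in RFC 3046. The sketch below encodes and decodes the two supported sub-options; the function names and the sample Circuit ID and Remote ID values are hypothetical, because the actual contents of the default format are given only in Figure 9-2-1.

```python
import struct

OPTION_82 = 82
CIRCUIT_ID = 1   # sub-option 1
REMOTE_ID = 2    # sub-option 2


def encode_option82(circuit_id: bytes, remote_id: bytes) -> bytes:
    """Encode Option 82 carrying sub-options 1 and 2 as code/length/value."""
    body = b""
    for code, value in ((CIRCUIT_ID, circuit_id), (REMOTE_ID, remote_id)):
        if len(value) > 255:
            raise ValueError("sub-option value longer than 255 bytes")
        body += struct.pack("!BB", code, len(value)) + value
    if len(body) > 255:
        raise ValueError("Option 82 body longer than 255 bytes")
    return struct.pack("!BB", OPTION_82, len(body)) + body


def decode_option82(option: bytes) -> dict:
    """Parse an encoded Option 82 into {sub_option_code: value}."""
    if len(option) < 2 or option[0] != OPTION_82 or option[1] != len(option) - 2:
        raise ValueError("not a well-formed Option 82")
    subs, i = {}, 2
    while i < len(option):
        sub_code, sub_len = option[i], option[i + 1]
        if i + 2 + sub_len > len(option):
            raise ValueError("truncated sub-option")
        subs[sub_code] = option[i + 2:i + 2 + sub_len]
        i += 2 + sub_len
    return subs
```

For example, `encode_option82(b"vlan10-0/1", b"sw-a")` yields a 20-byte option whose first two bytes are 82 and 18 (the body length).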

The SM4100 series switch supports two fill formats: the default format and the user-configured format.

The contents of the two sub-options in the default format are as follows:



Figure 9-2-1 option82 default format

The contents of the two sub-options in the user-configured format are as follows:

Figure 9-2-2 Sub option 1 of option82 user-configured format

Figure 9-2-3 Sub option 2 of option82 user-configured format

DHCP Snooping handles Option 82 as follows:

1. After receiving a DHCP request packet, the device processes it as shown in the following table, according to whether the packet contains Option 82, the processing policy configured by the user, and the fill format, and then forwards the processed packet to the DHCP server (unless the packet is discarded).



Received DHCP request packet | Processing policy | Filling format | Processing by DHCP Snooping
Carries Option 82 | Drop | (any) | Discard the packet.
Carries Option 82 | Keep | (any) | Keep the original Option 82 in the packet and forward it.
Carries Option 82 | Replace | Default | Fill in Option 82 in the default format, replace the original Option 82 in the packet, and forward it.
Carries Option 82 | Replace | User-configured | Fill in Option 82 in the user-configured format, replace the original Option 82 in the packet, and forward it.
Does not carry Option 82 | (any) | Default | Fill in Option 82 in the default format and forward the packet.
Does not carry Option 82 | (any) | User-configured | Fill in Option 82 in the user-configured format and forward the packet.

Figure 9-2-4 DHCP Snooping processing of packets

2. When the device receives a response packet from the DHCP server, it deletes Option 82 if the packet contains it and forwards the packet to the DHCP client; if the packet does not contain Option 82, the device forwards it to the client directly.
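The table and step 2 can be read as a small decision function. This is a sketch only: the policy and format strings, the function names, and the representation of DHCP options as a `{code: value}` dictionary are assumptions for illustration, not the switch's configuration syntax.

```python
from typing import Dict, Optional, Tuple


def handle_request(opt82: Optional[bytes], policy: str, fill_format: str,
                   default_fill: bytes, user_fill: bytes) -> Tuple[str, Optional[bytes]]:
    """Step 1: return ("drop", None) or ("forward", option_82_to_send)."""
    fill = default_fill if fill_format == "default" else user_fill
    if opt82 is None:
        return ("forward", fill)          # no Option 82: insert one
    if policy == "drop":
        return ("drop", None)             # discard the packet
    if policy == "keep":
        return ("forward", opt82)         # keep the original Option 82
    if policy == "replace":
        return ("forward", fill)          # overwrite the original Option 82
    raise ValueError(f"unknown policy {policy!r}")


def handle_reply(options: Dict[int, bytes]) -> Dict[int, bytes]:
    """Step 2: strip Option 82 (if present) before forwarding to the client."""
    return {code: value for code, value in options.items() if code != 82}
```

Note that the processing policy only matters when the request already carries Option 82; otherwise the fill format alone decides what is inserted.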

Packet Rate Limitation

After DHCP Snooping is enabled on the device, all DHCP packets are sent to the CPU. If an attacker uses a tool to forge large numbers of DHCP packets and launch a DHCP flooding attack, the device may run under high load or even crash. To prevent this, you can set a threshold for the number of DHCP packets a port may receive per second. The device counts the DHCP packets received on the port each second; packets in excess of the threshold are dropped by the CPU. If the threshold is exceeded for 20 consecutive seconds, the port is shut down. Whether the port recovers automatically depends on the port management configuration; you can also recover it manually.
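The per-second counting and the 20-second shutdown rule can be sketched as below. The class name is hypothetical, the one-second timer is assumed to call `end_of_second()`, and automatic recovery is left out because it depends on the port management configuration.

```python
class PacketRateGuard:
    """Per-port, per-second packet limiter with shutdown on sustained excess."""

    def __init__(self, threshold: int, shutdown_after: int = 20):
        self.threshold = threshold            # packets allowed per second
        self.shutdown_after = shutdown_after  # consecutive seconds over threshold
        self.count = 0                        # packets seen in the current second
        self.over_streak = 0                  # consecutive seconds over threshold
        self.shut_down = False

    def on_packet(self) -> bool:
        """Return True if the packet may go to the CPU, False if dropped."""
        if self.shut_down:
            return False
        self.count += 1
        return self.count <= self.threshold

    def end_of_second(self) -> None:
        """Close the one-second window (assumed to be driven by a timer)."""
        if self.count > self.threshold:
            self.over_streak += 1
        else:
            self.over_streak = 0
        if self.over_streak >= self.shutdown_after:
            self.shut_down = True
        self.count = 0
```

A single quiet second resets the streak, so only a sustained flood shuts the port down. The same logic applies to the ARP packet rate limitation described for Dynamic ARP Inspection.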

Typical Application

The following figure shows a typical application of DHCP Snooping on Switch A. The port connected to the client network is set as an un-trust port, and the port connected to the relay or server is set as a trust port. This ensures that clients obtain IP addresses only through the trust port, that is, from the legitimate server.

Figure 9-2-5 DHCP networking

IP Source Guard and Its Application

This section describes the principles of IP Source Guard and how it is implemented.

Main contents:



Related terms

Introduction

Typical application

Related Terms

IP Source Guard: a feature that filters IP packets by source IP address or by source IP+MAC address.

Introduction

With the IP Source Guard binding function, you can filter the packets forwarded by a port to prevent packets with invalid IP and MAC addresses from passing through the port, which improves port security.

After receiving a packet, the port looks up the IP Source Guard binding entries and processes the packet according to the filter mode configured on the port:

When the filter mode of the port is IP: if the source IP address of the packet matches an IP address recorded in the binding entries, the port forwards the packet; otherwise, the packet is dropped.

When the filter mode of the port is IP+MAC: if both the source MAC address and the source IP address of the packet match the MAC address and IP address recorded in a binding entry, the port forwards the packet; otherwise, the packet is dropped.

IP Source Guard binding entries come from two sources: static binding entries configured manually for IP Source Guard, and dynamic entries maintained by DHCP Snooping.
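The two filter modes can be sketched as a lookup over both entry sources. The names here are hypothetical, and the sketch assumes that entries are matched per port (only the entries bound to the receiving port count), following the per-port entries mentioned in the key points below.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class SourceBinding:
    ip: str
    mac: str
    port: str


def ipsg_permit(src_ip: str, src_mac: str, port: str, mode: str,
                static_entries: Iterable[SourceBinding],
                snooping_entries: Iterable[SourceBinding]) -> bool:
    """Return True if IP Source Guard forwards the packet received on `port`."""
    if mode not in ("ip", "ip+mac"):
        raise ValueError(f"unknown filter mode {mode!r}")
    for table in (static_entries, snooping_entries):
        for entry in table:
            if entry.port != port or entry.ip != src_ip:
                continue
            # IP mode ignores the MAC; IP+MAC mode requires it to match too.
            if mode == "ip" or entry.mac == src_mac:
                return True
    return False
```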

Key Points for Realization

1. When IP Source Guard is enabled, the device traverses the IP Source Guard static binding table and the DHCP Snooping dynamic binding table, picks the entries for the corresponding port, and writes them into hardware.

2. When IP Source Guard is disabled, the device traverses the IP Source Guard static binding table and the DHCP Snooping dynamic binding table and deletes the corresponding port entries from hardware.

3. When an IP Source Guard static entry is added, the hardware entries are updated automatically; when it is deleted, the corresponding hardware entry is deleted. If writing the hardware entry fails, the Writed-Flag of the entry in the static table is set to non-write.

4. When a DHCP Snooping dynamic entry is added, the hardware entries are updated automatically; when it is deleted, the corresponding hardware entry is deleted. If writing the hardware entry fails, the Writed-Flag of the entry is set to non-write.

5. The software tables (IP Source Guard static entries and DHCP Snooping dynamic entries) are synchronized with the hardware table every minute. Because ACL resources are limited, some software entries may not be written into hardware, so the device checks regularly whether resources have become available, for example because some entries were deleted or the ACL resources were enlarged. If so, it writes the remaining valid software entries into hardware. By default, the ACL resources are two slices, that is, 256 entries. Each port on which the function is enabled occupies two entries, and the rest are used for filter entries.

6. When IP Source Guard is enabled on a port, the configured binding table is written into the switch chip hardware to filter IP packets. The number of entries that can be written depends on the hardware resources the switch chip allocates to IP Source Guard. If those resources are used up and you need to add binding entries or enable IP Source Guard on another port, you must either increase the hardware resources allocated to IP Source Guard or delete some binding entries; the resources can be allocated again after the device restarts. Deleting entries alone after the resources are used up does not allow you to enable IP Source Guard on another port. Enabling the function on a port requires pre-allocated resources, and when hardware resources run short, binding entries occupy the pre-allocated resources to maximize resource utilization. Likewise, after IP Source Guard is disabled on a port, the pre-allocated resources of that port are released, but they may not be usable for writing binding table entries.
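The write-with-fallback behaviour of points 3 to 5 can be modelled with a fixed-size hardware table and a per-entry written flag that a periodic sync retries. This is a hypothetical sketch: the class names are invented, entry keys are plain strings standing in for binding entries, and the capacity helper simply applies the "256 entries, two per enabled port" arithmetic from point 5.

```python
from typing import Dict, Hashable, Set


def filter_capacity(total: int = 256, enabled_ports: int = 0, per_port: int = 2) -> int:
    """Entries left for filtering after each enabled port reserves its share."""
    return total - per_port * enabled_ports


class HardwareTable:
    """Hypothetical ACL-backed hardware table with a fixed entry budget."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.rows: Set[Hashable] = set()

    def write(self, key: Hashable) -> bool:
        if len(self.rows) >= self.capacity:
            return False                  # ACL resources exhausted
        self.rows.add(key)
        return True

    def delete(self, key: Hashable) -> None:
        self.rows.discard(key)


class SoftwareTable:
    """Software entries, each with a written flag (cf. Writed-Flag)."""

    def __init__(self, hw: HardwareTable):
        self.hw = hw
        self.written: Dict[Hashable, bool] = {}

    def add(self, key: Hashable) -> None:
        self.written[key] = self.hw.write(key)   # False means non-write

    def remove(self, key: Hashable) -> None:
        if self.written.pop(key, False):
            self.hw.delete(key)

    def sync(self) -> int:
        """Periodic retry of non-write entries; returns how many were written."""
        retried = 0
        for key, ok in self.written.items():
            if not ok and self.hw.write(key):
                self.written[key] = True
                retried += 1
        return retried
```

With 3 enabled ports, for example, `filter_capacity(256, 3)` leaves 250 entries for bindings.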



Typical Application

Application in non-DHCP Snooping environment

Figure 9-3-1 IP Source Guard configuration instance 1

The switch is deployed in a LAN and connected to the Internet. IP Source Guard is configured on the switch ports connected to the LAN, and the IP and MAC addresses of LAN users are bound in the static binding table. Only bound addresses can reach the Internet through the switch; IP packets sent from unbound addresses are regarded as illegal and are filtered.

Dynamic ARP Detection and Application

This section describes the principles of Dynamic ARP Inspection and how it is implemented.

Main contents:

Related terms

Introduction

Typical application



Related Terms

Dynamic ARP Inspection: a security measure that detects and prevents ARP spoofing attacks by checking the validity of ARP packets.

Introduction

The Dynamic ARP Inspection function can be used to detect and prevent ARP spoofing attacks.

Dynamic ARP Inspection redirects all ARP packets (broadcast and unicast) received on ports where it is enabled to the CPU for checking, comparison, software forwarding, and logging, so a large volume of ARP packets consumes CPU resources. Therefore, enabling the function in normal operation is not recommended. When an ARP spoofing attack is suspected in the network, you can enable it to confirm and locate the attack.

The device does not check ARP packets received on ports where Dynamic ARP Inspection is not enabled; it forwards them directly. Such ports are usually the upstream ports of the device. ARP packets received on ports where Dynamic ARP Inspection is enabled are checked against the DHCP Snooping table or the IP static binding table configured manually for IP Source Guard.

When global arp-security is enabled, it controls whether the device processes ARP packets whose IP/MAC addresses are specified in the global IP/MAC ACL. If the source IP of an ARP packet sent to the device matches an IP specified in the global IP/MAC ACL but the source MAC does not match, the ARP packet is dropped so that the device does not create a wrong ARP entry. For such an IP, the device creates the ARP entry only when both the source IP and the source MAC match the global IP/MAC ACL. If the source IP does not match any IP specified in the global IP/MAC ACL, the ARP entry can still be created.
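The global arp-security rule amounts to "a listed IP must come with its listed MAC; unlisted IPs are unaffected". In this sketch the ACL is modelled as a hypothetical dictionary from IP address to the only MAC allowed to claim it; the function name is also hypothetical.

```python
from typing import Dict


def arp_security_permit(src_ip: str, src_mac: str, acl: Dict[str, str]) -> bool:
    """Return True if an ARP entry may be created for (src_ip, src_mac)."""
    expected_mac = acl.get(src_ip)
    if expected_mac is None:
        return True                                   # IP not in the ACL
    return expected_mac.lower() == src_mac.lower()   # listed IP: MAC must match
```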

ARP Detection Policy

1. If the binding of the source IP address and source MAC address in the ARP packet matches a DHCP Snooping entry or an IP static binding entry configured manually for IP Source Guard, and the ingress port and VLAN of the ARP packet are consistent with that entry, the ARP packet is valid and is forwarded.

2. If the binding of the source IP address and source MAC address in the ARP packet does not match any DHCP Snooping entry or IP static binding entry configured manually for IP Source Guard, or the ingress port or VLAN of the ARP packet is inconsistent with the matching entry, the ARP packet is invalid and is dropped, and log information is printed.

3. Matching order: the IP Source Guard static binding table is matched first, and then the DHCP Snooping dynamic binding table.
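A valid ARP packet needs an entry that agrees on all four fields (IP, MAC, port, VLAN), searched in the stated order. The sketch below is hypothetical in its names and data layout; it returns which table validated the packet, which could be useful for logging, or `None` for an invalid packet.

```python
from dataclasses import dataclass
from typing import Collection, Optional


@dataclass(frozen=True)
class ArpBinding:
    ip: str
    mac: str
    port: str
    vlan: int


def check_arp(src_ip: str, src_mac: str, port: str, vlan: int,
              static_table: Collection[ArpBinding],
              snooping_table: Collection[ArpBinding]) -> Optional[str]:
    """Return the name of the table that validated the packet, or None."""
    key = ArpBinding(src_ip, src_mac, port, vlan)
    # Static IP Source Guard bindings first, then DHCP Snooping bindings.
    for name, table in (("static", static_table), ("dhcp-snooping", snooping_table)):
        if key in table:
            return name
    return None
```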

Packet Forwarding Policy

After receiving an ARP packet, the device first checks whether Dynamic ARP Inspection is enabled on the ingress port. If not, the ARP packet is passed to the protocol stack for processing without software forwarding. If so, the device checks the validity of the packet as described above. An invalid packet is dropped directly and logged; a valid packet is processed according to its destination address:

1. If the destination MAC address of the ARP packet is that of the local device, the packet is passed to the ARP protocol stack for processing, and the ARP cache of the local device is updated.

2. If the destination MAC address of the ARP packet is the broadcast address, the packet is copied. The original packet is passed to the ARP protocol stack for processing and the ARP cache of the local device is updated; the copy is forwarded out of all ports in the same VLAN.

3. If the destination MAC address of the ARP packet is any other unicast address, the device first looks up the hardware MAC table for the egress port. If the port is found, the packet is forwarded out of that port; otherwise, it is forwarded out of all ports in the same VLAN.
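The three forwarding cases for a valid ARP packet can be sketched as one function. The names and MAC notation are illustrative; excluding the ingress port when flooding is an assumption (standard L2 behaviour) that the text above does not state explicitly.

```python
from typing import Dict, List, Tuple

BROADCAST = "ffff.ffff.ffff"


def forward_valid_arp(dst_mac: str, local_mac: str, vlan_ports: List[str],
                      mac_table: Dict[str, str],
                      ingress_port: str) -> Tuple[bool, List[str]]:
    """Return (deliver_to_local_arp_stack, egress_ports) for a valid ARP packet."""
    # Assumption: flooding never sends the packet back out of its ingress port.
    flood = [p for p in vlan_ports if p != ingress_port]
    if dst_mac == local_mac:
        return True, []            # case 1: for this device only
    if dst_mac == BROADCAST:
        return True, flood         # case 2: local copy plus VLAN-wide copy
    port = mac_table.get(dst_mac)
    if port is not None:
        return False, [port]       # case 3: known unicast destination
    return False, flood            # case 3: unknown unicast, flood the VLAN
```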



Figure 9-4-1 Processing flow for valid ARP packet

Packet Rate Limitation

After Dynamic ARP Inspection is enabled on the device, all ARP packets are trapped to the CPU. If an attacker uses a tool to forge large numbers of ARP packets and launch an ARP flooding attack, the device may run under high load or even crash. To prevent this, you can set a threshold for the number of ARP packets a port may receive per second. The device counts the ARP packets received on the port each second; ARP packets in excess of the threshold are dropped by the CPU. If the threshold is exceeded for 20 consecutive seconds, the port is shut down. Whether the port recovers automatically depends on the port management configuration; you can also recover it manually.

Log Recording

Each invalid ARP packet is logged before it is dropped. Each invalid ARP log entry includes the following:

1. Receiving VLAN

2. Receiving port

3. The IP address of the sender and the destination IP address

4. The MAC address of the sender and the destination MAC address



5. The number of the dropped packets

Log information is output periodically rather than in real time. Based on the output, the user can take further action, such as locating the host that launched the ARP attack.
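Periodic rather than real-time output suggests aggregating drops per flow and emitting one line with a count at each interval. This is a hypothetical sketch: the class names and the log line format are invented, and only the fields listed above are recorded.

```python
from collections import Counter
from typing import List, NamedTuple


class ArpLogKey(NamedTuple):
    vlan: int
    port: str
    sender_ip: str
    target_ip: str
    sender_mac: str
    target_mac: str


class InvalidArpLog:
    """Aggregates dropped ARP packets and emits them once per period."""

    def __init__(self) -> None:
        self.drops: Counter = Counter()

    def record(self, key: ArpLogKey) -> None:
        self.drops[key] += 1          # one more packet dropped for this flow

    def flush(self) -> List[str]:
        """Called by a periodic timer; returns log lines and resets counters."""
        lines = [
            f"vlan {k.vlan} port {k.port} sender {k.sender_ip}/{k.sender_mac} "
            f"target {k.target_ip}/{k.target_mac} dropped {n}"
            for k, n in sorted(self.drops.items())
        ]
        self.drops.clear()
        return lines
```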

Typical Application

Figure 9-4-2 Application instance of Dynamic ARP Inspection

The figure above shows an application in a DHCP environment. Dynamic ARP Inspection uses the dynamic binding table generated by DHCP Snooping to filter ARP packets: it forwards valid packets, and drops and logs invalid ones. In a non-DHCP environment, that is, when DHCP Snooping is not enabled on Switch A, you need to configure the IP Source Guard static binding table; otherwise, the ARP packets on all ports where Dynamic ARP Inspection is enabled are filtered.

Port Security

This section describes the basic principles of port security and its application.

Main contents:



Introduction

Typical application

Introduction

Port security is applied at the access layer. It restricts which hosts can access the network through the device: specified hosts are permitted, and other hosts are denied.

Port security flexibly binds a user's MAC address, IP address, VLAN ID, and port to keep invalid users off the network, which protects network data and ensures that valid users get sufficient bandwidth.

The user can restrict the hosts that can access the network with three kinds of rules: MAC rules, IP rules, and the MAX rule. MAC rules support three binding modes: MAC binding, MAC+IP binding, and MAC+VID binding. An IP rule can apply to a single IP address or a range of IP addresses. The MAX rule limits the maximum number of MAC addresses the port can learn, in the order they are learned. This maximum does not include the valid MAC addresses generated by MAC rules and IP rules.

MAC rules and IP rules can specify whether packets that match them are permitted to communicate. With MAC rules, you can flexibly bind a MAC address to a VLAN or to an IP address. Port security is implemented in software, so the number of rules is not limited by hardware resources, which makes configuration more flexible.

Port security rules are triggered by ARP packets from terminal devices. When the device receives an ARP packet, port security extracts the relevant packet information and matches it against the three kinds of configured rules: first the MAC rules, then the IP rules, and finally the MAX rule. The L2 forwarding table of the port is controlled according to the matching result, which determines whether the port forwards the packets.
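The MAC, then IP, then MAX matching order can be sketched as below. All class names are hypothetical, and two behaviours are assumptions: a packet matches a MAC rule only if every bound field (IP or VID, when set) agrees, otherwise it falls through to the IP rules; and MAC addresses admitted by MAC or IP rules do not count toward the MAX limit, as the text states.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MacRule:
    mac: str
    permit: bool
    ip: Optional[str] = None    # set for MAC+IP binding
    vid: Optional[int] = None   # set for MAC+VID binding


@dataclass
class IpRule:
    first: str                  # a single IP uses first == last
    last: str
    permit: bool


class PortSecurity:
    def __init__(self, mac_rules: List[MacRule], ip_rules: List[IpRule], max_learn: int):
        self.mac_rules = mac_rules
        self.ip_rules = ip_rules
        self.max_learn = max_learn
        self.learned: List[str] = []   # MACs admitted by the MAX rule, in order

    def check(self, mac: str, ip: str, vid: int) -> bool:
        # 1. MAC rules
        for r in self.mac_rules:
            if r.mac == mac and r.ip in (None, ip) and r.vid in (None, vid):
                return r.permit
        # 2. IP rules
        addr = ipaddress.ip_address(ip)
        for r in self.ip_rules:
            if ipaddress.ip_address(r.first) <= addr <= ipaddress.ip_address(r.last):
                return r.permit
        # 3. MAX rule: first-come, first-served learning
        if mac in self.learned:
            return True
        if len(self.learned) < self.max_learn:
            self.learned.append(mac)
            return True
        return False
```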

When port security regards a packet as illegal, it handles the packet according to the configured processing mode. Currently, three modes are supported: protect, restrict, and shutdown. The protect mode drops the packets. The restrict mode drops the packets and sends a trap alarm (an alarm is raised within two minutes of receiving an illegal packet). The shutdown mode performs the actions of the restrict mode and also shuts down the port.
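Because each mode adds to the previous one, the actions form a simple cumulative mapping. The enum and action names in this sketch are hypothetical, and the two-minute alarm timing is not modelled.

```python
from enum import Enum
from typing import List


class ViolationMode(Enum):
    PROTECT = "protect"
    RESTRICT = "restrict"
    SHUTDOWN = "shutdown"


def violation_actions(mode: ViolationMode) -> List[str]:
    """Actions taken on an illegal packet under each processing mode."""
    actions = ["drop"]                                   # every mode drops
    if mode in (ViolationMode.RESTRICT, ViolationMode.SHUTDOWN):
        actions.append("trap")                           # restrict adds an alarm
    if mode is ViolationMode.SHUTDOWN:
        actions.append("shutdown-port")                  # shutdown also closes the port
    return actions
```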

Typical Application

Refer to the related chapter of the configuration manual.

Port Monitoring

This section describes the basic principles of port monitoring and its application.

Main contents:

Introduction

Typical application

Introduction

The port monitoring function monitors packets sent to the switch CPU, filters excessive packets at the bottom layer, and protects the switch from attacks that use large numbers of invalid packets.

Monitoring includes port monitoring and host monitoring. When the switch is under attack, the user first enables port monitoring, and the monitoring program counts the packets sent to the CPU per port. From these statistics, the user identifies the attacked port, enables host monitoring on that port, and sets the upper threshold of packets to the CPU per sampling period. Packets from the attacking host that exceed the threshold within a sampling period are filtered at the bottom layer: they are not passed to the IP layer for routing and are not written into the hardware route table, which saves CPU and hardware table resources. While the attacking host is being filtered, other hosts can still communicate normally. The monitoring program adds each host whose packets to the CPU exceed the upper threshold in a sampling period to the blacklist. In the next sampling period, at most half of the upper threshold of packets from a blacklisted host can reach the CPU, and its other packets to the CPU are dropped. The port monitoring program counts and drops packets per packet class.

The port monitoring program calculates the sampling result at the end of each sampling period and updates the blacklist.
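Host monitoring with a half-threshold blacklist can be sketched as follows. The class name is hypothetical; the sketch tracks a single packet class and assumes the blacklist is recomputed from scratch at the end of every sampling period, so a host that stops exceeding the threshold leaves the blacklist.

```python
from collections import Counter
from typing import Set


class HostMonitor:
    """Per-port host monitoring over fixed sampling periods (one packet class)."""

    def __init__(self, limit: int):
        self.limit = limit                  # upper threshold per host per period
        self.blacklist: Set[str] = set()
        self.counts: Counter = Counter()

    def admit(self, host: str) -> bool:
        """Return True if this packet from `host` may go to the CPU."""
        self.counts[host] += 1
        cap = self.limit // 2 if host in self.blacklist else self.limit
        return self.counts[host] <= cap

    def end_of_period(self) -> None:
        # Hosts that offered more than `limit` packets are blacklisted
        # for the next period; counters start again from zero.
        self.blacklist = {h for h, n in self.counts.items() if n > self.limit}
        self.counts.clear()
```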

The port monitoring divides packets into six types:

1. broadcast-packet: the destination MAC address is all 1s (ffff.ffff.ffff);

2. multicast-packet: the least significant bit of the first (highest-order) byte of the destination MAC address is 1;

3. admin-packet: the destination IP address is an IP address of a switch VLAN interface;

4. forward-packet: the destination IP address is not an IP address of a switch VLAN interface, so the packet must be routed and then forwarded;

5. other-packet: any packet not in the previous four types;

6. total-packet: all of the above packets together.
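The classification can be sketched as an ordered series of tests, checked in the list order so that broadcast frames (which also have the multicast bit set) are counted as broadcast. The function name is hypothetical, and treating every non-IP unicast packet as other-packet is an assumption.

```python
import ipaddress
from typing import Optional, Set


def classify(dst_mac: bytes, dst_ip: Optional[str],
             switch_ips: Set[ipaddress.IPv4Address]) -> str:
    """Return the monitoring class of a packet sent to the CPU.

    dst_mac: 6-byte destination MAC; dst_ip: destination IPv4 address as a
    string, or None for non-IP packets; switch_ips: VLAN interface addresses.
    """
    if dst_mac == b"\xff" * 6:
        return "broadcast-packet"
    if dst_mac[0] & 0x01:                 # group bit of the first byte
        return "multicast-packet"
    if dst_ip is None:
        return "other-packet"
    if ipaddress.ip_address(dst_ip) in switch_ips:
        return "admin-packet"
    return "forward-packet"
```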

Typical Application

Refer to the related chapters of the configuration manual.

Port Isolation

This section describes the basic principles of port isolation and its application.

Main contents:

Related terms

Introduction

Typical application



Related Terms

Port isolation: a port security function that prevents packets from being forwarded from one port to specified other ports of the switch.

Introduction

Port isolation is a port-based security feature. For any port, the user can specify a set of isolated ports to achieve L2 and L3 data isolation between that port and the isolated ports, which improves network security and gives the user a flexible networking scheme.

By default, packets can be forwarded between any two ports in the same VLAN of the switch. To prevent specified ports in a VLAN from communicating, configure the isolated ports of a port in port configuration mode; the port configured with port isolation then cannot forward packets to the specified isolated ports.

Port isolation is independent of the VLAN of the port. Currently, the switch supports configuring isolated ports in common-port and aggregation-port configuration mode, and an isolated port can be a common port or an aggregation port. Port isolation drops packets in one direction only. Suppose ports B, C, and D are set as the isolated ports of port A. A packet entering port A whose destination port is B, C, or D is dropped directly. However, a packet entering port B, C, or D whose destination port is A is forwarded normally.
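The one-way behaviour follows from consulting only the ingress port's own isolation list. The class and method names in this sketch are hypothetical.

```python
from typing import Dict, Set


class PortIsolation:
    """Uni-directional isolation: isolated[p] holds ports that p may not send to."""

    def __init__(self) -> None:
        self.isolated: Dict[str, Set[str]] = {}

    def isolate(self, port: str, *targets: str) -> None:
        self.isolated.setdefault(port, set()).update(targets)

    def can_forward(self, ingress: str, egress: str) -> bool:
        # Only the ingress port's list is checked, which is what makes
        # the isolation one-directional.
        return egress not in self.isolated.get(ingress, set())
```

With `isolate("A", "B", "C", "D")`, `can_forward("A", "B")` is False while `can_forward("B", "A")` stays True. Calling `isolate()` on each of ports 0/1, 0/2, and 0/3 with the other two as targets reproduces the typical application of port isolation, so full mutual isolation among a group of ports requires configuring every member.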



Typical Application

Figure 9-6-1 Typical application of port isolation

Illustration

Three ports of Switch A are connected to three terminal devices: port 0/1, port 0/2, and port 0/3 are connected to PC1, PC2, and PC3 respectively. Port 0/27 is connected to the public network. Ports 0/1, 0/2, 0/3, and 0/27 belong to the same VLAN.

PC1, PC2, and PC3 must not communicate with each other, but must communicate normally with the public network. Normally, ports in the same VLAN can communicate with each other, so port isolation is used to meet this requirement: isolate ports 0/2 and 0/3 on port 0/1, ports 0/1 and 0/3 on port 0/2, and ports 0/1 and 0/2 on port 0/3. After this configuration, ports 0/1, 0/2, and 0/3 cannot communicate with each other, but each of them can communicate with port 0/27.



SPAN Technology

This chapter describes SPAN (port mirroring) technology and its application.

Main contents:

SPAN technology

Typical application

SPAN Technology

Switched Port Analyzer (SPAN) is used to monitor the traffic on switch ports. With SPAN, you can copy the frames on a monitoring port (source port) to a destination port on the switch that is connected to a network analysis device. The user analyzes the packets received on the destination port with the network analysis device for network monitoring and troubleshooting. SPAN does not affect normal packet switching on the switch: all frames received and sent on the source port are simply copied to the destination port. However, if the destination port is oversubscribed, for example when a 100 Mbps destination port monitors a 1000 Mbps port, frames may be dropped.

Related Terms of SPAN Technology

SPAN Session

A SPAN session is the data flow between a group of monitoring ports and one destination port. Data from multiple monitoring ports can be mirrored to the destination port, and the mirrored flow can be inbound, outbound, or both. You can configure SPAN on a port that is shut down; the SPAN session is then inactive, and it becomes active as soon as the port is enabled. Each line card supports four rx SPAN sessions and one tx SPAN session.



Local SPAN

Local SPAN mirrors ports within one switch: all monitoring ports and the destination port are on the same switch. Local SPAN mirrors the data of one or more monitoring ports to the destination port.

Remote SPAN

RSPAN allows the monitoring ports and the destination port to be on different switches, enabling remote monitoring across the network. Each RSPAN session carries its monitored traffic in a specified RSPAN VLAN. RSPAN consists of an RSPAN Source Session, an RSPAN VLAN, and an RSPAN Destination Session; the source and destination sessions must be configured on different switches. When configuring the RSPAN Source Session, specify one or more monitoring ports and one RSPAN VLAN; the monitored data is sent to the RSPAN VLAN. On another switch, configure the RSPAN Destination Session by specifying the destination port and the RSPAN VLAN. The RSPAN Destination Session sends the RSPAN VLAN data to the destination port.

Switches involved in remote port mirroring fall into three kinds:

1. Source switch: the switch where the monitored port resides. It transmits the mirrored traffic to the intermediate switch or destination switch through the RSPAN VLAN.

2. Intermediate switch: a switch between the source switch and the destination switch, which transmits the mirrored traffic to the next intermediate switch or to the destination switch. If the source switch is connected directly to the destination switch, there is no intermediate switch.

3. Destination switch: the switch where the remote mirroring destination port resides. It transmits the mirrored traffic received from the RSPAN VLAN to the monitoring device through the mirroring destination port.

Traffic Types

There are three types of monitored traffic:

1. Receive (Rx): The traffic received by the monitoring port;

2. Transmit (Tx): The traffic sent by the monitoring port;

3. Both: The received and sent traffic of the monitoring port.



Monitoring port (source port)

The traffic of the monitoring port (source port) is mirrored for network analysis. The monitored traffic can be inbound, outbound, or bidirectional, and source ports can be in different VLANs.

The monitoring port has the following features:

It can be a common port or an aggregation port;

It cannot be a destination port;

A source port can belong to only one SPAN session;

It may or may not be in the same VLAN as the destination port.

Destination port

The destination port can only be a single physical port or an aggregation group, and it can be used in only one SPAN session.

The destination port has the following features:

It can be a common port or a link aggregation port;

It cannot be a monitoring port;

For an RSPAN Destination Session, the destination port type should be hybrid;

It cannot take part in STP calculation. Local SPAN copies the BPDUs in the monitored traffic, so any BPDU seen on the destination port comes from the source port;

It should not be connected to another switch, because this may cause a network loop;

Its bandwidth should be greater than or equal to that of the monitoring port; otherwise, packets may be lost;

LACP and 802.1X should not be enabled on it, so that the mirrored data is not affected;

For an RSPAN Source Session, the destination port can only be a common port, not an aggregation port;

It can also serve as a normal forwarding port, but to keep the monitored data from being mixed with other traffic, it is recommended to remove the destination port from all VLANs.
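Several of the source and destination port rules can be checked mechanically before a configuration is applied. The sketch below is a hypothetical pre-check, not a device command: the names are invented, port speeds are supplied by the caller, and comparing the destination speed with the sum of all source speeds in a session is an assumption that extends the single-port bandwidth rule above.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class SpanSession:
    name: str
    sources: Set[str]
    destination: str


def check_sessions(sessions: List[SpanSession], speed_mbps: Dict[str, int]) -> List[str]:
    """Return warnings for rule violations (an empty list means no problems found)."""
    warnings: List[str] = []
    source_owner: Dict[str, str] = {}   # source port -> first session using it
    dest_owner: Dict[str, str] = {}     # destination port -> first session using it
    for s in sessions:
        for p in sorted(s.sources):
            if p in source_owner:
                warnings.append(f"{p} is a source in {source_owner[p]} and {s.name}")
            source_owner.setdefault(p, s.name)
        if s.destination in dest_owner:
            warnings.append(f"{s.destination} is a destination in "
                            f"{dest_owner[s.destination]} and {s.name}")
        dest_owner.setdefault(s.destination, s.name)
    for p in sorted(set(source_owner) & set(dest_owner)):
        warnings.append(f"{p} is used as both source and destination")
    for s in sessions:
        offered = sum(speed_mbps[p] for p in s.sources)
        if speed_mbps[s.destination] < offered:
            warnings.append(f"{s.name}: destination {speed_mbps[s.destination]} Mbps "
                            f"< sources {offered} Mbps, frames may be dropped")
    return warnings
```

For instance, a single session mirroring a 1000 Mbps port to a 100 Mbps port produces one bandwidth warning, matching the oversubscription case described under SPAN Technology.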



RSPAN VLAN

The RSPAN VLAN should be an idle VLAN dedicated to RSPAN, with a VLAN ID in the range 2-4094. You can choose any idle VLAN during configuration, but you must ensure that every device on the path to the analysis device is configured with this VLAN and that the corresponding ports are added to it.

The RSPAN VLAN has the following features:

To keep the monitored data from being mixed with other traffic, the RSPAN VLAN can carry only RSPAN traffic;

Do not add any port other than those carrying RSPAN traffic to the RSPAN VLAN;

MAC address learning is disabled in the RSPAN VLAN;

RSPAN does not support monitoring L2 protocol packets unless the L2 protocol functions are disabled on the device of the RSPAN Destination Session.

Limitations

1. SPAN and flow mirroring use the same chip resource. When port mirroring is enabled, avoid enabling flow mirroring; otherwise, hardware resources may be insufficient.

2. In an MPLS environment, if MPLS has learned the destination MAC address of the packet, the mirrored MPLS packet carries the MPLS header; if it has not, the mirrored MPLS packet does not carry the MPLS header.

Typical Application

Local SPAN Application

The following is a simple local SPAN environment.



The application diagram of the local SPAN

Illustration

In the figure above, all packets on port 0/1 are mirrored to port 0/2. The network analyzer connected to port 0/2 is not connected to port 0/1 directly, but port 0/2 receives the packets of port 0/1 through mirroring.



Remote SPAN Application

The application diagram of remote SPAN

Illustration

In the figure above, the mirrored packets of port 0/8 on the source device, Switch 1, are transmitted through RSPAN VLAN 100 to destination port 0/1 on the destination device, Switch 2. This allows the packets sent and received on the source switch port to be monitored on the destination switch.



IPv4 Unicast Routing

This chapter describes the principles of the mainstream routing protocols.

Main contents:

Introduction to the IPv4 unicast routing

Static routing protocol

M-VRF

Load balance

RIP dynamic routing protocol

OSPF dynamic routing protocol

IS-IS dynamic routing protocol

BGP dynamic routing protocol

Introduction to the IPv4 Unicast Routing

For packets to travel from one host to another in a network, the transmission path of the packets through the network must be known. This path is called a route.

A network is composed of many forwarding devices (such as switches). To forward packets from one host to another, each forwarding device must know the path to the destination host, that is, each forwarding device must have a route to the destination.

Routes come from three sources: when a forwarding device is directly connected to a network, a directly connected route is generated; when the network administrator adds routes manually, static routes are generated; and when the forwarding device runs a dynamic routing protocol, dynamic routes are learned automatically.

There may be many paths from one host to another, so the best path must be selected for forwarding packets. The best path is determined by the following factors:

Path length: measured in hops or in cost. In distance-vector routing protocols, the path length is the number of forwarding devices between the source host and the destination host. In link-state routing protocols, the path length is the sum of the costs of the links along the path.

Reliability: measured by the error rate between the source host and the destination host. In most routing protocols, the reliability of a link is assigned by the network engineer.

Delay: the total time a packet spends traveling through all network devices, links, and switching devices. Delay is also affected by network congestion and by the distance between the source and the destination. Because delay reflects many variables, it is an important measure when calculating the best path.

Bandwidth: using bandwidth alone to calculate the best path can be misleading. A 1.544 Mbps link is generally better than a 56 kbps link. However, if the 1.544 Mbps link is heavily utilized, or the receiving device at the far end is heavily loaded, it may not be the best path.

Load: a value assigned to a network resource according to its utilization. It is determined by factors such as CPU utilization, packets processed per second, and packet fragmentation and reassembly. However, monitoring device resources is itself a heavy load.

Communication cost: some public network links are charged by usage or by a monthly fee. For example, an ISDN link is charged by connection time and by the amount of data transferred during that time. In such cases, communication cost is a very important factor in determining the best path.
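The two path-length definitions above can give different answers on the same topology. The following Python sketch uses a hypothetical five-device topology; the node names and link costs are assumptions and are not taken from this manual. It runs Dijkstra's shortest-path search twice. The first run counts each link as one hop, as a distance vector protocol does. The second run sums link costs, as a link-state protocol does.

```python
import heapq
from typing import Dict, List, Tuple

Graph = Dict[str, Dict[str, int]]   # node -> {neighbor: link cost}

# Hypothetical topology: A-B-D is short in hops, A-C-E-D is cheap in cost.
GRAPH: Graph = {
    "A": {"B": 10, "C": 1},
    "B": {"A": 10, "D": 10},
    "C": {"A": 1, "E": 1},
    "E": {"C": 1, "D": 1},
    "D": {"B": 10, "E": 1},
}

def best_path(graph: Graph, src: str, dst: str, use_hops: bool) -> Tuple[int, List[str]]:
    """Dijkstra search; path length is hop count (use_hops) or summed link cost."""
    queue = [(0, src, [src])]
    seen = set()
    while queue:
        length, node, path = heapq.heappop(queue)
        if node == dst:
            return length, path
        if node in seen:
            continue
        seen.add(node)
        for nbr, cost in graph[node].items():
            if nbr not in seen:
                step = 1 if use_hops else cost   # one hop per link vs. link cost
                heapq.heappush(queue, (length + step, nbr, path + [nbr]))
    raise ValueError("destination unreachable")
```

On this topology, the fewest-hop path is A-B-D (2 hops, cost 20). The lowest-cost path is A-C-E-D (3 hops, cost 3).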

Static Routing Protocol

Main contents:

Introduction to the static route

Typical application of the static route

Troubleshooting of the static route

Introduction to the Static Route

A static route is defined manually by the user. With a static route, packets between the source and the destination follow the path specified by the administrator.

To understand what information a routing table contains, it is useful to examine what happens when a frame arrives at an interface of the switch.

First, the switch checks the data link identifier in the destination address field of the frame. If it contains the identifier of the switch interface or a broadcast identifier, the switch strips the frame header and trailer and passes the enclosed packet to the network layer.

The network layer then checks the destination address of the packet. The destination address may be one of the following:

- An IP address of the switch
- A multicast address that the switch is listening on
- The broadcast address of a subnet, or a directed broadcast address
- The global broadcast address (255.255.255.255)

In any of these cases, the switch checks the protocol field of the packet and passes the enclosed data to the corresponding internal process.

Any other destination address requires the packet to be routed. The destination may be a host on a network directly connected to the switch, or a host on a network that is not directly connected to the switch. To forward the packet, the switch uses the next-hop address as the target and resolves its link-layer address.

To route a packet, the switch searches the routing table for the correct route. Each route in the routing table must contain at least the following two items:

1. Destination address: the network address that the switch can reach. The switch may have more than one route to the same address, or several routes to subnets of the same major network address.

2. Pointer to the destination: indicates either that the destination network is directly connected to the switch, or the address of the next switch toward the destination, namely the next-hop switch.

The switch tries to match the most specific address. In order of decreasing specificity, the address may be one of the following:

Host address (host route)

Subnet

A group of subnets (summary route)

Major network number

A group of major network numbers (supernet)

Default address

If the destination address of a packet does not match any entry in the routing table, the packet is discarded. An ICMP Destination Unreachable message is sent to the source address.
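The matching order above amounts to a longest-prefix match. The route with the longest mask that contains the destination wins, and a default route (0.0.0.0/0) matches any address. The following Python sketch illustrates this lookup with the standard ipaddress module. The routing table entries and next-hop addresses are hypothetical. A real switch uses optimized data structures or hardware instead of a linear scan.

```python
import ipaddress
from typing import Dict, Optional

# Hypothetical routing table: prefix -> next-hop address, one entry per level
# of specificity from the list above.
ROUTES: Dict[str, str] = {
    "10.1.1.5/32": "10.1.2.5",    # host route
    "10.1.1.0/24": "10.1.2.1",    # subnet
    "10.1.0.0/16": "10.1.2.9",    # summary route
    "10.0.0.0/8": "10.1.2.10",    # major network
    "0.0.0.0/0": "10.1.2.254",    # default route
}

def lookup(dst: str, routes: Dict[str, str] = ROUTES) -> Optional[str]:
    """Return the next hop of the most specific matching route, or None (drop)."""
    addr = ipaddress.ip_address(dst)
    best_len, best_hop = -1, None
    for prefix, next_hop in routes.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and net.prefixlen > best_len:
            best_len, best_hop = net.prefixlen, next_hop
    return best_hop
```

For example, lookup("10.1.1.7") returns "10.1.2.1" because the /24 route beats the /16 and /8 routes. If the default route is removed, lookup("192.168.0.1") returns None. This corresponds to discarding the packet and sending an ICMP Destination Unreachable message.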

Typical Application of the Static Route

The following simple network illustrates the use of static routes.

Figure 11-1 Typical application of the static route



Illustration

Two Maipu switches (switch-a and switch-b) act as forwarding devices and connect networks 10.1.1.0/24 and 10.1.3.0/24. The default gateway of PC-1 is 10.1.3.1, and the default gateway of PC-2 is 10.1.1.2.

Configure a static route on each switch to interconnect 10.1.1.0/24 and 10.1.3.0/24:

- On switch-a, configure a static route with destination 10.1.1.0/24 and next hop 10.1.2.1.
- On switch-b, configure a static route with destination 10.1.3.0/24 and next hop 10.1.2.2.

The two networks can then communicate with each other.

Traffic sent from PC-1 to PC-2 first reaches PC-1's default gateway, switch-a. Switch-a finds that the destination address 10.1.1.1 is not a local address and searches its routing table. Because static route 10.1.1.0/24 exists, switch-a forwards the traffic to next hop 10.1.2.1 (switch-b). On switch-b, the destination address matches a directly connected route, so the traffic is delivered to PC-2.
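The walk-through above can be replayed with a small Python model. The sketch assumes that switch-a owns 10.1.3.1 and 10.1.2.2 and that switch-b owns 10.1.2.1 and 10.1.1.2, as the gateways and next hops in the text imply. The table layout, the OWNER map, and the trace function are illustrative only. They do not reflect the switch's CLI or its internal data structures.

```python
import ipaddress
from typing import Dict, List, Tuple

# Per-switch routing tables for the example; "connected" marks a directly
# connected network, anything else is the next hop of a static route.
TABLES: Dict[str, List[Tuple[str, str]]] = {
    "switch-a": [("10.1.3.0/24", "connected"), ("10.1.2.0/24", "connected"),
                 ("10.1.1.0/24", "10.1.2.1")],
    "switch-b": [("10.1.1.0/24", "connected"), ("10.1.2.0/24", "connected"),
                 ("10.1.3.0/24", "10.1.2.2")],
}
# Which switch owns each next-hop address on the 10.1.2.0/24 link (assumed).
OWNER = {"10.1.2.1": "switch-b", "10.1.2.2": "switch-a"}

def trace(first_switch: str, dst: str, max_hops: int = 16) -> List[str]:
    """Return the switches a packet visits until a connected route delivers it."""
    addr = ipaddress.ip_address(dst)
    path, node = [], first_switch
    for _ in range(max_hops):
        path.append(node)
        matches = [(ipaddress.ip_network(p), nh) for p, nh in TABLES[node]
                   if addr in ipaddress.ip_network(p)]
        if not matches:
            raise LookupError(f"{node}: no route to {dst}")
        _, next_hop = max(matches, key=lambda m: m[0].prefixlen)
        if next_hop == "connected":
            return path            # destination is on a directly connected network
        node = OWNER[next_hop]
    raise RuntimeError("hop limit exceeded (routing loop?)")
```

trace("switch-a", "10.1.1.1") returns ["switch-a", "switch-b"], which matches the path described above. If the static route is deleted from switch-b's table, trace("switch-b", "10.1.3.20") raises LookupError.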

Troubleshooting of the Static Route

Load Balancing of the Switching Device

On switching devices that support hardware routing (such as L3 switches), a small number of packets must be forwarded in software after a static route is configured, so that the next hop can be resolved. For example:

S 128.255.0.0/16 [1/10] via 1.1.1.2, 00:40:10, vlan1

When the static route takes effect, the ARP entry for 1.1.1.2 may not exist yet. When traffic actually needs to be forwarded through the route, the ARP entry for 1.1.1.2 is resolved by sending the traffic to the CPU for software forwarding. Once ARP is resolved, the traffic is switched in hardware and is no longer sent to the CPU.

When the static route is a load-balancing route, traffic may be sent to the CPU continuously. This happens because the software and the hardware may select different next hops.

S 128.255.0.0/16 [1/10] via 1.1.1.2, 00:40:10, vlan1

via 1.1.1.3, 00:40:10, vlan1

The load-balancing route is written into the hardware while ARP has not yet been resolved for next hops 1.1.1.2 and 1.1.1.3. A flow with destination address 128.255.1.1 matches the route. The problem develops as follows:

1. For a load-balancing route, the hardware selects the next hop in per-flow mode. Suppose it selects 1.1.1.3.
2. Because ARP for 1.1.1.3 is not resolved, the packets are sent to the CPU for software forwarding.
3. If the software also selects the next hop in per-flow mode, it may select 1.1.1.2, because the software and the hardware use different algorithms.
4. As a result, ARP is resolved for 1.1.1.2 but not for 1.1.1.3.

The hardware then keeps selecting 1.1.1.3 while the software keeps selecting 1.1.1.2. Consequently, the flow is continuously sent to the CPU and is never forwarded in hardware.

Therefore, on switching devices that route in hardware, we recommend setting software load balancing to per-packet mode when static route load balancing is used. The software then cycles through the next hops, so ARP is resolved for every next hop.

Use the ip load-sharing per-packet command to set the software load balancing mode to per-packet mode.
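The failure mode above can be reproduced with a toy model, which also shows why per-packet software balancing avoids it. In this Python sketch, the two hash functions are deliberately simplistic stand-ins, because the real hardware and software algorithms are not described in this manual. The hardware picks a next hop from the last octet of the destination address, and the software picks from the last octet of the source address, so the two disagree for the example flow. A packet goes to the CPU whenever the hardware's choice has no resolved ARP entry. Software forwarding resolves ARP only for the next hop that the software picks. The source address 10.0.0.2 is hypothetical.

```python
from itertools import cycle

NEXT_HOPS = ["1.1.1.2", "1.1.1.3"]

def hw_pick(src: str, dst: str) -> str:
    # Hypothetical hardware per-flow hash: last octet of the destination address.
    return NEXT_HOPS[int(dst.rsplit(".", 1)[1]) % len(NEXT_HOPS)]

def sw_pick_per_flow(src: str, dst: str) -> str:
    # Hypothetical software per-flow hash: a different function (last octet of the source).
    return NEXT_HOPS[int(src.rsplit(".", 1)[1]) % len(NEXT_HOPS)]

def cpu_punts(src: str, dst: str, packets: int, per_packet: bool) -> int:
    """Count how many packets of one flow are sent to the CPU."""
    resolved = set()                 # next hops that already have an ARP entry
    round_robin = cycle(NEXT_HOPS)   # software per-packet selection
    punts = 0
    for _ in range(packets):
        if hw_pick(src, dst) in resolved:
            continue                 # ARP known: forwarded in hardware
        punts += 1                   # ARP missing: packet goes to the CPU
        sw_hop = next(round_robin) if per_packet else sw_pick_per_flow(src, dst)
        resolved.add(sw_hop)         # software forwarding resolves ARP for its choice
    return punts
```

Consider the flow from 10.0.0.2 to 128.255.1.1. In per-flow mode, all 100 packets are punted to the CPU. In per-packet mode, only the first two packets are punted. After that, both next hops have ARP entries and the hardware takes over.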

M-VRF

Main contents:

Terms

Introduction to M-VRF

Terms of M-VRF

VPN - Virtual Private Network: VPN technology connects two or more network sites through the Internet. Within a VPN, the sites operate as if they were all in a single private network.

M-VRF - Multi-VPN Routing and Forwarding: on the switch, each VPN has its own routing and forwarding table. Customers at the sites of a VPN can access only the routes in that table.



Introduction to M-VRF

M-VRF supports VPNs. Multiple VRFs can exist on one switch, and resources such as interfaces, IP addresses, and routing tables belong to a VRF. Resources in different VRFs cannot access each other. With the M-VRF function, users can isolate networks, and overlapping address spaces are supported.

M-VRF does not modify the packet format. It only enhances security by dividing resources by VRF attribute. Each resource in the system belongs to exactly one VRF. After an interface is configured with a VRF, packets sent or received through the interface can access only the resources of that VRF.

Take packet forwarding as an example:

1. When an interface receives a packet, the packet takes the VRF attribute of that interface.
2. When determining whether the destination address of the packet is a local address, the switch also checks whether the interface that owns the address and the interface that received the packet belong to the same VRF.
3. To forward the packet, the switch selects the routing table according to the VRF attribute.
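The lookup steps above can be illustrated with a Python sketch of two VRFs that reuse the same 192.168.1.0/24 address space. Everything in the sketch is hypothetical: the VRF names red and blue, the interfaces vlan10 and vlan20, the addresses, and the handle function. It models only three behaviors described in the text: the ingress-VRF binding, the VRF-scoped local-address check, and the per-VRF route lookup.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

@dataclass
class Vrf:
    local_addrs: Set[str] = field(default_factory=set)    # addresses of interfaces in this VRF
    routes: Dict[str, str] = field(default_factory=dict)  # prefix -> next hop or "connected"

# Both VRFs use 192.168.1.0/24; overlapping address space is allowed.
VRFS = {
    "red":  Vrf({"192.168.1.1"}, {"192.168.1.0/24": "connected", "0.0.0.0/0": "10.0.0.1"}),
    "blue": Vrf({"192.168.1.254"}, {"192.168.1.0/24": "connected"}),
}
IFACE_VRF = {"vlan10": "red", "vlan20": "blue"}   # interface -> VRF binding

def handle(in_iface: str, dst: str) -> Tuple[str, str]:
    vrf_name = IFACE_VRF[in_iface]      # the packet inherits the ingress interface's VRF
    vrf = VRFS[vrf_name]
    if dst in vrf.local_addrs:          # local only if owned by an interface in the same VRF
        return ("local", vrf_name)
    addr = ipaddress.ip_address(dst)
    matches = [(ipaddress.ip_network(p), nh) for p, nh in vrf.routes.items()
               if addr in ipaddress.ip_network(p)]
    if not matches:                     # only this VRF's routing table is consulted
        return ("drop", vrf_name)
    return ("forward", max(matches, key=lambda m: m[0].prefixlen)[1])
```

The same destination, 192.168.1.1, is handled differently depending on the ingress interface. It is a local address when it arrives on vlan10 (red), but it is simply forwarded within blue when it arrives on vlan20. In addition, traffic in blue cannot use red's default route.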

Load Balancing

Main contents:

Types of load balancing

Modes of load balancing

Switching types and load balancing

Types of Load Balancing

Equal-cost load balancing: traffic is distributed evenly across the paths (1:1).

Unequal-cost load balancing: traffic is distributed according to the cost ratio of the paths (1:n).
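A minimal way to picture the two types is a weighted round-robin schedule, in which each path appears in proportion to its share. The following Python sketch is an illustration only. The link names and shares are hypothetical, and the sketch does not specify how a share is derived from the path costs.

```python
from collections import Counter
from itertools import islice
from typing import Dict, Iterator

def weighted_schedule(shares: Dict[str, int]) -> Iterator[str]:
    """Yield link names forever; each link appears `share` times per cycle.

    {"L1": 1, "L2": 1} models equal-cost 1:1, {"L1": 1, "L2": 3} a 1:3 split.
    """
    if sum(shares.values()) <= 0:
        raise ValueError("at least one link needs a positive share")
    while True:
        for link, share in shares.items():
            for _ in range(share):
                yield link

def share_counts(shares: Dict[str, int], packets: int) -> Dict[str, int]:
    """Count how many of the first `packets` packets each link carries."""
    return dict(Counter(islice(weighted_schedule(shares), packets)))
```

This simple schedule sends a link's whole share back to back. Schedules that interleave the links more smoothly are equally valid.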



Modes of Load Balancing

Per-packet load balancing: the first packet takes one link, the second packet takes another link, and so on. Packets are distributed across the links in turn, regardless of whether their destination addresses are the same.

Per-session (or per-destination) load balancing: packets to the same host always use the same link.

Each mode has its own characteristics:

1. Per-packet switching: a good option when the parallel links are slower than 64 kbps. Packets may arrive out of order, so this mode is unsuitable for applications that depend on packet order, such as voice traffic.

2. Per-session switching: when a session carries heavy traffic, the link it uses can become heavily loaded while the other links remain lightly loaded. As a result, the load across links may be unbalanced.
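The two modes differ only in how a link is chosen for each packet. In this Python sketch, per-packet mode is plain round robin. Per-session mode hashes the source and destination addresses with CRC-32, so every packet of a flow maps to the same link. The hash choice and the link names are assumptions for illustration and are not the switch's actual algorithm.

```python
import zlib
from itertools import cycle
from typing import Iterable, List, Tuple

Packet = Tuple[str, str]   # (source address, destination address)

def per_packet(packets: Iterable[Packet], links: List[str]) -> List[str]:
    """Round robin: each packet takes the next link, whatever its addresses."""
    rr = cycle(links)
    return [next(rr) for _ in packets]

def per_session(packets: Iterable[Packet], links: List[str]) -> List[str]:
    """Hash source/destination so every packet of a flow uses the same link."""
    return [links[zlib.crc32(f"{src}>{dst}".encode()) % len(links)]
            for src, dst in packets]
```

Take four packets of one flow as an example. per_packet alternates L1, L2, L1, L2, which could reorder the packets if the links differ in delay. per_session returns the same link four times.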

Switching Types and Load Balancing

Different switching types use different load balancing modes. Generally, there are the following two types:

Process switching: balances the load according to the order in which packets arrive, using per-packet load balancing.

Fast switching: balances the load according to the source and destination addresses of the packets, using per-session load balancing.

Note

The content described in this section applies only to software forwarding. Packets forwarded by the switching chip are not subject to the behavior described here.

RIP Dynamic Routing Protocol

Main contents:

Terms of RIP protocol

Introduction to the RIP protocol



Terms of RIP Protocol

UDP - User Datagram Protocol: a simple, datagram-oriented, unreliable transport-layer protocol for IP networks.

D-V algorithm - distance vector algorithm: a route calculation method for computer networks, also called the Bellman-Ford algorithm.

IGP - Interior Gateway Protocol.

Request packets - packets used to request routing information from the RIP of other routing devices.

Response packets - packets used to advertise a device's own routing information to the RIP of adjacent routing devices.

Split horizon - a mechanism used by RIP to prevent routing loops.

Poisoned reverse - a mechanism used by RIP to prevent routing loops. It is more proactive than split horizon.

Triggered updates - a mechanism used by RIP to speed up convergence. When a route changes, an update is triggered and the changed routes are advertised.

Regular updates - by default, RIP sends all of its routing information every 30 seconds.

Introduction to the RIP Protocol

Routing Information Protocol (RIP) is an interior gateway routing protocol based on the distance vector algorithm. It is used for dynamic IPv4 routing. RIP has become one of the standard protocols for exchanging routing information between routing devices and hosts.

RIP has two versions: RIPv1 and RIPv2. RIPv1 does not support classless routes, while RIPv2 does. RIPv2 is usually used.

RIP is simple, and so is its configuration. The amount of routing information RIP advertises is directly proportional to the number of routes in the routing table, so a large number of routes consumes a lot of network resources. In addition, RIP limits the maximum hop count to 15. Therefore, RIP is applicable only to simple small and medium-sized networks.

RIP is suitable for most campus networks and simple regional networks, but it is not used in more complex environments.

RIP in the TCP/IP Protocol

Figure 11-2 RIP in the TCP/IP protocol stack

As shown in the preceding figure, RIP runs over UDP, and the protocol packets sent by RIP are encapsulated in UDP packets. RIP receives protocol packets from remote routing devices on UDP port 520 and updates the local routing table according to the routing information they contain. It then adds 1 to the metric and advertises the routes to its other adjacent routing devices. In this way, every routing device in the routing domain learns all routes.

The RIP protocol sends packets in the following three modes: broadcast,

multicast, and unicast. The usage of each mode is shown in the following

table.

Table 1-1 Modes of sending packets

Mode | Address | Version | Port | Purpose

Broadcast | 255.255.255.255 | RIPv1 | 520 | RIPv1 sends protocol packets to all adjacent routing devices.

Multicast | 224.0.0.9 | RIPv2 | 520 | RIPv2 sends protocol packets to all adjacent routing devices.

Unicast | Unicast IP address | RIPv1/RIPv2 | 520 | Response packets replying to request packets; protocol packets sent to a specified neighbor.

RIP Packet Types and Structure

RIP Packet Types

There are two types of RIP packets: request packets and response packets. Their functions are as follows.



Table 1-2 RIP packet types

Packet Type | Function | When Sent

Request packets | Request routing information from the RIP of an adjacent routing device. A request can ask for specified routes or for all routing information. A request for all routing information contains only one route entry, with address family identifier 0 and metric 16. | When RIP starts running on an interface, it requests all routing information from the RIP of adjacent routing devices.

Response packets | Advertise routing information to the RIP of adjacent routing devices. | A) In reply to a request packet. B) When a route changes, triggering an update of the routing information. C) Periodically, advertising all routing information to the RIP of adjacent devices (regular updates).

RIP Packet Structure

Figure 11-3 RIP packets structure

As shown in the preceding figure, RIP packets are encapsulated in UDP packets. In the IP header of RIP packets, the TTL is set to 1 to prevent RIP packets from being forwarded by other routing devices.

The RIP header has two fields: Command field identifies the request

packets (value is 1) or response packets (value is 2); Version field

identifies the RIPv1 (value is 1) or RIPv2 (value is 2).

There are three types of RIP entries: RIPv1 routing entries, RIPv2 routing entries, and authentication information entries. The RIP entry types are described as follows.

Table 1-3 RIP entry types and description

RIP Entry | Version | Format | Description

RIPv1 routing entry | RIPv1 | Shown in the following figure | Advertises routing information to the RIP of adjacent routing devices in RIPv1.

RIPv2 routing entry | RIPv2 | Shown in the following figure | Advertises routing information to the RIP of adjacent routing devices in RIPv2.

Authentication information entry (plain text) | RIPv2 | Shown in the following figure | Carries the plain-text authentication information of the packet in RIPv2. The entry immediately follows the RIP packet header.

Authentication information entry (MD5) | RIPv2 | Shown in the following figure | Carries the MD5 authentication information of the packet in RIPv2. The entry immediately follows the RIP packet header, and the corresponding authentication data must be appended at the end of the packet.

Figure 11-4 Format of the RIP routing information entry



Figure 11-5 Packet format of the RIPv2 authentication information
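As a cross-check of the layouts above, the following Python sketch uses the standard struct module to pack several RIP structures: a RIPv2 header, a RIPv2 routing entry, a plain-text authentication entry, and the full-table request from Table 1-2. The field widths follow RFC 2453:

- The header is 4 bytes.
- Routing entries are 20 bytes: address family, route tag, IP address, subnet mask, next hop, and metric.
- A plain-text password entry uses address family 0xFFFF with authentication type 2.

A RIPv1 entry has the same 20-byte size, with the route tag, mask, and next-hop fields set to zero. The function names are hypothetical. The sketch builds payload bytes only and does not open a socket on UDP port 520.

```python
import ipaddress
import struct

RIP_REQUEST, RIP_RESPONSE = 1, 2   # Command field values
AFI_IP = 2                         # address family identifier for IPv4 routes
INFINITY = 16

def rip_header(command: int, version: int = 2) -> bytes:
    # Command (1 byte), Version (1 byte), 2 unused bytes that must be zero.
    return struct.pack("!BBH", command, version, 0)

def ripv2_entry(prefix: str, metric: int, next_hop: str = "0.0.0.0", tag: int = 0) -> bytes:
    net = ipaddress.ip_network(prefix)
    return struct.pack("!HH4s4s4sI", AFI_IP, tag,
                       net.network_address.packed, net.netmask.packed,
                       ipaddress.ip_address(next_hop).packed, metric)

def ripv2_plain_auth(password: str) -> bytes:
    # AFI 0xFFFF marks an authentication entry; type 2 = plain-text password,
    # null-padded (or truncated) to 16 bytes by the "16s" format.
    return struct.pack("!HH16s", 0xFFFF, 2, password.encode())

def full_table_request(version: int = 2) -> bytes:
    # A single entry with address family 0 and metric 16 asks for all routes.
    return rip_header(RIP_REQUEST, version) + struct.pack(
        "!HH4s4s4sI", 0, 0, bytes(4), bytes(4), bytes(4), INFINITY)
```

For example, rip_header(RIP_RESPONSE) + ripv2_entry("10.1.1.0/24", 1) is a 24-byte response that advertises one route. When plain-text authentication is used, the ripv2_plain_auth(...) entry is placed right after the header.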

Working Principle of RIP

Figure 11-6 Working flow of the RIP protocol



The working flow of RIP is shown in the preceding figure. It consists of two parts: the startup flow of RIP and the flow for processing received RIP packets.

Starting the Protocol

When an interface starts running RIP, it sends request packets out of the interface in broadcast mode (RIPv1) or multicast mode (RIPv2). These packets request all routing information from all adjacent routing devices, so that routes converge quickly.

When the responses to the request packets are received, RIP updates the routes in the route database according to the routing information they contain. It then advertises the changed routes to the RIP of other adjacent routing devices (triggered updates).

At the same time, RIP starts the update timer. Every 30 seconds by default, it advertises all of its routing information to the RIP of adjacent routing devices through response packets. This serves two purposes:

- It keeps the RIP databases of the routing devices synchronized.
- It refreshes the advertised routes, so that previously advertised routes do not time out or become invalid on other routing devices.

Route Database

The route database records all routing information of RIP. Each route entry consists of the following elements:

1. Destination address: the destination host or subnet of the route.

2. Metric: the metric for reaching the destination.

3. Next-hop interface: the interface used to forward packets to the destination, namely the interface on which the route was learned.

4. Next-hop IP address: the IP address of the interface of the adjacent routing device through which packets reach the destination. Generally, it is the source IP address of the response packet from which the route was learned.

5. Source IP address: the source IP address of the response packet from which the route was learned.

6. Route tag: defined by the user to mark a category of routes. For example, a route tag can mark that a route was obtained by redistributing BGP routes.



Source of the Routing Entries in the Route Database

In the RIP route database, the sources of the routing entries are as follows:

1. Directly connected routes of the interfaces covered by RIP

2. Routes redistributed into RIP from other protocols

3. Routes generated by RIP configuration commands. For example, the default-information originate command generates and advertises default route 0.0.0.0.

4. Routes learned from the RIP of adjacent routing devices

Retrieval of Next-Hop Route

In RIPv1, the next-hop interface of a route is the interface on which the route was learned. The next-hop IP address is the source IP address of the response packet from which the route was learned.

In RIPv2, the routing information in response packets can carry a next-hop IP address. The next-hop interface of a route is the interface on which the route was learned. The next-hop IP address is chosen as follows:

- If the next-hop IP address in the routing information is in the same subnet as the interface that received the routing information, that address is used as the next-hop IP address of the route.
- Otherwise, the source IP address of the response packet is used.

The purpose of this rule is to implement route redirection.

The following example illustrates the application of the next-hop address

of the routing entry in RIPv2.



Figure 11-7 RIP route redirection

As shown in the preceding figure, switch-A runs RIP, switch-B runs RIP and OSPF, and switch-C runs OSPF. On switch-B, RIP redistributes route 11.0.0.0/8, which it learned through OSPF, so switch-A learns a route to subnet 11.0.0.0/8. By default, the next hop of this route on switch-A is switch-B (10.1.1.2). Packets that switch-A forwards to destination subnet 11.0.0.0/8 therefore pass through switch-B before reaching switch-C.

To solve this problem, switch-B specifies switch-C (10.1.1.3) as the next hop when it advertises route 11.0.0.0/8 to switch-A. When switch-A learns the route, it sets the next hop of route 11.0.0.0/8 to switch-C (10.1.1.3). Packets that switch-A forwards to destination subnet 11.0.0.0/8 then go directly to switch-C without passing through switch-B.
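The RIPv2 next-hop rule above reduces to a single subnet test. The following Python sketch applies it to the redirection example. It assumes that switch-A's receiving interface is 10.1.1.1/24, because the text does not give switch-A's address, and the function name is hypothetical. The sketch also treats a carried next hop of 0.0.0.0 as "no next hop", as RFC 2453 specifies.

```python
import ipaddress

def ripv2_next_hop(rx_iface: str, src_ip: str, carried_next_hop: str) -> str:
    """Pick the next hop for a route learned from a RIPv2 response.

    rx_iface: address/prefix of the receiving interface, e.g. "10.1.1.1/24".
    src_ip: source IP address of the response packet.
    carried_next_hop: next-hop field of the route entry.
    """
    subnet = ipaddress.ip_interface(rx_iface).network
    nh = ipaddress.ip_address(carried_next_hop)
    if nh != ipaddress.ip_address("0.0.0.0") and nh in subnet:
        return str(nh)      # redirect: use the advertised next hop directly
    return src_ip           # otherwise use the router that sent the response
```

Suppose switch-B (10.1.1.2) advertises next hop 10.1.1.3. In that case, switch-A installs 10.1.1.3. If switch-B sends 0.0.0.0 or an address outside 10.1.1.0/24 instead, switch-A falls back to 10.1.1.2.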

Route Updates

When RIP learns a route from an adjacent routing device, it uses the route to update the route database in the following cases:

1. The route does not exist in the route database, and its metric is less than 16.

2. The route exists in the database, and the source IP address of the existing route is the same as the source IP address of the learned route.

3. The route exists in the database, and its metric is greater than or equal to the metric of the learned route.

So that the metric accurately counts hops, the metric is increased by 1 when routes in the route database are advertised. The maximum metric is 15. A route with a metric greater than 15 is considered unreachable.
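The three update conditions and the per-hop metric increment can be written as two small functions. This Python sketch keeps only the fields the rules need: the metric and the source IP address. The Route type and the function names are hypothetical, and the sketch ignores timers and holddown.

```python
from dataclasses import dataclass
from typing import Dict

INFINITY = 16   # metric 16 means unreachable

@dataclass
class Route:
    metric: int
    source: str     # source IP address of the response the route came from

def should_update(db: Dict[str, Route], prefix: str, learned: Route) -> bool:
    """Return True if the learned route may update the database entry."""
    current = db.get(prefix)
    if current is None:                       # rule 1: new route, reachable metric
        return learned.metric < INFINITY
    if current.source == learned.source:      # rule 2: same source, always accepted
        return True
    return current.metric >= learned.metric   # rule 3: learned metric is no worse

def advertised_metric(route: Route) -> int:
    """Metric carried in an advertisement: one more hop, capped at 16 (unreachable)."""
    return min(route.metric + 1, INFINITY)
```

Rule 2 lets a neighbor worsen or withdraw (metric 16) a route it advertised earlier. Rule 3 ignores a different neighbor that offers a worse metric.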

RIP Timer

Figure 11-8 Status change of RIP route entry (states: Valid; Invalid + Holddown; Invalid; Flush, in which the route is deleted from the database. Transitions: invalid timer timeout or metric updated to 16 (unreachable); holddown timer timeout; flush timer timeout; route update. Timers running: invalid timer on the next hops of routes; holddown timer and flush timer on routes; flush timer on routes.)

RIP uses four timers: the update timer, invalid timer, holddown timer, and flush timer. They are described as follows.

Table 1-4 RIP protocol timers

Timer | Operation Object | Default Value | Startup Condition | Function

Update timer | Route database | 30 seconds | Started when RIP starts, and restarted each time it expires. | Periodically advertises all routing information to the RIP of adjacent routing devices through response packets, in order to: 1. keep the RIP databases of the routing devices synchronized; 2. refresh previously advertised routes so that they do not time out on other routing devices.

Invalid timer | Next hop of a route entry | 180 seconds | Started when a route entry is learned. | A route entry becomes invalid if it is not updated within this time (see the status changes in the preceding figure). The timer is refreshed by response packets and is stopped when the route entry becomes invalid.

Holddown timer | Route entry | 180 seconds | Started when the route entry becomes invalid. | For this period after a route entry becomes invalid, response packets cannot update the entry, which prevents routing loops (see the preceding figure). The timer is stopped when the route entry leaves the holddown state.

Flush timer | Route entry | 240 seconds | Started when the route entry becomes invalid. | The route entry is deleted from the database when it has been invalid for this period (see the preceding figure). The timer is stopped when the route entry is deleted.
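Table 1-4 states that the holddown and flush timers both start when the entry becomes invalid. Taken literally, this means an entry that receives no further updates passes through the states of Figure 11-8 at fixed offsets from its last update. This Python sketch computes that state. It illustrates the arithmetic only and ignores the other path into the invalid state, where the metric is updated to 16.

```python
def route_state(since_update: float, invalid: float = 180,
                holddown: float = 180, flush: float = 240) -> str:
    """State of a RIP route entry `since_update` seconds after its last update.

    Assumes no further updates arrive; holddown and flush both start when the
    entry becomes invalid, as Table 1-4 describes.
    """
    if since_update < invalid:
        return "valid"
    since_invalid = since_update - invalid
    if since_invalid >= flush:
        return "flushed"            # deleted from the route database
    if since_invalid < holddown:
        return "invalid+holddown"   # response packets cannot update it
    return "invalid"
```

With the default values, an entry that stops being refreshed goes through these steps:

- At 180 seconds, it becomes invalid and enters holddown.
- At 360 seconds (180 + 180), it leaves holddown.
- At 420 seconds (180 + 240), it is flushed.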

Prevention of RIP Routing Loops

RIP is a dynamic routing protocol based on the distance vector algorithm, so it does not know the topology of the entire network. When the network changes, routes across the network take some time to converge, and the route databases of the routing devices are not synchronized during that period. Because the topology of the entire network is not known, routing loops may occur. RIP uses the following mechanisms to reduce the possibility of routing loops caused by such inconsistency in the network:

Counting to Infinity

RIP allows a maximum of 15 hops, and a destination more than 15 hops away is considered unreachable. This limit restricts the size of the network and prevents routing information from being passed around indefinitely. As routing information travels from one routing device to another, the hop count increases by 1 at each transfer. When the hop count exceeds 15, the route is deleted from the routing table.

Split Horizon

Split horizon prohibits a router from advertising a route back out of the interface on which it was learned. If a route learned on an interface were advertised back out of the same interface, a routing loop could occur.

The RIP split horizon rule is as follows: if the RIP of a routing device learns routing information A on an interface, the response packets sent out of that interface cannot contain routing information A.

Split horizon has one exception: when an interface receives a route request packet, split horizon is not applied to the response to that request.



Poisoned Reverse

The purpose of poisoned reverse and the purpose of the split horizon are

the same, but the operations are different.

The RIP poisoned reverse rule is as follows: if the RIP of a routing device learns routing information A on an interface, the response packets sent out of that interface still contain routing information A, but with its metric set to 16 (unreachable).

Compared with split horizon, poisoned reverse has the following advantage: because the route is explicitly advertised back to its source as unreachable, an existing routing loop can be broken immediately. With split horizon, the loop persists until the incorrect route entry times out and is deleted.

The disadvantage is that poisoned reverse increases the size of response packets and therefore consumes more protocol bandwidth.
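The two rules and the request exception above can be compared side by side. In this Python sketch, a route table maps each prefix to its metric and to the interface on which it was learned. build_response decides what goes into a response sent out of one interface. The interface names, prefixes, and function signature are hypothetical. As the text describes, the sketch applies the request exception only to split horizon.

```python
from typing import Dict, List, Tuple

INFINITY = 16

# Hypothetical table: prefix -> (metric, interface the route was learned on).
EXAMPLE_ROUTES: Dict[str, Tuple[int, str]] = {
    "10.0.0.0/8": (1, "vlan1"),
    "20.0.0.0/8": (2, "vlan2"),
}

def build_response(routes: Dict[str, Tuple[int, str]], out_iface: str,
                   mode: str = "split-horizon",
                   to_request: bool = False) -> List[Tuple[str, int]]:
    """Return (prefix, advertised metric) pairs for a response sent out of out_iface.

    mode: "split-horizon", "poisoned-reverse", or "none"
    to_request: True when answering a request received on out_iface
    """
    entries = []
    for prefix, (metric, learned_on) in sorted(routes.items()):
        learned_here = learned_on == out_iface
        if learned_here and mode == "split-horizon" and not to_request:
            continue                                   # omit route A entirely
        if learned_here and mode == "poisoned-reverse":
            entries.append((prefix, INFINITY))         # keep route A, mark unreachable
            continue
        entries.append((prefix, min(metric + 1, INFINITY)))  # normal +1 per hop
    return entries
```

Consider a response sent out of vlan1:

- Split horizon leaves out 10.0.0.0/8, which was learned on vlan1.
- Poisoned reverse sends 10.0.0.0/8 with metric 16.
- A reply to a request received on vlan1 includes 10.0.0.0/8 normally.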

Holddown Timer

The holddown timer prevents response packets from updating a route entry for a certain time after the entry becomes unreachable. While the holddown timer is running, received response packets do not update the unreachable route. The reason is that other routing devices may not yet have learned that the route is unreachable, so the route information in their response packets may be information that this device advertised itself.

Triggered Updates

When a route changes, the change is advertised to the RIP of adjacent routing devices through response packets immediately, without waiting for the next regular update.

Poisoned reverse and split horizon break routing loops formed by two routing devices. Routing loops formed by three or more routing devices can still occur and persist until the route metric counts up to infinity (16). Triggered updates speed up route convergence and thus shorten the time needed to break such loops.

RIPv1 and RIPv2

RIPv2 is an extension of RIPv1 and reflects the direction of the technology's development. Its main mechanism is the same as that of RIPv1, but it improves and extends RIPv1 and overcomes some of its disadvantages. The differences between the two versions are as follows:

Table 1-5 Difference between RIPv1 and RIPv2



Attribute | RIPv1 | RIPv2

Route mask | Cannot advertise the route mask. The mask is derived from the address class, so classless routes are not supported. | Can advertise the route mask; classless routes are supported.

Packet sending | Sent in broadcast mode (255.255.255.255), which consumes a lot of network resources. | Sent in multicast mode (224.0.0.9), which consumes fewer network resources.

Authentication | Authentication is not supported. | The authentication information field is added; plain-text and MD5 authentication are supported.

Route tag | Advertisement and learning of route tags are not supported. | Advertisement and learning of route tags are supported.

Next hop advertisement | Advertisement of the next hop is not supported. | The next hop can be advertised, which implements route redirection.

IRMP Dynamic Routing Protocol

Main contents:

Related terms

Introduction to IRMP protocol

Related Terms of IRMP Protocol

Downstream router: (relative to a subnet) the router that is nearer to the destination subnet.

Successor: the next router on the path from the current router to the destination.

Reported distance: the distance to the destination reported by a neighbor to the current router.

Feasible successor: a neighbor router that is nearer to the destination than the current router.

Introduction to IRMP Protocol

IRMP (Internal Routing Message Protocol, compatible with EIGRP) uses DUAL (Diffusing Update Algorithm), a technology similar to a distance vector protocol.

The router makes routing decisions using only the information provided by directly connected neighbors. Received information can be filtered further for security or traffic engineering purposes.

The router advertises only the routes it uses, and only to directly connected neighbors. Information sent to neighbors can also be filtered before it is sent.

However, IRMP differs from traditional distance vector protocols in the following ways, which make it superior to them:

1. IRMP saves all routes advertised by all neighbors in its topology table, a data structure that IRMP uses to store all route information received from neighbors. A traditional distance vector protocol keeps only the best route received so far.

2. When IRMP can no longer reach a destination and has no alternative route, it can query its neighbors.
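The terms above (reported distance, successor, feasible successor) fit together in the route selection that DUAL performs on the topology table. The following Python sketch assumes that IRMP follows the standard EIGRP feasibility condition. Under that condition, a neighbor whose reported distance is lower than the current router's feasible distance is nearer to the destination and can serve as a loop-free backup. The sketch is simplified: the feasible distance is taken as the current best distance, and ties are not handled specially. The neighbor names and metrics are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical topology-table entries for one destination:
# neighbor -> (reported distance, cost of the link to that neighbor).
EXAMPLE_TOPOLOGY: Dict[str, Tuple[int, int]] = {
    "R1": (10, 5),
    "R2": (12, 10),
    "R3": (30, 1),
}

def dual_select(topology: Dict[str, Tuple[int, int]]) -> Tuple[str, List[str], int]:
    """Choose the successor and feasible successors for one destination.

    Returns (successor, feasible successors, feasible distance).
    """
    # Distance through a neighbor = what it reports + the cost to reach it.
    totals = {nbr: rd + cost for nbr, (rd, cost) in topology.items()}
    successor = min(totals, key=lambda nbr: totals[nbr])
    fd = totals[successor]   # feasible distance (simplified: current best distance)
    # Feasibility condition: the neighbor is nearer to the destination than we are.
    feasible = sorted(nbr for nbr, (rd, _) in topology.items()
                      if nbr != successor and rd < fd)
    return successor, feasible, fd
```

For the sample table, the results are:

- R1 is the successor, with a distance of 10 + 5 = 15.
- R2 is a feasible successor because its reported distance, 12, is below 15.
- R3 is not a feasible successor because its reported distance, 30, is not below 15.

If R1 fails and no feasible successor remains, IRMP queries its neighbors.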

IRMP Packet Types

Opcode | Type

1 | Update

3 | Query

4 | Reply

5 | Hello

6 | IPX SAP (not supported currently)

TLVs Defined in IRMP

No. | TLV Type

Common TLV types:

0x0001 | IRMP Parameters

0x0003 | Sequence

0x0004 | Software Version 12

0x0005 | Next Multicast Sequence

IP TLV types:

0x0102 | IP Internal Routes

0x0103 | IP External Routes

Other types are not supported currently.

IRMP Unicast and Multicast Sending (Multicast Address 224.0.0.10)

Type/Reliability | Unreliable | Reliable

Unicast | ACK | Reply

Multicast | Hello | Update, Query



IRMP uses unicast in the following cases:

When sending packets over a transmission medium that does not support hardware multicast (such as X.25 and Frame Relay);

When retransmitting a packet to a neighbor that has not replied within the multicast timeout interval.

IRMP Packet Format (Taking an IP Packet Carrying IRMP Data as an Example)

IP header:

Version (4), Header length (5), Service type (00)

Total length (0045)

ID (05f7), Flags and fragment offset (00 00)

Time to live (02), Protocol (58, IRMP)

Header checksum (c75d)

Source IP address (0a010102)

Destination IP address (E000000a)

IRMP header:

IRMP version (02), Operation code (01)

Checksum (e655)

Flags (00000000)

Sequence (00000003)

Acknowledgment (00000000) (non-zero when the packet is an ACK packet)

AS number (00000001)

TLV (IP internal route):

TLV type (0102), Length (00 1d)

Next hop (00000000)

Delay (0001f400)

Bandwidth (00000100)

MTU (008000), Hop count (00)

Reliability (ff), Load (01), Reserved (0000)

Prefix length (20), Destination
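The hex values above can be decoded mechanically. The following Python sketch uses the struct module to unpack the 20-byte IRMP header and the 4-byte TLV header from the sample bytes. The field order and widths are taken from the example: a 1-byte version and opcode, a 2-byte checksum, and then four 4-byte fields. The function and constant names are hypothetical, and the sketch does not verify the checksum.

```python
import struct
from typing import Dict, Tuple

IRMP_HEADER = struct.Struct("!BBHIIII")  # version, opcode, checksum, flags, sequence, ack, AS
TLV_HEADER = struct.Struct("!HH")        # TLV type, TLV length
OPCODES = {1: "Update", 3: "Query", 4: "Reply", 5: "Hello", 6: "IPX SAP"}

def parse_irmp(data: bytes) -> Tuple[Dict[str, object], Tuple[int, int]]:
    """Decode the IRMP header and the header of the first TLV that follows it."""
    version, opcode, checksum, flags, seq, ack, asn = IRMP_HEADER.unpack_from(data)
    header = {"version": version, "type": OPCODES.get(opcode, f"unknown ({opcode})"),
              "checksum": checksum, "flags": flags, "sequence": seq,
              "ack": ack, "as_number": asn}
    tlv = TLV_HEADER.unpack_from(data, IRMP_HEADER.size)
    return header, tlv

# IRMP header and TLV header bytes copied from the example packet above.
SAMPLE = bytes.fromhex("02" "01" "e655" "00000000" "00000003"
                       "00000000" "00000001" "0102" "001d")
```

parse_irmp(SAMPLE) reports an Update packet with sequence 3, AS number 1, and acknowledgment 0, so it is not an ACK. The header is followed by TLV 0x0102 (IP internal routes) of length 0x1d = 29 bytes. The lengths also add up: 20 bytes of IP header, 20 bytes of IRMP header, and 29 bytes of TLV give the total length 0x0045 = 69.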

By default, hello packets are sent every 5 seconds, and the hold timer is 15 seconds. On NBMA interfaces with bandwidth lower than T1, the two values are 60 seconds and 180 seconds, respectively.

OSPF Dynamic Routing Protocol Main contents:

Terms of OSPF Protocol

Introduction to the OSPF protocol

OSPF features



Terms of OSPF Protocol

AS - Autonomous System: a group of routing devices that exchange routing information through the same routing protocol.

Area - a collection of routing devices that share the same topology database. OSPF divides an AS into multiple areas; the topology of one area is invisible to other areas, which reduces the amount of routing information in the AS. Areas contain the flooding of link state updates and enable the administrator to build a hierarchical network.

Area ID - the 32-bit identifier of an area in the AS.

IGP - Interior Gateway Protocol: a routing protocol running on the routing devices within an AS. Each AS runs its own IGP, and different ASs may run different IGPs. OSPF is an IGP.

Router ID - a 32-bit number assigned to each routing device running OSPF, which uniquely identifies that routing device within the AS.

Point-to-Point Network - a network that joins a single pair of routing devices, such as a 56 kbit/s serial line.

Broadcast Network - a network that supports more than two attached routing devices and allows a single message to be addressed to all of them (broadcast). Neighboring routing devices are discovered dynamically with OSPF hello packets. If the network supports multicast, OSPF also makes use of it. Every pair of routing devices on the network is assumed to be able to communicate directly. Ethernet is an example of a broadcast network.

Non-Broadcast Multi-Access (NBMA) Network - a network that supports more than two attached routing devices but has no broadcast capability. Neighbor relationships are maintained with OSPF hello packets, but because the network cannot broadcast, neighbors must be configured before they can be discovered.

OSPF can run over a non-broadcast network in one of two modes: 1. Non-Broadcast Multi-Access, in which OSPF operates as it does on a broadcast network; 2. Point-to-Multipoint, in which OSPF treats the network as a collection of point-to-point links.

Interface - the connection between a routing device and an attached network. Each interface has associated state information, obtained from the lower-layer protocols or from the routing protocol. Each interface has a unique IP address and mask (except on unnumbered point-to-point links).

Neighbor - two routing devices that have interfaces on a common network are neighbors. The neighbor relationship is maintained with OSPF hello packets.



Adjacency - a relationship formed between selected neighboring routing devices so that they can exchange routing information. Not every pair of neighbors becomes adjacent.

LSA - Link State Advertisement: the unit of data describing the local state of a routing device or network. For a routing device, this includes the states of its interfaces and adjacencies. Each link state advertisement is flooded throughout the area, and each routing device builds its link state database from the advertisements it collects.

Stub Area - an area with only one exit to the outside. Type 5 LSAs are not flooded into a stub area.

Backbone Area - consists of all area border routing devices and the links among them.

ASE - AS External Route: a route learned from a non-OSPF source, such as BGP, RIP, or static configuration.

DR - Designated Router: used on multi-access networks such as Ethernet, Token Ring, and Frame Relay to reduce the number of adjacencies that must be formed. Fewer adjacencies reduce the size of the topology database. The DR forms adjacencies with all other routing devices on the multi-access network. Each routing device sends its LSAs to the DR, and the DR floods them to the entire network, so the DR acts as the central point through which the routing devices on the network exchange information.

BDR - Backup Designated Router: used on multi-access networks; it takes over from the DR if the DR fails.

Inter-Area Route - a route to a destination in another area of the AS.

Intra-Area Route - a route to a destination within the local area.

Flooding - the mechanism for distributing LSAs among routing devices, which keeps the link state databases of all routing devices running OSPF synchronized.

Hello - hello packets, used to establish and maintain neighbor relationships. On a broadcast network, hello packets discover neighboring routing devices dynamically and are also used to elect the DR.

NSSA - Not-So-Stubby Area: allows external routes to be advertised into the OSPF AS while the area otherwise keeps its stub area characteristics. An ASBR in an NSSA originates type 7 LSAs to advertise external routes; when the ABR of the NSSA receives a type 7 LSA with the P bit set to 1, it translates the LSA into a type 5 LSA and floods it to the rest of the AS.



Introduction to OSPF

Open Shortest Path First (OSPF) is a dynamic routing protocol. It detects topology changes in the AS and calculates new routes after a short convergence period, while generating little routing protocol traffic. Each OSPF routing device maintains a topology database describing the AS, and all routing devices have identical databases. Each record in the database describes the local state of a specific routing device, which that device distributes by flooding it throughout the AS.

All routing devices run the same algorithm in parallel. Each routing device

uses the link state database to generate a shortest path tree with itself as

the root. The shortest path tree provides the route to each destination in

the AS. The external routing information serves as leaves in the tree.

OSPF allows multiple networks to be grouped together; such a group is called an area. The topology of an area is hidden from the other areas of the AS, and this information hiding reduces routing traffic. In addition, routing within an area is determined only by the area's own topology, which protects the area from bad routing information originating elsewhere. An area usually consists of one or more subnets.

OSPF supports flexible subnet configuration. Each route advertised by OSPF carries a destination and a mask, and datagrams are forwarded along the best (longest) matching route. A host route is treated as a subnet with mask 0xffffffff.
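The best-match rule can be illustrated with Python's standard ipaddress module. This is a sketch of longest-prefix matching only, not of how the switch implements its forwarding table; the route list is hypothetical.

```python
import ipaddress
from typing import Iterable, Optional


def best_match(routes: Iterable[str], destination: str) -> Optional[ipaddress.IPv4Network]:
    """Return the most specific (longest-mask) route containing destination."""
    addr = ipaddress.IPv4Address(destination)
    candidates = [ipaddress.IPv4Network(r) for r in routes]
    matches = [net for net in candidates if addr in net]
    return max(matches, key=lambda net: net.prefixlen, default=None)


# Hypothetical routes; the /32 entry is a host route (mask 0xffffffff).
ROUTES = ["10.0.0.0/8", "10.1.0.0/16", "10.1.1.5/32"]
print(best_match(ROUTES, "10.1.1.5"))   # 10.1.1.5/32
print(best_match(ROUTES, "10.1.2.3"))   # 10.1.0.0/16
print(best_match(ROUTES, "192.0.2.1"))  # None
```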

All OSPF protocol exchanges are authenticated, which means that only trusted routing devices can participate in the routing of the AS. Several authentication schemes are available; in fact, a separate scheme can be configured for each subnet.

Externally derived routing information (for example, routes learned through an exterior gateway protocol such as BGP) is advertised throughout the AS. This external data is kept separate from the OSPF link state data. Each external route can also be tagged by the advertising routing device, which allows additional information to be passed between the AS boundary routing devices.

The hierarchy of the OSPF in the network protocol stack



Figure 11-9 Hierarchy of OSPF in the network protocol stack

AS Division in OSPF

Figure 11-10 OSPF area, AS division

SW1, SW2, SW3, and SW4 comprise area 1; SW3 is the area border router (ABR).

SW6, SW7, and SW8 comprise area 2; SW6 and SW8 are area border routers (ABRs).

SW8, SW9, and SW10 comprise area 3; SW8 is the area border router (ABR).

SW5 is the AS boundary router (ASBR).

SW3, SW5, SW6, and SW8 comprise the backbone area (area 0).



Process of OSPF

The basic idea of OSPF is as follows: each routing device running OSPF in the AS collects its link states and floods them throughout the AS, so that all routing devices maintain a synchronized link state database. From this database, each routing device calculates a shortest path tree with itself as the root and the other network nodes as leaves, and thereby obtains the best route to each destination in the AS.
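The shortest path tree calculation is Dijkstra's algorithm. The following Python sketch runs it over a hypothetical link state database, represented as a dictionary of output interface costs. It is a simplified illustration, not the switch's implementation: it omits the two-way connectivity check, transit network vertices, and equal-cost next hops that a real OSPF implementation must handle.

```python
import heapq
from typing import Dict, Tuple

# Hypothetical link state database: router -> {neighbor: output interface cost}.
Lsdb = Dict[str, Dict[str, int]]


def shortest_path_tree(lsdb: Lsdb, root: str) -> Tuple[Dict[str, int], Dict[str, str]]:
    """Dijkstra's algorithm: return (cost from root, parent in tree) per reachable router."""
    dist: Dict[str, int] = {root: 0}
    parent: Dict[str, str] = {}
    done = set()
    heap = [(0, root)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node in done:
            continue  # stale heap entry; a cheaper path was already settled
        done.add(node)
        for neighbor, link_cost in lsdb.get(node, {}).items():
            candidate = cost + link_cost  # path cost = sum of output interface costs
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                parent[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))
    return dist, parent


lsdb: Lsdb = {
    "RTA": {"RTB": 10, "RTF": 20},
    "RTB": {"RTA": 10, "RTC": 10},
    "RTC": {"RTB": 10, "RTD": 100},
    "RTD": {"RTC": 100, "RTE": 10},
    "RTE": {"RTD": 10},
    "RTF": {"RTA": 20},
}
before, _ = shortest_path_tree(lsdb, "RTA")
lsdb["RTC"]["RTD"] = 50  # lower the cost of RTC -> RTD and recalculate in full
after, after_parent = shortest_path_tree(lsdb, "RTA")
print(before["RTD"], before["RTE"])  # 120 130
print(after["RTD"], after["RTE"])    # 70 80
```

Only RTD and RTE change cost after the update; a full SPF run like this one still recalculates every node, which is the cost that incremental SPF tries to avoid.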

The routing devices running OSPF form an AS, which can be divided into multiple areas. Each routing device in an area needs a copy of the topology (the link state database).

When OSPF is enabled on a routing device, the device establishes relationships with the other routing devices in the area. It announces its existence by sending hello packets and learns of the other routing devices by receiving their hello packets; neighbor relationships are then established.

If the network type is broadcast or NBMA, routing device A elects the DR and BDR from the known neighbors and forms adjacencies with them. Because every routing device forms adjacencies only with the DR and BDR, protocol traffic is reduced.

If the network type is point-to-point or point-to-multipoint, routing device A attempts to form adjacencies with all of its neighbors, and exchanges topology information with each neighbor once the adjacency is established.

Routing device A exchanges summaries of its topology database with its adjacent neighbor, routing device B, using database description (DD) packets.

When routing device A finds that routing device B has more recent link state information, it requests that information from B with link state request packets; B likewise requests more recent information from A. On receiving a request, each party sends the detailed information to the other in link state update packets, and the receiver confirms receipt with link state acknowledgment (LSAck) packets.

After obtaining the topology, routing device A runs the SPF algorithm to build a shortest path tree rooted at itself and records the resulting routes in its routing table. Subsequent routes to destinations are looked up in this routing table.

Within the area, each routing device continually exchanges topology information with the designated router, so all routing devices in the area have the same topology.

An area border router belongs to multiple areas at the same time. Through it, routing information about the home area of routing device A is advertised to other areas, and routing information about other areas is advertised into the home area. Through these exchanges at the border routing devices, the home area of routing device A learns the routes of the entire AS. In OSPF, the area border routers form the backbone area.

When an AS boundary router learns AS external routes, it advertises them into the AS. As a result, routing device A obtains routing information for the entire network.

OSPF Fast Convergence

The fast convergence function optimizes several steps of the route convergence process to speed it up. The following items are improved:

1. Interval of triggering SPF calculation

Conventionally, route calculation is triggered at an interval of 5 seconds, which slows route convergence. The optimized interval is set automatically according to how frequently the network changes. When the network changes frequently, the interval is increased to avoid repeated route calculations within a short time. When the network changes rarely, the interval is reduced so that route calculation is triggered quickly, speeding up convergence.

2. SPF route calculation

The main feature of the improved route calculation is incremental calculation. The SPF algorithm divides the network information into two parts. One part is the network topology, composed of vertices (corresponding to routing devices and shared network segments) and edges (the links between routing devices and shared network segments); the other part is the leaves attached to the vertices (network routes and host routes). The routing device performing the route calculation is called the root. The first step of route calculation is to calculate a shortest path tree from the root; the second step is to calculate the leaves (routes) attached to the vertices according to the shortest path tree. Incremental calculation of the shortest path tree over the network topology is called incremental SPF (ISPF); incremental calculation of the leaves (routes) is called Partial Route Calculation (PRC). "Incremental calculation" can significantly improve the calculation performance of routing devices and reduce the CPU load.

SPF Calculation Self-Adaptive Timer

To respond quickly to network changes without performing route calculation too frequently, a self-adaptive timer is used. The self-adaptive timer adjusts its interval dynamically according to an exponential backoff rule and preset parameters.

The self-adaptive timer has three configurable parameters: the initial interval, the incremental interval, and the maximum interval. The first interval is the initial interval and the second interval is the incremental interval. After that, each interval is twice the previous one, so the k-th interval (k >= 2) is incremental interval x 2^(k-2), until the maximum interval is reached.

Typically, the initial interval is set to 100 milliseconds so that sudden changes are handled quickly; the incremental interval is set to 100 milliseconds or 1 second; and the maximum interval is set to 5 seconds or 10 seconds.
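The interval sequence produced by the self-adaptive timer can be sketched as a Python generator, using the parameter values suggested above. This is an illustration of the backoff arithmetic only; the function name is hypothetical, and the reset and stop rules listed in this section are not modeled.

```python
from itertools import islice
from typing import Iterator


def spf_intervals(initial_ms: int, incremental_ms: int, maximum_ms: int) -> Iterator[int]:
    """Yield successive SPF wait intervals while calculation requests keep arriving.

    Interval 1 is the initial interval, interval 2 the incremental interval,
    and interval k (k >= 2) is incremental_ms * 2**(k - 2), capped at maximum_ms.
    """
    yield min(initial_ms, maximum_ms)
    interval = incremental_ms
    while True:
        yield min(interval, maximum_ms)
        if interval < maximum_ms:
            interval *= 2  # stop doubling once the cap is reached


print(list(islice(spf_intervals(100, 1000, 5000), 6)))  # [100, 1000, 2000, 4000, 5000, 5000]
print(list(islice(spf_intervals(100, 100, 5000), 8)))   # [100, 100, 200, 400, 800, 1600, 3200, 5000]
```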

The self-adaptive timer is a cyclic timer with increasing intervals. Because the initial interval is short, the timer responds quickly to network changes; because the interval keeps increasing, frequent network changes do not cause frequent route calculations. The self-adaptive timer is reset or stopped in the following three cases:

1. The interval has been at the maximum for three consecutive timer cycles. If a route calculation request is pending at that point, the next interval is reset to the initial interval; otherwise, the timer is stopped.

2. The time between a new route calculation request and the previous route calculation exceeds the maximum interval; the timer interval is then reset to the initial interval.

3. The protocol process is reset.

Incremental SPF (ISPF)



In a standard SPF calculation, a shortest path tree rooted at the calculating routing device is built from the link state database, and routes are then calculated from that tree. OSPF stores link information in its own specific format, which does not directly reflect the topology or the relation between routes and the topology, so the shortest path tree must be built by SPF before routes can be calculated. However, the shortest path tree is not saved: whenever any information changes, the tree is discarded and the SPF algorithm is rerun from scratch.

ISPF processes only the network topology information; that is, it calculates only the shortest path tree. By reorganizing the link information, ISPF builds a graph database that reflects the network topology, and the calculated shortest path tree is stored in this graph. When a link state changes, ISPF determines which part of the topology is affected and recalculates only that part instead of the entire topology.

Figure 11-11 ISPF calculation

As shown in the preceding figure, RTA is the root node (the routing device performing the calculation). When the cost of link RTC->RTD (the blue link) changes from 100 to 50, only RTD and RTE are affected; the other routing devices are not. ISPF determines this affected range, and then only the routes advertised by RTD and RTE are recalculated.

The affected range depends on where in the topology the change occurs, so the time an ISPF calculation takes varies with the location of the change, even within the same network structure.

If a link attached to the root node changes (RTA->RTB or RTA->RTF), the affected range covers the entire topology, and ISPF is then equivalent to a full recalculation.



PRC Technology

In an IGP, every route is a leaf attached to a network node; the term "leaf" reflects this relation between a route and its node. Once the shortest path from the root to a network node is determined, the shortest path to every route advertised by that node is also determined. PRC therefore uses the shortest path tree calculated by ISPF to calculate the leaf routes. When any routing information changes, PRC identifies the changed routes (leaves) and then reselects and updates only those routes, based on the existing ISPF results.

Because of the link information format used by OSPF, routing information is not directly associated with the network node (the advertising routing device), and identical routes advertised by different devices are not associated with each other either. PRC therefore reorganizes the database.

On the one hand, taking each route as the base point, PRC groups together all elements that advertise that route, so that the best route can be selected from them during route calculation. On the other hand, taking each advertiser as the base point, PRC groups together all routes advertised by that advertiser, so that when ISPF reports that the shortest path to a node has changed, all routes advertised by the node can be updated directly.
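The two-way organization described above can be sketched with a pair of Python dictionaries: one keyed by route, listing every advertiser of that route, and one keyed by advertiser, listing every route it advertises. This is a hypothetical illustration of the idea, not the switch's internal data structure; the class name, prefixes, and costs are invented for the example.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


class PrcIndex:
    """Hypothetical two-way index of leaf routes used for partial route calculation."""

    def __init__(self) -> None:
        # Route as base point: prefix -> {advertiser: cost from advertiser to prefix}
        self.by_route: Dict[str, Dict[str, int]] = defaultdict(dict)
        # Advertiser as base point: advertiser -> prefixes it advertises
        self.by_advertiser: Dict[str, Set[str]] = defaultdict(set)
        # Selected route: prefix -> (total cost, advertiser)
        self.best: Dict[str, Tuple[int, str]] = {}

    def add(self, prefix: str, advertiser: str, leaf_cost: int) -> None:
        self.by_route[prefix][advertiser] = leaf_cost
        self.by_advertiser[advertiser].add(prefix)

    def select(self, prefix: str, node_dist: Dict[str, int]) -> None:
        """Pick the best advertiser of one prefix, given shortest path tree costs."""
        candidates = [(node_dist[adv] + cost, adv)
                      for adv, cost in self.by_route[prefix].items()
                      if adv in node_dist]
        if candidates:
            self.best[prefix] = min(candidates)
        else:
            self.best.pop(prefix, None)  # no reachable advertiser left

    def node_changed(self, node: str, node_dist: Dict[str, int]) -> None:
        """Called when ISPF reports a new shortest path to `node`."""
        for prefix in self.by_advertiser.get(node, set()):
            self.select(prefix, node_dist)


node_dist = {"RTA": 0, "RTD": 120, "RTE": 130}  # hypothetical tree costs from RTA
prc = PrcIndex()
prc.add("10.4.0.0/16", "RTD", 10)
prc.add("10.4.0.0/16", "RTE", 1)
prc.add("10.5.0.0/16", "RTE", 5)
for prefix in list(prc.by_route):
    prc.select(prefix, node_dist)
initial = dict(prc.best)

# ISPF reports cheaper paths to RTD and RTE; only their routes are revisited.
node_dist.update(RTD=70, RTE=80)
for node in ("RTD", "RTE"):
    prc.node_changed(node, node_dist)
print(initial)
print(prc.best)
```

Routes advertised by nodes whose shortest path did not change are never touched, which is the saving PRC aims for.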

Link State Database (LSDB) of OSPF

The OSPF LSDB contains information about the entire area. OSPF exchanges information with adjacent neighbors to keep the LSDB synchronized throughout the area, and uses hello packets and link state update packets to handle route changes dynamically.

The LSDB is composed of link state advertisements (LSAs), which fall into six types:

1. Router-LSA: generated by each routing device in the area. It describes the link states of the routing device and is flooded only within the area.

2. Network-LSA: generated by the DR of a multi-access network. It describes the routing devices attached to that network and is flooded only within the area.

3. Summary-Net-LSA: generated by an ABR. It describes networks in other areas.

4. ASBR-Summary-LSA: generated by an ABR. It describes the host route to an ASBR.

5. AS-External-LSA: generated by an ASBR. It describes routes to destinations outside the AS.

6. NSSA-LSA: generated by an ASBR in an NSSA. It describes routes to destinations outside the AS and is flooded only within the NSSA.

An area border router summarizes the information about its local area into summary LSAs, which are flooded to the border routers of the other areas in the AS. The border routers process the received summary LSAs, generate their own summary LSAs, and flood them into each of their attached areas. All area border routers and the links among them form the backbone area. All parts of the backbone must be reachable from one another; they can be connected physically or through virtual links. When a virtual link is configured, the area it passes through must be a transit area, not a stub area.

The ASBR floods external routing information to all parts of the AS except stub areas. Routing devices in a stub area reach external destinations through a default route.

NSSA allows external routes to be advertised into the OSPF AS while the area otherwise keeps its stub characteristics. The ASBR of the NSSA generates NSSA External LSAs (type 7) to advertise external routes. NSSA External LSAs are flooded within the NSSA but terminate at the ABR. When the ABR of the NSSA receives a type 7 LSA with the P bit set to 1, it translates the LSA into a type 5 LSA and floods it to the other areas of the AS. If the P bit is 0, the LSA is not translated, so the NSSA External LSA is not advertised outside the NSSA.

OSPF Packet Encapsulation

An OSPF packet is built from several layers of encapsulation. The outermost layer is the IP header. Inside the IP header, the encapsulated packet can be one of the following five types. Every packet type begins with the same standard OSPF header, and the data that follows varies with the packet type.



Figure 11-12 OSPF packet encapsulation

OSPF Packet Header

Figure 11-13 OSPF packet header

Version: OSPF version, the current version is 2.

Type: the type of packet that follows the OSPF header. OSPF has five packet types: hello (type 1), database description (type 2), link state request (type 3), link state update (type 4), and link state acknowledgment (type 5).



Area ID: the area the packet belongs to; for packets sent over a virtual link, the area ID is 0.0.0.0.

Checksum: the checksum of the entire packet, excluding the authentication field.

AuType: the authentication type used for the packet.

Authentication: the authentication data for the packet, interpreted according to AuType.
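The header can be decoded with Python's struct module. The sketch below assumes the standard 24-byte OSPFv2 header layout from RFC 2328, which also includes the Packet Length and Router ID fields; the sample header is hypothetical (router 1.1.1.1, area 0.0.0.0, null authentication, checksum left at 0).

```python
import ipaddress
import struct

# Version, Type, Packet Length, Router ID, Area ID, Checksum, AuType, Authentication
OSPF_HEADER = struct.Struct("!BBH4s4sHH8s")  # 24 bytes

PACKET_TYPES = {
    1: "Hello",
    2: "Database Description",
    3: "Link State Request",
    4: "Link State Update",
    5: "Link State Acknowledgment",
}


def parse_ospf_header(data: bytes) -> dict:
    if len(data) < OSPF_HEADER.size:
        raise ValueError("truncated OSPF header")
    (version, ptype, length, router_id, area_id,
     checksum, autype, auth) = OSPF_HEADER.unpack_from(data)
    return {
        "version": version,
        "type": PACKET_TYPES.get(ptype, f"unknown ({ptype})"),
        "length": length,
        "router_id": str(ipaddress.IPv4Address(router_id)),
        "area_id": str(ipaddress.IPv4Address(area_id)),
        "checksum": checksum,
        "autype": autype,
        "authentication": auth,
    }


# Hypothetical version 2 hello header; area 0.0.0.0 is the backbone.
sample = OSPF_HEADER.pack(2, 1, 44, ipaddress.IPv4Address("1.1.1.1").packed,
                          bytes(4), 0, 0, bytes(8))
header = parse_ospf_header(sample)
print(header["type"], header["router_id"], header["area_id"])  # Hello 1.1.1.1 0.0.0.0
```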

Hello Packet Format

Figure 11-14 hello packet format



Hello packets are used to establish and maintain neighbor relationships. They carry the parameters on which neighbors must agree before they can form an adjacency.

Network Mask: the network mask of the interface that sends the packet.

Hello Interval: the interval at which the interface sends hello packets.

Options: see the Options field in OSPF packets.

Router Priority: used in the election of the DR and BDR. A routing device with a router priority of 0 is not eligible to become the DR or BDR.

Router Dead Interval: if no hello packet is received from a neighbor within this interval, the neighbor is considered down and is removed.

Designated Router: the IP address (not the router ID) of the DR, as determined by the interface sending the packet.

Backup DR: the IP address of the BDR, as determined by the interface sending the packet.

Neighbor: the router IDs of the neighbors from which the sending interface has received hello packets within the router dead interval.
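The role of Router Priority in the election can be sketched as follows. This is a simplified Python illustration: priority 0 excludes a routing device, the highest priority wins, and ties go to the higher router ID (as in RFC 2328). It deliberately omits the RFC's rules that an existing DR is not pre-empted and that the BDR is chosen before the DR, so it should not be read as the switch's exact election procedure. The router IDs are hypothetical.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Candidate:
    router_id: IPv4Address
    priority: int  # Router Priority advertised in the hello packet


def elect_dr_bdr(routers: Iterable[Candidate]) -> Tuple[Optional[Candidate], Optional[Candidate]]:
    """Simplified election: highest priority wins, ties go to the higher router ID."""
    eligible = sorted(
        (r for r in routers if r.priority > 0),  # priority 0: not eligible
        key=lambda r: (r.priority, r.router_id),
        reverse=True,
    )
    dr = eligible[0] if eligible else None
    bdr = eligible[1] if len(eligible) > 1 else None
    return dr, bdr


# Hypothetical segment with three routing devices.
segment = [
    Candidate(IPv4Address("1.1.1.1"), 1),
    Candidate(IPv4Address("2.2.2.2"), 1),
    Candidate(IPv4Address("3.3.3.3"), 0),
]
dr, bdr = elect_dr_bdr(segment)
print(dr.router_id, bdr.router_id)  # 2.2.2.2 1.1.1.1
```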

Database Description Packet



Figure 11-15 format of the database description packets

Interface MTU: the size, in bytes, of the largest IP packet that the sending interface can transmit without fragmentation. For packets sent over a virtual link, the interface MTU is set to 0.

Options: see the Options field in OSPF packets.

I-bit: Init bit; set to 1 in the first packet of a sequence of DD packets.

M-bit: More bit; set to 1 when more DD packets follow, and set to 0 in the last packet of the DD packet sequence.

MS-bit: Master/Slave bit; set to 1 when the sending routing device is the master, and to 0 when it is the slave.

DD Sequence Number: the sequence number of the DD packet, set by the master.

LSA Headers: a list of headers of the LSAs in the sender's link state database.
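The I, M, and MS bits occupy the low three bits of the DD packet flags byte (values 0x04, 0x02, and 0x01 in RFC 2328). The Python sketch below encodes and decodes them; the names are illustrative. In RFC 2328, the first DD packet each side sends during negotiation has all three bits set (0x07), and the routing device with the higher router ID becomes the master.

```python
from typing import NamedTuple

# Bit values in the DD packet flags byte (RFC 2328, Appendix A.3.3).
I_BIT, M_BIT, MS_BIT = 0x04, 0x02, 0x01


class DDFlags(NamedTuple):
    init: bool    # I-bit: first packet of the DD sequence
    more: bool    # M-bit: more DD packets follow
    master: bool  # MS-bit: sender is the master


def decode_dd_flags(flags: int) -> DDFlags:
    return DDFlags(bool(flags & I_BIT), bool(flags & M_BIT), bool(flags & MS_BIT))


def encode_dd_flags(f: DDFlags) -> int:
    return (I_BIT if f.init else 0) | (M_BIT if f.more else 0) | (MS_BIT if f.master else 0)


print(decode_dd_flags(0x07))  # DDFlags(init=True, more=True, master=True)
print(decode_dd_flags(0x01))  # DDFlags(init=False, more=False, master=True)
```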

Link State Request Packet



Figure 11-16 Format of the link state request packets

Link State Type: the type of the requested LSA.

Link State ID: together with the link state type and the advertising router, identifies an LSA.

Advertising Router: the router ID of the routing device generating the LSA.

Format of the Link State Update Packet



Figure 11-17 Format of the link state update packets

Number of LSAs: the number of LSAs contained in the packet.

LSAs: the list of LSAs being updated.

Format of the Link State Acknowledgment Packet



Figure 11-18 Format of the link state acknowledgement packets

LSA Headers: the headers of the LSAs being acknowledged.

LSA header

Figure 11-19 LSA header

Age: the time, in seconds, since the LSA was originated.

Options: see the Options field in OSPF packets.

Link State ID: together with the LS type and the advertising router, identifies an LSA.

Advertising Router: the router ID of the routing device that originated the LSA.

Sequence Number: the sequence number of the LSA; it increases each time a new instance of the LSA is originated.

Checksum: the checksum of the LSA contents, excluding the Age field.

Length: the length of the LSA in bytes, including the 20-byte LSA header.
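A minimal Python sketch of decoding the 20-byte LSA header and deciding which of two instances of the same LSA is newer. Per RFC 2328, the sequence number is a signed 32-bit value whose initial value is 0x80000001, so it must be unpacked as signed for the comparison to work. The full RFC rule for choosing the more recent instance also compares checksum and age; this simplified version compares sequence numbers only. The header values are hypothetical and the checksum is left at 0.

```python
import ipaddress
import struct
from typing import NamedTuple, Tuple

# Age, Options, LS type, Link State ID, Advertising Router, Sequence (signed), Checksum, Length
LSA_HEADER = struct.Struct("!HBB4s4siHH")  # 20 bytes


class LsaHeader(NamedTuple):
    age: int
    options: int
    ls_type: int
    link_state_id: ipaddress.IPv4Address
    advertising_router: ipaddress.IPv4Address
    seq: int  # signed 32-bit LS sequence number
    checksum: int
    length: int

    def key(self) -> Tuple[int, ipaddress.IPv4Address, ipaddress.IPv4Address]:
        """LS type, Link State ID, and Advertising Router together identify an LSA."""
        return (self.ls_type, self.link_state_id, self.advertising_router)


def parse_lsa_header(data: bytes) -> LsaHeader:
    age, opts, ls_type, lsid, adv, seq, csum, length = LSA_HEADER.unpack_from(data)
    return LsaHeader(age, opts, ls_type, ipaddress.IPv4Address(lsid),
                     ipaddress.IPv4Address(adv), seq, csum, length)


def is_newer(a: LsaHeader, b: LsaHeader) -> bool:
    """Simplified: compare signed sequence numbers only."""
    if a.key() != b.key():
        raise ValueError("headers describe different LSAs")
    return a.seq > b.seq


rid = ipaddress.IPv4Address("1.1.1.1").packed
first = parse_lsa_header(LSA_HEADER.pack(1, 0x02, 1, rid, rid, -0x7FFFFFFF, 0, 36))
second = parse_lsa_header(LSA_HEADER.pack(1, 0x02, 1, rid, rid, -0x7FFFFFFE, 0, 36))
print(hex(first.seq & 0xFFFFFFFF))  # 0x80000001
print(is_newer(second, first))      # True
```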

Format of Router LSA Packet

Figure 11-20 Format of the router LSA packet

V: Virtual Link Endpoint bit; set the bit when the routing device generating

the packet is one end of a virtual link



E: External bit, set the bit when the routing device generating the packets

is ASBR

B: Border bit; set when the routing device generating the packet is an area border router (ABR).

Number of Links: number of links described in LSA

Link ID: identify a link of the routing device

Link Data: the data of the link, the meaning varies with the link type

Link type: the type of the link

Number of TOS: the number of TOS-specific metrics given for the link; the field is kept for compatibility with earlier versions of the protocol.

Metric: cost of the link

TOS: Type of the service

TOS Metric: the cost related with the service type

Format of Network LSA Packet



Figure 11-21 Format of the Network LSA packet

Link State ID: for the Network LSA, it is the IP address of the DR interface

Network Mask: the subnet mask identifying network

Attached Router: the router IDs of the routing devices on the network that are fully adjacent to the DR, including the DR itself.

Format of the Network and ASBR Summary LSA Packet

Figure 11-22 Format of the Network and ASBR summary LSA packet

Link State ID: for type 3 LSA, it is the IP address of the advertised

network or subnet; for type 4 LSA, it is the router ID of the advertised

ASBR.

Network Mask: for type 3 LSAs, the mask of the advertised network or subnet; for type 4 LSAs, this field is set to 0.

Metric: the cost of the destination route

TOS: Type of the service

TOS Metric: the metric related with the service type



Format of the Autonomous System External LSA Packet

Figure 11-23 Format of the Autonomous System External LSA packet

Link State ID: for the ASE LSA, it is the IP address of the destination

Network Mask: the network or subnet mask of the advertised destination

E: External metric bit, indicating the type of external metric used by the route. If the E bit is 1, the metric type is E2; if the E bit is 0, the metric type is E1.

Metric: the cost of the route, set by the ASBR.

Forwarding Address: the address to which data traffic for the advertised destination should be forwarded. If the forwarding address is 0.0.0.0, the traffic should be sent to the ASBR that originated the LSA.



External route tag: the tag of the external route
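The effect of the E bit on route selection can be sketched as a sort key. Following RFC 2328, a type 1 (E1) path is compared on internal cost plus external metric, a type 2 (E2) path on the external metric alone with the internal cost as a tie-breaker, and E1 paths are always preferred over E2 paths. The ASBR names and costs are hypothetical, and the sketch ignores the forwarding address and other details of the full RFC procedure.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class ExternalPath:
    asbr: str             # hypothetical ASBR name
    internal_cost: int    # OSPF cost from this routing device to the ASBR
    external_metric: int  # Metric field of the AS-External LSA
    e_bit: bool           # True: type 2 (E2) metric; False: type 1 (E1)


def preference(path: ExternalPath) -> Tuple[int, int, int]:
    """Sort key: lower is better."""
    if path.e_bit:
        # E2: external metric only; the internal cost breaks ties.
        return (1, path.external_metric, path.internal_cost)
    # E1: internal cost plus external metric; E1 always beats E2.
    return (0, path.internal_cost + path.external_metric, 0)


def best_external(paths: Iterable[ExternalPath]) -> ExternalPath:
    return min(paths, key=preference)


e2_only: List[ExternalPath] = [
    ExternalPath("ASBR1", internal_cost=10, external_metric=20, e_bit=True),
    ExternalPath("ASBR2", internal_cost=30, external_metric=20, e_bit=True),
]
e1 = ExternalPath("ASBR3", internal_cost=100, external_metric=50, e_bit=False)

print(best_external(e2_only).asbr)         # ASBR1 (same E2 metric, lower internal cost)
print(best_external(e2_only + [e1]).asbr)  # ASBR3 (E1 preferred over E2)
```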

Format of NSSA External LSA Packet

Figure 11-24 Format of the NSSA External LSA packet

All fields other than Forwarding Address have the same meaning as in the AS-External LSA.

Forwarding Address: if the network between the NSSA AS boundary router and the adjacent AS is advertised into OSPF as an internal OSPF route, the forwarding address is the next-hop address. Otherwise, the forwarding address is the address of an interface of the routing device.



Options Field in OSPF Packets

Figure 11-25 Option domain of the OSPF packets

*: undefined; must be set to 0.

DC: set when the originating routing device supports OSPF over demand circuits.

EA: set when the originating routing device can receive and forward External-Attributes-LSAs.

N: used only in hello packets; set to 1 when the routing device supports NSSA External LSAs and to 0 when it does not. When N is set to 1, the E bit must be 0.

P: used only in NSSA External LSA headers. If the P bit is set, the ABR of the NSSA must translate the type 7 LSA into a type 5 LSA.

MC: set when the originating routing device forwards IP multicast packets.

E: describes how AS-External LSAs are flooded; set when the originating routing device accepts AS-External LSAs (that is, its attached area is not a stub area).
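The option bits can be decoded with Python's enum.IntFlag. The bit values below follow RFC 2328, Appendix A.2, in which N and P share the same bit (0x08) and are distinguished by context: N in hello packets, P in NSSA External LSA headers. The function names are illustrative only.

```python
from enum import IntFlag
from typing import List


class OspfOption(IntFlag):
    """Options field bits (RFC 2328, Appendix A.2)."""
    E = 0x02
    MC = 0x04
    NP = 0x08  # N in hello packets, P in NSSA External LSA headers
    EA = 0x10
    DC = 0x20


def decode_options(value: int, in_hello: bool) -> List[str]:
    names = []
    for opt in OspfOption:
        if value & opt:
            if opt is OspfOption.NP:
                names.append("N" if in_hello else "P")
            else:
                names.append(opt.name)
    return names


def hello_options_valid(value: int) -> bool:
    """In a hello packet, the N bit and the E bit must not both be set."""
    return not (value & OspfOption.NP and value & OspfOption.E)


print(decode_options(0x22, in_hello=True))   # ['E', 'DC']
print(decode_options(0x08, in_hello=True))   # ['N']
print(decode_options(0x08, in_hello=False))  # ['P']
print(hello_options_valid(0x0A))             # False: N and E both set
```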

OSPF Features

1. OSPF is an IGP, designed for use within an AS.

2. Link state advertisement packets are small; each advertisement describes only one part of the link state database.

3. Support for NBMA networks: OSPF treats such a network like a LAN, electing a DR and generating a network LSA. Some configuration is required for routing devices to discover their neighbors on the network.

4. In OSPF, an AS can be divided into multiple areas, which has the following advantages: 1) intra-area routes and inter-area routes are kept separate; 2) dividing the AS into areas reduces the amount of SPF calculation.

5. Flexible import of external information: each external route is imported into the AS in a separate LSA, which reduces the volume of flooded data. As a result, when a single route changes, only part of the routing table needs to be updated.

6. Four levels of routes: intra-area, inter-area, external type 1, and external type 2. This provides multiple levels of routing protection and simplifies route management in the AS.

7. Support for virtual links: by configuring virtual links, OSPF can partly remove the restrictions that the physical topology imposes on the AS.

8. Routing protocol authentication: when an OSPF routing device receives a routing protocol packet, it checks the authentication information in the packet.

9. Flexible metric: in OSPF, the metric is the output cost of a routing device interface, and the cost of a path is the sum of the output costs of all interfaces along it. The administrator can assign route metrics according to network characteristics such as delay, bandwidth, and monetary cost.

10. Equal-cost multipath: when there are multiple paths with the same cost to the same destination, OSPF finds all of them and balances the load across them.

11. Support for subnets of different lengths: OSPF supports variable-length subnets by advertising each destination together with its network mask.

12. Support for stub areas: external LSAs are not flooded into an area configured as a stub area; within a stub area, routes to external destinations are provided by a default route.

Resource Cost of OSPF

Link bandwidth: in OSPF, the reliable flooding mechanism keeps the link state databases of the routing devices synchronized. When the topology does not change, each LSA is refreshed only at a long interval (30 minutes by default). As the database grows, however, the bandwidth consumed by flooding grows with it.

Routing device memory: the OSPF link state database can become very large, especially when many external link states are advertised, in which case the routing device needs a large amount of memory. Updating and synchronizing the link state database also consumes considerable memory.

CPU usage: in OSPF, CPU usage depends on how long the SPF algorithm takes to run, which in turn depends on the number of routing devices in the OSPF system. In addition, when the link state database is very large and many packets must be exchanged during protocol convergence, a great deal of CPU time is consumed.



Designated router role: the designated router on a multi-access network sends and receives more packets than the other routing devices, and when it fails, a new designated router must take over. For this reason, the number of routing devices attached to a single network should be limited.

Precautions for OSPF

Limiting the size of the OSPF system saves routing device memory.

To reduce the database size within an area, you can: 1. use a default route in the area, which reduces the number of external routes that must be imported; 2. let the exterior gateway protocol (EGP) carry its own information across the OSPF AS instead of relying on the IGP (such as OSPF) to transport it; 3. configure the area as a stub area; 4. summarize external addresses when they are contiguous; after summarization, the amount of external information in OSPF decreases dramatically.
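The effect of summarizing contiguous external addresses can be illustrated with Python's ipaddress.collapse_addresses, which merges only prefixes that exactly fill a larger block, so the summary never covers address space that was not learned. The prefixes are hypothetical, and the sketch shows only the address arithmetic, not the switch's summarization commands.

```python
import ipaddress

# Hypothetical external prefixes that an ASBR would otherwise import one by one.
external_routes = [ipaddress.IPv4Network(p) for p in (
    "10.1.0.0/24", "10.1.1.0/24", "10.1.2.0/24", "10.1.3.0/24", "192.168.5.0/24",
)]

summary = list(ipaddress.collapse_addresses(external_routes))
print(len(external_routes), "->", len(summary))  # 5 -> 2
print([str(net) for net in summary])             # ['10.1.0.0/22', '192.168.5.0/24']
```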

Suitable Environment

OSPF is suitable for a transit AS, because: 1. OSPF can carry a large number of external routes; 2. importing external information into OSPF is flexible, including the forwarding address in the AS External LSA and two types of external cost (ext type 1 and ext type 2); 3. OSPF handles changes in external information efficiently.
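A minimal Python sketch of how the two external cost types compare, assuming the rules of RFC 2328: a type 1 cost is the internal cost to the ASBR plus the external cost, a type 2 cost is the external cost alone (the internal cost only breaks ties), and type 1 routes are preferred over type 2 routes. The ASBR names and costs are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ExternalPath:
    asbr: str            # hypothetical ASBR name
    internal_cost: int   # OSPF cost from this router to the ASBR
    external_cost: int   # metric carried in the AS External LSA
    ext_type: int        # 1 or 2


def preference_key(p: ExternalPath):
    """Smaller key = more preferred."""
    if p.ext_type == 1:
        # Type 1: internal and external costs are added together.
        return (1, p.internal_cost + p.external_cost, 0)
    # Type 2: only the external cost is compared; the internal cost breaks ties.
    return (2, p.external_cost, p.internal_cost)


def best_path(paths):
    return min(paths, key=preference_key)


far = ExternalPath("ASBR-1", internal_cost=50, external_cost=20, ext_type=2)
near = ExternalPath("ASBR-2", internal_cost=5, external_cost=30, ext_type=2)
print(best_path([far, near]).asbr)  # ASBR-1 (type 2: external cost 20 < 30)
```

Had both routes been type 1, ASBR-2 would win (5 + 30 = 35 against 50 + 20 = 70), which is the practical difference between the two types.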

OSPF is also suitable for a small, independent AS or a stub AS, because: 1. it converges quickly; 2. it supports multiple equal-cost paths to the same destination.

Unsuitable Environment

OSPF has limited ability to express routing policy. Its only policy mechanism is the fixed preference among four route types: intra-area, inter-area, type 1 external, and type 2 external routes. When more complex policy is needed between ASs, run a policy-based EGP between them.
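To make the limited policy model concrete, the sketch below (Python, assuming the RFC 2328 preference order) selects among routes to one destination: the route type is compared first and cost only decides between routes of the same type, so no cost setting can make an external route beat an inter-area route. The route labels are hypothetical.

```python
from enum import IntEnum


class RouteType(IntEnum):
    # Lower value = more preferred, regardless of cost.
    INTRA_AREA = 1
    INTER_AREA = 2
    EXTERNAL_TYPE1 = 3
    EXTERNAL_TYPE2 = 4


def select_route(candidates):
    """candidates: (RouteType, cost, label) tuples for one destination.

    The route type is compared first; cost only decides between
    routes of the same type.
    """
    return min(candidates, key=lambda c: (c[0], c[1]))


candidates = [
    (RouteType.EXTERNAL_TYPE1, 5, "via ASBR"),   # hypothetical labels
    (RouteType.INTER_AREA, 100, "via ABR"),
]
print(select_route(candidates)[2])  # via ABR, despite its higher cost
```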



IS-IS Dynamic Routing Protocol

Main contents:

Terms of IS-IS protocol

Introduction to the IS-IS protocol

Typical application of the IS-IS protocol

Terms of IS-IS Protocol

PDU- Protocol Data Unit, the packet unit carrying protocol data

SPF- Shortest Path First Algorithm

IS- Intermediate System, equivalent to a router in TCP/IP; the basic unit that generates routes and propagates routing information. Hereinafter, IS and router have the same meaning.

ES- End System, equivalent to a host in TCP/IP. An ES does not take part in IS-IS routing; ISO defines a separate ES-IS protocol for communication between end systems and ISs.

NET-Network Entity Title, identifies the ISO address of an IS, similar to

the IP address; it can be divided into area ID and system ID.

Area- a routing area in the IS-IS protocol; there are level-1 areas and the level-2 area.

LSP- Link State PDU, carries the link state information that should be

published, including adjacency information and reachable subnet

information.

LSDB- Link State Database, composed of the LSPs generated by all ISs in the area; it describes the adjacency topology and related routing information of the entire area. Every IS in the area holds an identical copy of the LSDB and calculates routes from its own copy using the SPF algorithm.

IIH- Intermediate System to Intermediate System Hello PDU, for

discovering and keeping alive the IS neighbor

SNP- Sequence Number PDU, carries summary information for a group of LSPs. There are two types, PSNP and CSNP; they are used to acknowledge LSPs, request LSPs, and summarize the contents of the LSDB.



PSNP- Partial Sequence Number PDU, one type of SNP; used to acknowledge LSPs (on point-to-point networks) and to request LSPs (on broadcast networks).

CSNP- Complete Sequence Number PDU, one type of SNP packet, used

for advertising the abbreviated description information of the LSDB

Pseudo-node- a virtual IS node created by the DIS on a broadcast network; it simplifies the adjacency topology of the broadcast network.

DIS- Designated IS, an IS elected from all ISs on a broadcast network; it is responsible for creating (virtualizing) the pseudo-node and for keeping the LSDBs of all ISs on the broadcast network synchronized.

Introduction to the IS-IS Protocol

Intermediate System to Intermediate System (IS-IS) is an interior gateway protocol (IGP) based on the SPF algorithm. Its basic design and algorithm are similar to those of OSPF. IS-IS runs directly over the link layer and is independent of the network layer (IPv4, IPv6, or OSI), so it is not restricted by any particular network layer and is easy to extend.

The IS-IS protocol supports routing for multiple protocol stacks, including IPv4, IPv6, and OSI. IS-IS was originally designed for the OSI protocol stack (ISO10589). Extensions apply it to IPv4 routing (RFC1195) and IPv6 routing (draft-ietf-isis-ipv6), and a further extension supports CSPF calculation for MPLS-TE (RFC3784).

IS-IS offers the following advantages: good compatibility (devices with different extensions interoperate), large network capacity, support for multiple protocol stacks, smooth upgrades, and a simple, stable protocol. IS-IS is therefore well suited to large-scale core backbone networks.

This chapter describes IS-IS for IPv4 and IPv6. OSI routing is not widely used, so it is not described in this document.



IS-IS Protocol Stack Structure and the Position in the Network Protocol Stack

Figure 11-26 Structure of the IS-IS protocol stack

As shown in the preceding figure, the IS-IS protocol consists of a basic part and an application part. The basic part maintains the topology of the entire network and uses the SPF algorithm to calculate the shortest path to each IS in the network. Once the shortest path to each IS is known, routes are generated from the reachable subnets (IPv4, IPv6, or OSI, such as 10.0.0.0/8) advertised by each IS; for example, the path to subnet 10.0.0.0/8 is the shortest path to the IS that publishes that subnet.

Figure 11-27 Position of IS-IS protocol in the network protocol stack

As shown in the preceding figure, the IS-IS protocol runs directly over the link layer, independent of the network layer of the IPv4, IPv6, and OSI protocol stacks. On broadcast networks, IS-IS packets are sent as multicast. On Ethernet, IS-IS uses the following multicast MAC addresses.

Table 1-6 Multicast addresses used by IS-IS

Address Name              Multicast MAC Address   Description
AllL1ISs                  01-80-C2-00-00-14       Multicast MAC address of level-1 IS-IS packets
AllL2ISs                  01-80-C2-00-00-15       Multicast MAC address of level-2 IS-IS packets
AllIntermediateSystems    09-00-2B-00-00-05       Multicast MAC address of all ISs
AllEndSystems             09-00-2B-00-00-04       Multicast MAC address of all ESs

IS-IS Packet Structure

Figure 11-28 IS-IS packet structure

As shown in the preceding figure, IS-IS runs directly over the link layer, so IS-IS packets are encapsulated in link layer frames. The routing information carried in IS-IS packets is organized as TLVs, which can be arranged and extended flexibly. Each TLV consists of a type (1 byte), a length (1 byte), and a value (0-255 bytes). In addition, the IS-IS protocol requires that a TLV which cannot be recognized be ignored (skipped) rather than causing the packet to be dropped.
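The TLV layout and the ignore-unknown rule can be sketched as a small Python parser. This is an illustration under stated assumptions, not the switch's code: it walks a byte string of TLVs, keeps the types it is told it understands, and skips the rest. The sample type codes 132 (IP interface address) and 137 (dynamic hostname) come from the public IS-IS TLV registry, and type 250 stands in for a TLV the receiver does not recognize.

```python
def parse_tlvs(data, known_types):
    """Walk IS-IS style TLVs: 1-byte type, 1-byte length, then `length` bytes of value.

    Returns (known, skipped): recognized TLVs as (type, value) pairs, and the
    type codes of unrecognized TLVs, which are skipped instead of rejecting the packet.
    """
    known, skipped = [], []
    offset = 0
    while offset < len(data):
        if offset + 2 > len(data):
            raise ValueError("truncated TLV header")
        tlv_type, length = data[offset], data[offset + 1]
        value = data[offset + 2 : offset + 2 + length]
        if len(value) != length:
            raise ValueError("truncated TLV value")
        if tlv_type in known_types:
            known.append((tlv_type, value))
        else:
            skipped.append(tlv_type)  # unknown type: ignore it and move on
        offset += 2 + length
    return known, skipped


# Sample payload: TLV 137 (hostname), unknown TLV 250, TLV 132 (IP interface address).
payload = (
    bytes([137, 4]) + b"SW-A"
    + bytes([250, 2, 0xAB, 0xCD])
    + bytes([132, 4, 10, 0, 0, 1])
)
known, skipped = parse_tlvs(payload, known_types={132, 137})
print([t for t, _ in known])  # [137, 132]
print(skipped)                # [250]
```

Because the length byte always says how far to jump, a receiver can step over a TLV it has never seen, which is what makes adding new TLVs backward compatible.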

Because IS-IS runs over the link layer independently of the network layer, organizes its routing information flexibly as TLVs, and ignores TLVs it cannot recognize, it is easy to extend and can be upgraded smoothly.

The IS-IS protocol packets are listed in the following table.

Table 1-7 IS-IS protocol packets

IS-IS PDU   Packet Type                              Type Code   Function
IIH         Level 1 LAN IS to IS Hello PDU           15          Discover and keep alive level-1 neighbors on a broadcast network
IIH         Level 2 LAN IS to IS Hello PDU           16          Discover and keep alive level-2 neighbors on a broadcast network
IIH         Point-to-Point IS to IS Hello PDU        17          Discover and keep alive level-1 and level-2 neighbors on a point-to-point network
LSP         Level 1 Link State PDU                   18          Publish routing information in a level-1 area
LSP         Level 2 Link State PDU                   20          Publish routing information in the level-2 area
CSNP        Level 1 Complete Sequence Numbers PDU    24          Advertise the LSDB summary to level-1 neighbors
CSNP        Level 2 Complete Sequence Numbers PDU    25          Advertise the LSDB summary to level-2 neighbors
PSNP        Level 1 Partial Sequence Numbers PDU     26          Request or acknowledge LSPs from level-1 neighbors
PSNP        Level 2 Partial Sequence Numbers PDU     27          Request or acknowledge LSPs from level-2 neighbors
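The type codes in Table 1-7 are carried in the PDU Type field of the IS-IS common header. The following Python sketch assumes the ISO 10589 common header layout (first octet 0x83 as the protocol discriminator, PDU type in the low five bits of the fifth octet) and maps a header to the names in the table; the sample header bytes are made up for illustration.

```python
ISIS_DISCRIMINATOR = 0x83  # first octet of every IS-IS PDU

PDU_TYPES = {
    15: ("IIH", "Level 1 LAN IS to IS Hello PDU"),
    16: ("IIH", "Level 2 LAN IS to IS Hello PDU"),
    17: ("IIH", "Point-to-Point IS to IS Hello PDU"),
    18: ("LSP", "Level 1 Link State PDU"),
    20: ("LSP", "Level 2 Link State PDU"),
    24: ("CSNP", "Level 1 Complete Sequence Numbers PDU"),
    25: ("CSNP", "Level 2 Complete Sequence Numbers PDU"),
    26: ("PSNP", "Level 1 Partial Sequence Numbers PDU"),
    27: ("PSNP", "Level 2 Partial Sequence Numbers PDU"),
}


def classify_pdu(header):
    """Return (category, name) for the 8-byte IS-IS common header."""
    if len(header) < 8 or header[0] != ISIS_DISCRIMINATOR:
        raise ValueError("not an IS-IS common header")
    pdu_type = header[4] & 0x1F  # low 5 bits; the top 3 bits are reserved
    return PDU_TYPES.get(pdu_type, ("unknown", f"type {pdu_type}"))


# Made-up header of a level-1 LSP.
print(classify_pdu(bytes([0x83, 27, 1, 0, 18, 1, 0, 0])))  # ('LSP', 'Level 1 Link State PDU')
```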

NET of IS

Figure 11-29 IS-IS NET

Even when IS-IS is used for TCP/IP routing, it remains an ISO CLNP protocol. OSPF identifies a routing device by its router ID; IS-IS identifies a routing device (IS) by an ISO network address, the NET (Network Entity Title). The structure of the NET is shown in the preceding figure. The example in the figure is: NET 47.0000.0000.0000.0011.00.

The area ID identifies a level-1 area. The level-2 area is the backbone of the network, and only one level-2 area is allowed, so it does not need an ID.

The system ID identifies an IS within an area. It must be unique within the IS-IS AS.

SEL (NSAP Selector, also called N-SEL) is similar to the protocol field in IP: different transport protocols correspond to different SEL values. In IS-IS, the SEL of a NET is always 00.

Note: The description of the NET above applies to IS-IS used for TCP/IP routing. The NET is defined in ISO8348.
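A short Python sketch that splits a NET into the three parts described above. It assumes the common 6-byte system ID, treats everything before the system ID as the area ID, and rejects a non-zero SEL; the helper names are hypothetical.

```python
def _group(raw, first):
    """Render bytes as dotted hex: `first` bytes on their own, then 2-byte groups."""
    parts = [raw[:first].hex()] if first else []
    parts += [raw[i:i + 2].hex() for i in range(first, len(raw), 2)]
    return ".".join(parts)


def parse_net(net):
    """Split a NET into area ID, system ID and SEL (assumes a 6-byte system ID)."""
    raw = bytes.fromhex(net.replace(".", ""))  # raises ValueError on bad hex
    if len(raw) < 8:
        raise ValueError("NET needs area ID (>=1 byte) + system ID (6) + SEL (1)")
    area, system_id, sel = raw[:-7], raw[-7:-1], raw[-1]
    if sel != 0:
        raise ValueError("the SEL of a NET must be 00")
    return {
        "area": _group(area, 1 if len(area) % 2 else 0),
        "system_id": _group(system_id, 0),
        "sel": f"{sel:02x}",
    }


print(parse_net("47.0000.0000.0000.0011.00"))
# {'area': '47.0000', 'system_id': '0000.0000.0011', 'sel': '00'}
```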



Hierarchical Topology of IS-IS

Figure 11-30 Hierarchical topology of IS-IS

Area Division of IS-IS Routing Domain

The preceding figure illustrates the two-level network topology of the IS-IS protocol. A typical IS-IS network consists of a level-2 area serving as the core backbone network and multiple level-1 areas serving as access networks. Each level-1 area connects to the level-2 area through one or more level-2 switches, and the level-1 areas are interconnected through the level-2 area, forming a two-level network topology. An IS-IS network may also consist of only one level-1 area or only one level-2 area; no finer area division is required.

Route Learning in the IS-IS Area

Each area has its own independent LSDB and runs its own independent SPF calculation. Dividing the network into areas splits it into many small routing domains, which reduces the size of each LSDB and therefore the memory and processing consumed by SPF calculation. However, this raises a new problem: SPF calculation can only learn routes within an area. How are routes learned between areas?

Route Learning Between the IS-IS Areas

According to the preceding topology, the level-1 areas are interconnected through the level-2 area. Once routing between the level-1 areas and the level-2 area is solved, the entire network can be interconnected.

The level-1 and level-2 areas are connected through level-2 switches. A level-2 switch runs both the level-1 and the level-2 IS-IS protocols at the same time, so routing between the two levels is handled on the level-2 switch: it advertises the routes learned from the level-1 area into the level-2 area, and it advertises the attach tag (ATT bit) into the level-1 area to show that it is connected to the level-2 core network.

Learning Routes of Level-2 Area Reaching Level-1 Area

On the level-2 switch, the level-1 routes calculated by level-1 SPF are redistributed into the level-2 routing information and published. As a result, all switches in the level-2 area learn routes to all subnets in the level-1 area.

Learning Routes of Level-1 Area Reaching Level-2 Area

The level-2 switch sets the attach tag in the level-1 routing information it publishes, indicating that it is connected to the level-2 core network. As a result, every switch in the level-1 area generates a default route toward the level-2 switch, and so has a default route to the level-2 area.
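The level-1 side of this behavior can be sketched in Python. The sketch assumes each level-1 switch already has level-1 SPF costs to every IS in its area, and that it points its default route at the closest IS whose LSP carries the attach tag; switch names and costs are hypothetical, and equal-cost choices are broken arbitrarily by name.

```python
def default_route_target(spf_cost, attached):
    """Choose where a level-1 switch points its default route.

    spf_cost: level-1 SPF cost from this switch to each IS in the area.
    attached: ISs whose level-1 LSP carries the attach tag.
    Returns (name, cost) of the closest attached IS, or None if none is reachable.
    Ties are broken by name here, an arbitrary choice for the sketch.
    """
    candidates = [(cost, name) for name, cost in spf_cost.items() if name in attached]
    if not candidates:
        return None
    cost, name = min(candidates)
    return name, cost


# Hypothetical area: one level-1 switch and two level-2 switches.
costs = {"L1-B": 10, "L2-X": 20, "L2-Y": 30}
print(default_route_target(costs, attached={"L2-X", "L2-Y"}))  # ('L2-X', 20)
```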

Creation of Neighbors and Generation of Adjacency Information in the IS-IS Protocol

For the IS-IS protocol, interface networks are classified into point-to-point networks and broadcast networks. Neighbor creation and adjacency generation differ between these two network types.

Designated IS

The designated IS (DIS) exists only on broadcast networks. It is elected by all ISs on the same broadcast network, based on the priority of each IS's interface on that network and on its SNPA address (the MAC address on Ethernet; the system ID on other networks). The IS with the highest priority becomes the DIS; if priorities are equal, the IS with the greater SNPA address is selected.
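The election rule above amounts to taking the maximum over (priority, SNPA). A minimal Python sketch, assuming Ethernet SNPAs written as MAC addresses; the switch names and priority values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LanCandidate:
    name: str       # hypothetical switch name
    priority: int   # priority of the interface on this broadcast network
    snpa: str       # SNPA; on Ethernet this is the MAC address


def elect_dis(candidates):
    """Highest priority wins; equal priorities are decided by the greater SNPA."""
    def key(c):
        mac = int(c.snpa.replace("-", "").replace(":", ""), 16)
        return (c.priority, mac)
    return max(candidates, key=key)


lan = [
    LanCandidate("A", 64, "00-01-7a-00-00-01"),
    LanCandidate("B", 64, "00-01-7a-00-00-03"),
    LanCandidate("C", 10, "00-01-7a-00-00-ff"),
]
print(elect_dis(lan).name)  # B: ties with A on priority and has the greater MAC
```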

The functions of the DIS are as follows: 1. create the pseudo-node, and generate and publish the adjacency information of the pseudo-node; 2. send CSNP packets periodically to keep the LSDBs of all ISs on the broadcast network synchronized.

Pseudo-node



The pseudo-node exists only on broadcast networks. Its purpose is to simplify the adjacency topology used for route calculation. It is generated by the DIS. The pseudo-node is adjacent to every IS on the broadcast network, but it has no neighbors of its own. The adjacency information that includes the pseudo-node forms the adjacency topology, as shown in the preceding figure.

Neighbor ID

Figure 11-31 IS-IS neighbor ID

Each network node in the adjacency topology is identified in the LSDB by its neighbor ID, as shown in the preceding figure. There are two types of nodes in the adjacency topology: 1. IS: in its neighbor ID, the system ID is its own system ID and the circuit ID (Circ ID) is always 0x00; 2. Pseudo-node, created by the DIS: in its neighbor ID, the system ID is the system ID of the DIS, and the circuit ID is the ID of the DIS interface that generates the pseudo-node. This circuit ID must be non-zero so that the pseudo-node's neighbor ID can be distinguished from that of an IS.
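The neighbor ID format can be expressed in a few lines of Python. The sketch assumes the dotted system ID notation used in Table 1-8 and a one-byte circuit ID; the function names are hypothetical.

```python
def neighbor_id(system_id, circuit_id=0):
    """Build a neighbor ID: system ID followed by a one-byte circuit ID."""
    if not 0 <= circuit_id <= 0xFF:
        raise ValueError("circuit ID must fit in one byte")
    return f"{system_id}.{circuit_id:02x}"


def is_pseudonode(nid):
    """A non-zero circuit ID marks a pseudo-node created by a DIS."""
    return int(nid.rsplit(".", 1)[1], 16) != 0


print(neighbor_id("0000.0000.0002"))       # 0000.0000.0002.00 (IS B itself)
print(neighbor_id("0000.0000.0002", 2))    # 0000.0000.0002.02 (pseudo-node of DIS B)
print(is_pseudonode("0000.0000.0002.02"))  # True
```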

Concepts of Neighbor and Adjacency

Figure 11-32 Relationship between neighbor and adjacency in IS-IS broadcast network

Key Words and Descriptions

Neighbor: Discovered and kept alive through hello packets (IIH). A neighbor represents the physical connection between ISs.

Adjacency: The local topology that an IS advertises to the entire IS-IS routing domain. It describes the directly reachable network nodes (IS or pseudo-node) and is carried in LSPs. The LSPs of all ISs form the LSDB, which describes the entire network topology for SPF route calculation.

Relationship between neighbor and adjacency: Adjacencies are generated from neighbors. On a point-to-point network, the adjacency topology is the same as the neighbor topology. On a broadcast network, as shown in the preceding figure, a pseudo-node is added to the adjacency topology as a bridge, whereas the neighbors form a full mesh.

Difference between neighbor and adjacency: The difference appears only on broadcast networks. The neighbor topology is the physical topology: all ISs on the same broadcast network are direct neighbors of one another (a full mesh), and the pseudo-node generated by the DIS does not appear in it. The adjacency topology is the topology used for SPF route calculation: every IS on the broadcast network reports itself as adjacent to the pseudo-node of that network, so the adjacency topology contains the pseudo-node.
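The benefit of the pseudo-node grows quickly with the number of ISs on the LAN. The Python sketch below counts undirected links: a full mesh of n neighbors has n(n-1)/2 links, while the adjacency topology with a pseudo-node has only n (one per IS).

```python
def link_counts(n):
    """Undirected links for n ISs on one broadcast network."""
    full_mesh = n * (n - 1) // 2   # neighbor topology: every pair of ISs
    via_pseudonode = n             # adjacency topology: each IS to the pseudo-node
    return full_mesh, via_pseudonode


for n in (3, 10, 50):
    mesh, pn = link_counts(n)
    print(f"{n} ISs: {mesh} neighbor links, {pn} adjacencies via the pseudo-node")
```

For 50 ISs this is 1225 links against 50, which is why the SPF calculation works on the adjacency topology rather than the neighbor topology.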

Creation of Neighbors

In the IS-IS protocol, neighbors are discovered and kept alive by sending and receiving hello packets (IIH). When an interface runs IS-IS, it sends hello packets periodically. Neighbor creation differs between point-to-point and broadcast networks. After a neighbor is created, hello packets continue to be sent periodically to keep the neighbor alive.

On a point-to-point network, the point-to-point neighbor relationship is created through a three-way handshake (RFC3373).

On a broadcast network, the LAN neighbor relationship is also created through a three-way handshake. After the neighbors are created, all ISs on the broadcast network elect a DIS.

Generation of Adjacency Information

Adjacency information describes the nodes that the local IS can reach directly. It is always expressed in point-to-point form.

On a point-to-point network, adjacency information is generated in point-to-point form directly from the neighbor relationship.

On a broadcast network, to simplify the adjacency topology, the DIS virtualizes a pseudo-node for the broadcast network. All ISs on the broadcast network generate adjacency information pointing to the pseudo-node. The adjacency information of the pseudo-node lists the ISs attached to the broadcast network; it is generated and published by the DIS.

Publishing IS-IS Routing Information

Content of the Routing Information

IS-IS routing information is organized in Type-Length-Value (TLV) format and is carried and published in LSPs. The routing information published by IS-IS is of two types: adjacency information, used to build the topology of the entire network; and reachable subnet information, used to describe the subnets attached to the local IS (such as 10.0.0.0/8).

Adjacency information is obtained from the neighbor relationships, as described earlier.

Reachable subnet information comes from: 1. the directly connected routes of the interfaces on which IS-IS is enabled; 2. routes redistributed from other protocols; 3. routes leaked between levels.

Publishing the Routing Information

IS-IS routing information is carried in LSPs and published to all ISs in the area by flooding. Flooding means that when an IS receives an LSP, it saves a copy in its LSDB and then sends the LSP out of all interfaces except the one on which it was received.
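Flooding as described above can be sketched in Python. The sketch adds one assumption not spelled out in the text: an LSP is stored and re-flooded only if its sequence number is newer than the copy already in the LSDB, which stops the same LSP from circulating forever. It is simplified (real IS-IS also sends a newer copy back when it receives an older one), and the LSP ID and interface names are hypothetical.

```python
def flood(lsdb, lsp_id, seq, payload, in_iface, interfaces):
    """Handle one received LSP; return the interfaces to forward it on.

    lsdb maps LSP ID -> (sequence number, payload). Assumption: only an LSP
    newer than the stored copy is saved and re-flooded.
    """
    stored = lsdb.get(lsp_id)
    if stored is not None and stored[0] >= seq:
        return []                                    # duplicate or older copy: stop here
    lsdb[lsp_id] = (seq, payload)                    # save a copy in the LSDB
    return [i for i in interfaces if i != in_iface]  # all interfaces except the receiving one


lsdb = {}
ifaces = ["vlan1", "vlan2", "vlan3"]  # hypothetical interface names
print(flood(lsdb, "0000.0000.0004.00-00", 5, b"...", "vlan1", ifaces))  # ['vlan2', 'vlan3']
print(flood(lsdb, "0000.0000.0004.00-00", 5, b"...", "vlan2", ifaces))  # []
```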

Why the LSDBs of IS Systems Must Be Synchronized

If the LSDBs of the ISs are not synchronized, the SPF trees they calculate are inconsistent, and routing loops may occur. Therefore, once the network is stable, the LSDBs of all ISs in the area must be synchronized.

Why the LSDBs of IS Systems May Become Unsynchronized

The LSDB is composed of LSPs. IS-IS packets are carried directly over the link layer and do not rely on a reliable transport mechanism, so LSPs may be lost in transit and the LSDBs can fall out of synchronization. Keeping the LSDBs synchronized therefore means ensuring reliable delivery of LSPs. The synchronization protection mechanisms differ between point-to-point and broadcast networks.

Synchronization Protection Mechanism of the LSDB between IS Systems in the Point-to-Point Network

On a point-to-point network, each LSP sent is acknowledged with a PSNP, ensuring reliable LSP delivery. The PSNP contains the summary information of the LSPs being acknowledged.

Synchronization Protection Mechanism of the LSDB between IS Systems in the Broadcast Network



On a broadcast network, unlike a point-to-point network, LSDB synchronization is handled by the DIS. The DIS periodically sends CSNPs onto the broadcast network, summarizing its LSDB (that is, listing the LSPs in its LSDB). When another IS on the broadcast network receives a CSNP, it compares the CSNP with its own LSDB. If it has LSPs that the CSNP does not list, it sends those LSPs onto the broadcast network; if it lacks LSPs that the CSNP lists, it sends a PSNP to the DIS to request them. In this way, the LSDBs of all ISs on the broadcast network are kept synchronized.
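The CSNP comparison can be sketched in Python as two filtered passes over LSP IDs. Besides the missing and extra cases in the text, the sketch assumes that sequence numbers are compared, so an older local copy is also requested and a newer local copy is also sent; the LSP IDs are hypothetical.

```python
def compare_with_csnp(local_lsdb, csnp):
    """Compare the local LSDB with a CSNP received from the DIS.

    Both arguments map LSP ID -> sequence number. Returns (to_send, to_request):
      to_send    - LSPs the CSNP lacks, or lists with an older sequence number
      to_request - LSPs the local LSDB lacks or holds an older copy of (asked for with a PSNP)
    """
    to_send = sorted(lsp for lsp, seq in local_lsdb.items() if csnp.get(lsp, -1) < seq)
    to_request = sorted(lsp for lsp, seq in csnp.items() if local_lsdb.get(lsp, -1) < seq)
    return to_send, to_request


local = {"0000.0000.0001.00-00": 3, "0000.0000.0003.00-00": 7, "0000.0000.0009.00-00": 1}
csnp = {"0000.0000.0001.00-00": 3, "0000.0000.0003.00-00": 8, "0000.0000.0004.00-00": 2}
to_send, to_request = compare_with_csnp(local, csnp)
print(to_send)     # ['0000.0000.0009.00-00']
print(to_request)  # ['0000.0000.0003.00-00', '0000.0000.0004.00-00']
```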

IS-IS Route Calculation

The route calculation of the IS-IS protocol consists of the following two steps:

Step 1: Using the network topology formed by the adjacency information in the LSDB, calculate the SPF tree with the SPF algorithm. This yields the shortest path and the next hop to each network node (that is, each IS).

Step 2: Generate routes by combining the SPF tree with the reachable subnet information (such as 10.0.0.0/8) that each network node (IS) advertises in the LSDB.

Typical Application of the IS-IS Protocol

Figure 11-33 Network topology of the IS-IS typical application

Illustration



As shown in the preceding network topology, there are four switches (A, B, C, and D), that is, four ISs. The following describes the route learning process using the example of switch A learning the route to subnet 10.0.0.0/8 advertised by switch D. The metric of each link is 10, and switch B is elected as the DIS on the Ethernet network.

Step 1: Publishing Routing Information

Generation of Adjacency Information

Figure 11-34 Adjacency topology of the IS-IS typical application

The adjacency information generated by each system forms the preceding

adjacency topology. The adjacency information generated by each IS is as

follows:

Table 1-8 Adjacency information generated by IS in the IS-IS Example

Network Node    System ID                       Neighbor ID          Adjacency Information
IS A            0000.0000.0001                  0000.0000.0001.00    Adjacency to pseudo-node B (0000.0000.0002.02) metric 10
IS B            0000.0000.0002                  0000.0000.0002.00    Adjacency to pseudo-node B (0000.0000.0002.02) metric 10
Pseudo-node B   0000.0000.0002 (same as DIS)    0000.0000.0002.02    Adjacency to A (0000.0000.0001.00) metric 0
                                                                     Adjacency to B (0000.0000.0002.00) metric 0
                                                                     Adjacency to C (0000.0000.0003.00) metric 0
IS C            0000.0000.0003                  0000.0000.0003.00    Adjacency to pseudo-node B (0000.0000.0002.02) metric 10
                                                                     Adjacency to D (0000.0000.0004.00) metric 10
IS D            0000.0000.0004                  0000.0000.0004.00    Adjacency to C (0000.0000.0003.00) metric 10

Generation of Reachable Subnet Information

In the IS D, publish the directly-connected reachable subnet 10.0.0.0/8.

The Metric is 10.

Publishing the Routing Information

Through the flooding of routing information, the LSDB of each IS contains

the preceding adjacency information and the reachable subnet information.

Step 2: Perform SPF Calculation to Get the Shortest Path from Switch A to Each Switch

Figure 11-35 SPF tree of IS-IS route calculation example

On IS A, the SPF algorithm is run over the LSDB with A as the root, producing the SPF tree shown in the preceding figure. Ignoring the pseudo-node, the shortest path to IS D is A->C->D. If the Ethernet interface of A is vlan1 and the IP address of C's Ethernet interface is 3.3.3.3, then the path to IS D has next-hop interface vlan1, next-hop address 3.3.3.3, and metric 20.



Step 3: Generate Route According to Reachable Subnet

D advertises that it can reach subnet 10.0.0.0/8 with a metric of 10, and the SPF calculation on A gives the next hop and metric for reaching D. With this information, A obtains the IPv4 route to 10.0.0.0/8: next-hop interface vlan1, next-hop address 3.3.3.3, and metric 30 (20 to reach D plus the advertised metric of 10).
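The two-step calculation in this example can be reproduced with a short Python sketch (Dijkstra over the adjacency topology, then attaching the advertised subnet). It encodes the adjacencies of Table 1-8 using short names (A, B, C, D, and B-PN for pseudo-node 0000.0000.0002.02), treats the pseudo-node as a node that can never be a next hop, and assumes the vlan1/3.3.3.3 addressing given above; equal-cost paths are not handled.

```python
import heapq

# Adjacency topology from Table 1-8 (directed, with metrics).
# Short names stand in for neighbor IDs: "B-PN" is pseudo-node 0000.0000.0002.02.
LINKS = {
    "A": {"B-PN": 10},
    "B": {"B-PN": 10},
    "B-PN": {"A": 0, "B": 0, "C": 0},
    "C": {"B-PN": 10, "D": 10},
    "D": {"C": 10},
}
PSEUDONODES = {"B-PN"}
PREFIXES = {"D": [("10.0.0.0/8", 10)]}       # reachable subnets per IS
NEXT_HOP_ADDR = {"C": ("vlan1", "3.3.3.3")}  # addressing assumed in the example


def spf(root):
    """Step 1: Dijkstra over the adjacency topology.

    Returns (dist, first_hop); first_hop[n] is the first real IS on the
    path from root to n (a pseudo-node is never a next hop).
    """
    dist, first_hop = {root: 0}, {root: None}
    heap, done = [(0, root)], set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, metric in LINKS.get(u, {}).items():
            nd = d + metric
            if v in dist and nd >= dist[v]:
                continue
            dist[v] = nd
            if u == root or (u in PSEUDONODES and first_hop[u] is None):
                # v is directly reachable from root (possibly across a pseudo-node)
                first_hop[v] = None if v in PSEUDONODES else v
            else:
                first_hop[v] = first_hop[u]
            heapq.heappush(heap, (nd, v))
    return dist, first_hop


def build_routes(root):
    """Step 2: attach each advertised subnet to the SPF result."""
    dist, first_hop = spf(root)
    routes = {}
    for node, prefixes in PREFIXES.items():
        if node == root or node not in dist:
            continue
        iface, addr = NEXT_HOP_ADDR[first_hop[node]]
        for prefix, metric in prefixes:
            cost = dist[node] + metric
            if prefix not in routes or cost < routes[prefix][2]:
                routes[prefix] = (iface, addr, cost)
    return routes


print(build_routes("A"))  # {'10.0.0.0/8': ('vlan1', '3.3.3.3', 30)}
```

The pseudo-node links toward the ISs carry metric 0, so passing through it adds nothing to the path cost; this is why A reaches C at cost 10 and D at cost 20.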

BGP Dynamic Routing Protocol

Main contents:

Terms of BGP protocol

Introduction to the BGP protocol

Terms of BGP Protocol

AS- Autonomous System. An AS is a set of routing devices and hosts under the same administrative control and routing policy. AS numbers are allocated by Internet registries.

EBGP- BGP between different ASs. An EBGP neighbor is a routing device outside the administrative and policy control of the local AS.

IBGP- BGP within the same AS. An IBGP neighbor is a routing device in the same administrative control domain.

CIDR- Classless Interdomain Routing. CIDR is an address allocation scheme used to curb the explosive growth of entries in the IP routing tables of routing devices and to slow the exhaustion of IP addresses. In CIDR, an IP network is represented by a prefix, written as an IP address plus the number of most significant bits that form the network part (for example, 10.0.0.0/8).

NLRI- Network Layer Reachability Information. NLRI is the part of a BGP update message that lists the set of reachable destinations.

Ultranet (supernet)- a network advertisement whose prefix length is shorter than the natural (classful) mask of the network. For example, the natural mask of the class C network 202.11.1.0 is 255.255.255.0. If 202.11.0.0/16 is used to represent the network address, the mask is 16 bits, which is shorter than 24 bits, so it is an ultranet.
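The ultranet test can be sketched with Python's `ipaddress` module: compare the prefix length with the natural (classful) mask implied by the first octet. This is an illustration of the definition only; the helper names are hypothetical.

```python
import ipaddress


def natural_prefix_len(addr):
    """Classful (natural) mask length implied by the first octet of an IPv4 address."""
    first_octet = int(addr) >> 24
    if first_octet < 128:
        return 8    # class A
    if first_octet < 192:
        return 16   # class B
    if first_octet < 224:
        return 24   # class C
    raise ValueError("class D/E addresses have no natural mask")


def is_ultranet(prefix):
    """True if the prefix is shorter than the network's natural mask."""
    net = ipaddress.ip_network(prefix)
    return net.prefixlen < natural_prefix_len(net.network_address)


print(is_ultranet("202.11.0.0/16"))  # True  (16 < 24 for a class C address)
print(is_ultranet("202.11.1.0/24"))  # False
```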

IP Prefix- an IP network address together with the number of mask bits that form the network part.



SYN- Synchronization. Before BGP advertises a route learned from an IBGP peer, the route must already be in the current IP routing table; that is, BGP and the IGP must be synchronized before the route is advertised.

Introduction to the BGP Protocol

Border Gateway Protocol (BGP) is a routing protocol for exchanging network layer reachability information (NLRI) between routing domains. Its main function is to exchange NLRI with other BGP peers. A BGP speaker is any device running BGP.

BGP uses TCP (port 179) as its transport protocol, which provides reliable data transmission. Retransmission and acknowledgment of data are handled by TCP rather than by BGP, which simplifies the protocol: reliability does not need to be designed into BGP itself.

When a TCP connection is created between two routing devices running BGP, the two devices are called peers. Once the connection is created, the peers confirm the connection parameters by exchanging open messages. The parameters include the BGP version number, AS number, hold time, BGP identifier, and other optional parameters. After the peers have negotiated the parameters successfully, BGP exchanges routes by sending update messages. An update message contains the list of reachable destinations (the NLRI) together with the path attributes of each route, such as the ASs the route passes through. When routes change, peers send incremental update messages. BGP does not refresh routing information periodically; if routes do not change, BGP peers only exchange keepalive messages, which are sent periodically to keep the connection valid.

BGP Message Header

The BGP message header consists of a 16-byte marker field, a 2-byte length field, and a 1-byte type field. The following figure illustrates the format of the BGP message header.



Figure 11-36 Format of the BGP message header

Whether the header is followed by data depends on the message type. For example, a keepalive message consists of the message header only, with no data following it.

Marker: the marker field occupies 16 bytes and is used to detect loss of synchronization between BGP peers. If the message type is open, or if the open message carries no authentication information, all bits of the marker field must be set to 1. Otherwise, the marker field is computed by the authentication mechanism.

Length: the length field occupies 2 bytes and indicates the total length of the message, including the header. The minimum allowed length is 19 bytes and the maximum is 4096 bytes.

Type: The type field occupies one byte. It indicates the type of the BGP

message. The four types of the BGP message are as follows:

Figure 11-8 BGP message types

Number Type

1 Open

2 Update

3 Notification

4 Keepalive
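A minimal Python sketch of the 19-byte header described above, using `struct` for the network-byte-order length and type fields. It assumes no authentication, so the marker is all ones; a keepalive message is just this header with type 4. The function names are hypothetical.

```python
import struct

MARKER = b"\xff" * 16          # all ones: no authentication assumed
HEADER_LEN, MAX_LEN = 19, 4096
MSG_TYPES = {1: "OPEN", 2: "UPDATE", 3: "NOTIFICATION", 4: "KEEPALIVE"}


def build_message(msg_type, body=b""):
    """Prefix a message body with the 19-byte BGP header."""
    length = HEADER_LEN + len(body)
    if msg_type not in MSG_TYPES or length > MAX_LEN:
        raise ValueError("unknown type or message too long")
    return MARKER + struct.pack("!HB", length, msg_type) + body


def parse_header(data):
    """Return (marker, length, type name) from the first 19 bytes."""
    if len(data) < HEADER_LEN:
        raise ValueError("a BGP header is at least 19 bytes")
    length, msg_type = struct.unpack("!HB", data[16:HEADER_LEN])
    if not HEADER_LEN <= length <= MAX_LEN:
        raise ValueError("length field out of range")
    return data[:16], length, MSG_TYPES.get(msg_type, f"unknown({msg_type})")


keepalive = build_message(4)
print(len(keepalive), parse_header(keepalive)[1:])  # 19 (19, 'KEEPALIVE')
```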

Open Messages

After the TCP connection is created, the first message sent is the open message. The open message contains the BGP version number, AS number, hold time, BGP identifier, and other optional parameters.



If the open message is acceptable, that is, the receiving routing device agrees with its parameters, a keepalive message is sent to acknowledge the open message.

In addition to the fixed BGP header, the open message contains the following fields:

Figure 11-37 Format of the BGP open message

Version: the version field occupies one byte and indicates the BGP version number. During neighbor negotiation, the peers agree on a BGP version; usually the highest version supported by both routing devices is used.

My Autonomous System: the field is two bytes and indicates the AS number of the sending routing device.

Hold Time: the field is two bytes. It indicates the maximum time, in seconds, that the sender waits between successive keepalive or update messages from its peer. The BGP routing device negotiates with the peer and uses the smaller of the two hold times.

BGP Identifier: the field is four bytes and contains the identifier of the sending BGP routing device, that is, its router ID: by default, the highest loopback interface address or the highest IP address among the physical interfaces. The router ID can also be set manually.

Optional parameter Length: the field is one byte. It indicates the total

length of the optional parameter fields (the unit is byte). If there are no

optional parameters, the field is set to 0.



Optional Parameters: variable length field. It provides the list of the

optional parameters of the BGP neighbor negotiation.
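Building on the fields above, the following Python sketch packs an open message with no optional parameters (29 bytes in total). The AS number, hold time, and BGP identifier are hypothetical values, version 4 (BGP-4) is assumed, and hold-time negotiation is shown as taking the smaller value as the text describes. The check that a non-zero hold time is at least 3 seconds follows RFC 4271.

```python
import ipaddress
import struct


def build_open(my_as, hold_time, bgp_id, opt_params=b""):
    """Pack a BGP open message (header + fixed fields + optional parameters)."""
    if hold_time != 0 and hold_time < 3:
        raise ValueError("hold time must be 0 or at least 3 seconds")
    body = struct.pack(
        "!BHH4sB",
        4,                                     # Version: BGP-4 (assumed)
        my_as,                                 # My Autonomous System, 2 bytes
        hold_time,                             # Hold Time in seconds, 2 bytes
        ipaddress.IPv4Address(bgp_id).packed,  # BGP Identifier, 4 bytes
        len(opt_params),                       # Optional Parameters Length, 1 byte
    ) + opt_params
    header = b"\xff" * 16 + struct.pack("!HB", 19 + len(body), 1)  # type 1 = open
    return header + body


def negotiated_hold_time(local, remote):
    """Both sides use the smaller of the two advertised hold times."""
    return min(local, remote)


msg = build_open(my_as=65001, hold_time=180, bgp_id="1.1.1.1")  # hypothetical values
print(len(msg), negotiated_hold_time(180, 90))  # 29 90
```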

Update Message

The update message is used to exchange routing information between BGP peers. It is used both to advertise routes to a BGP peer and to withdraw them. The update message contains the fixed BGP header and the following optional parts:

Unfeasible Routes Length: two-byte field. It indicates the total length of the withdrawn routes field. A value of 0 means that there are no withdrawn routes.

Withdrawn Routes: variable length field. It contains the IP address prefix

list of the routes withdrawn from the services.

Total Path Attribute Length: the field is two bytes; it indicates the total

length of the path attribute field.

Path Attribute: this variable-length field contains the list of BGP attributes associated with the prefixes in the NLRI. The path attributes describe the advertised prefixes, for example their priority or next hop, and are used for route filtering and route selection. Path attributes are classified into the following types:

1. Well-Known Mandatory: these attributes must be included in the BGP update message and must be implemented and recognized by all BGP vendors, for example ORIGIN, AS_PATH, and NEXT_HOP.

ORIGIN: a well-known mandatory attribute that indicates the origin of the route. There are three possible origins: IGP, EGP, and INCOMPLETE. The routing device uses this information when selecting among multiple routes: the route with the lowest ORIGIN value is preferred, where IGP is lower than EGP and EGP is lower than INCOMPLETE.

AS_PATH: The AS_PATH is a kind of well-known mandatory attributes.

AS_PATH indicates the AS systems that the route in the update message

passes.

NEXT_HOP: a well-known mandatory attribute that gives the IP address of the next-hop routing device for the destinations listed in the update message.

2. Well-Known Discretionary: these attributes must be recognized by all BGP implementations, but a BGP update message may or may not include them.

LOCAL_PREF: used to distinguish the priority of multiple routes to the same destination. The higher the local preference, the higher the route priority. LOCAL_PREF is not included in update messages sent to EBGP neighbors; if an update message received from an EBGP neighbor contains it, the attribute is ignored.

ATOMIC_AGGREGATE: used to inform downstream routing devices that path information has been lost. Route aggregation loses some routing information when the aggregated routes come from different sources with different attributes. If a routing device sends an aggregate that causes such information loss, it must attach the ATOMIC_AGGREGATE attribute to the route.

3. Optional Transitive: not all BGP implementations support optional transitive attributes. If the BGP process does not recognize such an attribute, it checks the transitive flag. If the transitive flag is set, the BGP process accepts the attribute and passes it on to other BGP peers.

AGGREGATOR: identifies the BGP speaker (by IP address) that performed the route aggregation, together with its AS number.

COMMUNITY: indicates that a destination is a member of a group of destinations (a community) that share one or more common properties. The type code of the COMMUNITY attribute is 8. Each community is a 32-bit value. By convention, the values from 0 (0x00000000) to 65535 (0x0000FFFF) and from 4294901760 (0xFFFF0000) to 4294967295 (0xFFFFFFFF) are reserved. The remaining community values should carry the AS number in the first two bytes, and the meaning of the last two bytes is defined by that AS. Within the reserved ranges, several well-known community values are defined.
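As a minimal sketch of the encoding described above, assuming communities are handled as plain 32-bit integers, the following Python helpers pack and split the AS-number/value halves and check whether a value falls in a reserved range. The function names and the `AS:NN` display format are illustrative, not part of any Maipu CLI.

```python
from typing import Tuple

RESERVED_LOW = (0x00000000, 0x0000FFFF)
RESERVED_HIGH = (0xFFFF0000, 0xFFFFFFFF)


def make_community(asn: int, value: int) -> int:
    """Pack an AS number (first two bytes) and an AS-defined value (last two bytes)."""
    if not (0 <= asn <= 0xFFFF and 0 <= value <= 0xFFFF):
        raise ValueError("AS number and value must each fit in 16 bits")
    return (asn << 16) | value


def split_community(community: int) -> Tuple[int, int]:
    """Return (AS number, AS-defined value) from a 32-bit community."""
    return community >> 16, community & 0xFFFF


def is_reserved(community: int) -> bool:
    """True for 0x00000000-0x0000FFFF and 0xFFFF0000-0xFFFFFFFF."""
    return (RESERVED_LOW[0] <= community <= RESERVED_LOW[1]
            or RESERVED_HIGH[0] <= community <= RESERVED_HIGH[1])


def format_community(community: int) -> str:
    asn, value = split_community(community)
    return f"{asn}:{value}"


c = make_community(100, 200)
print(c, format_community(c), is_reserved(c))  # 6553800 100:200 False
print(is_reserved(0xFFFFFF01))                 # True (a well-known value)
```

Keeping the AS number in the upper half is what lets each AS define its own community values without colliding with other ASs.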

NO_EXPORT (4294967041 or 0xFFFFFF01): routes received with this value must not be advertised to EBGP peers. If an alliance (BGP confederation) is configured, the routes must not be advertised outside the alliance.

NO_ADVERTISE (4294967042 or 0xFFFFFF02): routes received with this value must not be advertised to any EBGP or IBGP peer.

LOCAL_AS (4294967043 or 0xFFFFFF03): routes received with this value must not be advertised to EBGP peers, including peers in other sub-ASs of the same alliance.
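The advertisement rules for the three well-known communities can be summarized in a small Python function. The peer categories (`ibgp`, `confed-ebgp` for a peer in another sub-AS of the same alliance, and `ebgp` for a peer outside it) are an assumption made only for this sketch.

```python
NO_EXPORT = 0xFFFFFF01     # 4294967041
NO_ADVERTISE = 0xFFFFFF02  # 4294967042
LOCAL_AS = 0xFFFFFF03      # 4294967043


def may_advertise(communities: set, peer_type: str) -> bool:
    """Decide whether a route carrying `communities` may be sent to a peer.

    peer_type (hypothetical categories for this sketch):
      "ibgp"        - IBGP peer in the same (sub-)AS
      "confed-ebgp" - EBGP peer in another sub-AS of the same alliance
      "ebgp"        - EBGP peer outside the alliance / AS
    """
    if NO_ADVERTISE in communities:
        return False  # never advertised to any peer
    if NO_EXPORT in communities and peer_type == "ebgp":
        return False  # must stay inside the AS or alliance
    if LOCAL_AS in communities and peer_type in ("ebgp", "confed-ebgp"):
        return False  # must stay inside the local sub-AS
    return True
```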

4. Optional Nontransitive: not all BGP implementations support optional nontransitive attributes. If the BGP process does not recognize such an attribute, it checks the transitive flag. If the transitive flag is not set, the attribute is ignored and is not passed on to other BGP peers.

MULTI_EXIT_DISC (MED): used to distinguish among multiple links into the same adjacent AS. The lower the MED, the more preferred the route. MED is exchanged between ASs; once an AS receives a MED, it does not propagate it to other ASs (nontransitive). This differs from LOCAL_PREF: with MED, a routing device in one AS can influence route selection in another AS, whereas LOCAL_PREF only affects route selection within the local AS.

ORIGINATOR_ID: used with route reflection. It is a 32-bit value that the route reflector sets when it reflects a route, and it carries the router ID of the route's originator within the local AS. If a routing device finds its own router ID in the ORIGINATOR_ID of a received route, it concludes that a routing loop has occurred and ignores the route.
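To make the two loop checks concrete (the ORIGINATOR_ID check above and the CLUSTER_LIST check described in the next paragraph), here is a minimal Python sketch. The `Route` class, the `reflect` and `is_loop` helpers, and the simplification that the originator is the peer the reflector learned the route from are assumptions for illustration; router IDs and cluster IDs are written as dotted strings.

```python
from dataclasses import dataclass, field, replace
from typing import List, Optional


@dataclass
class Route:
    prefix: str
    originator_id: Optional[str] = None
    cluster_list: List[str] = field(default_factory=list)


def reflect(route: Route, learned_from: str, my_cluster_id: str) -> Route:
    """Simplified reflector behaviour: keep an existing ORIGINATOR_ID or set
    it to the peer the route came from, and prepend our cluster ID."""
    return replace(
        route,
        originator_id=route.originator_id or learned_from,
        cluster_list=[my_cluster_id] + route.cluster_list,
    )


def is_loop(route: Route, my_router_id: str,
            my_cluster_id: Optional[str] = None) -> bool:
    """True if the received route has looped back and must be ignored."""
    if route.originator_id == my_router_id:
        return True
    return my_cluster_id is not None and my_cluster_id in route.cluster_list
```

A route reflected twice keeps its first ORIGINATOR_ID and accumulates one cluster ID per reflector, so either check can catch the loop.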

CLUSTER_LIST: the list of cluster IDs of the route reflectors that the route has passed through. If a route reflector finds its own cluster ID in the CLUSTER_LIST of a received route, it concludes that a routing loop has occurred and ignores the route.


Network Layer Reachability Information (NLRI): a variable-length field that contains the list of reachable IP address prefixes advertised by the sender.

Keepalive Message Keepalive messages are exchanged periodically between peers to check whether the peer is still reachable.
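Per RFC 4271, a keepalive message has no body: it is only the fixed 19-byte BGP header (a 16-byte marker of all ones, a 2-byte length, and a 1-byte type, where 4 means KEEPALIVE). A minimal Python sketch of building and recognizing one; the helper names are illustrative.

```python
import struct

MARKER = b"\xff" * 16
HEADER_LEN = 19
KEEPALIVE = 4


def build_keepalive() -> bytes:
    # Marker, total length (header only), message type.
    return MARKER + struct.pack("!HB", HEADER_LEN, KEEPALIVE)


def is_keepalive(msg: bytes) -> bool:
    if len(msg) != HEADER_LEN or msg[:16] != MARKER:
        return False
    length, msg_type = struct.unpack("!HB", msg[16:19])
    return length == HEADER_LEN and msg_type == KEEPALIVE
```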

Notification Message A notification message is sent whenever an error is detected, and the BGP connection is closed immediately after the message is sent. In addition to the fixed BGP message header, the notification message contains the following fields:

Error Code: one byte that indicates the type of error.

Error Subcode: one byte that provides more detail about the error.

Data: a variable-length field that contains data related to the error, for example, a bad message header or an illegal AS number.

The following table lists the possible error codes and error subcodes.

Table 11-8 BGP Notification message error codes and error subcodes

Error code (error subcodes indented below)

1 - Message header error
    1 - Connection not synchronized
    2 - Bad message length
    3 - Unsupported message type
2 - Open message error
    1 - Unsupported version number
    2 - Bad peer AS number
    3 - Bad BGP identifier
    4 - Unsupported optional parameter
    5 - Authentication failure
    6 - Unacceptable hold time
    7 - Unsupported capability
3 - Update message error
    1 - Malformed attribute list
    2 - Unrecognized well-known attribute
    3 - Missing well-known attribute
    4 - Attribute flags error
    5 - Attribute length error
    6 - Invalid ORIGIN attribute
    7 - AS routing loop
    8 - Invalid NEXT_HOP attribute
    9 - Optional attribute error
    10 - Invalid network field
    11 - Malformed AS_PATH
4 - Hold timer expired (subcode not used)
5 - FSM error, detected by the finite-state machine (subcode not used)
6 - Cease, for fatal errors other than those listed above (subcode not used)
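The notification layout above (fixed header, then Error Code, Error Subcode, and Data) can be sketched in Python as follows. The 19-byte header and message type 3 come from RFC 4271; the function names and the short `ERROR_NAMES` lookup are illustrative assumptions.

```python
import struct

MARKER = b"\xff" * 16
NOTIFICATION = 3

# Error code names from Table 11-8, for readable output.
ERROR_NAMES = {
    1: "Message header error",
    2: "Open message error",
    3: "Update message error",
    4: "Hold timer expired",
    5: "FSM error",
    6: "Cease",
}


def build_notification(code: int, subcode: int = 0, data: bytes = b"") -> bytes:
    body = struct.pack("!BB", code, subcode) + data
    # Header: marker, total length (header + body), type.
    return MARKER + struct.pack("!HB", 19 + len(body), NOTIFICATION) + body


def parse_notification(msg: bytes):
    if len(msg) < 21 or msg[:16] != MARKER:
        raise ValueError("message too short or bad marker")
    length, msg_type = struct.unpack("!HB", msg[16:19])
    if msg_type != NOTIFICATION or length != len(msg):
        raise ValueError("not a well-formed notification message")
    code, subcode = msg[19], msg[20]
    return ERROR_NAMES.get(code, f"unknown ({code})"), subcode, msg[21:]
```

For example, a peer that rejects an Open message because of an unacceptable hold time would send `build_notification(2, 6)`, a 21-byte message with no data.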

BGP Finite-State Machine Before BGP peers can exchange NLRI, a BGP connection must be established. The creation and maintenance of the BGP connection are described by the finite-state machine (FSM). The following figure shows the complete BGP FSM, and the table lists the input events that cause state changes.

Figure 11-38 BGP FSM



Table 11-8-3 Input Events (IE)

IE   Description
1    BGP start
2    BGP stop
3    BGP transport connection opened
4    BGP transport connection closed
5    BGP transport connection open failed
6    BGP transport fatal error
7    ConnectRetry timer expired
8    Hold timer expired
9    Keepalive timer expired
10   Open message received
11   Keepalive message received
12   Update message received
13   Notification message received

Idle: the initial state. BGP stays in Idle until an operation triggers a Start event. The Start event is usually triggered by creating or restarting a BGP session.

Connect: BGP is waiting for the transport protocol (TCP) connection to complete. If the connection succeeds, BGP sends an Open message and moves to the OpenSent state. If the connection fails, BGP moves to the Active state. If the ConnectRetry timer expires, BGP stays in the Connect state, resets the timer, and initiates a new transport connection. If any other event occurs, BGP returns to Idle.

Active: in this state, BGP tries to establish a TCP connection with the neighbor. If the connection succeeds, BGP sends an Open message and moves to the OpenSent state. If the ConnectRetry timer expires, BGP restarts the timer and goes back to the Connect state, while still listening for a connection initiated by the peer.

OpenSent: in this state, BGP has sent an Open message and is waiting for an Open message from the peer. The received Open message is checked. If any error is found, BGP sends a Notification message and returns to Idle. If no error is found, BGP sends a Keepalive message to the peer, resets the Keepalive timer, and moves to the OpenConfirm state.

OpenConfirm: in this state, BGP is waiting for a Keepalive or Notification message. If a Keepalive message is received, BGP moves to the Established state. If a Notification message is received, BGP returns to Idle. If the hold timer expires before a Keepalive message arrives, BGP sends a Notification message and returns to Idle.

Established: the final phase of neighbor negotiation. In this state, the connection between the BGP peers is established, and the peers can exchange Update, Notification, and Keepalive messages.
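The state descriptions above can be condensed into a transition table. This is a simplified Python sketch, not a complete RFC 4271 implementation: it assumes every received Open message is valid, uses the event numbers from Table 11-8-3, and sends any (state, event) pair the text does not describe back to Idle.

```python
from enum import Enum


class State(Enum):
    IDLE = "Idle"
    CONNECT = "Connect"
    ACTIVE = "Active"
    OPENSENT = "OpenSent"
    OPENCONFIRM = "OpenConfirm"
    ESTABLISHED = "Established"


# Input event numbers as listed in Table 11-8-3.
BGP_START, BGP_STOP = 1, 2
CONN_OPEN, CONN_CLOSED, CONN_OPEN_FAILED, CONN_FATAL = 3, 4, 5, 6
CONNECT_RETRY_EXPIRED, HOLD_EXPIRED, KEEPALIVE_EXPIRED = 7, 8, 9
RECV_OPEN, RECV_KEEPALIVE, RECV_UPDATE, RECV_NOTIFICATION = 10, 11, 12, 13

S = State
TRANSITIONS = {
    (S.IDLE, BGP_START): S.CONNECT,
    (S.CONNECT, CONN_OPEN): S.OPENSENT,                 # send Open
    (S.CONNECT, CONN_OPEN_FAILED): S.ACTIVE,
    (S.CONNECT, CONNECT_RETRY_EXPIRED): S.CONNECT,      # reset timer, retry TCP
    (S.ACTIVE, CONN_OPEN): S.OPENSENT,                  # send Open
    (S.ACTIVE, CONNECT_RETRY_EXPIRED): S.CONNECT,       # restart timer, retry TCP
    (S.OPENSENT, RECV_OPEN): S.OPENCONFIRM,             # Open valid: send Keepalive
    (S.OPENCONFIRM, RECV_KEEPALIVE): S.ESTABLISHED,
    (S.OPENCONFIRM, KEEPALIVE_EXPIRED): S.OPENCONFIRM,  # send Keepalive
    (S.ESTABLISHED, RECV_KEEPALIVE): S.ESTABLISHED,
    (S.ESTABLISHED, RECV_UPDATE): S.ESTABLISHED,
    (S.ESTABLISHED, KEEPALIVE_EXPIRED): S.ESTABLISHED,  # send Keepalive
}


def next_state(state: State, event: int) -> State:
    """Any (state, event) pair not listed falls back to Idle."""
    return TRANSITIONS.get((state, event), State.IDLE)


def run(events, state: State = State.IDLE) -> State:
    for event in events:
        state = next_state(state, event)
    return state
```

For instance, the normal bring-up sequence Start, connection open, Open received, Keepalive received (events 1, 3, 10, 11) ends in Established, while a hold-timer expiry (event 8) in OpenConfirm returns the session to Idle.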



BGP Path Attributes Path attributes are a major feature of BGP routes. They carry the information needed for basic routing functions and allow BGP to define and apply routing policies.

A path attribute belongs to one of the following categories:

Well-Known Mandatory;

Well-Known Discretionary;

Optional Transitive;

Optional Non-Transitive.

Well-known mandatory: every BGP update message contains the attribute, and all BGP implementations can parse it.

Well-known discretionary: BGP update messages may contain the attribute but are not required to; all BGP implementations must still recognize it.

Optional Transitive: a BGP implementation does not need to support the attribute, but it should accept paths that carry the attribute and advertise them to its peers.

Optional Non-Transitive: a BGP implementation does not need to support the attribute. If the attribute is not recognized, it is ignored and is not passed on to other peers.

The meaning of the common path attribute is as follows:

ORIGIN: well-known mandatory; specifies the origin of the route information;

AS_PATH: well-known mandatory; lists, as a sequence of ASs, the path to the destination specified by the NLRI;

NEXT_HOP: well-known mandatory; gives the next-hop IP address for the advertised destinations;

MULTI_EXIT_DISC: optional non-transitive; allows an AS to tell a neighboring AS which entry point into it is preferred;

LOCAL_PREF: well-known discretionary; indicates how strongly the local AS prefers an advertised route;

ATOMIC_AGGREGATE: well-known discretionary; warns downstream devices that path information has been lost;

AGGREGATOR: optional transitive; indicates the AS number and IP address of the device that generated the aggregate route;

COMMUNITY: optional transitive; simplifies policy implementation;

ORIGINATOR_ID: optional non-transitive; prevents routing loops, because a router that finds its own router ID in this attribute ignores the route;

CLUSTER_LIST: optional non-transitive; prevents routing loops, because a route reflector that finds its own cluster ID in this attribute ignores the route.

BGP Route Decision BGP Path Decision Process

When multiple routes to the same destination with the same prefix length exist, BGP selects the best route according to the following rules:

1. Routes whose next hop is unreachable are ignored;

2. Prefer the route with the highest weight;

3. Prefer the route with the highest LOCAL_PREF value;

4. Prefer the locally originated route;

5. Prefer the route with the shortest AS_PATH;

6. Prefer the route with the lowest ORIGIN value;

7. Prefer the route with the lowest MED value;

8. Prefer routes learned through EBGP over routes learned through IBGP;

9. Prefer the route whose next hop has the lowest IGP metric;

10. Prefer the EBGP route that was received first;

11. Prefer the route from the peer with the lowest BGP router ID;

12. Prefer the route with the shortest CLUSTER_LIST;

13. Prefer the route from the neighbor with the lowest address;

14. If BGP load balancing is enabled, rules 10-13 are skipped, and all routes with the same AS_PATH length and MED value are installed in the routing table.
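A compact way to express rules 1-13 is a sort key in which each rule becomes one tuple element, so earlier rules dominate later ones. This Python sketch is a simplification under stated assumptions: the `Path` fields, the default LOCAL_PREF of 100, and the treatment of rule 10 are illustrative; rule 14 (load balancing) is not modelled; and MED is compared across all paths as the list states, whereas many implementations compare MED only between paths from the same neighboring AS.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Path:
    neighbor: str                 # peer address (rule 13)
    router_id: str                # peer BGP router ID (rule 11)
    next_hop_reachable: bool = True
    weight: int = 0
    local_pref: int = 100         # assumed default
    locally_originated: bool = False
    as_path_len: int = 1
    origin: int = 0               # 0 = IGP, 1 = EGP, 2 = INCOMPLETE
    med: int = 0
    is_ebgp: bool = True
    igp_metric: int = 0           # IGP metric to the next hop
    received_order: int = 0       # smaller = received earlier
    cluster_list_len: int = 0


def _ip(addr: str) -> int:
    return int(ipaddress.ip_address(addr))


def preference_key(p: Path) -> tuple:
    """Smaller key = better path; one element per rule 2-13."""
    return (
        -p.weight,                              # 2. highest weight
        -p.local_pref,                          # 3. highest LOCAL_PREF
        not p.locally_originated,               # 4. locally originated first
        p.as_path_len,                          # 5. shortest AS_PATH
        p.origin,                               # 6. lowest ORIGIN
        p.med,                                  # 7. lowest MED
        not p.is_ebgp,                          # 8. EBGP over IBGP
        p.igp_metric,                           # 9. closest next hop
        p.received_order if p.is_ebgp else 0,   # 10. oldest EBGP route
        _ip(p.router_id),                       # 11. lowest router ID
        p.cluster_list_len,                     # 12. shortest CLUSTER_LIST
        _ip(p.neighbor),                        # 13. lowest neighbor address
    )


def best_path(paths: List[Path]) -> Optional[Path]:
    usable = [p for p in paths if p.next_hop_reachable]  # 1. drop unreachable
    return min(usable, key=preference_key) if usable else None
```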

Example of LOCAL_PREF and MED Preferential Selection



Figure 11-39 All else being equal, the route with the higher LOCAL_PREF value is preferred

User AS100 obtains routes from both ISP1 and ISP2, but ISP1 is the preferred ISP. When the device connected to ISP1 advertises routes to Switch-F, it sets a higher LOCAL_PREF value. For the same destination, the routes learned from ISP1 are therefore preferred, because their LOCAL_PREF value is higher.


All else being equal, the route with the lower MED value is preferred

A dual-homed connection is used between a user and an ISP. The ISP should use LINK2, with LINK1 as the backup. When the user advertises routes to the ISP, the updates sent over LINK2 carry a lower MED value. If the routes received over the EBGP sessions on LINK2 and LINK1 are otherwise equal, the route with the lower MED is preferred. As a result, traffic from the ISP to the user enters over LINK2.

Route Filtering Route filtering means that a BGP speaker decides which routes it sends to, and accepts from, each BGP peer. Route filtering is how route policy is defined. The procedure is as follows:

1. Identify routes.

2. Permit or deny routes.

3. Manipulate route attributes.

Route filtering can be implemented with an access list, a prefix list, or an AS path access list. A route map can also be used to implement both filtering and attribute manipulation.
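Steps 1 and 2 can be illustrated with a prefix-list style filter. This Python sketch assumes the common `ge`/`le` prefix-length convention and an implicit deny for routes that match no rule; both are assumptions, so check the switch's own prefix-list command reference for its exact matching rules. Step 3 (attribute manipulation, typically done in a route map) is not shown.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PrefixRule:
    seq: int
    action: str                 # "permit" or "deny"
    prefix: str                 # e.g. "10.0.0.0/8"
    ge: Optional[int] = None    # minimum prefix length to match
    le: Optional[int] = None    # maximum prefix length to match

    def matches(self, route: str) -> bool:
        net = ipaddress.ip_network(self.prefix)
        r = ipaddress.ip_network(route)
        if r.version != net.version or not r.subnet_of(net):
            return False
        if self.ge is None and self.le is None:
            return r.prefixlen == net.prefixlen      # exact match only
        low = self.ge if self.ge is not None else net.prefixlen
        high = self.le if self.le is not None else r.max_prefixlen
        return low <= r.prefixlen <= high


def filter_routes(rules: List[PrefixRule], routes: List[str]) -> List[str]:
    """Identify each route with the first matching rule, then permit or deny it."""
    ordered = sorted(rules, key=lambda rule: rule.seq)
    accepted = []
    for route in routes:
        for rule in ordered:
            if rule.matches(route):
                if rule.action == "permit":
                    accepted.append(route)
                break
        # A route that matches no rule is denied (assumed implicit deny).
    return accepted
```

For example, denying `10.1.0.0/16 le 24` before permitting `10.0.0.0/8 le 32` blocks the more specific 10.1.x.x routes while accepting the rest of 10.0.0.0/8.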

Route Reflector A route reflector is a routing device that acts as the focal point of internal BGP (IBGP) sessions. The IBGP peers of the route reflector are called route reflector clients. The clients peer with the route reflector and exchange routing information with it, and the route reflector then reflects that information to all other clients. This removes the need for a full IBGP mesh and saves considerable cost.

Route reflectors are recommended only for large IBGP full-mesh networks. A route reflector adds overhead on the reflector itself, and an incorrect configuration can cause routing loops or route instability. Therefore, route reflectors are not recommended for every topology.

Alliance An alliance (BGP confederation) is another method for handling the rapid growth of the IBGP full mesh in an AS. Like the route reflector, the alliance is recommended only for large IBGP full-mesh networks.

The idea of the alliance is that one AS can be divided into multiple sub-ASs. Within each sub-AS, all IBGP rules apply; for example, all BGP routing devices in a sub-AS must form a full mesh. Each sub-AS has a different AS number, so EBGP must run between sub-ASs. Although EBGP is used between sub-ASs, route selection within the alliance is similar to IBGP route selection within a single AS: when a sub-AS border is crossed, the next-hop, MED, and LOCAL_PREF information is preserved. To external peers, an alliance looks like a single AS.

The drawback of the alliance is that migrating from a non-alliance design to an alliance requires reconfiguring the routing devices and changing the logical topology. In addition, unless BGP policy is set manually, the best route through the alliance may not be selected.

Route Damping Route damping (route dampening) is a technique for controlling route instability. It significantly reduces the instability caused by route flapping.

Route damping divides routes into well-behaved and ill-behaved routes. Well-behaved routes remain stable over the long term, whereas ill-behaved routes are unstable over short periods. An ill-behaved route should be penalized in proportion to its expected instability, and an unstable route should be suppressed until it becomes stable.

The recent history of a route is used to estimate its future stability. To track this history, the number of times the route flaps within a given period must be known. With route damping, each flap adds a penalty to the route. When the accumulated penalty reaches a predefined limit, the route is suppressed. A suppressed route can continue to accumulate penalties. The more frequently a route flaps, the sooner it is suppressed.

Similar rules are used to unsuppress and re-advertise the route. An algorithm reduces the penalty over time by exponential decay, based on parameters configured by the user.
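A minimal Python sketch of penalty-based damping, assuming the exponential half-life decay described in RFC 2439. The half-life (900 s), per-flap penalty (1000), suppress limit (2000), and reuse limit (750) are illustrative values only, not Maipu defaults; check the switch's damping configuration for the real parameters.

```python
class DampedRoute:
    def __init__(self, half_life=900.0, flap_penalty=1000.0,
                 suppress_limit=2000.0, reuse_limit=750.0):
        self.half_life = half_life          # seconds
        self.flap_penalty = flap_penalty
        self.suppress_limit = suppress_limit
        self.reuse_limit = reuse_limit
        self.penalty = 0.0
        self.last_update = 0.0
        self.suppressed = False

    def _decay(self, now: float) -> None:
        # Exponential decay: the penalty halves every half_life seconds.
        elapsed = now - self.last_update
        self.penalty *= 0.5 ** (elapsed / self.half_life)
        self.last_update = now

    def flap(self, now: float) -> None:
        self._decay(now)
        self.penalty += self.flap_penalty
        if self.penalty >= self.suppress_limit:
            self.suppressed = True

    def is_suppressed(self, now: float) -> bool:
        self._decay(now)
        if self.suppressed and self.penalty < self.reuse_limit:
            self.suppressed = False          # stable again: re-advertise
        return self.suppressed
```

With these values, two flaps in quick succession suppress the route; after one half-life the penalty is 1000 (still suppressed), and after two half-lives it is 500, below the reuse limit, so the route is advertised again.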

BGP Graceful Restart Principle of BGP Graceful Restart

When a routing device fails, its BGP neighbors detect that the neighbor relationship goes down and then comes back up, which is called BGP neighbor flapping. Neighbor flapping in turn causes route flapping. As a result, traffic is black-holed for a period after the routing device restarts, or the neighbors' traffic is rerouted around the restarted device, which reduces network reliability.

When a routing device fails, BGP graceful restart prevents this route disruption and speeds up route convergence, which ensures network reliability.

Procedure for BGP Graceful Restart

BGP graceful restart adds the following extensions:

1. A graceful restart capability is added to the BGP OPEN message, with the following fields:

Restart-flag: indicates whether the neighbor has restarted (1: yes; 0: no).

AFI/SAFI: an address family that supports graceful restart.

Fwd-flag: set to 1 if the address family supports graceful restart and its routes are to be preserved; otherwise set to 0.

2. An End-of-RIB (EOR) marker is added to BGP update messages to indicate that the initial update is complete.

3. Three timers are added:

Restart-timer: started on the helper end; the maximum time to wait for the session to be re-established during the GR process.

Stale-path-timer: started on the helper end; the maximum time to retain the preserved routes.

Defer-timer: started on the restarter end; the maximum time to delay route calculation and advertisement.
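The capability described in item 1 is defined in RFC 4724 as BGP capability code 64: a 16-bit field holding the restart flags and a 12-bit restart time, followed by one 4-byte entry (AFI, SAFI, flags) per address family. In this Python sketch, the Restart-flag above corresponds to the R bit and the Fwd-flag to the F bit; the helper names are hypothetical. The EOR marker from item 2 is also shown for IPv4 unicast, where it is an update message with no withdrawn routes and no path attributes.

```python
import struct

GR_CAPABILITY_CODE = 64
MARKER = b"\xff" * 16
UPDATE = 2


def encode_gr_capability(restarted: bool, restart_time: int, families) -> bytes:
    """families: iterable of (afi, safi, forwarding_preserved)."""
    if not 0 <= restart_time <= 0xFFF:
        raise ValueError("restart time is a 12-bit field (0-4095 s)")
    flags_and_time = (0x8000 if restarted else 0) | restart_time  # R bit + time
    value = struct.pack("!H", flags_and_time)
    for afi, safi, fwd in families:
        value += struct.pack("!HBB", afi, safi, 0x80 if fwd else 0)  # F bit
    # Capability TLV: code, length, value.
    return struct.pack("!BB", GR_CAPABILITY_CODE, len(value)) + value


def decode_gr_capability(tlv: bytes):
    length = tlv[1]
    value = tlv[2:2 + length]
    (flags_and_time,) = struct.unpack("!H", value[:2])
    families = []
    for off in range(2, length, 4):
        afi, safi, flags = struct.unpack("!HBB", value[off:off + 4])
        families.append((afi, safi, bool(flags & 0x80)))
    return bool(flags_and_time & 0x8000), flags_and_time & 0x0FFF, families


def build_ipv4_eor() -> bytes:
    # Update with withdrawn-routes length 0 and path-attribute length 0.
    return MARKER + struct.pack("!HB", 23, UPDATE) + struct.pack("!HH", 0, 0)
```

For example, a restarting switch that preserved IPv4 unicast (AFI 1, SAFI 1) forwarding state and advertises a 120-second restart time would send `encode_gr_capability(True, 120, [(1, 1, True)])`.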



Figure 11-41 Graceful restart flow

Restarter end (Switch-A):

1. When the neighbor relationship is first established, the GR capability is negotiated through the Open message.

2. When a fault occurs, the forwarding layer of Switch-A retains the routes and continues to forward traffic.

3. Switch-A re-establishes the neighbor relationship and sends an Open message with the restart-flag set to 1, indicating that it has restarted, and notifies the neighbor of the restart-time value and the address families whose routes were preserved.

4. After the neighbor relationship is re-established, Switch-A starts the defer-timer and receives updates from the neighbor.

5. Switch-A delays route calculation until it receives the EOR marker from the neighbor or the defer-timer expires.

6. Switch-A calculates the routes, updates the core routing table, and advertises the routes.

Helper end (Switch-B):

1. When the neighbor relationship is first established, the GR capability is negotiated, and Switch-B records that the neighbor is GR-capable.

2. After the restarter fails, if a TCP error is detected, go to step 3; if no TCP error is detected, go to step 4.

3. Retain the routes and start the restart timer.

4. Re-establish the neighbor relationship. If the restart timer is running, delete it, and start the stale-path timer.

5. If the restart timer expires before the session is re-established, or the fwd-flag for the address family in the Open message is not 1, or the address family is not included in the Open message, go to step 8.

6. Send routes to the restarting routing device, followed by the EOR marker.

7. If the stale-path timer expires before the EOR marker is received, go to step 8.

8. Delete the retained routes and return to normal BGP operation.



ACL Technology

This chapter describes ACL technology and its applications. The ACL-related configurations on the switch include action group configuration, traffic meter configuration, and time range configuration.

Main contents:

ACL introduction and application

Introduction to action group

Introduction to traffic meter

Introduction to time domain

ACL Introduction and Application This section describes the basic concepts and application of the ACL

technology.

Main contents:

Basic concepts of ACL

ACL classification

Typical application

Basic Concepts of ACL An Access Control List (ACL) is the basic mechanism for filtering traffic on the switch. An ACL acts as a traffic filter that identifies specified types of traffic by packet attributes, such as IP address and port number. After identifying the traffic, the ACL can perform specified operations on it, such as preventing it from passing through an interface.

An ACL consists of a series of rules, each of which matches a specific type of traffic. The sequence number of a rule determines its position in the ACL, and the ACL checks packets against the rules in ascending order of sequence number. The first rule that matches a packet decides the result for that packet: permit or deny. If no rule matches, the packet is denied; that is, any packet that is not explicitly permitted is denied. This is why rule order matters.

The following example defines an IP standard access list.

(config)# ip access-list standard 1

(config-std-nacl)# 10 permit 36.48.3.0 0.0.0.255

(config-std-nacl)# 20 deny 36.48.0.0 0.0.255.255

(config-std-nacl)# 30 permit 36.0.0.0 0.255.255.255

(config-std-nacl)# exit

The following figure shows which address ranges the ACL rules permit and deny: the shaded part is denied and the white part is permitted.

The partition diagram of standard ACL segments

After the last rule (rule 30 above), there is a hidden deny any rule whose sequence number is larger than that of every rule in the ACL. The hidden rule is invisible and denies all packets that do not match any previous rule. To keep the hidden rule from taking effect, manually configure a permit any rule so that packets that match no other rule are permitted.
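The first-match evaluation and the hidden deny any rule can be sketched in Python, using standard ACL 1 from the example above. Wildcard-mask matching (a 1 bit means "don't care") follows the usual convention; the function names are illustrative and not part of the switch CLI.

```python
import ipaddress
from typing import List, Tuple

Rule = Tuple[int, str, str, str]   # (sequence, action, address, wildcard)


def _u32(addr: str) -> int:
    return int(ipaddress.IPv4Address(addr))


def wildcard_match(src: str, address: str, wildcard: str) -> bool:
    # Wildcard bits set to 1 are "don't care"; bits set to 0 must match.
    care = ~_u32(wildcard) & 0xFFFFFFFF
    return (_u32(src) & care) == (_u32(address) & care)


def evaluate(acl: List[Rule], src: str) -> str:
    for _seq, action, address, wildcard in sorted(acl):
        if wildcard_match(src, address, wildcard):
            return action          # the first matching rule decides
    return "deny"                  # hidden "deny any" after the last rule


# Standard ACL 1 from the example above.
ACL_1: List[Rule] = [
    (10, "permit", "36.48.3.0", "0.0.0.255"),
    (20, "deny", "36.48.0.0", "0.0.255.255"),
    (30, "permit", "36.0.0.0", "0.255.255.255"),
]
```

In this model, `host X` corresponds to wildcard 0.0.0.0 and `any` to address 0.0.0.0 with wildcard 255.255.255.255, so appending a permit any rule would override the hidden deny.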

ACL Classification According to their usage, ACLs are divided into six types:

IP standard ACL

IP extended ACL

MAC standard ACL

MAC extended ACL

Hybrid ACL

IPv6 ACL

IP Standard ACL An IP standard ACL matches packets only by their source address. For example, the following standard IP ACL denies packets sent from host 171.69.198.102 but permits packets sent from all other hosts.

(config)# ip access-list standard 1

(config-std-nacl)# 10 deny host 171.69.198.102

(config-std-nacl)# 20 permit any

(config-std-nacl)# exit

IP Extended ACL An IP extended ACL filters packets by IP upper-layer protocol number, source IP address, destination IP address, source TCP/UDP port number, destination TCP/UDP port number, TCP flags, ICMP message type and code, and TOS priority. For example, the following IP extended ACL denies Telnet packets sent from 171.69.198.0/24 to 172.20.52.0/24 but permits all other TCP packets.

(config)# ip access-list extended 1001

(config-ext-nacl)# 10 deny tcp 171.69.198.0 0.0.0.255 172.20.52.0 0.0.0.255 eq telnet

(config-ext-nacl)# 20 permit tcp any any

(config-ext-nacl)# exit



MAC Standard ACL A MAC standard ACL matches Ethernet frames by their source MAC address.

MAC Extended ACL A MAC extended ACL matches Ethernet frames by source MAC address, destination MAC address, 802.1p priority, VLAN ID, and Ethernet type.

Hybrid ACL A hybrid ACL can filter packets by IP protocol number, source IP address, source MAC address, DSCP, VLAN, and other fields.

IPv6 ACL An IPv6 extended ACL filters packets by IPv6 upper-layer protocol number, source IPv6 address, destination IPv6 address, source TCP/UDP port number, destination TCP/UDP port number, and TOS priority. For example, the following IPv6 ACL permits IPv6 packets sent from host 1:2:3:4::5.

switch(config)#ipv6 access-list extended 7001

switch(config-v6-list)#permit ipv6 host 1:2:3:4::5 any

switch(config-v6-list)#

Typical Application One basic function of an ACL is to limit access to network resources, that is, to allow only a limited group of IP addresses to access a limited group of services. The most common way to control access with an ACL is to create an ACL that permits only legitimate traffic and blocks all illegitimate and unauthorized traffic. The following example illustrates the ACL function.

Application requirement:

In a company intranet, port 0/0 of the switch is connected to the news server and the finance server, port 0/1 to the marketing department, and port 0/2 to the accounting department. The requirements are: only the accounting department (addresses 172.20.128.64-95) can access the finance server; the marketing department (addresses 172.20.128.0-31) cannot access the finance server; and both departments can access the news server.

Network topology:

Example network: using an ACL to prevent unauthorized access

1. Create extended IP ACL 1001 that permits all packets destined for the news server (171.23.55.33) behind port 0/0, and permits only packets from the accounting department to reach the finance server (171.23.55.34) behind port 0/0.

switch(config)# ip access-list extended 1001

switch(config-ext-nacl)# permit ip any host 171.23.55.33

switch(config-ext-nacl)# permit ip 172.20.128.64 0.0.0.31 host 171.23.55.34

switch(config-ext-nacl)# exit

2. Apply ACL 1001 in the inbound direction of ports 0/1 and 0/2.

switch(config)# port 0/1-0/2

switch(config-port-range)# ip access-group 1001 in

switch(config-port-range)# exit

Introduction to Action Group To support packet classification and traffic control, the switch extends the traditional ACL so that the ACL, and each permit rule in it, can be bound to an action group. An action group is a set of actions, which can include packet mirroring, packet redirection, packet modification, traffic control, and packet counting. Each ACL entry can be bound to one action group, and the actions in that group are executed for packets that match the entry. An action group cannot be bound to a deny rule.

Introduction to IP+MAC Binding To prevent a user's IP address from being misappropriated by other users, you can bind the user's IP address to the user's MAC address. After the binding, any other user who uses the bound IP address is regarded as illegitimate and is not permitted to access any resources.

Introduction to Traffic Meter Main contents:

Related terms of traffic meter

Introduction to traffic meter

Related Terms SRTCM (Single Rate Three Color Marker): defined in RFC 2697. It uses three parameters (CIR, CBS, and EBS) to perform single-rate metering and packet coloring, and supports color-blind and color-aware modes;

TRTCM (Two Rate Three Color Marker): defined in RFC 2698. It uses CIR, CBS, PIR, and PBS to perform two-rate metering and packet coloring, and supports color-blind and color-aware modes;

CIR: Committed Information Rate;

CBS: Committed Burst Size;

EBS: Excess Burst Size;

PIR: Peak Information Rate;

PBS: Peak Burst Size.

Introduction to Traffic Meter To support traffic control, a meter name can be specified in an action group.

The meter supports two modes: SRTCM and TRTCM. The meter colors packets according to the traffic rate and then remarks or drops the colored packets. When the meter is configured to drop colored packets, it performs traffic rate limiting; when it is configured to remark colored packets, it classifies packets by traffic rate so that different QoS policies can be applied to them later in the data path. After the meter colors the packets, the counter in the action group can count them.
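As an illustration of SRTCM in color-blind mode, here is a minimal Python sketch of the RFC 2697 algorithm: two token buckets, C (size CBS) and E (size EBS), both start full and are refilled at CIR, with tokens that overflow C spilling into E. The class name, the use of bytes per second for CIR, and passing the current time explicitly are assumptions for the sketch; color-aware mode and TRTCM are not shown.

```python
class SrTCM:
    """Single Rate Three Color Marker, color-blind mode (RFC 2697)."""

    def __init__(self, cir: float, cbs: float, ebs: float):
        self.cir = cir    # committed rate, bytes per second
        self.cbs = cbs    # committed burst size, bytes
        self.ebs = ebs    # excess burst size, bytes
        self.tc = cbs     # both buckets start full
        self.te = ebs
        self.last = 0.0

    def _refill(self, now: float) -> None:
        tokens = (now - self.last) * self.cir
        self.last = now
        to_c = min(tokens, self.cbs - self.tc)           # fill bucket C first
        self.tc += to_c
        self.te = min(self.ebs, self.te + tokens - to_c)  # overflow goes to E

    def color(self, size: int, now: float) -> str:
        self._refill(now)
        if self.tc >= size:
            self.tc -= size
            return "green"
        if self.te >= size:
            self.te -= size
            return "yellow"
        return "red"
```

A meter configured to drop red packets then acts as a rate limiter, while one configured to remark them lets later QoS stages treat green, yellow, and red traffic differently.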

Introduction to Time Domain Main contents:

Related terms

Introduction to Time Domain

Related Terms Time domain: a set of time periods. A time domain can contain zero or more time periods, and its time range is the union of those periods.

Periodic time period: defined relative to the week;

Absolute time period: defined by year, month, and day.



Introduction to Time Domain Time domains allow different access controls to apply at different times. A time domain can be bound to an ACL or to rules in an ACL, and the bound ACL or rules take effect only within the time range of the time domain.
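The union-of-periods rule can be sketched in Python as below. Periods are modelled as functions that test a `datetime`; the `periodic` and `absolute` helpers and the example domain are hypothetical and do not reflect the switch's time-range command syntax.

```python
from datetime import datetime, time


def periodic(weekdays, start: time, end: time):
    """Periodic period: the given weekdays (Monday = 0) between start and end."""
    def active(dt: datetime) -> bool:
        return dt.weekday() in weekdays and start <= dt.time() <= end
    return active


def absolute(start_dt: datetime, end_dt: datetime):
    """Absolute period: a fixed span of calendar dates and times."""
    def active(dt: datetime) -> bool:
        return start_dt <= dt <= end_dt
    return active


def time_domain_active(periods, dt: datetime) -> bool:
    # The time range of a domain is the union of its periods;
    # a domain with zero periods never matches.
    return any(period(dt) for period in periods)


# Hypothetical domain: weekday working hours plus one extra Saturday.
working_hours = [
    periodic({0, 1, 2, 3, 4}, time(8, 0), time(18, 0)),
    absolute(datetime(2024, 1, 6, 0, 0), datetime(2024, 1, 6, 23, 59)),
]
```

An ACL rule bound to `working_hours` would then be evaluated only when `time_domain_active(working_hours, now)` is true.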



QoS Technology

This chapter describes the port-based QoS technology and the applications.

Main contents:

Priority mapping

Queue scheduling mode

Dropping mode

Rate restriction

Flow shaping

Set broadcast frame shielding

Priority Mapping This section describes the theory of the priority mapping.

Main contents:

Related terms

Introduction to Priority Mapping

Typical application

Related Terms 802.1p priority: the 802.1p priority is carried in the L2 header. It is used when the L3 header does not need to be analyzed but QoS must be guaranteed in an L2 environment. As shown in Figure 13-1, the 4-byte 802.1Q header contains a 2-byte TPID (Tag Protocol Identifier, value 0x8100) and a 2-byte TCI (Tag Control Information). The following figure shows the detailed contents of the 802.1Q header.

Ethernet frame with 802.1Q header

802.1Q header

As shown in Figure 13-2, the Priority field in the TCI is the 802.1p priority. It is 3 bits long, with a value range of 0-7. It is called the 802.1p priority because its use is defined in detail in the 802.1p standard.

DSCP priority: RFC 2474 redefines the ToS field of the IP header as the DS field. The first six bits are the Differentiated Services Code Point (DSCP), with a value range of 0-63, and the last two bits are reserved, as shown in Figure 13-3.

DS field



Local Priority: a priority with local significance that the switch assigns to a packet. By default, it corresponds to the CoS queue and serves as the intermediary between the DSCP or 802.1p priority and the CoS queue.

Introduction to Priority Mapping Maipu series switches support five types of priority mapping:

Map the packet's DSCP to the local priority;

Remark the packet's DSCP value based on its original DSCP value;

Map the packet's 802.1p priority to the local priority;

Map the packet's local priority to its egress 802.1p priority;

Map the packet's local priority to its egress DSCP value.

After a packet enters the switch, it is mapped to a local priority according to its 802.1p priority or DSCP, and then to a CoS queue. If both the DSCP-to-local-priority mapping and the 802.1p-to-local-priority mapping are configured, the former takes precedence (that is, the DSCP-to-local-priority mapping takes effect).
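The field positions described above, and the rule that the DSCP mapping wins when both mappings are configured, can be sketched in Python. The 802.1p priority is the top 3 bits of the 16-bit TCI and the DSCP is the top 6 bits of the DS byte; the two mapping tables and the fallback value of 0 are hypothetical placeholders, not the switch's default tables.

```python
def dot1p_from_tci(tci: int) -> int:
    """802.1p priority: the top 3 bits of the 16-bit TCI."""
    return (tci >> 13) & 0x7


def dscp_from_tos(tos: int) -> int:
    """DSCP: the first (most significant) 6 bits of the 8-bit DS byte."""
    return (tos >> 2) & 0x3F


# Hypothetical mapping tables; real tables come from the switch configuration.
DSCP_TO_LOCAL = {dscp: dscp >> 3 for dscp in range(64)}
DOT1P_TO_LOCAL = {p: p for p in range(8)}


def local_priority(tci=None, tos=None, dscp_map=None, dot1p_map=None, default=0):
    # When both mappings are configured, the DSCP mapping takes precedence.
    if dscp_map is not None and tos is not None:
        return dscp_map[dscp_from_tos(tos)]
    if dot1p_map is not None and tci is not None:
        return dot1p_map[dot1p_from_tci(tci)]
    return default
```

For a tagged frame with TCI 0xA064 (802.1p priority 5, VLAN 100) carrying an IP packet with DS byte 0x28 (DSCP 10), the DSCP mapping decides the local priority when both tables are supplied.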

Queue Scheduling Mode This section describes the scheduling mode based on port queue.

Main contents:

Related terms

Introduction to queue scheduling mode

Typical application

Related Terms SP (Strict Priority): a queue scheduling algorithm that sends packets strictly in order of queue priority, from high to low. Packets in a lower-priority queue are sent only when all higher-priority queues are empty. Queue 7 has the highest priority and queue 0 the lowest.

RR (Round Robin): packet-based fair scheduling. After a queue sends one packet, scheduling moves to the next queue.

WRR (Weighted Round Robin): packet-based weighted scheduling. You can configure how many packets each queue sends before scheduling moves to the next queue. A weight of 0 means the queue is scheduled by SP.

WDRR (Weighted Deficit Round Robin): the algorithm is based on two variables, the quantum and the credit counter. The quantum is the weight expressed in bytes and is configurable. The credit counter records how much quantum has been accumulated and consumed; it is a state variable and cannot be configured. Initially, the credit counter of each queue equals its quantum. Each time a queue sends a packet, the packet's length in bytes is subtracted from the queue's credit counter, and when the credit counter falls below 0, the queue stops being scheduled. When all queues have stopped, the quantum is added to the credit counter of every queue. The weight N ranges from 0 to 127; a weight of N means a quantum of (N*MTU_QUANTA) bytes, where MTU_QUANTA is 2 KB. A weight of 0 means strict priority.
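A minimal Python sketch of the credit mechanism described above. It assumes one reading of the text: on each visit a queue keeps sending while its credit is not below 0, and when no backlogged queue can send, every queue's quantum is added to its credit. Weight 0 (strict priority) is left out, and the function name is illustrative.

```python
from collections import deque
from typing import List, Tuple

MTU_QUANTA = 2048  # 2 KB, as stated above


def wdrr(queues: List[List[int]], weights: List[int]) -> List[Tuple[int, int]]:
    """Return the send order as (queue index, packet size in bytes)."""
    if any(w < 1 for w in weights):
        raise ValueError("weight 0 (strict priority) is not modelled here")
    qs = [deque(q) for q in queues]
    quantum = [w * MTU_QUANTA for w in weights]
    credit = list(quantum)                 # initial credit equals the quantum
    sent = []
    while any(qs):
        progressed = False
        for i, q in enumerate(qs):
            # Serve queue i while it has packets and its credit is not below 0.
            while q and credit[i] >= 0:
                size = q.popleft()
                credit[i] -= size
                sent.append((i, size))
                progressed = True
        if not progressed:
            # Every backlogged queue has stopped: add the quantum to all queues.
            for i in range(len(qs)):
                credit[i] += quantum[i]
    return sent
```

With weights 1 and 2 and equal 1000-byte packets, queue 1 sends roughly twice as many bytes as queue 0 over time, which is the byte-based fairness that distinguishes WDRR from packet-counting WRR.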

Introduction to Queue Scheduling Mode Each port has eight output queues and supports the SP, RR, WRR, and WDRR scheduling policies.

Key Points of Queue Scheduling Mode When the weight of a queue is set to 0 in WRR or WDRR, the queue is scheduled by strict priority, that is, the queue has the highest priority.



Typical Application


The devices in the LAN connect to the external network through port 0/1 of the switch. Packets sent by the LAN devices are mapped to the output queues of port 0/1 according to rules such as priority mapping. Suppose the traffic in queues 0, 6, and 7 has strict real-time requirements and the other queues are of equal priority. You can configure port 0/1 to use WRR scheduling and set the weights of queues 0, 6, and 7 to 0. These three queues are then scheduled by strict priority, and their packets are forwarded first.

Drop Mode

This section describes the packet drop modes of a port.

Main contents:

Related terms

Introduction to drop mode

Typical application

Related Terms

SRED: Simple random early detection



Introduction to Drop Mode

In SWRED drop mode, the queue length at which dropping starts is called StartPoint and the queue length at which dropping becomes total is called EndPoint. When the average queue length is below StartPoint, SWRED drops no packets. When it is between StartPoint and EndPoint, SWRED drops packets randomly according to the configured drop rate. When the queue length exceeds EndPoint, all packets are dropped.
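The three SWRED regions map onto a simple drop decision. In this sketch the function name and parameters are hypothetical; the manual does not say how the average queue length is computed or whether the drop rate ramps up between the thresholds, so the average is passed in and a single fixed drop rate is applied, as the text is worded. The random source is injectable so the decision can be tested deterministically.

```python
import random


def swred_should_drop(avg_len, start_point, end_point, drop_rate, rand=random.random):
    """Decide whether SWRED drops an arriving packet.

    avg_len     -- average queue length (same unit as the two thresholds)
    start_point -- StartPoint: below it nothing is dropped
    end_point   -- EndPoint: above it every packet is dropped
    drop_rate   -- drop probability (0.0-1.0) between the two thresholds
    rand        -- source of uniform numbers in [0, 1)
    """
    if avg_len < start_point:
        return False  # queue short enough: never drop
    if avg_len > end_point:
        return True  # beyond EndPoint: drop 100%
    return rand() < drop_rate  # in between: drop at random with the drop rate
```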

Typical Application


The devices in the LAN connect to the external network through port 0/1 of the switch. Packets sent by the LAN devices are mapped to the output queues of port 0/1 according to rules such as priority mapping. By default, when the network is congested, excess packets are dropped, which is unfair to packets that arrive later. Configuring SWRED drop mode on the port instead drops packets at random according to the drop rate before the network becomes congested.

Speed Restriction

Port speed restriction applies in the input direction with a granularity of 64 kbit/s. Traffic that exceeds the limit is dropped. The configurable parameters are the bandwidth threshold (in kbit/s; 64 kbit/s is the minimum granularity) and the burst size (in bytes, with a granularity of 4K bytes). Port speed restriction makes traffic enter the network at an even rate, preventing congestion at its source.
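Speed restriction that drops over-rate traffic behaves like a token-bucket policer. The class below is a hedged sketch, not the switch's implementation: the class and method names are hypothetical, 1 kbit is taken as 1000 bits, and values that are not whole multiples of the 64 kbit/s and 4K-byte granularities are simply rejected (the manual does not say how the switch handles them).

```python
class PortPolicer:
    """Token-bucket sketch of input-direction port speed restriction."""

    RATE_STEP_KBPS = 64  # rate granularity: 64 kbit/s
    BURST_STEP_BYTES = 4096  # burst granularity: 4K bytes

    def __init__(self, rate_kbps, burst_bytes):
        # Assumption: values that are not whole multiples of the granularity are rejected.
        if rate_kbps <= 0 or rate_kbps % self.RATE_STEP_KBPS:
            raise ValueError("rate must be a positive multiple of 64 kbit/s")
        if burst_bytes <= 0 or burst_bytes % self.BURST_STEP_BYTES:
            raise ValueError("burst must be a positive multiple of 4096 bytes")
        self.rate = rate_kbps * 1000 / 8  # token fill rate in bytes per second
        self.burst = burst_bytes
        self.tokens = float(burst_bytes)  # the bucket starts full
        self.last = 0.0  # time of the last refill, in seconds

    def allow(self, now, size):
        """Return True to forward a packet of `size` bytes arriving at `now`, False to drop it."""
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= size:
            self.tokens -= size
            return True
        return False  # over-rate traffic is dropped, not delayed
```

At 64 kbit/s with a 4096-byte burst, two back-to-back 1500-byte packets pass and a third is dropped; one second later the bucket is full again.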



Flow Shaping

There are two kinds of flow shaping:

Port-based flow shaping

Port-based flow shaping in the output direction makes packets leave at an even rate. The configurable parameters are the bandwidth threshold (in kbit/s; 64 kbit/s is the minimum granularity) and the burst size (in bytes, with a granularity of 4K bytes).
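Unlike speed restriction, shaping delays excess packets instead of dropping them. The sketch below computes departure times for a FIFO token-bucket shaper; the function name is hypothetical, 1 kbit is taken as 1000 bits, and every packet is assumed to be no larger than the burst size.

```python
def shape(arrivals, rate_kbps, burst_bytes):
    """Return the departure time of each packet through a token-bucket shaper.

    arrivals -- list of (arrival_time_s, size_bytes), sorted by arrival time
    """
    rate = rate_kbps * 1000 / 8  # token fill rate in bytes per second
    tokens = float(burst_bytes)  # the bucket starts full
    clock = 0.0  # time at which `tokens` was last updated
    departures = []
    for arrival, size in arrivals:
        if size > burst_bytes:
            raise ValueError("a packet larger than the burst size can never conform")
        # FIFO: a packet cannot leave before it arrives or before the previous packet.
        now = max(arrival, clock)
        tokens = min(burst_bytes, tokens + (now - clock) * rate)
        if tokens < size:
            # Hold the packet until enough tokens have accumulated.
            now += (size - tokens) / rate
            tokens = float(size)
        tokens -= size
        clock = now
        departures.append(now)
    return departures
```

Three 4096-byte packets arriving together at 64 kbit/s leave at 0 s, 0.512 s and 1.024 s: the first uses the burst, and each later one waits 4096 / 8000 s for new tokens.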

Flow shaping based on port queue

Output flow shaping on a port queue makes packets leave at an even rate. The configurable parameters are the queue number, committed information rate (CIR), committed burst size, peak burst size, and peak information rate (PIR). The granularity of both CIR and PIR is 64 kbit/s; the granularity of both the committed burst size and the peak burst size is 4K bytes.

The switch sorts queues into three classes by comparing each queue's traffic rate with its CIR and PIR: queues whose traffic is below CIR are scheduled first, then queues whose traffic is between CIR and PIR, and finally queues whose traffic exceeds PIR.
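The three-class ordering can be expressed as a sort key. In this sketch the function names are hypothetical, the boundaries (below CIR; from CIR up to and including PIR; above PIR) are an assumption, and ties within a class go to the higher queue number, which the manual does not specify.

```python
def shaping_class(rate, cir, pir):
    """Classify a queue by its current traffic rate (all values in kbit/s)."""
    if rate < cir:
        return 0  # below CIR: scheduled first
    if rate <= pir:
        return 1  # between CIR and PIR: scheduled second
    return 2  # above PIR: scheduled last


def queue_service_order(rates, cirs, pirs):
    """Return queue numbers sorted by class; ties go to the higher queue number."""
    return sorted(
        range(len(rates)),
        key=lambda q: (shaping_class(rates[q], cirs[q], pirs[q]), -q),
    )
```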

VLAN-based Traffic Shaping

VLAN-based traffic shaping maps the traffic of specified VLANs to 16 virtual queues and then schedules and shapes those 16 virtual queues.

VLAN queue shaping works as follows. When a packet enters the switch, it is placed in the virtual queue that corresponds to its VLAN ID. Queue scheduling and shaping are performed on the virtual queues. After VLAN queue shaping, the traffic enters queue 9 of the port.



Set Broadcast Frame Shielding

Unknown unicast frames, unknown multicast frames, and broadcast frames are flooded within the VLAN. In some applications, certain ports do not need to send these frames. When broadcast frame shielding is enabled on a port, the port does not send them.
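The effect of shielding on flooding can be sketched as a per-port filter. The function names, the colon-separated MAC notation, and the use of plain sets for learned unicast and known multicast addresses are hypothetical simplifications; the sketch only shows which ports receive a copy of a flooded frame.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"


def is_flooded(dst_mac, known_unicast, known_multicast):
    """True if the frame is broadcast, unknown multicast, or unknown unicast."""
    dst = dst_mac.lower()
    if dst == BROADCAST:
        return True
    if int(dst.split(":")[0], 16) & 0x01:  # I/G bit set: multicast address
        return dst not in known_multicast
    return dst not in known_unicast  # unicast address not learned in the MAC table


def flood_out_ports(dst_mac, in_port, vlan_ports, shielded_ports, known_unicast, known_multicast):
    """Return the ports that send a copy of a flooded frame; [] if it is not flooded."""
    if not is_flooded(dst_mac, known_unicast, known_multicast):
        return []  # known destination: forwarded by table lookup instead
    # Flood to every VLAN port except the ingress port and shielded ports.
    return [p for p in vlan_ports if p != in_port and p not in shielded_ports]
```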



AAA Technology

This chapter describes the principles of AAA security services, the RADIUS and TACACS protocols, the ID authentication mechanism of the MP series router, and commonly used debug commands and their debug output.

Main contents:

AAA terms

Basic theory of AAA

Introduction to RADIUS protocol

Introduction to TACACS protocol

Introduction to ID authentication mechanism

AAA Terms

AAA: Short for Authentication, Authorization and Accounting. It provides a consistent framework for configuring these three kinds of security functions. In essence, AAA configuration manages network security, which here mainly means access control:

1. Which users can access the network server?

2. Which services are available to a user who has access?

3. How is a user charged for the network resources used?

NAS: Short for Network Access Server. A router with AAA security services enabled acts as the NAS. When a user connects to the NAS over some network (such as the telephone network) to gain access to other networks or to certain network resources, the NAS identifies the user (or the connection).



Method list: Defines a sequence of authentication methods that are queried in turn to authenticate the user ID.

RADIUS: It is short for Remote Authentication Dial In User Service,

defined by RFC 2865 and 2866.

TACACS: It is short for Terminal Access Controller Access Control System.

Basic Theory of AAA

AAA lets you dynamically configure the authentication and authorization type for a single line (single user) or a single service (such as IP, IPX or VPDN). You create a method list that defines the authentication and authorization type and then apply the method list to the specified service or interface.

AAA uses protocols such as RADIUS and TACACS to manage its security functions, setting up communication between the NAS and the RADIUS or TACACS security server. In addition, the local user name database, the line password, and the enable password can serve as authentication methods for access control.



As shown in the figure above, suppose that a method list defined on the NAS specifies R1 first, then R2, T1, T2, and finally the local user name database on the NAS. When a remote user dials in to the network, the NAS first queries R1 for authentication. If the user passes authentication on R1, R1 returns a PASS response to the NAS and the user is granted access to the network. If R1 returns a FAIL response, the user is denied access and the session ends. If R1 does not respond, the NAS treats this as an ERROR and queries R2. The process continues through the remaining methods until the user is authenticated, is denied, or the session ends.

Note

The NAS tries the next method only when the previous method does not respond. If authentication fails at any point, that is, a security server or the local user name database denies the user access, authentication ends and no further methods are tried.
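The PASS/FAIL/ERROR rules above can be sketched as a loop over the method list. The `Result` enum, the `authenticate` function, and the callable-per-method layout are hypothetical. What happens when every method returns ERROR is not stated in this section; the sketch assumes access is denied.

```python
from enum import Enum


class Result(Enum):
    PASS = "PASS"  # the method accepted the user
    FAIL = "FAIL"  # the method denied the user
    ERROR = "ERROR"  # the method did not respond


def authenticate(method_list, user, password):
    """Query the methods in order; only an ERROR moves on to the next method.

    method_list -- callables (user, password) -> Result, in configured order
    """
    for method in method_list:
        result = method(user, password)
        if result is Result.PASS:
            return True
        if result is Result.FAIL:
            return False  # a denial ends authentication; later methods are not tried
    return False  # every method returned ERROR: assumed to deny access


# Hypothetical methods: an unreachable server and a local user name database.
LOCAL_USERS = {"admin": "secret"}


def unreachable_server(user, password):
    return Result.ERROR


def local_database(user, password):
    return Result.PASS if LOCAL_USERS.get(user) == password else Result.FAIL
```

Here `authenticate([unreachable_server, local_database], "admin", "secret")` falls through to the local database and succeeds, whereas a server that answers FAIL would stop the process before the local database is consulted.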

Introduction to RADIUS

RADIUS is a UDP-based client/server protocol. The NAS acts as the RADIUS client, and the RADIUS server is a background process running on a UNIX or Windows NT host.

A RADIUS packet is carried in the data field of a UDP packet and has variable length. The attributes it carries vary with the RADIUS packet type. The structure of a RADIUS packet is as follows; the figures in brackets are field lengths in bytes.

Code (1) | Identifier (1) | Length (2) | Authenticator (16) | Attributes (variable)



Code field

1-authentication request packet (Access-Request)

2-authentication pass packet (Access-Accept)

3-authentication failure packet (Access-Reject)

4-accounting request packet (Accounting-Request)

5-accounting response packet (Accounting-Response)

Identifier field

Identifier is used to match the request and response packet.

Length field

It is the total length of the packet.

Authenticator field

1-Request Authenticator

In the Access-Request packet, it is one random 16-byte number;

In the Accounting-Request packet, it is the following hash value:

RequestAuth = MD5(Code+ID+Length+16 Octets Zero+Attributes+Secret)

Here, Secret is the key shared by NAS and server.

2-Response Authenticator

In the Access-Accept, Access-Reject, and Accounting-Response packets, it

is the following hash value:

ResponseAuth = MD5(Code+ID+Length+RequestAuth+Attributes+Secret)
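The two authenticator formulas above translate directly into code with Python's `hashlib` and `struct`. This is a sketch: the function names and the example shared secret are hypothetical. For an Access-Request, the authenticator is just 16 random bytes, for example from `os.urandom(16)`.

```python
import hashlib
import struct

HEADER_LEN = 20  # Code (1) + Identifier (1) + Length (2) + Authenticator (16)


def _header(code, identifier, attributes):
    # Length covers the whole packet: the 20-byte header plus the attributes.
    return struct.pack("!BBH", code, identifier, HEADER_LEN + len(attributes))


def accounting_request_auth(code, identifier, attributes, secret):
    """RequestAuth = MD5(Code+ID+Length+16 zero octets+Attributes+Secret)."""
    data = _header(code, identifier, attributes) + bytes(16) + attributes + secret
    return hashlib.md5(data).digest()


def response_auth(code, identifier, request_auth, attributes, secret):
    """ResponseAuth = MD5(Code+ID+Length+RequestAuth+Attributes+Secret)."""
    data = _header(code, identifier, attributes) + request_auth + attributes + secret
    return hashlib.md5(data).digest()


def build_packet(code, identifier, authenticator, attributes):
    """Assemble Code | Identifier | Length | Authenticator | Attributes."""
    return _header(code, identifier, attributes) + authenticator + attributes


secret = b"example-secret"  # hypothetical key shared by NAS and server
attrs = bytes([40, 6, 0, 0, 0, 1])  # Acct-Status-Type (40) = 1 (Start)
req_auth = accounting_request_auth(4, 7, attrs, secret)
request = build_packet(4, 7, req_auth, attrs)  # Accounting-Request
# The server answers with an Accounting-Response (code 5) carrying no attributes.
reply = build_packet(5, 7, response_auth(5, 7, req_auth, b"", secret), b"")
```

The NAS verifies the reply by recomputing `response_auth` with its own copy of the secret and comparing the result with bytes 4 to 19 of the received packet.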

Attribute field

The Attribute field carries the specific authentication, authorization, information and configuration details of the RADIUS request or response. The field can contain multiple attributes, each in the following format (figures are byte offsets):

Byte 0: Type | Byte 1: Length | Byte 2 onward: Value ...

The Type field indicates the attribute type.

The Length field indicates the length of the whole attribute, including Type, Length and Value.

The Value field contains zero or more bytes of attribute information. Its format and length are determined by Type and Length.

The following lists several common attributes:

Attribute          Type   Data Type   Attribute Length (bytes)
User-Name          1      String      Length >= 3
User-Password      2      String      18 <= Length <= 130
NAS-IP-Address     4      Address     Length = 6
Service-Type       6      Integer     Length = 6
Reply-Message      18     String      Length >= 3
Acct-Status-Type   40     Integer     Length = 6
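The Type/Length/Value layout can be encoded and decoded as shown below. This is a sketch with hypothetical function names. It does not show the hiding of User-Password, which RADIUS applies before the value is placed in the attribute.

```python
import socket
import struct

USER_NAME, NAS_IP_ADDRESS, SERVICE_TYPE, ACCT_STATUS_TYPE = 1, 4, 6, 40


def encode_attr(attr_type, value):
    """Encode one attribute as Type (1 byte) | Length (1 byte) | Value."""
    if len(value) > 253:
        raise ValueError("Length is one byte and includes the Type and Length fields")
    return struct.pack("!BB", attr_type, len(value) + 2) + value


def decode_attrs(data):
    """Split an Attribute field into (type, value) pairs."""
    attrs, i = [], 0
    while i < len(data):
        if i + 2 > len(data):
            raise ValueError("truncated attribute header")
        attr_type, length = data[i], data[i + 1]
        if length < 2 or i + length > len(data):
            raise ValueError("malformed attribute length")
        attrs.append((attr_type, data[i + 2:i + length]))
        i += length
    return attrs


blob = (
    encode_attr(USER_NAME, b"alice")  # Length = 7
    + encode_attr(NAS_IP_ADDRESS, socket.inet_aton("192.0.2.1"))  # Length = 6
    + encode_attr(SERVICE_TYPE, struct.pack("!I", 1))  # 4-byte integer, Length = 6
)
```

The Length values produced match the table: a 4-byte address or integer gives Length = 6, and a string gives 2 plus its byte count.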

Introduction to TACACS

TACACS provides authentication, authorization and accounting services. It carries data over TCP and receives packets on TCP port 49. The fields of the TACACS packet header are described below. The header is always transmitted in plaintext.

major version field

It is the major version number.

minor version field

It is the minor version number.



type field

It is the packet type, indicating authentication, authorization or accounting.

1-authentication

2-authorization

3-accounting

seq_no field

It is the sequence number of the packet.

flags field

It is the flag. The lowest bit indicates whether the packet is encrypted.

session_id field

It is the session ID, a random 4-byte number that does not change within a session.

length field

It is the length of the packet body (excluding the packet header).
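The header fields above occupy 12 bytes. The sketch below packs and unpacks them with `struct`, assuming the TACACS+ layout of RFC 8907: major and minor version share the first byte (major in the high 4 bits, with major version 0xC), session_id and length are 4-byte big-endian integers, and the lowest flags bit set means the body is not encrypted. The function names are hypothetical.

```python
import struct

HEADER = struct.Struct("!BBBBII")  # version, type, seq_no, flags, session_id, length
TYPE_AUTHEN, TYPE_AUTHOR, TYPE_ACCT = 1, 2, 3
UNENCRYPTED_FLAG = 0x01  # lowest bit of the flags field


def pack_header(major, minor, pkt_type, seq_no, flags, session_id, body_len):
    version = (major << 4) | (minor & 0x0F)  # major version in the high 4 bits
    return HEADER.pack(version, pkt_type, seq_no, flags, session_id, body_len)


def unpack_header(data):
    version, pkt_type, seq_no, flags, session_id, body_len = HEADER.unpack(data[:HEADER.size])
    return {
        "major": version >> 4,
        "minor": version & 0x0F,
        "type": pkt_type,
        "seq_no": seq_no,
        "encrypted": not flags & UNENCRYPTED_FLAG,
        "session_id": session_id,
        "length": body_len,
    }
```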

The packet header is followed by the authentication, authorization or accounting packet body, which is encrypted.
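RFC 8907 (TACACS+) specifies this body encryption as an XOR with a pad built from chained MD5 hashes: MD5_1 = MD5(session_id, key, version, seq_no), and each following hash appends the previous one to the same input. The sketch below assumes the device follows that scheme; the function names and the example key are hypothetical. Applying the XOR twice restores the plaintext.

```python
import hashlib
import struct


def pseudo_pad(session_id, key, version, seq_no, length):
    """Concatenate MD5_1, MD5_2, ... and truncate to `length` bytes."""
    seed = struct.pack("!I", session_id) + key + bytes([version, seq_no])
    pad, prev = b"", b""
    while len(pad) < length:
        prev = hashlib.md5(seed + prev).digest()  # MD5_n = MD5(seed + MD5_(n-1))
        pad += prev
    return pad[:length]


def obfuscate(body, session_id, key, version, seq_no):
    """XOR the body with the pad; the same call decrypts."""
    pad = pseudo_pad(session_id, key, version, seq_no, len(body))
    return bytes(b ^ p for b, p in zip(body, pad))


body = b"x" * 40
cipher = obfuscate(body, 0x1234, b"example-key", 0xC0, 1)
```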

Authentication uses three packet types: START, REPLY, and CONTINUE. START and CONTINUE are sent by the client, and REPLY is sent by the server.

An authorization session uses a REQUEST/RESPONSE packet pair to complete authorization; an accounting session uses a REQUEST/REPLY packet pair, with the relevant attributes carried in the packets.



Introduction to ID Authentication Mechanism

Login Authentication

1. If neither AAA nor the line is configured, logins through the console port or telnet pass authentication directly; SSH logins use local authentication.

2. If AAA is not configured but the line is configured, authentication follows the line configuration:

Authentication type configured on line     Description

no login                                   Authentication passes.

login (default login mode of telnet)       Authenticate with the line password. If no line password is configured, console logins pass, while telnet and SSH logins fail.

login local (default login mode of SSH)    Authenticate with the local user password. If no local user is configured, the login fails.

3. If AAA is configured

Authentication follows the configured method list. A method list can contain at most four authentication methods.

When a user logs in through an interface or line, the system authenticates the user with the method list referenced by that interface or line. If the interface or line does not reference any method list, or the referenced method list is not defined, the system uses the default method list; if no default method list is configured, the default method is used.

For console logins, the default method is none; for telnet and SSH logins, the default method is local.

If the user logged in with a valid user name, the user name is not requested again for privileged-mode authentication; only the password needs to be entered.
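The order in which an AAA login picks its methods (the list referenced by the interface or line, then the default list, then the default method) can be sketched as follows. The function names, the dictionary layout, and method labels such as "radius" and "tacacs" are hypothetical; "none" and "local" are the default methods named above, and the four-method limit is checked when a list is defined.

```python
DEFAULT_LOGIN_METHOD = {"console": "none", "telnet": "local", "ssh": "local"}
MAX_METHODS = 4  # a method list can contain at most four methods


def define_method_list(lists, name, methods):
    """Store a named method list, enforcing the four-method limit."""
    if not 1 <= len(methods) <= MAX_METHODS:
        raise ValueError("a method list holds one to four methods")
    lists[name] = list(methods)


def login_methods(access, referenced, lists):
    """Return the methods used to authenticate a login when AAA is configured.

    access     -- "console", "telnet" or "ssh"
    referenced -- name of the method list the interface/line references, or None
    lists      -- dict of defined method lists; "default" is the default list
    """
    if referenced is not None and referenced in lists:
        return lists[referenced]  # the list referenced by the interface/line
    if "default" in lists:
        return lists["default"]  # no valid reference: use the default list
    return [DEFAULT_LOGIN_METHOD[access]]  # no default list: use the default method
```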



Authenticate in Privileged Mode

1. If AAA is not configured, the enable password is used:

If the logged-in user has an enable password, authentication uses that password;

Otherwise, if a global enable password exists, authentication uses the global enable password;

If no enable password exists at all, a user who logged in through the console port passes authentication directly, but a telnet user does not.

2. If AAA is configured

Authentication follows the configured default method list, which can contain up to four authentication methods.

After logging in to the router, when a user requests privileged mode, the system authenticates the user with the default method list. If that method list does not exist, the default method is used:

For console logins, the default method is none (enable none);

For telnet and SSH logins, the default method is enable.
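Both privileged-mode cases can be sketched in a few lines. The function names are hypothetical, and treating SSH users like telnet users when no enable password exists is an assumption, since the text only mentions console and telnet users for that case.

```python
DEFAULT_ENABLE_METHOD = {"console": "none", "telnet": "enable", "ssh": "enable"}


def enable_check_without_aaa(access, user_enable_pw, global_enable_pw, entered):
    """Privileged-mode check when AAA is not configured.

    user_enable_pw   -- enable password of the logged-in user, or None
    global_enable_pw -- global enable password, or None
    """
    if user_enable_pw is not None:
        return entered == user_enable_pw  # the user's own enable password wins
    if global_enable_pw is not None:
        return entered == global_enable_pw
    # No enable password at all: console users pass, remote users do not (assumed for SSH).
    return access == "console"


def enable_methods_with_aaa(access, default_list):
    """With AAA configured: the default method list, else the default method."""
    return list(default_list) if default_list else [DEFAULT_ENABLE_METHOD[access]]
```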



EIPS Technology

EIPS is a link-layer protocol designed specifically for Ethernet rings. It prevents broadcast storms caused by data loops. When a link on the Ethernet ring breaks, the standby link can be enabled rapidly to restore communication between the nodes on the ring. Compared with STP, EIPS converges faster (in less than 50 ms), and its convergence time does not depend on the number of nodes on the ring.

EIPS supports two modes. In sub ring mode, two intersecting rings are decomposed into one master ring and one sub ring that share a common (public) link. In hierarchical mode, one of the two intersecting rings is chosen as the master ring; after the link it shares with the master ring is removed, the other ring becomes a lower-level link attached to the master ring.

Sub Ring Mode EIPS

Main contents:

Basic concepts of EIPS

EIPS packet format

Basic theory of EIPS

Typical application of EIPS

Basic Concepts of EIPS

EIPS (Ethernet Intelligent Protection Switchover): RFC 3619, an informational RFC published through the IETF in October 2003, describes automatic protection switching for Ethernet rings, that is, a protection switchover mechanism that runs on an Ethernet ring.



EIPS domain: An EIPS domain is identified by an integer ID. A group of interconnected switches configured with the same domain ID forms an EIPS domain. An EIPS domain consists of EIPS rings, EIPS control VLANs, a master node, transmission nodes, edge nodes and assistant edge nodes.

EIPS ring: An EIPS ring is identified by an integer ID and physically corresponds to one ring-shaped Ethernet topology. Each EIPS ring is a local unit of the EIPS domain, and the EIPS protocol operates on EIPS rings. The EIPS rings in a domain are divided into the master ring and sub rings. An EIPS domain has exactly one master ring and one or more sub rings. A sub ring intersects the upper-level ring through the edge node and the assistant edge node.

EIPS master ring: It is the EIPS ring with level as 0.

EIPS sub ring: It is the EIPS ring whose level is larger than 0.

EIPS control VLAN: As opposed to a data VLAN, the control VLAN in an EIPS domain is used only to transmit EIPS protocol packets. Each EIPS ring has one control VLAN: master ring protocol packets are transmitted in the master control VLAN, and sub ring protocol packets in the sub control VLAN. IP addresses cannot be configured on the master or sub control VLAN interfaces. The ports through which a switch connects to the Ethernet ring belong to the control VLAN, and only such ports can be added to it. Ports on the master ring belong to both the master control VLAN and the sub control VLAN; ports on a sub ring belong only to the sub control VLAN. The whole master ring is regarded as one logical node of the sub ring: sub ring EIPS protocol packets are transmitted transparently through the master ring as user packets, whereas master ring EIPS protocol packets do not enter the sub ring and are transmitted only within the master ring.

EIPS node: each switch on the EIPS ring is one node on the EIPS ring.

The nodes on one ring have the same EIPS domain ID and the EIPS ring

ID. Each EIPS node has two EIPS ports connected to the EIPS ring, which

are specified as the master port and standby port by the user during the

configuration.

Master node: The master node initiates polling of the ring status. It periodically sends HEALTH packets from its master and standby ports. If at least one port receives the packets sent from the other port, the ring is complete; if no HEALTH packet is received for a long time, the ring is considered to have failed. The master node also decides what action to take when the network topology changes.

The master node has the following three states:

Complete State:



When all links on the ring are up, the master node receives its own HEALTH packets on the standby port, which means it is in the Complete state. Because the state of the master node reflects the state of the EIPS ring, the ring is also Complete. In this state, the master node blocks its standby port to prevent packets from looping around the ring.

Failed State:

When a link on the ring goes down (the ring is broken), the master node is in the Failed state. It then opens its standby port so that communication between the nodes on the ring is not interrupted.

PRE-UP State:

When the master node in the Failed state receives a HEALTH packet, it first enters the PRE-UP state. If it continues to receive HEALTH packets for a period of time, it moves to the Complete state. This prevents network flapping.

Transmission node: All nodes on the EIPS ring other than the master node are transmission nodes. A transmission node monitors the status of its directly connected links and reports changes to the master node in EIPS protocol packets; the master node then decides how to respond. On a sub ring, the two transmission nodes where the sub ring intersects the master ring are the edge node and the assistant edge node (the master ring has only transmission nodes; edge nodes and assistant edge nodes exist only on sub rings). If a transmission node on the master ring shares a public port with the edge node of a sub ring, it sends the sub ring protocol channel status detection packets from that port. If a transmission node on the master ring shares a public port with the assistant edge node of a sub ring, it forwards the received sub ring protocol channel status detection packets to that assistant edge node.

The transmission node has the following three states:

Link-Up State (UP state):

Both the master port and the standby port of the transmission node are up.

Link-Down State (Down state):

The master port or the standby port of the transmission node is down. When a transmission node in the Link-Up state detects that its master or standby port has gone down, it moves to the Link-Down state and informs the master node by sending a LINK-DOWN packet.

Preforwarding State (temporarily blocked state):

A transmission node cannot return directly from the Link-Down state to the Link-Up state. When the failed port of a transmission node in the Link-Down state recovers, so that both the master and standby ports are up again, the node enters the Preforwarding state and blocks the port that recovered last. The reason is that, at the moment the ports recover, the master node does not yet know about the recovery and its standby port is still open. If the transmission node returned to the Link-Up state immediately, packets would loop around the ring. The transmission node therefore passes through the Preforwarding state first.

When a transmission node in the Preforwarding state receives the COMPLETE-FLUSH-FDB packet sent by the master node, it moves to the Link-Up state. In case the COMPLETE-FLUSH-FDB packet is lost in transit, the EIPS protocol provides a backup mechanism to reopen the temporarily blocked port and trigger the state change: if the transmission node does not receive the COMPLETE-FLUSH-FDB packet within a specified time, it moves to the Link-Up state automatically and opens the temporarily blocked port.
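The three transmission-node states and their transitions can be sketched as a small state machine. The class, method and port names are hypothetical, and only the transitions described above are modeled; for example, a port going down while the node is already Preforwarding is left unhandled.

```python
class TransmissionNode:
    """Link-Up / Link-Down / Preforwarding states of an EIPS transmission node."""

    def __init__(self):
        self.state = "LINK-UP"
        self.port_up = {"master": True, "standby": True}
        self.blocked_port = None  # port blocked while in the Preforwarding state
        self.sent = []  # protocol packets sent to the master node

    def port_change(self, port, up):
        """Handle a link-state change on the master or standby port."""
        self.port_up[port] = up
        if not up and self.state == "LINK-UP":
            self.state = "LINK-DOWN"
            self.sent.append("LINK-DOWN")  # report the failure to the master node
        elif up and self.state == "LINK-DOWN" and all(self.port_up.values()):
            # Both ports are up again, but the master node may still have its
            # standby port open: block the last recovered port to avoid a loop.
            self.state = "PREFORWARDING"
            self.blocked_port = port

    def flush_or_timeout(self):
        """COMPLETE-FLUSH-FDB received, or the backup timer expired without it."""
        if self.state == "PREFORWARDING":
            self.state = "LINK-UP"
            self.blocked_port = None  # reopen the temporarily blocked port
```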

Edge node and assistant edge node: The edge node and assistant edge node detect the status of the sub ring protocol packet channel in the master ring. The edge node initiates the detection, the assistant edge node judges the channel status and reports it to the edge node, and the edge node then acts on the reported status.

The edge node and the assistant edge node are special transmission nodes, so they have the same three states as a transmission node, with slightly different meanings:

Link-Up State (UP state):

The edge port is up, so the edge node (assistant edge node) is in the Link-Up state.

Link-Down State (Down state):

The edge port is down, so the edge node (assistant edge node) is in the Link-Down state.

Preforwarding State (temporarily blocked state):

State transitions of the edge node (assistant edge node) are basically the same as those of a transmission node. The difference is that transitions caused by port link changes depend only on the status of the edge port (see the edge node states above).

The mechanism by which the edge node and the assistant edge node detect the status of the sub ring protocol packet channel in the master ring is described in detail later.

EIPS port: An EIPS port is an abstract concept corresponding to one of the links that form the EIPS ring. The link can be a single physical link or an aggregated link formed by multiple physical links. Each EIPS node always has two ports connected to the EIPS ring. Because EIPS rings may intersect, one EIPS port may belong to multiple EIPS rings.

EIPS master port and EIPS standby port: The ports on the master node and on ordinary transmission nodes (nodes that are neither edge nodes nor assistant edge nodes) are designated master port and standby port. On the master node, the user data VLANs of the standby port are blocked while the ring is complete; on a transmission node, the master and standby ports have no special meaning.

EIPS public port and EIPS edge port: The ports on edge nodes and assistant edge nodes are designated public port and edge port. The public port connects to the link shared by two intersecting rings and belongs to multiple EIPS rings. The edge port belongs to one sub ring only. When the public port fails, the failure is reported only to the master node of the master ring, not to the master node of the sub ring.

EIPS Packet Format

The frame format of the Ethernet ring protection protocol is as follows:

Table 15-1 EIPS packet format

Each row is 48 bits (bit offsets 0-47); fields within a row are listed in order.

Destination MAC address (6 bytes)
Source MAC address (6 bytes)
Type (EtherType, TPID) | PRI + CFI + VLAN ID | Frame Length
DSAP/SSAP | CONTROL | OUI = 0x00E02B
0x00BB | 0x99 | 0x0B | ERP_LENGTH
ERP_VER | ERP_TYPE | Domain_ID | Ring_ID
0x0000 | SYSTEM_MAC_ADDR (high 4 bytes)
SYSTEM_MAC_ADDR (low 2 bytes) | HEALTH_TIMER | FAIL_TIMER
STATE | 0x00 | HEALTH_SEQ | 0x0000
RESERVED (0x000000000000), six rows

The frame fields are described below.

Destination MAC address: 48 bits.

Table 15-2 The description of the destination MAC address

Destination MAC   Description

0180.6307.0000    1. Destination MAC of the HEALTH packet. The packet is sent from the master and standby ports of the master node and passes through all transmission nodes and ordinary L2 switches; transmission nodes only forward it and do not send it to the CPU. The ports of the master node receive the HEALTH packet.
                  2. Destination MAC of the LINK-DOWN packet, initiated by a transmission node, edge node or assistant edge node to inform the master node when the node's links change.
                  3. Destination MAC of the ASK-RING-STATE packet.

0180.6307.0002    Destination MAC of the COMM-FLUSH-FDB/COMP-FLUSH-FDB packets. These packets are initiated by the master node. Transmission nodes forward them and also send them to the CPU; the master node does not forward them but only sends them to the CPU.

0001.7A4F.4826    1. Destination MAC of the EDGE-HEALTH packet, which checks the master ring link between the edge node and the assistant edge node.
                  2. Destination MAC of the MAJOR-FAULT packet, initiated by the assistant edge node to inform the edge node that the master ring link between them is broken.
                  3. Destination MAC of the MAJOR-RESUME packet, initiated by the assistant edge node to inform the edge node that the link has recovered once it receives the edge node's EDGE-HEALTH packets again.

0001.7A4F.4AB6    Topology request packet.

0001.7A4F.4AB4    Uni-directional detection packet.

0001.7A4F.4AB5    HELLO1 packet, sent when the standby node has not received the Hello packets of the master node within a certain time.

Source MAC address: 48 bits, the MAC address of the sending node;

TPID: 16 bits, fixed as 0x8100;

PRI + CFI: 4 bits. The priority is user-definable (7 is recommended); CFI is 0 for standard-format frames;

VLAN ID: 12 bits, not fixed;

Frame Length: 16 bits, the length of the Ethernet frame, fixed as 0x48;

DSAP/SSAP: 16 bits, fixed as 0xAAAA;

CONTROL: 8 bits, fixed as 0x03;

OUI: 24 bits, fixed as 0x00E02B;

ERP_LENGTH: 16 bits, fixed as 0x40;

ERP_VERS: 16 bits, fixed as 0x0001;

ERP_TYPE: 16 bits, the frame type;

Domain_ID: 16 bits, the domain ID;

Ring_ID: 16 bits, the ring ID;

SYSTEM_MAC_ADDR: 48 bits, the MAC address of the sending node;

HEALTH_TIMER: 16 bits, the interval at which the master node and edge control node send HEALTH frames (unit: 16 ms);

FAIL_TIMER: 16 bits, the timeout after which the master node and edge control node treat HEALTH frames as lost (unit: 16 ms);

STATE: 8 bits, the node state;

HEALTH_SEQ: 16 bits, the sequence number of the HEALTH frame, generated by the master node;



ERP_TYPE: the packet type, defined as follows:

Table 15-3 The definition of packet type

Packet Type      Value   Description

HEALTH           5       Initiated by the master node to check the integrity of the ring.

COMP-FLUSH-FDB   6       Initiated by the master node when the EIPS ring enters the Complete state, telling transmission nodes to update their MAC entries and to unblock their temporarily blocked ports.

COMM-FLUSH-FDB   7       Initiated by the master node when the EIPS ring enters the Failed state, telling transmission nodes to update their MAC entries; also sent when a transmission node has a port in the Link-Down state.

LINK-DOWN        8       Initiated by a transmission node, edge node or assistant edge node when its link goes down, informing the master node that the ring is broken.

ASK-RING-STATE   9       Sent by a non-master node to query the current ring state from the master node when a port comes up and the ring state is not yet confirmed.

EDGE-HEALTH      10      Initiated by the edge node to check the master ring link between the edge node and the assistant edge node.

MAJOR-FAULT      11      Initiated by the assistant edge node to inform the edge node that the master ring link between them is broken.

MAJOR-RESUME     12      Initiated by the assistant edge node to inform the edge node that the master ring fault has recovered.

LINK-HELLO       14      Uni-directional detection packet.

TOPOLOGY         15      Topology collection packet, covering topology requests and topology responses.

Basic Theory of EIPS

Basis of EIPS Protocol

All nodes in a domain are configured with the same EIPS domain ID;

Master ring protocol packets are broadcast in the master control VLAN; sub ring protocol packets are broadcast in the sub control VLAN;

The EIPS ports of master ring nodes are added to both the master control VLAN and the sub control VLAN; the EIPS ports on the sub ring are added only to the sub control VLAN;

Within the master ring, sub ring protocol packets are treated like data packets and are blocked or forwarded together with them.

Polling Mechanism

The polling mechanism is how the master node of an EIPS ring actively checks the health of the ring. The master node periodically sends HEALTH packets from both of its ports at the same time, and the transmission nodes pass them around the ring in turn. If the master node receives its own HEALTH packets on either port, the ring link is complete and the standby port stays blocked, as shown in Figure 15-1. If neither port receives HEALTH packets within the specified time, the ring link is considered failed: the master node opens the standby port and sends COMM-FLUSH-FDB packets from both ports, as shown in Figure 15-2. When the master node in the Failed state again receives its own HEALTH packets on the standby port, it first enters the PRE-UP state. After a period of time, it moves to the Complete state, blocks the standby port, flushes its FDB, and sends COMP-FLUSH-FDB packets from the master port to tell all transmission nodes to open their temporarily blocked ports and flush their FDBs.

The master node sends HEALTH packets from both ports at the same time for two reasons:

If the ring contains a uni-directional link and HEALTH packets were sent from only one port, the master node might not receive them; it would then open the standby port, and the uni-directional link would create a loop;

When the standby master node function is enabled and a link in the ring goes down, the standby master node may be on the side of the port that does not send HEALTH packets; it would then receive no HEALTH packets, and the standby master node function could not take effect.



Figure 15-1 The running when the uni-ring is in the non-fault state

Figure 15-2 The master node cannot receive the HEALTH packets

Mechanism of Notifying Link Status Change

The link status change notification mechanism handles ring topology changes faster than the Polling mechanism. It is initiated by a transmission node, which continuously monitors the link status of its ports. When a port's status changes, the transmission node sends a packet to notify the master node, and the master node decides how to handle the change. If a port goes down, the transmission node sends a LINK-DOWN packet, as shown in Figure 15-3. On receiving the LINK-DOWN packet, the master node enters the Failed state and sends COMM-FLUSH-FDB packets from both ports to the transmission nodes on the ring.
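The notification path above can be sketched as two small objects exchanging a message. This Python sketch is an illustration under assumptions: the class names, the port names `port1`/`port2`, and the tuple message format are hypothetical. Opening the assistant port on LINK-DOWN follows the "Loop Fault Status" description later in this chapter.

```python
from dataclasses import dataclass, field


@dataclass
class TransitNode:
    """Hypothetical transmission node that watches the links of its two ring ports."""
    name: str
    link_up: dict = field(default_factory=lambda: {"port1": True, "port2": True})

    def on_link_change(self, port, is_up):
        """Record a link change; return a LINK-DOWN notification if the port just went down."""
        went_down = self.link_up[port] and not is_up
        self.link_up[port] = is_up
        if went_down:
            other = "port2" if port == "port1" else "port1"
            # The notification leaves through the port that is still healthy.
            return ("LINK-DOWN", self.name, other)
        return None


@dataclass
class MasterNode:
    """Hypothetical master node reacting to LINK-DOWN instead of waiting for the FAIL timer."""
    state: str = "Complete"
    assistant_blocked: bool = True
    sent: list = field(default_factory=list)

    def on_link_down(self, reporter):
        self.state = "Failed"
        self.assistant_blocked = False   # open the assistant port at once
        self.sent.append(("COMM-FLUSH-FDB", ("main", "assistant"), reporter))
```

Because the transmission node reports the fault as soon as its own port goes down, the master node can switch without waiting for a full FAIL timer period of missing HEALTH packets.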



Figure 15-3 Transmission node detects that the physical line is down

Sub Ring Protocol Packet Detection Mechanism

The edge node sends EDGE-HEALTH packets to the assistant edge node in two directions, through the two ports of the associated transmission node, to detect faults on the master ring links, as shown in Figure 15-4. If the assistant edge node receives no EDGE-HEALTH packet, the master ring is broken in at least two places. The assistant edge node then switches to the MAJOR-FAULT state and sends MAJOR-FAULT packets to the edge node through its edge port. On receiving a MAJOR-FAULT packet, the edge node's state machine switches to the MAJOR-FAULT state and blocks the edge port, preventing a loop in the dual-homed topology, as shown in Figure 15-5.

When the edge node and the assistant edge node receive a COMP-FLUSH-FDB packet of the sub ring, they enter the LINK-UP state unconditionally. When the assistant edge node receives an EDGE-HEALTH packet, its state machine enters the LINK-UP state.

To keep the edge node's state from becoming wrong when MAJOR-FAULT and COMP-FLUSH-FDB packets arrive out of order, the assistant edge node sends a MAJOR-RESUME packet to the edge node whenever it enters the LINK-UP state. On receiving this packet, the edge node enters the LINK-UP state.
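The interplay between the assistant edge node and the edge node described above can be sketched as follows. This is a minimal Python model with hypothetical names and an assumed EDGE-HEALTH timeout; the manual does not say when the edge node unblocks its edge port, so unblocking on return to LINK-UP is an assumption marked in the code.

```python
class AssistantEdgeNode:
    """Hypothetical model; the EDGE-HEALTH timeout value is an assumption."""

    def __init__(self, edge_health_timeout_ms=3000):
        self.timeout_ms = edge_health_timeout_ms
        self.state = "LINK-UP"
        self.last_edge_health_ms = 0

    def _enter_link_up(self):
        """Enter LINK-UP; report MAJOR-RESUME to the edge node if the state changed."""
        if self.state == "LINK-UP":
            return None
        self.state = "LINK-UP"
        return "MAJOR-RESUME"

    def on_edge_health(self, now_ms):
        self.last_edge_health_ms = now_ms
        return self._enter_link_up()

    def on_sub_ring_comp_flush_fdb(self):
        # A sub ring COMP-FLUSH-FDB moves the node to LINK-UP unconditionally.
        return self._enter_link_up()

    def tick(self, now_ms):
        """No EDGE-HEALTH from either direction: the master ring is broken in two places."""
        if (self.state != "MAJOR-FAULT"
                and now_ms - self.last_edge_health_ms > self.timeout_ms):
            self.state = "MAJOR-FAULT"
            return "MAJOR-FAULT"   # sent to the edge node through the edge port
        return None


class EdgeNode:
    """Hypothetical edge node reacting to packets from the assistant edge node."""

    def __init__(self):
        self.state = "LINK-UP"
        self.edge_port_blocked = False

    def on_packet(self, packet_type):
        if packet_type == "MAJOR-FAULT":
            self.state = "MAJOR-FAULT"
            self.edge_port_blocked = True    # avoid a loop in the dual-homed sub ring
        elif packet_type in ("MAJOR-RESUME", "COMP-FLUSH-FDB"):
            self.state = "LINK-UP"
            self.edge_port_blocked = False   # assumption: not specified in the manual
```

Sending MAJOR-RESUME only on an actual transition keeps the edge node aligned with the assistant edge node even if a MAJOR-FAULT packet and a COMP-FLUSH-FDB packet reach it in the wrong order.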



Figure 15-4 The sub ring detecting the master ring link

Figure 15-5 The sub ring detects the master ring link fault



EIPS Typical Application

Uni-ring Networking Application

Figure 15-6 EIPS uni-ring networking

As shown in Figure 15-6, the network topology contains only one ring, so only one EIPS domain and one EIPS ring need to be defined. This networking responds quickly to topology changes and converges in a short time, which suits networks that contain a single ring.

Sub Ring Application

Figure 15-7 Typical network of the EIPS sub ring

As shown in Figure 15-7, the network topology contains two or more rings, and the rings share two common nodes. Only one EIPS domain needs to be defined: select one ring as the master ring and the others as sub rings. In the typical application, the master node of the sub ring reaches the upstream network through either of the two edge nodes, which provides upstream link backup.



Hierarchical EIPS

Main contents:

Basic concepts and abbreviations of EIPS

Basic network topology of EIPS

Ports and protocol packets on the ring

EIPS protocol mechanism

Basic Concepts and Abbreviations

Basic Concepts

Ethernet Ring: A group of Ethernet switch nodes interconnected in a ring.

Master Node (master, M for short): The main decision-making and control node on a ring in a domain. A single ring has only one master node. The two ring ports of the master node are the master port and the assistant port. When the links of the domain controlled by the master node are complete, the assistant port blocks data forwarding to prevent a loop. When a link on the ring fails and the port on the faulty link is not the assistant port of the master node, the master node enables forwarding on the assistant port.

Transmission node (transit, T for short): A node that forwards data and cooperates with the master node to protect the ring in a domain. It has two ports on the ring. When it detects that the link on one of its ports has failed, the transmission node notifies the master node of the domain; it then updates its port address forwarding table and enables the port according to the control packets it receives. On a single ring, every node other than the master node is a transmission node.

Topology level (level): The division of the loops protected by one EIPS domain. The loop protected in a domain consists of one ring or several intersecting rings. When the domain contains only one ring, that ring is the major-level ring, with level 0. When the domain contains several intersecting rings, one of them is chosen as the major-level ring, with level 0. A ring connected to the major-level ring becomes, after the part it shares with the major-level ring is removed, a low-level link attached to the major-level ring. Likewise, a ring connected to a low-level link becomes, after the part it shares with that link is removed, a link one level lower. The lower the level, the higher the level number; the major-level ring is the highest level and has level number 0. The major-level ring is a complete ring, whereas a low-level link is an incomplete ring: the set of links that remain after removing the links shared with the upper level it attaches to.

Topology segment (segment): The division of the different low-level links at the same level in a domain. A domain can contain several low-level links at the same level, and each of them is given a different segment number; the segment number of the major-level ring is 0. After levels and segments are assigned, the ring and each low-level link have a level number and segment number that together are unique in the whole domain, called the level segment. A low-level link whose level number and segment number have been assigned is called a low-level segment link. A low-level segment link connects to the major-level ring, or runs between the two edge ports of an upper-level segment link.

Edge-control node (edge-control, E-C for short): The main decision-making and control node of a low-level segment link in the domain. The edge control node has one port in the level segment. The ports through which a low-level segment link connects to the upper-level segment belong to the low-level segment link. If an access node of the upper-level segment controls its edge port to protect the low-level segment link, that access node is called the edge control node; it belongs to the low-level segment link, not to the upper-level segment it accesses. The edge control node works much like a master node for the lower level. When the links of the controlled level segment are complete, the edge port blocks forwarding of the protected service VLAN, preventing a closed loop in the domain. When a link of the controlled level segment fails and the port on the faulty link is not the edge port, the edge port enables forwarding of the protected service VLAN so that protected service traffic can pass through it. Of the two nodes that connect the low-level segment link to the upper-level segment, one is selected as the edge control node and is responsible for controlling the level segment link.

Edge assistant node (edge-assistant, E-A for short): A node of a low-level segment link that forwards data and cooperates with the decision node to protect the ring in the domain. The edge assistant node has one port in the level segment. Of the access nodes that connect the low-level segment link to the upper-level segment, the one that is not the edge control node is the edge assistant node; it belongs only to the low-level segment link it accesses, not to the upper-level segment. The edge assistant node is responsible for returning the loop status detection packets sent by the edge control node on the level segment link. When it finds that the level segment link has failed, the edge assistant node acts as the decision node and sends the link fault notification packet.



Edge node: The intersection point of two rings. It is associated with several levels and has at least three ports in a domain. It is a compound role: an edge node can play a different role at each level. In the low-level segment link it accesses, it can be the edge control node or the edge assistant node; in the upper-level segment it accesses, it can be the master node or a transmission node.

Control VLAN: To confine EIPS protocol packets to the EIPS domain, a dedicated VLAN carries them. The EIPS control VLAN cannot be configured with an L3 interface.
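The level and segment rules in the definitions above can be sketched as a breadth-first walk outward from the major-level ring. This is a minimal Python sketch under stated assumptions: each low-level link is modeled as hanging off exactly one upper link, the link names are hypothetical, and segment numbers are handed out from 1 within each level in walk order. That numbering matches the (level 1, segment 1) example in this chapter but is not a documented rule.

```python
from collections import deque


def assign_level_segments(major, attached_to):
    """Assign (level, segment) pairs.

    major:       name of the major-level ring, which gets (0, 0).
    attached_to: mapping of each low-level link to the upper link it attaches to.
    """
    children = {}
    for link, upper in attached_to.items():
        children.setdefault(upper, []).append(link)

    result = {major: (0, 0)}
    next_segment = {}          # per-level segment counter (assumed to start at 1)
    queue = deque([major])
    while queue:
        upper = queue.popleft()
        level = result[upper][0] + 1       # one level lower means a larger number
        for link in sorted(children.get(upper, [])):
            next_segment[level] = next_segment.get(level, 0) + 1
            result[link] = (level, next_segment[level])
            queue.append(link)
    return result
```

For example, two links hanging off the major ring get (1, 1) and (1, 2), and a link hanging off the first of them gets (2, 1), so every link has a level segment that is unique in the domain.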

Abbreviations

ERP: Ethernet Ring Protection

EIPS: Ethernet Intelligent Protection Switching

MAC: Media Access Control

FDB: Forwarding Database

VLAN: Virtual Local Area Network

STP: Spanning Tree Protocol

MSTP: Multiple Spanning Tree Protocol

Basic Network Topology of EIPS

Uni-ring Topology

When the domain contains a single ring, that ring is defined as the major-level ring, with level 0 and segment 0, as shown in Figure 15-8.



Figure 15-8 EIPS uni-ring

In Figure 15-8, the nodes T1, T2, T3, and M form the major-level ring (level 0, segment 0); M is the master node, and T1, T2, and T3 are transmission nodes. When the major-level ring has no fault, EIPS blocks service traffic on the master node's secondary port S.

Intersecting Ring Topology

When the domain contains several physical rings that intersect, decompose them into a hierarchical structure made up of one major-level ring and several low-level segment links. The major-level ring has level 0 and segment 0. Each low-level segment link is assigned a level number and a segment number; the lower the level, the higher the level number.

Figure 15-9 EIPS intersecting rings

In Figure 15-9, one of the intersecting rings is chosen as the major-level ring and the other rings become low-level segment links. The nodes T1, T2, T3, T4, and M form the major-level ring; M is the master node, and T1, T2, T3, and T4 are transmission nodes. The remaining links are divided into levels and segments: (level 1, segment 1) includes the nodes T1, T2, T3, and T4. Here, T2 is the edge control node, T1 and T2 are transmission nodes, and T3 is the edge assistant node. When the (level 1, segment 1) link has no fault, T2 blocks its edge port connected to (level 1, segment 1). The major-level ring is a single complete ring, while each low-level segment link is a single link. The larger the level number, the lower the level.



Node Roles

Master Node:

The major-level ring of a domain has one master node, which is the master node of the major-level ring. The master node actively detects the status of the major-level ring and decides what action to take when the topology of the major-level ring changes.

The master node periodically sends HEALTH packets from both ports, and the transmission nodes on the ring relay them. If the master node receives its own HEALTH packets, the major-level ring link is complete; if neither port receives a HEALTH packet within the specified time, the master node considers the ring link failed.

The master node has the following four states:

Complete State

The major-level ring is stable and has no broken link. The master node blocks service forwarding of the protected VLAN on the assistant port so as to prevent a network storm caused by the loop. Meanwhile, the master node periodically sends HEALTH packets, which the transmission nodes relay around the intact ring back to the master node's port.

Failed State

When a link of the major-level ring goes down, the master node enters the Failed State on learning of the link-down event. If the port on the faulty link is not the assistant port, the assistant port enables data forwarding for the protected VLAN. Because the topology of the major-level ring has changed, the master node sends COMM-FLUSH-FDB control messages from the main port and the assistant port to tell all other nodes of the level segment to clear their address entries for the protected VLAN.

Init State

When the master node starts up, the link status of the ring is unknown, so the master node stays in Init State until the actual status of the ring is detected.

PRE-UP State



To prevent a flapping fault point from switching the loop status frequently and interrupting service data, the master node waits for a period before moving from the Failed State to the Complete State. During this wait, the master node is in the PRE-UP State.

Transmission Node:

The transmission node monitors the status of the links on its directly connected ring. When a link fails, it sends a LINK-DOWN packet to notify the control node of the level segment, which then decides how to handle the failure. When the transmission node receives a COMP-FLUSH-FDB or COMM-FLUSH-FDB packet from the control node, it updates the FDB entries related to the protected service VLAN.

The transmission node has the following four states:

Complete State:

When receiving the COMP-FLUSH-FDB packet of the level segment, enter

the Complete State.

Failed State:

When receiving the COMM-FLUSH-FDB packet of the level segment, enter

the Failed State.

Init State:

When the transmission node starts up, the link status of the ring is unknown, so it sets its status to Init State and sends an ASK packet to query the control node of the level segment.

Pre-forwarding:

This state occurs at the moment a link recovers. In this state, the port that was down is up again: the EIPS control VLAN is enabled and can forward EIPS protocol packets, but the service VLAN is still blocked. After the ring enters the Complete state and the transmission node receives the COMP-FLUSH-FDB packet from the control node, the node enables forwarding for the service VLAN and moves to the Complete state. If the transmission node does not receive the COMP-FLUSH-FDB packet within the specified time, it moves to the Complete state automatically.
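The four transmission node states above can be sketched as a small Python model. It is illustrative only: the class, method, and attribute names are hypothetical, the Pre-forwarding timeout value stands in for the unspecified "specified time", and FDB updates are left out so the state transitions stay visible. The `service_vlan_forwarding` flag tracks only the port that went down and came back, and starting it as blocked in Init is an assumption.

```python
from enum import Enum


class TransitState(Enum):
    INIT = "Init"
    COMPLETE = "Complete"
    FAILED = "Failed"
    PRE_FORWARDING = "Pre-forwarding"


class TransitNodeStates:
    """Hypothetical model of the four transmission node states (times in ms)."""

    def __init__(self, pre_forward_timeout_ms=3000):
        self.timeout_ms = pre_forward_timeout_ms   # assumed "specified time"
        self.state = TransitState.INIT
        self.sent = ["ASK-RING-STATE"]             # Init: query the control node
        # Whether the service VLAN passes the port that went down and came back.
        self.service_vlan_forwarding = False       # assumption for Init
        self.pre_forward_since_ms = None

    def on_comp_flush_fdb(self):
        self.state = TransitState.COMPLETE
        self.service_vlan_forwarding = True

    def on_comm_flush_fdb(self):
        self.state = TransitState.FAILED

    def on_port_recovered(self, now_ms):
        # The control VLAN carries EIPS packets again; the service VLAN stays blocked.
        self.state = TransitState.PRE_FORWARDING
        self.service_vlan_forwarding = False
        self.pre_forward_since_ms = now_ms

    def tick(self, now_ms):
        # No COMP-FLUSH-FDB in time: move to Complete on our own.
        if (self.state is TransitState.PRE_FORWARDING
                and now_ms - self.pre_forward_since_ms >= self.timeout_ms):
            self.state = TransitState.COMPLETE
            self.service_vlan_forwarding = True
```

The timeout branch in `tick` is the safety net for a lost COMP-FLUSH-FDB packet: without it, a recovered port could stay in Pre-forwarding indefinitely.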

Edge Control Node:

The edge control node is a control node that has only one port on the low-level segment link; the level segment link has no master node.

The edge control node periodically sends HEALTH packets into the level segment link from its access port and receives them back when the link is complete. Like the master node, the edge control node has the following four states:

Complete State

The level segment link is stable and has no broken link. The edge control node blocks service forwarding of the protected VLAN on the access port so as to prevent a network storm caused by the loop. Meanwhile, the access port periodically sends HEALTH packets, which the nodes of the low-level segment link relay around the intact loop back to the access port of the edge control node.

Failed State

The edge control node enters the Failed State when its access port does not receive the returned HEALTH packets within the specified time, or when it receives a link-down event for the level segment link. If the port on the faulty link is not the access port of the edge control node, the node enables data forwarding for the protected service VLAN on the access port. Because the topology of the level segment link has changed, the edge control node sends a COMM-FLUSH-FDB control message to tell the other nodes on the level segment link and the related nodes of the upper level to clear their FDB entries for the protected VLAN.

Init State

When the edge control node starts up, the link status of the level segment link is unknown, so the node stays in Init State until the actual status of the loop is detected.

PRE-UP State

To prevent a flapping fault point from switching the loop status frequently and interrupting service data, the edge control node waits for a period before moving from the Failed State to the Complete State. During this wait, the edge control node is in the PRE-UP State.

Edge Assistant Node:

The edge assistant node is the non-control node that has only one port on the low-level segment link. When it receives a HEALTH packet sent by the control node of the level segment link, it returns the packet to the control node through the receiving port, cooperating with the control node to detect the status of the level segment link. If the edge assistant node does not receive a HEALTH packet within the specified time, or receives a LINK-DOWN packet of the level segment link, it considers the link between itself and the control node failed. The edge assistant node also monitors the links on its directly connected ring; when a link fails, it sends a LINK-DOWN packet to notify the control node of the level segment. When the edge assistant node finds that the link between itself and the control node of the level segment link has failed, it acts as a temporary control node and sends COMM-FLUSH-FDB packets to tell the other nodes at that level and the upper-level nodes to update their FDB entries for the protected service VLAN.

Port and Protocol Packets on Ring

Main Port and Assistant Port

The master node and each transmission node connect to the ring link through two ports: the main port and the assistant port. Port roles are set by user configuration. The main port and the assistant port of the master node have different functions, while those of a transmission node are functionally identical.

The master node of the main ring sends HEALTH packets from both ports. If at least one port receives the packets sent from the other port, the main ring is complete, and the master node blocks data forwarding for the protected service VLAN on the assistant port. Conversely, if no HEALTH packet arrives within the specified time, or a LINK-DOWN packet of the main ring is received, the major-level ring has failed. If the port on the faulty link is not the assistant port, the master node enables forwarding of the protected service VLAN on the assistant port so that all nodes on the ring can communicate normally. In addition, the master node of the main ring receives address update packets from other low-level segment links but does not forward them.

The main port and the assistant port of a transmission node do not differ in function; their roles are also set by user configuration.

Edge Port

The edge node has only one port connected to a given level segment link; that port is the edge port. When the edge port receives an address refresh message (COMP-FLUSH-FDB or COMM-FLUSH-FDB) and the upper level has not yet learned of the status change of the level segment link that sent the message, the edge node forwards the packet to the upper level and updates the port's FDB entries related to the protected service VLAN.



Data Forwarding Function of Port

The data forwarding function of a node port (main port, assistant port, or access port) has two states:

Block: the port is blocked, and data cannot be forwarded through it;

Forward: the port is enabled, and data can be forwarded through it;

For example, when the links on the main ring are normal, the master node of the main ring blocks the assistant port so that data in the protected service VLAN cannot pass through it, preventing a loop. When a link on the main ring fails and the port on the faulty link is not the assistant port of the master node, the master node enables the assistant port, allowing data in the protected service VLAN to pass through it and restoring service communication.
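The effect of the port states on the two kinds of traffic can be sketched as one decision function. This Python sketch combines the Block/Forward states above with two rules stated elsewhere in this chapter: the control VLAN passes a blocked assistant port and a Pre-Forwarding port, and protocol packets do not cross a faulty link. The function name is hypothetical, and treating every VLAN other than the control VLAN as a protected service VLAN is an assumption.

```python
def may_forward(port_state: str, link_up: bool, vlan: int, control_vlan: int) -> bool:
    """Decide whether a frame in `vlan` may leave through a ring port.

    port_state is "Block", "Pre-Forwarding" or "Forward". Every VLAN other than
    the control VLAN is treated as a protected service VLAN (an assumption).
    """
    if not link_up:
        return False              # nothing crosses a failed link, not even EIPS packets
    if vlan == control_vlan:
        return True               # EIPS protocol packets pass Block and Pre-Forwarding ports
    return port_state == "Forward"
```

Separating the control VLAN from the protected service VLAN is what lets a blocked assistant port still receive its own HEALTH packets and detect that the ring is complete.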

Format of EIPS Protocol Packet

The format of the Ethernet ring protection protocol packet is as follows:

Table 15-4 (each row is 48 bits wide; the columns cover bits 0-15, 16-31, and 32-47)

Destination MAC address (6 bytes)

Source MAC address (6 bytes)

Type (EtherType, TPID) | PRI + CFI + VLAN ID | Packet length (Frame Length)

DSAP/SSAP | CONTROL | OUI = 0x00E02B

0x00BB | 0x99 0x0B | ERP_LENGTH

ERP_VER | ERP_TYPE | CTRL_VLAN_ID | LEVEL_ID | SEG_ID

0x0000 | SYSTEM_MAC_ADDR (high 4 bytes)

SYSTEM_MAC_ADDR (low 2 bytes) | HEALTH_TIMER | FAIL_TIMER

STATE | 0x00 | HEALTH_SEQ | 0x0000

RESERVED (0x000000000000)

RESERVED (0x000000000000)

RESERVED (0x000000000000)

RESERVED (0x000000000000)

RESERVED (0x000000000000)

RESERVED (0x000000000000)

The packet format is described as follows:



Destination MAC address: 48 bits, described as follows:

Destination MAC Description

0180.c200.0035 The destination MAC of the HEALTH, LINK-DOWN, or ASK-RING-STATE packet; a transmission node forwards the packet out of its other ring port without sending it to its CPU for processing.

00E0.2B00.0004 The destination MAC of the COMM-FLUSH-FDB/COMP-FLUSH-FDB packet; a transmission node sends the packet to its CPU for processing and also forwards it out of its other ring port.

0001.7A4F.4AB6 The topology request packet

0001.7A4F.4AB4 Uni-directional detection packet

0001.7A4F.4AB5 The HELLO1 packet sent when the standby master node does not receive the HELLO packet within the specified time

Source MAC address: 48 bits, the MAC address of the sending node;

TPID: 16 bits, fixed as 0x8100;

PRI+CFI: 4 bits; the priority is configurable (7 is recommended by default), and CFI is 0 for standard-format frames;

VLAN ID: 12 bits, not fixed;

Frame Length: 16 bits, the length field of the Ethernet frame, fixed as 0x48 (72 bytes, counted from DSAP/SSAP to the end of the packet in Table 15-4);

DSAP/SSAP: 16 bits, fixed as 0xAAAA;

CONTROL: 8 bits, fixed as 0x03;

OUI: 24 bits, fixed as 0x00E02B;

ERP_LENGTH: 16 bits, fixed as 0x40;

ERP_VERS: 16 bits, fixed as 0x0001;

ERP_TYPE: 16 bits, the packet type;

CTRL_VLAN_ID: 16 bits, the ID of the control VLAN;

LEVEL_ID: 8 bits, the level number of the segment link: 0 for the major-level ring and greater than 0 for low-level links;

SEG_ID: 8 bits, the ID of the segment link; 0 for the major-level ring;

SYSTEM_MAC_ADDR: 48 bits, the MAC address of the sending node;

HEALTH_TIMER: 16 bits, the interval at which the master node or edge control node sends HEALTH frames (in ms);

FAIL_TIMER: 16 bits, the timeout set by the master node or edge control node for not receiving HEALTH frames (in ms);

STATE: 8 bits, the node state;

HEALTH_SEQ: 16 bits, the sequence number of the HEALTH frame, generated by the master node;

ERP_TYPE: the packet type, defined as follows:

HEALTH=5, the link health detection HEALTH packet; the

destination MAC address of the packet is 0x0180c2000035; the

protocol packet does not need to be transmitted to the CPU of the

transmission node;

COMP-FLUSH-FDB=6, the COMP-FLUSH-FDB packet of informing

that the link is complete; the destination MAC address of the

packet is 0x00E02B000004; the protocol packet needs to be

transmitted to the CPU of the transmission node;

COMM-FLUSH-FDB=7, the COMM-FLUSH-FDB packet of informing

that the link fails; the destination MAC address of the packet is

0x00E02B000004; the protocol packet needs to be transmitted to

the CPU of the transmission node;

LINK-DOWN=8, the link fault alarm LINK-DOWN packet; the destination MAC address of the packet is 0x0180c2000035; the protocol packet does not need to be transmitted to the CPU of the transmission node;

ASK-RING-STATE=9, the link status query ASK packet, sent by a transmission node or assistant edge node during initialization to ask the master node for the current loop status; the destination MAC address of the packet is 0x0180c2000035; the protocol packet does not need to be transmitted to the CPU of the transmission node;

LINK-HELLO=14, the uni-directional detection packet;

TOPOLOGY=15, the topology collection packet, including the topology request and topology response packets;

Other values are reserved;

The definition of the STATE value:

IDLE=0

COMPLETE=1

FAILED=2

LINK-UP=3

LINK-DOWN=4

PRE-FORWARDING=5

The other values are reserved.
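The packet layout can be checked by building one frame byte by byte. This Python sketch uses the standard `struct` module and follows the row layout of Table 15-4. Two points are assumptions and are marked in the code: ERP_VER and ERP_TYPE are packed as 8-bit fields, because that is the only way five fields fit one 48-bit row of the table (the field list gives 16 bits for each, so treat this as unconfirmed), and the 802.1Q tag is assumed to carry the control VLAN. The function names are hypothetical.

```python
import struct

# Fixed values taken from the field list above.
TPID = 0x8100
FRAME_LENGTH = 0x48      # bytes from DSAP/SSAP to the end of the packet
DSAP_SSAP = 0xAAAA
CONTROL = 0x03
OUI = bytes.fromhex("00E02B")
ERP_LENGTH = 0x40
ERP_VER = 0x01           # assumption: 8 bits wide, as in the Table 15-4 row layout

HEALTH, COMP_FLUSH_FDB, COMM_FLUSH_FDB, LINK_DOWN, ASK_RING_STATE = 5, 6, 7, 8, 9
DST_NOT_TO_CPU = bytes.fromhex("0180C2000035")   # HEALTH, LINK-DOWN, ASK-RING-STATE
DST_TO_CPU = bytes.fromhex("00E02B000004")       # COMP-FLUSH-FDB, COMM-FLUSH-FDB


def build_eips_frame(erp_type, src_mac, system_mac, ctrl_vlan, level, seg,
                     health_timer_ms, fail_timer_ms, state, health_seq, priority=7):
    """Build one EIPS frame laid out like Table 15-4 (90 bytes in total)."""
    dst = DST_TO_CPU if erp_type in (COMP_FLUSH_FDB, COMM_FLUSH_FDB) else DST_NOT_TO_CPU
    # 802.1Q tag control: PRI (3 bits), CFI = 0, VLAN ID (12 bits).
    # Assumption: the tag carries the control VLAN.
    tci = (priority << 13) | (ctrl_vlan & 0x0FFF)
    frame = dst + src_mac + struct.pack("!HHH", TPID, tci, FRAME_LENGTH)
    frame += struct.pack("!HB", DSAP_SSAP, CONTROL) + OUI
    frame += struct.pack("!HBBH", 0x00BB, 0x99, 0x0B, ERP_LENGTH)
    frame += struct.pack("!BBHBB", ERP_VER, erp_type, ctrl_vlan, level, seg)
    frame += struct.pack("!H", 0x0000) + system_mac
    frame += struct.pack("!HH", health_timer_ms, fail_timer_ms)
    frame += struct.pack("!BBHH", state, 0x00, health_seq, 0x0000)
    frame += bytes(36)   # six 48-bit RESERVED rows
    return frame


def parse_eips_fields(frame):
    """Read back the ring-specific fields (offsets follow the layout above)."""
    _ver, erp_type, ctrl_vlan, level, seg = struct.unpack_from("!BBHBB", frame, 30)
    health_timer, fail_timer = struct.unpack_from("!HH", frame, 44)
    state, _pad, health_seq = struct.unpack_from("!BBH", frame, 48)
    return {"type": erp_type, "ctrl_vlan": ctrl_vlan, "level": level, "seg": seg,
            "health_timer": health_timer, "fail_timer": fail_timer,
            "state": state, "health_seq": health_seq}
```

With this layout the 18-byte tagged Ethernet header is followed by exactly 0x48 (72) bytes, which matches the fixed Frame Length value.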



EIPS Protocol Mechanism

Uni-ring Running Mechanism

A uni-ring is a single major-level ring. The nodes on the major-level ring detect and protect its links, ensuring that protected service VLAN traffic between any two nodes on the ring has at most one connected logical path, and that the Ethernet control packets of the major level are transmitted only within the major-level ring.

Non-fault Status

When the links and nodes on the uni-ring have no fault, the master node periodically sends HEALTH packets from the main port; they travel through the transmission nodes and links on the ring to the assistant port of the master node. The master node blocks forwarding of the protected VLAN on the assistant port so that data in the protected VLAN cannot pass through it, preventing a loop. The control VLAN is not blocked, so EIPS protocol packets can still pass through the blocked assistant port of the master node.

As shown in Figure 15-10, the master node M periodically sends the

HEALTH packets; because the loop is not faulty, the HEALTH packet

reaches the assistant port of the master node; the master node blocks the

data forwarding function of the protect VLAN of the assistant port,

avoiding the loop.

Figure 15-10 The non-fault status of the uni-ring



Loop Fault Status

When a link on the ring fails, the neighbor nodes of the faulty link detect the fault and block data forwarding on the ports attached to it. To keep protocol packets from crossing a link that has failed in only one direction, protocol packets are also blocked on the ports of the faulty link. If the node that detects the fault is a transmission node, it sends a LINK-DOWN packet from its other, non-faulty port. When the master node receives the LINK-DOWN packet, it considers the ring failed, as shown in Figure 15-11. In case the LINK-DOWN packet is lost, the master node has a backup detection mechanism: if it does not receive a HEALTH packet within the specified time, it considers the ring failed. As soon as the master node detects the link failure, it enables data forwarding on the assistant port.

Figure 15-11 Transmission node detects the link fault

If the failure is on the master node itself, it is handled differently. If the main port fails, the master node blocks the main port and enables data forwarding on the assistant port; if the assistant port fails, the assistant port stays blocked.
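The master node's reactions in the two paragraphs above can be summarized as a lookup from event to port states. This Python sketch uses hypothetical event names; that the main port keeps forwarding when the assistant port fails or when the fault is elsewhere on the ring is an assumption consistent with, but not stated in, the text.

```python
def master_reaction(event):
    """Data-forwarding state of the master node's ports after `event` (hypothetical names)."""
    if event in ("link-down-packet", "health-timeout"):
        # Fault elsewhere on the ring: open the assistant port at once.
        return {"main": "Forward", "assistant": "Forward"}
    if event == "main-port-down":
        # Traffic must now leave through the assistant port.
        return {"main": "Block", "assistant": "Forward"}
    if event == "assistant-port-down":
        # The failed link is the one the assistant port was already blocking.
        return {"main": "Forward", "assistant": "Block"}
    raise ValueError(f"unknown event: {event}")
```

The asymmetry follows from the design: a failed assistant-port link is already the blocked segment of the ring, so opening it would gain nothing.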

Fault Recovery

After a link fault on the ring clears, the neighbor nodes of that link detect that the fault on their ports has disappeared. They set the port on the recovered link to forward ring control packets, so the port can forward EIPS protocol packets again. The port enters the Pre-Forwarding status but still cannot forward packets of the protected VLAN.

While the link is down, the master node keeps sending HEALTH packets periodically from the main port. After the fault clears, the master node considers the link recovered when its assistant port receives a HEALTH packet. To guard against link flapping, it enters the PRE-UP state, starts the PRE-UP timer, and enables the data VLAN. When the PRE-UP timer expires, the master node enters the COMPLETE state, blocks protected VLAN data forwarding on the assistant port again, and sends a COMP-FLUSH-FDB packet from the main port. At the same time, the master node updates the FDB address table of its ports. When the transmission nodes on the ring receive the COMP-FLUSH-FDB packet, they update the FDB tables of their ports, set the two ports adjacent to the previously faulty link to the Forward state, and enable protected VLAN data forwarding on those ports.

To prevent the COMP-FLUSH-FDB packet from being lost, set the Pre-

Forwarding port as Forward and enable the protect VLAN data forwarding

function of the port when the neighboring node of the link on which the

fault disappears does not receive the COMP-FLUSH-FDB packet within the specified time, so that the data of the protect VLAN is transmitted according to the topology. To prevent a transmission node that receives two COMP-FLUSH-FDB packets from updating its port FDB addresses twice, the transmission node records the current loop status as Complete State when it receives a COMP-FLUSH-FDB packet. If the recorded loop status is already Complete State, the node does not process further COMP-FLUSH-FDB packets, which avoids repeated updates of the port FDB table. To keep the status of all transmission nodes on the ring consistent, the master node sends COMP-FLUSH-FDB packets periodically.

As shown in Figure 15-12, after the link fault between nodes T2 and T3 recovers, T2 and T3 detect that the fault on the port link has disappeared and allow the recovered port to forward Ethernet ring protection control packets. The port status is set to Pre-Forwarding, but the port still cannot forward the data of the protect VLAN. If the HEALTH packets sent by the master node from its main port can cross the recovered link and reach its assistant port, the master node regards the loop as recovered and working again, changes its status to PRE-UP, and starts the PRE-UP timer. When the PRE-UP timer expires, the status changes to COMPLETE. As shown in Figure 15-13, the master node then blocks the protect VLAN data forwarding function of its assistant port and sends a COMP-FLUSH-FDB packet to inform the other nodes of the loop recovery and to update their port FDB tables. After the other nodes on the ring receive the COMP-FLUSH-FDB packet, they update their port FDB tables, and the nodes adjacent to the recovered link enable their Pre-Forwarding ports so that the data of the protect VLAN can pass. The loop thereby completes the fault protection switchover.
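The recovery steps above amount to two small state machines. The Python sketch below is illustrative only: the PRE-UP and COMPLETE states come from the text, but the FAILED state name, the class and method names, and the rule that a COMM-FLUSH-FDB packet clears the recorded Complete State are assumptions, not the EIPS implementation.

```python
from enum import Enum


class MasterState(Enum):
    FAILED = "FAILED"        # assumed name for the faulty state
    PRE_UP = "PRE-UP"
    COMPLETE = "COMPLETE"


class MasterNode:
    """Hypothetical model of the master node's recovery steps."""

    def __init__(self) -> None:
        self.state = MasterState.FAILED
        self.assistant_data_blocked = False   # protect-VLAN forwarding on the assistant port
        self.sent: list = []                  # packets sent, recorded for illustration

    def on_health_returned(self) -> None:
        # HEALTH sent from the main port reached the assistant port: loop is back.
        if self.state is MasterState.FAILED:
            self.state = MasterState.PRE_UP   # the PRE-UP timer would start here

    def on_pre_up_timeout(self) -> None:
        if self.state is MasterState.PRE_UP:
            self.state = MasterState.COMPLETE
            self.assistant_data_blocked = True
            self.sent.append("COMP-FLUSH-FDB")


class TransmissionNode:
    """Processes COMP-FLUSH-FDB only once per recovery (Complete State)."""

    def __init__(self) -> None:
        self.complete = False
        self.fdb_flushes = 0

    def on_comp_flush_fdb(self) -> bool:
        if self.complete:
            return False          # already in Complete State: ignore the repeat
        self.complete = True
        self.fdb_flushes += 1     # update the port FDB table once
        return True

    def on_comm_flush_fdb(self) -> None:
        # Assumption: a fault notification clears the recorded Complete State.
        self.complete = False
        self.fdb_flushes += 1
```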



Figure 15-12 Fault recovering

Figure 15-13 Fault recovery is complete

Running Mechanism of Intersecting Rings

After the intersecting rings are divided into a major-level ring and low-level segment links, the major-level ring is a single ring and is protected according to the single-ring protection mechanism. The nodes on a low-level segment link monitor that segment link, ensuring that protect service VLAN data between any two nodes on the low-level ring has at most one connected logical path, and that the HEALTH and LINK-DOWN packets of a low-level segment link are transmitted only within that segment link. When the loop of a low-level segment link switches over, the edge node sends COMP-FLUSH-FDB and COMM-FLUSH-FDB packets to the higher-level nodes, informing them to update their port FDB tables.



Non-fault Status of Low-level Link

The edge control node periodically sends HEALTH packets from its edge port; the packets travel through the transmission nodes and links of the low-level segment link and reach the edge assistant node. When the edge assistant node receives a HEALTH packet and detects that the level and segment in the packet are its local level and segment, it returns the packet out of the receiving port. The edge port of the edge control node therefore receives the HEALTH packet returned by the edge assistant node. The edge control node then blocks the protect VLAN forwarding function of its edge port, so that protect VLAN data cannot pass through the edge port and no loop forms; Ethernet ring protection protocol packets can still pass the blocked edge port.
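A minimal sketch of the HEALTH round trip on a low-level segment link, assuming each HEALTH packet carries a level and a segment number that can be compared as integers. The class and method names are hypothetical, and what the edge assistant node does with a non-matching packet is not described here, so the sketch simply returns `None` for that case.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Health:
    level: int
    segment: int


class EdgeAssistantNode:
    def __init__(self, level: int, segment: int) -> None:
        self.level, self.segment = level, segment

    def on_health(self, pkt: Health, rx_port: str) -> Optional[str]:
        """Port to send the HEALTH packet back out of, or None if it is not returned."""
        if (pkt.level, pkt.segment) == (self.level, self.segment):
            return rx_port        # local level segment: return it via the receiving port
        return None


class EdgeControlNode:
    def __init__(self) -> None:
        self.edge_port_data_blocked = False   # protect-VLAN data on the edge port

    def on_health_returned(self) -> None:
        # Segment link intact: block protect-VLAN data on the edge port to avoid
        # a loop. Ring protocol packets still pass the blocked port.
        self.edge_port_data_blocked = True
```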

Low-level Link Fault

If the edge control node does not receive HEALTH packets within the specified time, it regards the link as failed. All nodes also monitor the link status on the ring. When a node detects that the link of one of its own ports is faulty, it sends a LINK-DOWN packet to the edge control node and the edge assistant node of the level segment link, so that both know the link has failed. To avoid a loop when the link recovers, the two nodes adjacent to the faulty link block the protect service VLAN data forwarding function of the faulty port and prevent EIPS protocol packets from being forwarded through it.

After the edge control node and edge assistant node detect the fault of the level segment link, the edge control node sends COMM-FLUSH-FDB packets from its edge port and from its two ports on the accessed level. If the faulty port is not the edge port of the edge control node, the edge control node also enables the protect service VLAN data forwarding function of its edge port. Likewise, when the edge assistant node detects that the local level segment link has failed, it sends COMM-FLUSH-FDB packets from its edge port and from its two ports on the accessed level.

When a transmission node receives a COMM-FLUSH-FDB packet and its level is higher than or equal to the level of the sending source, it refreshes its port FDB table. When an edge node receives a COMM-FLUSH-FDB packet on its edge port, the level of its edge access port is higher than or equal to the level of the sending source, and the upper-level node does not yet know about the link status change at the sending source's level, the edge node forwards the COMM-FLUSH-FDB packet to the upper level and updates the FDB table of the ports related to the protect service VLAN.
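The level checks above can be written as two predicates. This is a sketch under a stated assumption: levels are modeled as integers in which a larger number means a higher level, and `upper_level_informed` is a hypothetical flag standing for "the upper-level node already knows about the link status change". The function names are illustrative, and the same rules apply to COMP-FLUSH-FDB during recovery.

```python
from dataclasses import dataclass


@dataclass
class FlushAction:
    refresh_fdb: bool   # update the FDB of ports related to the protect service VLAN
    forward_up: bool    # pass the FLUSH-FDB packet on to the upper level


def transmission_node_action(node_level: int, source_level: int) -> FlushAction:
    # Refresh only if this node's level is higher than or equal to the source's.
    return FlushAction(refresh_fdb=node_level >= source_level, forward_up=False)


def edge_node_action(edge_access_level: int, source_level: int,
                     upper_level_informed: bool) -> FlushAction:
    # Received on the edge port: act only if the edge access port's level is
    # >= the source level and the upper level does not yet know the change.
    act = edge_access_level >= source_level and not upper_level_informed
    return FlushAction(refresh_fdb=act, forward_up=act)
```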



Low-level Segment Link Recovery

While the link is faulty, the edge control node keeps sending HEALTH packets periodically from its edge port. When it receives a HEALTH packet returned by the edge assistant node, it regards the low-level link between the edge ports as recovered: it blocks the protect service VLAN data forwarding function of its edge port, sends a COMP-FLUSH-FDB packet from its edge port and from its two ports on the accessed level, and updates the FDB table of the ports related to the protect service VLAN. When a transmission node receives the COMP-FLUSH-FDB packet and its level is higher than or equal to the level of the sending source, it refreshes its port FDB table.

When an edge node receives the COMP-FLUSH-FDB packet on its edge access port, the level of the edge access port is higher than or equal to the level of the sending source, and the upper-level node does not yet know about the link status change at the sending source's level, the edge node forwards the COMP-FLUSH-FDB packet to the upper level and updates the FDB table of the ports related to the protect service VLAN. After the two ports adjacent to the faulty link detect that the link has recovered, EIPS protocol packets are forwarded through the recovered port and the port status is set to Pre-Forwarding. If a COMP-FLUSH-FDB packet for the local level segment is received, the port enables its protect service VLAN data forwarding function; if no COMP-FLUSH-FDB packet is received within the specified time, the port times out and is enabled automatically.
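The Pre-Forwarding behavior of a port next to the recovered link can be sketched as follows. The state names follow the text; the class, the explicit `timeout_s` parameter, and the polling `tick()` call are hypothetical, since the manual gives neither the timer value nor how the timer is driven.

```python
class RecoveredPort:
    """Hypothetical model of a port next to a recovered low-level segment link."""

    def __init__(self, level: int, segment: int, timeout_s: float) -> None:
        self.level, self.segment = level, segment
        self.timeout_s = timeout_s
        self.state = "BLOCKED"
        self.since = 0.0

    def on_link_recovered(self, now: float) -> None:
        # EIPS protocol packets may pass again; protect-VLAN data may not yet.
        self.state = "PRE-FORWARDING"
        self.since = now

    def on_comp_flush_fdb(self, level: int, segment: int) -> None:
        # Only a COMP-FLUSH-FDB of the local level segment enables the port.
        if self.state == "PRE-FORWARDING" and (level, segment) == (self.level, self.segment):
            self.state = "FORWARDING"

    def tick(self, now: float) -> None:
        if self.state == "PRE-FORWARDING" and now - self.since >= self.timeout_s:
            self.state = "FORWARDING"   # no COMP-FLUSH-FDB in time: enable anyway
```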

Extended Functions

Ethernet intelligent protection switching is the basic and main function of EIPS. The following describes several extended functions.

Main contents:

Payload balance function

Topology auto collection function

The networking mode of not sending HELLO command

Uni-directional detection function

Reliability realization



Payload Balance Function

The basic function of EIPS is to prevent network loops by blocking a port. As a result, all user data has only one link to choose, whatever the networking, which easily creates a traffic bottleneck. Making the block granularity an instance on a port, rather than the whole port, solves this problem effectively. The EIPS payload balance function is based on this method, so the control granularity of each EIPS ring must be the instance.

An EIPS node is based on one or more spanning tree instances and performs protection switching on the data of those instances. One physical ring can be configured with multiple EIPS rings, with different rings blocking different ports, to achieve payload balance, as shown in the following figure.

Figure 15-14 EIPS payload balance

The four switches M1, M2, M3, and M4 are interconnected to form one physical ring. Four EIPS rings are configured on the physical ring: the master node of R1 is M1 and its protect instance is inst 1; the master node of R2 is M2 and its protect instance is inst 2; the master node of R3 is M3 and its protect instance is inst 3; the master node of R4 is M4 and its protect instance is inst 4. When the physical ring is complete, EIPS rings R1, R2, R3, and R4 are all complete. M1, the master node of R1, blocks the data of inst 1 at its assistant port S; M2, the master node of R2, blocks the data of inst 2 at its assistant port S; M3, the master node of R3, blocks the data of inst 3 at its assistant port S; and M4, the master node of R4, blocks the data of inst 4 at its assistant port S. The traffic of each instance therefore takes a different link, achieving payload balance.
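The following sketch checks that the four EIPS rings of Figure 15-14 leave every instance with a usable path while spreading the blocked links. It assumes, purely for illustration, that the assistant port S of each master node faces the next switch clockwise (M1 to M2, M2 to M3, and so on); the real port assignment is shown only in the figure, and the function names are hypothetical.

```python
RING = ["M1", "M2", "M3", "M4"]

# EIPS ring -> (master node, protected instance), as in Figure 15-14.
EIPS_RINGS = {"R1": ("M1", 1), "R2": ("M2", 2), "R3": ("M3", 3), "R4": ("M4", 4)}


def ring_links(ring):
    """Physical links of the ring as (switch, next switch) pairs."""
    return [(ring[i], ring[(i + 1) % len(ring)]) for i in range(len(ring))]


def blocked_link(master, ring):
    # Assumption: the master's assistant port S faces the next switch clockwise.
    i = ring.index(master)
    return (master, ring[(i + 1) % len(ring)])


def forwarding_links(ring, eips_rings):
    """For each instance, the links that still forward its data."""
    result = {}
    for master, inst in eips_rings.values():
        blocked = blocked_link(master, ring)
        result[inst] = [link for link in ring_links(ring) if link != blocked]
    return result
```

With this assignment each link is blocked for exactly one instance and forwards the other three, instead of one link carrying no traffic at all, as happens when a single ring blocks a whole port.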



Topology Auto Collection Function

To manage and maintain the nodes on the ring, EIPS provides an L2 topology auto-collection function. Any node with EIPS enabled can see the other nodes in the ring and describe the topology structure.

Basic Theory

Each node on the ring collects the topology independently. When EIPS is enabled on a node, the ports of the node actively send multicast topology request packets. When another node on the same logical ring receives the packet, it adds one to the TTL value, and the receiving port returns a unicast topology response packet to the requester. The response packet contains the basic information of the node, including the node type, the node status, and information about its ports. In addition, a master node or transmission node continues to forward the topology request out of its other port. Every node must reply to a topology request sent by another node. When a node receives a topology response packet, it saves the information and determines the responder's position from the TTL value in the packet. After all nodes have responded, the whole topology structure can be described completely.

Topology collection reflects the current status of the ring topology, that is, whether the ring is complete. The main ring and a sub-ring cannot see each other's topology; they can only see whether an edge node is attached to a transmission node.

An edge node or assistant edge node has only one ring port, so the topology it sees is the topology collected by that port. A master node or transmission node has two ports: when the topology is complete, the topologies collected by the two ports are both complete and consistent; when the topology is incomplete, for example because one link is disconnected, each port collects only part of the topology, and the collected parts must be combined into one complete topology. The topology the user sees on the node is the complete topology obtained by combining what the two ports collected.

Real-time performance is not critical here. Each node sends a topology request every 10 seconds, so a topology change is not reflected immediately; it is discovered by the next collection, up to 10 seconds later. Each collection updates the previous topology with the new response packets. If a node is not updated within 10 seconds, it is regarded as no longer being in the topology range.
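A sketch of how a two-port node might store, age, and combine the topology responses described above. The data layout, the port labels `A` and `B`, and the merge rule (port A's side in hop order followed by port B's side in reverse) are assumptions chosen for illustration; only the hop-count positioning and the 10-second aging come from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

AGING_S = 10.0   # from the text: a node not refreshed within 10 s is dropped


@dataclass
class TopologyView:
    """Hypothetical per-node store of topology responses, one table per ring port."""
    # node name -> (ttl/hop count from this port, time of the last response)
    port_a: Dict[str, Tuple[int, float]] = field(default_factory=dict)
    port_b: Dict[str, Tuple[int, float]] = field(default_factory=dict)

    def on_response(self, port: str, node: str, ttl: int, now: float) -> None:
        table = self.port_a if port == "A" else self.port_b
        table[node] = (ttl, now)

    def age(self, now: float) -> None:
        for table in (self.port_a, self.port_b):
            stale = [n for n, (_, seen) in table.items() if now - seen > AGING_S]
            for node in stale:
                del table[node]

    def combined(self) -> List[str]:
        """Ring order starting next to port A and ending next to port B."""
        a = [n for n, _ in sorted(self.port_a.items(), key=lambda kv: kv[1][0])]
        b = [n for n, _ in sorted(self.port_b.items(), key=lambda kv: kv[1][0])]
        if set(a) == set(b):
            return a                   # complete ring: both ports saw every node
        # Broken ring: port A's side in hop order, then port B's side reversed.
        return a + [n for n in reversed(b) if n not in a]
```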



Topology Request Packet

The topology request packet is as follows:

Figure 15-15 The structure of the topology request packet

The request packet consists of a standard EIPS packet plus a topology information header. In the standard EIPS packet field, the destination MAC address of the Ethernet header is 0001.7A4F.4AB6; any node that receives a packet with this destination address must send it to the CPU. ERP_TYPE in the standard EIPS packet is TOPOLOGY (15).

The meanings of the fields in the topology information header are as follows:

type: one byte; 1 indicates a topology request, 2 indicates a topology response;

ttl: one byte; indicates the position of the node relative to the requesting node; set to 0 in the topology request packet and incremented by one at each node the packet passes;

baseMac: 6 bytes; the MAC address of the device; in the topology request packet it is the device MAC address of the requesting node; in the topology response packet it should be null;

DMAC: 6 bytes; the MAC address of the destination port; in the topology request packet it is all ones (FFFF.FFFF.FFFF); in the topology response packet it is the MAC address of the requesting port;

SMAC: 6 bytes; the MAC address of the source port; in the topology request packet it is the MAC address of the requesting port; in the topology response packet it is the MAC address of the responding port.
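The topology information header can be packed with Python's `struct` module. This sketch assumes the fields appear in the order listed, with no padding and in network byte order; the helper names are hypothetical.

```python
import struct

TOPO_REQUEST, TOPO_RESPONSE = 1, 2
ALL_ONES_MAC = b"\xff" * 6
NULL_MAC = bytes(6)

# type(1) ttl(1) baseMac(6) DMAC(6) SMAC(6) = 20 bytes. The field order and
# the absence of padding are assumptions taken from the order of the list above.
TOPO_HEAD = struct.Struct("!BB6s6s6s")


def build_request_head(device_mac: bytes, request_port_mac: bytes) -> bytes:
    return TOPO_HEAD.pack(TOPO_REQUEST, 0, device_mac, ALL_ONES_MAC, request_port_mac)


def on_request_received(head: bytes) -> bytes:
    """Add one to ttl, as each receiving node does before replying or forwarding."""
    t, ttl, base_mac, dmac, smac = TOPO_HEAD.unpack(head)
    return TOPO_HEAD.pack(t, ttl + 1, base_mac, dmac, smac)


def build_response_head(received_request: bytes, response_port_mac: bytes) -> bytes:
    _t, ttl, _base, _dmac, request_port_mac = TOPO_HEAD.unpack(received_request)
    # Response: type 2, hop count, null baseMac, DMAC = request SMAC, SMAC = own port.
    return TOPO_HEAD.pack(TOPO_RESPONSE, ttl, NULL_MAC, request_port_mac, response_port_mac)
```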

Topology Response Packet

The topology response packet is as follows:



Figure 15-16 The structure of the topology response packet

The topology response packet consists of a standard EIPS packet, a topology information header, and node information.

In the standard EIPS packet field, the destination MAC address is the MAC address of the port from which the topology request initiator sent the request; it is taken from the SMAC field in the information header of the received topology request packet. ERP_TYPE in the standard EIPS packet is TOPOLOGY (15). In the topology information header, type is 2; ttl is the number of hops from the initiator to the responder; DMAC is the destination MAC address, that is, the MAC address of the initiator's requesting port (the value of the SMAC field in the header of the topology request packet); and SMAC is the MAC address of the sending port.

The meanings of the fields in the node information are as follows:

hop: one byte; the number of hops from the responder to the initiator, equal to the TTL value in the packet;

nt: four bits, short for node type; the type of the responding node;

ns: three bits, short for node status; the current status of the responding node;

b: one bit, short for border; whether an edge node is connected: 0 means no, 1 means yes;

bm: four bits, short for backup master; whether the node is a backup master node: 0 means no, 1 means yes;

ar: four bits, short for actor role; valid only for a backup master node: 0 means that the backup master node is not acting as the master node, 1 means that it is acting as the master node;

host name: 32 bytes, the host name of the response node;



base mac: 6 bytes, the device MAC address of the response node;

sys oid: 16 bytes, the system OID of the response node;

r_role: one byte; the port role of the port that receives the request packet;

r_b: four bits, short for r_blockstatus; the BLOCK status, on the node's ring, of the port that receives the request packet: 0 means non-BLOCK, 1 means BLOCK;

r_l: four bits, short for r_linkstatus; the LINK status of the port that receives the request packet: 1 means UP, 2 means DOWN;

r_i: two bytes, short for r_index; the number of the port that receives the request packet;

r_n: 16 bytes, short for r_name; the name of the port that receives the request packet. To save memory, only part of the port name is stored: for an ordinary port, the "port" prefix is omitted, so the name is saved as, for example, "0/0/1" or "0/1"; for an aggregation port, the "linkaggregation" prefix is omitted, so aggregation port 1 is saved as "1" and aggregation port 2 as "2";

r_mac: 6 bytes; the MAC address of the port that receives the request packet;

s_role: one byte; the role of the port that forwards the request packet;

s_b: four bits, short for s_blockstatus; the BLOCK status, on the node's ring, of the port that forwards the request packet: 0 means non-BLOCK, 1 means BLOCK;

s_l: four bits, short for s_linkstatus; the LINK status of the port that forwards the request packet: 1 means UP, 2 means DOWN;

s_i: two bytes, short for s_index; the number of the port that forwards the request packet;

s_n: 16 bytes, short for s_name; the name of the port that forwards the request packet, shortened in the same way as r_n;

s_mac: 6 bytes; the MAC address of the port that forwards the request packet.
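A sketch of packing the node information with `struct`, including the port-name shortening used by `r_n` and `s_n`. The bit order inside the shared bytes (for example, `nt` in the high nibble) and the byte order are assumptions, as are the function names; the manual gives only the field widths, which add up to 109 bytes.

```python
import struct

PORT_INFO = struct.Struct("!BBH16s6s")     # role, block/link, index, name, mac = 26 bytes
NODE_HEAD = struct.Struct("!BBB32s6s16s")  # hop, nt/ns/b, bm/ar, host name, base mac, sys oid = 57 bytes


def shorten_port_name(name: str) -> str:
    """Drop the 'linkaggregation' or 'port' prefix, e.g. 'port 0/0/1' -> '0/0/1'."""
    for prefix in ("linkaggregation", "port"):
        if name.startswith(prefix):
            return name[len(prefix):].strip()
    return name


def pack_port(role: int, blocked: bool, link_up: bool, index: int,
              name: str, mac: bytes) -> bytes:
    # Assumed nibble order: BLOCK status in the high nibble, LINK status (1 UP, 2 DOWN) low.
    status = (int(blocked) << 4) | (1 if link_up else 2)
    # "16s" pads a short name with zero bytes and truncates a long one.
    return PORT_INFO.pack(role, status, index, shorten_port_name(name).encode("ascii"), mac)


def pack_node_info(hop: int, nt: int, ns: int, border: bool, backup_master: bool,
                   acting_master: bool, host_name: str, base_mac: bytes,
                   sys_oid: bytes, r_port: bytes, s_port: bytes) -> bytes:
    # Assumed bit order: nt in bits 7-4, ns in bits 3-1, b in bit 0.
    b1 = ((nt & 0xF) << 4) | ((ns & 0x7) << 1) | int(border)
    b2 = (int(backup_master) << 4) | int(acting_master)
    head = NODE_HEAD.pack(hop, b1, b2, host_name.encode("ascii"), base_mac, sys_oid)
    return head + r_port + s_port
```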



Networking Mode of Not Sending HELLO

The master node supports a mode in which HELLO packets are not sent. When the hello timer of the master node is set to 0, the master node does not send HELLO packets, and when the receive timer expires because no HELLO packet arrives, the EIPS node status is not changed. In this mode, as soon as the master node detects that both of its ports are up, it enters the PRE-UP state.

Figure 15-17 EIPS supports connecting to higher-level network

As shown in Figure 15-17, the EIPS node is configured as the master node and is connected to a network N through two lines, a main line and a backup line. If neither the main line nor the backup line is faulty, EIPS blocks port S on the backup line and data is aggregated to network N over the main line. If the main line fails, EIPS enables port S on the backup line and data is aggregated to network N over the backup line.
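A hypothetical next-state function contrasting the no-HELLO mode with normal operation. The FAILED state name and the rule that a HELLO receive timeout moves a COMPLETE ring to FAILED in normal mode are assumptions for illustration; the no-HELLO behavior follows the text.

```python
def master_next_state(state: str, hello_interval_s: float, main_up: bool,
                      assistant_up: bool, hello_rx_timeout: bool) -> str:
    """Hypothetical next-state function for the EIPS master node."""
    if hello_interval_s == 0:
        # No-HELLO mode: nothing is sent, so a receive timeout never changes
        # the state; both ports being up is enough to enter PRE-UP.
        if state == "FAILED" and main_up and assistant_up:
            return "PRE-UP"
        return state
    # Normal mode (assumed): a HELLO receive timeout marks a complete ring as failed.
    if hello_rx_timeout and state == "COMPLETE":
        return "FAILED"
    return state
```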

Uni-directional Detection Function

In the basic EIPS technology, an EIPS node detects line faults from the signal status of the physical line. If other transmission devices sit between two nodes on the EIPS ring, as shown in Figure 15-18, and a unidirectional fault occurs between transmission devices M1 and M2, EIPS cannot detect the fault from the line signals, and the EIPS master node cannot detect it from HELLO packets either. In real networks, unidirectional faults are uncommon.



Figure 15-18 Uni-directional fault on transmission device between two EIPS nodes

To solve this problem, EIPS nodes send LINK-HELLO detection packets to each other. LINK-HELLO uses the standard EIPS packet and performs detection with the SYSTEM_MAC_ADDR field and the first two fields of the packet. The destination MAC address in the standard EIPS packet is 0001.7A4F.4AB4, but it can also be learned automatically from the peer. ERP_TYPE is LINK-HELLO (14).

SYSTEM_MAC_ADDR records the MAC address of the peer port, and the first two fields record the port number of the peer port. In addition, the first fields of the reserved field in the packet record the port number of the sending port. When the peer is not known, these eight bytes of peer information are all 0.

As shown in Figure 15-19, if a node receives the neighbor's LINK-HELLO packet, SYSTEM_MAC_ADDR in the packet is the MAC address of the local port, and the port number is the number of the local port, the line is regarded as bidirectional.

Figure 15-19 EIPS node sends LINK-HELLO to detect uni-directional links

By default, LINK-HELLO is sent every 1 s, and the LINK-HELLO timeout is three times the sending period; the sending period is configurable. In a LINK-HELLO packet, the source MAC address is the MAC address of the sending port of the sending node, and SYSTEM_MAC_ADDR is the MAC address of the receiving port of the peer node. As shown in the figure, node S1 sends a LINK-HELLO packet whose source MAC address is the MAC address of S1 and whose SYSTEM_MAC_ADDR is the MAC address of S2; S1 learns the MAC address of S2 from the LINK-HELLO packets of S2. A received LINK-HELLO packet counts toward the timeout judgment only when its source MAC address is the peer MAC address, SYSTEM_MAC_ADDR in the packet is the MAC address of the local port, and the port number is the local port number. When a node does not know the peer MAC address, it sets SYSTEM_MAC_ADDR and the port number in its LINK-HELLO packets to all 0.

If SYSTEM_MAC_ADDR in a LINK-HELLO packet received on a port of an EIPS node is not the MAC address of the receiving port, or the port number field is not the number of the receiving port, a unidirectional fault is regarded as present. The node shuts down the unidirectional physical port and sends TRAP information to the gateway. EIPS is notified as soon as the physical port is shut down.

A receive timeout indicates that one or both directions may be disconnected. If one direction is disconnected, the neighbor can detect it; if both directions are disconnected, the EIPS master node can detect it. Therefore, when the receive timer expires, the node only needs to clear the recorded MAC address of the neighbor; no further action is required.

If a port belongs to multiple EIPS nodes, the control VLAN of any one of those nodes can be used as the VLAN field when the LINK-HELLO packet is built. For convenience, the control VLAN of the EIPS node with the smallest node number is selected.
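The LINK-HELLO checks above can be sketched as a per-port object. The field names mirror the text, but the class, the string results, and one rule are assumptions: all-zero peer information is tolerated while the port has never had a valid exchange (start-up), and is treated as a unidirectional fault afterwards, which is how a one-way break shows up once the far end has cleared its record of this port. Sending the TRAP is not modeled.

```python
from dataclasses import dataclass
from typing import Dict, Optional

ZERO_MAC = bytes(6)


@dataclass
class LinkHello:
    src_mac: bytes        # MAC address of the sending port
    system_mac: bytes     # SYSTEM_MAC_ADDR: the receiving port's MAC, all 0 if unknown
    peer_port_no: int     # receiving port number as known by the sender, 0 if unknown
    sender_port_no: int   # port number of the sending port (reserved field)


class LinkHelloPort:
    """Hypothetical model of one EIPS port running LINK-HELLO."""

    def __init__(self, mac: bytes, port_no: int, period_s: float = 1.0) -> None:
        self.mac, self.port_no = mac, port_no
        self.timeout_s = 3 * period_s              # timeout = 3 x sending period
        self.peer_mac: Optional[bytes] = None
        self.peer_port_no = 0
        self.last_valid_rx: Optional[float] = None
        self.shut_down = False

    def build(self) -> LinkHello:
        if self.peer_mac is None:
            return LinkHello(self.mac, ZERO_MAC, 0, self.port_no)
        return LinkHello(self.mac, self.peer_mac, self.peer_port_no, self.port_no)

    def receive(self, pkt: LinkHello, now: float) -> str:
        if self.peer_mac is None:                  # learn the peer from its LINK-HELLO
            self.peer_mac, self.peer_port_no = pkt.src_mac, pkt.sender_port_no
        unknown = pkt.system_mac == ZERO_MAC and pkt.peer_port_no == 0
        if unknown and self.last_valid_rx is None:
            return "waiting"                       # assumption: start-up, peer has not learned us
        if pkt.system_mac != self.mac or pkt.peer_port_no != self.port_no:
            self.shut_down = True                  # unidirectional: shut the port down
            return "unidirectional"
        if pkt.src_mac != self.peer_mac:
            return "ignored"
        self.last_valid_rx = now                   # counts for the timeout judgment
        return "bidirectional"

    def tick(self, now: float) -> None:
        if self.last_valid_rx is not None and now - self.last_valid_rx > self.timeout_s:
            self.peer_mac, self.peer_port_no = None, 0   # only clear the neighbor record
            self.last_valid_rx = None


def link_hello_vlan(control_vlans: Dict[int, int]) -> int:
    """Control VLAN of the EIPS node with the smallest node number."""
    return control_vlans[min(control_vlans)]
```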

Reliability Realization

In a ring network, if the control plane of the master node becomes abnormal and fails while its data plane remains intact, the data plane forms a loop. To avoid this problem, the master node is backed up to make EIPS reliable, which introduces the concept of the backup master node. The main function of the backup master node is to act as the master node when the control plane of the master node fails: when it detects that the topology is complete, it blocks its assistant port to break the loop and informs the other nodes to refresh their FDBs.

Only a transmission node can be a backup master node. Edge nodes and assistant edge nodes, as well as transmission nodes connected to an edge node or assistant edge node, cannot act as the backup master node. To prevent the blocking of the backup master node's assistant port and of the master node's assistant port from affecting different links, the assistant port of the configured backup master node must be directly connected to the assistant port of the master node, so that both blocks fall on the same link, as shown in the following figure.



Figure 15-20 Assistant port of backup master node is direct-connected to assistant port of master node

On the backup master node, HELLO and LINKDOWN packets are both sent to the CPU and forwarded. When the backup master node cannot receive the HELLO packets of the master node, it sends HELLO1 packets to check whether the data plane of the master node is intact and whether the ring is complete. The HELLO1 packet has the same format as the HELLO packet; only the destination MAC address differs, being 0001.7A4F.4AB5 for HELLO1. If the assistant port receives the HELLO1 packet, the loop is complete and the data plane of the master node is intact, but its control plane has failed. In this case, the backup master node blocks its assistant port, sends a COMP-FLUSH-FDB packet from its main port, and sets its working status to master node, as shown in Figure 15-21.



Figure 15-21 The control platform of master node breaks down

When the backup master node works as the master node, it operates in basically the same way as the master node. When it receives a LINKDOWN packet on the ring, it enables its assistant port and sends COMM-FLUSH-FDB packets to the ring through both ports. If it receives a HELLO packet from the master node while its assistant port is in the BLOCK state, it enables the assistant port and switches its working status back to transmission node.
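A minimal sketch of the backup master node's role changes described above. Method names are hypothetical; packet sending is only recorded in a list, and the handling of a master HELLO when the assistant port is not blocked, which the manual does not describe, is left as "no change".

```python
class BackupMaster:
    """Hypothetical model of the backup master node's role changes."""

    def __init__(self) -> None:
        self.role = "TRANSMISSION"
        self.assistant_blocked = False
        self.sent: list = []       # packets sent, recorded for illustration

    def on_master_hello_lost(self) -> None:
        if self.role == "TRANSMISSION":
            self.sent.append("HELLO1")      # probe the ring and the master's data plane

    def on_hello1_at_assistant_port(self) -> None:
        # Ring complete and master data plane forwarding, but its control plane is down.
        if self.role == "TRANSMISSION":
            self.assistant_blocked = True
            self.sent.append("COMP-FLUSH-FDB")   # sent from the main port
            self.role = "MASTER"

    def on_linkdown(self) -> None:
        if self.role == "MASTER":
            self.assistant_blocked = False
            self.sent.append("COMM-FLUSH-FDB")   # sent out of both ports

    def on_master_hello(self) -> None:
        # The original master is back; hand over only if the assistant port is blocked.
        if self.role == "MASTER" and self.assistant_blocked:
            self.assistant_blocked = False
            self.role = "TRANSMISSION"
```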



ULFD Technology

ULFD Protocol and Application

Unidirectional Link Fault Detection (ULFD) is an L2 protocol. Devices connected by fibers or twisted pairs can use it to monitor the physical configuration of the cables and check whether a unidirectional link exists. When ULFD discovers a unidirectional link, it disables the interface.

A unidirectional link causes a series of problems, such as loops in the spanning tree topology.

This section describes the principles and implementation of the ULFD protocol.

Related Terms of ULFD Protocol

Uni-directional link: a special phenomenon in which the local device can receive the packets sent by the peer device, but the peer device cannot receive the packets of the local device. A unidirectional link causes a series of problems, such as loops in the spanning tree topology.

Take fiber as an example. There are two types of unidirectional link: in one, the fibers are cross-connected; in the other, one fiber is not connected or is broken. Figure 16-1 shows the fibers of two devices cross-connected; in Figure 16-2, the hollow line indicates a fiber that is not connected or is broken. The typical case in Figure 16-2 is that one device is not connected or is disconnected.



Figure 16-1 The cross connection of fibers



Figure 16-2 One fiber is not connected or disconnected

Introduction to ULFD Protocol

The ULFD protocol is used to detect unidirectional links in the network. It has the following features. ULFD is a link-layer protocol that cooperates with the physical-layer protocol to monitor the link status of devices. The auto-negotiation mechanism of the physical layer detects physical signals and faults; ULFD identifies peer devices and unidirectional links and shuts down unreachable ports. When both auto-negotiation and ULFD are enabled, they work together to detect and shut down physical and logical unidirectional connections, preventing other protocols (such as STP) from failing. Even if both ends of a link work normally at the physical layer, ULFD checks whether the link is connected correctly and whether the two ends can exchange packets, a check that auto-negotiation cannot perform.

Protocol Packet Definition

The ULFD protocol runs at the LLC layer. It uses a special multicast address as the destination address and adopts the standard SNAP format.

Destination MAC address: 01-00-0C-CC-CC-CC.

Source MAC address: the L2 MAC address of the device.

ULFD SNAP format:

LLC value: 0xAAAA03

Org ID: 0x00017a

HDLC protocol type: 0x0111
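A sketch of building the ULFD frame header from the values above. It assumes IEEE 802.3 framing, in which the 2-byte length field counts the LLC/SNAP header plus the PDU, and it omits padding and the FCS; the function name is hypothetical.

```python
import struct

ULFD_DST_MAC = bytes.fromhex("01000ccccccc")
LLC = bytes.fromhex("aaaa03")       # DSAP, SSAP, control
ORG_ID = bytes.fromhex("00017a")
PROTOCOL_TYPE = 0x0111


def ulfd_frame(src_mac: bytes, pdu: bytes) -> bytes:
    """802.3 frame: DA, SA, length, LLC, SNAP (Org ID + type), PDU. No FCS, no padding."""
    snap = LLC + ORG_ID + struct.pack("!H", PROTOCOL_TYPE)
    length = len(snap) + len(pdu)   # 802.3 length field covers LLC/SNAP + payload
    return ULFD_DST_MAC + src_mac + struct.pack("!H", length) + snap + pdu
```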

ULFD PDU field definitions:

Ver field (3 bits): 0x01, the ULFD PDU version number, that is, the current ULFD protocol version number.

Opcode field (5 bits):



Packet Type | Value | Description
Keepalive (probe) | 0x01 | Used to discover neighbors and keep them alive; used when maintaining the neighbor table and when requesting neighbor resynchronization.
Detection (echo) | 0x02 | Used for unidirectional detection; sent when a new neighbor is detected (or an existing neighbor is resynchronized).
Clear (flush) | 0x03 | Notifies the neighbor when ULFD is disabled on a device or port, so that neighbor information is synchronized quickly; on receiving it, the neighbor immediately clears the corresponding cached information.

Flags (8 bits, 1 byte):

Bit 0: Recommended timeout flag (RT)
Bit 1: ReSynch flag (RSY)
Bits 2 to 7: Reserved

The RSY flag indicates whether the packet is a normal probe keepalive packet or a probe packet that requests resynchronization and detection. When the RSY flag is 1, the receiving end must return an echo packet.

ULFD PDU encapsulation format:

Byte 0: Ver (bits 0 to 2) and Opcode (bits 3 to 7)
Byte 1: Flags
Bytes 2 and 3: checksum
Followed by: Device-ID TLV, Port-ID TLV, Device Name TLV, Echo TLV, Message Interval TLV, Timeout Interval TLV, Sequence Number TLV

TLV format:

Bytes 0 and 1: TLV_type
Bytes 2 and 3: TLV_Len
Followed by: TLV_Value (variable length)

If the TLV type is not within the range of TLV types defined by ULFD, the TLV is regarded as invalid.
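A sketch of splitting a ULFD PDU into its header and TLVs, applying the validity rule above. Several details are assumptions not stated in the manual: Ver occupies the top 3 bits of byte 0; RT and RSY are the masks 0x01 and 0x02, as in the UDLD protocol described in RFC 5171; `TLV_Len` includes the 4-byte TLV header; and the caller supplies the set of defined TLV type codes, which are not listed here. The function name is hypothetical.

```python
import struct
from typing import Iterable, List, Tuple

OPCODES = {0x01: "probe", 0x02: "echo", 0x03: "flush"}
RT_FLAG, RSY_FLAG = 0x01, 0x02   # assumed bit masks (RFC 5171 convention)


def parse_ulfd_pdu(pdu: bytes, known_tlv_types: Iterable[int]) -> dict:
    """Split a ULFD PDU into header fields, valid TLVs, and invalid TLV types."""
    ver_op, flags, checksum = struct.unpack_from("!BBH", pdu, 0)
    known = set(known_tlv_types)
    tlvs: List[Tuple[int, bytes]] = []
    invalid: List[int] = []
    off = 4
    while off + 4 <= len(pdu):
        tlv_type, length = struct.unpack_from("!HH", pdu, off)
        if length < 4 or off + length > len(pdu):
            raise ValueError(f"bad TLV length {length} at offset {off}")
        if tlv_type in known:
            tlvs.append((tlv_type, pdu[off + 4:off + length]))
        else:
            invalid.append(tlv_type)   # type outside the ULFD-defined range: invalid
        off += length
    return {
        "ver": ver_op >> 5,
        "opcode": OPCODES.get(ver_op & 0x1F, ver_op & 0x1F),
        "rt": bool(flags & RT_FLAG),
        "rsy": bool(flags & RSY_FLAG),
        "checksum": checksum,
        "tlvs": tlvs,
        "invalid_tlv_types": invalid,
    }
```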

Protocol Action

The work of the ULFD protocol covers the following aspects:

Neighbor discovery: A port sends its own information and any resynchronization request in probe packets, and the peer port discovers the neighbor from the contents of the probe packet it receives. When a port receives a probe packet, it checks whether the sending port is in its neighbor table. If not, the sender is a new neighbor: the port adds it to the neighbor table and returns an echo packet for unidirectional detection. If the sending port is in the neighbor table but the probe packet has the RSY flag set, the neighbor is requesting resynchronization, and the port sends an echo packet to it for unidirectional detection. If the sending port is in the neighbor table and the RSY flag is not set, the probe packet is an ordinary keepalive packet, and the port updates the neighbor's information.

Neighbor aging: After a neighbor is added to the neighbor table, the port sets an aging time Tlf based on the Message Interval value in the received probe packet. If the port does not receive a probe keepalive packet from the neighbor within Tlf, the neighbor is aged out and deleted from the neighbor table.

Uni-directional detection: A port performs unidirectional detection only when its neighbor table changes. The initiator first sends a probe packet with the resynchronization request (RSY flag) set, asking the peer to return an echo packet that carries the peer's own neighbor table information in the echo-tlv field. When the initiator receives the echo packet, it checks whether the echo-tlv contents are correct, including the packet format and whether they contain the local port and device ID. If the echo packet format is correct and the echo-tlv field contains the local port and device ID, the port is regarded as bidirectional; if the contents of the echo packet are incorrect, the port is regarded as unidirectional; if no echo packet is received, the handling depends on the ULFD detection mode.



Uni-directional processing: After the port status is confirmed as unidirectional, the neighbor table of the port is cleared, a FLUSH packet is sent to tell the neighbor to clear its information about this port, and the port is then shut down. To re-enable the port, the user must reset it manually or configure another automatic recovery mechanism.

Keepalive mechanism: After the port status stabilizes, the port periodically sends probe keepalive packets to inform other ports of its status. The peer uses these keepalive packets to refresh the neighbor status; if no probe keepalive packet is received within the keepalive period, the port is deleted from the neighbor table. The probe keepalive packet carries all the neighbor information of the port. The sending period Tmsg of probe keepalive packets can be set with a global command.
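The neighbor-table behavior of one port can be sketched as follows. The class and method names are hypothetical, neighbors are keyed by (device ID, port ID), and the aging factor `AGING_MULTIPLIER = 3` is an assumption, since the manual only says that Tlf is derived from the Message Interval.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple

# Assumption: Tlf = AGING_MULTIPLIER x Message Interval; the manual only says Tlf
# is derived from the Message Interval value, not by what factor.
AGING_MULTIPLIER = 3

Key = Tuple[str, str]   # (device ID, port ID) of a neighbor port


@dataclass
class Neighbor:
    expires_at: float


class UlfdPort:
    """Hypothetical model of one ULFD port's neighbor table."""

    def __init__(self, device_id: str, port_id: str) -> None:
        self.device_id, self.port_id = device_id, port_id
        self.neighbors: Dict[Key, Neighbor] = {}
        self.shut_down = False

    def on_probe(self, sender: Key, rsy: bool, message_interval: float, now: float) -> str:
        """Return 'echo' if an echo packet must be sent back, else 'update'."""
        is_new = sender not in self.neighbors
        self.neighbors[sender] = Neighbor(now + AGING_MULTIPLIER * message_interval)
        return "echo" if is_new or rsy else "update"

    def age(self, now: float) -> None:
        for key in [k for k, n in self.neighbors.items() if now >= n.expires_at]:
            del self.neighbors[key]

    def check_echo(self, echo_tlv: Iterable[Key]) -> str:
        """Bidirectional only if the peer's echo-tlv lists this device and port."""
        if (self.device_id, self.port_id) in set(echo_tlv):
            return "bidirectional"
        return "unidirectional"

    def on_unidirectional(self) -> str:
        self.neighbors.clear()
        self.shut_down = True
        return "FLUSH"    # tell the neighbor to clear this port, then shut down
```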

Two Kinds of Detection Modes

ULFD has two working modes: normal mode and aggressive mode. The two modes judge unidirectional links in different ways.

In normal mode, if the port receives no packets from the peer during the keepalive stage, the port enters the unconfirmed state. If, during the unidirectional detection stage, the port receives no echo packet from the peer, or the received echo packet does not contain the local port information, the link between the local port and the peer is regarded as unidirectional. Normal mode is typically used to detect unidirectional status caused by crossover connections.

In aggressive mode, if the port receives no packets from the peer during the keepalive stage so that all neighbors age out, and no neighbor is learned after the link is re-established, the local port is regarded as unreachable (not a unidirectional link in the strict sense) and is shut down. If, during the unidirectional detection stage, the port receives no echo packet from the peer, or the received echo packet does not contain the local port information, the link between the local port and the peer is regarded as unidirectional. Aggressive mode is used to detect unidirectional connections caused by fiber cross-connection or disconnection.
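A hypothetical summary function of how the two modes judge a port, using only the cases described above. Stage and result names are illustrative; for aggressive mode with a neighbor relearned, the sketch returns "detect again" because a change in the neighbor table triggers a new unidirectional detection.

```python
def ulfd_verdict(mode: str, stage: str, echo_received: bool = False,
                 echo_lists_local_port: bool = False,
                 relearned_after_aging: bool = True) -> str:
    """Hypothetical summary of how normal and aggressive mode judge a port.

    stage is "detection" (unidirectional detection stage) or
    "keepalive" (no packets from the peer in the keepalive stage).
    """
    if mode not in ("normal", "aggressive"):
        raise ValueError(f"unknown mode: {mode}")
    if stage == "detection":
        # Both modes: no echo, or an echo without the local port -> unidirectional.
        if echo_received and echo_lists_local_port:
            return "bidirectional"
        return "unidirectional"
    if stage == "keepalive":
        if mode == "normal":
            return "unconfirmed"
        # Aggressive: all neighbors aged and none relearned -> unreachable, shut down.
        return "detect again" if relearned_after_aging else "unreachable (shut down)"
    raise ValueError(f"unknown stage: {stage}")
```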



Typical Application

When using ULFD, make sure that the ULFD function is configured on the corresponding ports at both ends, that both ends work in the same detection mode, and that ULFD is enabled globally on each device.
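Before configuring, the prerequisites above can be checked mechanically. The sketch below is hypothetical (the `UlfdEnd` record is not a device data model and does not read any switch configuration) and checks only the three conditions the text lists: ULFD configured on both ports, the same detection mode on both ends, and global enabling on both devices.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class UlfdEnd:
    switch: str
    port: str
    port_mode: Optional[str]   # e.g. "aggressive"; None if ULFD is not set on the port
    global_enabled: bool


def check_ulfd_pair(a: UlfdEnd, b: UlfdEnd) -> List[str]:
    """Return the unmet prerequisites for running ULFD between two ports."""
    problems = []
    for end in (a, b):
        if end.port_mode is None:
            problems.append(f"{end.switch} port {end.port}: ULFD not configured")
        if not end.global_enabled:
            problems.append(f"{end.switch}: ULFD not enabled globally")
    if a.port_mode and b.port_mode and a.port_mode != b.port_mode:
        problems.append(f"detection modes differ: {a.port_mode} vs {b.port_mode}")
    return problems
```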

This section gives a basic ULFD configuration for reference.

The network topology is as follows:

ULFD configuration instance

Illustration

Port 0/0 of the local switch A is connected to port 0/1 of the peer switch B through fiber. Configure ULFD on this connection to detect the connection status of the link.

The configuration of Switch A:

Command                                          Description
SwitchA(config)#port 0/0                         Enter port configuration mode
SwitchA(config-port-0/0)#ulfd port aggressive    Configure port 0/0 to work in ULFD aggressive mode
SwitchA(config-port-0/0)#exit                    Exit port configuration mode
SwitchA(config)#ulfd message time 16             Set the packet sending interval to 16s
SwitchA(config)#ulfd enable                      Enable ULFD globally
SwitchA(config)#exit                             Complete the ULFD configuration

The configuration of Switch B:

Command                                          Description
SwitchB(config)#port 0/1                         Enter port configuration mode
SwitchB(config-port-0/1)#ulfd port aggressive    Configure port 0/1 to work in ULFD aggressive mode
SwitchB(config-port-0/1)#exit                    Exit port configuration mode
SwitchB(config)#ulfd message time 16             Set the packet sending interval to 16s
SwitchB(config)#ulfd enable                      Enable ULFD globally
SwitchB(config)#exit                             Complete the ULFD configuration



OAM Technology

This chapter describes MAN OAM technologies and their applications. OAM stands for Operation, Administration and Maintenance.

Main contents:

CFM protocol and its application

E-LMI protocol and its application

Ethernet OAM protocol and its application

CFM Protocol and Application

This section describes the basic principles of Ethernet Connectivity Fault Management (CFM).

Main contents:

Terms of Ethernet CFM

Introduction to Ethernet CFM protocol

Terms of Ethernet CFM

CFM: Connectivity Fault Management;

OAM: Operation, Administration and Maintenance;

Maintenance Domain (MD): the part of the network covered by connectivity fault management. Its boundary is defined by a series of maintenance points (MPs) configured on ports. An MD is identified by its maintenance domain name. Following the multi-domain OAM network model of 802.1ag, MDs are organized into hierarchical levels. A higher-level MD can contain a lower-level MD, but MDs must not intersect; that is, a higher-level MD covers a larger range than a lower-level MD. Levels are identified by the integers 0-7: the larger the number, the higher the level.
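The nesting rule can be checked mechanically. The Python sketch below models each MD as a set of devices it spans (a hypothetical simplification; real MD boundaries are defined by MPs on ports) and reports levels outside 0-7, overlapping MDs at the same level, and MDs that intersect instead of nesting.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class MaintenanceDomain:
    name: str
    level: int             # 0-7; a larger number means a higher level
    coverage: frozenset    # hypothetical: the devices the MD spans


def check_md_hierarchy(domains):
    """Return rule violations for a set of MDs (empty list if valid)."""
    errors = []
    for md in domains:
        if not 0 <= md.level <= 7:
            errors.append(f"{md.name}: level {md.level} outside 0-7")
    for a, b in combinations(domains, 2):
        if not a.coverage & b.coverage:
            continue  # disjoint MDs never conflict
        if a.level == b.level:
            errors.append(f"{a.name}/{b.name}: overlapping MDs at the same level")
            continue
        outer, inner = (a, b) if a.level > b.level else (b, a)
        # Overlap is only allowed when the lower-level MD lies entirely
        # inside the higher-level one (nesting, not intersection).
        if not inner.coverage <= outer.coverage:
            errors.append(f"{outer.name}/{inner.name}: MDs intersect")
    return errors
```

For example, a customer MD at level 7 spanning CE1 through CE2, a provider MD at level 5 spanning PE1, P1, and PE2, and an operator MD at level 3 spanning PE1 and P1 (all hypothetical names and levels) pass the check with no violations.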

Maintenance Association (MA): a set of MPs within an MD. An MA is identified by the MD name plus a short MA name. Each MA serves one VLAN: the MPs in the MA send packets in that VLAN and receive the packets sent by the other MPs in the same MA. Therefore, an MA is also called a Service Instance (SI).

MP (Maintenance Point): either a Maintenance Association End Point (MEP) or a Maintenance Association Intermediate Point (MIP). An MP is configured on a port and belongs to one MA. On a given port, only one MP can be configured for each MA.

Maintenance Association End Point (MEP): a MEP can send and receive all types of CFM packets. Each MEP is identified by an integer called the MEP ID. MEPs are configured on ports and define the boundary of the MD. The MA and MD to which a MEP belongs determine the VLAN and the level of the packets the MEP sends. Depending on the position of the MEP in the MA, a MEP is either outward or inward. If the MEP receives the packets of its MA through the port on which it is configured, the MEP is outward; an outward MEP can send packets to the network only through that port. Conversely, if the MEP receives the packets of its MA through other ports, the MEP is inward; an inward MEP cannot send packets to the network through the port on which it is configured.

Maintenance Association Intermediate Point (MIP): a MIP can process and respond to certain CFM packets (such as LT packets, or LB packets at its own level that are addressed to it), but cannot initiate packets. The MA and MD to which the MIP belongs determine the VLAN and the MD level of the packets it receives.

Introduction to Ethernet CFM Protocol

IEEE 802.1ag refers to its Ethernet OAM function as Connectivity Fault Management (CFM), an amendment to the IEEE 802.1Q protocol. CFM is an end-to-end, VLAN-based Ethernet OAM function. It defines the protocols and protocol entities for detecting, verifying, and locating connectivity faults in VLAN-based networks.



This section describes some basic concepts and functions of Ethernet CFM.

Maintenance Domain

The maintenance domain is the part of the network covered by connectivity fault management. Its boundary is defined by a series of maintenance points (MPs), including MEPs and MIPs, configured on ports, as shown in Figure 17-1.

Maintenance domain

Carrier-class Ethernet needs to provide different management scopes and contents for different organizations. Usually, three kinds of organizations are involved in carrier-class Ethernet services: customers, service providers, and network carriers. Customers purchase Ethernet services from service providers; service providers can use their own networks or other carriers’ networks to provide end-to-end Ethernet services. IEEE 802.1ag models carrier-class Ethernet as a multi-domain OAM network with three maintenance levels (customers, service providers, and carriers), each corresponding to a different management domain. The service providers are responsible for end-to-end service management, and the carriers provide service transport.

Figure 17-2 shows three maintenance domains, that is, customers, service

providers, and carriers, as well as the hierarchical structure of the

maintenance domains. CE is the edge device of the customer (Customer

Edge); PE is the edge device of the service provider (Provider Edge).



Hierarchical structure of Ethernet CFM maintenance domain

Maintenance Association

Maintenance Association (MA): a set of MPs within an MD. An MA is identified by the MD name plus a short MA name. Each MA serves one VLAN: the MPs in the MA send packets in that VLAN and receive the packets sent by the other MPs in the same MA. Therefore, an MA is also called a Service Instance (SI).

Maintenance Point

A maintenance point is a function point configured on a port that takes part in CFM protocol operation. According to their location in the maintenance domain, maintenance points are divided into Maintenance Association End Points (MEPs) and Maintenance Association Intermediate Points (MIPs).

MEPs define the boundary of a maintenance domain. They also confine CFM packets within the maintenance domain according to the domain level. A MEP can send and receive all types of CFM packets.

Each MEP is identified by an integer called the MEP ID. MEPs are configured on ports and define the boundary of the MD. The MA and MD to which a MEP belongs determine the VLAN and the level of the packets the MEP sends. Depending on the position of the MEP in the MA, a MEP is either outward or inward. If the MEP receives the packets of its MA through the port on which it is configured, the MEP is outward; an outward MEP can send packets to the network only through that port. Conversely, if the MEP receives the packets of its MA through other ports, the MEP is inward; an inward MEP cannot send packets to the network through the port on which it is configured.

A MIP can process and respond to certain CFM packets (such as LT packets, or LB packets at its own level that are addressed to it), but cannot initiate packets.

Figure 17-3 shows the locations of MEPs and MIPs on customer, service provider, and carrier devices.

Hierarchical management of MD and locations of MEP and MIP

802.1ag supports hierarchical management, and the management level is identified by the level of the maintenance domain. Maintenance domains can be nested: a high-level maintenance domain can contain a low-level maintenance domain, but a low-level maintenance domain cannot contain a high-level one. All CFM packets are initiated by MEPs. A MIP does not send any CFM packet actively, but responds to LT packets and to LB packets at its own level.

Figure 17-3 shows the hierarchical management of maintenance domains. The larger the level number, the higher the level and the wider the control scope.

When maintenance domains are used to locate a fault, you can first use LT or LB on Level 5 to determine the faulty segment. If the fault lies between two MIPs on Level 5, continue with LT or LB on Level 3, and so on, until the smallest faulty area is found. The packets sent or received by each MP belong to its MA, carry the VLAN and level of that MA, and do not interfere with one another.



Similarly, a MEP sends CCMs, and the remote MEPs receive and process them. If the MD and MA configured on a remote MEP are inconsistent with those of the sending MEP, the configuration error in the network can be detected.

Connectivity Check

The continuity check (CC) function is the most basic function of 802.1ag. It checks for connection failures of the Ethernet flow between MPs, which may be caused by faults or by configuration errors. Connectivity check is suitable for detecting unidirectional connection failures. Figure 17-4 shows an example of the CC function: the maintenance domain (Provider Domain) contains two operator domains (Operator A and Operator B).

Connectivity checking

When the network connection is normal, each MEP periodically sends multicast Continuity Check Messages (CCMs). The destination address is a multicast address determined by the level of the maintenance domain in which the MEP is located, as shown in Table 1-1.

After a MEP receives a CCM sent by a peer MEP in the same maintenance domain and parses it correctly, it saves the information about the peer MEP in the CCM database. The information includes the MEP ID, the MAC address of the MEP, the Remote Defect Indication (RDI) of the MEP, the Sender ID of the MEP, and so on.

The local MEP compares the MEP ID in each received CCM against the local configuration to ensure that no MEP ID is duplicated. A duplicated MEP ID indicates a network configuration error or a loop.

The CCM timeout is 3.5 times the sending interval; that is, the connection between the local MEP and a remote MEP is regarded as failed when three consecutive CCMs are lost.

Multicast address of connectivity check packet (CCM)

Destination address: 01-80-C2-00-00-3y

MD level of CCM    Last four address bits "y"
7                  7
6                  6
5                  5
4                  4
3                  3
2                  2
1                  1
0                  0
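The table reduces to a simple rule: the last hexadecimal digit of the CCM destination address equals the MD level. A minimal Python helper (the function name is hypothetical) makes this explicit.

```python
def ccm_multicast_mac(md_level: int) -> str:
    """CCM destination address 01-80-C2-00-00-3y, where y equals the MD level."""
    if not 0 <= md_level <= 7:
        raise ValueError("MD level must be in the range 0-7")
    return f"01-80-C2-00-00-3{md_level:X}"
```

For instance, a MEP in a level-5 maintenance domain sends its CCMs to 01-80-C2-00-00-35.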

A CCM can reach every MEP in the MA. When a MEP receives CCMs from its MA, it first extracts the packet information and saves it in the CCM database, and then checks whether the CCMs of all other MEPs in the MA are received within the specified time.

Suppose a MEP sends a CCM. When the CCM reaches a MIP in the MA, the MIP forwards it; when the CCM reaches a destination MEP in the same MA, that MEP checks whether the level of the CCM matches its own. If the timer for the sending MEP has not expired, the receiving MEP processes the packet, resets the timer, and waits for the next CCM from that remote MEP.

While receiving the CCMs sent by the other MEPs in the same MA, each MEP also periodically multicasts its own CCMs. The local MEP checks whether any MEP in its local CCM database has timed out. A timeout indicates that the connection with that remote MEP has failed, and the error is reported to the network administrator.

When the sending interval carried in a received CCM is inconsistent with the value configured in the MA, an error notification (FNG alarm) is triggered. When the MA ID in a received CCM is inconsistent with the local MA ID, a cross-connection error exists, which also triggers an FNG alarm.
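The CCM handling described above (saving remote MEP entries, restarting their timers, the 3.5 x interval timeout, and the error conditions that raise an FNG alarm) can be sketched as a small Python class. This is an illustrative model under simplifying assumptions: time is passed in explicitly, CCMs at other levels are simply ignored, and the `alarms` list is a hypothetical stand-in for FNG alarms.

```python
from dataclasses import dataclass, field


@dataclass
class Ccm:
    md_level: int
    maid: str          # MA ID: MD name + short MA name
    mep_id: int
    src_mac: str
    interval_s: float  # sending interval carried in the CCM
    rdi: bool = False  # Remote Defect Indication


@dataclass
class LocalMep:
    md_level: int
    maid: str
    mep_id: int
    interval_s: float                           # interval configured in the MA
    ccm_db: dict = field(default_factory=dict)  # remote MEP ID -> (last CCM, time seen)
    alarms: list = field(default_factory=list)  # stands in for FNG alarms

    def receive(self, ccm: Ccm, now: float) -> None:
        if ccm.md_level != self.md_level:
            return  # simplified: CCMs at other levels are not processed here
        if ccm.maid != self.maid:
            self.alarms.append(("cross-connect", ccm.maid))
            return
        if ccm.mep_id == self.mep_id:
            self.alarms.append(("duplicate-mep-id", ccm.mep_id))
            return
        if ccm.interval_s != self.interval_s:
            self.alarms.append(("interval-mismatch", ccm.mep_id))
        # Saving the entry with the current time also restarts its timer.
        self.ccm_db[ccm.mep_id] = (ccm, now)

    def timed_out(self, now: float) -> list:
        """Remote MEP IDs not heard from within 3.5 x the sending interval."""
        limit = 3.5 * self.interval_s
        return sorted(mep_id for mep_id, (_, seen) in self.ccm_db.items()
                      if now - seen > limit)
```

With a 1-second interval, a remote MEP last heard at time 0 is still considered alive at 3.0 seconds and is reported as timed out at 3.6 seconds.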

Loopback Check

The loopback (LB) check function is used to check the connection status with a remote device. It is suitable for detecting bidirectional connectivity failures. The LB function is shown in Figure 17-5.

Loopback check

Through the network management system, a command is run on a MEP to actively send a Loop Back Message (LBM). The target can be any MP in the MA. For another MEP in the MA, the local MEP learns its MAC address from CCMs; for a MIP, the local MEP learns its MAC address by sending a Link Trace Message (LTM).

Each LBM has a unique serial number. After an LBM is sent, its serial number is retained for at least 5 seconds, so that the MEP can tell whether a received Loop Back Reply (LBR) is a valid reply to that LBM.

When CC detects a network connectivity error, the network administrator uses a command to send LBMs to track down the error. When an MP receives an LBM, it first checks the validity of the packet (for example, the source address must be a unicast address) and then replies with an LBR to the source MEP. The LBR is built from the LBM by swapping the source and destination addresses and changing the packet type to LBR; the remaining contents are the same as those of the LBM.

When a MEP receives an LBR, it checks whether the serial number matches that of the latest LBM; a mismatch indicates an error. If a MIP receives an LBR, it regards the LBR as an error packet and drops it.
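The LBM/LBR exchange can be sketched in Python as follows. The responder builds the LBR by swapping addresses and changing the type, and the initiator checks the reply's serial number against the numbers it is still holding, which is one way to combine the 5-second retention with the serial-number check. The OpCode values (LBR = 2, LBM = 3) are taken from IEEE 802.1ag; the class and function names and the string MAC format are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Optional

OPCODE_LBR = 2  # IEEE 802.1ag OpCode values
OPCODE_LBM = 3


@dataclass(frozen=True)
class LbFrame:
    dst_mac: str
    src_mac: str
    opcode: int
    seq: int            # serial (transaction) number of the LBM
    payload: bytes = b""


def is_unicast(mac: str) -> bool:
    # The least significant bit of the first octet is the group (multicast) bit.
    return (int(mac[:2], 16) & 0x01) == 0


def make_lbr(lbm: LbFrame, my_mac: str) -> Optional[LbFrame]:
    """Reply to a valid LBM addressed to this MP; return None to drop it."""
    if lbm.opcode != OPCODE_LBM or lbm.dst_mac != my_mac:
        return None
    if not is_unicast(lbm.src_mac):
        return None
    # Swap the addresses and change the type to LBR; contents stay the same.
    return replace(lbm, dst_mac=lbm.src_mac, src_mac=lbm.dst_mac,
                   opcode=OPCODE_LBR)


class LoopbackInitiator:
    HOLD_S = 5.0  # keep each serial number for (at least) 5 seconds

    def __init__(self, my_mac: str):
        self.my_mac = my_mac
        self.next_seq = 0
        self.pending = {}  # serial number -> time the LBM was sent

    def send_lbm(self, target_mac: str, now: float) -> LbFrame:
        seq = self.next_seq
        self.next_seq += 1
        self.pending[seq] = now
        return LbFrame(target_mac, self.my_mac, OPCODE_LBM, seq)

    def accept_lbr(self, lbr: LbFrame, now: float) -> bool:
        sent = self.pending.get(lbr.seq)
        return (lbr.opcode == OPCODE_LBR and lbr.dst_mac == self.my_mac
                and sent is not None and now - sent <= self.HOLD_S)
```

An LBR that arrives 1 second after its LBM is accepted; the same reply arriving after 6 seconds is rejected because its serial number is no longer held.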

Link Trace Function

The link trace (LT) function is used to discover neighbor relationships and locate faults. The LT function is shown in Figure 17-6.

Link trace function

LTM is a multicast packet. Its multicast address is shown in Table 1-2.

Multicast address of link trace packet (LTM)

Destination address: 01-80-C2-00-00-3y

MD level of LTM    Last four address bits "y"
7                  F
6                  E
5                  D
4                  C
3                  B
2                  A
1                  9
0                  8
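For LTMs the last hexadecimal digit is offset by 8 from the MD level (levels 0-7 map to 8-F). A minimal Python helper with a hypothetical name:

```python
def ltm_multicast_mac(md_level: int) -> str:
    """LTM destination address 01-80-C2-00-00-3y, where y = 8 + MD level."""
    if not 0 <= md_level <= 7:
        raise ValueError("MD level must be in the range 0-7")
    return f"01-80-C2-00-00-3{8 + md_level:X}"
```

Because CCMs use 0-7 and LTMs use 8-F in the same address block, a receiver can tell the two packet classes and their level apart from the destination address alone.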

The TLV of an LTM contains an original address (Original MAC) and a target address (Target MAC). The original address is the MAC address of the port on which the MEP that sends the LTM is located; the target address is the MAC address of the target MEP to which the LTM is sent. These two addresses are distinct from the destination and source addresses of the Ethernet frame. Each LTM carries a unique serial number, which is incremented by one for every LTM sent.

Each MP at the same level as the target sends an LTR packet to the original address. The LTR is a unicast packet whose source address is equal to the target address of the LTM and whose destination address is equal to the original address in the TLV of the LTM.

When an FNG alarm appears, send LTM packets to track and locate the faulty link. A MEP sends an LTM, and each MIP decides whether to accept the LTM according to the level of the maintenance domain. On receiving an LTM, the MIP first checks whether its TTL value is 0. If so, the MIP drops the packet. Otherwise, it decrements the TTL by one and looks up the egress port in the FDB table using the target address and VLAN ID of the LTM. If no egress port is found in the FDB table, the LTM is dropped. When the LTM is forwarded, all information except the source MAC address and the TTL value remains unchanged. The MIP on the port replies with an LTR packet to the source MEP after a random delay. When the network fails, the LTM can reach only the MPs before the faulty point; the MPs between the faulty point and the target MEP do not reply with LTRs. In this way, the faulty area can be found.
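The per-MIP handling of an LTM (TTL check and decrement, FDB lookup by target MAC and VLAN, forwarding with only the source MAC and TTL changed, and an LTR back to the original address) can be sketched in Python. This is a simplified model under stated assumptions: the FDB is a hypothetical dictionary keyed by (VLAN, MAC), LTMs at other levels are ignored, the random reply delay is omitted, and the `replier` field of the LTR is an illustrative addition.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Ltm:
    md_level: int
    ttl: int
    seq: int
    original_mac: str  # port MAC of the MEP that sent the LTM
    target_mac: str    # MAC of the target MEP
    vlan: int
    src_mac: str       # Ethernet source address, rewritten at each hop


@dataclass(frozen=True)
class Ltr:
    dst_mac: str   # the LTM's original address
    seq: int       # copied from the LTM
    ttl: int       # TTL after this hop's decrement
    replier: str   # hypothetical field: which MP replied


def mip_process_ltm(ltm: Ltm, mip_level: int, mip_mac: str,
                    fdb: Dict[Tuple[int, str], str]
                    ) -> Tuple[Optional[Tuple[str, Ltm]], Optional[Ltr]]:
    """Return ((egress_port, forwarded_ltm), ltr), or (None, None) if dropped."""
    if ltm.md_level != mip_level:
        return None, None  # simplified: other levels are not handled here
    if ltm.ttl == 0:
        return None, None
    ttl = ltm.ttl - 1
    egress = fdb.get((ltm.vlan, ltm.target_mac))
    if egress is None:
        return None, None  # no FDB entry for the target: drop the LTM
    # Only the source MAC and the TTL change when the LTM is forwarded.
    forwarded = replace(ltm, ttl=ttl, src_mac=mip_mac)
    # A real MIP sends the LTR after a random delay; omitted here.
    reply = Ltr(dst_mac=ltm.original_mac, seq=ltm.seq, ttl=ttl, replier=mip_mac)
    return (egress, forwarded), reply
```

Chaining this function hop by hop shows why the trace stops at the fault: once an FDB lookup fails, no further MIP sees the LTM, so no LTRs arrive from beyond that point.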

CFM Packet

The EtherType of CFM packets is 0x8902. The common header of the CFM packet is shown in Figure 17-7.



Common header of CFM packet

The CCM packet is as shown in Figure 17-8.

CCM packet

The LBM and LBR packets are as shown in Figure 17-9.

LBM and LBR packets

The LTM packet is as shown in Figure 17-10.



LTM packet

The LTR packet is as shown in Figure 17-11.

LTR packet
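Because the header figures are not reproduced here, the Python sketch below takes its field layout from IEEE 802.1ag rather than from this manual: after the EtherType 0x8902, the first octet holds the MD level (upper 3 bits) and version (lower 5 bits), followed by one octet each for OpCode, Flags, and First TLV Offset. It assumes an untagged frame; a VLAN-tagged frame would shift every offset by 4 octets.

```python
import struct
from typing import NamedTuple

CFM_ETHERTYPE = 0x8902
OPCODES = {1: "CCM", 2: "LBR", 3: "LBM", 4: "LTR", 5: "LTM"}  # IEEE 802.1ag


class CfmHeader(NamedTuple):
    md_level: int
    version: int
    opcode: str
    flags: int
    first_tlv_offset: int


def parse_cfm_header(frame: bytes) -> CfmHeader:
    """Parse the common CFM header of an untagged Ethernet frame."""
    if len(frame) < 18:
        raise ValueError("frame too short")
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != CFM_ETHERTYPE:
        raise ValueError(f"not CFM: EtherType 0x{ethertype:04X}")
    b0, opcode, flags, offset = struct.unpack_from("!BBBB", frame, 14)
    return CfmHeader(md_level=b0 >> 5, version=b0 & 0x1F,
                     opcode=OPCODES.get(opcode, f"unknown({opcode})"),
                     flags=flags, first_tlv_offset=offset)
```

The OpCode is what distinguishes the CCM, LBM, LBR, LTM, and LTR formats shown in the figures; everything after the First TLV Offset is OpCode-specific.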

E-LMI Protocol and Application

Main contents:

E-LMI protocol and application

Definition of E-LMI protocol

Relation of E-LMI protocol and 802.1ag

UNI-N of E-LMI

UNI-C of E-LMI

Typical application



Terms of E-LMI Protocol

EVC (Ethernet Virtual Connection): MEF defines an EVC as a port-level point-to-point or multipoint-to-multipoint Layer 2 Ethernet circuit. An EVC is an association of two or more UNIs. The CE can use EVC status information as a basis for routing into the service provider’s network.

UNI (User Network Interface): the physical Ethernet interface between the edge device of the service provider (PE) and the edge device of the customer (CE). It comprises UNI-N (on the PE device) and UNI-C (on the CE device). The E-LMI protocol runs on a UNI, and its endpoints are UNI-N and UNI-C.

CE (Customer Edge): the edge device of the customer;

PE (Provider Edge): the edge device of the service provider;

Introduction to E-LMI Protocol

Modeled on the frame relay local management interface (FR-LMI), MEF defines the Ethernet Local Management Interface (E-LMI). E-LMI is an OAM protocol that runs on the user network interface (UNI) between the customer edge device (CE) and the service provider edge device (PE). E-LMI enables the service provider to configure the CE automatically according to the purchased services. With E-LMI, the CE automatically receives the mapping from specific Ethernet service instances (such as VLAN 100) to EVCs, as well as the bandwidth and QoS settings. Automatic configuration of the CE reduces not only the service deployment work but also the coordination work between the service provider and the enterprise user. As a result, the user does not need to know the configuration of the CE device, which is configured and managed by the service provider, and the risk of manual configuration errors is reduced. In addition, E-LMI provides EVC status information to the CE. Once an EVC fault is detected (for example, by 802.1ag), the PE can notify the CE of the fault so that the CE can adjust in time (for example, by switching the access route).

Definition of E-LMI Protocol

The E-LMI protocol runs on a UNI; its endpoints are UNI-C (on the CE) and UNI-N (on the PE), as shown below.



Typical topology of E-LMI protocol running on one UNI (figure: on each side of the Metro Ethernet Network, a CE connects to a PE through a User Network Interface on which E-LMI runs)

E-LMI Protocol Action

The actions of the E-LMI protocol include CE polling and PE informing.

CE Polling:

The UNI-C device sends E-LMI Check messages (E-LMI Check STATUS ENQUIRY) to the UNI-N device for active polling; the polling interval is T391 (10s by default). After every N391 polls (360 by default), UNI-C sends a full status request message (FULL STATUS ENQUIRY). UNI-N responds by sending the status and configuration information of the UNI and EVCs to UNI-C. UNI-N uses the T392 timer to wait for request messages from UNI-C; the configured value of T392 must be larger than T391.

After receiving a correct response to the Full Status Enquiry, the CE updates the EVC and UNI status and configuration information in its local database according to the information carried in the response, so that the EVC configuration and status information on the CE stays synchronized with that on the PE.
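The polling arithmetic above can be made concrete with a short Python sketch. The function names are hypothetical, and the sketch assumes only what the text states: every N391-th poll is a Full Status enquiry, the others are E-LMI Checks, and T392 must exceed T391. It does not model any other polling behavior the standard may define.

```python
def poll_report_types(n_polls: int, n391: int = 360) -> list:
    """Report Type of each of the first n_polls STATUS ENQUIRY messages.

    Every n391-th poll is a Full Status enquiry; the rest are E-LMI Checks.
    """
    return ["Full Status" if i % n391 == 0 else "E-LMI Check"
            for i in range(1, n_polls + 1)]


def full_status_period_s(t391: float = 10, n391: int = 360) -> float:
    """Time between two Full Status enquiries: T391 x N391."""
    return t391 * n391


def check_timers(t391: float, t392: float) -> None:
    """T392 (UNI-N wait timer) must be larger than T391 (UNI-C poll timer)."""
    if t392 <= t391:
        raise ValueError("T392 must be larger than T391")
```

With the defaults, `full_status_period_s()` gives 10 s x 360 = 3600 s, so UNI-C requests a full status refresh once an hour, while the lightweight E-LMI Checks run every 10 seconds.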

PE Informing:

If the PE finds that the status of an EVC has changed, it immediately sends a Single EVC Asynchronous Status message to inform the CE. The CE updates the EVC status information in its local database according to the information carried in the message, so that the EVC status information on the CE stays synchronized with that on the PE.

E-LMI Message Type

MEF 16 defines two message types for E-LMI protocol interaction: the STATUS ENQUIRY message and the STATUS message.



The content type (Report Type) carried in an E-LMI packet is one of the following four types:

E-LMI Check: the check packet used during normal polling;

Full Status: the full-status packet;

Full Status Continued: the follow-up full-status packet;

Single EVC Asynchronous Status: the active EVC status notification packet, which can be sent only by UNI-N to inform the CE of an EVC status change.

STATUS ENQUIRY Message:

The STATUS ENQUIRY message is sent by UNI-C to ask UNI-N for the configuration and status information of the EVCs and the UNI. After receiving a valid STATUS ENQUIRY message, UNI-N sends a STATUS message in reply.

The structure of the STATUS ENQUIRY message:

Message type: STATUS ENQUIRY
Direction: UNI-C to UNI-N

Information element    Type
Protocol Version       Mandatory
Message Type           Mandatory
Report Type            Mandatory
Sequence Numbers       Mandatory
Data Instance          Mandatory

Structure of STATUS ENQUIRY message

STATUS Message:

The STATUS message is sent by UNI-N to reply to a STATUS ENQUIRY message or to actively inform UNI-C of an EVC status change. The contents of a STATUS message depend on its Report Type, as follows.

STATUS message

Information element \ Report Type   Full Status   E-LMI Check   Single EVC Asynchronous Status   Full Status Continued
Sequence Numbers                    X             X                                              X
Data Instance                       X             X                                              X
UNI Status                          X
EVC Status                          X                           X                                X
CE-VLAN ID/EVC Map                  X                                                            X
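The content rules in the STATUS message table can be encoded as a lookup so that a received message can be checked for information elements its Report Type should not carry. This Python sketch assumes the column placement shown in the table above (Sequence Numbers and Data Instance absent from Single EVC Asynchronous Status, for example); the common elements such as Protocol Version, Message Type, and Report Type are not covered, and the names are hypothetical.

```python
STATUS_IES = {
    "Full Status": {"Sequence Numbers", "Data Instance", "UNI Status",
                    "EVC Status", "CE-VLAN ID/EVC Map"},
    "E-LMI Check": {"Sequence Numbers", "Data Instance"},
    "Single EVC Asynchronous Status": {"EVC Status"},
    "Full Status Continued": {"Sequence Numbers", "Data Instance",
                              "EVC Status", "CE-VLAN ID/EVC Map"},
}


def unexpected_ies(report_type: str, ies) -> list:
    """IEs (from the table's rows) that the given Report Type should not carry."""
    return sorted(set(ies) - STATUS_IES[report_type])
```

For example, an E-LMI Check reply that carries a UNI Status element would be flagged, since only Full Status messages carry UNI Status.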

E-LMI Message Frame Encapsulation Format

Destination Address   Source Address   E-LMI Ethertype   E-LMI PDU (message)            CRC
6 octets              6 octets         2 octets          46-1500 octets (Data + Pad)    4 octets

E-LMI message encapsulation frame format

In MEF 16, the destination address of E-LMI messages is defined as 01-80-C2-00-00-07 and the E-LMI EtherType as 0x88EE. The PDU consists of a series of TLVs. For details, refer to the MEF 16 standard.
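The encapsulation above can be sketched as a frame builder in Python. It uses only the values stated in the text (destination 01-80-C2-00-00-07, EtherType 0x88EE, 46-1500 octet PDU field); padding short PDUs with zero octets is an assumption of this sketch, the CRC is left to the MAC hardware, and the PDU bytes are treated as opaque.

```python
import struct

ELMI_DST_MAC = bytes.fromhex("0180C2000007")  # 01-80-C2-00-00-07
ELMI_ETHERTYPE = 0x88EE
MIN_PDU, MAX_PDU = 46, 1500


def build_elmi_frame(src_mac: bytes, pdu: bytes) -> bytes:
    """Destination + source + EtherType + PDU padded to 46 octets.

    The 4-octet CRC is normally appended by the MAC hardware and is not
    included. Zero padding is an assumption of this sketch.
    """
    if len(src_mac) != 6:
        raise ValueError("source MAC must be 6 octets")
    if len(pdu) > MAX_PDU:
        raise ValueError("E-LMI PDU longer than 1500 octets")
    return (ELMI_DST_MAC + src_mac + struct.pack("!H", ELMI_ETHERTYPE)
            + pdu.ljust(MIN_PDU, b"\x00"))
```

A 2-octet PDU therefore produces a 60-octet frame (14 header octets plus 46 padded PDU octets) before the CRC is added.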

Relation between E-LMI Protocol and 802.1ag

The E-LMI protocol runs on the UNI connection between the PE and the CE. It obtains the EVC and UNI configuration and status information from the UNI-N end to complete the automatic configuration of the CE at the UNI-C end. However, the E-LMI module at the UNI-N end cannot obtain EVC status information by itself; it relies on the CC (Continuity Check) function of the 802.1ag protocol (CFM module) to check the connectivity between the UNIs of an EVC and thereby determine the current operating status of the EVC.

UNI-N End of E-LMI

EVC, UNI, and CFM must be configured on the UNI-N end. A defined EVC must be applied to a UNI. On a UNI, an EVC is identified by its EVC Reference ID, and different EVCs map to different CE-VLAN IDs. The number of EVCs that can be bound depends on the UNI type.



CFM

Refer to the configurations and technology description of the 802.1ag

protocol.

EVC

EVCs must be defined on UNI-N. There are two EVC types: point-to-point and multipoint-to-multipoint. A point-to-point EVC comprises exactly two UNIs; a multipoint-to-multipoint EVC comprises two or more UNIs.

Each EVC must be bound to a CFM management domain instance, through which the connectivity between the UNIs in the EVC is obtained.

UNI

UNI has the following three types:

Bundling: multiple EVCs can be configured on one UNI, and one EVC can map to multiple CE-VLAN IDs;

Service Multiplexing with no Bundling: multiple EVCs can be configured on one UNI, but each EVC can map to only one CE-VLAN ID;

All to One Bundling: one UNI can be bound to only one EVC, and all CE-VLAN IDs map to that EVC.
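The three UNI types impose different limits on the EVC to CE-VLAN ID mapping. The Python sketch below checks a mapping against those limits; the type keys are hypothetical labels, and the final rule (a CE-VLAN ID maps to at most one EVC on a UNI) is an added assumption that is not stated in the list above.

```python
def check_uni_mapping(uni_type: str, evc_map: dict) -> list:
    """Check an EVC -> CE-VLAN ID mapping against the UNI type rules.

    uni_type is one of the hypothetical keys "bundling",
    "service-multiplexing" or "all-to-one"; evc_map maps EVC names to sets
    of CE-VLAN IDs.
    """
    errors = []
    if uni_type == "service-multiplexing":
        for evc in sorted(evc_map):
            if len(evc_map[evc]) != 1:
                errors.append(f"{evc}: must map exactly one CE-VLAN ID")
    elif uni_type == "all-to-one":
        if len(evc_map) != 1:
            errors.append("all-to-one bundling allows only one EVC")
    elif uni_type != "bundling":
        raise ValueError(f"unknown UNI type: {uni_type}")
    # Assumption (not stated above): a CE-VLAN ID maps to at most one EVC.
    owner = {}
    for evc in sorted(evc_map):
        for vid in sorted(evc_map[evc]):
            if vid in owner:
                errors.append(f"CE-VLAN ID {vid} mapped to more than one EVC")
            else:
                owner[vid] = evc
    return errors
```

Bundling is the most permissive type in this sketch; the other two restrict either the VLANs per EVC or the number of EVCs.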

The port at the UNI-N end must be configured as a MEP of a CFM domain, with the CFM CC function enabled. In this way, the UNI-N end obtains, via the 802.1ag protocol, the connection status between the UNIs of the EVCs configured on the PE device, and hence the current operating status of those EVCs.

Enable PE Mode of E-LMI Protocol:

After the PE mode of the E-LMI protocol is enabled on UNI-N, UNI-N waits for requests from UNI-C and responds to them. When UNI-N finds that the status of an EVC bound to the UNI has changed, it actively sends an EVC status notification message to the CE.



UNI-C of E-LMI

On the UNI-C end, it is only necessary to enable the E-LMI protocol in CE mode. In CE mode, UNI-C periodically sends E-LMI Check requests to UNI-N. When an E-LMI Check exchange shows that the Data Instance value no longer matches the locally stored value, UNI-C initiates a Full Status request to obtain the EVC and UNI configuration and status information from UNI-N, and then updates its local information.

Typical Applications

The following is a typical application of E-LMI.

Topology of typical E-LMI application

In the above figure, an EVC named EVC_Provider is defined to represent the network connection of the service provider. It comprises PE1, PE2, and PE3. The blue ellipse represents a CFM management domain, Service Provider Domain, whose level is 4. The three edge devices are configured as three MEPs of this domain. The CFM management domain checks the connectivity among the three MEPs to determine the operating status of EVC_Provider.

Enable the E-LMI protocol on UNI1, the UNI connection between CE1 and PE1. CE1 obtains the UNI1 configuration information and the EVC_Provider configuration and status information from PE1 via the E-LMI protocol, completing the automatic configuration of CE1.



Ethernet OAM Protocol and Application

Main contents:

Ethernet OAM protocol and related terms

Introduction to Ethernet OAM protocol

Related Terms of Ethernet OAM Protocol

OAM: Operations, Administration and Maintenance

Errored symbol: the number of symbol errors detected on the port

Errored frame: the number of errored frames received

Introduction to Ethernet OAM Protocol

Ethernet OAM is a Layer 2 protocol for monitoring and troubleshooting networks. It reports network status at the data link layer so that the network administrator can manage the network more efficiently. Ethernet OAM is defined in IEEE 802.3ah.

Currently, Ethernet OAM mainly addresses OAM for Ethernet devices in the last mile, including link performance monitoring, fault detection and alarming, loopback testing, and remote MIB variable requests. All Ethernet OAM functions take effect only after the Ethernet OAM connection is set up.

The main functions of the Ethernet OAM are as follows:

1. Discovery and setup of Ethernet OAM connection

2. Link monitoring of Ethernet OAM connection

3. Remote fault diagnosis of Ethernet OAM connection

4. Remote loopback of Ethernet OAM connection

5. MIB variable request of Ethernet OAM connection



Location of Protocol in System

Location of Ethernet OAM in the system

As shown in the above figure, the Ethernet OAM is located between MAC

Control layer and the LLC layer.

Protocol Structure

Structure of Ethernet OAM protocol

As shown in the above figure, Ethernet OAM comprises the OAM sublayer

and OAM client.

The OAM sublayer handles traffic classification and remote loopback policy processing for the packets sent and received on the interface; the OAM client handles connection maintenance and remote loopback control for the protocol.



Structure of OAM sublayer

As shown in the above figure, the OAM sublayer comprises Multiplexer,

Parser, and Control.

The Multiplexer performs OAM processing in the transmit direction for all packets on the interface (including service data packets). It has two modes: Forward mode (send all packets normally) and Discard mode (discard all non-Ethernet-OAM packets).

The Parser performs OAM processing in the receive direction for all packets on the interface (including service data packets). It has three modes: Forward mode (receive all packets normally), Loopback mode (loop back all non-Ethernet-OAM packets), and Discard mode (discard all non-Ethernet-OAM packets).

The Control function sends and receives Ethernet OAM protocol packets.
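The Multiplexer and Parser modes can be expressed as two small decision functions. This Python sketch is illustrative only: it assumes that OAMPDUs themselves always pass to or from the Control function regardless of mode (the modes above act only on non-OAM packets), and the action strings are hypothetical labels.

```python
from enum import Enum


class MuxMode(Enum):
    FORWARD = "forward"
    DISCARD = "discard"


class ParserMode(Enum):
    FORWARD = "forward"
    LOOPBACK = "loopback"
    DISCARD = "discard"


def multiplexer_tx(mode: MuxMode, is_oampdu: bool) -> bool:
    """True if the packet is transmitted in the given Multiplexer mode."""
    # Discard mode drops only non-OAM packets; OAMPDUs still go out.
    return is_oampdu or mode is MuxMode.FORWARD


def parser_rx(mode: ParserMode, is_oampdu: bool) -> str:
    """Action the Parser takes on a received packet."""
    if is_oampdu:
        return "to-control"   # OAMPDUs are handed to the Control function
    if mode is ParserMode.FORWARD:
        return "deliver"      # pass up normally
    if mode is ParserMode.LOOPBACK:
        return "loopback"     # send back out of the same interface
    return "discard"
```

Keeping OAMPDUs outside the mode checks is what lets the OAM connection stay alive while service traffic is being looped back or discarded.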



Basic Format of OAM Protocol Packet

Basic format of Ethernet OAM packet

As shown in the above figure, the destination address of Ethernet OAM packets is 01-80-C2-00-00-02. OAM packets belong to the slow protocols (EtherType 0x8809), with subtype 0x03.

Flags identifies the status of Ethernet OAM;

Code identifies the type of the Ethernet OAM packet;

Data/Pad is the data content of the Ethernet OAM packet, which varies with Code.
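The basic packet format can be parsed with a few lines of Python. The destination address, EtherType, and subtype come from the text above; the field widths (2-octet Flags, 1-octet Code) and the Code values are taken from IEEE 802.3ah Clause 57, since the figure is not reproduced here. The sketch assumes an untagged frame and hypothetical names.

```python
import struct

SLOW_PROTOCOLS_DA = bytes.fromhex("0180C2000002")  # 01-80-C2-00-00-02
SLOW_PROTOCOLS_ETHERTYPE = 0x8809
OAM_SUBTYPE = 0x03
OAM_CODES = {0x00: "Information", 0x01: "Event Notification",
             0x02: "Variable Request", 0x03: "Variable Response",
             0x04: "Loopback Control", 0xFE: "Organization Specific"}


def parse_oampdu(frame: bytes):
    """Return (flags, code name, data) for an untagged OAMPDU frame."""
    if len(frame) < 18:
        raise ValueError("frame too short")
    if frame[:6] != SLOW_PROTOCOLS_DA:
        raise ValueError("not addressed to the slow protocols address")
    # EtherType (2), Subtype (1), Flags (2), Code (1), starting at offset 12.
    ethertype, subtype, flags, code = struct.unpack_from("!HBHB", frame, 12)
    if ethertype != SLOW_PROTOCOLS_ETHERTYPE or subtype != OAM_SUBTYPE:
        raise ValueError("not an Ethernet OAM packet")
    return flags, OAM_CODES.get(code, f"reserved(0x{code:02X})"), frame[18:]
```

The Code value selects which of the OAMPDU formats described next applies to the Data/Pad field.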

Information OAMPDU:

The Information OAMPDU is used to send the status information of the OAM entity (local information, remote information, and organization-specific information) to the remote OAM entity and to keep the OAM connection alive. The packet format is as follows:

Information OAMPDU packet format



Event Notification OAMPDU:

The Event Notification OAMPDU is used for link monitoring, notifying the remote OAM entity of link faults. The packet format is as follows.

Event Notification OAMPDU packet format

Variable Request OAMPDU

The Variable Request OAMPDU is sent to request MIB variables. The packet format is as follows:

Variable Request OAMPDU packet format

Variable Response OAMPDU

The Variable Response OAMPDU is sent in response to a MIB variable request. The packet format is as follows:



Variable Response OAMPDU packet format

Loopback Control OAMPDU

The Loopback Control OAMPDU is used for remote loopback control; support for it is optional. To control loopback, the local DTE sends a loopback control command to the remote DTE. If the loopback function of the remote DTE is enabled, the packets sent to it are returned to the sender. The packet format is as follows.

Loopback Control OAMPDU packet format

Discovery and Setup of OAM Connection

The OAM connection is set up during OAM Discovery. While the connection is being set up, the connected devices exchange their OAM configuration information and advertise the OAM capabilities supported by the local node. Other OAM functions can be performed only after the OAM connection is set up.

Active Mode and Passive Mode



The device can select Active mode or Passive mode to set up the OAM

connection. The DTE (Data Terminating Entity) processing capabilities in

active mode and passive mode are as follows.

Comparison of DTE processing capabilities in active mode and passive mode

Processing capability                            DTE in active mode    DTE in passive mode
Initiate OAM Discovery                           Yes                   No
Respond to OAM Discovery                         Yes                   Yes
Required to send Information OAMPDUs             Yes                   Yes
Allowed to send Event Notification OAMPDUs       Yes                   Yes
Allowed to send Variable Request OAMPDUs         Yes                   No
Allowed to send Variable Response OAMPDUs        Yes (1)               Yes
Allowed to send Loopback Control OAMPDUs         Yes (1)               No
Respond to Loopback Control OAMPDUs              Yes                   Yes
Allowed to send Organization Specific OAMPDUs    Yes                   Yes

(1) The peer DTE must also be in active mode.

State Transitions and Triggering Events of the Connection

State transitions of the connection

The above figure shows the state transitions of the Ethernet OAM connection. In addition to the transitions shown in the figure, there are several special transitions:

1. When the connection timeout timer expires, every state returns to Active or Passive.

2. When the port goes down or the OAM function is disabled, every state returns to Fault.

OAM Connection States

States of the connection

State          Description
Fault          Ethernet OAM is not running.
Active         Active state: periodically sends Information OAMPDUs carrying the Local Information TLV to discover the connection.
Passive        Passive state: waits for Information OAMPDUs carrying the Local Information TLV in order to accept the connection.
Discovered     Connection discovered: periodically sends Information OAMPDUs carrying the Local Information TLV and Remote Information TLV to negotiate the connection, and starts the connection timeout timer.
Local-stable   Local attribute matching has passed: periodically sends Information OAMPDUs carrying the Local Information TLV and Remote Information TLV to negotiate the connection, and runs the connection timeout timer.
Up             The connection is set up: periodically sends Information OAMPDUs carrying the Local Information TLV and Remote Information TLV to keep the connection alive, and runs the connection timeout timer.

Events Triggering OAM Connection State Transitions

Events that trigger OAM connection state transitions

Event | Description
Ethernet OAM port UP | The Ethernet OAM port goes up.
Ethernet OAM port DOWN | The Ethernet OAM port goes down, including the port going down and the Ethernet OAM function being shut down.
Information OAMPDU received | An Information OAMPDU is received.
Local attribute matching passed | The local attributes are matched against the received Information OAMPDU, and the matching passes.
Local attribute matching failed | The local attributes are matched against the received Information OAMPDU, and the matching fails.
Remote attribute matching passed | The Flags field of the received Information OAMPDU indicates that the remote attribute matching has passed.
Remote attribute matching failed | The Flags field of the received Information OAMPDU indicates that the remote attribute matching has not passed.
Connection timeout | The connection timeout timer expires, and the connection becomes invalid.
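The state and event tables can be combined into a transition function. Because the original state diagram is not available here, the ordinary transitions below are an assumption modeled on the IEEE 802.3 Clause 57 Discovery process. Only the two special transitions (timeout returns to Active/Passive, port down returns to Fault) come directly from the text. All names are illustrative.

```python
from enum import Enum, auto


class State(Enum):
    FAULT = auto()
    ACTIVE = auto()
    PASSIVE = auto()
    DISCOVERED = auto()
    LOCAL_STABLE = auto()
    UP = auto()


class Event(Enum):
    PORT_UP = auto()
    PORT_DOWN = auto()
    INFO_RECEIVED = auto()
    LOCAL_MATCH_PASSED = auto()
    LOCAL_MATCH_FAILED = auto()
    REMOTE_MATCH_PASSED = auto()
    REMOTE_MATCH_FAILED = auto()
    TIMEOUT = auto()


# Ordinary transitions (assumed, modeled on the IEEE 802.3 Clause 57 Discovery process).
TRANSITIONS = {
    (State.ACTIVE, Event.INFO_RECEIVED): State.DISCOVERED,
    (State.PASSIVE, Event.INFO_RECEIVED): State.DISCOVERED,
    (State.DISCOVERED, Event.LOCAL_MATCH_PASSED): State.LOCAL_STABLE,
    (State.LOCAL_STABLE, Event.LOCAL_MATCH_FAILED): State.DISCOVERED,
    (State.LOCAL_STABLE, Event.REMOTE_MATCH_PASSED): State.UP,
    (State.UP, Event.REMOTE_MATCH_FAILED): State.LOCAL_STABLE,
    (State.UP, Event.LOCAL_MATCH_FAILED): State.DISCOVERED,
}


def next_state(state: State, event: Event, active_mode: bool = True) -> State:
    """Return the connection state after `event`; unknown pairs keep the state."""
    start = State.ACTIVE if active_mode else State.PASSIVE
    # Special transitions stated in the text apply from every state.
    if event is Event.PORT_DOWN:
        return State.FAULT
    if event is Event.TIMEOUT and state is not State.FAULT:
        return start
    if state is State.FAULT and event is Event.PORT_UP:
        return start
    return TRANSITIONS.get((state, event), state)
```

Keeping the special transitions ahead of the table lookup mirrors the text. Those two rules apply from every state, so they do not need a separate table entry for each state.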

Serious Link Events of OAM Connection

When a serious link event occurs on the link, the device sets the corresponding link status bit in the Flags field of the Ethernet OAM packet header and notifies the connected peer through an Event Notification OAMPDU.



The serious link event types of the Ethernet OAM connection are defined as follows:

Serious link events of the Ethernet OAM connection

Event | Definition
Link fault | The hardware PHY detects a link fault in the receive direction.
Dying gasp | An unrecoverable fault occurs on the local device, for example, Ethernet OAM goes down.
Critical event | An unpredictable serious event occurs (currently not defined).
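The three serious link events are carried as bits in the 2-octet Flags field of the OAMPDU header. The sketch below uses the bit positions defined in IEEE 802.3 Clause 57. The same field also carries the local/remote evaluating and stable bits that the "remote attribute matching" events read. The helper names are hypothetical.

```python
from enum import IntFlag


class OamFlags(IntFlag):
    """Bits of the 2-octet OAMPDU Flags field (IEEE 802.3 Clause 57)."""
    LINK_FAULT = 0x0001
    DYING_GASP = 0x0002
    CRITICAL_EVENT = 0x0004
    LOCAL_EVALUATING = 0x0008
    LOCAL_STABLE = 0x0010
    REMOTE_EVALUATING = 0x0020
    REMOTE_STABLE = 0x0040


SERIOUS_EVENTS = OamFlags.LINK_FAULT | OamFlags.DYING_GASP | OamFlags.CRITICAL_EVENT


def encode_flags(flags: OamFlags) -> bytes:
    """Serialize the Flags field in network byte order."""
    return int(flags).to_bytes(2, "big")


def serious_events(raw: bytes) -> OamFlags:
    """Extract only the serious link event bits from a received Flags field."""
    return OamFlags(int.from_bytes(raw, "big")) & SERIOUS_EVENTS
```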

Link Monitoring of OAM Connection

Ethernet OAM periodically monitors errored symbols and errored frames on the link. When the error count exceeds the configured threshold, the device performs the specified action (such as shutting down the port) and notifies the connected peer through an Event Notification OAMPDU.

The link monitoring events of the Ethernet OAM connection are defined as follows:

Link monitoring events of the Ethernet OAM connection

Event | Definition
Errored Symbol Period | The number of errored symbols within a window of a given number of symbols exceeds the configured threshold.
Errored Frame | The number of errored frames within a given time window exceeds the configured threshold.
Errored Frame Period | The number of errored frames within a window of a given number of frames exceeds the configured threshold.
Errored Frame Seconds Summary | The number of errored frame seconds within a given time window exceeds the configured threshold.
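As an illustration of the threshold check, the following sketch models the Errored Frame event. It counts errored frames inside a time window and fires when the count exceeds the threshold, as the text states. A sliding window is used for simplicity, which is an assumption of this sketch; the class name and parameters are hypothetical. The other three events differ only in what is counted and how the window is measured (symbols, frames, or errored seconds).

```python
from collections import deque


class ErroredFrameMonitor:
    """Hypothetical Errored Frame monitor: errors counted over a sliding time window."""

    def __init__(self, window_s: float, threshold: int):
        self.window_s = window_s      # length of the unit time period, in seconds
        self.threshold = threshold    # event fires when the count exceeds this value
        self._errors = deque()        # timestamps of errored frames, oldest first

    def record_error(self, timestamp: float) -> None:
        self._errors.append(timestamp)

    def event_triggered(self, now: float) -> bool:
        # Drop errors that fell out of the window (now - window_s, now].
        while self._errors and self._errors[0] <= now - self.window_s:
            self._errors.popleft()
        return len(self._errors) > self.threshold
```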

Remote Loopback of OAM Connection

After the OAM connection is set up, the local device can send a Loopback Control OAMPDU to put the peer into remote loopback test mode. During the test, the peer loops back the packets sent by the local device, so that link parameters such as packet loss rate and delay can be measured.

Remote loopback test mode affects only non-OAM protocol packets. Ethernet OAM protocol packets are still sent and received normally.

In remote loopback test mode, the OAM sublayer processes packets as follows:

Port status in remote loopback mode

Port role | Multiplexer mode | Parser mode | Description
Master (initiates the loopback) | Forward | Discard | Enters this mode on receiving an Information OAMPDU indicating that the peer is in the loopback state.
Slave (loops packets back) | Discard | Loopback | Enters this mode on receiving a Loopback Control OAMPDU that carries the enable-loopback command.

In remote loopback test mode, non-OAM protocol packets are processed as follows:

Figure: Processing of non-OAM protocol packets in remote loopback test mode
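The multiplexer and parser modes in the port status table translate directly into per-frame actions. The sketch below assumes a NORMAL role in which both directions simply forward, and it represents actions as plain strings. Both choices are illustrative. OAMPDUs bypass the loopback modes, as the text states.

```python
from enum import Enum


class Role(Enum):
    NORMAL = "normal"
    MASTER = "master"  # local DTE that initiated the remote loopback
    SLAVE = "slave"    # peer DTE that loops frames back


# (multiplexer action, parser action) for each role; NORMAL is assumed.
MODES = {
    Role.NORMAL: ("forward", "forward"),
    Role.MASTER: ("forward", "discard"),
    Role.SLAVE: ("discard", "loopback"),
}


def on_transmit(role: Role, is_oampdu: bool) -> str:
    """Action for a frame the local MAC client wants to send (multiplexer)."""
    return "forward" if is_oampdu else MODES[role][0]


def on_receive(role: Role, is_oampdu: bool) -> str:
    """Action for a frame received from the link (parser)."""
    return "to_oam" if is_oampdu else MODES[role][1]
```

On the master, test traffic goes out normally and any returned data frames are discarded at the parser. On the slave, local data traffic is blocked and received data frames are looped back.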

MIB Variable Request of OAM Connection

The local OAM entity can send a Variable Request OAMPDU to the peer OAM entity to query the current values of remote MIB variables. This function can be used to monitor the link status of the remote port in real time.



EVC Technology

This chapter describes the EVC technology and application.

Main contents:

Related terms

Application description

Typical application

Related Terms

This section describes the terms related to EVC.

EVC (Ethernet Virtual Connection): a concept defined by MEF. An EVC is a virtual connection that associates two or more UNIs and carries Ethernet service frames between them.

EVCs are divided into three types according to the connection mode:

1. Point-to-point EVC, also called E-Line service. There are two kinds:

EPL: Ethernet Private Line

EVPL: Ethernet Virtual Private Line

The difference is that one UNI can carry multiple EVPLs but only one EPL.

2. Multipoint-to-multipoint EVC, also called E-LAN service.

3. Point-to-multipoint EVC: a special EVC in which one side is called the root and the other side the leaf. The EVC consists of one or more roots (usually one) and one or more leaves. Its main feature is that frames from a root are copied to all leaves, while frames from a leaf are sent only to the root and are never copied between leaves. Its main application is IPTV. Currently, Maipu switches do not support this kind of EVC directly, but can support it indirectly by configuring port isolation and L3 forwarding between UNIs.
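The forwarding rule of a point-to-multipoint EVC can be written as a destination function. In this sketch, delivering root traffic to other roots as well is an assumption, and the function name is hypothetical. The leaf rule (root only, never another leaf) is the behavior that port isolation between UNIs approximates.

```python
def etree_destinations(source: str, roots: set, leaves: set) -> set:
    """Return the UNIs that receive a frame entering the EVC at `source`."""
    if source in roots:
        # Root frames are copied to all leaves; delivering to other roots
        # as well is an assumption of this sketch.
        return (roots | leaves) - {source}
    if source in leaves:
        # Leaf frames go only to the root(s), never to other leaves.
        return set(roots)
    raise ValueError(f"{source!r} is not a UNI of this EVC")
```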

UNI (User Network Interface): the physical Ethernet connection between the service provider's network edge device (PE) and the customer edge device (CE). It consists of UNI-N (on the PE device) and UNI-C (on the CE device). The E-LMI protocol runs over a UNI, between UNI-C and UNI-N.

Currently, UNI supports three types of attributes:

Multiplexing with bundling: one UNI can be configured with multiple EVCs, and each EVC can be mapped to multiple CE-VLAN IDs;

Multiplexing without bundling: one UNI can be configured with multiple EVCs, but each EVC can be mapped to only one CE-VLAN ID;

All-to-one bundling: one UNI can be configured with only one EVC, and all CE-VLAN IDs are mapped to that EVC.

The port at the UNI-N end must be configured as a MEP of a CFM domain, with the CFM CC (Continuity Check) function enabled. This allows the UNI-N end to learn, through 802.1ag, the connection status between the UNI ends of the EVCs configured on the PE device, and therefore the current operational status of each EVC.

CE (Customer Edge): customer edge device

PE (Provider Edge): edge device of service provider

EFP (Ethernet Flow Point): an Ethernet service instance

QinQ, E-LMI, and CFM: refer to the related technical manuals.

Application Description

EVC provides common attributes and configuration and works with other modules to implement service functions. For details, refer to the EVC Configuration Manual. The main attributes of EVC are described as follows:

Implementation type of EVC: EVC can be implemented in several ways. Currently, Maipu switches support QinQ.

EVC type: There are two types, point-to-point and multipoint-to-multipoint. A point-to-point EVC contains only two UNI ports, while a multipoint-to-multipoint EVC contains multiple UNI ports, as shown below:

Figure 18.1 Point-to-point EVC

Figure 18.2 Multipoint-to-multipoint EVC

Local MEP and remote MEP of EVC: a MEP is an end point that maintains the connection and can send and receive any CFM packet. Each MEP is identified by an integer called the MEP ID.

QinQ type: There are two kinds, double and mapping. Double maps multiple CE-VLANs to a single EVC, while mapping maps only one CE-VLAN to a single EVC.

QinQ mode: There are two kinds, one and multiple. In one mode, the SVLAN and CE-VLAN of the EVC do not need to be configured, and the port default values are used. Multiple mode has no such limitation.

EVC works with related modules as follows:

1. Combining EVC with QinQ (for QinQ, refer to the QinQ Technical Manual):



Associate the EVC with a local port and run QinQ on the port to set up the EVC connection. When an EVC is bound to a port, the QinQ information of the EVC is looked up by EVC ID and converted into the port configuration. The UNI mapping type of the port must match the information in the bound EVC. The matching rules are as follows:

An EVC can be bound only to Hybrid and Trunk ports, not to Access ports.

If the UNI mapping of the port is ALL-TO-ONE, the port can be bound to only one EVC, and all CE-VLANs are mapped to that EVC.

If the UNI mapping of the port is BUNDLING, the port can be bound to multiple EVCs, and each EVC can be configured with multiple CE-VLANs. The CE-VLANs of the bound EVCs must not overlap, and their SVLANs must not conflict with each other.

If the UNI mapping of the port is MULTIPLEXING, the port can be bound to multiple EVCs, but each EVC can be configured with only one CE-VLAN. The CE-VLANs of the bound EVCs must not overlap, and their SVLANs must not conflict with each other.
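The binding rules above can be checked mechanically. This sketch assumes that "SVLANs conflict" means two bound EVCs use the same SVLAN. The `Evc` record and `validate_binding()` are hypothetical names, not the switch's configuration model.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Evc:
    evc_id: int
    svlan: int
    cevlans: Set[int] = field(default_factory=set)


def validate_binding(port_type: str, uni_mapping: str, evcs: List[Evc]) -> List[str]:
    """Return the rule violations for binding `evcs` to one port (empty = OK)."""
    errors = []
    if port_type.lower() not in ("hybrid", "trunk"):
        errors.append("EVC can be bound only to Hybrid or Trunk ports")
    mapping = uni_mapping.upper()
    if mapping == "ALL-TO-ONE" and len(evcs) > 1:
        errors.append("an ALL-TO-ONE port can be bound to only one EVC")
    if mapping == "MULTIPLEXING":
        for evc in evcs:
            if len(evc.cevlans) != 1:
                errors.append(f"EVC {evc.evc_id}: MULTIPLEXING allows exactly one CE-VLAN")
    if mapping in ("BUNDLING", "MULTIPLEXING"):
        seen_ce: Set[int] = set()
        seen_s: Set[int] = set()
        for evc in evcs:
            if seen_ce & evc.cevlans:
                errors.append(f"EVC {evc.evc_id}: CE-VLANs overlap another bound EVC")
            if evc.svlan in seen_s:  # assumption: "conflict" means the same SVLAN
                errors.append(f"EVC {evc.evc_id}: SVLAN conflicts with another bound EVC")
            seen_ce |= evc.cevlans
            seen_s.add(evc.svlan)
    return errors
```

Returning a list of violations, rather than raising on the first one, makes it easy to report every problem with a proposed binding at once.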

2. Combining EVC with E-LMI (for E-LMI, refer to the E-LMI Technical Manual):

Enable E-LMI on the connected ports of the PE and CE devices, running in PE mode and CE mode respectively. Through E-LMI message exchange, the CE device obtains from the PE device the configuration and status information of all EVCs bound on the port connected to the CE device. When the status of an EVC on the PE port changes, the PE device actively notifies the CE device through E-LMI so that the CE device can update immediately.

3. Combining EVC with 802.1ag (for 802.1ag, refer to the 802.1ag Technical Manual):

Each EVC must be bound to a CFM management domain instance, through which the connectivity between the UNIs in the EVC is obtained.

The current status of an EVC depends on the status of all local and remote ports in the EVC. The status of remote ports is obtained through 802.1ag, so EVC must monitor and process the following 802.1ag events: remote MEP added, remote MEP status UP, remote MEP deleted, remote MEP status DOWN, and CFM management domain information deleted. Processing these events updates the current status of the EVC.
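The event handling described above can be sketched as a small tracker. The status values (active, not active, partially active) are borrowed from MEF E-LMI EVC status reporting as an assumption. The exact aggregation rule, the event strings, and the class name are illustrative and do not describe the switch's actual implementation.

```python
from typing import Dict, Optional


class EvcStatus:
    """Hypothetical tracker deriving EVC status from 802.1ag remote MEP events."""

    def __init__(self, local_ports_up: bool = True):
        self.local_ports_up = local_ports_up
        self.remote_meps: Dict[int, bool] = {}  # remote MEP ID -> status is UP

    def on_event(self, event: str, mep_id: Optional[int] = None) -> None:
        if event == "add_remote_mep":
            self.remote_meps.setdefault(mep_id, False)
        elif event in ("remote_mep_up", "remote_mep_down"):
            self.remote_meps[mep_id] = event == "remote_mep_up"
        elif event == "delete_remote_mep":
            self.remote_meps.pop(mep_id, None)
        elif event == "delete_domain":
            self.remote_meps.clear()
        else:
            raise ValueError(f"unknown event {event!r}")

    def status(self) -> str:
        up = sum(self.remote_meps.values())
        if not self.local_ports_up or up == 0:
            return "not active"
        if up == len(self.remote_meps):
            return "active"
        return "partially active"
```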

Typical Application

The following figure shows a typical application that combines EVC and E-LMI.

Figure 18.3 EVC networking instance

In the above figure, one EVC, EVC_Provider, is defined over the service provider network, which consists of PE1, PE2, and PE3. The blue ellipse indicates a CFM management domain, Service Provider Domain, whose level is 4. The three edge devices are the three MEPs of the domain. The CFM management domain checks the connectivity among the three MEPs to determine the operational status of EVC_Provider.

E-LMI is enabled on UNI1 between CE1 and PE1. Through E-LMI, CE1 obtains from PE1 the configuration of UNI1 and the configuration and status of EVC_Provider, which completes the automatic configuration of CE1.



LLDP Technology

This chapter describes the LLDP technology and application.

Main contents:

Overview

LLDP working mechanism

TLV information type

Typical application of LLDP

Overview

LLDP (Link Layer Discovery Protocol) is the link layer protocol defined in IEEE 802.1AB. It encodes information about the local device as TLVs (Type/Length/Value) and encapsulates them in LLDPDUs (Link Layer Discovery Protocol Data Units), which are sent to directly connected neighbors. At the same time, LLDP stores the LLDPDUs received from neighbors in the MIB (Management Information Base). With LLDP, the device stores and manages information about itself and its directly connected neighbors, and the network management system can query this information to determine the communication status of the link. LLDP does not configure or control network elements or traffic; it only reports Layer 2 configuration. 802.1AB also enables network management software to use the information provided by LLDP to detect Layer 2 configuration inconsistencies.
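Each TLV starts with a 16-bit header: a 7-bit type followed by a 9-bit length, as defined in IEEE 802.1AB. The sketch below builds a minimal LLDPDU from the four mandatory TLVs listed later in this chapter. It uses Chassis ID subtype 4 (MAC address) and Port ID subtype 5 (interface name), which matches the device behavior described for the case where no MED TLVs are sent. The function names are hypothetical, and the Ethernet framing is omitted.

```python
import struct


def encode_tlv(tlv_type: int, value: bytes) -> bytes:
    """Pack an LLDP TLV: 7-bit type and 9-bit length, then the value."""
    if not 0 <= tlv_type <= 127:
        raise ValueError("TLV type must fit in 7 bits")
    if len(value) > 511:
        raise ValueError("TLV length must fit in 9 bits")
    header = (tlv_type << 9) | len(value)
    return struct.pack("!H", header) + value


def build_lldpdu(chassis_mac: bytes, port_name: str, ttl: int) -> bytes:
    """Build an LLDPDU body containing only the mandatory TLVs."""
    chassis = encode_tlv(1, bytes([4]) + chassis_mac)       # subtype 4: MAC address
    port = encode_tlv(2, bytes([5]) + port_name.encode())   # subtype 5: interface name
    ttl_tlv = encode_tlv(3, struct.pack("!H", ttl))          # Time To Live, in seconds
    end = encode_tlv(0, b"")                                 # End of LLDPDU
    return chassis + port + ttl_tlv + end
```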

LLDP Working Mechanism

LLDP supports the following four working modes:

TxRx: transmit and receive LLDPDUs

Tx: transmit LLDPDUs only, without receiving them

Rx: receive LLDPDUs only, without transmitting them

Disable: neither transmit nor receive LLDPDUs

LLDPDU Transmitting Mechanism

When a port works in TxRx or Tx mode, it transmits LLDPDUs to the neighbor periodically at the configured interval. To notify neighbors of local configuration changes as soon as possible, enable the polling function on the device and configure the polling period. When a poll detects a change, an LLDPDU is transmitted immediately. If polling is not enabled, a local configuration change does not trigger an immediate LLDPDU; the change is carried by the next periodic LLDPDU. To prevent frequent local changes from generating a large number of LLDPDUs, the device waits for a delay after each LLDPDU before transmitting the next one.

When an LLDP-related setting on the local device changes (such as the holdtime or the set of TLV types to advertise), or when polling (if enabled) detects a change in the LLDP configuration of the local system, the device triggers the fast transmission mechanism so that other devices learn of the change as soon as possible. It immediately transmits a specified number of LLDPDUs (3 by default) in succession and then returns to the normal transmission period.
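The periodic, delayed, and fast transmissions can be combined into a simple schedule. In this sketch, the 30 s interval, the 2 s delay, and the choice to restart the normal period after the last fast LLDPDU are all assumptions. Only the default burst of 3 comes from the text. `change_time` stands for the moment a configuration change is detected (for example, by polling).

```python
from typing import List, Optional


def lldp_tx_times(duration: int, interval: int = 30, tx_delay: int = 2,
                  change_time: Optional[int] = None, fast_count: int = 3) -> List[int]:
    """Return the times (in seconds) at which LLDPDUs would be sent.

    The interval and tx_delay defaults are illustrative assumptions.
    """
    times = list(range(0, duration + 1, interval))
    if change_time is None:
        return times
    # Keep the periodic LLDPDUs sent before the change.
    times = [t for t in times if t < change_time]
    # Fast transmission: fast_count LLDPDUs in succession, spaced by tx_delay.
    burst = [change_time + i * tx_delay for i in range(fast_count)]
    times += [t for t in burst if t <= duration]
    # Then return to the normal period, counted from the last fast LLDPDU.
    times += list(range(burst[-1] + interval, duration + 1, interval))
    return times
```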

The device transmits one LLDPDU carrying the CLOSE TLV to inform the neighbor in any of the following cases: LLDP is disabled globally; a port with LLDP enabled is shut down, added to an aggregation group, or has LLDP disabled; or the system is reloaded. This lets the neighbor learn quickly that LLDP on the local device has been disabled.

LLDPDU Receiving Mechanism

When a port works in TxRx or Rx mode, it checks the validity of each received LLDPDU and the TLVs it carries. After the check, the device saves the neighbor information locally and sets its aging time according to the TTL (Time To Live) carried in the LLDPDU. If the TTL in a received LLDPDU is 0, the neighbor information is aged out immediately.

The aging time of the local information on the neighbor device is set by configuring the holdtime. The default value is 120 s, and the maximum value is 65535 s.
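The receive-side aging rule can be sketched as a table keyed by local port and neighbor chassis ID. Using that pair as the key is a simplification, and the class is hypothetical. The TTL in each received LLDPDU (the sender's holdtime) sets the expiry time, and a TTL of 0 removes the neighbor at once.

```python
from typing import Dict, List, Tuple


class NeighborTable:
    """Hypothetical LLDP neighbor table keyed by (local port, chassis ID)."""

    def __init__(self) -> None:
        self._expiry: Dict[Tuple[str, str], float] = {}

    def receive(self, port: str, chassis_id: str, ttl: int, now: float) -> None:
        key = (port, chassis_id)
        if ttl == 0:
            self._expiry.pop(key, None)      # TTL 0: age the neighbor out at once
        else:
            self._expiry[key] = now + ttl    # (re)start the aging timer from the TTL

    def expire(self, now: float) -> None:
        for key in [k for k, deadline in self._expiry.items() if deadline <= now]:
            del self._expiry[key]

    def neighbors(self, port: str) -> List[str]:
        return sorted(chassis for (p, chassis) in self._expiry if p == port)
```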

TLV Information Type

The TLVs that LLDP can encapsulate include basic TLVs, organizationally defined TLVs, and LLDP-MED (Media Endpoint Discovery) TLVs. Basic TLVs form the foundation of network device management. Organizationally defined TLVs and LLDP-MED TLVs are defined by standards bodies and other organizations to enhance the management of network devices. You can configure whether each TLV is transmitted in LLDPDUs.

Basic Management TLV

Some basic TLV types are mandatory for LLDP; that is, they must be advertised in every LLDPDU, as shown in Table 19-1.

Description of basic management TLVs

TLV type | Description | Advertised
End of LLDPDU TLV | Marks the end of the LLDPDU | Yes
Chassis ID TLV | MAC address of the sending device | Yes
Port ID TLV | Identifies the sending port. When the device does not send MED TLVs, it carries the port name; when the device sends MED TLVs, it carries the MAC address of the port. | Yes
Time To Live TLV | Lifetime of the local device information on the neighbor device | Yes
Port Description TLV | Description string of the port | No
System Name TLV | Device name | No
System Description TLV | System description | No
System Capabilities TLV | Main functions of the system and which of them are enabled | No
Management Address TLV | Management address with the corresponding interface number and OID (Object Identifier). The management address is the primary IP address of the VLAN with the lowest VLAN ID among the VLANs permitted on the interface. If that VLAN has no primary IP address, the management address is 127.0.0.1. This TLV is sent by default. | Yes
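The management address selection rule in the table can be expressed in a few lines. The function name and input shapes are hypothetical. Falling back to 127.0.0.1 when no VLAN is permitted at all is an assumption; the text only covers the case where the lowest-numbered VLAN has no primary IP address.

```python
from typing import Dict, Iterable


def management_address(permitted_vlans: Iterable[int], vlan_ip: Dict[int, str]) -> str:
    """Pick the LLDP management address for an interface."""
    vlans = list(permitted_vlans)
    if not vlans:
        return "127.0.0.1"  # assumption: no permitted VLAN is treated like an unaddressed one
    # Only the VLAN with the lowest ID is considered; higher VLANs are not tried.
    return vlan_ip.get(min(vlans), "127.0.0.1")
```

Note that the rule does not fall through to the next VLAN. If the lowest-numbered VLAN has no primary address, 127.0.0.1 is used even when other permitted VLANs are addressed.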



TLV Defined by Organization

1. TLVs defined by IEEE 802.1

Port VLAN ID TLV: the VLAN ID of the port;

Protocol VLAN ID TLV: the protocol VLAN ID of the port;

VLAN Name TLV: the name of the VLAN of the port;

Protocol Identity TLV: the protocols supported by the port.

The device does not support sending the Protocol Identity TLV, but can receive it.

2. TLVs defined by IEEE 802.3

MAC/PHY Configuration/Status TLV: the rate and duplex status of the port, whether the port supports rate auto-negotiation, whether auto-negotiation is enabled, and the current rate and duplex status;

Power Via MDI TLV: the power capability of the port;

Link Aggregation TLV: whether the port supports link aggregation and whether link aggregation is enabled;

Maximum Frame Size TLV: the maximum supported frame length, taken from the MTU (Maximum Transmission Unit) configured on the port.

Related TLV of LLDP-MED

LLDP-MED Capabilities TLV: the MED device type of the current device and the LLDP-MED TLV types that can be encapsulated in LLDPDUs;

Network Policy TLV: the VLAN ID of the port, the supported applications (such as voice and video), and the applied priority and policy;

Hardware Revision TLV: the hardware version of the device;

Firmware Revision TLV: the firmware version of the device;

Software Revision TLV: the software version of the device;

Serial Number TLV: the serial number of the device;

Manufacturer Name TLV: the manufacturer of the device;

Model Name TLV: the model name of the device;

Asset ID TLV: the asset ID of the device, used for inventory management and asset tracking;

Location Identification TLV: the location information of the connected device, used by other devices in location-based applications.

Neighbor Storage Capability of LLDP

LLDP receives LLDPDUs and stores the senders as neighbor information. Neighbor storage capacity is limited: currently, each port on a Maipu switch can store information for at most 20 neighbors, and the whole device can store at most 2000 neighbors. When the number of neighbors reaches 2000, LLDPDUs from additional neighbors are dropped and not saved.
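The two storage limits can be enforced with a simple admission check. This sketch assumes that LLDPDUs from a new neighbor are also dropped when the per-port limit of 20 is reached, not only when the device limit of 2000 is reached; the text states the drop behavior only for the device limit. The class name is hypothetical.

```python
from typing import Dict, Set


class NeighborStore:
    """Hypothetical store enforcing per-port and per-device neighbor limits."""

    def __init__(self, per_port_limit: int = 20, device_limit: int = 2000):
        self.per_port_limit = per_port_limit
        self.device_limit = device_limit
        self.by_port: Dict[str, Set[str]] = {}

    def total(self) -> int:
        return sum(len(s) for s in self.by_port.values())

    def add(self, port: str, neighbor_id: str) -> bool:
        """Store a neighbor; return False if its LLDPDU must be dropped."""
        known = self.by_port.setdefault(port, set())
        if neighbor_id in known:
            return True                      # refresh of an already stored neighbor
        if len(known) >= self.per_port_limit or self.total() >= self.device_limit:
            return False                     # limit reached: drop, do not save
        known.add(neighbor_id)
        return True
```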

Typical Application of LLDP

Figure: Networking of configuring LLDP

As shown in the above figure, port 0/0/1 of SW1 is connected to port 0/0/1 of SW2, and port 0/0/2 of SW1 is connected to port 0/0/2 of SW3.

Configure LLDP on the three devices. The devices exchange information via LLDPDUs and can query each other's neighbor information. A remote NMS connected to the devices can use this information for network management and topology collection, enabling cluster management.



MAC Address Table Management Technology

This chapter describes the management technology of the MAC address table and its application.

Management and Application of MAC Address Table

This section describes how the MAC address table is managed.

Main contents:

Related terms

Introduction

Related Terms

Dynamic MAC address: a MAC address learned automatically from packets received by the switch. When a port receives a packet, the switch checks whether the source MAC address of the packet is in the MAC address table. If it is not, the switch associates the source MAC address with the port and VLAN and saves the entry in the table.

Static MAC address: a forwarding MAC address configured statically by the user via shell commands or the SNMP agent. A static MAC address works the same way as a dynamic MAC address, but unlike a dynamic MAC address, it does not age.



Filter MAC address: a filtering MAC address configured statically by the user via shell commands or the SNMP agent. When the source or destination MAC address of a packet received by the switch is a filter MAC address, the packet is discarded directly.

MAC address entry: consists of information such as the MAC address, VLAN, port number, and MAC address type.

Aging time: how long a learned dynamic MAC address can remain in the MAC address table without being refreshed.

Introduction

MAC address entries contain the address information used to forward packets between ports. There are three types of MAC addresses in the table: static, dynamic, and filter. A MAC address entry consists of information such as the MAC address, VLAN, port number, and MAC address type.

Static MAC addresses can only be set manually or by other software. Unlike dynamic MAC addresses, static MAC addresses do not age and are not learned; they can only be added and deleted manually. By function, static MAC addresses are divided into three kinds: forwarding packets normally (FWD), sending packets only to the CPU without forwarding them (TRAP), and both sending packets to the CPU and forwarding them (F&T).

Filter MAC addresses are global and take effect on the whole switch. If a MAC address is configured as a filter address, the host with that address is prohibited from accessing the network through the switch; that is, packets whose source or destination MAC address is that address are dropped.

A dynamic MAC address is learned from the source MAC address of packets received by the switch. Each entry associates the MAC address with a VLAN ID and a port, and the MAC address table is updated in this way. When the switch receives a packet, it learns the source MAC address (writes it into the table if it is not already there). If the destination MAC address is in the table, the packet is forwarded directly to the corresponding port. Otherwise, the packet is forwarded to all other member ports of the VLAN to which the receiving port belongs; that is, the packet is flooded. When the number of MAC addresses learned by a port reaches the maximum, the port stops learning, and packets to unlearned addresses are flooded. After a MAC address is learned, if the device receives no packet with that address as its source MAC address before the aging time expires, the entry is deleted.

The port-based MAC address learning limit lets the user restrict the number of dynamic MAC addresses each port can learn. Usually, a port can learn at most 32767 MAC addresses. When a port has learned 32767 MAC addresses, it stops learning. New MAC addresses can be learned again only after existing entries age out; until then, packets destined for unlearned addresses are flooded.

Static forwarding MAC addresses and dynamic MAC addresses both serve fast forwarding. The MAC address table acts as a fast forwarding table that lets packets be forwarded quickly and correctly through the specified port instead of being broadcast throughout the VLAN.
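The learning, forwarding, flooding, filtering, per-port limit, and aging behavior described above can be modeled in a compact class. The 300 s default aging time is an assumption, as are all names. Aging is evaluated lazily on each received frame for simplicity, whereas a real switch ages entries with a timer.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set, Tuple


@dataclass
class Entry:
    port: str
    kind: str              # "dynamic" or "static"
    learned_at: float = 0.0


class MacTable:
    """Hypothetical model of learning, forwarding, flooding, filtering and aging."""

    def __init__(self, aging_time: float = 300.0, port_limit: int = 32767):
        self.aging_time = aging_time          # assumed default, in seconds
        self.port_limit = port_limit          # per-port dynamic learning limit
        self.entries: Dict[Tuple[str, int], Entry] = {}   # (MAC, VLAN) -> entry
        self.filters: Set[str] = set()        # global filter MAC addresses

    def add_static(self, mac: str, vlan: int, port: str) -> None:
        self.entries[(mac, vlan)] = Entry(port, "static")

    def age(self, now: float) -> None:
        stale = [k for k, e in self.entries.items()
                 if e.kind == "dynamic" and now - e.learned_at >= self.aging_time]
        for key in stale:
            del self.entries[key]

    def receive(self, src: str, dst: str, vlan: int, in_port: str,
                vlan_ports: Iterable[str], now: float) -> Set[str]:
        """Return the egress ports for one frame (an empty set means dropped)."""
        if src in self.filters or dst in self.filters:
            return set()                      # filter MAC: discard the frame
        self.age(now)
        entry = self.entries.get((src, vlan))
        if entry is None:
            learned = sum(1 for e in self.entries.values()
                          if e.kind == "dynamic" and e.port == in_port)
            if learned < self.port_limit:     # learn only below the port limit
                self.entries[(src, vlan)] = Entry(in_port, "dynamic", now)
        elif entry.kind == "dynamic":
            entry.port, entry.learned_at = in_port, now   # refresh the entry
        known = self.entries.get((dst, vlan))
        if known is not None:
            return {known.port} - {in_port}   # known destination: forward directly
        return set(vlan_ports) - {in_port}    # unknown destination: flood in the VLAN
```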

Note

Dynamic MAC address entries do not overwrite static MAC address entries configured manually by the user or filter MAC address entries. However, static MAC address entries and black-hole MAC address entries can overwrite dynamic MAC address entries.
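The overwrite rule in this note can be expressed as a small predicate. It treats black-hole entries as filter entries, which is an assumption. It also assumes that a newer manual configuration replaces an older one, because the note does not say how static and filter entries interact. The function name is hypothetical.

```python
def can_overwrite(existing: str, new: str) -> bool:
    """Whether a new entry of type `new` may replace an entry of type `existing`.

    Types: "dynamic", "static", "filter". Static/filter interaction is an
    assumption of this sketch (the newer manual configuration wins).
    """
    if new == "dynamic":
        # Learning never overwrites manual entries; it only refreshes dynamic ones.
        return existing == "dynamic"
    # Manually configured static and filter entries overwrite dynamic entries.
    return True
```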



PWE3 Technology (Only for S3400/S3900)

PWE3 is a Layer 2 VPN technology that provides tunnels over a packet switching network (IP/MPLS) to emulate services such as FR, ATM, Ethernet, and TDM (SONET/SDH). It helps connect traditional networks with packet switching networks, enabling resource sharing and network expansion. PWE3 extends the Martini protocol: it adds new signaling (reducing signaling overhead) and specifies a multi-hop negotiation mode, which makes networking more flexible. This chapter describes the theory, key technologies, and typical applications of circuit emulation over packet networks.

Circuit emulation over a packet network is a technology for carrying traditional TDM data over a packet switching network (PSN). It uses the circuit emulation mode of the PWE3 framework to provide end-to-end transmission of PDH and SDH data streams over the PSN.

The main contents of the chapter:

Basic concepts

Technology theory

Realizing method

Typical application

Basic Concepts

As network technologies evolve and converge, packet-based data transmission and switching will dominate next-generation networks. IP networks and MPLS networks are both representative packet switching networks. However, the next generation network (NGN) cannot be built overnight. Current PDH/SDH networks serving PSTN voice services will remain in place for a long time, and users' existing TDM devices will continue to be used. To protect users' investments in TDM devices, next-generation packet switching networks must provide access to TDM services and carry TDM data transparently.

Several standards organizations have put forward their own standards and solutions for transparent transmission of TDM circuit-switched services over packet switching networks. Currently, TDM circuit emulation is the most mature of these.

Background of TDM Circuit Emulation Technology

TDM circuit emulation was originally developed to carry TDM circuit-switched data transparently over IP networks. It emerged as a competitor to VoIP, offering a simpler processing flow than VoIP protocols while also providing voice transmission over IP networks. Early TDM transparent-transmission devices supported only E1 and DS1/DS0 services. As packet switching networks gradually became dominant in NGN solutions, and especially with the rise of Metro Ethernet, TDM circuit emulation became an important technology for carrying TDM services over packet switching networks. Currently, protocol drafts and standards have been completed for transparent transmission of structured and unstructured E1/T1/E3/T3 TDM services, structured transparent transmission of SDH services, and transmission of PDH and SDH signaling.

Related Technology Standards

The standards for TDM circuit emulation come mainly from four international organizations, IETF, ITU-T, MEF, and MFA, which cooperate with each other. The TDM transparent-transmission standards from these organizations are broadly similar and differ slightly in technical details such as the data encapsulation format. Among them, the IETF PWE3 working group plays the leading role. It defines standards not only for the data plane but also for the control and management planes, while the other organizations focus mainly on data encapsulation.

MEF standards focus on how to encapsulate the original TDM service into Ethernet frames, while MFA standards focus on how to carry TDM services over MPLS networks. ITU-T standards also focus on the data plane, covering both MPLS and IP as carriers of TDM service data. In addition, ITU-T defines clock transport solutions, which are important for TDM services.

Commonly-used Terms

PWE3 (Pseudo Wire Emulation Edge-to-Edge): IETF defines a PW in RFC 3985 as an emulation of a native service over a packet switching network;

IWF (Interworking Function): the function that converts data between two different networks;

CE (Customer Edge): the device that originates and terminates the TDM service;

PE (Provider Edge): the device that provides the PW, equivalent to the IWF;

AC (Attachment Circuit): the physical or virtual link between the CE and PE; all data on the AC must be delivered to the peer end unchanged;

Bundle: the bit stream sent over the PW by the TDM circuits of the PE devices at both ends. A Bundle can consist of any number of 64 kbit/s timeslots in one E1 or T1. A Bundle is unidirectional and is usually paired with a Bundle in the opposite direction to form full-duplex communication. There can be several Bundles between two PE devices.

CESoPSN (Circuit Emulation Services over Packet Switched Network): an emulation that takes the frame structure of the TDM data into account;

SAToP (Structure-Agnostic TDM over Packet): an emulation that does not recognize the frame structure of the TDM data;

TDMoIP (Time Division Multiplexing over Internet Protocol): an emulation that takes the content of the TDM data into account;

CAS: Channel Associated Signaling
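The Bundle definition implies some simple bandwidth arithmetic. An E1 or T1 frame repeats every 125 microseconds (8000 frames per second), and each timeslot carries one byte per frame, so a Bundle of N timeslots runs at N x 64 kbit/s. The sketch below also assumes a CESoPSN-style packet that carries a fixed number of consecutive frames. It ignores the overhead of the control word and the PSN headers, and the function names are hypothetical.

```python
from typing import Tuple

FRAMES_PER_SECOND = 8000   # one E1/T1 frame every 125 microseconds
TIMESLOT_BPS = 64_000      # 8 bits per frame x 8000 frames per second


def bundle_rate_bps(timeslots: int) -> int:
    """Payload bit rate of a Bundle made of `timeslots` 64 kbit/s timeslots."""
    return timeslots * TIMESLOT_BPS


def cesopsn_packet(timeslots: int, frames_per_packet: int) -> Tuple[int, float, float]:
    """Return (payload bytes, packets per second, packetization delay in ms)."""
    payload_bytes = timeslots * frames_per_packet       # one byte per timeslot per frame
    packets_per_second = FRAMES_PER_SECOND / frames_per_packet
    packetization_ms = frames_per_packet * 0.125         # 125 microseconds per frame
    return payload_bytes, packets_per_second, packetization_ms
```

For example, a Bundle of 30 E1 timeslots runs at 1.92 Mbit/s. Packing 8 frames per packet gives a 240-byte payload, 1000 packets per second, and 1 ms of packetization delay. Carrying more frames per packet lowers the packet rate and header overhead but adds delay.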

Technical Theory

The IETF PWE3 working group plays the leading role in standardizing TDM transparent transmission, so its standards are the most complete and have become the mainstream in this field. The following introduces TDM transparent transmission by analyzing the TDM PWE3 technical scheme.

TDM PWE3 Technical Scheme

PW Theory

A PW is a mechanism that carries the essential elements of an emulated service from one PE to one or more other PEs over the PSN. It emulates various services (ATM, FR, HDLC, PPP, TDM, and Ethernet) over a tunnel (IP/L2TP/MPLS) on the PSN, which can carry various data payloads. The tunnel used in this scheme is called a pseudo wire. The service data carried inside the PW is invisible to the core network; that is, the core network is transparent to the CE data stream.

Figure 21-1 PW schematic

The PW scheme provides a technical framework in which various services can be carried transparently over the PSN. TDM pseudo wire emulation uses PWs to emulate TDM service data over the PSN.

Elements of TDM Emulation Service

When a PW is used to emulate a TDM service over the PSN, the following elements need to be transmitted to the other side of the PW:

1. TDM service data

2. The frame format of the TDM service data

3. The alarm and signaling of the TDM service at the AC side

4. TDM synchronous timing information



TDM Emulation Protocol

TDM circuit emulation uses a dedicated circuit emulation header to encapsulate the TDM service data. This header carries the frame format, alarm, signaling, and synchronous timing information of the TDM service data. The encapsulated packet is called a PWE3 packet. The PWE3 packet is then carried by IP, MPLS, or L2TPv3 across the corresponding packet switched network. At the egress of the PW tunnel, the packet is decapsulated and the TDM circuit-switched data flow is reconstructed.

The following describes several TDM circuit emulation encapsulation protocols.

1. SAToP protocol (RFC4553)

RFC4553 provides emulation for low-rate PDH circuit services such as E1/T1/E3/T3. SAToP transmits unstructured (that is, unframed) E1/T1/E3/T3 service data: it segments and encapsulates the TDM service as a serial data stream and transmits it over the PW tunnel. Of the TDM emulation service elements listed above, the protocol provides transparent transmission of the TDM service data and transmission of the synchronous timing information, but it cannot identify the TDM frame structure. Therefore, the TDM frame structure and the signaling in the TDM frame cannot be identified or processed and can only be transmitted transparently. The protocol is the simplest way to transmit low-rate PDH services transparently in the TDM circuit emulation scheme, and its simplicity of implementation is one reason it was published by the IETF as a formal RFC standard.

RFC4553 provides three optional PW outer tunnel encapsulation modes: UDP/IP mode, L2TPv3 mode, and MPLS mode. The UDP/IP mode encapsulates the PWE3 packet with a UDP/IP header and uses different UDP port numbers to distinguish different PW outer tunnels; it is suitable for pure IP networks. Currently, the TDM circuit emulation service developed by Maipu supports the UDP/IP mode.

The L2TPv3 mode encapsulates the PWE3 packet with an L2TPv3 header and uses different session IDs to distinguish different outer tunnels. In this mode, L2TPv3 negotiation can set up the outer tunnel and assign different session IDs to the different PWs in the tunnel, which makes it more flexible to use than the UDP/IP mode.

The MPLS mode encapsulates the PWE3 packet with MPLS labels and uses an LSP as the outer tunnel of the PW. The PW label is the innermost label of the MPLS label stack. In MPLS mode, labels can be distributed and managed dynamically via LDP, so this mode is more convenient to use than the manually bound UDP/IP mode. In addition, multiple layers of MPLS labels can nest PW outer tunnels, which makes the mode suitable for larger-scale networks.

2. CESoPSN protocol

Compared with SAToP, CESoPSN provides structured TDM service emulation; that is, it can identify, process, and transmit the frame structure and the signaling in the TDM frame. Take E1 as an example. A structured E1 consists of 32 time slots. Time slot 0 carries the frame alignment signal and other overhead, and each of the other 31 time slots can carry one 64 kbit/s voice channel. Because CESoPSN can identify the frame structure of the TDM service, idle time slots do not need to be transmitted: only the time slots actually used by the CE device are encapsulated from the E1 service flow into the PWE3 packet. CESoPSN also identifies and transmits the CAS and CCS signaling in the E1 service flow.

The CESoPSN protocol also provides three optional PW outer tunnel encapsulation modes: UDP/IP mode, L2TPv3 mode, and MPLS mode. Unlike with SAToP, the TDM service data carried inside a CESoPSN PW keeps its frame structure. In addition, the control word of the PWE3 packet has an M field that indicates the signaling status detected at the AC side. Currently, the TDM service products developed by Maipu support the CESoPSN protocol in UDP/IP encapsulation mode.

Besides the TDM service data, CESoPSN provides a scheme for identifying and transmitting CAS signaling.

3. TDMoIP protocol

TDMoIP describes PW encapsulation modes for different PSNs (UDP/IP mode, L2TPv3 mode, MPLS mode, and MEF mode). Both SAToP and CESoPSN carry the TDM bit stream as the PW payload, whereas TDMoIP adds three new TDM payload types: AAL1, AAL2, and HDLC. Currently, the TDM service products developed by Maipu support the HDLC TDM payload.

In addition, the IETF PWE3 working group defines a structured circuit emulation scheme for the high-order and low-order paths of SONET/SDH, which transmits VC11/VC12 and VC2 TDM service data transparently in PWE3 mode.



Other Technical Schemes

Besides the IETF PWE3 working group, the MEF, MFA, and ITU-T define related circuit emulation standards. For example, MEF8.0 defines a TDM circuit emulation packet carried directly over Ethernet, in which different TDM circuit emulation flows are distinguished by different ECIDs (Emulated Circuit Identifiers).

Figure 21-2 Mapping relation between the function layer and MEF packet encapsulation

In the MEF8.0 standard, the CESoETH control word is compatible with the PW control word defined by the IETF, and the RTP header also follows the IETF RFC3550 standard. MEF8.0 likewise uses a PWE3 tunnel to transmit the TDM service transparently, but the bearer layer is bare Ethernet.

Key Technologies

Data Jitter Buffer

After crossing the packet switched network, PW packets may reach the egress PE device at irregular intervals and out of order. To reconstruct the TDM service data flow on the egress PE device, a jitter buffer is needed to smooth the intervals between PW packets and reorder out-of-order packets. Choosing the jitter buffer capacity is a trade-off: a large buffer can absorb large variations in packet arrival intervals, but it adds a large delay when the TDM service data flow is reconstructed. A good policy is therefore to let the user configure and adjust the buffer capacity according to the delay and jitter of the network. Currently, the TDM circuit emulation products developed by Maipu support configuring jitter buffers of different capacities via commands.
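The trade-off above can be illustrated with a minimal jitter buffer sketch. It assumes the buffer depth is expressed in packets and that packets are reordered by the 16-bit sequence number of the control word; the class, its `depth` parameter, and the choice to return None for a missing packet are illustrative assumptions, not Maipu's implementation or its configuration command.

```python
from typing import Dict, Optional

SEQ_MOD = 1 << 16  # sequence numbers are 16 bits and wrap around


class JitterBuffer:
    """Reorders PW packets by sequence number and delays play-out.

    `depth` (hypothetical) is the number of packets buffered before play-out
    starts: a larger depth absorbs more jitter but adds more delay.
    """

    def __init__(self, depth: int) -> None:
        if depth < 1:
            raise ValueError("depth must be at least one packet")
        self.depth = depth
        self.slots: Dict[int, bytes] = {}    # sequence number -> payload
        self.next_seq: Optional[int] = None  # next sequence number to play out
        self.started = False

    def _is_behind(self, seq: int) -> bool:
        # True if seq precedes next_seq, comparing modulo 2**16.
        return (seq - self.next_seq) % SEQ_MOD >= SEQ_MOD // 2

    def push(self, seq: int, payload: bytes) -> None:
        """Store a packet arriving from the PSN, possibly out of order."""
        if self.next_seq is None:
            self.next_seq = seq
        if self._is_behind(seq):
            if self.started:
                return  # too late: its play-out slot has already passed
            self.next_seq = seq  # still filling: start from the earliest packet
        self.slots[seq] = payload

    def pop(self) -> Optional[bytes]:
        """Called once per packet interval by the TDM play-out clock.

        Returns the next payload in sequence order, or None while the buffer
        is still filling or when the expected packet is missing (a real PE
        would play out filler data in that slot).
        """
        if not self.started:
            if len(self.slots) < self.depth:
                return None
            self.started = True
        payload = self.slots.pop(self.next_seq, None)
        self.next_seq = (self.next_seq + 1) % SEQ_MOD
        return payload
```

Raising `depth` delays the first play-out by that many packet intervals, which is exactly the extra reconstruction delay the text warns about.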

Recover Clock Timing Information

TDM networks that use circuit switching (such as SDH networks) natively transmit network synchronous timing information, but most packet switched networks, especially current Ethernet networks, do not. The following solutions are currently available:

1. Adaptive (auto-sensing) packet-based clock recovery: use a time-window smoothing method and an adaptive algorithm to extract the synchronous timing information from the PWE3 packets at the egress, so that the reconstructed TDM service data flow is approximately synchronous with the sending end. However, the algorithm has limitations: when packet loss and transmission delay in the network vary greatly, the synchronous timing information cannot be recovered correctly.

2. Synchronous Ethernet: upgrade the existing Ethernet network with a synchronous clock system, bringing the SDH principle of network-wide synchronous timing distribution into Ethernet network design.

3. Separate timing distribution: TDM circuit emulation transmits only the service data, and the synchronous timing information is distributed by another synchronization system, such as the clock of the GPS system or of a synchronous clock network.

Check Link Fault

Link fault checking covers fault detection on the AC side, fault detection on the PW tunnel link, and the actions taken after a fault is found, such as notifying the peer side and switching away from the faulty link. Technical drafts currently exist for AC-side fault detection and peer notification. Many technologies are also available for detecting faults on the PW tunnel link, such as MPLS OAM and Ethernet OAM.

Analyze Packet Delay

For services with strict real-time requirements, such as voice, data delay and jitter greatly affect service quality and must be taken into account. When TDM PW emulation is used to transmit a TDM service transparently, the data delay consists of packet encapsulation delay, service processing delay, and network transmission delay.

1. Packet encapsulation delay is incurred when the TDM service flow is encapsulated into PWE3 packets; it is specific to TDM circuit emulation. Take E1 as an example: the E1 rate is 2.048 Mbps, each frame contains 32 time slots (256 bits), 8000 frames are transmitted every second, and each frame lasts 0.125 ms. In structured encapsulation mode with every four frames encapsulated into one PW packet, the delay to encapsulate one PW packet is 4 × 0.125 ms = 0.5 ms. The more frames encapsulated per packet, the larger the encapsulation delay.

2. Service processing delay is the time the device takes to process a packet, including packet validity checking, packet filtering, parity checking and calculation, packet encapsulation, and packet reception and transmission. It depends on the service processing capability of the device and is fixed for a given device.

3. Network transmission delay is incurred as the PWE3 packet travels from the ingress PE to the egress PE across the packet switched network. It varies greatly with the network topology and traffic load and is the main source of service jitter. The jitter buffer can absorb the jitter, but not the delay itself.

The total TDM service delay is the sum of these three delays.

Channelized and Non-channelized Technologies

Non-channelized service transmission in TDM pseudo wire emulation is unstructured transmission: it does not identify the data format in the TDM service flow and treats the TDM data as a serial bit stream. The RFC4553 (SAToP) unstructured encapsulation protocol requires that unstructured circuit emulation at the E1 rate support a basic payload unit of 256 bytes; that is, the E1 frame structure is not identified, but the TDM bit stream is segmented in integral multiples of the E1 frame length (32 bytes) and encapsulated into PWE3 packets. Likewise, unstructured T1 circuit emulation must support a basic payload unit of 192 bytes, and E3/T3 circuit emulation one of 1024 bytes.

Correspondingly, channelized service is structured TDM pseudo wire emulation. It must identify the frame format in the TDM service flow, and the TDM bit stream must be segmented at frame boundaries. For example, an E1 stream must be segmented at the beginning of time slot 0. Because segmentation starts at a frame boundary, the 32 time slots in the E1 frame can be identified for structured processing. Structured processing for T1 and E3/T3 is similar.

Comparing the two modes, the unstructured mode is simpler because it does not need to identify the frame format in the TDM data flow, and it is more commonly used. For devices in traditional data networks that use E1/T1 as a synchronous serial interface (that is, ignore the frame format) and transmit over a clear channel, unstructured TDM pseudo wire emulation is more convenient.

The structured (channelized) mode is more complicated. It must identify the frame alignment pattern in the TDM data flow, and the time slots in the frame and the signaling carried in certain special time slots must be identified and processed. When the TDM interface works in framed mode and communicates over only some of the time slots of an E1/T1, structured TDM pseudo wire emulation improves bandwidth utilization: it can distinguish the time slots in use in an E1 circuit from the idle ones, encapsulate only the used time slots into the PWE3 packet, and discard the idle ones, saving network transmission bandwidth. In addition, the structured mode can combine time slots from different E1/T1 interfaces to further improve bandwidth utilization.
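To make the bandwidth argument concrete, the following sketch compares the IP-layer bit rate of an unstructured E1 PW with a structured PW that carries only the busy time slots. It assumes 8 frames (1 ms) per packet in both cases and a per-packet overhead of 32 bytes (IPv4 header 20 + UDP header 8 + control word 4), ignoring the optional RTP header and Layer 2 framing; these figures are illustrative assumptions, not measured Maipu values.

```python
FRAMES_PER_SECOND = 8000      # E1 frame rate: one frame every 125 us
OVERHEAD_BYTES = 20 + 8 + 4   # assumed: IPv4 + UDP + control word, no RTP or L2


def pw_bitrate(slots_per_frame: int, frames_per_packet: int) -> float:
    """IP-layer bit rate (bit/s) of a PW carrying `slots_per_frame` bytes per E1 frame."""
    packets_per_second = FRAMES_PER_SECOND / frames_per_packet
    packet_bytes = slots_per_frame * frames_per_packet + OVERHEAD_BYTES
    return packets_per_second * packet_bytes * 8


unstructured = pw_bitrate(32, 8)  # SAToP-style: all 32 slots, 256-byte payload
structured = pw_bitrate(2, 8)     # CESoPSN-style: only 2 busy slots, 16-byte payload
print(unstructured, structured)   # 2304000.0 384000.0
```

With only two busy time slots, the structured PW needs about one sixth of the bandwidth of the unstructured one under these assumptions.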

Realizing Methods

PWE3 Packet Format

Currently, Maipu supports only PWE3 packets encapsulated in UDP/IPv4 mode. As shown in Figure 21-3, the TDM service data is encapsulated in the TDMoIP PAYLOAD field of the packet.

Figure 21-3 PWE3 packet encapsulated in UDP/IP mode



The format of the UDP/IPv4 header is shown in Figure 21-4. The source IP address is the local address of the pseudo wire, so all PWE3 packets sent from the local end have the same source address. The destination IP address is the remote address of the pseudo wire, so PWE3 packets sent over different pseudo wires have different destination IP addresses. The UDP destination port number is fixed at 2142, the port number assigned by IANA to TDM over IP; it identifies a packet as a PWE3 packet encapsulated in UDP/IP mode. The UDP source port number distinguishes the PWE3 packets of different bundles on one pseudo wire; its value range is 1-8063.
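The port rules above can be sketched as a small UDP header builder and parser. This assumes the UDP length field covers the control word, any RTP header, and the TDM payload (`payload_len`), and that the UDP checksum is left at 0, which UDP over IPv4 permits; whether Maipu devices compute the checksum is not stated in the text. The function names are hypothetical.

```python
import struct
from typing import Tuple

TDMOIP_UDP_PORT = 2142  # IANA-assigned TDM over IP port, per the text


def udp_header(bundle: int, payload_len: int) -> bytes:
    """Build the 8-byte UDP header of a PWE3 packet.

    Source port = bundle number (1-8063); destination port = 2142.
    Checksum 0 means "not computed" (an assumption for this sketch).
    """
    if not 1 <= bundle <= 8063:
        raise ValueError("bundle number must be in the range 1-8063")
    length = 8 + payload_len  # UDP length covers the header plus payload
    return struct.pack("!HHHH", bundle, TDMOIP_UDP_PORT, length, 0)


def parse_udp_header(header: bytes) -> Tuple[int, int]:
    """Return (bundle, payload_len); reject traffic not sent to port 2142."""
    src, dst, length, _checksum = struct.unpack("!HHHH", header[:8])
    if dst != TDMOIP_UDP_PORT:
        raise ValueError("not a TDMoIP packet")
    return src, length - 8
```

The receiver can therefore demultiplex bundles on one pseudo wire with a single lookup on the source port.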

Figure 21-4 UDP/IPv4 header format

The control word provides a means of exchanging TDM circuit status and PSN status in the PWE3 packet. Its format is shown in Figure 21-5. RES is a reserved field and must be set to 0. The L bit indicates a local failure: it is set to 1 when the local PE detects, or is informed of, a fault at the TDM physical layer (such as loss of synchronization). Because such a fault makes the TDM data incomplete, the bit can be used to signal the physical-layer failure and trigger generation of an AIS signal at the remote side. The L bit is cleared after the TDM fault is rectified. The R bit indicates a remote receive failure: setting it to 1 means that the remote end is not receiving packets on its Ethernet port. The R bit can be used to advertise congestion or other network faults, and receiving a remote failure indication can trigger a fallback mechanism to avoid the congestion. The R bit is set to 1 after a preset number N of consecutive packets are not received, and it is cleared once packets are received again.

The FRG field indicates the fragmentation type and is used for the CAS multi-frame structure in the CESoPSN protocol. FRG = 00 means the whole multi-frame is in one packet; 01 means the packet carries the first fragment of the multi-frame; 10 means it carries the last fragment; and 11 means it carries a middle fragment. The LENGTH field gives the total number of bytes of the control word, payload, and RTP header (if present); it is used only when this length is less than 64 bytes and is set to 0 otherwise. The SEQUENCE NUMBER field is the serial number of the packet. Its initial value is random, it increments by one for each packet sent, and it wraps around to 0 after reaching the maximum value. The field is used to detect lost packets.
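Because Figure 21-5 is not reproduced here, the following sketch assumes the CESoPSN control word layout of RFC 5086: four RES bits (0), then L, R, a 2-bit M field, a 2-bit FRG field, a 6-bit LENGTH field (which is why LENGTH can only express lengths below 64 bytes), and a 16-bit sequence number. It packs and unpacks the word and counts packets lost between two in-order arrivals; the function names are hypothetical.

```python
import struct
from typing import Dict


def pack_control_word(seq: int, length: int = 0, frg: int = 0,
                      l_bit: bool = False, r_bit: bool = False, m: int = 0) -> bytes:
    """Pack the 4-byte control word (bit layout assumed from RFC 5086)."""
    if not 0 <= length < 64:
        raise ValueError("LENGTH is 6 bits; use 0 for packets of 64 bytes or more")
    word = (
        (int(l_bit) << 27)     # L: local TDM failure
        | (int(r_bit) << 26)   # R: remote receive failure
        | ((m & 0x3) << 24)    # M: AC-side indication (CESoPSN)
        | ((frg & 0x3) << 22)  # FRG: multi-frame fragmentation
        | (length << 16)       # LENGTH: only for short packets
        | (seq & 0xFFFF)       # SEQUENCE NUMBER
    )  # the top four RES bits stay 0
    return struct.pack("!I", word)


def unpack_control_word(data: bytes) -> Dict[str, int]:
    (word,) = struct.unpack("!I", data[:4])
    return {
        "res": word >> 28,
        "l_bit": (word >> 27) & 1,
        "r_bit": (word >> 26) & 1,
        "m": (word >> 24) & 0x3,
        "frg": (word >> 22) & 0x3,
        "length": (word >> 16) & 0x3F,
        "seq": word & 0xFFFF,
    }


def lost_between(prev_seq: int, seq: int) -> int:
    """Packets missing between two consecutive in-order arrivals (mod 2**16)."""
    return (seq - prev_seq - 1) % 0x10000
```

Computing the gap modulo 2**16 keeps loss detection correct when the sequence number wraps back to 0.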



Figure 21-5 Control word format

The RTP header carries clock information that helps the receiving end recover the TDM clock from the PSN. Its format is shown in Figure 21-6. V is the version and is fixed at 2. P is the padding bit and is fixed at 0. CC is the CSRC count and is fixed at 0. M is the marker bit and is fixed at 0. The PT field is the payload type, and its value is unique for each bundle. SN is the packet sequence number and is the same as the SEQUENCE NUMBER in the control word. TS is the timestamp, which can be generated in two modes: absolute mode, in which it is derived from the clock recovered from the TDM line and increases by 1 every 125 µs; and relative mode, in which it is derived from a common clock and increases by 1 for each bit received. SSRC identifies the synchronization source.
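The fixed RTP field values above can be sketched as follows. The layout is the standard 12-byte RTP header of RFC 3550; the extension bit X, which the text does not mention, is assumed to be 0, and the timestamp helper assumes absolute mode with an 8 kHz timestamp clock (one tick per 125 µs frame). Function names are hypothetical.

```python
import struct
from typing import List


def rtp_header(seq: int, timestamp: int, payload_type: int, ssrc: int) -> bytes:
    """Build a 12-byte RTP header with V=2 and P, X, CC, M all 0."""
    version, padding, extension, csrc_count, marker = 2, 0, 0, 0, 0
    byte0 = (version << 6) | (padding << 5) | (extension << 4) | csrc_count
    byte1 = (marker << 7) | (payload_type & 0x7F)  # PT is unique per bundle
    return struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def absolute_timestamps(start: int, frames_per_packet: int, count: int) -> List[int]:
    """Absolute-mode TS: +1 per 125 us frame, so +N per packet of N frames."""
    return [(start + i * frames_per_packet) & 0xFFFFFFFF for i in range(count)]
```

For example, `rtp_header(1, 8, 96, 0x11223344)` starts with the bytes `0x80 0x60`, i.e. version 2 and payload type 96.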

Figure 21-6 RTP head format

SAToP Protocol

The TDM port on the PE works in unframed mode and ignores the TDM frame structure, treating the received data as a fixed-rate bit stream. As shown in Figure 21-7, SAToP processes the TDM flow in units of bytes (8 bits). Every N received TDM bytes are encapsulated into the TDM payload of a PWE3 packet and sent to the PSN. When the PE device at the other side of the pseudo wire receives the packet, it decapsulates the TDM payload and sends it to the TDM port.

Figure 21-7 SAToP sketch map

Because a PWE3 packet is generated only after N TDM bytes have been received, a fixed delay is introduced, called the packet encapsulation delay (PCT).

The PCT of SAToP is calculated as: PCT = N × 8 × bit time = N × 8 ÷ bit rate

Take E1 as an example. The E1 rate is 2.048 Mbps, so 2,048,000 bits are transmitted every second and each bit time is about 488 ns. If every 256 bytes are encapsulated into one PWE3 packet, the delay to encapsulate one PWE3 packet is 256 × 8 × 488 ns ≈ 1 ms. The packet encapsulation delay increases with the number of encapsulated bytes: the more bytes per packet, the larger the delay and the fewer packets generated per unit time.
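The SAToP procedure and its PCT formula can be sketched together: the stream is cut into N-byte payloads, and PCT = N × 8 ÷ bit rate. The function names and the choice to return leftover bytes separately are assumptions for illustration.

```python
from typing import List, Tuple

E1_BIT_RATE = 2_048_000  # bit/s


def satop_segment(stream: bytes, n: int) -> Tuple[List[bytes], bytes]:
    """Cut an unframed TDM byte stream into N-byte PWE3 payloads.

    Returns the complete payloads plus the leftover bytes that must wait
    for more TDM data before the next packet can be built.
    """
    full = len(stream) - len(stream) % n
    payloads = [stream[i:i + n] for i in range(0, full, n)]
    return payloads, stream[full:]


def satop_pct_ms(n: int, bit_rate: int = E1_BIT_RATE) -> float:
    """PCT = N x 8 / bit rate, returned in milliseconds."""
    return n * 8 * 1000 / bit_rate


print(satop_pct_ms(256))  # 1.0 (ms) for a 256-byte E1 payload
```

Halving N halves the PCT but doubles the packet rate, and therefore the header overhead on the PSN.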

CESoPSN Protocol

The TDM port on the PE works in framed mode, which is divided into non-CAS and CAS modes according to the TDM service type.

Non-CAS mode

As shown in Figure 21-8, CESoPSN processes the TDM flow in units of frames. After every N frames are received, the data of the specified time slots (time slots 4 and 25 in the example) is encapsulated into the TDM payload of a PWE3 packet and sent to the PSN. When the PE device at the other side of the pseudo wire receives the packet, it decapsulates the TDM payload, inserts the data into the specified time slots (time slots 4 and 25), and sends it to the TDM port.

Figure 21-8 CESoPSN sketch map of non-CAS

In this mode, PCT = N × frame time = N ÷ frame rate. Take E1 as an example: the E1 rate is 2.048 Mbps, each frame contains 32 time slots, 8000 frames are transmitted every second, and the frame time is 0.125 ms. If every 32 frames are encapsulated into one PWE3 packet, the delay to encapsulate one PWE3 packet is 32 × 0.125 ms = 4 ms. The packet encapsulation delay increases with the number of frames encapsulated in the PWE3 packet: the more frames, the larger the delay and the fewer packets generated per unit time.
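The non-CAS procedure can be sketched as follows: the ingress PE takes the selected time slots from each of N frames, and the egress PE writes them back into the same slots. The idle filler byte 0xFF for unused slots, the assumption that the framer regenerates time slot 0, and the function names are illustrative assumptions.

```python
from typing import List, Sequence

E1_SLOTS = 32
FRAME_TIME_MS = 0.125  # 8000 E1 frames per second
IDLE_BYTE = 0xFF       # assumed filler for slots not carried by the PW


def cesopsn_pack(frames: Sequence[bytes], slots: Sequence[int]) -> bytes:
    """Ingress: take the selected time slots of N frames, frame by frame."""
    for frame in frames:
        if len(frame) != E1_SLOTS:
            raise ValueError("each E1 frame must contain 32 time slots")
    return bytes(frame[s] for frame in frames for s in slots)


def cesopsn_unpack(payload: bytes, slots: Sequence[int]) -> List[bytearray]:
    """Egress: put the payload bytes back into the selected time slots."""
    per_frame = len(slots)
    if len(payload) % per_frame:
        raise ValueError("payload is not a whole number of frames")
    frames = []
    for i in range(0, len(payload), per_frame):
        # Unused slots get the idle byte; time slot 0 is assumed to be
        # regenerated by the framer.
        frame = bytearray([IDLE_BYTE] * E1_SLOTS)
        for slot, value in zip(slots, payload[i:i + per_frame]):
            frame[slot] = value
        frames.append(frame)
    return frames


def cesopsn_pct_ms(n_frames: int) -> float:
    """PCT = N x frame time."""
    return n_frames * FRAME_TIME_MS


print(cesopsn_pct_ms(32))  # 4.0 (ms)
```

With time slots 4 and 25 and N = 32, each packet carries only 64 payload bytes instead of the 1024 bytes that a full 32-frame E1 block would occupy.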

CAS mode

As shown in Figure 21-9, the TDM stream has a CAS multi-frame structure: each multi-frame consists of 16 base frames, and time slot 16 of each base frame carries the signaling and multi-frame synchronization. CESoPSN processes the TDM flow in units of CAS multi-frames. The data of the specified time slots (time slots 2, 4, and 25) in each base frame is encapsulated into the TDM payload of the PWE3 packet in order, from the first base frame of the multi-frame to the last. Finally, the corresponding signaling information is appended after the time slot data and the packet is sent to the PSN. When the PE device at the other side of the pseudo wire receives the packet, it decapsulates the TDM payload, inserts the data into the specified time slots (time slots 2, 4, and 25), inserts the signaling into time slot 16, and sends the result to the TDM port.

Figure 21-9 CESoPSN sketch map of CAS

In this mode, PCT = number of base frames in the multi-frame × frame time = number of base frames in the multi-frame ÷ frame rate. Take the E1 CAS multi-frame as an example: the E1 rate is 2.048 Mbps, each frame contains 32 time slots, 8000 frames are transmitted every second, the frame time is 0.125 ms, and each CAS multi-frame contains 16 base frames. Therefore, the delay to encapsulate one PWE3 packet is 16 × 0.125 ms = 2 ms.

If the multi-frame contains many base frames, the packet encapsulation delay is large and may exceed the delay limit required by the system. The CAS multi-frame fragmentation mode solves this problem, as shown in Figure 21-10. The multi-frame is divided into N sub multi-frames, each containing M base frames. CESoPSN processes the TDM flow in units of sub multi-frames, and each sub multi-frame corresponds to one PWE3 packet. The signaling information is appended to the last sub multi-frame of the multi-frame. The FRG field in the control word is set to 01 for the packet carrying the first sub multi-frame, to 10 for the packet carrying the last sub multi-frame, and to 11 for the packets carrying the middle sub multi-frames. The PE device at the other side of the pseudo wire decapsulates the time slot data and the signaling according to the FRG field in the control word of the PWE3 packet.

In fragmentation mode, PCT = number of base frames in the sub multi-frame × frame time = number of base frames in the sub multi-frame ÷ frame rate. Take the E1 CAS multi-frame as an example: the E1 rate is 2.048 Mbps, each frame contains 32 time slots, 8000 frames are transmitted every second, the frame time is 0.125 ms, and each sub multi-frame contains 4 base frames. Therefore, the delay to encapsulate one PWE3 packet is 4 × 0.125 ms = 0.5 ms. The packet encapsulation delay increases with the number of base frames in the sub multi-frame.
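The fragmentation rules can be sketched as a function that splits one 16-frame E1 CAS multi-frame into sub multi-frames of M base frames, assigns each its FRG value, and marks the last fragment as the one carrying the signaling. The requirement that M divide the multi-frame length, the data structure, and the function names are assumptions for illustration.

```python
from typing import List, NamedTuple

FRG_WHOLE, FRG_FIRST, FRG_LAST, FRG_MIDDLE = 0b00, 0b01, 0b10, 0b11
FRAME_TIME_MS = 0.125  # one E1 base frame


class SubMultiframe(NamedTuple):
    frg: int                # FRG value for the control word
    base_frames: List[int]  # indices of the base frames carried in this packet
    has_signaling: bool     # signaling is appended to the last fragment only


def segment_cas_multiframe(base_frames: int = 16, m: int = 4) -> List[SubMultiframe]:
    """Split one CAS multi-frame into sub multi-frames of M base frames each."""
    if m < 1 or base_frames % m:
        raise ValueError("M must divide the number of base frames")
    n = base_frames // m
    result = []
    for i in range(n):
        if n == 1:
            frg = FRG_WHOLE   # no fragmentation
        elif i == 0:
            frg = FRG_FIRST
        elif i == n - 1:
            frg = FRG_LAST
        else:
            frg = FRG_MIDDLE
        frames = list(range(i * m, (i + 1) * m))
        result.append(SubMultiframe(frg, frames, i == n - 1))
    return result


def sub_multiframe_pct_ms(m: int) -> float:
    """PCT = base frames per sub multi-frame x frame time."""
    return m * FRAME_TIME_MS
```

With M = 4 the FRG sequence is 01, 11, 11, 10, and each packet waits only 0.5 ms instead of the 2 ms needed for a whole multi-frame.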

Figure 21-10 CESoPSN segmenting sketch map of CAS

HDLC Mode

The SAToP and CESoPSN circuit emulation modes are called stream modes (transparent transmission modes) because the packet carries the original bit stream; their purpose is to carry the TDM bit stream unchanged between two TDM devices.

In HDLC mode, however, only the HDLC frames present in the TDM bit stream are transmitted, as shown in Figure 21-11. Whether or not the TDM flow is framed, it is processed in units of HDLC frames; that is, the device searches the bit stream for the start and end of each HDLC frame. When a complete HDLC frame has been received, its data is encapsulated into the TDM payload and sent to the PSN. After the PE device at the other side of the pseudo wire decapsulates the PWE3 packet, the payload is re-encapsulated as an HDLC frame and inserted into the TDM bit stream.



Figure 21-11 Sketch map of HDLC mode

In this mode, PCT is not meaningful: the number of PWE3 packets generated equals the number of HDLC frames sent in the TDM flow.
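The frame search described above can be sketched at bit level. This assumes standard bit-oriented HDLC framing: 0x7E flags, a stuffed 0 after every five consecutive 1s, octets sent least-significant bit first, and seven or more 1s treated as an abort. The FCS is not checked, and carrying the frame contents without flags as the PWE3 payload is an assumption about Maipu's behaviour; the function names are hypothetical.

```python
from typing import List

FLAG_BITS = [0, 1, 1, 1, 1, 1, 1, 0]  # HDLC flag 0x7E


def bytes_to_bits(data: bytes) -> List[int]:
    """HDLC sends each octet least-significant bit first."""
    return [(byte >> i) & 1 for byte in data for i in range(8)]


def bits_to_bytes(bits: List[int]) -> bytes:
    return bytes(sum(bit << j for j, bit in enumerate(bits[i:i + 8]))
                 for i in range(0, len(bits) - len(bits) % 8, 8))


def hdlc_encode(frame: bytes) -> List[int]:
    """Egress side: re-encapsulate a payload as an HDLC frame (flags + bit stuffing)."""
    bits, ones = [], 0
    for bit in bytes_to_bits(frame):
        bits.append(bit)
        ones = ones + 1 if bit else 0
        if ones == 5:  # insert a 0 after five consecutive 1s
            bits.append(0)
            ones = 0
    return FLAG_BITS + bits + FLAG_BITS


def hdlc_extract(bits: List[int]) -> List[bytes]:
    """Ingress side: find HDLC frames in a TDM bit stream; one PWE3 payload each."""
    frames: List[bytes] = []
    current: List[int] = []
    ones = 0
    in_frame = False
    for bit in bits:
        if bit:
            ones += 1
            current.append(1)
            if ones >= 7:  # abort sequence: discard the partial frame
                in_frame, current = False, []
            continue
        if ones == 5:  # stuffed zero: remove it
            ones = 0
            continue
        if ones == 6:  # 0111111 followed by this 0 is a flag
            body = current[:-7]  # strip the flag's leading 0 and six 1s
            if in_frame and body and len(body) % 8 == 0:
                frames.append(bits_to_bytes(body))
            in_frame, current, ones = True, [], 0
            continue
        ones = 0
        current.append(0)
    return frames
```

Consecutive flags produce an empty body and are skipped, so idle flag fill between frames generates no PWE3 packets.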

Technology of Recovering Clock from Circuit Emulation Packets

Circuit emulation originated in ATM networks, which use virtual circuits to encapsulate circuit service data into ATM cells for transmission over the ATM network. The principle was later carried over to Metro Ethernet, where Ethernet provides emulated transmission of circuit-switched services such as TDM. Circuit emulation is thus the mechanism for transparently transmitting circuit-switched services over a packet network: it encapsulates the TDM service with a dedicated circuit emulation header and uses some mechanism to convey the clock across the packet switched network. The physical-layer device that performs this encapsulation is called a framer or mapper and can be connected directly to the existing TDM network.

Clock recovery from circuit emulation packets uses an adaptive (auto-sensing) algorithm to recover clock synchronization information from the packets. The basic principle of the algorithm is described below.



Figure 21-12 Sketch map of auto-sensing clock recovering

As shown in Figure 21-12, the gateway (IWF) at the clock source side periodically sends timing information to the peer gateway, carried in the T1/E1 emulation packets. At the other side, the gateway extracts the timestamps from the packets and recovers the service clock (f-service) algorithmically.

The core idea of the algorithm is as follows. The IWF device on the left sends packets to the destination IWF device according to its own source clock. The destination IWF device buffers the packets in a queue and plays them out using its own local clock. If the source clock and the destination's local clock differ, even very slightly, the depth of the buffer queue at the destination changes. The destination can therefore judge from the queue depth whether its local clock matches the source clock. If the queue depth keeps increasing, the local clock is slower than the source clock and must be sped up; if the queue depth keeps decreasing, the local clock is faster than the source clock and must be slowed down. This is a negative feedback mechanism: once it stabilizes, the local clock at the destination equals the source clock in the long run, and frequency synchronization between the two IWF devices across the IP network is achieved.

An analogy helps explain the adaptive algorithm. The IWF device at the clock source is like the inlet of a pool, feeding packets into the pool at a certain clock frequency; the IWF device at the destination is like the outlet. The water in the pool is kept at a constant level by adjusting the outlet valve to match the inlet, and in this way the two devices are synchronized.

The difficulty in implementing the adaptive algorithm is that IP networks inherently have packet delay variation (PDV). Delay jitter also changes the depth of the buffer queue, and the destination IWF device cannot tell whether a change is caused by a frequency difference or by network delay jitter, so it may respond incorrectly. However, delay jitter in an IP network is not cumulative, so statistical methods such as averaging can be used to filter it out.
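The negative feedback loop, together with the averaging that filters out non-cumulative delay jitter, can be illustrated by a small discrete-time simulation. This is a teaching sketch, not Maipu's algorithm: the proportional gain `kp`, the moving-average `window`, the uniformly distributed delay variation, and the abstract "units of data per step" are all assumptions chosen for illustration.

```python
import random
from collections import deque
from typing import List


def adaptive_clock_recovery(f_src: float, f_nominal: float, steps: int,
                            kp: float = 0.05, window: int = 16,
                            jitter: float = 0.0, seed: int = 1) -> List[float]:
    """Simulate buffer-depth (adaptive) clock recovery; return f_local per step.

    Each step the source IWF delivers f_src units of data and the destination
    plays out f_local units. PDV shifts when data arrives but does not
    accumulate, so it is modelled as the change in a bounded random delay.
    """
    rng = random.Random(seed)
    target = 100.0                 # desired buffer depth (arbitrary units)
    depth = target
    recent = deque(maxlen=window)  # recent depths for moving-average filtering
    f_local = f_nominal            # destination starts at its nominal clock
    prev_delay = 0.0
    trace = []
    for _ in range(steps):
        delay = rng.uniform(-jitter, jitter)
        arrived = f_src - (delay - prev_delay)  # more delay means fewer arrivals now
        prev_delay = delay
        depth += arrived - f_local
        recent.append(depth)
        avg_depth = sum(recent) / len(recent)   # averaging filters out the PDV
        # Negative feedback: a filling queue means the local clock is too slow,
        # so speed it up; a draining queue means slow it down.
        f_local = f_nominal + kp * (avg_depth - target)
        trace.append(f_local)
    return trace


trace = adaptive_clock_recovery(f_src=1002.0, f_nominal=1000.0, steps=2000)
print(round(trace[-1], 6))  # 1002.0
```

In steady state the buffer depth settles at a constant offset above the target, and it is this offset that holds the local clock at the source rate: the "constant water level" of the pool analogy. Adding `jitter` makes f_local wander slightly around the source frequency, but its long-term average still matches it.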

PWE3 Typical Application

Figure 21-13 The connection and aggregation of the MAN private line

As shown in Figure 21-13, TDM circuit emulation can be used to connect and aggregate MAN private lines. For example, the PBX switches of the branches in a district can be interconnected to provide E1 voice access and enable communication within the district; the district can also be connected to the PSTN in the same way. Because TDM circuit emulation emulates the TDM physical transmission and does not perceive the actual services carried in the E1, DDN, FR, and ATM services over E1 can all be transmitted transparently in TDM circuit emulation mode.

The TDMoIP Gateway in the figure is the PWE3 device. The PWE3 packet formats on paths ①②③④ are shown in Figure 21-14.

Figure 21-14 PWE3 packet



Performance Test Result

Figure 21-15 Performance test environment

To test the reliability of PWE3 circuit emulation, the test environment shown in Figure 21-15 was set up. The PWE3 function was enabled on the PEs, various background flows were generated with SmartBits devices to simulate network congestion and bursty bandwidth changes, and Router A then sent test packets to Router B. The result was that, with or without background traffic, Router B received all the data from Router A.



Loopback Detection Technology

Introduction to Loopback Detection

Ethernet is a broadcast network. When the destination of a packet cannot be identified, the switch broadcasts the packet within the VLAN. If there is a loop in the network, the packet is forwarded repeatedly until the network bandwidth is exhausted and communication fails. With loopback detection enabled on a port, the port periodically sends Loopback packets to check whether there is a loop in the network. When the port receives a Loopback packet sent by the local device, the device determines the source port from the packet, sets the port to ERR-DISABLE, and prints log information.

This section describes the principle of the loopback detection protocol and how it is implemented.

Related Terms of Loopback Detection Protocol

LBD: Loopback detection

Introduction to Loopback Detection Protocol

The loopback detection protocol is used to detect loops on a single port.



Ethernet is a multipoint-to-multipoint network as well as a broadcast network. When the destination address of a packet cannot be identified, the switch broadcasts the packet to all stations. Therefore, if there is a loop in the network, packets are forwarded repeatedly until the network bandwidth is exhausted and communication fails.

There are two kinds of loop. In the first, the loop is between different ports of the switch; for example, two ports of the same switch are connected by mistake. In the second, the loop is on a single port of the switch; for example, the port is connected to a bridge device whose Ethernet port is looped. STP can detect the first kind, but it cannot detect the second, which requires other methods.

Port loopback detection works by periodically sending a special packet. Normally, the device that receives the packet drops it. If there is a loop, the packet returns to the source port; by comparing the received packet with the one that was sent, the device can determine whether a loopback exists.

Format of Loopback Detection Protocol Packet

The format of the Ethernet loopback detection protocol packet is as follows:

The format of loopback detection packet

Fields of Loopback Detection Packet

DMAC field (6 bytes): the destination MAC address of the packet;

SMAC field (6 bytes): the source MAC address of the packet;

QTag field (4 bytes): if a VLAN is configured, the tag is four bytes; otherwise, there is no tag field;

Ethernet type field (2 bytes): the protocol type number of the loopback packet, 0x9000;

Skip count field (2 bytes): usually set to 0x0000;

Message type field (2 bytes): the message type; 0x0100 means a Reply message and 0x0200 means Forward_data;

Port Index field (2 bytes): the number of the port that sends the loopback packet.
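The field list above can be made concrete with a short Python sketch that packs the fields in order. It assumes every 2-byte field is sent in network (big-endian) byte order, uses the standard 802.1Q TPID 0x8100 for the optional QTag, and leaves padding to the 60-byte Ethernet minimum and the FCS to the sending hardware. The function names and the example MAC address (taken from the 00:00:5E:00:53:xx documentation range) are hypothetical and are not part of the MyPower software.

```python
import struct

ETHERTYPE_LOOPBACK = 0x9000  # Ethernet type field value given in the field list
MSG_REPLY = 0x0100           # Message type "Reply"
MSG_FORWARD_DATA = 0x0200    # Message type "Forward_data"
TPID_8021Q = 0x8100          # standard 802.1Q tag protocol identifier


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    raw = bytes(int(part, 16) for part in mac.split(":"))
    if len(raw) != 6:
        raise ValueError(f"not a 6-byte MAC address: {mac}")
    return raw


def build_lbd_frame(dmac: str, smac: str, port_index: int,
                    vlan_id=None, msg_type: int = MSG_REPLY,
                    skip_count: int = 0x0000) -> bytes:
    """Assemble DMAC | SMAC | [QTag] | EtherType | Skip count | Msg type | Port Index.

    Assumption: 2-byte fields are big-endian; padding and FCS are not added here.
    """
    frame = mac_to_bytes(dmac) + mac_to_bytes(smac)
    if vlan_id is not None:
        # QTag: TPID followed by TCI (priority 0, DEI 0, 12-bit VLAN ID)
        frame += struct.pack("!HH", TPID_8021Q, vlan_id & 0x0FFF)
    frame += struct.pack("!HHHH", ETHERTYPE_LOOPBACK, skip_count,
                         msg_type, port_index)
    return frame


if __name__ == "__main__":
    base_mac = "00:00:5e:00:53:01"  # hypothetical switch base MAC
    print(build_lbd_frame(base_mac, base_mac, port_index=1).hex())
    print(build_lbd_frame(base_mac, base_mac, port_index=1, vlan_id=10).hex())
```

The untagged frame is 20 bytes before padding and the tagged frame is 24, the difference being the 4-byte QTag.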



Workflow of Loopback Detection

The workflow of loopback detection is as follows:

Each port configured with loopback detection sends detection packets at a fixed interval. Both the DMAC and the SMAC of the packet are a MAC address of the switch (derived from the base MAC), the Skip count is 0, the Message type is 0x0100, and the Port Index is the port number. If the port does not belong to any VLAN, it sends only one untagged loopback packet. If the port belongs to one or more VLANs, it sends one untagged loopback packet plus one tagged loopback packet for each VLAN in which the port is configured as tagged.

When a port that is not configured with loopback detection receives a loopback packet, it drops the packet. Otherwise, the switch checks whether the DMAC and SMAC of the packet are MAC addresses of the device. If they are, the switch reports the port loopback to the user; if the port is in the controlled state, the switch also shuts the port down. If the port is not in the controlled state, the switch does not shut it down and only reports the loopback to the user.
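The send and receive rules of this workflow can be summarized as a small decision sketch in Python. `LbdPort`, its fields, and the returned action strings are hypothetical names chosen for illustration. The sketch also assumes that a probe whose MAC addresses do not belong to this device is simply dropped, which the manual does not state explicitly.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LbdPort:
    """Hypothetical per-port state; field names are illustrative."""
    lbd_enabled: bool                 # loopback detection configured on the port
    controlled: bool = False          # "controlled state": shut down on loopback
    tagged_vlans: List[int] = field(default_factory=list)
    shut_down: bool = False


def probes_to_send(port: LbdPort) -> List[Optional[int]]:
    """One untagged probe (None) plus one tagged probe per tagged VLAN."""
    if not port.lbd_enabled:
        return []
    return [None] + list(port.tagged_vlans)


def on_probe_received(port: LbdPort, dmac: str, smac: str,
                      device_macs: set) -> str:
    """Apply the receive-side rules and return the action taken."""
    if not port.lbd_enabled:
        return "drop"
    if dmac in device_macs and smac in device_macs:
        if port.controlled:
            port.shut_down = True
            return "alarm+shutdown"
        return "alarm"
    # Assumption: a probe that did not originate on this device is ignored.
    return "drop"
```

For a controlled port tagged in VLANs 10 and 20, `probes_to_send` yields `[None, 10, 20]`, and receiving one of its own probes back both raises the alarm and marks the port as shut down.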

Typical Application

When using loopback detection, ensure that the corresponding port is configured with the loopback detection function and works in the same detection mode.

This section configures a basic loopback detection setup for reference.

The network topology is as follows:

Figure 22-1 Application instance of the loopback detection

Illustration



Ethernet port 0/1 of Switch1 is connected to Ethernet port 0/2 of Switch2 with a network cable, and ports 0/3 and 0/4 of Switch2 are connected to each other with another network cable. Add port 0/1 of Switch1 and ports 0/2, 0/3, and 0/4 of Switch2 to VLAN 10 in tagged mode. Use the loopback detection function on Switch1 to check whether there is a loop on Switch2.

The configuration of Switch1:

Command / Description

switch1(config)#loopback-detection enable
    Enable port loopback detection globally

switch1(config)#port 0/1
    Enter the port configuration mode

switch1(config-port-0/1)#port hybrid tagged vlan 10
    Add port 0/1 to VLAN 10 in tagged mode

switch1(config-port-0/1)#loopback-detection enable interval-time 10
    Set the interval for sending loopback detection packets on port 0/1 to 10 s

switch1(config-port-0/1)#loopback-detection enable
    Enable port loopback detection

switch1(config-port-0/1)#exit
    Complete the loopback detection configuration

The configuration of Switch2:

Command / Description

switch2(config)#port 0/2-0/4
    Enter the port configuration mode

switch2(config-port-range)#port hybrid tagged vlan 10
    Add ports 0/2, 0/3, and 0/4 to VLAN 10 in tagged mode



Super VLAN Technology

This chapter describes the Super-VLAN technology and application.

Main contents:

Super-VLAN theory

Super-VLAN realization

Typical application

Super-VLAN Theory

Super-VLAN: Super-VLAN, also called VLAN aggregation, associates multiple Sub-VLANs. The IP address is configured on the Super-VLAN interface. Each Sub-VLAN is a separate broadcast domain, and different Sub-VLANs are isolated from each other. To enable communication between different Sub-VLANs, the ARP proxy function is needed: the ARP proxy forwards and processes ARP request and response packets, which enables L3 communication between ports that are isolated at L2. Users in the Sub-VLANs use the IP address of the Super-VLAN as their gateway for L3 communication. This saves IP addresses, because the Sub-VLANs share one gateway address instead of each needing its own.

Sub-VLAN: A VLAN that is added to a Super-VLAN becomes a Sub-VLAN. Communication within one Sub-VLAN is ordinary L2 communication. A Sub-VLAN cannot be bound directly to an L3 interface for L3 forwarding; instead, it performs L3 communication via the Super-VLAN.



Super-VLAN Realization

Figure 23-1 Super-VLAN diagram

To enable communication between different Sub-VLANs, the ARP proxy runs on the Super-VLAN to process the received ARP request and response packets, while the L3 switch serves as the intermediate device that forwards packets between Sub-VLANs.

PCs in different Sub-VLANs (such as PC1 <-> PC3 in the figure above) are in different broadcast domains, so their traffic must be relayed by a router into the broadcast domain of the destination PC. Configuring the ARP proxy on the Super-VLAN enables this communication via the Super-VLAN. The ARP proxy works as follows:

Suppose that PC1 in Sub-VLAN_1 wants to communicate with PC3 in Sub-VLAN_2. PC1 sends an ARP request for the MAC address of PC3. All PCs in Sub-VLAN_1, as well as the Super-VLAN_1 interface, receive the request.

After the Super-VLAN_1 interface receives the ARP request, it performs checks (for example, whether the ARP proxy is configured) to decide whether the request needs to be forwarded. If so, it replaces the sender MAC address in the ARP packet with the MAC address of the interface and forwards the request to the other Sub-VLANs.

After PC3 in Sub-VLAN_2 receives the request, it sends an ARP reply. Because the source MAC address of the request it received is the MAC address of the interface, the destination MAC address of PC3's reply is the interface MAC address, so the switch interface receives the ARP reply.

After the switch receives the ARP reply, it processes it and decides whether to send an ARP reply to the original requester (PC1). When it does, it sets the source MAC address of that reply to the MAC address of the interface.

After PC1 receives the ARP reply, it sends the IP packets destined for PC3 to the Super-VLAN_1 interface, and the switch forwards them to PC3 via the Super-VLAN_1 interface.

Here, the communication path between PC1 and PC3 is PC1 <-> L3 switch <-> PC3. On PC1, the IP address of PC3 resolves to the interface MAC address of the L3 switch; similarly, on PC3, the IP address of PC1 resolves to the interface MAC address of the L3 switch. When the ARP packet from PC1 reaches Super-VLAN_1, the switch writes PC1's information to the ARL (Address Resolution Logic) table. When another PC later requests the address of PC1, the Super-VLAN_1 interface searches the ARL table directly. If a route to PC1 is available and the requester is in a different Sub-VLAN from PC1, the interface sends the ARP reply directly to the requester, without forwarding the ARP request into all the other VLANs.

After the Super-VLAN_1 interface receives PC3's ARP reply, it searches the ARL table by the destination IP address of the ARP packet. From the recorded binding of IP address, VLAN ID, and port, the switch learns which VLAN PC1 is in and through which port packets for PC1 should be sent. As a result, the packet does not need to be forwarded into all the other VLANs.
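The ARP proxy decisions described above can be modeled in a simplified Python sketch. `SuperVlanArpProxy`, `ArlEntry`, and their methods are hypothetical names. The sketch keeps an ARL-like table keyed by IP address, assumes a route to every learned host is available, and does not model the reply relay path or any further checks the switch may perform.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass
class ArlEntry:
    ip: str
    mac: str
    sub_vlan: int
    port: str


class SuperVlanArpProxy:
    """Hypothetical model of the ARP proxy on one Super-VLAN interface."""

    def __init__(self, interface_mac: str, sub_vlans: Iterable[int],
                 proxy_enabled: bool = True):
        self.interface_mac = interface_mac
        self.sub_vlans = set(sub_vlans)
        self.proxy_enabled = proxy_enabled
        self.arl: Dict[str, ArlEntry] = {}  # IP -> binding learned from ARP

    def learn(self, ip: str, mac: str, sub_vlan: int, port: str) -> None:
        self.arl[ip] = ArlEntry(ip, mac, sub_vlan, port)

    def handle_request(self, sender_ip: str, sender_mac: str, target_ip: str,
                       in_vlan: int, in_port: str) -> Tuple[str, object]:
        """Return ("reply", mac), ("forward", [vlans]) or ("ignore", None)."""
        if not self.proxy_enabled or in_vlan not in self.sub_vlans:
            return ("ignore", None)
        self.learn(sender_ip, sender_mac, in_vlan, in_port)
        target = self.arl.get(target_ip)
        if target is None:
            # Unknown target: re-send the request into the other Sub-VLANs,
            # with the interface MAC written as the sender MAC.
            return ("forward", sorted(self.sub_vlans - {in_vlan}))
        if target.sub_vlan != in_vlan:
            # Known target in another Sub-VLAN: answer directly with our MAC.
            return ("reply", self.interface_mac)
        return ("ignore", None)  # same Sub-VLAN: the target answers itself
```

The first request for an unknown PC3 is forwarded into the other Sub-VLANs; once PC3's binding is in the table, later requests from another Sub-VLAN are answered directly with the interface MAC, which is the flooding saving the text describes.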

Typical Application

Create Super-VLAN 10 and the Sub-VLANs VLAN 5, VLAN 6, and VLAN 8. Port 0/2 and port 0/3 belong to VLAN 5; port 0/0/4 and port 0/0/5 belong to VLAN 6; port 0/0/6 and port 0/0/7 belong to VLAN 8. The VLANs are isolated from each other at L2, so all Sub-VLANs use the L3 interface of the Super-VLAN as the gateway to communicate with the outside, which also enables L3 communication between different Sub-VLANs.



Figure 23-2 Networking

Add port 0/2 and port 0/0/3 to VLAN 5, port 0/0/4 and port 0/0/5 to VLAN 6, and port 0/0/6 and port 0/0/7 to VLAN 8. Create Super-VLAN 10, enable the ARP proxy function, and add VLAN 5, VLAN 6, and VLAN 8 to Super-VLAN 10. Create the VLAN interface of Super-VLAN 10 (interface vlan 10) and configure an appropriate IP address on it. This completes the configuration of the basic Super-VLAN functions.



L3 Multicast Technology

This chapter describes IP multicast theory and the related multicast protocols. IGMP (Internet Group Management Protocol) manages the group membership relationship between hosts and the route/switch device. Dynamic multicast routing protocols maintain a consistent multicast routing table across the whole network. The common multicast module maintains a multicast forwarding table computed from the multicast routing table. When multicast service packets are received, the route/switch device searches the multicast forwarding table to decide whether and how to forward them.

Note: The term "route/switch device" used in this chapter means a router or an L3 switch with the routing function.

Main contents:

Introduction to multicast

Related terms of IGMP Protocol

Introduction to IGMP Protocol

Related terms of PIM-SM protocol

Introduction to PIM-DM protocol

Introduction to MSDP protocol

Introduction to Multicast

When information (data, voice, or video) is destined for a group of users in the network, several transmission modes can be used. In unicast mode, a separate data transmission path is set up for each user. In broadcast mode, the information is transmitted to all users in the network, and every user receives it whether or not it is needed. Both modes waste a lot of bandwidth, and broadcast mode is also detrimental to the security and confidentiality of the information. IP multicast technology solves these problems effectively: the multicast source sends the information only once. Simply put, IP multicast is a bandwidth-saving technology that delivers a single information flow to multiple receivers at the same time, reducing network traffic.

If the network contains route/switch devices that do not support multicast, a multicast route/switch device can use tunnel mode: it encapsulates the multicast packets in unicast IP packets and sends them to the neighboring multicast route/switch device, which removes the unicast IP header and continues the multicast transmission until the packets reach the destination.

Related Terms of IP Multicast

IP multicasting: The concept of IP multicasting is defined in RFC 1112 and RFC 2236; it describes how to send packets to a host group. A host group is a set of devices that share one IP (multicast) address. Like IP unicast, IP multicast uses a "best-effort" delivery mechanism. This means that, for every host in the group, neither delivery nor correct ordering of the packets is guaranteed.

multicast address: The address space reserved for IP multicast is the Class D range, from 224.0.0.0 to 239.255.255.255. The high-order four bits of these addresses are always 1110.

multicast distribution tree: In the multicast model, the source host can send information to any host that has joined the multicast group. The path that IP multicast service packets take through the network is called the multicast distribution tree, of which there are two kinds: the source tree and the shared tree.

source tree: The root of the tree is the multicast information source, and the branches form a distribution tree that reaches the receivers through the network. A source tree that follows the shortest paths through the network is called the shortest path tree (SPT).

shared tree: The shared tree does not use the information source as its root; instead, it uses a selected point in the network as the common root, called the Rendezvous Point (RP).



reverse path forwarding: When a multicast service packet reaches the route/switch device, the device performs the RPF check on it. If the packet passes the check, it is forwarded; otherwise, it is discarded.

multicast cache: Also called the multicast route entry. It contains the valid input and output interface information for multicast service packets and serves as the basis of the RPF check. The multicast cache is generated and updated by the multicast routing protocol.

IP Multicast Address

An IP multicast address identifies an IP multicast group. IANA allocates the Class D addresses, from 224.0.0.0 to 239.255.255.255, to multicast. The first four bits of every IP multicast address are 1110.

Distribution of IP Multicast Addresses

The IP multicast address space is divided as follows. Addresses 224.0.0.0 to 224.0.0.255 are reserved by IANA; 224.0.0.0 itself is reserved, and the other addresses in this range are used by routing protocols and by topology discovery and maintenance protocols. Regardless of TTL, the route/switch device never forwards packets addressed to this range, so they are transmitted only within the local LAN. Addresses 224.0.1.0 to 238.255.255.255 are user multicast addresses, valid across the whole network. Addresses 239.0.0.0 to 239.255.255.255 are locally managed multicast addresses (administratively scoped addresses), valid only within a specified local range.
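The three ranges above translate directly into a classification function. This is a sketch using Python's standard `ipaddress` module; the function name and the returned labels are hypothetical, and it follows only the three-way split given in this section.

```python
import ipaddress

CLASS_D = ipaddress.IPv4Network("224.0.0.0/4")          # 224.0.0.0 - 239.255.255.255
LOCAL_RESERVED = ipaddress.IPv4Network("224.0.0.0/24")  # 224.0.0.0 - 224.0.0.255
ADMIN_SCOPED = ipaddress.IPv4Network("239.0.0.0/8")     # 239.0.0.0 - 239.255.255.255


def multicast_scope(addr: str) -> str:
    """Classify an IPv4 address into the ranges described above."""
    ip = ipaddress.IPv4Address(addr)
    if ip not in CLASS_D:
        return "not-multicast"
    if ip in LOCAL_RESERVED:
        return "reserved-local"   # never forwarded by route/switch devices
    if ip in ADMIN_SCOPED:
        return "admin-scoped"     # valid only within a specified local range
    return "user-global"          # 224.0.1.0 - 238.255.255.255
```

For example, 224.0.0.5 is classified as reserved-local, 224.0.1.1 as user-global, and 239.1.2.3 as admin-scoped.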

Mapping from IP Multicast Address to MAC Address

IANA allocates the MAC addresses from 01:00:5E:00:00:00 to 01:00:5E:7F:FF:FF to multicast, which requires mapping the 28-bit IP multicast address space onto a 23-bit MAC address space. The mapping copies the low 23 bits of the multicast IP address into the low 23 bits of the MAC address, as follows:



Mapping from multicast address to MAC address

Only 23 of the low-order 28 bits of the IP multicast address are mapped into the MAC address; the remaining 5 bits are lost. As a result, 32 (2^5) IP multicast addresses map to the same MAC address.
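A Python sketch of the 23-bit mapping shows why 32 group addresses share one MAC address: the high 5 of the 28 variable bits are simply discarded. The function name is hypothetical.

```python
import ipaddress


def multicast_ip_to_mac(addr: str) -> str:
    """Copy the low 23 bits of a Class D address into 01:00:5E:00:00:00."""
    ip = int(ipaddress.IPv4Address(addr))
    if ip >> 28 != 0b1110:
        raise ValueError(f"{addr} is not an IP multicast address")
    mac = 0x01005E000000 | (ip & 0x7FFFFF)  # keep only the low 23 bits
    # Render the 48-bit value as six colon-separated hex bytes.
    return ":".join(f"{(mac >> shift) & 0xFF:02x}" for shift in range(40, -1, -8))
```

For example, 224.1.1.1, 225.1.1.1, and 224.129.1.1 all map to 01:00:5e:01:01:01, so a host interface filtering on that MAC address can receive traffic for groups it has not joined, and the IP layer must filter it out.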

IP Multicast Features

In ordinary TCP/IP routing, a packet travels from a source address to a single destination address, hop by hop through the IP network. In the IP multicast environment, however, the destination of a packet is not a single host but a group, identified by a group address. All information receivers join the group, and once they have joined, data sent to the group address is delivered to them immediately; every member of the group receives the packet. Therefore, a host must first become a member of the multicast group to receive its packets, whereas the sender does not need to be a member. In the multicast environment, data is sent to all members of the group, and hosts that are not members do not receive it.

IP multicast has the following features:

1. There is no limit on the location or the number of group members. A host can join or leave a multicast group at any time, members can be anywhere on the Internet, and one host can be a member of more than one multicast group at the same time;

2. A host can send packets to a multicast group even if it is not a member of the group. As with unicast, sending a multicast packet to all hosts in a group requires sending only one packet, addressed to the group address;

3. The route/switch device does not need to store the membership of every host. It only needs to know whether any host on the segment attached to a physical interface belongs to a given multicast group, while each host keeps track of the multicast groups it has joined.



IP Multicast Routing Protocol

Multicast protocols fall into two parts: the Internet Group Management Protocol (IGMP), which is the basic signaling protocol of IP multicast, and the multicast routing protocols that select IP multicast paths (such as DVMRP, PIM-SM, and PIM-DM).

Internet Group Management Protocol

IGMP defines the mechanism for setting up and maintaining multicast group membership between hosts and route/switch devices (or between route/switch devices), and it is the basis of IP multicast as a whole. IGMP conveys group membership information to the route/switch device: through IGMP, the route/switch device learns whether the subnets connected to it have members of a multicast group. The specified application program can learn which data source sends to which multicast group. When a user in a LAN announces via IGMP that it has joined a multicast group, the multicast route/switch device on that LAN propagates this information via the multicast routing protocol, and the LAN is finally added to the multicast tree as a branch. After a host becomes a member of a group and starts receiving its data, the route/switch device periodically queries the group to check whether any member is still participating. As long as one host participates, the route/switch device continues forwarding data. When all users in the group leave the multicast group, the related branch is deleted from the multicast tree.

Multicast Routing Protocol

A multicast group address is virtual, so, unlike unicast, packets cannot be routed from the data source directly to a specific destination address. A multicast application sends packets to a group of receivers that want the data (a multicast address), not to a single receiver (a unicast address).

The multicast routing protocol sets up a loop-free data transmission path from the data source to multiple receivers; its task is to construct the multicast distribution tree. A multicast route/switch device can use several methods to set up this transmission path, that is, the distribution tree. Depending on the network situation, multicast routing protocols are divided into two types: dense mode and sparse mode.

1. Multicast in dense mode

Dense-mode multicast routing protocols are suitable for small networks. They assume that every subnet in the network has at least one receiver interested in the multicast group, so multicast packets are flooded to all nodes in the network, consuming related resources (such as bandwidth and the CPU of route/switch devices). To reduce this consumption of precious network resources, dense-mode protocols prune the branches that have no multicast data to forward and keep only the branches that contain receivers. So that a receiver on a pruned branch can still receive the multicast data flow when it needs it, a pruned branch periodically returns to the forwarding state. To avoid waiting for a pruned branch to return to the forwarding state, dense-mode protocols use the graft mechanism to rejoin the multicast distribution tree actively. Periodic flooding and pruning are the characteristic features of dense-mode protocols. Generally speaking, the forwarding path of packets in dense mode is a "source tree" (a tree with the source as the root and the members as the leaves). Typical dense-mode multicast routing protocols are PIM-DM and DVMRP.

2. Multicast in sparse mode

Sparse-mode multicast assumes that hosts do not need to receive multicast packets unless they explicitly request them, and packets are forwarded only on request. To receive the data flow of a specified group, a receiver must send a join message toward the Rendezvous Point of the group, and the path of the join message becomes a branch of the shared tree. Multicast packets are sent to the Rendezvous Point and then forwarded along the shared tree, which has the Rendezvous Point as the root and the members as the leaves. To prevent a branch of the shared tree from being deleted because it has not been refreshed, sparse-mode protocols periodically send join messages along the branches to maintain the multicast distribution tree.

To send data to a specified group address, the sender first registers with the Rendezvous Point and then sends the data to it. When the data reaches the Rendezvous Point, the multicast packets are copied and transmitted to the receivers along the distribution tree. Copying happens only where the distribution tree branches, and it repeats automatically until the packets reach their destinations.

The typical sparse-mode multicast routing protocol is PIM-SM.

Forwarding of IP Multicast Packets

When forwarding a unicast packet, the route/switch device does not care about the unicast source address, only about the destination address, which determines the interface to which the packet is forwarded. In multicast, a packet is sent to a group of receivers identified by one logical address. After receiving a multicast service packet, the route/switch device must determine the upstream direction (toward the multicast source) and the downstream directions (away from the multicast source) from the source and destination addresses. This process is called RPF (Reverse Path Forwarding).

The RPF process uses the existing unicast routing table to determine the upstream and downstream neighbor nodes. A packet is forwarded downstream only when it arrives on the interface that leads to the upstream neighbor (called the RPF interface). RPF ensures that packets are forwarded correctly according to the multicast routing configuration and avoids loops caused by various reasons. Avoiding loops is an important problem in multicast routing. The core of RPF is the RPF check: after receiving a multicast packet, the route/switch device first performs the RPF check, and the packet is forwarded only if it passes; otherwise, it is dropped. The RPF check proceeds as follows:

1. The route/switch device looks up the RPF interface for the multicast source or for the RP in the unicast routing table. When the source tree is used, it looks up the RPF interface for the multicast source; when the shared tree is used, it looks up the RPF interface for the RP. The RPF interface for an address is the outgoing interface that the route/switch device would use to send IP unicast packets to that address;

2. If the multicast packet was received on the RPF interface, it passes the RPF check and the route/switch device forwards it to the downstream interfaces. Otherwise, the packet is dropped.

The following figure shows the RPF check process when the source tree is used.



RPF check

Route/switch device E receives a multicast packet on interface S0. The source address of the packet belongs to Source Segment. Device E checks its routing table and finds that the outgoing interface toward Source Segment is S1, so it drops the packet. If the multicast packet had arrived on interface S1, the arrival interface would match the interface found in the table, and device E would forward the packet.

As this process shows, the RPF check uses the interface on the shortest path from the route/switch device back to the multicast source or RP, which is why it is called Reverse Path Forwarding.
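The RPF check can be sketched in Python as a longest-prefix lookup in the unicast routing table followed by a comparison with the arrival interface. The dictionary route table and the function names are hypothetical simplifications; a real device uses its own forwarding table and may apply additional rules (such as choosing among equal-cost routes) that are not modeled here.

```python
import ipaddress
from typing import Dict, Optional


def rpf_interface(routes: Dict[str, str], addr: str) -> Optional[str]:
    """Longest-prefix match of addr in a unicast table {prefix: out_interface}."""
    ip = ipaddress.ip_address(addr)
    best = None
    for prefix, iface in routes.items():
        net = ipaddress.ip_network(prefix)
        if ip in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, iface)
    return best[1] if best else None


def rpf_check(routes: Dict[str, str], source_or_rp: str, in_iface: str) -> bool:
    """Pass only if the packet arrived on the interface that leads back toward
    the multicast source (source tree) or the RP (shared tree)."""
    return rpf_interface(routes, source_or_rp) == in_iface
```

With a route table that maps a hypothetical Source Segment 10.1.1.0/24 to S1 and a default route to S0, a packet from 10.1.1.5 arriving on S0 fails the check and one arriving on S1 passes, matching the device E example above.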

IP Multicast Application

Information Distribution

IP multicast enables a company to distribute data to many users. For example, a company with several chain stores can use multicast to transmit price information to the cash registers in its stores, and media providers can deliver live, real-time information over the Internet to multicast-capable users, for applications such as remote employee management and remote education.

Data Broadcast

Traditional data broadcasting is based on broadcast and occupies a lot of Internet bandwidth. With multicast technology, TV and radio can deliver programs only to the users who actually need the data, and can also reduce network maintenance costs.

Related Terms of IGMP Protocol

Internet Group Management Protocol: IGMP enables an IP host to report its host group memberships to the neighboring multicast route/switch device. IGMP is part of the Internet protocol stack, and IGMP messages are encapsulated in IP packets.

IGMP querier: The IGMP querier regularly sends IGMP query packets to find out whether any host in the LAN of the route/switch device is a member of, or is applying to join, a multicast group. In addition, in version 2 the IGMP querier sends a group-specific query in response to a group member's IGMP leave message, and in version 3 it sends a source-specific query for a specified multicast source. Usually, hosts do not generate query packets; a host returns a group membership report when it receives a query.

Introduction to IGMP Protocol

IGMP sets up and maintains the group membership relationship between hosts and the route/switch device. It runs on hosts and on the multicast route/switch devices directly connected to them. IGMP works in both directions. On one hand, a host uses IGMP to inform the local route/switch device that it wants to join a multicast group and receive the group's information. On the other hand, the route/switch device uses IGMP to query periodically whether the members of known groups in the LAN are still active, that is, whether the segment has members of a given multicast group, in order to collect and maintain the group membership of the connected network. The information that IGMP records in the route/switch device shows whether a multicast group has members locally, but not which hosts belong to which group.

Currently, there are three versions of IGMP: IGMPv1 (RFC 1112) defines the basic group membership query and report process; IGMPv2, defined by RFC 2236, adds a fast leave mechanism for group members on top of IGMPv1; IGMPv3, defined by RFC 3376, adds the ability for a member to specify that it wants to receive, or not receive, packets from particular multicast sources.

IGMP Protocol Theory

The following description uses IGMPv2 as an example:

IGMPv2 work theory



When there are multiple multicast route/switch devices on one segment, IGMPv2 elects a single querier through its querier election mechanism. The querier periodically sends general query messages to query membership, and hosts answer with report messages. Each host sends its report after a random delay; if another member on the same segment sends the same report first, the host suppresses its own. A new host joining a multicast group does not wait for a query from the querier but sends a report message actively. To leave a multicast group, a host sends a leave group message; after receiving it, the querier sends a group-specific query to confirm whether all group members have left. A route/switch device that is itself a group member behaves like an ordinary host and answers the queries of other route/switch devices.

Through this mechanism, the multicast route/switch device builds a table that records, for each interface, which multicast groups have active members on the attached subnet, with one timer per multicast group. The table only needs to record that a group has at least one member; it does not need to record all members. When the route/switch device receives packets for a group G, it forwards them only to the interfaces that have members of group G. How packets are forwarded between route/switch devices is determined by the multicast routing protocol, not by IGMP.
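The querier election mentioned at the start of this section is not spelled out in this manual; RFC 2236 specifies that the querier is the router with the lowest IP address on the network. A minimal Python sketch of that rule, with a hypothetical function name, compares the addresses numerically rather than as strings:

```python
import ipaddress
from typing import Iterable


def elect_querier(router_addresses: Iterable[str]) -> str:
    """IGMPv2 querier election (RFC 2236): the lowest IP address wins."""
    return min(router_addresses, key=lambda a: int(ipaddress.IPv4Address(a)))
```

Comparing numerically matters: as strings, "10.0.0.10" would sort before "10.0.0.9", but numerically 10.0.0.9 is lower and becomes the querier.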

IGMP V1

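Packet Format

IGMP is part of IP, and IGMP packets are encapsulated in IP packets with IP protocol number 2. IGMP packets are sent with a TTL of 1, and the IP header carries its own header checksum.

 0       4       8               16                            31
+-------+-------+---------------+-------------------------------+
|Version| Type  |    Unused     |           Checksum            |
+-------+-------+---------------+-------------------------------+
|                         Group Address                         |
+---------------------------------------------------------------+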

Version: 1;

Type: 1 indicates a membership query packet; 2 indicates a membership report packet;

Unused: set to 0 when sent and ignored when received;

Checksum: the 16-bit one's complement of the one's complement sum of the 8-byte IGMP message;

Group address: 0 in a membership query packet; in a membership report packet, the IP multicast address of the group being reported.
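The IGMPv1 layout and checksum rule above can be sketched in Python. The function names are hypothetical; the checksum routine is the standard Internet checksum (16-bit one's complement of the one's complement sum). Version 1 with Type 1 yields a first byte of 0x11 and Type 2 yields 0x12, the same values IGMPv2 uses for the Membership Query and the V1 membership report.

```python
import ipaddress
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:  # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_igmpv1(msg_type: int, group: str = "0.0.0.0") -> bytes:
    """msg_type 1 = membership query, 2 = membership report."""
    first_byte = (1 << 4) | (msg_type & 0x0F)  # Version=1 (high nibble) | Type
    group_bytes = ipaddress.IPv4Address(group).packed
    unsummed = struct.pack("!BBH4s", first_byte, 0, 0, group_bytes)
    checksum = internet_checksum(unsummed)
    return struct.pack("!BBH4s", first_byte, 0, checksum, group_bytes)
```

A general query built this way is `11 00 ee ff 00 00 00 00`, and recomputing the checksum over any correctly built message gives 0.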



Query-Response Process

The route/switch device sends the query packet to 224.0.0.1 (all hosts on the network);

A host that receives the query fills the address of the multicast group it has joined into a report packet and multicasts the report to that group address;

Other hosts in the same multicast group receive this multicast report and suppress their own reports;

Therefore, the IGMP querier route/switch device only records which multicast groups have members on each of its interfaces; it does not need to record which hosts have joined each group.

Response Packet Suppression

After a host receives a query packet, it does not answer immediately but waits a random delay of 0 to 10 s. This avoids a response storm and gives the host the opportunity to receive a report sent by another host, in which case it suppresses its own report.

Active Adding Process

When a host joins a multicast group for the first time, it can actively send an IGMP membership report, without waiting to be queried, to join the multicast group.

Process of Leaving Multicast Group

IGMP V1 has no message for leaving a multicast group. If the route/switch device receives no report for a group within three query intervals, it deletes the multicast group.

IGMP V2

Improvements Compared with V1

Querier election process

Maximum response time field

Group-specific query message

Leave group message

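Packet Format

 0               8               16                            31
+---------------+---------------+-------------------------------+
|     Type      | Max Resp Time |           Checksum            |
+---------------+---------------+-------------------------------+
|                         Group Address                         |
+---------------------------------------------------------------+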

Type: the message type.

Three kinds of IGMP message are involved in the interaction between hosts and the route/switch device:

1. 0x11 = Membership Query

There are two subtypes of membership query:

- General query, used to learn which groups have members on the attached network.

- Group-specific query, used to learn whether a specified group has members on the attached network.

The two subtypes are distinguished by the group address: in a general query, the group address is 0; in a group-specific query, it contains the multicast group address being queried.

2. 0x16 = Version 2 Membership Report

3. 0x17 = Leave Group

For compatibility with IGMPv1, there is one additional message type:

0x12 = Version 1 Membership Report

Max Resp Time: maximum response time

The Max Resp Time field is meaningful only in membership query messages. It specifies the maximum time allowed before sending a response to the query, in units of 1/10 s. In all other messages, the sender sets it to 0 and the receiver ignores it.
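An IGMPv2 message can be assembled in the same way as the V1 sketch. In this Python sketch (hypothetical function names), the caller passes the maximum response time in seconds and it is converted to the 1/10 s units of the Max Resp Time field; for any type other than Membership Query the field is forced to 0, as described above.

```python
import ipaddress
import struct

MEMBERSHIP_QUERY = 0x11
V2_MEMBERSHIP_REPORT = 0x16
LEAVE_GROUP = 0x17
V1_MEMBERSHIP_REPORT = 0x12


def internet_checksum(data: bytes) -> int:
    """Standard Internet checksum; IGMP messages here are always 8 bytes (even)."""
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_igmpv2(msg_type: int, group: str = "0.0.0.0",
                 max_resp_seconds: float = 0.0) -> bytes:
    """Type | Max Resp Time (1/10 s) | Checksum | Group Address."""
    # Max Resp Time is meaningful only in queries; other senders set it to 0.
    max_resp = round(max_resp_seconds * 10) if msg_type == MEMBERSHIP_QUERY else 0
    if not 0 <= max_resp <= 255:
        raise ValueError("Max Resp Time must fit in one byte (0 to 25.5 s)")
    grp = ipaddress.IPv4Address(group).packed
    csum = internet_checksum(struct.pack("!BBH4s", msg_type, max_resp, 0, grp))
    return struct.pack("!BBH4s", msg_type, max_resp, csum, grp)
```

A general query with a 10 s response time carries 100 (0x64) in the Max Resp Time byte and group address 0; a leave message for 239.1.1.1 carries 0 there regardless of the value passed in.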

Query-Response Process

After a host receives a query, it starts a delay timer for each group it belongs to, with a value chosen at random between 0 and the Max Resp Time carried in the query. When a group's timer expires, the host multicasts a V2 Membership Report to that group with a TTL of 1. If the host hears another host's report for the group (V1 or V2) before its own timer expires, it stops the timer and does not send its report, which reduces duplicate reports.
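The report suppression described above can be simulated with the following Python sketch. The function names and the `propagation` parameter (the time a report needs to reach the other members) are hypothetical illustrations, not device parameters.

```python
import random


def pick_delay(max_resp_tenths: int, rng: random.Random) -> float:
    """Random report delay in seconds, between 0 and Max Resp Time."""
    return rng.uniform(0.0, max_resp_tenths / 10.0)


def reports_sent(delays: dict, propagation: float = 0.0) -> list:
    """Return the hosts on one segment that actually send a report for a group.

    delays maps host -> moment its delay timer expires (seconds after the query).
    A report is multicast to the group itself, so every other member hears it
    `propagation` seconds later and stops its own timer.
    """
    sent = []
    heard_at = None  # when the earliest report reaches the other members
    for host, fires_at in sorted(delays.items(), key=lambda item: item[1]):
        if heard_at is not None and fires_at >= heard_at:
            continue  # another member's report arrived first: suppressed
        sent.append(host)
        if heard_at is None:
            heard_at = fires_at + propagation
    return sent
```

With instant delivery only the host whose timer fires first reports; a non-zero propagation delay lets a second host whose timer fires within that window report as well.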

When the route/switch device receives a Membership Report, it adds the group to its list of groups with members and starts a timer for it with the value Group Membership Interval (GMI). Each further report for the group restarts the timer. If the timer expires, the route/switch device concludes that the group has no local members and stops forwarding the group's multicast packets onto the attached network.
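The following Python sketch models the per-group membership timer. The GMI formula and the default values (Robustness Variable 2, Query Interval 125 s, Query Response Interval 10 s) are taken from RFC 2236 as an assumption; this manual does not state the device's defaults, which may differ.

```python
from dataclasses import dataclass, field

# RFC 2236 default values; the device's own defaults may differ.
ROBUSTNESS_VARIABLE = 2
QUERY_INTERVAL = 125.0           # seconds
QUERY_RESPONSE_INTERVAL = 10.0   # seconds (Max Resp Time of general queries)


def group_membership_interval(rv: int = ROBUSTNESS_VARIABLE,
                              qi: float = QUERY_INTERVAL,
                              qri: float = QUERY_RESPONSE_INTERVAL) -> float:
    """GMI = Robustness Variable x Query Interval + Query Response Interval."""
    return rv * qi + qri


@dataclass
class MembershipTable:
    gmi: float = field(default_factory=group_membership_interval)
    expires_at: dict = field(default_factory=dict)  # group -> expiry time

    def on_report(self, group: str, now: float) -> None:
        # A report adds the group or restarts its timer.
        self.expires_at[group] = now + self.gmi

    def expire(self, now: float) -> list:
        # Groups whose timer ran out have no local members any more.
        gone = [g for g, t in self.expires_at.items() if t <= now]
        for g in gone:
            del self.expires_at[g]
        return gone

    def should_forward(self, group: str, now: float) -> bool:
        return self.expires_at.get(group, now) > now
```

With these defaults GMI is 2 x 125 + 10 = 260 s, so a group survives two lost general-query rounds before it is removed.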

When a host joins a multicast group, it sends a V2 Membership Report immediately, in case it is the first member of the group on the network. Because the report may be lost, the host resends it at least once after the Unsolicited Report Interval (URI).

IGMPv2 Leave Group Information

IGMPv2 adds one feature: the Leave Group message. In IGMPv1 a host leaves silently without sending any message. In IGMPv2, when a host leaves a multicast group, it sends a Leave message to the all-routers multicast group (224.0.0.2).

When the querier receives a Leave message from a group member, it sends a group-specific query for that group to find out whether any other active members remain on the subnet. The remaining members answer with Membership Reports. If no report is received within the last member query period, the route/switch device concludes that the group has no local members.
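The querier's side of leave processing can be sketched in Python as below. The last member query interval (1 s) and count (2) are RFC 2236 defaults used here as assumptions; `handle_leave` is a hypothetical name.

```python
# RFC 2236 defaults, assumed here; the device may use other values.
LAST_MEMBER_QUERY_INTERVAL = 1.0  # seconds between group-specific queries
LAST_MEMBER_QUERY_COUNT = 2       # defaults to the Robustness Variable


def handle_leave(leave_at: float, report_times: list,
                 count: int = LAST_MEMBER_QUERY_COUNT,
                 interval: float = LAST_MEMBER_QUERY_INTERVAL):
    """Return (times the group-specific queries are sent, whether group is kept)."""
    queries = [leave_at + i * interval for i in range(count)]
    deadline = leave_at + count * interval  # end of the last member query period
    kept = any(leave_at < t <= deadline for t in report_times)
    return queries, kept
```

Under these assumptions a group with no remaining members stops being forwarded about 2 s after the Leave, instead of waiting for the full GMI as in IGMPv1.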

Inter-operation of V1 and V2

Route/Switch Device Serving as a Multicast Host

If a route/switch device that supports V2 receives an IGMPv1 Membership Query, it records that the current querier runs V1 and starts a timer. The timer is restarted each time another V1 query is received. When the timer expires, the route/switch device returns to V2 operation.



Device Is a Multicast Route/Switch Device

If there are V1 group members on the subnet, the route/switch device must ignore Leave messages from V2 hosts and not act on them, because V1 hosts cannot recognize the group-specific query sent in response to a Leave. If there is a V1 route/switch device on the subnet, configure all route/switch devices on the subnet to run V1.

IGMP V3

Improvements Compared with V1 and V2

- Adds a V3-specific Membership Report: one report can carry multiple group records, and each group record can state which sources are to be received or rejected.
- Membership Reports are sent to the all-IGMPv3-routers group (224.0.0.22).
- Adds the group-and-source-specific query.
- Even when it contains no sources, a V3 query is 4 bytes longer than a V2 query.
- Introduces the Max Resp Code field: when its value is 128 or greater, it is decoded as a floating-point value to obtain Max Resp Time.
- Uses the INCLUDE and EXCLUDE filter modes, so that joins and leaves share a single Membership Report format.

Packet Format

IGMPv3 uses two message types:

0x11: Membership Query

0x22: Version 3 Membership Report

The formats are as follows:

1. IGMP V3 query packet:

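+---------------+---------------+-------------------------------+
|  Type = 0x11  | Max Resp Code |           Checksum            |
+---------------+---------------+-------------------------------+
|                         Group Address                         |
+-------+-+-----+---------------+-------------------------------+
| Resv  |S| QRV |     QQIC      |     Number of Sources (N)     |
+-------+-+-----+---------------+-------------------------------+
|                      Source Address [1]                       |
|                      Source Address [2]                       |
|                              ...                              |
|                      Source Address [N]                       |
+---------------------------------------------------------------+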

Type: message type (0x11 for a query).

Max Resp Code: encodes the maximum response time.

The value actually used is Max Resp Time, in units of 1/10 s. Max Resp Time is derived from Max Resp Code as follows:

If Max Resp Code < 128, Max Resp Time = Max Resp Code.

If Max Resp Code >= 128, Max Resp Code represents a floating-point value in the following format, where exp is 3 bits and mant is 4 bits:

 0 1 2 3 4 5 6 7
+-+-+-+-+-+-+-+-+
|1| exp | mant  |
+-+-+-+-+-+-+-+-+

In that case, Max Resp Time = (mant | 0x10) << (exp + 3).
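The code-to-time mapping above can be checked with this Python sketch; the same mapping applies to QQIC and QQI. The function names are hypothetical, and the choice to round down in `encode_code` is an assumption of this sketch, since values of 128 or more are not all exactly representable.

```python
def decode_code(code: int) -> int:
    """Max Resp Code (or QQIC) -> Max Resp Time (or QQI)."""
    if not 0 <= code <= 0xFF:
        raise ValueError("code must fit in one byte")
    if code < 128:
        return code
    exp = (code >> 4) & 0x07   # 3-bit exponent
    mant = code & 0x0F         # 4-bit mantissa
    return (mant | 0x10) << (exp + 3)


def encode_code(value: int) -> int:
    """Inverse mapping; values that are not exactly representable round down."""
    if value < 0:
        raise ValueError("value must be non-negative")
    if value < 128:
        return value
    for exp in range(8):
        mant = value >> (exp + 3)
        if 16 <= mant <= 31:   # leading 1 bit (0x10) is implicit
            return 0x80 | (exp << 4) | (mant & 0x0F)
    return 0xFF                # clamp to the largest value, 31744
```

For example, code 0x89 (exp 0, mant 9) decodes to 25 << 3 = 200, that is, 20 s; the largest code, 0xFF, gives 31744 (about 53 minutes).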

Checksum: the 16-bit one's complement of the one's complement sum of the entire IGMP message.

Group Address: the group address. It is 0 in a general query; in a group-specific or group-and-source-specific query it is the address of the queried group.

Resv: reserved field; set to 0 when sent and ignored when received.

S: Suppress Router-Side Processing flag.

When set to 1, it tells multicast route/switch devices that receive the query to suppress the timer updates they would normally perform on hearing a query. It does not suppress querier election or the normal host-side processing of the query (a route/switch device can also be a group member).

QRV: Querier's Robustness Variable.

A non-zero QRV carries the robustness variable used by the route/switch device that sent the query. If the querier's robustness variable exceeds 7 (the largest value the 3-bit field can hold), QRV is set to 0. A route/switch device adopts the QRV from the most recently received query as its own robustness variable; if that QRV is 0, it uses its local default robustness variable.



QQIC: Querier's Query Interval Code.

The value actually used is QQI, the querier's query interval in seconds. QQI is derived from QQIC in the same way as Max Resp Time is derived from Max Resp Code: if QQIC < 128, QQI = QQIC; if QQIC >= 128, it is decoded as a floating-point value.

Number of Sources (N): the number of source addresses in the query. N is 0 in general and group-specific queries and non-zero in group-and-source-specific queries. The maximum value of N is limited by the network MTU.

Source Address [1..N]: the source addresses.
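The query fields above can be unpacked as in this Python sketch. It assumes the IP header is already removed and skips checksum verification; `parse_igmpv3_query` and `adopt_robustness` are hypothetical names, and the local default robustness variable of 2 is an assumption.

```python
import ipaddress
import struct


def decode_code(code: int) -> int:
    """Max Resp Code / QQIC decoding described above."""
    if code < 128:
        return code
    return ((code & 0x0F) | 0x10) << (((code >> 4) & 0x07) + 3)


def parse_igmpv3_query(packet: bytes) -> dict:
    """Decode an IGMPv3 query (IP header removed, checksum not checked here)."""
    if len(packet) < 12 or packet[0] != 0x11:
        raise ValueError("not an IGMPv3 query")
    _type, mrc, _checksum, addr, flags, qqic, n = struct.unpack(
        "!BBH4sBBH", packet[:12])
    if len(packet) < 12 + 4 * n:
        raise ValueError("source list truncated")
    sources = [str(ipaddress.IPv4Address(packet[12 + 4 * i:16 + 4 * i]))
               for i in range(n)]
    group = str(ipaddress.IPv4Address(addr))
    if group == "0.0.0.0":
        kind = "general"
    elif n == 0:
        kind = "group-specific"
    else:
        kind = "group-and-source-specific"
    return {
        "kind": kind,
        "group": group,
        "max_resp_tenths": decode_code(mrc),
        "s_flag": bool(flags & 0x08),   # Resv(4) | S(1) | QRV(3)
        "qrv": flags & 0x07,
        "qqi_seconds": decode_code(qqic),
        "sources": sources,
    }


def adopt_robustness(qrv: int, local_default: int = 2) -> int:
    """Use the querier's QRV unless it is 0, then fall back to the local default."""
    return qrv if qrv != 0 else local_default
```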

2. IGMP V3 member report packet:

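+---------------+---------------+-------------------------------+
|  Type = 0x22  |   Reserved    |           Checksum            |
+---------------+---------------+-------------------------------+
|           Reserved            |  Number of Group Records (M)  |
+-------------------------------+-------------------------------+
|                       Group Record [1]                        |
|                       Group Record [2]                        |
|                              ...                              |
|                       Group Record [M]                        |
+---------------------------------------------------------------+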

Type: message type (0x22).

Reserved: set to 0 when sent and ignored when received.

Checksum: the 16-bit one's complement of the one's complement sum of the entire IGMP message.

Number of Group Records (M): the number of group records in the report.

Group Record: a group record. Each group record has the following format:

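+---------------+---------------+-------------------------------+
|  Record Type  | Aux Data Len  |     Number of Sources (N)     |
+---------------+---------------+-------------------------------+
|                       Multicast Address                       |
+---------------------------------------------------------------+
|                      Source Address [1]                       |
|                      Source Address [2]                       |
|                              ...                              |
|                      Source Address [N]                       |
+---------------------------------------------------------------+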

Record Type: the group record type. Its value ranges from 1 to 6, with the following meanings:

1: MODE_IS_INCLUDE: the interface on the host is in INCLUDE mode. The source list in the record is the source list maintained on the host, and the host wants to receive traffic from these sources.

2: MODE_IS_EXCLUDE: the interface on the host is in EXCLUDE mode. The source list in the record is the source list maintained on the host, and the host does not want to receive traffic from these sources.

3: CHANGE_TO_INCLUDE_MODE: the interface on the host has changed to INCLUDE mode. The source list contains the new list of wanted sources maintained on the interface.

4: CHANGE_TO_EXCLUDE_MODE: the interface on the host has changed to EXCLUDE mode. The source list contains the new list of unwanted sources maintained on the interface.

5: ALLOW_NEW_SOURCES: the sources in the list are sources the host now additionally wants to receive.

6: BLOCK_OLD_SOURCES: the sources in the list are sources the host no longer wants to receive.

Group records of types 1 and 2 are current-state records. Group records of types 3 to 6 are state-change records: types 3 and 4 report a filter-mode change, and types 5 and 6 report a source-list change.
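The report and group-record layouts above can be built and parsed as in this Python sketch. The helper names are hypothetical, the checksum is left at 0, and the rule that Aux Data Len counts 32-bit words is taken from RFC 3376 as an assumption (this manual does not state the unit).

```python
import ipaddress
import struct

RECORD_TYPES = {
    1: "MODE_IS_INCLUDE", 2: "MODE_IS_EXCLUDE",
    3: "CHANGE_TO_INCLUDE_MODE", 4: "CHANGE_TO_EXCLUDE_MODE",
    5: "ALLOW_NEW_SOURCES", 6: "BLOCK_OLD_SOURCES",
}


def build_record(record_type: int, group: str, sources: list) -> bytes:
    """One group record with no auxiliary data."""
    header = struct.pack("!BBH4s", record_type, 0, len(sources),
                         ipaddress.IPv4Address(group).packed)
    return header + b"".join(ipaddress.IPv4Address(s).packed for s in sources)


def build_report(records: list) -> bytes:
    """Version 3 Membership Report; checksum left at 0 in this sketch."""
    return struct.pack("!BBHHH", 0x22, 0, 0, 0, len(records)) + b"".join(records)


def parse_report(packet: bytes) -> list:
    msg_type, _res1, _checksum, _res2, m = struct.unpack("!BBHHH", packet[:8])
    if msg_type != 0x22:
        raise ValueError("not an IGMPv3 Membership Report")
    offset, records = 8, []
    for _ in range(m):
        rtype, aux_words, n, addr = struct.unpack("!BBH4s", packet[offset:offset + 8])
        offset += 8
        sources = [str(ipaddress.IPv4Address(packet[offset + 4 * i:offset + 4 * i + 4]))
                   for i in range(n)]
        offset += 4 * n + 4 * aux_words   # Aux Data Len counts 32-bit words
        records.append({
            "type": RECORD_TYPES.get(rtype, "unknown"),
            "group": str(ipaddress.IPv4Address(addr)),
            "sources": sources,
            "state_change": 3 <= rtype <= 6,   # types 1-2 are current-state
        })
    return records
```

One report can carry both a current-state record and a state-change record, which is how a single IGMPv3 message replaces several V2 reports and leaves.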

In addition, IGMPv3 supports the following V1 and V2 message types:

0x12: the member report of V1

0x16: the member report of V2

0x17: the leave packet of V2

Query-Response Process

The multicast route/switch device periodically sends general queries to learn IGMP membership on the local network. After receiving a general query, a host collects its own group information, including the wanted or unwanted source lists, fills it into current-state group records, and returns an IGMPv3 Membership Report (sent to the all-IGMPv3-routers group 224.0.0.22) to the route/switch device.



When a host's group or source information changes (for example, its filter mode or source list changes), the host puts the change into state-change group records and sends an unsolicited IGMPv3 Membership Report to the route/switch device.

After receiving a Membership Report, the route/switch device refreshes its local group and source state. Before changing the filter mode of a group from EXCLUDE to INCLUDE, it sends a group-specific query; this is the IGMPv3 counterpart of IGMPv2 sending a group-specific query before deleting a group that no local host wants any more. Before deleting a source that no local host wants any more, it sends a group-and-source-specific query. In general, group-specific and group-and-source-specific queries are sent only after receiving state-change group records, never in response to current-state records.

IGMP Status Information on the Route/Switch Device

The route/switch device keeps one group state per interface and per group. The group state consists of the group address, the filter mode (INCLUDE or EXCLUDE), the source list and the group timer.

Each source in a group's source list has its own source state, consisting of the source address and the source timer.

When hosts want traffic from all sources of a group, the group state is EXCLUDE with an empty source list.

While no IS_EX or TO_EX report has been received on the network, the group's filter mode is INCLUDE; when an IS_EX or TO_EX report is received, the filter mode changes to EXCLUDE.

When the group is in EXCLUDE mode, it keeps two source lists. One lists the sources that are confirmed as unwanted and whose traffic is not forwarded on the network. The other lists sources that may be wanted or are confirmed as wanted; traffic from these sources is forwarded, and this list is kept so that it is available when the group switches to INCLUDE mode.

When the group is in INCLUDE mode, it keeps only one list: the sources whose traffic needs to be forwarded. When the timers of all sources in the list have expired, the list is empty and the group is deleted.



The group timer runs only in EXCLUDE mode. In INCLUDE mode only the source timers run, and the group is deleted when the last source timer expires. When the group timer expires, the group's filter mode switches from EXCLUDE to INCLUDE.

Only sources whose traffic is forwarded have a source timer; sources on the non-forwarded list in EXCLUDE mode have none. When a source timer expires, the source is deleted if the group is in INCLUDE mode; if the group is in EXCLUDE mode, the source is moved from the forwarded list to the non-forwarded list.
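The group and source state rules above can be sketched as a small Python class. `GroupState` and its method names are hypothetical; timers are represented only by their expiry events, and the handling of report records (IS_EX, TO_IN and so on) is left out.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class GroupState:
    mode: str = "INCLUDE"
    group_timer: Optional[float] = None            # runs only in EXCLUDE mode
    forwarded: dict = field(default_factory=dict)  # source -> source timer expiry
    blocked: set = field(default_factory=set)      # EXCLUDE only, no timers

    def forwards(self, source: str) -> bool:
        if self.mode == "INCLUDE":
            return source in self.forwarded
        return source not in self.blocked          # EXCLUDE {} forwards everything

    def on_source_timer_expired(self, source: str) -> None:
        del self.forwarded[source]
        if self.mode == "EXCLUDE":
            self.blocked.add(source)               # move to the non-forwarded list

    def on_group_timer_expired(self) -> None:
        # EXCLUDE -> INCLUDE: only the sources still being forwarded remain.
        self.mode = "INCLUDE"
        self.group_timer = None
        self.blocked.clear()

    def is_deleted(self) -> bool:
        return self.mode == "INCLUDE" and not self.forwarded
```

Keeping the forwarded sources across the EXCLUDE-to-INCLUDE switch is what the second EXCLUDE-mode list is for.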

Compatibility of IGMPv3 with V1 and V2

The query version is determined as follows:

IGMPv1 query: the query is 8 bytes long and Max Resp Code is 0.

IGMPv2 query: the query is 8 bytes long and Max Resp Code is not 0.

IGMPv3 query: the query is at least 12 bytes long.
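The version rules above fit in a few lines of Python. The function name is hypothetical, and raising an error for other lengths reflects RFC 3376, which says such queries are silently ignored; the device's exact handling is not stated here.

```python
def igmp_query_version(igmp_length: int, max_resp_code: int) -> int:
    """Classify a received query by IGMP message length and Max Resp Code."""
    if igmp_length == 8:
        return 1 if max_resp_code == 0 else 2
    if igmp_length >= 12:
        return 3
    # RFC 3376 says queries matching none of the rules are silently ignored.
    raise ValueError(f"ignored query of length {igmp_length}")
```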

Membership reports are distinguished by destination address:

IGMPv1 and V2 Membership Reports are sent to the group being joined. IGMPv2 Leave messages are sent to the all-routers group (224.0.0.2), and IGMPv3 Membership Reports are sent to the all-IGMPv3-routers group (224.0.0.22).

When an IGMPv3 route/switch device receives queries from a route/switch device running a lower version, it can be manually configured to run IGMPv1 or IGMPv2. If it is not configured to the lower version, an alarm is generated.

IGMPv3 processes V1 and V2 Membership Reports as IS_EX{} records and V2 Leave messages as TO_IN{} records. It also maintains, for each group, timers indicating that older-version hosts are present (separate timers for V1 hosts and V2 hosts).

While the V2 host timer is running, BLOCK records for the group are ignored and all TO_EX records are processed as TO_EX{}.

While the V1 host timer is running, BLOCK records are ignored, all TO_EX records are processed as TO_EX{}, and the TO_IN{} records derived from V2 Leave messages are ignored.

When the V1 host timer of a group expires, the group returns to V3 processing if the V2 host timer is not running; otherwise it uses V2 processing until the V2 host timer also expires, and then returns to V3 processing.
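The compatibility rules above can be sketched in Python as follows. The `Record` type, its `from_v2_leave` flag (used to tell a translated V2 Leave apart from a native TO_IN{}), and the function names are all hypothetical illustrations.

```python
from typing import NamedTuple, Optional


class Record(NamedTuple):
    kind: str                     # "IS_EX", "TO_IN", "TO_EX", "ALLOW", "BLOCK", ...
    sources: frozenset
    from_v2_leave: bool = False


def from_old_message(msg_type: int) -> Record:
    """Map V1/V2 messages onto IGMPv3 records as described above."""
    if msg_type in (0x12, 0x16):          # V1 / V2 Membership Report
        return Record("IS_EX", frozenset())
    if msg_type == 0x17:                  # V2 Leave Group
        return Record("TO_IN", frozenset(), from_v2_leave=True)
    raise ValueError(f"not a V1/V2 membership message: {msg_type:#x}")


def compat_mode(v1_host_timer_running: bool, v2_host_timer_running: bool) -> int:
    if v1_host_timer_running:
        return 1
    if v2_host_timer_running:
        return 2
    return 3


def apply_compat(record: Record, mode: int) -> Optional[Record]:
    """Return the record as it is processed in the given mode, or None if ignored."""
    if mode == 3:
        return record
    if record.kind == "BLOCK":
        return None                       # BLOCK ignored in V1 and V2 modes
    if record.kind == "TO_EX":
        return Record("TO_EX", frozenset())   # source list dropped
    if mode == 1 and record.from_v2_leave:
        return None                       # V2 Leave ignored in V1 mode
    return record
```

Because the V1 timer is checked first, a running V1 timer always wins, which matches the fallback order described above.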

Related Terms of PIM-SM Protocol

BSR: BootStrap Router, used to distribute RP information in PIM-SM v2;

DR: Designated Router, responsible for forwarding multicast packets and sending Join/Prune and Register messages on a multi-access network (such as Ethernet);

IGMP: Internet Group Management Protocol;

PIM: Protocol Independent Multicast;

PIM-DM: Protocol Independent Multicast-Dense Mode;

PIM-SM: Protocol Independent Multicast-Sparse Mode;

RP: Rendezvous Point, the tree root of the shared tree;

RPF: Reverse Path Forwarding;

SPT: Shortest Path Tree, the shortest path to the source;

Introduction to PIM-SM Protocol

Like PIM-DM, PIM-SM can use any IP unicast routing protocol (RIP, IRMP, STATIC or OSPF) to determine the RPF interface. The most important difference between them is that PIM-SM uses a pull model, while PIM-DM uses a push model. The pull model assumes that no receiver wants multicast traffic: traffic is not sent to a receiver unless it explicitly joins.



Basic Hierarchy of PIM-SM in TCP/IP Protocol Stack

Basic hierarchy of PIM-SM in TCP/IP protocol stack

PIM-SM runs directly on top of IP and communicates with IP through a raw socket. The IP protocol number of PIM-SM is 103.

PIM-SM Protocol

In a PIM-SM domain, each route/switch device running PIM-SM periodically sends Hello messages, which are used to discover neighboring PIM route/switch devices and to elect the DR on multi-access networks. The DR is responsible for sending Join/Prune and Register messages.

PIM-SM builds multicast distribution trees to forward multicast packets. There are two kinds of distribution tree: the shared tree, rooted at the RP of group G, and the source tree, rooted at the multicast source. PIM-SM builds and maintains the distribution trees through explicit Join/Prune messages.

When the network directly connected to a DR has active members of group G, the DR sends a Join message hop by hop toward the RP of group G to join the shared tree (No. 1 in the following figure). As the Join travels up the shared tree, each route/switch device on the way creates multicast forwarding state (No. 2 in the following figure), that is, a route entry. A route entry contains the source address, the group address, the incoming interface and the outgoing interface list of the multicast packets, a timer and flags, so that the route/switch device can forward received multicast data along the tree. When a Prune message travels up the shared tree, each route/switch device on the way updates its route entries, for example by removing an outgoing interface. Branches of the distribution tree that are not refreshed are deleted when they time out. To prevent this, route/switch devices on the distribution tree periodically send Join/Prune messages toward the RP of the group to maintain the distribution tree.

When a source host sends multicast data to the group, the source's DR encapsulates the data in Register messages and unicasts them to the RP (No. 5 in the following figure). The RP decapsulates the Register messages and forwards the packets to the group members along the shared tree. The RP can then send a Join/Prune message for the specific source toward the source (No. 3 in the following figure) to join the shortest path tree of the source, so that packets reach the RP unencapsulated along that tree. Once multicast packets arrive along the shortest path, the RP sends a Register-Stop message to the source's DR so that the DR stops registering and encapsulating. From then on, the source's multicast data is no longer registered or encapsulated, but travels to the RP along the source's shortest path tree (A-B-RP); the RP forwards it onto the shared tree, and the packets finally reach the group members along the shared tree (RP-C-E).

Work process a of PIM-SM protocol

When the data rate reaches a certain threshold, the DR can send an explicit Join message to join the shortest path tree of the source (No. 9 in the following figure), after which multicast packets are forwarded along the shortest path tree. The DR then prunes the corresponding forwarding path from the shared tree (No. 8 in the following figure).



Work process b of PIM-SM protocol

PIM-SM uses BSR and RP election mechanisms. One or more Candidate-BSRs are configured in the PIM-SM domain, and a single BSR for the whole domain is elected among them according to defined rules. Candidate-RPs are also configured in the PIM-SM domain. Each Candidate-RP unicasts to the BSR a message containing its address and the multicast groups it can serve; the BSR then periodically generates Bootstrap messages containing the set of Candidate-RPs and their corresponding group addresses. Bootstrap messages are forwarded hop by hop throughout the domain, and every route/switch device receives and stores them. If a DR receives an IGMP join from a directly connected host and has no route entry for the group, it uses the hash algorithm to map the group address to one Candidate-RP and sends a Join/Prune message hop by hop toward that RP. If a DR receives multicast packets from a directly connected source and has no route entry for the group, it uses the hash algorithm to map the group address to one Candidate-RP, encapsulates the multicast data in Register messages, and unicasts them to the RP.

On multi-access networks, PIM-SM adds the following mechanisms: the assert mechanism elects a single forwarder to avoid duplicate forwarding of multicast packets on the same segment; Join/Prune suppression reduces redundant Join/Prune messages; and prune override (prune deny) rejects unnecessary prunes.

DR Selection

The DR is elected according to the following rules:

1. If the PIM Hello messages of all neighbor route/switch devices on an interface carry the priority field, the priorities are compared first: the larger the value, the higher the priority. If several route/switch devices share the highest priority, the one with the largest IP address becomes DR.

2. If the PIM Hello messages of any neighbor route/switch device on the interface do not carry the priority field, the DR is elected by IP address alone: the device with the largest IP address becomes DR.
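The two DR rules can be expressed as a single ranking function in Python. `elect_dr` is a hypothetical name; the input is assumed to list every PIM router on the interface, including the local device, with `None` standing for a Hello that carries no priority.

```python
import ipaddress
from typing import List, Optional, Tuple


def elect_dr(routers: List[Tuple[str, Optional[int]]]) -> str:
    """routers: (IP address, DR priority or None if the Hello has no priority),
    including the local device itself."""
    use_priority = all(priority is not None for _, priority in routers)

    def rank(router):
        address, priority = router
        ip = ipaddress.IPv4Address(address)
        # Rule 1: priority first, then IP; rule 2: IP address only.
        return (priority, ip) if use_priority else (0, ip)

    return max(routers, key=rank)[0]
```

Comparing `IPv4Address` objects rather than strings makes 10.1.1.10 rank above 10.1.1.9.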

BSR Selection

Initially, a route/switch device configured as a Candidate-BSR enters the Pending-BSR state, sets its Bootstrap timer to a random override value (5 s to 23 s), and starts listening for Bootstrap messages.

A Bootstrap message carries the priority and IP address of its originator. When a route/switch device in the Pending-BSR state receives a Bootstrap message, it compares the message's priority and IP address with its own; if the priorities are equal, the larger IP address is better. If the originator is better, the device enters the Candidate-BSR state and sets the Bootstrap timer to the Bootstrap timeout value (130 s). If the device itself is better, it takes no further action. When the Bootstrap timer of a device in the Pending-BSR state expires, the device enters the Selected-BSR state, sends a Bootstrap message, and sets the Bootstrap timer to the Bootstrap period (60 s).

When the Bootstrap timer of a device in the Candidate-BSR state expires, the device enters the Pending-BSR state, sets the Bootstrap timer to a random override value, and starts a new BSR election. When a device in the Candidate-BSR state receives a better Bootstrap message, it resets the Bootstrap timer to the Bootstrap timeout value and remains in the Candidate-BSR state.

When the Bootstrap timer of a device in the Selected-BSR state expires, the device sends a Bootstrap message, resets the Bootstrap timer to the Bootstrap period, and remains in the Selected-BSR state. When a device in the Selected-BSR state receives a poorer Bootstrap message, it sends its own Bootstrap message, resets the Bootstrap timer to the Bootstrap period, and remains in the Selected-BSR state. When a device in the Selected-BSR state receives a better Bootstrap message, it enters the Candidate-BSR state and sets the Bootstrap timer to the Bootstrap timeout value.

Bootstrap messages are sent to the "ALL-PIM-ROUTERS" multicast group address 224.0.0.13 with TTL set to 1. When a PIM route/switch device receives a Bootstrap message, it forwards the message out of all interfaces except the receiving interface. This ensures both that Bootstrap messages reach the whole multicast domain and that every PIM route/switch device receives them and thus learns which route/switch device is the BSR.
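The BSR state transitions described above can be sketched as a Python state machine. The class name, the `bsm_sent` counter standing in for actually sending a Bootstrap message, and the assumption that a higher numeric priority is better are all illustrative; timer values are the ones given above. Poorer messages received in the Pending-BSR and Candidate-BSR states are simply ignored in this sketch.

```python
import ipaddress
import random

BS_PERIOD = 60.0             # Bootstrap period, seconds
BS_TIMEOUT = 130.0           # Bootstrap timeout, seconds
RAND_OVERRIDE = (5.0, 23.0)  # random override value range, seconds


class CandidateBsr:
    def __init__(self, priority: int, address: str, rng: random.Random):
        self.rank = (priority, ipaddress.IPv4Address(address))
        self.rng = rng
        self.state = "Pending-BSR"
        self.timer = rng.uniform(*RAND_OVERRIDE)
        self.bsm_sent = 0

    def _originate(self) -> None:
        self.bsm_sent += 1          # stands in for sending a Bootstrap message
        self.timer = BS_PERIOD

    def on_timer_expired(self) -> None:
        if self.state == "Pending-BSR":
            self.state = "Selected-BSR"
            self._originate()
        elif self.state == "Candidate-BSR":
            self.state = "Pending-BSR"   # start a new election
            self.timer = self.rng.uniform(*RAND_OVERRIDE)
        else:                            # Selected-BSR
            self._originate()

    def on_bootstrap(self, priority: int, address: str) -> None:
        # Higher priority wins; equal priority falls back to the larger IP.
        better = (priority, ipaddress.IPv4Address(address)) > self.rank
        if better:
            self.state = "Candidate-BSR"
            self.timer = BS_TIMEOUT
        elif self.state == "Selected-BSR":
            self._originate()            # answer a poorer BSM with our own
```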

RP Selection

A route/switch device can be configured as a Candidate-RP (C-RP) for specific multicast groups or for all multicast groups. After receiving a Bootstrap message and learning the location of the BSR, the C-RP unicasts Candidate-RP-Advertisement messages to the BSR. Each message carries the RP address of its originator, its priority, and the multicast group addresses the C-RP serves.

The BSR collects all C-RPs with their priorities and groups into the RP set and announces the RP set to the whole multicast domain in Bootstrap messages. The Bootstrap message also contains an 8-bit hash mask length field. When a route/switch device receives an IGMP message or PIM Join message and needs to join a shared tree, it looks up the RP set obtained from the BSR and selects the RP for the multicast group using the specified hash algorithm.
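This manual does not spell out the hash algorithm. The Python sketch below uses the hash function and tie-breaking order defined in the PIM-SM specification (RFC 7761) as an assumption: lowest priority value first, then highest hash value, then highest C-RP address. The device may differ in detail.

```python
import ipaddress
from typing import List, Tuple


def pim_hash(group: str, hash_mask_len: int, rp: str) -> int:
    """Hash from the PIM-SM specification (RFC 7761 / RFC 4601)."""
    g = int(ipaddress.IPv4Address(group))
    c = int(ipaddress.IPv4Address(rp))
    mask = (0xFFFFFFFF << (32 - hash_mask_len)) & 0xFFFFFFFF
    return (1103515245 * ((1103515245 * (g & mask) + 12345) ^ c) + 12345) % 2**31


def select_rp(group: str, hash_mask_len: int,
              candidates: List[Tuple[str, int]]) -> str:
    """candidates: (C-RP address, priority) entries from the RP set that cover
    the group. Lower priority value wins, then higher hash, then higher address."""
    def rank(candidate):
        address, priority = candidate
        return (-priority, pim_hash(group, hash_mask_len, address),
                ipaddress.IPv4Address(address))
    return max(candidates, key=rank)[0]
```

Because only the masked group bits enter the hash, groups that differ only in the bits outside the hash mask map to the same RP, which is how the hash mask length controls how groups are spread across C-RPs.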

Introduction to PIM SSM

PIM SSM is short for Protocol Independent Multicast - Source Specific Multicast. PIM SSM is the source-specific multicast mode of PIM: multicast services whose group addresses fall in the IPv4 range 232.0.0.0/8 receive special processing. Multicast service for a group in the SSM range is delivered using only the related SPT operations. Sources are discovered out of band, that is, without PIM messages such as Register messages. SSM requires IGMPv3, because IGMPv3 membership reports can specify both the source and the group. PIM SSM can run on a device together with PIM-SM or on its own, depending on the protocol configuration.
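The SSM range check can be illustrated with this simplified Python sketch. `tree_for_request` is a hypothetical name, and two behaviours are assumptions rather than statements from this manual: a request without a source is not acted on inside the SSM range, and a source-specific request outside it also leads to an SPT join.

```python
import ipaddress
from typing import Optional

SSM_RANGE = ipaddress.ip_network("232.0.0.0/8")


def tree_for_request(group: str, source: Optional[str] = None) -> str:
    """Which tree a receiver request leads to (simplified)."""
    if ipaddress.IPv4Address(group) in SSM_RANGE:
        # SSM: the source must be known (IGMPv3), and only the SPT is used.
        return "SPT" if source is not None else "drop"
    # Outside the SSM range, a group-only request joins the RP-rooted shared tree.
    return "SPT" if source is not None else "RPT"
```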

Introduction to PIM-DM Protocol

PIM-DM is short for Protocol Independent Multicast-Dense Mode. Like PIM-SM, PIM-DM runs directly on top of IP and communicates with IP through a raw socket. The IP protocol number of PIM-DM is also 103. PIM-DM protocol packets are always sent with TTL 1, so they travel only one hop.



The basic hierarchy of PIM-DM in TCP/IP protocol stack

PIM-DM Protocol

PIM-DM application topology

Neighbor Setup

After a PIM route/switch device starts, it periodically (every 30 s by default) sends Hello packets to the ALL-PIM-ROUTERS group 224.0.0.13 to establish neighbor relationships. A route/switch device that receives a Hello packet adds the sender to its neighbor list and starts a timer for it, set to the value of the Holdtime field in the Hello packet.
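The neighbor timer can be sketched in Python as below. `NeighborTable` is a hypothetical name, and the default Holdtime of 3.5 times the Hello period (105 s) is the ratio used in the PIM RFCs, assumed here rather than stated in this manual.

```python
HELLO_PERIOD = 30.0                     # default Hello interval stated above
DEFAULT_HOLDTIME = 3.5 * HELLO_PERIOD   # RFC default ratio; assumed here


class NeighborTable:
    def __init__(self) -> None:
        self.expires_at = {}            # neighbor address -> expiry time

    def on_hello(self, neighbor: str, holdtime: float, now: float) -> None:
        # Add a new neighbor or restart its timer with the Hello's Holdtime.
        self.expires_at[neighbor] = now + holdtime

    def expire(self, now: float) -> list:
        gone = [n for n, t in self.expires_at.items() if t <= now]
        for n in gone:
            del self.expires_at[n]
        return gone
```

With a 30 s Hello period and a 105 s Holdtime, a neighbor is removed only after about three consecutive Hellos are lost.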



Flooding and Pruning Process of Service Packets

When a source appears, it sends (S, G) service packets into the network. Initially, the packets are flooded to every part of the network. When a route/switch device receives a service packet, it creates an (S, G) entry, records the incoming interface, and treats the other interfaces as downstream interfaces. As shown in the figure, C receives the service packets from both A and B, but an entry can have only one incoming interface. C selects as the incoming interface the one whose unicast route to the source has the lowest cost, and sends a Prune message toward the other upstream neighbor.

When the service packet is forwarded from E to I and I finds that it has no downstream neighbor or local group member, so that its outgoing interface list is empty, I sends a Prune message to its upstream neighbor E (note: the Prune is sent out of the incoming interface, and the destination address is the address of the group to be pruned). Because E has only one neighbor on that interface (for example, E and I are connected point-to-point), E prunes the interface toward I as soon as it receives I's Prune. After pruning, E finds that its own outgoing interface list is empty, so it in turn sends a Prune upstream. After receiving E's Prune, C finds that there are local group members (see IGMP) on its network, so it ignores E's Prune.

When the service packet is forwarded onto the network from F, G receives it and finds that it has no outgoing interface, so it sends a Prune upstream. Because other route/switch devices are also attached to this network, F does not prune at once but starts the prune delay timer. H has local group members and an outgoing interface, so it needs to keep receiving the service packets. When H hears G's Prune (the Prune sent by G is sent toward the group to be pruned, so H also receives it), it starts a prune deny timer. When this timer expires, H sends a Join upstream (toward the desired group) to tell F that the service packets are still needed on the network and the outgoing interface must not be pruned. Therefore, after receiving the Join, F keeps forwarding the service packets.
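The upstream device's handling of a Prune, covering both the point-to-point case (E and I) and the shared-network case (F, G and H), can be sketched in Python as follows. The function name is hypothetical, and `override_interval` stands for the prune delay, whose value this manual does not give.

```python
from typing import List, Optional


def prune_time(neighbors_on_interface: int, prune_received_at: float,
               join_times: List[float], override_interval: float) -> Optional[float]:
    """When the upstream device prunes the interface, or None if a Join overrides."""
    if neighbors_on_interface <= 1:
        return prune_received_at          # point-to-point: prune at once (E and I)
    deadline = prune_received_at + override_interval   # prune delay timer
    if any(prune_received_at <= t < deadline for t in join_times):
        return None                       # a Join from another device (H) arrived
    return deadline
```

A Join that arrives after the prune delay has expired comes too late, so the interface is pruned and must later be grafted back.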

Grafting Process

If a local group member joins at I, so that I now has an outgoing interface, I unicasts a Graft message to E. After receiving the Graft, E returns a Graft-Ack and changes its downstream interface toward I to the forwarding state; after I receives the Graft-Ack, it changes its upstream interface to the forwarding state so that it can forward packets when they arrive. E now finds that it has an outgoing interface, so it unicasts a Graft to its upstream RPF neighbor (for example C, assuming C is the assert winner). C returns a Graft-Ack to E, and E's upstream interface changes to the forwarding state. Because C is already forwarding the service packets, E, whose upstream interface is now forwarding, receives and forwards them. In this way the service packets reach the newly joined local group member.
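The hop-by-hop graft walk can be sketched in Python as below, using the device names from the figure. `graft` is a hypothetical function; it models only the order of the Graft/Graft-Ack exchanges, not retransmission or timers.

```python
def graft(router: str, rpf_neighbor: dict, forwarding: set) -> list:
    """Return the (sender, receiver) pairs of the Graft/Graft-Ack exchanges.

    rpf_neighbor maps each device to its upstream RPF neighbor (None next to
    the source); forwarding is updated in place with devices whose upstream
    interface becomes forwarding.
    """
    exchanges = []
    node = router
    while node not in forwarding and rpf_neighbor.get(node) is not None:
        upstream = rpf_neighbor[node]
        exchanges.append((node, upstream))  # Graft sent, Graft-Ack returned
        forwarding.add(node)                # upstream interface now forwarding
        node = upstream
    return exchanges
```

For example, with I's RPF neighbor E, E's RPF neighbor C and C already forwarding, the call returns [("I", "E"), ("E", "C")], matching the sequence described above.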

Declaring (Assert) Process

As shown in the figure, because service packets are flooded, route/switch device E may receive the same service packets forwarded by both C and D, which causes duplicate traffic. C and D therefore need a declaring (assert) process. When D receives, on its outgoing interface, a service packet forwarded by C, this triggers the declaring process; likewise, C sends a declare (Assert) packet. The two devices compare their information and elect one winner to forward the service packets.

Status Refreshing Process

PIM-DM is a typical flood-and-prune protocol: after a prune timer expires, the packets are flooded into the network again. To reduce the cost of this frequent flood-and-prune cycle, PIM-DM uses the state refresh mechanism to maintain prune state in the network. State Refresh messages (SRM) are generated by the route/switch device directly connected to the source and sent to all downstream neighbors. A downstream neighbor that receives a State Refresh message responds according to its contents. For example, if the message shows that A has sent a prune but C needs to forward the packets, C sends a Join to A; if A is forwarding but C has no outgoing interface, C sends a Prune to A. The neighbor then refreshes the prune timers of its outgoing interfaces toward downstream neighbors, updates the State Refresh message with its own information, and forwards the modified message. For example, in the figure, E's outgoing interface is in the pruned state; after receiving the State Refresh message, E refreshes the prune timer of that interface, fills in its own information, and sends the message to I. After receiving it, I finds that E's interface is pruned, its own incoming interface is also pruned, and it has no other downstream neighbor, so it does nothing.



Introduction to MSDP Protocol

Overview

MSDP application topology

In PIM-SM, when a source starts sending a multicast service flow, the first-hop DR connected to the source registers the source information with the RP. In this way, the RP always knows the sources of all multicast service flows in its domain. In practice, to meet network management requirements, the whole network is divided into multiple PIM domains, each with its own RP that manages the source information of all multicast service flows in that domain. Normally, the RP of one domain does not know the source information of other PIM domains, so it cannot receive their multicast service flows. However, users in different domains often want to receive multicast service flows from other domains. Providing all multicast service flows would make each domain depend on the RPs of other domains, which carriers want to avoid. MSDP was introduced to solve this problem.

Multicast Source Discovery Protocol (MSDP) lets each domain keep its own RP while sending multicast service flows to, and receiving them from, other domains.

MSDP establishes peer relationships between domains. Through the defined information exchange, the RPs of the domains share the active source information in the network, while each domain's RP keeps track of the receivers in its own domain. Therefore, for a multicast service flow that has receivers, the RP can join toward the source directly without depending on the RPs of other domains. After the service flow reaches the RP over the source tree (SPT), the RP forwards it to the receivers in the domain over the shared tree (RPT). In this way, the multicast service flow is delivered within the domain without depending on the RP of another domain.

MSDP peer relationships are established between the RPs of the domains over TCP connections. When the RP of a domain learns of a new active source in its domain, it sends a Source-Active (SA) message to all of its MSDP peers. An MSDP peer applies an enhanced RPF check (peer-RPF) to decide whether to accept an SA message received from another peer. An accepted SA message is forwarded to the other peers until all MSDP routers in the network have received it. If the RP that receives the SA has a (*, G) entry for the group, it creates an (S, G) entry and joins toward the source over the SPT, pulling the service flow into its domain. The rest is handled by PIM-SM. In addition, an MSDP router periodically re-advertises the sources in its own domain in SA messages, so that the MSDP peers in all other domains know that those sources are still sending.
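The SA handling described above can be sketched in Python as follows. This is a minimal illustration, not the device's implementation. It simplifies the peer-RPF check to one rule: accept an SA only from the peer configured as the RPF peer toward the originating RP. The real RFC 3618 rules are more detailed. The class names, field names, and addresses are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class SourceActive:
    source: str      # S: address of the active source
    group: str       # G: multicast group address
    origin_rp: str   # RP that originated the SA

@dataclass
class MsdpRouter:
    name: str
    peers: list[str]               # names of the MSDP peers of this router
    rpf_peer: dict[str, str]       # originating RP -> peer SAs must arrive from (simplified)
    star_g: set[str] = field(default_factory=set)                  # groups with a (*, G) entry
    sg_entries: set[tuple[str, str]] = field(default_factory=set)  # (S, G) entries created

    def receive_sa(self, sa: SourceActive, from_peer: str) -> list[str]:
        """Handle an SA message and return the peers it is forwarded to."""
        # Simplified peer-RPF check: drop SAs that do not arrive from the RPF peer.
        if self.rpf_peer.get(sa.origin_rp) != from_peer:
            return []
        # Local receivers exist: create (S, G) and join toward the source over the SPT.
        if sa.group in self.star_g:
            self.sg_entries.add((sa.source, sa.group))
        # Flood the accepted SA to every other peer.
        return [p for p in self.peers if p != from_peer]

if __name__ == "__main__":
    rp2 = MsdpRouter("RP2", peers=["RP1", "RP3"], rpf_peer={"RP1": "RP1"},
                     star_g={"239.1.1.1"})
    sa = SourceActive("10.0.0.5", "239.1.1.1", "RP1")
    print(rp2.receive_sa(sa, "RP1"))  # ['RP3']
    print(rp2.sg_entries)             # {('10.0.0.5', '239.1.1.1')}
    print(rp2.receive_sa(sa, "RP3"))  # [] (fails the peer-RPF check)
```

Because only one peer passes the RPF check for a given originating RP, this flooding does not loop. That also means periodic SA refreshes can be forwarded every time without a separate duplicate cache.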

Setup of MSDP Peer

After an MSDP peer is configured, the connection role is determined by comparing the local address used for the connection with the peer address. The side with the larger address takes the passive role and waits for the connection. The side with the smaller address takes the active role and initiates the connection. The passive side must send MSDP messages to the active side. When it has no MSDP message to send, it sends keepalive messages to prevent the active side from resetting the connection. Once the connection is established, the MSDP peer relationship is formed.
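A minimal Python sketch of the role selection above. It assumes IPv4 peer addresses, compared as unsigned 32-bit integers rather than as strings. The TCP port 639 comes from RFC 3618, not from this manual, and the function name is hypothetical.

```python
import ipaddress

MSDP_TCP_PORT = 639  # well-known MSDP port (RFC 3618); not stated in this manual

def msdp_role(local_addr: str, peer_addr: str) -> str:
    """Return 'passive' or 'active' for the local side of an MSDP peering.

    The side with the larger address waits for the connection (passive);
    the side with the smaller address initiates it (active).
    """
    local = ipaddress.IPv4Address(local_addr)
    peer = ipaddress.IPv4Address(peer_addr)
    if local == peer:
        raise ValueError("local and peer addresses must differ")
    return "passive" if local > peer else "active"

if __name__ == "__main__":
    print(msdp_role("10.1.1.2", "10.1.1.1"))  # passive: listens on MSDP_TCP_PORT
    print(msdp_role("10.1.1.1", "10.1.1.2"))  # active: connects to the peer
```

Comparing `IPv4Address` objects avoids a common mistake. As strings, "9.255.255.255" sorts after "10.0.0.0", even though it is numerically smaller.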

Sending of Source Active Message

After MSDP learns multicast source information from PIM, it sends a Source Active message to each connected MSDP peer to advertise the source. When the peer's MSDP receives the source information, it passes the information to its PIM module. This enables cross-domain multicast on demand.

MSDP Application

Inter-domain MSDP

PIM-SM can be regarded as a multicast IGP, because it is designed to run within a single domain. A key problem for PIM-SM is how to distribute multicast packets across AS boundaries while preserving the autonomy of each AS. The PMBR (PIM Multicast Border Router) in PIM-SM was intended to solve this problem. A PMBR is located at the edge of an AS and builds branches toward all RPs in the AS. Each branch is expressed as (*, *, RP), where the wildcards denote all source and group addresses mapped to that RP. When an RP receives traffic from a source, it forwards the traffic to the PMBR, which then forwards it to the other domain. When the adjacent domain does not need the traffic, it sends a prune to the PMBR, which then sends the prune to the RP, as shown in the following figure:



PMBR solution

The key disadvantage of PMBR is its flood-and-prune behavior. Moreover, PMBR was designed to connect a PIM-SM domain to a DVMRP domain. Therefore, PMBR is not a good solution to the above problem.

To solve the above problem, the following two sub-problems must be addressed:

1. When the source is in one domain and the group members are in another domain, the RPF check must remain valid;

2. To preserve its autonomy, a domain cannot depend on (trust) the RP of another domain.

PIM can use BGP routes to perform the RPF check toward other domains. However, when unicast and multicast traffic use different links, the RPF check may fail. Static multicast routes can avoid this RPF problem, but deploying them on a large scale is not practical. MBGP, an extension of BGP, solves this, so problem 1 is solved by MBGP.

Problem 2 must be solved because an AS (managed by one ISP) does not want to depend on an RP it cannot control, such as an RP in another domain or one managed by another ISP. If each AS sets up its own RP, a protocol is needed that lets these RPs share information across AS boundaries and discover the sources known to the other RPs, as shown in the following figure:



Inter-domain MSDP

Through MSDP, the RPs in different ASs share the sources they know about. PIM-SM treats the shared sources as if they were in the same domain. In this way, receivers depend only on the RP in the local domain, preserving AS autonomy.

Intra-domain MSDP

Besides the inter-domain problem of PIM-SM, there is also a problem within a single PIM-SM domain, which MSDP helps solve through Anycast RP. (Anycast means that when a packet is sent to a single address, one of multiple devices that share that address responds.)

Placing the RP in a large, geographically dispersed PIM-SM domain is difficult. PIM-SM permits only a single group-to-RP mapping, so the following problems may appear:

1. Traffic bottleneck at the RP;

2. Lack of scalable register decapsulation (when the shared tree is used);

3. Slow recovery when the active RP fails;

4. Possible suboptimal forwarding of multicast packets;

5. Dependence on a remote RP.

The hash algorithm of the PIMv2 BootStrap mechanism and Auto-RP filtering can alleviate these problems, but they cannot solve them completely. Anycast RP allows a single group to be mapped to multiple RPs. These RPs can be distributed throughout the domain and use the same RP address, which creates a virtual RP. MSDP is the basis for building this virtual RP.

As shown in Figure 3, four routing/switching devices form a virtual RP and advertise the single RP address 10.100.254.1. They use MSDP to exchange information about the sources registered on each device. All of these devices run Auto-RP and advertise 10.100.254.1 as the RP address (RPA). A source DR in the domain knows only this one RP address and registers with the nearest physical RP. By itself, this would split the PIM domain, because each physical RP would know only the sources registered with it. With an MSDP mesh group, however, the Anycast RPs in the domain exchange their source information.



MPLS Technology

This chapter describes the principle and application of Multi-protocol Label

Switching (MPLS).

Main contents:

Terms

Introduction to MPLS

MPLS architecture

Introduction to the LDP Protocol

Introduction to BGP/MPLS VPN

MPLS VPN user accesses Internet

Introduction to CSC

Introduction to MPLS L2VPN

MPLS traffic engineering

MPLS OAM

Terms of MPLS Protocol

MPLS - Multiprotocol Label Switching

Label - a short, fixed-length identifier used to identify a FEC

FEC - Forwarding Equivalence Class

LSR - Label Switching Router

LDP - Label Distribution Protocol



Introduction to MPLS

MPLS integrates the latest developments in routing and switching. It combines the simplicity of L2 switching with the flexibility of L3 routing, and provides the following features:

In an MPLS network, packets are forwarded based on a fixed-length label. This simplifies the forwarding mechanism and improves the forwarding rate.

Frame Relay, ATM, PPP, HDLC, SDH, and DWDM are supported, enabling interconnection of multiple types of networks.

QoS, traffic engineering, and large-scale VPNs are supported.

MPLS Architecture

Separation of Control and Forwarding

The MPLS architecture is divided into two independent units, the control unit and the forwarding unit, as shown in the following figure:



Figure 25-1 MPLS Architecture

The control unit uses standard routing protocols (such as OSPF and BGP4) to exchange routing information and maintain routing tables. At the same time, it uses label distribution protocols (such as LDP, MP-BGP, and RSVP) to exchange label forwarding information with the interconnected label switching devices. In this way, it creates and maintains the label forwarding table.

The forwarding unit decides how each packet is forwarded. It looks up the label forwarding table using the information in the packet header, then processes the label and forwards the packet according to the lookup result.

Forwarding Equivalence Class

A FEC is a set of packets that use the same forwarding path through the network (the destination addresses of the packets may differ). An LSR processes these packets in the same way during forwarding, so from the forwarding point of view they are equivalent. A FEC is defined by a set of attributes (FEC elements), which can include source address, destination address, source port, destination port, protocol type, and CoS.

The ingress LSR of the MPLS domain assigns a FEC to each IP packet entering the domain. It then looks up the label value corresponding to that FEC and adds it to the IP packet to form a labeled packet. The labeled packet is then forwarded through the MPLS domain.
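As an illustration only, the Python sketch below models an ingress LSR that classifies packets into FECs and returns the label to push. It assumes the common case of destination-prefix FECs chosen by longest-prefix match. Real FECs can also use the other attributes listed above. The prefixes and label values are hypothetical.

```python
import ipaddress

# Hypothetical FEC-to-label bindings at an ingress LSR (destination-prefix FECs only).
FEC_LABELS = {
    ipaddress.ip_network("10.0.0.0/8"): 1001,
    ipaddress.ip_network("10.1.0.0/16"): 1002,
    ipaddress.ip_network("192.168.5.0/24"): 1003,
}

def classify(dst: str):
    """Return (fec, label) for the longest matching prefix, or None if no FEC matches."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in FEC_LABELS if addr in net]
    if not matches:
        return None  # no FEC: the packet is not label-switched
    fec = max(matches, key=lambda net: net.prefixlen)
    return fec, FEC_LABELS[fec]

if __name__ == "__main__":
    print(classify("10.1.2.3"))    # 10.1.0.0/16 wins over 10.0.0.0/8 -> label 1002
    print(classify("10.9.9.9"))    # only 10.0.0.0/8 matches -> label 1001
    print(classify("172.16.0.1"))  # None
```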

Label Encapsulation and Label Operation

In an MPLS domain, labeled packets are forwarded according to the label they carry. The label is inserted between the L2 header and the L3 packet, and is called the MPLS label header. Its format is shown below:

Figure 25-2 MPLS label

An MPLS packet can carry multiple label headers. This structure is called a label stack, and its labels are organized in last-in, first-out order. The outermost label is called the top label and the innermost label is called the bottom label. Simple IP unicast routing does not use a label stack, but other MPLS-based applications, including MPLS VPN, rely on it.

Each label is composed of the following fields:

Time-to-Live (TTL)

This field is 8 bits and encodes the time-to-live value. It functions like the TTL field in the IP header. It prevents forwarding loops caused by misconfiguration, faults, or slow convergence of the routing algorithm, and it limits the scope of the packet.

Bottom of stack bit (S)

This field is 1 bit. A value of 1 indicates that the label is the last (bottom) label in the label stack. A value of 0 indicates any other label in the stack.



Class of service (EXP, also called the experimental bits)

This field is 3 bits and carries CoS information. Its function is similar to that of the ToS field in the IP header.

Label Value

This field is 20 bits and contains the actual value of the label. When an LSR receives a labeled packet, it first examines the label value at the top of the stack. Normally, the LSR determines the next hop from the label value and replaces the current top label with a new label. Label values 0-15 are reserved, with the following meanings:

Label value | Description
0 | IPv4 Explicit NULL label. When this label is at the top of the stack, the label is popped and the packet is forwarded according to the new top label. If it is the only label in the stack (the stack is empty after the pop), the packet is forwarded according to the IPv4 header.
1 | Router Alert label. When the top label of a received packet is 1, the packet is delivered to the local software for processing. The actual forwarding of the packet is determined by the label beneath it in the stack.
2 | IPv6 Explicit NULL label. Used in the same way as label value 0, but for IPv6.
3 | Implicit NULL label. LDP uses it to ask the upstream neighbor to pop the label (penultimate hop popping). This value never appears in the label encapsulation.
4-15 | Reserved
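The four fields described above fit in one 32-bit word. The layout is Label (high 20 bits), then EXP, S, and TTL. This bit order follows RFC 3032; the manual lists the fields in a different order. The Python sketch below packs and unpacks a label stack, setting S only on the bottom entry. The function names and sample label values are illustrative, not part of any product API.

```python
import struct

# Reserved label values 0-3 from the table above (3 is signaled only, never encoded).
RESERVED = {0: "IPv4 Explicit NULL", 1: "Router Alert",
            2: "IPv6 Explicit NULL", 3: "Implicit NULL"}

def encode_entry(label: int, exp: int, bottom: bool, ttl: int) -> bytes:
    """Pack one entry as Label(20) | EXP(3) | S(1) | TTL(8), network byte order."""
    if not (0 <= label < 2**20 and 0 <= exp < 8 and 0 <= ttl < 256):
        raise ValueError("field out of range")
    word = (label << 12) | (exp << 9) | (int(bottom) << 8) | ttl
    return struct.pack("!I", word)

def decode_entry(data: bytes) -> dict:
    """Unpack the first 4 bytes of data into the four label fields."""
    (word,) = struct.unpack("!I", data[:4])
    return {"label": word >> 12, "exp": (word >> 9) & 0x7,
            "s": (word >> 8) & 0x1, "ttl": word & 0xFF}

def encode_stack(labels: list[int], exp: int = 0, ttl: int = 64) -> bytes:
    """Encode a label stack given top label first; only the bottom entry gets S=1."""
    last = len(labels) - 1
    return b"".join(encode_entry(lbl, exp, i == last, ttl) for i, lbl in enumerate(labels))

def decode_stack(data: bytes) -> list[dict]:
    """Read entries from the top of the stack until the entry with S=1."""
    entries = []
    for offset in range(0, len(data) - 3, 4):
        entry = decode_entry(data[offset:offset + 4])
        entries.append(entry)
        if entry["s"] == 1:
            return entries
    raise ValueError("no bottom-of-stack entry found")

if __name__ == "__main__":
    print(encode_entry(16, 0, True, 64).hex())  # 00010140
    for e in decode_stack(encode_stack([1002, 3001])):  # e.g. outer label over a VPN label
        print(e["label"], RESERVED.get(e["label"], "ordinary label"), "S =", e["s"])
```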

MPLS Network Structure and Forwarding Process

In traditional IP forwarding, the router at each hop independently analyzes the destination IP address. Based on the routing algorithm running in the network, it makes its own forwarding decision to determine the packet's next hop. In MPLS, packets entering the network are divided into FECs, and the label value corresponding to each FEC is looked up and added to the packet. Routers inside the network then forward packets according to the labels they carry. Throughout the MPLS domain, packets are forwarded by label, and no processing of the IP header is needed. Label imposition and forwarding work as follows:



Figure 25-3 Label forwarding

The basic unit of an MPLS network is the label switching router (LSR). Any switch or router that can distribute labels and forward packets according to labels is an LSR. By function, LSRs are divided into label edge routers (LERs) and core LSRs.

An LSR that has a non-MPLS neighbor is considered a border LSR. Border LSRs add and remove labels at the MPLS boundary. At the ingress point of the MPLS domain, the label is added. At the egress point, the label is removed before the packet is forwarded to a neighbor outside the MPLS domain.

In the preceding MPLS network, R1-R7 form an MPLS domain. R1, R2, R3, and R7 are border LSRs, and R4, R5, and R6 are core LSRs. In the MPLS domain, each LSR maintains a label forwarding table. A core LSR looks up this table using the label carried in the packet to determine the forwarding path. It does not need to process the IP header.

Penultimate Hop Popping Mechanism

Consider a packet that is received from an MPLS neighbor and whose destination is a subnet outside the MPLS domain. The egress border LSR (R7 in Figure 25-3) must perform two lookups for this packet. First, it looks up the top label in the label stack. Then, because the packet is leaving the MPLS domain, it pops the label and performs an L3 lookup on the IP header to forward the packet. This double lookup on R7 may reduce the performance of the node. In addition, in an environment that implements both MPLS and IP switching, the double lookup increases the complexity of the hardware implementation. To solve this problem, the MPLS architecture adopts the penultimate hop popping (PHP) mechanism.

With PHP, the border LSR can ask its upstream neighbor to pop the label. It does this by advertising the implicit NULL label (value 3) to the upstream neighbor through a signaling protocol such as LDP. In Figure 25-3, R6 pops the label and sends a plain IP packet to R7. R7 then performs only a simple L3 lookup and forwards the packet to its destination.
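The Python sketch below walks a packet along one LSP, loosely modeled on Figure 25-3. Because the figure itself is not reproduced here, the path R1 -> R4 -> R6 -> R7, the label values, and the table layout are assumptions. Each LSR maps an incoming label to an outgoing label and next hop. When the outgoing label is the Implicit NULL value 3 advertised by the egress, the LSR pops the label instead of swapping it, as PHP requires. Only a single-label stack is modeled.

```python
IMPLICIT_NULL = 3

# Hypothetical label forwarding tables: in_label -> (out_label, next_hop).
# R7 advertised Implicit NULL to R6, so R6's outgoing label is 3.
LFIB = {
    "R4": {1001: (2001, "R6")},
    "R6": {2001: (IMPLICIT_NULL, "R7")},
}
INGRESS_FEC = {"10.2.0.0/16": (1001, "R4")}  # at R1: FEC -> (label to push, next hop)

def forward(fec: str) -> list[str]:
    """Return a hop-by-hop trace of label operations for a packet in the given FEC."""
    label, hop = INGRESS_FEC[fec]
    trace = [f"R1 push {label} -> {hop}"]
    stack = [label]
    while stack:
        in_label = stack.pop()
        out_label, nxt = LFIB[hop][in_label]
        if out_label == IMPLICIT_NULL:
            trace.append(f"{hop} pop {in_label} -> {nxt}")  # penultimate hop popping
        else:
            stack.append(out_label)
            trace.append(f"{hop} swap {in_label}->{out_label} -> {nxt}")
        hop = nxt
    trace.append(f"{hop} IP lookup only")  # the egress receives a plain IP packet
    return trace

if __name__ == "__main__":
    for line in forward("10.2.0.0/16"):
        print(line)
```

With PHP, R7 never sees a label. The trace ends with a single IP lookup on R7 instead of a label lookup followed by an IP lookup.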

Introduction to the LDP Protocol

LDP is a signaling protocol in the MPLS architecture. It binds labels to the unicast routes in the routing table and distributes these bindings to build the MPLS label forwarding table. The relationship between LDP and the label forwarding table is similar to that between a routing protocol and the routing table.

Basic Concepts of LDP

LDP Peer

Two LSRs that use LDP to exchange FEC/label mapping information are called LDP peers.

Label Space

The label space concept relates to label assignment and distribution. It defines the scope in which labels are used and whether the same label value can be reused on different interfaces. There are two types of label space:

Per-interface label space

This label space is generally used by interfaces that use interface resources for labels. It can be used when the LDP peers are connected through a specific interface and the labels are used only for data transmitted over that interface. In this case, a label is unique only within each interface.



Per-platform label space

When interfaces share label resources, the per-platform label space is used. In this case, a label is unique across the whole platform (one LSR).

LDP Identifier

The LDP identifier is 6 bytes long and identifies the label space of a specific LSR. The first four bytes are an IP address assigned to the LSR. The last two bytes identify a specific label space within the LSR. For the per-platform label space, the last two bytes of the LDP identifier are always 0. An LDP identifier is written as follows:

<IP address>:<label space number>, for example 128.255.1.2:0 or 129.13.17.35:2

For example, suppose there are two physical links between two LSRs, and both are ATM links using per-interface label spaces. In this case, multiple label spaces must be advertised between the LSRs, and multiple LDP identifiers must be used.
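A short Python sketch of the 6-byte encoding described above. It converts between the textual form `<IP address>:<label space number>` and the on-wire bytes, using the examples from the text. The function names are illustrative.

```python
import ipaddress
import struct

def encode_ldp_id(text: str) -> bytes:
    """Encode '<IP address>:<label space>' as the 6-byte LDP identifier."""
    addr, space = text.rsplit(":", 1)
    return ipaddress.IPv4Address(addr.strip()).packed + struct.pack("!H", int(space))

def decode_ldp_id(data: bytes) -> str:
    """Decode a 6-byte LDP identifier back to '<IP address>:<label space>'."""
    if len(data) != 6:
        raise ValueError("LDP identifier must be 6 bytes")
    (space,) = struct.unpack("!H", data[4:])
    return f"{ipaddress.IPv4Address(data[:4])}:{space}"

def is_platform_wide(data: bytes) -> bool:
    """A label space number of 0 denotes the per-platform label space."""
    return data[4:6] == b"\x00\x00"

if __name__ == "__main__":
    for text in ("128.255.1.2:0", "129.13.17.35:2"):
        raw = encode_ldp_id(text)
        print(text, raw.hex(), decode_ldp_id(raw), is_platform_wide(raw))
```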

LDP Session

An LDP session is used to exchange label information between LSRs. If one LSR advertises multiple label spaces to another LSR, a separate LDP session must be created between them for each label space.

LDP Transmission

LDP uses TCP to ensure reliable transport for LDP sessions. If two LSRs need multiple LDP sessions, each LDP session uses its own TCP connection.

LDP Working Process

The LDP working process includes LDP discovery, session creation and maintenance, and label distribution and management.

LDP Discovery

LDP discovers neighbors and creates adjacencies through its discovery mechanism. LDP supports basic discovery and extended discovery.

The basic discovery mechanism periodically sends Link Hello messages on each interface where LDP is enabled. These are UDP packets to port 646, sent to the all-routers-on-this-subnet multicast address 224.0.0.2.

The extended discovery mechanism discovers LDP neighbors that are not directly connected. It periodically sends Targeted Hello messages (unicast UDP packets to port 646) to a specific IP address.

When an LSR receives an LDP Hello message, it knows that a potentially reachable LDP peer exists and learns the label space used by that peer.

LDP Session Creation and Maintenance

The exchange of LDP discovery Hello messages between two LSRs (LSR1 and LSR2) triggers LDP session creation. Creating an LDP session involves two steps: establishing the transport connection (TCP connection) and initializing the session.

Assume that the label space of LSR1 is LSR1:a and that of LSR2 is LSR2:b. The following describes how LSR1 creates the LDP session.

Establishing the transport connection

After exchanging LDP Hello messages, the two parties create a Hello adjacency. They then determine which party is active by comparing their transport addresses. If the transport address of LSR1 is larger than that of LSR2, LSR1 is the active party and initiates a TCP connection to port 646 on LSR2. LSR2, as the passive party, waits for the connection to be established.

The transport address is determined as follows:

If LSR1 advertises its address in the optional Transport Address TLV of the Hello message sent to LSR2, the transport address of LSR1 is the address advertised in that TLV.

If LSR1 does not include the optional Transport Address TLV, the transport address of LSR1 is the source address of the Hello message sent to LSR2.
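The two rules above (how the transport address is chosen, and which side is active) can be sketched in Python as below. The function names are hypothetical. Note that LDP makes the LSR with the larger address active, which is the opposite of the MSDP rule described earlier in this manual.

```python
import ipaddress
from typing import Optional

LDP_PORT = 646  # TCP/UDP port used by LDP, as stated in the text

def transport_address(hello_source: str,
                      transport_tlv: Optional[str] = None) -> ipaddress.IPv4Address:
    """Use the Transport Address TLV if present, otherwise the Hello source address."""
    return ipaddress.IPv4Address(transport_tlv if transport_tlv is not None else hello_source)

def ldp_role(local: ipaddress.IPv4Address, peer: ipaddress.IPv4Address) -> str:
    """The LSR with the larger transport address actively opens TCP port 646."""
    if local == peer:
        raise ValueError("transport addresses must differ")
    return "active" if local > peer else "passive"

if __name__ == "__main__":
    lsr1 = transport_address("10.0.0.1", "1.1.1.9")  # TLV present: 1.1.1.9 is used
    lsr2 = transport_address("10.0.0.2")             # no TLV: the Hello source address
    print(ldp_role(lsr1, lsr2))  # passive, because 1.1.1.9 < 10.0.0.2
```

The example shows why the TLV matters: without it, LSR1 (10.0.0.1) would still be passive here, but the comparison would be made on different addresses.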

LDP session initialization

After LSR1 and LSR2 establish the transport connection, they exchange LDP Initialization messages to negotiate the session parameters. These include the LDP protocol version, label distribution mode, and session hold timer value. When the parameters are negotiated successfully, a session for LSR1:a and LSR2:b is established between LSR1 and LSR2. The following table describes session initialization as a state machine.

Table of initialization state transitions

Status | Event | New status
NON EXISTENT | TCP connection established | INITIALIZED
INITIALIZED | Send Initialization message (active party) | OPENSENT
INITIALIZED | Receive acceptable Initialization message (passive party). Action: send Initialization message and KeepAlive message | OPENREC
INITIALIZED | Receive any other message. Action: send error Notification message; close the TCP connection | NON EXISTENT
OPENREC | Receive KeepAlive message | OPERATIONAL
OPENREC | Receive any other LDP message. Action: send error Notification message; close the TCP connection | NON EXISTENT
OPENSENT | Receive acceptable Initialization message. Action: send KeepAlive message | OPENREC
OPENSENT | Receive any other LDP message. Action: send error Notification message; close the TCP connection | NON EXISTENT
OPERATIONAL | Receive Shutdown (closing) message. Action: send Shutdown message; close the TCP connection | NON EXISTENT
OPERATIONAL | Receive other LDP messages | OPERATIONAL
OPERATIONAL | Timeout. Action: send Shutdown message; close the TCP connection | NON EXISTENT

Initialization state conversion



Figure 25-4 Initialization state conversion of LDP protocol
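The state transitions in the table above can be encoded directly as a lookup table. The Python sketch below does this. The event names (`tcp_up`, `send_init`, and so on) are hypothetical labels for the table rows, and actions are returned as strings rather than performed.

```python
from enum import Enum

class State(Enum):
    NON_EXISTENT = "NON EXISTENT"
    INITIALIZED = "INITIALIZED"
    OPENSENT = "OPENSENT"
    OPENREC = "OPENREC"
    OPERATIONAL = "OPERATIONAL"

# (state, event) -> (new state, actions), one entry per explicit row of the table.
TRANSITIONS = {
    (State.NON_EXISTENT, "tcp_up"): (State.INITIALIZED, []),
    (State.INITIALIZED, "send_init"): (State.OPENSENT, ["send Initialization"]),
    (State.INITIALIZED, "recv_init"): (State.OPENREC, ["send Initialization", "send KeepAlive"]),
    (State.OPENSENT, "recv_init"): (State.OPENREC, ["send KeepAlive"]),
    (State.OPENREC, "recv_keepalive"): (State.OPERATIONAL, []),
    (State.OPERATIONAL, "recv_shutdown"): (State.NON_EXISTENT, ["send Shutdown", "close TCP"]),
    (State.OPERATIONAL, "timeout"): (State.NON_EXISTENT, ["send Shutdown", "close TCP"]),
}

def step(state: State, event: str) -> tuple[State, list[str]]:
    """Apply one event; unexpected messages before OPERATIONAL tear the session down."""
    if (state, event) in TRANSITIONS:
        return TRANSITIONS[(state, event)]
    if state in (State.NON_EXISTENT, State.OPERATIONAL):
        return state, []  # no session yet, or other messages processed normally
    return State.NON_EXISTENT, ["send error Notification", "close TCP"]

def run(events: list[str]) -> tuple[State, list[str]]:
    """Feed a sequence of events from NON EXISTENT; return final state and all actions."""
    state, log = State.NON_EXISTENT, []
    for event in events:
        state, actions = step(state, event)
        log.extend(actions)
    return state, log

if __name__ == "__main__":
    print(run(["tcp_up", "send_init", "recv_init", "recv_keepalive"]))  # active party
    print(run(["tcp_up", "recv_init", "recv_keepalive"]))               # passive party
```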

To maintain the integrity of an LDP session, LDP keeps a session hold timer for each session. Each time the LSR receives an LDP PDU on the session, it restarts the session hold timer. If the LSR receives no LDP PDU from the peer before the hold timer expires, it considers the LDP session faulty. It then closes the TCP connection and terminates the LDP session.

After the LDP session is established, the LSR must send an LDP message within each session hold time. If it has no other message to send, it sends a KeepAlive message.

To terminate an LDP session, the LSR sends a Shutdown (closing) notification message to its LDP peer.
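A minimal Python sketch of the hold-timer and KeepAlive behavior described above, using explicit timestamps instead of real timers. The KeepAlive interval of one third of the hold time is an assumption (a common choice), not something this manual specifies. The class name is hypothetical.

```python
class SessionTimers:
    """Track the session hold timer of one LDP session (times in seconds).

    hold_time is the negotiated value. The KeepAlive interval of hold_time / 3
    is an assumption, not a value taken from this manual.
    """

    def __init__(self, hold_time: float, now: float = 0.0):
        self.hold_time = hold_time
        self.keepalive_interval = hold_time / 3
        self.last_rx = now  # last time any LDP PDU was received
        self.last_tx = now  # last time any LDP message was sent

    def on_pdu_received(self, now: float) -> None:
        self.last_rx = now  # any received PDU restarts the hold timer

    def on_message_sent(self, now: float) -> None:
        self.last_tx = now

    def expired(self, now: float) -> bool:
        """True if no PDU arrived within the hold time: close TCP and end the session."""
        return now - self.last_rx >= self.hold_time

    def keepalive_due(self, now: float) -> bool:
        """True if nothing was sent recently, so a KeepAlive message should be sent."""
        return now - self.last_tx >= self.keepalive_interval

if __name__ == "__main__":
    t = SessionTimers(hold_time=15)
    print(t.keepalive_due(5), t.expired(14))  # True False
    t.on_pdu_received(10)
    print(t.expired(20), t.expired(25))       # False True
```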



LDP Label Distribution and Management

Label distribution and management are determined by three label parameters: distribution mode, control mode, and retention mode.

Label distribution mode

The label distribution modes used in MPLS are Downstream Unsolicited and Downstream on Demand.

In Downstream Unsolicited mode, an LSR distributes and assigns labels for a specific FEC without first receiving a label request from upstream.

In Downstream on Demand mode, an LSR distributes and assigns a label for a specific FEC only after receiving a label request.

Label control mode

There are two label control modes in MPLS: Independent Control and Ordered Control.

In Independent Control mode, each LSR can advertise label mappings to the connected LSRs at any time.

In Ordered Control mode, an LSR sends a label mapping for a specific FEC upstream only under one of two conditions. Either it has received a label mapping for that FEC from the FEC's next hop, or it is the egress node of the LSP.

Label retention mode

There are two label retention modes in MPLS: Liberal Retention mode and Conservative Retention mode.

For a specific FEC, suppose LSR Ru receives a label mapping from LSR Rd, and Rd is not the next hop of Ru for that FEC. If Ru keeps the binding, it is using Liberal Retention mode. If Ru discards the binding, it is using Conservative Retention mode.

Generally, the default modes are Downstream Unsolicited, Independent Control, and Liberal Retention.
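The difference between the two retention modes is easiest to see when the next hop changes. The Python sketch below models the label bindings held by Ru for each FEC. The class, the peer names Rd1/Rd2, and the label values are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class LabelBindings:
    """Per-FEC label bindings held by LSR Ru under one retention mode."""
    liberal: bool                   # True: Liberal Retention; False: Conservative Retention
    next_hop: dict[str, str]        # FEC -> current next hop (Rd)
    bindings: dict[str, dict[str, int]] = field(default_factory=dict)  # FEC -> {peer: label}

    def on_label_mapping(self, fec: str, peer: str, label: int) -> bool:
        """Handle a Label Mapping from a peer; return True if the binding is kept."""
        if self.liberal or self.next_hop.get(fec) == peer:
            self.bindings.setdefault(fec, {})[peer] = label
            return True
        return False  # conservative mode discards bindings from non-next-hop peers

    def out_label(self, fec: str):
        """Label to use toward the current next hop, or None if no binding is held."""
        return self.bindings.get(fec, {}).get(self.next_hop.get(fec))

if __name__ == "__main__":
    ru = LabelBindings(liberal=True, next_hop={"10.1.0.0/16": "Rd1"})
    ru.on_label_mapping("10.1.0.0/16", "Rd1", 100)
    ru.on_label_mapping("10.1.0.0/16", "Rd2", 200)  # kept although Rd2 is not the next hop
    ru.next_hop["10.1.0.0/16"] = "Rd2"              # route change
    print(ru.out_label("10.1.0.0/16"))              # 200, usable without a new request
```

The trade-off in this model is that Liberal Retention stores more bindings but can switch immediately after a route change. Conservative Retention stores fewer bindings but has no label for the new next hop until one is advertised.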

LDP Graceful Restart

To support graceful restart, LDP adds the Fault Tolerant (FT) Session TLV as an optional parameter of the Initialization message. An LSR sends Initialization messages carrying the FT Session TLV to its peers. This indicates that it can retain MPLS LSP information and FEC information when LDP restarts. At the same time, the neighbor router must also support graceful restart and retain the MPLS LSP information created with the restarting router.

LDP advertises two timers to its peers in the FT Session TLV: FT Reconnect Timeout and Recovery Time. FT Reconnect Timeout is the time allowed for reconnecting after a restart. Recovery Time is the time allowed for LDP to recover its state after the restart.

When LDP restarts, it starts a restart process for each neighbor. Before the reconnect timer expires, it re-establishes the connection with the neighbor. It then waits for the neighbor to send the label mappings retained during the restart, and uses them to refresh the forwarding information it retained locally. Finally, it sends new label mappings to all neighbors. When the restart is over, all forwarding information that has not been refreshed is deleted.

On the neighbor routers, when the new connection is created, the MPLS LSPs created with the restarting router are marked as "to be aged", and label mappings are sent to the restarting router. The neighbors then refresh the "to be aged" MPLS LSPs according to the label mappings received from the restarting router. When the restart is complete, all to-be-aged MPLS LSPs that have not been refreshed are deleted.
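The neighbor-side procedure above is a mark-and-sweep over LSPs. The Python sketch below models it under simplifying assumptions. It does not model the FT Reconnect Timeout or Recovery Time timers, it keys LSPs by FEC only, and its class names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Lsp:
    fec: str
    label: int
    stale: bool = False  # True means "to be aged"

class HelperNeighbor:
    """Neighbor-side view of the LSPs learned from a restarting LSR (hypothetical model)."""

    def __init__(self, lsps: list[Lsp]):
        self.lsps = {lsp.fec: lsp for lsp in lsps}

    def on_session_reestablished(self) -> None:
        for lsp in self.lsps.values():
            lsp.stale = True  # mark "to be aged"; forwarding continues meanwhile

    def on_label_mapping(self, fec: str, label: int) -> None:
        self.lsps[fec] = Lsp(fec, label, stale=False)  # refresh (or learn) the LSP

    def on_recovery_complete(self) -> list[str]:
        """Delete the LSPs that were not refreshed; return the removed FECs."""
        removed = [fec for fec, lsp in self.lsps.items() if lsp.stale]
        for fec in removed:
            del self.lsps[fec]
        return removed

if __name__ == "__main__":
    n = HelperNeighbor([Lsp("10.1.0.0/16", 100), Lsp("10.2.0.0/16", 200)])
    n.on_session_reestablished()
    n.on_label_mapping("10.1.0.0/16", 100)  # the restarting router re-advertises one FEC
    print(n.on_recovery_complete())          # ['10.2.0.0/16']
```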

LDP Message Type and Format

LDP Message Type

Message type | Description
Notification | Error notification message
Hello | Hello message
Initialization | Initialization message
KeepAlive | KeepAlive message
Address | Address message
Address Withdraw | Address withdraw message
Label Mapping | Label mapping message
Label Request | Label request message
Label Abort Request | Label abort request message
Label Withdraw | Label withdraw message
Label Release | Label release message

LDP Message Format

All LDP messages use type-length-value (TLV) encoding. An LDP PDU consists of an LDP header followed by one or more LDP messages.

LDP header format



Version: two-byte unsigned integer indicating the version number of the LDP protocol. The current LDP protocol version is 1.

PDU Length: two-byte integer indicating the total length of the PDU in bytes, excluding the Version and PDU Length fields.

LDP identifier: 6 bytes, identifying the label space of the LSR sending the PDU. The first four bytes are the router ID (IP address) of the LSR, and the last two bytes identify the label space within the LSR.

Common LDP message encoding

All LDP messages use the following encoding format:

U bit: unknown message bit. If an LSR receives an unknown message with the U bit set to 0, it returns a Notification message to the message source. If the U bit is 1, the LSR silently discards the unknown message.

Message type: the type of the message.

Message length: the length of the message in bytes, covering the Message ID, mandatory parameters, and optional parameters.

Message ID: a 32-bit value used to identify the message.

Mandatory parameters: a variable-length set of required message parameters.

Optional parameters: a variable-length set of optional message parameters.
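To show how the LDP header and the common message encoding fit together, the Python sketch below builds a complete PDU carrying one KeepAlive message. The KeepAlive type code 0x0201 comes from RFC 5036 and is not listed in this manual. The function names, router ID, and message ID are illustrative.

```python
import ipaddress
import struct

LDP_VERSION = 1
KEEPALIVE = 0x0201  # KeepAlive message type code from RFC 5036 (not given in this manual)

def encode_message(msg_type: int, msg_id: int, params: bytes = b"", u_bit: bool = False) -> bytes:
    """U(1) | Message Type(15) | Message Length(16) | Message ID(32) | parameters."""
    first = (int(u_bit) << 15) | (msg_type & 0x7FFF)
    length = 4 + len(params)  # covers the Message ID plus all parameters
    return struct.pack("!HHI", first, length, msg_id) + params

def encode_pdu(lsr_id: str, label_space: int, messages: list[bytes]) -> bytes:
    """LDP header (Version, PDU Length, LDP identifier) followed by the messages."""
    body = (ipaddress.IPv4Address(lsr_id).packed + struct.pack("!H", label_space)
            + b"".join(messages))
    # PDU Length excludes the Version and PDU Length fields themselves.
    return struct.pack("!HH", LDP_VERSION, len(body)) + body

if __name__ == "__main__":
    pdu = encode_pdu("10.0.0.1", 0, [encode_message(KEEPALIVE, msg_id=1)])
    print(len(pdu), pdu.hex())  # 18 0001000e0a00000100000201000400000001
```

A KeepAlive carries no parameters, so its Message Length is 4 (the Message ID only). The PDU Length is 6 + 8 = 14: the LDP identifier plus one 8-byte message.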

Encoding format of the Notification message:

Status TLV: indicates the event being signaled by the Notification message. For details, see the Status TLV under Universal TLV Coding Mode of LDP.

Optional Parameters: a variable-length field containing zero or more parameters, each encoded as a TLV. The Notification message can contain the following optional parameters:

Optional parameter | Type | Length | Content
Extended Status | 0x0301 | 4 | Additional information for the status code of the Notification message
Returned PDU | 0x0302 | Variable | Used by the LSR to return the LDP PDU header to the source LSR
Returned Message | 0x0303 | Variable | Used by the LSR to return the LDP message type and length to the source LSR

Encoding format of the Hello message:

Common Hello Parameters TLV encoding:

T: Targeted Hello flag. 1: Targeted Hello message; 0: Link Hello message.

R: Request Send Targeted Hellos. 1: requests the receiving LSR to periodically send Targeted Hello messages to the sending LSR.

Optional Parameters: a variable-length field containing zero or more parameters, each encoded as a TLV. The Hello message can contain the following optional parameters:

Optional parameter | Type | Length | Content
Transport Address | 0x0401 | 4 | The address the sending LSR uses when creating the TCP connection
Configuration Sequence Number | 0x0402 | 4 | Indicates the configuration state of the sending LSR

Encoding format of the Initialization message:

Common Session Parameters TLV encoding:

Protocol Version: the current protocol version is 1.

Session hold time: in seconds; the session hold timer value proposed by the sending LSR.

A: Label distribution mode. 0: Downstream Unsolicited; 1: Downstream on Demand.

D: Loop detection. 0: loop detection disabled; 1: loop detection enabled.

PVLim: Path vector limit.

Encoding format of the KeepAlive message

Encoding format of the Address message:

Address List TLV: the sending LSR advertises its interface addresses in the Address List TLV. For the encoding format, see the Address List TLV under Universal TLV Coding Mode of LDP.

Encoding format of the Address Withdraw message:

Address List TLV: the sending LSR withdraws its interface addresses through the Address List TLV. For the encoding format, see the Address List TLV under Universal TLV Coding Mode of LDP.

Encoding format of the Label Mapping message:

FEC TLV: the FEC part of the FEC/label mapping. For the encoding format, see the FEC TLV under Universal TLV Coding Mode of LDP.

Label TLV: the label part of the FEC/label mapping. For the encoding format, see the Label TLV under Universal TLV Coding Mode of LDP.

Encoding format of the Label Request message:

FEC TLV: the FEC for which a label is requested.

Encoding format of the Label Abort Request message:

FEC TLV: the FEC whose label request is being aborted.

Label Request Message ID TLV: identifies the Label Request message that is being aborted.

Encoding format of the Label Withdraw message:

FEC TLV: the FEC whose label is being withdrawn. For the encoding format, see the FEC TLV under Universal TLV Coding Mode of LDP.

Label TLV: the label being withdrawn. For the encoding format, see the Label TLV under Universal TLV Coding Mode of LDP.

Encoding format of the Label Release message:

FEC TLV: the FEC whose label is being released. For the encoding format, see the FEC TLV under Universal TLV Coding Mode of LDP.

Label TLV: the label being released. For the encoding format, see the Label TLV under Universal TLV Coding Mode of LDP.

Note

TLV types 0x3E00-0x3EFF are reserved for vendor-private (manufacturer-proprietary) TLVs.

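Universal TLV Coding Mode of LDP

U bit: unknown TLV bit. If an LSR receives an unknown TLV with U set to 0, it returns a Notification message and ignores the whole message. If U is 1, it silently ignores the unknown TLV and processes the rest of the message.

F bit: forward unknown TLV bit. It applies only when U is 1. If F is 1 and the containing message is forwarded, the unknown TLV is forwarded with it.

Type: TLV type

Length: length of the Value field, in bytes

Value: TLV content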

FEC TLV encoding format

FEC elements come in several types, and the encoding depends on the element type. Each FEC element consists of the following fields:

A 1-byte field indicating the FEC element type.

A variable-length field containing the FEC element value, whose format depends on the type.

FEC element encoding:

FEC element type | Type field | Value field
Wildcard | 0x01 | None
Prefix | 0x02 | See below
Host Address | 0x03 | See below

Encoding format of the address prefix:

Address family: two bytes, encoded according to RFC 1700 (for example, IPv4 is 1).

Prefix length: one byte, the length of the address prefix in bits.

Address prefix: encoded according to the address family.

Encoding format of the host address:

Address family: two bytes, encoded according to RFC 1700 (for example, IPv4 is 1).

Host address length: one byte, the length of the host address in bytes.

Host address: encoded according to the address family.

Label TLV encoding format

Labels come in three types: generic label, ATM label, and Frame Relay label. The following describes the encoding of the generic label.

Label: the label value, 20 bits long, carried in a four-byte field.
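The Python sketch below puts the TLV rules above together to build the two TLVs of a Label Mapping message for an IPv4 prefix FEC. The TLV type codes (FEC TLV 0x0100, Generic Label TLV 0x0200) come from RFC 5036, not from this manual. The prefix and label value are illustrative. The prefix is carried in only as many bytes as its bit length requires.

```python
import ipaddress
import math
import struct

FEC_TLV, LABEL_TLV = 0x0100, 0x0200  # TLV type codes from RFC 5036
PREFIX_ELEMENT = 0x02                # FEC element type from the table above
AF_IPV4 = 1                          # RFC 1700 address family for IPv4

def encode_tlv(tlv_type: int, value: bytes, u_bit: bool = False, f_bit: bool = False) -> bytes:
    """U(1) | F(1) | Type(14) | Length(16) | Value."""
    first = (int(u_bit) << 15) | (int(f_bit) << 14) | (tlv_type & 0x3FFF)
    return struct.pack("!HH", first, len(value)) + value

def prefix_fec_element(prefix: str) -> bytes:
    """Prefix FEC element: type, address family, prefix length in bits, prefix bytes."""
    net = ipaddress.IPv4Network(prefix)
    nbytes = math.ceil(net.prefixlen / 8)  # only the bytes the prefix needs
    return (struct.pack("!BHB", PREFIX_ELEMENT, AF_IPV4, net.prefixlen)
            + net.network_address.packed[:nbytes])

def generic_label_tlv(label: int) -> bytes:
    """Generic Label TLV: the 20-bit label right-aligned in a four-byte field."""
    if not 0 <= label < 2**20:
        raise ValueError("label must fit in 20 bits")
    return encode_tlv(LABEL_TLV, struct.pack("!I", label))

if __name__ == "__main__":
    fec = encode_tlv(FEC_TLV, prefix_fec_element("10.1.0.0/16"))
    print(fec.hex(), generic_label_tlv(1002).hex())  # 01000006020001100a01 02000004000003ea
```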

Address List TLV encoding format

Status TLV encoding format



Status code: 32-bit unsigned integer indicating the event type. It is encoded as follows:

E bit: fatal error bit. 1: fatal error notification; 0: advisory notification.

F bit: forward bit. 1: the notification should be forwarded to the next-hop or previous-hop LSR of the LSP associated with the event; 0: the notification is not forwarded.

Status data: 30-bit unsigned integer carrying the status information.

Message Type: if non-zero, the type of the message to which the Status TLV refers. 0 indicates that the Status TLV does not refer to a specific message.

BGP/MPLS VPN

BGP/MPLS VPN is a mechanism that allows a service provider (SP) to use its IP backbone network to provide L3 VPN services to customers. In this mechanism, BGP distributes VPN routing information across the SP backbone network. MPLS forwards VPN traffic from one VPN site to another.

Concepts and Terms of BGP/MPLS VPN

P-Network: provider network, the backbone network of the service provider.

PE router: Provider Edge router.

P router: Provider router.

CE router: Customer Edge router.

ASBR: Autonomous System Boundary Router.

Site: the network connected to a CE forms a site.

VRF: VPN Routing and Forwarding instance, supported on the PE; each VRF has an independent routing and forwarding table.

VPN: an abstract concept that can be regarded as a group of sites sharing routing information. A VPN can include multiple VRFs (the VPN contains routes of multiple VRFs), and one VRF can belong to multiple VPNs (multiple VPNs contain the routes of the VRF).

RD: Route Distinguisher, a 64-bit value. Address spaces in different VRFs may overlap, so when BGP advertises VRF routes, an RD is prepended to each IP address to make the address unique.

RT: Route Target, used when BGP advertises VPN routes; it controls which VRFs import the received VPN routes.
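To make the role of the RD concrete, the following Python sketch prepends an RD to an IPv4 address to form a 12-byte VPN-IPv4 address. The Type 0 RD layout (2-byte type, 2-byte AS number, 4-byte assigned number) comes from RFC 4364, not from this manual; the AS number 100 and the assigned numbers are hypothetical.

```python
import ipaddress
import struct


def rd_type0(asn: int, assigned_number: int) -> bytes:
    """8-byte Type 0 RD: type 0 (2 bytes) | 2-byte AS number | 4-byte assigned number."""
    return struct.pack("!HHI", 0, asn, assigned_number)


def vpn_ipv4(rd: bytes, address: str) -> bytes:
    """VPN-IPv4 address: the 8-byte RD followed by the 4-byte IPv4 address."""
    if len(rd) != 8:
        raise ValueError("an RD is 64 bits (8 bytes)")
    return rd + ipaddress.IPv4Address(address).packed


# The same customer address in two VRFs with different RDs (hypothetical values)
a = vpn_ipv4(rd_type0(100, 1), "10.2.1.0")
b = vpn_ipv4(rd_type0(100, 2), "10.2.1.0")
print(a.hex())  # 00000064000000010a020100
print(a != b)   # True: overlapping IPv4 addresses become distinct VPN-IPv4 addresses
```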

BGP/MPLS VPN Network Structure

The following figure illustrates the BGP/MPLS VPN network structure.

Figure 25-5 BGP/MPLS VPN network structure

In the preceding figure, each PE contains two VRFs and connects two sites. The two interfaces connecting the sites belong to two different VRFs. Site1 and Site2 belong to one VPN; Site3 and Site4 belong to another VPN.

Process of Route Advertisement and Label Mapping Advertisement

In the P-Network, each device runs an IGP (such as OSPF) to advertise routes, including the routes of loopback interfaces, to the other devices.

In the P-Network, each device enables MPLS, and the devices advertise label mappings to each other through a signaling protocol (such as LDP). On PE1, the routing table contains a route to the loopback interface of PE2, and the corresponding outgoing MPLS label is L1.

Between PE and CE, routing information can be exchanged through an IGP such as OSPF or RIP, through BGP, or through static routes. On PE2, the route 10.2.1.0/24 learned from Site2 is saved in the routing table of VRF1.

MP-BGP runs between the PEs to advertise VPN routes (including label mapping information) to each other. PE1 receives the VPN route 10.2.1.0/24 advertised by PE2; its outgoing label is L2 and its next hop is the loopback interface of PE2. According to the RT, PE1 imports the route into the VRF1 routing table and advertises it to CE1 through an IGP or BGP.
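The RT-based import step can be sketched as a set intersection: a received VPN route is installed into every VRF whose import RTs share at least one value with the RTs carried by the route. This is a simplified Python model with hypothetical class names and RT values; a real PE also applies routing policy and best-path selection.

```python
from dataclasses import dataclass, field


@dataclass
class VpnRoute:
    prefix: str
    next_hop: str
    label: str
    rts: frozenset  # RTs attached to the route by the advertising PE


@dataclass
class Vrf:
    name: str
    import_rts: frozenset
    routes: dict = field(default_factory=dict)  # prefix -> (next hop, VPN label)


def import_route(vrfs, route):
    """Install the route into each VRF whose import RTs match one of the route's RTs."""
    imported = []
    for vrf in vrfs:
        if vrf.import_rts & route.rts:
            vrf.routes[route.prefix] = (route.next_hop, route.label)
            imported.append(vrf.name)
    return imported


vrf1 = Vrf("VRF1", frozenset({"100:1"}))
vrf2 = Vrf("VRF2", frozenset({"100:2"}))
route = VpnRoute("10.2.1.0/24", "PE2 loopback", "L2", frozenset({"100:1"}))
print(import_route([vrf1, vrf2], route))  # ['VRF1']
print(vrf1.routes)  # {'10.2.1.0/24': ('PE2 loopback', 'L2')}
```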

Forwarding Process of Packets

When CE1 accesses 10.2.1.5, the process is as follows:

CE1 sends IP packets to PE1.

PE1 (the ingress PE) receives the VPN packets and looks up the relevant VRF route, from which it obtains the outgoing VPN label L2 in the MPLS forwarding table. The next hop of the route is the loopback interface of PE2, so PE1 obtains the corresponding tunnel label L1 from the MPLS forwarding table. The two labels form an MPLS label stack that is pushed in front of the received VPN packets, which are then forwarded to the P device.

When the P device receives the MPLS packets, it forwards them according to the top label: it pops the top label L1 (penultimate hop popping) and forwards the packets to PE2.

PE2 (the egress PE) receives the MPLS packets, which now carry only one label, L2. Based on the top label L2, PE2 determines the VRF, pops the label, looks up the relevant VRF route, and forwards the packets to CE2.
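The label operations in the steps above can be traced with a small Python model. The label stack is a list whose first element is the top label, and each hop applies push, pop, or swap operations; label names follow the example, and the function names are hypothetical.

```python
def apply_ops(stack, ops):
    """Apply MPLS label operations in order; stack[0] is the top (outermost) label."""
    stack = list(stack)
    for op, *args in ops:
        if op == "push":
            stack.insert(0, args[0])
        elif op == "pop":
            stack.pop(0)
        elif op == "swap":
            stack[0] = args[0]
        else:
            raise ValueError(f"unknown operation {op}")
    return stack


HOPS = [
    ("PE1", [("push", "L2"), ("push", "L1")]),  # VPN label L2, then tunnel label L1 to PE2
    ("P", [("pop",)]),                          # penultimate hop popping removes L1
    ("PE2", [("pop",)]),                        # L2 selects VRF1 and is popped
]


def trace(hops):
    stack, result = [], []
    for node, ops in hops:
        stack = apply_ops(stack, ops)
        result.append((node, stack))
    return result


for node, stack in trace(HOPS):
    print(node, "sends", stack + ["IP 10.2.1.5"])
```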

BGP/MPLS VPN Cross-Domain

VRF-to-VRF

The following figure illustrates the VRF-to-VRF cross-domain mode.

Figure 25-6 VRF-to-VRF

In this mode, a BGP/MPLS VPN network is configured within each AS. For a cross-AS VPN, the ASBR acts as a PE device of the VPN: on the ASBR, configure the VRF corresponding to the VPN and assign an interface or sub-interface to the VRF, and connect the corresponding interfaces/sub-interfaces of the two ASBRs to each other. Toward the VPN in the local AS, the ASBR acts as a PE and imports all routes of the VPN. Toward the VPN in the peer AS, the ASBR acts as a CE: it learns the routes of the peer VPN through any routing protocol running between the ASBRs, and then distributes those routes to all PE devices of the VPN in the local AS. During forwarding, two-layer label forwarding is used within each AS; between the ASBRs, packets are sent to the peer ASBR as ordinary IP packets.

The advantage of this mode is that MPLS is not required between the ASBRs. The disadvantage is that the ASBR must maintain all VPN routes, and one interface/sub-interface must be assigned for each cross-domain VPN. Therefore, this mode scales poorly.

MP-EBGP Carrying VPNv4 Routes

In this mode, MP-EBGP runs between the ASBRs. When an ASBR learns the VPNv4 routes advertised by a PE in the local AS, it replaces the label and advertises the route with the new label to the peer ASBR. During packet forwarding, two-layer label forwarding is used within each AS, and one-layer label forwarding is used between the ASBRs. In this implementation, the inner label is replaced on the ASBR.

In this mode, the ASBR does not need a VRF for each VPN, the VPNv4 routes do not need to be imported into VRFs, and no interface/sub-interface needs to be assigned for each VPN. However, the ASBR must maintain all VPNv4 routes, allocate labels for them, and install the corresponding ILM entries locally. Therefore, the load on the ASBR is heavy.

The mode is implemented as follows:

Figure 25-7 MP-EBGP carrying VPNv4 route

In Figure 25-7, CE1 and CE2 belong to the same VPN; PE1 and ASBR1 belong to AS1; PE2 and ASBR2 belong to AS2.

1. Process of Route Advertisement and Label Mapping Advertisement

In the P-Network of each AS, each device runs an IGP (such as OSPF) to advertise routes, including the routes of loopback interfaces, to the other devices.

In the P-Network of each AS, each device enables MPLS, and the devices advertise label mappings to each other through a signaling protocol (such as LDP). On PE1, the routing table contains a route to the loopback interface of ASBR1, and the corresponding outgoing MPLS label is L1. On ASBR2, the routing table contains a route to the loopback interface of PE2, and the corresponding outgoing MPLS label is L2.

Between PE and CE, routing information can be exchanged through an IGP such as OSPF or RIP, through BGP, or through static routes. On PE2, the route 10.2.1.0/24 learned from Site2 is saved in the routing table of VRF1.

MP-IBGP runs between PE2 and ASBR2 to advertise VPN routes (including label mapping information). ASBR2 receives the VPN route 10.2.1.0/24 advertised by PE2; the outgoing label is L3 and the next hop is the loopback interface of PE2.

MP-EBGP runs between ASBR1 and ASBR2 to advertise VPN routes. After receiving the VPN route from PE2, ASBR2 allocates label L4 for the route and advertises the VPN route 10.2.1.0/24 with L4 to ASBR1.

MP-IBGP runs between PE1 and ASBR1. ASBR1 receives the VPN route 10.2.1.0/24 advertised by ASBR2, whose outgoing label is L4; ASBR1 allocates label L5 for the route and advertises the VPN route 10.2.1.0/24 with L5 to PE1.

PE1 receives the VPN route 10.2.1.0/24 advertised by ASBR1. The outgoing label is L5 and the next hop is the loopback interface of ASBR1. According to the RT, PE1 imports the route into the VRF1 routing table and advertises it to CE1 through an IGP or BGP.

2. Forwarding Process of Packets

When CE1 accesses 10.2.1.5, the process is as follows:

CE1 sends IP packets to PE1.

PE1 (the ingress PE) receives the VPN packets and looks up the relevant VRF route, from which it obtains the outgoing label L5 in the MPLS forwarding table. The next hop of the route is the loopback interface of ASBR1, so PE1 obtains the corresponding label L1 from the MPLS forwarding table. The two labels form an MPLS label stack that is pushed in front of the received VPN packets, which are then forwarded to P1.

When P1 receives the MPLS packets, it forwards them according to the top label: it pops the top label L1 (penultimate hop popping) and forwards the packets to ASBR1.

ASBR1 receives the MPLS packets, which now carry only one label, L5. It swaps L5 for L4 and forwards the MPLS packets to ASBR2.

ASBR2 receives the MPLS packets, which carry only one label, L4. It swaps L4 for L3 and looks up the route. The next hop of the route is the loopback interface of PE2, so ASBR2 obtains label L2 from the MPLS forwarding table, pushes L2 onto the top of the label stack, and forwards the MPLS packets to P2.

When P2 receives the MPLS packets, it forwards them according to the top label: it pops the top label L2 (penultimate hop popping) and forwards the packets to PE2.

PE2 (the egress PE) receives the MPLS packets, which now carry only one label, L3. Based on the top label L3, PE2 determines the VRF, pops the label, looks up the relevant VRF route, and forwards the packets to CE2.
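The same push/pop/swap bookkeeping, applied to the inter-AS example above, shows where the inner label changes. This Python sketch is self-contained and uses hypothetical function names; the per-hop operations are taken from the forwarding steps above.

```python
def apply_ops(stack, ops):
    """Apply MPLS label operations in order; stack[0] is the top (outermost) label."""
    stack = list(stack)
    for op, *args in ops:
        if op == "push":
            stack.insert(0, args[0])
        elif op == "pop":
            stack.pop(0)
        elif op == "swap":
            stack[0] = args[0]
        else:
            raise ValueError(f"unknown operation {op}")
    return stack


HOPS = [
    ("PE1", [("push", "L5"), ("push", "L1")]),    # VPN label from ASBR1, tunnel label to ASBR1
    ("P1", [("pop",)]),                           # penultimate hop popping of L1
    ("ASBR1", [("swap", "L4")]),                  # inner label L5 replaced by L4
    ("ASBR2", [("swap", "L3"), ("push", "L2")]),  # L4 replaced by L3, then tunnel label to PE2
    ("P2", [("pop",)]),                           # penultimate hop popping of L2
    ("PE2", [("pop",)]),                          # L3 selects VRF1 and is popped
]


def trace(hops):
    stack, result = [], []
    for node, ops in hops:
        stack = apply_ops(stack, ops)
        result.append((node, stack))
    return result


for node, stack in trace(HOPS):
    print(node, "sends", stack + ["IP 10.2.1.5"])
```

Between ASBR1 and ASBR2 the packet carries a single label, which matches the one-layer label forwarding described for this mode.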

MPLS VPN User Accesses Internet

In most network deployments, some or all sites of an MPLS VPN need to access the Internet. Internet access is one of the most important services that MPLS VPN service providers offer to enterprise customers. Taking address overlapping, access control, and security into consideration, two common Internet access solutions are provided.

Enterprise Intranet Access

Figure 25-8 Access Internet through intranet firewall

Many customers do not want VPN users to access the Internet directly; instead, they want a firewall to control the Internet access of VPN users. This is the typical requirement for enterprise Internet access. In the VPN environment, specific sites are designated for Internet access, and each VPN site sends its Internet traffic to one or more central sites.

In the network shown in Figure 25-8, all sites in the VPN access the Internet through a central site, which provides firewall and NAT services between the VPN members and the Internet. VPN members forward their Internet traffic to the central site by importing a default route whose next hop is the CE of the central site. The central site forwards the Internet traffic to the enterprise firewall, which performs access control and NAT according to the enterprise security policy and finally forwards the traffic to the Internet.

In this access mode, the configuration is deployed only within the enterprise VPN, and the carrier does not need to participate. The enterprise can control the security policy for Internet access by intranet users. However, this mode requires the enterprise to have strong security management capability, and the carrier cannot manage Internet access centrally.

Two other access modes are derived from the principle of this mode.

1. Universal access mode of the service provider

In the universal access control mode, Internet access control and the common access point are located at the carrier's egress. To solve the problem of address overlapping between different VPNs, the carrier needs to provide a separate VPN instance and firewall for each VPN that accesses the Internet, and each VPN's users access the Internet through their own firewall. In this mode, a default route pointing to the Internet gateway is configured at the VPN site at the Internet egress and advertised through BGP to the other sites of the VPN. Following the default route, the other sites forward their Internet traffic in regular L3VPN mode to the VPN site on the PE at the carrier's Internet access point. After being processed by the firewall, the traffic is forwarded to the Internet.

This access mode requires some configuration by the carrier. Each VPN requires a firewall, so a large number of VPN users requires a large investment. In return, the carrier can manage the Internet access of VPN users effectively.

2. Multiple-level access mode

To address the shortcomings of both enterprise firewall access and service provider access control, the multiple-level access control mode is used. Within each enterprise VPN, the enterprise firewall access mode is adopted, and Internet access is controlled centrally at the central site of the enterprise VPN. Access is then controlled again in the carrier network, hence multiple-level access control. The advantage of this mode is that it meets the enterprise's access control requirements while also allowing the carrier to control the Internet access of VPN users centrally, without a large investment. However, the configuration of this mode is complex, and its efficiency is lower than that of the previous two modes.



Service Provider Network Access

Figure 25-9 Static default route access

Some customers want to access the Internet through the VPN but do not need to receive complete Internet routing information. This requirement can be met through static default route access.

The following describes this access mode, taking Internet access by VPNA users as an example. VPNA has two users: CE1 and CE4. First, configure a cross-VRF static default route for Internet access in the VRF routing tables of PE1 and PE2 respectively. On PE1, the next hop of the cross-VRF static default route is the Internet gateway; on PE2, the next hop is PE1. To provide the return path from the Internet, configure a cross-VRF route to CE1 in the global routing table of PE1, whose next hop is CE1 in the VRF, and configure a cross-VRF route to CE4 in the global routing table of PE2, whose next hop is CE4 in the VRF. Advertise these routes into the MPLS network through the IGP.

The packet forwarding process is described below, taking Internet access by CE4 as an example:

After PE2 receives Internet-bound packets from CE4, it pushes the label of the default route, namely the label assigned to the global next hop of the default route, and forwards the MPLS packets to PE1.

PE1 receives the Internet-bound packets sent by PE2, looks up the Internet route in the global routing table, and forwards the packets to the Internet gateway.

When packets returning from the Internet reach PE1, PE1 looks up the global routing table, which contains the route to CE4 advertised by PE2 through the IGP. PE1 forwards the packets to PE2 using the label of that route. PE2 finds the static route to CE4 in its global routing table and forwards the packets to CE4.
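The lookups in this static default route scenario are ordinary longest-prefix matches. The Python sketch below models the PE2 VRF table and the PE1 global table with hypothetical entries: 10.4.0.0/16 stands in for the CE4 site (the manual does not give CE4's addresses) and 203.0.113.10 for an Internet host.

```python
import ipaddress


def lpm(table, destination):
    """Return the entry of the longest prefix in `table` that contains `destination`."""
    addr = ipaddress.ip_address(destination)
    best_net, best_entry = None, None
    for prefix, entry in table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_entry = net, entry
    return best_entry


# Hypothetical tables for the example
PE2_VRF = {
    "10.4.0.0/16": "CE4",                  # site route learned from CE4
    "0.0.0.0/0": "PE1 (global next hop)",  # cross-VRF static default route
}
PE1_GLOBAL = {
    "0.0.0.0/0": "Internet gateway",
    "10.4.0.0/16": "PE2 (via IGP, MPLS label)",  # cross-VRF route to CE4 advertised by PE2
}

print(lpm(PE2_VRF, "203.0.113.10"))     # PE1 (global next hop)
print(lpm(PE1_GLOBAL, "203.0.113.10"))  # Internet gateway
print(lpm(PE1_GLOBAL, "10.4.1.1"))      # PE2 (via IGP, MPLS label)
```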

Introduction to CSC

CSC Concept

With the wide deployment of BGP/MPLS VPN, more and more end users interconnect their networks through MPLS VPN. To save the cost of building or leasing L2 transmission links, many small-to-medium carriers lease VPN connections from large MPLS carriers to interconnect their POPs. This is called Carrier’s Carrier (CSC).

CSC Network Structure

Figure 25-10 CSC network structure

The basic structure of CSC is not significantly different from that of an MPLS VPN network. The carrier network usually refers to the large-scale, label-switched network that provides VPN access services for small-to-medium carriers and end users; see the backbone carrier network in Figure 25-10. The Carrier’s Carrier network is built on the carrier network and provides Internet access or VPN access for end users; see User Carrier POP1 and User Carrier POP2 in Figure 25-10. In theory, the number of layers is not restricted, so the structure is scalable.



1. Process of CSC Route Advertisement and Label Mapping Advertisement

The IGP and LDP advertisement process in the backbone carrier network is omitted. Assume that CSC-PE1 has already learned the FTN 2.2.2.2/32-to-L1, and that the ILM entry L1-to-NULL is installed on P1.

USER-PE2 advertises route 1.1.1.1/32 to USER-P with a null label (implicit null) for the route.

USER-P advertises route 1.1.1.1/32 to CSC-CE2 through the IGP, with itself as the next hop. Through LDP, it assigns label L2 for route 1.1.1.1/32, advertises it to CSC-CE2, and installs the ILM entry L2-to-NULL locally.

CSC-CE2 advertises route 1.1.1.1/32 to CSC-PE2 through the IGP, with itself as the next hop. Through LDP, CSC-CE2 assigns label L3 for the route, advertises it to CSC-PE2, and installs the ILM entry L3-to-L2 locally.

MP-IBGP on CSC-PE2 assigns a new label L4 for route 1.1.1.1/32 in the VRF, encapsulates the route and label in a VPNv4 update, and advertises it to CSC-PE1 with CSC-PE2 (2.2.2.2) as the next hop. CSC-PE2 installs the ILM entry L4-to-L3 locally.

After CSC-PE1 learns route 1.1.1.1/32, it advertises the route to CSC-CE1 through the IGP, with itself as the next hop. LDP on CSC-PE1 assigns label L5 for the BGP route, advertises it to CSC-CE1, and installs the ILM entry L5-to-L4 locally.

CSC-CE1 advertises route 1.1.1.1/32 to USER-PE1 through the IGP, with itself as the next hop. Through LDP, CSC-CE1 assigns label L6 for the route, advertises it to USER-PE1, and installs the ILM entry L6-to-L5 locally.

USER-PE1 installs the FTN 1.1.1.1/32-to-L6 in its local label forwarding table.

As a result, USER-PE1 and USER-PE2 can reach each other (assuming that label advertisement in the reverse direction is also complete).

USER-CE2 advertises route 10.2.1.0/24 to USER-PE2.

After USER-PE1 and USER-PE2 establish an MP-IBGP session, USER-PE2 assigns label L7 for route 10.2.1.0/24, encapsulates the route and label in a VPNv4 update, and advertises it to USER-PE1 with USER-PE2 (1.1.1.1) as the next hop. USER-PE2 installs the label entry L7-to-NULL locally.

After USER-PE1 receives the VPNv4 route, it installs the FTN 10.2.1.0/24-to-L7 in its local label forwarding table, and then advertises route 10.2.1.0/24 to USER-CE1 through the IGP, with itself as the next hop.

2. Forwarding Process of CSC Packets

When USER-CE1 needs to access USER-CE2 (10.2.1.1), it finds in its local routing table that the next hop to 10.2.1.0/24 is USER-PE1, so it sends the IP packets to USER-PE1.

After USER-PE1 receives the IP packets, it pushes the inner label L7 according to the FTN 10.2.1.0/24-to-L7, and then pushes the outer label L6 obtained from the next hop 1.1.1.1 and the FTN 1.1.1.1/32-to-L6. As a result, USER-PE1 sends the MPLS packet "L6 L7 10.2.1.1…" to CSC-CE1.

When the packet reaches CSC-CE1, CSC-CE1 replaces the outer label L6 according to the entry L6-to-L5 in its local label forwarding table, and sends the MPLS packet "L5 L7 10.2.1.1…" to CSC-PE1.

CSC-PE1 replaces the outer label L5 with L4 according to the label forwarding entry L5-to-L4, and then pushes L1 according to the entry's next hop 2.2.2.2 and the FTN 2.2.2.2/32-to-L1. CSC-PE1 therefore sends the MPLS packet "L1 L4 L7 10.2.1.1…" to P1.

After P1 receives the packet, it pops the outer label L1 and sends the packet "L4 L7 10.2.1.1…" to CSC-PE2.

CSC-PE2 replaces the outer label L4 with L3 according to the entry L4-to-L3 and sends the MPLS packet "L3 L7 10.2.1.1…" to CSC-CE2.

CSC-CE2 replaces the outer label L3 with L2 according to the entry L3-to-L2 and sends the MPLS packet "L2 L7 10.2.1.1…" to USER-P.

When the packet reaches USER-P, USER-P pops the outer label according to the entry L2-to-NULL and forwards the packet "L7 10.2.1.1…" to USER-PE2.

When USER-PE2 receives the packet, it pops label L7 according to the entry L7-to-NULL and sends the IP packet to USER-CE2.

As a result, the IP packets of USER-CE1 reach USER-CE2, and the packet forwarding process is complete.
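The CSC forwarding walk above is driven entirely by the FTN and ILM entries installed during label advertisement. The following Python sketch replays it from those tables; "NULL" means pop the top label, and the extra push of L1 on CSC-PE1 models the FTN 2.2.2.2/32-to-L1. The table layout and names are hypothetical.

```python
# FTN entries on USER-PE1: FEC -> label to push
FTN = {"10.2.1.0/24": "L7", "1.1.1.1/32": "L6"}

# ILM entries per node: incoming top label -> outgoing label ("NULL" = pop)
ILM = {
    "CSC-CE1": {"L6": "L5"},
    "CSC-PE1": {"L5": "L4"},
    "P1": {"L1": "NULL"},
    "CSC-PE2": {"L4": "L3"},
    "CSC-CE2": {"L3": "L2"},
    "USER-P": {"L2": "NULL"},
    "USER-PE2": {"L7": "NULL"},
}

# CSC-PE1 pushes the backbone tunnel label toward next hop 2.2.2.2 (FTN 2.2.2.2/32-to-L1)
PUSH_AFTER_LOOKUP = {"CSC-PE1": "L1"}

PATH = ["CSC-CE1", "CSC-PE1", "P1", "CSC-PE2", "CSC-CE2", "USER-P", "USER-PE2"]


def forward():
    # USER-PE1 pushes the inner label L7, then the outer label L6 (stack[0] is the top)
    stack = [FTN["1.1.1.1/32"], FTN["10.2.1.0/24"]]
    hops = [("USER-PE1", list(stack))]
    for node in PATH:
        out = ILM[node][stack[0]]
        if out == "NULL":
            stack.pop(0)
        else:
            stack[0] = out
        if node in PUSH_AFTER_LOOKUP:
            stack.insert(0, PUSH_AFTER_LOOKUP[node])
        hops.append((node, list(stack)))
    return hops


for node, stack in forward():
    print(node, "sends", stack + ["10.2.1.1..."])
```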



MPLS L2VPN

Terms

VPLS: Virtual Private LAN Service, which extends an Ethernet LAN across an IP/MPLS network and provides users with a virtual, transparent LAN service across the WAN.

VPWS: Virtual Private Wire Service, a point-to-point virtual private line technology.

H-VPLS: Hierarchical VPLS, a technology that improves the scalability of VPLS.

PW: Pseudowire, an emulated leased line, or virtual circuit, between two nodes over a packet network.

AC: Attachment Circuit, the circuit connecting a CE device to a PE device. It can be a physical circuit or a logical one, such as a VLAN interface.

VC: Virtual Circuit, a logical link between devices.

SVC: Spoke VC.

uPE: User-facing PE.

nPE: Network-facing PE.

Q-in-Q: an Ethernet encapsulation technology that adds another 802.1Q VLAN tag to a frame that already carries an 802.1Q VLAN tag; also called VLAN stacking.

VSI: Virtual Switch Instance. Multiple VPLS forwarders connected through PWs form a VSI.

Basic Concepts

MPLS L2VPN provides L2 VPN services over an MPLS network. With MPLS L2VPN technology, carriers can use the MPLS network to provide users with L2 VPN services over different media, including ATM, FR, VLAN, Ethernet, PPP, and HDLC. The same MPLS network also provides ordinary IP, L3 VPN, traffic engineering, and QoS services, which saves carriers the investment of building separate networks.

MPLS L2VPN transparently transmits L2 data across the MPLS network, so L2 connections can be created between different sites. Taking ATM as an example, an ATM virtual circuit is configured on each CE and connected to a remote CE across the MPLS network, just as if the CEs were interconnected through an ATM network.



With MPLS L2VPN, the carrier only needs to provide L2 connectivity for users and does not need to participate in the route calculation of VPN users. However, like traditional L2 VPNs (for example, VPNs provided by ATM PVCs), MPLS L2VPN has the N-square problem: within each VPN, every pair of CEs that must communicate requires its own connection between CE and PE. If a VPN has N sites, N-1 physical or logical CE-PE connections must be created for each CE. Because PE devices in MPLS L2VPN do not participate in user route calculation, L2VPN scales better than L3VPN in this respect, but it is less flexible.
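The N-square problem is simple arithmetic: with N sites fully interconnected by point-to-point connections, each CE needs N-1 connections and the VPN needs N(N-1)/2 virtual circuits in total. A minimal Python check (function names are hypothetical):

```python
def connections_per_ce(n_sites: int) -> int:
    """Each CE needs one CE-PE connection toward every other site."""
    return n_sites - 1


def total_vcs(n_sites: int) -> int:
    """Point-to-point VCs needed to connect every pair of sites: N(N-1)/2."""
    return n_sites * (n_sites - 1) // 2


for n in (5, 10, 50):
    print(n, connections_per_ce(n), total_vcs(n))
# 5 4 10
# 10 9 45
# 50 49 1225
```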

The IETF PPVPN working group produced many framework drafts, of which the two most important are Martini and Kompella. The Martini draft implements MPLS L2VPN by extending LDP; the Kompella draft implements it by extending MP-BGP. The Martini approach has since become a standard, and Maipu supports this mode.

Relevant RFCs are as follows:

RFC4905, Encapsulation Methods for Transport of Layer 2 Frames over

MPLS Networks

RFC4906, Transport of Layer 2 Frames Over MPLS

MPLS L2VPN covers Virtual Private Wire Service (VPWS) and Virtual Private LAN Service (VPLS). VPWS is a point-to-point virtual private line technology that supports most link layer protocols. VPLS provides LAN-like services over the MPLS network, so that distributed users can communicate with each other as if they were connected to the same LAN.

VPWS

The basic principle of MPLS L2VPN is similar to that of BGP/MPLS VPN: it also uses a label stack to transparently transmit packets across the MPLS network. The outer label (tunnel label) carries packets from one PE to another. The inner label (called the VC label in MPLS L2VPN) distinguishes different connections in different VPNs, and the receiving PE determines the destination CE according to the VC label. During forwarding, the label stack changes as follows:



Figure 25-11 Forwarding process of MPLS L2VPN label

Illustration:

L2PDU: link layer packet

V: inner VC label

T, T1: outer tunnel labels; the tunnel label is swapped during MPLS forwarding.
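A minimal Python model of Figure 25-11 follows. It assumes that the egress PE, not the penultimate P device, removes the tunnel label (the figure description only says that T is swapped for T1), and the VC-label-to-AC table on PE2 is hypothetical.

```python
# Hypothetical VC label table on the egress PE: VC label -> attachment circuit
VC_TABLE_PE2 = {"V": "AC to CE2"}


def vpws_forward(l2pdu: str):
    hops = []
    packet = ["T", "V", l2pdu]        # ingress PE pushes VC label V, then tunnel label T
    hops.append(("PE1", list(packet)))
    packet[0] = "T1"                  # P swaps the tunnel label; the VC label is untouched
    hops.append(("P", list(packet)))
    packet.pop(0)                     # egress PE removes the tunnel label
    ac = VC_TABLE_PE2[packet.pop(0)]  # the VC label identifies the destination CE's AC
    hops.append(("PE2", list(packet)))
    return hops, ac


hops, ac = vpws_forward("L2PDU")
for node, packet in hops:
    print(node, packet)
print("delivered on", ac)
```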

Implementation Mode of Martini

The Martini mode defines how to implement MPLS L2VPN by creating point-to-point links. It distributes VC labels by extending the LDP signaling protocol, so this mode is also called LDP L2VPN.

To allow LDP to distribute VC labels, RFC 4447 extends LDP by adding a new FEC type, the VC FEC. Because the two PEs exchanging VC labels are generally not directly connected, LDP must establish a targeted session between them and exchange the VC FEC and VC labels over that session. Otherwise, LDP distributes VC labels in the same way as other labels.
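For reference, the PWid FEC element that carries the VC type and VC ID can be packed as in the Python sketch below. The layout (element type 0x80, control-word bit plus 15-bit PW type, PW info length, 4-byte group ID, 4-byte PW ID) is taken from RFC 4447 rather than from this manual; there, VC-TYPE corresponds to the PW type field (Ethernet is 0x0005 per RFC 4446) and VC-ID to the PW ID field. Interface parameter sub-TLVs are left empty by default for brevity, and the function name is hypothetical.

```python
import struct

PWID_FEC_ELEMENT = 0x80    # PWid FEC element type (RFC 4447)
PW_TYPE_ETHERNET = 0x0005  # "Ethernet" PW type (RFC 4446)


def encode_pwid_fec(pw_type: int, pw_id: int, group_id: int = 0,
                    control_word: bool = False, if_params: bytes = b"") -> bytes:
    """Type (1) | C bit + PW type (2) | PW info length (1) | Group ID (4) | PW ID (4) | params."""
    c_and_type = (int(control_word) << 15) | (pw_type & 0x7FFF)
    info_len = 4 + len(if_params)  # length of PW ID plus interface parameters, in bytes
    header = struct.pack("!BHBII", PWID_FEC_ELEMENT, c_and_type, info_len, group_id, pw_id)
    return header + if_params


print(encode_pwid_fec(PW_TYPE_ETHERNET, pw_id=100).hex())
# 800005040000000000000064
```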

The L2VPN implemented by extending LDP can carry ATM, FR, Ethernet/VLAN, PPP, and HDLC. It requires all sites in the VPN to use the same link layer protocol; an L2 VPN can be created only when, for example, all sites are Ethernet or all sites are ATM. The disadvantage of Martini-mode L2VPN is that only point-to-point L2 VPN connections can be created, and automatic VPN discovery is not supported.

Martini-mode L2VPN focuses on the problem of "how to create a virtual circuit between two CEs". It identifies a VC by VC-TYPE + VC-ID. VC-TYPE indicates the VC type, such as ATM, ETHERNET, VLAN, or PPP. VC-ID identifies a VC and must be unique on the PE device. The PEs connecting the two CEs exchange VC labels through LDP and bind the corresponding CEs through the VC-ID.

When the LSP connecting the two PEs has been created and the labels have been exchanged and bound, the VC is established, and the two CEs can transmit L2 data through it.
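The condition for a VC to come up can be expressed as a small Python model: a VC keyed by (VC-TYPE, VC-ID) is up only when the tunnel LSP to the peer PE exists and both the local and the remote VC labels are bound. The class and function names and the label values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class VirtualCircuit:
    vc_type: str                        # e.g. "ETHERNET", "VLAN", "ATM", "PPP"
    vc_id: int                          # must be unique on the PE
    local_label: Optional[int] = None   # VC label this PE advertised to its peer
    remote_label: Optional[int] = None  # VC label received from the peer PE
    lsp_up: bool = False                # tunnel LSP to the peer PE is established

    def is_up(self) -> bool:
        return self.lsp_up and self.local_label is not None and self.remote_label is not None


def on_label_mapping(vcs: Dict[Tuple[str, int], VirtualCircuit],
                     vc_type: str, vc_id: int, label: int) -> bool:
    """Bind a VC label received over the targeted LDP session; ignore it if no local VC matches."""
    vc = vcs.get((vc_type, vc_id))
    if vc is None:
        return False
    vc.remote_label = label
    return True


vcs = {("ETHERNET", 100): VirtualCircuit("ETHERNET", 100, local_label=1025)}
on_label_mapping(vcs, "ETHERNET", 100, 2048)
print(vcs[("ETHERNET", 100)].is_up())  # False: the LSP is not up yet
vcs[("ETHERNET", 100)].lsp_up = True
print(vcs[("ETHERNET", 100)].is_up())  # True
```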

Point-to-Multipoint Connection (VPLS)

Background and Features of VPLS Technology

VPLS (virtual private LAN service) is a metro Ethernet technology. It can connect access points and provide point-to-point, point-to-multipoint, and multipoint-to-multipoint Ethernet services over the network topology.

In terms of connectivity, VPLS uses an IP/MPLS WAN backbone to provide enterprise users with emulated LAN connections. In terms of service delivery, the emulated LAN of VPLS provides convenient and flexible Ethernet services. The emulated LAN connection is transparent to the sub-LANs on either side of the WAN; each sub-LAN behaves as if it were connected to the same switch.

VPLS uses the IP/MPLS domain to partition the network and confines the L2 service to the access/edge network. Depending on networking requirements, a MAN using VPLS technology can take one of the following two forms:

1. The access network provides L2 service; the aggregation and core networks provide L3 service.

2. The access and aggregation networks provide L2 service; the core network provides L3 service.

VPLS integrates IP/MPLS, VPN, and Ethernet switching to implement multipoint-to-multipoint LAN interconnection over the WAN. The advantage of VPLS is that, once a PE device with multipoint connectivity is configured, adding, deleting, or relocating CE devices in the VPN requires reconfiguring only the directly connected PE device. With point-to-point L2VPN, the peer PE devices must also be reconfigured.

Control Plane, Data Plane, and Working Principle of VPLS

VPLS technology consists of a signaling control plane and a data forwarding plane. The signaling control function of VPLS can be implemented with BGP or Targeted LDP, called Kompella VPLS and Martini VPLS respectively. Currently, only the Targeted LDP control plane is supported.

In the control plane, VPLS uses the LDP signaling protocol to create a pair of unidirectional MPLS VC-LSPs across the backbone network, forming a PW between two PEs, and Ethernet data units are transmitted across the backbone through the PW. VC-LSPs can be configured statically or set up dynamically by LDP. A single PSN tunnel can carry multiple VPLS services, and it isolates the transported data to protect its security across the backbone network.

In the data plane of a MAN built with VPLS, each PE independently learns MAC addresses, maintains the MPLS FIB, and encapsulates/decapsulates the received L2 data according to RFC 4447. Data is exchanged between PEs through PSN tunnels built from MPLS LSPs. One VPLS instance corresponds to one enterprise customer, and the PE maintains separate MPLS FIB entries for each VPLS instance. The key of each MPLS FIB entry is the mapping between a MAC address and a PW, that is, between a MAC address and an LSP. Because one PW consists of two LSPs, a MAC address learned from a PW is associated with the label of the opposite direction, so that data destined to that MAC address can be forwarded correctly.

When the PE maintains MPLS FIB entries, it faces a problem similar to MAC address aging on a switch. VPLS addresses this through the signaling protocol by sending an address withdraw message: an LDP Address Withdraw message that carries a FEC TLV identifying the VPLS instance and, optionally, a MAC address TLV.
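A per-VSI MAC table and the effect of an address withdraw message can be sketched in Python as follows. The flush rule used when no MAC list is present (remove every MAC address except those learned from the PE that sent the message) follows RFC 4762 and is an assumption about this implementation; the class name and port names are hypothetical.

```python
class Vsi:
    """MAC table of one VPLS instance on a PE (simplified sketch)."""

    def __init__(self, name: str):
        self.name = name
        self.mac_table = {}  # MAC address -> port (an AC, or a PW toward a peer PE)

    def learn(self, src_mac: str, in_port: str) -> None:
        self.mac_table[src_mac] = in_port

    def lookup(self, dst_mac: str):
        return self.mac_table.get(dst_mac)  # None: unknown destination, flood

    def address_withdraw(self, from_pw: str, macs=None) -> None:
        """Handle an LDP Address Withdraw received on `from_pw` for this VSI."""
        if macs:
            for mac in macs:  # MAC address TLV present: flush only those MACs
                self.mac_table.pop(mac, None)
        else:  # no MAC list: flush all except MACs learned behind the sender
            self.mac_table = {m: p for m, p in self.mac_table.items() if p == from_pw}


vsi = Vsi("customer-A")
vsi.learn("00:00:00:00:00:01", "AC1")
vsi.learn("00:00:00:00:00:02", "PW-PE2")
vsi.learn("00:00:00:00:00:03", "PW-PE3")
vsi.address_withdraw(from_pw="PW-PE2")
print(vsi.mac_table)  # {'00:00:00:00:00:02': 'PW-PE2'}
```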

VPLS emulates a transparent LAN in which the sub-LANs behave as if they were connected to one switch, so loops are inevitable. VPLS solves this in one of two ways: running STP across the PEs by tunneling STP BPDUs, or connecting all PEs in a full mesh and applying split horizon. STP originates from LAN technology; in a LAN with many hosts its convergence time is long, and although STP has been improved in several ways, it is not suitable for large-scale networks. The second method solves the loop problem up to a certain scale, but as the number of PEs grows, the full mesh increases the number of inter-PE LSPs, reduces the flexibility of network deployment, and increases the load on the PEs. In large-scale networks, this problem can be solved by applying hierarchical VPLS (H-VPLS).
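Split horizon in a full-mesh VPLS can be expressed as a flooding rule: a frame received from a PW is flooded only to ACs, never to other PWs, because every other PE already receives it directly from the ingress PE. The Python sketch below shows that rule and the full-mesh cost that motivates H-VPLS (N PEs need N(N-1)/2 PWs, each made of two unidirectional LSPs). Port names are hypothetical.

```python
def flood_ports(in_port: str, ports: dict) -> list:
    """Ports that receive a flooded frame; `ports` maps port name -> "ac" or "pw"."""
    from_pw = ports[in_port] == "pw"
    return [p for p, kind in ports.items()
            if p != in_port and not (from_pw and kind == "pw")]  # split horizon


def full_mesh_lsps(n_pes: int) -> int:
    """Unidirectional LSPs for a full mesh of PWs: N(N-1)/2 PWs x 2 LSPs each."""
    return n_pes * (n_pes - 1)


PORTS = {"AC1": "ac", "AC2": "ac", "PW-PE2": "pw", "PW-PE3": "pw"}
print(flood_ports("PW-PE2", PORTS))  # ['AC1', 'AC2']
print(flood_ports("AC1", PORTS))     # ['AC2', 'PW-PE2', 'PW-PE3']
print(full_mesh_lsps(10))            # 90
```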

H-VPLS creates the hierarchy with a centralized star topology: full-mesh tunnels are maintained between backbone sites (the nPEs), CE devices connect to a uPE, and each uPE connects to one nPE. Through this hierarchy, H-VPLS allows carriers to allocate bandwidth dynamically across network segments and to use bandwidth efficiently, especially for video applications. By pushing multipoint broadcast replication to the edge of the carrier network, H-VPLS reduces the load on the core of the MAN.

VPLS Packet Encapsulation

The packets transmitted over AC and PW in VPLS mode are Ethernet frames. Two Ethernet encapsulation modes are supported: RAW and TAGGED.

RAW: packets may or may not contain an 802.1Q VLAN tag, but any tag is meaningless to the two connected nodes and is transmitted transparently.

TAGGED: each packet contains at least one 802.1Q VLAN tag. The tag is meaningful to the two connected nodes, that is, the two nodes have an agreement on the tag (for example, configured through signaling or manually).

For a PE device, selecting the AC or PW encapsulation mode means selecting the encapsulation format of packets sent out of the AC or PW. If the TAGGED mode is selected, for an AC the tag is meaningful to both ends (CE and PE); for a PW the tag is meaningful to both ends of the pseudowire between PE1 and PE2.

Packets received from an AC, that is, from a VLAN interface, may or may not contain a tag. If a tag is present, it can be a Service Tag (S-TAG) added by the user for the SP network to distinguish users, or a customer VLAN tag (C-TAG). To determine whether a tag is an S-TAG or a C-TAG, the device checks the customer configuration: the packet is first matched against the TPID of the outer VLAN ID (OVID), and then against the TPID of the inner VLAN ID (IVID), that is, the inner TPID configured per chip. If the two TPIDs are the same, the tag is treated as the OVID.

For tagged packets received from a PW: if the PW is in TAGGED mode and the TPID in the packet equals the configured TPID, the outer tag is treated as the S-TAG; otherwise it is treated as a C-TAG. If the PW is in RAW mode, any tag in the packet is treated as a C-TAG. A C-TAG is transmitted transparently during VPLS processing; it is never deleted or replaced.
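The PW-side rule above can be sketched as a small classification function. This is a minimal illustration that assumes a simplified model in which only the outer tag's TPID is examined. The function name classify_outer_tag_from_pw, the TagKind enum and the 0x8100 default TPID are hypothetical and are not part of the switch software.

```python
from enum import Enum


class TagKind(Enum):
    S_TAG = "S-TAG"
    C_TAG = "C-TAG"


def classify_outer_tag_from_pw(pw_mode: str, packet_tpid: int,
                               configured_tpid: int = 0x8100) -> TagKind:
    """Classify the outer VLAN tag of a tagged packet received from a PW.

    Hypothetical helper: pw_mode is "RAW" or "TAGGED"; TPIDs are ints.
    The 0x8100 default is an illustrative assumption.
    """
    if pw_mode not in ("RAW", "TAGGED"):
        raise ValueError(f"unknown PW mode: {pw_mode}")
    if pw_mode == "TAGGED" and packet_tpid == configured_tpid:
        # TAGGED PW and the TPID matches the configured one: outer tag is the S-TAG.
        return TagKind.S_TAG
    # TPID mismatch on a TAGGED PW, or any RAW PW: the tag is a C-TAG,
    # which VPLS carries transparently (never deleted or replaced).
    return TagKind.C_TAG


print(classify_outer_tag_from_pw("TAGGED", 0x8100).value)  # S-TAG
print(classify_outer_tag_from_pw("RAW", 0x8100).value)     # C-TAG
```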

1. Packet Encapsulation on AC

The packet encapsulation mode on the AC is determined by the VSI access mode of the user VLAN interface. There are two user access modes: Ethernet access and VLAN access.

VLAN access: the Ethernet frame header sent from the CE to the PE, or from the PE to the CE, contains a VLAN tag. This tag is an S-TAG pushed by the customer for the SP network to distinguish customers.

Ethernet access: the uplink Ethernet frame header from the CE and the downlink Ethernet frame header from the PE do not contain an S-TAG. If the frame header contains a VLAN tag, it is the user's internal VLAN tag and is meaningless to PE devices. This internal VLAN tag is called the C-TAG.

Ethernet access mode corresponds to the RAW encapsulation mode, and VLAN access mode corresponds to the TAGGED VPLS encapsulation mode. Packets are processed as follows:

A. RAW Mode

For packets sent from the AC, VPLS processing does not touch the tag, regardless of whether an S-TAG or a C-TAG is present. Whether an S-TAG is added to the packets is determined by the port configuration and the VLAN configuration.

B. TAGGED Mode

If a packet sent from the AC already carries an S-TAG, VPLS processing checks whether it equals the S-TAG of the AC. If they are equal, no operation is performed; otherwise, the tag is replaced with the S-TAG of the AC. If the packet carries no S-TAG, the S-TAG of the AC is added.
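TAGGED-mode S-TAG handling on the AC reduces to three cases. The sketch below assumes S-TAGs are represented only by their VLAN IDs. It returns the action taken and the resulting S-TAG VID. The function and parameter names are hypothetical.

```python
from typing import Optional, Tuple


def tagged_ac_stag_action(frame_s_tag: Optional[int],
                          ac_s_tag: int) -> Tuple[str, int]:
    """Return (action, resulting S-TAG VID) for TAGGED-mode AC processing.

    frame_s_tag is the VID of the S-TAG already on the frame, or None if
    the frame has no S-TAG; ac_s_tag is the S-TAG VID configured for the AC.
    """
    if frame_s_tag is None:
        return "add", ac_s_tag        # no S-TAG: add the AC's S-TAG
    if frame_s_tag == ac_s_tag:
        return "keep", frame_s_tag    # same S-TAG: no operation
    return "replace", ac_s_tag        # different S-TAG: replace it


for tag in (None, 100, 200):
    print(tag, tagged_ac_stag_action(tag, ac_s_tag=100))
```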

2. PW Encapsulation

PW encapsulation also has two modes: RAW mode and TAGGED mode.

A. RAW Mode

In RAW mode, the PW represents a virtual link between two Ethernet ports, and packets are transmitted transparently. Packets may contain tags, but the tags are meaningless to the ingress and egress PEs. S-TAGs are not transmitted over the PW.

When a packet received from the AC is to be sent out on the PW, an existing S-TAG is deleted first, and then two layers of MPLS labels are pushed before forwarding. A packet without an S-TAG simply has two layers of MPLS labels pushed before forwarding.

B. TAGGED Mode

When the PW is configured in TAGGED mode, it represents a virtual link between two VLANs. Each PW can represent a different VLAN and switch the traffic of a different network. Every packet must contain a tag, and the tag value is meaningful to the ingress and egress PEs.

When a packet received from the AC is to be sent out on the PW and already carries an S-TAG, two layers of MPLS labels are pushed before forwarding. If the packet carries no S-TAG, an empty tag (VID 0) is added first, and then two layers of MPLS labels are pushed before forwarding.
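The sketch below contrasts the two PW encapsulation modes for a packet arriving from an AC. It uses a toy frame model made of an S-TAG VID, a list of C-TAGs and a label stack. It also assumes the two MPLS labels are the PW (VPLS) label and the outer tunnel label, matching the VPLS/VPWS comparison later in this section. The Frame class and the label values are illustrative only.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Frame:
    s_tag: Optional[int]                               # S-TAG VID, None if absent
    c_tags: List[int] = field(default_factory=list)    # customer tags, carried as-is
    labels: List[int] = field(default_factory=list)    # MPLS stack, outermost first


def ac_to_pw(frame: Frame, pw_mode: str, pw_label: int, tunnel_label: int) -> Frame:
    """Encapsulate a frame received from an AC for transmission on a PW."""
    if pw_mode == "RAW":
        s_tag = None          # RAW: the S-TAG is deleted, never sent on the PW
    elif pw_mode == "TAGGED":
        # TAGGED: keep an existing S-TAG, otherwise add an empty tag (VID 0).
        s_tag = frame.s_tag if frame.s_tag is not None else 0
    else:
        raise ValueError(f"unknown PW mode: {pw_mode}")
    # Push two labels: the PW label, then the outer tunnel label on top.
    return Frame(s_tag=s_tag, c_tags=list(frame.c_tags),
                 labels=[tunnel_label, pw_label] + frame.labels)


f = Frame(s_tag=100, c_tags=[10])
print(ac_to_pw(f, "RAW", pw_label=2001, tunnel_label=3001))
print(ac_to_pw(Frame(s_tag=None), "TAGGED", pw_label=2001, tunnel_label=3001))
```

In both modes the C-TAGs pass through unchanged, which matches the rule that a C-TAG is never deleted or replaced.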

Basic VPLS
Basic VPLS uses a full-mesh interconnection structure. In a full-mesh network, sessions are created between the PEs in the same VPLS instance, and a corresponding PW is established for each session. Packets received from a CE can be forwarded to one or more local interfaces (ACs) and emulated LAN interfaces (PWs). To prevent broadcast packets from looping in the network, packets received from a PW are not forwarded to other PWs in the same VPLS instance. This is L2 split horizon, which is a basic function of a full-mesh network.

The following figure illustrates the full-mesh connection.



Figure 25-12 Full-mesh connection of basic VPLS

In the preceding figure, enterprise users A and B each connect three branch LANs through VPLS. The red line indicates the traffic flow of user A, and the blue dotted line indicates the traffic flow of user B. Each branch LAN of a user is connected to the carrier's IP/MPLS backbone through a PE, forming a VPLS instance: user A belongs to VPLS instance 1, and user B belongs to VPLS instance 2. Traffic between branch LANs in the same VPLS instance is transmitted as if the branches were on one LAN. Even if multiple enterprises access the same backbone through the same PE, their traffic flows are logically independent of each other, which ensures the privacy of user data. The VPLS instances of user A and user B are isolated and cannot communicate with each other.

To connect the branch sites, a full mesh of connections is created between the PEs of the same VPLS instance; each connection is a data tunnel carried over an IP/MPLS LSP. The PE provides Ethernet bridge access to users. It receives Ethernet-encapsulated frames directly from a user's branch LAN and, based on the destination MAC address, forwards them to the appropriate LSP to reach the branch LAN at the other end. With the VPLS protocol running on the PE, the user-facing interfaces behave like a bridge, providing L2 switching and MAC address learning. When the PE receives a frame, it first looks up the destination MAC address of the frame in the MAC address table. If an entry matches, the frame is forwarded on the corresponding LSP; if no entry matches, the frame is broadcast to the other logical ports serving the same VPLS instance. When the PE receives data from the host that owns the MAC address, it learns the address and updates the MAC address table, and subsequent frames to that address are forwarded normally. This is the same working principle as an Ethernet switch.
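A toy VSI can model the learn, look up and flood behavior described above, together with the split horizon rule from the start of this subsection. This is an illustration under simplifying assumptions: MAC entries never age out, ports are identified by name, and a broadcast address is treated as an unknown destination. It is not the device's forwarding code.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Vsi:
    """Toy VSI: every port is either an AC ("ac") or a pseudo wire ("pw")."""
    port_kind: Dict[str, str]
    mac_table: Dict[str, str] = field(default_factory=dict)   # MAC -> port

    def _allowed(self, in_port: str, out_port: str) -> bool:
        if out_port == in_port:
            return False
        # L2 split horizon: a frame received on a PW never goes out on a PW.
        return not (self.port_kind[in_port] == "pw"
                    and self.port_kind[out_port] == "pw")

    def receive(self, in_port: str, src_mac: str, dst_mac: str) -> Set[str]:
        """Learn the source MAC, then return the set of output ports."""
        self.mac_table[src_mac] = in_port
        out_port = self.mac_table.get(dst_mac)
        if out_port is not None:                 # known unicast
            return {out_port} if self._allowed(in_port, out_port) else set()
        # Unknown unicast or broadcast: flood within this VSI only.
        return {p for p in self.port_kind if self._allowed(in_port, p)}


vsi = Vsi({"ac1": "ac", "ac2": "ac", "pw1": "pw", "pw2": "pw"})
print(sorted(vsi.receive("ac1", "00:00:00:00:00:0a", "ff:ff:ff:ff:ff:ff")))
# ['ac2', 'pw1', 'pw2']
print(sorted(vsi.receive("pw1", "00:00:00:00:00:0b", "00:00:00:00:00:0a")))
# ['ac1']
```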



H-VPLS
Hierarchical VPLS (H-VPLS) is a technology that improves VPLS scalability. It extends the access reach of the service provider's VPLS and reduces network complexity, which simplifies network management and lowers construction and operating costs. With basic VPLS, adding a PE requires full-mesh connections to every existing PE. If LDP is used, every PE device in the VSI must be configured, and the number of signaling packets grows with the square of the number of PEs (the N² problem). With H-VPLS, adding a PE requires changing only the configuration of the PE it connects to, and the number of packets no longer suffers from the N² problem.

H-VPLS introduces new roles: the user-facing PE (uPE), and the network-facing PE (nPE), which is the PE in the SP network that the uPE connects to. A uPE can be an L2 device with only Ethernet switching, or an L3 device with both switching and routing. One side of the uPE connects to a PE in the SP network; the other side (multiple interfaces) connects to multiple user CE devices in the building. The uPE is part of the VPLS and connects to the PE by creating a PW, which is called a spoke VC (SVC).

In an H-VPLS network, the uPE is usually placed at the entrance of the building, so it is also called a Multi-Tenant Unit (MTU). If the MTU has only switching capability, H-VPLS can use L2 QinQ access. This mode suits the early stage of network construction, when the access devices do not yet support MPLS, and small access networks where only simple access is required. If the MTU supports routing and MPLS, H-VPLS can use MPLS LSP access. This mode also suits medium-scale access or aggregation points, and it extends the MPLS network to the user side so that other MPLS value-added services (VAS) can be used.

The core network of H-VPLS uses a full-mesh topology, and the edge network uses a hub-and-spoke star topology. In the preceding figure, the uPE is the hub and the multiple CEs are the spokes. The top (core) layer and the edge layer are connected through pseudo wires.

Suppose the same network were fully meshed as in basic VPLS. Every uPE would then act as a PE in the full mesh, and the number of sessions would be far greater than the number needed to fully mesh only the nPEs in H-VPLS. H-VPLS therefore improves VPLS scalability and avoids the N² problem caused by network growth. To add a new uPE, you configure only the new uPE and the PE it connects to, without changing any other device, which improves maintainability and manageability.
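Counting sessions makes the scaling argument concrete. The sketch assumes one session per PE pair in a full mesh, which gives n(n-1)/2 sessions for n PEs. It also assumes one spoke PW per single-homed uPE in H-VPLS. The function names and the example of 4 nPEs serving 20 uPEs are illustrative.

```python
def full_mesh_sessions(n: int) -> int:
    """Sessions needed to fully mesh n PEs: one per PE pair, n(n-1)/2."""
    return n * (n - 1) // 2


def hvpls_sessions(n_npe: int, n_upe: int) -> int:
    """Full mesh among nPEs plus one spoke PW per single-homed uPE."""
    return full_mesh_sessions(n_npe) + n_upe


print(full_mesh_sessions(4 + 20))  # 276: every uPE acting as a full-mesh PE
print(hvpls_sessions(4, 20))       # 26
```

In the flat mesh, adding a 25th uPE adds 24 sessions (300 - 276). In H-VPLS, the same addition adds a single spoke PW.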

Between the PE and the uPE, H-VPLS can be implemented in two ways. In the first, the PW from the PE to the uPE is set up through the LDP spoke VC function. The second is QinQ-based H-VPLS, which applies only to Ethernet links.

Figure 25-13 H-VPLS connection

In H-VPLS, a uPE can connect multiple CEs, which can belong to one or more VPLS instances. Between the nPE and the uPE, either a label or a VLAN ID is used to distinguish VPLS instances. If VLAN IDs are used, QinQ is required, because user data frames may already carry VLAN tags. CEs in the same VPLS instance that are connected to the same uPE exchange traffic through L2 switching on the uPE, without involving the nPE.

When CE2 sends data to the remote CE1 in the same VPLS instance across the SP WAN, the Ethernet frame is first sent to uPE1. If uPE1 has not learned the destination MAC address (or the frame is a broadcast or multicast frame), it sends the frame out of all ports (ACs, SVCs, and PWs) except the receiving port. When PE2 receives the frame and has not learned the destination MAC address, it broadcasts the frame to all ports of the VPLS instance (PWs, ACs, and other SVCs); if the address has been learned, the frame is sent on the corresponding PW. When PE1 at the other end receives the frame, it forwards it according to the destination MAC address. If the address has not been learned, the frame is broadcast to the other ports of the VPLS instance. If it has been learned, the frame is sent to the corresponding AC and delivered to CE1.



1. Access Through SVC

The uPE can connect to the PE through a VC called a spoke VC (SVC). The SVC identifies the VPLS instance of the packets entering the PE. There are two cases:

The uPE has switching capability. Received packets are processed as described above, and one PW is maintained between the uPE and the PE for each VPLS instance.

The uPE has no switching capability. Multiple PWs must be maintained between the uPE and the PE for one VPLS instance. The uPE works as in VPWS: each CE-facing ingress interface of the uPE corresponds to one PW toward the PE. All packets that the uPE receives from an AC are sent to the PE for processing. Even packets destined for another CE of the same VPLS instance on the local uPE are switched on the PE. This mode has drawbacks, but it provides compatibility with already-deployed uPE devices that support only VPWS.

SVCs can also connect two VPLS instances (for example, two VPLS instances in different MANs). This is called multi-domain VPLS, and the two PEs connected by the SVC are called border PEs. To interconnect multiple domains, the border PEs of each VPLS domain are fully meshed through SVCs, forming a three-level VPLS network.

2. QinQ Access



Figure 25-14 Packet process of QinQ Access

The preceding figure illustrates the packet forwarding process of QinQ access:

A. QinQ is enabled on the CE access port. An outer VLAN tag is pushed onto received packets to serve as the multiplexing tag, and the packets are transparently transmitted from the MTU to PE1 through the QinQ tunnel.

B. PE1 first determines the VSI of the packet from the VLAN tag added by the MTU. Based on the destination MAC address of the packet, it then pushes the multiplexing label (MPLS label) of the corresponding PW and finally forwards the packet.

C. When PE1 receives packets from the PW side, it determines the VSI from the multiplexing label (MPLS label) and adds the VLAN tag according to the destination MAC address. It then forwards the packets to the MTU through the QinQ tunnel, and finally the MTU forwards the packets to the CE.
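Steps B and C amount to two table lookups on PE1. The sketch makes several assumptions: there is a single MTU, each VSI has one outer VLAN, and a destination-MAC lookup has already chosen the outgoing PW. The class name, table names, PW names and label values are all hypothetical.

```python
from typing import Dict, Tuple


class QinqAccessPe:
    """Hypothetical model of PE1's lookups for QinQ access from one MTU."""

    def __init__(self, vlan_to_vsi: Dict[int, str],
                 out_label: Dict[Tuple[str, str], int],
                 in_label_to_vsi: Dict[int, str]):
        self.vlan_to_vsi = vlan_to_vsi                     # outer VLAN -> VSI
        self.vsi_to_vlan = {v: k for k, v in vlan_to_vsi.items()}
        self.out_label = out_label                         # (VSI, PW) -> label to push
        self.in_label_to_vsi = in_label_to_vsi             # received PW label -> VSI

    def from_mtu(self, outer_vid: int, out_pw: str) -> Tuple[str, int]:
        """Step B: outer VLAN -> VSI, then the label of the chosen PW."""
        vsi = self.vlan_to_vsi[outer_vid]
        return vsi, self.out_label[(vsi, out_pw)]

    def from_pw(self, label: int) -> Tuple[str, int]:
        """Step C: PW label -> VSI, then the outer VLAN to push toward the MTU."""
        vsi = self.in_label_to_vsi[label]
        return vsi, self.vsi_to_vlan[vsi]


pe1 = QinqAccessPe({100: "vsiA", 200: "vsiB"},
                   {("vsiA", "pw-to-PE2"): 1001, ("vsiB", "pw-to-PE2"): 1002},
                   {2001: "vsiA", 2002: "vsiB"})
print(pe1.from_mtu(100, "pw-to-PE2"))  # ('vsiA', 1001)
print(pe1.from_pw(2002))               # ('vsiB', 200)
```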



MAC Address Restriction
In a VPLS instance, MAC addresses are learned before switching is performed, and frames are then switched by looking up the destination MAC address in the MAC address table. One system can support multiple VSI instances. To prevent the MAC address table of one instance from growing too large, the number of MAC addresses that can be learned in a VSI can be limited.
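A per-VSI learning limit can be sketched as a MAC table that refuses new entries once it is full. The sketch assumes that existing entries can still be refreshed or moved to another port. The text does not say how the switch forwards frames whose source address could not be learned, so the code only reports whether learning succeeded. The class name is hypothetical.

```python
from typing import Dict


class LimitedMacTable:
    """Per-VSI MAC table that stops learning new addresses at a limit."""

    def __init__(self, max_entries: int):
        self.max_entries = max_entries
        self.entries: Dict[str, str] = {}   # MAC -> port

    def learn(self, mac: str, port: str) -> bool:
        """Learn or refresh an entry; return False if the limit blocks it."""
        if mac in self.entries or len(self.entries) < self.max_entries:
            self.entries[mac] = port        # existing entries may move ports
            return True
        return False                        # table full: address not learned


table = LimitedMacTable(max_entries=2)
print([table.learn(m, "ac1") for m in ("mac-a", "mac-b", "mac-c")])
# [True, True, False]
```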

MAC Address Recycling
When a fault occurs, convergence can be sped up by notifying the other PEs to clear their local MAC entries for the VSI. This triggers MAC address relearning, so the MAC forwarding paths are rebuilt as soon as possible. The LDP address recycling (withdraw) message provides this mechanism.

The address recycling message carries a MAC TLV. Devices that receive the message delete or relearn MAC addresses according to the parameters specified in the TLV.

The destinations of the MAC address recycling message depend on the fault type. The basic principle is to notify all devices that may have learned the affected MAC addresses. The fault types are AC interface fault, Mesh-PE device fault, and Spoke-PE device fault.

When an AC interface fails, send the MAC address recycling message to all Mesh-PE and Spoke-PE devices.

When a Mesh-PE device fails, notify all Spoke-PE devices.

When a Spoke-PE device fails, notify all Mesh-PE devices and the other Spoke-PE devices.
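The notification rules above can be written as a small lookup. The names are hypothetical, and mesh_pes and spoke_pes are assumed to be the peer sets known to the PE that sends the message.

```python
from enum import Enum
from typing import Set


class Fault(Enum):
    AC_INTERFACE = "ac"
    MESH_PE = "mesh-pe"
    SPOKE_PE = "spoke-pe"


def recycling_recipients(fault: Fault, mesh_pes: Set[str],
                         spoke_pes: Set[str], failed: str = "") -> Set[str]:
    """Peers that should receive the MAC address recycling message."""
    if fault is Fault.AC_INTERFACE:
        return mesh_pes | spoke_pes                # every Mesh-PE and Spoke-PE
    if fault is Fault.MESH_PE:
        return set(spoke_pes)                      # all Spoke-PEs
    if fault is Fault.SPOKE_PE:
        return mesh_pes | (spoke_pes - {failed})   # Mesh-PEs and other Spoke-PEs
    raise ValueError(f"unknown fault type: {fault}")


mesh, spoke = {"PE2", "PE3"}, {"uPE1", "uPE2"}
print(sorted(recycling_recipients(Fault.SPOKE_PE, mesh, spoke, failed="uPE1")))
# ['PE2', 'PE3', 'uPE2']
```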

Loopback Avoidance
As with a conventional Ethernet, loops must be avoided in a virtual Ethernet. In VPLS, a full mesh combined with split horizon is used to avoid loops.

In basic networking, all PEs of the same VPLS instance form a full-mesh topology. Each PE is therefore connected to every other PE through a PW, and to its CEs through access circuits (ACs). With split horizon, broadcast, multicast, and other frames to be flooded that are received from a PW are not sent to any PW of the same VPLS instance (including the receiving PW), but they can be sent to ACs. Such frames received from an AC can be sent to all PWs and all other ACs of the same VPLS instance, excluding the receiving AC. In other words, packets received from a PW on the public network are never forwarded to another PW on the public network; they can be forwarded only to the private network.

A core network built in this way is loop-free.

If a backdoor link creates a loop in the CE network of the VPLS, the users should run a loop-avoidance protocol such as STP in their LAN. The carrier network does not perceive the users' loop-avoidance protocol; it transmits the protocol packets transparently as user data.

In H-VPLS with MPLS access, nPE devices must avoid forwarding loops. To do so, they enable L2 split horizon on the pseudo wires connecting to other nPEs and disable it on the pseudo wires connecting to uPEs. On an nPE, packets received on a pseudo wire from a uPE are forwarded to the other pseudo wires. Packets received on a pseudo wire from another nPE are forwarded only to the pseudo wires connecting to uPEs.
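On an nPE, the rule above amounts to applying split horizon only between mesh PWs. The sketch computes the ports to which a flooded frame is sent. The port roles are hypothetical labels. Treating local ACs like spoke PWs is an assumption, since the text covers only pseudo wires.

```python
from typing import Dict, Set


def npe_flood_targets(in_port: str, port_role: Dict[str, str]) -> Set[str]:
    """Ports an nPE floods a frame to.

    Roles: "mesh" = PW to another nPE, "spoke" = PW to a uPE,
    "ac" = local attachment circuit (assumed to behave like a spoke).
    """
    in_role = port_role[in_port]
    targets = set()
    for port, role in port_role.items():
        if port == in_port:
            continue
        # Split horizon is enabled only among mesh PWs: a frame that came
        # from another nPE is never sent back into the nPE mesh.
        if in_role == "mesh" and role == "mesh":
            continue
        targets.add(port)
    return targets


roles = {"toPE2": "mesh", "toPE3": "mesh", "toUPE1": "spoke", "toUPE2": "spoke"}
print(sorted(npe_flood_targets("toUPE1", roles)))  # ['toPE2', 'toPE3', 'toUPE2']
print(sorted(npe_flood_targets("toPE2", roles)))   # ['toUPE1', 'toUPE2']
```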

If a uPE connects to a single PE, the star topology has no loops. To protect against link failures, MPLS FRR can be used for fast recovery; to protect against node failures, the uPE can be dual-homed to two nPEs.

If a uPE is dual-homed to two PEs, L2 split horizon alone cannot prevent loops, and the spanning tree protocol must be enabled between the uPE and the nPEs.

Comparison between VPLS and VPWS

Concept
VPWS: Virtual private wire service. A point-to-point virtual circuit connection; to users, it is a circuit to the other end that transparently transmits L2 packets.
VPLS: Virtual private LAN service. Provides virtual Ethernet service across the WAN; to users, multiple VPN branches appear to be connected to one large LAN provided by the SP, in which bridge switching is performed.

VPN type
VPWS: A point-to-point L2VPN connection mode.
VPLS: A point-to-multipoint L2VPN connection mode.

Scalability
VPWS: Has a network scalability problem (the N² problem). After multi-connectivity PE devices are provided for customers, adding, deleting, or relocating CEs in the L2VPN requires reconfiguring every peer PE.
VPLS: Scales well, and operation and maintenance are simple. After multi-connectivity PE devices are provided for VPLS customers, adding, deleting, or relocating CEs in the L2VPN requires reconfiguring only the connected PEs.

Signaling protocol
VPWS: LDP; the pseudo wire between PEs is called a VC.
VPLS: LDP; the pseudo wire between PEs is called a PW or SVC.

Encapsulation mode
VPWS: The VPWS label is added, followed by the outer MPLS tunnel label. For example, with FR AC access (FR encapsulation on the CE-PE interface), the PE adds the VPWS label in front of the FR header of each received packet and then adds the outer MPLS label.
VPLS: The VPLS label is added, followed by the outer MPLS tunnel label. For example, with FR AC access, a packet received on the PE has the format FR header + Ethernet header + data. The FR header is removed, the VPLS label is added in front of the Ethernet header, and then the MPLS label is added.

AC access
VPWS: Multiple AC types are supported, such as PPP, HDLC, Ethernet, VLAN, FR, and ATM.
VPLS: Multiple AC types are supported, such as PPP, HDLC, Ethernet, VLAN, FR, and ATM.

Packet processing flow
Both cases use the network CE1--------PE1--------P-------PE2--------CE2, with data sent from CE1 to CE2 and the VC label exchanged between PE1 and PE2 through LDP.
VPWS: CE1 sends packets to PE1. PE1 adds the VPWS label and then the global route label, and sends the packets to PE2. PE2 removes the labels and sends the packets out of the interface toward CE2.
VPLS: CE1 sends packets to PE1. PE1 learns the MAC address of CE1 and looks up the destination MAC address in the MAC address table of the VPLS instance. If the destination is reached through the PW to PE2, PE1 adds the VPLS label and then the global MPLS label, and sends the packet to PE2. PE2 learns the source address and looks up the table. If the destination is found on an AC, PE2 removes the labels and sends the packet out of the interface toward CE2. If it is not found, PE2 floods the packet within the VPLS instance according to the split horizon principle.

MPLS Traffic Engineering
As networks grow, both network engineering and traffic engineering become necessary.

Network engineering designs the network to meet traffic requirements. The network designer must understand how traffic flows through the network and then purchase suitable links and network devices. Network engineering takes a long time to implement, because new links and devices must be purchased and installed.

Traffic engineering arranges traffic so that it is carried properly over the existing network. Despite the designers' efforts, the actual traffic in a network differs from the predicted values. Traffic sometimes grows faster than expected, but the designers cannot upgrade the network immediately. Rapid traffic growth, emergencies, or network incidents can increase bandwidth demand at certain points, while other links in the network remain underutilized. The core idea of traffic engineering is to move traffic off congested links and onto underutilized links. Traffic engineering is not specific to MPLS; it is a general approach. MPLS-based traffic engineering is one attempt at it, combining link-oriented traffic engineering with IP routing.

At the ingress of the MPLS network (which can be regarded as the source of the data), MPLS traffic engineering controls the path to a specific destination. It creates an LSP and reserves bandwidth along the path, balancing the traffic load and making full use of link bandwidth. MPLS traffic engineering is abbreviated as MPLS-TE.

MPLS-TE guarantees bandwidth for each traffic flow by creating tunnels. After a tunnel is created, data is mapped to an FEC and forwarded through the tunnel along the LSP. At the head end, the tunnel exists as a tunnel interface, and any traffic that uses the tunnel must be sent through this interface. Routes to the tunnel interface can be configured statically or learned through dynamic routing, and routes pointing to the tunnel interface can be distributed by dynamic routing protocols.

Another major feature of MPLS-TE is traffic protection. Local protection, that is, fast reroute (FRR), is usually used; graceful restart can also be used.

Basics of MPLS Traffic Engineering
MPLS-TE can be implemented in either of the following two ways:

Constraint-based Routing Label Distribution Protocol (CR-LDP)

RSVP-TE, an extension of RSVP



RSVP-TE is supported by most vendors, and the MP series switches support RSVP-TE.

RSVP-TE adds the LABEL_REQUEST, EXPLICIT_ROUTE, SESSION_ATTRIBUTE, RECORD_ROUTE, and LABEL objects, which are carried in PATH and RESV messages. They are used respectively to request a label, specify the path chosen by the source, describe the session, record the route, and assign a label. The EXPLICIT_ROUTE object specifies the tunnel path, which can consist of strict hops and loose hops. A path calculated by the source is usually described with strict hops. If the source cannot see all details of the network, or does not want to specify every hop explicitly, loose hops can be used. When a router receives a PATH message and processes the explicit route object, it handles the two hop types differently. For a strict hop, the first IPv4 address in the object must be an address of the local router; otherwise, the object cannot be processed. For a loose hop, the router calculates a strict-hop path toward the loose-hop node and puts the new strict hops into the explicit route object of the PATH message it sends on.
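A function can model the strict and loose hop handling: it takes the received ERO and returns the ERO for the outgoing PATH message. This is a simplified model that assumes each hop is a single IPv4 address. expand_loose is a hypothetical callback standing in for the local routing table. It returns the strict hops toward a loose-hop node, ending with that node.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Hop:
    addr: str
    loose: bool = False


def process_ero(local_addrs: Iterable[str], ero: List[Hop],
                expand_loose: Callable[[str], List[str]]) -> List[Hop]:
    """Return the ERO to carry in the PATH message sent downstream."""
    local = set(local_addrs)
    remaining = list(ero)
    if remaining and remaining[0].addr in local:
        remaining.pop(0)                  # this node is the current hop: consume it
    elif remaining and not remaining[0].loose:
        # A strict hop must name this router; otherwise the ERO is rejected.
        raise ValueError(f"strict hop {remaining[0].addr} is not a local address")
    if remaining and remaining[0].loose:
        # Replace the loose hop with strict hops computed locally toward it.
        target = remaining[0].addr
        remaining = [Hop(a) for a in expand_loose(target)] + remaining[1:]
    return remaining


ero = [Hop("10.0.0.2"), Hop("10.0.0.9", loose=True), Hop("10.0.0.12")]
print(process_ero({"10.0.0.2"}, ero, lambda t: ["10.0.0.3", t]))
```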

Advertising MPLS-TE Network Topology Information
RSVP cannot see the topology of the entire network. Therefore, MPLS-TE relies on a link-state routing protocol (OSPF or IS-IS) to advertise network topology information, which is used to calculate the MPLS-TE tunnel path.

Based on the network topology known to the link-state routing protocol (OSPF or IS-IS) and the advertised MPLS-TE network topology information, the shortest path for the required MPLS-TE tunnel is calculated.

Advertising MPLS-TE Network Topology Information on OSPF
The MPLS-TE network topology information advertised by OSPF is of two types: switch address information and switch link information. The switch address information is the MPLS-TE router ID of the switch, which identifies the switch node in the MPLS-TE network topology. The relevant information is as follows:

Router ID: the router ID of the switch (one of its interface IP addresses), used to identify the switch node in the MPLS-TE network topology. Command: switch(config-ospf)# mpls traffic-eng router-id address



Link information is the MPLS-TE information advertised for each individual link. The corresponding commands are configured in interface mode. It includes the following:


Link type: the link type, 1 for point-to-point or 2 for multi-access (for example, Ethernet). Command: N/A

Link ID: on a point-to-point link, the OSPF router ID of the neighbor; on a multi-access link, the interface address of the designated router (DR). Command: N/A

IPv4 interface address: the IP address of the advertising switch's interface on the link. Command: N/A

Neighbor address: on a point-to-point link, the interface address of the neighbor at the other end; on a multi-access link, the interface address of the DR. Command: N/A

TE metric: the link cost used in tunnel path calculation. Command: mpls traffic-eng admin-weight

Maximum physical link bandwidth: the maximum physical bandwidth of the link interface. Command: bandwidth

Maximum reserved bandwidth: the maximum bandwidth that can be reserved on the link. Command: ip rsvp bandwidth

Unreserved bandwidth for each priority: the reservable bandwidth on the link not yet assigned to tunnels, for each priority. Command: N/A

Attribute flag: user-defined link attributes, used to include or exclude the link in path calculation. Command: mpls traffic-eng attribute-flags

Advertising MPLS-TE Network Topology Information on IS-IS
The MPLS-TE network topology information advertised by IS-IS consists of global MPLS-TE topology information and per-link (attached) MPLS-TE topology information.

The global MPLS-TE network topology information advertised by IS-IS is as follows:


Router ID: the router ID of the switch (one of its interface IP addresses), used to identify the switch node in the MPLS-TE network topology. Command: switch(config-isis-af)# mpls traffic-eng router-id address

The per-link (attached) MPLS-TE network topology information advertised by IS-IS is configured in interface mode and includes the following:

IPv4 interface address: the local interface address of the link, used to generate the IP address that identifies the LSP path. Command: N/A

Neighbor address: the peer IP address on a point-to-point link; a non-point-to-point link does not advertise the neighbor address. Command: N/A

TE metric: the link cost used in tunnel path calculation. Command: mpls traffic-eng admin-weight

Maximum physical link bandwidth: the maximum physical bandwidth of the link interface. Command: bandwidth

Maximum reserved bandwidth: the maximum bandwidth that can be reserved on the link. Command: ip rsvp bandwidth

Unreserved bandwidth for each priority: the reservable bandwidth on the link not yet assigned to tunnels, for each priority. Command: N/A

Attribute flag: user-defined link attributes, used to include or exclude the link in path calculation. Command: mpls traffic-eng attribute-flags

MPLS-TE Tunnel Path Calculation (CSPF)
MPLS-TE tunnel path calculation uses CSPF (Constrained Shortest Path First). It calculates the shortest path to the tunnel end that satisfies the predefined constraints. The calculation is based on the network topology state described by the link-state routing protocol (OSPF or IS-IS).

The path calculation constraints include the bandwidth requirement, setup priority, included nodes, excluded nodes, included links, and excluded links.

After the MPLS-TE tunnel path is calculated, the path information is passed to RSVP, which creates the tunnel according to it.
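CSPF is commonly described as two steps: first prune the links that violate the constraints, then run a shortest-path search over the TE metric. The sketch below follows that outline for the bandwidth and exclusion constraints only. It omits setup priority handling, included nodes and links, and tie-breaking. The TeLink fields are a hypothetical model of the advertised link information listed above.

```python
import heapq
from dataclasses import dataclass
from typing import AbstractSet, Dict, List, Optional


@dataclass(frozen=True)
class TeLink:
    src: str
    dst: str
    te_metric: int          # cf. mpls traffic-eng admin-weight
    unreserved_bw: float    # unreserved bandwidth at the tunnel's priority
    attributes: int = 0     # cf. mpls traffic-eng attribute-flags bitmask


def cspf(links: List[TeLink], head: str, tail: str, bandwidth: float,
         exclude_nodes: AbstractSet[str] = frozenset(),
         exclude_any: int = 0) -> Optional[List[str]]:
    """Lowest-TE-metric path head -> tail meeting the constraints, or None."""
    # Step 1: prune links that fail the bandwidth or exclusion constraints.
    graph: Dict[str, List[TeLink]] = {}
    for link in links:
        if (link.unreserved_bw >= bandwidth
                and link.src not in exclude_nodes
                and link.dst not in exclude_nodes
                and not (link.attributes & exclude_any)):
            graph.setdefault(link.src, []).append(link)
    # Step 2: Dijkstra on the pruned graph using the TE metric.
    dist: Dict[str, int] = {head: 0}
    prev: Dict[str, str] = {}
    heap = [(0, head)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == tail:
            path = [tail]
            while path[-1] != head:
                path.append(prev[path[-1]])
            return path[::-1]
        if d > dist.get(node, float("inf")):
            continue                      # stale heap entry
        for link in graph.get(node, []):
            nd = d + link.te_metric
            if nd < dist.get(link.dst, float("inf")):
                dist[link.dst] = nd
                prev[link.dst] = node
                heapq.heappush(heap, (nd, link.dst))
    return None                           # no path satisfies the constraints


links = [TeLink("S1", "S2", 10, 100), TeLink("S2", "S4", 10, 20),
         TeLink("S1", "S3", 15, 100), TeLink("S3", "S4", 15, 100)]
print(cspf(links, "S1", "S4", bandwidth=10))  # ['S1', 'S2', 'S4']
print(cspf(links, "S1", "S4", bandwidth=50))  # ['S1', 'S3', 'S4']
```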

Creating MPLS-TE Tunnel Path
To create an MPLS-TE tunnel, the source (the switch at the start of the tunnel, that is, the ingress) calculates a path to the tunnel end (the switch at the end of the tunnel, that is, the egress) as described above. It then builds an explicit route object (ERO) from the path and puts it into the Path message. If the source cannot calculate a qualifying path to the tunnel end, it does not start creating the tunnel. Otherwise, the source node builds the Path message and sends it to the next-hop switch in the ERO.



Figure 25-18 Create a TE tunnel

As shown in Figure 25-18, a tunnel is created from Switch1 to Switch4. The path to Switch4 calculated by Switch1 is Switch1 → Switch2 → Switch3 → Switch4. Switch1 sends the Path message to the downstream Switch2. When Switch2 receives the Path message, it checks whether it is the node indicated by the ERO in the message. If so, it accepts the message and checks whether the required bandwidth can be reserved on the next link along the ERO path (Switch2 → Switch3). If the bandwidth is available, it continues sending the Path message downstream. In this way, the Path message travels hop by hop along the ERO path to the switch at the end of the tunnel. When Switch4 receives the Path message, it sends a RESV message to the upstream node Switch3. The RESV message carries the label that Switch4 assigns to Switch3. When Switch3 receives the RESV message from downstream, it in turn assigns a label to the upstream Switch2 and reserves the required bandwidth. In this way, RESV messages travel upstream hop by hop, distributing tunnel labels, until they reach the source. When the source receives the RESV message, it reserves bandwidth, and the tunnel is established.
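The Path/RESV exchange in the example can be simulated in a few lines. The sketch assumes the ERO path has already been computed. It checks bandwidth on every link during the Path phase, then walks back from the egress, assigning labels and reserving bandwidth. The function name and the sequential label values are illustrative, since a real node allocates labels from its own label space.

```python
from typing import Dict, List, Tuple


def signal_tunnel(path: List[str], link_bw: Dict[Tuple[str, str], float],
                  bandwidth: float, first_label: int = 1000) -> Dict[str, int]:
    """Simulate Path (downstream) then RESV (upstream) along an ERO path.

    Returns the label each node pushes toward its downstream neighbour.
    link_bw holds available bandwidth per directed link and is reduced
    as reservations are made.
    """
    hops = list(zip(path, path[1:]))
    # Path phase: each hop checks the next link before forwarding downstream.
    for up, down in hops:
        if link_bw[(up, down)] < bandwidth:
            raise RuntimeError(f"insufficient bandwidth on {up}->{down}")
    # RESV phase: from the egress back to the ingress, each downstream node
    # assigns a label to its upstream neighbour and bandwidth is reserved.
    out_label: Dict[str, int] = {}
    label = first_label
    for up, down in reversed(hops):
        out_label[up] = label
        link_bw[(up, down)] -= bandwidth
        label += 1
    return out_label


path = ["Switch1", "Switch2", "Switch3", "Switch4"]
bw = {("Switch1", "Switch2"): 100, ("Switch2", "Switch3"): 100,
      ("Switch3", "Switch4"): 100}
print(signal_tunnel(path, bw, bandwidth=30))
# {'Switch3': 1000, 'Switch2': 1001, 'Switch1': 1002}
```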

After the tunnel is up, RSVP-TE maintains it in soft-state mode. Soft state means that the relevant state is maintained by periodically refreshing Path and RESV messages. Each node sends Path messages downstream and RESV messages upstream, and waits for Path messages from upstream and RESV messages from downstream. If the wait times out, the tunnel is considered no longer needed, and the related resource reservation is deleted. The refreshes are also independent. When a node receives a Path message from upstream, it does not immediately send a Path message downstream. Likewise, when it receives a RESV message from downstream, it does not immediately send a RESV message upstream. The exception is the first Path and RESV messages received. Each node refreshes on its own cycle, and the cycles differ. The refresh interval varies randomly by up to 50% above or below the configured value to avoid global synchronization. For example, if the refresh cycle is 30 seconds, successive refresh intervals might be 30 s, 45 s, 15 s, and 30 s.
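The randomized refresh interval can be sketched as a uniform draw between 50% and 150% of the configured period. The function name is hypothetical, and the seeded generator is used only to make the example repeatable.

```python
import random


def next_refresh_interval(refresh_period: float, rng: random.Random) -> float:
    """Pick the next refresh interval uniformly in [0.5 * R, 1.5 * R],
    i.e. the configured period R varied by up to 50% in either direction."""
    return rng.uniform(0.5 * refresh_period, 1.5 * refresh_period)


rng = random.Random(7)
intervals = [next_refresh_interval(30.0, rng) for _ in range(4)]
print([round(i, 1) for i in intervals])  # each value lies within 15.0 to 45.0
```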



Forwarding Traffic on MPLS-TE Tunnel
To forward traffic over an MPLS-TE tunnel, you can configure static routes, automatic routes, or forwarding adjacency.

Static Route
With a static route, specified traffic is forwarded through the MPLS-TE tunnel.

Automatic Route
With automatic routing, the link-state routing protocol (OSPF or IS-IS) performs SPF route calculation as usual. It then replaces the next hop of routes to the MPLS-TE tunnel end, and to destinations downstream of the tunnel end, with the MPLS-TE tunnel. The basic principle of automatic routing is that it does not affect the SPF calculation itself; it only replaces the next hop of the relevant routes.

Forwarding Adjacency
With forwarding adjacency, the link-state routing protocol (OSPF or IS-IS) advertises the MPLS-TE tunnel as an adjacent link, which changes the network topology used for SPF calculation. Routes whose next hop is the MPLS-TE tunnel are then generated. Forwarding adjacency has a broader effect than automatic routing, because it changes the network topology used for route calculation.

Equal-Cost Load Balancing of MPLS-TE
For routes to the MPLS-TE tunnel end, load cannot be balanced between an IGP path and an MPLS-TE tunnel, but it can be balanced among MPLS-TE tunnels. If you cannot fully control all traffic destined for the tunnel end, traffic engineering is hard to implement in the network.

For routes to destinations downstream of the MPLS-TE tunnel end, load can be balanced between an IGP path and an MPLS-TE tunnel. The network downstream of the tunnel end is outside the scope of traffic engineering, so there is no need to consider whether those nodes support traffic engineering.



Next hops generated by automatic route cannot share load with next hops generated by forwarding adjacency; automatic route has absolute priority. This is because forwarding adjacency works by changing the network topology used in the SPF route calculation, whereas automatic route replaces the next hops of the relevant routes after the SPF calculation has finished, so the next hops it produces always take precedence.

Whether the paths are MPLS-TE tunnels or IGP paths, load is balanced among them only if their metric values are the same.
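The two load-balancing rules above (automatic route next hops exclude forwarding adjacency next hops, and only equal-metric paths share load) can be expressed as a small selection function. This is an illustrative sketch under the assumption that the candidate next hops and their metrics are already known; the `Candidate` type is hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    next_hop: str  # outgoing interface or tunnel name
    metric: int
    origin: str    # "igp", "fa" (forwarding adjacency) or "autoroute"

def select_next_hops(candidates):
    """Return the next hops that share load toward one destination."""
    if any(c.origin == "autoroute" for c in candidates):
        # Automatic route next hops have absolute priority over forwarding adjacency ones.
        candidates = [c for c in candidates if c.origin != "fa"]
    best = min(c.metric for c in candidates)
    # Only paths with identical metrics balance load.
    return sorted(c.next_hop for c in candidates if c.metric == best)

print(select_next_hops([Candidate("Tunnel1", 20, "autoroute"),
                        Candidate("Tunnel2", 20, "autoroute"),
                        Candidate("Tunnel3", 20, "fa")]))
print(select_next_hops([Candidate("Tunnel3", 20, "fa"),
                        Candidate("vlan10", 20, "igp"),
                        Candidate("vlan20", 30, "igp")]))
```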

Tunnel Protection In a network, the links and switch nodes carrying tunnel traffic may fail because of internal faults, and the failure of any link or node can cause network outages and data loss. When a failure occurs, the IGP running in a large-scale network needs a long time to converge, and RSVP-TE also needs a long time to update the path used by the tunnel. Data in the tunnel is lost during this period, which can last on the order of minutes. To address this, RSVP-TE provides comprehensive tunnel protection and restoration functions that reduce the data loss or interruption caused by link or node failures.

There are two types of protection: full path protection and fast reroute (FRR) protection. MP switches support FRR.

Full Path Protection On the LSP ingress node, use an explicit route to specify the nodes of a sub-LSP (for example, by excluding some of the LSRs that the protected LSP traverses). The sub-LSP and the protected primary LSP traverse different nodes. When the primary LSP is torn down because of a failure, the head node of the LSP remaps the routes previously mapped to the primary LSP onto the sub-LSP, so the entire LSP is protected. Because configuring a sub-LSP is generally cumbersome, this method is rarely used in practice.
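As an illustration of how an explicit route for the sub-LSP might be derived, the sketch below runs a shortest-path search that excludes the transit LSRs of the primary LSP. It is a hypothetical helper on an assumed topology; on the switch, the explicit path is configured as described above rather than computed by this code.

```python
import heapq

def cspf(graph, src, dst, exclude=frozenset()):
    """Shortest path from src to dst that avoids every node in `exclude`."""
    dist, prev, heap = {src: 0}, {}, [(0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            break  # dst is final once popped
        if d > dist[node]:
            continue
        for nbr, cost in graph[node].items():
            if nbr in exclude:
                continue
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr], prev[nbr] = nd, node
                heapq.heappush(heap, (nd, nbr))
    if dst not in dist:
        return None  # no disjoint path exists
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]

graph = {
    "S1": {"S2": 10, "S5": 15},
    "S2": {"S1": 10, "S3": 10},
    "S3": {"S2": 10, "S4": 10},
    "S4": {"S3": 10, "S6": 15},
    "S5": {"S1": 15, "S6": 15},
    "S6": {"S5": 15, "S4": 15},
}
primary = cspf(graph, "S1", "S4")
# Exclude the primary LSP's transit LSRs (not the ingress or egress).
backup = cspf(graph, "S1", "S4", exclude=frozenset(primary[1:-1]))
print(primary, backup)
```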

Fast Reroute (FRR) Fast Reroute (FRR) is a local protection technology for MPLS LSPs, used to protect TE tunnels. Local (partial) protection means that protection is implemented for an individual node or link; if every node and every link is protected, the entire LSP is protected. FRR is an important feature of MPLS traffic engineering, and in most deployments today MPLS traffic engineering is used mainly for its FRR function.



The essence of FRR is to take precautions in advance: the backup LSP is created before any fault occurs. When a fault on the protected node or link is detected, traffic is switched from the protected LSP to the backup LSP to avoid traffic loss.

Partial protection includes:

1. Link protection: protects the link between two switches.

2. Node protection: protects a switch node. Node protection also protects the link leading to that node.

Figure 25-19 Node protection

As shown in the preceding figure, the backup LSP protects node R2 and, at the same time, the link R1 -> R2.

RFC 4090 defines two methods of implementing FRR:

1. One-to-One

One-to-One mode means that one backup LSP protects one protected tunnel. See the following figure. The red LSP is the backup LSP, which is called a Detour LSP in this mode; it protects the primary LSP (TUNNEL). The Detour LSP starts from switch S1, which is called the Point of Local Repair (PLR) and is the ingress device of the detour. The Detour LSP bypasses S2, the downstream node of the PLR (S1), and its destination is the egress of the protected tunnel. It rejoins the primary LSP at S3 and is merged into it; this action is called "Merge", so S3 is called the Merge Point (MP). Strictly speaking, merging is not mandatory, but without it multiple LSPs would have to be signaled and maintained downstream of the MP, so merging is performed.



Figure 25-20 One-to-One mode

The Detour LSP in One-to-One mode depends on the protected LSP: if the protected LSP is deleted, the Detour LSP associated with that tunnel is also deleted.

In One-to-One mode, the ingress node of the primary LSP requests FRR, and each node on the primary LSP (including the ingress) tries to create a Detour LSP starting from itself. As a result, this protection mode scales poorly. MP switches do not support One-to-One mode.

2. Facility Mode:

Facility is another mode of implementing FRR, as shown in Figures 25-21 and 25-22. Backup is provided through a bypass tunnel, which is an independent tunnel: it exists and is maintained independently of the protected tunnel and is, in effect, an ordinary tunnel. Its Path and RESV messages are maintained independently, unlike the Detour LSP in One-to-One mode, which depends on the protected LSP.

Figure 25-21 Facility mode of node protection

In facility mode with node protection, the end node of the bypass tunnel is the next-next hop (NNHOP) of the PLR, such as S3 in Figure 25-21. The bypass tunnel bypasses the downstream node (S2) of the PLR.



Figure 25-22 Facility mode of link protection

In facility mode with link protection, the end node of the bypass tunnel is the next hop (NHOP) of the PLR. The bypass tunnel bypasses the S1->S2 link between the PLR and its downstream node (S2).

Because the bypass tunnel is independent, a single bypass tunnel can protect multiple tunnels, implementing 1:N protection. This mode scales better, and for this reason facility mode is also called "Many-to-One" mode.

MP series switches support facility mode protection, including both link protection and node protection.
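The sketch below shows how a PLR could compute a facility-mode bypass tunnel: for node protection it ends at the NNHOP and avoids the next hop, and for link protection it ends at the NHOP and avoids only the protected link. It also shows two protected tunnels mapping onto the same bypass tunnel (1:N). The topology and helper names are assumptions for illustration.

```python
import heapq

def shortest_path(graph, src, dst, banned_nodes=frozenset(), banned_links=()):
    """Dijkstra from src to dst that skips banned nodes and banned (undirected) links."""
    banned = {frozenset(link) for link in banned_links}
    dist, prev, heap = {src: 0}, {}, [(0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue
        for nbr, cost in graph[node].items():
            if nbr in banned_nodes or frozenset((node, nbr)) in banned:
                continue
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr], prev[nbr] = nd, node
                heapq.heappush(heap, (nd, nbr))
    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]

def bypass_for(graph, lsp_path, plr, mode):
    """Bypass tunnel path for a PLR on a protected LSP ("node" or "link" protection)."""
    i = lsp_path.index(plr)
    if mode == "node":
        # Node protection: end at the next-next hop (NNHOP) and avoid the next hop.
        return shortest_path(graph, plr, lsp_path[i + 2], banned_nodes={lsp_path[i + 1]})
    # Link protection: end at the next hop (NHOP) and avoid only the PLR->NHOP link.
    return shortest_path(graph, plr, lsp_path[i + 1], banned_links=[(plr, lsp_path[i + 1])])

graph = {
    "S1": {"S2": 10, "S5": 10},
    "S2": {"S1": 10, "S3": 10, "S5": 10},
    "S3": {"S2": 10, "S4": 10, "S5": 10},
    "S4": {"S3": 10},
    "S5": {"S1": 10, "S2": 10, "S3": 10},
}
protected = {"T1": ["S1", "S2", "S3", "S4"], "T2": ["S1", "S2", "S3"]}
shared = {}
for name, path in protected.items():
    bypass = tuple(bypass_for(graph, path, "S1", "node"))
    shared.setdefault(bypass, []).append(name)  # 1:N, one bypass for many tunnels
print(shared)
```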

Graceful Restart Graceful Restart (GR) means that forwarding is not interrupted when a protocol restarts.

The core of the GR mechanism is as follows: when the protocol on a device restarts, the device notifies its neighboring devices so that, for a certain period, they keep the neighbor relationship and routes stable. After the protocol restarts, the neighboring devices help it resynchronize its routing information so that its state is restored as quickly as possible. Throughout the restart, network routing and forwarding remain stable, the packet forwarding path does not change, and the system as a whole keeps forwarding IP packets without interruption. GR involves two roles:

GR Restarter: the router whose protocol restarts, either at the administrator's request or because of a fault. It must be GR-capable.

GR helper: a neighbor of the GR restarter that helps it keep its routing relationships stable. It must also be GR-capable.

After the TE tunnel is established, enable RSVP GR Hello on the interfaces connecting adjacent devices to monitor the RSVP protocol state, as shown in the following figure. If the hellos between S2 and its neighbors S1 and S3 time out, the RSVP protocol on S2 is considered to be restarting, and S1 and S3 retain the relevant RSVP states and information. When S2 restarts, S1 sends it a Path message and S3 sends it a Recovery Path message to help S2 restore its state.

Figure 25-23 Graceful restart
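The helper-side behavior described above can be sketched as a small state holder. The class and method names are hypothetical, and the restart-timer handling (`on_restart_timeout`) is an assumption based on the restart time that RSVP GR neighbors advertise (RFC 3473); it is not described in this manual.

```python
from dataclasses import dataclass, field

@dataclass
class LspState:
    name: str
    upstream: str    # previous hop of the LSP at this node
    downstream: str  # next hop of the LSP at this node

@dataclass
class GrHelper:
    node: str
    lsps: list
    restarting: set = field(default_factory=set)

    def on_hello_timeout(self, neighbor):
        # Keep the RSVP state of LSPs through the silent neighbor: assume it is restarting.
        self.restarting.add(neighbor)

    def on_neighbor_recovered(self, neighbor):
        """Return the messages sent to help `neighbor` rebuild its RSVP state."""
        self.restarting.discard(neighbor)
        messages = []
        for lsp in self.lsps:
            if lsp.downstream == neighbor:  # this node is upstream (like S1)
                messages.append(("Path", lsp.name))
            elif lsp.upstream == neighbor:  # this node is downstream (like S3)
                messages.append(("RecoveryPath", lsp.name))
        return messages

    def on_restart_timeout(self, neighbor):
        # Assumed behavior: the neighbor never came back, so release the kept state.
        self.restarting.discard(neighbor)
        self.lsps = [lsp for lsp in self.lsps
                     if neighbor not in (lsp.upstream, lsp.downstream)]

s1 = GrHelper("S1", [LspState("Tunnel1", upstream="", downstream="S2")])
s3 = GrHelper("S3", [LspState("Tunnel1", upstream="S2", downstream="S4")])
for helper in (s1, s3):
    helper.on_hello_timeout("S2")
print(s1.on_neighbor_recovered("S2"))
print(s3.on_neighbor_recovered("S2"))
```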

MPLS OAM

Introduction to MPLS OAM Based on the actual needs of carrier networks, network management work can be classified into three types: operation, administration, and maintenance (OAM). Operation covers forecasting, planning, and configuration of the routine network and services; maintenance covers testing and fault management.

ITU-T defines the OAM functions as follows:

1) Monitor performance and generate maintenance information, from which the stability of the network is evaluated;

2) Check the network for faults periodically and generate the corresponding maintenance and alarm information;

3) Use scheduling or switching to other entities to bypass failed entities and keep the network running normally;

4) Pass fault information to the management entity.

OAM is very important in public networks because it simplifies network operation, allows network performance to be checked, and reduces operating costs. It is particularly important in networks that provide QoS. OAM functions are defined for traditional SDH/SONET and ATM. MPLS, as a key bearer technology for scalable next-generation networks, provides multi-service capability with QoS, so OAM capability is essential for MPLS. The OAM mechanism should help prevent network faults and quickly diagnose and locate them, ultimately improving network availability and QoS.

MPLS OAM Technology

LSP Ping/LSP Traceroute 1. Background

In an MPLS network, when a label switched path (LSP) fails to forward user data, the control plane needs a way to detect the MPLS LSP data plane fault. However, the detection methods of traditional IP networks, IP Ping and Traceroute, cannot detect connectivity problems in the MPLS network. A successful ping only shows that IP forwarding works; it does not show that the MPLS LSP is connected. When the IP route is normal but the LSP is broken, traditional ping packets can still reach the destination through IP forwarding, so the fault goes undetected.

Traditional Traceroute cannot locate MPLS LSP faults hop by hop or return LSP-related information. Successful IP forwarding does not mean that the LSP is connected, and standard ICMP packets cannot return LSP information such as the label stack and downstream mapping.

The MPLS network therefore needs its own fault detection method. MPLS LSP Ping/Traceroute is a simple but effective mechanism for detecting MPLS LSP faults.

2. Basic Principle

Like traditional IP Ping/Traceroute, MPLS LSP Ping/Traceroute is based on Echo Request and Echo Reply messages. However, LSP Ping/Traceroute uses IPv4 UDP instead of ICMP, with UDP port 3503. MPLS LSP Ping/Traceroute has two basic functions: 1) checking the connectivity of the forwarding plane; 2) checking the consistency between the control plane and the forwarding plane.

It sends packets belonging to a specific FEC to verify the integrity of the LSP for that FEC, from the ingress LSR to the egress LSR. The FEC being tested is carried in the MPLS echo request message.

In an LSP ping operation, the echo request is encapsulated in a UDP packet that includes a sequence number and an NTP-format timestamp. The destination port is the well-known port 3503. An LSP Ping request is forwarded using the same forwarding policy as other packets of the FEC. When the ping command is used to test connectivity, the request should reach the egress of the LSP, and the receiving LSR checks the packet to verify that it is actually the egress for that FEC.
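For illustration, the sketch below packs the fixed 32-byte header of an MPLS echo request using the field layout of RFC 4379 (version, flags, message type, reply mode, return code and subcode, sender's handle, sequence number, and the sent/received timestamps). The Target FEC Stack TLV that identifies the FEC, and the UDP/IP encapsulation, are omitted; treat this as a sketch rather than a complete implementation.

```python
import struct

LSP_PING_UDP_PORT = 3503            # well-known destination port for MPLS echo
MSG_ECHO_REQUEST = 1
MSG_ECHO_REPLY = 2
REPLY_MODE_UDP = 2                  # "reply via an IPv4/IPv6 UDP packet"
NTP_UNIX_OFFSET = 2_208_988_800     # seconds between 1900-01-01 and 1970-01-01
HEADER = struct.Struct("!HHBBBBIIIIII")  # 32 bytes, TLVs not included

def build_echo_request(sender_handle: int, sequence: int, now: float) -> bytes:
    """Fixed MPLS echo request header; `now` is a Unix timestamp in seconds."""
    seconds = int(now) + NTP_UNIX_OFFSET           # NTP-format seconds
    microseconds = int((now - int(now)) * 1_000_000)
    return HEADER.pack(
        1,                       # version number
        0,                       # global flags
        MSG_ECHO_REQUEST,
        REPLY_MODE_UDP,
        0, 0,                    # return code and subcode are zero in a request
        sender_handle,
        sequence,                # lets the sender match replies to requests
        seconds, microseconds,   # TimeStamp Sent
        0, 0,                    # TimeStamp Received, filled in by the replier
    )

packet = build_echo_request(sender_handle=0x1234, sequence=1, now=1_600_000_000.25)
print(len(packet), "bytes, to UDP port", LSP_PING_UDP_PORT)
```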

LSP Traceroute is used to locate faults. The LSR that initiates the test sends echo requests toward the destination LSR, with the TTL starting at 1 and increasing by 1 for each subsequent request. Each LSR at which the TTL expires checks the packet and returns information about its control plane and data plane.
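The hop-by-hop behavior of LSP Traceroute can be simulated as follows. This hypothetical model only tracks which LSR answers for each TTL; the control-plane and data-plane details an LSR would return (such as downstream mapping) are not modeled.

```python
def lsp_traceroute(lsp, broken_after=None, max_ttl=16):
    """Simulate LSP Traceroute along `lsp` (ingress first, egress last).

    broken_after: optional LSR whose outgoing LSP segment drops packets.
    Returns (ttl, responder) pairs; responder is None if no reply comes back.
    """
    results = []
    for ttl in range(1, max_ttl + 1):
        responder = None
        for hop in range(1, len(lsp)):
            if lsp[hop - 1] == broken_after:
                break                         # lost on the faulty segment
            if hop == ttl or hop == len(lsp) - 1:
                responder = lsp[hop]          # TTL expired here, or egress reached
                break
        results.append((ttl, responder))
        if responder is None or responder == lsp[-1]:
            break
    return results

healthy = lsp_traceroute(["PE1", "P1", "P2", "PE2"])
faulty = lsp_traceroute(["PE1", "P1", "P2", "PE2"], broken_after="P2")
print(healthy)
print(faulty)
```

The last LSR that answers before the replies stop (P2 in the faulty case) brackets the location of the fault.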

MPLS BFD 1. Introduction to the Protocol

Bidirectional Forwarding Detection (BFD) is a fast detection solution that provides low-overhead, short-duration failure detection. In many respects, BFD is similar to the neighbor detection of well-known routing protocols such as OSPF. BFD creates a session between a pair of systems, and both ends of the session check the connectivity of the path by sending packets periodically. If a system does not receive detection packets from the other end within a certain time, the bidirectional path to that adjacent system is considered faulty.

The BFD protocol defines two modes of bidirectional detection: Asynchronous mode and Demand (query) mode. MPLS BFD uses asynchronous mode to implement fast bidirectional detection of the LSP.

In asynchronous mode, the two systems send BFD control packets to each other periodically. If a system does not receive BFD control packets from the other end within a certain time, it declares the session down and notifies the control plane or the forwarding plane.

2. Creating a Session

When BFD is used to detect MPLS LSP faults, a BFD session is created between the ingress LSR and the egress LSR, and BFD control packets travel along the same data path as the LSP.

In asynchronous mode, creation of a BFD session is triggered by the active party.

A. The ingress LSR (the active party of the session) sends an echo request packet carrying its local session discriminator.

B. The egress LSR (the passive party of the session) returns an echo reply packet carrying its local session discriminator.

C. The ingress LSR sends a BFD control packet to the egress LSR, with the Your Discriminator field set to the session discriminator of the egress LSR. The session is in the Down state.

D. The egress LSR receives the BFD control packet from the ingress LSR and sends a BFD control packet to the ingress LSR, still reporting the Down state.

E. After the ingress LSR receives the BFD control packet from the egress LSR, its state changes from Down to Init. It determines the local transmit interval and detection time from the timer parameters carried in the packet, starts its BFD transmit timer, and sends BFD control packets at the negotiated interval.

F. The egress LSR receives the BFD control packets from the ingress LSR, and its state changes from Down to Up.

G. After the ingress LSR receives the BFD control packets from the egress LSR, its state changes from Init to Up.

H. The BFD session is now established. From then on, the egress LSR and the ingress LSR send BFD control packets periodically at the negotiated interval.
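The timer negotiation mentioned in step E can be sketched with the asynchronous-mode rules of RFC 5880: each side transmits no faster than the peer can receive, and the detection time is the peer's Detect Mult times the agreed receive interval. Values are in milliseconds here for readability (RFC 5880 carries microseconds), and transmit jitter is ignored.

```python
def negotiate_timers(local_desired_min_tx, local_required_min_rx,
                     remote_desired_min_tx, remote_required_min_rx,
                     remote_detect_mult):
    """Return (transmit interval, detection time) in milliseconds, RFC 5880 async mode."""
    # Never send faster than the peer says it can receive.
    tx_interval = max(local_desired_min_tx, remote_required_min_rx)
    # Declare the session down after Detect Mult agreed intervals pass with no packet.
    detection_time = remote_detect_mult * max(local_required_min_rx, remote_desired_min_tx)
    return tx_interval, detection_time

print(negotiate_timers(100, 100, 300, 50, 3))  # (100, 900)
```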

3. Session State Machine

Session creation is a three-way handshake. When it completes, the session is Up and the corresponding parameters have been negotiated. Subsequent state changes depend on the fault detection results, and the corresponding processing is performed. The state transitions are as follows:

Figure 25-24 BFD state migration
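The state transitions can be expressed as a function that follows the receive rules of RFC 5880 (section 6.8.6). The short run at the end replays steps E to G of the session creation procedure; the function names are illustrative.

```python
DOWN, INIT, UP, ADMIN_DOWN = "Down", "Init", "Up", "AdminDown"

def next_state(local, received):
    """New local state after receiving a control packet reporting `received`."""
    if local == ADMIN_DOWN:
        return ADMIN_DOWN                # received packets are ignored
    if received == ADMIN_DOWN:
        return DOWN                      # peer is administratively down
    if local == DOWN:
        return {DOWN: INIT, INIT: UP}.get(received, DOWN)
    if local == INIT:
        return UP if received in (INIT, UP) else INIT
    return DOWN if received == DOWN else UP  # local == UP

def on_detection_timeout(local):
    """No control packet arrived within the detection time."""
    return DOWN if local in (INIT, UP) else local

ingress = egress = DOWN
ingress = next_state(ingress, egress)  # E: egress reports Down, ingress Down -> Init
egress = next_state(egress, ingress)   # F: ingress reports Init, egress Down -> Up
ingress = next_state(ingress, egress)  # G: egress reports Up, ingress Init -> Up
print(ingress, egress)                 # Up Up
```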



IPv6 Network Protocol Technology

Overview With the rapid growth in the scale and services of IP networks, the number of IP network users keeps increasing and more and more problems have appeared, such as insufficient address space and security issues. To solve these problems, especially the shortage of address space, the IETF began developing a next-generation Internet protocol based on IPv4 in 1992, called IPng or IPv6.

The main problem IPv6 solves is the shortage of address space. Compared with IPv4, IPv6 also has advantages in other areas such as security, quality of service, and mobility. One notable feature of IPv6 is "plug-and-play": a node directly connected to the network can be used without any manual configuration, which makes network management and control simpler. A node only needs to know its own link-layer address and the subnet prefix of the local network to obtain a unique IPv6 address through IPv6 stateless or stateful autoconfiguration and become part of the network. IPv6 also provides better support for node mobility. These functions are implemented by the Neighbor Discovery protocol, which handles all interaction between the hosts and gateway devices in a subnet.
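As an example of how a node can form an address from only its link-layer address and the subnet prefix, the sketch below uses the modified EUI-64 method for stateless autoconfiguration. This is one possible method, assumed here for illustration; many hosts use randomized or stable-privacy interface identifiers instead.

```python
import ipaddress

def eui64_interface_id(mac: str) -> bytes:
    """Modified EUI-64 interface identifier from a 48-bit MAC address."""
    octets = bytes(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address")
    # Insert FF:FE in the middle and invert the universal/local bit (0x02).
    return bytes([octets[0] ^ 0x02]) + octets[1:3] + b"\xff\xfe" + octets[3:]

def slaac_address(prefix: str, mac: str) -> ipaddress.IPv6Address:
    net = ipaddress.IPv6Network(prefix)
    if net.prefixlen != 64:
        raise ValueError("EUI-64 autoconfiguration expects a /64 prefix")
    iid = int.from_bytes(eui64_interface_id(mac), "big")
    return ipaddress.IPv6Address(int(net.network_address) | iid)

print(slaac_address("2001:db8:1:2::/64", "00:1a:2b:3c:4d:5e"))
# 2001:db8:1:2:21a:2bff:fe3c:4d5e
```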

This chapter describes the basic principles of the IPv6 protocol.

Main contents:

IPv6 packet format

ICMPv6 protocol

IPv6 address discovery protocol

IPv6 address



IPv6 searching address model

IPv6 extension header

IPv6 Packet Format The IPv6 packet header is aligned on 64-bit boundaries and has a fixed length of 40 bytes. The IPv6 protocol defines the following fields in its header:

Version: The length is 4 bits. For IPv6, the field must be 6;

Type (Traffic Class): The length is 8 bits, used to mark the packet for differentiated services. RFC 1883 originally defined a 4-bit field named "Priority"; the field was later extended to 8 bits and renamed, and RFC 2460 calls it "Traffic Class". Its semantics are defined outside the IPv6 base specification (for example, the Differentiated Services field of RFC 2474). The default value of the field is all zeros.

Flow label: The length is 20 bits, used to identify packets that belong to the same traffic flow. One node can be the source of multiple flows; the flow label together with the source node address uniquely identifies a flow. RFC 1883 originally defined this field as 24 bits, but when the Type field was extended to 8 bits, the flow label field was shortened to 20 bits to compensate.

Payload length: The length is 16 bits, giving the length in bytes of the packet payload, that is, everything that follows the 40-byte IPv6 header. Any IPv6 extension headers are counted as part of the payload length.

Next header: The length is 8 bits. The field identifies the type of header that immediately follows the IPv6 header. Like the IPv4 Protocol field, it can indicate an upper-layer protocol such as TCP or UDP, but it can also indicate the presence of an IPv6 extension header.

Hop threshold (Hop Limit): The length is 8 bits. Each node that forwards the packet decrements the field by 1, and the packet is dropped if the field reaches 0. IPv4 has a Time to Live field with a similar function, but unlike IPv4, IPv6 does not define an upper limit on packet lifetime; timing out outdated packets is left to higher-layer protocols.

Source address: The length is 128 bits, indicating the address of the

sender of the IPv6 packet.

Destination address: The length is 128 bits, indicating the address of the intended receiver of the IPv6 packet. It can be a unicast, multicast, or anycast address. If a Routing extension header is used (to specify intermediate nodes that the packet must pass through), the destination address may be the address of an intermediate node rather than the final destination.
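The header layout above can be packed and unpacked with Python's `struct` module, as in this sketch. The field names follow the descriptions above; the function names and sample values are illustrative.

```python
import ipaddress
import struct

HEADER = struct.Struct("!IHBB16s16s")  # 4 + 2 + 1 + 1 + 16 + 16 = 40 bytes

def pack_ipv6_header(traffic_class, flow_label, payload_length, next_header,
                     hop_limit, src, dst):
    if not 0 <= traffic_class <= 0xFF or not 0 <= flow_label <= 0xFFFFF:
        raise ValueError("traffic class is 8 bits, flow label is 20 bits")
    # First 32 bits: version (4) | Type/traffic class (8) | flow label (20)
    first_word = (6 << 28) | (traffic_class << 20) | flow_label
    return HEADER.pack(first_word, payload_length, next_header, hop_limit,
                       ipaddress.IPv6Address(src).packed,
                       ipaddress.IPv6Address(dst).packed)

def unpack_ipv6_header(data):
    first_word, payload_length, next_header, hop_limit, src, dst = HEADER.unpack(data[:HEADER.size])
    return {
        "version": first_word >> 28,
        "traffic_class": (first_word >> 20) & 0xFF,
        "flow_label": first_word & 0xFFFFF,
        "payload_length": payload_length,   # bytes after the 40-byte header
        "next_header": next_header,         # e.g. 17 = UDP
        "hop_limit": hop_limit,
        "src": ipaddress.IPv6Address(src),
        "dst": ipaddress.IPv6Address(dst),
    }

hdr = pack_ipv6_header(0, 0x12345, 8, 17, 64, "2001:db8::1", "2001:db8::2")
print(len(hdr), unpack_ipv6_header(hdr))
```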

ICMPv6 Protocol IP nodes need a dedicated protocol for exchanging control and error information about IP, and ICMP fulfills this need. ICMP was modified when IPv4 was upgraded to IPv6; ICMPv6 was defined in RFC 2463, which has since been obsoleted by RFC 4443. ICMP messages are used to report errors and informational status, and for network diagnostics such as Ping and route tracing (Traceroute).

ICMP messages are typically generated in response to errors. For example, if a gateway device cannot process an IP packet for some reason, it may generate an ICMP message and return it directly to the source node of the packet, which can then take steps to correct the reported condition. For instance, if the gateway device cannot process a packet because the packet is too large to be sent on the outgoing link, it generates an ICMP error message indicating that the packet is too big. On receiving this message, the source node can determine a more suitable packet size and resend the data in a series of new, smaller IP packets.

RFC 2463 defines the following message types (excluding the group membership messages also defined in that document):

1. The destination address is unreachable;

2. The packet is too long;

3. Timeout;



4. The parameter problem;

5. The echo request;

6. The echo response;
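ICMPv6 message types below 128 are error messages and types 128 to 255 are informational messages. The sketch below maps the six types listed above to their type numbers; it is a hypothetical helper for illustration.

```python
ICMPV6_TYPES = {
    1: "Destination Unreachable",
    2: "Packet Too Big",
    3: "Time Exceeded",
    4: "Parameter Problem",
    128: "Echo Request",
    129: "Echo Reply",
}

def describe(msg_type: int) -> str:
    # The high-order bit of the Type field separates errors (0-127) from informational messages.
    kind = "error" if msg_type < 128 else "informational"
    return f"{ICMPV6_TYPES.get(msg_type, 'Unknown')} ({kind})"

for msg_type in sorted(ICMPV6_TYPES):
    print(msg_type, describe(msg_type))
```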

The following describes these messages in detail.

The destination address is unreachable:

This message is generated by a gateway device, or by the source host, when a packet cannot be delivered to its destination for reasons other than congestion. RFC 2463 defines codes 0 to 4 for this message; the main ones are:

0: No route to destination. Generated when the gateway device has no route for the destination of the IPv6 packet, which can happen only if the gateway device has no default route.

1: Communication with the destination is administratively prohibited. Generated, for example, by a packet-filtering firewall when a prohibited traffic flow tries to reach a host behind the firewall.

3: Address unreachable. Indicates a problem resolving the IPv6 destination address to a link-layer address, or a problem on the link layer of the destination network in reaching the destination.

4: Port unreachable. Generated when the upper-layer protocol (such as UDP) has no listener on the destination port and the transport protocol has no other way to inform the source node of the problem.

The packet is too long:

When a gateway device receives a packet that it cannot forward because the packet is larger than the MTU of the outgoing link, it generates a Packet Too Big message. This ICMPv6 error message includes a field giving the MTU of the link that caused the problem, which makes it useful for Path MTU Discovery.
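The Path MTU Discovery loop that relies on this message can be simulated as follows. The link MTUs are made-up values, and every router is assumed to return Packet Too Big reliably; IPv6 requires every link to support an MTU of at least 1280 bytes.

```python
def path_mtu_discovery(link_mtus, initial_size):
    """Return (sizes tried, discovered path MTU).

    link_mtus: MTU of each link along the path, in order. A router whose
    outgoing link is too small "returns" Packet Too Big with that link's MTU,
    and the source retries with the reported size.
    """
    size = initial_size
    attempts = []
    while True:
        attempts.append(size)
        too_small = next((mtu for mtu in link_mtus if size > mtu), None)
        if too_small is None:
            return attempts, size  # the packet fits every link on the path
        size = too_small

print(path_mtu_discovery([1500, 1400, 1480, 1280], 1500))
```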

Timeout

A gateway device must decrement the hop threshold (Hop Limit) of a packet before forwarding it. If the field becomes 0 after decrementing (or the gateway device receives a packet whose hop threshold is already 0), the gateway device must drop the packet and send an ICMP Time Exceeded message to the source node. When the source node receives this message, it can conclude either that the original hop threshold was set too small (the actual route is longer than expected) or that a routing loop prevented delivery of the packet.

This message is useful for the route tracing (Traceroute) function, which lets a node identify all gateway devices on the path from the source node to the destination node. It works as follows: first, the source node sends a packet toward the destination with a hop threshold of 1. The first gateway device on the path decrements the hop threshold to 0 and returns a Time Exceeded message, so the source node identifies the first gateway device. The source node then sends another packet with a hop threshold of 2; the first gateway device decrements it to 1 and forwards it, and the second gateway device decrements it to 0 and returns another Time Exceeded message. This continues until the packet reaches the destination, by which time the source node has received a Time Exceeded message from each intermediate gateway device.
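The route tracing procedure can be simulated with the hop-limit rules just described. The path and router names are hypothetical, and the reply from the final destination (which is not a Time Exceeded message) is simplified to an "arrived" result.

```python
def forward(path, hop_limit):
    """Send one packet along `path` (gateway devices..., destination)."""
    for router in path[:-1]:
        hop_limit -= 1                   # decremented before forwarding
        if hop_limit == 0:
            return "time-exceeded", router
    return "arrived", path[-1]

def traceroute(path, max_hops=30):
    hops = []
    for hop_limit in range(1, max_hops + 1):
        result, node = forward(path, hop_limit)
        hops.append(node)                # who answered for this hop limit
        if result == "arrived":
            break
    return hops

print(traceroute(["R1", "R2", "R3", "Host-B"]))  # ['R1', 'R2', 'R3', 'Host-B']
```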

Parameter problem

If a gateway device cannot process a packet because of a problem with part of the IPv6 header or an extension header, it must drop the packet. It should then generate an ICMP Parameter Problem message that indicates the type of problem (such as an erroneous header field, an unrecognized Next Header type, or an unrecognized IPv6 option) and uses a pointer value to indicate the byte offset at which the error was found.

ICMPv6 echo function

ICMPv6 also contains a function that is not related to errors. All IPv6 nodes must support two kinds of messages: Echo Request and Echo Reply. An Echo Request can be sent to any valid IPv6 address and contains an identifier, a sequence number, and some data. The identifier and sequence number are optional, but they can be used to match replies to their requests. The data of the Echo Request is also optional and can be used for diagnosis. When an IPv6 node receives an Echo Request, it must return an Echo Reply containing the same identifier, sequence number, and data as the original request. The ICMPv6 Echo Request/Echo Reply pair is the basis of the Ping function. Ping is an important diagnostic function because it provides a way to confirm whether a specific host can be reached over the network.
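A minimal sketch of building an ICMPv6 Echo Request and the matching Echo Reply is shown below, including the ICMPv6 checksum, which covers an IPv6 pseudo-header (source, destination, length, and Next Header 58). Addresses and function names are illustrative assumptions.

```python
import ipaddress
import struct

NEXT_HEADER_ICMPV6 = 58
ECHO_REQUEST, ECHO_REPLY = 128, 129

def ones_complement_sum(data: bytes) -> int:
    """16-bit one's complement sum with end-around carry."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)
    return total

def pseudo_header(src, dst, length):
    # Source, destination, upper-layer length, 3 zero bytes, Next Header (58).
    return (ipaddress.IPv6Address(src).packed + ipaddress.IPv6Address(dst).packed
            + struct.pack("!I3xB", length, NEXT_HEADER_ICMPV6))

def build_echo(msg_type, identifier, sequence, data, src, dst):
    body = struct.pack("!BBHHH", msg_type, 0, 0, identifier, sequence) + data
    checksum = ~ones_complement_sum(pseudo_header(src, dst, len(body)) + body) & 0xFFFF
    return body[:2] + struct.pack("!H", checksum) + body[4:]

def echo_reply_for(request, src, dst):
    """Build the reply: same identifier, sequence number and data as the request."""
    _, _, _, identifier, sequence = struct.unpack("!BBHHH", request[:8])
    return build_echo(ECHO_REPLY, identifier, sequence, request[8:], src, dst)

request = build_echo(ECHO_REQUEST, 0x42, 1, b"hello", "2001:db8::1", "2001:db8::2")
reply = echo_reply_for(request, "2001:db8::2", "2001:db8::1")
print(request.hex(), reply.hex())
```

A receiver validates the checksum by summing the pseudo-header and the whole message; the result must be 0xFFFF.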

IPv6 Address Discovery Protocol The Neighbor Discovery protocol is a basic part of the IPv6 protocol. It combines the functions of IPv4 ARP with the router discovery and redirect parts of ICMP, and adds a mechanism for detecting unreachable neighbors. The Neighbor Discovery protocol provides router and prefix discovery, address resolution, next-hop determination, redirect, neighbor unreachability detection, and duplicate address detection. It also supports link-layer address changes, inbound load balancing, anycast addresses, and proxy advertisements. Neighbor Discovery uses five types of IPv6 control messages (ICMPv6) to implement these functions. The five types of messages are as follows:

1. Router Solicitation: When an interface becomes active, the host sends a Router Solicitation message asking gateway devices to send a Router Advertisement immediately instead of waiting for the next scheduled advertisement;

2. Router Advertisement: The gateway device periodically advertises its presence together with the configured link and network parameters, or sends an advertisement in response to a Router Solicitation. The Router Advertisement contains prefixes used for on-link determination and address configuration, and a hop limit value;

3. Neighbor Solicitation: A node sends a Neighbor Solicitation to request the link-layer address of a neighbor, to verify that a neighbor whose link-layer address is cached is still reachable, or to check whether its own address is unique on the local link;

4. Neighbor Advertisement: This is the response to a Neighbor Solicitation. A node can also send unsolicited Neighbor Advertisements to announce a change of its link-layer address quickly;

5. Redirect: The gateway device uses a Redirect message to inform a host of a better next hop for a particular destination when the host is not using the best route.

One design requirement of IPv6 is that a host must work correctly even in a minimal network, without needing to maintain a routing table or rely on fixed configuration. The host must therefore configure itself automatically and learn how to send data to each destination. The memory that stores this information is called a cache; it is organized as a set of records called entries. Each entry is valid only for a limited time, and stale entries must be cleared to keep the size of the cache under control. For each interface, the host maintains the following information:

Neighbor cache: A set of entries about individual neighbors to which traffic has recently been sent. Entries are keyed on the neighbor's unicast IP address and contain information such as its link-layer address, a flag indicating whether the neighbor is a gateway device or a host, and a pointer to any queue of packets waiting for address resolution to complete. A neighbor cache entry also contains information used by neighbor unreachability detection, such as the reachability state, the number of unanswered probes, and the time at which the next unreachability check is due.



Destination cache: A set of entries about destinations to which traffic has recently been sent. The destination cache includes both on-link and off-link destinations and provides a level of indirection: it maps a destination IP address to the IP address of the next-hop neighbor. The cache is updated with information learned from Redirect messages. Implementations may also find it convenient to store additional information not directly related to Neighbor Discovery in destination cache entries, such as the path MTU (PMTU) and round-trip timers maintained by transport protocols.

Prefix list: A list of prefixes that define the set of on-link addresses. Prefix list entries are created from information received in Router Advertisements. Each entry has an associated invalidation timer value (taken from the advertisement) that is used to discard the prefix when it becomes invalid. A special "infinite" timer value means that the prefix remains valid forever, unless a new (finite) value is received in a later advertisement. The link-local prefix is always on the prefix list with an infinite invalidation timer, regardless of whether gateway devices advertise it, and received Router Advertisements should not modify the invalidation timer of the link-local prefix.

Default router list: A list of gateway devices to which packets may be sent. Entries in the router list point to entries in the neighbor cache. The default router selection algorithm prefers gateway devices that are known to be reachable over those whose reachability has not been confirmed. Each entry has an associated invalidation timer value (taken from Router Advertisements) that is used to delete entries that are no longer advertised.

These data structures can be implemented in different ways; one approach is to use a single longest-match routing table for all of them. Whatever implementation is used, all destination cache entries that use the same gateway device should share that device's neighbor cache entry, so that neighbor unreachability detection is not repeated for each destination.

The neighbor cache contains information maintained by the neighbor unreachability detection algorithm. The most important item is the neighbor's reachability state, which takes one of the following five values:

1. INCOMPLETE: Address resolution is in progress, and the link-layer address of the neighbor has not yet been determined;

2. REACHABLE: The neighbor was recently confirmed to be reachable (less than 10 s ago);

3. STALE: The neighbor's reachability is no longer confirmed; no attempt is made to verify it until traffic is sent to the neighbor;

4. DELAY: The neighbor is no longer known to be reachable, and traffic has recently been sent to it. Instead of probing the neighbor immediately, the node waits a short time before sending probes, giving upper-layer protocols a chance to provide reachability confirmation;

5. PROBE: The neighbor is no longer known to be reachable, and unicast Neighbor Solicitation probes are being sent to verify its reachability.
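The state changes described above can be sketched as a small event-driven class. The event names are hypothetical, timers are not modeled, and the limit of three unicast probes is the RFC 4861 default (MAX_UNICAST_SOLICIT), assumed here.

```python
INCOMPLETE, REACHABLE, STALE, DELAY, PROBE = (
    "INCOMPLETE", "REACHABLE", "STALE", "DELAY", "PROBE")
MAX_UNICAST_SOLICIT = 3  # RFC 4861 default number of unicast probes

class NeighborEntry:
    def __init__(self):
        self.state = INCOMPLETE  # created when address resolution starts
        self.probes_sent = 0

    def on_confirmation(self):
        """Solicited Neighbor Advertisement (or, once resolved, an upper-layer hint)."""
        self.state = REACHABLE
        self.probes_sent = 0

    def on_reachable_timeout(self):
        if self.state == REACHABLE:
            self.state = STALE   # the last confirmation is now too old

    def on_packet_sent(self):
        if self.state == STALE:
            self.state = DELAY   # wait briefly for an upper-layer hint

    def on_delay_timeout(self):
        if self.state == DELAY:
            self.state = PROBE
            self.probes_sent = 1  # first unicast Neighbor Solicitation

    def on_probe_timeout(self):
        """Send another probe; return False once the entry should be deleted."""
        if self.state != PROBE:
            return True
        if self.probes_sent >= MAX_UNICAST_SOLICIT:
            return False
        self.probes_sent += 1
        return True

entry = NeighborEntry()
entry.on_confirmation()       # address resolution completed
entry.on_reachable_timeout()  # -> STALE
entry.on_packet_sent()        # -> DELAY
entry.on_delay_timeout()      # -> PROBE
print(entry.state, [entry.on_probe_timeout() for _ in range(3)])  # PROBE [True, True, False]
```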

The sending algorithm of packets:

When a node sends a packet to a destination, it uses the destination cache, prefix list, and default router list to determine the appropriate next-hop IP address, and then queries the neighbor cache to determine the link-layer address of that neighbor.

Next-hop determination for an IPv6 unicast destination works as follows:

The sender performs longest-prefix matching against the prefix list to determine whether the destination is on-link (directly connected) or off-link. If the destination is on-link, the next-hop address is the same as the destination address. Otherwise, the sender selects the next hop from the default router list. If the default router list is empty, the sender assumes that the destination is on-link.

The result of next-hop determination is saved in the destination cache, where later packets can reuse it. When sending a packet, the device first checks the destination cache. If the cache holds no entry for the destination, next-hop determination is started.
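The two steps above (destination cache lookup, then next-hop determination) can be sketched in Python with the standard `ipaddress` module. This is an illustration under stated assumptions. The function name and plain-dictionary caches are hypothetical. The preference for reachable routers is simplified to "take the first router in the list".

```python
import ipaddress
from typing import Dict, List


def next_hop(dest: str, prefix_list: List[str], default_routers: List[str],
             dest_cache: Dict[str, str]) -> str:
    """Return the next-hop address for dest (simplified sketch)."""
    # 1. Reuse an earlier result from the destination cache if there is one.
    if dest in dest_cache:
        return dest_cache[dest]

    # 2. Longest-prefix match against the on-link prefix list.
    addr = ipaddress.IPv6Address(dest)
    matches = [net for net in map(ipaddress.IPv6Network, prefix_list) if addr in net]
    best = max(matches, key=lambda net: net.prefixlen, default=None)

    if best is not None or not default_routers:
        hop = dest                  # on-link, or no default router: send directly
    else:
        hop = default_routers[0]    # assumption: stands in for "prefer reachable routers"

    # 3. Save the result so later packets to this destination skip step 2.
    dest_cache[dest] = hop
    return hop
```

For a pure on-link test, only the existence of a match matters. The longest match is computed here to mirror the description.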

After learning the IPv6 address of the next hop, the sender checks the neighbor cache for its link-layer address. If there is no entry for the next-hop IPv6 address, the device does the following:

Creates a new entry and sets its state to INCOMPLETE;

Starts address resolution;

Queues the packets to be transmitted.

When address resolution completes, the link-layer address is saved in the neighbor cache and the entry becomes REACHABLE. The queued packets can then be transmitted.
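The neighbor cache part of the send path (create an INCOMPLETE entry, start resolution, queue, then flush) can be sketched as follows. The class is hypothetical. The `start_resolution` and `transmit` callbacks stand in for sending a neighbor solicitation and handing a frame to the link layer. Real implementations also bound the queue length, which is omitted here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Neighbor:
    state: str = "INCOMPLETE"
    link_addr: Optional[str] = None
    queue: List[bytes] = field(default_factory=list)


class NeighborCache:
    """Sketch of the send path; the two callbacks are hypothetical hooks."""

    def __init__(self, start_resolution: Callable[[str], None],
                 transmit: Callable[[str, bytes], None]) -> None:
        self.entries: Dict[str, Neighbor] = {}
        self.start_resolution = start_resolution
        self.transmit = transmit

    def send(self, next_hop: str, packet: bytes) -> None:
        entry = self.entries.get(next_hop)
        if entry is None:
            # No entry yet: create it as INCOMPLETE and start address resolution.
            entry = Neighbor()
            self.entries[next_hop] = entry
            self.start_resolution(next_hop)
        if entry.link_addr is None:
            entry.queue.append(packet)        # wait for resolution to finish
        else:
            self.transmit(entry.link_addr, packet)

    def resolved(self, next_hop: str, link_addr: str) -> None:
        # Resolution finished: store the address, mark REACHABLE, flush the queue.
        entry = self.entries[next_hop]
        entry.link_addr = link_addr
        entry.state = "REACHABLE"
        for packet in entry.queue:
            self.transmit(link_addr, packet)
        entry.queue.clear()
```

In this sketch, a second packet sent before resolution completes is simply queued behind the first. Only one resolution is started per next hop.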

For multicast packets, the next hop is always treated as on-link. How the multicast IPv6 address is mapped to a link-layer address depends on the link type.
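On Ethernet, the mapping is defined in RFC 2464: the MAC address is 33:33 followed by the low-order 32 bits of the IPv6 multicast address. The Python sketch below covers only that Ethernet case, and the function name is hypothetical. Other link types use other mappings.

```python
import ipaddress


def ethernet_multicast_mac(group: str) -> str:
    """Map an IPv6 multicast address to an Ethernet MAC (33:33 + low 32 bits)."""
    addr = ipaddress.IPv6Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not a multicast address")
    low32 = addr.packed[-4:]                      # last 4 bytes of the address
    return ":".join(f"{b:02x}" for b in b"\x33\x33" + low32)
```

For example, the all-nodes address `ff02::1` maps to `33:33:00:00:00:01`.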

When a unicast packet is sent using a neighbor cache entry, the sender checks the associated reachability information. It verifies the neighbor's reachability according to the neighbor unreachability detection algorithm. If the neighbor turns out to be unreachable, next-hop determination is performed again to check whether another path to the destination is available.

If the IP address of the next-hop node is known, the sender looks up the neighbor's link-layer information in the neighbor cache. If there is no entry, the sender creates one and sets its state to INCOMPLETE. It then starts address resolution and queues the packets whose address resolution has not completed. On interfaces that support multicast, address resolution consists of sending a neighbor solicitation and waiting for a neighbor advertisement. When the neighbor advertisement arrives, the link-layer address is saved in the neighbor cache and the queued packets are sent.

Each time a unicast packet is sent and the neighbor cache entry is read, the sender checks the reachability information according to the neighbor unreachability detection algorithm. When that algorithm requires it, the sender sends a unicast neighbor solicitation to verify that the neighbor is still reachable.

When traffic is sent to a destination for the first time, next-hop determination is performed. As long as communication with the destination keeps working, the destination cache entry continues to be used. If at some point the neighbor unreachability detection algorithm decides that communication has failed, next-hop determination is performed again. For example, traffic through a failed gateway device should switch to a working gateway device, and traffic to a mobile node may be re-routed to its "mobility agent".



When a node repeats next-hop determination, it does not need to discard the whole destination cache entry. Information such as the PMTU and round-trip timer values remains useful.

Functions of Neighbor Discovery Protocol

1. Router and prefix discovery

The gateway device must unconditionally drop router solicitation and router advertisement messages that fail the validity checks.

Router discovery is used to locate the gateway devices attached to a given link. It also obtains the prefixes and configuration parameters related to address autoconfiguration.

Gateway devices periodically send multicast router advertisement messages to advertise their reachability on the link. They also send them in response to router solicitations. Each host receives the router advertisements from the gateway devices on its link and builds its default router list, which holds the gateway devices used when the path to a destination is unknown. Router advertisements are sent often enough that hosts learn of a gateway device's presence within a few minutes. They are not sent so often that a missing advertisement can be used to detect a gateway failure; such failures are detected by neighbor unreachability detection instead.
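The default router list, with its per-entry invalidation timer, can be sketched as follows. The class name and the choice of absolute expiry times are hypothetical. One detail comes from RFC 2461: an advertised Router Lifetime of 0 means the sender must not be used as a default router, so its entry is removed.

```python
import time
from typing import Dict, List, Optional


class DefaultRouterList:
    """Sketch: default router list maintained from router advertisements."""

    def __init__(self) -> None:
        self.expiry: Dict[str, float] = {}   # router address -> absolute expiry time

    def on_router_advertisement(self, router: str, lifetime_s: int,
                                now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        if lifetime_s == 0:
            # Lifetime 0: the router is not (or no longer) a default router.
            self.expiry.pop(router, None)
        else:
            self.expiry[router] = now + lifetime_s   # (re)start the invalidation timer

    def routers(self, now: Optional[float] = None) -> List[str]:
        now = time.monotonic() if now is None else now
        # Drop entries whose timer has expired, i.e. routers no longer advertised.
        self.expiry = {r: t for r, t in self.expiry.items() if t > now}
        return list(self.expiry)
```

Passing `now` explicitly makes the behaviour easy to test without waiting for real timers.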

The router advertisement message should contain the prefix list used for on-link determination. A host uses the prefixes learned from router advertisements to decide whether a destination is on-link (directly reachable) or off-link (reachable only through a gateway device). A destination can be on-link even though no prefix learned from router advertisements covers it. In that case the host treats the destination as off-link, and the gateway device sends a redirect message to the sender.

The router advertisement message should contain flags that tell hosts how to perform address autoconfiguration. For example, the gateway device can tell hosts to use stateful address configuration or stateless address autoconfiguration.



In addition, the router advertisement can carry parameters that the network manages centrally. Examples are the default hop limit that hosts place in the packets they generate, and the link MTU.

When a host sends a router solicitation to the gateway device, the gateway device should respond promptly with a router advertisement, which speeds up the configuration of the node.

2. Address resolution

An IPv6 node resolves an IPv6 address to a link-layer address using neighbor solicitation and neighbor advertisement messages. Address resolution is not performed for multicast addresses.

A node starts address resolution by multicasting a neighbor solicitation message, which asks the target node to return its link-layer address. The source node includes its own link-layer address in the neighbor solicitation. It sends the message to the solicited-node multicast address associated with the target address. The target node returns its link-layer address in a unicast neighbor advertisement. With this pair of messages, the source and target nodes learn each other's link-layer addresses.
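The solicited-node multicast address mentioned above is formed by appending the low-order 24 bits of the target address to the prefix `ff02::1:ff00:0/104` (RFC 2373). The Python sketch below computes it. The function name is illustrative only.

```python
import ipaddress

# ff02:0:0:0:0:1:ff00:0, the /104 solicited-node prefix.
SOLICITED_NODE_PREFIX = int(ipaddress.IPv6Address("ff02::1:ff00:0"))


def solicited_node(target: str) -> ipaddress.IPv6Address:
    """Solicited-node multicast address for a unicast target address."""
    low24 = int(ipaddress.IPv6Address(target)) & 0xFFFFFF   # keep the low 24 bits
    return ipaddress.IPv6Address(SOLICITED_NODE_PREFIX | low24)
```

For the target `2001:db8::1234:5678` this gives `ff02::1:ff34:5678`. Addresses that differ only above their low 24 bits share the same solicited-node group.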

3. Redirect function

When a packet must be sent to an off-link destination, a gateway device must be selected to forward it. If the selected gateway device is not the best next hop for that destination, it generates a redirect message. The message informs the source node that a better next-hop gateway device exists for the destination.

The gateway device must know the link-local address of each neighboring gateway device. This ensures that the target address in a redirect message identifies the neighbor gateway device by its link-local address.

A source node may not handle redirect messages correctly, or may ignore unauthenticated redirect messages. The gateway device must therefore limit the rate at which it sends redirect messages, to save bandwidth and processing overhead.

A gateway device must not update its routing table when it receives a redirect message.
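Redirect rate limiting can be sketched with a token bucket. This is only one possible technique: the manual requires a rate limit but does not say how it is implemented, and the rate and burst defaults below are hypothetical. The trigger condition is a common simplification, and the function names are illustrative. It treats "the packet leaves on the interface it arrived on, and the source is an on-link neighbor" as meaning that a better first hop may exist on the sender's link.

```python
import time
from typing import Optional


class RedirectLimiter:
    """Token bucket for redirect messages (rate and burst are hypothetical)."""

    def __init__(self, rate_per_s: float = 10.0, burst: int = 10) -> None:
        self.rate = rate_per_s
        self.capacity = float(burst)
        self.tokens = float(burst)
        self.last: Optional[float] = None

    def allow(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        if self.last is not None:
            # Refill in proportion to elapsed time, up to the bucket size.
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def should_redirect(in_ifindex: int, out_ifindex: int, source_is_neighbor: bool,
                    limiter: RedirectLimiter) -> bool:
    """Simplified trigger: same in/out interface, on-link source, rate allows it."""
    # Short-circuit evaluation: a token is consumed only when a redirect is due.
    return in_ifindex == out_ifindex and source_is_neighbor and limiter.allow()
```

With `rate_per_s=1.0, burst=2`, two redirects can go out back to back. After that, at most one more is allowed per second.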

4. Neighbor unreachability detection

Communication to or through a neighbor can fail for many reasons, including hardware faults and hot swapping of interface cards. If the destination itself has failed, recovery is impossible; if the path has failed, recovery may be possible. Therefore, a node should actively track the reachability of the neighbors to which it is sending packets.

Neighbor unreachability detection should be performed on all paths between hosts and neighboring nodes. This includes host-to-host, host-to-gateway device, and gateway device-to-host communication. It can also be used between gateway devices to detect a failed neighbor or a failed forward path to a neighbor.

A neighbor is considered reachable if the gateway device has recently received confirmation that the neighbor's IP layer received the packets sent to it. Neighbor unreachability detection obtains this confirmation in two ways. The first is a hint from an upper-layer protocol indicating that the connection is making "forward progress". The second is a neighbor advertisement received in response to a unicast neighbor solicitation sent by the gateway device. To reduce unnecessary network traffic, probes are sent only to neighbors to which the node is actively sending packets.

Neighbor unreachability detection runs in parallel with sending packets to the neighbor. While reachability is being confirmed, the gateway device continues to send packets to the cached link-layer address. If no packets are being sent to the neighbor, no probes are sent.

The IETF published RFC 2461, the neighbor discovery standard, in December 1998. Since then, neighbor discovery has become an essential protocol for IPv6 nodes, solving the problem of interoperation among all nodes attached to the same link.



The IPv6 standards are now stable, and related products from international vendors have matured, but demand for IPv6 technology in the Chinese market is not yet clear. IPv6 in China is therefore still in the trial-network deployment and operation phase. As commercial IPv6 deployment accelerates, the neighbor discovery protocol will be used more widely.

IPv6 Address

The most obvious difference between IPv4 and IPv6 addresses is their length: an IPv4 address is 32 bits long and an IPv6 address is 128 bits long. RFC 2373 explains how addresses are written, and it also describes the different address types and their structures. An IPv4 address can be divided into two or three parts (network ID, subnet ID, and node ID). The IPv6 address space is much larger and supports more fields.

IPv6 has three address types: unicast, multicast, and any-cast. Unicast and multicast addresses are similar to their IPv4 counterparts. IPv6 no longer supports the IPv4 broadcast address, but adds the any-cast address.

Address notation:

An IPv6 address is four times as long as an IPv4 address, so writing it is correspondingly more cumbersome. The basic notation of an IPv6 address is X:X:X:X:X:X:X:X, where each X is a 16-bit value written as four hexadecimal digits. Each digit represents 4 bits, each group contains 4 digits, and each address contains 8 groups, giving 128 bits in total (4 × 4 × 8 = 128). For example, the following are valid IPv6 addresses:

CDCD:910A:2222:5498:8475:1111:3900:2020

1030:0:0:0:C9B4:FF12:48AA:1A2B

2000:0:0:0:0:0:0:1

These groups are hexadecimal, with A-F representing 10-15. Every group in the address must be written, but leading zeros within a group can be omitted. This is the standard IPv6 notation. There are also two other common forms. Some IPv6 addresses contain long runs of zeros, like the second and third examples above. In this case, the standard allows the run of zero groups to be replaced by a double colon (::). That is, the address 2000:0:0:0:0:0:0:1 can be written as 2000::1.



The double colon indicates that the address is expanded to a full 128-bit address. Only 16-bit groups that are entirely zero can be replaced by the double colon, and the double colon can appear only once in an address.
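Python's standard `ipaddress` module applies these rules and can be used to check an address by hand. The helper name `notations` is illustrative. Note that Python prints hexadecimal digits in lowercase.

```python
import ipaddress


def notations(text: str) -> tuple:
    """Return the (compressed, exploded) forms of an IPv6 address."""
    addr = ipaddress.IPv6Address(text)
    # compressed: leading zeros dropped, longest run of zero groups replaced by "::"
    # exploded:   every group written out as four hexadecimal digits
    return addr.compressed, addr.exploded


for text in ["CDCD:910A:2222:5498:8475:1111:3900:2020",
             "1030:0:0:0:C9B4:FF12:48AA:1A2B",
             "2000:0:0:0:0:0:0:1"]:
    print(notations(text))

# "::" may appear only once, because two gaps would make the expansion ambiguous.
try:
    ipaddress.IPv6Address("2000::1::2")
except ValueError as exc:
    print("rejected:", exc)
```

The third address compresses to `2000::1`, and the address with two double colons is rejected.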

The third form is used in mixed IPv4 and IPv6 environments. The lowest 32 bits of an IPv6 address can carry an IPv4 address, written in the mixed notation X:X:X:X:X:X:d.d.d.d. Here each X is a 16-bit hexadecimal value and each d is an 8-bit decimal value. For example, 0:0:0:0:0:0:10.0.0.1 is a valid IPv6 address with an embedded IPv4 address. Combined with the double-colon form, it can also be written as ::10.0.0.1.

An IPv6 address consists of two parts, the subnet prefix and the interface ID. A node address is therefore often written like a CIDR address, with an extra value indicating how many bits of the address form the prefix. The prefix length is separated from the IPv6 address by a slash, for example 1030:0:0:0:C9B4:FF12:48AA:1A2B/60. In this address, the prefix used for routing is 60 bits long.
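The prefix/length split for the example address can be checked with `ipaddress.IPv6Interface`. The interface type keeps the full address and also exposes the network formed by the first 60 bits. The manual masking step shows the same computation explicitly.

```python
import ipaddress

iface = ipaddress.IPv6Interface("1030:0:0:0:C9B4:FF12:48AA:1A2B/60")
print(iface.network)             # the 60-bit routing prefix, shown as a network
print(iface.network.prefixlen)   # 60

# The same split by hand: keep the high 60 bits, clear the low 68 bits.
prefix_bits = 60
mask = ((1 << prefix_bits) - 1) << (128 - prefix_bits)
assert ipaddress.IPv6Address(int(iface.ip) & mask) == iface.network.network_address
```

Here the first 60 bits are 1030 followed by zeros, so the prefix prints as `1030::/60`.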

IPv6 Addressing Model

Each unicast address identifies a single network interface. IPv6 addresses are assigned to network interfaces rather than to nodes, so a node with multiple interfaces can have multiple IPv6 addresses, any one of which can identify the node. One interface can have multiple unicast addresses, but one unicast address can belong to only one interface. Each interface must have at least one unicast address. There is one important qualification and one important exception.

The qualification concerns point-to-point links. In IPv4, every network interface needs its own IP address, including the point-to-point link between a node and a gateway device. Many organizations use point-to-point links to connect branches, and each link needs its own subnet, which consumes a lot of address space. In IPv6, neither end of a point-to-point link may need to send data to or receive data from non-neighbor nodes. In that case, the interfaces do not need special addresses. In other words, if the two nodes mainly forward transit traffic, they do not need globally routable IPv6 addresses on that link.

The requirement to assign a unique unicast address to every network interface limits scalability. A single server that provides common services may be overwhelmed when demand is high. Therefore, the IPv6 addressing model makes one important exception. If the hardware can correctly share the network load across multiple network interfaces, those interfaces can share one IPv6 address. As demand on the server grows, it can then be expanded into a load-sharing server group without upgrading hardware.

IPv6 Address Type

An IPv6 address has one of three types: unicast, multicast, or any-cast. Broadcast addresses no longer exist. RFC 2373 defines the three IPv6 address types as follows:

1. Unicast: An identifier for a single interface. A packet sent to a unicast address is delivered to the interface identified by that address.

2. Any-cast: An identifier for a set of interfaces, typically belonging to different nodes. A packet sent to an any-cast address is delivered to one of the interfaces identified by that address. This is the nearest one, according to the routing protocol's measure of distance.

3. Multicast: An identifier for a set of interfaces, typically belonging to different nodes. A packet sent to a multicast address is delivered to all interfaces identified by that address.

Unicast

A unicast address identifies a single IPv6 interface. A node can have multiple IPv6 network interfaces, and each interface must have its own unicast address. A unicast address can be regarded as a piece of information held in a 128-bit field that identifies one specific interface. The data in the address can also be interpreted as several smaller pieces of information. Taken together, they form a 128-bit address that identifies one node interface.

How much of an IPv6 address's structure is visible depends on who is looking at the address and for what purpose. For example, a node may only need to know that the whole 128-bit address is a unique identifier, without knowing anything about its internal structure. A gateway device, on the other hand, can use the address to decide that one part of it identifies a particular network or a particular node on a subnet.

For example, an IPv6 unicast address can be viewed as an entity with two fields. One field identifies the network, and the other identifies the node's interface on that network. The network ID can be further divided into several parts that identify different levels of the network. Like an IPv4 address, an IPv6 unicast address can be split CIDR-style at an arbitrary boundary into two parts. The high-order bits contain the prefix used for routing, and the low-order bits contain the network interface ID.

The simplest view treats an IPv6 address as an undifferentiated 128-bit value. Formally, however, it can be divided into two parts: the subnet prefix and the interface ID. The length of the interface ID depends on the length of the subnet prefix, and both lengths are variable. A gateway device close to the addressed interface (far from the backbone network) sees a long prefix, so it uses only a few bits to identify the interface. A gateway device close to the backbone network needs only a few address bits to specify the subnet prefix, so it treats most of the address as the interface ID.

The IPv6 unicast address includes the following types:

Aggregatable global unicast address;

Unspecified (all-zeros) address;

IPv6 address with an embedded IPv4 address;

Provider-based and geographic-based addresses;

OSI network service access point (NSAP) address;

Internetwork Packet Exchange (IPX) address.

Unicast address format:

RFC 2373 changes and simplifies IPv6 address allocation. Allocation based on geographic location is dropped, and the provider-based unicast address becomes the aggregatable global unicast address. As the new name suggests, the format permits both the previously defined provider-based aggregation and a new kind of aggregation based on exchanges. This gives a more balanced address classification. Space for NSAP and IPX addresses is still reserved, and 1/8 of the address space is allocated to aggregatable addresses. Apart from the multicast addresses and one type of reserved address, the rest of the IPv6 address space is unassigned, leaving enough room for future development.

1. Interface ID

In the IPv6 addressing architecture, every IPv6 unicast address needs an interface ID. The interface ID is similar to a MAC address. A MAC address is burned into the NIC by the manufacturer and is globally unique, so no two NICs should have the same MAC address. It identifies the interface at the network link layer. The interface ID of an IPv6 host address is based on the IEEE EUI-64 format, which builds a 64-bit interface ID from the existing MAC address. The result is unique both globally and locally. An appendix of RFC 2373 explains how to create the interface ID.
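The procedure in that appendix (the "modified EUI-64" form) can be sketched in Python. It inserts the bytes FF FE between the first three and last three bytes of the 48-bit MAC address, then inverts the universal/local bit (0x02 of the first byte). The function names are illustrative, and the MAC address in the example is made up.

```python
def eui64_interface_id(mac: str) -> bytes:
    """Build the 64-bit interface ID from a 48-bit MAC address (modified EUI-64)."""
    octets = bytes(int(part, 16) for part in mac.replace("-", ":").split(":"))
    if len(octets) != 6:
        raise ValueError(f"expected a 48-bit MAC address, got {mac!r}")
    # Insert FF FE between the company ID (first 3 bytes) and the rest...
    iid = bytearray(octets[:3] + b"\xff\xfe" + octets[3:])
    # ...then invert the universal/local bit of the first byte.
    iid[0] ^= 0x02
    return bytes(iid)


def format_iid(iid: bytes) -> str:
    """Write the 8-byte interface ID as four colon-separated 16-bit groups."""
    return ":".join(iid[i:i + 2].hex() for i in range(0, 8, 2))
```

For example, the MAC address 00:1A:2B:3C:4D:5E yields the interface ID `021a:2bff:fe3c:4d5e`.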

The 64-bit interface ID can uniquely identify each network interface. In theory, this allows 2^64 different interfaces, about 1.8×10^19 addresses, while using only half of the 128 bits of an IPv6 address.

2. Aggregatable global unicast address

In addition to provider-based aggregation, the aggregatable global unicast address supports a new type of aggregation that is independent of ISPs. Provider-based aggregatable addresses vary with the provider, while exchange-based addresses are allocated by IPv6 exchanges. The exchange provides the address block, and the user contracts with a provider for network access. Access may be provided directly by the provider or through the exchange, but routing is handled by the exchange. As a result, a user who changes providers does not need to renumber, and a user can use multiple ISPs with a single network address block. Aggregatable global unicast addresses include all addresses whose first three bits are 001. The same format can also be used for currently unassigned unicast prefixes.

The aggregatable global unicast address includes the following fields:

FP field: The format prefix of the IPv6 address. It is 3 bits long and identifies the part of the IPv6 address space to which the address belongs. Currently the field is 001, indicating an aggregatable global unicast address.

TLA ID field: The top-level aggregation ID, which carries the highest level of routing information in the interconnected network. The field is currently 13 bits long, allowing at most 8192 different top-level routes.

RES field: 8 bits reserved for future use. It may eventually be used to extend the TLA ID or NLA ID field.



NLA ID field: The next-level aggregation ID, 24 bits long. Organizations that hold a top-level aggregation ID use it to assign address space. Such organizations may include large ISPs and other organizations that provide public access. They can subdivide the 24-bit field according to their own addressing hierarchy. For example, an entity can use 2 bits to create four top-level routes inside itself. It can then allocate the remaining 22 bits of address space to other entities, such as small local ISPs. If those entities receive enough address space, they can subdivide it in the same way.

SLA ID field: The site-level aggregation ID, used by an organization to arrange its internal network structure. Each organization can create its own internal hierarchical network in the same way as with IPv4. If all 16 bits are used as a flat address space, there can be at most 65 535 different subnets. If the first 8 bits are used for higher-level routing within the organization, 255 higher-level subnets are allowed, each with at most 255 subnets.

Interface ID field: 64 bits, containing the 64-bit IEEE EUI-64 interface ID.
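The field layout above (3 + 13 + 8 + 24 + 16 + 64 = 128 bits) can be made concrete with a small Python sketch. It cuts an address into those fields and builds one back from them. The helper names are hypothetical, and the field values in the example are made up for illustration.

```python
import ipaddress
from typing import Dict

# Field layout from high-order to low-order bits, as listed above.
FIELDS = [("FP", 3), ("TLA", 13), ("RES", 8), ("NLA", 24), ("SLA", 16), ("InterfaceID", 64)]
assert sum(width for _, width in FIELDS) == 128


def split_fields(addr: str) -> Dict[str, int]:
    """Cut a 128-bit address into the fields above (high-order field first)."""
    value = int(ipaddress.IPv6Address(addr))
    fields, shift = {}, 128
    for name, width in FIELDS:
        shift -= width
        fields[name] = (value >> shift) & ((1 << width) - 1)
    return fields


def join_fields(**values: int) -> ipaddress.IPv6Address:
    """Inverse of split_fields; fields that are not given default to 0."""
    result = 0
    for name, width in FIELDS:
        v = values.get(name, 0)
        if not 0 <= v < (1 << width):
            raise ValueError(f"{name}={v} does not fit in {width} bits")
        result = (result << width) | v
    return ipaddress.IPv6Address(result)
```

For example, `join_fields(FP=0b001, TLA=0x1FFE, NLA=0xC18, SLA=1, InterfaceID=0x0200F8FFFE2167CF)` builds an address whose first group is `3ffe`, because 001 followed by the 13 TLA bits gives 0x3FFE. Splitting that address returns the same field values.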

3. Special address and reserved address

The first 1/256 of the IPv6 address space is reserved, that is, all addresses whose first 8 bits are 0000 0000. Most of this space is empty; part of it is used for special addresses, which include:

Unspecified address: The all-zeros address, used when no valid address is available. For example, a host may start up on the network for the first time without yet having an IPv6 address. It can then place this address in the source address field of the IPv6 packet it sends to request configuration information. The address can be written as 0:0:0:0:0:0:0:0 or ::.

Loopback address: In IPv4, the loopback address is defined as 127.0.0.1. Any packet sent to the loopback address goes down the protocol stack to the network interface but is never sent onto the network link. The network interface must accept the packet as if it had come from an outside node and pass it back up the protocol stack. The loopback function is used to test software and configuration. The IPv6 loopback address is all zeros except for the lowest bit, so it can be written as 0:0:0:0:0:0:0:1 or ::1.



IPv6 address with an embedded IPv4 address: There are two kinds. One allows IPv6 nodes to access IPv4 nodes that do not support IPv6. The other allows IPv6 gateway devices to tunnel IPv6 packets across an IPv4 network.

4. IPv6 address with IPv4 address

Whether or not people welcome it, the transition to IPv6 is inevitable, which means that IPv4 nodes and IPv6 nodes must find ways to coexist. The most obvious difference between the two IP versions is the address. IPv6 provides two kinds of special addresses that embed an IPv4 address. They were first defined in RFC 1884 and carried over into RFC 2373. In both kinds, the high 80 bits are all 0 and the low 32 bits contain the IPv4 address. The middle 16 bits are 0000 in an IPv4-compatible address; when they are set to FFFF, the address is an IPv4-mapped IPv6 address.

Nodes that understand both IPv4 and IPv6 use the IPv4-compatible address to tunnel IPv6 packets across IPv4 gateway devices. IPv6 nodes use the IPv4-mapped address to access nodes that support only IPv4.
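The two layouts can be told apart by the 16 bits immediately above the embedded IPv4 address. The classifier below is a hypothetical Python sketch of that rule. It treats :: and ::1 as special addresses rather than IPv4-compatible ones.

```python
import ipaddress


def embedded_ipv4_kind(addr: str) -> str:
    """Classify an address by the layout described above (high 80 bits zero)."""
    value = int(ipaddress.IPv6Address(addr))
    if value >> 48 != 0:                  # the high 80 bits must all be zero
        return "other"
    middle = (value >> 32) & 0xFFFF       # the 16 bits above the IPv4 part
    if middle == 0xFFFF:
        return "ipv4-mapped"
    if middle == 0 and value > 1:         # exclude :: and ::1 (special addresses)
        return "ipv4-compatible"
    return "other"


def embedded_ipv4(addr: str) -> ipaddress.IPv4Address:
    """Return the IPv4 address carried in the low 32 bits."""
    return ipaddress.IPv4Address(int(ipaddress.IPv6Address(addr)) & 0xFFFFFFFF)
```

Both `::10.0.0.1` and `::ffff:10.0.0.1` carry 10.0.0.1. Only the middle 16 bits distinguish the compatible form from the mapped form.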

5. Link-local and site-local address

Organizations that do not want to apply for globally unique IPv4 addresses have an alternative: they can use private addresses, such as those of network 10, together with address translation. The gateway devices in such organizations should not forward these addresses. However, a gateway device cannot block them or tell them apart from other valid IPv4 addresses, and it can be configured to forward them.

To provide the same function, IPv6 sets aside two address ranges from the globally unique Internet address space. The link-local address is used to number hosts on a single network link. An address whose first 10 bits are 1111 1110 10 (the prefix FE80::/10) is a link-local address.

Gateway devices never forward packets whose source or destination address is link-local. The middle 54 bits of a link-local address are set to 0, and the 64-bit interface ID uses the IEEE structure. This part of the address space therefore allows a network to connect up to 2^64 - 1 hosts.
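A link-local address can be assembled from the layout just described: the 10-bit prefix FE80::/10, 54 zero bits, then the 64-bit interface ID. The forwarding check can be sketched with `ipaddress.IPv6Address.is_link_local`. Both function names are hypothetical.

```python
import ipaddress

LINK_LOCAL = ipaddress.IPv6Network("fe80::/10")


def link_local_from_iid(iid: int) -> ipaddress.IPv6Address:
    """fe80::/10 prefix, 54 zero bits, then the 64-bit interface ID."""
    if not 0 <= iid < (1 << 64):
        raise ValueError("interface ID must fit in 64 bits")
    return ipaddress.IPv6Address(int(LINK_LOCAL.network_address) | iid)


def router_may_forward(src: str, dst: str) -> bool:
    """A gateway device never forwards packets with a link-local source or destination."""
    return not (ipaddress.IPv6Address(src).is_link_local or
                ipaddress.IPv6Address(dst).is_link_local)
```

For example, the interface ID 021a:2bff:fe3c:4d5e gives the link-local address `fe80::21a:2bff:fe3c:4d5e`.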



Multicast

Like the broadcast address, the multicast address is useful on local networks such as traditional shared Ethernet, where every node can see all data transmitted on the wire. For each transmission, each node checks the destination MAC address of the frame. If it matches the MAC address of the local interface, the node accepts the frame. Broadcast is simple: the node only needs to listen and makes no decision. Multicast is a little more complicated. A node must subscribe to a multicast address. When it sees a frame whose destination is a multicast address, it must check whether it has subscribed to that address.

IP multicast is more complicated. One reason for the success of IP is that it does not flood traffic across the Internet to all nodes indiscriminately. If IP broadcast packets were forwarded, every broadcast packet would reach every device in the network, at a large cost in network resources. This is why gateway devices should not forward broadcast packets. For multicast, however, a gateway device that subscribes to a multicast address on behalf of other nodes can forward selectively.

When a node subscribes to a multicast address, it declares itself a member of the multicast group, and any local gateway device then subscribes to the address on the node's behalf. When other nodes on the same network send to the multicast address, the IP multicast packet is encapsulated in a link-layer multicast frame. On Ethernet, the frame is addressed to an Ethernet multicast address. Some networks, such as ATM, use point-to-point circuits instead. On those networks the packet reaches the subscribers by other mechanisms, usually some kind of server that sends it to each subscriber. Multicast traffic from outside the local network is handled the same way, except that it is first delivered to the gateway device, which forwards it to the subscribing nodes.

1. Multicast address format

The format of an IPv6 multicast address differs from that of an IPv6 unicast address. A multicast address can only be used as a destination address; no packet uses a multicast address as its source address. The first byte of the address is all 1s, indicating a multicast address. The rest of the address consists of the following three fields:

Flag field: Four one-bit flags. Currently only the fourth (lowest-order) bit is defined. It indicates whether the address is a well-known multicast address permanently assigned by the Internet numbering authority, or a transient multicast address used for a particular purpose. If the bit is 0, the address is well-known; if it is 1, the address is transient. The other three flag bits are reserved for future use.

Scope field: 4 bits indicating the scope of the multicast group. The scope says whether the group includes only nodes on the local network, in the same site, or in the same organization, or nodes anywhere in the global IPv6 address space. The values defined in RFC 2373 are: 1 node-local, 2 link-local, 5 site-local, 8 organization-local, and E global. The values 0 and F are reserved, and the remaining values are unassigned.

Group ID field: 112 bits identifying the multicast group. The same group ID can denote different groups depending on whether the address is transient or well-known, and on its scope. A permanent multicast address uses an assigned group ID with a fixed meaning. Membership of the group depends on both the group ID and the scope.

All IPv6 multicast addresses begin with FF, because the first 8 bits are all 1. The other flag bits are currently undefined. A third hexadecimal digit of 0 therefore indicates a well-known address, and a third hexadecimal digit of 1 indicates a transient address. The fourth hexadecimal digit gives the scope, which may also be an unassigned or reserved value.
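The layout (8 bits of 1s, 4 flag bits, 4 scope bits, 112-bit group ID) can be decoded with a short Python sketch. The scope names follow the RFC 2373 values. The function name and the dictionary it returns are illustrative only.

```python
import ipaddress

# Scope values defined in RFC 2373; other values are reserved or unassigned.
SCOPES = {0x1: "node-local", 0x2: "link-local", 0x5: "site-local",
          0x8: "organization-local", 0xE: "global"}


def parse_multicast(group: str) -> dict:
    """Split an IPv6 multicast address into its flag, scope, and group ID fields."""
    value = int(ipaddress.IPv6Address(group))
    if value >> 120 != 0xFF:                  # first 8 bits must all be 1
        raise ValueError(f"{group} does not start with FF")
    flags = (value >> 116) & 0xF              # third hexadecimal digit
    scope = (value >> 112) & 0xF              # fourth hexadecimal digit
    return {
        "transient": bool(flags & 0x1),       # T flag: 0 = well-known, 1 = transient
        "scope": SCOPES.get(scope, f"reserved/unassigned ({scope:x})"),
        "group_id": value & ((1 << 112) - 1),
    }
```

For example, `ff02::1` decodes as a well-known, link-local group with group ID 1. `ff15::1234` decodes as a transient, site-local group.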

2. Multicast group

IPv4 already has multicast applications, because applications often need to send the same data to multiple nodes. Combining the assigned multicast addresses with multicast scopes gives them various meanings for use by different applications. Previously registered multicast addresses include groups of gateway devices, DHCP services, audio and video services, and network game services. For details, refer to RFC 2375.

Any-cast

In a sense, a multicast address is shared by multiple nodes: all members of a multicast group expect to receive every packet sent to the group address. For example, a gateway device connected to five different local Ethernet networks forwards a copy of a multicast packet to each network (assuming that at least one node on each network has joined the multicast group). An any-cast address is similar in that multiple nodes share one any-cast address. The difference is that only one of these nodes is expected to receive a packet sent to the any-cast address. Any-cast is useful for services that do not need a particular relationship between client and server, such as domain name servers and time servers. Any name server should give the same answer regardless of how far away it is, so the nearest one is the best choice; likewise, a nearby time server is preferable to a distant one. Therefore, when a host sends a request to an any-cast address, the server nearest to that host should respond.
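The contrast between the two delivery models can be sketched in a few lines. This is a simplified model under an explicit assumption: each member is tagged with a hypothetical routing distance from the sender, whereas in a real network the "nearest" choice is made implicitly by routing, not by a function like this. The node names are made up for illustration.

```python
from typing import Dict, List

def multicast_receivers(members: Dict[str, int]) -> List[str]:
    """Multicast: every member of the group receives its own copy."""
    return sorted(members)

def anycast_receiver(members: Dict[str, int]) -> str:
    """Any-cast: only the member with the smallest routing distance receives the packet."""
    return min(members, key=members.__getitem__)

# Hypothetical time servers sharing one address, mapped to their distance from the sender.
time_servers = {"ts-a": 4, "ts-b": 1, "ts-c": 7}
print(multicast_receivers(time_servers))  # all three servers
print(anycast_receiver(time_servers))     # only the nearest one
```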



1. Distribution and format of any-cast address

Any-cast addresses are allocated from the normal IPv6 unicast address space and use the unicast address formats. Because an any-cast address cannot be distinguished from a unicast address by its format, each node that is a member of an any-cast address must be explicitly configured to recognize that address as an any-cast address.

2. Any-cast routing

To determine how to route a packet sent to an any-cast address, find the longest address prefix common to all hosts that share that any-cast address; this prefix defines the region in which all any-cast nodes reside. For example, an ISP can require each of its customers to provide a time server, with all of these time servers sharing one any-cast address. The prefix that defines the any-cast region is then the prefix assigned to the ISP for redistribution. Routing within the region is determined by where the hosts sharing the any-cast address are located: inside the region, the any-cast address must have its own routing entry, which points to the network interfaces of all nodes that share the any-cast address. In the ISP example, the region is confined to a limited range. Any-cast hosts may, however, be dispersed across the global Internet; in that case, the any-cast address must be added to every routing table in the world.
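The region prefix described above can be computed as the longest prefix shared by all member addresses. The sketch below assumes, for simplicity, that the region is exactly that common prefix; `anycast_region` is a hypothetical helper, and the member addresses come from the 2001:db8::/32 documentation range.

```python
import ipaddress
from functools import reduce
from typing import Iterable

def anycast_region(members: Iterable[str]) -> ipaddress.IPv6Network:
    """Longest prefix shared by all members of an any-cast set (a simplified model)."""
    values = [int(ipaddress.IPv6Address(m)) for m in members]
    # OR together every bit that differs from the first member's address.
    differing = reduce(lambda acc, v: acc | (v ^ values[0]), values, 0)
    prefix_len = 128 - differing.bit_length()
    network = values[0] >> (128 - prefix_len) << (128 - prefix_len)  # clear host bits
    return ipaddress.IPv6Network((network, prefix_len))

print(anycast_region(["2001:db8:1::1", "2001:db8:2::1"]))
```

The wider apart the members are, the shorter the common prefix becomes; when members are scattered across unrelated prefixes, the region shrinks toward ::/0, which is the "every routing table in the world" case.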

IPv6 Extension Header

Extension Header

IPv6 uses a simplified base header without options. This suits most traffic flows, which carry no options, while extension headers improve the network's handling of the packets that do need options.

IPv6 defines the following extension headers:

Hop-by-Hop Options Header: This extension header must immediately follow the IPv6 header. It carries optional data that every node on the packet's path must examine. Apart from the padding options, only one such option is defined so far: the jumbo payload option. This option indicates that the packet's payload is longer than the 16-bit IPv6 payload length field can express. Whenever the payload (including the hop-by-hop options header) exceeds 65535 bytes, the packet must contain this option. If a node cannot forward the packet, it must return an ICMPv6 error message.



Routing header: This extension header lists the nodes that the packet must pass through on its way to the destination. The destination address in the IPv6 header is not the final destination of the packet but the first address listed in the routing header. When the node at that address receives the packet, it processes the IPv6 header and the routing header and then sends the packet on to the next address in the list; this repeats until the packet reaches its final destination.

Fragmentation header: This extension header contains a fragment offset value, a "more fragments" flag, and an identification (ID) field. The source node uses it to fragment a packet that is longer than the path MTU between the source and the destination.

Destination Options Header: This extension header carries options that are processed only by the final destination node. Currently, only the padding (fill) options are defined for it; they pad the header to a 64-bit boundary.

Authentication Header (AH): This extension header provides a mechanism for computing a cryptographic authentication value over parts of the IPv6 header, the extension headers, and the payload.

Encapsulation Security Payload (ESP) header: This is the last extension header, and it is not itself encrypted. It indicates that the rest of the payload is encrypted and provides enough information for an authorized destination node to decrypt it.

Usage of Extension Header

In IPv4, options are built into the standard IPv4 header, which complicates processing. The shortest IPv4 header is 20 bytes and the longest is 60 bytes; the extra bytes hold IPv4 options, which the gateway device must interpret in order to process the IP packet. This approach has two consequences. First, gateway devices handle packets with options on a slower processing path, which reduces forwarding efficiency. Second, because options degrade performance, application developers tend not to use them.
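The 20-to-60-byte range comes from the 4-bit IHL (Internet Header Length) field of the IPv4 header, which counts 32-bit words. The sketch below derives the header length and the number of option bytes a gateway device has to interpret from the first header byte (version and IHL); the function names are hypothetical.

```python
def ipv4_header_length(version_ihl: int) -> int:
    """Header length in bytes from the first IPv4 header byte (version + IHL)."""
    ihl = version_ihl & 0x0F  # IHL counts 32-bit words
    if ihl < 5:
        raise ValueError("IHL must be at least 5 (20 bytes)")
    return ihl * 4

def ipv4_option_bytes(version_ihl: int) -> int:
    """Bytes of options a gateway device must interpret beyond the fixed 20 bytes."""
    return ipv4_header_length(version_ihl) - 20

print(ipv4_header_length(0x45), ipv4_option_bytes(0x45))  # no options
print(ipv4_header_length(0x4F), ipv4_option_bytes(0x4F))  # maximum: 40 bytes of options
```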

IPv6 extension headers make options available without hurting performance. Developers can use options when needed without worrying that gateway devices will handle such packets differently, unless a routing header or a hop-by-hop options header is present. Even when these two headers are present, the gateway device can perform the necessary processing more easily than it can process IPv4 options.

Extension Header ID

Every IPv6 base header has the same fixed length (40 bytes) and nearly the same appearance; what determines how the rest of the packet is interpreted is the next header field. In an IPv6 packet without extension headers, this field identifies the upper-layer protocol: if the packet carries TCP, the 8-bit next header value is 6 (from RFC 1700); if it carries UDP, the value is 17. When extension headers are present, the next header field indicates whether another extension header follows and which type it is. In this way the headers form a chain, starting from the base IPv6 header and linking the extension headers one by one.
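The chaining described above can be followed programmatically. This is a minimal sketch that assumes well-formed input (no bounds checking) and starts from the bytes that follow the 40-byte base header. The header type values (0, 43, 44, 50, 51, 59, 60) are the standard IPv6 assignments; the AH length rule (length field in 4-byte units, plus 2) comes from the IPsec specification rather than from this manual.

```python
HOP_BY_HOP, ROUTING, FRAGMENT, ESP, AH, NO_NEXT_HEADER, DEST_OPTS = 0, 43, 44, 50, 51, 59, 60
OPTION_LIKE = {HOP_BY_HOP, ROUTING, DEST_OPTS}  # length = (hdr ext len + 1) * 8 bytes

def header_chain(first_next_header: int, data: bytes) -> list:
    """Follow next header values starting after the IPv6 base header.

    Returns the header types in order, ending with the first value that is not a
    walkable extension header: an upper-layer protocol (TCP = 6, UDP = 17, ...),
    ESP, or "no next header".
    """
    chain = []
    current, offset = first_next_header, 0
    while True:
        chain.append(current)
        if current in OPTION_LIKE:
            length = (data[offset + 1] + 1) * 8
        elif current == FRAGMENT:
            length = 8  # the fragment header has a fixed length
        elif current == AH:
            length = (data[offset + 1] + 2) * 4  # AH counts 4-byte units, minus 2
        else:
            return chain  # upper-layer protocol, ESP (encrypted), or no next header
        current = data[offset]  # every extension header starts with its next header field
        offset += length

# Hop-by-hop (8 bytes of PadN) -> destination options (8 bytes of PadN) -> TCP
sample = bytes([60, 0, 1, 4, 0, 0, 0, 0, 6, 0, 1, 4, 0, 0, 0, 0])
print(header_chain(HOP_BY_HOP, sample))
```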

Extension Header Order

An IPv6 packet can carry multiple extension headers. Each type should appear at most once, with a single exception described below, and the headers have a recommended order. RFC 1883 specifies that extension headers should appear in the following order:

1. IPv6 header

2. Hop-by-hop options header

3. Destination options header (for options processed by the first destination in the IPv6 destination address field and by the subsequent destinations listed in the routing header)

4. Routing header

5. Fragmentation header

6. Authentication Header (AH)

7. ESP header

8. Destination options header (when a routing header is used, for options processed only by the final destination of the packet)

9. Upper-layer header

As the order shows, the destination options header is the only extension header that can appear more than once in an IP packet, and only when the packet also contains a routing header. The order is a recommendation rather than an absolute rule, but some positions are mandatory. For example, when the rest of the packet needs to be encrypted, the ESP header must be the last extension header. Similarly, the hop-by-hop options header must come before all other extension headers, because every node that receives the IPv6 packet must process it.
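The recommended order can be expressed as a ranking and checked mechanically. This is a hypothetical sketch that treats the order as strict, which the text notes it is not, so a real IPv6 stack would be more lenient. The only subtle case is the destination options header: it takes position 3 when a routing header follows it, and position 8 otherwise.

```python
HOP_BY_HOP, ROUTING, FRAGMENT, ESP, AH, DEST_OPTS = 0, 43, 44, 50, 51, 60

# Position of each header in the recommended order (the IPv6 base header is 1).
RANK = {HOP_BY_HOP: 2, ROUTING: 4, FRAGMENT: 5, AH: 6, ESP: 7}
UPPER_LAYER_RANK = 9  # any other value is treated as the upper-layer header

def follows_recommended_order(chain: list) -> bool:
    """chain: next header values after the base header, ending with the upper-layer protocol."""
    previous = 1
    for i, header in enumerate(chain):
        if header == DEST_OPTS:
            # Before a routing header: options for the listed destinations (position 3);
            # otherwise: options for the final destination only (position 8).
            rank = 3 if ROUTING in chain[i + 1:] else 8
        else:
            rank = RANK.get(header, UPPER_LAYER_RANK)
        if rank <= previous:  # out of order, or a header type repeated
            return False
        previous = rank
    return True

print(follows_recommended_order([HOP_BY_HOP, DEST_OPTS, ROUTING, FRAGMENT, DEST_OPTS, 6]))
print(follows_recommended_order([ROUTING, HOP_BY_HOP, 6]))
```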

Setting up new options:

Extension headers are identified by the next header field of the IPv6 header. This field is 8 bits long, so it can take at most 256 different values. These values must identify not only the extension headers but also every upper-layer protocol that can be encapsulated in an IP packet, so many values are already assigned and few remain unassigned.

Some extension header protocol IDs in IPv6 are inherited from IPv4, such as those of the authentication header (51) and the ESP header (50). Many extension header types have been assigned so far, but new options can also be defined inside the hop-by-hop options header and the destination options header. Besides conserving next header values, defining options this way makes new functions easy to deploy. If a packet uses a new header type and the destination node supports that type, everything works. If the destination node does not recognize the new header type, however, it has to drop the packet. By contrast, all IPv6 nodes must support the hop-by-hop options header, the destination options header, and some basic options (see the next section). So when a destination node receives a packet whose destination options header contains an option it does not support, it can still respond in a defined way: the option itself can tell the node to skip it, or to drop the packet and return an ICMP error message indicating that the node does not understand the option.

Option extension header

The hop-by-hop options header and the destination options header can carry specific options. RFC 1883 defines two padding (fill) options, which keep fields aligned on their natural boundaries. For example, if an option consists of three 8-bit fields followed by a 32-bit field, one byte of padding is inserted so that the 32-bit field does not straddle a 32-bit boundary. Apart from the padding options, RFC 1883 defines only one functional option: the jumbo payload option, used in the hop-by-hop options header.

All option extension headers (the hop-by-hop options header and the destination options header) share a similar format. They have only two predefined fields: the next header field, which all IPv6 headers contain, and the header extension length field. The 8-bit header extension length field gives the length of the options header in 8-byte units, not counting the first 8 bytes; so if an options header is only 8 bytes long, the field value is 0. This limits an extension header to at most 2048 bytes ((255 + 1) × 8). The rest of the extension header holds the options it carries.
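The header extension length arithmetic works in both directions, as in this small sketch. Both helper names are hypothetical.

```python
def option_header_length(hdr_ext_len: int) -> int:
    """Total length in bytes of a hop-by-hop or destination options header."""
    if not 0 <= hdr_ext_len <= 255:
        raise ValueError("header extension length is an 8-bit field")
    return (hdr_ext_len + 1) * 8  # 8-byte units, not counting the first 8 bytes

def hdr_ext_len_for(total_length: int) -> int:
    """Field value to use for a header of total_length bytes."""
    if total_length % 8 or not 8 <= total_length <= 2048:
        raise ValueError("length must be a multiple of 8 between 8 and 2048")
    return total_length // 8 - 1

print(option_header_length(0), option_header_length(255), hdr_ext_len_for(16))
```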

Options

An IPv6 option contains the following three fields:

Option type: An 8-bit identifier of the option type. Even if the destination node does not recognize the option, the high-order 3 bits of this field tell it how to handle the option.

Option data length: An 8-bit unsigned integer giving the length of the option data field in bytes. The maximum value is 255.

Option data: The data specific to the option, up to 255 bytes long.

The two highest-order bits of the option type field specify the action the node must take if it does not recognize the option. The four possible values are:

00: Skip the option and continue processing the rest of the extension header.

01: Discard the whole packet.

10: Discard the packet and, regardless of whether the destination address is a multicast address, send an ICMP error message to the source address of the packet.

11: Discard the packet and, only if the destination address is a unicast or any-cast address (that is, not a multicast address), send an ICMP error message to the source address of the packet.

The third-highest bit of the option type indicates whether the option data can change en route from the source to the destination: 0 means the option data cannot change; 1 means it may change. Both the hop-by-hop options header and the destination options header support the same two padding options: Pad1 (filling option 1) and PadN (filling option N). Pad1 is special: it is a single byte set to 0, with no option data length or option data fields.
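The bit layout of the option type field can be decoded as follows. This is a minimal sketch: `OptionType` and `decode_option_type` are hypothetical names, and the sample values are Pad1 (0), PadN (1), and the jumbo payload option (194) discussed in this section.

```python
from typing import NamedTuple

class OptionType(NamedTuple):
    action: int        # high-order 2 bits: what to do if the option is unrecognized
    may_change: bool   # third bit: option data may change en route
    number: int        # remaining 5 bits

ACTIONS = {
    0b00: "skip the option",
    0b01: "discard the packet",
    0b10: "discard and send an ICMP error (even for multicast)",
    0b11: "discard and send an ICMP error only if not multicast",
}

def decode_option_type(value: int) -> OptionType:
    return OptionType(value >> 6, bool((value >> 5) & 1), value & 0x1F)

for name, value in (("Pad1", 0), ("PadN", 1), ("Jumbo Payload", 194)):
    t = decode_option_type(value)
    print(f"{name} ({value:#04x}): {ACTIONS[t.action]}, may change: {t.may_change}")
```

For the jumbo payload option (194 = binary 11000010), the action bits are 11, which is why a node that does not understand it discards the packet.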



PadN (filling option N) has option type 1, so its two high-order bits are 00 and a node that does not recognize it simply skips it. It fills the extension header with multiple bytes. If the extension header needs N bytes of padding, the option data length field is set to N-2 and the option data field contains N-2 bytes, all set to 0. Together with the 1-byte option type field and the 1-byte option data length field, this fills exactly N bytes.
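The padding rules can be turned into a small helper that chooses Pad1 or PadN and pads an options area so that the whole header is a multiple of 8 bytes. The helper names are hypothetical; the byte values follow the Pad1 and PadN formats described above.

```python
def padding(n: int) -> bytes:
    """Return n bytes of IPv6 option padding (Pad1 for 1 byte, PadN otherwise)."""
    if not 0 <= n <= 257:
        raise ValueError("padding length must be between 0 and 257 bytes")
    if n == 0:
        return b""
    if n == 1:
        return b"\x00"                       # Pad1: a single zero byte
    return bytes([1, n - 2]) + bytes(n - 2)  # PadN: type 1, length N-2, N-2 zero bytes

def pad_options(options: bytes) -> bytes:
    """Pad options so that the 2-byte header prefix plus options is a multiple of 8."""
    used = 2 + len(options)  # next header + hdr ext len + options
    return options + padding(-used % 8)

print(padding(4).hex())          # PadN filling 4 bytes
print(len(pad_options(b"")))     # an empty options area needs 6 bytes of padding
```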

Hop-by-hop Extension Header

Each node on the route from the source node to the destination node (that is, each gateway device that forwards the packet) examines the options in this header hop by hop. Apart from the padding options, only one hop-by-hop option, the jumbo payload option, is defined so far.

As in the other option extension headers, the first two fields are the next header field and the header extension length field. Here the whole header is only 8 bytes long, so the header extension length field is 0. The jumbo payload option starts at the third byte of the extension header: the third byte is the option type, whose value is 194, and the fourth byte (the option data length) is 4. The last field of the option is the 32-bit jumbo payload length, which gives the actual number of bytes in the IP packet, including the hop-by-hop options header but excluding the IPv6 header.

A node can send a jumbo IP packet with the jumbo payload option only if every gateway device along the path can process it. That is why the option is carried in the hop-by-hop options header, which every gateway device on the path must examine. The jumbo payload option allows the payload of an IPv6 packet to exceed 65535 bytes, reaching more than 4 billion bytes (the 32-bit length field allows at most 4294967295). When the option is used, the 16-bit payload length field in the IPv6 header must be 0, and the jumbo payload length in the extension header must be greater than 65535. If either condition is not met, the receiving node should send an ICMP error message to the source node to report the problem. There is one more restriction: the jumbo payload option cannot be used in a packet that contains a fragment header, because a jumbo packet cannot be fragmented.
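The byte layout described above can be packed with the standard `struct` module. In this sketch, `jumbo_hop_by_hop` is a hypothetical helper that builds only the 8-byte hop-by-hop options header; building the rest of the packet (with the base header's payload length set to 0) is left out.

```python
import struct

JUMBO_OPTION_TYPE = 194  # option type of the jumbo payload option

def jumbo_hop_by_hop(next_header: int, jumbo_length: int) -> bytes:
    """Build an 8-byte hop-by-hop options header carrying only the jumbo payload option.

    jumbo_length counts the bytes after the IPv6 header, including this 8-byte header.
    The IPv6 base header's payload length field must then be set to 0.
    """
    if not 65535 < jumbo_length <= 0xFFFFFFFF:
        raise ValueError("jumbo payload length must be greater than 65535 and fit in 32 bits")
    # next header, hdr ext len (0 = 8 bytes), option type, option data length (4), 32-bit length
    return struct.pack("!BBBBI", next_header, 0, JUMBO_OPTION_TYPE, 4, jumbo_length)

print(jumbo_hop_by_hop(6, 100000).hex())  # TCP follows; 100000-byte jumbo payload
```

Because the 8 bytes exactly fill one 8-byte unit, no padding option is needed.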

Routing Extension Header

The routing header replaces the source routing implemented in IPv4. Source routing lets the user specify the path of a packet, that is, the gateway devices it must pass on its way to the destination. IPv4 source routing is implemented as an IPv4 option, which limits how many intermediate gateway devices the user can specify: the IPv4 header can carry at most 40 bytes of options, which is room for fewer than ten 32-bit addresses (nine, once the option's own type, length, and pointer bytes are counted). In addition, every gateway device on the path must process the whole address list, whether or not it appears in the list, so source-routed packets are processed slowly.

IPv6 defines a generic routing extension header that includes two 1-byte fields: the routing type field and the segments left field. The routing type field indicates the type of routing header in use, while the segments left field indicates how many of the listed gateway devices the packet must still pass before it reaches its destination. The rest of the extension header is type-specific data, whose format depends on the routing header type. RFC 1883 defines one type, the type 0 routing header. (Type 0 was later deprecated by RFC 5095 because of security problems.)

The type 0 routing header solves the main problem of IPv4 source routing: only the gateway devices in the list process the routing header, and other gateway devices do not need to. The list can also hold many more gateway devices than the IPv4 option allows (the 8-bit segments left field can count up to 255 remaining addresses). The routing header is processed as follows:

The source node builds the list of gateway devices that the packet must pass and constructs a type 0 routing header. The header contains the list of gateway devices, the address of the final destination node, and the segments left field. The segments left value (an 8-bit integer) indicates how many listed nodes the packet must still visit before reaching the destination node.

When the source node sends the packet, it sets the destination address in the IPv6 header to the address of the first gateway device in the routing header list.

The packet is forwarded until it reaches the first hop of the path, that is, the destination address in the IPv6 header (the first gateway device in the routing header list). Only that gateway device examines the routing header; intermediate gateway devices along the way ignore it.

At the first hop and at every later hop, the gateway device checks the routing header to make sure that the segments left value is consistent with the address list. If segments left is 0, this node is the final destination of the packet, and it goes on to process the rest of the packet.

If the node is not the final destination of the packet, it takes its own address out of the destination address field of the IPv6 header and replaces it with the address of the next node in the routing header list. At the same time, it decrements the segments left field by 1 and then sends the packet to the next hop. The other nodes in the list repeat this process until the packet reaches its final destination.
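The per-hop processing above can be simulated in a few lines. This sketch follows the swap procedure of the IPv6 specifications (RFC 1883/RFC 2460): the first gateway is placed in the IPv6 destination field, the address list holds the remaining hops plus the final destination, and each hop swaps its own address into the list. The labels R1, R2, and D are hypothetical stand-ins for IPv6 addresses.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Packet:
    destination: str      # destination address field of the IPv6 header
    addresses: List[str]  # address list in the type 0 routing header
    segments_left: int    # segments left field of the routing header

def process_routing_header(pkt: Packet) -> Optional[str]:
    """Run at the node named in pkt.destination; return the next hop, or None if final."""
    if pkt.segments_left == 0:
        return None  # this node is the final destination
    if pkt.segments_left > len(pkt.addresses):
        raise ValueError("segments left is inconsistent with the address list")
    index = len(pkt.addresses) - pkt.segments_left  # next address still to visit
    # Swap: the next address becomes the destination; this node's address goes into the list.
    pkt.destination, pkt.addresses[index] = pkt.addresses[index], pkt.destination
    pkt.segments_left -= 1
    return pkt.destination

def deliver(pkt: Packet) -> List[str]:
    """Follow the packet hop by hop and return the nodes that process the routing header."""
    visited = [pkt.destination]
    while (next_hop := process_routing_header(pkt)) is not None:
        visited.append(next_hop)
    return visited

# Source wants the path R1 -> R2 -> D: R1 goes in the IPv6 destination field.
pkt = Packet(destination="R1", addresses=["R2", "D"], segments_left=2)
print(deliver(pkt))   # ['R1', 'R2', 'D']
print(pkt.addresses)  # ['R1', 'R2']: the list now records the route taken
```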

Fragment Extension Header

IPv6 allows only the source node to fragment a packet, which simplifies processing at intermediate nodes. In IPv4, by contrast, an intermediate node can fragment a packet that is longer than the local link allows. That approach forces gateway devices to do extra work, and a packet may be fragmented several times on its way. A packet must be fragmented when it is too large for a single transmission unit of the local link. For example, Ethernet allows an MTU of 1500 bytes; a 4000-byte IP packet cannot cross an Ethernet link unless it is split into three fragments, each smaller than 1500 bytes. Further along the path, some links may have a smaller MTU, such as 576 bytes, and the gateway device on such a link must fragment the 1500-byte fragments again into smaller pieces.

Because of fragmentation in IPv4, intermediate nodes and the destination node bear extra processing costs. With the path MTU discovery mechanism, the source node can determine the largest packet that can be carried on the path between the source node and the destination node, so intermediate gateway devices never need to fragment. RFC 1883 sets the minimum MTU to 576 bytes, but RFC 2460, which replaced RFC 1883, raises the required minimum MTU to 1280 bytes and recommends that links be configured to carry packets of at least 1500 bytes.

It follows that a source node can always send packets of up to 1280 bytes without worrying about fragmentation. A 1500-byte packet may also get through unfragmented, but IPv6 recommends that all nodes run path MTU discovery, and only the source node may fragment. That is, before sending packets, the source node checks the path to the destination node and determines the largest packet it can send without fragmentation. To send a packet larger than that maximum, the source node must fragment it. In IPv6, fragmentation happens only at the source node and is expressed with the fragment header, which contains the following fields:

Next header field: This 8-bit field is common to all IPv6 headers.

Reserved: This 8-bit field is currently unused and is set to 0.

Fragment offset field: Similar to the IPv4 fragment offset field. This 13-bit field, in units of 8 bytes, gives the position of the first data byte of this fragment relative to the first byte of the fragmentable part of the original packet. For example, a value of 175 means that the data in this fragment starts at byte offset 1400 (175 × 8) of the original packet's fragmentable part.

Reserved field: This 2-bit field is currently unused and is set to 0.

M flag: Indicates whether more fragments follow. A value of 1 means more fragments follow; a value of 0 means this is the last fragment.

ID field: Similar to the IPv4 identification field, but 32 bits long instead of 16. The source node assigns a 32-bit ID to each IPv6 packet it fragments; the ID distinguishes that packet from other fragmented packets recently sent (within the packet's lifetime) from the same source address to the same destination address.

Only part of an IPv6 packet can be fragmented. The fragmentable part consists of the payload and the extension headers that are processed only at the final destination. The IPv6 header and the extension headers that gateway devices must process on the way to the destination, such as the routing header and the hop-by-hop options header, cannot be fragmented.
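The fragment header fields and the source-side split can be sketched as below. The sketch assumes that only the fragmentable part is passed in and that every fragment except the last must carry a multiple of 8 bytes (implied by the 8-byte offset unit). The function names are hypothetical, and building complete packets (copying the unfragmentable headers into each fragment) is omitted.

```python
import struct
from typing import List, Tuple

def fragment_header(next_header: int, offset_bytes: int, more: bool, identification: int) -> bytes:
    """Pack an 8-byte IPv6 fragment header."""
    if offset_bytes % 8:
        raise ValueError("fragment offset must be a multiple of 8 bytes")
    offset_units = offset_bytes // 8
    if offset_units >= 1 << 13:
        raise ValueError("fragment offset does not fit in 13 bits")
    # 13-bit offset, then the 2-bit reserved field (0), then the 1-bit M flag
    offset_field = (offset_units << 3) | int(more)
    return struct.pack("!BBHI", next_header, 0, offset_field, identification)

def split_fragmentable(data: bytes, max_chunk: int) -> List[Tuple[int, bool, bytes]]:
    """Split the fragmentable part into (offset_bytes, more_fragments, chunk) tuples."""
    chunk = max_chunk - max_chunk % 8  # all fragments except the last: multiple of 8 bytes
    if chunk < 8:
        raise ValueError("max_chunk must be at least 8 bytes")
    return [(start, start + chunk < len(data), data[start:start + chunk])
            for start in range(0, len(data), chunk)]

# Minimum IPv6 MTU, minus the 40-byte IPv6 header and the 8-byte fragment header.
for offset, more, chunk in split_fragmentable(bytes(3000), 1280 - 40 - 8):
    print(offset // 8, int(more), len(chunk))  # offset field value, M flag, data length
```

With a 1280-byte path MTU and no other unfragmentable extension headers, each fragment can carry 1280 - 40 - 8 = 1232 bytes of fragmentable data, which is already a multiple of 8.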

Destination Extension Header

Like the hop-by-hop options header, the destination options header provides a mechanism for carrying optional information in an IPv6 packet. The other extension headers, such as the fragment header, the authentication header, and the ESP header, were each defined for a specific purpose, whereas the destination options header is intended for new options aimed at the destination node. Destination options use the option format described earlier.



GRE Technology

This chapter describes the principle and implementation of the GRE protocol.

Main contents:

Terms

Introduction to the Protocol

Typical Application

Terms

VPN: Virtual Private Network. VPN technology connects two or more network sites through the Internet, and the sites operate as if they were all part of a single private network.

GRE: Generic Routing Encapsulation

Tunnel: A tunnel encapsulates packets of one protocol inside packets of another protocol, so that the encapsulated packets can pass through a network that runs the other protocol.

Introduction to the Protocol

Main contents:

The location of GRE in the TCP/IP protocol stack

Structure of the GRE packet

Work flow of the GRE

Advantage and disadvantage of GRE

GRE creates a tunnel between a source end and a destination end. Packets that are to pass through the tunnel are encapsulated with a new header (the GRE header) and then with a delivery header addressed to the tunnel destination, and are put into the tunnel. When the packets reach the tunnel destination, the GRE header is stripped and the packets are forwarded based on the destination address of the original packet. A GRE tunnel is usually point-to-point. GRE can also carry sequence numbers so that packets can be kept in order. Because of the extra encapsulation and de-capsulation processing, a GRE tunnel may cause performance problems.

Location of GRE in the TCP/IP Protocol Stack

GRE packets are carried inside IP packets, so GRE runs over the IP layer. The protocol ID in the IP header is 47.

Structure of the GRE Packet

A packet passing through a GRE tunnel consists of three parts:

Payload packet: The network layer packet (such as an IP packet) as it was before entering the tunnel; it serves as the payload of the tunnel packet. Its protocol is called the GRE tunnel passenger protocol.

GRE header: Added when the payload packet enters the tunnel; it contains information about GRE and the passenger protocol.

Delivery header: The header of the outer protocol (such as an IP header), that is, the protocol of the network through which the tunnel runs. It is what allows a packet of one protocol to pass through a network of another protocol.

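The GRE header is structured as follows (bit 0 is the most significant bit of the first byte):

Bit 0: C (checksum present); bit 1: reserved; bit 2: K (key present); bit 3: S (sequence number present); bits 4 to 12: Reserved 0; bits 13 to 15: Ver; bits 16 to 31: Protocol Type.

Optional fields, present only when the corresponding flag is set and in this order: Checksum (16 bits) followed by Reserved 1 (16 bits); Key (32 bits); Sequence Number (32 bits).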



The simplest GRE header is four bytes long: when the C, K, and S flag bits are all 0, the GRE header contains only bits 0 to 31.

Checksum flag bit

Bit 0 is the checksum flag bit. The checksum field is present and valid only when this bit is set to 1.

Key flag bit

Bit 2 is the key flag bit. The key field is present and valid only when this bit is set to 1.

Sequence number flag bit

Bit 3 is the sequence number flag bit. The sequence number field is present and valid only when this bit is set to 1.

Reserved 0 and Ver field

These fields are not used and must be set to 0.

Protocol type field

The protocol type field indicates the type of the payload packet. Its values are generally the same as the Ethernet frame type values; for example, the protocol type of an IP packet is 0x0800.

Checksum field

The checksum field carries the checksum, which covers both the GRE header and the payload packet.

Key field

The key field carries the tunnel key. For the tunnel to work, the same key must be configured at both ends of the tunnel, or no key at either end.

Sequence field

The sequence field carries the sequence number of the packet. If the sequence number flag bit is set, packets passing through the tunnel carry sequence numbers. The sequence number starts from 0 and increases by 1 for each packet sent. The receiving end records the sequence number of each packet it receives and discards any packet whose sequence number is invalid.
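The sender and receiver behavior described above can be sketched as follows. The text does not say exactly what makes a sequence number invalid, so this sketch assumes a common rule: a packet is accepted only if its number is ahead of the last accepted one by less than 2**31, compared modulo 2**32 so the check survives wrap-around of the 32-bit counter. `SequenceChecker` and `next_sequence` are hypothetical names, not switch commands.

```python
from typing import Optional

class SequenceChecker:
    """Receiving end of a GRE tunnel with sequence numbers enabled (a sketch)."""

    def __init__(self) -> None:
        self.last: Optional[int] = None  # sequence number of the last accepted packet

    def accept(self, seq: int) -> bool:
        # Assumed rule: "newer" means ahead of the last number by less than 2**31, modulo 2**32.
        if self.last is None or 0 < (seq - self.last) % 2**32 < 2**31:
            self.last = seq
            return True
        return False  # duplicate or out-of-order packet: discard

def next_sequence(seq: int) -> int:
    """Sending end: numbers start at 0 and increase by 1 per packet, wrapping at 32 bits."""
    return (seq + 1) % 2**32

checker = SequenceChecker()
print([checker.accept(s) for s in [0, 1, 3, 2, 3, 4]])  # 2 and the repeated 3 are discarded
```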

Whether the checksum, sequence number, and key fields are used is controlled by the tunnel checksum, tunnel sequence-datagrams, and tunnel key commands, respectively.
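To tie the field descriptions together, here is a sketch that builds a GRE header plus payload. It assumes the RFC 2890 layout summarized above and the standard Internet (one's-complement) checksum, computed with the checksum field set to zero. The optional arguments loosely mirror what the tunnel checksum, tunnel key, and tunnel sequence-datagrams commands enable; `build_gre` is a hypothetical helper, not part of any switch software.

```python
import struct
from typing import Optional

def internet_checksum(data: bytes) -> int:
    """One's-complement checksum over 16-bit words (odd-length data is zero-padded)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_gre(payload: bytes, protocol_type: int = 0x0800, key: Optional[int] = None,
              seq: Optional[int] = None, checksum: bool = False) -> bytes:
    """Return GRE header + payload; optional fields appear only when requested."""
    flags = 0
    if checksum:
        flags |= 0x8000  # C bit (bit 0)
    if key is not None:
        flags |= 0x2000  # K bit (bit 2)
    if seq is not None:
        flags |= 0x1000  # S bit (bit 3)
    header = struct.pack("!HH", flags, protocol_type)  # Reserved 0 and Ver stay 0
    if checksum:
        header += bytes(4)  # Checksum (zero while computing) + Reserved 1
    if key is not None:
        header += struct.pack("!I", key)
    if seq is not None:
        header += struct.pack("!I", seq)
    packet = header + payload
    if checksum:
        # The checksum covers the GRE header and the payload packet.
        packet = packet[:4] + struct.pack("!H", internet_checksum(packet)) + packet[6:]
    return packet

print(build_gre(b"").hex())                  # simplest header: flags 0, protocol 0x0800
print(build_gre(b"", key=100, seq=0).hex())  # 12-byte header with key and sequence number
```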

The following example shows the structure of a GRE packet. In the dump below, vertical bars separate the three parts: the new IP header (first 20 bytes), the GRE header (next 4 bytes), and the rest, which is the real IP packet carried as data.

45 00 05 f4 8f e3 00 00 7f 2f fd 85 c0 a8 01 02 c0 a8 01 01 | 00 00 08 00 | 45 00 05 dc 72 3f

05f4 is the total length of the new IP packet (1524 bytes).

2f is the protocol carried in the IP packet: GRE (47).

c0a80102 and c0a80101 are the source and destination addresses of the new IP packet (192.168.1.2 and 192.168.1.1, the source and destination of the tunnel).

0000 0800 is the GRE header: all flag bits are 0, so the GRE packet carries no checksum, key, or sequence number, and the passenger protocol is IP (0x0800).
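The field-by-field reading above can be reproduced in code. This sketch parses the sample bytes from the dump; it assumes a 4-byte GRE header (no optional fields, as in the sample), and `parse_gre_over_ipv4` is a hypothetical name.

```python
import ipaddress
import struct

# The sample packet from the text: delivery header | GRE header | start of the passenger packet.
SAMPLE = bytes.fromhex(
    "450005f48fe300007f2ffd85c0a80102c0a80101"
    "00000800"
    "450005dc723f"
)

def parse_gre_over_ipv4(packet: bytes) -> dict:
    ihl = (packet[0] & 0x0F) * 4  # delivery header length in bytes
    total_length, = struct.unpack("!H", packet[2:4])
    flags_ver, protocol_type = struct.unpack("!HH", packet[ihl:ihl + 4])
    inner = packet[ihl + 4:]      # passenger packet (assumes no optional GRE fields)
    return {
        "total_length": total_length,
        "protocol": packet[9],
        "source": str(ipaddress.IPv4Address(packet[12:16])),
        "destination": str(ipaddress.IPv4Address(packet[16:20])),
        "checksum_present": bool(flags_ver & 0x8000),
        "key_present": bool(flags_ver & 0x2000),
        "sequence_present": bool(flags_ver & 0x1000),
        "protocol_type": protocol_type,
        "inner_version": inner[0] >> 4,
        "inner_total_length": struct.unpack("!H", inner[2:4])[0],
    }

for field, value in parse_gre_over_ipv4(SAMPLE).items():
    print(f"{field}: {value}")
```

The passenger packet's own total length field reads 1500, which matches 1524 - 20 - 4: the delivery header and the GRE header add 24 bytes.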

Work Flow of the GRE

Packets are encapsulated at the source end of the GRE tunnel and de-capsulated at the destination end. Between the two ends, they are forwarded as ordinary packets.

Packet receiving: If a packet is addressed to the router itself, it is passed to the upper-layer protocol for processing. If the protocol is GRE (47), the router looks up the corresponding tunnel interface and processes the GRE header. After a series of checks, it strips the outer IP header, sets the recvif field of the mbuf to the local tunnel interface, and finally places the packet in the IP input queue.

Packet sending: When a packet is sent to the tunnel interface, a GRE header is added according to the interface configuration, followed by an IP header whose source and destination addresses are those specified for the tunnel. The packet is then routed according to the tunnel destination address and sent out of the actual physical interface.



Take the preceding figure as an example to describe how GRE works.

A GRE tunnel (Tunnel1) is created between switch2 and switch4. The endpoints of Tunnel1 are 12.1.1.1 and 21.1.1.1, respectively. A static route is configured on switch2 so that the 31.0.0.0 network is reachable through Tunnel1.

Switch1 sends a packet to destination address 31.1.1.1 out of its interface 11.1.1.1, so the source and destination addresses of the IP packet are 11.1.1.1 and 31.1.1.1.

When the packet reaches switch2, switch2 routes it. Because of the static route, switch2 decides to forward the packet through the tunnel, and the packet is encapsulated.

Encapsulation

Here, the packet to be forwarded is the payload packet (an IP packet). The tunnel adds a GRE header in front of it and sets the protocol type of the GRE header to 0800 (the IP protocol type). It then adds an IP header (the delivery header) in front of the GRE header, with the protocol field set to 47 (the GRE protocol ID), the destination address set to 21.1.1.1 (the destination of Tunnel1), and the source address set to 12.1.1.1. The packet is then routed according to 21.1.1.1. After encapsulation is complete, the packet is sent out of interface 12.1.1.1.

Forwarding

When switch3 receives the packet, it passes it to the IP layer for routing. The IP header that switch3 examines is the delivery header; the payload packet is encapsulated, so switch3 cannot see its IP header. Switch3 therefore forwards the packet according to the destination address of the delivery header, 21.1.1.1.

This continues until the packet reaches switch4, the destination of the tunnel.

De-capsulation

When switch4 receives the packet, it also examines the delivery header. Because the destination address 21.1.1.1 is its own address, it checks the protocol field of the IP packet. The protocol field is 47, so the packet is handed to the GRE tunnel. The tunnel removes the delivery header and then checks the protocol type in the GRE header. Because the protocol type is 0800, the tunnel passes the payload packet to the IP layer, which completes de-capsulation.

Switch4 then routes the payload packet according to its destination address, 31.1.1.1. The packet is sent through interface 21.1.1.1 and reaches its actual destination, switch5.
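The encapsulation on switch2 and the de-capsulation on switch4 can be sketched as a round trip using the addresses from this example. The sketch assumes a plain 4-byte GRE header with no optional fields and an outer IPv4 header without options. The function names are hypothetical, and a real switch performs these steps in its forwarding path rather than in Python.

```python
import ipaddress
import struct

GRE_PROTOCOL = 47
ETHERTYPE_IPV4 = 0x0800

def checksum16(data: bytes) -> int:
    """One's-complement checksum of an even-length byte string (IPv4 header checksum)."""
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def ipv4_header(src: str, dst: str, protocol: int, payload_len: int, ttl: int = 64) -> bytes:
    header = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + payload_len, 0, 0, ttl, protocol, 0,
                         ipaddress.IPv4Address(src).packed, ipaddress.IPv4Address(dst).packed)
    return header[:10] + struct.pack("!H", checksum16(header)) + header[12:]

def gre_encapsulate(payload: bytes, tunnel_src: str, tunnel_dst: str) -> bytes:
    """switch2: add the GRE header, then the delivery header addressed to the tunnel end."""
    gre = struct.pack("!HH", 0, ETHERTYPE_IPV4)  # no C/K/S flags, passenger protocol IP
    return ipv4_header(tunnel_src, tunnel_dst, GRE_PROTOCOL, len(gre) + len(payload)) + gre + payload

def gre_decapsulate(packet: bytes, local_addr: str) -> bytes:
    """switch4: check the delivery header, remove it and the GRE header, return the payload."""
    ihl = (packet[0] & 0x0F) * 4
    if str(ipaddress.IPv4Address(packet[16:20])) != local_addr or packet[9] != GRE_PROTOCOL:
        raise ValueError("not a GRE packet for this tunnel endpoint")
    flags, proto = struct.unpack("!HH", packet[ihl:ihl + 4])
    if flags != 0 or proto != ETHERTYPE_IPV4:
        raise ValueError("GRE header not supported by this sketch")
    return packet[ihl + 4:]

payload = ipv4_header("11.1.1.1", "31.1.1.1", 17, 0)    # header of the packet from switch1
outer = gre_encapsulate(payload, "12.1.1.1", "21.1.1.1")  # what switch3 forwards
print(len(outer), gre_decapsulate(outer, "21.1.1.1") == payload)
```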

Advantage and Disadvantage of GRE

GRE tunnels are simple to configure. A tunnel can be created over multiple types of physical lines (such as PPP and Frame Relay), and it isolates the host network environment from the VPN routing environment.

The disadvantage of GRE is its high management cost when the number of tunnels is large. GRE tunnels are configured manually, so the cost of configuring and maintaining them grows with the number of tunnels, and whenever a tunnel endpoint changes, the tunnel must be reconfigured.



Typical Application

GRE tunnel technology can meet the requirements of both extranet VPNs and intranet VPNs.



Transition Technology

Main contents:

Introduction to the transition technology

Tunnel technology

Introduction to the Transition Technology

With the rapid growth of the Internet, IPv4 addresses are in short supply. Techniques such as temporary IPv4 addresses and Network Address Translation (NAT) relieve the shortage. However, they add address translation and processing overhead, which can break some upper-layer applications, and they do not solve the underlying problem that IPv4 addresses will run out. The IPv6 protocol, with its 128-bit addresses, solves the shortage of IPv4 addresses and also significantly improves address capacity, security, network management, mobility, and QoS. IPv6 is the core standard of the next-generation Internet protocol. IPv6 is not compatible with IPv4, but it is compatible with all the other protocols in the TCP/IP suite, so IPv6 can completely replace IPv4.

The migration from IPv4 networks to IPv6 networks cannot be completed overnight, so the two types of networks will inevitably coexist for some time. Therefore, the transition from IPv4 to IPv6 and seamless, effective interconnection between the two were taken into account when IPv6 was designed. Multiple transition technologies and interconnection solutions have emerged, each with its own features for solving communication problems in different transition stages and environments. Among them, the basic transition technologies are dual protocol stack, tunneling, and NAT-PT.



Tunnel Technology

Tunnel technology provides a way to use the existing IPv4 routing architecture to transfer IPv6 data: IPv6 packets are treated as unstructured, opaque data, encapsulated in IPv4 packets, and transferred across the IPv4 network. By creation mode, tunnels are either manually configured or automatically configured. Tunnel technology makes use of the existing IPv4 network and lets IPv6 nodes communicate during the transition, but it cannot solve the interconnection problem between IPv6 nodes and IPv4 nodes.

Widely used tunnel types include manually configured tunnels, automatically configured tunnels, 6to4, 6over4, and ISATAP.

1. Manually configured tunnel

The tunnel is configured manually, and the tunnel end point address is determined by that configuration. Nodes do not need special IPv6 addresses. This type of tunnel suits IPv6 nodes that communicate frequently. The encapsulating node of each tunnel must store the address of the tunnel end point. When IPv6 packets are transmitted over the tunnel, the end point address is used as the destination address of the encapsulating IPv4 packets. The encapsulating node decides, based on routing information, whether to forward packets through the tunnel. Interconnected nodes using manually configured tunnels must have IPv4 connectivity and at least one unique IPv4 address. Each node must support IPv6, and the routers must support the dual protocol stack. The mechanism does not work if the tunnel passes through NAT devices.
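The encapsulation step can be sketched in Python as follows. The sketch assumes the standard IPv6-over-IPv4 encapsulation of RFC 4213, in which the outer IPv4 header carries protocol number 41; the function names and sample addresses are hypothetical. As described above, the IPv6 packet is treated as opaque bytes and the configured tunnel end point becomes the IPv4 destination address.

```python
import ipaddress
import struct

IPPROTO_IPV6 = 41  # IPv4 protocol number for an encapsulated IPv6 packet

def ipv4_checksum(header: bytes) -> int:
    """Internet checksum: one's-complement sum of the header's 16-bit words."""
    total = 0
    for (word,) in struct.iter_unpack("!H", header):
        total += word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def encapsulate(ipv6_packet: bytes, tunnel_src: str, tunnel_dst: str, ttl: int = 64) -> bytes:
    """Prepend a 20-byte IPv4 header addressed to the configured tunnel end point."""
    src = ipaddress.IPv4Address(tunnel_src).packed
    dst = ipaddress.IPv4Address(tunnel_dst).packed
    header = struct.pack("!BBHHHBBH4s4s",
                         0x45, 0, 20 + len(ipv6_packet),  # version/IHL, TOS, total length
                         0, 0,                            # identification, flags/fragment offset
                         ttl, IPPROTO_IPV6, 0,            # TTL, protocol 41, checksum placeholder
                         src, dst)
    checksum = ipv4_checksum(header)
    header = header[:10] + struct.pack("!H", checksum) + header[12:]
    return header + ipv6_packet

# Hypothetical 40-byte IPv6 packet sent through a tunnel from 10.0.0.1 to 192.0.2.1.
packet = encapsulate(b"\x60" + bytes(39), "10.0.0.1", "192.0.2.1")
print(len(packet), packet[9])  # 60 41
```

The receiving end point reverses this step: it sees protocol 41, strips the 20-byte IPv4 header, and passes the remaining bytes to its IPv6 stack.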

Typical application of the manually configured tunnel:

The manually configured tunnel is suited to networks whose topology rarely changes, and it supports the transition from IPv4 to IPv6. For detailed configuration of the manually configured tunnel, refer to the Configuration of Transition Technology.

2. 6to4 tunnel

6to4 requires the use of special IPv6 addresses with the prefix 2002:IPv4ADDR::/48, derived from the node's IPv4 address. Therefore, a node using the 6to4 mechanism must have at least one unique IPv4 address. Because the IPv4 address of the tunnel end point can be retrieved from the IPv6 address, the tunnel is created automatically. The mechanism is applicable to the interconnection of nodes running IPv6. 6to4 requires the routers at the tunnel end points to support the dual protocol stack and 6to4, and the hosts must support the IPv6 protocol stack. Communication between a 6to4 node and a native IPv6 node goes through a 6to4 relay router running BGP4+. This mechanism treats the IPv4 WAN as a unicast point-to-point link layer. It is suitable as a transition tool in the early stage of IPv4/IPv6 coexistence.
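The address derivation can be illustrated with Python's standard ipaddress module. The 2002::/16 prefix and the 2002:V4ADDR::/48 layout come from RFC 3056; the sample addresses and the helper name are hypothetical. IPv6Address.sixtofour is an existing attribute of that module that extracts the embedded IPv4 address, which is how a 6to4 router can recover the tunnel end point without any per-tunnel configuration.

```python
import ipaddress

SIXTOFOUR_PREFIX = 0x2002  # first 16 bits of every 6to4 address (2002::/16)

def sixtofour_prefix(ipv4: str) -> ipaddress.IPv6Network:
    """Build the 2002:V4ADDR::/48 prefix owned by a site with this IPv4 address."""
    v4 = int(ipaddress.IPv4Address(ipv4))
    # 2002 occupies bits 127..112, the IPv4 address bits 111..80.
    prefix_int = (SIXTOFOUR_PREFIX << 112) | (v4 << 80)
    return ipaddress.IPv6Network((prefix_int, 48))

site = sixtofour_prefix("192.0.2.1")
print(site)  # 2002:c000:201::/48

# The tunnel end point is recovered from any destination inside that prefix.
dst = ipaddress.IPv6Address("2002:c000:201:1::10")
print(dst.sixtofour)  # 192.0.2.1
```

For a destination outside 2002::/16, sixtofour is None, so such traffic must instead go to a 6to4 relay router.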

The typical application of 6to4 is illustrated as follows:

For the configuration of 6to4 tunnel, refer to the Configuration Manual of

Transition Technology.



SLA Technology

This chapter describes the principles of SLA and how SLA is implemented.

Main contents:

SLA terms

Introduction to SLA

Debug commands and debug information

Introduction to SLA

SLA Terms

SLA: Service Level Agreement; detects and monitors network communication by sending packets of a specified protocol.

RTR: Response Time Reporter; SLA calculates and outputs reports based on packet transmission, so it is also called RTR (Response Time Reporter).

RTR ENTITY: RTR entity is a generic concept; each kind of detection application corresponds to a specific type of RTR entity. Currently, the RTR entities include MACSLA, ICMPECHO, JITTER, UDPECHO, ICMP-PATH-ECHO, ICMP-PATH-JITTER, and FLOW-STATISTICS.

ICMPECHO: the RTR entity that sends ICMP ping packets to detect network communication. The detection outputs the packet round-trip delay, packet loss, and so on.

JITTER: the RTR entity that simulates a VoIP codec and periodically sends simulated VoIP packets to detect how well the network transmits VoIP packets. The detection outputs the packet round-trip delay, uni-directional delay, jitter, MOS value, and so on.



ICMP-PATH-ECHO: the RTR entity that periodically sends ICMP ping packets to detect network communication. The detection outputs the round-trip delay and packet loss of packets from the source to the destination.

ICMP-PATH-JITTER: the RTR entity that periodically sends ICMP ping packets to detect network communication. The detection outputs the round-trip delay, packet loss, and jitter of packets from the source to the destination.

FLOW-STATISTICS: the RTR entity that periodically detects the traffic of an interface. The detection records the peak interface traffic and the detection history.

UDPECHO: the RTR entity that periodically sends UDP packets to detect UDP communication in the network. The detection outputs the round-trip delay and packet loss of the packets (the data detection packets, not the connection packets).

RTR GROUP: An RTR group is a set of one or more RTR entities. An RTR group consists only of individual RTR entities; a group cannot be a member of another group. An RTR entity can belong to multiple RTR groups, but it can appear in a given group only once.

RTR SCHEDULE: schedules an RTR entity or RTR group to detect network communication.

VOIP JITTER: indicates the variation in the transmission delay of VoIP packets.

CODEC: used to encode and decode VoIP signals.

MOS: an index of the transmission quality of VoIP packets.

ICPIF (Calculated Planning Impairment Factor): quantifies the impairment of VoIP quality during transmission.

PCM: Pulse Code Modulation.

Introduction to SLA

Many factors can affect the normal operation of a network, such as the complexity of the network environment, configuration mistakes by the administrator, failures of network devices, and even force majeure. Therefore, regularly detecting network communication and recording the results, both while building and while running a network, is important for solving problems when the network fails. SLA was developed for this purpose as a network detection and monitoring tool. Its basic principle is to use different kinds of RTR entities to represent different kinds of network detection, and to schedule those entities to perform the detection. With its rich scheduling policies, SLA can track and monitor network communication in detail.

RTR Entity

RTR entity is a generic concept that is not tied to any specific entity type. Currently, the RTR entity types in the system include the MACSLA entity, used to detect L2 connectivity; the ICMPECHO, ICMP-PATH-ECHO, ICMP-PATH-JITTER, and UDPECHO entities, used to detect network communication; the JITTER entity, used to detect the transmission of VoIP packets in the network; and the FLOW-STATISTICS entity, used to detect interface traffic.

Detection history records are saved locally, which makes it convenient for the network administrator to view information and troubleshoot faults.

ICMPECHO Entity

The ICMPECHO entity is used to detect the basic communication of the network. It sends ICMP ping packets to a destination address in the network to detect the transmission delay and packet loss of packets from the source to the destination.

Common network devices all support ping, so this entity is effective for detecting basic network communication. With the rich scheduling policies and log recording function, the network administrator can learn about network communication and its history without repeatedly entering ping commands manually.

ICMP-PATH-ECHO Entity

The ICMP-PATH-ECHO entity is used to detect the basic communication of the network. It periodically sends ICMP ping packets to a destination address in the network to obtain the packet transmission delay and packet loss from the detection end to the destination end, as well as the delay and packet loss between the detection end and each intermediate device along the path.

With the rich scheduling policies and history recording function, the network administrator can learn about network communication (for example, which device on the path has serious delay) and its history.

ICMP-PATH-JITTER Entity

The ICMP-PATH-JITTER entity is used to detect the basic communication of the network. It periodically sends ICMP ping packets to a destination address in the network to obtain the packet transmission delay, jitter, and packet loss from the detection end to the destination end, as well as the delay, jitter, and packet loss between the detection end and each intermediate device along the path.

With the rich scheduling policies and history recording function, the network administrator can learn about network communication (for example, which device on the path has serious delay) and its history.

JITTER Entity

Introduction to VoIP and the related communication detection standards

VoIP is short for Voice over IP. It converts voice or fax into data, which then shares an IP network (such as the Internet) with other data for transmission. Because transmitting voice and fax over the Internet costs little, the technology is widely used. Compared with the traditional telephone, VoIP uses a voice coding scheme to digitize the analog voice and packetize it, and then uses the best-effort IP transmission mechanism to deliver it to the receiving end over the IP network. After collecting the packets, the receiving end decodes them to recover the analog voice. This shows that many factors affect VoIP transmission quality on the Internet: packet delay and packet loss caused by the network, the cost of converting between analog voice and data in the codec, compression/decompression cost, echo, processing delay, and so on. Transmitting voice over an IP network therefore involves many factors that differ from both the traditional telephone network and traditional data networks, and these factors limit VoIP quality.

Therefore, related standards are needed to measure VoIP transmission quality. Because VoIP quality is perceived by the listener, ITU-T P.800 defines a subjective method for measuring VoIP quality, the MOS (Mean Opinion Score). Based on subjective evaluation, listeners' perception of VoIP quality is studied and quantified, and the MOS given to a particular level of VoIP quality depends on human judgment. The mapping between VoIP quality and MOS provides the foundation for network configuration, standards, and monitoring.

MOS is divided into five levels (1 to 5) according to VoIP transmission quality. Level 5 indicates the best VoIP quality and level 1 the poorest; in this way, VoIP quality standards are quantified. Usually, a MOS above 3.6 is regarded as good VoIP quality. The MOS scoring method is hard to apply in practice, because it is difficult to gather many people to evaluate VoIP quality, so many other methods have been developed. However, the result of any measuring method ultimately needs to be converted to MOS to measure VoIP quality.

Another well-known standard is ICPIF (Calculated Planning Impairment Factor), which quantifies the main impairments to VoIP quality. The ICPIF value is the sum of the impairment factors (total impairment, or Itot) minus the user's expectation factor (also called the access advantage factor, which indicates how much degradation of VoIP quality a user tolerates because of the network access method). The formula is:

$$I_{cpif} = I_o + I_q + I_{dte} + I_{dd} + I_e - A$$

Note

Here, Io is the impairment caused by a non-optimal loudness rating; Iq is the impairment caused by PCM quantization distortion; Idte is the impairment caused by talker echo; Idd is the impairment caused by the uni-directional transmission time; Ie is the impairment caused by device factors, such as codec type and packet loss; A is the access advantage factor, also called the user expectation factor.

ICPIF values typically range from 5 to 55. An ICPIF value of 5 or less indicates low impairment and the best VoIP quality; a value of 55 or more indicates high impairment and the poorest VoIP quality. An ICPIF value below 20 is regarded as acceptable. (Since 2001, ITU-T no longer recommends ICPIF and has replaced it with the E-model. However, communication quality is currently still measured according to ICPIF here.)

As mentioned previously, every measuring standard, including ICPIF, ultimately needs to correspond to MOS. The relation between ICPIF and MOS is as follows:

ICPIF range    MOS score
0 - 3          5
4 - 13         4
14 - 23        3
24 - 33        2
34 - 43        1

Currently, the common codecs used in VoIP network transmission include:

G.711 A-law (g711alaw: 64 kbps PCM encoding)

G.711 mu-law (g711ulaw: 64 kbps PCM encoding)

G.729A (g729a: 8 kbps CS-ACELP compression)

The main transmission parameters are as follows:

Codec                      Default packet length   Default interval between packets   Default packet quantity   Default sending frequency
G.711 mu-law (g711ulaw)    160 + 12 RTP bytes      20 ms                              1000                      Once every 1 minute
G.711 A-law (g711alaw)     160 + 12 RTP bytes      20 ms                              1000                      Once every 1 minute
G.729A (g729a)             20 + 12 RTP bytes       20 ms                              1000                      Once every 1 minute
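The table values can be cross-checked with a little arithmetic: a 160-byte payload every 20 ms is 64 kbps, matching G.711, and a 20-byte payload every 20 ms is 8 kbps, matching G.729A. The sketch below assumes that the first number in "160 + 12 RTP bytes" is the codec payload and the 12 bytes are the RTP header; UDP and IP headers are not counted. The class and function names are hypothetical.

```python
from dataclasses import dataclass

RTP_HEADER_BYTES = 12

@dataclass(frozen=True)
class CodecProfile:
    name: str
    payload_bytes: int   # codec payload per packet, excluding the RTP header
    interval_ms: int     # default interval between packets
    packets: int         # default number of packets per run

PROFILES = [
    CodecProfile("g711ulaw", 160, 20, 1000),
    CodecProfile("g711alaw", 160, 20, 1000),
    CodecProfile("g729a", 20, 20, 1000),
]

def payload_kbps(p: CodecProfile) -> float:
    """Codec bit rate implied by payload size and interval (bits per ms = kbit/s)."""
    return p.payload_bytes * 8 / p.interval_ms

def rtp_kbps(p: CodecProfile) -> float:
    """Bit rate including the 12-byte RTP header (UDP/IP headers not counted)."""
    return (p.payload_bytes + RTP_HEADER_BYTES) * 8 / p.interval_ms

def run_duration_s(p: CodecProfile) -> float:
    """Approximate time needed to send one run of packets at the default interval."""
    return p.packets * p.interval_ms / 1000

for p in PROFILES:
    print(p.name, payload_kbps(p), rtp_kbps(p), run_duration_s(p))
```

With these defaults, one run of 1000 packets takes roughly 20 s, which fits within the default sending frequency of once per minute.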

Test procedure of JITTER entity

In an IP network, it is difficult to measure the MOS value directly (because dedicated VoIP network devices would be needed), so the MOS value is estimated from a simulated VoIP codec and the transmission status of the VoIP packets in the network (packet sending rate, interval, packet size, and so on). The JITTER entity is an RTR entity developed on this basis to measure the transmission quality of VoIP packets in an IP network.

The JITTER entity can simulate three kinds of codec, or a customized codec, to send UDP packets at the corresponding rate, interval, and size, and measure the round-trip time, uni-directional packet loss, and uni-directional delay. From these statistics, it calculates the ICPIF value and finally estimates the MOS value from the ICPIF value.

When the JITTER entity is used to test how the network transmits VoIP packets, only two factors are considered for calculating ICPIF: the uni-directional delay of the packets and the packet loss. Therefore, the ICPIF formula above can be simplified. Assuming that Io, Iq, and Idte are 0:

$$I_{cpif} = I_{dd} + I_e - A$$

That is, the ICPIF value is the delay impairment factor plus the device impairment factor caused by packet loss, minus the expectation factor.

Idd is the uni-directional delay impairment factor. It depends on the uni-directional transmission delay and on some constants defined by the ITU, such as codec delay and look-ahead DSP delay. The relation between Idd and uni-directional delay is as follows:

Uni-directional delay (ms)    Idd
150 or less                   0
200                           3
250                           10
300                           15
350                           20
400                           25
500                           30
600                           35
800 or greater                40

Ie is the device impairment factor, which depends on packet loss. Ie is obtained from the packet loss percentage as follows:

Packet loss percentage    PCM (G.711) Ie    CS-ACELP (G.729A) Ie
0%                        0                 10
2%                        12                20
4%                        22                30
6%                        28                38
8%                        32                42
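The two lookup tables above can be encoded directly. The manual lists only discrete points, so the linear interpolation between them, and the clamping to the first and last listed values outside the table, are assumptions of this sketch; the function names are hypothetical and the codec names follow the keywords used in this section.

```python
from bisect import bisect_right

# (uni-directional delay in ms, Idd), from the Idd table above
IDD_TABLE = [(150, 0), (200, 3), (250, 10), (300, 15), (350, 20),
             (400, 25), (500, 30), (600, 35), (800, 40)]

# (packet loss %, Ie for PCM/G.711, Ie for CS-ACELP/G.729A), from the Ie table above
IE_TABLE = [(0, 0, 10), (2, 12, 20), (4, 22, 30), (6, 28, 38), (8, 32, 42)]

PCM_CODECS = {"g711alaw", "g711ulaw"}

def _lookup(points: list[tuple[float, float]], x: float) -> float:
    """Piecewise-linear interpolation between table points, clamped at both ends."""
    xs = [px for px, _ in points]
    if x <= xs[0]:
        return float(points[0][1])
    if x >= xs[-1]:
        return float(points[-1][1])
    i = bisect_right(xs, x)  # xs[i - 1] <= x < xs[i]
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

def idd(one_way_delay_ms: float) -> float:
    """Uni-directional delay impairment factor."""
    return _lookup(IDD_TABLE, one_way_delay_ms)

def ie(loss_percent: float, codec: str) -> float:
    """Device impairment factor for the given packet loss and codec."""
    if codec in PCM_CODECS:
        column = 1
    elif codec == "g729a":
        column = 2
    else:
        raise ValueError(f"no Ie column for codec {codec!r}")
    return _lookup([(row[0], row[column]) for row in IE_TABLE], loss_percent)

print(idd(225), ie(3, "g711ulaw"))  # 6.5 17.0
```

Note that G.729A starts at Ie = 10 even with no packet loss, reflecting the codec's own compression impairment.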

The expectation factor represents the trade-off users accept between access convenience and VoIP quality. For example, a user of a wireless phone in a remote area where the signal is hard to receive expects lower VoIP quality than a user of a wired phone in an area with good coverage. Currently, the relation between common user access modes and the expectation factor is as follows:

Communication service type                                                                  Max. expectation factor
General wired communication link                                                            0
Mobile communication within a building                                                      5
Mobile communication within an area, or communication while moving at high speed            10
Areas where the signal is difficult to receive (for example, multi-hop satellite links)     20

These values are only recommended upper limits. In the implementation, the value can also simply be set to 0 by default.

With the uni-directional delay impairment factor (Idd), the device impairment factor (Ie), and the expectation factor (A), the ICPIF value can be calculated from the formula. As mentioned before, every voice measuring method needs to correspond to the MOS value, so after ICPIF is calculated, it is converted to the corresponding MOS. The relation between the ICPIF value and the MOS value is as follows:

ICPIF range    MOS    Quality type
0 - 3          5      Best
4 - 13         4      High
14 - 23        3      Medium
24 - 33        2      Low
34 - 43        1      Poor
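Putting the simplified formula and the mapping table together gives the following sketch. It takes Idd and Ie as inputs (for example, read from the Idd and Ie tables earlier in this section) plus the expectation factor A, which defaults to 0. Three choices here are assumptions rather than statements from the text: a negative ICPIF is floored at 0, a fractional value between two listed ranges (such as 3.5) falls into the higher range, and anything above 43 stays at MOS 1. The function names are hypothetical.

```python
# (upper end of ICPIF range, MOS, quality type), from the table above
ICPIF_TO_MOS = [(3, 5, "Best"), (13, 4, "High"), (23, 3, "Medium"),
                (33, 2, "Low"), (43, 1, "Poor")]

def icpif(idd: float, ie: float, a: float = 0.0) -> float:
    """Simplified ICPIF = Idd + Ie - A, with Io, Iq and Idte taken as 0."""
    return max(0.0, idd + ie - a)

def mos(icpif_value: float) -> tuple[int, str]:
    """Map an ICPIF value to (MOS, quality type)."""
    for upper, score, quality in ICPIF_TO_MOS:
        if icpif_value <= upper:
            return score, quality
    return 1, "Poor"  # above the last listed range

# G.711 with 200 ms one-way delay (Idd = 3) and 2% loss (Ie = 12), A = 0.
value = icpif(3, 12)
print(value, mos(value))  # 15.0 (3, 'Medium')
```

For comparison, G.729A with a 150 ms delay and no loss gives ICPIF = 0 + 10 = 10 and MOS 4, which shows how much of the score the codec choice alone determines.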

The calculated MOS value is only an estimate of how well the network transmits VoIP packets and may differ from the MOS actually measured.

During JITTER measurement, UDP packets are used to simulate the transmission of VoIP packets (because VoIP packets are carried in UDP), and the ICPIF and MOS values are calculated from the transmission status to determine how well the network transmits VoIP packets. The size of the UDP packets, the number of packets sent, and the sending interval depend on the type of codec being simulated. The user can also define a custom codec by configuring these parameters.

For more accurate measurement and for compatibility with Cisco devices, configure the RTR Responder at the destination end of the measurement. The Responder sets up a connection with the source end and responds to the detection packets sent by the source end, which makes the measurement results more accurate. JITTER entity detection requires a Responder at the destination end.



The source end and the Responder end use the SAA control protocol, a Cisco protocol, for connection setup and communication detection. The protocol is carried in UDP packets and is an application-layer protocol.

The SAA control protocol is a proprietary Cisco protocol. Its main packet types are the SAA connection request packet, the SAA connection response packet, and the SAA packet.

When JITTER entity detection is used, the SLA source end first builds an SAA connection request packet according to the specified parameters and sends it to monitoring port 1967 on the destination. The SAA connection request packet is as follows:

Note

Here, the version field indicates the version of the SAA control protocol, currently 1. The ID identifies the SAA connection request and thus one connection. The frame length indicates the length of the SAA connection request packet: 52 bytes when the life time field is 2 bytes, and 56 bytes when it is 6 bytes. The 4-byte reserved field is all 0s. The 2-byte command type indicates the connection property; 0004 denotes a JITTER detection connection. The following 6-byte reserved field has an unknown purpose and is usually 000100000000. Next come the 4-byte destination IP address and the 2-byte destination port number of the JITTER connection. The 2-byte or 6-byte life time field indicates how long the connection lasts from setup to teardown, in ms; it equals the number of packets sent per run × the packet sending interval + the packet timeout. Finally come the packet end flag field, usually 0001001c, and an all-0 padding field.
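The life time rule quoted above is simple arithmetic. The sketch assumes all inputs are in milliseconds; the example uses 1000 packets at the default 20 ms interval from the codec table and a hypothetical packet timeout of 5000 ms. The helper name is hypothetical.

```python
def saa_lifetime_ms(packets: int, interval_ms: int, timeout_ms: int) -> int:
    """Connection life time = packets per run x sending interval + packet timeout."""
    return packets * interval_ms + timeout_ms

life = saa_lifetime_ms(1000, 20, 5000)  # timeout of 5000 ms is a hypothetical value
print(life)                             # 25000
# A 2-byte field holds at most 0xFFFF (65535 ms), so this value would fit in it.
print(life <= 0xFFFF)                   # True
```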

When RESPONDER receives the request packet, it processes the packet and sends back an SAA connection response packet. If the connection is set up successfully, detection starts; otherwise, the connection is torn down. The format of the SAA connection response packet is as follows:



Note

Here, the version field indicates the version of the SAA control protocol, currently 1. The ID identifies the SAA connection request and thus one connection. The packet length indicates the length of the SAA connection response packet, which is 8 bytes. The 2-byte response code is 0x0000 for success and 0x0002 for failure. The last field is a 2-byte reserved field that is all 0s.

After receiving the response packet from the RESPONDER end, the source end processes it. If the response indicates failure, the source end tears down the connection; if it indicates success, the source end starts filling in SAA packets and sending them to the RESPONDER end for detection. After receiving a packet, the RESPONDER end processes it, fills in the required contents, and sends it back to the source end, which completes the detection of that packet. The format of the JITTER packet is as follows:

Note

Here, the 2-byte packet ID is 0x0002. Delta indicates the processing delay of the RTR responder from receiving the packet to sending it back, and is filled in by the RTR responder. The 4-byte send time indicates when the packet was sent and is filled in by the requesting end. The 4-byte receive time is filled in by RESPONDER and indicates when the packet was received. The 2-byte sending serial number is filled in by the sending end and indicates the sequence number of the sent packet. The last field is the 2-byte receiving serial number, filled in by RESPONDER, which indicates the sequence number of the received packet.

The detection procedure of the SLA JITTER entity is as follows:



Calculate the detection result of JITTER entity

For the JITTER entity, the results that need to be saved include packet round-trip delay, jitter, uni-directional delay (which requires the clocks of the source and destination ends to be synchronized), and packet loss. The ICPIF and MOS values can be calculated from these parameters.

After the connection is set up, the source end sends UDP detection packets to the destination port according to the options negotiated through the SAA control protocol. Before sending a packet, the source end fills in the sending time (ST1) and the sending serial number (QS1). The destination end fills in the receiving time (RT1) and the receiving serial number (QR1), and, before sending the packet back, fills in the processing delay (DT1). If the sending end receives the reply within the timeout, it records the arrival time (AT1). The values ST2, QS2, RT2, QR2, DT2, and AT2 are recorded in the same way for the second packet. The results are calculated as follows:

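The round-trip delay of the packet:

$$\mathrm{RTT} = (RT_1 - ST_1) + (AT_1 - RT_1 - DT_1) = AT_1 - ST_1 - DT_1$$

The packet jitter:

$$\mathrm{JITTER}_{SD} = (RT_2 - RT_1) - (ST_2 - ST_1) = i_2 - i_1$$

$$\mathrm{JITTER}_{DS} = (AT_2 - AT_1) - \big((RT_2 + DT_2) - (RT_1 + DT_1)\big) = i_3 - i_2$$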



Here, i1 is the interval between sending the first and second packets; i2 is the interval between receiving the first and second packets; i3 is the interval between receiving the response packets of the first and second packets.

Meanwhile, if the clocks of the source end and destination end are synchronized, the uni-directional delays are:

$$\mathrm{Delay}_{SD} = RT_1 - ST_1$$

$$\mathrm{Delay}_{DS} = AT_1 - RT_1 - DT_1$$
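The delay and jitter formulas above translate directly into code. In this sketch, each probe is a record of the timestamps named in the text (ST, RT, DT, AT), all in milliseconds. The class and function names and the sample timestamps are hypothetical, and the one-way delays are meaningful only when the two clocks are synchronized.

```python
from dataclasses import dataclass

@dataclass
class Probe:
    st: float  # sending time, filled in by the source end
    rt: float  # receiving time, filled in by the responder
    dt: float  # responder processing delay, filled in by the responder
    at: float  # arrival time of the reply at the source end

def rtt(p: Probe) -> float:
    return p.at - p.st - p.dt

def jitter_sd(p1: Probe, p2: Probe) -> float:
    """Receive interval minus send interval (source to destination)."""
    return (p2.rt - p1.rt) - (p2.st - p1.st)

def jitter_ds(p1: Probe, p2: Probe) -> float:
    """Reply arrival interval minus responder send interval (destination to source)."""
    return (p2.at - p1.at) - ((p2.rt + p2.dt) - (p1.rt + p1.dt))

def delay_sd(p: Probe) -> float:
    return p.rt - p.st

def delay_ds(p: Probe) -> float:
    return p.at - p.rt - p.dt

# Hypothetical timestamps (ms) for two probes sent 20 ms apart.
first = Probe(st=0.0, rt=12.0, dt=1.0, at=25.0)
second = Probe(st=20.0, rt=35.0, dt=1.0, at=47.0)
print(rtt(first), jitter_sd(first, second), jitter_ds(first, second))  # 24.0 3.0 -1.0
```

A quick consistency check follows from the formulas: the two one-way delays of a probe always add up to its RTT (12 + 12 = 24 here).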

Packet loss is calculated from the sending and receiving serial numbers that the source end and the RESPONDER end fill into the SAA packets. If, after receiving a detection packet returned by the destination end, the source end finds that the sending serial number is inconsistent with the receiving serial number, or if no response packet arrives within the timeout, packets have been lost in the network.
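The loss check described above can be sketched as follows. The manual's own loss formula is not reproduced in this section, so this is only an illustration under stated assumptions: each reply is matched to its probe by the sending serial number (QS), a probe with no reply within the timeout counts as lost, and a reply whose receiving serial number (QR) differs from QS is flagged, on the assumption that the responder numbers the packets it actually receives consecutively, so a lagging QR points to an earlier loss. The function name and data layout are hypothetical.

```python
def packet_loss(sent: list[int],
                replies: dict[int, tuple[int, float]],
                timeout_ms: float) -> tuple[int, float, list[int]]:
    """Count lost probes and flag serial-number mismatches.

    sent     -- sending serial numbers (QS) of the probes, in order
    replies  -- QS -> (receiving serial number QR, reply time in ms)
    Returns (lost count, loss percentage, QS values whose QR did not match).
    """
    lost = 0
    mismatched: list[int] = []
    for qs in sent:
        reply = replies.get(qs)
        if reply is None or reply[1] > timeout_ms:
            lost += 1                  # no response within the timeout
            continue
        qr, _ = reply
        if qr != qs:
            mismatched.append(qs)      # serial numbers disagree: an earlier packet went missing
    loss_percent = 100.0 * lost / len(sent) if sent else 0.0
    return lost, loss_percent, mismatched

# Probe 3 never comes back, probe 5 answers too late, and the responder's
# receive counter lags behind from probe 4 on.
print(packet_loss([1, 2, 3, 4, 5],
                  {1: (1, 10.0), 2: (2, 12.0), 4: (3, 11.0), 5: (4, 6000.0)},
                  timeout_ms=5000.0))  # (2, 40.0, [4])
```

The resulting loss percentage is the input used to look up the device impairment factor Ie.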

The severity of packet loss is directly reflected by the device impairment factor (Ie) among the ICPIF factors, so lost packets need to be counted when the detection ends in order to calculate the ICPIF and MOS values.

The calculation of the lost packets is as follows:



Then, ICPIF can be calculated from the uni-directional delay and packet loss:

$$I_{cpif} = I_{dd} + I_e - A$$

After ICPIF is calculated, the MOS value is obtained from the ICPIF-to-MOS mapping, which gives a standard measure of how well the network transmits VoIP packets.

UDPECHO Entity

The UDPECHO entity is used to detect the transmission of UDP packets in the IP network. The destination address and port of the packets to be sent must be specified in the entity. By scheduling the entity, you can monitor the transmission of UDP packets in the IP network.

Through this monitoring, the UDPECHO entity records the round-trip delay and packet loss of UDP packets in the IP network. It can also record the monitoring history in logs, helping the network administrator understand network communication and troubleshoot faults.

The request and response packets of the SAA UDPECHO entity are the same as those of the SAA JITTER entity, but the detection packets of the UDPECHO entity differ from those of the JITTER entity. The packet format is as follows:

Field     Packet ID   DT        Part1      Part2
Length    2 bytes     2 bytes   Optional   Optional

Note

Usually, the 2-byte packet ID is 00 01, which identifies a data frame between the sender and responder rather than a request or response packet. The 2-byte DT field is 00 00 for the sender and 00 02 or 00 01 for the responder. Part1 and Part2 are optional, and their contents depend on the rtr attributes data-pattern and packet size. The filling format is as follows: Part1 takes all even-position digits of data-pattern; if the value of an even-position digit is smaller than or equal to f, a 0 is prepended, and if it is larger than f, ff is filled in. For Part2, first let n = data-pattern length / 2, and then take the values (ASCII codes) of the first through nth characters of data-pattern. By default, the SAA packet length is 16 bytes; if the filled content is shorter than 16 bytes, the remaining space is filled with ASCII codes.

FLOW-STATISTICS Entity

The FLOW-STATISTICS entity is used to detect interface traffic. Each FLOW-STATISTICS entity corresponds to one interface, and scheduling the entity monitors the traffic on that interface.

The FLOW-STATISTICS entity monitors interface traffic at an interval of 10 s to 10 min. Through this monitoring, it records the peak traffic on the interface and can also record the monitoring history in logs, helping the network administrator understand network communication and troubleshoot faults. This makes the FLOW-STATISTICS entity a useful traffic statistics tool.
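A minimal sketch of what such an entity tracks, assuming it samples an interface's cumulative byte counter once per interval and keeps the highest rate seen. The class, its methods, and the counter source are hypothetical; only the 10 s to 10 min interval range comes from the text, and counter wrap-around is ignored.

```python
from typing import Optional

class FlowStatistics:
    """Hypothetical peak-rate tracker for a single interface."""

    MIN_INTERVAL_S = 10    # 10 s, from the text
    MAX_INTERVAL_S = 600   # 10 min, from the text

    def __init__(self, interval_s: int) -> None:
        if not self.MIN_INTERVAL_S <= interval_s <= self.MAX_INTERVAL_S:
            raise ValueError("interval must be between 10 s and 10 min")
        self.interval_s = interval_s
        self.last_bytes: Optional[int] = None
        self.peak_bps = 0.0
        self.history: list[float] = []   # one rate per interval, in bit/s

    def sample(self, byte_counter: int) -> None:
        """Feed one reading of the interface's cumulative byte counter."""
        if self.last_bytes is not None:
            rate_bps = (byte_counter - self.last_bytes) * 8 / self.interval_s
            self.history.append(rate_bps)
            self.peak_bps = max(self.peak_bps, rate_bps)
        self.last_bytes = byte_counter

stats = FlowStatistics(interval_s=10)
for reading in (0, 12500, 13750):
    stats.sample(reading)
print(stats.history, stats.peak_bps)  # [10000.0, 1000.0] 10000.0
```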

MAC SLA Entity

The MAC SLA entity is used to detect the traffic quality of Ethernet links. Currently, MAC SLA is implemented on top of the Delay Measurement function of the CFM protocol. To configure and run a MAC SLA entity, you therefore need to configure the CFM domain, service instance, and MEP; the MAC SLA entity then runs within the specified CFM domain, service instance, and MEP.

Currently, MAC SLA can detect four quality parameters of Ethernet link traffic, including uni-directional delay, bi-directional delay, jitter, and delay. When a quality parameter exceeds its threshold, the corresponding log information is output.

The detection function of the MAC SLA entity is widely used in Ethernet networks and reflects network quality.

RTR Group

An RTR group is a set of one or more RTR entities. An RTR entity can belong to multiple RTR groups, but a group cannot be a member of another group, and a group can contain a given member only once. An RTR group is identified by its group ID, and the group name is generated automatically by the system.

An RTR group is used to schedule a set of RTR entities. Scheduling an RTR group is equivalent to scheduling all existing RTR entities in the group, and the detection results are saved in the history records of the RTR entities.
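The membership rules above can be modeled as follows: an entity may join several groups, a group cannot contain another group, a group holds a given entity only once, and scheduling a group schedules each of its entities, with results kept in each entity's own history. This is a hypothetical data model for illustration, not the switch's internal structures.

```python
from dataclasses import dataclass, field

@dataclass
class RtrEntity:
    entity_id: int
    history: list[str] = field(default_factory=list)

@dataclass
class RtrGroup:
    group_id: int
    members: dict[int, RtrEntity] = field(default_factory=dict)

    def add(self, member: object) -> None:
        # Only single entities may join; a group cannot be nested in a group.
        if not isinstance(member, RtrEntity):
            raise TypeError("only RTR entities can be members of an RTR group")
        # A group contains a given entity at most once.
        if member.entity_id in self.members:
            raise ValueError(f"entity {member.entity_id} is already in group {self.group_id}")
        self.members[member.entity_id] = member

def schedule_group(group: RtrGroup, run: int) -> None:
    """Scheduling a group schedules every entity in it; each entity keeps its own history."""
    for entity in group.members.values():
        entity.history.append(f"run {run} via group {group.group_id}")

e1 = RtrEntity(1)
g1, g2 = RtrGroup(1), RtrGroup(2)
g1.add(e1)
g2.add(e1)            # the same entity may belong to several groups
schedule_group(g1, 1)
schedule_group(g2, 1)
print(e1.history)     # ['run 1 via group 1', 'run 1 via group 2']
```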

RTR Schedule

Configuring an RTR entity or RTR group alone does not perform any detection; detection happens only after scheduling is started. An RTR schedule is the policy for scheduling and running detection on an RTR entity or group.

An RTR schedule can target either a single entity or one RTR group, but not a group and an entity at the same time. An RTR schedule is identified by its schedule ID and is independent of the RTR entity type, but the scheduling interval must take into account the attributes of the RTR entity, or of the members of the RTR group, being scheduled.

The RTR schedule provides rich scheduling policies. Scheduling can start immediately, after a delay, or at a set absolute time. In addition, a schedule can expire after a set number of runs or run indefinitely.
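The schedule options above can be summarized in a small model: a schedule targets exactly one entity or one group; it starts immediately, after a delay, or at an absolute time; and it either stops after a set number of runs or runs indefinitely. The field and method names are hypothetical, and how the start options are resolved is an assumption of this sketch.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

@dataclass
class RtrSchedule:
    schedule_id: int
    entity_id: Optional[int] = None
    group_id: Optional[int] = None
    start_after: Optional[timedelta] = None  # relative start
    start_at: Optional[datetime] = None      # absolute start
    times: Optional[int] = None              # None = run indefinitely

    def __post_init__(self) -> None:
        # Exactly one target: an entity or a group, never both and never neither.
        if (self.entity_id is None) == (self.group_id is None):
            raise ValueError("schedule exactly one entity or one group")
        if self.start_after is not None and self.start_at is not None:
            raise ValueError("choose a relative or an absolute start time, not both")

    def first_run(self, now: datetime) -> datetime:
        if self.start_at is not None:
            return self.start_at
        if self.start_after is not None:
            return now + self.start_after
        return now  # start at once

    def expired(self, runs_done: int) -> bool:
        return self.times is not None and runs_done >= self.times

now = datetime(2009, 1, 1, 5, 0, 0)
s = RtrSchedule(1, entity_id=1, start_after=timedelta(minutes=5), times=3)
print(s.first_run(now), s.expired(3))  # 2009-01-01 05:05:00 True
```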



Debug Commands and Debug Information

After configuring SLA entity detection, you can use the SLA display and debug commands to view the detection procedure and results.

SLA display commands

show rtr entity [entityId]

show rtr group [groupId]

show rtr schedule [scheduleId]

show rtr history [entityId]

show rtr entity Displayed Information Explanation

26-8#show rtr entity

There are 6 valid entities now in the system

--------------------------------------------------------------

ID:1 name:IcmpEcho1 Created:TRUE

****************type:ICMPECHO****************

CreatedTime:THU JAN 01 05:15:38 2009

LatestModifiedTime:THU JAN 01 05:21:43 2009

Times-of-schedule:0

TargetIp:1.1.1.2

Transmit-packets:2

Totally-send-packets:0

Packet-size:80

Timeout:5(s)

Alarm-type:LOG

Threshold-of-rtt:5 (direction be)

Threshold-of-packet-loss:200000000 (direction be)

Number-of-history-kept:200

Periods:1

In-scheduling:FALSE

Schedule frequency:23(s)

Status:DEFAULT

There are 6 valid rtr entities in the system.

Rtr Id:1 is an ICMPECHO entity. CreatedTime and LatestModifiedTime show when the entity was created and last modified. The entity has been scheduled 0 times. The detected destination address is 1.1.1.2. Two packets are sent in each schedule; the packet size is 80 bytes; the timeout is 5 s. The alarm type is LOG (shell prompt): none indicates no alarm, log indicates a shell prompt, log-and-trap indicates a shell prompt plus a trap sent to notify the NMS, and trap indicates only sending a trap to notify the NMS. The round-trip delay threshold is 5 ms; when the detected round-trip delay is no less than the threshold, an alarm is raised according to alarm-type. The packet loss threshold is 200000000; be means alarming when the value is no less than the threshold, se means alarming when the value is smaller than or equal to the threshold, and the alarm is raised according to alarm-type. At most 200 history records are saved, and new records overwrite the old ones when the count exceeds 200. A history record is saved for each schedule. The entity is currently not being scheduled. The schedule frequency is 23 s. The link status is DEFAULT; if the destination is reachable, the link status is REACHABLE.

--------------------------------------------------------------

ID:2 name:IcmpPathEcho2 Created:TRUE

****************type:ICMPPATHECHO****************

LatestModifiedTime: THU JAN 01 05:36:34 2009

Times-of-schedule:0

Transmit-packets:1 (each hop)

Request-data-size:32

Timeout:5000(ms)

Frequency:60(s)

TargetOnly:TRUE

Verify-data:FALSE

Alarm-type:LOG

Threshold-of-pktloss:1 (direction be)

Periods:1

In-scheduling:FALSE

Status:DEFAULT

--------------------------------------------------------------

ID:3 name:IcmpPathJitter3 Created:TRUE

****************type:ICMPPATHJITTER****************

Times-of-schedule:0

Transmit-packets:10 (each hop)

Packets-interval:20(ms)

Rtr id 2 is the ICMP-PATH-ECHO entity; the time of creating the entity is THU JAN 01 05:15:45 2009; the last modifying time is THU JAN 01 05:36:34 2009; the entity is scheduled for 0 times, that is, not start to schedule; only send one ICMP packet to the destination end and the medium devices during each schedule; the valid payload is s32 bytes; the timeout is 5000ms; the schedule frequency is 60s; just detect

the network of the destination end and to detect the network of the medium device, set as FALSE; do not check the data; the alarm mode LOG is SHELL prompt, none means no alarm, log means the shell prompt, log-andtrap means the shell prompt and sending the trap information to inform the NMS, and trap means just sending trap to inform the NMS; the threshold of the packet loss is 1 and can only be set as 1, be means

alarming when no less than the threshold, se means alarming when smaller than or equal to the threshold, and alarm by alarm-type; save 100 history records and the new records cover the old records when exceeding 100; save the history record during each detection; not in the debug state; the link status is DEFAULT; if the destination is reachable, the status is



Request-data-size:32

Timeout:5000(ms)

Frequency:60(s)

TargetOnly:FALSE

Verify-data:FALSE

Alarm-type:LOG


Threshold-of-pktLoss:200000000 (direction be)

Threshold-of-jitter:5 (direction be)


Periods:3

In-scheduling:FALSE

Status:DEFAULT

--------------------------------------------------------------

--------------------------------------------------------------

ID:4 name:Jitter4 Created:TRUE

****************type:JITTER****************



Times-of-schedule:0

Entry-state:Pend

TargetIp:1.1.1.2 targetPort:3434

Codec:G.729A Packet-size:32 Packet-number:1000

Packet-transmit-interval:20(ms)

frequency:60(s)

TimeOut:5000(ms)

Alarm-type:LOG-AND-TRAP

Threshold-of-dsDelay:8(direction be)

Threshold-of-dsJitter:8(direction be)

Threshold-of-dsPktLoss:3(direction be)

Threshold-f-sdDelay:8(direction be)

Threshold-of-sdJitter:8(direction be)

Threshold-of-sdPktLoss:2(direction be)

Threshold-of-rtt:6(direction be)

Threshold-of-mos:10000000 (direction be)

Threshold-of-icpif: 100000000 (direction se)


Periods:1

Status:DEFAULT

--------------------------------------------------------------

REACHABLE.

Rtr id 3 is the ICMP-PATH-JITTER entity; the time of creating the entity is THU JAN 01 05:15:50 2009; the last modifying time is THU JAN 01

05:48:03 2009; the entity is scheduled for 0 times, that is, not start to schedule; only send 10 ICMP packet to the destination end and the medium devices during each schedule; the valid payload is s32 bytes; the timeout is 5000ms; the schedule frequency is 60s; just detect the network of the destination end and between the source and the medium devices; do not check the data; the alarm mode LOG is SHELL prompt, none means no alarm, log

means the shell prompt, log-andtrap means the shell prompt and sending the trap information to inform the NMS, and trap means just sending trap to inform the NMS; the threshold of the round-trip delay is 6ms and provide the alarm by alarm-type when the round-trip delay of the actual detection is no less than the threshold; the threshold of the packet loss is 200000000; be means alarming when no less than the threshold, se means alarming when smaller than

or equal to the threshold, and alarm by alarm-type; the jitter threshold is 5ms; save 100 history records and the new records cover the old records when exceeding 100; save the history record every detecting for three times; not in the debug state; the link status is DEFAULT; if the destination is reachable, the status is REACHABLE.



--------------------------------------------------------------

ID:5 name:UdpEcho5 Created:TRUE

****************type:UDPECHO****************



Times-of-schedule:0

Entry-state:Pend

TargetIp:1.1.1.2 TargetPort:1234

TimeOut:5000(ms)

request-data-size:16

Frequecy:6(s)

Alarm-type:none


Threshold-of-pktloss:1 (direction be)

Data-pattern:abcd


Periods:1

Status:DEFAULT

--------------------------------------------------------------

--------------------------------------------------------------

ID:6 name:flow-statistics6 Created:TRUE

****************type:FLOWSTATIC****************



Times-of-schedule:0

Alarm-type:none

Threshold-of-inputPkt:20000 (direction be)

Threshold-of-inputFlow:200000000 (direction be)

Threshold-of-outputPkt:200000000 (direction be)

Threshold-of-outputFlow:200000000 (direction be)

Interface:vlan2

Statistics-interval:60(s)


Rtr Id:4 is one jitter entity; the time of creating the entity is THU JAN 01 05:15:53 2009; the last time of modifying the entity is THU JAN 01 05:52:41 2009; the entity is scheduled for 0 times; the entity can run; the destination IP address of the detection is 1.1.1.2; the destination port number is 3434 and the simulated is the well-known codec G729.A, that is, the packet size is 32bytes, send 1000 packets during each schedule, the schedule interval is one minute, and the interval of sending packets is 20ms; the timeout is 5000ms; the alarm mode is shell and send trap to inform the NMS, none means no alarm, log means the shell prompt, log-and-trap means the shell prompt and sending

the trap information to inform the NMS, and trap means just sending trap to inform the NMS, and alarm according to the alarm-type; the mos and icpif thresholds are the calculation result × 106, for example, the MOS threshold is 10.000000 and it is 10000000 after calculation; the number of the history records is 120, and the new records cover the old records when exceeding 100; the link status is DEFAULT; if the destination is reachable, the

status is REACHABLE.

Rtr id:5 is one UDPECHO entity; the time of creating the entity is THU JAN 01 05:15:56 2009; the last time of modifying the entity is THU JAN 01 06:43:11 2009; the entity is scheduled for 0



Periods:1

Status:DEFAULT

--------------------------------------------------------------

times;, that is, do not start to schedule; the entity is in the PEND state; the destination IP address of the detection is 1.1.1.2; the destination port is 1234; the timeout is 5000ms; the valid payload is 16 bytes; the schedule period is 6s; the alarm mode is not alarm; the round-trip delay threshold is 15ms; be means alarming when the actual detection value is no less than the threshold, se means alarming when the actual detection value is smaller

than or equal to the threshold, and alarm by alarm-type; the packet filling field is abcd; the number of the history records is limited to 10 and the new records cover the old records when exceeding 10; save the history record during each schedule; the link status is DEFAULT; if the destination is reachable, the status is REACHABLE.

Rtr Id:6 is one FLOW-STATISTICS entity; the time of creating the entity is THU JAN 01 05:15:59 2009; the last time of modifying the entity is THU JAN 01 06:51:15 2009; the entity is scheduled for 0 times;, that is, do not start to schedule; the alarm mode is none, that

is, not alarm; the threshold for the number of the packets received by the interface is 20000, be means alarming when the number of the packets actually received by the interface is no less than the threshold, se means alarming when the number of the packets actually received by the interface is smaller than or equal to the



threshold, and alarm according to the alarm-type; the interface for detection is vlan2; the detection interval is 60s; the number of the saved history records is limited to 220 and the new records cover the old records when exceeding 10; save the history record during each schedule; the link status is DEFAULT; if the destination is reachable, the status is REACHABLE.
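The be and se directions used throughout these explanations reduce to one comparison. The following Python sketch is illustrative only. The function name threshold_exceeded is hypothetical, and the sketch assumes the detected value and the threshold are already in the same unit (for example, milliseconds for round-trip delay).

```python
def threshold_exceeded(measured: float, threshold: float, direction: str) -> bool:
    """Return True when an SLA alarm should be raised for one detected value.

    direction "be": alarm when the detected value is no less than the threshold.
    direction "se": alarm when the detected value is smaller than or equal to it.
    """
    if direction == "be":
        return measured >= threshold
    if direction == "se":
        return measured <= threshold
    raise ValueError(f"unknown threshold direction: {direction!r}")


# Entity 1: round-trip delay threshold 5 ms, direction be.
print(threshold_exceeded(1, 5, "be"))  # False: 1 ms is below the threshold
print(threshold_exceeded(7, 5, "be"))  # True
```

Whether an alarm then produces a shell prompt, a trap, or both is decided separately by alarm-type.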

show rtr group

Displayed information and explanation:

26-8#show rtr group

There are 1 valid groups now in the system

----------------------------------------------

ID:2 name:rtrGroup2 Members schedule interval:200

*****************************

type:SINGLE Entity Id :3


type:RANGE Entity start id:60 end id:80


26-8#

There is one rtr group in the system.

Rtr group 2: the interval for scheduling the members is 200s, and the member list is 3, 45, 60-80, 7.
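A group's member list mixes single entity IDs and ranges. The Python sketch below expands such a list. It also prints the start offset of each member, under the assumption that each member starts one member interval after the previous one. The manual does not spell out the exact staggering rule, and the function names are hypothetical.

```python
from __future__ import annotations


def expand_members(spec: str) -> list[int]:
    """Expand a member list such as "3, 45, 60-80, 7" into entity IDs."""
    ids: list[int] = []
    for item in spec.split(","):
        item = item.strip()
        if "-" in item:
            start, end = (int(x) for x in item.split("-"))
            ids.extend(range(start, end + 1))  # ranges are inclusive
        else:
            ids.append(int(item))
    return ids


def start_offsets(ids: list[int], interval_s: int) -> dict[int, int]:
    """Assumed staggering: member k starts k * interval_s seconds after the first."""
    return {entity: k * interval_s for k, entity in enumerate(ids)}


members = expand_members("3, 45, 60-80, 7")
print(len(members))                     # 24 entities: 3, 45, 60 to 80 (21 IDs), 7
print(start_offsets(members, 200)[45])  # 200
```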

show rtr schedule

Displayed information and explanation:

26-8#show rtr schedule

There are 1 schedule in the system now

--------------------------------------------------------------

SCHEDULE ID:38

Schedule entity:1

Schedule start after 0:3:0 time

Schedule lives time:500(s)

Schedule repeat time:2 (times) Schedule interval:35(s)

Schedule ageout time:400(s)

----------------------------------------------------------

There is one rtr schedule in the system.

Rtr schedule 38 schedules rtr entity 1. Scheduling starts after three minutes, the lifetime is 500s, and the ageout time is 400s. The entity is scheduled twice, with a schedule interval of 35s.
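As a rough illustration of how the schedule parameters combine, the Python sketch below lists the times (relative to when the schedule is configured) at which entity 1 would run. It rests on three assumptions: the first run happens at the start delay, each repetition follows one schedule interval later, and runs after the lifetime (counted here from the first run) are dropped. These semantics are an interpretation of the explanation above, not taken from the switch software, and ageout is not modelled.

```python
from __future__ import annotations


def run_times(start_delay_s: int, repeat: int, interval_s: int, lifetime_s: int) -> list[int]:
    """Return the assumed run times, in seconds after configuration, of one schedule."""
    times = []
    for k in range(repeat):
        t = start_delay_s + k * interval_s
        if t > start_delay_s + lifetime_s:  # assumption: lifetime counts from the first run
            break
        times.append(t)
    return times


# Schedule 38: start after 0:3:0 (180 s), repeat 2 times, interval 35 s, lifetime 500 s.
print(run_times(180, 2, 35, 500))  # [180, 215]
```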




show rtr history

After scheduling, view the history records of rtr entity 1.

Displayed information and explanation:

26-8#show rtr history 1

--------------------------------------------------------------

ID:1 Name:IcmpEcho1 CurHistorySize:2 MaxHistorysize:200

History recorded as following:

THU JAN 01 01:06:18 1970 Rtt:1(ms) PktLoss:0

THU JAN 01 01:29:38 1970 Rtt: 1(ms) PktLoss:0

--------------------------------------------------------------

The scheduling result of rtr 1 is as follows: the ICMP-ECHO entity saves at most 200 history records. Currently, two history records are saved, and records are saved according to the schedule interval of 23s.

The round-trip delay is 1ms and no packets are lost.

Note When the number of history records has reached 200, each new record overwrites the oldest record.
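The overwrite behaviour described in the note is that of a bounded ring buffer. Below is a minimal Python sketch using collections.deque with maxlen, which discards the oldest element when a new one is appended to a full deque. The class name RtrHistory and its record layout are hypothetical.

```python
from __future__ import annotations

from collections import deque


class RtrHistory:
    """Hypothetical history store: keeps at most max_size records, dropping the oldest."""

    def __init__(self, max_size: int) -> None:
        # deque(maxlen=N) discards the leftmost (oldest) item when a full deque is appended to.
        self.records: deque[tuple[str, int, int]] = deque(maxlen=max_size)

    def add(self, timestamp: str, rtt_ms: int, pkt_loss: int) -> None:
        self.records.append((timestamp, rtt_ms, pkt_loss))


history = RtrHistory(max_size=200)  # MaxHistorysize of entity 1
for i in range(205):
    history.add(f"t{i}", 1, 0)
print(len(history.records))   # 200
print(history.records[0][0])  # t5: records t0 to t4 were overwritten
```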

After scheduling, view the history record of rtr entity 2:



--------------------------------------------------------------

ID:2 Name:IcmpPathEcho2

History of record from source to dest:

CurHistorySize:2 MaxHistorysize:100

THU JAN 01 00:11:59 1970 Rtt:3

THU JAN 01 00:21:59 1970 Rtt:3

--------------------------------------------------------------

The result of the rtr 2 schedule is as follows:

The ICMP-PATH-ECHO entity saves at most 100 history records. Currently, two history records are saved, and records are saved according to the schedule interval of 60s.

The round-trip delay is 3ms. If invalid is displayed, the network is unreachable. Because the entity sends only one ICMP packet, invalid means that this one packet was lost.


After scheduling, view the history records of rtr entity 3:



--------------------------------------------------------------

ID:3 Name:IcmpPathJitter3

History of hop-by-hop:

3.3.3.2 Rtt:1 Jitter:0 Pkt loss:0

1.1.1.2 Rtt:2 Jitter:0 Pkt loss:0

History of record from source to dest:

CurHistorySize:1 MaxHistorysize:100

THU JAN 01 02:30:03 1970 Rtt:2 Jitter:0 Pkt loss:0

--------------------------------------------------------------

The result of the rtr 3 schedule is as follows:

The ICMP-PATH-JITTER entity saves at most 100 history records. Currently, one history record is saved, and records are saved according to the schedule interval of 60s.

The network topology is as follows:

Source - router 1 - destination

The round-trip delay from the source to router 1 (3.3.3.2) is 1ms, the jitter is 0, and no packets are lost.

The round-trip delay from the source to destination 1.1.1.2 is 2ms, the jitter is 0, and no packets are lost.

The history from the source to the destination is then recorded:

The round-trip delay from the source to destination 1.1.1.2 is 2ms, the jitter is 0, and no packets are lost.


After scheduling, view the history records of rtr entity 4:



--------------------------------------------------------------

ID:4 Name:Jitter4 CurHistorySize:1 MaxHistorysize:120

History recorded as following:

THU JAN 01 00:16:06 1970 SdPktLoss:0 DsPktLoss:0 Rtt:16 SdDelay:11 DsDelay:15 SdJitter:10 DsJitter:10 Mos:4.300000 icpif:10.000000

--------------------------------------------------------------

The result of the rtr 4 schedule is as follows: it is a JITTER entity that saves at most 120 history records. Currently, one history record is saved.

No packets are lost from the source to the destination or from the destination to the source. The round-trip delay is 16ms. The one-way delay is 11ms from the source to the destination and 15ms from the destination to the source. The jitter is 10ms from the source to the destination and 10ms from the destination to the source. The MOS value is 4.3 and the ICPIF value is 10.0.

Note

1. When the number of history records has reached 120, each new record overwrites the oldest record.

2. NTP must be configured so that the clocks of the source and destination devices are synchronized, because one-way delays are calculated from timestamps taken on both devices.
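For a jitter entity, show rtr entity displays the MOS and ICPIF thresholds multiplied by 10^6, while the history record shows unscaled values such as Mos:4.300000. The Python sketch below converts between the two forms. The helper names are hypothetical, and the sketch assumes the displayed form is always an integer.

```python
SCALE = 1_000_000  # displayed threshold = actual value x 10^6 (per the explanation of entity 4)


def to_display(value: float) -> int:
    """Scale an actual MOS or ICPIF value to the integer form shown by show rtr entity."""
    return round(value * SCALE)


def from_display(displayed: int) -> float:
    """Recover the actual MOS or ICPIF value from the displayed threshold."""
    return displayed / SCALE


mos_threshold = from_display(10_000_000)     # Threshold-of-mos of entity 4
icpif_threshold = from_display(100_000_000)  # Threshold-of-icpif of entity 4
print(mos_threshold, icpif_threshold)        # 10.0 100.0
print(to_display(4.3))                       # 4300000, history value Mos:4.300000 in displayed form
```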



After configuring the RTR entity 5, view the history records of rtr entity 5:



--------------------------------------------------------------

ID:5 Name:UdpEcho5 CurHistorySize:2 MaxHistorysize:10

History recorded as following:

THU JAN 01 00:31:04 1970 Packet loss:0 Rtt:18(ms)

THU JAN 01 00:31:10 1970 Packet loss:0 Rtt:18(ms)

--------------------------------------------------------------

The result of the rtr 5 schedule is as follows:

The detection type is UDPECHO. At most 10 history records are saved, and currently two history records are saved.

The statistics after the entity is scheduled are as follows:

No packets are lost and the round-trip delay is 18ms.

Note

When the number of history records has reached 10, each new record overwrites the oldest record.

After configuring the RTR entity 6, view the history records of rtr entity 6:



--------------------------------------------------------------

ID:6 Name:flow-statistics6 CurHistorySize:2 MaxHistorysize:220

History recorded as following:

THU JAN 01 00:31:27 1970 Input pkt:1 (packets/s) Input flow:0(bits/s) Output pkt:1 (packets/s) Output flow:0(bits/s)

THU JAN 01 00:31:37 1970 Input pkt:1 (packets/s) Input flow:0(bits/s) Output pkt:1 (packets/s) Output flow:0(bits/s)

--------------------------------------------------------------

The result of the rtr 6 schedule is as follows:

It is a FLOW-STATISTICS entity. At most 220 history records are saved, and currently two history records are saved.

The traffic statistics of the interface are as follows:

The interface receives 1 packet/s, and the receiving traffic is 0bits/s. It sends 1 packet/s, and the sending traffic is 0bits/s.
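The FLOW-STATISTICS history reports rates in packets/s and bits/s. One common way to derive such rates is to sample the interface's cumulative packet and byte counters at the start and end of the statistics interval, then divide the differences by the interval. The Python sketch below does that. The manual does not state how the switch computes its rates, the CounterSample type is hypothetical, the counter values are made up for illustration, and counter wrap-around is not handled.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CounterSample:
    packets: int  # cumulative packet counter of the interface
    octets: int   # cumulative byte counter of the interface


def rates(before: CounterSample, after: CounterSample, interval_s: int) -> tuple[float, float]:
    """Return (packets/s, bits/s) over one statistics interval."""
    pkt_rate = (after.packets - before.packets) / interval_s
    bit_rate = (after.octets - before.octets) * 8 / interval_s
    return pkt_rate, bit_rate


# Hypothetical counters sampled 60 s apart (the Statistics-interval of entity 6).
print(rates(CounterSample(1000, 64000), CounterSample(1060, 67840), 60))  # (1.0, 512.0)
```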

SLA Debug Commands

debug rtr all: displays all SLA debug information

debug rtr icmpecho: debugs the detection of ICMPECHO entities

debug rtr icmp-path-echo: debugs the detection of ICMP-PATH-ECHO entities

debug rtr icmp-path-jitter: debugs the detection of ICMP-PATH-JITTER entities

debug rtr jitter: debugs the detection of jitter entities

debug rtr udpecho: debugs the detection of udpecho entities

debug rtr flow-statistics: debugs the detection of flow-statistics entities

debug rtr macping: debugs the detection of macping entities

debug rtr group: debugs rtr groups

debug rtr schedule: debugs rtr schedules

debug rtr responder: debugs the rtr responder

Enable debugging while an entity is performing detection to see the specific debug information.



VRRP Technology

This chapter describes the principles of the VRRP protocol and how it is implemented.

Main contents:

Related terms of VRRP protocol

Introduction to VRRP protocol


Related Terms of VRRP Protocol

VRRP: Virtual Router Redundancy Protocol

Master: A VRRP state. The active device is in this state and forwards IP packets.

Backup: A VRRP state. The standby device is in this state and takes over promptly when the active device fails.

Introduction to VRRP Protocol

VRRP is a redundancy backup protocol. Usually, each host in a network is configured with a default route. Packets whose destination addresses are not in the local segment are sent through this default route to the default gateway A, which lets the host communicate with external networks. When gateway A fails, all hosts in the local segment that use A as the next hop of their default route lose communication with external networks.



Here, the gateway can be any network device with IP forwarding capability, such as a switch or a router. For ease of understanding, the following text uses "router" to refer to the gateway.

VRRP is designed to solve this problem on LANs with multicast or broadcast capability (such as Ethernet). VRRP groups several routers on the LAN (one MASTER and several BACKUPs) into one virtual router, called a backup group.

The virtual router (that is, the backup group) has its own IP address, and each router in the backup group also has its own IP address. The hosts in the LAN only need to know the IP address of the virtual router; they do not need to know the IP address of the master router or of the backup routers. The hosts set the IP address of the virtual router as their default route, so they communicate with other networks through the virtual router. When the master router in the backup group fails, another router in the backup group becomes the new master. It continues to provide routing service for the hosts, so communication with external networks is not interrupted.

Basic Hierarchy of VRRP in TCP/IP

VRRP packets are carried directly in IP packets; the IP protocol number is 112 (0x70).

Structure of VRRP Packet

The VRRP packet contains the following fields:



Version: The version number; it is 2.

Type: The packet type; it is 1, indicating ADVERTISEMENT.

VRID: The Virtual Router Identifier (VRID) configured on the interface.

Priority: The priority configured on the interface. The router that owns the virtual IP address (the router whose interface IP address is the VIP) has priority 255. The priorities of other routers range from 1 to 254, and the default value is 100.

Count IP Addr: The number of virtual IP addresses; usually, it is 1.

AuthType: The authentication type:

0: no authentication; the AuthData field is all 0.

1: simple text authentication.

Advertise Interval: The interval between ADVERTISEMENT packets, in seconds; the default value is 1s.

IP Address: The virtual IP address.

Checksum: The checksum of the VRRP packet.

Auth Data: At most 8 characters; if there are fewer than 8 characters, the remainder is filled with 0.
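To show how these fields fit together on the wire, the Python sketch below packs a VRRP version 2 ADVERTISEMENT. It follows the field order and sizes of RFC 2338, which this manual's list summarizes. The checksum is the standard 16-bit one's complement checksum over the whole VRRP message, computed with the Checksum field set to 0. The function names are hypothetical, and the sketch does not model how the switch itself builds packets.

```python
from __future__ import annotations

import socket
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_advertisement(vrid: int, priority: int, vips: list[str],
                        auth_type: int = 0, password: str = "",
                        adver_int: int = 1) -> bytes:
    """Pack a VRRP version 2 ADVERTISEMENT (type 1) with a valid checksum."""
    if len(password) > 8:
        raise ValueError("Auth Data holds at most 8 characters")
    # AuthType 1 carries the password padded with 0; AuthType 0 sends all zeros.
    auth_data = password.encode("ascii").ljust(8, b"\x00") if auth_type == 1 else bytes(8)
    header = struct.pack(
        "!BBBBBBH",
        (2 << 4) | 1,  # Version 2 in the high nibble, Type 1 (ADVERTISEMENT) in the low nibble
        vrid,
        priority,
        len(vips),     # Count IP Addr
        auth_type,
        adver_int,     # Advertise Interval in seconds
        0,             # Checksum placeholder, filled in below
    )
    addresses = b"".join(socket.inet_aton(ip) for ip in vips)
    message = header + addresses + auth_data
    checksum = internet_checksum(message)
    return message[:6] + struct.pack("!H", checksum) + message[8:]


pkt = build_advertisement(vrid=1, priority=100, vips=["192.168.1.254"])
print(len(pkt), pkt[:4].hex())  # 20 21016401
```

A receiver can validate a packet by running internet_checksum over it unchanged: a correct packet yields 0.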

VRRP Workflow

Simply speaking, VRRP is a fault-tolerance protocol. It ensures that when the next-hop router of a host fails, another router takes over promptly, keeping communication continuous and reliable. To make VRRP work, configure the virtual router number and the virtual IP address on the routers. This adds a virtual router to the network, and the hosts communicate with the virtual router without needing any information about the physical routers. A virtual router consists of one master router and several backup routers. The master router does the actual forwarding. When the master router fails, a backup router becomes the new master router and takes over its work.

VRRP defines only one kind of packet, the VRRP packet, which is a multicast packet. The master router sends it to advertise its existence. The packet is also used to detect the parameters of the virtual router and to elect the master router.

VRRP defines three states: Initialize, Master and Backup. Only a router in the Master state services forwarding requests sent to the virtual IP address.

The VRRP protocol defined in RFC 2338 is based on Cisco's proprietary HSRP protocol. However, VRRP simplifies the HSRP mechanism and reduces the extra load that the redundancy function places on the network. For example, HSRP defines 6 states for the virtual router while VRRP defines only three, which reduces protocol complexity. In the stable state, routers in two HSRP states send packets. In VRRP, only the router in the Master state sends packets, and there is only one packet type, which reduces bandwidth usage. HSRP packets are carried over UDP, while VRRP packets are encapsulated directly in IP packets. In addition, VRRP supports using the actual IP address of an interface as the virtual IP address.

VRRP routers form different virtual routers according to the VRID. The routers that form a virtual router are divided into a master router and backup routers, which are determined by the following rules:

1. The master router is selected by priority. The router with the highest priority becomes the master router and enters the Master state. If the routers have the same priority, their interface IP addresses are compared, and the router with the larger IP address becomes the master router.

2. The other routers act as backup routers and monitor the status of the master router in real time. While the master router works normally, it periodically sends VRRP multicast packets (to 224.0.0.18) to inform the backup routers in the group that it is working normally. If a backup router does not receive packets from the master router within a certain time, it changes to Master. When there are multiple backup routers in the group, there may temporarily be multiple master routers. In this case, each master router compares the priority in a received VRRP packet with its local priority. If the local priority is smaller, it changes to Backup; otherwise it keeps its state. In this way, the router with the highest priority becomes the new master router, completing the VRRP backup function.
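Rule 1 above amounts to ordering routers by the pair (priority, interface IP address). The Python sketch below picks the master from a list of candidates. The Candidate type and elect_master function are hypothetical, and the sketch covers only this comparison, not timers or the exchange of advertisements.

```python
from __future__ import annotations

import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    priority: int      # 1-254 when configured, 255 for the IP address owner
    interface_ip: str


def elect_master(candidates: list[Candidate]) -> Candidate:
    """Highest priority wins; on a tie, the larger interface IP address wins."""
    return max(
        candidates,
        key=lambda c: (c.priority, ipaddress.IPv4Address(c.interface_ip)),
    )


routers = [
    Candidate("A", 100, "192.168.1.1"),
    Candidate("B", 100, "192.168.1.2"),  # same priority as A, larger IP address
    Candidate("C", 90, "192.168.1.3"),
]
print(elect_master(routers).name)  # B
```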

The virtual router has three states: Initialize, Master and Backup.

Master state:

Must answer ARP requests for the virtual IP address; the ARP response carries the MAC address corresponding to the virtual router IP address;

Forwards the packets that hosts send through the virtual IP address;

Must not accept packets whose destination IP address is the virtual router IP address (unless it is the IP address owner);

Must accept packets whose destination is the associated IP address (if it is the IP address owner);

Must send and receive protocol packets (multicast);

Sends gratuitous ARP packets when it changes to Master from another state.

Backup state:

Must not answer ARP requests for the virtual router IP address;

Must not accept packets whose destination IP address is the virtual router IP address;

Must not send protocol packets, but must receive protocol packets (multicast).

Initialize state:

Performs no operation except responding to the Startup event.

Transitions among the three states:



VRRP Features

VRRP has the following features:

Gateway backup: Multiple routers share one virtual IP address, so the failure of a single gateway does not cut off the many clients that use it, minimizing network black holes. This is the main function of VRRP.

Load balancing: This is a high value-added feature of VRRP. Multiple virtual routers back up multiple gateways, and terminals are configured with different virtual router IP addresses to balance the load.

Security: The exchange of VRRP protocol packets can be made more secure through authentication. VRRP defines two authentication modes: no authentication and simple clear-text password.

no authentication: In a secure network, you can set the authentication mode to NO. The router performs no authentication processing on sent and received VRRP packets, which can improve VRRP performance.

simple clear-text password: In a network that may face security threats, you can set the authentication mode to SIMPLE. The router puts the password into the VRRP packets it sends and checks the password in the VRRP packets it receives. If authentication fails, the illegal packet is discarded, which ensures that the VRRP protocol runs normally.

Debug Commands and Debug Information

1. Packet debug

debug vrrp packet or

debug vrrp interface _interface_ group _groupId_ packet

These commands print information about VRRP packets.

9:54:48: VRRP 1[vlan 1]: Send advertisement priority 100

This shows that the switch sends a VRRP packet from interface VLAN1; the VRID is 1 and the priority is 100.

1d14h: VRRP: vlan1 receive packet from 128.255.17.54

1d14h: VRRP: Version 2, Type 1, Vrid 1, Priority 100, AuthType 0, Adver_Interval 1

This shows that the switch receives a VRRP packet on interface VLAN1 and displays the packet contents in detail.

2. Event debug

debug vrrp event or

debug vrrp interface _interface_ group _groupId_ events

These commands show the state changes of the VRRP device in detail.

20:00:18: %LINEPROTO-5-UPDOWN: Line protocol on Interface vlan1, changed state to down

20:00:18: VRRP: vlan1 happen UP/DOWN

20:00:18: VRRP 1: Shutdown event happen

20:00:18: VRRP 1[vlan1]: Change state to INIT

VRRP changes to the INIT state.

20:00:28: %LINEPROTO-5-UPDOWN: Line protocol on Interface vlan1, changed state to up

20:00:28: VRRP: vlan1 happen UP/DOWN

20:00:28: VRRP 1: Startup event happen

20:00:28: VRRP 1[vlan1]: Change state from INIT to BACKUP

VRRP changes to the BACKUP state.

20:03:32: VRRP 1[vlan1]: Timeout event happend

20:03:32: VRRP 1[vlan1]: Change state from BACKUP to MASTER

VRRP changes to the MASTER state.
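The event debug output above follows the three-state model described earlier in this chapter. The Python sketch below models only the transitions mentioned in this chapter:

- Startup moves INIT to BACKUP.
- A timeout in BACKUP (no packets from the master) moves to MASTER.
- A received advertisement with a higher priority moves MASTER back to BACKUP.
- Shutdown returns any state to INIT.

The event names are hypothetical labels chosen to match the debug messages, and the sketch ignores timers and other special cases.

```python
from enum import Enum


class State(Enum):
    INIT = "INIT"
    BACKUP = "BACKUP"
    MASTER = "MASTER"


def next_state(state: State, event: str, local_priority: int = 100,
               received_priority: int = 0) -> State:
    """Return the VRRP state after one event (simplified model)."""
    if event == "shutdown":
        return State.INIT
    if state is State.INIT and event == "startup":
        return State.BACKUP
    if state is State.BACKUP and event == "timeout":
        return State.MASTER
    if (state is State.MASTER and event == "advertisement"
            and local_priority < received_priority):
        return State.BACKUP
    return state  # every other event leaves the state unchanged


# Replay the debug log above, starting from an arbitrary state.
s = State.MASTER
for ev in ("shutdown", "startup", "timeout"):
    s = next_state(s, ev)
    print(s.name)  # INIT, then BACKUP, then MASTER
```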



VBRP Technology

This chapter describes the principles of the VBRP protocol and how it is implemented.

Main contents:

VBRP protocol terms

Introduction to VBRP protocol


VBRP Protocol Terms

VBRP: Virtual Backup Router Protocol, compatible with Cisco's HSRP protocol

HSRP: Hot Standby Router Protocol

Active Router: The active device, which is responsible for forwarding packets

Standby Router: The standby device

Standby Group: A group of devices running VBRP that together maintain one virtual device

Introduction to VBRP Protocol

The VBRP protocol provides device backup. Through a virtual IP address, multiple devices (switches or routers) are presented as one device. Even if one device fails, another device takes over its work, which improves network stability.



As shown in the above figure, two devices, each with its own IP address, are in the same network. Normally, the user must select one of the two devices as the default gateway, so the failure rate of the user network depends on the failure rate of that device. However, if VBRP is configured on the two devices, they form one logical device with a separate virtual IP address, which is used as the default gateway of the hosts. At any given time, one device is the active device and the other is the standby device. The active device forwards and processes the user's data traffic. When the active device fails, the standby device takes over all of its work and becomes the new active device. This reduces the failure rate of the network to the rate at which both devices fail at the same time.
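To make the last sentence concrete, suppose each device is down with probability p and failures are independent. Independence is an assumption: a shared power supply or a shared upstream link breaks it. A single gateway is then down with probability p, while the VBRP pair is down only when both devices are down at once, which has probability p1 x p2. The Python sketch below computes this for illustrative values. The function name is hypothetical.

```python
def down_probability(*per_device: float) -> float:
    """Probability that every device in the group is down at once (independent failures)."""
    result = 1.0
    for p in per_device:
        result *= p
    return result


print(down_probability(0.01))        # one gateway: 0.01
print(down_probability(0.01, 0.01))  # VBRP pair: about 0.0001
```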

Basic Hierarchy of VBRP in TCP/IP

The VBRP packet is a UDP packet. Both the source and destination ports are 1985.

VBRP Packet Format

The VBRP packet contains the following fields:



Version: The version number; it is 0.

Op code: The packet type: 0-Hello, 1-Coup, and 2-Resign.

Hello message: Indicates that the router is running and can become the active or standby device.

Coup message: Sent when a device wants to become the active device.

Resign message: Sent when a device no longer wants to be the active device.

State: The current state of the device:

0x00-Initial, 0x01-Learn, 0x02-Listen, 0x04-Speak, 0x08-Standby, 0x10-Active.

Hellotime: The Hello interval of the sender of the Hello packet, in seconds. The field is meaningful only in Hello packets, and a router that sends a Hello packet must fill in its own Hellotime. By default, Hellotime is 3s.

Holdtime: The validity period of the Hello packet, in seconds. The field is meaningful only in Hello packets. The receiver of a Hello packet uses the Holdtime in the packet as the validity period of that packet. Holdtime should be at least 3 times Hellotime.

Priority: The priority used to elect the active and standby devices. The device with the larger value is preferred. If devices have the same priority, the one with the larger IP address is preferred.

Group: The standby group number; the value range is 0-255.

Authentication Data: The 8-byte authentication password. If no authentication password is configured, the default value is 0x63, 0x69, 0x73, 0x63, 0x6F, 0x00, 0x00, and 0x00 ("cisco" padded with zeros).

Virtual IP Address: The virtual IP address used by the standby group.
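Since VBRP is compatible with HSRP, the fields above can be laid out using the 20-byte HSRP version 0 message format of RFC 2281. That format also has a one-byte Reserved field after Group, which the list above does not mention. The Python sketch below packs a Hello message. The function and enum names are hypothetical, and the Holdtime default of 10s is an illustrative value chosen to satisfy the "at least 3 times Hellotime" rule, not a value stated in this manual.

```python
from __future__ import annotations

import socket
import struct
from enum import IntEnum

VBRP_UDP_PORT = 1985  # both source and destination port


class OpCode(IntEnum):
    HELLO = 0
    COUP = 1
    RESIGN = 2


class State(IntEnum):
    INITIAL = 0x00
    LEARN = 0x01
    LISTEN = 0x02
    SPEAK = 0x04
    STANDBY = 0x08
    ACTIVE = 0x10


DEFAULT_AUTH = b"cisco"  # struct pads it with 0x00 to the 8-byte field


def build_vbrp(op: OpCode, state: State, group: int, priority: int, vip: str,
               hellotime: int = 3, holdtime: int = 10,
               auth: bytes = DEFAULT_AUTH) -> bytes:
    """Pack one version 0 message (20 bytes, RFC 2281 layout)."""
    if holdtime < 3 * hellotime:
        raise ValueError("Holdtime should be at least 3 times Hellotime")
    if len(auth) > 8:
        raise ValueError("authentication data is at most 8 bytes")
    return struct.pack(
        "!8B8s4s",
        0,          # Version
        op, state, hellotime, holdtime, priority, group,
        0,          # Reserved (present in RFC 2281; not listed in this manual)
        auth,       # "8s" pads shorter data with 0x00
        socket.inet_aton(vip),
    )


pkt = build_vbrp(OpCode.HELLO, State.ACTIVE, group=1, priority=100, vip="192.168.1.254")
print(pkt.hex())  # 000010030a640100636973636f000000c0a801fe
```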



VBRP Workflow

To make VBRP work, first create a virtual IP address. This adds a virtual device to the network, but the hosts on the network communicate with the virtual device without needing any information about the physical devices. One VBRP device is designated as the active device, and another physical device serves as the standby in case the active device fails. The active device responds not only to its own IP address but also to the virtual IP address.

When a host sends a packet to a network other than the local network, the host configuration indicates that the next hop of the packet is the default gateway. The IP address of the default gateway is configured, but to send an Ethernet frame to that device, the host needs its MAC address. The host therefore sends an ARP request to query the MAC address of the default gateway. No real host on the network owns the MAC address of the virtual device, so the active device responds to the ARP request. The active device monitors and handles all traffic sent to the virtual IP address, so the traffic appears to be routed to the active device.

Devices configured with VBRP use UDP hello packets to advertise their existence. These advertisements are used to detect device failures and to negotiate parameters such as the virtual IP address and the authentication password. They are also used to elect the active device. At any time, there can be only one active device and one standby device on the network. All other devices configured in the same standby group stay in the Listen state until the next election, which happens when the active or standby device becomes unavailable.

VBRP defines three types of packets. The first is the Hello packet, which is sent by the active device, the standby device, and any device in the SPEAK state to inform group members of their existence. The Hello packet also carries configuration parameters such as the virtual IP address and timer values, so a device on which these parameters are not configured can learn them from Hello packets.

The second is the Resign packet. The active device sends a Resign packet when it leaves the VBRP group, for example because its configuration changes or the device is disabled.

The third is the Coup packet. It is sent when, because of the preempt configuration, a device takes over from the current active device. If that device is the standby device with the highest priority, it becomes the active device.

The VBRP protocol has six states: INITIAL, LEARN, LISTEN, SPEAK, STANDBY, and ACTIVE.

1. INITIAL state

All devices start in the INITIAL state, which indicates that VBRP is not running. A device also enters this state when its interface is DOWN or goes DOWN.

2. LEARN state

In the LEARN state, the device waits for a Hello packet from the ACTIVE device in order to learn the virtual IP address (VIP). This state occurs when a device is configured with a virtual device group but not with the VIP.

3. LISTEN state

In the LISTEN state, the device knows the VIP but is neither the ACTIVE device nor the STANDBY device. It only receives protocol packets from the ACTIVE and STANDBY devices. If it receives no protocol packets from one of these devices within a certain time, it changes state to take part in the election of the ACTIVE or STANDBY device. All devices other than the ACTIVE and STANDBY devices are in the LISTEN state.

4. SPEAK state

In the SPEAK state, the device sends periodic Hello packets and takes part in the election of the ACTIVE/STANDBY device. A device cannot enter the SPEAK state until it knows the VIP.

5. STANDBY state

In the STANDBY state, the device is the candidate to become the next ACTIVE device and sends periodic Hello packets. A virtual device group can have only one STANDBY device.

6. ACTIVE state



In the ACTIVE state, the device forwards packets sent to the virtual MAC address of the virtual device group and answers ARP requests whose destination IP address is the VIP. The ACTIVE device sends periodic Hello packets. A virtual device group can have only one ACTIVE device.
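The states above, together with the transitions that appear in the debug output later in this section, can be modeled as a small table-driven state machine. This Python sketch is a simplified illustration: the event names are hypothetical labels rather than VBRP protocol fields, and only transitions described in this section are included.

```python
from enum import Enum

class State(Enum):
    INITIAL = "Init"
    LEARN = "Learn"
    LISTEN = "Listen"
    SPEAK = "Speak"
    STANDBY = "Standby"
    ACTIVE = "Active"

# (current state, event) -> next state. Event names are hypothetical labels.
TRANSITIONS = {
    (State.INITIAL, "enabled_with_vip"): State.LISTEN,
    (State.INITIAL, "enabled_without_vip"): State.LEARN,
    (State.LEARN, "vip_learned_from_hello"): State.LISTEN,
    (State.LISTEN, "standby_timer_expired"): State.SPEAK,
    (State.SPEAK, "standby_timer_expired"): State.STANDBY,
    (State.STANDBY, "active_timer_expired"): State.ACTIVE,
    (State.ACTIVE, "hello_from_higher_priority_active"): State.SPEAK,
    (State.STANDBY, "hello_from_lower_priority_active_preempt"): State.ACTIVE,
}

def next_state(state: State, event: str) -> State:
    """Return the state after an event; events with no matching transition leave the state unchanged."""
    if event == "interface_down":
        return State.INITIAL  # an interface going DOWN always returns the group to INITIAL
    return TRANSITIONS.get((state, event), state)

# Replay the sequence seen in the "debug standby events" output.
state = State.INITIAL
for ev in ["enabled_with_vip", "standby_timer_expired", "standby_timer_expired", "active_timer_expired"]:
    state = next_state(state, ev)
print(state.value)  # Active
```

A lookup table keeps each transition visible in one place; the interface-down rule is handled separately because it applies from every state.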

VBRP Functions

1. Gateway backup: Multiple devices share one virtual IP address, so the failure of a single gateway does not interrupt the network, and traffic black holes are minimized. This is the main function of VBRP.

2. Load balancing: Configure two or more virtual device groups on one interface. While the groups are running normally, they share the forwarding of the segment's packets. When one device fails, the other devices take over its work; when the fault is fixed, the load is shared again.

3. Tracking: Track the status of important interfaces and adjust the priority when the status of a tracked interface changes. When the priority changes enough, a state transition occurs (for example, when a tracked interface on the device in the standby state changes from DOWN to UP, its priority may rise above the priority of the ACTIVE device), which provides backup when another link fails.

4. Remote login: When the virtual IP address is the same as the IP address of an interface, you can remotely log in to the device in the ACTIVE state.

5. Security authentication: VBRP provides 8-byte plain-text authentication.
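The interaction between priority tracking and the election can be sketched as follows. This Python example is a hypothetical model: the per-interface priority decrement, the tie-break on the higher IP address (borrowed from similar protocols), and the class and function names are assumptions, and preemption is not modeled (the election is simply recomputed).

```python
from dataclasses import dataclass, field
from ipaddress import IPv4Address
from typing import Dict, List, Optional, Tuple

@dataclass
class VbrpRouter:
    address: str
    priority: int
    # tracked interface -> (is_up, priority decrement applied while the interface is down)
    tracked: Dict[str, Tuple[bool, int]] = field(default_factory=dict)

    def effective_priority(self) -> int:
        penalty = sum(dec for up, dec in self.tracked.values() if not up)
        return max(self.priority - penalty, 0)

def elect(routers: List[VbrpRouter]) -> Tuple[VbrpRouter, Optional[VbrpRouter]]:
    """Pick the ACTIVE and STANDBY routers by effective priority.
    Ties are broken by the higher IP address (assumption)."""
    ranked = sorted(routers, key=lambda r: (r.effective_priority(), IPv4Address(r.address)), reverse=True)
    return ranked[0], (ranked[1] if len(ranked) > 1 else None)

r1 = VbrpRouter("128.255.17.54", 110, {"uplink": (False, 20)})  # uplink down: 110 - 20 = 90
r2 = VbrpRouter("128.255.16.3", 100)
active, standby = elect([r1, r2])
print(active.address)  # 128.255.16.3
r1.tracked["uplink"] = (True, 20)                                # uplink comes back UP
active, standby = elect([r1, r2])
print(active.address)  # 128.255.17.54
```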

Debug Command and Debug Information

1. Packet debug

debug standby packets hello

This command prints information about Hello packets.

00:28:18: VBRP: vlan1 Grp 0 Hello out 128.255.17.54 Active Pri 100 vIP 128.255.17.1



The output above shows that interface vlan1 sends a VBRP Hello packet. The VBRP group number is 0, the primary address of the interface is 128.255.17.54, the current state is Active, the priority is 100, and the virtual IP address is 128.255.17.1.

00:38:44: VBRP: vlan1 Grp 0 Hello in 128.255.16.3 Standby pri 100 vIP 128.255.17.1

The output above shows that interface vlan1 receives a VBRP Hello packet. The VBRP group number is 0, the source address of the sender is 128.255.16.3, the sender's current state is Standby, its priority is 100, and the virtual IP address is 128.255.17.1.

Only VBRP devices in the Speak, Standby, or Active state send Hello packets.
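When collecting packet debug output for analysis, it can help to split each message into its fields. The Python sketch below assumes each message is on a single line as printed on the console (the examples in this manual are wrapped); the regular expression is derived only from the sample messages shown here, and the field names are hypothetical.

```python
import re
from typing import Optional

# One packet-debug message per line, for example:
# "00:28:18: VBRP: vlan1 Grp 0 Hello out 128.255.17.54 Active Pri 100 vIP 128.255.17.1"
PACKET_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}): VBRP: (?P<intf>\S+) Grp (?P<group>\d+) "
    r"(?P<type>Hello|Coup|Resign) (?P<direction>in|out) (?P<address>\S+) "
    r"(?P<state>\S+) [Pp]ri (?P<priority>\d+) vIP (?P<vip>\S+)$"
)

def parse_packet_debug(line: str) -> Optional[dict]:
    """Split a VBRP packet-debug message into named fields; return None if it does not match."""
    m = PACKET_RE.match(line.strip())
    if m is None:
        return None
    fields = m.groupdict()
    fields["group"] = int(fields["group"])
    fields["priority"] = int(fields["priority"])
    return fields

msg = parse_packet_debug("00:38:44: VBRP: vlan1 Grp 0 Hello in 128.255.16.3 Standby pri 100 vIP 128.255.17.1")
print(msg["direction"], msg["address"], msg["state"], msg["priority"])  # in 128.255.16.3 Standby 100
```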

debug standby packets coup

This command prints information about Coup packets.

00:28:18: VBRP: vlan1 Grp 0 Coup out 128.255.17.54 Active Pri 100 vIP 128.255.17.1

The output above shows that interface vlan1 sends a VBRP Coup packet. The VBRP group number is 0, the primary address of the interface is 128.255.17.54, the current state is Active, the priority is 100, and the virtual IP address is 128.255.17.1.

02:43:54: VBRP: vlan1 Grp 0 Coup in 128.255.16.3 Active pri 110 vIP 128.255.17.1

The output above shows that interface vlan1 receives a VBRP Coup packet. The VBRP group number is 0, the source address of the sender is 128.255.16.3, the sender's current state is Active, its priority is 110, and the virtual IP address is 128.255.17.1.

debug standby packets resign

This command prints information about Resign packets.

02:46:26: VBRP: vlan1 Grp 0 Resign out 128.255.17.54 Active Pri 100 vIP unknown

02:45:37: VBRP: vlan1 Grp 0 Resign in 128.255.16.3 Active pri 110 vIP 0.0.0.0

The two messages above show that interface vlan1 sends and receives Resign packets, respectively.

debug standby packets detail

This command is used together with the debug commands above to print the details of the specified packets, for example:



r2#debug standby packets detail

r2#debug standby packets hello

02:50:30: VBRP: vlan1 Grp 0 Hello out 128.255.17.54 Active Pri 100 vIP 128.255.17.1

02:50:30: hel 3 hol 10 auth cisco

The output above shows the details of the Hello packet: the Hellotime is 3 s, the Holdtime is 10 s, and the authentication password is cisco.

2. Event debug

debug standby events

This is an important debug command. It shows the state changes of the VBRP device.


changed state to up

03:01:15: VBRP: vlan1 API Software interface going up

03:01:15: VBRP: vlan1 Grp 0 Init: a/VBRP enabled

03:01:15: VBRP: vlan1 Grp 0 Init -> Listen

When the interface configured with VBRP comes up, VBRP first changes from the Init state to the Listen state.

03:01:25: VBRP: vlan1 Grp 0 Listen: d/Standby timer expired (unknown)

03:01:25: VBRP: vlan1 Grp 0 Listen -> Speak

No Hello packet is received from another device before the Standby timer expires, so VBRP changes from Listen to Speak.

03:01:25: VBRP: vlan1 Grp 0 Speak: c/Active timer expired (unknown)

03:01:35: VBRP: vlan1 Grp 0 Speak: d/Standby timer expired (unknown)

03:01:35: VBRP: vlan1 Grp 0 Standby router is local, was unknown

03:01:35: VBRP: vlan1 Grp 0 Speak -> Standby

No Hello packet is received from another device before the Standby timer expires, so VBRP changes from Speak to Standby.

03:01:35: VBRP: vlan1 Grp 0 Standby: c/Active timer expired (unknown)

03:01:35: VBRP: vlan1 API MAC address update

03:01:35: VBRP: vlan1 Grp 0 Active router is local, was unknown

03:01:35: VBRP: vlan1 Grp 0 Standby router is unknown, was local

03:01:35: VBRP: vlan1 Grp 0 Standby -> Active

No Hello packet is received from an active device before the Active timer expires, so VBRP changes from Standby to Active.



r2(config-if-vlan1)#shutdown


changed state to down

03:08:32: VBRP: vlan1 API Software interface going down

03:08:32: VBRP: vlan1 Grp 0 Active: b/VBRP disabled


03:08:32: VBRP: vlan1 Grp 0 Active router is unknown, was local

03:08:32: VBRP: vlan1 Grp 0 Active -> Init

Interface vlan1 goes down, so VBRP changes from Active to Init.

The following debug output shows the transition from Active to Standby.

03:11:53: VBRP: vlan1 Grp 0 Active: g/Hello rcvd from higher pri Active router (110/128.255.16.3)


03:11:53: VBRP: vlan1 Grp 0 Active router is 128.255.16.3, was local

03:11:53: VBRP: vlan1 Grp 0 Active -> Speak

The Active device receives a Hello packet with a higher priority from another device (128.255.16.3). That device is configured with preemption, so the local device gives up the Active role and enters the Speak state.

03:11:56: VBRP: vlan1 Grp 0 Speak: g/Hello rcvd from higher pri Active router (110/128.255.16.3)

03:12:03: VBRP: vlan1 Grp 0 Speak: d/Standby timer expired (unknown)

03:12:03: VBRP: vlan1 Grp 0 Standby router is local, was unknown

03:12:03: VBRP: vlan1 Grp 0 Speak -> Standby

No Hello packet is received from another Standby device, so the device changes from Speak to Standby.

The priority of the Standby device is then set to 200, and the device becomes Active:

r2(config-if-vlan1)# standby priority 200

03:20:29: VBRP: vlan1 Grp 0 Standby: h/Hello rcvd from lower pri Active router (110/128.255.16.3)


03:20:29: VBRP: vlan1 Grp 0 Active router is local, was 128.255.16.3

03:20:29: VBRP: vlan1 Grp 0 Standby router is unknown, was local

03:20:29: VBRP: vlan1 Grp 0 Standby -> Active



IPFIX Technology

Overview

This chapter describes the working principle of IPFIX.

Main contents:

Terms

Introduction to the principle

Terms

IPFIX: IP Flow Information Export.

IPFIX packets: The packets that the IPFIX module sends to the IPFIX workstation. They carry the IP flow statistics that IPFIX collects on the network device. IPFIX packets are UDP packets assembled in the NetFlow v9 format.

IP flow: The IP packets processed by the network device are categorized by ingress port, protocol ID, source address, destination address, ToS field, TCP/UDP source port, and TCP/UDP destination port. Each category is an IP flow.

IPFIX flow record template: A type of IPFIX packet that defines the format of the subsequent IPFIX flow record packets.

IPFIX option record template: A type of IPFIX packet that defines the format of the subsequent IPFIX option record packets.

IPFIX flow record: A type of IPFIX packet that records the statistics of an IP flow.

IPFIX option record: A type of IPFIX packet that records statistical options of IPFIX that are not related to any single IP flow.



Introduction to the Principle

Main contents:

IPFIX working flow

IPFIX restrictions

IPFIX packet structure

IPFIX Working Flow

When the IPFIX function is enabled in the system, IP packets are classified into IP flows by ingress port, protocol ID, source address, destination address, ToS field, TCP/UDP source port, and TCP/UDP destination port. Each IP flow is counted independently. IPFIX periodically assembles the flow statistics into IPFIX packets and sends them to the specified IPFIX server. The IPFIX server provides graphical display and calculation capabilities; it analyzes the flow statistics in the IPFIX packets and gives network administrators material for traffic monitoring and management.
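The classification step can be modeled in software as counting packets and bytes per flow key, where the key is made of the fields that define an IP flow under Terms (ingress port, protocol ID, source and destination addresses, ToS, and TCP/UDP ports). On the switch this is done by the chip; the Python sketch below is illustrative only, and the packet dictionary field names are hypothetical.

```python
from collections import defaultdict, namedtuple

# Flow key built from the fields that define an IP flow.
FlowKey = namedtuple("FlowKey", "in_port protocol src dst tos src_port dst_port")

def count_flows(packets):
    """Group packets into IP flows and count packets and bytes per flow."""
    stats = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for p in packets:
        key = FlowKey(p["in_port"], p["protocol"], p["src"], p["dst"],
                      p["tos"], p["src_port"], p["dst_port"])
        stats[key]["packets"] += 1
        stats[key]["bytes"] += p["length"]
    return dict(stats)

pkts = [
    {"in_port": "0/0/1", "protocol": 6, "src": "10.0.0.1", "dst": "10.0.1.1",
     "tos": 0, "src_port": 40000, "dst_port": 80, "length": 1500},
    {"in_port": "0/0/1", "protocol": 6, "src": "10.0.0.1", "dst": "10.0.1.1",
     "tos": 0, "src_port": 40000, "dst_port": 80, "length": 60},
    {"in_port": "0/0/1", "protocol": 17, "src": "10.0.0.2", "dst": "10.0.1.1",
     "tos": 0, "src_port": 5000, "dst_port": 53, "length": 80},
]
for key, s in count_flows(pkts).items():
    print(key.protocol, key.dst_port, s)
```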

When IPFIX is enabled on the switch, the simplest procedure is as follows:

1. Determine the ports on which to monitor traffic. These ports are called observation points.

2. On the observation points, use the ipfix ingress/egress command to enable IPFIX traffic monitoring. ipfix ingress monitors the IP flows received on the observation point; ipfix egress monitors the IP flows sent from the observation point.

3. Configure the IPFIX server address and UDP destination port number. IPFIX packets are sent to this destination address and UDP destination port.

After this configuration is complete, the IP traffic forwarded by the observation points is divided into IP flows for processing and counting. The historical IP flow statistics are periodically delivered to the IPFIX module. The IPFIX module assembles the received statistics into IPFIX packets, fills in the destination address and destination UDP port number from the configuration, and sends the packets.

The interval at which IP flow statistics are delivered to IPFIX is determined by the IPFIX inactive timer configured on the port. The inactive timer specifies the aging time of a flow: if no packet matches an existing flow within the inactive time, the flow record expires. When the inactive timer of a flow record times out, the statistics of the flow are delivered to IPFIX.
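The inactive-timer aging described above can be sketched as a scan over a flow cache. In this Python illustration the flow keys are plain strings for brevity, the 30-second timeout is a hypothetical value, and treating "exactly equal to the timeout" as expired is an assumption.

```python
def expire_inactive(flows, now, inactive_timeout):
    """Split a flow cache into expired and still-active records.

    flows maps a flow key to a record holding 'last_seen' (seconds) and its counters.
    A record expires when no packet has matched it for inactive_timeout seconds.
    """
    expired, remaining = {}, {}
    for key, record in flows.items():
        if now - record["last_seen"] >= inactive_timeout:
            expired[key] = record     # these statistics would be delivered to IPFIX for export
        else:
            remaining[key] = record
    return expired, remaining

cache = {
    "flow-a": {"last_seen": 100, "packets": 12},
    "flow-b": {"last_seen": 130, "packets": 3},
}
expired, cache = expire_inactive(cache, now=140, inactive_timeout=30)
print(sorted(expired), sorted(cache))  # ['flow-a'] ['flow-b']
```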

IPFIX Restrictions

The IPFIX restrictions on the switch are as follows:

1. IPFIX flow records are maintained by the switching chip, not by software. A switch whose switching chip does not support IPFIX cannot provide the IPFIX function.

2. For ingress flow statistics, only unicast flows are counted, that is, flows that the chip forwards through a single port rather than multiple ports (flooded traffic is not counted). Egress flow statistics are not restricted.

IPFIX Packet Structure

The IPFIX packet complies with the NetFlow v9 format. It consists of a packet header followed by one or more FlowSets.

Packet Header

Figure 32-1 Format of IPFIX Packet Header

Version: the format version, 0x0009 (NetFlow v9).

Count: the number of records carried in the packet.

System Uptime: the running time of the device, in milliseconds.

UNIX Seconds: the number of seconds since 0000 UTC, 1 January 1970.

Sequence: the sequence number of the packet, incremented for each packet sent.

Source ID: always 0.
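Since Figure 32-1 is not reproduced here, the field widths in this sketch are taken from the NetFlow v9 specification (RFC 3954): Version and Count are 16 bits each, and the remaining four fields are 32 bits each, all in network byte order, for a 20-byte header. The function name `pack_header` is hypothetical.

```python
import struct
import time
from typing import Optional

# NetFlow v9 header: Version, Count (16 bits each); System Uptime, UNIX Seconds,
# Sequence, Source ID (32 bits each); network byte order, 20 bytes total.
V9_HEADER = struct.Struct("!HHIIII")

def pack_header(count: int, sys_uptime_ms: int, sequence: int,
                unix_secs: Optional[int] = None, source_id: int = 0) -> bytes:
    if unix_secs is None:
        unix_secs = int(time.time())  # seconds since 0000 UTC, 1 January 1970
    return V9_HEADER.pack(0x0009, count, sys_uptime_ms & 0xFFFFFFFF,
                          unix_secs, sequence & 0xFFFFFFFF, source_id)

hdr = pack_header(count=2, sys_uptime_ms=123456, sequence=7, unix_secs=1_600_000_000)
print(len(hdr), V9_HEADER.unpack(hdr))  # 20 (9, 2, 123456, 1600000000, 7, 0)
```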



FlowSet

There are two kinds of FlowSet: Template FlowSet and Data FlowSet. One IPFIX packet can contain multiple FlowSets.

Template FlowSet

A Template FlowSet consists of one or more template records. Each template record defines a template, which describes how to interpret the corresponding data records. The IPFIX server interprets subsequently received data according to the templates it has received.

Templates are classified into flow record templates and option record templates. A flow record template defines how to interpret flow records; an option record template defines how to interpret option records.

The format of the FlowSet composed of flow record template is as follows:

Figure 32-2 Template FlowSet format of the flow template

FlowSet ID: a FlowSet composed of flow record templates uses ID 0.

Length: the total length of the FlowSet.

Template ID: used to match data records with this template. Template IDs start from 256.

Field Count: the number of fields in the template record.

Field Type: the type of the field, expressed as a number (see the field type table below).

Field Length: the length of the field defined by Field Type, in bytes.
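A Template FlowSet with one flow record template can be assembled field by field as listed above. In this Python sketch, each Field Type/Field Length pair is assumed to be two 16-bit values (as in NetFlow v9), the field type numbers come from the field type table in this section, and the 4-byte field lengths and the function name are illustrative choices.

```python
import struct
from typing import List, Tuple

def pack_template_flowset(template_id: int, fields: List[Tuple[int, int]]) -> bytes:
    """Build a Template FlowSet (FlowSet ID 0) holding a single template record.

    fields is a list of (Field Type, Field Length) pairs.
    """
    if template_id < 256:
        raise ValueError("Template IDs start from 256")
    record = struct.pack("!HH", template_id, len(fields))  # Template ID, Field Count
    for field_type, field_length in fields:
        record += struct.pack("!HH", field_type, field_length)
    length = 4 + len(record)  # 4 bytes for FlowSet ID and Length, plus the record
    return struct.pack("!HH", 0, length) + record

# IPV4_SRC_ADDR(8), IPV4_DST_ADDR(12), IN_BYTES(1), IN_PKTS(2), 4 bytes each
tmpl = pack_template_flowset(256, [(8, 4), (12, 4), (1, 4), (2, 4)])
print(len(tmpl), struct.unpack("!HHHH", tmpl[:8]))  # 24 (0, 24, 256, 4)
```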

The format of the FlowSet composed of option record template is as

follows:



Figure 32-3 FlowSet format of the option template

FlowSet ID: a FlowSet composed of option templates uses ID 1.

Length: the total length of the FlowSet, including padding.

Template ID: used to match data records with this template; it is greater than 255.

Option Scope Length: the number of bytes in the Scope fields.

Options Length: the number of bytes in the Option fields.

Scope Field Type: the type of scope that the related IPFIX data refers to: 0x1 = system, 0x2 = interface, 0x3 = line card, 0x4 = IPFIX cache, 0x5 = template.

Scope Field Length: the length of the Scope field, in bytes.

Option Field Type: the type of the option data; it uses the same values as Field Type in the flow template.

Option Field Length: the length of the option data, in bytes.

Padding: pads the FlowSet to a 32-bit boundary.

The types of the fields used in the IPFIX template are as follows:

Type value  Name                 Description
42          TOTAL_FLOWS_EXP      Total exported flow records
41          TOTAL_PKTS_EXP       Total exported IPFIX packets
1           IN_BYTES             Input bytes
2           IN_PKTS              Input packets
21          LAST_SWITCHED        Time the last packet of the flow was hit
22          FIRST_SWITCHED       Time the flow was created
8           IPV4_SRC_ADDR        Source IP address
12          IPV4_DST_ADDR        Destination IP address
10          INPUT_SNMP           MIB index of the input interface
14          OUTPUT_SNMP          MIB index of the output interface
15          IPV4_NEXT_HOP        IPv4 address of the next hop
7           L4_SRC_PORT          Source port number
11          L4_DST_PORT          Destination port number
4           PROTOCOL             Protocol
5           SRC_TOS              Source ToS
9           SRC_MASK             Source mask length
13          DST_MASK             Destination mask length
6           TCP_FLAGS            TCP flags
32          ICMP_TYPE            ICMP type
16          SRC_AS               BGP AS of the source route
17          DST_AS               BGP AS of the destination route
18          BGP_IPV4_NEXT_HOP    BGP route gateway
23          OUT_BYTES            Output bytes
24          OUT_PKTS             Output packets

Data FlowSet

Figure 32-4 Packet structure of the Data FlowSet



FlowSet ID: corresponds to the Template ID; the IPFIX server interprets the data according to this correspondence.

Length: the total length of the FlowSet.

Padding: pads the FlowSet length to a multiple of 32 bits. The Length field includes the padding.
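The padding rule above means the FlowSet length is rounded up to the next multiple of 4 bytes, with Length counting the padding. This Python sketch builds a Data FlowSet for a hypothetical template 257 whose fields are IPV4_SRC_ADDR, IPV4_DST_ADDR, IN_PKTS (4 bytes each), and PROTOCOL (1 byte); the template layout and function name are assumptions for illustration.

```python
import ipaddress
import struct

def pack_data_flowset(template_id: int, records, record_format: str) -> bytes:
    """Build a Data FlowSet whose FlowSet ID equals the Template ID.

    record_format is a struct format string mirroring the template's field order and lengths.
    """
    rec = struct.Struct("!" + record_format)
    body = b"".join(rec.pack(*r) for r in records)
    unpadded = 4 + len(body)       # FlowSet ID + Length, then the records
    padding = (-unpadded) % 4      # bytes needed to reach a 32-bit boundary
    length = unpadded + padding    # the Length field includes the padding
    return struct.pack("!HH", template_id, length) + body + b"\x00" * padding

src = ipaddress.IPv4Address("10.0.0.1").packed
dst = ipaddress.IPv4Address("10.0.1.1").packed
data = pack_data_flowset(257, [(src, dst, 2, 6)], "4s4sIB")  # 13-byte record
print(len(data), struct.unpack("!HH", data[:4]))  # 20 (257, 20)
```

With a 13-byte record, the unpadded FlowSet is 17 bytes, so 3 padding bytes bring it to 20.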



Port Isolation Technology

This chapter describes the port isolation technology of the switch.

Configure Port Isolation

Main contents:

Introduction to port isolation

Application instance of port isolation

Introduction to Port Isolation

Port isolation is a port-based security feature. By specifying the isolated ports of a port, the user can isolate Layer 2 and Layer 3 traffic between that port and its isolated ports, which improves network security and provides flexible networking options.

By default, packets can be forwarded between any two ports in the same VLAN of the switch. To prevent specific ports in a VLAN from communicating, configure the isolated ports in the port configuration mode of a specified port; the port configured with port isolation then cannot communicate with the specified isolated ports.

Port isolation is independent of the port's VLAN. Currently, the switch supports configuring isolated ports in both common port and aggregation port configuration modes, and an isolated port can be a common port or an aggregation port. Port isolation drops packets in one direction only. Suppose that ports B, C, and D are configured as the isolated ports of port A. A packet that enters port A and whose destination port is B, C, or D is dropped. However, a packet that enters port B, C, or D and whose destination port is A is forwarded normally.
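The one-directional drop rule in the example above can be expressed as a lookup keyed by the ingress port. This Python sketch is a conceptual model only (the switch enforces the rule in hardware), and the function name is hypothetical.

```python
from typing import Dict, Set

def is_forwarded(ingress: str, egress: str, isolation: Dict[str, Set[str]]) -> bool:
    """Return False if the egress port is one of the isolated ports configured on the ingress port."""
    return egress not in isolation.get(ingress, set())

# Port A is configured with isolated ports B, C, and D.
isolation = {"A": {"B", "C", "D"}}
print(is_forwarded("A", "B", isolation))  # False: dropped
print(is_forwarded("B", "A", isolation))  # True: the reverse direction is not isolated
```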



Port Isolation Application

Application Instance 1

Application instance of port isolation

Illustration

Three ports of the switch are connected to three terminal devices: ports 0/0/1, 0/0/2, and 0/0/3 are connected to terminal 1, terminal 2, and terminal 3, respectively. The three ports belong to the same VLAN. To prevent terminal 1 from communicating with terminal 2 and terminal 3, configure port isolation on port 0/0/1 as follows.

The switch configuration:

Command                                                  Description
switch(config)#port 0/0/1                                Enter the port configuration mode
switch(config-port-0/0/1)#isolate-port port0/0/2-0/0/3   Configure port0/0/1 to be isolated from port0/0/2 and port0/0/3
switch(config-port-0/0/1)#exit                           Exit the port configuration mode



IPv6 Unicast Routing

IPv6 RIPng Dynamic Routing Protocol

Main contents:

Terms of IPv6 RIPng protocol

Introduction to IPv6 RIPng protocol

Terms of IPv6 RIPng Protocol

UDPv6 (User Datagram Protocol over IPv6): a simple, connectionless transport-layer protocol that does not guarantee reliable delivery of packets.

D-V (Distance-Vector) algorithm: a method of calculating routes in a computer network, also called the Bellman-Ford algorithm.

IGP: Interior Gateway Protocol.

Request packet: used to request IPv6 RIPng route information from other routing devices.

Response packet: used to advertise the local route information to the IPv6 RIPng of adjacent routing devices.

Split horizon: a route learned from an interface is not advertised back out of that interface. It is a measure used by IPv6 RIPng to prevent routing loops.

Poisoned reverse: a route learned from an interface is advertised back out of that interface with the unreachable metric (16). It is an IPv6 RIPng loop-prevention measure that is more aggressive than split horizon.

Triggered updates: a measure used by IPv6 RIPng to speed up convergence. When a route changes, a triggered update is generated to advertise the changed route. Regular updates are the opposite of triggered updates: IPv6 RIPng sends all of its route information at a fixed interval, 30 s by default.
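Split horizon and poisoned reverse differ only in how a route learned on the outgoing interface is treated when building an update. The Python sketch below illustrates both; the route table layout, interface names, and prefixes are hypothetical.

```python
from typing import Dict, Optional, Tuple

INFINITY = 16  # unreachable metric

def build_update(routes: Dict[str, Tuple[int, Optional[str]]], out_iface: str,
                 mode: str = "split-horizon") -> Dict[str, int]:
    """Return prefix -> metric to advertise on out_iface.

    routes maps a prefix to (metric, interface the route was learned on; None for local routes).
    mode is "split-horizon", "poisoned-reverse", or "none".
    """
    update = {}
    for prefix, (metric, learned_on) in routes.items():
        if learned_on == out_iface and mode == "split-horizon":
            continue                      # never advertise back out the learning interface
        if learned_on == out_iface and mode == "poisoned-reverse":
            update[prefix] = INFINITY     # advertise back, but as unreachable
            continue
        update[prefix] = metric
    return update

routes = {"2001:db8:1::/64": (1, None), "2001:db8:2::/64": (3, "vlan2")}
print(build_update(routes, "vlan2"))                      # {'2001:db8:1::/64': 1}
print(build_update(routes, "vlan2", "poisoned-reverse"))  # {'2001:db8:1::/64': 1, '2001:db8:2::/64': 16}
```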



Introduction to IPv6 RIPng Protocol

IPv6 RIPng (Routing Information Protocol for IPv6) is a distance-vector IGP used for simple IPv6 route learning in small networks. This section describes how to configure the IPv6 RIPng dynamic routing protocol on Maipu routing devices for IPv6 network interconnection.

The operating mechanism of IPv6 RIPng is basically the same as that of IPv4 RIP. The main difference is that the advertised and learned routes are IPv6 routes instead of IPv4 routes.

IPv6 RIPng is simple both as a protocol and to configure. However, the amount of route information it advertises is proportional to the number of routes in the routing table, so a large number of routes consumes considerable network resources. In addition, IPv6 RIPng limits a route path to a maximum of 15 hops. Therefore, IPv6 RIPng is suitable only for simple small and medium-sized networks.

IPv6 RIPng can be used in most campus networks and in regional networks with a simple structure and strong continuity. It is generally not used in complicated environments.

Location of IPv6 RIPng Protocol in TCP/IP

Figure 34-1 Location of IPv6 RIPng protocol in TCP/IP (layers from top to bottom: IPv6 RIPng; TCPv6 and UDPv6, with IPv6 RIPng running over UDPv6; Network Layer (IPv6); Data Link Layer)

As shown in the figure above, IPv6 RIPng is a routing protocol based on UDP; its protocol packets are encapsulated in UDPv6 packets. By default, IPv6 RIPng uses port 521 to send and receive protocol packets. It updates the local routing table according to the route information in the protocol packets received from remote routing devices, and then adds 1 to the metric and advertises the routes to other adjacent routing devices. In this way, all routing devices in the routing domain learn all routes.

IPv6 RIPng sends protocol packets in three modes, as follows:

Table 34-1 The mode of IPv6 RIPng protocol sending packets

Mode       Address               Port                               Usage
Multicast  ff02::9               521                                Send protocol packets to all adjacent routing devices on an interface
Unicast    Unicast IPv6 address  Source port of the request packet  Send the response to a request packet
Unicast    Unicast IPv6 address  521                                Send protocol packets to a configured neighbor

IPv6 RIPng Protocol Packet Type

IPv6 RIPng has two kinds of protocol packets: request packets and response packets. Their types and functions are as follows:

IPv6 RIPng protocol packet type

Request packet
Function: Requests route information from the IPv6 RIPng of adjacent routing devices. Either specified routes or all routes can be requested; a request for all routes carries exactly one route entry, whose destination address is 0, prefix length is 0, and metric is 16.
Sending status: When IPv6 RIPng starts running on an interface, it requests all route information from the IPv6 RIPng of the adjacent routing devices.

Response packet
Function: Advertises route information to the IPv6 RIPng of adjacent routing devices.
Sending status:
1. In answer to a request packet;
2. When routes change (triggered update of the route information);
3. Periodically, advertising all route information to the IPv6 RIPng of the adjacent routing devices (regular updates).



IPv6 RIPng Protocol Packet Structure

Figure 34-2 Basic structure of IPv6 RIPng protocol packet (Data Link Header | IPv6 Header | UDPv6 Header | IPv6 RIPng Header | route table entries of 20 bytes each; the IPv6 RIPng Header consists of command (1 byte), version (1 byte), and must be zero (2 bytes))

As shown in the figure above, the IPv6 RIPng protocol packet is encapsulated in a UDPv6 packet. In the IPv6 header of an IPv6 RIPng protocol packet, the Hop Limit field is set to 255, so that the receiver can verify that the packet has not been forwarded by another routing device.

The IPv6 RIPng header has two meaningful fields: the Command field identifies whether the packet is a request packet (value 1) or a response packet (value 2); the Version field is always 1.

Route table entries are of two types, described as follows:

Table 34-2 Route table entry type of the IPv6 RIP protocol

Route table entry: carries IPv6 route information. Format: IPv6 prefix (16 bytes), Route Tag (2 bytes), Prefix len (1 byte), Metric (1 byte).

Next-hop route table entry: carries the next-hop address for IPv6 route information. Format: IPv6 next hop address (16 bytes), Must be zero (2 bytes), Must be zero (1 byte), 0xFF (1 byte). Usage: first add a next-hop route table entry, then add the route table entries that use this next-hop address, and finally end with a next-hop route table entry whose next-hop address is 0:0:0:0:0:0:0:0.

Format of the IPv6 RIPng protocol route information entry
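The header and the two 20-byte entry formats above can be packed with Python's `struct` and `ipaddress` modules. This sketch is illustrative: the function names and the metric value 2 are hypothetical, and the next-hop address is taken from the Switch-C example in this section.

```python
import ipaddress
import struct

CMD_REQUEST, CMD_RESPONSE = 1, 2

def route_entry(prefix: str, metric: int, tag: int = 0) -> bytes:
    """20-byte route table entry: IPv6 prefix (16), Route Tag (2), Prefix len (1), Metric (1)."""
    net = ipaddress.IPv6Network(prefix)
    return net.network_address.packed + struct.pack("!HBB", tag, net.prefixlen, metric)

def next_hop_entry(address: str) -> bytes:
    """20-byte next-hop entry: next-hop address (16), zero (2), zero (1), 0xFF (1)."""
    return ipaddress.IPv6Address(address).packed + struct.pack("!HBB", 0, 0, 0xFF)

def ripng_packet(command: int, entries) -> bytes:
    """IPv6 RIPng header (command, version 1, two zero bytes) followed by the entries."""
    return struct.pack("!BBH", command, 1, 0) + b"".join(entries)

# Request for all routes: a single entry with prefix ::, prefix length 0, metric 16.
request = ripng_packet(CMD_REQUEST, [route_entry("::/0", 16)])

# Response carrying 11::/24 with an explicit next hop, closed by a next-hop entry of ::.
response = ripng_packet(CMD_RESPONSE, [
    next_hop_entry("fe80::201:7aff:fe4f:73f7"),
    route_entry("11::/24", 2),
    next_hop_entry("::"),
])
print(len(request), len(response))  # 24 64
```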



Basic Work Principle of IPv6 RIPng Protocol

Basic work flow of the IPv6 RIPng protocol (figure with two panels. Protocol start flow: IPv6 RIPng starts, sends a Request packet asking neighbors for all routing information, then updates all routing information to neighbors every 30 s. Receive packet process flow: IPv6 RIPng receives a packet and checks the packet type; for a Request packet, it responds with routing information in unicast; for a Response packet, it updates the routes in the database and, if routes changed, triggers an update of routing information.)

The basic work flow of IPv6 RIPng, shown in the figure above, has two parts: the flow of starting the protocol and the flow of processing received packets.

Protocol Start Process

When IPv6 RIPng starts running on an interface, it multicasts a route request packet on the interface to request all route information from all adjacent routing devices on that interface, so as to converge quickly.

After receiving the response packets to the request, it updates the routes in the route database according to the route information in the packets and then advertises the changed routes to the IPv6 RIPng of other adjacent routing devices (triggered updates).



Meanwhile, the Update Timer is started, and route response packets are used to advertise all route information to the IPv6 RIPng of all adjacent routing devices periodically. This keeps the route databases of the IPv6 RIPng on each routing device synchronized and refreshes the advertised routes, so that previously advertised routes do not time out and become invalid on other routing devices.

Route Database

The route database records all route information of IPv6 RIPng. Each route contains the following elements:

1. Destination subnet address: the destination host or subnet of the route;

2. Metric: the metric to the destination;

3. Next-hop interface: the interface that forwards packets to the destination, that is, the interface on which the route was learned;

4. Next-hop IPv6 address: the interface IPv6 address of the adjacent routing device through which the destination is reached. Generally, it is the source IPv6 address of the response packet from which the route was learned;

5. Source IPv6 address: the source IPv6 address of the response packet from which the route was learned;

6. Route tag: a user-defined value used to tag a type of route, for example, to mark that a route was obtained by redistributing a BGP route.

Sources of Route Entries in Route Database

The route entries in the IPv6 RIPng route database come from the following sources:

1. Direct routes of the interfaces covered by the protocol;

2. Routes of other protocols redistributed by the protocol;

3. Routes of other RIPng instances redistributed by this RIPng instance;

4. Routes generated by protocol configuration commands, such as the command that originates a default route (default-information originate);

5. Routes learned from the IPv6 RIPng of adjacent routing devices.



How to Get Route Next Hop

In IPv6 RIPng, the next-hop interface of a route is the interface on which the route is learned, but the next-hop IPv6 address is selected from two candidates: the source IPv6 address of the response packet that carries the route, and the next-hop IPv6 address in the route information. If the route information carries a next-hop IPv6 address and it is a link-local address, that address is used as the next-hop IPv6 address of the route. Otherwise, the source IPv6 address of the response packet is used. This provides a function similar to redirection.

Accordingly, for a redistributed route, when the sending interface is the next-hop interface of the route, the advertised route carries the next-hop address of the route.
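The next-hop selection rule above reduces to one check: use the next hop carried in the route information only if it is link-local. This Python sketch uses `ipaddress.IPv6Address.is_link_local`; the function name is hypothetical, and the sample addresses are those of Switch-B and Switch-C in the example that follows.

```python
import ipaddress
from typing import Optional

def select_next_hop(packet_source: str, entry_next_hop: Optional[str]) -> str:
    """Use the next hop carried in the route information only if it is a link-local address;
    otherwise fall back to the source address of the response packet."""
    if entry_next_hop is not None:
        candidate = ipaddress.IPv6Address(entry_next_hop)
        if candidate.is_link_local:
            return str(candidate)
    return str(ipaddress.IPv6Address(packet_source))

print(select_next_hop("fe80::201:7aff:fe4f:73f8", "fe80::201:7aff:fe4f:73f7"))  # fe80::201:7aff:fe4f:73f7
print(select_next_hop("fe80::201:7aff:fe4f:73f8", None))                        # fe80::201:7aff:fe4f:73f8
```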

The following example shows how the next-hop address in IPv6 RIPng route information is used.

Instance diagram of IPv6 RIPng route re-direction

As shown in the figure above, IPv6 RIPng runs on Switch-A; IPv6 RIPng and IPv6 OSPFv3 run on Switch-B; IPv6 OSPFv3 runs on Switch-C. IPv6 RIPng on Switch-B redistributes the IPv6 OSPFv3 route 11::/24 learned by the local device, so Switch-A can learn the route to subnet 11::/24. By default, Switch-A learns the route with Switch-B (fe80::0201:7aff:fe4f:73f8) as the next hop. As a result, all packets forwarded from Switch-A to subnet 11::/24 first pass through Switch-B and then reach Switch-C.



To solve this problem, when Switch-B advertises route 11::/24 to Switch-A, it specifies Switch-C (fe80::0201:7aff:fe4f:73f7) as the next hop of the route. Switch-A then learns route 11::/24 with Switch-C (fe80::0201:7aff:fe4f:73f7) as the next hop, so packets forwarded from Switch-A to subnet 11::/24 go directly to Switch-C without passing through Switch-B.

Route Update When IPv6 RIP of the adjacent route device learns one route, add 1 to the

metric before route processing, so as to accumulate the metric hops.

When the metric is smaller than 15, the route is the reachable route; when

the metric is larger than or equal to 16, the route is un-reachable route.

A learned route is used to update the route database if it meets any of the following conditions:

1. The route does not exist in the route database, and its metric is smaller than 16 hops;

2. The route exists in the database, and the source IPv6 address of the existing route is the same as the source IPv6 address of the learned route;

3. The route exists in the database, and the metric of the existing route is larger than or equal to the metric of the learned route.
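The three update rules can be expressed as a small function. This sketch assumes a hypothetical route database keyed by prefix; it applies the rules exactly as listed above (including the "larger than or equal to" comparison in rule 3) and ignores the holddown and timer behavior described later.

```python
from dataclasses import dataclass
from typing import Dict, Optional

INFINITY = 16  # metric value meaning "unreachable"


@dataclass
class Entry:
    metric: int
    source: str  # source IPv6 address of the neighbor the route was learned from


def process_learned_route(db: Dict[str, Entry], prefix: str,
                          advertised_metric: int, source: str) -> bool:
    """Apply the three update rules; return True if the database was updated."""
    metric = min(advertised_metric + 1, INFINITY)  # add 1 hop before processing
    current: Optional[Entry] = db.get(prefix)
    if current is None:
        accept = metric < INFINITY               # rule 1: new and reachable
    elif current.source == source:
        accept = True                            # rule 2: same source, accept any change
    else:
        accept = current.metric >= metric        # rule 3: existing metric >= learned metric
    if accept:
        db[prefix] = Entry(metric, source)
    return accept


db: Dict[str, Entry] = {}
process_learned_route(db, "11::/24", 1, "fe80::1")   # stored with metric 2
process_learned_route(db, "11::/24", 4, "fe80::2")   # rejected: 5 is worse than 2
```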

Protocol Packet Authentication

IPv6 RIPng does not authenticate its protocol packets itself; instead, it relies on the underlying IPv6 layer (for example, the IPv6 Authentication Header) for authentication.



Status Transition of IPv6 RIPng Protocol Route Entry and Related Timer

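Status transition of IPv6 RIPng protocol route entry

The figure shows four states of a route entry. In the Valid state, the invalid timer runs on the next hops of the route. When the invalid timer expires, or the metric is updated to 16 (unreachable), the entry enters the Invalid + Holddown state, in which the holddown timer and the flush timer run. When the holddown timer expires, the entry enters the Invalid state, in which only the flush timer runs; a route update returns it to the Valid state. When the flush timer expires in either invalid state, the entry enters the Flush state and is deleted from the database.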

IPv6 RIPng uses four timers: the Update Timer, Invalid Timer, Holddown Timer, and Flush Timer. They are described below.

Timers of the IPv6 RIPng protocol

Update Timer
Operation object: route database. Default value: 30s. Start condition: started when RIPng is enabled and restarted at the end of each period. Function: periodically advertises all route information to the RIPng of adjacent route devices in response packets, in order to (1) keep the route databases of all route devices synchronized, and (2) refresh previously advertised routes so that they do not time out or become invalid on other route devices.

Invalid Timer
Operation object: next hop of the route entry. Default value: 180s. Start condition: started when a route entry is learned. Function: a route becomes invalid if it is not updated within this time (see the status transition figure above). The timer is restarted by response packets that update the route and is stopped when the route entry becomes invalid.

Holddown Timer
Operation object: route entry. Default value: 0s. Start condition: started when the route entry enters the invalid state. Function: for this time after a route becomes invalid, response packets are not permitted to update it, which prevents route state loops (see the figure above). The timer is stopped when the route entry leaves the holddown state.

Flush Timer
Operation object: route entry. Default value: 240s. Start condition: started when the route entry enters the invalid state. Function: a route is deleted from the route database after it has been invalid for this time (see the figure above). The timer is stopped when the route entry is deleted.
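The state transitions and timer defaults above can be modeled as a small state machine. This is a simplified sketch with hypothetical names (`Timers`, `State`, `RouteEntry`): it keeps one invalid timer per route entry rather than per next hop, leaves out the periodic Update Timer, and treats time as seconds supplied by the caller.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional

INFINITY = 16  # unreachable metric


@dataclass(frozen=True)
class Timers:
    invalid: float = 180.0   # seconds without an update before the route becomes invalid
    holddown: float = 0.0    # seconds after invalidation during which updates are refused
    flush: float = 240.0     # seconds after invalidation before the route is deleted


class State(Enum):
    VALID = "Valid"
    INVALID_HOLDDOWN = "Invalid + Holddown"
    INVALID = "Invalid"
    FLUSHED = "Flush"


@dataclass
class RouteEntry:
    metric: int
    last_update: float
    timers: Timers = field(default_factory=Timers)
    state: State = State.VALID
    invalid_since: Optional[float] = None

    def _invalidate(self, at: float) -> None:
        # Entering the invalid state starts the holddown and flush timers.
        self.state = State.INVALID_HOLDDOWN
        self.invalid_since = at
        self.metric = INFINITY

    def tick(self, now: float) -> None:
        """Advance the entry to the state it should be in at time `now`."""
        if self.state is State.VALID and now - self.last_update >= self.timers.invalid:
            self._invalidate(self.last_update + self.timers.invalid)
        if self.state in (State.INVALID_HOLDDOWN, State.INVALID):
            elapsed = now - self.invalid_since
            if elapsed >= self.timers.flush:
                self.state = State.FLUSHED
            elif self.state is State.INVALID_HOLDDOWN and elapsed >= self.timers.holddown:
                self.state = State.INVALID

    def receive(self, metric: int, now: float) -> bool:
        """Process a response packet for this route; return True if it was accepted."""
        self.tick(now)
        if self.state in (State.INVALID_HOLDDOWN, State.FLUSHED):
            return False  # holddown refuses updates; a flushed entry no longer exists
        if metric >= INFINITY:
            if self.state is State.VALID:
                self._invalidate(now)
                self.tick(now)  # a 0 s holddown ends immediately
            return True
        self.metric, self.last_update = metric, now  # restarts the invalid timer
        self.state, self.invalid_since = State.VALID, None
        return True
```

With the default 0 s holddown, an entry passes straight from Valid to Invalid when the invalid timer expires, so a later update can revive it until the flush timer deletes it.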

Avoidance of IPv6 RIPng Protocol Route Loops

IPv6 RIPng is a distance-vector dynamic routing protocol and does not know the topology of the whole network. When the network changes, routes throughout the network take some time to converge, so the route databases of the route devices can be out of sync for a while. Because the topology of the whole network is also unknown, route loops may appear. To reduce the possibility of route loops caused by such inconsistencies, IPv6 RIPng uses the following mechanisms: Counting to Infinity, Split Horizon, Poisoned Reverse, Holddown Timer, and Triggered Updates.

Counting to Infinity

IPv6 RIPng permits a maximum metric of 15; a destination whose metric is larger than 15 is regarded as unreachable. This limits the network size and prevents route information from being propagated indefinitely. Each time route information is passed from one route device to another, its metric increases by 1, and when the metric exceeds 15, the route is deleted from the route table.
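The following sketch illustrates why the 16-hop limit matters: it traces two route devices, A and B, re-advertising a lost network to each other. It is a deliberately simplified model (no Split Horizon, Poisoned Reverse, or holddown, and updates applied in strict alternation) intended only to show that the metric climbs until it reaches 16 and the loop ends. The function name is hypothetical.

```python
INFINITY = 16  # RIPng treats metric 16 as unreachable


def count_to_infinity(stale_metric: int = 2) -> list[tuple[int, int]]:
    """Trace the metrics A and B hold for a lost network, round by round.

    A was directly connected to the network; B learned the route from A with
    metric `stale_metric`. The link from A to the network has just failed.
    """
    a, b = INFINITY, stale_metric
    history = [(a, b)]
    while a < INFINITY or b < INFINITY:
        a = min(b + 1, INFINITY)  # A accepts B's route because its own entry is unreachable
        b = min(a + 1, INFINITY)  # B's route points to A, so B follows A's worse metric
        history.append((a, b))
    return history


for step, (a, b) in enumerate(count_to_infinity()):
    print(f"round {step}: A={a} B={b}")
```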

Split Horizon

A route learned from an interface must not be advertised back out of the same interface, because doing so may cause a route loop.

The Split Horizon rule of IPv6 RIPng is as follows: if IPv6 RIPng on a route device learns route information A from an interface, the response packets sent out of that interface must not contain route information A.

Split Horizon has one exception: when an interface receives a request packet that asks for only part of the route information (specific routes), the response to that request is not subject to Split Horizon.

Poisoned Reverse

Poisoned Reverse has the same purpose as Split Horizon but works slightly differently.

The Poisoned Reverse rule of IPv6 RIPng is as follows: if IPv6 RIPng on a route device learns route information A from an interface, the response packets sent out of that interface do contain route information A, but with the metric set to 16 (unreachable).

Compared with Split Horizon, the advantage of Poisoned Reverse is that it explicitly tells the source route device that the route is unreachable. If a route loop exists, it is broken immediately, whereas with Split Horizon the wrong route entry remains until it times out and is deleted. The disadvantage is that Poisoned Reverse increases the size of route response packets and therefore the bandwidth consumed by the protocol.
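The difference between Split Horizon and Poisoned Reverse shows up in how a response is built for one outgoing interface. In this sketch the route table format, the `mode` strings, and the assumption that the specific-request exception also applies in Poisoned Reverse mode are hypothetical choices, not the switch's configuration model.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

INFINITY = 16  # unreachable metric


@dataclass(frozen=True)
class Route:
    metric: int
    learned_on: str  # hypothetical name of the interface the route was learned from


def build_response(table: Dict[str, Route], out_iface: str,
                   mode: str = "split-horizon",
                   specific_request: bool = False) -> List[Tuple[str, int]]:
    """Return the (prefix, metric) entries to send on `out_iface`.

    mode is "none", "split-horizon" or "poisoned-reverse". A response to a
    request for specific routes is exempt (assumed here for both modes).
    """
    entries = []
    for prefix, route in table.items():
        if route.learned_on == out_iface and not specific_request:
            if mode == "split-horizon":
                continue                             # omit the route entirely
            if mode == "poisoned-reverse":
                entries.append((prefix, INFINITY))   # advertise it back as unreachable
                continue
        entries.append((prefix, route.metric))
    return entries


table = {"11::/24": Route(2, "vlan1"), "22::/24": Route(3, "vlan2")}
print(build_response(table, "vlan1"))                      # [('22::/24', 3)]
print(build_response(table, "vlan1", "poisoned-reverse"))  # [('11::/24', 16), ('22::/24', 3)]
```

The second call shows the extra entry that Poisoned Reverse adds to every response, which is the bandwidth cost mentioned above.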

Holddown Timer

The holddown timer prevents a route entry from being updated by route response packets for a period of time after the route becomes unreachable. This ensures that the unreachable route is not updated by response packets until every route device has received the route-unreachable information, because a route entry in a received response packet may be stale information that was advertised earlier.

Triggered Updates

With triggered updates, when a route changes, the route device immediately advertises the change to adjacent route devices in a route response packet.

Poisoned Reverse and Split Horizon break route loops formed between any two route devices, but a route loop formed by three or more route devices persists until the metric of the route has been propagated and accumulated to unreachable (16). Triggered updates speed up route convergence and so shorten the time needed to break such a route loop.

IPv6 OSPFv3 Dynamic Routing Protocol

Main contents:

Terms of OSPFv3 Protocol

Introduction to the OSPFv3 Protocol

Terms of OSPFv3 Protocol

AS (Autonomous System): a group of route devices that exchange routing information through the same routing protocol.

Area: a collection of route devices that share the same topology database. OSPFv3 divides an AS into multiple areas; the topology of one area is invisible to the other areas, which reduces the amount of routing information in the AS. Areas confine the flooding of link state updates and enable the administrator to build a hierarchical network.

Area ID: the 32-bit identifier of an area in the AS.

IGP (Interior Gateway Protocol): the routing protocol running on the route devices within an AS. Each AS has its own IGP, and different ASs may run different IGPs. OSPFv3 is an IGP.

Router ID: a 32-bit number assigned to each route device running OSPFv3, which uniquely identifies the route device within the AS.

Point-to-point network: a network that connects a single pair of route devices, such as a 56 kbit/s serial line.

Broadcast network: a network that supports multiple (more than two) route devices, in which a route device can send information to all other route devices on the network at once (broadcast). Neighbor route devices are discovered dynamically by OSPFv3 hello packets, and if the network supports multicast, OSPFv3 also uses multicast. Every pair of route devices on the network is assumed to be able to communicate directly. Ethernet is an example of a broadcast network.



Non-broadcast multi-access (NBMA) network: a network that supports multiple (more than two) route devices but has no broadcast capability. Neighbor relationships are maintained by OSPFv3 hello packets, but because the network cannot broadcast, neighbors must be configured rather than discovered.

OSPFv3 can run in two modes on non-broadcast networks: 1. Non-broadcast multi-access (NBMA) mode, in which OSPFv3 operates as it does on a broadcast network; 2. Point-to-multipoint mode, in which OSPFv3 treats the network as a collection of point-to-point links.

Interface: the connection between a route device and an attached network. Each interface has associated state information, which is obtained from the lower-layer protocols and the routing protocol. Each interface has a unique associated IPv6 address and prefix (except on unnumbered point-to-point links).

Neighbor: two route devices that have interfaces connected to the same network. The neighbor relationship is maintained through OSPFv3 hello packets.

Adjacency: a relationship that OSPFv3 forms between neighbor route devices so that they can exchange routing information. Not every pair of neighbor route devices becomes adjacent.

LSA (Link State Advertisement): the data unit that describes the state of the local route device or of a network. For a route device, it includes the state of the route device's interfaces and adjacencies. Each link state advertisement is flooded throughout the area, and each route device uses the link state advertisements it collects to build its link state database.

Stub area: an area that has only one exit to the outside. Type 5 LSAs (AS-External-LSAs) are not flooded into the area.

Backbone area: composed of all area border route devices and the links among them.

ASE (AS external route): a route obtained from a non-OSPFv3 source, such as BGP4+, RIPng, or a static route configured on the system.

DR (Designated Router): used on multi-access networks, such as Ethernet, token ring, and frame relay, to reduce the number of adjacencies formed, which in turn reduces the size of the topology database. The DR forms adjacencies with all route devices on the multi-access network. Each route device sends its LSAs to the DR, and the DR floods them to the entire network, so every route device has a single collection point for the information it sends instead of exchanging information with every other route device on the network.

BDR (Backup Designated Router): used on multi-access networks; it takes over from the DR when the DR fails.

Inter-area route: a route to a destination in another area.

Intra-area route: a route to a destination within the local area.

Flooding: the technique used to distribute LSAs among route devices, so that all route devices running OSPFv3 keep synchronized link state databases.

Hello: hello packets create and maintain neighbor relationships. On a broadcast network, hello packets discover neighbor route devices dynamically; they are also used to elect the DR on the network.

NSSA (Not-So-Stubby Area): an area that allows external routes to be advertised into the OSPFv3 AS while otherwise keeping the characteristics of a stub area. An ASBR in the NSSA generates Type 7 LSAs to advertise external routes; when the ABR of the NSSA receives a Type 7 LSA with the P bit set to 1, it translates the LSA into a Type 5 LSA and advertises it to the rest of the AS.

Introduction to the OSPFv3 Protocol

OSPFv3 is an extension of OSPFv2: OSPFv2 runs on IPv4, and OSPFv3 runs on IPv6. OSPFv3 handles IPv6 links and IPv6 addresses. It differs from OSPFv2 because the two are based on different IP protocols, but the mechanisms of the two OSPF protocols are the same.

OSPFv3 detects changes to IPv6 links and networks in the AS and advertises the link state information; after a period of convergence, new routes are formed. Convergence is fast, and the amount of link state information exchanged is small. In OSPFv3, each route device maintains a network topology database describing the AS, and all route devices have the same database. Each record in the database is the local state of a particular route device, and each route device distributes its local state throughout the AS by flooding.

All route devices run the same algorithm in parallel. Each route device

uses the link state database to generate a shortest path tree with itself as

the root. The shortest path tree provides the route to each destination in

the AS. The external routing information serves as leaves in the tree.
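The shortest path tree calculation is Dijkstra's algorithm. The sketch below runs it over a plain cost graph; real OSPFv3 SPF works on Router-LSAs and Network-LSAs, handles transit networks, and keeps equal-cost paths, all of which are omitted here. The graph format and node names are hypothetical.

```python
import heapq
from typing import Dict, Tuple

Graph = Dict[str, Dict[str, int]]  # node -> {neighbor: link cost}


def shortest_path_tree(graph: Graph, root: str) -> Dict[str, Tuple[int, str]]:
    """Dijkstra's algorithm: map each reachable node to (total cost, parent in the tree)."""
    tree: Dict[str, Tuple[int, str]] = {}
    candidates = [(0, root, root)]  # (cost from root, node, parent)
    while candidates:
        cost, node, parent = heapq.heappop(candidates)
        if node in tree:
            continue  # already placed in the tree at a lower or equal cost
        tree[node] = (cost, parent)
        for neighbor, link_cost in graph.get(node, {}).items():
            if neighbor not in tree:
                heapq.heappush(candidates, (cost + link_cost, neighbor, node))
    return tree


graph = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 1, "D": 5},
    "C": {"A": 4, "B": 1, "D": 1},
    "D": {"B": 5, "C": 1},
}
for node, (cost, parent) in sorted(shortest_path_tree(graph, "A").items()):
    print(f"{node}: cost {cost} via {parent}")
```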

OSPFv3 allows a set of networks to be grouped together; such a group is called an area. The topology of an area is invisible to the other areas of the AS, and this information hiding reduces routing traffic. In addition, routing within an area is determined only by the area's own topology, which protects the area's routing from outside routing information. An area usually corresponds to a type of application or a geographical region.



OSPFv3 advertises IPv6 information as IPv6 prefixes together with their prefix lengths, and each calculated IPv6 route likewise consists of a prefix and a prefix length. Each IPv6 datagram is forwarded along the best matching route.
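The text says only that a datagram is routed along the best route. Assuming the usual longest-prefix-match rule for IPv6 forwarding, the lookup can be sketched with Python's standard `ipaddress` module; the routing table contents and the function name are made-up examples.

```python
import ipaddress
from typing import Dict, Optional


def best_route(table: Dict[str, str], destination: str) -> Optional[str]:
    """Return the next hop of the longest prefix in `table` that contains `destination`."""
    dst = ipaddress.IPv6Address(destination)
    best_len, best_next_hop = -1, None
    for prefix, next_hop in table.items():
        network = ipaddress.IPv6Network(prefix)
        if dst in network and network.prefixlen > best_len:
            best_len, best_next_hop = network.prefixlen, next_hop
    return best_next_hop


routes = {
    "::/0": "fe80::1",           # default route
    "2001:db8::/32": "fe80::2",
    "2001:db8:1::/48": "fe80::3",
}
print(best_route(routes, "2001:db8:1::5"))  # fe80::3 (the /48 is the longest match)
print(best_route(routes, "2001:db8:2::5"))  # fe80::2
```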

External routes (for example, routes learned through an exterior gateway protocol such as BGP) are advertised into the AS. External routes are carried in dedicated LSAs and form part of the OSPFv3 link state database.

The hierarchy of the OSPFv3 in the network protocol stack is as follows:

Figure 34-1 Hierarchy of OSPFv3 in the network protocol stack

Area Division in OSPFv3

Figure 34-2 OSPFv3 area, AS division



SW1, SW2, SW3, and SW4 form area 1; SW3 is an area border router (ABR).

SW6, SW7, and SW8 form area 2; SW6 and SW8 are area border routers (ABRs).

SW8, SW9, and SW10 form area 3; SW8 is an area border router (ABR).

SW5 is the AS boundary router (ASBR).

SW3, SW5, SW6, and SW8 form backbone area 0.
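Whether a switch is an ABR follows directly from the area membership listed above: a route device attached to more than one area is an ABR. The sketch below encodes the memberships of Figure 34-2 as described in the text; the ASBR role of SW5 depends on its external connections and cannot be derived this way. The function name is hypothetical.

```python
from typing import Dict, Set

# Area membership as listed above; area 0 is the backbone.
AREAS: Dict[int, Set[str]] = {
    0: {"SW3", "SW5", "SW6", "SW8"},
    1: {"SW1", "SW2", "SW3", "SW4"},
    2: {"SW6", "SW7", "SW8"},
    3: {"SW8", "SW9", "SW10"},
}


def area_border_routers(areas: Dict[int, Set[str]]) -> Set[str]:
    """A route device attached to more than one area is an ABR."""
    membership: Dict[str, int] = {}
    for members in areas.values():
        for device in members:
            membership[device] = membership.get(device, 0) + 1
    return {device for device, count in membership.items() if count > 1}


print(sorted(area_border_routers(AREAS)))  # ['SW3', 'SW6', 'SW8']
```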

Process of OSPFv3

The basic idea of OSPFv3 is as follows: within the AS, each route device running OSPFv3 collects IPv6 link state information and floods it throughout the system, so that the whole system maintains a synchronized link state database. From this database, each route device calculates a shortest path tree with itself as the root and the other network nodes as the leaves, and thus obtains the best routes to every destination in the system.

The route devices running OSPFv3 form an AS, which can be divided into multiple areas. Each route device in an area requires a topology database (the link state database).

When OSPFv3 is enabled on a route device, the device establishes relationships with the other route devices in the area. By sending hello packets, it lets the other route devices know it exists, and by receiving their hello packets, it learns that they exist. The route device then forms neighbor relationships with the other route devices.

If the network type is broadcast or NBMA, route device A elects a DR and a BDR from the known neighbors and forms adjacencies with them. Because all route devices form adjacencies only with the DR and BDR, data traffic is reduced.
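DR and BDR election can be sketched as ranking the eligible neighbors by router priority and then by router ID, where a priority of 0 makes a route device ineligible. This is a simplified illustration under stated assumptions: the standard OSPF election (RFC 2328, reused by RFC 5340) also takes into account route devices that already declare themselves DR or BDR and does not preempt an existing DR, which this sketch ignores. The `Candidate` type is hypothetical.

```python
import ipaddress
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Candidate:
    router_id: str  # 32-bit router ID written in dotted-decimal form
    priority: int   # router priority advertised in hello packets; 0 means ineligible


def elect_dr_bdr(candidates: Iterable[Candidate]) -> Tuple[Optional[str], Optional[str]]:
    """Return (DR, BDR) router IDs: highest priority wins, ties go to the higher router ID."""
    eligible = [c for c in candidates if c.priority > 0]
    ranked = sorted(
        eligible,
        key=lambda c: (c.priority, int(ipaddress.IPv4Address(c.router_id))),
        reverse=True,
    )
    dr = ranked[0].router_id if ranked else None
    bdr = ranked[1].router_id if len(ranked) > 1 else None
    return dr, bdr


neighbors = [Candidate("1.1.1.1", 1), Candidate("2.2.2.2", 1),
             Candidate("3.3.3.3", 0), Candidate("0.0.0.9", 5)]
print(elect_dr_bdr(neighbors))  # ('0.0.0.9', '2.2.2.2')
```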

If the network type is point-to-point or point-to-multipoint, route device A attempts to form adjacencies with all neighbors. Route device A then exchanges network topology information with the neighbors with which it has formed adjacencies.



Route device A exchanges summaries of its network topology with its adjacent neighbor, route device B, in database description (DD) packets.

When route device A finds that route device B has routes that are newer than its own, it requests them from route device B with link state request packets; route device B likewise requests newer routes from route device A. After each party receives the other's requests, it sends the detailed routing information in link state update packets, and the receiving party acknowledges them with link state acknowledgment (LSAck) packets.

After obtaining the topology, route device A runs the SPF algorithm to generate a shortest path tree to the other route devices in the area, with itself as the root. It calculates the shortest path for each route from the routing information advertised by each route device and records the result in the IPv6 routing table. Subsequent routing to a destination uses this routing table.

Each route device in the area continuously exchanges link state information with its adjacent route devices, and the adjacencies on the individual point-to-point links exchange link state information in parallel. Exchanged link state information is also flooded, so all route devices in the area have the same link state database.

An area border router belongs to multiple areas at the same time. Therefore, routes in the home area of route device A are advertised into other areas, and routes in other areas are advertised into A's home area. Through this exchange at the border route devices, the home area of route device A learns the network topology and routes of the entire AS. In OSPFv3, the border routers form the backbone area.

When an AS boundary router learns AS external routes, it advertises them into the AS. As a result, route device A can obtain the topology of the entire network.

OSPFv3 Graceful Restart

To support the Non-Stop Forwarding (NSF) function of the device, the protocol must support Graceful Restart (GR), which prevents route flapping and route black holes after a device restart or an active/standby switchover.

The basic principle of graceful restart is to keep the neighbor relationship between the restarting route device and its neighbor route devices from flapping during the restart. During the restart, the neighbor route devices keep the protocol and topology information of the restarting route device and treat it as still able to forward packets. After the restart, the restarting route device synchronizes route information with its neighbor route devices as quickly as possible and then updates its local route information.

Graceful Restart Roles

According to NSF capability, route devices are classified as follows:

NSF-Capable route device: a route device with Non-Stop Forwarding capability. The device must have redundant (dual) control boards and routing protocol GR capability.

GR-Capable route device: a route device that can perform a graceful restart.

GR-Aware route device: a route device that can detect that a neighbor is performing GR and can help the neighbor complete it. A GR-Capable route device is also a GR-Aware route device.

GR-Unaware route device: a route device that cannot detect that a neighbor is performing GR and cannot help the neighbor complete it.

According to its role in the GR process, a route device can be classified as follows:

GR-Restarter route device: the route device that performs the protocol graceful restart;

GR-Helper route device: the route device that helps with the protocol graceful restart.

Process of OSPFv3 Graceful Restart on the Restarter

The Restarter is the device whose device or OSPFv3 protocol restarts. It generates a Grace-LSA to notify its neighbors that it is preparing a graceful restart; the interval of the graceful restart is called the grace period. During the grace period, the neighbors treat the restarting route device as normal and keep the neighbor state as FULL. The restarting route device goes through two processes: entering and exiting graceful restart. During this period, each neighbor acts as a Helper (Helper mode), which it likewise enters and exits.

Rules during the grace period: do not generate any type of LSA; accept received self-originated LSAs without performing update processing on them; route calculation is permitted, but routes are not installed in the system forwarding table; if the device was the DR before the restart, it is still the DR after the restart.

Entering graceful restart: after an interface comes up, the device first generates a Grace-LSA to notify the neighbors. It delays sending hello packets so that it first receives the neighbors' hello packets and enters the 2-Way state. After the adjacency becomes FULL, it performs the SPF calculation but does not install the routes in the core routing table.

The device exits graceful restart when any of the following conditions is met:

1. All adjacencies have been re-established;

2. A Router-LSA is received that is inconsistent with the one before the restart (for example, the Router-LSA generated by a neighbor no longer contains a link to this device, which indicates that the neighbor exited Helper mode abnormally or that another abnormality occurred);

3. The graceful restart time expires.

Actions on exiting graceful restart: regenerate the Router-LSA, and also regenerate the Network-LSA if the device is the DR; rerun SPF to calculate routes, generate Summary-LSAs, NSSA-LSAs, and AS-External-LSAs, and update the routing table; delete invalid LSAs and the Grace-LSA (that is, set their LS age to 3600 and advertise them).

Process of OSPFv3 Graceful Restart on the Helper

For route device X to complete a graceful restart, its neighbor route device Y must help it. The device that helps is the Helper, and while helping it is said to be in Helper mode. Helper mode operates per network segment, that is, per link on which the adjacency exists. During the restart, the Helper continues to advertise its link to the restarting route device; for a virtual link, it still sets the V bit.

When the Helper route device receives a Grace-LSA from a neighbor, it sets the neighbor restart flag and prepares to enter Helper mode. The following conditions must all be met:

1. X (the Restarter) and Y (the Helper) have a FULL adjacency;

2. The related links have not changed since X restarted;

3. The local configuration permits Helper mode;

4. Y is not itself acting as a graceful restart Restarter.

The Helper exits Helper mode when any of the following conditions is met: the Grace-LSA is deleted; the grace period expires; the contents of the link state database change.

Actions on exiting Helper mode: re-elect the DR of the segment and regenerate the Router-LSA for the segment; if the device is the DR, regenerate the Network-LSA; for a virtual link, regenerate the Router-LSA for the virtual link.
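The Helper entry and exit conditions above can be written as two predicates. This is only a sketch: the `HelperCheck` fields and both function names are hypothetical, and how a real implementation detects each condition (for example, by comparing LSDB contents) is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HelperCheck:
    """Hypothetical inputs that Helper Y evaluates for restarting neighbor X."""
    adjacency_full: bool    # X and Y have a FULL adjacency
    links_unchanged: bool   # X's related links have not changed since it restarted
    helper_enabled: bool    # local configuration permits Helper mode
    self_restarting: bool   # Y is itself acting as a GR Restarter


def can_enter_helper(c: HelperCheck) -> bool:
    """All four entry conditions must hold."""
    return c.adjacency_full and c.links_unchanged and c.helper_enabled and not c.self_restarting


def must_exit_helper(grace_lsa_present: bool, now: float, grace_deadline: float,
                     lsdb_changed: bool) -> bool:
    """Any single exit condition is enough to leave Helper mode."""
    return (not grace_lsa_present) or now >= grace_deadline or lsdb_changed
```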

Link State Database (LSDB) of OSPFv3

The OSPFv3 LSDB contains information about the entire area. A route device exchanges information with its adjacent neighbors to keep the LSDB synchronized throughout the area, which lets OSPFv3 detect route changes dynamically through hello packets and link state update packets.

The LSDB is composed of link state advertisements (LSAs), which fall into 8 categories:

Router-LSA: generated by each route device in the area. It describes the link state of the route device and is flooded only within the area.

Network-LSA: generated by the DR in the area. It describes the route devices attached to the network and is flooded only within the area.

Inter-Area-Prefix-LSA: generated by an ABR. It describes routes to networks (IPv6 prefixes) in other areas.

Inter-Area-Router-LSA: generated by an ABR. It describes routes to ASBRs in other areas.

AS-External-LSA: generated by an ASBR. It describes route information external to the AS.

NSSA-LSA: generated by an ASBR in an NSSA. It describes route information external to the AS and is flooded only within the NSSA.

Link-LSA: generated by each route device for each attached link. It describes the route device's IPv6 link-local address on the link and the IPv6 prefixes of the link, and is advertised only on the local link.

Intra-Area-Prefix-LSA: generated by route devices in the area. It describes IPv6 prefixes and associates them with a Router-LSA or Network-LSA.
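The LSA types above are named but not numbered here. In RFC 5340 (OSPF for IPv6), the 16-bit LS type carries a U bit, two flooding-scope bits (S2, S1), and a 13-bit function code, which is how the link-local, area, and AS flooding scopes listed above are expressed on the wire. The sketch below decodes the standard values; it assumes the switch uses the RFC 5340 encodings, and the helper name is hypothetical.

```python
# LS type values from RFC 5340; assumed, not stated in this manual.
LSA_TYPES = {
    0x2001: "Router-LSA",
    0x2002: "Network-LSA",
    0x2003: "Inter-Area-Prefix-LSA",
    0x2004: "Inter-Area-Router-LSA",
    0x4005: "AS-External-LSA",
    0x2007: "NSSA-LSA",
    0x0008: "Link-LSA",
    0x2009: "Intra-Area-Prefix-LSA",
}

SCOPES = {0: "link-local", 1: "area", 2: "AS", 3: "reserved"}


def decode_ls_type(ls_type: int) -> tuple[int, str, int]:
    """Split a 16-bit OSPFv3 LS type into (U bit, flooding scope, function code)."""
    u_bit = (ls_type >> 15) & 0x1           # how to handle an unknown LSA type
    scope = SCOPES[(ls_type >> 13) & 0x3]   # S2/S1 flooding-scope bits
    function_code = ls_type & 0x1FFF        # low 13 bits
    return u_bit, scope, function_code


for code, name in LSA_TYPES.items():
    u, scope, fc = decode_ls_type(code)
    print(f"{name:22} 0x{code:04x} U={u} scope={scope} function={fc}")
```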



On an area border router, the intra-area routes calculated for each area are summarized into Inter-Area-Prefix-LSAs and flooded into the other areas; for the backbone area, both the calculated intra-area and inter-area routes are summarized into Inter-Area-Prefix-LSAs and flooded into the other areas. All border routers and the links among them form the backbone area. All parts of the backbone area must be mutually reachable; they can be connected physically or through virtual links. When a virtual link is configured, the area it passes through must be a transit area, not a stub area or NSSA.

The ASBR sends external routing information to all areas of the AS except stub areas. Route devices in a stub area reach the ASBR through a default route.

An NSSA allows external routes to be advertised into the OSPFv3 AS while the area otherwise keeps its stub characteristics. The ASBR of the NSSA generates NSSA External LSAs (Type 7) to advertise external routes. NSSA External LSAs are flooded within the NSSA but terminate at the ABR. When the ABR of the NSSA receives a Type 7 LSA with the P bit set to 1, it translates the LSA into a Type 5 LSA and floods it to the other areas of the AS; if the P bit is 0, the LSA is not translated. Therefore, NSSA External LSAs themselves are never advertised outside the NSSA.

OSPFv3 Packet Encapsulation

An OSPFv3 packet consists of several layers of encapsulation. The outer layer is the IPv6 header, and the payload carried after the IPv6 header is one of the following five OSPFv3 packet types. Every packet type starts with an OSPFv3 header of a common format; the packet data that follows varies with the packet type.



Figure 34-3 OSPFv3 packet encapsulation

OSPFv3 Packet Header

Figure 34-4 OSPFv3 packet header

Every OSPFv3 packet starts with a standard OSPFv3 header, which is 16 bytes long. The information in the header determines whether the packet needs further processing.

Version: the OSPF version number; the value is 3.

Type: the type of the packet that follows the OSPFv3 header. OSPFv3 has five packet types: Hello (type 1), Database Description (type 2), Link State Request (type 3), Link State Update (type 4), and Link State Acknowledgment (type 5).

Area ID: the area in which the packet originates; for packets sent over a virtual link, the Area ID is 0.0.0.0.

Checksum: the checksum of the entire packet.

Instance ID: multiple OSPFv3 instances can run on a single IPv6 link, and the Instance ID identifies the instance. Neighbors exchanging packets must use the same Instance ID.

0: a reserved field that is currently unused.
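The 16-byte header can be parsed with Python's `struct` module. The layout below follows RFC 5340 and includes two fields that the list above does not describe (Packet Length and Router ID); the `OSPFv3Header` type and `parse_header` function are hypothetical names.

```python
import ipaddress
import struct
from typing import NamedTuple

PACKET_TYPES = {1: "Hello", 2: "Database Description", 3: "Link State Request",
                4: "Link State Update", 5: "Link State Acknowledgment"}

# RFC 5340 layout, network byte order, 16 bytes in total: Version, Type,
# Packet Length, Router ID, Area ID, Checksum, Instance ID, reserved 0.
HEADER = struct.Struct("!BBHIIHBB")


class OSPFv3Header(NamedTuple):
    version: int
    packet_type: str
    length: int
    router_id: str
    area_id: str
    checksum: int
    instance_id: int


def parse_header(data: bytes) -> OSPFv3Header:
    if len(data) < HEADER.size:
        raise ValueError("an OSPFv3 header is 16 bytes")
    version, ptype, length, rid, aid, checksum, instance, _reserved = HEADER.unpack_from(data)
    if version != 3:
        raise ValueError(f"not OSPFv3 (version {version})")
    return OSPFv3Header(version, PACKET_TYPES.get(ptype, f"unknown({ptype})"), length,
                        str(ipaddress.IPv4Address(rid)),  # 32-bit IDs shown dotted-decimal
                        str(ipaddress.IPv4Address(aid)),
                        checksum, instance)


raw = HEADER.pack(3, 1, 36, 0x01010101, 0, 0xABCD, 0, 0)
print(parse_header(raw))
```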

Hello Packet Format



Figure 34-5 Hello packet format

Hello packets are used to establish and maintain adjacencies. After an interface comes up with OSPFv3 enabled, it sends hello packets periodically to discover neighbors and establish adjacencies; once an adjacency is established, periodic hello packets are still needed to maintain it. Hello packets carry parameters that must be consistent between neighbors for an adjacency to form, such as the hello interval and the router dead interval; if they are inconsistent, the hello packets are discarded.

Interface ID: a 32-bit number that identifies, within the local route device, the interface sending the hello packet, such as its IfIndex.

Router Priority: used in DR and BDR election. A route device whose router priority is 0 cannot be elected DR or BDR.

Options: the optional capabilities supported by the route device; see the Options field of OSPFv3 packets.

Hello Interval: the interval at which hello packets are sent periodically.

Router Dead Interval: if no hello packet is received from a neighbor within the router dead interval, the neighbor is considered down and is deleted.

Designated Router: the router ID of the DR elected on the interface sending the packet.

Backup DR: the router ID of the BDR elected on the interface sending the packet.

Neighbor: the list of neighbors from which hello packets have been received on the interface sending the packet within the router dead interval.

Format of Database Description Packets

Figure 34-6 format of the database description packets

DD packets are exchanged at the beginning of adjacency establishment. They carry summary descriptions of LSAs, and the summaries of all LSAs in the link state database must be exchanged through DD packets. DD packets are exchanged in a poll-response manner: the two neighbors forming the adjacency negotiate a Master and a Slave, the Master sends DD packets first, and the Slave responds to each DD packet it receives with a DD packet containing its own LSA summaries. When the information of all LSAs has been exchanged, the DD exchange process ends.



Interface MTU: the size of the largest IPv6 packet that the sending interface can transmit without fragmentation. When the packet is sent over a virtual link, the Interface MTU is set to 0.

Options: see the Options field of OSPFv3 packets.

I-bit: the Init bit; set to 1 when the packet is the initial packet of the DD packet sequence.

M-bit: the More bit; set to 1 when more DD packets follow, and set to 0 in the last packet of the DD packet sequence.

MS-bit: the Master/Slave bit; set to 1 when the route device generating the packet is the Master, and 0 when it is the Slave.

DD Sequence Number: the sequence number of the DD packets, set by the Master.

LSA Headers: the list of LSA headers from the link state database.
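The I, M, and MS bits occupy the low three bits of a flags byte in the DD packet. The bit values below follow RFC 5340 and are an assumption about this switch's encoding; the helper names are hypothetical.

```python
I_BIT, M_BIT, MS_BIT = 0x04, 0x02, 0x01  # DD flags byte bit values (RFC 5340)


def dd_flags(initial: bool, more: bool, master: bool) -> int:
    """Build the flags byte of a Database Description packet."""
    return (I_BIT if initial else 0) | (M_BIT if more else 0) | (MS_BIT if master else 0)


def describe_dd_flags(flags: int) -> dict:
    """Decode a DD flags byte into its three named bits."""
    return {"initial": bool(flags & I_BIT), "more": bool(flags & M_BIT),
            "master": bool(flags & MS_BIT)}


# An initial DD packet sets I, M and MS; a Slave's final packet clears all three.
print(hex(dd_flags(True, True, True)))   # 0x7
print(describe_dd_flags(0x00))
```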

Format of Link State Request Packet

Figure 34-7 Format of the link state request packets

After the DD packets are exchanged, each route device compares the received LSA summaries with the LSAs in its own database. For any LSA that is missing from its database or older than the neighbor's copy, it sends a link state request to the neighbor for the new LSA.

Link State Type: the type of the requested LSA.

Link State ID: together with the Link State Type and the Advertising Router, identifies an LSA.

Advertising Router: the router ID of the route device that generated the LSA.

Format of the Link State Update Packet

Figure 34-8 Format of the link state update packets

While a neighbor relationship is being established, when link state request packets are received, the requested LSAs in the local database are sent to the neighbor in link state update packets. In addition, when the local link state changes, the changed LSAs are sent out in link state update packets. Update packets are sent using the flooding mechanism.

#LSAs (Number of LSAs): the number of LSAs contained in the packet.

LSAs: the list of LSAs being sent in the update.

Format of the Link State Acknowledgment Packet



Figure 34-9 Format of the link state acknowledgement packets

LSA Headers: the headers of the LSAs being acknowledged.

LSA header

Figure 34-10 LSA header

LS Age: the time in seconds since the LSA was generated.

LS Type: the type of the LSA.

Link State ID: together with the LS Type and the Advertising Router, identifies an LSA.

Advertising Router: the router ID of the route device that generated the LSA.

Sequence Number: the sequence number of the LSA, which increases each time a new instance of the LSA is generated.

Checksum: the checksum of the complete LSA contents except the LS Age field.

Length: the length of the LSA in bytes.
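The 20-byte LSA header can be parsed the same way, and its sequence number, checksum, and age fields decide which of two instances of the same LSA is newer. The comparison below follows the rules of RFC 2328 section 13.1, which OSPFv3 inherits; this section only states that the sequence number increases, so treat this as a sketch rather than the switch's exact logic. The type and function names are hypothetical.

```python
import struct
from typing import NamedTuple

MAX_AGE = 3600       # seconds; an LSA at MaxAge is being flushed
MAX_AGE_DIFF = 900   # seconds (15 minutes)

# 20-byte LSA header: LS Age, LS Type, Link State ID, Advertising Router,
# LS Sequence Number (signed 32-bit), LS Checksum, Length.
LSA_HEADER = struct.Struct("!HHIIiHH")


class LSAHeader(NamedTuple):
    age: int
    ls_type: int
    link_state_id: int
    adv_router: int
    seq: int
    checksum: int
    length: int


def parse_lsa_header(data: bytes) -> LSAHeader:
    return LSAHeader(*LSA_HEADER.unpack_from(data))


def is_newer(a: LSAHeader, b: LSAHeader) -> bool:
    """Return True if instance `a` is newer than instance `b` of the same LSA."""
    if a.seq != b.seq:
        return a.seq > b.seq                 # higher sequence number wins
    if a.checksum != b.checksum:
        return a.checksum > b.checksum       # then the larger checksum
    if (a.age == MAX_AGE) != (b.age == MAX_AGE):
        return a.age == MAX_AGE              # a MaxAge instance is treated as newer
    if abs(a.age - b.age) > MAX_AGE_DIFF:
        return a.age < b.age                 # then the clearly younger instance
    return False                             # otherwise the instances are the same


raw = LSA_HEADER.pack(10, 0x2001, 0, 0x01010101, -0x7FFFFFFF, 0x1234, 40)
old = parse_lsa_header(raw)
new = old._replace(seq=old.seq + 1)
print(is_newer(new, old))  # True
```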

Format of Router LSA Packet

Figure 34-11 Format of the router LSA packet

V: Virtual Link Endpoint bit; set when the route device generating the packet is one end of a virtual link

E: External bit; set when the route device generating the packet is an ASBR

B: Border bit; set when the route device generating the packet is an ABR

W: Wildcard multicast receiver bit; set when the route device generating the packet is a wildcard multicast receiving route device

Options: the optional capabilities supported by the route device

Type: the type of the described interface: point-to-point, connection to a multi-access (transit) network, or virtual link

Metric: the output cost of the interface

Interface ID: the interface ID assigned to the described interface

Neighbor Interface ID: on a point-to-point interface, the interface ID of the neighbor; on a multi-access interface, the interface ID of the DR

Neighbor Router ID: on a point-to-point interface, the router ID of the neighbor route device; on a multi-access interface, the router ID of the DR
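The Router LSA fields above can be decoded as in the Python sketch below. It assumes the RFC 2740 layout: a first byte holding the W, V, E, and B bits (0x08, 0x04, 0x02, 0x01), a 24-bit Options field, then 16-byte link descriptions (Type, reserved byte, Metric, Interface ID, Neighbor Interface ID, Neighbor Router ID). The helper name and output format are hypothetical.

```python
import ipaddress
import struct

ROUTER_FLAGS = {"W": 0x08, "V": 0x04, "E": 0x02, "B": 0x01}
LINK_TYPES = {1: "point-to-point", 2: "transit", 4: "virtual link"}
LINK = struct.Struct("!BBHIII")  # 16-byte link description


def parse_router_lsa_body(body: bytes):
    """Split a Router LSA body (after the 20-byte LSA header) into flags, options, links."""
    flags = {name for name, bit in ROUTER_FLAGS.items() if body[0] & bit}
    options = int.from_bytes(body[1:4], "big")  # 24-bit Options field
    links = []
    for i in range((len(body) - 4) // LINK.size):
        ltype, _reserved, metric, if_id, nbr_if_id, nbr_rid = LINK.unpack_from(body, 4 + i * LINK.size)
        links.append({
            "type": LINK_TYPES.get(ltype, f"unknown({ltype})"),
            "metric": metric,
            "interface_id": if_id,
            "neighbor_interface_id": nbr_if_id,
            "neighbor_router_id": str(ipaddress.IPv4Address(nbr_rid)),
        })
    return flags, options, links
```

A route device that is both an ABR and an ASBR sets both the B bit and the E bit, so its flags decode to `{"E", "B"}`.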

Format of Network LSA Packet

Figure 34-12 Format of the Network LSA packet

Link State ID: for a Network LSA, the interface ID of the DR's interface on the network

Attached Router: the router IDs of the route devices fully adjacent to the DR on the network, including the DR itself

Format of Inter-Area-Prefix-LSA Packet



Figure 34-13 Format of Inter-Area-Prefix-LSA packet

Metric: the cost of the destination route

PrefixLength, PrefixOptions and Address Prefix: describe the IPv6 prefix of the destination.

Format of Inter-Area-Router-LSA Packet



Figure 34-14 Format of Inter-Area-Router LSA packet

Options: the optional capabilities of the route device described in the LSA

Metric: the cost of reaching the destination route device described in the LSA

Destination Router ID: the router ID of the described route device

Format of the Autonomous System External LSA Packet



Figure 34-15 Format of the Autonomous System External LSA packet

E: External metric bit; indicates the type of external cost used by the route. If the E bit is 1, the cost type is E2; if the E bit is 0, the cost type is E1.

F: Forwarding address bit; if set to 1, the Forwarding Address field is present.

T: Tag bit; if set to 1, the External Route Tag field is present.

Referenced LS Type: the LS type of an LSA associated with this LSA. If the value is nonzero, the Referenced Link State ID field is present, and the Referenced LS Type, the Referenced Link State ID, and the advertising router ID of this LSA together identify the associated LSA.

Metric: the cost of the route, set by the ASBR



PrefixLength, PrefixOptions and Address Prefix: describe the IPv6 prefix of the destination.

Forwarding Address: the address to which packets for the advertised destination are forwarded. If the forwarding address is not set, packets for the advertised destination are sent to the ASBR that generated the LSA.

External route tag: the tag of the external route

Referenced Link State ID: the related link state ID
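The E, F, and T bits and the E1/E2 cost types can be illustrated with the Python sketch below. It assumes the RFC 5340 encoding, where the first byte of the LSA body holds E (0x04), F (0x02), and T (0x01) and the next three bytes hold the 24-bit metric. The route-ranking rule follows standard OSPF behavior (E1 routes are preferred over E2 routes; an E1 cost adds the internal cost to the ASBR, while an E2 comparison uses the external metric first). The helper names are hypothetical.

```python
def decode_external_word(word: bytes) -> dict:
    """Decode the first 4 bytes of an AS external LSA body."""
    flags = word[0]
    return {
        "E2": bool(flags & 0x04),                       # E bit set -> type 2 external cost
        "forwarding_address_present": bool(flags & 0x02),
        "tag_present": bool(flags & 0x01),
        "metric": int.from_bytes(word[1:4], "big"),     # 24-bit metric set by the ASBR
    }


def external_route_key(is_e2: bool, metric: int, cost_to_asbr: int) -> tuple:
    """Sort key: a smaller key is a better external route."""
    if is_e2:
        # E2: compare the external metric; the internal cost only breaks ties.
        return (1, metric, cost_to_asbr)
    # E1: the total cost is the internal cost to the ASBR plus the external metric.
    return (0, cost_to_asbr + metric, 0)
```

With these keys, an E1 route with a total cost of 80 still beats E2 routes with an external metric of 20, and between two E2 routes with equal metrics, the one with the lower internal cost to the ASBR wins.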

Format of Link LSA Packet

Figure 34-16 Format of the Link LSA packet

The route device generates a Link LSA for each IPv6 link it is attached to. The Link LSA is flooded only on the local link. It advertises the route device's IPv6 link-local address on the link and the IPv6 prefixes of the link. The Link State ID of the LSA is the interface ID.

Rtr Pri: the priority of the router



Options: the options to be set in the Network LSA generated for the link

Link-local Interface Address: the IPv6 link-local address of the route device's interface on the link

#Prefixes: the number of prefixes contained in the LSA

PrefixLength, PrefixOptions and Address Prefix: describe the IPv6 prefix.

Format of Intra-Area-Prefix LSA Packet

Figure 34-17 Format of Intra-Area-Prefix LSA packet

Intra-Area-Prefix LSA: advertises the interface addresses, stub network prefixes, and transit network prefixes of the route device. In OSPFv2, this information is carried in the Router-LSA and Network-LSA. In OSPFv3, the Router-LSA and Network-LSA contain no prefix information, so the Intra-Area-Prefix LSA is used to advertise it.

#Prefixes: the number of IPv6 prefixes advertised in the LSA

Referenced LS Type, Referenced Link State ID, and Referenced Advertising Router: identify the Router-LSA or Network-LSA with which the advertised IPv6 prefixes are associated

PrefixLength, PrefixOptions and Address Prefix: describe the IPv6 prefix.

Metric: the cost of the advertised prefix

Option Domain in the OSPFv3 Packets

Figure 34-18 Option domain of the OSPFv3 packets


DC: set when the source route device supports demand circuits

EA: set when the source route device can receive and send External-Attributes LSAs

N: used only in Hello packets; set to 1 when NSSA external LSAs are supported and to 0 when they are not. When N is set to 1, the E bit must be 0.

P: used only in NSSA external LSA headers. If the P bit is set, the NSSA ABR must translate the type-7 LSA into a type-5 LSA.

MC: set when the source route device forwards multicast packets

E: set when the source route device accepts AS-external LSAs

IPv6 Address in the OSPFv3 Packets

An IPv6 address is 128 bits long. In OSPFv3 packets, an IPv6 prefix is described by three parts: PrefixLength, the length of the prefix in bits; PrefixOptions, the option capabilities of the prefix; and Address Prefix, the prefix itself.

The PrefixOptions field contains the following bits:

Figure 34-19 Option domain of the OSPFv3 prefix


NU: non-unicast bit; if set to 1, the prefix is excluded from IPv6 unicast route calculation.

LA: local address bit; if set, the prefix is an interface address of the advertising route device.

MC: multicast bit; if set, the prefix is included in IPv6 multicast route calculation.

P: propagate bit, used for prefixes in NSSA external LSAs. If the P bit is set, the NSSA ABR must translate the type-7 LSA into a type-5 LSA.
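The three-part prefix representation and the PrefixOptions bits can be illustrated with the Python sketch below. It assumes the RFC 5340 encoding: PrefixLength (1 byte), PrefixOptions (1 byte), a 16-bit field whose meaning depends on the LSA (for example, the Metric in an Intra-Area-Prefix LSA), then the prefix padded to a whole number of 32-bit words, that is, (PrefixLength + 31) // 32 words. It also assumes the bit values NU = 0x01, LA = 0x02, MC = 0x04, P = 0x08. The function names are hypothetical.

```python
import ipaddress

PREFIX_OPTION_BITS = {"NU": 0x01, "LA": 0x02, "MC": 0x04, "P": 0x08}


def encode_prefix(prefix: str, options=(), varies: int = 0) -> bytes:
    """Encode an IPv6 prefix; `varies` is the LSA-specific 16-bit field (e.g. Metric)."""
    net = ipaddress.IPv6Network(prefix)
    opts = 0
    for name in options:
        opts |= PREFIX_OPTION_BITS[name]
    words = (net.prefixlen + 31) // 32          # only the significant 32-bit words are sent
    addr = net.network_address.packed[: 4 * words]
    return bytes([net.prefixlen, opts]) + varies.to_bytes(2, "big") + addr


def decode_prefix_options(value: int) -> set:
    return {name for name, bit in PREFIX_OPTION_BITS.items() if value & bit}
```

A /48 prefix therefore occupies 4 + 8 bytes, a default route (::/0) only 4 bytes, and a /128 host prefix 4 + 16 bytes.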

Differences Between OSPFv3 and OSPFv2

1. Based on Different IP Protocols

OSPFv2 runs over IPv4 and manages IPv4 links and IPv4 addresses.

OSPFv3, also called OSPF for IPv6, extends OSPFv2 to support IPv6. It runs over IPv6 and advertises IPv6 link state and IPv6 link prefixes.

2. Similarities

The basic packet types are the same: Hello, LS-DD, LS-Req, LS-Upd, and LS-Ack. Neighbor discovery and adjacency establishment work the same way. The supported interface network types are the same: P2P, P2MP, Broadcast, NBMA, and Virtual link. The LSA flooding and aging mechanisms are the same, and so are the SPF calculation principles. The LSA types are largely the same; OSPFv3 adds two LSA types to advertise IPv6 link-local addresses and IPv6 prefixes. Router ID, Area ID, and Link State ID still use the IPv4 address format.

3. Differences

OSPFv3 runs on a per-link basis, and the concept of a subnet does not apply; OSPFv2 runs on a per-subnet basis.

Multiple OSPFv3 instances can run on one IPv6 link; they are distinguished by the Instance ID. An IPv4 interface can run only one OSPFv2 process.

In OSPFv2, the Link State ID of an LSA carries IPv4 address information. In OSPFv3, the Link State ID carries no address information; it only distinguishes different LSAs and has no special meaning. (For a few LSA types, such as the Network LSA, the Link State ID carries interface ID information.)

OSPFv3 multicast packets are sent to IPv6 multicast addresses, and unicast packets are sent using IPv6 link-local addresses.

In OSPFv2, the flooding scope of an LSA is determined by its LSA type. In OSPFv3, the LSA header itself encodes the flooding scope, along with other flag bits (for example, how to handle an unrecognized LSA type).

OSPFv3 neighbors are identified by router ID.

OSPFv3 adds two LSA types: the Link LSA, which advertises the link-local address and is flooded only on the local link, and the Intra-Area-Prefix LSA, which advertises the IPv6 address information of the interfaces.
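The flooding scope carried in the OSPFv3 LS type can be decoded as sketched below in Python. The sketch assumes the RFC 5340 encoding of the 16-bit LS type: the top bit is the U bit (handling of unrecognized LSA types), the next two bits (S2, S1) give the flooding scope, and the low 13 bits are the LSA function code. `decode_ls_type` is a hypothetical helper.

```python
SCOPES = {0: "link-local", 1: "area", 2: "AS", 3: "reserved"}


def decode_ls_type(ls_type: int) -> dict:
    return {
        "u_bit": bool(ls_type & 0x8000),          # how to treat an unrecognized LSA type
        "scope": SCOPES[(ls_type >> 13) & 0x3],   # S2 and S1 bits
        "function_code": ls_type & 0x1FFF,        # which kind of LSA this is
    }
```

Under this encoding, a Router-LSA (0x2001) has area scope, a Link-LSA (0x0008) has link-local scope, and an AS-External-LSA (0x4005) has AS scope, so a route device can determine the flooding scope even for an LSA type it does not recognize.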

OSPFv3 Features

1. OSPFv3 is an IGP, designed for use within an AS.

2. Link state advertisements are small; each advertisement describes only one part of the link state database.

3. NBMA support: OSPFv3 handles an NBMA network like a LAN, electing a DR and generating a network LSA. Some configuration is required for the route devices to discover their neighbors on such a network.

4. Areas: in OSPFv3, an AS can be divided into multiple areas. This has two advantages: 1) intra-area routes and inter-area routes are kept separate; 2) dividing the AS into areas reduces the SPF calculation load.

5. Flexible import of external information: each external route is imported into the AS in its own LSA. This reduces the volume of flooded data, and when a single route changes, only part of the routing table needs to be updated.

6. Four route levels: intra-area, inter-area, external type 1, and external type 2. This provides multi-level route protection and simplifies route management of the AS.

7. Virtual link support: configuring virtual links lets OSPFv3 partly remove the constraints that the physical topology imposes on the AS.

8. Flexible metric: in OSPFv3, the metric is the output cost of a route device interface, and the path cost is the sum of the costs of all interfaces along the path. The system administrator can set the metric according to network characteristics such as delay, bandwidth, and cost.

9. Equal-cost multipath: if there are multiple paths with the same cost to a destination, OSPFv3 finds all of them and load-balances across them.

10. Stub area support: when an area is configured as a stub area, external LSAs are not flooded into it. Inside the stub area, external destinations are reached through the default route.
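Items 8 and 9 (path cost as the sum of interface costs, and equal-cost multipath) can be illustrated with a Dijkstra SPF that keeps every equal-cost first hop. This Python sketch is a simplification: it uses a hypothetical graph of route devices only, with outgoing interface costs, whereas real OSPFv3 SPF runs over the LSDB and also includes transit network nodes.

```python
import heapq


def spf(graph: dict, root: str):
    """Dijkstra SPF that records all equal-cost first hops (ECMP) for each node.

    graph maps each route device to {neighbor: output interface cost}; costs must be positive.
    """
    dist = {root: 0}
    first_hops = {root: set()}
    heap = [(0, root)]
    done = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        for nbr, cost in graph.get(node, {}).items():
            nd = d + cost                       # path cost = sum of interface costs
            hops = {nbr} if node == root else first_hops[node]
            if nbr not in dist or nd < dist[nbr]:
                dist[nbr] = nd
                first_hops[nbr] = set(hops)     # strictly better path replaces the old hops
                heapq.heappush(heap, (nd, nbr))
            elif nd == dist[nbr]:
                first_hops[nbr] |= hops         # equal cost: keep both paths for load balancing
    return dist, first_hops
```

In a square topology where R1 reaches R4 through both R2 and R3 at a cost of 10 per hop, R4 gets a distance of 20 and two first hops, {R2, R3}, so traffic to R4 can be balanced across both paths.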



Resource Cost of OSPFv3

Link bandwidth: in OSPFv3, the reliable flooding mechanism keeps the link state databases of the route devices synchronized. When the network topology does not change, each LSA is refreshed only at long intervals (every 30 minutes by default). As the database grows, the bandwidth used by flooding also grows.

Route device memory: the OSPFv3 link state database can become very large, especially when many external link states are advertised, so the route device needs a large amount of memory. Updating and synchronizing the link state database also uses a large amount of memory.

CPU usage: in OSPFv3, CPU usage depends on the time spent running the SPF algorithm and on the number of route devices in the OSPF system. In addition, when the link state database is very large and many packets must be exchanged during protocol convergence, a great deal of CPU is used.

Designated router role: the designated router (DR) on a multi-access network receives and sends more packets than the other route devices, and when the DR fails, a new DR must take over. For this reason, the number of route devices connected to one network should be limited.

Precautions of OSPFv3

Limiting the size of the OSPFv3 system saves route device memory.

To reduce the database size within an area: 1. use a default route in the area to reduce the number of external routes that must be imported; 2. let the EGP (exterior gateway protocol) carry its own information across the OSPFv3 AS instead of relying on an IGP such as OSPFv3 to transmit it; 3. configure the area as a stub area; 4. if the external network prefixes are contiguous, summarize them. After summarization, the amount of external information in OSPFv3 decreases dramatically.
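The effect of summarizing contiguous external prefixes (step 4) can be checked with Python's standard `ipaddress.collapse_addresses`. This sketch only shows the address arithmetic with hypothetical documentation prefixes; it is not a switch command, and an administrator-configured summary range may also cover address space that is not yet in use.

```python
import ipaddress


def summarize(prefixes: list) -> list:
    """Merge adjacent IPv6 prefixes into the smallest equivalent set of prefixes."""
    networks = (ipaddress.IPv6Network(p) for p in prefixes)
    return [str(n) for n in ipaddress.collapse_addresses(networks)]


external = ["2001:db8:10::/48", "2001:db8:11::/48", "2001:db8:12::/48", "2001:db8:13::/48"]
summary = summarize(external)  # four /48 routes collapse into one /46 route
```

Here four external routes, and therefore four AS external LSAs, are replaced by a single summary. Non-contiguous prefixes such as 2001:db8:1::/48 and 2001:db8:3::/48 cannot be merged exactly and stay separate.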

IPv6 IS-IS Dynamic Routing Protocol

Main contents:

Terms of IPv6 IS-IS protocol



Introduction to IPv6 IS-IS protocol

Route learning of IPv6 IS-IS protocol in Single-Topology

IS-IS Multi-Topology

Terms of IPv6 IS-IS Protocol

PDU (Protocol Data Unit): the packet unit that carries protocol data;

SPF: Shortest Path First Algorithm;

IS (Intermediate System): equivalent to a route device in TCP/IP. It is the basic unit that generates and propagates routing information in the IS-IS protocol. In the following text, IS has the same meaning as route device;

ES (End System): equivalent to a host in TCP/IP. An ES does not take part in IS-IS routing. ISO defines a separate ES-IS protocol for communication between end systems and intermediate systems;

NET (Network Entity Title): identifies the ISO address of an intermediate system. It is similar to an IP address and is divided into an area ID and a system ID;

Area: The route area in the IS-IS protocol, including Level-1 Area and

Level-2 Area;

LSP (Link State PDU): carries the link state information to be advertised, including adjacency information and reachable subnet information;

LSDB (Link State Database): consists of the LSPs generated by all ISs in the area and describes the adjacency topology and related route information of the whole area. Every IS in the area holds an identical copy of the LSDB, and each IS runs the SPF algorithm on its own LSDB to calculate routes;

IIH (Intermediate System to Intermediate System Hello PDU): used to discover IS neighbors and keep them alive;

SNP (Sequence Number PDU): summarizes a group of LSPs. There are two kinds, PSNP and CSNP. SNPs are used to acknowledge LSPs, request LSPs, and advertise a summary description of the LSDB;

PSNP (Partial Sequence Number PDU): a kind of SNP used to acknowledge LSPs (on point-to-point networks) and to request LSPs (on broadcast networks);

CSNP (Complete Sequence Number PDU): a kind of SNP used to advertise a summary description of the LSDB;

Pseudo-node: a virtual IS node simulated by the DIS on a broadcast network, used to simplify the adjacency topology of the broadcast network;

DIS (Designated IS): an IS elected from all ISs on a broadcast network. It simulates the Pseudo-node and keeps the LSDBs of all ISs on the broadcast network synchronized.

Introduction to IPv6 IS-IS Protocol

IS-IS (Intermediate System to Intermediate System) is an IGP based on the SPF algorithm. Its basic design and algorithms are largely the same as those of OSPF. IS-IS runs directly over the link layer and does not depend on any particular network layer protocol (IPv4, IPv6, or OSI), so it extends easily to new protocols.

IS-IS supports routing for multiple protocol stacks, including IPv4, IPv6, and OSI. It was first used for the OSI protocol stack (ISO10589) and was later applied to IPv4 routing (RFC1195) and IPv6 routing (draft-ietf-isis-ipv6). IS-IS also supports CSPF calculation for MPLS-TE (RFC3784).

IS-IS has good compatibility (devices that implement different extensions can still interoperate) and supports large networks. It supports multiple protocol stacks, can be upgraded smoothly, and is simpler than OSPF and less prone to problems. For these reasons, IS-IS is suitable for large core backbone networks.

Route Learning of IPv6 IS-IS Protocol in Single-Topology

Overview

Single-Topology means that the IS-IS database records and describes only one network topology, and all address stacks (IPv4, IPv6, OSI) use that one network topology (the adjacency information topology) to calculate routes. To generate IPv6 routes from this topology, each route device must advertise its IPv6 reachable subnet information along with its link state information. After calculating the shortest paths (the SPF tree) to all route devices, the route device generates IPv6 routes from those shortest paths and the IPv6 reachable subnet information advertised by each route device.

IS-IS Single-topology Neighbor Check

Every address stack (such as IPv4 and IPv6) supported by the local IS-IS interface must also be supported by the neighbor's IS-IS interface. The interface addresses are checked as well: for the IPv4 address stack, the neighbor's Hello packet must advertise an IPv4 address in the same subnet as the local interface; for the IPv6 address stack, the neighbor's Hello packet must advertise a link-local address.
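The single-topology neighbor check can be sketched in Python as follows. The sketch does not parse real IS-IS TLVs: `HelloInfo` is a hypothetical summary of what the neighbor's Hello advertised (its protocol stacks, IPv4 interface addresses, and IPv6 addresses), and `single_topology_check` is a hypothetical helper.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class HelloInfo:
    """Hypothetical summary of a neighbor's Hello contents."""
    protocols: set                                   # e.g. {"ipv4", "ipv6"}
    ipv4_addresses: list = field(default_factory=list)
    ipv6_addresses: list = field(default_factory=list)


def single_topology_check(local_protocols: set, local_ipv4: Optional[str], hello: HelloInfo) -> bool:
    # Every address stack enabled locally must also be supported by the neighbor.
    if not local_protocols <= hello.protocols:
        return False
    if "ipv4" in local_protocols:
        # The neighbor must advertise an IPv4 address in the local interface's subnet.
        subnet = ipaddress.IPv4Interface(local_ipv4).network
        if not any(ipaddress.IPv4Address(a) in subnet for a in hello.ipv4_addresses):
            return False
    if "ipv6" in local_protocols:
        # The neighbor must advertise a link-local address.
        if not any(ipaddress.IPv6Address(a).is_link_local for a in hello.ipv6_addresses):
            return False
    return True
```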

Advertise Reachable Subnet Information of IS-IS Single-topology

When advertising link state information, the route device also advertises its own IPv6 reachable subnet information, in the same way as for IPv4.

Calculate IPv6 Route of IS-IS Single-Topology

The IPv6 route calculation is similar to the IPv4 route calculation.

First, the SPF algorithm calculates the shortest path to each route device, that is, the SPF tree. Then the routes are calculated from those shortest paths and the IPv6 reachable subnet information advertised by each route device.

IS-IS Multi-Topology

Overview

In the original IS-IS protocol, the advertised link state database contains only one network topology; this is Single-Topology.

To advertise and learn IPv6 routes in Single-Topology, the IPv6 routing domain must have the same network topology as the IPv4 routing domain, because the link state database contains only one network topology. In real deployments, however, the IPv4 and IPv6 network topologies may differ, and single-topology cannot meet this need. Multi-Topology was introduced to solve this problem.

Multi-Topology means that the advertised link state database describes multiple separate network topologies, each identified by an MT ID. Multi-Topology is not limited to separating the IPv4 unicast route topology from the IPv6 unicast route topology; it can separate various route topologies.

IS-IS Multi-topology Packet Format

To advertise multiple separate topologies, IS-IS multi-topology adds several new TLVs. The basic principle is to encapsulate the existing single-topology link state TLV formats inside a multi-topology link state TLV, so that the link state TLVs of different topologies can be distinguished. These TLVs are carried in LSPs. The TLV formats are as follows:

Each MT link state TLV contains the following fields:

Type (1 octet): 222 for the MT Intermediate Systems TLV, 235 for the MT Reachable IPv4 Prefixes TLV, and 237 for the MT Reachable IPv6 Prefixes TLV

Length (1 octet)

R|R|R|R|MT ID (2 octets): four reserved bits followed by the 12-bit MT ID

Entries (0 to 253 octets): in extended IS TLV format (TLV 222), extended IP TLV format (TLV 235), or IPv6 Reachability format (TLV 237)

Figure 34-26 Format of IS-IS MT link status TLV
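The common header of these TLVs can be built and parsed as in the Python sketch below. It treats the entries as opaque bytes (the extended IS, extended IP, and IPv6 Reachability entry formats are not built here) and assumes the RFC 5120 numbering in which MT ID 2 is the IPv6 unicast topology. The function names are hypothetical.

```python
import struct

MT_TLV_TYPES = {222: "MT Intermediate Systems", 235: "MT Reachable IPv4 Prefixes",
                237: "MT Reachable IPv6 Prefixes"}


def build_mt_tlv(tlv_type: int, mt_id: int, entries: bytes) -> bytes:
    if tlv_type not in MT_TLV_TYPES:
        raise ValueError("not an MT link state TLV")
    if not 0 <= mt_id <= 0x0FFF:
        raise ValueError("MT ID is a 12-bit value")
    if len(entries) > 253:
        raise ValueError("entries must fit in 253 octets")
    # Length covers the 2-octet MT ID field plus the entries; the 4 R bits are sent as 0.
    return struct.pack("!BBH", tlv_type, 2 + len(entries), mt_id) + entries


def parse_mt_tlv(tlv: bytes):
    tlv_type, length, mt_field = struct.unpack_from("!BBH", tlv, 0)
    return tlv_type, mt_field & 0x0FFF, tlv[4:2 + length]  # reserved bits are ignored
```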

IS-IS multi-topology also adds a TLV that advertises topology status: it records which topologies the system supports and the status of each topology (Overload, Partition, and Attached). This TLV is carried in LSPs and Hello packets. Its format is as follows:



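The Multi-Topology TLV contains the following fields:

Type (1 octet): 229

Length (1 octet)

One or more 2-octet entries, each in the form O|A|R|R|MT ID: the O (Overload) bit, the A (Attached) bit, two reserved bits, and the 12-bit MT ID

Figure 34-27 Format of IS-IS MT TLV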
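The 2-octet entries of the Multi-Topology TLV can be decoded as in the Python sketch below. It assumes the value part of TLV 229 (after the Type and Length octets) is passed in, with O = 0x8000, A = 0x4000, and the MT ID in the low 12 bits, as in RFC 5120. `parse_mt_tlv229` is a hypothetical helper.

```python
def parse_mt_tlv229(value: bytes) -> list:
    """Decode each 2-octet O|A|R|R|MT ID entry of a Multi-Topology TLV value."""
    entries = []
    for off in range(0, len(value) - 1, 2):
        word = int.from_bytes(value[off:off + 2], "big")
        entries.append({
            "mt_id": word & 0x0FFF,
            "overload": bool(word & 0x8000),   # O bit
            "attached": bool(word & 0x4000),   # A bit
        })
    return entries
```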

Maintain IS-IS Multi-topology Neighbor

When any protocol stack on the local IS-IS interface uses multi-topology, the multi-topology check is used when checking the neighbor. Otherwise, the single-topology check is used.

The multi-topology neighbor check works as follows.

Point-to-point Neighbor

When the neighbor supports at least one topology that the interface also supports, the neighbor relationship is set up. Otherwise, it cannot be set up.

Broadcast Interface Neighbor

To allow DIS election on the broadcast interface, the neighbor relationship is set up whether or not the neighbor shares a topology with the interface.
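The two neighbor rules above can be sketched as a small Python predicate. The interface type strings and topology sets are hypothetical inputs; the MT IDs in the usage note assume the RFC 5120 numbering (0 for the standard topology, 2 for IPv6 unicast).

```python
def mt_neighbor_allowed(interface_type: str, local_mts: set, neighbor_mts: set) -> bool:
    """Decide whether a multi-topology neighbor relationship can be set up."""
    if interface_type == "broadcast":
        # Always set up, so that DIS election can take place.
        return True
    # Point-to-point: at least one topology must be common to both ends.
    return bool(local_mts & neighbor_mts)
```

For example, a point-to-point interface running topologies {0, 2} forms a neighbor with a peer that runs only {2}, but an interface running only {2} does not form one with a peer that runs only {0}.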

Advertise Adjacency Reachable Information of IS-IS Multi-topology

Adjacency Information of Point-to-point Interface Neighbor

For a point-to-point neighbor, the adjacency information appears only in the link state databases of the topologies supported by both the local interface and the neighbor.



Adjacency Information of Broadcast Interface Neighbor

For a broadcast adjacency, all adjacency information on the interface appears in the link state database of every topology the interface supports. During route calculation, each adjacency is checked for both a forward path and a backward path. If only one end supports a topology, the supporting end advertises the adjacency, but the other end does not, so there is no backward path in that topology's link state database. In that case, the adjacency is not used when calculating routes for that topology.
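The forward and backward path check can be sketched in Python as follows. The input is a hypothetical per-topology view of the link state database that maps each system to the set of neighbors it advertises in that topology; an adjacency is usable only if both ends advertise it.

```python
def usable_adjacencies(lsdb: dict) -> set:
    """Return the (from, to) adjacencies that pass the two-way check in one topology."""
    return {
        (a, b)
        for a, neighbors in lsdb.items()
        for b in neighbors
        if a in lsdb.get(b, set())   # backward path: b must also advertise a
    }
```

For example, if A advertises adjacencies to B and C in the IPv6 topology but C does not support that topology and advertises nothing there, only the A-B adjacency is used for IPv6 route calculation.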

Adjacency Information of Virtual Node

The adjacency information of the virtual node is generated in the same way as in single-topology, and it is used when calculating the routes of all topologies.

Advertise Reachable Subnet Information of IS-IS Multi-topology

Reachable subnet information is added to the topology that corresponds to its protocol stack (IPv4 or IPv6).

Overload, Partition and Attached Flags of IS-IS Multi-topology

The Multi-Topology TLV in an LSP carries the Overload, Partition, and Attached flags of each topology. When calculating the routes of a topology, the route device uses that topology's own flags.

When calculating single-topology routes, the route device uses the flags carried in the LSP header.

IS-IS Multi-topology Route Calculation

The routes of each topology are calculated separately, and each topology uses only its own topology information. The exception is the topology information of the virtual node, which is used in all topologies.

The database can contain the following topologies: single-topology and IPv6 multi-topology. IPv4 routes can be calculated only in the single topology, while IPv6 routes can be calculated in either the single topology or the IPv6 multi-topology, but not in both at the same time.

IPv6 BGP4+ Dynamic Routing Protocol

Main contents:

Terms of IPv6 BGP4+ protocol

Introduction to IPv6 BGP4+

Terms of IPv6 BGP4+ Protocol

AS (Autonomous System): a set of routing devices and hosts under the same administrative control and routing policy. AS numbers are allocated by Internet registries.

EBGP: BGP between different ASs. An EBGP neighbor is a routing device under administrative and policy control outside the local AS.

IBGP: BGP within the same AS. An IBGP neighbor is a routing device in the same administrative control domain.

NLRI (Network Layer Reachability Information): the part of a BGP update message that lists the reachable destinations.

MP-BGP (Multiprotocol BGP): BGP that carries routing information for different kinds of network layer protocols.

Introduction to IPv6 BGP4+ Protocol

Border Gateway Protocol (BGP) is a routing protocol for exchanging network layer reachability information (NLRI) between routing domains. Its main function is to exchange NLRI with other BGP peers. A BGP peer is any device running BGP.

BGP uses TCP (port 179) as its transport protocol, which provides reliable data transmission. Retransmission and acknowledgement are handled by TCP rather than by BGP, which simplifies the protocol: BGP does not need its own reliability mechanisms.

Two routing devices running BGP that establish a TCP connection with each other are called peers. Once the connection is established, the two peers agree on the connection parameters by exchanging Open messages. The parameters include the BGP version number, AS number, hold time, BGP identifier, and other optional parameters. After the parameters are negotiated successfully, BGP exchanges routes by sending Update messages. An Update message contains the list of reachable destinations (the NLRI) and the path attributes of each route, including the ASs the route has passed through. When routes change, peers send incremental updates. BGP does not refresh routing information periodically; if routes do not change, BGP peers exchange only Keepalive messages, which are sent periodically to keep the connection valid.

Today's Internet is one large network made up of many interconnected ASs, and BGP version 4 (BGP4) is the routing protocol used between them.

IPv6 BGP4+ is the inter-domain routing protocol that supports IPv6. Building on BGP4, it carries IPv6 network layer information in the NLRI and Next_Hop attribute. It introduces two path attributes: MP_REACH_NLRI (Multiprotocol Reachable NLRI), used to advertise reachable IPv6 routes and next-hop information, and MP_UNREACH_NLRI (Multiprotocol Unreachable NLRI), used to withdraw unreachable IPv6 routes. The next hop is an IPv6 address, which can be an IPv6 global unicast address or the link-local address of the next hop. IPv6 BGP4+ applies the BGP4 multiprotocol extension attributes to IPv6 networks without changing the original BGP4 message and routing mechanisms, so its applications and working principles are the same as those of BGP4.
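How an IPv6 route is carried in MP_REACH_NLRI can be illustrated with the Python sketch below. It assumes the RFC 4760 layout (attribute type code 14, optional non-transitive; AFI 2 for IPv6, SAFI 1 for unicast; next-hop length, next hop, one reserved octet, then NLRI prefixes encoded as a length octet plus the minimum number of prefix octets). It also assumes the RFC 2545 convention that the next hop is 16 bytes (global address) or 32 bytes (global plus link-local). The function names are hypothetical.

```python
import ipaddress
import struct

AFI_IPV6, SAFI_UNICAST = 2, 1
MP_REACH_NLRI = 14


def encode_nlri_prefix(prefix: str) -> bytes:
    net = ipaddress.IPv6Network(prefix)
    nbytes = (net.prefixlen + 7) // 8           # only the significant octets are sent
    return bytes([net.prefixlen]) + net.network_address.packed[:nbytes]


def mp_reach_nlri(next_hops: list, prefixes: list) -> bytes:
    """Build the MP_REACH_NLRI path attribute for IPv6 unicast routes."""
    nh = b"".join(ipaddress.IPv6Address(a).packed for a in next_hops)  # global [+ link-local]
    value = struct.pack("!HBB", AFI_IPV6, SAFI_UNICAST, len(nh)) + nh + b"\x00"  # reserved
    value += b"".join(encode_nlri_prefix(p) for p in prefixes)
    flags = 0x80                                # optional, non-transitive
    if len(value) > 255:
        flags |= 0x10                           # extended length: 2-octet length field
        return struct.pack("!BBH", flags, MP_REACH_NLRI, len(value)) + value
    return struct.pack("!BBB", flags, MP_REACH_NLRI, len(value)) + value
```

Withdrawals use MP_UNREACH_NLRI in a similar way, but without the next-hop fields.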

BGP Message Header

The BGP message header contains a 16-byte marker field, a 2-byte length field, and a 1-byte type field. The following figure shows the format of the BGP message header.



Figure 34-28 Format of the BGP message header

Whether data follows the header depends on the message type; for example, a Keepalive message consists of the header only, with no data.

Marker: the marker field occupies 16 bytes and is used to detect loss of synchronization between BGP peers. If the message type is Open, or if the Open message carries no authentication information, all bits of the marker field must be set to 1. Otherwise, the marker value is calculated by the authentication mechanism.

Length: the length field occupies 2 bytes and gives the total length of the message, including the header. The minimum allowed length is 19 bytes and the maximum is 4096 bytes.

Type: The type field occupies one byte. It indicates the type of the BGP

message. The four types of the BGP message are as follows:

Table 34-3 Numbers of BGP message types

Number Type

1 Open

2 Update

3 Notification

4 Keepalive
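The header rules above (an all-ones marker, a length of 19 to 4096 bytes, and a one-byte type) can be sketched in Python as follows. The sketch assumes no authentication information is in use, so the marker is always all ones; the function names are hypothetical.

```python
import struct

MARKER = b"\xff" * 16                      # all bits set to 1 (no authentication)
MSG_TYPES = {1: "Open", 2: "Update", 3: "Notification", 4: "Keepalive"}
HEADER_LEN, MAX_LEN = 19, 4096


def build_message(msg_type: int, payload: bytes = b"") -> bytes:
    length = HEADER_LEN + len(payload)     # Length counts the header as well
    if length > MAX_LEN:
        raise ValueError("BGP message longer than 4096 bytes")
    return MARKER + struct.pack("!HB", length, msg_type) + payload


def parse_header(data: bytes) -> tuple:
    marker, length, msg_type = struct.unpack_from("!16sHB", data, 0)
    if marker != MARKER:
        raise ValueError("marker is not all ones")
    if not HEADER_LEN <= length <= MAX_LEN or msg_type not in MSG_TYPES:
        raise ValueError("bad length or type")
    return length, msg_type
```

A Keepalive built this way is exactly 19 bytes: the header with no data.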

Open Messages

After the TCP connection is established, the first message sent is the Open message. The Open message contains the BGP version number, AS number, hold time, BGP identifier, and other optional parameters.



If the Open message is acceptable, meaning that the peer routing device agrees with the parameters, a Keepalive message is sent to acknowledge the Open message.

In addition to the fixed BGP header, the Open message contains the following fields:

Figure 34-39 Format of BGP Open message

Version: one byte; the BGP protocol version number. During neighbor negotiation, the peers agree on a BGP version, usually the highest version supported by both routing devices.

My Autonomous System: two bytes; the AS number of the sending routing device.

Hold Time: two bytes; the maximum time, in seconds, that may elapse between receipt of successive Keepalive or Update messages from the sender. The BGP routing device negotiates with its peer and uses the smaller of the two hold times.

BGP Identifier: four bytes; the identifier of the sending BGP routing device. It is the router ID of the routing device, namely the highest loopback interface address or the highest IP address of the physical interfaces. The router ID can also be set manually.

Optional parameter Length: the field is one byte. It indicates the total

length of the optional parameter fields (the unit is byte). If there are no

optional parameters, the field is set to 0.



Optional Parameters: variable length field. It provides the list of the

optional parameters of the BGP neighbor negotiation.
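The Open fields above can be assembled as in the Python sketch below. It assumes the 2-byte My Autonomous System field described here (no 4-byte AS capability) and the RFC 4271 rule that a hold time must be 0 or at least 3 seconds. `build_open` and `negotiated_hold_time` are hypothetical helpers, and optional parameters are passed through as pre-encoded bytes.

```python
import ipaddress
import struct


def build_open(my_as: int, hold_time: int, bgp_id: str, opt_params: bytes = b"") -> bytes:
    if hold_time != 0 and hold_time < 3:
        raise ValueError("hold time must be 0 or at least 3 seconds")
    body = struct.pack(
        "!BHH4sB",
        4,                                         # Version: BGP-4
        my_as,                                     # My Autonomous System (2 bytes)
        hold_time,                                 # Hold Time in seconds
        ipaddress.IPv4Address(bgp_id).packed,      # BGP Identifier (router ID)
        len(opt_params),                           # Optional Parameter Length
    ) + opt_params
    header = b"\xff" * 16 + struct.pack("!HB", 19 + len(body), 1)  # type 1 = Open
    return header + body


def negotiated_hold_time(local: int, peer: int) -> int:
    """Both peers use the smaller of the two advertised hold times."""
    return min(local, peer)
```

An Open message with no optional parameters is 29 bytes long: the 19-byte header plus 10 bytes of fixed fields.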

Update Message

The Update message is used to exchange routing information between BGP peers, both to advertise routes to a BGP peer and to withdraw them. The Update message contains the fixed BGP header and the following optional parts:

Unfeasible Routes Length: two bytes; the total length of the Withdrawn Routes field. If the value is 0, no routes are withdrawn.

Withdrawn Routes: variable length; the list of IP address prefixes of the routes being withdrawn from service.

Total Path Attribute Length: two bytes; the total length of the Path Attributes field.

Path Attributes: variable length; the list of BGP attributes associated with the prefixes in the NLRI. The path attributes provide information about the advertised prefixes, such as preference or next hop, which is used for route filtering and route selection. Path attributes fall into the following categories:

1. Well-Known Mandatory: attributes that must be included in every BGP Update message and must be implemented and recognized by all BGP vendors, such as ORIGIN, AS_PATH, and NEXT_HOP.

ORIGIN: a well-known mandatory attribute that gives the origin of the route. There are three possible origins: IGP, EGP, and INCOMPLETE. The routing device uses this information when choosing among multiple routes and prefers the route with the lowest ORIGIN value: IGP is preferred over EGP, and EGP over INCOMPLETE.

AS_PATH: a well-known mandatory attribute that lists the ASs the route in the Update message has passed through.

NEXT_HOP: a well-known mandatory attribute that gives the IP address of the next-hop routing device for the destinations listed in the Update message.

2. Well-Known Discretionary: attributes that must be recognized by all BGP implementations but may or may not be included in an Update message.

LOCAL_PREF: used to distinguish the preference of multiple routes to the same destination. The higher the local preference, the more preferred the route. LOCAL_PREF is not included in Update messages sent to EBGP neighbors. If it is included in an Update message received from an EBGP neighbor, the attribute is ignored.

ATOMIC_AGGREGATE: warns downstream routing devices that path information has been lost. Route aggregation can lose routing information when the aggregated routes come from different sources with different attributes. A routing device that sends an aggregate route that loses information must attach the ATOMIC_AGGREGATE attribute to the route.

3. Optional Transitive: not all BGP implementations support optional transitive attributes. If a BGP process does not recognize such an attribute, it checks the Transitive flag; if the flag is set, the BGP process accepts the attribute and passes it on to other BGP peers.

AGGREGATOR: identifies the AS number and IP address of the BGP peer that performed the route aggregation.

COMMUNITY: indicates that a destination is a member of a group of destinations that share one or more common properties. The type code of the COMMUNITY attribute is 8. Each community is treated as a 32-bit value. To facilitate management, the community values from 0 (0x00000000) to 65535 (0x0000FFFF) and from 4294901760 (0xFFFF0000) to 4294967295 (0xFFFFFFFF) are reserved. The remaining community values should use the AS number as the first two bytes; the meaning of the last two bytes can be defined by the AS. Within the reserved range 0xFFFF0000-0xFFFFFFFF, the following well-known community values are defined.

NO_EXPORT (4294967041 or 0xFFFFFF01): routes received with this value must not be advertised to EBGP peers. If an alliance (BGP confederation) is configured, the routes must not be advertised outside the alliance.

NO_ADVERTISE (4294967042 or 0xFFFFFF02): routes received with this value must not be advertised to any EBGP or IBGP peer.

LOCAL_AS (4294967043 or 0xFFFFFF03): routes received with this value must not be advertised to EBGP peers, including peers in other member ASs of the alliance.
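To make the numbering concrete, here is a small Python sketch that packs an AS number and an AS-defined value into a 32-bit community, decodes it back, and tests the reserved ranges and the well-known values listed above. The function names are hypothetical and are not part of the switch software.

```python
# Well-known community values listed above
WELL_KNOWN = {
    0xFFFFFF01: "NO_EXPORT",
    0xFFFFFF02: "NO_ADVERTISE",
    0xFFFFFF03: "LOCAL_AS",
}

def encode_community(asn: int, value: int) -> int:
    """Pack the AS number (high 16 bits) and an AS-defined value (low 16 bits)."""
    if not (0 <= asn <= 0xFFFF and 0 <= value <= 0xFFFF):
        raise ValueError("AS number and value must each fit in 16 bits")
    return (asn << 16) | value

def decode_community(community: int) -> str:
    """Return the well-known name, or the usual AS:value notation."""
    if community in WELL_KNOWN:
        return WELL_KNOWN[community]
    return f"{community >> 16}:{community & 0xFFFF}"

def is_reserved(community: int) -> bool:
    """True for 0x00000000-0x0000FFFF and 0xFFFF0000-0xFFFFFFFF."""
    return community <= 0x0000FFFF or community >= 0xFFFF0000
```

For example, decode_community(encode_community(100, 20)) gives "100:20".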

4. Optional Nontransitive: not all BGP implementations support optional nontransitive attributes. If the BGP process does not recognize such an attribute, it checks the Transitive flag. If the flag is not set, the attribute is ignored and is not passed on to other BGP peers.
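The handling of an unrecognized attribute in categories 3 and 4 depends only on two bits of the Attribute Flags octet. The Python sketch below shows one way a BGP process might decide what to do. The bit masks follow RFC 4271, and setting the Partial bit on a forwarded optional transitive attribute also comes from RFC 4271, not from this manual. The function name and return values are hypothetical.

```python
OPTIONAL = 0x80     # high-order bit of the Attribute Flags octet
TRANSITIVE = 0x40
PARTIAL = 0x20

def handle_unrecognized(flags: int):
    """Decide what to do with a path attribute whose type code is not recognized.

    Returns (action, flags_to_forward).
    """
    if not flags & OPTIONAL:
        # An unrecognized well-known attribute is an Update message error
        # (error code 3, subcode 2 in Table 34-4).
        return ("send_notification", None)
    if flags & TRANSITIVE:
        # Optional transitive: accept and pass on, marking it as partial.
        return ("accept_and_forward", flags | PARTIAL)
    # Optional non-transitive: ignore it and do not pass it on.
    return ("ignore", None)
```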

MULTI_EXIT_DISC (MED): used by BGP peers to distinguish among multiple exit points to an adjacent AS. The lower the MED, the more preferred the route. MED attributes are exchanged between ASs, but a MED that enters an AS is not propagated beyond that AS (nontransitive). This differs from the handling of LOCAL_PREF: through MED, a routing device in one AS can influence route selection in another AS, whereas LOCAL_PREF affects route selection only within the local AS.

ORIGINATOR_ID: used with route reflection. It is a 32-bit value that the route reflector sets when it reflects a route, and it carries the router ID of the route's originator within the AS. If the originator finds its own router ID in the ORIGINATOR_ID of a received route, it knows that a routing loop has occurred and ignores the route.

CLUSTER_LIST: a list of the cluster IDs of the route reflectors that the route has passed through. If a route reflector finds its own local cluster ID in the CLUSTER_LIST of a received route, it knows that a routing loop has occurred and ignores the route.
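The two loop checks described for ORIGINATOR_ID and CLUSTER_LIST can be sketched in Python as follows. The function name and arguments are illustrative assumptions. Router IDs and cluster IDs are modeled as IPv4Address values because both are 32-bit identifiers that are usually written in dotted-decimal form.

```python
from ipaddress import IPv4Address

def is_reflection_loop(own_router_id, own_cluster_id,
                       originator_id=None, cluster_list=()):
    """Return True if a reflected route has looped back and should be ignored."""
    # ORIGINATOR_ID check: this router originated the route.
    if originator_id is not None and originator_id == own_router_id:
        return True
    # CLUSTER_LIST check: the route has already passed through this cluster.
    return own_cluster_id in cluster_list
```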


To advertise reachable IPv6 routes and withdraw unreachable IPv6 routes in BGP update messages, the following two attributes are defined:

MP_REACH_NLRI: Multiprotocol Reachable NLRI, used to advertise reachable IPv6 routes and their next-hop information;

MP_UNREACH_NLRI: Multiprotocol Unreachable NLRI, used to withdraw unreachable IPv6 routes.

Network Layer Reachability Information (NLRI): a variable-length field containing the list of reachable IP address prefixes advertised by the sender.
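The NLRI field encodes each prefix compactly. Each entry is one octet giving the prefix length in bits, followed by only as many address octets as that length needs (RFC 4271). The following Python sketch encodes and decodes IPv4 NLRI. The function names are hypothetical, and the sketch is limited to IPv4.

```python
import ipaddress

def encode_nlri(prefixes):
    """Encode IPv4 prefixes as <length in bits><minimal prefix octets> entries."""
    out = bytearray()
    for text in prefixes:
        net = ipaddress.IPv4Network(text)
        nbytes = (net.prefixlen + 7) // 8          # octets needed for the prefix
        out.append(net.prefixlen)
        out += net.network_address.packed[:nbytes]
    return bytes(out)

def decode_nlri(data: bytes):
    """Decode the entries back into 'a.b.c.d/len' strings."""
    prefixes, i = [], 0
    while i < len(data):
        plen = data[i]
        nbytes = (plen + 7) // 8
        # Pad the truncated prefix back to four octets.
        addr = data[i + 1:i + 1 + nbytes] + bytes(4 - nbytes)
        prefixes.append(f"{ipaddress.IPv4Address(addr)}/{plen}")
        i += 1 + nbytes
    return prefixes
```

For example, 10.0.0.0/8 takes only two octets on the wire: the length 8 and the single octet 10.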

Keepalive Message

Keepalive messages are exchanged between peers periodically to check whether the peer is reachable.
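A Keepalive message carries no body. It consists only of the fixed BGP header, which has a 16-octet all-ones Marker, a 2-octet Length, and a 1-octet Type, with type 4 for Keepalive (RFC 4271). The Python sketch below builds and checks one. The one-third-of-hold-time interval is the RFC 4271 suggestion and is an assumption here, because this manual does not give it. The function names are hypothetical.

```python
import struct

MARKER = b"\xff" * 16     # 16-octet Marker field, all ones
HEADER_LEN = 19           # Marker (16) + Length (2) + Type (1)
KEEPALIVE = 4             # message type code for Keepalive

def build_keepalive() -> bytes:
    """A Keepalive message is the fixed header only."""
    return MARKER + struct.pack("!HB", HEADER_LEN, KEEPALIVE)

def parse_header(msg: bytes):
    """Return (length, type); a bad marker is a message header error (code 1, subcode 1)."""
    if msg[:16] != MARKER:
        raise ValueError("connection not synchronized")
    return struct.unpack("!HB", msg[16:HEADER_LEN])

def keepalive_interval(hold_time: int) -> int:
    # RFC 4271 suggests sending Keepalives at about one third of the hold time.
    return hold_time // 3
```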

Notification Message

A Notification message is sent whenever an error is detected, and the BGP connection is closed after the message is sent. In addition to the fixed BGP message header, the Notification message contains the following fields:

Error Code: one byte; indicates the type of error.

Error Subcode: one byte; provides more details about the error.

Data: a variable-length field containing data related to the error, for example, an invalid message header or an illegal AS number. The following table lists the possible error codes and error subcodes.



Table 34-4 BGP Notification message error codes and error subcodes

Error Code                          Error Subcode
1-Message header error              1-Connection not synchronized
                                    2-Bad message length
                                    3-Unsupported message type
2-Open message error                1-Unsupported version number
                                    2-Bad peer AS number
                                    3-Invalid BGP identifier
                                    4-Unsupported optional parameter
                                    5-Authentication failure
                                    6-Unacceptable hold time
                                    7-Unsupported capability
3-Update message error              1-Malformed attribute list
                                    2-Unrecognized well-known attribute
                                    3-Missing well-known attribute
                                    4-Attribute flags error
                                    5-Attribute length error
                                    6-Invalid ORIGIN attribute
                                    7-AS routing loop
                                    8-Invalid NEXT_HOP attribute
                                    9-Optional attribute error
                                    10-Invalid network field
                                    11-Malformed AS_PATH
4-Hold timer expired                Not used
5-FSM error (detected by the FSM)   Not used
6-Cease (other fatal errors)        Not used
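The Notification layout is the fixed header followed by the Error Code octet, the Error Subcode octet, and the Data field. The Python sketch below builds a Notification (message type 3 in RFC 4271) and names its error code. The helper names and the short code labels are illustrative.

```python
import struct

NOTIFICATION = 3   # message type code for Notification
ERROR_CODES = {
    1: "Message header error",
    2: "Open message error",
    3: "Update message error",
    4: "Hold timer expired",
    5: "FSM error",
    6: "Cease",
}

def build_notification(code: int, subcode: int = 0, data: bytes = b"") -> bytes:
    """Fixed header + Error Code (1 octet) + Error Subcode (1 octet) + Data."""
    length = 19 + 2 + len(data)
    header = b"\xff" * 16 + struct.pack("!HB", length, NOTIFICATION)
    return header + struct.pack("!BB", code, subcode) + data

def describe_notification(msg: bytes) -> str:
    code, subcode = msg[19], msg[20]
    return f"{ERROR_CODES.get(code, 'Unknown')} (code {code}, subcode {subcode})"
```

For example, build_notification(2, 6) reports an unacceptable hold time in an Open message.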

BGP Finite-State Machine

Before BGP peers can exchange NLRI, a BGP connection must be established. The establishment and maintenance of the BGP connection are described by a finite-state machine (FSM). The following figure shows the complete BGP FSM, and the table lists the input events that cause state changes.



Figure 34-40 BGP FSM

Table 34-5 Input Events (IE)

IE   Description
1    BGP start
2    BGP stop
3    BGP transport connection open
4    BGP transport connection closed
5    BGP transport connection open failed
6    BGP transport fatal error
7    ConnectRetry timer expired
8    Hold timer expired
9    Keepalive timer expired
10   Open message received
11   Keepalive message received
12   Update message received
13   Notification message received

Idle: the initial state. BGP stays in the Idle state until an operation triggers a Start event, which is usually caused by the creation or restart of a BGP session.

Connect: BGP is waiting for the transport protocol (TCP) connection to complete. If the connection succeeds, BGP sends an Open message and moves to the OpenSent state. If the connection fails, BGP moves to the Active state. If the ConnectRetry timer expires, BGP remains in the Connect state, resets the timer, and initiates a new transport connection. If any other event occurs, BGP returns to the Idle state.

Active: BGP attempts to establish a TCP connection with the neighbor. If the connection succeeds, BGP sends an Open message and moves to the OpenSent state. If the ConnectRetry timer expires, BGP restarts the timer and goes back to the Connect state, while continuing to listen for a connection from the peer.

OpenSent: an Open message has been sent, and BGP is waiting for an Open message from the peer. BGP checks the received Open message. If an error is found, BGP sends a Notification message and returns to the Idle state. If no error is found, BGP sends a Keepalive message to the peer, resets the Keepalive timer, and moves to the OpenConfirm state.

OpenConfirm: BGP is waiting for a Keepalive or Notification message. If a Keepalive message is received, BGP moves to the Established state. If a Notification message is received, BGP returns to the Idle state. If the Hold timer expires before a Keepalive message arrives, BGP sends a Notification message and returns to the Idle state.

Established: the final phase of neighbor negotiation. In this state, the connection between the BGP peers is established, and the peers can exchange Update, Notification, and Keepalive messages.
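The state descriptions above can be condensed into a transition table keyed on the current state and the input event number from Table 34-5. The Python sketch below is a simplified model of that table, not the full FSM. Any event not listed returns to Idle, as the text describes for the Connect state. A repeated Start event is ignored outside Idle, which follows RFC 1771 and is an assumption here. The names are hypothetical.

```python
from enum import Enum

class State(Enum):
    IDLE = "Idle"
    CONNECT = "Connect"
    ACTIVE = "Active"
    OPEN_SENT = "OpenSent"
    OPEN_CONFIRM = "OpenConfirm"
    ESTABLISHED = "Established"

# (state, input event number from Table 34-5) -> next state
TRANSITIONS = {
    (State.IDLE, 1): State.CONNECT,             # BGP start
    (State.CONNECT, 3): State.OPEN_SENT,        # TCP up: send Open
    (State.CONNECT, 5): State.ACTIVE,           # TCP open failed
    (State.CONNECT, 7): State.CONNECT,          # ConnectRetry expired: retry
    (State.ACTIVE, 3): State.OPEN_SENT,         # TCP up: send Open
    (State.ACTIVE, 7): State.CONNECT,           # ConnectRetry expired
    (State.OPEN_SENT, 10): State.OPEN_CONFIRM,  # valid Open: send Keepalive
    (State.OPEN_CONFIRM, 9): State.OPEN_CONFIRM,   # send another Keepalive
    (State.OPEN_CONFIRM, 11): State.ESTABLISHED,   # Keepalive received
    (State.ESTABLISHED, 9): State.ESTABLISHED,     # send Keepalive
    (State.ESTABLISHED, 11): State.ESTABLISHED,
    (State.ESTABLISHED, 12): State.ESTABLISHED,
}

def next_state(state: State, event: int) -> State:
    if event == 1 and state is not State.IDLE:
        return state                      # Start is ignored outside Idle
    # Errors, Notification, Hold timer expiry and BGP stop fall back to Idle.
    return TRANSITIONS.get((state, event), State.IDLE)
```

The normal path is therefore Idle, then Connect (event 1), OpenSent (event 3), OpenConfirm (event 10), and Established (event 11).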

BGP Path Attributes

Path attributes are a major feature of BGP routes. They provide the information needed for basic routing and allow BGP to set and exchange routing policies.

A path attribute belongs to one of the following categories:

Well-Known Mandatory;

Well-Known Discretionary;

Optional Transitive;

Optional Non-Transitive.

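Well-known mandatory: all BGP update messages must contain the attribute, and all BGP implementations must recognize it.

Well-known discretionary: BGP update messages may or may not contain the attribute, but all BGP implementations must recognize it.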

Optional Transitive: a BGP implementation does not need to support the attribute, but it should accept paths that carry the attribute and advertise them to its peers.

Optional Non-Transitive: a BGP implementation does not need to support the attribute. If the attribute is not recognized, it is ignored and is not passed on to peers.

The meanings of the common path attributes are as follows:

ORIGIN: well-known mandatory; specifies the origin of the routing information in the update message (IGP, EGP, or INCOMPLETE);

AS_PATH: well-known mandatory; uses a sequence of AS numbers to describe the path through the ASs to the destination specified by the NLRI;

NEXT_HOP: well-known mandatory; gives the next-hop IP address of the path to the advertised destination;

MULTI_EXIT_DISC: optional non-transitive; allows an AS to tell a neighboring AS which of its entry points is preferred;

LOCAL_PREF: well-known discretionary; describes the preference level that a BGP device assigns to an advertised route;

ATOMIC_AGGREGATE: well-known discretionary; warns downstream devices that path information has been lost;

AGGREGATOR: optional transitive; indicates the AS number and IP address of the device that generated the aggregate route;

COMMUNITY: optional transitive; simplifies policy implementation;

ORIGINATOR_ID: optional non-transitive; lets the route originator detect and prevent routing loops by recognizing its own ID in the attribute;

CLUSTER_LIST: optional non-transitive; lets a route reflector detect and prevent routing loops by recognizing its own cluster ID in the attribute.

BGP Route Decision

BGP Route Decision Process

When multiple routes with the same prefix length to the same destination exist, BGP selects the best route according to the following rules:

1. Ignore routes whose next hop is unreachable;

2. Preferentially select the route with the maximum weight value;



3. Preferentially select the route with the maximum LOCAL_PREF value;

4. Preferentially select the route originated locally;

5. Preferentially select the route with the shortest AS_PATH;

6. Preferentially select the route with lowest ORIGIN attribute;

7. Preferentially select the route with the minimum MED value;

8. Preferentially select the route obtained through the EBGP, instead of

through IBGP;

9. Preferentially select the route whose next-hop has the minimum IGP

metric;

10. Preferentially select the first received EBGP route;

11. Preferentially select the route with the minimum BGP ROUTER-ID;

12. Preferentially select the route with shortest CLUSTER_LIST;

13. Preferentially select the route from the lowest neighbor address;

14. If BGP load balancing is enabled, rules 10 to 13 are skipped, and all routes with the same AS_PATH length and MED value are installed in the routing table.
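Rules 1 to 13 form a strict tie-breaking order, so they can be expressed as a single sort key in which each element is consulted only when all earlier elements tie. The Python sketch below does this with a hypothetical Route record. It makes three simplifications. It compares MED across all candidates exactly as the list is written, whereas many implementations compare MED only between routes from the same neighboring AS. It does not model rule 14 (load balancing). It assumes 100 as the default LOCAL_PREF.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address

ORIGIN_RANK = {"IGP": 0, "EGP": 1, "INCOMPLETE": 2}   # lower is preferred

@dataclass
class Route:
    neighbor: IPv4Address        # address of the peer the route came from
    router_id: IPv4Address       # BGP router ID of that peer
    next_hop_reachable: bool = True
    weight: int = 0
    local_pref: int = 100
    locally_originated: bool = False
    as_path: tuple = ()
    origin: str = "IGP"
    med: int = 0
    ebgp: bool = False
    igp_metric: int = 0          # IGP metric to the next hop
    received_order: int = 0      # smaller value = received earlier
    cluster_list: tuple = ()

def preference_key(r: Route):
    """Smaller key = better route; one element per rule 2-13."""
    return (
        -r.weight,                           # 2. highest weight
        -r.local_pref,                       # 3. highest LOCAL_PREF
        not r.locally_originated,            # 4. locally originated first
        len(r.as_path),                      # 5. shortest AS_PATH
        ORIGIN_RANK[r.origin],               # 6. lowest ORIGIN
        r.med,                               # 7. lowest MED
        not r.ebgp,                          # 8. EBGP before IBGP
        r.igp_metric,                        # 9. lowest IGP metric to next hop
        r.received_order if r.ebgp else 0,   # 10. first received EBGP route
        r.router_id,                         # 11. lowest router ID
        len(r.cluster_list),                 # 12. shortest CLUSTER_LIST
        r.neighbor,                          # 13. lowest neighbor address
    )

def best_route(routes):
    candidates = [r for r in routes if r.next_hop_reachable]   # rule 1
    return min(candidates, key=preference_key) if candidates else None
```

A route with a higher LOCAL_PREF wins even with a longer AS_PATH, because rule 3 is consulted before rule 5.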

Examples of LOCAL_PREF and MED Preferential Selection

Preferring the route with the higher LOCAL_PREF value



User AS100 obtains routes from both ISP1 and ISP2, but ISP1 is the preferred ISP. When the device connected to ISP1 advertises routes to switch-F, it sets a higher LOCAL_PREF value. For the same destination, the routes learned from ISP1 are then preferred because their LOCAL_PREF value is higher.


Preferring the route with the lower MED value

The user network is dual-homed to an ISP. The ISP should use LINK2 as the primary link and LINK1 as the backup. When the user advertises routes to the ISP, the update packets sent over LINK2 carry a lower MED value. If the routes received over the EBGP sessions on LINK2 and LINK1 do not differ in any attribute evaluated before MED, the route with the lower MED is preferred. As a result, traffic from the ISP enters the user network over LINK2.

Route Filtering

Route filtering allows a BGP speaker to decide which routes it sends to, and which routes it accepts from, any BGP peer. Route filtering is the means of defining route policy. The procedure is as follows:

1. Identify routes

2. Permit or deny routes

3. Manipulate attributes

Route filtering can be implemented with access lists, prefix lists, or AS path access lists. Route maps can also be used to implement both filtering and attribute manipulation.
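The three-step procedure (identify, permit or deny, manipulate attributes) maps naturally onto an ordered list of route-map-style clauses. The Python sketch below is a hypothetical model, not the switch's route-map syntax. It assumes first-match evaluation and an implicit deny when no clause matches, which is common route-map behavior but is not stated in this manual. LOCAL_PREF stands in for any attribute operation.

```python
from dataclasses import dataclass
from ipaddress import ip_network
from typing import Optional

@dataclass
class PolicyClause:
    """One hypothetical route-map-style clause."""
    match_prefixes: list                  # step 1: routes covered by these prefixes
    permit: bool                          # step 2: permit or deny matching routes
    set_local_pref: Optional[int] = None  # step 3: optional attribute operation

def apply_policy(clauses, prefix, attrs):
    """Return the (possibly modified) attributes, or None if the route is denied."""
    net = ip_network(prefix)
    for clause in clauses:                # evaluated in order; first match wins
        if any(net.subnet_of(ip_network(p)) for p in clause.match_prefixes):
            if not clause.permit:
                return None
            result = dict(attrs)
            if clause.set_local_pref is not None:
                result["local_pref"] = clause.set_local_pref
            return result
    return None                           # assumed implicit deny
```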



Route Reflector

A route reflector is the central routing device, or focal point, for internal BGP (IBGP) sessions. The peer routing devices of a route reflector are called route reflector clients. The clients peer with the route reflector and exchange routing information with it, and the route reflector then reflects that information to all other clients. This removes the requirement for a full IBGP mesh and considerably reduces the number of IBGP sessions, and with it the configuration and operating cost.

Route reflectors are recommended only in large IBGP full-mesh networks. A route reflector adds overhead on the reflector server, and an incorrect configuration can cause routing loops or unstable routes. Therefore, route reflectors are not recommended for every topology.
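The saving comes from the number of IBGP sessions. The comparison below is a small Python sketch. It assumes a single route reflector whose clients are all the other routers; production designs usually add redundant reflectors, which this ignores.

```python
def full_mesh_sessions(n: int) -> int:
    """IBGP sessions needed when n routers form a full mesh: n(n-1)/2."""
    return n * (n - 1) // 2

def single_reflector_sessions(n: int) -> int:
    """IBGP sessions when one of n routers is the reflector and the rest are clients."""
    return n - 1

for n in (4, 10, 50):
    print(f"{n} routers: full mesh {full_mesh_sessions(n)}, "
          f"one reflector {single_reflector_sessions(n)}")
```

For 50 routers this gives 1225 sessions for the full mesh versus 49 with one reflector.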

Alliance

An alliance (BGP confederation) is another method of coping with the rapid growth of the IBGP full mesh within an AS. Like route reflectors, alliances are recommended only in large IBGP full-mesh networks.

The alliance concept is based on dividing one AS into multiple sub-ASs. Within each sub-AS, all IBGP rules apply; for example, all BGP routing devices in a sub-AS must form a full mesh. Each sub-AS has a different AS number, so EBGP must run between the sub-ASs. Although EBGP is used between sub-ASs, route selection within the alliance is similar to IBGP route selection within a single AS: when a sub-AS border is crossed, the next-hop, MED, and LOCAL_PREF information is preserved. To the outside, an alliance looks like a single AS.

The drawback of an alliance is that migrating from a non-alliance design to an alliance requires the routing devices to be reconfigured and the logical topology to be changed. In addition, unless BGP policies are set manually, the best route through the alliance may not be selected.

Route Damping

Route damping (route attenuation) is a technique for controlling route instability. It significantly reduces the instability caused by route flapping.



Route damping classifies routes as well-behaved or ill-behaved. Well-behaved routes show high stability over the long term, whereas ill-behaved routes show instability over short periods. An ill-behaved route should be penalized in proportion to its expected future instability, and an unstable route should be suppressed until it becomes stable.

A route's recent history is the basis for estimating its future stability; in particular, this requires knowing how many times the route has flapped within a given period. With route damping, each time a route flaps it receives a penalty. When the accumulated penalty reaches a predefined limit, the route is suppressed. A suppressed route can continue to accumulate penalties. The more frequently a route flaps, the sooner it is suppressed.

Similar rules are used to un-suppress and re-advertise the route. An algorithm reduces the penalty exponentially over time, and the algorithm is configured with user-defined parameters.
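The penalty mechanism can be sketched in Python as a per-route counter. The counter grows by a fixed amount per flap, decays exponentially with a configurable half-life, and drives suppression against two thresholds. The parameter values (a 900-second half-life, 1000 per flap, suppression at 2000, reuse below 750) are illustrative assumptions, not values taken from this manual. The class name is hypothetical.

```python
import math

class RouteDamping:
    """Penalty bookkeeping for one route; parameter values are illustrative."""

    def __init__(self, half_life=900.0, flap_penalty=1000.0,
                 suppress_limit=2000.0, reuse_limit=750.0):
        self.half_life = half_life            # seconds for the penalty to halve
        self.flap_penalty = flap_penalty      # penalty added per flap
        self.suppress_limit = suppress_limit  # suppress at or above this value
        self.reuse_limit = reuse_limit        # re-advertise below this value
        self.penalty = 0.0
        self.last_update = 0.0
        self.suppressed = False

    def _decay(self, now):
        # Exponential decay: the penalty halves every half_life seconds.
        elapsed = now - self.last_update
        self.penalty *= math.pow(0.5, elapsed / self.half_life)
        self.last_update = now

    def flap(self, now):
        """Record one flap; a suppressed route keeps accumulating penalty."""
        self._decay(now)
        self.penalty += self.flap_penalty
        if self.penalty >= self.suppress_limit:
            self.suppressed = True

    def is_suppressed(self, now):
        """Decay to `now`, and release the route once below the reuse limit."""
        self._decay(now)
        if self.suppressed and self.penalty < self.reuse_limit:
            self.suppressed = False
        return self.suppressed
```

With these values, two flaps at the same moment reach the suppress limit. With no further flaps, the penalty halves to 1000 after 900 seconds, when the route is still suppressed. It reaches 500 after 1800 seconds, and the route is then released.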

BGP Graceful Restart

Principle of BGP Graceful Restart

When a routing device fails and restarts, its neighbors at the BGP routing layer detect the neighbor relationship going down and then up again; this is called BGP neighbor flapping. Neighbor flapping in turn causes route flapping. As a result, traffic is black-holed for a period after the routing device restarts, or the neighbors' data traffic bypasses the restarted routing device. Consequently, network reliability is reduced.

When a routing device fails, BGP graceful restart prevents route disturbance and speeds up route convergence, which helps ensure network reliability.

Process of BGP Graceful Restart

BGP graceful restart extends BGP in the following aspects:

1. A graceful restart capability is added to the BGP OPEN message. Its fields are as follows:

Restart-flag: indicates whether the sender has restarted; 1: Yes; 0: No.

AFI/SAFI: the address family that supports graceful restart;

Fwd-flag: 1 if the address family has the graceful restart capability and the sender requests that routes for the address family be retained; otherwise, 0;

2. An End-of-RIB (EOR) marker is sent in BGP update packets to indicate that the update is complete.

3. Three timers are added:

Restart-timer: started on the helper end; the maximum time to wait for the session to be re-established and enter the GR process;

Stale-path-timer: started on the helper end; the maximum time to retain stale routes;

Defer-timer: started on the restarter end; the maximum time to defer route calculation and advertisement.
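The capability fields in item 1 are packed into a short binary value. In RFC 4724 (capability code 64), the Restart-flag corresponds to the Restart State bit, which is the top bit of a 16-bit field whose low 12 bits carry the restart time in seconds. Each address family then follows as AFI (2 octets), SAFI (1 octet), and a flags octet whose top bit is the Forwarding State bit, which corresponds to the Fwd-flag. The Python sketch below encodes that layout. The function name is hypothetical.

```python
import struct

GR_CAPABILITY_CODE = 64   # Graceful Restart capability code (RFC 4724)

def encode_gr_capability(restart_flag: bool, restart_time: int, families):
    """families: iterable of (afi, safi, fwd_flag) tuples."""
    if not 0 <= restart_time <= 0xFFF:
        raise ValueError("restart time must fit in 12 bits")
    # Restart-flag is the top bit; restart time is the low 12 bits.
    value = struct.pack("!H", (0x8000 if restart_flag else 0) | restart_time)
    for afi, safi, fwd_flag in families:
        # Fwd-flag is the top bit of the per-family flags octet.
        value += struct.pack("!HBB", afi, safi, 0x80 if fwd_flag else 0)
    # Capability TLV: code, length, value
    return struct.pack("!BB", GR_CAPABILITY_CODE, len(value)) + value
```

For example, a restarting speaker with a 120-second restart time that preserved IPv4 unicast (AFI 1, SAFI 1) forwarding would call encode_gr_capability(True, 120, [(1, 1, True)]).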

Figure 34-33 Graceful restart flow

Restarter end (Switch-A):

1. When the neighbor relationship is first established, negotiate the GR capability through Open messages;

2. When a fault occurs, the forwarding layer of Switch-A retains the routes and continues forwarding traffic;

3. Re-establish the neighbor relationship by sending Open messages with the Restart-flag set to 1, which indicates that a restart has occurred, and notify the neighbors of the restart-time value and the address families whose routes are retained.

4. After the neighbor relationship is re-established, start the defer-timer and receive updates from the neighbors.

5. Defer route calculation until the EOR marker is received from the neighbor or the defer-timer expires.

6. Calculate routes, update the core routing table, and advertise the routes.

Helper end (Switch-B):

1. When the neighbor relationship is first established, negotiate the GR capability and record that the neighbor is GR-capable.

2. After the restarter end fails, go to step 3 if a TCP error is detected; otherwise, go to step 4.

3. Retain the routes and start the restart timer.

4. Re-establish the neighbor relationship, delete the restart timer if it exists, and start the stale-path timer.

5. If the restart timer expires before the neighbor relationship is re-established, or if the Fwd-flag for the address family in the received Open message is not 1, or if the Open message does not contain the address family, go to step 8.

6. Send routes to the restarting routing device, followed by the EOR marker.

7. If the stale-path timer expires before the EOR marker is received, go to step 8.

8. Delete the retained routes and resume normal BGP operation.
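On the helper end, steps 3 and 8 amount to keeping the neighbor's routes as stale entries and later purging them. The Python sketch below models that bookkeeping for one neighbor. The class and method names are hypothetical. On EOR or stale-path timer expiry, the sketch removes only the routes that were not re-advertised. That refinement follows RFC 4724 and is an assumption beyond what this manual states.

```python
class GrHelperRib:
    """Hypothetical helper-end table of routes learned from one GR-capable neighbor."""

    def __init__(self):
        self.routes = {}     # prefix -> attributes
        self.stale = set()   # prefixes retained across the neighbor's restart

    def neighbor_down(self):
        # Step 3: keep the routes (forwarding continues) but mark them stale.
        self.stale = set(self.routes)

    def update(self, prefix, attrs):
        # An update from the neighbor installs or refreshes a route;
        # a refreshed route is no longer stale.
        self.routes[prefix] = attrs
        self.stale.discard(prefix)

    def end_of_rib(self):
        # Step 8: delete retained routes that were not refreshed.
        for prefix in self.stale:
            self.routes.pop(prefix, None)
        self.stale.clear()

    # Step 7: stale-path timer expiry purges the same routes as EOR.
    stale_path_timer_expired = end_of_rib
```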



GVRP Technology

This chapter describes GVRP and GARP technology and their application.

Main contents:

GVRP overview and GARP principle

Implementation of GVRP

Typical Application

GVRP Overview and GARP Principle

This section describes the GVRP concept and the GARP principle.

Main contents:

GVRP overview

GARP principle

GVRP Overview

The Generic Attribute Registration Protocol (GARP) provides a generic mechanism for registering, de-registering, and propagating attributes. Different attribute types in GARP protocol packets support different upper-layer protocol applications.

GARP VLAN Registration Protocol (GVRP) is one application of GARP. It implements dynamic VLAN registration, de-registration, and attribute propagation. GARP distinguishes applications by the destination MAC address of the protocol packets; the destination MAC address of GVRP is 01-80-C2-00-00-21. GVRP can be configured only on ports in trunk mode.

GARP Principle

GARP Message

GARP members exchange information through three types of messages: Join, Leave, and LeaveAll.

Join Message

When a GARP application entity wants other entities to register its attribute information, it sends a Join message. It also sends a Join message when it receives a Join message from another entity, or when attributes are statically configured on it, and it needs other GARP application entities to register those attributes.

There are two types of Join message:

JoinEmpty: announces an attribute that is not registered.

JoinIn: announces an attribute that is registered.

Leave Message

When a GARP application entity wants other devices to de-register its attribute information, it sends a Leave message. It also sends a Leave message when it de-registers attributes after receiving a Leave message from another entity, or when attributes are statically de-registered on it.

There are two types of Leave message:

LeaveEmpty: de-registers an attribute that is not registered.

LeaveIn: de-registers an attribute that is registered.

LeaveAll Message

When each GARP application entity starts, its LeaveAll timer starts at the same time. When the timer expires, the entity sends a LeaveAll message, which de-registers all attributes so that the other GARP application entities re-register all their attribute information on the local entity.

Note

For the GARP protocol standard, refer to IEEE 802.1D.

GARP Timers

Join Timer

The Join timer controls the sending of Join messages (JoinIn and JoinEmpty). To make Join message delivery reliable, an entity waits one Join timer interval after sending the first Join message. If a JoinIn message is received within that interval, the second Join message is not sent; otherwise, the Join message is sent again.

Hold Timer

The Hold timer controls the sending of Join messages (JoinIn and JoinEmpty) and Leave messages (LeaveIn and LeaveEmpty).

When an attribute is configured on an application entity, or the entity receives a message, the entity does not send the message to other devices immediately. Instead, it waits one Hold timer interval and then sends the message. The device packs all messages generated within the Hold timer interval into as few packets as possible, which reduces the number of packets sent. The Hold timer value should be less than or equal to half of the Join timer value.

Leave Timer

The Leave timer starts when an application entity receives a Leave or LeaveAll message. If no Join message for the attribute is received before the Leave timer expires, the attribute is de-registered.

LeaveAll Timer

After each GARP application entity starts, its LeaveAll timer starts. When the timer expires, the entity sends a LeaveAll message and then restarts the LeaveAll timer to begin a new cycle.



GARP Packet Format

GARP Packet Format

The description of the GARP protocol fields

Protocol ID: protocol identifier; value 1

Message: each message consists of an attribute type and an attribute list

Attribute Type: type of the attribute; for GVRP the value is 1, indicating a VLAN ID

Attribute List: list of attributes

Attribute: each attribute consists of an attribute length, an attribute event, and an attribute value

Attribute Length: length of the attribute, including the length field; 2-255 bytes

Attribute Event: attribute event; 0: LeaveAll event, 1: JoinEmpty event, 2: JoinIn event, 3: LeaveEmpty event, 4: LeaveIn event, 5: Empty event

Attribute Value: attribute value; for GVRP, the VLAN ID (the value of a LeaveAll attribute is invalid)

End Mark: end flag; 0x00
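The field layout in the table can be illustrated by building a GVRP PDU byte by byte. This Python sketch covers only the GARP PDU: the Protocol ID, one message, its attributes, and the two end marks. It omits the LLC and Ethernet headers that carry the PDU to 01-80-C2-00-00-21. The function name is hypothetical. Encoding a LeaveAll attribute with length 2 and no value octets is an assumption, consistent with the table's note that its value is invalid.

```python
import struct

GVRP_ATTRIBUTE_TYPE = 1      # VLAN ID attribute
EVENTS = {"LeaveAll": 0, "JoinEmpty": 1, "JoinIn": 2,
          "LeaveEmpty": 3, "LeaveIn": 4, "Empty": 5}
END_MARK = 0x00

def build_gvrp_pdu(attributes):
    """attributes: list of (event_name, vlan_id), with vlan_id None for LeaveAll."""
    pdu = struct.pack("!H", 1)                 # Protocol ID = 1
    pdu += bytes([GVRP_ATTRIBUTE_TYPE])        # one message: attribute type
    for event, vid in attributes:
        if event == "LeaveAll":
            # Length 2: the length octet and the event octet, no value.
            pdu += bytes([2, EVENTS[event]])
        else:
            # Length 4: length octet + event octet + 2-octet VLAN ID.
            pdu += bytes([4, EVENTS[event]]) + struct.pack("!H", vid)
    pdu += bytes([END_MARK])                   # end of the attribute list
    pdu += bytes([END_MARK])                   # end of the message list
    return pdu
```

For example, a JoinIn for VLAN 10 encodes as 00 01 01 04 02 00 0A 00 00.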

Implementation of GVRP

GVRP is an application of GARP. Based on the GARP working mechanism, it maintains dynamic VLAN registration information and propagates it to other devices. A manually configured VLAN is called a static VLAN; a VLAN created through GVRP is called a dynamic VLAN.

After GVRP is enabled (globally and on the trunk port), the switch advertises the VLANs allowed on the trunk port to the connected network segment in GVRP packets. A switch on that segment that receives the GVRP packets registers or de-registers VLANs according to the parsed packet information. At the same time, it propagates the VLAN information through its active ports to their network segments. In this way, VLAN information spreads across the entire switched network. GVRP propagates VLAN information only through active ports, that is, ports in the forwarding state, and the port state is obtained from the MSTP module. If, after a message is received, the port is not in the FORWARDING state in the MSTP instance to which the message's VLAN is mapped, the message is dropped and not propagated.

GVRP has three registration modes, which handle static and dynamic VLANs differently. The three modes are defined as follows:

Normal Mode

The port can dynamically register and de-register VLANs, and it propagates information about both dynamic and static VLANs.



Fixed Mode

Dynamic VLAN registration and de-registration are forbidden on the port. Only static VLAN information is propagated; dynamic VLAN information is not. That is, on a trunk port set to Fixed mode, only manually configured VLANs can pass, even if all VLANs are allowed.

Forbidden Mode

Dynamic VLAN registration and de-registration are forbidden on the port, and information about VLANs other than VLAN 1 is not propagated. That is, on a trunk port set to Forbidden mode, only VLAN 1 can pass, even if all VLANs are allowed.

Note

For the GVRP protocol standard, refer to IEEE 802.1Q.
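The practical difference between the three registration modes is which VLANs a trunk port that allows all VLANs actually carries. The Python sketch below expresses the rules above. The mode constants and function names are hypothetical, not CLI keywords.

```python
NORMAL, FIXED, FORBIDDEN = "normal", "fixed", "forbidden"

def can_register_dynamically(mode: str) -> bool:
    # Only Normal mode allows dynamic VLAN registration and de-registration.
    return mode == NORMAL

def vlans_passed(mode: str, static_vlans: set, dynamic_vlans: set) -> set:
    """VLANs carried by a trunk port that allows all VLANs, per registration mode."""
    if mode == NORMAL:
        return static_vlans | dynamic_vlans   # static and dynamic VLANs
    if mode == FIXED:
        return set(static_vlans)              # manually configured VLANs only
    if mode == FORBIDDEN:
        return {1}                            # only VLAN 1
    raise ValueError(f"unknown mode: {mode}")
```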

Typical Application

With GVRP, you only need to configure VLANs on some devices (border devices); the VLAN configuration is then applied automatically across the switched network. This reduces the administrator's workload and the chance of configuration mistakes.

GVRP networking diagram



The preceding figure shows VLANs being created dynamically in the network. GVRP is enabled on every device and on the ports that connect the devices. These ports are configured as trunk ports that permit all VLANs. In this case, you only need to create VLANs 10-20 statically on switch A and switch G. The other devices learn the VLAN attributes dynamically and create VLANs 10-20 automatically.



Private VLAN Technology

This section describes Private VLAN technology and its application. The function applies only to MyPower 3400 and MyPower4100.

Related Terms of Private VLAN Protocol

Private VLAN (PVLAN): a private VLAN divides the L2 broadcast domain of one VLAN into multiple subdomains. Each subdomain consists of a pair of private VLANs: a primary VLAN and a secondary VLAN.

Primary VLAN: represents the PVLAN domain. All private VLANs in a PVLAN domain share one primary VLAN;

Secondary VLAN: there are two types of secondary VLAN: isolated VLAN and community VLAN;

Isolated VLAN: ports in an isolated VLAN cannot communicate with each other at L2. A PVLAN domain can contain at most one isolated VLAN;

Community VLAN: ports in a community VLAN can communicate with each other at L2, but not with ports in other community VLANs. A PVLAN domain can contain multiple community VLANs.

Promiscuous port: It belongs to the primary VLAN and can communicate

with any port in the PVLAN domain, including the Isolated ports and

Community ports of the secondary VLAN in one PVLAN domain.



Isolated port: It belongs to the Isolated VLAN and can only communicate

with the promiscuous port.

Community port: It belongs to the community VLAN. The community ports

in one community VLAN can communicate with each other and also can

communicate with the promiscuous ports, but cannot communicate with

the community ports in other community VLANs or the Isolated ports in

the Isolated VLAN.

Introduction to Private VLAN Protocol

In standard Ethernet, a VLAN is a broadcast domain, and users in the same VLAN can communicate at L2, which poses a serious security risk to the network. The traditional solution is to assign a separate VLAN to each user that needs to be isolated, which causes two kinds of problems. The first is resource consumption. There are only 4096 VLAN IDs, of which VLANs 1-4094 are usually configurable, and this limits the number of users a service provider can support. In addition, each VLAN usually corresponds to one subnet or a range of addresses, so assigning too many VLANs consumes too many IP addresses. The second is management. As described above, whenever users are added or deleted, VLANs and IP addresses must be configured, which makes network management difficult. In short, the traditional approach to L2 isolation causes problems of both resource consumption and management.

PVLAN (Private VLAN) is a technology for allocating and using VLAN resources in carrier networks. Its basic idea is to give VLANs two different kinds of attributes: Primary VLAN and Secondary VLAN. The Primary VLAN faces the carrier network, while the Secondary VLAN faces the connected user network. According to their L2 forwarding isolation rules, Secondary VLANs are divided into Isolated VLANs and Community VLANs. A port in a Secondary VLAN is called a host port, and according to the two types of Secondary VLAN, host ports are divided into Isolated Ports and Community Ports. A port in the Primary VLAN that faces the carrier network is called a promiscuous port.

The Primary VLAN and Secondary VLANs form one PVLAN domain. A PVLAN domain must contain exactly one Primary VLAN (so the Primary VLAN is used to represent the PVLAN domain), and can contain multiple Community VLANs and at most one Isolated VLAN. A promiscuous port belongs to all PVLANs in the PVLAN domain, while a host port belongs only to its own Secondary VLAN and the Primary VLAN.



In PVLAN domain, the host port of Isolated VLAN can only communicate

with the promiscuous port of primary VLAN, while the host ports in

Isolated VLAN cannot communicate with each other. The host port of

Community VLAN can communicate with the promiscuous port of primary

VLAN and the other host ports in the Community VLAN.

The Secondary VLAN of PVLAN domain is transparent for the L3 function,

that is, all L3 functions should be bound to the Primary VLAN. All ports of

the PVLAN share the same L3 interface.

To ensure normal packet forwarding in the PVLAN domain, make sure that all VLANs in the PVLAN are mapped to the same MSTP instance.

Typical Application of Private VLAN

The PVLAN networking is as follows:

The above figure shows one complete PVLAN domain. VLAN 2 is the Primary VLAN; VLAN 100 is the Isolated VLAN; VLAN 101 and VLAN 102 are Community VLANs. Port 0/0/7 is a promiscuous port; Port 0/0/1 and Port 0/0/2 are isolated ports; Port 0/0/3, Port 0/0/4, Port 0/0/5, and Port 0/0/6 are community ports.

Port 0/0/7 can communicate with Port 0/0/1-Port 0/0/6; Port 0/0/1 and

Port 0/0/2 can only communicate with Port 0/0/7; Port 0/0/3 and Port

0/0/4 can communicate with each other and with Port 0/0/7. Port 0/0/5

and Port 0/0/6 can communicate with each other and with Port 0/0/7.
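The communication rules in this example can be checked mechanically. The Python sketch below encodes the port roles and evaluates L2 reachability between two ports in the same PVLAN domain. The text does not say which community VLAN each community port belongs to. Placing ports 0/0/3 and 0/0/4 in VLAN 101, and ports 0/0/5 and 0/0/6 in VLAN 102, is an assumption consistent with the stated connectivity.

```python
# Port roles from the example; the community VLAN split is assumed.
PORTS = {
    "0/0/7": ("promiscuous", 2),
    "0/0/1": ("isolated", 100),
    "0/0/2": ("isolated", 100),
    "0/0/3": ("community", 101),
    "0/0/4": ("community", 101),
    "0/0/5": ("community", 102),
    "0/0/6": ("community", 102),
}

def can_communicate(a: str, b: str) -> bool:
    """L2 reachability between two different ports in the same PVLAN domain."""
    role_a, vlan_a = PORTS[a]
    role_b, vlan_b = PORTS[b]
    if "promiscuous" in (role_a, role_b):
        return True                 # promiscuous ports reach every port
    if role_a == role_b == "community":
        return vlan_a == vlan_b     # only within the same community VLAN
    return False                    # isolated ports reach only promiscuous ports
```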

For details about the PVLAN configuration, refer to PVLAN Configuration

Manual.



Voice VLAN Technology

This chapter describes the Voice VLAN protocol technology and application.

The function is only applicable to MyPower 3400 and MyPower4100.

Related Terms of Voice VLAN Protocol

Voice VLAN: the VLAN used to transmit VoIP data. The term also refers to the function, provided by MyPower 3400 and MyPower4100 series switches, that identifies VoIP data at the access layer and assigns it to this VLAN.

OUI address: the address range obtained by performing a bitwise AND of a MAC address and an address mask; it is used to identify packets sent by a manufacturer's VoIP devices.

Introduction to Voice VLAN

With the development of VoIP technology, IP phones and IADs (Integrated Access Devices) are being deployed more and more widely, especially in broadband residential areas. Such networks carry voice data and service data at the same time. During transmission, voice data should have higher priority than service data to reduce delay and packet loss.

The traditional way to raise the transmission priority of voice data is to use ACLs to identify voice data and QoS to guarantee its transmission quality. MyPower 3400 and MyPower 4100 series switches instead provide the Voice VLAN function, which simplifies configuration and makes the transmission policy for voice traffic easier to manage. The function identifies voice traffic by the source MAC address of the packet and sends it to a specified VLAN (the Voice VLAN).

MyPower 3400 and MyPower 4100 series switches match the source MAC address of each packet against the OUI addresses. A packet whose source MAC address matches an OUI address is regarded as a VoIP packet. By default, five OUI addresses are preconfigured in the system.

Table 1: Default OUI addresses preset on the switch

No.  OUI address      Manufacturer
1    0003-6b00-0000   Cisco phone
2    000f-e200-0000   H3C Aolynk phone
3    00d0-1e00-0000   Pingtel phone
4    00e0-7500-0000   Polycom phone
5    00e0-bb00-0000   3Com phone

When the source MAC address of a packet matches the OUI address of a VoIP device, the packet is regarded as VoIP data. Its priority is modified automatically, and it is forwarded in the corresponding Voice VLAN, which ensures call quality.
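The OUI check amounts to a bitwise AND of the source MAC with a mask, followed by a table lookup. The sketch below assumes a single 24-bit mask (ffff-ff00-0000), which is what the preset addresses in Table 1 suggest. The manual does not state the mask, and `parse_mac` and `match_oui` are hypothetical helpers, not switch commands.

```python
from typing import Dict, Optional


def parse_mac(text: str) -> int:
    """Convert a MAC written as xxxx-xxxx-xxxx (or with '.' or ':') to a 48-bit integer."""
    digits = text.replace("-", "").replace(".", "").replace(":", "")
    if len(digits) != 12:
        raise ValueError(f"not a MAC address: {text!r}")
    return int(digits, 16)


# Assumed mask: the first 24 bits, as the preset addresses suggest.
OUI_MASK = parse_mac("ffff-ff00-0000")

# Table 1 entries, stored in already-masked form.
DEFAULT_OUIS: Dict[int, str] = {
    parse_mac("0003-6b00-0000"): "Cisco phone",
    parse_mac("000f-e200-0000"): "H3C Aolynk phone",
    parse_mac("00d0-1e00-0000"): "Pingtel phone",
    parse_mac("00e0-7500-0000"): "Polycom phone",
    parse_mac("00e0-bb00-0000"): "3Com phone",
}


def match_oui(src_mac: str, ouis: Dict[int, str] = DEFAULT_OUIS,
              mask: int = OUI_MASK) -> Optional[str]:
    """Return the manufacturer if (source MAC AND mask) equals a configured OUI, else None."""
    return ouis.get(parse_mac(src_mac) & mask)
```

For example, `match_oui("0003-6b12-3456")` returns `"Cisco phone"`, while a PC's MAC that matches no entry returns `None` and keeps its normal VLAN.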

When configuring Voice VLAN on a port, you can choose one of the following two modes:

Auto mode: when a port in auto mode receives a VoIP packet, the switch automatically modifies the packet's priority and forwards the packet in the corresponding Voice VLAN. It also uses an aging mechanism to maintain Voice VLAN membership. If no more data is received from a MAC address before the aging time expires, that MAC address is automatically removed from the Voice VLAN.

Manual mode: you must use a command to configure the default VLAN ID (PVID) of the port as the Voice VLAN ID.

A port in auto mode processes only untagged voice traffic. The system learns the source MAC address from the untagged packets sent by the VoIP device and automatically adds the device's MAC address to the Voice VLAN. A MAC address that reaches the aging time without being refreshed by a packet matching an OUI address is automatically deleted from the Voice VLAN. To add the port to the Voice VLAN or remove it manually, use the corresponding command.
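The learn-and-age behavior of auto mode can be pictured as a table of last-seen times. This is a minimal sketch, not the switch's implementation. The manual gives no Voice VLAN aging value, so `aging_seconds` is a parameter. The `is_voip` flag stands for the result of the OUI check, and the class and method names are hypothetical.

```python
from typing import Dict, List


class VoiceVlanAutoMembers:
    """Sketch of auto-mode learning: MACs of VoIP devices join the Voice VLAN and age out."""

    def __init__(self, aging_seconds: float) -> None:
        self.aging_seconds = aging_seconds
        self._last_seen: Dict[str, float] = {}  # MAC -> time of last matching packet

    def on_untagged_packet(self, src_mac: str, is_voip: bool, now: float) -> None:
        # Only untagged packets whose source MAC matched an OUI refresh membership.
        if is_voip:
            self._last_seen[src_mac] = now

    def expire(self, now: float) -> List[str]:
        """Remove and return MACs that were not refreshed within the aging time."""
        expired = [mac for mac, seen in self._last_seen.items()
                   if now - seen >= self.aging_seconds]
        for mac in expired:
            del self._last_seen[mac]
        return expired

    def members(self) -> List[str]:
        return sorted(self._last_seen)
```

A phone that keeps sending traffic is refreshed on every packet and never ages out. A phone that goes silent leaves the Voice VLAN one aging period after its last packet.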

A port in manual mode processes voice traffic in the configured VLAN. You must use a command to add the port connected to the IP phone to the Voice VLAN directly.

The system assumes that tagged packets already carry a priority, so it does not modify the priority of tagged packets.



Ports Cooperating with IP Phones Sending Tagged Voice Traffic

To send tagged voice traffic, the IP phone needs to obtain the Voice VLAN information automatically or manually. Accordingly, each port type requires the corresponding configuration. This ensures that voice packets are transmitted normally in the Voice VLAN without affecting the forwarding of ordinary service packets.

An IP phone whose Voice VLAN is configured manually does not need to request an IP address in the default VLAN first. It always sends and receives voice traffic with the Voice VLAN tag. Likewise, an IP phone that is configured with both an IP address and a Voice VLAN initiates registration and communication with the voice gateway directly.

Therefore, when the IP phone knows its Voice VLAN information and sends tagged voice traffic, the switch port connected to it must meet the following conditions:

Table 2 The conditions for all types of ports to cooperate with the IP phone that automatically gets the Voice VLAN information

Access port
  Supported: No. The IP phone sends tagged data, so the port cannot be configured as an Access port.
  Voice VLAN work mode: -
  Condition: -

Trunk port
  Supported: Yes.
  Auto mode (tagged + untagged): Configure the default VLAN of the port and permit packets of the default VLAN to pass. The default VLAN cannot be the Voice VLAN (PVID != Voice VLAN; the allowed tagged VLAN list contains the PVID).
  Manual mode (tagged): As with auto mode, configure the port to permit packets of the default VLAN to pass (PVID != Voice VLAN; the allowed tagged VLAN list contains the Voice VLAN).

Hybrid port
  Supported: Yes.
  Auto mode (untagged + tagged): Configure the default VLAN of the port and permit packets of the default VLAN to pass untagged. The default VLAN cannot be the Voice VLAN (PVID != Voice VLAN; the PVID is in untagged mode).
  Manual mode (tagged): As with auto mode, configure the port to permit packets of the Voice VLAN to pass tagged (PVID != Voice VLAN; the tagged VLAN list contains the Voice VLAN).

Note: In the above conditions, if the Voice VLAN information of the IP phone is configured manually, the port may not need to permit packets of the default VLAN. This depends on whether an ordinary PC is also connected to the port, because the default VLAN mainly carries the PC's ordinary service packets. If no PC is connected, the port does not need to permit packets of the default VLAN.

Ports Cooperating with IP Phones Sending Untagged Voice Traffic

For the switch to receive untagged packets, you must configure the default VLAN of the receiving port and permit that default VLAN on the port. When the IP phone sends untagged voice traffic, the default VLAN of the port must be configured as the Voice VLAN so that the voice traffic is transmitted in the Voice VLAN. This is equivalent to adding the port to the Voice VLAN manually. Therefore, if the IP phone sends untagged voice traffic, the Voice VLAN work mode of the port can only be manual mode; it cannot be auto mode.



Table 3 The conditions for all types of ports to cooperate with the IP phone that sends the untagged voice flow in manual mode

Access port
  Supported: Yes.
  Condition: Configure the default VLAN as the Voice VLAN (PVID = Voice VLAN).

Trunk port
  Supported: Yes.
  Condition: The default VLAN of the port must be the Voice VLAN, and the port must permit that VLAN to pass (PVID = Voice VLAN).

Hybrid port
  Supported: Yes.
  Condition: The default VLAN of the port must be the Voice VLAN and must be in the untagged VLAN list that the port permits to pass (PVID = Voice VLAN; the untagged list contains the PVID).
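The conditions in Table 2 (tagged voice traffic) and Table 3 (untagged voice traffic) can be checked mechanically before a port is configured. The sketch below is a hypothetical pre-check, not a switch feature. The function name, the parameter names, and the wording of the messages are assumptions. For a trunk port, `allowed` stands for its allowed VLAN list; for a hybrid port, `untagged` and `tagged` stand for its two VLAN lists.

```python
from typing import AbstractSet, List

PORT_TYPES = ("access", "trunk", "hybrid")
MODES = ("auto", "manual")


def voice_vlan_port_problems(port_type: str, phone_sends_tagged: bool, mode: str,
                             pvid: int, voice_vlan: int,
                             allowed: AbstractSet[int] = frozenset(),
                             untagged: AbstractSet[int] = frozenset(),
                             tagged: AbstractSet[int] = frozenset()) -> List[str]:
    """Return the Table 2/Table 3 conditions a port violates; empty means none."""
    if port_type not in PORT_TYPES or mode not in MODES:
        raise ValueError("unknown port type or mode")
    problems: List[str] = []
    if phone_sends_tagged:  # Table 2
        if port_type == "access":
            return ["an access port cannot carry the phone's tagged voice traffic"]
        if pvid == voice_vlan:
            problems.append("PVID must differ from the Voice VLAN")
        if port_type == "trunk":
            needed = pvid if mode == "auto" else voice_vlan
            if needed not in allowed:
                problems.append(f"allowed VLAN list must contain VLAN {needed}")
        elif port_type == "hybrid":
            if mode == "auto" and pvid not in untagged:
                problems.append("PVID must be in the untagged VLAN list")
            if mode == "manual" and voice_vlan not in tagged:
                problems.append("Voice VLAN must be in the tagged VLAN list")
    else:  # Table 3
        if mode != "manual":
            return ["untagged voice traffic requires manual mode"]
        if pvid != voice_vlan:
            problems.append("PVID must equal the Voice VLAN")
        if port_type == "trunk" and voice_vlan not in allowed:
            problems.append("allowed VLAN list must contain the Voice VLAN")
        if port_type == "hybrid" and pvid not in untagged:
            problems.append("PVID must be in the untagged VLAN list")
    return problems
```

For example, a trunk port in manual mode with PVID 10, Voice VLAN 100, and only VLAN 10 allowed is reported as missing VLAN 100 from its allowed list.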

From the switch's point of view:

If Voice VLAN is enabled on a port in auto mode, the switch forwards the first untagged packet it receives in the PVID of the port. It forwards later packets according to whether their source MAC address matches. If a tagged packet whose tag is the Voice VLAN is received, it is forwarded in the Voice VLAN.

If Voice VLAN is enabled on a port in manual mode, the switch forwards untagged packets in the PVID of the port (PVID = Voice VLAN). If a tagged packet whose tag is permitted on the port is received, it is forwarded in the tagged VLAN.

For a port in manual mode whose default VLAN is the Voice VLAN, any untagged packet is transmitted in the Voice VLAN without an OUI check.
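The ingress decisions above can be summarized in one function. This is a sketch under stated assumptions, not Maipu's forwarding logic. `learned_voice_macs` stands for the MACs that auto mode has already matched against the OUI list. The text does not describe what auto mode does with tagged packets for VLANs other than the Voice VLAN; the sketch assumes they follow the port's allowed list. All names are hypothetical.

```python
from typing import AbstractSet, Optional


def ingress_vlan(mode: str, pvid: int, voice_vlan: int, allowed: AbstractSet[int],
                 tag: Optional[int], src_mac: str,
                 learned_voice_macs: AbstractSet[str]) -> Optional[int]:
    """Return the VLAN a received frame is forwarded in, or None if it is dropped.

    tag is None for an untagged frame.
    """
    if mode not in ("auto", "manual"):
        raise ValueError(f"unknown mode: {mode!r}")
    if tag is not None:
        if mode == "auto" and tag == voice_vlan:
            return voice_vlan  # tagged with the Voice VLAN: forward in the Voice VLAN
        # Assumption: other tagged frames follow the port's allowed VLAN list.
        return tag if tag in allowed else None
    if mode == "manual":
        return pvid  # PVID equals the Voice VLAN; untagged frames skip the OUI check
    # Auto mode: a MAC not yet learned (for example, the first packet) uses the PVID.
    return voice_vlan if src_mac in learned_voice_macs else pvid
```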

Precautions

Voice VLAN is subject to the following restrictions:

VLAN 1, super VLANs, private VLANs, and QinQ VLANs cannot be configured as the Voice VLAN. The implementation performs mutual-exclusion checks between these features.

Voice VLAN supports aggregation ports.

By default, the manufacturers' OUI information is loaded. When Voice VLAN starts, this OUI information is written into the ACL, and the user cannot delete it.

Voice VLAN and MAC VLAN both use the MAC VLAN hardware resources. When both Voice VLAN and MAC VLAN are configured for the same MAC address, only the Voice VLAN configuration takes effect.

Typical Application of Voice VLAN

Auto mode suits networks (as shown in Figure 1) where a PC and an IP phone are connected to the switch in series, so that the port carries both voice data and ordinary service data. When the user makes a call, the port transmits the voice data first. When there is no voice traffic, the port can devote its full bandwidth to ordinary service data.

Figure 1 Network diagram when the host and IP phone are connected to the switch in series

Manual mode suits networks (as shown in Figure 2) where the IP phone is connected to the switch on its own, so that the port carries only voice packets. Because the port is added to the Voice VLAN statically, it can be dedicated to voice data. This minimizes the impact of service data on voice transmission.

Figure 2 Network diagram when the IP phone is connected to the switch separately



Neighbor Discovery Technology

This chapter describes neighbor discovery technology and its application.

Main contents:

NDSP and relevant terms

Introduction to NDSP

Typical Application

NDSP and Relevant Terms

Neighbor: a device directly connected to the local device is a neighbor of the local device.

Hello packets: the packets used to maintain neighbor relationships. Each hello packet encapsulates information about its sender, which receivers learn and use to update their neighbor information.

Aging time: if the local device receives no hello packets from a neighbor within the aging time, the neighbor is considered to no longer exist and is deleted from the neighbor list.

Introduction to NDSP

NDSP is used to discover directly connected Maipu devices. It maintains neighbor relationships through hello messages (NDSP packets) sent periodically between two directly connected devices. By default, each Maipu device sends an NDSP packet to the connected peer every 60 seconds. If no NDSP packet is received from the peer within three hello periods (180 seconds, the holdtime or TTL), the local device deletes the peer from its NDSP neighbor table.
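The hello/holdtime mechanism described above can be sketched as a neighbor table keyed by last-hello time. This illustrates the general technique using the stated defaults (60 s hello, 180 s holdtime). It is not the NDSP implementation, and the class name and neighbor IDs are hypothetical.

```python
from typing import Dict, List

HELLO_INTERVAL_S = 60                      # default NDSP hello period
DEFAULT_HOLDTIME_S = 3 * HELLO_INTERVAL_S  # 180 s: three missed hello periods


class NdspNeighborTable:
    """Sketch of hello-based neighbor aging (not Maipu's implementation)."""

    def __init__(self, holdtime_s: float = DEFAULT_HOLDTIME_S) -> None:
        self.holdtime_s = holdtime_s
        self._last_hello: Dict[str, float] = {}  # neighbor ID -> time of last hello

    def on_hello(self, neighbor_id: str, now: float) -> None:
        # Learning or refreshing a neighbor restarts its holdtime.
        self._last_hello[neighbor_id] = now

    def age_out(self, now: float) -> List[str]:
        """Delete and return neighbors that have been silent for at least the holdtime."""
        gone = [n for n, t in self._last_hello.items() if now - t >= self.holdtime_s]
        for n in gone:
            del self._last_hello[n]
        return gone

    def neighbors(self) -> List[str]:
        return sorted(self._last_hello)
```

With the defaults, a neighbor whose last hello arrived at t = 60 s is still present at t = 200 s and is deleted at t = 240 s.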

Typical Application

Illustration

As shown in the preceding figure, the two switches are connected through port 0/0/0.

Configuration of Switch A:

Command                                  Description
SwitchA(config)#ndsp run                 Enable NDSP globally
SwitchA(config)#ndsp timer 30            Send NDSP hello packets at an interval of 30 seconds
SwitchA(config)#ndsp holdtime 150        Set the aging time of NDSP neighbors to 150 seconds
SwitchA(config)#port 0/0/0               Enter port configuration mode
SwitchA(config-port-0/0/0)#ndsp enable   Enable NDSP on the port

Configuration of Switch B:

Command                                  Description
SwitchB(config)#ndsp run                 Enable NDSP globally
SwitchB(config)#ndsp timer 35            Send NDSP hello packets at an interval of 35 seconds
SwitchB(config)#ndsp holdtime 160        Set the aging time of NDSP neighbors to 160 seconds
SwitchB(config)#port 0/0/0               Enter port configuration mode
SwitchB(config-port-0/0/0)#ndsp enable   Enable NDSP on the port



MFF Technology

This chapter describes MFF technology and its application.

Main contents:

MFF technology

Typical application

MFF Technology

In traditional Ethernet networking, L2 isolation with L3 interworking between client hosts is achieved by dividing VLANs on the switch. When many users need L2 isolation, however, this consumes a large number of VLANs. In addition, L3 interworking between clients requires a separate IP subnet for each VLAN and an IP address on each VLAN interface. Dividing too many VLANs therefore reduces IP address allocation efficiency.

To address this, MAC-Forced-Forwarding (MFF) provides a solution for L2 isolation and L3 interworking between client hosts on the same LAN.

MFF intercepts users' ARP requests and, through an ARP proxy-reply mechanism, answers them with ARP replies carrying the gateway's MAC address. This forces users to send all traffic, including traffic within the same subnet, to the gateway. The gateway can then monitor the data flows, which prevents malicious attacks between users and secures the network deployment.



MFF Terms

Related terms:

AN (access node): the access node of the user terminal; usually the user's access switch.

AR (access router): the access router of the user terminal, or a switch with L3 functions; usually the gateway of the subnet where the user is located.

AS (access server): a server that provides a specific service.

User port: a port directly connected to a network terminal user.

Network port: a port connected to another network device, such as an access switch, aggregation switch, or gateway.

MFF principle:

MFF works in the following three aspects:

Obtain the IP address and MAC address of the AR. In a DHCP environment, the IP address of the AR is obtained through DHCP snooping and its MAC address through ARP. In a static IP address environment, you must preconfigure the default IP address of the AR; its MAC address is then obtained through ARP.

Intercept users' ARP requests and reply with the MAC address of the AR. As a result, the requesting host's ARP entries for all other hosts map to the MAC address of the AR. When the AR sends an ARP request for a user host, reply to the AR with the MAC address of that user host.

Filter upstream packets and drop all unicast packets except those whose destination MAC address is the AR's. Because of viruses or other network faults, unicast packets destined for the MAC addresses of other hosts may still be received, so these packets must be dropped.

MFF port features:

A VLAN with MFF enabled has two port roles: user port and network port. Both roles restrict ingress packets only.

1. A user port (a port connected to a user terminal device) processes packets as follows:

Permit multicast packets and DHCP packets to pass.

Send ARP packets to the CPU for processing.

When the MAC address of the AR has been learned, permit unicast packets whose destination MAC address is the AR's and drop other unicast packets. When the MAC address of the AR has not been learned, drop unicast packets whose destination MAC address is the AR's.

Drop all other packets.

2. A network port (a port on the AN connected to other network devices) processes packets as follows:

Permit multicast packets and DHCP packets to pass.

Send ARP packets to the CPU for processing.

Permit unicast packets to pass.

Drop all other packets.

In a VLAN with MFF enabled, all ports are user ports by default, and network ports must be designated with a command. The packet restrictions of user ports and network ports apply only in VLANs with MFF enabled. In VLANs without MFF, user ports and network ports do not have the above behavior.
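The per-role ingress rules can be captured in a decision function. This is a simplified sketch, not switch code. `pkt_kind` is a hypothetical, already-computed classification; a real switch derives it from the frame headers. Also hypothetical are the return values "forward", "to_cpu", and "drop", and the use of `None` for an AR MAC that has not been learned yet.

```python
from typing import Optional


def mff_ingress_action(port_role: str, vlan_mff_enabled: bool, pkt_kind: str,
                       dst_mac: str, ar_mac: Optional[str]) -> str:
    """Return "forward", "to_cpu" or "drop" for a packet entering an MFF port.

    pkt_kind is a simplified classification: "multicast", "dhcp", "arp",
    "unicast", or anything else.
    """
    if port_role not in ("user", "network"):
        raise ValueError(f"unknown port role: {port_role!r}")
    if not vlan_mff_enabled:
        return "forward"  # the port roles restrict packets only in MFF VLANs
    if pkt_kind in ("multicast", "dhcp"):
        return "forward"
    if pkt_kind == "arp":
        return "to_cpu"
    if pkt_kind == "unicast":
        if port_role == "network":
            return "forward"
        # User port: only unicast addressed to the learned AR MAC passes.
        if ar_mac is not None and dst_mac.lower() == ar_mac.lower():
            return "forward"
        return "drop"
    return "drop"  # all other packets
```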

MFF gateway detection:

To obtain the ARP information of the gateway and ensure that the gateway is available, gateway detection is enabled by default after MFF is enabled in a VLAN. You can disable gateway detection with a command. Gateway detection relies on users' ARP information: when a user connects, MFF intercepts the user's ARP packet and uses its ARP information to probe the gateway. If the gateway is unavailable, the detection interval is 5s; if the gateway is available, the detection interval is 30s by default. You can configure the detection interval, but the configured interval takes effect only while the gateway is available. While the gateway is unavailable, the interval is fixed at 5s.
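The interval rule can be stated compactly: the user-configured interval applies only while the gateway is reachable. The sketch below encodes the figures from the paragraph above (5s fixed, 30s default). The function and constant names are hypothetical.

```python
from typing import Optional

UNAVAILABLE_PROBE_INTERVAL_S = 5          # fixed while the gateway is unavailable
DEFAULT_AVAILABLE_PROBE_INTERVAL_S = 30   # default while the gateway is available


def next_probe_interval(gateway_available: bool,
                        configured_interval_s: Optional[int] = None) -> int:
    """Seconds until the next gateway probe, following the rules described above."""
    if not gateway_available:
        return UNAVAILABLE_PROBE_INTERVAL_S  # any configured value is ignored here
    if configured_interval_s is not None:
        return configured_interval_s
    return DEFAULT_AVAILABLE_PROBE_INTERVAL_S
```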

User ARP aging

After MFF learns the ARP information of a connected user from a user port, user ARP aging is enabled by default; you can disable it with a command. The default aging interval is 300s and can be configured. If no ARP packet is received from the user for four consecutive aging intervals, the user is considered to no longer exist and its ARP information is deleted.
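With the default interval, a user is removed after 4 × 300s = 1200s without ARP traffic. The sketch below expresses that check. It assumes that "four consecutive aging intervals" means the elapsed time since the user's last ARP packet reaches four times the interval. The function name is hypothetical.

```python
DEFAULT_ARP_AGING_INTERVAL_S = 300
MISSED_INTERVALS_BEFORE_DELETE = 4


def user_arp_expired(last_arp_s: float, now_s: float,
                     aging_interval_s: float = DEFAULT_ARP_AGING_INTERVAL_S,
                     aging_enabled: bool = True) -> bool:
    """True once a user has sent no ARP for four consecutive aging intervals."""
    if not aging_enabled:
        return False  # aging was disabled with the command
    return now_s - last_arp_s >= MISSED_INTERVALS_BEFORE_DELETE * aging_interval_s
```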



Typical Application

Figure 40-1 MFF typical application example

As shown in the figure, switch A and switch B are the access devices of the user terminals, and switch C is the aggregation device.

Gateway: 10.1.1.254 (MAC 0001.7a4c.a945); server: 10.1.1.253

Host A, Host B, and Host C are user hosts that all belong to VLAN 10. Their IP addresses are 10.1.1.1, 10.1.1.2, and 10.1.1.3 respectively. MFF is enabled on the user access devices switch A and switch B.

When Host A wants to communicate with Host B, it sends an ARP request for the MAC address of Host B. Switch A intercepts the request and replies with the MAC address of the gateway, so Host A mistakenly takes the gateway's MAC address to be Host B's and sends its data to the gateway. On receiving the data, the gateway finds that the destination IP address belongs to Host B, looks up the route, and forwards the data to Host B. Similarly, data sent from Host B to Host A is forwarded through the gateway.
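The proxy ARP reply that switch A gives in this example can be sketched as follows. The gateway addresses come from the example. The host MACs and the `mff_arp_reply` helper are hypothetical, and the user IP-to-MAC table stands for what MFF learns from users' ARP packets. This is an illustration of the behavior, not the switch's code.

```python
from dataclasses import dataclass
from typing import Dict, Optional

GATEWAY_IP = "10.1.1.254"
GATEWAY_MAC = "0001.7a4c.a945"


@dataclass(frozen=True)
class ArpRequest:
    sender_ip: str
    sender_mac: str
    target_ip: str


def mff_arp_reply(req: ArpRequest, user_macs: Dict[str, str],
                  gateway_mac: Optional[str] = GATEWAY_MAC) -> Optional[str]:
    """Return the MAC the access switch puts in its proxy ARP reply (None: no reply)."""
    if req.sender_ip == GATEWAY_IP:
        # The gateway asks for a user host: answer with that host's real MAC.
        return user_macs.get(req.target_ip)
    # A user asks for any other address: answer with the gateway MAC,
    # which forces even same-subnet traffic through the gateway.
    return gateway_mac


# Hypothetical host MACs learned by MFF.
users = {"10.1.1.1": "0000.0000.000a", "10.1.1.2": "0000.0000.000b"}
host_a_asks_b = ArpRequest("10.1.1.1", "0000.0000.000a", "10.1.1.2")
gateway_asks_b = ArpRequest(GATEWAY_IP, GATEWAY_MAC, "10.1.1.2")
```

Here `mff_arp_reply(host_a_asks_b, users)` yields the gateway MAC, while `mff_arp_reply(gateway_asks_b, users)` yields Host B's own MAC.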

The data forwarding path is as follows:



Figure 40-2 MFF data forwarding path

Switch A configuration:

Command                                                       Description
SwitchA(config)#port 0/1-0/2                                  Enter port mode
SwitchA(config-port-range)#port access vlan 10                Add ports 0/1 and 0/2 to VLAN 10
SwitchA(config-port-range)#port 0/3                           Enter port 0/3
SwitchA(config-port-0/3)#port mode trunk                      Set port 0/3 as a trunk port
SwitchA(config-port-0/3)#port trunk allowed vlan 10           Permit VLAN 10 on port 0/3
SwitchA(config-port-0/3)#mac-forced-forwarding network-port   Set port 0/3 as a network port
SwitchA(config-port-0/3)#exit                                 Exit port mode
SwitchA(config)#vlan 10                                       Enter VLAN mode
SwitchA(config-vlan10)#mac-forced-forwarding default-gateway 10.1.1.254   Set the default gateway of VLAN 10 to 10.1.1.254

Switch B configuration:

Command                                                       Description
SwitchB(config)#port 0/1                                      Enter port mode
SwitchB(config-port-0/1)#port access vlan 10                  Add port 0/1 to VLAN 10
SwitchB(config-port-0/1)#port 0/2                             Enter port 0/2
SwitchB(config-port-0/2)#port mode trunk                      Set port 0/2 as a trunk port
SwitchB(config-port-0/2)#port trunk allowed vlan 10           Permit VLAN 10 on port 0/2
SwitchB(config-port-0/2)#mac-forced-forwarding network-port   Set port 0/2 as a network port
SwitchB(config-port-0/2)#exit                                 Exit port mode
SwitchB(config)#vlan 10                                       Enter VLAN mode
SwitchB(config-vlan10)#mac-forced-forwarding default-gateway 10.1.1.254   Set the default gateway of VLAN 10 to 10.1.1.254



PPPoE+ Technology

This chapter describes the principle and application of PPPoE+ technology.

Main contents:

PPPoE+ principle

PPPoE+ typical application

PPPoE+ Principle

With the spread of IP-based network construction and increasingly rich user service types, carriers need stronger control over user service data. Currently, IP DSLAMs serve as the main DSL access devices. The upstream BAS cannot obtain, or has difficulty obtaining, user port information from Ethernet packets. It therefore cannot authenticate and manage user ports in a unified manner or effectively prevent user accounts from being misappropriated.

PPPoE+ is short for PPPoE Intermediate Agent. The scheme was first proposed in the DSL Forum and is defined with reference to the user line ID field in RFC 3046. In the original PPPoE+ scheme, after receiving a user's PPPoE PADI and PADR packets, the DSLAM adds to each packet a PPPoE+ tag that indicates the user's physical port number or PVC. After identifying the PPPoE+ tag, the upstream BRAS extracts the user's physical location information. It then passes this information to the RADIUS server in the RADIUS NAS-Port-ID attribute for user identification and user management.



Figure 41-1

As shown in the above figure, the PPPoE+ flow is as follows:

1. The user terminal initiates a PPPoE request and sends a PPPoE PADI packet.

2. The DSLAM captures the PADI packet and passes it to the PPPoE Intermediate Agent for processing.

3. The PPPoE Intermediate Agent writes the user's physical location information into the PADI packet as a VSA (Vendor Specific Attribute). This VSA is the PPPoE+ tag.

4. After receiving the PADI+VSA, the BRAS replies to the user with a PADO packet.

5. The terminal sends a PADR packet to request access, following the normal flow.

6. The DSLAM captures the PADR packet and inserts the PPPoE+ tag into it.

7. After receiving the PADR+VSA, the BRAS assigns a PPP session ID to the STB and binds the PPPoE+ tag to the PPP session ID.

8. The BRAS then processes the PPP flow normally. After the PPP flow completes, the BRAS sends the PPPoE+ tag to the IPTV service system and the RADIUS server through the RADIUS NAS-Port-ID attribute.
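Steps 3 and 6 insert a tag into a PPPoE discovery packet. The manual does not give the exact encoding Maipu uses. The sketch below assumes the layout commonly used for PPPoE Intermediate Agent in DSL Forum TR-101: an RFC 2516 Vendor-Specific tag (type 0x0105) carrying vendor ID 3561, an Agent Circuit ID sub-option (0x01), and an optional Agent Remote ID sub-option (0x02). The circuit-ID string format (for example "eth 0/0/1") and the helper names are hypothetical.

```python
import struct
from typing import Optional

PADI, PADR = 0x09, 0x19          # PPPoE discovery codes (RFC 2516)
TAG_VENDOR_SPECIFIC = 0x0105     # RFC 2516 Vendor-Specific tag type
BBF_VENDOR_ID = 3561             # vendor ID used by TR-101 (assumed for this switch)
SUBOPT_CIRCUIT_ID = 0x01         # Agent Circuit ID
SUBOPT_REMOTE_ID = 0x02          # Agent Remote ID


def build_pppoe_plus_tag(circuit_id: str, remote_id: Optional[str] = None) -> bytes:
    """Encode the user's location as a Vendor-Specific tag (TR-101 style layout)."""
    value = struct.pack("!I", BBF_VENDOR_ID)
    for subtype, text in ((SUBOPT_CIRCUIT_ID, circuit_id), (SUBOPT_REMOTE_ID, remote_id)):
        if text is None:
            continue
        data = text.encode("ascii")
        if len(data) > 255:
            raise ValueError("sub-option too long for its 1-byte length field")
        value += struct.pack("!BB", subtype, len(data)) + data
    # Tag header: 2-byte type, 2-byte length of the value that follows.
    return struct.pack("!HH", TAG_VENDOR_SPECIFIC, len(value)) + value


def insert_tag(discovery: bytes, tag: bytes) -> bytes:
    """Append a tag to a PADI/PADR payload (6-byte PPPoE header + tags) and fix the length."""
    ver_type, code, session_id, length = struct.unpack("!BBHH", discovery[:6])
    if code not in (PADI, PADR):
        return discovery  # only PADI and PADR are modified in the flow above
    tags = discovery[6:6 + length]
    header = struct.pack("!BBHH", ver_type, code, session_id, length + len(tag))
    return header + tags + tag
```

A production agent would also need to keep the frame within the Ethernet MTU and handle a tag already inserted upstream; the sketch ignores both.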



PPPoE+ Typical Application

Figure 41-2

The above figure shows a typical PPPoE+ application environment. PC A and PC B initiate PPPoE connection requests to router A through switch A and switch B. With PPPoE+ enabled on the access ports of switch A and switch B, the RADIUS server records the access information of PC A and PC B. Suppose that, after PC A and PC B have connected successfully, a PC changes its port or the switch is reconnected. The RADIUS server can then detect the change in access location and handle it according to the user configuration, thereby controlling user access.
