NAT44 at NTNU – Implementing CGN for Eduroam
Kyrre Liaaen
NTNU
Agenda
• What
• Why
• How
• A word on performance
What
• Centralised NAT scaled to serve more than 20k users
• Implemented on university Internet Edge
– 2 routers
– 2 x 10Gbps bonded uplinks each
– Redundancy and stateless failover
• NAT44
– Native IPv6
– RFC1918 in client networks
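To make "NAT44" concrete: RFC1918 source addresses and ports are rewritten to a shared public IPv4 address before traffic leaves the edge. A toy port-block sketch in Python (purely illustrative — the real CGN runs on dedicated hardware, and all names and addresses here are made up; 203.0.113.1 is a documentation address):

```python
import ipaddress

class Nat44Table:
    """Toy port-block NAT44: maps (inside IP, inside port) to a
    (public IP, public port) pair. Illustrative only."""

    def __init__(self, public_ip, port_base=1024, ports_per_client=1024):
        self.public_ip = str(ipaddress.ip_address(public_ip))
        self.port_base = port_base
        self.ports_per_client = ports_per_client
        self.blocks = {}  # inside IP -> first public port of its block

    def translate(self, inside_ip, inside_port):
        # Each inside client gets a fixed block of public ports.
        if inside_ip not in self.blocks:
            self.blocks[inside_ip] = (self.port_base
                                      + len(self.blocks) * self.ports_per_client)
        public_port = self.blocks[inside_ip] + inside_port % self.ports_per_client
        return self.public_ip, public_port

nat = Nat44Table("203.0.113.1")          # documentation address, not NTNU's
print(nat.translate("10.0.0.5", 51000))  # client 1 maps into block 1024-2047
```

A fixed block per client matches the «1024 ports per inside client» allocation described later in the deck.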
Why
• IPv4 scarcity
  – Initially generous campus allocations (/23 for everyone!)
  – Smaller allocations after «Peak IPv4»
    • /24 for everyone!
    • /25 for everyone!
    • What could possibly go wrong?
  – Cost outweighs benefit when considering reclaiming allocated space
    • Coordinating with the IPv4 range «owner» and effectuating changes is non-trivial
    • Reclaiming a /24 gains us … a /24. We need at least multiples of /22.
  – Lack of public IPv4 addresses impacts service delivery
    • New services are put on hold
• The wireless monster needs feeding
  – More users, more devices per user
  – Merger with other University Colleges brings more users
    » The University Colleges are also short of IPv4. Go figure…
Why (II)
• IPv6 «any time now»…
– No IPv6 in current WLAN due to legacy technology (WiSM…)
– IPv6 often disabled when troubleshooting issues
– Users still heavily reliant on IPv4-only resources
• Library journals, etc.
– Many services managed by University IT still lack IPv6 presence
• Woe is me
– Still seems to be a low priority for equipment vendors
• Bugs and resource shortages
• Long turnover (WiSM…)
How – Design process
• Philosophical
– NAT for WLAN only, guest only, area only, or entire campus?
• Technical
– Distributed vs. centralised – we have 10 campus routers
– Scaling the solution to serve the required user mass
– Managing one vs. managing ALL THE THINGS
– Cost of purchasing several smaller units vs. fewer large ones
• And then upgrading when a smaller unit is too small
• Practical
– RFC1918 already in use for internal only networks
• Lab equipment etc.
• Sudden surprise access to Windows Update was… undesired
How – Design Process (II)
• We selected a centralised model for service deployment
  – Scaled to meet expected need «until we turn off IPv4»
• Service made available for all devices in the network, regardless of type or usage
  – Key phrase: «prepared for». We don’t know what the users are going to need 5 years down the line
• Redundancy and failover
  – Stateless is less complex
  – Only Internet service «consumers» are NAT’ed, and they can handle changes in public IP
  – NAT engine decoupled from base routing – a NAT failure on one router makes all RFC1918 traffic converge on the other, while normal service is unaffected
How - Hardware
• Considered ASA 55XX or ASR 1k
  – Prohibitively expensive in a distributed model, and unsuitable for large-scale NAT in a centralised model
• ASR 9006 with VSM-500 selected
  – Large chassis for future expansion
  – VSM-500 – Virtualised Services Module
    • A server crammed into router blade form factor
      – Many RAMs and Cores
    • CGN service via .OVA image from Cisco
    • Can run the vDDoS service, and maybe more in the future
    • Performance listed at 40Gbps per VSM
      – Multiple VSMs per chassis possible
  – Implemented as Internet Edge
    • Links to other orgs terminated here
How - Logging
• NetFlow v9 or Syslog supported
– NetFlow more compact but requires collector
– Syslog «easier» and simpler
• We opted for NetFlow
– Incentive to migrate from old v5 collector
• We now have NetFlow for IPv6
• NetFlow export from the VSM is separate from regular traffic export
– Script developed to search for records to track abuse and reported AUP violations
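The search script itself isn't shown in the slides. As a hedged sketch, assuming the NetFlow v9 NAT records have already been decoded into dicts by a collector (field names like post_nat_src_ip are hypothetical stand-ins), answering an abuse report boils down to matching public IP, public port, and time window:

```python
from datetime import datetime

def find_inside_client(records, public_ip, public_port, when):
    """Return (inside IP, inside port) pairs that held the given public
    (IP, port) at time `when`. Field names are hypothetical."""
    hits = []
    for r in records:
        if (r["post_nat_src_ip"] == public_ip
                and int(r["post_nat_src_port"]) == public_port
                and r["start"] <= when <= r["end"]):
            hits.append((r["src_ip"], int(r["src_port"])))
    return hits

# One decoded translation record (made-up values)
records = [{
    "src_ip": "10.0.0.5", "src_port": "51000",
    "post_nat_src_ip": "203.0.113.1", "post_nat_src_port": "1848",
    "start": datetime(2016, 5, 1, 12, 0), "end": datetime(2016, 5, 1, 12, 30),
}]
print(find_inside_client(records, "203.0.113.1", 1848,
                         datetime(2016, 5, 1, 12, 15)))
```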
How – Implementation
• Resource allocations
  – One /22 pool of public IPv4 addresses per border router
  – 1024 ports per inside client
  – Approx. 60k NAT’ed clients per pool
    • More NAT pools can be added if this is insufficient
    • Jury is still out on whether 1024 ports per user is adequate
  – Licensed for 5 meeelion translations
    • Honour-based licensing model
    • Platform scales to 60m translations
• Traffic flow
  – Selected RFC1918 networks redirected towards the NAT engine using ACL-Based Forwarding (ABF)
  – «Native» public IP traffic is unmolested by NAT
    • Failure in the NAT service has no impact on normal service
  – Virtual interfaces
    • NAT Inside and NAT Outside are not associated with physical ports
    • In case of upstream connectivity loss, NAT’ed traffic hairpins and finds another way using the power of routing
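The pool arithmetic can be sanity-checked: a /22 holds 1024 addresses, and (assuming ports below 1024 are skipped) carving the remaining port range into 1024-port blocks gives 63 blocks per public address — just over 64k clients, i.e. the "approx. 60k" figure:

```python
# Any IPv4 /22 contains 2^(32-22) = 1024 addresses
pool_addresses = 2 ** (32 - 22)
ports_per_client = 1024
usable_ports = 65536 - 1024              # assume ports below 1024 are reserved
blocks_per_address = usable_ports // ports_per_client
clients_per_pool = pool_addresses * blocks_per_address
print(pool_addresses, blocks_per_address, clients_per_pool)  # 1024 63 64512
```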
How - Deployment
• 15 VLANs in Interface Groups
• One public /22 and one RFC1918 /22 per VLAN
• Slow start
  – One /22 brought online initially
  – Discover «brokenness», limit impact
    • Internal services not accepting RFC1918 clients
  – Another 3 /22s onlined yesterday
    • Wireless monster feeds
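The VLAN/prefix plan (15 VLANs, each pairing a public /22 with an RFC1918 /22) can be enumerated with Python's ipaddress module. The supernets below are placeholders, not NTNU's real ranges — 15 /22s fit in a /18, and the "public" placeholder is taken from the RFC6598 shared space:

```python
import ipaddress

# Hypothetical supernets for illustration; a /18 holds 16 x /22.
public = ipaddress.ip_network("100.100.0.0/18")  # placeholder "public" range
inside = ipaddress.ip_network("10.64.0.0/18")    # RFC1918 client space

public_22s = list(public.subnets(new_prefix=22))
inside_22s = list(inside.subnets(new_prefix=22))

# Pair the first 15 subnets of each supernet, one pair per VLAN.
plan = {
    f"vlan{vid}": (str(pub), str(priv))
    for vid, (pub, priv) in enumerate(zip(public_22s[:15], inside_22s[:15]),
                                      start=1)
}
print(plan["vlan1"])  # first VLAN gets the first /22 from each supernet
```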
How – Deployment (II)
DHCP Scope Expanded
Wireless Monster likes IPv4. Om nom nom
How – Deployment (III)
• Wait for any issues to rear their heads
• Deploy remaining scopes after a week of no issues
• Job done
How – Lessons Learned
• ASR is geared for Service Providers
– I may have spent half a day looking for «write mem»…
– Administration model is heavily decentralised and aimed at large NOC setups
  • Platform administrator, etc.
• VSM is geared for Service Providers
– VRF ALL THE THINGS
• Very suitable for multi-tenant networks
• Needs some work for enterprise deployment
– BGP is your friend
– VPNv4 AF is also your friend
– Routing Policy Language is your very very best friend
How – Lessons Learned (II)
• Thinking big yields benefits
– Merger with University College in Trondheim (HiST)
• We can NAT their 3k users on our platform with no service changes
– vDDoS service can be run on the same platform
• We’ve been targeted a few times
– ASR supports BGP FlowSpec
• Potential cooperation with our NREN (or NORDUnet) for DDoS mitigation
Performance
• Tested by Uninett using a Spirent test rig
– 10Gbps input
– 10Gbps output
– 20 Gbps aggregate
– 17 Mpps (17 meeelion packets per second)
– No discernible delta latency between NAT’ed and non-NAT’ed traffic
• NTNU main upstream peaks at approx. 4Gbps and 400kpps
• Conclusion
– It’ll do nicely
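Putting the test figures next to the observed peak makes the headroom explicit:

```python
# Figures from the Spirent test vs. NTNU's observed upstream peak
tested_pps, tested_gbps = 17_000_000, 20
peak_pps, peak_gbps = 400_000, 4

print(f"packet-rate headroom: {tested_pps / peak_pps:.1f}x")  # 42.5x
print(f"bandwidth headroom: {tested_gbps / peak_gbps:.1f}x")  # 5.0x
```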
Thank you!