NAT44 at NTNU – Implementing CGN for Eduroam
Kyrre Liaaen
NTNU
Agenda
• What
• Why
• How
• A word on performance
What
• Centralised NAT scaled to serve more than 20k users
• Implemented on university Internet Edge
– 2 routers
– 2 x 10Gbps bonded uplinks each
– Redundancy and stateless failover
• NAT44
– Native IPv6
– RFC1918 in client networks
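To make "NAT44" concrete: RFC1918 source addresses and ports are rewritten to a shared public IPv4 address before traffic leaves the edge. A toy port-block sketch in Python (purely illustrative — the real CGN runs on dedicated hardware, and all names and addresses here are made up; 203.0.113.1 is a documentation address):

```python
import ipaddress

class Nat44Table:
    """Toy port-block NAT44: maps (inside IP, inside port) to a
    (public IP, public port) pair. Illustrative only."""

    def __init__(self, public_ip, port_base=1024, ports_per_client=1024):
        self.public_ip = str(ipaddress.ip_address(public_ip))
        self.port_base = port_base
        self.ports_per_client = ports_per_client
        self.blocks = {}  # inside IP -> first public port of its block

    def translate(self, inside_ip, inside_port):
        # Each inside client gets a fixed block of public ports.
        if inside_ip not in self.blocks:
            self.blocks[inside_ip] = (self.port_base
                                      + len(self.blocks) * self.ports_per_client)
        public_port = self.blocks[inside_ip] + inside_port % self.ports_per_client
        return self.public_ip, public_port

nat = Nat44Table("203.0.113.1")          # documentation address, not NTNU's
print(nat.translate("10.0.0.5", 51000))  # client 1 maps into block 1024-2047
```

A fixed block per client matches the «1024 ports per inside client» allocation described later in the deck.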
Why
• IPv4 scarcity
  – Initially generous campus allocations (/23 for everyone!)
  – Smaller allocations after «Peak IPv4»
    • /24 for everyone!
    • /25 for everyone!
    • What could possibly go wrong?
  – Cost outweighs benefit when considering reclaiming allocated space
    • Coordinating with the IPv4 range «owner» and effectuating changes is non-trivial
    • Reclaiming a /24 gains us … a /24. We need at least multiples of /22.
  – Lack of public IPv4 addresses impacts service delivery
    • New services are put on hold
• The wireless monster needs feeding
  – More users, more devices per user
  – Merger with other University Colleges brings more users
    » The University Colleges are also short of IPv4. Go figure…
Why (II)
• IPv6 «any time now»…
– No IPv6 in current WLAN due to legacy technology (WiSM…)
– IPv6 often disabled when troubleshooting issues
– Users still heavily reliant on IPv4-only resources
• Library journals, etc.
– Many services managed by University IT still lack IPv6 presence
• Woe is me
– Still seems to be a low priority for equipment vendors
• Bugs and resource shortages
• Long turnover (WiSM…)
How – Design process
• Philosophical
– NAT for WLAN only, guest only, area only, or entire campus?
• Technical
– Distributed vs. centralised – we have 10 campus routers
– Scaling the solution to serve the required user mass
– Managing one vs. managing ALL THE THINGS
– Cost of purchasing several smaller units vs. fewer large ones
• And then upgrading when a smaller unit is too small
• Practical
– RFC1918 already in use for internal only networks
• Lab equipment etc.
• Sudden surprise access to Windows Update was… undesired
How – Design Process (II)
• We selected a centralised model for service deployment
  – Scaled to meet expected need «until we turn off IPv4»
• Service made available for all devices in the network, regardless of type or usage
  – Key phrase: «prepared for». We don’t know what the users are going to need 5 years down the line
• Redundancy and failover
  – Stateless is less complex
  – Only Internet service «consumers» are NAT’ed, and they can handle changes in public IP
  – NAT engine decoupled from base routing – a NAT failure on one router makes all RFC1918 traffic converge on the other, while normal service is unaffected
How - Hardware
• Considered ASA 55XX or ASR 1k
  – Prohibitively expensive in a distributed model, and unsuitable for large-scale NAT in a centralised model
• ASR 9006 with VSM-500 selected
  – Large chassis for future expansion
  – VSM-500 – Virtualised Services Module
    • A server crammed into router blade form factor
      – Many RAMs and Cores
    • CGN service via .OVA image from Cisco
    • Can run the vDDoS service, and maybe more in the future
    • Performance listed at 40Gbps per VSM
      – Multiple VSMs per chassis possible
  – Implemented as Internet Edge
    • Links to other orgs terminated here
How - Logging
• NetFlow v9 or Syslog supported
– NetFlow more compact but requires collector
– Syslog «easier» and simpler
• We opted for NetFlow
– Incentive to migrate from old v5 collector
• We now have NetFlow for IPv6
• NetFlow export from the VSM is separate from regular traffic export
– Script developed to search for records to track abuse and reported AUP violations
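The search script itself isn't shown in the slides. As a hedged sketch, assuming the NetFlow v9 NAT records have already been decoded into dicts by a collector (field names like post_nat_src_ip are hypothetical stand-ins), answering an abuse report boils down to matching public IP, public port, and time window:

```python
from datetime import datetime

def find_inside_client(records, public_ip, public_port, when):
    """Return (inside IP, inside port) pairs that held the given public
    (IP, port) at time `when`. Field names are hypothetical."""
    hits = []
    for r in records:
        if (r["post_nat_src_ip"] == public_ip
                and int(r["post_nat_src_port"]) == public_port
                and r["start"] <= when <= r["end"]):
            hits.append((r["src_ip"], int(r["src_port"])))
    return hits

# One decoded translation record (made-up values)
records = [{
    "src_ip": "10.0.0.5", "src_port": "51000",
    "post_nat_src_ip": "203.0.113.1", "post_nat_src_port": "1848",
    "start": datetime(2016, 5, 1, 12, 0), "end": datetime(2016, 5, 1, 12, 30),
}]
print(find_inside_client(records, "203.0.113.1", 1848,
                         datetime(2016, 5, 1, 12, 15)))
```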
How – Implementation
• Resource allocations
  – One /22 pool of public IPv4 addresses per border router
  – 1024 ports per inside client
  – Approx. 60k NAT’ed clients per pool
    • More NAT pools can be added if this is insufficient
    • Jury is still out on whether 1024 ports per user is adequate
  – Licensed for 5 meeelion translations
    • Honour-based licensing model
    • Platform scales to 60m translations
• Traffic flow
  – Selected RFC1918 networks redirected towards the NAT engine using ACL-Based Forwarding (ABF)
  – «Native» public IP traffic is unmolested by NAT
    • Failure in the NAT service has no impact on normal service
  – Virtual interfaces
    • NAT Inside and NAT Outside are not associated with physical ports
    • In case of upstream connectivity loss, NAT’ed traffic hairpins and finds another way using the power of routing
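The pool arithmetic can be sanity-checked: a /22 holds 1024 addresses, and (assuming ports below 1024 are skipped) carving the remaining port range into 1024-port blocks gives 63 blocks per public address — just over 64k clients, i.e. the "approx. 60k" figure:

```python
# Any IPv4 /22 contains 2^(32-22) = 1024 addresses
pool_addresses = 2 ** (32 - 22)
ports_per_client = 1024
usable_ports = 65536 - 1024              # assume ports below 1024 are reserved
blocks_per_address = usable_ports // ports_per_client
clients_per_pool = pool_addresses * blocks_per_address
print(pool_addresses, blocks_per_address, clients_per_pool)  # 1024 63 64512
```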
How - Deployment
• 15 VLANs in Interface Groups
• One public /22 and one RFC1918 /22 per VLAN
• Slow start
  – One /22 brought online initially
  – Discover «brokenness», limit impact
    • Internal services not accepting RFC1918 clients
  – Another 3 /22s onlined yesterday
    • Wireless monster feeds
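The VLAN/prefix plan (15 VLANs, each pairing a public /22 with an RFC1918 /22) can be enumerated with Python's ipaddress module. The supernets below are placeholders, not NTNU's real ranges — 15 /22s fit in a /18, and the "public" placeholder is taken from the RFC6598 shared space:

```python
import ipaddress

# Hypothetical supernets for illustration; a /18 holds 16 x /22.
public = ipaddress.ip_network("100.100.0.0/18")  # placeholder "public" range
inside = ipaddress.ip_network("10.64.0.0/18")    # RFC1918 client space

public_22s = list(public.subnets(new_prefix=22))
inside_22s = list(inside.subnets(new_prefix=22))

# Pair the first 15 subnets of each supernet, one pair per VLAN.
plan = {
    f"vlan{vid}": (str(pub), str(priv))
    for vid, (pub, priv) in enumerate(zip(public_22s[:15], inside_22s[:15]),
                                      start=1)
}
print(plan["vlan1"])  # first VLAN gets the first /22 from each supernet
```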
How – Deployment (II)
DHCP Scope Expanded
Wireless Monster likes IPv4. Om nom nom
How – Deployment (III)
• Wait for any issues to rear their heads
• Deploy remaining scopes after a week of no issues
• Job done
How – Lessons Learned
• ASR is geared for Service Providers
– I may have spent half a day looking for «write mem»…
– Administration model is heavily decentralised and aimed at large NOC setups
  • Platform administrator, etc.
• VSM is geared for Service Providers
– VRF ALL THE THINGS
• Very suitable for multi-tenant networks
• Needs some work for enterprise deployment
– BGP is your friend
– VPNv4 AF is also your friend
– Routing Policy Language is your very very best friend
How – Lessons Learned (II)
• Thinking big yields benefits
– Merger with University College in Trondheim (HiST)
• We can NAT their 3k users on our platform with no service changes
– vDDoS service can be run on the same platform
• We’ve been targeted a few times
– ASR supports BGP FlowSpec
• Potential cooperation with our NREN (or NORDUnet) for DDoS mitigation
Performance
• Tested by Uninett using a Spirent test rig
– 10Gbps input
– 10Gbps output
– 20 Gbps aggregate
– 17 Mpps (17 meeelion packets per second)
– No discernible delta latency between NAT’ed and non-NAT’ed traffic
• NTNU main upstream peaks at approx. 4Gbps and 400kpps
• Conclusion
– It’ll do nicely
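Putting the test figures next to the observed peak makes the headroom explicit:

```python
# Figures from the Spirent test vs. NTNU's observed upstream peak
tested_pps, tested_gbps = 17_000_000, 20
peak_pps, peak_gbps = 400_000, 4

print(f"packet-rate headroom: {tested_pps / peak_pps:.1f}x")  # 42.5x
print(f"bandwidth headroom: {tested_gbps / peak_gbps:.1f}x")  # 5.0x
```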
Thank you!