
Resilient VPNs - Part 1


I have been working on building a resilient VPN architecture for our monitoring network. One of the stipulations was that it must not use GRE tunnels and must be capable of terminating on any number of peer VPN devices on the customer's network. Routing must work automatically, with no manual intervention required.


The problem with plain IPSEC tunnels is that you first need some way of knowing whether the tunnel is up. You then have to adjust the routing on the customer side so that traffic destined for your network leaves their network via the router with the currently active IPSEC tunnel.


This is not an easy task, and it has taken a while to come up with workable designs to deploy.


I have settled on IPSEC HA in our datacentre, with reverse route injection (RRI) on the customer network to push our subnets into their dynamic routing protocol.


I will go through the RRI solution in the next post. For now, let's have a look at the IPSEC HA configuration.
IPSEC HA is available on the 3800 models and above and, in conjunction with HSRP, allows routers to synchronise their IPSEC information so that failovers happen almost instantly.
The first thing you need to do is enable redundancy on both routers. Note that the scheme name used must match the HSRP standby group name.
redundancy inter-device
 scheme standby IPSECHA
Next, set up IPC to transfer the stateful IPSEC information between the two routers. local-ip is the IP address of the router you are working on and remote-ip is the IP address of the other redundant router, so whatever you use as local-ip on one router is what you use as remote-ip on the other.
ipc zone default
 association 1
  no shutdown
  protocol sctp
   local-port 5000
    local-ip 10.159.69.95
    retransmit-timeout 300 10000
    path-retransmit 10
    assoc-retransmit 10
   remote-port 5000
    remote-ip 10.159.69.250
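For comparison, here is a sketch of what the matching IPC block on the second router would look like. It assumes that router's address is 10.159.69.250 (the remote-ip used above) and that everything other than the swapped local-ip and remote-ip stays the same.

```
! Second (redundant) router: local-ip and remote-ip are swapped
ipc zone default
 association 1
  no shutdown
  protocol sctp
   local-port 5000
    local-ip 10.159.69.250
    retransmit-timeout 300 10000
    path-retransmit 10
    assoc-retransmit 10
   remote-port 5000
    remote-ip 10.159.69.95
```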
Now we just set up our IPSEC tunnel as normal. Note that I am encrypting traffic from my loopback interface for the purposes of this test.
crypto isakmp policy 1
 encr 3des
 authentication pre-share
 group 2
crypto isakmp key test address 10.159.69.229
crypto isakmp key test address 10.159.69.107
crypto isakmp keepalive 10 periodic
crypto ipsec transform-set TEST esp-3des esp-sha-hmac
crypto map TEST 1 ipsec-isakmp
 set peer 10.159.69.229
 set peer 10.159.69.107
 set transform-set TEST
 match address TEST
interface Loopback0
 ip address 2.2.2.1 255.255.255.0
We then create an HSRP standby group on the outside interface and have it track our inside interface (in my test this was the loopback, but in practice it would be the LAN interface). This way HSRP will fail over should the inside interface go down. N.B. the HSRP group name IPSECHA matches the name we configured earlier under redundancy. We apply the crypto map to the interface, but with the redundancy group name and the stateful keyword. Effectively, this tells the router to use the HSRP address to source the VPN tunnel. (On the VPN router at the other end, make sure you use the HSRP address as the peer, not the interface address.)
interface GigabitEthernet0/0
 ip address 10.159.69.95 255.255.255.0
 standby delay minimum 30 reload 30
 standby 1 ip 10.159.69.251
 standby 1 timers 1 5
 standby 1 preempt
 standby 1 name IPSECHA
 standby 1 track Loopback0
 crypto map TEST redundancy IPSECHA stateful
ip access-list extended TEST
 permit ip 2.2.2.0 0.0.0.255 3.3.3.0 0.0.0.255
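The far-end VPN router is not shown in this post, so the following is only a sketch of what its side might look like. It assumes the same ISAKMP policy, pre-shared key and transform set as above, and a mirror-image ACL. The point it illustrates is that the peer is the HSRP address 10.159.69.251 rather than either router's interface address. The outside interface is left as a comment because it is not given here.

```
! Far-end VPN router (illustrative sketch, not taken from the test rig)
crypto isakmp policy 1
 encr 3des
 authentication pre-share
 group 2
! Key and peer both point at the HSRP virtual IP
crypto isakmp key test address 10.159.69.251
crypto isakmp keepalive 10 periodic
crypto ipsec transform-set TEST esp-3des esp-sha-hmac
crypto map TEST 1 ipsec-isakmp
 set peer 10.159.69.251
 set transform-set TEST
 match address TEST
! Mirror image of the ACL on the HA pair
ip access-list extended TEST
 permit ip 3.3.3.0 0.0.0.255 2.2.2.0 0.0.0.255
! Apply "crypto map TEST" to whichever interface faces the HA pair
```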
Once you have completed the VPN configuration on the other side and the tunnel has come up, you will see the following on the active router.
ISR2#sh crypto isakmp sa
dst             src             state          conn-id slot status
10.159.69.229   10.159.69.251   QM_IDLE              2    0 ACTIVE
And on the standby router:
skybox# sh crypto isakmp sa
dst             src             state          conn-id slot status
10.159.69.229   10.159.69.251   QM_IDLE              1    0 STDBY
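If you want to check the HA state more directly, the commands below are the ones I would expect to be useful. They are listed from memory of the IOS stateful failover feature rather than captured from this test, so treat them as a starting point. Availability and output will vary by IOS release.

```
! Redundancy and IPC state between the two routers
show redundancy states
show redundancy inter-device
! HA view of the crypto state, and the standby copies of the SAs
show crypto ha
show crypto isakmp sa standby
show crypto ipsec sa standby
```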
When I run a ping across the tunnel and then force a failover by shutting an interface, I see only two packets dropped, which is lightning fast.
Melody#ping 2.2.2.1 source fast0/1 repeat 10000
Type escape sequence to abort.
Sending 10000, 100-byte ICMP Echos to 2.2.2.1, timeout is 2 seconds:
Packet sent with a source address of 3.3.3.1
Success rate is 99 percent (3646/3648), round-trip min/avg/max = 1/2/24 ms
You see the following when the standby changes to active and begins sending via the VPN.
*Sep 21 14:45:23.230: %HSRP-5-STATECHANGE: GigabitEthernet0/0 Grp 1 state Standby -> Active
*Sep 21 14:45:23.234: %CRYPTO-5-IKE_SA_HA_STATUS: IKE sa's if any, for vip 10.159.69.251 will change from STANDBY to ACTIVE
*Sep 21 14:45:23.234: %CRYPTO-5-IPSEC_SA_HA_STATUS: IPSec sa's if any, for vip 10.159.69.251 will change from STANDBY to ACTIVE
So that is all there is to achieving resilient, stateful IPSEC failover between routers. Next we will tackle the really tricky part: having multiple VPN points of entry into a customer's network and injecting the routes so that you can fail over automatically between them.
