Any form of HA (especially in the cloud) requires collaborative effort, because both parties' configurations have to agree. Expect that you and your connectivity partner will need to make some decisions jointly.
Here are the major options:
1) BGP Peering
This is the industry-standard way to use a dynamic routing protocol to keep a tunnel up between multiple devices on both ends of the connection. Security appliances like Cisco ASA don't do BGP, and for many customers it is too complex. VNS3 supports the customer connecting two devices on their end to a two-controller mesh on the cloud end, using BGP over IPsec. This option works with our Encrypted Overlay, but does not currently support Cloud Underlays (AWS VPC, Azure VNets, etc.).
2) Failover between VNS3 Controllers driven from the customer side using a feature like Cisco Multi-Peer list.
The Cisco ASA is able to keep a list of peer IP addresses to try in the event that the IPsec device it is connected to fails. In this scenario you run a two-controller mesh of VNS3, each controller with a connection to the customer's ASA, and with the VNS3 side of the connection in "passive" mode, meaning it does not attempt to connect to the ASA (connection=receive and connection-rekey=no in the Extra Parameters section). If the ASA cannot reach the VNS3 Controller it is connected to, it flips to the second one.
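As a rough illustration of the peer-list behavior described above, here is a minimal Python sketch. It is not Cisco's or VNS3's implementation. The function name `select_peer`, the `is_reachable` callback and the two addresses (taken from the RFC 5737 documentation range) are all hypothetical; only the two Extra Parameters values come from the text.

```python
from typing import Callable, Optional, Sequence

# Hypothetical public IPs of the two VNS3 controllers (RFC 5737 documentation range).
VNS3_PEERS = ["203.0.113.10", "203.0.113.20"]

# Extra Parameters on each VNS3 controller so that it never initiates the connection.
PASSIVE_EXTRA_PARAMS = ["connection=receive", "connection-rekey=no"]


def select_peer(
    peers: Sequence[str],
    is_reachable: Callable[[str], bool],
) -> Optional[str]:
    """Return the first reachable peer in list order, or None if all are down.

    This mimics the active (customer) side walking its peer list. The
    is_reachable callback is an assumption standing in for whatever
    liveness test the real device performs.
    """
    for peer in peers:
        if is_reachable(peer):
            return peer
    return None
```

With both controllers up, `select_peer(VNS3_PEERS, lambda ip: True)` picks the first entry. If the first is unreachable, the second is chosen. The VNS3 side stays passive throughout, so only one side is ever making the decision.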
3) Manual failover via re-mapping the public IP of the primary VNS3 Controller.
You can keep a backup VNS3 instance running in the same VPC as the primary, with its configuration identical to the primary's except for connection=receive and connection-rekey=no in the settings. If the primary fails, the public IP is re-mapped to the backup instance and, if necessary, the connection parameters are changed to connection=bidirectional and connection-rekey=yes.
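The two manual steps above could be scripted roughly as follows. This sketch assumes the public IP is an AWS Elastic IP and that `ec2_client` behaves like `boto3.client("ec2")`, whose `associate_address` call is used to move the address. The helper names and the IDs in the usage note are hypothetical. Pushing the updated parameters to the VNS3 Controller is deliberately left out, since this article does not describe that interface.

```python
from typing import Any, Dict, List

# Active-mode values named in the text above.
ACTIVE_VALUES = {"connection": "bidirectional", "connection-rekey": "yes"}


def promote_extra_params(lines: List[str]) -> List[str]:
    """Rewrite passive Extra Parameters lines to their active values.

    Lines whose key is not in ACTIVE_VALUES are passed through unchanged.
    """
    promoted = []
    for line in lines:
        key, sep, _old_value = line.partition("=")
        key = key.strip()
        if sep and key in ACTIVE_VALUES:
            promoted.append(f"{key}={ACTIVE_VALUES[key]}")
        else:
            promoted.append(line)
    return promoted


def remap_public_ip(
    ec2_client: Any,
    allocation_id: str,
    backup_instance_id: str,
) -> Dict[str, Any]:
    """Move the Elastic IP from the failed primary to the backup instance.

    Assumption: ec2_client is a boto3 EC2 client (or an object with the
    same associate_address method). AllowReassociation lets the address
    move even though it is still associated with the primary.
    """
    return ec2_client.associate_address(
        AllocationId=allocation_id,
        InstanceId=backup_instance_id,
        AllowReassociation=True,
    )
```

An operator who has decided to fail over would call `remap_public_ip(ec2, "eipalloc-EXAMPLE", "i-EXAMPLE")` with their own IDs, then apply the output of `promote_extra_params(...)` to the backup controller if the connection needs to become active.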
4) Semi-Automatic Failover to backup VNS3 Controller.
This will be a new type of VNS3 Controller mode where all of the steps of reconfiguring and taking over as the backup are automated except the DECISION to fail over. This is a very difficult decision to make in a cloud-based environment, and we want customers to become familiar with the human-driven version first.
Coming early in '17:
5) Automatic Failover to backup VNS3 Controller.
This will be an enhancement of item #4 above, in which a monitoring system will use the AWS "healthcheck" API to determine whether the primary VNS3 Controller is running and capable of passing network traffic. If it is not, the failover will be triggered and performed automatically by VNS3:ms (the management system).
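To show why the decision is the hard part, here is a minimal sketch of one possible decision rule: fail over only after several consecutive failed health checks, so that a single missed check does not trigger it. This is not how VNS3:ms works internally. The class name, the threshold of 3, and the `check` and `trigger` callables (standing in for the health check query and the failover procedure) are all assumptions made for illustration.

```python
from typing import Callable


class FailoverMonitor:
    """Trigger failover once after `threshold` consecutive failed checks."""

    def __init__(
        self,
        check: Callable[[], bool],     # hypothetical: True if the primary is healthy
        trigger: Callable[[], None],   # hypothetical: performs the failover steps
        threshold: int = 3,
    ) -> None:
        self.check = check
        self.trigger = trigger
        self.threshold = threshold
        self.failures = 0
        self.failed_over = False

    def poll(self) -> bool:
        """Run one health check; return True if this poll triggered failover."""
        if self.failed_over:
            # Failover is a one-shot action; do nothing further.
            return False
        if self.check():
            # A healthy result resets the consecutive-failure counter.
            self.failures = 0
            return False
        self.failures += 1
        if self.failures >= self.threshold:
            self.failed_over = True
            self.trigger()
            return True
        return False
```

A scheduler would call `poll()` at a fixed interval. Raising the threshold trades slower recovery for fewer false failovers, which is the judgment call the semi-automatic mode in item #4 leaves to a human.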
Coming later in '17:
6) Cohesive now has the HA Tunnels feature in Quality Assurance. HA Tunnels are similar to the Cisco ASA Multi-Peer list. In this case a VNS3 instance can have a list of customer endpoints to connect to; if one of them is not available, it will try the next in the list. HA Tunnels combined with VNS3:ha will give customers automated healing of outages, whether they result from an AWS instance failure or a customer hardware failure.
NOTE: HA in the world of IPsec MUST be driven from one side or the other. With the exception of using the BGP dynamic routing protocol, one side in the HA scenario must be passive. If both sides attempt to be active in starting/stopping devices, connections, tunnels, etc., you can end up in a situation where the devices never converge, but "flap" endlessly. This has nothing to do with VNS3; it is an industry-wide best practice for IPsec with HA. For example, if you are using the Cisco Multi-Peer list, you would not also use VNS3 HA Tunnels at the same time.
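The rule in this note can be written down as a simple pre-deployment check. The sketch below is purely illustrative: the `HaPlan` fields and the returned messages are hypothetical and do not correspond to any VNS3 or Cisco setting.

```python
from dataclasses import dataclass


@dataclass
class HaPlan:
    customer_side_active: bool  # e.g. Cisco Multi-Peer list drives failover
    cloud_side_active: bool     # e.g. VNS3 HA Tunnels drives failover
    uses_bgp: bool = False      # BGP is the stated exception to the rule


def check_plan(plan: HaPlan) -> str:
    """Classify an HA plan according to the one-active-side rule."""
    if plan.uses_bgp:
        return "ok: BGP is the exception to the one-passive-side rule"
    if plan.customer_side_active and plan.cloud_side_active:
        return "conflict: both sides active, risk of endless flapping"
    if not (plan.customer_side_active or plan.cloud_side_active):
        return "warning: neither side drives failover"
    return "ok: exactly one side drives failover"
```

For instance, `check_plan(HaPlan(customer_side_active=True, cloud_side_active=True))` reports a conflict, which matches the Multi-Peer list plus HA Tunnels combination warned against above.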
---- UPDATE: MARCH 1ST 2016 ----
VNS3:ha is now available! Contact [email protected] to add VNS3:ha to your VNS3:ms setup.
VNS3:ha in action: https://youtu.be/_f2bz7tiKcs