VPN with NAT in Google Cloud

Google Cloud provides the capability of terminating a VPN connection with a VPN Gateway. The problem is that the VPN Gateway is, at the moment, fairly limited in what it can do. One of the missing capabilities I would have liked to see implemented is NAT.

VPNs can be used to connect the machines of two different parties. Although this is usually not the best architectural pattern, since a connection over the public internet encrypted at the transport layer is often a better option, it’s relatively common in more legacy environments. When a VPN is used this way, it is very common to run into an IP space collision, which makes some form of NAT necessary. Let’s see how to implement this scenario in Google Cloud without terminating the VPN directly on an instance (which is possible but has its own problems, and maybe we’ll discuss it at some point in the future).

Our environment has the following characteristics:

  • we are using the europe-west1 region
  • our project ID is mytestproject-281109
  • our machines are in the 10.0.0.0/8 subnet

In the example, we are going to create a VPN with ECorp with the following characteristics:

  • the VPN will use IKEv2
  • the remote IP is 203.0.113.4
  • the VPN PSK is mdOmMGAM0Oqs3jKMwM7s1waUNM7oCgMKvUxX
  • the machines we are going to expose will be in the 172.16.0.0/25 subnet
  • the machines ECorp is going to expose will be in the 172.16.0.128/25 subnet
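Since IP collisions are the whole reason for this setup, it’s worth double-checking that the two NAT ranges are disjoint from each other and from the VPC’s own space before committing to them. A quick sketch, using the ranges listed above:

```shell
# Verify the NAT ranges do not overlap each other or the VPC range.
python3 - <<'EOF'
import ipaddress

vpc    = ipaddress.ip_network("10.0.0.0/8")
ours   = ipaddress.ip_network("172.16.0.0/25")
theirs = ipaddress.ip_network("172.16.0.128/25")

assert not ours.overlaps(theirs), "local and remote selectors collide"
assert not vpc.overlaps(ours) and not vpc.overlaps(theirs), "NAT range collides with VPC"
print("ranges are disjoint")
EOF
```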

Create the VPN

The first step is to terminate the VPN with a Google VPN Gateway. In this example, I’ve used a Classic VPN Gateway, but you can do the same with HA VPN Gateways in a very similar way (except that the commands will not work without some modifications). You will need to create the gateway from the Google Cloud Console, since I’ve not found a way to easily create it from the gcloud utility. I usually name the VPN Gateway vpn-OTHER_PARTY_NAME, so mine will be called vpn-ecorp. We can now create the VPN tunnel from the GUI or with the following command:

gcloud compute vpn-tunnels create vpn-ecorp \
    --project mytestproject-281109 \
    --region europe-west1 \
    --shared-secret mdOmMGAM0Oqs3jKMwM7s1waUNM7oCgMKvUxX \
    --peer-address 203.0.113.4 \
    --target-vpn-gateway vpn-ecorp \
    --ike-version 2 \
    --local-traffic-selector 172.16.0.0/25 \
    --remote-traffic-selector 172.16.0.128/25
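For reference, the Classic gateway itself can also be scripted, though it is more involved than the Console flow: you need the gateway, a reserved static IP, and the three forwarding rules (ESP, UDP 500, UDP 4500) that Classic VPN requires. A sketch, assuming the reserved address is named vpn-ecorp-ip:

```shell
# Sketch: create the Classic VPN gateway and its forwarding rules.
# The address name vpn-ecorp-ip is an assumption for this example.
gcloud compute target-vpn-gateways create vpn-ecorp \
    --project mytestproject-281109 --region europe-west1 --network default

gcloud compute addresses create vpn-ecorp-ip \
    --project mytestproject-281109 --region europe-west1

gcloud compute forwarding-rules create vpn-ecorp-esp \
    --project mytestproject-281109 --region europe-west1 \
    --ip-protocol ESP --address vpn-ecorp-ip \
    --target-vpn-gateway vpn-ecorp

gcloud compute forwarding-rules create vpn-ecorp-udp500 \
    --project mytestproject-281109 --region europe-west1 \
    --ip-protocol UDP --ports 500 --address vpn-ecorp-ip \
    --target-vpn-gateway vpn-ecorp

gcloud compute forwarding-rules create vpn-ecorp-udp4500 \
    --project mytestproject-281109 --region europe-west1 \
    --ip-protocol UDP --ports 4500 --address vpn-ecorp-ip \
    --target-vpn-gateway vpn-ecorp
```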

So far, we have only created the VPN, without handling the NAT part. This means that the VPN should go green, but no traffic will be able to pass, since the range of IPs managed by the VPN (172.16.0.0/24) does not overlap with the one that Google Cloud knows and manages (10.0.0.0/8).

Create a Subnet for our IP range

The first thing we need to create is a subnet with our side of the IP range (172.16.0.0/25). To do so, we can execute the following command:

gcloud compute networks subnets create nat-ecorp \
    --project mytestproject-281109 \
    --region europe-west1 \
    --network default \
    --range 172.16.0.0/25

In this way, we can assign IPs in this subnet to the machines we want to expose. This alone, though, does not mean that our machines will be able to communicate with machines on the other side of the VPN. For that, we need to create a network route.

Create the network route

To allow Google Cloud to properly route the traffic that needs to reach the other side (whether outbound requests or inbound responses), we need to create a route. To do so, we can execute the following command:

gcloud compute routes create nat-ecorp-ext \
    --project mytestproject-281109 \
    --network default \
    --priority 500 \
    --destination-range 172.16.0.128/25 \
    --tags nat-ecorp \
    --next-hop-vpn-tunnel https://www.googleapis.com/compute/v1/projects/mytestproject-281109/regions/europe-west1/vpnTunnels/vpn-ecorp

This route ensures that all packets destined for an IP in the address space assigned to the other side of the VPN will be inserted into the VPN by Google. This rule, though, only applies to machines that have the nat-ecorp tag, so make sure that the machines you put in the 172.16.0.0/25 network have this tag!
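If a machine already exists, the tag can be added after the fact; a sketch (the instance name my-exposed-vm is hypothetical):

```shell
# Hypothetical instance name; adds the tag the nat-ecorp-ext route matches on.
gcloud compute instances add-tags my-exposed-vm \
    --project mytestproject-281109 \
    --zone europe-west1-c \
    --tags nat-ecorp
```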

This is still not enough to make traffic flow, though, since we need to create a firewall rule to allow it (by default, only traffic in the 10.0.0.0/8 space is allowed).

Create the firewall rules

To allow the traffic from the VPN, we need to create a firewall rule. In our case, we allow all TCP and ICMP traffic, but this is probably not what you want, so you should tweak the rule to only allow the traffic you need!

gcloud compute firewall-rules create nat-ecorp-inbound \
    --project mytestproject-281109 \
    --network default \
    --action ALLOW \
    --direction INGRESS \
    --source-ranges 172.16.0.128/25 \
    --destination-ranges 172.16.0.0/25 \
    --rules tcp,icmp

At this point, from a machine that has a network interface in the nat-ecorp subnet, you will be able to ping a machine on the other side of the VPN, and the reverse ping will work as well!
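As a quick check from a machine with an interface in the nat-ecorp subnet (172.16.0.130 is just a hypothetical host in ECorp’s range):

```shell
# 172.16.0.130 is a hypothetical host in ECorp's 172.16.0.128/25 range.
ping -c 3 172.16.0.130
# TCP check on a hypothetical service port:
nc -vz 172.16.0.130 443
```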

We can now look at how to pipe all traffic transparently for the machines in your environment.

Pipe the traffic transparently

To pipe the traffic transparently, so that the machines in your network don’t need any additional network interface to be able to communicate with services on the other side of the VPN, we will need to:

  • create an instance that will perform the NATting
  • create a route so that Google Cloud knows how to manage the traffic properly
  • create a firewall rule to allow this traffic

Create an instance to perform the NATting

First, we need an instance that will perform the NATting. Usually, VPNs that require NAT do not require massive bandwidth, so you can use a tiny instance (like a g1-small). Remember, though, that in Google Cloud the instance size can limit its bandwidth, so if you need a high-bandwidth NAT, consider using a bigger instance.

gcloud compute instances create natter-ecorp \
    --project mytestproject-281109 \
    --zone europe-west1-c \
    --network default \
    --subnet nat-ecorp \
    --machine-type g1-small \
    --no-address \
    --tags nat-ecorp \
    --image-project centos-cloud \
    --image-family centos-8 \
    --can-ip-forward \
    --private-network-ip 172.16.0.10 \
    --metadata "enable-oslogin=TRUE,startup-script=
modprobe nf_nat_ftp
modprobe nf_conntrack_ftp
sysctl -w net.ipv4.ip_forward=1
iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE"

As you can see, we started from a CentOS 8 machine and added a startup script that will ensure that the machine will load the nf_nat_ftp and nf_conntrack_ftp modules (both are needed only if you plan to use the FTP protocol through the NAT), as well as enabling the traffic forwarding and creating the iptables rule that allows the NATting itself.
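The MASQUERADE rule above rewrites everything leaving eth0. If you want to be stricter, you could instead limit the rewrite to traffic coming from the VPC range and headed to ECorp’s range; a sketch:

```shell
# Stricter alternative to the blanket MASQUERADE rule: only NAT traffic
# from the internal VPC space that is destined for the remote range.
iptables -t nat -A POSTROUTING -s 10.0.0.0/8 -d 172.16.0.128/25 -o eth0 -j MASQUERADE
```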

We can now proceed with the creation of the network route.

Create the internal network route

We now need to inform Google Cloud that any traffic from any machine destined for the 172.16.0.128/25 range should be sent to the NAT instance.

gcloud compute routes create nat-ecorp-int \
    --project mytestproject-281109 \
    --network default \
    --priority 1000 \
    --destination-range 172.16.0.128/25 \
    --next-hop-instance natter-ecorp \
    --next-hop-instance-zone europe-west1-c

As you can see, the priority value is higher than that of the other route we created (1000 vs. 500 for nat-ecorp-ext); since in Google Cloud a lower value means higher priority, machines that have the nat-ecorp tag will still match nat-ecorp-ext first and send their traffic directly into the tunnel.

As before, this is not yet working, since we still need to configure the firewall properly.

Allow traffic to the NAT instance

To allow the traffic from the Google Cloud instances to flow to the NAT instance, we need to create the following rule:

gcloud compute firewall-rules create nat-ecorp-outbound \
    --project mytestproject-281109 \
    --network default \
    --action ALLOW \
    --direction INGRESS \
    --source-ranges 10.0.0.0/8 \
    --destination-ranges 172.16.0.0/25 \
    --rules tcp,icmp

Similarly to what we have seen before, we allow all machines in the 10.0.0.0/8 IP space to communicate with the instances in the 172.16.0.0/25 network on all TCP ports and ICMP. In real environments, you will probably want to limit this further with more specific rules.

Now, all traffic will work properly!
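To double-check everything that was created along the way, you can list the objects whose names start with nat-ecorp:

```shell
# List the routes and firewall rules created in this walkthrough.
gcloud compute routes list \
    --project mytestproject-281109 --filter="name~^nat-ecorp"
gcloud compute firewall-rules list \
    --project mytestproject-281109 --filter="name~^nat-ecorp"
```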

Conclusions

As you can see, the procedure is fairly complex, and it could have been avoided had Google implemented NAT directly in the VPN Gateway. Hopefully this will happen in the future, and the process will become much simpler.