What is network tunneling
So I have been studying proxies for well over 2 years now. I mainly work on the application layer, so I have come across the term HTTP tunnel many times. You know, the thing that is formed after an HTTP CONNECT request.
But if you go even a little bit deeper, you will also come across TCP tunnels, and then IP tunnels, and the list goes on for each layer of the TCP/IP stack.
So what is tunneling, and why does it come up so often when talking about proxies and VPNs? This question has lived rent-free in my head for years now, and today I finally try to understand it.
The core concept
Tunneling is nothing but sending the packets/frames built for one protocol as the data payload of another protocol.
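To make that definition a bit more tangible, here is a minimal sketch of encapsulation in Python. The outer header format (a 1-byte tag plus a 2-byte length) and the names `encapsulate` and `decapsulate` are made up for illustration; no real protocol uses exactly this layout.

```python
import struct

# Hypothetical outer header: 1-byte payload tag + 2-byte payload length (network byte order).
OUTER_HEADER = struct.Struct("!BH")
INNER_PROTO_TAG = 0x01  # made-up tag meaning "the payload is a packet of the inner protocol"


def encapsulate(inner_packet: bytes) -> bytes:
    """Wrap a complete inner-protocol packet as the payload of an outer packet."""
    return OUTER_HEADER.pack(INNER_PROTO_TAG, len(inner_packet)) + inner_packet


def decapsulate(outer_packet: bytes) -> bytes:
    """Strip the outer header and return the inner packet untouched."""
    tag, length = OUTER_HEADER.unpack_from(outer_packet)
    if tag != INNER_PROTO_TAG:
        raise ValueError("unexpected payload type")
    payload = outer_packet[OUTER_HEADER.size:OUTER_HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated packet")
    return payload


inner = b"SSH-2.0-demo\r\n"         # bytes built for the inner protocol
outer = encapsulate(inner)           # what actually travels through the tunnel
assert decapsulate(outer) == inner   # the far end recovers the original bytes
```

The boxes in the middle only ever look at the outer header. The inner packet rides along as opaque data, which is the whole trick.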
When would that be useful? Well, two important use cases are:
- Bypassing firewalls
- Making it possible to send a protocol through middleboxes that do not support it.
But first, let's look at how tunneling happens with an example:
What actually happens when you connect to an HTTP proxy
So let's say you have a proxy configured on your machine that points to an HTTP proxy server. Before we move ahead, let's understand what that means.
Whenever a client that is configured to work with this proxy needs a tunnel (for HTTPS, or for any traffic that is not plain HTTP), it first sends an HTTP CONNECT request to the proxy. This request looks something like this:
CONNECT <destination ip>:<destination port> HTTP/1.1
Host: <destination ip>:<destination port>
Note that the Host header repeats the destination, not the proxy's address. You can also use hostnames here, instead of IPs.
This instructs the proxy server to establish a TCP connection on the client's behalf to talk to this destination IP.
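As a rough sketch, this is how a client could assemble that request in Python. The function name `build_connect_request` is hypothetical. The Host header repeats the destination, which is what the HTTP spec asks for when the request target is a host:port pair.

```python
def build_connect_request(dest_host: str, dest_port: int) -> bytes:
    """Build the bytes of a minimal HTTP/1.1 CONNECT request (no proxy auth)."""
    target = f"{dest_host}:{dest_port}"
    lines = [
        f"CONNECT {target} HTTP/1.1",  # request line: method, destination, version
        f"Host: {target}",             # Host mirrors the destination, not the proxy
        "",                            # blank line ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


print(build_connect_request("example.com", 22).decode("ascii"))
```

There is no body. Everything the proxy needs to know is in the request line.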
There are many more details about how CONNECT works (like how it handles authorization, connection termination, responses, etc.). You can read the RFC for it (RFC 9110, section 9.3.6); it's a very short read. But those details aren't relevant to understanding how network tunneling works.
You can also see how this can chain forward. If this proxy server (A) is behind another proxy server (B) then B makes the tunnel for A, and so on.
So once the proxy server has established a TCP connection to the destination, it lets the client know that it's ready to relay traffic with a 200 response. The client then sends the actual HTTP request that it intended to send to the destination.
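Note that this request, i.e. the client's HTTP request, is sent as the payload of the connection that the CONNECT request opened. The proxy does not interpret it; it just relays the bytes to the destination over the tunnel. Sound confusing? Well, it didn't make sense to me at first.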
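Seen from the client side, the handshake described above might look roughly like this. It is a sketch, not a full client: `open_tunnel` is a hypothetical helper, there is no proxy authentication, and error handling is minimal.

```python
import socket


def open_tunnel(proxy_host: str, proxy_port: int,
                dest_host: str, dest_port: int) -> socket.socket:
    """Ask an HTTP proxy for a tunnel; return the socket once the proxy answers 2xx."""
    sock = socket.create_connection((proxy_host, proxy_port), timeout=10)
    target = f"{dest_host}:{dest_port}"
    sock.sendall(f"CONNECT {target} HTTP/1.1\r\nHost: {target}\r\n\r\n".encode("ascii"))

    # Read one byte at a time up to the blank line that ends the response
    # headers, so that no tunnel data is consumed by accident.
    response = b""
    while not response.endswith(b"\r\n\r\n"):
        chunk = sock.recv(1)
        if not chunk:
            sock.close()
            raise ConnectionError("proxy closed the connection during CONNECT")
        response += chunk

    status_line = response.split(b"\r\n", 1)[0].decode("ascii", "replace")
    parts = status_line.split(" ", 2)
    if len(parts) < 2 or not parts[1].startswith("2"):
        sock.close()
        raise ConnectionError(f"proxy refused the tunnel: {status_line}")

    # From here on, whatever is written to sock is relayed to the destination as-is.
    return sock
```

Assuming a proxy is listening on `127.0.0.1:8080` (a made-up address), `open_tunnel("127.0.0.1", 8080, "example.com", 80)` would return a socket on which the client can write its real request as if it were talking to `example.com` directly.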
So to paint a clearer picture, I want to illustrate this with an example of building an SSH connection over an HTTP tunnel.
SSH over HTTP
Disclaimer: This is not a general practice. SSH is intended and designed to be used over TCP connections. This is just an example.
Let's say you have a client that is capable of making SSH requests and you have configured an HTTP proxy for it.
I am not smart enough to think of a scenario where that would happen, maybe a browser SSH client does that (will have to check that!), but let's assume it does.
So now you have a client making SSH requests, but you have an HTTP proxy configured for that client. So everything that it sends goes via the HTTP tunnel made during the initial CONNECT request that I explained earlier.
These SSH packets are sent as the payload of the tunnel that the HTTP CONNECT exchange set up, and the proxy forwards them over its already established TCP connection to the destination.
Remember that SSH in general works directly on top of TCP, so the proxy (or to be precise, the tunnel) added an extra HTTP layer in this case: the CONNECT handshake that has to succeed before any SSH bytes can flow.
Sounds counterproductive, right? Well, I never said it was a very good example, but this approach does have its benefits. Like I mentioned earlier, using this approach:
- You can use SSH over networks whose firewalls might be blocking SSH
- In case you want to write a layer 7 proxy for whatever reason (maybe you silently hate your life), you do not need to explicitly add support for SSH (which is also a layer 7 protocol). You just need to write an HTTP proxy and the rest is taken care of (as long as the client is capable of speaking HTTP)
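To show why the proxy needs no SSH support at all, here is a rough sketch of the proxy side for a single connection. The names `pipe` and `handle_connect` are hypothetical, and a real proxy would also need authentication, timeouts, access control and proper handling of many clients at once.

```python
import socket
import threading


def pipe(src: socket.socket, dst: socket.socket) -> None:
    """Copy bytes one way until src closes; the proxy never inspects them."""
    try:
        while True:
            data = src.recv(4096)
            if not data:
                break
            dst.sendall(data)
    except OSError:
        pass
    finally:
        try:
            dst.shutdown(socket.SHUT_WR)  # pass the end-of-stream along
        except OSError:
            pass


def handle_connect(client: socket.socket) -> None:
    """Serve one CONNECT request, then relay bytes blindly in both directions."""
    head = b""
    while not head.endswith(b"\r\n\r\n"):
        chunk = client.recv(1)
        if not chunk:
            client.close()
            return
        head += chunk

    request_line = head.split(b"\r\n", 1)[0].decode("ascii", "replace")
    try:
        method, target, _version = request_line.split(" ", 2)
    except ValueError:
        method, target = "", ""
    if method != "CONNECT":
        client.sendall(b"HTTP/1.1 405 Method Not Allowed\r\n\r\n")
        client.close()
        return

    host, _, port = target.rpartition(":")
    try:
        upstream = socket.create_connection((host.strip("[]"), int(port)), timeout=10)
        upstream.settimeout(None)  # the timeout was only meant for connecting
    except (OSError, ValueError):
        client.sendall(b"HTTP/1.1 502 Bad Gateway\r\n\r\n")
        client.close()
        return

    client.sendall(b"HTTP/1.1 200 Connection Established\r\n\r\n")
    # From here on the payload could be SSH, TLS, or anything else that runs on TCP.
    back = threading.Thread(target=pipe, args=(upstream, client), daemon=True)
    back.start()
    pipe(client, upstream)
    back.join()
    client.close()
    upstream.close()
```

Notice that nothing after the 200 response mentions HTTP or SSH. Once the tunnel is up, the proxy is just two byte pumps, which is why the same code works for any protocol the client wants to push through it.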
This might still look like overhead at this level, because things in the application layer are quite flexible and there is clearly no need to do all this.
But things start to make so much more sense once you start exploring tunneling at different layers of the network stack.
I am still exploring this myself, and will update this article with a detailed review of that soon (some illustrations are also in the works).