https://github.com/AlexXi19/rust-tcp-proxy

Context

In my personal datacenter, the nginx server communicates with my kubernetes cluster over plain http, which is unsafe because the request body is unencrypted. That’s why I had the idea of building a tcp proxy to secure the communication between the two servers.

I could build a layer 7 proxy that forwards all http requests and encrypts their contents. That would be perfectly functional and easy to build, but it wouldn’t be much fun. So I chose to build a tcp proxy instead and learn a bit more about layer 4 networking.

Based on my understanding, the implementation should be a simple rust program that listens for data on a port, encrypts the bytes to secure them in transit, and sends them over to a target server that decrypts the data.

The process

Building a simple tcp proxy was easy, and the code looks something like this. You bind a listener and accept an inbound connection, then connect to the target address to get the outbound connection. You split both streams into read and write halves, then use io::copy to shuttle bytes between the client connection and the server connection.

use tokio::io;
use tokio::net::{TcpListener, TcpStream};

// Accept a client connection and open a connection to the target.
let listener = TcpListener::bind(listen_addr).await?;
let (mut inbound, _) = listener.accept().await?;
let mut outbound = TcpStream::connect(forward_addr).await?;

// Split each stream into read/write halves and copy bytes both ways.
let (mut ri, mut wi) = inbound.split();
let (mut ro, mut wo) = outbound.split();
let client_to_server = io::copy(&mut ri, &mut wo);
let server_to_client = io::copy(&mut ro, &mut wi);
tokio::try_join!(client_to_server, server_to_client)?;

Pretty simple, and my dummy server was able to receive the request!

127.0.0.1 - - [25/Nov/2023 23:14:41] "GET / HTTP/1.1" 200 -

However, the implementation has an issue: the server receives the request, but the client, in this case the curl command, hangs.

Challenge 1: Connection management

To solve this issue, we need to understand how an http client knows that a response has terminated. One way is for the server to close the connection. However, if the tcp connection is to be kept alive and reused, as in HTTP/1.1, http instead uses headers to delimit the message and to signal that the connection should stay open. In our case, the target server ended its response by closing the connection, but the proxy never closed its own connection to the client, hence the client hangs. There are a few ways to deal with this:

  1. Keep the long-running connections between the client and the proxy, and somehow make the proxy Layer 7 aware: implement correct http message termination behavior by reading the http content or sending the right termination headers (illustrated after this list).
  2. Create a new connection for each request, and close the connection when the server has finished sending the response.
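
To make the first option concrete, here is a rough sketch of the two termination signals as they appear in response headers (illustrative responses I wrote for this post, not captured from my servers):

HTTP/1.1 200 OK
Connection: close
...body ends when the server closes the tcp connection

HTTP/1.1 200 OK
Content-Length: 1024
Connection: keep-alive
...body ends after exactly 1024 bytes; the connection stays open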

To keep the proxy strictly layer 4 and agnostic to any higher level protocols, I chose the second method.

First, we need to make the request handling concurrent so the proxy can serve multiple requests at the same time. Here, we use tokio::spawn to create an asynchronous task that handles each inbound connection.

    // Accept connections in a loop and spawn an asynchronous task per
    // connection, so multiple requests can be served concurrently.
    while let Ok((inbound, _)) = listener.accept().await {
        // Clone the values each task needs, since the task outlives this scope.
        let forward_addr = forward_addr.clone();
        let proxy_mode = proxy_mode.clone();
        tokio::spawn(async move {
            if let Err(e) = forward_to_server(inbound, forward_addr, proxy_mode).await {
                eprintln!("Error in forward_to_server: {}", e);
            }
        });
    }

Second, we need to detect when the server has finished sending the response and close the connection accordingly. In this case, when there are no more bytes to read, we break out of the loop and shut down the write side of the stream. With some slight refactoring from the first section, the code looks like this, where process_data is a cryptographic encrypt or decrypt function depending on whether the proxy is running as the client or the server.

let mut buf = vec![0; 1024];
loop {
	// read returns 0 when the sender has closed its side (EOF).
	let n = read_stream.read(&mut buf).await?;
	if n == 0 {
		break;
	}
	let content = &buf[0..n];
	// process_data encrypts or decrypts, depending on the proxy mode.
	let processed_content = process_data(content.to_vec())?;
	write_stream.write_all(&processed_content).await?;
}

// Shut down our write side so the other end sees EOF as well.
write_stream.shutdown().await?;
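
The forward_to_server function from the accept loop ties these two pieces together. The repo has its own version; below is a minimal sketch under my assumptions, with a hypothetical ProxyMode type (with process_outgoing/process_incoming methods) standing in for the real process_data wiring:

use anyhow::Result;
use tokio::io::{AsyncReadExt, AsyncWriteExt};
use tokio::net::TcpStream;

async fn forward_to_server(
    mut inbound: TcpStream,
    forward_addr: String,
    proxy_mode: ProxyMode, // hypothetical: decides which direction encrypts/decrypts
) -> Result<()> {
    let mut outbound = TcpStream::connect(forward_addr).await?;
    let (mut ri, mut wi) = inbound.split();
    let (mut ro, mut wo) = outbound.split();

    // Client -> target: process each chunk, forward it, then half-close.
    let client_to_server = async {
        let mut buf = vec![0u8; 1024];
        loop {
            let n = ri.read(&mut buf).await?;
            if n == 0 {
                break; // client finished sending
            }
            let processed = proxy_mode.process_outgoing(buf[..n].to_vec())?;
            wo.write_all(&processed).await?;
        }
        wo.shutdown().await?; // half-close so the target sees EOF
        Ok::<_, anyhow::Error>(())
    };

    // Target -> client: same loop in the other direction.
    let server_to_client = async {
        let mut buf = vec![0u8; 1024];
        loop {
            let n = ro.read(&mut buf).await?;
            if n == 0 {
                break; // target finished responding
            }
            let processed = proxy_mode.process_incoming(buf[..n].to_vec())?;
            wi.write_all(&processed).await?;
        }
        wi.shutdown().await?; // lets the client see the end of the response
        Ok::<_, anyhow::Error>(())
    };

    tokio::try_join!(client_to_server, server_to_client)?;
    Ok(())
}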

Now, when we curl the client proxy port, we get:

 ➜ curl localhost:7998


⠀⠀⢀⣠⠤⠶⠖⠒⠒⠶⠦⠤⣄⠀⠀⠀⣀⡤⠤⠤⠤⠤⣄⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⣴⠋⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠙⣦⠞⠁⠀⠀⠀⠀⠀⠀⠉⠳⡄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⡾⠁⠀⠀⠀⠀⠀⠀⣀⣀⣀⣀⣀⣀⣘⡆⠀⠀⠀⠀⠀⠀⠀⠀⠀⠙⣆⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⢀⡴⠚⠉⠁⠀⠀⠀⠀⠈⠉⠙⠲⣄⣤⠤⠶⠒⠒⠲⠦⢤⣜⣧⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠉⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠳⡄⠀⠀⠀⠀⠀⠀⠀⠉⠳⢄⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⠹⣆⠀⠀⠀⠀⠀⠀⣀⣀⣀⣹⣄⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⣠⠞⣉⣡⠤⠴⠿⠗⠳⠶⣬⣙⠓⢦⡈⠙⢿⡀⠀⠀⢀⣼⣿⣿⣿⣿⣿⡿⣷⣤⡀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⣾⣡⠞⣁⣀⣀⣀⣠⣤⣤⣤⣄⣭⣷⣦⣽⣦⡀⢻⡄⠰⢟⣥⣾⣿⣏⣉⡙⠓⢦⣻⠃⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠉⠉⠙⠻⢤⣄⣼⣿⣽⣿⠟⠻⣿⠄⠀⠀⢻⡝⢿⡇⣠⣿⣿⣻⣿⠿⣿⡉⠓⠮⣿⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠙⢦⡈⠛⠿⣾⣿⣶⣾⡿⠀⠀⠀⢀⣳⣘⢻⣇⣿⣿⣽⣿⣶⣾⠃⣀⡴⣿⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠙⠲⠤⢄⣈⣉⣙⣓⣒⣒⣚⣉⣥⠟⠀⢯⣉⡉⠉⠉⠛⢉⣉⣡⡾⠁⠀⠀⠀⠀⠀⠀⠀
⠀⠀⣠⣤⡤⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢈⡿⠋⠀⠀⠀⠀⠈⠻⣍⠉⠀⠺⠿⠋⠙⣦⠀⠀⠀⠀⠀⠀⠀

That’s only half of our response. Where did the other half go?

Challenge 2: Buffer size

After some debugging, I noticed that the vec![0; 1024] buffer allocated for the response bytes could be the issue: it did not have enough space to fit the incoming bytes. If the proxy receives a payload larger than the buffer, the decryption fails, and the proxy closes the connection on the error. (Note: the buffer boundaries do not correspond to tcp packets.)

To verify this, we change the buffer size to vec![0; 1_000_000], and the curl command returns the entire response.

However, setting an arbitrarily large buffer size does not fix the underlying issue. The real problem is that the encryption chunk size does not match the read chunk size, so the decryption function runs on incorrectly sized chunks; for example, if a chunk was encrypted as a single 4096-byte unit but the proxy reads it in 1024-byte slices, each slice fails to decrypt on its own. We can solve this by reading from the stream in exactly the sizes the data was encrypted in.

To do this, we create a custom write and read protocol that includes the chunk size at the start of each chunk (length-prefix framing).

In the custom write protocol, we prepend the encrypted chunk’s length as a big-endian u16 at the start of the buffer.

use anyhow::Result;
use tokio::io::{AsyncWrite, AsyncWriteExt};

pub async fn custom_write_protocol(
    content: Vec<u8>,
    mut write_stream: impl AsyncWrite + Unpin,
) -> Result<()> {
    // Encode the chunk length as a big-endian u16, so a single chunk
    // can hold at most u16::MAX bytes.
    let content_length: u16 = content.len() as u16;
    let content_length_array = content_length.to_be_bytes();
    let mut content_with_length_byte = vec![0u8; content.len() + 2];

    // First two bytes: the length prefix.
    content_with_length_byte[0] = content_length_array[0];
    content_with_length_byte[1] = content_length_array[1];

    // Remaining bytes: the encrypted chunk itself.
    content_with_length_byte[2..].copy_from_slice(&content[..]);

    write_stream.write_all(&content_with_length_byte).await?;

    Ok(())
}
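
One caveat: since the prefix is a u16, a single chunk can hold at most 65,535 bytes, and the as u16 cast silently truncates anything larger. A defensive variant (my addition, not from the repo) could split oversized payloads before framing them:

use anyhow::Result;
use tokio::io::AsyncWrite;

// Sketch: split a payload into chunks that fit the u16 length prefix,
// then frame each one. If chunks are encrypted independently, the split
// must happen before encryption so each framed chunk decrypts on its own.
pub async fn write_chunked(
    content: Vec<u8>,
    mut write_stream: impl AsyncWrite + Unpin,
) -> Result<()> {
    for chunk in content.chunks(u16::MAX as usize) {
        custom_write_protocol(chunk.to_vec(), &mut write_stream).await?;
    }
    Ok(())
}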

In the custom read protocol, we first read the u16 denoting the chunk size, then use read_exact to read exactly that many bytes before decrypting.

use anyhow::Result;
use tokio::io::AsyncReadExt;

pub async fn custom_read_protocol(
    mut read_stream: impl tokio::io::AsyncRead + Unpin,
    process_data: CryptoFn,
) -> Result<Vec<u8>> {
    // Read the two-byte big-endian length prefix.
    let mut length_buf = [0u8; 2];
    read_stream.read_exact(&mut length_buf).await?;
    let length = u16::from_be_bytes(length_buf) as usize;

    if length == 0 {
        return Ok(vec![]);
    }

    // Read exactly one encrypted chunk, then decrypt it.
    let mut buf = vec![0u8; length];
    read_stream.read_exact(&mut buf).await?;

    process_data(buf)
}
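
With framing in place, the forwarding loop from Challenge 1 can read whole chunks instead of arbitrary slices. Here is a minimal sketch of the receiving direction, assuming CryptoFn is a plain function pointer and treating an EOF error from read_exact as the sender being done (real code should distinguish EOF from other failures):

// Forward framed chunks until the sender closes its side.
loop {
    match custom_read_protocol(&mut read_stream, process_data).await {
        // An empty chunk, or a read error (read_exact returns
        // UnexpectedEof once the peer closes), means the sender is done.
        Ok(content) if content.is_empty() => break,
        Err(_) => break,
        Ok(processed) => write_stream.write_all(&processed).await?,
    }
}

// Half-close our write side so the other end sees EOF too.
write_stream.shutdown().await?;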

With this last piece solved, we can start up the tcp proxy client and server, configure them with the appropriate ports, and see that the proxy works correctly!
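
For example, the topology I benchmarked below looks roughly like this. The flag names and the intermediate port 7999 are hypothetical, made up for illustration; see the repo’s README for the real interface:

# curl -> proxy client :7998 -> proxy server :7999 -> origin :8000
cargo run -- --mode server --listen 7999 --forward localhost:8000
cargo run -- --mode client --listen 7998 --forward localhost:7999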

Simple Benchmarking

 hyperfine 'curl localhost:7998' # Proxy server
 
Benchmark 1: curl localhost:7998
  Time (mean ± σ):      14.9 ms ±   1.6 ms    [User: 5.5 ms, System: 4.4 ms]
  Range (min … max):    13.2 ms …  22.6 ms    126 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
 
 hyperfine 'curl localhost:8000' # Origin server
 
Benchmark 1: curl localhost:8000
  Time (mean ± σ):      14.5 ms ±   8.6 ms    [User: 5.6 ms, System: 4.7 ms]
  Range (min … max):    11.6 ms … 105.7 ms    132 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 

References: RFC 7230, Hypertext Transfer Protocol (HTTP/1.1): Message Syntax and Routing, https://datatracker.ietf.org/doc/html/rfc7230