
May 13, 2020

Party all night with HAProxy

This is the story of how I stopped worrying and learned to love reverse proxies. Years ago on some weeknight I was still up on my couch hidden under a pile of pistachio shells trying to serve Odoo 10 on Apache. I had spent the day tinkering with over-engineered solutions using shit like squid3 until settling on this kinda setup. But before we begin, let me start off by circling back to a tweet I posted a few years back with a golden rule one should ne’er forget when setting up stuff like this…


Okay, that’s enough self-promotion. Adventure awaits!

Step 1: Enable the proxy modules

$ sudo a2enmod proxy

$ sudo a2enmod proxy_http

$ sudo /etc/init.d/apache2 restart

Step 2: Edit the vhost file…

$ sudo nano /etc/apache2/sites-available/mysite.conf

And make it like this…

<VirtualHost *:80>

    ServerName example.com

    ProxyPreserveHost On

    ProxyRequests Off

    ProxyPass / http://IPADDR/

    ProxyPassReverse / http://IPADDR/

</VirtualHost>

… now enable your site and reload.

$ sudo a2ensite mysite

$ sudo /etc/init.d/apache2 reload
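Before pointing DNS at it, a quick sanity check doesn't hurt. Assuming the backend at IPADDR is actually answering, hitting Apache locally with the right Host header should show the proxied response coming through:

$ curl -I -H "Host: example.com" http://127.0.0.1/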

But wait! There’s more! The focus today is on my main man Mr. HAProxy. It’s going to be used as an HTTP/HTTPS proxy today, but haproxy can handle way more, from mail to MySQL and really anything that speaks TCP. Having a reverse proxy in front of your web server has several advantages…

• Not having the web server publicly accessible (for its own protection, for example)

• Pre-processing requests (number of sessions, bandwidth limitation, etc.)

• Balancing requests between several web servers

• Centralizing public access (using only one public IP with different web servers behind it)
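And since I said haproxy will happily shovel around anything that speaks TCP, here’s roughly what that looks like for, say, MySQL. This is just a sketch to illustrate the point, not part of today’s setup, and IP_PUBLIC / DB_SERVER_IP are placeholders:

listen mysql-in

    mode tcp

    bind IP_PUBLIC:3306

    server db1 DB_SERVER_IP:3306 check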

SO! Start off with a web server, and a “gateway” server which will have the public address of your site, and on which we will install haproxy. Haproxy uses fairly obvious terms in its configuration:

• bind: defines which IP and port haproxy will listen on. For example, 192.168.1.1 on port 80

• frontend: it is a configuration block that allows you to define all the rules that will apply (domains listened to, limitations, etc.). A frontend can apply to one or more binds.   

• backend: this is another configuration block, which is placed behind a frontend. If the frontend manages what is public (at the “front” of the server), the backend manages “the back”. This is where you will define the web servers to which to send requests.   

• acl: an “access control list” lets you define conditions in a block, for example “if the domain contains site1, then do this; if the request is in https, then do that”.

That’s enough jibba jabba. Let’s begin our journey…

apt-get update

apt-get install haproxy
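Right after installation, the top of /etc/haproxy/haproxy.cfg looks roughly like this. I’m quoting a Debian-flavoured default from memory, so yours may differ a bit:

global

    log /dev/log local0

    chroot /var/lib/haproxy

    user haproxy

    group haproxy

    daemon

defaults

    log     global

    mode    http

    option  httplog

    option  dontlognull

    timeout connect 5000

    timeout client  50000

    timeout server  50000

And whenever you touch the config, you can sanity-check it before reloading:

haproxy -c -f /etc/haproxy/haproxy.cfg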

The haproxy configuration file lives in /etc/haproxy/haproxy.cfg. The two sections global and defaults define variables that apply to the rest of the configuration (unless redefined more precisely in a sub-block). The parameters set at installation are sane, so you can leave these sections as they are for now. To make things more convenient, you can also write your custom configuration in a /etc/haproxy/haproxy.local file, which avoids having to modify the default file. Anyway, here’s a basic example:

frontend http-in

    bind IP_PUBLIC:80

    mode http

    option httplog

    acl your_acl hdr(host) yourwebsite.tld

    use_backend backend1 if your_acl

BE ADVISED: YOU MUST MIND YOUR WHITESPACE SYNTAX

Just like in Python, tidy indentation is your friend: keep all of a block’s parameters lined up below its keyword, since everything after a frontend or backend line belongs to that block until the next one starts.

You must define the following:

• frontend http-in: the frontend keyword marks the start of a frontend configuration block. Here, http-in is a name for this frontend, chosen arbitrarily. You can name the frontend whatever you like; good practice is a clear name that isn’t too long 🙂

• IP_PUBLIC: the IP address on which haproxy will listen. You can specify a particular IP, or 0.0.0.0 to listen on all the IPs present on the server. You can also stack several bind lines one below the other to listen on several specific IPs

• mode http: we declare that this frontend will only process the HTTP protocol (and therefore HTTPS as well). This already lets haproxy analyse requests and reject anything that is not correctly formatted with respect to the RFCs.

• option httplog: logs the details of HTTP requests, which gives you more information in the haproxy logs (headers, HTTP session, …).

• acl: an ACL is defined, which matches if the HOST part of the HTTP request corresponds exactly to yourwebsite.tld. It is also possible to match on the end of a host (everything that ends with yourwebsite.tld), on what it begins with, on whether it contains a given word, etc. Here, when it matches, the acl named your_acl becomes active and we can reuse it in the rest of the block.

• use_backend: here we declare that we will use the backend backend1 IF the acl your_acl is active. So in our case, if the processed request contains exactly yourwebsite.tld in the HOST part, the acl is active. Everything fits together. Here are a few variations…
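For instance, the same frontend listening on two specific addresses and matching anything that ends in yourwebsite.tld rather than the exact host would look like this (the IPs are placeholders):

frontend http-in

    bind 192.168.1.1:80

    bind 192.168.1.2:80

    mode http

    option httplog

    acl your_acl hdr_end(host) yourwebsite.tld

    use_backend backend1 if your_acl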

Backend configuration

Once the frontend is ready and receiving public traffic, you create the backend that will know where to send these requests.

backend backend1

    mode http

    option httpchk

    option forwardfor except 127.0.0.1

    http-request add-header X-Forwarded-Proto https if { ssl_fc }

    server web-server1 WEB_SERVER_IP:80 check maxconn 32

As with the frontend, keep the parameters indented below the backend keyword so it’s clear what belongs to the block.

Here you define:

• backend backend1: the backend keyword indicates the start of a backend block. The name backend1 is arbitrary, just like the frontend name; it is the one to reference in the frontend’s use_backend line

• mode http: as for the frontend, this indicates that this backend handles HTTP, and enables various practical options (header rewriting in particular)

• option httpchk: httpchk lets haproxy check the status of the web servers behind it at any time, so it knows whether a server is ready to receive requests, can switch to a backup server, display an error page in the event of a failure, etc. By default it is a simple HTTP check, but you can also point it at a script or a precise path (see the sketch after this list).

• option forwardfor except 127.0.0.1: this adds an X-Forwarded-For header, containing the real IP address of the visitor, to requests going through the backend. Requests pass through the proxy, so at the network level the web server only ever sees the proxy’s IP, which can be really fuckin’ annoying if you wanna make statistics from visits: you would have the impression that all the visits come from the proxy server… The except 127.0.0.1 avoids adding this header when the request was generated by 127.0.0.1.

• server web-server1: this line indicates the server to which requests are transmitted. WEB_SERVER_IP is of course the IP address of the web server, and :80 the port to send to. The check keyword enables the health checks defined by option httpchk. You can add several server lines to define several web servers and do load balancing.

• maxconn 32: limits the maximum number of connections handled by this server, here 32. This avoids overloading the web server beyond its capacity, for example, and cheaply blunts part of an attack.
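As promised above, here’s what a check against a precise path could look like. The /health path is just a made-up example, use whatever your app actually exposes; the check keyword on the server line is what turns the probes on:

backend backend1

    mode http

    option httpchk GET /health

    http-check expect status 200

    option forwardfor except 127.0.0.1

    server web-server1 WEB_SERVER_IP:80 check maxconn 32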

Now, restart haproxy to apply and let dry.

systemctl restart haproxy

“What about HTTPS?!”

Setting up an https frontend is very simple. Create a second frontend bound to port 443…

frontend https-in

    bind IP_PUBLIC:443 ssl crt /etc/haproxy/cert/ no-sslv3

    mode http

    option httplog

    acl your_acl hdr(host) yourwebsite.tld

    use_backend backend1 if your_acl

BE ADVISED: There are some differences from the configuration defined previously…

• https-in: the name is again up to you, but it must be different from the first one, otherwise haproxy will return an error at startup.

• bind: the bind line changes. This time we listen on port 443 instead of 80, HTTPS being on port 443 by default. Nothing prevents you from using another port according to your needs.

• ssl: this keyword (used on the same line as bind) tells haproxy that it has to do SSL on this bind

• crt /etc/haproxy/cert/: defines the directory in which you put your certificates. haproxy wants certificates in pem format, which you can create simply by merging the .crt and the .key:

cat domain.tld.crt domain.tld.key > domain.tld.pem

• no-sslv3: tells haproxy to refuse the SSLv3 protocol, which is now considered insecure.

All other options are the same as for the HTTP frontend (acl, use_backend, …)

haproxy manages certificates very efficiently. You can dump them in bulk into the directory; haproxy parses them at startup and always uses the most specific matching certificate.

If for example you have a wildcard certificate (*.domain.tld) and a more specific certificate (api.domain.tld):

• Visits to api.domain.tld will use the certificate of api.domain.tld   

• Visits to domain.tld will use the wildcard certificate   

• Visits to whatever.domain.tld will use the wildcard certificate   

Since it is the frontend which manages the HTTPS part, you can use exactly the same backend for the http frontend and the https frontend. HAProxy has tons of extra options so make sure to RTFM at some point.

MULTISITE CONFIGURATION:

As you can only create one frontend listening on port 80 (or 443) on an IP address, you will have to use the same frontend to manage several sites.

This is done using several acl in the frontend, for example:

acl site1 hdr(host) site1.tld

acl subdomain hdr(host) yourdomain.site1.tld

acl other-site2 hdr(host) other.site2.tld

You can then use one or more backends depending on the acl:

use_backend backend1 if site1 or subdomain or other-site2

Or …

use_backend backend1 if site1 or subdomain

use_backend backend2 if other-site2

In all these examples we used hdr(host) in our acl, which matches the exact content of the HOST header in the HTTP request. But it is possible to be more general, for example creating acl based on how the HOST ends, or on whether it begins with or contains a particular string. This lets you create an acl that will match everything that ends in site.tld, everything that starts with www, and blah blah blah… for example:

acl test1 hdr_beg(host) www. # match everything that starts with www.

acl test2 hdr_end(host) domain.tld # match everything that ends with domain.tld

acl test3 hdr_reg(host) REGEX # match anything that matches the regular expression REGEX

You can now use these acl to route traffic to whatever backend you want. You’re welcome.

“COME AT ME, BRO!”

A common practice is to use the default backend like a high school gym locker for nerds. All visits arriving on your server that don’t match any acl can be considered trash, or even a potential attack, so you might as well not hand them to a default site (as the basic apache default vhost does, for example). Just add the default_backend keyword in the frontend, which defines a backend to use for requests that matched no acl, and create a backend that returns, say, a 403 “Access forbidden” error:

frontend http-in

    […]

    default_backend trash

backend trash

    mode http

    http-request deny

“Speaking of load distribution…”

If you define several server lines in a backend, haproxy will automatically distribute incoming requests fairly between the servers (round robin). But it is also possible to define weights for these servers: if one of them is more powerful, for example, you can give it a proportionally bigger share of the load.

backend backend1

    […]

    server server1 ip1:80 weight 1

    server server2 ip2:80 weight 2

Here, server2 will receive twice as many requests as server1. You can also use the backup keyword:

backend backend1

    […]

    server server1 ip1:80

    server server2 ip2:80 backup

Here, server1 will receive all requests, but if haproxy detects that it is no longer accessible, it will automatically send all requests to server2. It is also possible to combine these 2 techniques, distributing the load between several servers while keeping others available as backup.
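A combined setup could look something like this (a minimal sketch with placeholder names and IPs; the check keyword is what lets haproxy notice when a server goes down):

backend backend1

    […]

    server server1 ip1:80 weight 1 check

    server server2 ip2:80 weight 2 check

    server server3 ip3:80 backup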

It is also possible to tell haproxy, in your frontend on port 80, that such a site must absolutely use HTTPS. In which case, if haproxy receives a request in HTTP, then it will immediately redirect the visitor to exactly the same request but in HTTPS:

frontend http-in

    […]

    # We have several acl

    acl site1 hdr(host) site1.tld

    acl subdomain hdr(host) subdomain.site1.tld

    acl other-site2 hdr(host) other.site2.tld

    redirect scheme https code 301 if !{ ssl_fc } site1 or !{ ssl_fc } subdomain

    use_backend backend2 if other-site2

Okay, so here we force, via a 301 (permanent redirection), a redirect to HTTPS (we detect that SSL is not being used) for access to site1.tld and subdomain.site1.tld, but we stay in HTTP and use backend2 if the visit is for other.site2.tld.

You can, in a backend, write all the headers you want. If the frontend in front of the backend is in https, you can add a header which will indicate to the web server behind that there has been https processing, even if the web server never sees the certificate since it is haproxy that takes care of it:

backend backend1

    […]

    http-request add-header X-Forwarded-Proto https if { ssl_fc }

Okay, good talk. Time for a beer.
