Proxying Requests based on arbitrary Strings with Apache

Although Apache comes with a lot of features, there’s one that’s been missing since the late nineties. That’s redirecting requests to a remote proxy server based on the file extension. Currently Apache only supports proxying based on the directory using the

ProxyPass /some/dir http://my.backend.server/some/other/dir

statement. While this is a nice feature, it only works satisfactorily as long as you have kept your different code bases well apart in their respective directories. But what if you didn’t? In that case it’s time to go back to the roots and remember where the name “Apache” came from. Let’s get our hands dirty and patch “a patchy server” ;)

Introduction
What we want to achieve is this: instead of proxying requests based on the target directory, we’d like to have Apache send off requests based on the file extension (or any other given string, for that matter). So basically, instead of saying

ProxyPass /some/dir http://my.backend.server/other/dir

we’d like to use

ProxyPass .java http://my.bigfat.javamachine/java/dir
ProxyPass .web4dot0 http://my.dream.machine/web4.0/

and so on.

The best place to start is mod_proxy, since it is already there and does a lot of the things we want. Unfortunately, mod_proxy doesn’t do all the things we need. That’s because in Apache versions before the 2.2.x series mod_proxy didn’t know anything about cookies and cookie domains (the ProxyPassReverseCookieDomain directive used further down arrived with 2.2.x). So we need Apache 2.2.x, right?

No, we don’t. Luckily, the mod_proxy module from Apache 2.2.x is largely compatible with Apache 2.0.x. But before we go into the gory details of how to make mod_proxy from Apache 2.2.x work with Apache 2.0.x, let’s first have a look at the code. So go get a source tarball of Apache 2.2.x, unpack it and change to httpd-2.2.x/modules/proxy. There you’ll find mod_proxy.c, which is the file we’re going to modify. So fire up your preferred editor and load it.

The Surgery
The function we need to change first is static int alias_match(const char *uri, const char *alias_fakename). Change it to look like this:

static int alias_match(const char *uri, const char *alias_fakename)
{
    const char *end_fakename = alias_fakename + strlen(alias_fakename);
    const char *aliasp = alias_fakename, *urip = uri;
    const char *end_uri = uri + strlen(uri);

    /* If the first character of the alias isn't a slash, treat the alias
     * as a plain substring: report a match (-1) if it occurs anywhere in
     * the URI, so the whole path is passed on unchanged.
     */
    if (*aliasp != '/') {
        if (strstr(urip, aliasp)) {
            return -1;
        } else {
            return 0;
        }
    } else {
        while (aliasp < end_fakename && urip < end_uri) {
            if (*aliasp == '/') {
                /* any number of '/' in the alias matches any number in
                 * the supplied URI, but there must be at least one...
                 */
                if (*urip != '/')
                    return 0;

                while (*aliasp == '/')
                    ++aliasp;
                while (*urip == '/')
                    ++urip;
            }
            else {
                /* Other characters are compared literally */
                if (*urip++ != *aliasp++)
                    return 0;
            }
        }

        /* fixup badly encoded stuff (e.g. % as last character) */
        if (aliasp > end_fakename) {
            aliasp = end_fakename;
        }
        if (urip > end_uri) {
            urip = end_uri;
        }

        /* We reach the end of the uri before the end of "alias_fakename"
         * for example uri is "/" and alias_fakename "/examples"
         */
        if (urip == end_uri && aliasp!=end_fakename) {
            return 0;
        }

        /* Check last alias path component matched all the way */
        if (aliasp[-1] != '/' && *urip != '\0' && *urip != '/')
            return 0;

        /* Return number of characters from URI which matched (may be
         * greater than length of alias, since we may have matched
         * doubled slashes)
         */

        return urip - uri;
    }
}

(Don’t miss the additional closing brace on the second-to-last line!)

Now, what have we done here? We just instructed mod_proxy to check whether the “path” we want to proxy starts with a slash. If it doesn’t, we tell the module to just check whether the string we passed it is contained in the requested URI. If so, we’re returning a value of -1. Ok, now this return value has to be evaluated somewhere, doesn’t it?
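To make the three possible return values a bit more tangible, here is a small standalone sketch of the contract. It is not the patched function itself: match_kind is a hypothetical name, and the doubled-slash handling of the real code is left out on purpose. It also shows a side effect of using strstr: the string only has to occur somewhere in the URI, not at its end.

```c
#include <assert.h>
#include <string.h>

/* Simplified stand-in for the patched alias_match(); hypothetical name.
 *   > 0   directory-style alias matched, value = URI characters consumed
 *   < 0   non-slash alias found somewhere in the URI (substring match)
 *   == 0  no match
 * Unlike the real function, runs of '/' are NOT collapsed here.
 */
static int match_kind(const char *uri, const char *alias)
{
    size_t alias_len;

    assert(uri != NULL && alias != NULL);
    alias_len = strlen(alias);

    if (alias[0] != '/') {
        /* Substring rule: ".java" matches "/app/Foo.java",
         * but also "/x.javascript/y". */
        return strstr(uri, alias) != NULL ? -1 : 0;
    }

    /* Prefix rule: the alias has to match whole leading path components. */
    if (strncmp(uri, alias, alias_len) != 0)
        return 0;
    if (alias[alias_len - 1] != '/'
        && uri[alias_len] != '\0' && uri[alias_len] != '/')
        return 0;

    return (int)alias_len;
}
```

If you only ever want real file extensions to match, comparing the alias against the end of the URI instead of calling strstr would be the stricter choice.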

Right you are, and the function we’re going to modify next is static int proxy_trans(request_rec *r). Here we have to add the code that acts on our return value of -1:

static int proxy_trans(request_rec *r)
{
    void *sconf = r->server->module_config;
    proxy_server_conf *conf =
    (proxy_server_conf *) ap_get_module_config(sconf, &proxy_module);
    int i, len;
    struct proxy_alias *ent = (struct proxy_alias *) conf->aliases->elts;

    if (r->proxyreq) {
        /* someone has already set up the proxy, it was possibly ourselves
         * in proxy_detect
         */
        return OK;
    }

    /* XXX: since r->uri has been manipulated already we're not really
     * compliant with RFC1945 at this point.  But this probably isn't
     * an issue because this is a hybrid proxy/origin server.
     */

    for (i = 0; i < conf->aliases->nelts; i++) {
        len = alias_match(r->uri, ent[i].fake);

       if (len > 0) {
           if ((ent[i].real[0] == '!') && (ent[i].real[1] == 0)) {
               return DECLINED;
           }

           r->filename = apr_pstrcat(r->pool, "proxy:", ent[i].real,
                                     r->uri + len, NULL);
           r->handler = "proxy-server";
           r->proxyreq = PROXYREQ_REVERSE;
           return OK;
       } else if (len < 0) { /* we have matched a substring */
           if ((ent[i].real[0] == '!' ) && ( ent[i].real[1] == 0 )) {
               return DECLINED;
           }

           r->filename = apr_pstrcat(r->pool, "proxy:", ent[i].real,
                                     r->uri, NULL);
           r->handler  = "proxy-server";
           r->proxyreq = PROXYREQ_REVERSE;
           return OK;
       }
    }
    return DECLINED;
}

As you can see, this really is an ugly hack. It doesn’t do any path replacement the way the original code does: for a substring match, the complete request URI is simply appended to the backend URL. But it’s good enough for me, and you’re encouraged to improve on the code. Oh, did I forget to tell you? We’re finished. Save, compile.

What you’re able to do afterwards is this: to proxy requests for, say, .myext to the server my.backend.machine, just add some statements to your httpd.conf or the respective vhost section. With the example below, a request for /app/page.myext ends up at http://my.backend.machine/some/dir/app/page.myext. Like this:

ProxyPass         .myext    http://my.backend.machine/some/dir
ProxyPassReverse  .myext    http://my.backend.machine/some/dir
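To see what the missing path replacement means in practice, here is a rough standalone sketch of the string that proxy_trans puts into r->filename. The helper build_proxy_target is hypothetical and uses malloc and snprintf where the module uses apr_pstrcat and the request pool, so treat it as an illustration of the two branches rather than as module code.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper mirroring the two branches in proxy_trans().
 *   match > 0: prefix match, the matched part of the URI is replaced by `real`
 *   match < 0: substring match, the whole URI is appended to `real`
 * Returns a heap string the caller must free(), or NULL if there is
 * no match, the match length is out of range, or malloc fails.
 */
static char *build_proxy_target(const char *real, const char *uri, int match)
{
    const char *tail;
    size_t size;
    char *out;

    assert(real != NULL && uri != NULL);

    if (match == 0)
        return NULL;
    if (match > 0 && (size_t)match > strlen(uri))
        return NULL;

    tail = (match > 0) ? uri + match : uri;
    size = strlen("proxy:") + strlen(real) + strlen(tail) + 1;

    out = malloc(size);
    if (out != NULL)
        snprintf(out, size, "proxy:%s%s", real, tail);
    return out;
}
```

Calling it with the backend URL from the example, the URI /app/page.myext and a match value of -1 gives proxy:http://my.backend.machine/some/dir/app/page.myext, so the backend has to serve the same path layout below /some/dir.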

In case you’re using cookies you should also add

ProxyPassReverseCookieDomain    my.backend.machine         my.real.domain

To get the values for my.backend.machine and my.real.domain I recommend using Firefox with the “Live HTTP Headers” extension. The extension will tell you what the server gives back as the cookie domain for the backend machine. I had to learn the hard way that this isn’t necessarily what you’d expect.
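Why the exact value matters can be shown with a toy version of the rewrite. The sketch below is not mod_proxy's implementation: rewrite_cookie_domain is a hypothetical helper, and it assumes the attribute is written in lower case as domain= without spaces around the equals sign. The point is only that a domain which differs from the configured one is left untouched.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, NOT mod_proxy's code: copies `header` into `out`
 * and replaces the first "domain=<from>" with "domain=<to>".
 * Assumes a lower-case attribute name and no spaces around '='.
 * Returns 1 if a replacement happened, 0 if the header was copied
 * unchanged, -1 if a buffer is too small.
 */
static int rewrite_cookie_domain(const char *header, const char *from,
                                 const char *to, char *out, size_t out_size)
{
    char needle[256];
    const char *hit;
    int n;

    assert(header != NULL && from != NULL && to != NULL && out != NULL);

    n = snprintf(needle, sizeof needle, "domain=%s", from);
    if (n < 0 || (size_t)n >= sizeof needle)
        return -1;

    hit = strstr(header, needle);
    if (hit == NULL) {
        /* Backend sent a different domain: nothing gets rewritten. */
        n = snprintf(out, out_size, "%s", header);
        return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
    }

    /* text before the hit + new domain + text after the old domain */
    n = snprintf(out, out_size, "%.*sdomain=%s%s",
                 (int)(hit - header), header, to, hit + strlen(needle));
    return (n < 0 || (size_t)n >= out_size) ? -1 : 1;
}
```

A header carrying domain=my.backend.machine gets rewritten, while one carrying, say, domain=.backend.machine passes through unchanged, and the browser then refuses or misroutes the cookie. That is the kind of surprise the header inspection is meant to catch.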

Disclaimer
The code published here works for me. Your mileage may vary. Don’t hold me responsible if it doesn’t work for you. Suggestions and feedback are welcome, as are improvements to the code.

The Apache 2.0.x part
Ah, I almost forgot. To make this work using Apache 2.0.x, all you have to do is this: Grab a source tarball of Apache 2.0.x and unpack it. Rename or delete the original modules/proxy directory. Copy the modules/proxy directory from Apache 2.2.x to your Apache 2.0.x modules directory. Search and replace every occurrence of ap_regex_t in the code with regex_t. Do the same for AP_REGEX.
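
If you would rather not touch every occurrence by hand, a handful of aliases might do the same job. This is only a sketch under assumptions: it presumes that AP_REGEX refers to the AP_REG_* flag constants of 2.2.x, it covers just three of them, and it is demonstrated against plain POSIX regex.h rather than inside an Apache 2.0.x tree, where I have not tried it. The function matches is a hypothetical example of code written against the 2.2.x names.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Assumption: these aliases replace the manual search and replace.
 * ap_regex_t and AP_REG_* are the 2.2.x spellings; regex_t and REG_*
 * are the POSIX names.
 */
typedef regex_t ap_regex_t;
#define AP_REG_EXTENDED REG_EXTENDED
#define AP_REG_ICASE    REG_ICASE
#define AP_REG_NOSUB    REG_NOSUB

/* Hypothetical example of code using the 2.2.x names.
 * Returns 1 if `subject` matches `pattern`, 0 if it does not,
 * -1 if the pattern fails to compile.
 */
static int matches(const char *pattern, const char *subject)
{
    ap_regex_t re;
    int rc;

    assert(pattern != NULL && subject != NULL);

    if (regcomp(&re, pattern, AP_REG_EXTENDED | AP_REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, subject, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```
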
Compile. This will result in an error. Now edit modules.mk in the modules/proxy directory so that the .slo prerequisites of mod_proxy_connect, mod_proxy_ftp and mod_proxy_http carry the mod_ prefix:

mod_proxy.la: mod_proxy.slo proxy_util.slo
        $(SH_LINK) -rpath $(libexecdir) -module -avoid-version  mod_proxy.lo proxy_util.lo
mod_proxy_connect.la: mod_proxy_connect.slo
        $(SH_LINK) -rpath $(libexecdir) -module -avoid-version  mod_proxy_connect.lo
mod_proxy_ftp.la: mod_proxy_ftp.slo
        $(SH_LINK) -rpath $(libexecdir) -module -avoid-version  mod_proxy_ftp.lo
mod_proxy_http.la: mod_proxy_http.slo
        $(SH_LINK) -rpath $(libexecdir) -module -avoid-version  mod_proxy_http.lo
DISTCLEAN_TARGETS = modules.mk
static =
shared =  mod_proxy.la mod_proxy_connect.la mod_proxy_ftp.la mod_proxy_http.la

Compile again. Install. Be Happy :-D
