Caching CGI Proxy on sourceforge.net

Caching CGI Proxy

About

Unlike other CGI proxies, this one exists to provide a high bandwidth front-end cache for a low bandwidth web site, rather than for firewall piercing. Using mod_rewrite rules, you can accelerate/mirror your DSL-hosted site on a cheap shared server.

Similar functionality is provided by the Squid proxy running in reverse proxy (or "httpd-accelerator") mode. You can also achieve the same effect with an appropriate configuration of Apache httpd's mod_cache module. If these options are available to you for free, they will provide a better solution than this program.

However, renting a server that lets you run a Squid proxy or configure your own Apache httpd will easily cost 10 to 100 times as much as a shared host offering Perl CGI, for the same amount of storage and bandwidth.

Since I wanted the cheap solution, I wrote a caching proxy in Perl CGI form.

Dependencies

The CGI proxy requires several Perl modules to be installed and available on your host. If your host doesn't have them, you can install them in your home directory. The required modules are LWP::UserAgent, CGI, File::Basename, File::Path, Fcntl, URI::Escape, and POSIX. Only LWP is likely to be missing from a Perl install, and most CGI hosts should have it.
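
To find out which of these modules your host lacks, a small script along the following lines can be run from a shell or uploaded as a throwaway CGI. This is only a sketch: the helper name missing_modules is made up for illustration and is not part of caching-proxy.cgi.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Modules listed as required in the text above.
my @required = qw(LWP::UserAgent CGI File::Basename File::Path Fcntl URI::Escape POSIX);

# Return the names of the modules that cannot be loaded on this host.
# (Hypothetical helper, not taken from caching-proxy.cgi.)
sub missing_modules {
    my @names = @_;
    my @missing;
    for my $module (@names) {
        # Turn Some::Module into Some/Module.pm so require accepts it at runtime.
        (my $file = "$module.pm") =~ s{::}{/}g;
        push @missing, $module unless eval { require $file; 1 };
    }
    return @missing;
}

my @not_found = missing_modules(@required);
print @not_found
    ? "Missing: @not_found\n"
    : "All required modules are available.\n";
```

Anything reported as missing would then need to be installed in your home directory as described above.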

Using the CGI proxy

In order to use it, you will have to download and install caching-proxy.cgi in the normal way that you would install a CGI program on your web host (for example, by putting the file in public_html/cgi-bin/ and running chmod +x on it).

Then you will need to edit the file to configure it: you need to tell it what site to mirror, and where to save the cache. Edit these two lines:

$REMOTE_HOST = 'jerkface.net';
$CACHE_DIR = '/home/childrf6/www/jerkface.net/cache';

Change both values to reflect the appropriate settings for your host.

Finally, you will probably want to add Apache mod_rewrite rules (that is, add these lines to a file named .htaccess in your web root) so that your users will not have to use the CGI script name in URLs. This is the configuration I use to make childrenofmay.org mirror jerkface.net:

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(?!cgi-bin/)(?:childrenofmay\.org/)?(.*)$ cgi-bin/caching-proxy.cgi?$1 [L]

The RewriteCond lines make sure that a file or directory stored directly on childrenofmay.org is served as-is instead of going through the proxy. Otherwise, a request for http://childrenofmay.org/whatever is silently served by the proxy from http://jerkface.net/whatever; the rewrite is internal, so the browser never sees a redirect. If you don't want local files to take precedence, you can leave out the two RewriteCond lines.
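
To see what the RewriteRule pattern does to a given request path, the same regular expression can be tried out in Perl. The sketch below assumes the path is matched without its leading slash (as in .htaccess context) and ignores the RewriteCond file checks; rewrite_target is a made-up name for illustration.

```perl
use strict;
use warnings;

# Same pattern as the RewriteRule above.
my $RULE = qr{^(?!cgi-bin/)(?:childrenofmay\.org/)?(.*)$};

# Return the rewritten target for a request path, or undef if the rule
# does not apply (the path is already under cgi-bin/).
# (Hypothetical helper; the RewriteCond checks are not modelled here.)
sub rewrite_target {
    my ($path) = @_;
    return unless $path =~ $RULE;
    return "cgi-bin/caching-proxy.cgi?$1";
}

for my $path ('whatever', 'files/big.iso', 'cgi-bin/caching-proxy.cgi') {
    my $target = rewrite_target($path);
    printf "%-28s => %s\n", $path, defined $target ? $target : '(not rewritten)';
}
```

The negative lookahead keeps requests for the script itself from being rewritten again, which is why the rule does not loop.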

Features

I originally wanted a proxy like this because when I uploaded large files to my web host, I always wanted to post the link before the upload had finished, but sometimes people would end up downloading incomplete versions -- depending on which transfer won the race. This proxy eliminates that race. I was careful to make the proxy efficient when several people download a large file while the proxy itself is still fetching it slowly from the remote host. This makes it safe to post a link to a large file the instant you make it available on your DSL site. The first download will be slow, but it will still be faster than waiting for the whole file to be uploaded to the web host before you start. Subsequent and simultaneous downloads will benefit from the bandwidth of the web host.
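
One way to serve a file that is still being fetched is to follow the growing cache file, much like tail -f. The sketch below illustrates that general technique and is not necessarily how caching-proxy.cgi does it. The function names and the $is_complete callback are assumptions; a real script might test a lock or a marker file instead.

```perl
use strict;
use warnings;

# Copy everything currently readable from $in to $out; return bytes copied.
sub copy_available {
    my ($in, $out) = @_;
    # Seeking to the current position clears the EOF flag, so data appended
    # by the writer since the last read becomes visible.
    seek($in, 0, 1);
    my ($buf, $total) = ('', 0);
    while (my $n = read($in, $buf, 65536)) {
        print {$out} $buf;
        $total += $n;
    }
    return $total;
}

# Follow a growing file until $is_complete->() is true and no data is left.
# $is_complete is a hypothetical callback supplied by the caller.
sub stream_growing_file {
    my ($path, $out, $is_complete) = @_;
    open(my $in, '<', $path) or die "Cannot open $path: $!";
    binmode($in);
    my $sent = 0;
    while (1) {
        # Check completion before reading so a final chunk is never missed.
        my $done = $is_complete->();
        my $n    = copy_available($in, $out);
        $sent += $n;
        last if $done && $n == 0;
        sleep 1 if $n == 0;    # wait for the slow fetch to write more
    }
    close($in);
    return $sent;
}
```

Each downloader would call stream_growing_file with its own output handle, so any number of clients can follow the same partially fetched file while only one process talks to the remote host.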

In principle, full RFC 2616 HTTP caching semantics would be possible in a CGI script. However, I have not implemented more than I need for my own configuration. Currently, the script checks the remote host for updates with If-Modified-Since headers and serves or invalidates the cache depending on what it finds. It also supports resuming its own downloads from the remote server using Range headers; however, it does not allow clients to resume downloads.
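
The two kinds of request headers mentioned above look roughly like this. The sketch only builds the header values; the function names are made up for illustration, and the actual script may construct its requests differently.

```perl
use strict;
use warnings;

my @DAYS   = qw(Sun Mon Tue Wed Thu Fri Sat);
my @MONTHS = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# Format an epoch time as an HTTP date, e.g. "Sun, 06 Nov 1994 08:49:37 GMT".
# Day and month names are spelled out here so the result does not depend
# on the host's locale.
sub http_date {
    my ($epoch) = @_;
    my ($sec, $min, $hour, $mday, $mon, $year, $wday) = gmtime($epoch);
    return sprintf('%s, %02d %s %04d %02d:%02d:%02d GMT',
        $DAYS[$wday], $mday, $MONTHS[$mon], $year + 1900, $hour, $min, $sec);
}

# Header for revalidating a complete cached copy: the origin answers
# 304 Not Modified if the copy is current, or 200 with a new body if not.
sub revalidate_headers {
    my ($cached_mtime) = @_;
    return ('If-Modified-Since' => http_date($cached_mtime));
}

# Header for resuming an interrupted fetch from the origin: ask only for
# the bytes after the $bytes_on_disk already saved.
sub resume_headers {
    my ($bytes_on_disk) = @_;
    return ('Range' => "bytes=$bytes_on_disk-");
}
```

With LWP::UserAgent, such pairs can be passed as extra arguments to get, for example $ua->get($url, revalidate_headers($mtime)). A 206 Partial Content response indicates that the server honored the Range request.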

Status and future development

In the future, I intend to add support for letting clients resume downloads. I might also write an installer and modify the script to present a configuration panel, instead of making users edit the source file.

I don't intend to support cache control headers any time soon. However, maybe some day in the distant future I will. If you have any use for such a feature, tell me about it; maybe I'll change my mind.
