Apache module mod_proxy

This module provides for an HTTP 1.1 caching proxy server.

Status: Extension
Source File: mod_proxy.c
Module Identifier: proxy_module
Compatibility: Available in Apache 1.1 and later.

Summary

This module implements a proxy/cache for Apache. It implements proxying capability for FTP, CONNECT (for SSL), HTTP/0.9, HTTP/1.0, and (as of Apache 1.3.23) HTTP/1.1. The module can be configured to connect to other proxy modules for these and other protocols.

This module was experimental in Apache 1.1.x. As of Apache 1.2, mod_proxy stability is greatly improved.

Warning: Do not enable proxying with ProxyRequests until you have secured your server. Open proxy servers are dangerous both to your network and to the Internet at large.

Directives

Common configuration topics

Controlling access to your proxy

You can control who can access your proxy via the normal <Directory> control block using the following example:
<Directory proxy:*>
Order Deny,Allow
Deny from all
Allow from yournetwork.example.com
</Directory>

A <Files> block will also work, and is the only method known to work for all possible URLs in Apache versions earlier than 1.2b10.

For more information, see mod_access.

Using Netscape hostname shortcuts

There is an optional patch to the proxy module to allow Netscape-like hostname shortcuts to be used. It's available from the contrib/patches/1.2 directory on the Apache Web site.

Why doesn't file type xxx download via FTP?

You probably don't have that particular file type defined as application/octet-stream in your proxy's mime.types configuration file. A useful line might be:
application/octet-stream        bin dms lha lzh exe class tgz taz

How can I force an FTP ASCII download of File xxx?

In the rare situation where you must download a specific file using the FTP ASCII transfer method (while the default transfer is in binary mode), you can override mod_proxy's default by suffixing the request with ;type=a to force an ASCII transfer. (FTP Directory listings are always executed in ASCII mode, however.)

How can I access FTP files outside of my home directory?

An FTP URI is interpreted relative to the home directory of the user who is logging in. Alas, to reach higher directory levels you cannot use /../, as the dots are interpreted by the browser and not actually sent to the FTP server. To address this problem, the so-called "Squid %2f hack" was implemented in the Apache FTP proxy; it is a solution which is also used by other popular proxy servers like the Squid Proxy Cache. By prepending /%2f to the path of your request, you can make such a proxy change the FTP starting directory to / (instead of the home directory).
Example: To retrieve the file /etc/motd, you would use the URL

ftp://user@host/%2f/etc/motd

How can I hide the FTP cleartext password in my browser's URL line?

To log in to an FTP server by username and password, Apache uses different strategies. In the absence of a user name and password in the URL altogether, Apache sends an anonymous login to the FTP server, i.e.,

user: anonymous
password: apache_proxy@
This works for all popular FTP servers which are configured for anonymous access.
For a personal login with a specific username, you can embed the user name into the URL, like in: ftp://username@host/myfile. If the FTP server asks for a password when given this username (which it should), then Apache will reply with a [401 Authorization required] response, which causes the Browser to pop up the username/password dialog. Upon entering the password, the connection attempt is retried, and if successful, the requested resource is presented. The advantage of this procedure is that your browser does not display the password in cleartext (which it would if you had used ftp://username:password@host/myfile in the first place).
Note that the password which is transmitted in such a way is not encrypted on its way. It travels between your browser and the Apache proxy server in a base64-encoded cleartext string, and between the Apache proxy and the FTP server as plaintext. You should therefore think twice before accessing your FTP server via HTTP (or before accessing your personal files via FTP at all!). When using insecure channels, an eavesdropper might intercept your password on its way.

Why does Apache start more slowly when using the proxy module?

If you're using the ProxyBlock or NoCache directives, hostnames' IP addresses are looked up and cached during startup for later match tests. This may take a few seconds (or more) depending on the speed with which the hostname lookups occur.

Can I use the Apache proxy module with my SOCKS proxy?

Yes. Just build Apache with the rule SOCKS4=yes in your Configuration file, and follow the instructions there. SOCKS5 capability can be added in a similar way, but there is no SOCKS5 rule yet: either use the EXTRA_LDFLAGS definition, or build Apache normally and run it with the runsocks wrapper provided with SOCKS5, if your OS supports dynamically linked libraries.

Some users have reported problems when using SOCKS version 4.2 on Solaris. The problem was solved by upgrading to SOCKS 4.3.

Remember that you'll also have to grant access to your Apache proxy machine by permitting connections on the appropriate ports in your SOCKS daemon's configuration.

What other functions are useful for an intranet proxy server?

An Apache proxy server situated in an intranet needs to forward external requests through the company's firewall. However, when it has to access resources within the intranet, it can bypass the firewall when accessing hosts. The NoProxy directive is useful for specifying which hosts belong to the intranet and should be accessed directly.

Users within an intranet tend to omit the local domain name from their WWW requests, thus requesting "http://somehost/" instead of "http://somehost.my.dom.ain/". Some commercial proxy servers let them get away with this and simply serve the request, implying a configured local domain. When the ProxyDomain directive is used and the server is configured for proxy service, Apache can return a redirect response and send the client to the correct, fully qualified, server address. This is the preferred method since the user's bookmark files will then contain fully qualified hosts.


ProxyRequests directive

Syntax: ProxyRequests on|off
Default: ProxyRequests Off
Context: server config, virtual host
Override: Not applicable
Status: Base
Module: mod_proxy
Compatibility: ProxyRequests is only available in Apache 1.1 and later.

This allows or prevents Apache from functioning as a proxy server. Setting ProxyRequests to 'off' does not disable use of the ProxyPass directive.

Warning: Do not enable proxying until you have secured your server. Open proxy servers are dangerous both to your network and to the Internet at large.
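
As a minimal sketch of the advice above, the following fragment turns on forward proxying and restricts it in the same configuration, using the <Directory proxy:*> block described under "Controlling access to your proxy". The network 192.168.0.0/16 is a placeholder, not a recommendation; substitute the clients you actually intend to serve.

```apache
# Enable forward proxying.
ProxyRequests On

# Restrict proxy use to one (hypothetical) internal network.
<Directory proxy:*>
Order Deny,Allow
Deny from all
Allow from 192.168.0.0/16
</Directory>
```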


ProxyRemote directive

Syntax: ProxyRemote match remote-server
Default: None
Context: server config, virtual host
Override: Not applicable
Status: Base
Module: mod_proxy
Compatibility: ProxyRemote is only available in Apache 1.1 and later.

This defines remote proxies to this proxy. match is either the name of a URL-scheme that the remote server supports, or a partial URL for which the remote server should be used, or '*' to indicate the server should be contacted for all requests. remote-server is a partial URL for the remote server. Syntax:

  remote-server = protocol://hostname[:port]
protocol is the protocol that should be used to communicate with the remote server; only "http" is supported by this module.

Example:

  ProxyRemote http://goodguys.com/ http://mirrorguys.com:8000
  ProxyRemote * http://cleversite.com
  ProxyRemote ftp http://ftpproxy.mydomain.com:8080
In the last example, the proxy will forward FTP requests, encapsulated as yet another HTTP proxy request, to another proxy which can handle them.

ProxyPass directive

Syntax: ProxyPass path url
Default: None
Context: server config, virtual host
Override: Not applicable
Status: Base
Module: mod_proxy
Compatibility: ProxyPass is only available in Apache 1.1 and later.

This directive allows remote servers to be mapped into the space of the local server; the local server does not act as a proxy in the conventional sense, but appears to be a mirror of the remote server. path is the name of a local virtual path; url is a partial URL for the remote server.

Suppose the local server has address http://wibble.org/; then

   ProxyPass /mirror/foo/ http://foo.com/
will cause a local request for <http://wibble.org/mirror/foo/bar> to be internally converted into a proxy request to <http://foo.com/bar>.

ProxyPassReverse directive

Syntax: ProxyPassReverse path url
Default: None
Context: server config, virtual host
Override: Not applicable
Status: Base
Module: mod_proxy
Compatibility: ProxyPassReverse is only available in Apache 1.3b6 and later.

This directive lets Apache adjust the URL in the Location header on HTTP redirect responses. This is essential when Apache is used as a reverse proxy: without it, HTTP redirects issued by the backend servers, which stay behind the reverse proxy, would send clients around the reverse proxy.

path is the name of a local virtual path.
url is a partial URL for the remote server. Both arguments are used in the same way as for the ProxyPass directive.

Example:
Suppose the local server has address http://wibble.org/; then

   ProxyPass         /mirror/foo/ http://foo.com/
   ProxyPassReverse  /mirror/foo/ http://foo.com/
will not only cause a local request for <http://wibble.org/mirror/foo/bar> to be internally converted into a proxy request to <http://foo.com/bar> (the functionality ProxyPass provides here). It also takes care of redirects the server foo.com sends: when http://foo.com/bar is redirected by it to http://foo.com/quux, Apache adjusts this to http://wibble.org/mirror/foo/quux before forwarding the HTTP redirect response to the client.

Note that this ProxyPassReverse directive can also be used in conjunction with the proxy pass-through feature ("RewriteRule ... [P]") from mod_rewrite because it doesn't depend on a corresponding ProxyPass directive.
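
The combination with mod_rewrite might look like the following sketch, which reuses the host names from the example above. The rewrite pattern is an illustrative assumption; the [P] flag hands the rewritten URL to mod_proxy, and ProxyPassReverse then adjusts Location headers in redirects coming back from foo.com.

```apache
RewriteEngine On
# Proxy everything below /mirror/foo/ through to foo.com (illustrative pattern).
RewriteRule ^/mirror/foo/(.*)$ http://foo.com/$1 [P]
# Rewrite Location headers from foo.com back into the local URL space.
ProxyPassReverse /mirror/foo/ http://foo.com/
```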


AllowCONNECT directive

Syntax: AllowCONNECT port [port] ...
Default: AllowCONNECT 443 563
Context: server config, virtual host
Override: Not applicable
Status: Base
Module: mod_proxy
Compatibility: AllowCONNECT is only available in Apache 1.3.2 and later.

The AllowCONNECT directive specifies a list of port numbers to which the proxy CONNECT method may connect. Today's browsers use this method when an https connection is requested and proxy tunneling over http is in effect.
By default, only the default https port (443) and the default snews port (563) are enabled. Use the AllowCONNECT directive to override this default and allow connections to the listed ports only.
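
For example, to keep the two default ports and additionally permit tunnels to one further port, list all three, since the directive replaces the default list rather than extending it. Port 8443 is only a placeholder for whatever port your SSL service actually uses.

```apache
# Allow CONNECT to the default https and snews ports plus one extra port.
AllowCONNECT 443 563 8443
```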


ProxyBlock directive

Syntax: ProxyBlock *|word|host|domain [word|host|domain] ...
Default: None
Context: server config, virtual host
Override: Not applicable
Status: Base
Module: mod_proxy
Compatibility: ProxyBlock is only available in Apache 1.2 and later.

The ProxyBlock directive specifies a list of words, hosts and/or domains, separated by spaces. HTTP, HTTPS, and FTP document requests to sites whose names contain matched words, hosts or domains are blocked by the proxy server. The proxy module will also attempt to determine IP addresses of list items which may be hostnames during startup, and cache them for match tests as well. Example:

  ProxyBlock joes-garage.com some-host.co.uk rocky.wotsamattau.edu
'rocky.wotsamattau.edu' would also be matched if referenced by IP address.

Note that 'wotsamattau' would also be sufficient to match 'wotsamattau.edu'.

Note also that

ProxyBlock *
blocks connections to all sites.

ProxyReceiveBufferSize directive

Syntax: ProxyReceiveBufferSize bytes
Default: None
Context: server config, virtual host
Override: Not applicable
Status: Base
Module: mod_proxy
Compatibility: ProxyReceiveBufferSize is only available in Apache 1.3 and later.

The ProxyReceiveBufferSize directive specifies an explicit network buffer size for outgoing HTTP and FTP connections, for increased throughput. It has to be greater than 512 or set to 0 to indicate that the system's default buffer size should be used.

Example:

  ProxyReceiveBufferSize 2048

ProxyIOBufferSize directive

Syntax: ProxyIOBufferSize bytes
Default: 8192
Context: server config, virtual host
Override: Not applicable
Status: Base
Module: mod_proxy
Compatibility: ProxyIOBufferSize is only available in Apache 1.3.24 and later.

The ProxyIOBufferSize directive specifies the number of bytes that will be read from a remote HTTP or FTP server at one time. This directive is different from the ProxyReceiveBufferSize directive, which specifies the low level socket buffer size.

When a response is received which fits entirely within the IO buffer size, the remote HTTP or FTP server socket will be closed before an attempt is made to write the response to the client. This ensures that the remote server does not remain connected unnecessarily while the response is delivered to a slow client. A high value for the IO buffer decreases the load on remote HTTP and FTP servers, at the expense of greater RAM footprint on the proxy.

Example:

  ProxyIOBufferSize 131072

NoProxy directive

Syntax: NoProxy Domain|SubNet|IpAddr|Hostname [Domain|SubNet|IpAddr|Hostname] ...
Default: None
Context: server config, virtual host
Override: Not applicable
Status: Base
Module: mod_proxy
Compatibility: NoProxy is only available in Apache 1.3 and later.

This directive is only useful for Apache proxy servers within intranets. The NoProxy directive specifies a list of subnets, IP addresses, hosts and/or domains, separated by spaces. A request to a host which matches one or more of these is always served directly, without forwarding to the configured ProxyRemote proxy server(s).

Example:

  ProxyRemote  *  http://firewall.mycompany.com:81
  NoProxy         .mycompany.com 192.168.112.0/21 
Each argument to the NoProxy directive is one of the following types:
Domain
A Domain is a partially qualified DNS domain name, preceded by a period. It represents a list of hosts which logically belong to the same DNS domain or zone (i.e., the suffixes of the hostnames all end in Domain).
Examples: .com .apache.org.
To distinguish Domains from Hostnames (both syntactically and semantically; a DNS domain can have a DNS A record, too!), Domains are always written with a leading period.
Note: Domain name comparisons are done without regard to the case, and Domains are always assumed to be anchored in the root of the DNS tree, therefore two domains .MyDomain.com and .mydomain.com. (note the trailing period) are considered equal. Since a domain comparison does not involve a DNS lookup, it is much more efficient than subnet comparison.
SubNet
A SubNet is a partially qualified internet address in numeric (dotted quad) form, optionally followed by a slash and the netmask, specified as the number of significant bits in the SubNet. It is used to represent a subnet of hosts which can be reached over a common network interface. In the absence of the explicit net mask it is assumed that omitted (or zero valued) trailing digits specify the mask. (In this case, the netmask can only be multiples of 8 bits wide.)
Examples:
192.168 or 192.168.0.0
the subnet 192.168.0.0 with an implied netmask of 16 valid bits (sometimes used in the netmask form 255.255.0.0)
192.168.112.0/21
the subnet 192.168.112.0/21 with a netmask of 21 valid bits (also used in the form 255.255.248.0)
As a degenerate case, a SubNet with 32 valid bits is equivalent to an IPAddr, while a SubNet with zero valid bits (e.g., 0.0.0.0/0) is the same as the constant _Default_, matching any IP address.
IPAddr
An IPAddr represents a fully qualified internet address in numeric (dotted quad) form. Usually, this address represents a host, but there need not necessarily be a DNS domain name connected with the address.
Example: 192.168.123.7
Note: An IPAddr does not need to be resolved by the DNS system, so it can result in better Apache performance.

See Also: DNS Issues

Hostname
A Hostname is a fully qualified DNS domain name which can be resolved to one or more IPAddrs via the DNS domain name service. It represents a logical host (in contrast to Domains, see above) and must be resolvable to at least one IPAddr (or often to a list of hosts with different IPAddr's).
Examples: prep.ai.mit.edu www.apache.org.
Note: In many situations, it is more effective to specify an IPAddr in place of a Hostname since a DNS lookup can be avoided. Name resolution in Apache can take a considerable amount of time when the connection to the name server uses a slow PPP link.
Note: Hostname comparisons are done without regard to the case, and Hostnames are always assumed to be anchored in the root of the DNS tree, therefore two hosts WWW.MyDomain.com and www.mydomain.com. (note the trailing period) are considered equal.

See Also: DNS Issues
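
The four argument types can be mixed in a single directive. The sketch below uses one of each; the host names and addresses are placeholders built on the examples above, and gateway.partner.example in particular is made up. The comments work out which addresses the /21 SubNet covers.

```apache
ProxyRemote  *  http://firewall.mycompany.com:81

# Domain:   any host whose name ends in .mycompany.com
# SubNet:   192.168.112.0/21 leaves 32 - 21 = 11 host bits, i.e. 2048
#           addresses, 192.168.112.0 through 192.168.119.255
# IPAddr:   the single address 192.168.123.7 (outside the SubNet above)
# Hostname: gateway.partner.example, which must be resolvable through DNS
NoProxy  .mycompany.com 192.168.112.0/21 192.168.123.7 gateway.partner.example
```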


ProxyDomain directive

Syntax: ProxyDomain Domain
Default: None
Context: server config, virtual host
Override: Not applicable
Status: Base
Module: mod_proxy
Compatibility: ProxyDomain is only available in Apache 1.3 and later.

This directive is only useful for Apache proxy servers within intranets. The ProxyDomain directive specifies the default domain which the Apache proxy server will belong to. If a request to a host without a domain name is encountered, a redirection response to the same host with the configured Domain appended will be generated.

Example:

  ProxyRemote  *  http://firewall.mycompany.com:81
  NoProxy         .mycompany.com 192.168.112.0/21 
  ProxyDomain     .mycompany.com

ProxyVia directive

Syntax: ProxyVia on|off|full|block
Default: ProxyVia off
Context: server config, virtual host
Override: Not applicable
Status: Base
Module: mod_proxy
Compatibility: ProxyVia is only available in Apache 1.3.2 and later.

This directive controls the use of the Via: HTTP header by the proxy. Its intended use is to control the flow of proxy requests along a chain of proxy servers. See RFC 2068 (HTTP/1.1) for an explanation of Via: header lines.
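
As far as the Apache 1.3 behaviour is documented, the four settings work as follows: off (the default) performs no special processing, so an existing Via: header is passed on unchanged; on adds a Via: line for the current host to each request and reply; full additionally includes the Apache server version as a Via: comment field; block removes all Via: lines from proxy requests and generates no new one. A minimal example:

```apache
# Add a Via: line for this proxy to each request and reply.
ProxyVia On
```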


CacheForceCompletion directive

Syntax: CacheForceCompletion percentage
Default: 90
Context: server config, virtual host
Override: Not applicable
Status: Base
Module: mod_proxy
Compatibility: CacheForceCompletion is only available in Apache 1.3.1 and later.

If an HTTP transfer that is being cached is cancelled, the proxy module will complete the transfer to cache if more than the percentage specified has already been transferred.

This is a percentage, and must be a number between 1 and 100, or 0 to use the default. 100 will cause a document to be cached only if the transfer was allowed to complete. A number between 60 and 90 is recommended.
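
For instance, with the setting sketched below (75 is an arbitrary value inside the recommended range), a cancelled transfer of a 1000 KB document would be completed to the cache only if more than 750 KB had already been transferred.

```apache
# Finish caching a cancelled transfer once more than 75% has been received.
CacheForceCompletion 75
```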


CacheRoot directive

Syntax: CacheRoot directory
Default: None
Context: server config, virtual host
Override: Not applicable
Status: Base
Module: mod_proxy
Compatibility: CacheRoot is only available in Apache 1.1 and later.

Sets the name of the directory to contain cache files; this must be writable by the httpd server (see the User directive).
Setting CacheRoot enables proxy caching; without defining a CacheRoot, proxy functionality will be available if ProxyRequests is set to On, but no caching will be available.


CacheSize directive

Syntax: CacheSize kilobytes
Default: CacheSize 5
Context: server config, virtual host
Override: Not applicable
Status: Base
Module: mod_proxy
Compatibility: CacheSize is only available in Apache 1.1 and later.

Sets the desired space usage of the cache, in KB (1024-byte units). Although usage may grow above this setting, the garbage collection will delete files until the usage is at or below this setting.
Depending on the expected proxy traffic volume and CacheGcInterval, use a value which is at least 20 to 40% lower than the available space.


CacheGcInterval directive

Syntax: CacheGcInterval hours
Default: None
Context: server config, virtual host
Override: Not applicable
Status: Base
Module: mod_proxy
Compatibility: CacheGcInterval is only available in Apache 1.1 and later.

Check the cache after the specified number of hours, and delete files if the space usage is greater than that set by CacheSize. Note that hours accepts a float value; for example, you could use CacheGcInterval 1.5 to check the cache every 90 minutes. (If unset, no garbage collection will be performed, and the cache will grow indefinitely.) Note also that the larger the CacheGcInterval, the more extra space beyond the configured CacheSize will be needed for the cache between garbage collections.
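
Put together, a basic cache setup might look like the following sketch. The directory and the sizes are assumptions chosen for illustration: the directory must exist and be writable by the server's User, and the partition is assumed to have noticeably more than 100 MB free, in line with the headroom advice under CacheSize.

```apache
# Directory for cache files (hypothetical path).
CacheRoot /usr/local/apache/proxy
# Target cache size: 102400 KB = 100 MB.
CacheSize 102400
# Run garbage collection every 4 hours.
CacheGcInterval 4
```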


CacheMaxExpire directive

Syntax: CacheMaxExpire hours
Default: CacheMaxExpire 24
Context: server config, virtual host
Override: Not applicable
Status: Base
Module: mod_proxy
Compatibility: CacheMaxExpire is only available in Apache 1.1 and later.

Specifies the maximum number of hours for which cachable HTTP documents will be retained without checking the origin server. Thus, documents will be out of date by at most this number of hours. This restriction is enforced even if an expiry date was supplied with the document.


CacheLastModifiedFactor directive

Syntax: CacheLastModifiedFactor factor
Default: CacheLastModifiedFactor 0.1
Context: server config, virtual host
Override: Not applicable
Status: Base
Module: mod_proxy
Compatibility: CacheLastModifiedFactor is only available in Apache 1.1 and later.

If the origin HTTP server did not supply an expiry date for the document, then estimate one using the formula

  expiry-period = time-since-last-modification * factor
For example, if the document was last modified 10 hours ago, and factor is 0.1, then the expiry period will be set to 10*0.1 = 1 hour.

If the expiry-period would be longer than that set by CacheMaxExpire, then the latter takes precedence.
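
The interaction between the estimate and the CacheMaxExpire limit can be worked through with the default values. The figures in the comments are plain arithmetic on the formula above, not measured behaviour.

```apache
CacheLastModifiedFactor 0.1
CacheMaxExpire 24

# Last modified  10 hours ago:  10 * 0.1 =  1 hour   -> expires after 1 hour
# Last modified 100 hours ago: 100 * 0.1 = 10 hours  -> expires after 10 hours
# Last modified 720 hours ago: 720 * 0.1 = 72 hours  -> capped at 24 hours
```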


CacheDirLevels directive

Syntax: CacheDirLevels levels
Default: CacheDirLevels 3
Context: server config, virtual host
Override: Not applicable
Status: Base
Module: mod_proxy
Compatibility: CacheDirLevels is only available in Apache 1.1 and later.

CacheDirLevels sets the number of levels of subdirectories in the cache. Cached data will be saved this many directory levels below CacheRoot.


CacheDirLength directive

Syntax: CacheDirLength length
Default: CacheDirLength 1
Context: server config, virtual host
Override: Not applicable
Status: Base
Module: mod_proxy
Compatibility: CacheDirLength is only available in Apache 1.1 and later.

CacheDirLength sets the number of characters in proxy cache subdirectory names.
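
Together with CacheDirLevels, this determines how deeply cache files are nested. As a rough sketch, assuming the leading characters of each cache file name are used as the directory names (the names shown in the comment are made up), the default values would spread files out like this:

```apache
# Hypothetical cache directory.
CacheRoot /usr/local/apache/proxy
CacheDirLevels 3
CacheDirLength 1

# Three directory levels, each named with one character, e.g.
#   /usr/local/apache/proxy/A/b/C/<rest of the cache file name>
```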


CacheDefaultExpire directive

Syntax: CacheDefaultExpire hours
Default: CacheDefaultExpire 1
Context: server config, virtual host
Override: Not applicable
Status: Base
Module: mod_proxy
Compatibility: CacheDefaultExpire is only available in Apache 1.1 and later.

If the document is fetched via a protocol that does not support expiry times, then use the specified number of hours as the expiry time. CacheMaxExpire does not override this setting.


NoCache directive

Syntax: NoCache *|word|host|domain [word|host|domain] ...
Default: None
Context: server config, virtual host
Override: Not applicable
Status: Base
Module: mod_proxy
Compatibility: NoCache is only available in Apache 1.1 and later.

The NoCache directive specifies a list of words, hosts and/or domains, separated by spaces. HTTP and non-passworded FTP documents from matched words, hosts or domains are not cached by the proxy server. The proxy module will also attempt to determine IP addresses of list items which may be hostnames during startup, and cache them for match tests as well. Example:

  NoCache joes-garage.com some-host.co.uk bullwinkle.wotsamattau.edu
'bullwinkle.wotsamattau.edu' would also be matched if referenced by IP address.

Note that 'wotsamattau' would also be sufficient to match 'wotsamattau.edu'.

Note also that

NoCache *
disables caching completely.