|
October 3, 2006
Proxy Auto-Configuration Gives Relief From Internet Traffic Chaos
Networks are inherently messy, but lately it seems that they are just
getting out of control. I don't mean physical messes--although those
certainly exist too. Instead I'm referring to the constantly increasing
quantity of seemingly random packets that are zipping around on all of
our networks, without any apparent order or intent, coming from unknown
sources and going to the weirdest of places.
One of the biggest contributors to this phenomenon is the large number
of auto-patch widgets that are embedded into every application and utility
under the sun, all of which generate frequent connection requests to
their own update servers. Meanwhile, Internet-centric and hosted applications
that essentially depend on connectivity for basic functionality are also
becoming more common, and they naturally generate a large amount of Web
traffic that otherwise wouldn't exist.
Going forward, companies like Google and Microsoft are also working
hard to come up with new Web-based applications that they hope we'll
integrate into our desktops, and which will just make managing Internet
traffic harder for administrators everywhere.
The most effective way to deal with this situation is usually to implement
some kind of caching proxy server, such as Squid,
or Microsoft's Internet
Security and Acceleration Server, or any of the other dozen-plus
similar offerings, and then force all client-side Web requests to go
through the proxy server. Properly implemented, these servers can provide
administrators with a single choke-point for all Web traffic, thereby
providing administrators a way to actively manage the traffic.
The primary factor in the overall effectiveness of these tools is the
ability and willingness of the network clients to use them, which isn't
always guaranteed. For example, protecting the network from mass patch
downloads largely depends on whether or not the applications will use
the proxy for outgoing connections, but it can be difficult to get all
of the various applications and utilities to use the proxy server in
the first place.
Similarly, a proxy server with an anti-virus plug-in can provide network-wide
protection that goes well beyond traditional desktop solutions, but this
whole value proposition will be made moot if network clients aren't actually
using the proxy server for their downloads.
This situation is compounded by the fact that almost all the different
Internet-aware applications have their own unique proxy configuration
mechanisms, each of which has to be managed and maintained separately
from all the others. This makes it hard for administrators to reach high
levels of functionality. Largely as a result of these kinds of difficulties,
many networks simply do not use proxy servers at all--one informal survey
showed that fewer than 25% of networks actually use them--meaning that
most network administrators are foregoing one of their most effective
management tools.
However, there are some automatic client-configuration services available
that can alleviate some, but not all, of the configuration difficulties.
These auto-configuration tools have made it much easier for network administrators
to effectively use proxy servers as part of a holistic management strategy.
Static Configuration Options
Generally speaking, there are usually only four or five ways to configure
the proxy settings for an application. Some of these mechanisms are shown
in the screenshot below, as taken from the Firefox Web browser.
By default, most applications are configured to connect directly to
Internet sites, which essentially means that they will not use any proxy
servers. When set up this way, every instance of every application has
to communicate with remote Web services, which generates a significant
amount of unmanaged traffic.
The first step up from that is to explicitly point the application
to a proxy server by specifying its host name and port number, as shown
in the "Manual" options in the screenshot above. This information
usually has to be specified for each transfer protocol that the application
supports, which means that maintenance is usually N multiples of each
application on each system. So this is perhaps the most inefficient of
all the configuration mechanisms.
Worse, hard-coded host names and port numbers tend to make networks
more fragile, so while this is sometimes the easiest configuration method
to set up initially--just type in the values and get it over with--it
can also lead to serious wide-scale problems if the proxy server is ever
replaced or moved.
In an effort to try and generalize proxy configuration beyond this
meager level, the developers at Netscape came up with a Proxy
Auto-Configuration (PAC) file for use with the Navigator browser.
PAC files are nowadays supported by almost every browser and operating
system but still aren't widely supported by many stand-alone utilities.
Essentially, the
PAC specification relies on a text file containing JavaScript code,
which the client parses and interprets with its internal JavaScript
interpreter on startup. The client applications only have to store a
URL that points to the location of the PAC file, as shown in the "Automatic
proxy configuration URL" field in the screen shot above.
The PAC file that I use on my lab network is shown below:
function FindProxyForURL(url, host) {
if (isInNet(host, "127.0.0.0", "255.0.0.0"))
return "DIRECT";
if (dnsDomainIs(host, "localhost"))
return "DIRECT";
if (shExpMatch(host, "localhost.*"))
return "DIRECT";
return "PROXY bulldog.labs.ntrg.com:3128; DIRECT"; }
That code essentially instructs the proxy client to connect directly
to any host in the loopback network, or any URL with a host name or domain
name that contains "localhost." All other connection requests
should be sent to port 3128 on bulldog.labs.ntrg.com, or should be tried
directly if the proxy server is unavailable.
PAC files are very powerful, and provide a lot of flexible configuration
options to clients. Furthermore, the contents of the PAC files are also
somewhat dynamic, since changes to the JavaScript will be picked up the
next time the client restarts, if not sooner. But while the file contents
are somewhat dynamic, the location of the file is still somewhat static,
since the applications have to rely on stored and fixed URLs that use
static hostnames and possibly port numbers.
Auto-Discovery Mechanisms
In an effort to further simplify automatic proxy configuration, Microsoft
published a specification for Web
Proxy Auto-Discovery (WPAD), which essentially describes two different
methods for a Web agent to automatically locate proxy control files on
the local network. In this model, a client can simply ask the network
for the URL of the local PAC file (instead of having to store the URL
itself), which the client will then download and process, as described
in the preceding section.
The WPAD protocol requires that clients first issue a DHCP request
for the location of the shared PAC file, with the answer containing a
simple URL that points to the location of the PAC file that the client
should use. If the DHCP request fails, then the WPAD specification says
that the client should follow up with a DNS lookup for the IP address
of a host named "WPAD" in the local domain. If an answer for
that query is returned, then the client must issue an HTTP request to
the specified host, asking for a file with the exact name of "wpad.dat." The
WPAD specification also describes some other lookup mechanisms beyond
these first two stabs, but those are extremely rare in the wild, and
might as well not actually exist.
The DHCP option is the more flexible of these two auto-configuration
mechanisms, since it allows the use of general-purpose URLs pointing
to any file, using any protocol, while the DNS option essentially requires
using the fixed URL of http://wpad.your.domain/wpad.dat. You
can even point to a shared file systems via a "file://" URL
if you want. In contrast, the DNS method requires the use of a pre-defined
transfer protocol, host name, and file name. However, while most of the
proxy clients support the use of the DNS discovery mechanism described
above, very few of them use the DHCP mechanism.
For example, the "Auto-detect proxy settings" option shown
in the screenshot above is for Firefox, which uses only the DNS method,
while Internet Explorer is the only browser I know of that actually supports
both DHCP and DNS. As such, if you want to use the WPAD protocol, you
are pretty much required to use the DNS mechanism, regardless of whether
you also use the DHCP method.
If you decide to go with the DNS-based WPAD method (which you should),
then you can take advantage of the mandatory naming to normalize other
configuration methods too. For example, since you'll need to have a host
named WPAD.your.domain for the DNS lookup to succeed, you can use that
host name for your static configurations too. Similarly, you'll need
to use a PAC file named WPAD.DAT so you might as well reference that
file in any URL-based configurations too.
There are other options for pushing proxy configuration out to clients
that do not rely on any of these discovery protocols. For example, you
can use predefined Active
Directory Group Policy settings to push Internet Explorer proxy settings
down to your Windows clients, or you can use legacy
NT Domain policy settings with Samba and NAS appliances to do the
same basic thing. Using this model, you can configure most of your PCs
with the same information in a single operation, without having to rely
on auto-discovery methods.
Furthermore, Windows-based applications that reuse Internet Explorer's
DLLs for transfer purposes automatically inherit its proxy settings.
Unix-derived operating systems and applications can also reuse configuration
settings, but this varies by application.Command-line tools typically
read the fixed values stored in the environment variables but nothing
else, while Gnome applications often use the Gnome-specific fixed-configuration
settings, and so forth. Even if the applications understand the same
locator mechanism, they might not support features such as user authentication,
meaning that consistency can still be spotty even where there is reuse.
Overall, there is still a lack of consistency among applications on each
platform, not to mention a lack of consistency across them.
One last option worth mentioning before I sign off--due to the kinds
of problems described above, some organizations go so far as to use "transparent" proxies
on their networks. Transparent proxies essentially work by having a router
or forwarder intercept traffic to TCP port 80, and then redirect that
traffic to a proxy server, where it is processed like normal. This seems
like a good idea on the surface, since you don't have to perform any
client configuration whatsoever, but there are also costs associated
with this approach that can make it unacceptable.
For example, if a URL points to an odd-ball port number like 8080,
then the traffic might not get redirected to the transparent proxy for
processing. Furthermore, protocols like FTP may or may not be supported
by the proxy server, so you might lose access to alternative protocols
that otherwise work with visible proxies. Transparent proxies are also
known to have problems with sites that depend on fixed IP addresses for
access control, and user authentication problems can crop up too. All
told, it's best to configure as many agents as possible with WPAD and
network policies, and send the rest of the traffic sources through the
transparent proxy if needed.
Written by Eric
A. Hall.
Copyright © 2004 CMP Media, Inc. Used with permission. |