I needed an effective way to find the anonymizing proxy sites our users visit, using BorderManager logs.
Cygwin provides a Win32 port of GNU grep, the pattern-matching command. I have found that about 95% of our web traffic consists of GET requests, generated when a user types in a URL or clicks a link. When a user fills in a form field and submits it, the browser sends a POST request. A web proxy works the same way: the user enters a URL into a form on the proxy site and then POSTs that value to the server.
One caveat is search engines, because users are constantly POSTing to these sites. The grep -v option excludes lines that match the given pattern, so I use it to filter those sites out:
grep POST logfile.log | grep -v google.com | grep -v yahoo.com | grep -v ask.com > newfile.log
The result is a log file of URLs that are POSTed to. I then look for URLs that appear to be random (e.g., /cgi-bin/1.php?=8392ksudowUJSD98wyh3sd87SJDHEused89usU2Je39slf). The pseudo-random string is the destination URL, encoded so that it passes through the filter.
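Scanning for random-looking URLs by eye gets tedious, so a second grep pass can help narrow the list. The following is only a rough sketch: the function name find_random_urls is hypothetical, and the 30-character threshold is an arbitrary assumption that you will probably need to tune against your own logs.

```shell
# find_random_urls: print lines that contain an unbroken run of 30 or
# more letters and digits, which is typical of an encoded URL.
# Reads the files named as arguments, or standard input if none are given.
# The threshold of 30 is an assumption, not a tested value.
find_random_urls() {
    grep -E '[A-Za-z0-9]{30,}' "$@"
}
```

For example, find_random_urls newfile.log > suspects.log writes the candidates to a separate file (suspects.log is a made-up name). Expect some false positives, such as session IDs on legitimate sites.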
A third pattern that I grep for is the word "proxy" itself. Review this report carefully, because legitimate websites often use "proxy" in their URLs. For example, ESPN uses proxy.espn.com extensively, but it is not an anonymizer.
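The "proxy" search can reuse the grep -v technique to drop hosts that are already known to be legitimate. This is a minimal sketch: the function name find_proxy_urls is hypothetical, and proxy.espn.com is the only exclusion because it is the only one named here. Add your own as you find them.

```shell
# find_proxy_urls: print lines that mention "proxy" (case-insensitive),
# excluding hosts already known to be legitimate.
# Reads the files named as arguments, or standard input if none are given.
find_proxy_urls() {
    grep -i 'proxy' "$@" | grep -v -i 'proxy\.espn\.com'
}
```

For example, find_proxy_urls logfile.log > proxyreport.log (proxyreport.log is a made-up name). The dots are escaped so that they match a literal period rather than any character.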
I have also subscribed to the mailing list at www.peacefire.org. When a new proxy site is added using their software, they email the URL to the mailing list.
When running in transparent proxy mode, I configure BorderManager to listen on ports 80 and 443. This is the most practical setup that does not cause issues with non-HTTP traffic on a monitored port.
A better approach is to configure the proxy settings in Windows and the browser. We allow BorderManager to talk through ports 80, 443, and 1024-65535. This lets you log non-standard port traffic while still allowing the traffic to pass. We do not allow our workstations to access the Internet via NAT; everything is proxied (HTTP, FTP, RTSP) and therefore logged.
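Once non-standard port traffic is being logged, grep can pull those entries out for review. This sketch assumes that each log line contains the full URL with an explicit port (for example http://host:8080/path). I have not confirmed that against every BorderManager log format, so check a few lines of your own log first. The function name find_nonstandard_ports is hypothetical.

```shell
# find_nonstandard_ports: print lines whose URL carries an explicit port
# other than 80 or 443.
# Assumes URLs appear in the log as scheme://host:port/...
# Reads the files named as arguments, or standard input if none are given.
find_nonstandard_ports() {
    grep -E '://[^/ :]+:[0-9]+' "$@" |
        grep -v -E '://[^/ :]+:(80|443)(/|[[:space:]]|$)'
}
```

For example, find_nonstandard_ports logfile.log > oddports.log (oddports.log is a made-up name).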
Disclaimer: As with everything else at NetIQ Cool Solutions, this content is definitely not supported by NetIQ, so Customer Support will not be able to help you if it has any adverse effect on your environment. It just worked for at least one person, and perhaps it will be useful for you too. Be sure to test in a non-production environment.