Diary and notebook of whatever tech problems are irritating me at the moment.

20111125

Haphazard proxy support in Linux programs

Some of my clients require Internet content filtering on computers their kids are using. The solution to that is DansGuardian. While it has many problems there really isn't a better F/OSS alternative. Its development has been stagnant for years but recently a new maintainer joined the project so submitted patches are being applied to fix bugs and add features (like system group integration).

DansGuardian requires a proxy. The common options are TinyProxy and Squid. TinyProxy has a few annoying bugs so I use Squid with my clients. One challenge with content filtering is preventing the proxy from being bypassed. The two solutions are transparent interception, or an explicit proxy combined with dropping connections that aren't destined for the proxy ports.

With a transparent proxy all outgoing connections are routed via iptables rules to DansGuardian regardless of the client settings. While this simplifies deployment by eliminating client configuration it also prevents using different content filtering levels on a per-user basis as it masks the source port of the connection. Without the source port the associated user can't be identified. Since the systems I maintain have a variety of users within the same household and thus different filtering requirements, this doesn't meet their needs.

The alternative method is to use iptables rules that drop connections that aren't destined for DansGuardian. Here are the nat rules that I use:

*nat
:PREROUTING ACCEPT
:POSTROUTING ACCEPT
:OUTPUT ACCEPT
-A OUTPUT ! -o lo -p tcp -m owner ! --uid-owner proxy -m owner ! --uid-owner root -m owner ! --uid-owner clamav -m owner ! --uid-owner administrator -m tcp --dport 80 -j REDIRECT --to-ports 8090
-A OUTPUT ! -o lo -p tcp -m owner ! --uid-owner proxy -m owner ! --uid-owner root -m owner ! --uid-owner clamav -m owner ! --uid-owner administrator -m tcp --dport 443 -j REDIRECT --to-ports 8090
-A OUTPUT ! -o lo -p tcp -m owner ! --uid-owner proxy -m owner ! --uid-owner root -m owner ! --uid-owner clamav -m owner ! --uid-owner administrator -m tcp --dport 21 -j REDIRECT --to-ports 8090
-A OUTPUT -p tcp -m tcp --dport 3128 -m owner ! --uid-owner dansguardian -m owner ! --uid-owner root -m owner ! --uid-owner clamav -m owner ! --uid-owner administrator -j REDIRECT --to-ports 8080
COMMIT

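Fairly simple, but note that I'm not dropping the packets. Any TCP connection destined for port 80 (HTTP), 443 (HTTPS), or 21 (FTP) is rerouted to port 8090. The last rule sends connections aimed directly at Squid on port 3128 to DansGuardian on port 8080 instead, so the filter can't be skipped. Some accounts are excluded to prevent false-positive blocking by DansGuardian.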
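The three redirect-to-8090 rules differ only in the destination port, so they could be generated instead of typed by hand. Here is a minimal sketch, assuming a POSIX shell. The function name billboard_rule and the variable names are hypothetical and only for illustration; the accounts and ports are the ones in the rules above.

```shell
#!/bin/sh
# Accounts exempt from redirection and the Apache port, copied from the rules above.
EXEMPT="proxy root clamav administrator"
BILLBOARD_PORT=8090

# Print one nat OUTPUT rule that redirects the given destination port.
billboard_rule() {
    rule="-A OUTPUT ! -o lo -p tcp"
    for user in $EXEMPT; do
        rule="$rule -m owner ! --uid-owner $user"
    done
    printf '%s -m tcp --dport %s -j REDIRECT --to-ports %s\n' \
        "$rule" "$1" "$BILLBOARD_PORT"
}

# HTTP, HTTPS and FTP, in the same order as the hand-written rules.
for port in 80 443 21; do
    billboard_rule "$port"
done
```

The output is only the three rule lines. They still need the *nat header, the chain policies, and COMMIT around them before iptables-restore will accept them.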

DansGuardian is using port 8080 (and connects to Squid on 3128). So what is 8090? It's an Apache server. One of the problems with programs that aren't configured to use the proxy is that the users won't know why their connections are failing. The web site, known as a network billboard (the "network bulletin" page in the test results below), displays a page that informs them that their programs need to be configured to use the proxy and how to do it. This is much friendlier than just dropping the packets. DansGuardian uses ident2 to identify the user that is the source of the connection and applies the filtering rules specific to the filter group they are assigned to.

This configuration works very well with web browsers. Most use the system proxy settings through gconf on Gnome. Some need manual configuration so I created default configuration files and put them in /etc/skel so that new user accounts have them at creation. Unfortunately, many other programs rely on environment variables to determine the proxy address, and Ubuntu's proxy configuration tool (gnome-network-properties) has a really stupid bug: the variables aren't set correctly. Some are set in bash in terminal windows but not in the session, so any graphical program that doesn't use gconf fails to access the proxy correctly. It's easy to demonstrate. Open a terminal window and enter:

tail -f ~/.xsession-errors

Then create a custom application launcher in the panel and enter "printenv" for the command. Then just click it and check the output from tail. On my system, variables for "HTTP_PROXY" and the like aren't present. I created a fix for this. Just extract the file and add it to the end of ~/.profile and relogin. Run the tail/printenv commands again with a proxy set in System>Preferences>Network Proxy. Add this fix to /etc/skel/.profile to use it as the default for new user accounts.
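For readers who just want the general shape of such a fix, here is a minimal sketch. It is not the script mentioned above. It assumes a POSIX shell, the GNOME 2 gconftool-2 utility, and the usual GNOME 2 proxy keys (/system/proxy/mode, /system/http_proxy/host, /system/http_proxy/port). The helper names proxy_url and export_proxy are hypothetical.

```shell
# Fragment for the end of ~/.profile (illustrative sketch only).

# Build a proxy URL from a host and port; print nothing if either is unset.
proxy_url() {
    if [ -n "$1" ] && [ -n "$2" ] && [ "$2" != "0" ]; then
        printf 'http://%s:%s/\n' "$1" "$2"
    fi
}

# Export a variable in both lower and upper case, since programs
# disagree about which spelling they read. $1 = name, $2 = URL.
export_proxy() {
    [ -n "$2" ] || return 0
    upper=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
    export "$1=$2" "$upper=$2"
}

# Only act when gconftool-2 exists and a manual proxy is configured.
if command -v gconftool-2 >/dev/null 2>&1 &&
   [ "$(gconftool-2 --get /system/proxy/mode 2>/dev/null)" = "manual" ]; then
    host=$(gconftool-2 --get /system/http_proxy/host 2>/dev/null)
    port=$(gconftool-2 --get /system/http_proxy/port 2>/dev/null)
    export_proxy http_proxy "$(proxy_url "$host" "$port")"
fi
```

This sketch only covers the HTTP proxy. The HTTPS and FTP settings would need the same treatment, and it ignores authentication entirely.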

Even with this fix it is surprising how many Internet-using programs don't support proxies correctly. I tested every streaming media player I could find and a few other programs and here are the results with my systems (Ubuntu 10.04 Lucid Lynx i386 and amd64):

Clementine (0.7.1): Neither Last.fm nor SomaFM works. Jamendo lists songs but doesn't play them, though this is due to Ogg problems at Jamendo. Unlike other players, Clementine's plug-in for Jamendo is not configurable for MP3 so I couldn't work around it. Magnatune and Icecast work.

Rhythmbox (0.13.1): Jamendo failed to work. Magnatune was really slow to load.

Miro (4.0.3-82931155): Could find video podcasts but not download them (except VODO which uses BitTorrent). Its integrated web browser would always show the network bulletin for any other link in the side panel.

Banshee (2.0.1): Internet Archive links work. Live365.com and xiph.org show results but nothing plays (I can copy the xiph links to VLC and they play). Miro Guide works (unlike Miro) but likes to freeze. Amazon MP3 Store, Jamendo, Magnatune (both extensions), RealRadios.com, and SHOUTcast.com extensions fail to load. Last.fm would log in but not much else. I noticed that according to ~/.xsession-errors Banshee is an exceptional media player.

Gnome MPlayer (0.9.9.2): Nothing fancy but it functioned with the streams I tried.

VLC (1.0.6): About the same as Gnome MPlayer. A lot of complaints about some playlists like radio.wazee when it encounters unavailable entries. Needs a less ugly way to handle error messages with playlists of Internet streams since they are usually just alternate servers.

Google Earth (6.1): It would connect to the DB and you could navigate the worlds but none of the Panoramio pictures would show. Wikipedia entries wouldn't show after being enabled until the app was restarted. Even then, clicking on "Full Article" resulted in the network bulletin page being shown (webkit?). Changing the preferences to use an external browser is an adequate workaround.

Totem (2.30.2): Functioned but was picky about some streams (radio.wazee).

gPodder (2.2): Useless.

Hulu (beta): Functions but relies mostly on Flash.

Skype beta (2.2.0.35): Connected to their network without problems and I successfully called their sound testing service.

Sun Java Plug-in (1.6.0_26 in Firefox 3.6.24): Useless with a proxy. Even without a proxy you have to work around IPv6 bugs (Debian bug #618725). With that working the online test usually fails and I've found that Pogo.com Boggle Bash is a better test. Manually setting the proxy with jcontrol doesn't have any effect. Debian is dropping the plug-in so it may not matter.

FrostWire (5.1.5): Useless with a proxy. It uses Java so not surprising. It has its own proxy settings but it couldn't connect to anything even with manual settings.

Update - Added a few more tests:

Desura (110.22): Could log in and see items I had ordered (free demos) but could not download them for installation or show any web pages. Some of the links on the menu bar opened in Firefox but showed the network bulletin. Apparently it was resolving the links (maybe querying their servers) to localhost:8090 and then sending that to the default browser even though Firefox could access the Internet through the proxy without problems.

Konqueror (4.4.5): No problems (KHTML).

Epiphany (2.30.2): No problems (webkit).

X-Moto (0.5.9): No problems. Can use environment variables, manually-specified proxy, or SOCKS proxy.

DraftSight (Beta V1R1.3): Couldn't connect to the registration server initially. The browser in the Home panel showed the network bulletin. Setting the proxy manually in "Tools>Options>System Options>General>Proxy server settings" and restarting allowed the registration to function but not the Home panel browser. I found that reapplying the proxy settings (without changing anything) then right-clicking the Home panel and reloading it fixed the problem for that session but it would reoccur if DraftSight was restarted.

Clarification: My proxy configuration doesn't use authentication or SOCKS. My bug work-around script supports the environment variables for authentication but I didn't test it.

Update 20111202: I removed Sun Java because of the security problems and switched to OpenJDK/IcedTea6 (1.9.10) but it didn't do any better. I did try FrostWire again with a manually specified proxy but it had no effect. I did come across an interesting Java library for proxy detection named proxy-vole but it won't solve my immediate problem.

Update 20111204: Corrected the DansGuardian/Squid port usages mentioned in the article and added a forgotten DansGuardian anti-bypass iptables rule. They now match my test environment.

I think part of the problem is that the developers test against a proxy and if the program works then it's assumed to be proxy-compatible. That can be misleading, especially when multiple components are involved, as some may use the proxy while others access the network directly (Miro being a prime example). Adding some iptables rules to drop anything bypassing the proxy would close that testing hole.
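A developer's test box could use something like the following. It is a rough sketch, assuming a POSIX shell and an explicit proxy on another host. The address 192.0.2.10, the port, and the function name proxy_only_rules are placeholders, not values from my setup.

```shell
#!/bin/sh
# Hypothetical test-box settings; replace with the real proxy address and port.
PROXY_HOST=192.0.2.10
PROXY_PORT=3128

# Print a filter table that lets TCP leave the machine only toward the proxy.
# $1 = proxy address, $2 = proxy port.
proxy_only_rules() {
    cat <<EOF
*filter
:OUTPUT ACCEPT
-A OUTPUT -o lo -j ACCEPT
-A OUTPUT -p tcp -d $1 --dport $2 -j ACCEPT
-A OUTPUT -p tcp -j REJECT --reject-with tcp-reset
COMMIT
EOF
}

proxy_only_rules "$PROXY_HOST" "$PROXY_PORT"
```

The output can be piped to iptables-restore as root. By default that replaces the existing filter table, so it belongs on a disposable test machine. Any component that ignores the proxy then gets an immediate connection reset instead of silently working. Non-TCP traffic such as direct DNS lookups is not covered by this sketch.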

20111123

Documentation standards for commands

Here are some references for shell script developers, man page creators, README writers, etc. While documentation styles are a bit haphazard and vary with OS and programming language, there are some standards.

For man pages see man-pages(7). What does that mean? You open a terminal window then type:

man 7 man-pages

The GNU project has some guidelines on writing software manuals. They recommend using Texinfo to create them.

The Debian Policy Manual says where the different documentation files should be located but not what they should look like.

The most detailed standard I've found is the Open Group Base Specifications utility conventions and typographical conventions.
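As a small illustration of that utility synopsis notation, here is a sketch of a made-up command. The name greet and its options are invented for the example. It assumes a POSIX shell and uses getopts, which parses options according to those conventions.

```shell
#!/bin/sh
# Hypothetical utility "greet"; the name and options are made up for illustration.

usage() {
    # Synopsis notation: [] marks optional items, ... marks repeatable operands.
    printf 'usage: greet [-s] [-g greeting] name...\n'
}

greet() {
    greeting=Hello
    shout=0
    OPTIND=1    # reset so the function can be called more than once
    while getopts sg: opt; do
        case $opt in
            s) shout=1 ;;
            g) greeting=$OPTARG ;;
            *) usage >&2; return 2 ;;
        esac
    done
    shift $((OPTIND - 1))
    # At least one operand is required, as the synopsis says.
    [ $# -gt 0 ] || { usage >&2; return 2; }
    for name in "$@"; do
        line="$greeting, $name"
        if [ "$shout" -eq 1 ]; then
            line=$(printf '%s' "$line" | tr '[:lower:]' '[:upper:]')
        fi
        printf '%s\n' "$line"
    done
}
```

Calling greet -g Hi Ann Bob prints one greeting per operand. Calling it with no operands prints the synopsis to standard error and returns a non-zero status.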

I'm not going to admit to following these but please post any other IT technical writing style guides you know of. :D

About Me

Omnifarious Implementer = I do just about everything. With my usual occupations this means anything an electrical engineer does not feel like doing including PCB design, electronic troubleshooting and repair, part sourcing, inventory control, enclosure machining, label design, PC support, network administration, plant maintenance, janitorial, etc. Non-occupational includes residential plumbing, heating, electrical, farming, automotive and small engine repair. There is plenty more but you get the idea.