Diary and notebook of whatever tech problems are irritating me at the moment.

20101124

Two more utilities for DansGuardian Users

I reduced the DansGuardian user account and blacklist maintenance hassles with my previous two utilities but while working on whitelisting I found the need for a few more.

In DansGuardian (DG) terms a blacklist bans something, a greylist allows something (overrides blacklisting) but still filters it, and an exceptionlist allows something without filtering (overriding greylists and blacklists). The "something" can be URLs, IP addresses, server names, etc., depending upon the specific list type. Blacklisting a site is easy but blacklisting a specific type of content is very difficult and error-prone. It works the same way as anti-malware utility definitions: if a requested target matches an undesirable item on the list then it's blocked. If not, then it gets through. It's a big Internet out there and trying to block all the bad is rather difficult. "Bad" is also relative and what is bad for one person/religious group/company/government may not be bad for another. Whitelisting has the opposite problem: you gain strict control over what is available, but predicting where the user wants to go, determining if that is a safe destination, and maintaining the lists are also difficult.

I found that I needed to use both blacklisting and whitelisting. I use whitelisting with younger children and blacklisting for older. Older children won't put up with strict constraints and will either figure out how to bypass them or simply go somewhere else to browse the Internet. Younger children are easier to keep happy but you still have to spend time figuring out all the web sites they will want access to, preferably with the initial configuration so they're not whining every five minutes about another toy/game/whatever site they can't access.

With DG a "whitelist" configuration is basically a blacklisting of all sites, done with a "**" entry in the bannedsitelist file, plus entries in the greylist and exceptionlist files to bypass it. The exceptionlist file entries will enable site access but this is not what you want for allowing a user to browse a particular site because it disables all filtering. Use greylist files instead. This way if there is an offensive part of a site that you didn't know about (or it gets defaced by black hats) then you still have the filters to rely on. The exception lists are useful for sites that are not normally browsed but may trigger the filters inadvertently, such as Linux distro repositories using HTTP.

One of the problems with whitelisting is that the user won't necessarily know where they can go on the Internet. To solve this problem you need an index page of some sort. This is the problem I encountered when creating my greylists and I came up with a solution.

I didn't want to maintain an index separately from the greylists so I figured out a way to embed the data in the lists. DG recognizes a # in the list as a comment. I added a comment at the end of each list entry with a Wiki-style link after it. This isn't all that unusual as Debian/Ubuntu did something similar with the menu.lst file in Grub. The comment hides data that isn't relevant to DG but the defined format allows extraction of the data to create an index. Soon after I started adding the links to the list entries I figured out two things - it was a lot of typing and was going to be a very big index. To organize the index better I added a category tag on the end which could be used in the index. The final format is:

<exception link> #[<URL><space><label>][Category:<category text>]

The brackets are required characters. The parsing is somewhat whitespace tolerant but in the Category tag don't leave any spaces between the colon and the category text (sed regular expressions can be tedious). Example:

gutenberg.org # [http://www.gutenberg.org Project Gutenberg][Category:Books]
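As a rough illustration of how such an entry can be pulled apart with sed, here is a minimal sketch. The function names are made up for this example and it is not the code from my scripts; it only assumes the entry format shown above.

```shell
#!/bin/sh
# Hypothetical helpers (names invented for this sketch) that split one
# annotated list entry of the form
#   <entry> #[<URL> <label>][Category:<category>]

# URL: everything inside the first bracket pair up to the first space.
entry_url() {
    printf '%s\n' "$1" |
        sed -n 's/^[^#]*#[[:space:]]*\[\([^] ]*\) [^]]*\].*$/\1/p'
}

# Label: the rest of the first bracket pair after that space.
entry_label() {
    printf '%s\n' "$1" |
        sed -n 's/^[^#]*#[[:space:]]*\[[^] ]* \([^]]*\)\].*$/\1/p'
}

# Category: the text after "Category:" (no space after the colon).
entry_category() {
    printf '%s\n' "$1" |
        sed -n 's/^.*\[Category:\([^]]*\)\].*$/\1/p'
}

line='gutenberg.org # [http://www.gutenberg.org Project Gutenberg][Category:Books]'
entry_url "$line"
entry_label "$line"
entry_category "$line"
```

An entry without the bracketed comment simply produces no output from any of the three functions, which is how entries that need an exception but no index link drop out.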

To save some typing I wrote add-exceptionlist-url-comments which creates a default URL comment. First it pads the end with tabs (up to 5) to keep it pretty. The default link is made by slapping an http protocol prefix on the exception entry. It then uses wget to try to fetch the default web page and scrape the page title to use as a default link label. This works for most pages and redirects but not those that are using a meta refresh. It finally adds an undefined category tag at the end. Anything in the list that starts with a # is ignored. Note that not every entry will need a link. Some sites you don't want may serve data to a site you want. A lot of USA government sites that are kid-specific will link to media on the main government sites which aren't of interest to kids and just clutter the index. Some web stores also use third-party search services which will need exceptions but not links. In many cases you'll want a link that points to a specific part of the site, not just the server root, so you'll have to edit the defaults.
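The default-comment step could look something like the sketch below. This is not the actual add-exceptionlist-url-comments code: the function names, the 8-column tab-stop padding rule, and the "Undefined" category text are all assumptions made for the example. The wget call is left in a comment so the sketch runs without network access.

```shell
#!/bin/sh
# Sketch only; names, padding rule and "Undefined" tag are assumptions.

# Print the text of a <title> element found in the HTML on stdin.
page_title() {
    tr '\n' ' ' |
        sed -n 's/.*<[Tt][Ii][Tt][Ll][Ee][^>]*>[[:space:]]*\([^<]*\)<.*/\1/p' |
        sed 's/[[:space:]]*$//'
}

# Build "<entry><tabs># [http://<entry> <label>][Category:Undefined]".
default_comment() {
    entry=$1
    label=${2:-$1}                  # no title found: reuse the entry itself
    tabs=$((5 - ${#entry} / 8))     # assumes 8-column tab stops, 5 at most
    if [ "$tabs" -lt 1 ]; then tabs=1; fi
    pad=''
    while [ "$tabs" -gt 0 ]; do
        pad="$pad$(printf '\t')"
        tabs=$((tabs - 1))
    done
    printf '%s%s# [http://%s %s][Category:Undefined]\n' \
        "$entry" "$pad" "$entry" "$label"
}

# With network access the title would come from something like:
#   title=$(wget -q -O - "http://gutenberg.org" | page_title)
title=$(printf '<html><head><title>Project Gutenberg</title></head></html>\n' | page_title)
default_comment gutenberg.org "$title"
```

Like the real script, this only gets you a starting point. The link and category still need hand editing for most entries.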

To create the index page I wrote exceptions-index-page-generator. It looks for the bracket-formatted URLs in the input files. It also builds a list of category tags, assigning a default tag (defined in the script) to any that are missing. It then creates a basic html file with entries separated by category. If a category has more than a certain number of entries (default 5 as defined in the script) it makes two columns to reduce the page length. It doesn't try to normalize category names so they must match in the entries exactly in order to be combined. It ain't pretty but it works. These are both command-line utilities but are rather easy to use.
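The grouping step can be sketched with awk and sort as shown below. This is a simplified stand-in, not exceptions-index-page-generator itself: the function name and the "Uncategorized" default tag are assumptions, and it leaves out the two-column layout and the surrounding page markup.

```shell
#!/bin/sh
# Sketch of the category grouping only; not the author's script.
# Reads annotated list entries on stdin, prints bare HTML lists by category.
build_index() {
    default_category=${1:-Uncategorized}    # assumed default tag name
    awk -v def="$default_category" '
        /^[ \t]*#/ { next }                 # skip whole-line comments
        {
            # Find "#[<URL> <label>]"; entries without it get no link.
            if (!match($0, /#[ \t]*\[[^] ]+ [^]]*\]/)) next
            link = substr($0, RSTART, RLENGTH)
            rest = substr($0, RSTART + RLENGTH)
            sub(/^#[ \t]*\[/, "", link)
            sub(/\]$/, "", link)
            sp = index(link, " ")
            url = substr(link, 1, sp - 1)
            label = substr(link, sp + 1)
            cat = ""
            if (match(rest, /\[Category:[^]]*\]/))
                cat = substr(rest, RSTART + 10, RLENGTH - 11)
            if (cat == "") cat = def
            print cat "\t" url "\t" label
        }' |
    sort -s -t "$(printf '\t')" -k1,1 |     # stable sort on category only
    awk -F '\t' '
        $1 != prev {
            if (NR > 1) print "</ul>"
            print "<h2>" $1 "</h2>"
            print "<ul>"
            prev = $1
        }
        { print "<li><a href=\"" $2 "\">" $3 "</a></li>" }
        END { if (NR > 0) print "</ul>" }'
}

printf '%s\n' \
    'gutenberg.org # [http://www.gutenberg.org Project Gutenberg][Category:Books]' \
    'pbskids.org #[http://pbskids.org PBS Kids]' |
    build_index
```

Because the category text is compared as-is, "Books" and "books" end up as two separate sections, the same limitation as in the real script.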

UPDATE: I updated exceptions-index-page-generator. Version 1.1 adds a category table of contents to the top of the page. It will also make two columns of these if the number of categories exceeds the column threshold.

You can use my greylists to test with and as a base for your own lists for younger children. I haven't performed in-depth checking of these but they look relatively safe. Some of the entries may seem odd but they're intended to aid holiday gift buying. You will also notice that I used html entity codes in the labels for some punctuation as they didn't display correctly in Firefox.

20101115

A pair of utilities for DansGuardian users

Content filtering is a requirement of the home desktop system configuration build I'm working on. Young children are part of the client base so it's mandatory. DansGuardian is basically the only free option available. It's a server daemon so it has command-line configuration only. Once it's running parents don't need to mess with the basic settings but they need to be able to set filtering controls for children without a lot of hassle. On Ubuntu it doesn't come with any blacklists but third-party lists are available. Shalla Secure Services has one of the most comprehensive lists that's free for home use but installing and updating it is also a hassle. I wrote a pair of scripts to solve both of these problems.

There are a few options for a DansGuardian GUI. Some firewalls like SmoothWall have plug-ins for it. Two popular stand-alone ones are DansGuardian-GUI from Ubuntu CE and WebStrict from Saliby. Unfortunately they both rely on Tinyproxy which has a bug with DansGuardian that prevents many pages from loading. They also drag in FireHOL which I don't need.

Since remote administration is a requirement for my desktop configuration I installed Webmin. A plug-in is available, DansGuardian Webmin Module, which allows easier control than straight command-line methods including a semi-automatic configuration for multiple filter groups. There's one bug with the latter that I had to fix first and the default DansGuardian binary location in the module's configuration was incorrect for Ubuntu (it's at /usr/sbin/dansguardian) but that's all.

When working with multiple filter groups the goal is to have DansGuardian automatically apply the correct filter based on the user account. Correlating user port activity to filter groups is tricky. Since my targeted desktop systems are stand-alone and won't have multiple simultaneous users I chose the Ident method using Ident2. I tried Bisqwit's identd (bidentd) but the version on Ubuntu 10.04 (Lucid Lynx) has a nasty looping bug that is triggered by local queries. Getting this to work only requires activating the ident authplugin and creating the filter groups.

While the module makes configuration easier for the admin, it's still not that friendly for a parent. The filter groups make it easy to set user restrictions based on group membership but DansGuardian filter groups are completely separate from system groups. They can only be changed from the command line or with the Webmin module. I wanted parents to be able to use the standard desktop user administration tool, users-admin (System > Administration > Users and Groups) to assign users to special DansGuardian groups that could then be converted to filter group memberships. There once was a patch for DansGuardian that integrated the two but it's not included upstream. So I came up with a system group naming scheme and wrote dg-filter-group-updater, a GUI tool that automatically creates the filter group list (/etc/dansguardian/lists/filtergroupslist by default) from the system group membership. Installing it is easy. Just copy the script to "/usr/local/sbin" with root ownership and 755 (rwxr-xr-x) permissions. Download this desktop file and put it in "/usr/local/share/applications" with root ownership and 644 (rw-r--r--) permissions which will cause a menu entry to appear in the System > Administration menu. This is for Gnome as it uses gksudo to get root access but you can convert it for KDE by changing the "gksudo" to "kdesudo" or "kdesu" then changing the "Categories" entry for KDE (look at other KDE desktop menu files in /usr/share/applications). For this script to be useful you have to set up the required system groups first and assign users.

DansGuardian references group filters by an index number. The first group is "filter1" which corresponds to the configuration file "dansguardianf1.conf" and is the default. Typically in a multi-group configuration this filter is set to disable Internet access with a "groupmode = 0" setting. By "Internet" I mean "http" as DansGuardian can't really help with "https" (TLS/SSL) or much else. The rest you have to block with firewall rules or a filtered DNS like OpenDNS. The module's multiple group tool is the one named "Set Up Lists&Configs For Multiple Filter Groups" on its main page. Before using it, back up the "/etc/dansguardian" directory as this option only works once and then locks itself out. Restoring the directory is the only way to revert. When you use this tool you will have a few options to choose from. The scheme is up to you (I used separate). I recommend selecting "Use of Default Group" and "To Set Aside Unrestricted Group". I used four groups:

#1 "No_Web_Access" default (filter1, groupmode = 0)
#2 "restricted" (filter2, whitelisted with groupmode = 1 in its conf file and ** in its bannedsitelist file)
#3 "filtered" (filter3, filtered with groupmode = 1 and naughtynesslimit = 100)
#4 "unlimited" (filter4, groupmode = 2)

The idea here is that unassigned accounts are automatically blocked by filter1, young children are sandboxed with filter2, older children are filtered with filter3, and adults unrestricted through filter4. Since the restrictions are more about maturity than age the groups don't have names that refer to the latter.
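After the module has generated the per-group files, a quick way to confirm that each one has the groupmode you expect is to list them. The helper below is a hypothetical convenience written for this post, not part of DansGuardian or the Webmin module. It assumes the files follow the "dansguardianfN.conf" naming described above and reports the first uncommented groupmode line in each.

```shell
#!/bin/sh
# Hypothetical check: print "<file>: groupmode=<n>" for every
# dansguardianfN.conf in a directory (default /etc/dansguardian).
show_groupmodes() {
    conf_dir=${1:-/etc/dansguardian}
    for conf in "$conf_dir"/dansguardianf*.conf; do
        [ -f "$conf" ] || continue      # the glob matched nothing
        # First uncommented "groupmode = <digits>" line in the file.
        mode=$(sed -n 's/^[[:space:]]*groupmode[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*$/\1/p' "$conf" | head -n 1)
        printf '%s: groupmode=%s\n' "${conf##*/}" "${mode:-unset}"
    done
}

show_groupmodes /etc/dansguardian
```

With the four groups above you would expect to see 0, 1, 1 and 2 for f1 through f4.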

The dg-filter-group-updater script requires system group names to have a specific format of "dansguardian-f#..." where # is the corresponding filter number. Anything after the digits is ignored so you can create more descriptive group names that a non-technical user can recognize in the users-admin tool when assigning members. These groups should be created as system groups (GID < 1000). I created my groups with addgroup:

addgroup --system dansguardian-f2-restricted
addgroup --system dansguardian-f3-filtered
addgroup --system dansguardian-f4-unlimited

Obviously you need to have a "sudo" before these or get a root terminal with "sudo su" first. Since filter1 is the default you won't be assigning users to it and don't need a matching system group. Next you just need to assign users to each group. If you assign the same user to more than one, DansGuardian will use the lower numbered filter in the resulting filter group list. Afterwards just launch the script via the menu item "DansGuardian filter group updater" and enter your admin password. First it will read through the dansguardian.conf file. The file location is set by the "dg_conf" variable in the script and is the only hard-coded value you need to worry about. From the conf file it locates the filter group list file and the number of filter groups. It then starts a new filter group list file (overwriting any existing one). Next it reads through /etc/group and looks for the "dansguardian-f#..." groups, extracts the users for each, and adds them to the filter group list file in "username = filter#" format. It then restarts DansGuardian. So all a parent needs to do is assign users to groups with users-admin and then launch the script from the menu item to apply the changes.
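The core of the group-to-filter conversion can be sketched in a few lines of awk. This is an illustration rather than the script itself (which uses grep, sed, and cut and also handles the dialogs and the restart). The function name is invented, and writing the lines as "username=filter#" without spaces is an assumption about what the list file should contain.

```shell
#!/bin/sh
# Sketch: print "username=filterN" for the members of every
# "dansguardian-fN..." group in a group(5)-format file.
filter_group_lines() {
    group_file=${1:-/etc/group}
    awk -F: '
        $1 ~ /^dansguardian-f[0-9]+/ && $4 != "" {
            n = $1
            sub(/^dansguardian-f/, "", n)   # drop the fixed prefix
            sub(/[^0-9].*$/, "", n)         # ignore anything after the digits
            count = split($4, users, ",")
            for (i = 1; i <= count; i++)
                print users[i] "=filter" n
        }' "$group_file"
}

# Prints nothing on a system without any dansguardian-f# groups.
filter_group_lines /etc/group
```

Redirecting the output to the filter group list file and restarting DansGuardian would complete the job, both of which need root.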

The script is based on the same code I used for webcam-server-dialog so it will work with any dialogging program installed. Other than that it only uses basic text manipulation tools including grep, sed, and cut. If it doesn't start then launch it from a terminal window or do a "tail ~/.xsession-errors" to see any messages it put out (including those from DansGuardian when it restarts). Most error messages are displayed in a dialog box.

While dg-filter-group-updater solves the basic user administration problem, the lists for filtering (filter3 in my example) still need to be configured. The Ubuntu package only includes basic advertisement-blocking blacklists. Adding third-party blacklists is complicated as you have to merge them in with "Include" statements in the main lists. The lists are organized by categories so you can pick and choose what to filter. Annoying but you only have to do it once if you're using simple filter groups like mine. The problem with blacklists is that they have to be updated often. Shalla Secure Services has some update scripts but they didn't impress me much or do what I wanted. My policy with third-party anything (clipart, CAD libraries, templates) is to keep them separate as references and use other files for customization. To that end I wrote shalla-bl-update. It downloads the list and creates an MD5 file to track the installed version. When it is executed again it checks the MD5 published on the web site against the installed version and downloads the list again if it differs. It has some fault tolerance included as it will retry if the file fails to download or the downloaded file fails an MD5 check. The lists are located in "/etc/dansguardian/lists/shalla" by default. Just download the script from the link and put it in "/usr/local/sbin" with root ownership and 755 (rwxr-xr-x) permissions. It's designed to be started by cron. To have it run daily do "ln -s /usr/local/sbin/shalla-bl-update /etc/cron.daily/shalla-bl-update". It produces no output as cron will email root whenever anything it runs does. It has a debug mode you can enable by editing the script if you want it to fill your mailbox. It will restart DansGuardian after a successful list update.
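The version check boils down to two small tests, sketched below with md5sum. The function names and file arguments are hypothetical, and the download and retry loop are left out. It assumes the MD5 files hold the hash as the first space-separated field, as md5sum writes them.

```shell
#!/bin/sh
# Sketch of the two MD5 checks; names and file layout are assumptions.

# Succeeds (exit 0) when the list should be downloaded: either nothing
# is installed yet or the published hash differs from the installed one.
needs_update() {
    published=$1    # file holding the MD5 published on the web site
    installed=$2    # file holding the MD5 saved at the last install
    [ -f "$installed" ] || return 0
    [ "$(cut -d' ' -f1 "$published")" != "$(cut -d' ' -f1 "$installed")" ]
}

# Succeeds when the downloaded file's checksum matches the expected hash.
verify_download() {
    [ "$(md5sum "$1" | cut -d' ' -f1)" = "$2" ]
}

# Typical flow (paths are placeholders):
#   if needs_update new.md5 installed.md5; then
#       ...download the list, then...
#       verify_download list.tar.gz "$(cut -d' ' -f1 new.md5)" && cp new.md5 installed.md5
#   fi
```

Saving the published MD5 only after the download verifies means a failed or corrupted download gets retried on the next cron run.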

Update: I updated shalla-bl-update to v1.2 which adds an optional check for empty system groups. The idea here is that if there are specified system groups used by dg-filter-group-updater, and these groups use the Shalla lists, then these groups should have members. If they don't then there is no point trying to update the Shalla lists. You need to edit the script and set the system_groups variable to the names of the system groups used by dg-filter-group-updater. The grep expression it is using will find partial matches. You can specify "--force-update" to override the check with empty groups.
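The empty-group check amounts to asking whether any matching line in /etc/group has a non-empty member field. A minimal sketch follows. The function name is made up, and it uses awk's index() for the partial match rather than the grep expression in the script.

```shell
#!/bin/sh
# Sketch: succeed (exit 0) if any group whose name contains one of the
# given strings has at least one member. Usage:
#   any_group_has_members <group file> <name> [<name>...]
any_group_has_members() {
    group_file=$1
    shift
    for name in "$@"; do
        if awk -F: -v pat="$name" '
               index($1, pat) && $4 != "" { found = 1 }
               END { exit !found }' "$group_file"
        then
            return 0
        fi
    done
    return 1
}

if any_group_has_members /etc/group dansguardian-f3; then
    echo "members found, update is worthwhile"
else
    echo "no members, skipping update"
fi
```

Since the match is partial, "dansguardian-f3" also matches "dansguardian-f3-filtered", so only the fixed part of the group name is needed.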

Update: I've released v1.3 of shalla-bl-update and the link has been updated. Changes: --force-update now sets debug=true and clears the existing md5. Retries can now be aborted interactively in debug mode. Because this requires the timeout capability of the "read" command, the script now uses bash. Added "test" parameter for use with Ubuntu Recovery Mode. DansGuardian is not restarted if RUNLEVEL=S (single mode, essentially Recovery Mode). Added --help parameter.

Note: There is a patch by Philip Allison that integrates DG with system groups but the lead developer, Andreas Büsching, has been too busy to integrate it or keep up with maintenance.
