Diary and notebook of whatever tech problems are irritating me at the moment.


A slightly less open Ubuntu recovery mode

Ubuntu recovery mode is a basic boot configuration for repairing a broken system. It skips most configuration files and daemons in order to reach a functioning root prompt. For the security-conscious administrator this is itself a problem.

There have been complaints about unchallenged root access in recovery mode. Ubuntu uses sudo for root access, and the root account is disabled with a "*" password. If you forget the passwords of the admins (any user account in the admin group), recovery mode makes it possible to reset them easily.

Originally, recovery mode went straight to a root prompt, which wasn't useful to non-technical types. With the addition of Friendly Recovery, a menu is displayed with a list of repair options. The menu is just a Whiptail selection dialog driven by the "/usr/share/recovery-mode/recovery-menu" script, which runs other scripts in the "./options" subdirectory. The sub-scripts provide simple repair options like failsafeX, apt-get clean, and update-grub. These are useful to non-technical types for attempting simple repairs. They won't fix complicated problems like gdm crash loops but may save the administrator an on-site visit or two. The root and netroot scripts provide root shell access, which is where security becomes a concern - not just because of black hats, but also because of fools blindly running commands like ":(){ :|:& };:" and "rm -rf /".

There are several options for limiting root access.

1. Set a grub password that prevents running recovery mode or editing menu entries. This means the administrator has to make any repairs. If the network is failing then that means on-site.

2. Set a root password with "sudo passwd". The password will then be required to access the shell from the Friendly Recovery screen but this also allows direct root logins during normal operation (although you might not care about that).

3. Disable the shell options in Friendly Recovery. These commands remove the options from the menu and prevent them from reappearing if the friendly-recovery package is updated. This allows users to run the automated commands but makes it more difficult for the administrator to get root access in recovery mode. You'll need to prefix these with sudo or start a root shell with "sudo su" first.

mkdir /usr/share/recovery-mode/disabled
dpkg-divert --divert /usr/share/recovery-mode/disabled/root \
--rename /usr/share/recovery-mode/options/root
dpkg-divert --divert /usr/share/recovery-mode/disabled/netroot \
--rename /usr/share/recovery-mode/options/netroot
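
To restore the shell options later, remove the diversions (again with sudo or from a root shell); this is standard dpkg-divert usage:

```
dpkg-divert --rename --remove /usr/share/recovery-mode/options/root
dpkg-divert --rename --remove /usr/share/recovery-mode/options/netroot
rmdir /usr/share/recovery-mode/disabled
```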

4. Set a root password only in recovery mode. To do this I wrote rootlock.conf, a job configuration for Upstart that goes in the "/etc/init" directory (with root:root ownership and -rw-r--r-- permissions). It is triggered by runlevel changes. When the runlevel is "S" (single-user mode, which indicates recovery mode), its script copies the password hash of the first admin group member to the root account in /etc/shadow. In runlevels 2-5 it changes the root password back to "*". This allows root logins from the Friendly Recovery menu when the first admin's password is entered, while direct root login stays disabled during normal operation. It makes a lost admin password more difficult to fix, but for a capable administrator that is only a minor annoyance.
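
The job described above can be sketched roughly like this (an illustrative reconstruction, not the released rootlock.conf; the admin lookup and shadow handling are simplified):

```
# /etc/init/rootlock.conf (sketch)
description "Set a root password only in recovery mode"

start on runlevel [S2345]

script
    if [ "$RUNLEVEL" = "S" ]; then
        # recovery mode: copy the first admin group member's password hash to root
        admin=$(getent group admin | cut -d: -f4 | cut -d, -f1)
        hash=$(getent shadow "$admin" | cut -d: -f2)
        usermod -p "$hash" root
    else
        # normal operation: disable direct root logins again
        usermod -p '*' root
    fi
end script
```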

Don't use it if you have previously set a root password because you want normal root logins available; this job will disable them.

I've tested this on Ubuntu 10.04 (Lucid Lynx) extensively and it seems robust but I'm awaiting feedback on the ubuntu-devel mailing list. Check back for updates.

Disabling unchallenged root logins in recovery mode will not keep a knowledgeable attacker out. Only full-disk encryption like LUKS/dm-crypt, with a key held only by the administrator, can do that. It prevents the user from booting a LiveCD and editing shadow directly, but it requires the administrator to start the system every time it is powered on or rebooted.


quote-count: A debugging tool for shell scripts

I've been doing a lot of shell scripting lately with Dash and Bash. Complicated scripts with lots of text handling make debugging difficult, especially when they are being used in sub-shells which obfuscate line numbers in error messages. One of my more common mistakes is an unmatched quote. These can be rather difficult to find so I wrote quote-count, a simple analysis tool that counts quotes in lines.

It just accepts a single filename as a parameter, counts single, double, and back quotes on each line, and prints their totals. It prints a warning if any of the counts is odd, which may indicate a mismatched quote. It also flags comment lines so you can easily ignore those. It isn't brilliant: it doesn't handle escaped newlines, in-line comments, escaped quotes, or quotes encapsulated within other quotes. It could be enhanced to handle these cases, but it has already saved me a lot of debugging time as is. The output from running it on itself looks like this:

1 Q:0 DQ:0 BQ:0 # COMMENT
2 Q:0 DQ:0 BQ:0 # COMMENT
3 Q:0 DQ:0 BQ:0 # COMMENT
4 Q:0 DQ:0 BQ:0 # COMMENT
5 Q:0 DQ:0 BQ:0 # COMMENT
6 Q:0 DQ:0 BQ:0 # COMMENT
7 Q:0 DQ:0 BQ:0 # COMMENT
8 Q:0 DQ:0 BQ:0
9 Q:0 DQ:8 BQ:0
10 Q:0 DQ:2 BQ:0
11 Q:0 DQ:2 BQ:0
12 Q:0 DQ:2 BQ:0
13 Q:0 DQ:2 BQ:0
14 Q:0 DQ:2 BQ:0
15 Q:0 DQ:0 BQ:0
16 Q:0 DQ:0 BQ:0
17 Q:0 DQ:0 BQ:0
18 Q:0 DQ:0 BQ:0
19 Q:0 DQ:0 BQ:0
20 Q:0 DQ:0 BQ:0
21 Q:0 DQ:2 BQ:0
22 Q:4 DQ:5 BQ:1 # ODD
23 Q:0 DQ:2 BQ:0
24 Q:5 DQ:4 BQ:1 # ODD
25 Q:0 DQ:2 BQ:0
26 Q:5 DQ:3 BQ:2 # ODD
27 Q:0 DQ:2 BQ:0
28 Q:0 DQ:4 BQ:0
29 Q:0 DQ:2 BQ:0
30 Q:0 DQ:2 BQ:0
31 Q:0 DQ:0 BQ:0
32 Q:0 DQ:0 BQ:0
33 Q:0 DQ:0 BQ:0
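
The core counting logic is simple. Here is a minimal sketch of the per-line counting that mimics the report format above (an illustration, not the released quote-count):

```shell
#!/bin/sh
# Sketch of quote-count's per-line counting (illustrative, not the released script).
count_quotes() {
    n=0
    while IFS= read -r line; do
        n=$((n + 1))
        q=$(printf '%s' "$line" | tr -cd "'" | wc -c)    # single quotes
        dq=$(printf '%s' "$line" | tr -cd '"' | wc -c)   # double quotes
        bq=$(printf '%s' "$line" | tr -cd '`' | wc -c)   # back quotes
        flag=''
        case "$line" in '#'*) flag=' # COMMENT' ;; esac
        if [ $((q % 2)) -ne 0 ] || [ $((dq % 2)) -ne 0 ] || [ $((bq % 2)) -ne 0 ]; then
            flag=' # ODD'                                # possible mismatched quote
        fi
        printf '%d Q:%d DQ:%d BQ:%d%s\n' "$n" "$q" "$dq" "$bq" "$flag"
    done
}

# demo: a comment line and a line with an unmatched single quote
printf '%s\n' '# a comment' "echo 'oops" | count_quotes
```

With a file, `count_quotes < script.sh` produces a report like the one above. Like the original, this sketch doesn't handle escaped or nested quotes.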

I've tested it with both Dash and Bash on Ubuntu 9.10 and Mandriva 2010.1 so it should work with most systems.

Another typo I occasionally encounter is escaped whitespace at the end of a line. The intent always is to escape a newline but sometimes in my editing I end up with a space or tab after the backslash. These can easily be found with grep:

grep -E -r -n '\\[[:space:]]+$' <filename>

I wanted to add this check to quote-count v1.0 but found that the "while read" loop removes everything after the trailing backslash. Richard Bos sent me a modified version that included the check as a pre-processor using a simple grep trick. I added it in, although his version used an array, which Dash doesn't support.

UPDATE: v1.2 released and link updated. I found some bugs in v1.1 with the TEW (trailing escaped whitespace) check. I also cleaned up the report output a bit.

Reading through the quote-count report for my larger scripts was tedious so I wrote quote-count-query which compares the original source file with the quote-count report and shows the affected lines with two preceding and following lines for context.


Two more utilities for DansGuardian users

I reduced the DansGuardian user account and blacklist maintenance hassles with my previous two utilities but while working on whitelisting I found the need for a few more.

In DansGuardian (DG) terms a blacklist bans something, a greylist allows something (overrides blacklisting) but still filters it, and an exceptionlist allows something without filtering (overriding greylists and blacklists). The "something" can be URLs, IP addresses, server names, etc., depending upon the specific list type. Blacklisting a site is easy but blacklisting a specific type of content is very difficult and error-prone. It works the same way as anti-malware utility definitions - if the undesirable items are on the list, and they match a particular requested target, then it's blocked. If not, then it gets through. It's a big Internet out there and trying to block all the bad is rather difficult. "Bad" is also relative and what is bad for one person/religious group/company/government may not be bad for another. Whitelisting has the opposite problems in that you gain strict control over what is available but trying to predict where the user wants to go, determining if that is a safe destination, and maintaining the lists is also difficult.

I found that I needed to use both blacklisting and whitelisting. I use whitelisting with younger children and blacklisting for older. Older children won't put up with strict constraints and will either figure out how to bypass them or simply go somewhere else to browse the Internet. Younger children are easier to keep happy but you still have to spend time figuring out all the web sites they will want access to, preferably with the initial configuration so they're not whining every five minutes about another toy/game/whatever site they can't access.

With DG a "whitelist" configuration is basically a blacklisting of all sites with a "**" in the bannedsitelist file with entries in greylist and exceptionlist files to bypass it. The exceptionlist file entries will enable site access but this is not what you want for allowing a user to browse a particular site because it disables all filtering. Use greylist files instead. This way if there is an offensive part of a site that you didn't know about (or it gets defaced by black hats) then you still have the filters to rely on. The exception lists are useful for sites that are not normally browsed but may trigger the filters inadvertently such as Linux distro repositories using http.
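
Assuming Ubuntu's default list locations, the pieces of this configuration fit together roughly like this (the entries are illustrative):

```
# /etc/dansguardian/lists/bannedsitelist: ban everything
**

# /etc/dansguardian/lists/greysitelist: allow these sites, but keep filtering
gutenberg.org

# /etc/dansguardian/lists/exceptionsitelist: allow without filtering
archive.ubuntu.com
```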

One of the problems with whitelisting is that the user won't necessarily know where they can go on the Internet. To solve this problem you need an index page of some sort. This is the problem I encountered when creating my greylists and I came up with a solution.

I didn't want to maintain an index separately from the greylists so I figured out a way to embed the data in the lists. DG recognizes a # in the list as a comment. I added a comment at the end of each list entry with a Wiki-style link after it. This isn't all that unusual as Debian/Ubuntu did something similar with the menu.lst file in Grub. The comment hides data that isn't relevant to DG but the defined format allows extraction of the data to create an index. Soon after I started adding the links to the list entries I figured out two things - it was a lot of typing and was going to be a very big index. To organize the index better I added a category tag on the end which could be used in the index. The final format is:

<exception link> #[<URL><space><label>][Category:<category text>]

The brackets are required characters. The parsing is somewhat whitespace tolerant but in the Category tag don't leave any spaces between the colon and the category text (sed and regex expressions can be tedious). Example:

gutenberg.org # [http://www.gutenberg.org Project Gutenberg][Category:Books]

To save some typing I wrote add-exceptionlist-url-comments which creates a default URL comment. First it pads the end with tabs (up to 5) to keep it pretty. The default link is made by slapping an http protocol prefix on the exception entry. It then uses wget to try to fetch the default web page and scrape the page title to use as a default link label. This works for most pages and redirects but not those that are using a meta refresh. It finally adds an undefined category tag at the end. Anything in the list that starts with a # is ignored. Note that not every entry will need a link. Some sites you don't want may serve data to a site you want. A lot of USA government sites that are kid-specific will link to media on the main government sites which aren't of interest to kids and just clutter the index. Some web stores also use third-party search services which will need exceptions but not links. In many cases you'll want a link that points to a specific part of the site, not just the server root, so you'll have to edit the defaults.
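
The title-scraping step can be sketched like this (a simplified illustration, not the released script; the real one also pads with tabs and handles fetch failures):

```shell
#!/bin/sh
# Sketch of the default-label step in add-exceptionlist-url-comments
# (illustrative; the real script also pads with tabs and handles failures).
title_from_html() {
    # print the contents of a <title> element read from stdin
    tr -d '\n' | sed -n 's|.*<title[^>]*>\([^<]*\)</title>.*|\1|p'
}

entry='gutenberg.org'                 # an exception-list entry
url="http://${entry}"                 # default link: prepend the protocol
# label=$(wget -q -O - "$url" | title_from_html)                      # real fetch
label=$(printf '<title>Project Gutenberg</title>' | title_from_html)  # offline stand-in
printf '%s\t# [%s %s][Category:undefined]\n' "$entry" "$url" "$label"
```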

To create the index page I wrote exceptions-index-page-generator. It looks for the bracket-formatted URLs in the input files. It also builds a list of category tags, assigning a default tag (defined in the script) to any that are missing. It then creates a basic html file with entries separated by category. If a category has more than a certain number of entries (default 5 as defined in the script) it makes two columns to reduce the page length. It doesn't try to normalize category names so they must match in the entries exactly in order to be combined. It ain't pretty but it works. These are both command-line utilities but are rather easy to use.
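
The extraction at the heart of it can be sketched with a single sed expression (illustrative only; the real script also collects the category tags and builds the surrounding html page):

```shell
#!/bin/sh
# Sketch of the extraction step in exceptions-index-page-generator
# (illustrative; the real script groups entries by category instead of
# emitting a class attribute).
entry_to_html() {
    sed -n 's|.*#[[:space:]]*\[\([^ ]*\)[[:space:]]*\([^]]*\)\]\[Category:\([^]]*\)\].*|<li class="\3"><a href="\1">\2</a></li>|p'
}

printf '%s\n' 'gutenberg.org # [http://www.gutenberg.org Project Gutenberg][Category:Books]' | entry_to_html
```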

UPDATE: I updated exceptions-index-page-generator. Version 1.1 adds a category table of contents to the top of the page. It will also make two columns of these if the number of categories exceeds the column threshold.

You can use my greylists to test with and as a base for your own lists for younger children. I haven't performed in-depth checking of these but they look relatively safe. Some of the entries may seem odd but they're intended to aid holiday gift buying. You will also notice that I used html entity codes in the labels for some punctuation as they didn't display correctly in Firefox.


A pair of utilities for DansGuardian users

Content filtering is a requirement of the home desktop system configuration build I'm working on. Young children are part of the client base so it's mandatory. DansGuardian is basically the only free option available. It's a server daemon so it has command-line configuration only. Once it's running parents don't need to mess with the basic settings, but they need to be able to set filtering controls for children without a lot of hassle. On Ubuntu it doesn't come with any blacklists but third-party lists are available. Shalla Secure Services has one of the most comprehensive lists that's free for home use, but installing and updating it is also a hassle. I wrote a pair of scripts to solve both of these problems.

There are a few options for a DansGuardian GUI. Some firewalls like SmoothWall have plug-ins for it. Two popular stand-alone ones are DansGuardian-GUI from Ubuntu CE and WebStrict from Saliby. Unfortunately they both rely on Tinyproxy, which has a bug with DansGuardian that prevents many pages from loading. They also drag in FireHOL, which I don't need.

Since remote administration is a requirement for my desktop configuration I installed Webmin. A plug-in is available, the DansGuardian Webmin Module, which allows easier control than straight command-line methods, including a semi-automatic configuration for multiple filter groups. There's one bug with the latter that I had to fix first, and the default DansGuardian binary location in the module's configuration was incorrect for Ubuntu (it's at /usr/sbin/dansguardian), but that's all.

When working with multiple filter groups the goal is to have DansGuardian automatically apply the correct filter based on the user account. Correlating user port activity to filter groups is tricky. Since my targeted desktop systems are stand-alone and won't have multiple simultaneous users I chose the Ident method using Ident2. I tried Bisqwit's identd (bidentd) but the version on Ubuntu 10.04 (Lucid Lynx) has a nasty looping bug that is triggered by local queries. Getting this to work only requires activating the ident authplugin and creating the filter groups.

While the module makes configuration easier for the admin, it's still not that friendly for a parent. The filter groups make it easy to set user restrictions based on group membership, but DansGuardian filter groups are completely separate from system groups. They can only be changed from the command line or with the Webmin module. I wanted parents to be able to use the standard desktop user administration tool, users-admin (System > Administration > Users and Groups), to assign users to special DansGuardian groups that could then be converted to filter group memberships. There once was a patch for DansGuardian that integrated the two but it's not included upstream. So I came up with a system group naming scheme and wrote dg-filter-group-updater, a GUI tool that automatically creates the filter group list (/etc/dansguardian/lists/filtergroupslist by default) from the system group membership. Installing it is easy. Just copy the script to "/usr/local/sbin" with root ownership and 755 (rwxr-xr-x) permissions. Download this desktop file and put it in "/usr/local/share/applications" with root ownership and 644 (rw-r--r--) permissions, which will cause a menu entry to appear in the System > Administration menu. This is for Gnome as it uses gksudo to get root access, but you can convert it for KDE by changing "gksudo" to "kdesudo" or "kdesu" and then changing the "Categories" entry for KDE (look at other KDE desktop menu files in /usr/share/applications). For this script to be useful you have to set up the required system groups first and assign users.

DansGuardian references group filters by an index number. The first group is "filter1", which corresponds to the configuration file "dansguardianf1.conf" and is the default. Typically in a multi-group configuration this filter is set to disable Internet access with a "groupmode = 0" setting. By "Internet" I mean "http" as DansGuardian can't really help with "https" (TLS/SSL) or much else. The rest you have to block with firewall rules or a filtered DNS like OpenDNS. The module's multiple group tool is the one named "Set Up Lists&Configs For Multiple Filter Groups" on its main page. Before using it, back up the "/etc/dansguardian" directory as this option only works once and then locks itself out. Restoring the directory is the only way to revert. When you use this tool you will have a few options to choose from. The scheme is up to you (I used separate). I recommend selecting "Use of Default Group" and "To Set Aside Unrestricted Group". I used four groups:

#1 "No_Web_Access" default (filter1, groupmode = 0)
#2 "restricted" (filter2, whitelisted with groupmode = 1 in its conf file and ** in its bannedsitelist file)
#3 "filtered" (filter3, filtered with groupmode = 1 and naughtynesslimit = 100)
#4 "unlimited" (filter4, groupmode = 2)

The idea here is that unassigned accounts are automatically blocked by filter1, young children are sandboxed with filter2, older children are filtered with filter3, and adults unrestricted through filter4. Since the restrictions are more about maturity than age the groups don't have names that refer to the latter.

The dg-filter-group-updater script requires system group names to have a specific format of "dansguardian-f#..." where # is the corresponding filter number. Anything after the digits is ignored so you can create more descriptive group names that a non-technical user can recognize in the users-admin tool when assigning members. These groups should be created as system groups (GID < 1000). I created my groups with addgroup:

addgroup --system dansguardian-f2-restricted
addgroup --system dansguardian-f3-filtered
addgroup --system dansguardian-f4-unlimited

Obviously you need to have a "sudo" before these or get a root terminal with "sudo su" first. Since filter1 is the default you won't be assigning users to it and don't need a matching system group. Next you just need to assign users to each group. If you assign the same user to more than one, DansGuardian will use the lower-numbered filter in the resulting filter group list. Afterwards just launch the script via the menu item "DansGuardian filter group updater" and enter your admin password. First it reads through the dansguardian.conf file. The file location is set by the "dg_conf" variable in the script and is the only hard-coded value you need to worry about. From the conf file it locates the filter group list file and the number of filter groups. It then starts a new filter group list file (overwriting any existing one). Next it reads through /etc/group looking for the "dansguardian-f#..." groups, extracts the users for each, and adds them to the filter group list file in "username = filter#" format. It then restarts DansGuardian. So all a parent needs to do is assign users to groups with users-admin and then launch the script from the menu item to apply the changes.
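
The core conversion can be sketched like this (illustrative, not the released script; the real one also reads dansguardian.conf and restarts the daemon):

```shell
#!/bin/sh
# Sketch of dg-filter-group-updater's core conversion (illustrative, not the
# released script): turn "dansguardian-f#..." system groups from /etc/group
# into DansGuardian "username = filter#" entries.
groups_to_filterlist() {
    grep '^dansguardian-f[0-9]' | while IFS=: read -r name _pw _gid members; do
        num=$(printf '%s' "$name" | sed 's/^dansguardian-f\([0-9]*\).*/\1/')
        printf '%s\n' "$members" | tr ',' '\n' | while IFS= read -r user; do
            if [ -n "$user" ]; then
                printf '%s = filter%s\n' "$user" "$num"
            fi
        done
    done
}

# demo with a fabricated /etc/group excerpt
printf '%s\n' 'dansguardian-f2-restricted:x:201:alice,bob' \
              'dansguardian-f3-filtered:x:202:carol' | groups_to_filterlist
```

In real use this would read /etc/group and write the output to the filter group list file.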

The script is based on the same code I used for webcam-server-dialog so it will work with whichever supported dialog program is installed. Other than that it only uses basic text manipulation tools including grep, sed, and cut. If it doesn't start, launch it from a terminal window or do a "tail ~/.xsession-errors" to see any messages it put out (including those from DansGuardian when it restarts). Most error messages are displayed in a dialog box.

While dg-filter-group-updater solves the basic user administration problem, the lists for filtering (filter3 in my example) still need to be configured. The Ubuntu package only includes basic advertisement-blocking blacklists. Adding third-party blacklists is complicated as you have to merge them in with "Include" statements in the main lists. The lists are organized by categories so you can pick and choose what to filter. Annoying, but you only have to do it once if you're using simple filter groups like mine. The problem with blacklists is that they have to be updated often. Shalla Secure Services has some update scripts but they didn't impress me much or do what I wanted. My policy with third-party anything (clipart, CAD libraries, templates) is to keep them separate as references and use other files for customization. To that end I wrote shalla-bl-update. It downloads the list and creates an MD5 file to track the installed version. When it is executed again it checks the MD5 published on the web site against the installed version and downloads the list again if it differs. It has some fault tolerance included as it will retry if the file fails to download or the downloaded file fails an MD5 check. The lists are located in "/etc/dansguardian/lists/shalla" by default. Just download the script from the link and put it in "/usr/local/sbin" with root ownership and 755 (rwxr-xr-x) permissions. It's designed to be started by cron. To have it run daily do "ln -s /usr/local/sbin/shalla-bl-update /etc/cron.daily/shalla-bl-update". It produces no output as cron will email root whenever anything it runs does. It has a debug mode you can enable by editing the script if you want it to fill your mailbox. It will restart DansGuardian after a successful list update.
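
The version check at the core of it can be sketched like this (illustrative; the real script also handles the download itself and the retries):

```shell
#!/bin/sh
# Sketch of shalla-bl-update's version check (illustrative, not the released
# script): compare the MD5 published on the site with the one recorded locally.
needs_update() {    # $1 = published md5, $2 = file holding the installed md5
    [ ! -f "$2" ] && return 0          # no recorded version yet: download
    [ "$1" != "$(cat "$2")" ]          # differs: download again
}

# demo with the MD5 of an empty file standing in for a published value
printf '%s' 'd41d8cd98f00b204e9800998ecf8427e' > /tmp/shalla.md5
if needs_update 'd41d8cd98f00b204e9800998ecf8427e' /tmp/shalla.md5; then
    echo 'update needed'
else
    echo 'up to date'
fi
```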

Update: I updated shalla-bl-update to v1.2 which adds an optional check for empty system groups. The idea here is that if there are specified system groups used by dg-filter-group-updater, and these groups use the Shalla lists, then these groups should have members. If they don't then there is no point trying to update the Shalla lists. You need to edit the script and set the system_groups variable to the names of the system groups used by dg-filter-group-updater. The grep expression it is using will find partial matches. You can specify "--force-update" to override the check with empty groups.

Update: I've released v1.3 of shalla-bl-update and the link has been updated. Changes: --force-update now sets debug=true and clears the existing MD5. Retries can now be aborted interactively in debug mode. Because of this, the script now requires Bash for the timeout capability of its "read" command. Added a "test" parameter for use with Ubuntu Recovery Mode. DansGuardian is not restarted if RUNLEVEL=S (single mode, essentially Recovery Mode). Added a --help parameter.

Note: There is a patch by Philip Allison that integrates DG with system groups but the lead developer, Andreas Büsching, has been too busy to integrate it or keep up with maintenance.


UFW application profiles

Uncomplicated Firewall (ufw) is a front-end to iptables. One of its features is "application profiles": INI-style files that contain profile names and ufw settings. This allows packages to include their own firewall settings and make them available to ufw when installed.

Using profiles is relatively easy. To see what profiles are on your system, go to a terminal and enter "ufw app list" to see the names. The profiles are located in the directory "/etc/ufw/applications.d" and the names referenced are the "[section names]" in the files. Note that ufw also references the services list in "/etc/services" for rules. If a section name conflicts with an entry in the services file then the latter takes priority (and ufw warns you every time you use it).

There doesn't seem to be any documentation on the file format and the example files mentioned in the docs don't exist on my Karmic or Lucid systems but the existing files for OpenSSH server and Apache are good examples to determine it from:

[section name] (The identifier that ufw references)
title= (shown in "ufw status")
description= (doesn't seem to be used anywhere)
ports= (the port list)

This is the profile for OpenSSH server:

[OpenSSH]
title=Secure shell server, an rshd replacement
description=OpenSSH is a free implementation of the Secure Shell protocol.
ports=22/tcp

Multiple protocols are specified as "80/udp|80/tcp" with a vertical bar separating them. If just "80" is specified then both udp and tcp are assumed. The port can be a comma-delimited list (80,443) or a range with a colon (81:82) or combined (80,443,81:82/udp|8080/tcp). If a range is specified then a separate entry for each protocol is required (81:82/udp|81:82/tcp).
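
Putting that syntax together, a complete custom profile might look like this (the section name and values here are invented for illustration):

```
[ExampleServer]
title=Example server (illustrative)
description=Demonstrates the ports syntax
ports=80,443,81:82/udp|8080/tcp
```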

I've been working on a Ubuntu 10.04 deployment configuration for my clients and one of my requirements is a user-friendly firewall for mobile users. While ufw is a command-line application, GUIs do exist. Gufw is rather basic and doesn't support application profiles. My clients don't know much about network protocols, but they can pick an application by name out of a list. Gufw does list some applications, but they seem to be hard-coded. Another GUI is ufw-frontends (ufw-gtk), which does support profiles. My only complaint with it is that when a profile is used there isn't any way to see what ports it affects; all you see is the profile name. In many cases the title and description are more informative than the profile/section name, so I hope the tool shows them in future revisions.

With my deployment configuration, selecting the firewall GUI was the easy problem. The hard one was the profiles themselves. Application profiles are easy to make but few packages include them. Many of my clients are gamers and most of the best games have online multi-player capability. This isn't just a Linux problem as all of them want to play games on Wine also. Most of these games are client/server and they need ports unblocked when hosting a private server. The profiles are easy to write but finding out which ports need to be forwarded can be very frustrating. Many gamer-oriented web sites provide aggregated port lists but most of these are unverified and usually specify way more ports and protocols than are necessary. Developer sites either don't list them or list them without specifying the protocols (TCP/UDP or both) or traffic direction. With home users you generally only care about incoming connections to the server, not outgoing. Since most of the ports used are unofficial and not controlled by IANA, many games have port collisions with other games, often because they are based on the same engine (like those from id Software and Epic Games) or use the same API (DirectX/DirectPlay and GameSpy Arcade). It's very rare that you find a list as accurate or concise as that of Novalogic. Several open-source applications only document their ports the old-fashioned way: in the source code. With some I had to install them and use "netstat -nap" to figure out what was used (which sometimes conflicted with the documentation). Another complication is that several games, like Quake 3, require a different port to be opened for every simultaneous client.

I couldn't really avoid the task so I spent several days writing profiles. You can download them all from here. These are intended to be used as incoming exceptions to a "deny all" rule. Just extract and copy them to "/etc/ufw/applications.d" with root ownership and rw-r--r-- (644) permissions. Start ufw-frontends and click the "Add Rule" button. In the Destination/Port section select the Application radio button and choose the profile from the list. For applications like Quake 3 that have many possible port configurations I created a few different profiles which should cover most situations. Unfortunately the ambiguous profile names in some files are going to be confusing. On a few I tried to make them more readable, but fixing ufw-frontends so that it shows the title would be a better solution. Unavoidably there are several duplicates and overlaps with other applications, which shouldn't harm anything unless the conflicting servers are both operating at the same time. Many servers can be configured with alternate ports but my profiles only specify the common defaults.

Both ufw and ufw-frontends have limitations that I hope will be addressed in the future. Support for port triggers, dynamic configuration based on the network connection, or just warning when profile port ranges overlap would be helpful. If you add all my profiles to ufw the first thing you will notice with ufw-frontends is that it doesn't handle large numbers of profiles well. To help address the problem I've added a new parameter to the profile file format that hopefully ufw and ufw-frontends can utilize in the future. This is easy to do because INI files don't have much of a standard and ufw ignores everything other than the original ones. The parameter I added is "categories" for classifying profiles. This will allow users of ufw and related GUIs to quickly filter large profile lists. I put in a wishlist bug report about it for ufw. I didn't want to bother creating my own standard from scratch so I used the freedesktop.org menu spec categories since they're already used for organizing desktop menus. I had to break the standard a bit by mixing main categories, usually "Network" with "Game", but this shouldn't be a problem.

The second parameter I added was "reference". This was due to the ridiculous amount of research I had to go through in finding port numbers for each application. Multiple "reference" parameters can exist for each profile, each listing a one-line item. The references indicate the basis for the profile configuration, like "netstat -nap|grep python", to indicate how the port was determined. Mostly these are web site references with a link specified in [URL label] wiki format.
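
A profile carrying both added parameters might look like this (all values invented for illustration):

```
[ExampleGame]
title=Example multi-player game server (illustrative)
description=Shows the added categories and reference parameters
ports=26000/udp
categories=Network;Game
reference=netstat -nap output while hosting a server
reference=[http://example.com/ports Developer port list]
```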

Obviously there are many more servers and daemons to add but generic ones like DirectX, GameSpy, and GGZ Gaming Zone cover many. This brings up a possible optimization - a "meta" or "prerequisite" parameter. Because a lot of games share the same ports due to underlying common code, it would be simpler to define a profile that simply links to other profiles to specify ports. This way a profile could be specified for every individual program but not add a lot of duplicate rules to keep track of.

I only encountered one problem with ufw's profile implementation. It happened when I created a profile for 0verkill. Apparently ufw doesn't allow section names to begin with a digit but I can't imagine why this limitation would exist. Besides that and the way ufw-frontends handles huge profile lists this firewall configuration works well. I don't know if all of the profiles are correct as I didn't have time to test everything. Some may open more ports than a game or application requires and some may not open enough. Feedback is welcome.

Note: The NFS profile (nfs-kernel-server) requires static port mapping. The references in the file will lead you to articles on how to configure NFS this way but I changed the common 4000:4003 ports to 4194:4197 as these aren't in Wikipedia's list or used by anything I could find with Google. There may be a Netfilter module that handles the NFS random port usage better as one exists for saned (nf_conntrack_sane) but I'm unaware of one.

Update: I updated the profiles package to v1.1 which includes a bunch more Linux games and some corrections. NFS profiles have been split into three different files representing the common address ranges. Using static ports for NFS is kind of a hack and I reported bug 688446 about possible solutions.

Update: I released v1.2 of the profiles which made some changes to http due to bug #694894 (which is mostly the fault of Debian and IANA). I also added another parameter, "modules", which specifies the connection tracking module (nf_conntrack) for some protocols. Currently in ufw-gtk some of these can be enabled under "Edit > Preferences > IPT Modules". Having a separate dialog for them doesn't make a lot of sense as they are specific to a particular protocol and you usually need them enabled. The modules act similarly to "port triggering" but are more intelligent as they understand the handshakes of their respective protocols and can identify which additional ports have been negotiated between server and client. I also found out that with NFS v4.1 the dynamic port problems are being eliminated.

Update: I released v1.3 of the profiles. This is a minor release that only adds Skype, toribash, and webcam-server. The download link has been updated.


A script for auto-configuring saned network connections

Host-connected image scanners can be shared through saned (part of sane-utils in Ubuntu). It can be run continuously as a daemon or on-demand through Inetd. Basic configuration for either mode is simple and generic but adding the network address to the saned.conf file in CIDR notation is not. When you are setting up systems for multiple clients on different networks and IP ranges, this is a bit of a nuisance. To automate this I wrote saned-subnet-conf which will automatically add an entry for whatever network the host connects to through Network Manager or the ifupdown utilities directly.

Whenever a network connection is made or broken, scripts can be triggered. These scripts need to be located in (or linked from) specific subdirectories of "/etc/network", the choice of which determines when they execute. Variables are passed to them that can be used to change behavior based on the network interface, the address assignment method (DHCP, static, ppp, etc.), and other values. See the interfaces man page for some hints. Network Manager executes these scripts through "/etc/NetworkManager/dispatcher.d/01ifupdown", which uses the run-parts utility. Network Manager does not trigger the "pre" directories due to a design decision.

To install, download and extract the script, then put it in "/etc/network/if-up.d". You'll need to use sudo or have a root terminal for the copying (and most of the rest of the commands). Make it owned by root:root with rwxr-xr-x (0755) permissions. Whenever a network interface is brought up by ifup or Network Manager the script will execute. It uses scanimage to look for scanners and, if any are found, it then uses the ip command to get a CIDR version of the network address and produce an entry for saned.conf if one doesn't already exist.

The last part is important as the script will add an entry for every network the host connects to. If you want to block a particular network, let the script add it to saned.conf and then comment the entry out with a #; the script won't add it again if it finds it anywhere in the file. Make sure you restart saned anytime you edit saned.conf (see below). If you want to keep the script from adding entries for a particular network interface you'll have to edit the script and have it exit based on the IFACE variable; look at the "$METHOD = loopback" entry for a rough idea. If you enable the VERBOSITY=1 entry the script will generate a log file in /tmp that includes all the variables. Currently the script only supports IPv4 addresses as my network doesn't use IPv6 so I can't test it.
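The address-to-network step the script performs can be sketched in plain shell. This is an illustrative re-implementation, not code from saned-subnet-conf itself; the cidr_network function is a name I made up for converting an interface address, as reported by the ip command, into the network entry saned.conf expects:

```shell
# Convert an interface address like "a.b.c.d/len" into its network
# in CIDR form (IPv4 only, matching the script's current limitation).
cidr_network() {
    local addr=${1%/*} len=${1#*/}
    local IFS=. o ip=0
    for o in $addr; do ip=$(( (ip << 8) | o )); done
    local mask=$(( (0xffffffff << (32 - len)) & 0xffffffff ))
    local net=$(( ip & mask ))
    printf '%d.%d.%d.%d/%d\n' \
        $(( (net >> 24) & 255 )) $(( (net >> 16) & 255 )) \
        $(( (net >> 8) & 255 ))  $(( net & 255 )) "$len"
}

# In a real if-up.d hook the address would come from the interface:
#   addr=$(ip -o -4 addr show "$IFACE" | awk '{print $4; exit}')
# and the entry would only be appended if absent from the file:
#   net=$(cidr_network "$addr")
#   grep -qF "$net" /etc/sane.d/saned.conf || echo "$net" >> /etc/sane.d/saned.conf

cidr_network 192.168.1.57/24   # prints 192.168.1.0/24
```

A plain grep match like the one sketched above is also consistent with the behavior described: a commented-out entry still prevents the network from being re-added.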

Setting up saned is rather easy. During installation you have the option of running it as a daemon. To enable this later use "dpkg-reconfigure sane-utils" and indicate "Yes" to the standalone server option, or just edit the "/etc/default/saned" file and set "RUN=yes". The server daemon will start automatically at boot but you can start (or stop, restart) it manually with "invoke-rc.d saned start" or "/etc/init.d/saned start". To see any messages from saned use "tail /var/log/daemon.log".

To have saned start automatically when a client connects, indicate "No" to the standalone server option or set "RUN=no" in the default config file. Then add (as per the man page) the required entry to "/etc/inetd.conf" if it doesn't already exist. You can use a text editor but a safer way is the update-inetd utility: update-inetd --add "sane-port stream tcp nowait saned.saned /usr/sbin/saned saned". If you watch the log (tail -f -n 20 /var/log/daemon.log) you will see saned start and stop automatically whenever a client connects. Don't run the daemon and have an Inetd configuration active at the same time as they will conflict over network port access (6566 by default). To disable the Inetd entry use the command "update-inetd --disable sane-port".

To configure clients to use the server just add the server IP address or host/domain name to "/etc/sane.d/net.conf" and start whatever scanning program you want to use. You can get a list of available scanners with scanimage -L but note that neither saned nor scanimage supports scanners connected via a parallel port.

On Ubuntu 10.04 (Lucid Lynx) and some earlier versions access to scanner devices isn't handled correctly for anyone other than standard users (UID=1000+) on the host. As a workaround you can use my Scanner Access Enabler to correct the permissions until reboot. In the future, scanner network access may be handled by Avahi but it doesn't work with Karmic or Lucid due to another bug.

Update: Forgot to mention that scanimage is used to look for scanners first before adding a saned.conf entry.


Scanner Access Enabler

There is a problem with scanner device permissions on Ubuntu. Regular users (UID > 999) can access scanners through libsane applications like Xsane and Simple Scan without problems. Linux Scanner Server, which runs under Apache as www-data, can't access them without a chmod o+rw on each scanner device. Nobody seems to know how the permissions work so this has to be fixed manually in a terminal. This is not n00b friendly so I created a GUI application that automatically changes the permissions of every scanner device.
The application relies on the scanimage and sane-find-scanner utilities to identify scanner device ports and then simply does a chmod against all of them. It supports USB, SCSI, and optionally parallel port (-p parameter) scanners and has been tested against the same ones I used for my LSS patch. It uses the same universal dialog code as webcam-server-dialog so it should work with almost any desktop environment.
To install, first download the archive and extract the contents. Move the script to "/usr/local/bin/scanner-access-enabler" and set it for root:root ownership with rwxr-xr-x (0755) permissions. Copy the desktop menu entry to the /usr/local/share/applications directory with root:root ownership and rw-r--r-- (0644) permissions. You may have to edit the desktop file as it uses gksudo by default. On KDE you may want to change the Exec entry to use kdesudo instead. If you specify the -p option on the Exec line you may have to quote everything after gk/kdesudo. If you don't have one of the GUI dialoger utilities installed and plan on using dialog or whiptail then you need to set "Terminal=true" else you won't see anything.
On Ubuntu the menu item will be found under System > Administration. If you want users to be able to activate scanners without a password and admin group membership, you can add an exception to the end of "/etc/sudoers" file. Simply run "sudo visudo" and enter the following:
# Allow any user to fix SCSI scanner port device permissions
ALL ALL=NOPASSWD: /usr/local/bin/scanner-access-enabler *
While you can use any editor as root to change the file, visudo checks for syntax errors before saving as a mistake can disable sudo and prevent you from fixing it easily. If you mess it up, you can reboot and use Ubuntu recovery mode or a LiveCD to fix it.
Update: I released v1.1 which adds filtering for "net:" devices from saned connections. This didn't affect the permission changes but made for a crowded dialog with both the raw and net devices shown.
Update: v1.2 adds a non-interactive/silent mode activated through a "-s" parameter.
Update 20120708: v1.3 cleaned up the code a bit but broke detection completely, as I recently noticed. I updated it to v1.4 which actually works now. :D


A patch for Linux Scanner Server v1.2 Beta1

I just spent several days testing, fixing bugs, and adding features to Linux Scanner Server v1.2 Beta1. LSS is an easy way to share a non-networkable scanner through a web server. While the interface doesn't allow cropping like Xsane or Simple Scan it does support multiple file outputs, printing, and OCR through Tesseract. Development has stalled with the beta and I encountered some bugs when testing it on Ubuntu 10.04 (Lucid Lynx). Instead of complaining about it, I fixed them.

Bugs fixed/features added:
Noise in Apache logs caused by unquoted variables and non-critical stderr outputs from ls and rm.
Adding scanners would fail if the scanner name included a forward slash.
Multiple scanner support broken due to a lack of newlines between entries in the scanner.conf file.
No support for scanners connected via parallel-ports.

I wanted to try to break the beta before deploying it to my clients and I did - as soon as I connected a second scanner. I decided to fix the bug even though none of my clients have more than one attached to any given system. I just happen to have a bunch of them on hand and, thanks to a local tech recycling center, I added a few more. Sane supports scanners connected to parallel ports but LSS doesn't so I decided to fix that, well, just because. Yes - I went out and paid money for more scanners including an obsolete parallel port Mustek model just to fix LSS.

The deciding factor in doing this was that LSS is based on a shell script and a lot of sed scripts. Shell scripts are about the only programming language I know to any depth (that and Applesoft BASIC). Some of the regex/sed stuff still throws me but I had help from some of my LUG mates. These are the scanners I tested with (and tested simultaneously):

AGFA SnapScan 1212U (snapscan:libusb:002:003)
Brother Industries MFC-440CN (brother2:bus4;dev1)
Hewlett-Packard ScanJet ADF (C7190A, identified as 5200C) (hp:libusb:004:002)
Hewlett-Packard ScanJet 4470c (rts8891:libusb:004:003)
Hewlett-Packard ScanJet 6100c (C6260A but identified as C2520A) (hp:/dev/sg5)
Microtek ScanMaker E3 (microtek:/dev/sg3)
Mustek 600 III EP Plus (/dev/parport0)
UMAX Vista-S8 (umax:/dev/sg4)

Fixing the multiple scanner support was a pain. LSS relies on scanimage for all scanner functions. Getting scanimage to provide a newline at the end of the device list was trivial but the message printing function for the web page doesn't tolerate newlines, so they all have to be converted to HTML breaks. Forward slashes in the model names from scanimage also required escaping, but not anywhere else (like in the device paths). This got into sed loops, which are really hard to get right.
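For anyone curious, the newline-to-break conversion is the classic sed slurp loop. This is a generic sketch of the technique, not the exact commands from the patch:

```shell
# Read the whole device list into one pattern space, then turn the
# embedded newlines into HTML breaks (portable -e form of ':a;N;$!ba'):
printf 'AGFA SnapScan\nBrother MFC-440CN\n' |
    sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/<br>/g'

# Escaping forward slashes in a model name, using , as the sed
# delimiter to keep the expression readable:
echo 'Mustek 600 III EP/Plus' | sed 's,/,\\/,g'
```

The first command prints the list as a single line joined with <br> tags; the second turns each / into \/.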

Adding support for scanners on parallel ports was also difficult. They have to be defined manually in the sane config files (/etc/sane.d/*.conf) but scanimage doesn't report them regardless. The sane-find-scanner utility does find them and will indicate what brand is on which port but no additional details like the model name. Since sane can use auto-probing to find which parallel port the scanner is on there is no deterministic way to use the information from sane-find-scanner and the sane conf files to indicate a specific model. The only solution I could come up with is to manually specify parallel port scanners in a separate "scan/config/manual_scanners.conf" file and then merge it after the rest are detected. The format is the same as for scanners.conf but the value for ID needs to be specified as %i (same as the device entry in the format line for scanimage). The modified LSS index.cgi script will replace it with an auto-incremented value when merging. The NAME= value doesn't matter but forward slashes have to be escaped with backslashes and anything longer than 30 characters will be truncated in the pull-down list on the Scan Image page.

Setting up a parallel port scanner is a bit confusing. The Mustek model I used was configured in /etc/sane.d/mustek_pp.conf simply by uncommenting the line "scanner Mustek-600-IIIEP * ccd300". The second parameter is the name. The third is the port with an * indicating autoprobing which in my case became /dev/parport0. The last is the actual driver. With scanimage the device is not specified by the port but rather the backend driver and then the name. With the settings I used it became "mustek_pp:Mustek-600-IIIEP" (also specified for the "DEVICE=" value in manual_scanners.conf). If only the backend is specified scanimage will default to whichever is enabled in the conf file. I only have the one parallel port scanner (the ScanJet 4470c has USB and parallel but there's no driver for the latter) so I don't know how it handles multiple ones configured in the same file/backend.
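Putting the pieces together, a manual_scanners.conf entry for this Mustek might look like the following. The field names are inferred from the description above, so check the generated scanners.conf on your own install for the exact layout:

```text
# Hypothetical entry - verify field names against your scanners.conf.
ID=%i
NAME=Mustek 600 III EP Plus
DEVICE=mustek_pp:Mustek-600-IIIEP
```

The %i placeholder is what the patched index.cgi replaces with an auto-incremented value when merging.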

There are still bugs in LSS. The most obvious one is a fault with the "Print_Message" function. There are several page updates that don't occur, mostly the "Please wait" ones that are supposed to display during scanner detection and image scanning. I don't know enough about the interaction between JavaScript and the browser to identify whether it is a bug in the code or an architectural problem with the page design.

Another bug is with the scanner names. As you can see from my list above, some of the scanners are not named correctly. It may be that the model reported is the base one that the actual model is compatible with and sane just isn't more specific than that. LSS just uses whatever scanimage reports. This isn't a major problem as most systems will only have one scanner.

A third problem is with the scanner driver options that LSS specifies - basically none. Some scanner/driver combinations require specific options to be specified else the scanner has problems. The only one I encountered with the models tested was that the default resolution of 200 was unacceptable to one of them so it was downgraded to 150. These errors show up in the Apache logs (/var/log/apache2/error.log) but only refer to index.cgi and not a specific point within the file. I'm not sure how this bug could be fixed. Parsing the options out of the sane conf files may work but different versions of the same base model may require different settings.

Future enhancements that would be nice are cropping and internationalization support but that's more than I'm going to take on. My LUG mates also suggested using anything other than shell scripts.

To use the patch, first download and extract LSS into your web server data directory (/var/www/scan). Then download the patch archive, extract it into the "/var/www" directory and apply with

patch -p1 --directory /var/www/scan --input=/var/www/scan_1.2_Beta4.patch

Just reload any browser window that has the old version loaded to make the new one active. Restarting the server is not necessary.

LSS is GPL 2 but it's not clear in the package as the author didn't follow the recommended method for applying the terms.

Note - there's a nagging problem with Ubuntu in that LSS can't access any scanners due to device permissions. It runs as user www-data but the old scanner group no longer exists so devices need chmod o+rw applied manually. For regular users (UID >1000) it seems to happen automatically but nobody seems to know how that works. I wrote Scanner Access Enabler to solve the problem.

Update: If you also have saned configured for scanner sharing then duplicates may be detected from both the raw devices and saned shared versions. If you don't want the ones from saned then comment out "localhost" in the "/etc/sane.d/net.conf" file and restart saned.

Update: pqwoerituytrueiwoq and I have been making more improvements to the beta. You can follow along and download updated files from this thread at the Ubuntu forums.

20110207 Update: pqwoerituytrueiwoq and I made a bunch of fixes and I've released 1.2 Beta 4 of the patch. The links and instructions above have been updated. It is a recursive patch so it will affect several files. You still need to add the favicon for it to "/var/www/scan/inc/images". I performed a feasibility study of adding proper preview, settings selection, and cropping. I found that the difficulty of adding them to the existing code base is extreme, even though some are needed to get LSS functioning correctly. For example, the Brother MFC-440CN doesn't scan because the modes that LSS uses (like "Color") are hard-coded in the HTML and don't match up with what the Brother driver offers. Because of these problems (and my lack of time) I've ended my involvement with the project. For my needs Beta 4 functions adequately. I also found another scanner project, phpSANE, that seems to have a better code base in PHP although it has many limitations otherwise.


webcam-server-dialog: A basic front-end to webcam-server

I'm working on a new Ubuntu configuration to deploy for friends and family. One of the capabilities I wanted to add was simple remote webcam viewing. I saw an article about webcam_server. While it's old and only does images it met my requirements. The primary limitation is that it's a command-line application. It can be launched from an XDG desktop file and will default to /dev/video0 but on systems with more than one video device a terminal is needed. I did spend some time testing streaming with VLC but it doesn't show a device list for selection either and its streaming configuration dialogs are confusing at best. To deploy webcam-server I had to make it more friendly which meant making a video device selection dialog for it. My current programming hammer is Bash shell scripting (Applesoft BASIC was the other option) but it's not enough for GUI design. To add that capability I turned to what I call "dialoger" utilities that can produce GUI dialogs and provide feedback to command-line applications.

When I started this project the only dialoger I knew about was dialog, which is text-based, and I needed something that would run in X. From researching alternatives to dialog I found xmessage which led to gxmessage and eventually Zenity. Plenty to choose from and all different. Now I had another problem - which one to use with each desktop environment? I normally use Gnome but some of my clients use XFCE or LXDE. There is also the possibility of a KDE user in the future. While Ubuntu includes Zenity by default, which would work with XFCE also, a GTK application isn't the best choice on KDE. I could use kdialog, but either I had to make custom versions of the script for each environment, select the dialoger with a script parameter, or try to select it dynamically. In the great tradition of overdesign I chose the last option.

After spending several days solving a 5-line problem with 300+ lines I ended up with webcam-server-dialog. It supports dialog, whiptail, Xdialog, xmessage, gxmessage, kdialog, and Zenity. Because it's rather generic it can be expanded to support more without a lot of effort. It looks for processes that indicate a particular desktop environment or window manager, uses a built-in priority list of dialogers for that environment, checks for availability, then uses the best available to provide the GUI. The core of this script is nothing more than "ls /dev/video*" dumped into an array but the focus here was eye-candy. Getting this to work with all of them was difficult as they all have different command-line parameters, behaviors, and bugs, even between those that are supposed to be clones of each other. Some use the exit status for indicating button presses, some stdout, some both depending on the mode. The fun ones were dialog and whiptail which use stdout for drawing the screen and stderr for indicating list choices. There are a lot of comments and commented-out debug lines if you want to use this with your own projects (GPL v3). It could be useful as a front-end to other programs that lack selection lists, like xawtv.
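The availability check at the core of that selection logic boils down to a few lines. This is a simplified sketch rather than an excerpt; the real script also inspects running processes to guess the desktop environment before choosing a priority list, and pick_first_available is a name I made up:

```shell
# Echo the first command from the argument list that exists in PATH;
# return failure if none are installed.
pick_first_available() {
    for d in "$@"; do
        if command -v "$d" >/dev/null 2>&1; then
            printf '%s\n' "$d"
            return 0
        fi
    done
    return 1
}

# A plausible priority list for a GTK desktop (order is illustrative):
pick_first_available zenity gxmessage Xdialog dialog whiptail xmessage ||
    echo "no dialoger found" >&2
```

Passing the priority list as arguments keeps the environment-detection logic separate from the availability check, which is what makes the script easy to extend to new dialogers.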

To use it, install webcam-server then put the script in /usr/local/bin with root ownership and rwxr-xr-x permissions. I also made an XDG desktop menu item for it. Put webcam-server-dialog.desktop in /usr/local/share/applications. For an icon I used the one from camorama and just copied /usr/share/pixmaps/camorama.png to /usr/local/share/pixmaps/webcam-server-dialog.png (make the directory if it doesn't exist). You can add parameters to the Exec line in the desktop file but the run dialogs always reference port 8888 (I'm too tired of working on this to make the text dynamic) and quotes don't pass through well so the caption format had to be hard-coded in the script.

In addition to direct web-page access a Java applet is included. It works well since it can automatically reload the image at a selectable interval. It has one really bizarre limitation - it will only connect from localhost. Apparently the developers decided to limit it that way instead of implementing remote authentication, but you probably could use it from within an ssh connection. This security feature has a bug in its implementation when resolving the hostname, so the IP address has to be used instead. I created a custom PHP web page that works around this. Just rename it to index.php and put it in the client directory on your web server (/var/www/client by default) along with applet.jar from the /usr/share/doc/webcam-server directory and link to it from your default home page. You'll need PHP support installed (php5 metapackage).

I also tested it in Mandriva 2010.0 which worked but with one problem - the webcam-server executable is named webcam_server so just replace all instances of one with the other in the script. You'll also need v4l-info which is in the xawtv-common package. If you are using the firewall you have to add "8888/tcp" to the allowed ports for remote access.


Your Linux system keeps falling and it can't get up

Once in a while a Linux PC technician will encounter a system that has problems with lockups (a.k.a. hanging or freezing). Sometimes it is failing hardware but other times it's a software problem. Here are the common causes for this and how to identify which is the source of your problems. While I predominantly use Ubuntu (and some Mandriva) these tests are valid for most any distribution.

1. A kernel crash or panic is rare but generally fatal. As any PC tech knows, the first test is seeing if the Caps Lock, Num Lock, or Scroll Lock LEDs change state when the corresponding keys are pressed as this is performed by the PC, not the keyboard controller. If they don't change then you know you've got a freeze problem. If the keyboard lights are flashing repeatedly during the freeze then it's a panic. More severe problems can prevent the kernel from failing gracefully enough to even do that. These failures can be caused by fundamental compatibility problems with a kernel module and a critical piece of hardware (like a BIOS with a broken ACPI implementation), a device that was given commands it didn't like and has stopped responding (like a video GPU) or just failing hardware (bad RAM, overheating CPU, and loose PCI or AGP cards).

With hardware problems it is best to start by opening the case and blowing the dust out since that is the source of most overheating problems. After cleaning a few systems you'll understand why you should charge extra for smokers, pet owners, and homes with shag carpet. I find it easiest to use an air compressor with a tank and a blow-off nozzle as canned air is too weak. Leave the system plugged in (but powered off) to keep it grounded as air streams can produce static electricity which can damage electronics. With modern systems the only hazardous voltages are in the power supply, so as long as you don't insert any metal objects into that you won't get shocked.

Try spinning all fans with your finger or a plastic probe to see if they turn freely without resistance. Replace any that drag, preferably with something other than a sleeve-bearing fan (ball or fluid bearings last longer). Power supply fans can be replaced but it requires soldering or swapping connectors as there isn't a standard connection for their internal fans. While it's fun to use the air nozzle to spin the fans up to 100K+ RPM it's bad for the bearings. They also get cleaner when you hold the blades stationary (I use the end of a big nylon tie). Start with the power supply, then the CPU, and work your way around to the front case vents. Keep wires tied up and away from the fans so they don't jam the blades or block airflow.

Check for cards and memory modules that are not fully seated in their slots and for partially-connected drive cables. If possible, remove heat sinks from CPUs and other chips and check for adequate heat sink grease (it should be an even but thin layer across the entire mating surface). Check that the chips are properly seated in their sockets and that the heatsink is pressing down evenly on them, else they may tilt and lose contact. Check for failing capacitors; if you find any bad caps it's probably easier to replace the motherboard unless you have good soldering skills. Use your eyes and nose - if something looks or smells burnt then it probably is. Keep in mind that power supplies often have a stronger "electrical" smell to them due to hand-soldering during manufacturing.

Laptops are harder to clean. Most can be opened by prying off the top bezel around the keyboard, usually starting with the section enclosing the display hinges. Some have screws and some just latch. Then the keyboard can be removed, followed by the top mounting plate. There are how-to disassembly videos on the Internet for popular models that are often modded by hackers. Some laptops have externally removable heat sinks for easy cleaning. Just because it's easy doesn't mean that users clean them. I once recycled a high-end Sager laptop (about $5K USD) that had overheated and failed. The heatsink had become plugged with lint and it kept shutting down, so the parents gave it to their kids to play with. The kids laid it on their bed and had it running (bottom fans, so no airflow whatsoever) and it overheated enough to melt the case around the heatsink. Made me sick to throw it out but the motherboard wasn't practical to fix after that. Modern CPUs will reduce their clock speed when overheating but can't reduce power dissipation entirely and can still overheat and fail when running at minimum levels.

Intermittent failures are harder to diagnose so continuous monitoring with hardware or software tools is needed. Hardware temperature monitoring can be performed with a cooking probe, thermocouple meter, or a dedicated PC temperature monitor that mounts in a drive bay. The CPU, GPU, power supply fan exhaust, and hard drives are the ones to focus on. Temperature limits for devices vary. CPUs and GPUs can often hit 60°C but 50°C is rather hot for a hard drive.

For fan monitoring you can leave the case cover off and keep an eye on them or install a PC fan monitor/controller with a display which mounts in a drive bay (and often includes temperature monitoring probes). Thermostatically-controlled fans will vary a lot, but if you are having an overheating problem despite a non-faulty fan, the controller's temperature sensor may be too far out of the primary air stream to respond quickly enough. Better models have adjustable thresholds or remote sensors but the best solution is a fan controlled by the motherboard via a 4-pin PWM fan connector. Be careful here - I burned out a CPU fan controller when I used a CPU fan that consumed more current (amperes) than the motherboard's controller could handle, so check the specifications before plugging it in. I had to convert mine to a drive connector which meant it ran at maximum speed and sounded like a vacuum cleaner.

A voltmeter is useful for monitoring power supply voltages on the connectors under various loads. Generally supply voltages should be within 10% of the stated voltages on the power supply label. CPUs and GPUs usually have a local regulator on their boards as they need voltages that differ greatly from the normal 12/5/3.3 volts that most power supplies provide. The BIOS often has control over the CPU voltages and configuration so a bug in the BIOS (or incorrect manual settings) can cause erratic lock-ups by making the CPU unstable. Usually a CMOS reset or BIOS update can fix this. One way to test for an unstable or faulty CPU is to underclock it via manual settings in the BIOS (or jumper or switch settings on really old motherboards) and see if stability improves.

To verify what the CPU needs for power and clock rates you first need to identify exactly which one you have, as manufacturers have many versions and steppings and their requirements may differ. To see what you have use the command "less /proc/cpuinfo". Use that information to search for exact specifications and compare them to your system. Pay close attention to power requirements as some motherboards, even with the same socket, can't handle some CPUs. This results in unstable CPU voltages and intermittent failures, especially under heavy loads. The problem tends to occur with long-lived socket designs where the CPU family is expanded to include models with higher power requirements (essentially changing the motherboard requirements) that earlier motherboard designs can't meet even though the CPU fits in their sockets. I've damaged a few boards that way. Heatsinks and fans also need to meet the requirements of the CPU. Mass-market PC systems usually have very little margin between the shipped CPU's requirements and the system's cooling capabilities, so failing to upgrade both can result in instability. Many of these cheap systems use a ducted case fan for cooling and just replacing a failed one requires tracking down the specifications for the fan and finding a replacement that matches in airflow (CFM) and features (connector type and thermostatic control). Standard CPU fan/heatsink combos usually can't be used as they don't fit in the case or the motherboard lacks mounting holes for them.

Most modern motherboards have built-in sensors as do CPUs, GPUs, and storage devices. These can be queried by software for status information, monitoring, and logging. Some BIOSes report the sensor values and error conditions and advanced servers often have separate hardware modules for remote monitoring of them. The standards that the sensor systems conform to are imprecise so custom drivers and algorithms are needed by external software for each implementation. Software tools include lm-sensors and smartmontools.

The lm-sensors utilities report what thermal/fan/voltage sensors you have on your motherboard (if available and supported) and their current status. You first run "sensors-detect" to identify what kernel modules are needed and have it add them to /etc/modules and reboot (or just load them with modprobe). Then just run "sensors" to get the current status or use a graphical application like the Gnome Sensors Applet, KSensors or the XFCE4 Sensors panel plug-in. Note that wildly extreme readings may not indicate a fault but rather an unused sensor input or an unsupported implementation.

Most modern hard drives and SSDs have a monitoring and diagnostic system called SMART which can be accessed with smartmontools. While SMART can tell you about problems, it is not good at predicting failures. You use the smartctl program and specify the storage device to query. For most systems the primary storage device is named "sda" by the kernel so the command would be "smartctl -a /dev/sda | less". Most modern drives report temperature, log errors, and have built-in self tests that smartctl can activate. While the underlying registers on the drives are well-defined, what they represent is not, so conversion data is needed by smartctl to interpret the values. It will tell you if it recognizes the model or not. The obvious status to check is the "overall-health self-assessment test" which tells you if any of the register values exceed an alarm threshold. More specifically, the parameters of type "Pre-fail" are important. Also note the "worst" temperature value as it could indicate a prior significant overheating incident, which is most likely to occur under heavy load (like during a backup or a RAID rebuild). Graphical tools include GSmartControl and the Palimpsest disk utility in DeviceKit (a.k.a. gnome-disk-utility) but root access may be needed by them. Another is hddtemp which only reads the temperature but has a daemon that can be monitored through the sensor monitoring tools mentioned above.

RAM can be tested with Memtest86+ which is installed in Ubuntu by default. Reboot and hold the left Shift key down before Grub loads and starts booting. You'll get the Grub menu with Memtest86+ listed. You can also download a bootable ISO or USB image from the Memtest86+ site to test with. In the early days of PCs the memory had parity checking but modern RAM doesn't, so the only way to identify a failure is with a RAM test. If you are worried about memory problems then get ECC memory. It costs only a little more than standard RAM, but the motherboard has to support it, and it can reduce performance and limit overclocking. With ECC memory the BIOS can provide much more memory diagnostic information and testing. For example, wiping unused memory locations is a standard process performed at a user-definable interval to see if any bits changed state by themselves. Servers often use ECC memory, but usually these are registered ECC modules, sometimes called "server memory". The "registered" aspect isn't a certification - it's a signal amplifier built into the module for use in systems with more modules than the motherboard's northbridge can communicate with directly. These are not compatible with standard memory or with motherboards that use it. RAM modules have an SPD device that holds their specifications. To read it (and other BIOS information) use the command "dmidecode | less". Another source of intermittent memory problems is faulty configuration by the BIOS, either manual misconfiguration by the user or a faulty automatic configuration. A CMOS reset or BIOS update can often fix this.

Diagnosing a freezing system is difficult since you can't easily check log messages on a frozen system and the logs are often truncated as a result. The kernel (and Grub) have built-in remote communication options which can help with this. These out-of-band remote connections can be made through a serial console or with Netconsole and another system. A serial console can be used like an SSH connection but requires a hardware RS-232 serial port, which is rare on modern systems. On Ubuntu 10.04 (Lucid Lynx) there is a Memtest86+ serial console configuration already in the menu that can be used to test memory remotely, but it's probably more useful for headless (i.e. no display) servers. Netconsole requires a network connection (it uses UDP) and another system running a syslog server. For kernel crashes the Linux Kernel Crash Dump tools can be used to obtain crash data that is useful for diagnostics or reporting kernel bugs, but I haven't used it yet.
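A minimal Netconsole setup can be sketched as a module option. Every address below is an example and must be adjusted for your network - the local port, IP, and interface come first, then the receiver's syslog port, IP, and MAC address:

```
# /etc/modprobe.d/netconsole.conf (example addresses -- adjust to your network)
# format: local-port@local-IP/interface,remote-port@remote-IP/remote-MAC
options netconsole netconsole=6665@10.0.0.2/eth0,514@10.0.0.1/00:11:22:33:44:55
```

Load it with "sudo modprobe netconsole". The receiving system needs a syslog daemon listening on UDP port 514, or just "nc -u -l 514" (BSD-style netcat) for a quick look.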

Check the kernel messages and logs with "dmesg | less", "less /var/log/kern.log", "less /var/log/syslog". There are many different log files including compressed backups of previous logs. Some require you to be root to access them. With Ubuntu you just add "sudo" before the commands or just get a root login with "sudo su". Midnight Commander's internal editor is helpful for reading logs including the compressed ones. The built-in editor is not the default in Ubuntu - you have to enable it within MC with F9 > Options > Alt-I > Alt-S (use Esc 9 instead of F9 when connecting through a serial console).
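The rotated backup logs are compressed, but zgrep searches gzipped and plain files alike, so one command covers both. A self-contained sketch using throwaway files in /tmp (on a real system the target would be something like "sudo zgrep 'error' /var/log/kern.log*"):

```shell
# Build a tiny current log and a gzipped rotated one, then search both:
mkdir -p /tmp/logdemo
echo 'kernel: example oops message' > /tmp/logdemo/kern.log
echo 'kernel: older oops message'  > /tmp/logdemo/kern.log.1
gzip -f /tmp/logdemo/kern.log.1
zgrep 'oops' /tmp/logdemo/kern.log /tmp/logdemo/kern.log.1.gz
```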

Most distros have boot options that can be issued through the boot loader to the kernel to change its behavior or deactivate specific functions. Ubuntu and Debian have many but every distro has its own. These can help to isolate problems or provide long-term stability when added permanently to the boot loader configuration.
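As a sketch, a permanent option goes on the GRUB_CMDLINE_LINUX_DEFAULT line in /etc/default/grub; which options actually help depends entirely on the fault, and the ones listed here are just common examples, not recommendations:

```
# /etc/default/grub -- append trouble-shooting options here, then run update-grub.
# Example options: nomodeset (disable kernel mode setting, for video trouble),
# acpi=off (hangs during boot on some boards), i8042.reset (dead PS/2 input)
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nomodeset"
```

For a one-off test, the same options can be typed onto the kernel line by pressing "e" at the Grub menu instead of editing the file.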

2. An input error resulting in the loss of keyboard/mouse control acts like a freeze but isn't. The first clue is to see if there is any screen activity at all (most desktops at least have a clock applet running). Hardware causes include a faulty peripheral, USB hub, PS/2 port, or KVM switch. With PS/2 ports a failure with one device usually prevents the other from working. A simple test is to plug in a USB mouse or keyboard and see if they work during the freeze. When a kernel bug is responsible the keyboard works in the BIOS setup and Grub menu but fails during boot (I've had problems with a bug related to an Intel i8042 PS/2 controller). These can be intermittent between boots but once it's working during a session it usually stays working. It can also be a bug in X.org if they work in a tty terminal but not in X (as when booting into Ubuntu's recovery mode). I've encountered a freezing problem that affects only the mouse. It often occurs when an OpenGL game crashes. Besides restarting X with the keyboard (knowing the menu hotkeys helps here), I've found that launching the game again and then exiting usually fixes the problem.

Check your X session logs during the freeze by switching to a tty or connecting remotely through an SSH or serial console connection. Log in, then do "less /home/<username>/.xsession-errors" and see if there are any crash messages from running applications. Most desktop applications will log messages there if they don't have their own log. If you have no control at all, reboot but don't log in to a graphical session (stay at the display manager login screen) as the session log will be overwritten as soon as you do. Don't just hit the reset or power button when rebooting - try the Magic SysRq keys first, or connect remotely and issue a "reboot" or "init 6" command.

Another traumatic but not system-wide freeze is when Nautilus hangs, as this makes it difficult to do anything with the Gnome desktop until it's killed (it usually restarts automatically, just like Windows Explorer). A Nautilus error would show up in .xsession-errors while a crash would also show up in the kernel logs. If the session log is rather big, making it hard to isolate messages related to a particular application, you can open a terminal window and run the suspect application from there, as any error messages will show up in that window instead. You can also capture the messages to a file by copying the screen or using shell I/O redirection, which is helpful when submitting bug reports.
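The redirection trick can be sketched like this, with a tiny shell function standing in for the real application (Nautilus or whatever is suspect); the "2>&1" part is what catches the error messages, since they go to stderr rather than stdout:

```shell
# Stand-in for the suspect application: prints one normal message and
# one error message, like most desktop programs do.
some_app() { echo 'starting up'; echo 'warning: something odd' >&2; }
# Capture both output streams in one file for the bug report:
some_app > /tmp/app-output.log 2>&1
cat /tmp/app-output.log
```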

X.org input driver errors will show up in its log at /var/log/X.#.log where the # represents the instance that was running. Normally it's X.0.log unless you have multiple sessions running, like multiple X logins or a non-Xinerama dual-head configuration. A different session ID could also be used if X crashes back to the display manager login screen and it thinks another session is still running (due to a leftover lock file) when you log in again.

3. Outright X.org crash. When it involves screen corruption it's obvious, but that symptom isn't always present. Sometimes this happens when switching to or from a tty terminal or when an OpenGL application is running full-screen. Sometimes it happens with the display manager at the login screen. If the keyboard lights don't toggle then try switching to a tty. If that doesn't work then try killing X with left-Alt+SysRq+K (or Ctrl+Alt+Backspace if enabled). If that doesn't do anything either then try a remote connection (or just pinging it). If that also fails then you are facing a kernel crash (which can be caused by a misbehaving video device due to the integration between X, the drivers, and the kernel). If you do get remote access then save and review the logs, including that of the display manager (/var/log/gdm/:0-greeter.log). These crashes are usually caused by video driver problems. In Ubuntu 8.10 through 10.04 (and several other distros) almost any Intel 8xx series graphics device will cause problems. There are a lot of architectural changes occurring with video, involving the kernel, X.org, and DRI, and there has been a lot of breakage. Some drivers are not keeping up with the changes and some latent driver bugs are being discovered. The older Intel devices are currently the worst (of the "supported" devices) even though Intel is one of the companies pushing these changes and has engineers working on it. But not all video crashes are the fault of the driver. Some may be kernel bugs in the motherboard chipset support that the video GPU is triggering. This was a common problem with AGP ports, and video device manufacturers like Nvidia wrote their own AGP modules for specific chipsets.

To save time, instead of analyzing the logs for something that indicates a driver problem, search the Internet for distro bugs relating to the one you have. Identify your graphics device with "lspci | less" or "lshw | less" and then search with Google for the device part number and the distro like "Ubuntu 10.04" or "Lucid". Check the release notes for known problems and possible workarounds. With Ubuntu there are usually workarounds in the Ubuntu help wiki.

4. CPU overload. Some process is hogging the CPU and slowing everything down to a crawl. This is more common with single cores but can still happen with multicores due to memory bottlenecks. If you can't get a graphical process management tool like Gnome System Monitor to load then switch to a terminal and use the "top" command to see who the culprit is (probably Flash, but X.org driver faults can cause overloads without an outright freeze or crash). Identify it by process ID and kill it using top's built-in kill option (press k). You can also list processes with "ps -A" then use "kill -s <signal> <process number>". If there are multiple instances of the same process then use "killall -s <signal> <process name>". The signal is 15 (terminate) by default which means "ask nicely". If that doesn't work then use 9 (kill) which isn't as friendly.
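The same sequence can be sketched from the shell, with a harmless "sleep" standing in for the runaway process:

```shell
# Start a stand-in "runaway" process in the background:
sleep 300 &
pid=$!
ps -p "$pid" -o pid,comm      # confirm it's running
kill -s TERM "$pid"           # signal 15 (terminate): ask nicely
# kill -s KILL "$pid"         # signal 9: the unfriendly last resort
wait "$pid" 2>/dev/null || true   # reap it; exit status reflects the signal
```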

5. I/O overload. Something is hogging the hard drive/SSD, which can slow everything down to the point of being non-responsive. You can usually identify this by the hard drive activity LED being lit continuously. You can narrow down the list of processes responsible with "lsof | less", but you'll find the output can be overwhelming. If you know which file is involved then you can identify the process responsible with fuser. Interactions between Firefox's database and the EXT filesystem can cause intermittent I/O overload, but this happens less often with newer versions. A lack of storage space can also cause it if applications trying to write to the disk don't handle failed writes well, especially logs and temporary files. They may hang and start hogging the CPU as well. To check available storage space use "df -Th".
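Checking free space is easy to script. This one-liner uses df's portable -P output format, where field 5 is "Use%" and field 6 the mount point, and prints any filesystem that is over 90% full (the threshold is arbitrary):

```shell
# List filesystems over 90% full; prints nothing when all is well.
df -P | awk 'NR > 1 { sub(/%/, "", $5); if ($5 + 0 > 90) print $6, $5 "%" }'
```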

6. Memory hogging. Some process is eating memory and increasing in size. If the RAM is used up then the swap partition is used which can manifest itself as #5 also. Eventually the system runs out of memory and the kernel starts killing processes to fix it. Identifying the culprit is essentially the same as for CPU hogs. Use the command "free" to check memory and swap usage.
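The numbers "free" reports come straight from /proc/meminfo, and ps can rank the likely culprits by resident memory (the --sort option is from GNU procps, as shipped by Ubuntu):

```shell
# Totals and free amounts for RAM and swap, as read by "free":
awk '/^MemTotal:|^MemFree:|^SwapTotal:|^SwapFree:/ { print $1, $2, $3 }' /proc/meminfo
# The five biggest memory consumers by resident set size (KB):
ps -eo pid,rss,comm --sort=-rss | head -5
```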

This is only the start of the diagnosis. Once you identify the source of the problem then you can try to find a workaround, file bug reports, and test patches. This all seems rather complicated but after you've fixed a few dozen systems you eventually recognize specific symptoms and behavior patterns right away and can quickly narrow down the problem. What differentiates real technicians and hackers from the amateurs is the stubborn resolve to find the problem.


Duplicating subsets of package selections between systems

I'm building a pair of Ubuntu systems for kids. For a variety of reasons, including lack of time and hardware problems, this has taken far longer than expected and I ended up with one running 9.10 (Karmic Koala) and the other 10.04 (Lucid Lynx). Since the Karmic system has had the most testing effort put into it (reviewing games for stability and kid-appropriateness), I needed a way to duplicate the selection of games on the Lucid system while filtering out everything else, since some packages are release-specific.

Using Synaptic it is possible to export (File > Save Markings) either a list of packages that have changed selection state but the changes haven't been applied yet or optionally a list of all packages and their status. While Synaptic can filter displayed packages by repository or type (the "section" parameter in the deb INFO file), these have no effect on the markings export.

The apt-get command (or dpkg directly) can be used to create a full list of packages but to produce a filtered list you need to use aptitude.

Aptitude has a search function that can be used to filter based on almost anything in a deb package INFO file. For example, to get a list of games available and save it to a game_selections.txt file:

aptitude search '?section(games)' >game_selections.txt

The question marks in the search pattern indicate that a search term follows. Because the parentheses are special characters to the shell, the pattern needs quotes around it. In this example "section" indicates the section parameter and "games" is the term being searched for. Any package with a section parameter set to "games" will be listed:

i chromium-bsu - fast paced, arcade-style, scrolling space
i A chromium-bsu-data - data pack for the Chromium B.S.U. game
p chromium-data - transitional dummy package for chromium-bs
p circuslinux - The clowns are trying to pop balloons to score points!
p circuslinux-data - data files for circuslinux

The leading characters indicate the package state, with "i" for installed, "i A" for automatically installed (either recommended or a dependency), and "p" for purged (i.e. no trace of its existence on the system, which is the default state of all packages). To filter for installed packages only, add the "?installed" term:

aptitude search '?section(games) ?installed' >game_selections.txt

i chromium-bsu - fast paced, arcade-style, scrolling space shooter
i A chromium-bsu-data - data pack for the Chromium B.S.U. game
i glchess - Chess strategy game
i glines - Five or More puzzle game
i gnect - Four in a Row strategy game
i gnibbles - Worm arcade game
i gnobots2 - Avoid robots game
i gnome-blackjack - Blackjack casino card game

The status and descriptions will cause syntax errors when using the file as input to aptitude so a format needs to be specified to filter them out. The "-F" parameter is used to indicate a format change and "%p" equates to the package name:

aptitude search -F '%p' '?section(games) ?installed' >game_selections.txt


To use game_selections.txt as input to aptitude on another system just use command substitution (see the sh or bash man page) to redirect it from the shell:

aptitude install $(< game_selections.txt)

On Ubuntu you need to use sudo in front of this command or use "sudo su" to create a root shell and issue it from there.
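The expansion can be seen safely with echo standing in for aptitude; each line of the file becomes a separate argument. Here "$(cat file)" is used as the portable equivalent of bash's "$(< file)" shortcut, and the package names are just examples written to a throwaway file:

```shell
# Build a small stand-in selections file:
printf '%s\n' glchess glines gnect > /tmp/game_selections.txt
# echo shows the command that would actually run:
echo aptitude install $(cat /tmp/game_selections.txt)
# -> aptitude install glchess glines gnect
```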

About Me

Omnifarious Implementer = I do just about everything. With my usual occupations this means anything an electrical engineer does not feel like doing including PCB design, electronic troubleshooting and repair, part sourcing, inventory control, enclosure machining, label design, PC support, network administration, plant maintenance, janitorial, etc. Non-occupational includes residential plumbing, heating, electrical, farming, automotive and small engine repair. There is plenty more but you get the idea.