Diary and notebook of whatever tech problems are irritating me at the moment.

20110117

Expanding Ubuntu Recovery Mode

Recovery Mode is a text-based interface to a few quick repair tools that is installed by default with most Ubuntu releases and derivatives. I wrote a few add-ons for it that increase its usefulness in remote repair and diagnostics situations. These were developed and tested on Ubuntu 10.04 (Lucid Lynx).

Starting Ubuntu in Recovery Mode (a.k.a. Friendly Recovery) is relatively easy. Hold down the Shift key after the BIOS POST to get GRUB 2 to show its menu, then select the kernel entry with the "recovery" option. Also note the memtest86+ option, which is useful for identifying bad RAM.

Adding on to Recovery Mode is relatively simple. At its heart is a shell script, "/usr/share/recovery-mode/recovery-menu", that is started at the end of the single-user mode (runlevel S) boot. It looks through the "options" subdirectory and runs every script it finds, passing it a parameter of "test". It expects a return status of 0 and the description of the script on stdout. Scripts with valid responses are collected and shown in a menu using the whiptail dialog tool. The user selects an entry from the menu to execute it.
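As a rough illustration of that protocol, here is a minimal sketch of an option script. The script's purpose, the function name option_main, and the description text are all made up for this example, and it assumes that any argument other than "test" means the entry was selected from the menu.

```shell
#!/bin/sh
# Hypothetical option script, e.g. saved in the options subdirectory.
# recovery-menu first calls it with "test": a zero status plus a
# description on stdout gets the entry listed in the menu.

option_main() {
    if [ "${1:-}" = "test" ]; then
        echo "Show kernel version"   # description for the menu
        return 0
    fi
    # Assumption: anything other than "test" means "do the work".
    uname -a
}

option_main "$@"
```

The script also needs execute permission, otherwise the menu won't be able to start it.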

My additions are more informative than corrective. The intention is to help with diagnostics when dealing with a remote non-technical client. They are also useful for beginners who lack command-line experience and simply don't know where to look for system status information.

Many of my scripts check their respective system configuration and return a non-zero status if the required executables are not installed or configured. This keeps the menu from getting cluttered. For example, the sensors script checks for output from the sensors command. A lack of output indicates that the hardware sensors haven't been configured with sensors-detect or that the required modules haven't been added to /etc/modules. When this happens, the script does an exit 1 when started with the test parameter. The ddclient script looks for run_daemon="true" in /etc/default/ddclient and the presence of the ddclient executable. The ssh script looks for the sshd process, and its description changes depending on whether it is found. If you write your own, the only limitation to keep in mind is that the description returned should be 45 characters or fewer, as longer ones will corrupt the whiptail display.
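The same idea can be sketched as a small guard for your own scripts. This is a simplified illustration, not what my scripts literally contain: check_option and MAX_DESC are hypothetical names, and it only checks that the executable is on the PATH rather than inspecting its output or configuration.

```shell
#!/bin/sh
# Sketch only: check_option and MAX_DESC are hypothetical names,
# not part of the stock recovery-mode package.
MAX_DESC=45   # longer descriptions corrupt the whiptail menu

check_option() {
    # $1 = required executable, $2 = menu description
    command -v "$1" >/dev/null 2>&1 || return 1   # tool not installed
    [ "${#2}" -le "$MAX_DESC" ] || return 1       # description too long
    echo "$2"
}

if [ "${1:-}" = "test" ]; then
    # A non-zero exit keeps the entry out of the menu.
    check_option sensors "Hardware sensor readings" || exit 1
    exit 0
fi
```

Failing the length check in the test phase means an overlong description hides the entry instead of garbling the whole menu.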

Some of the scripts deserve special attention:

shallablud: works with my shall-bl-update v1.3 or later. It forces an update to the Shalla blacklists for DansGuardian.

lynx: requires the Lynx text browser. It does a su to the default admin user (the first one listed in the admin group) before starting. It defaults to the DynDNS.com check IP page. I used Lynx because it has lockdown options (preventing shell escapes, etc.) that other text browsers don't offer.

wicd: requires wicd-curses. While the netroot script already provides network activation before switching to a shell, it just starts dhclient to get an IP address and nothing else. That network activation was something I requested back in Hardy. It's better than nothing but is rather useless if you only have a wireless connection. Wicd solves that problem but creates another: it conflicts with Network Manager. Luckily the packages themselves don't conflict on Lucid, but the daemons do. The script will stop Network Manager before starting wicd-curses (which starts the wicd daemon). To keep the conflict from happening when starting wicd from a root shell, you need to either stop Network Manager first or modify its Upstart job configuration to keep it from starting in recovery mode (runlevel S). The conf file also needs to be diverted by dpkg to keep it from being overwritten on updates (which would revert the changes). The commands to do this are:

dpkg-divert --rename --divert /etc/init/network-manager.conf.original /etc/init/network-manager.conf
cp /etc/init/network-manager.conf.original /etc/init/network-manager.conf
sed -i 's/\(.*and started dbus\)\().*\)/\1\n\t and runlevel [!S]\2/' /etc/init/network-manager.conf

You need to either put sudo in front of each of these or open a root terminal with "sudo su". The divert tells dpkg to rename the file and to always redirect the packaged version to "network-manager.conf.original" on future installations. The file is then copied back to use as a template. The sed expression then adds a condition to not start in runlevel S.
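To see what the sed expression does without touching the real file, it can be run against a sample stanza. The stanza below is only my assumption of what the "start on" condition looks like; check it against your own /etc/init/network-manager.conf. The \n and \t escapes in the replacement rely on GNU sed.

```shell
# Assumed sample of the "start on" stanza; the real file may differ.
sample=$(printf 'start on (local-filesystems\n\t  and started dbus)\nstop on stopping dbus\n')

# Same expression as above, applied to stdin instead of editing in place.
patched=$(printf '%s\n' "$sample" |
    sed 's/\(.*and started dbus\)\().*\)/\1\n\t and runlevel [!S]\2/')

printf '%s\n' "$patched"
```

If the patched stanza doesn't end with the new runlevel condition just before the closing parenthesis, the expression didn't match and the real file would be left unchanged too.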

This only solves half of the problem. The Wicd daemon still needs to be prevented from starting during regular operation (runlevel 2) unless you plan to use it instead of Network Manager. Wicd hasn't been converted to Upstart yet, so it's still using init scripts. To disable it, run:

mv /etc/rc2.d/S20wicd /etc/rc2.d/K80wicd

This by itself is not enough. If wicd-gtk is installed, it will start when the desktop loads and will start the daemon if it is not active. You need to purge it with aptitude or apt-get. In addition, something else (I haven't identified what) will also start the wicd daemon. The only fix I've found is to change the wicd executable, which is just a script that starts the daemon with Python, so that it does nothing unless the runlevel is S (single-user mode). These commands will make the change:

dpkg-divert --rename --divert /usr/sbin/wicd.original /usr/sbin/wicd
cp /usr/sbin/wicd.original /usr/sbin/wicd
sed -i 's/\([[:space:]]*exec[[:space:]]\+.*\)/[ \"$RUNLEVEL\" = \"S\" ] \&\& \1/' /usr/sbin/wicd

If you make this change, you won't have to disable the init script. You will also have to fix the AppArmor profile for dhclient so that wicd can use it (bug #588635). Just add the text from the bug report before the entry for Network Manager.
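The sed expression for /usr/sbin/wicd can be tried on a sample first. The wrapper contents below, including the path to the daemon, are an assumption for illustration, so compare them with your own /usr/sbin/wicd.original. The \+ in the pattern relies on GNU sed.

```shell
# Assumed contents of the wrapper; the real /usr/sbin/wicd may differ.
wrapper=$(printf '#!/bin/bash\nexec /usr/bin/python -O /usr/share/wicd/daemon/wicd-daemon.py $@\n')

# Same expression as above, applied to stdin instead of editing in place.
guarded=$(printf '%s\n' "$wrapper" |
    sed 's/\([[:space:]]*exec[[:space:]]\+.*\)/[ \"$RUNLEVEL\" = \"S\" ] \&\& \1/')

printf '%s\n' "$guarded"
```

The exec line should come out prefixed with a test of $RUNLEVEL, so the daemon is only started when that variable is set to S.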

One option that isn't listed in the menu is "fsck". This is easy to fix as the script just needs execute permission (bug #566200).

Currently the "resume" option doesn't function (bug #651782).

If you want to prevent the "root" and "netroot" options from providing an uncontested root prompt, try my rootlock.

Consider a theoretical example of how this all works with a remote user. They have a problem with X not starting and contact you. They are a considerable distance away and don't have time to ship their PC to you for repair. The system is bootable and they have high-speed Internet, so remote access is possible. You tell them how to enter Recovery Mode and how to start wicd. It automatically gets an IP address over a wired connection, but if they are using wireless they have to select an AP from whatever wicd finds. If they normally use Network Manager and their wireless connection is encrypted, you will have to set it up in wicd beforehand, as SSIDs and keys aren't shared with Network Manager (or with the root account, which is the one being used here).

If they have a dynamic WAN IP address, you have them start ddclient (which also needs to have been configured beforehand) or start Lynx and read you the WAN IP from DynDNS.com. Then they can start sshd. At this point you should be able to access the system remotely over SSH, assuming that any intervening firewall/NAT routers are forwarding the correct ports. Obviously you should be using key-based authentication with SSH, not passwords. If you can't access it remotely, you can still have them perform updates with the dpkg option (which also performs an upgrade), fix the X configuration with failsafeX, or read you the root mail, SMART drive status, and sensor readings (if configured).

Obviously many problems can't be fixed this way but if it saves you a road trip or two it's worth it.

Update: I filed bug #706145 to get these into Ubuntu. Following the normal submit/reject/resubmit/ignore cycle it should be in the repositories within a few years.
