Diary and notebook of whatever tech problems are irritating me at the moment.

20080517

Setting up a local repository with debmirror

I set up a lot of PCs and while I have a fast 10 Mb/s Internet connection I wanted to make better use of my faster internal network bandwidth. With a new distro release it's less important as most of what I need is on the CD, but as updates are released I end up downloading increasing amounts of data for each install. I've been doing lazy tricks like copying /var/cache/apt/archives to a network-shared directory but it's sloppy and multiple versions of packages accumulate. Setting up a local repository was the answer for me.

You use debmirror to create a local repository for Ubuntu and Debian systems. Instead of duplicating an entire repository server you can select by release (feisty, gutsy, hardy, plus pockets like hardy-backports), section (main, restricted, universe, multiverse), architecture (i386, amd64), and regular expressions.

An alternative to creating a mirror is to create a caching proxy using apt-cacher. The advantages of one over the other depend on how similar the package selection of each client is. Caching is more efficient for serving similar systems, better at handling limited storage space on the server, and often has an earlier data transfer break-even point (the point where the upstream data transfer saved exceeds the extra transfer the setup itself costs). Depending on what packages are stored locally, a mirror is more efficient with diverse systems but you have to plan out the space requirements beforehand. The data transfer break-even point will take much longer to reach as many unneeded packages will be transferred unless you are very selective about which portions of the repository to mirror. With apt-cacher there can be less latency for a newly published package, as the first request retrieves it immediately and later requests are served from the cache, while debmirror updates are usually controlled via a cron job.

Currently both debmirror and apt-cacher require the clients to be configured to use the new source so there is no administrative savings there. But apt-cacher does have the potential to support an intercepting (a.k.a. transparent) proxy configuration if Debian bug #352140 is resolved, which would eliminate client configuration.

Because I had space and rather diverse client requirements I went with the mirror approach using debmirror. I relied on several sources for information, especially BobSongs' How To, but he didn't include some of the third-party sources I needed and he disabled some of the secure apt checks. Not every repository uses these security features but I prefer to err on the side of caution.

First, you need to install debmirror using "apt-get install debmirror", aptitude, Synaptic, or Adept. Next set up a location to store and share the mirror. I set the root of mine to "/srv/public/linux/distributions/Ubuntu/mirror". I also have ISOs and other files in this tree which explains the depth. The "/srv" directory is where the FHS recommends putting served data but you may prefer to dump it somewhere in /var. Then you select the repositories to mirror and create a shell script to run debmirror. You can create a cron job to run it daily to stay updated. Finally, to share the mirror you can use anything that apt-get supports. Refer to "man sources.list" for the options. I'm not going to duplicate the many HOW-TOs on setting up servers here.
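On the client side, the change amounts to pointing sources.list at the mirror. Here is a minimal sketch that prints the lines for the general Ubuntu portion, assuming the mirror is served over http. The address "mirrorhost.example/ubuntu-mirror" and the function name mirror_sources are placeholders I made up for illustration, so substitute your own server and path.

```shell
#!/bin/sh
# Print sources.list lines for a client that uses the local mirror.
# MIRROR_URL is a hypothetical address; replace it with your own server.
MIRROR_URL="${MIRROR_URL:-http://mirrorhost.example/ubuntu-mirror}"

mirror_sources() {
    # $1 = base URL of the directory that holds the "ubuntu" mirror
    for dist in hardy hardy-security hardy-updates hardy-backports; do
        printf 'deb %s/ubuntu %s main restricted universe multiverse\n' "$1" "$dist"
    done
}

mirror_sources "$MIRROR_URL"
```

The output can be pasted into /etc/apt/sources.list on each client in place of the stock archive lines. Any third-party repositories you mirror would need similar lines of their own.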

For my systems I needed the Hardy i386 and amd64 versions of packages in the general Ubuntu, Medibuntu, Wine, Google, and Skype repositories. First you need to set up a key ring for debmirror which defaults to "~/.gnupg/trustedkeys.gpg". On systems like Ubuntu that disable direct root logins and use "sudo" instead, I create a special "administrator" account which is in the admin group and has a high-strength password (since the password provides root access). Within that account I created the keyring "/home/administrator/keyrings/mirrorkeyring/trustedkeys.gpg" using the following command to import the Ubuntu archive keys:

gpg --no-default-keyring --keyring /home/administrator/keyrings/mirrorkeyring/trustedkeys.gpg --import /usr/share/keyrings/ubuntu-archive-keyring.gpg

To this I added the other keys:

wget -q http://packages.medibuntu.org/medibuntu-key.gpg -O - | gpg --no-default-keyring --keyring /home/administrator/keyrings/mirrorkeyring/trustedkeys.gpg --import

wget -q http://wine.budgetdedicated.com/apt/387EE263.gpg -O - | gpg --no-default-keyring --keyring /home/administrator/keyrings/mirrorkeyring/trustedkeys.gpg --import

wget -q https://dl-ssl.google.com/linux/linux_signing_key.pub -O - | gpg --no-default-keyring --keyring /home/administrator/keyrings/mirrorkeyring/trustedkeys.gpg --import

gpg --no-default-keyring --keyring /home/administrator/keyrings/mirrorkeyring/trustedkeys.gpg --import rpm-public-key.asc

The last key is for Skype. Their Linux support is minimal and the repository is kind of a mess. They moved the key on their server (or lost it) but I had a copy. The MD5 hash of my key (md5sum rpm-public-key.asc) is 2f595c0efe5d26fb4909f3347670746d and you can get a copy from this link.
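If you get the Skype key from somewhere else, it's worth comparing its hash with the one above before importing it. A minimal sketch, assuming md5sum is available (it is on any Ubuntu install). The helper name verify_md5 is my own invention for this example.

```shell
#!/bin/sh
# Compare a file's MD5 hash with an expected value.
# verify_md5 is a made-up helper name for this sketch.
verify_md5() {
    # $1 = file to check, $2 = expected MD5 hex digest
    actual=$(md5sum "$1" | cut -d' ' -f1)
    [ "$actual" = "$2" ]
}

# Usage (not run here, it needs the key file and the keyring):
#   KEYRING=/home/administrator/keyrings/mirrorkeyring/trustedkeys.gpg
#   verify_md5 rpm-public-key.asc 2f595c0efe5d26fb4909f3347670746d &&
#       gpg --no-default-keyring --keyring "$KEYRING" --import rpm-public-key.asc
```

The function returns success only when the hashes match, so chaining it with && keeps a mismatched key out of the keyring.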

Next I created my debmirror-hardy.sh script and put it in /usr/local/bin. Most of my parameters are explicitly defined on the debmirror command line although you could create a configuration file instead (the default is /usr/share/doc/debmirror/debmirror.conf). I specify the parameters in the same order as the corresponding directories appear in the repository path. I used rsync with the main Ubuntu repositories as it is supposedly faster but none of the others support it so they use http. Notice that the root for an rsync server is specified with a preceding colon (:). The "md5sums" parameter adds MD5 checking but you may want to skip it to speed up the mirror process. The "nosource" parameter skips source packages as the only time I need them is when I compile something outside of the distro and even then I only need the headers. I do compile Wine to perform testing on my primary system but I get it straight from the source tree using git. The "progress" option shows a download progress meter and I tee everything to the console so I can watch if I'm bored. It also creates a couple of logs in /var/log and compresses the old ones to save space.

#!/bin/sh
# debmirror script v1.1 for Ubuntu Hardy Heron
# Copyright 2008 Jeff D. Hanson (jhansonxi@gmail.com)
# Released under GNU General Public License version 3
# v1.1 - added debian-installer section, post chown/chmod
# fix, size summary, date/time

DEBMLOG=/var/log/debmirror.log
MIRRORDIR=/srv/linux/distributions/Ubuntu/mirror
export GNUPGHOME=/home/administrator/keyrings/mirrorkeyring

# Rotate the logs if the current one is not empty
if test -s $DEBMLOG
then
    test -f $DEBMLOG.3.gz && mv $DEBMLOG.3.gz $DEBMLOG.4.gz
    test -f $DEBMLOG.2.gz && mv $DEBMLOG.2.gz $DEBMLOG.3.gz
    test -f $DEBMLOG.1.gz && mv $DEBMLOG.1.gz $DEBMLOG.2.gz
    test -f $DEBMLOG.0 && mv $DEBMLOG.0 $DEBMLOG.1 && gzip $DEBMLOG.1
    mv $DEBMLOG $DEBMLOG.0
    cp /dev/null $DEBMLOG
    chmod 640 $DEBMLOG
fi

# Record the current date/time
date 2>&1 | tee -a $DEBMLOG

# Ubuntu mother lode. At least it supports rsync.
echo "\n*** Ubuntu general ***\n" 2>&1 | tee -a $DEBMLOG
debmirror --nosource --method=rsync --md5sums --progress \
--host=us.archive.ubuntu.com \
--root=:ubuntu \
--dist=hardy,hardy-security,hardy-updates,hardy-backports \
--section=main,main/debian-installer,restricted,restricted/debian-installer,\
universe,universe/debian-installer,multiverse,multiverse/debian-installer \
--arch=i386,amd64 \
$MIRRORDIR/ubuntu \
2>&1 | tee -a $DEBMLOG

# Canonical's rather lonely partners repo
echo "\n*** Canonical partners ***\n" 2>&1 | tee -a $DEBMLOG
debmirror --nosource --method=http --md5sums --progress \
--host=archive.canonical.com \
--root=/ \
--dist=hardy,hardy-backports,hardy-proposed,hardy-security,hardy-updates \
--section=partner \
--arch=i386,amd64 \
$MIRRORDIR/canonical \
2>&1 | tee -a $DEBMLOG

# Medibuntu fun stuff
echo "\n*** Medibuntu ***\n" 2>&1 | tee -a $DEBMLOG
debmirror --nosource --method=http --md5sums --progress \
--host=packages.medibuntu.org \
--root=/ \
--dist=hardy \
--section=free,non-free \
--arch=i386,amd64 \
$MIRRORDIR/medibuntu \
2>&1 | tee -a $DEBMLOG

# Wine's latest bugs
echo "\n*** Wine ***\n" 2>&1 | tee -a $DEBMLOG
debmirror --nosource --method=http --md5sums --progress \
--host=wine.budgetdedicated.com \
--root=/apt \
--dist=hardy \
--section=main \
--arch=i386,amd64 \
$MIRRORDIR/wine \
2>&1 | tee -a $DEBMLOG

# Our friends at Google. Including a leading / in the root causes failure.
echo "\n*** Google ***\n" 2>&1 | tee -a $DEBMLOG
debmirror --nosource --method=http --md5sums --progress \
--host=dl.google.com \
--root=linux/deb \
--dist=stable \
--section=main,non-free \
--arch=i386,amd64 \
$MIRRORDIR/google \
2>&1 | tee -a $DEBMLOG

# Skype's half-baked linux contribution. Located in a half-baked repository.
echo "\n*** Skype ***\n" 2>&1 | tee -a $DEBMLOG
debmirror --nosource --method=http --md5sums --progress --ignore-release-gpg --ignore-missing-release \
--host=download.skype.com \
--root=/linux/repos/debian \
--dist=stable \
--section=non-free \
--arch=i386 \
$MIRRORDIR/skype \
2>&1 | tee -a $DEBMLOG

echo "\n*** Fixing ownership ***\n" 2>&1 | tee -a $DEBMLOG
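# The parentheses make -exec apply to directories as well as files
find $MIRRORDIR \( -type d -o -type f \) -exec chown root:root '{}' \; \
2>&1 | tee -a $DEBMLOG

echo "\n*** Fixing permissions ***\n" 2>&1 | tee -a $DEBMLOG
# The parentheses make -exec apply to directories as well as files
find $MIRRORDIR \( -type d -o -type f \) -exec chmod u+rw,g+rw,o+r-w '{}' \; \
2>&1 | tee -a $DEBMLOG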

echo "\n*** Mirror size ***\n" 2>&1 | tee -a $DEBMLOG
du -hs $MIRRORDIR 2>&1 | tee -a $DEBMLOG

# Record the current date/time
date 2>&1 | tee -a $DEBMLOG

This works very well so far but it took a lot of time to figure out. One thing I noticed is that apt-get handles some repository structures better than debmirror. Google's repository had an oddity, possibly due to a redirect, that kept debmirror from finding the Release file or detached *.gpg signature unless I left the leading / off the root parameter. Skype's repository has a Release file but not where debmirror could find it. They don't sign it either, which is why that section of the script uses the --ignore-release-gpg and --ignore-missing-release options.

UPDATE: I've made some changes to the script. I've been having fun with PXELINUX and performing Ubuntu installs by netbooting. This required the addition of the debian-installer portion of the repositories. I also added date/time stamps and a final size check (about 37GB for everything so far). One problem I haven't found the solution for is that when I put the script in /etc/cron.daily it doesn't run.

UPDATE2: Thanks to the comment by sq5nbg I figured out the problem with cron.daily. The crontab entry for it uses run-parts to run the executables in the directory. According to its man page, run-parts is picky about the file names it will accept and a period is not a valid character. You either have to rename the file or symlink to it. The run-parts utility is in debianutils and bug #38022 reports this issue. It's marked as a wishlist item since the restriction is documented in the man page. I added a note about this to the cron page in the Ubuntu Wiki.
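To make the restriction concrete, here is a small sketch that applies the default run-parts naming rule (letters, digits, underscores, and hyphens only) to a file name. The function name runparts_name_ok is made up for this example and is not part of debianutils.

```shell
#!/bin/sh
# Return success if a file name would be accepted by run-parts in its
# default mode: ASCII letters, digits, underscores, and hyphens only.
# runparts_name_ok is a made-up helper name for this sketch.
runparts_name_ok() {
    case "$1" in
        ''|*[!A-Za-z0-9_-]*) return 1 ;;
        *) return 0 ;;
    esac
}

for name in debmirror-hardy.sh debmirror-hardy; do
    if runparts_name_ok "$name"; then
        echo "$name: accepted"
    else
        echo "$name: skipped"
    fi
done
```

With the script left in /usr/local/bin, a symlink such as "ln -s /usr/local/bin/debmirror-hardy.sh /etc/cron.daily/debmirror-hardy" gets around the problem, and "run-parts --test /etc/cron.daily" lists what would actually be run without running anything.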

I need to point out that the script should be edited to use a server nearest (in Internet terms) to you instead of the ones specified. This especially applies to the Ubuntu mirror (us.archive.ubuntu.com). Use the Ubuntu mirror list page to find one that has the packages and protocol you want. This reduces the load on the primary servers.

UPDATE3: You can use your local mirror with the Minimal CD to install Ubuntu on systems that don't support network booting. First, set up a server that provides access to the mirror directory. I used Apache2 to serve it via http and put a link to my debmirror directory in "/var/www". If you are using an http server you should be able to navigate the debmirror directories using any web browser. If you can't see them then the installer won't either.

After setting up the server, boot the CD and proceed as you normally would through the boot settings and locale selection. After specifying the network configuration and hostname you will see the "Choose a mirror of the Ubuntu archive" screen where it wants you to select the "Ubuntu archive mirror country". Hit the Home key to jump to the top of the list and select the "enter information manually" option. For the "Ubuntu archive mirror hostname" enter your server's hostname, FQDN, or IP address. Do not specify a protocol prefix (http://) or any directory path on that screen. I'm not sure if the installer tries all protocols, defaults to http, or guesses based on a specified port number but I didn't have to tell it what to use. On the next screen, enter the "Ubuntu archive mirror directory" with the full server path to the directory containing the dists, pool, and project directories. If you do it wrong it won't be able to find the "Release" file and you will get a "bad archive mirror" error.
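A quick way to check the hostname and directory before booting the CD is to request the Release file yourself. This is a sketch that assumes the mirror is served over http. The helper name release_url and the host "mirrorhost.example" are placeholders I made up for illustration.

```shell
#!/bin/sh
# Build the URL of the Release file the installer will look for.
# release_url is a made-up helper; the host and directory are placeholders.
release_url() {
    # $1 = hostname, $2 = mirror directory as typed in the installer, $3 = dist
    dir=${2#/}      # drop one leading slash, if present
    dir=${dir%/}    # drop one trailing slash, if present
    printf 'http://%s/%s/dists/%s/Release\n' "$1" "$dir" "$3"
}

# Usage (needs network access to your own server):
#   wget -q --spider "$(release_url mirrorhost.example /mirror/ubuntu hardy)" \
#       && echo "Release file found"
```

If wget can't find the file at that URL, the installer will most likely report the "bad archive mirror" error for the same hostname and directory.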

2 comments:

Unknown said...

As I remember, I have a problem with cron with some python script. When I removed the '.py' extension from script name it started to work. Please try to remove '.sh' extension from script name. I believe it could help.

jhansonxi said...

Thanks for the info. I tracked it down to a file name restriction in run-parts, which processes the directories. Its man page lists its picky file name restrictions.
