Diary and notebook of whatever tech problems are irritating me at the moment.

20101002

A patch for Linux Scanner Server v1.2 Beta1

I just spent several days testing, fixing bugs, and adding features to Linux Scanner Server v1.2 Beta1. LSS is an easy way to share a non-networkable scanner through a web server. While the interface doesn't allow cropping like Xsane or Simple Scan, it does support multiple file outputs, printing, and OCR through Tesseract. Development has stalled at the beta, and I encountered some bugs when testing it on Ubuntu 10.04 (Lucid Lynx). Instead of complaining about them, I fixed them.

Bugs fixed/features added:
Noise in Apache logs caused by unquoted variables and non-critical stderr outputs from ls and rm.
Adding scanners would fail if the scanner name included a forward slash.
Multiple scanner support broken due to a lack of newlines between entries in the scanner.conf file.
No support for scanners connected via parallel-ports.

I wanted to try to break the beta before deploying it to my clients and I did - as soon as I connected a second scanner. I decided to fix the bug even though none of my clients have more than one scanner attached to any given system. I just happen to have a bunch of them on hand and, thanks to a local tech recycling center, I added a few more. Sane supports scanners connected to parallel ports but LSS doesn't, so I decided to fix that, well, just because. Yes - I went out and paid money for more scanners, including an obsolete parallel port Mustek model, just to fix LSS.

The deciding factor in doing this was that LSS is based on a shell script and a lot of sed scripts. Shell scripting is about the only programming language I know to any depth (besides Applesoft BASIC). Some of the regex/sed stuff still throws me but I had help from some of my LUG mates. These are the scanners I tested with (and tested simultaneously):

AGFA SnapScan 1212U (snapscan:libusb:002:003)
Brother Industries MFC-440CN (brother2:bus4;dev1)
Hewlett-Packard ScanJet ADF (C7190A, identified as 5200C) (hp:libusb:004:002)
Hewlett-Packard ScanJet 4470c (rts8891:libusb:004:003)
Hewlett-Packard ScanJet 6100c (C6260A but identified as C2520A) (hp:/dev/sg5)
Microtek ScanMaker E3 (microtek:/dev/sg3)
Mustek 600 III EP Plus (/dev/parport0)
UMAX Vista-S8 (umax:/dev/sg4)

Fixing the multiple scanner support was a pain. LSS relies on scanimage for all scanner functions. Getting scanimage to provide a newline at the end of the device list was trivial but the message printing function for the web page doesn't tolerate them and they all have to be converted to HTML breaks. Forward slashes in the model names from scanimage also required escaping but not anywhere else (like in the device paths). This got into sed loops which are really hard to do.
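Here is a rough sketch of the two transformations described above. It is not the code from the patch; the function names and sample strings are made up for illustration. The first helper escapes forward slashes so a model name can be dropped into a sed expression, and the second uses the usual sed label-and-branch loop to turn newlines into HTML breaks.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative and not taken from LSS.

# Escape every forward slash so the string is safe inside a s/.../.../ expression.
# Using | as the delimiter keeps the escaping readable.
escape_slashes() {
    printf '%s\n' "$1" | sed 's|/|\\/|g'
}

# Join all input lines with <br>.
# :a defines a label, N appends the next line to the pattern space,
# $!ba branches back to the label until the last line has been read,
# and the final substitution replaces each embedded newline.
newlines_to_br() {
    sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/<br>/g'
}
```

The loop is needed because sed normally works one line at a time and never sees the newline between two lines unless they are first pulled into the same pattern space.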

Adding support for scanners on parallel ports was also difficult. They have to be defined manually in the sane config files (/etc/sane.d/*.conf) but scanimage doesn't report them regardless. The sane-find-scanner utility does find them and will indicate what brand is on which port but no additional details like the model name. Since sane can use auto-probing to find which parallel port the scanner is on there is no deterministic way to use the information from sane-find-scanner and the sane conf files to indicate a specific model. The only solution I could come up with is to manually specify parallel port scanners in a separate "scan/config/manual_scanners.conf" file and then merge it after the rest are detected. The format is the same as for scanners.conf but the value for ID needs to be specified as %i (same as the device entry in the format line for scanimage). The modified LSS index.cgi script will replace it with an auto-incremented value when merging. The NAME= value doesn't matter but forward slashes have to be escaped with backslashes and anything longer than 30 characters will be truncated in the pull-down list on the Scan Image page.
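The merge step can be sketched roughly like this. It is only an illustration and not the code in the modified index.cgi. It assumes one scanner entry per line and IDs that start at 0 (as the %i device index from scanimage does), so the number of detected lines is the next free ID. The function name and the exact line layout in the usage example are assumptions.

```shell
#!/bin/sh
# merge_manual DETECTED MANUAL
# Prints the detected entries, then each non-empty manual entry with the
# literal %i replaced by the next free ID.
# Assumption: one entry per line, IDs numbered from 0.
merge_manual() {
    detected=$1
    manual=$2
    # grep -c '' counts lines, including a last line without a trailing newline
    next_id=$(grep -c '' "$detected")
    # awk re-terminates every line, so a missing final newline cannot glue entries together
    awk '{ print }' "$detected"
    while IFS= read -r line || [ -n "$line" ]; do
        [ -z "$line" ] && continue
        # read -r keeps the backslashes that escape slashes in NAME=
        printf '%s\n' "$line" | sed "s/%i/$next_id/"
        next_id=$((next_id + 1))
    done < "$manual"
}
```

With two detected scanners, a manual line such as "ID=%i;DEVICE=mustek_pp:Mustek-600-IIIEP;NAME=Mustek 600 III EP Plus" (a made-up layout) would come out with ID=2.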

Setting up a parallel port scanner is a bit confusing. The Mustek model I used was configured in /etc/sane.d/mustek_pp.conf simply by uncommenting the line "scanner Mustek-600-IIIEP * ccd300". The second parameter is the name. The third is the port with an * indicating autoprobing which in my case became /dev/parport0. The last is the actual driver. With scanimage the device is not specified by the port but rather the backend driver and then the name. With the settings I used it became "mustek_pp:Mustek-600-IIIEP" (also specified for the "DEVICE=" value in manual_scanners.conf). If only the backend is specified scanimage will default to whichever is enabled in the conf file. I only have the one parallel port scanner (the ScanJet 4470c has USB and parallel but there's no driver for the latter) so I don't know how it handles multiple ones configured in the same file/backend.
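To make the naming rule concrete, here is a small sketch that reads a parallel port backend conf file and prints the device names scanimage would expect, built as backend, colon, name. The function name is hypothetical and it only looks at uncommented lines that start with "scanner".

```shell
#!/bin/sh
# pp_device_names CONF BACKEND
# For every active "scanner NAME PORT DRIVER" line in CONF, print BACKEND:NAME.
# Commented lines (#scanner ... or # scanner ...) are skipped because their
# first field is not exactly "scanner".
pp_device_names() {
    awk -v backend="$2" '$1 == "scanner" { print backend ":" $2 }' "$1"
}
```

Running it as pp_device_names /etc/sane.d/mustek_pp.conf mustek_pp on my setup should print mustek_pp:Mustek-600-IIIEP, which is the value to pass to scanimage with -d, for example: scanimage -d mustek_pp:Mustek-600-IIIEP > test.pnm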

There are still bugs in LSS. The most obvious one is a fault with the "Print_Message" function. There are several page updates that don't occur, mostly the "Please wait" ones that are supposed to display during scanner detection and image scanning. I don't know enough about the interaction between JavaScript and the browser to tell whether it is a bug in the code or an architectural problem with the page design.

Another bug is with the scanner names. As you can see from my list above, some of the scanners are not named correctly. It may be that the model reported is the base one that the actual model is compatible with and sane just isn't more specific than that. LSS just uses whatever scanimage reports. This isn't a major problem as most systems will only have one scanner.

A third problem is with the scanner driver options that LSS specifies - basically none. Some scanner/driver combinations require specific options or the scanner has problems. The only one I encountered with the models tested was that the default resolution of 200 was unacceptable to one of them so it was downgraded to 150. These errors show up in the Apache logs (/var/log/apache2/error.log) but only refer to index.cgi and not a specific point within the file. I'm not sure how this bug could be fixed. Parsing the options out of the sane conf files may work but different versions of the same base model may require different settings.

Future enhancements that would be nice are cropping and internationalization support but that's more than I'm going to take on. My LUG mates also suggested using anything other than shell scripts.

To use the patch, first download and extract LSS into your web server data directory (/var/www/scan). Then download the patch archive, extract it into the "/var/www" directory and apply with

patch -p1 --directory /var/www/scan --input=/var/www/scan_1.2_Beta4.patch

Just reload any browser window that has the old version loaded to make the new one active. Restarting the server is not necessary.
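If you want to check that the patch will apply cleanly before touching anything, GNU patch has a --dry-run option. A minimal sketch, assuming GNU patch and using a made-up function name:

```shell
#!/bin/sh
# try_patch DIR PATCHFILE
# Test whether PATCHFILE applies cleanly to DIR without changing any files.
# Returns 0 if it would apply, non-zero otherwise.
# Assumption: GNU patch, which provides --dry-run.
try_patch() {
    patch -p1 --dry-run --directory="$1" --input="$2" >/dev/null
}
```

For example: try_patch /var/www/scan /var/www/scan_1.2_Beta4.patch && echo "patch applies cleanly"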

LSS is GPL 2 but it's not clear in the package as the author didn't follow the recommended method for applying the terms.

Note - there's a nagging problem with Ubuntu in that LSS can't access any scanners due to device permissions. It runs as user www-data but the old scanner group no longer exists so devices need chmod o+rw applied manually. For regular users (UID >1000) it seems to happen automatically but nobody seems to know how that works. I wrote Scanner Access Enabler to solve the problem.
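For USB scanners, the bus and device numbers in the sane device name correspond to a node under /dev/bus/usb, so the manual fix can be sketched as below. This is only an illustration of the chmod workaround, not how Scanner Access Enabler works, and the function name is made up.

```shell
#!/bin/sh
# usb_node_for SANE_DEVICE
# Turn a name like snapscan:libusb:002:003 into /dev/bus/usb/002/003.
# Returns 1 for anything that is not a libusb device name.
usb_node_for() {
    case $1 in
        *:libusb:[0-9][0-9][0-9]:[0-9][0-9][0-9])
            dev=${1##*:}      # last field: device number
            rest=${1%:*}      # everything before the last colon
            bus=${rest##*:}   # field before that: bus number
            printf '/dev/bus/usb/%s/%s\n' "$bus" "$dev"
            ;;
        *)
            return 1
            ;;
    esac
}
```

As root, something like chmod o+rw "$(usb_node_for snapscan:libusb:002:003)" would then open up that one device. The device number changes when the scanner is unplugged and reconnected, so the fix does not stick.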

Update: If you also have saned configured for scanner sharing then duplicates may be detected from both the raw devices and saned shared versions. If you don't want the ones from saned then comment out "localhost" in the "/etc/sane.d/net.conf" file and restart saned.
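Commenting out the entry can be done by hand in an editor, or with sed. A minimal sketch with a made-up function name; it prints the modified file to stdout rather than editing in place, so you can look at the result first.

```shell
#!/bin/sh
# disable_localhost CONF
# Print CONF with any line consisting only of "localhost" commented out.
# Other hosts and existing comments are left alone.
disable_localhost() {
    sed 's/^[[:space:]]*localhost[[:space:]]*$/#&/' "$1"
}
```

For example, run disable_localhost /etc/sane.d/net.conf > /tmp/net.conf, check the output, and then copy it over the original as root.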

Update: pqwoerituytrueiwoq and I have been making more improvements to the beta. You can follow along and download updated files from this thread at the Ubuntu forums.

20110207 Update: pqwoerituytrueiwoq and I made a bunch of fixes and I've released 1.2 Beta 4 of the patch. The links and instructions above have been updated. It is a recursive patch so it will affect several files. You still need to add the favicon for it to "/var/www/scan/inc/images". I performed a feasibility study of adding proper preview, settings selection, and cropping. I found that the difficulty of adding them to the existing code base is extreme, even though some are needed to get LSS functioning correctly. For example, the Brother MFC-440CN doesn't scan because the modes that LSS uses (like "Color") are hard-coded in the HTML and don't match up with what the Brother driver offers. Because of these problems (and my lack of time) I've ended my involvement with the project. For my needs Beta 4 functions adequately. I also found another scanner project, phpSANE, that seems to have a better code base built on PHP, although it has many limitations otherwise.

3 comments:

Metamorph said...

I cannot download scanner-access-enabler from http://www.thescrut.com, could you give me updated address?

jhansonxi said...

Latest version is included with PHP version of the scanner.

Metamorph said...

thank you, I have got it..
