Diary and notebook of whatever tech problems are irritating me at the moment.

20140829

Wgetting to the correct file name

One of the annoyances of using wget in scripts is that files from some web sites get saved as "index.html" (or some other name taken from the URL) instead of under the actual file name.  This happens when the server supplies the file name in the Content-Disposition HTTP header field (borrowed from MIME) rather than in the URL.  wget names files from the URL by default; its support for this header is experimental, incomplete, and disabled by default.

There are a few ways around this.  The easiest is to tell wget to honor the header with the --content-disposition option.
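For use in a script, a minimal sketch of the wget approach might look like the following.  The function name fetch_named and the URL in the usage comment are placeholders of my own, not anything wget defines.

```shell
#!/bin/sh
# fetch_named is a hypothetical helper name chosen for this sketch.
fetch_named() {
    # --content-disposition asks wget to name the saved file from the
    # Content-Disposition header when the server sends one, rather than
    # from the last part of the URL.
    wget --content-disposition "$1"
}

# Example call (placeholder URL, left commented out):
# fetch_named 'https://example.com/download.php?id=42'
```

If you would rather not pass the option every time, the same behavior can apparently be made the default by putting "content_disposition = on" in ~/.wgetrc.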

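Another option is to use curl or aria2 instead of wget.  With curl, --remote-header-name (-J) only takes effect together with --remote-name (-O), so the two options have to be given together.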
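A rough sketch of both alternatives, again wrapped in helper functions whose names are my own invention.  The -L option is an addition on my part, on the assumption that many download links redirect before reaching the file.  As far as I know aria2c uses the header file name without extra options, but treat that as an assumption and check it against your version.

```shell
#!/bin/sh
# Hypothetical helper names; neither tool defines these.
fetch_named_curl() {
    # -L follows redirects (my addition, see above).
    # -O (--remote-name) saves to a file instead of printing to stdout.
    # -J (--remote-header-name) makes -O use the file name from the
    #    Content-Disposition header when the server provides one.
    curl -L -O -J "$1"
}

fetch_named_aria2() {
    # Assumption: aria2c picks up the Content-Disposition file name
    # by default, so no extra option is passed here.
    aria2c "$1"
}

# Example calls (placeholder URL, left commented out):
# fetch_named_curl 'https://example.com/download.php?id=42'
# fetch_named_aria2 'https://example.com/download.php?id=42'
```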

To easily create commands for more complicated file retrieval (with cookies, etc.) the Firefox add-on cliget can provide a command line for curl, wget, and aria2 in the download dialog that opens when you start a download.  It only shows the curl command line by default.  You can enable the others in the cliget preferences (available from the cliget entry in about:addons).


About Me

Omnifarious Implementer = I do just about everything. With my usual occupations this means anything an electrical engineer does not feel like doing including PCB design, electronic troubleshooting and repair, part sourcing, inventory control, enclosure machining, label design, PC support, network administration, plant maintenance, janitorial, etc. Non-occupational includes residential plumbing, heating, electrical, farming, automotive and small engine repair. There is plenty more but you get the idea.