Channel: Raj at the Internet Archive » docs

New Upload Format, *_images.zip, for Scribe-style Uploads

Hank says:

I’d like to provide some information about a new file format to some of you who have been involved with uploading already-digitized materials to the Archive. (Please share this message with anyone I didn’t include and should have.)

You may be familiar with (and may be using) our existing _jp2.zip and _jp2.tar files. Making these from your own existing images is inconvenient and error-prone, due to the rigid expectations for individual image filenames and directory structure.

The new format is much more flexible. If you provide a file whose name ends in _images.zip, we’ll make a _jp2.zip from it: the _images.zip will be unpacked, its contents sorted alphabetically (and any subdirectories flattened), and the set of images found within converted into a standard _jp2.zip, which we’ll then process as usual.
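If your scans already sit in a folder tree on disk, packing them up takes one step. The sketch below uses Python's standard zipfile module. `make_images_zip` and its parameters are hypothetical names for illustration, not part of any Archive tool. It keeps your directory layout as-is, because only the archive's own name has to end in _images.zip.

```python
import os
import zipfile


def make_images_zip(src_dir: str, identifier: str, out_dir: str = ".") -> str:
    """Pack every file under src_dir into <identifier>_images.zip.

    Hypothetical helper: directory structure is preserved unchanged,
    since inner names only decide the order of the images.
    """
    out_path = os.path.join(out_dir, f"{identifier}_images.zip")
    # Images are already compressed, so store them without deflating.
    with zipfile.ZipFile(out_path, "w", compression=zipfile.ZIP_STORED) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for name in files:
                full = os.path.join(root, name)
                # Archive paths are relative to src_dir and use "/" separators.
                arcname = os.path.relpath(full, src_dir).replace(os.sep, "/")
                zf.write(full, arcname)
    return out_path
```

Write the zip somewhere outside src_dir so the walk doesn't pick up the archive being written. Stray non-image files (notes, XML) can be left in; they're ignored on our side.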

In a bit more detail, the _images.zip will be scanned for files it contains, at any directory level, whose names end with .jp2, .jpg, .tif, or .png, matched case-insensitively; any other files (.xml, .txt, etc.) will be ignored. You can mix and match different image formats. All image files found will be sorted alphabetically (including any directory names, so that files originally in the same directory stay together in the new sequence), converted to jp2 if they’re not already, renamed the way our code expects, and packed into a new _jp2.zip, leaving your _images.zip in place as it was.
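Here is a rough model of the selection and ordering rules, as a sketch of the rules as described rather than the Archive's actual code. `select_images` is a hypothetical name. Treating "alphabetically" as plain string ordering of the full member path is an assumption; the real service may order differently, for example with respect to letter case.

```python
# The four extensions named in the post, compared case-insensitively.
IMAGE_EXTENSIONS = (".jp2", ".jpg", ".tif", ".png")


def select_images(member_names):
    """Return image members of an _images.zip in sequence order.

    Keeps names ending in one of IMAGE_EXTENSIONS at any directory depth,
    drops everything else (directory entries, .xml, .txt, ...), then sorts
    on the full path so files from the same directory stay together.
    """
    images = [n for n in member_names if n.lower().endswith(IMAGE_EXTENSIONS)]
    return sorted(images)
```

To try it on a real upload, pass it the archive's member list, e.g. `select_images(zipfile.ZipFile("hr100106_images.zip").namelist())`.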

For an example of how messy an _images.zip we can deal with, see:

http://ia700400.us.archive.org/zipview.php?zip=/25/items/hr100106/hr100106_images.zip

listing from hr100106_images.zip
	767010/	01-06-10 13:18	0
	767010/76701057/	01-06-10 06:59	0
	767010/76701057/00000001.jpg	01-06-10 06:59	268802
	767010/76701061/	01-06-10 07:00	0
	767010/76701061/00000001.jpg	01-06-10 07:00	292476
	767010/76701067/	01-06-10 07:01	0
	767010/76701067/00000001.jpg	01-06-10 07:01	230612
	767010/76701068/	01-06-10 07:02	0
	767010/76701068/00000001.jpg	01-06-10 07:02	235011
	767010/76701069/	01-06-10 07:05	0
	767010/76701069/00000001.jpg	01-06-10 07:05	281997
...

The 589 image files found there were converted into:

http://ia700400.us.archive.org/zipview.php?zip=/25/items/hr100106/hr100106_jp2.zip

listing from hr100106_jp2.zip
	hr100106_jp2/	02-22-11 05:31	0
	hr100106_jp2/hr100106_0000.jp2	02-22-11 05:30	143845
	hr100106_jp2/hr100106_0001.jp2	02-22-11 05:30	191348
	hr100106_jp2/hr100106_0002.jp2	02-22-11 05:30	93923
	hr100106_jp2/hr100106_0003.jp2	02-22-11 05:30	100340
	hr100106_jp2/hr100106_0004.jp2	02-22-11 05:30	164196
	hr100106_jp2/hr100106_0005.jp2	02-22-11 05:30	169330
...

Note that the new _jp2.zip, and the files it contains, are named after the original _images.zip file (“hr100106”), regardless of how the directories and files inside the _images.zip are named. Those files and directories can be named any way you like; their names matter only in that they determine the sequence of the images in the new _jp2.zip.
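The naming rule can be sketched the same way. `jp2_names` is a hypothetical helper. The zero-based, four-digit index is inferred from the hr100106 listing above; how the real service pads sequences of 10,000 or more images isn't stated, so treat that part as an assumption.

```python
SUFFIX = "_images.zip"


def jp2_names(images_zip_name, count):
    """Return (jp2 zip name, member names) derived from the upload's name.

    Only the upload's file name matters; count is the number of images
    selected from it. Indices start at 0 and are zero-padded to 4 digits
    (assumed from the example listing).
    """
    if not images_zip_name.endswith(SUFFIX):
        raise ValueError(f"expected a name ending in {SUFFIX!r}")
    identifier = images_zip_name[: -len(SUFFIX)]
    members = [f"{identifier}_jp2/{identifier}_{i:04d}.jp2" for i in range(count)]
    return f"{identifier}_jp2.zip", members
```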

Again, please share this info with anyone you think will be interested.

Thanks, Hank!

