02: NON-TRANSCRIBED TEXT, NO TRANSCRIPTION NEEDED, one or more pages per metadata (.xml) record

 

ONLY USE THIS BUCKET if the text is typewritten in a clear, simple font. Why? See README.

 

The person who turns incoming tiffs into derivatives for delivery will run OCR on each of these tiff files, creating one text file per tiff.

 

Not sure whether to transcribe? See README.

 

In each example below, all of the listed files belong to the same object.

 

a single-page letter (clear handwriting or typed):

0023_000003_000012_0000.xml

0023_000003_000012_0001.tif

0023_000003_000012_0001.txt

(this last file is the text file we create from the tiff with OCR software)

 

OR:

 

a five-page letter (clear handwriting or typed):

0023_000003_000012_0000.xml

0023_000003_000012_0001.tif

0023_000003_000012_0002.tif

0023_000003_000012_0003.tif

0023_000003_000012_0004.tif

0023_000003_000012_0005.tif

 

The corresponding OCR text files will be created after the tiffs arrive at the DLC:

0023_000003_000012_0001.txt

0023_000003_000012_0002.txt

0023_000003_000012_0003.txt

0023_000003_000012_0004.txt

0023_000003_000012_0005.txt
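The complete set of files for an object can be sketched in Python. This is a minimal sketch that assumes the field widths seen in these examples (4-digit institution, 6-digit collection, 6-digit item, 4-digit sequence) and assumes page sequences start at 0001 and run consecutively. The function name `object_filenames` is hypothetical, not part of any existing tool.

```python
def object_filenames(institution: int, collection: int, item: int, pages: int) -> dict[str, list[str]]:
    """Expected filenames for one object (hypothetical helper; widths assumed 4-6-6-4)."""
    if pages < 1:
        raise ValueError("an object needs at least one page")
    # Shared prefix: institution_collection_item, zero-padded.
    prefix = f"{institution:04d}_{collection:06d}_{item:06d}"
    sequence = range(1, pages + 1)  # display order starts at 0001
    return {
        "metadata": [f"{prefix}_0000.xml"],                     # one record per object
        "images": [f"{prefix}_{seq:04d}.tif" for seq in sequence],
        "ocr_text": [f"{prefix}_{seq:04d}.txt" for seq in sequence],  # created at the DLC
    }
```

For example, `object_filenames(23, 3, 12, 1)` gives exactly the three files of the single-page letter: the .xml record plus one .tif and one .txt.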

 

0023 means that this is institution 23

000003 means that this is their collection number 3 (other institutions may have a collection 3 also)

000012 is the item number: this is the 12th item in the series

0000 identifies the metadata record, which has the .xml extension

0001 is the first digital object that belongs to this metadata record; the number gives its position in the display sequence

(This may NOT be the same as the page number on the scanned image!)

0002 is then the second page in the sequence, 0003 the third, and so on
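The field breakdown above can be expressed as a small parser. This is a hedged Python sketch: the regular expression is inferred from the examples on this page, and the rule that only the metadata record uses 0000 (and is always .xml) is an assumption drawn from the descriptions above. `parse_filename` and `ParsedName` are hypothetical names.

```python
import re
from typing import NamedTuple

# Assumed pattern, inferred from this page's examples:
# institution (4 digits) _ collection (6) _ item (6) _ sequence (4) . extension
FILENAME_RE = re.compile(r"(\d{4})_(\d{6})_(\d{6})_(\d{4})\.(xml|tif|txt)")


class ParsedName(NamedTuple):
    institution: int
    collection: int
    item: int
    sequence: int  # 0 = metadata record; 1, 2, 3 ... = display order
    extension: str


def parse_filename(name: str) -> ParsedName:
    """Split a filename into its fields, rejecting names that break the scheme."""
    match = FILENAME_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"not a recognized filename: {name!r}")
    inst, coll, item, seq, ext = match.groups()
    sequence = int(seq)
    # Assumption: only the metadata record uses 0000, and it is always .xml.
    if (sequence == 0) != (ext == "xml"):
        raise ValueError(f"sequence {seq} does not fit extension .{ext}: {name!r}")
    return ParsedName(int(inst), int(coll), int(item), sequence, ext)
```

For example, `parse_filename("0023_000003_000012_0002.tif")` returns `ParsedName(institution=23, collection=3, item=12, sequence=2, extension='tif')`. Note that the sequence (display order) is kept separate from any printed page number, as the warning above requires.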

 

The scanned pages therefore do NOT have the same filename as their metadata record!

However, each tiff has a corresponding text file that DOES have the same filename; only the extension differs (.tif vs .txt).
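Both naming rules can be checked mechanically. This minimal Python sketch assumes the filename pattern shown in the examples and is meant to run after OCR, once the .txt files exist at the DLC. The helper names `text_name_for`, `metadata_name_for`, and `missing_text_files` are hypothetical.

```python
from pathlib import PurePath


def text_name_for(tiff_name: str) -> str:
    """OCR text file for a tiff: same filename, .txt instead of .tif."""
    path = PurePath(tiff_name)
    if path.suffix != ".tif":
        raise ValueError(f"expected a .tif file: {tiff_name!r}")
    return str(path.with_suffix(".txt"))


def metadata_name_for(tiff_name: str) -> str:
    """Metadata record for a tiff: same prefix, but sequence 0000 and .xml."""
    stem = PurePath(tiff_name).stem          # e.g. "0023_000003_000012_0003"
    prefix, _sequence = stem.rsplit("_", 1)  # drop the 4-digit sequence
    return f"{prefix}_0000.xml"


def missing_text_files(filenames: list[str]) -> list[str]:
    """Tiffs in `filenames` whose matching .txt file is not in the list yet."""
    present = set(filenames)
    return sorted(
        name for name in filenames
        if name.endswith(".tif") and text_name_for(name) not in present
    )
```

Run against the five-page letter before OCR, `missing_text_files` would list all five tiffs; after OCR it should return an empty list.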

 


