Re: Index to Southern contracts Vol XII


George Eichelberger
 

David:

Because tiffs are full-resolution files, I have always had the best luck working with an image when I start from tiff scans of at least 300 dpi. Note that in my example the tiff version is more than twice the size of the jpeg. The finer detail in the image helps OCR software do its job.

The jpeg version is fine for us to read, but I’d use the tiff version for OCR.

Ike

PS Merry Christmas to all!



On Dec 24, 2018, at 10:54 PM, David Friedlander <davidjfriedlander@...> wrote:

So I will attempt the OCR "assignment". 

I will say that I was able to OCR the first jpeg (not TIFF) image in about 20 seconds just by opening the image in Google Documents. It created a document with 2 pages: one with the image and the second with the text. I had to fix 2 letters because they were not crisp enough in the image. Not exactly what I had in mind... I was hoping for a searchable PDF, but I will see what else I can do with what you gave us and put it in my Google Drive so you can distribute it to everyone. Maybe I'll convert the image to PDF and then see if OCR will work on that.
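
For anyone who wants to try the searchable-PDF route, here is a rough sketch in Python, not a tested workflow for these particular scans. It assumes the open-source Tesseract OCR program is installed and on the PATH, and that the tiff pages sit in one folder. The function names are made up for illustration. Tesseract's "pdf" output mode writes a PDF with the page image on top and the recognized text underneath, which is what makes it searchable.

```python
import shutil
import subprocess
from pathlib import Path


def build_ocr_command(tiff_path, out_dir, lang="eng"):
    """Return the Tesseract command that turns one tiff into a searchable PDF."""
    tiff_path = Path(tiff_path)
    # Tesseract adds the ".pdf" extension to the output base name itself.
    out_base = Path(out_dir) / tiff_path.stem
    return ["tesseract", str(tiff_path), str(out_base), "-l", lang, "pdf"]


def ocr_folder(scan_dir, out_dir):
    """Run Tesseract on every .tif/.tiff in scan_dir; return the commands run."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract is not installed or not on PATH")
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    commands = []
    for tiff in sorted(Path(scan_dir).iterdir()):
        if tiff.suffix.lower() in (".tif", ".tiff"):
            cmd = build_ocr_command(tiff, out_dir)
            # check=True stops the run if Tesseract reports an error on a page.
            subprocess.run(cmd, check=True)
            commands.append(cmd)
    return commands
```

Calling ocr_folder("vol_xii_tiff", "vol_xii_pdf") (both folder names are placeholders) would leave one PDF per page in the output folder. Any letters that are not crisp in the scan would still need to be fixed by hand.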

David Friedlander

On Mon, Dec 24, 2018 at 3:37 PM George Eichelberger <geichelberger@...> wrote:
Here are three links to essentially the same file on Google Drive. They are tiff, jpeg and zip versions of the scans for the index to Southern Contracts Volume XII. There are approximately 28 volumes of bound contracts in the SRHA archives; Vol. XII is dated June 30, 1918. It includes all contracts in effect at that time. When a contract expired, was cancelled or was replaced, it simply dropped out of the following volume’s index.

Any contracts begun after 6-30-1918 are found in later volumes. If there is enough interest, we can scan and publish the index to Vol 28. Those two will provide a fair idea of all of the various SR contracts, trackage rights, agreements and joint facilities.

The jpeg version is about 320 MB, the full 300 dpi resolution tiff version is 783 MB and the zipped version of the jpeg folder is 185 MB.

If someone will volunteer, the tiff version can be “OCRd” to give a fully searchable text version.

The three Google Drive links are:




Ike



