Re: Index to Southern contracts Vol XII


David Friedlander
 

Ike,

I finally had a day to get back and look at OCRing this (holidays and work took precedence... Happy 2019, btw), but all three Google Drive links no longer work. Did you move them, and can you reshare them so I can take a stab at OCRing them in whatever process makes sense?


Jason,

My original intent was to create PDFs or some other file type that you can search using the built-in search features. There was a worry from the SRHA about providing material for free. Producing a PDF would allow the SRHA to take the files out of Google Drive and package them up in some other digital collection, whether free or for sale. It also removes any dependence on keeping things in Google Drive, especially if Ike runs out of space and does not feel like paying for more.
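
One possible process, as a rough sketch only: it assumes the open-source Tesseract OCR engine is installed and on the PATH, which nobody in this thread has settled on. Given a tiff scan, Tesseract's `pdf` output mode writes a PDF with a searchable text layer. The function names and folder names below are made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def build_ocr_command(tiff_path, out_dir, dpi=300, lang="eng"):
    """Return the Tesseract command that turns one tiff into a searchable PDF."""
    tiff_path = Path(tiff_path)
    # Tesseract appends ".pdf" itself, so the output name is given without a suffix.
    out_base = Path(out_dir) / tiff_path.stem
    return [
        "tesseract", str(tiff_path), str(out_base),
        "--dpi", str(dpi),   # scan resolution; 300 dpi matches the tiff scans described
        "-l", lang,          # OCR language; "eng" is an assumption
        "pdf",               # output mode: PDF with a searchable text layer
    ]


def ocr_folder(scan_dir, out_dir):
    """OCR every .tif/.tiff file in scan_dir; return the commands that were run."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract not found on PATH")
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    commands = []
    for tiff in sorted(Path(scan_dir).iterdir()):
        if tiff.suffix.lower() in (".tif", ".tiff"):
            cmd = build_ocr_command(tiff, out_dir)
            subprocess.run(cmd, check=True)  # stops at the first page that fails
            commands.append(cmd)
    return commands
```

Calling `ocr_folder("vol_xii_tiff", "vol_xii_pdf")` (hypothetical folder names) would leave one searchable PDF per scanned page. Combining the pages into a single file would be a separate step that this sketch does not cover.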

Thanks,
David Friedlander


On Tue, Dec 25, 2018 at 10:33 AM George Eichelberger <geichelberger@...> wrote:
David:

As tiffs are full-resolution files, I have always had the best luck doing anything with an image using tiff scans of at least 300 dpi. Note that in my example, the tiff version is more than twice the size of the jpeg. The finer detail in the image helps OCR software do its job.

The jpeg version is fine for us to read, but I’d use the tiff version for OCR.

Ike

PS Merry Christmas to all!



On Dec 24, 2018, at 10:54 PM, David Friedlander <davidjfriedlander@...> wrote:

So I will attempt the OCR "assignment". 

I will say that I was able to OCR the first jpeg (not TIFF) image in about 20 seconds just by opening the image in Google Docs. It created a document with 2 pages: one with the image and a second with the text. I had to fix 2 letters because they were not crisp enough in the image. Not exactly what I had in mind... I was hoping for a searchable PDF, but I will see what else I can do with what you gave us and put it in my Google Drive so you can distribute it to everyone. Maybe I will convert the image to PDF and then see if OCR will work on that.

David Friedlander

On Mon, Dec 24, 2018 at 3:37 PM George Eichelberger <geichelberger@...> wrote:
Here are three links to essentially the same file on Google Drive. They are tiff, jpeg and zip versions of the scans for the index to Southern Contracts Volume XII. There are approximately 28 volumes of bound contracts in the SRHA archives; Vol. XII is dated June 30, 1918. It includes all contracts in effect at that time. When a contract expired, was cancelled or was replaced, it simply dropped out of the following volume’s index.

Any contracts begun after 6-30-1918 are found in later volumes. If there is enough interest, we can scan and publish the index to Vol 28. Those two will provide a fair idea of all of the various SR contracts, trackage rights, agreements and joint facilities.

The jpeg version is about 320 MB, the full 300 dpi resolution version is 783 MB and the zipped version of the jpeg folder is 185 MB.

If someone will volunteer, the tiff version can be “OCRd” to give a fully searchable text version.

The three Google Drive links are:




Ike



