
Google as a poor man’s OCR

A couple of days ago I read a post comparing different OCR programs. It turns out that Google’s own proprietary OCR engine does better than the others under some circumstances. It seems they’ve tuned it with specific handling for many more use cases.

So I’ve scanned the first two pages of my textbook for the upcoming semester. You can read the Chinese here:

http://china.panlogicsoftware.com/ocr/Chapter16.pdf

Hopefully, in a few days Google will have scanned and OCR’ed it. I’ll let you know!

  • sych
    It appears it didn't work.

    If you go to Google and look for his document... (http://www.google.com/#q=site%3Achina.panlogics...) you'll notice it's been indexed and there is a "View as HTML" option. If the OCR worked, the "View as HTML" option should bring up the OCR'd Chinese text of the document. It doesn't - it simply brings up two blank pages.

    I am wondering if it might have worked if the document was scanned right-way up instead of turned 90 degrees.
  • tabletguy
    No, it didn't work.

    I replied immediately to the first person's comment, but for some reason that reply was blocked by my own system :(
    I've been using some good OCR software for Chinese, and hope to write a review soon. There's also a Wikipedia article on OCR that lists many of the software packages and the languages they claim to support.
  • siddharta
    so, did it work out?