OCR (optical character recognition) is the technology used to turn an image of text into plain (editable, searchable) text. If you’re like me (i.e., a nerd), you probably have a pile of scanned journal articles, books, and such meticulously sorted on your hard drive (as PDFs, for example). You can read them and print them, but you can’t search them or edit them. Wouldn’t it be nice if you could?

Well, there are a number of free options on the web, but they all have their problems. Google has some of the best OCR technology out there (they recently acquired reCAPTCHA to make it even better), and they have apparently been rolling it out into Google Docs. The Google Docs version is not as wonderful as you might like, but it works on high-res documents. Read about how to turn your images into text here.

Update: Surprisingly, I was not able to get this to work with PDFs. The web app only accepts PNG, JPEG, or GIF images right now. That is unfortunate, and I assume it will be “corrected” in the future. Has anyone tried this on an image yet?
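
If your scans folder is a mix of formats, a quick script can tell you which files are worth uploading as-is and which would need converting to an image first. This is only a minimal sketch: the function name and sample file names are made up for illustration, it looks at file extensions only (not the actual file contents), and treating both `.jpg` and `.jpeg` as JPEG is my assumption.

```python
from pathlib import Path

# Formats the web app accepted at the time of writing (PNG, JPEG, GIF).
# Counting both .jpg and .jpeg as JPEG is an assumption.
ACCEPTED_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif"}


def split_by_uploadable(paths):
    """Sort file names into (uploadable, needs_conversion) by extension only."""
    uploadable, needs_conversion = [], []
    for p in map(Path, paths):
        if p.suffix.lower() in ACCEPTED_SUFFIXES:
            uploadable.append(p.name)
        else:
            needs_conversion.append(p.name)
    return uploadable, needs_conversion


# Hypothetical file names, just to show the idea.
ok, convert = split_by_uploadable(
    ["article.pdf", "page1.PNG", "scan.jpeg", "notes.tiff"]
)
print("Upload as-is:", ok)
print("Convert first:", convert)
```

Anything that lands in the second list (PDFs included) would have to be exported as an image before the upload has a chance of working.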

  2 Responses to “Get Real Text out of your Scanned Documents”

  1. Thanks for the link. I will give it a try.

    Linux user here, so we're like cousins.

   
© 2012 Nerdlets