Get Real Text out of your Scanned Documents

OCR (optical character recognition) is the technology used to turn an image of text into plain (editable, searchable) text. If you’re like me (i.e., a nerd), you probably have a pile of scanned journal articles, books, and such meticulously sorted on your hard drive (as PDFs, for example). You can read them and print them, but you can’t search them or edit them. Wouldn’t it be nice if you could?

Well, there are a number of free options on the web, but they all have their problems. Google has some of the best OCR technology out there (they recently acquired reCAPTCHA to make it even better), and they have apparently been rolling it out into Google Docs. The Google Docs version is not as wonderful as you might like, but it works on high-resolution documents. Read about how to turn your images into text here.

Update: Surprisingly, I was not able to get this to work with PDFs. The web app only accepts PNG, JPEG, or GIF images right now. That is unfortunate, and I assume it will be “corrected” in the future. Has anyone tried this on an image yet?
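
If you want to sort a pile of scans before uploading, here is a minimal Python sketch that checks whether a file looks like one of the three accepted image types or like a PDF. It only peeks at the first few bytes of the file (the well-known format signatures) and does not talk to Google Docs at all. The function names `sniff_format`, `is_accepted`, and `check_file` are my own, made up for illustration, and the accepted list simply mirrors what I saw in the web app at the time.

```python
from typing import Optional

# Leading "magic" bytes of the file formats mentioned above.
SIGNATURES = {
    "png": (b"\x89PNG\r\n\x1a\n",),
    "jpeg": (b"\xff\xd8\xff",),
    "gif": (b"GIF87a", b"GIF89a"),
    "pdf": (b"%PDF-",),
}

# Assumption: the formats the web app accepted when this post was written.
ACCEPTED = {"png", "jpeg", "gif"}


def sniff_format(header: bytes) -> Optional[str]:
    """Guess the format from the first bytes of a file, or return None."""
    for name, prefixes in SIGNATURES.items():
        if header.startswith(prefixes):
            return name
    return None


def is_accepted(header: bytes) -> bool:
    """True if the header looks like a PNG, JPEG, or GIF image."""
    return sniff_format(header) in ACCEPTED


def check_file(path: str) -> bool:
    """Read the first 8 bytes of the file at `path` and test them."""
    with open(path, "rb") as f:
        return is_accepted(f.read(8))
```

Calling `check_file` on a scanned PDF should come back `False`, which tells you the file would need to be converted to an image before the upload has a chance of working.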


Comments

Well, it isn't the best, but there is a free OCR app that has been around for a while: SimpleOCR. http://www.simpleocr.com/

If you are a Mac user (if you aren't, you should be), a phenomenal database application called DEVONthink Pro Office has an OCR engine built into it, and it is the best I've ever used.

Thanks for the link. I will give it a try.

Linux user here, so we're like cousins.
