PDF OCR and Tagging
What is Optical Character Recognition (OCR) and why is it important?
When a document is scanned to PDF, the scanner essentially takes a picture of each page. The result is one large image with no text that a screen reader can interpret, which makes it completely inaccessible to someone who is blind or has a reading disability. It is also less useful for all students: the text cannot be searched, highlighted, or copied into notes.
Scanned PDF (without OCR):
- One giant graphic
- Not searchable
- Not editable
- Not usable by screen readers (there is no text, only an image)
PDF after OCR:
- Text based
- Usable by screen readers
OCR is only part of the equation. Documents should also be tagged so that screen reader users can navigate through them easily. For more information on tags, visit the Headings page. The best ways to produce a tagged PDF are to:
- Create your Word document or PowerPoint presentation using styles and headings (and following all other accessibility recommendations), then convert it to a PDF using the Save As option or the Acrobat DC Pro ribbon (if available), or
- Create your document in Google Workspace, check it with the Grackle accessibility tool, and then use the Export to PDF button in Grackle to create an accessible PDF.
If you do not have the original document and cannot find an accessible version, you can still make sure the PDF is OCR'd so that its text is readable, even if the document is not very easy to navigate. Most scanners have an OCR option in their settings. You can also use Ally's alternative formats feature to create an OCR'd version of the document.