PDF Accessibility: Creating Accessible PDF Documents
What makes a PDF truly accessible — and how to achieve it through proper tagging, structure, alt text, and testing.
What Makes a PDF Accessible?
An accessible PDF is one that can be read and understood by people using assistive technologies, including screen readers, refreshable Braille displays, and text-to-speech tools. A standard PDF may render correctly for sighted users and still be completely opaque to a screen reader if it lacks the structural information the screen reader needs to interpret the content.
The four fundamental requirements for an accessible PDF are: a tagged logical structure tree, a defined reading order, alternative text for non-text content, and a specified document language. Without these, a screen reader has no reliable way to determine the order of content on a page, the meaning of an image, or the language to use for pronunciation. Beyond these basics, full accessibility requires correct table markup, proper heading hierarchy, and accurate character encoding so text is both selectable and speakable.
WCAG 2.1 and PDF Accessibility
The Web Content Accessibility Guidelines (WCAG) 2.1, published by the W3C, are the dominant international reference for digital accessibility. Although WCAG is written primarily for web content, PDF documents published on the web or distributed electronically are expected to meet WCAG Level AA in most regulatory contexts. EN 301 549 (the European standard referenced by public sector accessibility regulations) incorporates WCAG 2.1 Level AA, while the US Section 508 standards incorporate WCAG 2.0 Level AA.
The W3C has published a set of PDF Techniques for WCAG 2.0 that map specific WCAG success criteria to concrete PDF implementation requirements. These cover tagging, headings, tables, forms, links, language, and more. For example, WCAG 1.1.1 (Non-text Content) maps to providing alt text for images in PDFs; WCAG 1.3.1 (Info and Relationships) maps to using appropriate structural tags; WCAG 2.4.2 (Page Titled) maps to the document having a title in its metadata.
PDF/UA: Universal Accessibility (ISO 14289)
PDF/UA (Universal Accessibility) is the ISO standard specifically designed for accessible PDF. First published as ISO 14289-1 in 2012 and revised in 2014, PDF/UA defines normative requirements for PDF files themselves as well as for the PDF viewers and assistive technologies that consume them.
PDF/UA covers the same ground as the WCAG PDF Techniques, but it is more specific and auditable because it is a technical standard with a formal conformance claim embedded in the document's XMP metadata (pdfuaid:part set to 1 for PDF/UA-1). Key PDF/UA requirements include: all real content must be tagged within the structure tree, heading levels must not be skipped, tables must have header cells, all images must have alt text or be marked as artifacts, the document must have a title, the language must be set, and tab order must be consistent with the structure order.
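As a rough illustration of how the conformance claim can be read back, the sketch below parses an XMP packet with Python's standard library and returns the pdfuaid:part value. It assumes the XMP has already been extracted from the PDF as a string; the function name `pdfua_part` and the sample packet are illustrative only, and the presence of the claim says nothing about whether the document actually conforms.

```python
import xml.etree.ElementTree as ET
from typing import Optional

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PDFUAID_NS = "http://www.aiim.org/pdfua/ns/id/"


def pdfua_part(xmp_packet: str) -> Optional[str]:
    """Return the pdfuaid:part value from an XMP packet, or None if absent."""
    root = ET.fromstring(xmp_packet)
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        # XMP allows a simple property as a child element or as an attribute.
        child = desc.find(f"{{{PDFUAID_NS}}}part")
        if child is not None and child.text:
            return child.text.strip()
        attr = desc.get(f"{{{PDFUAID_NS}}}part")
        if attr:
            return attr.strip()
    return None


# Hypothetical, minimal packet (the usual xpacket wrapper is omitted).
SAMPLE_XMP = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about=""
        xmlns:pdfuaid="http://www.aiim.org/pdfua/ns/id/">
      <pdfuaid:part>1</pdfuaid:part>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""

print(pdfua_part(SAMPLE_XMP))  # prints 1
```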
PDF/UA conformance is increasingly required by government procurement policies and is the definitive reference for accessible PDF production.
The Tag Structure: Logical Structure Tree
The logical structure of a PDF is represented as a tree of structure elements, each described by a tag that indicates its role. The tree mirrors the semantic structure of the document — chapters, sections, paragraphs, lists, tables, figures — independently of the visual layout.
Common tag types defined in the PDF specification include:
- Document, Part, Sect: Container elements for major document divisions.
- P: Paragraph — the workhorse block-level element for body text.
- H, H1–H6: Heading elements. PDF supports both a generic H tag (with heading level implied by nesting) and explicit H1 through H6 tags equivalent to HTML headings.
- L, LI, Lbl, LBody: List, list item, label, and body. Used for both ordered and unordered lists.
- Table, TR, TH, TD: Table, table row, table header cell, and table data cell.
- Figure: Used for images, illustrations, and other non-text visual content. Must carry an Alt attribute.
- Caption: A caption associated with a Figure or Table.
- TOC, TOCI: Table of contents and table of contents item.
- Link, Reference, Annot: Hyperlinks, cross-references, and other annotations.
- Form: Interactive form fields (widget annotations).
- Span: Inline-level element for marking up portions of text with specific language or other attributes.
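The explicit H1 to H6 tags in the list above lend themselves to a simple automated check for skipped heading levels. The sketch below is a minimal illustration that assumes the tag names have already been extracted from the structure tree in reading order; the function name and the sample sequence are hypothetical, and it treats a document whose first heading is not H1 as a skip.

```python
import re


def heading_level_skips(tags):
    """Return (index, previous_level, level) for each heading that skips a level."""
    problems = []
    previous = 0  # no heading seen yet, so the first heading should be H1
    for index, tag in enumerate(tags):
        match = re.fullmatch(r"H([1-6])", tag)
        if not match:
            continue  # P, L, Table, etc. do not affect the heading hierarchy
        level = int(match.group(1))
        if level > previous + 1:
            problems.append((index, previous, level))
        previous = level
    return problems


# Hypothetical tag sequence: the H4 at index 4 follows an H2 directly.
tags = ["H1", "P", "H2", "P", "H4", "P", "H2", "H3"]
print(heading_level_skips(tags))  # prints [(4, 2, 4)]
```

Moving back up the hierarchy (H4 to H2 in the sample) is not flagged, because only descending by more than one level breaks the nesting.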
Role Mapping for Custom Tags
PDF allows the use of custom (non-standard) tag names, but every custom tag must have a role map entry in the document's structure tree root that maps it to a standard PDF tag type. This ensures that viewers and assistive technologies that do not recognise the custom name can still interpret it correctly by falling back to the mapped standard role. Without role mapping, custom tags are inaccessible to conforming readers.
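The fallback can be sketched as a lookup that follows role map entries until it reaches a standard tag. This is a minimal model, not a PDF parser: the role map is represented as a plain dictionary, `STANDARD_TAGS` lists only the tags named in this article rather than the full standard set, and the custom tag names are invented for illustration.

```python
# Subset of standard tags, limited to those named in this article.
STANDARD_TAGS = {
    "Document", "Part", "Sect", "P", "H", "H1", "H2", "H3", "H4", "H5", "H6",
    "L", "LI", "Lbl", "LBody", "Table", "TR", "TH", "TD", "Figure", "Caption",
    "TOC", "TOCI", "Link", "Reference", "Annot", "Form", "Span",
}


def resolve_role(tag, role_map):
    """Follow role map entries to a standard tag; None if unmapped or circular."""
    seen = set()
    while tag not in STANDARD_TAGS:
        if tag in seen or tag not in role_map:
            return None
        seen.add(tag)
        tag = role_map[tag]  # a custom tag may map to another custom tag
    return tag


# Hypothetical role map: "Callout" reaches P through "BodyText".
role_map = {"Heading1": "H1", "BodyText": "P", "Callout": "BodyText"}
print(resolve_role("Callout", role_map))  # prints P
print(resolve_role("Sidebar", role_map))  # prints None
```

A result of None marks a custom tag that a conforming reader would have no way to interpret.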
Alternative Text for Images and Figures
Every Figure tag (and any other non-text element that conveys information) must carry an Alt attribute containing descriptive alternative text. The alt text should convey the meaning or purpose of the image as it relates to the surrounding content — not merely describe it visually. Purely decorative images should be marked as artifacts (removed from the tag structure) so screen readers skip them entirely, rather than being tagged as empty Figures.
The ActualText attribute, distinct from Alt, supplies exact replacement text for content whose underlying encoding does not yield the intended characters, for example a decorative ligature or a symbol rendered as a custom glyph. A screen reader reads the ActualText value in place of whatever the glyph's encoding would otherwise produce.
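The Figure requirement is easy to check mechanically once the structure tree is available. The sketch below assumes the tree has been loaded into nested dictionaries with hypothetical keys `tag`, `alt`, and `kids`; it reports the path of every Figure whose alt text is missing or blank. It cannot judge whether existing alt text is meaningful, which still needs a human reviewer.

```python
def figures_missing_alt(node, path=""):
    """Return tree paths of Figure elements with missing or blank alt text."""
    here = f"{path}/{node['tag']}"
    missing = []
    if node["tag"] == "Figure" and not node.get("alt", "").strip():
        missing.append(here)
    for kid in node.get("kids", []):
        missing.extend(figures_missing_alt(kid, here))
    return missing


# Hypothetical structure tree: the second Figure has no alt text.
tree = {
    "tag": "Document",
    "kids": [
        {"tag": "H1"},
        {"tag": "Figure", "alt": "Bar chart of quarterly sales by region"},
        {"tag": "Sect", "kids": [{"tag": "P"}, {"tag": "Figure"}]},
    ],
}
print(figures_missing_alt(tree))  # prints ['/Document/Sect/Figure']
```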
Table Structure: TH, TD, Scope, and Headers
Tables are among the most challenging elements to make accessible in PDF. Simply using TR, TH, and TD tags is not sufficient for complex tables. Each TH cell must carry a Scope attribute indicating whether it is a header for a column (Scope="Column"), a row (Scope="Row"), or both (Scope="Both"). For irregular tables with merged cells or non-linear header relationships, each TH is given an ID, and each TD can carry a Headers attribute listing the IDs of the TH cells that apply to it.
PDF/UA also requires that tables used purely for visual layout (layout tables, not data tables) be either avoided or correctly identified so that assistive technologies do not try to interpret them as data. Spanning cells must be correctly encoded with RowSpan and ColSpan attributes.
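To make the effect of Scope concrete, the sketch below models a simple table as a grid of cells and works out which header texts apply to a given data cell, roughly what a screen reader announces when the user moves into it. The grid model, the helper names, and the sample data are all assumptions for illustration, and the sketch deliberately ignores RowSpan, ColSpan, and the Headers attribute.

```python
def th(text, scope):
    return {"tag": "TH", "text": text, "scope": scope}


def td(text):
    return {"tag": "TD", "text": text}


def headers_for(table, row, col):
    """Return header texts that apply to the cell at (row, col) via Scope."""
    headers = []
    # Column headers: TH cells in the same column scoped to Column or Both.
    for r, cells in enumerate(table):
        cell = cells[col]
        if r != row and cell["tag"] == "TH" and cell.get("scope") in ("Column", "Both"):
            headers.append(cell["text"])
    # Row headers: TH cells in the same row scoped to Row or Both.
    for c, cell in enumerate(table[row]):
        if c != col and cell["tag"] == "TH" and cell.get("scope") in ("Row", "Both"):
            headers.append(cell["text"])
    return headers


# Hypothetical 3x3 table with one header row and one header column.
table = [
    [th("Region", "Column"), th("Q1", "Column"), th("Q2", "Column")],
    [th("North", "Row"), td("120"), td("135")],
    [th("South", "Row"), td("98"), td("110")],
]
print(headers_for(table, 2, 1))  # prints ['Q1', 'South']
```

If the Scope values were omitted, the function would return an empty list for every data cell, which mirrors how a table tagged only with TH and TD leaves assistive technology without reliable header associations.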
Reading Order and Artifact Tagging
The reading order of a PDF is determined by the order of elements in the logical structure tree, not by their visual position on the page. Content that appears at the top of the page visually may be late in the structure tree, causing a screen reader to announce it out of sequence. Correcting reading order requires reordering elements in the tag tree using Acrobat's Order panel or a PDF processing library.
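The difference between structure order and visual order can be shown with a small model. The sketch below assumes a structure tree held as nested dictionaries, where `top` is a hypothetical distance from the top of the page; a depth-first walk gives the order a screen reader would follow, and sorting by `top` gives the order a sighted reader would see.

```python
def structure_order(node):
    """Depth-first walk of the tag tree: the order a screen reader follows."""
    order = []
    if "text" in node:
        order.append(node["text"])
    for kid in node.get("kids", []):
        order.extend(structure_order(kid))
    return order


def visual_order(node):
    """Order of the same content sorted by vertical position on the page."""
    leaves = []

    def collect(n):
        if "text" in n:
            leaves.append((n["top"], n["text"]))
        for kid in n.get("kids", []):
            collect(kid)

    collect(node)
    return [text for _, text in sorted(leaves)]


# Hypothetical page: the title is at the top visually but last in the tree.
doc = {
    "tag": "Document",
    "kids": [
        {"tag": "P", "text": "Intro paragraph", "top": 120},
        {"tag": "Sect", "kids": [
            {"tag": "H2", "text": "Details", "top": 300},
            {"tag": "P", "text": "Body", "top": 340},
        ]},
        {"tag": "H1", "text": "Annual Report", "top": 40},
    ],
}
print(structure_order(doc))
# prints ['Intro paragraph', 'Details', 'Body', 'Annual Report']
print(visual_order(doc))
# prints ['Annual Report', 'Intro paragraph', 'Details', 'Body']

# Remediation in this model: move the H1 to the front of the tree.
doc["kids"].insert(0, doc["kids"].pop())
print(structure_order(doc) == visual_order(doc))  # prints True
```

Sorting by vertical position only works for this single-column example; multi-column layouts are exactly where visual position stops being a reliable guide and the tag tree has to be ordered deliberately.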
Content that is purely decorative or presentational — such as background images, decorative rules, page headers/footers that repeat across pages, watermarks, and column borders — should be tagged as artifacts. Artifact content is excluded from the logical structure tree and ignored by screen readers, reducing noise and improving the accessibility experience.
Language Specification
The document-level language is set in the document's catalogue (Lang entry) as a BCP 47 language tag (for example, "en-GB" or "de"). This tells assistive technologies which language rules to apply for pronunciation and hyphenation. Inline language changes — for example, a French phrase in an English document — are marked with a Lang attribute on a Span element enclosing the affected text, allowing the screen reader to switch pronunciation accordingly.
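The way an inline Lang attribute overrides the document language can be sketched as simple inheritance down the structure tree. The model below uses nested dictionaries with hypothetical keys `lang`, `text`, and `kids`; each text run is paired with the language in effect for it, which is the information a screen reader needs to choose a pronunciation.

```python
def spoken_runs(node, inherited_lang):
    """Return (language, text) pairs; Lang on an element overrides its ancestors."""
    lang = node.get("lang", inherited_lang)
    runs = []
    if "text" in node:
        runs.append((lang, node["text"]))
    for kid in node.get("kids", []):
        runs.extend(spoken_runs(kid, lang))
    return runs


# Hypothetical paragraph in an en-GB document containing a French phrase.
paragraph = {
    "tag": "P",
    "kids": [
        {"text": "The motto is "},
        {"tag": "Span", "lang": "fr", "text": "liberté, égalité, fraternité"},
        {"text": "."},
    ],
}
for lang, text in spoken_runs(paragraph, "en-GB"):
    print(lang, repr(text))
```

Only the Span carries its own language; the surrounding text falls back to the document-level value passed in as `inherited_lang`.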
Testing with Screen Readers and Accessibility Checkers
Adobe Acrobat Pro includes an Accessibility Checker (Tools > Accessibility > Full Check) that tests a document against a set of PDF/UA and WCAG-aligned criteria. It checks for a document title, language setting, tagged content, alt text on figures, tab order, and several other requirements. It produces a panel report with pass/fail/needs manual check results and links to the affected elements.
Screen reader testing provides ground truth that automated checkers cannot. JAWS (Job Access With Speech) and NVDA (NonVisual Desktop Access) are the most widely used Windows screen readers for PDF testing. Adobe Acrobat's Read Out Loud feature offers a quick first-pass check. Testing should include navigating by heading, reading table cells, activating links, and using the reading order to ensure the document is comprehensible in linear read mode.
Remediation Workflow
Remediating an inaccessible PDF follows a consistent pattern: run the Accessibility Checker to identify issues, use Acrobat's Tags panel and Reading Order tool to add or correct tag structure, set or correct the document language and title, add alt text to all figures, correct the reading order, fix table headers, and re-run the checker to verify. For large documents or high volumes, programmatic remediation using the Adobe PDF Library or specialist tools provides a more scalable approach than manual remediation in Acrobat.
Proper navigation aids such as bookmarks and a table of contents are important companions to accessibility. Bookmarks allow users to jump directly to sections without linear navigation, while a well-structured table of contents provides an overview of document structure. Mapsoft's Bookmarker automates the creation of accurate, structured PDF bookmarks, and TOCBuilder generates navigable tables of contents — both of which materially improve the accessible user experience.