PDF #
Warning: This post hasn't been updated for over a year. The information may be out of date.
Everything below is macOS-specific.
I decided to start collecting everything I know about PDF in one place, because working with PDF files has been very, very frustrating for a very, very long time.
Deep link #
Haven’t tried:
How to create URL link to the specific section of the PDF file? - Super User
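For reference, here is a minimal Python sketch of the most common approach to PDF deep links: appending "open parameters" such as `#page=N` or `#nameddest=NAME` to the file's URL. Like the link above, this is untested here. The function name `pdf_deep_link` and the example paths are hypothetical, and whether a given viewer honours the fragment is an assumption (browsers' built-in viewers generally do; other readers may ignore it).

```python
from typing import Optional
from urllib.parse import quote


def pdf_deep_link(pdf_path: str,
                  page: Optional[int] = None,
                  nameddest: Optional[str] = None) -> str:
    """Build a file:// URL with a PDF open-parameter fragment.

    pdf_path must be an absolute POSIX path (this post is macOS-specific).
    """
    if not pdf_path.startswith("/"):
        raise ValueError("pdf_path must be absolute")
    if page is not None and page < 1:
        raise ValueError("page numbers start at 1")

    # Percent-encode the path, keeping the slashes.
    url = "file://" + quote(pdf_path)

    # Collect the open parameters; several can be joined with '&'.
    params = []
    if page is not None:
        params.append(f"page={page}")
    if nameddest is not None:
        params.append(f"nameddest={quote(nameddest, safe='')}")

    return url + ("#" + "&".join(params) if params else "")


if __name__ == "__main__":
    print(pdf_deep_link("/Users/me/my paper.pdf", page=5))
```

Named destinations only work if the PDF actually defines them, so `page=` is the safer of the two to try first.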
Export annotations #
Typically, there are a number of things that you can do to a PDF file in standard PDF readers that are considered to be “annotations” instead of “edits”. These are the names I shall call them by hereafter:
- Highlight: Add background colour to words and characters
- Underline: Add lines under words and characters
- Annotate: Insert text boxes with text inside
- Draw: Insert shapes or freehand drawings on pages
Very frustratingly, not every PDF reader understands the same set of annotations, and not every PDF reader can export everything that it understands.
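To make the four names above a bit more concrete, here is a small Python sketch that maps annotation subtypes as they are named inside a PDF file (`/Highlight`, `/Underline`, `/FreeText`, `/Ink`, and so on) onto those names. The subtype names come from the PDF specification; the grouping itself is my own assumption about how they line up with the list above, and `classify` and `summarize` are hypothetical helper names. Reading the subtypes out of an actual file would need a PDF library, which is left out here.

```python
from collections import Counter
from typing import Dict, Iterable

# Assumed grouping of PDF annotation subtypes into the four names used here.
ANNOTATION_GROUPS = {
    "/Highlight": "Highlight",
    "/Underline": "Underline",
    "/FreeText": "Annotate",   # text box drawn directly on the page
    "/Ink": "Draw",            # freehand drawing
    "/Square": "Draw",
    "/Circle": "Draw",
    "/Line": "Draw",
    "/Polygon": "Draw",
    "/PolyLine": "Draw",
}


def classify(subtype: str) -> str:
    """Return the group for one subtype, or 'Other' if it is not listed."""
    return ANNOTATION_GROUPS.get(subtype, "Other")


def summarize(subtypes: Iterable[str]) -> Dict[str, int]:
    """Count how many annotations fall into each group."""
    return dict(Counter(classify(s) for s in subtypes))


if __name__ == "__main__":
    print(summarize(["/Highlight", "/Highlight", "/Ink", "/Link"]))
```

A tally like this is a quick way to see whether an export dropped a whole group, for example when the highlights come through but the text boxes do not.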
If you have a PDF file with pre-existing annotations in it, here is how to export them as (more or less) portable text files.
PDF Expert #
PDF Expert is paid software.
, then choose from , , and .
Differences between the three versions:
| | Context of highlights | TOC | Page number | Author and time |
|---|---|---|---|---|
| HTML | ✓ | ✓ | ✓ | ✓ |
| Markdown | × | ✓ | ✓ | × |
| Plain text | × | × | ✓ | × |
I really preferred the HTML version, and went out of my way to find an HTML reader plugin for Obsidian.
Skim #
Website: Skim download | SourceForge.net
Step 0: Make a backup copy of the PDF file in question
Step 1: Convert PDF annotations to Skim Notes:
At this point, all of the annotations are converted to Skim Notes, and will no longer appear in other PDF readers.
To export back, use and choose . However, all the highlights and underlines now contain comments, just like Adobe Acrobat’s option (screenshots of Acrobat).
Step 2: Export Notes:
In the menu, selecting one of , , , or will export only the Notes.
Both the plain text and RTF files contain only the highlighted words and their page numbers, nothing else. I am not familiar with the other two formats.
Since Skim provides CLI tools, I wonder whether there is a way to do everything programmatically. The repo alexandergogl/SkimPDF seems to do Step 1 but not Step 2.
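As a starting point for Step 2 only, here is an untested Python sketch that wraps Skim's bundled `skimnotes` tool. Everything specific in it is an assumption to verify against the tool's own help output: the install path in `SKIMNOTES`, the `get -format text|rtf` invocation, and the hypothetical helper names. It does not cover Step 1 (converting PDF annotations to Skim Notes).

```python
import subprocess
from typing import List

# Assumed location of the tool inside the Skim app bundle.
SKIMNOTES = "/Applications/Skim.app/Contents/SharedSupport/skimnotes"

# Only the two formats described above; the tool may accept others.
SUPPORTED_FORMATS = {"text", "rtf"}


def skimnotes_get_command(pdf: str, out: str, fmt: str = "text",
                          tool: str = SKIMNOTES) -> List[str]:
    """Build the argument list for exporting Skim Notes from a PDF.

    Assumed syntax: skimnotes get -format FMT PDF_FILE NOTES_FILE
    """
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    return [tool, "get", "-format", fmt, pdf, out]


def export_notes(pdf: str, out: str, fmt: str = "text") -> None:
    """Run the export; raises CalledProcessError if skimnotes fails."""
    subprocess.run(skimnotes_get_command(pdf, out, fmt), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so this is safe to try.
    print(" ".join(skimnotes_get_command("paper.pdf", "paper-notes.txt")))
```

Building the command as a list and passing it to `subprocess.run` avoids shell quoting problems with file names that contain spaces.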
Partial exports: Obsidian/Zotero/Highlights #
Main entries:
I have tried every existing plugin and built-in function that I could find for Obsidian and Zotero, but none of them exports annotations, only highlights and underlines. The same goes for Highlights (I only tried the free option, which did not work; it is hard to imagine that it would work for other formats).
- akaalias/obsidian-extract-pdf-highlights: Also very buggy in Obsidian 1.4.
- munach/obsidian-extract-pdf-annotations
- jlegewie/zotfile: Removed its function after Zotero 6. Zotero 6’s function does not extract annotations.
- stefanopagliari/bibnotes: Its option uses notes from Zotero, yada yada yada.
- windingwind/zotero-better-notes: Have to manually select and export every single note within Zotero’s PDF viewer (which I really don’t like), so I just lost interest and did not test if it handles annotations.
CLI tools to test #
- 0xabu/pdfannots: Extracts and formats text annotations from a PDF file
- jiversen/pdfannotations: Extract annotations, including highlighted text, from pdf
- dimi2/DyAnnotationExtractor: DyAnnotationExtractor is software for extracting annotations (highlighted text and comments) from e-documents like PDF.
Word count #
ezgranet/pdfwordcounter: PDF Word Counter
Very minimal GUI app, straight to the point.