Moving text from PDF to InDesign

We’re working on another EPUB conversion. For this one, we’re moving from PDF to EPUB via InDesign. Currently, the whole book exists as a single PDF, and its text includes a small but not insignificant amount of character styling (bold, italic, etc.). The best workflow I’ve found so far is:

1. Break the single PDF into multiple PDFs by chapter. I’ll be using a Book organization in InDesign, which means creating a separate InDesign document for each chapter. For that reason, I’d like to import the text chapter-by-chapter.

2. Copy and paste the text for each chapter into a separate Word document, then remove extraneous carriage returns. There are two main methods for getting text into InDesign. You can copy and paste directly into a text frame, but this is not recommended for big sections of text. The better method (I think) is InDesign’s Place command (Ctrl+D), which requires the text to be in a placeable format like TXT, DOC, or RTF. Because I’m working with formatted text, DOC and RTF are my best options. After a few rounds of experimentation, I find that Word offers the best tools for maintaining character formatting and eliminating extraneous line breaks. Depending on what happens on paste (or Paste Special; I’ve experimented with all of the options and seem to get a different result each time), I may use AutoFormat to fix extra carriage returns while maintaining paragraph breaks.

3. Place the text in InDesign. When you place text, you can choose to show the import options. The only adjustment I’ve made to the defaults is to omit page breaks. I’ve left several defaults that don’t apply to my document, and that doesn’t seem to break anything. This is what my Place options look like:


So far, these three steps seem to work pretty well. There is a fair amount of variation in the way my pasted text lands in Word, especially in the presence or absence of extra carriage returns. Also, although the first chapter I worked on came through fairly clean, in the second and third chapters I’m finding many instances of words running together (“inhalesthem” instead of “inhales them”). I suspect this is related to the carriage return issue: when the extra carriage returns are absent, the space that should replace them is sometimes missing too.
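For anyone who wants to see the carriage-return cleanup spelled out, here is a minimal Python sketch. It is not part of the Word workflow above, and it works on plain text only, so bold and italic would be lost. It assumes that real paragraph breaks show up as blank lines in the pasted text, which may not hold for every PDF. The function name `unwrap_paragraphs` is made up for illustration. The point it shows is that each removed line break is replaced with a single space, which is what prevents run-together words like “inhalesthem”.

```python
import re


def unwrap_paragraphs(text: str) -> str:
    """Join hard-wrapped lines into paragraphs.

    Assumption: a blank line marks a real paragraph break; any other
    line break is an extraneous carriage return left over from the PDF.
    """
    # Normalize Windows (\r\n) and old Mac (\r) line endings to \n.
    text = text.replace("\r\n", "\n").replace("\r", "\n")

    # Split into paragraphs on blank (or whitespace-only) lines.
    paragraphs = re.split(r"\n\s*\n", text)

    cleaned = []
    for para in paragraphs:
        # Trim each line and drop any that are empty.
        lines = [line.strip() for line in para.split("\n") if line.strip()]
        if lines:
            # Join with a single space so the words on either side of a
            # removed line break do not run together.
            cleaned.append(" ".join(lines))

    # Keep one blank line between paragraphs.
    return "\n\n".join(cleaned)


if __name__ == "__main__":
    sample = "She inhales\r\nthem slowly.\r\n\r\nNext paragraph\r\nhere."
    print(unwrap_paragraphs(sample))
```

Joining with a space rather than an empty string is the design choice that matters here. It does not handle words hyphenated across a line break, which would still need manual review.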

If anyone has a better workflow for getting styled text from PDF to InDesign, please share in the comments!

Posted by cc on June 24, 2011 at 5:34 pm


