PDFs & XLIFF

Why does the XLF contain gibberish characters even though the original PDF looks fine?

This is probably caused by bad font mappings in the PDF.

Each font contains mappings from letter shapes to letter meanings, e.g. the shape 'B' means 'capital b'. If these mappings are not set up correctly in the PDF, the text still displays properly because the shapes are intact, but there is no way of knowing what each character means when the text is exported.
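The idea can be sketched with a toy model in Python. This is only an illustration: the glyph codes, both mapping tables and the extract_text helper are invented for the example and do not reflect how any particular PDF or Infix stores its data.

```python
# Toy model: a page stores text as glyph codes, and the font's mapping
# table says which character each code stands for.
# All codes and tables below are made up for illustration.

good_map = {1: "B", 2: "a", 3: "d"}

# Same shapes on screen, but the meanings point at private-use code points,
# so exported text comes out as gibberish.
bad_map = {1: "\uf042", 2: "\uf061", 3: "\uf064"}


def extract_text(glyph_codes, mapping):
    """Translate glyph codes to characters; unmapped codes become U+FFFD."""
    return "".join(mapping.get(code, "\ufffd") for code in glyph_codes)


page = [1, 2, 3]  # drawn identically on screen with either mapping

print(repr(extract_text(page, good_map)))
print(repr(extract_text(page, bad_map)))
```

With the good table the export reads as normal text. With the bad table the page looks the same on screen, yet the exported characters carry no usable meaning.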

You can check the mappings by opening the PDF in Infix PDF Editor. Using the 'T' tool, select some of the text that is not exporting correctly, then choose Text->Remap selected characters... You will see each character shape labelled with its meaning.

You can often correct these mappings using Infix. Press the Help button in the dialog to see how, or read the online help.
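To get a rough idea of how widespread the problem is in an exported file, a small script can count characters that commonly result from broken mappings. The sketch below is a heuristic and not part of Infix: the function names and the sample XLIFF string are hypothetical, and it only flags private-use, unassigned, replacement and stray control characters. Text that has been mapped to the wrong ordinary letters will not be detected.

```python
import unicodedata
import xml.etree.ElementTree as ET


def suspicious_chars(text):
    """Return characters that often indicate a broken font mapping."""
    flagged = []
    for ch in text:
        category = unicodedata.category(ch)
        # U+FFFD is the replacement character; "Co" is private use,
        # "Cn" is unassigned.
        if ch == "\ufffd" or category in ("Co", "Cn"):
            flagged.append(ch)
        # Control characters other than ordinary whitespace.
        elif category == "Cc" and ch not in "\t\n\r":
            flagged.append(ch)
    return flagged


def scan_xliff(xml_string):
    """Collect suspicious characters from all text in an XLIFF document."""
    root = ET.fromstring(xml_string)
    return suspicious_chars("".join(root.itertext()))


# Hypothetical, heavily simplified XLIFF content for demonstration only.
sample = (
    '<xliff><file><body><trans-unit id="1">'
    "<source>Ba\uf064 \ufffd</source>"
    "</trans-unit></body></file></xliff>"
)

print(len(scan_xliff(sample)), "suspicious characters found")
```

A handful of flagged characters suggests the mappings can be fixed by hand. A very large count suggests the fonts are broken throughout the document.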

If there are too many problems to fix, the other option is to remove all the text and then process the PDF with OCR. You can remove all the text in Infix PDF Editor by choosing Text->Create Outlines...

Be aware that OCR comes with its own problems and isn't always the best approach for a specific PDF.
