Ray McCarthy
You can download the HTML or plain-text versions from Gutenberg and fix them. It's a problem in the transcription and does not exist in the original.
Mostly the text is produced by OCR from the originals, and the OCR output is then proofread. Very few books are actually typed in full by hand from paper. Google uses a dual camera and a page turner. Many people cut off the binding and use industrial duplex sheet-feed scanners. Quite a few of the giant old textbooks I have are just PDFs with each page stored internally as a TIF image, as they run to 400 to 2000+ pages: it's too much work to reset the illustrations / graphs / tables / photos etc. and proofread them.
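As a rough illustration of the "fix them" step, here is a minimal Python sketch that applies a table of known OCR misreadings to a Gutenberg plain-text file. The correction table (`OCR_FIXES`) and the function name `fix_ocr_typos` are hypothetical: the entries just show the common "li" read in place of "h" confusion, and you'd build your own list as you proofread. Whole-word matching stops it touching longer words, but it's no substitute for actually reading the text.

```python
import re

# Hypothetical correction table: OCR misreading -> intended word.
# "li" read in place of "h" is a common confusion with old typefaces.
OCR_FIXES = {
    "tlie": "the",
    "Tlie": "The",
    "tliat": "that",
    "wliich": "which",
}


def fix_ocr_typos(text: str, fixes: dict[str, str] = OCR_FIXES) -> tuple[str, int]:
    """Replace whole-word OCR misreadings; return (fixed_text, number_of_changes)."""
    if not fixes:
        return text, 0
    # Longest keys first, and \b on both sides so "tlie" never matches
    # inside a longer word such as "tliemselves".
    keys = sorted(fixes, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, keys)) + r")\b")
    # subn returns (new_text, number_of_substitutions).
    return pattern.subn(lambda m: fixes[m.group(1)], text)
```

For example, `fix_ocr_typos("Tlie cat sat on tlie mat, wliich was red.")` returns the corrected sentence and a count of 3. Gutenberg's plain-text files are generally UTF-8, so open them with `encoding="utf-8"` when reading and writing.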
Calibre even lets you edit the internal HTML of non-DRM eBooks.
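Since a non-DRM ePub is just a ZIP archive of XHTML, CSS and image files, the kind of edit Calibre's editor does by hand can also be scripted. The sketch below uses only Python's standard `zipfile` module; the function name `rewrite_epub_html` is hypothetical, and it assumes the content files are UTF-8 (the usual case for ePub). It writes the `mimetype` entry first and uncompressed, as the ePub container format requires, so readers still recognise the result.

```python
import zipfile

HTML_EXTS = (".html", ".htm", ".xhtml")


def rewrite_epub_html(src: str, dst: str, old: str, new: str) -> int:
    """Copy a non-DRM ePub from src to dst, replacing `old` with `new` in every
    (X)HTML file. Returns the number of files that changed."""
    changed = 0
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        # The ePub container spec requires "mimetype" to be the first entry,
        # stored without compression.
        zout.writestr("mimetype", zin.read("mimetype"), compress_type=zipfile.ZIP_STORED)
        for item in zin.infolist():
            if item.filename == "mimetype":
                continue
            data = zin.read(item.filename)
            if item.filename.lower().endswith(HTML_EXTS):
                text = data.decode("utf-8")  # assumption: content is UTF-8
                if old in text:
                    data = text.replace(old, new).encode("utf-8")
                    changed += 1
            # Write every entry back (edited or not), compressed; CSS, images,
            # OPF and NCX files pass through byte-for-byte.
            zout.writestr(item.filename, data, compress_type=zipfile.ZIP_DEFLATED)
    return changed
```

Run it on a copy first, e.g. `rewrite_epub_html("book.epub", "book-fixed.epub", "tlie", "the")`, then open the result in Calibre to check it.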
Gutenberg's image formatting is non-existent. I'm going to fix and publish, for free, some of the more popular out-of-copyright classics in "mobi" (Kindle) and ePub (Kobo, Nook and Adobe Reader) formats, fixing the image layout and any typos / OCR errors I spot.
I've also found that the Kobo Reader creates random fake typos when reading "mobi" files. Convert to ePub (or download the ePub version) and they vanish. The "errors" come from something in the Kobo software, as the same mobi file is OK in Mobipocket Reader on a PC and on a physical Kindle. Odd.
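The mobi-to-ePub conversion can be done with Calibre's command-line converter, `ebook-convert`, which takes an input file and an output file and works out the formats from their extensions. The Python sketch below batch-converts a folder of (non-DRM) .mobi files by calling it; the helper names `epub_command` and `convert_folder` are hypothetical, and it assumes Calibre is installed with `ebook-convert` on your PATH.

```python
import shutil
import subprocess
from pathlib import Path


def epub_command(mobi_path: Path) -> list[str]:
    """Build the Calibre command line that converts one .mobi to an .epub beside it."""
    return ["ebook-convert", str(mobi_path), str(mobi_path.with_suffix(".epub"))]


def convert_folder(folder: str) -> list[Path]:
    """Convert every .mobi file in `folder` to ePub; return the ePub paths written."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("Calibre's ebook-convert was not found on PATH")
    written = []
    for mobi in sorted(Path(folder).glob("*.mobi")):
        # check=True raises CalledProcessError if a conversion fails.
        subprocess.run(epub_command(mobi), check=True)
        written.append(mobi.with_suffix(".epub"))
    return written
```

Calling `convert_folder("books")` leaves each ePub next to its mobi original, so you can compare the two on the Kobo.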