I borrowed a technical book, and before returning it I scanned several chapters using Microsoft Office Document Imaging. I saved each chapter as a multipage TIFF file, but before saving I ran the program's OCR function to recognize the text, so each saved .tif file contains OCR information. After returning the book, I noticed that the TIFF files are very large: one containing 11 pages is over 64 MB, while another containing 22 pages is 266 MB. All the scanned chapters (172 pages in total) occupy 1.56 GB on disk.
After that, I found a 636-page book available for download as a PDF file, and I was surprised to see that it is only 34.6 MB! From now on, I'll scan books to PDF instead, since my scanner can save scanned documents as searchable PDF files.
I am looking for good programs (or a single program), preferably free, that allow me to do one or both of two things:
1. To convert TIFF files with OCR information to searchable PDF files.
2. To remove pages from a PDF file and replace them with pages from another PDF file. I don't need to edit the text in the PDF document; I just need to remove pages and insert pages from another PDF (that is, copy pages from one PDF and paste them into another). In Microsoft Office Document Imaging this is easy: you can copy pages from one multipage TIFF document, paste them into another at exactly the right place, and remove the pages you want to replace. I want to do the same thing with PDF documents. This is useful when scanning all sorts of documents, because sometimes, after the scan is finished and the file is saved, you find that a few pages were badly scanned, and you want to replace only those pages rather than rescan the whole document. It is also useful when you don't have time to scan an entire document in one sitting: you stop the scan, save the file as it is, and later scan the remaining pages to a new file and insert them into the earlier file, which gives the same result as merging the two files into one.
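To make the second request concrete, here is a minimal Python sketch of the page operations I have in mind. It only models a document as a list of pages (plain strings stand in for real pages), and the function names `replace_pages` and `insert_pages` are made up for illustration; they are not the API of any existing program. I assume a PDF library that exposes a document's pages as a sequence could feed real page objects through the same logic.

```python
from typing import List, Sequence, TypeVar

Page = TypeVar("Page")  # stand-in for whatever a PDF tool uses for a page


def replace_pages(doc: Sequence[Page], bad: Sequence[int],
                  rescans: Sequence[Page]) -> List[Page]:
    """Return a copy of doc with the 1-based page numbers in `bad`
    replaced, in order, by the pages in `rescans`."""
    if len(bad) != len(rescans):
        raise ValueError("need exactly one rescanned page per bad page")
    result = list(doc)
    for page_number, new_page in zip(bad, rescans):
        if not 1 <= page_number <= len(result):
            raise IndexError(f"page {page_number} is out of range")
        result[page_number - 1] = new_page
    return result


def insert_pages(doc: Sequence[Page], after: int,
                 new_pages: Sequence[Page]) -> List[Page]:
    """Return a copy of doc with new_pages inserted after 1-based page
    `after` (0 inserts at the front, len(doc) appends at the end)."""
    if not 0 <= after <= len(doc):
        raise IndexError(f"position {after} is out of range")
    result = list(doc)
    result[after:after] = new_pages  # slice assignment inserts in place
    return result


# Hypothetical example: a 4-page scan where pages 2 and 4 came out badly.
chapter = ["p1", "p2-blurry", "p3", "p4-blurry"]
fixed = replace_pages(chapter, [2, 4], ["p2-rescan", "p4-rescan"])

# Resuming an interrupted scan: append the remaining pages at the end.
complete = insert_pages(fixed, len(fixed), ["p5", "p6"])
```

Inserting at the end, as in the last line, is the "resume the scan later" case: it gives the same result as merging the two files.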