Compressing Scanned Documents: What Works and What Doesn't

Scanned PDFs are massive. Here's the real deal on compressing them without making everything unreadable.

Puneet
Content Writer
February 15, 2024
6 min

You've scanned a 50-page document, and the PDF is 30MB. You need to email it, but your limit is 10MB. You try compressing it, and either it doesn't shrink much, or it shrinks but the text becomes unreadable. Sound familiar?

Scanned documents are tricky. They're essentially images of text, which means they can be compressed, but you need to balance file size with readability. Let me show you what actually works.

Why Scanned PDFs Are So Large

Scanned PDFs are large because they're images, not text. When you scan a document, you're creating an image of each page. A page of text stored as characters takes a few kilobytes; the same page stored as an image takes millions of pixels.

High DPI scans create huge files. A 300 DPI scan of a letter-sized page can be 2-3MB per page. A 50-page document at 300 DPI? That's 100-150MB.

Images don't compress as well as text. Text is highly repetitive and compresses very well. Scanned images carry paper texture and sensor noise, so they shrink far less.

Each page is a separate image. In a text PDF, resources such as fonts are stored once and reused on every page. A scanned PDF stores a full image for every page, so nothing is shared.

The result? Scanned PDFs are often 10-20 times larger than the same document would be if it were text-based.
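To see where those sizes come from, here is a rough back-of-the-envelope sketch. It assumes a US Letter page (8.5 x 11 inches) and counts raw, uncompressed pixels. The function name and the three color modes are illustrative choices, not part of any particular scanner's software.

```python
# Rough estimate of the raw (uncompressed) size of one scanned page.
# Assumption: US Letter paper, 8.5 x 11 inches.

BITS_PER_PIXEL = {"color": 24, "grayscale": 8, "black_and_white": 1}

def raw_page_bytes(dpi, mode="color", width_in=8.5, height_in=11.0):
    """Return the uncompressed size in bytes of one page scanned at `dpi`."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * BITS_PER_PIXEL[mode] // 8

for mode in BITS_PER_PIXEL:
    megabytes = raw_page_bytes(300, mode) / 1_000_000
    print(f"300 DPI, {mode}: {megabytes:.1f} MB before compression")
```

A color page at 300 DPI works out to roughly 25 MB of raw pixel data, so a 2-3MB page has already been compressed once by whatever saved the scan.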

The DPI Problem

DPI (dots per inch) is the key to scanned document size. Higher DPI means better quality but much larger files.

300 DPI: Standard for print quality. Creates large files but text is very clear.

200 DPI: Good for most purposes. Text is readable, files are smaller.

150 DPI: Minimum for text readability. Files are much smaller, but text might be slightly less sharp.

100 DPI or lower: Usually too low for text. Text becomes hard to read.

The trick is finding the lowest DPI that still gives you readable text. For most scanned documents, 200 DPI is the sweet spot.
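The reason DPI matters so much is that pixel count grows with the square of the resolution. A minimal sketch of that arithmetic, assuming a 300 DPI original (the function name is made up for illustration):

```python
def relative_pixels(target_dpi, source_dpi=300):
    """Fraction of the source pixel count left after resampling to target_dpi."""
    return (target_dpi / source_dpi) ** 2

for dpi in (300, 200, 150, 100):
    share = relative_pixels(dpi)
    print(f"{dpi} DPI: {share:.0%} of the pixels in a 300 DPI scan")
```

Compressed file size won't track pixel count exactly, but it gives a reasonable first guess: 200 DPI keeps about 44% of the pixels and 150 DPI keeps 25%.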

Compression Methods That Work

Here's what actually works for scanned documents:

Method 1: Reduce DPI (The Most Effective)

This is usually the best approach. Instead of compressing a 300 DPI scan, reduce it to 200 DPI, then compress.

How it works: You're reducing the resolution of the images, which dramatically reduces file size while keeping text readable.

When to use it: When file size is a problem and you need readable text.

How to do it: Many PDF tools let you reduce DPI. Look for "reduce resolution" or "optimize scanned documents" options.

Result: You can often reduce file size by 50-70% while keeping text perfectly readable.
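If you prefer the command line, one possible way to do this is with Ghostscript. The sketch below is an assumption about your setup, not the only route: it expects a `gs` executable on your PATH (on Windows the name differs), and the helper function names are hypothetical.

```python
import subprocess

def ghostscript_downsample_args(src, dst, dpi=200):
    """Build a Ghostscript command that resamples page images to `dpi`."""
    args = ["gs", "-sDEVICE=pdfwrite", "-dNOPAUSE", "-dBATCH", "-dQUIET"]
    for kind in ("Color", "Gray", "Mono"):
        args += [
            f"-dDownsample{kind}Images=true",
            f"-d{kind}ImageResolution={dpi}",
            # Ghostscript only resamples images whose resolution exceeds the
            # target by a threshold ratio (1.5 by default). A 300 DPI scan
            # with a 200 DPI target sits right at that boundary, so the
            # threshold is set to 1.0 to make the intent explicit.
            f"-d{kind}ImageDownsampleThreshold=1.0",
        ]
    args += [f"-sOutputFile={dst}", src]
    return args

def downsample_pdf(src, dst, dpi=200):
    """Run Ghostscript; raises CalledProcessError if it fails."""
    subprocess.run(ghostscript_downsample_args(src, dst, dpi), check=True)
```

Calling `downsample_pdf("scan.pdf", "scan-200dpi.pdf")` writes a new file and leaves the original untouched, which matters because you should keep the high-quality scan.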

Method 2: Compress Images Aggressively

Since scanned PDFs are images, image compression works well.

Use lossy compression. For scanned documents, moderate lossy compression (70-80% quality) usually works fine. Text stays readable.

Compress each page separately. Some tools let you set compression page by page, so you can go easier on pages with small text and harder on the rest.

Test the result. Always check that text is still readable after compression.

Method 3: OCR and Recreate

This is more work, but it can dramatically reduce file size.

Run OCR on the scanned PDF to extract text.

Recreate the PDF with the OCR text and lower-resolution images (or no images if you don't need them).

Result: You get a searchable PDF that's much smaller. The text is actual text (small), and images are optional.

This is the best long-term solution if you need the document to be searchable anyway.
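As one example of a related approach, the open-source OCRmyPDF command-line tool adds a searchable text layer and can recompress the page images in the same pass. Note that it keeps the page images rather than replacing them with text, so it is a variation on the method described here. The sketch assumes `ocrmypdf` is installed, and the wrapper function names are hypothetical.

```python
import subprocess

def ocr_args(src, dst, language="eng", optimize=2):
    """Build an OCRmyPDF command that adds a text layer and optimizes images.

    `optimize` ranges from 0 (off) to 3 (most aggressive).
    """
    return [
        "ocrmypdf",
        "--language", language,
        "--optimize", str(optimize),
        "--deskew",  # straighten slightly rotated pages before OCR
        src,
        dst,
    ]

def ocr_pdf(src, dst, **options):
    """Run OCRmyPDF; raises CalledProcessError if it fails."""
    subprocess.run(ocr_args(src, dst, **options), check=True)
```

As with any compression step, open the output and check small text before relying on it.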

Method 4: Split and Compress

Some tools struggle with very large scans. In that case, compressing smaller sections can work where compressing the whole document doesn't.

Split the PDF using our Split PDF tool into smaller sections (10-20 pages).

Compress each section with our Compress PDF tool using aggressive settings.

Merge back using our Merge PDF tool. The result is often smaller than compressing the whole document.
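If you are planning the split by hand, working out the page ranges is simple arithmetic. A small sketch (the function name and the default section size of 15 are arbitrary choices within the 10-20 page range mentioned above):

```python
def page_ranges(total_pages, section_size=15):
    """Return 1-based inclusive (first, last) page ranges covering the document."""
    if total_pages < 1 or section_size < 1:
        raise ValueError("total_pages and section_size must be positive")
    return [
        (start, min(start + section_size - 1, total_pages))
        for start in range(1, total_pages + 1, section_size)
    ]

print(page_ranges(50, 20))  # [(1, 20), (21, 40), (41, 50)]
```

The last section simply takes whatever pages are left, so a 50-page scan split by 20 ends with a 10-page section.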

What Doesn't Work

Here's what usually doesn't work for scanned documents:

Standard PDF compression alone. If your scan is 300 DPI, standard compression might only reduce it by 10-20%. That's not enough.

Lossless compression only. Lossless compression doesn't reduce scanned PDFs much. You need lossy compression or DPI reduction.

Just compressing without reducing DPI. You can compress a 300 DPI scan, but it won't shrink much. Reduce DPI first, then compress.

Aggressive compression without testing. Don't just compress aggressively and hope for the best. Test to make sure text is still readable.
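To illustrate why lossless compression alone struggles with scans, here is a toy experiment using only Python's standard library. The "scan" is synthetic, and snapping pixels to pure black or white stands in for lossy compression; real tools use JPEG or similar methods, so treat the numbers as a demonstration of the principle only.

```python
import random
import zlib

def synthetic_scan(width=400, height=200, seed=1):
    """Fake grayscale scan: light noisy 'paper' with dark noisy 'ink' marks."""
    rng = random.Random(seed)
    pixels = bytearray()
    for y in range(height):
        text_row = (y // 10) % 2 == 1  # alternate blank bands and text bands
        for x in range(width):
            is_ink = text_row and (x // 4) % 3 == 0
            base = 30 if is_ink else 230
            pixels.append(base + rng.randint(-12, 12))  # sensor noise
    return bytes(pixels)

def binarize(pixels, threshold=128):
    """Lossy step: throw away the noise by snapping each pixel to black or white."""
    return bytes(0 if p < threshold else 255 for p in pixels)

scan = synthetic_scan()
lossless_size = len(zlib.compress(scan, 9))
lossy_size = len(zlib.compress(binarize(scan), 9))

print(f"raw pixels:              {len(scan)} bytes")
print(f"lossless only:           {lossless_size} bytes")
print(f"after discarding noise:  {lossy_size} bytes")
```

The lossless pass has to preserve every bit of noise, so it can't shrink the data much. Once the noise is discarded, the same compressor does far better.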

The Right Workflow

Here's my workflow for compressing scanned documents:

  1. **Check the DPI.** See what resolution your scan is. If it's 300 DPI or higher, that's your problem.
  2. **Reduce DPI first.** Reduce to 200 DPI (or 150 DPI if you're okay with slightly less sharp text).
  3. **Then compress.** After reducing DPI, compress the images. Use moderate lossy compression (75-80% quality).
  4. **Test readability.** Open the compressed PDF and make sure text is readable. Check a few pages, especially any with small text.
  5. **Adjust if needed.** If text isn't readable, try less aggressive compression or a higher DPI. If the file is still too large, try more aggressive compression.
  6. **Consider OCR.** If you need the document to be searchable anyway, OCR and recreate. This often gives you the smallest file size.
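The reduce, compress, and adjust steps above can be sketched as a simple loop over a ladder of settings, from gentlest to most aggressive. This is only an outline: `compress` is a hypothetical function you would supply (wrapping whatever tool you use), and the readability check still has to be done by eye.

```python
# Settings to try, from most conservative to most aggressive.
SETTINGS_LADDER = [
    {"dpi": 200, "quality": 80},
    {"dpi": 200, "quality": 75},
    {"dpi": 150, "quality": 80},
    {"dpi": 150, "quality": 75},
]

def choose_settings(compress, size_limit_bytes, ladder=SETTINGS_LADDER):
    """Return the first (settings, size) pair that fits under the limit.

    `compress(dpi, quality)` is a caller-supplied (hypothetical) function
    that writes a compressed copy and returns its size in bytes.
    Returns (None, None) if nothing on the ladder fits.
    """
    for settings in ladder:
        size = compress(**settings)
        if size <= size_limit_bytes:
            return settings, size
    return None, None
```

Starting at the conservative end means you stop at the highest quality that fits your limit, rather than compressing harder than you need to.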

Quality vs Size Trade-offs

You need to balance quality and file size. Here are the trade-offs:

300 DPI, no compression: Perfect quality, huge files. Use for print or archiving.

300 DPI, moderate compression: Good quality, large files. Use when you need high quality.

200 DPI, moderate compression: Good quality, manageable files. This is usually the sweet spot.

200 DPI, aggressive compression: Acceptable quality, small files. Use when file size is critical.

150 DPI, moderate compression: Lower quality, small files. Use when file size is more important than perfect quality.

The key is matching your settings to your needs. Don't use 300 DPI if you're just emailing for screen viewing. Don't use 150 DPI if you need to print it.

Real-World Examples

Let me give you some actual numbers:

Case 1: 50-page scanned contract at 300 DPI

  • Original: 30MB
  • After reducing to 200 DPI: 12MB
  • After compression: 6MB
  • Result: Text perfectly readable, file is email-able

Case 2: 100-page scanned book at 300 DPI

  • Original: 80MB
  • After reducing to 200 DPI: 32MB
  • After compression: 15MB
  • Result: Text readable, file is manageable

Case 3: 20-page scanned document at 300 DPI

  • Original: 12MB
  • After OCR and recreate: 2MB
  • Result: Searchable text, much smaller file
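To put those numbers in perspective, this short sketch works out the percentage saved at each step. The sizes are copied from the cases above; the function name is just for illustration.

```python
# Sizes in MB at each stage, taken from the three cases above.
CASES = {
    "Case 1": [30, 12, 6],
    "Case 2": [80, 32, 15],
    "Case 3": [12, 2],
}

def reduction(before_mb, after_mb):
    """Percentage by which the file shrank."""
    return (before_mb - after_mb) / before_mb * 100

for name, sizes in CASES.items():
    steps = ", ".join(
        f"{reduction(a, b):.0f}%" for a, b in zip(sizes, sizes[1:])
    )
    overall = reduction(sizes[0], sizes[-1])
    print(f"{name}: per step {steps}; overall {overall:.0f}%")
```

In the first two cases the DPI step alone saves 60%, which is in the range you'd expect given that a 200 DPI image has about 44% of the pixels of a 300 DPI one.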

Special Considerations

Small text: If your document has small text (like footnotes or fine print), be more careful with compression. Small text becomes unreadable faster than large text.

Handwriting: Scanned handwriting is harder to compress without losing readability. Be more conservative with compression.

Mixed content: If your scan has both text and images, you might need different compression for each. Some tools let you do this.

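Color vs grayscale: Color scans are larger than grayscale. Uncompressed, a color pixel takes three bytes and a grayscale pixel takes one. If you don't need color, convert to grayscale first. The gap narrows after compression, but this can still cut file size in half.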
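For the curious, grayscale conversion itself is a weighted average of the red, green, and blue values. A minimal sketch using the standard ITU-R BT.601 luma weights (the function name is made up, and real tools do this for you):

```python
def to_grayscale(rgb_pixels):
    """Convert (r, g, b) tuples to single 0-255 luma values (BT.601 weights)."""
    return [
        round(0.299 * r + 0.587 * g + 0.114 * b)
        for r, g, b in rgb_pixels
    ]

# White paper, black ink, and a red stamp.
pixels = [(255, 255, 255), (0, 0, 0), (200, 30, 30)]
print(to_grayscale(pixels))  # [255, 0, 81]
```

Each pixel goes from three values to one, which is where the savings come from. Colored marks like stamps or highlighter turn into shades of gray, so check that nothing important depended on color.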

Best Practices

Here's what I've learned:

  1. **Reduce DPI before compressing.** This is usually more effective than just compressing.
  2. **Test readability.** Don't just compress and send. Actually read the compressed document to make sure text is readable.
  3. **Start conservative.** Try less aggressive compression first. You can always compress more if needed.
  4. **Consider OCR.** If you need searchable text anyway, OCR often gives you the best file size.
  5. **Keep the original.** Never delete your original high-quality scan. You might need it later.
  6. **Match settings to purpose.** Use higher quality for print, lower quality for screen viewing.

Getting Scanned Documents Right

After compressing hundreds of scanned documents, I've learned that the secret isn't in the compression settings. It's in understanding what you're working with. Scanned PDFs are images, not text documents. That changes everything.

The most effective approach? Reduce DPI first, then compress. Going from 300 DPI to 200 DPI leaves about 44% of the pixels, so it can cut your file size roughly in half before you even touch compression settings. Then moderate compression gets you the rest of the way without destroying readability.

But here's what many people miss: you need to actually test the result. Don't just compress and send. Open the compressed PDF. Read a few pages. Check small text. Make sure everything is still readable. I've seen people compress scanned documents so aggressively that the text becomes unreadable, and they only discover it after sending.

If you need the document to be searchable anyway, consider OCR. It's more work upfront, but you get searchable text and a much smaller file. For documents you'll reference frequently, that's often worth the extra effort.

The goal isn't to make scanned documents as small as possible. It's to make them small enough for your purpose while keeping text readable. Find that balance, test it, and you'll have compressed scanned documents that actually work.

Ready to compress your scanned PDF? Try our Compress PDF tool now. Upload your scanned document, choose your compression level, and download your compressed file. Our tool handles scanned documents well, reducing file size while maintaining text readability. For best results with very large scans, consider splitting first with our Split PDF tool, compressing each section, then merging back. It's free, works in your browser, and keeps your files private.

Tags: How-To