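No single tool works universally. The deep approach:

3. Deep Dive: PyMuPDF Script (Most Effective)

```python
import fitz  # PyMuPDF

def remove_watermark_by_rect(input_pdf, output_pdf, common_rect,
                             rect_tolerance=0.1, aggressive=False):
    """
    Remove the watermark from a rectangular region that is the same on every page.
    common_rect: (x0, y0, x1, y1) in PDF points, the watermark's bounding box.
    rect_tolerance: margin added around common_rect, as a fraction of the page
    size, so small shifts in watermark position across pages are still covered.
    """
    doc = fitz.open(input_pdf)
    x0, y0, x1, y1 = common_rect
    for page in doc:
        pad_x = page.rect.width * rect_tolerance
        pad_y = page.rect.height * rect_tolerance
        area = fitz.Rect(x0 - pad_x, y0 - pad_y, x1 + pad_x, y1 + pad_y)
        if aggressive:
            # Method 2 (more aggressive): delete the text under the area via redaction.
            # page.clean_contents() only tidies the content stream; it removes nothing.
            page.add_redact_annot(area, fill=(1, 1, 1))
            page.apply_redactions()
        else:
            # Method 1 (crude but works): paint a white box over the watermark.
            # The text underneath stays in the text layer and can still be copied.
            page.draw_rect(area, color=(1, 1, 1), fill=(1, 1, 1), width=0)
    doc.save(output_pdf)
    doc.close()
```

This assumes the watermark occupies the same bounding box on every page. Real watermarks are often rotated or semi-transparent, or are placed differently on each page.

4. Advanced: Remove by Redaction (Forensic Clean)

```python
import fitz  # PyMuPDF

def redact_watermark(input_pdf, output_pdf, search_text="Confidential"):
    doc = fitz.open(input_pdf)
    for page in doc:
        # Mark every occurrence of the watermark text for redaction (white fill).
        for inst in page.search_for(search_text):
            page.add_redact_annot(inst, fill=(1, 1, 1))
        # Permanently remove the marked content from this page.
        page.apply_redactions()
    doc.save(output_pdf)
    doc.close()
```

This physically removes the matched text, including from the copyable text layer. Image watermarks (a scanned stamp or logo) require a different approach: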
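For image watermarks, one possible approach (a sketch, not a tested recipe) works when the stamp or logo is stored as a separate image object reused across pages: find image xrefs that appear on (nearly) every page and blank them. It assumes a PyMuPDF release that provides `Page.get_images` and `Page.delete_image`; `find_repeated_images`, `remove_repeated_images` and the `min_fraction` threshold are hypothetical names for illustration. It cannot help when the stamp is baked into the pixels of a scanned page, and it will also catch legitimate repeated images such as a letterhead logo, so inspect what it selects before saving.

```python
from collections import Counter


def find_repeated_images(page_image_xrefs, min_fraction=1.0):
    """Hypothetical helper: return image xrefs seen on >= min_fraction of pages.

    page_image_xrefs: one list of image xrefs per page.
    """
    n_pages = len(page_image_xrefs)
    if n_pages < 2:
        # One page (or none) gives no evidence that an image is repeated.
        return set()
    counts = Counter()
    for xrefs in page_image_xrefs:
        counts.update(set(xrefs))  # count each image at most once per page
    return {x for x, c in counts.items() if c >= min_fraction * n_pages}


def remove_repeated_images(input_pdf, output_pdf, min_fraction=1.0):
    """Hypothetical wrapper: blank images that repeat across pages."""
    import fitz  # PyMuPDF; imported here so the helper above works without it

    doc = fitz.open(input_pdf)
    # Item 0 of each get_images() tuple is the image's xref.
    per_page = [[img[0] for img in page.get_images(full=True)] for page in doc]
    for xref in find_repeated_images(per_page, min_fraction):
        # The image object is shared, so blanking it on the first page that
        # shows it should affect every page that references it.
        first = next(i for i, xrefs in enumerate(per_page) if xref in xrefs)
        doc[first].delete_image(xref)
    doc.save(output_pdf)
    doc.close()
```

Keeping the selection logic in a pure function makes it easy to print the candidate xrefs and check them by hand before anything is modified.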