PDF File analysis

Support HackTricks and get benefits!

Do you work in a cybersecurity company? Do you want to see your company advertised in HackTricks? or do you want to have access the latest version of the PEASS or download HackTricks in PDF? Check the SUBSCRIPTION PLANS!

Discover The PEASS Family, our collection of exclusive NFTs

Get the official PEASS & HackTricks swag

Join the 💬 Discord group or the telegram group or follow me on Twitter 🐦@carlospolopm.

Share your hacking tricks submitting PRs to the hacktricks github repo.

From: https://trailofbits.github.io/ctf/forensics/

PDF is an extremely complicated document file format, with enough tricks and hiding places to write about for years. This also makes it popular for CTF forensics challenges. The NSA wrote a guide to these hiding places in 2008 titled "Hidden Data and Metadata in Adobe PDF Files: Publication Risks and Countermeasures." It's no longer available at its original URL, but you can find a copy here. Ange Albertini also keeps a wiki on GitHub of PDF file format tricks.

The PDF format is partially plain-text, like HTML, but with many binary "objects" in the contents. Didier Stevens has written good introductory material about the format. The binary objects can hold compressed or even encrypted data, and can carry active content such as JavaScript or embedded Flash. To display the structure of a PDF, you can either browse it with a text editor, or open it with a PDF-aware file-format editor like Origami.
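Because so much of the structure is plain text, a first triage pass can be done by searching the raw bytes. The following is a minimal sketch of that idea, not a real parser: the function name `summarize_pdf` and the keyword list are illustrative choices, and the script will miss names that are hex-escaped (for example `/J#61vaScript`) or objects packed inside compressed object streams.

```python
import re
from collections import Counter

# Names that often deserve a closer look. Illustrative list, not exhaustive.
KEYWORDS = [
    b"/JavaScript", b"/JS", b"/OpenAction", b"/AA",
    b"/EmbeddedFile", b"/Encrypt", b"/ObjStm",
]


def summarize_pdf(data: bytes) -> Counter:
    """Count object headers, stream keywords and selected names in raw PDF bytes."""
    counts = Counter()
    # Indirect object headers look like "12 0 obj".
    counts["obj"] = len(re.findall(rb"\b\d+\s+\d+\s+obj\b", data))
    # "stream" followed by an end-of-line opens a binary stream body;
    # \b keeps "endstream" from being counted.
    counts["stream"] = len(re.findall(rb"\bstream\r?\n", data))
    for kw in KEYWORDS:
        # The lookahead stops /JS from matching a longer name such as /JSFoo.
        pattern = re.escape(kw) + rb"(?![A-Za-z0-9])"
        counts[kw.decode()] = len(re.findall(pattern, data))
    return counts


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as fh:
        for name, n in summarize_pdf(fh.read()).items():
            print(f"{name:15} {n}")
```

A non-zero count for `/JavaScript`, `/OpenAction` or `/EmbeddedFile` is only a hint about where to look next, not proof that anything is hidden.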

qpdf is one tool that can be useful for exploring a PDF and transforming or extracting information from it. Another is a framework in Ruby called Origami.

When exploring PDF content for hidden data, some of the hiding places to check include:

  • non-visible layers
  • Adobe's metadata format "XMP"
  • the "incremental generation" feature of PDF (called "incremental updates" in the PDF specification), wherein a previous version is retained in the file but not visible to the user
  • white text on a white background
  • text behind images
  • an image behind an overlapping image
  • non-displayed comments
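Two of these hiding places can be checked with very little code. When a PDF is saved incrementally, the new data is appended after the old content, so each earlier version normally ends at its own `%%EOF` marker; XMP metadata is an XML packet wrapped in an `x:xmpmeta` element. The sketch below relies on both observations. The function names `split_revisions` and `extract_xmp` are made up for this example, and the approach is a heuristic: some files (linearized ones, for example) may contain more than one `%%EOF` without ever having been edited, and an XMP packet stored in a compressed stream will not be found this way.

```python
import re


def split_revisions(data: bytes) -> list[bytes]:
    """Return the file as it looked at each %%EOF marker, oldest first."""
    revisions = []
    for m in re.finditer(rb"%%EOF[ \t]*(?:\r\n|\r|\n)?", data):
        # Everything up to and including this marker is one candidate revision.
        revisions.append(data[:m.end()])
    return revisions


def extract_xmp(data: bytes) -> list[str]:
    """Return every uncompressed XMP packet found in the raw bytes."""
    packets = re.findall(rb"<x:xmpmeta.*?</x:xmpmeta>", data, re.DOTALL)
    return [p.decode("utf-8", errors="replace") for p in packets]


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as fh:
        pdf = fh.read()
    revs = split_revisions(pdf)
    print(f"{len(revs)} candidate revision(s)")
    # Write every revision except the last one, which is the file itself.
    for i, rev in enumerate(revs[:-1], start=1):
        with open(f"revision_{i}.pdf", "wb") as out:
            out.write(rev)
    for packet in extract_xmp(pdf):
        print(packet)
```

Opening each `revision_N.pdf` in a viewer shows what the document looked like before the later changes were appended.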

There are also several Python packages for working with the PDF file format, like PeepDF, that enable you to write your own parsing scripts.
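As an illustration of what such a script might look like, here is a minimal sketch that uses only the Python standard library rather than PeepDF's own API. It pulls out every stream body and tries to inflate it with zlib, which is what the common `/FlateDecode` filter requires. The name `inflate_streams` is hypothetical, and the code makes simplifying assumptions: it ignores the `/Length` entry, does not handle other filters, predictors or encryption, and returns the raw bytes whenever decompression fails.

```python
import re
import zlib

# A stream body sits between "stream<EOL>" and "endstream".
# The lookbehind keeps the "stream" inside "endstream" from matching.
STREAM_RE = re.compile(rb"(?<!end)stream\r?\n(.*?)endstream", re.DOTALL)


def inflate_streams(data: bytes) -> list[bytes]:
    """Return each stream body, zlib-inflated when possible, raw otherwise."""
    results = []
    for m in STREAM_RE.finditer(data):
        raw = m.group(1)
        try:
            # decompressobj tolerates the end-of-line bytes left after the
            # compressed data; they end up in unused_data.
            results.append(zlib.decompressobj().decompress(raw))
        except zlib.error:
            # Not Flate-compressed (or encrypted / another filter): keep as-is.
            results.append(raw)
    return results


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as fh:
        for i, body in enumerate(inflate_streams(fh.read())):
            print(f"--- stream {i} ({len(body)} bytes) ---")
            print(body[:200])
```

Searching the inflated output for strings is often enough to reveal text that never appears on the rendered page.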
