With automated systems routinely scanning your mailbox and cleaning out phishing and malware, you would think that these days malware would only rarely manage to creep through.

Attackers have started to use encrypted PDFs, so that these automated systems fail to identify the malware or phishing link contained within. In this blog post we look at using Didier Stevens’ toolkit to analyse suspicious PDFs and extract any URLs for further analysis.

The Sample

  • Name: Project#1542292355.pdf
  • Size: 204586 Bytes
  • MD5: a11aa6cee2f5db1591444f256c89a924
  • SHA1: 5c898848ec3e30fd3a3642dc0785395d262b81f4
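Before analysing a file, it is worth confirming that you are looking at the same sample. The sketch below recomputes both digests with Python's standard hashlib module and compares them with the values listed above. The function names are illustrative only and are not part of any toolkit.

```python
import hashlib

# Digests published above for the sample.
EXPECTED_MD5 = "a11aa6cee2f5db1591444f256c89a924"
EXPECTED_SHA1 = "5c898848ec3e30fd3a3642dc0785395d262b81f4"


def file_digests(path, chunk_size=65536):
    """Return (md5_hex, sha1_hex) for the file at `path`, read in chunks."""
    md5, sha1 = hashlib.md5(), hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()


def matches_sample(path):
    """True only if both digests equal the values listed for the sample."""
    return file_digests(path) == (EXPECTED_MD5, EXPECTED_SHA1)
```

Calling matches_sample("Project#1542292355.pdf") on your own copy tells you whether it is byte-for-byte the file discussed here.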

The attacker has also given us the password within the email body:

  • Password=123456

This is what the pdf looks like after you’ve entered the password:


And the URL link is:


But how can we automate the extraction of this URL so that automated systems can analyse it?

Using Didier Stevens’ toolkit

Didier Stevens has led the research into analysing PDFs; you can refer to his website pages here:

But in summary, you can read the below:


pdfid gives us a summary of the types of objects within a PDF.

In this sample, we are interested in:

  • it’s encrypted
  • it contains URLs
    $ python Project#1542292355.pdf
    PDFiD 0.2.5 Project#1542292355.pdf
     PDF Header: %PDF-1.4
     obj                   18
     endobj                18
     stream                 6
     endstream              6
     xref                   1
     trailer                1
     startxref              1
     /Page                  1
     /Encrypt               1
     /ObjStm                0
     /JS                    0
     /JavaScript            0
     /AA                    0
     /OpenAction            1
     /AcroForm              0
     /JBIG2Decode           0
     /RichMedia             0
     /Launch                0
     /EmbeddedFile          0
     /XFA                   0
     /URI                   2
     /Colors > 2^24         0
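For an automated pipeline, the pdfid summary can be turned into a small dictionary and checked for the two indicators we care about here. This is a minimal sketch that assumes the captured text has the "keyword, whitespace, count" layout shown above. The names parse_pdfid and triage are hypothetical helpers, not part of pdfid itself.

```python
import re


def parse_pdfid(text):
    """Map each keyword in a pdfid summary to its integer count."""
    counts = {}
    for line in text.splitlines():
        # A keyword (which may contain spaces), then whitespace, then a count.
        match = re.match(r"^\s*(\S.*?)\s+(\d+)\s*$", line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


def triage(counts):
    """Flag the properties that matter for this kind of sample."""
    return {
        "encrypted": counts.get("/Encrypt", 0) > 0,
        "has_uri": counts.get("/URI", 0) > 0,
        "has_script": counts.get("/JS", 0) + counts.get("/JavaScript", 0) > 0,
    }
```

A file flagged as both encrypted and containing URIs is a candidate for the decryption and extraction steps that follow.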


With pdf-parser, we can use the -s flag to search for strings.

In the example below we want to locate the object that holds the encryption settings:

 $ python -s /Encrypt Project#1542292355.pdf
    /Size 19
    /Root 18 0 R
    /Info 17 0 R
    /Encrypt 16 0 R
    /ID [<8679b726ea493460d3f8798bce7a7b1a><8679b726ea493460d3f8798bce7a7b1a>]
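The trailer gives us two values that later steps need: the number of the object referenced by /Encrypt and the first element of /ID. A rough sketch for pulling both out of the captured text with regular expressions could look like this. The function names are made up for illustration.

```python
import re


def encrypt_object_number(trailer_text):
    """Return the object number referenced by /Encrypt, or None if absent."""
    match = re.search(r"/Encrypt\s+(\d+)\s+\d+\s+R", trailer_text)
    return int(match.group(1)) if match else None


def first_document_id(trailer_text):
    """Return the first hex string inside the /ID array, or None if absent."""
    match = re.search(r"/ID\s*\[\s*<([0-9A-Fa-f]+)>", trailer_text)
    return match.group(1).lower() if match else None
```

For the trailer shown above, the first helper yields 16, which is the object number to pass to -o in the next step.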

Extracting the encryption key

If we wanted to extract the hash for cracking, we could use pdf-parser in this way:

$ python -o 16 Project#1542292355.pdf
obj 16 0

    /Filter /Standard
    /V 1
    /R 2
    /O '(\xb1\xdbV\xa8\x83\xca\xb5\xa2-\xd5\xfc9\x06\x18\xa0\xf8\xe1l\xab\x8a\xf1Ng\xcc\xba_\x90\x83z\xac\x89\x8b)'
    /U '(\xf6\x17>\xe3\xafq\x17\xec\xfb\xf5\xdb_\xe5l\xe0\x89\x02\xb1\x0e}\xbb^\xbeH\x90\xf3\xed\xcd0\x9b\xbbG)'
    /P 4294963392

The hashes are:

  • /O - Owner hash (usually used to export the unencrypted form)
  • /U - User hash (used to view the document)

Or simply use pdf2john, which is part of JohnTheRipper:

$ ~/JohnTheRipper/run/ Project#1542292355.pdf
Project#1542292355.pdf:$pdf$1*2*40*4294963392*1*16*8679b726ea493460d3f8798bce7a7b1a*32*f6173ee3af7117ecfbf5 \ 
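To see how the pdf2john line relates to the encryption dictionary, the sketch below rebuilds the hash string from the values printed by pdf-parser. The field order (V, R, key length, P, an encrypt-metadata flag, then length-prefixed hex for the ID, /U and /O) is inferred from the visible prefix of the output above. The tail of that line is truncated, so the placement of /O at the end is an assumption, and pdf_hash_line is a hypothetical helper rather than part of JohnTheRipper.

```python
def pdf_hash_line(v, r, key_bits, p, encrypt_metadata, doc_id, u, o):
    """Assemble a "$pdf$..." string from encryption dictionary values."""
    fields = [
        str(v), str(r), str(key_bits), str(p), str(int(encrypt_metadata)),
        str(len(doc_id)), doc_id.hex(),
        str(len(u)), u.hex(),
        str(len(o)), o.hex(),  # assumed position: the original output is cut off before this
    ]
    return "$pdf$" + "*".join(fields)


# Values copied from the pdf-parser output for object 16 and from the trailer.
DOC_ID = bytes.fromhex("8679b726ea493460d3f8798bce7a7b1a")
U = (b"\xf6\x17>\xe3\xafq\x17\xec\xfb\xf5\xdb_\xe5l\xe0\x89"
     b"\x02\xb1\x0e}\xbb^\xbeH\x90\xf3\xed\xcd0\x9b\xbbG")
O = (b"\xb1\xdbV\xa8\x83\xca\xb5\xa2-\xd5\xfc9\x06\x18\xa0\xf8"
     b"\xe1l\xab\x8a\xf1Ng\xcc\xba_\x90\x83z\xac\x89\x8b")

# key_bits=40 and encrypt_metadata=1 are taken from the pdf2john output,
# not from the dictionary itself, which has no /Length entry.
line = pdf_hash_line(1, 2, 40, 4294963392, 1, DOC_ID, U, O)
```

The start of the rebuilt string lines up with the visible part of the pdf2john output, which shows that the hash is just the /U and /O byte strings hex-encoded alongside the document ID and permissions.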

In the next example, we attempt to view the URLs, but they are still encrypted:

$ python -s /URI Project#1542292355.pdf
obj 5 0
 Type: /Annot

    /Type /Annot
    /Subtype /Link
    /Rect [327.195 393.321 494.351 371.496]
    /NM '(&\xd6\x00\xdbl\xf1\xe5\x83\xdb)'
    /M '(R\xdc\x02\xdap\xf9\xe4\x82\xda\xc7;\x11v\x83\x1a\xd9)'
    /Border [0 0 0]
        /S /URI
        /URI '(~\x92D\x9a2\xfb\xfa\x9c\x88\x9ekW.\xc2A\x8dq\xae\xf8\xd3:\xc1\x14\x81Eh\x97\xc0\xbe\xaeo


As the attacker has given us the password, decryption is easy with the help of qpdf (which should be available via your package manager):

  qpdf --decrypt --password=123456 Project#1542292355.pdf Project#1542292355.pdf_2

We save the decrypted version as Project#1542292355.pdf_2, and we can confirm that it is no longer encrypted with pdfid:

$ python Project#1542292355.pdf_2
PDFiD 0.2.5 Project#1542292355.pdf_2
 PDF Header: %PDF-1.4
 xref                   1
 trailer                1
 startxref              1
 /Page                  1
 /Encrypt               0
 /URI                   2

Extracting URLs

Now we can easily extract the unencrypted URLs like so:

$ python -s /URI Project#1542292355.pdf_2
obj 5 0
 Type: /Annot

        /S /URI
        /URI (
    /Border [ 0 0 0 ]
    /M (D:20181115143235)
    /NM (0001-0000)
    /Rect [ 327.195 393.321 494.351 371.496 ]
    /Subtype /Link
    /Type /Annot
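Once the strings are readable, pulling the URLs out of the pdf-parser output is a pattern-matching job. The following is a minimal sketch that assumes each link appears as a literal string in the form "/URI (…)". The function name and the example.com address in the usage note are placeholders, not the sample's real URL.

```python
import re

# "/URI" followed by a parenthesised PDF literal string; escaped characters
# such as \( and \) are allowed inside the string.
URI_PATTERN = re.compile(r"/URI\s*\(((?:\\.|[^\\()])*)\)")


def extract_uris(parser_output):
    """Return the unique URIs found in pdf-parser output, in order of appearance."""
    uris = []
    for raw in URI_PATTERN.findall(parser_output):
        # Undo only the backslash escapes for parentheses and backslashes.
        uri = re.sub(r"\\([()\\])", r"\1", raw)
        if uri not in uris:
            uris.append(uri)
    return uris
```

For instance, feeding it a block containing "/URI (https://example.com/login)" would give back a one-element list with that address, ready to hand to a reputation or sandbox service.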

Phishing URL

We can then perform additional reconnaissance and research on the URL to confirm that it is indeed a phishing website.


Third Party scans can be found here:

Automation with docker

We even built a Docker container with these tools to help other analysts extract URLs from encrypted PDFs:

Our script accepts two arguments:

  1. pdf
  2. password (optional)

If the PDF is encrypted and you don’t supply a password, it will attempt a small, simple dictionary attack:

$ docker run -v /tmp:/tmp -it pdftools:latest /opt/pdftools/ /tmp/Project#1542292355.pdf
No password found, if encrypted may not be able to proceed!
PDF is encrypted...
trying password...
Trying default passwords
/tmp/Project#1542292355.pdf: invalid password
/tmp/Project#1542292355.pdf: invalid password
Valid Password, continuing...
extracting URIs:
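The container's script is not reproduced here, but the dictionary step it describes can be approximated by trying each candidate with the same qpdf command used earlier. This is a sketch under stated assumptions, not the actual script: the wordlist is hypothetical, and success is taken to mean a qpdf exit status of 0 (clean) or 3 (completed with warnings).

```python
import subprocess

# Hypothetical wordlist; the container's real list is not shown in this post.
DEFAULT_PASSWORDS = ["", "password", "12345", "123456"]


def decrypt_with_wordlist(pdf_path, out_path, candidates=DEFAULT_PASSWORDS,
                          run=subprocess.run):
    """Try each candidate with qpdf; return the first that works, else None.

    Note: an unencrypted PDF is accepted with any password, so the first
    candidate is returned in that case.
    """
    for password in candidates:
        result = run(
            ["qpdf", "--decrypt", f"--password={password}", pdf_path, out_path],
            capture_output=True,
        )
        # qpdf exits with 0 on success and 3 on success with warnings.
        if result.returncode in (0, 3):
            return password
    return None
```

The run parameter exists only so the loop can be exercised without qpdf installed. In normal use you would call decrypt_with_wordlist("/tmp/Project#1542292355.pdf", "/tmp/decrypted.pdf") and then run the URL extraction on the output file.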
