Analyzing ZIP (OOXML) Files with YARA (part 3) – YARA Post #7

My prior posts about examining ZIP archives covered matching the file names within a ZIP archive, as well as matching the pre-compression CRC values of the files within the archive. In this post I am going to reference an interesting example of parsing the OOXML format used by modern Microsoft Office products. This format is essentially a ZIP archive containing a set of files that describe the Office document.

Aaron Stephens at Mandiant wrote a blog post called “Detecting Embedded Content in OOXML Documents”. In it, Aaron shared a few different techniques for detecting and clustering Microsoft Office documents. One example was detecting a specific PNG file embedded within documents; the image was used to guide the user towards enabling macros. The presence of this image in a phishing document could be used to cluster related attacks.

Given the image file's CRC, its size, and the fact that it was a PNG file, the author was able to create a YARA rule that matches if this image is located within the OOXML document (essentially a ZIP archive). This rule approaches the ZIP file a little differently than my previous two posts did. The author skips locating the start of the ZIP file entry and references the CRC ($crc) and uncompressed file size ($ufs) hex strings directly to narrow down the match. The rule also checks that the file name field ends with the ".png" extension.

rule png_397ba1d0601558dfe34cd5aafaedd18e {
    meta:
        author = "Aaron Stephens <[email protected]>"
        description = "PNG in OOXML document."

    strings:
        $crc = {f8158b40}
        $ext = ".png"
        $ufs = {b42c0000}

    condition:
        $ufs at @crc[1] + 8 and $ext at @crc[1] + uint16(@crc[1] + 12) + 16 - 4
}

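In this example the condition uses @crc[1] (the offset of the first $crc match) as the base from which the other offsets are calculated, unlike our prior examples where the offsets were based on the start of the local file header. The at operator tests for the presence of the other strings at specific offsets relative to the CRC value. In a local file header the uncompressed size field sits 8 bytes after the start of the CRC field, the 2-byte file name length sits 12 bytes after it, and the file name itself starts 16 bytes after it. Adding the file name length and subtracting 4 therefore lands on the last four bytes of the file name.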
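To make the offset arithmetic concrete, here is a minimal Python sketch that mirrors the condition step by step against a small archive built in memory. The helper names (le32, build_archive, png_entry_at_crc) and the stand-in payload are my own illustration and are not part of the original rule. It assumes the archive stores the CRC and sizes in the local file header rather than in a trailing data descriptor.

```python
import io
import struct
import zipfile
import zlib


def le32(value: int) -> bytes:
    """Encode a 32-bit value the way ZIP headers store it (little-endian)."""
    return struct.pack("<I", value)


def build_archive(name: str, payload: bytes) -> bytes:
    """Build a one-member ZIP in memory (hypothetical helper for the demo)."""
    info = zipfile.ZipInfo(name, date_time=(2020, 1, 1, 0, 0, 0))
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(info, payload)
    return buf.getvalue()


def png_entry_at_crc(data: bytes, crc: bytes, ufs: bytes, ext: bytes = b".png") -> bool:
    """Mirror the rule's condition, anchored on the first CRC match only."""
    pos = data.find(crc)                                     # @crc[1]
    if pos < 0 or pos + 14 > len(data):
        return False
    if data[pos + 8:pos + 12] != ufs:                        # $ufs at @crc[1] + 8
        return False
    (name_len,) = struct.unpack_from("<H", data, pos + 12)   # uint16(@crc[1] + 12)
    ext_at = pos + name_len + 16 - len(ext)                  # ... + 16 - 4
    return data[ext_at:ext_at + len(ext)] == ext             # $ext at <that offset>


# Stand-in bytes, not the real image from the original blog.
payload = b"\x89PNG\r\n\x1a\n" + b"stand-in image bytes"
archive = build_archive("word/media/image1.png", payload)

crc = le32(zlib.crc32(payload))   # how a $crc hex string could be derived
ufs = le32(len(payload))          # how a $ufs hex string could be derived
print(crc.hex(), ufs.hex(), png_entry_at_crc(archive, crc, ufs))
```

Read the other way around, the rule's strings decode to a CRC-32 of 0x408b15f8 and an uncompressed size of 0x2cb4 (11,444) bytes, since both fields are stored little-endian.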

An alternative approach to consider is using the wildcard character ? in the hex string. This allows us to match the CRC and uncompressed file size fields together while skipping over the 4 bytes that store the compressed file size. The condition then only needs to validate that the four-character .png extension is at the end of the file name field.

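rule png_alt {
    strings:
        // CRC-32, 4 wildcard bytes (compressed size), uncompressed size
        $crc_ufs = {f8158b40 ???????? b42c0000}
        $ext = ".png"

    condition:
        // file name length is 12 bytes past the CRC; the name starts at +16
        $ext at @crc_ufs[1] + uint16(@crc_ufs[1] + 12) + 16 - 4
}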
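For readers who want to experiment with the wildcard idea outside of YARA, the same match can be sketched in Python with a bytes regular expression, where `.{4}` plays the role of the eight ? nibbles. This is only an illustration: fake_local_header, compile_crc_ufs, and match_png_alt are hypothetical helpers, and the header is hand-built from the values in the rule plus an arbitrary compressed size.

```python
import re
import struct


def fake_local_header(name: bytes, crc: int, comp_size: int, size: int) -> bytes:
    """Hand-build a ZIP local file header plus file name (illustration only)."""
    # signature, version needed, flags, method, mod time, mod date,
    # CRC-32, compressed size, uncompressed size, name length, extra length
    header = struct.pack("<4sHHHHHIIIHH", b"PK\x03\x04", 20, 0, 8, 0, 0,
                         crc, comp_size, size, len(name), 0)
    return header + name


def compile_crc_ufs(crc: int, size: int) -> re.Pattern:
    """CRC-32, any four bytes (compressed size), then uncompressed size."""
    crc_b = struct.pack("<I", crc)
    ufs_b = struct.pack("<I", size)
    # DOTALL so that '.' also matches a 0x0a byte inside the skipped field.
    return re.compile(re.escape(crc_b) + b".{4}" + re.escape(ufs_b), re.DOTALL)


def match_png_alt(data: bytes, pattern: re.Pattern, ext: bytes = b".png") -> bool:
    """Mirror png_alt's condition using the first pattern match only."""
    m = pattern.search(data)                                 # @crc_ufs[1]
    if m is None or m.start() + 14 > len(data):
        return False
    pos = m.start()
    (name_len,) = struct.unpack_from("<H", data, pos + 12)   # uint16(@crc_ufs[1] + 12)
    ext_at = pos + name_len + 16 - len(ext)
    return data[ext_at:ext_at + len(ext)] == ext


pattern = compile_crc_ufs(0x408B15F8, 0x2CB4)
sample = fake_local_header(b"word/media/image1.png", 0x408B15F8, 0x1234, 0x2CB4)
print(match_png_alt(sample, pattern))
```

Folding both fields into one pattern means the size check happens during string matching, so the condition is left with a single extension test.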