Attack Type | Metadata | Document Security

Metadata Injection: The Hidden Attack Surface in Files

26 Jan 2026 | 5 min read | Bordair

Most file formats carry metadata: data about data. Images have EXIF tags. PDFs have document properties. Word documents have comments, revision history, and author fields. When an LLM pipeline processes a file, it often reads this metadata alongside the main content, which creates a hidden attack surface.

Image metadata

  • EXIF data: Camera settings, GPS coordinates, and custom fields. The "ImageDescription" or "UserComment" EXIF tag can contain injection text.
  • PNG text chunks: PNG files support tEXt, iTXt, and zTXt chunks for arbitrary text metadata.
  • XMP data: Extensible Metadata Platform data embedded in images, often processed by document analysis tools.
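
To make the PNG case concrete, here is a minimal sketch of how text chunks can be read using only the Python standard library. The function name `read_png_text_chunks` is hypothetical, the sketch covers tEXt and zTXt only (iTXt is left out for brevity), and it skips CRC verification, so treat it as an illustration rather than a complete parser.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def read_png_text_chunks(data: bytes) -> dict[str, str]:
    """Return keyword -> text for tEXt and zTXt chunks in a PNG byte string.

    Hypothetical helper for illustration; iTXt chunks and CRC checks are omitted.
    """
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    found = {}
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        # Each chunk is: 4-byte big-endian length, 4-byte type, body, 4-byte CRC.
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        pos += 12 + length
        if ctype == b"tEXt":
            keyword, _, text = body.partition(b"\x00")
            found[keyword.decode("latin-1")] = text.decode("latin-1")
        elif ctype == b"zTXt":
            keyword, _, rest = body.partition(b"\x00")
            # rest[0] is the compression method byte (0 means zlib deflate).
            text = zlib.decompress(rest[1:])
            found[keyword.decode("latin-1")] = text.decode("latin-1")
        elif ctype == b"IEND":
            break
    return found
```

Calling `read_png_text_chunks(open("upload.png", "rb").read())` on a hypothetical upload returns every keyword and value pair in those chunks, none of which appears when the image is simply viewed.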

Document metadata

  • PDF properties: The Title, Author, Subject, and Keywords fields can contain injection text.
  • DOCX comments: Review comments and tracked changes are often included in extracted text.
  • Hidden columns: Spreadsheets can carry injection payloads in hidden columns or rows.
  • Speaker notes: PowerPoint speaker notes are extracted but not displayed on the slides.
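
For DOCX and PPTX files, the hidden text above lives in separate XML parts inside a ZIP package, so it can be pulled out with the standard library. The sketch below is one possible approach, not a complete extractor: the function name `extract_hidden_ooxml_text` and the `HIDDEN_PARTS` list are assumptions made for illustration, and hidden spreadsheet columns are not covered.

```python
import zipfile
import xml.etree.ElementTree as ET

# Package parts that hold text a viewer does not show by default:
# core properties, DOCX comments, and PPTX speaker notes.
HIDDEN_PARTS = ("docProps/core.xml", "word/comments.xml", "ppt/notesSlides/")

def extract_hidden_ooxml_text(file) -> dict[str, str]:
    """Map each hidden part name to the text it contains.

    `file` is a path or a binary file-like object. For untrusted input,
    a hardened XML parser (for example defusedxml) is a safer choice than
    ElementTree; it is used here only to keep the sketch dependency-free.
    """
    hidden = {}
    with zipfile.ZipFile(file) as package:
        for name in package.namelist():
            if not name.startswith(HIDDEN_PARTS) or not name.endswith(".xml"):
                continue
            root = ET.fromstring(package.read(name))
            # Join every non-empty text node in the part.
            text = " ".join(t.strip() for t in root.itertext() if t.strip())
            if text:
                hidden[name] = text
    return hidden
```

The returned dictionary keeps the part name as the key, so a later scanning step can report which comment, property, or notes slide carried a suspicious string.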

Why metadata matters

Many document processing pipelines extract "all text" from a file, which includes metadata. If the pipeline does not distinguish between content and metadata, the injected metadata becomes part of the LLM's context.
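
The difference between the two pipeline designs can be sketched in a few lines. Everything here is hypothetical and for illustration only: `ExtractedFile`, `flatten_all_text`, and `content_with_allowed_metadata` are invented names, and labeling metadata is a way to keep it distinguishable, not a defense on its own.

```python
from dataclasses import dataclass, field

@dataclass
class ExtractedFile:
    """Hypothetical extraction result that keeps content and metadata apart."""
    content: str
    metadata: dict[str, str] = field(default_factory=dict)

def flatten_all_text(doc: ExtractedFile) -> str:
    """Naive "all text" extraction: metadata values are mixed into the content."""
    return "\n".join([doc.content, *doc.metadata.values()])

def content_with_allowed_metadata(
    doc: ExtractedFile, allowed_fields: frozenset[str] = frozenset()
) -> str:
    """Pass content through; include only allowlisted metadata, clearly labeled."""
    lines = [doc.content]
    for key in sorted(doc.metadata):
        if key in allowed_fields:
            lines.append(f"[metadata:{key}] {doc.metadata[key]}")
    return "\n".join(lines)
```

With the naive function, an Author field containing an instruction reaches the model as if it were part of the document. With the second, that field is dropped unless the application explicitly asks for it.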

How Bordair handles it

Bordair extracts and scans metadata separately from document content. Each metadata field is checked for injection patterns. In our multimodal dataset, metadata-based attacks account for a significant portion of document and image injection payloads.
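
As a rough illustration of what per-field checking can look like, the sketch below runs a handful of regular expressions over each metadata field separately. This is not Bordair's detection logic: `scan_metadata` and the patterns are assumptions chosen for the example, and a real detector would need far broader coverage than three regexes.

```python
import re

# Illustrative patterns only, not a real detection ruleset.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior|above) instructions", re.I),
    re.compile(r"disregard (the )?(system|previous) prompt", re.I),
    re.compile(r"you are now\b", re.I),
]

def scan_metadata(metadata: dict[str, str]) -> dict[str, list[str]]:
    """Return field name -> matched pattern strings, for fields with a match."""
    flagged = {}
    for name, value in metadata.items():
        hits = [p.pattern for p in INJECTION_PATTERNS if p.search(value)]
        if hits:
            flagged[name] = hits
    return flagged
```

Scanning field by field means a hit can be traced to its source, for example a PDF Subject field, instead of being lost in one flattened block of text.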
