XXE Vulnerability: XML External Entity Attacks and Prevention Guide
XXE Usually Hides in the Parts of the Stack Nobody Revisits
Very few teams ship brand-new features and announce, "we built an XML parser today." XXE usually shows up more indirectly than that. It is sitting inside a SAML library, a SOAP integration that has been running for years, an SVG upload flow, or some document-processing code nobody wants to touch because it only breaks twice a year.
That is why XXE keeps surviving into modern systems even though the bug class is well known. The XML is still there. It is just not where people expect it.
XML External Entity (XXE) attacks happen when an XML parser processes Document Type Definitions (DTDs) and resolves external entities declared in untrusted input. That lets attackers turn ordinary XML parsing into:
- Local file disclosure
- Server-Side Request Forgery (SSRF)
- Internal network scanning
- Denial of service through entity expansion
Even if your application is "mostly JSON," XXE still appears in SAML integrations, SOAP services, file uploads, office document processing, SVG handling, and third-party SDKs.
| XXE Impact | Example |
|---|---|
| File disclosure | Read /etc/passwd or app config files |
| SSRF | Reach cloud metadata endpoints or internal admin panels |
| Credential exposure | Read keys, tokens, or connection strings from local files |
| DoS | Exponential entity expansion consumes CPU and memory |
Basic XXE Attack Example
Malicious XML
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE user [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<user>
  <name>&xxe;</name>
</user>
If the parser resolves external entities, the application may include local file contents in logs, responses, database records, or error messages.
XXE to SSRF: The Cloud Risk
XXE is not limited to files. External entities can make network requests.
Example Payload
<?xml version="1.0"?>
<!DOCTYPE data [
<!ENTITY meta SYSTEM "http://169.254.169.254/latest/meta-data/iam/security-credentials/">
]>
<data>&meta;</data>
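If the parser resolves this entity on an AWS EC2 instance that still allows IMDSv1, the response can expose instance metadata, including the names of attached IAM roles and, with a follow-up request, their temporary credentials. IMDSv2 requires a session token obtained through a PUT request, which a plain entity fetch cannot perform.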
Vulnerable Java Example
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(inputStream);
With the JDK's default configuration, as above, the parser processes attacker-supplied DTDs and resolves external entities.
Hardened Java Configuration
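DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
// Reject any document that contains a DOCTYPE declaration.
// setFeature throws ParserConfigurationException if the parser does not support a feature.
factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
// Defense in depth, in case DOCTYPE declarations ever have to be re-enabled.
factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
// Disable XInclude processing and entity reference expansion as well.
factory.setXIncludeAware(false);
factory.setExpandEntityReferences(false);
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(inputStream);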
Vulnerable Python Example
from lxml import etree

# lxml releases before 5.0 default to resolve_entities=True.
def parse_xml(xml_bytes: bytes):
    return etree.fromstring(xml_bytes)
Hardened Python Example
# Note: upstream has marked the defusedxml.lxml module as deprecated.
from defusedxml.lxml import fromstring

def parse_xml(xml_bytes: bytes):
    return fromstring(xml_bytes)
Use hardened parser libraries where available instead of trying to remember every parser flag by hand.
Vulnerable .NET Example
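// Unsafe by default on .NET Framework versions before 4.5.2,
// where XmlDocument resolves external entities unless XmlResolver is null.
var xml = new XmlDocument();
xml.Load(stream);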
Hardened .NET Example
var settings = new XmlReaderSettings {
    DtdProcessing = DtdProcessing.Prohibit,
    XmlResolver = null
};
using var reader = XmlReader.Create(stream, settings);
var xml = new XmlDocument {
    XmlResolver = null
};
xml.Load(reader);
Why XXE Keeps Slipping Through Reviews
1. Developers Assume XML Is Just Data
XML can instruct parsers to fetch external resources. That makes it more dangerous than plain text formats.
2. The XML Is Hidden Inside Other Formats
Common examples:
- SVG uploads
- SOAP requests
- SAML assertions
- DOCX, XLSX, and other zipped XML-based formats
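For the zipped formats in that list, a minimal Python sketch can show which XML parts inside an uploaded archive carry a DOCTYPE declaration. The function name `parts_with_doctype`, the suffix list, and the size cap are assumptions made for illustration. The byte search is a heuristic that assumes an ASCII-compatible encoding; it is a triage aid, not a replacement for a hardened parser.

```python
import io
import zipfile

# Assumed for illustration: which part names to inspect and how much to read.
XML_SUFFIXES = (".xml", ".rels", ".svg")
MAX_PART_BYTES = 5 * 1024 * 1024


def parts_with_doctype(archive_bytes: bytes) -> list[str]:
    """Return the names of XML-like parts that contain a DOCTYPE declaration."""
    flagged = []
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for info in archive.infolist():
            if not info.filename.lower().endswith(XML_SUFFIXES):
                continue
            # Read at most MAX_PART_BYTES so an oversized part cannot exhaust memory.
            with archive.open(info) as part:
                head = part.read(MAX_PART_BYTES)
            if b"<!DOCTYPE" in head:
                flagged.append(info.filename)
    return flagged
```

Well-formed DOCX and XLSX parts generally have no need for a DOCTYPE, so any hit is worth rejecting or reviewing before the file reaches a real parser.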
3. Secure Defaults Differ Across Libraries
One parser may disable DTDs by default, while another quietly allows them. Teams often copy examples without checking security flags.
How to Test for XXE
File Read Probe
<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/hosts">]>
<foo>&xxe;</foo>
SSRF Probe
<!DOCTYPE foo [<!ENTITY xxe SYSTEM "http://collaborator.example/xxe">]>
<foo>&xxe;</foo>
DoS Probe
Use a safe test environment only. Entity expansion payloads can destabilize services quickly.
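The arithmetic behind that warning can be worked out without parsing anything. The sketch below assumes the commonly cited nested-entity shape: one literal entity, then several levels that each reference the level below a fixed number of times. The function name and parameters are illustrative.

```python
def expanded_size(levels: int, refs_per_level: int, base_len: int) -> int:
    """Bytes produced when the top-level entity is fully expanded.

    Level 0 is a literal string of base_len bytes. Each higher level is
    defined as refs_per_level references to the level directly below it.
    """
    return base_len * refs_per_level ** levels


# A 3-byte literal, nested 9 levels deep with 10 references per level:
print(expanded_size(levels=9, refs_per_level=10, base_len=3))  # 3000000000
```

A payload of under a kilobyte can therefore demand roughly 3 GB of memory from a parser that expands entities without limits.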
Best Practices for XXE Prevention
Prefer JSON When Possible
If you control the protocol, JSON removes an entire parser-risk category.
Disable DTDs and External Entities
This is the primary defense for XML parsers.
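As one illustration of "disable DTDs" using only the Python standard library, the sketch below runs a first pass with expat that aborts on any DOCTYPE declaration, then builds the tree only if that check passes. The names `DTDForbidden` and `parse_without_dtd` are hypothetical, and parsing the document twice is a simplification for readability rather than a recommended production design.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


class DTDForbidden(ValueError):
    """Raised when untrusted XML contains a DOCTYPE declaration."""


def parse_without_dtd(xml_bytes: bytes) -> ET.Element:
    # First pass: let expat scan the document and abort on any DOCTYPE.
    checker = expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        raise DTDForbidden(f"DOCTYPE not allowed (declared root: {name!r})")

    checker.StartDoctypeDeclHandler = on_doctype
    checker.Parse(xml_bytes, True)

    # Second pass: build the tree only after the document passed the check.
    return ET.fromstring(xml_bytes)
```

Rejecting the DOCTYPE outright covers external entities, parameter entities, and entity expansion at once, because none of them can be declared without it.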
Use Hardened Libraries
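Libraries such as defusedxml wrap Python's standard-library XML parsers and, by default, reject entity declarations and external references. Prefer them over hand-maintained parser flags.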
Restrict Outbound Network Access
If XXE does occur, network egress controls can limit SSRF impact.
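Egress controls normally live at the network layer, but it helps to be explicit about which destinations they should block. The sketch below classifies the target of an entity's system identifier; the function name and the four labels are assumptions for illustration, and DNS names are deliberately left unresolved.

```python
import ipaddress
from urllib.parse import urlsplit


def classify_target(url: str) -> str:
    """Return 'local-file', 'internal', 'external', or 'unresolved'."""
    parts = urlsplit(url)
    if parts.scheme == "file":
        return "local-file"
    host = parts.hostname
    if host is None:
        return "unresolved"
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # A DNS name: this sketch does not resolve it, so the caller must decide.
        return "unresolved"
    if address.is_loopback or address.is_link_local or address.is_private:
        return "internal"
    return "external"
```

The metadata address used earlier, 169.254.169.254, is link-local and is classified as "internal", which is the kind of destination a parser workload should not be able to reach.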
Review Indirect XML Inputs
Do not focus only on explicit .xml endpoints. Look at document imports, SSO flows, and image handling.
XXE Hardening Checklist
- Disable DTD processing
- Disable external general entities
- Disable external parameter entities
- Disable external DTD loading
- Use hardened XML parser libraries
- Prefer JSON for new integrations
- Restrict outbound network egress from parser workloads
- Add tests for malicious XML samples in CI
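The last checklist item can be sketched as a canary test: write a known marker to a temporary file, point an external entity at it, and fail if the marker ever appears in the parser's output. Everything named here (`CANARY`, `leaks_canary`, and the two example parsers) is hypothetical, and `simulated_vulnerable_parser` only mimics entity resolution to show that the harness detects a leak.

```python
import re
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Callable
from urllib.parse import urlsplit
from urllib.request import url2pathname

CANARY = "xxe-canary-7f3a"
PAYLOAD_TEMPLATE = (
    '<?xml version="1.0"?>\n'
    '<!DOCTYPE user [<!ENTITY xxe SYSTEM "{uri}">]>\n'
    "<user><name>&xxe;</name></user>"
)


def leaks_canary(parse_to_text: Callable[[bytes], str]) -> bool:
    """Return True if the parser under test exposes the canary file."""
    with tempfile.TemporaryDirectory() as tmp:
        secret = Path(tmp) / "secret.txt"
        secret.write_text(CANARY, encoding="utf-8")
        payload = PAYLOAD_TEMPLATE.format(uri=secret.as_uri()).encode("utf-8")
        try:
            output = parse_to_text(payload)
        except Exception:
            # Rejecting the document outright counts as safe behavior.
            return False
        return CANARY in output


def stdlib_parser(xml_bytes: bytes) -> str:
    return ET.tostring(ET.fromstring(xml_bytes), encoding="unicode")


def simulated_vulnerable_parser(xml_bytes: bytes) -> str:
    # Stand-in for a parser that resolves file entities; not a real parser.
    match = re.search(r'SYSTEM "([^"]+)"', xml_bytes.decode("utf-8"))
    if match is None:
        return ""
    path = url2pathname(urlsplit(match.group(1)).path)
    return Path(path).read_text(encoding="utf-8")
```

In CI, the call would wrap each real parsing entry point in the application, so that a dependency upgrade or configuration change that re-enables entity resolution fails the build.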
Final Takeaway
XXE is not an obscure parser edge case. It is a reminder that input formats can carry behavior, not just structure. The teams that avoid it consistently do two things well: they know exactly where XML still enters the stack, and they lock those code paths down instead of assuming a library default will stay safe forever.