Escaping

This article is part of the WordPress guide. Read the introduction.

Sanitization happens right after input and before the data gets processed or saved in the database. Escaping happens right before the data is displayed. Why would you escape the data if you already sanitized it before storing it? Well, do you remember the first rule of security? Don’t trust any data.

You should escape as late as possible – ideally right when the data is being output. There are a few reasons for that. The most important one is to prevent the data from changing after being escaped but before being rendered, which could introduce vulnerabilities. It’s also much easier to maintain code when you don’t have to wonder whether a piece of data has already been escaped.
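A minimal sketch of late escaping, assuming the code runs inside WordPress. The helper name myguide_render_author() is hypothetical and only serves as an illustration. The value stays raw while it is processed and is escaped in the output statement itself:

```php
<?php
// Hypothetical helper, assumed to run inside WordPress (esc_html() is a core function).
function myguide_render_author( $author_name ) {
	// Work with the raw value for as long as needed...
	$author_name = trim( $author_name );

	// ...and escape at the last possible moment, in the output statement itself.
	echo '<p class="author">' . esc_html( $author_name ) . '</p>';
}
```

Because the escaping call sits right next to the echo, a reviewer can verify it at a glance without tracing where the variable came from.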

Similarly to sanitization, escaping functions are easiest to remember if you categorize them. In this case, the most sensible way to categorize them is by the place where you’re displaying the data.

1. Plain Text

This is the most common use case. You have a piece of data and you want to display it “as is” (e.g., in a paragraph, in a header, in a div, etc.). The two functions you’re looking for are:

  • esc_html()
  • esc_textarea()

You can see these are counterparts to the general text sanitization functions.

esc_html() converts all reserved HTML characters into their entities. It uses the PHP htmlspecialchars() function underneath with the ENT_QUOTES flag, which means it replaces the following characters with these character sequences:

  • & => &amp;
  • < => &lt;
  • > => &gt;
  • " => &quot;
  • ' => &#039;

This ensures the text is parsed as plain text rather than as part of the HTML. If you call esc_html( '<strong>bold text</strong>' ), you will not see bold text on the frontend; you will see the literal string “<strong>bold text</strong>”. If you didn’t escape the string, echoing “<script>alert('xss')</script>” would result in JS execution.
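The mapping above can be checked with plain PHP. This sketch calls htmlspecialchars() directly with ENT_QUOTES instead of esc_html(), so that it runs outside WordPress. Treat it as an approximation of the encoding step only, not as WordPress’s actual implementation:

```php
<?php
// Plain-PHP approximation of the encoding step; esc_html() itself needs WordPress.
$raw     = '<script>alert(\'xss\')</script> & "quotes"';
$escaped = htmlspecialchars( $raw, ENT_QUOTES, 'UTF-8' );

echo $escaped, PHP_EOL;
// &lt;script&gt;alert(&#039;xss&#039;)&lt;/script&gt; &amp; &quot;quotes&quot;
```

The browser turns each entity back into its character when rendering, so the visitor sees the original string as text and no script runs.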

If you sanitized a piece of data with wp_kses() and wanted to later display this data as a code snippet on the website, you’d use this function to actually show the tags in text. esc_html() is the most used escaping function. By the way, you can see how escaping differs from sanitization. The sanitization function removes the tags. The escaping function only encodes them.

esc_html() also passes $double_encode = false to htmlspecialchars(). This means that an existing “&amp;” does not become “&amp;amp;”.

esc_textarea() is a confusing function. Semantically, you should always use it when escaping text in textarea elements. It differs from esc_html() in only two ways: it doesn’t check for invalid UTF-8, and it passes $double_encode = true to htmlspecialchars().

Double encoding is the primary characteristic. In most cases, it’s the only practical difference between this function and esc_html(). Consider what happens when a user enters the literal string “&amp;” in a textarea. If you used esc_html() when rendering it, the user would see “&”. That’s a different string than the one they entered!

Double encoding ensures that this string gets returned as “&amp;amp;”. The browser will convert the first entity to “&” and display the final string as “&amp;” – exactly what the user entered. This is good, but notice that this is not a textarea-specific problem.
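The effect of the $double_encode flag can be reproduced with plain PHP. The sketch below calls htmlspecialchars() directly, so it runs outside WordPress; it only mirrors the flag that esc_html() and esc_textarea() pass, not everything else those functions do:

```php
<?php
// What the user typed into the field: the five literal characters "&amp;".
$stored = '&amp;';

// $double_encode = false: existing entities are left alone (the esc_html() behavior).
$single = htmlspecialchars( $stored, ENT_QUOTES, 'UTF-8', false );

// $double_encode = true: the ampersand is encoded again (the esc_textarea() behavior).
$double = htmlspecialchars( $stored, ENT_QUOTES, 'UTF-8', true );

echo $single, PHP_EOL; // &amp;      (the browser renders "&")
echo $double, PHP_EOL; // &amp;amp;  (the browser renders "&amp;")
```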

Let’s say you had an input field on your website that allowed the user to store some string which would then be displayed on the page. If you display said string using esc_html(), and the user inputted a literal entity, the string would again be different than what they typed in. In this case, you’d have to use esc_textarea(), even though you’re not escaping it inside a textarea.

This is the confusing part: the function’s name is misleading. In reality, esc_textarea() would better be called esc_html_double_encode() or something similar. The name is what it is because 99% of the cases where you want to preserve the exact text that was entered involve textareas. You want the person to see exactly what they wrote after they refresh the page.

It’s a little confusing, so let me drive the point home with an example. Let’s say you’re making Twitter. Every user has a textarea they use to write a tweet. A user types in “I love &lt;b&gt;cats&lt;/b&gt;”. When they save this tweet as a draft and refresh the page, they expect to be able to edit the exact same text they put in. That’s what esc_textarea() does.

When they publish the tweet, two different things can happen. If you use esc_html() when displaying the post on their feed, the rendered content will be “I love <b>cats</b>”, shown as literal text rather than bold. If you use esc_textarea(), the rendered content will be “I love &lt;b&gt;cats&lt;/b&gt;”. The use of this function is therefore not limited to textareas. You can use it whenever you want to render the text exactly the way it’s stored, without the browser converting HTML entities to their associated characters.
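Here is how the tweet example might look in code. This is a sketch that assumes it runs inside WordPress; the helper names and the $show_entities_literally parameter are hypothetical:

```php
<?php
// Hypothetical helpers, assumed to run inside WordPress.

// Edit form: the user gets back exactly what they typed, entities included.
function myguide_render_tweet_editor( $draft ) {
	echo '<textarea name="tweet">' . esc_textarea( $draft ) . '</textarea>';
}

// Feed: pick the function based on what the reader should see.
function myguide_render_tweet( $tweet, $show_entities_literally = false ) {
	// esc_textarea() double encodes, so "&lt;b&gt;" is displayed as "&lt;b&gt;".
	// esc_html() does not, so "&lt;b&gt;" is displayed as "<b>" (as text, not markup).
	$safe = $show_entities_literally ? esc_textarea( $tweet ) : esc_html( $tweet );

	echo '<p class="tweet">' . $safe . '</p>';
}
```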

PS: Take note of what sanitization functions you use when storing the data in the database. Different functions might or might not convert characters to entities.

PS 2: Note that, technically, the content in the DOM is different from the content the user first inputted in the textarea – it contains the additional double-encoded entity. This is necessary to make it look the same. You have to coordinate your sanitization and escaping carefully, otherwise each subsequent POST of the textarea form could add yet another layer of encoding to the stored content.

2. HTML Attributes

There is only one function for escaping HTML attributes – esc_attr(). You should use this function whenever you’re outputting the value of an HTML attribute, such as ‘alt’, ‘value’, ‘title’, etc.

This function ensures that the data can’t “break out” of the attribute. It achieves this by encoding the special HTML chars into their entities – exactly like esc_html(). As a matter of fact, the source code of esc_attr() and esc_html() is nearly identical; the two differ only in the filter hook they apply to the result.

If these two functions are the same, why should you use esc_attr() instead of esc_html()? Because it’s good practice. Their semantic meaning is different. Although unlikely, their implementations might change at some point, as they are used for achieving different things. It’s just a coincidence that the current way of achieving them is the same.

Knowing that esc_attr() works just like esc_html(), you may sometimes find a need to use esc_textarea() instead (if you want to double encode entities). Treat esc_attr() as the baseline for escaping almost all HTML attributes, but use your best judgement when choosing the final function.
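A short sketch of attribute escaping, assuming the code runs inside WordPress. The helper name myguide_render_search_field() is hypothetical:

```php
<?php
// Hypothetical helper, assumed to run inside WordPress.
function myguide_render_search_field( $query ) {
	// Without escaping, a value containing a double quote could close the
	// attribute early and inject new attributes such as onfocus="...".
	printf(
		'<input type="search" name="s" value="%s" title="%s">',
		esc_attr( $query ),
		esc_attr( 'Search for: ' . $query )
	);
}
```

Note that the attribute values in the markup are always wrapped in quotes. Entity encoding only prevents a break-out when the attribute is quoted in the first place.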

3. URLs

There is really only one URL escaping function – esc_url(). It ensures that the URL contains an allowed scheme (http, https, ftp, etc.). This prevents URLs like javascript:alert('XSS') from being output. If you just used esc_attr(), this wouldn’t be caught.

esc_url() also encodes special HTML chars to entities (such as & and '), much like esc_html() does. That’s in accordance with the HTML specification. You should use esc_url() when rendering any URL anywhere. It doesn’t matter if you want to display it as plain text or use it in a src or href attribute. Use esc_url(). This is the one exception to the rule of using esc_attr() for HTML attributes.

There’s one more URL escaping function you might come across – esc_url_raw(). The reason I didn’t mention it earlier is that it’s just an alias of sanitize_url(). It does what esc_url() does (validates the URL protocol), except it doesn’t encode special chars to entities. It’s not used for outputting, only for storing the URL in the database (which is the job of sanitization, so you should never really have to use this function).
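The split between output and storage can be sketched as follows, assuming the code runs inside WordPress. The helper names and the myguide_website meta key are hypothetical:

```php
<?php
// Hypothetical helpers, assumed to run inside WordPress.

// Output: esc_url() for both the href attribute and the visible link text.
function myguide_render_website_link( $url ) {
	$safe_url = esc_url( $url );

	if ( '' === $safe_url ) {
		return; // Rejected, e.g. because of a disallowed scheme such as javascript:.
	}

	echo '<a href="' . $safe_url . '">' . $safe_url . '</a>';
}

// Storage: esc_url_raw() (or sanitize_url()) before writing to the database.
function myguide_save_website( $user_id, $url ) {
	update_user_meta( $user_id, 'myguide_website', esc_url_raw( $url ) );
}
```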

4. Allowing HTML

When you want to allow some HTML to be processed as HTML, you should use one of these two:

  • wp_kses()
  • wp_kses_post()

Yes, these are exactly the same functions as for sanitization of content with HTML. They are universal sanitization and escaping functions. I’m not going to explain them again – read the sanitization section if you forgot them already (you goldfish).

Some people are going to tell you that you don’t need to use wp_kses() again if you’ve already sanitized the content when storing it in the database. Their reasoning, as far as I understand, is that it’s not going to change the result in any way, and that wp_kses() is a rather computationally expensive function. I don’t agree with that.

My argument for doing double wp_kses() is that there’s never enough security. Think back to the first rule in the security mindset. Don’t trust any data – even the data in your database. The skeptics will say “if someone hacked your database, you have a bigger problem than not escaping the content”.

That is true, but what if no one has? Are you absolutely, unequivocally sure that you and everybody else on your team will remember to sanitize the data 100% of the time? How sure are you that the data in your database is actually sanitized? What if the code changes, and a new junior maintainer on your team rewrites the code handling the input and forgets to call wp_kses()? What then?

The point is – why open yourself up to a potential vulnerability when you can just call wp_kses() again? Even if it takes an additional 2 ms to render a page (which you will probably cache anyway), it’s just not worth sacrificing the security.
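A sketch of running wp_kses() again at output time, assuming the code runs inside WordPress. The helper name, the myguide_bio meta key, and the particular list of allowed tags are hypothetical choices for illustration:

```php
<?php
// Hypothetical helper, assumed to run inside WordPress.
function myguide_render_bio( $user_id ) {
	// Treat the stored value as untrusted, even if it was sanitized on the way in.
	$bio = get_user_meta( $user_id, 'myguide_bio', true );

	// Allow only a small, explicit set of tags and attributes at output time.
	$allowed = array(
		'a'      => array(
			'href'  => array(),
			'title' => array(),
		),
		'strong' => array(),
		'em'     => array(),
	);

	echo '<div class="bio">' . wp_kses( $bio, $allowed ) . '</div>';
}
```

If the content should accept everything a post accepts, wp_kses_post( $bio ) could replace the wp_kses() call and the custom list.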

5. Miscellaneous

Again, these are niche functions you will rarely use or see. It’s good to know about them though:

  • esc_js() – escaping inline JavaScript (like in the onclick attribute).
  • esc_xml() – escaping text for output in XML.
  • esc_sql() – don’t use it unless you have a good reason to. Read the section on database security below.
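For the first item in the list, a sketch of esc_js() in an inline handler, assuming the code runs inside WordPress. The helper name myguide_render_confirm_button() is hypothetical:

```php
<?php
// Hypothetical helper, assumed to run inside WordPress.
function myguide_render_confirm_button( $message ) {
	// esc_js() prepares the text for a quoted JS string inside an inline handler;
	// the visible button label is plain text, so it gets esc_html() instead.
	printf(
		'<button type="button" onclick="alert(\'%s\')">%s</button>',
		esc_js( $message ),
		esc_html( $message )
	);
}
```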

Escaping With Localization

There are some helper functions which allow you to localize the string before escaping it. Here they are:

  • esc_html__()
  • esc_html_e()
  • esc_html_x()
  • esc_attr__()
  • esc_attr_e()
  • esc_attr_x()

This is only a subset – the most commonly used functions that combine escaping and i18n. They are simple wrappers. For example, here’s the code of esc_html__():

PHP
function esc_html__( $text, $domain = 'default' ) {
	return esc_html( translate( $text, $domain ) );
}

Note that you can’t use these functions on dynamic strings, as the gettext i18n system parses the code statically in order to generate the POT file (we’ve already discussed all of this in the i18n & l10n chapter). This means they’re only useful for hard-coded strings.

The idea here is not to protect yourself from XSS attacks, as those would be impossible in a hard-coded string, but to encode the special chars to entities so that they don’t break your HTML. That’s why you should still escape the strings, even if they aren’t read from the database.
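A sketch of the localized helpers in a template, assuming the code runs inside WordPress. The function name and the myguide text domain are made up for the example:

```php
<?php
// Hypothetical helper, assumed to run inside WordPress; 'myguide' is a made-up text domain.
function myguide_render_subscribe_form() {
	?>
	<label for="myguide-email"><?php esc_html_e( 'Your email address', 'myguide' ); ?></label>
	<input
		type="email"
		id="myguide-email"
		placeholder="<?php esc_attr_e( 'name@example.com', 'myguide' ); ?>"
		title="<?php echo esc_attr_x( 'Subscribe', 'newsletter form tooltip', 'myguide' ); ?>"
	>
	<?php
}
```

The _e() variants echo the result themselves, while the __() and _x() variants return it, which is why only the last call needs an explicit echo.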