So, let’s say my WordPress extension needs to generate a string like this: 300 × 400px. I would probably choose to use the following HTML markup:
300 × 400<abbr title="pixels">px</abbr>
Simple enough, but I would also like this string to be properly escaped and localized. In my current state of thinking, localization presents a new form of input. As any security-conscious developer knows, all input must be escaped before sending it to its destination, and translated text is no different. My first attempt at this (coded a few months ago) looked a bit like this:
function my_extension_format_dimensions_pixels( $width, $height ) {
return sprintf( __( '%1$d × %2$d<abbr title="pixels">px</abbr>', 'my_extension' ), $width, $height );
}
While this appears to work, I don’t feel that it fully meets the requirements stated above. Basically, it is not properly escaped. While I’m pretty new to localization in WordPress, I’ve been making it a point to enable my extensions to be translated into other languages. Situations like this present a bit of a headache for me though. The string has so many different parts to it. The presence of an HTML tag with an attribute as well as 2 numbers makes the above solution a bit short-sighted. I’m going to try to illustrate the steps that I would take to correct it and hopefully someone with more knowledge can tell me if I’m doing it wrong.
The first and probably easiest thing would be to deal with the numbers. I recently discovered the number_format_i18n() function while reading through trac. This function will convert a given integer to the format specified by the user-defined locale.
function my_extension_format_dimensions_pixels( $width, $height ) {
return sprintf( __( '%1$s × %2$s<abbr title="pixels">px</abbr>', 'my_extension' ), number_format_i18n( $width ), number_format_i18n( $height ) );
}
Here, I’ve filtered the values of $width and $height through number_format_i18n() and updated the placeholders from %d to %s. number_format_i18n() returns a string, and forcing it back to an integer with %d would probably be a bad idea: a formatted value such as 1,024 would be cut off at the first non-digit character.
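To see why %d is a poor fit once the numbers are formatted, here is a minimal sketch in plain PHP. It uses PHP's own number_format() as a stand-in for number_format_i18n(), on the assumption that the locale groups thousands with a comma. The helper name my_demo_format() is made up for illustration.

```php
<?php
// Stand-in for number_format_i18n(): assumes a locale that groups thousands with a comma.
function my_demo_format( $number ) {
    return number_format( $number, 0, '.', ',' );
}

$formatted = my_demo_format( 1024 ); // "1,024"

// %s keeps the formatted string intact.
$with_s = sprintf( '%s', $formatted );

// %d casts the string to an integer, and the cast stops at the first non-digit (the comma).
$with_d = sprintf( '%d', $formatted );

echo $with_s, "\n"; // 1,024
echo $with_d, "\n"; // 1
```

The same truncation would apply to any locale whose formatted output contains a separator or non-ASCII digits, which is why %s is the safer placeholder here.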
If there were no html present in this string, I would probably just call it done and move onto the next task, but there is an html tag with an attribute and I’m starting to get confused.
I’m aware that WordPress provides a few convenient functions that will escape and translate data simultaneously. Normally, I would just filter this string through esc_html__() and all would be good, but since the string contains HTML, this is not really an option and here’s the proof:
$tests = array(
'This is just a normal string',
'This string has a special char in decimal notation: &#215;',
'This string has <abbr title="Hypertext Markup Language">html</abbr>.'
);
foreach ( $tests as $test ) {
print '<br>' . esc_html__( $test );
}
If you run this code on a WordPress installation, you will notice that $tests[0] and $tests[1] display unchanged while $tests[2] is run through something similar to htmlentities(). This is precisely what I believe esc_html__() should be doing. IMO it is meant to prepare text to be used in HTML, not as HTML.
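For readers without a WordPress install at hand, the behaviour can be approximated in plain PHP. The sketch below uses htmlspecialchars() with double encoding turned off as a rough stand-in for esc_html(); this is an assumption made for illustration, not WordPress's actual implementation, and my_demo_esc_html() is a made-up name.

```php
<?php
// Rough stand-in for esc_html(): encodes <, >, & and quotes, but leaves existing entities alone.
// An approximation for illustration only, not the real WordPress function.
function my_demo_esc_html( $text ) {
    return htmlspecialchars( $text, ENT_QUOTES, 'UTF-8', false );
}

$plain  = my_demo_esc_html( 'This is just a normal string' );
$entity = my_demo_esc_html( 'Special char: &#215;' );
$markup = my_demo_esc_html( 'This string has <abbr title="HTML">html</abbr>.' );

echo $plain, "\n";  // unchanged
echo $entity, "\n"; // unchanged, the existing entity is not double-encoded
echo $markup, "\n"; // tag delimiters and quotes are turned into entities
```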
It appears at this stage that there is no super-easy way to do this. In my mind it may be a good solution to break the string into parts so that each part can be filtered properly. Here’s what I have for the next step:
function my_extension_format_dimensions_pixels( $width, $height ) {
$tag = '<abbr title="' . esc_attr__( 'pixels', 'my_extension' ) . '">' . esc_html__( 'px', 'my_extension' ) . '</abbr>';
return sprintf( __( '%1$s × %2$s%3$s', 'my_extension' ), number_format_i18n( $width ), number_format_i18n( $height ), $tag );
}
I’ve moved the entire HTML tag outside of the translatable string and am filtering both the tag’s title attribute and contents with appropriate functions. Just to see what would happen, I decided to try esc_html__() instead of __() to escape the entire string and, to my surprise, it actually worked without creating entities for the abbr tag.
function my_extension_format_dimensions_pixels( $width, $height ) {
$tag = '<abbr title="' . esc_attr__( 'pixels', 'my_extension' ) . '">' . esc_html__( 'px', 'my_extension' ) . '</abbr>';
return sprintf( esc_html__( '%1$s × %2$s%3$s', 'my_extension' ), number_format_i18n( $width ), number_format_i18n( $height ), $tag );
}
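A likely explanation for the abbr tag surviving is the order of operations: esc_html__() only sees the format string, and sprintf() substitutes the arguments afterwards. The sketch below reproduces that order in plain PHP, with htmlspecialchars() standing in for the escaping step (an approximation) and a made-up helper name, my_demo_esc_html__().

```php
<?php
// Stand-in for esc_html__(): "translates" (here, returns the string as is) and then escapes it.
function my_demo_esc_html__( $text ) {
    return htmlspecialchars( $text, ENT_QUOTES, 'UTF-8', false );
}

$tag = '<abbr title="pixels">px</abbr>';

// Step 1: only the format string is escaped. It contains no markup, so nothing changes.
$format = my_demo_esc_html__( '%1$s × %2$s%3$s' );

// Step 2: sprintf() drops the arguments in afterwards, untouched by the escaping above.
$result = sprintf( $format, '300', '400', $tag );
echo $result, "\n"; // 300 × 400<abbr title="pixels">px</abbr>

// Escaping after substitution would encode the tag as well.
$escaped_late = my_demo_esc_html__( sprintf( '%1$s × %2$s%3$s', '300', '400', $tag ) );
echo $escaped_late, "\n";
```

In this arrangement the escaping protects against markup in the translated format string only; each argument is only as safe as the escaping applied to it before it reaches sprintf().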
This seems to work rather well at the moment but the question still remains…
Am I doing it wrong?
…and if so, what is the best method of approaching such a string? Strings like this present themselves quite a bit in my development. While I can sometimes work around them by deleting HTML from the strings, this is not always an option. Please let me know your thoughts. I want to get this stuff right, and hopefully this post can help others too.



2 things:
- there are functions for localization & escaping at the same time: esc_attr__() and esc_attr_e(), esc_html_e() and esc_html__(), and such
- the sprintf statement can force variables to certain types. In your example, using %d forces an integer, so I think you’re safe passing pretty much anything to it (untested, might be proven wrong :)
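The %d point can be checked in plain PHP without WordPress. This is a small sketch with made-up input values; it shows only how sprintf() coerces its argument, not a recommendation to rely on it as the sole safeguard.

```php
<?php
// %d casts each argument to an integer before printing it.
$clean   = sprintf( '%d', '300' );         // numeric string passes through
$trailed = sprintf( '%d', '300<script>' ); // the cast stops at the first non-digit
$wrapped = sprintf( '%d', '<b>300</b>' );  // no leading digits, so the result is zero

echo $clean, "\n";   // 300
echo $trailed, "\n"; // 300
echo $wrapped, "\n"; // 0
```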
Hi Ozh, Thanks for reading. esc_html__() will convert any special chars. That’s why it can’t be used alone for this scenario. I also needed to change the %d to %s to accommodate the output of number_format_i18n() in the later examples. The docs for this function show it as always returning a string. I would assume that this is because certain languages do not use Arabic numbers, there’s gotta be at least a few.
Basically your issue is that you have a lot of substrings that must pass through different escaping functions and it gets too verbose?
My idea would be to use wp_sprintf() and define custom fragment markers that would make argument pass through respective escaping functions.
See http://andy.wordpress.com/2011/01/05/wp_sprintf/
Rarst. Wow, that’s a really neat function that I’ve never seen before. This is quite possibly the perfect solution to my problem. I’ve read through wp_sprintf()’s definition in core as well as the wp_sprintf_l() filter. The only question I have before I jump into this is: “Are fragments required to be only a single character?” This seems to be the case, especially after reviewing the man page for sprintf(). It would be nice (although maybe a bit sloppy?) to define fragments that relate to the appropriate esc_*() function. Like: %attr or %html. Thanks for the solution!
I had not used it extensively, but it passes a larger chunk of the string (probably up to the following whitespace) as the fragment name, so most probably you can check for more than one character just fine.
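As a rough sketch of the multi-character idea, a callback on the wp_sprintf filter could look something like the following. It assumes that wp_sprintf() hands the filter the pattern text from one % up to the next %, so literal text may trail the marker. The markers %attr and %html and the function name my_extension_sprintf_fragments() are hypothetical, and PHP's htmlspecialchars() stands in for the esc_*() functions when WordPress is not loaded.

```php
<?php
// Hypothetical callback for the 'wp_sprintf' filter. Assumption: each fragment runs from
// one '%' up to the next '%', so literal pattern text may follow the marker.
function my_extension_sprintf_fragments( $fragment, $arg ) {
    // Fallback escaper used only when WordPress (and its esc_*() functions) is absent.
    $fallback = function ( $text ) {
        return htmlspecialchars( (string) $text, ENT_QUOTES, 'UTF-8', false );
    };
    // Hypothetical markers mapped to the escaping callback each one should use.
    $markers = array(
        '%attr' => function_exists( 'esc_attr' ) ? 'esc_attr' : $fallback,
        '%html' => function_exists( 'esc_html' ) ? 'esc_html' : $fallback,
    );

    foreach ( $markers as $marker => $escape ) {
        if ( 0 === strpos( $fragment, $marker ) ) {
            // Swap the marker for the escaped argument and keep any trailing literal text.
            return call_user_func( $escape, $arg ) . substr( $fragment, strlen( $marker ) );
        }
    }

    return $fragment; // Unknown fragment: leave it for wp_sprintf() to handle as usual.
}

// Register only inside WordPress, where add_filter() exists.
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'wp_sprintf', 'my_extension_sprintf_fragments', 10, 2 );
}

// Calling the callback directly, the way wp_sprintf() would for a single fragment:
$demo = my_extension_sprintf_fragments( '%attr">', 'a "quoted" title' );
echo $demo, "\n";
```

With the filter registered, a pattern such as '<abbr title="%attr">%html</abbr>' passed to wp_sprintf() should send each argument through its own escaping function, provided the fragment-splitting assumption above holds.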