More secure WordPress code by validating, sanitizing and escaping data
January 20, 2017
Security in WordPress is a broad topic with many aspects. I was listening to a podcast by Post Status [1] the other day and realized that I really need to improve my understanding of security from a developer's perspective. So I’m not trying to find out how to make a WordPress installation more secure, but how to write more secure WordPress code for themes and plugins.
The basic philosophy is to not trust any data that your program interacts with and to always make sure the code will work with the given input. In code, there are three main steps to achieve this. They usually happen in this order over the life cycle of the data:
- Validating: Before interacting with data, always make sure that it is of the expected type and format.
- Sanitizing: Before storing data to the database, remove all potentially malicious parts.
- Escaping: Before outputting data, prepare it to be safe for its target context.
Of course these concepts apply to a much wider range of software than a content management system, but the cool thing about WordPress is that it encourages me as a developer to implement them and helps me do so. So what I’ll try to do here is establish a few easy-to-implement rules on how to write more secure WordPress code by leveraging helper functions that ship with PHP and WordPress core.
Because PHP is dynamically typed, many different things can hide behind a variable name. That can be quite flexible and powerful, but it also introduces a million ways a program can break. Pretty much all code is based on some assumptions, and if they are violated, the code breaks. So instead of leaving the assumptions implicit, validation makes them explicit. Usually that means wrapping some code inside an if-clause that checks the assumption the code is based on.
The else-clause could then fall back to some default case, throw an exception or do whatever makes most sense in that particular situation. Validation can also distinguish between multiple accepted input formats. A function could, for example, work with a post ID or the post object itself. Both options might make sense, and with input validation the code can adjust to whichever one it receives on each call. Lots of these assumptions involve type checking and seeing if something “is there”. Here is a list of some handy PHP functions to do this:
is_numeric( $var );
is_array( $var );
is_object( $var );
is_string( $var );
isset( $data['key'] );
in_array( $var, $arr );
strlen( $var );
count( $arr );
empty( $var );
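The post ID or post object case mentioned above could be validated roughly like this. It is only a sketch in plain PHP: the function name `example_resolve_post_id` is made up for illustration, and it assumes that a valid post object carries a numeric `ID` property.

```php
<?php
/**
 * Hypothetical helper (not part of WordPress): accepts a post ID or a
 * post-like object and returns a positive integer ID, or null if the
 * input matches neither expected format.
 */
function example_resolve_post_id( $post ) {
    // Case 1: a numeric ID such as 42 or "42".
    // Note: is_numeric() also accepts values like "4.2"; the cast truncates them.
    if ( is_numeric( $post ) && (int) $post > 0 ) {
        return (int) $post;
    }

    // Case 2: an object that carries a numeric ID property.
    if ( is_object( $post ) && isset( $post->ID ) && is_numeric( $post->ID ) && (int) $post->ID > 0 ) {
        return (int) $post->ID;
    }

    // The assumption was violated: say so explicitly instead of guessing.
    return null;
}
```

The caller then checks for `null` and decides what the else-case should be. WordPress core's own `get_post()` follows a similar idea and accepts either an ID or a post object.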
Sanitizing data means removing all potentially malicious parts from it. Validation is usually read-only, but sanitization manipulates the data and leaves only the safe parts untouched. Which parts count as safe depends on the context, so doing this by hand would be a tedious task. Luckily, WordPress provides an arsenal of helper functions for it. Most of those functions live in a file called formatting.php [2].
This file also contains a function whose location in WordPress I had always wondered about: sanitize_title. When you start a new post in WordPress, enter a title and hit save, WordPress generates a slug for you. I did not look it up, but I think this function is what is used for that, or at least comes very close to it. I will need more experience to decide which functions are the most useful, but here are some that look very promising:
sanitize_text_field( $str );
sanitize_title( $str );
sanitize_email( $str );
sanitize_file_name( $str );
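As a rough sketch of how these helpers might be used before saving form input: the function name and the field names `name`, `email` and `attachment` below are made up for illustration, and the code assumes it runs inside WordPress so that the sanitize_* functions are available.

```php
<?php
// Assumes WordPress is loaded, so the sanitize_* helpers exist.

/**
 * Hypothetical helper: cleans raw contact-form input before it is stored.
 * The field names are invented for this example.
 */
function example_sanitize_contact_input( array $raw ) {
    $clean = array();

    // Validate first: only touch keys that exist and hold strings.
    if ( isset( $raw['name'] ) && is_string( $raw['name'] ) ) {
        // Strips tags, line breaks and surplus whitespace.
        $clean['name'] = sanitize_text_field( $raw['name'] );
    }

    if ( isset( $raw['email'] ) && is_string( $raw['email'] ) ) {
        // Strips characters that are not allowed in an email address.
        $clean['email'] = sanitize_email( $raw['email'] );
    }

    if ( isset( $raw['attachment'] ) && is_string( $raw['attachment'] ) ) {
        // Removes or replaces characters that are unsafe in file names.
        $clean['attachment'] = sanitize_file_name( $raw['attachment'] );
    }

    return $clean;
}
```

A caller could pass in `wp_unslash( $_POST )` and hand the returned array to whatever storage function fits, for example `update_option()`. Keys that fail validation are simply left out, so the calling code still has to decide how to react to missing fields.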
The other two steps were about working with external data in a safe way. Escaping is about making sure data is safe and properly formatted for the context it is output into. This is a really general concept, but in WordPress the context often happens to be some part of an HTML document. Therefore it is crucial to make sure all dynamic parts that are sent to visitors are cleaned right before output. This prevents things like inline script tags or click handlers from being injected into the markup. There are also functions that make sure that only valid characters are sent, that special characters are properly encoded, and more. For many of these functions there is a variant that also translates the string, and another one that translates, escapes and outputs the resulting safe string in one go. These are the ones with the hard-to-read names like esc_attr_e. This one, for example, translates a string, makes it safe for use in an HTML attribute and echoes it. And there are many more:
esc_html( $str );
esc_attr( $str );
esc_url( $str );
esc_js( $str );
esc_textarea( $str );
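In a template, choosing the escaping function by output context might look roughly like this. It is a sketch that assumes WordPress is loaded; the variables and the text domain `example-textdomain` are placeholders invented for the example.

```php
<?php
// Assumes WordPress is loaded and this runs inside a theme or plugin template.
// The three values below stand in for data from an untrusted source.
$title     = 'Tom & "Jerry" <script>alert(1)</script>';
$link      = 'https://example.com/?a=1&b=2';
$css_class = 'teaser" onclick="alert(1)';
?>
<!-- Attribute context: esc_attr(). URL context: esc_url(). Text context: esc_html(). -->
<a class="<?php echo esc_attr( $css_class ); ?>" href="<?php echo esc_url( $link ); ?>">
    <?php echo esc_html( $title ); ?>
</a>

<!-- The *_e variants translate, escape and echo in one call. -->
<button title="<?php esc_attr_e( 'Read more', 'example-textdomain' ); ?>">
    <?php esc_html_e( 'Read more', 'example-textdomain' ); ?>
</button>
```

Each value is escaped at the exact point where it is echoed, with the function that matches where it lands in the markup. That way the quote in `$css_class` cannot close the attribute and the script tag in `$title` is sent as text instead of markup.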
Extra security requires extra work
It is not that I never thought about these things before, but I always imagined it to be a lot of extra work that does not pay off very well. Now I think it is just a habit that one needs to practice until it becomes second nature. The payoff in code security and stability will definitely be worth the few extra lines here and there. Here are my conclusions that I need to start implementing:
- Validate all assumptions about data from user input, the database or an external resource and react to things that did not come in as expected.
- Before storing data in the database, think about what I expect the data to be and sanitize it accordingly.
- All dynamic data put into markup must be wrapped inside some escaping function.
- Escape late and escape often. Escaping data twice is a minor performance loss, escaping data zero times is a major security vulnerability.