PHP - Sanitise and Validate Data, Escape Output

Sanitise and Validate Inputs

PHP applications that run on the web will, more often than not, require user input. The goal of this short post is to remind you of some best practices for handling input and output data.

Mantra: trust no one!
This is the most basic precaution against SQL injection and XSS. Never trust any data that originates outside your direct control.

Examples:

  • $_POST
  • $_GET
  • $_REQUEST
  • $_COOKIE
  • $argv
  • php://stdin
  • php://input
  • file_get_contents()
  • Remote databases
  • Remote APIs
  • Data from your clients

Escape or remove unsafe characters before they reach the storage layer. When input is received, it should not be passed straight to the database. Rather, it is first cleaned thoroughly, much like what doctors do before surgery: all instruments are sterilised. That way we can confidently answer every XSS attempt with Gandalf's words: You shall not pass!!!


Here is an example of how a user can easily pass a script into the comment field of a blog:

<p>
    This post was very helpful. Will pay you back someday.
</p>
<script>
    window.location.href = 'http://www.collectAllValuableInfo.com';
</script>

If the input is not sanitised, the code will be stored in the database and rendered in the blog's markup. Any user who visits the page with the unsanitised comment is redirected to a malicious website.
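
As a minimal sketch of the defence, assuming the comment arrives as a plain string (the variable names here are illustrative and not tied to any framework), htmlspecialchars(), covered in the next section, turns the markup into inert text:

```php
<?php
// Hypothetical comment body, as it might arrive from a form submission.
$comment = "<p>This post was very helpful.</p>"
    . "<script>window.location.href='http://www.collectAllValuableInfo.com';</script>";

// Convert the HTML special characters so the browser displays the
// tags as text instead of interpreting them.
$safeComment = htmlspecialchars($comment, ENT_QUOTES, 'UTF-8');

echo $safeComment, PHP_EOL;
```

The escaped string no longer contains a literal <script> tag, so a browser shows the attacker's code as text rather than running it.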

htmlentities and htmlspecialchars functions

You can use the htmlentities() and htmlspecialchars() functions to convert special characters into their HTML entity equivalents. The major difference between the two is that the latter converts only a predefined set of characters (&, ", ', < and >; the single quote only when ENT_QUOTES is set), while the former converts every character that has an HTML entity equivalent. Both functions take these four arguments:

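  1. the $string to convert,

  2. the flags (for example ENT_QUOTES),

  3. the encoding (for example 'UTF-8') and

  4. double_encode, which controls whether existing entities are encoded again.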
Example:

<?php

$string = "<p>Wow! These are special characters: Ñ, ®, Æ</p>";

echo htmlentities($string, ENT_QUOTES, 'UTF-8');

echo htmlspecialchars($string, ENT_QUOTES, 'UTF-8');

Output 1 (htmlentities), as sent to the browser:
&lt;p&gt;Wow! These are special characters: &Ntilde;, &reg;, &AElig;&lt;/p&gt;

Output 2 (htmlspecialchars), as sent to the browser:
&lt;p&gt;Wow! These are special characters: Ñ, ®, Æ&lt;/p&gt;

Which function you choose depends on the circumstances. The difference is clear: one converts everything that has an entity equivalent, while the other converts only five characters. If your HTML is UTF-8 encoded, there may not be any need to use htmlentities() at all.
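
The fourth argument, double_encode, is easy to overlook. The sketch below uses a made-up sample string to show how it changes the result when the text already contains an entity:

```php
<?php
// Text that already contains one entity (&amp;) plus a raw tag.
$alreadyEscaped = 'Fish &amp; chips <b>today</b>';

// Default (double_encode = true): the existing &amp; is encoded again.
$twice = htmlspecialchars($alreadyEscaped, ENT_QUOTES, 'UTF-8');

// double_encode = false: existing entities are left alone.
$once = htmlspecialchars($alreadyEscaped, ENT_QUOTES, 'UTF-8', false);

echo $twice, PHP_EOL; // Fish &amp;amp; chips &lt;b&gt;today&lt;/b&gt;
echo $once, PHP_EOL;  // Fish &amp; chips &lt;b&gt;today&lt;/b&gt;
```

Leaving double_encode at its default is usually the safer choice for untrusted input, because it guarantees that the text is displayed exactly as it was typed.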

filter_var and filter_input functions

This duo is also a formidable weapon for sanitising and validating data.

filter_var($variable, FILTER, options): this function takes up to three arguments. The $variable to be filtered and the FILTER to apply are the two you will almost always provide; the third holds options or flags for that filter. See the documentation for the full list of filters.

For example:

<?php

$email = "mymail@me.com";
$ipAddress = "128.31.0.13";

// Each call returns the value if it is valid, or false if it is not.
$email = filter_var($email, FILTER_VALIDATE_EMAIL);
$ipAddress = filter_var($ipAddress, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4);

In the example above, the FILTER_VALIDATE_EMAIL filter checks that the string passed is a valid email address, while FILTER_VALIDATE_IP checks that it is an IP address; the FILTER_FLAG_IPV4 flag restricts it to IP version 4. Each call returns the value if it passes and false if it does not. Note that the FILTER_VALIDATE_* filters only validate; to strip unwanted characters, use one of the FILTER_SANITIZE_* filters. Validation simply confirms that the data received is in the right format. It can start in the HTML form, if that is where the data comes from, by setting each input's type attribute to match the data type we expect, but it must still be repeated on the server.
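
To make the difference between validating and sanitising concrete, here is a small sketch. The sample values and the 1 to 120 age range are assumptions chosen for illustration:

```php
<?php
// Validation: returns the value on success, false on failure.
$goodEmail = filter_var('mymail@me.com', FILTER_VALIDATE_EMAIL);
$badEmail  = filter_var('not-an-email', FILTER_VALIDATE_EMAIL);

// Sanitisation: strips characters that are not allowed in an email address.
$cleaned = filter_var('my(mail)@me.com', FILTER_SANITIZE_EMAIL);

// Validation with options: accept only integers from 1 to 120.
$age = filter_var('42', FILTER_VALIDATE_INT, [
    'options' => ['min_range' => 1, 'max_range' => 120],
]);

var_dump($goodEmail); // the original string
var_dump($badEmail);  // false
var_dump($cleaned);   // "mymail@me.com"
var_dump($age);       // the integer 42
```

Because a failed validation returns false, compare the result with === false rather than relying on truthiness; a valid value such as the integer 0 would otherwise be mistaken for a failure.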

filter_input(type, $variableName, FILTER, option): this function takes four arguments.
The type is one of INPUT_GET, INPUT_POST, INPUT_COOKIE, INPUT_SERVER or INPUT_ENV. The second is the name of the variable to read (a string, not the variable itself), and the last two are the same as in filter_var().

For example:

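<?php
$_GET['name'] = "eche"; // This will not affect filter_input, which reads the original request data
// FILTER_SANITIZE_STRING is deprecated as of PHP 8.1; this filter encodes special characters instead
$username = filter_input(INPUT_GET, 'name', FILTER_SANITIZE_FULL_SPECIAL_CHARS);

echo $username;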

Check out the documentation.
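
filter_input() distinguishes between a missing parameter and an invalid one: it returns null when the variable is not set and false when the filter fails. A minimal sketch of handling both, where the currentPage() helper and the 'page' parameter are hypothetical names used only for illustration:

```php
<?php
// Hypothetical helper: read a page number from the query string,
// falling back to 1 when it is missing or invalid.
function currentPage(): int
{
    $page = filter_input(INPUT_GET, 'page', FILTER_VALIDATE_INT, [
        'options' => ['min_range' => 1],
    ]);

    // null  => the 'page' parameter was not sent at all
    // false => it was sent but is not an integer >= 1
    if ($page === null || $page === false) {
        return 1;
    }

    return $page;
}

echo currentPage(), PHP_EOL;
```

Falling back to a safe default keeps the rest of the code from ever seeing an unvalidated value.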

Try to avoid using regex (regular expressions) for validation, as they can get complicated. If you require a more robust tool, check out the HTML Purifier component.

Note: sanitising data removes potentially harmful code, while validating data confirms that the input meets your expectations. A lack of validation can lead to errors in the database.

Escape Output

Escape output before you render it to a web page or return it from an API. This ensures that no harmful code is executed in your users' browsers.
Use htmlspecialchars() or htmlentities() to escape your output.

Example:

<?php
$render = '<p><script>alert("If you don\'t escape me I will show a prompt on the browser");</script></p>';
echo htmlspecialchars($render, ENT_QUOTES, 'UTF-8');

Many PHP template engines escape output automatically. If you aren't using one, simply escape your output with the method above or any other way you prefer.
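
If you escape by hand, a tiny wrapper keeps the call short enough to use everywhere. This is only a sketch: the e() helper below is a hypothetical name, not a PHP built-in, and it assumes UTF-8 output into HTML text or quoted attribute values:

```php
<?php
// Hypothetical helper: escape a value for use in HTML text or
// quoted attribute values.
function e(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

// Sample untrusted value that tries to break out of an attribute.
$name = '"><script>alert(1)</script>';

// Escape at the point of output, both in attributes and in text.
echo '<input type="text" name="name" value="' . e($name) . '">', PHP_EOL;
echo '<p>Hello, ' . e($name) . '</p>', PHP_EOL;
```

Escaping at the point of output, rather than when the data is stored, means the stored value stays unchanged and each output context can apply the escaping it needs.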