Resumable Chunked PHP Upload Server

Uploading files from a client to a server is very common. When a POST request arrives with files in its body, PHP provides an easy way of picking them up via the $_FILES superglobal.

This works fine for most cases. In Africa (Nigeria, to be precise), internet connectivity has improved quite a bit in recent times, but there is still a lot to be desired in terms of speed and reliability. From a user's point of view, it would be really cool not to have to restart an upload every time the connection is interrupted, and if you are a fan of DigitalOcean's 512 MB droplets, you would love to handle large uploads without exceeding PHP's memory limit. Even if none of these cases applies to you, chunked uploading is a nice trick to have up your sleeve.

This blog post demonstrates a simple technique for building a resumable, chunked upload server, along with a simple PHP client that uploads a file in chunks.

Libraries et al

For this demo, we'll use klein, a simple but powerful PHP routing library, to handle routing (durrh!), and a simple SQLite database for storing data.
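If you manage dependencies with Composer, klein is one require away (the SQLite3 extension ships enabled in most PHP builds):

composer require klein/klein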

Initializing the Upload

We'll create a POST /uploads/ endpoint for initializing uploads.
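A client initializes an upload by POSTing a JSON body like the following (values are illustrative):

{
    "filename": "video.mp4",
    "file_size": 5242880,
    "chunk_size": 1048576,
    "checksum": "0f343b0931126a20f133d67c2b018a3b"
}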

The corresponding server code is given below.
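It assumes Composer's autoloader has been required and references a $createDB helper that isn't shown; a minimal sketch of that setup, with the table schema inferred from the columns used throughout this post, could be:

require __DIR__ . "/vendor/autoload.php";

// Sketch: create the SQLite file and the bookkeeping table on first run.
// The column list mirrors the fields the handlers below read and write.
$createDB = function () {
    $db = new SQLite3("uploads");
    $db->exec("create table uploads (
        id integer primary key autoincrement,
        filename text,
        file_size integer,
        chunk_size integer,
        checksum text,
        local_filename text,
        chunk_start integer,
        chunk_end integer,
        created_at text,
        updated_at text,
        completed_at text
    )");
    return $db;
};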

// Open the bookkeeping database, creating it on first run
if (!file_exists("uploads")){
    $db = $createDB();
} else {
    $db = new SQLite3("uploads");
}

$errors = [
    "invalid_data" => "Request Contains invalid parameters",
    "sql_error" => "Error Querying the Database",
    "not_found" => "Upload not found",
    "upload_complete" => "Upload completed",
    "invalid_content_range" => "Invalid details in content range"
];

$klein = new \Klein\Klein();

$klein->respond('POST', '/uploads/?', function($request, $response) use ($db, $errors){
    $data = json_decode($request->body(), true);
    if (!isset($data['filename']) || !isset($data['file_size']) || !isset($data['chunk_size']) || !isset($data['checksum'])) {
        $response->code(400);
        return $response->json([
            "error" => $errors['invalid_data']
        ]);
    }

    // Give the upload a unique name on disk, preserving the extension
    $ext = pathinfo($data["filename"], PATHINFO_EXTENSION);
    $uuid = uniqid("mobnia", true);
    $localFilename = $ext ? $uuid . "." . $ext : $uuid;
    // The first chunk ends at chunk_size, clamped to the file size for small files
    $next = ($data['chunk_size'] > $data['file_size']) ? $data['file_size'] : $data['chunk_size'];

    $obj = [
        "filename" => $data["filename"],
        "file_size" => $data["file_size"],
        "chunk_size" => $data['chunk_size'],
        "checksum" => $data["checksum"],
        "local_filename" => $localFilename,
        "chunk_start" => 0,
        "chunk_end" => $next,
        "created_at" => (new \DateTime())->format(\DateTime::ISO8601)
    ];
    $sql1 = "(";
    $sql2 = "(";
    foreach (array_keys($obj) as $key){
        $sql1 .= $key.",";
        $sql2 .= ":".$key.",";
    }
    $sql1 = rtrim($sql1, ",") . ")";
    $sql2 = rtrim($sql2, ",") . ")";

    $sql = "insert into uploads ".$sql1." values ".$sql2;
    $stmt = $db->prepare($sql);
    foreach ($obj as $key => $value) {
        $stmt->bindValue(":".$key, $value);
    }

    $ok = $stmt->execute();
    if (!$ok) {
        $response->code(500);
        return $response->json([
            "error" => $errors['sql_error']
        ]);
    }

    $id = $db->lastInsertRowID();
    $result = $db->querySingle("select * from uploads where id=$id", true);

    return $response->json(["data" =>$result]);
});

The handler for the POST /uploads/ endpoint accepts the filename, file size, chunk size, and checksum, stores them in the database with a corresponding id, and returns the created row to the client.
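For instance, initializing a 5 MB upload with a 1 MB chunk size might return a row along these lines (values are illustrative):

{
    "data": {
        "id": 1,
        "filename": "video.mp4",
        "file_size": 5242880,
        "chunk_size": 1048576,
        "checksum": "0f343b0931126a20f133d67c2b018a3b",
        "local_filename": "mobnia5d1e8a3f2b1c4.67031804.mp4",
        "chunk_start": 0,
        "chunk_end": 1048576,
        "created_at": "2017-06-10T12:00:00+0000",
        "updated_at": null,
        "completed_at": null
    }
}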

The PHP client doing the upload makes the request like so:

$ch = curl_init("http://localhost:8081/uploads");
// CURLOPT_HTTPHEADER sets the request headers
curl_setopt($ch, CURLOPT_HTTPHEADER, [
    "Content-Type: application/json"
]);

$data = [
    "filename" => pathinfo($path, PATHINFO_BASENAME),
    "file_size" => filesize($path),
    "chunk_size" => 1024 * 1024,
    "checksum" => md5_file($path)
];

curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, "POST");
curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode($data));

$exec = curl_exec($ch);

$json = json_decode($exec, true);
if (!$json || isset($json['error'])){
    $message = isset($json['error']) ? $json['error'] : "Error Connecting to Upload Server";
    throw new Exception($message);
}

return $json;

The snippet above uses the $path variable as the full path of the file to be uploaded. It makes a request to our upload server with the filename, file size, chunk size, and checksum of the file. The chunk size determines how large each chunk will be when we start uploading, and the checksum helps give the file a unique identity, which will come in handy when resuming an upload. And that tidies up the initialization flow.

Uploading in Chunks

The code below shows a simple file-writing function:

$writeChunk = function($filename, $chunk){
    $UPLOAD_DIR = dirname(__FILE__)."/uploads_dir/";
    if (!is_dir($UPLOAD_DIR)){
        mkdir($UPLOAD_DIR, 0777, true);
    }
    $status = false;
    // Append mode means each chunk lands at the current end of the file
    $file = fopen($UPLOAD_DIR.$filename, "a");
    if (!$file) return $status;

    // Take an exclusive lock so concurrent requests cannot interleave writes
    if (!flock($file, LOCK_EX)) {
        fclose($file);
        return $status;
    }

    // fwrite returns the number of bytes written, or false on failure
    $status = fwrite($file, $chunk);
    fclose($file);
    return $status;
};

This appends the incoming chunk to the file, as long as an exclusive lock can be obtained; the lock prevents simultaneous writes. It returns the number of bytes written (or false on failure).

We then create a PUT /uploads/:id/ endpoint on our server to accept incoming chunks:

$klein->respond('PUT', '/uploads/[:id]/?', function($request, $response) use ($errors, $db, $writeChunk){
    $id = $request->id;
    $upload = $db->querySingle("select * from uploads where id=$id", true);
    if (!$upload) {
        $response->code(404);
        return $response->json([
            "error" => $errors['not_found']
        ]);
    }

    $bytes = $writeChunk($upload["local_filename"], $request->body());

    // Advance the window: the next chunk starts where this one ended,
    // and the end is clamped to the file size
    $chunk_start = $upload['chunk_end'];
    $chunk_end = $upload['chunk_end'] + $bytes;
    if ($chunk_end > $upload['file_size']) $chunk_end = $upload['file_size'];

    $upload['chunk_start'] = $chunk_start;
    $upload['chunk_end'] = $chunk_end;

    $upload['updated_at'] = (new \DateTime())->format("Y-m-d H:i:s");

    // When the whole file has been written, the clamped chunk_end catches
    // up with chunk_start and the upload is complete
    if ($upload['chunk_end'] == $upload['chunk_start']) {
        $upload['completed_at'] = $upload['updated_at'];
    }

    unset($upload["id"]);
    $sql = "update uploads set ";
    foreach (array_keys($upload) as $key) {
        $sql .= $key."=:".$key.",";
    }
    $sql = rtrim($sql, ",");
    $sql .= " where id=:id";
    $stmt = $db->prepare($sql);
    foreach ($upload as $key => $value) {
        $stmt->bindValue(":".$key, $value);
    }
    $stmt->bindValue(":id", $id);

    $stmt->execute();

    $result = $db->querySingle("select * from uploads where id=$id", true);
    return $response->json(["data" =>$result]);
});

The chunk to be written is contained in the request body. Based on the data stored earlier in the database, the chunk is written to the file and the database is updated with the writer's current position. The updated row is then sent back to the client.

The client code for uploading the chunks is given below:

function uploadChunk($file, $chunkStart, $chunkEnd, $id, $fileSize, $options){

    // Seek to the start of this chunk (the offset is relative to the end
    // of the file, which lands us at exactly $chunkStart)
    fseek($file, $chunkStart-$fileSize, SEEK_END);
    // Read the chunk window, with a 10 KB floor; fread stops at EOF, so
    // the floor cannot read past the end of the file
    $chunkSize = (($diff = $chunkEnd-$chunkStart) < 10240) ? 10240 : $diff;
    $chunk = fread($file, $chunkSize);

    // Content-Range is "bytes start-end/total", e.g. "bytes 0-1048576/5242880"
    $rangeHeader = "bytes " . $chunkStart . "-" . $chunkEnd . "/" . $fileSize;

    $url = rtrim($options['url'], "/") . "/" . $id;
    $ch = curl_init($url);

    curl_setopt($ch, CURLOPT_HTTPHEADER, [
        "Content-Type: application/octet-stream",
        "Content-Range: " . $rangeHeader
    ]);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CUSTOMREQUEST, "PUT");
    curl_setopt($ch, CURLOPT_POSTFIELDS, $chunk);

    $exec = curl_exec($ch);

    $json = json_decode($exec, true);
    if (!$json || isset($json['error'])){
        $message = isset($json['error']) ? $json['error'] : "Error Connecting to Upload Server";
        throw new Exception($message);
    }

    return $json;
}

The function takes a file resource and, based on the writer position and chunk size, reads the corresponding slice of the file and sends that chunk of data to the server.

This back and forth between the server and client continues until the file is completely uploaded, which happens when the writer position reaches the end of the file (i.e., equals the file size).
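For completeness, a minimal driver loop tying uploadChunk to the flow above might look like this (a sketch: $upload is assumed to hold the "data" object returned by the initialization request, and $options['url'] points at the uploads endpoint):

$file = fopen($path, "rb");
// Keep pushing chunks until the writer position reaches the file size
while ($upload['chunk_start'] < $upload['file_size']) {
    $json = uploadChunk(
        $file,
        $upload['chunk_start'],    // the window the server expects next
        $upload['chunk_end'],
        $upload['id'],
        $upload['file_size'],
        $options
    );
    $upload = $json['data'];       // the server returns the updated row
}
fclose($file);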

Resuming an Upload

Since each upload has a unique id on the server, we can use that id to request the upload record, determine the writer position, and continue the upload from there.

To handle fetching the Upload object, we create a GET /uploads/:id/ endpoint like so:

$klein->respond('GET', '/uploads/[:id]/?', function($request, $response) use ($errors, $db){
    $id = $request->id;
    $data = $db->querySingle("select * from uploads where id=$id", true);
    if ($data === false) {
        $response->code(500);
        return $response->json([
            "error" => $errors['sql_error']
        ]);
    } elseif ($data === null || count($data) == 0) {
        $response->code(404);
        return $response->json([
            "error" => $errors['not_found']
        ]);
    }

    return $response->json(["data" =>$data]);
});
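One detail worth remembering: klein only begins routing once dispatch is called, so the server script ends with

$klein->dispatch();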

The returned data contains the current writer position, which allows for a seamless resume. The checksum collected during initialization can be used throughout the upload to ensure that the file being uploaded is the exact same one for which the upload was created.
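For instance, before resuming, the client can compare the stored checksum against the local file (a sketch; fetchUpload() is a hypothetical wrapper around a GET to the endpoint above):

// fetchUpload() is a hypothetical helper that GETs /uploads/:id
// and returns the decoded "data" object
$upload = fetchUpload($id, $options);
if (md5_file($path) !== $upload['checksum']) {
    throw new Exception("Local file does not match the initialized upload");
}
// Safe to resume the driver loop from $upload['chunk_start']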

A full working implementation can be found here.