PHP parse_url(): parse a URL and return its components

(PHP 4, PHP 5, PHP 7)

parse_url - Parse a URL and return its components

Description

PHP

<?php
	mixed parse_url ( string $url [, int $component = -1 ] )

This function parses a URL and returns an associative array containing any of the various components of the URL that are present.

This function is not meant to validate the given URL; it only breaks it up into the parts listed below. Partial URLs are also accepted, and parse_url() tries its best to parse them correctly.


Parameters

url

The URL to parse. Invalid characters are replaced by _.

component

Specify one of PHP_URL_SCHEME, PHP_URL_HOST, PHP_URL_PORT, PHP_URL_USER, PHP_URL_PASS, PHP_URL_PATH, PHP_URL_QUERY or PHP_URL_FRAGMENT to retrieve just a specific URL component as a string (except when PHP_URL_PORT is given, in which case the return value will be an integer).

Return Values

On seriously malformed URLs, parse_url() may return FALSE.
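
Because the failure value is FALSE rather than an array, it is worth checking the result with a strict comparison before reading any keys. The following is a minimal sketch; host_or_null() is a hypothetical helper written for this illustration, not part of PHP.

```php
<?php
// Hypothetical helper (not a PHP built-in): returns the host of a URL,
// or null when the URL cannot be parsed or contains no host component.
function host_or_null($url)
{
    $parts = parse_url($url);
    if ($parts === false) {
        return null; // seriously malformed URL
    }
    return isset($parts['host']) ? $parts['host'] : null;
}

var_dump(host_or_null('http://www.example.com/path')); // string(15) "www.example.com"
var_dump(host_or_null('http://:80'));                  // NULL, parse_url() returned false
var_dump(host_or_null('/just/a/path'));                // NULL, parsed but no host present
```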

If the component parameter is omitted, an associative array is returned. At least one element will be present within the array. Potential keys within this array are:

  • scheme – e.g. http
  • host
  • port
  • user
  • pass
  • path
  • query – after the question mark ?
  • fragment – after the hashmark #

If the component parameter is specified, parse_url() returns a string (or an integer, in the case of PHP_URL_PORT) instead of an array. If the requested component doesn’t exist within the given URL, NULL will be returned.
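
As a short illustration of the component parameter, the sketch below requests a port and a query string from a URL that has only the former. The URL is just an example value.

```php
<?php
$url = 'http://www.example.com:8080/index.php';

$port  = parse_url($url, PHP_URL_PORT);  // the port is returned as an integer
$query = parse_url($url, PHP_URL_QUERY); // NULL, because the URL has no query string

// Use a strict comparison so that a missing component (NULL) is not
// confused with an empty string or "0".
if ($query === null) {
    echo "no query string\n";
}
echo "port is of type " . gettype($port) . "\n"; // prints "port is of type integer"
```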

Changelog

Version   Description
5.4.7     Fixed host recognition when scheme is omitted and a leading component separator is present.
5.3.3     Removed the E_WARNING that was emitted when URL parsing failed.
5.1.2     Added the component parameter.

Examples

A parse_url() example

PHP

<?php
$url = 'http://username:password@hostname:9090/path?arg=value#anchor';

var_dump(parse_url($url));
var_dump(parse_url($url, PHP_URL_SCHEME));
var_dump(parse_url($url, PHP_URL_USER));
var_dump(parse_url($url, PHP_URL_PASS));
var_dump(parse_url($url, PHP_URL_HOST));
var_dump(parse_url($url, PHP_URL_PORT));
var_dump(parse_url($url, PHP_URL_PATH));
var_dump(parse_url($url, PHP_URL_QUERY));
var_dump(parse_url($url, PHP_URL_FRAGMENT));
?>

The above example will output:

Shell

array(8) {
  ["scheme"]=>
  string(4) "http"
  ["host"]=>
  string(8) "hostname"
  ["port"]=>
  int(9090)
  ["user"]=>
  string(8) "username"
  ["pass"]=>
  string(8) "password"
  ["path"]=>
  string(5) "/path"
  ["query"]=>
  string(9) "arg=value"
  ["fragment"]=>
  string(6) "anchor"
}
string(4) "http"
string(8) "username"
string(8) "password"
string(8) "hostname"
int(9090)
string(5) "/path"
string(9) "arg=value"
string(6) "anchor"

A parse_url() example with missing scheme

PHP

<?php
$url = '//www.example.com/path?googleguy=googley';

// Prior to 5.4.7 this would show the path as "//www.example.com/path"
var_dump(parse_url($url));
?>

The above example will output:

Shell

array(3) {
  ["host"]=>
  string(15) "www.example.com"
  ["path"]=>
  string(5) "/path"
  ["query"]=>
  string(17) "googleguy=googley"
}

Notes:

This function doesn’t work with relative URLs.

This function is intended specifically for the purpose of parsing URLs and not URIs. However, to comply with PHP’s backward compatibility requirements it makes an exception for the file:// scheme where triple slashes (file:///…) are allowed. For any other scheme, this is invalid.
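
The file:// exception can be seen in a small sketch. The path /etc/hosts is only an example value; the same triple-slash form with http is rejected.

```php
<?php
// Triple slashes are accepted for the file scheme ...
$fileParts = parse_url('file:///etc/hosts');
var_dump($fileParts['scheme']); // string(4) "file"
var_dump($fileParts['path']);   // string(10) "/etc/hosts"

// ... but not for other schemes, where the host would be empty.
$httpParts = parse_url('http:///example.com');
var_dump($httpParts);           // bool(false)
```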

Also

If you haven’t yet been able to find a simple conversion back to string from a parsed url, here’s an example:

PHP

<?php

$url = 'http://usr:pss@example.com:81/mypath/myfile.html?a=b&b[]=2&b[]=3#myfragment';
if ($url === unparse_url(parse_url($url))) {
  print "YES, they match!\n";
}

function unparse_url($parsed_url) {
  $scheme   = isset($parsed_url['scheme']) ? $parsed_url['scheme'] . '://' : '';
  $host     = isset($parsed_url['host']) ? $parsed_url['host'] : '';
  $port     = isset($parsed_url['port']) ? ':' . $parsed_url['port'] : '';
  $user     = isset($parsed_url['user']) ? $parsed_url['user'] : '';
  $pass     = isset($parsed_url['pass']) ? ':' . $parsed_url['pass']  : '';
  $pass     = ($user || $pass) ? "$pass@" : '';
  $path     = isset($parsed_url['path']) ? $parsed_url['path'] : '';
  $query    = isset($parsed_url['query']) ? '?' . $parsed_url['query'] : '';
  $fragment = isset($parsed_url['fragment']) ? '#' . $parsed_url['fragment'] : '';
  return "$scheme$user$pass$host$port$path$query$fragment";
}

?>

Here is a UTF-8 compatible parse_url() replacement function based on the work of “laszlo dot janszky at gmail dot com”. The original handled URLs with user:pass incorrectly. This version is also PHP 5.5 compatible, since it no longer uses the deprecated regex /e modifier.

PHP

<?php

    /**
     * UTF-8 aware parse_url() replacement.
     *
     * @return array
     */
    function mb_parse_url($url)
    {
        $enc_url = preg_replace_callback(
            '%[^:/@?&=#]+%usD',
            function ($matches)
            {
                return urlencode($matches[0]);
            },
            $url
        );
       
        $parts = parse_url($enc_url);
       
        if($parts === false)
        {
            throw new \InvalidArgumentException('Malformed URL: ' . $url);
        }
       
        foreach($parts as $name => $value)
        {
            $parts[$name] = urldecode($value);
        }
       
        return $parts;
    }

?>

Here is another UTF-8 compatible parse_url() function.

PHP

<?php
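function mb_parse_url($url) {
    // The /e modifier no longer works in PHP 7, so a callback does the encoding.
    $encodedUrl = preg_replace_callback(
        '%[^:/?#&=\.]+%usD',
        function ($matches) { return urlencode($matches[0]); },
        $url
    );
    $components = parse_url($encodedUrl);
    if ($components === false)
        return false; // seriously malformed URL
    foreach ($components as &$component)
        $component = urldecode($component);
    unset($component); // break the reference left behind by the loop
    return $components;
}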
?>

Here’s a good way to use parse_url() to build an embed from a YouTube link.
I have used this function in many projects:

PHP

<?php
function youtube($url, $width=560, $height=315, $fullscreen=true)
{
    parse_str( parse_url( $url, PHP_URL_QUERY ), $my_array_of_vars );
    // Escape the video ID before placing it inside the HTML attribute
    $video = htmlspecialchars( $my_array_of_vars['v'] );
    $youtube= '<iframe allowtransparency="true" scrolling="no" width="'.$width.'" height="'.$height.'" src="//www.youtube.com/embed/'.$video.'" frameborder="0"'.($fullscreen?' allowfullscreen':'').'></iframe>';
    return $youtube;
}

// show youtube on my page
$url='http://www.youtube.com/watch?v=yvTd6XxgCBE';
echo youtube($url, 560, 315, true);
?>

parse_url() extracts the query string that holds the unique YouTube video ID, which is then placed into the iframe link and displayed on your page. You can choose the size of the video yourself.

Based on the idea of “jbr at ya-right dot com”, I have been working on a new function to parse the URL:

PHP

<?php
function parseUrl($url) {
    $r  = "^(?:(?P<scheme>\w+)://)?";
    $r .= "(?:(?P<login>\w+):(?P<pass>\w+)@)?";
    $r .= "(?P<host>(?:(?P<subdomain>[\w\.]+)\.)?" . "(?P<domain>\w+\.(?P<extension>\w+)))";
    $r .= "(?::(?P<port>\d+))?";
    $r .= "(?P<path>[\w/]*/(?P<file>\w+(?:\.\w+)?)?)?";
    $r .= "(?:\?(?P<arg>[\w=&]+))?";
    $r .= "(?:#(?P<anchor>\w+))?";
    $r = "!$r!";                                                // Delimiters
   
    preg_match ( $r, $url, $out );
   
    return $out;
}
print_r ( parseUrl ( 'me:you@sub.site.org:29000/pear/validate.html?happy=me&sad=you#url' ) );
?>

This returns:

Shell

Array
(
    [0] => me:you@sub.site.org:29000/pear/validate.html?happy=me&sad=you#url
    [scheme] =>
    [1] =>
    [login] => me
    [2] => me
    [pass] => you
    [3] => you
    [host] => sub.site.org
    [4] => sub.site.org
    [subdomain] => sub
    [5] => sub
    [domain] => site.org
    [6] => site.org
    [extension] => org
    [7] => org
    [port] => 29000
    [8] => 29000
    [path] => /pear/validate.html
    [9] => /pear/validate.html
    [file] => validate.html
    [10] => validate.html
    [arg] => happy=me&sad=you
    [11] => happy=me&sad=you
    [anchor] => url
    [12] => url
)

So both named and numbered array keys are possible.

Here is an example that determines the URL port.
When the port is not specified, it is derived from the scheme.

PHP

<?php
function getUrlPort( $urlInfo )
{
    if( isset($urlInfo['port']) ) {
        $port = $urlInfo['port'];
    } else { // no port specified; get default port
        if (isset($urlInfo['scheme']) ) {
            switch( $urlInfo['scheme'] ) {
                case 'http':
                    $port = 80; // default for http
                    break;
                case 'https':
                    $port = 443; // default for https
                    break;
                case 'ftp':
                    $port = 21; // default for ftp
                    break;
                case 'ftps':
                    $port = 990; // default for ftps
                    break;
                default:
                    $port = 0; // error; unsupported scheme
                    break;
            }
        } else {
            $port = 0; // error; unknown scheme
        }
    }
    return $port;
}

$url = "http://nl3.php.net/manual/en/function.parse-url.php";
$urlInfo = parse_url( $url );
$urlPort = getUrlPort( $urlInfo );
if( $urlPort !== 0 ) {
    print 'Found URL port: '.$urlPort;
} else {
    print 'ERROR: Could not find port at URL: '.$url;
}
?>

Here’s a simple class I made that makes use of parse_url().
I needed a way for a page to retain its GET parameters but also edit or add to them.
I also had some pages that needed the same GET parameters, so I added a way to change the path as well.

The class is listed first, followed by a usage example for a request to Test.php?foo=1:

PHP

<?php
class Paths{

    private $url;
    public function __construct($url){
        $this->url = parse_url($url);
    }
   
    public function returnUrl(){
        $return = $this->url['path'].'?'.$this->url['query'];
        $return = (substr($return,-1) == "&")? substr($return,0,-1) : $return;
        $this->resetQuery();
        return $return;
    }
   
    public function changePath($path){
        $this->url['path'] = $path;
    }
   
    public function editQuery($get,$value){
        $parts = explode("&",$this->url['query']);
        $return = "";
        foreach($parts as $p){
            $paramData = explode("=",$p);
            if($paramData[0] == $get){
                $paramData[1] = $value;
            }
            $return .= implode("=",$paramData).'&';
           
        }
       
        $this->url['query'] = $return;
    }
   
    public function addQuery($get,$value){
        $part = $get."=".$value;
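        // parse_url() omits 'query' when there is none, and editQuery() leaves a trailing "&"
        $query = isset($this->url['query']) ? $this->url['query'] : "";
        $and = ($query == "" || substr($query,-1) == "&") ? "" : "&";
        $this->url['query'] = $query.$and.$part;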
    }
   
    public function checkQuery($get){
        $parts = explode("&",$this->url['query']);
       
            foreach($parts as $p){
                $paramData = explode("=",$p);
                if($paramData[0] == $get)
                    return true;
            }
            return false;
       
    }
   
    public function buildQuery($get,$value){
        if($this->checkQuery($get))
            $this->editQuery($get,$value);
        else
            $this->addQuery($get,$value);
       
    }
   
    public function resetQuery(){
        $this->url = parse_url($_SERVER['REQUEST_URI']);
    }
}
?>

PHP

<?php
$path = new Paths($_SERVER['REQUEST_URI']);
$path->changePath("/baz.php");
$path->buildQuery("foo",2);
$path->buildQuery("bar",3);
echo $path->returnUrl();
?>

returns: /baz.php?foo=2&bar=3   

Hope this is of some use to someone!
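
As an alternative to splitting the query string by hand, PHP's built-in parse_str() and http_build_query() can do the decoding and re-encoding. The sketch below is a simplified take on the same idea, not a replacement for the class above; with_query_param() is a hypothetical name, and it only rebuilds the path and query, as returnUrl() does.

```php
<?php
// Hypothetical helper: returns the path and query of $url with the
// query parameter $name set to $value (added if missing, replaced if present).
function with_query_param($url, $name, $value)
{
    $parts = parse_url($url);
    if ($parts === false) {
        return false; // seriously malformed URL
    }

    // Decode the existing query string into an array, if there is one.
    $params = array();
    if (isset($parts['query'])) {
        parse_str($parts['query'], $params);
    }
    $params[$name] = $value;

    $path = isset($parts['path']) ? $parts['path'] : '';
    // Pass "&" explicitly so the result does not depend on arg_separator.output.
    return $path . '?' . http_build_query($params, '', '&');
}

echo with_query_param('/Test.php?foo=1', 'foo', 2), "\n"; // /Test.php?foo=2
echo with_query_param('/Test.php?foo=2', 'bar', 3), "\n"; // /Test.php?foo=2&bar=3
```

Because the parameters live in an array, editing and adding become the same assignment, and http_build_query() takes care of URL-encoding the values.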

UTF-8 aware parse_url() replacement.

I’ve realized that even though UTF-8 characters are not allowed in URLs, I have to work with a lot of them, and parse_url() will break.

Based largely on the work of “mallluhuct at gmail dot com”, I added parse_url()-compatible “named values”, which make the array values a lot easier to work with (instead of just numbers). I also implemented detection of port and username/password, plus a back-reference to better detect URLs like this: //en.wikipedia.com
… which, although technically an invalid URL, is used extensively on sites like Wikipedia in the href of anchor tags, where it is valid in browsers (one of the types of URLs you have to support when crawling pages). This is accurately detected as the host name instead of the “path”, as in all other examples.

I will submit my complete function (instead of just the RegExp), which is an almost “drop-in” replacement for parse_url(). It returns a cleaned-up array (or false) with values compatible with parse_url(). I could have told preg_match() not to store the unused extra values, but that would complicate the RegExp and make it more difficult to read, understand and extend. The key to detecting UTF-8 characters is the use of the “u” modifier in preg_match().

PHP

<?php
function parse_utf8_url($url)
{
    static $keys = array('scheme'=>0,'user'=>0,'pass'=>0,'host'=>0,'port'=>0,'path'=>0,'query'=>0,'fragment'=>0);
    if (is_string($url) && preg_match(
            '~^((?P<scheme>[^:/?#]+):(//))?((\\3|//)?(?:(?P<user>[^:]+):(?P<pass>[^@]+)@)?(?P<host>[^/?:#]*))(:(?P<port>\\d+))?' .
            '(?P<path>[^?#]*)(\\?(?P<query>[^#]*))?(#(?P<fragment>.*))?~u', $url, $matches))
    {
        foreach ($matches as $key => $value)
            if (!isset($keys[$key]) || empty($value))
                unset($matches[$key]);
        return $matches;
    }
    return false;
}
?>

UTF-8 URLs can, and should, be “normalized” after extraction with this function.
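
What “normalized” means is not spelled out here. One possible reading is percent-encoding the non-ASCII characters of the extracted path, which could be sketched as follows. encode_utf8_path() is a hypothetical helper, and it assumes the path is not already percent-encoded (otherwise it would be encoded twice).

```php
<?php
// Hypothetical helper: percent-encode each segment of a raw UTF-8 path
// while keeping the "/" separators intact.
// Assumption: the input is NOT already percent-encoded.
function encode_utf8_path($path)
{
    $segments = explode('/', $path);
    return implode('/', array_map('rawurlencode', $segments));
}

// "\xC3\xA9" is the UTF-8 byte sequence for "é".
echo encode_utf8_path("/wiki/caf\xC3\xA9"), "\n"; // /wiki/caf%C3%A9
```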
