
PHP 5 Web Page Scraping

Posted by admin on May 15, 2009

If you want to scrape or mash up elements from an external website, take a look at the following code.

It fetches the remote page, parses it into a DOM document, converts that to a SimpleXML object, and then iterates over the nodes matched by an XPath query.

$url = 'http://www.example.com/'; // placeholder: replace with the page you want to scrape
$doc = new DOMDocument(); // loadHTMLFile() is an instance method, so create the document first
if (!@$doc->loadHTMLFile($url)) { // fetch the remote HTML file and parse it (@ suppresses warnings)
  die("Could not load $url\n");
}
$xml = simplexml_import_dom($doc); // convert the DOM object to a SimpleXML object
foreach ($xml->xpath('//a') as $node) { // run an XPath query and iterate through the array of results
  print (string) $node . "\n"; // casting to string gives the node's own text (text inside child elements is not included)
  print $node['href'] . "\n"; // attributes of the node are accessible with array syntax
  print $node->asXML() . "\n\n"; // asXML() produces the whole XML string for the node
}
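To try the same steps without a network connection, the sketch below parses an inline HTML string with loadHTML() instead of fetching a URL. The function name extract_links and the sample markup are made up for illustration and are not part of the original snippet. It also shows one way to get the full link text when the anchor wraps other elements, by going back to the DOM node via dom_import_simplexml().

```php
<?php
// Hypothetical helper (an assumption, not from the original post):
// returns the href and visible text of every link in an HTML string.
function extract_links($html)
{
    $doc = new DOMDocument();
    // Collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xml = simplexml_import_dom($doc);
    $links = array();
    foreach ($xml->xpath('//a') as $node) {
        $links[] = array(
            'href' => (string) $node['href'],
            // textContent includes text nested inside child elements.
            'text' => trim(dom_import_simplexml($node)->textContent),
        );
    }
    return $links;
}

// Sample markup, invented for this example.
$sample = '<html><body><p><a href="/a">First</a> <a href="/b"><b>Second</b></a></p></body></html>';
foreach (extract_links($sample) as $link) {
    print $link['href'] . ' => ' . $link['text'] . "\n";
}
```

Returning plain arrays rather than SimpleXML nodes keeps the result easy to store or encode; whether that suits your mash-up depends on what you do with the links afterwards.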

Note: if the document uses XML namespaces, register a prefix for the namespace first with

$xml->registerXPathNamespace('NAMESPACE_PREFIX', 'NAMESPACE_URI');

and then use that prefix in the query:

$xml->xpath('//NAMESPACE_PREFIX:ELEMENT')

replacing the text in capitals as appropriate.
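As a concrete illustration of the namespace note, here is a minimal sketch that queries a small Atom document. The feed content and the prefix "atom" are assumptions chosen for the example; the prefix can be anything, as long as the URI matches the one declared in the document.

```php
<?php
// Sample Atom document, invented for this example; every element
// belongs to the default namespace declared on <feed>.
$feed = '<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><title>First post</title></entry>
  <entry><title>Second post</title></entry>
</feed>';

$xml = simplexml_load_string($feed);

// Bind the (arbitrary) prefix "atom" to the namespace URI used by the feed.
$xml->registerXPathNamespace('atom', 'http://www.w3.org/2005/Atom');

// Collect the text of each entry title using the registered prefix.
$titles = array();
foreach ($xml->xpath('//atom:entry/atom:title') as $title) {
    $titles[] = (string) $title;
}
print implode("\n", $titles) . "\n";
```

An unprefixed query such as //entry finds nothing in this document, because XPath treats unprefixed names as belonging to no namespace.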



nCoded Website, IP & Content © 2003 - 2009 nCoded - All Rights Reserved - No part of this website may be reproduced without permission.