Synook

Parsing RSS feeds using PHP

RSS is a popular format for syndicating news and other periodicals. There are many readers and aggregators for RSS available already, but sometimes you may want to display a certain feed on a website. In this article we’ll go through a simple way of parsing these feeds using PHP and the DOMDocument class.

If you haven’t used DOMDocument before, it is a really nifty tool for accessing, parsing and updating XML documents. It lets you traverse and manipulate the structure through standard Document Object Model methods, unlike some other PHP XML readers, which have their own, well, slightly odd implementations.

First, we need to create the DOMDocument instance and load the RSS feed for processing.

$rss = new DOMDocument();
$rss->load("http://domain.com/feed.rss");
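Loading can fail, for example when the feed is unreachable or the XML is malformed, and in that case load() returns false rather than a usable document. Here is a minimal sketch of checking for that. It assumes a hypothetical helper name, parseFeed(), and parses a made-up sample string with loadXML() so that it runs without a network connection; the same check applies to the return value of load().

```php
<?php
// Hypothetical helper (not part of the original article): parses RSS XML
// from a string and returns a DOMDocument, or null if the XML is malformed.
function parseFeed(string $xml): ?DOMDocument
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $rss = new DOMDocument();
    $ok = $rss->loadXML($xml);

    // Discard the collected errors and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $ok ? $rss : null;
}

// Sample feed, made up for illustration.
$sample = <<<XML
<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item><title>First post</title></item>
  </channel>
</rss>
XML;

$rss = parseFeed($sample);
echo $rss === null ? "Could not parse feed\n" : "Feed loaded\n";
```

Returning null keeps the failure explicit, so the rest of the page can decide whether to show a fallback message or skip the feed entirely.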

Then we need a foreach loop that iterates through the <item> elements to extract their contents. The getElementsByTagName() method returns a DOMNodeList of the elements matching that name. Fortunately, a DOMNodeList can be traversed with foreach, just like an array.

$items = $rss->getElementsByTagName("item");
foreach ($items as $item) { /* code below goes here */ }
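To see what the DOMNodeList gives us, here is a small self-contained sketch. The two-item feed is made up for illustration, and textContent is used only to show that each $item is an element with readable contents.

```php
<?php
// Tiny made-up feed with two items, parsed from a string for illustration.
$rss = new DOMDocument();
$rss->loadXML('<rss version="2.0"><channel>'
    . '<item><title>First</title></item>'
    . '<item><title>Second</title></item>'
    . '</channel></rss>');

$items = $rss->getElementsByTagName("item");

// DOMNodeList exposes the number of matches through its length property.
echo $items->length . " items found\n";

$titles = [];
foreach ($items as $item) {
    // Each $item is a DOMElement; textContent is all the text inside it.
    $titles[] = $item->textContent;
}
print_r($titles);
```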

Inside the foreach loop, we then use getElementsByTagName() again to extract the information we want, namely the <title>, <link>, and <pubDate>. Here though, since an RSS item is expected to contain at most one title, link and pubDate, we don’t have to construct a loop and can instead just access the first (and only) instance of that node name with item(0). From that, we can use the nodeValue property to get the contents of the element. Note that these elements are optional in RSS 2.0, so item(0) returns null when one is missing.

$title = $item->getElementsByTagName("title")->item(0)->nodeValue;
$link = $item->getElementsByTagName("link")->item(0)->nodeValue;
$time = $item->getElementsByTagName("pubDate")->item(0)->nodeValue;
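Because item(0) returns null when an element is absent, reading nodeValue directly can trigger a warning on incomplete feeds. One way around that is a small wrapper. This is only a sketch: childText() is a hypothetical helper name, and the sample item (which deliberately has no <pubDate>) is made up.

```php
<?php
// Hypothetical helper: text of the first <$tag> descendant of $parent,
// or $default when no such element exists.
function childText(DOMElement $parent, string $tag, string $default = ""): string
{
    $node = $parent->getElementsByTagName($tag)->item(0);
    return $node === null ? $default : trim($node->textContent);
}

// Made-up item without a <pubDate>, to exercise the fallback.
$rss = new DOMDocument();
$rss->loadXML('<rss version="2.0"><channel><item>'
    . '<title> Hello </title><link>http://example.com/hello</link>'
    . '</item></channel></rss>');

$item = $rss->getElementsByTagName("item")->item(0);

$title = childText($item, "title");
$link = childText($item, "link");
$time = childText($item, "pubDate", "unknown");

echo "$title | $link | $time\n";
```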

Only one more thing to do: in RSS feeds the time is in RFC 822 format, while we want it in a more readable form. Therefore, we first use strtotime() to convert it into a UNIX timestamp, then date() to format it the way we want. All that’s left to do after that is echo! Of course, since we are in a loop here, that process will be repeated for every <item> node.

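$time = date("j M g:i a", strtotime($time));
// Escape feed values so that markup inside the feed cannot break the page.
echo '<a href="' . htmlspecialchars($link) . '">' . htmlspecialchars($title) . "</a> $time<br />";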
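One caveat with the date step: strtotime() returns false when it cannot parse the string, and passing that straight to date() produces a misleading 1970 date. Here is a sketch of guarding against that, assuming a hypothetical helper name formatPubDate(). It also swaps in gmdate() instead of date(), purely so the output does not depend on the server's timezone setting.

```php
<?php
// Hypothetical helper: formats an RSS pubDate string, or returns the raw
// string unchanged when it cannot be parsed.
function formatPubDate(string $pubDate, string $format = "j M g:i a"): string
{
    $timestamp = strtotime($pubDate);
    if ($timestamp === false) {
        return $pubDate;
    }
    // gmdate() formats in UTC, independent of the server's default timezone.
    return gmdate($format, $timestamp);
}

echo formatPubDate("Sat, 07 Sep 2002 09:42:31 GMT"), "\n"; // 7 Sep 9:42 am
echo formatPubDate("not a date"), "\n";
```

If you would rather show times in the site's local timezone, replacing gmdate() with date() restores the behaviour used in the article.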

That’s all! Of course, this technique can be expanded to include other information such as author details and category metadata.
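As a rough sketch of that kind of extension, the snippet below reads <author> and <category> from a made-up item. Unlike title or link, an item may carry several <category> elements, so those are collected in a loop rather than read with item(0).

```php
<?php
// Made-up item with an author and two categories, for illustration only.
$rss = new DOMDocument();
$rss->loadXML('<rss version="2.0"><channel><item>'
    . '<title>Hello</title>'
    . '<author>jane@example.com (Jane)</author>'
    . '<category>PHP</category><category>XML</category>'
    . '</item></channel></rss>');

foreach ($rss->getElementsByTagName("item") as $item) {
    // <author> is optional, so guard against a missing element.
    $authorNode = $item->getElementsByTagName("author")->item(0);
    $author = $authorNode === null ? "" : $authorNode->textContent;

    // Collect every <category> of this item.
    $categories = [];
    foreach ($item->getElementsByTagName("category") as $category) {
        $categories[] = $category->textContent;
    }

    echo htmlspecialchars($author) . ": "
        . htmlspecialchars(implode(", ", $categories)) . "\n";
}
```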