Rayhan’s blog (raynux.com)



RayFeedReader – PHP class for parsing RSS and Atom Feed

2 September, 2009 (00:35) | class, FEED, PHP, programming, Reference



RayFeedReader: a simple, easy-to-use, SimpleXML-based feed reader class for PHP.

First of all, this is yet another reinvention of the wheel. The class is designed to provide quick access to any XML-based RSS or Atom feed and its content. Usage is very simple and requires only one line of code. HTML rendering is provided through a separate, pluggable widget class that can be extended easily. By default, SimpleXML loads the feed URL directly, but the feed can instead be fetched with CURL through the rayHttp class, in which case any CURL option can be set for the HTTP request. The class can auto-detect the feed type and supports RSS 0.92, RSS 2.0 and Atom feeds.
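The CURL path depends on the external rayHttp class, whose API is not shown in this post. As a rough, dependency-free alternative, the sketch below fetches a feed with PHP's built-in stream wrappers (setting a timeout and User-Agent through a stream context) and hands the string to SimpleXMLElement, much like the rayHttp branch of parse() does with the CURL response. The function name loadFeedXml() and the User-Agent string are assumptions for illustration, not part of RayFeedReader.

```php
<?php
/**
 * Hypothetical helper, not part of RayFeedReader: fetch a feed with PHP's
 * built-in stream wrappers and parse it with SimpleXML, similar to what the
 * 'rayHttp' branch of parse() does with a CURL response.
 *
 * @param string $url            feed URL (or local file path)
 * @param float  $timeoutSeconds read timeout for http(s) URLs
 * @return SimpleXMLElement|null null if the feed cannot be fetched or parsed
 */
function loadFeedXml($url, $timeoutSeconds = 10.0)
{
    // 'http' context options apply to http:// and https:// URLs and are ignored for local files.
    $context = stream_context_create(array(
        'http' => array(
            'timeout'    => $timeoutSeconds,
            'user_agent' => 'RayFeedReader-sketch/1.0', // assumed User-Agent string
        ),
    ));

    $body = @file_get_contents($url, false, $context);
    if ($body === false || $body === '') {
        return null; // fetch failed or the response was empty
    }

    // Collect libxml errors internally instead of emitting warnings, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    try {
        return new SimpleXMLElement($body, LIBXML_NOCDATA);
    } catch (Exception $e) {
        return null; // malformed XML
    } finally {
        libxml_clear_errors();
        libxml_use_internal_errors($previous);
    }
}
```

Fetching http:// URLs this way requires `allow_url_fopen` to be enabled; a non-null result can then go through the same parsing logic as the class.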

Features:

- Reads feed content into an array
- Supports RSS 0.92, RSS 2.0 and Atom feeds
- Detects the feed type automatically; it can also be set manually
- Supports a pluggable HTML widget rendering class
- Can render an HTML widget through the optional RayFeedWidget class or your own extended class
- Easily configurable, and works without any configuration
- Simple and easy to use from anywhere in your application with a single line of code
- Supports the Singleton pattern
- Uses SimpleXML as the HTTP client by default
- Supports custom CURL requests through the rayHttp class, which can be used to extend functionality
- Lightweight
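Feed type detection only needs the root element: RSS feeds start with `<rss>` and carry a `version` attribute, while Atom feeds start with `<feed>`. The standalone function below is a hypothetical sketch of those rules, mirroring the checks in parse(); it is not part of the class.

```php
<?php
/**
 * Hypothetical standalone helper mirroring RayFeedReader's detection rules:
 *   <rss version="2.0">          => 'rss2'
 *   <rss version="0.91"/"0.92">  => 'rss'
 *   <feed>                       => 'atom'
 * Anything else returns null (unsupported).
 */
function detectFeedType(SimpleXMLElement $xml)
{
    switch ($xml->getName()) {
        case 'rss':
            // Root attributes are readable with array syntax; cast to string before comparing.
            $version = (string) $xml['version'];
            if ($version === '2.0') {
                return 'rss2';
            }
            if (in_array($version, array('0.91', '0.92'), true)) {
                return 'rss';
            }
            return null; // other RSS versions are not supported

        case 'feed':
            return 'atom';

        default:
            return null; // e.g. RSS 1.0 (<rdf:RDF>) is not supported
    }
}

// Usage: echo detectFeedType(new SimpleXMLElement('<rss version="2.0"><channel/></rss>'));
```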

Public methods provided by the class:
- getInstance()
- setOptions()
- parse()
- getData()
- widget()

Class:

File: rayfeedreader.php

<?php

    /**
     * RayFeedReader
     *
     * SimpleXML-based feed reader class. A very specific feed reader designed
     * to work with little or no configuration.
     *
     * - This class can read an RSS or Atom feed from a given URL into an array.
     * - It can also render an HTML widget with the pluggable RayFeedWidget class.
     *
     * Supports the following feed types:
     *  - RSS 0.92
     *  - RSS 2.0
     *  - Atom
     *
     * Configuration Options
     *  - array
     *      - url: (string)
     *          - feed url
     *
     *      - httpClient: ([optional] string)
     *          - default SimpleXML
     *          - value rayHttp or SimpleXML
     *
     *      - type: ([optional] string)
     *          - auto-detected if not set
     *          - value rss or rss2 or atom
     *
     *      - widget: ([optional] string)
     *          - feed widget class name for rendering html
     *
     *      - rayHttp: ([optional] array)
     *          - only used if httpClient is set to rayHttp
     *          - rayHttp options if you want to modify rayHttp CURL options
     *          - generally not required.
     *
     * @version 1.0
     * @author Md. Rayhan Chowdhury
     * @package rayFeedReader
     * @license GPL
     */
    class RayFeedReader{

        /**
         * Self Instance for Singleton Pattern
         *
         * @var object
         * @access private
         */
        static private  $__instance;

        /**
         * Instance of Parser Class.
         *
         * @var object Parser Class
         * @access protected
         */
        protected       $_Parser;

        /**
         * Feed Url
         *
         * @var string feed url
         * @access protected
         */
        protected       $_url;

        /**
         * Runtime Options for reader
         *
         * @var array
         * @access protected
         */
        protected       $_options = array('rayHttp' => array());

        /**
         * Type of feed to be parsed: 'rss', 'rss2' or 'atom'.
         *
         *  - null (default) lets parse() auto-detect the type.
         *
         * @var string|null
         * @access protected
         */
        protected       $_type = null;

        /**
         * HttpClient to be used for loading feed content.
         *
         *  - default SimpleXML
         *
         * @var string 'SimpleXML' or 'rayHttp'
         * @access protected
         */
        protected       $_httpClient = "SimpleXML";
        
        /**
         * Widget Class Name
         *
         * @var string
         * @access protected
         */
        protected       $_widget;

        /**
         * Parsed result data
         * 
         * @var array
         * @access protected
         */
        protected       $_content;

        /**
         * Class construct
         *
         * @param array $options
         */
        function __construct($options = array()) {
            $this->setOptions($options);
        }

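        /**
         * Get the shared instance of the class.
         *
         * $options are applied only when the instance is first created;
         * use setOptions() to change options on an existing instance.
         *
         * @param array $options
         * @return object self instance.
         * @access public
         * @static
         */
        static function getInstance($options = array()) {
            if (is_null(self::$__instance)) {
                self::$__instance = new self($options);
            }
            return self::$__instance;
        }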

        /**
         * Set options for the class.
         *
         * Passing 'type' => null re-enables feed type auto-detection.
         *
         * @param array $options
         * @return object self instance
         * @access public
         */
        function setOptions($options) {
            if (!empty($options['url'])) {
                $this->_url = $options['url'];
            }

            if (array_key_exists('type', $options)) {
                $this->_type = $options['type'];
            }

            if (!empty($options['httpClient'])) {
                $this->_httpClient = $options['httpClient'];
            }

            if (!empty($options['widget'])) {
                $this->_widget = $options['widget'];
            }

            $this->_options = array_merge($this->_options, $options);

            return $this;
        }

        /**
         * Parse feed contents into an array and return self object
         *
         * @return object self instance
         * @access public
         */
        function parse() {
            /**
             * Get/load content
             */
            switch ($this->_httpClient) {
                case 'SimpleXML':
                    // Third argument true: treat $this->_url as a path/URL, not an XML string.
                    $content = new SimpleXMLElement($this->_url, LIBXML_NOCDATA, true);
                    break;

                case 'rayHttp':
                    $content = RayHttp::getInstance()->setOptions($this->_options['rayHttp'])->get($this->_url);

                    if (!empty($content)) {
                        $content = new SimpleXMLElement($content, LIBXML_NOCDATA);
                    }
                    break;

                default:
                    trigger_error("Unsupported httpClient: use 'SimpleXML' or 'rayHttp'.", E_USER_ERROR);
                    return false;
            }

            if (empty($content)) {
                trigger_error("XML format is invalid or broken.", E_USER_ERROR);
            }

            /**
             * Detect the feed type unless one was set explicitly.
             * The detected type is kept local so that a reused (singleton)
             * instance detects it again for the next URL.
             */
            $type = $this->_type;

            if (empty($type)) {
                switch ($content->getName()) {
                    case 'rss':
                        $version = (string) $content['version'];
                        if ($version == '2.0') {
                            $type = 'rss2';
                        } elseif (in_array($version, array('0.91', '0.92'))) {
                            $type = 'rss';
                        }
                        break;

                    case 'feed':
                        $type = 'atom';
                        break;
                }
            }

            if (!in_array($type, array('rss', 'rss2', 'atom'))) {
                trigger_error("Feed type is either invalid or not supported.", E_USER_ERROR);

                return false;
            }

            /**
             * Parse Feed Content
             */
            switch ($type) {
                case 'rss':
                    $content = $this->parseRss($content);
                    break;

                case 'rss2':
                    $content = $this->parseRss2($content);
                    break;

                case 'atom':
                    $content = $this->parseAtom($content);
                    break;
            }

            if (empty($content)) {
                trigger_error("No content is found.", E_USER_ERROR);
            }

            $this->_content = $content;

            return $this;
        }

        /**
         * Get Array of Parsed XML feed data.
         *
         * @return array parsed feed content.
         * @access public
         */
        function getData() {
            return $this->_content;
        }

        /**
         * Return the HTML widget rendered by the widget class.
         *
         * @param array $options options passed to the widget class
         * @return string|false html widget, or false if no widget class or content is set
         * @access public
         */
        function widget($options = array('widget' => 'brief')) {
            if (!empty($this->_widget) && !empty($this->_content)) {
                $Widget = new $this->_widget;

                return $Widget->widget($this->_content, $options);
            } else {
                return false;
            }
        }
        
        /**
         * Parse an RSS 0.91/0.92 feed into an array.
         *
         * @param object $feedXml SimpleXMLElement
         * @return array feed content
         * @access public
         */
        function parseRss($feedXml) {
            $data = array();

            $data['title'] = $feedXml->channel->title . '';
            $data['link'] = $feedXml->channel->link . '';
            $data['description'] = $feedXml->channel->description . '';
            $data['parser'] = __CLASS__;
            $data['type'] = 'rss';

            foreach ($feedXml->channel->item as $item) {
                $data['items'][] = array(
                                        'title' =>  $item->title . '',
                                        'link' =>   $item->link . '',
                                        'description' => $item->description . '',
                                    );
            }
            
            return $data;
        }

        
        /**
         * Parse an RSS 2.0 feed into an array.
         *
         * @param object $feedXml SimpleXMLElement
         * @return array feed content
         * @access public
         */
        function parseRss2($feedXml) {
            $data = array();

            $data['title'] = $feedXml->channel->title . '';
            $data['link'] = $feedXml->channel->link . '';
            $data['description'] = $feedXml->channel->description . '';
            $data['parser'] = __CLASS__;
            $data['type'] = 'rss2';

            $namespaces = $feedXml->getNamespaces(true);
            foreach ($namespaces as $namespace => $namespaceValue) {
                $feedXml->registerXPathNamespace($namespace, $namespaceValue);
            }

            foreach ($feedXml->channel->item as $item) {
                $categories = array();
                foreach ($item->children() as $child) {
                    if ($child->getName() == 'category') {
                        $categories[] = (string) $child;
                    } 
                }

                $author = null;
                if (!empty($namespaces['dc']) && $creator = $item->xpath('dc:creator')) {
                    $author = (string) $creator[0];
                }

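                $content = null;
                // The RSS content module is bound to the 'content' prefix, not 'encoded'.
                if (!empty($namespaces['content']) && $encoded = $item->xpath('content:encoded')) {
                    $content = (string) $encoded[0];
                }

                // Leave the date empty when <pubDate> is missing instead of formatting timestamp 0 (1970).
                $pubDate = (string) $item->pubDate;

                $data['items'][] = array(
                                        'title' =>  $item->title . '',
                                        'link' =>   $item->link . '',
                                        'date' =>   ($pubDate !== '') ? date('Y-m-d h:i:s A', strtotime($pubDate)) : '',
                                        'description' => $item->description . '',
                                        'categories' => $categories,
                                        'author' => array( 'name' => $author),
                                        'content' => $content,
                                    );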
                
            }

            return $data;
        }

        /**
         * Parse an Atom feed into an array.
         *
         * @param object $feedXml SimpleXMLElement
         * @return array feed content
         * @access public
         */
        function parseAtom($feedXml) {
            $data = array();

            $data['title'] = $feedXml->title . '';
            foreach ($feedXml->link as $link) {
                $data['link'] = $link['href'] . '';
                break;
            }

            $data['description'] = $feedXml->subtitle . '';
            $data['parser'] = __CLASS__;
            $data['type'] = 'atom';

            foreach ($feedXml->entry as $item) {
                // Reset for each entry so a missing <link> does not reuse the previous entry's URL.
                $itemLink = '';
                foreach ($item->link as $link) {
                    $itemLink = $link['href'] . '';
                    break;
                }

                $categories = array();
                foreach ($item->category as $category) {
                    $categories[] = $category['term'] . '';
                }

                // <published> is optional in Atom; fall back to the required <updated> element.
                $published = (string) $item->published;
                if ($published === '') {
                    $published = (string) $item->updated;
                }

                $data['items'][] = array(
                                        'title' =>  $item->title . '',
                                        'link' =>   $itemLink,
                                        'date' =>   ($published !== '') ? date('Y-m-d h:i:s A', strtotime($published)) : '',
                                        'description' => $item->summary . '',
                                        'content' => $item->content . '',
                                        'categories' => $categories,
                                        'author' => array('name' => $item->author->name . '', 'url' => $item->author->uri . ''),
                                        'extra' => array('contentType' => $item->content['type'] . '', 'descriptionType' => $item->summary['type'] . '')
                                    );
            }

            return $data;
        }

    }
?>
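parseRss2() reads dc:creator and content:encoded through XPath prefixes, so it only finds them when a feed uses those exact prefixes. A prefix-independent alternative, sketched below under the assumption of the standard Dublin Core and RSS content module namespace URIs, looks the elements up with SimpleXMLElement::children() instead. readRss2Extras() is a hypothetical helper that returns the same 'author' and 'content' shape that parseRss2() builds for each item.

```php
<?php
// Namespace URIs of the Dublin Core elements and the RSS content module.
const DC_NAMESPACE = 'http://purl.org/dc/elements/1.1/';
const CONTENT_NAMESPACE = 'http://purl.org/rss/1.0/modules/content/';

/**
 * Hypothetical helper: return the author and full content of one RSS 2.0
 * <item>, matching elements by namespace URI instead of by prefix.
 *
 * @param SimpleXMLElement $item one <item> element
 * @return array array('author' => array('name' => ...), 'content' => ...)
 */
function readRss2Extras(SimpleXMLElement $item)
{
    // children($uri) selects child elements in that namespace, whatever prefix the feed uses.
    $dc = $item->children(DC_NAMESPACE);
    $contentModule = $item->children(CONTENT_NAMESPACE);

    $author = isset($dc->creator) ? (string) $dc->creator : null;
    $encoded = isset($contentModule->encoded) ? (string) $contentModule->encoded : null;

    return array('author' => array('name' => $author), 'content' => $encoded);
}
```

The same idea would apply inside parseRss2(), replacing the two xpath() lookups.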

The external feed widget class renders the parsed feed data as an HTML widget.

File: rayfeedwidget.php

<?php

    /**
     * Feed Widget Interface
     *
     * @version 1.0
     * @author Md. Rayhan Chowdhury
     * @package rayFeedReader
     * @license GPL
     */
    interface RayFeedWidget_Interface{
        /**
         * Public widget method
         *
         * @param array $data parsed feed data from RayFeedReader::getData()
         * @param array $options widget options
         * @return string html
         */
        public function widget($data, $options = array());
    }

    /**
     * Widget Plugin Class for RayFeedReader
     *
     * Render html widget for feed reader based on options
     *
     * Config options
     *  - array
     *      - widget:
     *          - optional string
     *          - value 'brief' (default), 'compact' or 'detail'
     *      - showTitle
     *          - optional boolean, default true
     *          - whether to show the feed title and description
     *
     * @version 1.0
     * @author Md. Rayhan Chowdhury
     * @package rayFeedReader
     * @license GPL
     */
    class RayFeedWidget implements RayFeedWidget_Interface{

        /**
         * HTML widget structure
         *
         * @var array
         * @access public
         */
        public $html = array('brief' => "<div class=\"feed-item feed-brief\">\n
                                            <div class=\"feed-item-title\">\n
                                                <h3><a href='%s'>%s</a></h3>\n
                                                <div class='feed-item-date'>%s</div>\n
                                            </div>\n
                                            <div class=\"feed-item-description\">%s</div>\n
                                        </div>",
            
                                'compact' => "<div class=\"feed-item feed-compact\">\n
                                    <div class=\"feed-item-title\">\n
                                        <h3><a href='%s'>%s</a></h3>\n
                                        <div class='feed-item-date'>%s</div>\n
                                    </div>\n
                                </div>",

                                'detail' => "<div class=\"feed-item feed-detail\">\n
                                    <div class=\"feed-item-title\">\n
                                        <h3><a href='%s'>%s</a></h3>\n
                                        <div class='feed-item-date'>%s</div>\n
                                    </div>\n
                                    <div class=\"feed-item-content\">%s</div>\n
                                </div>",
                            );
        
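        /**
         * Return the HTML widget for the requested style.
         *
         * @param  array $data parsed feed data
         * @param  array $options widget options
         * @return string|false html, or false if there are no items
         */
        function widget($data, $options = array('widget' => 'brief')) {
            // Fall back to 'brief' when no style is given (e.g. only 'showTitle' is passed).
            $style = isset($options['widget']) ? $options['widget'] : 'brief';

            switch ($style) {
                case 'compact':
                    return $this->widgetCompact($data, $options);

                case 'detail':
                    return $this->widgetDetail($data, $options);

                case 'brief':
                default:
                    return $this->widgetBrief($data, $options);
            }
        }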

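        /**
         * Render feed widget with title and date only
         *
         * @param  array $data parsed feed data
         * @param  array $options widget options
         * @return string|false html, or false if there are no items
         */
        function widgetCompact($data, $options = array()) {
            if (empty($data['items'])) {
                return false;
            }

            $out = array();
            foreach ($data['items'] as $item) {
                if (empty($item['date'])) {
                    $item['date'] = '';
                }
                // Escape link and title; they are plain text, not HTML.
                $out[] = sprintf($this->html['compact'], htmlspecialchars($item['link'], ENT_QUOTES), htmlspecialchars($item['title'], ENT_QUOTES), $item['date']);
            }

            // The feed title is shown unless 'showTitle' is explicitly false.
            $title = '';
            if (!isset($options['showTitle']) || $options['showTitle']) {
                $title = sprintf("<div class='feed-title'><h2>%s</h2><hr>%s</div>\n", htmlspecialchars($data['title'], ENT_QUOTES), $data['description']);
            }

            $out = "<div class='feed-container'>\n"  . $title . join(" \n", $out) . "</div>";

            return $out;
        }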

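        /**
         * Render feed widget with title, date & description
         *
         * @param  array $data parsed feed data
         * @param  array $options widget options
         * @return string|false html, or false if there are no items
         */
        function widgetBrief($data, $options = array()) {
            if (empty($data['items'])) {
                return false;
            }

            $out = array();

            foreach ($data['items'] as $item) {
                if (empty($item['date'])) {
                    $item['date'] = '';
                }

                // RSS 0.92 items have no 'content' key, so check it before falling back.
                if (empty($item['description']) && isset($item['content'])) {
                    $item['description'] = $item['content'];
                }

                // Escape link and title; the description is rendered as feed-supplied HTML.
                $out[] = sprintf($this->html['brief'], htmlspecialchars($item['link'], ENT_QUOTES), htmlspecialchars($item['title'], ENT_QUOTES), $item['date'], $item['description']);
            }

            // The feed title is shown unless 'showTitle' is explicitly false.
            $title = '';
            if (!isset($options['showTitle']) || $options['showTitle']) {
                $title = sprintf("<div class='feed-title'><h2>%s</h2><hr>%s</div>\n", htmlspecialchars($data['title'], ENT_QUOTES), $data['description']);
            }

            $out = "<div class='feed-container'>\n"  . $title . join(" \n", $out) . "</div>";

            return $out;
        }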

        /**
         * Render feed widget with title, date & content
         *
         * @param  array $data parsed feed data
         * @param  array $options widget options
         * @return string|false html, or false if there are no items
         */
        function widgetDetail($data, $options = array()) {
            if (empty($data['items'])) {
                return false;
            }

            $out = array();
            foreach ($data['items'] as $item) {
                if (empty($item['date'])) {
                    $item['date'] = '';
                }
                if (empty($item['content'])) {
                    $item['content'] = $item['description'];
                }
                // Escape link and title; the content is rendered as feed-supplied HTML.
                $out[] = sprintf($this->html['detail'], htmlspecialchars($item['link'], ENT_QUOTES), htmlspecialchars($item['title'], ENT_QUOTES), $item['date'], $item['content']);
            }

            // The feed title is shown unless 'showTitle' is explicitly false.
            $title = '';
            if (!isset($options['showTitle']) || $options['showTitle']) {
                $title = sprintf("<div class='feed-title'><h2>%s</h2><hr>%s</div>\n", htmlspecialchars($data['title'], ENT_QUOTES), $data['description']);
            }

            $out = "<div class='feed-container'>\n"  . $title . join(" \n", $out) . "</div>";

            return $out;
        }
    }
?>
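Because RayFeedReader only instantiates the class named in its 'widget' option and calls `widget($data, $options)` on it, a custom renderer just has to implement RayFeedWidget_Interface. The sketch below is a hypothetical RayFeedListWidget that renders a plain `<ul>` and escapes titles and links; its 'limit' option is an assumption and is not supported by RayFeedWidget. The interface is declared here only when rayfeedwidget.php has not been loaded, so the snippet also runs on its own.

```php
<?php
// Declare the interface only if rayfeedwidget.php has not been loaded already.
if (!interface_exists('RayFeedWidget_Interface')) {
    interface RayFeedWidget_Interface
    {
        public function widget($data, $options = array());
    }
}

/**
 * Hypothetical custom widget: renders feed items as a plain <ul> list.
 */
class RayFeedListWidget implements RayFeedWidget_Interface
{
    public function widget($data, $options = array())
    {
        if (empty($data['items'])) {
            return false;
        }

        // Optional cap on the number of items ('limit' is an assumed option name).
        $limit = isset($options['limit']) ? (int) $options['limit'] : count($data['items']);

        $rows = array();
        foreach (array_slice($data['items'], 0, $limit) as $item) {
            // Titles and links are plain text, so escape them for HTML output.
            $rows[] = sprintf(
                '<li><a href="%s">%s</a></li>',
                htmlspecialchars($item['link'], ENT_QUOTES, 'UTF-8'),
                htmlspecialchars($item['title'], ENT_QUOTES, 'UTF-8')
            );
        }

        return "<ul class=\"feed-list\">\n" . implode("\n", $rows) . "\n</ul>";
    }
}
```

To use it with the reader, pass `'widget' => 'RayFeedListWidget'` in the options and call `->widget(array('limit' => 5))` after `parse()`.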

Usage examples: the quick examples below introduce the class and its methods.

File: example.php

<?php
   
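    /**
     * Example usage of RayFeedReader
     */

    require_once('rayfeedreader.php');

    $options = array('url' => 'http://example.com/feed');

    /**
     * Get an instance
     */
    $reader1 = RayFeedReader::getInstance();          // shared (singleton) instance
    $reader2 = RayFeedReader::getInstance($options);  // same instance: $options are ignored once it exists
    $reader3 = new RayFeedReader();
    $reader4 = new RayFeedReader($options);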

    /**
     * Get data from a feed url
     */
    $options = array('url' => 'http://example.com/feed');

    // A new instance applies $options immediately.
    $reader = new RayFeedReader($options);
    $feedData = $reader->parse()->getData();

    // On the shared instance, set options with setOptions(); getInstance($options)
    // only applies them when the instance is first created.
    $feedData = RayFeedReader::getInstance()->setOptions($options)->parse()->getData();

    /**
     * Get html widget
     */

    $options = array(
                        'url' => 'http://example.com/feed',
                        'widget' => 'RayFeedWidget',
                    );
    
    /**
     * Load rayFeedWidget class file
     */
    require_once('rayfeedwidget.php');

    $html = RayFeedReader::getInstance()->setOptions($options)->parse()->widget();

    // OR with widget options.
    $widgetOptions = array('widget' => 'detail', 'showTitle' => true);
    
    $html = RayFeedReader::getInstance()->setOptions($options)->parse()->widget($widgetOptions);

    /**
     * Full options
     */
    $options = array(
                        'url' => 'http://example.com/feed',
                        'widget' => 'RayFeedWidget',
                        'httpClient' => 'rayHttp',
                        'type' => 'atom',
                    );

    /**
     * Load rayHttp class file.
     */
    require_once("rayhttp.php");    

    /**
     * Html widget with full options
     */
    $html = RayFeedReader::getInstance()->setOptions($options)->parse()->widget();

    /**
     * Get Feed Data with full options
     */
    $feedData = RayFeedReader::getInstance($options)->parse()->getData();
?>

Please let me know if you find this class helpful.

Class Update:

An updated version of this class and the example script is posted at PHPClasses.org, http://www.phpclasses.org/browse/package/5652.html. Please use the class from PHPClasses.org for the latest, bug-fixed version.


Comments

Comment from joetke
Time: November 6, 2009, 1:03 pm

The best php rss/atom reader around. I found it through google and there are a tremendous amount of occurrences repeating each other’s dragged and dropped code chunks. Thank you very much. It’s clear, standard code writing compliant. Definitely India is a big threat to decaying powers even in its smallest details. Congrats India!

Comment from Gregory Orton
Time: November 13, 2010, 3:04 pm

Great work here. I’ve been trying to write my own feed reader with simpleXML and I found that determining between Atom and RSS was the big pain in the butt.

I wrote an effective caching and reader function to simplify what I want and save the file, whilst only updating the cache every hour. However, the fact that Atom uses namespaces totally sucks the cahunas. Good work!
