Reading RSS [0.92][1.0][2.0] and Atom 1.0 feeds in Rails
Wednesday, March 14th, 2007
Here is my first contribution to the Rails community.
The project I am building needed a library that reads feeds, because users can enter their blog so that its 10 latest entries show up on their profile.
I saw that Ruby ships built-in support for RSS 0.92, 1.0 and 2.0, but it does not do the job well: it ignores certain post fields, which then cannot be retrieved, and there was no support for Atom either.
So here is a library that returns a hash with the information about the posts in a feed. You only have to put it in the /lib/ folder and declare it in environment.rb with
require 'rssReader'
Library:
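As a rough sketch of how the returned hash might be used for the profile page described above, the helper below keeps the first 10 posts of a hash shaped like the one the library returns. The name `latest_entries` is hypothetical and not part of the library, the sample data is made up, and the sketch assumes the feed lists its newest posts first, which is common but not guaranteed.

```ruby
# Hypothetical helper, not part of RssReader: keeps the first `limit` posts
# of a hash shaped like the one read_rss returns.
def latest_entries(feed, limit = 10)
  items = feed['items'] || []
  items.first(limit).map do |post|
    { 'title' => post['title'], 'link' => post['link'] }
  end
end

# Made-up sample data standing in for the result of
# RssReader.new.read_rss('http://example.com/feed')
sample_feed = {
  'title' => 'Example blog',
  'items' => (1..12).map { |n|
    { 'title' => "Post #{n}", 'link' => "http://example.com/posts/#{n}" }
  }
}

# Print one line per post, e.g. for a quick check in the console
latest_entries(sample_feed).each do |post|
  puts "#{post['title']} (#{post['link']})"
end
```

In a Rails view the same array could be rendered as a list of links instead of being printed.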
require 'rexml/document'
require 'net/http'
require 'uri'

class RssReader
  #**************************************************************************
  # -PARAMS: feed URL
  # Accepted feed formats:
  #   RSS 0.92
  #   RSS 1.0
  #   RSS 2.0
  #   Atom
  #
  # -OUTPUT: Hash
  #   ['title'] --> blog's title
  #   ['link'] --> blog's link
  #   ['description'] --> blog's description
  #   ['rss_url'] --> feed's URL
  #   ['items'] --> array of posts
  #     ['title'] --> post's title
  #     ['link'] --> post's link
  #     ['description'] --> summary of the post
  #     ['content'] --> content of the post
  #     ['author'] --> author of the post
  #     ['publication_date'] --> publication date of the post
  #
  #**************************************************************************
  def read_rss(feed_url)
    # Download the feed and parse it as XML
    @content = Net::HTTP.get(URI.parse(feed_url))
    xml = REXML::Document.new(@content)
    data = {}
    if !xml.root.elements['channel/title'].nil?
      # RSS feed (RSS 0.92 | RSS 1.0 | RSS 2.0)
      data['title'] = xml.root.elements['channel/title'].text if xml.root.elements['channel/title']
      data['link'] = format_url(xml.root.elements['channel/link'].text) if xml.root.elements['channel/link']
      data['description'] = xml.root.elements['channel/description'].text if xml.root.elements['channel/description']
      data['rss_url'] = format_url(feed_url)
      data['items'] = []
      xml.elements.each('//item') do |item|
        it = {}
        it['title'] = item.elements['title'].text if item.elements['title']
        it['link'] = format_url(item.elements['link'].text) if item.elements['link']
        it['description'] = item.elements['description'].text if item.elements['description']
        it['content'] = item.elements['content:encoded'].text if item.elements['content:encoded']
        it['author'] = item.elements['dc:creator'].text if item.elements['dc:creator']
        # pubDate (RSS 0.92/2.0) takes precedence over dc:date (RSS 1.0) when both are present
        it['publication_date'] = item.elements['dc:date'].text if item.elements['dc:date']
        it['publication_date'] = item.elements['pubDate'].text if item.elements['pubDate']
        data['items'] << it
      end
    elsif !xml.elements['/feed/title'].nil?
      # Atom feed
      data['title'] = xml.elements['/feed/title'].text
      # Atom stores the URL in the href attribute of <link>, not in its text
      data['link'] = xml.elements['/feed/link'].attributes['href'] if xml.elements['/feed/link']
      # Atom 1.0 calls this element <subtitle>; <tagline> is the Atom 0.3 name
      subtitle = xml.elements['/feed/subtitle'] || xml.elements['/feed/tagline']
      data['description'] = subtitle.text if subtitle
      data['rss_url'] = feed_url
      data['items'] = []
      xml.elements.each('//entry') do |item|
        it = {}
        it['title'] = item.elements['title'].text if item.elements['title']
        it['link'] = item.elements['link'].attributes['href'] if item.elements['link']
        it['description'] = item.elements['summary'].text if item.elements['summary']
        it['content'] = item.elements['content'].text if item.elements['content']
        it['author'] = item.elements['author/name'].text if item.elements['author/name']
        # Atom 1.0 uses <published>; <issued> is the Atom 0.3 name
        date = item.elements['published'] || item.elements['issued']
        it['publication_date'] = date.text if date
        data['items'] << it
      end
    end
    return data
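  end

  #****************************************************************
  # -PARAMS: url
  #
  # -OUTPUT: clean URL with the "http://" protocol inserted when missing
  #
  #****************************************************************
  def format_url(url)
    return nil if url.nil?
    url = url.strip
    # Leave URLs that already carry a scheme untouched: lowercasing the
    # whole URL would break case-sensitive paths, and https must be kept
    url =~ %r{\Ahttps?://}i ? url : "http://" + url
  end
end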
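
Atom 1.0 differs from RSS in a few places that are easy to trip over: the URL of a `<link>` lives in its `href` attribute, the feed description is called `<subtitle>`, and the publication date is `<published>`. The sketch below parses a made-up inline Atom document with REXML to illustrate this. The sample feed and the names `parse_atom` and `child_text` are assumptions for illustration only, not part of the library above.

```ruby
require 'rexml/document'

# Made-up Atom 1.0 document used only for this illustration.
ATOM_SAMPLE = <<~XML
  <?xml version="1.0" encoding="utf-8"?>
  <feed xmlns="http://www.w3.org/2005/Atom">
    <title>Example blog</title>
    <subtitle>Notes about Rails</subtitle>
    <link href="http://example.com/"/>
    <entry>
      <title>First post</title>
      <link href="http://example.com/first"/>
      <summary>A short summary</summary>
      <author><name>Jane</name></author>
      <published>2007-03-14T10:00:00Z</published>
    </entry>
  </feed>
XML

# Hypothetical helper: text of the first matching child, or nil if absent.
def child_text(element, path)
  child = element.elements[path]
  child && child.text
end

# Hypothetical helper: href attribute of the first <link> child, or nil.
def link_href(element)
  link = element.elements['link']
  link && link.attributes['href']
end

# Builds a hash with the same keys the library documents for its output.
def parse_atom(source)
  feed = REXML::Document.new(source).root
  {
    'title' => child_text(feed, 'title'),
    'link' => link_href(feed),
    'description' => child_text(feed, 'subtitle'),
    'items' => feed.get_elements('entry').map { |entry|
      {
        'title' => child_text(entry, 'title'),
        'link' => link_href(entry),
        'description' => child_text(entry, 'summary'),
        'author' => child_text(entry, 'author/name'),
        'publication_date' => child_text(entry, 'published')
      }
    }
  }
end

feed = parse_atom(ATOM_SAMPLE)
puts feed['link']
puts feed['items'].first['title']
```

Real feeds often carry several `<link>` elements (for example `rel="self"` and `rel="alternate"`); this sketch simply takes the first one.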
