How do I parse an XML feed using Python?

I'm trying to parse this XML feed (http://www.reddit.com/r/videos/top/.rss) and I'm having trouble. I want to save the YouTube links from each of the items, but the "channel" child node is getting in the way. How do I get down to that level so I can iterate over the items?

```python
import urllib.request
from xml.etree import ElementTree as etree

# reddit parse
reddit_file = urllib.request.urlopen('http://www.reddit.com/r/videos/top/.rss')
# read the response into bytes:
reddit_data = reddit_file.read()
# close the file because we don't need it anymore:
reddit_file.close()

# entire feed
reddit_root = etree.fromstring(reddit_data)
channel = reddit_root.findall('{http://purl.org/dc/elements/1.1/}channel')
print(channel)

reddit_feed = []
for entry in channel:
    # get description, url, and thumbnail
    desc = None  # not sure how to get this
    reddit_feed.append([desc])
```

You can try `findall('channel/item')`:

```python
import urllib.request
from xml.etree import ElementTree as etree

# reddit parse
reddit_file = urllib.request.urlopen('http://www.reddit.com/r/videos/top/.rss')
# read the response into bytes:
reddit_data = reddit_file.read()
print(reddit_data)
# close the file because we don't need it anymore:
reddit_file.close()

# entire feed; the root element is <rss>, so paths are relative to it
reddit_root = etree.fromstring(reddit_data)
item = reddit_root.findall('channel/item')
print(item)

reddit_feed = []
for entry in item:
    # get description, url, and thumbnail
    desc = entry.findtext('description')
    reddit_feed.append([desc])
```
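A minimal offline sketch of why the question's `findall('{http://purl.org/dc/elements/1.1/}channel')` comes back empty: in RSS 2.0, `<channel>` is an un-namespaced direct child of `<rss>`, so a lookup in the Dublin Core namespace never matches it, while the relative path `channel/item` does. The inline XML below is a hypothetical, trimmed-down stand-in for the Reddit feed, not its real content.

```python
from xml.etree import ElementTree as etree

# Hypothetical, trimmed-down RSS 2.0 document standing in for the live feed.
SAMPLE_RSS = """<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>videos</title>
    <item>
      <title>First video</title>
      <link>http://www.reddit.com/r/videos/comments/aaa/</link>
      <description>first description</description>
    </item>
    <item>
      <title>Second video</title>
      <link>http://www.reddit.com/r/videos/comments/bbb/</link>
      <description>second description</description>
    </item>
  </channel>
</rss>"""

root = etree.fromstring(SAMPLE_RSS)

# The root element is <rss>; <channel> is its only child and has no namespace.
print(root.tag)
print([child.tag for child in root])

# Looking for <channel> in the Dublin Core namespace finds nothing.
dc_channels = root.findall('{http://purl.org/dc/elements/1.1/}channel')
print(dc_channels)

# A path relative to <rss> reaches every <item>.
items = root.findall('channel/item')
descriptions = [item.findtext('description') for item in items]
print(descriptions)
```

Printing `root.tag` and the child tags first is a quick way to see which level you are at before writing a path.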

I wrote this for you using XPath expressions (tested successfully):

```python
import urllib.request
from lxml import etree

# Send a browser-like User-Agent header with the request.
headers = {'User-Agent': 'Mozilla/5.0'}
req = urllib.request.Request('http://www.reddit.com/r/videos/top/.rss', None, headers)
reddit_file = urllib.request.urlopen(req).read()
reddit = etree.fromstring(reddit_file)

for item in reddit.xpath('/rss/channel/item'):
    print("title =", item.xpath("./title/text()")[0])
    print("description =", item.xpath("./description/text()")[0])
    # local-name() matches <media:thumbnail> without registering its namespace
    print("thumbnail =", item.xpath("./*[local-name()='thumbnail']/@url")[0])
    print("link =", item.xpath("./link/text()")[0])
    print("-" * 100)
```
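If you would rather stay with the standard-library `ElementTree`, a rough sketch of the same extraction maps the `media` prefix to a namespace URI and passes that mapping to `find()`, instead of using `local-name()`. Two assumptions here: the namespace URI `http://search.yahoo.com/mrss/` (the Media RSS one) must be replaced with whatever `xmlns:media` the real feed declares, and the hypothetical `YOUTUBE_RE` pattern assumes the YouTube URL appears somewhere in the (HTML-escaped) `<description>`. The sample XML is made up for illustration.

```python
import re
from xml.etree import ElementTree as etree

# Assumed Media RSS namespace URI; it must match the feed's xmlns:media exactly.
NS = {'media': 'http://search.yahoo.com/mrss/'}

# Hypothetical one-item feed; the description holds escaped HTML.
SAMPLE_RSS = """<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <item>
      <title>Funny cat</title>
      <link>http://www.reddit.com/r/videos/comments/ccc/</link>
      <description>&lt;a href="http://www.youtube.com/watch?v=abc123"&gt;[link]&lt;/a&gt;</description>
      <media:thumbnail url="http://example.com/thumb.jpg" />
    </item>
  </channel>
</rss>"""

# Hypothetical pattern for youtube.com/watch and youtu.be links.
YOUTUBE_RE = re.compile(r'https?://(?:www\.)?(?:youtube\.com/watch\?v=|youtu\.be/)[\w-]+')


def parse_items(xml_text):
    """Return one dict per <item> with title, link, thumbnail URL and YouTube links."""
    root = etree.fromstring(xml_text)
    feed = []
    for item in root.findall('channel/item'):
        thumb = item.find('media:thumbnail', NS)  # None if the item has no thumbnail
        # findtext() returns the unescaped text, so the HTML is searchable as-is
        desc = item.findtext('description', default='')
        feed.append({
            'title': item.findtext('title'),
            'link': item.findtext('link'),
            'thumbnail': thumb.get('url') if thumb is not None else None,
            'youtube': YOUTUBE_RE.findall(desc),
        })
    return feed


for entry in parse_items(SAMPLE_RSS):
    print(entry)
```

Against the live feed you would pass the bytes returned by `urlopen(...).read()` to `parse_items()` instead of `SAMPLE_RSS`.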