I’m parsing more and more RSS feeds and I’m seeing some very basic problems. While the feeds are valid, they’re obfuscated. They’re harder to parse, not because of poor tag usage, but because the content within the tags is misused.
Here’s a quick rundown of the tags as documented in the RSS 2.0 Spec and my interpretation of them.
Item Tags
One or both of the following are required by the spec:
title, description
I’m fine with encoded HTML. I’m fine with using the first 50 characters or so as a title if you’d rather not use title (it is overrated). I’m not cool with having author or pubDate info in either of these tags. Just makes it harder to parse.
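Here’s roughly how I’d handle the missing-title case. This is a minimal sketch, not what any particular aggregator does: the function name display_title and the 50-character default are my own placeholders, and the tag stripping is a crude regex rather than a real HTML parser.

```python
import html
import re


def display_title(title, description, limit=50):
    """Return the item's title, or the start of its description as a stand-in."""
    if title and title.strip():
        return title.strip()
    # Decode entities such as &lt;p&gt;, then drop the tags they encoded.
    text = re.sub(r"<[^>]+>", " ", html.unescape(description or ""))
    # Collapse runs of whitespace left behind by the removed tags.
    text = " ".join(text.split())
    return text[:limit]
```

Note that this only works cleanly when the description holds the post itself. If author or date info is mixed in, it ends up in the generated title too.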
And then any of these. I consider the first 4 required:
author
A special place for the author. Author info doesn’t belong in the link, description, or title – it belongs here. If there’s author info, the author tag should be used. (Twitter, Twittergram)
link
The URL pointing to this specific item, most likely something that’ll load in a web browser. Not a tinyurl or another redirect; this should be the permalink at the original source (Feedburner).
guid
A unique string identifying the item. For simplicity in publishing, this tag may be identical to link, but it doesn’t have to be. For example, in Twittergrams, the guid is a tinyurl. Still unique, but not technically the link [1].
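On the parsing side, that relationship between guid and link can be sketched like this. The helper name item_identity is hypothetical, and it assumes the item has already been parsed into an ElementTree element; falling back to link when guid is missing is my own choice, not something the spec mandates.

```python
import xml.etree.ElementTree as ET


def item_identity(item):
    """Pick a unique key for an <item>: guid if present, otherwise link."""
    guid = (item.findtext("guid") or "").strip()
    link = (item.findtext("link") or "").strip()
    # Returns None when the item offers neither, so the caller can decide.
    return guid or link or None
```

For example, item_identity(ET.fromstring("<item><link>http://example.com/1</link></item>")) falls back to the link.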
pubDate
At some point, every item was published, so it has a publication date. Put it here, in RFC 2822 format, e.g. “Thu, 7 Apr 2005 01:46:36 -0300”. In my aggregators, I don’t guess what the publication date is. I set it to Jan 1, 1970 00:00:00 [2], so undated items don’t surface at the top of the reverse-chronological list, but they’re still in the system.
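A rough sketch of that fallback, using the standard library’s email.utils.parsedate_to_datetime to read the RFC 2822 date. The names parse_pub_date and EPOCH are mine, and treating a date without a usable offset as UTC is an assumption, not something the feed tells you.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Fallback for items with a missing or unparseable pubDate.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_pub_date(value):
    """Parse an RFC 2822 pubDate; fall back to the epoch instead of guessing."""
    if not value:
        return EPOCH
    try:
        parsed = parsedate_to_datetime(value.strip())
    except (TypeError, ValueError):
        # Older Pythons raise TypeError, newer ones ValueError, on bad input.
        return EPOCH
    if parsed.tzinfo is None:
        # Assumption: a date without a usable offset is treated as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed
```

Sorting on the returned value keeps undated items at the very end of a newest-first list rather than pretending they were published just now.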
enclosure
One enclosure per item. Thanks. Other tags aren’t duplicated, no reason to duplicate this one. Remember, any file can be an enclosure. Not just audio and video files (Flickr).
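For what it’s worth, here is one way a parser might cope with that. It is only a sketch: first_enclosure is a made-up name, taking the first enclosure and counting the rest is my own policy, and url, length, and type are the attributes the RSS 2.0 Spec lists for the tag.

```python
import xml.etree.ElementTree as ET


def first_enclosure(item):
    """Return the first <enclosure> of an <item> as a dict, or None if absent."""
    enclosures = item.findall("enclosure")
    if not enclosures:
        return None
    first = enclosures[0]
    return {
        "url": first.get("url"),
        "length": first.get("length"),
        # Any MIME type is accepted: images and documents, not just audio/video.
        "type": first.get("type"),
        # How many extra enclosures were skipped, so the caller can log it.
        "ignored": len(enclosures) - 1,
    }
```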
source
The URL of the site the item was originally published at; think of it as a more general link. This one should be used more by aggregators.
category, comments
I have no qualms with how these tags are used. Yet.
Elsewhere 18 Aug 2008:
1. I argue link is the URL attribute of the Twittergram’s enclosure
2. Specifying ‘now’ sometimes backfires, bringing old posts back from the past.