Problems with RSS as it is deployed

I have some longstanding issues with RSS, for example the method for RSS autodiscovery. However, the two most important problems with RSS as it is deployed are:

  1. Entity encoding in the <description> element.
  2. Resolving relative URLs.

As I use a decent news aggregator, I don’t suffer from the second problem. The first problem, however, is something that should interest us all. As Tim Bray notes, entity encoding content in the description element and then expecting the encoding to be resolved back is prone to errors. This is due to the underspecified nature of the various RSS branches, and to people doing it anyway in an effort to crowbar HTML (not necessarily well-formed XHTML fragments) into the early RSS deployments.
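To make the two problems concrete, here is a minimal Python sketch. The function names, the sample strings and the example.org URLs are all made up for illustration, and `html.unescape` is only a crude stand-in for an aggregator rendering HTML. It shows that a reader has to guess whether a description holds plain text or markup, and that a relative URL means nothing without a base to resolve it against.

```python
import html
import xml.etree.ElementTree as ET
from urllib.parse import urljoin


def publish(content):
    """Entity-encode content once and drop it into <description>."""
    return "<description>" + html.escape(content, quote=False) + "</description>"


def read_as_text(description_xml):
    """Aggregator that assumes the description is plain text."""
    return ET.fromstring(description_xml).text


def read_as_html(description_xml):
    """Aggregator that assumes HTML; unescape() stands in for rendering."""
    return html.unescape(ET.fromstring(description_xml).text)


plain_post = "The entity &lt; means <"   # plain text that talks about entities
html_post = "Fish &amp; chips"           # HTML source for "Fish & chips"

# Problem 1: each guess is wrong for one of the two publishers.
print(read_as_html(publish(plain_post)))  # The entity < means <
print(read_as_text(publish(html_post)))   # Fish &amp; chips

# Problem 2: a relative URL only means something against a base URL.
base = "http://example.org/weblog/2003/03/entry.html"  # hypothetical entry URL
print(urljoin(base, "images/photo.jpg"))  # http://example.org/weblog/2003/03/images/photo.jpg
print(urljoin(base, "/about/"))           # http://example.org/about/
```

Nothing in the feed itself tells the reader which of the two readings the publisher intended, which is the heart of the first problem.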

How to do it right! To include HTML in your RSS feed there are a few steps you need to take.

Step 1: Convert to RSS 1.0 or RSS 2.0. The earlier 0.9x versions do not support what I am proposing here.
Step 2: Include the <encoded> element from the RSS 1.0 content module namespace. Using the namespace prefix “content”, as in <content:encoded>, will work in more readers.
Step 3: Wrap your content in a CDATA section and put the result into the <content:encoded> element.
Step 4: Ensure the result is well-formed XML.

This solution ensures that the content is included in an element recognised as holding encoded data, rather than the much-abused description element. This is the method I use for my own feed, which you can take a look at to get some ideas.
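The four steps can be sketched in Python roughly as follows. This is an illustration rather than the code behind my own feed: the function names and the sample channel are invented, and the namespace URI is the one the RSS 1.0 content module uses. The one subtlety is that a CDATA section cannot contain the sequence `]]>`, so the sketch splits it across two sections before wrapping.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Namespace of the RSS 1.0 content module (steps 1 and 2).
CONTENT_NS = "http://purl.org/rss/1.0/modules/content/"


def cdata(text):
    """Step 3: wrap text in CDATA, splitting any "]]>" it contains."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


def build_feed(title, link, body_html):
    """Build a tiny RSS 2.0 document with one item (hypothetical helper)."""
    return (
        f'<rss version="2.0" xmlns:content="{CONTENT_NS}">'
        "<channel>"
        f"<title>{escape(title)}</title>"
        f"<link>{escape(link)}</link>"
        "<description>Example channel</description>"
        "<item>"
        f"<title>{escape(title)}</title>"
        f"<link>{escape(link)}</link>"
        f"<content:encoded>{cdata(body_html)}</content:encoded>"
        "</item>"
        "</channel>"
        "</rss>"
    )


def encoded_content(feed_xml):
    """Step 4: parsing raises ParseError if the feed is not well formed."""
    root = ET.fromstring(feed_xml.encode("utf-8"))
    return root.find(f"channel/item/{{{CONTENT_NS}}}encoded").text


# The body is deliberately not well-formed XHTML (unclosed <br>, bare &).
body = "<p>Tea & <b>biscuits</b><br> ]]> end</p>"
feed = build_feed("Tea & biscuits", "http://example.org/?a=1&b=2", body)
print(encoded_content(feed) == body)  # True
```

Because the HTML travels inside CDATA, it comes back out of the parser exactly as it went in, with no entity decoding for the reader to guess at.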

Using the Vera typeface with CSS

As I mentioned yesterday, a new typeface, Vera, has been released that is suitable for open source projects. In order to use it a bit more fully I wrote a quick test page and compared the rendering of each of the fonts in my font viewer program with the fonts used by my web browser, Mozilla 1.3, to ensure that the right font was consistently applied. Below is a short list of each of the fonts and the appropriate CSS font selection properties that will request the desired font.

  • Bitstream Vera Sans.
    font-family:'Bitstream Vera Sans';
  • Bitstream Vera Sans Bold.
    font-family:'Bitstream Vera Sans'; font-weight:bold;
  • Bitstream Vera Sans Oblique.
    font-family:'Bitstream Vera Sans'; font-style:oblique;
  • Bitstream Vera Sans Mono.
    font-family:'Bitstream Vera Sans Mono';
  • Bitstream Vera Sans Bold Oblique.
    font-family:'Bitstream Vera Sans'; font-weight:bold; font-style:oblique;
  • Bitstream Vera Sans Mono Bold.
    font-family:'Bitstream Vera Sans Mono'; font-weight:bold;
  • Bitstream Vera Sans Mono Oblique.
    font-family:'Bitstream Vera Sans Mono'; font-style:oblique;
  • Bitstream Vera Sans Mono Bold Oblique.
    font-family:'Bitstream Vera Sans Mono'; font-weight:bold; font-style:oblique;
  • Bitstream Vera Serif.
    font-family:'Bitstream Vera Serif';
  • Bitstream Vera Serif Bold.
    font-family:'Bitstream Vera Serif'; font-weight:bold;

I’ve uploaded the test file I used for the typeface. It includes some lorem ipsum text to help get a better feel for the typefaces.

Back from Holiday!

Yes, I’ve been away in Cyprus for a short while. There was some great weather while I was over there and some excellent historical sites to visit. There was also some fun to be had, as can be seen in the photographs below.

[Photograph: me taking off while paragliding.]

[Photograph: me paragliding, in flight.]

I’ve received a few emails from my contact form where people have asked me something but then forgotten to include their email address! If you ask me a question, make sure you include your email address. I have cleared up my contact form a bit so it should be harder to forget now.

In other news, version 1.10 of the Vera font has been released. Hopefully this will lead to further improvements in the quality of freely available fonts. They are TrueType fonts, so they can be used with most modern operating systems that I know of. Windows users can use them, but they are distributed in a compressed format (I got the tar.gz archive), so they will have to use something like WinZip to access them. Thanks to Russell Beattie for pointing these fonts out.

RSS 1.0 Modules Explored: Taxonomy

In the first of an infrequent set of weblog entries I’ll be briefly exploring the RSS 1.0 modules. The first on the list is the Taxonomy module. The taxonomy module is a way of specifying topical structure within an RSS channel.

The Taxonomy module has its own namespace, which is referenced by including the following namespace declaration in your RSS document.
xmlns:taxo = ""

The module contains two elements, “topic” and “topics”. Use of these elements, and of the taxonomy namespace, is not very widespread at present (Syndic8 stats). The taxonomy module has many potential uses, however. For example, filtering an RSS channel by topic could be an interesting development for large feeds. Other aggregation strategies could be based upon gathering items from a variety of channels with a commonly identified topic. This kind of meta information is potentially more useful than simple textual analysis for discovering similarities, and may enable knowledge discovery even when synonyms of common terms are used. See the taxonomy module details for more information on how to integrate this module with an RSS 1.0 feed.
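As a rough illustration of filtering a channel by topic, here is a small Python sketch. The sample feed is trimmed and invented (example.org URLs, made-up topic URIs), the function name is mine, and the namespace URIs are those used by RSS 1.0, RDF and the taxonomy module. It simply collects the titles of items whose taxo:topics bag mentions a given topic.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
    "taxo": "http://purl.org/rss/1.0/modules/taxonomy/",
}
RDF_RESOURCE = "{%s}resource" % NS["rdf"]

# Trimmed, made-up RSS 1.0 feed: two items, each tagged with topics.
SAMPLE = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/"
         xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/">
  <item rdf:about="http://example.org/1">
    <title>Exploring RSS modules</title>
    <link>http://example.org/1</link>
    <taxo:topics><rdf:Bag>
      <rdf:li rdf:resource="http://example.org/topics/rss"/>
      <rdf:li rdf:resource="http://example.org/topics/xml"/>
    </rdf:Bag></taxo:topics>
  </item>
  <item rdf:about="http://example.org/2">
    <title>Paragliding in Cyprus</title>
    <link>http://example.org/2</link>
    <taxo:topics><rdf:Bag>
      <rdf:li rdf:resource="http://example.org/topics/holiday"/>
    </rdf:Bag></taxo:topics>
  </item>
</rdf:RDF>
"""


def items_with_topic(feed_xml, topic_uri):
    """Return the titles of items whose taxo:topics include topic_uri."""
    root = ET.fromstring(feed_xml)
    titles = []
    for item in root.findall("rss:item", NS):
        topics = {
            li.get(RDF_RESOURCE)
            for li in item.findall("taxo:topics/rdf:Bag/rdf:li", NS)
        }
        if topic_uri in topics:
            titles.append(item.findtext("rss:title", namespaces=NS))
    return titles


print(items_with_topic(SAMPLE, "http://example.org/topics/rss"))
```

Because topics are identified by URI rather than by words in the text, the same function would work across several channels that agree on a topic vocabulary.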

New Mozilla Roadmap Released

Big news from Mozilla: a new roadmap for the application suite has been released. I’m not going into too much analysis of it here, but as far as I am concerned it takes things in the right direction. Rather than bundling everything as one “monolithic application”, the new suite is based around separating the components out, thus cleaning up the user interface and easing integration with other applications. Ever since I figured out how to stop Mozilla (the browser) opening Mozilla Mail instead of my default mail client (Outlook Express) I’ve wished for a simpler UI and better integration with other applications.

So, in summary, the new roadmap is a good step and I fully support the Mozilla team’s efforts. This has been coming for a while, with the release of the Phoenix browser and the recent information released regarding Minotaur.

P.S. To get the default mail client activated, use the following in a user.js file (in the same folder as your prefs.js file):
user_pref("network.protocol-handler.external.mailto", true);

nntp//rss 0.3 hits the street

A new release of my favourite newsreader has hit the streets. The latest release implements a number of improvements over the previous builds. One of the main highlights is support for setting individual polling intervals. This was really useful for me, as some of my feeds were timing out on the connection when nntp//rss tried to retrieve them all at the same time. I can now carefully stagger them to balance the load somewhat and resolve some of the timeout issues I had.

The big feature improvement, though, is the support of various blogging APIs to allow items to be posted to a weblog from the newsreader. Some people think this is a bit of overkill, but I think it is quite cool. Here’s an extract from the release:

Posting – You can now post entries to your blog from within your newsreader. nntp//rss has support for the popular Blogger, LiveJournal and MetaWeblog APIs.

The next logical step would be some kind of commenting integration. A link to the comments for an article is already included in the aggregated content; what I am thinking of, though, is the kind of integration where I can just reply to a post from my newsreader. Is this possible? Well, I don’t know how the APIs Jason Broome listed currently work, but at least one weblog I read implements the CommentAPI developed by Joe Gregorio, creator of my previous favourite newsreader, Aggie.
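As a very rough sketch of what replying from a newsreader might involve: as I understand the CommentAPI, a feed item advertises a comment URL and the client POSTs an RSS 2.0 <item> to it as text/xml. The function name, the URL and the field values below are all hypothetical, and the Python code only builds the request without sending it, so treat it as an assumption-laden illustration rather than how nntp//rss or any weblog actually does it.

```python
import urllib.request
import xml.etree.ElementTree as ET


def build_comment_request(comment_url, author, title, text):
    """Build (but do not send) an HTTP POST carrying an RSS 2.0 <item>."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "author").text = author
    ET.SubElement(item, "description").text = text
    body = ET.tostring(item, encoding="utf-8")  # bytes, with XML declaration
    return urllib.request.Request(
        comment_url,
        data=body,
        headers={"Content-Type": "text/xml"},
        method="POST",
    )


# Hypothetical comment URL, as it might be advertised by a feed item.
request = build_comment_request(
    "http://example.org/weblog/comments/42",
    "reader@example.org",
    "Re: nntp//rss 0.3",
    "Replying straight from my newsreader.",
)
print(request.get_method(), request.full_url)
# Sending would be: urllib.request.urlopen(request)
```

The appeal is that a newsreader already knows how to produce a title, an author and a body for a reply, so mapping a news posting onto such an item should be fairly natural.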

Well, at least I have identified some reading to do in the future: the weblogging APIs. I’ll post links to them here so I know where to look when I get the time.

Content delivery using XML

Having read Jay Small’s recent discussion of user interactivity on news websites I had one of those moments when you just think to yourself, “Yes, this is how it is supposed to work”. The prompting for this wasn’t solely the content of the newsletter, but some of the thoughts I had related to the article.

The article talked about message boards and how they resembled Usenet. For those who aren’t initiated, Usenet is alive and kicking; in fact it is possible to integrate message boards with Usenet to great effect. One of the best examples I have seen of this is on the website. This is a fansite for the classic Elite series of games with a very active message/bulletin board. I haven’t visited this site for months, but it provides an interface that suits my needs and still lets me interact with the content and with other users. I like using my newsgroup reading software, Outlook Express (Mozilla Mail still hasn’t converted me), and being able to get what I want delivered the way I want it is the key to my interaction with that particular site.

This prompted another thought: RSS is another means of doing this. I use the nntp//rss news aggregator to read my favourite weblogs alongside all my other newsgroups. I guess this is one of the benefits of XML: delivering content to users and letting them consume it in the way that best suits them. What better branding is there than serving a customer well?

Miscellaneous. I have been busy these last few days, so just a couple of pointers for you. The Feedster RSS search engine is pretty cool, despite not including <content:encoded> data in the search results as a replacement for the <description> element. I’ve also created a Mozilla/Netscape search plugin for Feedster.

Yahoo marking competition as Spam

Recently, while setting someone up to use Windows Messenger, I came across something interesting. Yahoo’s spam mail filters deliver Microsoft’s request-for-confirmation e-mail to the bulk mail folder. Yahoo competes with Microsoft in the instant messenger market with their Yahoo Messenger application. Throwing vital information from a competitor into the bulk mail folder is something that should be carefully looked at. This delayed the sign up for this person, as I assumed the mail was held up in internet traffic somewhere. It wasn’t until several hours later, when I checked again and the mail still wasn’t in the inbox, that I found it had been delivered to the bulk mail folder.

This post is not saying that Yahoo is engaged in a conspiracy to cause problems for its e-mail users who try to sign up with alternative instant messenger providers. However, it is an issue which Yahoo should treat carefully: if people feel Yahoo is attempting to restrict them and coerce them into using its products, then Yahoo’s public image will be damaged. Something similar has happened with all the anti-trust issues surrounding Microsoft. In the eyes of many people Microsoft is not a corporation they “trust”, and Yahoo should be careful not to follow Microsoft down that road.

Related Links:

  • (Partial) screenshot of the Microsoft Passport e-mail (used with the Windows Messenger service) in the Yahoo bulk mail folder.
  • A previous weblog entry on Hotmail shutting out the (then) latest Mozilla browser.

Mozilla has “rich editing” in latest release

Mozilla version 1.3 has been released. This is the first stable version to include the Midas rich editing component, which takes the editing functionality of the Mozilla Composer and embeds it in a web page. Internet Explorer has had this functionality for a while; however, as with the Mozilla implementation, the standard of the HTML generated is not optimal. This is an encouraging step for Mozilla to take. As this feature is refined and produces better HTML, its adoption as the front end of some open source publishing systems is likely to happen at a reasonable pace.

Love that splash screen. Yes, Mozilla has replaced its little dragon image with a solid orange splash screen, lovely. If on the other hand you prefer a nicer splash screen, simply create a bitmap image file, call it “mozilla.bmp” and place it in the same directory as the Mozilla application.
Here’s one I prepared earlier:

[Image: the latest alternate Mozilla startup splash screen]
(provided in JPEG format, remember to convert it to a bitmap)

My original splash screens for the Mozilla 1.0 release, which this latest splash is based on, are available in my weblog archives.

Longhorn vs. RDF

Microsoft’s successor to Windows XP, Longhorn, is set to promote a new file system that advances on the hierarchical paradigm that is common in contemporary operating systems. Built upon a database system, it will give users a much more flexible interface for searching for information. This is of course already known and has been reported on; what I find interesting, however, are some of the potential use cases for this system.

One reported use case, searching for specific digital photographs:

You don’t want to search by file name, because they’re all called dsc035.jpg. You want to search,

  • show me the pictures I took last month,
  • show me the pictures of me and my wife,
  • show me the pictures of my children,
  • show me the pictures from Christmases.

To do that requires a change both in our user interface and also a change in our model for how information is stored on the computer.

Retrieval of complex data sets is an interesting problem area. What struck me as I came upon this topic, though, was how similar the use cases are to those of contemporary systems using RDF as the metadata “container”. The FOAF community, for example, has explored topics such as attaching metadata to photographs, and the co-depiction photo metadata experiment has spawned tools which can answer a range of queries:

  • Get me all images of person X,
  • Get me all the images of person X that also have person Y in them,
  • Order me a Hawaiian Pizza with… (sorry, slipped in from an article about internet connected fridges)
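To show how little machinery the first two queries need once the metadata exists, here is a toy Python sketch. It is not how the co-depiction tools work internally: the triples, file names and person identifiers are invented, and a plain set of (subject, predicate, object) tuples stands in for a real RDF store. The only borrowed piece is the FOAF property name foaf:depicts.

```python
DEPICTS = "foaf:depicts"

# Made-up photo metadata as (subject, predicate, object) triples.
TRIPLES = {
    ("photos/dsc035.jpg", DEPICTS, "person:X"),
    ("photos/dsc036.jpg", DEPICTS, "person:X"),
    ("photos/dsc036.jpg", DEPICTS, "person:Y"),
    ("photos/dsc037.jpg", DEPICTS, "person:Y"),
    ("photos/dsc037.jpg", "dc:title", "On the beach"),
}


def images_of(triples, *people):
    """Return the images that depict every person named in `people`."""
    depicted = {}
    for subject, predicate, obj in triples:
        if predicate == DEPICTS:
            depicted.setdefault(subject, set()).add(obj)
    wanted = set(people)
    return sorted(image for image, who in depicted.items() if wanted <= who)


# Get me all images of person X.
print(images_of(TRIPLES, "person:X"))
# Get me all the images of person X that also have person Y in them.
print(images_of(TRIPLES, "person:X", "person:Y"))
```

The file names are as unhelpful as dsc035.jpg ever was; it is the metadata, not the name, that answers the query.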

The point is that there are a lot of interesting applications for RDF and its vocabularies that could in many ways be even more innovative than Microsoft’s current plans, especially if RDF is hooked up to a nice XML database that can deal with semi-structured data much more effectively than an SQL-based system.

More innovative? Sure, the Microsoft paradigm is quite interesting, but the extensibility inherent in RDF allows for many more potential uses, as well as tight integration with internet based data. Microsoft are still fundamentally distanced, on a file system level, from the internet and from integration with other operating systems, whereas RDF is an established standard that can be deployed widely on both personal machines and over networks.

UI Problems. Microsoft isn’t going to be persuading non-geeks to learn SQL anytime soon; the UI is an important tool in leveraging all this metadata we want to query. How is it going to be done on the RDF end? A natural language interface perhaps, or graphical exploration as demonstrated by FOAFNaut? As the amount of information we produce and consume grows at an ever increasing rate, this needs to be tackled. You should see the size of the “My Pictures” folder on my PC since I bought a digital camera!
