Content delivery using XML

Having read Jay Small’s recent discussion of user interactivity on news websites, I had one of those moments when you just think to yourself, “Yes, this is how it is supposed to work”. What prompted this wasn’t solely the content of the newsletter, but also some of the thoughts the article set off.

The article talked about message boards and how they resemble Usenet. For the uninitiated, Usenet is alive and kicking; in fact, it is possible to integrate message boards with Usenet to great effect. One of the best examples I have seen of this is the alioth.net website, a fansite for the classic Elite series of games with a very active message/bulletin board. I haven’t visited the site itself for months, but it provides an interface that suits my needs and still lets me interact with the content and with other users. I like using my newsgroup reading software, Outlook Express (Mozilla Mail still hasn’t converted me), and being able to get what I want delivered the way I want it is the key to my interaction with that particular site.

This prompted another thought: RSS is another means of doing this. I use the nntp//rss news aggregator to read my favourite weblogs alongside all my other newsgroups. I guess this is one of the benefits of XML: delivering content to users and letting them consume it in the way that best suits them. What better branding is there than serving a customer well?

Miscellaneous. I have been busy these last few days, so here are a couple of pointers for you. The Feedster RSS search engine is pretty cool, despite not using <content:encoded> data in the search results as a replacement for the <description> element. I’ve also created a Mozilla/Netscape search plugin for Feedster.
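
For the curious, here is a rough sketch of the <content:encoded> fallback I have in mind, written in Python. It is not how Feedster works; the sample feed and the function names are made up for illustration. The only fixed detail is the namespace URI of the RSS content module.

```python
import xml.etree.ElementTree as ET

# Namespace of the RSS "content" module, which defines <content:encoded>.
CONTENT_NS = "http://purl.org/rss/1.0/modules/content/"

# A made-up RSS 2.0 feed, used only to illustrate the fallback.
SAMPLE_FEED = """<rss version="2.0"
     xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Example weblog</title>
    <item>
      <title>Full post</title>
      <description>Short summary</description>
      <content:encoded><![CDATA[<p>The whole post.</p>]]></content:encoded>
    </item>
    <item>
      <title>Summary only</title>
      <description>Just a summary</description>
    </item>
  </channel>
</rss>"""


def item_body(item):
    """Prefer <content:encoded>; fall back to <description> if it is absent."""
    encoded = item.find(f"{{{CONTENT_NS}}}encoded")
    if encoded is not None and encoded.text:
        return encoded.text
    return item.findtext("description", default="")


def feed_bodies(feed_xml):
    """Return the preferred body text for every <item> in the feed."""
    root = ET.fromstring(feed_xml)
    return [item_body(item) for item in root.iter("item")]


bodies = feed_bodies(SAMPLE_FEED)
```

The first item yields its full encoded content and the second falls back to its description, which is all a search result would need to do.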

Yahoo marking competition as Spam

Recently, while setting someone up to use Windows Messenger, I came across something interesting. Yahoo’s spam filters deliver Microsoft’s confirmation request e-mail to the bulk mail folder. Yahoo competes with Microsoft in the instant messenger market with its Yahoo Messenger application. Throwing vital information from a competitor into the bulk mail folder is something that should be looked at carefully. It delayed the sign-up for this person, as I assumed the mail was held up somewhere in internet traffic. It wasn’t until several hours later, when I checked again and the mail still wasn’t in the inbox, that I found it had been delivered to the bulk mail folder.

This post is not saying that Yahoo is engaged in a conspiracy to cause problems for e-mail users who try to sign up with alternative instant messenger providers. However, it is an issue which Yahoo should treat carefully: if people feel Yahoo is attempting to restrict them and coerce them into using its products, then Yahoo’s public image will be damaged. Something similar has happened with all the anti-trust issues surrounding Microsoft. In the eyes of many people Microsoft is not a corporation they “trust”, and Yahoo should be careful not to follow Microsoft down that road.

Related Links: (Partial) Screenshot of the Microsoft Passport e-mail (used with the Windows Messenger service) in the Yahoo bulk mail folder. A previous weblog entry on Hotmail shutting out the (then) latest Mozilla browser.

Mozilla has “rich editing” in latest release

Mozilla version 1.3 has been released. This is the first stable version to include the Midas rich editing component, which takes the editing functionality of Mozilla Composer and embeds it in a web page. Internet Explorer has had this functionality for a while; however, as with the Mozilla implementation, the standard of the HTML generated is not optimal. This is an encouraging step for Mozilla to take: as the feature is refined and produces better HTML, adoption as the front end of some open source publishing systems is likely to happen at a reasonable pace.

Love that splash screen. Yes, Mozilla has replaced its little dragon image with a solid orange splash screen. Lovely. If, on the other hand, you prefer a nicer splash screen, simply create a bitmap image file, call it “mozilla.bmp” and place it in the same directory as the Mozilla application.
Here’s one I prepared earlier:

The latest alternate Mozilla startup splash screen
(provided in jpeg format, remember to convert it to a bitmap)

My original splash screens for the Mozilla 1.0 release, which this latest splash is based on, are available in my weblog archives.

Longhorn vs. RDF

Microsoft’s successor to Windows XP, Longhorn, is set to promote a new file system that advances on the hierarchical paradigm common in contemporary operating systems. Because it is built upon a database system, users will have a much more flexible interface for searching for information. This is of course already known and has been reported on; what I find interesting are some of the potential use cases for this system.

One reported use case, searching for specific digital photographs:

You don’t want to search by file name, because they’re all called dsc035.jpg. You want to search,

  • show me the pictures I took last month,
  • show me the pictures of me and my wife,
  • show me the pictures of my children,
  • show me the pictures from Christmases.

To do that requires a change both in our user interface and also a change in our model for how information is stored on the computer.

Retrieval of complex data sets is an interesting problem area. What struck me as I came upon this topic, though, was how similar the use cases are to those of contemporary systems using RDF as the metadata “container”. The FOAF community, for example, has explored topics such as attaching metadata to photographs, and the co-depiction photo metadata experiment has spawned tools which can answer a range of queries:

  • Get me all the images of person X,
  • Get me all the images of person X that also have person Y in them,
  • Order me a Hawaiian Pizza with… (sorry, slipped in from an article about internet connected fridges)
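
To make the first two queries concrete, here is a minimal Python sketch. It is not the co-depiction tools themselves: the triples are invented, and a plain set of tuples stands in for a real RDF store, with a “depicts” predicate modelled loosely on the FOAF property of the same name.

```python
# Hypothetical (image, predicate, person) triples; every name is invented.
TRIPLES = {
    ("img1.jpg", "depicts", "Alice"),
    ("img1.jpg", "depicts", "Bob"),
    ("img2.jpg", "depicts", "Alice"),
    ("img3.jpg", "depicts", "Bob"),
}


def images_of(person, triples=TRIPLES):
    """Get me all the images of person X."""
    return {s for (s, p, o) in triples if p == "depicts" and o == person}


def images_of_both(x, y, triples=TRIPLES):
    """Get me all the images of person X that also have person Y in them."""
    return images_of(x, triples) & images_of(y, triples)


alice_images = images_of("Alice")
shared_images = images_of_both("Alice", "Bob")
```

The second query is just a set intersection of two runs of the first, which is part of the appeal: new questions fall out of the same small pile of statements. (The pizza query is left as an exercise.)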

The point is that there are a lot of interesting applications for RDF and its vocabularies that could in many ways be even more innovative than Microsoft’s current plans, especially if RDF is hooked up to a nice XML database that can deal with semi-structured data much more effectively than an SQL-based system.

More innovative? Sure, the Microsoft paradigm is quite interesting, but the extensibility inherent in RDF allows for many more potential uses, as well as tight integration with internet based data. Microsoft are still fundamentally distanced, on a file system level, from the internet and from integration with other operating systems, whereas RDF is an established standard that can be deployed widely on both personal machines and over networks.

UI Problems. Microsoft isn’t going to be persuading non-geeks to learn SQL anytime soon, so the UI is an important tool in leveraging all this metadata we want to query. How is it going to be done on the RDF end? A natural language interface perhaps, or graphical exploration as demonstrated by FOAFNaut? As the amount of information we produce and consume grows at an ever increasing rate, the problem needs to be tackled. You should see the size of the “My Pictures” folder on my PC since I bought a digital camera!

Valid HTML

A few days ago Marko Karppinen released the latest results of his analysis of the validity of the W3C member homepages. The table below compares the results of this latest analysis with the previous results, released Aug 22nd 2002 and Feb 22nd 2002.

Date     Number of pages tested   Number of valid pages   % of valid pages
Feb 02   501                      18                      3.7%
Aug 02   454                      21                      4.6%
Feb 03   429                      28                      6.5%

There is a slow improvement in the validity of the homepages analysed, but the total number validating is still very small. However, it compares favourably with the percentage of validating pages, 0.7%, gathered from a more general analysis of the validity of the web.

The home page of a member of the W3C is over 9 times more likely to validate than an average page.
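
For anyone who wants to check the “over 9 times” claim, here is the arithmetic as a few lines of Python, using only the Feb 03 row of the table and the 0.7% figure quoted above. The variable names are mine.

```python
# Figures taken from the Feb 03 row of the table and the 0.7% quoted above.
valid_pages = 28
pages_tested = 429
general_web_rate = 0.007  # 0.7% of pages in the general analysis validated

# Share of member homepages that validate, as a fraction.
member_rate = valid_pages / pages_tested

# How many times more likely a member homepage is to validate.
ratio = member_rate / general_web_rate

print(f"member rate: {member_rate:.1%}, ratio: {ratio:.1f}x")
```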

I’ve been away

I’ve recently returned from Madrid. It was quite sunny, even though the air was a little brisk. I also took the time to visit Toledo, famous for making swords and a very beautiful little city. Toledo apparently gets a bit overrun with tourists during the summer, but as I went there in February it wasn’t as hectic.

XML subsets

I came across the concept of subsetting XML for ease of parsing by reading an article by Joe Gregorio called Regexable XML. It raises some interesting points and I would recommend you go over there and have a look around if that kind of thing interests you.

Total Information Awareness

I’ve stumbled across a few interesting articles recently regarding civil liberties. Relax, I’m not getting all warblogger; I am just interested in the technological and social ramifications of this stuff. In fact I might even take bets (if I were a betting man) on how much money the British government is going to spend while it screws up the identity card scheme it is intent on introducing. Let’s face it, introducing new information technology is not this government’s strong point; cases in point are the Home Office and the MOD (check out the list of screw-ups at the bottom of that last link). Read a report on system failure.

At least the Government has not openly introduced anything like the TIA system those lucky Americans are going to enjoy.

One benefit of the system is that it may prevent your population from desiring to learn more about the potential conflict in Iraq and its background from sources in the area, like this one. While reading comments elsewhere I came across this gem:

I’m curious to check out websites from around the world, especially in the middle east, to get a different view of what is going on, but am entirely too afraid that I may be black-listed or linked to a terrorist group.

But that’s OK, because if you are innocent you have nothing to fear, good citizen.

IAO Logo
Scientia Est Potentia, Knowledge Is Power.

It’s all about the questions

FOAF, great isn’t it? The reactions of those who know what FOAF is (let’s face the fact that it isn’t mainstream technology) are mixed; a common reaction is “I haven’t found a practical use for it yet”. There are some interesting memes floating around regarding interconnecting people using FOAF, RSS and other assorted metadata schemes. Application of these schemes is at a rudimentary stage at the moment (hence FOAF’s 0.1 version number). Whether FOAF wins out over later formats is not really of concern, to me at least; what is interesting is thinking about the information we want to get from the data that is provided.

Two aspects of the same question. This introduction of large amounts of both personal and content based data leads to the question, who is the consumer? Two types of consumer are interested, potentially, in this data:

  • Data Miners
  • Geeks

One example of a potential data mining application is the sending of spam. Luckily, FOAF provides a means of hiding the email addresses of people who publish FOAF data. However, think of the potential in tying together email addresses with detailed information on a person’s interests. This is certainly possible with FOAF; although the likelihood (aka potential payoff) is probably too low for it to be contemplated at the moment, the potential is there.
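
The hiding mechanism I am thinking of is FOAF’s mbox_sha1sum property, which publishes the SHA1 hash of the mailto: URI instead of the address itself. A minimal Python sketch follows; the function name and the example address are mine, and I have assumed the address is hashed exactly as written, without any case folding.

```python
import hashlib


def mbox_sha1sum(email):
    """Return the SHA1 hex digest of the mailto: URI for an email address.

    Assumption: the address is hashed as given, with only surrounding
    whitespace stripped.
    """
    mailto_uri = "mailto:" + email.strip()
    return hashlib.sha1(mailto_uri.encode("utf-8")).hexdigest()


# A made-up address, used only to show the shape of the result.
digest = mbox_sha1sum("someone@example.org")
```

The digest is a 40-character hex string that two FOAF files can share to show they describe the same person, without either file printing the address for a harvester to collect.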

An example of the second consumer is easier to come across: they are the creators of stuff like FOAF. There have been plenty of potential applications aired by those in the FOAF community, such as using it with your blogroll and just generally finding “friends”. I’ve been examining some ways of using FOAF data myself, and I am currently running a FOAF harvesting robot to research potential applications. One possible application is the integration of FOAF-based data into the browsing environment.

<<< Start Vapourware Content >>>

Bring on the Vapourware. First I will state that I have no intention of building this system myself; I am far too busy concentrating on the programming I’m having to do to get myself a degree! Anyway, here is my idea: a FOAF viewing sidebar. A simple implementation exists already that can be used to find out more information on the author of the page. However, the way it finds the author information is not in widespread use (it uses meta information rather than link information to get data). A more polished implementation that supports the current trend for <link> based referencing of FOAF files would be quite interesting.
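
Still vapourware, but the <link> discovery step could be sketched in Python roughly as below. I am assuming the convention of a <link> element with rel="meta" and type="application/rdf+xml" pointing at the FOAF file; the class name, function name and sample page are all hypothetical, and a real sidebar would go on to fetch and parse the RDF.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FoafLinkFinder(HTMLParser):
    """Collect the URLs of FOAF files referenced by <link> elements."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.foaf_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = {name: (value or "") for name, value in attrs}
        rel_values = attr.get("rel", "").lower().split()
        is_rdf = attr.get("type", "").lower() == "application/rdf+xml"
        # Assumed convention: rel="meta" plus the RDF/XML media type.
        if "meta" in rel_values and is_rdf and attr.get("href"):
            self.foaf_urls.append(urljoin(self.base_url, attr["href"]))


def find_foaf_links(html, base_url):
    """Return absolute URLs of any FOAF files the page links to."""
    finder = FoafLinkFinder(base_url)
    finder.feed(html)
    return finder.foaf_urls


# A made-up page used only for illustration.
SAMPLE_PAGE = """<html><head>
<link rel="stylesheet" type="text/css" href="style.css">
<link rel="meta" type="application/rdf+xml" title="FOAF" href="foaf.rdf">
</head><body></body></html>"""

foaf_links = find_foaf_links(SAMPLE_PAGE, "http://example.org/blog/")
```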

<<< End Vapourware Content >>>