A few days ago Marko Karppinen released the latest results of his analysis of the validity of the W3C member homepages. The table below compares the results of this latest analysis with the previous results, released on Aug 22nd 2002 and Feb 22nd 2002.

Date     Number of pages tested   Number of valid pages   % of valid pages
Feb 02   501                      18                      3.7%
Aug 02   454                      21                      4.6%
Feb 03   429                      28                      6.5%

There is a slow improvement in the validity of the homepages analysed, but the number validating is still very small. However, it compares favourably with the 0.7% of pages that validated in a more general analysis of the validity of the web.

The home page of a W3C member is over 9 times more likely to validate than an average page.
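As a quick check of that ratio, here is a minimal Python sketch using only the figures quoted above (28 valid pages out of 429 in Feb 03, and 0.7% for the web in general). The variable names are my own.

```python
# Figures quoted in the post: Feb 03 results and the general-web rate.
valid_pages = 28
pages_tested = 429
general_web_rate = 0.7  # percent of pages validating in the wider analysis

# Percentage of member home pages that validate.
member_rate = 100 * valid_pages / pages_tested

# How many times more likely a member page is to validate.
ratio = member_rate / general_web_rate

print(f"member rate: {member_rate:.1f}%")
print(f"ratio: {ratio:.1f}x")
```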

I’ve been away

I’ve recently returned from Madrid. It was quite sunny, even though the air was a little brisk. I also took the time to visit Toledo, famous for making swords and a very beautiful little city. Toledo apparently gets a bit overrun with tourists during the summer, but as I went there in February it wasn’t as hectic.

XML subsets

I came across the concept of subsetting XML for ease of parsing by reading an article by Joe Gregorio called Regexable XML. It raises some interesting points and I would recommend you go over there and have a look around if that kind of thing interests you.

Total Information Awareness

I’ve stumbled across a few interesting articles recently regarding civil liberties. Relax, I’m not getting all warblogger; I am just interested in the technological and social ramifications of this stuff. In fact I might even take bets (if I were a betting man) on how much money the British government is going to spend while they screw up the identity card scheme they’re intent on introducing. Let’s face it, introducing new information technology is not this government’s strong point; cases in point are the Home Office and the MOD (check out the list of screw-ups at the bottom of that last link). Read a report on system failure.

At least the Government has not openly introduced anything like the TIA system those lucky Americans are going to enjoy.

One benefit of the system is that it may stop your population from wanting to learn more about the potential conflict in Iraq and its background from sources in the area, like this one. While reading comments elsewhere I came across this gem:

I’m curious to check out websites from around the world, especially in the middle east, to get a different view of what is going on, but am entirely too afraid that I may be black-listed or linked to a terrorist group.

But that’s OK because if you are innocent you have nothing to fear, good citizen.

Scientia Est Potentia, Knowledge Is Power.

It’s all about the questions

FOAF, great isn’t it? The reactions of those who know what FOAF is (let’s face the fact that it isn’t mainstream technology) are mixed; a common reaction is “I haven’t found a practical use for it yet”. There are some interesting memes floating around regarding interconnecting people using FOAF, RSS and other assorted metadata schemes. Application of these schemes is at a rudimentary stage at the moment (hence FOAF’s 0.1 version number). Whether FOAF wins out over later formats is not really of concern, to me at least; what is interesting is thinking about the information we want to get from the data that is provided.

Two aspects of the same question. This introduction of large amounts of both personal and content-based data leads to the question: who is the consumer? Two types of consumer are potentially interested in this data:

  • Data Miners
  • Geeks

One example of a potential data mining application is the sending of spam; luckily FOAF provides a means of hiding the email addresses of people who have FOAF data. However, think of the potential in tying together an email address with detailed information on a person’s interests. This is certainly possible with FOAF. Although the likelihood (aka potential payoff) is probably too low for it to be contemplated at the moment, the potential is there.
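The hiding mentioned above is, I assume, the foaf:mbox_sha1sum property, which publishes the SHA-1 hash of the mailto: URI in place of the address itself. A minimal Python sketch of how such a value can be computed follows; the function name and the example address are made up for illustration.

```python
import hashlib


def mbox_sha1sum(email: str) -> str:
    """Return the SHA-1 hex digest of the mailto: URI for an address."""
    # The hash is taken over the full URI, including the "mailto:" prefix.
    uri = "mailto:" + email.strip()
    return hashlib.sha1(uri.encode("utf-8")).hexdigest()


# Hypothetical address, used only to show the shape of the result.
digest = mbox_sha1sum("someone@example.org")
print(digest)
```

Anyone who already knows the address can compute the same hash and match it against FOAF data, so this hides the address from harvesters without stopping people being identified by it.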

An example of the second consumer is easier to come across: they are the creators of stuff like FOAF. There have been plenty of potential applications aired by those in the FOAF community, such as using it with your blogroll and just generally finding “friends”. I’ve been examining some ways of using FOAF data myself; I am currently running a FOAF harvesting robot to research potential applications. One possible application is the integration of FOAF-based data into the browsing environment.
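To give a rough idea of what one step of a harvesting robot could look like, here is a small Python sketch. It is not the robot mentioned above, just an illustration under some assumptions: it treats the FOAF file as plain XML rather than using a real RDF parser, it assumes other files are pointed to with rdfs:seeAlso, and the sample document and function names are invented.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
FOAF = "http://xmlns.com/foaf/0.1/"

# Invented sample FOAF document, for illustration only.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person>
    <foaf:name>Example Person</foaf:name>
    <foaf:knows>
      <foaf:Person>
        <foaf:name>A Friend</foaf:name>
        <rdfs:seeAlso rdf:resource="http://example.org/friend/foaf.rdf"/>
      </foaf:Person>
    </foaf:knows>
  </foaf:Person>
</rdf:RDF>"""


def next_urls(foaf_xml):
    """Collect the rdfs:seeAlso targets a robot could fetch next."""
    root = ET.fromstring(foaf_xml)
    urls = []
    for element in root.iter(f"{{{RDFS}}}seeAlso"):
        target = element.get(f"{{{RDF}}}resource")
        if target:
            urls.append(target)
    return urls


def names(foaf_xml):
    """List every foaf:name in the document, in document order."""
    root = ET.fromstring(foaf_xml)
    return [element.text for element in root.iter(f"{{{FOAF}}}name")]


print(names(SAMPLE))
print(next_urls(SAMPLE))
```

A real robot would also need to fetch each URL, remember which ones it has already visited, and cope with RDF/XML that is written in a different but equivalent form, which is where a proper RDF parser earns its keep.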

<<< Start Vapourware Content >>>

Bring on the Vapourware. First I will state that I have no intention of building this system myself; I am far too busy concentrating on the programming I’m having to do to get myself a degree! Anyway, here is my idea: a FOAF viewing sidebar. A simple implementation exists already that can be used to find out more information on the author of the page. The way it finds the author information is not in widespread use, however (it uses meta information rather than link information to get data). A more polished implementation that supports the current trend for <link> based referencing of FOAF files would be quite interesting.
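In keeping with the vapourware theme, here is a minimal Python sketch of the <link> based discovery step such a sidebar would need. It assumes the convention of a link element with rel="meta", type="application/rdf+xml" and title="FOAF"; the class name, sample page and URLs are made up, and a real page may mark up its FOAF file differently.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FoafLinkFinder(HTMLParser):
    """Collect FOAF file URLs advertised through <link> elements."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.foaf_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        # Attribute values can be None when written without a value.
        attributes = {name: (value or "") for name, value in attrs}
        rel_values = attributes.get("rel", "").lower().split()
        is_rdf = attributes.get("type", "").lower() == "application/rdf+xml"
        # Assumed convention: the title attribute names the link as FOAF.
        is_foaf = attributes.get("title", "").lower() == "foaf"
        href = attributes.get("href", "")
        if "meta" in rel_values and is_rdf and is_foaf and href:
            # Resolve relative references against the page address.
            self.foaf_urls.append(urljoin(self.base_url, href))


# Invented page, for illustration only.
page = (
    '<html><head>'
    '<link rel="stylesheet" type="text/css" href="style.css">'
    '<link rel="meta" type="application/rdf+xml" title="FOAF" href="/foaf.rdf" />'
    '</head><body></body></html>'
)

finder = FoafLinkFinder("http://example.org/blog/")
finder.feed(page)
print(finder.foaf_urls)
```

The sidebar itself would then fetch each discovered file and show whatever it can find out about the author.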

<<< End Vapourware Content >>>

Interesting newsreader

I’ve been investigating a few new newsreaders recently. Although I’m reasonably happy with Aggie, it isn’t fitting into my workflow as nicely as I would like. I’ve been evaluating a couple of potential replacements, such as newsmonster, but the one that has me really interested is nntp//rss.

This newsreader ties together RSS and NNTP, which suits me well as I spend a lot of time reading emails and catching up with my newsgroups. The aggregation runs in the background as well, so altogether it doesn’t require as much effort (or another desktop icon), just like it should be.

One initial drawback was the lack of autodiscovery and integration with my web browser; however, I wrote a bookmarklet that takes care of that. So if you download and install nntp//rss, come back here and get my bookmarklet (subscribe). Unfortunately it won’t work well if you try to subscribe to different sites on the same host, but I will work around that after I’ve had some sleep!

This bookmarklet was released for version 0.2 of nntp//rss. For further info on bookmarklets see