Longhorn vs. RDF

Microsoft’s successor to Windows XP, Longhorn, is set to promote a new file system that advances on the hierarchical paradigm common in contemporary operating systems. Because it is built upon a database system, it will give users a much more flexible interface for searching for information. This is of course already known and has been reported on; what I find interesting are some of the potential use cases for this system.

One reported use case, searching for specific digital photographs:

You don’t want to search by file name, because they’re all called dsc035.jpg. You want to search:

  • show me the pictures I took last month,
  • show me the pictures of me and my wife,
  • show me the pictures of my children,
  • show me the pictures from Christmases.

To do that requires a change both in our user interface and also a change in our model for how information is stored on the computer.

Retrieval of complex data sets is an interesting problem area. What struck me as I came upon this topic, though, was how similar the use cases are to those of contemporary systems using RDF as the metadata “container”. FOAF, for example, has explored attaching metadata to photographs; the co-depiction photo metadata experiment has spawned tools which can answer a range of queries:

  • Get me all the images of person X,
  • Get me all the images of person X that also have person Y in them,
  • Order me a Hawaiian Pizza with… (sorry, slipped in from an article about internet connected fridges)

The point is that there are a lot of interesting applications for RDF, and its vocabularies, that could in many ways be even more innovative than Microsoft’s current plans, especially if it is hooked up to a nice XML database that can deal with semi-structured data much more effectively than an SQL-based system.
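
To make the co-depiction queries above concrete, here is a toy sketch in Python. The file names, people and triples are all invented, and a real system would use a proper RDF store rather than a set of tuples, but the shape of the query (“all images of X”, “all images of X that also have Y in them”) is the same.

```python
# Toy triple store: (subject, predicate, object) tuples stand in for RDF.
# All names here are hypothetical examples.
triples = {
    ("dsc035.jpg", "foaf:depicts", "alice"),
    ("dsc035.jpg", "foaf:depicts", "bob"),
    ("dsc036.jpg", "foaf:depicts", "alice"),
}

def images_of(*people):
    """Images that depict every person listed."""
    images = {s for s, p, o in triples if p == "foaf:depicts"}
    return sorted(
        img for img in images
        if all((img, "foaf:depicts", person) in triples for person in people)
    )

print(images_of("alice"))          # every photo of alice
print(images_of("alice", "bob"))   # only the co-depictions
```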

More innovative? Sure, the Microsoft paradigm is quite interesting, but the extensibility inherent in RDF allows for many more potential uses, as well as tight integration with internet-based data. Microsoft are still fundamentally distanced, at the file system level, from the internet and from integration with other operating systems, whereas RDF is an established standard that can be deployed widely, both on personal machines and over networks.

UI Problems

Microsoft isn’t going to be persuading non-geeks to learn SQL any time soon, so the UI is an important tool in leveraging all this metadata we want to query. How is it going to be done on the RDF end? A natural language interface perhaps, or graphical exploration as demonstrated by FOAFNaut? As the amount of information we produce and consume grows at an ever increasing rate, this needs to be tackled. You should see the size of the “My Pictures” folder on my PC since I bought a digital camera!


Valid HTML

A few days ago Marko Karppinen released the latest results of his analysis of the validity of the W3C member homepages. The table below compares these latest results with the previous ones, released Aug 22nd 2002 and Feb 22nd 2002.

         Pages tested   Valid pages   % valid
Feb 02        501           18          3.7%
Aug 02        454           21          4.6%
Feb 03        429           28          6.5%

There is a slow improvement in the validity of the homepages analysed, but the total number validating is still very small. It does, however, contrast favourably with the 0.7% of validating pages found in a more general analysis of the validity of the web.

The home page of a W3C member is over nine times more likely to validate than an average page.

I’ve been away

I’ve recently returned from Madrid; it was quite sunny, even though the air was a little brisk. I also took the time to visit Toledo, famous for its sword-making and a very beautiful little city. Toledo apparently gets a bit overrun with tourists during the summer, but as I went in February it wasn’t as hectic.

XML subsets

I came across the concept of subsetting XML for ease of parsing by reading an article by Joe Gregorio called Regexable XML. It raises some interesting points and I would recommend you go over there and have a look around if that kind of thing interests you.
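
The gist of the idea is that if a producer commits to a rigid subset (say, one element per line, no attributes, no nesting surprises in the fields you care about), a consumer can get away without a full XML parser. A small Python sketch of that trade-off; the feed fragment is invented:

```python
import re

# A rigidly formatted XML subset: one element per line, no attributes,
# no markup inside the text content. Hypothetical example fragment.
feed = """<item>
<title>Longhorn vs. RDF</title>
<link>http://example.org/2003/02/longhorn</link>
</item>"""

# Match <title>...</title> and <link>...</link>; the backreference \1
# ensures the closing tag matches the opening one.
pattern = re.compile(r"<(title|link)>([^<]*)</\1>")
fields = dict(pattern.findall(feed))
print(fields["title"])
```

Of course, this breaks the moment someone adds an attribute or an entity reference, which is exactly why it only works as an agreed subset rather than on XML in general.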

Total Information Awareness

I’ve stumbled across a few interesting articles recently regarding civil liberties. Relax, I’m not getting all warblogger, I’m just interested in the technological and social ramifications of this stuff. In fact I may even take bets (if I were a betting man) on how much money the British government is going to spend while they screw up the identity card scheme they’re intent on introducing. Let’s face it, introducing new information technology is not this government’s strong point; cases in point, the Home Office and the MOD (check out the list of screw-ups at the bottom of that last link). Read a report on system failure.

At least the Government has not openly introduced anything like the TIA system those lucky Americans are going to enjoy.

One benefit of the system is that it may stop your population from wanting to learn more about the potential conflict in Iraq and its background from sources in the area, like this one. While reading comments elsewhere I came across this gem:

I’m curious to check out websites from around the world, especially in the middle east, to get a different view of what is going on, but am entirely too afraid that I may be black-listed or linked to a terrorist group.

But that’s OK, because if you are innocent you have nothing to fear, good citizen.

[IAO logo]
Scientia Est Potentia: Knowledge Is Power.

It’s all about the questions

FOAF, great isn’t it? The reactions from those who know what FOAF is (let’s face the fact that it isn’t mainstream technology) are mixed; a common reaction is “I haven’t found a practical use for it yet”. There are some interesting memes floating around regarding interconnecting people using FOAF, RSS and other assorted metadata schemes. Application of these schemes is at a rudimentary stage at the moment (hence FOAF’s 0.1 version number). Whether FOAF wins out over later formats is not really of concern, to me at least; what is interesting is thinking about the information we want to get from the data that is provided.

Two aspects of the same question. This introduction of large amounts of both personal and content-based data leads to the question: who is the consumer? Two types of consumer are, potentially, interested in this data:

  • Data Miners
  • Geeks

One example of a potential data mining application is the sending of spam; luckily, FOAF provides a means of hiding the email addresses of people who have FOAF data. But think of the potential in tying an email address together with detailed information on a person’s interests. This is certainly possible with FOAF; although the payoff is probably too low for it to be contemplated at the moment, the potential is there.
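
The hiding mechanism FOAF provides is foaf:mbox_sha1sum: instead of publishing the address itself, you publish the SHA-1 hash of the mailto: URI, which lets tools match the same person across files without handing the address to harvesters. A minimal Python sketch (the address is made up):

```python
import hashlib

def mbox_sha1sum(email):
    """SHA-1 of the mailto: URI, as used for foaf:mbox_sha1sum."""
    return hashlib.sha1(("mailto:" + email).encode("ascii")).hexdigest()

# Hypothetical address: the hash can be published, the address stays private.
print(mbox_sha1sum("someone@example.org"))
```

The obvious caveat is that a determined spammer could still dictionary-attack common addresses, but it raises the cost well above scraping plain text.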

An example of the second consumer is easier to come across: they are the creators of stuff like FOAF. Plenty of potential applications have been aired by those in the FOAF community, such as using it with your blogroll and just generally finding “friends”. I’ve been examining some ways of using FOAF data myself; I am currently running a FOAF harvesting robot for research into potential applications. One possible application is the integration of FOAF-based data into the browsing environment.

<<< Start Vapourware Content >>>

Bring on the vapourware. First I will state that I have no intention of building this system myself; I am far too busy concentrating on the programming I’m having to do to get myself a degree! Anyway, here is my idea: a FOAF-viewing sidebar. A simple implementation already exists that can be used to find out more information on the author of a page. However, the way the author information is found is not in widespread use (it uses meta information rather than link information to get data). A more polished implementation that supports the current trend for <link>-based referencing of FOAF files would be quite interesting.
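
The <link>-based convention puts a pointer to the FOAF file in the page head, with rel="meta" and the RDF/XML media type. A sidebar could discover it with something like the following Python sketch; the page markup is invented, and a real tool would also resolve the href against the page’s URL before fetching it.

```python
from html.parser import HTMLParser

class FoafLinkFinder(HTMLParser):
    """Collect hrefs from <link rel="meta" type="application/rdf+xml">."""

    def __init__(self):
        super().__init__()
        self.foaf_urls = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if (tag == "link" and a.get("rel") == "meta"
                and a.get("type") == "application/rdf+xml"):
            self.foaf_urls.append(a.get("href"))

# Hypothetical page advertising its FOAF file.
page = ('<html><head><link rel="meta" type="application/rdf+xml" '
        'title="FOAF" href="foaf.rdf" /></head><body></body></html>')

finder = FoafLinkFinder()
finder.feed(page)
print(finder.foaf_urls)
```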

<<< End Vapourware Content >>>

Interesting newsreader

I’ve been investigating a few new newsreaders recently; although I’m reasonably happy with Aggie, it isn’t fitting into my workflow as nicely as I would like. I’ve been evaluating a couple of potential replacements, such as NewsMonster, but the one that has me really interested is nntp//rss.

This newsreader ties together RSS and NNTP, which suits me well as I spend a lot of time reading email and catching up with my newsgroups. The aggregation runs in the background too, so altogether it doesn’t require as much effort (or another desktop icon), just like it should be.

One initial drawback was the lack of autodiscovery and integration with my web browser; however, I wrote a bookmarklet that takes care of that. So if you download and install nntp//rss, come back here and get my bookmarklet (subscribe). Unfortunately it won’t work well if you try to subscribe to different sites on the same host, but I will work around that after I’ve had some sleep!

This bookmarklet was released for version 0.2 of nntp//rss. For further info on bookmarklets see bookmarklets.com

Just FOAFing around

I’ve managed to get my FoafHarvester up and running; it is currently gathering some data for a FOAF exploration application I am building. All told, I picked up about 350KB of FOAF data during the crawl. Other than a couple of minor glitches it went quite well. The data is currently in an XML file; I’m just writing a program to put it all into an SQL Server database at the moment.
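
The harvesting itself follows a simple pattern: parse a FOAF file, collect the files it points onwards to, and fetch each of those exactly once. A stripped-down breadth-first sketch in Python, with the network replaced by an in-memory map of invented URLs (a real harvester also needs fetching, RDF parsing, and politeness towards servers):

```python
from collections import deque

# Stub for the web: each FOAF file maps to the files it links onwards to.
# All URLs are hypothetical.
web = {
    "http://example.org/me.rdf":    ["http://example.org/alice.rdf"],
    "http://example.org/alice.rdf": ["http://example.org/bob.rdf",
                                     "http://example.org/me.rdf"],
    "http://example.org/bob.rdf":   [],
}

def crawl(start):
    """Breadth-first crawl, visiting each file once despite link cycles."""
    seen, queue = set(), deque([start])
    while queue:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        queue.extend(web.get(url, []))
    return seen

print(sorted(crawl("http://example.org/me.rdf")))
```

The `seen` set is what keeps the cycle between my file and alice’s from looping forever.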

Some useful FOAF stuff. If you’re interested in learning more about FOAF then the best bet is the RDFWeb FOAF page, which links to all kinds of good stuff, so I won’t replicate it here. Well, apart from FOAFNaut, and this presentation I found, “Photo RDF, Metadata and Pictures”, which talks a little about FOAF.

Setting up a serious development system

The time has finally come: all my programming projects are getting a little bit unwieldy to manage, with version control consisting of backing up all the files to a different directory now and again. I’ve finally got around to installing CVS on my computer. One of the advantages is that my project folders have suddenly become a lot more organised, as I can keep old projects in CVS and out of my frequently used folders.

After setting all this up I’m beginning to feel more like a serious programmer, my development tools are taking shape. As I am a bit of a geek I’ll give you a rundown of what my current programming setup is like.

  • CVSNT – versioning control system.
  • TortoiseCVS – Graphical CVS Interface for Windows.
  • Textpad – Favourite text editor (It’s that good I even registered my copy!)
  • .Net SDK – Big download, but the reference is invaluable, and the command line tools are good.
  • Visual Studio 6 – I can’t afford VS.Net as they haven’t released a student edition yet.
  • Cygwin Bash Shell and NTEmacs – For when I want to do the unix vibe.
  • ActiveState Perl – Because it’s just so damn useful.

Add a few batch scripts I’ve written, a few custom macros and commands to integrate my text editor with the .Net command line tools and voila, a nice little development system.