Open data standards and microformats across networks

There’s been a lot of talk about microformats lately, brought about by the increasing presence of web standards, growing awareness and development of design patterns, and concern about open data formats for information exchange. Microformats are not revolutionary, but they rest on a simple yet effective concept: take the web’s existing best practices and spread them more widely, which, among other things, improves the way we interact with data on the Internet.

Microformats

At their most basic, microformats are a way of identifying certain pieces of information so that they can be more easily collected, parsed, and aggregated for other uses. For example, the hReview microformat proposes a way to mark up a review (of a product or service, let’s say) so that its individual parts (author, date reviewed, rating, comments) can be easily picked out. This example deals with microformats in XHTML or HTML, but the same idea could also be applied to XML documents.

Since microformats do not have strict rules the way XML documents do, making an existing document microformat-friendly is straightforward and doesn’t require you to totally rearrange your data. A nice example is shown in the hReview specification. The changes basically amount to adding a few class attributes to some elements in order to identify certain pieces of information. For example, the name of the item being reviewed must be in an element with the class “fn” that either also carries the class “item” or sits inside an element that does. This is a simple but effective way to identify data: that is, to use metadata to identify the semantics of the information.
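To make the “just add class names” idea concrete, here is a minimal sketch that reads a few hReview-style properties from HTML using Python’s standard `html.parser`. The class names “item” and “fn” come from the text above; “rating” and “dtreviewed” are hReview property names. The sample markup and the `HReviewSketch` / `parse_hreview` names are hypothetical, and this is not a complete hReview parser: it handles only the simple forms shown (both classes on one element, or “fn” nested inside “item”) and ignores nested hCards and other patterns in the specification.

```python
from html.parser import HTMLParser

# Elements that never get a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "br", "col", "hr", "img", "input", "link", "meta"}

# Hypothetical review markup using hReview-style class names.
SAMPLE = """<div class="hreview">
  <span class="item"><span class="fn">Example Bistro</span></span>
  Rating: <span class="rating">4</span> out of 5,
  reviewed on <abbr class="dtreviewed" title="2006-07-01">July 1, 2006</abbr>
  <p class="description">Great crepes, slow service.</p>
</div>"""


class HReviewSketch(HTMLParser):
    """Pulls a few hReview-style properties out of HTML using class names only."""

    def __init__(self):
        super().__init__()
        self.stack = []  # class-name sets of the currently open elements
        self.fields = {"item_name": "", "rating": "", "dtreviewed": ""}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = set((attrs.get("class") or "").split())
        # Dates marked up with <abbr> keep the machine-readable value in title.
        if "dtreviewed" in classes and attrs.get("title"):
            self.fields["dtreviewed"] = attrs["title"]
        if tag not in VOID_TAGS:
            self.stack.append(classes)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        if not self.stack:
            return
        current = self.stack[-1]
        # "fn" names the reviewed item when "item" is on the same element
        # or on any element that encloses it.
        if "fn" in current and any("item" in classes for classes in self.stack):
            self.fields["item_name"] += data.strip()
        if "rating" in current:
            self.fields["rating"] += data.strip()


def parse_hreview(html):
    parser = HReviewSketch()
    parser.feed(html)
    parser.close()
    return parser.fields


print(parse_hreview(SAMPLE))
# {'item_name': 'Example Bistro', 'rating': '4', 'dtreviewed': '2006-07-01'}
```

Note that the parser never looks at tag names for meaning, only at class attributes, which is why existing markup can usually be adapted just by adding classes.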

Defined by usage, rather than defining usage

The neat thing about this is that it works entirely within the confines of HTML or XHTML to enhance the semantics of the language. Very few changes are needed, and the changes microformats suggest are improvements in their own right: once elements are identified by class, they are also easier to style. This is no coincidence. Microformats were designed with current “best practices” in mind: many “in the wild” examples of markup were studied before each format was specified, in order to develop something that suited current usage, rather than a new standard radically different from what anyone was using. This makes it super-easy to adopt microformats if you’ve already been following web standards and separating content from presentation.

Microformats: out in the open

Also, microformats are not just some idealized standard that’s being discussed rather than implemented. Many sites, such as Cork’d, already use them. As mentioned before, this is because they are quite easy to adopt on a site that is already well designed, so they don’t require a headache-inducing, total reworking of the structure. Web developers can therefore easily see the benefits of adopting microformats; in many cases, they’re already using them, albeit not with the same class names or attributes. Admittedly, there could be more support for microformats, but I believe that will come very soon, as open source developers of blogging software, CMSs and other portal systems update their code to provide support.

Another important aspect, already mentioned, is the organic development of microformats. As seen on the examples page of the microformats website, there is much open discussion before any microformat specification is developed, and almost all of them remain open to discussion, based on what is considered the current “best practice” on the Internet. In this way, the specifications tend to evolve around the best examples out there, rather than being a prescriptive announcement of what “should be done” and what “is the right way”.

For this reason, I believe microformats will be better accepted, since they strike the right balance between standardization, ease of implementation, and practicality. They should avoid the fate of other web standards, some of which looked good on paper but suffered from poor implementation because of either practical issues or compatibility problems.

Open data formats

Microformats are also a simple but effective open data format. With any data format, interoperability and compatibility become issues in the long run. If I save a file in a certain format, can I be assured it will be readable by the hardware and software of computers 10, 20 or 50 years from now? For some data this is not important, but for some it matters a great deal: a digital photo album, archives, or records, for example. Given the fast pace of development in computer formats, and technology in general, this is a real problem.

So how are microformats any better than any other format, open or closed? Surely being an open format doesn’t guarantee readability in the long run? That may be true, but one proven hallmark of a future-proof data format is being established and traditional. As pointed out by Tantek, this is why Project Gutenberg (“the first and largest collection of eBooks”) has decided to store its archives in the venerable ASCII format. While ASCII doesn’t offer any markup options, it virtually guarantees future compatibility; it’s highly unlikely that ASCII will ever be forgotten completely.

For the same reasons, Tantek also believes that “Compatible XHTML” (valid XHTML 1.0 Strict that is also compatible with HTML) is dependable over time. HTML has been in use for over a decade on the Internet, the most widely distributed database in human history, so its ubiquity is not in doubt; its ease of use also helps, as it enables wide adoption.

Openness and the present

Okay, so open data formats can be good in the long run. But what about now? Well, there’s some pretty exciting stuff going on. Recently, People Aggregator announced they’d be starting a social network (nothing new) that would also allow users to post across multiple social networking and blogging sites. This is currently accomplished using APIs such as MetaWeblog, but in the future they envision more complete interoperability. The idea is that if open standards are used, compatibility is ensured and switching from one service to another becomes easy, rather than depending on the new provider to offer a utility that converts your data from the old one.
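As an illustration of what posting through a shared API like MetaWeblog involves, here is a minimal sketch using Python’s standard `xmlrpc.client`. It assumes the usual MetaWeblog call `metaWeblog.newPost(blogid, username, password, struct, publish)`, with the post body in the struct’s “description” field. The blog ID, credentials, endpoint URL and the `build_new_post_request` helper are all hypothetical, and this is not People Aggregator’s implementation. The sketch only serializes the request, so it runs without a server.

```python
import xmlrpc.client


def build_new_post_request(blog_id, username, password, title, body, publish=True):
    """Serialize a metaWeblog.newPost call to XML without sending it."""
    # MetaWeblog passes the post as a struct; the body goes in "description".
    post = {"title": title, "description": body}
    return xmlrpc.client.dumps(
        (blog_id, username, password, post, publish),
        methodname="metaWeblog.newPost",
    )


request_xml = build_new_post_request("1", "alice", "secret", "Hello", "Cross-posted entry")
print(request_xml)

# Actually sending it would look roughly like this (hypothetical endpoint URL):
# server = xmlrpc.client.ServerProxy("https://blog.example.com/xmlrpc.php")
# post_id = server.metaWeblog.newPost("1", "alice", "secret",
#                                     {"title": "Hello", "description": "..."}, True)
```

Because every participating service accepts the same call, a cross-posting tool only needs a list of endpoints and credentials, not a separate converter per service.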

The additional benefits could also be quite useful. For example, creating a “friends” page listing your friends’ most recent blog posts and content wouldn’t be difficult. If you’re using LiveJournal, this is already easy, but the people on your friends list must also be on LiveJournal. If People Aggregator succeeds in getting more networks on board with their idea, adding people to a friends list would be easy regardless of which service they use. There is already a WordPress plugin that does this by parsing a friend’s RSS feed, but People Aggregator’s idea is to make this functionality accessible from any blogging service.
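A cross-service friends page boils down to fetching each friend’s feed, merging the entries, and sorting them by date. The sketch below does that with Python’s standard library. It is not the WordPress plugin’s code; it assumes RSS 2.0 feeds where every item has a `pubDate` in the usual RFC 822 format, and it uses inline hypothetical feeds instead of fetching them over HTTP. The `feed_items` and `friends_page` names are illustrative.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Two hypothetical friends' RSS 2.0 feeds, hosted on different services.
ALICE_FEED = """<rss version="2.0"><channel><title>Alice</title>
<item><title>Microformats are neat</title><link>http://alice.example.com/1</link>
<pubDate>Mon, 03 Jul 2006 10:00:00 +0000</pubDate></item>
</channel></rss>"""

BOB_FEED = """<rss version="2.0"><channel><title>Bob</title>
<item><title>Open data</title><link>http://bob.example.org/2</link>
<pubDate>Tue, 04 Jul 2006 09:00:00 +0000</pubDate></item>
<item><title>Hello</title><link>http://bob.example.org/1</link>
<pubDate>Sun, 02 Jul 2006 08:00:00 +0000</pubDate></item>
</channel></rss>"""


def feed_items(rss_xml, friend):
    """Yield (published, friend, title, link) for each <item> in an RSS 2.0 feed."""
    channel = ET.fromstring(rss_xml).find("channel")
    for item in channel.findall("item"):
        published = parsedate_to_datetime(item.findtext("pubDate"))
        yield published, friend, item.findtext("title", ""), item.findtext("link", "")


def friends_page(feeds, limit=10):
    """Merge several friends' feeds into one list, newest entry first."""
    entries = [entry for friend, rss in feeds.items() for entry in feed_items(rss, friend)]
    entries.sort(key=lambda entry: entry[0], reverse=True)
    return entries[:limit]


for published, friend, title, link in friends_page({"alice": ALICE_FEED, "bob": BOB_FEED}):
    print(published.date(), friend, title, link)
```

Nothing here depends on which service hosts a feed, which is the point: as long as each friend publishes in an open format, the friends page works across services.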

Problems

The main threat I see to the adoption of these standards is a direct result of the benefits they offer. As mentioned, open data formats would allow content from one service to be syndicated and/or displayed elsewhere on the web. This increases the accessibility of information, as you no longer have only one place to get it from: you can display the information how you want, and in the order you want.

This could directly threaten services’ ability to make money, because ad revenue would be circumvented. Many social networking sites survive by letting people freely create content and share it on their sites, as long as ads are served to viewers. If the content can be aggregated and shown elsewhere, the ads aren’t shown. Unless agreements are reached about this (such as ensuring ads are still included in any data delivered outside the site, or sharing ad revenue), this may be a major impediment to adoption.

This would be sad, since open data formats and the ability to “subscribe” to content and display it in a non-traditional way are basic concepts of what I would consider “Web 2.0” technologies. They are a natural extension of what HTML allowed, letting content flow more easily, just as HTML did some 10 to 15 years ago.
