Metadata is the biggest new idea to hit the internet since... well... data itself. In some subtle and not so subtle ways, the prevalence of metadata is transforming every aspect of our digital lives. But metadata's reach extends well past our del.icio.us tags and Flickr categories. Metadata is transforming the way software is built and managed as well. We'll take a look at the metadata phenomenon from top to bottom in my first ever blarticle.
I Never Met a Data I Didn't Like
Apparently we have enough digital stuff. I’ve come to this conclusion because lately all I seem to be doing is organizing things. After some deep contemplation, I think I have tracked the source of this obsession for organizing things back to the physical world: Pottery Barn. For years, like good little consumers, we all went out and bought candle holders, throw pillows (umm… not me… but I have a “friend” that buys throw pillows), ungodly amounts of picture frames (I’ve made new friends just to fill all the ones I bought), and seasonal cocktail sets (I should write a blog entry on this phenomenon alone). Well, the clever folks at Pottery Barn must have predicted with actuarial accuracy the threshold at which we’d realize we actually had too much stuff – and launched a successful preemptive strike. They opened Hold Everything. This store has actually done extremely well (from what I can see) selling only one thing: containers for things we bought at Pottery Barn. How did they guess I’d need a nice box for all my extra throw pillows? Now tell me – is that a strategy or what? I seriously have to tip my capitalistic hat at that one a few times. But who can blame them? Americans are the ultimate consumer pack rats, and I am coming to realize we’re no different in the digital world. I now save every piece of email, every photo, every revision of every PowerPoint presentation, and every other digital scrap that litters my desktop.
After confessing this to a few people, I realized I was not alone in my new obsession for saving and organizing every digital thing I own. I spent last weekend in a mind-expanding conversation with my best friend and digital futurist (he’d better have a smug smile on his face after receiving that label), Matt Cutler. He has two major claims to fame as far as I can tell: he’s actually in that tiny little group of people that DO use the 95% of Microsoft Office functionality that most of us mere mortals would never touch with a 10-foot pole, and he is a minor deity when it comes to good PowerPoint. He mail merges like you and I change a font. He uses pivot tables in Excel like you and I double space, and the man can make animating slides in PowerPoint look as easy as printing in Landscape mode in Word. Apparently because of his affinity for these mystical skills, he has a lot of stuff to organize. This came up when we were on my deck in Boston, drinking a beer, and he started to extol the virtues of Google’s semi-new Gmail. To Matt, it wasn’t the unlimited disk space, it wasn’t the nifty clean interface, it was the revolutionary fact that they have no concept of folders. Being an Outlook folder junkie I proclaimed, “No folders! How can you find anything?” “You just tag it,” he said with Obi-Wan Kenobi calmness. You get an email from your mother about your sister’s birthday party? Just tag it “Mom” and “Sister”. Then, whenever you want to find those emails, you just type in the tags that make sense for whatever you are looking for. Simple, elegant, brilliant – like Apple not including a battery cover on the iPod. We are clearly standing on the shoulders of giants when we use Gmail.
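For the code-inclined, here is roughly what Matt was describing, as a minimal Python sketch. This is not how Gmail actually works under the hood (I have no idea); the `TagIndex` class and the message ids are made up for illustration. The point is only that one email can live under as many tags as you like, and a search is just the overlap of the tags you type in.

```python
from collections import defaultdict


class TagIndex:
    """Toy tag index: a message can carry any number of tags (hypothetical)."""

    def __init__(self):
        # Maps a lowercased tag to the set of message ids carrying it.
        self._by_tag = defaultdict(set)

    def tag(self, message_id, *tags):
        """Attach one or more tags to a message."""
        for t in tags:
            self._by_tag[t.lower()].add(message_id)

    def find(self, *tags):
        """Return ids of messages that carry every one of the given tags."""
        if not tags:
            return set()
        # .get() avoids creating empty entries for tags nobody has used.
        sets = [self._by_tag.get(t.lower(), set()) for t in tags]
        return set.intersection(*sets)


index = TagIndex()
index.tag("birthday-party-email", "Mom", "Sister")
index.tag("recipe-email", "Mom")

print(sorted(index.find("Mom")))            # everything from Mom
print(sorted(index.find("Mom", "Sister")))  # only the birthday party email
```

A folder forces the birthday email to live in exactly one place; the tag index lets it show up under both “Mom” and “Sister” without a second copy.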
This led to an interesting discussion on an emerging phenomenon that I have come to realize we all use on a daily basis. Maybe you don’t know you are using it yet, but I guarantee it will be a part of your everyday life within 5 years: metadata. It’s everywhere if you look closely enough. Data about Data. Redundantly redundant, isn’t it? The beauty of this is that metadata can be used for organization, classification, AND abstraction (we’ll come back to this last golden nugget in a little bit). There is really a consumer side to this coin and a technical one. Let’s start with the consumer side for a second.
Let me first point out that you are using a structured form of metadata right now, today, without even knowing it. It’s called Yahoo. When you use the Yahoo directory you’re really using a metadata directory that the good souls at Yahoo built for you. Ever put together a playlist on that iPod? That’s metadata working for ya as well. Ever scoured Match.com for your future ex-boyfriend (everyone has to try it once, I know, I know)? Those profiles are tightly wrapped bits of highly romantic metadata (because in the end, doesn’t “Loves to listen to music” sum someone up much better than a short essay on their experiences following Lyle Lovett around the country for a year after college?).
Matt pointed out that there are many examples of metadata getting more free form and more exposed directly to the user. Del.icio.us (this is a website, and yes the periods are intentional) is essentially a metadata overlay system for the internet. It’s a collaborative place where people can tag various HTML pages with concepts that are meaningful to them and then share these concepts with others. To me, a web page on Emerging Research in Crystallography might get tagged as “HUH?” or “REASONS I FAILED CHEMISTRY” but to someone about to start a new biotech business, it may get tagged as “THERE GOES OUR PATENT”. Such is the beauty of metadata – depending on who enters it, it can have a different meaning. The emergence of wikis and the new commercial versions thereof (JotSpot, etc.) essentially allows users to collaborate and comment on data and information that users enter into a central place. I found it interesting that there is so much attention on wikis these days, as I was reminded that the original version of Mosaic (for those newer to the Internet, Mosaic was the web browser that predated everything you may use today: IE, Netscape, Opera, Firefox, etc.) actually had a web page annotation feature built right into it. Well – sort of. If you looked carefully at the code (as I did back then as I was hacking it apart at the MIT Media Lab for a project), it was all commented out with a big preceding comment block that said something like:
// I’ll get this to work later. For now I have to go meet some guy named Jim about starting a
// company. I’m not sure what he’s really interested in, but he keeps saying something about
// needing some money for a boat he’s building. - Marc
(if you don’t get that very vague joke/reference then please refer to the August 2005 cover article in Wired)
As a side note, apparently Tim Berners-Lee has recently been reminiscing through the Mosaic code comments as well because he keeps going on and on about this thing called the “Semantic Web”. More on this too in a little bit.
Well, whether you like it or not, you’re a metadata junkie already. And it’s only gonna get worse. Microsoft announced that Windows Vista will support a file tagging scheme that will be the backbone of their file system search. I know this is truly disappointing to many of you because it’s really quite pleasurable to wait 5 minutes for Windows search to find that Best Man speech you were working on last night and now can’t remember where you saved. Ahh, if you’d only skipped implied metadata (file directories) and jumped right to intentional metadata (tagging) you’d have them rolling in the aisles by now.
Okay – so let’s get technical for a second. The concept of metadata is not new. Metadata has been doing its thing quietly behind the scenes for a long time now. To some extent I lament that all these new technologies are sort of stealing the behind-the-scenes metadata thunder. But so it goes. However, technical metadata is on a comeback. There is actually a lot happening in terms of technology architecture right now that is calling back up that old metadata friend and asking if he’s up for having a beer sometime soon. This re-emergence appears all over the place, from the operating system to the way new programming languages are built, all the way up to how enterprise applications do their thing.
The kudos really go to Java for being the first to reemerge with strong support of metadata. As I mentioned somewhere in this article already, one of the great things about metadata is abstraction. I don’t expect this piece of the metadata puzzle to really make it to the consumer forefront any time soon, but it is one of the most powerful concepts in metadata. One of the apparently groundbreaking things that Java did (the ANSI C guys are rolling in their graves right now I am sure) is make its language operating-system independent. It didn’t matter if you had a Windows machine, a Mac, a NeXT, a cell phone, or a Solaris box (remember them – ahem – bitter SUNW shareholder speaking here). All you needed was two things – the Java bytecode and a Java Runtime Environment that knew what to do with it. See, the smarties at Sun built metadata into their programming language. When you were done doing whatever you wanted in the programming language, it simply turned the whole thing into bytecode (metadata). This abstraction could be passed around to different operating system JREs and voilà, it was interpreted the right way in the right context for that operating system. Well, never one to be outdone, Microsoft launched .NET and its associated programming languages (C#, JScript.NET, J#, VB.NET, etc.). They basically used the exact same idea except they went one step further and said “start with any language you want and we’ll compile it all down to metadata.” It’s a pretty nifty concept to be able to write half your program in VB and half in C# and have it all blend together (but it’s a horrible concept in practice in terms of management).
We also see emerging uses for metadata in the content arena. At the end of the ’90s, with the advent of multiple client devices (such as new mobile handhelds), many content management systems moved their native storage of content to XML. XML is the belle of the metadata ball. Essentially it is just a format for description. Not only can it contain data, but it can contain structure, taxonomy and descriptions of data as well. Why would CNN hard code all of its web pages in HTML when it knows it will have to deliver them through email, RSS, mobile devices (both big and small through WAP and other messaging), etc.? Why not just start with the raw data and then put a ton of metadata around it so that each device can pick and choose the parts of it that matter? No images allowed when sending to a cell phone? No problem, just filter out all the content that is tagged as an IMAGE or, even better, use the textual metadata caption for the image instead. As a side note, there is a whole emerging issue related to managing the growth of metadata itself. Patricia Seybold and friends have actually been ahead of the curve here for a few years and they are a pretty good source of thinking around this future. I am sure a blarticle to come will probably start out saying something akin to “Apparently we now have enough metadata too”.
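To make the image-versus-caption trick concrete, here is a small Python sketch. The XML layout (`story`, `headline`, `paragraph`, `image` with `src` and `caption` attributes) is something I invented for this example, not any real content management system’s schema, and `render` is a hypothetical function. It stores the story once and lets each device decide what to do with the parts tagged as images.

```python
import xml.etree.ElementTree as ET

# Hypothetical story markup: content plus metadata describing each piece.
STORY_XML = """
<story>
  <headline>Local Man Buys Box for Throw Pillows</headline>
  <paragraph>Sales of storage containers rose again this quarter.</paragraph>
  <image src="pillows.jpg" caption="A box full of throw pillows"/>
  <paragraph>Analysts expect the trend to continue.</paragraph>
</story>
"""


def render(story_xml, allow_images):
    """Render one story as text lines for a device.

    Devices that cannot show images get the caption metadata instead.
    """
    lines = []
    for node in ET.fromstring(story_xml):
        if node.tag == "headline":
            lines.append(node.text.upper())
        elif node.tag == "paragraph":
            lines.append(node.text)
        elif node.tag == "image":
            if allow_images:
                lines.append(f"[image: {node.get('src')}]")
            else:
                lines.append(f"(Photo: {node.get('caption')})")
    return "\n".join(lines)


print(render(STORY_XML, allow_images=True))   # e.g. a web browser
print(render(STORY_XML, allow_images=False))  # e.g. a text-only cell phone
```

The story is written once; adding a new kind of device means adding a new way to read the tags, not rewriting the content.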
Back to TimBL for a sec. Tim Berners-Lee has been on a crusade for a few years now to introduce the concept of metadata, but at a semantic level, into the fabric of the web itself. By the way, I love sentences like that last one; they make me feel very George Gilder-esque (one key difference however is that I did not start the sentence with “My good friend” or “I went to Boston to talk to”). At first, I have to admit, the whole semantic web thing was a bit of a yawner to me. I tend to roll my eyes whenever someone (no matter their legendary web cred) wanders into a vision of the future that includes the phrase “my agent talks to your agent”. However, I am coming to realize that Tim may well be onto something here. You see, the vast amount of information in the content of the web (e.g. inside of a webpage) actually does have some semantics to it. He’s right – some of the content on my website is my address, some may be my contact information, and eventually I might even publish my calendar as well. The tricky part is that it is hard for a machine to just know this without a little flesh and blood nudging. As a side note, Google has made a nice flirting pass at handling this sort of thing with their latest toolbar feature, AutoLink. Press AutoLink on a page and 8.5 times out of 10 it actually finds all the addresses on the page and turns them into hyperlinks to maps.google.com. It works pretty well and heck, if it makes me cut and paste less then I’m all for it. But Tim’s vision is much grander. One where humans can admire the iceberg from above water and automated systems can actually take advantage of the other 90% of semantic relevance below the waterline. Keep an eye on this – it’s an interesting hybrid of content level tagging, evolution in web services definitions, and other metadata related concepts.
It is also a fantastic segue to discussing the metadata that I play with most in my life: packaged application metadata (here’s where I earn my paycheck). One of the problems that packaged applications have had to deal with over their decade-long (and longer) existence is an ever changing user interface platform. In ye olde days everything was green screens. Then we got slightly smarter terminals, then fat clients on multiple platforms, then a web browser, then Java in the browser, and now mobile devices. At some point in time the industry as a whole must have woken up from their collective nightmare and screamed “we just can’t write the whole thing for each new platform”. In addition, their customers wanted to customize the vanilla system they delivered and they needed a system that could be extended. So they moved towards a metadata architecture. Rather than write one system to display the application on a Windows machine and another for a Unix machine with a web browser, they decided to write all of it into metadata and then simply make translators to display any old application page the metadata described. This is sort of a quick and dirty version of Java for the UI in a way. This was a great move as it got them all the flexibility they needed to move across UI platforms and allow end user customization at the same time. If an owner of PeopleSoft wanted to add a new field on the hiring page, to collect, say, the phone number of your preferred manicurist, the user just tweaked the metadata through their design tool and voilà, the application UI appeared to change. Not revolutionary, but definitely ahead of their time in terms of how applications are getting built today. The funny thing is that all the packaged application vendors did it, and at about the same time.
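If you want to see the idea in miniature, here is a Python sketch. Everything in it is invented for illustration (the `HIRING_PAGE` dictionary, the field names, both renderers); real products keep their page definitions in their own tables and files and are far more elaborate. The page is described once as metadata, and two small “translators” draw it for two different platforms.

```python
from html import escape

# Hypothetical page definition. In a real packaged application this would
# live in the vendor's metadata tables or files, not in a Python dict.
HIRING_PAGE = {
    "title": "New Hire",
    "fields": [
        {"name": "last_name", "label": "Last Name"},
        {"name": "start_date", "label": "Start Date"},
    ],
}


def render_html(page):
    """Translator #1: draw the page as HTML for a web browser."""
    rows = ["<h1>{}</h1>".format(escape(page["title"]))]
    for field in page["fields"]:
        rows.append(
            '<label>{} <input name="{}"></label>'.format(
                escape(field["label"]), escape(field["name"])
            )
        )
    return "\n".join(rows)


def render_terminal(page):
    """Translator #2: draw the same page for a green-screen style terminal."""
    rows = [page["title"].upper()]
    for field in page["fields"]:
        rows.append("{:<24}: ________".format(field["label"]))
    return "\n".join(rows)


# The customization is a metadata edit; neither translator changes.
HIRING_PAGE["fields"].append(
    {"name": "manicurist_phone", "label": "Manicurist Phone"}
)

print(render_html(HIRING_PAGE))
print(render_terminal(HIRING_PAGE))
```

The manicurist field shows up in both renderings after a one-line change to the page description, which is the whole appeal of the approach.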
And now this metadata architecture is a powerful weapon which they are wielding inside the enterprise as part of their TCO battle to win more development on their platform rather than having people build custom apps. And it appears to be working (see my other blarticle to come on the emerging battle of the Application Operating System). PeopleSoft uses AppDesigner, which puts metadata into the PS tables in the database; JDE uses their OM Workbench to edit their Specs; Oracle uses Forms Designer to edit Forms files (fmx, fmb); SAP uses DynPro and the new Java-compliant WebDynPro to generate metadata files; and Siebel has their own set of tools to edit metadata in their srf files. All of these use metadata to abstract the logic of the application from the actual presentation of data in the interface.
Well, the smarties of Newmerix realized you could really take advantage of this. If you know how to navigate around the metadata (read and understand the PS tables in PeopleSoft for example), you might actually be able to do some pretty neat tricks with it. Well, that’s what we have done. One of the nuggets that metadata exposed to us is that we can actually see all the changes being made to a packaged application in a very consolidated way. Add a page, remove a field, change business logic on a page, change the underlying definition of an EMPLOYEE behind the scenes – it’s all stored in the metadata. We don’t need to stare at the application interface and try to figure out if anything is different. We just know because we can compare old metadata to new metadata. And because we’re experts at understanding the implications of changing this piece of metadata or that one, we can do some pretty slick things with it. For example, one of the classic problems with automated testing tools which use record and replay (basically they all do at some level) is that when the application changes, the automated testing tool gets confused. It’s a little bit like giving someone directions based solely on the businesses that are on the corner of each relevant street. Turn left at the Conoco, right at the Dunkin Donuts and we are across from the Baby Gap. That’s great, but what if the Dunkin Donuts can’t sell enough crullers to stay in business at their current location and they move up the street to the new mall that opened? Well, the descriptive directions are pretty damn useless at that point in time. As a side note, apparently this is how it works in Costa Rica. Multiple reports from the field tell me that there are no street signs in most smaller cities in Costa Rica. For most tourists that’s okay because they just keep going in one direction until they hit the beach. But imagine if you had something more critical at stake, like finding the hospital.
Small changes in the landscape would make your directions pretty useless. Fortunately Costa Rica doesn’t change very fast, but let’s hope you don’t ever pass a gallstone while on vacation there and need good directions. So anyway – back to testing. Replay engines effectively follow a descriptive set of directions. When they are looking for something and can’t find it, they are pretty lazy and just give up and say “No mas replay”. That’s actually not such a bad thing because the testing software can’t tell if the thing should have been there and is not, or is really there but is now a Krispy Kreme instead of a Dunkin Donuts (to beat our analogy to death a bit more). Well, it turns out that with metadata you can get around this issue. Because you understand the thing underneath the covers that generated the web page button or link or textbox or whatever you are looking for, you can also know when it changes and EXPECT a change on the web page. This makes Automate!Test (our testing tool) “muy inteligente”. It also saves your testers from running around the PeopleSoft development team screaming “I can’t find the H1-B Visa Status field, where did it go?” Chances are, it will take your developers a while to figure that out. All of this could be alleviated by knowing the metadata and what it can do for you. Well, it turns out that we were right: inviting metadata to the Automated Testing party really does get the party going. The time it takes to update test scripts related to changes in the application drops dramatically because we can tell you exactly the problems you will encounter. It also works wonders as an impact analysis tool – tell me if that new patch my developers put in will actually change anything in my test suite. Keep an eye on this, more innovation to come from inside the Newmerix halls soon.
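Here is the general idea boiled down to a toy Python sketch. To be very clear, this is not how Automate!Test is implemented; the snapshot format, the field names, and the `diff_fields` and `impacted_steps` functions are all assumptions made up for this example. It compares an old and a new snapshot of a page’s metadata and then asks which recorded test steps touch something that changed.

```python
def diff_fields(old, new):
    """Compare two {field_name: definition} snapshots of a page's metadata."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(
            name for name in old.keys() & new.keys() if old[name] != new[name]
        ),
    }


def impacted_steps(script, diff):
    """Return the recorded test steps that touch a removed or changed field."""
    touched = set(diff["removed"]) | set(diff["changed"])
    return [step for step in script if step["field"] in touched]


# Hypothetical metadata snapshots taken before and after a patch.
old_page = {
    "LAST_NAME": {"label": "Last Name", "type": "text"},
    "VISA_STATUS": {"label": "H1-B Visa Status", "type": "text"},
}
new_page = {
    "LAST_NAME": {"label": "Last Name", "type": "text"},
    "VISA_STATUS": {"label": "Visa Status", "type": "dropdown"},
    "MANICURIST_PHONE": {"label": "Manicurist Phone", "type": "text"},
}

# Hypothetical recorded test script: one step per field interaction.
script = [
    {"step": 1, "action": "type", "field": "LAST_NAME"},
    {"step": 2, "action": "type", "field": "VISA_STATUS"},
]

changes = diff_fields(old_page, new_page)
print(changes)
print(impacted_steps(script, changes))
```

In this toy case the diff reports that the visa field was relabeled and turned into a dropdown, so step 2 is flagged before anyone runs the script and watches it fail. That is the Dunkin Donuts moving up the street, except this time somebody told you.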
So, why the heck should you care? Well, I’d say that we’re moving into a brave new world where we spend as much time organizing and describing the content as we do working on it. Perhaps after we all use Vista for a while, it will seem commonplace. One does wonder, though, when the amount of metadata will outpace the actual data. In any case, Newmerix will be there to use it.