Everyone in the software industry is required to have an opinion, especially on open source. I figured I'd better weigh in sooner rather than later. But rather than tell you how I think it WILL all work out, I'm going to try something different for this debate and tell you how it ALREADY has worked out. This blarticle argues that packaged applications were the first true open source model and explores the immense hairball that this model creates when put into realistic practice.
You can download the podcast of this blarticle: platos_software_children_podcast.mp3
Plato's Software Children
Everyone has opinions. In the software industry, we’ve turned opining into a way of life. Java versus .NET. Agile versus Waterfall. Two tier versus three tier. Tastes great versus less filling. Having an opinion is just part of the job. But every once in a while we all have to step back and ask ourselves why we’re in the middle of the forest arguing about how all the damn trees got there.
One subject that seems to have trees sprouting up everywhere is the debate about open source. I was introduced to open source for the first time when I was working as an undergrad at MIT’s LCS (Lab for Computer Science). Having just attained my first research position by fudging the fact that I knew C (K&R who?), I was quickly introduced to the concept of Richard Stallman. I say “concept” because Richard was as much a myth as he was a reality, even in the halls of LCS. I strongly believe that Stallman contributed two major things to the software world. First, he is arguably the father of the open source movement through his creation of the Free Software Foundation. Second, I think he has single-handedly done more than anyone to promote the image that Unix hackers can only be taken seriously if they have long beards. For the sake of this blarticle I’ll focus on the first.
Stallman pioneered the concept of free software. As he has surely explained ad infinitum, the idea of “free” software is not that it shouldn’t cost anything. The idea is that code itself wants to be free, released into the wild where it will have a chance to procreate with other people’s code and have lots of little code babies, all of which will be raised in a sort of Plato’s children approach to IP ownership. To do this, Stallman and the FSF created the first (and now most widely used) public source code licensing scheme: the GNU General Public License (GPL). For many, the design of the GPL is a blessing. For many, the design is a curse. But lest I start planting my own trees in our open source forest, let me continue on for a bit and I’ll come back to the GPL shortly.
You see, the problem I have with the open source debate is that I think it always quickly deteriorates into a discussion of the wrong thing. From an academic perspective, the classic battle line is drawn between those who are pro software IP protection (e.g. proprietary source code) and those who are anti software IP protection. The pro-IP folks argue “why invest in proprietary software IP if you can’t protect it”. The anti-IP folks say “code just wants to be free” and then go back to editing their Wikipedia entry on the history of Emacs.
When you take the open source debate into a corporate setting you see the battle lines drawn in a slightly different way. When someone drops the o-bomb at the office, corporate lawyers usually stream into the room and the debate deteriorates into a pro-precedent versus anti-precedent one. The problem for the legal beagles at your company is that no one on either side of the open source debate has ever successfully sued over anything. Therefore, there is little to no case precedent for the legal team to rely on when evaluating the risk of using an open source code base. And that is why all eyes are on the SCO Unix trial. Regardless of how it pans out, the most important aspect of this court battle is that it is going to put some serious points on the open source jumbotron for the home or away team. It’s just that right now no one really knows who’s gonna score first.
Let’s go back to the GPL for a second and throw some more fuel on the risk fire. One of the problems (some may argue benefits) of the GPL is that it has a viral nature to it. The way the GPL is written (or is interpreted by most corporate legal firms) is that any code base that is compiled together with GPL licensed code is assimilated, in truly Borg-like fashion, into also being covered by the GPL. Under that interpretation, this viral effect applies just the same for code that simply calls a GPL library through external APIs and does not even compile in the original open source code base. In other words, if you are Acme Corp. writing a piece of expensive proprietary enterprise software, and you happen to include a GPL-based piece of open source in your code base, your whole code base becomes GPL licensed as well. Yikes! For a company that spends time and money writing their own software, this scenario is like a high-tech version of 28 Days Later.
So put yourself in a corporate lawyer’s shoes for a second. How would you make a recommendation whether or not to use open source code in your internal development? Isn’t it bad enough that you don’t know which way the courts are going to rule on open source cases (e.g. the courts could rule that any contributor to the original open source project could turn around and require your company to pay a royalty to them for every piece of software you sell that uses their code)? Now add the kicker that you just realized your code base might already have been intrinsically linked to a GPL licensing scheme that requires it to be published for open review. It shouldn’t be hard to figure out why so many companies are still so gun-shy with open source. To be fair, there are other old and emerging licensing schemes (BSD, Creative Commons, etc.) which reduce the epidemic effect of the GPL. Even the FSF has released the LGPL to loosen this effect when using open source libraries (as opposed to the code itself). They are also proposing a major revamp of the GPL as GPLv3. However, as things stand there are some pretty risky propositions with using open source.
But you’re reading this because you wanted an opinion, so I’d better give you one. And here it is: none of that matters at all.
Huh? You’re telling me I might have to publish all my proprietary source code and you’re trying to tell me that that does not matter? Well, sort of. To be fair, I probably should have stuck a big “in the grand scheme of things” in there somewhere. What I am really claiming is that there is a much bigger problem with open source that everyone seems to be completely ignoring. To understand what I’m getting at, we need to go all the way back to the beginning and ask why someone would want to use open source in the first place.
Before we go back to the beginning though, let me first say that for the balance of this article I’m NOT talking about self-contained open source systems like Linux or OpenOffice or Firefox. Clearly, there is a lot of value in getting a self-contained software system like Linux for free. Assuming it does what you need and is maintained frequently enough to deal with security or integration issues you may care about, I have no argument there. Frankly I don’t think most of the open source debate is about that type of usage anyway. What I am going to talk about is using open source code as a part of your custom code base with the intent to deliver a for-sale software product.
So with this caveat, let’s examine the basic premise of why a company or developer would want to use open source in the first place. The simple impetus would be to get a jump start on a piece of the software puzzle they don’t want to build themselves. There are lots of good reasons for not wanting to build something. If a standard or commodity replacement is available, why not use that? Many times a piece of the final system is viewed as non-strategic and it makes more sense to spend time on the proprietary pieces. And sometimes you just don’t have the skillsets to build a specific subsystem. For the same reason you’ll crack open 5 Minute Uncle Ben’s instead of pulling out the rice cooker every night, open source can get you to the whole meal faster than if you cooked everything from scratch. It is a very tempting premise, I know.
Unfortunately, after considering the value of this basic premise, most companies stop asking further questions and just start downloading SourceForge zip files. I really believe that most companies think of open source as just a quick way to get a few extra lines of code into the source control system to speed things up. What’s the alternative anyway - get the folks in development to type faster?
The truth about open source is only revealed, though, when you consider that using open source is like getting yourself into a long-term relationship. The first few dates are always amazing. Open source seems to be what you have always been looking for. In an odd, Jerry Maguire sort of way, open source seems to complete you (and your development project). Well, as amazing as those first few moments may seem, before you have a few highballs and ask open source to move in with you, you may want to ask a few more questions about what the long-term relationship will really be like. I’m here to tell you that it is definitely not going to be the roses you think it will. How would I know? Well, how about I give you roughly 55,000 examples to prove my point.
The Big Idea
Tonya McKinney (Newmerix’s whip-smart VP of Marketing) and I were lounging around her office one day talking about all the problems of owning a packaged application. Out of the blue it dawned on us that packaged application providers have actually been doing open source with their customers (hence the 55,000 number) well before Mr. Torvalds was banging out TRS80 kernel microcode in his diapers. You see, a fundamental part of the packaged application architecture is the exposure the customer gets to the vendor’s code base. The customer can read it, they can fix it, they can extend it, and they can swap it out completely with their own code. Sounds a heck of a lot like open source to me. And literally every vendor is doing it. SAP does it. Oracle does it. PeopleSoft does it. They all do it! Let me explain a bit more what I mean for those less familiar with the architecture of a packaged application.
Basically packaged applications have evolved into very advanced data presentation, collection, workflow, batching, and reporting tools. That’s all they really do when it comes down to it (I am sure 4000 developers at Oracle are writing me flame mail as I type this). But over time, these PCWBR tools (I’m making acronyms up on the spot here) have had to deal with a lot of different user interfaces. Back in the day it was the green screen terminals, and then we all moved to fat clients in a client server environment, and then most recently we migrated to the browser via HTML or Java applets. Even though the user interface changed a lot, the business processes did not, and the data model stayed relatively static. To accommodate this situation, the packaged application architecture evolved into a pretty basic MVC (model/view/controller) design. As a great generalization, in an MVC design, you put the data model (the structure of your data) in the database, you stick the viewer in the terminal, browser or fat client, and you put all the business logic in the application server. That way when a new user interface comes along, all you need to do is write a converter for that presentation layer and the rest of the application still works. When a lot of the packaged application vendors moved from fat client to HTML interfaces at the end of the ’90s, this is essentially what they did. It’s amazing what a little lipstick on the interface pig can do.
In addition to the problem of changing user interfaces, SAP, Oracle, PeopleSoft, Siebel, JD Edwards, etc. knew that not everyone was going to want their core business processes to work exactly the same way. While 90% of my hiring process may work like 90% of your hiring process, 10% will likely be different for strategic, historic, or completely random reasons. To accommodate this, the packaged application vendors took advantage of the MVC architecture (okay, maybe they had thought this all through beforehand), and built developer tools to help their customers modify all layers of the MVC. If you needed a new screen layout for the hiring process, no problem, use the UI editor. Need to extend the core EMPLOYEE record with your own fields? Just use their data modeling tool. If you need to change some of the business logic executed during the hiring process, just add, edit, or remove the code associated with the business logic.
For those of you reading along quickly, I’m going to slow you down for a second. My last statement about business logic is really important as it’s where I am going to stake my claim that packaged applications were the first real open source systems. Here’s how I get there. To allow users to enter business logic into the application, all of the major packaged application vendors adopted some form of programming interface. Business logic is implemented by writing little code snippets. Some vendors implemented commonly known scripting languages while others built their own programming languages from scratch. For example, PeopleSoft has PeopleCode, SAP has ABAP (Advanced Business Application Programming), JD Edwards has something called Business Function Language, which is a C derivative, and Siebel, a little younger than the rest, has the more contemporary Siebel VB based on Visual Basic.
Ironically, and as a side note to the open source discussion, Oracle’s consolidation of PeopleSoft, JD Edwards, Oracle EBS, and Siebel has brought them a hairball of different languages to deal with. All of these different languages (and the millions of lines of customer code written in these languages) have left Oracle with a hairy problem as they try to migrate it all to Java. I have some ideas (shocking surprise, I know!) on how they can do this which I promise to stick into an upcoming blarticle.
Regardless of the fact that Oracle is now juggling these four language balls, the programming interface concept is consistent across all major packaged applications. Over time, rather than pre-compile the basic factory delivered business logic code into the application server, each vendor shipped all their proprietary application business logic as external code snippets using their programming interfaces. Thus, all the business logic (code) of PeopleSoft HRMS is really sitting there as PeopleCode, accessible to any old developer who fires up AppDesigner (their developer tool) and starts digging around. Going back to the premise that not all customers’ hiring processes will work the same way, this gives customers of the packaged application some major advantages. First, if there is a defect in PeopleSoft’s factory delivered hiring process, the code for this process is exposed directly to the customer and can be fixed by the customer themselves. Second, the customer can extend the core code base delivered to them if they want to keep 90% of what PeopleSoft gave them but change 10% for their own business processes. And third, the customer might replace the factory code PeopleSoft delivered with their own, or add something completely new that PeopleSoft never contemplated. This model brings a lot of value to the customer in allowing them to start with PeopleSoft’s base and end up with exactly what their business needs.
But hang on here. A packaged application customer gets all the proprietary vendor code for their applications? And it is completely readable, completely editable, and completely extendable by the customer? This really does sound a lot like open source! I think one of the reasons why few have realized the correlation between the packaged application architecture and open source is historical semantics. One of the original sales pitches of packaged applications was “no more writing your own code”. To explain the extensibility of their architecture without using the horrible c-word, vendors euphemistically referred to the process of editing business logic (i.e. code) as “making customizations”. To this day, even though the practice is clearly about writing code, “customization” is still the language used in packaged application departments.
We Are Not Alone
Well, it turns out that Tonya and I are not the only ones to admit there is a striking similarity between the packaged application architecture and the open source movement. Shai Agassi, SAP’s mid-thirties wunderkind, recently wrote a nice little piece in his blog alluding to this exact same phenomenon. Granted he comes at his argument from the “I believe in protecting proprietary IP” open source sub-debate, but I’ll let him off the hook because he’s supporting my case.
The following excerpt is from his entry:
“The need for a consumer of software to receive transparency into the source code versus the 'black box' approach of delivering systems without transparency is a key issue in the Open Source debate, and an issue that SAP has followed closely for as many years as we are in business. During those 30+ years, SAP shipped its application code, probably one of the largest software products in the world, with the source code available to every customer. The result was that almost every customer modified our code to suit their needs, either on their own, or through one of our many implementation partners.
Some of our customers found that process essential to make the system fit their needs. Some customers found that the ability to modify the code made it possible for programmers to veer too far away from the original application they received, and, in future implementations, they reduced the amount of such modifications. When we get to foundation software, such as operating systems or databases, customers mostly want the code in order to debug systems they build on top of those OSS components. Usually they do not modify the code that much, yet the ability to simply walk through the execution of calls into engines is somewhat the best way to learn how the code should be fixed to perform to a customer's unique needs.
So, as you can see, I am not just a proponent of openness of source, at SAP we actually live by that rule on a daily basis.”
Perfect, thanks Shai. I could not have said it better myself. He’s definitely getting a Christmas card this year. Unfortunately though, Mr. Agassi kind of swerved a bit when he got to the most important thing: what happens when a customer customizes SAP’s baseline. This, my friends, is the real fine print of the open source movement. For all the benefits it has brought to SAP (and Oracle et al) customers, there are also an amazing number of problems it has introduced.
The problem with the packaged application open source model is that the minute you customize SAP’s (or Oracle’s to be fair) code base, you’re opening yourself up to a whole world of hurt. One of the gnarliest (and not in the 1980s Huntington Beach sense) problems with owning a packaged application is what to do about the legacy of customizations that get built up over time. Here is the rub: in this model, you don’t own the code baseline. It is subject to change at the pace and scope that a 3rd party deems appropriate. Sometimes that’s gonna work for you and sometimes it ain’t. Every time you customize (make a code change to) what SAP or PeopleSoft (or anyone else) delivered to you, you just dropped a pretty big come bet on the packaged application craps table and now you’re blowing hard on SAP’s dice hoping they don’t roll a seven. The more customizations you make, the more bets you place and the riskier each roll becomes. To be fair, just like in playing craps, you stand to get a lot of benefit from making these small bets over time. Each customization allows you to get the factory delivered system one step closer to how your business really works. As long as SAP doesn’t all of a sudden send out a critical patch to a module you’ve heavily customized, you’re in the clear. In reality though, this happens all the time.
Vendors like Oracle and SAP put out hundreds of changes to their codelines every year. Depending on the vendor, these changes come monthly, quarterly or ad hoc as needed. These changes occur for a whole host of reasons. Good reasons do exist, like adding new functionality to a module. But many times, changes occur for bad or annoying reasons such as bug fixes in badly written vendor code. Talk to anyone working with customizations to their packaged application and they will tell you that every time a new patch is proposed by the vendor, it’s time to call Domino’s (who ironically use PeopleSoft themselves, so I am not sure who they call).
The problem with this open model is that, as a customer, you’re stuck between a rock and a hard place. Every customization you’ve made must be evaluated against the vendor’s new baseline. For each customization you have to decide on one of three paths. Has the customization been superseded by the vendor’s fix or new functionality? Does the customization completely replace the vendor functionality because of a core difference between how you do business and how the vendor thinks most people do business? Or are you somewhere in the middle, where the customization needs to be re-merged with the vendor’s latest changes because you want 90% of their new functionality but still want to keep the 10% that is very specific to your business?
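For the code-minded, here is a minimal sketch in Python of how a first pass at that triage might look. It assumes you can pull three plain-text snapshots of every object: the vendor's old baseline, your customized copy, and the vendor's new baseline. The names (Path, triage, the snapshot dictionaries) are mine, made up for illustration, and are not part of any vendor's tooling. Note what a script like this can and cannot do: it can tell you where both sides changed, but choosing between "replace" and "re-merge" is still a human call.

```python
from enum import Enum


class Path(Enum):
    NOT_CUSTOMIZED = "take the vendor's new version as-is"
    VENDOR_UNCHANGED = "keep your customization, the patch does not touch it"
    SUPERSEDED = "vendor now ships the same change, drop the customization"
    NEEDS_REVIEW = "both sides changed: replace the vendor code or re-merge"


def triage(old_vendor: dict, customized: dict, new_vendor: dict) -> dict:
    """Classify each object by comparing three snapshots of its source.

    Each argument maps a (hypothetical) object name to its source text.
    A missing key means the object does not exist in that snapshot.
    """
    result = {}
    for name in sorted(set(old_vendor) | set(customized) | set(new_vendor)):
        base = old_vendor.get(name)
        mine = customized.get(name)
        theirs = new_vendor.get(name)
        if mine == base:
            # You never touched it, so the vendor's version wins.
            result[name] = Path.NOT_CUSTOMIZED
        elif theirs == base:
            # You changed it, the vendor did not.
            result[name] = Path.VENDOR_UNCHANGED
        elif mine == theirs:
            # Both changed it, and ended up with identical code.
            result[name] = Path.SUPERSEDED
        else:
            # Both changed it differently: someone has to read the code.
            result[name] = Path.NEEDS_REVIEW
    return result
```

Comparing whole objects for equality is deliberately crude. A real tool would compare at a finer grain, but even this coarse cut separates the customizations nobody needs to look at from the ones that will eat your weekend.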
Fortunately all of us developers are religious about commenting every line of code we write and jotting down in depth the implications to other areas of the code base our modifications might have. For those of you not rolling your eyes right now, this was a joke. Don’t get me wrong, I’m pro-developer to the core, but let’s face it, if developers wanted to write prose instead of code they’d be the ones writing this blarticle and I’d be stuck poring through a corrupted GAC at 3:00am. Of course the reality for most development teams lies somewhere in between zero comments and a small novella’s worth of them. It’s just the way the world works, no use crying over spilled milk.
Now compound this issue over 5-7 years (depending on how long you have owned your packaged application). Customizations, upgrades, patches, changing development teams, out-sourced development, in-sourced development, limited code reviews, poor or non-existent version control systems for packaged applications, and changing underlying business processes. Quickly you will realize you have a major issue on your hands. The problems caused by this open architecture are so dramatic for such a large number of packaged application owners that we started Newmerix just to help solve the problem. Newmerix exists solely because of the flaws in this architecture. Fortunately, our work has not fallen on deaf ears. Newmerix is enjoying a constantly strengthening relationship with Oracle around helping solve these problems. You’ll hear more about that soon, but for now, back to the issue.
So this is where my gripe about the misdirection of the open source debate comes from. No one really wants to talk about the fact that “classic” open source is going to present you with exactly the same problem that occurred in the packaged application market. But then again, I’m looking at the data. I have 55,000 examples on my side, how many data points do you have on yours? If you still don’t believe me, then all I can do is present you with a situation to consider. Let’s say you’re one month away from shipping the next release of your server software. You’re using large pieces of an open source code base in this product and have customized it quite a bit to do what you want. That’s okay because you shaved three months off of the schedule by using it. Then one day you’re sucking down your morning crowler scanning Slashdot and you see that someone found a major security hole in the open source code base you’re using. The good news is that the security hole has already been fixed and the patch has been integrated into the latest release. The bad news is that you didn’t upgrade your code base to the latest release yet. You're still using the open source code base from last year. In this situation it won’t take you long to realize it will take you the whole next month just trying to figure out what customizations you made and which ones you need to keep if you upgrade to the latest release. What’s the alternative though - ship your product with a major known security hole? This is a real life situation folks. It happens all the time.
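If you kept a pristine copy of the release you originally downloaded, the "figure out what customizations you made" step at least has a mechanical starting point. Here is a rough sketch using Python's standard difflib module. The function names and the "upstream/" and "ours/" labels are my own inventions for illustration, and the sketch assumes both versions of a file are available as text.

```python
import difflib


def local_changes(pristine: str, modified: str, name: str = "module") -> list:
    """Return unified-diff lines showing how `modified` departs from `pristine`."""
    return list(
        difflib.unified_diff(
            pristine.splitlines(),
            modified.splitlines(),
            fromfile=f"upstream/{name}",
            tofile=f"ours/{name}",
            lineterm="",
        )
    )


def count_changed_lines(diff_lines: list) -> int:
    """Count added and removed lines in a unified diff."""
    # The first two lines are the '---' / '+++' file headers, so skip them.
    body = diff_lines[2:]
    return sum(1 for line in body if line.startswith(("+", "-")))
```

Run this over every file you took from the project and you get a list of what you changed and a crude size for each change. It will not tell you why a change was made or whether you still need it, which is exactly the part that takes the month.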
Well, if you’ve read this far and you’re a wee bit scared, it’s probably just because you don’t know how deep a hole you are standing in right now. While I personally don’t have a measuring tape handy, let me point you to a useful company I came across that might just have one for you. Black Duck Software is the open source equivalent of an MRI. Just slide your patient into the MRI machine and you get a colorful readout of where all the potential problems lie. From my understanding, Black Duck has a huge archive of open source and 3rd party software projects on file. Using some advanced lexical analysis techniques, Black Duck will scream through your code base and essentially tell you if they find signatures of code that look like open source and what those open source projects might be. From there you can determine what level of Dante’s open source licensing hell you might actually be in. If nothing else, I’d highly recommend doing this sort of audit every once in a while just to find out how many feet you might have already sunk into the quicksand.
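I have no idea how Black Duck actually does it, so do not read the following as their method. But to give a feel for what "finding signatures" can mean, here is a toy sketch in Python of one common approach: hash every short run of normalized lines in a known project, do the same for your own code, and see how many hashes overlap. The function names, the three-line window, and the project archive passed in as a dictionary are all assumptions of mine for illustration.

```python
import hashlib


def fingerprints(source: str, window: int = 3) -> set:
    """Hash every run of `window` consecutive normalized lines."""
    # Normalize: strip whitespace, lowercase, and drop blank lines so that
    # reformatting alone does not hide a copied passage.
    lines = [ln.strip().lower() for ln in source.splitlines() if ln.strip()]
    return {
        hashlib.sha256("\n".join(lines[i:i + window]).encode("utf-8")).hexdigest()
        for i in range(len(lines) - window + 1)
    }


def scan(code_base: str, known_projects: dict, window: int = 3) -> dict:
    """Return, per known project, the fraction of its fingerprints found in code_base.

    `known_projects` maps a project label to its source text. Projects with
    no overlap at all are left out of the report.
    """
    ours = fingerprints(code_base, window)
    report = {}
    for project, source in known_projects.items():
        theirs = fingerprints(source, window)
        if theirs:
            overlap = len(ours & theirs) / len(theirs)
            if overlap > 0:
                report[project] = overlap
    return report
```

A toy like this is easily fooled by renamed variables or reordered statements, which is presumably where the "advanced" part of a commercial scanner earns its keep. The point is only that a nonzero overlap gives you a list of projects, and therefore licenses, to go read up on.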
Anyway, to conclude briefly, I think the concept of a group of people dedicated to contributing to the general value of the software industry is fantastic. In the long run, such effort is bound to make all of our lives easier. By no means am I saying “don’t use open source”. I am simply pointing out that there is a huge body of historical evidence with which to make wise decisions about how to use open source and what to expect from your relationship with it. If nothing else, let the packaged application pioneers’ adventures serve as a cautionary tale. It really is worth considering before you jump both feet directly into the open source deep end without blowing up your water wings first.