Keep your (Java) CMS light

I tried my first CMS circa 2004, when I created the first website for the blueMarine project. I remember that at the time I searched for a Java-based CMS, but the landscape was almost a desert, so I fell back on a PHP-based product, Mambo. I was really dubious about PHP for many reasons, even though the technology was and still is very popular, not least because I didn't want to write integration code in PHP.

In any case, after a few months my website was vandalized and I discovered that the hacker had taken advantage of a trivial vulnerability in PHP. Now, security is a tough problem and it's also a matter of frequently applying patches; in other words, I shared part of the responsibility with Mambo. There's no technology which is 100% secure and doesn't require patching. But the episode was enough for me to drop PHP and never look back.

At the end of 2005 I ran another search for Java CMS platforms and this time I found something interesting: InfoGlue. It's the CMS I've used for years for all my websites (in the meantime, I created many more, not only related to my software products). There were and there still are many things about InfoGlue that I like; but in the end the tool doesn't fit my needs. I realized that editing or posting new content on my websites is a painful experience, for a number of reasons (including the fact that InfoGlue has a web-based editor), so most of my websites are not updated as frequently as I want; for others I've resorted to publishing part of the material on an externally hosted Wiki, but I find this hybrid approach cumbersome (BTW, I don't like wikis at all).

That's why I've searched for another way, and it's the topic of this post.

I need to give you some more context, so you can fully understand the problem: my websites are not complex, they are relatively small in size, and they don't require a sophisticated publishing workflow, such as write-review-approve. Furthermore, they are hosted in a "virtualized Tomcat" environment by PerformanceHosting, which I find very convenient in price and in cost of ownership (in particular for security: you have little control over the server, basically you can just deploy and undeploy .war files, but in exchange most of the security is handled by the provider). I can have database instances, but with a limited number of connections, and my connection pool easily runs out of resources with InfoGlue.

Of course, I can tune, configure, patch, whatever; or I can spend more money for a hosting service with more features. But the point is: why should I do that? My websites are simple and the hosting service is adequate. It's the CMS that doesn't fit the picture!

In my software design experience I've arrived at two basic principles, which many people share:

  1. Software Tends To Grow Overcomplex
  2. One Size Doesn't Fit All

So, the solution must be to use simpler software. In particular, something that starts simple and small, and can be enhanced later, but only when there's a need for that. I searched around for one written in Java, but everything I found was too complex. So a few weeks ago I started writing my own ultra-lightweight CMS, which is called NorthernWind. It will be open sourced, but not before I've performed some refactoring and cleanup: right now my priority is to migrate my existing websites (blueBill Mobile and StoppingDown have been successfully migrated so far).

One of the unnecessary pieces of complexity in a CMS, from my perspective, is the way the content repository is implemented. Typically it's a relational database, with the related stuff such as an ORM, possibly with a JSR-170 implementation on top of it. I understand their role in an Enterprise environment, but they are not what my needs call for.

That's why the content repository of NorthernWind is just a filesystem. To be more precise, it's a "logical" filesystem, that is, something that looks like a filesystem but can be implemented in many ways - actually, it could be implemented on top of a database. But for my needs Mercurial is perfect: I can edit stuff on my laptop, with no internet connection required, and later push to the server, where the website is rendered. Mercurial gives transactionality, versioning, branching, asynchronous editing and support for multiple authors, all without requiring a line of Java code. This makes it possible to completely split the responsibilities of rendering and editing. In fact, at the moment NorthernWind is just a rendering front-end, and editing is being done with a mix of tools such as the U*ix command line, vi and an HTML editor. I'll probably write a lightweight editor later, based on the NetBeans Platform, because it definitely makes sense for the most common operations; but in the meantime, I'm enjoying the command line so much. What's faster and easier than

find . -name "*.html" | xargs perl -w -i -p -e "s/old/new/g"

to replace content in a bunch of files? To add a set of photos for a gallery, what's easier than just copying the required files into a folder and then pushing / rsyncing it to the server?
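
To make the "logical filesystem" idea a bit more concrete, here is a minimal sketch in Java. It is not NorthernWind's actual API: the names LogicalFileSystem, DirectoryFileSystem and InMemoryFileSystem are hypothetical and chosen only for illustration. The point is that the rendering code would depend only on the small interface, while the backing store could be a plain directory (for instance a Mercurial working copy) or anything else.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical abstraction, for illustration only (not NorthernWind's real API).
interface LogicalFileSystem {
    /** Returns the text stored at a logical path such as "/blog/a.html". */
    String read(String logicalPath);

    /** Returns the logical paths of the direct children of a folder, sorted. */
    List<String> list(String logicalFolder);
}

/** Backed by a plain directory, e.g. a Mercurial working copy on the server. */
class DirectoryFileSystem implements LogicalFileSystem {
    private final Path root;

    DirectoryFileSystem(Path root) {
        this.root = root.toAbsolutePath().normalize();
    }

    private Path resolve(String logicalPath) {
        // Strip leading slashes so that the path resolves relative to the root.
        Path resolved = root.resolve(logicalPath.replaceFirst("^/+", "")).normalize();
        if (!resolved.startsWith(root)) {
            throw new IllegalArgumentException("Path escapes the repository: " + logicalPath);
        }
        return resolved;
    }

    @Override
    public String read(String logicalPath) {
        try {
            return new String(Files.readAllBytes(resolve(logicalPath)), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public List<String> list(String logicalFolder) {
        try (Stream<Path> children = Files.list(resolve(logicalFolder))) {
            return children
                .map(p -> "/" + root.relativize(p).toString().replace('\\', '/'))
                .sorted()
                .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

/** In-memory stand-in for "any other backend", such as a database. */
class InMemoryFileSystem implements LogicalFileSystem {
    private final Map<String, String> files = new TreeMap<>();

    InMemoryFileSystem put(String logicalPath, String content) {
        files.put(logicalPath, content);
        return this;
    }

    @Override
    public String read(String logicalPath) {
        String content = files.get(logicalPath);
        if (content == null) {
            throw new IllegalArgumentException("No such resource: " + logicalPath);
        }
        return content;
    }

    @Override
    public List<String> list(String logicalFolder) {
        String prefix = logicalFolder.endsWith("/") ? logicalFolder : logicalFolder + "/";
        return files.keySet().stream()
            .filter(p -> p.startsWith(prefix))
            .map(p -> {
                // Keep only the first path segment below the folder.
                int end = p.indexOf('/', prefix.length());
                return end < 0 ? p : p.substring(0, end);
            })
            .distinct()
            .sorted()
            .collect(Collectors.toList());
    }
}
```

With something like this, a renderer would call read("/blog/a.html") without knowing whether the content comes from disk or from another store, and the in-memory variant is handy for tests.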

So far, I've found only one thing that the relational database does better: the support for internal identifiers of entities. I mean: with a file system, the path of a file is the identifier of a resource, say /my/directory/foo/bar. This means that other resources referring to it use that path: if you want to move the resource around, you have to search and replace all references (note that the physical resource location is independent of the logical web view, but in any case it makes sense to move resources around for a better organization of the project files). In contrast, a database would rather use a numeric id, such as 1234, which never changes once created. But is this really a problem? No, searching and replacing references is a common refactoring operation, which is supported by IDEs. An IDE-based editing tool would do it easily, and with no complexity or performance impact on the frontend.
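
As a rough illustration of that refactoring step, an editing tool could do something along these lines when a resource is moved. This is only a sketch under simple assumptions: the class and method names (ReferenceRefactoring, renameReferences) are hypothetical, only .html files are scanned, files are assumed to be UTF-8, and the replacement is a plain literal substring match, so a path that is a prefix of another path (say /my/directory/foo/bar and /my/directory/foo/bar2) would need a stricter match in a real tool.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, for illustration only.
class ReferenceRefactoring {
    /**
     * Replaces every literal occurrence of oldRef with newRef in the .html
     * files below root and returns the files that were actually modified.
     */
    static List<Path> renameReferences(Path root, String oldRef, String newRef) throws IOException {
        List<Path> htmlFiles;
        try (Stream<Path> walk = Files.walk(root)) {
            htmlFiles = walk
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().endsWith(".html"))
                .sorted()
                .collect(Collectors.toList());
        }

        List<Path> changed = new ArrayList<>();
        for (Path file : htmlFiles) {
            String before = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            String after = before.replace(oldRef, newRef); // literal match, not a regex
            if (!after.equals(before)) {
                Files.write(file, after.getBytes(StandardCharsets.UTF_8));
                changed.add(file);
            }
        }
        return changed;
    }
}
```

Since the repository is under version control, the rewritten files and the moved resource can be committed together, so the change is easy to review or roll back.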

As a final note, for some of my websites I need integration with my own code: for instance, to support the Semantic Web, or for my project websites to integrate with Maven (e.g. to retrieve resources, such as code chunks, by means of the BlueBook plugin); the blueBill Mobile website has a simple API to receive "pings" from mobile appliances where the app runs. Custom code integration can be done with CMS products, which are usually extensible (especially those that are open sourced). But their complexity makes the job harder than it should be: you have to write code in a specific way and you're forced to use the technologies which the CMS is based upon. Instead, NorthernWind can be used just as a library: you write your webapp, picking your preferred technology, and integrate NorthernWind for the CMS section of the webapp. Actually, the NorthernWind frontend can be implemented with any technology you like: so far, I have one based on Spring MVC and one based on Vaadin (another based on Wicket is on the way) - that's easy because I have a core, with all the basic stuff, which is independent of the web technology.
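
The split between a technology-independent core and thin frontends could look roughly like the following sketch. None of these names come from NorthernWind: CmsCore, RenderedPage and JdkHttpFrontend are hypothetical, the "{{content}}" placeholder is an invented template convention, and the JDK's built-in com.sun.net.httpserver is used as the example frontend only because it needs no extra dependencies. A Spring MVC controller or a Vaadin view would delegate to the same core in the same way.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpHandler;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.function.Function;

/** Result of rendering a page: plain data, no web framework types. */
final class RenderedPage {
    final int status;
    final String html;

    RenderedPage(int status, String html) {
        this.status = status;
        this.html = html;
    }
}

/** Hypothetical core: knows about content and templates, not about HTTP. */
class CmsCore {
    private final Function<String, Optional<String>> contentSource;
    private final String template; // must contain the "{{content}}" placeholder

    CmsCore(Function<String, Optional<String>> contentSource, String template) {
        this.contentSource = contentSource;
        this.template = template;
    }

    RenderedPage render(String path) {
        Optional<String> content = contentSource.apply(path);
        if (content.isPresent()) {
            return new RenderedPage(200, template.replace("{{content}}", content.get()));
        }
        return new RenderedPage(404, template.replace("{{content}}", "<h1>Not found</h1>"));
    }
}

/** One possible frontend: a thin adapter for the JDK's built-in HTTP server. */
class JdkHttpFrontend implements HttpHandler {
    private final CmsCore core;

    JdkHttpFrontend(CmsCore core) {
        this.core = core;
    }

    @Override
    public void handle(HttpExchange exchange) throws IOException {
        RenderedPage page = core.render(exchange.getRequestURI().getPath());
        byte[] body = page.html.getBytes(StandardCharsets.UTF_8);
        exchange.getResponseHeaders().set("Content-Type", "text/html; charset=utf-8");
        exchange.sendResponseHeaders(page.status, body.length);
        try (OutputStream out = exchange.getResponseBody()) {
            out.write(body);
        }
    }
}
```

To try it, the handler can be registered on an HttpServer created with HttpServer.create(new InetSocketAddress(8080), 0), using createContext("/", new JdkHttpFrontend(core)) followed by start(). All the CMS logic stays in the core, so adding another frontend only means writing another small adapter.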