Tuesday, January 12, 2010

Content Repositories

The difference between managing content and managing data has become clearer to me in recent months. Over the last two years I’ve had the opportunity to get familiar with XML repositories (mainly MarkLogic Server), but I did not really see the benefit of using such a technology outside of an enterprise-search-like application. MarkLogic provides a rich search API that includes content processing and enrichment, along with some other pretty powerful features. All of this functionality is great, but what if your project budget can’t accommodate the steep price? Are lesser-known or open-source content repositories still worth it?

I think so. First, what does it really mean when someone says their application manages content vs. data? In most cases, content and data can describe the same type of information but differ significantly in extensibility. Data has a rigid structure that can be difficult to change over time, while content tends to have a looser structure that can absorb additions and transformations more gracefully over the life of an application or solution.

With that loose distinction between content and data in place, picture using an XML repository to store messages exchanged in a SOA publish / subscribe paradigm. The messages will most likely evolve over time. With a relational store, that traditionally means developers have to update both the queries in the application and the structure of the database. Since XML repositories don’t impose a set structure, developers are only concerned with how to retrieve the information and don’t need to concern themselves with how the document is stored (message validation isn’t included in this discussion).
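To make the message-storage scenario a little more concrete, here is a minimal sketch in Python. It is not MarkLogic or any real repository API: the `XmlRepository` class, its `store` and `find_text` methods, and the `<order>` message format are all hypothetical stand-ins I made up for illustration. It only assumes the standard library's `xml.etree.ElementTree` for parsing and path lookups.

```python
import xml.etree.ElementTree as ET


class XmlRepository:
    """Hypothetical in-memory stand-in for an XML repository.

    It keeps documents as they arrive and enforces no schema.
    """

    def __init__(self):
        self._docs = []

    def store(self, xml_text):
        # Parsing only confirms the message is well-formed XML;
        # nothing here depends on which elements it contains.
        self._docs.append(ET.fromstring(xml_text))

    def find_text(self, path):
        # Collect the text of every element matching the path,
        # across all stored documents.
        results = []
        for doc in self._docs:
            for elem in doc.findall(path):
                results.append(elem.text)
        return results


repo = XmlRepository()

# Version 1 of a hypothetical order message.
repo.store("<order><id>1001</id><customer>Acme</customer></order>")

# Version 2 adds a <priority> element. The storage code is unchanged.
repo.store(
    "<order><id>1002</id><customer>Globex</customer>"
    "<priority>high</priority></order>"
)

# The existing retrieval keeps working for both versions.
order_ids = repo.find_text("id")

# Only code that cares about the new element needs a new path.
priorities = repo.find_text("priority")

print(order_ids)
print(priorities)
```

The point of the sketch is that the second message version required no change to `store` and no migration step. In a relational design, the new `<priority>` field would typically mean altering a table and revisiting the queries that touch it.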

This use case is basic, but it hopefully illustrates a key benefit for those looking to distinguish managing content from managing data.
