A title isn't that important...

Tuesday, April 26, 2011

Simply SaaS (Part 3)

This is part three of a multi part series where I will talk about what it means to develop and deliver a Software-as-a-Service application. I'll be trying to strike a balance between the technical aspects of bringing an app into the cloud and the advantages of using the SaaS approach as an Independent Software Vendor (ISV). I am NOT an expert in SaaS by any means...I'm writing this to educate myself and hopefully any reader that has the same curiosity. Your mileage may vary.

In the last post we talked about a sample SaaS application to be developed, discussed getting started with our development framework (SaaSGrid Express), and covered setting up our deployment infrastructure (Amazon EC2). With this post we will take a look at the SaaSGrid SDK in action to give a better idea of how easy it is to SaaS-enable a .NET application. To get started we need the following:

Visual Studio 2010 - Since I don't own a copy of Visual Studio, I decided to give the 30 day trial a spin for this exercise. Although I didn't try it you can use the Express version of VS with SaaSGrid if you follow the instructions here.
SaaSGrid SDK - You may have already installed this when you installed SaaSGrid. If not you can download it here.
Familiarity with Windows Communication Foundation (WCF). It's pretty much essential to all things SaaSGrid. I'd highly recommend the book "Programming WCF Services" if your knowledge is advanced enough to circumvent the Hello World tutorial and move directly into best practices.
You should read the "Developer Topics" section of the SaaSGrid Developer Guide at a minimum - if you are really serious about using SaaSGrid you should take the time to work your way through the guide in its entirety before you sit down to develop.

In order for SaaSGrid to function properly, your Visual Studio project must adhere to a semi-rigid structure. The project must consist of a root project that houses the Web UI and one or more projects that contain the WCF services needed to make your Web UI function. Additionally a "Database Scripts" folder with a file called ApplicationProvisioning_Script.sql must contain the SQL necessary to build the table structure(s) for your WCF services to use. For my project I relied on the SaaSGrid template (that's included in the install of the SDK) to provide the basic structure. I ended up replacing the root web application project with an ASP.NET MVC 2 web application project since I was interested in checking out the .NET MVC framework. If you follow a similar path be sure to copy the SaaSGrid.cfg.xml file from the original root web application.

In a SaaSGrid application WCF services will provide the majority of the application's functionality. This paradigm doesn't fit the traditional MVC development methodology but I was able to find some people that took the same plunge. Additionally, if you're like me and think most modern web applications will have a mobile counterpart, the push to a "pure" service oriented web application is a good thing. Not only will your application be ready to scale at the click of a button - you'll be able to pump out a mobile app without the added effort of exposing your application's business logic after the fact. Sweet.

Recalling the description I gave for the sample application, a movie buff (Fred Film) will be able to manage a list of his/her favorite movies. Movie rental store owners (Ralph Rental) will be able to connect with users like Fred Film and tell Fred how many copies of his favorite movies they have in stock so that Fred can plan accordingly before heading out to the store. As an added bonus Ralph will be able to collect demographic information on users like Fred and perform some very simple analysis on his client base so that he can optimize his marketing strategies moving forward. Since we are building one application instead of two separate apps (one for the general public and another for store owners) we need to define the features of the application that will be for paying subscribers (i.e., store owners) only:

The ability to set the inventory level for a given movie
Associate a movie with the store
Provide 10 reports a month that show the number of users (based on zip code) connected to the subscribers store

At a general level, here are the features of the application that the general public can use:

Find and associate with a local rental store
Add favorite movies
View the inventory level of the user's favorite movie at his/her local rental store

Another great feature about SaaSGrid is that it controls the account (and role) creation for all users of your application. We can organize the functionality that we need to implement into the various roles and start coding. Let's take a look at our first SaaSGrid enabled method signature in a service interface we've defined for a rental store owner:

The FaultContract attribute declares the fault that is returned to the caller if authorization fails when trying to execute the implemented version of this method. Its presence indicates that the implemented function will contain SaaSGrid authorization. Now let's look at the implemented method for the signature above:

The Secured attribute in the implemented method defines a role based operation for associating a movie to a store. As you can see there is very little effort for this level of granularity. In fact all we are doing is decorating the method with an attribute - no code change required! Also note that I'm taking advantage of the TenantId instead of making the developer supply a Guid for a given store. The SaaSGrid SDK provides access to a series of hierarchical contexts (e.g., ProviderContext, TenantContext) allowing developers to access relevant details about the current tenant, user, provider, etc. This concept is another reason why it's important to go through the SaaSGrid Developer Guide. This post provides some additional details on accessing information that belongs to tenants outside of the scope of the current context.

To help with testing your application locally you'll need to use the Mocker application (installed as part of the SDK) to add sample tenants, users, roles, as well as securables. You will need to define a securable for each Secured attribute that's in your application (essentially the attribute value is the securable definition). When you define each role you'll assign one or more securables to a role to define permissions. Moving on, roles are assigned to users and users belong to a tenant (or subscriber). Finally, before you launch the debugger for your WCF services and your web app, you will need to create the database and tables by executing the SQL contained in the ApplicationProvisioning_Script.sql script that was previously defined.
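To make the relationship between securables, roles, users, and tenants a little more concrete, here is a minimal sketch of the idea in plain JavaScript. It is not SaaSGrid code: the role names, securable names, tenant id, and the canExecute helper are all made up for illustration, and the real platform manages this for you.

```javascript
// Hypothetical in-memory model; none of these names come from the SaaSGrid SDK.
// Each role is simply a list of the securables it grants.
const roles = {
  StoreOwner: ["AssociateMovieWithStore", "SetInventoryLevel"],
  Consumer: ["AddFavoriteMovie"],
};

// Users belong to a tenant (subscriber) and are assigned zero or more roles.
const tenants = {
  "ralph-rental": {
    users: {
      ralph: { roles: ["StoreOwner"] },
      clerk: { roles: [] },
    },
  },
};

// A user may execute an operation only if one of their roles holds its securable.
function canExecute(tenantId, userId, securable) {
  const tenant = tenants[tenantId];
  const user = tenant && tenant.users[userId];
  if (!user) return false;
  return user.roles.some((role) => (roles[role] || []).includes(securable));
}

console.log(canExecute("ralph-rental", "ralph", "SetInventoryLevel"));
console.log(canExecute("ralph-rental", "clerk", "SetInventoryLevel"));
```

In the sketch, each string in a role's list plays the part of one securable, which is the same value you would put in the matching Secured attribute.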

In the next post I'll showcase some snippets that highlight finer grained monetization constructs, discuss how to package your application for deployment, discuss the product management features included with SaaSGrid, and talk about versioning your application. Enjoy!

Tuesday, April 12, 2011

Simply SaaS (Part 2)

This is part two of a multi part series where I will talk about what it means to develop and deliver a Software-as-a-Service application. I'll be trying to strike a balance between the technical aspects of bringing an app into the cloud and the advantages of using the SaaS approach as an Independent Software Vendor (ISV). I am NOT an expert in SaaS by any means...I'm writing this to educate myself and hopefully any reader that has the same curiosity. Your mileage may vary.

In Part 1 I provided a general overview on what it means to develop and deliver a SaaS application. In this post I will outline an example application and discuss the infrastructure I will be using to deploy my app into the cloud. This discussion should demonstrate the benefits of why SaaS applications are awesome from the perspective of a developer and a business owner (ISV).

Since the intent of this series is to talk about SaaS (and not about original ideas for future SaaS applications) I didn't put a lot of creative thought into my example. The premise of the application is pretty simple...we are going to take the ASP.NET MVC Movie Tutorial (let's call it MoviesAndYou) and add the ability for brick-and-mortar movie rental stores to communicate with their customers. I decided on this example because I wanted to build a .NET application using the ASP.NET MVC framework and I also wanted an opportunity to try out Apprenda's SaaSGrid Express product. The application will have two types of users:

Let's call the first user Fred Film. This guy loves movies and uses MoviesAndYou to manage a list of movies that he has watched. Fred rates each movie he watches and likes to rent his favorite movies frequently from his local store. Fred doesn't want to pay for the MoviesAndYou service.
The second user will be Ralph Rental. Ralph owns several movie rental stores (as in brick-and-mortar) near Fred. Ralph would like to let his customers know the availability of the movies in his stores so that the customer can reserve the movie online and pick it up in store. Ralph sees the value in having an online reservation system but cannot afford to pay a large upfront fee for this service.

Seem easy enough? Ralph will be paying monthly for the online reservation system. Each store he owns can be thought of as a traditional software license seat. To start, Ralph will introduce the system in one store and then expand to include other stores as he realizes an increase in business from the online reservation system. For the sake of this example we'll pretend that brick-and-mortar rental stores still exist!

From the ISV's perspective, MoviesAndYou is going to be a single web application. Roles will be used to delineate between consumers and brick-and-mortar users. The data for all users will be stored in a single, logical database. From my last post we know that the SaaSGrid framework will take care of co-mingling the database records so that the developer doesn't have to worry about setting up a new database instance each time a new brick-and-mortar customer subscribes. With SaaSGrid you're able to bring new customers online with a few clicks of the mouse - no messy deployment/configuration scripts. Additionally SaaSGrid provides the infrastructure for metering and billing of your application (that will be covered later). At the end of the day the ISV can focus on the functional aspects of their application instead of stressing out over infrastructure.
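For readers new to multi-tenancy, the idea of co-mingled records in a single logical database can be sketched roughly as follows. This is only an illustration in plain JavaScript and says nothing about how SaaSGrid actually implements it: the inventoryRows table, the store ids, and the forTenant helper are all hypothetical.

```javascript
// Hypothetical shared table: every row carries the id of the tenant that owns it.
const inventoryRows = [
  { tenantId: "store-a", movie: "Jaws", copies: 3 },
  { tenantId: "store-b", movie: "Jaws", copies: 0 },
  { tenantId: "store-a", movie: "Alien", copies: 1 },
];

// Returns a view that can only read and write rows belonging to one tenant.
function forTenant(rows, tenantId) {
  return {
    // Reads are always filtered by the tenant id.
    all: () => rows.filter((row) => row.tenantId === tenantId),
    // Writes are always stamped with the tenant id.
    add: (movie, copies) => {
      const row = { tenantId, movie, copies };
      rows.push(row);
      return row;
    },
  };
}

const storeA = forTenant(inventoryRows, "store-a");
storeA.add("Brazil", 2);
console.log(storeA.all().map((row) => row.movie));
```

The point of the sketch is that application code works against the tenant-scoped view and never filters by tenant itself, which is the kind of plumbing the post expects the framework to take care of.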

We have the business case laid out for the application and we also have our technology selected. To get started developing with SaaSGrid we need a place where we can install the framework. Since I don't have a spare server lying around, I thought this would be a great opportunity to get my feet wet with an Amazon EC2 Windows instance.

Getting Started with Amazon EC2

I found the signup process to be straightforward enough for EC2 that I'm not going to cover that here. Once I had my account set up I selected a Windows Server 2008 with SQL Server Express + IIS instance to deploy to. Once you complete the sign up process and select the appropriate AMI you'll need to log in to the EC2 console to access your instance. From here you're able to get the connection details that will allow you to remote desktop into your instance and start the install of SaaSGridExpress.

Before you get started installing SaaSGrid make sure you can successfully connect to the SQL Server Express service. At the time of this writing, the SQL Server Express service is set to 'Manual' by default and is not running out of the box. Open up the Server Manager and browse through the Services MMC and set all of the SQL Server Express services to start automatically. I wasn't able to get the SQL Server Agent to start but found that it wasn't necessary to have a successful install of SaaSGrid. With all of the services running, open up SQL Server Management Studio and connect to the .\EC2SQLEXPRESS instance using Windows Authentication. Once you've done that then move on to installing SaaSGrid.

Installing SaaSGridExpress on Your EC2 Instance

The installer for SaaSGrid will first validate your EC2 instance to ensure that all of the necessary services are in place for the framework to operate. Don't be surprised if this process finds some issues with your configuration - a 'repair' button is provided in most cases where the minimum requirements aren't met. Also be aware that some of the repairs will require you to restart the server.

After validation, the actual installation will take place. Since I'm using an inexpensive EC2 instance (in other words it doesn't have a lot of horse power) the install process took a fair amount of time (approx. one hour). Once the initial install has completed, you'll be prompted to install the administrative tools and SDK.

Finally, even though I didn't receive any warnings during the install I was getting a 500.21 error when trying to view any of the SaaSGrid related apps deployed to IIS. Not to worry, Apprenda has a pretty active developer community. I searched for the error and found this post. Since the fix described there assumes .NET 2.0 I had to update the command to run in .NET 4.0 - also note that you'll need to navigate to the Framework64 folder if the EC2 AMI you selected is 64-bit.

Note: It was my experience that if you need to re-install SaaSGrid be sure to perform a complete uninstall before going through the installation process again. For some reason I didn't have any luck with the 'reinstall' feature.

Now we are all set up to build our application. In the next post I'll take a look at the SDK provided by SaaSGrid...

Thursday, March 24, 2011

Simply SaaS (Part 1)

This is part one of a multi part series where I will talk about what it means to develop and deliver a Software-as-a-Service application. I'll be trying to strike a balance between the technical aspects of bringing an app into the cloud and the advantages of using the SaaS approach as an Independent Software Vendor (ISV). I am NOT an expert in SaaS by any means...I'm writing this to educate myself and hopefully any reader that has the same curiosity. Your mileage may vary.

I think it's an understatement to say that the term cloud computing is overused. What does it really mean to deliver an application in the cloud? In recent years developers put together a web application or solution by marrying a database with a development framework and an application server, resulting in a technology stack (e.g., LAMP). Once the application was developed using these components it was deployed to a hosted environment or installed on premise to be part of a private enterprise. Ongoing maintenance for the application involved generating patches to fix bugs and supply enhancements. Sounds familiar and perhaps a little boring, right? Around 1999 a guy named Marc Benioff took a sabbatical from his day job at Oracle (and by sabbatical I mean he was in Hawaii for 3 months and then in India "finding himself" for another 2 months) and started Salesforce.com. This, for historical purposes and to the best of my knowledge, was the start of the cloud buzz.

Marc wanted to offer easy to use enterprise software that didn't cost organizations an arm and a leg to implement and support. He did this by creating a pioneering CRM application that was nothing more than a web application / site that provided access via the Internet instead of the Intranet. Brilliant! Needless to say the application was a huge hit and later spawned the Platform-as-a-Service (PaaS) offering known as force.com.

To me, delivering a SaaS application seems like a smart move for an ISV. You have a low barrier to entry to get started, you can pass on that cost savings to customers, and you don't necessarily need to worry about supporting the infrastructure that runs your app at the end of the day. At a high level this seems like nirvana but you'll quickly find that it's difficult to understand what technologies you should be concerned with (as a developer) and how you will handle the subscriptions (as an ISV). I hope to fill in some of the blanks to developing and delivering a SaaS application by documenting the process I follow as I take an (albeit trivial) app from inception to delivery.

I'm a developer, what tools or frameworks exist to help migrate my project into the cloud?

First off there is a general distinction we need to make. SaaS is a type of software, not a particular technology. There exist a few "on demand" concepts that enable SaaS application developers. These concepts would be Infrastructure-as-a-Service (IaaS) and Platform-as-a-Service (PaaS). Think of IaaS as Amazon EC2 - you get access to a hosted virtual machine where you can deploy your application. As your application grows and requires more resources, you can add more iron to your equation to get you that extra horse power. The application developer is still responsible for worrying about things like load balancing and clustering. An example of a PaaS would be force.com. If you haven't looked at this already it's worth signing up for the free developer account and poking around the documentation. Basically force.com allows developers to build apps with a domain specific language (DSL) known as APEX. You build your app using APEX and as you scale your app to customers you are charged a metered rate. Salesforce.com also recently bought Heroku...this is a Ruby on Rails based PaaS. You build your app in Rails, deploy it to the cloud using the Heroku gem, and you are billed for your usage as your application scales to meet demand. There are too many PaaS offerings to mention in one post but hopefully you get the idea.

I'm an ISV what does it mean to build a cloud application?

Low barrier to entry for your customers. SaaS supports the idea of multi-tenancy, which allows you to co-mingle customer data (as long as the law doesn't prevent it) and therefore provide your customers with access at the push of a button. Customers pay for the software with a subscription so that they only get what they need instead of paying for countless licenses that end up collecting dust somewhere.

This is great but what if I already built my application? Do I need to re-develop or re-architect it?

Possibly. Apprenda provides a technology called SaaSGrid that can mostly eliminate that concern provided that you have an existing .NET application. SaaSGrid is advertised as an application framework that allows you to migrate your .NET application from a more traditional application service provider (ASP) model to a SaaS offering. The technology provides things like multi-tenancy so all you need to concentrate on is building your application like you normally would. In my opinion if you have a .NET web application that's worthy of the cloud or know .NET and want to build a new application, you should definitely give Apprenda a glance. You can download an express version that will allow you to install it on a server and see how the whole thing works. While I tend to be more of a Java / Ruby guy I definitely see the value in using something that's .NET based. Many enterprise grade SDKs (I'm thinking of ESRI since I'm a GIS geek) play nicely with .NET and maybe aren't fully featured for integration with other technologies like Rails or a PaaS DSL such as APEX. This argument won't hold forever but seems to be logical for the present.

In the next post I will talk about the application that I'm using as an example for this series and how it will benefit from an "on demand" delivery instead of shipped bits. Stay tuned!

Tuesday, March 15, 2011

Before Using an SDK...

...What types of questions should you try to answer? Here's a short list of evaluators that I've used in the past:

How easy is it to find a tutorial? It always makes me feel better when I can produce something with the SDK without having to waste an entire day.
When was the last release made? Although it's a generalization, SDKs that have current releases seem to have better support.
If the project is open source, how big is the community? If it's not open source, how easy is it to get technical support?
Is the SDK developed natively for your target platform? In the age of VM based languages, this question seems somewhat dated. However, from a maintenance perspective it is easier to integrate an SDK that closely matches the syntactic style of the rest of your project. Additionally, if the SDK is a translation from another language, you may need to develop wrappers in order to maintain a coherent workflow.

Are there other questions you ask yourself before you download and code?

Thursday, March 03, 2011

Transformations in PostGIS

Lately I've been doing a lot of development centered around PostGIS (http://postgis.refractions.net/) and ran across a problem that has a relatively easy solution (for those well versed in GIS) but finding an answer proved to be somewhat difficult.

Here are the assumptions and the requirements before we look at the problem:

A point is defined by a latitude and longitude.
A line is defined by a series of points.
A buffer is defined as a shape that envelops a line. The width of the buffer is specified by the user in meters. The distance value the user supplies is a +/- distance from the supplied line.
We are assuming WGS 84 as the target map projection.

PostGIS provides a handy buffer function called ST_Buffer. In order to use this function, the geometry supplied to it and the desired buffer distance need to be in the same coordinate system. See the problem yet? My coordinates are in decimal degrees (i.e., latitude and longitude) and my buffer distance is in meters. So I need to transform the line into a different coordinate system that will allow me to represent my geography based shape in metric coordinates. This transformation is non-trivial since converting between decimal degrees and meters relies on the use of a map projection because the world isn't flat (sorry!). Before we go any further let's look at the query I've developed so far:

SELECT ST_Buffer(GeomFromText('LINESTRING(-76.543 42.567, -76.012 42.345, -75.890 42.445, -75.543 42.330)'), 300.0) AS buffer_shape

The query is valid, but PostGIS interprets the buffer distance in the units of the geometry, which here are decimal degrees (i.e., WGS 84), and therefore produces a shape that pretty much covers the world. I tried guesstimating the conversion from meters to decimal degrees but quickly realized that, at best, a guesstimate would be terribly wrong.

To solve this problem - I employed the Transform function from PostGIS. In order to get this function to work properly I had to determine what coordinate system to transform my geometry into...this is where my Google-fu fell relatively short. Luckily I went to http://gis.stackexchange.com and found a reference to the proj4js.org project. Using proj4js I was able to determine the SRID of the desired metric coordinate system (that SRID is 900913). This coordinate system bases coordinates off of meters instead of decimal degrees. So I changed my original query to be the following:

SELECT Transform(ST_Buffer(Transform(GeomFromText('LINESTRING(-76.543 42.567, -76.012 42.345, -75.890 42.445, -75.543 42.330)', 4326), 900913), 300.0), 4326) AS buffer_shape

The above statement transforms the line from WGS 84 to a Mercator projection, performs the buffer operation, and then does another transformation to project the resulting buffer shape back into WGS 84. Voila! Again, nothing earth shattering here but if you don't speak GIS all day then performing the transformations may not be entirely obvious. In my opinion the PostGIS documentation falls short when mentioning the transformation details.
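For the curious, the projection behind SRID 900913 is the spherical Mercator used by most web maps, and its math is short enough to sketch by hand. The JavaScript below is a rough illustration only: it does not use proj4js, and the function names are my own. It also exposes a caveat worth knowing, which is that Mercator stretches distances by about 1 / cos(latitude), so at roughly 42.5 degrees north a 300 unit buffer in projected meters covers only around 220 meters on the ground.

```javascript
// WGS 84 semi-major axis in meters, the sphere radius used by spherical Mercator.
const EARTH_RADIUS_M = 6378137;

// Longitude/latitude in decimal degrees -> spherical Mercator x/y in projected meters.
function toMercator(lonDeg, latDeg) {
  const lon = (lonDeg * Math.PI) / 180;
  const lat = (latDeg * Math.PI) / 180;
  return {
    x: EARTH_RADIUS_M * lon,
    y: EARTH_RADIUS_M * Math.log(Math.tan(Math.PI / 4 + lat / 2)),
  };
}

// Inverse: projected meters -> longitude/latitude in decimal degrees.
function toLonLat(x, y) {
  return {
    lon: ((x / EARTH_RADIUS_M) * 180) / Math.PI,
    lat: ((2 * Math.atan(Math.exp(y / EARTH_RADIUS_M)) - Math.PI / 2) * 180) / Math.PI,
  };
}

// How much Mercator stretches distances at a given latitude (1 at the equator).
function mercatorScale(latDeg) {
  return 1 / Math.cos((latDeg * Math.PI) / 180);
}

// First vertex of the line used in the queries above.
const start = toMercator(-76.543, 42.567);
console.log(start);
console.log(toLonLat(start.x, start.y));
// Approximate ground distance covered by 300 projected units at that latitude.
console.log(300 / mercatorScale(42.567));
```

If the buffer has to be 300 meters on the ground, one option is to multiply the distance by the scale factor before buffering; another is to transform into a projection designed for the local area (a UTM zone, for example) instead of Mercator.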

Tuesday, February 02, 2010

In my last post I provided a small tutorial on how to use the Client-Side Event API within the ASPxGridView component from DevExpress. In this post I would like to cover another snippet that I came up with that may complement my previous tutorial. This snippet assumes that you have multiple selection enabled in the GridView and wish to preserve the selection as the user pages through results.


' Re-focus the first previously selected row that appears on the current page.
If Session("pks") IsNot Nothing Then
   Dim selectedPKs As List(Of Object) = CType(Session("pks"), List(Of Object))

   For i As Integer = 0 To selectedPKs.Count - 1
      Dim selectedIdx As Integer = ASPxGridView1.FindVisibleIndexByKeyValue(selectedPKs(i))

      ' AndAlso short-circuits, so the second test only runs when the first passes.
      If selectedIdx >= ASPxGridView1.VisibleStartIndex AndAlso selectedIdx <= ASPxGridView1.VisibleStartIndex + ASPxGridView1.VisibleRowCount Then
         ASPxGridView1.FocusedRowIndex = selectedIdx
         Exit For
      End If
   Next i
End If

This snippet goes in the grid's DataBound event handler so that it executes each time the user moves to a different page within the grid view.

Monday, January 25, 2010

Command Scripts

Recently I had to create a test that simulated a catastrophic failure to ensure the integrity of an embedded database. Development time for the test was limited so I set out for the quickest approach and saved myself the hassle of mocking test cases for another day. The test would initialize the database and then insert a random number of rows. During the test the OS would kill the process while rows were being inserted and then restart the process to ensure the integrity of the database.

I ended up coming up with a command script in Windows XP that I felt was worth sharing due to its relative obscurity. I will save the majority of the details and only highlight the important points.

First, let's look at the script I made to launch the java process responsible for bootstrapping the database:


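REM Launch the Java process without blocking this script
START java -cp . Sleeper
REM Crude sleep: one ping with a 5000 ms timeout to an address that is not expected to answer
ping 1.0.0.0 -n 1 -w 5000 >NUL
REM jps prints "<pid> <main class>" per line; remember the PID of the Sleeper process
FOR /F "tokens=1-2" %%i IN ('jps') DO (
IF "%%j" == "Sleeper" (
SET PID=%%i
)
)
TASKKILL /PID %PID%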

For me, the coolest part of this script was using the jps command to get the PID for the process that I launched in the first step. My second favorite feature (*cough* hack *cough*) was the use of the ping command (not my idea; see the link here). It's the only way I could get the script to wait for a few seconds before killing my process - allowing the test running in a separate process to reach the portion of the test where the insertion was occurring. It's not the most reliable or sane approach but it worked perfectly for creating a quick simulation.

The second snippet isn't as exciting as the first but I think it's a huge timesaver when trying to execute a Java program that has a very large number of dependencies all located in the same directory. This script scans the directory (e.g., the lib/ folder) and adds every filename ending with .jar to the classpath. Again, keep in mind that this isn't optimal - but it's something quick-and-dirty that you can use to get going:


::EnableDelayedExpansion must be turned on in order to
::programmatically append to the classpath
SETLOCAL EnableDelayedExpansion
SET CLASSPATH=.

FOR %%i IN (lib\*.jar) DO SET CLASSPATH=!CLASSPATH!;%%i

%JAVA_HOME%\bin\java -cp %CLASSPATH% YourClassGoesHere

Be sure to include the SETLOCAL (http://ss64.com/nt/setlocal.html) directive and to use the ! operator instead of the % operator when doing the assignment.

Sunday, January 24, 2010

ASPxGridView Client-Side API

When it comes to managing tabular data within an ASP.NET application I have found that the ASPxGridView control from DevExpress delivers a nice upgrade from the standard ASP.NET GridView control. In addition to the UI, the ASPxGridView control provides you with a seemingly endless number of customization options and provides great support for CRUD operations, exporting data, and filtering data with as little ceremony as is possible in an ASP.NET application (ok so maybe the ceremony is still there, it's just hidden inside of all of the design time configurations).

Overall, I'm 85% satisfied with how DevExpress implemented this control. The 15% of dissatisfaction that remains is distributed across missing features, method semantics, and the lack of documentation for the Client-Side API. Since my first two complaints are based on my opinion and I have no influence on how DevExpress conducts its business - I will use this post to cover the absolute basics for exercising the JavaScript API provided by the ASPxGridView control.

Accessing the Client-Side events is done by clicking on the smart tag on an ASPxGridView control while in design mode. With the "Tasks" panel open:

Click the link for "Client-Side Events..."
Select the event that you wish to customize.
You'll see an empty function:

function(s, e) {
 
}

The two parameters passed into the function are the sender object (that's the 's' parameter which is the ASPxGridView in our example) and the event object.
For this tutorial let's detect a selection change within the ASPxGridView and perform a simple validation on selection to enable or disable a button that's on the same form. After selecting the SelectionChanged event I provide the following JavaScript:
```
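function(s, e) {
    // Ask the grid for the "ID" value of every selected row; the callback receives them as an array.
    s.GetSelectedFieldValues("ID", function onGetValues(result) {
        var button = document.getElementById("btnCreateID");
        if (s.GetSelectedRowCount() != 1) {
            button.disabled = true;
        } else {
            // Enable the button only when the single selected row has no ID yet.
            if (result[0] == null)
                button.disabled = false;
            else
                button.disabled = true;
        }
    });
}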
```
The above code invokes the GetSelectedFieldValues method from the Client-Side API and supplies the column whose values we wish to retrieve (a column named "ID" in this case) and a callback function that's used to process the result of getting the selected field values. In our example, if we've selected exactly one row and that row doesn't already have an ID assigned to it, the create ID button is enabled.
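If you want to try the handler's logic outside of a page, one option is to pull the decision into a small function and drive it with a stand-in for the grid. The sketch below is an assumption-laden illustration: makeFakeGrid is a hypothetical stub that only mimics the two client-side methods used above and calls the callback immediately, and the button is a plain object rather than a DOM element.

```javascript
// Decision logic from the handler, pulled out as a pure function:
// enable only when exactly one row is selected and it has no ID yet.
function shouldEnableCreateButton(selectedRowCount, selectedIds) {
  return selectedRowCount === 1 && selectedIds[0] == null;
}

// Hypothetical stand-in for the grid; it mimics only the two methods the handler calls.
function makeFakeGrid(selectedIds) {
  return {
    GetSelectedRowCount: () => selectedIds.length,
    // The stub invokes the callback right away with the selected values.
    GetSelectedFieldValues: (fieldName, onValues) => onValues(selectedIds),
  };
}

// Same shape as the SelectionChanged handler, but the button is passed in.
function onSelectionChanged(s, button) {
  s.GetSelectedFieldValues("ID", (result) => {
    button.disabled = !shouldEnableCreateButton(s.GetSelectedRowCount(), result);
  });
}

const button = { disabled: true };
onSelectionChanged(makeFakeGrid([null]), button);
console.log(button.disabled);
```

Keeping the rule in its own function also makes it easy to reuse if another event needs to refresh the button state.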

That's all there is to it. The example is basic and somewhat contrived but hopefully it fills the void for your typical 'Getting Started' documentation that I'd like to see DevExpress provide.

Wednesday, January 20, 2010

My Top 10 Commonly Used Eclipse Shortcuts

Since getting reacquainted with the Eclipse IDE in the past few months I thought I'd share my top 10 most commonly used keyboard shortcuts (along with some commentary of course):

Alt + Up or Down Arrow: This shortcut moves a line of code up or down based on the directional arrow you choose. Indentation is respected so if you move a line of code inside of something like a loop or conditional you maintain formatting.

CTRL + Shift + F: Formats the code. This is a real time saver especially if you have a custom formatter defined for your workspace.

CTRL + Alt + Up or Down Arrow: Duplicates a line of code. I find myself using this shortcut when making variable declarations.

CTRL + Shift + T: Opens a Type. Similar to CTRL + Shift + R (open a resource) but this is more precise when you're truly looking to find a type and not a file in your workspace.

CTRL + Shift + O: Organizes imports. My code always feels amateur until I remove all of the unnecessary imports along with removing the blanket import statements (e.g., some.package.*).

Alt + Shift + R: Rename (refactor). I'm not sure when Eclipse added the UI-sugar that live updates your code during a rename operation but I like it!

CTRL + H: Search, not find. Not a huge fan of the search utility but sometimes it can't be avoided.

CTRL + /: Toggle Comment. Another huge time saver especially when you're trying to diagnose an offending block of code. For the purists at heart, CTRL+Shift+/ and CTRL+Shift+\ add and remove (respectively) block comments.

CTRL + W: Close window. Definitely not revolutionary but I feel this one is often overlooked.

CTRL + L: Go to line. For the moments when your stack traces appear in a separate log file and not in a console inside of eclipse (where a hyperlink is usually provided if the source is available).

Tuesday, January 12, 2010

Content Repositories

Managing content vs. data is something that has become clearer to me in recent months. Over the last 2 years I’ve had the opportunity to get familiar with XML repositories (mainly MarkLogic Server) but did not really see the benefit of using such a technology outside of an enterprise-search-like application. MarkLogic provides a rich search API that includes content processing and enrichment along with some other pretty powerful features. All of this functionality is great but what if your project budget can’t accommodate the steep price? Are lesser known or open source content repositories still worth it?

I think so. First - what does it really mean when someone says their application manages content vs. data? In most cases, content and data can describe the same type of information but vary significantly in terms of extensibility. Data has a rigid structure that can be difficult to change over time while content tends to have a less rigid structure that can absorb additions and transformations more gracefully over the life of an application or solution. With that loose distinction between content and data in place, picture using an XML repository to store messages exchanged in a SOA publish / subscribe paradigm. The messages will most likely evolve over time, requiring developers to update the storage and retrieval mechanisms, which traditionally means updating the relational queries in the application along with the structure of the database. Since XML repositories don’t have a set structure – developers are only concerned with how to retrieve the information and don’t need to concern themselves with the semantics behind storing the document (message validation isn’t included in this discussion).

This use case is fundamental and primitive but hopefully illustrates a key benefit for those looking to distinguish managing content from data.