Factors Impacting Content Migration

Whether one is implementing a new system or moving from one system to another, content migration is always an important aspect. Each situation is unique, so each migration scenario will have its own roadmap. However, some common factors are present in every migration and can determine how long the migration will last. I’m listing a few below. If you think there are other factors as well, please feel free to comment.

In order to take stock of these factors, one needs to follow a good migration approach and spend a decent amount of time on analysis. I will not go into the details of such an approach – there are quite a number of good articles on these approaches, and in particular I like this one by James Robertson. This post is also not about the importance of content analysis, governance and other best practices 🙂

So here are some factors. At this point, these are only high-level thoughts and I will probably expand them as and when I get time. Feedback is most welcome.

Source(s) of Content

Where and how the original content lives is probably the most important factor defining a migration. It could be in a flat file system, a database, another content management system, another application or somewhere else. It is important to understand whether it is stored in a proprietary format and how easy it is to access. Obviously, content stored in a table in a relational database is easier to access than something stored in a proprietary format.

Type of Content

Content could be media (images, video, audio), documents (DOC, XLS, PDF), text (HTML, XML), database records or something else. Migration timelines depend hugely on this – migrating X-ray images, where each file could be a couple of MB or more, poses different challenges than migrating small text files. And when done across multiple environments, the effort only multiplies.

Closely related to this is what the content actually contains. For example, do you need to migrate multiple languages, encodings and character sets?

Quality of Content

Pranshu gave me this example on Twitter the other day. He was involved in a migration scenario in which the headlines of news articles were actually images. So even though migration of the body and other fields could be automated, a good amount of manual intervention was required to convert image headlines into text headlines for the destination. Some other examples could be:

  • Do all HTML files follow a template?
  • Content with inconsistent metadata (like Male/Female vs M/F)
  • Content with missing metadata that could be mandatory on the destination system
  • How much of the content is still relevant?
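To make the inconsistent-metadata point concrete, here is a minimal sketch of a cleanup pass you might run before import. The field names, the mapping table and the review flag are all assumptions for illustration, not part of any specific CMS:

```python
# Hypothetical sketch: normalizing inconsistent metadata values before
# import, and flagging records that are missing mandatory fields.
GENDER_MAP = {"male": "M", "m": "M", "female": "F", "f": "F"}

def normalize_item(item):
    """Return a copy of the item with cleaned metadata and a review flag."""
    cleaned = dict(item)
    raw = (item.get("gender") or "").strip().lower()
    cleaned["gender"] = GENDER_MAP.get(raw)  # None if the value is unmappable
    # Flag records missing metadata that the destination makes mandatory
    cleaned["needs_review"] = cleaned["gender"] is None or not item.get("author")
    return cleaned

items = [
    {"title": "Story A", "gender": "Male", "author": "jo"},
    {"title": "Story B", "gender": "F"},
]
print([normalize_item(i) for i in items])
```

Records that come out with `needs_review` set are the ones that will eat up manual-intervention time, so a pass like this also doubles as an early estimate of how much of the content is automatable.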

Amount of Content

Both the amount of content and the size of individual files are relevant. In document scenarios this matters because huge files take time to move across; in web content scenarios it matters especially when things need manual intervention.

Destination System

Is the target system a content management system or something else? Does it support the fields that you require, or do you need workarounds? Does it provide some mechanism to ingest content? Does it support your metadata and permissioning requirements?

Transformation or Value Add required

In my opinion, this is probably the most important factor. The amount of transformation required between source and destination largely defines how much automation is possible and how much manual intervention is required. If you were to do an “as is” migration, things would possibly be trivial. Some examples:

  • The Title field in the source needs to be mapped to the Headline field in the destination
  • Do all source fields need to be migrated?
  • Is there a need to define additional fields?
  • Is there a need to transform fields based on constraints (for example, an ID stored in the source CMS as “123-45-6789” where the new CMS does not permit “-” and needs it stored as “123.45.6789”)?
  • Data cleansing
  • Do you need other value adds (like SEO, copywriting and so on)?
  • Do you need to repurpose the same content, say for delivery to mobile devices?
  • Are there links between files that need to be preserved (like an XLS embedded within a DOC)?
  • Do you want to migrate only the latest version or all versions? What happens to content that is part of an incomplete workflow?
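A couple of the transformations above – the title-to-headline field mapping and the ID constraint – can be sketched as a tiny transform function. The field names come from the examples in the text; everything else (the map structure, the drop-unmapped-fields behaviour) is an assumption:

```python
# Illustrative source-to-destination transform: rename mapped fields,
# drop unmapped ones, and rewrite IDs to satisfy a destination constraint.
FIELD_MAP = {"title": "headline", "body": "body"}

def transform(source_record):
    dest = {FIELD_MAP[k]: v for k, v in source_record.items() if k in FIELD_MAP}
    # Destination does not permit "-" in IDs: 123-45-6789 -> 123.45.6789
    dest["id"] = source_record["id"].replace("-", ".")
    return dest

print(transform({"id": "123-45-6789", "title": "Hello", "body": "...", "legacy_flag": "x"}))
# fields not in FIELD_MAP (like legacy_flag) are silently dropped
```

Even a sketch this small forces the “do all source fields need to be migrated?” question: anything not in the map disappears, so the map itself becomes the documented answer.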

Users and Roles

The difference in how users, roles and the whole permissioning system work in the source as compared to the destination also plays an important role. This depends on the capabilities of the systems as well as on how comprehensively your organization has defined these. In some cases, just like data mapping, you might also need to map permissions: Read permission in the source could be mapped to View permission in the destination. There will also be cases where there is no one-to-one mapping of permissions between source and destination.
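A permission mapping can be handled the same way as a field mapping, with one addition: you have to decide what to do when no one-to-one equivalent exists. The permission names and the fallback choice below are assumptions for illustration:

```python
# A minimal sketch of permission mapping between source and destination.
PERMISSION_MAP = {
    "read": "view",      # Read in source -> View in destination
    "write": "edit",
    "delete": "manage",  # no exact equivalent; mapped to the closest role
}

def map_permissions(source_perms, default="view"):
    unmapped = [p for p in source_perms if p not in PERMISSION_MAP]
    mapped = [PERMISSION_MAP.get(p, default) for p in source_perms]
    return mapped, unmapped  # unmapped entries need a manual decision

print(map_permissions(["read", "write", "archive"]))
```

Returning the unmapped list separately, rather than silently applying the default, keeps the “no one-to-one mapping” cases visible so someone can sign off on them.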

Amount of automation possible

Based on some of the factors above, you will have an idea of how much of the migration can be automated. The extent of automation depends on the source as well as the destination system:

  • Does source allow export of content?
  • Does destination allow import of content?
  • Are 3rd party products for analysis and migration being used?
  • Do these products allow ETL kind of activities?
  • And so on

Roll out

How you want to roll out the new system also impacts your timelines. In scenarios involving multiple geographies or multiple business units, it can get tricky; the reasons for this are more organizational than technological. So whether you do a big-bang roll out or a phased roll out impacts the migration process.

Parallel Run

This is in some ways related to the point above. Will the source and destination systems be required to run in parallel? If yes, content will possibly have to reside in both places, and if users continue to modify content during the migration, you will have to consider doing multiple iterations.

Infrastructure and Connectivity

The speed at which content can be moved across, exported, imported or ingested also depends on the connectivity between the source, destination, databases and so on.

So do you have similar experiences? Are there any other factors that can impact migration time lines?

(Thanks to @pranshuj, @rikang and @lokeshpant for inputs)


Fatwire "Rescues" Interwoven and Vignette

Forrester recently named Fatwire a Leader in their WCM for External Sites report. And the folks at Fatwire have already called two of their fellow-quads (for lack of a better term), Interwoven and Vignette, legacy WCM products! Incidentally, Interwoven sits nicely in the Leader quadrant in the same report and was also named the fastest growing ECM vendor by rival analyst firm Gartner. (Yeah, yeah, I know – the Forrester report is on WCM and the Gartner one is on ECM.)

On a more serious note, though, Fatwire has been making some news recently. Among other things, they announced a rescue program for “legacy” Interwoven and Vignette customers – an offer to move to Fatwire at no license cost (only the support costs). They have announced this offering in partnership with Vamosa and Kapow. Vamosa and Kapow both have content migration offerings and compete in this space; Fatwire says they both add value to this proposition. I suspect they have partnered with both because Vamosa, along with expertise in many aspects of content migration, has connectors for Interwoven and Vignette, while Kapow has connectors for Fatwire. Any content migration scenario will require both sets of connectors – one set that exports from Interwoven or Vignette, and one set that imports into Fatwire. You could obviously roll your own migration scripts by publishing from Interwoven/Vignette as XML and then using Fatwire’s XMLPost or BulkLoader to import into Fatwire. But then the offer for free licenses wouldn’t be free, would it?

BTW, even though Fatwire’s release mentions these as partners, neither of the two has issued its own press release or mentioned it on its site. I think that’s natural, because they probably have partnerships with those “legacy” vendors 🙂

This is an interesting and, I’d say, aggressive move by Fatwire. After all, there are only a few niche WCM vendors remaining and they are one of them. There is a clear divergence happening in the marketplace – on the one hand there are more web-oriented scenarios (web content management, site management, portals, web sites and so on), and on the other are more repository/management-oriented scenarios (document management, records management). The requirements, challenges and decision makers (and stakeholders) for these two areas are usually different. Fatwire has been focusing on and targeting the needs of interactive marketers, which usually fall under the former category of web-oriented scenarios (or Web Experience Management, as they like to call it), while many other products have been diversifying horizontally. Call it vertical vs horizontal diversification if you will.

If there was ever a time to go aggressive, this was possibly it, with the two other big ones having been acquired. Interwoven and Vignette, though, can by no means be called “legacy”, even though they have been acquired. There are probably a few customers out there who are not convinced about Interwoven’s and Vignette’s future after their acquisitions by Autonomy and OpenText respectively. But then, as Forrester’s Tim Walters says on his blog, there are many customers out there, including Fatwire customers, who are unhappy with their current implementation. So nothing stops the other vendors from coming out with this kind of offer for existing Fatwire customers. In fact, as Tony Byrne says, there is nothing new about this kind of competitive upgrade.

If you do take up this offer, remember that even though there is no license cost, there are quite a few other costs apart from the support costs you would have paid to Vignette or Interwoven. Here’s Irina’s post on the real costs of implementation.

For one, you will have to work with Fatwire’s “proven migration tools and services”, which probably means you will need to work with Fatwire’s, Vamosa’s and Kapow’s professional services. All three products (Interwoven, Vignette and Fatwire) have decent mechanisms for importing and exporting content, so content migration per se is certainly not the most challenging aspect. When you migrate from Interwoven to Fatwire in particular, there are many other challenges depending on which version of TeamSite you are using. TeamSite’s delivery templates are totally different from Fatwire’s. If you are using the Perl-based PTs (Presentation Templates) and doing static publishing, your challenges are even bigger. There are many other issues as well – different ways of defining assets, all the complex customizations, different storage (XML vs database), workflows and so on. Vignette, although more similar to Fatwire than Interwoven is in terms of architecture, will pose similar challenges. Apart from the technical challenges, any content management implementation and content migration has its own set of challenges in terms of user training, ensuring content quality (Vamosa has some useful offerings here as well), different skill sets and so on. Here’s a nice take on the different issues by Jon.

I could write a long article on just the differences between Fatwire and Vignette/Interwoven and the resulting challenges, but the point is: don’t assume it is only about “content” migration. You will need to budget for many other things as well.

Content Repositories – Coexistence, Migration and Consolidation

Many of our customers have more than one content repository. So we often get into situations where there is a need for:

  1. Coexistence: The business requires these multiple repositories to exist simultaneously. This could be because there are different applications for different requirements, or because the migration effort is so huge that it is not possible to retire one system immediately. So there is a need for a common interface, so that business users can access all the repositories without knowing they are different. They should be able to check out from one, check in to another and generally work on multiple repositories as if there were a single backend system.
  2. Migration: For any of several reasons (licensing, satisfaction etc.), there is a requirement to move content from one system to another – or perhaps to deploy content from a content repository to a delivery channel.
  3. Consolidation: To save costs (licensing, training, infrastructure), they want to consolidate to a smaller number of repositories.

Now obviously, in any of these scenarios, it becomes very important to do a content inventory, content analysis and mapping, taxonomy assessment and so on. However, when the content size is huge (some of our customers have terabytes or more of content, much of it in the form of huge documents), it becomes important to automate the migration process to the extent possible. What this essentially means is that you need an intermediate layer that can talk to the source and target repositories and move content across. Depending on whether you want coexistence or migration, the intermediate layer needs to be two-way (read and write) or just one-way (import or export). I can think of three ways to achieve this:

  1. Roll your own: This possibly provides the most flexibility but takes the most time to develop. You essentially write your own code that exports content from the source, does transformation and cleansing, and then finally imports it into the target repository. Most decent content management systems provide APIs that can be used to achieve this.
  2. Use connectors/features provided by CMS vendors: Many CMS vendors provide some mechanism for importing and exporting. They might even provide a way of importing content from specific systems.
  3. Use third party tools
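The “roll your own” option boils down to an export-transform-import loop. Here is a bare-bones sketch; the two repository functions are placeholders for whatever APIs your actual source and target systems expose, not real product calls:

```python
# Bare-bones "roll your own" migration loop: export, transform, import.
def export_from_source():
    """Placeholder for the source CMS export API (e.g. a dump to XML/JSON)."""
    yield {"id": "1", "title": "Policy doc", "body": "..."}

def import_to_target(record):
    """Placeholder for the target repository ingest API."""
    print(f"ingested {record['id']}")

def migrate(transform):
    failures = []
    for record in export_from_source():
        try:
            import_to_target(transform(record))
        except Exception as exc:  # record the failure and keep going
            failures.append((record["id"], exc))
    return failures
```

The collect-failures-and-continue structure matters more than the placeholders: with terabytes of content you want one bad document to end up on an exception report, not to abort the whole batch.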

I have been doing some research and have come across these vendors, who can help you automate this process to a great extent:


EntropySoft provides an amazing collection of two-way connectors for 30 or so different repositories. These connectors can read from and write to those repositories. EntropySoft essentially provides two mechanisms:

A Content Federation Server, which is a web application. It allows you to configure these repositories and exposes the functionality via a web interface. Using this interface, you can access the repositories, and your business users will not know that they are different. As an example, you can check out a policy document from Documentum and check it into FileNet. The same interface also lets you migrate from one repository to another and create tasks that automatically migrate a document as and when it is updated. In the screenshot below, you can see Livelink, Alfresco, FileNet and some other repositories.

EntropySoft Web Interface
Configure new repo

Now this interface is simplistic in the sense that it only supports an as-is migration. For more complex migrations where you have to transform content and map metadata, permissions, users and roles from source to destination, EntropySoft provides an ETL product, which is an Eclipse-based environment. Using this ETL, you can create complex migration processes using drag and drop.


EntropySoft also works with many search engines for creating federated search applications.

The best thing about EntropySoft is its ease of setup. You can actually get up and running and start an as-is migration (meaning no transformation, no mapping) in just about 15-20 minutes (about 7-10 minutes each for setting up the source and the destination). Where they lag is possibly in connectors for more web content management systems.


OpenMigrate is an open source alternative from TSG. Currently they have adaptors for Documentum, Alfresco, JDBC and the file system. I believe they are probably working on SharePoint and FileNet connectors as well.


Vamosa is also a good alternative. To me, it appears that their strength lies in web content management; here’s a list of their connectors. I think Vamosa’s differentiation is that they not only focus on connectors but look at migration holistically. They have some good products that help you with all the steps I mentioned above – content inventory, analysis and so on – and then the migration itself.

Many people have said that something like CMIS (if and when it becomes a standard) will have an adverse impact on the connector industry. I actually think it will be good for these connector vendors, because they would be able to use CMIS instead of relying on the proprietary APIs of each repository. Besides, connecting to a repository is only one aspect, although an important one. A lot more goes along with it – transformation, the ability to map source data to the target repository, reporting, exception management and so on – and that is where such products add a lot of value.

Do you know of any other products in this space? What do you think of these?