Factors Impacting Content Migration

Whether one is implementing a new system or moving from one system to another, content migration is always an important aspect. Each situation is unique, and so each migration will have its own roadmap. However, some factors are common to every migration and can determine how long it will take. I'm listing a few here. If you think there are other factors as well, please feel free to comment.

In order to take stock of these factors, one needs to follow a good migration approach and spend a decent amount of time on analysis. I will not go into the details of such an approach; there are quite a number of good articles on the subject, and in particular I like this one by James Robertson. This post is also not about the importance of content analysis, governance and other best practices 🙂

So here are some factors. At this point, these are only high-level thoughts, and I will probably expand on them as and when I get time. Feedback is most welcome.

Source(s) of Content

Where and how the original content lives is probably the most important factor in defining a migration. It could be in a flat file system, a database, another content management system, another application or somewhere else. It is important to understand whether it is stored in a proprietary format and how easy it is to access. Obviously, content stored in a table in a relational database is easier to access than content stored in a proprietary format.

Type of Content

Content could be media (images, video, audio), documents (DOC, XLS, PDF), text (HTML, XML), database records or something else. Migration timelines depend heavily on this: migrating X-ray images, where each file could be a couple of MB or more, poses different challenges than migrating small text files. And when the migration is done across multiple environments, the effort only multiplies.

Closely related to this is what the content actually contains. For example, do you need to migrate multiple languages, encodings and character sets?
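
To make the encoding question concrete, here is a minimal Python sketch that decodes source files into Unicode before import. It assumes the source is mostly UTF-8 or Windows-1252. The CANDIDATE_ENCODINGS tuple and the to_unicode name are illustrative assumptions, not something a source system hands you. Anything the sketch cannot decode cleanly is flagged for manual review rather than guessed.

```python
# Encodings to try, in order. This tuple is an assumption: adjust it to
# whatever your source system actually produced.
CANDIDATE_ENCODINGS = ("utf-8", "cp1252")


def to_unicode(raw: bytes) -> tuple[str, str]:
    """Decode raw source bytes, returning (text, encoding_used)."""
    for encoding in CANDIDATE_ENCODINGS:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # latin-1 maps every byte to some character, so it never fails.
    # The result may still be wrong, so flag it for a human to review.
    return raw.decode("latin-1"), "latin-1 (needs review)"
```

Trying UTF-8 first matters. Non-ASCII text in a legacy 8-bit encoding rarely happens to be valid UTF-8, whereas cp1252 will quietly turn UTF-8 bytes into mojibake.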

Quality of content

Pranshu gave me this example on Twitter the other day. He was involved in a migration in which the headlines of news articles were actually images. So even though migration of the body and other fields could be automated, a good deal of manual intervention was required to convert image headlines into text headlines for the destination. Some other examples:

  • Do all HTML files follow a template?
  • Is there content with inconsistent metadata (like Male/Female vs. M/F)?
  • Is there content with missing metadata that could be mandatory on the destination system?
  • How much of the content is still relevant?
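
Checks like these can be partly automated. The Python sketch below normalizes a gender field that is stored inconsistently (Male/Female vs. M/F) and reports records that lack fields the destination requires. The field names, GENDER_VALUES and MANDATORY_FIELDS are hypothetical. In a real project they would come out of your content analysis.

```python
# Hypothetical canonical values for a field stored inconsistently in the source.
GENDER_VALUES = {"male": "M", "m": "M", "female": "F", "f": "F"}

# Fields the destination is assumed to require (illustrative only).
MANDATORY_FIELDS = ("title", "author", "gender")


def clean_record(record: dict) -> tuple[dict, list[str]]:
    """Return (cleaned_record, problems); problems need manual attention."""
    cleaned = dict(record)
    problems = []

    raw = record.get("gender")
    if raw:
        canonical = GENDER_VALUES.get(raw.strip().lower())
        if canonical is None:
            problems.append(f"unrecognized gender value: {raw!r}")
        else:
            cleaned["gender"] = canonical

    for field in MANDATORY_FIELDS:
        if not cleaned.get(field):
            problems.append(f"missing mandatory field: {field}")

    return cleaned, problems
```

Records that come back with an empty problem list can go through automatically. The rest form your manual-intervention queue, and the size of that queue is a useful input for estimating timelines.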

Amount of Content

Both the number of items and the size of the files matter. In document scenarios, size matters because huge files take time to move across. In web content scenarios, volume matters, especially when items need manual intervention.

Destination System

Is the target system a content management system or is it something else? Does it support the fields that you require or do you need workarounds? Does it provide some mechanism to ingest content? Does it support your metadata and permissioning requirements?

Transformation or Value Add required

In my opinion, this is probably the most important factor. The amount of transformation required between source and destination largely determines how much automation is possible and how much manual intervention is required. If you were to do an “as is” migration, things might be fairly trivial. Some examples:

  • The Title field in the source needs to be mapped to the Headline field in the destination
  • Do all source fields need to be migrated?
  • Is there a need to define additional fields?
  • Is there a need to transform fields based on constraints? For example, an ID in the source CMS might be stored as “123-45-6789”, whereas the new CMS does not permit “-” and requires it to be stored as “123.45.6789”
  • Does the data need cleansing?
  • Do you need other value adds (like SEO, copywriting and so on)?
  • Do you need to repurpose the same content, say, for delivery to mobile devices?
  • Are there links between files that need to be preserved (like an XLS embedded within a DOC)?
  • Do you want to migrate only the latest version or all versions? What happens to content that is part of an incomplete workflow?
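
Some of these transformations are mechanical and easy to script. Here is a minimal Python sketch that assumes each source record arrives as a dictionary. It renames the source title field to headline through a hypothetical FIELD_MAP and drops fields that are not mapped. It also converts IDs from 123-45-6789 to 123.45.6789, and it rejects any ID that does not match the expected pattern instead of silently passing it through.

```python
import re

# Hypothetical source-to-destination field map. Source fields that are not
# listed here are deliberately dropped.
FIELD_MAP = {"title": "headline", "body": "body", "id": "id"}

# Expected source ID shape, e.g. 123-45-6789.
SOURCE_ID = re.compile(r"\d{3}-\d{2}-\d{4}")


def transform_id(source_id: str) -> str:
    """Convert 123-45-6789 to 123.45.6789, since '-' is not permitted."""
    if not SOURCE_ID.fullmatch(source_id):
        raise ValueError(f"unexpected source ID format: {source_id!r}")
    return source_id.replace("-", ".")


def transform(record: dict) -> dict:
    """Rename mapped fields, drop unmapped ones, and fix up the ID."""
    out = {dest: record[src] for src, dest in FIELD_MAP.items() if src in record}
    if "id" in out:
        out["id"] = transform_id(out["id"])
    return out
```

Failing loudly on unexpected IDs is a deliberate choice. A malformed value in the source is a data-cleansing issue that someone needs to look at, not something to paper over during the load.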

Users and Roles

The difference in how users, roles and the whole permission system work in the source as compared to the destination also plays an important role. This depends on the capabilities of the systems as well as on how comprehensively your organization has defined them. Just as with data, you might need to map users, roles and permissions. For example, a Read permission in the source could map to a View permission in the destination. There will also be cases where there is no one-to-one mapping of permissions between source and destination.
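
Here is a minimal Python sketch of such a mapping. It assumes ACLs can be exported as a dictionary from principal (a user or group) to a set of permission names. PERMISSION_MAP and map_permissions are hypothetical names. Permissions with no one-to-one equivalent are reported rather than guessed.

```python
# Hypothetical permission map: source permission -> destination permission.
PERMISSION_MAP = {"read": "view", "write": "edit", "delete": "delete"}


def map_permissions(
    acl: dict[str, set[str]],
) -> tuple[dict[str, set[str]], list[str]]:
    """Translate an ACL of principal -> permissions for the destination.

    Returns (mapped_acl, unmapped), where unmapped lists each
    "principal: permission" pair that has no destination equivalent.
    """
    mapped: dict[str, set[str]] = {}
    unmapped: list[str] = []
    for principal, perms in acl.items():
        mapped[principal] = set()
        for perm in sorted(perms):
            dest = PERMISSION_MAP.get(perm)
            if dest is None:
                unmapped.append(f"{principal}: {perm}")
            else:
                mapped[principal].add(dest)
    return mapped, unmapped
```

The unmapped list is the part that needs a business decision. Dropping a permission silently could either lock users out or expose content they should not see.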

Amount of automation possible

Based on some of the above factors, you will have an idea of how much of the migration can be automated. The extent of automation is dependent on source as well as destination systems:

  • Does source allow export of content?
  • Does destination allow import of content?
  • Are 3rd party products for analysis and migration being used?
  • Do these products allow ETL kind of activities?
  • etc.

Roll out

How you want to roll out the new system also affects your timelines. In scenarios involving multiple geographies or business units, it can be tricky, for reasons that are more organizational than technological. So whether you do a big-bang or a phased rollout impacts the migration process.

Parallel Run

This is related to the point above. Will the source and destination systems need to run in parallel? If so, content may have to reside in both places. And if users continue to modify content during the migration, you will have to plan for multiple iterations.
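
One common way to handle multiple iterations is to fingerprint each item when it is migrated and then compare those fingerprints against the current source on the next pass. Here is a minimal Python sketch, assuming each item has a stable ID and its content can be read as bytes. It uses SHA-256 from the standard library. fingerprint and plan_delta are hypothetical names.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Content hash recorded for each item when it is migrated."""
    return hashlib.sha256(content).hexdigest()


def plan_delta(
    previous: dict[str, str], current: dict[str, bytes]
) -> dict[str, list[str]]:
    """Compare fingerprints from the last run against the current source.

    previous maps item ID -> fingerprint saved during the last iteration;
    current maps item ID -> the item's content as it is now in the source.
    """
    added, changed = [], []
    for item_id, content in current.items():
        old = previous.get(item_id)
        if old is None:
            added.append(item_id)
        elif old != fingerprint(content):
            changed.append(item_id)
    deleted = [item_id for item_id in previous if item_id not in current]
    return {"added": sorted(added), "changed": sorted(changed), "deleted": sorted(deleted)}
```

Each iteration then moves only the added and changed items and decides what to do about deletions. This tends to shrink every subsequent run, which helps keep the final cutover window short.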

Infrastructure and Connectivity

The speed at which content can be moved across, exported, imported or ingested is also dependent on the connectivity between source, destination, databases etc.
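
For a first-cut estimate, the arithmetic is simple: divide the total bits by the effective throughput. The Python sketch below assumes an efficiency of 70% of nominal link speed. That figure is only a placeholder to replace with numbers measured on your own network, and transfer_hours is a hypothetical helper.

```python
def transfer_hours(total_bytes: float, link_mbit_per_s: float, efficiency: float = 0.7) -> float:
    """Rough time in hours to move total_bytes over a link of the given speed.

    efficiency is the assumed fraction of nominal bandwidth actually achieved
    (protocol overhead, contention and so on); 0.7 is only a placeholder.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = total_bytes * 8
    effective_bits_per_s = link_mbit_per_s * 1_000_000 * efficiency
    return bits / effective_bits_per_s / 3600


print(round(transfer_hours(500e9, 100), 1))  # 15.9
```

So 500 GB over a 100 Mbit/s link comes to roughly 16 hours under these assumptions. Treat the result as a lower bound. It ignores per-item import time on the destination, and the whole exercise has to be repeated for each environment you migrate into.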

So, do you have similar experiences? Are there any other factors that can impact migration timelines?

(Thanks to @pranshuj, @rikang and @lokeshpant for inputs)

Fatwire "Rescues" Interwoven and Vignette

Forrester recently named Fatwire a Leader in its WCM for External Sites Quadrant. And the folks at Fatwire have already called two of their fellow-quads (for lack of a better term), Interwoven and Vignette, legacy WCM products! Incidentally, Interwoven sits nicely in the Leader quadrant of the same report and was also named the fastest-growing ECM vendor by rival analyst firm Gartner. (Yeah, yeah, I know: the Forrester report is on WCM and the Gartner one is on ECM.)

On a more serious note, though, Fatwire has been making some news lately. Among other things, they recently announced a rescue program for “legacy” Interwoven and Vignette customers: an offer to move to Fatwire at no license cost (only support costs). They announced this offering in partnership with Vamosa and Kapow, both of which have content migration offerings and compete in this space. Fatwire says both add value to the proposition. I suspect they partnered with both because Vamosa, along with expertise in many aspects of content migration, has connectors for Interwoven and Vignette, while Kapow has connectors for Fatwire. Any content migration scenario will require both sets of connectors: one set that exports from Interwoven or Vignette and one set that imports into Fatwire. You could obviously roll your own migration scripts by publishing from Interwoven/Vignette as XML and then using Fatwire’s XMLPost or BulkLoader to import into Fatwire. But then the offer of free licenses wouldn’t be free, or would it?

BTW, even though Fatwire’s release mentions them as partners, neither Vamosa nor Kapow has issued a press release or mentioned the offer on its site. I think that’s natural, because they probably have partnerships with those “legacy” vendors 🙂

This is an interesting and, I’d say, aggressive move by Fatwire. After all, only a few niche WCM vendors remain, and Fatwire is one of them. There is a clear divergence happening in the marketplace. On the one hand, there are web-oriented scenarios (Web Content Management, Site Management, Portals, Web Sites and so on), and on the other, repository/management-oriented scenarios (Document Management, Records Management). The requirements, challenges and decision makers (and stakeholders) for these two areas are usually different. Fatwire, for one, has been focusing on the needs of interactive marketers, which usually fall under the former category of web-oriented scenarios (or Web Experience Management, as they like to call it), while many other products have been diversifying horizontally. Call it vertical vs. horizontal diversification if you will.

If there was ever a time to be aggressive, this was probably it, with the other two big ones having been acquired. Interwoven and Vignette, though, can by no means be called “legacy” just because they have been acquired. There are probably a few customers out there who are not convinced about Interwoven’s and Vignette’s future after their acquisition by Autonomy and OpenText respectively. But then, as Forrester’s Tim Walters says on his blog, there are many customers out there, including Fatwire customers, who are unhappy with their current implementation. So nothing stops other vendors from coming out with this kind of offer for existing Fatwire customers. In fact, as Tony Byrne says, there’s nothing new in this kind of competitive upgrade.

If you indeed take up this offer, remember that even though there is no license cost, there are quite a few other costs apart from the support costs that you would have paid to Vignette or Interwoven. Here’s Irina’s post on real costs of implementation.

For one, you will have to work with Fatwire’s “proven migration tools and services”, which probably means working with Fatwire, Vamosa and Kapow’s professional services. All three products (Interwoven, Vignette and Fatwire) have decent mechanisms for importing and exporting content, so content migration per se is certainly not the most challenging aspect. In particular, when you migrate from Interwoven to Fatwire, there are many other challenges, depending on which version of TeamSite you are using. TeamSite’s delivery templates are totally different from Fatwire’s. If you are using the Perl-based PTs (Presentation Templates) and doing static publishing, your challenges are even bigger. There are many other issues as well: different ways of defining assets, complex customizations, different storage (XML vs. database), workflows and so on. Vignette, although architecturally closer to Fatwire than Interwoven is, will pose similar challenges. Apart from the technical challenges, any content management implementation and content migration has its own set of challenges in terms of user training, ensuring content quality (Vamosa has some useful offerings here as well), different skill sets and so on. Here’s a nice take on the different issues by Jon.

I could write a big article on just the differences between Fatwire and Vignette/Interwoven and the resulting challenges, but the point is this: don’t assume it is only about “content” migration. You will need to budget for many other things as well.

Random Notes on EMC World

These are some observations, in no particular order. I will possibly post some “more sensible” posts on specific topics later.

  • It was my first time at EMC World, and I thought the focus was much more on storage and infrastructure than on content management. They certainly did much better, though, at integrating CMA (Content Management and Archiving) into the overall EMC World. Many people I talked to thought it was much better than in the past, when CMA folks felt quite out of place.
  • A big theme at the conference was building social communities. Joe Tucci, the EMC Chairman, started his keynote with some statistics on tweets about EMC World. He spoke about how EMC is working to give its customers more choice, better control and improved efficiency. There was a dedicated bloggers’ lounge, set up by Len Devanna and his team, which provided a great informal environment for bloggers and tweeps to come together and socialize. I am glad I was able to meet Laurence (pie), Len and Stu. There were other lounges along similar lines; in particular, the Momentum lounge provided a good place for Documentum users to meet.
  • Then there was CMA president Mark Lewis’ keynote. He talked of ROI as return on information.
  • I was particularly interested in EMC’s initiatives around Customer Communication Management (or rather around their xPression product, which came via the acquisition of Doc Sciences). Although there were a few good sessions on this, I was hoping for a bigger presence. They had a small, not-so-prominent booth within the large EMC booth.
  • Another interesting announcement (although it was made a couple of days before EMC World) was the free availability of the developer edition of Documentum. I think this is a great move to increase usage and acceptance of Documentum. EMC claims it takes 23 minutes to get up and running with Documentum, although I suspect downloading it will take much longer: it is almost a 2 GB download and has steep RAM requirements (4 GB recommended, although 3 GB would work too), so it will not be as easy to run on a laptop as some other products. Still, it will let developers get their hands dirty, which in turn will help spread Documentum. The developer edition comes bundled with JBoss and the SQL Server Express database.
  • Some claimed there were 7000 attendees, but I felt the number was lower. I also think the number of customers, especially those interested in content management, was far lower than in previous years. Although there were quite a few partners, the big partners were noticeable by their absence.
  • CMIS was reasonably well covered. There was a dedicated session by Laurence and Karin Ondricek, and Victor Spivak covered it in his session on D6.5 architecture. Laurence demoed the federated CMIS sample application. According to him, the fact that Alfresco and Nuxeo kept their servers up for a Documentum conference showed how much cooperation is happening around CMIS.
  • Victor was quite clear about the scope of CMIS and, more importantly, what it is not. According to him, “I” is the most important letter in the acronym; in that sense, the objective is to provide interoperability, not to implement more sophisticated features. So the focus is only on basic services and mashup-type applications, not real business applications, which are best handled by proprietary APIs (like DFS) or CMS-specific features. He also said that if you were to describe the 6.5 release in one sentence, it would be “high volume services”.
  • There were quite a few sessions on WCM and more “delivery-oriented” aspects like dynamic delivery, site management, Web 2.0, RIAs and so on. EMC has also latched on to the term Web Experience Management (WEM), something that Vignette and Fatwire have been using for some time. Web Publisher is not yet as sophisticated a WCM platform, and it remains to be seen how EMC gets there.
  • Most of the sessions were EMC-specific and delivered by EMC, and I think the number of independent sessions should be increased. I attended the one by Jeetu Patel of Doculabs, in which he talked about different types of ROI modeling for ECM projects.
  • There were quite a few sessions on CenterStage. Victor talked about the philosophy behind CenterStage: to separate the front end completely from the business logic and back end, because front-end technologies change quite often. I think this is the obvious approach and wonder why it was not done in Webtop. He also explained the increasing support for RESTful APIs etc. (See Pie’s post here.)
  • There were also a few discussions around Lucene replacing FAST search in EMC’s products.

Open Text acquires Vignette

After Autonomy/Interwoven and Oracle/Sun news, here comes the third big news of the year.

If Unilever can have multiple soaps and GM can have multiple car models, why can’t a Content Management vendor have multiple products? OT’s acquisition of Vignette points to the increasing “commoditization” of the Content Management marketplace.

There may be a lot of overlap between OT’s and Vignette’s products, but we all know that one size does not fit all, so why not have different products for different scenarios, price points, technology stacks and requirements? OT now has multiple options for Document Management, DAM, WCM, etc., plus a bonus portal server that they lacked before. They had a portal integration kit (PIK) that exposed LiveLink’s functionality as portlets that could be deployed on some portal servers (but not VAP and Sun, as far as I know).

There’s some good analysis here and here.

On a side note, I think people who worked closely with Vignette saw it coming. A colleague of mine told me this:

One Singapore-based Vignette customer we were talking to suddenly went quiet, and our sales guy spotted him meeting OpenText. Another one we were talking to suddenly decided not to continue with Vignette and to migrate to Day Communiqué. A senior person in Vignette Singapore joined OpenText about 2-3 months back and was not replaced. There were many other signs in the way Vignette was handling people and partnerships that showed something was going on.

I always considered Interwoven, Vignette and Fatwire (Open Market, Divine and FutureTense before that) as the leaders and pioneers in pure play Web Content space. With Interwoven and Vignette gone, what does this mean for the WCM marketplace? An end of the era?

Autonomy Acquires Interwoven

It was a typically hectic day at work when I read the sudden and interesting news that Interwoven is to be acquired by Autonomy. You can read more about it at CMS Watch and CMS Wire.

I felt a bit sad: Interwoven was one of the few pure-play CMS vendors and pioneered many content management concepts. Okay, so the products will still be there, but you never know how they will evolve in the context of a new setup. A lot of attention is now on Vignette, the other major CMS vendor. I wonder why no one is talking about Fatwire.

Lee Dallas calls this consolidation in a different direction. Most other acquisitions in this space have been by infrastructure vendors or other related vendors, so in that sense this brings a unique differentiation for both these vendors. What could be interesting in this context is what now happens to Autonomy’s relationships with other CMS vendors. Many CMS vendors had integrations and OEM relationships with Autonomy, and those will probably get redefined now. Similarly, Interwoven’s partnerships with other search vendors (like FAST) will probably be reviewed.

Even though Autonomy is known more for its search products, it also has offerings for BPM (Cardiff), Records Management (Meridio) and Digital Assets (Virage). So it would be interesting to see how and when overlaps are rationalized with Interwoven’s MediaBin, WorkSite and other related offerings.

In other interesting news this week, Alfresco released the final version of Alfresco 3 Labs, which among other things includes Web Studio, a designer tool for building web applications. But that is a topic for another post.

Goodbye 2008, Welcome 2009

Okay so another year comes to an end and while we welcome the new year, here’s a look at some of the themes (in a random order) of the year gone by that might have an impact on the Content Technologies next year.

Verticalized Applications

Content Management Systems have existed as horizontal solutions for a long time, and most known vendors provide similar features. The industry, however, is asking for more domain-specific solutions built on standard CMS repositories. Given this demand, and the fact that such solutions give CMS vendors a way to differentiate, I hope to see more and more domain- or vertical-specific solutions like Loan Origination, Claims Processing and other similar solutions/accelerators from many CMS vendors. Also, with the economic slowdown, it is easier to sell a domain solution than a purely horizontal one.

Portal and Content Consolidation

Many enterprises struggle with a multitude of applications with overlapping functionality. Organizations have multiple CMS repositories and many portals. This often leads to duplicated content, inconsistent user experiences and huge costs. Because of cost pressures, many organizations have been considering consolidating their content applications.

This will lead to the following benefits:

  • Reduced Hardware Infrastructure as you don’t need those 5 different ECM repositories
  • Reduced employee costs as you do not need skilled people across 5 different portal servers
  • Standardized processes and hence increased productivity
  • Reduced employee training costs
  • Unified User Experience
  • Reduced Integration, Maintenance and Support Costs

I believe this could be a very important way to reduce and control costs as well as to bring in some standardization, so I expect many organizations to start focused initiatives to consolidate their existing applications.

Open Source

Open Source Content Management and Portal solutions have matured quite a bit. Because of this, and because everyone is under cost pressure, enterprises that previously would not even consider Open Source solutions are now more favorable towards them. They are becoming open to experimenting with technologies that are generally not considered *enterprisey*. Many open source products are now tracked in the waves and quadrants of major analysts, and that reflects a huge change. This is also good for the Open Source vendors, because many enterprises use these analyst reports for shortlisting. Many open source vendors have also released commercial versions, which gives them a foothold in enterprises that had avoided open source, citing a lack of support options.

Another factor that encourages the use of Open Source products is that people want to quickly build “informal” applications, which many commercial products cannot do well. There are many popular Open Source (and free) products that do certain things much better.

Although initial costs can be lower with Open Source, organizations should carefully look at the impact over a longer horizon and treat Open Source as one more alternative in the marketplace. They should select Open Source based on overall fit with their requirements, not just on initial licensing cost.

Web 2.0

Widgets and gadgets have been popular for quite some time. Some products had gadgets long before the portlet specification existed. I am sure many people have seen examples of counters, ad banners etc., which are essentially widgets. However, there is now considerable interest in using them within enterprises for more sophisticated, portal-like applications.

Currently, most social networking is horizontal: you become a member of a social network, I become one and we write scraps on each other. What next? I believe vertical social networking is becoming popular. Some areas where we already see this, or where it has potential, are jobs, real estate and classifieds. After all, it is easier to buy an old laptop from a contact’s contact than from an unknown person who has advertised in the classifieds.

To reduce costs, many enterprises, especially those whose products require support, want to leverage communities for customer support. They want people to help each other and to come to official support only as a last resort. This means increasing use of tools that enable collaboration, such as wikis. Many enterprises are using these communities not just for support but also as a way to generate revenue.

Some organizations are also using Web 2.0 as a means of Knowledge Management. Instead of regular process-oriented KM, which forces people to contribute, they want mechanisms that make people want to contribute. This is a huge shift: people don’t like contributing if they are forced to, but are likely to contribute if they enjoy doing it. It also means a shift from “control and process” to “informality and accessibility”.

In spite of all this, I think how to use Web 2.0 within the enterprise is still not very clear to many organizations, and there is huge scope for improvement. One reason people cite is that the workforce is used to applications that became successful on the consumer Internet and wants the same kind of experience from enterprise applications, but organizations need to be very careful here. Here’s a nice post by Vilas.

Alternate Delivery Models

There is more acceptance of SaaS-based offerings, especially for applications that are not mission critical. Businesses are experimenting with SaaS providers because this reduces their dependence on internal IT, apart from other benefits like faster time to market, no capital expenditure, low risk and so on. Along with this, alternate pricing models are also being looked at, for example pay per document, pay per loan, pay per claim etc.

Standards

Version 2.0 of the portlet specification, JSR 286, was released. Although the portlet standards (JSR 286 and JSR 168) have been relatively successful in terms of adoption and support, the content repository standard, JSR 170, has not been that popular. Meanwhile, vendors are collaborating on technologies that will help customers reuse existing investments. For example, many vendors have jointly come up with CMIS. Okay, this is not a standard yet, but it is possibly heading in that direction. A standard like this is very much needed, and hopefully CMIS will achieve what JSR-170/283 did not.

I would also hope that a standard emerges for Gadgets/Widgets.

Site Management and Personalization

Traditionally, Content Management was decoupled from Site Management. However, marketing and business people now want more control, and Content Management and Site Management are increasingly converging. This essentially means a better user experience and rich, dynamic sites. It also means that features like personalization are making a comeback. Cheap bandwidth and better client-side technologies have also driven this.

Document Services

Document Composition and Generation is becoming part of mainstream ECM. There have been a few partnerships as well as mergers in this space. Related terms in this space are Document Output Management and Forms Management.

This was probably the last post of this year. Thanks for reading the blog and here’s wishing you a great year ahead.

Commercial Offerings from Open Source Product Vendors

Two vendors released commercial offerings based on very popular open source products. Earlier this month, Acquia released a commercial Drupal, which is a collection of popular 3rd-party applications packaged with Drupal to extend its social publishing capabilities.

Liferay followed and released an enterprise edition of its Portal product, a commercially supported version of its free standard edition. This release also came with a new website as well as a new offering called Social Office (to be released soon), which extends Liferay’s collaboration features.

A major factor hindering Open Source adoption, especially among enterprises, has been the lack of commercial support. If a bank’s loan origination system is down, they wouldn’t be too happy to depend solely on community support! Alfresco already had this model, and now that Acquia and Liferay have announced it too, I think enterprises will increasingly consider open source products as viable options. After all, they will get the benefits of Open Source and community along with the promise of commercial-grade support at lower price points. This also gives them the comfort that there is serious commitment behind the product and that it is not just a hobbyist’s effort.