Machine Learning as an alternative to rule based processes

There’s a lot of discussion about machine learning these days and pretty much everyone (vendors, users) is talking about it.

I remember attending courses on Artificial Intelligence, Machine Learning and even Artificial Neural Networks back in 1998. So what’s new?

How have AI and ML evolved?

I think a big reason why everyone is talking about machine learning now is that it’s much simpler to use for everyday business use cases. Earlier, machine learning was mostly used for really complicated scenarios – think enterprise search (with advanced capabilities for proximity, sounds, etc.) or content analytics to do sentiment analysis. All these were useful but required expensive software and resources.

Not anymore. It’s become far easier to use machine learning for simpler problems. In fact, for a lot of scenarios that used to require complex rules, you can actually use machine learning to make decisions. Let’s take an example. You are building a website that allows users to sell their old mobile phones. The website should be able to suggest a price based on a series of questions that a user answers. So you could have a set of rules that “rule-fy” each question.

For example:

Question 1: Phone model

If phone == A, Price = p

If phone == B, Price = q

Question 2: Age of phone

If phone == A, and bought within last year, price = P

If phone == A and bought more than one year ago but less than 2 years ago, price = 0.9 P

Question 3: Color

If phone == A, and bought within last year and color == black, price = P

If phone == A, and bought within last year and color == silver, price = 0.95 P

If phone == A and bought more than one year ago but less than 2 years ago, and color == black, price = 0.9 P

And so on. You can add more rules depending on questions about age, colour, defects, screen quality and so forth, and your rules become increasingly complex. And then what happens if a user wants to enter a value that the rules don’t handle?

Of course, in real life, you wouldn’t write rules like this. You would probably have a rules engine that combines multiple rules and so forth, but you get the idea.
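To make the rules above concrete, here is a minimal sketch of how they might look as a lookup table. The base price of 300 and the function name suggest_price are my own assumptions for illustration; only the three multipliers for phone A come from the example rules.

```python
# Hypothetical base price P for phone A (the rules above only call it P).
P = 300.0

# One entry per rule: (model, full years since purchase, colour) -> factor.
# Only the three phone A rules spelled out above are encoded here.
RULES = {
    ("A", 0, "black"): 1.00,
    ("A", 0, "silver"): 0.95,
    ("A", 1, "black"): 0.90,
}


def suggest_price(model, age_years, colour):
    """Return the price for a matching rule, or None if no rule applies."""
    factor = RULES.get((model, age_years, colour))
    if factor is None:
        return None  # nobody wrote a rule for this combination
    return P * factor


print(suggest_price("A", 0, "silver"))  # covered by a rule
print(suggest_price("A", 1, "silver"))  # not covered: prints None
```

Every new question multiplies the number of entries you have to maintain, and any combination nobody thought of simply falls through.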

Machine Learning as an alternative to Rules-based processing

Here’s how machine learning can replace a complex rules-based application.

Let’s say you have historical data about phone sales. Yeah, I admit this is a big assumption, but if you are creating rules and deciding prices, then you probably have some historical data anyway. So assume you have data such as this (this is just a sample; the more data you have, the better):

Fig: Second-hand phone sales data

Now your original problem can be stated as a machine learning problem as follows:

How do you predict the price of a phone that is not already in the sample (or training set) above, based on the features and data available in the training set?

Essentially, instead of you or your application making decisions based on pre-defined rules, you are now relying on your application to make decisions based on historical data. There are many techniques that can help you achieve this.

One relatively simple technique is linear regression. Linear regression is basically a statistical technique to predict an outcome (or dependent variable) based on one or more independent variables. Based on the example above, you can describe price P as a function of the variables model, age, colour, etc. In linear regression, it can be expressed as:

P = b0 + b1*model + b2*age + b3*colour + b4*condition + …

The machine learning algorithm then calculates the values of b0, b1, b2, etc. based on historical data, and you then use this equation to predict the price for an item that was not in the training set. So if a new user now comes and offers a phone for sale on your site, you can recommend a price to her based on past sales.
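As a rough illustration of the fitting step, here is a small sketch in plain Python that solves ordinary least squares through the normal equations. The function names, the 0/1 encoding of model and colour, and the toy data are all my own assumptions; the data is made up and deliberately follows an exact linear pattern, which real sales data would not.

```python
def fit_linear_regression(rows, prices):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y."""
    X = [[1.0] + list(r) for r in rows]  # leading 1 gives the intercept b0
    n = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(n)]
         + [sum(x[i] * y for x, y in zip(X, prices))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        if abs(A[col][col]) < 1e-12:
            raise ValueError("features are linearly dependent")
        for r in range(n):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * c for a, c in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


def predict(coeffs, row):
    """Apply P = b0 + b1*x1 + b2*x2 + ... to one feature row."""
    return coeffs[0] + sum(b * v for b, v in zip(coeffs[1:], row))


# Made-up toy data: (is_model_b, age_years, is_silver) -> sale price.
training_rows = [
    (0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 2, 1),
    (1, 0, 0), (1, 1, 1), (1, 2, 0), (1, 1, 0),
]
training_prices = [300, 285, 270, 225, 200, 155, 140, 170]

coeffs = fit_linear_regression(training_rows, training_prices)
# A one-year-old silver phone of model A, which is not in the training set.
print(predict(coeffs, (0, 1, 1)))
```

In practice you would more likely reach for an existing library than hand-roll the solver; the point here is only that the coefficients come from the data rather than from hand-written rules.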

Okay, that was a rather simplistic machine learning example, and you can use many other, more sophisticated techniques. For example, you can do a factor analysis or Principal Component Analysis (PCA) to reduce a large number of items (e.g., news articles’ attributes) into a smaller set of variables. Or use logistic regression instead of linear regression, or whatever. The key point is that it is now much easier to use machine learning for everyday use cases without spending a lot on expensive software or resources. Pretty much all programming languages and development platforms have machine learning libraries or APIs that you can use to implement these algorithms.

The main drawback of this approach (as in this example) is that the results might not always be as good as you would get with a rules-based technique. The quality of the results is highly dependent on the training set, and as the training set improves (in quality as well as quantity), the results improve.

Are you using machine learning for your applications? If yes, what techniques are you using?



Fatwire "Rescues" Interwoven and Vignette

Forrester recently named Fatwire a Leader in their WCM for external Sites Quadrant. And the folks at Fatwire have already called two of their fellow-quads (for the lack of a better term), Interwoven and Vignette, legacy WCM products! Incidentally, Interwoven sits nicely in the Leader quadrant in the same report and was also named the fastest growing ECM vendor by rival analyst firm Gartner. (Yeah, yeah, I know – the report by Forrester is on WCM and the other one by Gartner is on ECM.)

On a more serious note though, Fatwire has been making some news in recent times. Among other things, they recently announced a rescue program for “legacy” Interwoven and Vignette customers – an offer to move to Fatwire at no license cost (only the support costs). They have announced this offering in partnership with Vamosa and Kapow. Vamosa and Kapow both have content migration offerings and compete in this space. Fatwire says they both add value to this proposition. I suspect they have partnered with both because Vamosa, along with expertise in many aspects of content migration, has connectors for Interwoven and Vignette while Kapow has connectors for Fatwire. Any content migration scenario will require both sets of connectors – one set that exports from Interwoven or Vignette and one set that imports into Fatwire. You could obviously roll your own migration scripts by publishing from Interwoven/Vignette as XML and then using Fatwire’s XMLPost or BulkLoader to import into Fatwire. But then the offer for free licenses wouldn’t be free, or would it?

BTW, even though Fatwire’s release mentions these as partners, neither of the two has issued its own press release or mentioned it on its site. I think that’s natural because they probably have partnerships with those “legacy” vendors 🙂

This is an interesting and, I’d say, aggressive move by Fatwire. After all, there are only a few niche WCM vendors remaining and they are one of them. There is a clear divergence happening in the marketplace. On the one hand, there are more web oriented scenarios (Web Content Management, Site Management, Portals, Web Sites and so on) and on the other hand there are more repository/management oriented scenarios (Document Management, Records Management). The requirements, challenges as well as decision makers (and stakeholders) for these two areas are usually different. Fatwire for one has been focusing on and targeting the needs of interactive marketers, which usually fall under the former category of web oriented scenarios (or Web Experience Management, as they like to call it), while many other products have been diversifying horizontally. Call it vertical vs. horizontal diversification if you will.

If there was any time to go aggressive, this was possibly it, now that the two other big ones have been acquired. Interwoven and Vignette, though, can by no means be called “legacy” just because they have been acquired. There are probably a few customers out there who are not convinced about Interwoven’s and Vignette’s future after their acquisition by Autonomy and OpenText respectively. But then, as Forrester’s Tim Walters says on his blog, there are many customers out there, including Fatwire customers, who are unhappy with their current implementation. So nothing stops the other vendors from coming out with this kind of offer for existing Fatwire customers. In fact, as Tony Byrne says, there’s nothing new in this kind of competitive upgrade.

If you indeed take up this offer, remember that even though there is no license cost, there are quite a few other costs apart from the support costs that you would have paid to Vignette or Interwoven. Here’s Irina’s post on real costs of implementation.

For one, you will have to work with Fatwire’s “proven migration tools and services”, which probably means you will need to work with Fatwire’s, Vamosa’s and Kapow’s professional services. All three products (Interwoven, Vignette and Fatwire) have decent mechanisms for importing and exporting content. So content migration per se is certainly not the most challenging aspect. In particular, when you migrate from Interwoven to Fatwire, there are many other challenges depending on what version of TeamSite you are using. TeamSite’s delivery templates are totally different from Fatwire’s. If you are using the Perl based PTs (Presentation Templates) and doing static publishing, your challenges are even bigger. There are many other issues as well – different ways of defining assets, all the complex customizations, different storage (XML vs. database), workflows and so on. Vignette, although more similar to Fatwire than Interwoven in terms of architecture, will also have similar challenges. Apart from technical challenges, any content management implementation and content migration has its own set of challenges in terms of user training, ensuring content quality (Vamosa has some useful offerings here as well), different skill sets and so on. Here’s a nice take on different issues by Jon.

I could write a big article on just the differences between Fatwire and Vignette/Interwoven and the resulting challenges, but the point is: don’t assume it is only about “content” migration. You will need to budget for many other things as well.

Open Text acquires Vignette

After the Autonomy/Interwoven and Oracle/Sun news, here comes the third big announcement of the year.

If Unilever can have multiple soaps and GM can have multiple car models, why can’t a Content Management vendor have multiple products? OT’s acquisition of Vignette points to this increasing “commoditization” of the Content Management marketplace.

There may be a lot of overlap in products across OT and Vignette, but we all know that one size does not fit all, so why not have different products for different scenarios, different price points, different technology stacks and different requirements? OT now has multiple options for Document Management, DAM, WCM, etc., plus a bonus portal server that they lacked before. They had a portal integration kit (PIK) that exposed LiveLink’s functionality as portlets that could be deployed on some of the portal servers (but not VAP and Sun as far as I know).

There’s some good analysis here and here.

On a side note, I think people who worked closely with Vignette saw it coming. A colleague of mine told me this:

One Singapore based vignette customer we were talking to  suddenly went quiet and our sales guy spotted him meeting OpenText. Another one who we were talking to, suddenly decided not to continue with Vignette and decided to migrate to Day communiqué. A senior person in Vignette Singapore joined OpenText about 2-3 months back – and was not replaced. There were many other signs in the way Vignette was handling people and partnerships that showed something is on.

I always considered Interwoven, Vignette and Fatwire (Open Market, Divine and FutureTense before that) as the leaders and pioneers in the pure play Web Content space. With Interwoven and Vignette gone, what does this mean for the WCM marketplace? The end of an era?

Oracle buys Sun

Oracle announced it will acquire Sun.

Another big Portal/Content Management vendor is now an infrastructure vendor. Sometimes I wonder if everything will soon become an appliance – you buy a Solaris box and it will come bundled not only with the OS (obviously) but also with WebCenter (or one of the numerous Oracle Portal type products), Content Server and so on. IBM, EMC and Microsoft can do this already in some sense.

Sun had open sourced its entire JES or Java ES (Java Enterprise System) some time back and more recently dropped the JES Portal Server in favor of a partnership with Liferay. The result was WebSynergy, Sun’s branded portal based on Liferay’s codebase. It is not clear how Oracle will continue this partnership and, frankly, they already have too many portal-type offerings to continue with this. However, I think Liferay has a strong offering (and recently opened a new office in India) and will continue to be a good open source alternative whether or not Oracle continues this partnership.

The other component of JES that might have some relevant features is probably the Sun Java Communications Suite, which has features for collaboration – things like calendar, messaging and instant messaging, as well as support for mobile communications. Some of these could be good additions to Oracle’s Fusion.

On a different note though, Janus had this to say on twitter:

Oracle buys sun – now Oracle has 5 enterprise portals! a new commercial for Larry: 5 out of 12 most significant portals are powered by ORCL

In spite of that, they had to resort to static pages!?

Autonomy Acquires Interwoven

It was a usual hectic day at work when I read the sudden and interesting news that Interwoven is to be acquired by Autonomy. You can read more about it at CMS Watch and CMS Wire.

I felt a bit sad – Interwoven was one of the few pure play CMS vendors and pioneered many of the Content Management concepts. Okay, so the products will still be there, but you never know how they will evolve in the context of a new setup. A lot of attention is now on Vignette, the other major CMS vendor. I wonder, why is no one talking about Fatwire?

Lee Dallas calls this a consolidation in a different direction. Most other consolidations in this space have been with infrastructure vendors or with other related vendors. So in that sense, this brings in a unique differentiation for both these vendors. What could be interesting in this context is what now happens to Autonomy’s relationships with other CMS vendors. Many CMS vendors had integrations and OEM relationships with Autonomy and those will probably get redefined now. Similarly, Interwoven’s partnerships with other search vendors (like FAST) will probably also get reviewed.

Even though Autonomy is known more for its search products, it also has offerings for BPM (Cardiff), Records Management (Meridio) and Digital Assets (Virage). So it would be interesting to see how and when overlaps are rationalized with Interwoven’s MediaBin, WorkSite and other related offerings.

In other interesting news this week, Alfresco released the final version of Alfresco 3 Labs, which among other things has Web Studio, a designer tool to build web applications. But that is a topic for another post.

Goodbye 2008, Welcome 2009

Okay, so another year comes to an end, and while we welcome the new year, here’s a look at some of the themes (in random order) of the year gone by that might have an impact on Content Technologies next year.

Verticalized Applications

Content Management Systems as horizontal solutions have been around for a long time and most known vendors provide similar features. The industry, however, is asking for more domain specific solutions built on standard CMS repositories. Based on this demand and the fact that this provides differentiation to CMS vendors, I hope to see more and more domain or vertical specific solutions like Loan Origination, Claims Processing and other similar solutions/accelerators from many CMS vendors. Also, with the slowdown in the economy, it is easier to sell a domain solution than a pure horizontal solution.

Portal and Content Consolidation

Many enterprises struggle with a multitude of applications with overlapping functionality. Organizations have multiple CMS repositories and many portals. This often leads to duplication of content, a varied user experience and huge costs. Because of these cost pressures, many organizations have been considering consolidation of their content applications.

This will lead to the following benefits:

  • Reduced Hardware Infrastructure as you don’t need those 5 different ECM repositories
  • Reduced employee costs as you do not need skilled people across 5 different portal servers
  • Standardized processes and hence increased productivity
  • Reduced employee training costs
  • Unified User Experience
  • Reduced Integration, Maintenance and Support Costs

I believe this could be a very important way to reduce and control costs as well as bring in some standardization. So many organizations will start focused initiatives to consolidate their existing applications.

Open Source

Open Source Content Management and Portal solutions have matured quite a bit. Because of this and the fact that there is cost pressure on everyone, enterprises that would not even consider Open Source solutions earlier are now more favorable towards them. They are becoming open to experimenting with technologies that are generally not considered *enterprisey*. Many of the open source products are being tracked in the waves and quadrants of major analysts, and that reflects a huge change. This is also good for the Open Source vendors because many enterprises use these analysts’ reports for shortlisting. Many open source products have also released commercial versions, and that is another reason these vendors are getting a foothold within enterprises that did not want to use them, citing lack of support options.

Another factor that encourages the use of Open Source products is that people want to quickly build “informal” applications, which many commercial products cannot do well. There are many popular Open Source (and free) products that do certain things much better.

Although the initial cost could be lower with Open Source, organizations should carefully look at the impact over a longer horizon and should consider Open Source as another alternative in the marketplace. They should select Open Source based on overall fit with their requirements and not just make a decision based on initial licensing cost.

Web 2.0

Widgets and Gadgets have been popular for quite some time. Some products had gadgets long before the portlet spec. I am sure many people have seen examples of counters, ad banners, etc., which are essentially widgets. However, there is considerable interest now in using these within enterprises for more sophisticated portal-like applications.

Currently, most social networking is horizontal – you become a member of a social network, I become one and we write scraps to each other. What next? I believe Vertical Social Networking is becoming popular. Some areas where we already see this, or that have potential, are Jobs, Real Estate and Classifieds. After all, it is easier to buy an old laptop from a contact’s contact than from an unknown person who has advertised in the classifieds.

In order to reduce cost, many enterprises, especially those whose products require support, want to leverage communities for customer support. They want people to help each other and come to the company’s support team only as a last resort. What this means is increasing use of tools that enable collaboration – wikis, for example. Many enterprises are using these communities not just for support but also as a way to generate revenue.

Some organizations are also using Web 2.0 as a means of Knowledge Management. Instead of regular process oriented KM, which forces people to contribute, they want to use mechanisms that encourage people to want to contribute. This is a huge shift – people don’t like contributing if they are forced to do it but are likely to contribute if they enjoy doing it. This also means a shift from “control and process” to “informality and accessibility”.

In spite of all this, I think how to use Web 2.0 within the enterprise is still not very clear to many organizations and there is huge scope for improvement. One of the reasons people cite is that the workforce is used to applications that became successful on the consumer Internet and wants the same kind of experience from enterprise applications, but they need to be very careful. Here’s a nice post by Vilas.

Alternate Delivery Models

There is more acceptance of SaaS based offerings. This is especially true for applications that are not mission critical to the business. Businesses are experimenting with SaaS based providers because this reduces their dependence on internal IT, apart from other benefits like faster time to market, no capital expenditure, low risk and so on. Along with this, alternate pricing models are also being looked at. Some examples are pay per document, pay per loan, pay per claim, etc.


The portlet spec 2.0, or JSR 286, was released. Although the portlet standards (JSR 286 and JSR 168) have been relatively successful in terms of adoption and support, the content repository standard, JSR 170, has not been that popular. Meanwhile, vendors are collaborating on technologies that will help customers reuse existing investments. As an example, many vendors have come up with CMIS. Okay, this is not a standard yet but is possibly heading in that direction. A standard like this is very much needed and hopefully CMIS will achieve what JSR-170/283 did not.

I would also hope that a standard emerges for Gadgets/Widgets.

Site Management and Personalization

Traditionally, Content Management was decoupled from Site Management. However, marketing and business people now want more control and there is increasing convergence of Content Management and Site Management. This essentially means a better user experience and rich, dynamic sites. This also means features like personalization are making a comeback. This has also been helped by cheap bandwidth and better client side technologies.

Document Services

Document Composition and Generation is becoming part of mainstream ECM. There have been a few partnerships as well as mergers in this space. Related terms in this space are Document Output Management and Forms Management.

This was probably the last post of this year. Thanks for reading the blog and here’s wishing you a great year ahead.

Gadgets and Widgets as an alternative to Portlets

A new trend that I am seeing these days is the emergence of gadgets (widgets, dashlets, blocklets) and mashups. These basically provide a quick and dirty way to create portal *like* applications. They are lightweight and less expensive compared to your typical portal servers.

iGoogle is probably the biggest example of their success on the consumer internet. For usage within the enterprise, IBM has a Mashups platform as well as a Widget platform. You develop components using Widget Factory, which is based on Portlet Factory (erstwhile Bowstreet), and deploy them on the platform, which runs on the embedded WebSphere application server. Kapow Technologies and quite a few other vendors (including my company) also have similar offerings. Apache’s Shindig is an open source implementation of OpenSocial and Google’s gadget specification and lets you build iGoogle type applications.

Many customers are considering these for building their next generation of web properties. Many of them have been asking about Google’s and Yahoo’s offerings as well for usage within the enterprise. The biggest reason is probably the fact that small applications can be built relatively quickly and mashed up at the client side using lightweight technologies based on Web Oriented Architecture (WOA) like REST, RSS, etc. instead of more involved server side technologies. A widget can be written in many ways (Java, Ruby, PHP, …), whereas a J2EE portal’s portlet is *generally* written in Java, so widgets give more flexibility for bringing in or integrating with non-Java applications. So potentially, different technologies can co-exist and have their functionality exposed via a uniform web interface. You can also integrate a gadget that is actually hosted by a third-party provider (like Google) within your environment.

Many Portal servers have been offering the ability to include Google gadgets within the portal server. So essentially they provide a gadget portlet using which you can integrate gadgets. IBM and Liferay both have this capability.

So will these replace portal technologies? I don’t think that is going to happen in the near future, and the reason is that the portal ecosystem is much more evolved and mature compared to gadgets/widgets. There are certain standards (e.g., JSR 286) that govern the portal world (at least the Java portal world) and most portals support them. There are no standards yet in the gadget/widget world, and if you really want to use, say, a Google gadget within your environment, there would be non-trivial issues to take care of. So for example, how do you do inter-gadget communication between your gadget and one hosted by a third-party provider? Even though a portlet and a gadget can co-exist within the portal server, getting your portlet to send an event (or talk) to a gadget is a different matter that needs to be addressed (okay – Google and IBM have cooperated on IBM portlet to Google gadget communication, but it is still a non-standard way). IBM is working on a specification called iWidget but I do not think any other vendor supports that as yet. Similarly, Google also has a gadget specification.
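To show what “sending an event to a gadget” boils down to, here is a toy publish/subscribe sketch. It is purely illustrative: the EventBus class, the topic name and the payload are my own inventions, and this is not the JSR 286 eventing API, IBM’s iWidget or Google’s gadget API.

```python
from collections import defaultdict


class EventBus:
    """A toy in-page message bus; not any vendor's actual API."""

    def __init__(self):
        self._handlers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, handler):
        """Register a callback to be invoked for every event on this topic."""
        self._handlers[topic].append(handler)

    def publish(self, topic, payload):
        """Deliver payload to every subscriber; return how many were notified."""
        handlers = self._handlers.get(topic, [])
        for handler in handlers:
            handler(payload)
        return len(handlers)


# Hypothetical usage: a "gadget" listens, a "portlet" publishes.
page_bus = EventBus()
gadget_inbox = []
page_bus.subscribe("customer.selected", gadget_inbox.append)
page_bus.publish("customer.selected", {"customer_id": 42})
```

The hard part in the gadget world is that both sides have to agree on the bus, the topic names and the payload format, which is exactly what a standard would pin down.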

There are also other issues related to integration with back end applications and more sophisticated personalization that need to be addressed. Till these are addressed, I think both have a place in targeting specific scenarios.