Using Factor Analysis to reduce number of attributes

In my last post on using machine learning for everyday use cases, I’d mentioned factor analysis as a way to reduce a large number of items (e.g., news articles’ attributes) into a smaller set of variables. Some people asked me for examples of this, so this post is an attempt to explain how factor analysis can be used for what is known as dimensionality reduction.

Issues with large number of attributes

Let’s say you have a list of customers, and you want to analyze some aspect of it. It’s quite easy to analyse your list if the customers have a relatively small number of attributes, say 10. What if the number of attributes increases to 20? 100? Sure, manageable. What about 1000 or 10000 or more? Or what about attributes that are not obvious (e.g., intention to watch a movie)?

Recall that in a typical machine learning algorithm, these attributes form the input matrix based on which you predict an outcome. So as the number of attributes increases, your algorithm gets computationally expensive and also harder to program and debug. There is the additional issue of overfitting, meaning your machine learning model fits your training set extremely well but still may not predict well on new data.
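
To make the overfitting point concrete, here is a minimal sketch using NumPy and made-up data (nothing here comes from a real customer list). One attribute truly drives the outcome and 29 others are pure noise. Fitting a plain least-squares model on all 30 attributes should look better on the training set but worse on fresh data than fitting on the one useful attribute.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, n_noise = 40, 1000, 29

def make_data(n):
    # Column 0 truly drives the outcome; the other columns are pure noise.
    X = rng.normal(size=(n, 1 + n_noise))
    y = 2.0 * X[:, 0] + rng.normal(size=n)
    return X, y

X_train, y_train = make_data(n_train)
X_test, y_test = make_data(n_test)

def fit_and_score(columns):
    """Least-squares fit on the chosen columns; return (train MSE, test MSE)."""
    coef, *_ = np.linalg.lstsq(X_train[:, columns], y_train, rcond=None)
    train_mse = np.mean((X_train[:, columns] @ coef - y_train) ** 2)
    test_mse = np.mean((X_test[:, columns] @ coef - y_test) ** 2)
    return train_mse, test_mse

train_few, test_few = fit_and_score([0])
train_all, test_all = fit_and_score(list(range(1 + n_noise)))

print(f"1 attribute:   train MSE {train_few:.2f}, test MSE {test_few:.2f}")
print(f"30 attributes: train MSE {train_all:.2f}, test MSE {test_all:.2f}")
```

The gap between training and test error for the 30-attribute model is the overfitting described above.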

One way to address this would be to group some of the related attributes together and run your algorithm with that “grouped” attribute as input. In some cases, it’s easy to group attributes because the relationship between them is obvious.

For example, let’s say you have attributes that describe a customer’s height and weight. Are they directly proportional to each other? Probably not. But are they correlated? Probably yes. But many of these correlations are not that obvious and there could be underlying patterns that are hidden.
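
As a rough illustration of "correlated but not proportional", the sketch below computes the Pearson correlation coefficient for invented height and weight values. The numbers (average height, slope, spread) are assumptions chosen only to make the example run, not real measurements.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up data: weight loosely follows height, plus individual variation.
rng = random.Random(42)
heights_cm = [rng.gauss(170, 10) for _ in range(500)]
weights_kg = [0.9 * (h - 100) + rng.gauss(0, 8) for h in heights_cm]

r = pearson(heights_cm, weights_kg)
print(f"correlation between height and weight: {r:.2f}")
```

A value close to 1 would mean the two attributes carry nearly the same information; a value well above 0 but below 1 means they are related but not interchangeable.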

Factor Analysis to reduce number of variables

Factor analysis is a technique to reduce the number of attributes when the relationships between those attributes are not that obvious. Essentially, factor analysis analyzes interrelationships (or correlations) among a large number of items and reduces them to a smaller set of factors. This smaller set of factors can then be used in further analysis, e.g., in logistic regression or a neural network, to predict your outcome.
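
Here is a minimal sketch of the idea, using NumPy and the principal component method (an eigen-decomposition of the correlation matrix). The data is synthetic: seven observed attributes are generated from two hidden drivers, and the names `body_size` and `spending` are purely hypothetical. Keeping components with an eigenvalue above 1 is one common rule of thumb (the Kaiser criterion), not the only way to choose the number of factors.

```python
import numpy as np

rng = np.random.default_rng(0)
n_customers = 1000

# Two hidden drivers (hypothetical, for illustration only).
body_size = rng.normal(size=n_customers)
spending = rng.normal(size=n_customers)

def noisy(signal, weight):
    # An observed attribute: part hidden driver, part independent noise.
    noise = rng.normal(size=n_customers)
    return weight * signal + np.sqrt(1 - weight**2) * noise

# Seven observed attributes: four driven by body_size, three by spending.
X = np.column_stack([
    noisy(body_size, 0.90), noisy(body_size, 0.85),
    noisy(body_size, 0.80), noisy(body_size, 0.80),
    noisy(spending, 0.80), noisy(spending, 0.75), noisy(spending, 0.70),
])

# Standardize, then decompose the correlation matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
corr = np.corrcoef(Z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)      # eigh returns ascending order
order = np.argsort(eigvals)[::-1]            # re-sort, largest first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Keep components whose eigenvalue exceeds 1 (Kaiser criterion).
n_factors = int((eigvals > 1.0).sum())
scores = Z @ eigvecs[:, :n_factors]          # one row of factor scores per customer
explained = eigvals[:n_factors].sum() / eigvals.sum()

print(f"{X.shape[1]} attributes reduced to {n_factors} factors")
print(f"share of variance kept: {explained:.0%}")
```

The `scores` matrix is what you would pass on to a downstream model in place of the original seven columns.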

Here is another concrete example. This study analysed how social media is used within organizations and came up with a list of 31 activities. These are examples of organisational processes which can benefit from the use of social media. Of course, there could be many more activities depending on the scenario. The linked post has a chart that shows these activities. Now, any analysis meant creating a model and analysing the impact on all 31 variables. A factor analysis (Principal Component Analysis, to be precise) was carried out on these 31 variables and it grouped them into 8 variables. For example, the factor analysis suggested that the following of those 31 variables be grouped together:


Fig: Multiple attributes grouped together by factor analysis

You will probably also agree that all these activities appear to be correlated, as all of them relate to sales and marketing activities. So instead of analysing all these variables separately, you can think of “Sales and Marketing” as one factor that encompasses all these 7 different activities (variables). The other groupings followed a similar pattern, and I ended up with 8 high-level variables in place of 31 variables.
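
How does a grouping like this fall out of the numbers? The sketch below is one simplified way to see it, not the procedure used in the study. It starts from a small, invented correlation matrix for five hypothetical activities, computes unrotated principal component loadings, and assigns each activity to the factor on which it loads most strongly. In practice a rotation such as varimax is usually applied before interpreting loadings; that step is left out here to keep the example short.

```python
import numpy as np

# Hypothetical activities and an invented correlation matrix (not survey data).
activities = ["advertising", "lead generation", "market research",
              "recruitment", "employee onboarding"]
corr = np.array([
    [1.0, 0.7, 0.7, 0.1, 0.1],
    [0.7, 1.0, 0.7, 0.1, 0.1],
    [0.7, 0.7, 1.0, 0.1, 0.1],
    [0.1, 0.1, 0.1, 1.0, 0.6],
    [0.1, 0.1, 0.1, 0.6, 1.0],
])

eigvals, eigvecs = np.linalg.eigh(corr)      # ascending eigenvalues
order = np.argsort(eigvals)[::-1]            # largest first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Keep factors with eigenvalue above 1, then scale eigenvectors into loadings.
n_factors = int((eigvals > 1.0).sum())
loadings = eigvecs[:, :n_factors] * np.sqrt(eigvals[:n_factors])

# Each activity goes to the factor where its absolute loading is largest.
assigned = np.abs(loadings).argmax(axis=1)
groups = {}
for name, factor in zip(activities, assigned):
    groups.setdefault(int(factor), []).append(name)

for factor, names in sorted(groups.items()):
    print(f"factor {factor}: {', '.join(names)}")
```

Naming each factor (for instance “Sales and Marketing”) is still a human judgement made after looking at which variables ended up together.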

Okay, so once you have a smaller, more manageable set of attributes, you can then use the grouped variables in your machine learning algorithms for further analyses. This can not only improve performance but also lead to simpler models and better predictions. In this study, I eventually used these 8 variables for further analysis using Confirmatory Factor Analysis and SEM. But more about that later.


Some data from my social media survey

In my last post, I provided an overview of how to measure the impact of a digital or social media initiative. But before you can measure the impact, you need to create an inventory of which activities in your organization can use these media.

In this post, let’s look at the findings of a survey I conducted in order to identify how organizations use social media.

First, let’s look at some profile data. There’s a lot more than what’s below; however, I’m sharing only a subset.

Profile of Participants

In terms of verticals or industry segments, the participants’ organizations were spread across all major industries. In fact, the “All others” category, shown in the chart below, accounted for more than a quarter of participants, and pretty much all industries were represented. However, the bulk of participants were from service-oriented industries such as IT (21%), BFSI (8%), Consulting (11%) and Technology (10%).

Fig 1: Organizations’ primary business

About 37% of participants reported they worked for organizations that have more than 10,000 employees. Another 20% said their organizations employed between 500 and 9,999 employees. These are representative of large organizations that typically have several complex processes and challenges.

Fig 2: Total number of employees in organization

Usage Analysis

Participants were asked to select activities for which they could use social media. The figure below shows the top 15 activities based on number of responses. Most marketing-oriented activities figure in this list, which is not surprising. In addition, many use cases from other activities also figure in this list. In particular, social hiring (55%), customer support (53%), and knowledge management (52%) are amongst the most common activities for which social media is used.
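
For anyone wanting to produce this kind of ranking from their own multi-select survey, a minimal sketch of the tally is below. The four responses are invented for illustration and have nothing to do with the actual survey data.

```python
from collections import Counter

# Hypothetical multi-select answers: one set of activities per respondent.
responses = [
    {"brand marketing", "social hiring", "customer support"},
    {"brand marketing", "knowledge management"},
    {"brand marketing", "social hiring"},
    {"customer support"},
]

def usage_share(responses):
    """Return (activity, % of respondents who selected it), most selected first."""
    counts = Counter(activity for answer in responses for activity in answer)
    total = len(responses)
    return [(activity, 100 * n / total) for activity, n in counts.most_common()]

ranking = usage_share(responses)
for activity, pct in ranking[:3]:
    print(f"{activity}: {pct:.0f}%")
```

Because each respondent can pick several activities, the percentages are shares of respondents and will not add up to 100.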

Fig 3: Social media usage in value chain activities: Top 15 activities

Another aspect of usage is the social media tools themselves. People use these tools for any social media activity, so in addition to measuring usage of social media within activities, this is another way to measure usage. The figure below shows the percentage of participants using different social media tools. Social networking sites such as Facebook and LinkedIn were at the top, with 86% of participants using them. Facebook’s usage for social media marketing and LinkedIn’s for social hiring are well-known use cases, so this result is not unexpected. This was followed by Blogs (64%), Video/Presentation sharing such as YouTube (59%), Instant messaging like WhatsApp (58%), Microblogs such as Twitter (56%) and Document and File Sharing such as Dropbox (55%).

Fig 4: Social media tools usage

There’s a lot more and I’ll share more findings in future posts. Meanwhile, I’m working on a set of tools that uses all this data as a basis for further analysis, something that organizations will be able to use for their own digital initiatives.

PhD update – invitation to participate in my research

As you might be aware, I am currently doing my PhD from the Faculty of Management Studies, University of Delhi. The topic of my PhD is: Social Media for Competitive Advantage. Basically, the idea is to understand usage of Social Media, and map it to Porter’s concept of Competitive Advantage. Here are some details.

I’d like to invite you to participate in my survey.

The objectives of this survey are to understand:

  • How social media is used within organisations.
  • Social media’s impact on an organisation’s competitive advantage.

Your responses will remain confidential. Data from this research will be kept securely and reported only as a collective combined total. No one other than me will know your individual answers to this questionnaire. And if you like, I’d be happy to share my findings.

If you agree to participate in this project, please answer the questions as best you can. It will take approximately 15-20 minutes to complete the survey. Here is the link to the survey:

Thank you for your assistance in this important endeavour.

My PhD: Social Media for Competitive Advantage

It’s time to press this blog into service once more.

I’ve enrolled for the doctoral programme at the Faculty of Management Studies, University of Delhi. The title of my PhD is (tentatively): “Social Media for Competitive Advantage: A Study of Select Indian Organizations”. This research will focus on studying, analyzing and evolving a model for usage of Social Media technologies for competitive advantage by businesses, specifically within the Indian context.

That’s a bit vague right now but my hope is that I will be able to bring out how usage of social media can be a source of competitive advantage for organizations. As part of this research, I will also work on evolving a model: a sort of maturity model, but really detailed and based on interviews. This research model will map sources of competitive advantage (probably based on Porter’s framework) against maturity levels and dimensions. The model will eventually describe key characteristics for each level and what an organization needs to do to go from one level to another in terms of capabilities required for using Social Media to achieve competitive advantage.

The best way to research social media is to use social media, and so I’ll be posting updates here in the hope of getting feedback and insights. Please leave a comment below or email me (apoorvdurga at gmail) if you would like to share how you use social media within your organization.

The best part of doing research in this area (social media) is that I won’t feel guilty about spending time on Facebook :).

New topic for this blog

For a while, I’ve only been cross-posting blogs from the Real Story Group blog on this blog. That’s going to change a bit now.

I’ve enrolled in the Doctoral Programme at the Faculty of Management Studies, University of Delhi. My Ph.D. topic is around Social Media usage in Indian organisations and I plan to use social media, including this blog, to get inputs, bounce off ideas and carry out research.

So I’ll regularly post excerpts from my research and will hopefully get good feedback here. Wish me luck!