Market Research on Bollywood Movies
Success Prediction Modelling
Submitted to Dr. Atanu Adhikari, Marketing Management II Course, I.I.M. Kozhikode
By
Bharat Subramony PGP/16/012
Gunveer Singh PGP/16/019
Ranjan Sharma PGP/16/040
Rohit Singla PGP/16/043
Utkarsh Rastogi PGP/16/056
A journey is easier when you travel together. Interdependence is certainly more valuable than
independence. This report on Market Research on Bollywood Movies and Modelling a
Success Prediction System is the result of work whereby we have been accompanied and
supported by many people. We are pleased to have this opportunity to express our gratitude
to all of them for their valuable guidance, for devoting their precious time, and for
sharing their knowledge and co-operation throughout the development of our project idea
and the academic years of education.
With immense pleasure we express our sincere gratitude, regards and thanks to our project
guide, Prof. Atanu Adhikari, for his excellent guidance, invaluable suggestions and
continuous encouragement at all stages of our project work. We would like to thank the
staff of Crown Theatre for cooperating and assisting us in conducting our market research.
We would also like to thank the participants of the focus group discussions and interviews.
Finally, we thank God Almighty for all that he has endowed us with and for his blessings.
Bollywood is the informal term popularly used for the Hindi-language film industry based
in Mumbai, Maharashtra, India. Bollywood churns out around 800 movies every year.
While some movies end up as blockbusters, others fail miserably at the box office. The
growing educated middle class puts Bollywood movies in open competition not only with
other Bollywood movies but with Hollywood too. It is one of the largest
employment-generating industries of the Indian economy. Today, the growth of this
industry is phenomenal, with the changing preferences of movie-goers and film-makers.
Some Bollywood movies involve funds running into millions of dollars. There is a lot
of fortune at stake in the performance of movies at the box office.
With this in mind, this project first seeks to understand viewers' perspectives on the
Bollywood industry. Several factors lead a person to like or dislike a movie. They could
range from a traumatic experience in the ticket queue, to a noisy neighbour in the cinema
hall, or even a malfunctioning air-conditioning system. So how do we arrive at conclusive
factors? The fact remains that many more factors affect the success of a movie than just
these minor parameters. There have been cases of movies like Sholay, which was initially
declared a flop simply because it was ahead of its time, and then within 6 months was a
blockbuster. Or take the case of a super-low-budget movie called Stanley Ka Dabba, which
bombed at the box office yet won critical acclaim. Bollywood is nothing if not
unpredictable.
In this project, we first use exploratory research techniques such as focus group
discussions, group interviews, and survey questionnaires to absorb the opinions of
viewers. Based on the respondents' perspectives about a movie, we shall try to map the
attributes to its success or failure, as perceived by the producer and the viewer. This
project has the potential to advise leading production houses and film-makers on the
basic dos and don'ts to guarantee success and avoid rapid stagnation of revenues from post-
The questionnaire was floated, as mentioned above, to a convenience sample. A total of 60
responses were received from respondents within the Indian Institute of Management
Kozhikode. However, this data could not be used directly for statistical analysis. Hence
the collected data was cleaned in the following steps:
Coding
Each question in the questionnaire was assigned a specific code. The responses were also
given specific codes for ease of analysis in the SPSS statistical tool. For example,
Likert-scale questions, i.e. questions asking about the respondent's behavior towards a
particular situation based on nine parameters, had five options to choose from, ranging
from “Most Preferable” to “Least Preferable”. The “Most Preferable” option was assigned a
value of 1, and so on up to “Least Preferable”, which was given a value of 5. To develop
the data further, the data file was downloaded in EXCEL format. The sheet contains each
response in a separate row.
Data Cleaning
Of the 60 responses collected, 2 were incomplete and had not been answered properly.
Hence those 2 responses were deleted from the data file. Consistency checks were then
performed on the data file. These included checking the responses for extreme values and
checking their logical consistency. Missing responses were substituted with neutral
answer values. This cleaned data was used for performing descriptive analysis in EXCEL,
as shown later.
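The coding and cleaning steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually run in EXCEL and SPSS: the report names only the two end-point labels, so the three middle labels are placeholders, and the cut-off MAX_MISSING used to call a response incomplete is hypothetical, as is the sample data.

```python
# Assumed label set: only the two end points appear in the report;
# the three middle labels are placeholders for illustration.
LIKERT_CODES = {
    "Most preferable": 1,
    "Preferable": 2,        # hypothetical label
    "Neutral": 3,           # hypothetical label
    "Less preferable": 4,   # hypothetical label
    "Least preferable": 5,
}
NEUTRAL = 3      # neutral answer value used to fill isolated gaps
MAX_MISSING = 2  # hypothetical cut-off for calling a response incomplete


def code_response(raw_answers):
    """Map text answers to 1-5 codes; unanswered or unknown items become None."""
    return [LIKERT_CODES.get(answer) for answer in raw_answers]


def clean(responses):
    """Drop incomplete responses, then fill remaining gaps with the neutral code."""
    cleaned = []
    for raw in responses:
        coded = code_response(raw)
        if coded.count(None) > MAX_MISSING:
            continue  # too many gaps: treat the whole response as incomplete
        cleaned.append([NEUTRAL if value is None else value for value in coded])
    return cleaned


# Made-up responses, one row per respondent as in the EXCEL sheet.
sample = [
    ["Most preferable", "Neutral", "Least preferable", "Preferable"],
    ["Most preferable", None, "Less preferable", "Neutral"],
    [None, None, None, "Neutral"],
]
cleaned_sample = clean(sample)
print(cleaned_sample)
```

In this sketch the third row is dropped as incomplete, while the single gap in the second row is filled with the neutral code.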
Importing data in SPSS
After cleaning the data, the data analysis strategy was formulated. Then the EXCEL data file
was imported into SPSS software. The data was again checked for logical consistency.
This revenue score, RevScore, can then be scaled using the following table to evaluate the
expected gross revenue.
RevScore     Gross revenue multiplier
5 or less    Max 1.5 times
6            1.5 - 2 times
7            2 - 2.5 times
8            2.5 - 3.5 times
9            3.5 - 4.5 times
10           4.5+ times
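The table lookup can be written as a small function. The sketch below is a hedged illustration: the function name and the handling of scores that fall between the listed values are assumptions, and the base quantity that the multiplier applies to is not stated in this part of the report, so the function returns only the multiplier range.

```python
# Bands copied from the table: (lowest RevScore in band, low multiplier, high multiplier).
# None as the upper bound means the table gives no ceiling ("4.5+ times").
BANDS = [
    (10, 4.5, None),
    (9, 3.5, 4.5),
    (8, 2.5, 3.5),
    (7, 2.0, 2.5),
    (6, 1.5, 2.0),
]


def revenue_multiplier(rev_score):
    """Return the (low, high) gross revenue multiplier range for a RevScore.

    Assumption: a score between two listed values falls into the lower band.
    """
    for floor, low, high in BANDS:
        if rev_score >= floor:
            return (low, high)
    return (None, 1.5)  # "5 or less": at most 1.5 times, no stated lower bound


for score in (4, 6, 8, 10):
    print(score, revenue_multiplier(score))
```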
The gross revenue can be estimated by
In the last part of our analysis, apart from the gross sales and initial sales, what
matters is that the movie should resonate in the viewer's mind, which is usually measured
by viewers rating the movies themselves. Movies that are highly regarded by viewers will
be seen again and again, or bought as DVDs or online copies, which again adds to the
revenue of the production house. This is measured by a proxy for viewer opinion called
Score.
Now we proceed to perform the K-means cluster analysis to determine the samples in each
cluster and the distances between the clusters. The result of the K-means clustering is
as follows:
Final Cluster Centers
                          Cluster
                          1    2    3
Star-cast influence       4    4    3
Production house          3    4    2
Director influence        3    4    4
Music Influence           4    3    3
Item Songs                3    2    2
Review influence          3    2    3
Indifference to rating    3    2    3
Controversies             4    2    2
A-rating                  3    3    2
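For readers unfamiliar with how such cluster centers arise, the following is a minimal sketch of the K-means idea (Lloyd's algorithm) in plain Python. It is not the SPSS procedure used for the table above: the starting centers are passed in by hand, and the nine toy respondents, each scored on only three of the attributes, are invented purely for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length score vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, centers, max_iter=100):
    """Assign each point to its nearest center, recompute centers, repeat until stable."""
    centers = [list(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        labels = [
            min(range(len(centers)), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(old)  # keep an empty cluster's center unchanged
        if new_centers == centers:
            break  # assignments can no longer change
        centers = new_centers
    return centers, labels


# Made-up 1-5 scores: [star-cast influence, production house, controversies].
toy = [
    [4, 3, 4], [5, 3, 4], [4, 2, 5],
    [4, 4, 2], [4, 5, 2], [5, 4, 1],
    [3, 2, 2], [2, 1, 2], [3, 2, 1],
]
centers, labels = k_means(toy, centers=[toy[0], toy[3], toy[6]])
print(labels)
print([[round(value) for value in center] for center in centers])
```

Rounding the final centers to whole scores, as in the last line, gives a table of the same shape as the Final Cluster Centers output.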
ANOVA
                          Cluster             Error
                          Mean Square   df    Mean Square   df    F         Sig.
Star-cast influence       5.753         2     .678          55    8.490     .001
Production house          24.164        2     .630          55    38.351    .000
Director influence        6.610         2     .683          55    9.681     .000
Music Influence           3.002         2     .844          55    3.557     .035
Item Songs                17.795        2     .744          55    23.924    .000
Review influence          3.748         2     1.209         55    3.100     .053
Indifference to rating    1.006         2     .878          55    1.146     .325
Controversies             15.887        2     1.051         55    15.115    .000
A-rating                  4.596         2     1.162         55    3.956     .025
The F tests should be used only for descriptive purposes because the clusters have been
chosen to maximize the differences among cases in different clusters. The observed
significance levels are not corrected for this and thus cannot be interpreted as tests of the
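As a quick check on how the ANOVA table is put together, each F value is the cluster mean square divided by the error mean square, and the degrees of freedom encode the number of clusters and cases. The sketch below recomputes these from the printed figures; the variable and function names are illustrative, and small differences from the reported F values are expected because the mean squares are printed to three decimals.

```python
# (attribute, cluster mean square, error mean square, reported F) from the ANOVA table.
ANOVA_ROWS = [
    ("Star-cast influence", 5.753, 0.678, 8.490),
    ("Production house", 24.164, 0.630, 38.351),
    ("Director influence", 6.610, 0.683, 9.681),
    ("Music Influence", 3.002, 0.844, 3.557),
    ("Item Songs", 17.795, 0.744, 23.924),
    ("Review influence", 3.748, 1.209, 3.100),
    ("Indifference to rating", 1.006, 0.878, 1.146),
    ("Controversies", 15.887, 1.051, 15.115),
    ("A-rating", 4.596, 1.162, 3.956),
]
CLUSTER_DF = 2  # number of clusters minus 1
ERROR_DF = 55   # number of cases minus number of clusters


def f_ratio(cluster_ms, error_ms):
    """F statistic: between-cluster mean square over within-cluster mean square."""
    return cluster_ms / error_ms


recomputed = {name: f_ratio(cms, ems) for name, cms, ems, _ in ANOVA_ROWS}
for name, _, _, reported in ANOVA_ROWS:
    print(f"{name:24s} F = {recomputed[name]:7.3f} (reported {reported:7.3f})")

# The degrees of freedom imply 3 clusters and 58 cases,
# which matches the 60 responses minus the 2 deleted during cleaning.
n_clusters = CLUSTER_DF + 1
n_cases = ERROR_DF + n_clusters
print(n_clusters, n_cases)
```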
1. From the above descriptive analysis, we can infer that a majority of the sample,
assuming it to be representative of the population, belongs to a gossip-driven generation
and is interested in movies that can provide the 'tadka' for spending time.
2. The aim of the focus group discussion was first to identify the gamut of factors that
could possibly influence people's decision to go and watch a movie, once or maybe even
multiple times. Based on the findings of the FGD, the independent variables were listed.
3. Many moviegoers are driven by special promotional activities and the television
coverage they seek.
4. The reviews of critics and peer-to-peer ratings also play a crucial role, which can be
seen by way of the linear coefficient in the regression equation.
5. From the focus group discussions, it became clear that most people preferred to wait
for the reviews before going to watch a movie.
6. By way of the questionnaire, we could confirm that of late the trend in movie making
is shifting from originality of content to roping in elite stars, putting together an
item number, and paying big bucks to reserve a large number of screens to push the movie
in cinema halls.
7. While collecting data from the theatre-going crowd through card-samples, we noticed
the willingness of moviegoers to contribute in every possible way to help make the
movie-watching experience better.
8. From the literature review, it came to our notice that most film-makers are now
shifting focus from innovative stories and direction to making sequels of an already
established brand, thus giving new script writers a better chance of succeeding even
with a low-budget movie.
9. Secondary data collection revealed the variety of ways used by film producers to push
a film into cinema halls, bring in a larger crowd to see it, and break even.
10. The simple linear regression failed to reveal the hidden tendency of the gross revenue
and of viewer opinion to vary with the critic rating, which was better explained by the
dummy variable regression in the conjoint analysis.
11. The dummy variable regression for the conjoint analysis of the said model gave us
some interesting insights. Rajiv Masand's critique of a movie has a more severe impact,
1. We were able to conduct only one insightful focus group discussion, even after three
sittings, which led to some wastage of time, energy and valuable analysis.
2. The assorted questionnaire was distributed through social networking sites like
Facebook and Twitter. Hence this mode of convenience sampling did not cover all
aspects of the demographic variables, due to which factor analysis failed.
3. To some extent, the survey respondents provided skewed responses, because most
were from one particular area of India, or from a certain lifestyle, with preconceived
notions, tastes and preferences.
4. The sample of movies collected for performing the conjoint analysis was only 74, which
is too small for an accurate and significant model. Similarly, the survey responses were
too few for any hard conclusive inferences.
5. Detailed cluster analysis could not be performed, as respondents were mainly within
an age group of 20 to 26.
6. On account of constraints on time and efficiency, we could not incorporate the details
for movies starting from 2005, which witnessed the shift in Bollywood paradigm.
7. Most respondents refrained from giving a perfect score to films, either due to the
inherent bias of having seen a better film, or out of the stigma of being considered a
person of poor taste in movies.
Future Research
1. With further inputs and historical aggregation of data from all possible movies, it
should be possible to model a better and more accurate system, with dynamic data
handling and updating capabilities.
2. Further, the time value of money needs to be taken into consideration when looking at
revenues earned by the film-makers.
3. Trends in viewer behaviors have to be monitored and analyzed for proper modeling
by integration of movies across the horizon.
4. A better questionnaire tapping into demographic variables and additional independent
variables will help to create a more reasonable, accurate and sustainable model, with
much larger scope than just predicting, but also tracking variable changes and their