1. Introduction
Over the last decade, Bangladesh has witnessed dramatic changes
in its socio-economic, political and educational dynamics. In terms
of education and subsequent employment, more people are completing
tertiary education today than was imaginable before. But finishing
college does not automatically lead to a dream job. College
education also has costs. These costs range from tuition fees to
the foregone opportunity of employment elsewhere and the psychic
cost. An individual will pursue college education only when he/she
perceives a return higher than all these costs. This perceived gain
is most commonly a good job. However, in recent times we see
hundreds of students struggling to find a job even with degrees
from reputed colleges. It takes more than a year on average for a
fresh graduate to land their first full-time employment, a place to
pursue what can be called a career.
Previously it was thought that a good CGPA gets you a good job.
Things have changed over the years. Today a good CGPA alone is not
enough. Hundreds of students graduate with a good CGPA, and
competition among these graduates is far more complex than what a
good grade alone can settle. Employers now have the opportunity to
choose from a large pool of candidates, and they are making their
screening and filtering processes more rigorous and robust. Even
excellent grades will not guarantee you your first job. A number of
things now play a combined role in determining your fit with an
organization. One such factor is job experience as a student, which
shows that you are a proactive, responsible and hardworking
individual. If the job is relevant, the organization has to spend
less to orient you. Extra-curricular activities also have a
positive impact on employers. They show skill, focus, integrity and
passion, all essential for a successful person. Most students
nowadays tutor school children. That is also a good sign of a
positive attitude towards life and work. Sports, active
participation in university academic clubs and skill development
programs such as photography or painting also give your resume an
added edge over others. In all, it is not just a transcript that is
going to impress the employer today; they need more to believe that
you may be fit to be part of an organization.
1.1. Origin of the Report
BUS 511 is a statistics course offered in the MBA program of NSU
to equip students with statistical tools. The project was initiated
so that the students would get practical exposure to statistical
analysis through project work. Different types of statistical tools
were used in this project to obtain the results.
1.2. Objective of the study
The objective is to understand whether any statistical
relationship exists between working part time during undergraduate
studies and starting salaries when students graduate and make it
into the workforce.
1.3. Significance of the study
The study would be beneficial for students who want to determine
whether previous work experience will influence their starting
salaries once they enter their professional lives. If we find that
prior work experience has no statistical relationship with starting
salaries, we would recommend that students concentrate more on
their studies and keep their grades up. If, however, we do find a
statistically significant relationship, we will recommend that
students take up some form of employment in order to enhance their
CVs.
2. Variables
The following table provides an overview of the variables that
were considered in this study.
In all cases we had to provide closed-ended options, as the
majority of respondents did not want to reveal their exact starting
salaries and the other variables were mostly qualitative.
3. Methodology
For the purposes of this study, all the data collected were
primary and were collected via one-on-one interviews with the
respondents.
For this study, all the respondents were full time employees of
companies/organizations from various sectors such as banking,
retail, FMCG and so on.
To ensure more accurate results, we tried to ensure that all the
respondents were engaged in their undergraduate programs during
similar time periods (within the last 4 years).
We initially surveyed over 80 respondents, but because some
respondents were not willing to cooperate fully or gave wrong
information, our sample size for the research was limited to 47
respondents.
As the data were mainly categorical, we had to code them, and
hence opted to use the statistical analysis software SPSS to
analyze the data.
A variety of descriptive statistics in the form of histograms,
bar graphs and pie charts were used to present and describe the
collected data. Inferential statistics such as different hypothesis
testing techniques were used to analyze and understand
relationships between different variables.
4. Questionnaire
The following questionnaire was used to conduct the study.
5. Data Sheet
The following image shows the data view on SPSS.
The following image shows the variable view on SPSS.
6. Limitations
During the course of collecting and analyzing the data we faced
the following limitations:
Respondents not willing to disclose salary
Small sample size
Errors due to inaccurate recording of information or false
information given by respondents
Lack of interest from respondents: many respondents left
questions unanswered and hence their questionnaires were rejected
Time limitation
Budget constraints
7. Descriptive Statistics
Descriptive statistics are used to describe the basic features
of the data in a study. They provide simple summaries about the
sample and the measures. Descriptive statistics is the term given
to the analysis of data that helps describe, show or summarize data
in a meaningful way such that, for example, patterns might emerge
from the data. Descriptive statistics do not, however, allow us to
make conclusions beyond the data we have analyzed or reach
conclusions regarding any hypotheses we might have made. They are
simply a way to describe our data.
Descriptive statistics are very important because if we simply
presented our raw data it would be hard to visualize what the data
were showing, especially if there were a lot of them. Descriptive
statistics therefore enable us to present the data in a more
meaningful way, which allows simpler interpretation of the data.
For example, if we had the results of 100 pieces of students'
coursework, we may be interested in the overall performance of
those students. We would also be interested in the distribution or
spread of the marks. Descriptive statistics allow us to do this.
Typically, there are two general types of statistic that are used
to describe data:
Measures of central tendency: these are ways of describing the
central position of a frequency distribution for a group of data.
In this case, the frequency distribution is simply the distribution
and pattern of marks scored by the 100 students from the lowest to
the highest. We can describe this central position using a number
of statistics, including the mode, median, and mean.
Measures of spread: these are ways of summarizing a group of data
by describing how spread out the scores are. For example, the mean
score of our 100 students may be 65 out of 100. However, not all
students will have scored 65 marks. Rather, their scores will be
spread out. Some will be lower and others higher. Measures of
spread help us to summarize how spread out these scores are. To
describe this spread, a number of statistics are available to us,
including the range, quartiles, absolute deviation, variance and
standard deviation.
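As a brief illustration of these measures, the following Python
sketch computes them for a small set of marks. The marks are
hypothetical values chosen for illustration and are not data from
this study; the standard library statistics module is assumed to be
sufficient for the purpose.

```python
import statistics

# Hypothetical marks (out of 100) for ten students; not data from this study.
marks = [52, 58, 61, 65, 65, 65, 68, 70, 72, 74]

# Measures of central tendency
mean_mark = statistics.mean(marks)
median_mark = statistics.median(marks)
mode_mark = statistics.mode(marks)

# Measures of spread
mark_range = max(marks) - min(marks)
variance = statistics.variance(marks)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(marks)      # sample standard deviation

print("mean:", mean_mark, "median:", median_mark, "mode:", mode_mark)
print("range:", mark_range)
print("variance:", round(variance, 2), "standard deviation:", round(std_dev, 2))
```

The central tendency values describe the typical mark, while the
range, variance and standard deviation describe how far individual
marks lie from it.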
For our study we have employed the use of histograms, pie charts
and contingency tables.
Histograms - A histogram is a graphical representation of the
distribution of data. It is an estimate of the probability
distribution of a continuous variable. A histogram is a
representation of tabulated frequencies, shown as adjacent
rectangles, erected over discrete intervals, with an area
proportional to the frequency of the observations in the interval.
The total area of the histogram is equal to the number of data.
Pie Chart - A pie chart (or a circle chart) is a circular
statistical graphic, which is divided into sectors to illustrate
numerical proportion. In a pie chart, the arc length of each sector
(and consequently its central angle and area) is proportional to
the quantity it represents.
Contingency Table - A contingency table is essentially a display
format used to analyse and record the relationship between two or
more categorical variables. It is the categorical equivalent of the
scatterplot used to analyze the relationship between two continuous
variables.
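To make the idea of a contingency table concrete, the sketch below
cross-tabulates two categorical variables in Python. The six
responses and the category labels are hypothetical and are not the
survey data; collections.Counter is used only as a simple way to
count each combination.

```python
from collections import Counter

# Hypothetical coded responses, not the survey data:
# (worked during undergraduate degree, starting salary band)
responses = [
    ("Yes", "20k-30k"), ("Yes", "Above 30k"), ("No", "10k-20k"),
    ("Yes", "20k-30k"), ("No", "20k-30k"), ("No", "10k-20k"),
]

# Count how often each (row, column) combination occurs.
table = Counter(responses)
rows = sorted({worked for worked, _ in responses})
cols = sorted({band for _, band in responses})

# Print the table with one row per answer and one column per salary band.
print("Worked".ljust(8) + "".join(band.ljust(12) for band in cols))
for worked in rows:
    counts = "".join(str(table[(worked, band)]).ljust(12) for band in cols)
    print(worked.ljust(8) + counts)
```

Each cell holds the number of respondents falling into that
combination of categories, which is the form of table used when
comparing a variable such as gender against salary band.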
For our study we used descriptive statistics to summarize our
findings for gender and previous work experience and have also
looked at their relationship to starting salary.
7.1. Starting Salary
From the above information we observe the following:
Mean salary range: Tk. 20k-30k
Only 6/47 respondents had salaries below Tk. 10k
7.2. Gender
The sample consists of an almost equal number of male and female
respondents.
7.3. Gender Vs. Starting Salary
A higher number of females in the sample had higher salaries
(more than Tk. 30k). More males had salaries in the Tk. 20k-30k
range.
7.4. Employment while pursuing undergraduate degree
Almost 66% of the respondents were involved in some kind of
employment during their undergraduate studies.
7.5. Employment during UG vs. Starting Salary
Only respondents who worked as students had salaries of over
Tk. 30k. More respondents who had previous experience had starting
salaries in the Tk. 20k-30k range.
7.6. Extracurricular activity while pursuing undergraduate
degree
Almost 53% of the respondents were not involved in any kind of
extracurricular activity during their undergraduate studies.
7.7. Extracurricular activity vs. Starting Salary
No respondents who were involved in extracurricular activities
had salaries of less than Tk. 10k. The number of respondents
earning over Tk. 30k was the same among those who were involved in
extracurricular activities as among those who were not.
8. Inferential Statistics
We have seen that descriptive statistics provide information
about our immediate group of data. For example, we could calculate
the mean and standard deviation of the exam marks for the 100
students and this could provide valuable information about this
group of 100 students. Any group of data like this, which includes
all the data you are interested in, is called a population. A
population can be small or large, as long as it includes all the
data you are interested in. For example, if you were only
interested in the exam marks of 100 students, the 100 students
would represent your population. Descriptive statistics are applied
to populations, and the properties of populations, like the mean or
standard deviation, are called parameters as they represent the
whole population (i.e., everybody you are interested in).
Often, however, you do not have access to the whole population
you are interested in investigating, but only a limited number of
data instead. For example, you might be interested in the exam
marks of all students in the UK. It is not feasible to measure all
exam marks of all students in the whole of the UK so you have to
measure a smaller sample of students (e.g., 100 students), which
is used to represent the larger population of all UK students.
Properties of samples, such as the mean or standard deviation, are
not called parameters, but statistics. Inferential statistics are
techniques that allow us to use these samples to make
generalizations about the populations from which the samples were
drawn. It is, therefore, important that the sample accurately
represents the population. The process of achieving this is called
sampling. Inferential statistics arise out of the fact that
sampling naturally incurs sampling error and thus a sample is not
expected to perfectly represent the population. The methods of
inferential statistics are (1) the estimation of parameter(s) and
(2) testing of statistical hypotheses.
For this study we used inferential statistics for several
analyses, such as checking whether there was a difference in
salaries between males and females in the survey and testing which
factors are related to the amount of the first salary.
8.1. Two Sample t test for Comparing Two Means
A t-test's statistical significance indicates whether or not the
difference between two groups' averages most likely reflects a real
difference in the population from which the groups were
sampled.
Hypothesis test
Formula:
\[ t = \frac{\bar{x}_1 - \bar{x}_2 - \Delta}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \]
where $\bar{x}_1$ and $\bar{x}_2$ are the means of the two samples
(1.83 and 1.52), $\Delta$ is the hypothesized difference between
the population means (0 if testing for equal means), $s_1$ and
$s_2$ are the standard deviations of the two samples (1.049 and
0.846), and $n_1$ and $n_2$ are the sizes of the two samples (24
and 23). The number of degrees of freedom for the problem is the
smaller of $n_1 - 1$ and $n_2 - 1$.
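The calculation can be checked with a short Python sketch that
plugs the summary values quoted above into the two-sample
t statistic with the hypothesized difference set to 0. The function
name two_sample_t is an illustrative choice, and the unpooled
standard error used here is an assumption that may differ slightly
from the SPSS output.

```python
import math

def two_sample_t(mean1, mean2, s1, s2, n1, n2, delta=0.0):
    """t statistic for the difference between two sample means."""
    standard_error = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return (mean1 - mean2 - delta) / standard_error

# Summary values quoted in the text above.
t_value = two_sample_t(1.83, 1.52, 1.049, 0.846, 24, 23)

# Conservative degrees of freedom: the smaller of n1 - 1 and n2 - 1.
degrees_of_freedom = min(24 - 1, 23 - 1)

print("t =", round(t_value, 3), "df =", degrees_of_freedom)
```

With these rounded inputs the statistic comes out at roughly 1.12,
close to the T calculated value reported for the comparison of male
and female starting salaries.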
The aim is to determine whether the mean male and female starting
salaries in the survey are equal. The male and female respondents
are treated as two independent groups. We use a significance level
of 0.05.
Let μ_male represent the mean male starting salary of the group
and μ_female represent the mean female starting salary of the
group.
8.2. Are the mean starting salaries of females the same as those
for males?
H0: μ_female = μ_male
H1: μ_female ≠ μ_male
Significance level = 0.05
Test statistic: As the samples are from independent populations
and the variances can be assumed to be equal, we use the two-sample
t-test
95% confidence interval of the difference: -.250 to .873
T calculated: 1.118
Decision: Fail to reject H0, as T calculated < T critical (about
2.01) and the confidence interval contains 0
Conclusion: The mean starting salary for females is not
significantly different from that for males
9. Chi Square Statistic
The chi-square statistic is a measurement of how expectations
compare to results. The data used in calculating a chi-square
statistic must be random, raw, mutually exclusive, drawn from
independent variables and drawn from a large enough sample. For
example, the results of tossing a coin 100 times would meet these
criteria.
As a simple example of how to calculate and use the chi square
statistic, consider tossing a coin 100 times. The expected result
of tossing a fair coin 100 times is that heads will come up 50
times and tails will come up 50 times. The actual result might be
that heads comes up 45 times and tails comes up 55 times. The chi
square statistic will show any discrepancies between the expected
results and the actual results.
Calculating the test statistic
The value of the test statistic is
\[ \chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i} \]
where
$\chi^2$ = Pearson's cumulative test statistic, which
asymptotically approaches a $\chi^2$ distribution;
$O_i$ = an observed frequency;
$E_i$ = an expected (theoretical) frequency, asserted by the null
hypothesis;
$n$ = the number of cells in the table.
The chi-squared statistic can then be used to calculate a p-value
by comparing the value of the statistic to a chi-squared
distribution. The number of degrees of freedom is equal to the
number of cells $n$, minus the reduction in degrees of freedom $p$.
The result about the number of degrees of freedom is valid when
the original data are multinomial and hence the estimated
parameters are efficient for minimizing the chi-squared statistic.
More generally, however, when maximum likelihood estimation does
not coincide with minimum chi-squared estimation, the distribution
will lie somewhere between a chi-squared distribution with
$n - 1 - p$ and $n - 1$ degrees of freedom.
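The coin-toss example described earlier in this section can be
worked through in a few lines of Python. This is a minimal sketch:
the function name chi_square_statistic is illustrative, and the
observed counts of 45 heads and 55 tails are the ones used in the
example, not data from the survey.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Coin-toss example: 100 tosses of a coin assumed to be fair.
observed = [45, 55]   # heads, tails actually seen
expected = [50, 50]   # heads, tails expected under the null hypothesis

chi_sq = chi_square_statistic(observed, expected)
degrees_of_freedom = len(observed) - 1

print("chi-square =", chi_sq, "df =", degrees_of_freedom)
```

Each cell contributes 25/50 = 0.5, so the statistic is 1.0 with one
degree of freedom. This is well below 3.841, the 0.05 critical
value for one degree of freedom, so the result is consistent with a
fair coin.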
9.1. Is there a difference in starting salaries between the
surveys?
H0: There is no difference in starting salary between the surveys
H1: There is a difference in starting salary between the surveys
Significance level = 0.01
Test statistic: Chi-square
Chi calculated: 6.53
Chi critical: 13.27
Decision: Fail to reject H0, as Chi calculated < Chi critical
Conclusion: There is no statistically significant difference in
starting salaries between the surveys
10. T-Tests
10.1. Starting Salary vs. did employer ask for work
experience?
H0: No relation between Starting Salary and Did Employer ask for
work experience
H1: Relation exists between Starting Salary and Did Employer ask
for work experience
At the 95% confidence level (0.05 level of significance), the
T critical value is 2.013
T calculated = 2.723
Since T calculated > T critical, we reject the null hypothesis
Conclusion: A relation exists between Starting Salary and Did
Employer ask for work experience
10.2. Starting Salary vs. Extracurricular activities
H0: No relation between Starting Salary and Extracurricular
Activities
H1: Relation exists between Starting Salary and Extracurricular
Activities
At the 95% confidence level (0.05 level of significance), the
T critical value is 2.013
T calculated = 9.562
Since T calculated > T critical, we reject the null hypothesis
Conclusion: A relation exists between Starting Salary and
Extracurricular Activities
10.3. Starting Salary vs. Gender
H0: No relation between Starting Salary and Gender
H1: Relation exists between Starting Salary and Gender
At the 95% confidence level (0.05 level of significance), the
T critical value is 2.013
T calculated = 1.318
Since T calculated < T critical, we fail to reject the null
hypothesis
Conclusion: No relation was found between Starting Salary and
Gender
10.4. Starting Salary vs. Duration of work during undergraduate
studies
H0: No relation between Starting Salary and Duration of work
during undergraduate studies
H1: Relation exists between Starting Salary and Duration of work
during undergraduate studies
At the 95% confidence level (0.05 level of significance), the
T critical value is 2.013
T calculated = 1.754
Since T calculated < T critical, we fail to reject the null
hypothesis
Conclusion: No relation was found between Starting Salary and
Duration of work during undergraduate studies
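The decision rule applied in the four tests of this section can be
summarized in a short Python sketch. The T calculated values and
the critical value of 2.013 are the ones reported above; the
function name decide and the dictionary layout are illustrative
choices only.

```python
T_CRITICAL = 2.013  # critical value reported for the 0.05 significance level

# T calculated values reported in Sections 10.1 to 10.4.
reported_t = {
    "Did employer ask for work experience": 2.723,
    "Extracurricular activities": 9.562,
    "Gender": 1.318,
    "Duration of work during undergraduate studies": 1.754,
}

def decide(t_calculated, t_critical=T_CRITICAL):
    """Reject H0 only when |t| exceeds the critical value."""
    if abs(t_calculated) > t_critical:
        return "reject H0"
    return "fail to reject H0"

decisions = {name: decide(t) for name, t in reported_t.items()}
for name, decision in decisions.items():
    print(f"Starting Salary vs. {name}: {decision}")
```

The absolute value is used so that the same rule also covers
negative test statistics in a two-tailed test.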
11. Regression Model
In statistics, regression analysis is a statistical process for
estimating the relationships among variables. It includes many
techniques for modeling and analyzing several variables, when the
focus is on the relationship between a dependent variable and one
or more independent variables. More specifically, regression
analysis helps one understand how the typical value of the
dependent variable (or 'criterion variable') changes when any one
of the independent variables is varied, while the other independent
variables are held fixed. Most commonly, regression analysis
estimates the conditional expectation of the dependent variable
given the independent variables, that is, the average value of the
dependent variable when the independent variables are fixed. Less
commonly, the focus is on a quantile, or other location parameter,
of the conditional distribution of the dependent variable given the
independent variables. In all cases, the estimation target is a
function of the independent variables called the regression
function. In regression analysis, it is also of interest to
characterize the variation of the dependent variable around the
regression function, which can be described by a probability
distribution.
Regression analysis is widely used for prediction and
forecasting, where its use has substantial overlap with the field
of machine learning. Regression analysis is also used to understand
which among the independent variables are related to the dependent
variable, and to explore the forms of these relationships. In
restricted circumstances, regression analysis can be used to infer
causal relationships between the independent and dependent
variables. However, this can lead to illusions or false
relationships, so caution is advisable; for example, correlation
does not imply causation.
Many techniques for carrying out regression analysis have been
developed. Familiar methods such as linear regression and ordinary
least squares regression are parametric, in that the regression
function is defined in terms of a finite number of unknown
parameters that are estimated from the data. Nonparametric
regression refers to techniques that allow the regression function
to lie in a specified set of functions, which may be
infinite-dimensional.
The performance of regression analysis methods in practice
depends on the form of thedata generating process, and how it
relates to the regression approach being used. Since the true form
of the data-generating process is generally not known, regression
analysis often depends to some extent on making assumptions about
this process. These assumptions are sometimes testable if a
sufficient quantity of data is available. Regression models for
prediction are often useful even when the assumptions are
moderately violated, although they may not perform optimally.
However, in many applications, especially with small effects or
questions of causality based on observational data, regression
methods can give misleading results.
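As a minimal sketch of how ordinary least squares works, the
Python code below fits a straight line to one predictor and reports
R-square. The five data points are hypothetical and the function
name simple_ols is illustrative; the model in this study uses
several predictors and was fitted in SPSS, so this only shows the
principle for the single-predictor case.

```python
def simple_ols(x, y):
    """Fit y = intercept + slope * x by least squares; return R-square too."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    # R-square: share of the variation in y explained by the fitted line.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return intercept, slope, 1 - ss_res / ss_tot

# Hypothetical coded values, not the survey data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

intercept, slope, r_squared = simple_ols(x, y)
print("intercept:", round(intercept, 2))
print("slope:", round(slope, 2))
print("R-square:", round(r_squared, 2))
```

The slope estimates how much the dependent variable changes for a
one-unit change in the predictor, and R-square summarizes how much
of the variation the fitted line accounts for.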
11.1. The Regression Model for this study
The dependent variable was starting salary.
Equation of Regression Model:
\[ \text{Starting Salary} = \beta_0 + \beta_1(\text{Gender}) + \beta_2(\text{Worked during undergraduate degree}) + \beta_3(\text{Duration of work during undergraduate studies}) + \dots + \beta_k(\text{Extracurricular activities during undergraduate studies}) + \varepsilon \]
where the $\beta$ terms are the regression coefficients and
$\varepsilon$ is the error term.
Conclusion: We obtained an R value of 0.729, which indicates a
direct relationship, and an R-square value of 0.53, which means
that the model explains about 53% of the variation in starting
salary, a moderate fit.
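The link between the two reported values can be checked directly,
since R-square is the square of R. The short Python sketch below
uses the R value of 0.729 reported above; the variable names are
illustrative.

```python
r = 0.729                # multiple correlation coefficient reported above
r_squared = r ** 2       # proportion of variation explained by the model
unexplained = 1 - r_squared

print(f"R-square = {r_squared:.3f}")
print(f"Unexplained share = {unexplained:.3f}")
```

Roughly 53% of the variation in starting salary is accounted for by
the model, leaving about 47% to factors outside it.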
12. Conclusion
Based on the multiple analysis techniques used in this research,
we have found a significant relationship between starting salaries
and whether a respondent worked during undergraduate studies. This
could be an indicator that employers prefer hiring prospects who
have basic knowledge and understanding of the way an organization
or the industry works. Surprisingly, we failed to find any
relationship between salary and the duration of time a respondent
had worked during undergraduate studies. This could mean that
employers only care whether prospects possess some basic knowledge
and do not judge entry-level candidates on the duration of previous
work experience. We also failed to find any relationship between
starting salaries and gender, which suggests that the respondents
were working in more progressive offices without gender-based pay
discrimination.
From the regression model we obtained an R-square value of 0.53,
which indicates that the model is neither very good nor very weak
at explaining the variations in starting salaries. This implies
that there are other factors that affect starting salaries, some of
which could be negotiation skills during the interview, how well
the interview went, entrance test scores, or whether the applicant
had any other training or certifications.
Had we been able to factor in more of these other variables, and
conducted a survey with a larger and more cooperative sample of
respondents, the model would have been more accurate.
13. References
1. http://www.socialresearchmethods.net/kb/statinf.php
2. https://statistics.laerd.com/statistical-guides/descriptive-inferential-statistics.php
3. http://math.hws.edu/javamath/ryan/ChiSquare.html
4. http://www.investopedia.com/terms/c/chi-square-statistic.asp
5. http://www.cliffsnotes.com/math/statistics/univariate-inferential-tests/one-sample-t-test
6. Armstrong, J. Scott (2012). "Illusions in Regression
Analysis". International Journal of Forecasting.
7. Freedman, David A. (2005). Statistical Models: Theory and
Practice. Cambridge University Press.
8. Cook, R. Dennis; Weisberg, Sanford (1982). "Criticism and
Influence Analysis in Regression". Sociological Methodology,
Vol. 13, pp. 313-361.