A New Approach for Comparing Means of Two Populations By Brad Moss And Sponsored by Dr. Chand Chauhan
Jan 17, 2016
The Problem
• This research addresses a current problem in statistics.
• The problem: how do you compare the means of two populations when the variance of each population is unknown and the two variances are unequal?
• There are many ways to deal with this problem. In this presentation we compare a new method to one of the existing methods.
The Idea!
• After consulting with Dr. Chauhan and reading “A Note on the Ratio of Two Normally Distributed Variables,” we decided to try to get around the unknown/unequal variance problem by taking a ratio of the estimated means, which to our knowledge has not been done before.
• We approximate this ratio with a normal distribution and use that approximation to decide whether or not the means are different.
A Context for the Idea
• Let’s say we are employees of “Get Better Drug Company” and that we are in the process of developing a drug for weight loss.
• Our scientists have developed two drugs, A and B.
• The company can only mass produce one of these drugs.
Things We Will Want to Know
• Overall, what is the mean weight loss for people taking each drug?
• Is one drug more effective than the other?
How To Answer
• First we estimate the means of the two populations with the sample averages $\bar{Y}$ and $\bar{X}$.
• Then we compare the means by taking either the ratio or the difference of the averages.
• Let’s consider the ratio of the averages, which is what my research is about.
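As a small illustration of the two comparisons above, the sketch below computes both the ratio and the difference of two sample averages. The weight-loss numbers are invented for illustration only and are not data from this research.

```python
from statistics import mean

# Hypothetical weight losses (in pounds) for drugs A and B; made-up values.
loss_a = [8.1, 9.4, 10.2, 7.9, 9.0]
loss_b = [8.6, 9.1, 9.8, 8.3, 9.2]

y_bar = mean(loss_a)  # sample average for drug A
x_bar = mean(loss_b)  # sample average for drug B

ratio = y_bar / x_bar        # near 1 when the two means are similar
difference = y_bar - x_bar   # near 0 when the two means are similar

print(f"ratio = {ratio:.4f}, difference = {difference:.4f}")
```

Either quantity can be used to compare the means; the rest of the presentation works with the ratio.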
How this Works
• If the two means are equal, their ratio (otherwise known as a fraction) should be 1 or close to 1.
• How close to 1 is close enough?
• For this, we create what is known as a confidence interval, using the estimated ratio and an estimate of its variance.
• If 1 falls into this interval, we will say that the means are statistically the same.
The Ratio Can Be Approximated by a Normal Distribution.
• According to the paper mentioned on slide 3, a ratio of two normally distributed random variables can be approximated as a normal distribution.
• This happens when the standard deviation of one of the random variables is much smaller than that of the other.
• For this to work, the ratio of the standard deviations should exceed 19 to 9, assuming the means are equal.
How Can We Use This
• The ratio of two normally distributed random variables is approximately normal.
• The averages of normally distributed variables are normally distributed as well (see Theorem 8, page 187 in “An Introduction to Probability and Statistical Inference”).
• That is, if Y and X are normally distributed, then $\bar{Y}$ and $\bar{X}$ are normally distributed.
How Can We Use This
• The last fact is important because we don’t know the true means, so we approximate them with the averages.
• According to the paper, when Y and X are independent the mean of $Y/X$ is approximately $E(Y/X) \approx \frac{\mu_Y}{\mu_X}\left(1 + CV_X^2\right)$.
• $CV_X$ is the ratio of the standard deviation of X to its mean. The $\mu$’s stand for the true means.
How Can We Use This
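• The variance of $Y/X$ is approximately $\mathrm{Var}(Y/X) \approx \frac{\sigma_X^2 \mu_Y^2}{\mu_X^4} + \frac{\sigma_Y^2}{\mu_X^2}$.
• Once again we are assuming Y and X are independent.
• So, now we want to replace $Y/X$ with the ratio of our approximated means, $\bar{Y}/\bar{X}$.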
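One way to sanity-check approximations of this kind is a quick simulation. The sketch below assumes the usual delta-method forms for independent Y and X, namely a mean of (mu_Y/mu_X)(1 + CV_X^2) and a variance of sigma_X^2 mu_Y^2 / mu_X^4 + sigma_Y^2 / mu_X^2; these forms, the parameter values, and the seed are assumptions chosen for illustration, not values taken from the paper.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

# Assumed parameters: second value is the standard deviation.
mu_y, sigma_y = 100.0, 10.0
mu_x, sigma_x = 100.0, 5.0

# Delta-method approximations for the mean and variance of Y/X.
cv_x = sigma_x / mu_x
approx_mean = (mu_y / mu_x) * (1 + cv_x**2)
approx_var = sigma_x**2 * mu_y**2 / mu_x**4 + sigma_y**2 / mu_x**2

# Simulate independent Y and X and form the ratio each time.
ratios = [rng.gauss(mu_y, sigma_y) / rng.gauss(mu_x, sigma_x)
          for _ in range(100_000)]
sim_mean = sum(ratios) / len(ratios)
sim_var = sum((r - sim_mean) ** 2 for r in ratios) / len(ratios)

print(f"mean:     approx {approx_mean:.5f}, simulated {sim_mean:.5f}")
print(f"variance: approx {approx_var:.5f}, simulated {sim_var:.5f}")
```

With a denominator whose coefficient of variation is this small, the simulated values should land close to the approximations; the agreement would be expected to worsen as CV_X grows.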
How Does that Change Things
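• From basic statistics, $E(\bar{Y}) = \mu_Y$ and $E(\bar{X}) = \mu_X$, while $\mathrm{Var}(\bar{Y}) = \sigma_Y^2 / n_Y$ and $\mathrm{Var}(\bar{X}) = \sigma_X^2 / n_X$.
• Here E(…) stands for the expected value of (…) and Var(…) stands for the variance of (…).
• The n’s stand for sample sizes, with the subscript indicating which sample each belongs to.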
Now Substituting
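• $E(\bar{Y}/\bar{X}) \approx \frac{\mu_Y}{\mu_X}\left(1 + \frac{CV_X^2}{n_X}\right)$ and $\mathrm{Var}(\bar{Y}/\bar{X}) \approx \frac{\sigma_X^2 \mu_Y^2}{n_X \mu_X^4} + \frac{\sigma_Y^2}{n_Y \mu_X^2}$
• Since the standard deviation of X is small, $CV_X$ is small, and $CV_X^2$ is being divided by a bigger number, $n_X$.
• Therefore we can ignore $CV_X^2 / n_X$ in the approximate mean. Thus $E(\bar{Y}/\bar{X}) \approx \mu_Y / \mu_X$.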
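To get a feel for how small the ignored term is, here is a short worked calculation. It assumes the correction has the form CV_X^2 / n_X and uses one of the simulated settings from the results (X with mean 100, standard deviation 5, sample size 30), reading the second parameter as a standard deviation.

```python
# Assumed setting: X has mean 100, standard deviation 5, sample size 30.
mu_x, sigma_x, n_x = 100.0, 5.0, 30

cv_x = sigma_x / mu_x          # coefficient of variation of X
correction = cv_x**2 / n_x     # the term dropped from the approximate mean

print(f"CV_X = {cv_x}, CV_X^2 / n_X = {correction:.6f}")
```

Under these assumptions the correction is below one ten-thousandth, so dropping it changes the approximate mean of the ratio by a negligible amount.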
The Confidence Interval!
• Now we need to determine how close to 1 the ratio has to be in order to say that the means are the same.
• So we will create an interval around our approximate mean.
• In statistics, we convert normal values to standard normal values by subtracting the mean and then dividing by the standard deviation.
The Confidence Interval!
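• So $Z = \dfrac{\bar{Y}/\bar{X} - \mu_Y/\mu_X}{\sqrt{\mathrm{Var}(\bar{Y}/\bar{X})}}$ is approximately standard normal.
• Now we want to choose two numbers “a” and “b” on the real line such that the probability of $a \le Z \le b$ is 95%.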
The Confidence Interval!
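• Since we are working with a standard normal distribution, we will choose “a” and “b” to be -1.96 and 1.96, because in a standard normal $P(-1.96 \le Z \le 1.96) = .95$.
• So, $P\left(-1.96 \le \dfrac{\bar{Y}/\bar{X} - \mu_Y/\mu_X}{\sqrt{\mathrm{Var}(\bar{Y}/\bar{X})}} \le 1.96\right) = .95$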
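The choice of -1.96 and 1.96 can be checked numerically. The sketch below uses `NormalDist` from Python's standard `statistics` module; the variable names are just for illustration.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Probability that a standard normal value falls between -1.96 and 1.96.
prob = z.cdf(1.96) - z.cdf(-1.96)

# The cutoff that leaves 2.5% in the upper tail (and 2.5% in the lower tail).
upper = z.inv_cdf(0.975)

print(f"P(-1.96 <= Z <= 1.96) = {prob:.4f}")
print(f"97.5th percentile of Z = {upper:.4f}")
```

A different confidence level would only change the cutoff, for example `z.inv_cdf(0.995)` for a 99% interval.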
Doing Some Algebra
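• After some algebra we get: $P\left(\bar{Y}/\bar{X} - 1.96\sqrt{\mathrm{Var}(\bar{Y}/\bar{X})} \le \mu_Y/\mu_X \le \bar{Y}/\bar{X} + 1.96\sqrt{\mathrm{Var}(\bar{Y}/\bar{X})}\right) = .95$
• Thus the end points of the interval are:
• $\bar{Y}/\bar{X} - 1.96\sqrt{\mathrm{Var}(\bar{Y}/\bar{X})}$ and $\bar{Y}/\bar{X} + 1.96\sqrt{\mathrm{Var}(\bar{Y}/\bar{X})}$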
Results
• So, given those two endpoints, if 1 is between them, then the means are statistically the same.
• Otherwise the means are different.
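The decision rule above could be sketched in code as follows. This is a minimal sketch, not the exact procedure used in the research: the function names are hypothetical, the variance of the ratio is estimated by plugging sample averages and sample variances into the delta-method formula (an assumption), and the sample values are made up.

```python
import math
from statistics import mean, variance

def ratio_interval(y, x, z=1.96):
    """Approximate interval for mean(Y)/mean(X); a sketch under stated assumptions."""
    n_y, n_x = len(y), len(x)
    y_bar, x_bar = mean(y), mean(x)
    ratio = y_bar / x_bar
    # Plug-in estimate of Var(Y-bar / X-bar): sample values stand in for mu and sigma.
    var_hat = (variance(x) * y_bar**2 / (n_x * x_bar**4)
               + variance(y) / (n_y * x_bar**2))
    half_width = z * math.sqrt(var_hat)
    return ratio - half_width, ratio + half_width

def means_look_equal(y, x):
    """True when 1 lies inside the interval for the ratio of means."""
    low, high = ratio_interval(y, x)
    return low <= 1.0 <= high

# Made-up samples for illustration only.
sample_y = [8.1, 9.4, 10.2, 7.9, 9.0, 8.8]
sample_x = [8.6, 9.1, 9.8, 8.3, 9.2, 9.0]

low, high = ratio_interval(sample_y, sample_x)
print(f"interval = ({low:.4f}, {high:.4f})")
print("means statistically the same:", means_look_equal(sample_y, sample_x))
```

The interval is centered on the observed ratio of averages, so the check reduces to asking whether 1 is within 1.96 estimated standard deviations of that ratio.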
Results
• Using a program called Minitab, I ran 5000 simulations for several cases and the results are on the following slides.
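For readers without Minitab, a simulation of this kind could be sketched in Python as below. This is not the code used to produce the results on the following slides: the function names, the seed, the plug-in variance estimate, and the reading of N(100, 10) as mean 100 with standard deviation 10 are all assumptions.

```python
import math
import random
from statistics import mean, variance

def ratio_interval(y, x, z=1.96):
    """Approximate interval for mean(Y)/mean(X) using a plug-in variance estimate."""
    y_bar, x_bar = mean(y), mean(x)
    var_hat = (variance(x) * y_bar**2 / (len(x) * x_bar**4)
               + variance(y) / (len(y) * x_bar**2))
    half_width = z * math.sqrt(var_hat)
    ratio = y_bar / x_bar
    return ratio - half_width, ratio + half_width

def simulate(mu_y, sd_y, n_y, mu_x, sd_x, n_x, reps=5000, seed=1):
    """Return (share of intervals containing the true ratio, mean interval length)."""
    rng = random.Random(seed)
    true_ratio = mu_y / mu_x
    hits = 0
    total_length = 0.0
    for _ in range(reps):
        y = [rng.gauss(mu_y, sd_y) for _ in range(n_y)]
        x = [rng.gauss(mu_x, sd_x) for _ in range(n_x)]
        low, high = ratio_interval(y, x)
        if low <= true_ratio <= high:
            hits += 1
        total_length += high - low
    return hits / reps, total_length / reps

# One setting with equal sample sizes of 30.
coverage, mean_length = simulate(100, 10, 30, 100, 0.5, 30)
print(f"coverage = {coverage:.4f}, mean length = {mean_length:.6f}")
```

Because the random numbers differ from Minitab's, the output would not be expected to match the reported figures exactly, only to fall in the same neighborhood.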
Case 1: Equal Sample Sizes
• Y ~ N(100, 10), X ~ N(100, 0.5), sample size 30: confidence interval coverage of 94.04%, mean length 0.0711821
• Y ~ N(100, 10), X ~ N(100, 2), sample size 30: confidence interval coverage of 94.42%, mean length 0.0726839
• Y ~ N(100, 10), X ~ N(100, 5), sample size 30: confidence interval coverage of 94.54%, mean length 0.0795576
Case 2: Y has smaller sample size
• Y ~ N(100, 10) with sample size 15, X ~ N(100, 0.5) with sample size 20: confidence interval coverage of 93.34%, mean length 0.100127
• Y ~ N(100, 10) with sample size 15, X ~ N(100, 2) with sample size 20: confidence interval coverage of 92.88%, mean length 0.101451
• Y ~ N(100, 10) with sample size 15, X ~ N(100, 5) with sample size 20: confidence interval coverage of 93.68%, mean length 0.108929
Case 3: X has smaller sample size
• Y ~ N(100, 10) with sample size 20, X ~ N(100, 0.5) with sample size 15: confidence interval coverage of 93.8%, mean length 0.0869802
• Y ~ N(100, 10) with sample size 20, X ~ N(100, 2) with sample size 15: confidence interval coverage of 93.56%, mean length 0.0891369
• Y ~ N(100, 10) with sample size 20, X ~ N(100, 5) with sample size 15: confidence interval coverage of 93.84%, mean length 0.100611
Sources
• Hayya, Jack, Donald Armstrong, and Nicolas Gressis. “A Note on the Ratio of Two Normally Distributed Variables.” Management Science 21.11 (1975). 14 Jan 2011 <http://www.jstor.org/stable/2629897>
• Roussas, George. An Introduction to Probability and Statistical Inference. San Diego: Elsevier Science, 2003.