Moral Responsibility Free Will - Florian Cova

8/10/2019 Moral Responsibility Free Will - Florian Cova



RUNNING HEAD: FREE WILL META-ANALYSIS

Moral Responsibility and Free Will: A Meta-Analysis

Adam Feltz, Michigan Technological University

And

Florian Cova, Swiss Centre for Affective Sciences, University of Geneva

Word Count: 7,717 (excluding notes, abstract, and references)

Address Correspondence to:

Adam Feltz, 1400 Townsend Drive, Department of Cognitive and Learning Sciences, Michigan Technological University, Houghton, MI [email protected]




Abstract

Fundamental beliefs about free will and moral responsibility are often thought to shape our

ability to have healthy relationships with others and ourselves. Emotional reactions have also

been shown to have an important and pervasive impact on judgments and behaviors. Recent

research suggests that emotional reactions play a prominent role in judgments about free will,

influencing judgments about determinism's relation to free will and moral responsibility.

However, the extent to which affect influences these judgments is unclear. We conducted a meta-

analysis to estimate the impact of affect. Our meta-analysis indicates that beliefs in free will are

largely robust to emotional reactions.




Many philosophers and psychologists hold that at least a minimal belief in free will is required

for us to have healthy relationships with others and ourselves. Free will may be necessary for

autonomy, creativity, desert, reactive attitudes, dignity, love, and friendship (Kane, 1996).

However, recent advances in psychology and neuroscience may pose some threats to a belief in

free will. This research suggests that many people appear to be unaware of some of the

neurological antecedents of their behavior (Bargh & Ferguson, 2000; Wegner & Wheatley, 1999;

Wegner, 2002; Wegner & Bargh, 1998; Libet, 1985). One worry is that if these results were to

become widely assimilated, then a belief in free will would be diminished and the desirable

behaviors associated with a belief in free will would also disappear or be dramatically reduced.

For example, in the absence of belief in free will, we may have difficulty maintaining

meaningful relationships with others and interpersonal conflicts may become more common

(Kane, 1996). Empirical research supports these worries to some extent, suggesting that beliefs in free will are linked to judgments about punishment (Rakos, Laurene, Skala, & Slane, 2008;

Carey & Paulhus, 2013). Moreover, belief in free will has been argued to be an important factor

for many commonly desirable behaviors such as refraining from cheating, self-control, and job

performance (Vohs & Schooler, 2008; Baumeister et al., 2007; Baumeister, Masicampo, &

DeWall, 2009; Stillman et al., 2010) and has been shown to influence some of the neurological

antecedents of behavior mentioned above (Rigoni et al., 2011). For these reasons, some take it

that belief in free will is so important and engrained that if we were to find out that people really

are not free or morally responsible, we should leave people to their mistaken beliefs (Smilansky,

2002). To disabuse people of their mistaken belief would create a world where nobody has any

of these things.

However, the extent to which advances in neuroscience and psychology call free will into

question or impact everyday conceptions of free will is still an open question (Roskies, 2006;

Mele, 2006, 2013). Belief in free will may be so engrained that it will be incredibly hard to

dislodge even in the face of extraordinary threats (Feltz, 2013; Feltz & Millan, in press). For

example, it appears as if many of the troubling findings from neuroscience have already been

assimilated in portions of the population. But this assimilation has not led to a reduction in

beliefs in the dualistic nature of humans or free will (O'Connor & Joffe, 2013). Thus, one

possibility is simply that our ordinary understanding of free will is such that it can easily

accommodate the findings of neuroscience, rather than being at odds with them.




Indeed, why should people be worried by the findings of neuroscience? One answer is

that they seem to promote a deterministic view of human behavior. Determinism has been

traditionally considered as a threat to human freedom and moral responsibility. Determinism is

the thesis that whatever happens, including human behavior, is entirely caused by previous

events and the laws of nature (Mele, 2006). It means that whenever one acts, that action is

completely the product of the laws of nature and events that took place earlier in one's life, and

those events are in turn completely the product of earlier events, eventually reaching events that

happened long before the person who acted was born. However, it is not clear that determinism

prevents free will and moral responsibility, and philosophers have divergent opinions on that

matter. Compatibilists hold that free will and moral responsibility are compatible with

determinism. Incompatibilists hold that if determinism is true, then we cannot have free will or

be morally responsible for our actions. Thus, if people hold a compatibilist view of free will, it could be that recent findings in neuroscience do not at all threaten their view of themselves as

free and morally responsible agents.

In the past years, theorists have increasingly made use of empirical methods to study

laypeople's conceptions and intuitions about free will and moral responsibility, with findings that

may appear somewhat contradictory. Sometimes people seem to have compatibilist intuitions

and sometimes they appear to have incompatibilist intuitions. 1 For example, when participants

are asked the abstract question if somebody can be free and morally responsible in a

deterministic world, most people respond no. However, if participants are asked if a concretely

described person (e.g., John murdered his wife and children so he could be with his lover) can be

free and morally responsible in a deterministic world, most people respond yes. To resolve this

apparent contradiction, some theorists have proposed that people's fundamental judgments about

free will and moral responsibility tend to be influenced by negative emotional reactions (Nichols

& Knobe, 2007). In this paper, we survey the results of 30 published and unpublished studies and

submit them to a meta-analysis in order to estimate the extent to which purported negative

emotional reactions influence judgments about the freedom and moral responsibility of agents

living in a deterministic universe. We conclude that negative emotional reactions have some

1 "Intuition" is a term of art (see Feltz and Bishop, 2010). Here, we consider the intuition that p as an immediate judgment that p.




impact on judgments about free will and moral responsibility, but this effect is not large enough

to play the theoretical role theorists have attributed to it.

Free will and affective reactions

Existing research concerning intuitions about determinism's relation to free will and moral

responsibility can be divided into two broad categories. 2 A few works investigate intuitions

about particular claims or cases that help inform whether free will and moral responsibility are

compatible with determinism (e.g., the Principle of Alternative Possibilities or manipulation

cases) (Miller & Feltz, 2011; Sripada, 2012; Feltz, 2013; Cova, forthcoming). However, the

majority of existing studies try to determine directly whether laypeople's conceptions of free will

and moral responsibility are prima facie compatible with determinism. Some of these latter

studies involve investigating folk concepts of free will and moral responsibility (Monroe & Malle, 2010; Stillman et al., 2011). Others address folk intuitions about the compatibility

question, that is, the question of whether free will and moral responsibility are compatible with

determinism (Sommers, 2010; Kane, 1996). In this paper, we focus on the studies that address

the compatibility question.

Conflicting answers to the compatibility question: the abstract/concrete asymmetry

Do people have the intuition that an agent living in a deterministic universe can be free and

morally responsible (and therefore are natural compatibilists), or do they consider that

determinism precludes this agent's free will and moral responsibility (and therefore are natural

incompatibilists)? Initial investigations concerning the compatibility question seemed to favor

the conclusion that people are mostly natural compatibilists (Feltz et al., 2009). For example,

Nahmias and his colleagues (2005, 2006) gave participants vignettes describing agents living in

deterministic universes and performing particular actions (such as robbing a bank). They then

asked participants whether these agents had free will and were morally responsible for their

actions. In three different studies, a majority of participants answered that these agents had free

will and were morally responsible for their actions. Given these results, it would seem tempting

2 Work using the empirical methods of the behavioral sciences to explore philosophically relevant beliefs sometimes is called experimental philosophy. For an overview of experimental philosophy, see Feltz (2009) and Cova (2011a).




to conclude that participants tended to think that free will and moral responsibility are

compatible with determinism.

However, things are not that simple. Nichols and Knobe (2007) designed an experiment

in which participants were introduced to the following description of Universe A:

Imagine a universe (Universe A) in which everything that happens is completely caused by

whatever happened before it. This is true from the very beginning of the universe, so what

happened in the beginning of the universe caused what happened next, and so on right up until

the present. For example one day John decided to have French Fries at lunch. Like everything

else, this decision was completely caused by what happened before it. So, if everything in this

universe was exactly the same up until John made his decision, then it had to happen that John

would decide to have French Fries.

Participants were divided into two conditions. After reading the description, participants in the

concrete condition received the following additional paragraph and question:

In Universe A, a man named Bill has become attracted to his secretary, and he decides that

the only way to be with her is to kill his wife and 3 children. He knows that it is impossible to

escape from his house in the event of a fire. Before he leaves on a business trip, he sets up a

device in his basement that burns down the house and kills his family. Is Bill fully morally responsible for killing his wife and children?

In this case, most participants (72%) answered that Bill was fully morally responsible for

killing his wife and children. These results are perfectly consistent with the hypothesis that most

laypeople are natural compatibilists. However, participants in the abstract condition did not

receive any additional paragraph but only the following question:

In Universe A, is it possible for a person to be morally responsible for their actions?

In this condition, most participants (86%) answered that it was not possible for this person to be

fully morally responsible. This pattern of responses conflicts with participants' answers in the

concrete condition. Let's call this the abstract/concrete asymmetry. The abstract/concrete




asymmetry suggests that participants' answers in the abstract and concrete conditions are not

based on the same psychological processes (Sinnott-Armstrong, 2008; Weigel, 2011). However,

there is wide disagreement about what those processes are, and which of the two answers (if

any) should be considered as revealing participants' true conception of free will.

Competing accounts of the abstract/concrete asymmetry

One influential explanation of the abstract/concrete asymmetry is Nichols and Knobe's (2007)

affective performance error model. According to Nichols and Knobe (2007), the key difference

between the abstract and the concrete condition is the amount of emotional reaction generated by

the vignettes. The concrete condition, which depicts a horrendous murder, can be thought of as more

upsetting than the abstract condition. For Nichols and Knobe, this is what explains the different

intuitions between the abstract and the concrete conditions. While people tend to think that free will and moral responsibility are incompatible with determinism (hence the answers in the

abstract condition), strong emotional responses can bias participants to attribute moral

responsibility in the concrete condition. Thus, results indicating that participants are natural

compatibilists would be performance errors from participants. Since compatibilist intuitions

result from an error, they could not be used to infer what the folk truly think about the

relationship between moral responsibility and determinism.

Though influential, Nichols and Knobe's affective performance error model is not the

only available account of the abstract/concrete asymmetry. Another account simply relies on the

possibility that the abstract and the concrete conditions lead to different understandings and

interpretations of human agency in Universe A. Nahmias and Murray (2010) hold that most

people are natural compatibilists, and that they have the intuition that agents living in

deterministic universes can be free and morally responsible for their actions. However, they

stress that determinism should be distinguished from bypassing. Bypassing occurs when

agents' mental states do not play a role in the production of those agents' actions, so that agents

will end up acting the way they do whether they want it or not. Clearly, one can think free will

and moral responsibility to be compatible with determinism while thinking that they are

incompatible with bypassing. Determinism does not entail that an agent's mental states are

bypassed or irrelevant to the production of the action. Based on this distinction, Nahmias and

Murray (2010) argue that Nichols and Knobe's depiction of Universe A (in which things had to




happen the way they did) could lead participants to understand that agents' mental states are

bypassed. This would explain why participants judge agents not to be morally responsible for

their actions in the abstract condition in spite of their natural tendency to consider free will and

moral responsibility to be compatible with determinism. However, in the concrete condition, it is

explicitly stated that the agent acts the way he does on the basis of his desires (e.g. he wants to

be with his secretary) and beliefs (e.g. he knows that it is impossible to escape from his house in

the event of a fire), which would lead participants to revise their interpretation and to understand

that agents in Universe A, though determined to act the way they do, are not bypassed. This

would explain why the agent is judged more morally responsible in the concrete condition.

Empirical evidence supports this hypothesis, suggesting that participants are indeed more likely

to consider that agents' mental states are bypassed in the abstract condition than in the concrete

condition (Murray & Nahmias, forthcoming; but see Rose & Nichols, in press).

Finally, a third account focuses on another particular feature of the concrete condition:

the fact that a norm is broken. According to the NBAR hypothesis (where NBAR stands for

Norm Broken, Agent Responsible; see Mandelbaum & Ripley, 2012), people are natural

incompatibilists. This natural tendency accounts for participants' answers in the abstract

condition. However, people also have the unconscious belief that whenever a norm is broken, an

agent is responsible for breaking the norm. In the concrete condition where a norm is broken,

this unconscious belief counters our natural tendency to judge free will and moral responsibility

to be incompatible with determinism, leading to an increase in judgments that the agent is

morally responsible.

Testing the affective performance error model: the high/low affect asymmetry

Thus, there are many possible accounts of the abstract/concrete asymmetry. What reasons do

Nichols and Knobe give us to prefer the affective performance error model? According to them,

their theory makes the following prediction: the same difference found between the abstract and

the concrete conditions can be found between two concrete conditions as long as affect is varied

in a similar way. To test this hypothesis, they ran a second study in which participants

received the description of Universes A and B. Then participants received only one of the

following two pairs of sentences:




Low Affect condition:

As he has done many times in the past, Mark arranges to cheat on his taxes. Is it

possible that Mark is fully morally responsible for cheating on his taxes?

High Affect condition:

As he has done many times in the past, Bill stalks and rapes a stranger. Is it possible

that Bill is fully morally responsible for raping the stranger?

For each condition, half of the participants were told that Mark (or Bill) lived in deterministic

Universe A, while the other half were told that he lived in indeterministic Universe B. Results

(presented in Table 1) suggest that Nichols and Knobe were right. More participants considered

the agent responsible in the high affect condition than in the low affect condition when the action was set in deterministic Universe A.

--- Insert table 1 here ---

Let's call this peculiar pattern of responses the high/low-affect asymmetry. The high/low-

affect asymmetry seems to provide support for Nichols and Knobe's account of the

abstract/concrete asymmetry. Indeed, it seems that the difference between the high and low

affect cases cannot be accounted for by the fact that the first makes it clearer that the agent acts on

the basis of his own desires and beliefs: in both cases, there are no direct references to the

agent's mental states. Nor does it seem that the difference can be explained by the fact that a

norm is broken in the first case and not in the second: in both cases, it is clear that a norm has

been broken. Rather, it seems that the only difference between the high and low affect cases is

that the first features a much more gruesome and indignation-arousing violation than the second.

The existence of this high/low-affect asymmetry thus lends important support to Nichols and

Knobe's affect-based account of the abstract/concrete asymmetry. The affect-based account has

led many to agree that emotional reactions could bias people's judgment about free will and

moral responsibility and to speculate about what that means for our ordinary conception of free

will (e.g. Nelkin, 2007; Vargas, 2009).

Replications in experimental philosophy of free will: the trouble with the high/low affect asymmetry




In a recent paper criticizing the methodological shortcomings of many empirical investigations

of folk intuitions about philosophically relevant topics, Woolfolk stressed that

replicability of research findings is a key to the establishment of scientifically sound inquiry

(2013, p. 84). There have been recent worries that most surprising results in psychology, and

particularly in social psychology, might not pass the test of replication, and these have led to a

demand for more replications (e.g., Young, 2012). 3 For these reasons, scientific responsibility

counsels that before speculating on what could cause the effects we described, we should make

sure that these effects are robust.

To what extent have the results we described been replicated? In the previous section, we

have described and distinguished two different phenomena: the abstract/concrete asymmetry,

and the high/low-affect asymmetry. The abstract/concrete asymmetry (i.e., participants ascribe

less moral responsibility to agents living in a deterministic universe when the question is asked abstractly rather than concretely) has been widely reproduced. It has been reproduced using

different descriptions of the deterministic universe (Cova et al., 2012), different concrete

vignettes (Nahmias et al., 2007; Murray & Nahmias, forthcoming), cross-culturally (Sarkissian

et al., 2010), and even when the agent is forced to act by a particular neurological condition (De

Brigard et al., 2009).

The abstract/concrete asymmetry seems a robust result that deserves an explanation.

However, the same cannot be said of the high/low-affect asymmetry. So far, the dramatic

difference Nichols and Knobe found between the low-affect and the high-affect cases has not

been properly replicated in a published study. The only published paper to directly attempt a

replication failed twice and found both times that participants gave mostly incompatibilist

answers in both cases (Feltz et al., 2009). A similar effect has been found by Cova and his

colleagues (2012), but instead of comparing Nichols and Knobe's low-affect and high-affect

cases, they compared the low-affect case to Nichols and Knobe's concrete case, which differs in

many respects from the low- and high-affect cases (for example, the concrete condition puts more

emphasis on the agent's desires and the role they play in the production of his action, which may

3 For example, Scaife and Webber (2013) found puzzling results about folk intuitions about intentional action, but further studies repeatedly failed to replicate these results (Cova, in press). See also Sayedsayamdost (in prep) and Christian Mott's replication page: http://pantheon.yale.edu/~jk762/xphipage/Experimental%20Philosophy-Replications.html, and the Psych File Drawer project: http://psychfiledrawer.org.




explain the difference between the two cases). Thus, it is not clear that the high/low-affect

asymmetry is robust or real (i.e., early findings may reflect Type I error).

This lack of replication is all the more worrying because there are reasons to doubt that

the impact of affective reactions can explain why most participants consider agents morally

responsible for their actions in the concrete cases. First, Nahmias and his colleagues used

concrete vignettes involving neutral actions such as going jogging (Nahmias et al., 2006) or

positive actions such as giving money to charity (Nahmias et al., 2006, 2007), and still found

that most participants judged the agent morally responsible for his actions. Second, in a recent

study, Cova et al. (2012) gave various concrete cases to patients suffering from a behavioral

variant of frontotemporal dementia, a neurodegenerative disease accompanied by a deficit in

emotional responses. Contrary to what the affective performance error model would have

predicted given their lack of emotional reactions, these patients were no more incompatibilist than control participants and gave mostly compatibilist answers.

However, there are also reasons to think that affective reactions do have an influence on

judgments about free will and moral responsibility. First, one source of evidence comes from a

series of studies suggesting that extraverts are more likely than introverts to judge that an agent

living in a deterministic universe is morally responsible for his actions (Feltz & Cokely, 2009;

Schulz et al., 2011; Feltz, Perez, & Harris, 2012; Feltz & Millan, in press). A possible

explanation for this phenomenon is that extraverts are less likely to regulate their own emotions,

and thus are more susceptible to being influenced by the affective content of vignettes. Results from

neuroscience support this to some extent, suggesting that some individuals are more likely to

regulate emotional reactions than others (Ochsner & Gross, 2005; see also Smillie, 2013).

Second, Feltz and his colleagues (2012) found that the affective content of a vignette could

influence the type of explanation participants give for the agent's behavior. Participants faced

with a high-affect vignette were more likely to explain the agent's behavior in terms of the

agent's decision than participants reading a low-affect vignette. Given that the same study found

that participants explaining the agent's behavior in terms of decisions were also more likely to

perceive him as free and morally responsible, this suggests that the affective content of a

vignette can indeed influence participants' judgments by favoring one kind of explanation for

agents' behaviors over another. Third, other evidence comes from psychological studies




showing that inducing anger in participants can increase their propensity to punish and ascribe

moral responsibility to agents (Keltner et al., 1993; Tetlock et al., 1998).

Consequently, it is not clear whether negative affective reactions have an impact on

compatibilist or incompatibilist judgments. Even if affect does influence judgments, it is still

unclear to what extent the abstract/concrete asymmetry can be explained by this impact. To

help address these issues, and determine whether the high/low-affect asymmetry is a genuine

and replicable effect, a meta-analysis was conducted.

Meta-Analysis

Search Criteria

We used the following criteria for including studies in the meta-analysis: (1) the study included a

description of determinism. This narrowed the group of many possible studies because determinism, as philosophers understand it, is a precise, technical concept. For example, this

criterion excluded a number of studies that used scenarios suggesting, but not explicitly

describing, determinism. It also excluded a number of studies where researchers inferred

compatibilist and incompatibilist judgments absent a description of determinism. (2) The study

manipulated the emotional content of the scenarios. Some studies only had high or low affect

scenarios. Because we were interested in the effect of affect, any study that did not manipulate

the emotional content of the scenario was excluded.

Search for Studies

The effect of affect was first identified by Nichols and Knobe (2007). Because they identified the

effect of interest, we first used a Google Scholar Cited Reference search to find all papers that

referenced Nichols and Knobe (2007) for possible inclusion in the meta-analysis. This method

returned 220 results. Computer-based database searches were also conducted. Databases included

in the search were PsychInfo and Philosopher's Index. Keywords "determinism," "free will," and

"emotion" were used in all database searches. This method returned a total of 3,254 results.

ProQuest Psychology Journals, PsychArticles, Sage Full Text, Science Direct, Web of Science, and

Wiley Online databases were also searched, returning no new papers that were not identified in

the PsychInfo and Philosopher's Index search. We also conducted a search for unpublished

studies by posting calls for unpublished studies on discipline-specific blogs. Additionally, we




contacted individuals who had conducted previous studies to see if they also had any

unpublished studies. Finally, we emailed relevant research groups. References of each paper that

met the inclusion criteria were searched for possible inclusion in the meta-analysis. The search

started in late 2012 and concluded in late 2013.

Results were then examined to determine if they met the two inclusion criteria. The

abstracts of the papers were read first. Then, if the abstract indicated that the two inclusion

criteria were likely to be met, the entire paper was read. Both authors agreed on which studies to

include, with no disagreements. No results that did

not reference the original Nichols and Knobe paper met the inclusion criteria. Eleven published

studies met both inclusion criteria. We also included a number of unpublished studies (K = 19).


Variables in Studies

Table 2 lists and includes a brief description of the 30 studies. There were a number of

differences between the 30 studies. First, two different experimental designs were used: either a

within-subjects design where participants received both high affect and low affect scenarios or a

between-subjects design where participants received only one scenario. Second, studies gathered

either categorical (yes/no) or continuous (Likert scale) data. Third, the number of questions

asked varied. Studies that gathered categorical data asked only one question (e.g., "Is it possible

for Bill to be fully morally responsible for cheating on his taxes?"). Studies that gathered

continuous data typically asked more than one question. However, since participants' answers to

multiple questions often had strong internal consistency, often a composite score was reported

(the mean of the responses). We used only composite scores from the studies that reported

continuous data in the meta-analyses. Fourth, there were three distinct types of scenarios used,

but all but one study used the original Nichols and Knobe (2007) scenarios, the Nahmias, Coates, and

Kvaran (2007) scenarios, or a close variation of either. The remaining study used a variation of a

Nahmias, Morris, Nadelhoffer, & Turner (2006) scenario. Finally, the action in the high affect

case varied. In one high affect scenario, a man is described as stalking and raping a stranger. In

the other variation, a man falls in love with his secretary, decides that the only way to be

with her is to kill his wife and children, and does so.




Results

To give a sense of the overall data, all effect sizes were converted to a form comparable to the

standardized mean difference and are included in the funnel plot in Figure 1. This overall

analysis needs to be interpreted with caution because it is not necessarily permissible to combine

different types of data from different kinds of experimental designs in the same meta-analysis

(see below). A visual inspection of the funnel plot indicated no evidence of publication bias.

This was likely because a relatively large number of unpublished studies were included. For all

meta-analyses, we used the methods described in Lipsey & Wilson (2001). For the overall meta-

analysis, within-subjects (proportion gain) and between-subjects (odds ratios) effect sizes for

dichotomous data were converted to a form (logit d) similar to the standardized mean difference

for comparison. Because larger sample sizes tend to be more representative of the population, we

weighted the effect size of each study by the inverse variance weight. 4 This method allows studies with a larger sample size to carry more statistical weight than studies with smaller sample

sizes in the meta-analysis. For the overall meta-analysis, a random-effects model was used

because homogeneity was rejected (Q(29) = 49.78, p = .009) and revealed a small, statistically

significant standardized mean difference of .15 (95% CI: 0.08, 0.22), Z = 4.12, p < .001. The test

for homogeneity also suggested that there were important differences between types of studies.

Because the studies differed in empirically and conceptually important ways, subsequent

analyses were performed.
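
The steps described above (conversion to logit d, inverse variance weighting, the homogeneity test, and a random-effects summary) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the computation actually used for the present analyses: the conversion ln(OR) x sqrt(3)/pi and the method-of-moments estimate of between-study variance are the standard textbook forms, the function names are hypothetical, and the effect sizes and standard errors in the example are invented.

```python
import math


def logit_d(odds_ratio):
    """Convert an odds ratio to the logit d metric: ln(OR) * sqrt(3) / pi."""
    return math.log(odds_ratio) * math.sqrt(3) / math.pi


def inverse_variance_weight(se):
    """Weight a study by 1 / SE^2, so more precise studies count more."""
    return 1.0 / se ** 2


def pool(effects, ses):
    """Return fixed-effect mean, Q statistic, and a random-effects summary."""
    w = [inverse_variance_weight(se) for se in ses]
    sum_w = sum(w)
    fixed = sum(wi * es for wi, es in zip(w, effects)) / sum_w
    # Q: weighted squared deviations from the fixed-effect mean (df = k - 1).
    q = sum(wi * (es - fixed) ** 2 for wi, es in zip(w, effects))
    df = len(effects) - 1
    # Method-of-moments estimate of between-study variance, floored at zero.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)
    # Random-effects weights add tau2 to each study's sampling variance.
    w_re = [1.0 / (se ** 2 + tau2) for se in ses]
    mean_re = sum(wi * es for wi, es in zip(w_re, effects)) / sum(w_re)
    se_re = math.sqrt(1.0 / sum(w_re))
    return {
        "fixed": fixed,
        "q": q,
        "df": df,
        "tau2": tau2,
        "random": mean_re,
        "se": se_re,
        "ci95": (mean_re - 1.96 * se_re, mean_re + 1.96 * se_re),
        "z": mean_re / se_re,
    }


# Hypothetical effect sizes and standard errors, for illustration only.
effects = [0.10, 0.30, -0.05, 0.22]
ses = [0.10, 0.15, 0.12, 0.08]
result = pool(effects, ses)
print(result)
```

When Q exceeds its degrees of freedom, the estimated between-study variance is positive and the random-effects weights become more equal across studies, which widens the confidence interval relative to the fixed-effect estimate.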

--- Insert figure 1 here ---

It is controversial whether combining data from different experimental designs (e.g.,

between-subjects and within-subjects) is legitimate. 5 However, we performed a meta-analysis

4 The inverse variance weight is the inverse of the squared standard error value (Lipsey & Wilson, 2001, p. 36). For example, the inverse variance weight for the standardized mean gain is 1/(standard error of the standardized mean difference)^2.

5 There is no clear answer as to when combining data from different kinds of designs is permissible. For example, Lipsey and Wilson unequivocally state that the standardized mean gain effect size statistic is different from the standardized mean difference effect size statistic: "Comparison of the previous effect size statistics with those for the standardized mean difference should make it evident that they cannot be expected to yield comparable values. It follows that these two effect size statistics should not be mixed in the same meta-analysis" (2001, p. 45). Others are less concerned with the differences, at least in practice (e.g., Eagly, Makhijani, & Klonsky, 1992). Others opt for a more moderate position, stating that in some instances data from different designs can be compared, but only if some conditions are met. These conditions are that (1) the effect sizes can be put meaningfully in the same metric, (2) the designs do not generate relevantly different biases, and (3) the designs estimate the effects with an acceptably similar

precision (Morris & DeShon, 2002). The current studies pretty clearly fail condition 2, since the within-subjects design is likely to generate a bias toward consistency, especially in the short time-frame in which participants were asked to respond. It is questionable whether the current studies satisfy conditions 1 and 3. Morris and DeShon




combining each type of data from both types of experimental designs (i.e., categorical data (studies 1-11) and continuous data (studies 12-30)). Each of these meta-analyses used the data from the overall meta-analysis above. The meta-analysis combining all categorical data (studies 1-11) indicated that homogeneity should be rejected, Q(10) = 19.53, p = .03. A random effect model indicated that the overall effect size was small, .1 (95% CI: -.02, .22), and not significant, Z = 1.64, p = .1. However, caution must be taken in interpreting the result from this meta-analysis

of categorical data. The rejection of homogeneity suggested that the variability in the effect sizes was greater than would be expected from subject-level sampling error alone. This variability could in principle be accounted for. One obvious difference among the studies was their experimental design. An analogue of the analysis of variance (ANOVA) was performed using the experimental design as the moderator variable to test whether differences in effect sizes were a function of experimental design (Lipsey & Wilson, 2001). They were (Q_B(1) = 9.1, p = .003). Therefore, there are good conceptual and empirical reasons not to include all categorical data in the same

meta-analysis.
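The moderator test can be illustrated as follows. This is a sketch under stated assumptions rather than the authors' code: it partitions the total homogeneity statistic into within-group and between-group parts using fixed effect weights, and the study values and group labels are hypothetical. With two designs, Q_B has one degree of freedom, so its p-value can be obtained from the standard normal tail.

```python
import math


def weighted_q(effects, variances):
    """Homogeneity statistic Q around the inverse-variance weighted mean."""
    weights = [1.0 / v for v in variances]
    mean = sum(w * es for w, es in zip(weights, effects)) / sum(weights)
    return sum(w * (es - mean) ** 2 for w, es in zip(weights, effects))


def q_partition(groups):
    """Split Q_total into Q_within (pooled over groups) and Q_between.

    `groups` maps a design label to a pair (effects, variances).
    """
    all_effects = [es for effs, _ in groups.values() for es in effs]
    all_variances = [v for _, vs in groups.values() for v in vs]
    q_total = weighted_q(all_effects, all_variances)
    q_within = sum(weighted_q(effs, vs) for effs, vs in groups.values())
    return q_total, q_within, q_total - q_within


def chi2_sf_1df(q):
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(q / 2.0))


# Hypothetical effect sizes and variances for two designs (not data from the paper).
groups = {
    "within-subjects": ([0.0, 0.1], [0.02, 0.02]),
    "between-subjects": ([0.3, 0.4], [0.02, 0.02]),
}
q_total, q_within, q_between = q_partition(groups)
p_between = chi2_sf_1df(q_between)
print(f"Q_B(1) = {q_between:.2f}, p = {p_between:.3f}")
```

A significant Q_B indicates that the mean effect sizes differ between designs by more than sampling error would predict, which is the empirical ground given above for analyzing the designs separately.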

For these reasons, two separate meta-analyses were performed for the experiments that

gathered categorical data. Within-subjects, categorical data studies (studies 1-4) were analyzed as one group, and between-subjects, categorical data studies (studies 5-11) were meta-analyzed as a separate group. Inverse variance weights were used in the meta-analyses. For within-subjects, categorical data studies, a fixed effect model of the proportion gain effect size was used for the meta-analysis because homogeneity could not be rejected (Q(3) = 0.04, p = .99). The mean effect size (proportion gain) was .002 (95% CI: -0.44, 0.44) and was not statistically significant, Z = .009, p = .99. For between-subjects, categorical data studies, the odds-ratio effect sizes were converted to their natural logarithms. A fixed effect model was used because homogeneity could not be rejected (Q(6) = 5.6, p = 0.47). The mean effect size (odds ratio) was small, 1.70 (95% CI: .97, 2.61), but was statistically significant, Z = 2.4, p = .02.
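The handling of odds ratios can be sketched as below. The sketch assumes the standard approach of pooling on the log scale and back-transforming the mean and confidence limits, and it assumes that "logit d" refers to the logistic-distribution conversion, ln(OR) multiplied by sqrt(3)/pi. The 2x2 cell counts are hypothetical, and the function names are illustrative only.

```python
import math


def log_odds_ratio(a, b, c, d):
    """Log odds ratio and its sampling variance for a 2x2 table.

    a, b: counts of yes/no responses in condition 1
    c, d: counts of yes/no responses in condition 2
    """
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def pooled_odds_ratio(tables):
    """Fixed effect mean odds ratio with a 95% CI, computed on the log scale."""
    logs = [log_odds_ratio(*t) for t in tables]
    weights = [1.0 / var for _, var in logs]
    mean = sum(w * lor for w, (lor, _) in zip(weights, logs)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    # Back-transform the mean and the confidence limits to the odds-ratio metric.
    return math.exp(mean), (math.exp(mean - 1.96 * se), math.exp(mean + 1.96 * se))


def logit_d(odds_ratio):
    """Convert an odds ratio to a d-like metric (assumed logit method)."""
    return math.log(odds_ratio) * math.sqrt(3) / math.pi


# Hypothetical 2x2 tables (not data from the paper).
tables = [(30, 20, 20, 30), (25, 25, 20, 30)]
mean_or, (ci_low, ci_high) = pooled_odds_ratio(tables)
print(f"OR = {mean_or:.2f} (95% CI: {ci_low:.2f}, {ci_high:.2f}), logit d = {logit_d(mean_or):.2f}")
```

Because the interval is symmetric only on the log scale, the back-transformed limits are not symmetric around the mean odds ratio, and an odds ratio of 1 (no effect) corresponds to a logit d of 0.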

--- Insert figure 2 here ---


--- Insert table 4 here ---

(2002) also offer an empirical method to determine the acceptability of combining data across different designs. If the analogue of the ANOVA that uses the type of design as a moderator is significant, then those data should not be combined into the same meta-analysis.




An overall meta-analysis was also conducted combining all continuous data from each

experimental design (studies 12-30). This meta-analysis indicated that homogeneity should be rejected, Q(18) = 32.59, p = .02. A random effect model indicated that the overall effect size was small, .15 (95% CI: .07, .22), and significant, Z = 3.62, p < .01. The rejection of homogeneity again suggested variability that could in principle be accounted for by the differences in experimental designs. The analogue of the ANOVA indicated that the experimental design was a factor in the differences, Q_B(1) = 6.81, p = .001. Again, there were good conceptual and empirical

reasons not to include all continuous data in the same meta-analysis.

Within-subjects, continuous data studies (studies 12-16) and between-subjects,

continuous data studies (studies 17-30) were analyzed as two separate groups. Means and

standard deviations are reported in Tables 5 and 6 (see Figure 3 for a Forest Plot). Standardized

mean differences (between-subjects, continuous data studies), standardized gain scores (within-subjects, continuous data studies), and inverse variance weights were calculated. For within-subjects, continuous data studies, a random effect model was used because homogeneity was rejected, Q(4) = 10.32, p = .04. Given the low number of studies analyzed and the absence

of a priori predictions about the source of the heterogeneity, we assumed that the variability

beyond subject-level sampling error was random. Hence, we did not attempt to identify the source of the heterogeneity. However, it is possible that there is some identifiable source that can account for the heterogeneity. The random effect model suggested that the mean effect size (standardized mean gain) was small, .08 (95% CI: -.02, .17), and not statistically significant, Z = 1.52, p = .13. For between-subjects, continuous data studies, a fixed effect model was used because homogeneity could not be rejected (Q(13) = 15.46, p = 0.28). The mean effect size (standardized mean difference) was small, 0.22 (95% CI: .12, .32), but was statistically significant, Z = 4.47, p



Figure 2: Forest Plot, Effect Sizes Converted to be Comparable to Cohen's d
