US National Interest and Threat Perception toward China Since 1949

The United States' Threat Perception of China and the USSR, 1952-1975:

An Application of Automated Content Analysis Methods

A Working Paper

Namsu Kim

Department of Political Science and International Relations

Seoul National University

Abstract

This preliminary research focuses on quantifying ideological variables such as the United States’ threat

perception of China and the Soviet Union from 1952 to 1975 through an automated (computer-aided)

content analysis method. This is to build a foundation to quantitatively analyze the impact of the US’ threat

perception on US foreign and security policy, and to develop a logic for case selection. This research used

Foreign Relations of the United States (FRUS) as a population for the content analysis, and the process

consists of four steps: 1) gathering FRUS articles from the web; 2) pre-processing the articles to be

suitable for automated content analysis; 3) selecting articles through systematic sampling and categorical

coding; and 4) the final content analysis. Python is used for the first two processes, and the last step is

conducted with the ReadMe package in R. The results correspond fairly well to the actual historical events,

and thus this exercise could be seen as resulting in an effective estimation of the perception factors in US

foreign policy.

This draft has been prepared for the 2013 Midwest Political Science Association Conference.

Please do not cite without permission.

I. Introduction

1. Background

Recently, cooperation and conflict between the US and China (PRC, People's Republic of China) have

become one of the most important issues in East Asian security and foreign policy. It is now one of the

main factors that shape the East Asian security environment. Thus, an accurate assessment of the US-

China relationship is of primary importance if South Korea, which has comparatively less military,

economic, and diplomatic power than surrounding states such as China, Japan, Russia, and the

US, is to maximize its national interest. Among many factors that decide foreign security policy, this paper

focuses on 'threat perception' to understand US foreign policy toward China. Material factors, like

military and economic power, are still decisive. However, this research presupposes that threat

perceptions are an important foundation in the process of foreign policy decision-making.

US foreign policy in East Asia, which in particular began with the occupation of Japan and South

Korea after World War II, developed with the Cold War and the Korean War, in which the US interacted

with the USSR and China as threatening enemies. In this respect, US threat perceptions of the Soviet

Union and especially China have had an important role in US foreign policy in East Asia since 1945. It is

true that specific US foreign policies have changed over the past 70 years. However, threat perceptions

are still important variables in that America thinks China and Russia are important counterparts to be

considered in its foreign policy decision-making. In this regard, this paper raises the question of the

relationship between US threat perception and its East Asian security policy. More precisely, what were

the US threat perceptions of the Soviet Union and China over time? Also, how different were the impacts

of these perceptions on US policy decision-making in size and content?

To answer these comprehensive questions, the 70 years that have elapsed since 1945 should be

analyzed, or at least the period since 1949, when the PRC was established. It is true that the

'China threat' discourse has been prevalent since the 1990's; however, this research regards the Chinese threat in the

perception of the US as having existed ever since the Cold War began, and the American perception of

China more recently has inevitably stemmed from this history. Unfortunately, however, the time span that

this research could assess has been, at least for now, limited to the years from 1952 to 1975. This will be explained in

section III-2.

2. Hypothesis

With regard to the main questions of this project, some hypotheses could be proposed as follows.

H1. The US threat perceptions of China and the USSR are different, quantitatively and qualitatively.

The primary goal of this paper is to assess the variation in the US threat perception of China and the

USSR. The size and the content of US threat perceptions of the two states will be measured and explained

respectively.

H2. The bigger the threat perceived from one state, the higher the priority of the foreign policy decisions

taken regarding it.

Even in East Asia, the US threat perception of the USSR was much bigger than that of China in the

early stages of the Cold War. So, the US' East Asian policy was strongly influenced by the Soviet threat.

Conversely, the Chinese threat became prominent during and after the Korean War, and it can be assumed

that at that point, the US’ East Asian policy was more influenced by its threat perception of China.

H3. The variations of US threat perception of China and the USSR are related to US foreign policy in East

Asia.

The size of the threat perceptions of China and the Soviet Union is supposed to be related,

independently or collectively, to their impact on US policy decisions in East Asia. Here

the threat perceptions become independent variables, while US East Asian security policy

becomes a dependent variable indicated by military deployment, budget outlay, commitments, and

security institutions such as alliances.

To best review all these hypotheses, in-depth comparative case studies as well as a quantitative

analysis of the threat perception factor should be utilized. This working paper, however, will focus on

preliminary research on the methods that could be used to measure the US’ threat perception, since

adequately answering all these questions would be too extensive for this type of limited research paper.

So, only H1 will be examined in depth, while H2 and H3 will be dealt with partially.

3. Threat Perception

Threat perception is an important notion in the fields of both international relations and foreign policy.

Since Stephen Walt proposed his Balance of Threat theory, threat has become a key word in the study of

security and alliances. Walt explains that there are various sources of threat and a state's perception of the

'aggressive intentions' of other states can trigger balancing actions.1 In foreign policy studies, the

1 Walt, 1987. pp. 25-26.

perceptions of decision-makers play an important role. As Carlsnaes claims, the dispositional dimension of

people in the policy-making process works as a perceptual filter of the structural dimension, that is, the

international environment.2 It is fair to say that the perception factor, especially threat perception,

has an important role in international security policy-making.

However, there are few studies that discuss how to measure threat perceptions. It is hard to measure

an ideational or intentional factor such as a perception. Raymond Cohen has offered in his book, Threat

Perception in International Crisis, a set of indicators to measure the intensity of threat perceptions. They

are: 1) decision makers' comments and evaluations; 2) other colleagues' appreciation for the original

comments and evaluations; 3) decision making processes to seek countermeasures against the threat; and

4) actual policy actions taken.3 One recent and prominent work on threat perceptions is Identifying

Threats and Threatening Identities by David Rousseau. He discusses the impact of identity factors on

threat perception and shows that two states have a lower threat perception of each other when they share

similar political and economic systems. Using US-Chinese relations in the 1980's as a case, his

research also offers important insights for investigating Chinese cases.4

These are undoubtedly valuable studies on threat perceptions. They give us theoretical guidance

regarding what kind of approaches, indicators, and words should be primarily considered in the study of

threat perceptions. However, they are not sufficient for quantitative measurements of perceptions over the

course of many years. So, this research tries to establish content analysis methods to quantitatively

measure US perceptions through foreign policy articles, year by year. In aggregation, this will build

continuous linear graphs depicting the variations in US perceptions.

II. Method and Data

There could be different ways to get a picture of the threat perception seen in US foreign policy.

Explanatory approaches could be taken by reading and understanding historical materials such as

government documents, memoirs, oral history collections, etc. With this we are able to demonstrate

through narratives what the threat perception was like and whose idea it mainly was. Another is the

content analysis method. Historical texts here become datasets to be analyzed with such methods as

2 Carlsnaes, 2002.

3 Cohen, 1978.

4 Rousseau, 2006.

word-counting, categorization, etc. Basically, categorization makes it possible to measure the proportion

of a specific kind of document in a dataset.

This research project uses the content analysis method to understand the variation of the degree of

threat perception in US foreign policy. Because the number of entries in the dataset will be very large,

since the time period is rather long and the quantity of government articles is quite big, a computer-aided

content analysis method is suitable for this project. Among several automated content analysis methods,

the ReadMe package in R will be tried. The ReadMe package, an automated content analysis tool based

on R and Python, offers a statistical calculation of relative proportions of each categorized theme within

an extensive set of text articles. Simply speaking, researchers first make a dataset (test set) of articles to

be analyzed, and each article should be saved as a text file in a folder. Then, researchers logically select

articles to make a control set and do the coding of the content of the articles in the control set. Here,

coding means to classify each article into a specific category that is conceptualized in advance according

to research hypotheses. Finally, we can register the coded control set and the test set in the ReadMe

program, then ReadMe will statistically calculate the proportion of articles falling in each category.5

ReadMe was originally developed by Daniel Hopkins and Gary King. On King's well-known website,

you can find the ReadMe software and relevant theoretical papers and technical documentation.6 In this

age of big data, computer-aided content analysis methods like ReadMe are receiving more attention in

political science. These methods are especially being used to analyze the political positions of political

parties and congresses in Europe and the US. In the meantime, applications of automated content analysis

to the foreign policy or international relations arenas are rarely found. This pilot research thus could make

a contribution by applying diachronic automated content analysis to studies of foreign policy and/or

international relations.

For the data set, Foreign Relations of the United States (FRUS), which is officially published by the

Department of State, will be used. FRUS is a large set of volumes containing selected important

government documents categorized by event and region. It covers the years from the 1860's to the late

1970's. Because it is a re-organized collection produced by US government officials and historians, who would

exclude documents containing classified content and omit articles on events that seemed unimportant

at the time, FRUS could have certain selection or systematic biases. Nevertheless, containing most

5 Gary King, “Extracting Systematic Social Science Meaning from Text.” 2007. (http://gking.harvard.edu/gking/talks/wordstlk-high.pdf); Daniel J. Hopkins, Gary King, “A Method of Automated Nonparametric Content Analysis for Social Science,” American Journal of Political Science 54(1), 2010.

6 http://gking.harvard.edu/readme

of the crucial documents regarding US foreign policy decision-making is one important advantage of

FRUS. It is also a virtue of FRUS that it relieves the burden of researchers who would otherwise have to

visit government archives and identify and collect each document personally.

FRUS exists in at least three forms: printed volumes in libraries, online volumes at the digital

library of the University of Wisconsin, and recently online articles at the Department of State (DoS)

website. To apply automated content analysis in this research project, using already digitized material is

indispensable to save collecting and preprocessing time. The digital collection of the University of

Wisconsin, although it has all the volumes from 1881 to 1960, is bound in volumes rather than articles,

and formatted in PDFs rather than TXTs. Thus, they require a laborious process to be ready for automated

analysis. On the other hand, the DoS website offers every document of FRUS from 1945 to 1976,

organized by article and downloadable as HTML files. For the purposes and requirements of this

research project, the DoS version suits best and has thus been used as the data.7

III. Analysis Procedures

The procedure is divided into four steps: 1) collecting articles on the DoS website; 2) making test sets by

pre-processing the articles to be suitable for analysis; 3) making control sets by categorically coding the

articles, selected systematically; and 4) drawing a meaningful result by applying automated content

analysis with ReadMe.

1. Data collection

To extract FRUS articles from the website of the DoS, Python, a programming language, and

BeautifulSoup, a web parsing tool for Python, are used together. In his book Visualize This, Nathan Yau

demonstrated a method for gathering data from websites with patterned internet addresses via Python.8

Since FRUS articles have distinctive and repetitive patterns in their addresses, as shown below, it is relatively

simple to compile a comprehensive list of the addresses of the articles.

7 The digital collection of the University of Wisconsin is at http://uwdc.library.wisc.edu/collections/FRUS; and DoS Historians website is at http://history.state.gov/departmenthistory.

8 Nathan Yau, 2012, ch. 2.

http://history.state.gov/historicaldocuments/frus1945-50Intel/d1

︙

http://history.state.gov/historicaldocuments/frus1945-50Intel/d435

http://history.state.gov/historicaldocuments/frus1950-55Intel/d1

︙

http://history.state.gov/historicaldocuments/frus1950-55Intel/d259

︙

Volume titles and the last article number of each volume were needed to make a complete, patterned

list of addresses. To find the volume titles, such as 'frus1945-50Intel' or 'frus1950-55Intel', the Volume

Title Search page was examined first.9 Then all the volumes on the website were scanned with Python to

find the number of the final article of each volume. The results were saved in a spreadsheet. As a result,

the number of volumes of FRUS on the DoS website was 172 and the entries of FRUS articles numbered

64,047. The volumes covered the years from 1945 to 1976.
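
The construction of the address list described above can be sketched in Python as follows. This is only a minimal illustration, not the script actually used: the function name build_urls and the dictionary layout are hypothetical, and the two volumes with their last article numbers (435 and 259) are simply the ones shown in the address examples above.

```python
# Base address taken from the examples in the text.
BASE = "http://history.state.gov/historicaldocuments"


def build_urls(last_doc_by_volume):
    """Return one address per article, given each volume's last article number.

    `last_doc_by_volume` is a hypothetical mapping such as
    {"frus1945-50Intel": 435}; in the paper these numbers were
    collected by scanning the website and saved in a spreadsheet.
    """
    urls = []
    for volume, last_doc in last_doc_by_volume.items():
        for n in range(1, last_doc + 1):
            urls.append(f"{BASE}/{volume}/d{n}")
    return urls


# The two volumes shown in the address examples above.
volumes = {"frus1945-50Intel": 435, "frus1950-55Intel": 259}
urls = build_urls(volumes)
print(len(urls), urls[0], urls[-1])
```

With all 172 volumes in the mapping, the same loop would yield the full list of article addresses.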

Using the lists of the addresses, the entire HTML content of each article's web page could now be

downloaded with Python, and BeautifulSoup analyzed the HTML and extracted the actual content of the

articles (see Appendix 4 for some examples of Python code). The content of each article was saved as a

text file in a document folder named after its FRUS volume. The programming and

debugging took less time than the web mining process itself, since it took 3 or 4 seconds to retrieve,

extract and save one article, and thus eventually more than 60 hours to process the 64,047 articles.
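
The extraction step can be illustrated with a short sketch. Note that it swaps in the html.parser module from the Python standard library in place of BeautifulSoup, so that it runs without extra packages, and that it keeps all visible text rather than the specific page elements of the DoS website, whose structure is not described here. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the content of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0   # > 0 while inside a skipped element
        self.chunks = []       # non-empty text fragments, in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Return the visible text of an HTML page, one fragment per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


# Toy page, not an actual FRUS document.
page = ("<html><head><style>p{}</style></head><body>"
        "<p>Memorandum</p><script>x=1</script><p>Peking, 1952.</p>"
        "</body></html>")
print(html_to_text(page))
```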

2. Pre-processing of the data

The ReadMe package in R is an automated content analysis tool, whose dataset should be in one

folder. The dataset consists of one control file (control.txt) and all the other text files as a test set to be

analyzed. The control file contains a list of all file names in the folder, and each entry has two comma-

separated numbers indicating 1) which set the file belongs to (1 for the control set and 0 for the test

set); and 2) the categorical value of the file, if it is in the control set. The first value for every file in

the test set is zero, and the number of categories is decided by the schemes or hypotheses of the research.
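
As a rough illustration of the control file described above, the following sketch builds its rows from a list of file names and a dictionary of hand-coded categories. It follows the column order given in this paper (file name, set indicator, category). The header line and exact layout that ReadMe expects are not specified here and should be checked against the package documentation. Using 0 as the category of uncoded test files is an assumption, and the function name is hypothetical.

```python
def control_rows(filenames, coded):
    """Build comma-separated control-file rows.

    `coded` maps the file names of the hand-coded control set to their
    category (1 to 4 in this paper). Every other file is treated as part
    of the test set, with 0 as a placeholder category (an assumption).
    """
    rows = []
    for name in sorted(filenames):
        if name in coded:
            rows.append(f"{name},1,{coded[name]}")   # control set
        else:
            rows.append(f"{name},0,0")               # test set
    return rows


# Toy file names, for illustration only.
files = ["d3.txt", "d1.txt", "d2.txt"]
rows = control_rows(files, {"d2.txt": 3})
print("\n".join(rows))
```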

Since the purpose of this paper is to analyze the US perception of China and the USSR on a yearly

basis, it is necessary to rearrange the collected FRUS articles in a yearly manner. For this, the content of

each article should be examined with Python again to find the year it was written. The production year

was located in the upper part of each article and could be scanned and recognized. After finding the year,

each article was copied into new yearly folders. FRUS volume titles indicated that they cover the years

9 FRUS Volume Title Search page. (http://history.state.gov/historicaldocuments/volume-title-search)

from 1945 to 1976. After this rearranging, however, the numbers of articles in the years from 1945 to

1951 and in the year 1976 were found to be too small to be analyzed with ReadMe, so these years had to

be excluded from the dataset.
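
The year detection can be sketched as below. The exact rule used in this research is not reproduced here. The sketch assumes that the first four-digit number between 1945 and 1976 found in the first few lines of an article is its production year; the function name and the default of ten lines are hypothetical choices.

```python
import re

# Four-digit numbers from 1940 to 1979; the range check below narrows this.
YEAR = re.compile(r"\b(19[4-7]\d)\b")


def find_year(text, head_lines=10, first=1945, last=1976):
    """Return the first plausible year in the upper part of an article.

    Only the first `head_lines` lines are scanned. Returns None when no
    year between `first` and `last` is found there.
    """
    head = "\n".join(text.splitlines()[:head_lines])
    for match in YEAR.finditer(head):
        year = int(match.group(1))
        if first <= year <= last:
            return year
    return None


# Toy article header, not an actual FRUS document.
sample = "Memorandum of Conversation\nWashington, March 4, 1952.\nSubject: ..."
print(find_year(sample))
```

Articles for which no year is found would need to be checked by hand or set aside.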

The next stage was to select the articles that are related to China and/or the Soviet Union. From the files in

each year's folder, those related to China were searched and copied into a new folder. The words for

queries were 'china,' 'sino,' 'ccp,' and 'prc'. For example, if an article was from the year 1952 and had a

word such as PRC in the content, the file was copied into the newly-created 'china1952' folder. The same

process was applied to those articles relating to the USSR, whose keywords were 'soviet,' 'ussr,' and

'russia'. As a result, the total number of articles related to China was 8,198 and those related to the USSR

numbered 18,914. The yearly proportions of the total articles are described in Figure 1.
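
The keyword selection above amounts to a case-insensitive search, which might be sketched as follows. The use of plain substring matching is an assumption about how the search was done; it would explain why articles on Indochina end up in the Chinese group, as noted in the coding section. The function name is hypothetical.

```python
# Query words listed in the text.
CHINA_WORDS = ("china", "sino", "ccp", "prc")
USSR_WORDS = ("soviet", "ussr", "russia")


def mentions(text, keywords):
    """Return True if any keyword occurs anywhere in the text, ignoring case."""
    lowered = text.lower()
    return any(word in lowered for word in keywords)


print(mentions("Talks with the PRC resumed.", CHINA_WORDS))       # True
print(mentions("French forces in Indochina", CHINA_WORDS))        # True (substring match)
print(mentions("Note from the Soviet Government", USSR_WORDS))    # True
```

Matching on whole words instead would avoid such false positives, at the cost of missing compound forms such as 'Sino-Soviet'.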

Figure 1

Figure 1 shows that the proportion of documents on China became high in the mid-1950s and again in

the early 1970s, while the overall proportion of documents on the USSR was much higher than that of those

on China. The ratio of USSR articles rose in the early 1960's and peaked in 1974. This pattern might be

explained by the Cuban Missile Crisis in 1962 and Detente in the 1970s. For China, FRUS shows

elevated concern over China during the early 1950s and the early 1970s. In this case, the Korean War in

1950 and the US-China reconciliation during the 1970s could be the reasons.

[Figure 1: Proportion of Articles related with China and USSR (%), by year. Vertical axis: 0 to 60. Series: USSR, China.]

The third and last part of pre-processing was to cut out the irrelevant content of each article to

improve the precision of the results. A pilot test of ReadMe without this step resulted in showing little

difference between China and the USSR in the crucial aspect of US threat perception. The reason seemed

to be that many articles contain irrelevant content and were only partially related to China or the USSR.

Documents such as NSC Meetings Series, for example, have multiple subjects on diverse countries of

interest, while discussion on China or the USSR only takes up a little space. To fix this problem, Python was

used again to extract only the relevant part from each article and make new, shortened text files. Search

words indicating China or the USSR (again, ‘china,’ ‘sino,’ ‘ccp,’ ‘prc,’ ‘soviet,’ ‘ussr,’ and ‘russia’) were

again used to scan each article and when one of those words was found, the four lines above and the four

lines below it were extracted and saved in a new text file. This made a final dataset of files ready to be

analyzed with ReadMe.
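
The trimming step can be sketched as follows. Two details are assumptions rather than documented choices: the matching line itself is kept together with the four lines above and below it, and overlapping windows are merged so that no line is saved twice. The function name is hypothetical.

```python
def extract_context(text, keywords, window=4):
    """Keep each line containing a keyword plus `window` lines on either side.

    Matching ignores case. Overlapping windows are merged, and the kept
    lines are returned in their original order.
    """
    lines = text.splitlines()
    keep = set()
    for i, line in enumerate(lines):
        lowered = line.lower()
        if any(word in lowered for word in keywords):
            start = max(0, i - window)
            stop = min(len(lines), i + window + 1)
            keep.update(range(start, stop))
    return "\n".join(lines[i] for i in sorted(keep))


# Toy article of 12 lines with a single match on line index 6.
toy = [f"line {i}" for i in range(12)]
toy[6] = "Report on China"
shortened = extract_context("\n".join(toy), ("china",))
print(shortened)
```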

3. Coding

In the coding stage, conceptual categories derived from the hypotheses should be developed and the

control set prepared. Hopkins & King suggested the size of the control set should be at least 100.10 This

research tentatively set the size as 150 and selected about that number of articles from both article groups

of China and the USSR, utilizing a systematic sampling method. Consequently, the control set for China

had 151 articles out of a total 8,189, and the control set for the Soviet Union had 152 articles out of a total

18,914.
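
Systematic sampling of this kind can be sketched as taking every k-th article from an ordered list. The interval rule below (list length divided by the target size, rounded down) and the starting position are assumptions, since the exact rule used for the control sets is not stated. As with the control sets above, such a rule gives roughly, not exactly, the target number of articles. The function name is hypothetical.

```python
def systematic_sample(items, target, start=0):
    """Pick every k-th item, where k = len(items) // target (at least 1)."""
    if target <= 0 or not items:
        return []
    step = max(1, len(items) // target)
    return items[start::step]


# Toy list of 1,000 article indices with a target of 100.
sample_ids = systematic_sample(list(range(1000)), 100)
print(len(sample_ids), sample_ids[:3])
```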

Four categories were created for coding. Since the hypotheses are focused on US threat perceptions,

three categories are assigned to measure the levels of threat perception. If an article in the control set

contained comments or expressions of threat such as 'threat,' 'enemy,' or even 'defense,' it was labeled

Category 3. On the other hand, if the article was reconciliatory or benign, it was labeled as belonging to

Category 1. Between them, there should be a practical or neutral sentiment, articles of which were set as

Category 2. Finally, among the dataset, there could be irrelevant articles, and they were put into Category

4. For example, there are many articles regarding Indochina that are sorted into the Chinese group. Those

articles belong to Category 4.11

10 Hopkins and King, 2010. pp. 241-242.

11 Although it is recommended to have at least two or three coders and to secure inter-coder reliability, this preliminary research falls short of that ideal. The author hand-coded the control set on his own, and this shortcoming will be corrected in further research projects.

4. ReadMe pilot tests

To figure out how ReadMe works with the FRUS articles, two tests were implemented. ReadMe

basically uses a sampling method to analyze the test sets, so each trial of ReadMe, even on the same

dataset, would show slightly different results. Thus, we need to see how big the deviations are, and

ReadMe analyses on the China dataset were conducted ten times. The R code is shown below, and

the test results are in Figure 2.

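library(ReadMe)

oldwd <- getwd()  # remember the starting directory so it can be restored

# As in the original script, the data folders are looked up inside the installed ReadMe package.
base.dir <- system.file("frus5", package = "ReadMe")
result.file <- file.path(base.dir, "result.txt")  # absolute path, unaffected by setwd()

for (j in 1:10) {          # ten repeated runs on the same data
  for (i in 1952:1975) {   # one folder per year: china1952 ... china1975
    setwd(file.path(base.dir, paste("china", i, sep = "")))
    undergrad.results <- undergrad(control = "control.txt", sep = ",")
    undergrad.preprocess <- preprocess(undergrad.results)
    readme.results <- readme(undergrad.preprocess)
    print(readme.results$est.CSMF)
    # append one line per run: run number, year, estimated category proportions
    cat(j, i, readme.results$est.CSMF, "\n", file = result.file, append = TRUE)
  }
}

setwd(oldwd)  # restore the original working directory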

In Figure 2, CM1 denotes benign perception of China, CM2 neutral, and CM3 threat.12 CM4 is

intentionally omitted since this category contains irrelevant articles. Figure 2 demonstrates the 10

iterations of the test outcomes, which are slightly varying but have dependably similar results. Since they

show different outputs for each test, it seemed to be more accurate to take a mean value or a statistical

estimation of the multiple test results.

The next test using ReadMe was regarding the threshold for word frequency. The initial value of the

threshold is automatically set at 0.01. This means that ReadMe only includes words in its internal

dictionary that appear in more than one percent of the whole control set’s content. Users can modify the

threshold if needed, so it should be examined to obtain better control of the results.

12 In CM1, for example, C stands for China, M for the mean of ten ReadMe runs, and 1 for Category 1.

Figure 2

To test the threshold, the Chinese datasets between 1952 and 1962 were used. The test was again implemented

ten times, each time increasing the threshold by 0.01. Figure 3 shows these test results.

While in general the changes in the threshold do not result in significant differences in the results, the

initial value of 0.01 resulted in the most distinctive disparity in the results.

All in all, it was decided to set the threshold as 0.01, and from the previous test, to take mean values of

the ten iterations of tests on the same dataset. The final results and the interpretation continue in the next

section.
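
Taking the mean of repeated runs can be sketched in Python as below. The sketch assumes a result file in which each line holds the run number, the year, and the estimated category proportions separated by spaces, which is the layout the R loop above writes. The function name is hypothetical and the numbers in the example are toy values, not results of this research.

```python
def mean_by_year(lines):
    """Average the category proportions over repeated runs, year by year.

    Each line is expected to look like "run year p1 p2 ...".
    Lines with fewer than three fields are ignored.
    """
    sums = {}
    counts = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue
        year = int(parts[1])
        props = [float(p) for p in parts[2:]]
        if year not in sums:
            sums[year] = [0.0] * len(props)
            counts[year] = 0
        sums[year] = [s + p for s, p in zip(sums[year], props)]
        counts[year] += 1
    return {year: [s / counts[year] for s in sums[year]] for year in sums}


# Toy values for two runs of 1952 and one run of 1953.
toy_lines = ["1 1952 0.2 0.3 0.5", "2 1952 0.4 0.3 0.3", "1 1953 0.1 0.4 0.5"]
means = mean_by_year(toy_lines)
print(means)
```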

[Figure 2: U.S. Threat Perception of China, 1952-1975 (try=10, threshold=0.01). Horizontal axis: Year; vertical axis: Proportion of Articles. Series: CM1 (benign), CM2 (neutral), CM3 (threat).]

Figure 3

IV. Results and Interpretations

1. US perception of China and Soviet Union

The final result of the ReadMe analysis on the Chinese dataset is demonstrated in Figure 4. It shows

annually the distinctive values of the proportions of articles in each category. CM1 (benign) has the

lowest value of all, but slightly increases in the early 1970's. CM2 (neutral) fluctuates, showing a decrease

in the early 1960's, an increase in the late 1960's to 1972, and a decrease after 1973. CM3 (threat), most

importantly, increases in the early 1950's and thereafter has a tendency to decrease until 1968, except for

abrupt peaks from 1958 to 1960 and in 1966. In 1969, CM3 places itself in a relatively higher position,

but decreases from the early 1970's to cross with CM1 in 1975.

[Figure 3: U.S. Threat Perception of China, 1952-1975 (try=10, threshold=0.01~0.1). Horizontal axis: Year (1952 to 1962); vertical axis: Proportion of Articles. Series: CM1 (benign), CM3 (threat).]

Figure 4

To explain what the variations in these variables really mean, a close study of the historical

events and foreign policy-making cases contained within FRUS is needed. However, this is not

the objective of this paper. A brief outline of US-Chinese relations suggests that Figure 4 roughly

corresponds with real history. There were the Korean War and the Geneva Conference in the early 1950's

and after that there were the China-Taiwan conflict and domestically, the Great Leap Forward (Dayuejin)

Movement in China. Figure 4 shows that there is a small peak in CM3 and CM2 from 1965 to 1967,

which could be explained in part by Chinese nuclear weapons development during the mid-1960s and

partly by the Great Proletarian Cultural Revolution (Wenhua Dageming). Interestingly, a jump of CM3

and CM2 in 1969 coincides with the inauguration of President Nixon. Decreasing CM3 in the early 1970's

reflects the US-China reconciliation and Kissinger and Nixon's visit to China during that time.

[Figure 4: U.S. Threat Perception of China, 1952-1975 (Mean result, threshold=0.01). Horizontal axis: Year; vertical axis: Proportion of Articles. Series: CM1 (benign), CM2 (neutral), CM3 (threat).]

Figure 5

Figure 5 shows the result of the ReadMe test on the Soviet dataset. UM1 (USSR, Mean,

Category 1 (benign)) has a tendency to decrease from 1952 to 1968, except for the peak from 1958-60.

From 1969, it leaps and maintains a relatively higher proportion. UM2 (neutral) generally increases from

21.9% in 1952 to 29.5% in 1975, while it decreases during the late 1960s. Finally, UM3 (threat), which

maintains the highest ratio among the three variables, has sudden decreases in 1955, from 1958 to 1960,

and in 1964. From 1965 on, it keeps increasing slightly until a decrease beginning in 1969.

Does this result correspond with reality? After Nikita Khrushchev became Premier in 1958,

Soviet rhetoric regarding Peaceful Coexistence started. The first summit meeting between the US and the

USSR (Eisenhower and Khrushchev) occurred in 1959, and a summit meeting between Kennedy and

Khrushchev followed in 1961. Figure 5 shows the sudden reconciliation between the two countries. But, the Cuban

Missile Crisis happened in 1962 and the mood of their relations became cold again. With the start of the

Nixon administration, Detente began and several negotiations such as SALT-II also took place. Those

[Figure 5: U.S. Threat Perception of USSR, 1952-1975 (mean of 10 tries, threshold=0.01). Horizontal axis: Year; vertical axis: Proportion of Articles. Series: UM1 (benign), UM2 (neutral), UM3 (threat).]

negotiations continued until the early 1970s, but were suspended from 1976-79 during the Carter

Administration. Figure 5 seems to correspond well with the historical realities in the 1960s and 1970s,

which could be described in general as searching for the possibility of practical negotiations and thus

reconciliation under prevailing tensions, seeing that UM1 takes a relatively higher position during the

early 1970s, and UM2 increases notably in the early 1960s and maintains a high position thereafter.

2. Regression analysis of the variables

A further question could be raised as to the relationship among the result variables CM1, CM2, CM3,

UM1, UM2, and UM3. Specifically, what kind of causal relationships existed between the US perception

of China as a dependent variable and the one of the USSR as an independent variable?

To answer this, every possible bilateral relation among the variables was put in a regression analysis

using the PAIRS command in R. PAIRS is an intuitive way to test general relations of variables when

their correlations are presumably unknown. As Figure 6 depicts, each box contains plots and the LOESS

line of each bilateral relation of two crossing variables, and displays notably strong and positive linear

correlations between CM1 and UM1, CM2 and UM2, and CM3 and UM3. To find more exact numbers, a

regression analysis is conducted as shown in Table 1.
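
The quantities reported in Table 1 (slope b, its standard error, the t value, and adjusted R-squared) can be computed for a simple regression with an intercept as in the sketch below. It is written in Python with the standard library only, whereas the analysis itself was run in R. The function name is hypothetical and the data in the example are toy values, not the CM and UM series. As a quick check of how the columns relate, t is the slope divided by its standard error: for CM2 on UM2, 1.042 / 0.047 is about 22.2, in line with the reported 22.19.

```python
import math


def simple_ols(x, y):
    """Ordinary least squares for y = a + b*x with one predictor.

    Returns the intercept a, slope b, standard error of b, t value of b,
    and adjusted R-squared. Needs at least three observations.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ssr = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # residual sum of squares
    sst = sum((yi - mean_y) ** 2 for yi in y)                    # total sum of squares
    se_b = math.sqrt(ssr / (n - 2) / sxx)
    t = b / se_b if se_b > 0 else math.inf   # a perfect fit has no residual variance
    r2 = 1 - ssr / sst
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)
    return {"a": a, "b": b, "se_b": se_b, "t": t, "adj_r2": adj_r2}


# Toy data, for illustration only.
fit = simple_ols([0, 1, 2, 3], [1, 3, 2, 4])
print(fit)
```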

Why are they so closely correlated? One explanation could be that the agents who were perceiving

China and the USSR were part of one homogenous group, represented by the 'decision makers' in the

White House, DoS, Department of Defense, etc. Also, it could be because China and the USSR are both

communist states. So, US perceptions and policies for China and the Soviet Union synchronized because

of the homogeneity of the perceptions and processes of US foreign policy decision-making. In this case,

US officials might have simply been identifying China with their perception of the Soviet Union, which

was the most important or most threatening state.

Another explanation is also possible: an intentional or unintentional bias on the part of the editors of

FRUS. They might have a certain cognitive trait in thinking about US policies or the policy-making

structure. If so, their selection bias could be realized in the same proportion in each category. However,

this explanation needs further investigation.

Figure 6

Table 1

y     x     b          S.E.(b)   t        adjR2
CM1   UM1   0.945***   0.017     56.979   0.993
CM2   UM2   1.042***   0.047     22.19    0.955
CM3   UM3   0.897***   0.068     13.123   0.882

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

[Figure 6: Scatterplot matrix of CM1, CM2, CM3, CM4, UM1, UM2, UM3, and UM4, with a LOESS line in each panel.]

Then is the first hypothesis, which expected a disparity between perceptions of China and the USSR,

rejected? This can be assessed through two lines of reasoning. One is the existence of outliers. Figure 7 contains the

diagnostics of the regression analysis of CM3 and UM3 and points out that there are outliers. The outliers

are 1958, 1959, 1960 and 1965. In these years, the US perception of China worsened due to the Sino-

Taiwanese conflict and China's nuclear development, as well as the Great Leap Forward and the Cultural

Revolution. In contrast, the quantitative analysis of the years from 1958 to 1960 agrees with the US-

Soviet reconciliation represented by a series of summit meetings. This difference reveals that the US

perception of China could have an independent or unsynchronized impact, even under the prevailing

influence of US perceptions of the Soviet Union. It is also worth noticing that the outliers, though usually

omitted, have important meaning in this kind of research.

Figure 7 [Regression diagnostics for CM3 on UM3; panels: Residuals vs Fitted, Normal Q-Q, Scale-Location, Residuals vs Leverage (with Cook's distance contours); labeled observations: 7, 8, 9, 14]
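The quantities behind such diagnostic plots (leverage, standardized residuals, Cook's distance) can be computed directly for a simple regression. The sketch below is illustrative only: the function name regression_diagnostics, the input values, and the cut-off of 2 for standardized residuals are assumptions, and the data are made up with one planted outlier rather than taken from the paper.

```python
import math

def regression_diagnostics(x, y):
    """Return leverage, standardized residual and Cook's distance per observation
    for the simple regression y = a + b*x (two estimated parameters)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    a = mean_y - b * mean_x
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s2 = sum(e ** 2 for e in residuals) / (n - 2)  # residual variance
    rows = []
    for xi, ei in zip(x, residuals):
        leverage = 1 / n + (xi - mean_x) ** 2 / sxx
        std_resid = ei / math.sqrt(s2 * (1 - leverage))
        cooks_d = std_resid ** 2 * leverage / (2 * (1 - leverage))
        rows.append({"leverage": leverage, "std_resid": std_resid, "cooks_d": cooks_d})
    return rows

# Made-up values with one planted outlier at position 4; NOT the paper's data.
um3 = [0.38, 0.40, 0.42, 0.44, 0.46, 0.48, 0.50, 0.52]
cm3 = [0.381, 0.399, 0.421, 0.439, 0.510, 0.479, 0.501, 0.519]
diagnostics = regression_diagnostics(um3, cm3)
# |standardized residual| > 2 is a common rule of thumb, used here as an assumption.
flagged = [i for i, row in enumerate(diagnostics) if abs(row["std_resid"]) > 2]
print(flagged)
```

Observations flagged in this way are candidates for closer historical reading rather than for automatic removal, in line with the argument above that outliers carry meaning in this kind of research.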


Another argument is that the size or intensity of the perceptions was unequal, even if they were synchronized. The first part of data pre-processing showed a large difference in the number of articles discussing China and the USSR. There were 18,914 articles about the Soviet Union, compared to 8,189 about China; the former is 2.3 times the latter. Thus, the size or intensity of US threat perceptions of the two was clearly different. This factor should be given due weight and included as an indicator if an integrated Index of Threat Perception is developed.
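The ratio quoted above follows directly from the two article counts. As a small worked check in Python (the variable names are illustrative, and the shares are taken over these two countries' articles only, not over all of FRUS):

```python
# Article counts reported in the text.
ussr_articles = 18_914
china_articles = 8_189

ratio = ussr_articles / china_articles          # USSR articles per China article
total = ussr_articles + china_articles          # total over the two countries only
ussr_share = ussr_articles / total
china_share = china_articles / total

print(f"ratio: {ratio:.2f}")
print(f"USSR share: {ussr_share:.1%}, China share: {china_share:.1%}")
```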

3. Relations with US military deployments

This part attempts a regression analysis of US overseas deployments and the threat perceptions of China and the USSR. This is a primitive test of the third hypothesis, which looks for some relationship between the perceptions and actual foreign policies. Undoubtedly, the size of US deployments does not represent all US foreign policies, and supposing a relationship between deployments, a small part of foreign security policy, and threat perceptions is not an accurate depiction of reality. Assuming a relationship between a material factor and a conceptual or psychological factor may also be far-fetched. However, it seems worthwhile to examine how closely they are related, because doing so could offer some ideas for further studies.

Since the presence of the USSR is more prominent in the European sphere, US deployments to Germany (GER), France (FRA), and Italy (ITA) were selected as dependent variables, and their sum was considered representative of Europe (EUR). For China, US military forces in Japan (JAP), Korea (KOR), and the Philippines (PHI), and their sum for Asia (ASIA), were tested. The US military budget (Bcon) and outlays (Ocon) are also included as dependent variables. The independent variables are the US threat perceptions of China (CM3) and the USSR (UM3).

Again, PAIRS was used to review the correlations, but unfortunately there were no significant results (see Appendix 2). It was expected that this simple model would give unsatisfactory results. At the least, however, it can be said that simply matching ideational variables such as threat perceptions with material variables such as military spending does not reveal a significant relationship. Therefore, to judge the size of the impact of a threat perception on US security policy, more sophisticated models are needed.
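A pairwise correlation check of this kind can be sketched numerically as well as graphically. The following Python example computes the Pearson correlation and its t statistic (n - 2 degrees of freedom) between a perception series and each policy series. The function names and all values are hypothetical illustrations, not the paper's data or its exact procedure.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic for H0: no correlation, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))

# Made-up series for illustration only; NOT the paper's data.
series = {
    "CM3": [0.40, 0.42, 0.45, 0.47, 0.44, 0.41],
    "ASIA": [210, 190, 230, 205, 220, 200],   # hypothetical troop levels (thousands)
    "Bcon": [300, 310, 305, 320, 315, 325],   # hypothetical budget figures
}
for name in ("ASIA", "Bcon"):
    r = pearson_r(series["CM3"], series[name])
    print(name, round(r, 3), round(t_statistic(r, len(series["CM3"])), 3))
```

With annual data, a lagged version of the perception series might be a natural next step, since policy may respond to perceptions with delay; that extension is a suggestion, not something tested here.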


V. Discussion

Among the many challenges in finishing this research project, the first was to collect FRUS articles from the internet, specifically from the Office of the Historian website at the Department of State (DoS), with Python programming. Python is such an intuitive programming language that it could be used without serious trouble. The unexpected hardship lay in collecting the articles themselves, which took a great deal of time. Pre-processing the articles to be suitable for ReadMe analysis was also a tedious process. The problem is that even this was a relatively easy process compared with using other types of sources, such as e-books in PDF format or actual government documents in archives. The PDF files of FRUS in the Digital Library at the University of Wisconsin could be downloaded and used for this analysis, since they could perhaps more adequately cover the years from 1945 to 1951, which were omitted due to the small number of articles. Using them is a real possibility, since they have already been digitized, and it would require only several more steps to reformat them for analysis. The worst case would be using hard copies of government documents. This would necessitate visiting archives to take digital pictures of each document and then converting each picture with text recognition applications, which would surely cost more time and labor than processing the other types of internet materials.
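To give a sense of what the pre-processing step involves, the following is a minimal sketch in Python that turns a downloaded HTML page into plain lowercase text using only the standard library. It is not the code used in this project: the class and function names are hypothetical, and the cleaning rules (dropping scripts and styles, keeping only the letters a to z) are assumptions about what a text-analysis input might need.

```python
import re
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)

def html_to_plain_text(html):
    """Strip markup, lowercase, and keep only letters and single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    # Assumption: digits and punctuation are dropped; adjust if they matter.
    text = re.sub(r"[^a-z\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()

sample = "<html><body><h1>Memorandum</h1><p>China &amp; the USSR, 1958</p></body></html>"
print(html_to_plain_text(sample))
```

The same function could be applied to each saved page in a loop, writing one text file per article. Scanned PDFs or photographs would first need a text recognition step, which this sketch does not cover.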

Such difficulties notwithstanding, automated content analysis using ReadMe produced meaningful results worthy of the time and labor. It succeeded in transforming linguistic and conceptual text data into plausible quantitative data. It is now possible to use the data for pattern finding, case selection, and even statistical analysis, and to compensate for the disadvantages of narrative historical case explanation in terms of time and parsimony. For example, as Figures 4 and 5 show, sudden changes in perceptions occur when US administrations change. In particular, the Kennedy Administration shows a much lower threat perception of China than the preceding Eisenhower Administration. It is interesting that in 1961 CM3 (threat) falls below CM2 (neutral) for the first time. Also, with the start of the Kennedy era, the threat perception of the USSR increases sharply, while the benign perception drops.

The Nixon Administration also shows interesting patterns. In 1969 especially, CM3 and CM2 increased sharply. This means that the Nixon Administration's perception of China was very different from the Johnson Administration's. While CM2 increases until 1972 and then falls until 1975, CM3 drops until 1975, crossing CM1 (benign) for the first time. This suggests that Nixon's perception of China was relatively negative at first but that, through efforts to negotiate and correspond with China, the threat perception decreased considerably. At this point, we can raise some questions regarding the impact of Nixon's visit to China in 1972 on the US threat perception. Did Nixon's visit


lower the threat perception, or were there other causes, such as the efforts of some important decision makers, e.g. Kissinger, to initiate conversations with China? How was it possible to reconcile with China in spite of the notably high US threat perception? In this way, content analysis of FRUS allows us to identify and understand interesting patterns in history and to ask questions about the cases in which perception variables may play a strong role.

This preliminary research suggests some recommendations for further studies. One is to develop an index of threat perception. This index could be used by itself or imported into many other quantitative analyses. We now have the relative proportions of articles in FRUS on China and the USSR, as in Figure 1, and the perception variables CM1, CM2, CM3, and so on. By combining these variables appropriately, it could be possible to develop a unified index of threat perception.
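One simple way such a combination might look is to weight the threat proportion by the share of articles devoted to the country, so that the index reflects both attention and tone. The sketch below is purely hypothetical: the multiplicative form, the function name threat_index, and the input values are assumptions made for illustration, not a result or a proposal of this paper.

```python
def threat_index(article_share, threat_proportion):
    """Hypothetical index: share of articles on a country times the proportion
    of those articles coded as 'threat' (e.g. CM3 or UM3)."""
    for value in (article_share, threat_proportion):
        if not 0 <= value <= 1:
            raise ValueError("inputs must be proportions between 0 and 1")
    return article_share * threat_proportion

# Made-up inputs for three unnamed periods; NOT the paper's data.
article_shares = [0.25, 0.30, 0.35]
threat_proportions = [0.48, 0.42, 0.40]
index = [threat_index(s, p) for s, p in zip(article_shares, threat_proportions)]
print([round(v, 3) for v in index])
```

Other forms, such as a weighted sum or a standardized score, would be equally defensible. Choosing among them would require validation against known historical episodes.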

Secondly, using government documents in the archives could produce a more interesting outcome. In FRUS, articles from all government agencies and offices are mixed together in chronological order. In the archives, however, we can find articles that are sorted by office, meeting, personnel, and theme. Utilizing these categorizations would make it possible to analyze a specific group's perceptions. Comparing their perceptions and discourses could contribute to studies on foreign policy.

Finally, more in-depth linguistic analyses of the articles' contents are needed. Developing a corpus and finding subtle meanings and perceptions in personal comments and evaluations would bring out important aspects of cases and events of interest. If the ReadMe analysis of FRUS is a macro analysis of history, micro analysis is also needed to truly understand cases and events. Practically, this micro-level analysis will also help us to find critical evidence in a large quantity of documents and articles.


References

Carlsnaes, Walter. 2002. “Foreign Policy,” in Walter Carlsnaes, Thomas Risse and Beth A. Simmons eds., Handbook of International Relations. Sage Publications.

Cohen, Raymond. 1978. “Threat Perception in International Crisis,” Political Science Quarterly 93(1).

Daggett, Stephen and Amy Belasco. 2002. “CRS Report for Congress: Defense Budget for FY2003: Data Summary,” Congressional Research Service.

Hopkins, Daniel J. and Gary King. 2010. “A Method of Automated Nonparametric Content Analysis for Social Science,” American Journal of Political Science 54(1).

King, Gary. 2007. “Extracting Systematic Social Science Meaning from Text.” (http://gking.harvard.edu/gking/talks/wordstlk-high.pdf)

Matloff, Norman. 2011. The Art of R Programming: A Tour of Statistical Software Design. No Starch Press.

Rousseau, David L. 2006. Identifying Threats and Threatening Identities: The Social Construction of Realism and Liberalism. Stanford University Press.

Walt, Stephen M. 1987. The Origins of Alliances. Cornell University Press.

Yau, Nathan. 2011. Visualize This: The FlowingData Guide to Design, Visualization, and Statistics. Wiley.

http://gking.harvard.edu (Gary King's website)

http://history.state.gov/departmenthistory (Office of the Historian website, Department of State)

http://uwdc.library.wisc.edu/collections/FRUS (Digital collections at the University of Wisconsin)

http://www.vetfriends.com/US-deployments-overseas/historical-military-troop-data.cfm (VetFriends.com offers a database of US military deployments)