Using EpiData & Epi Info for Windows

Training for Communicable Disease Control in Local Authorities

Cardiff Council (Strategic Planning & Environment)

March 2007

Acknowledgements

© 2007 Cardiff Council (Strategic Planning & Environment).

This training guide was produced by Alastair Tomlinson to form part of the

Communicable Disease Lead Officer Training Programme, co-ordinated by the

Wales Centre for Health.

Please send enquiries relating to this training guide to:

Alastair Tomlinson, Chartered Environmental Health Practitioner

Team Leader (Health Improvement)

Public Protection Division

Room 134 City Hall

Cathays Park

Cardiff. CF10 3ND.

029 2087 1845

[email protected]

About the software

Epi Info™ is a public domain software

package designed for the global community

of public health practitioners and

researchers. It provides for easy form and

database construction, data entry, and analysis with epidemiologic statistics, maps,

and graphs. Epi Info can be downloaded from http://www.cdc.gov/epiinfo

EpiData Software has developed from

securing the principles of Epi Info V6 for DOS

to an independent documentation oriented

system. EpiData can be downloaded from

http://www.epidata.dk

Conventions used in this training guide

Text to be entered on screen is shown in this font.

Directions to drop-down menu items are shown in bold type, e.g. File > Save.

Table of Contents

Acknowledgements
Table of Contents
Aim and Objectives
Outbreak Scenario
Creating a Questionnaire using EpiData
Entering Data using EpiData
Outbreak Investigation using Epi Info Analysis
Using Analysis with routine COSURV data
Other capabilities of EpiData and Epi Info
Appendix I – Comparison of Epi Info & EpiData
Appendix II – Contents of course CD-ROM
Appendix III – Further information & resources
Appendix IV – Worksheet for 2x2 table results
Appendix V – Check code example

Aim and Objectives

Aim of the training

To provide training on the practical use of Epi Info and EpiData in communicable

disease control, with particular reference to:

♦ An outbreak situation

♦ Analysis of routine Cosurv surveillance data

Objectives

By the end of the training delegates will:

♦ Have an understanding of EpiData and Epi Info for Windows and their

component elements

♦ Be able to use EpiData to design a data entry form for a questionnaire in an

outbreak situation

♦ Be able to use EpiData to enter outbreak investigation data into a record

suitable for analysis in Epi Info for Windows

♦ Be able to use Analysis to obtain useful statistical and epidemiological

information from an EpiData / Epi Info for Windows database for outbreak

investigation purposes

♦ Be able to use Analysis to import routine Cosurv surveillance data into Epi Info

for Windows, and obtain useful statistical and epidemiological information

Outbreak Scenario

On the 17th August, you receive a telephone call from a gentleman who reports that

he and several others who attended a buffet following a funeral were suffering

symptoms of food poisoning. The buffet, provided by an external caterer, was held

at a local club following the funeral, and mourners arrived at the club at around

3.00 pm on 14th August. Food left over from the buffet was placed in the main bar

areas of the club for club members to consume later that day.

Initial activity involves obtaining a list of people who attended the funeral and

others who may have eaten the food provided for the funeral buffet. A list of food

served at the buffet has been obtained from the caterer, and cross-referenced with

initial information gathered from cases. Indications are that around 70-80 people

attended the funeral, and approximately 40-50 of these people may have

experienced symptoms consistent with food poisoning.

Table 1 - List of foods served at the buffet

sausage rolls            chicken rolls         salmon sandwiches
pickled onions           ham sandwiches        egg rolls
corned beef sandwiches   ham rolls             egg sandwiches
chicken nuggets          chicken sandwiches    cheese & biscuits
crisps                   gateaux               pasties

An Outbreak Control Team (OCT) has been convened, and has decided to undertake a
cohort study to investigate the outbreak. The OCT assigns you the following tasks:

♦ Establish the case definition
♦ Develop a structured questionnaire to investigate the outbreak
♦ Enter questionnaire data into an appropriate computer database
♦ Analyse the data to describe the outbreak and identify exposures associated with illness

This training uses this scenario to introduce the various functions of EpiData and
Epi Info for Windows, and their particular use in outbreak investigation.

Creating a Questionnaire using EpiData

Basic Questionnaire Creation

A screenshot of the main EpiData screen is shown.

We want to create a new questionnaire, so select Define Data > New .QES File (it is

also possible to edit an existing questionnaire, by using Define Data > Open .QES

File). This creates an empty text file into which we can enter information.

On creation of the file, the following toolbar

option also becomes available.

Clicking this button brings up the

‘Field pick list’ dialog.

This dialog makes it easy to create different

kinds of fields. As an example, we will create

one or two of the basic field types in our

questionnaire.


First, type an appropriate heading into the first line of your questionnaire, such as

“Lead Officer Training March 2007”.

Then, on the row below, enter Surname: Leave the cursor flashing after the colon. If

the Field pick list is not already showing, click the button to bring it on screen.

Select the ‘Text’ tab from the pick list. This then gives a short option list of ‘text’,

‘upper-case text’, and ‘encryption field’. For now we’ll accept the default ‘text’

option. Set the field length to 20, then click the Insert button. EpiData inserts a

series of underscore characters after the Surname: label. Underscore characters _

are how EpiData denotes plain text fields. The number of underscores indicates the

maximum length of the field.

On the next line, type Forename: Using the field pick list again, insert another text

field of 15 characters.

Now let’s try a different field type – dates. On

the next line, type Date of birth: Select the

‘Date’ tab from the field pick list. This

presents two lists of options – general date

fields on the left, and ‘automatic’ dates on

the right. General date fields are formatted in

three different ways. For most of us in

Europe, the <dd/mm/yyyy> format is most

natural, so select that. Click the Insert button,

and EpiData inserts the relevant date format

field type.

On the next line, type Gender: and insert a single character Uppercase text field.

EpiData inserts a <A> code, which denotes an uppercase field one character long.

Later, we’ll restrict the entries in this field to either M (male), F (female) or U

(unknown). Below this, add the label Occupation: and insert another 20 character

text field.

We need to be able to record interviewee address details.

First, let’s create a house number field. On

the next line type “House number:”, and then

select the ‘Numeric’ tab on the field pick list.

Select 3 digits before the decimal point, and

0 digits after it, then click the Insert button.

EpiData inserts ### after the “House

number:” label. # characters are how EpiData

denotes numeric fields, and again the

number of # characters indicates the

maximum size of the number. (Numbers with

a decimal point appear as ##.##).


Add another text field for House name (30 characters), and three more fields for

Street name (30 characters), District (20 characters) and Town (20 characters).

Then add another label for Postcode: and this time add an ‘Uppercase text’ field of

8 characters. EpiData inserts uppercase fields as <A       >, with the number of

spaces determining the total length of the field.

Finally, let’s add a field for telephone details. Initially it seems like a good idea to

create this as a numeric field, but in doing this we wouldn’t be able to record any

text details (such as ext. etc), and it’s unlikely we would ever want to order our data

by telephone number, so it’s probably easier to simply create a text field of around

15-20 characters. If you prefer you can create two fields, one for home and one for

other (e.g. work, mobile).

We’ve now created the fields for the basic contact details of the interviewee. Before

proceeding onto further work, let’s save what we’ve done so far. Click the Save

button on the toolbar (or select File > Save). Enter an appropriate filename and

location in the dialog box, and click Save.

We can also take a sneak preview of how the questionnaire will appear for those

entering data. Before doing that, let’s set a couple of options that determine how

our fields will be named. Click File > Options, and then select the ‘Create data file’

tab.

Generally, the ‘automatic field names’ option is best, since it will try to make sense

of the question (i.e. the text immediately to the left of the field), and will ignore

common words like ‘who’, ‘did’, ‘or’ etc. Sometimes it may be preferable to select

the ‘First word in question’ option. For this exercise, select ‘automatic field names’.


Later in the module we’ll look at how we can specifically tailor the fieldnames that

EpiData will generate in the data record files. Fieldnames have a maximum length

of 10 characters.

The decision on letter case of field names is mainly one of personal preference –

the author’s preference is to use upper-case for field names to make them stand

out.

Once you have made your option selections and clicked OK, click the Preview

Data Form button (or select Make Data File > Preview Data Form).

A new tab on the main display will appear, showing the questionnaire with data

entry fields in the relevant places. You can select File > Print Data Form to get an

idea of how the questionnaire will appear on paper for completion by interviewers.

You can even practice entering data into the form to check that things appear as

you expect them to. For now, it’s useful just to see how things are going to be

presented. To close the form, select File > Close form, or press CTRL F4.

Currently our questionnaire lets us record interviewees’ personal details, but not a

lot else. Let’s change that by adding some details specific to the event in our

scenario. The first thing to establish is whether the person actually attended the

funeral (they may have been exposed to the food under suspicion at the club bar

after the event).

This introduces us to another important field type: the Boolean field. This is simply a

Yes/No field, but this type of data is often crucial in outbreak investigation, since it

allows us to construct 2x2 tables to assess

relative risk for various exposures.

On a new line at the end of the questionnaire,

type Did you attend the funeral? Select

the ‘Other’ tab on the field pick list, and select

the Boolean (yes/no) option (the Length field

on the dialog becomes greyed out as it isn’t

relevant). Click the Insert button. EpiData

inserts a <Y> code, which is how it denotes a

Boolean field.

For clarity, let’s also include a question on whether the person attended the

members club – we don’t know if people from the members club have been

affected, or whether there is crossover with the cohort of funeral attendees, but it

may be important to be able to distinguish between them later. On a new line, type

Did you attend the members club? and insert another Boolean field.


We’ll also use Boolean fields to record whether or not the person was ill, and what

their symptoms were. Add a line and a Boolean field to the questionnaire for the

Were you ill? question and for each of the symptoms in the bulleted list further below.

Sometimes people may have described themselves as ill, but do not meet the

actual case definition, so include an additional Case definition met? Boolean

field as well.

Another key set of data to record for those who have suffered symptoms is their

onset date/time, and the duration of symptoms. Go back up the questionnaire, and

add a couple of extra lines after Were you ill? but before the list of symptoms.

Type Onset date: and then insert a general date field. On the next line type Onset

time: and then insert a numeric field with 2 digits before and 2 digits after the

decimal point (##.##). EpiData records time-related information in this numeric format,

with the digits before the point representing hours and those after minutes. The 24-

hour clock is used. Then add another field for duration of symptoms – 2 digits in

size, intended to be measured in days, and a similar 3 digit field for incubation

period, this time intended to be measured in hours.
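The hh.mm convention described above can be sketched in Python. This is only an illustration, not something EpiData provides: the function names are hypothetical, and the example assumes the year 2007 and takes the mourners' arrival at the club (3.00 pm on 14th August) as the exposure time.

```python
from datetime import date, datetime


def split_hhmm(value):
    """Split a time stored as an hh.mm number (e.g. 14.30) into (hours, minutes)."""
    hours = int(value)
    minutes = round((value - hours) * 100)
    if not (0 <= hours <= 23 and 0 <= minutes <= 59):
        raise ValueError(f"{value} is not a valid 24-hour clock time")
    return hours, minutes


def incubation_hours(exposure, onset_date, onset_time):
    """Whole hours between the exposure and the recorded onset date and time."""
    hours, minutes = split_hhmm(onset_time)
    onset = datetime(onset_date.year, onset_date.month, onset_date.day, hours, minutes)
    return round((onset - exposure).total_seconds() / 3600)


# Assumed exposure time: arrival at the club, 3.00 pm on 14th August 2007.
buffet = datetime(2007, 8, 14, 15, 0)
print(split_hhmm(9.05))
print(incubation_hours(buffet, date(2007, 8, 15), 9.00))
```

Storing the time as a number keeps data entry quick, but a value such as 14.75 fits the ##.## template without being a real time, which is why a range check on the field is still worth adding.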

In a full outbreak we would probably also include further questions about whether

the person was hospitalised, whether specimens had been submitted, and so on,

together with details of any other household contacts, and maybe other data to

indicate severity of symptoms, but for the purposes of this exercise we’ll skip these

elements.

The final major part of the questionnaire is the recording of relevant exposures.

Comparison of the rates of illness in those exposed and not exposed will enable us

to assess which exposures are most likely to be implicated in the outbreak. For the

purpose of this exercise, we’ll assume that the OCT has decided to focus attention

on the foods consumed at the buffet. In a real life situation, it may be more

appropriate to retain an open mind and include other potentially relevant exposures

that may explain some or all of the illness.
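To make the comparison of illness rates concrete, here is a minimal Python sketch of the relative risk calculation for a single exposure in a 2x2 table. The function name and the counts in the example are invented purely for illustration; they are not results from the scenario.

```python
def relative_risk(exposed_ill, exposed_well, unexposed_ill, unexposed_well):
    """Relative risk = attack rate in the exposed / attack rate in the unexposed."""
    rate_exposed = exposed_ill / (exposed_ill + exposed_well)
    rate_unexposed = unexposed_ill / (unexposed_ill + unexposed_well)
    if rate_unexposed == 0:
        raise ValueError("relative risk is undefined when no unexposed person was ill")
    return rate_exposed / rate_unexposed


# Hypothetical counts: 30 of 40 people who ate a food were ill,
# compared with 5 of 30 people who did not eat it.
print(relative_risk(30, 10, 5, 25))
```

With these made-up counts the attack rates are 75% and about 17%, giving a relative risk of about 4.5. A value well above 1 suggests an association between the exposure and illness, although it does not by itself show that the association is statistically significant.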

NB – avoid use of the ampersand & symbol in questionnaires, since it tends to

cause unexpected display results.

Illness and symptom fields to add to the questionnaire:

♦ Were you ill?
♦ Diarrhoea
♦ Vomiting
♦ Abdominal pain
♦ Nausea
♦ Pyrexia
♦ Headache
♦ Other aches
♦ Other symptoms (with a separate text field for description)


Add a list of Boolean fields for the relevant food items – the table from the outbreak

scenario is reproduced below.

Again, in a real situation we might add additional information on quantity eaten or

portion size to investigate the possibility of a dose-response relationship, but for this

exercise we will keep the exposures simple yes/no answers. Finally, add a general

comments text field of around 50 characters to capture any other relevant

information (e.g. perhaps interview was carried out with parent, relative, interpreter

etc).

The only remaining items to add to the questionnaire are some basic administration

fields. We need to have some way of identifying each record as unique, and EpiData

provides a specific field type for this purpose. It’s quite useful for this to be easily

seen, so at the very top of the questionnaire, type Record no.: and select the

‘Other’ tab on the Field pick list. Select the ‘Auto ID number’ field type and click

Insert. EpiData inserts an <IDNUM> code, which will include an automatically

incrementing number for each new record added. It is also useful to record the date

of interview and the name of the interviewer (initials usually sufficient), so add

appropriate date and uppercase fields for this purpose.

We can also add another date field, the date of entry into EpiData. Again EpiData

can automatically insert this for us – select the ‘Date’ tab on the Field pick list and

select the <Today-dmy> code from the right-hand list.
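As a recap of the field codes met so far, the following Python sketch turns each code into a plain description. It is a reading aid rather than part of EpiData: the function name is hypothetical, it covers only the codes mentioned in this guide, and it assumes that the length of an uppercase field is the letter A plus the spaces that follow it.

```python
import re


def describe_field_code(code):
    """Describe a questionnaire field code mentioned in this guide."""
    if re.fullmatch(r"_+", code):
        return f"text, up to {len(code)} characters"
    if re.fullmatch(r"#+", code):
        return f"numeric, {len(code)} digits"
    decimal = re.fullmatch(r"(#+)\.(#+)", code)
    if decimal:
        before, after = len(decimal.group(1)), len(decimal.group(2))
        return f"numeric, {before} digits before and {after} after the decimal point"
    upper = re.fullmatch(r"<A( *)>", code)
    if upper:
        # Assumption: total length = the letter A plus the trailing spaces.
        return f"uppercase text, up to {1 + len(upper.group(1))} characters"
    fixed = {
        "<Y>": "Boolean (yes/no)",
        "<IDNUM>": "automatic ID number",
        "<dd/mm/yyyy>": "date (day/month/year)",
        "<Today-dmy>": "automatic date of entry (day/month/year)",
    }
    if code in fixed:
        return fixed[code]
    raise ValueError(f"unrecognised field code: {code!r}")


for example in ["_" * 20, "###", "##.##", "<A>", "<Y>", "<IDNUM>"]:
    print(example, "->", describe_field_code(example))
```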

The basic questionnaire is now complete. At the moment it isn’t particularly easy to

read and this may make it more difficult for interviewers to complete the

questionnaire, and harder for data entry staff to accurately and quickly enter the

results. EpiData includes an ‘Align fields’ option to help address this problem. Place

the cursor in one of the longer questions/labels, such as Corned beef sandwiches,

or Did you attend the members club? Then select Edit > Align Fields. EpiData

will realign each line of the questionnaire so that the fields appear in a ‘column’ –

making it easier for both interviewers and data entry staff.

Save the questionnaire before we proceed any further.

sausage rolls            chicken rolls         salmon sandwiches
pickled onions           ham sandwiches        egg rolls
corned beef sandwiches   ham rolls             egg sandwiches
chicken nuggets          chicken sandwiches    cheese & biscuits
crisps                   gateaux               pasties


Advanced questionnaire design

In this section we’ll cover some of the techniques and functions provided by

EpiData to help save time on data entry, and to ensure that accurate and reliable

data is entered.

Closer control over fieldnames

To start with, let’s look at how our questionnaire looks in data entry mode. Select the

Preview Data Form button to display the data form. Use the Tab key to cycle through

the fields in the questionnaire. Note that for each field, the fieldname appears in

the status bar at the bottom left of the screen, and next to it information on the type

of data that can be entered (e.g. ‘Alpha: all entries allowed’, ‘Date (dmy): 0-9 and /

allowed’, ‘Boolean: Y,1,N,0 allowed’ etc.).

As you cycle through the fields, note the fieldnames that EpiData has automatically

assigned to each field. In the majority of cases, they make perfect sense, but there

are a few where the fieldname doesn’t intuitively indicate what the contents of the

field are. This can be particularly important where data analysis is being undertaken

by someone who wasn’t involved in the original drafting of the questionnaire (quite

conceivable in a large outbreak with several partner organisations) – the last thing

that they need is to be unsure what a relevant item of data actually means.

Fortunately, EpiData allows questionnaire designers greater control over fieldname

selection where necessary.

The default fieldname selected in each case is up to 10 letters long, based on the

text that appears immediately to the left of the field but ignoring common words

such as ‘did’ or ‘the’. As an example, the fieldname for Did you attend the

funeral? is YOUATTENDF, for Were you ill? – WEREYOUILL, and for Pickled

onions - PICKLEDONI.

For these and some other fields, we would like to tailor the fieldname to make it a

bit more meaningful. The chief way of doing this is by the use of braces { }, also

known as curly brackets. When automatically selecting fieldnames, EpiData uses

text enclosed in braces in preference to normal text. If the question is “{my} first

{field}” then the field name will be MYFIELD. Braces offer a powerful method of

defining meaningful field names.
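The naming behaviour described above can be approximated in a few lines of Python. This is a simplified sketch, not EpiData's actual algorithm: the function name is hypothetical, and the list of ignored words is an assumption containing only a few common words.

```python
import re

# Assumed list of ignored words; EpiData's real list is not reproduced here.
IGNORED_WORDS = {"did", "the", "of", "or", "who"}


def auto_fieldname(question, max_length=10):
    """Approximate an automatic fieldname for the text to the left of a field."""
    braced = re.findall(r"\{([^{}]*)\}", question)
    if braced:
        # Text in braces takes priority over the rest of the question.
        text = "".join(braced)
    else:
        words = re.findall(r"[A-Za-z0-9]+", question)
        text = "".join(w for w in words if w.lower() not in IGNORED_WORDS)
    cleaned = re.sub(r"[^A-Za-z0-9]", "", text)
    return cleaned.upper()[:max_length]


for question in ["Did you attend the funeral?", "Were you ill?",
                 "Pickled onions", "{my} first {field}"]:
    print(question, "->", auto_fieldname(question))
```

The sketch reproduces the examples quoted above (YOUATTENDF, WEREYOUILL, PICKLEDONI and MYFIELD), which makes it a handy way to try out brace placements on paper before editing the questionnaire. The Preview Data Form status bar remains the authoritative check.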

Let’s look at a simple example – the Were you ill? question. Although the

fieldname does make sense, it doesn’t really need to be that long – simply the term

ILL would be enough. By putting braces around the word ill in the questionnaire, we

force EpiData to call the field ILL. Modify the text in the questionnaire so that it

looks like this:

Were you {ill}? <Y>

Now click Preview Data Form and put the cursor into the Were you ill? field. You

can see from the status bar in the bottom corner that this fieldname is now simply

ILL. Once you’ve satisfied yourself of this, close the preview (CTRL F4).


This is a fairly simple example, but the EpiData capabilities are more sophisticated

than that. EpiData can pull text from more than one set of braces together to create

a fieldname. As another example, consider the Did you attend the funeral/members

club questions. Presently these have fieldnames of YOUATTENDF and YOUATTENDM

respectively – not terribly meaningful. But by changing the text in the questionnaire

as follows:

Did you {attend} the {fun}eral? <Y>
Did you {attend} the members {club}? <Y>

… we produce fieldnames of ATTENDFUN and ATTENDCLUB, which are far more

intuitive. Check for yourself by clicking Preview Data Form. Notice also that the

braces do not appear on the entry form (and won’t appear on a printout either), so it

doesn’t affect the ease of use for interviewers and data entry staff.

Go through the table below to update the questions as indicated to generate more

meaningful fieldnames:

Question               Current fieldname   Modification               New fieldname
Date of birth          DATEBIRTH           {D}ate {o}f {b}irth        DOB
Abdominal pain         ABDOMINALP          {Abdom}inal {pain}         ABDOMPAIN
Case definition met?   CASEDEFINI          {Case def}inition {met}?   CASEDEFMET
Sausage rolls          SAUSAGEROL          {Saus}age {rolls}          SAUSROLLS
Pickled onions         PICKLEDONI          Pickled {onions}           ONIONS
Chicken nuggets        CHICKENNUG          Chicken {nuggets}          NUGGETS
Chicken rolls          CHICKENROL          {Chick}en {rolls}          CHICKROLLS
Chicken sandwiches     CHICKENSAN          {Chick}en {sand}wiches     CHICKSAND

Click Preview Data Form to confirm the changes that have been made. Once you’re

finished, close the preview and save your modified questionnaire.

Controlling data entry and skipping questions

For some fields it can be useful to place restrictions on the range of data that can

be entered – for example the Gender field can only have three sensible values

(male, female, unknown) and it also makes sense to limit the Onset time field to the

valid times represented in the 24 hour clock. There are also some fields that can be

filled through calculation – for example, age at time of interview, incubation period,

perhaps even case definition in some circumstances – which can help with data

accuracy and consistency. Finally data entry can be significantly quicker by using

‘skips’ so that the data entry operative doesn’t have to cycle through irrelevant

fields (such as symptom fields for an interviewee who wasn’t ill).

These functions are all achieved by what EpiData calls checks. Checks are usually

added once a ‘data file’ has been created based on the layout in a questionnaire.


One of the things we’ll do is add a simple calculation to work out a person’s age in

years at the time of the interview. Before we create the data file, add a new numeric

field of 2 digits to hold the calculated age. Place it below the date of birth question.

Now our questionnaire has all the fields we need, so we can create the data file that

EpiData will actually store the records in once they are entered. Make sure you have

saved the most recent changes to the questionnaire, then click Make Data File >

Make Data File to display the following dialog:

The .QES file is the file holding the questionnaire details (.QES is the extension that

EpiData uses for all questionnaire files). The currently active file should be

displayed in this box. The data file will be created according to the details shown in

the lower box, and the default setting is the same name as the questionnaire file,

but with a .REC extension, which is the extension used by EpiData for data record

files. If the settings in the dialog look appropriate, click OK.

You’ll then be presented with another dialog asking you to give a label to the data

file:

Give the data file an appropriate label (e.g. codename or incident number of the

outbreak).

You should then get a message saying that the data file has been created. EpiData

has also closed the original questionnaire, so we now have a blank screen. Click the

Checks button, which will then ask you to open a data file – select the file we just

created above.

EpiData will then open up the new data form – unsurprisingly this has a similar

appearance to that of the Preview Data Form. However, we are currently in

‘Add/revise checks’ mode, so it isn’t possible to enter data. You’ll also notice that


the Check file dialog has appeared. The status bar of the dialog shows the name of

the check file that check details will be stored in. This will have the same name as

the REC file, but with a .CHK extension.

The dialog shows the current field that checks

may be added to. Some of the basic check

settings are then shown in the lower part of the

dialog. We’ll briefly summarise all these options

before looking at some specific examples.

Range, Legal allows you to restrict the range of values that can be entered in a
particular field. A range is defined by typing the minimum value and the maximum
value separated by a hyphen. Typing 2-5 defines that only the numbers 2, 3, 4 or 5
can be entered in the current field. If only a maximum value is wanted then use -INF
(minus infinity) as the minimum value. If only a minimum value is wanted then use
INF (infinity) as the maximum value. Typing -INF-5 defines all numbers less than or
equal to 5 as legal entries in the current field. Typing 0-INF defines zero and all
positive numbers as legal entries. Legal values are defined by typing all the accepted
values separated by spaces or commas. Typing 4,6,8,10 defines that only the numbers
4, 6, 8 or 10 can be entered in the current field.
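The Range, Legal notation can be made concrete with a short Python sketch that turns a specification string into a yes/no test. This is an illustration of the rules as described above, not EpiData's own code: the function name is hypothetical, and the sketch handles either a single range or a list of legal values, not a mixture of the two.

```python
import math
import re

_NUMBER = r"-?\d+(?:\.\d+)?"
_RANGE = re.compile(rf"^(-INF|{_NUMBER})-(INF|{_NUMBER})$")


def parse_range_legal(spec):
    """Return a function that reports whether a number satisfies the specification."""
    spec = spec.strip()
    match = _RANGE.match(spec)
    if match:
        low = -math.inf if match.group(1) == "-INF" else float(match.group(1))
        high = math.inf if match.group(2) == "INF" else float(match.group(2))
        return lambda value: low <= value <= high
    # Otherwise treat the text as legal values separated by commas or spaces.
    legal = {float(item) for item in re.split(r"[,\s]+", spec) if item}
    return lambda value: value in legal


valid_time = parse_range_legal("00.00-23.59")
print(valid_time(14.30), valid_time(24.00))
```

Note that a range check on an hh.mm time field only limits the overall value; an entry such as 14.75 would still pass the 00.00-23.59 range even though it is not a real time.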

Jumps are available to help data entry flow. As an example, if a person has

indicated that they suffered illness, data entry is likely to cover the symptoms that

they suffered. If they weren’t ill, it makes more sense for data entry to skip past

those fields and onto the next section. Jumps are entered by specifying the value,

entering a greater-than-sign (>) and specifying the name of the field to jump to. For

example, in relation to the ILL field, entering Y>ONSETDATE,N>SAUSROLLS would set

up the necessary jumps as suggested in the example. We’ll look at exactly how we

set this up a little later (since we might also want to automatically set the Case

Definition Met field to No if the person has not been ill). It’s also possible to use

AUTOJUMP followed by the fieldname to make the skip take place regardless of the

value inserted in the field.
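The effect of a jump specification on data entry flow can be sketched in Python as follows. The function names and the shortened field order are hypothetical, and AUTOJUMP is not modelled; the sketch only shows how a value>fieldname list decides which field comes next.

```python
def parse_jumps(spec):
    """Turn a specification such as 'Y>ONSETDATE,N>SAUSROLLS' into a lookup table."""
    jumps = {}
    for rule in spec.split(","):
        value, target = rule.split(">", 1)
        jumps[value.strip().upper()] = target.strip().upper()
    return jumps


def next_field(field_order, current, entered, jumps):
    """Return the field that receives focus after 'entered' is typed into 'current'."""
    target = jumps.get(entered.strip().upper())
    if target is not None:
        return target
    position = field_order.index(current)
    if position + 1 < len(field_order):
        return field_order[position + 1]
    return None  # the last field on the form


# Shortened, illustrative field order for the questionnaire.
order = ["ILL", "ONSETDATE", "ONSETTIME", "DIARRHOEA", "SAUSROLLS"]
ill_jumps = parse_jumps("Y>ONSETDATE,N>SAUSROLLS")
print(next_field(order, "ILL", "N", ill_jumps))
```

Fields without any jumps simply pass focus to the next field in the form, which is why only the fields that branch (such as ILL) need a jump specification.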

Must enter is quite simple – the Yes/No value defines whether or not a value must

be entered for the question. There will be some fields for which this is useful (e.g.

basic personal details like name, perhaps date of birth and address as well,

together with key data points such as whether they were ill and/or meet the case

definition). There may be other fields where being able to leave the field blank is

useful – for example if someone does not remember whether or not they ate Crisps,

it is better that the field is left blank rather than assuming a ‘No’ answer, which

could distort results.

Repeat – if Yes is entered in this rule then the data entered in the previous record

will be repeated in the next new record. Repeated data can be changed during data

entry. This function can save a lot of typing if your forms contain data that changes


only rarely in a particular batch of forms (e.g. reporting forms in a surveillance

system). It is probably of less use in an outbreak situation.

Value labels are a set of values combined with text items that explain the meaning

of each value. For example, a field is created to enter information on the sex of the

informant. It is decided that a value of 1 in the field means that the informant is

male and that a value of 2 means the informant is female. If a value label is defined

then a ‘translation table’ can be shown during data entry if the user presses [F9] (or

the [+] key on the numeric keypad). The value labels in this example would be:

1 Male
2 Female

It is important not to confuse value labels with ranges/legal values – although both

place restrictions on the data that can be entered into the field. Decide what you

want and select the appropriate option – you may not want to have to go to the

trouble of setting value labels if a simple range is all that’s required, and of course

in some situations value labels aren’t relevant.

These are the basic checks that can be attached to a field through the check file

dialog. In addition, you can click Edit to open the check file editor for the current

field and enter check code manually. This is useful for calculating field values based

on what has already been entered, and for more complicated checks (which are

largely outside the remit of this training but covered in detail in the EpiData help

files).

Let’s work through some examples.

Select the ONSETTIME field on the form (or from the dropdown list in the check

dialog). Then type 00.00-23.59 into the Range, Legal box on the dialog. This sets

the range to those relevant to a time setting. Let’s see how this actually appears in

the check code itself – click the Edit button. This brings up another screen showing

the actual check code relevant to this field.

We can see that the code

starts with the fieldname, and

then the code for the range is

included. The word END

indicates the end of the checks

for this field. As we’ll go on to

see, one field can have several

different types of checks in the

code. Click Cancel to close this

dialog without making any

further changes.

Now select the GENDER field. This time we’ll create Value labels to restrict the

options for entry (and to give a guide to data entry staff). Select the Value label

dropdown list – you’ll see that there are some predefined value labels, including

one for ‘sex’. However, the predefined ‘sex’ labels are based on entering a single


digit number, and our gender field is an uppercase text field. So let’s instead create

our own value label list. Select the [none] option from the dropdown list, and then

click the + button next to the list.

The edit checks screen appears, with the following text showing:

LABEL Label_GENDER
END

We then enter the legal values and relevant labels as follows:

LABEL Label_GENDER
  M Male
  F Female
  U Unknown
END

The indenting of the text setting the labels is optional, but makes the code easier to

read. Click Accept and Close to close the window. The value label list now shows

label_gender. This label list can be re-used for other fields if desired – perhaps not

so useful in the case of gender, but if for example you wanted to record details of

portion size in relation to each food consumed, you could define one list of value

labels (e.g. small, medium, large) and apply that to each portion size field.
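The idea of one label list shared by several fields can be pictured in Python as a lookup table that both restricts entries and explains them. This is only an analogy for how value labels behave: the function name is hypothetical, and the S/M/L portion codes and the portion field names are invented for illustration.

```python
GENDER_LABELS = {"M": "Male", "F": "Female", "U": "Unknown"}

# Hypothetical codes for a shared portion-size label list.
PORTION_LABELS = {"S": "Small", "M": "Medium", "L": "Large"}


def check_value_label(value, labels):
    """Return the label for an entered code, or reject codes not in the list."""
    code = value.strip().upper()
    if code not in labels:
        raise ValueError(f"{value!r} is not one of {sorted(labels)}")
    return labels[code]


print(check_value_label("f", GENDER_LABELS))

# One label list can serve several fields (field names are illustrative).
portion_fields = {"SAUSPORTION": "S", "ONIONPORTION": "L"}
for field, code in portion_fields.items():
    print(field, check_value_label(code, PORTION_LABELS))
```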

There are a few fields that we would like to be entered for every questionnaire – for

example, if we do not know if the person was ill or meets the case definition, it is

difficult to draw any conclusions from any other information they have given us. So

we need to make sure that these fields are set to ‘must enter’. Select the ILL field

and change the Must enter option to Yes. Repeat this process for the CASEDEFMET,

ATTENDFUN and ATTENDCLUB fields. Initially it can be tempting to set this option for

most of the fields, but not all fields will be relevant to all interviewees (e.g.

ONSETDATE is only relevant for those who have been ill) and the blank field option

(indicating missing or unknown data) can be important in relation to exposures.
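
The effect of the 'must enter' setting can be sketched in Python as a simple completeness check. This is an illustrative sketch, not EpiData's own implementation; the function name missing_must_enter and the sample record are assumptions.

```python
# Fields set to 'must enter' in this guide.
MUST_ENTER = ("ILL", "CASEDEFMET", "ATTENDFUN", "ATTENDCLUB")


def missing_must_enter(record):
    """Return the must-enter fields that are blank or absent in a record."""
    return [name for name in MUST_ENTER
            if not str(record.get(name) or "").strip()]


# Hypothetical record: ONSETDATE may legitimately stay blank, ATTENDCLUB may not.
record = {"ILL": "N", "CASEDEFMET": "N", "ATTENDFUN": "Y",
          "ATTENDCLUB": "", "ONSETDATE": ""}
print(missing_must_enter(record))
```

Only the four named fields are checked, so a blank ONSETDATE is accepted as missing data rather than treated as an error.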

So far we’ve covered the use of ranges, value labels and must enter checks. Next,

let’s consider the use of jumps. Previously we considered that this might be useful

in controlling data entry flow after the Were you ill? question. Select the ILL field.

There are two options for the contents of the field after entry – Y or N. (By setting the

Must enter property to Yes, the ‘empty’ option is not available). Type the following

into the Jumps option:

Y>ONSETDATE

This sets the flow so that the next field selected after a Y is entered will be

ONSETDATE.

Now we need to enter the details for the N option – add a comma after the text that

is already there, then type:

N>


Instead of typing the name of the appropriate field to jump to, you can also select it

on the screen using the mouse – do this now by clicking on the Sausage rolls

field. EpiData automatically inserts the relevant fieldname (SAUSROLLS) into the

Jumps option.

Before we move on, let’s look at how

this code looks in the editor. Click the

Edit button to bring up the check code

editor. We can see that the JUMPS

options are laid out line by line, and

MUSTENTER follows it, showing how

more than one check can be included

in relation to one particular field. Even

so, all this was created just by using the

dialog box.
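
The jump logic set up for the ILL field can be pictured as a small lookup: a matching value sends data entry to the named field, and anything else falls through to the next field on the form. The Python below is a sketch under that assumption. The shortened field order and the function name next_field are hypothetical, and EpiData itself is not driven this way.

```python
# Hypothetical, shortened field order using field names from this guide.
FIELD_ORDER = ["ILL", "ONSETDATE", "ONSETTIME", "SAUSROLLS"]

# Jumps entered in the check dialog: Y>ONSETDATE, N>SAUSROLLS
JUMPS = {"ILL": {"Y": "ONSETDATE", "N": "SAUSROLLS"}}


def next_field(current, value):
    """Return the field that receives focus after 'value' is entered in 'current'."""
    target = JUMPS.get(current, {}).get(value)
    if target is not None:
        return target  # an explicit jump applies
    position = FIELD_ORDER.index(current)
    if position + 1 < len(FIELD_ORDER):
        return FIELD_ORDER[position + 1]  # no jump: move to the next field
    return None  # end of the form


print(next_field("ILL", "N"))
```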

Now let’s look at calculating one field based on the information entered into one or

more other fields. As an example, we’ll calculate the age of the respondent at the

time of the event, based on the date of the event (a value we will provide in the

code). Since this is a calculation that will be run once the Date of Birth details have

been entered, we actually need to put the relevant code into that field, so select the

DOB field. We can’t create the calculation using the dialog options, so instead click

Edit to bring up the code editor.

The first thing to do is tell EpiData that we want the commands to run after data

entry into the DOB field has finished. We do this using the AFTER ENTRY … END code

block, as follows:

DOB
  AFTER ENTRY
  END
END

Now we can enter the actual code to do the calculation, in between AFTER ENTRY

and the first END command. First, we need to define the date of the event:

DEFINE dateofevent <dd/mm/yyyy>
dateofevent = "14/08/1998"

This defines a temporary ‘variable’ that holds the date of the event – doing things

this way makes the final formula easier to understand. Now we add the actual

calculation that assigns the age of the person to the AGE field.

AGE = trunc(int(dateofevent - DOB)/365.25)

This might seem a little complicated, but by taking it apart it is easier to understand:

1. First we take the difference in days between the event date and DOB: dateofevent - DOB


2. We convert that difference (which EpiData is still treating as a ‘date’) into an

integer, using the int function: int(dateofevent - DOB)

3. Convert the result in days to number of years, by dividing by 365.25: int(dateofevent - DOB) / 365.25

4. It’s likely that the result of this calculation isn’t going to be a round number, so

we use the trunc function to round the result down to the person’s age in

years: trunc(int(dateofevent - DOB) / 365.25)

5. Finally we assign the result of the calculation to the AGE data field: AGE = trunc(int(dateofevent - DOB) / 365.25)
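
The same arithmetic can be checked outside EpiData. The Python sketch below follows the five steps above, using the event date from the check code and the three dates of birth from the sample records later in this guide. The function name age_at_event is an assumption made for illustration.

```python
from datetime import date
import math

# Event date as defined in the check code (14/08/1998).
DATE_OF_EVENT = date(1998, 8, 14)


def age_at_event(dob, event=DATE_OF_EVENT):
    """Mirror AGE = trunc(int(dateofevent - DOB) / 365.25)."""
    days = (event - dob).days   # steps 1 and 2: difference as a whole number of days
    years = days / 365.25       # step 3: convert days to years
    return math.trunc(years)    # step 4: drop the fractional part


# Step 5: the result would be assigned to the AGE field.
for dob in (date(1943, 10, 24), date(1922, 12, 12), date(1992, 10, 13)):
    print(dob.strftime("%d/%m/%Y"), age_at_event(dob))
```

Dividing by 365.25 is an approximation that averages out leap years, so a result could be a day out for someone whose birthday falls on or very near the event date.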

One other thing to do – since we are calculating the AGE field, we don’t need the

data entry form to actually include that field, so we can skip it and go straight to the

Gender field. We can use the Jumps section of the check dialog to do this, so

Accept and Close the code edits that you have made for the calculation and return

to the check dialog. Because we want to jump straight to the Gender field

regardless of the value entered in the DOB field, we use the AUTOJUMP term, as

follows:

AUTOJUMP GENDER

That’s all the changes we need to make, so click Save and then Close on the check

dialog.

Hopefully this makes some sense, and you can follow how check code can be used

to calculate data for a particular field. If it doesn’t, or seems too complicated, don’t

worry too much. Knowing how to use the finer points of calculations and check code

is not essential to using EpiData for outbreak investigation – but it does open up

some of the power of the program in controlling data entry and consistency, and

saving time.

On the other hand, if this has piqued your interest in using check code for running

calculations and controlling data entry, much more information on how to do this

can be found in the EpiData help files. EpiData follows largely the same check code

rules as Epi Info 6 (the DOS version of Epi Info) so if you have access to old check

code programs used in Epi Info 6, they may still work in EpiData (perhaps with some

minor tweaks).

For now, we’ve done enough to create a questionnaire to investigate this outbreak,

with some basic checks and calculations in place to help data entry. In the next

section, we’ll look briefly at how we actually enter data into our EpiData data file.

Entering Data using EpiData


From the main EpiData screen, select the Enter Data button (close any open forms

first if necessary). You’ll be asked to select a data file – choose the data (.REC) file

that you created earlier.

The data entry form that you are probably familiar with by now should appear. This

time you can enter data for real! Note also that the status bar at the bottom of the

screen has some additional buttons for navigating around records in the file.

The table below includes fictional data for three sample records – have a go at

entering them into the form. It’s good to get into the habit of using the Enter and/or

Tab keys to move between the fields, rather than clicking with the mouse – because

check code that has been set to run before or after entry of data into a particular

field will not be run if the mouse is used.

As you go through the data entry, note how the check code we included earlier is

working – inserting calculated results into the AGE field, jumping fields according to

the data that has been entered, requiring data to be entered into a particular field.

At the end of each record, you’ll be asked if you want to save the record to disk –

click Yes.

Field Record 1 Record 2 Record 3

Interview date 20/08/1998 21/08/1998 24/08/1998

Interviewer AGT DJG KJB

Surname Jones Dickens Jenkins

Forename Stephen Charles Hannah

Date of birth 24/10/1943 12/12/1922 13/10/1992

Gender M M F

Occupation Teacher Retired Schoolchild

House number 24 745

House name Ty Gwyn

Street name Gelligaer Street Mill Lane Newport Road

District Cathays Lisvane Rumney

Town Cardiff Cardiff Cardiff

Postcode CF24 4LA

Home tel. 029 2067 8765 029 2045 3234

Other tel. 07796 423659

Did you attend the funeral? Yes Yes Yes

Did you attend the members club? No No No

Were you ill? Yes No Yes

Onset date 15/08/1998 16/08/1998

Onset time 13.00 01.00


Duration (days) 3 4

Inc. period (hrs) 22 34

Diarrhoea Y Y

Vomiting Y N

Abdominal pain Y Y

Nausea Y N

Pyrexia N Y

Headache N Y

Other aches N N

Other symptoms Y N

Other symptoms description Fainted

Case definition met? Y N Y

Sausage rolls Y N Y

Salmon sandwiches N Y N

Pickled onions N Y Y

Corned beef sandwiches N N N

Chicken nuggets Y N Y

Chicken rolls N Y Y

Chicken sandwiches N Y N

Ham sandwiches Y Y N

Ham rolls N N N

Egg rolls Y N Y

Egg sandwiches Y N Y

Pasties Y Y N

Crisps N N Y

Gateaux N Y Y

Cheese & biscuits Y N N

Comments Interview with mother

Basic data entry is as simple as that really.

The only other thing we’ll dwell on here is

navigating between existing records. You can use the additional buttons at the

bottom of the status bar, which show the current record, total number of records, as

well as having buttons for creating a new record and deleting the current record.

When you delete a record, the record is just ‘marked’ as deleted with the word DEL

in the status bar (and therefore isn’t included in any future analysis). However you

can ‘undelete’ the record by simply clicking the delete button again (or using the

option in the Goto menu).

Now that we’ve seen how questionnaires can be designed, and data entered using

EpiData, let’s look at how we analyse the data using the data analysis tools in Epi

Info for Windows.

Outbreak Investigation using Epi Info Analysis


For this section of the course, we’ll move to using Epi Info for Windows, and

specifically the data analysis elements of the software.1

Epi Info for Windows is based around the idea of working on projects, which are

actually based around the Microsoft Access file format. Epi Info for Windows

provides a full package for designing questionnaires, entering data and carrying out

analysis, but because EpiData’s questionnaire design tools are quicker and easier

to use, we made use of that software instead. So we need to import our EpiData

data file (which is stored in REC format, the same format used by Epi Info 6) into a

format that is usable in Epi Info for Windows.

The first thing we need to do is create an Epi Info project that we can import the

data into. We actually do this by starting the process of designing a new

questionnaire. Using the shortcut on the desktop, or via the Start menu, open the

main Epi Info for Windows menu screen. Click the Make View button to open the

Make/Edit View program. The program starts with a blank screen, so create a new

project by selecting File > New. The Create or Open Project dialog appears:

Browse to the directory containing your working files and type in an appropriate file

name, then click Open (which will create the project file). You’ll then be asked to

name the new ‘View’ that you are creating (view is the term Epi Info uses for a

questionnaire design & data table) – as we don’t want to create a new

questionnaire, just click Cancel. Then select File > Exit to return to the main Epi Info

menu.

1 Elements of this section of the training have been taken from “Introduction to Epi Info for Windows” by Andrew G Dean, available at: http://www.epiinformatics.com/Resources.htm

Now we can import our data into the Epi Info Analysis program. To run the Analysis

element, click the ANALYZE DATA button on the main menu screen. The Analysis

program will then open.

Let’s take a quick tour of what the program shows. All the main analysis commands

are shown in the tree view on the left. Clicking on a command will bring up a dialog

that places the command in appropriate form in the program editor at the bottom of

the screen. Results appear as web pages in the Output window, a simplified version

of the Microsoft Internet Explorer browser.

READing data into Analysis

For this section of the training we’ll use some real (anonymous) outbreak data that

broadly matches the scenario we’ve been working with. Before we do that, though,

we’ll import the data we entered into the questionnaire we created earlier, just to

confirm that it all worked! The first thing we need to do is READ that data into

Analysis. The data is in the REC file format used by Epi Info v6 and EpiData.

Click on the READ command. A dialog box appears so that you can choose a

database and a view. Click the button called CHANGE PROJECT and then use the


dialog that pops up to find the project file you created a moment ago. Once you’ve

found the file, select it and click the Open button to return to the main READ dialog.

Since we created a brand new project, no ‘views’ or data tables appear in the list.

Now we need to import our EpiData REC file. On the Data Formats dropdown list,

select Epi6 (which is the same format as EpiData uses). Then, click the button with

three dots … next to the Data Source text box (which will now be empty). Select the

REC file we created earlier from the dialog, and select Open. You’ll now return to the

READ dialog, which should show something like the screenshot below:

Check that you’ve selected the right file, then click OK. Epi Info imports the data

table into Analysis. You’ll notice that some text will appear in the Analysis Output

window detailing the current “view” (Epi Info’s term for a data table), number of

records and the current date and time. You’ll also notice that a command appears

in the Program Editor – every time you carry out a task/function the relevant line of

code will appear in the editor.

The data has now been imported into the Epi Info project – so if you come back at a

future time to carry out more analysis, you can select the relevant “view” from

within the project, rather than having to import the data again. Of course if the data

has been updated or amended then you’ll want to import it again to work with the

most recent information.

We’ll carry out a couple of basic tasks with this data before we import some more

meaningful records.


LISTing basic case details

A common task in outbreak investigation is producing a simple case listing,

including for example name, gender, date of birth, case status, onset date, and so

on.

Click on LIST in the command tree. A dialog box will then appear. Initially, let’s go

with the default settings and produce a grid showing all the data, so just click OK. A

grid then appears over the top of the output window, with scrollbars etc, allowing

you to scroll through all the data currently selected. This is a bit overwhelming so we

need to change our parameters a little to limit the information that appears.

Click the LIST command again, but this time use the drop-down list at the top of the

dialog to select the following variables:

SURNAME FORENAME GENDER AGE CASEDEFMET ONSETDATE ONSETTIME INCUBATION DURATION

Note that the items appear in

the dialog in alphabetical order

as you select them – however

when the list is displayed the

items will appear from left to

right in the order you actually

select them. We can also

choose to have the listing

formatted in the main output

window, rather than the

temporary grid we got the first

time. To do this, select the ‘Web (HTML)’ option from the Display Mode list on the

right hand side of the dialog. When you’ve selected this, some extra options will

appear – the one that is most useful is ‘Fields Per Page’. Putting a zero into ‘Fields

Per Page’ ensures that all the data for one record appears on the same line

(otherwise the table can get split after 6 columns, which can be difficult to read).

Click OK to display the list. Epi Info displays ‘Missing’ to represent blank fields (such

as ONSETDATE for a person who did not suffer illness).
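
For readers who think in code, a case listing amounts to picking named columns in a chosen order and substituting a placeholder for blanks. The Python below is a minimal sketch of that idea using two of the sample records entered earlier. The function name list_rows is hypothetical, and the tab-separated layout is an assumption rather than Epi Info's actual output format.

```python
def list_rows(records, fields):
    """Return a header line plus one tab-separated line per record."""
    lines = ["\t".join(fields)]
    for rec in records:
        cells = []
        for name in fields:
            value = rec.get(name)
            # Blank fields are shown as 'Missing', as in the Epi Info listing.
            cells.append("Missing" if value in (None, "") else str(value))
        lines.append("\t".join(cells))
    return lines


records = [
    {"SURNAME": "Jones", "GENDER": "M", "AGE": 54, "ONSETDATE": "15/08/1998"},
    {"SURNAME": "Dickens", "GENDER": "M", "AGE": 75, "ONSETDATE": ""},
]

# Columns appear left to right in the order they are requested.
for line in list_rows(records, ["SURNAME", "AGE", "ONSETDATE"]):
    print(line)
```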

OK, so we’ve seen that we can import data created in EpiData for analysis in Epi

Info. Let’s now import some real outbreak data for some more meaningful analysis.

Click the Read command again, and repeat the process we followed before (except

that you shouldn’t need to Change Project this time). Select the OutbreakData.rec

file. If all has gone well, the output window should show a record count of 75

records.


For the rest of this section we’ll work with this data to carry out analysis that fits in

with our outbreak scenario.

Producing FREQuencies for various items of data

Generally it is useful to know what the ‘distribution’ of our study population is – for

example by age, location, gender, occupation, illness etc. These sorts of descriptive

statistics are often the starting point for analysis of a set of data. Epi Info’s

Frequencies command allows us to do this. Click the Frequencies command in the

command tree to bring up the FREQ dialog. In the dialog box, use the ‘Frequency of’

dropdown list to select the following variables:

ILL
SEX
PERIOD (meaning incubation period)

Then click OK. After a

short wait, this produces

a frequency breakdown

for each of these data

variables in the Output

window. Scroll up and

down and note that

each table is accompanied by yellow bars to the right that indicate the frequencies.

Epi Info also estimates 95% confidence intervals for each row in the frequency

table, although for most variables these are likely to be of limited use (particularly in

outbreak investigation). If you want to turn off these statistics, click the Settings

button in the FREQ dialog, and ensure the Statistics option is set to None.
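
The core of a frequency table is just a count and a percentage for each distinct value. The Python sketch below shows that calculation on a handful of made-up ILL values (not the outbreak data). It deliberately leaves out the confidence intervals, since the method Epi Info uses for those is not described here. The function name frequencies is hypothetical.

```python
from collections import Counter


def frequencies(values):
    """Return {value: (count, percent)} for each distinct value, sorted by value."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: (n, 100 * n / total) for value, n in sorted(counts.items())}


ill = ["Yes", "Yes", "No", "Yes"]  # hypothetical values for illustration
for value, (n, pct) in frequencies(ill).items():
    print(f"{value:<5}{n:>4}{pct:>8.1f}%")
```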

For numeric (i.e. measured on a continuous scale) variables like PERIOD, we might

also wish to identify some other statistics, such as mean and median values. Epi

Info’s Means command can provide this data. Click the Means command in the

command tree to bring up the dialog – select PERIOD from the ‘Means of’ dropdown

list, and click OK. This produces a similar table to the one we got with the

Frequencies command, but at the bottom additional statistics are provided

including number of observations, mean, median, mode, standard deviation, and

quartiles.
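
The statistics listed can be reproduced with Python's standard statistics module, which may be a useful cross-check on small datasets. The incubation periods below are invented for illustration. Quartile conventions differ between packages, so the quartiles from this sketch will not necessarily match Epi Info's figures exactly.

```python
import statistics

periods = [5, 12, 12, 18, 24, 36, 46]  # hypothetical incubation periods (hours)

summary = {
    "obs": len(periods),
    "mean": statistics.mean(periods),
    "median": statistics.median(periods),
    "mode": statistics.mode(periods),
    "std_dev": statistics.stdev(periods),             # sample standard deviation
    "quartiles": statistics.quantiles(periods, n=4),  # three cut points
}

for name, value in summary.items():
    print(name, value)
```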

(As an aside, putting two or more numeric variables into the MEANS dialog means

that Epi Info also tests whether the difference between the variables is statistically

significant. This isn’t usually of use in an outbreak investigation, where we are more

interested in the Relative Risk and/or Odds Ratio for various exposures, but can be

of value in other types of epidemiological investigation, such as evaluation of a trial

or intervention, comparing a particular measurement before and after the

intervention).


SELECTing particular groups of records

Sometimes you don’t want to run a particular analysis on the entire dataset –

perhaps you only want to know about those people who were ill. In Epi Info Analysis,

the SELECT statement limits subsequent analysis to particular records based on

criteria that you specify. Click the Select command to bring up the SELECT dialog.

The Available Variables dropdown list includes all the available variables in the

dataset. We’ll select only those cases that meet the case definition. Select the

CASEDEF variable, then click the = and “Yes” buttons to put together the full

statement – note that Epi Info uses the notation (+) to denote Yes (it also uses (-)

for No and (.) for Missing (blank) data). Alternatively, you can just type the

statement directly into the dialog. It’s possible to construct more complex

conditional statements – e.g. requiring more than one condition to be met using

AND, or one of a series of conditions to be met using OR.

Click OK when you are finished, and you’ll be returned to the main Analysis screen.

The Output window has updated to show the current statement(s) that are applying,

and the number of records now selected (note that this has dropped from 75 to 56).

Run a FREQ CASEDEF command to confirm that only cases have been selected.


If we now run another SELECT command, that will be processed only on the 56

records that we are currently working with (this provides another way for combining

several statements to select a particular subgroup of the data). Let’s try this by

clicking the SELECT command again. This time, we’ll select all those aged 50 or

over. Select AGE from the dropdown list of variables, then either type or use the

buttons to produce the statement AGE>=50. Click OK, and see how the Output

window has updated to reflect the second SELECT statement. Run a case LISTing of

AGE and CASEDEF to see the selected records.

To get back to the base dataset of all records, simply click the CANCEL SELECT

command, and click OK in the resulting dialog.
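
The way successive SELECT statements narrow the working set, and CANCEL SELECT restores it, can be sketched in Python as repeated filtering of a list. The five records and the function name select are invented for illustration. This is not how Epi Info stores or processes its data.

```python
# Hypothetical records: True = Yes (+), False = No (-), None = Missing (.)
records = [
    {"ID": 1, "CASEDEF": True, "AGE": 54},
    {"ID": 2, "CASEDEF": False, "AGE": 75},
    {"ID": 3, "CASEDEF": True, "AGE": 5},
    {"ID": 4, "CASEDEF": True, "AGE": 62},
    {"ID": 5, "CASEDEF": None, "AGE": 50},
]


def select(rows, condition):
    """Keep only the rows for which the condition holds."""
    return [row for row in rows if condition(row)]


cases = select(records, lambda r: r["CASEDEF"] is True)  # SELECT CASEDEF = (+)
older_cases = select(cases, lambda r: r["AGE"] >= 50)    # then SELECT AGE >= 50
current = records                                        # CANCEL SELECT: back to all records

print(len(cases), len(older_cases), len(current))
```

Applying the second filter to the result of the first has the same effect as a single condition joined with AND.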

You’ll notice that there are also options in the same section of the command tree

for SORTing data. The dialog is self-explanatory so we won’t dwell on it here – like

the SELECT command, the SORT on the data will apply to all future analysis until a

CANCEL SORT command is issued. Before moving on, cancel any current SELECT or

SORT commands so that you are working with the base dataset again.

Recoding data

We saw earlier that, as you might expect, the individuals in our dataset are spread

across a wide range of ages. This makes it difficult to get an accurate, easily

understood picture from Frequencies or Means results about how the population is

distributed. We can overcome this problem by recoding the age data into a series of

age groups.

To do this, we first need to define a new variable that will hold the age group

information for each case. Click the Define command to bring up the DEFINE dialog.

Defining a Standard variable effectively creates an additional field in our dataset,

into which we can then insert data. (Global and Permanent variables are more

commonly used in more complex Epi Info Analysis programs – if you want to know

more about them, read the information in the Epi Info help file). Define the Variable

Name as AGEGROUP, and click OK.

Now that we’ve defined the AGEGROUP variable, we can recode the age data into that

variable. Click the Recode command to show the RECODE dialog.


We want to recode from the base data in the AGE field, so select that in the ‘From’

dropdown list. Our target field for the recoded data is AGEGROUP, so put that into the

‘To’ list.

The dialog provides two ways to

recode data: entering the options by

hand into the grid shown, or by

clicking the Fill Ranges button,

which is what we’ll do in this case.

You’ll see that the dialog changes to

a simpler format, asking for values

for Start, End and By. Start and End

are self-explanatory – the start and

finish points for the recoding. The By

value determines the size or

interval for each group. Enter the

range as follows: Start=10, End=90,

By=10, then click OK. This returns

you to the original dialog, which has

now had the grid filled with recoding

values based on our selections. Notice that the terms LOVALUE and HIVALUE are

used to denote the minimum and maximum values and these rows in the grid are

automatically created for us (which is why we set 10 as the start point, not zero).

Click OK to process the recode.
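
The grouping produced by Fill Ranges with Start=10, End=90 and By=10 can be written out as a short Python function. This is a sketch that assumes each group excludes its lower bound and includes its upper bound, as the labels ">10 - 20" and so on suggest. How Epi Info itself treats a value that falls exactly on a boundary should be checked against its help file. The function name recode_age is hypothetical.

```python
import math


def recode_age(age, start=10, end=90, by=10):
    """Map an age to a group label such as '<=10', '>10 - 20' or '>90'."""
    if age <= start:
        return f"<={start}"  # the LOVALUE row
    if age > end:
        return f">{end}"     # the HIVALUE row
    upper = math.ceil(age / by) * by
    return f">{upper - by} - {upper}"


for age in (5, 20, 54, 75, 91):
    print(age, recode_age(age))
```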

Now run a FREQuencies command on the AGEGROUP variable. Select the CASEDEF

variable in the ‘Stratify by’ dropdown list to produce two separate tables: one for

cases and the other for non-cases. Using the recoded variable gives a much more

concise description of the age distribution in the population.

One final note: Epi Info may give an error if you try to recode with more than 10

separate recoding options, so it’s best to limit the range of ‘rows’ in the recoding

grid to a maximum of 10 options.

Using the Program editor

Because the AGEGROUP variable we defined is only a ‘standard’ variable, the data

will be lost when we exit Epi Info Analysis. Rather than having to go through the

whole process again (not too difficult in this case, but potentially irritating if the

recoding groups have been more carefully hand-crafted), it would be useful to have

a way of saving the commands that we produced so that we can issue them again

in future. Fortunately, Epi Info Analysis allows us to do this, using the Program

Editor.

You’ve probably already noticed that every time you run a command through a

dialog, text is added to the Program Editor window. Let’s look a bit more closely at it.

Scroll through the text and you’ll see a range of commands, starting with the

READing of the original data, the case LISTing, FREQuencies and MEANS operations,


SELECTing and SORTing of records. At the bottom of the text will be the most recent

commands, including DEFINE AGEGROUP and the RECODE commands.

The Program Editor allows us to save this output into the project as an Epi Info

Program. We could save this entire output as a program in Epi Info, and then

running it again at another date would process all the commands listed one after

another. That’s probably overkill, though, and would take a significant amount of

processing time to come up with the results. More usefully, we could save the

commands relevant to the recoding process. Go through the text in the editor and

delete everything that appears before the DEFINE AGEGROUP statement. Then delete

any text after the END statement at the end of the RECODE code block. The Program

Editor should now contain text that looks something like this:

DEFINE AGEGROUP
RECODE AGE TO AGEGROUP
  LOVALUE - 10 = "<=10"
  10 - 20 = ">10 - 20"
  20 - 30 = ">20 - 30"
  30 - 40 = ">30 - 40"
  40 - 50 = ">40 - 50"
  50 - 60 = ">50 - 60"
  60 - 70 = ">60 - 70"
  70 - 80 = ">70 - 80"
  80 - 90 = ">80 - 90"
  90 - HIVALUE = ">90"
END

Click the Save button in the Program Editor – the Save Program dialog will appear:

The Project File field

contains the details of the

current project, and can be

left as it is. Type a name for

your program into the

Program field, put your

name in the Author field

and perhaps include a brief

comment as well. The Date

fields are automatically

filled by Epi Info and are not

editable. When you’re

happy, click OK to save the

program. (Note that the

Text File option also exists

to save the program to a

separate text file – useful if you want to maintain a repository of useful programs to

import into any Epi Info project.)

Now let’s see how this has worked. Exit the Analysis program (there’s a button at

the top left) and return to the main Epi Info menu. Click the Analyze Data button to

reopen a new instance of Analysis. You’ll need to READ in the outbreak data, so do

that (changing project if necessary), selecting the viewOutbreakData option.


Once the data has been read into Analysis, click the Open button on the Program

Editor. The Read Program dialog will appear – this looks very similar to the Save

Program dialog. Click the Program dropdown list and select the program you stored

earlier – the author, date and comment details will then appear. Click OK to read

the program into the Program Editor. The program hasn’t yet been run, so click the

Run button (note that the Run Command button runs only the command currently

containing the cursor). Since none of the commands in our program actually

produces any output of its own, we need to run FREQ AGEGROUP to see if things

have worked. You should get the same age group breakdown that we saw a little

earlier.

Accessing previous results & controlling file storage

By now, you might be wondering where all this Analysis output is being stored. Epi

Info Analysis stores output as HTML files (web pages). By default, these files are

stored in the same folder as the project file, but there are several options available

to customise this in the ‘Output’ section towards the bottom of the command tree.

Epi Info also provides a handy index of all the output that you’ve produced in work

on the project. First, click on CLOSEOUT to close the output file you’ve just been

working on, and then click on the hyperlink called RESULTS LIBRARY at the top of

the output in the browser (you might need to scroll up). An index page appears,

showing previous commands that have produced output files. Click on any of the

entries to display it.

There are a wide range of options for customising storage of data, mostly accessed

via the Storing Output command. The most useful of these is the ability to set the

‘Results Folder’ where output files are stored – perhaps a new subfolder inside your

main project directory. Other settings for archiving data are also available, but get

more involved – refer to the Epi Info help file for more details.

Despite all this, there might come a time when you want to define a specific file in

which to store a particular set of output. In the next section, we’ll be producing

some graphs, so now we’ll create a file specifically to store the graph output in. This

is done using the Routeout command, which pops up a simple dialog asking for an

output filename.

Click the … button to bring

up the file browser dialog –

enter a file name (like

Graphs) and click the Open

button, then click OK in the

main ROUTEOUT dialog.

Any future output will be directed to this file, until a CLOSEOUT command is issued

(when Epi Info will start issuing output files in the default location again.)


Producing simple GRAPHs

A common task in outbreak investigation is to plot an epidemic curve showing the

order and frequency of onset date. Epi Info Analysis has a Graph command that

helps us do this.

Click this command to bring up the GRAPH dialog.

First of all, let’s run a simple bar chart based on ONSETDATE. At the top left of the

dialog is the Graph Type dropdown list. The default setting is BAR, so leave that

alone for now. Below that, select ONSETDATE as the Main Variable for the x-axis. For

the y-axis, we want the Count of the main variable, so this is OK as it is. Click OK.

This displays the graph in a separate ‘Epi Graph’ window, and we can see that the

onset date information is displayed in the graph. This window allows further

customisation of the graph if desired – for now we’ll just accept what we’re given

and return to Epi Info by selecting File > Save & Exit. The graph is then displayed in

the output window, and is also saved as a separate JPEG image file in the results

output directory.


Unfortunately, this graph isn’t terribly helpful – because the incubation period is

relatively short, separating the cases in hours would be more useful. Fortunately,

the outbreak investigators in this case included a field for incubation period in their

data collection, measured in hours. Previously we ran a MEANS command on the

PERIOD variable, so we know that there is a wide range of values from 5 to 46 hours

– and around 30 different values in each case. Producing a Bar chart in this

instance may not be terribly helpful, since we’ll get the same wide range of values –

because a bar chart will produce a bar for each individual value represented in the

data set. But because we’re using a numerical (continuous) field, we can run a

Histogram graph, which gives us more control over the output. Click the Graph

command again, and this time select HISTOGRAM from the Graph Type dropdown

list.

Put PERIOD into the x-axis Main Variable list. Notice also the Interval boxes below.

This is where you can set the intervals that the Histogram bars will be grouped in –

select an appropriate value like 6 hours. The 1st Value box can also be set – the

default is Auto which will select an appropriate value. However, you can adjust this

yourself if you want to force the x-axis to start at 0 or another fixed point – which

makes sense in this case, so set the value to zero. Accept the other settings and

click OK.
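
The grouping that the Interval and 1st Value settings perform can be sketched in Python as follows. The incubation periods are invented, the function name histogram is hypothetical, and the convention that each bar covers its start value up to (but not including) the next start value is an assumption. Epi Info's own treatment of boundary values may differ.

```python
from collections import Counter


def histogram(values, interval=6, first=0):
    """Count whole-number values in bins [start, start + interval), beginning at 'first'."""
    counts = Counter(first + ((v - first) // interval) * interval for v in values)
    last_start = max(counts)
    # Include empty bins so that gaps between peaks stay visible.
    return {start: counts.get(start, 0)
            for start in range(first, last_start + interval, interval)}


periods = [5, 8, 10, 11, 14, 36, 46]  # hypothetical incubation periods (hours)
for start, n in histogram(periods).items():
    print(f"{start:>2}-{start + 5:<2} {'#' * n}")
```

Keeping the empty bins is what lets a second, later peak stand apart from the main one in the printed output.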

The way that the graph appears suggests a point source outbreak, but there is also

a second, smaller ‘peak’ at around 36 hours. How could this be explained?

One possible explanation could be that these are ‘secondary’ cases, who did not

attend the buffet, but consumed the food in the club bar later in the evening, or

contracted the illness through person-to-person contact with cases. We can run the


graph again, and this time separate the data into two ‘series’, based on whether or

not the case attended the funeral itself. Open the GRAPH dialog again, select

HISTOGRAM as the graph type, PERIOD as the main variable, and an interval of 6

hours, with a 1st value of zero. We still want a count of cases on the y-axis, but we

also want to display bars for the series – so select the variable ATTEND in the ‘Bar

for each value of …’ dropdown list. It is possible to set the title, but we can do this

at the customisation stage, so we’ll look at that in a moment.

Click OK to produce the graph in Epi Graph. Now we can see that there are two

separate sets of bars for those who did and did not attend the funeral.
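
Splitting the histogram into series amounts to counting cases per interval separately for each value of the grouping variable. A minimal Python sketch of that idea follows; the function name and the (PERIOD, ATTEND) pairs are hypothetical and are not taken from the course data.

```python
from collections import defaultdict

def histogram_by_series(records, interval=6, first_value=0):
    """Return {series: {bin_start: count}} for an iterable of (value, series) pairs."""
    table = defaultdict(lambda: defaultdict(int))
    for value, series in records:
        start = first_value + ((value - first_value) // interval) * interval
        table[series][start] += 1
    return {s: dict(sorted(b.items())) for s, b in table.items()}

# Hypothetical (PERIOD in hours, ATTEND) pairs
cases = [(8, "Yes"), (10, "Yes"), (13, "Yes"), (16, "Yes"),
         (35, "No"), (38, "No"), (40, "Yes")]
curves = histogram_by_series(cases)
print(curves)
```

In this made-up example the ATTEND=No cases appear only in the later intervals, which is the pattern we would expect if they were secondary cases.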

Nevertheless, by customising the graph we can make it easier to see exactly what’s

going on. In Epi Graph, select

View > Customization …

The Customization dialog allows

us to modify many of the factors

relating to the design and

presentation of the graph. On the

General tab, we can add or

modify the Title and Subtitle, set

font sizes, whether the graph

should appear in colour or

monochrome, apply gridlines, and

determine whether a data table is

produced to go with the graph.

Add a title and subtitle for the

graph if you wish – you can leave most of the other settings as they are unless you

particularly want to add grid lines for the y-axis to make it easier to read across the

counts.

Now select the Plot tab in the

dialog. The Plot Style list in the

centre of this tab gives a range of

options for displaying the graph

series. Currently the separate

bars mean that it is hard to pick

out the actual epidemic curves.

We can change the display to

make it easier to see what is

happening – select Area from the

Plot Style list (which displays the

series as filled areas).

To see what difference a change

makes, you can click the Apply button, which applies the customisations to the

graph without closing the dialog. It might also help to make the series show in 3D,

so select that option from the bottom left. The other tabs can be left as they are



unless you would like to change the Font, Color or Styles used for the display –

these tabs are self-explanatory.

Access to many of these settings can also be obtained by right-clicking on the graph

and selecting from the pop-up menu options that appear.

Once you’re happy with the settings, click OK in the dialog to have a proper look at

your graph. It’s now easier to see what’s been happening. When you’re finished you

can select File > Save & Exit to return to the Analysis window. (You can also select

File > Export to export the graph as an image to the clipboard, direct to a printer, or

a file location of your choice.)

An example of the type of results that are achievable using Epi Graph is shown

below (you would probably show the graph in a larger format for a full outbreak

report).

[Figure: “Epidemic Curve”, subtitle “Incubation period since buffet (hours)”. x-axis: PERIOD, 0 to 48 hours in 6-hour intervals; y-axis: COUNT; series: ATTEND=Yes and ATTEND=No.]

We’ve finished working with the graphs now, so click the Closeout command to stop

routing output to the Graphs file we created earlier.

Producing 2x2 tables to assess relevance of exposures

The chief purpose of conducting an epidemiological study in outbreak investigation

is to try to identify statistically significant associations between certain exposures

and illness. This is normally done through the construction of ‘2x2’ tables,

comparing the rates of illness in those exposed to a particular risk factor to the rate

in those who were not exposed.

Epi Info’s Tables command allows us to produce these tables, together with

relevant statistics. Let’s investigate the relevance of exposures in our outbreak

example. We’ll start by looking at consumption of ham sandwiches. Click the Tables

command to bring up the TABLES dialog.



Select the variable HAM in the Exposure Variable list. Our outcome variable for all

this analysis will be CASEDEF, representing whether or not an individual meets the

case definition (the alternative would be ILL, but there may be some people who

have reported illness but do not meet the case definition). Accept all the other

default settings, and click OK.

The 2x2 table will be displayed in the output window, followed by a range of

different statistics. Presentation of the 2x2 table is fairly obvious, but let’s spend a

little time considering the various statistics. The statistics produced from the above

analysis are shown below.

Single Table Analysis

                                  Point       95% Confidence Interval
                                  Estimate    Lower       Upper
PARAMETERS: Odds-based
Odds Ratio (cross product)        0.6771      0.2368      1.9365 (T)
Odds Ratio (MLE)                  0.6806      0.2286      1.9691 (M)
                                              0.2038      2.1855 (F)
PARAMETERS: Risk-based
Risk Ratio (RR)                   0.9065      0.6960      1.1807 (T)
Risk Difference (RD%)            -7.3257    -26.9187     12.2672 (T)

(T=Taylor series; C=Cornfield; M=Mid-P; F=Fisher Exact)

STATISTICAL TESTS                 Chi-square  1-tailed p    2-tailed p
Chi square - uncorrected          0.5319                    0.4658206779
Chi square - Mantel-Haenszel      0.5248                    0.4688093873
Chi square - corrected (Yates)    0.2151                    0.6428085146
Mid-p exact                                   0.2409736241
Fisher exact                                  0.3219542501

In a case-control study, we would be interested in the odds ratio (cross product),

and risk-based measures should not be used. In a cohort study, we are able to use

risk-based parameters as the total population exposed is a known quantity – so the

Risk Ratio (also known as the Relative Risk) is the most useful basic measure of

risk. In this case the RR is 0.9065, which suggests very little difference in outcome

based on this exposure (strict interpretation: those eating ham sandwiches were



0.9065 times as likely to be cases as those who did not). Since the 95% Confidence

Interval includes the ‘no difference’ value of 1, we know that this difference is not

statistically significant at the 95% confidence level.
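
For readers who want to see where a Risk Ratio and its confidence interval come from, here is a minimal Python sketch. It uses the standard textbook log-based (Taylor series) approximation; we assume, but have not confirmed, that this matches Epi Info’s own calculation in every detail. The cell counts are hypothetical, since the underlying 2x2 counts are not reproduced here.

```python
import math

def risk_ratio(a, b, c, d, z=1.96):
    """Risk ratio with an approximate 95% CI.

    a, b = exposed and ill / exposed and not ill
    c, d = unexposed and ill / unexposed and not ill
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    rr = risk_exposed / risk_unexposed
    # Standard error of ln(RR), textbook Taylor series approximation
    se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper

# Hypothetical counts: 30 of 50 exposed were ill, 20 of 50 unexposed were ill
rr, lower, upper = risk_ratio(a=30, b=20, c=20, d=30)
# If the interval contains 1, the difference is not significant at the 95% level
significant = not (lower <= 1 <= upper)
print(f"RR = {rr:.2f} (95% CI {lower:.2f} to {upper:.2f}), significant: {significant}")
```

The final test mirrors the rule described above: an interval that includes the ‘no difference’ value of 1 means the result is not statistically significant.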

The statistics provided also include chi-square test results in the form of p-values,

achieved by a number of different statistical procedures. In general, where a

reasonably large dataset/sample size has been used, there is unlikely to be an

important difference between these procedures. However, where the differences

between the procedures are important (e.g. one identifies a statistically significant

result, but another does not), you should seek the advice of an epidemiologist or

statistician to assist with interpretation of the results.
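
The three chi-square procedures differ only slightly in how they are calculated, which is why they usually agree. The Python sketch below uses the usual textbook formulas for a 2x2 table with one degree of freedom; the function names and the example counts are hypothetical, and the formulas are assumed rather than confirmed to be the ones Epi Info applies.

```python
import math

def chi_square_tests(a, b, c, d):
    """Uncorrected, Mantel-Haenszel and Yates-corrected chi-square for a 2x2 table."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    diff = abs(a * d - b * c)
    return {
        "uncorrected": n * diff ** 2 / denom,
        "mantel_haenszel": (n - 1) * diff ** 2 / denom,
        # Yates' correction shrinks the difference by n/2 (never below zero)
        "yates": n * max(diff - n / 2, 0) ** 2 / denom,
    }

def p_two_tailed(chi_square):
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi_square / 2))

# Hypothetical counts (not the ham sandwich table)
results = chi_square_tests(a=30, b=20, c=20, d=30)
for name, value in results.items():
    print(f"{name:<16} chi-square = {value:.4f}  p = {p_two_tailed(value):.4f}")
```

As a cross-check, passing the uncorrected chi-square value of 0.5319 from the output above to `p_two_tailed` gives approximately 0.466, in line with the 2-tailed p-value that Epi Info reports.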

This analysis indicates that consumption of ham sandwiches was not associated

with illness in this outbreak. However, other foodstuffs might be implicated in the

outbreak, and assessments of the strength of association between consumption

and illness for each of the menu items should be completed.

Appendix IV contains a worksheet containing an empty table for you to record the

results of this analysis. A completed version of the worksheet is included on the

course CD-ROM.

Using Analysis with routine COSURV data



So far we’ve concentrated on the outbreak scenario, but there are also times when

we want to carry out analysis on routinely collected surveillance data, particularly the

information held on Cosurv. Cosurv has the ability to export data into the REC

format readable by Epi Info for Windows. This short section looks at how we export

that data out of Cosurv, and some common analysis tasks that we might wish to

perform.

NB This process will overwrite any existing/previous exports that you have done (i.e.

the file EpiXport.REC – read on for details) so please make sure that you have taken

a copy of any previous exports if the data is still required.

Exporting data from Cosurv

Open Cosurv, and enter your username and password. From the Cosurv main

screen, select Export > EpiInfo & ASCII. The Export dialog will then appear:

Select the dates for the relevant period you want to export – say, all of 2006 –

then click the Get Records button. The list of records to be exported will appear in a

‘spreadsheet’ format in the main part of the dialog. You can use the scrollbars to

move around the list and explore the data that will be exported. One word of

warning: if a specific record has been selected in that dialog, and then the Export

button is clicked, only that record will be exported (but potentially hundreds of

times, depending on the size of the dataset). So if you have unwittingly clicked on a

specific record, try clicking off the dialog somewhere to deselect it. This may be a

bug in Cosurv, or it may be intended behaviour – either way, it probably isn’t what

you want to happen.



If you’re happy with the list that’s shown, click the Export button. Cosurv will process

the export – you may see a DOS program window pop up briefly, as Cosurv actually

uses one of the old Epi Info 6 DOS utilities to carry out the export. Once the export

has been processed, you’ll be returned to the main Cosurv screen.

The exported data has been placed into an Epi Info REC file named EpiXport.REC

(NB the data has also been exported in ASCII delimited text format in a file named

csExport.txt – a format suitable for import into spreadsheet software such as Excel).

But where is it? The location of this file depends on your current Cosurv settings.

You can find these out from the main Cosurv screen by selecting System > Setup >

District, which brings up the District settings dialog.

The final box on the General tab shows the location for Epi Info export files. When

you’ve done an export, it’s strongly recommended that you move or rename this file

appropriately to stop it being overwritten by future exports.
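
If you export regularly, the copy-and-rename step can be scripted so that it is never forgotten. The following Python sketch is one possible approach, not part of Cosurv or Epi Info: the function name, the naming pattern and the example folder are all hypothetical, and you would substitute the export location shown in your own District settings.

```python
import shutil
from datetime import date
from pathlib import Path

def archive_export(export_dir, label, today=None):
    """Copy EpiXport.REC to a dated name so the next export cannot overwrite it."""
    today = today or date.today()
    source = Path(export_dir) / "EpiXport.REC"
    target = source.with_name(f"EpiXport_{label}_{today:%Y%m%d}.REC")
    if target.exists():
        # Refuse to replace an earlier archive of the same name
        raise FileExistsError(f"{target} already exists")
    shutil.copy2(source, target)
    return target

# Example call (hypothetical folder; use the location from your District settings):
# archive_export(r"C:\Cosurv\Export", "2006")
```

Remember that the archived copy is no more protected than the original export, so it should be stored with the same care.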

The data that you’re provided with for this training has been anonymised, with key

patient-identifying data (names, addresses, telephone numbers, employer details, ID

numbers etc.) removed. Obviously when working with live data these will be

available, and this is something to bear in mind since it is important that data

protection and confidentiality guidelines are followed. The Cosurv data system itself

is encrypted, i.e. data can only be accessed through the password-protected Cosurv

database itself – anyone trying to access the datafiles via other methods will just

read encrypted gobbledegook. However, data exported in Epi Info (or ASCII text)

format is not encrypted, and can be read by anyone with a text editor or

spreadsheet. Therefore, you need to keep careful control over how exported data is

stored and manipulated.



Importing data into Epi Info

Since the data is now in the REC file format, importing the data into an Epi Info

project follows the same process as discussed earlier in relation to the outbreak

data. In reality, you might want to create a separate new Epi Info project for this

routine data, but for the purposes of this training, we’ll use the same project that

we’ve been working with so far. So use the READ command in Epi Info Analysis,

select Epi6 from the Data Formats dropdown list, and then browse to select the

Cosurv2005.REC data file. Click OK to import the data into Epi Info Analysis.

Common analysis tasks for routine surveillance data

Let’s briefly look at some of the common analysis tasks that we might want to

complete with the Cosurv data. Of course, Cosurv itself has its own reporting

mechanisms through the District Reports section, but these apply only to certain

predetermined formats – using Epi Info allows us full control over the data and its

presentation. The data supplied is the anonymised surveillance reports held by

Cardiff Council for the calendar year 2005.

First of all, let’s generate some basic frequencies. Click the Frequencies command

and include the following fields in the dialog:

SEX Gender of the case

LOCALITY Locality (in Cardiff, we use electoral ward)

DISEASE Disease (as used in NOIDS return)

TRANSMISS Suspected mode of transmission

CONTRACTED Suspected source – home, commercial, unknown

The usual list of frequency tables will appear. You may or may not be surprised by

the results!

It might also be interesting to study the distribution of cases by age in the

population. Age data is stored in two ways in the Cosurv data system – the DOB

field (date of birth), and the combination of the fields AGEY (age in years) and AGEM

(months passed of current year) – so a case aged 18 months at the time of the data

entry would have AGEY=1, AGEM=6. We saw before that age has such a wide range

of values that it is often easier to recode age information into age groups for

analysis.

Let us reuse the program we created earlier in the outbreak analysis. Load the

program into the Program Editor using the Open button. We need to make one

minor modification – the variable we use for age is AGEY, whereas in the program it

refers to AGE – make any necessary modifications and then run the program. Now

run FREQ AGEGROUP to see the distribution of cases by age group (equally, we

could use this variable to construct a bar graph, or even a population pyramid by

stratifying by gender).
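
The recoding step itself is simple to picture. The Python sketch below shows the general idea of mapping AGEY to an age group and counting the groups; the band boundaries and sample ages are hypothetical and may differ from those in the program created earlier in the course.

```python
from collections import Counter

def age_group(age_years):
    """Map an age in completed years to a band label (bands are hypothetical)."""
    bands = [(0, 4), (5, 14), (15, 24), (25, 44), (45, 64)]
    if age_years is None or age_years < 0:
        return "Unknown"
    for low, high in bands:
        if low <= age_years <= high:
            return f"{low}-{high}"
    return "65+"

# Hypothetical AGEY values, including one missing value
ages = [1, 3, 17, 30, 70, None]
group_counts = Counter(age_group(a) for a in ages)
print(group_counts)
```

Six different ages reduce to five groups here; with a full year of surveillance data the reduction is far greater.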



We’ll now focus a bit more on the data that specifically relates to cases of Food

Poisoning. To do this we need to select only those cases where DISEASE="Food

Poisoning". Click the SELECT command and pick the DISEASE variable from the

dropdown list, then type ="Food Poisoning" into the dialog after the variable

name. (Alternatively, you can use the DISCODE variable and type =22, which achieves

the same thing and is quicker and easier to finish – but of course you have to know

that the DISCODE for Food Poisoning is 22).

Now let’s look at the various types of organism that have been associated with

cases of food poisoning – so run a Frequencies command for the ORGANISM field.

You’ll note that organisms such as Campylobacter are basically caught under one

field, but because most isolates of Salmonella are sent to HPA Colindale for further

typing, the full breakdown of species is available for these cases.
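
Conceptually, a SELECT followed by a Frequencies command is a filter followed by a count. The Python sketch below illustrates this with a handful of hypothetical records; the function name and all field values are made up for the example and are not drawn from the Cosurv data.

```python
from collections import Counter

def frequencies(records, field, **criteria):
    """Count the values of `field` among records that match every criterion."""
    selected = [r for r in records
                if all(r.get(k) == v for k, v in criteria.items())]
    return Counter(r.get(field) for r in selected)

# Hypothetical records
records = [
    {"DISEASE": "Food Poisoning", "ORGANISM": "Campylobacter"},
    {"DISEASE": "Food Poisoning", "ORGANISM": "Salmonella Enteritidis"},
    {"DISEASE": "Food Poisoning", "ORGANISM": "Campylobacter"},
    {"DISEASE": "Measles", "ORGANISM": ""},
]
organisms = frequencies(records, "ORGANISM", DISEASE="Food Poisoning")
print(organisms.most_common())
```

The record for the other disease is excluded before counting, just as it is once a selection is active in Epi Info Analysis.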

As a final exercise, let’s investigate seasonal variation in notification of food

poisoning cases. Select the Graph command, and choose the LINE graph type.

Select the WEEK field as the main variable for the x-axis (WEEK is the NOIDS week

during which the case was notified), and leave the other default settings as they

are, since what we are looking for is the count of cases in each week. The graph will

be produced (and as before, you can continue to tinker with the display settings,

titles etc). The line is quite jagged, because there is inevitably some variation from

week to week anyway – if you wanted to ‘flatten’ this out a bit, you could produce a

similar graph in HISTOGRAM type, with an INTERVAL of either 2 or 4 weeks to

aggregate the data a bit closer (fortnightly figures are quite useful).
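
The smoothing effect of a wider interval can be seen by aggregating week numbers by hand. This Python sketch groups weeks into blocks of 2 or 4; the function name and sample weeks are hypothetical, and starting the first block at week 1 is an assumption that may not match how Epi Info positions its histogram bars.

```python
from collections import Counter

def aggregate_weeks(weeks, interval=2):
    """Group 1-based week numbers into blocks of `interval` weeks.

    Returns {first_week_of_block: count}.
    """
    counts = Counter(((w - 1) // interval) * interval + 1 for w in weeks)
    return dict(sorted(counts.items()))

# Hypothetical notification weeks, one entry per case
weeks = [1, 2, 2, 3, 5, 6, 6, 6, 30, 31]
fortnightly = aggregate_weeks(weeks, interval=2)
four_weekly = aggregate_weeks(weeks, interval=4)
print(fortnightly)
print(four_weekly)
```

The wider the interval, the fewer and taller the bars, at the cost of hiding short-lived peaks.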

What factors might explain some of the variation noted?

This training guide has introduced the basic analysis commands available in Epi

Info Analysis, with particular reference to the common epidemiological tasks

undertaken in outbreak investigation. Epi Info Analysis is a very powerful tool, and

we have only scratched the surface of the various options available, not to mention

the advanced statistical analysis that is possible. If you want to know more, a good

starting point is the further information and resources detailed in Appendix III.

Other capabilities of EpiData and Epi Info



EpiData

EpiData includes a number of additional capabilities that have not been covered in

this training guide. Some of these functions are discussed briefly below – for more

information, consult the excellent documentation in the EpiData help file.

Revise an existing questionnaire/data file

In the training, we produced a new questionnaire and REC file from scratch, in one

session. In reality, you may well design a questionnaire, pilot it with a few

individuals, and then come back and amend the questionnaire based on the

outcome of the pilot. EpiData allows you to amend an existing QES and REC file

structure. Data already entered will be copied to the new format for fields with the

same name. You can add fields, change definition of fields or delete fields. Use the

function Revise File found in the Tools menu from the main screen (close all files

first). Full instructions on how to do this are in the EpiData help file.

WARNING: If you remove fields or change the names of the fields (e.g. by changing

the text to the left of the fields) you will lose data. Please check the new, revised

data file carefully. If something goes wrong, the original data file can be restored.

The original data file is saved with the name FILENAME.OLD.REC in the same

directory as the new data file.

This ability to revise existing structures also means that it is possible to create a

‘template’ questionnaire and data file (with check code if desired) for investigation

of outbreaks. Design the basic structure once (much as we did in the training),

including patient details, medical information on illness and symptoms, details of

contacts, and so on. Create the REC file and any check code that you want to

include. When a new outbreak occurs, you can use this template as a basis by

copying and pasting all the relevant files into a new folder (so that you don’t

overwrite your ‘master template’). Then all you need to do is revise the

questionnaire to include the specific exposures that are of interest in the particular

situation, update the REC file, and you have a new questionnaire and data file

structure ready to start working with. This sort of preparation work can be valuable

in saving time in the pressured situation of a major outbreak.

Creating encrypted fields

One of the options for text fields in EpiData is the ‘Encrypted’ field (denoted in a QES

file by the notation <E >, with the total length including spaces determining

the field size). Encrypted fields are a special kind of text field. The contents of

encrypted fields are shown in readable form on the screen, but saved on disk with

encryption. Use encrypted fields to store data in a protected mode, e.g. personal

information. When a datafile is created and the QES-file contains one or more



encrypted fields, EpiData will prompt for a password that is saved in the datafile. A

datafile containing encrypted fields can only be opened if the proper password is

entered. There is no way of breaking the encryption, so do not forget the password

or the information in that field is lost.

Data file documentation

EpiData can automatically produce a file documenting the structure and checks in

place for a particular REC file. Select Document > File Structure and then choose

the REC file you want to document in the dialog box. EpiData then produces a text

file detailing the file structure, field types and checks in place. This can then be

saved (EpiData saves these files with the .NOT extension).

Data entry notes

When entering questionnaire data in EpiData, it is often useful to make notes (e.g.

if a word written on a questionnaire is difficult to read). The Data Entry Notes

function can be used for making short notes either during data entry or when no

data file is open.

During data entry the notes can be accessed by pressing [F8]. If no note file exists

for the current data file, a new file will be created. Current time and date are

automatically inserted in the notes.

Logical consistency check

Consistency checks are a special type of check code – rather than being associated

with a particular field, they appear in a separate block (and can even be a separate

text file), and are run by selecting Document > Consistency Check. This provides

‘batch’ checking as opposed to the interactive (i.e. as data is entered) checking that

is also offered by check commands. A number of consistency checks may be

defined in the check file or another text file. The function will show a list of all

records in the data file that fail one of the specified consistency checks.
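
To illustrate the idea of batch checking (as distinct from EpiData’s own check-file syntax, which is documented in the help file), the Python sketch below runs a set of named rules over every record and lists the failures. The rules, field names and records are hypothetical.

```python
def failed_checks(records, checks):
    """Return (record_number, check_name) for every record that fails a check."""
    failures = []
    for number, record in enumerate(records, start=1):
        for name, rule in checks.items():
            if not rule(record):
                failures.append((number, name))
    return failures

# Hypothetical rules: each returns True when the record is consistent
checks = {
    "age in range": lambda r: 0 <= r["AGE"] <= 120,
    "cases must be ill": lambda r: r["CASEDEF"] != "Y" or r["ILL"] == "Y",
}

# Hypothetical records already entered before the checks existed
records = [
    {"AGE": 34, "ILL": "Y", "CASEDEF": "Y"},
    {"AGE": 150, "ILL": "Y", "CASEDEF": "Y"},
    {"AGE": 41, "ILL": "N", "CASEDEF": "Y"},
]
failures = failed_checks(records, checks)
print(failures)
```

The output is a list of record numbers to go back and review against the paper questionnaires.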

Consistency checks are particularly useful where some data has already been

entered before interactive checks were added to the data file, and so you want to

check the existing data for consistency as well as any future records that may be

added.

Double entry and validation

To ensure a high quality of data, often it is a good strategy to have two different

persons enter the same data. In EpiData this can be done in two different ways:

either by entering the same data in two separate data files, which later can be

compared, or by entering in double entry mode where the new data is immediately

compared with the original data. Detailed instructions on these processes are

available in the EpiData help file.
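
The principle behind validating two separately entered files can be sketched in Python as a field-by-field comparison. This is a simplified illustration only: it pairs records by their position in the file, and the function name, the `ignore_case` and `ignore_fields` parameters and the sample records are all hypothetical rather than a description of how EpiData performs the comparison.

```python
def compare_entries(first, second, ignore_case=False, ignore_fields=()):
    """Return (record_number, field, value1, value2) for each mismatch."""
    differences = []
    for number, (rec1, rec2) in enumerate(zip(first, second), start=1):
        for field in rec1:
            if field in ignore_fields:
                continue
            v1, v2 = rec1[field], rec2.get(field)
            if ignore_case and isinstance(v1, str) and isinstance(v2, str):
                same = v1.lower() == v2.lower()
            else:
                same = v1 == v2
            if not same:
                differences.append((number, field, v1, v2))
    return differences

# Hypothetical records keyed by two different people
entry_a = [{"ID": 1, "NAME": "Smith", "HAM": "Y"},
           {"ID": 2, "NAME": "JONES", "HAM": "N"}]
entry_b = [{"ID": 1, "NAME": "SMITH", "HAM": "Y"},
           {"ID": 2, "NAME": "Jones", "HAM": "Y"}]

strict = compare_entries(entry_a, entry_b)
relaxed = compare_entries(entry_a, entry_b, ignore_case=True)
print(len(strict), "differences strictly;", len(relaxed), "ignoring letter case")
```

Ignoring letter case removes the two spurious name mismatches and leaves only the genuine disagreement over the HAM field.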



Data backup and encryption

EpiData has a simple data backup option – select Export Data > Backup, choose a

data file to backup and select a destination directory. Press OK to begin the backup.

This function creates a copy of the selected data file, a .QES file of the same name,

a check file of the same name and a data entry notes file of the same name in the

selected destination directory.

As an alternative to a plain backup which simply copies the files, you can compact

files and complete directory structures with the archive command in the tools

menu. This will save files in a Zip formatted file (which can be handled by other

programs e.g. WinZip, Windows Explorer on Windows XP machines). As an option

these archives can be encrypted using “strong encryption” made with the

AES/Rijndael algorithm. There is no way to find or guess a forgotten password, so

make sure you keep a copy in a safe place.

EpiData Analysis

EpiData also comes with a separate Analysis module, which is based on the

Analysis module in Epi Info 6 for DOS. Many of the commands are the same as

those used in Epi Info for Windows. In the author’s opinion, the Epi Info Analysis is

easier to use and more powerful than EpiData Analysis, which is why that software

was used for this training. However, the EpiData Analysis module is a relatively

small file, and can be installed on a PC without administrator privileges – it also

does not require a particularly high-specification of PC hardware. The final

advantage is that both EpiData and EpiData Analysis can be installed and run from

a USB stick, since they do not rely on any specific Windows libraries. The EpiData

Analysis installation file is included on the course CD-ROM.

Epi Info for Windows

In addition to the questionnaire creation, data entry and Analysis modules, Epi Info

for Windows includes several other modules and functions that may be of interest.

Again, a summary is given below, with further information available in the Epi Info

help file (see also the resources listed in Appendix III).

Map creation

Clicking Create Maps on the main Epi Info window opens the Epi Map module. Epi

Map is designed to show data from Epi Info files by relating data fields to shapefiles

containing the geographic boundaries. Shapefiles are an industry standard for

Geographic Information Systems (GIS) used in the popular ArcView and ArcInfo

programs. Shapefiles of every country in the world can be downloaded from the Epi

Info website under “Maps”, and the Internet provides thousands of such files.

Shapefiles can be drawn or edited in either ArcView or Epi Info for emergency or

custom purposes. Shapefiles also can contain data on population or other

variables, and can therefore provide numeric data that become part of the display

either as numerator or denominator. Numeric data can be displayed either as



color/pattern maps or as dot density maps with the dots randomly distributed within

geographic regions.

The maps shown below were created using Epi Map. The first map shows the

percentage of births in which the mother was an adolescent in 1995 for each

Mexican state, whilst Map 2 shows the distribution of cholera cases (triangles)

around the Broad Street pump in Victorian Soho.



An excellent tutorial on using Epi Map to present geographical data in a visual

format can be found in Andrew Dean’s Introduction to Epi Info for Windows,

available at http://www.epiinformatics.com/Resources.htm.

StatCalc – a simple epidemiological calculator

StatCalc is an epidemiologic calculator that produces statistics from summary data

entered on the screen – particularly useful when you only have the summary data

available (and so the conventional tools in Epi Info Analysis are not suitable). Three

types of calculations are offered:

♦ Statistics for 2-by-2 and 2-by-n tables

♦ Sample Size Calculations

♦ Chi Square for Trend

You can access StatCalc from the main Epi Info menu, by selecting Utilities >

StatCalc.

Producing reports of analysis results

Clicking Create Reports opens the Epi Reports module. The Epi Report tool can be

used to design and generate various reports. An end user can include various

elements in the reports generated, and these elements can be bound to various

data sources. Record lists, cell replacement, groups, and fragments from analysis XML

can be included in the report. Data analysis through pivot tables is also provided.

Read the Epi Info help file for more details, and see also the CDC Epi Info Exercises

file, which is included on the CD-ROM with the installation files – one of the

chapters in this file deals with using Epi Reports.

Encrypting Epi Info Projects

Epi Lock provides encryption to control access to data files and compression to

facilitate both transmission and making backup copies. Using Epi Lock, files can be

encrypted with a password to reduce the risk of others being able to read the data.

The encryption mechanism also provides some assurance that the files have not

been altered in transmission, since they will not decrypt properly if altered. Epi Lock

also performs the function of properly deleting the unencrypted (“clear

text”) version of the file after testing to make sure that the encrypted version can be

decrypted to produce the identical result. Access the Epi Lock functions by selecting

Utilities > Epi Lock from the main Epi Info menu.

Appendix I – Comparison of Epi Info & EpiData



This article was published in Eurosurveillance Weekly Release surveillance report

(vol 12, issue 1, 12 January 2007). The original can be accessed online at:

♦ http://www.eurosurveillance.org/ew/2007/070111.asp#4

A comparison of public domain software for outbreak

investigations: Epi Info and EpiData

G Kafatos ([email protected]), NQ Verlander, D Gelb, E Arnold, A Charlett

Statistics, Modelling and Bioinformatics Department, Health Protection Agency

Centre for Infections, London, UK

Since the early 1980s, there has been an increase in the number of outbreaks

investigated using analytical epidemiology since it became widely accepted that

microbiological investigation alone is often insufficient [1, 2, 3]. This increase has also

been encouraged by a growing availability of statistical tools within public health

and the establishment of international and national training programmes and

courses (such as the European Programme for Intervention Epidemiology Training).

In 1985, a team at the Centers for Disease Control and Prevention (CDC) in the

United States created a suite of public domain computer programs called Epi Info

(http://www.cdc.gov/epiinfo/), which were designed to support field

epidemiological investigations of outbreaks.

We compare the two most recent versions of Epi Info, and EpiData, a public domain

software based on Epi Info 6.04d.

Epi Info 6.04d

Since the release of its first version written for the DOS-environment in 1986, Epi

Info quickly became very popular amongst epidemiologists and other public health

professionals. Its popularity is demonstrated by the fact that the program is

available in 16 different languages. In 2001, the CDC released the final version of

Epi Info DOS (Epi Info 6.04d).

Epi Info DOS consists of a suite of programs, which provide all that is necessary to

support an outbreak investigation. It is a flexible, versatile tool that permits the

construction of a data entry system and an underlying database with great speed,

which is something greatly appreciated under the time pressures of an on-going

outbreak investigation.

In addition, Epi Info DOS contains an analysis module for elementary statistical

analysis that is very useful for outbreak investigations where the outcome is often

binary (case/control, ill/not ill). The ‘tables’ command provides all the necessary

measures of the strength of association, such as odds ratios (OR) and risk ratios

(RR), their confidence intervals, and hypothesis tests. It also allows stratified



analysis to remove the effects of up to three confounding factors. The ‘csample’

module enables cluster analysis and stratified designed surveys. Epi Info DOS was

complemented by a CDC public domain mapping software (Epi Map, CDC), allowing

area maps to be produced directly from Epi Info files.

Epi Info for Windows 3.3.2

Despite the unprecedented success of Epi Info DOS, by the late 1990s, the

widespread uptake of the Microsoft Windows operating system for personal

computers meant that a Windows version was needed. Epi Info 2000 (Epi Info 3) for

Windows was released in 2000. However, there are substantial differences

compared with Epi Info DOS.

The analysis capability of Epi Info DOS was carried forward to Epi Info 3 while

adding the ability to estimate proportional hazards models and integrating logistic

regression and mapping facilities into the package. The software uses components

from commercial software, which enhances its compatibility with ‘industry

standards’ such as Microsoft Office Access 2003. However, this is also the main

disadvantage, since it has resulted in the loss of some functions (such as copying

and pasting of fields in questionnaire creation and of check commands between

different fields) reducing the speed of usage, a key requisite for outbreak

investigations. Moreover, because the principle of simplicity and easy use which

underpinned the creation of Epi Info DOS was not carried forward to Epi Info 3,

those who have worked with Epi Info DOS will need training to become proficient

users of Epi Info 3.

EpiData 3.1 (120306)

EpiData was initially released in 2000 by a non-profit organisation in Denmark

(EpiData Association, http://www.epidata.dk/). The current version 3.1 became

available in 2006. This software was developed specifically to translate Epi Info

DOS into Windows software by using the Epi Info DOS file structures and command

language wherever possible.

Certain restrictions in file size were removed but those dependent upon file

structures remain. Construction of a data entry system using EpiData is very simple,

especially if the original questionnaire has been constructed in Windows-based

word processing software, since it can then be either opened as a text file or copied

directly into EpiData.

For validating double-entered data (a common practice to ensure high quality data)

the ‘Validate Duplicate Files’ option in EpiData has built on the dual entry validation

capability within Epi Info DOS. Two additional options to either ‘Ignore text fields’ or

‘Ignore letter-case in text fields’ resolve the problems that occur when comparing

text fields (i.e. two identical text fields are considered different by the software if

one is written in capital and the other in lower-case letters).
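
The effect of the two options can be illustrated outside EpiData. The following is a minimal Python sketch of the comparison logic only, not EpiData's own implementation; the function name, the field names and the sample records are hypothetical.

```python
def compare_entries(first, second, text_fields=(), ignore_text=False, ignore_case=False):
    """Return the names of fields whose values differ between two entries of one record."""
    differences = []
    for name, first_value in first.items():
        second_value = second.get(name)
        if name in text_fields:
            if ignore_text:
                # Text fields are skipped altogether.
                continue
            if ignore_case and isinstance(first_value, str) and isinstance(second_value, str):
                # Compare text without regard to capital or lower-case letters.
                first_value, second_value = first_value.lower(), second_value.lower()
        if first_value != second_value:
            differences.append(name)
    return differences


# Hypothetical double-entered record: the surname differs only in letter-case,
# while the exposure field differs in substance.
entry_one = {"ID": 1, "SURNAME": "Smith", "HAM": "Y"}
entry_two = {"ID": 1, "SURNAME": "SMITH", "HAM": "N"}

print(compare_entries(entry_one, entry_two, text_fields=("SURNAME",)))
print(compare_entries(entry_one, entry_two, text_fields=("SURNAME",), ignore_case=True))
```

With neither option set, both SURNAME and HAM are reported as discrepancies; with the letter-case option set, only the genuine disagreement in HAM remains.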



EpiData Analysis (v1.1 Build 68)

When EpiData was originally released, one major disadvantage was its lack of

analytical statistical tools. This was addressed with the introduction of EpiData

Analysis in 2005. This provides the statistical methods available within the analysis

module of Epi Info DOS, but has greatly improved data manipulation facilities.

A wide range of statistical tools exists within EpiData Analysis, including analysis of

variance, which provides a t-test for independent groups, chi-square and Fisher’s

exact tests of association, nonparametric tests of difference in location between

two groups, and estimates of measures of association with confidence intervals.

The graphical capability in EpiData Analysis is much better than that of Epi Info DOS, with

more options and flexibility. While the graphs are acceptable for descriptive

purposes, there will probably always be limitations in producing publication quality

graphs. A new capability within EpiData Analysis is the ability to produce statistical

process control (SPC) charts for performance monitoring, which are applicable to surveillance

data.

Discussion

Two public domain software packages have been developed based on the classic

DOS version of Epi Info. There does seem to be a divergence in the development of

EpiData (Entry and Analysis) and Epi Info 3, which is surprising given their common

ancestor, Epi Info DOS. While EpiData has aimed to keep the functionality and

interface of Epi Info DOS available, Epi Info 3 appears to be developing into a more

general software to support epidemiologists rather than keeping with the original

concept of an efficient and relatively simple software for outbreak investigation.

It is difficult to decide which of the two development directions will best meet the

needs of epidemiologists over the next twenty years. It is possible that there is a

place for both. The DOS version of Epi Info was extremely well-structured and

efficient and, as EpiData has remained faithful to this concept, this could be seen

as its natural successor. With the requirement for more sophisticated analytical

methods to overcome issues such as confounding, it may be that Epi Info 3 will be

better placed. However, it is now moving into a very competitive arena where

commercial software has predominated. The authors would recommend EpiData

3.1 and EpiData Analysis v1.1 software packages for those involved in the

investigation of infectious disease outbreaks for their speed and simplicity of use in

a Windows environment.

Reviewed software:

Epi Info, version 6.04d for DOS and version 3.3.2 for Windows

Centers for Disease Control and Prevention (CDC), Atlanta, Georgia 30333.

Can be downloaded from www.cdc.gov/epiinfo/Epi6/ei6.htm (DOS) and

www.cdc.gov/epiinfo/ (Windows)

EpiData, version 3.1 (120306) and EpiData Analysis, version 1.1 Build 68



The EpiData Association, att. Jens Lauritsen, Enghavevej 34, DK5230 Odense

M, Denmark. Can be downloaded from www.epidata.dk/download.php

References:

1. Fonseca MGP & Armenian HK (1991) Use of the Case-Control Method in

Outbreak Investigations. American Journal of Epidemiology 133: 748-752

2. O'Brien SJ, Elson R, Gillespie IA, Adak GK, & Cowden JM (2002). Surveillance

of foodborne outbreaks of infectious intestinal disease in England and Wales

1992-1999: contributing to evidence-based food policy? Public Health 116:

75-80

3. Palmer SR (1989) Epidemiology in search of infectious diseases: methods in

outbreak investigation. Journal of Epidemiology and Community Health 43:

311-314

Appendix II – Contents of course CD-ROM


♦ This training guide (Adobe PDF document)

♦ Examples of files that might be produced by following this training:

◊ AGEGROUPS.PGM – sample analysis program file for recoding age into

agegroups

◊ Completed worksheet for 2x2 table results (Adobe PDF document) – see

Appendix IV

◊ Lead Officer.CHK – EpiData check code file

◊ Lead Officer.MDB – Epi Info project

◊ Lead Officer.QES – EpiData questionnaire file

◊ Lead Officer.REC – EpiData data file

♦ Installation and support files:

◊ EpiData Setup

◊ EpiData Analysis Setup

◊ Epi Info Setup

◊ CDC Epi Info Exercises (Word document)

◊ OpenEpi project (Zip archive file) – see Appendix III

Appendix III – Further information & resources



EpiData

EpiData homepage

http://www.epidata.dk/

EpiData email discussion list

http://lists.umanitoba.ca/mailman/listinfo/epidata-list

List intended for discussions, suggestions and help among users of EpiData.

EpiData documentation and technical info

http://www.epidata.dk/documentation.php

Extended manuals, field notes and technical information on use of EpiData, as

well as some general guidance on data management.

EpiData Wiki

http://www.epidata.org/wiki

Common entry for development of field guides, questions and solutions. All

users can add their suggestions and knowledge to the site. A couple of useful

field guides already exist, including one on the investigation of a food

poisoning outbreak.

Epi Info

Centers for Disease Control & Prevention: Epi Info homepage

http://www.cdc.gov/epiinfo

CDC Epi Info Tutorials

http://www.cdc.gov/epiinfo/tutorials.htm

Two tutorials designed and published by CDC – one based on a cholera

outbreak, and the other on a community health assessment.



Epi Informatics - Dr Andrew Dean

http://www.epiinformatics.com/

Site dedicated to the use of computers in epidemiology and public health.

Includes a brief online introduction to Epi Info, and links to download a more

complete introductory manual, together with two 15-hour courses (complete

with tutor manuals).

Professor Kevin Sullivan

http://www.sph.emory.edu/~cdckms/

Includes tutorials on Epi Info for Windows and Epi Info 6 for DOS, as well as

general epidemiology documents.

Google

http://www.google.co.uk/

The above sites are just a summary – there is a wide range of training

materials, courses etc. relating to Epi Info available globally – a simple Google

search on ‘Epi Info training’, ‘Epi Info exercises’, ‘Epi Info

tutorials’ etc. should give a good range of resources.

Epidemiology and Statistics

CDC General Training Resources

http://www.cdc.gov/descd/materials.html#foodborne

In addition to the cholera outbreak tutorial referred to above, there are three other

epidemiologic case studies that focus on real-life foodborne disease outbreak

investigations. They are not specifically based on Epi Info.

OpenEpi

http://www.openepi.com/

OpenEpi is a project to create open source software for Epidemiologic

statistics in JavaScript and HTML. OpenEpi can be run from a web site or

downloaded and run without a web connection. The OpenEpi files are also

included on the course CD-ROM for installation and use without the need for

internet access.

Epidemiology Supercourse

http://www.pitt.edu/~super1/index.htm

The Supercourse is a global repository of lectures on public health and

prevention targeting educators across the world. Supercourse has a network

of over 41300 scientists in 171 countries who are sharing a free library of

over 3162 lectures in 26 languages.



StatPages

http://www.statpages.org/

Index of web pages that together comprise a powerful, conveniently

accessible, multi-platform statistical software package. There are also links to

online statistics books, tutorials, downloadable software, and related

resources.

Rice Virtual Lab in Statistics

http://www.onlinestatbook.com/rvls.html

Online statistics book with simulations/demonstrations, case studies, and online

analysis tools.

Outbreak Investigations – A Perspective

http://www.cdc.gov/ncidod/EID/vol4no1/reingold.htm

Article from the peer-reviewed journal Emerging Infectious Diseases that

provides an excellent overview of the principles of outbreak investigation and

the issues involved.

Appendix IV – Worksheet for 2x2 table results


Complete this table with the values from the TABLES output in Epi Info Analysis.

Which exposures demonstrate an association with illness?

Which associations are statistically significant?

What further analysis might you wish to undertake?

The completed version of this worksheet is available on the course CD-ROM.

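Exposure               | Epi Info      | Relative  | RR 95% CI | RR 95% CI | Chi square
                       | variable name | Risk (RR) | Lower     | Upper     | 2-tailed p-value
-----------------------+---------------+-----------+-----------+-----------+-----------------
Ham sandwiches         | HAM           |           |           |           |
Ham rolls              | HAMROLL       |           |           |           |
Sausage rolls          | SAUSROLLS     |           |           |           |
Pickled onions         | ONION         |           |           |           |
Corned beef sandwiches | BEEF          |           |           |           |
Chicken nuggets        | NUGGETS       |           |           |           |
Chicken rolls          | CHICKROLL     |           |           |           |
Chicken sandwiches     | CHICKEN       |           |           |           |
Egg rolls              | EGGROLL       |           |           |           |
Egg sandwiches         | EGG           |           |           |           |
Salmon sandwiches      | SALMON        |           |           |           |
Cheese & biscuits      | CHEESE        |           |           |           |
Crisps                 | CRISPS        |           |           |           |
Gateaux                | GATEAUX       |           |           |           |
Pasties                | PASTIE        |           |           |           |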
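
The values requested in the worksheet can be cross-checked by hand. The following is a minimal Python sketch, not Epi Info's own code. It assumes the log (Taylor series) method for the confidence interval and the uncorrected chi-square test, so small differences from the TABLES output are possible. The function name and the counts are hypothetical and are not taken from the course data set.

```python
import math


def two_by_two(ill_exposed, well_exposed, ill_unexposed, well_unexposed):
    """Relative risk, 95% CI (log method) and uncorrected chi-square for a 2x2 table.

    Assumes there are no zero cells among the ill; that case is not handled here.
    """
    a, b, c, d = ill_exposed, well_exposed, ill_unexposed, well_unexposed
    n = a + b + c + d

    # Risk of illness in each exposure group, and the ratio of the two risks.
    rr = (a / (a + b)) / (c / (c + d))

    # Standard error of ln(RR), then a 95% interval built on the log scale.
    se = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    lower = math.exp(math.log(rr) - 1.96 * se)
    upper = math.exp(math.log(rr) + 1.96 * se)

    # Uncorrected Pearson chi-square (1 degree of freedom) and its 2-tailed p-value.
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2))

    return rr, lower, upper, chi2, p_value


# Hypothetical exposure: 30 of 40 who ate the item were ill, 10 of 40 who did not were ill.
rr, lower, upper, chi2, p_value = two_by_two(30, 10, 10, 30)
print(f"RR = {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}, chi-square = {chi2:.1f}, p = {p_value:.6f}")
```

An exposure whose confidence interval excludes 1 and whose p-value is below 0.05 would conventionally be recorded as a statistically significant association.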

Appendix V – Check code example



Calculating incubation period

It would often be useful to be able to calculate the incubation period to assist with

producing an epidemic curve. In the examples we followed in the analysis training,

this data was given to us – but if the period is calculated ‘manually’, this introduces the

possibility of human calculation or data entry errors. It is far easier to have the period

calculated for us at the time of data entry, based on the onset date and time

reported. We would also need to set an arbitrary ‘base date’ – which could be the

date of an implicated event, or just a specific date at some point before the first

case. To keep things simple, we track the period from midnight on the ‘base date’.

In the following examples it is assumed that the fields ONSETDATE and ONSETTIME

refer to the actual date and time reported by the case. The field INCUBATION is a

numeric field (say, two or three digits long with no decimal places) that will hold the

calculated values (whether in hours or days). You may need to modify the check

code examples to reflect the fieldnames that you have used.

Where the illness in question has a relatively long incubation period, likely to be

measured in days (as might be the case in e.g. typhoid, Cryptosporidium, Legionella

outbreaks), this is a simple matter of deducting the ‘base date’ from the ‘onset

date’ to obtain the period in days. (The int function is used to convert the answer

into a numeric value to match the INCUBATION field.) The check code for this would

appear in relation to the ONSETDATE field, and would read something like the

following:

AFTER ENTRY
  DEFINE basedate <dd/mm/yyyy>
  basedate = "14/08/1998"
  INCUBATION = int(ONSETDATE - basedate)
END

The situation gets a little more complicated where a shorter incubation period is

suggested, such as might be the case with many ‘food poisoning’ organisms such

as Salmonella spp., Clostridium perfringens, Bacillus cereus etc. Since EpiData

stores date and time details in separate fields, a more complex calculation is

required. The check code would need to appear in relation to whichever of

ONSETDATE and ONSETTIME was last on the questionnaire (and therefore last to have

data entered). The calculation would look something like this:

AFTER ENTRY
  DEFINE basedate <dd/mm/yyyy>
  basedate = "14/08/1998"
  INCUBATION = (int(ONSETDATE - basedate) * 24) + trunc(ONSETTIME)
END

This code initially looks the same, in that the first part of the main calculation

determines the difference in days between the base date and onset date. This value

(in days) is then multiplied by 24 to convert it into hours. Finally, we include the



onset time element by simply adding its value in hours (to keep things simple,

we limit the value to whole hours by truncating the time field, ignoring the

minutes).

Further tinkering with the time element of the calculation allows you to set a base

time to go with the base date, which could be of use where a specific event is

implicated. This involves defining a ‘base time’ using the 24-hour clock, and

deducting that number of hours from the result of the formula above (that

formula is based on time passed since midnight, so any later base time will reduce

the incubation period accordingly). The check code would then look like this:

AFTER ENTRY
  DEFINE basedate <dd/mm/yyyy>
  basedate = "14/08/1998"
  DEFINE basetime ##.##
  basetime = 15.00
  INCUBATION = ((int(ONSETDATE - basedate) * 24) + trunc(ONSETTIME)) - trunc(basetime)
END
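
The arithmetic in the check code can be verified outside EpiData before relying on it. The following is a minimal Python sketch of the same calculation, not EpiData check code. The function name is hypothetical, times are assumed to be entered as HH.MM numbers on the 24-hour clock (matching the ##.## field above), and the onset date and time are invented for illustration; only the base date and base time come from the examples above.

```python
import math
from datetime import date


def incubation_hours(onset_date, onset_time, base_date, base_time=0.0):
    """Whole hours from the base date and time to onset, mirroring the check code.

    onset_time and base_time are HH.MM numbers (15.30 means half past three in
    the afternoon); the minutes are dropped by truncation, as in the check code.
    """
    whole_days = (onset_date - base_date).days
    return whole_days * 24 + math.trunc(onset_time) - math.trunc(base_time)


base = date(1998, 8, 14)   # base date used in the check code examples
onset = date(1998, 8, 15)  # hypothetical onset date

# Counting from midnight on the base date: one full day plus three hours.
print(incubation_hours(onset, 3.30, base))

# Counting from a base time of 15.00 on the base date.
print(incubation_hours(onset, 3.30, base, base_time=15.00))
```

Because the onset time and the base time are truncated separately, the result can differ from the true elapsed time by just under an hour. For example, an onset at 16.10 against a base time of 15.50 on the same day is recorded as 1 hour, although only 20 minutes have passed.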
