Source: biochem.slu.edu/bchm628/handouts/ExcelTutorials_2017.pdf


Last updated: May, 2017

BCHM 6280 2017 Excel Tutorial Page 1 of 5

Tutorial 1: Using Excel to find unique values in a list

It is not uncommon to have a list of data that contains redundant values. Genes with multiple transcript isoforms are one example. If you are only interested in the genes and not the different transcripts, then you will probably want to filter the list to remove the redundant values. I searched the UCSC human genome browser with the query “colon cancer” and got back more than 500 matches. I created a text file listing the first 500 matches. You can download this data from the Exercise 1 home page by clicking on the link ListofGenesfromUCSC.txt. The file has 2 columns: Gene Name and Chromosome Location. You will filter on Gene Name. Once you’ve downloaded the text file, do the following:

• Open Excel and, from within Excel, open the text document. If the file you want to open is greyed out, change the drop-down menu to Enable: All Readable Documents.

• Double-click the file you want to open; this should bring up the Text Import Wizard.
• It should recognize the file as delimited. Click the Next button to define the delimiters.
• By default, Excel assumes a .txt file is tab-delimited.
• Click Next and then Finish to complete the import.
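Outside Excel, the same tab-delimited import can be sketched with Python's standard csv module. This is not part of the handout; it assumes, as the Advanced Filter step suggests, that row 1 of ListofGenesfromUCSC.txt holds the column headers, and the sample rows below are hypothetical placeholders rather than real data from the file.

```python
import csv
import io


def read_tab_delimited(lines):
    """Split tab-delimited lines into a header row and a list of data rows.

    `lines` can be an open file or any iterable of strings. Blank lines are
    skipped. Assumes row 1 holds the column headers.
    """
    rows = [row for row in csv.reader(lines, delimiter="\t") if row]
    if not rows:
        return [], []
    return rows[0], rows[1:]


# Hypothetical two-row sample in the Gene Name / Chromosome Location layout.
sample = io.StringIO("Gene Name\tChromosome Location\nGENE_A\tchr1\nGENE_B\tchr2\n")
header, data = read_tab_delimited(sample)
print(header)  # ['Gene Name', 'Chromosome Location']
print(data)    # [['GENE_A', 'chr1'], ['GENE_B', 'chr2']]
```

For the real file, open it with `open("ListofGenesfromUCSC.txt", newline="")` and pass the file object to `read_tab_delimited`; `newline=""` is what the csv module documentation recommends for files read with `csv.reader`.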

Advanced filter:

• Select the column of gene names.
• Click on the Data menu and select Advanced Filter (if you get a warning about being unable to determine which row contains column labels and you have a column header in row 1, just click OK).
• Check the radio button “Copy to another location”. This should move your cursor to the “Copy to” text box.
• Select an output column (not columns A-C).
• Check the box “Unique records only”.
• Click the OK button. This should produce a list of 208 genes from the original 500.
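The “Unique records only” option keeps a single copy of each distinct value. As a point of comparison (not something the handout uses), the idea can be sketched in Python, assuming the gene names have already been read into a list; this sketch keeps the first occurrence of each name in its original order and compares names as exact strings. The gene names in the example are hypothetical placeholders.

```python
def unique_in_order(values):
    """Return the distinct values, each kept at its first position.

    Values are compared as exact strings, so "gene_a" and "GENE_A" would
    count as two different entries in this sketch.
    """
    seen = set()
    result = []
    for value in values:
        if value not in seen:  # first time this value appears: keep it
            seen.add(value)
            result.append(value)
    return result


# Hypothetical gene column with repeats (e.g. one row per transcript isoform).
gene_names = ["GENE_A", "GENE_B", "GENE_A", "GENE_C", "GENE_B"]
unique_genes = unique_in_order(gene_names)
print(unique_genes)       # ['GENE_A', 'GENE_B', 'GENE_C']
print(len(unique_genes))  # 3
```

In Python 3.7 and later, `list(dict.fromkeys(gene_names))` gives the same result in one line, because dictionaries preserve insertion order.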


Tutorial 2: Using Excel to manage text data

An issue common to gene names and gene identifiers is slight variation that can prevent their identification via a database lookup. For example, as gene or transcript records are reviewed by curators, they are often given an appended number, such as NM_0012345.1 or NM_0012345.3, indicating which version they are. The base identifier, NM_0012345, is the same between them, but if your list has the appended version number, a database lookup or Excel lookup won’t recognize the two as being the same record. In this example, there are two Excel files available from the Exercise 2 home page: ExpressionData.xlsx and GeneInfo.xlsx. The ExpressionData file has two columns. The first has Ensembl Gene IDs with the version number. The second column contains gene expression information in the form of the log2 ratio of treatment/control. The GeneInfo file has four columns. The first has Ensembl Gene IDs, but as the unversioned stable identifier. The remaining columns have the gene symbol, NCBI Gene ID and gene description. You want to be able to bring information from the GeneInfo file into the ExpressionData file, but at the moment they do not share the same identifiers. To correct this, you will use a text function called LEFT to change the Gene IDs in the ExpressionData file to match those in the GeneInfo file.
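LEFT(B2, 15) works here because human Ensembl gene IDs have a fixed length: “ENSG” followed by 11 digits, 15 characters before the period. A short Python sketch (not part of the handout) contrasts that fixed-width approach with splitting on the period, which does not depend on ID length. The versioned IDs below are hypothetical examples in the formats the text describes.

```python
def left(text, num_chars):
    """Mimic Excel's LEFT(text, num_chars): the first num_chars characters."""
    return text[:num_chars]


def strip_version(identifier):
    """Drop a trailing '.N' version suffix; IDs without a '.' pass through."""
    return identifier.split(".", 1)[0]


# Hypothetical versioned IDs in the formats described above.
ensembl_id = "ENSG00000123456.7"  # "ENSG" + 11 digits = 15 characters before the '.'
print(left(ensembl_id, 15))           # ENSG00000123456
print(strip_version(ensembl_id))      # ENSG00000123456
print(strip_version("NM_0012345.3"))  # NM_0012345
print(left("NM_0012345.3", 15))       # NM_0012345.3  (fixed width keeps the version)
```

The last line shows why a fixed character count only suits IDs of one known length. In Excel, a length-independent alternative is =LEFT(B2, FIND(".", B2) - 1), although it returns #VALUE! for any ID that has no period.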

1. Insert a column to the left of the Gene ID column in the ExpressionData file.
2. In cell A2, type = and select the LEFT function.
3. Select cell B2 for the Text box in the Formula Builder dialog box.
4. Tab to the num_chars box and type in 15.
5. This should return the ENSG identifier up to (but not including) the period, as it was originally.
6. Select from the newly generated ID in A2 down to the end of the column, then type Ctrl-D to copy the function down the rest of the column.
7. Then use Edit->Copy on the newly generated IDs and use Edit->Paste Special->Values to replace the formulas with values.
8. Now you can use the two files in the next section to bring the data from GeneInfo into the ExpressionData file.

Tutorial 3: Using Excel to compare lists of data

A very common problem in bioinformatics, or in information processing of any kind, is having multiple lists of data that you want to compare to each other. Excel has a function called VLOOKUP that makes this easy to do. It is also useful for transferring data from one worksheet to another. For this part of the tutorial, you will use the GeneInfo file and your modified ExpressionData file from the previous section. You can delete the column from the ExpressionData file that had the Gene IDs with version numbers in them. In this part of the tutorial, you will bring the Gene Name and NCBI Gene ID into the ExpressionData file.
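VLOOKUP answers “is this ID in the other list, and what goes with it?” one row at a time. For a quick overview of how two ID lists overlap, a set-based comparison can be sketched in Python (this is not from the handout, and the IDs are hypothetical stand-ins for the ExpressionData and GeneInfo columns). Any ID reported as only in the first list is one that VLOOKUP would mark #N/A.

```python
def compare_id_lists(list_a, list_b):
    """Report which IDs are shared and which appear in only one list."""
    a, b = set(list_a), set(list_b)
    return {
        "in_both": sorted(a & b),
        "only_in_a": sorted(a - b),
        "only_in_b": sorted(b - a),
    }


# Hypothetical IDs standing in for the ExpressionData and GeneInfo columns.
expression_ids = ["ID_1", "ID_2", "ID_3"]
geneinfo_ids = ["ID_2", "ID_3", "ID_4"]
overlap = compare_id_lists(expression_ids, geneinfo_ids)
print(overlap["in_both"])    # ['ID_2', 'ID_3']
print(overlap["only_in_a"])  # ['ID_1']
print(overlap["only_in_b"])  # ['ID_4']
```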


o Open both worksheets in Excel.
o In the ExpressionData file, insert a column between columns 1 and 2.
o In the second row of column 2 (cell B2), type an “=” sign. Then go to the drop-down menu in the upper left of the worksheet, find the function “VLOOKUP” and select it (Figure 1). If you do not see VLOOKUP on the main menu, scroll down to “More Functions”, which opens a dialog box with all of the available Excel functions. Under “Lookup and Reference” you will find VLOOKUP.
o Once you’ve inserted the function, you must fill out its arguments using the dialog box that opens up. Select cell A2 as the Lookup_value.
o Then click into the box “Table_array”. Go up to the Window menu and select GeneInfo.xlsx, as shown in Figure 2.
o This will activate GeneInfo.xlsx.

Figure 1: Inserting a VLOOKUP function into column 2 of the ExpressionData worksheet.

Figure 2: Selecting the second worksheet as the table_array in the VLOOKUP function.


o Select the first 2 columns of GeneInfo.xlsx.
o Tab or click on the box “Col_index_num.” This tells the function which column of data to bring over to the first worksheet. Type in a 2.
o In the final box, “Range_lookup,” type “false” so that only exact matches are accepted. VLOOKUP then searches the first column of the selected GeneInfo range for the value in cell A2 of the ExpressionData worksheet; when it finds a match, the value from column 2 of that row is entered into cell B2 of ExpressionData. If there is no match, it fills in “#N/A”.
o To fill in the rest of the column, select from cell B2 through the end of the data and, under the Edit menu, select Fill Down or use the keyboard shortcut Ctrl+D (Figure 3).

When you are done, your ExpressionData worksheet should look like the one shown in Figure 4.

Figure 3: Filling in the rest of the column with the same function.

Figure 4: Gene Expression worksheet after completing VLOOKUP.
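The exact-match behavior described in the steps above can be sketched in Python to make the argument roles concrete. This is an illustration only, not how Excel implements VLOOKUP; the function name `vlookup` and the GeneInfo and ExpressionData values are hypothetical. One known difference is noted in the code: Excel ignores letter case when matching text, while this sketch does not.

```python
def vlookup(lookup_value, table_array, col_index_num):
    """Sketch of VLOOKUP(lookup_value, table_array, col_index_num, FALSE).

    Scans the first column of table_array from top to bottom and returns the
    value in column col_index_num (counted from 1, as in Excel) of the first
    exact match, or the string "#N/A" if nothing matches. Unlike Excel, which
    ignores letter case when matching text, this comparison is case-sensitive.
    """
    width = min((len(row) for row in table_array), default=0)
    if not 1 <= col_index_num <= width:
        # Excel shows #VALUE! (index below 1) or #REF! (index past last column).
        raise ValueError(f"col_index_num {col_index_num} is outside table_array")
    for row in table_array:
        if row[0] == lookup_value:
            return row[col_index_num - 1]
    return "#N/A"


# Hypothetical first two columns of GeneInfo (stable ID, gene symbol).
gene_info = [["ID_1", "SYM_A"], ["ID_2", "SYM_B"]]
# Hypothetical column A of ExpressionData; "Fill Down" = one call per row.
expression_ids = ["ID_2", "ID_9", "ID_1"]
column_b = [vlookup(gene_id, gene_info, 2) for gene_id in expression_ids]
print(column_b)  # ['SYM_B', '#N/A', 'SYM_A']
```

Note that the lookup searches the whole first column of the table for each ID, so the two worksheets do not need to list their genes in the same order.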


At this point, the data in column 2 is still linked to the GeneInfo worksheet. You can see this if you click on one of the gene names and look at what is displayed in the formula bar (the text box at the top of the sheet): it shows the VLOOKUP formula rather than the gene name. You do not want to leave your file like that; otherwise, every time you open it, Excel will run the data lookup again. To avoid this, select the entire column, copy it, and then do Edit->Paste Special and select “Values” in the “Paste Special” dialog box. This will replace each formula with its current value. After you complete that, click on a gene name. You should see just the gene name displayed in the formula bar.

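To bring in the NCBI Gene ID, insert another column in the ExpressionData worksheet and repeat the VLOOKUP process, bringing in column 3 data from GeneInfo rather than column 2. Because Col_index_num is counted within the Table_array, select at least the first 3 columns of GeneInfo.xlsx this time; a Col_index_num larger than the number of selected columns returns #REF!.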

Figure 5: Gene Expression worksheet after Copy and Paste Special with Values.
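Running VLOOKUP once per column means each lookup searches GeneInfo again. A different design, sketched here in Python and not taken from the handout, builds a dictionary keyed on GeneInfo’s first column once and then pulls columns 2 (gene symbol) and 3 (NCBI Gene ID) in a single pass. The function name, the data values and the output column order (ID, symbol, NCBI Gene ID, log2 ratio) are all hypothetical choices for illustration. The results are plain values, comparable to what Paste Special > Values leaves behind.

```python
def add_lookup_columns(expression_rows, gene_info_rows, col_indexes):
    """Insert looked-up GeneInfo columns after the ID in each ExpressionData row.

    Builds a dictionary from GeneInfo's first column once, instead of
    searching the table again for every row and column. The first occurrence
    of a key wins, matching VLOOKUP's top-down exact-match search.
    """
    index = {}
    for row in gene_info_rows:
        index.setdefault(row[0], row)  # keep the first row seen for each ID

    merged = []
    for gene_id, *rest in expression_rows:
        match = index.get(gene_id)
        looked_up = [match[c - 1] if match is not None else "#N/A" for c in col_indexes]
        merged.append([gene_id, *looked_up, *rest])
    return merged


# Hypothetical GeneInfo rows: stable ID, gene symbol, NCBI Gene ID, description.
gene_info = [
    ["ID_1", "SYM_A", "1001", "description A"],
    ["ID_2", "SYM_B", "1002", "description B"],
]
# Hypothetical ExpressionData rows: stable ID, log2(treatment/control).
expression = [["ID_2", 1.5], ["ID_9", -0.3]]

# Columns 2 and 3 of GeneInfo: gene symbol and NCBI Gene ID.
for row in add_lookup_columns(expression, gene_info, [2, 3]):
    print(row)
# ['ID_2', 'SYM_B', '1002', 1.5]
# ['ID_9', '#N/A', '#N/A', -0.3]
```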