Learn to Code for Data Analysis
Michel Wermelinger, Tony Hirst, Rob Griffiths
School of Computing & Communications, The Open University
Course
• Free introduction to coding and data analysis
  – Coding for reproducible research
• FutureLearn MOOC: http://tiny.cc/lcda
  – 5+ h/week, 4 weeks, in May and October each year
• Also on OpenLearn: http://tiny.cc/lcda-ol
  – 24/7 but no discussion forums, no support
  – used by Space Science MSc module
  – materials available under CC-BY-NC-SA license
• 40 min session based on week 1 for Year 10
Learning
• Mostly text + some videos
• International open data: WHO, WU, WB, UN
• Weekly project:
  – Start with research questions
  – We answer them, introducing necessary concepts
  – Interleave reading and doing exercises
  – End with data analysis report
• Show and tell: share report on own region
First Principles of Instruction
Problem-centred: Base the teaching and learning on interesting and progressively more complex real-world problems.
1. Activation: Help the learners activate past experience, information or mental models that can be used to organise the new knowledge.
2. Demonstration: Show the learners the new knowledge, e.g. through worked examples, preferably with multiple viewpoints.
3. Application: Give learners a sequence of varied problems for them to apply the new knowledge. Provide feedback and diminishing guidance, e.g. on how to correct mistakes.
4. Integration: Encourage learners to discuss, reflect on, and publicly demonstrate their new knowledge or skill, to integrate it into their lives.
More details: http://tiny.cc/fpoi
Technology
• Python: easy to learn, used in STEM Faculty
• Pandas: R-like data analysis library
• Jupyter notebooks for exercises and reports
• http://anaconda.com: computer app
  – free for Windows, Mac, Linux
• http://cocalc.com: free web service
  – features to distribute, collect, grade assignments
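
To give a flavour of the Python and pandas combination named above, here is a minimal sketch of the kind of one-step-at-a-time analysis a notebook encourages. The table, its column names and all figures are made up for illustration; they are not taken from the course or from any real dataset.

```python
import pandas as pd

# Made-up illustrative figures, NOT real WHO or World Bank data.
data = pd.DataFrame({
    "Country": ["A", "B", "C", "D"],
    "Population (1000s)": [1000, 5000, 250, 20000],
    "Deaths": [50, 100, 5, 1000],
})

# Derive a new column: deaths per 100,000 inhabitants.
# deaths / (population_in_thousands * 1000) * 100000 simplifies to the line below.
data["Deaths per 100k"] = data["Deaths"] * 100 / data["Population (1000s)"]

# Rank the rows by rate, highest first.
ranked = data.sort_values("Deaths per 100k", ascending=False)

# Keep only the rows whose rate exceeds an (arbitrary) threshold of 3.
high = data[data["Deaths per 100k"] > 3]

print(ranked)
print("Total deaths:", data["Deaths"].sum())
```

Each statement produces a result that can be inspected immediately, which is what makes this style suit beginners working in a notebook.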
Jupyter notebooks
• text editor/formatter + code editor/interpreter
  – free professional but easy-to-use software
  – data scientists use them for reproducible research
  – we use them in under- and post-graduate courses
• text, code, and outputs (tables, charts, ...)
  – handouts with examples and exercises
  – project reports
  – text in Markdown, code in Python, both widely used
• code one line at a time, with immediate feedback
Jupyter notebooks
• create and edit notebooks in browser
  – students can add own notes and fix typos quickly
• publish read-only version, e.g. Y10 notebook
  – export to HTML or PDF and share file
  – single-click publish in CoCalc
  – put notebook file online, paste URL in http://nbviewer.jupyter.org
  – publish on GitHub (with version control for free!)
Problems
• Keeping up with software and site changes
• Provide all data offline because:
  – Online (historic) data changes (teaching point!)
  – Free CoCalc account doesn’t allow API calls
• File encoding issues
• [ ] ( ) { } and their combinations can be confusing
• Notebooks require discipline
• Convincing the Excel fans
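
Two of the problems listed above can be reproduced in a few lines. The following is only a sketch: the file name and its contents are hypothetical, and it uses the standard library rather than pandas so that it runs anywhere.

```python
import os
import tempfile

# --- File encoding -------------------------------------------------
# Write a small CSV in Latin-1, as a spreadsheet export might do.
path = os.path.join(tempfile.mkdtemp(), "countries.csv")
with open(path, "w", encoding="latin-1") as f:
    f.write("country,value\nCôte d'Ivoire,1\n")

# Reading it back as UTF-8 fails: the single Latin-1 byte for "ô"
# does not form a valid UTF-8 sequence.
try:
    with open(path, encoding="utf-8") as f:
        f.read()
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

# Naming the encoding the file was actually written in fixes it.
with open(path, encoding="latin-1") as f:
    lines = f.read().splitlines()

# --- Brackets ------------------------------------------------------
row = {"country": "A", "values": [3, 5, 8]}    # {} dictionary, [] list
point = (2, 7)                                 # () tuple
largest = max(row["values"])                   # () function call, [] key lookup
last = row["values"][len(row["values"]) - 1]   # [] and () nested in one line

print(decode_failed, lines, largest, last)
```

The last line of the bracket section shows why beginners struggle: three kinds of bracket meaning (lookup by key, lookup by position, function call) appear in a single expression.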
Software & study poll
• At end of week 1, <8% response rate
• 73% Windows 7/8/10, 17% Mac, 8% Linux
• 75% Anaconda, 24% CoCalc
  – higher CoCalc % for pre-Windows 10
• Exercises: 71% same, 9% different computers
• Reading/coding: 62% same, 10% different
• Study: 48% regular, 44% sporadic sessions
Stats From FutureLearn
• Enrolments (learner ID, datetime, completion date, quit date)
• Step Activity (learner ID, step, first visit, completed)
• Comments (learner ID, datetime, step)
• Quiz (learner ID, datetime, question number, learner response, T/F)
Median Minutes per Step
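
One way a median-minutes-per-step figure could be derived from step activity records is sketched below. It assumes that time on a step is the gap between the first visit and completion, which the slides do not confirm; the records and timestamp format are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median

# Hypothetical records: (learner ID, step, first visit, completed).
records = [
    ("u1", "1.1", "2017-05-01 10:00", "2017-05-01 10:04"),
    ("u2", "1.1", "2017-05-01 11:00", "2017-05-01 11:10"),
    ("u3", "1.1", "2017-05-02 09:00", "2017-05-02 09:06"),
    ("u1", "1.2", "2017-05-01 10:04", "2017-05-01 10:24"),
    ("u2", "1.2", "2017-05-01 11:10", "2017-05-01 11:20"),
    ("u3", "1.2", "2017-05-02 09:06", None),  # step never marked complete
]
FMT = "%Y-%m-%d %H:%M"  # assumed timestamp format

# Collect the minutes each learner took, grouped by step.
minutes_by_step = defaultdict(list)
for learner, step, first_visit, completed in records:
    if completed is None:
        continue  # skip steps the learner did not complete
    elapsed = datetime.strptime(completed, FMT) - datetime.strptime(first_visit, FMT)
    minutes_by_step[step].append(elapsed.total_seconds() / 60)

# The median is less sensitive than the mean to learners who leave a
# step open overnight before marking it complete.
median_minutes = {step: median(m) for step, m in minutes_by_step.items()}
print(median_minutes)
```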
Are any questions particularly problematic?
Question Diagnostics
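
A question diagnostic of this kind might be computed as follows. This is a sketch under assumptions: the quiz records are invented, the datetime field is omitted for brevity, and "problematic" is taken to mean a low proportion of correct responses.

```python
from collections import Counter

# Hypothetical records: (learner ID, question number, response, correct?).
quiz = [
    ("u1", 1, "b", True), ("u2", 1, "b", True),
    ("u3", 1, "a", False), ("u4", 1, "b", True),
    ("u1", 2, "c", False), ("u2", 2, "a", False),
    ("u3", 2, "c", False), ("u4", 2, "d", True),
]

attempts = Counter()   # responses per question
correct = Counter()    # correct responses per question
wrong_answers = {}     # question -> Counter of incorrect responses

for learner, question, response, is_correct in quiz:
    attempts[question] += 1
    if is_correct:
        correct[question] += 1
    else:
        wrong_answers.setdefault(question, Counter())[response] += 1

# Proportion of correct responses for each question.
success_rate = {q: correct[q] / attempts[q] for q in attempts}

# The question with the lowest success rate, and its most common wrong answer.
hardest = min(success_rate, key=success_rate.get)
common_mistake = wrong_answers[hardest].most_common(1)[0]

print(success_rate, hardest, common_mistake)
```

Looking at the most common wrong answer, not just the success rate, can hint at whether a question is hard or merely worded ambiguously.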
Study Patterns
Student elapsed visit time:
- “hares” and “tortoises”
Student study session time:
- median study session time?
- how many study sessions?
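
Both measures above can be estimated from visit timestamps. The sketch below assumes that visits separated by more than 30 minutes belong to different study sessions; that threshold, the function name `sessions` and the timestamps are all illustrative choices, not taken from the slides.

```python
from datetime import datetime, timedelta
from statistics import median

SESSION_GAP = timedelta(minutes=30)  # assumed threshold between sessions


def sessions(timestamps, gap=SESSION_GAP):
    """Group timestamps into sessions, returned as (start, end) pairs."""
    result = []
    for t in sorted(timestamps):
        if result and t - result[-1][1] <= gap:
            # Close enough to the previous visit: extend the current session.
            result[-1] = (result[-1][0], t)
        else:
            # Too long a pause: start a new session.
            result.append((t, t))
    return result


# Hypothetical visits by one learner.
visits = [
    datetime(2017, 5, 1, 10, 0), datetime(2017, 5, 1, 10, 10),
    datetime(2017, 5, 1, 10, 25),
    datetime(2017, 5, 1, 19, 0), datetime(2017, 5, 1, 19, 20),
    datetime(2017, 5, 3, 8, 0),
]

found = sessions(visits)
minutes = [(end - start).total_seconds() / 60 for start, end in found]

# Elapsed visit time: from the first to the last visit overall.
elapsed = max(visits) - min(visits)

print(len(found), "sessions, median", median(minutes), "min, elapsed", elapsed)
```

A learner with a short elapsed time but long sessions would look like a "hare"; one with many short sessions spread over weeks would look like a "tortoise".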
Conclusions
• Free course for CPD and re-purposing
• Beware data + software issues
  – Updates to data, websites and software
  – IT literacy: file handling, software installation
  – Coding issues: syntax, notebook usage
• Follow First Principles of Instruction
• Problem-driven data analysis is engaging
  – It’s interdisciplinary, global, local, personal