File and Data Management Jing Su 20180223

File and Data Management · • The Good: DryValleySoil_ICPOES_20101115_JDSv2.dat – DryValleySoil, project name – ICPOES, instrument name – 20101115 date of sample created –

Oct 07, 2020


File and Data Management

Jing Su 20180223


What You Will Learn

•  Why file management of your research data is important

•  Specific techniques for organizing your research data:
–  File structures
–  File naming
–  Version control
–  Storage & Backup

•  Including:
–  Small group discussion
–  Exercise for organizing your own data

•  Focuses on research data, also applies to other types of files


Small Group Discussion

•  What kind of data do you work with?
•  What organizational challenges have you faced?
•  What tools or techniques work for you?


Research Data Lifecycle

Lifecycle: https://www.youtube.com/watch?v=-wjFMMQD3UA&feature=youtu.be


Data Management Checklist 1/2

Data Management Checklist 2/2

Data Management Checklist

•  What types of data and for how long?
–  Five steps to decide what data to keep

•  Who will be responsible for collecting and documenting the data?
–  Roles and responsibilities
–  Legal and ethical obligations and rights
–  Plan and consent to share

•  How to document different types of data?
–  Study-level, data-level, and metadata
–  Wet lab: Electronic Lab Notebook (ELN)
–  Computational: large-size sequencing data, consortium data (TCGA, ICGC)

Data Management Checklist

Formats: Data type and sources

File formats currently recommended by the UK Data Archive for long-term preservation of research data


File Naming Conventions

•  Make file names unique
•  Include the most important identifying information of the project:
✓  project name
✓  acronym, or research data name
✓  study title
✓  location information
✓  researcher initials
✓  date (consistently formatted, e.g. YYYYMMDD)
✓  version

•  Use underscores to separate elements; avoid special characters, spaces and periods.

•  Use leading zeros when incorporating numbers to enable sorting (a sequence of 1-100 should be numbered 001-100).

•  File names should be short enough to be readable, while still conveying enough pertinent information (limit: 255 characters)
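
The naming rules above can be sketched as a small helper. This is a minimal illustration, not a standard tool: the function names `build_filename` and `numbered`, the choice of elements, and their order are assumptions made for the example.

```python
import re
from datetime import date


def build_filename(project, instrument, created, initials, version, ext):
    """Join identifying elements with underscores (hypothetical element order)."""
    parts = [project, instrument, created.strftime("%Y%m%d"), f"{initials}v{version}"]
    for part in parts:
        # Only letters and digits inside an element: no spaces, periods or special characters.
        if not re.fullmatch(r"[A-Za-z0-9]+", part):
            raise ValueError(f"invalid element: {part!r}")
    name = "_".join(parts) + "." + ext.lstrip(".")
    if len(name) > 255:
        raise ValueError("file name exceeds 255 characters")
    return name


def numbered(stem, index, total):
    """Zero-pad index to the width of total so that names sort correctly."""
    width = len(str(total))
    return f"{stem}_{index:0{width}d}"
```

For instance, `numbered("sample", 1, 100)` pads the index to three digits, matching the 001-100 rule.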


File Naming Conventions Examples

•  The Good: DryValleySoil_ICPOES_20101115_JDSv2.dat
–  DryValleySoil, project name
–  ICPOES, instrument name
–  20101115, date of sample created
–  JDS, initials of the scientist
–  v2, second version

•  The Bad: [email protected]
•  The Ugly:

Can you understand/use these data files? Would anyone 5 years from now?
•  SrvMthdDraft.doc
•  SrvMthdFinal.doc
•  SrvMthdLastOne.doc
•  SrvMthdRealVersion.doc
Use content or descriptive information
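
One practical benefit of a consistent name is that it can be read back by a program. The sketch below parses names shaped like "The Good" example. The regular expression and the function name `parse_filename` are assumptions for illustration and would need adapting to any other convention.

```python
import re
from datetime import datetime

# Assumed layout: Project_Instrument_YYYYMMDD_INITIALSvN.ext
PATTERN = re.compile(
    r"^(?P<project>[A-Za-z0-9]+)_(?P<instrument>[A-Za-z0-9]+)_"
    r"(?P<date>\d{8})_(?P<initials>[A-Z]+)v(?P<version>\d+)\.(?P<ext>\w+)$"
)


def parse_filename(name):
    """Return the elements of a conforming file name, or None if it does not conform."""
    match = PATTERN.match(name)
    if match is None:
        return None
    info = match.groupdict()
    try:
        info["date"] = datetime.strptime(info["date"], "%Y%m%d").date()
    except ValueError:
        return None  # eight digits that are not a real calendar date
    info["version"] = int(info["version"])
    return info
```

A name such as SrvMthdFinal.doc returns None, which is one way to flag files that carry no recoverable description.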


Batch Renaming Tools

•  Windows:
–  Adobe Bridge (via any Creative Cloud products): http://ist.mit.edu/adobe-creative-cloud
–  Ant Renamer: http://www.antp.be/software/renamer
–  Bulk Rename Utility: http://www.bulkrenameutility.co.uk/
–  ImageMagick: http://www.imagemagick.org/
–  PSRenamer: http://www.powersurgepub.com/products/psrenamer.html
–  RenameIT: http://sourceforge.net/projects/renameit

•  Mac:
–  Adobe Bridge (via any Creative Cloud products): http://ist.mit.edu/adobe-creative-cloud
–  ImageMagick: http://www.imagemagick.org/
–  NameChanger: http://web.mac.com/mickeyroberson/MRR_Software/NameChanger.html
–  PSRenamer: http://www.powersurgepub.com/products/psrenamer.html
–  Renamer4Mac: http://renamer4mac.com/
–  NameMangler: http://manytricks.com/namemangler/

•  Linux:
–  GNOME Commander: http://www.nongnu.org/gcmd/
–  GPRename: http://gprename.sourceforge.net/
–  ImageMagick: http://www.imagemagick.org/
–  PSRenamer: http://www.powersurgepub.com/products/psrenamer.html

•  Unix:
–  The use of the grep command to search for regular expressions
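
For readers who prefer a script to the tools listed above, the same kind of batch rename can be sketched with the Python standard library. This is an assumption-laden illustration rather than a replacement for those tools: `plan_renames` and `apply_renames` are hypothetical names, and the default pattern only handles a number directly before the extension. The plan is built first so it can be inspected before any file is touched.

```python
import re
from pathlib import Path


def plan_renames(folder, pattern=r"^(?P<stem>.*?)(?P<num>\d+)(?P<ext>\.\w+)$", width=3):
    """Return (old_path, new_path) pairs that zero-pad the number before the extension."""
    plan = []
    for path in sorted(Path(folder).iterdir()):
        match = re.match(pattern, path.name)
        if path.is_file() and match:
            new_name = f"{match['stem']}{int(match['num']):0{width}d}{match['ext']}"
            if new_name != path.name:
                plan.append((path, path.with_name(new_name)))
    return plan


def apply_renames(plan):
    """Carry out a reviewed plan, refusing to overwrite existing files."""
    for old, new in plan:
        if new.exists():
            raise FileExistsError(new)
        old.rename(new)
```

Running `plan_renames` on a copy of the data first is a cautious way to check the result before renaming the originals.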


Version Control

Aim: keep raw data untouched and be able to revert to an earlier version

•  Save an untouched copy of the raw data; work on a copy, never on the original
•  Use a file naming convention (like v001, v002 or v1_0, v1_2, v2_0)
•  Use a directory structure naming convention that includes version information
•  Date can be part of the file name, e.g. 2012-02-27_Template_soil_testing.xlsx
•  Append the author's name to the file name, e.g. Template_soil_testing_modified_by_AH.xlsx
•  Add a version number after each major edit, e.g. Template_soil_testing_v03.xlsx
•  Directory top-level folders should include the project title, unique identifier, and date (year), but the files themselves should be well-described independent of the directory structure.

•  Version control tools:
–  Wet lab: Electronic Lab Notebooks / Box / LIMS
–  Dry lab: SVN / GitHub
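
Where a full version control tool is not in use, the version-number-in-file-name approach can be sketched as follows. The `_vNN` suffix format and the function names are assumptions for illustration. The idea is that each major edit starts from a fresh copy, so earlier versions stay untouched.

```python
import re
import shutil
from pathlib import Path

VERSION = re.compile(r"_v(\d+)$")  # assumed suffix style, e.g. name_v03


def next_version(path):
    """Return the path for the next version, keeping the zero-padding width."""
    path = Path(path)
    match = VERSION.search(path.stem)
    if match:
        width = len(match.group(1))
        stem = path.stem[:match.start()] + f"_v{int(match.group(1)) + 1:0{width}d}"
    else:
        stem = path.stem + "_v01"
    return path.with_name(stem + path.suffix)


def save_as_next_version(path):
    """Copy a file to its next version name without modifying the original."""
    target = next_version(path)
    if target.exists():
        raise FileExistsError(target)
    shutil.copy2(path, target)
    return target
```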

Version Control Example

Folder Structure

•  Methods of organising electronic material
–  Hierarchical: Items organised in folders and sub-folders
–  Tag-based: Each item assigned one or more tags
–  Hybrid: combination of hierarchical and tag-based
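
To make the difference between the methods concrete, here is a minimal sketch of a tag-based index layered on top of ordinary folder paths, i.e. the hybrid approach. The class `TagIndex` and the example paths and tags are hypothetical. Real tagging is usually provided by the operating system or a reference manager rather than hand-written code.

```python
from collections import defaultdict


class TagIndex:
    """Map tags to file paths so one item can be found under several labels."""

    def __init__(self):
        self._by_tag = defaultdict(set)

    def tag(self, path, *tags):
        """Assign one or more tags to a path (tags are case-insensitive)."""
        for label in tags:
            self._by_tag[label.lower()].add(path)

    def find(self, *tags):
        """Return the paths that carry all of the given tags."""
        sets = [self._by_tag.get(label.lower(), set()) for label in tags]
        return set.intersection(*sets) if sets else set()


index = TagIndex()
# Hypothetical hierarchical paths; the folder gives one location, tags give many views.
index.tag("ProjectA/2018/raw/run_001.csv", "raw", "sequencing")
index.tag("ProjectA/2018/analysis/run_001_qc.csv", "qc", "sequencing")
```

A hierarchy answers "where does this file live?", while `index.find("sequencing")` answers "which files relate to this topic?" regardless of folder.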

Folder Structure Examples – Hierarchical

Folder Structure Examples – Tag-based


Small Group Discussion

•  What sort of structure(s) do you currently use?
•  What do you see as the key advantages and disadvantages of the different types of system?
•  Are there specific tasks one sort of system seems particularly suitable for? How does this apply to your research project?


Data Storage

The everlasting external disks

Are they really permanent? What if…


What if your data is lost

Cancer Research UK – University of Manchester – 27 April 2017


What if your data is lost

•  Your laptop got stolen
•  Your office/house burnt
•  Your USB stick is lost
•  Your portable hard disk is damaged
•  Data copied to Dropbox disappeared

https://en.wikipedia.org/wiki/The_Scream


Storage + Security + Encryption + Backup + Sharing

https://en.wikipedia.org/wiki/The_Scream


•  University Storage Service
•  CRUK CI IT
•  Lab
•  Individual (Time Machine)
•  CLOUD?


At least 2 backups at 2 different locations

External disks    Online backup    Department / College IT

Cheap £10-15 / TB (1024GB)

Failure rate 1.5%/year

Servers

Accessibility Free (limit)

Personal data Hacking

Moving between institutions

Managed by experts

Data Backup


Manual Automated

Copying files to relevant folders

- Install software, e.g. Time Machine (Mac users)

Automatically upload files to the cloud when any changes are saved

Copying files to relevant folders

- RAID technology - Checksums

Data Backup
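
The manual option (copying files to relevant folders) and the checksum idea can be combined in a short sketch: copy a file to more than one destination and confirm each copy by comparing checksums. The function names and the choice of SHA-256 are assumptions for illustration. Dedicated backup software does considerably more than this.

```python
import hashlib
import shutil
from pathlib import Path


def sha256sum(path, chunk_size=1 << 20):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_file(source, destinations):
    """Copy source into each destination folder and verify every copy by checksum."""
    source = Path(source)
    expected = sha256sum(source)
    copies = []
    for folder in destinations:
        folder = Path(folder)
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / source.name
        shutil.copy2(source, target)
        if sha256sum(target) != expected:
            raise IOError(f"checksum mismatch for {target}")
        copies.append(target)
    return copies
```

Passing two destination folders on different devices or sites is one way to approach the "2 backups at 2 different locations" guideline.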


Data backup and file sharing

Space/price 15 GB (free) 1 TB (~£80/year)

1 TB (free)

File history and recovery

Support

File size limit

Yes, unlimited

OS

2 GB (free) Unlimited (£55/year)

Windows, Mac, Linux, Android, iOS

UIS Unsupported UIS

Accessibility Sync anywhere on any devices

None

Yes

5 GB

Windows, Mac, Android, iOS

Live editing

Last 90 days

Integration with Microsoft Office

15 GB

Windows, Mac, Android, iOS


•  Q: If manual... how often? A: How much would you be willing to lose?

•  Software allows you to set up backup time automatically

1 day    1 week    1 month-year

Data Backup
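
The question "how much would you be willing to lose?" can be made concrete by listing what has changed since the last backup. The sketch below assumes the time of the last backup is known as seconds since the epoch. The function name `changed_since` is hypothetical.

```python
from pathlib import Path


def changed_since(folder, last_backup_epoch):
    """Return files under folder modified after the last backup (seconds since epoch)."""
    return sorted(
        path
        for path in Path(folder).rglob("*")
        if path.is_file() and path.stat().st_mtime > last_backup_epoch
    )
```

Everything this returns is work that exists in only one place until the next backup runs.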


More … file sharing

Email Website FTP