File Input and Output July 2nd, 2015

Jan 04, 2016

Godfrey James

Inputs and Outputs

• Inputs

  • Keyboard

  • Mouse

  • Storage (hard drive)

  • Networks

• Outputs

  • Graphs

  • Images

  • Videos (image stacks)

  • Text files

  • Statistical results

Keyboard Input

• The simplest input is reading in lines from the user

print("Please enter a number: ")
x <- scan(n = 1)  # read one number typed at the keyboard
print(paste("Your number is", x))
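
As a rough Python counterpart to the keyboard example above, the sketch below prompts for a number and echoes it back. The names `ask_for_number` and `read_line` are illustrative and not from the slides; `read_line` defaults to the built-in `input` and is a parameter only so the function can be tried without a keyboard.

```python
def ask_for_number(read_line=input):
    """Prompt for one number and return a message that echoes it back."""
    text = read_line("Please enter a number: ")
    number = float(text)  # keyboard input arrives as text, so convert it
    return "Your number is " + str(number)


# Interactive use would simply be: print(ask_for_number())
# Here a stand-in for the keyboard supplies the text "42".
message = ask_for_number(lambda prompt: "42")
```

Note that, unlike the `+` in the slide's pseudocode, the number has to be converted back to text before it can be joined to the message.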

Inputs and Outputs

• Entering data by hand is time-consuming

• Large amounts of data already exist in files

• MATLAB, R, and Python all provide the ability to read data from files (as well as write data to files)

The Basics of File Input and Output

• File input and output are the bread and butter of data manipulation and generation

File → Code → New File
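
The File → Code → New File flow can be sketched in a few lines of Python. This is a minimal illustration, not the deck's own code: the file names, the temporary directory, and the "double every number" step are all assumptions chosen to keep the example small.

```python
import os
import tempfile


def double_numbers(in_path, out_path):
    """Read one number per line, double each, and write them to a new file."""
    with open(in_path, "r", encoding="utf-8") as infile:
        numbers = [float(line) for line in infile if line.strip()]
    doubled = [2 * n for n in numbers]
    with open(out_path, "w", encoding="utf-8") as outfile:
        for value in doubled:
            outfile.write(f"{value}\n")
    return doubled


# Build a small input file in a temporary directory (hypothetical data).
workdir = tempfile.mkdtemp()
in_path = os.path.join(workdir, "numbers.txt")
out_path = os.path.join(workdir, "doubled.txt")
with open(in_path, "w", encoding="utf-8") as f:
    f.write("1\n2.5\n4\n")

result = double_numbers(in_path, out_path)
```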

Part 1: File Types

• Text Files

• Images

• Excel Files

• FASTA files

• Other

Text Files

• A lot of data comes in the form of spreadsheets in text files (.txt or .csv extension)

• Generally the easiest to manipulate

Image files

• Images are stored as grids of numbers, much like spreadsheets: each pixel has an X and Y coordinate and one or more values giving its intensity

• These values depend on what type of file the image is

Binary and Grayscale

• Binary image: represented as either Boolean (TRUE, FALSE) or numerical (0, 1) values

• Grayscale image: represented with a range of numerical values (usually between 0 and 255)
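
To make the two representations concrete, here is a small Python sketch that stores a grayscale image as a grid of numbers and converts it to a binary image. The pixel values, the function name `to_binary`, and the cutoff of 128 are illustrative assumptions, not values from the slides.

```python
# A tiny 2x3 grayscale image: one intensity (0-255) per pixel.
grayscale = [
    [0, 64, 128],
    [255, 200, 30],
]


def to_binary(image, threshold=128):
    """Mark each pixel True if it is at least as bright as the threshold."""
    return [[pixel >= threshold for pixel in row] for row in image]


binary = to_binary(grayscale)

# The same binary image written with 0/1 instead of False/True.
as_numbers = [[int(pixel) for pixel in row] for row in binary]
```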

Binary and Grayscale

[Figure: two example images, labeled Binary and Grayscale]

Color images

• Color images can be thought of as a stack of 3 images (red, green, blue)

• The intensity of each color is represented as a number between 0 and 255
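
The "stack of three images" idea might look like this in Python, using plain nested lists. The 2x2 pixel values and the helper name `pixel_color` are made up for illustration.

```python
# Three 2x2 single-channel images, each holding intensities from 0 to 255.
red = [[255, 0], [0, 255]]
green = [[0, 255], [0, 255]]
blue = [[0, 0], [255, 255]]


def pixel_color(x, y):
    """Look up one pixel in all three layers and return its (R, G, B) triple."""
    return (red[y][x], green[y][x], blue[y][x])


top_left = pixel_color(0, 0)      # pure red
bottom_right = pixel_color(1, 1)  # all channels at maximum: white
```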

Common I/O themes

• File type – The format the data is stored in (text, image, and so on)

• Data type – The type (or mode) of the data (integers, strings, characters)

• Separator – The character that separates one value from the next; in text files, this is often a comma (,)
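
A short Python sketch shows how the separator and the data type both come into play when reading text. The sample text and the function name `parse_table` are assumptions for illustration; `csv.reader` and its `delimiter` argument are part of Python's standard library.

```python
import csv
import io


def parse_table(raw_text, separator=","):
    """Split text into rows and cells, then convert every cell to an integer."""
    reader = csv.reader(io.StringIO(raw_text), delimiter=separator)
    return [[int(cell) for cell in row] for row in reader]


# Comma-separated values (the common case for .csv files).
rows = parse_table("1,2,3\n4,5,6\n")

# The same idea with a tab as the separator.
tab_rows = parse_table("1\t2\n3\t4\n", separator="\t")
```

Until the explicit `int(...)` conversion, every cell is a string; that conversion is the "data type" decision in practice.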

Reading a file

• Most programming languages have functions that “read” files.

• You generally have to specify the type of file being read, or the computer will not know how to interpret what it's looking at

Reading a File

• Reading data into a program can often be difficult and frustrating

• Can sometimes take more time than the rest of the coding!

• It's important to understand the subtleties and nuances that are specific to each language

“Data Preparation”

• When working with spreadsheet-like files (e.g., Excel), it is usually easier to export the data as a text file

• Image files can be converted from one type to another with relative ease. How you want to manipulate an image will determine the format you save it in.

Character Encoding

• Used to represent characters in a form that computers can understand

• There are different character encodings; the most common today are based on Unicode

• Modern operating systems are likely to use UTF-8, an encoding of Unicode

Character Encoding

• If the character encoding of your data differs from the one the computer uses to view it, your text files can look like gibberish.

• Character encoding can be changed using language-specific functions
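
The gibberish effect is easy to reproduce. The Python sketch below encodes a word as UTF-8 bytes, reads those bytes back with the wrong encoding (Latin-1), and then with the right one. The sample word is an arbitrary choice.

```python
original = "caf\u00e9"            # the word "café", written with an escape

data = original.encode("utf-8")   # the bytes as they would sit in a file

garbled = data.decode("latin-1")  # wrong guess about the encoding
correct = data.decode("utf-8")    # right guess

# Text that was decoded with the wrong encoding can often be repaired
# by reversing the mistake and decoding again.
repaired = garbled.encode("latin-1").decode("utf-8")
```

The accented letter takes two bytes in UTF-8, so the wrong decoder shows it as two unrelated characters.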

Reading a File

• R: read.table(), read.csv(), scan()

• Python: open(), .read(), .readline()

• MATLAB: importdata(), fopen(), fscanf()
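
For the Python entries in this list, a minimal sketch of how `open()`, `.read()`, and `.readline()` fit together follows. The file name and contents are placeholders created in a temporary directory so the example is self-contained.

```python
import os
import tempfile

# Create a small text file to read (hypothetical contents).
path = os.path.join(tempfile.mkdtemp(), "example.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("first line\nsecond line\n")

# .read() returns the whole file as one string.
with open(path, "r", encoding="utf-8") as f:
    whole_text = f.read()

# .readline() returns one line per call, including its newline character,
# and an empty string once the end of the file is reached.
with open(path, "r", encoding="utf-8") as f:
    first = f.readline()
    second = f.readline()
    at_end = f.readline()
```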

File Output

• Writing a file is generally easier than reading one in

• Stored data can be written to a file of your choosing, whether it's an image, text file, etc.

• write(x, file = "C:/Program Files/R/data.txt")

• write(x, file = "data.txt", sep = ",")
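
A comparable write in Python might look like the sketch below. The helper name `write_values`, the data, and the temporary output location are assumptions; the point is only that the separator is chosen at write time.

```python
import os
import tempfile


def write_values(values, path, sep=","):
    """Write all values on one line, joined by the chosen separator."""
    with open(path, "w", encoding="utf-8") as outfile:
        outfile.write(sep.join(str(v) for v in values) + "\n")


out_path = os.path.join(tempfile.mkdtemp(), "data.txt")
write_values([1, 2, 3, 4], out_path)

# Read the file back to see exactly what was written.
with open(out_path, "r", encoding="utf-8") as f:
    written = f.read()
```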

The Working Directory

• The working directory can be thought of as the area that the programming language currently points to

• When reading or writing files, if a full path name is not given, the program will automatically look in the working directory to read or write a file
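
In Python, the working directory can be inspected and changed with `os.getcwd()` and `os.chdir()`. The sketch below, which uses a temporary directory as a stand-in for a project folder, shows that a bare file name is resolved against the working directory.

```python
import os
import tempfile

start_dir = os.getcwd()                          # where the program points now
new_dir = os.path.realpath(tempfile.mkdtemp())   # stand-in for a project folder

os.chdir(new_dir)                                # change the working directory

# No full path is given, so the file is created in the working directory.
with open("data.txt", "w", encoding="utf-8") as f:
    f.write("hello\n")

full_path = os.path.abspath("data.txt")          # the path that was actually used

os.chdir(start_dir)                              # restore the original directory
```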

Part 2: Pipelines

• A way of structuring code that makes it simple to work with

• Simply, a pipeline is a chain of processes (such as functions) arranged so that the output of one process is the input to the next process
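
One way to express "the output of one process is the input to the next" in Python is to pass a value through a list of functions in order. The helper `run_pipeline` and the three small steps below are illustrative names, not part of the slides.

```python
from functools import reduce


def run_pipeline(data, steps):
    """Feed data through each step in turn; each output becomes the next input."""
    return reduce(lambda value, step: step(value), steps, data)


def strip_whitespace(text):
    return text.strip()


def to_upper(text):
    return text.upper()


def count_letters(text):
    return len(text)


result = run_pipeline("  atgc \n", [strip_whitespace, to_upper, count_letters])
```

Because each step is an ordinary function, steps can be reordered, swapped, or reused in other pipelines without touching the rest.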

Part 2: Pipelines

Sequence Data → Find Start Codon → Convert to Amino Acids → Save Amino Acid File

Functions

library(Biostrings)  # assumed source of matchPattern() and start()

# Return the sequence from its first start codon onward
findstart <- function(fullseq, startcodon = "atg"){
  startposition <- min(start(matchPattern(startcodon, fullseq)))
  return(substr(fullseq, startposition, nchar(fullseq)))
}

# Translate a nucleotide sequence into amino acids
AminoSequence <- function(sequence){
  amino <- *lots of code and a library of amino acid sequences*
  return(amino)
}

Functions

simplesequence <- scan("simplesequence.txt", what = "character")
sequenceTwo <- findstart(simplesequence)
AminoSeq <- AminoSequence(sequenceTwo)
write(AminoSeq, file = "Amino.txt")
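
For comparison, the same four-stage pipeline can be sketched in Python. This is an illustration under stated assumptions, not a translation of the slides' elided code: the codon table is deliberately partial (four codons plus the three stop codons), unknown codons become "?", and the sequence and file names are invented for the demonstration.

```python
import os
import tempfile

# Partial codon table for illustration only; a real one has 64 entries.
CODON_TABLE = {"atg": "M", "gct": "A", "aaa": "K", "tgg": "W"}
STOP_CODONS = {"taa", "tag", "tga"}


def find_start(full_seq, start_codon="atg"):
    """Return the sequence from its first start codon onward."""
    position = full_seq.find(start_codon)
    if position == -1:
        raise ValueError("no start codon found")
    return full_seq[position:]


def amino_sequence(sequence):
    """Translate codons to one-letter amino acids until a stop codon."""
    amino = []
    for i in range(0, len(sequence) - 2, 3):
        codon = sequence[i:i + 3]
        if codon in STOP_CODONS:
            break
        amino.append(CODON_TABLE.get(codon, "?"))
    return "".join(amino)


# Stage 1: sequence data in a file (hypothetical contents).
workdir = tempfile.mkdtemp()
seq_path = os.path.join(workdir, "simplesequence.txt")
with open(seq_path, "w", encoding="utf-8") as f:
    f.write("ccatggctaaatggtaacc\n")

# Stages 2-4: read, find the start codon, translate, save.
with open(seq_path, "r", encoding="utf-8") as f:
    simple_sequence = f.read().strip()
sequence_two = find_start(simple_sequence)
amino_seq = amino_sequence(sequence_two)
amino_path = os.path.join(workdir, "Amino.txt")
with open(amino_path, "w", encoding="utf-8") as f:
    f.write(amino_seq + "\n")
```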

Pipelines

Structuring your code into functions can improve its readability and allow you to reuse functions for similar processes

Pipelines are excellent for automatically processing large numbers of data sets