By: Raul Rodriguez, Walter Checefsky (Added later)
http://orange.biolab.si/
What is Orange?

• Python-based tool for data mining, developed by the Bioinformatics Laboratory of the Faculty of Computer and Information Science at the University of Ljubljana, Slovenia.

Why does Bioinformatics need this?

• Learn about the interaction of different genes
• Discover different methods of gene expression
• Learn the structure of proteins
• Find probable regions of protein encoding

What’s it do?

• Best known for its graphical user interface (GUI)
• Can also be scripted in Python

Which algorithms can it use?

• Decision trees (ID3, C4.5, CART)
• Naïve Bayes
• Instance-based learning (kNN, ML-kNN)
• Function-based learning (regression analysis (logistic, linear, lasso, PLS, trees, mean), ANN, SVM (libSVM, liblinear))
• Ensemble learning (bagging, AdaBoost, random forest)
• Hierarchical clustering (linkage-based)
• Partition-based clustering (k-means, partitioning around medoids, fuzzy c-means)
• ANN-based clustering (self-organizing)
• Association rules (Apriori (sparse, attribute-value), Apriori-SD)
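
To make one entry in the list concrete, here is a minimal plain-Python sketch of kNN classification. It is not Orange's implementation or API. The function name `knn_predict`, the toy data in `TRAIN`, and the choice of Euclidean distance with k=3 are all assumptions made for illustration.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    # Sort training items by Euclidean distance to the query point.
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    # Count the labels of the k closest items and take the most common one.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated clusters labelled "a" and "b".
TRAIN = [
    ((1.0, 1.0), "a"),
    ((1.2, 0.8), "a"),
    ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"),
    ((5.2, 4.9), "b"),
    ((4.8, 5.1), "b"),
]

print(knn_predict(TRAIN, (1.1, 1.0)))
print(knn_predict(TRAIN, (5.1, 5.0)))
```

Instance-based learners like this keep the training data around and defer all work to prediction time, which is why they need no separate fitting step.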

Type of input?

• Tab-delimited file
• Top row: feature names
• Second row: type of data for each feature
• Third row: meta information describing the features
• Remaining rows: data
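
The layout above can be sketched with the standard library alone. This is a rough illustration of reading a file with three header rows followed by data, not Orange's own loader. The function name `read_tab_file`, the column names, and the sample values (`continuous`, `discrete`, `class`) are assumptions chosen for the example.

```python
import csv
import io


def read_tab_file(handle):
    """Parse a tab-delimited table with three header rows followed by data rows."""
    reader = csv.reader(handle, delimiter="\t")
    names = next(reader)  # row 1: feature names
    types = next(reader)  # row 2: type of data for each feature
    flags = next(reader)  # row 3: meta information about each feature
    # Every remaining non-empty row is one data instance.
    rows = [dict(zip(names, row)) for row in reader if row]
    columns = [
        {"name": n, "type": t, "flag": f}
        for n, t, f in zip(names, types, flags)
    ]
    return columns, rows


# Hypothetical sample file content, held in memory for the demonstration.
SAMPLE = (
    "sepal_length\tsepal_width\tspecies\n"
    "continuous\tcontinuous\tdiscrete\n"
    "\t\tclass\n"
    "5.1\t3.5\tsetosa\n"
    "7.0\t3.2\tversicolor\n"
)

columns, rows = read_tab_file(io.StringIO(SAMPLE))
for col in columns:
    print(col)
print(rows[0])
```

Keeping the type and meta rows next to the names means the file describes itself, so no separate schema is needed to interpret the data rows.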

Example Time!

Why isn’t it perfect?

• No spatial data analysis
• No time series analysis
• No parallelization
• Only Naïve Bayes in the Bayes family
• Fewer algorithm options than other frameworks
• Locks you into Python