Developing Machine Learning Applications, Geoff Holmes, University of Waikato

Machine Learning Application Development

Jun 12, 2015

LARCA UPC

In this talk I will review several real-world applications and tools developed at the University of Waikato over the past 15 years. The early applications focused on agricultural problems such as cow culling, venison bruising and grass grubs. Following this, we looked at the use of near infrared spectroscopy coupled with data mining as an alternative laboratory technique for predicting compound concentrations in soil and plant samples. Our latest application is in the area of gas chromatography mass spectrometry (GCMS), a technique used in environmental applications to determine, for example, the petroleum content of soil and water samples.
Transcript
Page 1: Machine Learning Application Development

Developing Machine Learning Applications
Geoff Holmes, University of Waikato

Page 2: Machine Learning Application Development

Outline

• What application development have we done?

• What lessons have we learned?

• What is needed in terms of the future of machine learning application development?

Page 3: Machine Learning Application Development

Applications – a taxonomy

• UCI data sets – very much like our early agricultural data

• Competition data – usually larger than above, often difficult

• Signal control applications (often involve reinforcement learning) – eg autonomous helicopters, vehicles, learning the signature of a great pianist, learning to sail, learning to drive racing cars faster, learning to play soccer (often linked to robotics)

• Key to success = objective measurement – eg Human Computer Interaction, Speech and Image Recognition, Computer Games, etc.

Page 4: Machine Learning Application Development

WEKA Waikato Environment for Knowledge Analysis

• Machine Learning at Waikato started in 1993

• Build an interface to enable several ML methods to be compared on same data

• Explore datasets of importance to the agricultural sector in NZ

• Apple bruising, Venison bruising, Bull behaviour, Grass grubs, Pasture production, Pea seed colour, Slugs, Squash harvest, Wasp nests, White clover persistence

• Cow culling

• Datasets very much of the “bring out your dead” variety

Page 5: Machine Learning Application Development

WEKA – unscientific study from Google Scholar

• For the query “WEKA applications”

• Bioinformatics

• Grid Computing

• Medicine

• Business and Finance

• Computer Networks

• Education

Page 6: Machine Learning Application Development

Early lessons learned

• Using WEKA is good but only static solutions are possible

• Datasets need to be large enough to yield significant and meaningful results

• Datasets involving human judgement tend to be unreliable

Page 7: Machine Learning Application Development

Scientific Equipment Application Methodology

• Obtain samples and reference data from existing technology (eg wet chemistry) – establish targets Y.

• Process same samples using a proxy (eg NIR) – new X

• Construct new dataset with new X and Y
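The three steps above can be sketched as a simple join on sample ID. This is only an illustration: the sample IDs, the readings and the helper name `build_dataset` are hypothetical and do not come from the talk.

```python
def build_dataset(reference, spectra):
    """Pair proxy measurements (X) with reference targets (Y) by sample ID.

    reference: dict mapping sample ID -> target value from the existing method
    spectra:   dict mapping sample ID -> list of proxy readings (e.g. NIR)
    Samples that are missing from either source are skipped.
    """
    rows = []
    for sample_id in sorted(reference):
        if sample_id in spectra:
            rows.append((spectra[sample_id], reference[sample_id]))
    return rows


# Made-up values, purely for illustration.
reference = {"s1": 2.1, "s2": 3.4, "s3": 1.8}              # Y from wet chemistry
spectra = {"s1": [0.10, 0.52], "s2": [0.14, 0.61],
           "s4": [0.09, 0.40]}                             # X from the proxy
dataset = build_dataset(reference, spectra)                # only s1 and s2 match
```

Each row of `dataset` is an (X, Y) pair that a regression method could then be trained on.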

Page 8: Machine Learning Application Development

Near Infrared Spectroscopy

• Once concept was proven we needed a system to support commercial use (ie alongside the LIMS)

• Developed S2 (with WEKA interface):

• Used continuously at Hill Laboratories and BLGG (Holland) since around 2005 – never gone wrong!

• So far it is the best application of the technology that we have ever come across.

• Faster than wet chemistry

• Predictions can be more accurate

• Large cost savings – multiple analyses per sample

Page 9: Machine Learning Application Development

S2

Page 10: Machine Learning Application Development

NIR – lessons learned

• Very lightweight input/output solution using dropbox methodology was successful as it is transparent and seamless alongside a LIMS.

• Instrument data is extremely reliable

• In this industry, compliance is important, which implies that a single algorithm is better than choosing the best method per dataset.

• As data is abundant, models are rebuilt from time to time.

• No facility for users to develop new applications.
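As a rough sketch of the dropbox idea in the first bullet, assuming that "dropbox" here means a watched drop folder into which the LIMS writes input files and from which it collects results. The folder layout, the file extensions and the function name `process_dropbox` are assumptions for illustration, not details of S2.

```python
from pathlib import Path


def process_dropbox(inbox, outbox, predict):
    """Process every *.txt file in inbox once and write a result to outbox.

    predict is any callable that maps a list of floats to a prediction.
    Returns the names of the files that were handled.
    """
    inbox, outbox = Path(inbox), Path(outbox)
    outbox.mkdir(parents=True, exist_ok=True)
    handled = []
    for path in sorted(inbox.glob("*.txt")):
        # Assumed input format: whitespace-separated numeric readings.
        values = [float(token) for token in path.read_text().split()]
        result = predict(values)
        (outbox / (path.stem + ".result")).write_text(f"{result}\n")
        path.unlink()  # remove the input so it is not processed twice
        handled.append(path.name)
    return handled
```

Calling this function periodically keeps the learning component decoupled from the LIMS: the only shared interface is a pair of directories.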

Page 11: Machine Learning Application Development

Gas Chromatography Mass Spectrometry

• Analytical instrument that combines the features of gas chromatography and mass spectrometry to identify different substances within a test sample

• Typical Applications

• Environmental monitoring

• Food and beverage analysis

• Criminal forensics (CSI!)

• Drugs/explosives detection

Page 12: Machine Learning Application Development

Example Chromatogram (PAH) – ion counts

Page 13: Machine Learning Application Development

MS fingerprints

Page 14: Machine Learning Application Development

Machine Learning Approach

• Chromatograms are pre-processed to extract features

• Dataset constructed combining pre-processed chromatograms with analyst checked compound concentrations

• Learn the relationship between pre-processed chromatograms and compound concentrations:

• extensive pre-processing of data

• parallel processing – 5000 * 300 values per instance (NIR = 1000)

• pre-processing varies among compounds
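To give a feel for the scale difference quoted above, here is a small back-of-the-envelope calculation. The instance sizes are taken from the slide; the 8 bytes per value (64-bit floats) is an assumption.

```python
GCMS_VALUES = 5000 * 300   # values per GCMS instance, from the slide
NIR_VALUES = 1000          # values per NIR instance, from the slide
BYTES_PER_VALUE = 8        # assumption: values stored as 64-bit floats


def instance_megabytes(n_values, bytes_per_value=BYTES_PER_VALUE):
    """Approximate storage for one instance, in megabytes (10**6 bytes)."""
    return n_values * bytes_per_value / 1e6


ratio = GCMS_VALUES // NIR_VALUES            # 1500 times more values than NIR
gcms_mb = instance_megabytes(GCMS_VALUES)    # 12.0 MB per raw GCMS instance
```

Under that assumption a single raw GCMS instance is about 12 MB, which helps explain the need for extensive pre-processing and parallel processing.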

Page 15: Machine Learning Application Development

Process Requirements

Page 16: Machine Learning Application Development

Solution = Advanced DAta Mining System (ADAMS)

• get database IDs of chromatograms

• load chromatograms from DB

• identify and reject outliers

• obtain calibration set information, check correctness of set

• align with calibration chromatogram, check correlation

• compound-specific outlier detection

• generate artificial chromatogram with peaks of compound and spike compound

• generate output for WEKA
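One step in the list above, "align with calibration chromatogram, check correlation", could look roughly like the following. This is a minimal sketch and not the ADAMS implementation: it assumes a chromatogram is a plain list of intensities, tries a small range of integer shifts, and keeps the shift with the highest Pearson correlation. The function names and the toy signals are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if one is flat)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def shift(signal, k):
    """Shift right by k points (left if k < 0), padding with zeros.

    Assumes abs(k) is smaller than len(signal).
    """
    n = len(signal)
    if k >= 0:
        return [0.0] * k + signal[:n - k]
    return signal[-k:] + [0.0] * (-k)


def align(signal, calibration, max_shift=3):
    """Return (best_shift, correlation) over shifts in [-max_shift, max_shift]."""
    best = max(range(-max_shift, max_shift + 1),
               key=lambda k: pearson(shift(signal, k), calibration))
    return best, pearson(shift(signal, best), calibration)


# Toy example: the same peak, delayed by two points in the measured signal.
calibration = [0.0, 0.0, 1.0, 5.0, 1.0, 0.0, 0.0, 0.0]
measured = [0.0, 0.0, 0.0, 0.0, 1.0, 5.0, 1.0, 0.0]
best, corr = align(measured, calibration)   # best == -2 (shift left by two)
```

A chromatogram whose best correlation stays below some chosen threshold could then be rejected rather than passed on to the later steps.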

Page 17: Machine Learning Application Development

Limitations and future directions

• What we have seen so far works with data resident in memory (RAM) all the time

• This implies a limit can easily be reached, especially in applications like GCMS.

• We would like to be able to learn from potentially infinite data sources but with finite memory (RAM).
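A minimal sketch of what learning from an unbounded source with finite memory can mean: the learner sees one instance at a time, updates a fixed-size model, and never stores the data. The example uses plain stochastic gradient descent for linear regression, which is a stand-in chosen for brevity and not a method named in the talk; the synthetic stream and all names are hypothetical.

```python
def sgd_regressor(stream, n_features, learning_rate=0.1):
    """Fit a linear model from a stream of (x, y) pairs, one pass, O(n_features) memory."""
    weights = [0.0] * n_features
    bias = 0.0
    for x, y in stream:
        prediction = bias + sum(w * xi for w, xi in zip(weights, x))
        error = prediction - y
        # Gradient step on the squared error of this single instance.
        bias -= learning_rate * error
        weights = [w - learning_rate * error * xi for w, xi in zip(weights, x)]
    return weights, bias


def synthetic_stream(n):
    """Hypothetical data source: y = 2*x + 1 with x cycling through 0.0 .. 0.9."""
    for i in range(n):
        x = (i % 10) / 10.0
        yield [x], 2.0 * x + 1.0


weights, bias = sgd_regressor(synthetic_stream(20000), n_features=1)
```

Because `synthetic_stream` is a generator, the 20000 instances are never held in memory at once; only the weights and the bias persist between instances.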

Page 18: Machine Learning Application Development

Solution = MOA

Page 19: Machine Learning Application Development

Future Directions

• Investigate how to get users to deploy their own DM solutions

• Implement incremental pre-processing techniques (Joao has already started!), eg incremental outlier detection.

• Implement incremental algorithms, especially for regression.

• Encourage work on abstention classifiers, uncertainty associated with point predictions etc.

• Meta-mine which units of a workflow are useful in tandem

• Investigate fusion: ADAMS with MOA, data (image+features), tasks (multiview, multitask, transfer)
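To illustrate the kind of incremental pre-processing mentioned above, here is one possible form of incremental outlier detection. It is a sketch under stated assumptions, not the approach referred to in the talk: it keeps a running mean and variance with Welford's algorithm and flags values more than `threshold` standard deviations from the mean. The class name, the threshold and the warm-up length are all hypothetical choices.

```python
import math


class RunningOutlierDetector:
    """Flag values far from the running mean, using constant memory."""

    def __init__(self, threshold=3.0, warmup=10):
        self.threshold = threshold  # number of standard deviations
        self.warmup = warmup        # values to see before flagging anything
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0               # running sum of squared deviations

    def update(self, value):
        """Return True if value looks like an outlier, then absorb it."""
        is_outlier = False
        if self.count >= self.warmup:
            std = math.sqrt(self.m2 / (self.count - 1))
            if std > 0 and abs(value - self.mean) > self.threshold * std:
                is_outlier = True
        # Welford's update; note that flagged values are absorbed as well.
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)
        return is_outlier


detector = RunningOutlierDetector()
readings = [10.0, 10.2, 9.8, 10.1, 9.9] * 4 + [50.0]
flags = [detector.update(v) for v in readings]   # only the last reading is flagged
```

Absorbing flagged values into the statistics keeps the sketch short; a real system might prefer to exclude them or to use a robust estimator.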

Page 20: Machine Learning Application Development

Finally

Questions or Comments?
