Source: keywords4bytecodes.org/RALI-OLST2012.pdf
Keywords for Bytecodes
Dr. Duboue

Introduction
  The Speaker
  Bytecodes as Semantics
  Reverse Engineering
Details
  Corpus Assembly
  Main Pipeline
  Results
  Applications
Other Topics
  GRIUM/RALI
  Other Academic
  Focus on Technology
Summary

Predicting English Keywords from Java Bytecodes
Pablo Ariel Duboue, PhD
Les Laboratoires Foulab, Montreal, Quebec
Séminaires RALI-OLST, Université de Montréal

Outline

Introduction
  About the Speaker
  Bytecodes as Weak Semantics
  Reverse Engineering

- IBM Research Watson
  - AQUAINT: Question Answering (PIQuAnT)
  - Enterprise Search - Expert Search (TREC)
  - Connections between events (GALE)
  - Deep QA - Watson

In Montreal

I am passionate about improving society through language technology and split my time between teaching, doing research and contributing to free software projects.

- Working with Prof. Nie at GRIUM
- Taught a graduate class in NLG in Argentina
- Contributed to Free Software projects, including some of my own
- Doing some consulting focusing on startups

Semantics, Java Bytecodes, Javadocs

- Motivation: Machine Learning for Natural Language Generation
- Finding good semantic representations “in the wild” is very rare
- Level of detail of semantic representations vs. natural language
- Similarities with binary code and code comments
- Reverse Engineering practitioners could tolerate noisy text
- As discussed in the INLG panel last summer

Java Bytecodes

- JVM is a stack machine
- The set of opcodes (~200) is small to simplify porting to new architectures.
- The opcodes fall into seven categories:
  - Load/store (e.g. aaload, bastore)
  - Arithmetic/logic (e.g. iadd, fcmpg)
  - Type conversion (e.g. i2b, f2d)
  - Object construction and manipulation (e.g. new, putfield)
  - Operand stack manipulation (e.g. swap, dup2_x1)
  - Control flow (e.g. if_icmpgt, goto)
  - Method invocation and return (e.g. invokedynamic, lreturn)
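
To make the "stack machine" point concrete, here is a minimal sketch of an operand stack in Python. It is an illustration only, not part of the talk: it covers a handful of opcodes (iconst_0 to iconst_5, iadd, imul, dup, swap, pop), and as a simplification it uses unbounded Python integers rather than the JVM's 32-bit int arithmetic. The function name run is hypothetical.

```python
# Toy operand stack, assuming a program is a list of opcode mnemonics.
# Simplification: Python ints do not wrap around at 32 bits as JVM ints do.
ICONST = {f"iconst_{n}": n for n in range(6)}  # iconst_0 .. iconst_5

def run(program):
    """Execute a list of opcode names and return the final operand stack."""
    stack = []
    for op in program:
        if op in ICONST:
            stack.append(ICONST[op])          # push a small int constant
        elif op == "iadd":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)               # pop two ints, push their sum
        elif op == "imul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)               # pop two ints, push their product
        elif op == "dup":
            stack.append(stack[-1])           # duplicate the top value
        elif op == "swap":
            stack[-1], stack[-2] = stack[-2], stack[-1]  # exchange top two values
        elif op == "pop":
            stack.pop()                       # discard the top value
        else:
            raise ValueError(f"unsupported opcode: {op}")
    return stack

# (2 + 3) * 4 expressed without any named temporaries
print(run(["iconst_2", "iconst_3", "iadd", "iconst_4", "imul"]))
```

Every instruction takes its inputs from the stack and leaves its result there, which is why the opcodes themselves need so few explicit operands.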

LDC and CALL

- While bytecodes represent a reduced vocabulary, they can incorporate names of classes or methods and string constants

  ldc            pushes a constant onto the operand stack (number or string)
  getfield       instance and field name
  getstatic      classname and field name
  invokedynamic  invokes a dynamic method
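
As a rough illustration of how the names and constants carried by these instructions could be turned into English-like words, here is a small Python sketch. It is an assumption-laden example, not the pipeline from the talk: the "opcode operand" line format, the NAME_BEARING set (taken from the four opcodes on this slide) and both function names are hypothetical.

```python
import re

# Opcodes whose operand carries a symbolic name or a constant (from the slide).
# Hypothetical choice: the set could be extended with other invoke* opcodes.
NAME_BEARING = {"ldc", "getfield", "getstatic", "invokedynamic"}

def split_identifier(name):
    """Split a Java-style identifier (camelCase, ALL_CAPS) into lowercase words."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]

def words_from_instruction(line):
    """Return the words found in the operand of one 'opcode operand' line."""
    opcode, _, operand = line.strip().partition(" ")
    if opcode not in NAME_BEARING:
        return []
    words = []
    # Break the operand on dots, quotes, spaces and other punctuation first,
    # then split each remaining chunk as an identifier.
    for chunk in re.split(r"[^A-Za-z0-9]+", operand):
        words.extend(split_identifier(chunk))
    return words

print(words_from_instruction("getstatic java.lang.System.out"))
print(words_from_instruction('ldc "Cache capacity exceeded"'))
```

Instructions without a symbolic operand, such as aload_0, contribute no words in this sketch.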

Javadocs

- Javadocs are standardized Java comments
- Include special mark-up in the form of '@' constructions
  - @param, @throws, @return among others
- In my work, I focus on the comments associated with each method
- Example:
  - Creates a CacheRandom instance with a given cache capacity. @param capacity The capacity of the cache.
  - Adjusts the relative offset where the match begins to an absolute value. Only used by AwkMatcher to adjust the offset for stream matches. @return The length of the match.
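
The examples above mix a free-text description with '@' constructions. A minimal Python sketch of separating the two is shown here, assuming the comment has already been flattened to a single string as in the examples; the function name parse_javadoc is hypothetical and this is not claimed to be the method used in the talk.

```python
import re

def parse_javadoc(text):
    """Split a flattened Javadoc comment into (description, [(tag, body), ...])."""
    # Split on whitespace that is immediately followed by a block tag like @param.
    # Inline tags such as {@link Foo} are preceded by '{', so they are left alone.
    pieces = re.split(r"\s+(?=@\w+)", text.strip())
    description = "" if pieces[0].startswith("@") else pieces[0]
    tags = []
    for piece in pieces:
        if piece.startswith("@"):
            name, _, body = piece.partition(" ")
            tags.append((name, body.strip()))
    return description, tags

comment = ("Creates a CacheRandom instance with a given cache capacity. "
           "@param capacity The capacity of the cache.")
description, tags = parse_javadoc(comment)
print(description)
print(tags)
```

Keeping the description apart from the tag bodies makes it possible to decide separately which part of the comment to use as target text.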

What is Reverse Engineering

- From Wikipedia:
    Reverse engineering is the process of discovering the technological principles of a device, object, or system through analysis of its structure, function, and operation. (...) The same techniques are subsequently being researched for application to legacy software systems (...) to replace incorrect, incomplete, or otherwise unavailable documentation.
- REcon: the premier reverse engineering conference, held yearly in Montreal

Reverse Engineering Example

private final int c(int) {
   0 aload_0
   1 getfield org.jpc.emulator.f.v
   4 invokeinterface org.jpc.support.j.e()
   9 aload_0
  10 getfield org.jpc.emulator.f.i
  13 invokevirtual org.jpc.emulator.motherboard.q.e()
  16 aload_0
  17 getfield org.jpc.emulator.f.j
  20 invokevirtual org.jpc.emulator.motherboard.q.e()
  23 iconst_0
  24 istore_2
  25 iload_1
  26 ifle 128
  29 aload_0
  30 getfield org.jpc.emulator.f.b
  33 invokevirtual org.jpc.emulator.processor.t.w()
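
A listing like this one can be read mechanically once it is laid out one instruction per line. The Python sketch below parses such text into (offset, opcode, operand) tuples and counts opcodes. It assumes an "offset opcode [operand]" layout, uses only a few lines of the example as sample input, and the names parse_listing and SAMPLE are hypothetical.

```python
from collections import Counter

def parse_listing(listing):
    """Parse 'offset opcode [operand]' lines into (offset, opcode, operand) tuples."""
    instructions = []
    for line in listing.strip().splitlines():
        fields = line.split(None, 2)
        # Skip anything that does not start with a numeric bytecode offset,
        # such as the method signature line.
        if len(fields) < 2 or not fields[0].isdigit():
            continue
        operand = fields[2] if len(fields) > 2 else None
        instructions.append((int(fields[0]), fields[1], operand))
    return instructions

# A few lines taken from the example above.
SAMPLE = """
private final int c(int) {
0 aload_0
1 getfield org.jpc.emulator.f.v
4 invokeinterface org.jpc.support.j.e()
9 aload_0
26 ifle 128
"""

instructions = parse_listing(SAMPLE)
opcode_counts = Counter(opcode for _, opcode, _ in instructions)
print(instructions[1])
print(opcode_counts.most_common(1))
```

The leading numbers are bytecode offsets rather than line numbers, which is why a branch such as ifle 128 can refer to them.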

- Bing snippet: Return to Notae. Wine and Rome. Now nearly extinct in the wild, grapes (vitis vinifera) grew throughout the ancient Mediterranean, the juice readily fermenting as the enzymes ...
- Summarization: Wine almost always was mixed with water for drinking; undiluted wine merum was considered the habit of provincials and barbarians. The earliest work on wine and agriculture was written in Punic. Indeed, by 154 BC, says Pliny, wine production in Italy was unsurpassed.