How to conduct high quality research and write good papers Haixun Wang Microsoft Research Asia
Page 1: How2research

How to conduct high quality research and write good papers

Haixun Wang

Microsoft Research Asia

Page 2: How2research

2

What is research?

1. Solve a problem using existing methods. Write a README.txt. (low innovation, little impact)

2. Improve existing solutions to an existing problem. Write a tech report. (low innovation, little impact)

3. Create a new solution to an existing problem. Write a paper. (high innovation, low impact)

4. Identify a new problem. Generalize the solution. Write a paper. (high innovation, high impact)

Page 3: How2research

Research and Engineering

• New Solutions Useful Solutions

3

Page 4: How2research

How innovative are you?

4

Page 5: How2research

5

• Why, if the Chinese had come to know so much about earthquakes so early on in their immensely long history, were they never able to minimize the effects of the world’s contortions — to at least the degree that America has?

• Why did they leave the West to become leaders in the field, and leave themselves to become mired, time and again, in the kind of tragic events that we are witnessing this week?

• It is a cruel irony that the children who died during the earthquake in Dujiangyan (都江堰), China, knew all too well that their country once led the world in the knowledge of the planet’s seismicity.

Page 6: How2research

6

• There had been any number of Chinese Euclids and Archimedes but there was never to be a Chinese Newton or Galileo.

• Until this week Dujiangyan was a place of which China could be proud; today its wreckage stands as a tragic monument to a culture that turned its back on its remarkable and glittering history (of innovation).

• In almost every area of technology the Chinese were once supreme, without competition. And yet, in the 16th century China’s innovative energies inexplicably withered away, and modern science became the virtual monopoly of the West.

Page 7: How2research

How to train your innovation?

7

Page 8: How2research

Read, Read, Read

8

Page 9: How2research

9

Malcolm Gladwell

Editor, New Yorker

Page 10: How2research

10

Page 11: How2research

10,000 hours of success

Excellence requires a minimum level of practice.

10,000 hours is the magic number

(3 hours per day for 10 years)
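
The arithmetic behind this rule of thumb is easy to check. The sketch below assumes practice on every day of a 365-day year, which the slide does not state; the constant names are illustrative.

```python
# Check the "3 hours per day for 10 years" rule of thumb.
# Assumption (not from the slide): practice happens on all 365 days of each year.
HOURS_PER_DAY = 3
DAYS_PER_YEAR = 365
YEARS = 10

total_hours = HOURS_PER_DAY * DAYS_PER_YEAR * YEARS
print(total_hours)  # 10950, a little above the 10,000-hour mark
```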

11

Page 12: How2research

By the time Bill Gates dropped out of Harvard, he had been programming nonstop for seven years, which was way past 10,000 hours.

12

In the last 10 years, I spent more than 3 hours watching TV every day. How come I didn’t achieve anything?

Page 13: How2research

13

Nicholas Carr, Atlantic Monthly, July 2008

Page 14: How2research

14

Independent thinking

• the downfall of deep reading/thinking

• The Internet is rewiring our brains, forcibly adapting us to tolerate only bite-sized summations and simplified blips at the expense of deeper thought

• We risk turning into ‘pancake people’, spread wide and thin as we connect with that vast network of information accessed by the mere touch of a button.

Page 15: How2research

15

How to train your creativity?

Write, Write, Write!

Page 16: How2research

16

Research = Writing + Rewriting

• Turn your idea into writing before implementing it.

• Hard to write it down? Because you don’t understand the problem (or your idea).

– Writing forces us to be clear, focused

– Writing crystallises what we don’t understand

• Writing opens the way to dialogue with others: reality check, critique, and collaboration.

Page 17: How2research

17

Research = Writing + Rewriting

• The process of writing and rewriting is the process of

– developing your idea

– generalizing your problem/solution

• After many rewrites, your problem (idea) may be totally different from the one you started with

– more interesting and challenging

• It’s not a waste of time. It’s how you should spend your time when you do research.

Page 18: How2research

How to find a topic?

The Theory of Flying Pigs

18

Page 19: How2research

In Reality

– Pigs do not have to fly.

Page 20: How2research

[ABSTRACT] In this paper, we identify the importance for pigs to fly. We show that many challenging tasks can be modeled by flying pigs. Thus, solving the flying pig problem benefits a large variety of applications.

20

Page 21: How2research

[ABSTRACT] In this paper, we extend the pioneering work of flying pigs [1]. Our improvement enables pigs to fly higher.

Page 22: How2research

[ABSTRACT] Recently, the flying pig problem has attracted significant attention [1, 2]. However, pigs in previous work all fly very slowly. In this paper, we introduce a technique so that pigs can fly an order of magnitude faster.

22

Page 23: How2research

and soon we have many papers …

Page 24: How2research

24

What topic to work on?

• The choices you make will define your career

• No real problems at hand

– Get a conference proceedings volume. Read from the first page.

– Ask senior people what they are working on.

– Make it go faster/higher

• Find real problems, use real data

Page 25: How2research

25

Is this topic meaningful?

• Convince yourself

– an issue of research ethics

• Talk to your colleagues

– Hey! I have a crazy idea

– Convince them

• Talk to/Read from people not in your field

– mathematicians, physicists, biologists, …

Page 26: How2research

26

Database research as an example

• Databases have been one of the most successful fields in CS in terms of applications and industrial value!

• However, is there anything left for substantial database research?

– Relational database theory, a closed world?

– Too many index structures already?

Page 27: How2research

27

Example: Data Model

• From: RDBMS

– Normalization is one of the cornerstones of RDBMS

– Theoretical results and practical applications

• To: XML

– Storage model: still an open problem

– Hybrid databases, native XML support

Page 28: How2research

28

Example: Logic Databases

• Logic databases were a hot topic in the 80’s and early 90’s

– models, semantics, magic sets, …

– many results have since been incorporated into RDBMS

– is Logic Database dead?

• Rejuvenated by semantic query processing

– ontology, description logics

Page 29: How2research

29

Broadening the Scope

• Concern (VLDB Endowment meeting, ’98):

– The area of database research may lose the pivotal role it now plays among information system technologies

• Keep DB research current and relevant

– We should maintain a watch on trends and future directions in the general area of information management

• Can a traditionally non-DB/KDD research problem be treated using DB/KDD methods?

Page 30: How2research

31

Writing techniques

• Overcome language barrier

• Paper structure and content

Page 31: How2research

32

The Language Barrier

• One must first know the rules to break them

Page 32: How2research

33

Some General Tips

• Choose the right word/phrase

• Use the active voice

• A picture is worth 10,000 words

• Use a fair amount of formalization

• The divide-and-conquer approach

• Keep it simple and stupid

Page 33: How2research

34

Choose the right word/phrase

• Chicken without sexual life

• Husband and wife’s lung slice

• Bean curd made by a pockmarked woman

Page 34: How2research

35

Use the active voice

• Ten Yuan will be paid for every one-time towel you use.

Page 35: How2research

36

Use the active voice

NO: It can be seen that...
YES: We can see that...

NO: 34 tests were run
YES: We ran 34 tests

NO: These properties were thought desirable
YES: We wanted to retain these properties

NO: It might be thought that this would be a type error
YES: You might think this would be a type error

The passive voice is “respectable” but it DEADENS your paper. Avoid it at all costs.

“We” = you and the reader

“We” = the authors

“You” = the reader

Slide borrowed from Simon Peyton Jones

Page 36: How2research

37

Some General Tips

• Choose the right word/phrase

• Use the active voice

• A picture is worth 10,000 words

• Use a fair amount of formalization

• The divide-and-conquer approach

• Keep it simple and stupid

Page 37: How2research

38

Be Specific

NO! We describe the WizWoz system. It is really cool.
YES! We give the syntax and semantics of a language that supports concurrent processes (Section 3). Its innovative features are...

NO! We study its properties
YES! We prove that the type system is sound, and that type checking is decidable (Section 4)

NO! We have used WizWoz in practice
YES! We have built a GUI toolkit in WizWoz, and used it to implement a text editor (Section 5). The result is half the length of the Java version.

From Simon Peyton Jones

Page 38: How2research

39

Structure (conference paper)

• Title (1000 readers)

• Abstract (4 sentences, 100 readers)

• Introduction (1 page, 100 readers)

• The problem (1 page, 10 readers)

• My idea (2 pages, 10 readers)

• The details (5 pages, 3 readers)

• Related work (1-2 pages, 10 readers)

• Conclusions and further work (0.5 pages)

Slide borrowed from Simon Peyton Jones

Page 39: How2research

40

An Attractive Abstract Counts

• The abstract is for people to skim in one minute

– No technical details

– Plain English, easy to understand

– No assumption of DB/KDD background

– As short as possible

• What to write

– The problem, and why it is important and challenging

– Your technical thrust, progress and contributions

– Broader impact

• Write it last!

Page 40: How2research

41

What Is a Good Introduction

• Start from good stories

– Motivation: what is the problem and why is the problem important?

– 1-2 typical real-life applications

• Intuition and general ideas

– Intuition is most important!

– No technical details

– Understandable for a CS undergraduate

– Use clear, small examples

Page 41: How2research

42

What Is a Good Introduction (2)

• Highlight major contributions

– Typical examples: identifying a new problem, novel solutions, a systematic performance study, …

– Only list the major ones; don’t overclaim

– Again, no technical details

– A road map of the rest of the paper

Page 42: How2research

What’s the difference?

43

Hardcover: 1312 pages

Publisher: Wiley; 7th edition (June 20, 2001)

Language: English

ISBN-10: 0471381578

ISBN-13: 978-0471381570

Product Dimensions: 10.1 x 9.1 x 1.9 inches

Shipping Weight: 6.1 pounds

Pages: 378

Publication date: January 2004

ISBN: 7040137860

Barcode: 9787040137866

Page 43: How2research

44

Writing a paper is like telling a story

• The goal of the title is to get the reader to read the abstract …

• The goal of the abstract is to get the reader to read the introduction …

• …

• You need a good setup … suspense … then you unfold your story slowly …

Page 44: How2research

45

Goal: creating suspense

• Reader thinks “gosh, if they can really deliver this, that’d be exciting. I’d better read on”

Page 45: How2research

46

Create Suspense

Many years later, as he faced the firing squad, Colonel Aureliano Buendía was to remember that distant afternoon when his father took him to discover ice.

One Hundred Years of Solitude, by Gabriel García Márquez

Page 46: How2research

47

Keep it Simple and Stupid

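All night the north wind blew hard (一夜北风紧)

Dream of the Red Chamber / Cao Xueqin

“This line may be plain and gives no hint of what follows, but that is exactly how someone who knows poetry begins. It is not only good; it also leaves later writers endless room to continue.”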

Page 47: How2research

48

An Example (SIGMOD’02)

Page 48: How2research

49

Motivation Found!

Shifting Pattern {b,c,h,j,e}

Scaling Pattern {f,d,a,g,i}

Page 49: How2research

50

Is It Meaningful?

        CH1I  CH1B  CH1D  CH2I  CH2B  …
VPS8    401   281   120   275   298
SSA1    401   292   109   580   238
SP07    228   290   48    285   224
EFB1    318   280   37    277   215   …
MDM10   538   272   266   277   236
CYS3    322   288   41    278   219
DEP1    317   272   40    273   232   …
NTG1    329   296   33    274   228
…       …     …
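
One way to see why such data can be meaningful: taking "shifting pattern" to mean that rows differ from one another by a constant offset, the rows VPS8, EFB1, and CYS3 in the table above form one on the columns CH1I, CH1D, and CH2B. The following is a minimal sketch of that check. The helper name `is_shifting_pattern` and the choice of rows and columns are illustrative assumptions, not the algorithm from the SIGMOD'02 paper.

```python
# Values copied from the table above (columns CH1I, CH1B, CH1D, CH2I, CH2B).
COLUMNS = ["CH1I", "CH1B", "CH1D", "CH2I", "CH2B"]
ROWS = {
    "VPS8": [401, 281, 120, 275, 298],
    "EFB1": [318, 280, 37, 277, 215],
    "CYS3": [322, 288, 41, 278, 219],
}


def is_shifting_pattern(rows, columns, names, selected):
    """Return True if the named rows differ by a constant offset on the selected columns.

    Hypothetical helper, for illustration only.
    """
    idx = [columns.index(c) for c in selected]
    base = [rows[names[0]][i] for i in idx]
    for name in names[1:]:
        # Collect the distinct offsets between this row and the first row.
        offsets = {rows[name][i] - b for i, b in zip(idx, base)}
        if len(offsets) != 1:  # more than one distinct offset: not a pure shift
            return False
    return True


names = ["VPS8", "EFB1", "CYS3"]
print(is_shifting_pattern(ROWS, COLUMNS, names, ["CH1I", "CH1D", "CH2B"]))  # True
print(is_shifting_pattern(ROWS, COLUMNS, names, COLUMNS))  # False
```

On all five columns the offsets disagree, so the pattern is visible only in a subset of the conditions.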

Page 50: How2research

51

Intuition Is the Most Important

• Example

– ensemble classifier for streams

• Why ensemble?

– Rigorous mathematical proof which shows ensemble reduces classification variance

• Many benefits

– High accuracy, ease of use, best approach in many aspects

• Result:

– paper rejected
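
The variance-reduction claim can be given some intuition with a toy simulation. This is only a sketch under strong assumptions that are not in the slides: each model's error is independent Gaussian noise, the ensemble simply averages its members, and all constants are hypothetical. It is not the method of the rejected paper.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

TRUE_VALUE = 1.0    # hypothetical quantity every model tries to estimate
NOISE_STD = 0.5     # hypothetical per-model noise level
ENSEMBLE_SIZE = 10  # hypothetical number of models in the ensemble
TRIALS = 2000


def single_prediction():
    """One noisy model: the true value plus independent Gaussian noise."""
    return random.gauss(TRUE_VALUE, NOISE_STD)


def ensemble_prediction(k):
    """Average of k independent noisy models."""
    return statistics.mean(single_prediction() for _ in range(k))


single = [single_prediction() for _ in range(TRIALS)]
ensemble = [ensemble_prediction(ENSEMBLE_SIZE) for _ in range(TRIALS)]

var_single = statistics.pvariance(single)
var_ensemble = statistics.pvariance(ensemble)
# Under the independence assumption, the second value should be
# roughly 1/ENSEMBLE_SIZE of the first.
print(var_single, var_ensemble)
```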

Page 51: How2research

52

Optimal decision boundary

[Figure: optimal decision boundary. Labels: t0, t1, t2; “t1 & t2 errors”; “t0 & t2 no errors!”; “t0 & t1 & t2 errors”]

Page 52: How2research

53

How to Present Technical Details?

• The top-down approach

– First give an overview of the algorithm

– Present details of the major steps

• The bottom-up approach

– Start from the critical details

– Summarize the discussion and present the algorithm

• The hybrid approach

– Top-down to partition the global problem

– Bottom-up to present solutions to sub-problems

Page 53: How2research

54

How to Present Examples?

• Occam’s razor (the principle of parsimony)

– “One should not increase, beyond what is necessary, the number of entities required to explain anything”

• Find the simplest example that can show all the points you want to show

– Some data in running examples can be highly skewed

– Only select data that can show critical ideas

Page 54: How2research

55

Worksheet of Running Example

• Work out the complete running example

• Select the interesting and critical segments

• Present multiple small examples in the paper

– Only one running example if possible

– Preferably several paragraphs in one example

– Don’t give a long, exhaustive example

– Each example should focus on one point

Page 55: How2research

56

How to Present Algorithms?

• Choose the appropriate level of abstraction

– Operations obvious – omit them

• Readers have general CS background

– Complicated operations – function description

• The WWH sequence

– Why do we need such an operation?

– What is the operation?

– How can the operation be done efficiently?

Page 56: How2research

57

Keep Your Algorithm Short

• Long algorithms are hard to understand

• Multi-level expansion of algorithms

– Use functions or procedures

• Ideally, each algorithm is less than 20 lines

• Control the complexity

– Don’t use too many variables

– Use meaningful variable names

– Use plain text to explain
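
The advice above can be illustrated with a small sketch. The task (finding the k most frequent words) and every function name are hypothetical, chosen only to show a short top-level routine that pushes detail into named helpers.

```python
from collections import Counter


def tokenize(text):
    """Split text into lowercase words (deliberately simplistic)."""
    return text.lower().split()


def count_words(words):
    """Count how often each word occurs."""
    return Counter(words)


def top_k_words(text, k):
    """Top level: reads like the outline a paper would present first."""
    words = tokenize(text)
    counts = count_words(words)
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:k]]


print(top_k_words("a rose is a rose is a rose", 2))  # ['a', 'rose']
```

The top-level function stays a few lines long, and a reader who only wants the overall idea never has to look inside the helpers.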

Page 57: How2research

58

Performance Study Goals

• “Wisconsin wallpaper”

• Clearly say why you design and conduct the experiments

– Effectiveness measures

– Efficiency measures

– Other considerations

Page 58: How2research

59

How to Present Experimental Results?

• Experiment settings

• Performance study goals

• Selected experimental results

– Explanation

• Summary of performance study

Page 59: How2research

60

How to Handle Related Work?

• If possible, talk about related work at the end of the paper.

– Do not interrupt the flow of your story

• Extensive collection of related work

– Don’t forget to look at the latest results

– Go beyond your field, if possible

• Give sufficient credit to others

– We are standing on the shoulders of giants

– Avoid emotional words

– Be precise in comparison

• Point out critical points

– Use examples if necessary

Page 60: How2research

61

What Should Be in Discussion?

• Related issues

– Constraints in your method

– Drawbacks

• Possible extensions

– Point out the other problems that can be solved straightforwardly using the proposed method

– Broader impact

• Future work if you have a detailed plan

Page 61: How2research

62

Writing Strong Conclusions

• Summarize the paper briefly.

– What is the problem solved

– Major technical contributions

– Major findings and results

• Future work if possible

Page 62: How2research

63

Aiming high! Major DB/KDD Conferences

• DB (in my opinion)

– 1st tier: SIGMOD, VLDB, ICDE

– 2nd tier: EDBT, ICDT, CIKM, ER, SSDBM

– Regional: DASFAA, WAIM, British DB Conf, Australian DB Conf, Brazilian DB Conf, DEXA, …

• KDD (in my opinion)

– Top: KDD

– 2nd tier: SIAM DM, ICDM

– Regional: PAKDD, PKDD, …

– KDD papers can be sent to DB & ML conferences

Page 63: How2research

64

Reviewers’ Comments

Page 64: How2research

65

Reviewers’ Comments

• The conference review process is necessarily imperfect

• The reviewers operate under strict time constraints, and the committee must make quick decisions.

• Some good papers will be rejected and some embarrassing papers will be accepted.

Page 65: How2research

66

Thank you!

Page 66: How2research

67

My Paper Got Accepted!

• Congratulations!

• Address reviewers’ comments in the final version

– Adopt good points

– Clarify and remove confusion

• Prepare a nice talk and/or poster

– Convey the general idea

– Use examples wherever possible

– Use as little symbolic text as possible

Page 67: How2research

68

Recycle a Paper

• Before publication, a paper is likely to go through several rejections

– SIGMOD, VLDB, ICDE acceptance is around 10%-15%

– A conference with 25+% acceptance ratio may not be good

• Aim at the next chance

Page 68: How2research

69

Learn from the Reviews

• Do we aim at the right target?

– If 2/3 of the reviewers are laymen in your subject, seriously reconsider the choice of forum

• Address technical issues

– Respond to reviewers’ comments by revising/enhancing the technical description and experiments

• Improve writing

– Confused reviewers? Clarify the issues

– Correct any linguistic problems pointed out

Page 69: How2research

70

Why Journal Papers?

• Records archived

• Important for degree, promotion, election, …

Page 70: How2research

71

Conference vs. Journal Papers

• Length

– Journal papers are often longer

• Objectives

– Conference papers mainly convey the ideas and results

– Journal papers systematically report and justify the research, in a more formal way

Page 71: How2research

72

From Conference Papers to Journal Papers

• A critical requirement: “major value added”

– 30% in some journals, e.g., TODS, TKDE

– But, how to count?

• Some “major values”

– More detailed/complete examples

– Complete formal results and proofs

– Further variations and extensions of the method

– Triviality should be avoided

Page 72: How2research

73

Steps Towards Good Research

• Motivations and problems

– More important than the solutions

• Re-search

– Systematic development of solutions

• Writing a good paper

– Careful design

• Submissions

– Good luck!