Exploring the Use of Labels to Categorize Issues in Open-Source Software Projects
Jordi Cabot, Javier Luis Cánovas Izquierdo, Valerio Cosentino, Belén Rolandi
SANER conference, March 2015
Jul 15, 2015
Open-Source Systems
…computer software with its source code made available with a license in which the copyright holder provides the rights to study, change and distribute the software to anyone and for any purpose.
…Open-Source Software (OSS) is developed in a collaborative public manner.
Issue Labels in GitHub
Default labels: bug, duplicate, enhancement, help wanted, invalid, question, wontfix
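To tell default labels apart from project-specific ones, a minimal Python sketch can compare each label name against the seven defaults listed above. The helper `custom_labels` is hypothetical, and the case-insensitive comparison is an assumption rather than something stated in the study.

```python
# The seven GitHub default labels listed on the slide.
DEFAULT_LABELS = frozenset(
    {"bug", "duplicate", "enhancement", "help wanted", "invalid", "question", "wontfix"}
)


def custom_labels(project_labels: list[str]) -> list[str]:
    """Return the project's labels that are not GitHub defaults, keeping their order.

    Names are compared case-insensitively (an assumption made for this sketch).
    """
    return [name for name in project_labels if name.lower() not in DEFAULT_LABELS]


print(custom_labels(["bug", "priority-high", "question", "docs"]))  # ['priority-high', 'docs']
```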
GitHub Analysis
GHTorrent
RQ1. Label Usage
How many labels are used in GitHub? How many labels are used per project? Which are the most popular ones?
RQ2. Label Influence
For those projects using labels, does their use influence the evolution of the project?
GiLA
Early Research Achievement
Can we detect groups of labels commonly used together? Are there label families?
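The slides do not describe how GiLA detects groups of labels. As a hypothetical starting point, one can count how many projects use each pair of labels together; pairs that co-occur often are candidate members of the same family. The function `label_pair_counts` and the sample project data are illustrative only.

```python
from collections import Counter
from itertools import combinations


def label_pair_counts(projects: dict[str, set[str]]) -> Counter:
    """Count, for every unordered pair of labels, how many projects use both."""
    counts: Counter = Counter()
    for labels in projects.values():
        # sorted() gives each pair a canonical order, so (a, b) and (b, a) share one key.
        for pair in combinations(sorted(labels), 2):
            counts[pair] += 1
    return counts


projects = {
    "repo-a": {"bug", "enhancement", "priority-high", "priority-low"},
    "repo-b": {"bug", "priority-high", "priority-low"},
    "repo-c": {"bug", "enhancement"},
}
print(label_pair_counts(projects).most_common(3))
```

A real analysis would also normalize by how often each label occurs on its own, since very common labels such as bug co-occur with almost everything.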
Label Usage in GitHub
Projects using labels: 122,012 (3%)
Projects not using labels: 3,635,026 (97%)
Lesson: Labels are scarcely used in GitHub.
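The two counts add up to the total number of projects analyzed, and the percentages follow directly from them. A short Python check using only the numbers on the slide:

```python
using, not_using = 122_012, 3_635_026  # project counts from the slide
total = using + not_using
share_using = using / total
print(f"{total:,} projects; {share_using:.2%} use labels, {1 - share_using:.2%} do not")
```

This prints 3,757,038 projects, with about 3.25% using labels, which the slide rounds to 3%.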
Main Labels
Lesson: The default labels are the most used, but documentation and feature are also widely used.
Projects using labels, by number of labels used in the project (Total: 122,012)
# labels used   # projects
1               55,561
2               31,026
3               13,390
4                6,910
5                4,206
6                3,011
7                1,934
8                1,378
9                  955
10                 723
>10              2,918
Chart annotations: 1.47%, 0.82%, 0.94%
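The chart annotations 1.47%, 0.82% and 0.94% are not labeled. One reading that fits the numbers, stated here as an assumption, is that they are the shares of all 3,757,038 analyzed projects that use 1, 2, and 3 or more labels, truncated (not rounded) to two decimals. The sketch below checks that reading; `truncated_percent` is a hypothetical helper.

```python
TOTAL_PROJECTS = 122_012 + 3_635_026  # projects using + not using labels
groups = {
    "1 label": 55_561,
    "2 labels": 31_026,
    "3 or more labels": 122_012 - 55_561 - 31_026,
}


def truncated_percent(part: int, whole: int, digits: int = 2) -> float:
    """Percentage of `whole` that `part` represents, truncated to `digits` decimals."""
    scale = 10 ** digits
    # Integer floor division truncates exactly, avoiding floating-point rounding.
    return (part * 100 * scale // whole) / scale


for name, count in groups.items():
    print(name, truncated_percent(count, TOTAL_PROJECTS))
```

Rounding instead of truncating would give 1.48% for the first group, so the truncation assumption matters.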
Average labels per issue, by number of labels used in the project (Avg: 1.14)
# labels used   Avg. labels/issue
1               1.00
2               1.02
3               1.04
4               1.06
5               1.09
6               1.08
7               1.13
8               1.18
9               1.20
10              1.25
>10             1.52
% labeled issues, by number of labels used in the project (Avg: 58.29%)
# labels used   % labeled issues
1               59.87
2               61
3               58.89
4               58.84
5               59.72
6               56.16
7               57.83
8               58.99
9               58.83
10              55.06
>10             55.88
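The slides do not define the denominators of these two metrics. Because every Labels/Issue value is at least 1 while only about 58% of issues are labeled, averaging labels over labeled issues only seems the likely reading; the sketch below adopts that interpretation as an assumption. The `Issue` record and both function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Issue:
    """Hypothetical minimal issue record; only the label names matter here."""
    number: int
    labels: list[str] = field(default_factory=list)


def labels_per_labeled_issue(issues: list[Issue]) -> float:
    """Average number of labels over the issues that carry at least one label."""
    labeled = [issue for issue in issues if issue.labels]
    if not labeled:
        return 0.0
    return sum(len(issue.labels) for issue in labeled) / len(labeled)


def percent_labeled(issues: list[Issue]) -> float:
    """Percentage of all issues that carry at least one label."""
    if not issues:
        return 0.0
    return 100 * sum(1 for issue in issues if issue.labels) / len(issues)


issues = [Issue(1, ["bug"]), Issue(2, ["bug", "urgent"]), Issue(3), Issue(4, ["question"])]
print(labels_per_labeled_issue(issues), percent_labeled(issues))
```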
Users involved in labeled issues, by number of labels used in the project (Avg: 78.72%)
# labels used   % users involved in labeled issues
1               80.98
2               72.06
3               77.73
4               75.81
5               75.22
6               72.05
7               72.87
8               75.52
9               72.06
10              69.25
>10             70.43
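The slide does not say how "users involved" is counted. A minimal sketch, assuming it means the share of a project's issue participants (for example, authors and commenters) who took part in at least one labeled issue; the input format and function name are hypothetical.

```python
def percent_users_in_labeled_issues(issues: list[tuple[set[str], set[str]]]) -> float:
    """Share (in %) of issue participants who took part in at least one labeled issue.

    Each issue is a hypothetical (labels, participants) pair.
    """
    everyone: set[str] = set()
    in_labeled: set[str] = set()
    for labels, participants in issues:
        everyone |= participants
        if labels:  # the issue carries at least one label
            in_labeled |= participants
    return 100 * len(in_labeled) / len(everyone) if everyone else 0.0


example = [
    ({"bug"}, {"ana", "bo"}),
    (set(), {"bo", "cy"}),
    ({"question"}, {"dee"}),
]
print(percent_users_in_labeled_issues(example))  # 3 of 4 users -> 75.0
```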
Label Influence
# labels used   Med. time to solve   % solved
0                26.93               22.53
1                46.18               43.51
2                74.92               48.76
3               101.3                53.21
4               111.8                55.27
5               145.7                56.3
6               116.4                58.82
7               127.2                57.95
8               116.4                59.28
9                70.4                63.23
10              306.4                47.59
>10             148.1                60.19
On average, the percentage of solved labeled issues tends to increase with the number of labels used in the project, which may confirm that the effort of categorizing issues benefits project advancement.
This might come at the cost of taking more time to solve those labeled issues.
ρ = 0.80
ρ = 0.73
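The slide does not state which correlation coefficient was used or which series each ρ refers to. Assuming Spearman's rank correlation between the number of labels (coding ">10" as 11) and % solved, the sketch below gives about 0.73, matching one of the two reported values. This is a consistency check under those assumptions, not the authors' computation.

```python
def average_ranks(values: list[float]) -> list[float]:
    """Return 1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # sorted positions start..end -> ranks start+1..end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def spearman_rho(x: list[float], y: list[float]) -> float:
    """Spearman's rho computed as Pearson's correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# x-axis of the Label Influence chart: 0..10 labels, with ">10" coded as 11 (assumption).
labels_used = list(range(12))
pct_solved = [22.53, 43.51, 48.76, 53.21, 55.27, 56.3, 58.82, 57.95, 59.28, 63.23, 47.59, 60.19]
print(round(spearman_rho(labels_used, pct_solved), 2))  # 0.73
```

Ranks make the coefficient insensitive to the arbitrary numeric code chosen for the ">10" group, as long as it sorts after 10.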
Detecting families
(Figure: graph of labels commonly used together, including bug, build, documentation, docs, duplicate, enhancement, invalid, question, wontfix, usability, performance, urgent, priority-high, priority-low, p1, p2, p3, 0 - backlog, 1 - ready, 2 - working, 3 - done, component-ui, frontend-gtk, milestone, 0.0.1, 1.0.0.rc1)
Families?
Family         # Labels          % Projects
Priority       1,027 (2.33%)     4.33%
Version        2,703 (6.14%)     1.68%
Workflow       1,972 (4.48%)     5.67%
Architecture   1,104 (2.51%)     2.00%
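A simple way to approximate these families is to match label names against patterns. The patterns below are hypothetical guesses built from the example labels on these slides (priority-high, p1, 1.0.0.rc1, 2 - working, component-ui, and so on); they are not the detection method used in the study.

```python
import re
from typing import Optional

# Hypothetical name patterns, one per family; checked in this order.
FAMILY_PATTERNS = {
    "Priority": re.compile(r"priority|^p\d+$|^urgent$"),
    "Version": re.compile(r"^milestone$|^(milestone-release)?\d+(\.\d+)+(\.?rc\d+)?$"),
    "Workflow": re.compile(r"^\d+\s*-\s*(backlog|ready|working|done)$"),
    "Architecture": re.compile(r"^(component|frontend)-"),
}


def label_family(label: str) -> Optional[str]:
    """Return the first family whose pattern matches the normalized label, else None."""
    name = label.strip().lower()
    for family, pattern in FAMILY_PATTERNS.items():
        if pattern.search(name):
            return family
    return None


for name in ["priority-high", "p2", "1.0.0.rc1", "2 - working", "component-ui-gtk", "bug"]:
    print(name, "->", label_family(name))
```

Name patterns only catch families with regular naming; labels such as taken or fixed would need co-occurrence information to be grouped.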
(Figure: Priority family highlighted in the label graph: urgent, priority-high, high-priority, priority-medium, medium-priority, priority-low, low-priority)
(Figure: Version family highlighted in the label graph: milestone, milestone-release0.4, milestone-release0.7, 0.0.1, 0.0.3, 0.2.0, 0.5.0, 1.0.0, 1.0.0.rc1)
(Figure: Workflow family highlighted in the label graph: 0 - backlog, 1 - ready, 2 - working, 3 - done)
(Figure: Architecture family highlighted in the label graph: component-logic, component-notyi, component-ui, component-ui-gtk, component-mode-perl, frontend-gtk, frontend-pango)
Conclusion
Early results
• The label mechanism is scarcely used
• When used, it may have a positive impact on the project
• We confirmed the existence of label families
Future work
• Further research is needed to better classify label use:
  • How families influence project success
  • Why projects choose a specific label family
  • How labels evolve during the project life-cycle
• Perform the analysis on other web-based code hosting services
Except where otherwise noted, content on this presentation is licensed under a Creative Commons Attribution 3.0 License.
Thanks!
Come to see our awesome demonstration!
Belén
Jordi
Javier L. Cánovas
Valerio
Label Usage (issues)
Projects with 0 to 9 issues
# labels used   # projects   % labeled issues
1               45,150       69.55
2               17,268       75.94
3                3,915       79.65
4                1,071       82.18
5                  421       84.31
6                  223       78.1
7                   84       84.65
8                   49       84.64
9                   19       83.57
10                   4       85
>10                 12       82.64
Projects with 10 to 99 issues
# labels used   # projects   % labeled issues
1                9,996       18.39
2               13,203       43.48
3                8,771       52.64
4                5,121       58.36
5                3,115       62.72
6                1,995       63.68
7                1,177       66.88
8                  823       67.81
9                  518       72.16
10                 337       69.74
>10                780       73.85
Projects with 100 to 999 issues
# labels used   # projects   % labeled issues
1                  407        6.03
2                  545       11.81
3                  694       31.15
4                  703       28.68
5                  656       30.52
6                  773       31.52
7                  651       39.11
8                  481       43.11
9                  394       41.96
10                 363       42.89
>10              1,765       52.25
Projects with more than 999 issues
# labels used   # projects   % labeled issues
1                    8       28.67
2                   10       14.41
3                   10        5.78
4                   15       14.45
5                   14       17.28
6                   20       12.8
7                   22       24.43
8                   25       22.83
9                   24       28.31
10                  19       21.6
>10                361       33.95
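Group                                Total projects   Avg. % labeled issues
Projects with 0 to 9 issues          68,216           70.10%
Projects with 10 to 99 issues        45,836           45.68%
Projects with 100 to 999 issues       7,432           34.81%
Projects with more than 999 issues      528           29.54%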
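The backup charts split projects into four groups by issue count. A minimal sketch of that grouping, plus an unweighted per-project mean of % labeled issues within each group; whether the slide's averages are per-project means or pooled over issues is not stated, so the aggregation here is an assumption, and both function names are hypothetical.

```python
from collections import defaultdict


def issue_bucket(n_issues: int) -> str:
    """Map a project's issue count to one of the four groups used in the backup charts."""
    if n_issues < 0:
        raise ValueError("issue count cannot be negative")
    if n_issues <= 9:
        return "0 to 9"
    if n_issues <= 99:
        return "10 to 99"
    if n_issues <= 999:
        return "100 to 999"
    return "more than 999"


def mean_percent_labeled_by_bucket(projects: list[tuple[int, float]]) -> dict[str, float]:
    """Unweighted mean of per-project % labeled issues within each issue-count group.

    `projects` holds hypothetical (issue_count, percent_labeled) pairs.
    """
    groups: dict[str, list[float]] = defaultdict(list)
    for n_issues, pct in projects:
        groups[issue_bucket(n_issues)].append(pct)
    return {bucket: sum(values) / len(values) for bucket, values in groups.items()}


print(mean_percent_labeled_by_bucket([(5, 80.0), (7, 60.0), (150, 30.0)]))
```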
Label Influence
# labels used   Avg. time to solve   Med. time to solve
1                 795.4               46.18
2                 808.6               74.92
3                 937.3              101.3
4                 998.5              111.8
5                1111                145.7
6                1060                116.4
7                1152                127.2
8                1139                116.4
9                 982.9               70.4
10               1425                306.4
>10              1148                148.1
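The large gap between average and median time to solve (for example 795.4 versus 46.18 for projects using one label) suggests a right-skewed distribution, where a few long-lived issues pull the mean up. A minimal sketch with made-up timestamps shows the effect; measuring in hours is an assumption, since the slide does not give the unit, and `hours_to_solve` is a hypothetical helper.

```python
from datetime import datetime
from statistics import mean, median


def hours_to_solve(opened: str, closed: str) -> float:
    """Elapsed time in hours between two ISO-8601 timestamps (the unit is an assumption)."""
    delta = datetime.fromisoformat(closed) - datetime.fromisoformat(opened)
    return delta.total_seconds() / 3600


durations = [
    hours_to_solve("2015-01-01T00:00:00", "2015-01-01T12:00:00"),  # 12 hours
    hours_to_solve("2015-01-01T00:00:00", "2015-01-02T00:00:00"),  # 24 hours
    hours_to_solve("2015-01-01T00:00:00", "2015-03-01T00:00:00"),  # 59 days = 1416 hours
]
print(mean(durations), median(durations))  # 484.0 24.0
```

One slow issue moves the mean from tens of hours to hundreds while the median stays put, which is why the median is the more robust summary here.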
Label Influence
# labels used   Avg. issue age   Med. issue age
1                4,516            2,577
2                5,867            3,747
3                7,540            6,346
4                8,646            7,427
5                9,196            8,081
6                9,729            8,335
7                9,330            8,268
8               10,610            9,154
9               10,100            8,612
10               8,321            7,276
>10              8,302            5,918
Label Influence
# labels used   % solved
1               43.51
2               48.76
3               53.21
4               55.27
5               56.3
6               58.82
7               57.95
8               59.28
9               63.23
10              47.59
>10             60.19
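A minimal sketch of computing % solved per label-count group, assuming "solved" means closed issues and pooling all issues of the projects in each group. Both assumptions, the input triples, and the function names are hypothetical, not taken from the study.

```python
from collections import defaultdict


def label_bucket(n_labels: int) -> str:
    """x-axis group: each count from 0 to 10 on its own, everything above as '>10'."""
    return str(n_labels) if n_labels <= 10 else ">10"


def percent_solved_by_bucket(projects: list[tuple[int, int, int]]) -> dict[str, float]:
    """Pool issues per label-count group and return the % of them that were solved.

    `projects` holds hypothetical (labels_used, solved_issues, total_issues) triples.
    """
    solved: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for n_labels, n_solved, n_total in projects:
        key = label_bucket(n_labels)
        solved[key] += n_solved
        total[key] += n_total
    return {key: 100 * solved[key] / total[key] for key in total if total[key]}


print(percent_solved_by_bucket([(1, 3, 4), (1, 1, 4), (12, 2, 10)]))
```

Pooling issues gives large projects more weight; averaging per-project percentages instead would weight every project equally. The slides do not say which was used.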