\section{Methodology}
\label{Methodology}
\textbf{In this work we analyse software structure through the network associated with a software system.
First, we build this network by parsing the source code retrieved
from the corresponding Software Configuration Management (SCM) repository. Nodes correspond to classes, and edges to the
relationships between classes (inheritance, composition, etc.).
We take the number of defects (bugs) in a software system as the main indicator of its quality, and we collect
this information by mining the system's Bug Tracking System (BTS).
To associate each class with its bugs, we mine the commits in the SCM repository to determine
which classes are associated with each bug. The result is a network in which each node is annotated
with the number of bugs of the associated class.
% Specifically we are interested in extracting the community structure of a software system in order
% to figure out its modular organization. Moreover, we are interested in computing the modularity Q associated
% to a community structure \ref{}, the number of communities, and the clustering coefficient.
% In order to compute the metrics related to the community structure, we
% need to build the networks to associate to the software systems. This is done
% by parsing the source code retrieved from Software Configuration Management
% (SCM) repositories, in order to extract the various relationships among classes
% and files.
% These relationships could be inheritance, composition, dependencies,
% aggregation, association and so on. We considered Java classes as nodes of
% the software network, while we considered the relationships among classes as
% network edges.
% Once we retrieved the networks, we collected software issues
% by mining bug repositories, in order to associate to each node in the network
% the corresponding defects. Finally we analyzed the community structure of the
% software networks, computing different community metrics and some software
% metrics.
We collected and analysed the source code of six releases of NetBeans, whose main features are presented in Table \ref{tab:Eclipse}.}
% We collected the source code of NetBeans and Eclipse from the CVS repository.
% We analyzed 6 releases of NetBeans and 5 releases of Eclipse. In Table \ref{tab:Eclipse}
% we report their main features.
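
The construction described above can be restated compactly as follows. The symbols are notation introduced here for illustration only, and the counting rule (each class counted at most once per defect) is an assumption, since the text does not specify how multiple commits for the same defect are handled.
\begin{equation}
G = (V, E), \qquad b(v) = \left| \{\, d \in D : v \in C(d) \,\} \right| \quad \mbox{for } v \in V
\end{equation}
Here $V$ is the set of classes, $E$ the set of relationships between classes, $D$ the set of defects retrieved from the BTS, and $C(d)$ the set of classes modified by the commits associated with defect $d$. The annotation $b(v)$ is then the number of bugs attached to node $v$.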


\begin{table}[h]
\begin{center}
\scalebox{0.9}
{
\begin{tabular}{|l|c|c|c|c|c|c|}
\hline
Release & NB 3.2 & NB 3.2.1 & NB 3.3.0 & NB 3.4 & NB 4.0 & NB 6.0.1 \\ \hline
Size & 4333 & 4348 & 5678 & 7520 & 11866 & 34591 \\
N. of sub-projects & 38 & 38 & 39 & 42 & 41 & 56 \\
N. of defects & 14948 & 15043 & 19218 & 21529 & 26592 & 73230 \\ \hline
\end{tabular}
}


% \scalebox{0.9}
% {
% \begin{tabular}{|l|c|c|c|c|c|}
% \hline
% Release & Eclipse 2.1 & Eclipse 3.0 & Eclipse 3.1 & Eclipse 3.2 & Eclipse 3.3 \\\hline
% Size & 8257 & 11406 & 13413 & 16013 & 17517 \\
%
% Sub-Projects n.& 49 & 66 & 70 & 86 & 104 \\
%
% N. of defects & 47788 & 59804 & 69900 & 80149 & 95337  \\ \hline
%
%
% \end{tabular}
% }
\end{center}
\caption{Main features of the analysed releases of NetBeans (NB): size (number of classes), number of sub-projects (sub-networks), and total number of defects.}
\label{tab:Eclipse}

\end{table}

\textbf{Each release is organised into almost independent sub-projects. In total, we analysed 254 sub-projects, comprising 68336 nodes (classes)
and more than 170000 defects.} % 170623

\textbf{We computed the community structure using the algorithm devised by Clauset et al. \cite{Clauset:2004}.
This is an agglomerative clustering algorithm that performs a greedy optimization of the modularity ($Q$) \cite{Newman:2004}.
For each network we obtained the number of communities into which it is structured, the corresponding value of $Q$, and the nodes assigned to each community.
We computed the clustering coefficient using the implementation included in the igraph package \cite{} for R \cite{}.
To study the evolution of the system, we carried out the
analysis both on each release separately and on sets of releases accumulated in temporal order. Specifically, for the releases in our dataset, we
first merged the first and second releases, then added the third release to this set, and so on. In this way we were able
to make predictions about the last release starting from the previously accumulated ones.}
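
For reference, the quantities used above can be written explicitly. The following is the standard textbook form of the modularity for an undirected, unweighted network; treating the software network as undirected is an assumption made here, as the text does not state it.
\begin{equation}
Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)
\end{equation}
Here $A_{ij}$ is the adjacency matrix, $k_i$ the degree of node $i$, $m$ the number of edges, $c_i$ the community of node $i$, and $\delta(c_i, c_j) = 1$ if $c_i = c_j$ and $0$ otherwise. The greedy procedure starts with every node in its own community and repeatedly merges the pair of communities that yields the largest increase in $Q$.

Likewise, the usual local clustering coefficient of a node $i$ with $k_i \geq 2$ neighbours is
\begin{equation}
C_i = \frac{2 e_i}{k_i (k_i - 1)}
\end{equation}
where $e_i$ is the number of edges among the neighbours of $i$. Whether the analysis relies on the average of $C_i$ or on the global (transitivity) variant is not specified here.

Finally, the cumulation scheme can be sketched as follows, where $R_r$ denotes the data of the $r$-th release, $n$ the number of releases, and $D_k$ is a symbol introduced only for illustration:
\begin{equation}
D_k = \bigcup_{r=1}^{k} R_r, \qquad k = 2, \ldots, n-1
\end{equation}
so that predictions about the last release $R_n$ are made from $D_{n-1}$.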