projet-cpp-git-alglin/project.tex at main · master-csmi/projet-cpp-git-alglin · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
\documentclass[12pt]{article}
\usepackage{a4wide}
\usepackage{amsmath,amssymb}
\usepackage{bm}
\usepackage[colorlinks]{hyperref}
\newcommand{\vect}[1]{\hat{\boldsymbol{#1}}}
\title{C++ Project using Eigen3}
\usepackage{minted}
\begin{document}
    \maketitle

\tableofcontents

\section{Solving linear regression tasks}

Assume that we have a dataset, $\left\{y_{i}, x_{i 1}, \ldots, x_{i p}\right\}_{i=1}^{n}$, so that we can express the linear relation between $y$ and $x$ with mathematical formula in the following way:

\begin{equation}
y_{i}=\beta_{0} 1+\beta_{1} * x_{i 1}+\ldots+\beta_{p+1} x_{i p}+\epsilon_{i}=x_{i}^{T} \beta+\epsilon_{i}, i=1, \ldots, n
\end{equation}

Here, $p$ is the dimension of the independent variable, and $T$ denotes the transpose, so that $x_{i}^{T} \beta$ is the inner product between vectors $x_{i}$ and $\beta$. Also, we can rewrite the previous expression in matrix notation, as follows:

\begin{equation}
y=X \beta+\epsilon
\end{equation}


$$
y=\left(\begin{array}{c}
y_{1} \\
y_{2} \\
\vdots \\
y_{n}
\end{array}\right) X=\left(\begin{array}{c}
x_{1}^{T} \\
x_{2}^{T} \\
\vdots \\
x_{n}^{T}
\end{array}\right)=\left(\begin{array}{cccc}
1 & x_{11} & \ldots & x_{1 p} \\
1 & x_{21} & \ldots & x_{2 p} \\
\vdots & \vdots & \ddots & \vdots \\
1 & x_{n 1} & \ldots & x_{n p}
\end{array}\right) \beta=\left(\begin{array}{c}
\beta_{0} \\
\beta_{1} \\
\vdots \\
\beta_{p}
\end{array}\right) \epsilon=\left(\begin{array}{c}
\epsilon_{1} \\
\epsilon_{2} \\
\vdots \\
\epsilon_{n}
\end{array}\right)
$$

The preceding matrix notation can be explained as follows:
\begin{itemize}
    \item $y:$ This is a vector of observed target values.
    \item $x:$ This is a matrix of row-vectors, $x_{i},$ which are known as explanatory or independent values.
    \item $\beta$: This is a $(p+1)$ dimensional parameters vector.
    \item $\varepsilon$ : This is called an error term or noise. This variable captures all other factors that influence the $y$ dependent variable other than the regressors.
\end{itemize}

When we are considering simple linear regression, $p$ is equal to 1 , and the equation will look like this:
$$
y_{i}=\beta_{0} 1+\beta_{1} x_{i}+\epsilon_{i}
$$
The goal of the linear regression task is to find parameter vectors that satisfy the previous equation. Usually, there is no exact solution to such a system of linear equations, so the task is to estimate parameters that satisfy these equations with some assumptions. One of the most popular estimation approaches is one based on the principle of least squares:
minimizing the sum of the squares of the differences between the observed dependent variable in the given dataset and those predicted by the linear function. This is called the ordinary least squares (OLS) estimator. So, the task can be formulated with the following formula:
$$
\hat{\beta}=\operatorname{argmin}_{\beta} S(\beta)
$$

This minimization problem has a unique solution, in the case that the $p$ columns of the $x$ matrix are linearly independent. We can get this solution by solving the normal equation, as follows:
$$
\beta=\left(X^{T} X\right)^{-1} X^{T} y
$$
Linear algebra libraries can solve such equations directly with an analytical approach, but it has one significant disadvantage-computational cost. In the case of large dimensions of $y$ and $x$, requirements for computer memory amount and computational time are too big to solve real-world tasks.

So, usually, this minimization task is solved with iterative approaches. Gradient descent (GD) is an example of such an algorithm. GD is a technique based on the observation that if the function $S(\beta)$ is defined and is differentiable in a neighborhood of a point $\beta,$ then $S(\beta)$ decreases fastest when it goes in the direction of the negative gradient of $S$ at the point $\beta$

We can change our $S(\beta)$ objective function to a form more suitable for an iterative approach. We can use the mean squared error (MSE) function, which measures the difference between the estimator and the estimated value, as illustrated here:
$$
S(\beta)=\frac{1}{n} \sum_{i=1}^{n}\left(y_{i}-X_{i} \beta\right)^{2}
$$
In the case of the multiple regression, we take partial derivatives for this function for each of $x$ components, as follows:
$$
\frac{\partial S}{\partial \beta_{i}}
$$
So, in the case of the linear regression, we take the following derivatives:
$$
\begin{array}{l}
\frac{\partial S}{\partial \beta_{0}}=\frac{2}{n} \sum_{i=1}^{n}\left(X_{i} \beta-y_{i}\right) \\
\frac{\partial S}{\partial \beta_{1}}=\frac{2}{n} \sum_{i=1}^{n}\left(X_{i} \beta-y_{i}\right) X_{i}
\end{array}
$$

The whole algorithm has the following description:
\begin{enumerate}
    \item  Initialize $\beta$ with zeros.
    \item Define a value for the learning rate parameter that controls how much we are adjusting parameters during the learning procedure.
    \item Calculate the following values of $\beta$
    $$
    \begin{array}{l}
    \beta_{0}=\beta_{0}-\gamma \frac{2}{n} \sum_{i=1}^{n}\left(X_{i} \beta-y_{i}\right) \\
    \beta_{1}=\beta_{1}-\gamma \frac{2}{n} \sum_{i=1}^{n}\left(X_{i} \beta-y_{i}\right) X_{i}
    \end{array}
    $$
    \item Repeat steps $1-3$ for a number of times or until the MSE value reaches a reasonable amount.
\end{enumerate}

\subsection{Project}
Implement using the library Eigen two different ways to solve linear regression tasks and at least one must be iterative.

You will test automatically the algorithms with respect to  data sets below

\begin{itemize}
\item \texttt{data/d1.txt}: brain and body weight, 62rows, 3 columns;
\item \texttt{data/d2.txt}: height, weight, catheter length, 12 rows, 4 columns;
\item \texttt{data/d3.txt}: age, blood pressure, 30 rows,4 columns;
\end{itemize}
Some explanations and references are available in the files.

Use
\begin{itemize}
    \item C++ for implementation
    \item Github to collaborate and share the code
    \item cmake to compile
    \item ctest to test
\end{itemize}


\section{Hilbert Matrices}

We give ourselves the $A$ matrix (Hilbert's matrix of order 10)
\begin{minted}{cpp}
>> A=hilb(10) = 1/(i+j+1);
\end{minted}
and the linear system $A\vect{x}=\vect{b}$, where \verb|b=A*ones(10,1)|.
The exact solution $\vect{x}$ of this system is therefore the vector \verb|ones(10,1)|.

\begin{itemize}
    \item  Check that the matrix $A$ is symmetric and
of positive definition.
    \item Calculate the factorization $LU$ from $A$ using
the command matlab {\tt lu} in the form {\tt [L,U,P]=lu(A)} and then
the $\backslash$ order
to solve the two triangular systems (be careful not to
forget the permutation matrix).

Up to which number after the decimal point the components of
the calculated solution are accurate? Explain briefly
why we can't get a very accurate result
knowing that the eigen command to calculate the number of
conditioning of a matrix can be implemented as follows:

\begin{minted}[]{cpp}
JacobiSVD<MatrixXd> svd(A);
double cond = svd.singularValues()(0) / svd.singularValues()(svd.singularValues().size()-1);
\end{minted}

\item To factorize $A$ could we have used the
factorization of Cholesky? What are the advantages?
So use - if possible - the matlab command {\tt chol} (which calculates a matrix
$H$ such that $A=H^\top H$) to resolve the system $Avect{x}=\vect{b}$ and examine the accuracy of the result.

\end{itemize}

\section{Introduction to Eigen}

Eigen is a general-purpose linear algebra C++ library. In Eigen, all matrices and vectors are objects of the Matrix template class, and the vector is a specialization of the matrix type, with either one row or one column. Tensor objects are not presented in official APIs but exist as submodules.

We can define the type for a matrix with known dimensions and floating-point data type like this:

\begin{minted}{cpp}
typedef Eigen::Matrix<float, 3, 3> MyMatrix33f;
\end{minted}

We can define a vector in the following way:

\begin{minted}{cpp}
typedef Eigen::Matrix<float, 3, 1> MyVector3f;
\end{minted}

Eigen already has a lot of predefined types for vector and matrix objects- -for example, Eigen: : Matrix3f (floating-point $3 \times 3$ matrix type) or Eigen : : RowVector2f (floating-point $1 \times 2$ vector type). Also, Eigen is not limited to matrices whose dimensions we know at compile time. We can define matrix types that will take the number of rows or columns at initialization during runtime. To define such types, we can use a special type variable for the Matrix class template argument named Eigen: : Dynamic. For example, to define a matrix of doubles with dynamic dimensions, we can use the following definition:

\begin{minted}{cpp}
typedef Eigen::Matrix<double, Eigen::Dynamic, Eigen::Dynamic> MyMatrix;
\end{minted}

Objects initialized from the types we defined will look like this:

\begin{minted}[]{cpp}
MyMatrix33f a;
MyVector3f v;
MyMatrix m(10,15);
\end{minted}


To put some values into these objects, we can use several approaches. We can use special predefined initialization functions, as follows:
\begin{minted}[]{cpp}
a = MyMatrix33f::Zero(); // fill matrix elements with zeros
a = MyMatrix33f::Identity(); // fill matrix as Identity matrix
v = MyVector3f::Random(); // fill matrix elements with random values
\end{minted}
We can use the comma-initializer syntax, as follows:
\begin{minted}[]{cpp}
    a << 1,2,3,
        4,5,6,
        7,8,9;
\end{minted}
This code construction initializes the matrix values in the following way:
$$
\left[\begin{array}{lll}
1 & 2 & 3 \\
4 & 5 & 6 \\
7 & 8 & 9
\end{array}\right]
$$
We can use direct element access to set or change matrix coefficients. The following code sample shows how to use the ( ) operator for such an operation:
\begin{minted}[]{cpp}
a(0,0) = 3;
\end{minted}

We can use the object of the Map type to wrap an existent $\mathrm{C}++$ array or vector in the Matrix type object. This kind of mapping object will use memory and values from the underlying object, and will not allocate the additional memory and copy the values. The following snippet shows how to use the Map type:

\begin{minted}[]{cpp}
int data[] = {1,2,3,4};
Eigen::Map<Eigen::RowVectorxi> v(data,4);
std::vector<float> data = {1,2,3,4,5,6,7,8,9};
Eigen::Map<MyMatrix33f> a(data.data());
\end{minted}

We can use initialized matrix objects in mathematical operations. Matrix and vector arithmetic operations in the Eigen library are offered either through overloads of standard C++ arithmetic operators such as $+,-, \star,$ or through methods such as dot () and cross (). The following code sample shows how to express general math operations in Eigen:

\begin{minted}[]{cpp}
using namespace Eigen;
auto a = Matrix2d::Random();
auto b = Matrix2d::Random();
auto result = a + b;
result = a.array() * b.array(); // element wise multiplication result = a.array() / b.array();
a += b;
result = a * b; // matrix multiplication
//Also it’s possible to use scalars:
a = b.array() * 4;
\end{minted}

Notice that in Eigen, arithmetic operators such as operator+ do not perform any computation by themselves. These operators return an expression object, which describes what computation to perform. The actual computation happens later when the whole expression is evaluated, typically in the operator= arithmetic operator. It can lead to some strange behaviors, primarily if a developer uses the auto keyword too frequently.

Sometimes, we need to perform operations only on a part of the matrix. For this purpose, Eigen provides the block method, which takes four parameters: $i, j, p, q .$ These parameters are the block size $p, q$ and the starting point $i, j .$ The following code shows how to use this method:

\begin{minted}[]{cpp}
Eigen::Matrixxf m(4,4);
Eigen::Matrix2f b = m.block(1,1,2,2); // copying the middle part of matrix
m.block(1,1,2,2) *= 4; // change values in original matrix
\end{minted}

There are two more methods to access rows and columns by index, which are also a type of block operation. The following snippet shows how to use the $\mathrm{col}$ and the row methods:

\begin{minted}[]{cpp}
m.row(1).array() += 3;
m.col(2).array() /= 4;
\end{minted}

Another important feature of linear algebra libraries is broadcasting, and Eigen supports this with the colwise and rowwise methods. Broadcasting can be interpreted as a matrix by replicating it in one direction. Take a look at the following example of how to add a vector to each column of the matrix:

\begin{minted}[]{cpp}
Eigen::Matrixxf mat(2,4);
Eigen::Vectorxf v(2); // column vector
mat.colwise() += v;
\end{minted}

This operation has the following result:
$$\left[\begin{array}{lll}1 & 2 & 3 \\ 4 & 5 & 6\end{array}\right] \mbox{.colwise()}+\left[\begin{array}{l}0 \\ 1\end{array}\right]=\left[\begin{array}{lll}1 & 2 & 3 \\ 5 & 6 & 7\end{array}\right]$$
\end{document}