dtim-upc · monadplus · Mar 31, 2022 · Apr 5, 2022
diff --git a/Distributed_Complex_Event_Recognition-Arnau_Abela.pdf b/Distributed_Complex_Event_Recognition-Arnau_Abela.pdf
diff --git a/docs/thesis/chapters/chapter_3.tex b/docs/thesis/chapters/chapter_3.tex
@@ -144,9 +144,9 @@ \section{Computational model}\label{sec:cea}
 
 \section{Timed Enumerable Compact Set}\label{sec:data_structure}
 
-The data structure that lazily represents the set of partial matches in CORE is called \acrfull{tecs}. A tECS is a \emph{directed acyclic graph (DAG)} $\mathcal{E}$ with two types of nodes: \emph{union nodes} and \emph{non-union nodes}. Union nodes have two children: \code{left} and \code{right}. Non-union nodes are labelled by a stream position and are divided in \emph{output nodes} and \emph{bottom nodes}. The former have exactly one child and the latter have none. To simplify presentation in what follows, we write nodes of any kind as \textrm{n}, bottom, output and union nodes as \textrm{b, o, u}, respectively, and we denote the sets of bottom, output and union nodes by $N_{B}$, $N_{O}$ and $N_{U}$, respectively.
-
-For a node \textrm{n}, define its \emph{descending-paths}, denoted \code{paths(\textrm{n})}, recursively as follows: if \textrm{n} is a bottom node, then \code{paths(\textrm{n})} = 1; if \textrm{n} is an output node, then \code{paths(\textrm{n}) = paths(next(\textrm{n}))}; if \textrm{n} is a union node, \code{paths(\textrm{n}) = paths(left(\textrm{n})) + paths(right(\textrm{n}))}. Every node \textrm{n} carries paths(\textrm{n}) as an extra label; thus the descending-paths can be retrieved in constant time. The descending-paths attribute is going to be used during the enumeration phase of the distributed evaluation algorithm to balance the workload of each processing unit.
+The data structure that lazily represents the set of partial matches in CORE is called \acrfull{tecs}. A tECS is a \emph{directed acyclic graph (DAG)} $\mathcal{E}$ with two types of nodes: \emph{union nodes} and \emph{non-union nodes}. Union nodes have two children: \code{left} and \code{right}. Non-union nodes are labelled by a stream position and are divided into \emph{output nodes} and \emph{bottom nodes}. The former have exactly one child and the latter have none.
+{\color{Comment} We should integrate descending-paths here (or at least mention it)}.
+To simplify presentation in what follows, we write nodes of any kind as \textrm{n}, bottom, output and union nodes as \textrm{b, o, u}, respectively, and we denote the sets of bottom, output and union nodes by $N_{B}$, $N_{O}$ and $N_{U}$, respectively. {\color{Comment} missing definition of $pos$, $next$, $right$ and $left$}.
 
 An \emph{open complex event}, denoted $(i, D)$, is a complex event $([i, j], D)$, where the closing event $j$ hasn't been reached yet. A \acrshort{tecs} represents sets of open complex events. Let $\bar{p} = n_{1}, n_{2}, \ldots, n_{k}$ be a \emph{full-path} in $\mathcal{E}$ such that $n_{k}$ is a bottom node. Then $\bar{p}$ represents the open complex event ${\llbracket \bar{p} \rrbracket}_{\mathcal{E}} = (i, D)$, where $i = pos(n_{k})$ is the label of bottom node $n_{k}$ and $D$ are the labels of the other non-union nodes in $\bar{p}$. Given a node \textrm{n} in $\mathcal{E}$, ${\llbracket \textrm{n} \rrbracket}_{\mathcal{E}}$ is the set of open complex events represented by \textrm{n} and consists of all open complex events ${\llbracket \bar{p} \rrbracket}_{\mathcal{E}}$ with $\bar{p}$ a full-path in $\mathcal{E}$ starting at \textrm{n}.
 
@@ -240,6 +240,44 @@ \section{Auxiliary data structures}\label{sec:auxiliary_data_structure}
 
 We assume that hash table lookups and insertions take constant time, and that iterating over a hash table has constant delay.
 
+{\color{Added}
+\section{Descending-paths}\label{sec:desc-path}
+
+{\color{Comment} I don't think we need an exclusive section for this, but when I tried to integrate this into the description of the tECS it was too much information interrupting the flow of the description}
+
+For a node \textrm{n}, define its \emph{descending-paths}, denoted \code{paths(\textrm{n})}, as a binary relation $R$ between event positions and the number of full-paths from \textrm{n} whose bottom node is labelled with that position. This binary relation is defined recursively as follows:
+
+\[
+paths(\textrm{n}) =
+\begin{cases}
+  \{(pos(\textrm{n}), 1)\} &\quad \textrm{n} \in N_{B}  \\
+  paths(next(\textrm{n}))  &\quad \textrm{n} \in N_{O}  \\
+  paths(left(\textrm{n})) \cup paths(right(\textrm{n})) &\quad \textrm{n} \in N_{U}  \\
+\end{cases}
+\]
+
+We define the union of two binary relations $R := R_{1} \cup R_{2}$ as follows:
+
+\begin{align*}
+  R := & \{ \ (x, y_{1} + y_{2}) \ | \ (x, y_{1}) \in R_{1}, \ (x, y_{2}) \in R_{2} \ \} \\
+       & \cup \{ \ (x, y_{1}) \ | \ (x, y_{1}) \in R_{1}, \ \nexists y_{2} : (x, y_{2}) \in R_{2} \ \} \\
+       & \cup \{ \ (x, y_{2}) \ | \ (x, y_{2}) \in R_{2}, \ \nexists y_{1} : (x, y_{1}) \in R_{1} \ \}
+\end{align*}
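+
+As a small worked instance of this definition, with two relations chosen purely for illustration, only position $2$ occurs in both relations, so its counts are added, while the remaining pairs are kept unchanged:
+
+\[
+  \{(0, 1), (2, 1)\} \cup \{(2, 3), (5, 1)\} = \{(0, 1), (2, 4), (5, 1)\}
+\]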
+
+For example, suppose node \textrm{n} has two paths, one starting at position $0$ and the other starting at position $2$. Then $paths(\textrm{n}) = \{(0, 1), (2, 1)\}$. The descending-paths attribute is used during the enumeration phase of the distributed evaluation algorithm to balance the workload of each processing unit.
+
+Every node \textrm{n} carries \code{paths(\textrm{n})} as an extra attribute; thus the descending-paths can be retrieved in constant time. This attribute contains a pointer to a hash table that encodes the descending-paths binary relation. Additionally, we define the method \code{pathsc(n, $\tau$)}, which counts the number of paths starting at or after position $\tau$. This method is used in the enumeration phase of the evaluation algorithm (Section~\ref{sec:enumeration}).
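+
+One way to make this count precise, under the assumption that a path starting exactly at position $\tau$ is counted (which matches the filter $x \ge \tau$ applied by union nodes below), is
+
+\[
+  pathsc(\textrm{n}, \tau) = \sum_{(x, y) \in paths(\textrm{n}), \ x \ge \tau} y
+\]
+
+For illustration, if $paths(\textrm{n}) = \{(0, 1), (2, 1)\}$, then $pathsc(\textrm{n}, 1) = 1$ and $pathsc(\textrm{n}, 3) = 0$.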
+
+Note that the size of this hash table may grow linearly with the length of the stream. To keep its size bounded during the evaluation algorithm (Section~\ref{sec:evaluation}), we only preserve paths that are inside the time window. The hash table can be generated efficiently as follows:
+
+\begin{itemize}
+  \item Bottom nodes will create a new hash table with a single entry corresponding to the current position.
+  \item Output nodes will point to the hash table of their child node.
+  \item Union nodes will create a fresh hash table and initialize it with the union of the entries $(x, y)$ of \code{paths(left(\textrm{u}))} and \code{paths(right(\textrm{u}))} that satisfy $x \ge \tau$, where $\tau = j - \epsilon$, $j$ is the current position and $\epsilon$ is the time window.
+\end{itemize}
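+
+As an illustration of the union case, with values chosen only for this example, take $j = 5$ and $\epsilon = 3$, so that $\tau = 2$. If $paths(left(\textrm{u})) = \{(0, 1), (2, 1)\}$ and $paths(right(\textrm{u})) = \{(2, 3), (5, 1)\}$, the entry at position $0$ falls outside the time window and is dropped, the two counts at position $2$ are added, and the hash table created for \textrm{u} holds
+
+\[
+  \{(2, 4), (5, 1)\}
+\]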
+
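+The first two operations take constant time. The third takes time $\mathcal{O}(\epsilon)$, since each hash table holds at most one entry per position inside the time window; this is constant with respect to the length of the stream.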
+}
 
 \section{Chapter summary}
 

diff --git a/docs/thesis/chapters/chapter_5.tex b/docs/thesis/chapters/chapter_5.tex
@@ -151,7 +151,7 @@ \section{The Enumeration procedure}\label{sec:enumeration}
   \SetKwProg{Procedure}{procedure}{}{}
   \SetKwFunction{Enumerate}{\textsc{Enumerate}}
   \Procedure{\Enumerate{$\mathcal{E}, n, \epsilon, j, p$}}{
-    $\delta \leftarrow \lceil \text{paths(n)} \ / \ {|\mathcal{P}|} \rceil$\;
+    {\color{Added}$\delta \leftarrow \lceil \text{pathsc}(n, j - \epsilon) \ / \ {|\mathcal{P}|} \rceil$}\;
     $\sigma \leftarrow \text{index}(p) \cdot \delta$\;
     st $\leftarrow$ new-stack()\;
     $\tau \leftarrow j - \epsilon $\;
@@ -168,23 +168,21 @@ \section{The Enumeration procedure}\label{sec:enumeration}
           $P \leftarrow P \ \cup $ {pos($n'$)}\;
           $n' \leftarrow $ next($n'$)\;
         }
-        \ElseIf{$n' \in N_{U}$}{
-          \If{$max(right(n')) \ge \tau$}{
-            \eIf{$paths(left(n')) > \sigma'$}{
-              $\delta'' \leftarrow \delta' - max(0, paths(left(n')) - \sigma')$\;
-            }{
-              $\delta'' \leftarrow \delta'$\;
+        {\color{Added}
+          \ElseIf{$n' \in N_{U}$}{
+            \If{$max(right(n')) \ge \tau$}{\label{line:enumeration:if:1}
+              $\delta'' \leftarrow \delta' - max(0, pathsc(left(n'), \tau) - \sigma')$\;
+              \If{$\delta'' > 0$}{\label{line:enumeration:if:2}
+                $\sigma'' \leftarrow \sigma' - pathsc(left(n'), \tau)$\;
+                push(st, $\langle$right($n'$), $P$, $\sigma''$, $\delta''\rangle$)\;
+              }
             }
-            $\sigma'' \leftarrow max(0, \sigma' - paths(left(n')))$\;
-            \If{$paths(right(n')) > \sigma'' \land \delta'' > 0$}{\label{line:enumeration:if:1}
-              push(st, $\langle$right($n'$), $P$, $\sigma''$, $\delta''\rangle$)\;
+            \eIf{$pathsc(left(n'), \tau) > \sigma'$}{\label{line:enumeration:if:3}
+              $n' \leftarrow left(n')$\;
+            }{
+              \textbf{break}\;
             }
           }
-          \eIf{$paths(left(n')) > \sigma'$}{\label{line:enumeration:if:2}
-            $n' \leftarrow left(n')$\;
-          }{
-            \textbf{break}\;
-          }
         }
       }
     }
@@ -219,9 +217,13 @@ \section{The Enumeration procedure}\label{sec:enumeration}
 \end{figure}
 
 
-\aref{algo:enumeration} receives as an input a \acrshort{tecs}, a node $n$, a time-window $\epsilon$, a position $j$, and a process $p$ and traverses a fraction of $G_{\mathcal{E}}$ in a DFS-way left-to-right order. First, computes the parameters $\sigma, \delta$ corresponding to the starting and ending path to enumerate, respectively. The value of these parameters can be computed statically i.e. without message interchanging.  Each iteration of the \code{while} of line~\ref{line:enumeration:while:1} traverses a new path starting from the point it branches from the previous path (except for the first iteration). For this, the stack $st$ is used to store the node and partial complex event of that branching point. Then, the \code{while} of line~\ref{line:enumeration:while:2} traverses through the nodes of the next path, following the left direction whenever a union node is reached and adding the right node to the stack whenever need. The \code{if}s of line~\ref{line:enumeration:if:1}~and~line~\ref{line:enumeration:if:2} make sure that enumeration starts on path $\pi_{\sigma}$ so only $paths_{\ge j - \epsilon, \sigma, \delta}$ are traversed. Moreover, by checking for every node $n'$ its value $max(n')$ before adding it to the stack, it makes sure of only going through paths in $paths_{\ge j - \epsilon}$.
+{\color{Added} \aref{algo:enumeration} receives as input a \acrshort{tecs}, a node $n$, a time window $\epsilon$, a position $j$, and a process $p$, and traverses a fraction of $G_{\mathcal{E}}$ in depth-first, left-to-right order. First, it computes the parameters $\sigma$ and $\delta$, corresponding to the first path to enumerate and the maximum number of paths to enumerate, respectively, using the method \code{pathsc(n, $j - \epsilon$)}. This method can be executed in constant time. We remark that the values of $\sigma$ and $\delta$ can be computed statically and locally on each processing unit before the execution of the algorithm.}
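+
+As a worked instance with hypothetical values, and assuming that both $\text{index}(p)$ and the path numbering start at $0$, suppose $pathsc(n, j - \epsilon) = 10$ and $|\mathcal{P}| = 4$. Then
+
+\[
+  \delta = \lceil 10 / 4 \rceil = 3, \qquad \sigma \in \{0, 3, 6, 9\},
+\]
+
+so the four processes would be responsible for paths $\pi_{0}$ to $\pi_{2}$, $\pi_{3}$ to $\pi_{5}$, $\pi_{6}$ to $\pi_{8}$, and $\pi_{9}$, respectively. The last process receives fewer than $\delta$ paths, which is why $\delta$ acts as an upper bound rather than an exact count.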
+
The enumeration starts from the root node $n$. Each iteration of the \code{while} of line~\ref{line:enumeration:while:1} traverses a new path starting from the point where it branches from the previous path. The stack $st$ is used to store the node, the partial complex event, and the parameters $\sigma, \delta$ of that branching point. {\color{Added} The \code{while} of line~\ref{line:enumeration:while:2} traverses the nodes of the next path, following the left direction whenever a union node is reached and adding the right node to the stack whenever needed. The \code{if} of line~\ref{line:enumeration:if:1} guarantees that every traversed path belongs to $paths_{\ge \tau}$, i.e., paths outside the time window are skipped. The \code{if}s of line~\ref{line:enumeration:if:2}~and~line~\ref{line:enumeration:if:3} ensure that at most $\delta$ paths are enumerated and that the enumeration starts at path $\pi_{\sigma}$, respectively. Together, the three conditionals guarantee that only paths in $paths_{\ge \tau, \sigma, \delta}$ are enumerated on process $p$.}
 
+\begin{remark*}
A simpler recursive algorithm could have been used; however, constant-delay output might not be guaranteed, because the number of backtracking steps after branching might be as large as the length of the longest path of $G_{\mathcal{E}}$. To guarantee a constant number of steps after branching and ensure constant-delay output, \aref{algo:enumeration} uses a stack, which allows it to jump immediately to the next branch. We assume that storing $P$ in the stack takes constant time. We materialize this assumption by modelling $P$ as a linked list of positions, where the list is ordered by the last element added. To update $P$ with position $i$, we only need to create a node $i$ that points to the previous last element of $P$. Then, storing $P$ on the stack amounts to storing a pointer to the last element added.
+\end{remark*}
 
 This concludes the presentation of Algorithm~\ref{algo:enumeration}. In the remainder of this section, we prove properties (1), (2) and (3).
 

diff --git a/docs/thesis/preamble.tex b/docs/thesis/preamble.tex
@@ -93,6 +93,8 @@
 
 \usepackage{xcolor} % \definecolor, \color{codegray}
 \definecolor{codegray}{rgb}{0.9, 0.9, 0.9}
+\colorlet{Added}{green!70!black}
+\colorlet{Comment}{red!50!yellow}
 % \color{codegray} ... ...
 % \textcolor{red}{easily}
 
@@ -103,8 +105,8 @@
 %%%% Extras
 
 \usepackage{multirow}
-\usepackage{todonotes}
 \usepackage{adjustbox}
+\usepackage{soul}
 
 %%%% Glossaries & Acronyms
 

diff --git a/docs/thesis/thesis.cls b/docs/thesis/thesis.cls
@@ -149,6 +149,7 @@
 \newtheorem{definition}[theorem]{Definition}
 \theoremstyle{remark}
 \newtheorem{remark}[theorem]{Remark}
+\newtheorem*{remark*}{Remark}
 \usepackage[centerlast,small,sc]{caption}
 \setlength{\captionmargin}{20pt}
 \newcommand{\fref}[1]{Figure~\ref{#1}}