Binary file not shown.
44 changes: 41 additions & 3 deletions docs/thesis/chapters/chapter_3.tex
@@ -144,9 +144,9 @@ \section{Computational model}\label{sec:cea}

\section{Timed Enumerable Compact Set}\label{sec:data_structure}

The data structure that lazily represents the set of partial matches in CORE is called \acrfull{tecs}. A tECS is a \emph{directed acyclic graph (DAG)} $\mathcal{E}$ with two types of nodes: \emph{union nodes} and \emph{non-union nodes}. Union nodes have two children: \code{left} and \code{right}. Non-union nodes are labelled by a stream position and are divided in \emph{output nodes} and \emph{bottom nodes}. The former have exactly one child and the latter have none. To simplify presentation in what follows, we write nodes of any kind as \textrm{n}, bottom, output and union nodes as \textrm{b, o, u}, respectively, and we denote the sets of bottom, output and union nodes by $N_{B}$, $N_{O}$ and $N_{U}$, respectively.

For a node \textrm{n}, define its \emph{descending-paths}, denoted \code{paths(\textrm{n})}, recursively as follows: if \textrm{n} is a bottom node, then \code{paths(\textrm{n})} = 1; if \textrm{n} is an output node, then \code{paths(\textrm{n}) = paths(next(\textrm{n}))}; if \textrm{n} is a union node, \code{paths(\textrm{n}) = paths(left(\textrm{n})) + paths(right(\textrm{n}))}. Every node \textrm{n} carries paths(\textrm{n}) as an extra label; thus the descending-paths can be retrieved in constant time. The descending-paths attribute is going to be used during the enumeration phase of the distributed evaluation algorithm to balance the workload of each processing unit.
The data structure that lazily represents the set of partial matches in CORE is called \acrfull{tecs}. A tECS is a \emph{directed acyclic graph (DAG)} $\mathcal{E}$ with two types of nodes: \emph{union nodes} and \emph{non-union nodes}. Union nodes have two children: \code{left} and \code{right}. Non-union nodes are labelled by a stream position and are divided into \emph{output nodes} and \emph{bottom nodes}. The former have exactly one child and the latter have none.
{\color{Comment} We should integrate descending-paths here (or at least mention it)}.
To simplify presentation in what follows, we write nodes of any kind as \textrm{n}, bottom, output and union nodes as \textrm{b, o, u}, respectively, and we denote the sets of bottom, output and union nodes by $N_{B}$, $N_{O}$ and $N_{U}$, respectively. {\color{Comment} missing definition of $pos$, $next$, $right$ and $left$}.

An \emph{open complex event}, denoted $(i, D)$, is a complex event $([i, j], D)$ whose closing event $j$ has not been reached yet. A \acrshort{tecs} represents sets of open complex events. Let $\bar{p} = n_{1}, n_{2}, \ldots, n_{k}$ be a \emph{full-path} in $\mathcal{E}$, that is, a path such that $n_{k}$ is a bottom node. Then $\bar{p}$ represents the open complex event ${\llbracket \bar{p} \rrbracket}_{\mathcal{E}} = (i, D)$, where $i = pos(n_{k})$ is the label of the bottom node $n_{k}$ and $D$ is the set of labels of the other non-union nodes in $\bar{p}$. Given a node \textrm{n} in $\mathcal{E}$, ${\llbracket \textrm{n} \rrbracket}_{\mathcal{E}}$ is the set of open complex events represented by \textrm{n}; it consists of all open complex events ${\llbracket \bar{p} \rrbracket}_{\mathcal{E}}$ with $\bar{p}$ a full-path in $\mathcal{E}$ starting at \textrm{n}.

@@ -240,6 +240,44 @@ \section{Auxiliary data structures}\label{sec:auxiliary_data_structure}

We assume that hash table lookups and insertions take constant time, and that iterating over the entries of a hash table has constant delay.

{\color{Added}
\section{Descending-paths}\label{sec:desc-path}

{\color{Comment} I don't think we need an exclusive section for this, but when I tried to integrate this into the description of the tECS it was too much information interrupting the flow of the description}

For a node \textrm{n}, define its \emph{descending-paths}, denoted \code{paths(\textrm{n})}, as a binary relation $R$ that associates each event position with the number of full-paths starting at \textrm{n} whose bottom node is labelled with that position, i.e., the number of paths of \textrm{n} that start at that position. This binary relation is defined recursively as follows:

\[
paths(\textrm{n}) =
\begin{cases}
\{(pos(\textrm{n}), 1)\} &\quad \textrm{n} \in N_{B} \\
paths(next(\textrm{n})) &\quad \textrm{n} \in N_{O} \\
paths(left(\textrm{n})) \cup paths(right(\textrm{n})) &\quad \textrm{n} \in N_{U} \\
\end{cases}
\]

Here $\cup$ is overloaded for these relations: instead of the plain set union, $R := R_{1} \cup R_{2}$ adds the counts of the positions that occur in both relations and copies the remaining pairs unchanged. Formally, with the $\cup$ on the right-hand side denoting the usual set union:

\begin{align*}
R := & \{ \ (x, y_{1} + y_{2}) \ | \ (x, y_{1}) \in R_{1}, \ (x, y_{2}) \in R_{2} \ \} \\
& \cup \{ \ (x, y_{1}) \ | \ (x, y_{1}) \in R_{1}, \ \nexists y_{2} \colon (x, y_{2}) \in R_{2} \ \} \\
& \cup \{ \ (x, y_{2}) \ | \ (x, y_{2}) \in R_{2}, \ \nexists y_{1} \colon (x, y_{1}) \in R_{1} \ \}
\end{align*}

For example, if node \textrm{n} has two paths, one starting at position $0$ and the other starting at position $2$, then $paths(\textrm{n}) = \{(0, 1), (2, 1)\}$. The descending-paths attribute is used during the enumeration phase of the distributed evaluation algorithm to balance the workload of the processing units.
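
As a further illustration of the union (the relations below are hypothetical and chosen only for this example), let \textrm{u} be a union node with $paths(left(\textrm{u})) = \{(0, 1), (2, 1)\}$ and $paths(right(\textrm{u})) = \{(2, 3), (5, 1)\}$. Position $2$ occurs in both relations, so its counts are added, whereas positions $0$ and $5$ are copied unchanged:

\[
paths(\textrm{u}) = \{(0, 1), (2, 1)\} \cup \{(2, 3), (5, 1)\} = \{(0, 1), (2, 1 + 3), (5, 1)\} = \{(0, 1), (2, 4), (5, 1)\}
\]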

Every node \textrm{n} carries \code{paths(\textrm{n})} as an extra attribute, so the descending-paths can be retrieved in constant time. The attribute holds a pointer to a hash table that encodes the descending-paths relation. Additionally, we define the method \code{pathsc(\textrm{n}, $\tau$)}, which counts the number of paths starting at or after position $\tau$. This method is used in the enumeration phase of the evaluation algorithm (Section~\ref{sec:enumeration}).
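
One possible way to state \code{pathsc} as a formula, assuming that paths starting exactly at position $\tau$ are counted (which matches the filter $x \ge \tau$ applied when the hash table of a union node is built), is as the sum of the counts of all pairs whose position is at least $\tau$:

\[
pathsc(\textrm{n}, \tau) = \sum_{\substack{(x, y) \in paths(\textrm{n}) \\ x \ge \tau}} y
\]

For instance, for the hypothetical relation $paths(\textrm{n}) = \{(0, 1), (2, 4), (5, 1)\}$ and $\tau = 2$, this reading gives $pathsc(\textrm{n}, 2) = 4 + 1 = 5$, since the pair $(0, 1)$ lies before $\tau$ and is not counted.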

Note that the size of this hash table may grow linearly with the length of the stream. To keep its size bounded during the evaluation algorithm (Section~\ref{sec:evaluation}), we only preserve paths that are inside the time window. The hash table can be generated efficiently as follows:

\begin{itemize}
\item A bottom node creates a new hash table with a single entry corresponding to the current position.
\item An output node points to the hash table of its child node.
\item A union node \textrm{u} creates a fresh hash table and initializes it with the pairs $(x, y)$ of $paths(left(\textrm{u})) \cup paths(right(\textrm{u}))$ such that $x \ge \tau$, where $\tau = j - \epsilon$, $j$ is the current position and $\epsilon$ is the time window.
\end{itemize}

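The operations for bottom and output nodes take constant time. The operation for union nodes takes time $\mathcal{O}(\epsilon)$, because each hash table holds at most one entry per position inside the time window; this is constant with respect to the length of the stream.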
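
To illustrate the union-node case with hypothetical values, take current position $j = 7$ and time window $\epsilon = 5$, so that $\tau = j - \epsilon = 2$, and let the children of \textrm{u} carry $paths(left(\textrm{u})) = \{(0, 1), (2, 1)\}$ and $paths(right(\textrm{u})) = \{(2, 3), (5, 1)\}$. Under these assumptions, the pair at position $0$ falls outside the time window and is dropped, and the fresh hash table of \textrm{u} holds:

\[
\{ \ (x, y) \in paths(left(\textrm{u})) \cup paths(right(\textrm{u})) \ | \ x \ge 2 \ \} = \{(2, 4), (5, 1)\}
\]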
}

\section{Chapter summary}

34 changes: 18 additions & 16 deletions docs/thesis/chapters/chapter_5.tex
@@ -151,7 +151,7 @@ \section{The Enumeration procedure}\label{sec:enumeration}
\SetKwProg{Procedure}{procedure}{}{}
\SetKwFunction{Enumerate}{\textsc{Enumerate}}
\Procedure{\Enumerate{$\mathcal{E}, n, \epsilon, j, p$}}{
$\delta \leftarrow \lceil \text{paths(n)} \ / \ {|\mathcal{P}|} \rceil$\;
{\color{Added}$\delta \leftarrow \lceil \text{pathsc(n, }j - \epsilon) \ / \ {|\mathcal{P}|} \rceil$}\;
$\sigma \leftarrow \text{index}(p) \cdot \delta$\;
st $\leftarrow$ new-stack()\;
$\tau \leftarrow j - \epsilon $\;
@@ -168,23 +168,21 @@ \section{The Enumeration procedure}\label{sec:enumeration}
$P \leftarrow P \ \cup $ {pos($n'$)}\;
$n' \leftarrow $ next($n'$)\;
}
\ElseIf{$n' \in N_{U}$}{
\If{$max(right(n')) \ge \tau$}{
\eIf{$paths(left(n')) > \sigma'$}{
$\delta'' \leftarrow \delta' - max(0, paths(left(n')) - \sigma')$\;
}{
$\delta'' \leftarrow \delta'$\;
{\color{Added}
\ElseIf{$n' \in N_{U}$}{
\If{$max(right(n')) \ge \tau$}{\label{line:enumeration:if:1}
$\delta'' \leftarrow \delta' - max(0, pathsc(left(n'), \tau) - \sigma')$\;
\If{$\delta'' > 0$}{\label{line:enumeration:if:2}
$\sigma'' \leftarrow \sigma' - pathsc(left(n'), \tau)$\;
push(st, $\langle$right($n'$), $P$, $\sigma''$, $\delta''\rangle$)\;
}
}
$\sigma'' \leftarrow max(0, \sigma' - paths(left(n')))$\;
\If{$paths(right(n')) > \sigma'' \land \delta'' > 0$}{\label{line:enumeration:if:1}
push(st, $\langle$right($n'$), $P$, $\sigma''$, $\delta''\rangle$)\;
\eIf{$pathsc(left(n'), \tau) > \sigma'$}{\label{line:enumeration:if:3}
$n' \leftarrow left(n')$\;
}{
\textbf{break}\;
}
}
\eIf{$paths(left(n')) > \sigma'$}{\label{line:enumeration:if:2}
$n' \leftarrow left(n')$\;
}{
\textbf{break}\;
}
}
}
}
@@ -219,9 +217,13 @@ \section{The Enumeration procedure}\label{sec:enumeration}
\end{figure}


\aref{algo:enumeration} receives as an input a \acrshort{tecs}, a node $n$, a time-window $\epsilon$, a position $j$, and a process $p$ and traverses a fraction of $G_{\mathcal{E}}$ in a DFS-way left-to-right order. First, computes the parameters $\sigma, \delta$ corresponding to the starting and ending path to enumerate, respectively. The value of these parameters can be computed statically i.e. without message interchanging. Each iteration of the \code{while} of line~\ref{line:enumeration:while:1} traverses a new path starting from the point it branches from the previous path (except for the first iteration). For this, the stack $st$ is used to store the node and partial complex event of that branching point. Then, the \code{while} of line~\ref{line:enumeration:while:2} traverses through the nodes of the next path, following the left direction whenever a union node is reached and adding the right node to the stack whenever need. The \code{if}s of line~\ref{line:enumeration:if:1}~and~line~\ref{line:enumeration:if:2} make sure that enumeration starts on path $\pi_{\sigma}$ so only $paths_{\ge j - \epsilon, \sigma, \delta}$ are traversed. Moreover, by checking for every node $n'$ its value $max(n')$ before adding it to the stack, it makes sure of only going through paths in $paths_{\ge j - \epsilon}$.
{\color{Added} \aref{algo:enumeration} receives as input a \acrshort{tecs}, a node $n$, a time window $\epsilon$, a position $j$, and a process $p$, and traverses a fraction of $G_{\mathcal{E}}$ in depth-first, left-to-right order. First, it computes the parameters $\sigma$ and $\delta$, which correspond to the index of the first path to enumerate and the maximum number of paths to enumerate, respectively, using the method \code{pathsc(n, $j - \epsilon$)}. This method can be executed in constant time. We remark that the values of $\sigma$ and $\delta$ can be computed statically and locally by each process before the execution of the algorithm, i.e., without exchanging messages.}
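
As a worked illustration with hypothetical values, and assuming that $\text{index}(p)$ numbers the processes from $0$, suppose that $pathsc(n, j - \epsilon) = 10$ and $|\mathcal{P}| = 4$. Then

\[
\delta = \left\lceil \frac{10}{4} \right\rceil = 3, \qquad \sigma = \text{index}(p) \cdot \delta \in \{0, 3, 6, 9\}
\]

Under this reading, the processes with index $0$, $1$ and $2$ enumerate three paths each ($\pi_{0}, \ldots, \pi_{2}$, then $\pi_{3}, \ldots, \pi_{5}$, then $\pi_{6}, \ldots, \pi_{8}$), and the process with index $3$ enumerates the single remaining path $\pi_{9}$.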

The enumeration starts from the root node $n$. Each iteration of the \code{while} of line~\ref{line:enumeration:while:1} traverses a new path starting from the point where it branches from the previous path. The stack $st$ stores the node, the partial complex event, and the parameters $\sigma, \delta$ of that branching point. {\color{Added} The \code{while} of line~\ref{line:enumeration:while:2} traverses the nodes of the next path, following the left direction whenever a union node is reached and pushing the right node onto the stack whenever needed. The \code{if} of line~\ref{line:enumeration:if:1} guarantees that every traversed path belongs to $paths_{\ge \tau}$, i.e., paths outside the time window are skipped. The \code{if}s of line~\ref{line:enumeration:if:2}~and~line~\ref{line:enumeration:if:3} ensure that at most $\delta$ paths are enumerated and that the enumeration starts on path $\pi_{\sigma}$, respectively. Together, the three conditionals guarantee that only the paths in $paths_{\ge \tau, \sigma, \delta}$ are enumerated on process $p$.}

\begin{remark*}
A simpler recursive algorithm could have been used; however, constant-delay output might not be guaranteed, because the number of backtracking steps after branching can be as large as the length of the longest path of $G_{\mathcal{E}}$. To guarantee a constant number of steps after branching, and hence constant-delay output, \aref{algo:enumeration} uses a stack, which allows it to jump immediately to the next branch. We assume that storing $P$ in the stack takes constant time. We realize this assumption by modelling $P$ as a linked list of positions whose head is the last element added. To update $P$ with position $i$, we only need to create a node $i$ that points to the previous last element of $P$. Then, storing $P$ on the stack amounts to storing a pointer to the last element added.
\end{remark*}

This concludes the presentation of Algorithm~\ref{algo:enumeration}. In the remainder of this section, we prove properties (1), (2) and (3).

4 changes: 3 additions & 1 deletion docs/thesis/preamble.tex
@@ -93,6 +93,8 @@

\usepackage{xcolor} % \definecolor, \color{codegray}
\definecolor{codegray}{rgb}{0.9, 0.9, 0.9}
\colorlet{Added}{green!70!black}
\colorlet{Comment}{red!50!yellow}
% \color{codegray} ... ...
% \textcolor{red}{easily}

@@ -103,8 +105,8 @@
%%%% Extras

\usepackage{multirow}
\usepackage{todonotes}
\usepackage{adjustbox}
\usepackage{soul}

%%%% Glossaries & Acronyms

1 change: 1 addition & 0 deletions docs/thesis/thesis.cls
@@ -149,6 +149,7 @@
\newtheorem{definition}[theorem]{Definition}
\theoremstyle{remark}
\newtheorem{remark}[theorem]{Remark}
\newtheorem*{remark*}{Remark}
\usepackage[centerlast,small,sc]{caption}
\setlength{\captionmargin}{20pt}
\newcommand{\fref}[1]{Figure~\ref{#1}}