Note:: All examples from this tutorial can be found in
test/datalog_tests/tutorial.dl.
The examples can be executed using test inputs from
test/datalog_tests/tutorial.dat
and the expected outputs are in
test/datalog_tests/tutorial.dump.expected.
Differential Datalog or DDlog is an enhanced version of the Datalog programming language. A DDlog program can be used in two ways:
- integrated in other applications as a library. This is the main use case that DDlog is optimized for.
- as a standalone program with a command-line interface (CLI)
In this tutorial we use DDlog using the CLI. The following diagram shows how we will use DDlog in the tutorial. The DDlog compiler generates Rust code, which is compiled by the Rust compiler to an x86 executable, which provides a text-based interface to push relation updates to the program and query relation state.
input data
------------ ------------ |
DDlog | DDlog | Rust | Rust | V
program -> | compiler |-> code -> | compiler |--> executable
------------ ------------ |
runtime ^ V
libraries ------/ output data
Installation instructions are found in the README.
Files storing DDlog programs have the .dl suffix.
dl.vim is a file that offers syntax
highlighting support for the vim editor.
Create a file called playpen.dl with the following text in an empty directory.
/* First DDlog example */
typedef Category = CategoryStarWars
| CategoryOther
input relation Word1(word: string, cat: Category)
input relation Word2(word: string, cat: Category)
output relation Phrases(phrase: string)
// Rule
Phrases(w1 ++ " " ++ w2) :- Word1(w1, cat), Word2(w2, cat).
DDlog comments are like C++ and Java comments (// and /* */).
This program contains several declarations:
- a data type declaration,
Category - two input relations (collections) called
Word1andWord2 - an output relation called
Phrases - a rule describing how to compute
Phrases
When this program is executed it expects to be given as input the
contents of collections Word1 and Word2 and it produces as output the contents
of Phrases.
The Category type is similar to a C or Java enum, and has only two
possible values: CategoryStarWars and CategoryOther.
DDlog relations are similar to database tables (the term 'relation'
also comes from databases): a relation is a set of records. Records
are like C structs: objects with multiple named and fields. Relation names must
start with an uppercase letter.
Relations without an input qualifier are also called derived relations. They are
computed from input relations using rules.
Output relations (marked by the output qualifier) are derived relations whose content
can be observed by the user (see below).
The declaration of Word1:
input relation Word1(word: string, cat: Category) indicates the type of its records: each
record has a word field of type string, and a cat field of type Category.
The records of Word2 happen to have the same type.
From this declaration: output relation Phrases(phrase: string)
we know that he output relation Phrases has records with a single field named phrase,
which is a string.
Finally, the rule
Phrases(w1 ++ " " ++ w2) :- Word1(w1, cat), Word2(w2, cat).
^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
head body dot
shows how to compute the records that belong to Phrases: for every
w1, w2, and cat, such that (w1, cat) is an element of Word1 and (w2, cat) is an element of Word2, there is a record in Phrases obtained by
concatenating the strings w1, a space and w2: w1 ++ " " ++ w2.
The rule's body shows how to compute the rule's head.
A comma in a rule is read as "and". Rules end with a mandatory dot.
In database terminology the above rule computes an inner join of
the Word1 and Word2 relations on the cat field.
-
Identifiers start with letters or underscore
_, and may contain digits. -
Relation names must be capitalized (start with an uppercase symbol).
-
Variable names must start with a lowercase symbol or
_. -
Type names have no restrictions.
The order of declarations in a DDlog program is unimportant; you should use the order that makes programs easiest to read.
Compiling a DDlog program consists of two steps. First, run the DDlog compiler to generate the Rust program:
ddlog -i playpen.dl -L <ddlog/lib>
where <ddlog/lib> is the path to the lib directory inside the
DDlog installation directory (when using a binary release) or
inside the DDlog repository (when building DDlog from source).
Second, run the Rust compiler to generate a binary executable:
(cd playpen_ddlog && cargo build --release)
The text-based interface offers a convenient way to interact with the
DDlog program during development and debugging. In your favorite Unix
shell start the program that was generated by the compiler ./playpen_ddlog/target/release/playpen_cli.
The CLI shows you a prompt >>. Now you can type some commands that are executed by the datalog engine.
CLI commands can be used to show the contents of output relations, or to insert or delete records from input relations.
We can display the contents of the Phrases relation:
>> dump Phrases;
>>
Initially all relations are empty, so nothing is printed. Records can
be added or removed from relations. All insertion and deletion
operations must be part of a transaction. We start a transaction
with start:
>> start;
>>
Then we insert some records:
>> insert Word1("Hello,", CategoryOther);
>> insert Word2("world!", CategoryOther);
And finally we complete the transaction with commit:
>> commit;
insert 30 Word1(Word1{"Hello,",Category::CategoryOther{}})
insert 31 Word2(Word2{"world!",Category::CategoryOther{}})
insert 21 Phrases(Phrases{"Hello, world!"})
The CLI has executed all the operations that we have requested. It
has inserted records in relations Word1 and Word2, and the rule
for Phrases has inferred that a new record should be added to
this relation. When commiting a transaction DDlog reports all
changes to all (input and derived) relations.
We can now again inspect the contents of Phrases:
>> dump Phrases;
Phrases{"Hello, world!"}
>>
All DDlog relations are sets, i.e., a relation cannot contain multiple
identical records. Try adding an existing record to Word1 and
check that it only appears once.
We can delete records from input relations:
start;
delete Word2("world!", CategoryOther);
commit;
dump Prases;
Phrases should be empty again. DDlog only computes the change of Phrases
incrementally, without recomputing all records.
See this document for a complete list of commands supported by the CLI tool.
Typing commands interactively quickly gets tedious. Therefore, throughout the rest
of this tutorial we will run DDlog in batch mode, feeding commands from a file.
Create a file called playpen.dat with the following contents:
start;
insert Word1("Hello," , CategoryOther),
insert Word1("Goodbye," , CategoryOther),
insert Word2("World" , CategoryOther),
insert Word2("Ruby Tuesday", CategoryOther);
insert Word1("Help me," , CategoryStarWars),
insert Word1("I am your" , CategoryStarWars),
insert Word2("Obi-Wan Kenobi", CategoryStarWars),
insert Word2("father" , CategoryStarWars);
commit;
echo Phrases:;
dump Phrases;
Feed the file to the compiled DDlog program:
./playpen_ddlog/target/release/playpen_cli --no-print < playpen.dat
The --no-print flag tells DDlog to stop logging changes to program relations to stderr.
In this mode, it will only produce output in response to commands such as dump or
commit dump_changes (see command reference).
In this example, it will produce the following output:
Phrases:
Phrases{"Goodbye, Ruby Tuesday"}
Phrases{"Goodbye, World"}
Phrases{"Hello, Ruby Tuesday"}
Phrases{"Hello, World"}
Phrases{"Help me, Obi-Wan Kenobi"}
Phrases{"Help me, father"}
Phrases{"I am your Obi-Wan Kenobi"}
Phrases{"I am your father"}
At this point you may want to create a small shell script to automate compiling and running DDlog. It will save you some typing and will make sure you don't forget to recompile your program after making changes to it:
#!/bin/bash ddlog -i playpen.dl -L <ddlog/lib> (cd playpen_ddlog && cargo build --release) ./playpen_ddlog/target/release/playpen_cli --no-print < playpen.datIf you are using Nix shell, the script already exists and can be invoked as:
ddlog-run playpen.dlwhich will recompile the program and feed
playpen.datas input, if that file exists.
In the rest of this document we introduce other features of DDlog through simple examples.
Expressions can be used to filter records and to compute derived records. In the previous example we have used a string concatenation expression.
Let us write a program to manipulate a network information database. Given a set of IP hosts and subnets, the following DDlog program computes a host-to-subnet mapping by matching IP addresses against subnet masks.
We use type aliases to give an alternative names to types:
// Type aliases
typedef UUID = bit<128>
typedef IP4 = bit<32>
typedef NetMask = bit<32>
In this example IP4 and bit<32> refer to the exact same type, a
32-bit unsigned integer.
//IP host specified by its name and address.
input relation Host(id: UUID, name: string, ip: IP4)
//IP subnet specified by its IP prefix and mask
input relation Subnet(id: UUID, prefix: IP4, mask: NetMask)
// HostInSubnet relation maps hosts to known subnets
output relation HostInSubnet(host: UUID, subnet: UUID)
HostInSubnet(host_id, subnet_id) :- Host(host_id, _, host_ip),
Subnet(subnet_id, subnet_prefix, subnet_mask),
((host_ip & subnet_mask) == subnet_prefix). // filter condition
Two input relations store known IP hosts and subnets respectively.
The HostInSubnet relation is computed by filtering all host-subnet
pairs where the host address matches the subnet prefix and mask.
This clause:
Host(host_id, _, host_ip)
binds variables host_id and host_ip, to corresponding fields of records from
the Host relation. So the first two lines of the rule iterate over
all host-subnet pairs. The expression on the third line (host_ip & subnet_mask) == subnet_prefix uses & (bitwise "and"), and ==
(equality test). This expression acts as a filter that eliminates
all tuples that do not satisfy the condition that the host subnet
(computed using the mask) is the same as the subnet_prefix.
This rule does not care about the name field of the Host relation,
so it uses the "don't care" symbol (_) for the second field:
Host(host_id, _, host_ip)
^ don't care
In these examples relation field names are identified by position, but they can also be specified by name:
Host(.id=host_id, .ip=host_ip)
Omitted fields (e.g., name in this example) are don't cares.
DDlog strings are sequences of UTF-8 symbols. Two forms of string constants are supported:
-
Quoted strings are Unicode strings enclosed in quotes, and use the same single-character escapes as in C, e.g.:
"bar", "\tbar\n", "\"buzz", "\\buzz", "barΔ"Long strings can be broken into multiple lines using a pair of backslash characters; the spaces between backslashes are ignored:
"foo\ \bar"is equivalent to
"foobar" -
Raw strings, enclosed in
[| |], can contain arbitrary Unicode characters, (except the|]sequence), including newlines and backslashes, e.g.,[| foo\n buzz bar|]is the same string as
"foo\\n\nbuzz\nbar".
Adjacent string literals (of both kinds) are automatically concatenated, e.g., "foo" "bar" is
converted to "foobar".
String interpolation allows evaluating arbitrary DDlog expressions
to construct a string (this feature is inspired by JavaScript).
Embedded expressions
are wrapped in ${}. The following program defines relation
Pow2 to contain strings that enumerate the squares of all numbers in relation Number.
input relation Number(n: bigint)
output relation Pow2(p: string)
Pow2("The square of ${x} is ${x*x}") :- Number(x).
Built-in types have built-in conversions to strings. To convert a
user-defined type (such as Category above) the user can implement a
function named category2string that returns a string given a category
(functions are described below).
Let's say we want to display IP and Ethernet
addresses in a standard format, e.g., "192.168.0.1" and
"aa:b6:d0:11:ae:c1" respectively. Since IP4 is an alias to 32-bit
numbers, in an interpolated string an IP address is formatted as a
decimal integer. We can create a custom formatting function for IP
addresses called format_bit32_as_ip, which we can then invoke: e.g.,
"IP address: ${format_bit32_as_ip(addr)}", but
this is error-prone, since it is easy to forget to call the
formatting function when printing IP addresses. An alternative is to declare a new type for IP addresses:
typedef ip_addr_t = IPAddr{addr: bit<32>}
This declares a new type named ip_addr_t, having a single type constructor called IPAddr.
Think of this declaration as a C struct with
a single field of type bit<32>. We can write a user-defined formatting method:
function ip_addr_t2string(ip: ip_addr_t): string = {
"${ip.addr[31:24]}.${ip.addr[23:16]}.${ip.addr[15:8]}.${ip.addr[7:0]}"
}
We have used two new operators: dot to access a field:
ip.addr, and bit slicing: ip.addr[31:24] to select a contiguous
range of bits from a bit vector. Bits are numbered from the
least-significant, which is bit 0. The slice boundaries are both
inclusive, so the result of ip.addr[31:24] has 8 bits. Using our new
type ip_addr_t prevents one from accidentally mixing IP addresses and numbers.
We do the same for Ethernet addresses; for formatting we invoke the built-in
hex() function to convert a number to a hexadecimal string
representation:
typedef mac_addr_t = MACAddr{addr: bit<48>}
function mac_addr_t2string(mac: mac_addr_t): string = {
"${hex(mac.addr[47:40])}:${hex(mac.addr[39:32])}:${hex(mac.addr[31:24])}:\
\${hex(mac.addr[23:16])}:${hex(mac.addr[15:8])}:${hex(mac.addr[7:0])}"
}
The NHost type describes a network host and contains the host's IP and MAC
addresses:
typedef nethost_t = NHost {
ip: ip_addr_t,
mac: mac_addr_t
}
function nethost_t2string(h: nethost_t): string = {
"Host: IP=${h.ip}, MAC=${h.mac}"
}
DDlog will automatically invoke the user-defined string conversion
functions to format the ip and mac fields when it needs to produce strings.
Here is a program that computes a derived relation storing string representation of hosts:
input relation NetHost(id: bigint, h: nethost_t)
output relation NetHostString(id: bigint, s: string)
NetHostString(id, "${h}") :- NetHost(id, h).
Try feeding the following .dat file to this program:
start;
insert NetHost(1, NHost{IPAddr{0xaabbccdd}, MACAddr{0x112233445566}}),
insert NetHost(2, NHost{IPAddr{0xa0b0c0d0}, MACAddr{0x102030405060}});
commit;
dump NetHostString;
You should obtain the following output:
NetHostString{1,"Host: IP=170.187.204.221, MAC=11:22:33:44:55:66"}
NetHostString{2,"Host: IP=160.176.192.208, MAC=10:20:30:40:50:60"}
The type bigint describes arbitrary-precision (unbounded) integers.
var x: bigint = 125 declares a variable x with type integer and
initial value 125.
Signed integers are written as signed<N>, where N is the width of
the integer in bits. Currently DDlog only supports integer with
widths that are supported by the native machine type (e.g., 8, 16, 32,
or 64 bits). DDlog supports all standard arithmetic operators over
integers.
Bit-vector types are described by bit<N>, where N is the
bit-vector width. Bit-vectors are unsigned integers when performing
arithmetic. DDlog supports all standard arithmetic and bitwise
operators over bit-vectors. Concatenation of bit-vectors is
expressed with the ++ operator (same notation as string
concatenation).
// Form IP address from bytes using bit vector concatenation
function ip_from_bytes(b3: bit<8>, b2: bit<8>, b1: bit<8>, b0: bit<8>)
: ip_addr_t =
{
IPAddr{.addr = b3 ++ b2 ++ b1 ++ b0}
}
// Check for multicast IP address using bit slicing
function is_multicast_addr(ip: ip_addr_t): bool = ip.addr[31:28] == 14
input relation Bytes(b3: bit<8>, b2: bit<8>, b1: bit<8>, b0: bit<8>)
// convert bytes to IP addresses
output relation Address(addr: ip_addr_t)
Address(ip_from_bytes(b3,b2,b1,b0)) :- Bytes(b3,b2,b1,b0).
// filter multicast IP addresses
output relation MCastAddress(addr: ip_addr_t)
MCastAddress(a) :- Address(a), is_multicast_addr(a).
Bit vector constants can be written as decimal numbers. If the width is not specified DDlog computes the width using type inference.
Bit vector constants can also specify the width explicitly and be written in
binary, octal or hexadecimal:
[<width>]<radix_specifier><value> syntax, where <radix> is one of
'd, 'h, 'o, 'b for decimal, hexadecimal, octal and binary
numbers respectively:
32'd15 // 32-bit decimal
48'haabbccddeeff // 48-bit hexadecimal
3'b101 // 3-bit binary
'd15 // bit vector whose value is 15 and whose width is determined from the context
DDlog does not automatically coerce different integer types to each other, e.g.,
it is an error for a function that is declared to return a bit<64> value to
return bit<32>. Instead, the programmer must use the type cast operator as to
perform type conversion explicitly:
32'd15 as bit<64> // type-cast bit<32> to bit<64>
32'd15 as signed<32> // type-cast bit<32> to signed<32>
32'd15 as bigint // type-cast bit<32> to bigint
The following type conversions are currently supported:
bit<N>tobit<M>for anyNandMsigned<N>tosigned<M>for anyNandMbit<N>tosigned<N>signed<N>tobit<N>bit<N>tobigintsigned<N>tobigint
Note that converting between signed and unsigned bit vectors of different width
requires chaining two type casts: bit<32> as signed<32> as signed<64>.
DDlog uses two's complement representation for bit vectors. Casting a signed or unsigned bit vector to a narrower bit vector type truncates higher-order bits. Casting to a wider type sign extends (for signed bit vectors) or zero-extends (for unsigned bit vectors) the value to the new width. Finally, converting between signed and unsigned bitvectors does not change the underlying representation.
All datatypes support comparison for equality (== and difference
!=) and ordering comparisons (>, >=, <=, <). Only values
with the same type can be compared. Comparison of complex types
(strings, tuples, types with constructors) is done lexicographically.
DDlog functions are written using an expression-oriented language, where all statements are expressions. Evaluation order can be controlled using several constructs:
-
Semicolon is used to separate expressions that are evaluated in sequence, from left to right.
-
if (cond) e1 [else e2]evaluates one of two subexpressions depending on the value ofcond. If theelseis missing, the value()(emtpy tuple) is used fore2 -
Matching is a generalization of C/Java
switchstatements. -
A new block (scope) can be created with curly braces
{ } -
A for loop
-
continueterminates current loop iteration -
breakterminates a loop -
return [expr]returns from a function
The following example illustrates the first four of these constructs. Loops are explained below.
function addr_port(ip: ip_addr_t, proto: string, preferred_port: bit<16>): string =
{
var port: bit<16> = match (proto) { // match protocol string
"FTP" -> 20, // default FTP port
"HTTPS" -> 443, // default HTTP port
_ -> { // other protocol
if (preferred_port != 0)
preferred_port // return preferred_port if specified
else
return "${ip}:80" // assume HTTP otherwise
}
}; // semicolon required for sequencing
// Return the address:port string
"${ip}:${port}"
}
The result computed by a function is the value of the last expression evaluated
or the value produced by the first return statement encountered when
evaluating the function. If the else is missing the value () (empty tuple)
is used for the else branch. In match expressions the patterns must cover
all possible cases (for instance, the match expression above would not be
correct without the last "catch-all" (_) case).
DDlog variables must always be initialized when declared. In this
example, the port variable is assigned the result of the match
expression. A variable can be assigned multiple times, overwriting
previous values.
DDlog assignments cannot be chained like in C; although an assignment
is an expression, it produces the value () (an empty tuple,
described below).
match matches the value of its argument against a series of
patterns. The simplest pattern is a constant value, e.g., "FTP".
The don't care pattern (_) matches any value. More
complex patterns are shown below.
In the following example the input relation stores a set of network endpoints
represented by their IP addresses, protocols, and preferred port
numbers. The EndpointString relation contains a string
representation of the endpoints, computed using the function
addr_port(), which is invoked in the head of the rule:
input relation Endpoint(ip: ip_addr_t,
proto: string,
preferred_port: bit<16>)
output relation EndpointString(s: string)
EndpointString(addr_port(ip, proto, preferred_port)) :-
Endpoint(ip, proto, preferred_port).
DDlog functions are pure (side-effect-free) computations. A function may not modify its arguments. The body of a function is an expression whose type must match the function's return type. A function call can be inserted anywhere an expression of the function's return type can be used. DDlog currently does not allow recursive functions.
Functions that cannot be easily expressed in DDlog can be implemented as extern functions. Currently these must be written in Rust; the Rust implementation may in turn invoke implementations in C or any other language.
For instance, DDlog does not provide a substring function. We can
declare such a function as extern:
extern function string_slice(x: string, from: bit<64>, to: bit<64>): string
We can invoke string_slice() just like a normal DDlog function, e.g.,
output relation First5(str: string)
First5(string_slice(p, 0, 5)) :- Phrases(p).
To build this program we must provide a Rust implementation of
string_slice(). DDlog generates a commented out function prototype
in playpen_ddlog/types/lib.rs. Do not modify this file; instead, create a new
file called playpen.rs in the same directory:
pub fn string_slice(x: &String, from: &u64, to: &u64) -> String {
x.as_str()[(*from as usize)..(*to as usize)].to_string()
}
DDlog will automatically pickup this file and inline its contents in the
generated lib.rs.
Let's assume we want to compute a relation SanitizedEndpoint
containing the endpoints that do not appear in a Blacklisted
relation:
input relation Blacklisted(ep: string)
output relation SanitizedEndpoint(ep: string)
SanitizedEndpoint(endpoint) :-
EndpointString(endpoint),
not Blacklisted(endpoint).
The not operator in the last line eliminates all endpoint values
that appear in Blacklisted. In database terminology this is known
as antijoin.
We can directly use assignments in rules:
SanitizedEndpoint(endpoint) :-
Endpoint(ip, proto, preferred_port),
var endpoint = addr_port(ip, proto, preferred_port),
not Blacklisted(endpoint).
It is sometimes convenient to refer to a table row as a whole.
The following binds the ep variable to a row of the Endpoint relation.
SanitizedEndpoint(endpoint) :-
ep in Endpoint,
var endpoint = addr_port(ep.ip, ep.proto, ep.preferred_port),
not Blacklisted(endpoint).
Here we filter the Endpoint relation to select HTTP endpoints only
and bind resulting rows to ep:
SanitizedHTTPEndpoint(endpoint) :-
ep in Endpoint(.proto = "HTTP"),
var endpoint = addr_port(ep.ip, ep.proto, ep.preferred_port),
not Blacklisted(endpoint).
DDlog supports three built-in container data types: Vec, Set, and Map. These
are generic types that can be parameterized by any other
DDlog types, e.g., Vec<string> is a vector of strings, Map<string,bool> is
a map from strings to Booleans.
Let us assume that we have an extern function that splits a string into a list of substrings according to a separator:
extern function split(s: string, sep: string): Vec<string>
The Rust implementation can be as follows:
pub fn split_ip_list(s: &String, sep: &String) -> Vec<String> {
s.as_str().split(sep).map(|x| x.to_string()).collect()
}
We define a DDlog function which splits IP addresses at spaces:
function split_ip_list(x: string): Vec<string> =
split(x, " ")
Consider an input relation HostAddress associating each host with a
string of all its IP addresses, separated by whitespaces. We want to
compute a HostIP relation that consists of (host, ip) pairs:
input relation HostAddress(host: bit<64>, addrs: string)
output relation HostIP(host: bit<64>, addr: string)
For this purpose we can use FlatMap: this operator applies a
function that produces a set or a vector for each value in a relation, and creates a
result that is the union of all the resulting sets:
HostIP(host, addr) :- HostAddress(host, addrs),
FlatMap(addr=split_ip_list(addrs)).
You can read this rule as follows:
-
For every
(host, addrs)pair in theHostAddressrelation, usesplit_ip_list()to splitaddrsinto individul addresses. -
Bind each address in the resulting set to the
addrvariable, producing a new set of(host, addr)records. -
Store the resulting records in the
HostIPrelation.
For-loops allow manipulating container types in a more procedural fashion, without first flattening them. The following function concatenates a vector of strings, each starting at a new line.
function vsep(strs: Vec<string>): string = {
var res = "";
for (s in strs) {
res = res ++ s ++ "\n"
};
res
}
Loops can only iterate over container types: sets, maps, and vectors. When iterating over sets
and vectors, the loop variable (s in the above example) has the same type as elements of the container
(e.g., string). When iterating over maps, the loop variable is a 2-tuple (key,value).
A continue statement used anywhere inside the body of a loop terminates the
current loop iteration:
// Returns only even elements of the vector.
function evens(vec: Vec<bigint>): Vec<bigint> = {
var res: Vec<bigint> = vec_empty();
for (x in vec) {
if (x % 2 != 0) { continue };
vec_push(res, x)
};
res
}
A break statement used anywhere inside the body of a loop terminates the loop:
// Returns prefix of `vec` before the first occurrence of value `v`.
function prefixBefore(vec: Vec<'A>, v: 'A): Vec<'A> = {
var res: Vec<'A> = vec_empty();
for (x in vec) {
if (x == v) { break };
vec_push(res, x)
};
res
}
The following program computes the sums and products of pairs of values from X:
input relation X(x: bit<16>)
output relation Sum(x: bit<16>, y: bit<16>, sum: bit<16>)
output relation Product(x: bit<16>, y: bit<16>, prod: bit<16>)
Sum(x,y,x+y) :- X(x), X(y).
Product(x,y,x*y) :- X(x), X(y).
The last two rules can be written more compactly as a rule with multiple heads:
Sum(x,y,x+y), Product(x,y,x*y) :- X(x), X(y).
The same relation can appear both in the head and the body of a rule. The following program computes pairs of connected nodes in a graph given the set of edges:
input relation Edge(s: node, t: node)
output relation Path(s1: node, s2: node)
Path(x, y) :- Edge(x,y).
Path(x, z) :- Path(x, w), Edge(w, z).
This is an example of a recursive program, where a relation depends on itself, either directly or via a chain of rules. The language reference describes constraints on the recursive programs accepted by DDlog.
The Aggregate operator groups records that have the same values of a subset of
variables (group-by variables) and applies an aggregation function to each group.
The following program groups the Price relation by item and selects the minimal
price for each item:
input relation Price(item: string, vendor: string, price: bit<64>)
output relation BestPrice(item: string, price: bit<64>)
BestPrice(item, best_price) :-
Price(.item = item, .price = price),
var best_price = Aggregate((item), group_min(price)).
Here group_min() is one of several aggregation functions defined in
lib/std.rs.
It is possible to group a relation by multiple fields. For example, the following rule computes the lowest price that each vendor charged for each item:
BestPricePerVendor(item, vendor, best_price) :-
Price(.item = item, .vendor = vendor, .price = price),
var best_price = Aggregate((item, vendor), group_min(price)).
What if we wanted to return the name of the vendor along with the lowest price for each item? The following naive attempt will not work:
output relation BestVendor(item: string, vendor: string, price: bit<64>)
BestVendor(item, vendor, best_price) :-
Price(.item = item, .vendor = vendor, .price = price),
var best_price = Aggregate((item), group_min(price)).
DDlog will complain that vendor is unknown in the head of the rule. What
happens here is that we first group records with the same item value
and then reduce each group to a single value, best_price. All other
variables (except item and best_price) disappear from the scope and cannot
be used in the body of the rule following the Aggregate operator or in
its head.
A correct solution requires a different aggregation function that takes a
group of (vendor, price) tuples and picks one with the smallest price. To
enable such use cases, DDlog allows users to implement their own custom
aggregation functions:
/* User-defined aggregate that picks a tuple with the smallest price */
function best_vendor(g: Group<'K, (string, bit<64>)>): (string, bit<64>) =
{
var min_vendor = "";
var min_price: bit<64> = 'hffffffffffffffff;
for (vendor_price in g) {
if (vendor_price.1 < min_price) {
min_vendor = vendor_price.0;
min_price = vendor_price.1
}
};
(min_vendor, min_price)
}
BestVendor(item, best_vendor, best_price) :-
Price(item, vendor, price),
var best_vendor_price = Aggregate((item), best_vendor((vendor, price))),
(var best_vendor, var best_price) = best_vendor_price.
The aggregation function takes an argument of type Group, parameterized by
group key and group value types. The group key is a tuple consisting of all
group-by variables. The value type is the type of records in the group and.
The aggregation function can access the group key using the group_key()
function. The following custom aggregation function computes the cheapest
vendor for each item and returns a string containing item name, vendor,
and price:
function best_vendor_string(g: Group<string, (string, bit<64>)>): string =
{
var min_vendor = "";
var min_price: bit<64> = 'hffffffffffffffff;
for (vendor_price in g) {
if (vendor_price.1 < min_price) {
min_vendor = vendor_price.0;
min_price = vendor_price.1
}
};
"Best deal for ${group_key(g)}: ${min_vendor}, $${min_price}"
}
output relation BestDeal(best: string)
BestDeal(best) :-
Price(item, vendor, price),
var best = Aggregate((item), best_vendor_string((vendor, price))).
EthPacket below is a data type modeling the essential contents of
Ethernet packets, including source and destination addresses, and the
payload:
typedef eth_pkt_t = EthPacket {
src : bit<48>,
dst : bit<48>,
payload : eth_payload_t
}
A payload can be one of several things: an IPv4 packet, an IPv6 packet, or some other level-3 protocol:
typedef eth_payload_t = EthIP4 {ip4 : ip4_pkt_t}
| EthIP6 {ip6 : ip6_pkt_t}
| EthOther
This declaration is an example of a variant type. EthIP4,
EthIP6, and EthOther are the three type constructors and
variables in curly braces are arguments of type constructor.
Category from "Hello, world!" was also a variant
type, without arguments. ip_addr_t is another variant type, but with a
single constructor.
We continue by defining types holding important fields of IP4, IP6, and transport protocol packets:
typedef ip4_pkt_t = IP4Pkt { ttl : bit<8>
, src : ip4_addr_t
, dst : ip4_addr_t
, payload : ip_payload_t}
typedef ip6_pkt_t = IP6Pkt { ttl : bit<8>
, src : ip6_addr_t
, dst : ip6_addr_t
, payload : ip_payload_t}
typedef ip_payload_t = IPTCP { tcp : tcp_pkt_t}
| IPUDP { udp : udp_pkt_t}
| IPOther
typedef tcp_pkt_t = TCPPkt { srcPort : bit<16>
, dst{prt : bit<16>
, flags : bit<9> }
typedef udp_pkt_t = UDPPkt { srcPort : bit<16>
, dstPort : bit<16>
, len : bit<16>}
The following function creates a packet with Ethernet, IPv6 and TCP headers:
function tcp6_packet(ethsrc: bit<48>, ethdst: bit<48>,
ipsrc: ip6_addr_t, ipdst: ip6_addr_t,
srcport: bit<16>, dstport: bit<16>): eth_pkt_t =
{
EthPacket {
// Explicitly name constructor arguments for clarity
.src = ethsrc,
.dst = ethdst,
.payload = EthIP6 {
// Omit argument name here
IP6Pkt {
.ttl = 10,
.src = ipsrc,
.dst = ipdst,
.payload = IPTCP {
TCPPkt {
.src = srcport,
.dst = dstport,
.flags = 0
}
}
}
}
}
}
As with relations, type constructor arguments can be identified by name or by position in the argument list. The above function uses both notations.
How does one access the content of a variant type? Let's say we have a variable pkt of type
eth_pkt_t. We can access pkt fields just like a C struct, e.g., pkt.ttl, pkt.src. This is
safe, as eth_pkt_t has a single type constructor and hence every instance of this type has this
field. In contrast, pkt.payload.ip4 is unsafe, since eth_payload_t has multiple constructors,
and the packet may or may not have the ip4 field depending on its specific constructor. DDlog
will therefore reject this expression. Instead, a safe way to access variant types is using match
expressions. The following function extracts IPv4 header from a packet or returns a default value
if the packet is not of type IPv4 (we will see a nicer way to deal with non-existent values
below):
function pkt_ip4(pkt: eth_pkt_t): ip4_pkt_t = {
match (pkt) {
EthPacket{.payload = EthIP4{ip4}} -> ip4,
_ -> IP4Pkt{0,0,0,IPOther}
}
}
Note how the match pattern simultaneously constrains the shape of the packet and binds its relevant
fields to fresh variable names (ip4 in this case).
Structural matching can be performed up to arbitrary depth. For example, the following function matches both level-3 and level-4 protocol headers to extract destination UDP port number from a packet.
function pkt_udp_port(pkt: eth_pkt_t): bit<16> = {
match (pkt) {
EthPacket{.payload = EthIP4{IP4Pkt{.payload = IPUDP{UDPPkt{.dst = port}}}}} -> port,
EthPacket{.payload = EthIP6{IP6Pkt{.payload = IPUDP{UDPPkt{.dst = port}}}}} -> port,
_ -> 0
}
}
Structural matching can also be performed directly in the body of a rule. The following program extracts all TCP destination port numbers from an input relation that stores a set of packets.
input relation Packet(pkt: eth_pkt_t)
output relation TCPDstPort(port: bit<16>)
TCPDstPort(port) :- Packet(EthPacket{.payload = EthIP4{IP4Pkt{.payload = IPTCP{TCPPkt{.dst = port}}}}}).
TCPDstPort(port) :- Packet(EthPacket{.payload = EthIP6{IP6Pkt{.payload = IPTCP{TCPPkt{.dst = port}}}}}).
Consider, the first of these rules. DDlog interprets it as follows: "for every packet in the
Packet relation that matches the specified pattern, bind a fresh variable port to the value of
the TCP destination port of the packet and add a record with this value to the TCPDstPort
relation". Note that this example requires two rules to capture two separate patterns (for IPv4 and
IPv6 packets).
It is possible to pattern-match the structure and content of a value at the same time. For instance, we can modify the last rule above to match only TCP packets whose source port number is 100:
TCPDstPort(port) :- Packet(EthPacket{.payload = EthIP6{IP6Pkt{.payload = IPTCP{TCPPkt{.src=100, .dst=port}}}}}).
Imagine that we wanted to write a function that takes a 32-bit IP address and splits it into
individual bytes. How do we define a function that returns multiple values? In languages like C or
Java this can be achieved by modifying function arguments passed by reference. Another option is to
define a new type with four fields, e.g., typedef four_bytes_t = FourBytes{b3: bit<8>, b2: bit<8>, b1: bit<8>, b0: bit<8>}. DDlog offers a nicer solution based on tuple types. A tuple is just a
group of related values of possibly different types. A tuple type lists the types of its values.
For example, our IP address splitting function could return a tuple with four 8-bit fields:
function addr_to_tuple(addr: bit<32>): (bit<8>, bit<8>, bit<8>, bit<8>) =
{
// construct an instance of a tuple
(addr[31:24], addr[23:16], addr[15:8], addr[7:0])
}
We use this function to compute a subset of IP addresses on the 192.168.*.* subnet:
input relation KnownHost(addr: ip4_addr_t)
output relation IntranetHost(addr: ip4_addr_t)
IntranetHost(addr) :- KnownHost(addr),
(var b3, var b2, _, _) = addr_to_tuple(addr),
b3 == 192,
b2 == 168.
Note the assignment that binds variables b2 and b3 to individual fields of the tuple returned by
addr_to_tuple(). DDlog allows an even more compact and intuitive way to express this rule by
combining assignment and pattern matching:
IntranetHost(addr) :- KnownHost(addr),
(192, 168, _, _) = addr_to_tuple(addr).
The fields of a tuple can be accessed using the record field access
syntax but using numbers instead of field names: .0 is the 0-th
field of the tuple (reading from the left). The previous rule can
be also written as:
IntranetHost(addr) :- KnownHost(addr),
var t = addr_to_tuple(addr),
t.0 == 192, t.1 == 168.
Let's revise the pkt_ip4() function defined above. The function extracts IPv4 header from a
packet. If the packet does not have an IPv4 header, it returns a default value with all fields set
to 0:
function pkt_ip4(pkt: eth_pkt_t): ip4_pkt_t = {
match (pkt) {
EthPacket{.payload = EthIP4{ip4}} -> ip4,
_ -> IP4Pkt{0,0,0,IPOther}
}
}
This does not feel satisfactory. A well-designed interface should give an explicit indication that the requested header is missing. One solution would be to return a variant type with two constructors:
// A type that represents an IP4 header or its absence.
typedef option_ip4_pkt_t = IP4Some{ p: ip4_pkt_t }
| IP4None
However, defining an extra variant type for each type in the program quickly becomes a burden on the programmer. Fortunately, DDlog allows us to define a generic option type that can be instantiated for any concrete type. This type is defined in the DDlog standard library as follows:
typedef Option<'A> = None
| Some {value : 'A}
Here 'A is a type argument that must be replaced with a concrete type to create a concrete
instantiation of Option. We can now rewrite the pkt_ip4() function using Option:
function pkt_ip4(pkt: eth_pkt_t): Option<ip4_pkt_t> = {
match (pkt) {
EthPacket{.payload = EthIP4{ip4}} -> Some{ip4},
_ -> None
}
}
TODO
Consider the following relation:
input relation Person (name: string, nationality: string, occupation: string)
We would like to define a function that filters this relation based on some criteria, e.g., selects
all US nationals who are students. Such a function could take all fields of the relation as
arguments; however a better software engineering practice is to pass the entire Person record to
the function:
function is_target_audience(person: Person): bool = {
(person.nationality == "USA") and
(person.occupation == "student")
}
Note how we use relation name as a type name here. In fact, the above relation declaration is merely syntactic sugar that expands into the following:
typedef Person = Person{name: string, nationality: string, occupation: string}
input relation Person [Person]
The square bracket syntax declares a relation by specifying its record type rather that a list of
fields. Using this syntax, we declare the TargetAudience relation that has the same record type
as Person and apply the is_target_audience() function to filter the Person relation.
output relation TargetAudience[Person]
TargetAudience[person] :- Person[person], is_target_audience(person).
In the context of a rule, square brackets are used to select or assign the entire record rather than its individual fields.
DDlog's rich type system makes it easy to store complex data structures in relations. As these data structures are copied to other relations, their contents is duplicated, potentially wasting memory. An alternative is to copy such values by reference instead.
The reference type is declared in the standard library along with two functions that manipulate references.
extern type Ref<'A>
extern function ref_new(x: 'A): Ref<'A>
extern function deref(x: Ref<'A>): 'A
Objects wrapped in references are automatically reference counted and
deallocated when the last reference is dropped. The programmer must however
avoid creating cyclic references, as those will form garbage that will stay
in memory forever. References can be compared using all comparison operators
(==, !=, <, <=, >, >=). The operators compare the objects wrapped in
references rather than their addresses. Two references are equal if and only
if the objects they point to are equal, whether they are the same object or not.
The ref_new() and deref() functions are in principle sufficient
to work with references, but they would be cumbersome to use without
additional syntactic support, described below.
Consider two relations that store information about schools and students:
typedef student_id = bit<64>
input relation &School(name: string, address: string)
input relation &Student(id: student_id, name: string, school: string, sat_score: bit<16>)
We are planning to copy records from these relations around and therefore wrap them
in Ref<>'s. The ampersant (&) in these declaration is a shortcut notation for:
typedef School{name: string, address: string}
input relation School[Ref<School>]
We now compute a new relation that stores the details of a student and their school in each record:
relation StudentInfo(student: Ref<Student>, school: Ref<School>)
StudentInfo(student, school) :-
student in &Student(.school = school_name),
school in &School(.name = school_name).
Note the use of the Ref type in relation declaration. Note also how & is
used to "open up" the relation and pattern match its fields without
calling deref on it. Here is an equivalent rule written without &:
StudentInfo(student, school) :-
Student[student],
Student{.school = var school_name} = deref(student),
School[school],
deref(school).name == school_name.
Next we use the StudentInfo relation to select the top SAT score for each school:
output relation TopScore(school: string, top_score: bit<16>)
TopScore(school, top_score) :-
StudentInfo(&Student{.sat_score = sat}, &School{.name = school}),
var top_score = Aggregate((school), group_max(sat)).
We again use & to pattern match values stored by reference.
As another syntactic convenience, DDlog allows accessing fields of structs and
tuples wrapped in references directly, without dereferencing them first.
The following rule is equivalent to the one above, but instead of opening up the
student record, we use .-notation to access the sat_score field:
TopScore(school, top_score) :-
StudentInfo(student, &School{.name = school}),
var top_score = Aggregate((school), group_max(student.sat_score)).
Some people dislike Datalog because the evaluation order of rules is not always obvious. DDlog provides an alternative syntax for writing rules which resembles more traditional imperative languages. For example, the previous program can be written as follows:
for (person in Person if is_target_audience(person))
TargetAudience(person)
Yet another way to write this program is:
for (person in Person)
if (is_target_audience(person))
TargetAudience(person)
Using this syntax, joins are expressed with nested for-loops; for example the following computes
an inner join of Region and Person relations over the region.id field:
for (region in Region) {
for (person in Person(.region = region.id))
if (is_target_audience(person))
TargetAudience(person, region)
}
One can also define variables:
for (person in Person)
var is_audience: bool = is_target_audience(person) in
if (is_audience)
TargetAudience(person)
Here is a more complex example that simultaneously filters out records that do not contain
a "router-port" option and assign the value of this option to the lrport_name variable.
for (lsp in Logical_Switch_Port) { Some{var lrport_name} = map_get(lsp.options, "router-port") in ... }
One can also use match statements:
for (person in Person)
var is_audience = is_target_audience(person) in
match (is_audience) {
true -> TargetAudience(person)
_ -> skip
}
These 5 kinds of statements (for, if, match, var, blocks enclosed in braces) can be nested
in arbitrary ways. When using this syntax semicolons must be used as separators between statements.
The DDlog compiler translates such programs into regular DDlog programs.
Modules make it easy to write DDlog programs that span multiple files. DDlog supports a simple module system, where each file is a separate module, and the module hierachy matches the directory structure.
Let's create a file called mod1.dl that declares a single relation R1:
input relation R1(f1: bool)
Next, create a directory called lib with a single file lib/mod2.dl:
input relation R2(f2: string)
We have just built a simple module hierarchy:
|-mod1
|-lib
|-mod2
Finally, create the main test file test.dl in the same directory as mod1.dl:
// declarations from mod1 are added to the local name space
import mod1
// declarations from mod2 are available via the "m2." prefix
import lib.mod2 as m2
output relation Main(f1: bool, f2: string)
Main(f1, f2) :- R1(f1), m2.R2(f2).
Note the two forms of the import directive. The first form (import <module_path>) makes all
type, function, relation, and constructor declarations from the module available in the local
namespace of the module importing it. The second form (import <module_path> as <alias>) makes all
declarations available to the importing module using the <alias>.<name> notation.
We create some data to feed to the test program in a test.dat file.
start;
insert mod1.R1(true),
insert lib.mod2.R2("hello");
commit;
dump;
Note that we refer to relations and constructors via their fully qualified names. Compile and run the test:
ddlog -i test.dl -L../../lib
cd test
cargo build --release
target/release/test_cli < ../test.dat
Modules do not have to be in the same directory with the main program. Assuming that our module
hierarchy is located, e.g., in ../modules, the first command above must be changed as follows:
ddlog -i test.dl -L../modules -L../../lib
Multiple -L options are allowed to access modules scattered across multiple directories.
The module system enables the creation of reusable DDlog libraries. Some of these libraries
are distributed with DDlog in the lib directory. A particularly important one is the standard
library std.dl, which defines types like Vec, Set, Map, Option, Ref, and others.
This library is imported automatically into every DDlog program; therefore the path to the
lib directory must always be specified using the -L switch.
DDlog normally operates on changes to relations: it accepts changes to
input relations as inputs and produces changes to output relations as
outputs. In some applications, however, it is necessary to know the current
state of a relation, i.e., the set of all records in the relation, or the set of
records with specific values in a subset of columns. For example, given a
relation that stores edges of a digraph
relation Edge(from: node_t, to: node_t)
one may need to query the set of edges with a given from or to node. To do
so, the developer must create an index on the Edge relation in the DDlog
program for each subset of columns used as a key in a query:
// Index the `Edge` relation by `from` node.
index Edge_by_from(from: node_t) on Edge(from, _)
This declaration creates an index that can be used to lookup all records in the
Edge relation with the given from node. It consists of a unique name
(Edge_by_from), a set of typed arguments that form the key (from: node_t), and a pattern, which defines the set of values associated with each
key (Edge(from, _)).
Once an index has been created, it can be queried at runtime, e.g., using the
query_index CLI command or an
equivalent method in your favorite language API:
# Output all edges where `from=100`.
query_index Edge_by_from(100);
It is also possible to dump the entire content of an index:
# Output all records in the `Edge` relation.
dump_index Edge_by_from;
One can create multiple indexes on the same relation:
// Index the `Edge` relation by `to` node.
index Edge_by_to(to: node_t) on Edge(_, to)
One can also define an index over multiple fields of the relation,
for example, we can index Edge by both from and to nodes:
// Index `Edge` by both source and destination nodes.
index Edge_by_to(from: node_t, to: node_t) on Edge(from, to)
The pattern expression simultaneously constraints the set of values and projects it on the subset of columns that form the key.
// Type with multiple constructors.
typedef Many = A{x: string}
| B{b: bool}
| D{t: tuple}
relation VI(a: bool, b: Many)
// Index over a subset of records with `a` field equal to `true`,
// `b` field having type constructor `D`, and using the second
// element of the `b.t` tuple as key.
index VI_by_t1(x: bit<8>) on VI(.a=true,.b=D{(_, x, _)})
DDlog indexes can be associated with any input, output, or internal relation.
Although similar in spirit to indexes in SQL, DDlog indexes serve a different purpose. They do not affect the performance of the DDlog program and cannot be used as an optimization technique (DDlog does use indexes for performance, but these indexes are constructed automatically and are not visible to the user).
DDlog offers several ways to feed data to a program:
-
Statically, by listing ground facts as part of the program.
-
Via a text-based command-line interface.
-
From a Rust program.
-
From a C or C++ program.
-
From a Java program.
In the following sections, we expand on each method.
This method is useful for specifying ground facts that are guaranteed to hold in every instantiation
of the program. Such facts can be specified statically to save the hassle of adding them manually
on every program instantiation. A ground fact is just a rule without a body. Add the following
declarations to the program to pre-populate Word1 and Word2 relations:
Word1("Hello,", CategoryOther).
Word2("world!", CategoryOther).
The text-based interface to DDlog, described so far, is primarily intended for testing and debugging purposes. In production use, a DDlog program is typically invoked as a library from a program written in a different language (e.g., C or Java).
Compile your program as explained above.
Let's assume your program is playpen.dl.
When compilation completes, the playpen_ddlog directory will be created,
containing generated Rust crate for the DDlog program. You can invoke this
crate from programs written in Rust, C/C++, Java, or any other language that
is able to invoke DDlog's C API through a foreign function interface.
- Rust
In order to use the generated crate directly from another Rust program,
add it as a dependency to your
Cargo.toml, e.g.,
[dependencies]
playpen = {path = "../playpen_ddlog"}
And invoke it through the API defined in the ./playpen_ddlog/api.rs file.
TODO: link to a separate API document
-
C/C++ The compiled program will contain a static library
playpen_ddlog/target/release/libplaypen_ddlog.athat can be linked against your C or C++ application. The C API to DDlog is defined inplaypen_ddlog/ddlog.h. To generare a dynamic library instead, pass the--dynlibflag toddlogto generate a.so(or.dylibon a Mac) file, along with--no-staticlibto disable generation of the static library. -
Java If you plan to use the library from a Java program, make sure to use
--dynliband--no-staticlibflags to generate a dynamically linked library. See Java API documentation for a detailed description of the Java API. -
The text-based interface is implemented by an auto-generated executable
./playpen_ddlog/target/release/playpen_cli. This interface is primarily meant for testing and debugging purposes, as it does not offer the same performance and flexibility as the API-based interfaces. See CLI documentation for a complete list of commands supported by the CLI tool.
DDlog's profiling features are designed to help the programmer to understand what parts of the DDlog program use the most CPU and memory. DDlog supports two commands related to profiling (also available through Rust, C, and Java APIs):
-
profile cpu on/off;- enables/disables recording of CPU usage info in additition to memory profiling. CPU profiling is not enabled by default, as it can slow down the program somewhat, especially for large programs that handle many small updates. -
profile;- returns information about program's CPU and memory usage. CPU usage is expessed as the total amount of time DDlog spent evaluating each operator, assuming CPU profiling was enabled. For example the following CPU profile record:
CPU profile
...
0s005281us ( 112calls) Join: DdlogDependency(.parent=parent, .child=child), LabeledNode(.node=parent, .scc=parentscc), LabeledNode(.node=child, .scc=childscc) 165
...
indicates that the program spent 5,281 microseconds in 112 activations of the
join operator that joins the prefix of the rule (DdlogDependency(.parent=parent, .child=child), LabeledNode(.node=parent, .scc=parentscc))
with the LabeledNode(.node=child, .scc=childscc) literal.
Memory profile reports current (at the time when the profile is being generated) and peak (since the start of the program) number of records in each DDlog arrangement. An arrangement is similar to an indexed representation of a relation in databases. Arrangements are responsible for the majority of memory consumption of a DDlog program. For example, the following memory profile fragment:
Arrangement peak sizes
...
451529 Arrange: LabeledNode{.node=_, .scc=_0} 136
372446 Arrange: LabeledNode{.node=_0, .scc=_} 132
indicates that the program contains two different arrangements of the LabeledNode
relation, indexed by the second and first fields, whose peak size is
451,529 and 372,446 records respectively (the numbered variables, e.g., _0)
indicate one or more fields used to index the relation by.
When using DDlog as a library, it may be difficult to isolate bugs in the DDlog
program from those in the application using DDlog. Replay debugging is a DDlog
feature that intercepts all DDlog invocations made by a program and dumps them
in a DDlog command file that can later be replayed against the DDlog program
running standalone, via the CLI interface. This has several advantages. First,
it allows the user to inspect DDlog inputs in human-readable form. Second, one
can easily modify the recorded command file, e.g., to dump relations of interest
or to print profiling information in various points throughout the execution, or
to simplify inputs in order to narrow down search for an error. Third, one can
replay the log against a modified DDlog program, e.g., to check that the program
behaves correctly after a bug fix. As another example, it may be useful to dump
the contents of intermediate relations by labeling them as output relation and
adding a dump <relation_name>; command to the command file.
Finally, by replaying recorded commands, one can obtain a relatively accurate
estimate of the amount of CPU time and memory spent in the DDlog computation
in isolation from the host program. For example, here we use the UNIX time
program (note: this is not the same as the time command in bash)
to measure time and memory footpint of a DDlog computation:
/usr/bin/time playpen_ddlog/target/release/playpen_cli -w 2 --no-print --no-store < replay.dat
where -w 2 runs DDlog with two worker threads,
--no-print stops DDlog from printing every update to output tables on stdout,
--no-store tells DDlog not to cache the content of output relations (which takes
time and memory), and
replay.dat is the name of the file that contains recorded DDlog commands.
To enable replay debugging, call the ddlog_record_commands() function in C (see
ddlog.h), DDlogAPI.record_commands() method in Java (DDlogAPI.java) or
the HDDlog.record_commands() method in Rust right after starting the
DDlog program, and before pushing any data to it.
TODO: checkpointing feature
** TODO **