- `salento-repl.py pop.json` loads a dataset into memory.
-
What are the most anomalous locations? Command `group` gives you a bird's-eye view of your dataset by showing the score per location (a line number in a single source file), grouped by package (a package consists of multiple source files and usually matches a directory).

    > group --pkg :3 --limit 1
    id: 0 pkg: ayttm-0.6.3/libproxy/ssl.c by: /home/tgc/ssl.c:152 anomalous: 57%
    id: 1 pkg: balsa-2.4.12/libbalsa/imap/pop3.c by: /home/tgc/pop3.c:610 anomalous: 80%
    id: 2 pkg: balsa-2.4.12/libbalsa/imap/siobuf.c by: /home/tgc/siobuf.c:339 anomalous: 65%

Notes:

Each entry shows the unique package identifier `id`, the package name `pkg`, the location name `by`, and the score `anomalous`; 100% means maximum anomaly.

Argument `--pkg :3` shows the first 3 packages in the given dataset. The syntax of the argument of `--pkg` follows Python's slicing notation.

Argument `--limit 1` shows the most anomalous location per package.

You can always append `--help` for more information.
-
Which sequences of calls end in a given anomalous location? Command `seq` shows the score of each sequence in our dataset.

    > seq --pkg 1 --sub pop3.c:454 --limit 3
    id: 1119 count: 8 last: pop3.c:454 anomalous: 78%
    id: 1120 count: 9 last: pop3.c:454 anomalous: 73%
    id: 1120 count: 10 last: pop3.c:454 anomalous: 71%

Notes:

Each entry shows the unique sequence identifier `id`, the number of calls in the sequence `count`, the last location `last`, and the anomaly score `anomalous` (where higher is more anomalous).

Argument `--pkg 1` selects the package with identifier 1.

Argument `--sub` selects all sub-sequences that end with the location `pop3.c:454`. Each sequence might have many sub-sequences, with varying anomaly scores.
-
How do we visualize the sequence of calls? We can generate a visualization of every sequence in a query by appending `--save` to the query, which generates Graphviz `*.gv` files:

    > seq --pkg 1 --sub pop3.c:454 --limit 3 --save

We can list the generated files by running the following command in a terminal (in the same working directory):

    $ ls *.gv
    1-1119.gv 1-11202.gv 1-1120.gv

Note: Argument `--save` iterates over each sequence and creates a Graphviz file for it. You can change the filename with `--save-fmt`.
-
Visualize the Graphviz file to identify an anomaly.

    $ xdot 1-1119.gv

Colored edges represent the call sequence being rendered; each is annotated with the similarity between the given call and the most probable call.

Black edges represent the most probable calls; each is annotated with the probability of that call.

In this example, the tool is warning us about two anomalies:

  - Missed checking the return value of an `sprintf` call.
  - After checking the return value of function `popWriteLine`, the function should end.
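
A side note on the `--pkg` argument used in the queries above: it is described as following Python's slicing notation. The sketch below only illustrates what that notation means for selectors such as `:3` and `1`; it is not the tool's actual implementation. The helpers `parse_selector` and `select` are hypothetical names, and the fourth package in the list is made up so that `:3` has something to leave out.

```python
def parse_selector(text):
    """Turn a selector such as ':3', '1:' or '1' into a slice or an int index.

    Hypothetical helper, written only to illustrate Python slicing notation.
    """
    if ":" not in text:
        return int(text)  # a bare number selects a single element
    # Empty fields mean "no bound", exactly as in items[:3] or items[1:]
    parts = [int(p) if p.strip() else None for p in text.split(":")]
    return slice(*parts)


def select(items, text):
    """Return the elements of `items` picked by the selector, always as a list."""
    selector = parse_selector(text)
    picked = items[selector]
    return picked if isinstance(selector, slice) else [picked]


# The first three names come from the `group` output; the last one is invented.
packages = [
    "ayttm-0.6.3/libproxy/ssl.c",
    "balsa-2.4.12/libbalsa/imap/pop3.c",
    "balsa-2.4.12/libbalsa/imap/siobuf.c",
    "hypothetical-1.0/extra.c",
]

for name in select(packages, ":3"):  # same elements as packages[:3]
    print(name)
```

Under these assumptions, `select(packages, "1")` returns a one-element list holding the package with identifier 1, which mirrors how `--pkg 1` is used with `seq`.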
