Adding Primitive Tainting#211
Conversation
|
So, as an update/status recap from my point of view:
I looked at adding more sources/sinks to the WebIDL generator. My first impression was this would be a fairly involved change that certainly does not work for every operation. I will investigate further to check for cost/benefit tradeoffs.
This works as is; it is just user-unfriendly in its current form. This is fixed once #2 lands. So the major remaining points seem to be:
I think we can get a good grasp by running a) the benchmarks (e.g., Ares 6) that mach supports and perform the same tests we did for FP tracer. There, the primitive tainting was fairly fast (<15% overhead on page load times)
I will look further, but I believe it might be best to decouple this change from this PR. Reworking the flow representation has benefits that go beyond primitive tainting. I certainly recall fighting this for the hand sanitizer study. As this change seems fairly disruptive (i.e., all tooling would require a significant rework), we probably have to get this right. What might be an option would be to store the flow internally as a directed graph and then offer the code to lineralize the flows to their current form. My impression is that the current Foxhound users mainly use Playwright (we can make an example available here) or extensions (same applies). So, to sum up this point: I believe we should decouple the flow representation from this PR and offer to disable flow creation for primitive sources via the configuration, similar to how we allow disabling taint sources. This would allow us to merge this PR earlier and avoid divergence between mainline Foxhound and primitaint Foxhound. |
|
I agree that we should avoid diverging the main and primitaint branches, the idea of this PR is to have something we can merge to main. I would also be in favour of splitting up the work if possible. As you suggest I think a full blown move to graph based taint flows is going to be a big effort and should be handled in a separate PR (if at all...). |
In this branch properties of WebIDL files can be marked as taint sources with an attribute. This is super nice, but if one wants to disable one of them one has to note down their names during a build. This is pretty cumbersome. This commit tries to resolve this issue, as it simply adds every IDL based source to the properties defaults.
… to make Math library taint aware
Before Array.indexOf and Array.includes had subtle differences in behavior when it came to tainted values or keys when using said functions. This adds some special case for tainted numbers and unboxes them, so that they behave the same. This requires to flag tainted numbers as not bitwise comparable, which might have downstream effects (?)
|
Just thinking about ways to format the taint graph output and found this: Might be more intuitive than trying to output in graphviz format. |
|
That looks interesting, because effectively we can still stuff it into MongoDB, and it seems like a lot of stuff could transfer. Just thinking generally, the export format is something where we have a fair share of room to experiment, right? Exporting the (full) graph to JSON graph, GraphViz, or say GML is something we could allow to be configured via a flag, be it at compile time or a property. Like to change the export format, it's just a different |
|
Exactly, the output format is something we can experiment with, it's independent of how we store the graph in memory. The output is also only computed on demand (like the current |
This PR enables tainting for numbers, and is a mergable version of the https://github.com/SAP/project-foxhound/tree/primitaint branch.
This PR adds the following features:
There is still some work to do before the merge:
a = b + c- if b and c are tainted, how does the taint flow look for c?See whether existing sources can be moved to the webidl generator