
Commit 4bfa462

update: egress scan and URL to WASM version
1 parent 263f7de commit 4bfa462

3 files changed

Lines changed: 153 additions & 0 deletions


module5/egressdetection.md

Lines changed: 132 additions & 0 deletions
@@ -0,0 +1,132 @@
---
title: Data Exfiltration Detection
short_title: Data Egress scan
---

Detecting potential data exfiltration by Python programs is an important part of any security review.

:::{important}
Detecting data exfiltration in Python code that uses telemetry, remote analytics, or SaaS integrations is an essential step in **mitigating security risks**.
:::

## Why Python Data Exfiltration Detection Matters

In Static Application Security Testing (SAST), identifying interactions with remote services is a fundamental requirement. A robust security audit must treat data exfiltration (the unauthorized or undocumented transfer of information) as a primary risk factor.

## Understanding Data Egress

Data egress occurs when information travels from your secure internal perimeter to an external destination. For a Python program, that destination can be the public internet, a third-party cloud environment, a partner network, or a SaaS integration.

## Legitimate vs. Malicious Intent

In Python development, outbound data flow is often a core functional requirement. Modern applications rely on authorized egress paths for:

- **Communication:** sending automated emails or notifications.
- **Integration:** delivering API responses to external consumers.
- **Infrastructure:** syncing database backups to remote cloud storage.

However, the same flexibility makes Python a prime candidate for advanced exfiltration techniques. Malicious actors or compromised dependencies can hide unauthorized data transfers inside seemingly benign traffic, which often bypasses standard network-level detection.

**The Fallacy of "Anonymous" Collection**

Many Python telemetry modules claim to collect data anonymously, yet **privacy risks** persist. When the backend systems that receive the data are closed source, the anonymity claim cannot be verified. It rests on **security by obscurity**, which violates the core security principle of open design.

:::{danger}
Telemetry, analytics, and remote-monitoring modules for Python often collect more metadata than they document, sending private data to unknown and potentially vulnerable services.
:::

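To show how an "anonymous" event can still identify a machine, the sketch below builds one using only the standard library. The endpoint URL and the payload field names are hypothetical and chosen for illustration. The sketch prepares the outbound request but never sends it. No username is included, yet the hostname, platform string, and command line can still point to a specific machine or person.

```python
import json
import platform
import sys
import urllib.request

# Hypothetical collector endpoint: this sketch builds a request but never sends it.
TELEMETRY_URL = "https://telemetry.example.com/v1/events"


def build_anonymous_event(event_name: str) -> dict:
    """Build an event the way many 'anonymous' telemetry hooks do.

    No username field is present, yet the values below can still
    identify the machine (and sometimes the person) behind the event.
    """
    return {
        "event": event_name,
        "hostname": platform.node(),       # frequently contains a user or machine name
        "platform": platform.platform(),   # OS name, release and build
        "python": platform.python_version(),
        "argv": sys.argv,                  # command line, may include local file paths
    }


def prepare_request(event: dict) -> urllib.request.Request:
    """Wrap the event in a JSON POST request: this is the egress path."""
    return urllib.request.Request(
        TELEMETRY_URL,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = prepare_request(build_anonymous_event("app_started"))
    # Show what *would* leave the machine instead of calling urlopen().
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

A single `urllib.request.urlopen(request)` call is all it would take to turn this payload into real egress. That is why such calls are worth finding in code before they run.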
## How to Check for Data Exfiltration

**Python Code Audit** includes functionality to detect potential data exfiltration risks. This feature is available through:

- the [command-line interface (CLI)](userguide), and
- the [API](apidocs/modules).

To activate egress detection from the CLI, run the following command:

```bash
codeaudit filescan <pythonfile|package-name|directory> [OUTPUTFILE]
```

**Report Output**

In the generated HTML report, each analysed file is evaluated for potential data exfiltration to external services.

If a potential risk is detected, the report displays:

> *⚠️ External Egress Risk: Detected outbound connection logic or API keys that may facilitate data egress.*

The report also highlights the exact lines of code that triggered the detection.

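The general idea behind code-level egress detection can be sketched with Python's `ast` module. The approach is to parse the source and report the lines that import networking modules or call into them. This is a simplified illustration, not Python Code Audit's implementation. The `EGRESS_MODULES` list is an assumption and is far from complete.

```python
import ast
from typing import Optional

# Illustrative, deliberately incomplete list of modules that can send data
# off the host. A real scanner needs a much larger, curated list.
EGRESS_MODULES = (
    "socket", "http.client", "urllib.request", "ftplib",
    "smtplib", "requests", "httpx", "aiohttp",
)


def is_egress(name: str) -> bool:
    """True if `name` is an egress module or something defined inside one."""
    return any(name == mod or name.startswith(mod + ".") for mod in EGRESS_MODULES)


def dotted_name(node: ast.expr) -> Optional[str]:
    """Turn a call target such as `r.post` into the string 'r.post'."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None  # e.g. calls on subscripts or on other call results


def find_egress_lines(source: str) -> list[tuple[int, str]]:
    """Return sorted (line number, description) pairs for egress imports and calls."""
    tree = ast.parse(source)
    bindings = {}  # local name -> fully qualified name it refers to
    findings = []

    # Pass 1: record every import binding and flag egress imports.
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # `import a.b` binds `a`; `import a.b as c` binds `c` to `a.b`.
                local = alias.asname or alias.name.split(".")[0]
                bindings[local] = alias.name if alias.asname else local
                if is_egress(alias.name):
                    findings.append((node.lineno, f"import {alias.name}"))
        elif isinstance(node, ast.ImportFrom) and node.module:
            for alias in node.names:
                full = f"{node.module}.{alias.name}"
                bindings[alias.asname or alias.name] = full
                if is_egress(full):
                    findings.append((node.lineno, f"from {node.module} import {alias.name}"))

    # Pass 2: flag calls whose resolved target lives in an egress module.
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name is None:
                continue
            root, _, rest = name.partition(".")
            if root not in bindings:
                continue  # origin unknown (local function, builtin, ...)
            resolved = bindings[root] + ("." + rest if rest else "")
            if is_egress(resolved):
                findings.append((node.lineno, f"call {resolved}"))
    return sorted(findings)


SAMPLE = '''\
import json
import requests as r
from urllib.request import urlopen

def report(data):
    r.post("https://example.com/collect", data=json.dumps(data))
    return urlopen("https://example.com/ping")
'''

if __name__ == "__main__":
    for lineno, what in find_egress_lines(SAMPLE):
        print(f"line {lineno}: {what}")
```

For `SAMPLE`, the sketch reports the imports on lines 2 and 3 and the calls on lines 6 and 7. It ignores `json.dumps`. The alias tracking lets `r.post` resolve to `requests.post`. Even so, a static check like this misses dynamic imports such as `importlib.import_module`.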
:::{tip}
**Always review discovered modules carefully.**

In the HTML report, the section

`> View used modules in this file.`

lists all modules detected in each file. Understanding each module is critical: some are strong indicators of possible data exchange with external systems. Review them to assess potential security or privacy risks.
:::

If no external egress risks are identified, the report displays:

> *✅ No logic for connecting to remote services found. Risk of data exfiltration to external systems is low.*

:::{important}
No tool can provide a 100% guarantee. This applies to Python Code Audit as well as to any other security analysis tool.
:::

:::{admonition} Scope and Intent of **Python Code Audit Egress Scanning**
:class: note

It is critical to distinguish between data egress detection and secret scanning. While both are vital components of a secure development lifecycle, they address entirely different threat vectors.
:::

The **Python Code Audit** egress detection functionality is **NOT** designed to identify secrets in your source code.

Understanding the difference:

- **Data egress detection** focuses on the destination and mechanism of data leaving your environment (for example, telemetry hooks, hidden API calls, or SaaS integrations).
- **Secret scanning** focuses on the credentials themselves (for example, hard-coded API keys, passwords, or certificates), whether they are in plaintext or obfuscated.

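A toy comparison makes the distinction concrete. Both checks below are hypothetical and heavily simplified, and neither is how Python Code Audit or any particular secret scanner works. The egress check looks for *where data can go*, here a call into the `requests` module. The secret check looks for *what is exposed*, here a string literal assigned to a credential-like name. The name pattern is an assumption.

```python
import ast
import re

SOURCE = '''\
import requests
API_KEY = "sk-test-1234567890abcdef"
def send(event):
    requests.post("https://collector.example.com", json=event)
'''

# Illustrative pattern for credential-like variable names.
SECRET_NAME = re.compile(r"(api_?key|token|secret|password)", re.IGNORECASE)


def egress_lines(tree: ast.AST) -> list[int]:
    """Lines that call an attribute of the `requests` module (egress mechanism)."""
    return sorted(
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Attribute)
        and isinstance(node.func.value, ast.Name)
        and node.func.value.id == "requests"
    )


def secret_lines(tree: ast.AST) -> list[int]:
    """Lines that assign a string literal to a credential-like name (exposed secret)."""
    return sorted(
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.Assign)
        and isinstance(node.value, ast.Constant)
        and isinstance(node.value.value, str)
        and any(
            isinstance(target, ast.Name) and SECRET_NAME.search(target.id)
            for target in node.targets
        )
    )


if __name__ == "__main__":
    tree = ast.parse(SOURCE)
    print("egress check flags lines:", egress_lines(tree))  # the requests.post call
    print("secret check flags lines:", secret_lines(tree))  # the API_KEY assignment
```

Each check flags a different line of the same snippet: the egress check finds line 4 and the secret check finds line 2. Neither sees what the other finds, which is why the two belong in separate tools.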
## The "Shift-Left" Advantage

Detecting exfiltration at the network level is reactive and expensive, and it often fails when traffic is encrypted or blended with legitimate SaaS calls. Moving detection to the code level ("shift left") is more cost-effective and provides:

1. **Supply chain integrity:** third-party libraries are audited before integration. If a library contains undocumented "phone home" logic, it can be blocked early.
2. **Defense in depth:** perimeter tools (firewalls, DLP, CASBs) are essential but not infallible. Source-code detection adds a vital internal layer of defense.

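Auditing a third-party library before integration can begin with one pass over every `.py` file in its source tree. The sketch below assumes the package has already been downloaded and unpacked to a local directory; the script name and path in the usage line are hypothetical. The package list is an illustrative assumption. Files that cannot be parsed are reported rather than skipped, because obfuscated code is itself a warning sign.

```python
import ast
from pathlib import Path

# Illustrative, incomplete list of top-level packages that can open outbound
# connections. Matching on the top-level name over-approximates on purpose:
# `urllib.parse` is flagged too, so a human makes the final call.
EGRESS_PACKAGES = {"socket", "http", "urllib", "ftplib", "smtplib",
                   "requests", "httpx", "aiohttp"}
UNPARSABLE = "<unparsable: review manually>"


def egress_imports_in_tree(root: Path) -> dict[str, set[str]]:
    """Map each .py file under `root` to the egress-capable packages it imports."""
    report: dict[str, set[str]] = {}
    for path in sorted(root.rglob("*.py")):
        key = path.relative_to(root).as_posix()
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        except (OSError, SyntaxError, ValueError):
            # ValueError also covers UnicodeDecodeError. Code that cannot be
            # parsed may be obfuscated, so it is reported rather than skipped.
            report[key] = {UNPARSABLE}
            continue
        found = set()
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                names = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
                names = [node.module]
            else:
                continue
            found.update(top for top in (n.split(".")[0] for n in names)
                         if top in EGRESS_PACKAGES)
        if found:
            report[key] = found
    return report


if __name__ == "__main__":
    import sys

    if len(sys.argv) == 2:
        # Hypothetical usage: python audit_tree.py ./unpacked/somepackage
        for file, packages in egress_imports_in_tree(Path(sys.argv[1])).items():
            print(f"{file}: {sorted(packages)}")
    else:
        print("usage: python audit_tree.py <unpacked-package-dir>")
```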
**Security mandate:** from a **Zero Trust** standpoint, organisations must verify whether telemetry is present in their Python code and ensure that all associated risks are mitigated through code, systems, and management processes.

## Assessing the Security Risks

Telemetry represents a deliberate hole in your network perimeter. When Python applications implement advanced tracking without granular consent, they turn from a utility into a significant security liability.

**1. Sensitive Data Leakage**

Telemetry often captures more than just "events". Without rigorous sanitization, these streams can include:

- **PII (Personally Identifiable Information):** usernames, IP addresses, and location data.
- **Secrets in logs:** authentication tokens or database connection strings caught in stack traces.
- **Business logic:** proprietary metadata revealing internal infrastructure.

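Sanitizing events before they are sent is the minimum safeguard. The sketch below drops blocked fields and redacts email addresses, IPv4 addresses, and bearer tokens from a telemetry event. The patterns and the `BLOCKED_KEYS` names are illustrative assumptions. Regex-based redaction is easy to bypass and is no substitute for not collecting the data in the first place.

```python
import re
from typing import Any

# Illustrative patterns only: production-grade PII and secret detection needs far more.
REDACTIONS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "<email>"),
    (re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), "<ipv4>"),
    (re.compile(r"(?i)bearer\s+[A-Za-z0-9._~+/=-]+"), "Bearer <token>"),
]

# Hypothetical field names that are dropped entirely, whatever their value.
BLOCKED_KEYS = {"username", "password", "hostname", "db_url"}


def sanitize(value: Any) -> Any:
    """Recursively drop blocked keys and redact sensitive substrings in an event."""
    if isinstance(value, dict):
        return {
            key: sanitize(item)
            for key, item in value.items()
            if str(key).lower() not in BLOCKED_KEYS
        }
    if isinstance(value, (list, tuple)):
        return [sanitize(item) for item in value]
    if isinstance(value, str):
        for pattern, replacement in REDACTIONS:
            value = pattern.sub(replacement, value)
    return value


if __name__ == "__main__":
    event = {
        "event": "crash",
        "username": "alice",
        "trace": "login failed for alice@example.com from 10.0.0.7 "
                 "with header Authorization: Bearer eyJhbGciOi.abc",
    }
    print(sanitize(event))
```

In the example, `username` is removed and the trace becomes `login failed for <email> from <ipv4> with header Authorization: Bearer <token>`.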
**2. Expanded Attack Surface**

Every external API endpoint is a potential point of failure.

- **Unauthenticated endpoints:** many telemetry "sinks" lack robust authentication, making them easy targets for interception.
- **Library vulnerabilities:** the telemetry module itself may contain vulnerabilities (e.g., remote code execution or path traversal) that give attackers a foothold.

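One inexpensive code-level check related to interception risk is to list hard-coded endpoint URLs and flag those that use plain `http://`, since anything sent to them travels unencrypted. The `endpoint_literals` helper below is hypothetical and not part of Python Code Audit. It only inspects string literals, so it misses URLs built at runtime or read from configuration. It also cannot tell whether an endpoint requires authentication.

```python
import ast
from urllib.parse import urlsplit


def endpoint_literals(source: str) -> list[tuple[int, str, bool]]:
    """Return (line, url, uses_tls) for every http(s) URL written as a string literal."""
    results = []
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.Constant) and isinstance(node.value, str)):
            continue
        try:
            parts = urlsplit(node.value.strip())
        except ValueError:  # e.g. malformed IPv6 host such as "http://[::1"
            continue
        if parts.scheme in ("http", "https") and parts.netloc:
            results.append((node.lineno, node.value, parts.scheme == "https"))
    return sorted(results)


SAMPLE = '''\
import requests
TELEMETRY = "http://metrics.example.com/ingest"
BACKUP = "https://storage.example.com/upload"
'''

if __name__ == "__main__":
    for line, url, uses_tls in endpoint_literals(SAMPLE):
        status = "ok (https)" if uses_tls else "UNENCRYPTED, review"
        print(f"line {line}: {url} -> {status}")
```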
**3. The "When, Not If" Data Breach**

Data sent to a third party is only as secure as that party's defenses.

- **Loss of custody:** once data leaves your perimeter, you lose the ability to protect it.
- **Transparency gaps:** you depend on the provider to detect and report breaches, a process that often takes months.

module5/installation.md

Lines changed: 20 additions & 0 deletions
@@ -8,6 +8,26 @@ For this course we use the best FOSS (Free and Open Source) SAST Tool for Python
Python Code Audit is compatible with both Unix-based systems (Linux/macOS) and Windows.

## Use the browser-based version

To access the browser-based version of Python Code Audit, which runs locally in your web browser, follow the link below:

{button}`Launch web-based version <https://nocomplexity.com/codeauditapp/dashboardapp.html>`

The browser-based (WASM) version lets you run **Python Code Audit** directly in your web browser without installing anything. You can quickly validate and inspect packages hosted on PyPI.org in a safe, isolated environment. It is especially useful for learning, quick checks, and reviewing package integrity before downloading or installing a package locally.

:::{note}
Validating local Python files and directories is not possible with the browser-based version.
:::

## Install the package locally

To use all the functionality of **Python Code Audit**, you must install the Python package locally.
For this course, and for regular security validation, we advise using the full, locally installed version.

To install Python Code Audit, run the following command in your terminal or command prompt:

```bash
pip install -U codeaudit

toc.yml

Lines changed: 1 addition & 0 deletions
@@ -34,6 +34,7 @@ project:
- file: module5/complexitycheck.md
- file: module5/sastscan.md
- file: module5/sast_boundaries.md
- file: module5/egressdetection.md
- file: module5/paretoprinciple.md
- file: module6/module6_overview.md #dropdown menu
children:
