This repository provides a full, production-ready pipeline for detecting and analyzing suspicious web traffic collected from AWS CloudWatch. It bundles data, notebooks, trained models, deployment tooling (FastAPI, Streamlit), CI, Docker support, and documentation — everything you need to reproduce and deploy the project.
Dataset (local path): /mnt/data/CloudWatch_Traffic_Web_Attack.csv
Note: The dataset file is included in data/CloudWatch_Traffic_Web_Attack.csv. If you plan to publish this repository publicly and the file is large or sensitive, consider using Git LFS or uploading it to GitHub Releases and adding a download script.
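If the dataset is moved to GitHub Releases, a download script could look roughly like the sketch below. This is only an illustration: the release tag (v1.0.0), the helper names, and the optional checksum are assumptions, and the URL reuses the placeholder owner and repository from the clone command rather than a real location.

```python
import hashlib
import urllib.request
from pathlib import Path

# Hypothetical release asset URL: owner, repo, and tag are placeholders.
DATASET_URL = (
    "https://github.com/yourusername/yourrepo/releases/download/"
    "v1.0.0/CloudWatch_Traffic_Web_Attack.csv"
)
DATASET_PATH = Path("data/CloudWatch_Traffic_Web_Attack.csv")


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch_dataset(url=DATASET_URL, dest=DATASET_PATH, expected_sha256=None):
    """Download the dataset if it is missing, then optionally verify it."""
    dest = Path(dest)
    if not dest.exists():
        dest.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(url, dest)
    if expected_sha256 is not None and sha256_of(dest) != expected_sha256:
        raise ValueError(f"Checksum mismatch for {dest}")
    return dest
```

Calling fetch_dataset() with no arguments would place the file at the path the rest of the project expects. Passing expected_sha256 is optional and only makes sense once a digest for the published file has been recorded.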
- data/: original dataset and processed outputs.
- notebooks/: end-to-end Jupyter notebook with EDA, feature engineering, and modeling.
- src/: source code (data loaders, training scripts, streamlit_app.py, fastapi_app.py).
- models/: serialized model artifacts (IsolationForest, RandomForest, scaler) and model metadata.
- reports/: academic PDF, model documentation, architecture diagram, slides.
- assets_banner.png: project banner for README or GitHub repo header.
- Dockerfile, requirements.txt, .github/workflows/ci.yml, RUN_SERVICES.md, tests, and deployment helpers.
- LICENSE, CONTRIBUTING.md, CODE_OF_CONDUCT.md, .gitignore.
The project ships with pre-trained model artifacts saved under models/:
- isolation_forest_model.pkl: unsupervised anomaly detector (IsolationForest).
- random_forest_classifier.pkl: supervised classifier for suspicious traffic.
- scaler.pkl: StandardScaler used to preprocess inputs for the classifier.
Model card (summary): see reports/Model_Documentation.pdf for input schema, expected performance notes, bias/limitations, and deployment guidance.
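As a rough sketch of how the three artifacts might be combined at inference time, the snippet below loads them and scores a batch of feature rows. Several things here are assumptions rather than facts from this repository: that the files were written with pickle (if they were saved with joblib, joblib.load would be the loader to use), that the anomaly detector expects the same scaled features as the classifier, and the helper names load_artifact and score_rows. The actual input schema is described in the model card.

```python
import pickle
from pathlib import Path

MODEL_DIR = Path("models")


def load_artifact(filename, model_dir=MODEL_DIR):
    """Unpickle one artifact. Only load pickle files you trust."""
    with open(Path(model_dir) / filename, "rb") as fh:
        return pickle.load(fh)


def score_rows(rows, scaler, classifier, anomaly_detector):
    """Scale feature rows, then collect both models' verdicts per row."""
    scaled = scaler.transform(rows)
    labels = classifier.predict(scaled)
    # Assumption: the anomaly detector was fit on the same scaled features.
    # scikit-learn's IsolationForest.predict returns -1 for outliers, 1 for inliers.
    anomaly_flags = anomaly_detector.predict(scaled)
    return [
        {"label": label, "is_anomaly": bool(flag == -1)}
        for label, flag in zip(labels, anomaly_flags)
    ]


# Example wiring (requires scikit-learn to be installed for unpickling):
# scaler = load_artifact("scaler.pkl")
# classifier = load_artifact("random_forest_classifier.pkl")
# detector = load_artifact("isolation_forest_model.pkl")
# results = score_rows(feature_rows, scaler, classifier, detector)
```

Keeping the loading and scoring steps separate means the same score_rows function could be reused by both the Streamlit app and the API without reloading the models on every request.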
- Clone the repository:

      git clone https://github.com/yourusername/yourrepo.git
      cd yourrepo

- Create a virtualenv and install dependencies:

      python -m venv venv
      source venv/bin/activate
      pip install -r requirements.txt

- Run the Streamlit dashboard:

      streamlit run src/streamlit_app.py

- Run the FastAPI prediction API:

      uvicorn src.fastapi_app:app --reload --host 0.0.0.0 --port 8000

- Run tests:

      pytest -q

- Use the included Dockerfile to create a container for the FastAPI service. Use an orchestration platform (Kubernetes / ECS) for scaling and high availability.
- Secure the API: add authentication (API keys, OAuth), TLS termination, rate limiting, and logging/monitoring.
- Model updates: store model versions in models/ and use a retraining pipeline; update reports/Model_Documentation.pdf with each release.
- Sensitive data: remove or anonymize IPs if publishing publicly.
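For the point about anonymizing IPs, two common options are truncating addresses to a network prefix and replacing them with a keyed hash. The sketch below shows both using only the standard library. The function names, the /24 and /48 prefix lengths, and the 16-character digest length are illustrative choices, not part of this project; apply them to whichever dataset columns hold addresses.

```python
import hashlib
import hmac
import ipaddress


def truncate_ip(ip, v4_prefix=24, v6_prefix=48):
    """Zero the host bits of an address, keeping only the network part."""
    addr = ipaddress.ip_address(ip)
    prefix = v4_prefix if addr.version == 4 else v6_prefix
    network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    return str(network.network_address)


def pseudonymize_ip(ip, secret_key):
    """Replace an address with a keyed hash so equal IPs still match."""
    normalized = str(ipaddress.ip_address(ip)).encode("ascii")
    return hmac.new(secret_key, normalized, hashlib.sha256).hexdigest()[:16]
```

Truncation discards per-host detail but needs no secret. The keyed hash keeps rows from the same source groupable, but it is only as private as the key: the IPv4 space is small enough to brute-force if the key leaks, so the key should stay out of the repository.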
See CONTRIBUTING.md for contribution guidelines, coding style, and testing instructions.
This project is MIT licensed. See LICENSE for full details.
