Soil Organic Carbon (SOC) Prediction API

A machine learning API for predicting Soil Organic Carbon (SOC) using environmental and soil features. Multiple models were compared and the best performing model (Random Forest) was selected for deployment. The API is built using FastAPI and provides a simple and efficient way to make predictions.

Features

Multiple ML Models: Random Forest, Linear Regression, and XGBoost algorithms
Automated Feature Engineering: Simple and efficient reprocessing pipeline
Model Interpretability: SHAP value analysis for feature importance
REST API: FastAPI-based prediction service with automatic documentation
Docker Support: Containerized deployment for easy scaling
Model Persistence: Optimized model serialization and loading
Performance Monitoring: Built-in evaluation metrics and visualization tools

Model Performance

Model	R² Score	RMSE	MAE	CV RMSE	CV Std.
Random Forest	0.825433	0.254917	0.168785	0.265363	0.015952
XGBoost	0.821230	0.257968	0.173745	0.267518	0.013767
Linear Regression	0.800036	0.272831	0.182821	0.267765	0.015321

⭐ Best model: Random Forest selected based on validation performance

Project Structure

soc-predictor/
├── app/
│   ├── __init__.py
│   └── main.py              # FastAPI application and endpoints
├── data/
│   └── soil_nutrients.csv   # Training dataset
├── models/
│   ├── preprocessor.pkl     # Trained data preprocessor
│   └── soc_predictor.pkl    # Best performing model (Random Forest)
├── plots/                   # Model performance visualizations
|   ├──**.png
├── utils/
│   ├── data_loader.py       # Data loading utilities
│   ├── evaluator.py         # Model evaluation functions
│   ├── model_persistence.py # Model save/load functions
│   ├── trainer.py           # Model training pipeline
│   └── visualizer.py        # Plotting and visualization tools
├── .gitignore
├── config.py                # Configuration settings
├── Dockerfile               # Container configuration
├── README.md
├── requirements.txt         # Python dependencies
└── train_model.py           # Model training script

Getting Started

Prerequisites

Python 3.11 or higher
pip package manager
Docker (optional, for containerized deployment)

Installation

Clone the repository

git clone https://github.com/edudzikorku/soc-predictor.git
cd soc-predictor

Create a virtual environment (recommended)

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install dependencies
```
pip install -r requirements.txt
```
Start the FastAPI application
```
uvicorn app.main:app --reload
```

The API will be available at http://localhost:8000 with interactive documentation at http://localhost:8000/docs.

Docker Deployment

Build and Run Locally

# Build the Docker image
docker build -t soc-predictor .

# Run the container
docker run -p 8000:8000 soc-predictor

Pull from Docker Hub

# Pull the pre-built image
docker pull edudzi/soc-predictor:v1.0

# Run the container
docker run -p 8000:8000 edudzi/soc-predictor:v1.0

API Documentation

Endpoints

`POST /predict`

Predicts Soil Organic Carbon based on input features.

Request Body:

{
  "average_elevation": 436,
  "average_temperature": 29.87453613,
  "clay": 19,
  "land_class": "isda",
  "mean_precipitation": 7.909778506,
  "nitrogen": 0.0485999,
  "phosphorus": 12.9047,
  "potassium": 0.321371795,
  "sand": 62,
  "silt": 18,
  "soil_group": "isda",
  "soil_type": "isda",
  "sulfur": 8.66247,
  "zinc": 1.72509,
  "ph": 6.10277
}

Response:

{
  "predicted_soc": 0.7361968417456046,
  "interpretation": "Low SOC level, consider improving soil management practices."
}

`GET /`

Health check endpoint for monitoring service status.

Interactive Documentation

Visit http://localhost:8000/docs for the auto-generated Swagger UI documentation, or http://localhost:8000/redoc for ReDoc-style documentation.

Model Training

To retrain the model with new data or different parameters:

python train_model.py

This script will:

Load and preprocess the training data
Train multiple models (Random Forest, Linear Regression, XGBoost)
Evaluate model performance using cross-validation
Select the best performing model
Save the trained model and preprocessor
Generate performance visualization plots

Training Configuration

Modify config.py to adjust training parameters:

Input data directory
Unimportant features to drop
Output directories
Test size and random seed for reproducibility

Model Insights

Feature Importance

The most influential features for SOC prediction, showing nitrogen and mean_precipitation.

SHAP Analysis

SHAP values revealing how each feature contributes to individual predictions.

Model Performance Comparison

Cross-validation performance across different algorithms.

Example Usage

Using Python requests

import requests
import json

# API endpoint
url = "http://localhost:8000/predict"

# Sample soil data
data = {
    "average_elevation": 1000,
    "average_temperature": 20.5,
    "clay": 30,
    "land_class": "isda",
    "mean_precipitation": 800.0,
    "nitrogen": 0.1,
    "phosphorus": 0.05,
    "potassium": 0.2,
    "sand": 40,
    "silt": 30,
    "soil_group": "isda",
    "soil_type": "isda",
    "sulfur": 0.02,
    "zinc": 0.01,
    "ph": 6.5
}

# Make prediction request
response = requests.post(url, json = data)
result = response.json()

print(f"Predicted SOC: {result['soc_prediction']:.2f}")
print(f"Model Confidence: {result['model_confidence']:.2f}")

Using curl

curl -X POST "http://localhost:8000/predict" \
     -H "Content-Type: application/json" \
     -d '{
       "average_elevation": 1000,
       "average_temperature": 20.5,
       "clay": 30,
       "land_class": "isda",
       "mean_precipitation": 800.0,
       "nitrogen": 0.1,
       "phosphorus": 0.05,
       "potassium": 0.2,
       "sand": 40,
       "silt": 30,
       "soil_group": "isda",
       "soil_type": "isda",
       "sulfur": 0.02,
       "zinc": 0.01,
       "ph": 6.5
     }'

Input Features

Feature	Description	Unit	Range
average_elevation	Mean elevation of the area	meters	0-8000
average_temperature	Mean annual temperature	°C	-10-40
clay	Clay content percentage	%	0-100
land_class	Land classification type	categorical	-
mean_precipitation	Annual precipitation	mm	0-4000
nitrogen	Nitrogen content	%	0-1
phosphorus	Phosphorus content	%	0-1
potassium	Potassium content	%	0-1
sand	Sand content percentage	%	0-100
silt	Silt content percentage	%	0-100
soil_group	Soil group classification	categorical	-
soil_type	Specific soil type	categorical	-
sulfur	Sulfur content	%	0-1
zinc	Zinc content	%	0-1
ph	Soil pH level	-	3-10

Troubleshooting

Common Issues

1. Import errors when starting the application

# Ensure all dependencies are installed
pip install -r requirements.txt --upgrade

2. Model file not found

# Retrain the model
python train_model.py

3. Port already in use

# Use a different port
uvicorn app.main:app --port 8001

4. Docker build issues

# Clean build without cache
docker build --no-cache -t soc-predictor .

Contributing

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

If you use this project in your research, please cite:

@software{soc_predictor,
  title = {Soil Organic Carbon Prediction API},
  author = {Edudzi K. Akpakli},
  year = {2025},
  url = {https://github.com/edudzikorku/soc-predictor.git}
}

Acknowledgments

Dataset provided by iPAGE
Built with FastAPI
Machine learning powered by scikit-learn and XGBoost
Model interpretability via SHAP
API development and deployment guidance from Machine Learning Mastery

Contact: kedudzi007@gmail.com

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
app		app
data		data
models		models
plots		plots
utils		utils
.gitignore		.gitignore
Dockerfile		Dockerfile
README.md		README.md
config.py		config.py
requirements.txt		requirements.txt
train_model.py		train_model.py

Folders and files

Latest commit

History

Repository files navigation

Soil Organic Carbon (SOC) Prediction API

Table of Contents

Features

Model Performance

Project Structure

Getting Started

Prerequisites

Installation

Docker Deployment

Build and Run Locally

Pull from Docker Hub

API Documentation

Endpoints

POST /predict

GET /

Interactive Documentation

Model Training

Training Configuration

Model Insights

Feature Importance

SHAP Analysis

Model Performance Comparison

Example Usage

Using Python requests

Using curl

Input Features

Troubleshooting

Common Issues

Contributing

License

Citation

Acknowledgments

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

`POST /predict`

`GET /`

Packages