Benchmarking the operational and environmental efficiency of four global FMCG manufacturers (Nestlé, Henkel, Procter & Gamble, and Unilever) using Data Envelopment Analysis (DEA), a non-parametric linear-programming technique for measuring relative efficiency.
This project turns a research study into a fully reproducible data-science pipeline: clean datasets, a from-scratch DEA solver, a tested analysis script, publication-quality visualizations, and a narrative Jupyter notebook.
Manufacturers face a hard question: are we converting resources (water, energy, labour) into sustainability outcomes (lower emissions, recyclable packaging, worker safety) as efficiently as our peers? Traditional ratios compare one input to one output at a time. DEA handles many inputs and outputs at once, with no need to pre-assign prices or weights, and identifies an efficient frontier of best performers against which every other unit is measured. It is widely used in applied operations research, for example in bank-branch performance, hospital productivity, supply-chain benchmarking, and ESG analytics.
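The "no pre-assigned weights" point is easiest to see in the textbook CCR ratio form, shown here as a reference sketch (standard notation, not necessarily the exact formulation in src/dea.py). For the unit under evaluation o, DEA chooses output weights u_r and input weights v_i that make o look as good as possible, subject to no unit exceeding a ratio of 1 under those same weights. Here x_{ij} is input i of unit j, y_{rj} is output r of unit j, with m inputs, s outputs and n units.

```latex
\theta_o^{*} = \max_{u,\,v}\;
  \frac{\sum_{r=1}^{s} u_r\, y_{ro}}{\sum_{i=1}^{m} v_i\, x_{io}}
\quad \text{s.t.} \quad
  \frac{\sum_{r=1}^{s} u_r\, y_{rj}}{\sum_{i=1}^{m} v_i\, x_{ij}} \le 1
  \;\; (j = 1,\dots,n),
\qquad u_r \ge 0,\; v_i \ge 0 .
```

A score of 1 places o on the frontier. Because each unit gets its own most favourable weights, a unit that still scores below 1 cannot blame an arbitrary weighting scheme.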
| Company | Technical efficiency, TE (2018–2022) | Interpretation |
|---|---|---|
| Nestlé | ~1.00 | On the efficient frontier almost every year — lean, stable operations |
| Henkel | ~0.98 | Consistently near-frontier, marginal slack in 2020–21 |
| P&G | well below 1 (large Φ) | Largest measured improvement headroom, gradually improving |
| Unilever | rising | Started furthest from the frontier but shows the clearest upward trend |
Full reported figures are in data/processed/reported_efficiency_summary.csv; the model-recomputed scores are in results/computed_efficiency.csv.
For each Decision-Making Unit (here a company-year), DEA solves one linear program. The output-oriented, variable-returns-to-scale (BCC) model asks:
Holding inputs fixed, by what factor Φ ≥ 1 could this unit expand its outputs to reach the frontier?
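Written out in the standard envelopment form (textbook notation, assumed here rather than copied from src/dea.py), the question above is one LP per DMU o, where x_{ij} and y_{rj} are input i and output r of unit j, and λ_j are the weights of the peer combination that forms the frontier point:

```latex
\begin{aligned}
\Phi_o^{*} = \max_{\Phi,\,\lambda}\; & \Phi \\
\text{s.t.}\; & \sum_{j=1}^{n} \lambda_j x_{ij} \le x_{io}, && i = 1,\dots,m \\
& \sum_{j=1}^{n} \lambda_j y_{rj} \ge \Phi\, y_{ro}, && r = 1,\dots,s \\
& \sum_{j=1}^{n} \lambda_j = 1, \quad \lambda_j \ge 0, && j = 1,\dots,n
\end{aligned}
```

Setting λ_o = 1 and all other λ_j = 0 is always feasible with Φ = 1, so Φ* ≥ 1. The convexity constraint Σλ_j = 1 is what makes the model variable-returns (BCC); dropping it gives the constant-returns (CCR) model.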
Efficiency is reported as TE = 1 / Φ, where 1.0 means the unit is already
efficient. The solver (src/dea.py) also supports the input-oriented and
constant-returns (CCR) formulations.
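A minimal sketch of the output-oriented LP using SciPy's linprog with the HiGHS backend. The function name output_oriented_te and its vrs flag are hypothetical, chosen for illustration; this is not the code in src/dea.py, and like the example below it treats every output as "more is better".

```python
import numpy as np
from scipy.optimize import linprog


def output_oriented_te(X, Y, vrs=True):
    """Hypothetical helper: output-oriented DEA efficiency TE = 1/phi per DMU.

    X is an (n, m) array of inputs and Y an (n, s) array of outputs, one row
    per DMU. vrs=True adds the convexity constraint (BCC); vrs=False gives CCR.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n, m = X.shape
    s = Y.shape[1]
    scores = np.empty(n)
    for o in range(n):
        # Decision vector z = [phi, lambda_1, ..., lambda_n]
        c = np.zeros(n + 1)
        c[0] = -1.0  # linprog minimises, so minimise -phi to maximise phi
        # Inputs: sum_j lambda_j * x_ij <= x_io
        A_in = np.hstack([np.zeros((m, 1)), X.T])
        # Outputs: phi * y_ro - sum_j lambda_j * y_rj <= 0
        A_out = np.hstack([Y[o].reshape(-1, 1), -Y.T])
        A_ub = np.vstack([A_in, A_out])
        b_ub = np.concatenate([X[o], np.zeros(s)])
        if vrs:
            # Convexity: sum_j lambda_j = 1 (phi has coefficient 0)
            A_eq = np.concatenate([[0.0], np.ones(n)]).reshape(1, -1)
            b_eq = [1.0]
        else:
            A_eq, b_eq = None, None
        bounds = [(None, None)] + [(0, None)] * n  # phi free, lambdas >= 0
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                      bounds=bounds, method="highs")
        if not res.success:
            raise RuntimeError(f"LP failed for DMU {o}: {res.message}")
        scores[o] = 1.0 / res.x[0]
    return scores


X = [[8359, 53000], [8324, 52450]]                   # two company-years, 2 inputs
Y = [[682, 11938, 125], [665, 11618, 113]]           # same rows, 3 outputs
print(output_oriented_te(X, Y, vrs=True))
print(output_oriented_te(X, Y, vrs=False))
```

On these two company-years both units score TE = 1 under VRS. Without the convexity constraint (CCR), the second row drops slightly below 1, because a scaled-down copy of the first row uses no more of either input yet produces more of every output.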
# run from src/ (or put src/ on sys.path) so that dea.py is importable
from dea import efficiency_scores
scores = efficiency_scores(
inputs=[[8359, 53000], [8324, 52450]], # water, labour
outputs=[[682, 11938, 125], [665, 11618, 113]], # emissions, waste, safety
names=["2018", "2019"],
orientation="output", rts="VRS",
)
print(scores)

Sustainable-Manufacturing-DEA/
├── data/
│ ├── raw/ # cleaned per-company input/output CSVs
│ │ ├── nestle.csv henkel.csv pg.csv unilever.csv
│ └── processed/
│ └── reported_efficiency_summary.csv
├── src/
│ ├── dea.py # DEA solver (CCR/BCC, input/output oriented)
│ ├── visualize.py # matplotlib plotting helpers
│ └── run_analysis.py # end-to-end pipeline -> results/
├── notebooks/
│ └── 01_DEA_walkthrough.ipynb # narrated analysis with inline charts
├── results/
│ ├── computed_efficiency.csv
│ └── figures/ # generated PNGs
├── docs/
│ ├── research_paper.docx / .pdf
│ └── source_files/ # original Excel workbooks
├── requirements.txt
├── LICENSE
└── README.md
# 1. clone and enter
git clone <your-repo-url>
cd Sustainable-Manufacturing-DEA
# 2. install dependencies
pip install -r requirements.txt
# 3. reproduce all results and figures
(cd src && python run_analysis.py)
# 4. (optional) open the guided walkthrough
jupyter notebook notebooks/01_DEA_walkthrough.ipynb

run_analysis.py regenerates everything in results/ from the raw CSVs, so the analysis is fully reproducible from scratch.
Sustainability metrics were compiled from the four companies' public ESG / annual sustainability reports (2018–2022; Unilever 2010–2022). Each company is modelled with its own inputs and outputs:
| Company | Example inputs | Example outputs |
|---|---|---|
| Nestlé | water usage, energy, % covered by collective bargaining | renewable electricity, recyclable packaging, safety |
| Henkel | water consumption, labour | recycled waste, packaging recyclability, occupational safety |
| P&G | energy consumption, recyclable packaging | GHG scopes, fresh-water metrics |
| Unilever | water, value of contributions | emissions, waste disposed, safety at work |
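Whatever the exact headers, each company's table has to become an n×m input matrix and an n×s output matrix (one row per year) before any LP is solved. A minimal sketch with pandas, assuming one row per year and one column per metric; the helper to_dea_matrices and all column names are hypothetical placeholders, not the real headers in data/raw/*.csv.

```python
import pandas as pd

# Hypothetical column names; the real headers in data/raw/*.csv may differ.
INPUT_COLS = ["water", "labour"]
OUTPUT_COLS = ["emissions", "waste", "safety"]


def to_dea_matrices(df, input_cols, output_cols, id_col="year"):
    """Hypothetical helper: split a per-company table into DMU names, X and Y."""
    missing = [c for c in [id_col, *input_cols, *output_cols] if c not in df.columns]
    if missing:
        raise KeyError(f"missing columns: {missing}")
    # DEA needs a complete row for every DMU, so drop years with gaps
    clean = df.dropna(subset=input_cols + output_cols)
    names = clean[id_col].astype(str).tolist()
    X = clean[input_cols].to_numpy(dtype=float)   # shape (n, m)
    Y = clean[output_cols].to_numpy(dtype=float)  # shape (n, s)
    return names, X, Y


# Illustrative frame reusing the figures from the example call above
df = pd.DataFrame({
    "year": [2018, 2019],
    "water": [8359, 8324],
    "labour": [53000, 52450],
    "emissions": [682, 665],
    "waste": [11938, 11618],
    "safety": [125, 113],
})
names, X, Y = to_dea_matrices(df, INPUT_COLS, OUTPUT_COLS)
```

Dropping incomplete years rather than imputing them is a deliberate choice in this sketch: DEA compares whole rows, so a guessed value can move a unit onto or off the frontier.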
DEA loses discriminating power when the number of DMUs is small relative to the
number of inputs + outputs — a common rule of thumb is n_DMU ≥ 3 × (inputs + outputs). With only 4–5 years per company, a textbook VRS model labels nearly
every unit "efficient" (see results/computed_efficiency.csv). The original
study addresses this with a two-phase subjective/objective weighting and
aggregated outputs, which sharpens the comparison between firms. Both views are
included here on purpose — surfacing the limitation is part of doing the analysis
honestly.
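The rule of thumb is simple arithmetic, sketched below for Henkel as listed in the data table (2 inputs: water consumption and labour; 3 outputs; five company-years, 2018–2022). The helper names are hypothetical. The second rule, n ≥ max(m·s, 3(m+s)), is a stricter variant often attributed to Cooper, Seiford and Tone's DEA textbook.

```python
def min_dmus(n_inputs: int, n_outputs: int) -> dict:
    """Hypothetical helper: common DEA sample-size rules for m inputs, s outputs."""
    m, s = n_inputs, n_outputs
    return {
        "3(m+s)": 3 * (m + s),
        "max(m*s, 3(m+s))": max(m * s, 3 * (m + s)),
    }


def check_sample(n_dmus: int, n_inputs: int, n_outputs: int):
    """Return (enough DMUs?, DMUs needed under the strictest rule)."""
    need = max(min_dmus(n_inputs, n_outputs).values())
    return n_dmus >= need, need


# Henkel: 5 company-years, 2 inputs, 3 outputs
ok, need = check_sample(n_dmus=5, n_inputs=2, n_outputs=3)
print(ok, need)  # False 15
```

Five DMUs against a requirement of 15 is why most of the per-company frontier collapses onto every unit.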
Possible extensions: pooled cross-company DEA on common normalised metrics, Malmquist productivity indices for year-over-year change, super-efficiency models to rank frontier units, and bootstrapped confidence intervals (Simar & Wilson).
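Of these extensions, super-efficiency is the simplest to sketch. In the Andersen-Petersen approach, each DMU is removed from its own reference set, so frontier units can score above 1 and be ranked. The version below is input-oriented with constant returns and is a hypothetical illustration, not part of this repository.

```python
import numpy as np
from scipy.optimize import linprog


def super_efficiency_crs(X, Y):
    """Hypothetical helper: Andersen-Petersen super-efficiency (input-oriented, CRS).

    DMU o is excluded from its own reference set, so units on the ordinary
    frontier can score above 1; units off the frontier keep their usual score.
    """
    X = np.asarray(X, dtype=float)  # (n, m) inputs
    Y = np.asarray(Y, dtype=float)  # (n, s) outputs
    n, m = X.shape
    s = Y.shape[1]
    scores = np.empty(n)
    for o in range(n):
        peers = [j for j in range(n) if j != o]
        k = len(peers)
        # Decision vector z = [theta, lambda_peer_1, ..., lambda_peer_k]
        c = np.zeros(k + 1)
        c[0] = 1.0  # minimise theta
        # Inputs: sum_j lambda_j * x_ij - theta * x_io <= 0
        A_in = np.hstack([-X[o].reshape(-1, 1), X[peers].T])
        # Outputs: -sum_j lambda_j * y_rj <= -y_ro
        A_out = np.hstack([np.zeros((s, 1)), -Y[peers].T])
        A_ub = np.vstack([A_in, A_out])
        b_ub = np.concatenate([np.zeros(m), -Y[o]])
        bounds = [(None, None)] + [(0, None)] * k  # theta free, lambdas >= 0
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
        if not res.success:
            raise RuntimeError(f"LP failed for DMU {o}: {res.message}")
        scores[o] = res.x[0]
    return scores


X = [[8359, 53000], [8324, 52450]]
Y = [[682, 11938, 125], [665, 11618, 113]]
print(super_efficiency_crs(X, Y))
```

The input-oriented CRS variant is used here because it stays feasible for strictly positive data. VRS super-efficiency models can be infeasible for some frontier units and need extra handling.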
Python · NumPy · pandas · SciPy (HiGHS LP solver) · Matplotlib · Jupyter
Likhith Raj Yesala — research, modelling, and implementation.
Released under the MIT License.

