Commit dcc06d2

upload developer_setup.md
1 parent f7014e7 commit dcc06d2

1 file changed

Lines changed: 277 additions & 0 deletions

docs/developer_setup.md

@@ -0,0 +1,277 @@
1+
# OpenML Development Environment Setup Guide
2+
3+
This document describes a working, reproducible approach to setting up a local development environment for OpenML. Other configurations are possible, but the steps below reflect a setup that has been tested and verified in practice.
4+
5+
OpenML currently provides two backend implementations:
6+
7+
* **API v1**: The legacy PHP-based server (currently in production)
8+
* **API v2**: The newer Python-based server built with FastAPI
9+
10+
According to the OpenML maintainers, API v1 will remain operational until at least the end of 2026, while development and migration efforts continue on API v2.
11+
12+
This guide covers:
13+
* Local setup of API v1 (PHP backend)
14+
* Local setup of API v2 (Python backend)
15+
* Development setup for the Python SDK (openml-python)
16+
---
17+
18+
## API v1 (PHP Backend) Setup
19+
20+
### Prerequisites
21+
22+
* Docker Desktop
23+
* Git
24+
25+
### Step 1: Clone the Services Repository
26+
27+
Clone the OpenML services repository (if you plan to contribute changes, fork it first and clone your fork instead):
28+
29+
```bash
30+
git clone https://github.com/openml/services
31+
cd services
32+
```
33+
34+
### Step 2: Install Docker Desktop
35+
36+
Download and install Docker Desktop from: https://www.docker.com/products/docker-desktop/
37+
38+
Ensure Docker is running before proceeding.
39+
40+
### Step 3: Initialize File Permissions
41+
42+
On first use, you must ensure that the PHP data directory has the correct permissions. From the repository root, run:
43+
44+
```bash
45+
chown -R www-data:www-data data/php
46+
```
47+
48+
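If this fails (for example, because the `www-data` user does not exist on your system), fall back to making the directory world-writable, which is acceptable only for local development: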
49+
50+
```bash
51+
chmod -R 777 data/php
52+
```
53+
54+
### Step 4: Start the API v1 Services
55+
56+
With Docker running in the background, start all services using:
57+
58+
```bash
59+
docker compose --profile all up -d
60+
```
61+
62+
**Handling Container Name Conflicts**
63+
64+
If you have previously set up the API v2 (Python backend), you may encounter container name conflicts, as some service names overlap.
65+
66+
To resolve this:
67+
68+
Remove or rename the conflicting containers (easiest via Docker Desktop), then restart:
69+
70+
```bash
71+
docker compose --profile all down
72+
docker compose --profile all up -d
73+
```
74+
75+
### Step 5: Verify the API v1 Server
76+
77+
Confirm that the server is running correctly by opening: http://localhost:8080/api/v1/json/flow/181
78+
79+
A successful setup will return structured JSON data describing an OpenML flow.
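
The same check can be scripted. The following is a minimal sketch using only the Python standard library; the helper name `fetch_json` and its injectable `opener` parameter are illustrative and not part of OpenML. Calling it against the URL above assumes the services from Step 4 are running.

```python
import json
import urllib.request


def fetch_json(url, opener=urllib.request.urlopen, timeout=10):
    """Fetch `url` and return the decoded JSON body.

    `opener` is injectable so the function can be exercised without a
    running server; by default it performs a real HTTP GET.
    """
    with opener(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (requires the local API v1 stack to be up):
# flow = fetch_json("http://localhost:8080/api/v1/json/flow/181")
# print(json.dumps(flow, indent=2))
```

If the request raises a connection error, the containers are most likely still starting or the port mapping differs from the one assumed here.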
80+
81+
### Step 6: Configure `openml-python` to Use the Local API v1 Server
82+
83+
To interact with your local API v1 instance, configure the OpenML client to point to the local server and use the API key defined in:
84+
85+
`services/config/php/.env`
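
This guide does not state the name of the entry that holds the key, so the sketch below only parses the file and lists the entry names for inspection. The helper names `parse_env_text` and `load_env_file` are hypothetical, and the parser handles only plain `KEY=VALUE` lines rather than full dotenv syntax.

```python
from pathlib import Path


def parse_env_text(text):
    """Parse simple KEY=VALUE lines, ignoring blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def load_env_file(path):
    """Read a .env file from disk and return its entries as a dict."""
    return parse_env_text(Path(path).read_text(encoding="utf-8"))


# Example (path relative to the directory containing the services checkout):
# entries = load_env_file("services/config/php/.env")
# print(sorted(entries))  # inspect the names to locate the API key entry
```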
86+
87+
**Example usage:**
88+
89+
```python
90+
import openml
91+
92+
openml.config.server = "http://localhost:8080/api/v1/xml"
93+
openml.config.apikey = "AD000000000000000000000000000000"
94+
95+
from openml_sklearn.extension import SklearnExtension
96+
from sklearn.neighbors import KNeighborsClassifier
97+
98+
clf = KNeighborsClassifier(n_neighbors=3)
99+
extension = SklearnExtension()
100+
knn_flow = extension.model_to_flow(clf)
101+
102+
knn_flow.publish()
103+
```
104+
105+
If successful, the flow will be uploaded to your local OpenML server and you should see output similar to:
106+
107+
```text
108+
OpenML Flow
109+
===========
110+
Flow ID.........: 182 (version 1)
111+
Flow URL........: http://localhost:8080/f/182
112+
Flow Name.......: sklearn.neighbors._classification.KNeighborsClassifier
113+
Flow Description: Classifier implementing the k-nearest neighbors vote.
114+
Upload Date.....: 2025-12-30 09:11:17
115+
Dependencies....:
116+
- sklearn==1.8.0
117+
- numpy>=1.24.1
118+
- scipy>=1.10.0
119+
- joblib>=1.3.0
120+
- threadpoolctl>=3.2.0
121+
```
122+
---
123+
124+
## API v2 (Python Backend) Setup
125+
126+
### Prerequisites
127+
128+
* Docker Desktop
129+
* Git
130+
131+
### Step 1: Clone the Server API Repository
132+
133+
Clone the API v2 repository (if you plan to contribute changes, fork it first and clone your fork instead):
134+
135+
```bash
136+
git clone https://github.com/openml/server-api
137+
cd server-api
138+
```
139+
140+
### Step 2: Install Docker Desktop
141+
142+
If not already installed, download Docker Desktop from:
143+
144+
https://www.docker.com/products/docker-desktop/
145+
146+
Ensure Docker is running.
147+
148+
### Step 3: Build and Start the Services
149+
150+
From the repository root, run:
151+
152+
```bash
153+
docker compose --profile all up
154+
```
155+
156+
This will build and start all containers and expose the services on your local machine.
157+
158+
### Step 4: Verify the API v2 Server
159+
160+
Once the containers are running, verify the setup using the following endpoints:
161+
162+
* FastAPI backend (v2): http://localhost:8001/tasks/31
163+
* Swagger UI documentation: http://localhost:8001/docs
164+
165+
166+
If the setup is successful, the first endpoint should return a JSON description of task 31 and the second should render the interactive API documentation.
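
Because `docker compose --profile all up` can take a while to build and start the containers, it may help to poll the endpoints until they respond. The sketch below is one way to do that with the standard library; `wait_until_ready` and its parameters are illustrative names, not part of the server-api repository.

```python
import time
import urllib.request


def wait_until_ready(url, attempts=30, delay=2.0,
                     opener=urllib.request.urlopen, sleep=time.sleep):
    """Poll `url` until it answers with HTTP 200 or the attempts run out.

    Returns True on success and False otherwise.
    """
    for _ in range(attempts):
        try:
            with opener(url, timeout=5) as response:
                if response.status == 200:
                    return True
        except OSError:
            # Covers refused connections, timeouts, and HTTP error statuses
            # (urllib.error.URLError is a subclass of OSError).
            pass
        sleep(delay)
    return False


# Example (endpoints taken from the list above):
# for url in ("http://localhost:8001/tasks/31", "http://localhost:8001/docs"):
#     print(url, "ready" if wait_until_ready(url) else "not reachable")
```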
167+
168+
---
169+
170+
## Python SDK (openml-python) Development Setup
171+
172+
### Prerequisites
173+
174+
* Python
175+
* Git
176+
* A virtual environment manager (conda, venv, uv, etc.)
177+
178+
### Step 1: Clone the Python SDK Repository
179+
180+
Clone the OpenML Python client repository (if you plan to contribute changes, fork it first and clone your fork instead):
181+
182+
```bash
183+
git clone https://github.com/openml/openml-python
184+
cd openml-python
185+
```
186+
187+
### Step 2: Create and Activate a Virtual Environment
188+
189+
You may use any environment manager. Below is an example using conda:
190+
191+
```bash
192+
conda create -n openml-python-dev python=3.12
193+
conda activate openml-python-dev
194+
```
195+
196+
### Step 3: Install Development Dependencies
197+
198+
Install the package in editable mode along with development and documentation dependencies:
199+
200+
```bash
201+
python -m pip install -e ".[dev,docs]"
202+
```
203+
204+
### Step 4: Enable Pre-commit Hooks
205+
206+
Install and run the pre-commit hooks to ensure code quality and formatting:
207+
208+
```bash
209+
pre-commit install
210+
pre-commit run --all-files
211+
```
212+
---
213+
214+
### Running Tests with Pytest Markers
215+
216+
The OpenML Python SDK uses pytest markers to categorize tests by their dependencies and execution requirements. This lets developers include or skip groups of tests depending on their local setup.
217+
218+
#### Available Markers
219+
220+
* `sklearn`: Marks tests that require scikit-learn. These tests are skipped if scikit-learn is not installed.
221+
222+
* `production`: Marks tests that interact with the production OpenML server. These typically involve real API calls.
223+
224+
* `uses_test_server`: Marks tests that require the OpenML test server.
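
As an illustration of how such markers are attached, the sketch below decorates two hypothetical tests using pytest's standard `pytest.mark` mechanism. The test names and bodies are made up for this example and do not come from the openml-python test suite; any automatic skipping when scikit-learn is missing depends on the repository's own pytest configuration and is not reproduced here.

```python
import pytest


@pytest.mark.sklearn
def test_knn_keeps_neighbor_count():
    # Hypothetical test; selected by `pytest -m sklearn`.
    from sklearn.neighbors import KNeighborsClassifier

    clf = KNeighborsClassifier(n_neighbors=3)
    assert clf.n_neighbors == 3


@pytest.mark.production
@pytest.mark.sklearn
def test_placeholder_needing_production():
    # Hypothetical test carrying two markers; it is selected by
    # `pytest -m sklearn` but excluded by `pytest -m "sklearn and not production"`.
    ...
```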
225+
226+
#### Run the full test suite:
227+
228+
```bash
229+
pytest
230+
```
231+
232+
#### Run only tests with a specific marker (for example, `sklearn`):
233+
234+
```bash
235+
pytest -m sklearn
236+
```
237+
238+
#### Combine markers using logical expressions:
239+
240+
```bash
241+
pytest -m "sklearn and not production"
242+
```
243+
244+
#### Skip tests that require the production server:
245+
246+
```bash
247+
pytest -m "not production"
248+
```
249+
250+
#### List all registered pytest markers in the repository:
251+
252+
```bash
253+
pytest --markers
254+
```
255+
256+
### Running Tests That Require Admin Privileges
257+
258+
Some tests require admin privileges on the test server and are skipped automatically unless an admin API key is provided, so regular contributors do not need to do anything. Core contributors who need to run these tests locally should set the key as an environment variable in the terminal before running the tests:
262+
263+
```bash
264+
# Windows (PowerShell)
265+
$env:OPENML_TEST_SERVER_ADMIN_KEY = "admin-key"
266+
# Linux/macOS
267+
export OPENML_TEST_SERVER_ADMIN_KEY="admin-key"
268+
```
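
To confirm that the variable is visible to Python before starting a test run, a quick check such as the following can help. This is a minimal sketch: `get_admin_key` is an illustrative helper, and it is not a description of how openml-python itself reads the key.

```python
import os

ADMIN_KEY_VAR = "OPENML_TEST_SERVER_ADMIN_KEY"


def get_admin_key(environ=None):
    """Return the admin key from the environment, or None if unset or empty."""
    environ = os.environ if environ is None else environ
    key = environ.get(ADMIN_KEY_VAR, "").strip()
    return key or None


# Reports only whether a key is set, without printing the key itself.
print("admin key configured:", get_admin_key() is not None)
```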
269+
270+
271+
## Notes and Recommendations
272+
273+
API v1 and API v2 can be run side-by-side, but container name conflicts must be resolved manually.
274+
275+
API v1 is required for many existing workflows (e.g. flow publishing).
276+
277+
API v2 is under active development and is the future direction of the OpenML platform.
