Commit 7702451

Merge branch 'main' into issue-1562
2 parents 4c59099 + ede1497 commit 7702451

32 files changed

Lines changed: 556 additions & 319 deletions

.github/workflows/test.yml

Lines changed: 5 additions & 5 deletions
Original file line number | Diff line number | Diff line change
@@ -114,9 +114,9 @@ jobs:
114114
fi
115115
116116
if [ "${{ matrix.sklearn-only }}" = "true" ]; then
117-
marks="sklearn and not production and not uses_test_server"
117+
marks="sklearn and not production_server and not test_server"
118118
else
119-
marks="not production and not uses_test_server"
119+
marks="not production_server and not test_server"
120120
fi
121121
122122
pytest -n 4 --durations=20 --dist load -sv $codecov -o log_cli=true -m "$marks"
@@ -131,9 +131,9 @@ jobs:
131131
fi
132132
133133
if [ "${{ matrix.sklearn-only }}" = "true" ]; then
134-
marks="sklearn and production and not uses_test_server"
134+
marks="sklearn and production_server and not test_server"
135135
else
136-
marks="production and not uses_test_server"
136+
marks="production_server and not test_server"
137137
fi
138138
139139
pytest -n 4 --durations=20 --dist load -sv $codecov -o log_cli=true -m "$marks"
@@ -143,7 +143,7 @@ jobs:
143143
env:
144144
OPENML_TEST_SERVER_ADMIN_KEY: ${{ secrets.OPENML_TEST_SERVER_ADMIN_KEY }}
145145
run: | # we need a separate step because of the bash-specific if-statement in the previous one.
146-
pytest -n 4 --durations=20 --dist load -sv --reruns 5 --reruns-delay 1 -m "not uses_test_server"
146+
pytest -n 4 --durations=20 --dist load -sv --reruns 5 --reruns-delay 1 -m "not test_server"
147147
148148
- name: Check for files left behind by test
149149
if: matrix.os != 'windows-latest' && always()
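
The four `marks` strings in this workflow follow one rule: an optional `sklearn` clause, a clause that selects or excludes `production_server`, and a fixed `not test_server` clause. The sketch below restates that rule in Python for clarity. The function name `build_marks` and its parameters are hypothetical and do not appear in the repository; the workflow itself keeps the logic in bash.

```python
def build_marks(sklearn_only: bool, production: bool) -> str:
    """Build a pytest ``-m`` expression like the ones in the workflow branches."""
    # The production step selects production_server tests; the default step excludes them.
    server_clause = "production_server" if production else "not production_server"
    clauses = [server_clause, "not test_server"]
    if sklearn_only:
        # The sklearn-only matrix entries additionally restrict to sklearn tests.
        clauses.insert(0, "sklearn")
    return " and ".join(clauses)


if __name__ == "__main__":
    for sklearn_only in (True, False):
        for production in (False, True):
            print(build_marks(sklearn_only, production))
```

The resulting string would be passed to `pytest -m "$marks"` exactly as in the steps above.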

docs/developer_setup.md

Lines changed: 210 additions & 0 deletions
@@ -0,0 +1,210 @@
1+
# OpenML Local Development Environment Setup
2+
3+
This guide outlines the standard procedures for setting up a local development environment for the OpenML ecosystem. It covers the configuration of the backend servers (API v1 and API v2) and the Python Client SDK.
4+
5+
OpenML currently has two backend architectures:
6+
7+
* **API v1**: The PHP-based server currently serving production traffic.
8+
* **API v2**: The Python-based server (FastAPI) currently under active development.
9+
10+
> Note on Migration: API v1 is projected to remain operational through at least 2026. API v2 is the target architecture for future development.
11+
12+
## 1. API v1 Setup (PHP Backend)
13+
14+
This section details the deployment of the legacy PHP backend.
15+
16+
### Prerequisites
17+
18+
* **Docker**: Docker Desktop (Ensure the daemon is running).
19+
* **Version Control**: Git.
20+
21+
### Installation Steps
22+
23+
#### 1. Clone the Repository
24+
25+
Retrieve the OpenML services source code:
26+
27+
```bash
28+
git clone https://github.com/openml/services
29+
cd services
30+
```
31+
32+
#### 2. Configure File Permissions
33+
34+
To ensure the containerized PHP service can write to the local filesystem, initialize the data directory permissions.
35+
36+
From the repository root:
37+
38+
```bash
39+
chown -R www-data:www-data data/php
40+
```
41+
42+
If the `www-data` user does not exist on the host system, grant full permissions as a fallback:
43+
44+
```bash
45+
chmod -R 777 data/php
46+
```
47+
48+
#### 3. Launch Services
49+
50+
Initialize the container stack:
51+
52+
```bash
53+
docker compose --profile all up -d
54+
```
55+
56+
#### Warning: Container Conflicts
57+
58+
If API v2 (Python backend) containers are present on the system, name conflicts may occur. To resolve this, stop and remove existing containers before launching API v1:
59+
60+
```bash
61+
docker compose --profile all down
62+
docker compose --profile all up -d
63+
```
64+
65+
#### 4. Verification
66+
67+
Validate the deployment by accessing the flow endpoint. A successful response will return structured JSON data.
68+
69+
* **Endpoint**: http://localhost:8080/api/v1/json/flow/181
70+
71+
### Client Configuration
72+
73+
To direct the `openml-python` client to the local API v1 instance, modify the configuration as shown below. The API key corresponds to the default key located in `services/config/php/.env`.
74+
75+
```python
76+
import openml
77+
from openml_sklearn.extension import SklearnExtension
78+
from sklearn.neighbors import KNeighborsClassifier
79+
80+
# Configure client to use local Docker instance
81+
openml.config.server = "http://localhost:8080/api/v1/xml"
82+
openml.config.apikey = "AD000000000000000000000000000000"
83+
84+
# Test flow publication
85+
clf = KNeighborsClassifier(n_neighbors=3)
86+
extension = SklearnExtension()
87+
knn_flow = extension.model_to_flow(clf)
88+
89+
knn_flow.publish()
90+
```
91+
92+
## 2. API v2 Setup (Python Backend)
93+
94+
This section details the deployment of the FastAPI backend.
95+
96+
### Prerequisites
97+
98+
* **Docker**: Docker Desktop (Ensure the daemon is running).
99+
* **Version Control**: Git.
100+
101+
### Installation Steps
102+
103+
#### 1. Clone the Repository
104+
105+
Retrieve the API v2 source code:
106+
107+
```bash
108+
git clone https://github.com/openml/server-api
109+
cd server-api
110+
```
111+
112+
#### 2. Launch Services
113+
114+
Build and start the container stack:
115+
116+
```bash
117+
docker compose --profile all up
118+
```
119+
120+
#### 3. Verification
121+
122+
Validate the deployment using the following endpoints:
123+
124+
* **Task Endpoint**: http://localhost:8001/tasks/31
125+
* **Swagger UI (Documentation)**: http://localhost:8001/docs
126+
127+
## 3. Python SDK (`openml-python`) Setup
128+
129+
This section outlines the environment setup for contributing to the OpenML Python client.
130+
131+
### Installation Steps
132+
133+
#### 1. Clone the Repository
134+
135+
```bash
136+
git clone https://github.com/openml/openml-python
137+
cd openml-python
138+
```
139+
140+
#### 2. Environment Initialization
141+
142+
Create an isolated virtual environment (example using Conda):
143+
144+
```bash
145+
conda create -n openml-python-dev python=3.12
146+
conda activate openml-python-dev
147+
```
148+
149+
#### 3. Install Dependencies
150+
151+
Install the package in editable mode, including development and documentation dependencies:
152+
153+
```bash
154+
python -m pip install -e ".[dev,docs]"
155+
```
156+
157+
#### 4. Configure Quality Gates
158+
159+
Install pre-commit hooks to enforce coding standards:
160+
161+
```bash
162+
pre-commit install
163+
pre-commit run --all-files
164+
```
165+
166+
## 4. Testing Guidelines
167+
168+
The OpenML Python SDK utilizes `pytest` markers to categorize tests based on dependencies and execution context.
169+
170+
| Marker | Description |
171+
|-------------------|-----------------------------------------------------------------------------|
172+
| `sklearn` | Tests requiring `scikit-learn`. Skipped if the library is missing. |
173+
| `production_server` | Tests that interact with the live OpenML server (real API calls). |
174+
| `test_server` | Tests requiring the OpenML test server environment. |
175+
176+
### Execution Examples
177+
178+
Run the full test suite:
179+
180+
```bash
181+
pytest
182+
```
183+
184+
Run a specific subset (e.g., `scikit-learn` tests):
185+
186+
```bash
187+
pytest -m sklearn
188+
```
189+
190+
Exclude tests that contact the production server:
191+
192+
```bash
193+
pytest -m "not production_server"
194+
```
195+
196+
### Admin Privilege Tests
197+
198+
Certain tests require administrative privileges on the test server. These are skipped automatically unless an admin API key is provided via environment variables.
199+
200+
#### Windows (PowerShell):
201+
202+
```powershell
203+
$env:OPENML_TEST_SERVER_ADMIN_KEY = "admin-key"
204+
```
205+
206+
#### Linux/macOS:
207+
208+
```bash
209+
export OPENML_TEST_SERVER_ADMIN_KEY="admin-key"
210+
```
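
The skip behaviour described under "Admin Privilege Tests" can be sketched as follows. This is a minimal illustration, not the project's actual test base class: `get_admin_key` and `AdminOnlyTest` are hypothetical names, and only the environment variable name is taken from the guide.

```python
import os
import unittest

# Environment variable name as documented in the guide.
ADMIN_KEY_ENV_VAR = "OPENML_TEST_SERVER_ADMIN_KEY"


def get_admin_key(environ=os.environ):
    """Return the admin key, or None when it is unset or empty."""
    return environ.get(ADMIN_KEY_ENV_VAR) or None


class AdminOnlyTest(unittest.TestCase):
    """Hypothetical test case that is skipped when no admin key is configured."""

    def setUp(self):
        self.admin_key = get_admin_key()
        if self.admin_key is None:
            self.skipTest(f"{ADMIN_KEY_ENV_VAR} is not set")

    def test_admin_action(self):
        # Placeholder for a call that would need admin rights on the test server.
        self.assertTrue(self.admin_key)
```

With the variable unset, the test is reported as skipped rather than failed, which matches the behaviour the guide describes.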

mkdocs.yml

Lines changed: 1 addition & 0 deletions
@@ -65,6 +65,7 @@ nav:
6565
- Advanced User Guide: details.md
6666
- API: reference/
6767
- Contributing: contributing.md
68+
- Developer Setup: developer_setup.md
6869

6970
markdown_extensions:
7071
- pymdownx.highlight:

openml/cli.py

Lines changed: 5 additions & 5 deletions
@@ -102,15 +102,15 @@ def check_apikey(apikey: str) -> str:
102102

103103
def configure_server(value: str) -> None:
104104
def check_server(server: str) -> str:
105-
is_shorthand = server in ["test", "production"]
105+
is_shorthand = server in ["test", "production_server"]
106106
if is_shorthand or looks_like_url(server):
107107
return ""
108-
return "Must be 'test', 'production' or a url."
108+
return "Must be 'test', 'production_server' or a url."
109109

110110
def replace_shorthand(server: str) -> str:
111111
if server == "test":
112-
return "https://test.openml.org/api/v1/xml"
113-
if server == "production":
112+
return f"{config.TEST_SERVER_URL}/api/v1/xml"
113+
if server == "production_server":
114114
return "https://www.openml.org/api/v1/xml"
115115
return server
116116

@@ -119,7 +119,7 @@ def replace_shorthand(server: str) -> str:
119119
value=value,
120120
check_with_message=check_server,
121121
intro_message="Specify which server you wish to connect to.",
122-
input_message="Specify a url or use 'test' or 'production' as a shorthand: ",
122+
input_message="Specify a url or use 'test' or 'production_server' as a shorthand: ",
123123
sanitize=replace_shorthand,
124124
)
125125
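
For readers skimming the hunk, the shorthand handling after this change can be sketched as a single lookup. This is a standalone illustration under assumptions: `resolve_server` and the `SHORTHANDS` table are hypothetical, `looks_like_url` here is a simple stand-in for the CLI's own helper of the same name, and `TEST_SERVER_URL` mirrors the constant this commit adds to `openml/config.py`.

```python
from urllib.parse import urlparse

# Mirrors the constant introduced in openml/config.py by this commit.
TEST_SERVER_URL = "https://test.openml.org"

# Shorthand names accepted after this change, mapped to full server URLs.
SHORTHANDS = {
    "test": f"{TEST_SERVER_URL}/api/v1/xml",
    "production_server": "https://www.openml.org/api/v1/xml",
}


def looks_like_url(value: str) -> bool:
    """Stand-in check (assumption): require both a scheme and a host."""
    parsed = urlparse(value)
    return bool(parsed.scheme and parsed.netloc)


def resolve_server(value: str) -> str:
    """Expand a shorthand, pass a URL through, and reject anything else."""
    if value in SHORTHANDS:
        return SHORTHANDS[value]
    if looks_like_url(value):
        return value
    raise ValueError("Must be 'test', 'production_server' or a url.")


if __name__ == "__main__":
    print(resolve_server("test"))
    print(resolve_server("http://localhost:8080/api/v1/xml"))
```

Note that under this diff the old shorthand `production` is no longer accepted; only `production_server` is.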

openml/config.py

Lines changed: 5 additions & 2 deletions
@@ -28,6 +28,8 @@
2828
OPENML_TEST_SERVER_ADMIN_KEY_ENV_VAR = "OPENML_TEST_SERVER_ADMIN_KEY"
2929
_TEST_SERVER_NORMAL_USER_KEY = "normaluser"
3030

31+
TEST_SERVER_URL = "https://test.openml.org"
32+
3133

3234
class _Config(TypedDict):
3335
apikey: str
@@ -214,7 +216,7 @@ class ConfigurationForExamples:
214216
_last_used_server = None
215217
_last_used_key = None
216218
_start_last_called = False
217-
_test_server = "https://test.openml.org/api/v1/xml"
219+
_test_server = f"{TEST_SERVER_URL}/api/v1/xml"
218220
_test_apikey = _TEST_SERVER_NORMAL_USER_KEY
219221

220222
@classmethod
@@ -470,7 +472,8 @@ def get_cache_directory() -> str:
470472
471473
"""
472474
url_suffix = urlparse(server).netloc
473-
reversed_url_suffix = os.sep.join(url_suffix.split(".")[::-1]) # noqa: PTH118
475+
url_parts = url_suffix.replace(":", "_").split(".")[::-1]
476+
reversed_url_suffix = os.sep.join(url_parts) # noqa: PTH118
474477
return os.path.join(_root_cache_directory, reversed_url_suffix) # noqa: PTH118
475478

476479
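
The effect of the `get_cache_directory` change is easiest to see on a server URL that carries a port, such as the local Docker instance from the developer guide. The sketch below reproduces only the path-derivation step; `cache_subdirectory` is a hypothetical name, and the motivation (a colon is not valid inside a Windows path component) is an assumption rather than something the commit states.

```python
import os
from urllib.parse import urlparse


def cache_subdirectory(server: str) -> str:
    """Derive the reversed-host cache subdirectory for a server URL."""
    netloc = urlparse(server).netloc
    # Replace the host:port separator so the port becomes part of one path component.
    parts = netloc.replace(":", "_").split(".")[::-1]
    return os.sep.join(parts)


if __name__ == "__main__":
    print(cache_subdirectory("https://test.openml.org/api/v1/xml"))
    print(cache_subdirectory("http://localhost:8080/api/v1/xml"))
```

For a URL without a port the result is unchanged from the previous behaviour; for `localhost:8080` the component becomes `localhost_8080`.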

openml/tasks/functions.py

Lines changed: 6 additions & 5 deletions
@@ -415,9 +415,10 @@ def get_task(
415415
if not isinstance(task_id, int):
416416
raise TypeError(f"Task id should be integer, is {type(task_id)}")
417417

418-
cache_key_dir = openml.utils._create_cache_directory_for_id(TASKS_CACHE_DIR_NAME, task_id)
419-
tid_cache_dir = cache_key_dir / str(task_id)
420-
tid_cache_dir_existed = tid_cache_dir.exists()
418+
task_cache_directory = openml.utils._create_cache_directory_for_id(
419+
TASKS_CACHE_DIR_NAME, task_id
420+
)
421+
task_cache_directory_existed = task_cache_directory.exists()
421422
try:
422423
task = _get_task_description(task_id)
423424
dataset = get_dataset(task.dataset_id, **get_dataset_kwargs)
@@ -431,8 +432,8 @@ def get_task(
431432
if download_splits and isinstance(task, OpenMLSupervisedTask):
432433
task.download_split()
433434
except Exception as e:
434-
if not tid_cache_dir_existed:
435-
openml.utils._remove_cache_dir_for_id(TASKS_CACHE_DIR_NAME, tid_cache_dir)
435+
if not task_cache_directory_existed:
436+
openml.utils._remove_cache_dir_for_id(TASKS_CACHE_DIR_NAME, task_cache_directory)
436437
raise e
437438

438439
return task
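
The pattern `get_task` relies on here is "remove the cache directory on failure, but only if this call created it". A simplified, self-contained sketch of that pattern follows. It does not use the OpenML helpers: `fetch_into_cache` and `failing_fetch` are hypothetical, and unlike the diff it records whether the directory existed before creating it.

```python
import shutil
import tempfile
from pathlib import Path


def fetch_into_cache(cache_dir: Path, fetch) -> str:
    """Run ``fetch`` against ``cache_dir``; clean up only a directory created here."""
    existed_before = cache_dir.exists()
    cache_dir.mkdir(parents=True, exist_ok=True)
    try:
        return fetch(cache_dir)
    except Exception:
        if not existed_before:
            # Drop partial downloads so a later call starts from a clean state.
            shutil.rmtree(cache_dir, ignore_errors=True)
        raise


def failing_fetch(cache_dir: Path) -> str:
    """Simulate a download that writes a partial file and then fails."""
    (cache_dir / "partial.xml").write_text("<incomplete")
    raise RuntimeError("download failed")


if __name__ == "__main__":
    target = Path(tempfile.mkdtemp()) / "tasks" / "31"
    try:
        fetch_into_cache(target, failing_fetch)
    except RuntimeError:
        pass
    print(target.exists())
```

A directory that already existed before the call is left in place, so earlier cached content is not lost when a refresh fails.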

openml/testing.py

Lines changed: 1 addition & 1 deletion
@@ -47,7 +47,7 @@ class TestBase(unittest.TestCase):
4747
"user": [],
4848
}
4949
flow_name_tracker: ClassVar[list[str]] = []
50-
test_server = "https://test.openml.org/api/v1/xml"
50+
test_server = f"{openml.config.TEST_SERVER_URL}/api/v1/xml"
5151
admin_key = os.environ.get(openml.config.OPENML_TEST_SERVER_ADMIN_KEY_ENV_VAR)
5252
user_key = openml.config._TEST_SERVER_NORMAL_USER_KEY
5353

openml/utils/__init__.py

Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
1+
"""Utilities module."""
2+
3+
from openml.utils._openml import (
4+
ProgressBar,
5+
ReprMixin,
6+
_create_cache_directory,
7+
_create_cache_directory_for_id,
8+
_create_lockfiles_dir,
9+
_delete_entity,
10+
_get_cache_dir_for_id,
11+
_get_cache_dir_for_key,
12+
_get_rest_api_type_alias,
13+
_list_all,
14+
_remove_cache_dir_for_id,
15+
_tag_entity,
16+
_tag_openml_base,
17+
extract_xml_tags,
18+
get_cache_size,
19+
thread_safe_if_oslo_installed,
20+
)
21+
22+
__all__ = [
23+
"ProgressBar",
24+
"ReprMixin",
25+
"_create_cache_directory",
26+
"_create_cache_directory_for_id",
27+
"_create_lockfiles_dir",
28+
"_delete_entity",
29+
"_get_cache_dir_for_id",
30+
"_get_cache_dir_for_key",
31+
"_get_rest_api_type_alias",
32+
"_list_all",
33+
"_remove_cache_dir_for_id",
34+
"_tag_entity",
35+
"_tag_openml_base",
36+
"extract_xml_tags",
37+
"get_cache_size",
38+
"thread_safe_if_oslo_installed",
39+
]

openml/utils.py renamed to openml/utils/_openml.py

Lines changed: 1 addition & 2 deletions
@@ -26,8 +26,7 @@
2626
import openml
2727
import openml._api_calls
2828
import openml.exceptions
29-
30-
from . import config
29+
from openml import config
3130

3231
# Avoid import cycles: https://mypy.readthedocs.io/en/latest/common_issues.html#import-cycles
3332
if TYPE_CHECKING:
