Merged
2 changes: 1 addition & 1 deletion .github/workflows/documentation.yml
@@ -16,7 +16,7 @@ jobs:
- uses: actions/checkout@v4

- name: Install uv
uses: astral-sh/setup-uv@v4
uses: astral-sh/setup-uv@v5

- name: Set up Python 3.10
run: uv python install 3.10
2 changes: 1 addition & 1 deletion .github/workflows/release.yml
@@ -16,7 +16,7 @@ jobs:
uses: actions/checkout@v4

- name: Install uv
uses: astral-sh/setup-uv@v4
uses: astral-sh/setup-uv@v5

- name: Set up Python
run: uv python install 3.11
8 changes: 5 additions & 3 deletions .github/workflows/tests.yml
@@ -28,7 +28,7 @@ jobs:
- uses: actions/checkout@v4

- name: Install uv
uses: astral-sh/setup-uv@v4
uses: astral-sh/setup-uv@v5

- name: Set up Python ${{ matrix.config.py }}
run: uv python install ${{ matrix.config.py }}
@@ -40,12 +40,14 @@ jobs:
run: uv run pytest tests/ --cov=imf_reader --cov-report=xml

- name: Use Codecov to track coverage
uses: codecov/codecov-action@v3
uses: codecov/codecov-action@v5
with:
token: ${{ secrets.CODECOV_TOKEN }}
files: ./coverage.xml
fail_ci_if_error: true
verbose: true

- name: Check code style
run: uv run black src tests --check
run: |
uv run ruff format src tests --check
uv run ruff check src tests
3 changes: 2 additions & 1 deletion .gitignore
@@ -160,4 +160,5 @@ cython_debug/
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
.idea/
.DS_Store
.DS_Store
/.claude
21 changes: 21 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,26 @@
# Changelog

## v1.5.0 (2026-04-29)
- The cache now uses OS-appropriate directories, segmented by package version. On Linux the
default is `~/.cache/imf_reader/<version>/`; on macOS `~/Library/Caches/imf_reader/<version>/`;
on Windows `%LOCALAPPDATA%\imf_reader\<version>\`. The version segment means upgrading the
package automatically starts with a clean cache.
- Users on Linux who want to reclaim disk space from the old hardcoded cache can run
`rm -rf ~/.cache/imf_reader/` after upgrading, or call `cache.set_cache_dir(...)` to keep
using the previous location.
- A new environment variable `IMF_READER_CACHE_DIR` lets you override the cache location
without changing code — useful on shared infrastructure or in CI.
- A new unified `imf_reader.cache` API replaces the scattered module-level helpers:
`clear_cache(scope=...)`, `set_cache_dir`, `reset_cache_dir`, `get_cache_dir`,
`enable_cache`, and `disable_cache`.
- WEO bulk SDMX downloads are now cached on disk and survive process restarts. A corrupted
zip is detected automatically and evicted; retrying the same call re-downloads cleanly
(`cache.BulkPayloadCorruptError` is raised so callers can handle it explicitly).
- SDR data (allocations and holdings, exchange rates, interest rates) now persists across
process restarts, matching the behaviour WEO users already had.
- `weo.clear_cache()` and `sdr.clear_cache()` continue to work and emit a `DeprecationWarning`
pointing at `cache.clear_cache()`. They will be removed in v2.0.

## v1.4.1 (2025-12-05)
- The new API implements a different scaling value behaviour. To preserve backwards compatibility, this new version
aligns with the old behaviour.
103 changes: 95 additions & 8 deletions README.md
@@ -62,13 +62,11 @@ print(weo.fetch_data.last_version_fetched)
```


Caching is used to avoid multiple requests to the IMF website for the same data and to enhance performance.
Caching using the LRU (Least Recently Used) algorithm approach and stores data in RAM. The cache is cleared when the program is terminated.
To clear the cache manually, use the `clear_cache` function.
#### Caching

```python
weo.clear_cache()
```
Caching is used to avoid multiple requests to the IMF website for the same data and to enhance
performance. See the [Caching](#caching) section below for full details on cache location,
environment variable overrides, and how to clear or redirect the cache.


For more advanced usage and tools for WEO data please use the [weo-reader package](https://github.com/epogrebnyak/weo-reader).
@@ -123,12 +121,101 @@ By default, the exchange rate is in USDs per 1 SDR. To get the exchange rate in
sdr.fetch_exchange_rates("USD")
```

To clear cached data use the `clear_cache` function.
To clear cached SDR data, see the [Caching](#caching) section below.


## Caching

`imf-reader` caches data to disk to avoid redundant requests and to survive process restarts.

### Cache location

The cache is stored in the platform-appropriate user cache directory, segmented by package
version so that upgrading the package starts with a clean cache automatically:

- **Linux:** `~/.cache/imf_reader/<version>/` (e.g. `~/.cache/imf_reader/1.5.0/`)
- **macOS:** `~/Library/Caches/imf_reader/<version>/`
- **Windows:** `%LOCALAPPDATA%\imf_reader\<version>\`

The version segment ensures that a package upgrade never silently serves data that was shaped
by an older version of the code.
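
The layout listed above can be approximated with the standard library alone. The following is a minimal sketch, not the package's implementation (which, judging by the dependency list, delegates to `platformdirs`). The helper name `default_cache_root` and its parameters are hypothetical, and honouring `XDG_CACHE_HOME` on Linux is an assumption.

```python
import os
import sys
from pathlib import Path
from typing import Mapping, Optional


def default_cache_root(
    app: str,
    version: str,
    platform: str = sys.platform,
    env: Optional[Mapping[str, str]] = None,
    home: Optional[Path] = None,
) -> Path:
    """Hypothetical helper: approximate the documented per-platform cache root."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else home
    if platform.startswith("win"):
        # %LOCALAPPDATA%\<app>\<version>\
        base = Path(env.get("LOCALAPPDATA", home / "AppData" / "Local"))
    elif platform == "darwin":
        # ~/Library/Caches/<app>/<version>/
        base = home / "Library" / "Caches"
    else:
        # ~/.cache/<app>/<version>/ (XDG_CACHE_HOME support is an assumption)
        base = Path(env.get("XDG_CACHE_HOME") or home / ".cache")
    # The trailing version segment gives each release its own directory.
    return base / app / version


print(default_cache_root("imf_reader", "1.5.0"))
```

Because the version is the last path segment, two installed versions never read each other's files, which is the property the paragraph above relies on.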

### Overriding the cache directory

Set the `IMF_READER_CACHE_DIR` environment variable before importing the package:

```bash
export IMF_READER_CACHE_DIR=/path/to/my/cache
```

Or redirect programmatically at runtime:

```python
sdr.clear_cache()
from imf_reader import cache

cache.set_cache_dir("/path/to/my/cache")
cache.get_cache_dir() # inspect the current path
cache.reset_cache_dir() # restore to the default platformdirs path
```

### Clearing the cache

The canonical way to clear all cached data:

```python
from imf_reader import cache

cache.clear_cache() # clear everything
cache.clear_cache(scope="weo") # WEO data only
cache.clear_cache(scope="sdr") # SDR data only
cache.clear_cache(scope="http") # HTTP-layer cache only
cache.clear_cache(scope="all") # equivalent to no scope argument
```

A scoped clear only touches the named scope: `cache.clear_cache(scope="sdr")`
removes SDR data and leaves the WEO and HTTP caches intact. The HTTP-layer
clear additionally closes the active cached HTTP session so subsequent calls
hit the network rather than reusing a dropped on-disk SQLite cache.
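
The scoping rule can be tried out against a throwaway directory. This sketch copies the scope-to-sublayer mapping from `src/imf_reader/cache/__init__.py` in this PR; `clear_scope` and `demo` are hypothetical stand-ins that take an explicit root instead of the real cache directory, and they leave out the HTTP session handling and clear listeners.

```python
import shutil
import tempfile
from pathlib import Path

# Mapping as it appears in this PR's cache/__init__.py.
SCOPE_TO_SUBLAYERS = {
    "weo": ("weo_sdmx", "weo_sdmx_parsed", "weo_api"),
    "sdr": ("sdr",),
    "http": ("http",),
}


def clear_scope(root: Path, scope: str = "all") -> None:
    """Hypothetical stand-in for cache.clear_cache, operating on an explicit root."""
    if not root.exists():
        return
    if scope == "all":
        # Walk the directory so sublayers added later are covered too.
        targets = [child for child in root.iterdir() if child.is_dir()]
    else:
        targets = [root / name for name in SCOPE_TO_SUBLAYERS[scope]]
    for path in targets:
        shutil.rmtree(path, ignore_errors=True)


def demo() -> list:
    """Create a few sublayers, clear only the SDR scope, and list what is left."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for name in ("weo_sdmx", "weo_api", "sdr", "http"):
            (root / name).mkdir()
        clear_scope(root, scope="sdr")
        return sorted(p.name for p in root.iterdir())


print(demo())
```

Only the `sdr` directory disappears; the WEO and HTTP sublayers stay in place.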

The legacy module-level helpers still work but emit a `DeprecationWarning` pointing at
`cache.clear_cache()`. They will be removed in v2.0:

```python
from imf_reader import weo, sdr

weo.clear_cache() # deprecated — use cache.clear_cache(scope="weo")
sdr.clear_cache() # deprecated — use cache.clear_cache(scope="sdr")
```
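
To find remaining calls to the deprecated helpers before v2.0, one option is to record warnings around a call and check for `DeprecationWarning`. The sketch below uses only the standard `warnings` module; `legacy_clear_cache` is a hypothetical stand-in for `weo.clear_cache()` / `sdr.clear_cache()` so the example runs without the package installed.

```python
import warnings


def legacy_clear_cache() -> None:
    """Hypothetical stand-in for a deprecated module-level helper."""
    warnings.warn(
        "clear_cache() is deprecated; use cache.clear_cache(scope=...)",
        DeprecationWarning,
        stacklevel=2,
    )


def calls_deprecated_api(func) -> bool:
    """Return True if calling func emits at least one DeprecationWarning."""
    with warnings.catch_warnings(record=True) as caught:
        # DeprecationWarning is often filtered out by default; record everything.
        warnings.simplefilter("always")
        func()
    return any(issubclass(w.category, DeprecationWarning) for w in caught)


print(calls_deprecated_api(legacy_clear_cache))
```

In a test suite, `warnings.simplefilter("error", DeprecationWarning)` would instead turn each such call into an exception.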

### Disabling the cache for development

Disable the cache for the lifetime of the current process. While disabled,
no payloads are written under the cache directory: bulk downloads land in a
system temp file used only for the current call, and dataframe results are
returned without being persisted.

```python
from imf_reader import cache

cache.disable_cache()
# ... work without caching ...
cache.enable_cache()
```
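
Pairing the two calls by hand leaves the cache disabled if the code in between raises. A small context manager avoids that. This is a sketch rather than part of the package: `cache_disabled` is a hypothetical name, and it takes the disable and enable functions as arguments so it runs without `imf_reader` installed.

```python
from contextlib import contextmanager


@contextmanager
def cache_disabled(disable, enable):
    """Hypothetical helper: call disable() on entry and enable() on exit."""
    disable()
    try:
        yield
    finally:
        # Runs even if the body raises, so caching is always switched back on.
        enable()


# Stand-in callables that only record the order of calls.
calls = []
with cache_disabled(lambda: calls.append("off"), lambda: calls.append("on")):
    calls.append("work")
print(calls)
```

With the real package this would read `with cache_disabled(cache.disable_cache, cache.enable_cache): ...`. Note that the helper always re-enables on exit, even if caching was already disabled before the block.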

### Corrupted bulk downloads

If a WEO bulk SDMX download is corrupted, it is automatically evicted from the cache and a
`cache.BulkPayloadCorruptError` is raised. Re-running the same call will trigger a fresh
download:

```python
from imf_reader import cache, weo

try:
    df = weo.fetch_data()
except cache.BulkPayloadCorruptError:
    # The corrupt payload has already been evicted, so one retry re-downloads it.
    df = weo.fetch_data()
```
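
The detect-and-evict step can be illustrated with the standard `zipfile` module. This is a sketch of the general technique, not the package's code: `evict_if_corrupt` and `demo` are hypothetical names, and the real implementation may use a different integrity check.

```python
import tempfile
import zipfile
from pathlib import Path


def evict_if_corrupt(path: Path) -> bool:
    """Delete path if it is not a readable zip archive; return True if evicted."""
    try:
        with zipfile.ZipFile(path) as archive:
            # testzip() returns the first member with a bad CRC, or None.
            bad_member = archive.testzip()
    except zipfile.BadZipFile:
        bad_member = path.name
    if bad_member is None:
        return False
    path.unlink(missing_ok=True)
    return True


def demo() -> tuple:
    """Check one valid and one corrupt archive in a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        good = Path(tmp) / "good.zip"
        with zipfile.ZipFile(good, "w") as archive:
            archive.writestr("data.xml", "<root/>")
        bad = Path(tmp) / "bad.zip"
        bad.write_bytes(b"this is not a zip archive")
        good_evicted = evict_if_corrupt(good)
        bad_evicted = evict_if_corrupt(bad)
        return good_evicted, good.exists(), bad_evicted, bad.exists()


print(demo())
```

Evicting the file before raising is what lets a plain retry succeed: the next call finds no cached payload and downloads a fresh one.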

## Contributing

9 changes: 6 additions & 3 deletions pyproject.toml
@@ -1,6 +1,6 @@
[project]
name = "imf-reader"
version = "1.4.1"
version = "1.5.0"
description = "A package to access imf data"
authors = [{ name = "The ONE Campaign" }]
license = { text = "MIT" }
@@ -11,13 +11,16 @@ dependencies = [
"requests>=2.32.1",
"chardet>=5.2.0",
"beautifulsoup4>=4.12.3",
"diskcache>=5.6.0",
"filelock>=3.20",
"platformdirs>=4.5",
"pyarrow>=14.0",
"requests-cache>=1.2",
]

[project.optional-dependencies]
dev = [
"pytest>=8.2.0",
"black>=24.4.2",
"ruff>=0.6",
"sphinx>=7.3.7",
"myst-nb>=1.1.0",
"autoapi>=2.0.1",
2 changes: 2 additions & 0 deletions src/imf_reader/__init__.py
@@ -2,3 +2,5 @@
from importlib.metadata import version

__version__ = version("imf_reader")

from imf_reader import cache as cache # noqa: F401
127 changes: 127 additions & 0 deletions src/imf_reader/cache/__init__.py
@@ -0,0 +1,127 @@
"""Public API for the imf_reader cache subpackage.

Provides a unified interface for cache management across all imf_reader
sub-modules (WEO, SDR).

Usage::

    import imf_reader.cache as cache

    cache.clear_cache()            # clear everything
    cache.clear_cache(scope="weo") # clear only WEO sublayers
    cache.set_cache_dir("/tmp/my_cache")
    cache.reset_cache_dir()
    print(cache.get_cache_dir())
    cache.disable_cache()
    cache.enable_cache()
"""

import shutil
from typing import Literal

from imf_reader.config import BulkPayloadCorruptError as BulkPayloadCorruptError
from imf_reader.config import logger
from imf_reader.cache.config import (
    _clear_listeners,
    _set_enabled,
    get_active_root,
    get_bulk_cache_dir as get_bulk_cache_dir,
    get_dataframe_cache_dir as get_dataframe_cache_dir,
    get_http_cache_path as get_http_cache_path,
    reset_cache_dir as reset_cache_dir,
    set_cache_dir as set_cache_dir,
)

# Expose get_cache_dir as the public name (I3 — mirrors oda_reader._cache.config).
get_cache_dir = get_active_root

_SCOPE_TO_SUBLAYERS: dict[str, tuple[str, ...]] = {
    "weo": ("weo_sdmx", "weo_sdmx_parsed", "weo_api"),
    "sdr": ("sdr",),
    "http": ("http",),
}


def clear_cache(scope: Literal["all", "weo", "sdr", "http"] = "all") -> None:
    """Clear cached data for the named scope.

    Args:
        scope: Which sublayers to remove.

            - ``"all"`` (default) — remove every immediate subdir of the cache root.
              Uses a filesystem walk so future sublayers are automatically included
              (avoids the F1 failure mode of silently leaking newly-added sublayers).
            - ``"weo"`` — remove ``weo_sdmx``, ``weo_sdmx_parsed``, and ``weo_api``.
            - ``"sdr"`` — remove the ``sdr`` sublayer.
            - ``"http"`` — remove the ``http`` sublayer.
    """
    root = get_active_root()
    target_sublayers: set[str] | None = (
        None if scope == "all" else set(_SCOPE_TO_SUBLAYERS[scope])
    )

    # Close the HTTP session before rmtree-ing its SQLite file: on Windows the
    # open file would block deletion, and on Unix a stale connection can keep
    # serving rows from the deleted DB until the process exits.
    if target_sublayers is None or "http" in target_sublayers:
        from imf_reader.cache import http as _http

        _http._on_http_clear()

    if root.exists():
        if scope == "all":
            # Walk every immediate subdir — no hardcoded list (I5 / decision 17).
            for child in root.iterdir():
                if child.is_dir():
                    shutil.rmtree(child, ignore_errors=True)
        else:
            for sublayer in _SCOPE_TO_SUBLAYERS[scope]:
                path = root / sublayer
                if path.exists():
                    shutil.rmtree(path, ignore_errors=True)

    # Fire listeners (even when disk was empty) so in-memory state is reset —
    # but only for sublayers in scope, so a scope='sdr' clear can't wipe weo_api.
    _fire_clear_listeners(target_sublayers)


def _fire_clear_listeners(target_sublayers: set[str] | None) -> None:
    for sublayer, cb in list(_clear_listeners):
        if target_sublayers is not None and sublayer not in target_sublayers:
            continue
        try:
            cb()
        except Exception as exc:
            logger.warning("clear-listener %r raised: %s", cb, exc)


def enable_cache() -> None:
    """Re-enable caching after a previous disable_cache() call.

    Has no effect if caching is already enabled.
    """
    _set_enabled(True)


def disable_cache() -> None:
    """Disable caching for this process.

    All decorated functions bypass both the read and write cache paths and call
    through to the underlying function directly. Has no effect on already-cached
    data on disk.
    """
    _set_enabled(False)


__all__ = [
    "BulkPayloadCorruptError",
    "clear_cache",
    "disable_cache",
    "enable_cache",
    "get_bulk_cache_dir",
    "get_cache_dir",
    "get_dataframe_cache_dir",
    "get_http_cache_path",
    "reset_cache_dir",
    "set_cache_dir",
]