27 commits
d20d386
Updated doc comments above DRef, poolget, poolset
jlim5634 Apr 11, 2026
08a9f30
updated index.md and make.jl for MemPool
jlim5634 Apr 11, 2026
3a0c9c8
updated PR and fixed Julian's comments
jlim5634 Apr 12, 2026
4f7f172
Remove assets folder from PR
jlim5634 Apr 12, 2026
3beb1c5
Update docs/src/index.md
jlim5634 Apr 24, 2026
70f8b26
Update docs/src/index.md
jlim5634 Apr 24, 2026
5ca4a83
Update docs/src/index.md
jlim5634 Apr 24, 2026
7b35686
Update docs/src/index.md
jlim5634 Apr 24, 2026
0b04553
Update docs/src/index.md
jlim5634 Apr 24, 2026
c410bbc
Update docs/make.jl
jlim5634 Apr 24, 2026
dc40742
Update docs/make.jl
jlim5634 Apr 24, 2026
1543138
Update docs/src/index.md
jlim5634 Apr 24, 2026
7693c35
Update docs/src/index.md
jlim5634 Apr 24, 2026
f47fcee
Update docs/src/index.md
jlim5634 Apr 24, 2026
c6924fc
Update docs/src/index.md
jlim5634 Apr 24, 2026
f309923
Update docs/src/index.md
jlim5634 Apr 24, 2026
41ef1fa
Update docs/src/index.md
jlim5634 Apr 24, 2026
64cc98b
Update docs/src/index.md
jlim5634 Apr 24, 2026
69d23fe
Update docs/src/index.md
jlim5634 Apr 24, 2026
0605254
Update docs/src/index.md
jlim5634 Apr 24, 2026
aeed2d8
Update docs/src/index.md
jlim5634 Apr 24, 2026
8613fb9
Update docs/src/index.md
jlim5634 Apr 24, 2026
477211f
Update docs/src/index.md
jlim5634 Apr 24, 2026
9998c3a
Update index.md
jlim5634 Apr 24, 2026
dbc6e45
Fixed changes that were copied from Dagger
jlim5634 Apr 24, 2026
99fd558
Merge branch 'setup-mempool-docs' of https://github.com/jlim5634/MemP…
jlim5634 Apr 24, 2026
6b61b0b
Update docs/make.jl
jlim5634 Apr 24, 2026
10 changes: 10 additions & 0 deletions docs/Project.toml
@@ -0,0 +1,10 @@
[deps]
Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
MemPool = "f9f48841-c794-520a-933b-121f7ba6ed94"

[compat]
Documenter = "1"
julia = "1.11"

[sources]
MemPool = {path = ".."}
25 changes: 25 additions & 0 deletions docs/make.jl
@@ -0,0 +1,25 @@
using MemPool
using Documenter
import Documenter.Remotes: GitHub

makedocs(;
    modules = [MemPool],
    authors = "JuliaParallel and contributors",
    repo = GitHub("JuliaParallel", "MemPool.jl"),
    sitename = "MemPool.jl",
    format = Documenter.HTML(;
        prettyurls = get(ENV, "CI", "false") == "true",
        canonical = "https://juliaparallel.github.io/MemPool.jl",
        # assets = String["assets/favicon.ico"],
    ),
    pages = [
        "Home" => "index.md",
        "API Reference" => "api.md",
    ],
    warnonly = [:missing_docs],
)

deploydocs(;
    repo = "github.com/JuliaParallel/MemPool.jl",
    devbranch = "master",
)
Empty file added docs/src/api.md
Empty file.
132 changes: 132 additions & 0 deletions docs/src/index.md
@@ -0,0 +1,132 @@
# MemPool: A framework for out-of-core and parallel execution

MemPool.jl is both a framework and an in-memory database for storing and accessing Julia
objects, where those objects may live on local or remote (distributed) Julia processes.
This allows for communicating about data stored on remote workers, and even data
potentially paged-out to disk, with a single simple reference (the `DRef`).

As a database, MemPool stores references to objects, and also acts as a "gatekeeper"
when those objects are later accessed through their reference. It can be configured to
page data out to disk; when paged-out data is accessed again, MemPool pages out other
data to make space in RAM for the newly loaded data. This allows MemPool to provide
"out-of-core" data management for libraries and applications. Dagger.jl is one such
library that uses MemPool for this purpose.


### Remote Workers Caveat

When using MemPool with multiple workers, make sure that the workers are
initialized *before* importing MemPool. This ensures the package is loaded on all nodes:
```julia-repl
julia> using Distributed

julia> addprocs(2)

julia> using MemPool
```

-----

## Quickstart: Data Management

For more details: [Data Management](@ref)

The core of MemPool revolves around the `DRef` (Distributed Reference). A `DRef` is a pointer
to data that might live in local RAM, remote RAM, or on disk.

### Creating and retrieving data

Use `poolset` to register data with the pool and `poolget` to retrieve the actual value:

```julia
using MemPool

A = rand(1000, 1000)
ref = poolset(A)

A_retrieved = poolget(ref)
```
This will track a large array (`A`) as a `DRef` using `poolset(A)`.
You can now safely clear the reference `A` (such as by `A = nothing`),
and later retrieve `A` from the `DRef` using `poolget(ref)`.
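
To see what the handle itself carries, here is a minimal sketch that assumes a single local process and relies only on the `owner` and `size` fields documented on `DRef`. The exact value of `size` is an approximation and may differ between versions.

```julia
using MemPool

A = rand(1000, 1000)
ref = poolset(A)            # register A with the pool; returns a DRef

# Fields documented on DRef: the owning worker and an approximate byte size
println("owner = ", ref.owner, ", approx. bytes = ", ref.size)

A = nothing                 # drop the local binding; the pool still holds the data
B = poolget(ref)            # fetch the value back through the handle
println(size(B))
```

The point of the sketch is that the `DRef`, not the original variable, keeps the data reachable.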


### Manual Worker Assignment

You can force data to be stored on a specific worker by passing a worker ID to `poolset`:

```julia
ref_w2 = poolset(rand(500), 2)
```

Note that if the current worker is not worker 2, this will make a copy of the array
from `rand(500)` on worker 2, and will not share memory with the original array.
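
A fuller version of this call, as a sketch that assumes workers can be started locally with `addprocs`. The variable `w` is introduced here for illustration, so that the example does not depend on the worker ID being exactly 2.

```julia
using Distributed
addprocs(2)                  # start workers before loading MemPool
using MemPool

w = first(workers())         # ID of one of the new workers
v = rand(500)
ref_w = poolset(v, w)        # store a copy of `v` on worker `w`

println("owner = ", ref_w.owner)
v_back = poolget(ref_w)      # transfers the data back to the calling process
println(length(v_back))
```

Because the data lives on another process, `v_back` is a separate array and mutating it does not affect the stored copy.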

## Quickstart: Out-of-Core Configuration

MemPool provides helper functions to set up out-of-core data management for all
`DRef`s created with `poolset`.

### Enabling the Disk Cache

```julia
# 1. Define the configuration
cfg = MemPool.DiskCacheConfig(
    toggle = true,
    membound = 4 * 1024^3,            # 4 GiB RAM limit
    diskpath = "/tmp/mempool_cache",  # Disk storage location (a directory)
    allocator_type = "LRU"            # Least Recently Used eviction
)

# 2. Apply the configuration
MemPool.setup_global_device!(cfg)
```

When the total byte size of data tracked by MemPool exceeds `membound`,
MemPool takes corrective action, such as triggering a GC sweep or swapping other
data to `diskpath` and removing that data from memory. Note that `diskpath`
must be a directory, because each piece of data gets its own file.

### Memory Reservation Logic

MemPool includes an `ensure_memory_reserved` mechanism, which prevents memory
usage from exceeding a set global memory boundary. When `poolset` is called,
the system checks whether the OS is running low on free memory. If so, it will:
1. Trigger a local GC.
2. If memory is still tight, trigger a full `GC.gc(true)`.
3. Finally, trigger a cluster-wide GC (`@everywhere GC.gc(true)`).

This mechanism is separate from the `DiskCacheConfig` logic, and can be configured by
tuning `MemPool.MEM_RESERVED[]` (this is specified in terms of the minimum number
of bytes that must be free for use by the OS).
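
The escalation above can be sketched in plain Julia. This is an illustration of the pattern only, not MemPool's implementation: `reclaim!`, `steps`, and `reserved` are hypothetical names, and treating the "local GC" step as an incremental `GC.gc(false)` is an assumption.

```julia
using Distributed

# Illustrative only: run reclaim steps in order, stopping once enough memory is free.
# Returns the number of steps that were actually executed.
function reclaim!(free_bytes, reserved, steps)
    ran = 0
    for step in steps
        free_bytes() >= reserved && break
        step()
        ran += 1
    end
    return ran
end

steps = [
    () -> GC.gc(false),                                            # 1. incremental local GC (assumed)
    () -> GC.gc(true),                                             # 2. full local GC
    () -> foreach(p -> remotecall_wait(GC.gc, p, true), procs()),  # 3. full GC on every process
]

reserved = 512 * 1024^2   # assumed threshold: keep 512 MiB free for the OS
ran = reclaim!(() -> Sys.free_memory(), reserved, steps)
println("reclaim steps run: ", ran)
```

In this sketch, `reserved` plays the role that `MemPool.MEM_RESERVED[]` plays in MemPool: raising it makes the checks trigger earlier.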


## Quickstart: Persistence & Migration

### Migrating Data Between Workers

If necessary, data can be moved (copied) from one worker to another, without breaking
existing `DRef` references:

```julia
# Move data from current owner to worker 3
new_ref = MemPool.migrate!(ref, 3)
```

`migrate!` returns `new_ref`, a reference to the newly copied data on worker 3, but
accesses to `ref` also redirect automatically to `new_ref` during `poolget`.
This makes it possible to migrate data seamlessly when it would be more efficient
to read it from another worker. Dagger.jl relies on this mechanism in its streaming
API to migrate streaming tasks to other workers while they run.
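
A self-contained sketch of the migration, assuming the call is made from the process that currently owns the data and that a worker can be started locally. The names `w` and `data` are introduced here for illustration.

```julia
using Distributed
addprocs(1)                          # start the worker before loading MemPool
using MemPool

w = first(workers())
data = rand(100)
ref = poolset(data)                  # owned by the current process
new_ref = MemPool.migrate!(ref, w)   # copy the data to worker `w`

println("new owner: ", new_ref.owner)
println(poolget(ref) == data)        # the old handle should resolve via the redirect
```

Code that already holds `ref` does not need to be updated after the migration; passing around `new_ref` simply avoids the extra redirect.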

### Managed File I/O

Treat files as managed `DRef` objects to avoid loading massive datasets into RAM all at once:

```julia
using Dagger

# Create a lazy reference (handled by Dagger)
f = Dagger.File("large_dataset.jls")

# When you fetch, MemPool manages the resulting memory
data = fetch(f)
```
19 changes: 19 additions & 0 deletions src/datastore.jl
@@ -6,6 +6,13 @@ else
import Distributed: ClusterSerializer, worker_id_from_socket
end

"""
    DRef(owner::Int, id::Int, size::UInt)

A Distributed Reference (`DRef`): a handle to data stored in MemPool.
It records which worker (`owner`) holds the data and a unique `id` assigned to the data.
`size` stores an approximation of the in-memory byte size of the object.
"""
mutable struct DRef
owner::Int
id::Int
@@ -451,6 +458,12 @@ function ensure_memory_reserved(size::Integer=0; max_sweeps::Integer=MEM_RESERVE
end
end

"""
    poolset(x, [pid]; kwargs...) -> DRef

Stores the value `x` in the memory pool on worker `pid` (defaults to `myid()`)
and returns a `DRef` handle that can later be used to access the value.
"""
function poolset(@nospecialize(x), pid=myid(); size=approx_size(x),
retain=false, restore=false,
device=GLOBAL_DEVICE[], leaf_device=initial_leaf_device(device),
@@ -523,6 +536,12 @@ function forwardkeyerror(f)
end
end

"""
    poolget(ref::DRef)

Retrieves the value referenced by `ref`. If the data is remote or
on disk, MemPool handles the retrieval automatically.
"""
function poolget(ref::DRef)
DEBUG_REFCOUNTING[] && _enqueue_work(Core.print, "?? (", ref.owner, ", ", ref.id, ") at ", myid(), "\n")
return access_ref(identity, ref)