Updated doc for src/datastore.jl #95
Open
jlim5634
wants to merge
27
commits into
JuliaData:master
from
jlim5634:setup-mempool-docs
d20d386
Updated doc comments above DRef, poolget, poolset
jlim5634 08a9f30
updated index.md and make.jl for MemPool
jlim5634 3a0c9c8
updated PR and fixed Julian's comments
jlim5634 4f7f172
Remove assets folder from PR
jlim5634 3beb1c5
Update docs/src/index.md
jlim5634 70f8b26
Update docs/src/index.md
jlim5634 5ca4a83
Update docs/src/index.md
jlim5634 7b35686
Update docs/src/index.md
jlim5634 0b04553
Update docs/src/index.md
jlim5634 c410bbc
Update docs/make.jl
jlim5634 dc40742
Update docs/make.jl
jlim5634 1543138
Update docs/src/index.md
jlim5634 7693c35
Update docs/src/index.md
jlim5634 f47fcee
Update docs/src/index.md
jlim5634 c6924fc
Update docs/src/index.md
jlim5634 f309923
Update docs/src/index.md
jlim5634 41ef1fa
Update docs/src/index.md
jlim5634 64cc98b
Update docs/src/index.md
jlim5634 69d23fe
Update docs/src/index.md
jlim5634 0605254
Update docs/src/index.md
jlim5634 aeed2d8
Update docs/src/index.md
jlim5634 8613fb9
Update docs/src/index.md
jlim5634 477211f
Update docs/src/index.md
jlim5634 9998c3a
Update index.md
jlim5634 dbc6e45
Fixed changes that were copied from Dagger
jlim5634 99fd558
Merge branch 'setup-mempool-docs' of https://github.com/jlim5634/MemP…
jlim5634 6b61b0b
Update docs/make.jl
jlim5634
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,10 @@ | ||
| [deps] | ||
| Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4" | ||
| MemPool = "f9f48841-c794-520a-933b-121f7ba6ed94" | ||
|
|
||
| [compat] | ||
| Documenter = "1" | ||
| julia = "1.11" | ||
|
|
||
| [sources] | ||
| MemPool = {path = ".."} |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,25 @@ | ||
| using MemPool | ||
| using Documenter | ||
| import Documenter.Remotes: GitHub | ||
|
|
||
| makedocs(; | ||
| modules = [MemPool], | ||
| authors = "JuliaParallel and contributors", | ||
| repo = GitHub("JuliaParallel", "MemPool.jl"), | ||
| sitename = "MemPool.jl", | ||
| format = Documenter.HTML(; | ||
| prettyurls = get(ENV, "CI", "false") == "true", | ||
| canonical = "https://juliaparallel.github.io/MemPool.jl", | ||
| # assets = String["assets/favicon.ico"], | ||
| ), | ||
| pages = [ | ||
| "Home" => "index.md", | ||
| "API Reference" => "api.md", | ||
| ], | ||
| warnonly = [:missing_docs] | ||
| ) | ||
|
|
||
| deploydocs(; | ||
| repo = "github.com/JuliaParallel/MemPool.jl", | ||
| devbranch = "master", | ||
| ) |
Empty file.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,132 @@ | ||
| # MemPool: A framework for out-of-core and parallel execution | ||
|
|
||
| MemPool.jl is both a framework and in-memory database for storing and accessing Julia | ||
| objects, where those objects may live on local or remote (distributed) Julia processes. | ||
| This allows for communicating about data stored on remote workers, and even data | ||
| potentially paged-out to disk, with a single simple reference (the `DRef`). | ||
|
|
||
| As a database, MemPool stores references to objects, and also acts as a "gatekeeper" | ||
| when those objects are later accessed through their reference. It can be configured to | ||
| page data out to disk, and then when data is accessed, will page out other data to make | ||
| space in RAM for this newly-loaded data. This allows MemPool to provide "out-of-core" | ||
| data management for libraries or applications - Dagger.jl is one such library that utilizes | ||
| MemPool for this purpose. | ||
|
|
||
|
|
||
| ### Remote Workers Caveat | ||
|
|
||
| When using MemPool with multiple workers, make sure that the workers are | ||
| initialized *before* importing MemPool. This ensures the package is loaded on all nodes: | ||
| ```julia-repl | ||
| julia> using Distributed | ||
|
|
||
| julia> addprocs(2) | ||
|
|
||
| julia> using MemPool | ||
| ``` | ||
|
|
||
| ----- | ||
|
|
||
| ## Quickstart: Data Management | ||
|
|
||
| For more details: [Data Management](@ref) | ||
|
|
||
| The core of MemPool revolves around the `DRef` (Distributed Reference). A `DRef` is a pointer | ||
| to data that might live in local RAM, remote RAM, or on disk. | ||
|
|
||
| ### Creating and retrieving data | ||
|
|
||
| Use `poolset` to register data with the pool and `poolget` to retrieve the actual value: | ||
|
|
||
| ```julia | ||
| using MemPool | ||
|
|
||
| A = rand(1000, 1000) | ||
| ref = poolset(A) | ||
|
|
||
| A_retrieved = poolget(ref) | ||
| ``` | ||
| This will track a large array (`A`) as a `DRef` using `poolset(A)`. | ||
| You can now safely clear the reference `A` (such as by `A = nothing`), | ||
| and later retrieve `A` from the `DRef` using `poolget(ref)`. | ||
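
As a minimal sketch of the round trip described above, the snippet below uses a small array in place of the large one and clears the local binding before retrieving the value. The names `expected` and `restored` are illustrative and not part of MemPool.

```julia
using MemPool

A = rand(4, 4)
expected = copy(A)     # independent copy, kept only to verify the round trip
ref = poolset(A)       # register A with the pool; returns a DRef

A = nothing            # drop the local binding; the pool still holds the data
restored = poolget(ref)

@assert restored == expected
```

Only `ref` needs to be kept around (or sent to other workers); the array itself is looked up on demand.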
|
|
||
|
|
||
| ### Manual Worker Assignment | ||
|
|
||
| You can force data to be stored on a specific worker by passing a worker ID to `poolset`: | ||
|
|
||
| ```julia | ||
| ref_w2 = poolset(rand(500), 2) | ||
| ``` | ||
|
|
||
| Note that if the current worker is not worker 2, this will make a copy of the array | ||
| from `rand(500)` on worker 2, and will not share memory with the original array. | ||
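
The copy semantics can be checked with a short sketch. It assumes a fresh session in which `addprocs(1)` yields worker 2, and that the worker can load MemPool from the same environment; `x` and `y` are illustrative names.

```julia
using Distributed
addprocs(1)                  # in a fresh session this creates worker 2
@everywhere using MemPool    # load MemPool on every process after adding workers

x = rand(500)                # values lie in [0, 1)
ref_w2 = poolset(x, 2)       # a copy of x is stored on worker 2

x[1] = -1.0                  # mutate the local array only
y = poolget(ref_w2)          # fetch the stored copy back from worker 2

@assert y[1] != -1.0         # the stored copy did not see the local mutation
```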
|
|
||
| ## Quickstart: Out-of-Core Configuration | ||
|
|
||
| MemPool provides helper functions to set up out-of-core data management for all | ||
| `DRef`s created with `poolset`. | ||
|
|
||
| ### Enabling the Disk Cache | ||
|
|
||
| ```julia | ||
| # 1. Define the configuration | ||
| cfg = MemPool.DiskCacheConfig( | ||
| toggle = true, | ||
| membound = 4 * 1024^3, # 4 GiB RAM limit | ||
| diskpath = "/tmp/mempool_cache", # Disk storage location | ||
| allocator_type = "LRU" # Least Recently Used eviction | ||
| ) | ||
|
|
||
| # 2. Apply the configuration | ||
| MemPool.setup_global_device!(cfg) | ||
| ``` | ||
|
|
||
|
jlim5634 marked this conversation as resolved.
|
||
| When the amount of data tracked by MemPool exceeds `membound` (in bytes), | ||
| MemPool responds by, for example, triggering a GC sweep or swapping other | ||
| data out to `diskpath` and removing it from memory. Note that `diskpath` | ||
| must be a directory, because each piece of data gets its own file. | ||
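
One way to choose these two values, sketched with the Julia standard library only. The "half of physical RAM" rule and the `cache_dir` name are assumptions for illustration, not MemPool defaults, and whether MemPool requires the directory to exist beforehand is not stated here; creating it up front is harmless either way.

```julia
# Byte budget: half of the machine's physical memory, as a plain Int.
membound = Int(Sys.total_memory() ÷ 2)

# Cache location: a dedicated directory (not a single file) under the temp dir.
cache_dir = joinpath(tempdir(), "mempool_cache")
mkpath(cache_dir)            # creates the directory if it does not exist yet

println("membound = ", membound, " bytes, diskpath = ", cache_dir)
```

These values could then be passed as `membound` and `diskpath` in the `DiskCacheConfig` call shown earlier in this section.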
|
|
||
| ### Memory Reservation Logic | ||
|
|
||
| MemPool includes an `ensure_memory_reserved` mechanism, which prevents memory | ||
| usage from exceeding a set global memory boundary. When `poolset` is called, | ||
| the system checks whether the OS is running low on memory. If so, it will: | ||
| 1. Trigger a local GC. | ||
| 2. If memory is still tight, trigger a full `GC.gc(true)`. | ||
| 3. Finally, trigger a cluster-wide GC (`@everywhere GC.gc(true)`). | ||
|
|
||
| This mechanism is separate from the `DiskCacheConfig` logic, and can be configured by | ||
| tuning `MemPool.MEM_RESERVED[]` (this is specified in terms of the minimum number | ||
| of bytes that must be free for use by the OS). | ||
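
The escalation order listed above can be illustrated with a standalone sketch. This is not MemPool's implementation: `memory_is_tight` and `reserve_sketch` are hypothetical names, the `reserved` argument stands in for the value held in `MemPool.MEM_RESERVED[]`, and treating the first step as an incremental `GC.gc(false)` is an assumption.

```julia
using Distributed

# Hypothetical helper: true when the OS reports fewer free bytes than `reserved`.
memory_is_tight(reserved) = Sys.free_memory() < reserved

# Returns a symbol naming the last step that had to run.
function reserve_sketch(reserved)
    memory_is_tight(reserved) || return :enough_memory
    GC.gc(false)                          # 1. quick local collection
    memory_is_tight(reserved) || return :local_gc
    GC.gc(true)                           # 2. full local collection
    memory_is_tight(reserved) || return :full_gc
    for p in procs()                      # 3. full collection on every process
        remotecall_wait(GC.gc, p, true)
    end
    return :cluster_gc
end

reserve_sketch(0)   # a reservation of 0 bytes is always satisfied
```

Each step runs only if the previous one did not free enough memory, so the cheap collection is tried first and the cluster-wide collection is the last resort.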
|
|
||
|
|
||
| ## Quickstart: Persistence & Migration | ||
|
|
||
| ### Migrating Data Between Workers | ||
|
|
||
| If necessary, data can be moved (copied) from one worker to another, without breaking | ||
| existing `DRef` references: | ||
|
|
||
| ```julia | ||
| # Move data from current owner to worker 3 | ||
| new_ref = MemPool.migrate!(ref, 3) | ||
| ``` | ||
|
|
||
|
jlim5634 marked this conversation as resolved.
|
||
| While it does return `new_ref` (a reference to the newly-copied data on worker 3), | ||
| accesses to `ref` will also automatically redirect to `new_ref` during `poolget`. | ||
| This can be very helpful to seamlessly migrate data when it would be more efficient | ||
| to read the data from another worker. Dagger.jl uses this mechanism in its streaming | ||
| API to migrate streaming tasks to other workers while they run. | ||
|
|
||
| ### Managed File I/O | ||
|
|
||
| Treat files as managed `DRef` objects to avoid loading massive datasets into RAM all at once: | ||
|
|
||
| ```julia | ||
| using Dagger | ||
| # Create a lazy reference (handled by Dagger) | ||
| f = Dagger.File("large_dataset.jls") | ||
|
|
||
| # When you fetch, MemPool manages the resulting memory | ||
| data = fetch(f) | ||
| ``` | ||
|
jlim5634 marked this conversation as resolved.
|
||