ryanpeach · ryanpeach · Nov 7, 2025 · Aug 3, 2025 · Aug 3, 2025 · Aug 3, 2025
diff --git a/.github/pull_request_template.md b/.github/pull_request_template.md
@@ -4,6 +4,8 @@ Here are some tasks to complete before merging this PR:
 
 # Styleguide
 
+- [ ] Did you update the version in Cargo.toml?
+
 ## Jupyter
 
 - [ ] Add references to the bottom of the notebook.
@@ -28,9 +30,10 @@ if __name__ == '__main__':
     foo()
 ```
 
-## Logseq
+## Marksman
 
-- [ ] Make logseq notes and flash cards.
-- [ ] Use singular nouns for tags.
-- [ ] Use `-` or `_` in filenames instead of spaces.
-- [ ] Use aliases for spaces and plurals.
+- [ ] Make marksman notes and anki flash cards.
+- [ ] Use singular nouns for filenames.
+- [ ] Use `-` in filenames instead of spaces.
+- [ ] For any new notes, go back through the other notes and link to the new ones using project search.
+- [ ] Put sources on each flashcard. Mark AI as the source if you used AI to generate the flashcard. These can be revisited later. Mark yourself as the source if you wrote the flashcard yourself.
diff --git a/.github/workflows/ci.yaml b/.github/workflows/ci.yaml
@@ -10,12 +10,12 @@ jobs:
     steps:
       - uses: actions/checkout@v4
         with:
-          submodules: 'true'
+          submodules: "true"
 
       - name: Install Python
         uses: actions/setup-python@v4
         with:
-          python-version: '3.11'
+          python-version: "3.11"
 
       - name: Install Just
         uses: extractions/setup-just@v2

diff --git a/.gitignore b/.gitignore
@@ -1,3 +1,4 @@
+anki.apkg
 notes/logseq/bak
 notes/logseq/.recycle
 

diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
@@ -1,5 +1,5 @@
 repos:
--   repo: https://github.com/pre-commit/pre-commit-hooks
+  - repo: https://github.com/pre-commit/pre-commit-hooks
     rev: v5.0.0
     hooks:
       - id: end-of-file-fixer
@@ -11,38 +11,38 @@ repos:
       - id: detect-private-key
       - id: mixed-line-ending
         args: ["--fix=lf"]
-- repo: https://github.com/Lucas-C/pre-commit-hooks
-  rev: v1.4.1
-  hooks:
-    - id: remove-tabs
-    - id: remove-crlf
-- repo: https://github.com/astral-sh/ruff-pre-commit
-  # Ruff version.
-  rev: v0.9.6
-  hooks:
-    # Run the linter.
-    - id: ruff
-      types_or: [ python, pyi, jupyter ]
-      args: [ --fix ]
-    # Run the formatter.
-    - id: ruff-format
-      types_or: [ python, pyi, jupyter ]
-- repo: https://github.com/RobertCraigie/pyright-python
-  rev: v1.1.394
-  hooks:
-  - id: pyright
-- repo: https://github.com/mwouts/jupytext
-  rev: v1.16.7
-  hooks:
-  - id: jupytext
-    args: [--sync]
-- repo: https://github.com/ryanpeach/mdlinker
-  rev: v1.6.1
-  hooks:
-    - id: enforce-ascii
-      files: notes/pages/.*\.md
-    - id: mdlinker
-      files: notes/pages/.*\.md
-      args:
-      - "--fix"
-      - "--allow-dirty"
+  - repo: https://github.com/Lucas-C/pre-commit-hooks
+    rev: v1.4.1
+    hooks:
+      - id: remove-tabs
+      - id: remove-crlf
+  - repo: https://github.com/astral-sh/ruff-pre-commit
+    # Ruff version.
+    rev: v0.9.6
+    hooks:
+      # Run the linter.
+      - id: ruff
+        types_or: [python, pyi, jupyter]
+        args: [--fix]
+      # Run the formatter.
+      - id: ruff-format
+        types_or: [python, pyi, jupyter]
+  - repo: https://github.com/RobertCraigie/pyright-python
+    rev: v1.1.394
+    hooks:
+      - id: pyright
+  - repo: https://github.com/mwouts/jupytext
+    rev: v1.16.7
+    hooks:
+      - id: jupytext
+        args: [--sync]
+  # - repo: https://github.com/ryanpeach/mdlinker
+  #   rev: v1.7.2
+  #   hooks:
+  #     - id: enforce-ascii
+  #       files: continuing_education/.*\.md
+  #     - id: mdlinker
+  #       files: continuing_education/.*\.md
+  #       args:
+  #         - "--fix"
+  #         - "--allow-dirty"
diff --git a/Justfile b/Justfile
@@ -5,3 +5,6 @@ add path:
 # Sync all notebooks
 sync:
   uv run jupytext --sync **/*.ipynb
+
+anki:
+  mdanki continuing_education anki.apkg
diff --git a/README.md b/README.md
@@ -26,7 +26,7 @@ If you then wrap cells (the code you run) in `if __name__ == "__main__"` you als
 
 ## Logseq
 
-I use [logseq](https://logseq.com/) to manage my [./notes](./notes) and [flashcards](https://hub.logseq.com/use-cases/1Sr4awszMQzD4GM5KvWim7/how-to-quickly-create-spaced-repetition-flashcards-in-logseq-using-notepad-and-excel/jxPxEdkM4BBhLfrz9PfjBr). This uses the [zettelkasten](https://zettelkasten.de/overview/) method to create a [knowledge graph](https://en.wikipedia.org/wiki/Knowledge_graph), which is perfect for studying.
+I use [marksman](https://github.com/artempyanykh/marksman) to manage my markdown notes and link them to each other. I then use [mdanki](https://github.com/ashlinchak/mdanki) to convert them to flashcards.
 
 # Styleguide
 
@@ -52,7 +52,7 @@ See [.github/pull_request_template.md](.github/pull_request_template.md) for the
 * Reinforcement Learning
     * Value Based Methods - I'm pretty much up to date with these methods, but might as well implement them. I may go into less explanation though.
         * 📖📓💻 [$TD(\lambda)$](https://web.stanford.edu/class/cs234/notes/cs234-notes7.pdf)
-        * 📖📓💻❗ [Deep Q Learning](https://arxiv.org/abs/1312.5602) 
+        * 📖📓💻❗ [Deep Q Learning](https://arxiv.org/abs/1312.5602)
           * <https://lightning.ai/docs/pytorch/LTS/notebooks/lightning_examples/reinforce-learning-DQN.html>
         * 📖❗[Prioritized Experience Replay](https://arxiv.org/abs/1511.05952)
         * 📖❗[Double Q Learning](https://arxiv.org/abs/1509.06461)
@@ -85,21 +85,21 @@ See [.github/pull_request_template.md](.github/pull_request_template.md) for the
          * [ ] [Go-Explore](https://www.nature.com/articles/s41586-020-03157-9)
          * [ ] [NoisyNet](https://openreview.net/pdf?id=rywHCPkAW)
          * [ ] [DQN-PixelCNN](https://arxiv.org/abs/1606.01868)
-         * [ ] [#Exploration](http://papers.neurips.cc/paper/6868-exploration-a-study-of-count-based-exploration-for-deep-reinforcement-learning.pdf) 
-         * [ ] [EX2](https://papers.nips.cc/paper/2017/file/1baff70e2669e8376347efd3a874a341-Paper.pdf) 
-         * [ ] [ICM](https://arxiv.org/abs/1705.05363) 
-         * [ ] [RND](https://arxiv.org/abs/1810.12894) 
-         * [ ] [NGU](https://arxiv.org/abs/2002.06038) 
-         * [ ] [Agent57](https://arxiv.org/abs/2003.13350) 
-         * [ ] [VIME](https://arxiv.org/abs/1605.09674) 
-         * [ ] [EMI](https://openreview.net/forum?id=H1exf64KwH) 
-         * [ ] [DIYAN](https://arxiv.org/abs/1802.06070) 
-         * [ ] [SAC](https://arxiv.org/abs/1801.01290) 
-         * [ ] [BootstrappedDQN](https://arxiv.org/abs/1602.04621) 
-         * [ ] [PSRL](https://arxiv.org/pdf/1306.0940.pdf) 
-         * [ ] [HER](https://arxiv.org/pdf/1707.01495.pdf) 
-         * [ ] [DQfD](https://arxiv.org/abs/1704.03732) 
-         * [ ] [R2D3](https://arxiv.org/abs/1909.01387) 
+         * [ ] [#Exploration](http://papers.neurips.cc/paper/6868-exploration-a-study-of-count-based-exploration-for-deep-reinforcement-learning.pdf)
+         * [ ] [EX2](https://papers.nips.cc/paper/2017/file/1baff70e2669e8376347efd3a874a341-Paper.pdf)
+         * [ ] [ICM](https://arxiv.org/abs/1705.05363)
+         * [ ] [RND](https://arxiv.org/abs/1810.12894)
+         * [ ] [NGU](https://arxiv.org/abs/2002.06038)
+         * [ ] [Agent57](https://arxiv.org/abs/2003.13350)
+         * [ ] [VIME](https://arxiv.org/abs/1605.09674)
+         * [ ] [EMI](https://openreview.net/forum?id=H1exf64KwH)
+         * [ ] [DIYAN](https://arxiv.org/abs/1802.06070)
+         * [ ] [SAC](https://arxiv.org/abs/1801.01290)
+         * [ ] [BootstrappedDQN](https://arxiv.org/abs/1602.04621)
+         * [ ] [PSRL](https://arxiv.org/pdf/1306.0940.pdf)
+         * [ ] [HER](https://arxiv.org/pdf/1707.01495.pdf)
+         * [ ] [DQfD](https://arxiv.org/abs/1704.03732)
+         * [ ] [R2D3](https://arxiv.org/abs/1909.01387)
     * Multi Agent RL
         * [ ] [Emergent Communication through Negotiation](https://arxiv.org/abs/1804.03980)
         * [ ] Warp Drive
@@ -114,11 +114,11 @@ See [.github/pull_request_template.md](.github/pull_request_template.md) for the
 * Transformers
     * [ ] [Tokenization](https://huggingface.co/learn/nlp-course/en/chapter6/1?fw=pt)
     * [ ] [Word Embeddings](https://pytorch.org/tutorials/beginner/nlp/word_embeddings_tutorial.html)
-    * 📖❗[Transformers](https://arxiv.org/abs/1706.03762) 
+    * 📖❗[Transformers](https://arxiv.org/abs/1706.03762)
       * <https://pytorch.org/tutorials/beginner/transformer_tutorial.html>
       * <https://lightning.ai/docs/pytorch/stable/notebooks/course_UvA-DL/05-transformers-and-MH-attention.html>
-    * 📖❗[BERT](https://arxiv.org/abs/1810.04805) 
-    * [ ]❗[Sentence-BERT](https://arxiv.org/pdf/1908.10084) 
+    * 📖❗[BERT](https://arxiv.org/abs/1810.04805)
+    * [ ]❗[Sentence-BERT](https://arxiv.org/pdf/1908.10084)
     * [ ] [Fine Tuning](https://huggingface.co/learn/nlp-course/en/chapter3/1?fw=pt)
     * [ ] [RLHF](https://huggingface.co/blog/the_n_implementation_details_of_rlhf_with_ppo)
     * [ ] [Direct Preference Optimization](https://arxiv.org/pdf/2305.18290)
@@ -141,7 +141,7 @@ See [.github/pull_request_template.md](.github/pull_request_template.md) for the
     * [Survey on Graph RAG](https://arxiv.org/abs/2408.08921)
 * [ ] Diffusion Models
   * <https://lightning.ai/lightning-community-labs/studios/build-diffusion-models-with-pytorch-lightning-hf-diffusers>
-* [ ]❗Graph Neural Networks (GNN) 
+* [ ]❗Graph Neural Networks (GNN)
   * <https://lightning.ai/docs/pytorch/stable/notebooks/course_UvA-DL/06-graph-neural-networks.html>
 * Cognitive Science
    * [ ] [Hopfield Network](https://www.youtube.com/watch?v=1WPJdAW-sFo)

diff --git a/continuing_education/math/README.md b/continuing_education/math/README.md
diff --git a/continuing_education/math/basics/logarithm.md b/continuing_education/math/basics/logarithm.md
@@ -0,0 +1,4 @@
+## What are the identities of [[logarithm]]s?
+    - They turn multiplication into addition: $\log(a \cdot b) = \log(a) + \log(b)$
+    - They turn division into subtraction: $\log(a / b) = \log(a) - \log(b)$
+    - They turn exponentiation into multiplication: $\log(a^b) = b \cdot \log(a)$
diff --git a/continuing_education/math/calculus/gradient.md b/continuing_education/math/calculus/gradient.md
@@ -0,0 +1,9 @@
+# [[Gradient]]
+
+## What is the [[gradient]]?
+
+A vector of first-order partial derivatives, one for each dimension of the input space.
+
+## What does the symbol $\nabla$ mean?
+
+The [[gradient]]
diff --git a/continuing_education/math/calculus/henessian.md b/continuing_education/math/calculus/henessian.md
@@ -0,0 +1,9 @@
+# [[Hessian]]
+
+## What is a [[Hessian]]?
+
+The matrix of all second-order partial derivatives of a scalar function, i.e. the Jacobian of the [[gradient]].
+
+## What is the difference between a [[Hessian]] and a [[Laplacian]]?
+
+- The Hessian is the full matrix of second-order partial derivatives, while the Laplacian is a scalar (the trace of the Hessian). Both are second-order derivatives.
diff --git a/continuing_education/math/calculus/laplacian.md b/continuing_education/math/calculus/laplacian.md
@@ -0,0 +1,9 @@
+# [[Laplacian]]
+
+## What is the [[Laplacian]]?
+
+The divergence of the [[gradient]]: the sum of the unmixed second-order partial derivatives. Produces a scalar.
+
+## What does the symbol $\Delta$ mean?
+
+The [[Laplacian]] operator.
diff --git a/continuing_education/math/classic_ml/notes/bagging.md b/continuing_education/math/classic_ml/notes/bagging.md
@@ -0,0 +1,7 @@
+# Bagging
+
+## What is Bagging?
+
+Bootstrap aggregating, also called bagging (from bootstrap aggregating) or bootstrapping, is a machine learning (ML) ensemble meta-algorithm designed to improve the stability and accuracy of ML classification and regression algorithms. It also reduces variance and overfitting. Although it is usually applied to decision tree methods, it can be used with any type of method. Bagging is a special case of the ensemble averaging approach.
+
+https://en.wikipedia.org/wiki/Bootstrap_aggregating
diff --git a/continuing_education/math/statistics/log_probability/README.md b/continuing_education/math/statistics/log_probability/README.md
@@ -0,0 +1,9 @@
+# Log Probability
+
+## Why would you use a negative [[continuing_education/math/statistics/log_probability/README|log-probability]] in a loss function?
+
+$-\log(p)$ is infinite at $p = 0$ and 0 at $p = 1$, so a high-confidence prediction gets a low loss approaching 0 and a low-confidence prediction gets a high loss approaching infinity. This gives the network a strong [[gradient]] signal to update its parameters.
+
+## What is the equation for a sigmoid function?
+
+$f(x) = \frac{1}{1 + e^{-x}}$
diff --git a/continuing_education/math/__init__.py → ...th/statistics/log_probability/__init__.py b/continuing_education/math/__init__.py → ...th/statistics/log_probability/__init__.py
diff --git a/...uing_education/math/log_probability.ipynb → ...ics/log_probability/log_probability.ipynb b/...uing_education/math/log_probability.ipynb → ...ics/log_probability/log_probability.ipynb
diff --git a/continuing_education/math/log_probability.py → ...istics/log_probability/log_probability.py b/continuing_education/math/log_probability.py → ...istics/log_probability/log_probability.py
diff --git a/continuing_education/math/statistics/notes/bootstrapping.md b/continuing_education/math/statistics/notes/bootstrapping.md
@@ -0,0 +1,13 @@
+# Bootstrapping
+
+## What is Bootstrapping?
+
+Bootstrapping is a procedure for estimating the distribution of an estimator by resampling (often with replacement) one's data or a model estimated from the data. Bootstrapping assigns measures of accuracy (bias, variance, confidence intervals, prediction error, etc.) to sample estimates. This technique allows estimation of the sampling distribution of almost any statistic using random sampling methods.
+
+https://en.wikipedia.org/wiki/Bootstrapping_(statistics)
+
+## What is the difference between bootstrapping and bagging?
+
+Bagging applies bootstrapping to ensemble learning: it creates multiple subsets of the training data by sampling with replacement, trains a model on each subset, and then averages the models' predictions to improve accuracy and reduce variance. Bootstrapping, in general, refers to resampling data to estimate statistics or build confidence intervals.
+
+source: AI
diff --git a/continuing_education/policy_gradient_methods/README.md b/continuing_education/policy_gradient_methods/README.md
@@ -0,0 +1,2 @@
+
+# [[Policy]]-[[gradient]] methods
diff --git a/continuing_education/policy_gradient_methods/actor_critic/README.md b/continuing_education/policy_gradient_methods/actor_critic/README.md
@@ -0,0 +1,51 @@
+# Actor Critic
+
+## What is the [[value]] function used for in [[continuing_education/value_based_methods/dqn/README|actor critic]] methods?
+    - The [[value]] function is used to estimate the *average* expected return from a given state.
+    - It could theoretically be a Q-function, but in practice, it is often a state-value function using [[TD]] error.
+## What is the [[advantage]] function used for in [[continuing_education/value_based_methods/dqn/README|actor critic]] methods?
+    - The difference between the expected reward from a state-action pair (Q) and the average expected reward from just the state (V).
+        - $A(s, a) = Q(s, a) - V(s)$
+    - It is used to normalize the [[continuing_education/policy_gradient_methods/README|policy-gradient]], as well as to push the [[continuing_education/policy_gradient_methods/README|policy-gradient]] towards actions that are better than average and away from actions that are worse than average.
+## What is the training loop for A2C? #todo
+## What is the difference between A2C and A3C?
+
+## What is the [[value]] function used for in [[actor critic]] methods?
+
+- The [[value]] function is used by the critic to evaluate the expected return from a given state or state-action pair.
+- It provides a baseline for the [[policy gradient]] updates, helping to reduce variance in the learning process. It does this by providing an estimate of the expected return from a state, which can be subtracted from the actual return to compute the state [[advantage]] function.
+- The [[value]] function is used to estimate the *average* expected return from a given state.
+- It could theoretically be a Q-function, but in practice, it is often a [[value]] function.
+
+source: Myself
+
+## What is the [[advantage]] function used for in [[actor critic]] methods?
+
+- The difference between the expected reward from a state-action pair (Q) and the average expected reward from just the state (V).
+    - $A(s, a) = Q(s, a) - V(s)$
+- Its expectation over actions sampled from the policy is zero, which centers the [[policy gradient]] updates around 0.
+- It pushes the [[policy gradient]] towards actions that are better than average and away from actions that are worse than average.
+
+source: Myself
+
+## What kinds of normalization are used in [[actor critic]] methods? What are their effects?
+
+
+
+## What is the training loop for A2C?
+
+## What is the training loop for A3C?
+
+## What does A2C stand for?
+
+A2C stands for Advantage Actor-Critic.
+
+## What does A3C stand for?
+
+A3C stands for Asynchronous Advantage Actor-Critic.
+
+## What is the difference between A2C and A3C?
+
+A3C is parallel and asynchronous: multiple agents explore the environment in parallel, and each one updates the shared model asynchronously. A2C is synchronous: it collects experience from its agent (or from all of its parallel agents) first and then applies a single synchronized update to the model.
+
+source: https://en.wikipedia.org/wiki/Actor-critic_algorithm#Variants
diff --git a/notes/pages/advantage.md → ...olicy_gradient_methods/notes/advantage.md b/notes/pages/advantage.md → ...olicy_gradient_methods/notes/advantage.md
@@ -1,13 +1,9 @@
----
-alias: advantage function
----
-
-- What shape is the [[advantage]] function? #card
+## What shape is the [[advantage]] function?
     - $A(s,a) \in \mathbb{R}^{|A|}$, where $|A|$ is the number of actions.
     - It works in a fixed integer number of actions.
     - Same shape as the Q-value function.
-- What is the intuition behind the [[advantage]] function? #card
+## What is the intuition behind the [[advantage]] function?
     - The [[advantage]] function is a measure of how much better an action is compared to the average action in a given state.
     - Learning relative [[advantage]] is easier and has less variance than learning absolute values. [[Advantage]] is more relevant to decision making via argmax than absolute values.
-- Define the [[advantage]] function in terms of the Q-value function and the value function. #card
+## Define the [[advantage]] function in terms of the Q-value function and the [[value]] function.
   - $A(s,a) = Q(s,a) - V(s)$
diff --git a/notes/pages/policy.md → ...n/policy_gradient_methods/notes/policy.md b/notes/pages/policy.md → ...n/policy_gradient_methods/notes/policy.md