Merged
13 changes: 8 additions & 5 deletions .github/pull_request_template.md
Original file line number Diff line number Diff line change
@@ -4,6 +4,8 @@ Here are some tasks to complete before merging this PR:

# Styleguide

- [ ] Did you update the version in Cargo.toml?

## Jupyter

- [ ] Add references to the bottom of the notebook.
@@ -28,9 +30,10 @@ if __name__ == '__main__':
foo()
```

## Logseq
## Marksman

- [ ] Make logseq notes and flash cards.
- [ ] Use singular nouns for tags.
- [ ] Use `-` or `_` in filenames instead of spaces.
- [ ] Use aliases for spaces and plurals.
- [ ] Make marksman notes and anki flash cards.
- [ ] Use singular nouns for filenames.
- [ ] Use `-` in filenames instead of spaces.
- [ ] For any new notes, go back through the existing notes and link to the new ones, using project search to find mentions.
- [ ] Put sources on each flashcard. Mark AI as the source if you used AI to generate the flashcard. These can be revisited later. Mark yourself as the source if you wrote the flashcard yourself.
4 changes: 2 additions & 2 deletions .github/workflows/ci.yaml
@@ -10,12 +10,12 @@ jobs:
steps:
- uses: actions/checkout@v4
with:
submodules: 'true'
submodules: "true"

- name: Install Python
uses: actions/setup-python@v4
with:
python-version: '3.11'
python-version: "3.11"

- name: Install Just
uses: extractions/setup-just@v2
1 change: 1 addition & 0 deletions .gitignore
@@ -1,3 +1,4 @@
anki.apkg
notes/logseq/bak
notes/logseq/.recycle

72 changes: 36 additions & 36 deletions .pre-commit-config.yaml
@@ -1,5 +1,5 @@
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v5.0.0
hooks:
- id: end-of-file-fixer
@@ -11,38 +11,38 @@ repos:
- id: detect-private-key
- id: mixed-line-ending
args: ["--fix=lf"]
- repo: https://github.com/Lucas-C/pre-commit-hooks
rev: v1.4.1
hooks:
- id: remove-tabs
- id: remove-crlf
- repo: https://github.com/astral-sh/ruff-pre-commit
# Ruff version.
rev: v0.9.6
hooks:
# Run the linter.
- id: ruff
types_or: [ python, pyi, jupyter ]
args: [ --fix ]
# Run the formatter.
- id: ruff-format
types_or: [ python, pyi, jupyter ]
- repo: https://github.com/RobertCraigie/pyright-python
rev: v1.1.394
hooks:
- id: pyright
- repo: https://github.com/mwouts/jupytext
rev: v1.16.7
hooks:
- id: jupytext
args: [--sync]
- repo: https://github.com/ryanpeach/mdlinker
rev: v1.6.1
hooks:
- id: enforce-ascii
files: notes/pages/.*\.md
- id: mdlinker
files: notes/pages/.*\.md
args:
- "--fix"
- "--allow-dirty"
- repo: https://github.com/Lucas-C/pre-commit-hooks
rev: v1.4.1
hooks:
- id: remove-tabs
- id: remove-crlf
- repo: https://github.com/astral-sh/ruff-pre-commit
# Ruff version.
rev: v0.9.6
hooks:
# Run the linter.
- id: ruff
types_or: [python, pyi, jupyter]
args: [--fix]
# Run the formatter.
- id: ruff-format
types_or: [python, pyi, jupyter]
- repo: https://github.com/RobertCraigie/pyright-python
rev: v1.1.394
hooks:
- id: pyright
- repo: https://github.com/mwouts/jupytext
rev: v1.16.7
hooks:
- id: jupytext
args: [--sync]
# - repo: https://github.com/ryanpeach/mdlinker
# rev: v1.7.2
# hooks:
# - id: enforce-ascii
# files: continuing_education/.*\.md
# - id: mdlinker
# files: continuing_education/.*\.md
# args:
# - "--fix"
# - "--allow-dirty"
3 changes: 3 additions & 0 deletions Justfile
@@ -5,3 +5,6 @@ add path:
# Sync all notebooks
sync:
uv run jupytext --sync **/*.ipynb

anki:
mdanki continuing_education anki.apkg
42 changes: 21 additions & 21 deletions README.md
@@ -26,7 +26,7 @@ If you then wrap cells (the code you run) in `if __name__ == "__main__"` you als

## Logseq

I use [logseq](https://logseq.com/) to manage my [./notes](./notes) and [flashcards](https://hub.logseq.com/use-cases/1Sr4awszMQzD4GM5KvWim7/how-to-quickly-create-spaced-repetition-flashcards-in-logseq-using-notepad-and-excel/jxPxEdkM4BBhLfrz9PfjBr). This uses the [zettelkasten](https://zettelkasten.de/overview/) method to create a [knowledge graph](https://en.wikipedia.org/wiki/Knowledge_graph), which is perfect for studying.
I use [marksman](https://github.com/artempyanykh/marksman) to manage my markdown notes and link them to each other. I then use [mdanki](https://github.com/ashlinchak/mdanki) to convert them to flashcards.

# Styleguide

@@ -52,7 +52,7 @@ See [.github/pull_request_template.md](.github/pull_request_template.md) for the
* Reinforcement Learning
* Value Based Methods - I'm pretty much up to date with these methods, but might as well implement them. I may go into less explanation though.
* 📖📓💻 [$TD(\lambda)$](https://web.stanford.edu/class/cs234/notes/cs234-notes7.pdf)
* 📖📓💻❗ [Deep Q Learning](https://arxiv.org/abs/1312.5602)
* 📖📓💻❗ [Deep Q Learning](https://arxiv.org/abs/1312.5602)
* <https://lightning.ai/docs/pytorch/LTS/notebooks/lightning_examples/reinforce-learning-DQN.html>
* 📖❗[Prioritized Experience Replay](https://arxiv.org/abs/1511.05952)
* 📖❗[Double Q Learning](https://arxiv.org/abs/1509.06461)
@@ -85,21 +85,21 @@ See [.github/pull_request_template.md](.github/pull_request_template.md) for the
* [ ] [Go-Explore](https://www.nature.com/articles/s41586-020-03157-9)
* [ ] [NoisyNet](https://openreview.net/pdf?id=rywHCPkAW)
* [ ] [DQN-PixelCNN](https://arxiv.org/abs/1606.01868)
* [ ] [#Exploration](http://papers.neurips.cc/paper/6868-exploration-a-study-of-count-based-exploration-for-deep-reinforcement-learning.pdf)
* [ ] [EX2](https://papers.nips.cc/paper/2017/file/1baff70e2669e8376347efd3a874a341-Paper.pdf)
* [ ] [ICM](https://arxiv.org/abs/1705.05363)
* [ ] [RND](https://arxiv.org/abs/1810.12894)
* [ ] [NGU](https://arxiv.org/abs/2002.06038)
* [ ] [Agent57](https://arxiv.org/abs/2003.13350)
* [ ] [VIME](https://arxiv.org/abs/1605.09674)
* [ ] [EMI](https://openreview.net/forum?id=H1exf64KwH)
* [ ] [DIAYN](https://arxiv.org/abs/1802.06070)
* [ ] [SAC](https://arxiv.org/abs/1801.01290)
* [ ] [BootstrappedDQN](https://arxiv.org/abs/1602.04621)
* [ ] [PSRL](https://arxiv.org/pdf/1306.0940.pdf)
* [ ] [HER](https://arxiv.org/pdf/1707.01495.pdf)
* [ ] [DQfD](https://arxiv.org/abs/1704.03732)
* [ ] [R2D3](https://arxiv.org/abs/1909.01387)
* [ ] [#Exploration](http://papers.neurips.cc/paper/6868-exploration-a-study-of-count-based-exploration-for-deep-reinforcement-learning.pdf)
* [ ] [EX2](https://papers.nips.cc/paper/2017/file/1baff70e2669e8376347efd3a874a341-Paper.pdf)
* [ ] [ICM](https://arxiv.org/abs/1705.05363)
* [ ] [RND](https://arxiv.org/abs/1810.12894)
* [ ] [NGU](https://arxiv.org/abs/2002.06038)
* [ ] [Agent57](https://arxiv.org/abs/2003.13350)
* [ ] [VIME](https://arxiv.org/abs/1605.09674)
* [ ] [EMI](https://openreview.net/forum?id=H1exf64KwH)
* [ ] [DIAYN](https://arxiv.org/abs/1802.06070)
* [ ] [SAC](https://arxiv.org/abs/1801.01290)
* [ ] [BootstrappedDQN](https://arxiv.org/abs/1602.04621)
* [ ] [PSRL](https://arxiv.org/pdf/1306.0940.pdf)
* [ ] [HER](https://arxiv.org/pdf/1707.01495.pdf)
* [ ] [DQfD](https://arxiv.org/abs/1704.03732)
* [ ] [R2D3](https://arxiv.org/abs/1909.01387)
* Multi Agent RL
* [ ] [Emergent Communication through Negotiation](https://arxiv.org/abs/1804.03980)
* [ ] Warp Drive
@@ -114,11 +114,11 @@ See [.github/pull_request_template.md](.github/pull_request_template.md) for the
* Transformers
* [ ] [Tokenization](https://huggingface.co/learn/nlp-course/en/chapter6/1?fw=pt)
* [ ] [Word Embeddings](https://pytorch.org/tutorials/beginner/nlp/word_embeddings_tutorial.html)
* 📖❗[Transformers](https://arxiv.org/abs/1706.03762)
* 📖❗[Transformers](https://arxiv.org/abs/1706.03762)
* <https://pytorch.org/tutorials/beginner/transformer_tutorial.html>
* <https://lightning.ai/docs/pytorch/stable/notebooks/course_UvA-DL/05-transformers-and-MH-attention.html>
* 📖❗[BERT](https://arxiv.org/abs/1810.04805)
* [ ]❗[Sentence-BERT](https://arxiv.org/pdf/1908.10084)
* 📖❗[BERT](https://arxiv.org/abs/1810.04805)
* [ ]❗[Sentence-BERT](https://arxiv.org/pdf/1908.10084)
* [ ] [Fine Tuning](https://huggingface.co/learn/nlp-course/en/chapter3/1?fw=pt)
* [ ] [RLHF](https://huggingface.co/blog/the_n_implementation_details_of_rlhf_with_ppo)
* [ ] [Direct Preference Optimization](https://arxiv.org/pdf/2305.18290)
@@ -141,7 +141,7 @@ See [.github/pull_request_template.md](.github/pull_request_template.md) for the
* [Survey on Graph RAG](https://arxiv.org/abs/2408.08921)
* [ ] Diffusion Models
* <https://lightning.ai/lightning-community-labs/studios/build-diffusion-models-with-pytorch-lightning-hf-diffusers>
* [ ]❗Graph Neural Networks (GNN)
* [ ]❗Graph Neural Networks (GNN)
* <https://lightning.ai/docs/pytorch/stable/notebooks/course_UvA-DL/06-graph-neural-networks.html>
* Cognitive Science
* [ ] [Hopfield Network](https://www.youtube.com/watch?v=1WPJdAW-sFo)
8 changes: 0 additions & 8 deletions continuing_education/math/README.md

This file was deleted.

4 changes: 4 additions & 0 deletions continuing_education/math/basics/logarithm.md
@@ -0,0 +1,4 @@
## What are the identities of [[logarithm]]s?
- They turn multiplication into addition: $\log(a \cdot b) = \log(a) + \log(b)$
- They turn division into subtraction: $\log(a / b) = \log(a) - \log(b)$
- They turn exponentiation into multiplication: $\log(a^b) = b \cdot \log(a)$
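
The three identities above can be checked numerically. This is a minimal sketch using only the standard library; the values of `a` and `b` are arbitrary positive numbers chosen for illustration.

```python
import math

# Arbitrary positive example values (an assumption for illustration).
a, b = 8.0, 2.0

# Multiplication becomes addition.
product_rule = math.isclose(math.log(a * b), math.log(a) + math.log(b))
# Division becomes subtraction.
quotient_rule = math.isclose(math.log(a / b), math.log(a) - math.log(b))
# Exponentiation becomes multiplication.
power_rule = math.isclose(math.log(a ** b), b * math.log(a))

print(product_rule, quotient_rule, power_rule)
```

`math.isclose` is used instead of `==` because floating-point logarithms are only equal up to rounding error.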
9 changes: 9 additions & 0 deletions continuing_education/math/calculus/gradient.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
# [[Gradient]]

## What is the [[gradient]]?

A vector of first-order partial derivatives, one for each dimension of the input space.
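
As a rough illustration, the gradient can be approximated with central finite differences, one partial derivative per input dimension. The helper `numerical_gradient` and the example function `f` are hypothetical names introduced here, not part of these notes.

```python
def numerical_gradient(f, point, eps=1e-6):
    """Approximate the gradient of f at point with central differences."""
    grad = []
    for i in range(len(point)):
        forward = list(point)
        backward = list(point)
        forward[i] += eps   # nudge only dimension i upwards
        backward[i] -= eps  # nudge only dimension i downwards
        grad.append((f(forward) - f(backward)) / (2 * eps))
    return grad


def f(p):
    # Example function f(x, y) = x^2 + 3y, whose exact gradient is (2x, 3).
    x, y = p
    return x ** 2 + 3 * y


grad = numerical_gradient(f, [2.0, 5.0])
print(grad)  # close to [4.0, 3.0]
```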

## What does the symbol $\nabla$ mean?

The [[gradient]]
9 changes: 9 additions & 0 deletions continuing_education/math/calculus/henessian.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
# [[Hessian]]

## What is a [[Hessian]]?

The matrix of all second-order partial derivatives of a scalar function. It is the derivative (Jacobian) of the [[gradient]].

## What is the difference between a [[Hessian]] and a [[Laplacian]]?

- Hessians produce a matrix of second derivatives, while Laplacians produce a scalar (the sum of the Hessian's diagonal entries). Both are second order derivatives.
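
To make the matrix-versus-scalar distinction concrete, here is a small finite-difference sketch. The helper `numerical_hessian` and the example function are assumptions for illustration; the Laplacian is computed as the sum of the diagonal of the matrix of second derivatives.

```python
def numerical_hessian(f, point, eps=1e-4):
    """Approximate the matrix of second partial derivatives of f at point."""
    n = len(point)
    hessian = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(di, dj):
                # Evaluate f with dimension i moved by di and dimension j by dj.
                p = list(point)
                p[i] += di
                p[j] += dj
                return f(p)

            hessian[i][j] = (
                shifted(eps, eps) - shifted(eps, -eps)
                - shifted(-eps, eps) + shifted(-eps, -eps)
            ) / (4 * eps ** 2)
    return hessian


def f(p):
    # Example function f(x, y) = x^2 * y + y^3.
    x, y = p
    return x ** 2 * y + y ** 3


hessian = numerical_hessian(f, [1.0, 2.0])
# The Laplacian is the trace of the matrix above: a single scalar.
laplacian = sum(hessian[i][i] for i in range(len(hessian)))
print(hessian)    # close to [[4, 2], [2, 12]]
print(laplacian)  # close to 16
```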
9 changes: 9 additions & 0 deletions continuing_education/math/calculus/laplacian.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
# [[Laplacian]]

## What is the [[Laplacian]]?

The divergence of the [[gradient]]: the sum of the unmixed second partial derivatives. Produces a scalar.

## What does the symbol $\Delta$ mean?

The [[Laplacian]] operator.
7 changes: 7 additions & 0 deletions continuing_education/math/classic_ml/notes/bagging.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
# Bagging

## What is Bagging?

Bootstrap aggregating, also called bagging (from bootstrap aggregating) or bootstrapping, is a machine learning (ML) ensemble meta-algorithm designed to improve the stability and accuracy of ML classification and regression algorithms. It also reduces variance and overfitting. Although it is usually applied to decision tree methods, it can be used with any type of method. Bagging is a special case of the ensemble averaging approach.

https://en.wikipedia.org/wiki/Bootstrap_aggregating
@@ -0,0 +1,9 @@
# Log Probability

## Why would you use a negative [[continuing_education/math/statistics/log_probability/README|log-probability]] in a loss function?

$-\log(p)$ is infinite at $p = 0$ and 0 at $p = 1$, so high confidence in the correct outcome gives a low loss approaching 0, and low confidence gives a high loss approaching infinity. The steep slope near $p = 0$ gives a strong [[gradient]] signal to the network to update its parameters.
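
A quick numeric sketch of this behaviour, assuming `p` is the probability the model assigns to the correct outcome. The function name `negative_log_likelihood` is chosen here for illustration.

```python
import math


def negative_log_likelihood(p):
    """Loss for assigning probability p (0 < p <= 1) to the correct outcome."""
    return -math.log(p)


# Low confidence in the right answer is punished far more than high confidence.
losses = {p: negative_log_likelihood(p) for p in (0.01, 0.5, 0.99)}
for p, loss in losses.items():
    print(f"p={p:<5} loss={loss:.4f}")
```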

## What is the equation for a sigmoid function?

$f(x) = \frac{1}{1 + e^{-x}}$
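
A direct translation of the formula can overflow for large negative `x`, so this sketch branches on the sign of `x`. The two branches are algebraically the same function; the branching is an implementation choice made here, not something the note prescribes.

```python
import math


def sigmoid(x):
    """Numerically stable sigmoid: 1 / (1 + e^{-x})."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, use the equivalent form e^x / (1 + e^x) to avoid overflow.
    z = math.exp(x)
    return z / (1.0 + z)


print(sigmoid(0.0))
print(sigmoid(2.0) + sigmoid(-2.0))
```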
13 changes: 13 additions & 0 deletions continuing_education/math/statistics/notes/bootstrapping.md
@@ -0,0 +1,13 @@
# Bootstrapping

## What is Bootstrapping?

Bootstrapping is a procedure for estimating the distribution of an estimator by resampling (often with replacement) one's data or a model estimated from the data. Bootstrapping assigns measures of accuracy (bias, variance, confidence intervals, prediction error, etc.) to sample estimates. This technique allows estimation of the sampling distribution of almost any statistic using random sampling methods.

https://en.wikipedia.org/wiki/Bootstrapping_(statistics)
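
As a minimal sketch of the idea, the following estimates a 95% percentile confidence interval for a sample mean by resampling with replacement. The data values, the number of resamples, and the seed are made up for illustration.

```python
import random
import statistics


def bootstrap_means(data, n_resamples=2000, seed=0):
    """Return the mean of each resample drawn with replacement from data."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_resamples):
        resample = rng.choices(data, k=len(data))  # sample with replacement
        means.append(statistics.fmean(resample))
    return means


# Hypothetical sample.
data = [2.1, 2.5, 2.8, 3.0, 3.3, 3.7, 4.1, 4.4, 5.0, 5.6]
means = sorted(bootstrap_means(data))

# Percentile interval: cut off 2.5% of the resampled means on each side.
lower = means[int(0.025 * len(means))]
upper = means[int(0.975 * len(means))]
print(f"sample mean={statistics.fmean(data):.2f}, 95% CI=({lower:.2f}, {upper:.2f})")
```

The percentile interval is only one of several bootstrap interval constructions; it is used here because it is the simplest to read.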

## What is the difference between bootstrapping and bagging?

Bagging is a specific type of bootstrapping that involves creating multiple subsets of the training data by sampling with replacement, training a model on each subset, and then averaging the predictions of the models to improve accuracy and reduce variance. Bootstrapping, in general, refers to the process of resampling data to estimate statistics or build confidence intervals.

source: AI
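
The distinction can be sketched in code: bagging reuses the same resampling step, but fits a model on each resample and averages the predictions. The 1-nearest-neighbour regressor and the data points below are hypothetical stand-ins chosen to keep the example short.

```python
import random
import statistics


def fit_nearest_neighbor(points):
    """Return a 1-nearest-neighbour regressor for a list of (x, y) pairs."""
    def predict(x):
        return min(points, key=lambda p: abs(p[0] - x))[1]
    return predict


def bagged_predict(points, x, n_models=50, seed=0):
    """Average the predictions of models trained on bootstrap resamples."""
    rng = random.Random(seed)
    predictions = []
    for _ in range(n_models):
        resample = rng.choices(points, k=len(points))  # bootstrap step
        model = fit_nearest_neighbor(resample)         # train on the resample
        predictions.append(model(x))
    return statistics.fmean(predictions)               # aggregate step


# Hypothetical training data.
points = [(0.0, 0.1), (1.0, 0.9), (2.0, 2.2), (3.0, 2.8), (4.0, 4.1)]
prediction = bagged_predict(points, 2.0)
print(prediction)
```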
2 changes: 2 additions & 0 deletions continuing_education/policy_gradient_methods/README.md
@@ -0,0 +1,2 @@

# [[Policy]]-[[gradient]] methods
@@ -0,0 +1,51 @@
# Actor Critic

## What is the [[value]] function used for in [[continuing_education/value_based_methods/dqn/README|actor critic]] methods?
- The [[value]] function is used to estimate the *average* expected return from a given state.
- It could theoretically be a Q-function, but in practice, it is often a state-value function using [[TD]] error.
## What is the [[advantage]] function used for in [[continuing_education/value_based_methods/dqn/README|actor critic]] methods?
- The difference between the expected reward from a state-action pair (Q) and the average expected reward from just the state (V).
- $A(s, a) = Q(s, a) - V(s)$
- It is used to normalize the [[continuing_education/policy_gradient_methods/README|policy-gradient]], as well as to push the [[continuing_education/policy_gradient_methods/README|policy-gradient]] towards actions that are better than average and away from actions that are worse than average.
## What is the training loop for a2c? #todo
## What is the difference between a2c and a3c?

## What is the [[value]] function used for in [[actor critic]] methods?

- The [[value]] function is used by the critic to evaluate the expected return from a given state or state-action pair.
- It provides a baseline for the [[policy gradient]] updates, helping to reduce variance in the learning process. It does this by providing an estimate of the expected return from a state, which can be subtracted from the actual return to compute the state [[advantage]] function.
- The [[value]] function is used to estimate the *average* expected return from a given state.
- It could theoretically be a Q-function, but in practice, it is often a [[value]] function.

source: Myself

## What is the [[advantage]] function used for in [[actor critic]] methods?

- The difference between the expected reward from a state-action pair (Q) and the average expected reward from just the state (V).
- $A(s, a) = Q(s, a) - V(s)$
- Under the policy, its expected value over actions is zero, so the [[policy gradient]] updates are centered around zero as well.
- It pushes the [[policy gradient]] towards actions that are better than average and away from actions that are worse than average.

source: Myself
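
A small numeric sketch of $A(s, a) = Q(s, a) - V(s)$ for a single state. It assumes $V(s)$ is the policy-weighted average of the Q-values, and the Q-values and policy probabilities are made up for illustration.

```python
def advantages(q_values, policy):
    """Compute A(s, a) = Q(s, a) - V(s) for every action in one state."""
    # Assumption: V(s) is the expected Q-value under the policy.
    v = sum(p * q for p, q in zip(policy, q_values))
    return [q - v for q in q_values]


# Hypothetical Q-values and action probabilities for one state.
q_values = [1.0, 2.0, 4.0]
policy = [0.5, 0.25, 0.25]

adv = advantages(q_values, policy)
expected_advantage = sum(p * a for p, a in zip(policy, adv))
print(adv)                 # negative for worse-than-average actions
print(expected_advantage)  # averages to zero under the policy
```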

## What kinds of normalization are used in [[actor critic]] methods? What are their effects?



## What is the training loop for A2C?

## What is the training loop for A3C?

## What does A2C stand for?

A2C stands for Advantage Actor-Critic.

## What does A3C stand for?

A3C stands for Asynchronous Advantage Actor-Critic.

## What is the difference between A2C and A3C?

A3C is parallel and asynchronous: multiple workers explore copies of the environment in parallel, and each one updates the shared model asynchronously. A2C is the synchronous variant: updates are applied in lockstep, and when several workers are used their results are combined into a single update.

source: https://en.wikipedia.org/wiki/Actor-critic_algorithm#Variants
@@ -1,13 +1,9 @@
---
alias: advantage function
---

- What shape is the [[advantage]] function? #card
## What shape is the [[advantage]] function?
- $A(s,\cdot) \in \mathbb{R}^{|A|}$, where $|A|$ is the number of actions; $A(s,a)$ for a single action is a scalar.
- It assumes a fixed, finite number of discrete actions.
- Same shape as the Q-value function.
- What is the intuition behind the [[advantage]] function? #card
## What is the intuition behind the [[advantage]] function?
- The [[advantage]] function is a measure of how much better an action is compared to the average action in a given state.
- Learning relative [[advantage]] is easier and has less variance than learning absolute values. [[Advantage]] is more relevant to decision making via argmax than absolute values.
- Define the [[advantage]] function in terms of the Q-value function and the value function. #card
## Define the [[advantage]] function in terms of the Q-value function and the [[value]] function.
- $A(s,a) = Q(s,a) - V(s)$
File renamed without changes.