From b88d18cefd99ca2b6e8e699d244cd0a246070b52 Mon Sep 17 00:00:00 2001
From: Ryan Peach
Date: Sun, 23 Nov 2025 21:50:14 -0500
Subject: [PATCH] Update README.md

---
 README.md | 111 +----------------------------------------------------
 1 file changed, 1 insertion(+), 110 deletions(-)

diff --git a/README.md b/README.md
index d4508ee..b522b4e 100644
--- a/README.md
+++ b/README.md
@@ -42,116 +42,7 @@ See [.github/pull_request_template.md](.github/pull_request_template.md) for the
 # TODO
-* Emoji Meanings
- * ❗ Indicates Priority
- * 📖 Paper Read
- * 📓 Notes Taken
- * 💻 Implementation Completed
-
-
-* Reinforcement Learning
- * Value Based Methods - I'm pretty much up to date with these methods, but might as well implement them. I may go into less explanation though.
- * 📖📓💻 [$TD(\lambda)$](https://web.stanford.edu/class/cs234/notes/cs234-notes7.pdf)
- * 📖📓💻❗ [Deep Q Learning](https://arxiv.org/abs/1312.5602)
- *
- * 📖❗[Prioritized Experience Replay](https://arxiv.org/abs/1511.05952)
- * 📖❗[Double Q Learning](https://arxiv.org/abs/1509.06461)
- * [ ] [Dueling Q Learning](https://arxiv.org/abs/1511.06581)
- * [ ] [Multi Step Learning](https://arxiv.org/abs/1901.02876)
- * [ ] [Distributional DQN](https://arxiv.org/abs/1707.06887)
- * [ ] [Noisy Nets](https://arxiv.org/abs/1706.10295)
- * 📖 [RAINBOW](https://arxiv.org/abs/1710.02298)
- * Policy Based Methods
- * 📖📓💻 [REINFORCE](https://arxiv.org/abs/2010.11364) *
- * 📖❗ [Actor-Critic](https://arxiv.org/pdf/1602.01783v2) (A2C, A3C) *
- * [ ] [Trust Region Policy Optimization](https://arxiv.org/pdf/1502.05477) (TRPO)
- * [ ]❗[Proximal Policy Optimization](https://arxiv.org/abs/1707.06347) (PPO) *
- * [ ] [Deep Deterministic Policy Gradient](https://arxiv.org/abs/1509.02971v6) (DDPG)
- * Model Based Reinforcement Learning
- * 📖❗[AlphaZero](https://arxiv.org/abs/1712.01815)
- * [ ] [MuZero](https://www.nature.com/articles/s41586-020-03051-4.epdf?sharing_token=kTk-xTZpQOF8Ym8nTQK6EdRgN0jAjWel9jnR3ZoTv0PMSWGj38iNIyNOw_ooNp2BvzZ4nIcedo7GEXD7UmLqb0M_V_fop31mMY9VBBLNmGbm0K9jETKkZnJ9SgJ8Rwhp3ySvLuTcUr888puIYbngQ0fiMf45ZGDAQ7fUI66-u7Y%3D)
- *
- * [ ] [Dreamer](https://arxiv.org/pdf/1912.01603)
- *
- * [ ] [Efficient Zero](https://arxiv.org/abs/2111.00210)
- * [ ] [Efficient Zero V2](https://arxiv.org/abs/2403.00564)
- * [ ] [SIMA](https://arxiv.org/abs/2404.10179)
- *
- * [ ] [Genie 1](https://arxiv.org/abs/2402.15391)
- *
- * [ ] [Genie 2](https://arxiv.org/pdf/2405.15489)
- *
- * [ ] [Exploration in RL](https://github.com/opendilab/awesome-exploration-rl)
- * [ ] [Go-Explore](https://www.nature.com/articles/s41586-020-03157-9)
- * [ ] [NoisyNet](https://openreview.net/pdf?id=rywHCPkAW)
- * [ ] [DQN-PixelCNN](https://arxiv.org/abs/1606.01868)
- * [ ] [#Exploration](http://papers.neurips.cc/paper/6868-exploration-a-study-of-count-based-exploration-for-deep-reinforcement-learning.pdf)
- * [ ] [EX2](https://papers.nips.cc/paper/2017/file/1baff70e2669e8376347efd3a874a341-Paper.pdf)
- * [ ] [ICM](https://arxiv.org/abs/1705.05363)
- * [ ] [RND](https://arxiv.org/abs/1810.12894)
- * [ ] [NGU](https://arxiv.org/abs/2002.06038)
- * [ ] [Agent57](https://arxiv.org/abs/2003.13350)
- * [ ] [VIME](https://arxiv.org/abs/1605.09674)
- * [ ] [EMI](https://openreview.net/forum?id=H1exf64KwH)
- * [ ] [DIYAN](https://arxiv.org/abs/1802.06070)
- * [ ] [SAC](https://arxiv.org/abs/1801.01290)
- * [ ] [BootstrappedDQN](https://arxiv.org/abs/1602.04621)
- * [ ] [PSRL](https://arxiv.org/pdf/1306.0940.pdf)
- * [ ] [HER](https://arxiv.org/pdf/1707.01495.pdf)
- * [ ] [DQfD](https://arxiv.org/abs/1704.03732)
- * [ ] [R2D3](https://arxiv.org/abs/1909.01387)
- * Multi Agent RL
- * [ ] [Emergent Communication through Negotiation](https://arxiv.org/abs/1804.03980)
- * [ ] Warp Drive
- *
- * [Human-Timescale Adaptation in an Open-Ended Task Space](https://sites.google.com/view/adaptive-agent/)
- * [ ] [Muesli](https://arxiv.org/pdf/2104.06159)
- * [ ] [Transformer-XL](https://arxiv.org/abs/1901.02860)
- * [ ] [Robust PLR](https://arxiv.org/pdf/2110.02439)
- * Distributed RL
- * [ ] [Survey](https://arxiv.org/pdf/2011.11012)
- * [ ] [RLLib](https://docs.ray.io/en/master/rllib.html)
-* Transformers
- * [ ] [Tokenization](https://huggingface.co/learn/nlp-course/en/chapter6/1?fw=pt)
- * [ ] [Word Embeddings](https://pytorch.org/tutorials/beginner/nlp/word_embeddings_tutorial.html)
- * 📖❗[Transformers](https://arxiv.org/abs/1706.03762)
- *
- *
- * 📖❗[BERT](https://arxiv.org/abs/1810.04805)
- * [ ]❗[Sentence-BERT](https://arxiv.org/pdf/1908.10084)
- * [ ] [Fine Tuning](https://huggingface.co/learn/nlp-course/en/chapter3/1?fw=pt)
- * [ ] [RLHF](https://huggingface.co/blog/the_n_implementation_details_of_rlhf_with_ppo)
- * [ ] [Direct Preference Optimization](https://arxiv.org/pdf/2305.18290)
- * [ ] [Multimodality](https://lightning.ai/docs/pytorch/stable/notebooks/course_UvA-DL/11-vision-transformer.html)
- * [ ] [Mamba and SSM's](https://towardsdatascience.com/mamba-ssm-theory-and-implementation-in-keras-and-tensorflow-32d6d4b32546)
- * [ ] [Sentence Transformers](https://medium.com/@vipra_singh/building-llm-applications-sentence-transformers-part-3-a9e2529f99c1)
- * [ ] [Multi token prediction](https://arxiv.org/pdf/2404.19737)
- * [ ] Time Series
- *
-* RAG
- * 📖 [Survey on RAG](https://arxiv.org/abs/2405.06211)
- * [ ]❗REALM
- * [ ]❗Hyde
- * [ ]❗DPR
- * [ ]❗Raft
- * [ ] PRCA
- * [ ] EAE
- * [ ] MIPS
- * [ ] Self reinforce
- * [Survey on Graph RAG](https://arxiv.org/abs/2408.08921)
-* [ ] Diffusion Models
- *
-* [ ]❗Graph Neural Networks (GNN)
- *
-* Cognitive Science
- * [ ] [Hopfield Network](https://www.youtube.com/watch?v=1WPJdAW-sFo)
- * [ ] [Boltzman Machine](https://www.youtube.com/watch?v=_bqa_I5hNAo)
- * [ ] [Conformal Prediction](https://blog.dataiku.com/measuring-models-uncertainty-conformal-prediction?utm_source=pocket_saves)
- * [ ] [Predictive Coding Models](https://arxiv.org/abs/2202.09467)
- * [ ] [Liquid Neural Networks](https://arxiv.org/pdf/2006.04439)
-* Techniques
- * Profiling
- * Debugging Metrics
+https://ryanpeach.com/Publish/Machine+Learning/Research+Papers
 ## Sources for further work