- AlphaMaze: Enhancing Large Language Models' Spatial Intelligence via GRPO
  Paper • 2502.14669 • Published • 15
- R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning
  Paper • 2503.05592 • Published • 27
- Offline Reinforcement Learning for LLM Multi-Step Reasoning
  Paper • 2412.16145 • Published • 38
- OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement
  Paper • 2503.17352 • Published • 24
Abhranil Chandra PRO
abhranil14
AI & ML interests
Reinforcement Learning, Deep Unsupervised Learning, NLP and Bayesian Deep Learning
Recent Activity
updated a model 4 days ago: abhranil14/L8B_on_MBPP_Code_G27B_IT_H_Paraphrased_subset_W_354_BS_64_lr_2e-5_epoch10_linear_schedule
published a model 4 days ago: abhranil14/L8B_on_MBPP_Code_G27B_IT_H_Paraphrased_subset_W_354_BS_64_lr_2e-5_epoch10_linear_schedule
updated a model 11 days ago: abhranil14/G2B_on_CODE_MBPP_G_601_subset_wrt_G_601_BS_64_lr_2e-5_epoch10_linear_schedule