Project of MoE reward model

Recent Activity

shengyi-qian

authored a paper about 2 months ago

DigiData: Training and Evaluating General-Purpose Mobile Control Agents

Paper • 2511.07413 • Published Nov 10, 2025 • 5

zhuokai

authored a paper about 2 months ago

Scaling Agent Learning via Experience Synthesis

Paper • 2511.03773 • Published Nov 5, 2025 • 81

zhuokai

authored 10 papers 2 months ago

From Uncertainty to Trust: Enhancing Reliability in Vision-Language Models with Uncertainty-Guided Dropout Decoding

Paper • 2412.06474 • Published Dec 9, 2024

Beyond Reward Hacking: Causal Rewards for Large Language Model Alignment

Paper • 2501.09620 • Published Jan 16, 2025

Transfer between Modalities with MetaQueries

Paper • 2504.06256 • Published Apr 8, 2025 • 2

S'MoRE: Structural Mixture of Residual Experts for LLM Fine-tuning

Paper • 2504.06426 • Published Apr 8, 2025 • 2

CAFe: Unifying Representation and Generation with Contrastive-Autoregressive Finetuning

Paper • 2503.19900 • Published Mar 25, 2025

Boosting LLM Reasoning via Spontaneous Self-Correction

Paper • 2506.06923 • Published Jun 7, 2025

RecoWorld: Building Simulated Environments for Agentic Recommender Systems

Paper • 2509.10397 • Published Sep 12, 2025 • 7

StreamMem: Query-Agnostic KV Cache Memory for Streaming Video Understanding

Paper • 2508.15717 • Published Aug 21, 2025 • 1

Let it Calm: Exploratory Annealed Decoding for Verifiable Reinforcement Learning

Paper • 2510.05251 • Published Oct 6, 2025 • 7

Thought Communication in Multiagent Collaboration

Paper • 2510.20733 • Published Oct 23, 2025 • 14

shengyi-qian

updated a model 6 months ago

MoeReward/rl_checkpoints

Updated Jun 27, 2025

zyhang1998

updated a dataset 8 months ago

MoeReward/combined_rlhf_dataset_grpo_imdb_main_2K

Viewer • Updated May 6, 2025 • 2k • 8

zyhang1998

published a dataset 8 months ago

MoeReward/combined_rlhf_dataset_grpo_imdb_main_2K

Viewer • Updated May 6, 2025 • 2k • 8

zyhang1998

updated a dataset 8 months ago

MoeReward/combined_rlhf_dataset_grpo_metamath_main_2K

Viewer • Updated May 6, 2025 • 2k • 9

zyhang1998

published a dataset 8 months ago

MoeReward/combined_rlhf_dataset_grpo_metamath_main_2K

Viewer • Updated May 6, 2025 • 2k • 9

zyhang1998

updated a dataset 8 months ago

MoeReward/combined_rlhf_dataset_grpo_arc_main_2K

Viewer • Updated May 6, 2025 • 2k • 7

zyhang1998

published a dataset 8 months ago

MoeReward/combined_rlhf_dataset_grpo_arc_main_2K

Viewer • Updated May 6, 2025 • 2k • 7

zyhang1998

updated a dataset 8 months ago

MoeReward/combined_rlhf_dataset_grpo_nq_main_2K

Viewer • Updated May 6, 2025 • 2k • 8