Reward model - a transZ Collection

Updated 6 days ago

Reward modelling

RLHFlow/SHP-standard

Viewer • Updated May 9, 2024 • 93.3k • 18

Note: Training

transZ/shp

Viewer • Updated Jan 23 • 10.3k • 14

Note: Test and validation

RLHFlow/HH-RLHF-Helpful-standard

Viewer • Updated Apr 27, 2024 • 115k • 140 • 4

Note: Training

transZ/anthropic_helpful_test

Viewer • Updated Jan 23 • 2.33k • 6

Note: Test

RLHFlow/HH-RLHF-Harmless-and-RedTeam-standard

Viewer • Updated May 8, 2024 • 42.3k • 39 • 4

Note: Training

transZ/anthropic_harmless_test

Viewer • Updated Jan 23 • 2.3k • 5

Note: Test

transZ/helpsteer3

Viewer • Updated Feb 7 • 18.6k • 5

Note: Training and testing

RLHFlow/PKU-SafeRLHF-30K-standard

Viewer • Updated Apr 29, 2024 • 26.9k • 10 • 3

Note: Training

transZ/pku_safe_rlhf

Viewer • Updated Feb 7 • 1.22k • 7

Note: Test

HuggingFaceH4/cai-conversation-harmless

Viewer • Updated Feb 2, 2024 • 44.8k • 109 • 16

Note: Training and testing

transZ/helpsteer3_v2

Viewer • Updated 6 days ago • 18.3k • 6