jingshu

lilaczheng

AI & ML interests

None yet

Recent Activity

upvoted a paper 4 months ago

Do Vision-Language Models Measure Up? Benchmarking Visual Measurement Reading with MeasureBench

Paper • 2510.26865 • Published Oct 30, 2025 • 12

authored a paper 5 months ago

FlagEval Findings Report: A Preliminary Evaluation of Large Reasoning Models on Automatically Verifiable Textual and Visual Questions

Paper • 2509.17177 • Published Sep 21, 2025 • 13

liked a Space 6 months ago

BAAI/FlagEval-Robo

🐢

Compare and evaluate language models side-by-side

liked a Space about 1 year ago

Open Flageval Vlm Leaderboard

🥇

FlagEval VLM Leaderboard

liked 2 datasets about 1 year ago

FlagEval/CLCC_v1

Viewer • Updated Jul 29, 2024 • 760 • 16 • 3

FlagEval/HalluDial

Updated Jun 26, 2024 • 27 • 4

published an article over 1 year ago

Article

Letting Large Models Debate: The First Multilingual LLM Debate Competition

Nov 20, 2024

liked 2 Spaces over 1 year ago

FlagEval-Arena

🐢

Arena

FlagEval-Debate

🐠

Display a debate interface

updated a Space over 1 year ago

Open Chinese LLM Leaderboard

🏆

124

Explore LLM benchmark leaderboard and submit models

liked a Space almost 2 years ago

Open Chinese LLM Leaderboard

🏆

124

Explore LLM benchmark leaderboard and submit models
