openfree

80 11 467

https://discord.gg/openfreeai

AI & ML interests

None yet

Recent Activity

updated a Space about 15 hours ago

VIDraft/vkae

reacted to ginigen-ai's post with ❤️ about 19 hours ago

🧠 Does your LLM know when it's about to be wrong? Most leaderboards measure accuracy. We measure metacognition: whether a model catches its own errors. Benchmark + leaderboard + adapters, all open.

🎉 The surprise: even a K-AI #1 model (JGOS-31B-Citizen) is the strongest on multiple-choice traps (trap_rate 0.005, about 2 misses in 400) yet blind to its own free-form mistakes (self-confidence AUROC = 0.5, pure random). A tiny base-frozen adapter recovers that signal.

Two independent axes (never compared across a row):
① trap_rate: does it fall for tempting trap options? (lower = stronger)
② adapter gain Δ: how much a lightweight adapter catches errors the model itself misses. (higher = more adapter value)

What's open:
📊 300+100 trap problems (each with a hidden trap + TICOS type)
🏆 24-model leaderboard
🧩 11 per-model adapters: adapters, NOT fine-tunes (base stays frozen; the adapter just reads the hidden state → P(wrong))

Submit any HF model → auto-scored daily at 09:00 KST and added to the board.

🏆 Leaderboard → https://huggingface.co/spaces/ginigen-ai/Metacognition-Leaderboard-Space
📊 Benchmark → https://huggingface.co/datasets/ginigen-ai/Metacognition-Bench
🧩 Adapters → https://huggingface.co/collections/FINAL-Bench/metacognition-adapters-6a42c032e6beb803dd032961
📊 Article → https://huggingface.co/blog/ginigen-ai/metacognition

Benchmark by ginigen-ai · Adapters by FINAL-Bench (Darwin/Chimera platform + AETHER metacognition tech).
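The two numbers quoted in the post (a trap_rate of 0.005 and an AUROC of 0.5) can be made concrete with a small sketch. The post does not publish its scoring code, so everything below is an assumption for illustration: the function names `trap_rate` and `auroc` are hypothetical, the data is made up, and the benchmark's actual pipeline may differ. The sketch only shows the standard definitions: trap_rate as the fraction of trap problems missed, and AUROC as the probability that a wrong answer receives a higher P(wrong) score than a correct one.

```python
def trap_rate(fell_for_trap):
    """Fraction of trap problems where the model chose the trap option.

    fell_for_trap: list of 0/1 flags, one per problem (hypothetical input format).
    """
    return sum(fell_for_trap) / len(fell_for_trap)


def auroc(scores, labels):
    """Pairwise (Mann-Whitney) AUROC.

    scores: predicted P(wrong) per answer.
    labels: 1 if the answer was actually wrong, 0 if it was correct.
    Returns the probability that a random wrong answer outscores a random
    correct one, counting ties as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one wrong and one correct answer")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Illustrative data only: 2 misses out of 400 trap problems.
misses = [1, 1] + [0] * 398
print(trap_rate(misses))

# A confidence signal that is identical for every answer carries no
# information about errors, so every pair is a tie.
labels = [1, 1, 0, 0]
print(auroc([0.5, 0.5, 0.5, 0.5], labels))

# A signal that ranks every wrong answer above every correct one.
print(auroc([0.9, 0.8, 0.2, 0.1], labels))
```

Under these definitions an AUROC of 0.5 means the score ranks wrong and correct answers no better than chance, which is the sense in which the post calls the model's self-confidence "pure random".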

updated a bucket about 21 hours ago

gemma-challenge/gemma-vidraft-darwin


Organizations

liked 11 models 2 days ago

liked a dataset 2 days ago

ginigen-ai/Metacognition-Bench

Viewer • Updated 2 days ago • 300 • 78 • 19

liked 2 Spaces 2 days ago

Metacognition Leaderboard

🧠

Explore LLM metacognition rankings and submit models for evaluation

RoboCasa Kitchen Leaderboard

🍳

Neutral aggregation of VLA success rates on RoboCasa Kitchen

liked a model 4 days ago

VIDraft/JGOS-31B-Think

Image-Text-to-Text • 31B • Updated 3 days ago • 71 • 2

liked a Space 4 days ago

VKAE

🚀

View Gemma model speedup benchmarks and request access

liked a model 17 days ago

FINAL-Bench/Darwin-398B-JGOS

Text Generation • 403B • Updated 6 days ago • 376 • 29

liked a Space 17 days ago

Fast Gemma Challenge

⚡

Multi-agent collab to make Gemma go brrr

liked a Space 18 days ago

FINAL-Bench Quantum Leaderboard

⚛

Neutral quantum-method benchmark — QEC decoders & more

liked a model 18 days ago

FINAL-Bench/Darwin-4B-Opus

Text Generation • 8B • Updated May 15 • 68 • 24