BEEspoke Data

community

https://www.bees.org/

AI & ML interests

'an LLM is only as good as the dataset it was trained on' - Sun Tzu

Recent Activity

pszemraj

updated a model 2 days ago

BEE-spoke-data/NVIDIA-Nemotron-Parse-v1.2

Image-Text-to-Text • 0.9B • Updated 2 days ago • 20

pszemraj

published a model 2 days ago

BEE-spoke-data/NVIDIA-Nemotron-Parse-v1.2

Image-Text-to-Text • 0.9B • Updated 2 days ago • 20

kenhktsui

authored a paper 5 months ago

MixtureVitae: Open Web-Scale Pretraining Dataset With High Quality Instruction and Reasoning Data Built from Permissive-First Text Sources

Paper • 2509.25531 • Published Sep 29, 2025 • 9

huu-ontocord

authored a paper 5 months ago

MixtureVitae: Open Web-Scale Pretraining Dataset With High Quality Instruction and Reasoning Data Built from Permissive-First Text Sources

Paper • 2509.25531 • Published Sep 29, 2025 • 9

pszemraj

updated a model 5 months ago

BEE-spoke-data/neobert-100k-test

Fill-Mask • 0.1B • Updated Dec 29, 2025

pszemraj

published a model 5 months ago

BEE-spoke-data/neobert-100k-test

Fill-Mask • 0.1B • Updated Dec 29, 2025

pszemraj

updated 2 datasets 7 months ago

BEE-spoke-data/govdocs1-pdf-source

Viewer • Updated Dec 29, 2025 • 235k • 2.62k • 4

BEE-spoke-data/govdocs1-by-extension

Viewer • Updated Dec 29, 2025 • 733k • 158 • 2

amazingvince

updated a dataset 7 months ago

BEE-spoke-data/SurvivorLib-Nanonets-OCR-s

Viewer • Updated Dec 29, 2025 • 14.4k • 18 • 2

pszemraj

updated a collection 7 months ago

Survivor Library Books - OCR

Collection

Books from the Survivor Library (mostly ~1920s & earlier) OCR'd with recent VLMs • 2 items • Updated Jul 14, 2025 • 5

pszemraj

updated 2 datasets 8 months ago

BEE-spoke-data/SurvivorLib-Nanonets-OCR-s

Viewer • Updated Dec 29, 2025 • 14.4k • 18 • 2

BEE-spoke-data/SurvivorLib-rolmOCR

Viewer • Updated Dec 29, 2025 • 14.6k • 28 • 1

amazingvince

published a dataset 8 months ago

BEE-spoke-data/SurvivorLib-Nanonets-OCR-s

Viewer • Updated Dec 29, 2025 • 14.4k • 18 • 2

pszemraj

published a dataset 8 months ago

BEE-spoke-data/SurvivorLib-rolmOCR

Viewer • Updated Dec 29, 2025 • 14.6k • 28 • 1

kenhktsui

authored a paper 8 months ago

Self-Correction Bench: Revealing and Addressing the Self-Correction Blind Spot in LLMs

Paper • 2507.02778 • Published Jul 3, 2025 • 9

pszemraj

published a dataset 8 months ago

BEE-spoke-data/govdocs1-pdf-source

Viewer • Updated Dec 29, 2025 • 235k • 2.62k • 4

pszemraj

updated a model 8 months ago

BEE-spoke-data/tiny-random-MPNetForMaskedLM

Fill-Mask • 237k • Updated Dec 29, 2025 • 2

huu-ontocord

authored 2 papers 8 months ago

EmoNet-Face: An Expert-Annotated Benchmark for Synthetic Emotion Recognition

Paper • 2505.20033 • Published May 26, 2025 • 4

EmoNet-Voice: A Fine-Grained, Expert-Verified Benchmark for Speech Emotion Detection

Paper • 2506.09827 • Published Jun 11, 2025 • 21

Team members 9
