# StableLM FineWeb Training Runs
This repository contains checkpoints and logs from multiple StableLM training runs on FineWeb-Edu, each with a different weight decay configuration.
⚠️ This repository is intended for research and analysis purposes. It is not a single ready-to-use Hugging Face model.
## 📁 Repository Structure
```
stablelm_fineweb_wd0/
stablelm_fineweb_wd0.01/
stablelm_fineweb_wd0.1/
stablelm_fineweb_wd1/
stablelm_fineweb_wd3/
```
Inside each folder you will find:
- `stepXXXX-unsharded/`: unsharded model checkpoints
- `logs/`: training logs
- `wandb/`: experiment tracking data
- `train_data/`: training data metadata
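For orientation, here is a small sketch that lists each run and its available checkpoint steps; it assumes a local clone in a directory named `stablelm-runs-all` (adjust the path to your setup):

```python
from pathlib import Path

# Enumerate runs and their unsharded checkpoint steps in a local clone.
repo = Path("stablelm-runs-all")  # hypothetical local path to the clone
for run in sorted(repo.glob("stablelm_fineweb_wd*")):
    steps = sorted(
        int(ckpt.name.removeprefix("step").removesuffix("-unsharded"))
        for ckpt in run.glob("step*-unsharded")
    )
    print(f"{run.name}: steps {steps}")
```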
## 🧪 Training Setup
- Base architecture: StableLM (1B)
- Dataset: FineWeb-Edu (tokenized)
- Optimizer: AdamW
- Global batch size: 128
- Sequence length: 4096
- Weight decay: varies per run (0, 0.01, 0.1, 1, 3; see folder names)
Each run was trained independently using identical configurations except for the weight decay parameter.
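In other words, the per-run difference reduces to the `weight_decay` argument of AdamW. A minimal sketch of that setup follows; the weight-decay values come from the folder names above, while the learning rate and betas are illustrative placeholders, not the actual training hyperparameters:

```python
import torch

# Weight-decay values swept across the five runs (from the folder names).
WEIGHT_DECAYS = [0.0, 0.01, 0.1, 1.0, 3.0]

def make_optimizer(model: torch.nn.Module, weight_decay: float) -> torch.optim.AdamW:
    # lr and betas below are placeholders, not taken from the run configs.
    return torch.optim.AdamW(
        model.parameters(),
        lr=3e-4,
        betas=(0.9, 0.95),
        weight_decay=weight_decay,
    )
```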
## 🎯 Intended Use
This repository is intended for:
- Studying the effect of weight decay on large language model training
- Analyzing optimization dynamics
- Reproducing or extending StableLM training experiments
These checkpoints should not be deployed directly in production without additional validation and fine-tuning.
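As one example of the kind of weight-decay analysis this repository supports, here is a minimal sketch comparing final parameter norms across runs. It assumes each `stepXXXX-unsharded/` directory holds a plain PyTorch state dict saved as `model.pt`, which may not match the actual checkpoint layout:

```python
from pathlib import Path

import torch

def step_number(ckpt: Path) -> int:
    return int(ckpt.name.removeprefix("step").removesuffix("-unsharded"))

def param_norm(ckpt: Path) -> float:
    # Global L2 norm over all parameters in one unsharded checkpoint.
    state = torch.load(ckpt / "model.pt", map_location="cpu")  # assumed filename
    total_sq = sum(t.float().pow(2).sum().item() for t in state.values())
    return total_sq ** 0.5

for run in sorted(Path("stablelm-runs-all").glob("stablelm_fineweb_wd*")):
    last = max(run.glob("step*-unsharded"), key=step_number)
    print(f"{run.name}: step {step_number(last)}, param norm {param_norm(last):.1f}")
```

Higher weight decay should generally show up as smaller parameter norms at comparable steps, which makes this a quick sanity check before deeper analysis.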
## 📥 Downloading Checkpoints
You can download all runs by cloning the repository:
```bash
git lfs install
git clone https://huggingface.co/xsong69/stablelm-runs-all
```
Each subfolder corresponds to an independent training run.
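To fetch a single run without cloning everything, here is a sketch using `huggingface_hub`; the folder pattern assumes the repository layout shown above:

```python
from huggingface_hub import snapshot_download

# Download only the weight-decay-0.1 run from the repository.
snapshot_download(
    repo_id="xsong69/stablelm-runs-all",
    allow_patterns=["stablelm_fineweb_wd0.1/*"],
    local_dir="stablelm_fineweb_wd0.1_run",
)
```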