StableLM FineWeb Training Runs

This repository contains checkpoints from multiple StableLM training runs on FineWeb-Edu, each using a different weight decay configuration.

⚠️ This repository is intended for research and analysis purposes. It is not a single ready-to-use Hugging Face model.


πŸ“ Repository Structure

stablelm_fineweb_wd0/
stablelm_fineweb_wd0.01/
stablelm_fineweb_wd0.1/
stablelm_fineweb_wd1/
stablelm_fineweb_wd3/

Inside each folder you will find:

  • stepXXXX-unsharded/: unsharded model checkpoints (see the enumeration sketch after this list)
  • logs/: training logs
  • wandb/: experiment tracking data
  • train_data/: metadata describing the training data
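
Once a run is downloaded, its checkpoint directories can be enumerated by step number. The snippet below is a minimal Python sketch; it only assumes the stepXXXX-unsharded naming convention shown above, and the run folder used here is just an example.

from pathlib import Path

# Point at one run folder; any of the five runs works the same way.
run_dir = Path("stablelm_fineweb_wd0.1")

# Sort checkpoint directories by their training step, parsed from the name.
checkpoints = sorted(
    run_dir.glob("step*-unsharded"),
    key=lambda p: int(p.name.removeprefix("step").removesuffix("-unsharded")),
)
for ckpt in checkpoints:
    print(ckpt.name)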

🧪 Training Setup

  • Base architecture: StableLM (1B)
  • Dataset: FineWeb-Edu (tokenized)
  • Optimizer: AdamW
  • Global batch size: 128
  • Sequence length: 4096
  • Weight decay: varies per run (0, 0.01, 0.1, 1, and 3, matching the folder names)

Each run was trained independently using identical configurations except for the weight decay parameter.
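
To illustrate how the runs relate, here is a minimal PyTorch sketch. The tiny Linear module and the learning rate are placeholders so the snippet runs on its own, not the actual StableLM 1B model or schedule used in these runs; only the AdamW optimizer and the weight decay sweep come from the setup above.

import torch

# One AdamW optimizer per run; everything is held fixed except weight_decay.
WEIGHT_DECAYS = [0.0, 0.01, 0.1, 1.0, 3.0]  # matches the five run folders

def make_optimizer(model, weight_decay):
    # lr is a placeholder value; see each run's logs/ for the real settings.
    return torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=weight_decay)

model = torch.nn.Linear(8, 8)  # stand-in for the StableLM 1B model
optimizers = {wd: make_optimizer(model, wd) for wd in WEIGHT_DECAYS}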


📌 Intended Use

This repository is intended for:

  • Studying the effect of weight decay on large language model training
  • Analyzing optimization dynamics
  • Reproducing or extending StableLM training experiments

These checkpoints are not recommended for direct production deployment without additional validation and fine-tuning.


📥 Downloading Checkpoints

You can download the full repository (all runs) with Git LFS:

git lfs install
git clone https://huggingface.co/xsong69/stablelm-runs-all
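
Cloning pulls all five runs, which can be large. To fetch a single run instead, the huggingface_hub library's snapshot_download can filter by folder. This is a minimal sketch; the run chosen and the local_dir name are arbitrary examples:

from huggingface_hub import snapshot_download

# Download only the weight-decay-0.1 run; allow_patterns limits the files fetched.
snapshot_download(
    repo_id="xsong69/stablelm-runs-all",
    allow_patterns=["stablelm_fineweb_wd0.1/*"],
    local_dir="stablelm-runs",
)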



Each subfolder corresponds to an independent training run.