# StableLM FineWeb Training Runs
This repository contains checkpoints and logs from multiple StableLM training runs on FineWeb-Edu, each with a different weight decay configuration.
⚠️ This repository is intended for research and analysis purposes. It is not a single ready-to-use Hugging Face model.
## 📁 Repository Structure
```
stablelm_fineweb_wd0/
stablelm_fineweb_wd0.01/
stablelm_fineweb_wd0.1/
stablelm_fineweb_wd1/
stablelm_fineweb_wd3/
```
Inside each folder you will find:
- `stepXXXX-unsharded/`: unsharded model checkpoints
- `logs/`: training logs
- `wandb/`: experiment tracking data
- `train_data/`: training data metadata
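For orientation, here is a small sketch that lists each run and its available checkpoint steps; it assumes a local clone in a directory named `stablelm-runs-all` (adjust the path to your setup):

```python
from pathlib import Path

# Enumerate runs and their unsharded checkpoint steps in a local clone.
repo = Path("stablelm-runs-all")  # hypothetical local path to the clone
for run in sorted(repo.glob("stablelm_fineweb_wd*")):
    steps = sorted(
        int(ckpt.name.removeprefix("step").removesuffix("-unsharded"))
        for ckpt in run.glob("step*-unsharded")
    )
    print(f"{run.name}: steps {steps}")
```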
## 🧪 Training Setup
- Base architecture: StableLM (1B)
- Dataset: FineWeb-Edu (tokenized)
- Optimizer: AdamW
- Global batch size: 128
- Sequence length: 4096
- Weight decay: varies per run (0, 0.01, 0.1, 1, 3; see folder names)
Each run was trained independently using identical configurations except for the weight decay parameter.
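In other words, the per-run difference reduces to the `weight_decay` argument of AdamW. A minimal sketch of that setup follows; the weight-decay values come from the folder names above, while the learning rate and betas are illustrative placeholders, not the actual training hyperparameters:

```python
import torch

# Weight-decay values swept across the five runs (from the folder names).
WEIGHT_DECAYS = [0.0, 0.01, 0.1, 1.0, 3.0]

def make_optimizer(model: torch.nn.Module, weight_decay: float) -> torch.optim.AdamW:
    # lr and betas below are placeholders, not taken from the run configs.
    return torch.optim.AdamW(
        model.parameters(),
        lr=3e-4,
        betas=(0.9, 0.95),
        weight_decay=weight_decay,
    )
```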
## 🎯 Intended Use
This repository is intended for:
- Studying the effect of weight decay on large language model training
- Analyzing optimization dynamics
- Reproducing or extending StableLM training experiments
These checkpoints should not be deployed directly in production without additional validation and fine-tuning.
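As one example of the kind of weight-decay analysis this repository supports, here is a minimal sketch comparing final parameter norms across runs. It assumes each `stepXXXX-unsharded/` directory holds a plain PyTorch state dict saved as `model.pt`, which may not match the actual checkpoint layout:

```python
from pathlib import Path

import torch

def step_number(ckpt: Path) -> int:
    return int(ckpt.name.removeprefix("step").removesuffix("-unsharded"))

def param_norm(ckpt: Path) -> float:
    # Global L2 norm over all parameters in one unsharded checkpoint.
    state = torch.load(ckpt / "model.pt", map_location="cpu")  # assumed filename
    total_sq = sum(t.float().pow(2).sum().item() for t in state.values())
    return total_sq ** 0.5

for run in sorted(Path("stablelm-runs-all").glob("stablelm_fineweb_wd*")):
    last = max(run.glob("step*-unsharded"), key=step_number)
    print(f"{run.name}: step {step_number(last)}, param norm {param_norm(last):.1f}")
```

Higher weight decay should generally show up as smaller parameter norms at comparable steps, which makes this a quick sanity check before deeper analysis.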
## 📥 Downloading Checkpoints
You can download all runs by cloning the repository:
```bash
git lfs install
git clone https://huggingface.co/xsong69/stablelm-runs-all
```
Each subfolder corresponds to an independent training run.
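To fetch a single run without cloning everything, here is a sketch using `huggingface_hub`; the folder pattern assumes the repository layout shown above:

```python
from huggingface_hub import snapshot_download

# Download only the weight-decay-0.1 run from the repository.
snapshot_download(
    repo_id="xsong69/stablelm-runs-all",
    allow_patterns=["stablelm_fineweb_wd0.1/*"],
    local_dir="stablelm_fineweb_wd0.1_run",
)
```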