c4-model

This model is a fine-tuned version of bowphs/pythia-70m-multi on the English (en) configuration of the allenai/c4 dataset. At the final evaluation (step 30000), it achieves the following results on the evaluation set:

Loss: 3.5532
Accuracy: 0.3716

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 32
eval_batch_size: 32
seed: 42
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
training_steps: 30000
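
The linear schedule listed above can be sketched in plain Python. This is a minimal illustration, not the training code itself: it assumes no warmup steps (the card does not list any) and a decay to zero at the final step, which is how a linear schedule is commonly defined. The name `linear_lr` and the `WARMUP_STEPS` constant are hypothetical.

```python
BASE_LR = 5e-05        # learning_rate from the card
TOTAL_STEPS = 30000    # training_steps from the card
WARMUP_STEPS = 0       # assumption: the card lists no warmup


def linear_lr(step: int) -> float:
    """Learning rate at a given optimizer step under linear warmup and decay."""
    if step < WARMUP_STEPS:
        # Ramp up linearly from 0 to BASE_LR during warmup.
        return BASE_LR * step / max(1, WARMUP_STEPS)
    # Decay linearly from BASE_LR to 0 over the remaining steps.
    remaining = max(0, TOTAL_STEPS - step)
    return BASE_LR * remaining / max(1, TOTAL_STEPS - WARMUP_STEPS)


for step in (0, 15000, 30000):
    print(step, linear_lr(step))
```

Under these assumptions the learning rate is halved by the midpoint of training and reaches zero at step 30000.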

Training Loss	Epoch	Step	Validation Loss	Accuracy
No log	0.0000	1	10.7029	0.0164
No log	0.0001	2	10.5331	0.0496
No log	0.0001	4	10.3022	0.0533
No log	0.0003	8	10.0235	0.0536
No log	0.0005	16	9.6536	0.0635
No log	0.0011	32	9.0284	0.0759
No log	0.0021	64	8.0249	0.0832
No log	0.0043	128	6.9172	0.1129
No log	0.0085	256	6.1629	0.1558
No log	0.0171	512	5.5805	0.1817
No log	0.0341	1024	5.1235	0.2028
5.4529	0.0667	2000	4.7613	0.2264
5.4529	0.0683	2048	4.7481	0.2281
4.5765	0.1333	4000	4.4123	0.2610
4.5765	0.1365	4096	4.4043	0.2625
4.3252	0.2	6000	4.2221	0.2827
4.146	0.2667	8000	4.0350	0.3098
4.146	0.2731	8192	4.0134	0.3129
3.9652	0.3333	10000	3.8860	0.3304
3.8441	0.4	12000	3.8005	0.3418
3.7739	0.4667	14000	3.7315	0.3503
3.72	0.5333	16000	3.6880	0.3553
3.72	0.5461	16384	3.6777	0.3564
3.6718	0.6	18000	3.6533	0.3593
3.6527	0.6667	20000	3.6212	0.3633
3.6201	0.7333	22000	3.5985	0.3660
3.593	0.8	24000	3.5819	0.3679
3.5857	0.8667	26000	3.5683	0.3697
3.5801	0.9333	28000	3.5582	0.3711
3.5649	1.0	30000	3.5532	0.3716
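
The validation losses in the table can be converted to perplexity for easier comparison with other language models. The sketch below assumes the reported loss is the mean per-token cross-entropy in nats, which is the usual convention for causal language modelling but is not stated in the card; the helper name `perplexity` is illustrative, and only a few rows of the table are copied in.

```python
import math

# (step, validation loss, accuracy) copied from selected rows of the table above.
checkpoints = [
    (1, 10.7029, 0.0164),
    (1024, 5.1235, 0.2028),
    (10000, 3.8860, 0.3304),
    (30000, 3.5532, 0.3716),
]


def perplexity(loss_nats: float) -> float:
    """Perplexity is the exponential of the mean cross-entropy (in nats)."""
    return math.exp(loss_nats)


for step, loss, acc in checkpoints:
    print(f"step {step:>5}: loss {loss:.4f} -> "
          f"perplexity {perplexity(loss):.1f}, accuracy {acc:.4f}")

final_ppl = perplexity(checkpoints[-1][1])
```

Under that assumption, the final validation loss of 3.5532 corresponds to a perplexity of about 35.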

Model size: 70.4M params (Safetensors)
Tensor type: F32
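
As a rough guide to what the parameter count and tensor type mean in practice, the size of the stored weights can be estimated with simple arithmetic. This is a back-of-the-envelope sketch: it uses the rounded count of 70.4M parameters, assumes 4 bytes per F32 value, and ignores activations, optimizer state, and file-format overhead. The function name `weights_size_mb` is hypothetical.

```python
NUM_PARAMS = 70.4e6    # "70.4M params" from the card (rounded)
BYTES_PER_PARAM = 4    # F32 is a 32-bit (4-byte) float


def weights_size_mb(num_params: float, bytes_per_param: int) -> float:
    """Approximate size of the raw weights in megabytes (1 MB = 1e6 bytes)."""
    return num_params * bytes_per_param / 1e6


size_mb = weights_size_mb(NUM_PARAMS, BYTES_PER_PARAM)
print(f"approx. weights size: {size_mb:.1f} MB")
```

Under these assumptions the weights take roughly 282 MB; halving `BYTES_PER_PARAM` gives the corresponding estimate for a 16-bit format.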
