MMS_10langs_sim_ct
This model is a fine-tuned version of facebook/mms-1b-all on an unspecified dataset (the dataset name was not recorded when the card was generated). It achieves the following results on the evaluation set:
- Loss: 0.2628
- Wer: 0.4053
- Bleu: 0.4148
- Rouge: rouge1 0.7342, rouge2 0.5707, rougeL 0.7333, rougeLsum 0.7335
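For readers unfamiliar with the Wer figure above, the following is a minimal sketch of the standard word error rate definition: word-level edit distance divided by the number of reference words. The card does not say which implementation produced the reported numbers, so the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: tokenization is plain whitespace splitting, which
    may differ from whatever the original evaluation used.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Under this definition, a Wer of 0.4053 corresponds to roughly 40.5 word errors per 100 reference words.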
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.001
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 32
- optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 100
- num_epochs: 100
- mixed_precision_training: Native AMP
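The relationship between some of these values can be sketched in a few lines. The example below shows how the total train batch size follows from the per-device batch size and gradient accumulation, and what a linear schedule with warmup looks like. It assumes a single device, 563 optimizer steps per epoch (taken from the Step column of the training results), and a schedule that decays to zero over the full 100 configured epochs; the helper names are hypothetical and this is not the training code itself.

```python
LEARNING_RATE = 0.001
TRAIN_BATCH_SIZE = 4
GRAD_ACCUM_STEPS = 8
WARMUP_STEPS = 100
NUM_EPOCHS = 100
STEPS_PER_EPOCH = 563  # assumption: read off the Step column, constant per epoch
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS  # assumption: schedule spans all epochs


def effective_batch_size(per_device: int, accum_steps: int, num_devices: int = 1) -> int:
    """Number of examples contributing to each optimizer step."""
    return per_device * accum_steps * num_devices


def linear_lr(step: int,
              base_lr: float = LEARNING_RATE,
              warmup_steps: int = WARMUP_STEPS,
              total_steps: int = TOTAL_STEPS) -> float:
    """Linear warmup from 0 to base_lr, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    print(effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS))
    for step in (0, 50, 100, STEPS_PER_EPOCH * 52, TOTAL_STEPS):
        print(step, linear_lr(step))
```

With 4 examples per step and 8 accumulation steps, each optimizer update sees 32 examples, which matches the listed total_train_batch_size under the single-device assumption.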
Training results
| Training Loss | Epoch | Step | Validation Loss | Wer | Bleu | Rouge |
|---|---|---|---|---|---|---|
| 1.9341 | 1.0 | 563 | 0.3961 | 0.4540 | 0.3693 | {'rouge1': 0.6194799391036223, 'rouge2': 0.46434843772712986, 'rougeL': 0.6172392118630954, 'rougeLsum': 0.6178199675400444} |
| 0.5254 | 2.0 | 1126 | 0.3743 | 0.4467 | 0.3775 | {'rouge1': 0.6250992334659873, 'rouge2': 0.4739031340857137, 'rougeL': 0.6236029348991206, 'rougeLsum': 0.6234010880387659} |
| 0.5058 | 3.0 | 1689 | 0.3612 | 0.4402 | 0.3849 | {'rouge1': 0.6298350439537708, 'rouge2': 0.4783898458774042, 'rougeL': 0.6281814950947691, 'rougeLsum': 0.6284393327845401} |
| 0.4929 | 4.0 | 2252 | 0.3697 | 0.4415 | 0.3856 | {'rouge1': 0.6334485473033613, 'rouge2': 0.48564177943983533, 'rougeL': 0.6316198158544744, 'rougeLsum': 0.6317389029466575} |
| 0.4839 | 5.0 | 2815 | 0.3673 | 0.4400 | 0.3883 | {'rouge1': 0.6291193762173194, 'rouge2': 0.48054942060106765, 'rougeL': 0.6268055922399272, 'rougeLsum': 0.6271649799168699} |
| 0.4751 | 6.0 | 3378 | 0.3486 | 0.4213 | 0.4018 | {'rouge1': 0.6447936409330529, 'rouge2': 0.5002381427715037, 'rougeL': 0.6430743213457508, 'rougeLsum': 0.6435655026581161} |
| 0.4659 | 7.0 | 3941 | 0.3462 | 0.4291 | 0.3955 | {'rouge1': 0.641594881331492, 'rouge2': 0.4962915901710817, 'rougeL': 0.6399779135959315, 'rougeLsum': 0.6401049430611638} |
| 0.4576 | 8.0 | 4504 | 0.3552 | 0.4253 | 0.3998 | {'rouge1': 0.6368955937088554, 'rouge2': 0.4894448633455363, 'rougeL': 0.6350720736779953, 'rougeLsum': 0.6354125526187782} |
| 0.4503 | 9.0 | 5067 | 0.3497 | 0.4308 | 0.3963 | {'rouge1': 0.6415766378561767, 'rouge2': 0.49397060787226355, 'rougeL': 0.6401280047607609, 'rougeLsum': 0.6396665966485348} |
| 0.4493 | 10.0 | 5630 | 0.3920 | 0.4653 | 0.3494 | {'rouge1': 0.61745493806306, 'rouge2': 0.4645674350368151, 'rougeL': 0.6157902693313584, 'rougeLsum': 0.6155486534547023} |
| 0.4432 | 11.0 | 6193 | 0.3493 | 0.4282 | 0.3973 | {'rouge1': 0.6497805561059471, 'rouge2': 0.505071806258327, 'rougeL': 0.648162035629783, 'rougeLsum': 0.6481878640170416} |
| 0.4348 | 12.0 | 6756 | 0.3464 | 0.4386 | 0.3839 | {'rouge1': 0.6303556574469422, 'rouge2': 0.4793748722900568, 'rougeL': 0.6285375560289082, 'rougeLsum': 0.6284314521288271} |
| 0.4336 | 13.0 | 7319 | 0.3372 | 0.4129 | 0.4141 | {'rouge1': 0.6466612625285739, 'rouge2': 0.5034968212900892, 'rougeL': 0.6454955184626125, 'rougeLsum': 0.6454712742479207} |
| 0.4276 | 14.0 | 7882 | 0.3462 | 0.4223 | 0.4065 | {'rouge1': 0.6533604974509974, 'rouge2': 0.5108020531781451, 'rougeL': 0.6515718032664506, 'rougeLsum': 0.6519949388175588} |
| 0.4212 | 15.0 | 8445 | 0.3420 | 0.4374 | 0.3961 | {'rouge1': 0.6330143517474989, 'rouge2': 0.48473846295463624, 'rougeL': 0.6312427282296429, 'rougeLsum': 0.631016393264616} |
| 0.417 | 16.0 | 9008 | 0.3374 | 0.4110 | 0.4172 | {'rouge1': 0.6593501805337192, 'rouge2': 0.5180575046657927, 'rougeL': 0.6577536432873535, 'rougeLsum': 0.6577803753186642} |
| 0.4144 | 17.0 | 9571 | 0.3371 | 0.4146 | 0.4117 | {'rouge1': 0.6517530776000595, 'rouge2': 0.5099667107786572, 'rougeL': 0.650344879439215, 'rougeLsum': 0.6504231468260594} |
| 0.4099 | 18.0 | 10134 | 0.3342 | 0.4020 | 0.4282 | {'rouge1': 0.655798828387135, 'rouge2': 0.5157779824344639, 'rougeL': 0.6551214221869609, 'rougeLsum': 0.6544034660423784} |
| 0.4046 | 19.0 | 10697 | 0.3358 | 0.4130 | 0.4194 | {'rouge1': 0.655645765399892, 'rouge2': 0.5132711154178711, 'rougeL': 0.6537665630351752, 'rougeLsum': 0.6542401982196812} |
| 0.4038 | 20.0 | 11260 | 0.3360 | 0.4017 | 0.4278 | {'rouge1': 0.6626729750932345, 'rouge2': 0.5232688135774992, 'rougeL': 0.6608647492545847, 'rougeLsum': 0.6611163678131431} |
| 0.3969 | 21.0 | 11823 | 0.3591 | 0.4429 | 0.3952 | {'rouge1': 0.6223947701841236, 'rouge2': 0.474513259312104, 'rougeL': 0.6208597133035478, 'rougeLsum': 0.6212920022879174} |
| 0.3976 | 22.0 | 12386 | 0.3308 | 0.4038 | 0.4273 | {'rouge1': 0.65794926139884, 'rouge2': 0.5190844296334247, 'rougeL': 0.6566319650777966, 'rougeLsum': 0.6566608566134198} |
| 0.3898 | 23.0 | 12949 | 0.3307 | 0.4011 | 0.4296 | {'rouge1': 0.658651506373759, 'rouge2': 0.5202119236332472, 'rougeL': 0.6572068822954353, 'rougeLsum': 0.6570243226737007} |
| 0.3908 | 24.0 | 13512 | 0.3351 | 0.4075 | 0.4243 | {'rouge1': 0.6622801332764499, 'rouge2': 0.5230352351758334, 'rougeL': 0.6603666005574962, 'rougeLsum': 0.6604687351969698} |
| 0.3863 | 25.0 | 14075 | 0.3301 | 0.4019 | 0.4295 | {'rouge1': 0.6571081888746078, 'rouge2': 0.5189446434182212, 'rougeL': 0.6554586907386226, 'rougeLsum': 0.6558929588716552} |
| 0.3822 | 26.0 | 14638 | 0.3316 | 0.4017 | 0.4299 | {'rouge1': 0.6624969419452018, 'rouge2': 0.5256712465340425, 'rougeL': 0.6610004105835426, 'rougeLsum': 0.6615012662793225} |
| 0.3798 | 27.0 | 15201 | 0.3286 | 0.4025 | 0.4268 | {'rouge1': 0.6583365930635563, 'rouge2': 0.51891322836611, 'rougeL': 0.6562511279899681, 'rougeLsum': 0.6567357916390837} |
| 0.3808 | 28.0 | 15764 | 0.3300 | 0.4164 | 0.4160 | {'rouge1': 0.6528752008244955, 'rouge2': 0.512403283878367, 'rougeL': 0.6514019731946912, 'rougeLsum': 0.6513964083339834} |
| 0.3767 | 29.0 | 16327 | 0.3327 | 0.4027 | 0.4314 | {'rouge1': 0.656988140454903, 'rouge2': 0.5201724914025685, 'rougeL': 0.6559759879942904, 'rougeLsum': 0.6564952679665647} |
| 0.3726 | 30.0 | 16890 | 0.3439 | 0.4017 | 0.4329 | {'rouge1': 0.6534001209156233, 'rouge2': 0.5150586862773879, 'rougeL': 0.6516895747682114, 'rougeLsum': 0.6522414658991624} |
| 0.3703 | 31.0 | 17453 | 0.3322 | 0.3973 | 0.4374 | {'rouge1': 0.6623874928238279, 'rouge2': 0.5263877556961678, 'rougeL': 0.6603966577490825, 'rougeLsum': 0.6608058852708725} |
| 0.3682 | 32.0 | 18016 | 0.3322 | 0.4024 | 0.4321 | {'rouge1': 0.6582498755720785, 'rouge2': 0.5198519850436492, 'rougeL': 0.6571224634669467, 'rougeLsum': 0.6571949918063122} |
| 0.3624 | 33.0 | 18579 | 0.3320 | 0.4163 | 0.4186 | {'rouge1': 0.6508305671652452, 'rouge2': 0.5107918889310479, 'rougeL': 0.6495894626014581, 'rougeLsum': 0.6491951414892321} |
| 0.3617 | 34.0 | 19142 | 0.3344 | 0.4008 | 0.4289 | {'rouge1': 0.6655166838097435, 'rouge2': 0.5260174150043655, 'rougeL': 0.6640025840473446, 'rougeLsum': 0.6638551560314845} |
| 0.3571 | 35.0 | 19705 | 0.3321 | 0.3977 | 0.4351 | {'rouge1': 0.6604766815791591, 'rouge2': 0.5234974378981434, 'rougeL': 0.6593970219924705, 'rougeLsum': 0.6596981523481513} |
| 0.3539 | 36.0 | 20268 | 0.3286 | 0.3971 | 0.4389 | {'rouge1': 0.6612859732075183, 'rouge2': 0.5244526427089478, 'rougeL': 0.6596387233714615, 'rougeLsum': 0.659519150245337} |
| 0.3527 | 37.0 | 20831 | 0.3237 | 0.3935 | 0.4378 | {'rouge1': 0.6675853302971371, 'rouge2': 0.5315069675492735, 'rougeL': 0.6664309900012062, 'rougeLsum': 0.6661034737099508} |
| 0.3484 | 38.0 | 21394 | 0.3257 | 0.3929 | 0.4422 | {'rouge1': 0.671605039316399, 'rouge2': 0.5382113349379088, 'rougeL': 0.6705811747227897, 'rougeLsum': 0.6705800940600171} |
| 0.3503 | 39.0 | 21957 | 0.3242 | 0.3925 | 0.4428 | {'rouge1': 0.6676717501021441, 'rouge2': 0.5335489605323414, 'rougeL': 0.6657704555135927, 'rougeLsum': 0.6663568940614225} |
| 0.3446 | 40.0 | 22520 | 0.3227 | 0.3919 | 0.4405 | {'rouge1': 0.6721727843386212, 'rouge2': 0.5375835085890535, 'rougeL': 0.6712030128579608, 'rougeLsum': 0.6708101695465758} |
| 0.3418 | 41.0 | 23083 | 0.3306 | 0.4027 | 0.4326 | {'rouge1': 0.6666538547956423, 'rouge2': 0.5313996389059243, 'rougeL': 0.665735863422591, 'rougeLsum': 0.6654569863877089} |
| 0.3376 | 42.0 | 23646 | 0.3220 | 0.3941 | 0.4364 | {'rouge1': 0.6727908812703601, 'rouge2': 0.5385970989187185, 'rougeL': 0.6711962769621307, 'rougeLsum': 0.6706226364445047} |
| 0.3411 | 43.0 | 24209 | 0.3272 | 0.3929 | 0.4420 | {'rouge1': 0.6675994567022867, 'rouge2': 0.5323258455802069, 'rougeL': 0.6666819561935813, 'rougeLsum': 0.6662236919778652} |
| 0.337 | 44.0 | 24772 | 0.3293 | 0.3919 | 0.4406 | {'rouge1': 0.6710021949532108, 'rouge2': 0.5376491313521814, 'rougeL': 0.6698985896374592, 'rougeLsum': 0.6695319365904155} |
| 0.3337 | 45.0 | 25335 | 0.3298 | 0.3924 | 0.4411 | {'rouge1': 0.6675114516046587, 'rouge2': 0.5315778128073576, 'rougeL': 0.665720025353038, 'rougeLsum': 0.6659328882258422} |
| 0.332 | 46.0 | 25898 | 0.3313 | 0.4078 | 0.4322 | {'rouge1': 0.658727386248767, 'rouge2': 0.5216497533917515, 'rougeL': 0.6575179103675783, 'rougeLsum': 0.6572258894192742} |
| 0.333 | 47.0 | 26461 | 0.3259 | 0.4014 | 0.4340 | {'rouge1': 0.6645474633443631, 'rouge2': 0.5279839931594242, 'rougeL': 0.6636035100022255, 'rougeLsum': 0.6633772361519483} |
| 0.3283 | 48.0 | 27024 | 0.3322 | 0.3947 | 0.4431 | {'rouge1': 0.6648258185441953, 'rouge2': 0.5302048008043037, 'rougeL': 0.6629690379369351, 'rougeLsum': 0.6629543554517346} |
| 0.3284 | 49.0 | 27587 | 0.3266 | 0.3962 | 0.4371 | {'rouge1': 0.6651785676283977, 'rouge2': 0.5284102304887665, 'rougeL': 0.6639185425125356, 'rougeLsum': 0.6636101088528281} |
| 0.3264 | 50.0 | 28150 | 0.3277 | 0.4018 | 0.4360 | {'rouge1': 0.6634823476620092, 'rouge2': 0.5281132874546199, 'rougeL': 0.6618109712544935, 'rougeLsum': 0.6616931156293221} |
| 0.3226 | 51.0 | 28713 | 0.3278 | 0.3886 | 0.4443 | {'rouge1': 0.6731137680088992, 'rouge2': 0.540057063569765, 'rougeL': 0.6715175742945547, 'rougeLsum': 0.6717528131967394} |
| 0.3194 | 52.0 | 29276 | 0.3340 | 0.3915 | 0.4415 | {'rouge1': 0.6745440866340855, 'rouge2': 0.5400400332225461, 'rougeL': 0.6731883661858653, 'rougeLsum': 0.6736360712930871} |
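Because the Rouge column stores a Python dict literal in each cell, the table is awkward to compare by eye. The sketch below parses rows of this shape and picks the best checkpoint by a chosen metric. It uses three rows copied from the table as sample input; the column keys and the `parse_row` helper are hypothetical names chosen for this illustration, and the card does not state which checkpoint was actually kept.

```python
import ast

# Hypothetical short keys for the seven table columns.
COLUMNS = ["train_loss", "epoch", "step", "val_loss", "wer", "bleu", "rouge"]

# Three rows copied verbatim from the training results table (epochs 42, 51, 52).
SAMPLE_ROWS = """
| 0.3376 | 42.0 | 23646 | 0.3220 | 0.3941 | 0.4364 | {'rouge1': 0.6727908812703601, 'rouge2': 0.5385970989187185, 'rougeL': 0.6711962769621307, 'rougeLsum': 0.6706226364445047} |
| 0.3226 | 51.0 | 28713 | 0.3278 | 0.3886 | 0.4443 | {'rouge1': 0.6731137680088992, 'rouge2': 0.540057063569765, 'rougeL': 0.6715175742945547, 'rougeLsum': 0.6717528131967394} |
| 0.3194 | 52.0 | 29276 | 0.3340 | 0.3915 | 0.4415 | {'rouge1': 0.6745440866340855, 'rouge2': 0.5400400332225461, 'rougeL': 0.6731883661858653, 'rougeLsum': 0.6736360712930871} |
"""


def parse_row(line: str) -> dict:
    """Turn one markdown table row into a dict of typed values."""
    cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
    raw = dict(zip(COLUMNS, cells))
    row = {key: float(raw[key]) for key in COLUMNS[:-1]}
    row["step"] = int(raw["step"])
    # The Rouge cell is a dict literal, so literal_eval parses it safely.
    row["rouge"] = ast.literal_eval(raw["rouge"])
    return row


rows = [parse_row(line) for line in SAMPLE_ROWS.strip().splitlines()]

best_by_wer = min(rows, key=lambda r: r["wer"])        # lower is better
best_by_loss = min(rows, key=lambda r: r["val_loss"])  # lower is better
best_by_rouge1 = max(rows, key=lambda r: r["rouge"]["rouge1"])  # higher is better

if __name__ == "__main__":
    print("best WER:", best_by_wer["epoch"], best_by_wer["wer"])
    print("best validation loss:", best_by_loss["epoch"], best_by_loss["val_loss"])
    print("best rouge1:", best_by_rouge1["epoch"], best_by_rouge1["rouge"]["rouge1"])
```

The metrics do not agree on a single best epoch: across the full table, the lowest Wer (0.3886) is at epoch 51, while the lowest validation loss (0.3220) is at epoch 42.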
Framework versions
- Transformers 4.49.0
- Pytorch 2.8.0+cu128
- Datasets 3.2.0
- Tokenizers 0.21.0
Model tree for ilyes25/MMS_10langs_sim_ct
Base model
facebook/mms-1b-all