MMS_10langs_sim_ct

This model is a fine-tuned version of facebook/mms-1b-all; the fine-tuning dataset is not specified in this card. It achieves the following results on the evaluation set:

  • Loss: 0.2628
  • Wer: 0.4053
  • Bleu: 0.4148
  • Rouge1: 0.7342
  • Rouge2: 0.5707
  • RougeL: 0.7333
  • RougeLsum: 0.7335
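
For reference, below is a minimal inference sketch. It assumes the checkpoint is the Hub repo ilyes25/MMS_10langs_sim_ct, that it loads as a standard CTC model through the transformers auto classes, and that input audio is 16 kHz mono; adjust these assumptions to your setup.

```python
# Minimal inference sketch (assumptions: CTC checkpoint, 16 kHz mono audio).
import torch
import librosa
from transformers import AutoModelForCTC, AutoProcessor

model_id = "ilyes25/MMS_10langs_sim_ct"  # assumed Hub repo id for this card
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForCTC.from_pretrained(model_id)

# Load an example recording and resample it to the 16 kHz rate MMS expects.
speech, _ = librosa.load("example.wav", sr=16_000)

inputs = processor(speech, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Greedy CTC decoding of the most likely token at each frame.
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)[0]
print(transcription)
```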

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.001
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 32
  • optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 100
  • num_epochs: 100
  • mixed_precision_training: Native AMP
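
As a rough sketch, these settings correspond to a transformers TrainingArguments configuration along the following lines. The argument names and the output_dir are reconstructed assumptions, not taken from the original training script.

```python
# Sketch of TrainingArguments matching the hyperparameter list above (assumed mapping).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="MMS_10langs_sim_ct",  # assumed output directory
    learning_rate=1e-3,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=8,    # effective train batch size: 4 * 8 = 32
    num_train_epochs=100,
    lr_scheduler_type="linear",
    warmup_steps=100,
    seed=42,
    fp16=True,                        # "Native AMP" mixed precision
    optim="adamw_torch",              # AdamW with default betas=(0.9, 0.999), eps=1e-08
)
```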

Training results

| Training Loss | Epoch | Step | Validation Loss | Wer | Bleu | Rouge1 | Rouge2 | RougeL | RougeLsum |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| 1.9341 | 1.0 | 563 | 0.3961 | 0.4540 | 0.3693 | 0.6195 | 0.4643 | 0.6172 | 0.6178 |
| 0.5254 | 2.0 | 1126 | 0.3743 | 0.4467 | 0.3775 | 0.6251 | 0.4739 | 0.6236 | 0.6234 |
| 0.5058 | 3.0 | 1689 | 0.3612 | 0.4402 | 0.3849 | 0.6298 | 0.4784 | 0.6282 | 0.6284 |
| 0.4929 | 4.0 | 2252 | 0.3697 | 0.4415 | 0.3856 | 0.6334 | 0.4856 | 0.6316 | 0.6317 |
| 0.4839 | 5.0 | 2815 | 0.3673 | 0.4400 | 0.3883 | 0.6291 | 0.4805 | 0.6268 | 0.6272 |
| 0.4751 | 6.0 | 3378 | 0.3486 | 0.4213 | 0.4018 | 0.6448 | 0.5002 | 0.6431 | 0.6436 |
| 0.4659 | 7.0 | 3941 | 0.3462 | 0.4291 | 0.3955 | 0.6416 | 0.4963 | 0.6400 | 0.6401 |
| 0.4576 | 8.0 | 4504 | 0.3552 | 0.4253 | 0.3998 | 0.6369 | 0.4894 | 0.6351 | 0.6354 |
| 0.4503 | 9.0 | 5067 | 0.3497 | 0.4308 | 0.3963 | 0.6416 | 0.4940 | 0.6401 | 0.6397 |
| 0.4493 | 10.0 | 5630 | 0.3920 | 0.4653 | 0.3494 | 0.6175 | 0.4646 | 0.6158 | 0.6155 |
| 0.4432 | 11.0 | 6193 | 0.3493 | 0.4282 | 0.3973 | 0.6498 | 0.5051 | 0.6482 | 0.6482 |
| 0.4348 | 12.0 | 6756 | 0.3464 | 0.4386 | 0.3839 | 0.6304 | 0.4794 | 0.6285 | 0.6284 |
| 0.4336 | 13.0 | 7319 | 0.3372 | 0.4129 | 0.4141 | 0.6467 | 0.5035 | 0.6455 | 0.6455 |
| 0.4276 | 14.0 | 7882 | 0.3462 | 0.4223 | 0.4065 | 0.6534 | 0.5108 | 0.6516 | 0.6520 |
| 0.4212 | 15.0 | 8445 | 0.3420 | 0.4374 | 0.3961 | 0.6330 | 0.4847 | 0.6312 | 0.6310 |
| 0.417 | 16.0 | 9008 | 0.3374 | 0.4110 | 0.4172 | 0.6594 | 0.5181 | 0.6578 | 0.6578 |
| 0.4144 | 17.0 | 9571 | 0.3371 | 0.4146 | 0.4117 | 0.6518 | 0.5100 | 0.6503 | 0.6504 |
| 0.4099 | 18.0 | 10134 | 0.3342 | 0.4020 | 0.4282 | 0.6558 | 0.5158 | 0.6551 | 0.6544 |
| 0.4046 | 19.0 | 10697 | 0.3358 | 0.4130 | 0.4194 | 0.6556 | 0.5133 | 0.6538 | 0.6542 |
| 0.4038 | 20.0 | 11260 | 0.3360 | 0.4017 | 0.4278 | 0.6627 | 0.5233 | 0.6609 | 0.6611 |
| 0.3969 | 21.0 | 11823 | 0.3591 | 0.4429 | 0.3952 | 0.6224 | 0.4745 | 0.6209 | 0.6213 |
| 0.3976 | 22.0 | 12386 | 0.3308 | 0.4038 | 0.4273 | 0.6579 | 0.5191 | 0.6566 | 0.6567 |
| 0.3898 | 23.0 | 12949 | 0.3307 | 0.4011 | 0.4296 | 0.6587 | 0.5202 | 0.6572 | 0.6570 |
| 0.3908 | 24.0 | 13512 | 0.3351 | 0.4075 | 0.4243 | 0.6623 | 0.5230 | 0.6604 | 0.6605 |
| 0.3863 | 25.0 | 14075 | 0.3301 | 0.4019 | 0.4295 | 0.6571 | 0.5189 | 0.6555 | 0.6559 |
| 0.3822 | 26.0 | 14638 | 0.3316 | 0.4017 | 0.4299 | 0.6625 | 0.5257 | 0.6610 | 0.6615 |
| 0.3798 | 27.0 | 15201 | 0.3286 | 0.4025 | 0.4268 | 0.6583 | 0.5189 | 0.6563 | 0.6567 |
| 0.3808 | 28.0 | 15764 | 0.3300 | 0.4164 | 0.4160 | 0.6529 | 0.5124 | 0.6514 | 0.6514 |
| 0.3767 | 29.0 | 16327 | 0.3327 | 0.4027 | 0.4314 | 0.6570 | 0.5202 | 0.6560 | 0.6565 |
| 0.3726 | 30.0 | 16890 | 0.3439 | 0.4017 | 0.4329 | 0.6534 | 0.5151 | 0.6517 | 0.6522 |
| 0.3703 | 31.0 | 17453 | 0.3322 | 0.3973 | 0.4374 | 0.6624 | 0.5264 | 0.6604 | 0.6608 |
| 0.3682 | 32.0 | 18016 | 0.3322 | 0.4024 | 0.4321 | 0.6582 | 0.5199 | 0.6571 | 0.6572 |
| 0.3624 | 33.0 | 18579 | 0.3320 | 0.4163 | 0.4186 | 0.6508 | 0.5108 | 0.6496 | 0.6492 |
| 0.3617 | 34.0 | 19142 | 0.3344 | 0.4008 | 0.4289 | 0.6655 | 0.5260 | 0.6640 | 0.6639 |
| 0.3571 | 35.0 | 19705 | 0.3321 | 0.3977 | 0.4351 | 0.6605 | 0.5235 | 0.6594 | 0.6597 |
| 0.3539 | 36.0 | 20268 | 0.3286 | 0.3971 | 0.4389 | 0.6613 | 0.5245 | 0.6596 | 0.6595 |
| 0.3527 | 37.0 | 20831 | 0.3237 | 0.3935 | 0.4378 | 0.6676 | 0.5315 | 0.6664 | 0.6661 |
| 0.3484 | 38.0 | 21394 | 0.3257 | 0.3929 | 0.4422 | 0.6716 | 0.5382 | 0.6706 | 0.6706 |
| 0.3503 | 39.0 | 21957 | 0.3242 | 0.3925 | 0.4428 | 0.6677 | 0.5335 | 0.6658 | 0.6664 |
| 0.3446 | 40.0 | 22520 | 0.3227 | 0.3919 | 0.4405 | 0.6722 | 0.5376 | 0.6712 | 0.6708 |
| 0.3418 | 41.0 | 23083 | 0.3306 | 0.4027 | 0.4326 | 0.6667 | 0.5314 | 0.6657 | 0.6655 |
| 0.3376 | 42.0 | 23646 | 0.3220 | 0.3941 | 0.4364 | 0.6728 | 0.5386 | 0.6712 | 0.6706 |
| 0.3411 | 43.0 | 24209 | 0.3272 | 0.3929 | 0.4420 | 0.6676 | 0.5323 | 0.6667 | 0.6662 |
| 0.337 | 44.0 | 24772 | 0.3293 | 0.3919 | 0.4406 | 0.6710 | 0.5376 | 0.6699 | 0.6695 |
| 0.3337 | 45.0 | 25335 | 0.3298 | 0.3924 | 0.4411 | 0.6675 | 0.5316 | 0.6657 | 0.6659 |
| 0.332 | 46.0 | 25898 | 0.3313 | 0.4078 | 0.4322 | 0.6587 | 0.5216 | 0.6575 | 0.6572 |
| 0.333 | 47.0 | 26461 | 0.3259 | 0.4014 | 0.4340 | 0.6645 | 0.5280 | 0.6636 | 0.6634 |
| 0.3283 | 48.0 | 27024 | 0.3322 | 0.3947 | 0.4431 | 0.6648 | 0.5302 | 0.6630 | 0.6630 |
| 0.3284 | 49.0 | 27587 | 0.3266 | 0.3962 | 0.4371 | 0.6652 | 0.5284 | 0.6639 | 0.6636 |
| 0.3264 | 50.0 | 28150 | 0.3277 | 0.4018 | 0.4360 | 0.6635 | 0.5281 | 0.6618 | 0.6617 |
| 0.3226 | 51.0 | 28713 | 0.3278 | 0.3886 | 0.4443 | 0.6731 | 0.5401 | 0.6715 | 0.6718 |
| 0.3194 | 52.0 | 29276 | 0.3340 | 0.3915 | 0.4415 | 0.6745 | 0.5400 | 0.6732 | 0.6736 |
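
The card does not state how the Wer, Bleu, and Rouge columns were computed. As a hedged sketch, the Hugging Face evaluate library exposes metrics with these names; the prediction and reference strings below are placeholders, not data from this model.

```python
# Sketch of computing WER / BLEU / ROUGE with the `evaluate` library (assumed setup).
import evaluate

wer = evaluate.load("wer")
bleu = evaluate.load("bleu")
rouge = evaluate.load("rouge")

predictions = ["this is a predicted transcript"]  # hypothetical decoder output
references = ["this is a reference transcript"]   # hypothetical ground-truth text

print("WER:  ", wer.compute(predictions=predictions, references=references))
print("BLEU: ", bleu.compute(predictions=predictions, references=references)["bleu"])
print("ROUGE:", rouge.compute(predictions=predictions, references=references))
```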

Framework versions

  • Transformers 4.49.0
  • Pytorch 2.8.0+cu128
  • Datasets 3.2.0
  • Tokenizers 0.21.0