About

This model was created to support experiments evaluating phonetic transcription with the Buckeye corpus, as part of https://github.com/ginic/multipa. It is a version of facebook/wav2vec2-large-xlsr-53 fine-tuned on a specific subset of the Buckeye corpus. For details about specific model parameters, see the model's config.json or the training scripts in the scripts/buckeye_experiments folder of the GitHub repository.

Experiment Details

These experiments examine how increasing the amount of training data affects model performance. The first number in the model name is the total number of randomly selected training samples. Samples are selected to maintain a 50/50 gender split across speakers, except for the models trained on 20000 samples: our train split of Buckeye contains only 18782 audio samples, and they are not divided equally between male and female speakers. The 20000-sample experiments therefore use all 8252 samples from female speakers in the train set plus 10000 randomly selected samples from male speakers, for a total of 18252 samples.
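The selection rule above can be sketched in a few lines of Python. This is only an illustration, not the repository's actual sampling code: the function name `select_balanced`, the placeholder sample IDs, and the seed value are hypothetical, and the male sample count (10530) is derived here as 18782 minus 8252 rather than stated in the text.

```python
import random


def select_balanced(female_ids, male_ids, n_total, seed):
    """Pick n_total samples, half per gender.

    If a group has fewer samples than its half, all of that group is used,
    so the result can be smaller than n_total.
    """
    rng = random.Random(seed)
    per_group = n_total // 2
    chosen = []
    for group in (female_ids, male_ids):
        k = min(per_group, len(group))
        chosen.extend(rng.sample(group, k))
    return chosen


# Placeholder IDs sized like the train split described above:
# 8252 female samples and 18782 - 8252 = 10530 male samples (derived).
female = [f"f{i}" for i in range(8252)]
male = [f"m{i}" for i in range(18782 - 8252)]

subset = select_balanced(female, male, 20000, seed=0)
print(len(subset))  # 18252 = 8252 female + 10000 male
```

For every smaller target size the same rule yields an exact 50/50 split, since both groups contain more than half of the requested samples.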

For each training set size, 5 models are trained with different training data selections (train_seed) while all other hyperparameters are held fixed. Before these models were trained, a simple grid search was run to select reasonable fine-tuning hyperparameters for each target number of samples. The hyperparameter tuning models have not been uploaded to HuggingFace.

Goals:

See how performance on the test set changes as more data is used in fine-tuning

Params to vary:

training seed (--train_seed)
number of data samples used in training the model (--train_samples): 100, 200, 400, 800, 1600, 3200, 6400, 12800, 20000
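As a rough sketch of the resulting experiment grid, the snippet below enumerates one argument list per combination of training set size and seed. The flag names and sample counts come from the list above; the seed values 0 through 4 are an assumption (the card only says 5 models are trained per size), and no training script is invoked here.

```python
from itertools import product

# Training set sizes listed above.
TRAIN_SAMPLES = [100, 200, 400, 800, 1600, 3200, 6400, 12800, 20000]
# Hypothetical seed values; the source only states 5 models per size.
TRAIN_SEEDS = range(5)

# One argument list per (size, seed) pair, using the documented flag names.
runs = [
    ["--train_samples", str(n), "--train_seed", str(seed)]
    for n, seed in product(TRAIN_SAMPLES, TRAIN_SEEDS)
]

print(len(runs))  # 45 = 9 sizes x 5 seeds
```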

Model size: 0.3B params (Safetensors, tensor type F32)