Release 22.01 (Ionawr / January 2022) · techiaith/docker-huggingface-stt-cy

Read this release note in English

Dyma ein sgriptiau ym mis Ionawr 2022 (22.01) ar gyfer hyfforddi, gwerthuso, defnyddio a chynnal API adnabod lleferydd Cymraeg eich hunain ar sail model wav2vec2-large-xlsr-53 gan Facebook ac HuggingFace, a KenLM gan Kenneth Heafield ac eraill.

Rydym hefyd yn cyhoeddi modelau sydd wedi'u hyfforddi gyda data Mozilla CommonVoice Cymraeg fersiwn 8, a chyhoeddwyd ym mis Ionawr 2022, a data corpws testunau Cymraeg OSCAR o fis Ionawr 2022.

Ceir ffeiliau modelau ar wefan HuggingFace: https://huggingface.co/techiaith/wav2vec2-xlsr-ft-cy/tree/22.01

Mewn arbrofion syml, pan ddefnyddir y model acwsteg ac iaith gyda'i gilydd, mae'r adnabod lleferydd o ganlyniad yn cam-adnabod tua 13.79% o eiriau mewn brawddeg.

in English

Here are our January 2022 (22.01) scripts for training, evaluating, using and hosting your own Welsh speech recognition models based on wav2vec2-large-xlsr-53 by Facebook AI and HuggingFace, as well as KenLM by Kenneth Heafield and others.

This release also contains models trained with the Welsh dataset from Mozilla CommonVoice version 8 as published in January 2022 and the Welsh text corpus dataset from OSCAR from January 2022.

Models can be found on the HuggingFace website: https://huggingface.co/techiaith/wav2vec2-xlsr-ft-cy/tree/22.01
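
As a rough illustration of pulling this release's files, the sketch below pins the `22.01` revision named in the URL above when loading with the `transformers` library. It is an assumption that the revision contains the processor files that `Wav2Vec2Processor` expects; the helper names `load_acoustic_model` and `transcribe` are hypothetical, and the sketch covers only greedy decoding with the acoustic model, not the KenLM language model.

```python
MODEL_ID = "techiaith/wav2vec2-xlsr-ft-cy"
REVISION = "22.01"  # branch/tag name taken from the URL in this release note


def load_acoustic_model(model_id=MODEL_ID, revision=REVISION):
    """Download the processor and acoustic model pinned to one revision."""
    # Imported lazily so the heavy dependency is only needed when called.
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(model_id, revision=revision)
    model = Wav2Vec2ForCTC.from_pretrained(model_id, revision=revision)
    model.eval()
    return processor, model


def transcribe(processor, model, samples, sampling_rate=16000):
    """Greedy (no language model) transcription of one mono waveform.

    `samples` is assumed to be a 1-D float array sampled at 16 kHz,
    the rate wav2vec2 models are normally trained on.
    """
    import torch

    inputs = processor(
        samples, sampling_rate=sampling_rate, return_tensors="pt", padding=True
    )
    with torch.no_grad():
        logits = model(**inputs).logits
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```

Pinning `revision` keeps later uploads to the same repository from silently changing results. The scripts in this release remain the reference for decoding with the language model.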

In simple evaluations on the Welsh Common Voice test set, the acoustic and language models, when used together for inference, achieve a word error rate (WER) of 13.79%, meaning that roughly one word in seven is misrecognised.
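
For readers unfamiliar with the metric, word error rate is conventionally the word-level edit distance (substitutions, insertions and deletions) divided by the number of reference words. The sketch below shows that standard definition. It is not taken from this release's evaluation scripts, which may normalise text differently, and the function names and the example sentence are made up for illustration.

```python
def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance between two token lists."""
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = previous[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = previous[j] + 1      # reference word missing from hypothesis
            insertion = current[j - 1] + 1  # extra word in hypothesis
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def corpus_wer(pairs):
    """WER over (reference, hypothesis) string pairs, pooled across the corpus."""
    total_edits = 0
    total_words = 0
    for reference, hypothesis in pairs:
        ref_words = reference.split()
        total_edits += edit_distance(ref_words, hypothesis.split())
        total_words += len(ref_words)
    if total_words == 0:
        raise ValueError("references contain no words")
    return total_edits / total_words


# Illustrative sentence only: one substituted word out of four.
example = [("mae hi'n braf heddiw", "mae hi'n oer heddiw")]
print(f"WER: {corpus_wer(example):.2%}")
```

Pooling edits over the whole corpus, rather than averaging per-sentence rates, keeps short sentences from dominating the result.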
