# firefox-translations-evaluation **Repository Path**: mirrors_mozilla/firefox-translations-evaluation ## Basic Information - **Project Name**: firefox-translations-evaluation - **Description**: Translation quality evaluation for Firefox Translations models - **Primary Language**: Unknown - **License**: MPL-2.0 - **Default Branch**: main - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2021-06-25 - **Last Updated**: 2026-04-18 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README # Firefox Translations Evaluation **The code was moved to https://github.com/mozilla/firefox-translations-models/tree/main/evals for easier debugging.** Calculates BLEU and COMET scores for Firefox Translations [models](https://github.com/mozilla/firefox-translations-models) using [bergamot-translator](https://github.com/mozilla/bergamot-translator) and compares them to other translation systems. ## Running We recommend running this on a Linux machine with at least one GPU, and inside a docker container. If you intend to run it on macOS, run the `eval/evaluate.py ` script standalone inside a virtualenv, and skip the `Start docker` section below. You might need to manually install the correspondent packages in the `Dockerfile` in your system and virtual environment. ### Clone repo ``` git clone https://github.com/mozilla/firefox-translations-evaluation.git cd firefox-translations-evaluation ``` ### Download models Use `install/download-models.sh` to get Firefox Translations [models](https://github.com/mozilla/firefox-translations-models) (be sure to have [git-lfs](https://git-lfs.com/) enabled) or use your own ones. ### Install NVIDIA Container Toolkit https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html ### Start docker Recommended memory size for Docker is **16gb**. ``` export MODELS= # Specify Azure key and location if you want to add Azure Translator API for comparison export AZURE_TRANSLATOR_KEY= # optional, specify if it's different than default 'global' export AZURE_LOCATION= # Specify GCP credentials json path if you want to add Google Translator API for comparison export GCP_CREDS_PATH= # Build and run docker container bash start_docker.sh ``` On completion, your terminal should be attached to the launched container. ### Run evaluation From inside docker container run: ``` python3 eval/evaluate.py --translators=bergamot,microsoft,google --pairs=all --skip-existing --gpus=1 --evaluation-engine=comet,bleu --models-dir=/models/models/prod --results-dir=/models/evaluation/prod ``` If you don't have a GPU, use `0` in the `--gpus` argument. More options: ``` python3 eval/evaluate.py --help ``` ## Details ### Installation scripts `install/install-bergamot-translator.sh` - clones and compiles [bergamot-translator](https://github.com/mozilla/bergamot-translator) and [marian](https://github.com/marian-nmt/marian-dev) (launched in docker image). `install/download-models.sh` - downloads current Mozilla production [models](https://github.com/mozilla/firefox-translations-models). ### Docker & CUDA The COMET evaluation framework supports CUDA, and you can enable it by setting the `--gpus` argument in the `eval\evaluate.py` script to the number of GPUs you wish to utilize (`0` disables it). If you are using it, make sure you have the [nvidia container toolkit enabled](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#docker) in your docker setup. ### Translators 1. **bergamot** - uses compiled [bergamot-translator](https://github.com/mozilla/bergamot-translator) in wasm mode 2. **marian** - uses compiled [marian](https://github.com/marian-nmt/marian-dev) 3. **google** - users Google Translation [API](https://cloud.google.com/translate) 4. **microsoft** - users Azure Cognitive Services Translator [API](https://azure.microsoft.com/en-us/services/cognitive-services/translator/) ### Reuse already calculated scores Use `--skip-existing` option to reuse already calculated scores saved as `results/xx-xx/*.bleu` files. It is useful to continue evaluation if it was interrupted or to rebuild a full report reevaluating only selected translators. ### Datasets [SacreBLEU](https://github.com/mjpost/sacrebleu) - all available datasets for a language pair are used for evaluation. [Flores](https://github.com/facebookresearch/flores) - parallel evaluation dataset for 101 languages. ### Language pairs With option `--pairs=all`, language pairs will be discovered in the specified models folder (option `--models-dir`) and evaluation will run for all of them. ### Results Results will be written to the specified directory (option `--results-dir`). Evaluation results for models that are used in Firefox Translation can be found in [firefox-translations-models/evaluation](https://github.com/mozilla/firefox-translations-models/tree/main/evaluation)