Base File Validation
What is validated?
MMG mainly validates the following:
- Whether the number of each tag used by the user is the same
- Whether a duplicate tag appears again before all tags appear once
- Whether there are tags other than the tags specified in advance
- Whether the no-suffix option is set correctly
If validation fails, the verbosity setting described on this page shows why it failed. Even if validation fails, you can force the conversion.
Whether the number of tags used is the same
To catch a missed tag, MMG counts how many times each tag is used and compares the counts.
In the first example below, validation fails because B is missing. The second example includes B and passes.
<!-- multilingual suffix: A, B, C -->
<!-- [A] -->
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
<!-- [C] -->
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
<!-- multilingual suffix: A, B, C -->
<!-- [A] -->
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
<!-- [B] -->
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
<!-- [C] -->
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
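The count comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not MMG's actual implementation: the function names `count_tags` and `counts_match` and both regular expressions are hypothetical, and the sketch assumes tags are written exactly as in the examples on this page.

```python
import re
from collections import Counter

# Hypothetical patterns modelled on the examples above, not taken from MMG.
HEADER_RE = re.compile(r"<!--\s*multilingual suffix:\s*(.*?)\s*-->")
TAG_RE = re.compile(r"<!--\s*\[(.+?)\]\s*-->")


def count_tags(text: str) -> Counter:
    """Count how often each declared tag is used in the document."""
    header = HEADER_RE.search(text)
    declared = [t.strip() for t in header.group(1).split(",")] if header else []
    # Start every declared tag at zero so a tag that is never used still shows up.
    counts = Counter({tag: 0 for tag in declared})
    for tag in TAG_RE.findall(text):
        if tag in counts:
            counts[tag] += 1
    return counts


def counts_match(text: str) -> bool:
    """Return True when every declared tag is used the same number of times."""
    return len(set(count_tags(text).values())) <= 1
```

For the failing example above, `count_tags` would report zero uses of B, so `counts_match` returns `False`.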
Whether a duplicate tag appears again before all tags appear once
Even if every tag is used the same number of times across the document, validation fails when the tags do not appear in complete sets, that is, when a tag repeats before every other tag has appeared once.
The reason for this check is that when tags do not appear uniformly, it is hard to tell which tag is missing, and harder still in a long document, even if the tag counts match. This validation catches such problems early.
In the following example no content is missing for A, B, or C, so the output MMG would generate is fine. However, A appears again after only A and B have appeared, so validation fails.
<!-- multilingual suffix: A, B, C -->
<!-- [A] -->
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
<!-- [B] -->
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
<!-- [A] -->
Aenean in ultrices metus, in semper mi.
<!-- [B] -->
Aenean in ultrices metus, in semper mi.
<!-- [C] -->
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
<!-- [C] -->
Aenean in ultrices metus, in semper mi.
Within each set, A, B, and C do not have to appear in the same order. The following example passes validation.
<!-- multilingual suffix: A, B, C -->
<!-- [C] -->
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
<!-- [A] -->
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
<!-- [B] -->
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
<!-- [A] -->
Aenean in ultrices metus, in semper mi.
<!-- [B] -->
Aenean in ultrices metus, in semper mi.
<!-- [C] -->
Aenean in ultrices metus, in semper mi.
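One way to picture the set rule is to track which tags have been seen in the current set and to reset once the set is complete. The sketch below is a minimal illustration, not MMG's code: `find_early_repeats` and its regular expression are hypothetical, and how MMG continues after the first repeat is an assumption.

```python
import re

# Hypothetical pattern modelled on the examples above, not taken from MMG.
TAG_RE = re.compile(r"<!--\s*\[(.+?)\]\s*-->")


def find_early_repeats(lines, declared):
    """Return (line_number, tag) pairs for tags repeated inside an unfinished set."""
    seen = set()
    problems = []
    for number, line in enumerate(lines, start=1):
        match = TAG_RE.search(line)
        if not match or match.group(1) not in declared:
            continue  # not a declared tag line
        tag = match.group(1)
        if tag in seen:
            # The tag came back before every declared tag appeared once.
            problems.append((number, tag))
            continue
        seen.add(tag)
        if seen == set(declared):
            seen.clear()  # the set is complete, so start the next one
    return problems
```

Because a set is compared without regard to order, a document that uses C, A, B in one set and A, B, C in the next produces no problems.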
Whether there are tags other than the tags specified in advance
Since the user can make a typo at any time, MMG checks that every tag it reads is declared in the header.
In the first example below, validation fails because ko was mistyped as kr and ja was mistyped as jp. The second example uses the declared tags and passes.
<!-- multilingual suffix: ko, ja -->
<!-- [kr] -->
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
<!-- [jp] -->
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
<!-- multilingual suffix: ko, ja -->
<!-- [ko] -->
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
<!-- [ja] -->
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
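The header lookup described above might be sketched as follows. The name `find_unknown_tags` and both patterns are hypothetical and only mirror the examples on this page. The sketch also does not special-case the `common` tag that appears in the log output further down this page.

```python
import re

# Hypothetical patterns modelled on the examples above, not taken from MMG.
HEADER_RE = re.compile(r"<!--\s*multilingual suffix:\s*(.*?)\s*-->")
TAG_RE = re.compile(r"<!--\s*\[(.+?)\]\s*-->")


def find_unknown_tags(text: str):
    """Return (line_number, tag) pairs for tags missing from the header."""
    header = HEADER_RE.search(text)
    declared = {t.strip() for t in header.group(1).split(",")} if header else set()
    unknown = []
    for number, line in enumerate(text.splitlines(), start=1):
        match = TAG_RE.search(line)
        if match and match.group(1) not in declared:
            unknown.append((number, match.group(1)))
    return unknown
```

Keeping the line number next to each unknown tag is what makes a per-line report possible.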
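Whether the no-suffix option is set correctly
The value of the no-suffix option must be one of the user-defined tags. In the first example below, en-US is not a user-defined tag, so validation fails. In the second example, ko is declared in the header, so validation passes.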
<!-- multilingual suffix: ko, ja -->
<!-- no suffix: en-US -->
<!-- [ko] -->
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
<!-- [ja] -->
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
<!-- multilingual suffix: ko, ja -->
<!-- no suffix: ko -->
<!-- [ko] -->
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
<!-- [ja] -->
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
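The no-suffix rule reduces to a membership test against the declared tags. The following is a sketch under assumptions: `no_suffix_is_valid` and the two patterns are hypothetical, and treating a document without a no-suffix line as valid is an assumption made for the illustration.

```python
import re

# Hypothetical patterns modelled on the examples above, not taken from MMG.
HEADER_RE = re.compile(r"<!--\s*multilingual suffix:\s*(.*?)\s*-->")
NO_SUFFIX_RE = re.compile(r"<!--\s*no suffix:\s*(.*?)\s*-->")


def no_suffix_is_valid(text: str) -> bool:
    """Return True when the no-suffix value is one of the declared tags."""
    header = HEADER_RE.search(text)
    declared = {t.strip() for t in header.group(1).split(",")} if header else set()
    no_suffix = NO_SUFFIX_RE.search(text)
    if no_suffix is None:
        return True  # assumption: the option is optional, so nothing to check
    return no_suffix.group(1).strip() in declared
```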
Verbosity setting
Note
The GitHub repository contains example files for trying out the verbosity setting. (Example file location: ./examples/validation-examples)
The default verbosity is 0, so by default MMG prints only the pass or fail result for each file.
$ mmg -r
----------------------
❌ bad_md.base.md
✅ good_md.base.md
❌ bad_jupyter.base.ipynb
----------------------
=> 3 base files were found.
Do you want to convert these files? [y/N]
However, if you set the verbosity, you can find out why the validation failed.
Verbosity 1 (--verbose or -v)
At verbosity 1, MMG also prints the count of each tag. This lets you quickly spot missing tags or typos.
$ mmg -r -v
----------------------
❌ bad_md.base.md
3 language(s) not translated.
Tag count: {'A': 4, 'B': 2, 'C': 2, '<Unknown>': 1}
✅ good_md.base.md
Tag count: {'A': 3, 'B': 3, 'C': 3}
❌ bad_jupyter.base.ipynb
1 language(s) not translated.
Tag count: {'en-US': 2, 'fr-FR': 2, 'ko-KR': 2, 'ja-JP': 2, '<Unknown>': 1}
----------------------
=> 3 base files were found.
Do you want to convert these files? [y/N]
Verbosity 2 (--verbose --verbose or -vv)
At verbosity 2, MMG reports exactly which line failed validation and why. For Jupyter notebooks, it reports the cell and the line within that cell, so you don't have to check each rendered cell.
$ mmg -r -vv
----------------------
❌ bad_md.base.md
3 language(s) not translated.
Tag count: {'A': 4, 'B': 2, 'C': 2, '<Unknown>': 1}
Config: no_suffix 'en-US' is not in lang_tags.
Line 10: Unknown tag 'D' detected.
Line 12: 'common' appeared before all tags appeared once.
Line 18: 'A' appeared again before all tags appeared once.
Line 22: 'common' appeared before all tags appeared once.
Line 28: 'common' appeared before all tags appeared once.
Line 34: 'common' appeared before all tags appeared once.
✅ good_md.base.md
Tag count: {'A': 3, 'B': 3, 'C': 3}
❌ bad_jupyter.base.ipynb
1 language(s) not translated.
Tag count: {'en-US': 2, 'fr-FR': 2, 'ko-KR': 2, 'ja-JP': 2, '<Unknown>': 1}
Cell 4, Line 3: Unknown tag 'English' detected.
----------------------
=> 3 base files were found.
Do you want to convert these files? [y/N]
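To illustrate how a cell and line position like the one in the notebook report above could be derived, the sketch below walks the notebook JSON directly. It is not MMG's implementation: `locate_unknown_tags` and the pattern are hypothetical, and the 1-based numbering and the restriction to markdown cells are assumptions. The only format detail relied on is that a notebook file stores a `cells` list whose entries have a `cell_type` and a `source` given as a string or a list of strings.

```python
import json
import re

# Hypothetical pattern modelled on the examples on this page, not taken from MMG.
TAG_RE = re.compile(r"<!--\s*\[(.+?)\]\s*-->")


def locate_unknown_tags(notebook_json: str, declared):
    """Return (cell_number, line_number, tag) for tags that are not declared."""
    notebook = json.loads(notebook_json)
    found = []
    for cell_no, cell in enumerate(notebook.get("cells", []), start=1):
        if cell.get("cell_type") != "markdown":
            continue  # assumption: only markdown cells carry tags
        source = cell.get("source", "")
        # The source is stored as one string or as a list of line strings.
        text = source if isinstance(source, str) else "".join(source)
        for line_no, line in enumerate(text.splitlines(), start=1):
            match = TAG_RE.search(line)
            if match and match.group(1) not in declared:
                found.append((cell_no, line_no, match.group(1)))
    return found
```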
Skip validation
You can skip validation by using the -s or --skip-validation option.
In this case, Markdown files are displayed as 📄 and Jupyter notebook files are displayed as 📒.
$ mmg -r -s
----------------------
📄 bad.base.md
📄 good.base.md
📒 sample_jupyter.base.ipynb
----------------------
=> 3 base files were found.
Do you want to convert these files? [y/N]
Validation only mode for CI/CD (file creation disabled)
This feature is available from v2.0.0.
This mode only performs validation and does not generate converted files.
Because it is designed for CI/CD, it calls sys.exit(1) if validation fails and sys.exit(0) if validation passes.
This lets a CI/CD pipeline branch on the validation result.
$ mmg -r --validation-only
----------------------
❌ bad_md.base.md
✅ good_md.base.md
❌ bad_jupyter.base.ipynb
----------------------
=> 3 base files were found.
=> Some files are unhealthy.
With the verbosity setting, the reasons for a validation failure are also recorded in the CI/CD log.
$ mmg -r --validation-only -vv
----------------------
❌ bad_md.base.md
3 language(s) not translated.
Tag count: {'A': 4, 'B': 2, 'C': 2, '<Unknown>': 1}
Config: no_suffix 'en-US' is not in lang_tags.
Line 10: Unknown tag 'D' detected.
Line 12: 'common' appeared before all tags appeared once.
Line 18: 'A' appeared again before all tags appeared once.
Line 22: 'common' appeared before all tags appeared once.
Line 28: 'common' appeared before all tags appeared once.
Line 34: 'common' appeared before all tags appeared once.
✅ good_md.base.md
Tag count: {'A': 3, 'B': 3, 'C': 3}
❌ bad_jupyter.base.ipynb
1 language(s) not translated.
Tag count: {'en-US': 2, 'fr-FR': 2, 'ko-KR': 2, 'ja-JP': 2, '<Unknown>': 1}
Cell 4, Line 3: Unknown tag 'English' detected.
----------------------
=> 3 base files were found.
=> Some files are unhealthy.
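Since the mode reports its result through the process exit code, a pipeline step can branch on it. The following is a minimal sketch of such a step, assuming `mmg` is installed and on the PATH. The helper name `gate` and the failure message are hypothetical, while the flags in the default command are the ones shown on this page.

```python
import subprocess
import sys

# Flags as shown on this page; the command is passed in so it can be swapped out.
DEFAULT_COMMAND = ("mmg", "-r", "--validation-only", "-vv")


def gate(command=DEFAULT_COMMAND) -> int:
    """Run the validation command and return its exit code (0 means passed)."""
    code = subprocess.run(list(command)).returncode
    if code != 0:
        # Hypothetical message for the CI log; MMG prints its own report as well.
        print("Base file validation failed; stopping the pipeline.", file=sys.stderr)
    return code
```

A pipeline script could end with `sys.exit(gate())` so that the job fails whenever validation fails.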