patch #4190: [Metaticket] Split translations to multiple po-files

Submitted by:  Marko Lindqvist <cazfi>
Submitted on:  Mon 16 Sep 2013 09:39:41 AM UTC  
 
Category: bootstrap
Priority: 5 - Normal
Status: In Progress
Privacy: Public
Assigned to: Jacob Nevins <jtn>
Open/Closed: Open
Planned Release: 2.5.0, 2.6.0


Sun 26 Jan 2014 03:25:18 PM UTC, comment #10:

> - From what I've thought of so far, I prefer adding
> "translation_domain = "freeciv"" to core nations. [reasons]

(now done)
Fair enough, I'm content with this.

> - Purpose of "R_()" is not collection of translatable strings
> [...but...] runtime fetching of translations from correct domain

I now appreciate this better.
Patch #4455 has an awful proposal in this connection (which is more or less orthogonal to the rest of the ideas here).

> - Opposite to your example, Sini has told me that splitting
> the translations would make improvements to Finnish
> translation much more likely

Noted. So far, translators who would like to work on monolithic files remain purely hypothetical. I'd better find at least one real person whose life would be improved before sinking lots of effort into making it possible.

> - When I generated extended set .po -files by msgmerging full
> .po against their .pot, I noticed how the files were about
> same size as the originals. I have not checked them, but I
> assume this to be due to the fact that even when .pot lacks
> the string existing in .po, it's not removed but merely
> commented out. That's not a big problem, but speaks against
> doing the conversion repeatedly (i.e. getting all the
> obsolete messages back in even if they have once been removed)

Indeed, every .po file contains all the translations from the "other" ones as commented "obsolete" translations, which is untidy.

As an early step to getting on top of this, I'm trying to find a procedure for importing monolithic files from older branches that puts existing "obsolete" strings in freeciv/*.po by default and leaves other domains clean of them (using tools like msggrep).

If I succeed, I plan to re-run the import from S2_4 with that, and use it for any further updates from S2_4.
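A rough model of the outcome I'm aiming for, on plain dicts rather than real po-files: each domain keeps only the strings its template knows about, and anything found in no template is parked as "obsolete" in one domain only. This is only a sketch; it does not call msggrep or any other gettext tool, and the function name and data layout are invented for illustration.

```python
def import_monolithic(po, templates, obsolete_domain="freeciv"):
    """Split one {msgid: msgstr} catalog across domains.

    templates maps domain -> set of msgids in that domain's .pot.
    Returns domain -> {"active": {...}, "obsolete": {...}}; strings found in
    no template are kept as obsolete in obsolete_domain only.
    """
    known = set().union(*templates.values())
    result = {}
    for domain, msgids in templates.items():
        active = {m: po[m] for m in msgids if m in po}
        result[domain] = {"active": active, "obsolete": {}}
    # Leftover strings go to a single domain, so the others stay clean.
    leftovers = {m: s for m, s in po.items() if m not in known}
    result.setdefault(obsolete_domain, {"active": {}, "obsolete": {}})
    result[obsolete_domain]["obsolete"] = leftovers
    return result


# Invented sample data: one monolithic catalog, two templates.
po = {"Romans": "R-tr", "Build city": "B-tr", "Old string": "O-tr"}
templates = {"freeciv": {"Build city"}, "nations": {"Romans"}}
split = import_monolithic(po, templates)
print(split["nations"]["obsolete"])  # stays empty
```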

Jacob Nevins <jtn>
Project Administrator, in charge of this item.
Wed 06 Nov 2013 11:59:42 PM UTC, comment #9:

I've finished with the implementation I had in mind.

Marko Lindqvist <cazfi>
Project Administrator
Sun 13 Oct 2013 04:53:06 PM UTC, comment #8:

Here are various points that come to mind after reading your comment. I need to think about the overall picture a bit longer, but I'm inclined to continue with my current patchset for now. We can improve it later, but I want a working system relatively soon (in terms of our releases: I want it in S2_5). However, I'll try to keep doors open to multiple directions for as long as possible when implementing the patchset.

- I had already discarded the idea of reusing the group definition "Core" for marking the translation domain, on the basis that a custom ruleset may add new nations to its "Core" group, and such a ruleset should not expect to find translations in the core domain.
- From what I've thought of so far, I prefer adding "translation_domain = "freeciv"" to core nations. Other nations would default to using the extended nations domain. We've already established that there should not be many changes to which nations belong to the core group, so there would not be much maintenance work. Also, this would be a rather clean ruleset format for adding localization support for custom rulesets, a feature I'm now quite seriously considering implementing.
- Opposite to your example, Sini has told me that splitting the translations would make improvements to Finnish translation much more likely: she has no time to work on the massive full .po, but given a more compact core .po she could update that part (and leave extended nations untranslated).
- Purpose of "R_()" is not collection of translatable strings (that happens based on the "_(" part of it, which dictates that we can't have more namespace-clean naming like "_R()" or "PL_R()"). "R_()" is meant for runtime fetching of translations from the correct domain, as the default domain for freeciv-ruledit too will be freeciv core (so that all the code shared with the freeciv server gets correct translations).
- We already use multiple domains, though in a less obvious way. The libraries we use have translations of their own, the most obvious example being gtk+ stock buttons.
- When I generated extended set .po files by msgmerging the full .po against their .pot, I noticed that the files were about the same size as the originals. I have not checked them, but I assume this to be due to the fact that even when the .pot lacks a string existing in the .po, it's not removed but merely commented out. That's not a big problem, but it speaks against doing the conversion repeatedly (i.e. getting all the obsolete messages back in even if they have once been removed).
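To make the "translation_domain" point above concrete, here is a toy sketch of the defaulting rule: a nation ruleset that says nothing falls back to the extended nations domain, while core nations name "freeciv" explicitly. The regex is not freeciv's real ruleset loader, and the fallback domain name "freeciv-nations" and the sample section names are assumptions made up for illustration.

```python
import re

# Assumed name for the extended nations domain; nothing here fixes one.
DEFAULT_NATION_DOMAIN = "freeciv-nations"

# Matches a line such as: translation_domain = "freeciv"
_DOMAIN_RE = re.compile(r'^\s*translation_domain\s*=\s*"([^"]*)"', re.MULTILINE)


def nation_domain(ruleset_text):
    """Return the gettext domain a nation ruleset asks for, or the default."""
    match = _DOMAIN_RE.search(ruleset_text)
    return match.group(1) if match else DEFAULT_NATION_DOMAIN


# Hypothetical ruleset fragments, not copied from any real nation file.
core_nation = '[nation_one]\ntranslation_domain = "freeciv"\nname = _("One")\n'
extended_nation = '[nation_two]\nname = _("Two")\n'

print(nation_domain(core_nation))      # freeciv
print(nation_domain(extended_nation))  # freeciv-nations
```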

Marko Lindqvist <cazfi>
Project Administrator
Sun 13 Oct 2013 01:02:05 PM UTC, comment #7:

>> we could equally well keep the combined files in svn as the
>> canonical po-file
> once we add freeciv-ruledit domain (See patch #4243) to the mix,
> it becomes clear that domains should be completely separate

Aha, I didn't realise you had plans beyond "core nations". That does change things a bit :)

> 1) Have nation ruleset to say which domain it belongs to. [...]

Well, we do have that information in nation group/set membership (groups=..., "Core"). It would be easy enough to add an optional property to the nation set definition in nationlist.ruleset for a gettext domain for nations in that set, which would provide the relevant information at runtime. And a cheesy regex could be used to filter nation files into the two sets when building potfiles.
(I'd like to avoid a solution that means committers have to add files to the correct POTFILES.in-equivalent if possible, because that's unnecessarily error-prone.)
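By "cheesy regex" I mean something along these lines. It is a sketch only: it assumes a nation counts as core when its groups= line mentions "Core" as a quoted string, and the file names and contents below are invented samples.

```python
import re

# Assumption: a nation is "core" if its groups= line mentions "Core".
_CORE_RE = re.compile(r'^\s*groups\s*=.*"Core"', re.MULTILINE)


def split_nation_files(files):
    """Partition {filename: text} into (core, extended) sorted filename lists."""
    core, extended = [], []
    for name, text in sorted(files.items()):
        (core if _CORE_RE.search(text) else extended).append(name)
    return core, extended


# Invented sample input standing in for the nation ruleset files.
sample = {
    "nation/aaa.ruleset": 'groups = "European", "Core"\n',
    "nation/bbb.ruleset": 'groups = "Imaginary"\n',
}
core, extended = split_nation_files(sample)
print(core, extended)
```

The two lists could then feed the per-domain file lists at potfile build time, so committers never have to maintain them by hand.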

That said, of those two options, I'm more inclined towards option 2, the run-time search.
I'm not too worried about the performance hit (the most frequently used "nation" strings are the nation noun/adjective, I think).
However, in general, trying to use multiple domains in a single application does seem to go against the grain of gettext's design -- Googling for "multiple gettext domains" finds various unhappy stories.
One concrete thing that I can think of that could go wrong is if people have a language path set (e.g., fr,es,en) -- this arrangement would prefer a translation from the user's lesser-ranked language if it happened to be in the domain we searched first.
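The language-path worry can be modelled with toy catalogs. The sketch below only imitates the effect of the search order; it does not use gettext itself, and the catalog contents and domain names are invented. Walking domains first returns the Spanish string even though the user ranked French higher.

```python
# Toy catalogs keyed by (domain, language); all contents are invented.
CATALOGS = {
    ("nations", "es"): {"Romans": "Romanos"},
    ("freeciv", "fr"): {"Romans": "Romains"},
}


def lookup(msgid, pairs):
    """Return the first translation found while walking (domain, language) pairs."""
    for key in pairs:
        translated = CATALOGS.get(key, {}).get(msgid)
        if translated is not None:
            return translated
    return msgid


domains = ["nations", "freeciv"]  # order in which domains are searched
languages = ["fr", "es"]          # user's preference order

# Domain-first: exhaust every language of one domain before trying the next.
domain_first = [(d, l) for d in domains for l in languages]
# Language-first: exhaust every domain for the preferred language first.
language_first = [(d, l) for l in languages for d in domains]

print(lookup("Romans", domain_first))    # Romanos
print(lookup("Romans", language_first))  # Romains
```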

And I'm still worried about the usability from translators' and developers' point of view, so I think we will need an automatic procedure of some kind in the workflow to reconcile overlapping strings in separate domains.

(Aside: re R_() in patch #4243, I wonder if it would be sufficient to filter on which source file contains a string, rather than explicit markup? With the munging I detail later to deal with overlaps, I think that might be good enough?)


>> so if "Speaker %s" is translated in one but not the other it'll
>> end up in both (and conflicts will hopefully be spotted).
> Unfortunately msgmerge does not know "common ancestor" [...]
> One of the files is always the primary one and the other is
> just used to fill those translations that are completely
> missing from the other.

However, it seems that "msgcat" treats multiple input po-files as peers, and in case of conflict will produce a fuzzy msgstr with conflict markers for the translator to sort out.
(We could spot the presence of such conflict markers and warn at merge time, but the existing workflow where we tell translators how many fuzzies they have after merge would suffice.)
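As a rough illustration of the "peers" idea (this models the behaviour on dicts; it neither runs msgcat nor reproduces its output format), a merge that treats both inputs as equals and sets conflicts aside for the translator might look like this. The function name and sample strings are made up.

```python
def merge_peers(a, b):
    """Merge two {msgid: msgstr} catalogs as equals.

    Returns (merged, fuzzy). Entries that agree, or are translated on one
    side only, go to merged; entries untranslated on both sides stay in
    merged as "". Msgids translated differently in a and b are listed in
    fuzzy with both candidates, for a translator to resolve.
    """
    merged, fuzzy = {}, {}
    for msgid in sorted(set(a) | set(b)):
        candidates = {s for s in (a.get(msgid), b.get(msgid)) if s}
        if len(candidates) > 1:
            fuzzy[msgid] = sorted(candidates)
        elif candidates:
            merged[msgid] = candidates.pop()
        else:
            merged[msgid] = ""
    return merged, fuzzy


# Placeholder translations; "" means untranslated.
core = {"Speaker %s": "Speaker-A %s", "King %s": "King-A %s"}
nations = {"Speaker %s": "", "King %s": "King-B %s"}
merged, fuzzy = merge_peers(core, nations)
print(merged)
print(fuzzy)
```

Neither input wins by default, so a changed translation can only end up flagged, never silently dropped.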

> One problem is that due to above mentioned msgmerge
> limitation there's risk of losing translation changes when
> merging translator provided split files to master.

...given the above I think it's possible to create a workflow where no translator changes can be lost in this way.


Attached is a picture elaborating my previous idea (in rather vague terms) for separate runtime translation domains.

  • Set notation is used for "union" and "relative complement".
  • For concreteness, I've shown "gd" as an example of a translator who prefers separate files (so that they can give extended nations lower priority or ignore them), and "fi" as an example of a translator who prefers to translate all strings in a single file (because e.g. their tooling makes working with multiple files a pain) -- these aren't necessarily the real preferences of these translators :)
  • Translations will never be lost, but they may "migrate" to their "correct" file if they come in in the "wrong" one (or the "right" one changes, say due to addition of a leader name to a core nation). It is implied that everyone must translate "core" somehow.
  • As shown, this implies frequent msgmerge'ing against source files; actually I'm vaguely hoping this can be avoided (because it's very slow), but it depends on the answer to the next question, I think:

One thing that's deliberately vague on this picture is what files actually get checked into svn. Allowing variant per-translator workflows complicates this.

  • If translators' "unclean" files get checked into svn, then we have to allow different translators to have different subsets of files checked in somehow. The build system may need to learn each translator's preference, which it would be nice to avoid; otherwise it'll need to glob whatever files are lying around, which is generally bad for reproducible builds.
  • If we want to have a consistent subset of files checked into svn (without loss of generality, the combined "fi.po") then the "munge" step has to be applied on every commit, including in the workflows of translators with direct svn access; and if svn is their working file, they are forced to re-import the result into their tool after every commit.

The picture shows each application getting its own single message catalog at runtime, obviating the need for a run-time search path. However, that leaves this:

> Supporting multiple domains could evolve to a system where
> high-profile rulesets and scenarios could have translations.

...as unsupported -- for that you definitely need support for multiple domains at runtime, so the message catalogs can be distributed separately.

However, a single mo-file is not an essential part of this proposal, it's just made easy by it.


My proposal may be overly baroque; I'm still not entirely sure it's a good idea myself; but I think we will want something like its "munge" step somewhere in our workflow, even if only as an optional check off to the side.

(file #19163, file #19164)

Jacob Nevins <jtn>
Project Administrator, in charge of this item.
Sat 12 Oct 2013 11:33:25 AM UTC, comment #6:

I see two ways to implement how nation-related translations get fetched from the correct domain.

1) Have nation ruleset to say which domain it belongs to. That's more work to set up initially (adding the definition to all nation rulesets, or at least to those with a non-default value), and maybe somewhat ugly with respect to custom rulesets with their own nations: what domain should they define? On the plus side, it's certainly unambiguous, and better performance-wise. Also, the ability to define a domain of their own for custom rulesets could turn from a disadvantage into a feature, with a bit more work to support localization of custom rulesets.

2) Try to fetch the translation from one domain (presumably the extended nations domain, as it has more nations). Compare the string returned by gettext to the original. If they are identical, consider the string untranslated in that domain, and fetch the translation from the other domain. Sometimes (often, when translations are lacking) fetching a string from two domains means a performance hit. If the same string is translated in both domains (once for core nations, and once for the extended set), the translation used depends on which domain is searched first. On the plus side, this would be automatic and would not require extra maintenance for nation rulesets.
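A minimal sketch of option 2, to show the shape of the lookup. The domain names are placeholders, and the dict-backed lookup function is a stand-in so the example runs without any installed catalogs.

```python
def translate_nation_string(msgid, dgettext,
                            domains=("freeciv-nations", "freeciv")):
    """Try each domain in order; an unchanged string means 'not translated here'."""
    for domain in domains:
        translated = dgettext(domain, msgid)
        if translated != msgid:
            return translated
    return msgid


# Stand-in for gettext: dict-backed catalogs with invented contents.
FAKE_CATALOGS = {
    "freeciv-nations": {"Atlantean": "Atlantean (xx)"},
    "freeciv": {"Roman": "Roman (xx)"},
}


def fake_dgettext(domain, msgid):
    """Return the catalog entry, or the msgid itself when there is none."""
    return FAKE_CATALOGS.get(domain, {}).get(msgid, msgid)


print(translate_nation_string("Roman", fake_dgettext))    # second domain is used
print(translate_nation_string("Unknown", fake_dgettext))  # falls through untranslated
```

Python's standard gettext.dgettext(domain, message) has the same call shape and could be passed in instead of the stand-in. Note that a translation which is legitimately identical to the original always costs a second lookup.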

Opinions?

Marko Lindqvist <cazfi>
Project Administrator
Fri 11 Oct 2013 09:52:30 AM UTC, comment #5:

>> we could equally well keep the combined files in svn as the
>> canonical po-files (there's no technical need for separate
>> catalogues to end up in binary packages, after all), and do
>> the splitting at the translator interface: offer split
>> po-files at cazfi.net, post stats on split files, etc.


I was undecided about this when thinking only of splitting the "extra nations" domain out of "freeciv core"; in that case it would have clear advantages. However, once we add freeciv-ruledit domain (See patch #4243) to the mix, it becomes clear that domains should be completely separate (I don't think an even more complex system that handles different domains in different ways would make sense).

Marko Lindqvist <cazfi>
Project Administrator
Sat 28 Sep 2013 09:46:17 PM UTC, comment #4:

> so if "Speaker %s" is translated in one but not the other it'll
> end up in both (and conflicts will hopefully be spotted).


Unfortunately msgmerge does not know "common ancestor", so it can't tell which one of the two translations has changed (relative to ancestor), or if both have changed. One of the files is always the primary one and the other is just used to fill those translations that are completely missing from the other.
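To illustrate the limitation as I describe it (this is a model on dicts, not msgmerge itself, and the names and strings are made up): when one file is primary and the other only fills gaps, an updated translation in the secondary file is silently ignored.

```python
def fill_missing(primary, secondary):
    """Keep every translation in primary; use secondary only to fill gaps."""
    result = dict(primary)
    for msgid, msgstr in secondary.items():
        if not result.get(msgid):
            result[msgid] = msgstr
    return result


# Placeholder strings; "" means untranslated.
master = {"Speaker %s": "old wording %s", "King %s": ""}
translator = {"Speaker %s": "new wording %s", "King %s": "king wording %s"}

# With master as primary, the translator's improved "Speaker %s" is lost.
print(fill_missing(master, translator))
```

Without a common ancestor there is no way to tell that "new wording %s" is the newer of the two, so whichever file is primary simply wins.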

> we could equally well keep the combined files in svn as the
> canonical po-files (there's no technical need for separate
> catalogues to end up in binary packages, after all), and do
> the splitting at the translator interface: offer split
> po-files at cazfi.net, post stats on split files, etc.


The main benefit I see with this is that, with only one domain, we wouldn't need to worry in the code about getting translations from the correct domain.
One problem is that due to above mentioned msgmerge limitation there's risk of losing translation changes when merging translator provided split files to master. Converting the other way there's no such risk: if a translator provides a translation as a full po, all split files generated from it would get the current translation.
Supporting multiple domains could evolve to a system where high-profile rulesets and scenarios could have translations.

> (The notion of splitting would still need to be reflected in
> our automake infrastructure to some extent


We would need both the master po-files and the split files automatically generated from them. If we need the split files to be generated only at 'make dist' time, this might be doable; we certainly don't want to add their generation to 'make' or even 'make install'.

Marko Lindqvist <cazfi>
Project Administrator
Sun 22 Sep 2013 01:01:27 PM UTC, comment #3:

We might need to think about how to minimise disruption to the workflow of those translators who are happy to keep up with all nations.
(But maybe I'm over-engineering this, so I welcome feedback from translators about whether they care, as it is extra work for us :)

In particular, splitting by source file means there's a small set of strings that will be duplicated: leader titles. I haven't checked how many that'll be, but I'm sure there'll be some.

  • When we developers "make update-po" for release, I think we should canonicalise by propagating any duplicates by using each po-file as a compendium for the other (or something like that), so if "Speaker %s" is translated in one but not the other it'll end up in both (and conflicts will hopefully be spotted).
  • Translators who choose to work with two separate po-files can make their own arrangements to spot duplicates.
  • I think we should consider continuing to support translators working with a single po-file (but would welcome feedback from translators if they care about this, since it's a bit of work for us). We could arrange that automatically-generated combined po-files end up on cazfi.net, and when we receive a combined po-file we can split it back into individual translations for commit.

In fact... if we were to go to that effort, we could equally well keep the combined files in svn as the canonical po-files (there's no technical need for separate catalogues to end up in binary packages, after all), and do the splitting at the translator interface: offer split po-files at cazfi.net, post stats on split files, etc.
(The notion of splitting would still need to be reflected in our automake infrastructure to some extent, for bug #19087 if nothing else.)

> Changing what nations belong to core group later (when actively
> maintained translation is already in split form) will probably
> be painful.

Patch #3448 (which I am still slowly picking away at and hope to get in 2.5) will also make it painful, as assumptions about which nations are part of the "core" group will end up embedded in save files and cause disruption if a nation is removed from "core" and an old savefile is loaded with nationset="core" but mentioning the old nation.

> Added freeciv-i18n to cc of this ticket.

I don't think that'll work, sadly, as the From: line of emails from Gna is not a list member. (I don't see comment #2 on -i18n, anyway.)

Jacob Nevins <jtn>
Project Administrator, in charge of this item.
Mon 16 Sep 2013 09:59:46 AM UTC, comment #2:

> - Once we do the split, I think it's possible to get po-files
> correctly split by simply msgmerging old (single) po-files
> against both freeciv.pot and nations.pot. This should extract
> the strings relevant for the .pot in question to the resulting
> po-files. Same applies when copying updated translation from
> S2_4 (where is only single po-file) to S2_5 & TRUNK


Changing what nations belong to core group later (when actively maintained translation is already in split form) will probably be painful.

Marko Lindqvist <cazfi>
Project Administrator
Mon 16 Sep 2013 09:41:19 AM UTC, comment #1:

Added freeciv-i18n to cc of this ticket.

Marko Lindqvist <cazfi>
Project Administrator
Mon 16 Sep 2013 09:39:41 AM UTC, original submission:

When core nations were introduced, the idea was that legends of the other nations should live in a separate po-file (a different domain in gettext terms).

Here's my current thinking:

- What is certain, is that two files with same name cannot live in same directory. That is, we cannot have translations of multiple domains (with filename <lang>.po) in the same directory. For the directory hierarchy I propose replacing current po/ -directory with translations/<domain> -directories:
translations/freeciv/*.po
translations/nations/*.po
translations/Strings.txt

- Once we do the split, I think it's possible to get po-files correctly split by simply msgmerging old (single) po-files against both freeciv.pot and nations.pot. This should extract the strings relevant for the .pot in question to the resulting po-files. Same applies when copying updated translation from S2_4 (where there is only a single po-file) to S2_5 & TRUNK.

- Our po/Makefile.in.in is originally from a rather old version of gettext. We cannot update it simply by copying a newer version in, as we have some local modifications to it, and our configure.ac depends on some of the internals of the current Makefile.in.in. It might make sense to update it now, before making even heavier modifications that would make a later update even harder. Not that I see any need to update it any time soon: the current version is sufficient for us, and when I looked at the latest version it would mainly add complications we would need to patch against.

- I'm not aware of any way to make 'intltool-update -m' consider files other than POTFILES.in & POTFILES.skip, so for us to be able to find files missing from both domains, we probably need to add all files of one POTFILES.in to the POTFILES.skip of the other.

- Targeting 2.5
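The POTFILES.skip idea from the list above could be generated rather than maintained by hand. A small sketch, assuming each domain's POTFILES.in has already been read into a list of paths; the function name and the sample paths are illustrative only.

```python
def cross_skip_lists(potfiles):
    """For each domain, list the files claimed by the other domains.

    potfiles maps domain -> iterable of source paths (its POTFILES.in).
    The result maps domain -> sorted paths to append to its POTFILES.skip.
    """
    skip = {}
    for domain in potfiles:
        others = set()
        for other, files in potfiles.items():
            if other != domain:
                others.update(files)
        # A file listed in this domain's own POTFILES.in is never skipped.
        skip[domain] = sorted(others - set(potfiles[domain]))
    return skip


# Illustrative input; paths are samples, not an actual file list.
potfiles = {
    "freeciv": ["common/game.c", "server/srv_main.c"],
    "nations": ["data/nation/roman.ruleset"],
}
skip = cross_skip_lists(potfiles)
print(skip["freeciv"])
print(skip["nations"])
```

A file that appears in neither POTFILES.in then shows up in neither skip list, so it would still be reported as missing for both domains.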

Marko Lindqvist <cazfi>
Project Administrator

 


Attached Files
file #19163:  4190_translation_workflow.png added by jtn (102kB - image/png - mad workflow proposal (and Graphviz source code: "dot -Tpng 4190_translation_workflow.dot >4190_translation_workflow.png"))
file #19164:  4190_translation_workflow.dot added by jtn (6kB - application/msword - mad workflow proposal (and Graphviz source code: "dot -Tpng 4190_translation_workflow.dot >4190_translation_workflow.png"))

 


Items that depend on this one: None found

 

Carbon-Copy List
  • -unavailable- added by jtn (Posted a comment)
  • -unavailable- added by cazfi
  • -unavailable- added by cazfi (Submitted the item)

Latest 11 changes:

Date                             Changed By  Updated Field  Previous Value => Replaced By
Wed 09 Apr 2014 12:10:40 AM UTC  jtn         Dependencies   - => Depends on patch #4650
Sun 26 Jan 2014 01:40:22 PM UTC  jtn         Dependencies   - => Depends on patch #4454
Wed 06 Nov 2013 11:59:42 PM UTC  cazfi       Assigned to    None => jtn
Tue 29 Oct 2013 10:18:27 PM UTC  cazfi       Dependencies   - => Depends on patch #4285
Tue 29 Oct 2013 09:36:45 PM UTC  cazfi       Dependencies   - => Depends on patch #4283
Sun 13 Oct 2013 01:02:05 PM UTC  jtn         Attached File  - => Added 4190_translation_workflow.png, #19163
                                             Attached File  - => Added 4190_translation_workflow.dot, #19164
Sat 12 Oct 2013 05:26:28 PM UTC  cazfi       Dependencies   - => Depends on patch #4244
Wed 25 Sep 2013 02:14:57 AM UTC  cazfi       Dependencies   - => Depends on patch #4218
Tue 17 Sep 2013 12:36:45 AM UTC  cazfi       Dependencies   - => Depends on patch #4192
Mon 16 Sep 2013 09:40:39 AM UTC  cazfi       Carbon-Copy    - => Added -unavailable-