Freeciv - Bugs: bug #22767, UTF-8 signature


bug #22767: UTF-8 signature

Submitted by:  Frank <dunnoob>
Submitted on:  Tue Oct 7 01:13:27 2014  
Category: rulesets
Severity: 2 - Minor
Priority: 5 - Normal
Status: None
Assigned to: None
Open/Closed: Open
Release:
Operating System: Microsoft Windows
Planned Release:
Contains string changes: None


Thu Aug 18 02:31:44 2016, comment #4:

Nothing is wrong with the Lendians, and I have no tool to check what viewcvs actually sends (whatever it is can't be right, because Chrome can handle UTF-8). Also see bug #24994 for a reproducible Unicode issue.

Frank <dunnoob>
Thu Jul 28 12:25:29 2016, comment #3:

Based on http://forum.freeciv.org/f/viewtopic.php?f=9&p=6527, http://svn.gna.org/viewcvs/freeciv/trunk/data/nation/lendian.ruleset might be broken UTF-8. Or rather, my non-Unicode text editor is apparently unable to display it as Latin-1, and Chrome renders it as garbage in (from viewcvs), garbage out.
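For context, this is what valid UTF-8 looks like when something decodes it as Latin-1, for example a server labelling the file with the wrong charset (an assumption here; the actual viewcvs headers were not checked). Each non-ASCII character turns into two Latin-1 characters. The sample string is hypothetical and not quoted from lendian.ruleset.

```python
# Hypothetical sample string, not quoted from lendian.ruleset.
original = "Zürich"

utf8_bytes = original.encode("utf-8")      # bytes as stored in a UTF-8 file
as_latin1 = utf8_bytes.decode("latin-1")   # what a Latin-1-only viewer displays

print(utf8_bytes)  # b'Z\xc3\xbcrich'
print(as_latin1)   # ZÃ¼rich

# The damage is only in the display: re-encoding as Latin-1 recovers the bytes.
assert as_latin1.encode("latin-1").decode("utf-8") == original
```

If the file itself were broken UTF-8, the final round trip would fail with a decode error instead.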

Frank <dunnoob>
Mon Oct 13 21:05:22 2014, comment #2:

Yes, the BOM is always the same U+FEFF in any UTF. Your bug #22793 will also help. The signature would be good for tools I'm not using (Notepad, WordPad, or similar).
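To make "always the same U+FEFF" concrete: the BOM is one code point, and only its serialization differs between encoding forms. A short sketch using Python's standard codecs (the explicit `-be`/`-le` codecs are used because they do not add a BOM of their own):

```python
# U+FEFF is one code point; each encoding form serializes it differently.
BOM = "\ufeff"
CODECS = ("utf-8", "utf-16-be", "utf-16-le", "utf-32-be", "utf-32-le")

encoded = {codec: BOM.encode(codec) for codec in CODECS}
for codec, raw in encoded.items():
    print(f"{codec:<10} {raw.hex(' ')}")
```

For UTF-8 this yields `ef bb bf`, the three bytes a parser would have to skip. For UTF-16 it yields `fe ff` (big-endian) or `ff fe` (little-endian), and for UTF-32 `00 00 fe ff` or `ff fe 00 00`, which is why it works as a byte order mark there.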

My text editor doesn't support UTF-8; I have macros to convert it to "UTF-4" and back again on the fly. "UTF-4" stores C0 controls, ASCII, and Latin-1 as is (single bytes), and encodes everything else as a lead byte 0x82..0x86 (the hex digit count) followed by tail bytes 0x90..0x9F (hex digits 0..F).
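The "UTF-4" scheme above can be sketched as an encoder/decoder pair. This is an interpretation, not Frank's actual macros. It assumes that the lead byte is 0x80 plus the number of hex digits (0x82..0x86 for 2..6 digits), that each tail byte is 0x90 plus one hex digit with the most significant digit first, and that the C1 code points U+0080..U+009F are always escaped with two digits because their byte values collide with the lead and tail bytes. The function names are hypothetical.

```python
def utf4_encode(text: str) -> bytes:
    """Encode text in the "UTF-4" scheme (interpretation of the comment)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x80 or 0xA0 <= cp <= 0xFF:
            # C0 controls, ASCII and printable Latin-1 stay single bytes.
            out.append(cp)
        else:
            digits = format(cp, "02X")                      # at least two hex digits
            out.append(0x80 + len(digits))                  # lead byte 0x82..0x86
            out.extend(0x90 + int(d, 16) for d in digits)   # tail bytes 0x90..0x9F
    return bytes(out)


def utf4_decode(data: bytes) -> str:
    """Inverse of utf4_encode; raises ValueError on malformed input."""
    chars = []
    i = 0
    while i < len(data):
        b = data[i]
        if 0x82 <= b <= 0x86:
            n = b - 0x80                                     # number of tail bytes
            tail = data[i + 1 : i + 1 + n]
            if len(tail) != n or any(not 0x90 <= t <= 0x9F for t in tail):
                raise ValueError(f"bad escape at offset {i}")
            hex_digits = "".join(format(t - 0x90, "X") for t in tail)
            chars.append(chr(int(hex_digits, 16)))
            i += 1 + n
        elif 0x80 <= b <= 0x9F:
            raise ValueError(f"stray byte {b:#04x} at offset {i}")
        else:
            chars.append(chr(b))                             # byte is the code point
            i += 1
    return "".join(chars)
```

For example, `utf4_encode("€")` gives the bytes 0x84 0x92 0x90 0x9A 0x9C (four digits: 2, 0, A, C). Because all of 0x80..0x9F is reserved for escapes, a Latin-1 editor never confuses an escape with a printable character.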

Frank <dunnoob>
Sun Oct 12 10:08:10 2014, comment #1:

Freeciv is already clear in its own mind that its files contain UTF-8 (FC_DEFAULT_DATA_ENCODING), so it needs no clue in the files to tell it.

So presumably your proposed signature is for the benefit of other programs reading these files. Which other programs need this, and would they all agree on a single kind of clue?

I'd have hoped that something in a comment would do, with no need to change the Freeciv file parser.

I'd sort of hope that in this day and age, any tool working on an unmarked text file would be reasonably prepared for it to turn out to be UTF-8, without explicit clues.
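One common way for a tool to be "reasonably prepared" for an unmarked file to be UTF-8 is to attempt a strict UTF-8 decode of the whole file and fall back to a single-byte charset only if that fails. This is a general heuristic, not something Freeciv is stated to do. The helper name and the Latin-1 fallback are assumptions for illustration.

```python
from typing import Tuple


def decode_unmarked(data: bytes) -> Tuple[str, str]:
    """Guess the encoding of a file with no signature (hypothetical helper)."""
    try:
        # Strict UTF-8 rejects invalid byte sequences, so legacy 8-bit text
        # with accented letters rarely decodes as UTF-8 by accident.
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a code point, so this fallback cannot fail.
        return data.decode("latin-1"), "latin-1"
```

Decoding the whole file, rather than sniffing a prefix, avoids the problem raised in the original submission: a long ASCII prefix says nothing about the bytes that come later. Pure ASCII is reported as UTF-8, which is harmless because ASCII is a subset of UTF-8.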

Jacob Nevins <jtn>
Project Administrator
Tue Oct 7 01:13:27 2014, original submission:

Some tools in some environments support a "Unicode signature" at the beginning of text files (example: XML).

This "signature" was also known as the BOM (byte order mark) for UTF-16 and UTF-32, but of course the byte order of UTF-8 is unambiguous. For text files that start with lots of ASCII characters, "sniffing" whether the charset is UTF-8, Latin-1, windows-1252, pure ASCII, or something similar is not a good option. Freeciv should accept (and ignore) a UTF-8 signature at the beginning of a .spec, .tilespec, or *.ruleset file, because some of these files really contain UTF-8 (i18n for artists, i18n for city names and nation leaders, etc.).
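Accepting and ignoring the signature amounts to skipping the three bytes EF BB BF if, and only if, they are the very first bytes of the file. The sketch below shows the idea in Python; it is not Freeciv's parser (which is written in C and not shown here), `read_ruleset_text` is a hypothetical name, and the sample section header is illustrative.

```python
import codecs


def read_ruleset_text(data: bytes) -> str:
    """Decode a .spec/.tilespec/.ruleset file, ignoring an optional UTF-8 signature."""
    if data.startswith(codecs.BOM_UTF8):       # b"\xef\xbb\xbf"
        data = data[len(codecs.BOM_UTF8):]     # drop it only at the very start
    return data.decode("utf-8")                # strict: malformed UTF-8 raises


print(read_ruleset_text(b"\xef\xbb\xbf[section]\n") == "[section]\n")  # True
```

Python's built-in `utf-8-sig` codec behaves the same way on decoding. A U+FEFF later in the file is left alone, so only a leading signature changes parser behaviour.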

Frank <dunnoob>




Depends on the following items: None found

Items that depend on this one: None found

