[UDUNITS #DRL-868187]: udunits-2.0.0: locale character set determination bug
- Date: Tue, 05 Aug 2008 12:37:46 -0600
Sam,
> There's a problem in the regex code used in the udunits2 command-line
> utility for detecting the correct default character set to use. The regular
> expressions are compiled without the REG_EXTENDED flag, and the
> regular expressions themselves don't properly match the appropriate
> targets.
Thanks for sending this in.
Because the code is intended to match a character-set specification
regardless of where it occurs in a string, I modified your suggested
regular expressions somewhat. The code is now:

    {"^c$", UT_ASCII},
    {"^posix$", UT_ASCII},
    {"ascii", UT_ASCII},
    {"latin.?1([^0-9]|$)", UT_LATIN1},
    {"8859.?1([^0-9]|$)", UT_LATIN1},
    {"utf.?8([^0-9]|$)", UT_UTF8},

I did add the REG_EXTENDED option. I forgot that the "?" metacharacter
only exists in extended regular expressions.
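
For readers following along, here is a minimal sketch of how a pattern table like the one above could be applied with the POSIX regex API. It is not the actual udunits2 source: the enc_t type, the ENC_* names (stand-ins for UT_ASCII, UT_LATIN1, and UT_UTF8), and the detect_encoding() function are hypothetical, and the REG_ICASE flag is an assumption made so that the lowercase patterns match names such as "en_US.UTF-8".

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical stand-ins for the UT_ASCII, UT_LATIN1, and UT_UTF8 values. */
typedef enum { ENC_ASCII, ENC_LATIN1, ENC_UTF8, ENC_UNKNOWN } enc_t;

/* Same patterns as in the message; the first match wins. */
static const struct {
    const char *pattern;
    enc_t       encoding;
} enc_table[] = {
    {"^c$",                ENC_ASCII},
    {"^posix$",            ENC_ASCII},
    {"ascii",              ENC_ASCII},
    {"latin.?1([^0-9]|$)", ENC_LATIN1},
    {"8859.?1([^0-9]|$)",  ENC_LATIN1},
    {"utf.?8([^0-9]|$)",   ENC_UTF8},
};

/* Returns the encoding of the first pattern that matches locale_name,
 * or ENC_UNKNOWN if none does. */
enc_t detect_encoding(const char *locale_name)
{
    size_t i;

    for (i = 0; i < sizeof enc_table / sizeof enc_table[0]; i++) {
        regex_t re;
        int     matched;

        /* REG_EXTENDED is needed for "?", "|", and "()".
         * REG_ICASE is an assumption of this sketch. */
        if (regcomp(&re, enc_table[i].pattern,
                    REG_EXTENDED | REG_ICASE | REG_NOSUB) != 0)
            continue;   /* Skip a pattern that fails to compile. */

        matched = regexec(&re, locale_name, 0, NULL, 0) == 0;
        regfree(&re);

        if (matched)
            return enc_table[i].encoding;
    }
    return ENC_UNKNOWN;
}
```

Under these assumptions, the trailing ([^0-9]|$) is what keeps "en_US.ISO-8859-15" from being treated as Latin-1, while "en_US.ISO-8859-1" and "de_DE.iso88591" still match.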
> I've attached a patch which addresses the problem.
>
> Would using something like libcharset
> <URL: http://www.haible.de/bruno/packages-libcharset.html> be an
> option? nl_langinfo(CODESET) is the semi-portable way of detecting
> locale character encoding, but it is notoriously arbitrary.
I did it the way I did because that heuristic is not reliable,
especially for only a few short strings. Determining the character
set is outside the scope of a units package and is most appropriately
the responsibility of the client.
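
As an illustration of what leaving this to the client might look like, here is a minimal sketch that uses the nl_langinfo(CODESET) call mentioned in the quoted text. The client_codeset() function name is hypothetical, and the string it returns is platform-dependent, so a client would still have to map it onto whichever encoding it wants to use.

```c
#include <langinfo.h>
#include <locale.h>

/* Hypothetical client-side helper: adopt the user's locale from the
 * environment and report the codeset name the C library associates
 * with it. The exact spelling of the name varies between platforms. */
const char *client_codeset(void)
{
    /* If the environment names an unavailable locale, setlocale()
     * returns NULL and the program stays in the "C" locale. */
    (void)setlocale(LC_CTYPE, "");

    return nl_langinfo(CODESET);
}
```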
Again, thanks for sending this in.
> Best regards,
> Sam Yates
Regards,
Steve Emmerson
Ticket Details
===================
Ticket ID: DRL-868187
Department: Support UDUNITS
Priority: Normal
Status: Closed