This archive contains answers to questions sent to Unidata support through mid-2025. Note that the archive is no longer being updated. We provide the archive for reference; many of the answers presented here remain technically correct, even if somewhat outdated. For the most up-to-date information on the use of NSF Unidata software and data services, please consult the Software Documentation first.
Sam,

> There's a problem in the regex code used in the udunits2 command-line
> utility for detecting the correct default character set to use. The regular
> expressions are compiled without the REG_EXTENDED flag, and the
> regular expressions themselves don't properly match the appropriate
> targets.

Thanks for sending this in.

Because the code is intended to match a character-set specification
regardless of where it occurs in a string, I modified your suggested
regular expressions somewhat. The code is now

    {"^c$", UT_ASCII},
    {"^posix$", UT_ASCII},
    {"ascii", UT_ASCII},
    {"latin.?1([^0-9]|$)", UT_LATIN1},
    {"8859.?1([^0-9]|$)", UT_LATIN1},
    {"utf.?8([^0-9]|$)", UT_UTF8},

I did add the REG_EXTENDED option. I forgot that the "?" metacharacter
only exists in extended regular expressions.

> I've attached a patch which addresses the problem.
>
> Would using something like libcharset
> <URL: http://www.haible.de/bruno/packages-libcharset.html> be an
> option? nl_langinfo(CODESET) is the semi-portable way of detecting
> locale character encoding, but it is notoriously arbitrary.

I did it the way I did because that heuristic is not reliable ---
especially for only a few, short strings. Determining the character set is
outside the scope of a units package and is most appropriately the
responsibility of the client.

Again, thanks for sending this in.

> Best regards,
> Sam Yates

Regards,
Steve Emmerson

Ticket Details
===================
Ticket ID: DRL-868187
Department: Support UDUNITS
Priority: Normal
Status: Closed
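
For readers who want to try the patterns quoted in the reply, the following is a minimal sketch of how such a table might be matched against a locale name with POSIX regcomp() and regexec(). It is not the udunits2 source. The function name guess_encoding, the encoding_t type, and the ENC_* values are hypothetical stand-ins for UT_ASCII, UT_LATIN1, and UT_UTF8. The REG_ICASE and REG_NOSUB flags are assumptions made here so that names such as "en_US.UTF-8" match; the reply confirms only REG_EXTENDED.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical stand-ins for UT_ASCII, UT_LATIN1, and UT_UTF8. */
typedef enum { ENC_UNKNOWN, ENC_ASCII, ENC_LATIN1, ENC_UTF8 } encoding_t;

/* Pattern table copied from the reply; the first match wins. */
static const struct {
    const char *pattern;
    encoding_t  encoding;
} patterns[] = {
    {"^c$",                ENC_ASCII},
    {"^posix$",            ENC_ASCII},
    {"ascii",              ENC_ASCII},
    {"latin.?1([^0-9]|$)", ENC_LATIN1},
    {"8859.?1([^0-9]|$)",  ENC_LATIN1},
    {"utf.?8([^0-9]|$)",   ENC_UTF8},
};

/* Hypothetical helper: map a locale name to an encoding guess. */
encoding_t guess_encoding(const char *locale_name)
{
    if (locale_name == NULL)
        return ENC_UNKNOWN;

    for (size_t i = 0; i < sizeof patterns / sizeof patterns[0]; i++) {
        regex_t re;

        /* REG_EXTENDED is required: in basic regular expressions "?",
         * "|", and unescaped parentheses are ordinary characters.
         * REG_ICASE and REG_NOSUB are assumptions of this sketch. */
        if (regcomp(&re, patterns[i].pattern,
                    REG_EXTENDED | REG_ICASE | REG_NOSUB) != 0)
            continue;   /* skip a pattern that fails to compile */

        int status = regexec(&re, locale_name, 0, NULL, 0);
        regfree(&re);

        if (status == 0)
            return patterns[i].encoding;
    }
    return ENC_UNKNOWN;
}
```

Under these assumptions, "en_US.UTF-8" maps to ENC_UTF8 and "en_US.ISO-8859-1" to ENC_LATIN1. "en_US.ISO-8859-15" falls through to ENC_UNKNOWN, because the trailing ([^0-9]|$) rejects a digit after the "1".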