[MacPorts] #70670: libiconv @1.17_0: iconv on macOS Ventura 13.6+ does not perform correct conversions
MacPorts
noreply at macports.org
Sun Sep 1 09:45:45 UTC 2024
#70670: libiconv @1.17_0: iconv on macOS Ventura 13.6+ does not perform correct
conversions
---------------------------+------------------------
Reporter: seamusdemora | Owner: ryandesign
Type: defect | Status: closed
Priority: Normal | Milestone:
Component: ports | Version: 2.10.1
Resolution: invalid | Keywords:
Port: libiconv |
---------------------------+------------------------
Changes (by ryandesign):
* status: assigned => closed
* resolution: => invalid
Old description:
> I'm trying to do something that seems simple (it can be done simply on my
> Linux box):
>
> I copied and pasted a line from a PDF file (a French programming guide)
> into Terminal.app:
> 'print("Numéro de boucle", i)'
>
> I wanted to convert this line to ASCII before pasting into my editor. So
> I used 'iconv' as shown below. In each case, I used 'file' to check the
> "from" encoding :
>
> {{{
> % echo 'print("Numéro de boucle", i)' | file -
> /dev/stdin: Unicode text, UTF-8 text
>
> % echo 'print("Numéro de boucle", i)' | iconv -f utf-8 -t ascii//translit
> print("Num'ero de boucle", i)
>
> ?!?!?!? I tried another example:
>
> % echo "print("Protégé. Señorita. Coup de grâce", i)" | file -
> /dev/stdin: Unicode text, UTF-8 text
>
> % echo 'Protégé Señorita Coup de grâce' | iconv -f UTF-8 -t ASCII//TRANSLIT
> Prot'eg'e Se~norita Coup de gr^ace
> }}}
>
> **PLEASE NOTE: I have also tried using 'utf-8-mac' and 'utf8-mac' for the
> "from" encoding; this had no effect on the results - they were identical
> in all cases.**
>
> As you can see, this is not correct: a single quote has been added. I'm
> not a frequent user of 'iconv', so I checked this on my Debian 'bookworm'
> Linux box:
>
> {{{
> $ echo 'print("Numéro de boucle", i)' | iconv -f utf-8 -t ascii//translit
> print("Numero de boucle", i)
> }}}
>
> I've checked to confirm that the version of 'iconv' on my macOS Ventura
> 13.6+ is one from MacPorts. I believe that it is:
>
> {{{
> % whereis iconv
> iconv: /usr/bin/iconv /opt/local/share/man/man1/iconv.1.gz
>
> % port installed requested
> The following ports are currently installed:
> ...
> libiconv @1.17_0 (active)
> ...
> %
> }}}
>
> And confirmation of my macports version:
> {{{
> % port -v
> MacPorts 2.10.1
> }}}
>
> I can accept that it's broken, and I can accept that it can't be fixed
> (if that turns out to be the case). But I surely would appreciate an
> explanation of what has gone wrong - especially if it's something that I
> am doing incorrectly!
>
> Rgds,
> ~S
New description:
I'm trying to do something that seems simple (it can be done simply on my
Linux box):
I copied and pasted a line from a PDF file (a French programming guide)
into Terminal.app:
{{{
print("Numéro de boucle", i)
}}}
I wanted to convert this line to ASCII before pasting into my editor. So I
used 'iconv' as shown below. In each case, I used 'file' to check the
"from" encoding :
{{{
% echo 'print("Numéro de boucle", i)' | file -
/dev/stdin: Unicode text, UTF-8 text
% echo 'print("Numéro de boucle", i)' | iconv -f utf-8 -t ascii//translit
print("Num'ero de boucle", i)
}}}
?!?!?!? I tried another example:
{{{
% echo "print("Protégé. Señorita. Coup de grâce", i)" | file -
/dev/stdin: Unicode text, UTF-8 text
% echo 'Protégé Señorita Coup de grâce' | iconv -f UTF-8 -t ASCII//TRANSLIT
Prot'eg'e Se~norita Coup de gr^ace
}}}
**PLEASE NOTE: I have also tried using 'utf-8-mac' and 'utf8-mac' for the
"from" encoding; this had no effect on the results - they were identical
in all cases.**
As you can see, this is not correct: a single quote has been added. I'm
not a frequent user of 'iconv', so I checked this on my Debian 'bookworm'
Linux box:
{{{
$ echo 'print("Numéro de boucle", i)' | iconv -f utf-8 -t ascii//translit
print("Numero de boucle", i)
}}}
I've checked to confirm that the version of 'iconv' on my macOS Ventura
13.6+ is one from MacPorts. I believe that it is:
{{{
% whereis iconv
iconv: /usr/bin/iconv /opt/local/share/man/man1/iconv.1.gz
% port installed requested
The following ports are currently installed:
...
libiconv @1.17_0 (active)
...
%
}}}
And confirmation of my macports version:
{{{
% port -v
MacPorts 2.10.1
}}}
I can accept that it's broken, and I can accept that it can't be fixed (if
that turns out to be the case). But I surely would appreciate an
explanation of what has gone wrong - especially if it's something that I
am doing incorrectly!
Rgds,
~S
--
Comment:
I get the same conversions as you (insertion of `'` and `^` before the base
letter of each accented character, in an attempt to mimic in ASCII what
those accents look like) regardless of whether I use /usr/bin/iconv on
macOS 12 (Apple's GNU libiconv 1.11) or /opt/local/bin/iconv (MacPorts GNU
libiconv 1.17). Therefore, it is not a MacPorts bug.
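For anyone who wants to repeat that comparison, here is a minimal sketch in Python. It assumes the two paths named above and simply skips any that are not present; the helper name `compare_iconv` is made up for this example. It only passes the `-f` and `-t` options already used in this ticket.

```python
import os
import subprocess


def compare_iconv(binaries, text="Protégé Señorita Coup de grâce\n"):
    """Run each existing iconv binary on the same UTF-8 input.

    Returns a dict mapping binary path -> transliterated output.
    Paths that do not exist or are not executable are skipped.
    """
    results = {}
    for path in binaries:
        if not (os.path.isfile(path) and os.access(path, os.X_OK)):
            continue
        proc = subprocess.run(
            [path, "-f", "UTF-8", "-t", "ASCII//TRANSLIT"],
            input=text.encode("utf-8"),
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
        )
        # Decode defensively: the output should be ASCII, but do not assume it.
        results[path] = proc.stdout.decode("ascii", errors="replace").rstrip("\n")
    return results


if __name__ == "__main__":
    # Paths taken from the comment above; adjust for your own system.
    for path, out in compare_iconv(["/usr/bin/iconv", "/opt/local/bin/iconv"]).items():
        print(f"{path}: {out}")
```

If both binaries print the same line, the behaviour is not specific to the MacPorts build.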
I believe iconv uses locale information provided by the operating system
to guide its conversions. Therefore your bug, I suppose, is with macOS,
although I assume the result we observe is intentional and not considered
a bug. In particular, what we're observing is called transliteration:
https://www.gnu.org/software/libiconv/
> It has also some limited support for transliteration, i.e. when a
> character cannot be represented in the target character set, it can be
> approximated through one or several similarly looking characters.
> Transliteration is activated when `//TRANSLIT` is appended to the target
> encoding name.
You have specifically requested that transliteration be enabled.
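To make the two styles of approximation concrete, here is a small Python sketch that reproduces both outputs seen in this ticket. It is only an illustration and says nothing about how either iconv implementation works internally; the function names and the `ACCENT_PREFIX` table are hypothetical and cover just a few common accents.

```python
import unicodedata

# Hypothetical table: combining mark -> ASCII look-alike placed before the
# base letter, mimicking the macOS output shown in this ticket.
ACCENT_PREFIX = {
    "\u0300": "`",   # combining grave accent
    "\u0301": "'",   # combining acute accent
    "\u0302": "^",   # combining circumflex accent
    "\u0303": "~",   # combining tilde
    "\u0308": '"',   # combining diaeresis
}


def strip_accents(text):
    """Drop accents entirely, like the Linux output (Numero)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def prefix_accents(text):
    """Spell each accent as an ASCII mark before its letter (Num'ero)."""
    out = []
    # Compose first so that each accented letter is a single character.
    for ch in unicodedata.normalize("NFC", text):
        decomposed = unicodedata.normalize("NFD", ch)
        base, marks = decomposed[0], decomposed[1:]
        out.append("".join(ACCENT_PREFIX.get(m, "") for m in marks) + base)
    return "".join(out)


if __name__ == "__main__":
    sample = "Protégé Señorita Coup de grâce"
    print(strip_accents(sample))
    print(prefix_accents(sample))
```

Neither function guarantees pure ASCII output: characters with no canonical decomposition (for example `ß` or `œ`) pass through unchanged, so this is not a replacement for iconv.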
I don't know why you get different results on Linux. Presumably the
locale information provided by Linux differs from that provided by macOS,
but I don't know why the two OS vendors have chosen differently. Possibly
the locale information on your Linux system does not support
transliteration, so your request to enable transliteration is being
ignored on Linux.
--
Ticket URL: <https://trac.macports.org/ticket/70670#comment:3>
MacPorts <https://www.macports.org/>
Ports system for macOS