[MacPorts] #71237: sed: RE error: illegal byte sequence

MacPorts noreply at macports.org
Mon Nov 4 20:43:11 UTC 2024


#71237: sed: RE error: illegal byte sequence
---------------------------+--------------------
  Reporter:  ballapete     |      Owner:  (none)
      Type:  defect        |     Status:  new
  Priority:  Normal        |  Milestone:
 Component:  ports         |    Version:  2.10.2
Resolution:                |   Keywords:
      Port:  Perl modules  |
---------------------------+--------------------

New description:

 While trying to prove that Perl 5.38 is ready for production use, I tried,
 as a final step, to patch all the test files that start with
 `#!/usr/bin/perl` (or similar) to start with `#!/usr/bin/env perl` or
 `${perl5.bin}` instead, and ran into this `sed` problem. On the command
 line it gives:

 {{{
 pete 313 /\ head -20
 /opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_macports_release_tarballs_ports_perl_p5
 -dbd-csv/p5.38-dbd-csv/work/DBD-CSV-0.60/t/80_rt.t | tail -3 | sed -e
 's:/usr/bin/perl:/usr/bin/env perl:'
 while (<DATA>) {
 sed: RE error: illegal byte sequence
 Exit 1
 pete 314 /\ which sed
 }}}

 and also:

 {{{
 pete 312 /\ head -20
 /opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_macports_release_tarballs_ports_perl_p5
 -dbd-csv/p5.38-dbd-csv/work/DBD-CSV-0.60/t/80_rt.t | tail -3 | gsed -e
 's:/usr/bin/perl:/usr/bin/env perl:'
 while (<DATA>) {
     if (s/^\253(\d+)\273\s*-?\s*//) {
         chomp;
 }}}

 So it's likely that using `gsed` instead of the system `sed` will solve
 the problem.

 How can I make `port` use `gsed` instead of the system `sed`?

 Another example:

 {{{
 pete 318 /\ head -47
 /opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_macports_release_tarballs_ports_perl_p5
 -dbd-csv/p5.38-dbd-csv/work/DBD-CSV-0.60/t/42_bindparam.t | tail -3 | sed
 -e 's:/usr/bin/perl:/usr/bin/env perl:'
 # Now try the explicit type settings
 ok ($sth->bind_param (1, " 4", &SQL_INTEGER),   "bind 4 int");
 sed: RE error: illegal byte sequence
 Exit 1

 pete 319 /\ head -47
 /opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_macports_release_tarballs_ports_perl_p5
 -dbd-csv/p5.38-dbd-csv/work/DBD-CSV-0.60/t/42_bindparam.t | tail -3 | gsed
 -e 's:/usr/bin/perl:/usr/bin/env perl:'
 # Now try the explicit type settings
 ok ($sth->bind_param (1, " 4", &SQL_INTEGER),   "bind 4 int");
 ok ($sth->bind_param (2, "Andreas K\366nig"),   "bind str");
 }}}

 Another one would be

 {{{
 head -55
 /opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_macports_release_tarballs_ports_perl_p5-encode/p5.38-encode/work/Encode-3.21/t
 /at-cn.t  | tail -7 | {g,}sed -e 's:/usr/bin/perl:/usr/bin/env perl:'
 }}}

 where Chinese characters are somehow "encoded".

 Obviously `sed` "knows" some "forbidden" characters it cannot work on, and
 obviously it has problems with encodings other than 7 or 8 bit that gsed
 has learned to overcome.

--

Comment (by ryandesign):

 System `sed` expects input to be UTF-8 by default; you're trying to use it
 on files that aren't UTF-8. Working around this by using `gsed` instead is
 neither necessary nor recommended. You can still use system `sed` as long as
 you set the `LC_CTYPE` environment variable either to the correct locale
 or just to `C`.
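
 As a minimal sketch of that suggestion (not part of the original comment):
 the sample file and its contents below are hypothetical stand-ins for the
 test files in the report, containing one Latin-1 byte (octal 366) that is
 not valid UTF-8 on its own. Only the `LC_CTYPE=C` prefix and the `sed`
 expression come from the ticket.

```shell
# Hypothetical sample file: a perl shebang plus one Latin-1 byte (octal 366),
# which is not a valid UTF-8 sequence on its own.
sample=$(mktemp)
printf '#!/usr/bin/perl\nK\366nig\n' > "$sample"

# With LC_CTYPE=C set for this one command, sed treats the input as plain
# bytes instead of trying to decode it as UTF-8.
first_line=$(LC_CTYPE=C sed -e 's:/usr/bin/perl:/usr/bin/env perl:' "$sample" | head -1)
printf '%s\n' "$first_line"

# Clean up the temporary file.
rm -f "$sample"
```

 Note that if `LC_ALL` is set in the environment, it takes precedence over
 `LC_CTYPE`, so in that case `LC_ALL=C` would have to be used instead.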

-- 
Ticket URL: <https://trac.macports.org/ticket/71237#comment:1>
MacPorts <https://www.macports.org/>
Ports system for macOS

