Namazu-users-en(old)

Malformed UTF-8 character ...

From: Earl Hood <earl@xxxxxxxxxxxx>
Date: Wed, 05 May 2004 17:11:33 -0500
X-ml-name: namazu-users-en
X-mail-count: 00498

Namazu version: 2.0.13
Perl version: 5.8.4
OS: Linux 2.4.21-4.ELsmp #1 SMP Fri Oct 3 17:52:56 EDT 2003 i686 i686 i386 GNU/Linux

Running mknmz generates the following message repeatedly:

Malformed UTF-8 character (unexpected continuation byte 0xa4, with no preceding 
start byte) in pattern match (m//) at /usr/local/share/namazu/filter/mailnews.pl
 line 216, <GEN5> line 71.
...


Suspecting the LANG environment variable, I explicitly set LANG
to en_US (it had defaulted to en_US.UTF-8), but that did not fix it.
Maybe I should try en_US.ISO-8859-1?
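For readers puzzling over the warning text: in UTF-8, an octet of the form 10xxxxxx (0x80 through 0xBF) may only follow a start byte, so a lone 0xa4 is rejected whenever Perl treats the data as UTF-8. The byte would be consistent with EUC-JP text, where 0xa4 begins a hiragana character, though the post does not state the file's encoding. A minimal sketch of the classification follows; `utf8_byte_role` is a hypothetical helper name, not part of Perl or Namazu.

```perl
use strict;
use warnings;

# Classify a single octet by the role it can play in a UTF-8 sequence.
# Hypothetical helper for illustration only.
sub utf8_byte_role {
    my ($octet) = @_;
    return 'ascii'        if $octet <= 0x7F;    # 0xxxxxxx
    return 'continuation' if $octet <= 0xBF;    # 10xxxxxx
    # 0xC0/0xC1 would encode overlong forms; 0xF5 and above are unused.
    return 'invalid'      if $octet <= 0xC1 || $octet >= 0xF5;
    return 'start';                             # begins a 2-4 byte sequence
}

printf "0x%02x: %s\n", $_, utf8_byte_role($_) for (0x41, 0xA4, 0xC3, 0xFF);
```

A continuation byte at the start of a character, as with 0xa4 here, is exactly the case the warning describes.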

To suppress the message, I added a "use bytes" pragma to mailnews.pl
to keep Perl from doing any character processing:

--- mailnews.pl.20040505        2004-05-05 14:52:23.000000000 -0700
+++ mailnews.pl 2004-05-05 14:53:56.000000000 -0700
@@ -209,6 +209,7 @@ sub mailnews_citation_filter ($$) {
     $$contref = "";
     my $i = 0;
     for my $line (@tmp) {
+	use bytes;
	# Complete excluding is impossible. I tnink it's good enough.
         # Process only first five paragrahs.
	# And don't handle the paragrah which has five or longer lines.

I put the pragma only within the block that was generating the
warnings.
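The reason the pragma can be confined to one block is that "use bytes" is lexically scoped: it applies from the point of declaration to the end of the enclosing block and nowhere else. The sketch below shows that scoping with length() on an arbitrary test string; it is not taken from mailnews.pl. Note that current Perl documentation discourages "use bytes" for anything other than debugging, so this illustrates the mechanism rather than recommending it.

```perl
use strict;
use warnings;

# One character (U+263A) that occupies three bytes when encoded as UTF-8.
my $str = "\x{263A}";

my $chars_before = length $str;    # character semantics: 1

my $bytes_inside;
{
    use bytes;                     # in effect only until the closing brace
    $bytes_inside = length $str;   # byte semantics: 3
}

my $chars_after = length $str;     # character semantics again: 1

print "before=$chars_before inside=$bytes_inside after=$chars_after\n";
```

Code outside the braces is unaffected, which matches the intent of placing the pragma inside the for loop only.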

I'm unsure whether this is the best fix, but since mailnews.pl contains
8-bit values in a regex, something should be done to keep Perl from
interpreting the octets under a character encoding.

It may be better to make the code conditional on the language
setting, i.e. have a different regex for each supported locale.
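One way the per-locale idea might look, as a rough sketch only: keep a table of precompiled patterns keyed by language and select one from the locale string. Everything here is an assumption made for illustration: the names `%citation_re`, `lang_key`, and `citation_pattern`, the patterns themselves, and the use of the LANG environment variable as the language setting. None of it is taken from mailnews.pl.

```perl
use strict;
use warnings;

# Hypothetical per-language citation patterns. These are stand-ins for
# illustration, not the expressions used in mailnews.pl.
my %citation_re = (
    en => qr/^.+ (?:wrote|writes):\s*$/,
    # Placeholder for an EUC-JP pattern: a line starting with two
    # high-bit octets, written as escapes so the source stays ASCII.
    ja => qr/^[\xA1-\xFE]{2}/,
);

# Reduce a locale string such as "ja_JP.eucJP" to a language key.
# Anything without a two-letter lowercase prefix falls back to 'en'.
sub lang_key {
    my ($locale) = @_;
    return 'en' unless defined $locale;
    return $locale =~ /^([a-z]{2})/ ? $1 : 'en';
}

# Pick the pattern for a locale, falling back to English when the
# language has no entry in the table.
sub citation_pattern {
    my ($locale) = @_;
    return $citation_re{ lang_key($locale) } || $citation_re{en};
}

my $re = citation_pattern($ENV{LANG});
print "citation line\n" if "Earl Hood wrote:" =~ $re;
```

Writing the 8-bit values as \x escapes keeps the source file itself plain ASCII, so the literal octets never appear in the pattern text.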

--ewh

Follow-Ups:
- Re: Malformed UTF-8 character ...
  - From: Tadamasa Teranishi
- Re: Malformed UTF-8 character ...
  - From: Pankaj K Garg
