Namazu-devel-ja (old)


Re: Why EUCJP?

From: Ryuji Abe <raeva@xxxxxxxxxxxx>
Date: Sun, 22 Oct 2000 15:30:44 +0900
X-ml-name: namazu-devel-ja
X-mail-count: 01035
References: <200010210930.SAA12609@mail2.rim.or.jp> <20001021233704M.satoru-t@is.aist-nara.ac.jp>

On Sat, 21 Oct 2000 23:37:04 +0900
Satoru Takabayashi <satoru-t@xxxxxxxxxxxxxxxxxx> wrote:

> >When the kakasi/chasen module is used, sgmt_codeset()
> >returns "EUCJP". Is there a reason it is "EUCJP"
> >rather than "EUC-JP"?
> 
> I think the reason was that iconv(3) prefers "EUCJP",
> but I just tried it, and iconv(3) in glibc 2.1.2 accepts
> "EUC-JP" as well.

I wrote a simple program to check the return value of
nl_langinfo(CODESET). On Red Hat Linux 6.2J (glibc 2.1.3)
and on Linux-Mandrake 7.1, it returns "EUC-JP" in both
cases under a Japanese locale.

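#include <langinfo.h>
#include <locale.h>
#include <stdio.h>

int
main (void)
{
  const char *codeset;

  /* Adopt the locale from the environment (LANG, LC_ALL, ...).  */
  setlocale (LC_ALL, "");
  /* Ask the C library for the current locale's codeset name.  */
  codeset = nl_langinfo (CODESET);
  printf ("%s\n", codeset);
  return 0;
}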


> I also tried other environments:
> 
> OSF1 V4.0: (probably Tru64 UNIX)
(snip)
> EUCJP and EUC-JP fail, but eucJP is OK. Hmm.
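
A quick way to repeat this kind of check on another platform is to ask iconv_open() directly whether it accepts a given spelling. The following is only a minimal sketch: the helper name iconv_accepts() is hypothetical (it is not part of Namazu), and using "UTF-8" as the conversion target is an assumption made purely for illustration.

```c
#include <iconv.h>

/* Hypothetical helper, not part of Namazu: return 1 if this platform's
   iconv_open() accepts NAME as a source codeset, 0 otherwise.
   The "UTF-8" target is an assumption chosen for illustration.  */
static int
iconv_accepts (const char *name)
{
  iconv_t cd = iconv_open ("UTF-8", name);

  if (cd == (iconv_t) -1)
    return 0;               /* NAME (or the target) is not supported.  */
  iconv_close (cd);
  return 1;
}
```

Calling it with "EUCJP", "EUC-JP", and "eucJP" in turn would show which spellings a given iconv implementation accepts; the results depend on the platform.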

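Hmm. For codeset conversion, one possible approach is to
compare the return values of sgmt_codeset() and
nl_langinfo(CODESET) and pass the text through iconv()
when they differ, but it seems the comparison cannot be
a simple strcmp() or strcasecmp(). We would need to, for
example, create something like a codeset.alias file and
consult it.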
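
As a rough illustration of a comparison that is looser than strcmp() or strcasecmp(), the sketch below treats two codeset names as equal when they match after ignoring case and every character that is not a letter or digit. The function name codeset_equal() is hypothetical and not part of Namazu. This only covers spelling variants such as "EUCJP", "EUC-JP", and "eucJP"; names that differ entirely would still need an alias table.

```c
#include <ctype.h>

/* Hypothetical helper, not part of Namazu: return 1 if codeset names
   A and B are equal when case and all non-alphanumeric characters
   (such as '-' or '_') are ignored, 0 otherwise.  */
static int
codeset_equal (const char *a, const char *b)
{
  for (;;)
    {
      /* Skip separators in both names.  */
      while (*a != '\0' && !isalnum ((unsigned char) *a))
        a++;
      while (*b != '\0' && !isalnum ((unsigned char) *b))
        b++;

      /* If either name is exhausted, they match only if both are.  */
      if (*a == '\0' || *b == '\0')
        return *a == *b;

      if (tolower ((unsigned char) *a) != tolower ((unsigned char) *b))
        return 0;
      a++;
      b++;
    }
}
```

With this, codeset_equal (sgmt_codeset (), nl_langinfo (CODESET)) could decide whether the iconv() step is needed at all, assuming both functions return plain codeset names.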


  A A
= . . =
   V
end
Ryuji Abe

