--- Begin Message ---
- From: namazu-users-en-admin@xxxxxxxxxx
- Date: Thu, 16 Oct 2003 16:52:16 +0900
- References: <A6054190-FFAD-11D7-B8D3-000393A63FC8@w3.org>
NOT MEMBER article from ot@xxxxxx
Original mail as follows:
From owner-namazu-users-en@xxxxxxxxxxxxxxxx Thu Oct 16 16:52:15 2003
Return-Path: <owner-namazu-users-en@xxxxxxxxxxxxxxxx>
Delivered-To: namazu-users-en@xxxxxxxxxx
Received: from toro.w3.mag.keio.ac.jp (toro.w3.mag.keio.ac.jp [133.27.228.201])
by karin.namazu.org (Postfix) with ESMTP id 4210EF861
for <namazu-users-en@xxxxxxxxxx>; Thu, 16 Oct 2003 16:52:15 +0900 (JST)
Received: from w3.org (navi.w3.mag.keio.ac.jp [133.27.228.212])
by toro.w3.mag.keio.ac.jp (Postfix) with ESMTP id 33E6AA8E
for <namazu-users-en@xxxxxxxxxx>; Thu, 16 Oct 2003 16:52:12 +0900 (JST)
Date: Thu, 16 Oct 2003 16:52:11 +0900
Mime-Version: 1.0 (Apple Message framework v552)
Content-Type: text/plain; charset=US-ASCII; format=flowed
Subject: namazu 2.0.12 massively dumping keywords from index
From: Olivier Thereaux <ot@xxxxxx>
To: namazu-users-en@xxxxxxxxxx
Content-Transfer-Encoding: 7bit
Message-Id: <A6054190-FFAD-11D7-B8D3-000393A63FC8@xxxxxx>
X-Mailer: Apple Mail (2.552)
Greetings.
Here is a puzzling case for your consideration. I hope some namazu
users or developers on this list have seen this before, and I would
appreciate any input.
Now for the story:
The users of my namazu-based system recently started complaining that,
for the main (big) indexes, namazu does not seem to find any results
older than a few days, even though the system indexes documents dating
from now back to... 1994.
A quick look at the NMZ.log file for these indexes shows something very
strange: namazu is apparently getting rid of keywords.
> grep "Added Keywords:" NMZ.log | tail -20
Added Keywords: 160
Added Keywords: 58
Added Keywords: 95
Added Keywords: 331
Added Keywords: 286
Added Keywords: 30
Added Keywords: -105,957
Added Keywords: 545
Added Keywords: 552
Added Keywords: 176
Added Keywords: -215,331
Added Keywords: 1,175
Added Keywords: 1,300
Added Keywords: 958
Added Keywords: -120,305
Added Keywords: 1,965
Added Keywords: -1,017,652
Added Keywords: 1,521
Added Keywords: 2,287
Added Keywords: 11,221
According to my logs, it started here, for no apparent (?) reason:
[Append]
Date: Fri Oct 10 20:42:01 2003
Added Documents: 84
Updated Documents: 2
Size (bytes): 345,675
Total Documents: 366,050
Added Keywords: -362,979
Total Keywords: 5,447,163
Wakati: module_kakasi -ieuc -oeuc -w
Time (sec): 1,429
File/Sec: 0.06
System: linux
Perl: 5.006001
Namazu: 2.0.12
That's right: adding 84 documents, removing over 300,000 keywords.
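For anyone who wants to check their own logs for the same symptom, here
is a minimal sketch in Python. It assumes every NMZ.log entry has the
layout quoted above (a bracketed header such as [Append] followed by
"Key: value" lines); the function names are made up for illustration
and are not part of namazu.

```python
# Sketch only: the entry layout is assumed from the [Append] block quoted
# in this mail; parse_entries/to_int/negative_entries are hypothetical names.

SAMPLE = """\
[Append]
Date: Fri Oct 10 20:42:01 2003
Added Documents: 84
Updated Documents: 2
Size (bytes): 345,675
Total Documents: 366,050
Added Keywords: -362,979
Total Keywords: 5,447,163
"""


def parse_entries(text):
    """Return one dict per bracketed entry, mapping field name to raw value."""
    entries = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            # A new entry begins, e.g. "[Append]".
            current = {"Type": line[1:-1]}
            entries.append(current)
        elif current is not None and ":" in line:
            # Split on the first colon only, so time stamps stay intact.
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return entries


def to_int(value):
    """Convert a figure such as '-362,979' to the integer -362979."""
    return int(value.replace(",", ""))


def negative_entries(entries):
    """Keep only the entries whose 'Added Keywords' count is below zero."""
    return [
        e for e in entries
        if "Added Keywords" in e and to_int(e["Added Keywords"]) < 0
    ]


for entry in negative_entries(parse_entries(SAMPLE)):
    print(entry.get("Date"), entry["Added Keywords"])
```

To run it against a real log, replace SAMPLE with the contents of the
NMZ.log file in the index directory.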
My first guess was that namazu might drop very popular keywords in
order to improve performance, but that does not hold up against figures
this large. Maybe it's a bug, then. Is it a known bug? Any idea what I
could do? I know my document base is a bit big for namazu2, but I'd
rather hear something other than "don't use namazu" since, performance
aside, namazu does everything I need perfectly ;)
Thanks.
--
olivier
--- End Message ---