Namazu-devel-ja (old)


Re: pointer is byte count ? (Re: NMZ.i ( Re:http://www.namazu.org/doc/nmz.html ))

From: 藤原誠 / Makoto Fujiwara <makoto@xxxxx>
Date: Tue, 08 Jul 2003 22:12:00 +0900
X-ml-name: namazu-devel-ja
X-mail-count: 03033
References: <yfmisqe9832.wl@harry.ki.nu> <200307072344.AA00643@inspire_seed_pr.nifty.ne.jp> <yfmsmphannl.wl@harry.ki.nu> <3F0A1666.5E891DE1@asahi-net.or.jp>

>                                            Nagazu, Chuo-ku, Chiba-shi
>                                                    Makoto Fujiwara
Everyone, sorry for all the noise.
How about this?
Index: doc/en/nmz.html
===================================================================
RCS file: /storage/cvsroot/namazu/doc/en/nmz.html,v
retrieving revision 1.10.8.1
diff -u -r1.10.8.1 nmz.html
--- doc/en/nmz.html	11 Jul 2001 07:40:44 -0000	1.10.8.1
+++ doc/en/nmz.html	8 Jul 2003 13:07:46 -0000
@@ -42,11 +42,14 @@
 
 <h3>Structure</h3>
 
+For each word, the [documentID containing that word][score] pairs
+are stored sequentially to form the record for that word.
+Because the record is of variable length, the byte count of its
+data part is placed in front of it.
 <pre>
-
-    [number of documents word1 is found * 2][documentID][score][documentID][score]...
-    [number of documents word2 is found * 2][documentID][score][documentID][score]...
-    [number of documents word3 is found * 2][documentID][score][documentID][score]...
+    [data length for word1][documentID][score][documentID][score]...
+    [data length for word2][documentID][score][documentID][score]...
+    [data length for word3][documentID][score][documentID][score]...
        :
 </pre>
 
@@ -162,13 +165,14 @@
 <h3>Structure</h3>
 
 <pre>
-
-    [number of documents including hash value \x0000][documentID including hash value \x0000]...
-    [number of documents including hash value \x0000][documentID including hash value \x0001]...
-    [number of documents including hash value \x0000][documentID including hash value \xffff]...
+                    |<------            data byte count (1)  ------->|
+[data byte count(1)][documentID including hash value \x0000]...
+                    |<------            data byte count (2)     ------->|
+[data byte count(2)][documentID including hash value \x0001]...
+...
+[data byte count(n)][documentID including hash value \xffff]...
 </pre>
 
-
 <h3>Note</h3>  
 
 <ul>
@@ -176,6 +180,7 @@
 <li>DocumentID are stored by only gaps.<br>
  e.g., 1, 5, 29, 34 -&gt; 1, 4, 24, 5
 <li>All data is stored in "pack 'w'".  (BER compression)
+(except the data byte count, which is stored in pack 'N')
 </ul>
 
 <h2><a name="pi">NMZ.pi</a></h2>
Index: doc/ja/nmz.html
===================================================================
RCS file: /storage/cvsroot/namazu/doc/ja/nmz.html,v
retrieving revision 1.12
diff -u -r1.12 nmz.html
--- doc/ja/nmz.html	6 Apr 2000 01:40:01 -0000	1.12
+++ doc/ja/nmz.html	8 Jul 2003 13:07:47 -0000
@@ -41,12 +41,12 @@
 <p>単語検索用のインデックスファイル (転置ファイル, inverted ファイル)</p>
 
 <h3>構造</h3>
-
+単語毎に、[その単語を含む文書 ID][スコア]を並べて「レコード」を作る。
+その長さは可変になるので、先頭に、そのデータ長を記録する。
 <pre>
-
-    [単語1を含む文書の総数 * 2][文書ID][スコア][文書ID][スコア]...
-    [単語2を含む文書の総数 * 2][文書ID][スコア][文書ID][スコア]...
-    [単語3を含む文書の総数 * 2][文書ID][スコア][文書ID][スコア]...
+    [単語1用 データ長][文書ID][スコア][文書ID][スコア]...
+    [単語2用 データ長][文書ID][スコア][文書ID][スコア]...
+    [単語3用 データ長][文書ID][スコア][文書ID][スコア]...
        :
 </pre>
 
@@ -159,10 +159,12 @@
 <h3>構造</h3>
 
 <pre>
-
-    [ハッシュ値\x0000を含む文書数][ハッシュ値\x0000を含む文書ID]...
-    [ハッシュ値\x0000を含む文書数][ハッシュ値\x0001を含む文書ID]...
-    [ハッシュ値\x0000を含む文書数][ハッシュ値\xffffを含む文書ID]...
+                 |←                     データバイト数1                          →|
+[データバイト数1][ハッシュ値\x0000を含む文書ID 1][ハッシュ値\x0000を含む文書ID 2]...
+                 |←                     データバイト数2                        →|
+[データバイト数2][ハッシュ値\x0001を含む文書ID 1][ハッシュ値\x0001を含む文書ID 2]...
+...
+[データバイト数n][ハッシュ値\xffffを含む文書ID 1]...
 </pre>
 
 
@@ -173,6 +175,7 @@
 <li>文書IDは差分だけを記録する。<br>
 例: 1, 5, 29, 34 -&gt; 1, 4, 24, 5
 <li>データはすべて pack 'w' で保存される (BER圧縮)
+    (ただしバイト数は pack 'N')
 </ul>
 
 <h2><a name="pi">NMZ.pi</a></h2>
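
To illustrate the two notes in the patch (document IDs stored as gaps, values
stored in pack 'w'), here is a minimal Python sketch. It assumes the behaviour
Perl documents for pack 'w': base-128 digits, most significant first, with the
high bit set on every byte except the last. The function names are hypothetical
and are not taken from the Namazu sources.

```python
def to_gaps(doc_ids):
    """Turn ascending document IDs into gaps, e.g. 1, 5, 29, 34 -> 1, 4, 24, 5."""
    gaps, prev = [], 0
    for doc_id in doc_ids:
        gaps.append(doc_id - prev)
        prev = doc_id
    return gaps


def from_gaps(gaps):
    """Inverse of to_gaps: a running sum restores the original IDs."""
    ids, total = [], 0
    for gap in gaps:
        total += gap
        ids.append(total)
    return ids


def ber_encode(n):
    """Encode one non-negative integer the way Perl's pack 'w' is documented to."""
    if n < 0:
        raise ValueError("BER compressed integers must be non-negative")
    out = [n & 0x7F]                   # last byte: high bit clear
    n >>= 7
    while n:
        out.append((n & 0x7F) | 0x80)  # earlier bytes: high bit set
        n >>= 7
    return bytes(reversed(out))


def ber_decode_all(data):
    """Decode a run of concatenated BER compressed integers."""
    values, cur, pending = [], 0, False
    for byte in data:
        cur = (cur << 7) | (byte & 0x7F)
        pending = True
        if not byte & 0x80:            # high bit clear: this value is complete
            values.append(cur)
            cur, pending = 0, False
    if pending:
        raise ValueError("input ends in the middle of a value")
    return values


# The example from the note: 1, 5, 29, 34 is stored as the gaps 1, 4, 24, 5.
encoded = b"".join(ber_encode(g) for g in to_gaps([1, 5, 29, 34]))
decoded = from_gaps(ber_decode_all(encoded))
```

Small gaps fit in a single byte each, which is presumably why the IDs are
stored as gaps before being packed.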

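The record layout proposed above, a byte count in pack 'N' followed by that
many bytes of data, could be read back roughly as follows. This is only a
sketch under the assumption that records are simply concatenated and that
pack 'N' means an unsigned 32-bit big-endian integer, as Perl documents it.
pack_record and split_records are hypothetical names, and the sample bytes in
the usage lines are made up for illustration.

```python
import struct


def pack_record(data):
    """Prefix a data part with its byte count as a 32-bit big-endian integer."""
    return struct.pack(">I", len(data)) + data


def split_records(buf):
    """Split a buffer of [byte count][data] records into their data parts."""
    records, pos = [], 0
    while pos < len(buf):
        if pos + 4 > len(buf):
            raise ValueError("truncated byte count")
        (length,) = struct.unpack_from(">I", buf, pos)
        pos += 4
        if pos + length > len(buf):
            raise ValueError("truncated data part")
        records.append(buf[pos:pos + length])  # still pack 'w' encoded
        pos += length
    return records


# Two made-up records: the first has a 4-byte data part, the second 3 bytes.
buf = pack_record(b"\x01\x0a\x04\x03") + pack_record(b"\x82\x2c\x05")
parts = split_records(buf)
```

Because the prefix counts bytes rather than documents, a reader can skip a
whole record without decoding any of the variable-length values inside it.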

---
(藤原)
株式会社 絹
   043 -221-8082
H" 070-5073-4063
Makoto Fujiwara, 
Chiba, Japan, Narita Airport and Disneyland prefecture.
http://www.ki.nu/software/NetBSD/iBook2/
http://www.ki.nu/software/namazu/tutorial/
