
Specification of NMZ.* files

NMZ.i
NMZ.ii
NMZ.w
NMZ.wi
NMZ.r
NMZ.p
NMZ.pi
NMZ.t
NMZ.field.{subject,from,date,message-id,...}
NMZ.field.{subject,from,date,message-id,...}.i
NMZ.access
NMZ.status
NMZ.result
NMZ.head
NMZ.foot
NMZ.body
NMZ.tips
NMZ.log
NMZ.lock
NMZ.lock2
NMZ.slog

NMZ.i

Index file for word searching. (inverted file)

Structure

For each word, pairs of [documentID containing that word][score] are stored sequentially, forming the record for that word. Records are of variable length; the byte count of each record's data part is placed in front of it.


    [data length for word1][documentID][score][documentID][score]...
    [data length for word2][documentID][score][documentID][score]...
    [data length for word3][documentID][score][documentID][score]...
       :

Note

DocumentIDs are sorted in ascending order. -- IMPORTANT
DocumentIDs are stored only as gaps (the difference from the previous documentID).
e.g., 1, 5, 29, 34 -> 1, 4, 24, 5
Data is stored in "pack 'w'" format. (BER compression)
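
The gap encoding and BER compression described above can be sketched in Python. This is a minimal illustration, not Namazu's own code: the function names are hypothetical, and it assumes that the length prefix is itself a BER-compressed integer counting the bytes that follow and that scores are BER-compressed integers as well.

```python
def ber_encode(n):
    """Encode a non-negative integer the way Perl's pack 'w' does."""
    if n < 0:
        raise ValueError("BER compressed integers are non-negative")
    out = [n & 0x7F]                      # last byte: high bit clear
    n >>= 7
    while n:
        out.append((n & 0x7F) | 0x80)     # earlier bytes: high bit set
        n >>= 7
    return bytes(reversed(out))


def ber_decode(data, pos=0):
    """Decode one BER compressed integer; return (value, next position)."""
    value = 0
    while True:
        b = data[pos]
        pos += 1
        value = (value << 7) | (b & 0x7F)
        if not b & 0x80:
            return value, pos


def to_gaps(ids):
    """Turn ascending documentIDs into gaps, e.g. 1, 5, 29, 34 -> 1, 4, 24, 5."""
    prev, gaps = 0, []
    for i in ids:
        gaps.append(i - prev)
        prev = i
    return gaps


def encode_record(postings):
    """postings: list of (documentID, score) sorted by documentID."""
    gaps = to_gaps([doc for doc, _ in postings])
    body = b"".join(ber_encode(g) + ber_encode(score)
                    for g, (_, score) in zip(gaps, postings))
    return ber_encode(len(body)) + body   # assumed: BER byte count prefix


def decode_record(data, pos=0):
    """Return (postings, position just past the record)."""
    length, pos = ber_decode(data, pos)
    end = pos + length
    postings, doc_id = [], 0
    while pos < end:
        gap, pos = ber_decode(data, pos)
        score, pos = ber_decode(data, pos)
        doc_id += gap                     # undo the gap encoding
        postings.append((doc_id, score))
    return postings, end
```

Because the gaps are only meaningful relative to the previous documentID, a record can be decoded only from its beginning, which is why the ascending order matters.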

NMZ.ii

Index for 'seek'ing NMZ.i.

Structure


    [position of word 1 in NMZ.i][position of word 2 in NMZ.i]
    [position of word 3 in NMZ.i]...

Note

All data is in binary. (pack 'N')
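
As a rough sketch of how such an index might be used, the following Python reads one position from the contents of NMZ.ii. Perl's pack 'N' writes an unsigned 32-bit big-endian integer, which corresponds to ">I" in Python's struct module. The function name is hypothetical, and treating wordID as a 0-based record index is an assumption; shift it by one if wordIDs start at 1.

```python
import struct


def record_offset(index_bytes, word_id):
    """Return the NMZ.i position stored for word_id (0-based: an assumption)."""
    (pos,) = struct.unpack_from(">I", index_bytes, 4 * word_id)
    return pos
```

The returned position would then be passed to seek() on NMZ.i before the record for that word is read.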

NMZ.w

List of words.

Structure

A simple line-oriented text file, one word per line, sorted in ascending order. The line number of a word is its wordID, which can be used to seek into NMZ.ii.

Note

Words are sorted in ascending order.
Regular expression, substring, and suffix matching scan (grep) the entire file.
Characters in JIS X 0208 are recorded in EUC-JP.
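
The difference between the two kinds of lookup can be illustrated in Python. This is only a sketch: the function names are hypothetical, words are handled as byte strings (as EUC-JP text would be), and it assumes that the file is sorted in plain byte order and that exact matches use a binary search, which the sorted order permits but this document does not state.

```python
import bisect
import re


def find_word_id(words, word):
    """Binary search in the sorted list; return the 0-based line index or None."""
    i = bisect.bisect_left(words, word)
    return i if i < len(words) and words[i] == word else None


def grep_word_ids(words, pattern):
    """Full scan, as needed for regular expression, substring, or suffix matching."""
    rx = re.compile(pattern)
    return [i for i, w in enumerate(words) if rx.search(w)]
```

The full scan touches every line, so its cost grows with the size of the word list, while the binary search needs only a handful of comparisons.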

NMZ.wi

Index for 'seek'ing NMZ.w.

Structure


    [position of word 1 in NMZ.w][position of word 2 in NMZ.w]
    [position of word 3 in NMZ.w]...

Note

All data is in binary. (pack 'N')

NMZ.r

List of files registered in index.

Structure

Each line records a document file that is registered in the index. A line beginning with '#' indicates a file deleted from the index, and a line beginning with '##' is a comment. Example:


    /home/foo/bar1.html
    /home/foo/bar2.html
    /home/foo/bar3.html
    ## indexed: Sun, 08 Jan 2006 02:28:00 +0900
    (an empty line)
    # /home/foo/bar1.html
    ## deleted: Sun, 08 Jan 1998 12:34:56 +0900
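
A minimal Python sketch of reading this format follows. The function name is hypothetical, and skipping empty lines is an assumption based on the example above.

```python
def parse_nmz_r(text):
    """Split NMZ.r contents into registered files, deleted files, and comments."""
    registered, deleted, comments = [], [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                            # empty separator line
        if line.startswith("##"):
            comments.append(line[2:].strip())   # '##' must be tested before '#'
        elif line.startswith("#"):
            deleted.append(line[1:].strip())
        else:
            registered.append(line)
    return registered, deleted, comments
```

The '##' test has to come first, because every comment line also begins with '#'.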

NMZ.p

Index for phrase searching.

Description

Each pair of adjacent words is converted to a 16-bit hash value. In a phrase search, all words in the phrase are first searched with 'AND', and then the word order is checked by referring to NMZ.p. Note that word order is recorded only for two-word pairs, so a search for "foo bar baz" is checked pair by pair ("foo bar", "bar baz") rather than as a whole phrase. Because hash values can collide, documents that do not contain the phrase may also be retrieved. Phrase search is therefore inexact, but it usually works well.
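
The pair-wise check can be illustrated with a small Python sketch. Everything named here is hypothetical: pair_hash is a stand-in built on CRC-32 and is NOT Namazu's actual hash function, the dictionary plays the role of NMZ.p, and requiring every adjacent pair to pass is an assumption about how the individual pair checks are combined.

```python
import zlib


def pair_hash(w1, w2):
    """Stand-in 16-bit hash of a word pair (not Namazu's real hash)."""
    return zlib.crc32((w1 + " " + w2).encode("utf-8")) & 0xFFFF


def phrase_pairs(words):
    """Adjacent two-word pairs of a phrase."""
    return list(zip(words, words[1:]))


def build_phrase_index(docs):
    """docs: {documentID: text}. Returns {hash value: set of documentIDs}."""
    index = {}
    for doc_id, text in docs.items():
        for w1, w2 in phrase_pairs(text.split()):
            index.setdefault(pair_hash(w1, w2), set()).add(doc_id)
    return index


def passes_pair_checks(doc_id, words, index):
    """True if every adjacent pair of the phrase maps to a bucket holding doc_id."""
    return all(doc_id in index.get(pair_hash(w1, w2), set())
               for w1, w2 in phrase_pairs(words))
```

With this sketch, a document containing "bar baz foo bar" passes the checks for the phrase "foo bar baz" even without any hash collision, because both pairs occur in it, just not consecutively. This is one source of the inexactness mentioned above.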

Structure


                        |<------   data byte count (1)    ------->|
    [data byte count(1)][documentID including hash value \x0000]...
                        |<------   data byte count (2)    ------->|
    [data byte count(2)][documentID including hash value \x0001]...
    ...
    [data byte count(n)][documentID including hash value \xffff]...

Note

DocumentIDs are sorted in ascending order. -- IMPORTANT
DocumentIDs are stored only as gaps (the difference from the previous documentID).
e.g., 1, 5, 29, 34 -> 1, 4, 24, 5
All data is stored in "pack 'w'". (BER compression)

NMZ.pi

Index for 'seek'ing NMZ.p, the index for phrase searching.

Structure


    [position of \x0000 in NMZ.p][position of \x0001 in NMZ.p] ...
    [position of \xffff in NMZ.p]

Note

All data is in binary. (pack 'N')
The file is always 256 KB (65,536 entries of 4 bytes each).

NMZ.t

Record information about time stamps and deleted documents.

Description

File time stamps are recorded as 32-bit values. They are used for sorting search results by date. In addition, if the value is -1, the document is regarded as deleted.

Structure


    [time stamp of documentID1][time stamp of documentID2]...

Note

All data is in binary. (pack 'N')
Has the year 2038 problem.
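
A minimal Python sketch of reading this file follows. The function names are hypothetical. Since pack 'N' writes unsigned 32-bit big-endian values, a stored -1 appears as the bytes ff ff ff ff; reading with the signed format ">i" turns it back into -1. Using 0-based record indices is an assumption.

```python
import struct


def read_timestamps(data):
    """Decode NMZ.t contents into a list of signed 32-bit time stamps."""
    count = len(data) // 4
    return list(struct.unpack(">%di" % count, data[:4 * count]))


def live_docs_newest_first(timestamps):
    """0-based record indices of non-deleted documents, newest first."""
    live = [i for i, ts in enumerate(timestamps) if ts != -1]
    return sorted(live, key=lambda i: timestamps[i], reverse=True)
```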

NMZ.field.{subject,from,date,message-id,...}

File to record field information.

Description

Used for field-specified searching and for displaying the search results. It is a simple line-oriented text file that is grep'ed by the regular expression engine. A line number can be used as a documentID.

Structure

A simple line-oriented text. (line number = documentID)

Note

Since it is line-oriented text, it can be edited with an editor or other tools. If you edit it, rebuild the NMZ.field.{subject,from,date,message-id,...}.i files with rfnmz.

NMZ.field.{subject,from,date,message-id,...}.i

Index for 'seek'ing NMZ.field.{subject,from,date,message-id,...}

Structure


    [field position in documentID1][field position in documentID2]...

Note

All data is in binary. (pack 'N')
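
To make the relationship between a line-oriented file and its seek index concrete, here is a Python sketch that builds such an index and uses it to fetch one line. It is an illustration under stated assumptions, not what rfnmz actually does: the function names are hypothetical, positions are taken to be byte offsets of line starts, and line numbers are treated as 0-based.

```python
import struct


def build_line_index(text):
    """text: bytes of a line-oriented file. Returns packed 'N' offsets, one per line."""
    offsets, pos = [], 0
    for line in text.splitlines(keepends=True):
        offsets.append(pos)               # byte offset where this line starts
        pos += len(line)
    return struct.pack(">%dI" % len(offsets), *offsets)


def read_line(text, index, n):
    """Fetch line n (0-based) by seeking through the index instead of scanning."""
    (start,) = struct.unpack_from(">I", index, 4 * n)
    end = text.find(b"\n", start)
    return text[start:] if end == -1 else text[start:end]
```

This also shows why the index has to be rebuilt after the text file is edited: changing the length of one line shifts the offsets of all lines after it.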

NMZ.access

Configuration file for user access control.

Structure

Access is controlled by IP address, host name, and/or domain name. "deny" lists hosts whose access is refused, and "allow" lists hosts whose access is permitted. When a host is specified by IP address, prefix matching is used; when a host is specified by host name or domain name, suffix matching is used. "all" matches every host. The configuration is evaluated from the top. Example:


    deny all
    allow localhost
    allow 123.123.123.
    allow .example.jp

This configuration allows access from localhost, from hosts with IP addresses 123.123.123.*, and from hosts with domain names *.example.jp. Access from all other hosts is denied.
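
The evaluation of such a configuration can be sketched in Python. Several points are assumptions rather than documented behavior: the function names are hypothetical, a later matching line overrides an earlier one (inferred from the example, where "deny all" comes first), a pattern made only of digits and dots is treated as an IP address prefix, and access is allowed when no line matches.

```python
def matches(pattern, ip, hostname):
    """Check one pattern against a client's IP address and host name."""
    if pattern == "all":
        return True
    if pattern.replace(".", "").isdigit():     # looks like an IP prefix (assumption)
        return ip.startswith(pattern)
    return hostname.endswith(pattern)          # host or domain name: suffix match


def is_allowed(rules, ip, hostname):
    """Evaluate the rules from the top; the last matching line decides (assumption)."""
    allowed = True                             # default when nothing matches (assumption)
    for line in rules.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue
        directive, pattern = parts
        if directive in ("deny", "allow") and matches(pattern, ip, hostname):
            allowed = (directive == "allow")
    return allowed
```

Matching by host name only works if the server has resolved the client's address to a name, which is why the web server setting below matters.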

For the Apache web server, access control by host name and/or domain name requires the following setting in "httpd.conf".


    HostnameLookups On

NMZ.status

Stores the data needed to update the index.

NMZ.result

File to specify the style of search results.

Description

${field name} is replaced by the contents of that field. For example, ${title} is replaced by the contents of NMZ.field.title. ${namazu::counter} and ${namazu::score} have special meanings: they are replaced by the counter of the search result and by its score, respectively.

By default, NMZ.result.normal and NMZ.result.short are provided. Users can freely create NMZ.result.*.
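
The substitution can be sketched in Python as follows. The function name and the sample template are hypothetical, and replacing an unknown field with an empty string is an assumption.

```python
import re


def render_result(template, fields, counter, score):
    """Expand ${name} placeholders in a result template."""
    values = dict(fields)
    values["namazu::counter"] = str(counter)   # special placeholders
    values["namazu::score"] = str(score)
    return re.sub(r"\$\{([^}]+)\}",
                  lambda m: values.get(m.group(1), ""),   # unknown field -> "" (assumption)
                  template)
```

For example, rendering the template "${namazu::counter}. ${title} (score: ${namazu::score})" with the title "Hello", counter 1, and score 42 gives "1. Hello (score: 42)".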

Note

Do not lock when writing.

Namazu Homepage

$Id: nmz.html.en,v 1.20 2008-03-04 20:56:20 opengl2772 Exp $

developers@namazu.org
