Re: About the display of search results


Namazu v2.0.5

$ mknmz -O /home/httpd/html/namazu/ /home/httpd/html/ml/

(/home/httpd/html/ml/ is where the HTML generated by MHonArc is stored)

$ cat NMZ.head.ja NMZ.body.ja NMZ.foot.ja >
The form action is set to /home/httpd/cgi-bin/namazu.cgi

I have forgotten where I moved namazu.cgi from...

# cp /usr/etc/namazu/namazurc


I have also attached the configuration files namazurc and mknmzrc.

$ namazu -C

Loaded rcfile: /usr/etc/namazu/namazurc
Index:        /home/httpd/html/namazu/
Logging:      on
Lang:         C
Scoring:      tfidf
Template:     /home/httpd/html/namazu/
MaxHit:       10000
MaxMatch:     1000
EmphasisTags: <strong class="keyword">  </strong>
Replace: /home/httpd/html/namazu/      

$ mknmz -C

Loaded rcfile: /usr/etc/namazu/mknmzrc
System: linux
Namazu: 2.0.5
Perl: 5.00503
NKF: module_nkf
KAKASI: module_kakasi -ieuc -oeuc -w
ChaSen: no -j -F '%m '
Wakati: module_kakasi -ieuc -oeuc -w
Lang: C
Coding System: euc
CONFDIR: /usr/etc/namazu
LIBDIR: /usr/share/namazu/pl
FILTERDIR: /usr/share/namazu/filter
TEMPLATEDIR: /usr/share/namazu/template
Supported media types:
  text/html; x-type=mhonarc
  text/plain; x-type=rfc


# This is a Namazu configuration file for mknmz.
package conf;  # Don't remove this line!

# Administrator's email address
$ADDRESS = 'webmaster@';

# Regular Expression Patterns

# This pattern specifies HTML suffixes.
# $HTML_SUFFIX = "html?|[ps]html|html\\.[a-z]{2}";

# This pattern specifies file names which will be targeted.
# NOTE: It can be specified by --allow=regex option.
#       Do NOT use `$' or `^' anchors.
#       Case-insensitive.
# $ALLOW_FILE =	".*\\.(?:$HTML_SUFFIX)|.*\\.txt" . # HTML, plain text
# 		"|.*\\.gz|.*\\.Z|.*\\.bz2" .       # Compressed files
# 		"|.*\\.pdf" . 			   # PDF
# 		"|.*\\.tex" .   		   # TeX
# 		"|.*\\.doc|.*\\.xls" .		   # Word, Excel
# 		"|.*\\.j[sab]w" .                  # Ichitaro 4, 5, 6
# 		"|\\d+|[-\\w]+\\.[1-9n]";          # Mail/News, man

# This pattern specifies file names which will NOT be targeted.
# NOTE: It can be specified by --deny=regex option.
#       Do NOT use `$' or `^' anchors.
#       Case-insensitive.
# $DENY_FILE = ".*\\.(gif|png|jpg|jpeg)|.*\\.tar\\.gz|core|.*\\.bak|.*~|\\..*|\x23.*";

# This pattern specifies PATHNAMEs which will NOT be targeted.
# NOTE: Usually specified by --exclude=regex option.
# $EXCLUDE_PATH = undef;

# This pattern specifies file names which can be omitted 
# in URI.  e.g., 'index.html|index.htm|Default.html'
# NOTE: This is similar to Apache's "DirectoryIndex" directive.
$DIRECTORY_INDEX = "/home/httpd/html/namazu/l5users/";

# This pattern specifies Mail/News's fields in its header which 
# should be searchable.  NOTE: case-insensitive
$REMAIN_HEADER = "From|Date|Message-ID";

# This pattern specifies fields which are used for field-specified 
# searching.  NOTE: case-insensitive
# $SEARCH_FIELD = "message-id|subject|from|date|uri|newsgroups|to|summary|size";

# This pattern specifies meta tags which are used for field-specified 
# searching.  NOTE: case-insensitive
# $META_TAGS = "keywords|description";

# This pattern specifies aliases for NMZ.field.* files.
# NOTE: Editing NOT recommended.
%FIELD_ALIASES = ('title' => 'subject', 'author' => 'from');

# This pattern specifies HTML elements which should be replaced with 
# null string when removing them. Normally, the elements are replaced 
# with a single space character.

# Critical Numbers

# The max size of files which can be loaded in memory at once.
# If you have much memory, you can increase the value.
# If you have less memory, you can decrease the value.
# $ON_MEMORY_MAX   = 5000000;

# The max file size for indexing. Files larger than this 
# will be ignored.
# NOTE: This value is usually larger than TEXT_SIZE_MAX because 
#       binary-formatted files such as PDF, Word are larger.
# $FILE_SIZE_MAX   = 2000000;

# The max text size for indexing. Files larger than this 
# will be ignored.
# $TEXT_SIZE_MAX   =  600000;

# The max length of a word. Words longer than this will be ignored.
# $WORD_LENG_MAX   = 128;

# Weights for HTML elements which are used for term weighting.
# %Weight = 
#     (
#      'html' => {
#          'title'  => 16,
#          'h1'     => 8,
#          'h2'     => 7,
#          'h3'     => 6,
#          'h4'     => 5,
#          'h5'     => 4,
#          'h6'     => 3,
#          'a'      => 4,
#          'strong' => 2,
#          'em'     => 2,
#          'kbd'    => 2,
#          'samp'   => 2,
#          'var'    => 2,
#          'code'   => 2,
#          'cite'   => 2,
#          'abbr'   => 2,
#          'acronym'=> 2,
#          'dfn'    => 2,
#      },
#      'metakey' => 32, # for <meta name="keywords" content="foo bar">
#      'headers' => 8,  # for Mail/News' headers
# );

# The max length of a HTML-tagged string which can be processed for
# term weighting. 
# NOTE: Quite a few people misuse <h[1-6]> just to change
#       the font size.
# $INVALID_LENG = 128; 

# The max length of a field.
# This MUST be smaller than libnamazu.h's BUFSIZE (usually 1024).

# Software for handling Japanese text

# Network Kanji Filter nkf v1.62 or later
$NKF = "module_nkf"; 

$KAKASI = "module_kakasi -ieuc -oeuc -w";

# ChaSen 1.51 or later (simple wakatigaki)
# $CHASEN = "no -j -F '\%m '";

# ChaSen 1.51 or later (with noun words extraction)
# $CHASEN_NOUN = "no -j -F '\%m %H\\n'";

# Default Japanese processor: KAKASI or ChaSen.

# Directories

# 1;
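
The DIRECTORY_INDEX comment in the mknmzrc above describes a pattern of file names that can be omitted from a URI, in the manner of Apache's "DirectoryIndex". A minimal Python sketch of that idea follows. It is an illustration only, not mknmz's actual code: the function name `omit_directory_index` is hypothetical, the pattern is the example given in the comment, and `msg00001.html` is an assumed file name.

```python
import re

def omit_directory_index(uri, pattern):
    """Drop the trailing file name from uri when it matches pattern exactly."""
    head, sep, name = uri.rpartition("/")
    # Only the last path component is compared against the pattern.
    if sep and re.fullmatch(pattern, name):
        return head + "/"
    return uri

# Example pattern taken from the comment in mknmzrc.
example_pattern = "index.html|index.htm|Default.html"

print(omit_directory_index("/home/httpd/html/ml/index.html", example_pattern))
print(omit_directory_index("/home/httpd/html/ml/msg00001.html", example_pattern))
```

In this sketch the first path loses its `index.html`, while the second is returned unchanged because its file name is not one of the alternatives in the pattern.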

# This is a Namazu configuration file for namazu or namazu.cgi.
#  Originally, this file is named 'namazurc-sample', so you should
#  copy it to 'namazurc' to make the file effective.
#  Each item must be separated by one or more SPACE or TAB characters. 
#  You can use a double-quoted string for representing a string which 
#  contains SPACE or TAB characters like "foo bar baz".

## Index: Specify the default directory.
Index         /home/httpd/html/namazu/

## Template: Set the template directory containing
## NMZ.{head,foot,body,tips,result} files.
Template      /home/httpd/html/namazu/

## Replace: Replace TARGET with REPLACEMENT in URIs in search
## results.  
## TARGET is specified by Ruby's perl-like regular expressions.  
## You can capture sub-strings in TARGET by surrounding them 
## with `(' and `)' and use them later as backreferences by
## \1, \2, \3,... \9.
## To use meta characters literally such as `*', `+', `?', `|', 
## `[', `]', `{', `}', `(', `)', escape them with `\'.
## e.g.,
##    Replace  /home/foo/public_html/
##    Replace  /home/(.*)/public_html/\1/
##    Replace   /C\|/foo/     
## If you do not want to do the processing on command line use, 
## run namazu with -U option.
## You can specify more than one Replace rule, but only the 
## first-matched rule is applied. 
Replace       /home/httpd/html/namazu/  http://localhost/namazu/

## Logging: Set OFF to turn off keyword logging to NMZ.slog. 
## Default is ON.
#Logging       off

## Lang: Set the locale code such as `ja_JP.eucJP', `ja_JP.SJIS', 
## `de', etc.  This directive works only if the environment 
## variable LANG is not set because the directive is mainly 
## intended for CGI use.  On the shell, you can set the 
## environment variable LANG instead of using the directive.
## If you set it to `de', namazu.cgi uses 
## NMZ.(head|foot|body|tips|results).de for displaying results 
## and uses the proper message catalog for `de'.
Lang          ja

## Scoring: Set the scoring method "tfidf" or "simple".
Scoring       tfidf

## EmphasisTags: Set the pair of html elements which is used in
## keyword emphasizing for search results.
EmphasisTags  "<strong class=\"keyword\">"   "</strong>"

## MaxHit: Set the maximum number of documents which can be
## handled in query operation.  If documents matching a
## query exceed the value, they will be ignored.
MaxHit	10000

## MaxMatch: Set the maximum number of words which can be
## handled in regex/prefix/inside/suffix query. If documents
## matching a query exceed the value, they will be ignored.
MaxMatch	1000
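
The Replace comments in the namazurc above describe rewriting URIs in search results with a TARGET pattern, backreferences, and a first-match-wins rule. A minimal Python sketch of that behaviour follows. It is an illustration under assumptions, not namazu's implementation: the function name `apply_replace` is hypothetical, Python's `re` syntax only approximates the regular expressions namazu uses, matching TARGET at the start of the URI is an assumption, and `sample.html` is an assumed file name.

```python
import re

def apply_replace(uri, rules):
    """Rewrite uri with the first rule whose TARGET matches at its start."""
    for target, replacement in rules:
        m = re.match(target, uri)  # assumption: TARGET is anchored at the start
        if m:
            # Expand \1..\9 backreferences, then keep the rest of the URI.
            return m.expand(replacement) + uri[m.end():]
    return uri  # no rule matched, so the URI is left as it is

# The single rule from the namazurc above.
rules = [("/home/httpd/html/namazu/", "http://localhost/namazu/")]

print(apply_replace("/home/httpd/html/namazu/sample.html", rules))
print(apply_replace("/home/httpd/html/ml/sample.html", rules))
```

In this sketch a path that begins with TARGET is rewritten to the `http://localhost/namazu/` prefix, and a path that does not begin with TARGET is returned unchanged.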