Namazu-users-en(old)
Re: Out of memory!
- From: "Collins, Leon" <Leon.Collins@xxxxxxxxxxxxxxxxxx>
- Date: Thu, 24 Oct 2002 10:33:35 -0400
- X-ml-name: namazu-users-en
- X-mail-count: 00359
Dan:
I was getting that same "Out of memory!" error for the longest time on my
Unix system, even though the machine had plenty of memory. What fixed it
for me was running this command, which sets the shell's data segment size
limit: "ulimit -d 800000".
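A minimal sketch of how that limit might be applied to just the indexing
run, assuming a POSIX-style shell (sh, bash, ksh) in which "ulimit -d" takes
a value in kilobytes and a hard limit that permits the requested value. The
helper name run_with_data_limit is hypothetical; the mknmz command line in
the comment is the one from the log below.

```shell
#!/bin/sh
# Print the current data segment limit (a number in KB, or "unlimited").
echo "current data segment limit: $(ulimit -d)"

# Hypothetical helper: run a command with the data segment limit set to $1 KB.
# The subshell keeps the change from leaking into the calling shell.
run_with_data_limit() {
    limit_kb=$1
    shift
    (
        ulimit -d "$limit_kb" || exit 1
        "$@"
    )
}

# Example, using the command line from the log:
# run_with_data_limit 800000 mknmz --target-list=/common/Software.Index/Indexer.Liste
```

If the ulimit call is refused, the helper exits without running the command,
so a failed limit change is not mistaken for a failed indexing run.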
Leon
-----Original Message-----
From: Daniel.Jaime@xxxxxxxxxxxx [mailto:Daniel.Jaime@xxxxxxxxxxxx]
Sent: Thursday, October 24, 2002 11:28 AM
To: namazu-users-en@xxxxxxxxxx
Cc: Daniel.Jaime@xxxxxxxxxxxx
Subject: Out of memory!
Hello,
in the attachment you can find the log file of an indexing session, which
was interrupted by an "Out of memory!" error. I am not able to find out what
is wrong in my configuration. I made many attempts to index some big files,
but bigger numbers worked even worse. How can I figure out the maximum
values that I can define without paralyzing my system?
Another question: what is the right syntax for the $EXCLUDE_PATH
statement?
Thank you very much
Kind regards
Daniel Jaime
PS: The error message reported by wvWare is a separate, unrelated problem.
========== Log file of indexing session ==========
Indexing 2>&1
/<Directories to be indexed>.99
/<Directories to be indexed>.00
/<Directories to be indexed>.01
/<Directories to be indexed>.02
cd /common/Software.Index 2>&1
mknmz --target-list=/common/Software.Index/Indexer.Liste 2>&1
Bareword found where operator expected at /usr/local/etc/namazu/mknmzrc line 55, near "/common/Intranet"
(Missing operator before Intranet?)
Looking for indexing files...
867 files are found to be indexed.
/usr/local/bin/wvWare: error while loading shared libraries: libpng.so.3: cannot open shared object file: No such file or directory
1/867 - /<Directories to be indexed>.00/<a subdirectory>/<Dir A>/<File A>%20<a subdirectory>.doc [application/msword]
/usr/local/bin/wvWare: error while loading shared libraries: libpng.so.3: cannot open shared object file: No such file or directory
2/867 - /<Directories to be indexed>.00/<a subdirectory>/<Dir B>/<Dir C>/<a subdirectory>/<Another File>.doc [application/msword]
/usr/local/bin/wvWare: error while loading shared libraries: libpng.so.3: cannot open shared object file: No such file or directory
3/867 - /<Directories to be indexed>.00/<a subdirectory>/<Dir B>/<Dir C>/<a subdirectory>/<File D>.doc [application/msword]
4/867 - /<Directories to be indexed>.00/<a subdirectory>/<Replaced>/<File E>.ppt [application/powerpoint]
5/867 - /<Directories to be indexed>.00/<a subdirectory>/<Replaced>/<Powerpoint file>001.ppt [application/powerpoint]
.
.
.
9/867 - /<Directories to be indexed>.00/<a subdirectory>/<Replaced>/<Powerpoint file>005.ppt [application/powerpoint]
Writing index files...
10/867 - /<Directories to be indexed>.00/<a subdirectory>/<Replaced>/<Powerpoint file>006.ppt [application/powerpoint]
11/867 - /<Directories to be indexed>.00/<a subdirectory>/<Replaced>/<Powerpoint file>007.ppt [application/powerpoint]
12/867 - /<Directories to be indexed>.00/<a subdirectory>/<Replaced>/<Powerpoint file>008.ppt is larger than your setup after filtered, skipped:
conf::TEXT_SIZE_MAX (600000) < 3140608
12/866 - /<Directories to be indexed>.00/<a subdirectory>/<Replaced>/<Powerpoint file>009.ppt [application/powerpoint]
13/866 - /<Directories to be indexed>.00/<a subdirectory>/<Replaced>/<Powerpoint file>010.ppt is larger than your setup after filtered, skipped:
conf::TEXT_SIZE_MAX (600000) < 3120640
13/865 - /<Directories to be indexed>.00/<a subdirectory>/<Replaced>/<Powerpoint file>011.ppt [application/powerpoint]
14/865 - /<Directories to be indexed>.00/<a subdirectory>/<Replaced>/<Powerpoint file>012.ppt [application/powerpoint]
15/865 - /<Directories to be indexed>.00/<a subdirectory>/<Replaced>/<Powerpoint file>014.ppt [application/powerpoint]
16/865 - /<Directories to be indexed>.00/<a subdirectory>/<Replaced>/<Powerpoint file>015.ppt [application/powerpoint]
17/865 - /<Directories to be indexed>.00/<a subdirectory>/<Replaced>/<File w>.ppt [application/powerpoint]
.
.
.
44/865 - /<Directories to be indexed>.00/<Dir E>/<File F>.xls [application/excel]
45/865 - /<Directories to be indexed>.00/<Dir E>/<File G>.xls [application/excel]
Out of memory!
chown -R root:wwwadmin /common/Software.Index 2>&1
chmod -R ug+rw /common/Software.Index 2>&1
chmod -R o+r /common/Software.Index 2>&1
rm -f /common/Software.Index/NMZ.lock2 2>&1
========== Setting ==========
mknmz -C
Bareword found where operator expected at /usr/local/etc/namazu/mknmzrc line 55, near "/common/Intranet"
(Missing operator before Intranet?)
Loaded rcfile: /usr/local/etc/namazu/mknmzrc
System: linux
Namazu: 2.0.12
Perl: 5.006001
NKF: no
KAKASI: no
ChaSen: no
Lang_Msg: C
Lang: C
Coding System: euc
CONFDIR: /usr/local/etc/namazu
LIBDIR: /usr/local/share/namazu/pl
FILTERDIR: /usr/local/share/namazu/filter
TEMPLATEDIR: /usr/local/share/namazu/template
Supported media types:
application/excel
application/msword
application/pdf
application/postscript
application/powerpoint
application/x-bzip2
application/x-compress
application/x-gzip
application/x-rpm
message/news
message/rfc822
text/hnf
text/html
text/html; x-type=mhonarc
text/plain
text/plain; x-type=rfc
text/x-hdml
text/x-roff
========== /usr/local/etc/namazu/mknmzrc ==========
#
# This is a Namazu configuration file for mknmz.
#
package conf; # Don't remove this line!
#===================================================================
#
# Administrator's email address
#
$ADDRESS = 'Daniel.Jaime@xxxxxxxxxxxx';
#===================================================================
#
# Regular Expression Patterns
#
#
# This pattern specifies HTML suffixes.
#
# $HTML_SUFFIX = "html?|[ps]html|html\\.[a-z]{2}";
#
# This pattern specifies file names which will be targeted.
# NOTE: It can be specified by --allow=regex option.
# Do NOT use `$' or `^' anchors.
# Case-insensitive.
#
# $ALLOW_FILE = ".*\\.(?:$HTML_SUFFIX)|.*\\.txt" . # HTML, plain text
# "|.*\\.gz|.*\\.Z|.*\\.bz2" . # Compressed files
# "|.*\\.pdf" . # PDF
# "|.*\\.tex" . # TeX
# "|.*\\.doc|.*\\.xls" . # Word, Excel
# "|.*\\.j[sab]w" . # Ichitaro 4, 5, 6
# "|\\d+|[-\\w]+\\.[1-9n]"; # Mail/News, man
$ALLOW_FILE = ".*\\.(?:$HTML_SUFFIX)|.*\\.txt" . # HTML, plain text
"|.*\\.pdf" . # PDF
"|.*\\.tex" . # TeX
"|.*\\.doc|.*\\.xls" ; # Word, Excel
#
# This pattern specifies file names which will NOT be targeted.
# NOTE: It can be specified by --deny=regex option.
# Do NOT use `$' or `^' anchors.
# Case-insensitive.
#
# $DENY_FILE = ".*\\.(gif|png|jpg|jpeg)|.*\\.tar\\.gz|core|.*\\.bak|.*~|\\..*|\x23.*";
$DENY_FILE = ".*\\.(a|avi|bin|bmp|bz2|cab|cdr|com|drv|exe|dll|gif|gz|jpeg|jpg|lib|mcd|mdl|msi|ocx|pcx|png|so|sys|tar|tif|zip)|.*\\.tar\\.gz|core|.*\\.bak|.*~|\\..*|\x23.*";
#
# This pattern specifies PATHNAMEs which will NOT be targeted.
# NOTE: Usually specified by --exclude=regex option.
#
$EXCLUDE_PATH = /common/Intranet/Downloads ;
#
# This pattern specifies file names which can be omitted
# in URI. e.g., 'index.html|index.htm|Default.html'
#
# NOTE: This is similar to Apache's "DirectoryIndex" directive.
#
# $DIRECTORY_INDEX = "";
#
# This pattern specifies Mail/News's fields in its header which
# should be searchable. NOTE: case-insensitive
#
# $REMAIN_HEADER = "From|Date|Message-ID";
#
# This pattern specifies fields which used for field-specified
# searching. NOTE: case-insensitive
#
# $SEARCH_FIELD = "message-id|subject|from|date|uri|newsgroups|to|summary|size";
#
# This pattern specifies meta tags which used for field-specified
# searching. NOTE: case-insensitive
#
# $META_TAGS = "keywords|description";
#
# This pattern specifies aliases for NMZ.field.* files.
# NOTE: Editing NOT recommended.
#
# %FIELD_ALIASES = ('title' => 'subject', 'author' => 'from');
#
# This pattern specifies HTML elements which should be replaced with
# null string when removing them. Normally, the elements are replaced
# with a single space character.
#
# $NON_SEPARATION_ELEMENTS = 'A|TT|CODE|SAMP|KBD|VAR|B|STRONG|I|EM|CITE|FONT|U|'.
#                            'STRIKE|BIG|SMALL|DFN|ABBR|ACRONYM|Q|SUB|SUP|SPAN|BDO';
#===================================================================
#
# Critical Numbers
#
#
# The max size of files which can be loaded in memory at once.
# If you have much memory, you can increase the value.
# If you have less memory, you can decrease the value.
#
# $ON_MEMORY_MAX = 5000000;
# $ON_MEMORY_MAX = 67108864;
# $ON_MEMORY_MAX = 33554432;
$ON_MEMORY_MAX = 16777216;
#
# The max file size for indexing. Files larger than this
# will be ignored.
# NOTE: This value is usually larger than TEXT_SIZE_MAX because
# binary-formated files such as PDF, Word are larger.
#
# $FILE_SIZE_MAX = 2000000;
# $FILE_SIZE_MAX = 134217728;
$FILE_SIZE_MAX = 33554432;
#
# The max text size for indexing. Files larger than this
# will be ignored.
#
# $TEXT_SIZE_MAX = 600000;
# $TEXT_SIZE_MAX = 67108864;
$TEXT_SIZE_MAX = 16777216;
#
# The max length of a word. the word longer than this will be ignored.
#
# $WORD_LENG_MAX = 128;
# $WORD_LENG_MAX = 1024;
$WORD_LENG_MAX = 512;
#
# Weights for HTML elements which are used for term weightning.
#
# %Weight =
# (
# 'html' => {
# 'title' => 16,
# 'h1' => 8,
# 'h2' => 7,
# 'h3' => 6,
# 'h4' => 5,
# 'h5' => 4,
# 'h6' => 3,
# 'a' => 4,
# 'strong' => 2,
# 'em' => 2,
# 'kbd' => 2,
# 'samp' => 2,
# 'var' => 2,
# 'code' => 2,
# 'cite' => 2,
# 'abbr' => 2,
# 'acronym'=> 2,
# 'dfn' => 2,
# },
# 'metakey' => 32, # for <meta name="keywords" content="foo bar">
# 'headers' => 8, # for Mail/News' headers
# );
#
# The max length of a HTML-tagged string which can be processed for
# term weighting.
# NOTE: There are not a few people has a bad manner using
# <h[1-6]> for changing a font size.
#
# $INVALID_LENG = 128;
#
# The max length of a field.
# This MUST be smaller than libnamazu.h's BUFSIZE (usually 1024).
#
# $MAX_FIELD_LENGTH = 200;
#===================================================================
#
# Softwares for handling a Japanese text
#
#
# Network Kanji Filter nkf v1.62 or later
#
# $NKF = "no";
#
# KAKASI
#
# $KAKASI = "no -ieuc -oeuc -w";
#
# ChaSen 1.51 or later (simple wakatigaki)
#
# $CHASEN = "no -j -F '\%m '";
#
# ChaSen 1.51 or later (with noun words extraction)
#
# $CHASEN_NOUN = "no -j -F '\%m %H\\n'";
#
# Default Japanese processer: KAKASI or ChaSen.
#
# $WAKATI = $none;
#===================================================================
#
# Directories
#
# $LIBDIR = "@PERLLIBDIR@";
# $FILTERDIR = "@FILTERDIR@";
# $TEMPLATEDIR = "@TEMPLATEDIR@";
# 1;
Diehl Munitionssysteme GmbH & Co. KG