Namazu-users-en(old)



Reply: Re: Out of memory!



Hi Leon,

here are my settings after entering "ulimit -d 800000" and changing the shell from "ksh" to "bash" (on SuSE Linux 8.0):

      ulimit -SHacdflmnpstuv
      core file size (blocks)     0
      data seg size (kbytes)      800000
      file size (blocks)          unlimited
      max locked memory (kbytes)  unlimited
      max memory size (kbytes)    unlimited
      open files                  1024
      pipe size (512 bytes)       8
      stack size (kbytes)         unlimited
      cpu time (seconds)          unlimited
      max user processes          1023
      virtual memory (kbytes)     unlimited

I'm still testing, so I can't say yet whether it really works. At the moment it seems that the problem is not any single file being too large, but the
sum of the files indexed. As a workaround I split the indexing process (index one directory and its subdirectories, then index a second directory
and reindex the first one, and so on), which at the moment seems to work.
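
The workaround described above could be scripted roughly as follows. This is only a sketch under assumptions: the function name index_cumulatively and the INDEXER variable are made up for illustration, and it assumes mknmz is called with -O (output directory) followed by the target directories. Each run passes one more directory than the previous run, together with all earlier ones.

```shell
#!/bin/sh
# Sketch only: grow the index one directory per run, re-passing earlier ones.
# INDEXER is the command to run; it defaults to mknmz (assumed to be in PATH).
INDEXER="${INDEXER:-mknmz}"

# index_cumulatively INDEX_DIR DIR1 [DIR2 ...]   (hypothetical helper)
index_cumulatively() {
    index_dir=$1
    shift
    total=$#
    step=1
    while [ "$step" -le "$total" ]; do
        # Subshell: trimming the argument list here must not affect the loop.
        (
            pos=0
            for dir in "$@"; do
                shift
                pos=$((pos + 1))
                # Keep only the first $step directories for this run.
                if [ "$pos" -le "$step" ]; then
                    set -- "$@" "$dir"
                fi
            done
            "$INDEXER" -O "$index_dir" "$@"
        ) || return 1
        step=$((step + 1))
    done
}
```

A call such as "index_cumulatively /common/Software.Index dirA dirB" would run the indexer first on dirA alone and then on dirA and dirB; the directory names here are placeholders.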

Daniel

By the way: the warnings printed by mknmz suggest that the modified value for TEXT_SIZE_MAX in mknmzrc is not used:
      subdirectory>/<Replaced>/<Powerpoint file>008.ppt is larger than your
setup after filtered, skipped: conf::TEXT_SIZE_MAX (600000) < 3140608
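
One possible explanation, offered as an assumption rather than a confirmed diagnosis: the "Bareword found where operator expected" message in the log points at the unquoted $EXCLUDE_PATH value in mknmzrc, and if Perl rejects the file as a whole, the built-in defaults (such as 600000) would stay in effect. The sketch below is a rough heuristic that lists active assignments whose value starts with an unquoted "/"; the helper name find_unquoted_paths is hypothetical. Running "perl -c" on the rc file would be the more thorough syntax check.

```shell
#!/bin/sh
# Heuristic sketch: report active (non-comment) assignments in an rc file
# whose value begins with a bare "/" instead of a quoted string.
# find_unquoted_paths is a hypothetical helper, not part of Namazu.
find_unquoted_paths() {
    rc_file=$1
    grep -nE '^[[:space:]]*\$[A-Za-z_]+[[:space:]]*=[[:space:]]*/' "$rc_file"
}

# Example: find_unquoted_paths /usr/local/etc/namazu/mknmzrc
# A flagged line such as
#   $EXCLUDE_PATH = /common/Intranet/Downloads ;
# would presumably need quoting like the other patterns in the file:
#   $EXCLUDE_PATH = "/common/Intranet/Downloads";
```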





"Collins, Leon" <Leon.Collins@xxxxxxxxxxxxxxxxxx> on 24.10.2002 16:33:35

Please reply to namazu-users-en@xxxxxxxxxx

To:    "'namazu-users-en@xxxxxxxxxx'" <namazu-users-en@xxxxxxxxxx>
Cc:

Subject: Re: Out of memory!

Dan:
   I was getting that same "Out of memory!" problem for the longest time on my
Unix system, even though the machine had tons of memory. What fixed the
problem for me was running this command: "ulimit -d 800000".
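
A minimal sketch of how such a limit could be applied to a single indexing run instead of the whole login shell. The helper name run_with_data_limit is hypothetical; the value is taken to be in kbytes, as in the "data seg size (kbytes)" row that ulimit prints. Because the limit is set inside a subshell, the calling shell keeps its own limits.

```shell
#!/bin/sh
# Sketch: run one command with a given data-segment limit (in kbytes).
# run_with_data_limit is a hypothetical helper name.
run_with_data_limit() {
    limit_kb=$1
    shift
    (
        # The new limit applies only to this subshell and its children.
        ulimit -d "$limit_kb" || exit 1
        "$@"
    )
}

# Example (arguments to mknmz omitted here):
#   run_with_data_limit 800000 mknmz ...
```

If the hard limit is already lower than the requested value, the ulimit call fails and the command is not run.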

Leon

-----Original Message-----
From: Daniel.Jaime@xxxxxxxxxxxx [mailto:Daniel.Jaime@xxxxxxxxxxxx]
Sent: Thursday, October 24, 2002 11:28 AM
To: namazu-users-en@xxxxxxxxxx
Cc: Daniel.Jaime@xxxxxxxxxxxx
Subject: Out of memory!

Hello,

in the attachment you can find the log file of an indexing session, which
was interrupted by an "Out of memory!" error. I'm not able to find out what
is wrong in my configuration. I made a lot of attempts to index some big
files, but bigger numbers worked even worse. How can I figure out the
maximum values that I can define without paralyzing my system?

Another question: what is the right syntax for the $EXCLUDE_PATH
statement?

Thank you very much

Kind regards

Daniel Jaime

PS: The error message reported by wvware is an independent one.

========== Log file of indexing session ==========

Indexing 2>&1
/<Directories to be indexed>.99
/<Directories to be indexed>.00
/<Directories to be indexed>.01
/<Directories to be indexed>.02

cd /common/Software.Index 2>&1

mknmz --target-list=/common/Software.Index/Indexer.Liste 2>&1
Bareword found where operator expected at /usr/local/etc/namazu/mknmzrc
line
55, near "/common/Intranet"
      (Missing operator before Intranet?)
Looking for indexing files...
867 files are found to be indexed.
/usr/local/bin/wvWare: error while loading shared libraries: libpng.so.3:
cannot open shared object file: No such file or directory
1/867 - /<Directories to be indexed>.00/<a subdirectory>/<Dir A>/<File
A>%20<a subdirectory>.doc [application/msword]
/usr/local/bin/wvWare: error while loading shared libraries: libpng.so.3:
cannot open shared object file: No such file or directory
2/867 - /<Directories to be indexed>.00/<a subdirectory>/<Dir B>/<Dir C>/<a
subdirectory>/<Another File>.doc [application/msword]
/usr/local/bin/wvWare: error while loading shared libraries: libpng.so.3:
cannot open shared object file: No such file or directory
3/867 - /<Directories to be indexed>.00/<a subdirectory>/<Dir B>/<Dir C>/<a
subdirectory>/<File D>.doc [application/msword]
4/867 - /<Directories to be indexed>.00/<a subdirectory>/<Replaced>/<File
E>.ppt [application/powerpoint]
5/867 - /<Directories to be indexed>.00/<a
subdirectory>/<Replaced>/<Powerpoint file>001.ppt [application/powerpoint]
.
.
.
9/867 - /<Directories to be indexed>.00/<a
subdirectory>/<Replaced>/<Powerpoint file>005.ppt [application/powerpoint]
Writing index files...
10/867 - /<Directories to be indexed>.00/<a
subdirectory>/<Replaced>/<Powerpoint file>006.ppt [application/powerpoint]
11/867 - /<Directories to be indexed>.00/<a
subdirectory>/<Replaced>/<Powerpoint file>007.ppt [application/powerpoint]
12/867 - /<Directories to be indexed>.00/<a
subdirectory>/<Replaced>/<Powerpoint file>008.ppt is larger than your setup
after filtered, skipped:
conf::TEXT_SIZE_MAX (600000) < 3140608
12/866 - /<Directories to be indexed>.00/<a
subdirectory>/<Replaced>/<Powerpoint file>009.ppt [application/powerpoint]
13/866 - /<Directories to be indexed>.00/<a
subdirectory>/<Replaced>/<Powerpoint file>010.ppt is larger than your setup
after filtered, skipped:
conf::TEXT_SIZE_MAX (600000) < 3120640
13/865 - /<Directories to be indexed>.00/<a
subdirectory>/<Replaced>/<Powerpoint file>011.ppt [application/powerpoint]
14/865 - /<Directories to be indexed>.00/<a
subdirectory>/<Replaced>/<Powerpoint file>012.ppt [application/powerpoint]
15/865 - /<Directories to be indexed>.00/<a
subdirectory>/<Replaced>/<Powerpoint file>014.ppt [application/powerpoint]
16/865 - /<Directories to be indexed>.00/<a
subdirectory>/<Replaced>/<Powerpoint file>015.ppt [application/powerpoint]
17/865 - /<Directories to be indexed>.00/<a subdirectory>/<Replaced>/<File
w>.ppt [application/powerpoint]
.
.
.
44/865 - /<Directories to be indexed>.00/<Dir E>/<File F>.xls
[application/excel]
45/865 - /<Directories to be indexed>.00/<Dir E>/<File G>.xls
[application/excel]
Out of memory!

chown -R root:wwwadmin /common/Software.Index 2>&1

chmod -R ug+rw /common/Software.Index 2>&1

chmod -R o+r /common/Software.Index 2>&1
rm -f /common/Software.Index/NMZ.lock2 2>&1

========== Setting ==========

mknmz -C
Bareword found where operator expected at /usr/local/etc/namazu/mknmzrc
line
55, near "/common/Intranet"
        (Missing operator before Intranet?)
Loaded rcfile: /usr/local/etc/namazu/mknmzrc
System: linux
Namazu: 2.0.12
Perl: 5.006001
NKF: no
KAKASI: no
ChaSen: no
Lang_Msg: C
Lang: C
Coding System: euc
CONFDIR: /usr/local/etc/namazu
LIBDIR: /usr/local/share/namazu/pl
FILTERDIR: /usr/local/share/namazu/filter
TEMPLATEDIR: /usr/local/share/namazu/template
Supported media types:
  application/excel
  application/msword
  application/pdf
  application/postscript
  application/powerpoint
  application/x-bzip2
  application/x-compress
  application/x-gzip
  application/x-rpm
  message/news
  message/rfc822
  text/hnf
  text/html
  text/html; x-type=mhonarc
  text/plain
  text/plain; x-type=rfc
  text/x-hdml
  text/x-roff

========== /usr/local/etc/namazu/mknmzrc ==========

#
# This is a Namazu configuration file for mknmz.
#
package conf;  # Don't remove this line!

#===================================================================
#
# Administrator's email address
#
  $ADDRESS = 'Daniel.Jaime@xxxxxxxxxxxx';


#===================================================================
#
# Regular Expression Patterns
#

#
# This pattern specifies HTML suffixes.
#
# $HTML_SUFFIX = "html?|[ps]html|html\\.[a-z]{2}";

#
# This pattern specifies file names which will be targeted.
# NOTE: It can be specified by --allow=regex option.
#       Do NOT use `$' or `^' anchors.
#       Case-insensitive.
#
# $ALLOW_FILE =   ".*\\.(?:$HTML_SUFFIX)|.*\\.txt" . # HTML, plain text
#           "|.*\\.gz|.*\\.Z|.*\\.bz2" .       # Compressed files
#           "|.*\\.pdf" .                    # PDF
#           "|.*\\.tex" .              # TeX
#           "|.*\\.doc|.*\\.xls" .           # Word, Excel
#           "|.*\\.j[sab]w" .                  # Ichitaro 4, 5, 6
#           "|\\d+|[-\\w]+\\.[1-9n]";          # Mail/News, man

  $ALLOW_FILE =   ".*\\.(?:$HTML_SUFFIX)|.*\\.txt" . # HTML, plain text
            "|.*\\.pdf" .                    # PDF
            "|.*\\.tex" .              # TeX
            "|.*\\.doc|.*\\.xls" ;           # Word, Excel

#
# This pattern specifies file names which will NOT be targeted.
# NOTE: It can be specified by --deny=regex option.
#       Do NOT use `$' or `^' anchors.
#       Case-insensitive.
#
# $DENY_FILE = ".*\\.(gif|png|jpg|jpeg)|.*\\.tar\\.gz|core|.*\\.bak|.*~|\\..*|\x23.*";
  $DENY_FILE = ".*\\.(a|avi|bin|bmp|bz2|cab|cdr|com|drv|exe|dll|gif|gz|jpeg|jpg|lib|mcd|mdl|msi|ocx|pcx|png|so|sys|tar|tif|zip)|.*\\.tar\\.gz|core|.*\\.bak|.*~|\\..*|\x23.*";

#
# This pattern specifies PATHNAMEs which will NOT be targeted.
# NOTE: Usually specified by --exclude=regex option.
#
  $EXCLUDE_PATH = /common/Intranet/Downloads ;

#
# This pattern specifies file names which can be omitted
# in URI.  e.g., 'index.html|index.htm|Default.html'
#
# NOTE: This is similar to Apache's "DirectoryIndex" directive.
#
# $DIRECTORY_INDEX = "";

#
# This pattern specifies Mail/News's fields in its header which
# should be searchable.  NOTE: case-insensitive
#
# $REMAIN_HEADER = "From|Date|Message-ID";

#
# This pattern specifies fields which used for field-specified
# searching.  NOTE: case-insensitive
#
# $SEARCH_FIELD = "message-id|subject|from|date|uri|newsgroups|to|summary|size";

#
# This pattern specifies meta tags which used for field-specified
# searching.  NOTE: case-insensitive
#
# $META_TAGS = "keywords|description";

#
# This pattern specifies aliases for NMZ.field.* files.
# NOTE: Editing NOT recommended.
#
# %FIELD_ALIASES = ('title' => 'subject', 'author' => 'from');

#
# This pattern specifies HTML elements which should be replaced with
# null string when removing them. Normally, the elements are replaced
# with a single space character.
#
# $NON_SEPARATION_ELEMENTS = 'A|TT|CODE|SAMP|KBD|VAR|B|STRONG|I|EM|CITE|FONT|U|'.
#     'STRIKE|BIG|SMALL|DFN|ABBR|ACRONYM|Q|SUB|SUP|SPAN|BDO';

#===================================================================
#
# Critical Numbers
#

#
# The max size of files which can be loaded in memory at once.
# If you have much memory, you can increase the value.
# If you have less memory, you can decrease the value.
#
# $ON_MEMORY_MAX   = 5000000;
# $ON_MEMORY_MAX   = 67108864;
# $ON_MEMORY_MAX   = 33554432;
  $ON_MEMORY_MAX   = 16777216;

#
# The max file size for indexing. Files larger than this
# will be ignored.
# NOTE: This value is usually larger than TEXT_SIZE_MAX because
#       binary-formated files such as PDF, Word are larger.
#
# $FILE_SIZE_MAX   = 2000000;
# $FILE_SIZE_MAX   = 134217728;
  $FILE_SIZE_MAX   = 33554432;

#
# The max text size for indexing. Files larger than this
# will be ignored.
#
# $TEXT_SIZE_MAX   =  600000;
# $TEXT_SIZE_MAX   =  67108864;
  $TEXT_SIZE_MAX   =  16777216;
#
# The max length of a word. the word longer than this will be ignored.
#
# $WORD_LENG_MAX   = 128;
# $WORD_LENG_MAX   = 1024;
  $WORD_LENG_MAX   = 512;


#
# Weights for HTML elements which are used for term weightning.
#
# %Weight =
#     (
#      'html' => {
#          'title'  => 16,
#          'h1'     => 8,
#          'h2'     => 7,
#          'h3'     => 6,
#          'h4'     => 5,
#          'h5'     => 4,
#          'h6'     => 3,
#          'a'      => 4,
#          'strong' => 2,
#          'em'     => 2,
#          'kbd'    => 2,
#          'samp'   => 2,
#          'var'    => 2,
#          'code'   => 2,
#          'cite'   => 2,
#          'abbr'   => 2,
#          'acronym'=> 2,
#          'dfn'    => 2,
#      },
#      'metakey' => 32, # for <meta name="keywords" content="foo bar">
#      'headers' => 8,  # for Mail/News' headers
# );

#
# The max length of a HTML-tagged string which can be processed for
# term weighting.
# NOTE: There are not a few people has a bad manner using
#       <h[1-6]> for changing a font size.
#
# $INVALID_LENG = 128;

#
# The max length of a field.
# This MUST be smaller than libnamazu.h's BUFSIZE (usually 1024).
#
# $MAX_FIELD_LENGTH = 200;


#===================================================================
#
# Softwares for handling a Japanese text
#

#
# Network Kanji Filter nkf v1.62 or later
#
# $NKF = "no";

#
# KAKASI
#
# $KAKASI = "no -ieuc -oeuc -w";

#
# ChaSen 1.51 or later (simple wakatigaki)
#
# $CHASEN = "no -j -F '\%m '";

#
# ChaSen 1.51 or later (with noun words extraction)
#
# $CHASEN_NOUN = "no -j -F '\%m %H\\n'";

#
# Default Japanese processer: KAKASI or ChaSen.
#
# $WAKATI  = $none;


#===================================================================
#
# Directories
#
# $LIBDIR = "@PERLLIBDIR@";
# $FILTERDIR = "@FILTERDIR@";
# $TEMPLATEDIR = "@TEMPLATEDIR@";

# 1;
Diehl Munitionssysteme GmbH & Co. KG

English - autocreated E-Mail Appendix:
The content of this E-Mail is not legally binding upon Diehl, even though
the
certified electronic signature technique may point to the writer of the
E-Mail.
If this E-Mail was transmitted to you by error, then please inform us
accordingly
(+49-911-957-2634). In such case you are requested to erase the message.
 Any unauthorized reproduction, disclosure, modification, distribution
and/or
publication of such E-Mail message is strictly prohibited.

Deutsch - automatisch erzeugter E-Mail Anhang:
In Verbindung mit Kostenuebernahmen, Lieferungen, Angeboten und Vertraegen
ist der Inhalt dieses E-Mails fuer DIEHL rechtlich nicht verbindlich, auch
wenn die
Anwendung des elektronischen, zertifizierten Signaturverfahrens den
Ersteller des
E-Mails nachweist.
Informieren Sie uns bitte, wenn Sie diese E-Mail faelschlicherweise
erhalten
haben
(+49 -911-957-2723). Bitte loeschen Sie in diesem Fall die Nachricht.
Jede unerlaubte Form der Reproduktion, Bekanntgabe, Aenderung, Verteilung
und/oder Publikation dieser E-Mail ist strengstens verboten.









