Namazu-users-en(old)
Re: PDF and description/keywords
- From: knok@xxxxxxxxxxxxx (NOKUBI Takatsugu)
- Date: Wed, 3 Apr 2002 16:48:02 JST
- X-ml-name: namazu-users-en
- X-mail-count: 00251
In article <20020326162406.4FB6.DARREN@xxxxxxxxxxxxxxx>
darren@xxxxxxxxxxxxxxx writes:
>> Is this possible with PDF files? I had a look at html.pl and pdf.pl in
>> /usr/share/namazu/filter and it looks like I could hack something if the
>> information was in the pdf file and I could get at it. Has anyone tried
>> something like this?
Yes, you can. The key is the weight_element function in
filter/html.pl. The $heading variable is used to build the summary
information. It is processed in the make_summary function of the
mknmz command.
You can modify filter/pdf.pl in the same way. However, it is a little
hard to determine which sentence is appropriate as a summary, because
the output of the pdftotext command is plain text (HTML is a
structured format, so it is easier).
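
To illustrate the difficulty, here is a rough sketch of one possible
heuristic for picking a summary out of plain pdftotext output. It is
written in Python only for illustration; it is not Namazu code and not
what filter/pdf.pl does. The function name pick_summary and the
min_words and max_chars thresholds are hypothetical, and the rule
"first paragraph with enough words" is an assumption that will fail on
many documents.

```python
import re


def pick_summary(text, min_words=8, max_chars=200):
    """Return the first paragraph that looks like prose, or "" if none does.

    Hypothetical helper: the thresholds are guesses, not Namazu defaults.
    """
    # pdftotext separates pages with form feeds; treat them as paragraph breaks.
    text = text.replace("\f", "\n\n")
    # Paragraphs are runs of lines separated by one or more blank lines.
    for para in re.split(r"\n\s*\n", text):
        # Collapse the line breaks and repeated spaces left over from the layout.
        flat = " ".join(para.split())
        # Skip short fragments such as titles, page numbers, and running headers.
        if len(flat.split()) < min_words:
            continue
        if len(flat) <= max_chars:
            return flat
        # Cut at a word boundary so the result fits within max_chars.
        return flat[:max_chars].rsplit(" ", 1)[0]
    return ""
```

With HTML the filter can rely on elements such as headings or meta
tags; with plain text a guess like the one above is about all that is
available, which is why the result is less reliable.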
--
NOKUBI Takatsugu
E-mail: knok@xxxxxxxxxxxxx
knok@xxxxxxxxxx / knok@xxxxxxxxxx