Namazu-devel-ja (old)
Re: filter/postscript.pl
From: baba@xxxxxxxxxxxxxxxxxxxxxx
Subject: [namazu-devel-ja] filter/postscript.pl
Date: Wed, 27 Dec 2000 15:46:05 +0900
> There is a problem with file detection
This turned out to need nothing more than making $ALLOW_FILE in mknmzrc
match *.ps. It is still displayed as text/plain, though. Is that OK?
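The $ALLOW_FILE change in the patch that follows can be sanity-checked outside mknmz. This Perl sketch copies the patched pattern and tests a few file names. It assumes the pattern is matched against the whole file name (anchored at both ends), and $HTML_SUFFIX here is a hypothetical stand-in, not the real value from conf.pl.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Namazu's real $HTML_SUFFIX (assumption).
my $HTML_SUFFIX = "html?";

# Patched $ALLOW_FILE, copied from the conf.pl.in diff.
my $ALLOW_FILE = ".*\\.(?:$HTML_SUFFIX)|.*\\.txt" .  # HTML, plain text
    "|.*\\.gz|.*\\.Z|.*\\.bz2" .                     # Compressed files
    "|.*\\.tex" .                                    # TeX
    "|.*\\.dvi" .                                    # DVI
    "|.*\\.ps" .                                     # PostScript
    "|.*\\.pdf" .                                    # PDF
    "|.*\\.doc|.*\\.xls" .                           # Word, Excel
    "|.*\\.j[sab]w" .                                # Ichitaro 4, 5, 6
    "|\\d+|[-\\w]+\\.[1-9n]";                        # Mail/News, man

# Assumption: the pattern must match the whole file name, so the
# alternation is grouped and anchored at both ends.
sub allowed {
    my ($name) = @_;
    return $name =~ /^(?:$ALLOW_FILE)$/ ? 1 : 0;
}

for my $f (qw(paper.ps paper.dvi paper.eps 12345 ls.1)) {
    printf "%-10s %s\n", $f, allowed($f) ? "indexed" : "skipped";
}
```

Note that paper.eps is not picked up: ".*\\.ps" needs a literal dot immediately before "ps".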
===================================================================
RCS file: /storage/cvsroot/namazu/pl/conf.pl.in,v
retrieving revision 1.27
diff -u -u -r1.27 conf.pl.in
--- conf.pl.in 2000/03/16 13:00:14 1.27
+++ conf.pl.in 2000/12/27 09:54:41
@@ -28,8 +28,10 @@
#
$ALLOW_FILE = ".*\\.(?:$HTML_SUFFIX)|.*\\.txt" . # HTML, plain text
"|.*\\.gz|.*\\.Z|.*\\.bz2" . # Compressed files
- "|.*\\.pdf" . # PDF
"|.*\\.tex" . # TeX
+ "|.*\\.dvi" . # DVI
+ "|.*\\.ps" . # PostScript
+ "|.*\\.pdf" . # PDF
"|.*\\.doc|.*\\.xls" . # Word, Excel
"|.*\\.j[sab]w" . # Ichitaro 4, 5, 6
"|\\d+|[-\\w]+\\.[1-9n]"; # Mail/News, man
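While I'm at it: I think it should also be possible to extract text from
DVI files, but I have no idea which dviware is best. On top of that,
there are lots of programs with the same name that each differ slightly.
There are probably many more out there, but I can't make sense of them.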
dvi2tty
ftp://ftp.web.ad.jp/pub/TeX/akiu/dviwares/dvi2tty/
ftp://ftp.iis.u-tokyo.ac.jp/pub/TeX/CTAN/dviware/dvi2tty/
ftp://contrib.redhat.com/pub/contrib/libc5/SRPMS/dvi2tty-5.1-1.src.rpm
jdvi2tty
http://www.geocities.co.jp/SiliconValley/7231/jdvi2tty.htm
Possibly an unauthorized repost of the version from Nifty?
dvi2text
http://www.toc.lcs.mit.edu/~dmjones/dvi2text/
A Perl script that apparently uses modules such as TeX::DVI::TXT and
TeX::DVI::BYTE. I haven't tried it.
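For now, if you take akiu's dvi2tty (the first entry above) and build it
with the bundled Japanese patch applied, you get jdvi2tty. With that and
the filter/dvi.pl below, things look like they more or less work.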
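status() in filter/dvi.pl picks jdvi2tty for Japanese and dvi2tty otherwise, via util::checkcmd. To see by hand which converter your PATH actually provides, here is a minimal sketch; find_cmd and pick_dvi_converter are hypothetical helpers written for illustration, not Namazu's util::checkcmd.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper (not Namazu's util::checkcmd): return the full path
# of the first executable file named $cmd on $ENV{PATH}, or undef.
sub find_cmd {
    my ($cmd) = @_;
    for my $dir (File::Spec->path()) {
        my $path = File::Spec->catfile($dir, $cmd);
        return $path if -f $path && -x _;   # _ reuses the -f stat result
    }
    return undef;
}

# Mirrors the choice made by status() in filter/dvi.pl:
# jdvi2tty for Japanese, plain dvi2tty otherwise.
sub pick_dvi_converter {
    my ($lang) = @_;
    return find_cmd($lang eq 'ja' ? 'jdvi2tty' : 'dvi2tty');
}

my $conv = pick_dvi_converter('ja');
print defined $conv ? "using $conv\n" : "jdvi2tty not found on PATH\n";
```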
--
馬場 肇 ( Hajime BABA ) E-mail: baba@xxxxxxxxxxxxxxxxxxxxxx
Department of Astronomy, Faculty of Science, Kyoto University; Doctoral Program
--
#
# -*- Perl -*-
# $Id: dvi.pl,v 1.16 2000/03/23 10:41:04 knok Exp $
# Copyright (C) 2000 Namazu Project All rights reserved.
# This is free software with ABSOLUTELY NO WARRANTY.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2, or (at your option)
# any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA
# 02111-1307, USA
#
# This file must be encoded in EUC-JP encoding
#
package dvi;
use strict;
require 'util.pl';

my $dvipath = undef;

sub mediatype() {
    return ('application/x-dvi');
}

sub status() {
    if (util::islang("ja")) {
        $dvipath = util::checkcmd('jdvi2tty');
    } else {
        $dvipath = util::checkcmd('dvi2tty');
    }
    return 'no' unless (defined $dvipath);
    return 'yes';
}

sub recursive() {
    return 1;
}

sub pre_codeconv() {
    return 0;
}

sub post_codeconv () {
    return 0;
}

sub add_magic ($) {
    return;
}
sub filter ($$$$$) {
    my ($orig_cfile, $cont, $weighted_str, $headings, $fields)
      = @_;
    my $cfile = defined $orig_cfile ? $$orig_cfile : '';

    my $tmpfile = util::tmpnam('NMZ.dvi');
    my $tmpfile2 = util::tmpnam('NMZ.dvi2');

    # note that dvi2tty needs the .dvi suffix on its input file
    my $fh = util::efopen("> $tmpfile.dvi");
    binmode $fh;        # DVI is binary data
    print $fh $$cont;
    undef $fh;          # closes the file

    util::vprint("Processing dvi file ... (using '$dvipath')\n");
    system("$dvipath -q $tmpfile -o $tmpfile2");
    unlink("$tmpfile.dvi");   # the input file written above, not $tmpfile
    return 'Unable to convert dvi file' unless (-e $tmpfile2);

    $fh = util::efopen("$tmpfile2");
    my $size = util::filesize($fh);
    if ($size > $conf::FILE_SIZE_MAX) {
        undef $fh;
        unlink($tmpfile2);    # clean up before bailing out
        return 'too_large_dvi_file';
    }
    $$cont = util::readfile($fh);
    undef $fh;
    unlink($tmpfile2);
    return undef;
}
1;
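For trying the conversion step on its own, here is a standalone sketch of the same temp-file round trip, not the filter Namazu ships. It swaps util::tmpnam and the shell-string system() for File::Temp->newdir and list-form system(), so both temp files are removed on every return path. The "-q IN -o OUT" arguments are the ones filter/dvi.pl passes; passing the full in.dvi name assumes dvi2tty accepts a name that already ends in .dvi. dvi_to_text is a hypothetical name.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp;

# Hypothetical helper: convert DVI bytes to text with a dvi2tty-style
# command. Returns (text, undef) on success or (undef, error) on failure.
sub dvi_to_text {
    my ($cmd, $dvi_bytes, $max_size) = @_;

    # The directory and its contents are removed when $dir goes out of
    # scope, so every return below cleans up both temp files.
    my $dir = File::Temp->newdir();
    my $in  = File::Spec->catfile($dir->dirname, 'in.dvi');   # keep .dvi
    my $out = File::Spec->catfile($dir->dirname, 'out.txt');

    open(my $fh, '>', $in) or return (undef, "cannot write $in: $!");
    binmode $fh;                          # DVI is binary
    print $fh $dvi_bytes;
    close($fh) or return (undef, "cannot close $in: $!");

    # List-form system() runs the command without a shell.
    my $rc = system($cmd, '-q', $in, '-o', $out);
    return (undef, 'Unable to convert dvi file') if $rc != 0 || !-e $out;
    return (undef, 'too_large_dvi_file') if (-s $out || 0) > $max_size;

    open($fh, '<', $out) or return (undef, "cannot read $out: $!");
    my $text = do { local $/; <$fh> };    # slurp the whole file
    close($fh);
    return ($text, undef);
}
```

Taking the converter path as an argument makes the round trip easy to exercise with a stand-in script; inside the filter it would be $dvipath as set by status().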