
xpdf integration

(9 posts)
  • Started 3 years ago by cnielsen
  • Latest reply from rocket2009



  1. cnielsen
    Member

    I've integrated xpdf so that PDF files can be searched. The command "pdftotext -raw example.pdf example.txt" works fine. Here's my config.inc.php:

    <?php
    $win = (substr(PHP_OS, 0, 3) == "WIN");

    // change this if you install xpdf to other directory
    $file_conf['extract_tool']['pdf'] = ($win) ? APP_EXTRA_MODULE_DIR . 'file_extractors/xpdf/win/'
    : '/usr/local/groundwork/apache2/htdocs/kb/admin/extra/file_extractors/xpdf/win/';
    ?>

    Has anyone got this running in their environment?
    Thanks!

    Posted 3 years ago
  2. onesign
    Key Master

    Sorry, what is the question? Does it work for you?

    Posted 3 years ago
  3. cnielsen
    Member

    Sorry for the misunderstanding: no, it doesn't work in my KBPublisher installation. I can convert PDFs to text files at the command prompt, but I cannot search in PDF documents attached to KB articles.

    Posted 3 years ago
  4. onesign
    Key Master

    This tool does not search in files; it extracts raw text from PDF files, and KBP indexes that text.
    So search will work only for files uploaded after xpdf is installed.
    In the next KBPublisher release it will be possible to reindex existing files.

    Test it from PHP using the real path to pdftotext ($return receives the command's exit code; 0 means success):

    system('/usr/path_to_xpdf/pdftotext -raw file_read.pdf file_write.txt', $return);

    Also try setting $file_conf['extract_tool']['pdf'] = '';
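
    A slightly fuller version of that test, as a minimal sketch: it assumes PHP 7.1 or later with exec() enabled, and the helper name extract_pdf_text() and the paths are hypothetical placeholders rather than KBPublisher code. It quotes each argument, captures pdftotext's messages and exit code, and returns the extracted text or null.

```php
<?php
// Minimal sketch: run xpdf's pdftotext from PHP and report why it failed.
// Assumes PHP 7.1+ and that exec() is not listed in disable_functions.
// extract_pdf_text() is a hypothetical helper, not a KBPublisher function.

/**
 * Extract raw text from $pdf using the pdftotext binary at $tool.
 * Returns the text, or null if the file is missing or pdftotext fails.
 */
function extract_pdf_text(string $tool, string $pdf): ?string
{
    if (!is_file($pdf)) {
        return null; // nothing to extract
    }

    // pdftotext writes its output to a file, so give it a temporary one.
    $txt = tempnam(sys_get_temp_dir(), 'pdf');
    if ($txt === false) {
        return null;
    }

    // Quote every argument so paths containing spaces reach pdftotext intact;
    // 2>&1 folds error messages into $output.
    $cmd = escapeshellarg($tool) . ' -raw ' . escapeshellarg($pdf) . ' '
         . escapeshellarg($txt) . ' 2>&1';
    exec($cmd, $output, $status); // $status receives the exit code

    $text = ($status === 0) ? file_get_contents($txt) : false;
    unlink($txt);

    if ($text === false) {
        // error_log() goes to stderr from the CLI by default,
        // or to the web server's error log under Apache.
        error_log("pdftotext failed (exit $status): " . implode(' ', $output));
        return null;
    }
    return $text;
}

// Placeholder paths: substitute the real pdftotext location and a real PDF.
$text = extract_pdf_text('/usr/path_to_xpdf/pdftotext', 'file_read.pdf');
echo ($text === null) ? "extraction failed\n" : strlen($text) . " characters extracted\n";
```

    If the same command works at your shell prompt but returns null here, compare the two environments: PHP under a web server runs as the server's user, with its own PATH and file permissions. On Linux, an exit code of 127 from the shell usually means the binary was not found at the given path.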

    Posted 3 years ago
  5. cnielsen
    Member

    So when I upload a new PDF file, KBP will extract it to text, and then I'll have two files in my kb_file folder, test.pdf and test.txt. Is that correct? And how long do I have to wait for the new files to be indexed?

    Posted 3 years ago
  6. onesign
    Key Master

    Text from the PDF file is extracted into the database, where it is indexed by a MySQL fulltext index.
    The text is extracted when you add the file.
    The files listing has a "Text" field; if extraction succeeded, you can view the extracted text there.

    Posted 3 years ago
  7. cnielsen
    Member

    Ahh, now I see how it works :-) Sorry, this isn't part of my daily business...
    And how long do I have to wait until the MySQL fulltext index runs? Every hour, or when?

    Posted 3 years ago
  8. onesign
    Key Master

    It runs when you add or update a file, so there is no need to wait. Files added before xpdf was enabled have never been indexed; you have to update them.

    Posted 3 years ago
  9. rocket2009
    Member

    I had your team install my system, and I was the test case for xpdf. I have uploaded several PDF files, and they were converted to text by xpdf: I can see the "yes" and click on it.

    My problem is that I can't seem to create a search that finds any of the articles. I've tried short words, long words, and multiple words; I've gone into Advanced Search and selected attachments and inline files; and I've selected all categories, both with the All button and by highlighting them all.

    Search works on articles, but I can't seem to get it to work reliably on attachments. Any hints?

    Posted 2 years ago


© 2008 Double Jade LLC | customer.service@kbpublisher.com