Recently I had to set up an intranet search engine to crawl through thousands of PDF files. There are a ton of commercial solutions on the market, ranging from Google Search Appliance to IBM’s OmniFind. There are also a few good Open Source engines, such as Apache’s Lucene. The problem is that these are primarily intended for enterprises with server farms full of data. That’s really not what I was looking for. I was looking for something simple that was easy to set up and maintain. That’s when I came across Xapian. It’s Open Source and lightweight. Combine Xapian with Omega and you get exactly what I was looking for: a lightweight intranet search engine.
This howto will walk you through how to set up Xapian with Omega on FreeBSD. The version I used was FreeBSD 8.1, but I’m sure any recent version of FreeBSD (7.x or later) will do. Please note that I do expect you to know your way around FreeBSD, so I’m not going to spend time on simple tasks like how to edit files. I also assume you already have your system up and running.
I’ve called the path we’re going to index (recursively) ‘/path/to/something’. This can be either a local path or something mounted from a remote server. Also, as you’ll see below, a lot of dependencies are installed. This is to increase the number of file formats Xapian will index. It should be able to index PDF, Word, and RTF files, in addition to plain-text files.
Let’s get started.
Note: If you don’t have the ports-tree installed (/usr/ports), you can download it by simply running:
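The original command did not survive in this copy of the article. On FreeBSD releases of that era the usual tool was portsnap(8), so the following is a minimal sketch under that assumption. The helper name ensure_ports_tree is made up for illustration.

```shell
#!/bin/sh
# Sketch only: assumes portsnap(8) from the FreeBSD base system.
# ensure_ports_tree is a hypothetical helper name, not part of FreeBSD.
ensure_ports_tree() {
    ports_dir="${1:-/usr/ports}"   # location of the ports tree
    if [ -d "$ports_dir" ]; then
        # Nothing to download; the tree is already there.
        echo "ports tree already present in $ports_dir"
    else
        # Download a snapshot and unpack it into /usr/ports.
        portsnap fetch extract
    fi
}

# Usage (as root):
#   ensure_ports_tree
```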
Install Apache
Install Xapian with Xapian-Omega
Install Xpdf
Make sure to uncheck X11 and DRAW
Install Catdoc
Uncheck WORDVIEW
Install Unzip
Install Gzip
Install Antiword
Install Unrtf
Install Catdvi
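The individual install commands are missing from this copy, but every step above follows the same ports pattern: change into the port's directory and run make install clean. The sketch below loops over the ports in the order listed. The port origins (for example www/apache22 or databases/xapian-core) are my assumptions about where these ports lived at the time, and install_ports is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: the port origins below are assumptions.
# Confirm each one on your system with `whereis <name>`.
PORTS="www/apache22 databases/xapian-core www/xapian-omega graphics/xpdf
textproc/catdoc archivers/unzip archivers/gzip textproc/antiword
textproc/unrtf print/catdvi"

install_ports() {
    tree="${1:-/usr/ports}"   # root of the ports tree
    for origin in $PORTS; do
        # Build and install one port, then clean its work directory.
        # Stop at the first port that fails.
        (cd "$tree/$origin" && make install clean) || return 1
    done
}

# Usage (as root):
#   install_ports
```

Ports with options show their configuration dialog on the first build, which is where X11 and DRAW (Xpdf) and WORDVIEW (Catdoc) get unchecked.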
Next we need to edit Apache’s config-file (/usr/local/etc/apache22/httpd.conf)
Change:
Into:
We also need to create a new config-file for Xapian. Create the file /usr/local/etc/apache22/Include/xapian.conf
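The contents of that file are missing from this copy. As a rough sketch of what such a file typically needs, the helper below writes an Apache 2.2 style snippet that exposes the Omega CGI directory. The /omega/ URL prefix, the default CGI directory, and the write_xapian_conf name are all assumptions for illustration, not the original configuration.

```shell
#!/bin/sh
# Sketch only: URL prefix and directory are assumptions.
# write_xapian_conf is a hypothetical helper name.
write_xapian_conf() {
    conf="$1"                                            # file to write
    cgi_dir="${2:-/usr/local/www/xapian-omega/cgi-bin}"  # assumed CGI directory
    cat > "$conf" <<EOF
# Expose the Omega CGI under /omega/ (hypothetical URL prefix).
ScriptAlias /omega/ "$cgi_dir/"
<Directory "$cgi_dir">
    AllowOverride None
    Options None
    Order allow,deny
    Allow from all
</Directory>
EOF
}

# Usage (as root), with the path named in this howto:
#   write_xapian_conf /usr/local/etc/apache22/Include/xapian.conf
```

The Order and Allow directives are Apache 2.2 syntax, matching the apache22 port used here.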
With the Apache configuration done, let’s fire up Apache:
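The exact commands are missing here. On FreeBSD this normally means enabling the service in /etc/rc.conf and then calling the port's rc script. The sketch below assumes the apache22 port's standard apache22_enable variable and rc script location; enable_apache22 is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: assumes the apache22 port's rc.conf variable and rc script.
# enable_apache22 is a hypothetical helper name.
enable_apache22() {
    rc_conf="${1:-/etc/rc.conf}"
    # Append the enable line only if no apache22_enable setting exists yet.
    grep -q '^apache22_enable=' "$rc_conf" 2>/dev/null ||
        echo 'apache22_enable="YES"' >> "$rc_conf"
}

# Usage (as root):
#   enable_apache22 && /usr/local/etc/rc.d/apache22 start
```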
Create the holding directory
Copy over the templates. For some reason FreeBSD doesn’t do this by default.
We also need to tell Xapian-Omega where to look for the files. Create the file /usr/local/www/xapian-omega/cgi-bin/omega.conf
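The commands and file contents for the last three steps are missing from this copy, so here is a sketch of how they might fit together. It assumes the databases live in a data directory and the templates in a templates directory next to cgi-bin, and it only writes the two omega.conf settings needed for that, database_dir and template_dir. The directory layout, the template source location, and the setup_omega name are assumptions.

```shell
#!/bin/sh
# Sketch only: directory layout is an assumption, not the original commands.
# setup_omega is a hypothetical helper name.
setup_omega() {
    root="$1"        # e.g. /usr/local/www/xapian-omega
    templates="$2"   # wherever the port installed Omega's stock templates

    # Holding directory for the index databases, plus the other directories.
    mkdir -p "$root/data" "$root/templates" "$root/cgi-bin" || return 1

    # Copy the stock templates, since the port does not put them in place.
    cp -R "$templates"/. "$root/templates/" || return 1

    # Tell Omega where the databases and templates live.
    cat > "$root/cgi-bin/omega.conf" <<EOF
database_dir $root/data
template_dir $root/templates
EOF
}

# Usage (as root); replace the second argument with the real template path:
#   setup_omega /usr/local/www/xapian-omega /path/to/omega/templates
```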
Create a search page. I’ll just use index.html in Apache’s default DocumentRoot (/usr/local/www/apache22/data/index.html).
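The page itself is missing from this copy. A minimal search form only needs to send Omega's P parameter (the query) and DB parameter (the database name) to the CGI. The sketch below assumes the CGI is reachable at /omega/omega and that the database is called default; change both to match your setup. write_search_page is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: the form action URL and database name are assumptions.
# write_search_page is a hypothetical helper name.
write_search_page() {
    cat > "$1" <<'EOF'
<html>
<head><title>Intranet search</title></head>
<body>
<form method="get" action="/omega/omega">
  <!-- DB selects the database, P carries the query string -->
  <input type="hidden" name="DB" value="default">
  <input type="text" name="P">
  <input type="submit" value="Search">
</form>
</body>
</html>
EOF
}

# Usage (as root):
#   write_search_page /usr/local/www/apache22/data/index.html
```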