Words, Words, Words: Introducing OpenSearchServer

How to create your own search engine combined with a crawler that will index all sorts of documents.

In William Shakespeare's Hamlet, one of my favorite plays, Prince Hamlet is approached by Polonius, chief counselor to Claudius, King of Denmark, who happens to be Hamlet's stepfather, and uncle, and the new husband of his mother, Queen Gertrude, whose recently deceased last husband was the previous King of Denmark. That would be Hamlet's biological father, for those who might be having trouble following along. He was King Hamlet. Polonius, I probably should mention, is also the father of Hamlet's sweetheart, Ophelia. Despite this hilarious-sounding setup, Hamlet is most definitely not a comedy. (Note: if you need a refresher, the full text of Hamlet is freely available online.)

For reasons I won't go into here, Hamlet is doing a great job of trying to convince people that he's completely lost it, and he is pretending to read a book when Polonius approaches and asks, "What do you read, my lord?"

Hamlet replies by saying, "Words, words, words." In other words, ahem, nothing of any importance, you annoying little man.

Shakespeare wrote a lot of words. In fact, writers, businesses and organizations of any size tend to amass a lot of words in the form of countless documents, many of which seem to contain a great deal of importance at the time they are written and subsequently stored on some lonely corporate server. There, locked in their digital prisons, these many texts await the day when somebody will seek out their wisdom. Trouble is, there are so many of them, in many different formats, often with titles that tell you nothing about the content inside. What you need is a search engine.

Google is a pretty awesome search engine, but it's not for everybody, especially if the documents in question aren't meant for consumption by the public at large. For those times, you need your own search engine, combined with a crawler that will index all sorts of documents, from OpenDocument format, to old Microsoft Word documents, to PDFs and even plain text. That's where OpenSearchServer comes into play. OpenSearchServer is, as the name implies, an open-source project designed to crawl through and index large collections of documents, such as you would find on a website.
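If "indexing" sounds a bit abstract, here is a toy sketch of the general idea in Python. To be clear, this is not how OpenSearchServer works under the hood, and the file names, sample texts and function names are all made up for illustration. It simply builds a small inverted index, a lookup table from each word to the documents that contain it, which is the basic trick behind most full-text search.

```python
import re
from collections import defaultdict

WORD_PATTERN = re.compile(r"[a-z0-9']+")


def build_index(documents):
    """Map each lowercase word to the set of document names containing it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in WORD_PATTERN.findall(text.lower()):
            index[word].add(name)
    return index


def search(index, query):
    """Return the names of documents that contain every word in the query."""
    words = WORD_PATTERN.findall(query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical stand-ins for text a crawler might have extracted from files.
documents = {
    "hamlet.txt": "Words, words, words.",
    "memo.txt": "Quarterly budget words and figures.",
    "notes.txt": "Budget meeting notes.",
}

index = build_index(documents)
print(sorted(search(index, "budget")))
print(sorted(search(index, "budget words")))
```

A real search server adds a great deal on top of this, such as parsers for each document format, relevance ranking and an index that lives on disk, but the word-to-document lookup is the part that makes searching fast.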

I'm going to show you how to go about getting a documentation site set up from scratch so that you can see all the steps. You may, of course, already have a web server up and running, and that's fine. I've gone ahead and spun up a Linode server running Ubuntu 18.04 LTS. This is a great way to get a server up and running quickly without spending a lot of money if you don't want to, and if you've never done this, it's also kind of fun.