We hear it all the time. Sometimes, we even say it. Often, you spend more time searching your intranet for solutions than you do implementing the right one. Even on the Internet, it seems you seldom find what you are looking for on the first try. How, then, do you determine which of the thousands of results you just received in 0.10 seconds really is the answer to your search? And what if the result you need was never indexed because its file format is not supported? You will probably never see it.
For the purposes of this discussion, let’s focus on internal search and what may be happening there. Intranet platforms like SharePoint depend on proper indexing of documents in order to consider and suggest results to users. In a SharePoint environment, for example, PDF, CAD, and Outlook file formats require an IFilter before they can be indexed. According to a KMWorld article by Susan Feldman, “40% of corporate users reported that they cannot find the information they need to do their jobs on their intranets.” In many cases, that may be because the index the search engine consults does not include all of the documents in the organization’s repositories. This can happen for a variety of reasons, from improper file extensions and misplaced underscores in a file or directory name to non-supported file formats.
It is quite possible that many relevant documents are not being surfaced due to a non-supported document file type. Senior executives may want to ask their CIO for a report of document file types within their environments and compare those results with a list of the file types supported by their search engine and indexing capabilities. A quick comparison of that data could help you sleep better at night, or give you an understanding of why some documents aren’t being suggested.
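As a rough illustration of that comparison, the Python sketch below counts documents by file extension and lists the extensions that do not appear on a supported list. The names `SUPPORTED_EXTENSIONS`, `count_extensions`, and `unsupported_report` are hypothetical, and the supported set is a placeholder rather than any product's actual list; substitute the list published for your own search engine and installed filters.

```python
from collections import Counter
from pathlib import PurePath

# Placeholder set (an assumption, not any vendor's real list): replace it with
# the extensions your search engine and installed filters can actually index.
SUPPORTED_EXTENSIONS = {".docx", ".xlsx", ".pptx", ".txt", ".html"}


def count_extensions(paths):
    """Count documents per lowercase file extension ('' when there is none)."""
    return Counter(PurePath(p).suffix.lower() for p in paths)


def unsupported_report(paths, supported=SUPPORTED_EXTENSIONS):
    """Return {extension: document count} for extensions not in `supported`."""
    counts = count_extensions(paths)
    return {ext: n for ext, n in counts.items() if ext not in supported}


if __name__ == "__main__":
    # Made-up sample paths; in practice, feed in an export of the repository's
    # file listing.
    sample = [
        "finance/q3-report.docx",
        "engineering/site-plan.DWG",
        "engineering/floor-2.dwg",
        "legal/contract-thread.msg",
        "misc/README",
    ]
    for ext, n in sorted(unsupported_report(sample).items()):
        print(f"{ext or '(no extension)'}: {n} document(s) may not be indexed")
```

Extensions are lowercased before comparison, so `.DWG` and `.dwg` count as one type. Files with no extension at all are reported separately, since a missing or improper extension is one of the reasons a document can be skipped.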
One Google whitepaper points out that “nearly half of a knowledge worker’s time is non-productive, spent gathering information, converting formats, unsuccessfully searching or recreating content that already exists.” Any mid-level manager can instantly do the math on those costs. It is incumbent on us as IT professionals to do everything in our power to ensure that the knowledge workers in our organizations have access to all documents in the portal. One of the best ways to do that is to make sure all the documents are actionable; read “able to be indexed.”
If we can eke out a 1% increase in productivity by making more documents actionable through better filter technology, the payback is significant. As you might imagine, my company, Discover Technologies, is about to release ACE 3.0, the latest version of our connector engine, which happens to have new and improved filtering technology designed to accomplish this goal. Regardless of whose filter you use, the takeaway from this post is this: “before you blame your search engine, consider the possibility that improper indexing may be your problem.”