SurfRay


How to index PDF files in SharePoint

Microsoft SharePoint does not index Adobe PDF files by default, so additional steps are required before the content of PDF files can be searched, whether through the standard SharePoint search or the Ontolica search extension for SharePoint. A number of blog posts by private individuals, as well as the official Adobe and Microsoft sites, offer suggestions on the topic. Based on their input and my own tests, I have prepared a short compilation of what needs to be done to enable PDF indexing on Microsoft Office SharePoint Server 2007.

Installation (Indexing Server)

1. Install the latest Adobe Reader (version 9.3.1 at the moment of writing)

2. Install the appropriate version of Adobe IFilter:

- Adobe IFilter 6.0 for 32-bit Windows

- Adobe IFilter 9.0 for 64-bit Windows

3. Add the ‘bin’ directory of the installed IFilter to the ‘Path’ environment variable

4. Restart the server
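Step 3 is easy to get wrong by adding the directory twice or by clobbering the existing value. The following is a minimal Python sketch of the idea, not part of the original procedure: it takes the current ‘Path’ value as a string and returns it with the IFilter ‘bin’ directory appended only if it is not already listed. The function name `add_to_path` and the example directory are illustrative assumptions; use the directory your IFilter installer actually created.

```python
import ntpath  # Windows path rules, importable on any platform


def add_to_path(path_value: str, new_dir: str) -> str:
    """Return path_value with new_dir appended unless it is already listed."""

    def norm(entry: str) -> str:
        # Compare entries case-insensitively and ignore quotes/trailing slashes.
        return ntpath.normcase(ntpath.normpath(entry.strip().strip('"')))

    entries = [e for e in path_value.split(";") if e.strip()]
    if norm(new_dir) in {norm(e) for e in entries}:
        return ";".join(entries)  # already present, nothing to add
    return ";".join(entries + [new_dir])


# Hypothetical install location, shown only as an example.
EXAMPLE_IFILTER_BIN = r"C:\Program Files\Adobe\PDF IFilter 9 for 64-bit platforms\bin"
```

The returned string can then be pasted into the ‘Path’ variable under System Properties > Environment Variables; the sketch deliberately does not modify the system itself.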

Configuration (Indexing Server)

1. Go to Search Service Administration > Search Settings > File Types > Add New File Type and add “pdf” to the list of indexed documents, if it is not already listed on the File Types page. You can verify that the file format was added properly by checking the following registry key, where “pdf” should be listed among the other file types:

\\HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0
\Search\Applications\{GUID}\Gather\Portal_Content\Extensions\ExtensionList

If “pdf” is not listed, add a new string value to the key, give it a unique name (e.g. 38), and type “pdf” in its data field.

[Screenshot: ExtensionList registry key with “pdf” among the listed file types]

2. While in the registry editor, verify that the following key has correct values configured as shown in the screen shot:

\\HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0
\Search\Setup\Filters\.pdf

[Screenshot: expected values of the Filters\.pdf registry key]

3. On 64-bit Windows (only!), locate the following registry key:

\\HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0
\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf

Replace the value of the above key with the following one:

{E8978DA6-047F-4E3D-9C78-CDBE46041603}

[Screenshot: updated value of the Filters\Extension\.pdf registry key]

4. Finish configuration by either restarting your server or by issuing the following commands in the command prompt:

C:\> iisreset /noforce
C:\> net stop osearch
C:\> net start osearch

5. Perform a full crawl from Shared Services Administration > Search Settings > Content Sources > Start Full Crawl, and verify on the Crawl Log page that the PDF documents were successfully crawled and indexed. Once the documents have been indexed, it should be possible to search their content from any search center.
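The registry edits in steps 1 and 3 can be prepared ahead of time rather than typed into the registry editor by hand. The sketch below is one possible way to do that in Python, under stated assumptions: `next_free_value_name` and `pdf_filter_reg_text` are hypothetical helper names, and the generated .reg text assumes that "the value of the key" in step 3 means the key's default value (written as `@` in .reg syntax), which the text above does not state explicitly. The `{GUID}` part of the ExtensionList path is specific to each installation, so the sketch does not try to build that path.

```python
# Registry root used throughout the steps above.
OFFICE_SERVER_KEY = r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0"

# Filter identifier quoted in step 3 (64-bit Windows only).
ADOBE_X64_FILTER_ID = "{E8978DA6-047F-4E3D-9C78-CDBE46041603}"


def next_free_value_name(existing_names):
    """Suggest an unused numeric value name for ExtensionList (step 1).

    existing_names: the value names already present under the key,
    e.g. ["1", "2", ..., "37"]. Non-numeric names are ignored.
    """
    used = {int(name) for name in existing_names if name.isdigit()}
    return str(max(used, default=0) + 1)


def pdf_filter_reg_text():
    """Render .reg text for step 3.

    Assumption: the value to replace is the key's default value ("@").
    """
    key = OFFICE_SERVER_KEY + r"\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf"
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{key}]",
        f'@="{ADOBE_X64_FILTER_ID}"',
        "",
    ]
    return "\r\n".join(lines)
```

Calling `next_free_value_name` with the names currently shown in the registry editor gives a name that does not collide with an existing entry. The output of `pdf_filter_reg_text()` is meant to be saved to a file and reviewed before it is imported; exporting the existing key first keeps a way back.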

Configuration (Web Front-End Servers and optionally Indexing Server)

1. To display the PDF icon in search results, copy the ICPDF.GIF file to the following location:

C:\Program Files\Common Files\Microsoft Shared
\Web Server Extensions\12\Template\Images

2. Register the copied icon by editing the following file:

C:\Program Files\Common Files\Microsoft Shared
\Web Server Extensions\12\Template\Xml\DOCICON.XML

Add a new entry to the file as shown in the screenshot:

[Screenshot: new PDF entry in DOCICON.XML]

3. Restart the server.
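The DOCICON.XML edit in step 2 can also be scripted. The following Python sketch assumes the layout of the stock DOCICON.XML file, where extension icons are `<Mapping Key="..." Value="..."/>` elements inside a `<ByExtension>` element; check your own file before relying on it. The function name `add_pdf_mapping` is illustrative. It works on the file content as a string, adds a mapping for “pdf” if none exists, and updates the existing one otherwise.

```python
import xml.etree.ElementTree as ET


def add_pdf_mapping(docicon_xml: str, icon_file: str = "ICPDF.GIF") -> str:
    """Return DOCICON.XML content with a pdf -> icon_file mapping in place."""
    root = ET.fromstring(docicon_xml)
    by_extension = root.find("ByExtension")
    if by_extension is None:
        raise ValueError("no <ByExtension> element found; unexpected file layout")

    for mapping in by_extension.findall("Mapping"):
        if mapping.get("Key", "").lower() == "pdf":
            mapping.set("Value", icon_file)  # entry exists: just point it at the icon
            break
    else:
        # No pdf entry yet: append one at the end of <ByExtension>.
        ET.SubElement(by_extension, "Mapping", {"Key": "pdf", "Value": icon_file})

    return ET.tostring(root, encoding="unicode")
```

Because the standard-library parser discards comments and the XML declaration when the file is read this way, keep a backup copy of the original DOCICON.XML and compare the result before replacing it.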

That concludes the required configuration. For differences in configuring PDF indexing on Microsoft Search Server Express, or when using IFilters from other providers, please refer to the sources listed in the Acknowledgement section.

Acknowledgement

I would like to thank the authors of the following web pages for their valuable input:
http://www.moss2007.be/blogs/vandest/archive/2007/09/19/sharepoint-2007-and-pdf-indexing.aspx
http://bloggingabout.net/blogs/harold/archive/2008/10/02/index-pdf-documents-on-sharepoint-using-adobe-pdf-ifilter-9.aspx
http://www.adobe.com/support/downloads/detail.jsp?ftpID=4025

 