Tuesday, August 27, 2013

Inside SharePoint 2013 Search

SharePoint 2013 Search consists of the following areas:

Search Topology

We have 6 components in SharePoint 2013 search. In a medium or large farm, we might need multiple application servers for search, where the crawl, index and query components can be placed on different servers to manage latencies. We can have multiple topologies on a server, but only one can be active at a time. Below is the PowerShell cmdlet to find the active topology on the server.
$ssa = Get-SPEnterpriseSearchServiceApplication
$active = Get-SPEnterpriseSearchTopology -Active -SearchApplication $ssa 

We can also retrieve a list of all search components from a particular topology by using the below PowerShell cmdlets:
$ssa = Get-SPEnterpriseSearchServiceApplication
$active = Get-SPEnterpriseSearchTopology -SearchApplication $ssa -Active
Get-SPEnterpriseSearchComponent -SearchTopology $active
By default, we would have all 6 components in our default search topology. To add or remove components, we first have to create a clone of the existing topology, add/remove/move components as required, and then activate the cloned topology. This can all be done using PowerShell cmdlets; for more information, please follow the TechNet article on managing the search topology. Adding or removing the index component is a little different; you can follow TechNet for that information as well.
Note: Search topologies can also be managed via the Search Service Application in Central Administration instead of PowerShell.
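As a sketch of the clone-modify-activate flow described above (the server name and the choice of component below are placeholders; adjust them for your farm):

$ssa = Get-SPEnterpriseSearchServiceApplication
$active = Get-SPEnterpriseSearchTopology -SearchApplication $ssa -Active
# Clone the active topology so we can modify it safely
$clone = New-SPEnterpriseSearchTopology -SearchApplication $ssa -Clone -SearchTopology $active
$ssi = Get-SPEnterpriseSearchServiceInstance -Identity "Server2"   # placeholder server name
# Example change: add a crawl component on the second server
New-SPEnterpriseSearchCrawlComponent -SearchTopology $clone -SearchServiceInstance $ssi
# Activate the cloned topology; the previous one becomes inactive
Set-SPEnterpriseSearchTopology -Identity $clone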

Search Components


Crawl Component

The crawl component runs under mssearch.exe and invokes daemon processes (mssdmn.exe) during a crawl to fetch content. Crawled items (documents, pages) and their properties (metadata/columns) are sent to the content processing component for further processing and then to the index component. The crawler uses the SharePoint object model to access the content database during a crawl, and the search service account to access the search databases. After creating the search service application, we first have to configure the "content access account". This account should have at least read permission on the content. Using crawl rules, we can have different content access accounts for different content. Do not provide the application pool account of any web application as the content access account; if you do, it will crawl unpublished content as well (see TechNet for more information). If you are using claims-based authentication, make sure that Windows authentication is enabled on any web applications to be crawled. The crawl component can only crawl a file if its name extension is included in the list of file name extensions on the Manage File Types page. One crawl component can utilize multiple crawl databases for one web application, but only if that large single host (web application) uses more than one content database and/or the SharePoint administrator manually invokes a re-balance.
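To illustrate using a different content access account for part of the content, a crawl rule can supply its own credentials (the path, account and password below are placeholders):

$ssa = Get-SPEnterpriseSearchServiceApplication
$pwd = ConvertTo-SecureString "Password1" -AsPlainText -Force   # placeholder password
# Crawl this host with a dedicated account instead of the default content access account
New-SPEnterpriseSearchCrawlRule -SearchApplication $ssa -Path "http://hr.contoso.com/*" -Type InclusionRule -AuthenticationType BasicAccountRuleAccess -AccountName "CONTOSO\svc-hr-crawl" -AccountPassword $pwd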

Content Processing

The content processing component receives crawled content from the crawl component and performs insert, delete and partial-update operations. Delete and partial-update operations do not have web service callouts. For new and updated documents, the activities are:

  • The parser is a more sophisticated version of an IFilter, which processes document contents to generate indexes. If the content processing component is unable to parse a file, the search index will only include file properties, such as the file name. You cannot override a built-in format handler by installing a third-party IFilter.
  • New crawl properties discovered in document content are stored in the search administration database.
  • Registered crawl properties are mapped to managed properties according to the search schema settings.
  • Security descriptors are data structures of security information associated with files, folders, etc., which help to grant or deny permissions to a user or group. If a user does not have access to a particular document, that document will not be part of his/her search results.
  • Automatic language detection
  • Web service callouts provide an extension point to run custom logic (e.g. mapping new managed properties, refining data) before indexing.
  • Phonetic search variations
  • Word breaking
  • Custom entity extraction helps create refiners from words in the body text of documents.
  • Metadata extraction
  • Document Summary 
  • Analytics
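The crawled-to-managed property mapping mentioned in the list above can also be scripted. A hedged sketch, assuming a hypothetical site column whose crawled property is named "ows_ProjectCode":

$ssa = Get-SPEnterpriseSearchServiceApplication
# Create a new text managed property (Type 1 = Text)
$mp = New-SPEnterpriseSearchMetadataManagedProperty -SearchApplication $ssa -Name "ProjectCode" -Type 1
$cp = Get-SPEnterpriseSearchMetadataCrawledProperty -SearchApplication $ssa -Name "ows_ProjectCode"
# Map the crawled property to the managed property in the search schema
New-SPEnterpriseSearchMetadataMapping -SearchApplication $ssa -ManagedProperty $mp -CrawledProperty $cp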

Index Component

The index component hosts the actual index itself. It receives data from the content processing component and indexes it. It also receives search query requests coming in from the query processing component and returns results. As stated before, the index stores both crawled items and their associated properties. The index is more efficient now because it has been broken up into update groups, and a single crawled document can be indexed across several different update groups, each of which contains a unique portion of the index. This allows for partial updates: if I make a change to a document, only that change is updated within the index of the associated update group instead of the entire document. Security descriptors associated with crawled items are also now stored solely in the index.
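For fault tolerance, additional index replicas can be added through the same clone-and-activate topology flow. A sketch, assuming a single existing partition 0 and a placeholder server name:

$ssa = Get-SPEnterpriseSearchServiceApplication
$active = Get-SPEnterpriseSearchTopology -SearchApplication $ssa -Active
$clone = New-SPEnterpriseSearchTopology -SearchApplication $ssa -Clone -SearchTopology $active
$ssi = Get-SPEnterpriseSearchServiceInstance -Identity "Server3"   # placeholder server name
# Add a second replica of index partition 0 on another server
New-SPEnterpriseSearchIndexComponent -SearchTopology $clone -SearchServiceInstance $ssi -IndexPartition 0
Set-SPEnterpriseSearchTopology -Identity $clone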

Analytics Processing

In SharePoint 2010, the Web Analytics service application was responsible for analytics processing. In SharePoint 2013, that service application has been removed, and all analytics processing is now performed by the analytics processing component within the search service application. During a crawl, as analytics information such as links and anchor text is discovered, it is eventually processed by the analytics processing component; this is referred to as search analytics. The component also processes user-initiated analytics such as items clicked; this is referred to as usage analytics. The analytics processing component uses both the link database and the analytics reporting database.


Query Processing

When a search query comes in from a web front end, the query processing component (QPC) analyzes and processes the query to attempt to optimize precision, recall and relevancy. The processed query is then submitted to the index components. The QPC is where query rule matching takes place, and it will transform the query if there is a match. It also performs word breaking, linguistics/stemming and parsing operations. It packages the results from the indexer and passes them back to the web front end, which passes them on to the user.
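To see this query pipeline end to end, a query can be submitted through the Search REST service (the site URL below is a placeholder):

# Submit a keyword query to the Search REST endpoint of a placeholder site
$url = "http://intranet.contoso.com/_api/search/query?querytext='sharepoint'"
$result = Invoke-WebRequest -Uri $url -UseDefaultCredentials
# $result.Content holds the XML result set returned by the query processing component
$result.StatusCode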


Search Administration

The search administration component manages and controls the entire search infrastructure. It maps to a search administration database, and the component can be made fault tolerant (by adding additional search administration components), which is yet another improvement over SharePoint 2010 search. The search administration component governs topology changes and stores things like the following:
  • Topology
  • Crawl and query rules
  • Managed property mappings (search schema)
  • Content sources
  • Crawl schedules
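To confirm where the search administration component is running, the components of the active topology can be filtered by name (the "Admin*" pattern here is an assumption about the default component naming):

$ssa = Get-SPEnterpriseSearchServiceApplication
$active = Get-SPEnterpriseSearchTopology -SearchApplication $ssa -Active
# List only the admin component(s) from the active topology
Get-SPEnterpriseSearchComponent -SearchTopology $active | Where-Object { $_.Name -like "Admin*" }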

Search Services:

SharePoint Search Service 

To verify that your SharePoint Search Service instance is started, either:
  • Check Central Administration\System Settings\Manage Services on Server, or
  • Execute the below PowerShell cmdlet:
     Get-SPEnterpriseSearchServiceInstance -Identity "<Server name>"

SharePoint Server Search 15

  • Verify that the SharePoint Server Search 15 Windows service is started (Run\services.msc).
  • Verify that the mssearch.exe process is running.

SharePoint Search Host Controller

This service performs the majority of search processing and also manages the search topology components.
  • Verify that the SharePoint Search Host Controller Windows service is started (Run\services.msc).
  • Verify that the HostControllerService.exe process is running.
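To check the state and health of every search component that the host controller manages:

$ssa = Get-SPEnterpriseSearchServiceApplication
# Reports the state (Active, Degraded, Failed, etc.) of each search component
Get-SPEnterpriseSearchStatus -SearchApplication $ssa -Text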

Search Processes


mssdmn.exe

This daemon process is used only by search's crawl component.


HostControllerService.exe

This process is responsible for initializing, stopping and monitoring the search components that run within the noderunner.exe process.


noderunner.exe

All components except the crawl component run under this process. Each search component has its own noderunner.exe process.

