Tuesday, August 13, 2013

SharePoint 2013 Search Architecture and Configuration

Overview of search
Search Architecture
Executing Queries
Search Verticals
Custom Entity Extraction
Parser
Web service Call –out

Overview of Search

Below image shows all the different products we had up till now. In O14 we had SharePoint Search and FAST Search. Both the products had their own strength and weakness. 
So focus of SharePoint search was on Enterprise while FAST was search application platform. SharePoint search came with SharePoint, so we didn't had to buy anything additional and FAST was much about having scalability for internet site with ability to handle large catalogs. So we had these 2 products and there was a constant tag-for on which one to use.
Now in SharePoint 2013, we only have one product to worry about. FAST search and Enterprise search has been unified into a single extensible platform and the idea was to take the best from both of these products and bring them together. Now the FAST Engine is married to SharePoint crawler.


Extensible Search Platform


  • Enterprise Search
  •  People Search
  • Site Search
  • Video/Media Search
  • Topics Pages and Content by Search are new WCM Feature that allow create search driven sites. So when we are creating internet sites that are driven by products, then this allows us to create these pages easily and have them dynamically populated based on products that somebody is interested
  • My Tasks – In previous version we had to create custom component to fetch tasks for user from various sites. My Tasks pulls all the tasks in different sites together which can also be a part of social networking feature in user’s my site.
  • Customer and partners represent search that we can extend to build out own custom applications.

Search Architecture

One of the biggest differences in Enterprise and FAST search was enterprise Search is based on INDEX being created by crawler. So crawler goes through the content and builds up an index and it is the index that gets searched.
FAST on the other hand didn't crawl. It allowed push into the index known as content API  So you build up an index in FAST by taking content and pushing it into the index.
In Search 2013, Content API is gone. To get the latest SharePoint 2013 content we have new feature called continuous crawl, which keeps the index updated. Continuous crawl is only for SharePoint index and not for other content sources. Continuous crawl runs in parallel, so it does not wait for previous thread to complete before starting a new one. Continuous Crawl refers to change logs. Continuous crawl does not retry any errors generated during the crawl, so Incremental and full crawl is also necessary for Search. So you can have a vision of continuous crawl running continuously, incremental crawl running every night and full crawl running once a week.
Managed property functionality remains the same where we map crawl property with managed property to create keyword query searches against the index.
  • Content Sources: Search architecture needs to connect to different content source and get access to the content. Search engine needs to take individual pieces of content from the content source and crawl through to create an index. Connecting to various content sources is done through .Net assembly connectors and working though the content (document) is called a parser.
  •  Parser is new piece added to 2013 which is similar to iFilter which was used in SharePoint 2010 to index content in the repository.
  • Content Pipeline and CTS Run-time really just represent the functionality of bringing in content through the crawl. CTS Run-time has no extensibility functionality and works internally out of the box. There is no API provided to us to talk to Content pipeline except there is one entry point called web service callout which we will discuss later.
  • Indexing Engine is just a box that represents actual Indexing of content that builds up the index that we will search 
  • Analyzer is used to process user behavior, perform click analysis and supports things like recommendations based on behavior. So search results are presented to anybody based on their behaviors.
  • Query engine itself is responsible to execute queries against the index.
  • Query Pipeline and IMS Run-time are representation of code and functionality  necessary to execute that query and move it from user interface
  • At Query pipeline level search provides RESTful interface, its REST API gives us way to execute query against index.
  • Client framework representation CSOM, so we can use CSOM to execute queries as well.
Using CSOM API we can import and export search settings from DEV to QA to PROD. So we don’t need to write powershell scripts to configure Search anymore. Rule, sources and managed properties can be export but not masterpages, templates or web parts. EG: we cannot export search result page where we have Search Core Result web part configured.
Executing Queries


In SharePoint 2013 search, all above query syntax are unify together so that we have fewer things to worry about.
Keyword Query Syntax is preferred method for executing query against index. It allows us to take manage property and say managed property equal a value eg: Title = SharePoint or Title;SharePoint means Title contains SharePoint. This is basics of keyword query syntax.There are lot of improvement done in Keyword query syntax. We don’t need to use any other query syntax in Search 2013.Everything that is been used in FAST query syntax is now available in Keyword query syntax. SQL query syntax is removed from the product.
REST

This slide shows REST based API into search. So as developers we can use the keyword query syntax into REST based API to get results. So under the keyword section here, you see there is a URL, where there is _api which is pointing to REST full API endpoint search query and then we provide query string argument that refer to keyword query syntax that we want to use. Executing RESTful based query against the search engine is a simple as taking your KQL query and inserting right there in placeholder and you will get result back from the search engine. Any property that was part of keyword query object inside of SharePoint as exposed by REST as query string.In second example above we have selecting properties where we have select property which only returns Title and Rank. Similarly we have example of sorting.

Query throttling


The idea with query throttling is, if you are in multi-tenant environment, it’s important to prioritize and even throttle the queries because there could be lot of demand on the search server and tenants should not impact each other’s performance in same environment. OOB in on premises, query throttling is off by default. The way it works is through the concept of client type. Client type is a mechanism of throttling incoming queries. Client type will assign Search center as P1 priority and custom apps using search query as P3. The throttling being based on query latencies, if the search engine is working very hard to answer lot of queries and there are latencies involved, than lower priorities calls (P3) will receive message “System is too busy try again later”. 

Content Search Web Part Configuration
Add Content Search Web Part on the page as shown below
In this web part we have to configure couple of parameters. Configure Query as shown in below screenshot
Previous in SharePoint 2010, we had to modify xsl of web part to change the look and feel. Now in SharePoint 2013, we modify html display template to change the look and feel.
Change the look and feel of the web part by uploading custom display template Item_ThreeLines-Demo.html under  Site Settings -> Master Pages and Page Layouts -> Display Templates -> Content Web parts. 
In Content Search web part properties. Select the display template from the drop down and change the mapping of managed properties for the fields in the Item Display Template as shown in screenshot below.

Search Verticals Configuration
We have come across search verticals before in Google or Bing. In previous version we had to do lot of customization to create these search verticals. In Search 2013 we just have to do some configuration steps to achieve it. Below is the Production list, whose below crawl properties mapped with managed properties
Region – RegionOWS
Item – ItemOWS
Total – TotalOWS
You will already find OOB search verticals in your search center site. To create a new custom search vertical below are the steps.

Create Search Vertical called ‘Production’ by creating new page from “Search Results” page layout. Publish this page and add new search navigation link under Site Settings -> Search Settings
Below is how ‘Production’ search vertical will look like after we configure couple of things shown in next few screens.
Create Result source @ site settings -> Result Source. Click on Query Builder to create query transform which would display only results from Production list
Note: Result source in SharePoint 2013 is similar to Scopes we had in SharePoint 2010.
Create display template (DefaultProduction.html) and upload it @ Site Settings -> Master pages and page layouts -> Display Templates -> Search. During upload of this display template, on ItemUpdating/ItemUpdated event its corresponding javascript file is generated by SharePoint internally which is used to modify the look and feel of the web part.  Edit Search Results web part properties on Production Search Result page, click on “Change Query” button. Attach ‘Production’ result source as show below
Also attach the display template “Default Production” and add the managed properties which you would like to display in search results in “Hit Highlighted Properties” as shown below
Now in Everything search vertical, out of thousands of items in Production list, I would like to display only couple of items in Result Block search results and than continue with rest of the results from the site excluding Production list. Below is how, this is configured. Configure Everything search vertical with Query rule to create Result Block which contains only 2 items from Production list. Create a query rule against “Local SharePoint Results” Result Source which is associated with Everything search vertical. Below is the screenshot for the same.
Site Settings -> Query Rule -> Select “Local SharePoint Results” in dropdown -> New Query Rule
Add Result Block

Custom Entity Extraction
To create a refiners in search result page we need to map a crawl property (list column) with managed property. Custom Entity Extraction comes into picture if we do not have crawl property but still we want to create refiners on search result page. For example, we want to refine search results based on the content of documents in document library. With custom entity extraction we can plug our own dictionary of crawl properties into search system. This dictionary is a simple .csv file which we import using powershell. Below we have configured Technologies refiner using Custom Entity Extraction and Companies refiner using OOB Extraction. Note: We are filtering documents based on the content (body managed property) inside the documents. 
Create Technologies.csv file in key and display form format. 
Import this *.csv file and attach it to one of 12 custom extraction directories. 
In Central Administration -> Search Service Application -> Search Schema -> Search for ‘Body’ managed property-> Edit -> 
Execute full crawl Edit Refinement web part on Results.aspx and click “Choice Refiners”

Parser

Whenever we access content source and crawl it, we need 2 things. First we need ability to access the content source and secondly we need ability to crawl items inside them. We had mention that we use .Net assembly connect to crawl inside external content source and use parser to crawl individual items that we find there. Parser is more sophisticated than iFilter where we have 2 concepts. Parser and format handler. Parser detect the document format and does not rely on document extension to identify the document type. Than it calls appropriate format handler to do the parsing. Now OOB Parser can detect more document formats than supported by format handler so iFilter are not gone complete. Wherever Parser does not find appropriate format handler it will revert back to calling iFilter to do the parsing. Deep link extraction identifies relevant headers in documents and display header and its corresponding content in as preview when we hover on search result link. Visual Meta data extraction pulls title authors and dates instead of relying on metadata properties. 

Web service call-out

Allows you to modify manage property and also add or delete one, before the crawling is completed. We can use them for data cleansing. Eg: People are tagging document with company information. So some people will write MS for Microsoft or Microsoft corporation for Microsoft. So using web service call out mechanism we can normalize this data to Microsoft. We can also add new managed properties before the crawling is completed.

2 comments:

Unknown said...

Very useful post. Thanks

Baccarat Bobbili Simham said...

Nice Post!

Post a Comment