Episode 2 – What is Solr Cloud

A blog series about Sitecore and Solr Cloud

How to setup Solr Cloud to work with Sitecore SXA using an XP license

  1. To adventure or not to adventure
  2. What is Solr Cloud
  3. How to setup Solr cloud (To be published on 2019-09-11)
  4. How to configure Sitecore SXA to make use of Solr Cloud for an enterprise environment (To be published on 2019-09-11)

This blog post is the second in a series about a real-life experience of setting up Sitecore 9 update 2 with Sitecore SXA to work with Solr Cloud. The next two posts cover the technical solutions; this post gives you some technical background on Solr.

The almost autobiographical customer case is “Build an intranet application for a multinational company using Sitecore SXA. Use Solr Cloud as the Enterprise Search solution”. The environment is created in the Microsoft Azure cloud.

Scenario

  • The intranet has to be globally available
  • Support is needed for possibly 20+ languages
  • An intranet is for employees who seek information, so search is one of the most important features. Indexing needs to cover all Sitecore content and many document types in different versions, such as Word, PowerPoint, and Excel, but also media such as JPG and more.

Azure Search

Given this customer case, the reader might think: Solr Cloud in a Microsoft Azure environment? Really? Why not Azure Search? Well, Azure Search certainly has its advantages. First of all, it is a Microsoft cloud-native application. It is also a service, so it fits well in the beautiful Sitecore PaaS environment.

My opinion is that Azure Search can work well with Sitecore. However, in an enterprise environment that relies heavily on search, Azure Search does not seem mature enough yet. We worked with Azure Search, and our conclusion was: still too many bugs, a restriction on 32-bit fields, and a restriction on the number of fields per index. It has fewer configuration options with regard to sharding and many other features and, above all, it lacks some important features for configuring search results.

We will see what the future brings. Azure Search has been public since 2015; Solr has been around since 2004. Microsoft is working hard to improve Azure Search, so in the future Azure Search might be the preferred option. Our current choice is Solr Cloud.

From Solr to Solr Cloud mode (Using Sitecore)

First, a little bit of history. In 2004, Solr was created by Yonik Seeley at CNET Networks as an in-house project to add search capability to the company website.

In January 2006, CNET Networks decided to openly publish the source code by donating it to the Apache Software Foundation.

Solr was originally developed as a Java WAR and eventually evolved into a standalone application. In April 2016, Solr 6.0 was released, adding support for executing parallel SQL queries across Solr Cloud collections. Sitecore has supported Solr since Sitecore 7.0.

At first, there was the problem that Sitecore did not support the Solr Cloud functionality yet.

In a production environment, a single Solr instance was not a solid solution: it was a single point of failure and offered no load balancing. It is only suitable for development, test, and demo purposes.

By using scripting, it was possible to create load balancing in the search environment. This was called the master-slave method. The big challenge was that you had to script everything yourself, and even then the possibilities were really restricted.

Since Sitecore 9.1 update 2, Sitecore officially supports Solr Cloud mode. Solr Cloud mode is Solr functionality that allows a cluster of independent Solr servers to work together as an enterprise search platform.

It is important to know that Solr Cloud mode is not a different version of Solr. It is functionality that is activated in Solr by configuration. When Solr is in cloud mode, it makes use of software called Zookeeper. Zookeeper coordinates the communication between the Solr nodes. It also plays a part in load balancing.

However, Sitecore does not support the load balancing features of Solr Cloud. This is a quite simplified explanation of how Solr Cloud (Zookeeper) by default handles load balancing:

  1. Send a request to a Zookeeper instance.
  2. Zookeeper returns an endpoint of the best available Solr Node.
  3. The application sends a search request using the received endpoint.
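The three steps above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the node URLs, the `LIVE_NODES` table, and both function names are made up for illustration, and a random pick stands in for "best available". It does not use a real Zookeeper or Solr client.

```python
import random
from urllib.parse import urlencode

# Toy stand-in for the cluster state that Zookeeper keeps track of.
# The node URLs are hypothetical.
LIVE_NODES = {
    "http://solr1:8983/solr": True,
    "http://solr2:8983/solr": True,
    "http://solr3:8983/solr": False,  # this node is down
}


def ask_zookeeper_for_node(live_nodes, rng=random):
    """Steps 1 and 2: return the endpoint of an available Solr node."""
    candidates = sorted(url for url, alive in live_nodes.items() if alive)
    if not candidates:
        raise RuntimeError("no live Solr nodes")
    # Assumption: a random live node stands in for the "best" node.
    return rng.choice(candidates)


def build_search_url(node_url, collection, query):
    """Step 3: the application sends its search request to that endpoint."""
    return f"{node_url}/{collection}/select?{urlencode({'q': query})}"


node = ask_zookeeper_for_node(LIVE_NODES)
search_url = build_search_url(node, "demo_collection", "intranet")
```

The point is that the client never hard-codes a single Solr node. It asks for a live one first, which is exactly the part that Sitecore does not do.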

Sitecore supports only a single endpoint, so it needs to be configured a little differently. A load balancer is needed, and the endpoint of the load balancer is configured in Sitecore as the Solr endpoint. The load balancer redirects each request to a healthy Solr Cloud instance. From there on, Zookeeper takes over and handles the communication between the Solr Cloud services. Communication from Solr back to Sitecore goes through the load balancer again.
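To make the role of the load balancer concrete, here is a minimal sketch of the idea in Python. It assumes a simple round-robin strategy with nodes marked up or down by hand. The class and node names are hypothetical, and a real load balancer (for example one in Azure) would run its own health probes.

```python
from itertools import cycle


class LoadBalancer:
    """Toy round-robin balancer: one public endpoint, several Solr nodes behind it."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.healthy = set(self.nodes)
        self._ring = cycle(self.nodes)

    def mark_down(self, node):
        """Record that a health check for this node failed."""
        self.healthy.discard(node)

    def mark_up(self, node):
        """Record that a known node is healthy again."""
        if node in self.nodes:
            self.healthy.add(node)

    def route(self):
        """Return the next healthy node, skipping nodes marked as down."""
        for _ in range(len(self.nodes)):
            node = next(self._ring)
            if node in self.healthy:
                return node
        raise RuntimeError("no healthy Solr node behind the load balancer")


lb = LoadBalancer(["solr1", "solr2", "solr3"])
lb.mark_down("solr2")
routed = [lb.route() for _ in range(4)]
```

Sitecore only ever sees the one address of the load balancer. Which Solr node actually answers is decided behind it.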

The beauty of Solr Cloud

Is Solr the best product out there? Solr has been around for a while. It is robust and future-proof. In the scenario described in this blog post, we experienced surprises: indexes grew much faster than expected. This time we got away with only increasing the internal memory and the Java heap size. However, if necessary, we could easily have scaled out the environment by adding new nodes.

There is a lot more available to help you organize your Solr environment and make it flexible and robust.

Logical

  • A Cluster can host multiple Collections of Solr Documents.
  • A collection can be partitioned into multiple Shards, which contain a subset of the Documents in the Collection.

Physical

  • A Cluster is made up of one or more Solr Nodes, which are running instances of the Solr server process.
  • Each Node can host multiple Cores.
  • Each Core in a Cluster is a physical Replica for a logical Shard.
  • Every Replica uses the same configuration specified for the Collection that it is a part of.
  • The number of Replicas that each Shard has determines:
    • The level of redundancy built into the Collection and how fault-tolerant the Cluster can be in the event that some Nodes become unavailable.
    • The theoretical limit in the number of concurrent search requests that can be processed under heavy load.
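The relation between Collections, Shards, Replicas, and Nodes can be modelled in a few lines of Python. This is a sketch for illustration only: the class names follow the terms above, while the collection, core, and node names are hypothetical. The `is_available` check assumes that a collection stays fully searchable as long as every shard has at least one replica on a live node.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    core_name: str  # the physical core that holds this copy of the shard
    node: str       # the Solr node hosting the core


@dataclass
class Shard:
    name: str
    replicas: list = field(default_factory=list)


@dataclass
class Collection:
    name: str
    shards: list = field(default_factory=list)

    def is_available(self, down_nodes):
        """True while every shard still has a replica on a live node."""
        down = set(down_nodes)
        return all(
            any(replica.node not in down for replica in shard.replicas)
            for shard in self.shards
        )


# Two shards with two replicas each, spread over three nodes (made-up names).
collection = Collection("demo_index", [
    Shard("shard1", [Replica("demo_index_shard1_replica1", "node1"),
                     Replica("demo_index_shard1_replica2", "node2")]),
    Shard("shard2", [Replica("demo_index_shard2_replica1", "node2"),
                     Replica("demo_index_shard2_replica2", "node3")]),
])

survives_one_node = collection.is_available(["node2"])
survives_two_nodes = collection.is_available(["node1", "node2"])
```

With two replicas per shard, this layout survives the loss of node2, but not the loss of node1 and node2 together, because shard1 then has no live replica left.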

Search capabilities

Based on experience, the documentation, and the consensus of the community, Solr is good at:

  • Powerful full-text search
  • Wildcards
  • Phrase queries
  • Regular expressions
  • Conditional logic (and, or, not)
  • Range queries (date/integer)
  • Scoring and result ranking customized to the application’s requirements
  • Getting relevant content
  • Schema-driven, context-specific facets
  • Dynamic fields
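To give an impression of what some of these capabilities look like in practice, the sketch below builds a few request URLs for Solr's `/select` handler using its standard query syntax. The base URL, the collection name, and the field names (`title_t`, `content_t`, `updated_dt`, `language_s`) are assumptions made up for this example. Your own schema will differ.

```python
from urllib.parse import urlencode


def build_select_url(base_url, collection, params):
    """Build a URL for the /select handler of a collection."""
    return f"{base_url}/{collection}/select?{urlencode(params)}"


# Hypothetical field names; the query syntax is Solr's standard query parser.
EXAMPLES = {
    # Wildcard: titles containing a term that starts with "intra"
    "wildcard": {"q": "title_t:intra*"},
    # Phrase query: the exact phrase "travel policy"
    "phrase": {"q": 'content_t:"travel policy"'},
    # Conditional logic with AND / NOT
    "boolean": {"q": "content_t:(policy AND travel NOT draft)"},
    # Range query on a date field, applied as a filter
    "range": {"q": "*:*", "fq": "updated_dt:[2019-01-01T00:00:00Z TO NOW]"},
    # Regular expression between forward slashes
    "regex": {"q": "title_t:/polic(y|ies)/"},
    # Facet counts per language, without returning documents
    "facet": {"q": "*:*", "facet": "true", "facet.field": "language_s", "rows": 0},
}

urls = {
    name: build_select_url("http://localhost:8983/solr", "demo_index", params)
    for name, params in EXAMPLES.items()
}
```

Each entry maps to one item in the list above. Only the URLs are built here; no request is sent.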

Conclusion

Are there other options? Yes, there are…

For example, for website search there is Coveo, which is a feasible option too. However, it comes with a price tag and it only covers website search. One could also think of other custom solutions.

However, because of its strong capabilities and its strong integration with Sitecore, we believe Solr is currently the best option.

