Native Apps At The Client & Cloud

Srinivasan Sundara Rajan


Solr vs Azure Search

Search-as-a-service from Microsoft Azure

Microsoft Azure, a cloud platform, is rapidly expanding its scope to include newer enterprise class services. Some of the significant new additions are:

  • Azure Search: Azure Search Service is a fully managed, cloud-based service that allows developers to build rich search applications using REST APIs. It includes full-text search scoped over your content, plus advanced search behaviors similar to those found in commercial web search engines, such as type-ahead, suggested queries based on near matches, and faceted navigation.
  • Azure Machine Learning: Azure Machine Learning makes it possible for people without deep data science backgrounds to start mining data for predictions. ML Studio, an integrated development environment, uses drag-and-drop gestures and simple data flow graphs to set up experiments. For many tasks, you don't have to write a single line of code.
  • Azure Stream Analytics: Azure Stream Analytics is a fully managed service providing low latency, highly available, scalable complex event processing over streaming data in the cloud.

Together with a road map of further services, these additions position Azure as a leading platform for enterprise adoption of PaaS.

In the following notes, I compare the open source search platform Solr against the capabilities of Azure Search services and note some advantages enterprises may derive by adopting the PaaS implementation of search.

Solr Features Compared with Azure Search
Solr is a fast open source enterprise search platform from the Apache Lucene project. Its major features include full-text search, hit highlighting, faceted search, near real-time indexing, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search.

The following notes compare several aspects of using Solr in the enterprise with using Azure Search. Open source versus commercial software is a religious debate, and the intent here is not to take sides: most enterprises define their own IT policies for choosing between open source and commercial products, and those policies will prevail here as well. The notes below are meant to help readers understand the new Azure service in light of an existing, proven search platform.

Each feature below is described first as it is handled in Solr and then as it is handled in Azure Search.

Installation & Setup

Solr can be run as a self-contained engine using Jetty, but most sites use Tomcat as the container for the Solr web application.

As is typical of many open source products, a few more dependencies, such as Apache Commons, SLF4J and a JDK, need to be installed as part of setup.

Being a PaaS platform, Azure Search is a fully managed and readily available service and any of the internal dependencies are managed by the Azure platform.

Schema

Solr works on a predefined schema: every Solr instance requires a schema.xml file, which defines the structure of the documents stored in that instance.

As is typical of any database schema, it consists of two major sections:

Types section: definitions of all field types.

Fields section: definitions of the document structure in terms of those types.

Solr also supports a schemaless mode, and its dynamic field capability reduces up-front configuration requirements for fields with predictable naming patterns. For example, the dynamic field definition <dynamicField name="*_i" type="int" indexed="true" stored="true"/> maps any field name with the suffix "_i" to the "int" field type.
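To make the precedence rules for dynamic fields concrete, here is a simplified Python sketch (not Solr's own code) of how a field name might be resolved against declared fields and dynamic field patterns. It assumes the rules described in the comments of Solr's example schema: an explicitly declared field wins, longer patterns are matched before shorter ones, and ties go to the pattern declared first. The field names and types listed are hypothetical examples.

```python
from typing import Optional

# Hypothetical explicitly declared fields; these take precedence over patterns.
EXPLICIT_FIELDS = {"id": "string", "title": "text_general"}

# Hypothetical dynamic field patterns in schema order (one leading or trailing "*").
DYNAMIC_FIELDS = [
    ("*_i", "int"),
    ("*_s", "string"),
    ("*_txt", "text_general"),
    ("attr_*", "text_general"),
]


def matches(pattern: str, name: str) -> bool:
    """True if a field name matches a Solr-style dynamic field pattern."""
    if pattern.startswith("*"):
        return name.endswith(pattern[1:])
    if pattern.endswith("*"):
        return name.startswith(pattern[:-1])
    return pattern == name


def resolve_field_type(name: str) -> Optional[str]:
    """Return the field type a name would get, or None if nothing matches."""
    if name in EXPLICIT_FIELDS:
        return EXPLICIT_FIELDS[name]
    # Longer patterns first; sorted() is stable, so ties keep schema order.
    for pattern, field_type in sorted(DYNAMIC_FIELDS, key=lambda p: -len(p[0])):
        if matches(pattern, name):
            return field_type
    return None
```

With these patterns, `price_i` resolves to `int`, while `attr_size_i` resolves to `text_general`, because `attr_*` is the longer of the two matching patterns.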

In Azure Search, a JSON schema that defines the index is needed. The schema specifies the field-attribute combinations supported in your search application. Fields contain searchable data, such as product names, descriptions, customer comments, brands, prices, promotional notifications, and so forth. Attributes inform the types of operations that can be performed. Examples of the more commonly used attributes include whether a field supports full-text search (searchable=true), filters (filterable=true), or facets (facetable=true).

Azure Search supports the typical enterprise data types, such as Edm.String, Collection(Edm.String), Edm.DateTimeOffset and Edm.Int32.
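A minimal sketch of what such a JSON index definition could look like, built in Python for a hypothetical `products` index. The field names and attribute choices are illustrative assumptions; only the request body is produced here, and the REST call that would submit it (service URL, api-version and api-key) is omitted. The small check encodes the requirement that an index has exactly one key field, of type Edm.String.

```python
import json

# Hypothetical index definition; names and attribute choices are illustrative.
index_definition = {
    "name": "products",
    "fields": [
        {"name": "productId", "type": "Edm.String", "key": True, "searchable": False},
        {"name": "productName", "type": "Edm.String", "searchable": True},
        {"name": "description", "type": "Edm.String", "searchable": True},
        {"name": "brand", "type": "Edm.String", "searchable": True,
         "filterable": True, "facetable": True},
        {"name": "tags", "type": "Collection(Edm.String)", "searchable": True,
         "facetable": True},
        {"name": "unitsInStock", "type": "Edm.Int32", "filterable": True, "sortable": True},
        {"name": "lastUpdated", "type": "Edm.DateTimeOffset", "filterable": True,
         "sortable": True},
    ],
}


def key_field_name(definition: dict) -> str:
    """Return the single key field, checking that it is an Edm.String."""
    keys = [f for f in definition["fields"] if f.get("key")]
    if len(keys) != 1:
        raise ValueError("an index needs exactly one key field")
    if keys[0]["type"] != "Edm.String":
        raise ValueError("the key field must be of type Edm.String")
    return keys[0]["name"]


# Serialized body that would be sent when creating the index.
body = json.dumps(index_definition, indent=2)
```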

At this time there is no clear documentation on schemaless operation in Azure Search, but in most cases this can be worked around with appropriate field naming conventions.
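One way such a naming convention could work, sketched under assumptions: borrow Solr-style suffixes (the `_i`, `_dt` and `_ss` suffixes and the attribute choices below are hypothetical) and generate explicit Azure Search field definitions from the field names found in the source data before the index is created. The index schema is still explicit, so any newly seen field has to be added to the index definition before documents containing it are uploaded.

```python
# Hypothetical suffix convention, borrowed from Solr's dynamic field style.
SUFFIX_RULES = [
    ("_dt", "Edm.DateTimeOffset"),
    ("_ss", "Collection(Edm.String)"),
    ("_i", "Edm.Int32"),
]


def fields_from_names(names, key_field):
    """Generate Azure Search field definitions from conventionally named fields."""
    fields = []
    for name in sorted(set(names)):
        if name == key_field:
            fields.append({"name": name, "type": "Edm.String", "key": True})
            continue
        # First matching suffix wins; unmatched names default to Edm.String.
        edm_type = next((t for suffix, t in SUFFIX_RULES if name.endswith(suffix)),
                        "Edm.String")
        field = {"name": name, "type": edm_type}
        if edm_type in ("Edm.String", "Collection(Edm.String)"):
            field["searchable"] = True
        else:
            field["filterable"] = True
            field["sortable"] = True
        fields.append(field)
    return fields


# Hypothetical source records; the union of their keys drives the schema.
sample_documents = [
    {"id": "1", "title": "Laptop", "stock_i": 12, "tags_ss": ["computers"]},
    {"id": "2", "title": "Phone", "added_dt": "2014-09-01T00:00:00Z"},
]
fields = fields_from_names({n for doc in sample_documents for n in doc}, key_field="id")
```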

Document Ingestion (Loading)

Solr provides command line utilities that will help in loading the documents.

There is also a web service API that can be invoked to update and delete specific documents.

The Solr schema defines a primary key (the uniqueKey field) for the document collection, which Solr uses to decide whether an incoming document updates an existing one.
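As a sketch of the web service route, the Python below builds (but does not send) requests for Solr's JSON update handler: an add request whose body is a JSON array of documents, and a delete-by-id request. It assumes a core named `products` on a local server and the `id` uniqueKey used by Solr's example schema; both are placeholders. Because Solr overwrites an existing document with the same uniqueKey, re-adding a document with an existing `id` acts as an update.

```python
import json
import urllib.request

# Hypothetical core URL; adjust host, port and core name for a real installation.
SOLR_UPDATE_URL = "http://localhost:8983/solr/products/update?commit=true"
UNIQUE_KEY = "id"  # assumed uniqueKey, as in Solr's example schema


def _json_post(payload):
    """Wrap a JSON payload in a POST request to the update handler."""
    return urllib.request.Request(
        SOLR_UPDATE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def add_request(docs):
    """Build an add request; a document whose uniqueKey exists replaces the old one."""
    for doc in docs:
        if UNIQUE_KEY not in doc:
            raise ValueError(f"document is missing the uniqueKey field {UNIQUE_KEY!r}")
    return _json_post(list(docs))


def delete_request(ids):
    """Build a delete-by-id request."""
    return _json_post({"delete": list(ids)})


# Against a running Solr server, urllib.request.urlopen(request) would send it.
```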

In Azure Search, documents can be uploaded to, merged into, or deleted from a specified index using HTTP POST. For large numbers of updates, batching documents (up to 1000 documents per batch, or about 16 MB per batch) is recommended.
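The batching guidance can be sketched as a generator that tags each document with an `@search.action` and splits the stream into request bodies of at most 1000 documents and roughly 16 MB. The `{"value": [...]}` envelope follows the shape of the Azure Search indexing request; the size check is an approximation that counts only the serialized documents, not the envelope, and the document fields are hypothetical.

```python
import json
from typing import Dict, Iterable, Iterator, List

MAX_DOCS_PER_BATCH = 1000
MAX_BATCH_BYTES = 16 * 1024 * 1024  # "about 16 MB" per batch


def index_batches(docs: Iterable[Dict], action: str = "upload") -> Iterator[Dict[str, List[Dict]]]:
    """Yield request bodies of the form {"value": [...]} within the batch limits."""
    batch: List[Dict] = []
    batch_bytes = 0
    for doc in docs:
        item = {"@search.action": action, **doc}
        item_bytes = len(json.dumps(item).encode("utf-8"))
        # Close the current batch if this item would break either limit.
        if batch and (len(batch) >= MAX_DOCS_PER_BATCH
                      or batch_bytes + item_bytes > MAX_BATCH_BYTES):
            yield {"value": batch}
            batch, batch_bytes = [], 0
        batch.append(item)
        batch_bytes += item_bytes
    if batch:
        yield {"value": batch}
```

For example, 2500 small documents produce three request bodies holding 1000, 1000 and 500 documents.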

As in Solr, each document in the request payload contains a key field ("key_field_name") that uniquely identifies the document for update requests.

For the upload action, Azure Search behaves like an "upsert": the document is inserted if it is new and replaced if it already exists. Note that all fields are replaced in the update case.
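The difference between upload and merge is easy to miss, so here is a simplified in-memory model of the three actions mentioned above. It is not the service's implementation: it assumes a key field named `id`, treats upload as a full replacement (fields omitted from the new document disappear), treats merge as updating only the supplied fields of an existing document, and raises an error when merging into a document that does not exist.

```python
from typing import Any, Dict

Index = Dict[str, Dict[str, Any]]


def apply_action(index: Index, action: str, doc: Dict[str, Any], key: str = "id") -> None:
    """Apply one indexing action to an in-memory index (simplified model)."""
    doc_key = doc[key]
    if action == "upload":
        # Insert or replace: the stored document becomes exactly the new one.
        index[doc_key] = dict(doc)
    elif action == "merge":
        # Update only the supplied fields; the document must already exist.
        if doc_key not in index:
            raise KeyError(f"cannot merge into missing document {doc_key!r}")
        index[doc_key].update(doc)
    elif action == "delete":
        index.pop(doc_key, None)
    else:
        raise ValueError(f"unknown action {action!r}")


index: Index = {}
apply_action(index, "upload", {"id": "1", "name": "Laptop", "price": 900})
apply_action(index, "upload", {"id": "1", "name": "Laptop Pro"})  # price is dropped
apply_action(index, "merge", {"id": "1", "price": 1200})           # name is kept
```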

Searching Documents

Solr is built for searching and hence has a rich set of features to support it:

  • Faceted searching based on unique field values, explicit queries, date ranges, numeric ranges or pivots
  • Spelling suggestions for user queries
  • Auto-suggest functionality for completing user queries
  • Simple join capability between two document types
  • Numeric field statistics such as min, max, average and standard deviation
  • Function queries, which influence the score with user-specified complex functions of numeric fields or query relevancy scores
  • More Like This suggestions for a given document
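To illustrate how several of these features are requested in a single query, the following sketch builds a Solr select URL using standard request parameters (`q`, `fq`, `facet.field`, `hl.fl`, `stats.field`). The core URL and field names are hypothetical, and the default /select handler is assumed.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical core URL; the default /select handler is assumed.
SOLR_SELECT_URL = "http://localhost:8983/solr/products/select"


def solr_query_url(text, filters=(), facet_fields=(), highlight_fields=(),
                   stats_fields=(), rows=10):
    """Build a Solr select URL combining search, faceting, highlighting and stats."""
    params = [("q", text), ("rows", str(rows)), ("wt", "json")]
    params += [("fq", f) for f in filters]
    if facet_fields:
        params.append(("facet", "true"))
        params += [("facet.field", f) for f in facet_fields]
    if highlight_fields:
        params += [("hl", "true"), ("hl.fl", ",".join(highlight_fields))]
    if stats_fields:
        params.append(("stats", "true"))
        params += [("stats.field", f) for f in stats_fields]
    return SOLR_SELECT_URL + "?" + urlencode(params)


url = solr_query_url(
    "laptop",
    filters=["price:[100 TO 1000]"],
    facet_fields=["brand_s", "category_s"],
    highlight_fields=["name", "description"],
    stats_fields=["price"],
)
# Decode the query string again to inspect the repeated parameters.
decoded = parse_qs(urlparse(url).query)
```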

In Azure Search, an application queries its search data by sending a request that includes the service URL and an api-key for authenticating the request, along with a search query formulated either in OData syntax or in a simple query syntax that provides the same functionality. The search engine in Azure Search processes the query and returns the results as a JSON document, which the application can parse and add to its presentation layer.

Azure Search uses a simple query syntax for search text. This syntax is designed to be end-user friendly and is processed in a way that is tolerant to errors.

Azure Search supports a subset of the OData expression syntax for $filter.

Some of the salient features of Solr are also fully supported in Azure Search.

  • Full-text search
  • Scoring profiles
  • Faceted navigation
  • Suggestions for type-ahead or autocomplete
  • Count of the search hits returned for a query
  • Highlighted hits
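A matching sketch for Azure Search builds a query request against an index's `docs` endpoint, passing the key in the `api-key` header and expressing the search text, an OData `$filter`, facets, highlighting and the hit count as query parameters. The service name, index name, key and api-version value are placeholders (the api-version in particular depends on the service version in use), and the request is only constructed, not sent.

```python
import urllib.request
from urllib.parse import parse_qs, urlencode, urlparse


def azure_search_request(service, index, api_key, text, odata_filter=None,
                         facets=(), highlight_fields=(), top=10,
                         api_version="2015-02-28"):  # placeholder version string
    """Build (but do not send) a GET request against an index's docs endpoint."""
    params = [("api-version", api_version), ("search", text),
              ("$top", str(top)), ("$count", "true")]
    if odata_filter:
        params.append(("$filter", odata_filter))
    params += [("facet", f) for f in facets]
    if highlight_fields:
        params.append(("highlight", ",".join(highlight_fields)))
    url = f"https://{service}.search.windows.net/indexes/{index}/docs?{urlencode(params)}"
    # The key travels in the api-key header, not in the URL.
    return urllib.request.Request(url, headers={"api-key": api_key}, method="GET")


req = azure_search_request("my-service", "products", "<query-key>", "laptop",
                           odata_filter="unitsInStock gt 0",
                           facets=["brand", "tags"], highlight_fields=["description"])
decoded = parse_qs(urlparse(req.full_url).query)
```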

Value Proposition for Azure Search
As seen above, Azure Search matches the features of Solr in most respects. However, Solr is a seasoned search engine while Azure Search is still in its preview stage, so some small gaps may surface as we learn to understand and apply Azure Search properly. There is one area, though, where Azure Search may be a real winner for enterprises: 'Scalability & Availability'.

A Solr installation requires a highly competent administrator to ensure that it scales to tens of thousands of documents while searches are load-balanced across multiple nodes and performance is not affected.

Solr provides a number of features to support this level of massive scalability.

When your data is too large for one node, you can break it up and store it in sections by creating one or more shards. Each shard is a portion of the logical index, or core, and consists of the set of all nodes that contain that section of the index.
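To show how documents end up in a particular shard, here is a simplified sketch of hash-range routing. As far as I understand SolrCloud's default router, it hashes each document's unique key with MurmurHash3 and gives each shard a contiguous slice of the 32-bit hash range; this sketch uses SHA-256 from the Python standard library as a stand-in hash, so its shard assignments will not match a real Solr cluster.

```python
import hashlib
from collections import Counter


def shard_for(doc_id: str, num_shards: int) -> int:
    """Pick a shard by splitting the 32-bit hash range into equal contiguous slices."""
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    # Stand-in hash: first 4 bytes of SHA-256 (SolrCloud itself uses MurmurHash3).
    h = int.from_bytes(hashlib.sha256(doc_id.encode("utf-8")).digest()[:4], "big")
    # h lies in [0, 2**32), so each contiguous range of hashes maps to one shard.
    return h * num_shards // 2**32


# Routing 10,000 hypothetical ids across 4 shards spreads them roughly evenly.
distribution = Counter(shard_for(f"doc-{i}", 4) for i in range(10_000))
```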

SolrCloud is the name of a set of new distributed capabilities in Solr. Enabling them through startup parameters lets you set up a highly available, fault-tolerant cluster of Solr servers. Use SolrCloud when you need high-scale, fault-tolerant, distributed indexing and search.

Implementing and maintaining SolrCloud requires considerable expertise from administrators.

Azure Search, however, makes scalability much simpler. When a new Azure Search service is provisioned, the following building blocks are managed automatically. A Standard search service is allocated in user-defined bundles of partitions (storage) and replicas (service workloads), and you can scale partitions or replicas independently, adding more of whichever resource is needed.

Every search service starts with a minimum of one replica and one partition. If you signed up for dedicated resources using the Standard pricing tier, you can click the SCALE tile in the service dashboard to readjust the number of partitions and replicas used by your service. When you add either resource, the service uses them automatically. No further action is required on your part.

Increasing queries per second (QPS) or achieving high availability is done by adding replicas. Each replica has one copy of an index, so adding one more replica translates to one more index that can be used to service query requests. Currently, the rule of thumb is that you need at least 3 replicas for high availability.

Most applications need more replicas rather than more partitions, because most search workloads fit easily into a single partition, which can support up to 15 million documents. Where a higher document count is required, you can add partitions.
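The sizing rules above reduce to simple arithmetic, sketched here: partitions are driven by document count (up to 15 million documents per partition) and replicas by query load, with at least three replicas when high availability is required. The per-replica QPS figure is a hypothetical input that would have to be measured for your own workload; it is not a published number.

```python
import math

DOCS_PER_PARTITION = 15_000_000  # per-partition document limit quoted above
HA_MIN_REPLICAS = 3               # rule of thumb for high availability


def plan_capacity(doc_count: int, target_qps: float, qps_per_replica: float,
                  high_availability: bool = True) -> tuple:
    """Return (partitions, replicas); qps_per_replica is a measured, assumed input."""
    if qps_per_replica <= 0:
        raise ValueError("qps_per_replica must be positive")
    partitions = max(1, math.ceil(doc_count / DOCS_PER_PARTITION))
    replicas = max(1, math.ceil(target_qps / qps_per_replica))
    if high_availability:
        replicas = max(replicas, HA_MIN_REPLICAS)
    return partitions, replicas
```

For example, 40 million documents at 120 QPS, with a measured 50 QPS per replica, works out to three partitions and three replicas.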

Summary
As always, using a commercial PaaS option comes at a price, but enterprises weigh it against the ease of maintenance and faster time to market that a managed platform offers over a self-maintained product. Also, Azure Search is currently in beta, so deploying mission-critical production applications may have to wait. It is nonetheless worth getting started with pilot projects, and it is in Microsoft's best interest to bring the service up to mission-critical standards quickly.

More Stories By Srinivasan Sundara Rajan

Highly passionate about utilizing Digital Technologies to enable next generation enterprise. Believes in enterprise transformation through the Natives (Cloud Native & Mobile Native).