The great improvements in the capabilities of the Lucene and Solr open-source search technologies have created rapidly growing interest in using them as alternatives to other search applications. As is often the case with open-source technology, online community documentation provides rich details on features and variations, but it rarely provides explicit direction on which technology would be the best choice. So when is Lucene preferable to Solr, and vice versa?
There is no single answer, as Lucene and Solr are complementary technologies that bring very similar underlying capabilities to bear on somewhat distinct problems. Solr is versatile and powerful: a full-featured, production-ready search application server that requires relatively little formal programming. Lucene is a collection of directly callable Java libraries, offering fine-grained control of low-level functions and independence from higher-level protocols.
The functions of Solr and Lucene are highly similar, if not identical. If you are building an application for the enterprise sector, for instance, you will find Solr a close match to your business requirements: it comes ready to run in a servlet container such as Tomcat or Jetty, and ready to scale in a production Java environment. Its RESTful interfaces and XML-based configuration files can greatly accelerate development and maintenance.
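To give a feel for the HTTP interface, here is a minimal sketch of how a client might assemble a Solr query URL using only the Java standard library. The host, the port, the core name `orders` and the field `state` are assumptions made for illustration; `q`, `rows` and `wt` are standard Solr query parameters. The sketch only builds the URL and does not contact a server.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class SolrSelectUrl {
    // Builds "<baseUrl>/<core>/select?<encoded params>" in insertion order.
    static String build(String baseUrl, String core, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(encode(e.getKey()) + "=" + encode(e.getValue()));
        }
        return baseUrl + "/" + core + "/select?" + query;
    }

    // Percent-encodes a single key or value for use in a query string.
    static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "state:CA"); // hypothetical field name
        params.put("rows", "10");    // return at most ten documents
        params.put("wt", "json");    // ask for a JSON response
        // Host, port and core name are assumptions for this sketch.
        System.out.println(build("http://localhost:8983/solr", "orders", params));
    }
}
```

Any HTTP client, in any language, can issue the resulting request, which is a large part of why Solr needs so little custom programming.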
In fact, Lucene programmers have often reported that they find Solr to contain "the same features I was going to build myself as a framework for Lucene, but already very-well implemented." Once you start with Solr, and you find yourself using a lot of the features Solr provides out of the box, you will likely be better off using Solr’s well-organized extension mechanisms instead of starting from scratch using Apache Lucene.
Searching with Solr
Once imported, the data was not very large: only 50GB overall. This, again, could be managed by adjusting the field types, whether data had to be stored or not, and the amount of historical information to be imported. Now that the data was available, searches could be executed against it.
I also found the packaged Schema Browser very handy. Admittedly, the Schema Browser takes a while to process all the fields in the index, so if you have a lot of data you may have to wait. The benefit is that it can answer some of the more common questions, such as: the number of documents per value, which helps with groupings such as types of orders; how many documents actually have parent accounts; how orders are provided by various sending systems; how many orders are for a given state or postal code; etc.
More advanced searches, such as faceted searches, can yield additional insights from the data: which postal codes are responding to which advertising or product promotions; which areas have the most activity for certain types of orders; or how many domains are covered per type of account. And the list goes on.
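Conceptually, a facet is a count of matching documents per distinct value of a field. The sketch below reproduces that idea in plain Java over an in-memory list. It is an illustration only, not how Solr computes facets internally, and the field names `state` and `type` are hypothetical. In Solr itself, the equivalent request would add the `facet=true` and `facet.field` parameters to a query.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;
import java.util.stream.Collectors;

class FacetSketch {
    // Counts documents per distinct value of `field`, highest count first.
    // Documents that lack the field are skipped.
    static Map<String, Long> facetCounts(List<Map<String, String>> docs, String field) {
        Map<String, Long> counts = docs.stream()
                .map(d -> d.get(field))
                .filter(Objects::nonNull)
                .collect(Collectors.groupingBy(v -> v, TreeMap::new, Collectors.counting()));
        // The sort is stable, so ties keep the TreeMap's alphabetical order.
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                        (a, b) -> a, LinkedHashMap::new));
    }

    public static void main(String[] args) {
        List<Map<String, String>> orders = List.of(
                Map.of("state", "CA", "type", "new"),
                Map.of("state", "NY", "type", "renewal"),
                Map.of("state", "CA", "type", "renewal"),
                Map.of("type", "new")); // no state recorded
        System.out.println(facetCounts(orders, "state"));
    }
}
```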
Operationally speaking, the Solr instances were managed in one of two ways: periodic updates from the main production instances, or continual updates, with application code not only adding data to the Oracle database but inserting it into the Solr index as well. Hence the operations against the existing production instances could be managed to minimize impact and eliminate any unnecessary processing.
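The second approach, where application code writes to both the database and the search index, might be structured roughly as follows. This is a sketch under stated assumptions: the `OrderStore` and `SearchIndex` interfaces and the `OrderService` class are hypothetical names invented for illustration, and real implementations would wrap a database client and a Solr client respectively.

```java
import java.util.HashMap;
import java.util.Map;

class DualWriteSketch {
    // Hypothetical abstraction over the system of record (e.g. a database).
    interface OrderStore {
        void save(String id, Map<String, String> order);
    }

    // Hypothetical abstraction over the search index (e.g. a Solr client).
    interface SearchIndex {
        void index(String id, Map<String, String> doc);
    }

    static class OrderService {
        private final OrderStore store;
        private final SearchIndex index;

        OrderService(OrderStore store, SearchIndex index) {
            this.store = store;
            this.index = index;
        }

        // Writes to the system of record first, then to the search index,
        // so the index never holds an order the database does not have.
        void placeOrder(String id, Map<String, String> order) {
            store.save(id, order);
            index.index(id, order);
        }
    }

    public static void main(String[] args) {
        // In-memory maps stand in for the database and the index.
        Map<String, Map<String, String>> db = new HashMap<>();
        Map<String, Map<String, String>> idx = new HashMap<>();
        OrderService service = new OrderService(db::put, idx::put);
        service.placeOrder("o-1", Map.of("state", "CA"));
        System.out.println(db.keySet() + " " + idx.keySet());
    }
}
```

A production version would also need to decide what happens when the second write fails, for example by retrying or by falling back to the periodic update path.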
If, on the other hand, you don't want to make any calls via HTTP, and want all of your resources controlled exclusively by Java API calls that you write, Lucene may be a better choice. Lucene works best when you need to embed a state-of-the-art search engine directly inside a native Java application, assembled and compiled along with the rest of your code. Some programmers set aside the convenience of Solr and choose Lucene instead, in order to control its large set of sophisticated features more directly through low-level access to data and state, for example byte-level manipulation of segments or intervention in data I/O. Investment at this lower level enables the development of extremely sophisticated, cutting-edge text search and retrieval capabilities.
As for features, the latest version of Solr generally encapsulates the latest version of Lucene. As the two are in many ways functional siblings, spending time on gaining a solid understanding of how Lucene works internally can help you understand Apache Solr and the ways it extends Lucene.
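At the heart of Lucene is an inverted index, which maps each term to the documents that contain it. The toy class below sketches that idea in plain Java. It is an illustration of the concept only, not Lucene's actual implementation or API: the class name and its methods are made up, and analysis, scoring and on-disk segments are all left out.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

class TinyInvertedIndex {
    // term -> sorted ids of the documents containing that term
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    // Splits the text on non-word characters and records each lowercased term.
    void add(int docId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Returns the ids of documents containing ALL of the given terms.
    SortedSet<Integer> search(String... terms) {
        SortedSet<Integer> result = null;
        for (String term : terms) {
            SortedSet<Integer> docs = postings.getOrDefault(
                    term.toLowerCase(Locale.ROOT), Collections.emptySortedSet());
            if (result == null) {
                result = new TreeSet<>(docs); // copy so the index is not mutated
            } else {
                result.retainAll(docs);       // intersect posting lists
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.add(1, "Solr wraps Lucene");
        index.add(2, "Lucene is a Java library");
        index.add(3, "Solr runs in Jetty");
        System.out.println(index.search("solr", "lucene"));
    }
}
```

Looking up a term is a single map access regardless of how many documents are indexed, which is what makes this structure suitable for full-text search.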
Conclusion
With these new capabilities, answers to key questions can be found in seconds. Data can be mined quickly, efficiently and flexibly without a lot of specialized training for business users. Additionally, the indexes could be managed in such a way that additional data could be added to increase the scope of analysis, or subsets of data could be indexed and searched for specific business reasons such as service outages or legal requirements.
In the end, the users were quite happy with the new capabilities provided by Solr, which allowed them to address business needs much more quickly and explore patterns that had eluded them before. Operations was happy too, since the new capabilities came with little additional hardware or operational cost.
To learn more about Solr and Lucene, check out the Lucid Imagination website: http://www.lucidimagination.com