Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is suitable for nearly any application that requires full-text search, especially cross-platform applications.
- Apache Lucene is an open source project available for free download.
- Scalable, High-Performance Indexing
  - Over 150GB/hour on modern hardware
  - Small RAM Requirements: only 1MB heap
  - Incremental indexing as fast as batch indexing
  - Index size roughly 20-30% the size of text indexed
- Powerful, Accurate and Efficient Search Algorithms
  - Ranked searching: best results returned first
  - Many powerful query types: phrase queries, wildcard queries, proximity queries, range queries and more
  - Fielded searching (e.g. title, author, contents)
  - Sorting by any field
  - Multiple-index searching with merged results
  - Allows simultaneous update and searching
  - Flexible faceting, highlighting, joins and result grouping
  - Fast, memory-efficient and typo-tolerant suggesters
  - Pluggable ranking models, including the Vector Space Model and Okapi BM25
  - Configurable storage engine (codecs)
- Available as Open Source software under the Apache License, which lets you use Lucene in both commercial and Open Source programs
- 100% pure Java
- Implementations in other programming languages available that are index-compatible
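To make the ranking-model item in the list above more concrete, here is a minimal sketch of the classic Okapi BM25 per-term score in plain Java. It is not Lucene's implementation: the class name `Bm25Sketch`, the parameter values k1 = 1.2 and b = 0.75, and the particular IDF variant are assumptions chosen for illustration.

```java
final class Bm25Sketch {
    // Free parameters of BM25; 1.2 and 0.75 are commonly used defaults (assumed here).
    static final double K1 = 1.2;
    static final double B = 0.75;

    /** Inverse document frequency: terms found in fewer documents weigh more. */
    static double idf(long docCount, long docFreq) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    /** Score contribution of one query term for one document. */
    static double termScore(double termFreq, double docLength, double avgDocLength,
                            long docCount, long docFreq) {
        // Longer-than-average documents are penalized, shorter ones boosted.
        double lengthNorm = 1.0 - B + B * (docLength / avgDocLength);
        // Term frequency saturates: each extra occurrence adds less than the last.
        double tf = (termFreq * (K1 + 1.0)) / (termFreq + K1 * lengthNorm);
        return idf(docCount, docFreq) * tf;
    }

    public static void main(String[] args) {
        // Hypothetical collection: 1000 documents, average length 100 terms.
        double rare = termScore(2, 100, 100, 1000, 10);
        double common = termScore(2, 100, 100, 1000, 500);
        System.out.println("rare term:   " + rare);
        System.out.println("common term: " + common);
    }
}
```

A document's score for a multi-term query is the sum of `termScore` over the query terms. With the same term frequency, a term that occurs in few documents contributes more than a term that occurs in many.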
Lucene vs. Solr
- A simple way to conceptualize the relationship between Solr and Lucene is that of a car and its engine. You can’t drive an engine, but you can drive a car. Similarly, Lucene is a programmatic library which you can’t use as-is, whereas Solr is a complete application which you can use out of the box.
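- Apache Solr is a search server built around Lucene that adds features such as an HTTP API, configuration files and an administration interface. Older Solr releases were packaged as a web application (WAR) that could be deployed in any servlet container, e.g. Jetty, Tomcat, Resin, etc.; since Solr 5.0, Solr ships as a standalone server that bundles its own Jetty.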
- Solr can be installed and used by non-programmers. Lucene cannot.
- Since Solr uses Lucene under the hood, Solr indexes and Lucene indexes are one and the same thing. There is technically no such thing as a Solr index, only a Lucene index created by a Solr instance.
When should I use Lucene?
- If you need to embed search functionality into a desktop application, for example, Lucene is the more appropriate choice.
- If you have highly customized requirements that need low-level access to the Lucene API classes, Solr may be more of a hindrance than a help, since it adds an extra layer of indirection.
How a Search Application Works with Lucene
- Acquire Raw Content
The first step of any search application is to collect the content that is to be searched.
- Build the document
The next step is to build the document(s) from the raw content, in a form that the search application can understand and interpret easily.
- Analyze the document
Before indexing starts, the document is analyzed to determine which parts of the text are candidates for indexing.
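As a rough illustration of the analysis step, the sketch below splits text into tokens, lowercases them and drops a few stop words. In Lucene this work is done by an `Analyzer`; the code here is plain Java rather than Lucene's API, and the class name `AnalyzeSketch` and the stop-word list are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class AnalyzeSketch {
    // Hypothetical stop-word list, chosen for this example only.
    static final Set<String> STOP_WORDS = Set.of("a", "an", "and", "is", "of", "the", "to");

    /** Turns raw text into the list of terms that would be indexed. */
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        // Split on any run of characters that are neither letters nor digits.
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            String token = raw.toLowerCase(Locale.ROOT);
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [art, full, text, search]
        System.out.println(analyze("The Art of Full-Text Search"));
    }
}
```

Real analyzers often do more, for example stemming or synonym handling, but the principle is the same: only the terms that survive analysis end up in the index.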
- Index the document
Once documents are built and analyzed, the next step is to index them so that they can be retrieved by certain keys instead of by scanning their entire content. The indexing process is similar to the index at the end of a book, where common words are listed with their page numbers so that they can be found quickly without searching the whole book.
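The book-index analogy corresponds to a data structure usually called an inverted index: a map from each term to the documents that contain it. The following is a minimal plain-Java sketch of that idea, not Lucene's on-disk format or API; the class name `InvertedIndexSketch` and its methods are hypothetical.

```java
import java.util.Collections;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

final class InvertedIndexSketch {
    // term -> sorted ids of the documents containing it (like word -> page numbers)
    private final Map<String, SortedSet<Integer>> postings = new TreeMap<>();

    /** Adds one document: every word in the text points back to docId. */
    void add(int docId, String text) {
        // \W+ splits on non-word characters; good enough for ASCII sample text.
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    /** Looks a term up directly instead of scanning every document. */
    SortedSet<Integer> lookup(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySortedSet());
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "Lucene is a search library");
        index.add(2, "Solr is built on Lucene");
        System.out.println("lucene -> " + index.lookup("lucene"));
        System.out.println("solr   -> " + index.lookup("Solr"));
    }
}
```

A lookup costs one map access regardless of how many documents were added, which is why searching an index is much faster than scanning the original text.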
- User Interface for Search
Once the index is ready, the application can run searches against it. To let a user search, the application must provide a user interface where the user can enter text and start the search process.
- Build Query
Once a user submits a search request, the application should prepare a Query object from that text, which can then be used to query the index for the relevant details.
- Search Query
Using the Query object, the index is then searched to retrieve the relevant details and the matching documents.
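The two steps of building a query and running it against the index can be sketched in plain Java as follows. This is a toy model under stated assumptions, not Lucene's `Query` or `IndexSearcher` API: here a "query" is simply a list of terms that must all match, and the class `SearchSketch` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class SearchSketch {
    /** Splits text into lowercase word tokens. */
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    /** Toy index: term -> ids of the documents that contain the term. */
    static Map<String, Set<Integer>> buildIndex(Map<Integer, String> docs) {
        Map<String, Set<Integer>> index = new HashMap<>();
        docs.forEach((id, text) -> {
            for (String t : tokens(text)) {
                index.computeIfAbsent(t, k -> new TreeSet<>()).add(id);
            }
        });
        return index;
    }

    /** "Build Query": the user's text becomes a list of required terms. */
    static List<String> buildQuery(String userText) {
        return tokens(userText);
    }

    /** "Search Query": keep only the documents that contain every query term. */
    static Set<Integer> search(Map<String, Set<Integer>> index, List<String> query) {
        Set<Integer> result = null;
        for (String term : query) {
            Set<Integer> docs = index.getOrDefault(term, Set.of());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        if (result == null) {
            return new TreeSet<>(); // empty query matches nothing
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, String> docs = Map.of(
                1, "Lucene is a search library",
                2, "Solr is built on Lucene",
                3, "Jetty is a servlet container");
        Map<String, Set<Integer>> index = buildIndex(docs);
        // prints [1]
        System.out.println(search(index, buildQuery("lucene search")));
    }
}
```

Note that the query text goes through the same tokenization as the documents. If the two sides were processed differently, terms typed by the user would not line up with the terms stored in the index.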
- Render Results
Once the results are received, the application should decide how to show them to the user through the user interface, for example how much information to show at first glance.
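One common way to limit what is shown at first is to display a page of hits, each with a short snippet. The sketch below illustrates that choice in plain Java; the `Hit` type, the `renderPage` method and the output format are assumptions made for this example and are not part of Lucene.

```java
import java.util.ArrayList;
import java.util.List;

final class RenderSketch {
    /** Hypothetical hit type: just a title and the stored body text. */
    static final class Hit {
        final String title;
        final String body;

        Hit(String title, String body) {
            this.title = title;
            this.body = body;
        }
    }

    /** Formats one zero-based page of hits as "rank. title: snippet" lines. */
    static List<String> renderPage(List<Hit> hits, int page, int pageSize, int snippetLength) {
        List<String> lines = new ArrayList<>();
        int from = Math.min(page * pageSize, hits.size());
        int to = Math.min(from + pageSize, hits.size());
        for (int i = from; i < to; i++) {
            Hit hit = hits.get(i);
            // Show only the start of the body so the first view stays compact.
            String snippet = hit.body.length() <= snippetLength
                    ? hit.body
                    : hit.body.substring(0, snippetLength) + "...";
            lines.add((i + 1) + ". " + hit.title + ": " + snippet);
        }
        return lines;
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(
                new Hit("Lucene in brief", "Lucene is a search library written in Java."),
                new Hit("Solr in brief", "Solr is a search server built on Lucene."),
                new Hit("Jetty", "Jetty is a servlet container."));
        renderPage(hits, 0, 2, 20).forEach(System.out::println);
    }
}
```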
Lucene’s Role in a Search Application
In a nutshell, Lucene is the heart of a search application built on it and provides the vital operations for indexing and searching.