When you run a search, each matching study receives a ranking score that indicates how closely its content matches your search terms. Results are sorted by this score, so the most relevant documents appear first. Our app uses Elasticsearch, a search engine designed to find and sort data in near real time. Elasticsearch assigns ranking scores based on how often the search terms appear in a document and how rare, and therefore informative, those terms are across the dataset. It also accounts for document length, so that longer documents do not outrank shorter, more concise ones simply because they contain more words.
In more detail, Elasticsearch computes ranking scores by default with the BM25 (Best Matching 25) algorithm. BM25 is a probabilistic relevance model that combines term frequency (TF), inverse document frequency (IDF), and field-length normalisation. A document scores higher when it contains more occurrences of a search term, although each additional occurrence adds less than the one before, and when that term is rare in the dataset and thus more informative. Elasticsearch ranks in stages: it first uses an inverted index to find the documents that contain the search terms, and then applies the BM25 score to those matches. It also allows custom adjustments to the score, such as boosting by recency, term proximity, or geolocation. Together, these mechanisms let Elasticsearch return contextually relevant results, which makes it a robust backbone for the BIOMATDB search functionality.
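
To make the interaction of TF, IDF, and length normalisation concrete, here is a minimal Python sketch of the textbook BM25 formula. It is an illustration only, not the code Elasticsearch runs: the function name `bm25_scores`, the toy documents, and the whitespace tokenisation are assumptions made for this example, and the implementation inside Elasticsearch may differ in constant factors. The values `k1=1.2` and `b=0.75` are the commonly cited defaults.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenised document in `docs` against `query_terms`.

    Hypothetical helper for illustration; not part of any library.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        counts = Counter(d)
        # Field-length normalisation: above 1 for longer-than-average documents.
        length_norm = 1 - b + b * len(d) / avg_len
        score = 0.0
        for term in query_terms:
            tf = counts[term]
            if tf == 0:
                continue
            # IDF: rarer terms contribute more to the score.
            n_t = doc_freq[term]
            idf = math.log(1 + (n_docs - n_t + 0.5) / (n_t + 0.5))
            # TF with saturation: repeated occurrences give diminishing returns.
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


# Toy corpus (invented for this example), tokenised on whitespace.
docs = [
    "titanium implant coating study".split(),
    "titanium titanium alloy corrosion in long term implant follow up study".split(),
    "hydrogel scaffold for cartilage repair".split(),
]
scores = bm25_scores(["titanium", "coating"], docs)
```

In this toy corpus the first document ranks highest: it is short and contains the rarer term "coating". The second document mentions "titanium" twice but is much longer and lacks "coating", so it scores lower. The third document contains neither term and scores zero.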