Thursday, January 23, 2025

Juicebox recruits Amazon OpenSearch Service to improve talent search


This post is co-written by Ishan Gupta, Co-Founder and CTO of Juicebox.

Juicebox is an AI-powered talent search engine that uses advanced natural language models to help recruiters identify the best candidates from a vast dataset of over 800 million profiles. At the core of this functionality is Amazon OpenSearch Service, which provides the backbone of Juicebox’s search infrastructure and lets us combine traditional full-text search with modern semantic search.

In this post, we share how Juicebox uses OpenSearch Service to improve search.

Challenges in recruiting

Recruiting search engines traditionally rely on simple Boolean or keyword-based searches. These methods are not effective at capturing the nuance and intent behind complex queries, and often return large volumes of irrelevant results. Recruiters then spend unnecessary time filtering through those results.

Furthermore, recruiting search engines often struggle to scale with large datasets, leading to latency issues and performance bottlenecks as more data is indexed. At Juicebox, with a database growing to over a billion documents and millions of profiles searched per minute, we needed a solution that could not only handle large-scale data ingestion and querying, but also support contextual understanding of complex queries.

Solution overview

The following diagram illustrates the solution architecture.

OpenSearch Service securely unlocks real-time search, monitoring, and analysis of business and operational data for use cases such as application monitoring, log analytics, observability, and website search. You send search documents to OpenSearch Service and retrieve them with search queries that match text and vector embeddings for fast, relevant results.

At Juicebox, we solved five challenges with Amazon OpenSearch Service, which we discuss in the following sections.

Problem 1: High latency in candidate search

Initially, we faced significant delays in returning search results because of the scale of our dataset, especially for complex semantic queries that require deep contextual understanding. Other full-text search engines couldn’t meet our speed or relevance requirements when it came to understanding the recruiter’s intent behind each search.

Solution: BM25 for fast and accurate full-text search

The OpenSearch Service BM25 algorithm quickly proved invaluable, allowing Juicebox to optimize full-text search performance while maintaining accuracy. Through keyword relevance scoring, BM25 ranks profiles by how likely they are to match the recruiter’s query. This optimization reduced our average query latency from around 700 milliseconds to 250 milliseconds, allowing recruiters to retrieve relevant profiles much faster than with our previous search implementation.

With BM25, we saw a nearly three-fold reduction in latency for keyword-based searches, improving the overall search experience for our users.
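To make the ranking idea concrete, here is a minimal sketch of the classic BM25 formula in plain Python. It is not Juicebox’s implementation and not the exact scoring code inside OpenSearch; the whitespace tokenizer, the sample profiles, and the function name `bm25_scores` are assumptions made for illustration. The defaults `k1=1.2` and `b=0.75` are the commonly used values.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with the classic BM25 formula.

    Illustrative only: uses naive lowercase/whitespace tokenization.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term frequency saturates (k1) and is normalized by length (b).
            tf = term_freq[term]
            norm = tf + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical sample profiles, for illustration only.
profiles = [
    "senior data scientist with nlp experience",
    "backend engineer with java experience",
    "data analyst with sql and tableau",
]
scores = bm25_scores("data scientist nlp", profiles)
ranked = sorted(range(len(profiles)), key=lambda i: scores[i], reverse=True)
```

The profile that contains all three query terms ranks first, the one that shares only "data" ranks second, and the profile with no matching terms scores zero.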

Problem 2: Matching intent, not just keywords

In recruiting, exact keyword matching can often cause you to miss qualified candidates. A recruiter searching for “data scientists with NLP experience” might miss candidates with “machine learning” in their profiles, even if they have the right expertise.

Solution: k-NN-powered vector search for semantic understanding

To address this, Juicebox uses k-nearest neighbor (k-NN) vector search. Vector embeddings allow the system to understand the context behind recruiters’ queries and match candidates based on semantic meaning, not just keyword matches. We maintain a billion-scale vector search index that is capable of low-latency k-NN searches, thanks to OpenSearch Service optimizations such as product quantization. The neural search capability allowed us to build a Retrieval Augmented Generation (RAG) pipeline to embed natural language queries before performing the search. OpenSearch Service also lets us tune the hyperparameters of the Hierarchical Navigable Small World (HNSW) algorithm, such as m, ef_search, and ef_construction. This allowed us to meet our latency, recall, and cost goals.

Semantic search, powered by k-NN, allowed us to find 35% more relevant candidates compared to keyword-only searches for complex queries. These searches remained fast and accurate, with vectorized queries achieving 0.9+ recall.
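As an illustration of where the HNSW hyperparameters live, the following sketch builds an index body and a k-NN query body as plain Python dictionaries. The field names `embedding` and `profile_text`, the `faiss` engine, the `l2` space type, and every numeric value are assumptions, not Juicebox’s actual configuration. Where `ef_search` is set can vary by engine and OpenSearch version, so check the documentation for the version you run.

```python
def build_hnsw_index_body(dimension, m=16, ef_construction=128, ef_search=100):
    """Return an index body with a knn_vector field backed by HNSW.

    m: number of graph links per node (higher = better recall, more memory).
    ef_construction: candidate list size at index time (higher = slower build).
    ef_search: candidate list size at query time (higher = slower search).
    All values here are illustrative placeholders.
    """
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                "profile_text": {"type": "text"},  # hypothetical full-text field
                "embedding": {  # hypothetical vector field
                    "type": "knn_vector",
                    "dimension": dimension,
                    "method": {
                        "name": "hnsw",
                        "engine": "faiss",
                        "space_type": "l2",
                        "parameters": {
                            "m": m,
                            "ef_construction": ef_construction,
                            "ef_search": ef_search,
                        },
                    },
                },
            }
        },
    }


def build_knn_query(query_vector, k=10):
    """Return an approximate k-NN query body for the hypothetical embedding field."""
    return {
        "size": k,
        "query": {"knn": {"embedding": {"vector": list(query_vector), "k": k}}},
    }


index_body = build_hnsw_index_body(dimension=4, m=32, ef_construction=256)
query_body = build_knn_query([0.1, 0.2, 0.3, 0.4], k=5)
```

Raising `m` and `ef_construction` generally trades memory and indexing time for recall, while `ef_search` trades query latency for recall, which is the latency, recall, and cost balance described above.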

Problem 3: Difficulty benchmarking machine learning models

Several key performance indicators (KPIs) measure the success of your search. When you use vector embeddings, you have several choices to make when selecting the model, fine-tuning it, and choosing the hyperparameters to use. You should benchmark your solution to make sure you get the right latency, cost, and especially accuracy. Benchmarking machine learning (ML) models for recall and performance is challenging because of the large number of fast-evolving models available (such as those on the MTEB leaderboard on Hugging Face). We faced difficulties in selecting and measuring models accurately while making sure they performed well on large-scale datasets.

Solution: Exact k-NN with scoring script in OpenSearch Service

Juicebox used the exact k-NN with scoring script feature to address these challenges. This feature enables accurate benchmarking by running brute-force nearest neighbor searches and applying filters to a subset of vectors, making sure that recall metrics are accurate. Model testing was simplified by the wide range of pre-trained models and ML connectors (integrated with Amazon Bedrock and Amazon SageMaker) provided by OpenSearch Service. The flexibility to apply custom scoring and filtering scripts helped us confidently evaluate multiple models on high-dimensional datasets.

Juicebox was able to measure model performance with detailed monitoring and achieved a recall of 0.9+. Using exact k-NN allowed Juicebox to benchmark more quickly and reliably, even on billion-scale data, providing the confidence needed for model selection.
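The benchmarking approach can be sketched in two parts: a query body that asks OpenSearch for exact (brute-force) neighbors through the k-NN scoring script, and a recall calculation that compares approximate results against that ground truth. The field names, the `country` filter, and the document IDs below are hypothetical, and this is a sketch of the general technique rather than Juicebox’s benchmarking harness.

```python
def build_exact_knn_query(query_vector, k, filter_clause=None,
                          field="embedding", space_type="cosinesimil"):
    """Return a script_score query body that runs exact k-NN scoring.

    The optional filter narrows the candidate set before the brute-force
    scoring runs, so only the matching subset of vectors is scored.
    """
    if filter_clause:
        base_query = {"bool": {"filter": filter_clause}}
    else:
        base_query = {"match_all": {}}
    return {
        "size": k,
        "query": {
            "script_score": {
                "query": base_query,
                "script": {
                    "source": "knn_score",
                    "lang": "knn",
                    "params": {
                        "field": field,
                        "query_value": list(query_vector),
                        "space_type": space_type,
                    },
                },
            }
        },
    }


def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the exact top-k neighbors that the approximate search returned."""
    truth = set(exact_ids[:k])
    if not truth:
        return 0.0
    hits = sum(1 for doc_id in approx_ids[:k] if doc_id in truth)
    return hits / len(truth)


# Hypothetical filter and result lists, for illustration only.
exact_query = build_exact_knn_query(
    [0.1, 0.2, 0.3], k=5, filter_clause={"term": {"country": "US"}}
)
exact_ids = ["p1", "p2", "p3", "p4", "p5"]   # ground truth from exact k-NN
approx_ids = ["p1", "p3", "p2", "p9", "p5"]  # results from approximate k-NN
recall = recall_at_k(approx_ids, exact_ids, k=5)
```

Here four of the five true neighbors appear in the approximate results, so recall at 5 is 0.8. Averaging this value over many queries gives a recall figure that can be compared across embedding models and HNSW settings.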

Problem 4: Lack of data-driven insights

Recruiters need not only to find candidates, but also to gain insight into broader talent industry trends. Analyzing hundreds of millions of profiles to identify trends in skills, geographies, and industries was a computationally intensive process. Most other search engines that support full-text search or k-NN search don’t support aggregations.

Solution: Advanced aggregations with OpenSearch Service

OpenSearch Service’s powerful aggregation capabilities allowed us to create Talent Insights, a feature that provides recruiters with useful insights from aggregated data. By performing large-scale aggregations across millions of profiles, we identify key skills and hiring trends, and help clients adjust their sourcing strategies.

Aggregation queries now run on over 100 million profiles and return results in less than 800 milliseconds, allowing recruiters to generate insights instantly.
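A terms aggregation is one way to produce this kind of summary. The sketch below builds a query body and flattens the buckets of a response into a dictionary. The index fields `title`, `skills`, and `country` are assumed names (and `skills` and `country` are assumed to be keyword fields), and the response is a hand-written mock, not real Juicebox data.

```python
def build_skill_trend_query(title_keyword, top_n=10):
    """Return a query body that counts top skills and countries for a job title."""
    return {
        "size": 0,  # return only aggregation results, no individual hits
        "query": {"match": {"title": title_keyword}},
        "aggs": {
            "top_skills": {"terms": {"field": "skills", "size": top_n}},
            "by_country": {"terms": {"field": "country", "size": top_n}},
        },
    }


def buckets_to_counts(response, agg_name):
    """Flatten one terms aggregation in a search response into {key: doc_count}."""
    buckets = response.get("aggregations", {}).get(agg_name, {}).get("buckets", [])
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


query_body = build_skill_trend_query("data scientist", top_n=3)

# Mock response in the shape a terms aggregation returns; the numbers are made up.
mock_response = {
    "aggregations": {
        "top_skills": {
            "buckets": [
                {"key": "python", "doc_count": 120},
                {"key": "sql", "doc_count": 95},
                {"key": "nlp", "doc_count": 40},
            ]
        }
    }
}
skill_counts = buckets_to_counts(mock_response, "top_skills")
```

Setting `size` to 0 skips fetching individual profiles, which keeps the request focused on the aggregated counts.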

Problem 5: Streamlining data ingestion and indexing

Juicebox continuously ingests data from multiple sources across the web, reaching terabytes of new data per month. We needed a robust data pipeline to ingest, index, and query this data at scale without performance degradation.

Solution: Scalable data ingestion with Amazon OpenSearch Ingestion pipelines

Using Amazon OpenSearch Ingestion, we implemented scalable pipelines. This allowed us to efficiently process and index hundreds of millions of profiles each month without worrying about pipeline failures or system bottlenecks. We use AWS Glue to preprocess data from multiple sources, chunk it for optimal processing, and feed it into our indexing pipeline.
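The chunking step can be illustrated without any AWS services. The sketch below splits a stream of records into fixed-size batches and formats each batch as a newline-delimited bulk request body, the shape the OpenSearch bulk API expects. It is not an OpenSearch Ingestion pipeline definition or an AWS Glue job; the index name `profiles`, the `profile_id` field, and the batch size are assumptions for illustration.

```python
import json
from itertools import islice


def chunked(records, size):
    """Yield successive lists of at most `size` records from any iterable."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def to_bulk_body(index_name, batch, id_field="profile_id"):
    """Format a batch as a bulk body: one action line, then one document line."""
    lines = []
    for record in batch:
        lines.append(json.dumps({"index": {"_index": index_name, "_id": record[id_field]}}))
        lines.append(json.dumps(record))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


# Hypothetical records, for illustration only.
profiles = [{"profile_id": str(i), "title": "engineer"} for i in range(5)]
bodies = [to_bulk_body("profiles", batch) for batch in chunked(profiles, 2)]
```

Five records with a batch size of two produce three bulk bodies, the last holding a single document. Bounded batches keep each request small, so a failed batch can be retried without resending the whole dataset.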

Conclusion

In this post, we shared how Juicebox uses OpenSearch Service to improve search. We can now index hundreds of millions of profiles per month, keeping our data fresh and up to date while maintaining real-time availability for searches.


About the authors

Ishan Gupta is the Co-Founder and CTO of Juicebox, an AI-powered recruiting software startup backed by top Silicon Valley investors including Y Combinator, Nat Friedman, and Daniel Gross. He has built search products used by thousands of customers to recruit talent for their teams.

Jon Handler is the Director of Solutions Architecture for Search Services at Amazon Web Services, based in Palo Alto, CA. Jon works closely with OpenSearch and Amazon OpenSearch Service, providing help and guidance to a broad range of customers who have search and log analytics workloads for OpenSearch. Prior to joining AWS, Jon’s career as a software developer included four years of coding a large-scale ecommerce search engine. Jon holds a Bachelor of Arts from the University of Pennsylvania, and a Master of Science and a PhD in Computer Science and Artificial Intelligence from Northwestern University.
