Findegil: an open source ecommerce search engine

Software unlocked

The two obsessions that led me to launch 2701 Labs are (a) how software products change in nature when they incorporate LLMs, and (b) how software development is totally unlocked by coding agents.

This is a story about the latter.

I have been a software engineer for 25 years, working on projects big and small, in teams of one and teams of a hundred. Yet I feel I can move and create software faster than ever before.

When I read the detailed post by Mercadona Tech’s CTO on their quest to vibe-code a search engine that worked for them, I decided to build an open-source, clean-room reference implementation, something I would never have been able to do before.

Full disclosure: I don’t work at Mercadona Tech, but I do have history with the project, since we helped them build their initial team back in 2017. Also, Jose, their CTO, is a very good friend of mine.

For those outside Spain: Mercadona is the country’s largest supermarket chain, by a lot.

In their post, Jose detailed why, after years using a commercial, SaaS search engine service, they decided to build their own.

Their main point is that for years they put up with a generic search engine, which is not necessarily the best fit for groceries (short titles, short descriptions, etc.), even while seeing a significant share (~4%) of queries return no results and paying a steep cost for their search volume. They were happy putting up with it, until they weren’t.

They gained the confidence to make the switch from a vibe-coded experiment over a weekend, built out from there, and reached a bespoke architecture that worked for them, at a fraction of the cost and with significantly more flexibility.

In a world where advanced coding agents are readily available, they could use their intuition and experience to build something they couldn’t, or wouldn’t, do before.

The beautiful thing is that they publicly documented their algorithms, tech stack and, critically, their governance rules. With that blueprint, I decided to build Findegil, a clean-room reference implementation of their search engine.

Findegil at 10,000 feet

Findegil is a small, fast, transparent e-commerce search engine. It is an open-source package built on a simple premise: the vast majority of e-commerce catalogs fit comfortably in the RAM of a single CPU machine.

It executes a four-stage process (as per the Mercadona playbook):

  1. Normalize the query.
  2. Run two retrievers in parallel (BM25 and e5-small embeddings).
  3. Fuse the results with Reciprocal Rank Fusion (RRF) and filter by a tenant bitset (if applicable).
  4. Rerank the final candidates with CatBoost YetiRank, a Learning-To-Rank algorithm.
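To make stage 3 concrete, here is a minimal sketch of Reciprocal Rank Fusion in Python. The function name and the toy product lists are illustrative, not Findegil’s actual API; the standard RRF constant k=60 is assumed.

```python
def rrf_fuse(bm25_ids, semantic_ids, k=60):
    """Reciprocal Rank Fusion: score each doc by summing 1/(k + rank)
    over the ranked lists it appears in (ranks are 1-based)."""
    scores = {}
    for ranked in (bm25_ids, semantic_ids):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

# "milk" ranks high in both lists, so it leads the fused ranking
rrf_fuse(["milk", "oat-milk", "cream"], ["milk", "yogurt", "oat-milk"])
```

The appeal of RRF is that it needs no score calibration between the lexical and semantic retrievers; only ranks matter, which makes fusing BM25 scores with cosine similarities trivial.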

The headline feature is extreme efficiency without sacrificing relevance. It delivers sub-15ms p99 latency on a single CPU core while keeping top-tier ranking quality (0.922 MRR@10 on the WANDS dataset). There are no GPUs, no external clusters, and no vector databases. To be clear about what this is not: it is not a general-purpose document search engine; it is tailored to a specific set of use cases around e-commerce and online grocery stores.

Deep dive and drifts from the original playbook

The implementation closely follows the original playbook, adapted and generalised for an open source project.

  • Normalization: I enforce strict NFKC unicode normalization, casefolding, and whitespace collapse. This runs on the exact same code path at index time and query time.
  • Lexical and Semantic Layers: For the lexical layer, I use tantivy-py embedded in-process. There is a slight forced drift here: tantivy-py does not expose BM25 parameters and they are fixed at (k1=1.2, b=0.75), which is not necessarily great for short documents, but we can probably fix it in the future. For semantic search, I use the wonderful e5-small exported to ONNX INT8.
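The normalization step is small enough to sketch in full. This is an illustrative version of the idea (NFKC, casefold, whitespace collapse), assuming Python’s standard library; it is not Findegil’s exact code.

```python
import unicodedata

def normalize(text: str) -> str:
    """Same code path at index time and query time:
    NFKC unicode normalization, casefolding, whitespace collapse."""
    text = unicodedata.normalize("NFKC", text)
    text = text.casefold()
    return " ".join(text.split())

# NFKC maps the no-break space to a regular space, which then collapses
normalize("  Café\u00A0con   LECHE ")  # → "café con leche"
```

Running the identical function on both documents and queries is the whole point: any asymmetry between index-time and query-time text processing silently degrades lexical recall.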

Where Findegil diverges: The biggest difference is data. Mercadona had years of private click logs, which I did not have when building this. Findegil ships with a synthetic click pipeline derived from Amazon ESCI labels, running an Inverse Propensity Weighting (IPW) recovery test to assert that a model trained on these synthetic clicks matches a model trained on perfect labels.
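To give a flavor of what IPW does, here is a minimal sketch. The propensity values and function name are hypothetical, chosen only for illustration; they are not the actual pipeline’s numbers.

```python
# Illustrative sketch of Inverse Propensity Weighting: clicks are biased
# by display position (users rarely examine low-ranked results), so each
# click is reweighted by 1 / P(examined | position).

# Hypothetical examination propensities per display position
PROPENSITY = {1: 1.0, 2: 0.6, 3: 0.4, 4: 0.3}

def ipw_label(clicked: bool, position: int) -> float:
    """Unbiased relevance estimate for one impression: a click at a
    rarely-examined position counts for more than one at the top."""
    if not clicked:
        return 0.0
    return 1.0 / PROPENSITY[position]

ipw_label(True, 3)  # → 2.5, upweighted vs. a click at position 1
ipw_label(True, 1)  # → 1.0
```

The recovery test then checks that an LTR model trained on these debiased synthetic clicks ranks like one trained directly on the perfect labels.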

Findegil’s golden set evaluation uses the public WANDS dataset, which is basically the benchmark for this type of project.
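For reference, MRR@10, the metric quoted above, can be sketched as follows; the function name and toy data are illustrative, not the evaluation harness itself.

```python
def mrr_at_10(rankings):
    """Mean Reciprocal Rank@10: for each query, take 1/rank of the
    first relevant result within the top 10 (0 if none), then average.
    `rankings` is a list of (ranked_doc_ids, relevant_id_set) pairs."""
    total = 0.0
    for ranked_ids, relevant in rankings:
        for rank, doc_id in enumerate(ranked_ids[:10], start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break
    return total / len(rankings)

# Two queries: first hit at rank 1 and rank 2 → (1.0 + 0.5) / 2 = 0.75
mrr_at_10([(["a", "b"], {"a"}), (["x", "y"], {"y"})])
```

An MRR@10 of 0.922 therefore means the first relevant product sits at or very near the top for almost every query in the golden set.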

It’s important to note that the YetiRank LTR layer is wired but disabled by default: in practice, users will want to train it on their own data.

Vibecoding or agentic engineering

I would say 99% of the code for Findegil has been written by either Claude Code (Opus 4.7) or Codex (GPT 5.5). But I would not call this a vibe-coded project; I prefer Simon’s take on agentic engineering.

I used Mercadona’s original playbook as a starting point and built the project in phases, with their original CLAUDE.md as a kind of constitution on latency budgets, external dependencies and complexity. And if the agent wants to change an architectural rule, it must write an Architecture Decision Record (ADR) first. You can see the full repo history, too!

Some tips and lessons from the last couple of weeks building Findegil:

  • Writing architectural decisions down up front is definitely worth it. These are the foundations of the project. Steering an agent with experience and intuition is not as easy as people online make it sound.
  • Tight feedback loops via CI gates (latency tests, golden set regressions and clear metrics) are like super powers for the agent, because it can iterate by itself until done with something.
  • I followed a phased roadmap (Phase 0 through 4) with evaluation files at the end of each phase. I wonder if this was the fastest way to reach my end goal; next time I will probably build an end-to-end PoC and grow from there.
  • I can’t really decide if I like Codex or Claude Code more, yet. I use them in a pretty barebones fashion: no skills, no plugins, just the harness and the model. We’ll see if I stick to one of them or I move towards using Pi and switching between models.
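As an example of what a latency CI gate can look like, here is an illustrative sketch (not Findegil’s actual test harness); the function names are assumptions, and `statistics.quantiles` with n=100 gives an approximate p99.

```python
import statistics
import time

def p99_latency_ms(search_fn, queries, warmup=5):
    """Measure per-query latency for `search_fn` and return ~p99 in ms."""
    for q in queries[:warmup]:
        search_fn(q)  # warm caches before measuring
    samples = []
    for q in queries:
        start = time.perf_counter()
        search_fn(q)
        samples.append((time.perf_counter() - start) * 1000)
    # quantiles(n=100) yields 99 cut points; index 98 is the 99th percentile
    return statistics.quantiles(samples, n=100)[98]

# In CI this becomes a hard gate the agent must keep green, e.g.:
# assert p99_latency_ms(engine.search, golden_queries) < 15.0
```

A gate like this is what lets the agent iterate unattended: a red check is an unambiguous, machine-readable signal that its last change regressed the latency budget.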

What’s next

I would love to see Findegil used and improved for retail and other use cases out there. At the moment it’s something of a research preview, but there is a lot of potential in this type of small, bespoke-ish project, and I am dying to see where we go from here.

If you have a real catalog, I would love to see you clone the repository and report back, especially with real click-log data we can use to train the LTR model!

What a fun few weeks!