Feedback on our instant search experiment

We recently embarked on a two-month experiment to prototype a new search experience for Discourse.

Please give it a test at https://meta.discourse.org/instant-search

Features

  • Fast search

  • Ability to search topics, posts, chat messages and users

    • Posts and Topics results include PMs
    • Chat Messages include private channels and DMs
  • UI-based filters for things like tags, categories, users, inboxes, channels, etc.

  • Keyword, Semantic, Hybrid, and HyDE search modes

FAQ

Search stops working after a while on the page

Indeed it does; please refresh.

It doesn't support our search grammar, like @user or #category

Indeed it doesn't, but it's something that can easily be added if we decide to ship this.
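To illustrate why this is straightforward to add: the grammar tokens can be translated into filter parameters before the query hits the search backend. A minimal sketch, assuming Typesense-style `filter_by` syntax and hypothetical `username`/`category` field names (not the real schema):

```ruby
# Hypothetical sketch: translate Discourse search grammar tokens
# (@user, #category) into a query string plus a Typesense-style
# filter_by expression. Field names are assumptions for illustration.
def translate_search_grammar(raw_query)
  filters = []
  terms = []

  raw_query.split.each do |token|
    case token
    when /\A@(\w+)\z/
      filters << "username:=#{$1}"   # @sam -> filter on author
    when /\A#([\w-]+)\z/
      filters << "category:=#{$1}"   # #bug -> filter on category
    else
      terms << token                 # everything else stays a keyword
    end
  end

  { q: terms.join(" "), filter_by: filters.join(" && ") }
end

translate_search_grammar("@sam #bug search is broken")
# => { q: "search is broken", filter_by: "username:=sam && category:=bug" }
```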

Topic and Post search targets being distinct is a weird choice

I can see why, especially if you are used to how Discourse search has worked for the last decade. If we decide to ship this, we could build a mode that searches both at the same time, or simply run both searches and show both result sets in the UI. For the constraints of this experiment, this was the easiest way to address both use cases:

  • I know that this topic exists and just want to find it (Topic search)
  • I want to research any occurrence of this query (Post search)

Results quality isn't quite there

We barely touched what's possible here; at the moment, we only prioritize categories and assign weights to titles and bodies. Matching the refinement of the existing search would need further tweaks, but it also opens up possibilities to go further. Unfortunately, a lot of this is controlled via the JS API, and the library we are using hamstrung us quite a bit here.

Semantic / HyDE / Hybrid are slow

We added a larger debounce to those modes to work around some annoyances in the JS library we are using. If we decide to ship this, that JS library is the first on the chopping block. As for their overall speed, these modes depend on two sequential requests; the first generates embeddings and runs on ancient hardware on AWS, which doesn't help. We could also inject embeddings at the middleware proxy to cut down latency. Again, experiment time constraints.
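To make the two-request shape concrete: once the embedding for the query has been computed (the slow first request), the hybrid search itself is a single call combining keyword and vector scoring. A sketch of building such a request body, assuming Typesense's `vector_query` syntax and an indexed `embedding` field plus `title`/`body` fields (the field names are assumptions):

```ruby
# Sketch: build the params for one hybrid (keyword + vector) search
# request, given a query embedding computed up front. Injecting that
# embedding at the middleware proxy would remove one round trip.
# Schema field names here are assumptions, not Meta's real schema.
def hybrid_search_params(query, query_embedding, k: 10)
  vector = query_embedding.map { |f| f.round(6) }.join(",")
  {
    q: query,                       # keyword half of the hybrid query
    query_by: "title,body",         # assumed text fields
    vector_query: "embedding:([#{vector}], k:#{k})" # semantic half
  }
end

hybrid_search_params("instant search", [0.1, 0.2], k: 5)
# => { q: "instant search", query_by: "title,body",
#      vector_query: "embedding:([0.1,0.2], k:5)" }
```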

Technical details

  • This experiment uses Typesense, an open-source alternative to Algolia. It runs on an EC2 instance in the same place as everything else on the Meta hosting.

  • The front-end doesn't talk to Typesense directly; instead, all calls are proxied through the Discourse app using a Rack middleware.

  • The search bar, results, and refinements are built with InstantSearchJS wrapped in EmberJS. Unfortunately, this library caused a lot of trouble, and we won't be using it if we ship this.

  • The server uses 7.35 GB of RAM to index all of Meta. Keep in mind that most of that is due to embeddings; without them it would be under 2 GB.
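The proxying idea above can be sketched as a small Rack middleware: requests under a search prefix are forwarded to Typesense server-side, so the API key never reaches the browser. This is a minimal illustration, not Discourse's actual implementation; the path prefix, upstream URL, and header usage are all assumptions:

```ruby
require "net/http"
require "uri"

# Minimal sketch of the proxy: a Rack middleware that intercepts
# search calls and forwards them to Typesense, so the browser never
# talks to Typesense directly. Paths and upstream details are
# assumptions for illustration only.
class TypesenseProxy
  PREFIX = "/instant-search/typesense".freeze

  def initialize(app, upstream: "http://localhost:8108", api_key: "xyz")
    @app = app
    @upstream = upstream
    @api_key = api_key
  end

  def call(env)
    path = env["PATH_INFO"].to_s
    # Anything outside the search prefix passes through untouched.
    return @app.call(env) unless path.start_with?(PREFIX)

    forward(path.delete_prefix(PREFIX), env["rack.input"]&.read.to_s)
  end

  private

  def forward(path, body)
    uri = URI.join(@upstream, path)
    req = Net::HTTP::Post.new(uri)
    req["X-TYPESENSE-API-KEY"] = @api_key   # key stays server-side
    req["Content-Type"] = "application/json"
    req.body = body
    res = Net::HTTP.start(uri.host, uri.port) { |http| http.request(req) }
    [res.code.to_i, { "Content-Type" => "application/json" }, [res.body]]
  end
end
```

Keeping the proxy in the app also makes it a natural place to attach the precomputed query embeddings mentioned above.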



Discuss this on our forum.