NHacker Next
login
▲Index 1.6B Keys with Automata and Rust (2015)burntsushi.net
47 points by djoldman 4 days ago | 6 comments
Loading comments...
scrubs 8 hours ago [-]
A darn good write up! It's clarity is refreshing. Well well done. Thanks for posting.
stuhood 11 hours ago [-]
An interesting exercise would be to compare this with the (confusingly similarly named) `fsst` string compression strategy: https://github.com/cwida/fsst
VivaTechnics 11 hours ago [-]
Impressive! This approach can be applied to designing a NoSQL database. The flow could probably look something like this? Right?

- The client queries for "alice123". - The Query Engine checks the FST Index for an exact or prefix match. - The FST Index returns a pointer to the location in Data Storage. - Data Storage retrieves and returns the full document to the Query Engine.

yazaddaruvala 5 hours ago [-]
What you’ve described is the foundation of Lucene and as such the foundation of Elastic Search.

FSTs are “expensive” to re-optimize and so it’s typically done “without writes”. So the database would need some workaround for that low write throughput.

To save you the time thinking about it: The only extra parts you’re missing are what Lucene calls segments and merge operations. Those decisions obviously have some tradeoffs (in Lucene’s case the tradeoff is CRUD).

There are easily another 100 ways to be creative in these tradeoffs depending on your specific need. However, I wouldn’t be surprised if the super majority of databases’ indexing implementations are roughly similar.

sa-code 4 hours ago [-]
Lucene's WFST is an insanely good and underappreciated in-process key value store. Assuming that you're okay with a 1 hour lag on your data.

Keyvi is also interesting in this regard

yorwba 5 hours ago [-]
That wouldn't be a good idea in most cases due to reasons laid out in the "Not a general-purpose data structure" section. https://burntsushi.net/transducers/#not-a-general-purpose-da...