Replace the per-chunk query cache from []Result slices to fixed-size
bitmaps (ChunkBitmap: [16]uint64 = 128 bytes per entry). Each bit
indicates whether the corresponding item in the chunk matched.
This reduces cache memory by 86 times in testing:
- Old []Result cache: ~22KB per chunk per query (for 500 matches)
- New bitmap cache: ~262 bytes per chunk per query (fixed)
With the reduced per-entry cost, queryCacheMax is raised from
chunkSize/5 to chunkSize/2, allowing broader queries (up to 50% match
rate) to be cached while still using far less memory.
By not storing item index twice, we can cut down the size of Result
struct and now it makes more sense to store and pass Results by values.
Benchmarks show no degradation of performance by additional pointer
indirection for looking up index.
- Make structs smaller
- Introduce Result struct and use it to represent matched items instead of
reusing Item struct for that purpose
- Avoid unnecessary memory allocation
- Avoid growing slice from the initial capacity
- Code cleanup
I profiled fzf and it turned out that it was spending significant amount
of time repeatedly converting character arrays into Unicode codepoints.
This commit greatly improves search performance after the initial scan
by memoizing the converted results.
This commit also addresses the problem of unbounded memory usage of fzf.
fzf is a short-lived process that usually processes small input, so it
was implemented to cache the intermediate results very aggressively with
no notion of cache expiration/eviction. I still think a proper
implementation of caching scheme is definitely an overkill. Instead this
commit introduces limits to the maximum size (or minimum selectivity) of
the intermediate results that can be cached.