In Phase 3 of FuzzyMatchV2, when a cell's left neighbor score is <= 1
and the current character doesn't match the pattern character, the
cell's score is guaranteed to be 0: the gap penalty is either -3 (gap
start) or -1 (gap extension), so the left score plus the penalty is at
most 0, which clamps to 0.
Skip the bonus/gap computation entirely and fast-forward through
consecutive non-matching characters in the dead zone.
This yields 6-11% faster fuzzy searches on typical workloads.
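The following is a minimal Go sketch of the dead-zone skip on a single, simplified score row. It is not fzf's Phase 3 code: the `matches`/`matchScore` inputs are hypothetical stand-ins for the pattern comparison and bonus computation, `rowNaive`/`rowFast` are illustrative names, and the gap penalties are assumed to be -3 (start) and -1 (extension). The fast version should produce the same row while skipping cells.

```go
package main

import "fmt"

// Assumed gap penalties: opening a gap costs 3, extending one costs 1.
const (
	scoreGapStart     int16 = -3
	scoreGapExtension int16 = -1
)

func max16(a, b int16) int16 {
	if a > b {
		return a
	}
	return b
}

// rowNaive computes every cell. A matching cell takes its (hypothetical)
// matchScore; a non-matching cell carries the left score plus a gap
// penalty, clamped at 0.
func rowNaive(matches []bool, matchScore []int16) []int16 {
	row := make([]int16, len(matches))
	var prev int16
	inGap := false
	for j, m := range matches {
		if m {
			row[j] = matchScore[j]
			inGap = false
		} else {
			penalty := scoreGapStart
			if inGap {
				penalty = scoreGapExtension
			}
			row[j] = max16(prev+penalty, 0)
			inGap = true
		}
		prev = row[j]
	}
	return row
}

// rowFast yields the same row but skips the dead zone: when prev <= 1 and
// the cell does not match, prev plus any penalty (at most -1) is <= 0, so
// this cell and every following non-matching cell stay 0.
func rowFast(matches []bool, matchScore []int16) ([]int16, int) {
	row := make([]int16, len(matches)) // zero-initialized
	var prev int16
	inGap := false
	skipped := 0
	for j := 0; j < len(matches); j++ {
		if !matches[j] && prev <= 1 {
			k := j
			for k < len(matches) && !matches[k] {
				k++
			}
			skipped += k - j
			j = k - 1 // the loop increment lands on the next match (or the end)
			prev, inGap = 0, true
			continue
		}
		if matches[j] {
			row[j] = matchScore[j]
			inGap = false
		} else {
			penalty := scoreGapStart
			if inGap {
				penalty = scoreGapExtension
			}
			row[j] = max16(prev+penalty, 0)
			inGap = true
		}
		prev = row[j]
	}
	return row, skipped
}

func main() {
	matches := []bool{true, false, false, false, true, false, false}
	scores := []int16{5, 0, 0, 0, 8, 0, 0}
	fast, skipped := rowFast(matches, scores)
	fmt.Println(rowNaive(matches, scores), fast, skipped)
	// prints [5 2 1 0 8 5 4] [5 2 1 0 8 5 4] 1
}
```

In this toy row only one cell is skipped; the savings grow with long runs of non-matching characters after a low score.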
There is an edge case in FuzzyMatchV1's backward scan related to
normalization: if the string contains a denormalized character (e.g. a
Unicode symbol such as "í"), the backward scan does not match it and
proceeds further to the next char; however, when the score is computed,
the string is normalized first and then scanned against the pattern.
As a result, the pattern index is incremented past the end of the
pattern, which leads to an out-of-bounds index access and a panic.
To illustrate, here is the sequence of operations when a search is
performed:
1. During the backward scan with the "minim" pattern:
```
xxxxx Minímal example
^^^^^^^^^^^^
||||||||||||
miniiiiiiiim <- compute score for this substring
```
2. While computing the score with the "minim" pattern:
```
Minímal exam
minimal exam <- normalize chars before computing the score
^^^^^^
||||||
minim <- at this point the pattern is already fully scanned and the index
is out of bounds
```
This commit normalizes each char during the backward scan so that the
pattern boundaries are detected correctly.
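A minimal Go sketch of a backward scan that normalizes each char before comparing it, assuming a lower-case, already-normalized pattern. `normalizeTable`, `normalizeRune`, and `backwardScan` are hypothetical names; the table covers only a handful of accented letters and is not fzf's actual normalization data. With normalization the scan finds where "minim" starts in "Minímal"; without it, "í" never equals "i" and the boundary is lost.

```go
package main

import (
	"fmt"
	"unicode"
)

// normalizeTable is a hypothetical, tiny stand-in for a real normalization
// table: it folds a few accented Latin letters to their ASCII base letter.
var normalizeTable = map[rune]rune{
	'\u00e1': 'a', // á
	'\u00e9': 'e', // é
	'\u00ed': 'i', // í
	'\u00f3': 'o', // ó
	'\u00fa': 'u', // ú
}

func normalizeRune(r rune) rune {
	if n, ok := normalizeTable[r]; ok {
		return n
	}
	return r
}

// backwardScan walks text from index end down to 0, consuming pattern from
// its last rune to its first, and returns the index where the match starts
// (or -1). With normalize set, each text rune is lower-cased and normalized
// before comparison, mirroring what the scoring pass sees.
func backwardScan(text, pattern []rune, end int, normalize bool) int {
	if len(pattern) == 0 || end < 0 || end >= len(text) {
		return -1
	}
	pidx := len(pattern) - 1
	for i := end; i >= 0; i-- {
		c := unicode.ToLower(text[i])
		if normalize {
			c = normalizeRune(c)
		}
		if c == pattern[pidx] {
			if pidx == 0 {
				return i
			}
			pidx--
		}
	}
	return -1
}

func main() {
	text := []rune("xxxxx Min\u00edmal example")
	pattern := []rune("minim")
	// index 10 is the 'm' right after "Min\u00ed"
	fmt.Println(backwardScan(text, pattern, 10, true))  // 6
	fmt.Println(backwardScan(text, pattern, 10, false)) // -1
}
```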
Find the last occurrence of the last character in the pattern and
perform the search algorithm only up to that point, since no match can
end after it.
The effectiveness of this mechanism depends heavily on the shape of the
input and the pattern.
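A minimal Go sketch of the bound computation, under simplifying assumptions: matching is case-sensitive and the pattern's last character is ASCII, so a byte search suffices. `searchBound` is a hypothetical helper, not fzf's code.

```go
package main

import (
	"fmt"
	"strings"
)

// searchBound returns how many leading bytes of text are worth searching:
// a match must end on the pattern's last character, so nothing after the
// last occurrence of that character can be part of a match. It returns -1
// when the character never occurs (no match is possible).
// Assumes case-sensitive matching and an ASCII last pattern character.
func searchBound(text, pattern string) int {
	if pattern == "" {
		return len(text)
	}
	last := pattern[len(pattern)-1]
	idx := strings.LastIndexByte(text, last)
	if idx < 0 {
		return -1
	}
	return idx + 1
}

func main() {
	text := "foo/bar/baz.go"
	n := searchBound(text, "fb")
	fmt.Println(n, text[:n]) // 9 foo/bar/b
}
```

This also shows why the gain depends on the input: for the pattern "fo", the last 'o' is the final byte of "foo/bar/baz.go", so the bound is the full 14 bytes and nothing is trimmed.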
- Prefix matcher will trim leading whitespace only when the pattern
doesn't start with whitespace
- Suffix matcher will trim trailing whitespace only when the pattern
doesn't end with whitespace
- Equal matcher will trim leading whitespace only when the pattern
doesn't start with whitespace, and trailing whitespace only when the
pattern doesn't end with whitespace
Previously, only the suffix matcher trimmed whitespace, and it did so
unconditionally.
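The trimming rules above can be sketched in Go as follows. This is a simplified illustration with exact, case-sensitive comparison; `trimFor`, `prefixMatch`, `suffixMatch`, and `equalMatch` are hypothetical names and do not reflect fzf's actual matcher signatures.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

func startsWithSpace(s string) bool {
	r, _ := utf8.DecodeRuneInString(s) // RuneError for "", which is not a space
	return unicode.IsSpace(r)
}

func endsWithSpace(s string) bool {
	r, _ := utf8.DecodeLastRuneInString(s)
	return unicode.IsSpace(r)
}

// trimFor trims the requested sides of text, but leaves a side alone when
// the pattern itself starts/ends with whitespace on that side.
func trimFor(text, pattern string, leading, trailing bool) string {
	if leading && !startsWithSpace(pattern) {
		text = strings.TrimLeftFunc(text, unicode.IsSpace)
	}
	if trailing && !endsWithSpace(pattern) {
		text = strings.TrimRightFunc(text, unicode.IsSpace)
	}
	return text
}

func prefixMatch(text, pattern string) bool {
	return strings.HasPrefix(trimFor(text, pattern, true, false), pattern)
}

func suffixMatch(text, pattern string) bool {
	return strings.HasSuffix(trimFor(text, pattern, false, true), pattern)
}

func equalMatch(text, pattern string) bool {
	return trimFor(text, pattern, true, true) == pattern
}

func main() {
	fmt.Println(prefixMatch("  foo bar", "foo"))  // true
	fmt.Println(prefixMatch("  foo bar", " foo")) // false: leading whitespace kept
	fmt.Println(suffixMatch("main.go  ", ".go"))  // true
	fmt.Println(equalMatch("  foo  ", "foo"))     // true
}
```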
Fix #1894
* Remove 1 unused field and 3 unused functions
Unused elements found by running
golangci-lint run --disable-all --enable unused
src/result.go:19:2: field `index` is unused (unused)
index int32
^
src/tui/light.go:716:23: func `(*LightWindow).stderr` is unused (unused)
func (w *LightWindow) stderr(str string) {
^
src/terminal.go:1015:6: func `numLinesMax` is unused (unused)
func numLinesMax(str string, max int) int {
^
src/tui/tui.go:167:20: func `ColorPair.is24` is unused (unused)
func (p ColorPair) is24() bool {
^
* Address warnings from "gosimple" linter
src/options.go:389:83: S1003: should use strings.Contains(str, ",,,") instead (gosimple)
if str == "," || strings.HasPrefix(str, ",,") || strings.HasSuffix(str, ",,") || strings.Index(str, ",,,") >= 0 {
^
src/options.go:630:18: S1007: should use raw string (`...`) with regexp.MustCompile to avoid having to escape twice (gosimple)
executeRegexp = regexp.MustCompile(
^
src/terminal.go:29:16: S1007: should use raw string (`...`) with regexp.MustCompile to avoid having to escape twice (gosimple)
placeholder = regexp.MustCompile("\\\\?(?:{[+sf]*[0-9,-.]*}|{q}|{\\+?f?nf?})")
^
src/terminal_test.go:92:10: S1007: should use raw string (`...`) with regexp.MustCompile to avoid having to escape twice (gosimple)
regex = regexp.MustCompile("\\w+")
^
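A small Go sketch of what the two gosimple fixes look like. `hasCommaRun` is a hypothetical wrapper around the condition quoted in the S1003 warning; the regexps reuse the patterns from the S1007 warnings, rewritten as raw strings so each backslash is written once.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// hasCommaRun (hypothetical name) wraps the condition from the S1003
// warning; strings.Contains replaces the strings.Index(...) >= 0 idiom.
func hasCommaRun(str string) bool {
	return str == "," || strings.HasPrefix(str, ",,") ||
		strings.HasSuffix(str, ",,") || strings.Contains(str, ",,,")
}

var (
	// S1007: in an interpreted string every regexp backslash is doubled...
	wordEscaped = regexp.MustCompile("\\w+")
	// ...whereas a raw string passes it through unchanged.
	wordRaw = regexp.MustCompile(`\w+`)
	// The placeholder pattern from the warning, as a raw string.
	placeholder = regexp.MustCompile(`\\?(?:{[+sf]*[0-9,-.]*}|{q}|{\+?f?nf?})`)
)

func main() {
	fmt.Println(hasCommaRun("a,,,b"), hasCommaRun("a,b")) // true false
	fmt.Println(wordEscaped.String() == wordRaw.String())  // true
	fmt.Println(placeholder.FindString("echo {q}"))        // {q}
}
```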
* Address warnings from "staticcheck" linter
src/algo/algo.go:374:2: SA4006: this value of `offset32` is never used (staticcheck)
offset32, T := alloc32(offset32, slab, N)
^
src/algo/algo.go:456:2: SA4006: this value of `offset16` is never used (staticcheck)
offset16, C := alloc16(offset16, slab, width*M)
^
src/tui/tui.go:119:2: SA9004: only the first constant in this group has an explicit type (staticcheck)
colUndefined Color = -2
^
- Make structs smaller
- Introduce Result struct and use it to represent matched items instead of
reusing Item struct for that purpose
- Avoid unnecessary memory allocation
- Avoid growing slices from their initial capacity
- Code cleanup
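One common way to keep a slice from growing is to allocate it with its final capacity up front. The Go sketch below illustrates that idea; `Result`, its fields, and `collectResults` are hypothetical and do not describe fzf's actual Result struct.

```go
package main

import "fmt"

// Result is a hypothetical, trimmed-down match record; its fields are
// illustrative only.
type Result struct {
	itemIndex int32
	score     int16
}

// collectResults allocates the backing array once with the largest possible
// size, so append never has to grow it and copy elements along the way.
func collectResults(scores []int16) []Result {
	results := make([]Result, 0, len(scores))
	for i, s := range scores {
		if s > 0 {
			results = append(results, Result{itemIndex: int32(i), score: s})
		}
	}
	return results
}

func main() {
	r := collectResults([]int16{3, 0, 5})
	fmt.Println(len(r), cap(r), r[1].itemIndex) // 2 3 2
}
```

The trade-off is that the slice may reserve more memory than it ends up using when few items match, in exchange for a single allocation.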