Fix egregious matching bug

This was introduced six months ago in 301fd66 ("Fix off-by-one error in memoization indexing"). It basically meant that you could repeat any matching character and it would be considered to match, as long as your query string didn't exceed the available space in the haystack. For example, given a file "foobar", the following query strings would all match it: "foo", "fooo", "foooo", "fooooo", "fffbar" etc; but strings such as "ffffbar" and "foooooo" would not (due to being overlength). This is the most egregious bug in the matcher ever, so I'm very grateful to Pavel Sergeev (@dzhioev) for reporting it. Alas, the fix comes with a price; while the fix brings a marginal performance increase for most of our test cases, the "pathological" benchmark gets distinctly slower: Before: user system total real pathological 0.010000 0.000000 0.010000 ( 0.003380) command-t 0.410000 0.000000 0.410000 ( 0.413851) chromium (subset) 1.190000 0.010000 1.200000 ( 0.552776) chromium (whole) 4.540000 0.010000 4.550000 ( 1.557351) After: user system total real pathological 1.750000 0.000000 1.750000 ( 1.751254) command-t 0.380000 0.000000 0.380000 ( 0.375418) chromium (subset) 1.170000 0.010000 1.180000 ( 0.498126) chromium (whole) 4.470000 0.010000 4.480000 ( 1.537441) This reverses the win reported in 301fd66, which at the time I described as "so marked that I am almost suspicious of it". I should have been more suspicious... I'll keep digging into this to see if I can improve the speed of the pathological case, but for now I just want to get this fix out as correctness is more important than speed. Fixes: #82 Signed-off-by: Greg Hurrell <greg@hurrell.net>
wincent · May 24, 2014 · 5621c4e · 5621c4e
1 parent 66d9f6f
commit 5621c4e
Show file tree

Hide file tree

Showing 3 changed files with 17 additions and 5 deletions.
diff --git a/ruby/command-t/match.c b/ruby/command-t/match.c
@@ -1,4 +1,4 @@
-// Copyright 2010-2013 Wincent Colaiuta. All rights reserved.
+// Copyright 2010-2014 Wincent Colaiuta. All rights reserved.
 //
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions are met:
@@ -125,11 +125,10 @@ double recursive_match(matchinfo_t *m,    // sharable meta-data
                 }
 
                 score += score_for_char;
-                last_idx = haystack_idx + 1;
+                last_idx = ++haystack_idx;
                 break;
             }
         }
-
         if (!found) {
             score = 0.0;
             goto memoize;

diff --git a/ruby/command-t/matcher.c b/ruby/command-t/matcher.c
@@ -127,7 +127,6 @@ void *match_thread(void *thread_args)
     return NULL;
 }
 
-
 VALUE CommandTMatcher_sorted_matches_for(int argc, VALUE *argv, VALUE self)
 {
     long i, limit, path_count, thread_count;

diff --git a/spec/command-t/matcher_spec.rb b/spec/command-t/matcher_spec.rb
@@ -1,4 +1,4 @@
-# Copyright 2010-2013 Wincent Colaiuta. All rights reserved.
+# Copyright 2010-2014 Wincent Colaiuta. All rights reserved.
 #
 # Redistribution and use in source and binary forms, with or without
 # modification, are permitted provided that the following conditions are met:
@@ -203,5 +203,19 @@ def ordered_matches(paths, query)
         TODO
       ]
     end
+
+    it "doesn't incorrectly accept repeats of the last-matched character" do
+      # https://github.com/wincent/Command-T/issues/82
+      matcher = matcher(*%w[ash/system/user/config.h])
+      matcher.sorted_matches_for('usercc').should == []
+
+      # simpler test case
+      matcher = matcher(*%w[foobar])
+      matcher.sorted_matches_for('fooooo').should == []
+
+      # minimal repro
+      matcher = matcher(*%w[ab])
+      matcher.sorted_matches_for('aa').should == []
+    end
   end
 end