Knuth-Morris-Pratt vs. Boyer–Moore in LISP.
I’ve ‘implemented’ full (i.e. 3*N) Boyer–Moore substring search algorithm in LISP. But I was lazy to analyze the algorithm and essentially used what Wikipedia provides (with its unoptimized good-suffix-skip array calculation). Word ‘imlemented’ above is in quotes since I would not call this implementation, but just a transposing Wikipedia’s C code into LISP.
Both kmp-search and bm-search were compiled into lisp bytecode (using
(compile-file)) and also compared it agaist compiled naive search algorithm and default CLISP
(search) compiled into native machine code.
Knuth-Morris-Pratt vs. Boyer–Moore in LISP
Test generates 1000-chars string, and then randomly selects 1-to-50 characters substring and calls four different test functions to find out given substring in the generated string.
(get-internal-real-time) times for function calls are summed and next iteration starts. 1000 iterations were passed 10 times to get above graphs.
Boyer-Moore source code below. All sources can be found in the archive.
(defun suffix-match (needle nlen offset suffix-len) (if (> offset suffix-len) (and (char/= (elt needle (- offset suffix-len 1)) (elt needle (- nlen suffix-len 1))) (string= needle needle :start1 (- nlen suffix-len) :end1 nlen :start2 (- offset suffix-len) :end2 offset)) (string= needle needle :start1 (- nlen offset) :end1 nlen :start2 0 :end2 offset))) (defun bm-search (haystack hlen needle nlen) (let ((skip (make-array nlen :element-type 'integer)) (occ (make-array 257 :element-type 'integer :initial-element -1))) (if (or (> nlen hlen) (<= nlen 0) (string= haystack nil) (string= needle nil)) (return-from bm-search nil)) (loop for i from 0 to (1- nlen) do (setf (elt occ (char-code (elt needle i))) i)) (loop for i from 0 to (1- nlen) do (do* ((offs nlen) (smatch (suffix-match needle nlen offs i))) ((or (= offs 0) (not (null smatch))) (setf (elt skip (- nlen i 1)) (- nlen offs))) (decf offs) (setf smatch (suffix-match needle nlen offs i)))) (do ((hpos 0)) ((> hpos (- hlen nlen))) (let ((npos (1- nlen))) (do () ((char/= (elt needle npos) (elt haystack (+ npos hpos)))) (when (= npos 0) (return-from bm-search hpos)) (decf npos)) (setf hpos (+ hpos (max (elt skip npos) (- npos (elt occ (char-code (elt haystack (+ npos hpos))))))))))) nil)
Tested like this (load performed multiple times):
> (compile-file "./string.lisp") ;; Compiling file /root/lisp/string.lisp ... ;; Wrote file /root/lisp/string.fas 0 errors, 0 warnings #P"/root/lisp/string.fas" ; NIL ; NIL > (load "string.fas") ;; Loading file string.fas ... kmp: 279147, bm: 477522, dafault-search: 118057, naive: 1510182 ;; Loaded file string.fas T