ioremap.net

Storage and beyond

Knuth-Morris-Pratt vs. Boyer–Moore in LISP.

I’ve ‘implemented’ full (i.e. 3*N) Boyer–Moore substring search algorithm in LISP. But I was lazy to analyze the algorithm and essentially used what Wikipedia provides (with its unoptimized good-suffix-skip array calculation). Word ‘imlemented’ above is in quotes since I would not call this implementation, but just a transposing Wikipedia’s C code into LISP.

Both kmp-search and bm-search were compiled into lisp bytecode (using (compile-file)) and also compared it agaist compiled naive search algorithm and default CLISP (search) compiled into native machine code.

Knuth-Morris-Pratt vs. Boyer–Moore in LISP.
Knuth-Morris-Pratt vs. Boyer–Moore in LISP

Test generates 1000-chars string, and then randomly selects 1-to-50 characters substring and calls four different test functions to find out given substring in the generated string. (get-internal-real-time) times for function calls are summed and next iteration starts. 1000 iterations were passed 10 times to get above graphs.

Boyer-Moore source code below. All sources can be found in the archive.

(defun suffix-match (needle nlen offset suffix-len)
  (if (> offset suffix-len)
    (and (char/= (elt needle (- offset suffix-len 1))
	        (elt needle (- nlen suffix-len 1)))
	 (string= needle needle :start1 (- nlen suffix-len) :end1 nlen
		  :start2 (- offset suffix-len) :end2 offset))
    (string= needle needle :start1 (- nlen offset) :end1 nlen
	     :start2 0 :end2 offset)))

(defun bm-search (haystack hlen needle nlen)
  (let ((skip (make-array nlen :element-type 'integer))
	(occ (make-array 257 :element-type 'integer :initial-element -1)))
    (if (or (> nlen hlen) (<= nlen 0) (string= haystack nil) (string= needle nil))
      (return-from bm-search nil))
    (loop for i from 0 to (1- nlen) do
	    (setf (elt occ (char-code (elt needle i))) i))
    (loop for i from 0 to (1- nlen) do
	    (do* ((offs nlen)
		  (smatch (suffix-match needle nlen offs i)))
	      ((or (= offs 0) (not (null smatch)))
	       (setf (elt skip (- nlen i 1)) (- nlen offs)))
	      (decf offs)
	      (setf smatch (suffix-match needle nlen offs i))))
    (do ((hpos 0))
      ((> hpos (- hlen nlen)))
      (let ((npos (1- nlen)))
	(do ()
	  ((char/= (elt needle npos) (elt haystack (+ npos hpos))))
	  (when (= npos 0)
	    (return-from bm-search hpos))
	  (decf npos))
	(setf hpos (+ hpos (max (elt skip npos)
                         (- npos (elt occ (char-code (elt haystack (+ npos hpos)))))))))))
  nil)

Tested like this (load performed multiple times):

[1]> (compile-file "./string.lisp")
;; Compiling file /root/lisp/string.lisp ...
;; Wrote file /root/lisp/string.fas
0 errors, 0 warnings
#P"/root/lisp/string.fas" ;
NIL ;
NIL
[2]> (load "string.fas")
;; Loading file string.fas ...
kmp: 279147, bm: 477522, dafault-search: 118057, naive: 1510182
;; Loaded file string.fas
T

Comments are currently closed.

2 Responses to “Knuth-Morris-Pratt vs. Boyer–Moore in LISP.”

  • Anonymous says:

    А почему без (optimize (speed 3) (space 0) (safety 0) (debug 0))) и иже?

  • zbr says:

    А разницы никакой, погрешность от теста к тесту из-за меняющихся данных наверное на порядок больше чем разница в скорости между оптимизированной и неоптимизированной компиляцией.