(818) 986-1436 (818) 986-1360 (FAX)

Copyright (c) 1992 by Nimble Computer Corporation

We show how Baskett's "Puzzle" benchmark can be speeded up at least an order of magnitude by utilizing

It is worth studying packing problems because of their ubiquity in the real world. In addition to the obvious examples from business--e.g., freight loading--there are similarities with real problems in biochemical bonding.

The standard version of Puzzle found in the Gabriel benchmark suite is an embarrassment for the Lisp language because its implementation prefers hacking Fortran-like arrays instead of exploiting Common Lisp's rich set of datatypes and functions [Steele90] to solve the problem in a natural and efficient manner. In particular, the standard Gabriel Puzzle does not take advantage of Common Lisp's excellent bit-vector capabilities [Baker90].

We show how the use of bit-vectors in Common Lisp can speed up "Puzzle" by at least an order of magnitude, and these techniques allow us to achieve on a workstation (80860-based OKIstation(TM)) a speed of 8.3 times the Cray-1 on the old benchmark.

After initialization, the standard Puzzle code linearly searches the puzzle representation for the smallest-numbered empty location. It then tries all of the remaining pieces to see if they can be fit into the puzzle in such a way that this empty location will be filled. If a piece can be fitted, then the code performs a depth-first search for the next empty location and the next piece to be fitted. In many instances, the code will find that it has pieces which cannot be fitted, and initiates a backtrack to remove previously fitted pieces.

The standard code investigates 2,005 placements of the 18 pieces. The speed of the standard code is highly dependent upon the ordering of the pieces, which affects the ordering of the search; a different ordering investigated 10 times as many placements, for example. Interestingly enough, of the 2,005 partially-completed puzzles investigated, 1,565 of them are distinct, meaning that there is little hope of speedup from the "memoization" techniques which have been found effective for other puzzles and games [Bird80] [Baker92]. (The standard implementation of Puzzle investigates surprisingly few configurations, making the ordering of the puzzle pieces appear to have been tampered with to produce shorter searches. See [Beeler84] for deeper analysis of the puzzle itself.)

The standard Gabriel code for Puzzle does not have any errors, but it does show
evidence of a hasty conversion from a non-Lisp language. It cannot decide, for
example, whether to consistently use 0-origin or 1-origin indexing. The
standard code prefers to use the more complex `do` instead of the
simpler `dotimes`, and does not utilize macros like `incf` and
`decf`. None of these stylistic issues should affect performance,
however.

The one obvious stylistic change which could significantly improve performance
occurs at the end of the `place` routine where the puzzle is searched
for the smallest-numbered empty location. The Common Lisp `position`
"sequence" function could be used for this purpose, and it could conceivably
improve performance due to its presumably high level of optimization.

The decision to use bit-vectors in Common Lisp is complicated by the fact that
there are at least 3 different bit-vector models--bit-vectors represented by
bit-arrays, bit-vectors represented by bit "sequences", and bit-vectors
represented by large binary integers. Bit-vectors represented as large binary
integers are *functional*, in that such a bit-vector cannot have a single
bit changed, but the whole bit-vector must be copied. Bit-vectors represented
as bit-arrays can be manipulated in a destructive (imperative) manner, and
*may* therefore have an advantage in reducing garbage collector overhead,
but have some less obvious defects. Bit-vectors represented by binary integers
are only as large as they need to be to represent the highest-numbered bit,
while bit-vectors represented by bit-arrays always occupy their full allocated
length; this difference in sizes can result in higher performance for binary
integers if the integers are often much smaller than the maximum size.

We here discuss only a binary integer version of Puzzle. We suspect that a
bit-array version of Puzzle can be more efficient than one utilizing binary
integer bit-vectors, but the lack of a quick intersection test (e.g.,
`logtest` for binary integers) may scuttle this hope.

Our representation of the puzzle itself is a single binary integer, while the
puzzle piece types are also binary integers. A straight-forward translation
would convert the `puzzle` global vector into a global variable holding
a large binary integer, and convert the global 2-dimensional `p` array
into a global vector of large binary integers. Such a translation would also
place and remove pieces from the puzzle at a high rate; a better solution is to
remember the previous state of the puzzle, which is trivially done by making
`puzzle` into a parameter of the `trial` function. The function
`fit` disappears entirely, to be replaced by `logtest`, while
`place` is accomplished by `logxor`. If we move the empty
location search from the non-existent `place` function to the top of the
recursively called `trial` function, then we no longer have to pass the
address of this empty location as a parameter to `trial`. This search
can be accomplished by the following code, but we will rearrange things so that
a less complex solution can be obtained.

The problem with this solution is that we must construct 2 temporary binary integers before using(defun find-lowest-0-bit (bv) (declare (type (integer 0 *) bv)) (1- (integer-length (logandc2 (1+ bv) bv))))

(defun find-highest-0-bit (bv) (declare (type (integer * (0)) bv)) (1- (integer-length bv)))

With so many different piece "types" to consider, our algorithm should run far
more slowly than the standard code. (Indeed, a preliminary version of this
kind had to be cut off before it finished.) However, since we are trying to
fill the highest-numbered empty position, it is obvious that we should index
these different piece "types" by their largest bit-position, so that
`trial` will consider only the piece types that can actually fill the
empty position. Our index is thus a vector of lists of piece types, which
vector is indexed by the 125 positions in the puzzle; the elements in its last
list, for example, give the piece types which can be used to fill the last
puzzle position. It so happens that the maximum number of elements in any of
these lists is 13, so we could have arranged this information as a 125x13
array. But we are programming in Lisp! Hence, we will keep the
vector-of-lists representation for our index. At the cost of one additional
vector location, we can use a 1-origin for our index and thereby eliminate the
decrement which would otherwise surround every call to `integer-length`.

Given the large increase in types from 13 to 769, it is not obvious whether the 769 piece types and their index can be built fast enough. If we use code similar to that in the standard(defun trial (puzzle) (incf *kount*) (if (eql puzzle -1) t (let* ((j (integer-length puzzle))) (dolist (i (aref index j) nil) (let* ((classi (aref class i)) (ocnt (aref piececount classi))) (unless (zerop ocnt) (let* ((pi (aref p i))) (setf (aref piececount classi) (1- ocnt)) (unless (logtest puzzle pi) (when (trial (logxor puzzle pi)) (format t "Piece ~4D." i) (return-from trial t))) (setf (aref piececount classi) ocnt))))))))

The proper way to build the bit-vector patterns is to build them up recursively
on their 3 dimensions. Since Puzzle utilizes Fortran-like indexing
(fastest-varying-first), we first build the patterns along the `i`
dimension, then the `j` dimension, and finally the `k` dimension.
Once they are built, they can then be shifted into position *in toto*.

(defun definepiece (iclass ii jj kk) ;uses 0-origin indexing. (let* ((iimask (1- (ash 1 ii))) (jjmask 0) (kkmask 0)) (dotimes (j jj) (setf (ldb (byte 5 (* 5 j)) jjmask) iimask)) (dotimes (k kk) (setf (ldb (byte 25 (* 25 k)) kkmask) jjmask)) (dotimes (ioff (- 6 ii)) (dotimes (joff (- 6 jj)) (dotimes (koff (- 6 kk)) (let* ((mask (ash kkmask (+ (* (+ (* koff 5) joff) 5) ioff)))) (push *iii* (aref index (integer-length mask))) (setf (aref p *iii*) mask) (setf (aref class *iii*) iclass) (incf *iii*)))))))

The new Puzzle can almost certainly be speeded up with better implementations
of Common Lisp bit-vector operations on binary integers. In particular, mask
construction using `ldb` and `dpb` must be highly optimized, and
therefore our version should run well on machines like the Symbolics Ivory.

There are two types of potential parallelism in Puzzle--bit-parallelism during
`logtest` and `logxor`, and parallelism due to multiple searches
in parallel. We believe that all of the bit-parallelism has already been
tapped by using bit-vectors, and it is not clear how multiple searches can be
efficiently organized and still be faster than 0.12 second.

We have shown that efficiency and elegance in this algorithm are not unrelated.

Baase, Sara. *Computer Algorithms: Introduction to Design and Analysis*.
Addison-Wesley, 1978.

[Baker90]
Baker, H.G. "Efficient Implementation of Bit-vector Operations in Common
Lisp". ACM *Lisp Pointers 3*,2-3-4 (April-June 1990), 8-22.

[Baker92]
Baker, H.G. "The Gabriel 'Triangle' Benchmark at Warp Speed". ACM *Lisp
Pointers V*,3 (Jul-Sep 1992), 15-17.

Beeler, M. "Beyond the Baskett Benchmark". ACM *Comput. Arch. News 12*,1
(Mar. 1984), 20-31.

Bird, R.S. "Tabulation Techniques for Recursive Programs". ACM* Comp. Surv.
12*,4 (Dec. 1980),403-417.

Gabriel, R.P. *Performance and Evaluation of Lisp Systems*. MIT Press,
Camb., MA, 1985.

[Steele90]
Steele, Guy L. *Common Lisp, The Language; 2nd Ed*. Digital Press,
Bedford, MA, 1990,1029p.

[1] There are some additional minor optimizations, including additional declarations, in this version.