For exact string matching, Tarhio et al. [11] presented a naive search algorithm that uses SIMD instructions. The algorithm compares 16 or 32 characters in parallel, depending on the variant. The name N32 is used for the variant that compares 32 characters in parallel using the AVX2 instruction set [12]. We present N32I, a modification of N32 for searching IUPAC sequences, as Algorithm 5. In N32I, only line 6 differs from the corresponding line of N32.
| Algorithm 5 N32I |
- 1: construct vector V[c] for each c ∈ Σ
- 2: i ← 0; r ← 0
- 3: while i ≤ n − m do
- 4:   M ← 0xFFFFFFFF
- 5:   for j ← 0 to m − 1 do
- 6:     M ← M and SIMDtest(t_{i+j}, V[p_j])
- 7:     if M = 0 then goto out
- 8:   r ← r + popcount(M)
- 9:   out: i ← i + 32
|
The key idea of Algorithm 5 is to test 32 consecutive potential occurrences of the pattern in parallel. For that purpose, a comparison vector is constructed in line 1 for each character of the alphabet; it contains 32 copies of the bit encoding of the character. The algorithm first compares the vector of the first pattern character with 32 consecutive text characters, then compares the vector of the second pattern character with the same window shifted one position to the right, and so on. A bitvector of 32 bits keeps track of the active match candidates. The intrinsic function _mm_popcnt_u32 [12] is used to count the matches in line 8. The SIMDtest function, shown as Algorithm 6, uses six intrinsic functions [12]. The purpose of lines 2–5 of SIMDtest is to replace each character in a chunk of 32 characters with its IUPAC bit representation.
| Algorithm 6 SIMDtest |
- 1: yp = _mm256_loadu_si256(y)
- 2: ap = _mm256_blendv_epi8(
- 3:   _mm256_shuffle_epi8(m1, yp),
- 4:   _mm256_shuffle_epi8(m2, yp),
- 5:   _mm256_cmpgt_epi8(yp, tp))
- 6: xp = _mm256_loadu_si256(x)
- 7: return _mm256_movemask_epi8(
- 8:   _mm256_cmpgt_epi8(
- 9:     _mm256_and_si256(xp, ap), zp))
|
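The variables yp, ap, and xp, as well as the constants m1, m2, zp, and tp, are of type __m256i. The constant zp contains 32 zero bytes, and the constant tp contains 32 ‘O’ characters. The constants m1 and m2 are the lookup tables for the shuffles: m1 maps the lower half-byte of each IUPAC character whose code is below that of ‘O’ to the bit encoding of the character, and m2 does the same for the characters whose codes are above that of ‘O’. Because shuffling operates within 16-byte lanes, both m1 and m2 consist of two identical 16-byte chunks.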
SIMDtest (Algorithm 6) works as follows. In line 1, the next 32 bytes of the text are loaded into yp. The shuffle instructions convert the characters to the bit encoding. Because the shuffle instructions index by the lower half-byte and some IUPAC characters share the same lower half-byte (e.g., D and T), two shuffles are needed. Line 5 forms a mask for the occurrences of R, S, T, V, W, and Y, whose character codes are higher than that of ‘O’. The blend instruction in lines 2–5 combines the results of the two shuffles, taking the result of the m2 shuffle where the mask is set and the result of the m1 shuffle elsewhere. Let us examine how D is transformed. The lower half-byte of D is 4, which selects the fifth byte of m1 counted from the right. That byte is 13, i.e., 1101 in binary, which is the IUPAC bit encoding of D. Finally, the bitwise AND of the blended vector and the pattern vector is tested against zero, and the movemask instruction forms a 32-bit result with one bit per text position.
In addition to SIMD instructions, loop peeling plays a key role in the efficiency of N32I. The term ‘peeling factor’ refers to the number of iterations of the inner loop (executions of line 6) that are moved out of the loop.