SUMMARY

These results are from an NV78 system (ns-x4200-18), before and after a BFU with the AMD64 assembly performance improvements for ARCFOUR and MD5.

ARCFOUR
  openssl speed: speedup ranges from 1.3x (16 bytes) up to 3.4x (8KB)
  Krishna's src4_scale kernel module: speedup ranges from 3.1x for 1 thread down to 2.8x for 4 threads

MD5
  openssl speed: slows down by 8% for 16 bytes and by 4% for 64 bytes, then speeds up 1.06x for 256 bytes, 1.3x for 1KB, and 1.6x for 8KB
  Krishna's smd5_scale kernel module: speedup ranges from 1.7x for 1 thread down to 1.4x for 4 threads
  Krishna's pk11md5perf-64: speedup is 1.5x

BEFORE - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
BEFORE # uname -a;head -1 /etc/release;mount |head -1;grep bfu /etc/motd
BEFORE SunOS ns-x4200-18 5.11 snv_78 i86pc i386 i86pc
BEFORE Solaris Express Community Edition snv_78 X86
BEFORE / on /dev/dsk/c0t0d0s0 on Fri Dec 7 15:34:23 2007
BEFORE
BEFORE - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
BEFORE + /usr/sfw/bin/amd64/openssl speed -evp rc4 -elapsed -engine pkcs11
BEFORE engine "pkcs11" set.
BEFORE You have chosen to measure elapsed time instead of user CPU time.
BEFORE To get the most accurate results, try to run this
BEFORE program when this computer is idle.
BEFORE Doing rc4 for 3s on 16 size blocks: 8446974 rc4's in 2.99s
BEFORE Doing rc4 for 3s on 64 size blocks: 3683741 rc4's in 2.99s
BEFORE Doing rc4 for 3s on 256 size blocks: 1135423 rc4's in 3.00s
BEFORE Doing rc4 for 3s on 1024 size blocks: 301083 rc4's in 3.00s
BEFORE Doing rc4 for 3s on 8192 size blocks: 38309 rc4's in 3.00s
BEFORE OpenSSL 0.9.8a 11 Oct 2005 (+ security patches to 2007-10-13)
BEFORE built on: date not available
BEFORE options:bn(64,64) md2(int) rc4(ptr,char) des(ptr,cisc,16,int) aes(partial) blowfish(ptr)
BEFORE compiler: information not available
BEFORE available timing options: TIMES TIMEB HZ=100 [sysconf value]
BEFORE timing function used: ftime
BEFORE The 'numbers' are in 1000s of bytes per second processed.
BEFORE type              16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
BEFORE rc4              45155.89k    78822.94k    96889.43k   102769.66k   104609.11k
BEFORE
BEFORE + /usr/sfw/bin/amd64/openssl speed -evp md5 -elapsed -engine pkcs11
BEFORE engine "pkcs11" set.
BEFORE You have chosen to measure elapsed time instead of user CPU time.
BEFORE To get the most accurate results, try to run this
BEFORE program when this computer is idle.
BEFORE Doing md5 for 3s on 16 size blocks: 1161394 md5's in 3.00s
BEFORE Doing md5 for 3s on 64 size blocks: 1052811 md5's in 3.00s
BEFORE Doing md5 for 3s on 256 size blocks: 833327 md5's in 3.00s
BEFORE Doing md5 for 3s on 1024 size blocks: 459168 md5's in 3.00s
BEFORE Doing md5 for 3s on 8192 size blocks: 88427 md5's in 3.00s
BEFORE OpenSSL 0.9.8a 11 Oct 2005 (+ security patches to 2007-10-13)
BEFORE built on: date not available
BEFORE options:bn(64,64) md2(int) rc4(ptr,char) des(ptr,cisc,16,int) aes(partial) blowfish(ptr)
BEFORE compiler: information not available
BEFORE available timing options: TIMES TIMEB HZ=100 [sysconf value]
BEFORE timing function used: ftime
BEFORE The 'numbers' are in 1000s of bytes per second processed.
BEFORE type              16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
BEFORE md5               6198.23k    22459.97k    71110.57k   156729.34k   241464.66k
BEFORE
BEFORE
BEFORE - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
BEFORE # modload src4_scale
BEFORE # tail -10 /var/adm/messages
BEFORE Dec 7 16:49:06 ns-x4200-18 src4_scale: [ID 578720 kern.info] NOTICE: Starting 1 threads
BEFORE Dec 7 16:49:06 ns-x4200-18 src4_scale: [ID 368245 kern.info] NOTICE: Encrypted 107758000 bytes/sec
BEFORE Dec 7 16:49:06 ns-x4200-18 src4_scale: [ID 578720 kern.info] NOTICE: Starting 2 threads
BEFORE Dec 7 16:49:06 ns-x4200-18 src4_scale: [ID 368245 kern.info] NOTICE: Encrypted 207468000 bytes/sec
BEFORE Dec 7 16:49:06 ns-x4200-18 src4_scale: [ID 578720 kern.info] NOTICE: Starting 3 threads
BEFORE Dec 7 16:49:06 ns-x4200-18 src4_scale: [ID 368245 kern.info] NOTICE: Encrypted 304878000 bytes/sec
BEFORE Dec 7 16:49:06 ns-x4200-18 src4_scale: [ID 578720 kern.info] NOTICE: Starting 4 threads
BEFORE Dec 7 16:49:07 ns-x4200-18 src4_scale: [ID 368245 kern.info] NOTICE: Encrypted 409836000 bytes/sec
BEFORE
BEFORE # modload smd5_scale
BEFORE # tail -10 /var/adm/messages
BEFORE Dec 7 16:49:16 ns-x4200-18 smd5_scale: [ID 578720 kern.info] NOTICE: Starting 1 threads
BEFORE Dec 7 16:49:16 ns-x4200-18 smd5_scale: [ID 993062 kern.info] NOTICE: Digested 220689000 bytes/sec
BEFORE Dec 7 16:49:16 ns-x4200-18 smd5_scale: [ID 578720 kern.info] NOTICE: Starting 2 threads
BEFORE Dec 7 16:49:16 ns-x4200-18 smd5_scale: [ID 993062 kern.info] NOTICE: Digested 365714000 bytes/sec
BEFORE Dec 7 16:49:16 ns-x4200-18 smd5_scale: [ID 578720 kern.info] NOTICE: Starting 3 threads
BEFORE Dec 7 16:49:16 ns-x4200-18 smd5_scale: [ID 993062 kern.info] NOTICE: Digested 548571000 bytes/sec
BEFORE Dec 7 16:49:16 ns-x4200-18 smd5_scale: [ID 578720 kern.info] NOTICE: Starting 4 threads
BEFORE Dec 7 16:49:16 ns-x4200-18 smd5_scale: [ID 993062 kern.info] NOTICE: Digested 711111000 bytes/sec
BEFORE
BEFORE - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
BEFORE # ./pk11md5perf-64
BEFORE 1 slots were detected
BEFORE 0. Sun Metaslot [0]
BEFORE
BEFORE Overall Throughput:
BEFORE ===================
BEFORE Finished 10000 ops in 50791974 nanosecs (50.79 ms)
BEFORE Data Rate: 196881.50 ops/sec 201606658.93 bytes/second
BEFORE Using: 1 children (each 10000 operations)
BEFORE
BEFORE = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
AFTER = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
AFTER
AFTER # uname -a;head -1 /etc/release;mount |head -1;grep bfu /etc/motd
AFTER SunOS ns-x4200-18 5.11 nv78_ac4md5 i86pc i386 i86pc
AFTER Solaris Express Community Edition snv_78 X86
AFTER / on /dev/dsk/c0t0d0s3 on Fri Dec 7 16:11:45 2007
AFTER bfu'ed from /export/home/tmp/archives.nv78.ac4md5 on 2007-12-07
AFTER - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
AFTER + /usr/sfw/bin/amd64/openssl speed -evp rc4 -elapsed -engine pkcs11
AFTER engine "pkcs11" set.
AFTER You have chosen to measure elapsed time instead of user CPU time.
AFTER To get the most accurate results, try to run this
AFTER program when this computer is idle.
AFTER Doing rc4 for 3s on 16 size blocks: 10695637 rc4's in 2.99s
AFTER Doing rc4 for 3s on 64 size blocks: 7267339 rc4's in 3.00s
AFTER Doing rc4 for 3s on 256 size blocks: 3192053 rc4's in 3.00s
AFTER Doing rc4 for 3s on 1024 size blocks: 984844 rc4's in 3.00s
AFTER Doing rc4 for 3s on 8192 size blocks: 132081 rc4's in 3.00s
AFTER OpenSSL 0.9.8a 11 Oct 2005 (+ security patches to 2007-10-13)
AFTER built on: date not available
AFTER options:bn(64,64) md2(int) rc4(ptr,char) des(ptr,cisc,16,int) aes(partial) blowfish(ptr)
AFTER compiler: information not available
AFTER available timing options: TIMES TIMEB HZ=100 [sysconf value]
AFTER timing function used: ftime
AFTER The 'numbers' are in 1000s of bytes per second processed.
AFTER type              16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
AFTER rc4              57215.04k   155036.57k   272388.52k   336160.09k   360669.18k
AFTER + /usr/sfw/bin/amd64/openssl speed -evp md5 -elapsed -engine pkcs11
AFTER engine "pkcs11" set.
AFTER You have chosen to measure elapsed time instead of user CPU time.
AFTER To get the most accurate results, try to run this
AFTER program when this computer is idle.
AFTER Doing md5 for 3s on 16 size blocks: 1067123 md5's in 3.00s
AFTER Doing md5 for 3s on 64 size blocks: 1005807 md5's in 3.00s
AFTER Doing md5 for 3s on 256 size blocks: 882334 md5's in 3.00s
AFTER Doing md5 for 3s on 1024 size blocks: 590168 md5's in 3.00s
AFTER Doing md5 for 3s on 8192 size blocks: 144608 md5's in 3.00s
AFTER OpenSSL 0.9.8a 11 Oct 2005 (+ security patches to 2007-10-13)
AFTER built on: date not available
AFTER options:bn(64,64) md2(int) rc4(ptr,char) des(ptr,cisc,16,int) aes(partial) blowfish(ptr)
AFTER compiler: information not available
AFTER available timing options: TIMES TIMEB HZ=100 [sysconf value]
AFTER timing function used: ftime
AFTER The 'numbers' are in 1000s of bytes per second processed.
AFTER type              16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
AFTER md5               5695.12k    21457.22k    75292.50k   201444.01k   394876.25k
AFTER
AFTER - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
AFTER
AFTER # modload src4_scale
AFTER # tail -10 /var/adm/messages
AFTER Dec 7 16:38:53 ns-x4200-18 src4_scale: [ID 578720 kern.info] NOTICE: Starting 1 threads
AFTER Dec 7 16:38:53 ns-x4200-18 src4_scale: [ID 368245 kern.info] NOTICE: Encrypted 337837000 bytes/sec
AFTER Dec 7 16:38:53 ns-x4200-18 src4_scale: [ID 578720 kern.info] NOTICE: Starting 2 threads
AFTER Dec 7 16:38:54 ns-x4200-18 src4_scale: [ID 368245 kern.info] NOTICE: Encrypted 602409000 bytes/sec
AFTER Dec 7 16:38:54 ns-x4200-18 src4_scale: [ID 578720 kern.info] NOTICE: Starting 3 threads
AFTER Dec 7 16:38:54 ns-x4200-18 src4_scale: [ID 368245 kern.info] NOTICE: Encrypted 852272000 bytes/sec
AFTER Dec 7 16:38:54 ns-x4200-18 src4_scale: [ID 578720 kern.info] NOTICE: Starting 4 threads
AFTER Dec 7 16:38:54 ns-x4200-18 src4_scale: [ID 368245 kern.info] NOTICE: Encrypted 1162790000 bytes/sec
AFTER
AFTER
AFTER # modload smd5_scale
AFTER # tail -10 /var/adm/messages
AFTER Dec 7 16:39:28 ns-x4200-18 smd5_scale: [ID 578720 kern.info] NOTICE: Starting 1 threads
AFTER Dec 7 16:39:28 ns-x4200-18 smd5_scale: [ID 993062 kern.info] NOTICE: Digested 376470000 bytes/sec
AFTER Dec 7 16:39:28 ns-x4200-18 smd5_scale: [ID 578720 kern.info] NOTICE: Starting 2 threads
AFTER Dec 7 16:39:28 ns-x4200-18 smd5_scale: [ID 993062 kern.info] NOTICE: Digested 609523000 bytes/sec
AFTER Dec 7 16:39:28 ns-x4200-18 smd5_scale: [ID 578720 kern.info] NOTICE: Starting 3 threads
AFTER Dec 7 16:39:28 ns-x4200-18 smd5_scale: [ID 993062 kern.info] NOTICE: Digested 872727000 bytes/sec
AFTER Dec 7 16:39:28 ns-x4200-18 smd5_scale: [ID 578720 kern.info] NOTICE: Starting 4 threads
AFTER Dec 7 16:39:28 ns-x4200-18 smd5_scale: [ID 993062 kern.info] NOTICE: Digested 984615000 bytes/sec
AFTER - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
AFTER # ./pk11md5perf-64
AFTER 1 slots were detected
AFTER 0. Sun Metaslot [0]
AFTER
AFTER Overall Throughput:
AFTER ===================
AFTER Finished 10000 ops in 32917783 nanosecs (32.92 ms)
AFTER Data Rate: 303787.18 ops/sec 311078068.78 bytes/second
AFTER Using: 1 children (each 10000 operations)
AFTER
AFTER - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
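
The speedups quoted in the summary can be rechecked from the throughput figures in the BEFORE and AFTER logs. The following is a minimal Python sketch, assuming the numbers are transcribed correctly from the output above. The `RESULTS` table, its labels, and the `speedup` helper are illustrative names, not part of openssl or of the benchmark modules.

```python
# Throughput pairs (before, after) copied from the logs above.
# openssl speed values are in 1000s of bytes/sec; kernel module and
# pk11md5perf-64 values are in bytes/sec. Units cancel in the ratio.
RESULTS = {
    "rc4 openssl 16 bytes":     (45155.89, 57215.04),
    "rc4 openssl 8192 bytes":   (104609.11, 360669.18),
    "src4_scale 1 thread":      (107758000, 337837000),
    "src4_scale 4 threads":     (409836000, 1162790000),
    "md5 openssl 16 bytes":     (6198.23, 5695.12),
    "md5 openssl 64 bytes":     (22459.97, 21457.22),
    "md5 openssl 256 bytes":    (71110.57, 75292.50),
    "md5 openssl 1024 bytes":   (156729.34, 201444.01),
    "md5 openssl 8192 bytes":   (241464.66, 394876.25),
    "smd5_scale 1 thread":      (220689000, 376470000),
    "smd5_scale 4 threads":     (711111000, 984615000),
    "pk11md5perf-64":           (201606658.93, 311078068.78),
}


def speedup(before, after):
    """Return the AFTER/BEFORE throughput ratio (values below 1.0 are slowdowns)."""
    return after / before


if __name__ == "__main__":
    for label, (before, after) in RESULTS.items():
        print(f"{label:26s} {speedup(before, after):.2f}x")
```

A ratio below 1.0, as for the 16-byte and 64-byte MD5 cases, corresponds to the slowdowns noted in the summary.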