Second post coming up! This week's discussion is about my work on the math and OpenCL libraries. I'll start with some math.
Math:
So far I've implemented the following:
- Vector, Normal, Point and Vector4 classes
- Matrix4x4 class with all necessary transformations including Inverse, Transpose, Rotate, Scale, etc.
- Matrix4x4 and Vector4 are fully SSE optimized, and the Vector/Normal/Point classes come in both SSE and non-SSE versions (the latter to save memory)
The first (and most important) decision I had to make was: "To SIMD or not to SIMD". I decided on the full SIMD approach. SIMD stands for Single Instruction, Multiple Data: a single CPU instruction is executed, and multiple data elements are processed by that one instruction. Here's some pseudo code to illustrate:
1: Normal version
float x1 = 2.0, x2 = 4.0;
float y1 = 3.0, y2 = 6.0;
float z1 = 4.0, z2 = 8.0;
float w1 = 5.0, w2 = 10.0;
float add = 23.0;
// 8 instructions (excluding =)
float resultX = x1 * x2 + add;
float resultY = y1 * y2 + add;
float resultZ = z1 * z2 + add;
float resultW = w1 * w2 + add;
2: SIMD version
float4 xyzwOne(2.0, 3.0, 4.0, 5.0);
float4 xyzwTwo(4.0, 6.0, 8.0, 10.0);
float4 add(23.0);
// 2 instructions: multiply, add
float4 result = xyzwOne * xyzwTwo + add;
// Or just use 1 instruction: fused multiply-add
float4 result2 = mul_add(xyzwOne, xyzwTwo, add);
For more in-depth detail on this topic:
http://fastcpp.blogspot.de/
http://www.thomasdideriksen.dk/misc/SIMD/sse_in_real_applications.pdf
http://download.intel.com/design/PentiumIII/sml/24504301.pdf
etc...
For the fun of it, I also implemented plain non-SIMD versions of the most common data types, both for memory comparisons and for (speed) testing purposes.
The next step was to test for correctness and speed. The most important and fun part of using SIMD is the potential speed gain! For this I wrote some unit tests comparing various commonly used operations:
- Vector Addition
- Vector Multiplication
- Vector Division
- Dot Product between two vectors
- Cross Product between two vectors
- Matrix4x4 * vector multiplication
Each test runs a tight loop over the operation, e.g. for addition:

Vector vec1(1, 2, 3), vec2(3, 4, 5);
for (int i = 0; i < (1 << power); ++i)  // 2^power iterations
    vec1 = vec1 + vec2;
In total, 10 tests are done for each operation with increasing computational load, i.e. the first run does ~1 million iterations and the last ~1 billion. We time our code and sum the runs into a total time per operation (SIMD and non-SIMD). Finally, we divide the non-SIMD total by the SIMD total to see how many times faster the SIMD version is. I've also made sure that the compiler did not optimize the loops away. Here are the results on my system (the CPU is what matters most here!):
Windows 7, Intel(R) Core(TM) i5-3570K CPU @ 3.40GHz, 8 GB of RAM.
- SIMD / Non-SIMD addition : 5630.32 ms / 6291.36 ms -> SIMD 1.11x faster
- SIMD / Non-SIMD multiplication: 6768.39 ms / 7177.41 ms -> SIMD 1.06x faster
- SIMD / Non-SIMD division : 6857.39 ms / 10171.6 ms -> SIMD 1.48x faster
- SIMD / Non-SIMD dot product : 3508.2 ms / 3502.2 ms -> SIMD 0.998x (marginally slower)
- SIMD / Non-SIMD cross product : 3510.2 ms / 10938.6 ms -> SIMD 3.11x faster
- SIMD / Non-SIMD matrix * vector mul: 14908.9 ms / 18225 ms -> SIMD 1.22x faster
OpenCL:
I will not go into too much detail about OpenCL. The basic idea is that you write platform-independent code that can run on different kinds of "compute" devices such as CPUs, GPUs, etc. These devices can be used in parallel to perform tasks like physics simulations, weather prediction, A.I., you name it! What I've done so far is:
- Detect OpenCL devices on the machine
- Allocate memory on these devices
- Compile an OpenCL program
Some resources I found useful:
- https://www.khronos.org/message_boards/forumdisplay.php/87-OpenCL
- https://www.evl.uic.edu/kreda/gpu/image-convolution/
- http://opencl.codeplex.com/wikipage?title=OpenCL%20Tutorials%20-%200&referringTitle=OpenCL%20Tutorials
- http://developer.amd.com/tools-and-sdks/opencl-zone/opencl-resources/introductory-tutorial-to-opencl/
- http://enja.org/2010/07/13/adventures-in-opencl-part-1-getting-started/
- http://www.cc.gatech.edu/~vetter/keeneland/tutorial-2011-04-14/06-intro_to_opencl.pdf
- etc..
Stay tuned for more next week (ish)! :)