Wednesday, June 2, 2010

I'm just having fun

It's been too long since I've posted on here. I could bore you with how busy I've been, but instead I'll just tell you about some of the exciting things I've been learning. I'm also going to try to mix technical posts with posts aimed at the layman; this post will be the latter.

Due to the Memorial Day holiday, I was in the lab only one day this week. While there, I split my time between two topics. The main topic is CUDA, of course, and the other is the C/C++ programming languages. CUDA programs, as some of you may know, are written and developed in C/C++. This was the first language I learned in high school, but sadly I moved on to object-oriented languages like Java and C#. So, I thought I should spend some time doing some review.

I found this great website that took me through two years of high school computer science in about two hours. I also installed Fedora 12 on my old laptop so I can practice developing on a similar platform at home. It doesn't have a CUDA device installed, but I can use it for C/C++ programming.

Now for what I'm really excited about: CUDA! I spent the rest of the day learning how to organize the multiple threads on a CUDA device. The more I read, the more excited I became. This technology is the future, plain and simple. The power that CUDA gives a developer is really hard for me to imagine or visualize right now, and I will tell you why.

It's all about threads! Think of a thread as a stream of instructions that a compute device interprets. I use the word "interpret" very loosely here, but that is beside the point. The point is that in the old way of doing things, a computer has one CPU that can handle one, two, or maybe four threads at a time. That's it. A CUDA card can handle THOUSANDS, and it can handle all of those threads simultaneously. This is what I was talking about in my first post, and it's kind of old news.

So what makes me so excited about it? Well, hold on and I will tell you. As you may imagine, dealing with thousands of anything can get pretty complicated, and you probably shouldn't just "wing it". Thankfully, the brilliant people over at Nvidia have come up with a very clever way of doing it. That is what excites me: the way they do it.

It goes something like this. When you are writing a program and you come to a point where lots of threads would be useful, you write a CUDA kernel. A kernel is just a set of instructions that gets executed by a thread. Now we need to send this kernel off to a bunch of threads so it can be executed simultaneously. In order to keep our data from getting lost among all of these threads, there must be a way for each thread to distinguish itself from the others. So when we invoke a kernel, we send it off to what is called a grid. A grid is really just the group of threads that will execute this particular kernel. A grid has a 2-dimensional structure. Within this 2D structure, or grid, there are 3D structures called "blocks". The blocks are where the threads live, and you can use the location of each thread within its block, together with the location of each block within the grid, to give every thread a unique ID number. The ID is created using a little mathematical trickery, which is pretty cool in its own right; the sketch below lays out the idea, and I will also post a picture that I drew of it all.
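To make that trickery concrete, here is a minimal sketch of a kernel that works out its own ID. The kernel name whoAmI and the output array ids are my own inventions for illustration, but the built-in variables threadIdx, blockIdx, blockDim, and gridDim really are what CUDA hands every thread; the host-side launch is sketched after the summary below.

    // Each thread computes a unique ID from its position and records it.
    __global__ void whoAmI(int *ids)
    {
        // Flatten the thread's 3D position within its block.
        int threadInBlock = threadIdx.x
                          + threadIdx.y * blockDim.x
                          + threadIdx.z * blockDim.x * blockDim.y;

        // Flatten the block's 2D position within the grid.
        int blockInGrid = blockIdx.x + blockIdx.y * gridDim.x;

        // Combine the two into a globally unique thread ID.
        int threadsPerBlock = blockDim.x * blockDim.y * blockDim.z;
        int globalId = blockInGrid * threadsPerBlock + threadInBlock;

        ids[globalId] = globalId;  // every thread writes to its own slot
    }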

So, in summary: the threads are organized in a 3-dimensional structure, similar to a data structure called an array or matrix, and each thread's location within the structure is used to derive its unique identifier. I would like to call it a compute structure rather than a data structure, but I'm probably just being silly. So that's what I've been up to!
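Since the summary brings it all together, here is a hedged sketch of the host side too: what launching that kernel on a 2x2 grid of 4x4x4 blocks might look like. The shapes are arbitrary numbers I picked, and the kernel from above is repeated in compact form so the example compiles on its own.

    #include <cstdio>
    #include <cuda_runtime.h>

    // Same illustrative kernel as above, in compact form.
    __global__ void whoAmI(int *ids)
    {
        int t = threadIdx.x + threadIdx.y * blockDim.x
              + threadIdx.z * blockDim.x * blockDim.y;
        int b = blockIdx.x + blockIdx.y * gridDim.x;
        int id = b * (blockDim.x * blockDim.y * blockDim.z) + t;
        ids[id] = id;
    }

    int main()
    {
        const int totalThreads = (2 * 2) * (4 * 4 * 4);  // 4 blocks of 64 threads

        int *d_ids;
        cudaMalloc((void**)&d_ids, totalThreads * sizeof(int));

        dim3 grid(2, 2);      // the 2D grid of blocks
        dim3 block(4, 4, 4);  // one 3D block of threads

        // All 256 threads execute the same kernel simultaneously.
        whoAmI<<<grid, block>>>(d_ids);

        int h_ids[totalThreads];
        cudaMemcpy(h_ids, d_ids, sizeof(h_ids), cudaMemcpyDeviceToHost);
        cudaFree(d_ids);

        printf("thread 200 says its ID is %d\n", h_ids[200]);
        return 0;
    }

Every one of those 256 threads runs the exact same instructions; the only thing that tells them apart is the ID derived from their coordinates, which is exactly why that little bit of math matters.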

2 comments:

  1. I will be there Monday and we will talk about starting a project.
    Look over this abstract:

    High-Precision Numerical Simulations of Rotating Black Holes Accelerated by CUDA

    Hardware accelerators (such as Nvidia’s CUDA GPUs) have tremendous promise for computational science, because they can deliver large gains in performance at relatively low cost. In this talk, I will focus on the use of Nvidia's Tesla GPU for high-precision numerical simulations in the area of gravitational waves. I will describe our approach and present the final performance results as compared with a single-core desktop processor and also the Cell BE.
