Archive for the ‘INFR 2350’ Category

If you haven’t heard yet (and it’s pretty much all over the internet, with technology websites running a variety of tests), Nvidia is releasing its next series of GPUs.

The previous series was Nvidia’s GTX 500 series, which was based on a refreshed Fermi architecture. Nvidia has now moved on from Fermi to a new architecture: Kepler.

What is so wonderful about the new series of GPUs from Nvidia? Well, a lot of people are claiming that the release of the GTX 680 will make Nvidia the king of graphics. Is this claim true? Based on a lot of benchmark results, yes, it is.

AMD (still widely known by the ATI name) has a competitor for the 680: the Radeon HD 7970 (the XFX version is used here). Here is a side by side comparison of the two:

Even though the ATI card has more cores and beats Nvidia’s on some specs, the following benchmark comparison shows that Nvidia still reigns supreme.

Compared to the last generation’s top-tier graphics card from Nvidia (the GTX 580), the 680 completely destroys its predecessor. Based on benchmarks, here is a comparison of the cards:

The comparison shows clear dominance by the GTX 680. Though there is the occasional hiccup where another card trumps the 680, the majority of the wins go to Nvidia’s new card. On top of that, it uses less energy too! (System power at full throttle.)

 Any problems?

Although there is a very promising future for the GTX 680 and how well it will crunch down on processing, there are still a few hiccups here and there.

A good example is the claim that the GTX 680 does not play nice when it comes to SLI, so it doesn’t scale as well as the ATI XFX Radeon HD 7970. I am sure Nvidia is developing new drivers to solve this issue. New drivers, better performance; or at least we hope.

This problem occurs because of Nvidia’s complex Kepler architecture, which includes GPU Boost, a technology that is still fairly new on a brand-new card.

So we will have to wait for Nvidia to improve the GTX 680’s SLI capabilities, even though it is already a very capable, high-performance card.


So much SLI goodness…

Games before PhysX:

Developers back in the day had to precompute how objects would behave in reactions during an event. This meant the same animation would be played during a certain event. PhysX on the other hand can calculate all required physics simulations in real time, giving unique animations and reactions during events in-game.
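To make the contrast a bit more concrete, here is a minimal sketch of what "calculating in real time" means: instead of replaying a stored animation, the position of each object is recomputed every frame from its current state. This is plain Python using simple Euler-style integration, not the PhysX API; the names `step` and `simulate`, the 60 frames per second timestep, and the bounce factor are all assumptions made up for the example.

```python
GRAVITY = -9.81  # m/s^2, acting on the y axis


def step(pos, vel, dt, restitution=0.5):
    """Advance one piece of debris by one frame (semi-implicit Euler)."""
    x, y = pos
    vx, vy = vel
    vy += GRAVITY * dt          # update the velocity first
    x += vx * dt                # then move using the new velocity
    y += vy * dt
    if y < 0.0:                 # hit the ground: clamp and bounce
        y = 0.0
        vy = -vy * restitution
    return (x, y), (vx, vy)


def simulate(pos, vel, frames, dt=1.0 / 60.0):
    """Run the simulation for a number of frames and return the path taken."""
    path = [pos]
    for _ in range(frames):
        pos, vel = step(pos, vel, dt)
        path.append(pos)
    return path


# Two pieces of debris from the same explosion, with different starting velocities
path_a = simulate((0.0, 1.0), (2.0, 3.0), frames=120)
path_b = simulate((0.0, 1.0), (-1.0, 5.0), frames=120)
print(path_a[-1], path_b[-1])
```

Because the path depends on the starting state, two pieces of debris never have to share the same canned animation. A real engine does far more than this (collisions between objects, joints, fluids), but the per-frame update loop is the core idea.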

What is PhysX?

PhysX is a physics engine that is currently owned by Nvidia. Originally PhysX ran on an expansion card, similar to a dedicated graphics card (GPU), but this expansion card housed a Physics Processing Unit (PPU) instead. PhysX belonged to a company called Ageia, which Nvidia later acquired.

The PhysX engine is widely used in today’s games across many platforms: Microsoft Windows, Mac OS X, Linux, PS3, Xbox 360, and the Wii.

PhysX is responsible for calculating all physics simulations in games efficiently and quickly. This in turn makes games more realistic, giving objects dynamics and realistic reactions to the environment (e.g. fluids).

How does PhysX work?

Before, an expansion card was needed, which was a dedicated PPU. Now with the evolving technology around us, a graphics card with CUDA enabled cores can partake in the physics calculations.

Here is a brief overview on what CUDA is from my previous post:

After Nvidia acquired Ageia, which initially developed and owned PhysX, Nvidia moved physics processing directly onto the GPU, eliminating the PPU expansion card. GPUs today are very efficient at churning out calculations on a parallel architecture, and so crunch through the numbers quickly.

If your Nvidia GPU has at least 32 CUDA cores and 256MB of VRAM, it will be able to handle the PhysX calculations without an extra card. This becomes an advantage when you have 2 or more GPUs in a system in an SLI configuration: an entire GPU can be dedicated to PhysX calculations while the other is responsible for rendering the images. This combo is a very efficient and fast way to cut down processing times.

Here is a brief overview of what SLI is if you do not know from a previous post:

According to Nvidia, “PhysX is designed specifically for hardware acceleration by powerful processors with hundreds of processing cores.” That means better-looking graphics and faster simulations.

Some examples of what PhysX is responsible for in games are the following:

  • Explosions that create dust and collateral debris
  • Characters with complex, jointed geometries, for more life-like motion and interaction
  • Spectacular new weapons with incredible effects
  • Cloth that drapes and tears naturally
  • Dense smoke & fog that billow around objects in motion

Often games will have PhysX options to be enabled or disabled.

Well things don’t have to have PhysX right?

It is true that games can run without PhysX, but without it, games can look very lifeless, still, and not very convincing at all. Here are a few comparison videos:

Mirror’s Edge:


Overview of several games: PhysX on and off:


Also, here are several more comparison videos on Nvidia’s website:

Overall, PhysX plays a very important role in making games look realistic and in giving computer graphics that extra boost of realism.

CUDA is a part of NVIDIA’s parallel computing architecture. Similar to how SLI is NVIDIA’s parallel processing using two GPUs, CUDA helps to dramatically increase computer performance by using the power of the GPU in a system. It stands for Compute Unified Device Architecture.

How do computers process information traditionally?

Traditionally, computers do their general processing centrally on the CPU, while the GPU only handles graphics. CUDA takes a different approach with a new paradigm.

What does CUDA do differently?

CUDA lets developers make their applications more efficient by giving them access to the virtual instruction set and memory of the parallel computational elements in CUDA GPUs (only on NVIDIA cards). GPUs are already set up in a parallel architecture, which allows the execution of several threads at once instead of one single thread at a time.

Data is copied from the main memory to the GPU’s memory (system RAM to the card’s VRAM). The CPU then sends the processing instructions to the GPU. The GPU executes them in parallel across its cores. A CPU typically has 2-4 cores, while a GPU has hundreds of CUDA-enabled cores, so this speeds up processing dramatically. Finally, the results are copied from the GPU memory back to the main memory, where they can be accessed.
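The steps in that flow can be mimicked in plain Python. This is only an analogy and not CUDA code: the "device memory" is just a second list, and a small thread pool stands in for the GPU cores. The names `kernel` and `run_on_device` are hypothetical and chosen just for this sketch.

```python
from concurrent.futures import ThreadPoolExecutor


def kernel(value):
    """Work done by one 'thread': here, just square one element."""
    return value * value


def run_on_device(host_data, workers=8):
    # 1. Copy the input from main memory to "device" memory.
    device_in = list(host_data)
    # 2-3. The CPU launches the kernel; the workers apply it to every element.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        device_out = list(pool.map(kernel, device_in))
    # 4. Copy the results back to main memory.
    return list(device_out)


result = run_on_device([1, 2, 3, 4])
print(result)  # [1, 4, 9, 16]
```

The point of the shape is that `kernel` only ever looks at one element, so it does not matter how many workers run it at once. That independence is what lets a GPU spread the same work over hundreds of cores.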


Diagram on the processing flow of CUDA processing. 

I’m sure plenty of other approaches are available to this “parallel processing”

Well, that may be true, but NVIDIA has succeeded in making parallel programming easier for applications. Their SDK provides an entire toolkit that developers can use to make their applications run faster. CUDA can be used from high-level languages directly!

CUDA is a part of the evolving world in optimizing computer systems to run faster and more efficiently. Cutting down the required time for applications to process information is helping the world’s applications solve more problems faster in gaming and real world problems (health industries). Let’s keep on finding new innovations for fast and better application handling!

There are several different techniques for shadow mapping in applications. Each one has its own advantages and disadvantages, so some techniques suit a given application better than others. Today we will be talking about Light Space Perspective Shadow Maps (LiSPSM) in general terms. This technique falls under the category of warping shadow mapping techniques.


This technique is based upon the shadow mapping algorithm by Williams. A major problem with Standard Shadow Mapping (SSM) is the artifacts created by perspective aliasing. LiSPSM’s goal is to reduce these artifacts by using a perspective transformation suited to the situation, which creates a better looking shadow map.

What is Standard Shadow Mapping?

Since LiSPSM is based upon SSM, it is only natural to give a brief idea of what SSM is. This technique uses a two-pass algorithm. In the first pass, the scene is viewed from the point of view of the light source, and a two-dimensional depth image is saved from that view (the basic shadow map).

Example of a depth image.

During the second pass of the algorithm, the scene is rendered from the point of view of the observer. As the renderer draws each pixel, the pixel is transformed into light source space and tested for visibility to the light source.

If that pixel is not visible to the light source, it is in shadow and its shading is changed accordingly. SSM is the basis for many shadow mapping techniques.
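Here is a minimal sketch of the two passes in Python, shrunk down to a 2D side view with a light shining straight down so that the shadow map is just one row of depths. The function names, the scene, and the small depth bias are assumptions for illustration; a real renderer does this on the GPU with matrices and a depth texture.

```python
def to_texel(x, resolution, x_min, x_max):
    """Transform a world-space x coordinate into a shadow map texel index."""
    t = (x - x_min) / (x_max - x_min)
    return min(resolution - 1, max(0, int(t * resolution)))


def build_shadow_map(points, resolution, x_min, x_max, light_height):
    """Pass 1: view the scene from the light and keep the nearest depth per texel."""
    shadow_map = [float("inf")] * resolution
    for x, y in points:
        texel = to_texel(x, resolution, x_min, x_max)
        depth = light_height - y           # distance below the light
        shadow_map[texel] = min(shadow_map[texel], depth)
    return shadow_map


def in_shadow(point, shadow_map, x_min, x_max, light_height, bias=1e-3):
    """Pass 2: transform the point into light space and compare depths."""
    x, y = point
    texel = to_texel(x, len(shadow_map), x_min, x_max)
    depth = light_height - y
    # Something closer to the light was stored in this texel: we are in shadow.
    return depth > shadow_map[texel] + bias


ground = [(i + 0.5, 0.0) for i in range(10)]   # flat floor samples
occluder = [(2.5, 3.0)]                        # one object floating above the floor
shadow_map = build_shadow_map(ground + occluder, 10, 0.0, 10.0, light_height=10.0)

print(in_shadow((2.5, 0.0), shadow_map, 0.0, 10.0, 10.0))  # True
print(in_shadow((7.5, 0.0), shadow_map, 0.0, 10.0, 10.0))  # False
```

The `bias` is there so a surface does not shadow itself because of tiny rounding differences between the stored depth and the depth computed in the second pass.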

The yellow arrow represents a light source while the red sphere represents the observer and the point of view. 

SSM sounds pretty good and sufficient enough, why make it better?

SSM creates aliasing problems when objects near the viewer get insufficient shadow map resolution. Far objects look good, while close objects get artifacts in their shadows.

Left diagram shows exaggerated effectiveness of SSM in relation to light source and observer. Right image shows artifacts created in a render.

Okay, so LiSPSM makes SSM better?

LiSPSM is not solely based upon SSM. Another technique that influences LiSPSM is Perspective Shadow Mapping (PSM). In PSM, you take the post-perspective space of the observer (perspective matrix * modelview matrix) and apply the light source transformations to it. The transformation differs depending on where the observer is. This technique reduces perspective aliasing, because in post-perspective space far objects get smaller and closer objects get bigger.

The problems of PSM are:

  • Shadows from behind require a virtual move back of the camera because of the singularity of the perspective transformation
  • Lights can change the type of shadow (point/direction/inverted)
  • A lot of special cases
  • Non-intuitive post-perspective space
  • Shadow quality dependent upon view.
  • Uneven z-distribution

How does LiSPSM put SSM and PSM together and work?

Basically what LiSPSM does, borrowing from PSM, is use a perspective transformation based on the observer’s location. The difference is that it specifies the perspective transformation in light space. This creates an additional frustum on top of the original view frustum.

The new frustum, called P, has to enclose the original frustum, called B. B contains all interesting light rays, which essentially means all objects that can cast shadows. P is the perspective frustum, and it is parallel to the shadow map.

In this diagram, P is the perspective frustum, B is the frustum containing the interesting light rays, and the orange-yellow arrow pointing downwards is the light source.

This technique combines several matrices and applies them to what starts out as an equally spaced point cloud: a set of points representing the light rays available to hit an object. After the LiSPSM transformation, the points are packed closer together near the observer and spread further apart in the distance. This means more shadow detail up close and less detail far away.
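The effect of a perspective warp on the spacing of samples can be shown in one dimension. The sketch below is not the full LiSPSM matrix construction from the paper; it only takes evenly spaced shadow map coordinates and maps them back to distances through the inverse of a standard perspective depth mapping. The function names and the `near` and `far` values are assumptions for illustration.

```python
def warped_to_distance(s, near, far):
    """Map a uniform shadow map coordinate s in [0, 1] back to a distance.

    Inverse of the perspective mapping s = far / (far - near) * (1 - near / z).
    """
    return near * far / (far - s * (far - near))


def sample_positions(count, near, far):
    """Distances covered by `count` evenly spaced shadow map samples."""
    return [warped_to_distance(i / (count - 1), near, far) for i in range(count)]


positions = sample_positions(6, near=1.0, far=100.0)
gaps = [b - a for a, b in zip(positions, positions[1:])]
print([round(p, 2) for p in positions])
print([round(g, 2) for g in gaps])
```

The gaps between neighbouring samples grow with distance, so most of the evenly spaced shadow map samples land close to the near end. That is the "denser when closer, sparser when further" behaviour of the point cloud.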

The cloud on the left shows an equally spaced point cloud, consistent throughout, while the cloud on the right shows the transformed point set. 

These scenes show the effect of LiSPSM, where the top left shows the point of view of the light source. It gets closer/more direct to the fence moving to the right. 

This technique also contains directional lighting.

All the math goodness can be found on pages 3, 4 and 5 of

So with the approaching release date of the regular PS Vita on February 22nd 2012, I thought it would be appropriate to write a rough summary on the new handheld. I received my First Edition already and have been playing with it since February 15th last week and so far, it’s given a good impression on the graphics and what to expect on a handheld in the months to come.

Isn’t it just an updated PSP with an extra analog stick?

Nope. This little handheld is an entirely new beast. It features a 4-core ARM Cortex-A9 for its CPU (Sony always likes shoving multiple cores in their platforms, it seems), an SGX543MP4+ GPU, 512MB of RAM and 128MB of VRAM. There’s more memory overall in this little thing than in the PS3! This should give you a brief idea of how much potential for graphics there is in this portable console. But let’s talk more about its graphics.

The GPU:

The GPU featured in the PS Vita is an SGX543MP4+. Sure, it sounds really technical with all the letters and numbers, but it is a PowerVR GPU from Imagination Technologies (formerly VideoLogic). So no, Nvidia and ATI are not a part of this handheld. The GPU is part of the company’s Series 5XT and has 4 cores. It can handle up to 7.2 GFLOPS @ 200MHz per core. This is a huge improvement over the last generation handheld, the PSP, whose GPU had only 2MB of VRAM versus the 128MB on the Vita.
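As a quick back-of-the-envelope check, the per-core figure can be multiplied out to a theoretical peak for the whole GPU. This assumes the cores really run at 200MHz in the Vita and that performance scales perfectly across all 4 cores, which real games will not achieve; the function name is made up for the example.

```python
GFLOPS_PER_CORE = 7.2   # at 200MHz, as quoted above
CORES = 4


def peak_gflops(cores, per_core):
    """Theoretical peak, assuming perfect scaling across cores."""
    return cores * per_core


vita_peak = peak_gflops(CORES, GFLOPS_PER_CORE)
print(f"{vita_peak:.1f} GFLOPS")  # 28.8 GFLOPS
```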

Though there isn’t really a lot of information listed about the PSP’s GPU to compare to the Vita’s, the PSP’s CPU processed up to 2.6 GFLOPS, way less than the Vita.

So is it like a PS3 like everyone’s saying?

The PS3 is still capable of creating larger and better games nowadays since developers have gotten used to the hardware, finding better and more efficient ways to make games look good. Though for a handheld, the Vita is doing very well graphics wise with some games comparable to the PS3’s. Though with the Vita’s hardware still new to developers, there hasn’t been a lot of time for new ways to create good looking games on this handheld to come up.

But even the launch titles on the Vita look fantastic.

Modnation Racers on the PS3

Modnation Racers on the Vita

Little Big Planet Demo on the Vita


The graphics on this handheld look great, even in the launch titles. If you look at the PSP’s launch titles, the graphics looked a bit crummy compared to the PSP games released today. When developers start finding new and better ways to make games look good on this little handheld, you can expect some pretty fantastic graphics.

With my anticipation of going out and purchasing the newly released Final Fantasy XIII-2 when I have free time, I’m reminded of a very important aspect of making a game look realistic. Hair. I’ve always associated gorgeous pre-rendered cutscenes with the Final Fantasy series, and with their feature length film Final Fantasy VII: Advent Children. I’ve always wondered what it was that made the characters look so good, and of course, I realize now it is the hair!

We have a lot of hair, how can we approach creating and rendering hair? Do we render every single strand of hair?

Yes and no. There are several different approaches to rendering hair. Theoretically you could render every single strand of hair, but that would be highly inefficient and would take a lot of processing power. With better technology, though, we are now able to render better looking characters. I will briefly discuss one of the many approaches to rendering hair, since I’m not an expert.

Pre-Rendered Scenes in Final Fantasy VII, notice the block hair.

Pre-rendered scene from Final Fantasy XIII-2, notice the actual hair that is rendered versus a block of hair.

So how do we get it to look all pretty and stuff?

The approach I will be discussing is slightly dated; I found it in an ATI presentation. This hair rendering technique uses a polygonal hair model. Shaders are then applied to it to create realistic looking hair.

The shader used is a mix of the Kajiya-Kay hair shading model and Marschner’s model presented at SIGGRAPH 2003. The technique uses a simple approximate depth-sorting scheme as well, so this is a simple approach to rendering hair.

The reason for choosing a polygonal model over a line model for hair is that it has lower geometric complexity, which makes depth sorting faster.

What is the Kajiya-Kay Model?

It is a lighting model for anisotropic strands where the hair strand’s tangent is used instead of the normal in the lighting equations. The hair normal is assumed to lie in the plane spanned by the tangent and the view vector (V):
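A common way to write the Kajiya-Kay terms in shader code is to use the sine of the angle between the tangent and the light direction for the diffuse term, and the sine of the angle between the tangent and the half vector, raised to a power, for the specular term. Below is a rough sketch of that in Python rather than shader code. The function names and the exponent value are assumptions, and the exact formulation in the presentation may differ.

```python
import math


def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sin_between(tangent, direction):
    """Sine of the angle between the tangent and a direction (both unit length)."""
    cos_angle = dot(tangent, direction)
    return math.sqrt(max(0.0, 1.0 - cos_angle * cos_angle))


def kajiya_kay(tangent, light, view, exponent=32.0):
    """Return (diffuse, specular) for one point on a hair strand."""
    t, l, v = normalize(tangent), normalize(light), normalize(view)
    # Half vector between light and view (undefined if they point in exactly
    # opposite directions; this sketch does not handle that case).
    h = normalize(tuple(a + b for a, b in zip(l, v)))
    diffuse = sin_between(t, l)              # brightest when light is perpendicular
    specular = sin_between(t, h) ** exponent
    return diffuse, specular


# Strand running along x, lit and viewed from straight above
print(kajiya_kay((1, 0, 0), (0, 1, 0), (0, 1, 0)))
```

Notice that no surface normal appears anywhere: the strand is brightest when the light hits it side-on and darkest when the light runs along it, which is what gives hair its stretched, anisotropic highlight.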

What is the Marschner Model?

It is a model of the light scattering properties of hair. Its main observations are that the primary specular highlight is shifted towards the hair tips, while the secondary specular highlight is colored and shifted towards the hair roots. The “sparkly” look of hair comes from the secondary specular highlight. The math for this model is very complex, so developers will often just try to match these properties visually.

Are there any textures for hair?

Yes, a common texture for hair is stretched out noise: randomly generated noise in a top-to-bottom line pattern. There are also alpha textures to show where the hair is opaque and where it is translucent. Examples of the textures:

How do the shaders work for hair rendering?

Both vertex and pixel shaders are used for hair rendering. The role of the vertex shader is to pass down the tangent, normal, view vector, light vector, and ambient occlusion term. The pixel shader is responsible for the diffuse lighting using the Kajiya-Kay model, the two shifted specular highlights for the hair, and combining the terms.

What all this means is that an ambient occlusion term, a diffuse term, and a specular term are all created by the shaders and combined into a final product:

Hair requires depth sorting:

Depth sorting is needed to draw the hair in back-to-front order, which gives the hair a look of depth and correct alpha blending. The technique uses a static index buffer ordered from the inside to the outside, which maintains the back-to-front order when drawing the hair. This lets the program load the patches of hair and draw them on screen in order. Not every hair is individually drawn.
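As a rough idea of how such an inside-to-outside order could be built once, ahead of time, here is a small Python sketch that sorts hair patches by how far they sit from the center of the head. The patch names, the use of a centroid per patch, and the function name are all assumptions for illustration; the presentation may build its index buffer differently.

```python
import math


def build_static_index_order(patches, head_center):
    """Order hair patches from the innermost to the outermost, once, up front.

    `patches` maps a patch name to the centroid of its polygons.
    """
    def distance_from_head(name):
        return math.dist(patches[name], head_center)

    return sorted(patches, key=distance_from_head)


patches = {
    "outer_fringe": (0.0, 0.2, 1.3),
    "inner_layer": (0.0, 0.0, 1.0),
    "middle_layer": (0.0, 0.1, 1.15),
}
order = build_static_index_order(patches, head_center=(0.0, 0.0, 0.0))
print(order)  # ['inner_layer', 'middle_layer', 'outer_fringe']
```

Since the order never changes at runtime, there is no per-frame sorting cost, which is the appeal of the static index buffer. The trade-off is that it is only approximately back-to-front from any given camera angle.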

What is the rendering pipeline/scheme?

There are 3 passes with this technique to create the final product.

Pass 1 deals with opaque parts of the hair.

Pass 2 deals with transparent back-facing parts

Pass 3 deals with transparent front-facing parts

This sounds pretty easy, why doesn’t everyone use this technique?

The reason is that this technique has cons some developers do not like. It also may not be complex enough to give the desired output. The pros of this technique are its low geometric complexity, which lightens the load on the vertex engine and makes depth sorting faster when drawing hair back-to-front. This approach also allows applications to run on lower-end hardware using simpler shaders or fixed-function pipelines.

Cons for this approach on the other hand are that this approach is assuming there is close to no animation for the hair model. Hair blowing in the wind or ponytails need to be dealt with separately. This approach is also not suitable for all hairstyles. It is more suited for simple, “flatter” hair styles. Not ones with hair sticking out in peculiar patterns like Cloud from Final Fantasy VII. Nothing too extreme now.

There are ways to optimize this technique for better performance, but I have not been able to grasp it all. But don’t think that this is the only approach to creating hair in 3D computer graphics. There are many new and improved techniques, as well as old and outdated ones. Whatever approach suits the needs of an application can be used.

When people are shopping for new computer hardware, such as a graphics card to upgrade their current system, they will often stumble across the choice between a “Professional” card and a “Consumer” card. A professional card is usually meant for workstations and for applications related to work and not leisure, while consumer cards are meant for gaming and an average person’s usage.

Well, then I can just get whatever I want right? A good graphics card is a good graphics card.

False. When choosing a graphics card, you need to find one that suits your needs. You need the right tool for the job. What are you going to be using your computer for? Video games? Modelling/Maya? If it’s for work, you should invest in a workstation card, while if you’re using the computer for video games, you should just stick with consumer grade cards.

Don’t think one card can only do one task. Professional and consumer cards just do their specific jobs better than one another.

I found 2 GPUs that are pretty much the same, but why is there such a HUGE difference in prices?

To give you an idea of the price difference between consumer grade cards and professional cards here is an example.

A NVIDIA GTX 590 GPU with 3GB of VRAM is around $750-$800, which is pretty much the top of the line consumer GPU in the current series from NVIDIA. On the other side of the spectrum an NVIDIA Quadro 5000 with 2.5GB of VRAM (512MB less than the GTX590) goes for… $1750. Yepp. That’s a whole chunk of a lot more than a consumer grade card. Want to take another step forward? An NVIDIA Quadro 6000 with 3GB of VRAM goes for $4000!

Top of the line gaming card. $750. Still a bit pricey. 

Not even top of the line workstation card, $1750. Definitely pricey.

Well, I’m looking at their specs, it seems the consumer grade cards have more bang for their buck!

Though it may be true that a GTX590 has 1024 CUDA cores, while a Quadro 6000 has only 448 CUDA cores, in a workstation setting with Maya, the Quadro would outperform a GTX590. The GTX590 also has a 327.7 GB/sec memory bandwidth, while the Quadro 6000 only has a 144 GB/sec bandwidth for its memory. So looking at the hardware, it seems the GTX590 should be the more expensive one!

What it really comes down to in these GPUs.

Algorithms. It’s all about the algorithms. Professional cards come with specific drivers that are specially designed to work with programs such as CAD, Maya, and other work applications. These drivers are reflected in the price of the card, as they took money to research and develop! The investment in more efficient algorithms for work applications shows in the price of a workstation GPU. Consumer cards, on the other hand, do not require this extensive and costly research into work related drivers. This helps reduce the price of a consumer grade card in the end.


As you can see in the results, a $4000 workstation GPU loses significantly to a $475 GTX580 card in a 3D game benchmark setting.

So remember, if it’s for work, invest into your work. If it’s for fun, don’t take it too seriously. Time to save up those pennies!