GLESGAE:The Transform Stack

From Pandora Wiki

GLESGAE - The Transform Stack

Introduction

I was going to just blindly march on to getting transform matrices running so you could position and move things about on screen.
That presents a few problems though, as it requires a bit more knowledge of how OpenGL deals with things, and I'm trying to go about explaining stuff as plainly as possible. It also wouldn't really touch on the fact there are a few different Transform Stacks to deal with.
Well, I say stack, but in general only the Fixed Function pipeline retains the stack; the Shader Based pipeline can do whatever it likes... and generally, to keep things simple, we only ever use one or two levels of the "stacks."

The added bonus of going through all this is that we can also implement cameras at the same time that we get things to move about on screen, so it's a kind of win-win situation here!

Meet the Stacks

In a Fixed Function pipeline, you have four main Transform Stacks, each corresponding to View, Projection, Model and Texture manipulation.

These literally are stacks in that you can push additional matrices on top of them, and they will be concatenated in sequence for the final draw of whatever object you're currently rendering. This is particularly useful in the case of the Model Stack in that it makes sense in a hierarchical modelling view.
The stereotypical example being an arm... you'd draw the shoulder, then the upper arm, elbow, lower arm, wrist and hand. When you rotate the shoulder object, the rest of the objects would move and rotate correctly with it. Well, "correctly" is a bit of a misnomer as there'd be no limits on the joint, so you could have it spin round and round like crazy if you wanted at this point... but that's a matter of Physics which we'll get to at some point!

The Texture Stack is also particularly useful for doing quick and dirty effects, such as scrolling textures across models. A good example being indicator arrows following the mesh to direct the player where to go. Just whack a transform matrix that indicates this onto the stack when you want to draw the texture, and OpenGL will do the rest for you.

The View and Projection Stacks deal with the camera. Generally you have two camera types; 2D - or Orthographic - and 3D - generally termed Perspective. These are defined with specific View and Projection matrices, and we'll be implementing these two camera types for both Fixed Function and Shader Based systems very soon.

Gotcha: I've mentioned a separate View Transform Stack. This generally gets multiplied with the Model Stack to become the ModelView Stack, which it's more commonly referenced as. Confused? The View Transform deals with the clipping of the final projection to the screen. Think of it in terms of a Camera; the view finder cuts things out - this is the View transform; the lens can warp, zoom and otherwise manipulate how the scene looks - this is the Projection matrix; and the scene itself, being composed of objects, each have their own Model transform. I reference it as a stack because you could, for example, be taking a picture through a car or train window. That window is another View transform... so by dealing with Views in a stack as everything else, it just works a bit better. It is simpler, honest!

Model Transforms

We'll look at Model transforms first, as these are the easiest and make the most sense.

Every object in the world has a position and rotation. They also have a scale, but usually we leave this alone; the art assets should generally already be the correct size anyway.
These three things get concatenated into a Transform Matrix, which is usually a 4x4 matrix.
We can actually split these up and deal with them separately if we like - which is a process called decomposing - whereby we can split this 4x4 matrix into its components of a 3x3 rotation matrix, a position vector, and a scale vector. In general, they sort of correspond to this thing:

Rotation00 * ScaleX,		Rotation01 * ScaleY,		Rotation02 * ScaleZ,		PositionX
Rotation10 * ScaleX,		Rotation11 * ScaleY,		Rotation12 * ScaleZ,		PositionY
Rotation20 * ScaleX,		Rotation21 * ScaleY,		Rotation22 * ScaleZ,		PositionZ
0,				0,				0,				1

Gotcha: It's never easy, is it? There are two main conventions to matrices - Row-Major and Column-Major. The above matrix is Column-Major, i.e. our Position Vector is in the last column. If you transpose this, you swap it around and it becomes Row-Major and our Position Vector therefore goes in the bottom row, where we've currently got [0, 0, 0, 1]. OpenGL tends to use Column-Major, whereas DirectX uses Row-Major for instance. (Strictly speaking, the two terms describe the order in which the 16 values are stored in memory, but they're commonly used as shorthand for where the Position Vector lives, which is how I'm using them here.)
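To make the memory side of this a bit more concrete, here's a minimal sketch that flattens the same matrix both ways. It assumes a plain float[4][4] addressed as m[row][column]; toColumnMajor and toRowMajor are made-up helper names for illustration, not part of GLESGAE or OpenGL.

```cpp
#include <array>
#include <cassert>

// Hypothetical helper: flatten a matrix addressed as m[row][column] into
// column-major order, so each column's four values sit next to each other.
std::array<float, 16> toColumnMajor(const float m[4][4])
{
    assert(m != nullptr);
    std::array<float, 16> out{};
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 4; ++row)
            out[col * 4 + row] = m[row][col];
    return out;
}

// Hypothetical helper: flatten the same matrix row by row instead.
std::array<float, 16> toRowMajor(const float m[4][4])
{
    assert(m != nullptr);
    std::array<float, 16> out{};
    for (int row = 0; row < 4; ++row)
        for (int col = 0; col < 4; ++col)
            out[row * 4 + col] = m[row][col];
    return out;
}
```

With the Position Vector in the last column, the column-major array ends up with the position at indices 12, 13 and 14, while the row-major array scatters it across indices 3, 7 and 11. Transposing the matrix first and then flattening it row by row gives exactly the same 16 floats as the column-major version.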

Gotcha: Picking Scale out of a Rotation Matrix can be very painful.. it's something I like to avoid as much as possible, and you should too! Ensure your art assets are either of the correct size, or additionally store a separate scale vector so you can demultiply it to get the original rotation matrix should you need it.
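As a rough sketch of what composing and decomposing look like in code: the types and function names below (Vec3, Mat4, compose, extractScale, removeScale) are stand-ins invented for this example, not GLESGAE's own classes, and the matrix is addressed as m[row][column] with the position in the last column. Composing a rotation with a scale multiplies each rotation column by the matching scale value. Pulling the scale back out by measuring column lengths only works for positive scales with no shear, which is a good part of why storing the scale separately is less painful.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in types; GLESGAE's own Matrix4 and Vector3 may differ.
struct Vec3 { float x, y, z; };
struct Mat4 { float m[4][4]; };  // addressed as m[row][column]

// Build a transform: each rotation column is multiplied by the matching
// scale value, and the position goes in the last column.
Mat4 compose(const float rotation[3][3], const Vec3& scale, const Vec3& position)
{
    const float s[3] = { scale.x, scale.y, scale.z };
    const float p[3] = { position.x, position.y, position.z };
    Mat4 out;
    for (int row = 0; row < 3; ++row) {
        for (int col = 0; col < 3; ++col)
            out.m[row][col] = rotation[row][col] * s[col];
        out.m[row][3] = p[row];
    }
    out.m[3][0] = out.m[3][1] = out.m[3][2] = 0.0F;
    out.m[3][3] = 1.0F;
    return out;
}

// Recover the scale as the length of each of the first three columns.
// Only valid for positive scales and no shear.
Vec3 extractScale(const Mat4& t)
{
    float length[3];
    for (int col = 0; col < 3; ++col)
        length[col] = std::sqrt(t.m[0][col] * t.m[0][col] +
                                t.m[1][col] * t.m[1][col] +
                                t.m[2][col] * t.m[2][col]);
    return Vec3{ length[0], length[1], length[2] };
}

// "Demultiply" a known scale to get the plain rotation back.
void removeScale(Mat4& t, const Vec3& scale)
{
    assert(scale.x != 0.0F && scale.y != 0.0F && scale.z != 0.0F);
    const float s[3] = { scale.x, scale.y, scale.z };
    for (int row = 0; row < 3; ++row)
        for (int col = 0; col < 3; ++col)
            t.m[row][col] /= s[col];
}
```

If you keep the scale vector alongside the transform, removeScale is just a handful of divides and extractScale never has to guess.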

Anyway, every object in the world will generally have one of these. It's much more efficient to store a Transform matrix rather than separate Position and Rotation matrices, as OpenGL and DirectX are going to be dealing with 4x4 matrices anyway.. so you'd otherwise have to compose a 4x4 every time you want to draw, which all mounts up to become something that's not trivially light processing!

I'm not going to get into creating rotation matrices, and rotating objects in space, just yet... that'll be later on.. we just need to know what a Transform matrix is, and what it's composed of, just now.

View Matrices

View Matrices are a bit more fun.

Rather than just being independent things in space, a View Matrix requires an eye to see through. This does make sense, as you view through an eye - or a window, camera lens, etc. - and what you can't see gets clipped out of view. Those things still exist, though. Like a certain cat: just because you can't see inside the box doesn't necessarily mean it's not there. Or when you're swinging a Wiimote about: just because you can't see that expensive lamp behind you doesn't mean you're not about to smash it to pieces by swinging back.
We also need two other vectors - an Up and Centre vector. This gives us three vectors; an Eye Vector, a Centre Vector, and an Up Vector.

Computing a View Matrix isn't all that hard, it's just a bit lengthy, and again, we store this as a 4x4 Matrix.

Matrix4 createViewMatrix(const Vector3& eye, const Vector3& centre, const Vector3& up)
{
	Vector3 forwardVector(centre - eye);
	forwardVector.normalise();

	Vector3 rightVector(forwardVector.cross(up));
	rightVector.normalise();

	// We recompute the Up vector to make sure that everything is exactly perpendicular to one another.
	// Otherwise, there can be rounding errors and things go a bit mad!
	// We can also do this to assert and check that our up vector actually matches up properly.
	Vector3 upVector(rightVector.cross(forwardVector));

	Matrix4 viewMatrix;
	viewMatrix(0, 0) = rightVector.x;
	viewMatrix(0, 1) = upVector.x;
	viewMatrix(0, 2) = -forwardVector.x;
	viewMatrix(0, 3) = 0.0F;
	
	viewMatrix(1, 0) = rightVector.y;
	viewMatrix(1, 1) = upVector.y;
	viewMatrix(1, 2) = -forwardVector.y;
	viewMatrix(1, 3) = 0.0F;
	
	viewMatrix(2, 0) = rightVector.z;
	viewMatrix(2, 1) = upVector.z;
	viewMatrix(2, 2) = -forwardVector.z;
	viewMatrix(2, 3) = 0.0F;
	
	viewMatrix(3, 0) = 0.0F;
	viewMatrix(3, 1) = 0.0F;
	viewMatrix(3, 2) = 0.0F;
	viewMatrix(3, 3) = 1.0F;

	// We then need to translate our eye back to the origin, and use this as effectively the View's position.
	// An eye does have a position in space, after all! As does a Window.
	viewMatrix(3, 0) = (viewMatrix(0, 0) * -eye.x + viewMatrix(1, 0) * -eye.y + viewMatrix(2, 0) * -eye.z);
	viewMatrix(3, 1) = (viewMatrix(0, 1) * -eye.x + viewMatrix(1, 1) * -eye.y + viewMatrix(2, 1) * -eye.z);
	viewMatrix(3, 2) = (viewMatrix(0, 2) * -eye.x + viewMatrix(1, 2) * -eye.y + viewMatrix(2, 2) * -eye.z);

	return viewMatrix;
}

So, that's a nice bit of code, but what does it actually *do*?
Well, our Eye vector is the position of the View... our Camera Lens, our Window, our Player's Eyes. As we know, we can decompose our Object's Model Transform and grab its position, so it should be easy enough to understand what this is.

Our Up Vector is a unit vector (meaning its length is exactly 1, so each of its values lies between -1 and 1) which tells us which way is up. In general, Y is up. Sometimes, Z is up. It depends on which side of Mathematics you live on, really. As I tend to think in 2D with a bit of depth, Y is up for me, and is easier for me to think in, so my Up vector is [ 0, 1, 0 ]. Of course, this can change if you're floating about in space, or if you tilt your head back to look up - your eye's up vector has changed as it's effectively pointing straight out the top of your head, which you've now tilted, so is probably pointing towards the wall behind you now.

The Centre Vector is another odd one. It's calculated by taking the Eye - or Position - Vector and adding the Front Vector to it.
In other words; look forward. Your eye position is in your head ( I'd assume, anyway! ) and your Front Vector is dead ahead. Without turning your head, look to the right. Your eye position hasn't changed, but your forward vector has - it's now to the right. Now, hold up a piece of paper and imagine it's a window ( or hold up a piece of glass or whatever instead. ) Imagine this is your eyes and make it "look" forward. It should effectively just be flat to you. Now make it look right as you would do with your eyes by rotating it. This is exactly what's going on here!
This is also why we translate our eye back to the origin. Think about it: you don't know you're looking through a Window unless you step back and can see the Window. We've stepped back to observe our Viewing area, and now we want to view through it, so we move it towards us. Our perfect example being a camera view finder; we look at it from the outside to see how much it's going to clip out, then we place our eye to it to view through it.

Simples!
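If you'd like to convince yourself of the "eye goes back to the origin" part, here's a small self-contained sketch. It condenses the view matrix construction into a function called lookAt and adds a transformPoint helper; both names, and the bare-bones Vector3 and Matrix4 types, are assumptions made for this example rather than GLESGAE's real classes. The matrix is indexed the same way as the listing above, with the translation in the (3, 0) to (3, 2) slots.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical minimal types, indexed the same way as the listing above.
struct Vector3 { float x, y, z; };

struct Matrix4 {
    float m[4][4] = {};
    float& operator()(int i, int j) { return m[i][j]; }
    float operator()(int i, int j) const { return m[i][j]; }
};

Vector3 cross(const Vector3& a, const Vector3& b)
{
    return Vector3{ a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x };
}

Vector3 normalised(const Vector3& v)
{
    const float length = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    assert(length > 0.0F);
    return Vector3{ v.x / length, v.y / length, v.z / length };
}

// Condensed form of the view matrix construction.
Matrix4 lookAt(const Vector3& eye, const Vector3& centre, const Vector3& up)
{
    const Vector3 f = normalised(Vector3{ centre.x - eye.x, centre.y - eye.y, centre.z - eye.z });
    const Vector3 r = normalised(cross(f, up));
    const Vector3 u = cross(r, f);

    Matrix4 v;
    v(0, 0) = r.x; v(0, 1) = u.x; v(0, 2) = -f.x;
    v(1, 0) = r.y; v(1, 1) = u.y; v(1, 2) = -f.y;
    v(2, 0) = r.z; v(2, 1) = u.z; v(2, 2) = -f.z;
    for (int j = 0; j < 3; ++j)
        v(3, j) = -(v(0, j) * eye.x + v(1, j) * eye.y + v(2, j) * eye.z);
    v(3, 3) = 1.0F;
    return v;
}

// Apply the matrix to a point, treating slots (3, 0) to (3, 2) as the translation.
Vector3 transformPoint(const Matrix4& v, const Vector3& p)
{
    return Vector3{
        p.x * v(0, 0) + p.y * v(1, 0) + p.z * v(2, 0) + v(3, 0),
        p.x * v(0, 1) + p.y * v(1, 1) + p.z * v(2, 1) + v(3, 1),
        p.x * v(0, 2) + p.y * v(1, 2) + p.z * v(2, 2) + v(3, 2) };
}
```

Feed the eye position through transformPoint and it lands on the origin; feed the centre through and it lands straight down the negative Z axis, as far away as the centre was from the eye.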

Extra Fun

The centre vector can actually be used in another way.
Generally, you want a camera to face "forward".. however, you may also want a "tracking" camera, in which case you replace the centre vector with the position of the object you want to track.
We'll write some examples soon so you can mess about and see all this in action, don't worry!
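In the meantime, the difference between the two camera styles boils down to how you pick the centre vector. A tiny sketch, with function names made up for this example:

```cpp
#include <cassert>

// Hypothetical minimal vector type for illustration.
struct Vector3 { float x, y, z; };

// "Forward" camera: the centre sits one front vector ahead of the eye.
Vector3 centreForForwardCamera(const Vector3& eye, const Vector3& front)
{
    return Vector3{ eye.x + front.x, eye.y + front.y, eye.z + front.z };
}

// "Tracking" camera: the centre is wherever the tracked object currently is.
// The target must not sit exactly on the eye, or there is no direction to look in.
Vector3 centreForTrackingCamera(const Vector3& eye, const Vector3& targetPosition)
{
    const float dx = targetPosition.x - eye.x;
    const float dy = targetPosition.y - eye.y;
    const float dz = targetPosition.z - eye.z;
    assert(dx * dx + dy * dy + dz * dz > 0.0F);
    return targetPosition;
}
```

Either result is passed as the centre argument when building the view matrix; the eye and up vectors stay as they were.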

Projection Fiddling

Projection Matrices are what give the world a semblance of depth, and can be used to distort the world. I'll show you the two classical projection matrices - one for 2d, and one for 3d - and you can have fun mucking about with them to create your own if you feel like it.

An amusing irony I've found is that the 3d projection matrix is often a bit easier to understand and deal with, so we'll go through that one first.

Matrix4 create3dProjectMatrix(const float nearClip, const float farClip, const float fov, const float aspectRatio)
{
	const float radians = fov / 2.0F * PI / 180.0F;
	const float zDelta = farClip - nearClip;
	const float sine = sin(radians);
	const float cotangent = cos(radians) / sine;

	Matrix4 projectionMatrix;
	projectionMatrix(0, 0) = cotangent / aspectRatio;
	projectionMatrix(0, 1) = 0.0F;
	projectionMatrix(0, 2) = 0.0F;
	projectionMatrix(0, 3) = 0.0F;

	projectionMatrix(1, 0) = 0.0F;
	projectionMatrix(1, 1) = cotangent;
	projectionMatrix(1, 2) = 0.0F;
	projectionMatrix(1, 3) = 0.0F;
 
	projectionMatrix(2, 0) = 0.0F;
	projectionMatrix(2, 1) = 0.0F;
	projectionMatrix(2, 2) = -(farClip + nearClip) / zDelta;
	projectionMatrix(2, 3) = -1.0F;

	projectionMatrix(3, 0) = 0.0F;
	projectionMatrix(3, 1) = 0.0F;
	projectionMatrix(3, 2) = -2.0F * nearClip * farClip / zDelta;
	projectionMatrix(3, 3) = 0.0F;

	return projectionMatrix;
}

Simple, really.
We need a near clip, a far clip, a field of view, and an aspect ratio.

Our near clip is how far forward our view begins. Ever noticed how in some games, if you get close to a wall, it disappears? This is usually caused by the near clip. Conversely, things that "pop" into the screen as you view far ahead are caused by the far clip being too close. While it would be tempting to set the near clip as close as we can, and the far clip as far as possible, this causes a lot of issues depending on the resolution of our depth buffer.
If you look at two objects far away in the distance, it can be tricky to tell what's in front of the other. Your graphics card has the same issue. However, whereas our perception can just make things look a bit strange, the graphics card gets confused. It has a finite amount of precision to deal with this, and you can sometimes get "Z-Fighting" whereby two objects are effectively "fighting" to get in front of each other, and the screen flickers between them. It's very annoying. Therefore, give your far and near clip ranges sensible values!
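To put some numbers on that, here's a small sketch that works out where a point ends up in normalised depth (-1 at the near clip, +1 at the far clip) using the same terms the 3d projection matrix stores at (2, 2), (3, 2) and (2, 3). The function name ndcDepth is made up for this example.

```cpp
#include <cassert>

// Normalised depth for a point `distance` units in front of the eye.
float ndcDepth(const float nearClip, const float farClip, const float distance)
{
    assert(nearClip > 0.0F && farClip > nearClip && distance > 0.0F);
    const float zDelta = farClip - nearClip;
    const float a = -(farClip + nearClip) / zDelta;       // projectionMatrix(2, 2)
    const float b = -2.0F * nearClip * farClip / zDelta;  // projectionMatrix(3, 2)
    const float viewZ = -distance;                        // the camera looks down negative Z
    const float clipZ = viewZ * a + b;
    const float clipW = -viewZ;                           // from the -1 at projectionMatrix(2, 3)
    return clipZ / clipW;                                 // the perspective divide
}
```

With a near clip of 0.1 and a far clip of 1000, a point only 10 units away already comes out at about 0.98, leaving everything between 10 and 1000 units to share the last sliver of the range. Push the near clip out to 1 and the same point comes out at about 0.80, which leaves far more room for distant objects.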

The field of view controls how wide an angle ( in degrees ) you can see through your viewing matrix. For example, looking through a keyhole has quite a narrow field of view as it's more or less looking dead ahead. Your eyes, on the other hand, have quite a wide field of view as even when you look dead ahead, you can generally make out shapes to the left and right of you without having to look in that direction. Now, while the code above only makes sense for angles greater than 0 and less than 180 ( the cotangent of half the angle hits zero at 180 ), you probably want it somewhere between 30 and 120.

Finally, our aspect ratio is the screen's aspect ratio. This can be either the actual physical device's aspect ratio, or perhaps some in-game monitor. Either way, it's just width / height. Easy.
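To see what the field of view and aspect ratio terms actually do to a point, here's a sketch that applies just the (0, 0), (1, 1) and (2, 3) entries of the 3d projection matrix and then does the perspective divide. Vec2 and projectToNdc are names invented for this example, and it assumes the point is already in view space with the camera looking down negative Z.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical minimal 2d result type for illustration.
struct Vec2 { float x, y; };

// Where a view-space point lands on screen, from -1 to +1 on each axis.
Vec2 projectToNdc(const float fovDegrees, const float aspectRatio,
                  const float x, const float y, const float z)
{
    assert(fovDegrees > 0.0F && fovDegrees < 180.0F);
    assert(aspectRatio > 0.0F && z < 0.0F);
    const float pi = 3.14159265358979F;
    const float radians = fovDegrees / 2.0F * pi / 180.0F;
    const float cotangent = std::cos(radians) / std::sin(radians);
    const float clipX = x * cotangent / aspectRatio;  // projectionMatrix(0, 0)
    const float clipY = y * cotangent;                // projectionMatrix(1, 1)
    const float clipW = -z;                           // projectionMatrix(2, 3)
    return Vec2{ clipX / clipW, clipY / clipW };
}
```

With a 90 degree field of view and a 2:1 aspect ratio, the point (2, 1, -1) lands on the top right corner of the screen at (1, 1). Move it twice as far away, to (2, 1, -2), and it lands at (0.5, 0.5), which is the "things further away look smaller" effect. Note the angle is applied to the vertical axis; the horizontal extent follows from the aspect ratio.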

The calculations themselves are fairly straightforward. If you don't quite understand them, then feel free to pick up a maths book. It's a bit beyond what I want to discuss here - I just want to tell you what's going on, not prove it! But with the above code, you'll get yourself a nice 3d projection matrix.

Now onto the 2d one...

Matrix4 create2dProjectionMatrix(const float left, const float bottom, const float right, const float top, const float nearClip, const float farClip)
{
	const float xScale = 2.0F / (right - left);
	const float yScale = 2.0F / (top - bottom);
	const float zScale = -2.0F / (farClip - nearClip);

	const float x = -(right + left) / (right - left);
	const float y = -(top + bottom) / (top - bottom);
	const float z = -(farClip + nearClip) / (farClip - nearClip);

	Matrix4 projectionMatrix;
	projectionMatrix(0, 0) = xScale;
	projectionMatrix(0, 1) = 0.0F;
	projectionMatrix(0, 2) = 0.0F;
	projectionMatrix(0, 3) = 0.0F;

	projectionMatrix(1, 0) = 0.0F;
	projectionMatrix(1, 1) = yScale;
	projectionMatrix(1, 2) = 0.0F;
	projectionMatrix(1, 3) = 0.0F;

	projectionMatrix(2, 0) = 0.0F;
	projectionMatrix(2, 1) = 0.0F;
	projectionMatrix(2, 2) = zScale;
	projectionMatrix(2, 3) = 0.0F;

	projectionMatrix(3, 0) = x;
	projectionMatrix(3, 1) = y;
	projectionMatrix(3, 2) = z;
	projectionMatrix(3, 3) = 1.0F;

	return projectionMatrix;
}

Looks easier doesn't it? Why I have difficulties getting my head around this sometimes, I'm not sure.. probably because it doesn't quite fit any camera analogy that works so well with the 3d view-projection stuff.

Anyway, we know about the near clip and far clip, and the same rules apply to the 2d projection matrix.
What we're interested in here are the left, top, bottom, and right values.
As you can see from the calculations, these control the scale, so generally these are unit values: Left and Bottom being 0, and Top and Right being 1.
This makes sense as these are normalised values... and OpenGL's 2d origin is on the bottom left corner of your screen. Therefore, Right is 1.0, as that's "full right".. you can't get further than that. Of course, you might want to muck about with the scale. Setting Top and Right to 2.0 instead of 1.0 doubles the area the projection covers, so everything is drawn at half the size; setting them to 0.5 halves the area, so you only see half of what's there, at twice the size.
That's really all there is to it.
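If it helps, here's a sketch of what the x and y terms of the 2d projection do to a point on their own. Vec2 and orthoToNdc are made-up names for this example; the sums are the same xScale, yScale, x and y values calculated in the function above.

```cpp
#include <cassert>

// Hypothetical minimal 2d result type for illustration.
struct Vec2 { float x, y; };

// Map a 2d point to the -1 to +1 range the screen uses on each axis.
Vec2 orthoToNdc(const float left, const float bottom, const float right, const float top,
                const float x, const float y)
{
    assert(right != left && top != bottom);
    const float xScale = 2.0F / (right - left);
    const float yScale = 2.0F / (top - bottom);
    const float xOffset = -(right + left) / (right - left);
    const float yOffset = -(top + bottom) / (top - bottom);
    return Vec2{ x * xScale + xOffset, y * yScale + yOffset };
}
```

With Left and Bottom at 0 and Right and Top at 1, the point (0, 0) maps to the bottom left corner (-1, -1) and (1, 1) maps to the top right corner (1, 1). Set Right and Top to 2 and the point (1, 1) drops to the middle of the screen at (0, 0), so twice as much fits in view. Set them to 0.5 and (0.5, 0.5) becomes the top right corner, so only half as much fits.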

Putting It All Together

So, how does all this work? How do we get from drawing an object at x, y, z to it appearing on the screen somewhere?

Well, if we're just drawing one object - that has no parent or hierarchical transforms - directly to the screen, we need one Transform matrix, one View matrix and one Projection matrix. All this combined becomes our ModelViewProjection matrix, and it's literally all multiplied in that order - model * view * projection.
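Here's a minimal sketch of that multiplication chain. It assumes the first matrix index is the row, the translation sits in the fourth row ( matching how the listings above fill in their matrices ), and points are treated as row vectors multiplied on the left. Under that assumption the leftmost matrix in a product is applied first, which is what makes "model * view * projection" read in the order things happen; with a column-vector convention the same chain is written the other way round. All the names here are made up for the example, and a simple scale stands in for a real projection matrix.

```cpp
#include <cassert>

// Hypothetical minimal types for illustration.
struct Vector4 { float x, y, z, w; };

struct Matrix4 {
    float m[4][4] = {};
    float& operator()(int i, int j) { return m[i][j]; }
    float operator()(int i, int j) const { return m[i][j]; }
};

Matrix4 scaling(const float s)
{
    Matrix4 r;
    r(0, 0) = r(1, 1) = r(2, 2) = s;
    r(3, 3) = 1.0F;
    return r;
}

Matrix4 translation(const float x, const float y, const float z)
{
    Matrix4 r = scaling(1.0F);              // start from identity
    r(3, 0) = x; r(3, 1) = y; r(3, 2) = z;  // translation in the fourth row
    return r;
}

// Standard matrix product: result(i, j) = sum over k of a(i, k) * b(k, j).
Matrix4 multiply(const Matrix4& a, const Matrix4& b)
{
    Matrix4 r;
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                r(i, j) += a(i, k) * b(k, j);
    return r;
}

// Row vector times matrix.
Vector4 transform(const Vector4& v, const Matrix4& m)
{
    const float in[4] = { v.x, v.y, v.z, v.w };
    float out[4] = { 0.0F, 0.0F, 0.0F, 0.0F };
    for (int j = 0; j < 4; ++j)
        for (int i = 0; i < 4; ++i)
            out[j] += in[i] * m(i, j);
    return Vector4{ out[0], out[1], out[2], out[3] };
}

// Applying the three matrices one at a time gives the same answer as
// applying the single combined matrix.
void demonstrateOrder()
{
    const Matrix4 model = translation(1.0F, 0.0F, 0.0F);
    const Matrix4 view = translation(0.0F, 0.0F, -5.0F);
    const Matrix4 projection = scaling(0.5F);  // stand-in for a real projection
    const Vector4 origin{ 0.0F, 0.0F, 0.0F, 1.0F };

    const Vector4 stepByStep = transform(transform(transform(origin, model), view), projection);
    const Vector4 combined = transform(origin, multiply(multiply(model, view), projection));
    assert(stepByStep.x == combined.x && stepByStep.z == combined.z);
    assert(combined.x == 0.5F && combined.z == -2.5F);
}
```

The practical upshot is that you can multiply the three matrices together once per object and hand the renderer a single matrix, rather than pushing every vertex through three separate multiplications.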

If our Object does have parent transformations, we concatenate them first, so it's finalTransform = parentTransform * objectTransform; and we effectively just go through the chain from bottom up ( so that parentTransform would've been its finalTransform, and so on... ) till we get to our object. Then we multiply in the view and projection matrices.

Again, as I stated, the view and projection matrices can be stacks.. this works in exactly the same manner as the object transforms - we multiply bottom up to get the final ViewMatrix and final ProjectionMatrix to multiply with; finalTransform * finalView * finalProjection.

Seriously, that's it.

It isn't complicated, it's just a bit strange and you have to think a bit to fully understand what's going on. But once you sit down and break it apart, it's quite straightforward, I think.

Now to actually write some code into the Render Systems to draw things properly!

Checking out the SVN

There isn't actually any useful code for this Chapter that would make it standalone.

Building the Example

There is no example this time, as there'd be nothing to show!
Instead, jump to either GLESGAE:Fixed Function Transformations or GLESGAE:Shader Based Transformations for dealing with the Transform stacks in the correct manner.

Next Time

We're going to actually implement some Transform operations.
