direct to video

April 19, 2010

agenda circling forth.

Filed under: demoscene, realtime rendering — directtovideo @ 11:35 am

agenda circling forth

youtube video download pouet

Anyone remember what happened this easter weekend? I’m a bit hazy about it myself – because I was in Germany living it up at the last ever Breakpoint. A party / festival that’s been running on the easter weekend for the past 8 years in Bingen am Rhein, it was a unique gathering of 1000+ creative and technical types who go there to show their work or see what the others have done, but stay for the massive, massive party. I can’t overstate how much I’ve enjoyed it over the past few years. The atmosphere is unique – it’s amazing to see the enthusiasm everyone there had just for being part of it.

So much has happened to me personally at that event on previous occasions: hotel food fights; TV appearances I can’t remotely remember; big-screen and stage appearances I’ll always remember; appalling hung over football performances; all-nighters working in the freezing cold, where you had to get a coffee just to hold it; close brushes with hospitalisation; and I never even got thrown out once! (I was “helped out” once though. Cheers Docd!) My record of actually getting something finished for it is patchy; we have won there before, but most years we either end up working all weekend in the hotel to just make the deadline, or giving up immediately and going on a massive bender instead – because the party was too much fun to miss. But seeing as it was the last one ever, we thought we owed it to get something done and support the event. And to actually get it done beforehand. And then go on that massive bender I was talking about.

A couple of months ago we started thinking about what we could do in the time available. Frameranger was our last big piece – a blockbuster, the kind of piece you go into a competition with being pretty confident you’re going to win – but that took far too much time, effort and pain to want to repeat in a hurry. Besides, one problem with Breakpoint is Farbrausch – they co-organise Breakpoint, and it was very likely they were going to show up with another massive production like Debris, which effectively defined the scene in 2007 and who’s influence still ripples across it. Without 6 moths or a year to work on something we probably wouldn’t be able to compete with them if they decided to push something big out. We also realised that we didn’t actually want to compete: it would be better to make something that we liked, that was enjoyable to produce, that had a bit more depth to it, showed more maturity, and that perhaps had some more relevance outside of the scene than the big multi-part spectacular we could make that might win. So, there and then we gave up on winning. That was actually quite liberating, and we got on with making it.

agenda circling forth

First we tried to source a soundtrack. Our first idea was to try and chance it, and contact a major label we admired and asked them to use a certain track we liked. Unfortunately they never even bothered to reply, so we started talking to some musician friends of ours to try and sort something out. They showed us a big load of material that they had been working on and we picked something out, and did a great job fixing it up. Result – we were hooked up with the ideal soundtrack from day one and could design the whole piece around it.
Note: since release, some copyright issues have come to light with the soundtrack due to the large volume of material sampled from one source: “Queen of the Universe” by Socrates. This didn’t get credited in the original release of the production because of a misunderstanding between the musicians and the artists; we’re working to resolve it ASAP.

We decided early on that the approach we should take with the visuals was to try and do everything with particles using a developed version of the particle system from Blunderbuss. It meant we had the ability to make something much more organic and flowing – every part of the screen could be moving all the time, and it would give it an abstract element that’s hard to achieve with polygons alone. However, this time we wanted to combine particles with actual graphics and make a full piece with multiple scenes from it.

agenda circling forth

Usually we develop the tech and the visuals in tandem – and sometimes the tools too – which often causes a lot of problems. The good thing for this project was that for once we actually had most of the core tech done before we started on the visuals. What we had from Blunderbuss gave us the basics – lots of particles, sorted and rendered with shadows, and a few effects like (fake) fluids on top. But it was essentially quite simple – one (point) emitter at a time, affectors affect all the particles, and the affectors and emitters were quite basic in themselves. It worked for that piece but it wasn’t enough for something larger and more complicated. What it lacked was control – we needed to combine multiple emitters, control particle counts per emitter, and add some more advanced features to cope with using meshes and scenes with animation, targeting into other meshes and scenes, more advanced affectors and much more control and quality in the rendering. And to not completely destroy the frame rate on the way.

Particles and meshes

Emitting from meshes was something I’ve already worked out – I have a routine that generates a big texture containing lots of positions spawned at random places on the polygonal surface of the model. This was simply done using random barycentric coordinates on each triangular face of the mesh. The number of random positions per triangle is weighted by the area of the triangle, so the points are evenly spread across the mesh and you get a pretty solid object. On top of this I added support for sampling the model’s material colours, vertex colours and textures and storing those resulting colour values in a texture too.

Adding support for skinned, animated meshes as emitters and targets was more difficult. The first task was to be able to apply skin/bone transforms in pixel shaders to calculate the animated position of the particle representation at the current point in time. That was quite straightforward: I added the skin weights and bone indices as additional textures, then the bone matrices themselves in a dynamic 1d texture which could be looked up by the bone indices. The skinning code was basically a copy-paste from the vertex shader version I already have, reading bone matrices from texture lookups instead of from vertex shader constants. Spawning a particle at this final resulting skinned position worked fine – the spawned particles appeared where the mesh was posed at the current frame. However, it gave an ugly motionblur-esque trail to the movement because the particles didn’t move with the animation – they spawned where the animation posed them and then the affectors (fluid, forces etc) took over.

What I needed was to combine the spawning with an affector step which also moved the particles with the animated pose, by calculating the current position and moving them to it. I had set it up so that each particle had a unique and corresponding entry in the mesh position texture so it was easy to follow which particle was tied to which point on the mesh, so it was also straightforward to add – but it didn’t do the job either, because I’d swapped following the affectors for following the animation. I needed a blend of those two functions which allowed a particle to follow the animation “a certain amount”, and follow the affectors too. That’s where it all starts to get fuzzy – there’s no “right result” for that, so I just had to work on what looked good.

I computed the current and previous frame’s skinned mesh position for the particle using the skinning routine. Then I had a weighting function based on the particle’s position compared to the mesh position, which also factored in a function based on the life of the particle. I computed that weight for the particle’s previous position and against the previous mesh position, and for the current positions, and picked the greater of the two weights; then I used that weight to blend between the particle’s current position and the current mesh position. The result was that the particle was able to become more affected by the mesh as it got closer to it, until it eventually “stuck” to the mesh and followed it through the animation – until it got towards the end of its life, when it stops being affected as much and gradually falls away from the anim (under control of other affectors).

agenda circling forth

Being able to emit from one mesh wasn’t good enough – we wanted to throw a whole Lightwave scene at it with multiple objects and animation and even modifiers like Fertilizer (making meshes appear to grow in over time) and animated visibility, and it would figure it out. This just meant that we had to combine all the meshes into one big soup when generating the particles – as long as we kept the object ID per particle in the texture, it was possible to match up the particle to information about the source mesh – transforms, visibility – stored in 1d texture lookup tables, and perform the necessary processing in the pixel shader.

Multiple emitters and materials

Part of the requirement for building a more complex scene was that we had more than one mesh / scene emitting particles at once. The naive solution was to just add more particle systems, but – apart from the performance implications – this had a fundamental flaw: the particles weren’t in the same render targets anymore so they didn’t sort against each other. In some places this was not a problem – it was an easy way to solve e.g. the background / sky particles – but it wasn’t sufficient for a complex scene. We needed to be able to emit from multiple emitters and share the same render targets. So I assigned each emitter a scissor region dynamically which controlled which part of the spawn information targets they could write to, and in turn which particles were spawned from each emitter. I also preserved the ID of the emitter which spawned a particle in the particle GBuffers.

Those particle GBuffers are starting to look more and more like a deferred renderer. The emitter index can be used to access all sorts of things that can now be controlled per emitter rather than globally – e.g. material colour, diffuse, ambient, particle size and so on – just like we had in the deferred renderer using a material index or object index. We can also use the emitter index to look up a table of transforms – so we can choose to move the particles with the emitter they came from.

Screen-space emitters

The particle system started to look increasingly like a deferred renderer, but what about that deferred renderer we had for rendering polygons? It’s not producing anything that makes it to the final render in the demo, but it still has a role. The GBuffers produced when rendering solid objects are now used to emit particles from. The depth buffer can be used to reconstruct the world position of a pixel on screen; the colour buffer provides the base material / texture colour; and we can even run the usual lighting passes and post-fx and obtain a buffer of lit, shaded pixels as would be rendered to the final screen. Most of the background scenes were rendered with lighting and SSAO before the particles took on the colour; they were then lit additionally by the particle lighting and shadowing.

This gives you something that’s 3d-in-2d – 2.5d? – so it isn’t as solid as emitting in 3d from a mesh, but it has that advantage of looking much more solid (from the initial perspective it was emitted from) with far fewer particles than when emitting from a mesh.

agenda circling forth

A side issue was how to make affectors override other affectors, given that they only produce a velocity buffer. That was quite simple – I sorted them using a controllable key and changed the blend mode. Whereas most affectors (velocity, fluid) blend additively, certain affectors (mesh / image attractors) render towards the end of the list and blend linearly – so they override the motions of the additive affectors but blend by their affecting weight. We also wanted to be able to tie emitters only to certain affectors, and this was handled again with a 1d lookup table on emitter index.

Another thing we wanted was to be able to keyframe the effect of an affector on a particle, and other properties of a particle, over the particle’s life time. Given everything was done on the GPU and needed to be efficient, arbitrary keyframe data was never going to be practical – so we used a simple approximation that still gave us control: we attached 1D bezier curves for many properties. They can be evaluated very efficiently in a shader and they still give a decent amount of control.

We added a few new affectors to the engine; not least was proper fluid dynamics support. I’ve had GPU versions of 2D and 3D Navier Stokes grid solvers for quite some time, and I tied in the 2D one to drive particles. 2D solvers are very effective in the right use-case, even in a 3D scene: we put them on a plane in 3D space, projected the particles onto that plane and sampled the velocity, and applied a falloff towards the edges of the plane and on the z distance from the plane. This did the job neatly for destroying the moon – the one place we needed “proper fluid dynamics” to make it look good.

Making the demo

As usual, we were running late. With 4 weeks to go we had a few sketches of ideas and the basis for some scenes, but nothing was too far along. The problem was that we had the music and the technical plan already sorted, but we didn’t have a solid visual concept and story locked down – just a few bits of graphics and some test scenes. The first scene that was laid down was the flowers, which was the first thing we really tried to do with it and existed in some form for several weeks. That helped us nail down our look and flow, and got the tech more or less finalised too. Much of the time was spent with Jani trying out ideas in the tool, and me fixing all the many things that didn’t work and responding to feature requests. With 2 weeks to go we hit the point where we couldn’t go any further without a fully fleshed out concept and we rapidly went through a few revisions – some were quite tight and story-driven and others much more vague. Finally we hit upon something that would flow and we got busy making the extra content.

In the last week things finally started to move. We went into full-on crunch, and worked late into the night every evening. My day started at 6am and ended around 1-2am; living on coffee, squeezing work on the piece into spare minutes at lunchtime or on the train, and then hammering on at it late into the night. It turns out you can adjust quite quickly to less than 5 hours sleep a night as anyone with kids would probably know, but still – apologies to any of my friends or colleages who thought I looked like a big stupid zombie that week. Of course I was totally 100% mentally switched on at all times. Honestly.

The final stages of production on any large project are always a bit painful. The start of a project is a slow, steady high – you have so many possibilities and the deadline where you actually have to deliver something seems so far away, and it’s all about ideas and the fun of trying to implement them. But then there’s a horrible point where you realise you actually have to get something made pretty soon, and you have nothing. From then on it’s a constant stream of ups and downs – something goes right or somebody does something great and you feel like it’s all going to work out, and you’re on a high; then something doesn’t go to plan and you feel like it’s just never going to happen and you’re right back down again. This gets more and more extreme until the end, where you feel this huge wave of relief / joy / anger / exhaustedness (depending on how it ended up). The whole process is a bit like romancing a really high maintainance nymphomaniac. Who’s on uppers. And never stops calling you up during the day. And keeps making you buy her shoes. Highly enjoyable in some ways but you sure suffer for it in others. And somehow you always forget enough of the downsides to want to repeat the experience a few months later.

As the week progressed the demo moved forward a lot. The scene with the running people, then the intro and the final part were built in quick succession. The part with the creature in the forest was done last – that was the one part I actually built myself, although Jani did a lot of work on it afterwards to make it into something decent. One advantage with building everything from particles is that it hides a multitude of sins – you don’t need the same level of polish and work on the models and textures as if you were showing them as plain 3D because it becomes so vague when it gets turned to particles anyway. By the end of the week we still seemed a long way from finishing, but somehow on thursday night – after a final almost-all-nighter – it all came together.

It was a very strange situation – we were done early. Let me illustrate the significance of that: that the last production I submitted to Breakpoint was entered during the competition while the 8th entry was currently playing on the big screen. So on past experience I had expected a certain amount of pressure this time. I think I’ve come to enjoy that a little bit over the years – even rely on it – so having almost nothing to do during the event was a little disconcerting. This was the first time I can remember us having time to sit and polish something in years. Naturally I spent most of the time living it up instead, but Jani used the extra time wisely and kept polishing it. New versions appeared over the weekend, each one getting better and better, until Sunday when we packed the final version. Naturally, in keeping with tradition, we did our best to ignore the deadline; shortly after it had passed, and after a kind announcement by KB over the loudspeaker to remind us to enter, I wandered up to the organisers area with the finished piece.

agenda circling forth

For the whole time we worked on the project, I didn’t expect to win. Although I was really happy how it turned out I thought it could go down like a lead balloon in the competition – it’s slow, abstract and it doesn’t have the crowd-pleasing bling you need to win big. It was a risk for us to do something like this, and I hadn’t thought of it as a competition piece at all. Yet somehow, in a strong field and up against our old german friends from Farbrausch, it won out. I still don’t really understand it, and I received the prize in a bit of a state of shock. I was half thinking “there’s been a mistake; run for it before they change their minds”.

In this scene of ours it’s easy to get obsessed with winning. But if there’s something I’ve learnt from this it’s that it’s so much better just to make something you want to make and are happy with; competition, winning, that’s something that happens sometimes and won’t happen other times, but either way it doesn’t really matter. It’s the icing on the cake if it happens, but the cake still tastes pretty good un-iced. I never liked marzipan anyway.

In the end the piece is a series of scenes that were connected by the common motifs that they followed through it, and a loose storyline driven in part by the music. Yep, you got it – I don’t want to explain the content of the piece too much. It’s much better if you draw your own conclusions. Some underlying themes are explored, and you’re welcome to look for them or just take it at face value.

People have said to me, “what’s next, aren’t you bored of particles?” – and I say “no”. Particles/points are a primitive, just like polygons. We haven’t got bored of polygons yet after what – 30 years+? There’s much more that can be done, and we’ve only scratched the surface of what’s possible. New ideas and new hardware make more things happen all the time. We’ll be back. I’m not sure in what form, but watch this space.

There’s been quite a lot of coverage of Breakpoint that we’ve benefited from – including a feature on the German TV channel 3Sat.

By the way, you really do need a good GPU to watch this in realtime. The “detail settings” we had for Blunderbuss to make it watchable on low-end hardware didn’t work here because the scenes were much more complex and we had to tune it for one detail setting – the highest. So you’ll need something top-of-the-line (think Geforce 280) to be able to enjoy it. Don’t worry about running it at less than highest resolution though – you won’t gain much from 1080p over 720p, for example. CPU and memory don’t make much difference, though.

March 5, 2010


Filed under: Uncategorized — directtovideo @ 5:41 pm

I feel slightly dirty about this, but I’ve joined vimeo. Did I suddenly get overcome by an urge to get a bit more web2.0, you ask? No, it’s because I had to in order to submit something to the Victoria&Albert Museum’s Decode:Recode gallery. The idea being: you take their ident piece as source code, mess with it and send it back to them.

We were looking at the other entries a couple of weeks ago, and several things struck us: 1. most of them didn’t change the code at all except for messing with the colours; 2. the colours were a bit.. dutch to start with; 3. the nicer ones were generally done with AfterEffects messing with the video, not running realtime. It felt like it was time the demoscene struck back and offered an education on the fine art of realtime graphics. So, that’s what we did. Paul offered up a 512 byte intro done on the spectrum, and I made something for high end PCs. Very, very high end PCs.


The piece uses a particle system generated off a voxelised version of their logo (the only thing I preserved from the original), rendered as 250,000 cubes and affected by various swarming modifiers, and then using a crafty ambient occlusion / radiosity tracer using raycasts through a voxel set version of the particle system. Insert the usual lighting + shadows, cameras, fine selected colours (not dutch!) and post processing effects here, and we’re done. A few hours work well spent.


Anyway. Check the video on Vimeo here, and check my Vimeo page here. Now I’ve got it, I might as well use it. I’m afraid the encoding on the vimeo video is terrible, but there’s a high quality version on download – I might upload a proper one at some point if anyone wants it.

Update: My Decode:Recode made it into the Metro newspaper in the UK! I’ve uploaded a scan here.

January 15, 2010

ambient occlusion in frameranger.

Filed under: Uncategorized — directtovideo @ 9:13 am

Following on from deferred rendering, here’s how we added ambient occlusion / indirect illumination to Frameranger.

Baking ambient occlusion was standard practice in our demos for some time now. You make the scene, go into Lightwave and hit the big red button called “bake”, and it comes back a bit later with something nice. Well it’s a bit more complex than that, but you get the point. However, our scenes in frameranger were a) numerous, b) on the large side and c) full of dynamic stuff. Which means that a) there’ll be a lot of lightmap textures, b) the light map textures will either be massive or look shit (or both), and c) the lightmaps won’t work anyway because most of the geometry doesn’t exist in Lightwave in the first place (or it’s moving). Oh, and there’s a 64mb file size limit in the demo competition at Assembly and most other demo competitions – ridiculous given the amount of memory and bandwidth we have nowadays, but a rule’s a rule. So it soon became apparent that baking ambient occlusion was not going to happen. It would have to either not be done, or be done in a different way. 3D doesn’t look too hot without some form of indirect lighting, so it looked like we’d need something clever to do it in the engine.

SSAO, then?

No. Allow me to now share my dislike of SSAO. I admit it – in the past I might have evangelised about it at some point in a conference talk or sample application and I’ve even used it in a couple of demos. Back in 2007 when I first started using it it seemed all cool and novel, like a magic silver bullet for a very difficult and long-standing problem. But it’s now 2009, for the last couple of years every other “my first game engine” image of the day on has managed an even worse bastardisation of the effect since the last one. The “SSAO look” has become something you can spot a mile off.

Let’s call SSAO what it really is: “crease darkening”. It picks out creases in the z buffer, and darkens them. The size of those creases, the amount of artefacts and the quality of approximation to real AO depends on the implementation, the amount of GPU time you’re willing to sacrifice and the scene itself – I bet we’ve all seen nice images of some object sitting on a plane and neatly getting shaded in a way that looks just like the cleverly constructed reference image from a Mental Ray render. But what they didn’t show was what happens when you put a massive wall by it some way off out of shot, and how the reference image and the SSAO version suddenly look completely different. And that’s the problem with SSAO – it’s great at capturing small crevices, and shit at capturing the large-scale global effects of ambient occlusion. The geometry we had in Frameranger wasn’t very crevicey, and we really needed those large-scale effects.

Having SSAO might be better than not having SSAO. But it is certainly not the final solution to the indirect lighting problem. And don’t even get me started on SSDO for faking radiosity. Sounds great in theory huh? SSAO kindof works, faked colour bleeds kindof works, so if we glue them together we get.. one reason why artists don’t like coders helping out on the visuals.

In short, SSAO went out the window quite early on. We tried it and it didn’t work for the scenes we had. So it was time to come up with something better. Unfortunately, there were no magic solutions – no one technique we could use to say “generate AO for everything”. There are a lot of techniques that can be used to generate ambient occlusion effects in realtime, or at least interactively, and each of them works for a certain case – some only work for small scenes, others for rigid or static objects or a certain rough shape. Until the hardware gets sufficiently powerful to make a silver bullet that kills the problem, we’re stuck with taking all of these methods and working out what we can do for each case – how best to solve the specific problem, the specific scene, at hand.

I tried out a lot of different techniques. Here’s a quick list:

  • Ambient occlusion fields
  • Analytical methods (the “ao for a sphere” calculation)
  • Shadow maps / “loads of lights”
  • Ray marching through voxel sets
  • Signed distance fields
  • Heightmap ray marching
  • Hand-made approximations

Now to go through those in a bit more detail. Skip along if you’ve heard it before.

Ambient occlusion fields (as described here).

A method for precomputed, static AO for an object stored in a volume texture, used for casting AO onto other objects. All you need to do per frame is read the volume texture in a deferred pass, and write the value out. It’s a bit like a modern, 3D, accurate version of the blob shadow. It looks nice, but it only works for rigid or static meshes – and those meshes had better be quite small, because the bigger it is the bigger a volume texture you need – for small objects a 32x32x32 or smaller can suffice, but you’d want larger for bigger objects. You also need a bigger map the wider area you want to spread the AO out over. Generating it needs a lengthy precalc or offline storage of the volume texture – you need to cast a bunch of rays at every volume cell to work out the AO, so it’s pretty dog-slow to update in realtime. I experimented with it a while back, and it gives nice soft results, although it doesnt work too well for self-occlusion.

Ambient occlusion fields - early test

I discarded this one – I was thinking about using it for the car and spider, but it was too slow to generate and not updatable in realtime. I did however use a similar idea for the arcs in the first scene, which I’ll get onto later.

A quick word on the analytical AO techniques – iq covered it here much better than I could, but in short: you can analytically calculate the occlusion effect for a given object, such as a sphere, pretty easily. If only we just had a bunch of spheres. I tried making sphere-tree versions of some of my objects but it looked rubbish, so I moved on.

Now a completely different approach: using shadow mapping for ambient occlusion. I’ve used this idea a few times, first in Fresh! – although as a precalc. It works something like this: ambient occlusion is meant to be “the amount of ambient / sky light which reaches the point” – so the easiest way to calculate that is to cast a load of rays out from the point and see how many don’t hit anything. In GPU land, where we presently reside, you render the scene from your point with a 180 degree FOV (or near to it) to a small buffer and see how many of the pixels don’t get filled. But that implies that to generate a 1024×1024 lightmap you’d need to render the scene about a million times, which is insane. So instead of that we can flip the problem on it’s head. Render the scene from the outside in, using fake lights and rendering to shadow maps, a few hundred times. Then apply the lights / shadow maps as usual to each pixel in the lightmap, and sum up how many of the lights each pixel was in shadow for. That means you only need to render the scene a few hundred times instead of a million – which is a great optimisation in my book. I wish I thought of this first, cos it’s clever, but I didn’t.

For realtime, rendering the scene a few hundred times still sounds pretty bad. If you really cut it down you could get away with around 50-100 passes, render depth only with low resolution geometry – and now we’re almost approaching something that could actually run. In fact I used this in realtime in a 4k called Glitterati back in 2006, and we’ve got a lot more GPU power nowadays.

For my new deferred implementation I wanted to up the speed and the quality. The main optimisation to be able to cache the results between frames – after all, for relatively static scenes the ambient occlusion doesn’t change very much from frame to frame – and to be able to use less shadow map passes per frame but giving the effect of more passes to increase the quality.

early version of shadow map ao with no interpolation or coloursadded colours

The way I did this was to limit the effect to a grid. This effect only works on a limited area anyway as it’s using shadow maps – which are limited in the size they can work on without the aliasing becoming too much of an issue – so gridding it didn’t hurt too much. I created a 3D grid of points which was something like 256x16x256 – much larger in X and Z than Y, which fit the scenes at hand quite well. This mapped to a number of slices of 2D 256×256 textures, and I used 4 channels to store 4 Y slices per 2D slice. This gave me a format which was quite easy to render to and sample with flitering. To render to it, I simply worked out the world position per pixel in the slice texture, sampled all the shadow maps for that position and averaged the results.

I sampled from it using my deferred normal and depth buffers, by working out which 2D position to sample from per pixel and processed several values in a kernel around it. I used the normal and distance from my deferred position to weight the samples, so they had a directional element to them.

I also wanted to avoid having to re-render all the shadow maps and sample from every frame. This turned out to be quite easy – I had a rolling buffer of shadow maps packed onto a 2D slice texture – fortunately they didn’t need to be high resolution. In total I had 64 shadow maps of 256×256 each packed onto a single 2048×2048 texture in an 8×8 arrangement. I re-rendered a number of shadow maps each frame. The grid was only updated using the re-rendered shadow maps like with a rolling sum – so first I subtracted the effect of the previous data I had for the shadow maps, and then I re-rendered them and added the new effects back in. Of course, the problem was that there was a lag when things moved, as the shadow maps updated over several frames to reflect the changes, but it meant that I could compromise speed for update rate. It proved very effective for background geometry as a lightmap replacement.

realtime ao in the frameranger city
realtime ao in the frameranger city - 2

I tried it out on the city in Frameranger. The main problem is the speed/memory vs quality trade-off – for the size of the scene, it needed bigger shadow maps and a bigger grid, and the quality just wasn’t good enough with acceptable performance. Works well with a smaller scene though.

Raytracing voxels

Going back to “ambient occlusion as a ray tracing problem” again – wouldn’t it be clever to make an easily raytracable version of my geometry, then cast a load of rays at it and see how many don’t hit anything? “Easily raytracable” on a GPU for arbitrary geometry probably means “voxels in a volume texture”. And, as we demo sceners have got to know well recently, an easy way to raytrace a voxel field is to convert it into a signed distance field and then ray march through it with sphere marching – it greatly reduces the number of samples on the ray march that you need to take, potentially skipping large (empty) parts of the field and getting to the answer quickly. Given that AO likes a bit of noise – ironically although it looks “worse” it makes it look more like a render, and therefore more believeable and “better” – you could spread the rays randomly and get away with quite few per pixel. Combined with distance field tracing it might come up with something sensible performance-wise.

I did try this approach out briefly, but it just wasn’t quite fast enough (or good looking enough at acceptable speed). But then I remembered an old talk from Alex Evans at Siggraph. It turns out that signed distance fields have a useful fringe benefit – if you take a few samples from them and perform some dirty function on the results, the result can look a lot like ambient occlusion. It’s quite efficient, and a lot of 4k intros have been using the approach recently to approximate ambient occlusion using signed distance fields. Iniqo Quilez talks about this in one of his presentations on distance field raytracing.

The main difficulty with this approach was that my scene did not currently exist as a distance field. Or as voxels in a volume texture. It was a bunch of polygons. So, how do you convert a mesh to a signed distance field? Option 1: compute the closest distance to triangle for all the triangles, and take the closest result. Option 2: voxelise the mesh and convert it to a signed distance field. I used both approaches. I wanted to do it on the GPU for speed, and so I rendered to a volume slice map – a 2D render target containing the slices of a 3D texture laid out in sequence.

The first method I used for offline computation, and only for static meshes – although it was still done on the GPU; the pixel shader for closest distance from point to triangle for a given triangle isn’t too difficult to do, and it can be optimised (I used an octree to reduce the areas where I computed the actual result) but the time to compute for a reasonably serious triangle mesh was too long for real time.

The second method I solved by rendering the object’s depth to the 6 views of a cube map around the object, from the outside pointing in. Then I performed a slightly dubious process whereby I try and work out the closest distance to the surface by something close to guesswork. It’s like this: given a 3D point inside the cube, sample the depths for that point from the 6 faces of the cubemap. This is enough to tell you if the point is inside the object, and get the closest of the 6 distances to the point – giving you a signed distance. Of course one sample isn’t enough to handle more than the simplest shapes, so you actually do a kernel around the point, check the different results and pick the closest from that. If this was a research paper I’d do a diagram. Nevermind. This solution gives good results for convex shapes, and pretty decent results for any shape where you don’t have completely inaccessible points from an exterior viewpoint – and it runs realtime, so I could use it to work with animated objects. Unfortunately the “signed distance” it gives isn’t exactly the closest possible result – more of a “reasonable guess”.

This problem was a pain to solve, but worth it. It opened a few doors for implementations of other effects like CSG, and for using meshes as inputs to fluid dynamics solvers working on level sets. It worked alright for small objects like the car, but had no chance on something like the city – it relied on the resolution of the volume texture, and would need a massive one to handle the city.

The ambient occlusion result it gave was alright for the polygonal objects I tried it on, but I found better and less memory/computation heavy solutions. The really good application of this technique came along later when I was working with the aforementioned liquid effects using level sets. See, in that case I already *had* a signed distance field that I was using to raytrace the effect in the first place – I didn’t have to mess around and generate one. So I simply took that as an input and used my already-existing ambient occlusion code to render AO for it. And hey, it worked!

Distance-field raytraced effect with AO

This was great for the fluid dynamics scenes, but it was overkill for handling things like the car. So I simplified it..

I’m not completely proud of this, but here goes: the AO solution for the car and the spider in the end was basically a glorified .. blob shadow. A fancy one updated in realtime and with some additional terms and trickery, but still essentially a blob shadow. And why not? Essentially what I had to produce, when you break it down, was what looked like the ambient occlusion effect of the car on the ground – which is usually some variation on a flat plane. Which is a pretty good fit for a blob shadow.

So I rendered the object from above and stored the depth value (so, the height value) in a 2D render target. Then I sampled that by using the GBuffer depth and normal values, projecting the derived world space position into the heightmap’s space, reading the height value from the map and doing some monkeying around with it to create a darkness value based on the distance between the heightmap’s position and the read position – so the value got darker as they got closer together. I took multiple samples with a randomly rotated poission disc, and calculated a blurred/softened result with that. In a way it was a bit like that ATI DOF technique, actually. I blurred the blob heightmap as well, because well – it made it look better. If you’re faking anyway, why stop? And the results were pretty good! It did the job it was required to do neatly and efficiently – and it was easy to combine multiple blob shadows because they were applied in the deferred render / lighting stage, so I could just render them in screen space and blend them together.

If there’s a lesson there, it’s that you shouldn’t discard simple, old-fashioned techniques just because they’re simple and old-fashioned. Sometimes you can get more joy and much better visual results from combining a bunch of simpler techniques in a clever way than you can from trying to find one magic super-technique which handles everything.

But I still needed a solution for the city / environment. There was an effect I came up with a couple of years ago but never used in a demo which calculated ambient occlusion in realtime for a very specific case: a heightfield (heightmap) rendered as boxes. This is getting back to the idea of raytracing ambient occlusion again – the trick is, as with the other cases, to simplify the representation of the stuff to render to get it into some form that you can raytrace through easily. In this case, the representation is a heightmap, which can be raytraced efficiently by raymarching through it in 2D. You project your world-space position into the heightmap’s space, then march through the heightmap in the ray’s direction, and at each point you compare your ray’s height against the heightmap’s height. The simple way is to just compare and see when the height of the ray is below the height of the heightmap, but you can do a bit of maths with the two values instead and get a better result for occlusion. Finally, return the total occlusion for all the rays you cast for the pixel.
Cutting out one axis means you don’t need as many rays per pixel, and I could get good results with casting only 5-10 rays (spread with noise) per pixel and with only around 8 march steps per ray. i.e. the kind of figures that can actually work in realtime.

block heightfield ambient occlusion

Naturally it works for boxes because you just render a box per pixel in the heightmap, and the heightmap is a perfect fit to the boxes: you’ve got a 2D and 3D representation of the same thing and you don’t lose any information between them. But can it be stretched further? Why stop at boxes? Why not render any scene you like to a heightmap, and use that heightmap to raytrace the ambient occlusion for that scene? Of course, the quality is going to depend on how well a heightmap can approximate the scene – if there are a lot of overlaps, holes and interior details it’ll start to break down. But how much does it matter?

cube wall - logo and object with heightfield ao

It turns out that for the city, it didnt matter that much at all. A city can map quite well to a heightmap superficially – buildings and roads and so on. The great thing about it is that it efficently captured the larger-scale occlusion – precisely the stuff that SSAO lost – without rendering a huge amount of stuff every frame. It just needed the heightmap and the rays, which were cast off the GBuffer. All the overlapping stuff, bridges etc, didn’t work perfectly of course, but hey – who gives a shit? It looked good! One way this could be used is to bake the local AO for the buildings themselves – i.e. the windows of a building on that building’s walls – and blend it with this technique used for the global AO.

heightfield ao used on the city scene

One final thing remained. The first scene with the wave of arcs needed a solution and none of the methods worked well. It was fully dynamic – the geometry was procedural and animated; there was loads of arcs; the scene covered a lot of ground; and the ambient occlusion was absolutely key to the look – it would look flat and lifeless without it. Quite a challenge.

After trying out numerous things, I came up with a solution that was completely specific (read: hardcoded) to the scene in question. I knew that I was dealing with an object – an arc – that could be defined easily and mathematically, and that I had to deal with a large number of them (hundreds) but that their movement pattern was procedural, they only moved in Z and rotated, they were rotating centering on Y=0, and each occupied it’s own unique space in the X axis. I needed to cast AO from each arc onto the other arcs, and also on the ground and any other objects around (e.g. the car) – but the range of the AO effect around an arc could be limited.

First, I came up with a solution to generate the occlusion effect for one arc on any given point. Initially I defined it mathematically but in the end I just made a 2D texture which was the result of projecting the arc side-on and blurring it – my 2d “arc AO map”. In the shader I projected the world space point into “arc space” and sampled the arc AO map, and combined that with a falloff term using the distance to the arc’s major plane (and some other dirty bits of maths to beautify it). That gave me a nice AO shape for one arc.

the ao map for an arc

Then, to solve it for ALL the arcs in the scene, I created a 1d lookup texture which gave me some information about each arc – it’s position, rotation, size and so on – packed into 4 channels. I defined the arcs and the lookup texture in such a way that they were sorted in X – so I could project my world space position into “arc lookup texture space”, and I’d instantly have the closest arc in X; I could step back and forward through the texture to find the neighbours. So my solution was simple: take N samples from the 1d lookup texture around my point; calculate the AO for each arc it defined; accumulate the AO using some dirty blend function; and output the value. By changing N I could trade off performance and quality. It worked great!

arcs with ao

I think that last technique sums up the point I’m trying to get across. Generating complex rendering effects like ambient occlusion in realtime for arbitrary scenes with loads of animation that are completely beyond your control is very difficult – a single technique that works great for anything just doesnt really exist. But how many scenes are you working with that are really like that? I had a scene which, at first look, seemed to be impossible to generate good quality ambient occlusion for in real time. By breaking it down, working out ways to approximate it and mixing and matching techniques, it’s often possible to find that solution. If you know you’re only dealing with cubes or you can define it mathematically, if you know your scene is basically all sitting in a single plane, or if there are small static pieces you can bake – e.g. the local AO for a car – then exploit those things. Use all the knowledge you can to break the scene down into pieces that you can handle.

Deferred rendering is a really great help for this because it makes applying all those separate pieces very easy – you can just make a screen-sized buffer, render the different AO elements seperately sampling from the GBuffers for position, normal and object ID data, and blend the different AO elements into that screen-sized AO buffer. Then use that AO buffer as part of your lighting equations, and you’re golden.

It can be done.

It’s just a bloody pain.

But it’s worth it.

November 13, 2009

deferred rendering in frameranger.

Filed under: demoscene, realtime rendering — Tags: , , , , , — directtovideo @ 12:43 pm

(This is going to get technical. Fast.)

I’m a big fan of deferred rendering as you might have gathered from my GDC 09 talk. So it made sense that for Frameranger, and subsequent projects, I moved my demo engine over from what was essentially a forward renderer to a complete deferred renderer. I wanted to share some of the experience here. There are some good introductions to deferred rendering out there, like this one, which cover the basics so I don’t have to.

Note, I’m working on DX9 – DX10 wasn’t an option at the time of development (and still isn’t, really, until the supporting OSes take the vast majority of the market) – so that adjusts my available feature set. No depth buffer reads, no hardware MSAA.

So, why go deferred?

  • Only need one geometry pass for the main render. Previously we usually needed a z prepass for performance, and sometimes even another separate pass for motion blur velocities and for depths for SSAO and other effects. For some of the stuff we had to render which was very high poly or had a lot of draw calls, or where it wasn’t polygons at all, only having to do them once is important. (Shadows still always cost extra, though.)
  • Separate rasterisation and shading. Not having to worry about lighting and so on at the geometry stage means that the geometry stage becomes simple, and most geometry shares one shader – we don’t need so many combinations of ubershader anymore. Combined with the reduced number of geometry passes it means we can reduce our batch count, amount of state changes and number of shaders we have to generate by a lot.
  • Lighting as a 2D post process. Apply as many lights as you want, efficiently, and without having to build many uber shader combinations. As many lights as we want, as many types of lights as we want, mixing shadowed and unshadowed, potentially handling 1000s of lights.
  • Spatially optimise lighting and complex shading. Only apply light to the pixels which are actually in the light’s area of effect.
  • Only shade pixels with complex lighting shaders once (per light) – less issues with overdraw from geometry rendering.
  • Use the additional GBuffer information to do some more interesting post fx and rendering.

So there were a lot of things we wanted a piece of. Unfortunately there are some major downsides too, which is why I hadn’t gone down this road before:

  • No hardware antialiasing (on DX9). For me this had been the killer up to now. I don’t like unantialised renders. This time around though the benefits were so big that I decided to just screw it and worry about antialiasing later.
  • Overhead of memory use – we need to store those fat-ass GBuffers somewhere – and rendering. It smooths out the render performance with all the pretty fixed overheads – which makes the complex cases faster (or work at all), but the simple cases are potentially a lot slower. I decided our general case is complex enough not to care about the simple cases.
  • Potentially reduced material flexibility. We can’t just hack a shader which computes some lighting and messes with the equations for that one material – we have to do everything en-masse in 2D processes.
  • Alpha stuff still has to be handled in a second, forward rendered pass. Which means we still need a working forward renderer.

So, I managed to sufficiently minimise in my head how much I cared about the downsides and just crack on and implement the deferred renderer to see what happened. It was actually very easy to do – a day or two’s work had the whole thing up and running with multiple types of light working. Then there were weeks or months of work in adding all the interesting stuff on top.

The first task was to get the geometry rendered to GBuffers. When rendering the GBuffers it’s important to minimise the number of channels and the bit depth required – the greater it is, the more memory and slower the render is. You also have to consider how you want to read the data – you probably don’t need the colour info until late in the day, whereas the normals are needed a lot – so don’t pack something you need a lot with the colours, because it’ll be a wasted additional GBuffer read.
I used 3 or 4 MRTs for my GBuffer rendering, depending on whether or not motion blur was enabled, where each MRT was 32 bits wide. The channels I rendered to GBuffers were:

Colour+ObjectIndex : RGBA8888;
Normal+MaterialIndex : RGBA8888;
Depth : Float32
VelocityBuffer (optional – only if motion blur enabled) : RG Float16
and of course a D24S8 depthstencil. Which I cant read from, because DX9 doesnt allow it. How annoying. If I could I could skip that float32 depth, like I would on PS3. But I can’t. Damnit. Actually it works on ATI, so it’s just NVIDIA that’s the problem. Dear NVIDIA: you guys managed to hack in almost every other feature under the sun using 4CCs and other tricks, so how about adding depth buffer value reads for DX9, and multisampled buffer reads while you’re there? Go on – I bet it’s there in the drivers already, and I just don’t know what the magic incantation is to access it.

One thing that comes up a lot when discussing deferred renderers is the storage of normals. Nearaz has a really good summary / investigation of the different methods. I’m lazy, so initially I just wrote out the normal as an XYZ, deciding that if I ever needed an extra GBuffer channel I’d fix it and use XY in a compacted form and recreate Z. So far I didn’t.

Next I added the basic light types. It’s pretty easy to move the lighting code over from a forward render – it just needed the extra code to pull the values out of the GBuffers and back project to recreate positions, rather than pull everything out of the vertex interpolators.
Spot lights were trivial, of course. A single projected shadow map did the job. Point lights proved annoying because of another D3D limitation – I couldn’t create a depth stencil cubemap, and I wanted to render to a depth stencil for efficiency on render and for free hardware PCF – so I ended up making a “virtual cube map” which spread the faces out on a 2D texture. Finally, for directional lights I implemented a varying number of cascaded shadow maps. Directionals proved to be by far the most used light type, and I put some work into making it calculate a good set of shadows for the splits.

The nice trick with a deferred renderer is that I can apply the splits separately to the screen in 2D, rather than sampling all of them per pixel. I first render a series of view-aligned planes at the depth of each split, front to back, into the depth+stencil, marking the stencil where they pass the depth test. This gives me a series of stencil masks that I can use to test against when rendering full screen quads, one per split, which sample just that one split’s shadow map each.

An early test with deferred lighting using two point lights.

The initial work of adding the lighting was quite easy, but it instantly proved the benefits of deferred shading. Previously, just adding a new type of light, changing lighting code or adding more influencing lights cost work – a new shader code path which had to be propogated through the ubershaders – increasing the compile time every time – and then into the code to select the ubershaders. There was a hard limit on the complexity of each light and the number of lights, because of the hard limit on the size of shaders and, more pressingly, the number of textures that could be used at once. But now it was simply a case of adding another 2D pass, and a single piece of shader code which could be edited and reloaded over and over again easily. I had never bothered to add all the different light types or cascading shadow map support to my forward renderer because it required too many permutations and too many simultaneous textures, but now I had it all working easily, and finally could handle shadows well from massive scenes with directional lights. Adding deferred rendering had already paid off.

Now, what about those classic problems with deferred rendering?


The next issue was that of flexibility – how to get materials that look different to each other. With forward renders it’s easy – you just make a different shader for the material, and change the behaviour of that material. But with deferred renderers it doesn’t work like that – everything is done in a series of 2D passes on the whole screen. So, you have to render extra information into the GBuffers which tell those passes how to produce the correct behaviour for each pixel. Unfortunately GBuffers get fat fast if you add a lot of parameters. Most of our parameters varied per material, so I simply created a material palette as I was rendering the objects of all the unique materials, and wrote two indices to the GBuffers – the object index, and the material index. Both limited to 256 indices (lists updated per frame). Bad news? Well, if we had that many draw calls we’d be screwed performance-wise anyway, so a 256 material+object limit didn’t matter.

As development continued, the data in our material palette grew and grew. By the end it was several textures worth – with data for fresnel coefficients, how to apply envmaps, various light equation constant modifiers, and so on. It enabled us to easily adjust the material parameters without adding a lot to our GBuffers. I’ve heard it said that this doesn’t work – because you want to vary a lot per pixel, like specular gloss, specular power, and so on, and this doesn’t allow it. I’ve also heard that you need lots of different shaders to get a good look. Well, it was never the case for us. Probably 90% of our geometry always went through the same default shader; we didn’t adjust that much per pixel in textures except where it was really needed – it added a lot more work to the art side as well as more space requirements.

Another useful thing about the material palette came to light later on when optimising the renderer to reduce draw call counts. The only remaining per-material parameters that were used when rendering the mesh were the textures and the material index – everything else was in the material palette. That meant that it was easy to merge meshes together and store a palette index in a vertex buffer channel I didn’t need (I used vertex colour alpha). This meant I could pack the meshes down completely except where the textures differed, but still allow on-the-fly editing of the separate material properties. In addition I could actually change the material index per pixel if I wanted – e.g. using a mask texture to select between two materials – with very little overhead. This exposed a whole new set of tricks and went some way to solving the problem of not being able to vary material properties using textures.

Besides that, where we did need something special there were a few tricks we could use when applying shading in the deferred passes. The material ID + object ID could be used to mask in whole special materials for certain objects (or parts of objects). For example, the car had a special paint shader that was masked in. Each material palette entry had a world-space bound box which was accumulated for all the objects which used it per frame; this was used to generate accurate-enough masks for 2D passes quickly and efficiently. And when it came to that extra bit of per-pixel data we needed but just didn’t have – we generated it. A simple function of the position and normal was plenty enough to sample a dirt texture or noise texture for fading reflections in and out or adjusting the bluriness. It’s a basic, hacked up form of deferred texturing, and it did the job nicely. Fortunately it’s really easy in my renderer to add extra passes and stages into the rendering pipeline, so this was something we could use a lot to customise things.

custom shader used for the car paint, applied in the deferred render

The deferred approach obviously worked great for lights. Number of lights is usually the big sell for deferred rendering. But it worked great for environment maps too. In Frameranger we had quite a lot of shiny stuff – e.g. a car and a robot – and we wanted to handle it by multiple dynamic environment maps. With the deferred render it was easy to apply. We attached dynamic envmap nodes to things so they moved around, and then attached different objects as inputs and outputs. The inputs get rendered to the envmap, and the outputs get the envmap applied to them. To apply, I generated a small dynamic 1d mask texture which mapped the object indices to white or black – so I could sample it per pixel using the object index and determine whether that object was affected by the envmap. I calculated world-space bounds for the objects which received the envmap and used that to roughly stencil in the shader, and applied the envmap additively, adjusting it using parameters from the material tables. To control fresnel reflection we wanted something better than the usual single “fresnel power” parameter. In Lightwave you can create an envelope for it and explicitly control the fresnel response over the range of the incident angle values, and I wanted something similar – so I exposed 4 control values and used a 1d bezier curve to interpolate them. Worked great – you could generate a very flexible response with it.

Non-polygonal elements

We hand to render more than just triangle-based meshes. Some of the effects – specifically the liquids / fluid dynamics – were raytraced using distance fields. Rather than write a special shading path to handle their lighting I decided to just add these into the deferred rendering pipeline and use the routines that were already in place. This had the benefit of making them interact properly with the other objects in the scene, casting shadows onto them, receiving shadows from them, working with dynamic environment maps and so on – it meant the effects looked properly part of the scene, not just floating in space separate from everything else.

raytraced fluid deferred rendered and mixed with poly elements

It was pretty easy to add. I raytraced the distance fields and output the results straight to the GBuffer – depth, normal, colour, etc. This also meant the ZBuffer was correct so the effects overlapped properly with the rest of the geometry. For the shadow map rendering passes I just raytraced them from the light’s point of view straight into the depth shadow map. It worked out nicely without too much effort.

raytraced fluid with deferred rendering, mixed with polygonal elements

Alpha stuff

Alpha stuff doesn’t like deferred rendering – sad but true. Actually I have found a way around it so you can perform deferred rendering on some alpha stuff too – I’ll get onto that in another post – but in terms of general alpha blended stuff, you’re limited to using a forward render which mimics the look of the deferred rendered geometry. Fortunately in Frameranger we didn’t have that much generic alpha mesh stuff to deal with – the particles, smoke, light beams and so on were already special cases or handled with effects in other ways – so it wasn’t a massive deal. We also avoided treating punch-through alphas or cutouts as “alpha” by using alpha testing dithered with a random noise map. As it turned out I just used my ubershadered forward rendering code for the alpha stuff like a “legacy” pass. One compromise I made was to skip shadow receiving for alpha stuff – although it would have been possible, it would have meant I’d have had to keep the shadow maps around longer than the deferred passes, whereas at present the same maps could be reused for all the lights. In reality, the only real alpha stuff we had to deal with in this way were a couple of transparent bits on the car.


One of the major downsides of deferred rendering is the inability to apply standard hardware MSAA to it. On some architectures it’s possible to use MSAA when rendering the Gbuffers, but you have to do the slow bit – using the GBuffers to perform deferred rendering passes – on a per-sample basis. i.e. for 4x MSAA, you have to light 4x as many pixels and then average the results at the end. Our aim is to achieve a comparable quality of antialiasing as MSAA provides forward renderers, but with the cost of deferred rendering with unantialiased rendertargets – i.e. only lighting the number of actual pixels on screen. With deferred rendering much of the cost of rendering is pushed to the deferred, 2D passes, so it’s important to avoid incurring a large performance cost there when adding antialiasing – scaling that cost by 4 to support 4x MSAA is not feasible.
On consoles or DX10 the natural starting point is to render the geometry to MSAA GBuffers and to try and optimise the lighting process so you don’t need to light every sample. Indeed, I outlined how to optimise the process on Playstation 3 in my GDC 09 presentation. That can reduce the number of additional samples you have to light to only around 20-30% more than the number of pixels in the unantialiased buffer, which is a great improvement but still costs.
On DX9 the problem is even worse because you can’t read the individual samples from an MSAA buffer, so MSAA is completely unusable for deferred rendering there.

So, plan B then.

There are several ways to tackle antialiasing of deferred renderers. First, you could just render everything 2x or 4x the size, light it as usual, and downsample it at the very end. It looks nice – much like MSAA on a forward render, really – and it’s easy to add. But it has exactly the effect on framerate you might imagine, so it’s not a practical solution for realtime. So the usual way people try and do it is to fake it – perform a 2D post process on the result of the deferred render which somehow works out where the edges are and fixes them in a way that looks like they were rendered with antialiasing. This approach is apparently rife on xbox360 titles where the hardware’s dubious memory arrangement makes using proper MSAA on HD framebuffers problematic.

So, how would that magic post process work exactly? Step 1 – detecting edges – is easy, particularly in a deferred renderer where we have a ton of information around to help. An edge detection kernel filter applied to the depth, normal and material index/object index usually gives great results, far superior to using a colour buffer for our purposes. Step 2 – antialiasing those edges – is a little bit more difficult. It’s important to remember that what antialiasing is doing is over-sampling: generating a load of possible values and averaging them. The usual approach with post-process AA is to blur the neighbouring pixels on the edges, which is really the opposite of what we wanted. So it doesn’t really work. For Frameranger I experimented a lot and managed a reasonable attempt at it which used noise, a poisson disc and some magic weighting of the samples – and it looked marginally better than a typical edge blur – more like an “edge dither”.

Here’s some screenshots of the edge noise technique. As you can see, it doesn’t entirely look like antialiasing. Actually as an effect, adding noise to the edges, it was alright – but as antialiasing it wasn’t a good substitute. It appeared that if I wanted to achieve the look of antialiasing I was going to have to move away from using only the 2D results as input and render the geometry differently in the first place – try and gain some more information that way. By the way, all the antialias comparsion screenshots should be matched so you can download the images and diff them or toggle back and forth between them if you want to see the differences.

edge noise off
edge noise on

The second approach I used borrowed from temporal reprojection techniques, and a very old way of antialiasing in an OpenGL example. In that example, antialiasing was done by rendering the scene over and over again to an accumulation buffer and jittering the projection matrix by a sub-pixel amount each time. It’s basically stochastic sampling of the render but flipped around so that you render the screen with one stochastic offset, then again with another offset etc. and average the results at the end. When you offset the projection slightly you cause the edges of the triangles to move slightly, and you get slightly different coverage and a different set of aliasing artefacts – when you average enough of them together you get an antialiased image.

Sadly, as you might expect, rendering the scene loads of times per frame isn’t too practical for realtime. But we can take the basic idea and split it over frames – so each frame we slightly jitter the projection matrix when we render, and we blend the current frame on top of the previous frames with a low alpha value. That works great and gives you a really nice antialiased image.. as long as nothing moves. As soon as you move you get big ugly motion-blur-like trails. So what we need to do is try and fix the case where it moves.

I split “movement” into two cases: object movement and camera movement. Camera movement is where temporal reprojection comes into play. How this works is, when we’re blending the current frame onto the previous frame we don’t just use the same pixel in the previous frame. Instead we project the current frame’s pixel back into the previous frame’s camera space by using a combination of the inverse view projection for the current frame and the view projection from the previous frame – which also requires the pixel depth, which is fortunately kicking around in a GBuffer – then sample the pixel there and blend it. This is basically trying to track positions in world space as they move in screen space. It does indeed fix most of the camera movement artefacts. Of course, problems do occur – at the edge of the screen, or where pixels that were occluded become unoccluded and vice versa. To cancel those, I weight the alpha of the blend by the world-space distance between the previous and last frame pixels. What it’s trying to work out is “is it really the same pixel”, and if it isn’t, don’t try and blend it – just overwrite it. Conveniently that can be used to fix object motions too. Finally use an edge detect mask on the whole thing so only the edges get blended.

Here’s some images to compare (off and on):

temporal AA offtemporal AA on

Well, it works! It actually works pretty nicely. The thing about temporal techniques is that they settle over time – so when you get a relatively static screen it quickly convolves to an antialiased image, becoming more aliased as movement is introduced. For a situation where you didn’t have too much movement on screen it’s a good technique. The problem is, for Frameranger we had some very fast camera movements – it just wasn’t working well enough.

Here’s some shots showing how it looks when the red cube is in motion (off and on again) – and with a small camera motion too. As you can see, some aliasing does creep back in.
temporal aa off in motiontemporal on in motion

Finally, over time, I came up with an actual real solution to deferred rendering with antialiasing. Unfortunately it requires an additional geometry pass, but it gives you the look of proper MSAA. The idea is to render the GBuffers, lighting passes and so on to a non-MSAA buffer, then re-render the geometry to an MSAA buffer, which a shader that just samples the buffer containing the lighting results using a bilateral filter based on depth. Then resolve that MSAA buffer to give you an antialiased result. This is in a way quite similar to a light prepass using inferred lighting, but the MSAA pass only samples the lighting results buffer – it doesn’t need to compute any shading of it’s own.

We know that the reason MSAA is efficient is it only runs the pixel shader once per pixel, but generates a depth/stencil value for each sample in the pixel – so depth test+write is performed multiple times. This means that for primitive interiors with no intersections, the value of the pixel will be the same as for the non-MSAA buffer – only the edges of primitives and the intersections between primitives will have different values. This technique exploits this. We generate the depth values for MSAA by re-rendering the primitives; we just need to work out what colour to write for each pixel on each primitive. On edge pixels for a non-MSAA buffer, the final pixel value will be that of the front-most primitive. On edge pixels for a resolved MSAA buffer, the pixel value will be an average of the MSAA sample values – the value of the front-most primitive per sample. This technique estimates the value per primitive to write by using the bilateral filter to look around a small area around the pixel in the same screen location as the one it’s currently shading on the non-MSAA buffer, and asking “which of these probably came from the primitive I’m rendering, and is therefore a good estimate”. Or in practice a weighted average of the pixels in the area, weighted by the difference between the non-MSAA depth and the primitive’s pixel depth.

Bilateral upsampling is an extremely useful technique for fixing up edges. It’s also a very good way to upsample lower resolution soft particle buffers, for example. I’m pretty happy with this method, and it’s what we’re now using for the much improved Frameranger final version (due out soon!). It’s made a lot of difference to the quality and cleanness of the look, with a pretty acceptable overhead. It scales to 4x, 8x or more MSAA samples nicely and only impacts the cost for that one pass, which has a reasonably simple shader and output bandwidth requirement (compared to the GBuffer stages or the deferred lighting passes). It actually has some similarities to Inferred Lighting, although my method is just for antialasing and not for shading.

bilateral aa off
bilateral aa on

Now, if you happen to be on a more flexible API or piece of hardware than me where you can read samples from MSAA buffers – like a PS3 – there’s an optimisation / extension to this – you can use the same technique but avoid the re-render. It goes like this: render the GBuffers to MSAA targets; resolve the buffers using a point sampling scheme – e.g. “pick top left sample”; run the lighting processes on these resolved buffers; now perform an additional fullscreen pass:, read in your original MSAA Gbuffers, then for each MSAA sample from that buffer – perform a bilateral filter w.r.t depth/normal, sampling from the resolved GBuffers and light buffer, to weight the resolved light buffer samples for that MSAA sample from the GBuffer. That will give you a bilateral upsampled light result per MSAA sample of the Gbuffer, and you can average them in the shader and write out one final antialiased light value. Clearly you only need to run these shaders on edges if you wish to optimise it further. So there you go – a practical solution to deferred rendered antialiasing which only needs one geometry rendering pass, and lets you perform lighting on a single-sample screen-sized buffer without worrying about MSAA at that stage. Shame it doesn’t work on DX9, because then for me it would be the ideal solution.

Coming up in part 2 : ambient occlusion.

October 6, 2009

a thoroughly modern particle system.

Filed under: demoscene, fluid dynamics, realtime rendering — directtovideo @ 3:30 pm

particles in blunderbuss

During the making of Frameranger, I spent some time looking into making a “modern particle system”. Particles have been around for ever and ever, and by and large they haven’t changed that much in demos over the last 5-10 years. You simulate the particles (around 1000 – 100,000 of them) on the CPU, animating them using a mix of simple physics, morphs and hardcoded magic; sort them back to front if necessary, and then upload the vertex buffers to the GPU where they get rendered as textured quads or point sprites. The CPU gets hammered by simulation and sorting, and the GPU has to cope with filling all of the alpha blended, textured pixels.

However, particles in the offline rendering / film world have changed a lot. Counts in the millions, amazing rendering, fluid dynamics controlling the motion. Renderers like Krakatoa have produced some amazing images and animations. I spent some time looking around on the internet at all sorts of references and tried to nail down what those renderers had but I didn’t – and therefore needed. This is something I do a lot when developing new effects or demos. Why bother looking at what’s currently done realtime? That’s already been done. 🙂

I decided on the following key things I needed:
1. Particle count. I want more. I want to be able to render sand or smoke or dust with particles. That means millions. 1 million would be a good start.
2. Spawning. Instead of just spawning from a simple emitter, I want to be able to spawn them using images or meshes.
3. Movement. I want to apply fluid dynamics to the particles to make them behave more like smoke or dust. And I want to morph them into things, like meshes or images – not just use the usual attractors and forces.
4. Shading. To look better the particles really need some form of lighting – to look like millions of little things forming a single solid-ish whole, not millions of little things moving randomly and independently.
5. Sorting. Good shading implies not additive blending, which implies sorting.

The problem with simulating particles on the CPU is that no matter how fast the simulation code on the CPU is, you’re going to hit two bottlenecks sooner or later: 1. you have to get that vertex data to the GPU – and that can make you bandwidth limited; and 2. you need to sort the particles back to front if you want to shade them nicely, which gets progressively slower the more you have. Fortunately given shader model 3 and up, it’s quite doable to make a particle system simulate on the GPU. You make big render targets for the particle positions, colours and so on; simulate in a pixel shader; and use vertex texture fetch to read from that texture in the vertex shader and give you an output position. Easy. Not quite – simulating on the GPU brings it’s own set of problems, but on to that later. Modern GPUs are sufficiently fast to easily be able to perform the operations to simulate millions of particles in the pixel shader, and outputting 1 million point sprites from the vertex shader is doable.

Shading is the biggest problem here, and the shading problem is mainly a lighting problem. Lighting for solid objects means a mix of – diffuse+specular reflection; shadows; and global illumination. Computing diffuse and specular reflection requires a normal, something which particles do not really have, unless we fake it. So that was my first line of attack – generate a normal for the particle. It would need to be consistent with the shape of the system, locally and globally, if it was going to give a good lighting approximation. I tried to use the position of the particle to generate a normal. It turns out that’s rather difficult if you’ve got something other than a load of static particles in the shape of a sphere or box. Then I tried to use a mesh as an emitter and use the underlying normal from the mesh for the particle. It did work, but of course once the particle moves away from it’s spawn point it becomes less and less accurate.

The image here shows particles generated from the car mesh in Frameranger, matching the shading and lighting.
particles mapped to the car from frameranger

I needed a better reference, so I looked away from solid objects and had a look at how you would light a volumetric object – e.g. a cloud. Which in real life is actually millions of millions of little particles, so maybe it makes quite a good match to lighting, well.. particles. It works out as a model of scattering and absorbtion. You cast light rays into the volume, and ray march through it. Whenever the ray hits a cell that isnt empty, a bit of the light gets absorbed by the cell and a bit of it scattered along secondary rays in different directions, and the rest passes on to the next cell. The cell’s brightness is the amount of light remaining on the ray when it gets to that cell. Scattering properly is hideously slow and expensive so we’ll just completely ignore it, and instead add a global constant to fake it (a good old “ambient” term). That just leaves us with marching rays through the volume and subtracting a small amount per cell, scaled by the amount of stuff in the cell. This actually works great, and I’ve used it for shading realtime smoke simulations – with a few additional constraints, like fixing to directional lights only and from a fixed direction, you can do it pretty efficiently. It looks superb too.

The problem is that the particles are not in a format that is appropriate for ray marching (like a volume texture). But the look is great – we just need a way of achieving it for particles. What we’re dealing with is semi-transparent things casting shadows, so it makes sense to research how to handle that. The efficient way of handling shadows for things nowadays is to use shadow maps. But shadow maps only work for opaque things – they give you the depth of the closest thing at each point in a 2D projection of light space. For alpha things you need more information than that, because otherwise the shadows will be solid.

Or do you? The first thing I tried was very simple – to use exponential shadow maps. Exponential shadowmaps have a great artefact / bug where the shadow seems to fade in close to the caster, and this is usually annoying – but for semi-transparent stuff we can use it to our advantage. Yep, plain old exponential shadowmaps actually work pretty well as shadowmaps for translucent objects – as long as those translucent objects aren’t all that translucent (e.g. smoke volumes). The blur step also makes small casters soften with those around them. It’s pretty fast too, and it almost drops into your regular lighting pipeline. But, for properly transparent (low alpha) stuff like particles, it’s not quite good enough.

The really nice high end offline way is to use deep shadow maps. That basically gives you a function or curve that gives you the shadow intensity at a given depth value. It’s usually generated by buffering up all the values written to each pixel in the map (depth and alpha), sorting them, and fitting a curve to them which is stored. Unfortunately it doesn’t map too well to pixel shader hardware. However there is a discrete version which is much simpler – opacity shadow maps. For this you divide depth into a series of layers and sum up the alpha value sums at each layer for the stuff written with a depth greater than that layer. On modern GPUs that’s actually pretty easy – you can fit 16 layers into 4 MRTs of 4 channels each, and render them in one pass! Unfortunately it’s not expandable beyond that without adding more passes, but it’s good enough to be getting on with – as long as you don’t need to cover a really large depth range and the layers are too spaced out. But this gives us nice shadows which work with semi-transparent stuff properly. You could even do coloured shadows if you didn’t mind less layers or multiple passes.

The next issue is how to apply that shadow information to the particles – it requires samping from 4 maps plus a bunch of maths, and isn’t all that quick. If we did it for every pixel rendered for the particles, it’d hammer the already stressed pixelshader. If the particles are small enough we could just sample it once per particle in the vertex shader – but it’s too many textures to sample. Fortunately the solution is easy – just calculate a colour buffer using the fragment shader, with all the lighting and shading information per particle in it, and sample that in the vertex shader. The great thing about that is it’s really similar in concept to the deferred renderer I’ve already got for solid geometry. You have a buffer containing positions and other information; you perform the lighting in multiple passes, one per light, blending into a composite buffer; then sample that composite buffer to get the particle colour when rendering to the screen. It’s so similar in fact to the deferred rendering pipeline that I can use the almost same lighting code, and even the same shadow maps from solid geometry to apply to the particles too – so particles can cast shadows on geometry, and geometry can cast shadows on particles.

particle lighting 01particle lighting 02

This shading pipeline – compositing first to a buffer, one pixel per particle – opens up new options. We can do all the same tricks we do in deferred rendering, like indexing a lookup table which contains material parameters for example. Or apply environment maps as well as lights. Or perform more complicated operations like using the particle’s life to index a colour lookup texture and change colour over the life of the particle – make it glow at first then fade down. It allows multiple operations to be glued together as separate passes rather than making many combinations of one shader pass.

So, we have a particle colour in a buffer. The next job is to render the particles to the screen. We’ve gone to all this effort to colour them well that we need to consider sorting – back to front – so it actually looks right. This could be problematic – we’ve got 1 million+ particles to sort, all moving independently and potentially quite quickly and randomly, and it has to be done on the GPU not the CPU – we can’t be pulling them back to CPU just to sort.

I had read some papers on sorting on the GPU but I decided it looked totally evil, so I ignored them. My first sorting approach was basically a bucket sort on GPU. I created a series of “buckets” – between 16 and 64 slices the size of the screen, laid out on a 2d texture (which was massive, by the way), with z values from the near to the far plane. Then I rendered the particles to that slice target, and in the vertex shader I worked out which slice fit the particle’s viewspace z value, and offset the output position to be in that slice. So, in one pass I had rendered all the particles to their correct “buckets” – all I had to do was to blend the buckets to the main screen from back to front, and I got a nicely sorted particle render which rendered efficiently – not much slower than not sorting at all. Unfortunately it had some problems – it used an awful lot of VRAM for the slice target, and the granularity of the slices was poor – they were too spread out, so sometimes all the particles would end up in one slice and not be sorted at all. I improved the Z ranges of the slices to fit the approximate (i.e. guessed) bound box of the particle system, but it still didn’t have great precision. In the end on Frameranger the VRAM requirements were simply too high, and I had to drop the effect. It turns out that the layers method is very useful for other things though, like rendering particles into volumes or arbitrary-layered opacity shadow maps.

When I revisted the particle effect, I knew the sorting had to be fixed. I looked back at the papers on GPU sorting, specifically the one in GPU Gems. They seemed very heavyweight – a sort of a 1024×1024 buffer (i.e. 1 million particles) would require 210 passes over that buffer per frame, which is completely unfeasible on a current high end GPU. But there was one line which caught my attention – “This will allow us to use intermediate results of the algorithm that converge to the correct sequence while we do more passes incrementally”. One of the sorting techniques would work over multiple frames – i.e. for each iteration of the algorithm, the results would be more sorted than the previous iteration – it would not give randomly changing orders, but converge on a sorted order. Perfect – we could split the sort over N frames, and it would get better and better each frame. That’s exactly what I did, and it actually worked great. It used much less memory than the bucket sort method and gave better accuracy too – and the performance requirements could be scaled as necessary in exchange for more frames needed to sort.

There are some irritations with simulating particles on GPU. Each particle must be treated independently and you have to perform a whole pass on all the particles simultaneously. It makes things which are trivial on CPU, like counting how many particles you emitted so far that frame, very difficult or not feasible at all on GPU. But it’s a rather important thing to solve – you often need to be able to emit particles slowly over time, rather than all at once. The first way I tried to solve that was to use the location of the particle in the position buffer. I would for example emit the particles in the y range 0 to 0.1 on the first frame, than 0.1 to 0.2 on the next, and so on. It worked to a point, but fell down when I started randomising the particle’s lifetime – I needed to emit different particles at different times. Then I realised something useful. If you’re dealing with loads and loads of something – like a near infinite amount – then doing things randomly is as good as doing things correctly. I.e. I dont need to correctly emit say 100 particles this frame – I just need to try to emit e.g. roughly 1% of particles this frame and if I’ve got enough particles in the first place, it’ll look alright. The trick is that those 1% is the right amount of random.

I’ll explain. The update goes like this: 1. generate a buffer of new potential spawn positions for particles. 2. Update the particle position buffer by reading the old positions, applying the particle velocities to them, and reducing the life; then if the life is less than 0, pick the corresponding value from the spawn position buffer and write that out instead. So, each frame I generate a whole set of spawn positions for the particles, but they only get used if the particle dies. But how to control the emission? Clearly if I put a value in the spawn buffer which has an initial life of less than 0 and it gets used, it’ll get killed by the renderer anyway and the next frame around it’ll respawn again – i.e. the particle never gets rendered and doesn’t really get spawned either. So if I want to control the number of particles emitted I just limit the number of values in the spawn buffer each frame that have an initial life greater than 0.

How do I choose which spawn values have valid lives? It needs to be a good spread, because the emission life is also randomised – some particles die earlier than others and need respawning. If I simply use a rolling window it’s not random enough and particles stop being spawned properly. If I actually randomly choose, it’s too random – it becomes dependent on framerate, and on a fast machine the particles just all get spawned – the randomness makes it run through the buffer too fast. So, what I did was a compromise between them – a random value that slowly changes in a time-dependent way.

The other nice thing about this spawn buffer was that it made it easy to combine multiple emitters. I could render some of the spawn buffer from one emitter, some from other, and it would “just work”. One of the first emitters I tried was a mesh emitter. The obvious way would be to emit particles from the vertices but this only worked well for some meshes – so instead I generated a texture of random positions on the mesh surface. I did this by firstly determining the total area of all the triangles in the mesh; then for each triangle spawning a number of particles, which was the total number of particles * (triangle area / total area). To spawn random positions I just used a random barycentric coordinate.

Here’s an early test case with particles generated for a logo mesh and being affected by fluid.
particle logo 01particle logo 02particle logo 03particle logo 04

Finally I needed some affectors. Of course I did the usual forces, but I wanted fluid dynamics. The obvious idea was to use a 3d grid solver and drive the particles by the velocities. Well, that wasn’t great. The main problem was that the grid was limited to a small area, and the particles could go anywhere. Besides, the fluid solver was quite slow to update for a decent resolution. So I used a much simpler method that generated much better results – procedural fluid flows (thanks Mr Bridson). Essentially this fakes up a velocity field by using differentials of a perlin-style noise field to generate fluid-like eddies – “curl noise”. By layering several of these on top, combined with some simple velocities, it looked very much like fluid.

The one remaining affector was something to attract particles to images. To do this I generated a texture from the image where each pixel contained the position of the closest filled pixel in the source image – a bit like a distance field but storing the closest position rather than the distance. Then, in the shader I projected the particle into image space, looked up the closest pixel and used that to calculate a velocity, weighted by the distance from the pixel. With a bit of randomness and adjustment to stop it affecting very new or very old particles, it worked a charm.

particles running under a fluid sim and attracted to an image

And there we have it – a “modern” particle system that works on DirectX9 – no CUDA required! I’m sure this will develop over time. With better GPUs the particle counts will go up fast – between 4 and 16 million is workable already on a top end Geforce, and it’ll go up and up with future hardware generations. In fact I have a host of other renderers for the particles besides this simple one – things to do metaballs, volume renders and clouds, for example – and a load of other improvements, but that can wait for another demo..

By the way, there’s a nice thing about GPU particles that maybe isn’t immediately obvious. You’re writing all the behaviour code (emission, affectors..) in shaders, right? And you can probably reload your shaders on the fly in your working environment. All of a sudden it makes development a lot easier. You don’t need to recompile and reload the executable every time you change the code, you can simply edit and reload the shader in the live environment. Great eh?


Filed under: demoscene — directtovideo @ 9:09 am

I made a new demo last week (specifically, the last two days of last week).
blunderbuss by fairlight

The great thing for me about making large demos which take ages, like Frameranger, is that they generate a lot of new material. So much stuff got tried out over the course of making it. Some of those things never made it in, some did but only as a small cameo performance. The bad thing about doing large demos which take ages is that they take ages, they’re a lot of hard work and stress, and ultimately don’t give a very good return on the time and effort invested – so much material doesn’t get used or barely gets noticed, and you’d make approximately one demo every year, which isn’t good for the release status of most demo parties. Also you tend to lose your way and change direction a few times over the course of development, which just leads to more wasted material.

That’s where doing small demos comes in. Doing something small and fun in a few days, with a single concept and small amount of material. You start it, make it, finish it and release it before you have a chance to change your mind or lose your way. Such was the development of this demo. It’s like comparing a holiday fling to a long term relationship – a nice mix of both gives you a well rounded life..

This demo started out as an effect in Frameranger that didn’t get used too well. I wanted to work on it some more and add to it – solve some problems that showed up during development. Then I started looking around the internet for ideas about how a modern particle effect should look. It’s a common process for me for realtime graphics development – first doing a search to see what the state of the art is in offline rendering, then trying to get as close to it as possible in the constraints of realtime. I found some really interesting sources related to the offline particle renderer for 3ds Max – Krakatoa. The main thing I picked out was the really nice looking shadows and lighting, and the fluid dynamics movement, so I tried to do something similar. I’ll follow this post up soon with the technical details.

My test started to develop into something that could actually be a demo in it’s own right. On the monday I looked around on my hard drive to see if there was a suitable piece of music I could use, and remembered a great tune by Bliss which I had my eye on using for a while. Bliss is an old friend – he actually did the soundtrack for my very first demo years and years ago. He doesn’t do much for the demoscene anymore but we still keep in touch now and again. I sent him a message to ask if I could use it and he said yes. The music is in my opinion totally beautiful – give that man a record deal!

The next couple of days were spent developing the effect code for the demo – adding text attractors and suitable emitters to the particle system – and then finally building the piece in the demo tool. The piece is much shorter than usual at only 2 minutes long, and it only features one effect, so it’s incredibly simple compared to most of the demos I’ve done before. That meant that it had to be more polished. Even though it’s only one effect for the duration I tried to give it a “story arc” – changing colours, emission rates, synced hits and using text attractors in the right place so it remained interesting over the timeline. It was done in good time to enter in the demo competition at Main, and placed 2nd. Not bad for a couple of days work.

I should point out in case it wasn’t already obvious – the concept isn’t exactly original. Here are some of the main sources of inspiration, although there are plenty out there:
youtube; youtube: these gave me the basic idea for the effect – the movement, rendering, and also the idea to decay the colours on an arc over time moving through several different colours over the life of the particle, which really helped shape the look. These, particularly the second one, really formed the effect – thanks to the author!
youtube: A beautiful piece. Some of the presentation ideas leaked into mine – e.g. the concept of a side scroll and the use of a spot light. It’s also a great example of how to turn one effect into a whole piece and give it a “story arc”. The particles and movement aren’t as nice as the Krakatoa ones though – I believe this used Trapcode Particular for aftereffects.
link: this helped form the smoke idea. Amazing piece – I aspired to get anywhere close to it.

September 30, 2009


Filed under: demoscene — Tags: , , , , , , , — directtovideo @ 1:45 pm

frameranger by fairlight,cncd,orange

2008 rolled around, and it was time for us to get over the disappointment of Media Error’s poor public reception and try and do something, again, to try and win the Assembly demo competition – the thing that still eluded us after all these years. The major criticism of Media Error had been its flow – some people thought it was a series of unconnected scenes. To me the theme was obvious (obvious!!) – a TV flicking channels, and a thing about war – but apparently it was too vague for the average viewer, who needs to have his hand held from a to b. Apparently a demo only has “flow” nowadays if it either a) consists of one single scene, or b) is so covered in noise that it’s hard to argue what the viewer is seeing and therefore say if there is any flow or not. Making a demo with actual discrete scenes and a variety of content is apparently a complete no-no nowadays. But that’s another post entirely.

Anyway, we decided we did want to do something with a more clear storyline. The rough concept we settled upon was a “story” about a robot (car) coming to earth in a space ship, landing in an explosion of debris, driving to a city, battling a large spider robot and then transforming into humanoid form and finishing the job. Yes, it’s an 11 year old comic book/sci-fi nerd’s wet dream. But seeing as we’re “cool”, the clever part is it was all to be done with style, clever art direction and graphic design. Think the art house version of Transformers.

The second thing we wanted was a “demo part” – something seemingly unconnected to the main story where we throw in a load of big heavy effects to say “hah, we’re better than you”, in classic demo tradition – a homage to the demos of the early 90s, where a “design part” / graphics-heavy story section would give way to a series of vector effects which showed off how massive the coder’s balls are. That’s another problem with the demo-watching audience these days – sadly the average technical knowledge has dropped off quite heavily, so often the audience doesn’t realise if there’s an “effect” or some clever code at work unless there’s a massive red arrow pointing to it saying “clever effect here”, or if the effects are presented in such a way that it’s bleeding obvious that there’s a big effect there. Well, we’re “cool” so we don’t do that usually – we like it when the effects are seamlessly mixed into graphics and design, so you can’t tell exactly what’s going on – but it does lead to people missing the clever stuff. We decided to rectify that with the “demo part”.

It became apparent this demo was a pretty tall order. The scenes were massive and numerous; the animation requirements far exceeded previous productions; and the story / direction meant that it was a lot harder to chop and change – in most demos if you lose a scene you can fill it with something else, but if there’s a story to follow and you lose a scene the story stops making sense.

Assembly 2008 came around. Unfortunately I was busy fixing two other entries – a 4k and a 64k (Panic Room, which won the 64k competition) – so I didn’t devote as much time to the demo as was needed. In the end we all ended up, as has become tradition, in Destop’s apartment on the thursday night of Assembly. One small issue was that I had previously visited the party place on the way to the airport to drop off some friends, and I happened to stop by the Assembly sub-event “Boozembly”. Assembly is an event attended by kids of all ages, unfortunately including those under 18 – and the insurance and licensing difficulties mean that they have to make it completely dry – no alcohol is sold, or even permitted in the hall, something strictly enforced by the numerous security guards. In fact they even sometimes check your car for alcohol if you park in their car park. Fun and games ensue every year as people try and find ways to get alcohol inside – something I’ve managed just once, but that’s another story.

To make up for the lack of booze inside there is an unofficial sub-event that occurs on some rocks a few hundred metres away from the hall called “Boozembly”, which is usually a complete mess of drunks wandering around in the darkness falling over (and off) sharp rocks. I remember one year – I think it was 2003 – two members of Noice had ambulances called within hours of each other for different reasons. Anyway, I had visited the rocks while going to the party hall and had a few beers before heading to Destop’s – so by the time I got there it was very late, and I wasn’t going to be too productive.. the next morning we decided to make the sensible decision and give up. For once we felt the content was too good to waste on a rushed demo.

We set numerous deadlines for release. The first one that went by was NVScene a few weeks after Assembly; later we tried for Breakpoint at easter 2009. All came and went, and Assembly came around again. This time there was a sense we really had to make it. We cleared our schedules and dedicated some proper time for it. For the first time in years I wasn’t able to attend Assembly, so for once that weekend of crunch in Helsinki wasn’t going to happen – it had to get made properly on time. For once we actually planned it out. We had time – a couple of months. We had a lot of content and assets already done, a stable toolset, and we knew exactly what we needed to do.

For this demo we tried a different approach to actually making it. For previous demos I’ve worked on, they usually followed a certain pattern. We made some graphics or effects; we set them up in the demo tool, and spent an age tweaking the lighting, shaders and layering on the post fx; and it remained like that until just before the deadline, when we quickly jammed in some cameras and animation in the scene and called it “done”. What that gave us was a load of pretty but boring scenes. Or worse, a lot of work is done on content that never gets seen by the demo camera. This time we wanted it to be different. We decided to focus on the weakest thing in our demos – the “action” – and get that done first. Post fx, lighting and shading are important, but ultimately a distraction – you can spend hours tweaking colours. It’s the easiest form of procrastination. We went for an approach that some gamedev teams use and tried to build the demo “in grey” – almost unshaded, no post fx, simple lighting and placeholders where necessary, to get the cameras and action right first. I actually believe this did work, and it’s a good way to go – it made us concentrate on getting some nice motions and direction down early, and we knew where the camera was going to be before we set up the lights, post fx and shading.

work in progress shot of the city.

The challenge was still immense. We hit a lot of problems – technical, artistic and logistic. The number and size of scenes meant that our previous approach of baking lightmaps for ambient occlusion / global illumination wasn’t feasible – it would send us way over the 64mb limit – so I came up with a new lighting engine that ran fully deferred, and allowed us to use a number of techniques to approximate ambient occlusion in realtime. The old engine wasn’t set up for such a large amount of animation content and had to go through numerous optimisation passes just to make the thing load in a sensible amount of time. Some of the effects and the rendering engine were very heavy on render targets, so we had serious problems with lack of VRAM. We had major main memory problems too – we had never made such a large demo, and we found that we were hitting the 2gb win32 app limit in the demo tool, causing it to crash. I had to spend yet more time optimising the memory usage. I later discovered the major culprit was TinyXML – it ate 600mb to load a 30mb XML file (our demo “script” – which is generated and superbloated), and completely fragmented the memory with small allocations. My emergency fix was to optimise the XML file itself – yes, cutting down by hand the names of nodes and attributes – and got it down by more than 50%, which got us out of trouble until I rewrote the XML loader after Assembly.

One of the biggest headaches was the music. Fairlight (/CNCD/Orange), unlike many other demogroups, does not have one active musician who does all our soundtracks. We have a few musicians we work with but most have moved away from the scene onto other things – some went pro, some just got out entirely. In some ways it’s good because we are able to look around and find the right sound for each project, not be tied to what one guy can do, and we’ve had some really great soundtracks over the years by the likes of Northbound Sound, Little Bitchard, Ercola, Sumo Lounge and others. The problem is we’ve got no one person in the group who takes responsibility for it. I don’t think it’s an understatement to say this demo has been through the most difficult musical journey of any demo I’ve worked on. Over the year – 18 months it’s been an active project, we’ve had at least five musicians involved, and many tracks. It seems that the more brilliant the musician the harder they are to lock down – they always have other projects on the go and don’t have the time to dedicate to this. With a few weeks to go until Assembly we finally got Ercola (responsible for the Media Error soundtrack and a great producer and artist) involved. He was a guy we knew could turn it around very quickly and do a good job, which is exactly what we needed. Even so it was seriously nerve-wracking up until the last week before Assembly when we finally got the track. By the way, if anyone out there is a great musician give us a call, we are always looking for good musical input. 🙂

Frameranger contains a lot of graphics and a lot of code. There is a whole collection of effects and rendering techniques, some of which will get a blog post on their own. I decided to go for completely deferred rendering and it worked out great. As well as being able to use as many lights as we wanted it greatly reduced the number of shaders being generated by the ubershader (a major issue on Panic Room). I added the ability to combine multiple (dynamic) environment maps like lights in the deferred render, and support for secondary rays cast off the deferred buffers for ambient occlusion. In fact almost everything in the demo has ambient occlusion, generated one way or another in realtime through various techniques. One of the best things was being able to combine traditional polygon geometry and raytraced effects seamlessly – e.g. we raytraced the liquid effects straight into the deferred buffers and sent them through the same lighting pipeline as everything else, casting and receiving shadows etc.

early work-in-progress shot of the lighting and ambient occlusion passes in the city.

Raytracing / ray marching popped up in numerous places. We used volumetric lights for the car headlights which were ray marched through the shadow map in screen space as a post effect. The surfaces for the fluids were raytraced on GPU too, using a technique I invented to handle almost unlimited numbers of metaballs (up to around 100,000). Of course there were other effects at work too – many post effects, particles, breaking floors and so on. However, most of them were mixed into the design and graphics so they were almost hidden away – it’s the kind of thing where you only notice them when they’re gone.

work in progress shot of the car headlights.

Fortunately we had the antidote to that – the “demo part”. My new fluid solver / renderer was ready at last for that. I had written a new 3D solver for fluids running on the GPU which used a new technique to get higher resolution and better performance: I evaluated the velocities at lower resolution – and even then the grids were still much larger than in Media Error thanks to modern GPU performance. Then I used the velocity grid to drive a procedural fluid flow solver to approximate the look of fluid eddies and mixed that with the velocity grid. Then I applied that to a high res density grid to push it around. The results were superb. The procedural flows weren’t tied to the grid resolution so they could produce really sharp results which didn’t lose detail. The velocity grid just had to handle the overall rough motion.

Then we had to do something interesting with it. In the end we used it for two effects – a liquid renderer driving particles and an isosurface which was raytraced, and a smoke renderer. Both had a full lighting and shadowing pipeline – giving us superb results. For both effects we were able to use voxelised meshes as the source input. We tried a few things for the smoke but in the end we used the effect to destroy some credits text. Unfortunately it was a prime example of artistic vs technical compromise, of which there is a lot in the demo. The scene didn’t show off the power of the effect to the fullest – it didn’t show all the clever features of the effect – but it looked really nice visually, with puffs of coloured smoke. Of course such things are completely lost on the audience. One genius commentor on pouet said about the scene, “nice plasma”. Nice plasma! It makes you glad you bothered with weeks or even months of work trying to innovate in the realm of realtime fluid dynamics, when your results are compared to an ancient demo effect.

One scene that worked out surprisingly well was the “pixel blocks” sequence. It was a simple effect – a grid of cubes, animated by rendering something to texture and using it as a heightfield – made “cool” by the use of raytraced heightfield-based realtime ambient occlusion which gave it the nice shading it had. Surprisingly it ended up as one of the most popular scenes in the demo, yet it was by far the easiest and took about an hour of work to put together on the saturday morning of the deadline.

the heightfield effect

A special word has to go to the work Destop did on the graphics and direction for the demo. He built most of the demo in our demo tool. The battle scene had around 40 cameras and a massive amount of carefully placed animations, and the whole scene contains 1000s of nodes where most scenes contain 10 to 100 – it’s by far the biggest thing I’ve ever seen made with the demo tool. It frightened me a bit actually. We also had Der Piipo doing a lot of modelling and animation work, and Mazor showed up with the 2D hud gfx at the end – just in time to fill some gaps.

Sadly, the demo still had problems. We knew the battle scene was the crux of the demo – make or break – and it was the biggest and hardest scene to do. A long action sequence – part built in Lightwave, part in the demo tool – with a lot of explosions and smoke. It was the smoke that caused me a huge headache. I went over and over this trying different solutions and none of them worked well. The requirements were: it had to fill a lot of space, with multiple explosions around the environment at once; it had to persist, so it didn’t fade out visibly; it had to fit with the meshes and lighting; it should look a bit stylised – not super realistic, but still cool and smoke-like; and the frame budget wasn’t massive for it as the 3D was already eating a lot of power. Those requirements meant I had to rule out something really cool like proper fluid dynamics – the scene was too big for grid-based effects. We could only handle a certain number of particles in the frame rate, and the lighting and shading would have to be faked. I tried various techniques and wrote the effect a few times, and it never quite worked out – so I kept putting it off. In the end I rushed something out in a couple of days and the solution wasn’t satisfactory – a hand-coded particle effect that could be spawned around the environment as needed. I didn’t like the end result much at all. That was one thing that went to the top of the list for the final.

We had other problems too. In the end, even the best laid plans break down as the deadline nears. I wasn’t travelling to Helsinki and I had to go to a wedding on the saturday morning, so that ruled out real last minute crunches – but somehow we ended up doing that anyway. For the last week before Assembly I got up at 6am and went to work every day, working on the demo on the train and at lunchtime whenever time allowed, and then came home and worked on it through the evening and half the night too. Then I got up the next morning and did it all again. The problem with demo crunches is that unlike work crunches, there’s much less external pressure to do it. For work you know you don’t really have a choice. For a demo you always have at the back of your mind, “I could ditch this right now and it wouldn’t matter”. When you’re exhausted and stressed out in the middle of the night you keep going because you don’t want to give up, you want to get it done and you don’t want to let your team mates (who are also up in the middle of the night working with you) down.

Come the thursday night we still had a lot to do. I took the day off work on friday and worked on it solidly, with 3-4 hours sleep on thursday night and less on friday. We missed the deadline on friday night but after a night and morning of work, come saturday lunch time it was done. All that was left was to hand it in, get refreshed, shower, and drive to the wedding – where I think I looked like a zombie. Then, come the evening, the competition started far away in a hockey stadium in Helsinki. The wedding was in the middle of nowhere so mobile phone reception was poor to non-existant but I managed to go outside into the car park and get a bar – when I finally got the news I had been waiting for by SMS, first from my groupmate and then from a load of other friends who had been watching the competition, either at the party or at home watching the live stream. “What happened?” I asked. The reply came back – “finally we’re going to get the trophy 🙂 “. “Is it close or did we destroy the competition?”, I asked. “Destroy :)” came the answer. I went back inside and enjoyed the rest of the wedding with a grin on my face. It seemed like we had finally done it.

September 29, 2009

the wonderful world of 64k intros.

Filed under: demoscene — Tags: , , , — directtovideo @ 1:58 pm

Optimising for size has been part of the demoscene for as long as anyone can remember. It probably started off driven by the desire to fit something interesting into a boot loader or something like that. 64k intros involve fitting a whole production – music and graphics and code included – in one self-contained executable of 65536 bytes. That’s not much bigger than an empty word document; it’s likely that one screenshot from the production in JPEG form would be larger than the 64k taken by the production. The nice thing about 64ks is that it’s enough space to do something worthwhile in terms of design, sound and graphics – but it’s small enough to still amaze people with the file size aspect. 64k is enough space to work in a “usual” fashion. To be able to work on graphics and music in a proper environment, and to be able to code without having to think about every byte – something that makes smaller productions (e.g. 4k or less) painful to produce.

The category evolved through the 90s, pushed forward by groups like TBL who managed to fit decent visuals and soundtracks that sounded like “proper music”. It reached a head with the release of fr-08: .the .product in 2000, which contained an amazing amount of graphics which actually looked something like you might see in a typical PC videogame of the time – but in a tiny fraction of the space. It set the world alight and immortalised the creators, and everyone wondered – how was it done?

the product by farbrausch, 64k in 2000

The answer was: they generated it. The textures, models, soundtrack – all generated, based on metadata about how to create the elements supplied by the artist using their custom-produced graphical tool set. In simple terms, they dont store the pixels for the texture in a compressed form; they store the steps to tell the engine how to regenerate the texture, from a combination of simple generators like noise, and blending operations. Of course, this kind of thing had been around for years, but this was the best implementation so far. The clever thing about these guys was that not only did they make such a great production, they also told the world how they did it. Numerous presentations and articles were followed up by releases of most, if not all, the tools they used in the making of the production for the world to see.

werkzeug by .theproduct - tool for making 64k intros

Something about it all caught the imagination of many of the young coders around in the demoscene at the time, myself included. Maybe it was the idea that the coder was back in charge and taking centre stage after years of being edged out by the artists and musicians; maybe it was the challenge of doing something bigger and better than these guys had done; or maybe it was about wanting a piece of the substantial fame they had achieved in the community and beyond. It’s worth noting that most good demos get a few thousand downloads; some of the stuff these guys made got hundreds of thousands. It went way beyond “the demoscene”.

So, like so many other coders, I started working on 64ks myself. The road was long. With 64ks nothing comes easy. You are expected to make content that would look decent in a 64mb demo, but there are no shortcuts. In a demo you can get graphics from anywhere – a 3d tool, your camera, the internet; and you can make the soundtrack however you want as long as it ends up as an MP3. In a 64k you wouldn’t get very far like that – you have to have a hand in the whole creation process yourself. If you want tools to make graphics and music, you have to go and make them. It massively increases the time taken to make anything.

My 64ks through the years

From 2001 to 2006 I made quite a lot of 64ks. They started out pretty awful – badly made hardcoded affairs – but grew into some quite hefty productions, winning Assembly and Breakpoint 64k competitions and gaining a award for best 64k intro and numerous nominations. I developed my own modelling, texturing, animation and scene editing tools, and my own soft synth (VST). It was an exciting time to be involved. There was an arms race between several groups – at first Farbrausch and Conspiracy, later Conspiracy and us (Fairlight). Every year we’d all be sat in the party hall at Breakpoint or Assembly trying to hammer out our production, each wondering what new tricks the other group had managed to pull off and whether it was better than the new tricks we had managed to pull off.

My final 64k was “dead ringer”, released at Assembly 2006, where it took first place. Dead ringer was a breakthrough in that it was the first 64k intro to move away from large and impressive but static scenes with simple animation – and featured a fully animated character performing a series of breakdancing moves that started life as several mb of motion capture data but managed to get squeezed into around 6k. Assembly 2006 had the most amazing 64k competition ever seen – the three big players in 64k (us, Conspiracy and Farbrausch) and Kewlers all competing, the bar getting pushed higher each time. It seemed that after that competition, the 64k scene was never the same again. Perhaps the bar had been pushed too high, and the amount of work needed to move it on was just too much for anyone to contemplate.

After Dead Ringer we were pretty much spent, so we quit. That was the plan, anyway. As it happens we came back in 2008 and made another 64k, “panic room”, which won the 64k intro competition at Assembly 2008, but that’s another story.

It took me a long time to realise the real secret of the farbrausch 64k intros all those years ago: all the talk is about the technology, but really it’s all about the artist. That’s the difference between a few numbers plugged into a generator routine, and those wonderful scenes that looked “almost as good as a video game but in 64k”. Sadly it’s the thing so many coders forget. It doesn’t matter what routines you code or what tools you build – in the end what matters is how they get used.

fluid dynamics #1: introduction, and the smokebox

Filed under: demoscene, fluid dynamics — Tags: , , , , , , , — directtovideo @ 10:57 am

One effect I was asked for a lot for demos by artists was “fluids”. Fluid dynamics make for great visuals – they look complicated, with minute details, yet they make sense to the human eye on a larger scale thanks to their physical basis. Unfortunately they are rather difficult to do. But if it were easy it wouldn’t be fun, right? Fluid dynamics and rendering has become an area of research I’ve kept coming back to over the years, and my journey started at the end of 2006. I immediately discounted simple fakes and decided to try and do something pretty “real” for our demos in 2007.

The first problem is that fluids take some serious computation power to calculate. To do it “properly”, what you have to do is simulate the flow of velocities and densities through space using a set of equations – “Navier Stokes”. I’m no mathematician, but fortunately these equations don’t look half as nasty in code as they do on paper. The rough point of them is that the velocites/densities at a certain point are interacted with other points nearby in space in the right way.

The basic approach follows two paths, depending on how you want to model space – you can use particles (SPH), or a grid. Grids are bounded in the area they cover, and their memory requirements depend on their size, but the equations are simpler and tend to be faster to solve – with grids, you automatically know what points are nearby. densities/velocities tend to dissipate and you lose details. Their rendering suits gas-like fluids well, as it’s typically visualised using a ray march through the volume – which suits transparent media. Particles can suit liquids better, and are unbounded – they can go where ever they want – and you don’t lose details in the same way as grids. You need a lot of particles, and the equations are a bit uglier and slower to solve because you have to work out which particles are near each other particle – it actually has a lot of similarities with flocking behaviours. Rendering particle fluids usually involves some sort of implicit surface formed from the particles and visualised using a raytracer or polygonised with marching cubes.

Modern offline fluid solvers tend to support one or both methods of fluid calculation. Fluids are big business – movies, adverts, all sorts of media use them. Offline renderers support hundreds of thousands of particles or huge grids, and can take hours to compute a few seconds of animation. So it’s going to be hard to compete with realtime.

I started off with grid solvers. They’re easier to understand and there are good examples and tutorials/papers out there that explain how to do it – and they aren’t packed full of magic numbers. The navier stokes equation breaks down into the following steps:
Update velocity grid:
– take the previous frame’s velocity grid
– perform a diffusion step – a bit like a blur
– perform a projection step – makes it mass-conserving, and its what gives the effect that swirly quality. In reality that means a linear solver with 10-20 steps, meaning looping over the whole grid 10-20 times. It’s the slow bit.
– perform an advect step, which is like doing “position = position + velocity” in reverse – to pull the velocities from the previous frame’s grid into the new grid.
Update density/colour grid:
– perform a diffusion step
– advect using the velocity grid.
You’ll soon realise you can cut out the diffusion steps – you can lose them without hurting the final result.

The first attempt I made was to do a 2D grid solver. That’s pretty easy – there’s a good example released by Jos Stam (probably the “father” of realtime fluid dynamics) some years ago which is easy to follow, although it needs optimising. The nice thing about 2D grid fluids is that they map very simply to the GPU – even back in the days of shaders 2.0. The grid goes into a 2D texture, and the algorithm becomes a multi-pass process on that texture. That worked out great – a few days of work and it was usable in a demo. It made it into halfsome in 2007. The resolution was good – we could have one fluid solver running at 512×512 well, or several at 256×256 – which proved to be ample. It was quite easy to control, too – we could simply initialise the density grid with an image, the velocity grid with random blobs, and the image was pushed around by the fluid.

2D fluids are nice, but 3D is better. But 3D brings a whole new set of problems. It was quite easy to extend a 2D solver to 3D on both CPU and GPU. Sadly at the time and on DirectX9, GPUs could not render to volume textures – so the problem could not be extended simply by changing the texture from 2D to 3D. Instead I had to lay out the “volume” as a series of slices on a 2D texture. That was hard to get right, but apart from that it extended easily. The problem was it was rather slow. The GPU at the time (GF6800 was the card du jour) just didn’t have the performance to handle it, when the extra overhead from fixing up and sampling from the 2D slice texture came into account. So the next option was to go CPU – I spent quite a long time hand optimising the code in SSE2 intrinsics, and then wrote out a volume texture in the end for rendering. Unfortunately the algorithm is in parts heavily memory/cache limited – the advect stage in the equation jumps around in memory almost randomly. In fact, in some parts of the solver the GPU was far faster thanks to more efficient memory access, and in other parts the CPU won out thanks to raw calculation performance and being able to re-use previous cell values. (Note, this is back in 2006-2007, Core Duo vs a GF6800.)

Finally I had a solver. Now, how to render the results? Raymarching the “volume texture” was quite slow – rendering the slices as a series of quads worked out better. Now the fun stuff started. I realised the key to a good looking was lighting – and for a semi-transparent thing like smoke/gas, that means shadows with absorbtion along the shadow ray. Ray marching in the shader or on CPU was out of the question, but fortunately there was an easy fake – assume a light that was only from the top down and do a “ray march” which added the value of the current grid cell to a rolling sum, and wrote the current sum back to the grid cell as a shadow value, then moved to the next cell immediately below it. That could be made even easier on my CPU solver by flipping the grid around so that the Y axis of the fluid was the “x axis” of the grid – and the sum could be rolled into the output stage of the copy to volume texture – so the whole shadow “ray march” was almost completely free.

After some time experimenting I discovered that the largest grid resolution I could get away with performance-wise was 64x32x32. Unfortunately it looks pretty rough at that. I tried a few things with octrees to avoid empty space but it just didn’t work – with my small grid, the whole space got filled quickly. In the end I simply doubled the res of the density grid and interpolated the velocities – so the slow part of the equation, updating the velocities, ran on a grid with 1/8 the number of cells as the density grid – which is the one where you really notice the blockiness.

It worked. It ran at framerates of at least 30fps realtime, with a grid res of 128x64x64 for the densities and 64x32x32 for the velocities. At the time of release it was probably the fastest and most powerful CPU grid solver ever made. It was even compared to the nvidia 8800 demo – which was running at a significantly higher resolution, but on vastly superior hardware and without lighting. We used it in media error, although we were so rushed with the demo that it only got used in a rather boring way – as “smoke in a box”. And it made me want to do something bigger and better.

smokebox in media error

old demos #2: media error (and track one).

Filed under: demoscene — Tags: , , — directtovideo @ 8:43 am

Our entry to the demo competition at Assembly 07, placed 3rd.

media error by fairlight cncd orange

In 2006 we made track one for Assembly 2006, and placed 2nd in the demo competition. We did quite well out of it, later picking up the award for the best demo of 2006. So we decided to do it again, and try and go one better and win Assembly.

Winning the Assembly demo competition is pretty much the demoscene equivalent of winning the world cup. The event has been around for over 15 years, and a lot of people interested in the demoscene today grew up marvelling at the demos released there back in the day. It’s a massive event today, with over 4000 visitors – most of whom are gamers, but it still has a strong scene pedigree. The combination of the size and exposure of the event and the historical significance makes it the one we always wanted to win.

Track one was probably one of the most intense creation processes I’ve ever been involved in. We only really decided to make something a few days before Assembly 06 started. This was partly my fault as I was off making glitterati and dead ringer, eating up a lot of my time before the event. We – a team of programmers and 2d+3d artists – spent three days of the four day event holed up in Destop’s apartment in Helsinki, only sleeping a maximum of 4-5 hours the whole time. It was insane and completely exhausting but still an amazing experience – everybody throwing out quality work, making new effects and graphics, running over to someone else’s pc to see the latest big addition. Thankfully the artists had quite a few pieces lying around on their hard disks that we could use.

Of course it was also utterly shambolic. Not only were we not sleeping, we were drinking more or less continuously. Of course we missed the deadline. Then we missed the “real” deadline – the usual cutoff point – and then we were rushing to hit the “absolutely final” deadline – when they record the entries to video. Usually in a demo competition one can push the deadlines right up until the minute they start the competition – and indeed, my current record is submitting a demo at breakpoint while the 8th entry in the competition was actually playing on the big screen. At Assembly they prerecord the entries to video so you don’t have that luxury. Well, we made it – but there were some compromises, like the last minute or two of the demo being just whatever we had to throw in. Someone (we never found out who) moved a load of the timeline bars during the last night and made it all out of sync, and we never quite fixed it all. By the end we had made it but we were totally exhausted and vowed not to do it like that again.

Track One by Fairlight
track one on pouet

The making of media error was supposed to be a carefully planned affair, starting early and finishing on time. Of course it wasn’t. Firstly we decided – at artist request – early in 2007 to completely remake the engine, adding a node graph to our current demo tool which largely resembled the AfterEffects timeline at that time. Then we fully integrated Lightwave support – the artist’s tool of choice – with a full loader for objects and scenes. This was all good, but the tools were quite fragile as a result. Then we foolishly got dragged into the Intel Demo Competition 2007, and by the time we were done we had less than four weeks to make a demo for Assembly 07.

Still, we had managed to assemble a superb team – and we felt we could pull it off. Sadly we ended up exactly where we were in 2006, with a bunch of us piled into Destop’s apartment in Helsinki working all hours on a demo. Again, we missed the deadline by an epic amount. Again it was a crazy but amazing process of not sleeping, drinking too much and spitting out graphics and effects left and right. We made it a bit earlier than track one, and felt pretty good about the demo.

Unfortunately in the competition we came up against one of the biggest demos ever made – Lifeforce by ASD – and fell to 3rd place. Winning Assembly would have to wait.

« Newer PostsOlder Posts »

Blog at