direct to video

March 15, 2012

get my slides from GDC2012.

As promised, they’re here! I’m afraid I had to delete all the videos, but apparently the recording of the full thing should be in the GDC Vault at some point.

[PDF here]


Yes I am aware that SlideShare managed to crop the bejesus out of my presentation

To everyone who showed up to my talk – thanks for coming! Here are the slides as a memento of the occasion!
To anyone who couldn’t make it and wants to read the slides, here they are! Good luck making sense of them!
To anyone who was at GDC but went to something else instead – here’s what you missed!

If you did see the presentation live, I was supposed to ask you to fill out the evaluation forms (only if you liked it, obviously – I don’t want to get 100 forms back saying “bat shit mental”). Oh, and I was also supposed to ask you to turn off your mobile phone, no flash photography, no video cameras, and that there are two exits at the back and to file out in row order in case of emergency, but I forgot. Apparently we all made it out alive.
Please do tell me what you thought on here too.

February 14, 2012

come see me talk about directX11 at gdc 2012.

Filed under: demoscene, directx 11, fluid dynamics, particles, realtime rendering — Tags: — directtovideo @ 2:59 pm

Quiet around here, isn’t it?
That’s because I’m going to be speaking at GDC 2012 about advanced procedural rendering in DirectX 11! (So I’m saving all the good material for that. Sorry.)

I’ll be talking about how we’ve used D3D11’s features to handle things like mesh generation and fluid dynamics for our upcoming demos, to give us huge advancements over our old DX9 engine – and in a way that you might consider practical enough to start thinking about for future game titles.

For those who are just starting on DX11 or are only thinking about it I’ll also try and give an overview of building blocks you really need to know about for tackling problems with compute efficiently, like stream compaction and prefix sums, and where they fit into actual real-world problems like implementing marching cubes, smoothed particle hydrodynamics and mesh smoothing.

Or you could just come and look at the pictures.

GDC2012: Advanced Procedural Rendering with DirectX 11

Thursday March 8th, 4:00-5:00pm, Room 2009, West Hall, 2nd Floor. Be there. We’re going to be doing shots off the front of the stage after every other slide, so bring some salt.

May 3, 2011

numb res.

Filed under: demoscene, fluid dynamics, particles, realtime rendering — directtovideo @ 4:39 pm

Numb Res by CNCD & Fairlight

numb res. get it?

pouet exe version video video (anaglyph 3d) youtube vimeo


It was Easter. We made a new demo for The Gathering 2011. Yea, that’s right – in Norway, not in Germany. I really wanted to do a new demo because I’ve been collecting new routines all winter, and it was high time they got into the wild. So about 3 weeks before Easter Jani and I started bouncing ideas around (“something with fluids” was the sum total of that, I think). Then we went on the hunt for music. As some may know, we don’t have an active musician we work with regularly in Fairlight or CNCD anymore; we have to outsource. So I dropped a message on Facebook half-jokingly asking if anyone had a spare soundtrack. I’m not sure whether that was a good idea or not, but I spoke to Ruairi (RC55), who put me in touch with Tom Wright (aka Stereo Wildlife). He’s produced a beautiful new album and agreed to let us use one of the tracks – and even did a bit of remixing to make it fit the demo. So, music was ready from day 1. This is such a huge bonus when making a demo; it meant we could completely design around it, plan out what scenes we wanted straight away and know they’d fit.

The demo was envisaged as a “small project” – a relatively low budget production. Low budget meaning less development time, fewer resources. Weeks to make by a small team. Frameranger for example is a very “high budget” demo – lots of people, over a year in the making, tonnes of art assets and specifically made effects, and lots and lots of wasted work. This one is very different; there’s only one hand-modelled mesh in the whole thing that’s “rendered” properly (the head at the start and end), although there’s lots of meshes used for other things in the demo. We wanted an effect-led production. The first thing that happened was that Jani designed the numbers scene in Lightwave: creating meshes for each number, placing them in the scene, timing them and making a camera path for the whole lot. Meanwhile I was working on effect development. Then Jani developed the introduction part with the head more or less on his own, and modelled and tweaked the tracks for the fluid parts while I worked on fleshing out the numbers scene with elements and effects. Then we integrated and worked together to finish. With a week or so to go there was a touch of panic and it looked like we weren’t going to get there; but in the end we found ourselves more or less done 5 days before the competition. For once we had time to polish, tweak and optimise. Hope it shows..

As an aside: the Gathering was a great event for us not least because they also held the Awards, which recognises the best demoscene productions from last year. We got 11 nominations and after a very rock & roll ceremony full of glitz and fireworks came away with 4 awards: Ceasefire for best music, Agenda Circling Forth for best effects, technical achievement and the cherry on the cake: best demo of 2010. Ooooh. Apparently we just missed out on Public’s Choice by a few points – but hey, no accounting for taste.. ;)

32. Particles. Again?

I’ve realised over time that I’m not really a traditional “democoder”. I’m a graphics researcher who happens to prefer to show his new work off in whatever demo we make next. That probably goes some way to explaining why I do things the way I do: researching and improving on certain areas (like particle systems or fluid dynamics. but not ribbons. bitches.). Some would say that fluids or particles are effects: you “do” fluids for a scene in a demo, then you go “do” something completely different. I don’t subscribe to that. For me the achievement in a demo like this is not to implement fluids: we first used fluid dynamics in a demo 5 years ago. The challenge is to move the field on – to do something new with it that nobody else has managed to do in realtime yet, or not on the same scale. Of course there’s a point where this gets lost on the viewer, and maybe it does just become “nice particles” to the uninitiated.

Although the natural reaction of some people will be “oh, particles again – nothing new!” – this is probably the biggest technical leap we’ve made for a demo since Blunderbuss. Instead of concentrating on the amount of particles and simply using them to render 3D scenes with a few modifiers on top, we concentrated on the cleverness of the particles: the simulation itself and the rendering/shading. In this demo the particles are smart. They’re going somewhere.

Particles are just a primitive like polygons or lines – not interesting in themselves. Creating and rendering a lot of them is easy. Making them do something interesting and look good is a completely different kettle of fish.

So let’s talk about what we did this time to make particles do something interesting and look good..

93. Smoothed Particle Hydrodynamics (SPH)

SPH is a form of fluid dynamics which uses particles, rather than a grid, for storing the fluid and transporting the forces/densities. This lets you represent more detail at higher resolution than a grid would allow given the same memory / performance limitations; it’s not limited to a certain area of space; it makes collisions more practical; and it’s a better fit for liquid effects. It’s the scheme used in professional offline packages like Realflow, used for all those nice liquid splashy effects you see in ads and movies – which take hours to simulate, let alone render. Good SPH is for me one of those holy grails of effects development (like realtime radiosity). The thing is, the quality and scope of effects you can do with it is directly dependent on the number of particles – and so is the difficulty in pulling it off. If you have a few thousand you can make some droplet effects; with 10s of thousands you can make some nice splashes; and with 100s of thousands or millions, you can start to make really amazing running water simulations.

Early tests with SPH fluids


Early tests with SPH fluids - with environment


The problem with SPH in realtime is it’s really really hard. The simple explanation of the algorithm is: “take all the particles near my particle and perform some force exchange between them”. The force exchange is easy; the “all the particles near my particle” is a bitch. On GPU it’s even more of a bitch; and in 3D it becomes an order of magnitude more of a bitch.

Other demos have featured SPH before; FR-063 performed it on the CPU with (what looks like) between 1,000 and 10,000 particles. The current bleeding edge for 3D SPH in realtime is around 250,000 particles, working on a top end GPU using CUDA and with simple point rendering (and no effects or anything else on top). The current bleeding edge for 3D SPH on DX9 – i.e. with no compute shader / CUDA – is erm.. I don’t actually think it’s been done.

The problem is simply the neighbourhood search. You end up with a variable amount of fast-moving particles affecting each particle, where it’s hard to pick an upper bound – so the spatial database is hard to construct. If you solve the neighbourhood search, you can solve SPH.
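To make the "all the particles near my particle" problem concrete, here is a minimal CPU sketch of the textbook approach: hash particles into a uniform grid whose cell size equals the interaction radius, then check only the 27 surrounding cells. This is not the reformulated search used in the demo (the post does not describe it); the function names and the radius `h` are assumptions for illustration.

```python
from collections import defaultdict
from math import floor

def build_grid(positions, h):
    """Hash each particle index into a cubic cell of side h."""
    grid = defaultdict(list)
    for i, (x, y, z) in enumerate(positions):
        grid[(floor(x / h), floor(y / h), floor(z / h))].append(i)
    return grid

def neighbours(i, positions, grid, h):
    """Indices of all particles within distance h of particle i (excluding i)."""
    px, py, pz = positions[i]
    cx, cy, cz = floor(px / h), floor(py / h), floor(pz / h)
    found = []
    # Only the 3x3x3 block of cells around the particle can hold neighbours.
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dz in (-1, 0, 1):
                for j in grid.get((cx + dx, cy + dy, cz + dz), ()):
                    if j == i:
                        continue
                    qx, qy, qz = positions[j]
                    if (qx - px) ** 2 + (qy - py) ** 2 + (qz - pz) ** 2 <= h * h:
                        found.append(j)
    return found
```

The awkward part on a GPU is exactly what the lists hide here: each cell holds a variable number of particles, so there is no fixed-size slot to write them into.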

The demo features up to 500,000 particles running under 3D SPH in realtime on the GPU, with surface tension and viscosity terms; this is in combination with collisions, meshing, high end effects like MLAA and depth of field, and plenty of lighting effects. On DirectX9. It’s fast. Almost impossibly fast. How? We found a new approach to SPH where we can re-form the neighbourhood search term to something much easier to solve on a GPU. Meaning we can, honestly, get very close to what a program like Realflow can do over hours of simulation – but in realtime. And that, for me, is what demo coding (and realtime graphics) is all about.

There are 4 scenes which are directly showing “fluids” in the demo; a couple more using SPH in places for the great quality it has that it makes the particles spread out really nicely rather than bunch together randomly. In each of the fluid scenes it’s basically a load of particles dropped at the top of a very long track, and left to get on with it. The camera captures only a part of the action at any time – the great battle of “design vs showing off code” resulted in something that probably doesn’t completely sell the effect, but it does make something more enjoyable to watch. And that too is what democoding is about..

I thought it’d be nice to show it in isolation, so I put a couple of screenshots and a video above. Aside from that one embedded video – apparently wordpress is a little bitch and won’t let me embed more than one video link into a blog post – you can also check the reverse angles here and here. Those and the above screenshots show an initial test shot we did with 3D SPH – we drop 250,000 particles, and let them run with SPH and collisions against a mesh (handled as a signed distance field). Look, it splashes about and shit like that. All completely in realtime. Oooooooh. If nothing else, being able to run it in realtime makes it a lot easier to tweak. You get instant results – you don’t have to wait for any simulations to calculate. In these days of youtube and the prevalence of netbooks, perhaps high end realtime graphics doesn’t have the same relevance to the audience that it did 15 years ago – but it sure matters a huge amount when you’re actually making something. The benefit to the workflow is huge.

12. Signed Distance Fields

I touched on this for Ceasefire, but it was this production where we finally got them working and used them in anger: the use of signed distance fields for arbitrary collisions (and attraction) with particles. We take polygon meshes, convert them into signed distance fields using distance to triangle measurements and place the results in a volume texture, giving us the means for fast collision ray tests. This is absolutely invaluable when using fluid dynamics because otherwise the particles fly off merrily into space. So we have particles flowing around a head; particles flowing down a track carried by SPH; and particles being blown by a 3d fluid effect into the form of a word. All using signed distance fields.
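A rough sketch of how a particle can collide against a signed distance field: sample the field at the particle, and if the value is negative push the particle back out along the field's gradient and reflect its velocity. Here an analytic sphere stands in for the volume texture lookup, and `collide`, `bounce` and the central-difference normal are illustrative assumptions rather than the demo's actual shader.

```python
from math import sqrt

def sphere_sdf(p, centre=(0.0, 0.0, 0.0), radius=1.0):
    """Stand-in for a volume-texture lookup: signed distance to a sphere."""
    return sqrt(sum((a - c) ** 2 for a, c in zip(p, centre))) - radius

def sdf_normal(sdf, p, eps=1e-4):
    """Surface normal estimated from the SDF gradient by central differences."""
    g = []
    for axis in range(3):
        hi, lo = list(p), list(p)
        hi[axis] += eps
        lo[axis] -= eps
        g.append(sdf(hi) - sdf(lo))
    length = sqrt(sum(c * c for c in g)) or 1.0
    return [c / length for c in g]

def collide(sdf, p, v, bounce=0.5):
    """Return (position, velocity) after resolving penetration into the SDF."""
    d = sdf(p)
    if d >= 0.0:
        return list(p), list(v)          # outside the surface: nothing to do
    n = sdf_normal(sdf, p)
    p_out = [a - d * b for a, b in zip(p, n)]   # d is negative, so this moves outwards
    vn = sum(a * b for a, b in zip(v, n))
    if vn < 0.0:                          # only reflect if heading into the surface
        v = [a - (1.0 + bounce) * vn * b for a, b in zip(v, n)]
    return p_out, list(v)
```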

We used them for a lot more besides particle effects, though. They’ve become an integral part of our rendering pipeline. That will become more apparent the next time we do something featuring a lot of solid 3D.. but they’ve opened up a lot of doors.

One clear example of SDF usage comes in the first “fluid” scene – falling drops collide with invisible words. This also neatly demonstrates the “art vs code” issue – we’re simulating 250,000 particles under SPH running down a long 3D track, and the camera shows a small subsection of those. The collision with the words actually uses two affectors: we used a collision node to make the particles bounce off the 3D words (using an SDF version of the mesh), which worked great – but it means you only see the top of the words. :) So we added a second affector – a low weighted mesh attractor which pulls the particles towards points on the faces of the mesh. This helped the particles slowly run down and also pulls them in from 3d space towards the words. It also added to the surface tension effect by keeping them attracted to the words even after they fall off the end.

65. Particle Shading

In my original post on my particle system a year or more ago I talked about how we had support for opacity shadow maps for self shadowing on particles. Since Blunderbuss we didn’t actually use that much – we’ve mainly got away with unlit particles, using the shading and lighting from the source meshes. But I’ve been working on some new techniques and had to make use of them..

The major problem with opacity shadow maps is depth aliasing – you only have a limited set of depth samples (16 in my case) with which to represent the scene, and it’s not enough. They tend not to be spread evenly across the particles either. So I tried a few new methods:

252. Volume Shading

This method borrows heavily from slice-wise volume rendering: the particles are sorted in light space by depth, nearest to furthest, and rendered in slices to composite the image. In this case though we only care about the shadow result: the values are written into the per-particle shading buffer used in the final particle render.

The sorted particles are rendered into the shadow map in batches – typically we used 64 batches per particle system. Per batch we additively render the batch particles into the shadow map, then project the shadow map onto the particles in the next batch: the value read from the shadow map is considered the amount of shadow on that particle from particles closer to the light.


Rendering using an opacity shadow map

Rendering using volumetric shadowing


The clever bit is, this method doesn’t care about the actual depth of the particle: it only cares about the position of the particle in the sorted sequence. No depth writes are required and transparency is supported without any problems. One additional benefit of the technique is that we can blur the shadow map a bit after each batch, giving a scattering effect. If one had the power to do it and could render one particle per batch, it’d give a perfect shadowing result. As it is, the batch sizes give some slice aliasing.
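The batch scheme can be sketched on the CPU as follows. It assumes particles have already been projected to integer texel coordinates in light space, uses a fixed per-particle `opacity`, and skips the optional blur; all names are illustrative, not taken from the demo.

```python
def batch_shadows(particles, map_size, num_batches, opacity=0.25):
    """particles: list of (u, v, depth) with u, v integer texels in light space.
    Returns the shadow amount per particle, clamped to [0, 1]."""
    # Sort indices nearest-to-furthest from the light.
    order = sorted(range(len(particles)), key=lambda i: particles[i][2])
    shadow_map = [[0.0] * map_size for _ in range(map_size)]
    shadow = [0.0] * len(particles)
    batch_len = max(1, -(-len(order) // num_batches))  # ceiling division
    for start in range(0, len(order), batch_len):
        batch = order[start:start + batch_len]
        # Read: occlusion accumulated from all earlier (closer) batches.
        for i in batch:
            u, v, _ = particles[i]
            shadow[i] = min(1.0, shadow_map[v][u])
        # Write: additively splat this batch so later batches see it.
        for i in batch:
            u, v, _ = particles[i]
            shadow_map[v][u] += opacity
    return shadow
```

With three particles stacked on one texel, one particle per batch gives shadows of 0, 0.25 and 0.5 in depth order. Putting two particles in a batch means the second gets no shadow from the first, which is the slice aliasing described above.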

Unfortunately the slice aliasing was too much of a problem with large systems and the technique is also a bit too slow – and generates a lot of render target swaps. So I came up with something better..

15. “Stochastic” Shadow Mapping

This isn’t the same as the stochastic shadow mapping paper that was recently presented, but the name makes a certain amount of sense for the effect anyway. :) The basic idea is something I’ve tried a few times on and off since 2009. The idea is that if your particles don’t overlap pixels in view space, you could render them as solid – using regular shadowmapping and lighting techniques. Of course this is rarely the case in a render – because particle systems rely on lots of small elements overlapping and blending to look solid and nice. However, what if you do render them as single pixels and make them not overlap, and then perform a full screen 2D operation to upscale each point and make them overlap and blend?

We applied that approach to shadow maps generated from particles. The particles are rendered as single points to a very large shadow map; this gives us a reasonable chance that the particles won’t overlap. It’s just like a spatial hash – with a very simple hashing function and no collision handling.. Then, when sampling, we read from the map using a large kernel and sum up the amount of filled pixels which pass the shadow map test to give a shadowing result.

Stochastic shadowing in action, on something that is definitely not an artistic interpretation of a sperm cell.

But there’s a twist: in order to improve the quality, cope with hash collisions and reduce aliasing, we perform a temporal reprojection step. When writing the shadow map each frame a random sub-pixel offset is applied to each particle which varies every frame; this means we get a different set of collisions, so different particles become visible each frame. Then when sampling the shadow map we blend the result with the previous frame, so the results adjust smoothly over time. By combining these two things we get a very nice, soft, reasonably alias-free shadow solution which is also efficient to render. No sorting required. The final shadow value per particle is written into a buffer and used at particle render time.
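A toy version of the three ingredients (jittered point rendering, large-kernel sampling, temporal blend) might look like the sketch below. The map layout, the normalisation by tap count, the `bias` and the blend `rate` are all assumptions made for illustration; the post only describes the idea, not these details.

```python
import random

def render_point_map(particles, size, rng):
    """particles: (x, y, depth) with x, y in [0, 1). One texel per particle,
    nearest depth wins; a per-frame sub-texel jitter changes which ones collide."""
    depth_map = [[None] * size for _ in range(size)]
    for x, y, d in particles:
        u = min(size - 1, max(0, int(x * size + rng.uniform(-0.5, 0.5))))
        v = min(size - 1, max(0, int(y * size + rng.uniform(-0.5, 0.5))))
        if depth_map[v][u] is None or d < depth_map[v][u]:
            depth_map[v][u] = d
    return depth_map

def sample_shadow(depth_map, x, y, depth, radius=2, bias=1e-3):
    """Fraction of kernel texels holding a particle closer to the light."""
    size = len(depth_map)
    cu, cv = int(x * size), int(y * size)
    blockers = taps = 0
    for v in range(cv - radius, cv + radius + 1):
        for u in range(cu - radius, cu + radius + 1):
            if 0 <= u < size and 0 <= v < size:
                taps += 1
                d = depth_map[v][u]
                if d is not None and d < depth - bias:
                    blockers += 1
    return blockers / taps if taps else 0.0

def temporal_blend(previous, current, rate=0.1):
    """Move last frame's shadow value a fraction of the way to this frame's."""
    return previous + (current - previous) * rate
```

Each frame would call `render_point_map` with fresh jitter, then feed `sample_shadow` through `temporal_blend` per particle, so collisions that hide a particle in one frame average out over the next few.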

I also experimented with the technique for the actual rendering of the particles to the main frame – rendering single points with Z test and blurring the buffer out, with some per-pixel sorting during the composite, to create softened particles but without the need for a full particle sort. Unfortunately it didn’t give us the visual fidelity we needed; we relied on the blending of particles, the variable sizes and the sprites used. Could be more applicable in a future project though.

536. Meshing (Marching Cubes)

I suppose it’s the obvious step, isn’t it. Democoders love metaballs. Being able to render particles as meshes using metaballs is something we’ve wanted to do for ages because it moves us towards the “liquid” look – the Realflow-style look. We’ve been here before: in Frameranger we rendered around 50,000 metaballs in realtime by generating a potential field, converting it into a signed distance field and raymarching it. Results were promising but not perfect: being able to generate an actual triangle mesh has some side benefits, like being able to post process the mesh and adjust it with tension – something we really wanted to do to get closer to that Realflow look I keep going on about.

Marching cubes gives two issues to solve: generating the potentials, and then triangulating them. We already worked out how to generate the potentials some time ago for Frameranger, although a bit of work was required to scale it up to 250,000 particles. The second part is more difficult: you need to generate an arbitrary amount of geometry data from that potential field with triangle and vertex counts that change every frame. Naturally, we could quite easily make an implementation which just generates the worst case: treat every cell in the volume as if it was contributing triangles, then write degenerates for the invalid ones. That actually works – but it’s prohibitive for large volumes. One cell can contribute up to 5 triangles, and with a 128^3 volume we’d be looking at 10 million triangles – which isn’t great. 256^3 volumes would effectively be impossible. What we need is a way to only process and send triangles for the cells that are active.

This is problematic because we can’t generate index or vertex buffers on the GPU, we can’t generate drawcalls on the GPU (so we can’t vary how many primitives are rendered on the GPU) and we can’t use the CPU – because the potential field is on the GPU and it’d be far too slow to get it back to CPU. And even if we could, the CPU probably isn’t up to the task of generating the geometry fast enough anyway. And even if it was, we’d have to send all the triangle data back to the GPU again. So we’re stuck with the GPU – and yet we don’t have a way to vary the number of cells we render triangles for.

metaballs in numb res

It seems impossible. However, Gernot Ziegler came up with a nice solution a while ago: histopyramids. This is a way of performing stream compaction on the GPU: it takes a big sparse buffer, and moves all the filled elements to the start of the buffer. A bit like a sort, but much more efficient. This gives us exactly what we need: we generate the (sparse) potential grid and use histopyramid compaction to move all the filled elements to the start. Then we use an occlusion query to count the number of active cells and use the CPU to generate batches which give enough triangles for the count to generate. The actual vertices are generated using a pixel shader and vertex texture fetch is used to read them.
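To show why a pyramid of counts gives you compaction, here is a one-dimensional sketch. Real histopyramids reduce 2x2 texels per level in a texture; this version reduces pairs in a Python list, and the function names are made up for the example.

```python
def build_pyramid(active):
    """active: list of 0/1 flags, length a power of two.
    Returns levels from the base up to a single total count."""
    levels = [list(active)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([prev[i] + prev[i + 1] for i in range(0, len(prev), 2)])
    return levels

def find_kth(levels, k):
    """Walk down from the top to locate the k-th active base cell (0-based)."""
    index = 0
    for level in reversed(levels[:-1]):
        index *= 2
        left = level[index]
        if k >= left:      # the k-th element is not under the left child
            k -= left
            index += 1
    return index

def compact(active):
    """Indices of all active cells, packed to the front of the output."""
    levels = build_pyramid(active)
    total = levels[-1][0]   # the top of the pyramid is the active-cell count
    return [find_kth(levels, k) for k in range(total)]
```

Every output element does its own independent top-down walk, which is why the scheme maps well onto a shader: output slot `k` needs nothing but the pyramid and its own index.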


4. Bokeh Glows

I’ve had this effect on the back burner for a few years but finally got to actually finishing it up.. Bokeh is the term relating to the effect of circular or shaped highlights in a depth of field effect, caused by inaccuracies in the shape of the lens of a camera. Or something. They make DOF look really nice. I’ve tried before by using a really big circular kernel for a regular DOF effect with an HDR input and leaving it at that and it actually does work, but I wanted to see if I could get some shaped bokehs and really overblow it. So I tried something with point sprites.


bokeh, innit. turned up to max, of course.

The basic idea is to work out where on screen bokehs would happen, and render point sprites at those points. I did this using the following method:

– Bilinear downsample the screen (in several steps), storing the 2d position (UV) of the brightest point of the 4 values of the quad that were read to a render target.

– Use those 2d positions to read a blurred version of the original frame. Perform some thresholding to pick out the points which pass. Generate colour values for the points.

– Temporally smooth positions and colours using positions from last frame, apply some attack and decay.

– Render a load of point sprites using vertex texture fetch to read the positions and colours, rendering the sprites to the screen. (With some additional magic to make it look good.)
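The first two steps can be sketched on the CPU like this. It is a simplification: it thresholds the luminance directly instead of reading a blurred copy of the frame, it assumes even image dimensions at every step, and the names are illustrative only.

```python
def brightest_downsample(level):
    """level: 2D grid of (luminance, u, v). Each 2x2 quad collapses to
    its brightest entry, carrying that entry's original position along."""
    h, w = len(level), len(level[0])
    return [[max(level[y][x], level[y][x + 1],
                 level[y + 1][x], level[y + 1][x + 1], key=lambda t: t[0])
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]

def bokeh_candidates(image, steps, threshold):
    """image: 2D grid of luminance values. Returns (luminance, u, v) for the
    brightest point of each surviving block that passes the threshold."""
    h, w = len(image), len(image[0])
    level = [[(image[y][x], (x + 0.5) / w, (y + 0.5) / h) for x in range(w)]
             for y in range(h)]
    for _ in range(steps):
        level = brightest_downsample(level)
    return [t for row in level for t in row if t[0] >= threshold]
```

Each downsample step quarters the number of candidate sprites, so the number of steps sets the upper bound on how many point sprites get drawn.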

72. Post Process Antialiasing (MLAA)

This is the first demo since 2009 (Frameranger, in fact) that we’ve released which actually features polygons being rendered as polygons. Happily, time has moved on, and so has our renderer. One of the major bugbears I had with the deferred renderer is lack of antialiasing – but fortunately a whole bunch of post process antialiasing techniques got invented in the last couple of years. MLAA is the technique du jour, and we use an implementation in our renderer. It’s great.

We do two little twists in our version to make it cool: firstly we use a lot of stencil optimisation so only the active edges get the big-ass shader applied to do the actual MLAA (or in fact get any of the process after the edge detect applied). And secondly.. there’s an ugly problem with MLAA in that it actually cocks up quite badly in a certain case. The technique relies on checking for horizontal or vertical edges. But where you have a pixel which is both a horizontal and vertical edge, it messes it up. Which breaks about 1/4 of the diagonal edges you have to deal with, so it’s pretty noticeable. Our oh so clever technique for fixing that is.. do the MLAA twice. :) The second time we flip the whole image in x and y, then MLAA it and flip it back. Genius huh? .. no? Well, it makes the polygonal scenes look good, and fortunately the stenciled version is so fast the extra hit isn’t really noticeable.

42. Stereoscopic 3D

We really wanted to do something with 3D for a while, but sadly we don’t have any true 3D hardware (*cough* donations please *cough*). We decided quite early on that we were going to go for a pretty much black & white look – so it would actually be feasible to use the good old red / cyan anaglyph method. 3D isn’t as easy as just turning it on, though. It takes some effort to make it work well, give a good effect and not strain your eyes. We tuned it quite carefully and the setup of the scenes really helps – the first scene is slow and quite static so it lets your eyes adjust, the camera movements are quite smooth and in a single direction so they’re easy to track, and so on and so on.
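A tiny sketch of why a black & white look suits red / cyan anaglyph: with greyscale input each eye only needs one intensity per pixel, so the left view can live entirely in the red channel and the right view in green and blue. The function below is a generic illustration of that composite, not the demo's post-process.

```python
def anaglyph(left, right):
    """Combine two greyscale views into red/cyan RGB pixels.
    Red carries the left eye; green and blue (cyan) carry the right eye."""
    return [[(l, r, r) for l, r in zip(left_row, right_row)]
            for left_row, right_row in zip(left, right)]
```

Where the two views agree the pixel comes out neutral grey; colour fringes only appear where the views differ, which is where the depth is.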

Do watch the demo in 3D, it’s really made for it. We’re going to make a proper HD 3D video with left & right splits soon for those with real 3d setups.


I guess what’s interesting for me about this demo is that it was so much easier to make than many we’ve done. It just kind of came together; we started early enough, we got the music at the start, we didn’t have any major problems, nobody disappeared or dropped out, everything showed up on time, we didn’t completely overstretch ourselves and come up with some ideas that couldn’t be done, and we had time at the end to go over it and tweak and polish things, and we’re really happy with how it turned out. It’s like the way it’s supposed to go but never does. It doesn’t work for everyone (not very bombastic, you see) but it seems the people who got it really got it and like it, which is what matters. Maybe we’ve actually cracked it.. or maybe next time’ll be a royal screwup. Have to wait and see..

An amusing realisation hit me the other day. We’ve unintentionally managed to make a demo which is entirely full of sexual references. There’s a load of massive sperm cells; there’s what looks like a female gender symbol, made up of little sperm cells; there’s a load of sperm falling down and colliding off things; and then there’s a big river of .. well, it’s not much of a stretch in context to call that fluid “spunk”, is it? It only dawned on me after Dixan commented that it was “finally a good demo about semen” on pouet, and I started thinking about it.


February 25, 2011

ceasefire (all falls down).

Filed under: demoscene, particles, realtime rendering — directtovideo @ 11:13 am

Ceasefire (All falls down..) by CNCD vs Fairlight – 2nd place, Assembly 2010 combined demo competition. Youtube Pouet Download executable binary

(This is late. Really really late. Sorry. I’ve been busy! Honest.)
It’s become traditional for us to do something for Assembly (in Helsinki, Finland, 5-8 aug 2010). This year we wanted to do a demo that continued from Agenda with the particle theme, but took things further – we felt like we barely scratched the surface of what was possible. And we actually started quite early, almost three months before. The core plan and direction was laid down and we organised the soundtrack. We wanted to try and really plan something out and make something big.

100% particles

Unfortunately when man makes plans.. well, it completely didn’t work out. The soundtrack didn’t come out as we hoped, the demo plot was far too bound to the soundtrack and the visuals were far too bound to the plot – we were at the mercy of it. Every scene was required, every part needed for it to make sense. We realised the whole thing wasn’t going to happen. So we started again. We hunted around for possible tracks and in the end Hunz came to the rescue – he let us use his beautiful track “All Falls Down” and also remixed it for us to fit the direction on a very short timescale.

So about that direction – well, the original plan for the demo was this sort of time-shifted end-of-the-world meet your maker theme where a city gets destroyed by some sort of holocaust, but then a phoenix rises from its ashes. It was going to be great! Trust me. Well, happily the new soundtrack – with strong vocals leading the way – actually did support this theme, but we were able to do something more loose than we had originally planned – disaster-related scenes, but less of a central plot to be reliant on.

Naturally the engine had matured a bit since Agenda, and we now benefitted from overall better performance, as well as a number of new features and effects; in particular lines / hair, displacement mapping of particles and collisions with distance fields. There were also a few effects I made specifically for the demo: fire using fluid solvers, raytraced spheres and a tidal wave thing. I’ll go through some of those in turn.


A natural step when you’ve got a particle system is to try linking the particles together with lines so you get something like hair, and that’s how this started out. Then you’ve got two immediate issues to overcome: how to get the right particles linked together so it isn’t a jumbled mess; and how to make them move in a way that appears connected. Fortunately if you solve the latter you’re a long way to fixing the former.

Firstly, we assume that particles next to each other in the texture are part of the same line, up until some line length is reached. For simplicity’s sake all the lines contain the same number of particles, and that number is a power of two so a number of lines fit neatly into the particle texture. Lines are arranged solely in the X direction of the particle texture and can’t spread onto multiple rows: i.e. the maximum line length allowed is the width of the particle texture. With this arrangement you’ve got a pretty easy way of finding the particles that make up one line, of finding the next and previous particles in the line and so on. For example, in a 1024×1024 particle texture and a line length of 256, I have 4 lines per row – 4096 lines in total.
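The layout arithmetic is simple enough to write down. The sketch below reproduces the 1024×1024 example and shows how a particle finds its line neighbours from its texel coordinate alone; the function names are invented for the example.

```python
def line_layout(tex_width, tex_height, line_length):
    """Return (lines per row, total lines) for lines packed along X."""
    assert line_length <= tex_width and tex_width % line_length == 0
    lines_per_row = tex_width // line_length
    return lines_per_row, lines_per_row * tex_height

def neighbours_in_line(x, y, line_length):
    """Texel coords of the previous/next particle in the same line,
    or None where the particle sits at an end of its line."""
    pos = x % line_length            # position of this particle within its line
    prev = (x - 1, y) if pos > 0 else None
    nxt = (x + 1, y) if pos < line_length - 1 else None
    return prev, nxt
```

For the numbers in the text, `line_layout(1024, 1024, 256)` gives 4 lines per row and 4096 lines in total.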

Connected movement is achieved by using a spring solver. Particles attempt to maintain a certain distance from their connected neighbours in a line by pushing and pulling towards them; several iterations of that are performed per update. So it’s simply a case of looking at the next and previous particles in the line and moving the particle towards or away from its neighbours as appropriate. End points can be anchored if we want.
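A minimal sketch of that relaxation for a single line, written the way a ping-pong GPU pass would do it (read the old positions, write new ones). The `stiffness` value, the default iteration count and the `anchored` parameter are assumptions for illustration.

```python
from math import sqrt

def relax_line(points, rest_length, iterations=4, stiffness=0.5, anchored=(0,)):
    """Nudge each particle towards rest_length from its line neighbours."""
    pts = [list(p) for p in points]
    for _ in range(iterations):
        new = [list(p) for p in pts]          # write buffer for this pass
        for i, p in enumerate(pts):
            if i in anchored:
                continue                      # anchored end points never move
            for j in (i - 1, i + 1):
                if 0 <= j < len(pts):
                    delta = [a - b for a, b in zip(pts[j], p)]
                    dist = sqrt(sum(c * c for c in delta)) or 1e-9
                    # Positive when too far apart (pull), negative when too close (push).
                    corr = stiffness * 0.5 * (dist - rest_length) / dist
                    for k in range(3):
                        new[i][k] += delta[k] * corr
        pts = new
    return pts
```

More iterations per update make the line stiffer; fewer make it stretchier, which is a cheap artistic control.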

Ah, but why do the previous and next particles actually make sense as line neighbours in the first place? Can’t they be anywhere in space? No, because I have a special emitter that emits particles in a suitable way – i.e. as lines in the first place. This can be done using a random direction, or using normals from a mesh, or to fill a mesh, or along contours of a distance field. If they start off in a good shape, and there’s a spring solver on them to keep them in a good shape, they stay in a good shape. Easy.

For rendering we have a couple of options: line primitives or camera-facing quad strips. Quads have the advantage of having actual thickness, but they’re slower to render and have to be at least a minimum thickness or they get culled by the hardware. We tessellate at render time using catmull rom splines so lines can be smoother – that’s just done in the vertex shader. We use opacity shadow maps just like the particles use – so the lines are self shadowed nicely.
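For reference, the uniform Catmull-Rom basis and a simple tessellation loop look like this. End segments here reuse the first and last control points, which is one common convention and an assumption on my part; the post does not say how the demo handles line ends.

```python
def catmull_rom(p0, p1, p2, p3, t):
    """Point on the uniform Catmull-Rom segment between p1 and p2, t in [0, 1]."""
    t2, t3 = t * t, t * t * t
    return tuple(
        0.5 * (2.0 * b + (c - a) * t
               + (2.0 * a - 5.0 * b + 4.0 * c - d) * t2
               + (3.0 * b - a - 3.0 * c + d) * t3)
        for a, b, c, d in zip(p0, p1, p2, p3))

def tessellate(points, subdivisions):
    """Insert `subdivisions` samples per segment along a line of particles."""
    out = []
    last = len(points) - 1
    for i in range(last):
        p0 = points[max(i - 1, 0)]       # clamp at the start of the line
        p3 = points[min(i + 2, last)]    # clamp at the end of the line
        for s in range(subdivisions):
            out.append(catmull_rom(p0, points[i], points[i + 1], p3,
                                   s / subdivisions))
    out.append(tuple(points[-1]))
    return out
```

The curve passes through every particle position, so the smoothing never pulls the rendered line away from the simulated one.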

The shading had quite a lot of faking involved too, actually. I used a blend between a few colours; a dark tone which is used as an “occluded” colour near the root, and a lighter “unoccluded” tone; then a couple of tones to randomly pick between for each strand of hair.

*Unreleased material alert!* The hair effect when used on a horse, a while ago

Naturally as with all these particle things, the issue isn’t about numbers, it’s about control – and that was the trick: emitting to fill a mesh (a match stick in this case), getting all blown about and then reforming into that mesh again. It turned out the curl noise affector worked great on lines because it has spatial continuity – it made it look like hair underwater, which is exactly what we wanted.


I spent some time looking into how to do a good fire effect with the help of some Siggraph papers. Fire is quite hard to do properly – you have to capture the large-scale and small-scale movements. The really good way to do fire is to use a massive 3D fluid solver which is big enough to capture the small-scale details – but that’s completely prohibitive in terms of memory and performance. So there’s an approximation. The basic theory is, you use a small number of screen-aligned 2D slices each running their own separate 2D fluid solver; and you blend the input velocity and density across the slices so they all have pretty similar source data, which means they all move in a way that makes sense across the slices. Then you add some procedural fluid flow (read: curl noise) on top to add detail.

The way I started was to follow the paper and use particles for inputs. You render them as particles extruded into quads to capture the motion, rendering both density & temperature and velocity into the slices as MRTs; then you apply 2D fluid solvers to the slices, apply some procedural motions and render the slices view aligned with some shader to generate colour from temperature. Well, it turned out to be a total bitch. It appeared the paper left out a few critical details, and it didn’t work out quite the way I hoped. The biggest problem was one of scale – getting a fire that would work for a big volume of it – like the heads of some tikis – was very different to one that worked for a small one like the burning head of a match. Also we couldn’t get quite as many slices as we wanted because it was just too heavy with large, high resolution fluid simulations, even in 2D. The particles also didn’t give a clean and smooth enough result, even when extruded into quads.

In the end we ditched particles as inputs totally and used meshes instead. Well, GBuffers anyway. I rendered the meshes to GBuffers and blended those into the fire buffers, weighted by depth from slice and generating velocities using Perlin noise and the screen space normals. This gave a much cleaner result which was more controllable and a massive amount faster. Still a total bitch to get the scales working well for different fires, though.

Evolution of the fire effect
See, it got better

And then there was the rendering. You would think it’d be easy to map a floating point temperature value into good looking colours, but it wasn’t. I also had to blend them across the slices and with other scene elements, and there just didn’t seem to be a mode that made it look good. It took an age of tweaking and I never was satisfied with it.

In the end we got.. something. I wasn’t totally happy with the effect but it did add something to the demo that wasn’t particles. It looked pretty good when applied to that fucking phoenix at the end though.

Raytraced spheres

Problem: render a reasonably large number (let’s say 100s) of moving spheres that can overlap in screen space, and are all refractive, with a reasonable degree of accuracy. Solution? Let’s see.. they need to refract the background which is easily achieved through render to texture; that alone could be achieved with a simple rasterisation-based approach. But they also need to refract each other given that they could overlap a lot – and that overlapping makes rasterisation inappropriate, and a raytracing solution would be better. Oh, and we also need it to not eat too much frame time given that it’s a small part of a much larger scene, so that – combined with the large number of spheres – prohibits a simple brute force approach of checking the ray against each sphere per pixel and then again for the refractions.

Spheres, during development
Raytraced spheres turned into particles, early in development. This effect was a right pain

What I needed was a way of reducing the problem down to a smaller set of spheres per pixel which are likely to affect the ray at that pixel. One way would be to build a 3D spatial database for the spheres and use that to trace more efficiently, but that isn’t all that pixel shader friendly – or easy to update per frame. So I cut a few corners and went for a 2D approach. The idea was, at a low resolution I worked out which spheres overlapped each pixel and stored those spheres in render targets; then at a high resolution I only considered the spheres in those render targets to trace through, rather than all of them. In order to cope with refractions I had to be a bit generous on the overlap test, but it worked well. The low resolution classification step was a long shader that looped through the large number of spheres – sorted front to back and roughly pre-classified on CPU to only check those vaguely near the pixel – and gathered the first 4 that overlapped, writing them to MRTs. The high resolution tracing shader loaded the 4 spheres from the render targets and checked them for ray intersections, then traced the ray through for refractions, finally getting an exit direction to look up the back buffer. 4 spheres was usually enough overlap to get believable refractions – and hey, we were going to turn it all into particles anyway, so there was room for error.. wait, what was that about overkill?
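
To make the two passes concrete, here is a toy CPU version in Python. It makes some loud assumptions that the demo did not: the view is orthographic along +z, spheres are plain (cx, cy, cz, r) tuples in screen units, the "generous" overlap is a simple radius multiplier, and refraction is left out entirely. Only the first-hit lookup is shown.

```python
import math

MAX_PER_CELL = 4   # the first 4 overlapping spheres are kept, as in the text


def classify(spheres, grid_w, grid_h, cell_size, margin=1.25):
    """Low resolution pass: per cell, the first few spheres overlapping it."""
    # Front to back, so the nearest spheres win the limited slots.
    order = sorted(range(len(spheres)), key=lambda i: spheres[i][2])
    grid = [[[] for _ in range(grid_w)] for _ in range(grid_h)]
    for gy in range(grid_h):
        for gx in range(grid_w):
            x0, y0 = gx * cell_size, gy * cell_size
            x1, y1 = x0 + cell_size, y0 + cell_size
            cell = grid[gy][gx]
            for i in order:
                cx, cy, _, r = spheres[i]
                r *= margin                      # be generous on the overlap test
                nx = min(max(cx, x0), x1)        # closest point of the cell
                ny = min(max(cy, y0), y1)        # rectangle to the sphere centre
                if (cx - nx) ** 2 + (cy - ny) ** 2 <= r * r:
                    cell.append(i)
                    if len(cell) == MAX_PER_CELL:
                        break
    return grid


def first_hit(px, py, spheres, grid, cell_size):
    """High resolution pass: test the ray only against the cell's spheres."""
    cell = grid[int(py // cell_size)][int(px // cell_size)]
    best = None
    for i in cell:
        cx, cy, cz, r = spheres[i]
        d2 = (px - cx) ** 2 + (py - cy) ** 2
        if d2 <= r * r:
            entry_z = cz - math.sqrt(r * r - d2)   # depth where the ray enters
            if best is None or entry_z < best[1]:
                best = (i, entry_z)
    return best    # (sphere index, entry depth), or None for a miss
```

The per-pixel cost is bounded by MAX_PER_CELL instead of the total sphere count, which is the whole point. It also shows why the trick breaks down once more than a handful of spheres pile up over the same cell.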

I’ve used this approach before to render large numbers of metaballs (1000s) too; the problem is that with a lot of balls you start to need a lot of overlapped spheres per pixel, and you simply can’t cache enough, so it breaks down. To do 1000s of metaballs you need a different approach, but that’s something for another post..

Particle fun

One of the main scenes in the demo involves a street of buildings which gets blown up, a building at a time, into particle explosions. That got.. pretty heavy. Each building was built of 1m particles, so we ended up pushing 10m particles per frame through the render. Ow. That was just not going to fly as regular particles where we maxed out any reasonable GPU at 2m – and blew all kinds of memory limits with more than that – so we had to do some things to cut it down.

blow that shit up
New PC shadebob record

The first idea was “static particles”. The idea was, don’t do all the simulation and sorting the particles go through; just use the position and colour textures that were pregenerated for emission from a mesh, and pass them straight to the particle renderer. The particles could be pre-sorted in that texture for a rough camera direction so it looked alright. This obviously slices the amount of work done per frame a lot. The particles would be static though, but we could use displacement mapping effects (see later) to add some movement. We could also fake them fading in and out for lifetime cycles.

This trick bought us a lot of the time back; we could actually render the scene with this and get some sort of sensible framerate. But we didn’t want a static scene, we wanted to explode the buildings. So I devised a scheme of smoke and mirrors, whereby a building is static particles until it explodes, and then switches seamlessly to a proper particle system. Buut, you can’t very well keep them all as particle systems after they explode because it wastes loads of VRAM, which we’re already pushing too hard; so I wait until the explosion gets almost static and then switch them to an imposter by rendering them to a texture.

Displacement Mapping

Displacement mapping was used to add a per-frame offset to particle positions. This is done at render time only; well, not actually in the vertex shader, but as a pre-pass just before the render which processes the position buffer. That means it’s a temporary operation – it doesn’t have to persist to the next frame so it’s not part of the simulation, so the results don’t get stored and eat memory. So it works on static particles like on the street scene, which is ideal because we needed to add some movement there.

I added a bunch of operators – audio-based FFT modifiers, Perlin noise movement modifiers, and things using images. We used it for some pulsing audio effects and a few other bits and pieces. Simple but oh, so effective.

Depth of field

Jani came up with this and it worked out a total treat. The idea is that we had so many particles that we could achieve a depth of field look just by randomising the positions a bit at render time (in vertex shader), where the randomness is controlled by the distance from focus. It took a fair few goes for him to explain it to me in a way that I understood, but once we got there I added it and it totally worked – it looked great. We could use it for focus pulls, “blurring out” shots and so on.
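
In sketch form the trick is about this small. This is Python standing in for the vertex shader. Jittering inside a disc in the view plane, the linear blur_scale factor and the parameter names are all assumptions for illustration.

```python
import math
import random


def dof_offset(view_pos, focus_depth, blur_scale, rng):
    """Jitter a view-space particle position by its distance from focus.

    view_pos    : (x, y, z) in view space, z being depth
    focus_depth : depth that should stay perfectly sharp
    blur_scale  : how fast the blur radius grows away from focus
    rng         : a random.Random (a shader would hash the particle id)
    """
    x, y, z = view_pos
    radius = blur_scale * abs(z - focus_depth)
    angle = rng.uniform(0.0, 2.0 * math.pi)
    dist = radius * math.sqrt(rng.random())     # uniform over the disc area
    return (x + dist * math.cos(angle), y + dist * math.sin(angle), z)


rng = random.Random(1234)
sharp = dof_offset((1.0, 2.0, 5.0), focus_depth=5.0, blur_scale=0.1, rng=rng)
blurry = dof_offset((1.0, 2.0, 15.0), focus_depth=5.0, blur_scale=0.1, rng=rng)
```

A particle at the focus depth does not move at all, and the further away it is the bigger the disc it gets scattered over. With enough particles that reads as blur. Animating focus_depth gives you the focus pull.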

Particle randomisation for depth of field

Distance fields

The subject of collisions with particles against meshes had come up before. Like that of real particle fluids – i.e. SPH – or rigid bodies or meshing, it usually gets met with “in realtime? fuck off” or “yea.. I bet in 5 years we’ll be doing that” or “I’ll get to it when I’m done adding the radiosity solver” or some other smartass coder vs artist remark. Like what we used to say about shadows in the 90s. Of course, those arguments always end up evaporating because it actually gets done in the end when someone comes up with a practical, simple, workable way of doing it. And so it is here. All the hype about distance fields made me get around to writing a proper mesh to signed distance field conversion routine for some effect or other, and I realised it would make perfect sense to use for particle collisions. With meshes.

It’s a pretty simple routine; get the particle position in the space of the distance field, see if it’s inside, work back to find the 0 contour and the field normal at the hit point and then do something. Like move the particle and set some bounce velocity. So I did it and we used it with the tidal wave scenes, and it was great! Particles colliding with logos, with 3d scenes, and so on.
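
A minimal sketch of that routine follows. An analytic sphere stands in for the sampled mesh distance field, and the particle is assumed to be in field space already. The normal by central differences and the bounce factor are assumptions of this sketch.

```python
import math


def sphere_sdf(p, centre=(0.0, 0.0, 0.0), radius=1.0):
    """Stand-in signed distance field: negative inside, positive outside."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, centre))) - radius


def field_normal(sdf, p, eps=1e-4):
    """Field normal from central differences of the distance values."""
    g = []
    for k in range(3):
        hi, lo = list(p), list(p)
        hi[k] += eps
        lo[k] -= eps
        g.append(sdf(hi) - sdf(lo))
    length = math.sqrt(sum(c * c for c in g)) or 1.0
    return [c / length for c in g]


def collide(sdf, pos, vel, bounce=0.5):
    """If the particle is inside, move it to the 0 contour and bounce it."""
    d = sdf(pos)
    if d >= 0.0:
        return list(pos), list(vel)              # outside: nothing to do
    n = field_normal(sdf, pos)
    # d is negative inside, so subtracting d * n moves outwards along the normal.
    new_pos = [p - d * c for p, c in zip(pos, n)]
    vn = sum(v * c for v, c in zip(vel, n))
    if vn < 0.0:                                 # still heading into the surface
        new_vel = [v - (1.0 + bounce) * vn * c for v, c in zip(vel, n)]
    else:
        new_vel = list(vel)
    return new_pos, new_vel
```

With an exact distance field one step along the normal lands on the surface. With a sampled or slightly wrong field you would step a few times, which is the "work back to find the 0 contour" part.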

Well, it would have been great if the routine had worked. It didn’t; the mesh to distance field conversion was broken, so parts of the field were all wrong and it produced all kinds of funny results. We managed to fudge the effect enough to get through the demo but it wasn’t until months later that I realised the mistakes and made something that really worked properly. In the demo it works in a few places but it’s not quite what it should have been.. so you get a few splashes off the logo and some collisions with what basically ended up as boxes in the subway scene.

The good news is I fixed it since, and it’s brilliant. So many applications for it; although the real challenge is in getting an accurate signed distance field of an arbitrary complex mesh efficiently in the first place, and that was what took so long to solve. It probably deserves a whole article on its own so let’s leave it there.


I don’t know how this came about, but someone – might have been me actually – had the idea of using an ocean water effect and making the particles follow it. That water routine is so old. I’ve had it working since about 2003 and never actually used it in a demo, although it was planned for a couple and didn’t make it. It’s the implementation of Tessendorf’s FFT-based ocean water simulation, and it gives you a nice realistic ocean water heightfield which people usually use for meshes. I remember at the time I wrote it it worked fast on something like 32×32 or 64×64 grids on a PC CPU (due to the inverse 2D FFT you need to do), which wasn’t all that good looking. Since then Caspar did one on the PS3 running on SPU which ran at 256×256 if I remember right; fortunately PC CPUs caught up and now I can run it at a decent resolution pretty comfortably. If you want to know how the ocean routines work, google Tessendorf FFT ocean water and you’ll no doubt be presented with a load of material.

Original version of the water effect

The water in the subway scene, later

That was the first step; but then we started messing with it. We had a subway scene where we wanted to fill it with water and make it look like a wave was crashing through it. In an ideal (fantasy) world that would be done with proper fluid dynamics; I thought it’d be better (i.e. achievable) if we faked it by taking the ocean effect and applying some magical space modifier to it to warp it into the shape of a wave. Simple.. a wave curl is a bit like some warped bell curve shifted and curled around by a twist / vortex equation. Right? Except somehow I was attempting to do this really late at night not all that long before the deadline, and I just couldn’t get it for ages and ages. GCSE maths is hard.

The subway scene

Post Processing

I have to quickly mention the post processing effects – well, effect – that we used to make the screen all break up and look like a broken video recording. A lot of people moaned about it, some liked it. Personally I love it. It’s a combination of a load of different small things which go together to make something cool. We mix between a load of distortions using sinewaves and noise – some on scanlines, some on blocks; stretching, offsetting and flipping the screen; and then this frame-holding effect where we keep a history of a few frames and randomly hold them or jump between them for a little while. There’s something really satisfying about taking a scene you’ve spent ages lovingly crafting, and then messing it up on purpose.
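
The frame-holding part is easy to sketch. The class below is a hypothetical illustration, not the demo's code. The history length, hold probability and hold duration are made-up parameters, and a "frame" can be any object (a texture handle in practice).

```python
import random
from collections import deque


class FrameHold:
    """Keep a short history of frames and randomly hold on one for a while."""

    def __init__(self, history=8, hold_chance=0.1, max_hold=6, seed=0):
        self.frames = deque(maxlen=history)   # oldest frames fall off the end
        self.hold_chance = hold_chance
        self.max_hold = max_hold
        self.rng = random.Random(seed)
        self.held = None
        self.remaining = 0

    def process(self, frame):
        """Feed in this frame's image; get back the image to display."""
        self.frames.append(frame)
        if self.remaining > 0:                # still stuck on a held frame
            self.remaining -= 1
            return self.held
        if self.rng.random() < self.hold_chance:
            # Jump to a random frame from the history and stick there.
            self.held = self.rng.choice(list(self.frames))
            self.remaining = self.rng.randint(1, self.max_hold)
            return self.held
        return frame
```

Call process() once per frame with the freshly rendered image and display whatever comes back. The distortion passes would then run on top of that.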

So there it is – we tried to make plans, it didn’t work out, and we made something much quicker instead. I’m really glad I got to work with Hunz, and I’m happy with some of the routines that were put together pretty fast. Demo compos, like war, can be the source of great innovation and technical advancement – if things have to get done, they get done. Yep, demo compos are a lot like war actually. Except you can watch them with a few beers in the grandstand of a hockey arena, not on CNN.

I happened to do a seminar at Assembly which is here – if you want to watch 50 minutes of me discussing how we made our recent demos and at the same time being a cocky little shit. Go on, you know you want to.

Coming soon: all the new things we’ve been doing between when the content of this blog post was actually fresh and relevant, and now..
