Stupid memory problems!

  / by Ton

The original title for this post was actually “Stupid Macs!”, but luckily this story has a happy ending, even for OSX. :)

As you all might have noticed from the lack of blog postings, since early February we've been putting in very long days to get the final scenes rendered. This was the first real stress test for the Blender Render recode project, and needless to say our poor fellows suffered quite a bit of crashing… luckily most bugs could be squashed quickly. After all, this project is also about making Blender better, eh!

With scenes growing in complexity and memory usage – huge textures, lots of render layers, motion blur speed vectors, compositing – it also became frustratingly complex to track down the last bugs… our render/compositing department (Andy & Matt, both using Macs) was suffering weird crashes about every hour, either in OpenGL or as ‘memory returned null’ errors.

At first I blamed the OpenGL drivers… since the recode, all buffers in Blender are floating point, and OpenGL draws float buffer updates in the output window while rendering. They are using ATI cards… which are known to be picky about drawing in the frontbuffer too.

While running Blender in debug mode and finally getting a crash, I discovered that memory allocation addresses were suspiciously growing into the 0xFFFFFFFF range, or in other words: the entire address space was in use! Our systems have 2.5 GB of memory, and this project was only allocating about 1.5 GB of it.
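For those who want to see for themselves, here is a minimal sketch (not the actual Blender debug code, just an illustration) that prints where malloc puts its blocks, which is how you can watch addresses creeping towards the top of the address space:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int i;

    /* allocate a bunch of 64 MB blocks and print their addresses;
       on a 32-bit system you can watch them climb towards 0xFFFFFFFF */
    for (i = 0; i < 16; i++) {
        void *p = malloc(64 * 1024 * 1024);
        printf("block %2d at %p\n", i, p);
        if (p == NULL)
            break;
    }
    return 0;
}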

To my great dismay it appeared that OSX only assigns processes a memory space of 2 GB! Macs only use the second half of the 32-bit 4 GB range… I just couldn't believe this… it wouldn't even be possible for the OS to swap out unused data segments while rendering (like texture maps or shadow buffers).
After a lot of searching on the web I found some confirmation of this. Photoshop and Shake both mention this limit (up to 1.7 GB is safe; above that Macs might crash). However, the Apple Developer website was mysteriously vague about it… stupid marketing people! :)

Now I can already see the Linuxers smirk! Yes indeed, doing renders on our Linux stations went smoothly and without problems. Linux starts memory allocations somewhere in the lower half, and will easily address up to 3 GB or more.

Since our renderfarm sponsor also uses OSX servers, we still really had to find a solution. Luckily I quickly found the main cause of the memory fragmentation, which was in the code calculating the vertex speed vectors for the previous/next frame, needed for Vector Motion Blur. We're already using databases of over 8 million vertices per scene, and calculating three of those and then taking differences just left memory too fragmented.
Restructuring this code (see the sketch below) solved most of the fragmentation, so we could happily go back to work… or so I thought.
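To give a rough idea of the kind of restructuring meant here (an illustration only, with made-up names – not the actual Blender code): allocating the previous/current/next speed-vector arrays as one contiguous block, and freeing them together, leaves far fewer holes behind than three separately allocated and separately freed arrays:

#include <stdlib.h>

/* hypothetical speed-vector type, just for this illustration */
typedef struct { float x, y, z; } SpeedVec;

/* one allocation covering the prev/cur/next arrays */
SpeedVec *speed_block_alloc(size_t totvert)
{
    return malloc(3 * totvert * sizeof(SpeedVec));
}

/* one matching free, so no small holes stay behind between the arrays */
void speed_block_free(SpeedVec *block)
{
    free(block);
}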

Today the crashing happened again though… and the render farm didn't survive all the scenes we rendered either. It appeared that our artists just can't squeeze render jobs down to less than 1.5 GB… image quality demands, you know. (Stupid artists! :)

So, about time to look at a different approach. Our webserver sysadmin (thanks Marco!) advised me to check out ‘mmap’, a Unix facility for mapping files on disk into memory. And even better, mmap supports an option to use ‘virtual’ files (actually “/dev/zero”) which can be used in a program just like regular memory allocations.
And yes! The first mmap trials revealed that OSX allocates these in the *lower* half of memory! And even better… mmap allocations allow addressing almost the entire 4 GB.
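For reference, a rough sketch of what the /dev/zero variant looks like (the anonymous-mapping call I ended up using is in the code fragments further down):

#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>

/* map zero-filled pages backed by /dev/zero; they behave just like
   a regular memory allocation */
void *zero_map(size_t len)
{
    void *mem;
    int fd = open("/dev/zero", O_RDWR);

    if (fd == -1)
        return NULL;

    mem = mmap(0, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    close(fd);   /* the mapping stays valid after closing the fd */

    return (mem == MAP_FAILED) ? NULL : mem;
}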

I added mmap in the compositor for testing, and created this gigantic edit tree using images of 64 MB of memory each:

[Image: memory usage]

On my system, with just 1 GB of memory, it uses almost 2.5 GB, and editing is still pretty responsive! Seems like this is an excellent way to allocate large data segments… and leave it to the OS to swap away the unused chunks.

Just a couple of hours ago I committed this system, integrated into our secure malloc system. While rendering now, all texture images, shadow buffers, render result buffers and compositing buffers use mmap, with very nice, stable, unfragmented memory results. :)
Even better: with the current tile-based rendering pipeline, it's an excellent way to let unused memory swap out to disk, making it possible to render projects requiring quite a bit more memory than is physically in your system.

For those who like code fragments:

Equivalent to a malloc:
memblock= mmap(0, len, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_ANON, -1, 0);

And free:
munmap(memblock, len);
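
And for completeness, a small self-contained sketch of how such calls can be wrapped in malloc/free style functions (an illustration only, with made-up names – not our actual secure malloc code):

#include <sys/mman.h>
#include <string.h>
#include <stdio.h>

/* munmap needs the length back, so store it in front of the block */
void *mmap_malloc(size_t len)
{
    size_t *mem;

    len += sizeof(size_t);
    mem = mmap(0, len, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANON, -1, 0);
    if (mem == MAP_FAILED)
        return NULL;

    mem[0] = len;
    return mem + 1;
}

void mmap_free(void *ptr)
{
    size_t *mem = (size_t *)ptr - 1;
    munmap(mem, mem[0]);
}

int main(void)
{
    /* quick test: one 64 MB float buffer, like an image in the edit tree above */
    float *buf = mmap_malloc(64 * 1024 * 1024);

    if (buf) {
        memset(buf, 0, 64 * 1024 * 1024);
        printf("buffer at %p\n", (void *)buf);
        mmap_free(buf);
    }
    return 0;
}

Note that munmap, unlike free, needs the length of the block; keeping it in a small header in front of the block is the simplest way to offer a free()-style call.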


(Too bad I don’t have time to show these AWESOME images that are coming back from the renderfarm all the time… or to talk about the very cool Vector Blur we got now, or the redesigned rendering pipeline, our compositing system… etc etc. First get this darn movie ready!)


67 Responses to “Stupid memory problems!”

  1. Timothy said on 21 Feb, 2006:

    O.K. does anyone know how to get this to work? I spent a long time trying to get the space ship to work. For some reason the animation I made of the space ship going into space does not play when I play the game. Tell me if it works on your computer.
    Here it is

  2. tbaldridge said on 21 Feb, 2006:

    I e-mailed Ton a while back about some design issues with the renderer, and he stated that they are doing a one-frame-per-node system. He said that they were getting about 1 hour per frame. So it wouldn't be that bad.

    Really, the only reason you would need more than one node per frame is when the number of frames you are rendering is less than the number of nodes in the farm.

  3. Nik777 said on 21 Feb, 2006:

    Really, the only reason you would need more than one node per frame is when the number of frames you are rendering is less than the number of nodes in the farm.

    That’s not *entirely* true.

    1. If a node goes down in the middle of processing, you may have to re-run that entire chunk of work (i.e. re-render the entire frame). Dividing the task into smaller chunks reduces the loss (and hence the potentially duplicated work).

    2. A parallel system tends to run as a single stream at the end of the run (some node ends up running after all the others have finished). This single-streaming could last for any amount of time up to the time of a full chunk of work (that last node may have started when all the others had just about finished). The greater the variation in the complexity of the individual chunks (frames), the more likely this is. So, the smaller the work chunks, the shorter the possible time that the render-farm can run on a single node – but the greater the per-chunk overheads.

    Just some observations.

    BTW: The new work that you are describing sounds awesome!


  4. Davide said on 21 Feb, 2006:

    I think that it could be great if Blender could render on something other than a frame-per-node basis. I know it is possible to make a cluster for Blender in Linux using OpenMOSIX, but again it works on a frame-per-node basis; in this case, if you have an animation with 250 frames and 5 nodes, you have to manually (or with an external script) split the file (or the settings in the command-line rendering) into five, so the first renders frames 1-50, the second renders frames 51-100, ….

    Then, launching the five renderings at the same time, OpenMOSIX will distribute them automatically (on a per-process basis). But once again, if it takes 1 hour per frame and the power goes down at minute 45 (for example), you lose all 5 frames and have to re-render all of them (another hour to get those same 5 frames). If it could render as a whole, each frame would (supposedly) take only 12 min (and if the power goes down at minute 45, you only lose 2 frames). This solution would also be great in a network that doesn't have a fixed number of nodes (nodes entering and leaving at any time). Saying all this, I must admit that I don't know if that is possible at all.

    To the Orange team: 3 more weeks to the online release, isn't it?? Can't wait anymore :P. Don't be too nervous about the deadline, I'm sure you guys will handle it just fine :D

    I really hope that we will soon see lots of tutorials about animation on the web (since there don't seem to be many for now); speaking of that, one of the best tutorials about the new armature functions (particularly the stride feature and how to set up a walk cycle) can be found here:

  5. Jackson Guardini said on 22 Feb, 2006:

    Hi there! Congratulations on the excellent work, it looks marvelous and I can't wait to see the rest :D

    Here is my help for the team: I found a site that has some nice textures, and they are free. Take a look!


    have a nice day!

  6. punkfrog said on 23 Feb, 2006:

    Glad to see you're working out the technical problems, but come on, please post more art! Even concept sketches would be cool, please????

  7. Renato Perini said on 23 Feb, 2006:

    It is a little bit off-topic… but I'm searching for those links mentioned by Ton (Photoshop and Shake about the 1.7 GB limit…).
    Can someone give me those links? Thank you.

  8. Bmud said on 24 Feb, 2006:

    Yep, Blender crashes the 22nd time I subdivide a segment. I'm surprised that I couldn't get it to go further with 2 GB of RAM! But this problem is fixed now, so it's all good :D

  9. Ton said on 25 Feb, 2006:

    tbaldridge: erm, I mailed that “no frame took longer than 1 hour”. The average is quite a bit less. I don't have the latest stats though; we'll post that info too, later. :)

  10. antique said on 5 Mar, 2006:

    Hey Linux-lovers and Windows-worshippers! Blender started on Irix. Irix has been 64-bit for way over ten years. You guys are more than a little bit behind the times….

  11. ROUBAL said on 24 Mar, 2006:

    Today is D Day for the Orange Team!

    Congratulations to Ton and the whole team for the awesome job done these last months.

    Thank you also for coding ever amazing new tools for the community!

    I would have liked to be with you at the première! I’m sure that it will be a great moment!


  12. caliper digital said on 20 Apr, 2006:

    caliper digital can solve this problem 8-)

  13. Baaman said on 21 Apr, 2006:

    2 GB of memory!?! I don't do too much advanced stuff, but that seems like a lot even for a program like Blender. Did you all remember to decimate your meshes?

    Blender is way ahead of the competition in hardware efficiency; Blender will show full textures in realtime (sort of), while Maya lags up a Pentium 4 with 2 GB of RAM just to display a shading preview of a single mesh. Score one for open source!

  14. hyzaar said on 21 Apr, 2006:

    Baaman, yes, 2 GB of memory, what's so strange?
    caliper digital is much easier to use ;))

  15. spudd86 said on 26 May, 2006:

    A bit late to be adding comments, but oh well.

    About mmap: on Linux, malloc will use mmap all on its own if appropriate… at least glibc malloc will. (I think it's for large memory blocks.)

  16. Anders said on 31 May, 2006:

    I don't understand too much of all this. I'm just getting into compositing and somehow, through a day's browsing, came here through open source…
    I just want to say I'm impressed by what you do, and by all the testing work.