by Ton
The original title for this post was actually “Stupid Macs!”, but luckily this story has a happy ending, also for OSX. :)
As you all might have noticed from the lack of blog postings, since early February we've been putting in very long days to get the final scenes rendered. This was the first real stress test for the Blender Render recode project, and needless to say, our poor fellows suffered quite some crashes… luckily most bugs could be squashed quickly. After all, this project is also about making Blender better, eh!
With scenes growing in complexity and memory usage - huge textures, lots of render layers, motion blur speed vectors, compositing - it also became frustratingly complex to track down the last bugs… our render/compositing department (Andy & Matt, both using Macs) was suffering weird crashes about every hour. These were either in OpenGL, or they were ‘memory returned null’ errors.
At first I blamed the OpenGL drivers… since the recode, all buffers in Blender are floating point, and OpenGL draws float buffer updates while rendering in the output window. They are using ATIs… which are known to be picky about drawing in the frontbuffer too.
While running Blender in debug mode and finally getting a crash, I discovered that memory allocation addresses were suspiciously growing into the 0xFFFFFFFF range, or in other words: the entire memory space was in use! Our systems have 2.5 GB of memory, and this project was only allocating about 1.5 GB of it.
To my great dismay it appeared that OSX only assigns processes a memory space of 2 GB! Macs only use the second half of the 32-bit 4 GB range… I just couldn’t believe this… it wouldn’t even be possible for the OS to swap out unused data segments while rendering (like texture maps or shadow buffers).
After a lot of searching on the web I could find some confirmation of this. Photoshop and Shake both mention this limit (up to 1.7 GB is safe; above that Macs might crash). However, the Apple Developer website was mysteriously vague about it… stupid marketing people! :)
Now I can already see the Linuxers smirk! Yes indeed, renders on our Linux stations just went smoothly and without problems. Linux starts memory allocations somewhere in the lower half, and will easily address up to 3 GB or more.
Since our renderfarm sponsor also uses OSX servers, we still really had to find a solution. Luckily I quickly found the main cause of the memory fragmentation: the code calculating the vertex speed vectors for the previous/next frame, needed for Vector Motion Blur. We’re already using databases of over 8 million vertices per scene; calculating three of these and then taking the differences just left memory too fragmented.
Restructuring this code solved most of the fragmentation, so we could go happily back to work… I thought.
Today the crashing happened again though… and the render farm also didn’t survive all the scenes we rendered. It appeared that our artists just can’t efficiently squeeze render jobs into less than 1.5 GB… image quality demands, you know. (Stupid artists! :)
So, about time to look at a different approach. Our webserver sysadmin (thanks Marco!) advised me to check out ‘mmap’, a Unix facility for mapping files on disk into memory. And even better, mmap supports an option to use ‘virtual’ files (actually “/dev/zero”), which can be used in a program just like regular memory allocations.
And yes! The first mmap trials revealed that OSX allocates these in the *lower* half of memory! And even better… mmap allocations can address almost the entire 4 GB.
I added mmap in the compositor for testing, and created this gigantic edit tree using images of 64 MB memory each:
On my system, with just 1 GB of memory, it uses almost 2.5 GB, and editing is still pretty responsive! Seems like this is an excellent way to allocate large data segments… and leave it to the OS to swap away the unused chunks.
Just a couple of hours ago I committed this system, integrated into our secure malloc system. While rendering now, all texture images, shadow buffers, render result buffers and compositing buffers use mmap, with very nice, stable and non-fragmented memory results. :)
Even better: with the current tile-based rendering pipeline, it’s an excellent way to let unused memory swap to disk, making it possible to render projects that require quite a bit more memory than is physically in your system.
For those who like code fragments, the equivalent of a malloc:

memblock= mmap(0, len, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_ANON, -1, 0);

And the equivalent of a free:

munmap(memblock, len);
(Too bad I don’t have time to show these AWESOME images that are coming back from the renderfarm all the time… or to talk about the very cool Vector Blur we got now, or the redesigned rendering pipeline, our compositing system… etc etc. First get this darn movie ready!)