Facebook Reality Labs researchers used a modified convolutional neural network to create the impression of selective blur inside virtual reality scenes to make things a bit more real. They’re now releasing the code, DeepFocus, as open source.
Facebook, the company that brought you the Oculus Rift virtual reality headset, is pursuing machine learning to fix some of the shortcomings of the medium.
In particular, VR headsets suffer from a phenomenon called “vergence-accommodation conflict,” or VAC, where what you see through a display pressed up close to your eyes doesn’t match how your brain knows things should look in the far distance.
At the very least, that means a less-realistic experience in VR; more seriously, it can induce actual physical discomfort in the wearer of Rift or another such device.
In a blog post, engineers at Facebook Reality Labs describe how they trained a single neural network to selectively blur parts of a 3-D scene in order to overcome the VAC.
Their invention, called DeepFocus, was first shown at the “Oculus Connect” conference this year. Now, they’ve posted the code for DeepFocus online with an open-source license.
DeepFocus works with a special prototype headset that the Reality Lab team have been developing over the past three years, called “Half Dome,” which is an example of a “varifocal” head-mounted display. Unlike the standard Rift and other VR kits, varifocals have eye-tracking camera systems, and lenses that are placed on actuators to move forward and backward. This lets the headset adjust the “depth of focus” by moving the image as the user’s gaze moves.
But hardware alone is not enough: the device still needs software to create the kind of background blur that the brain expects, rather like the way today’s iPhone X and Pixel phones adjust background “bokeh” in photos. The combination of hardware and software that recreates the feeling of depth in VR is part of an emerging field of study called “computational displays.”
As described in the formal paper, presented at the SIGGRAPH conference and posted online, prior approaches have combined multiple projectors in the physical headset with the well-established graphics technique of ray-tracing to generate images at many angles as the user moves. But that approach is computationally heavy, making it cumbersome for real-time effects that adjust as the user shifts their gaze.
Enter DeepFocus, which uses a convolutional neural network, or CNN, the workhorse of so many machine learning tasks. A CNN typically includes “pooling” layers that downsample an image to help the network discover higher-level features. In this case, the researchers replaced those layers with what they call “interleaving” and “de-interleaving” layers, which reduce the resolution fed to the convolutions while preserving the images’ coarse, low-resolution information.
As they write, “we introduce the interleaving layer to intentionally downscale the input high-resolution images before feeding them into convolutional layers.”
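That downscaling step resembles the “space-to-depth” rearrangement used elsewhere in deep learning: pixels are folded into extra channels, so convolutions run on a smaller grid without any pixels being discarded. A minimal NumPy sketch of the idea (function names are illustrative, not the paper’s actual API):

```python
import numpy as np

def interleave(img, r=2):
    """Space-to-depth: rearrange an (H, W, C) image into an
    (H/r, W/r, C*r*r) tensor, trading spatial resolution for
    channels. A sketch of the interleaving idea, not the
    DeepFocus implementation."""
    h, w, c = img.shape
    return (img.reshape(h // r, r, w // r, r, c)
               .transpose(0, 2, 1, 3, 4)
               .reshape(h // r, w // r, c * r * r))

def deinterleave(t, r=2):
    """Depth-to-space: the exact inverse, recovering the
    full-resolution image at the network's output."""
    h, w, cr2 = t.shape
    c = cr2 // (r * r)
    return (t.reshape(h, w, r, r, c)
             .transpose(0, 2, 1, 3, 4)
             .reshape(h * r, w * r, c))
```

Because the rearrangement is lossless, de-interleaving an interleaved image returns it unchanged, which is what lets the network work on a coarser grid without throwing away detail.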
This new neural net is trained by exposing it to many images created by “scene generator software,” with arrangements of objects in complex layouts with occlusion that create layers of depth. Each of 4,900 “stacks” contains forty versions of an image with varying depths of field.
The network is fed the image stacks, a “depth map,” and something called a “circle of confusion” that gives the network a hint as to the extent of blur the image should have. The network learns to create a target blurred output using gradient descent.
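The “circle of confusion” comes from standard thin-lens optics: a point away from the focal plane projects onto the sensor as a disc rather than a point, and the disc’s diameter indicates how much blur that depth should receive. A hedged sketch of the textbook formula (the paper’s exact parameterization may differ):

```python
def circle_of_confusion(depth, focus_dist, focal_len, aperture):
    """Thin-lens circle-of-confusion diameter for a point at
    `depth` meters when the lens is focused at `focus_dist`
    meters. Standard optics formula, used here only to
    illustrate the blur hint the network receives."""
    # Points on the focal plane yield zero; blur grows with
    # distance from the plane and with aperture size.
    return (aperture * abs(depth - focus_dist) / depth
            * focal_len / (focus_dist - focal_len))
```

A map of these diameters, one per pixel, is the kind of per-depth blur hint the text describes being fed to the network alongside the depth map.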
The authors write that their blurred images are more “physically realistic” in their blurriness than are blurred images created by the widely used Unity video game software development tool.
Interestingly, the authors acknowledge two cases that flummox DeepFocus: semi-transparent glass surfaces and mirrors. In such scenes, they report, the network is unable to determine the correct depth of focus, so an object that is supposed to sit behind glass, and therefore be blurred, may be rendered with no blur at all. As they point out, “there are no single correct depth values” in such scenes. “These failure cases require richer input information for physically accurate estimation and we leave this for future work,” they write.
Date: December 26, 2018