
I think it's because they already had proven capture hardware and harvesting and processing workflows.

But yes, you can easily use iPhones for this now.



Looks great by the way. I was wondering if there's a file format for volumetric video captures.


Some companies have proprietary file formats for compressed 4D Gaussian splatting. For example: https://www.gracia.ai and https://www.4dv.ai.

Check this project, for example: https://zju3dv.github.io/freetimegs/

Unfortunately, these formats are currently locked behind cloud processing, so adoption is rather low.

Before Gaussian splatting, textured mesh caches would be used for volumetric video (e.g. Alembic geometry).
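To give a sense of what such a format has to carry per frame, here's a minimal sketch in Python/numpy. The field names and sizes are my assumptions based on standard 3D Gaussian splatting, not the actual Gracia or 4DV layouts (those are closed):

    import numpy as np

    class GaussianFrame:
        """Hypothetical per-frame splat attributes; not any vendor's real schema."""
        def __init__(self, n: int):
            self.positions = np.zeros((n, 3), dtype=np.float32)   # xyz centers
            self.rotations = np.zeros((n, 4), dtype=np.float32)   # unit quaternions
            self.scales    = np.zeros((n, 3), dtype=np.float32)   # per-axis extents
            self.opacities = np.zeros((n, 1), dtype=np.float32)
            self.sh_coeffs = np.zeros((n, 48), dtype=np.float32)  # 3 color channels x 16 SH terms

    # A "4D" sequence is then one such frame per timestep, which is why
    # compressing across time (reusing mostly-static Gaussians) matters so much.
    sequence = [GaussianFrame(10_000) for _ in range(30)]  # ~1 second at 30 fps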


https://developer.apple.com/av-foundation/

https://developer.apple.com/documentation/spatial/

Edit: As I'm digging, this seems to be focused on stereoscopic video as opposed to actual point clouds. It appears applications like Cinematic mode use a monocular depth map, while the LIDAR outputs raw point cloud data.


A LIDAR point cloud from a single point of view is a monocular depth map. Unless the LIDAR in question is, like, using supernova-level gamma rays or neutrino generators for the laser part to get volumetric density and albedo data over its whole distance range.

You just can't see the back of a thing by knowing the shape of the front side with current technologies.


Right! My terminology may be imprecise here, but I believe there is still an important distinction:

The depth map stored for image processing is image metadata, meaning it records one depth per pixel from a single position in space. Note that the hardware can't actually measure a depth for every pixel, so it measures what it can using LIDAR and focus information and estimates the rest.

On the other hand, a point cloud is not image data. It isn't necessarily taken from a single position; in theory the device could be moved around to capture additional angles, and the result is a sparse point cloud of depth measurements. Also, raw point cloud data doesn't necessarily come tagged with point metadata such as color.
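For concreteness, here's a minimal sketch (assuming a plain pinhole camera model, nothing Apple-specific) of how a per-pixel depth map becomes an N x 3 point cloud. Note that every point still originates from the one camera position:

    import numpy as np

    def depth_to_points(depth, fx, fy, cx, cy):
        """Back-project an HxW depth map into camera-frame 3D points."""
        h, w = depth.shape
        u, v = np.meshgrid(np.arange(w), np.arange(h))
        z = depth
        x = (u - cx) * z / fx
        y = (v - cy) * z / fy
        return np.stack([x, y, z], axis=-1).reshape(-1, 3)  # N x 3, camera frame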

I also note that these distinctions start to vanish when dealing with video or using more than one capture device.


No, LIDAR data are necessarily taken from a single position. They are 3D, but literally single-eyed. You can't tell from LIDAR data if you're looking at a half-cut apple or an intact one. This becomes obvious the moment you try to rotate a LIDAR capture - it's just the skin. You need depth maps from all angles to reconstruct the complete skin.

So you have to have a minimum of two, for the front and back of a dancer. Actually, the seams are kind of dubious, so let's say three, 120 degrees apart. Well, we need ones looking down as well as up for baggy clothing, so more like nine, 30 degrees apart vertically and 120 degrees horizontally, ...

and ^ this goes far enough down that installing a few dozen identical non-Apple cameras in a monstrous sci-fi cage starts making a lot more sense than an iPhone, for video.
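For what it's worth, the reason a rig like that can be fused at all is that each camera's points get moved into one shared frame using its calibrated pose. A rough sketch, assuming the 4x4 camera-to-world matrices come from rig calibration:

    import numpy as np

    def merge_clouds(clouds, poses):
        """clouds: list of (N_i, 3) arrays; poses: list of 4x4 camera-to-world matrices."""
        merged = []
        for pts, T in zip(clouds, poses):
            homo = np.hstack([pts, np.ones((len(pts), 1))])  # N x 4 homogeneous points
            merged.append((homo @ T.T)[:, :3])               # transform into the world frame
        return np.vstack(merged)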


Recording point clouds over time is what I mean, I guess. I'm not going to pretend to understand video compression, but could the movement-following aspect be done in 3D the same way as in 2D?
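To make the question concrete, here's a toy sketch of the 2D idea (predict each frame from the previous one, keep only the residual) applied naively to per-point positions. It assumes the same points appear in the same order in every frame, which real captures don't guarantee, so this is an illustration rather than how any shipping 4D format works:

    import numpy as np

    def encode_delta(prev_pts, curr_pts):
        # store only the per-point motion since the last frame, quantized to millimeters
        residual = curr_pts - prev_pts
        return np.round(residual * 1000).astype(np.int16)

    def decode_delta(prev_pts, residual):
        # reconstruct the current frame from the previous frame plus the residual
        return prev_pts + residual.astype(np.float32) / 1000.0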



