We (Evercoast) used 56 RealSense D455s. Our software can run with any camera input, from depth cameras to machine vision to cinema REDs, but for this, RealSense did the job. The higher-end the camera, the more expensive and time-consuming everything is. We have a cloud platform to scale rendering, but it's still more costly overall (in time and money) to use high-res input. We've worked hard to make even low-res data look awesome. And given the aesthetic of the video (90s MTV), we didn't need 4K/6K/8K renders.
This is a great question, would love some feedback on this.
I assume they stuck with RealSense for proper depth maps. However, those are both limited to about a 6-meter range, and their depth imaging can't resolve features smaller than their native resolution allows (it gets worse past 3 m too, as there is less and less parallax, among other issues). I wonder how they approached that as well.
Couldn’t you just use iPhone Pros for this?
I developed an app specifically for photogrammetry capture using AR and the depth sensor as it seemed like a cheap alternative.
EDIT:
I realize a phone is not on the same level as a RED camera, but I just saw iPhones as a massively cheaper option to the alternatives in the field I worked in.
ASAP Rocky has a fervent fanbase that's been anticipating this album, so I'm assuming whatever record label he's signed to gave him the budget.
And when I think back to another iconic hip hop video (iconic for that genre) where they used practical effects and military helicopters chasing speedboats in the waters off Santa Monica... I bet they had change to spare.
A single camera only captures the side of the object facing it. Knowing how far away the camera-facing side of a Rubik's Cube is helps if you're making educated guesses (novel view synthesis), but it won't solve the problem of actually photographing the backside.
A cube has six sides, which means you need a minimum of six iPhones around an object to capture all of it and then freely move around it. You might as well seek open-source alternatives rather than relying on Apple surprise boxes for that.
Of course, in cases where your subject is static, such as a building, you can wave a single iPhone around for the same effect, with a result comparable to more expensive rigs.
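The coverage argument can be sketched numerically (a toy model: an axis-aligned cube centered at the origin, cameras treated as points, and the function name is mine). A face is only captured if at least one camera sits on its outer side:

```python
import numpy as np

# Hypothetical sketch: which faces of a unit cube centered at the origin are
# front-facing for a given set of cameras? Simplification: a face counts as
# "covered" when a camera lies on its outward side (normal . cam_pos > 0),
# which is good enough for cameras well outside the cube.
face_normals = np.array([
    [ 1, 0, 0], [-1, 0, 0],   # +x / -x
    [ 0, 1, 0], [ 0,-1, 0],   # +y / -y
    [ 0, 0, 1], [ 0, 0,-1],   # top / bottom
], dtype=float)

def covered_faces(camera_positions):
    """Return one boolean per cube face: visible from any camera?"""
    cams = np.asarray(camera_positions, dtype=float)
    return (face_normals @ cams.T > 0).any(axis=1)

# Four cameras on a horizontal ring see all four side faces...
ring = [[2, 0, 0], [0, 2, 0], [-2, 0, 0], [0, -2, 0]]
print(covered_faces(ring))   # ...but top and bottom stay uncovered
# Adding cameras above and below closes the gap
print(covered_faces(ring + [[0, 0, 2], [0, 0, -2]]))
```

This is why rigs add cameras looking down at (and up under) the subject rather than only ringing it horizontally.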
The minimum is four RGB-only cameras (if you want RGB data) but adding lidar really helps.
The standard pipeline can infer a huge amount of data, and there are a few AI tools now for hallucinating missing geometry and backfaces based on context recognition, which can then be converted back into a splat for fast, smooth rendering.
Edit: As I'm digging, this seems to be focused on stereoscopic video as opposed to actual point clouds. It appears applications like cinematic mode use a monocular depth map, and their lidar outputs raw point cloud data.
A LIDAR point cloud from a single point of view is a monocular depth map. Unless the LIDAR in question is, like, using supernova-level gamma rays or neutrino generators for the laser part to get density and albedo volumetric data across its whole distance range.
You just can't see the back of a thing by knowing the shape of the front side with current technologies.
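To make the equivalence concrete, here's a minimal sketch (hypothetical toy intrinsics, not any real sensor's): back-projecting a per-pixel depth map through a pinhole camera model produces exactly a single-viewpoint point cloud, every point lying on a ray through the one camera center.

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project an (H, W) depth map into an (H*W, 3) point cloud."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx   # pinhole model: u = fx * x/z + cx
    y = (v - cy) * z / fy   #                v = fy * y/z + cy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

# A flat wall 2 m away, seen by a toy 4x4 sensor with made-up intrinsics
depth = np.full((4, 4), 2.0)
pts = depth_to_points(depth, fx=2.0, fy=2.0, cx=1.5, cy=1.5)
# All points share z = 2.0; nothing behind the wall is represented,
# which is the "single eyed" limitation in code form.
```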
Right! My terminology may be imprecise here, but I believe there is still an important distinction:
The depth map stored for image processing is image metadata, meaning it calculates one depth per pixel from a single position in space. Note that it doesn't have the ability to measure that many depth values, so it measures what it can using LIDAR and focus information and estimates the rest.
On the other hand, a point cloud is not image data. It isn't necessarily taken from a single position; in theory the device could be moved around to capture additional angles, and the result is a sparse point cloud of depth measurements. Also, raw point cloud data doesn't necessarily come tagged with point metadata such as color.
I also note that these distinctions start to vanish when dealing with video or using more than one capture device.
No, LIDAR data is necessarily taken from a single position. It's 3D, but literally single-eyed. You can't tell from LIDAR data whether you're looking at a half-cut apple or an intact one. This becomes obvious the moment you try to rotate a LIDAR capture: it's just the skin. You need depth maps from all angles to reconstruct the complete skin.
So you need a minimum of two, for the front and back of a dancer. Actually, the seams are kind of dubious, so let's say three, 120 degrees apart. Well, we need ones looking down as well as up for baggy clothing, so more like nine: 30 degrees apart vertically and 120 degrees horizontally...
and ^ this escalates far enough that installing a few dozen identical non-Apple cameras in a monstrous sci-fi cage starts making a lot more sense than an iPhone, for a video.
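A rough sketch of that multi-view fusion, assuming the camera poses are known (the poses and points below are made up for illustration): each camera's depth-derived points get transformed into a shared world frame and merged, and only the union of viewpoints covers the whole skin.

```python
import numpy as np

# Hedged sketch: fuse single-view captures into one cloud. Each camera i has
# a rigid pose (R_i, t_i); a point in its local frame maps to world space as
# p_world = R_i @ p_cam + t_i.
def fuse(views):
    """views: list of (points_Nx3, R_3x3, t_3) tuples -> merged Nx3 cloud."""
    return np.concatenate([pts @ R.T + t for pts, R, t in views], axis=0)

# Two cameras facing each other along z, a dancer standing at the origin.
# Each camera reports its measurement in its own local frame.
front_pts = np.array([[0.0, 0.0, 2.0]])          # 2 m in front of camera A
back_pts  = np.array([[0.0, 0.0, 2.0]])          # 2 m in front of camera B
R_a, t_a = np.eye(3), np.array([0.0, 0.0, -2.0]) # cam A at z = -2, facing +z
R_b = np.diag([-1.0, 1.0, -1.0])                 # cam B rotated 180° about y
t_b = np.array([0.0, 0.0, 2.0])                  # cam B at z = +2, facing -z
cloud = fuse([(front_pts, R_a, t_a), (back_pts, R_b, t_b)])
# Both observations land at the dancer's position in the world frame:
# front and back views stitch into one cloud.
```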
Recording point clouds over time, I guess I mean. I'm not going to pretend to understand video compression, but could it be possible to handle the motion aspect in 3D the same way as in 2D?
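Something like the 2D analogy can be sketched as keyframe-plus-delta coding, loosely like I-frames and P-frames in video. This is purely illustrative; real volumetric codecs such as MPEG's V-PCC are far more sophisticated, and all the numbers here are made up.

```python
import numpy as np

def encode(frames, scale=1000):
    """Quantize frame 0 fully (keyframe), store int16 deltas per later frame."""
    key = np.round(frames[0] * scale).astype(np.int32)
    deltas = [np.round((f - p) * scale).astype(np.int16)
              for p, f in zip(frames, frames[1:])]
    return key, deltas

def decode(key, deltas, scale=1000):
    out = [key.astype(np.int32)]
    for d in deltas:
        out.append(out[-1] + d)    # accumulate deltas onto the keyframe
    return [f / scale for f in out]

rng = np.random.default_rng(0)
base = rng.uniform(-1, 1, size=(100, 3))        # toy 100-point cloud
frames = [base + 0.001 * i for i in range(5)]   # slow motion: tiny deltas
key, deltas = encode(frames)
out = decode(key, deltas)
# Frame-to-frame motion is small, so deltas fit in int16, and the
# round-trip error stays within the 1 mm quantization step.
```

The catch versus 2D video is that points have no fixed ordering between frames, so real systems need correspondence or voxelization before deltas make sense.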
>Evercoast deployed a 56 camera RGB-D array
Do you know which depth cameras they used?