Ability to query world structure at x,y points in a reality


#1

(this discussion was moved here from the argon.js GitHub Issues, https://github.com/argonjs/argon/issues/47)

Here’s the initial question I asked in that issue:

I’ve been thinking about what it’s going to mean to have something like argon.js running on more capable devices, like Google Tango or Microsoft Hololens, or using realities that have a lot more structural information about the “world”.

In all cases, it occurs to me that we might want to have a standard interface to ask “What is the 3D point in the world under this x,y position on the display?”, akin to what Hololens programs use to figure out where to draw the cursor or pick points in the world.
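
To make that concrete, here is a rough sketch of the kind of interface I have in mind (the names are made up for illustration; nothing like this exists in argon.js today):

```ts
// Purely hypothetical sketch; these names do not exist in argon.js.
interface WorldHitResult {
  // 3D position of the hit point, in a frame of reference the app already knows about
  position: { x: number; y: number; z: number };
  // Surface normal at that point, if the reality can estimate one
  normal?: { x: number; y: number; z: number };
}

interface RealityWorldQuery {
  // screenX/screenY are normalized display coordinates in [0, 1];
  // resolves to null if nothing is known about the world at that point
  hitTest(screenX: number, screenY: number): Promise<WorldHitResult | null>;
}
```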

Thoughts?

Aaron Mulder (https://github.com/ammulder) replied:

So I’ve used the three.js raycaster for this kind of thing – when the user waves the mouse over a three.js canvas, I identify whether any of “my” objects are under the cursor and highlight the selected one as appropriate.
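
(For reference, that three.js picking pattern looks roughly like this — a minimal sketch, not Aaron’s actual code:)

```ts
import * as THREE from 'three';

const raycaster = new THREE.Raycaster();
const pointer = new THREE.Vector2();

// Returns the closest object under the mouse, or null if there is none.
function pickUnderCursor(
  event: MouseEvent,
  camera: THREE.PerspectiveCamera,
  scene: THREE.Scene
): THREE.Object3D | null {
  // Convert the mouse position to normalized device coordinates (-1..+1)
  pointer.x = (event.clientX / window.innerWidth) * 2 - 1;
  pointer.y = -(event.clientY / window.innerHeight) * 2 + 1;

  raycaster.setFromCamera(pointer, camera);
  // Intersections come back sorted by distance from the camera
  const hits = raycaster.intersectObjects(scene.children, true);
  return hits.length > 0 ? hits[0].object : null;
}
```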

I guess the question is, if you’re inspecting the “world”, what should the target be that the API returns? A point? A small disc? The entire “surface” under the cursor/point, however big that is? I’d be really interested in finding/identifying large flat horizontal and vertical surfaces in the world, but I’m not sure what degree of smoothing or whatever would be needed to say “this slightly irregular shape our sensors detected is really just a flat tabletop and here are the dimensions”. And is that the same as this or are there separate API calls for things like “find the Z height at point X,Y in the display” and maybe “find the object/surface at point X,Y in the display” and “give me a list of all flat surfaces in view”? (I guess I should look at the Hololens API and see what they offer.)
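
(To make those alternatives concrete, here is a hypothetical sketch of the different shapes such an API could return — none of this is a real interface in argon.js or on Hololens:)

```ts
// Hypothetical return shapes only, to separate the alternatives above.
type Vec3 = { x: number; y: number; z: number };

// “find the Z height at point X,Y in the display”: just a point
interface PointHit {
  position: Vec3;
}

// “find the object/surface at point X,Y in the display”: the point plus the
// (smoothed) surface it belongs to, however the reality chooses to describe it
interface SurfaceHit {
  position: Vec3;
  normal: Vec3;
  boundary?: Vec3[]; // outline of the detected surface, if available
}

// “give me a list of all flat surfaces in view”
interface FlatSurfaceList {
  planes: Array<{
    center: Vec3;
    normal: Vec3;
    extents: { width: number; height: number };
  }>;
}
```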

I thought I’d move the discussion here, since it’s part of a bigger issue that isn’t specific to argon.js.


#2

When looking at app content, things like the three.js raycaster are a good model (showing how to use that with argon.js was the reason I created the “Vuforia Decals” sample, actually – similarly for the “Draggable Cubes” sample).

The real essence of my question relates to realities, and the separation they have from applications. I totally agree with you that the question of “what” is returned is the big one. Right now, devices like Tango and Hololens don’t return much at a low level; they require higher level libraries (in app space) to find things like surfaces (note: I think I might only be partially right here, re: Tango, and I’m sure this will change quickly). But, of course, we can integrate any code we want into Argon and the realities.

But, given the nature of the web, part of the question is “What should we return and how do we make sure the user is ok with that, and that they understand what information is being sensed in their world?”

I was thinking about this question in the context of privacy. One potential advantage of the argon.js model is that, by separating the reality from the application, we can create, distribute, and consume AR content without handing all the sensed information about the world (e.g., video streams, depth data, or world meshes) to the web app; the reality can consume and present that data, and share a much more restricted set of it with the applications.

Of course, an application might want that data (it might want the video to do its own computer vision or advanced rendering techniques; it might want the mesh or depth data for things like surface finding or rendering). But this separation allows the user to decide if they want an app to have full access, and allows developers to write web apps that still provide some AR capabilities even if the user doesn’t allow full access.

So, in light of that, my original question might be: what kinds of queries might we allow apps to make against the underlying data, whether or not they have full access to it? Even if the apps have full access to the data, putting the structural analysis and queries in the reality means that those features can improve over time without modifying the apps (and if multiple apps are running at the same time, they see the same results). And if the apps don’t have access, we may not want to provide answers to questions like “What are the surfaces?” even if we know.

So, are there queries that might be considered “safe” even without full access? Is “what is the x/y/z of the world under the screen position x/y” ok? What if we restrict it somehow? Is there a middle ground between no access and full access that we can support?
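
(Purely as a sketch of what that middle ground might look like — none of these names are real, and they just echo the hypothetical hitTest idea from earlier in the thread:)

```ts
type Vec3 = { x: number; y: number; z: number };

// Without full access, an app might only be allowed narrow questions: the reality
// answers them without ever exposing video, depth data, or world meshes.
interface RestrictedWorldQuery {
  // Resolves to null if the reality (or the user) declines to answer
  hitTest(screenX: number, screenY: number): Promise<Vec3 | null>;
}

// With full access, the reality could additionally hand over the richer data.
interface FullWorldAccess extends RestrictedWorldQuery {
  getSurfaces(): Promise<Array<{ center: Vec3; normal: Vec3; boundary: Vec3[] }>>;
  getVideoFrame(): Promise<ImageBitmap>;
}
```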