Version one of Vision Pro has solved scrolling and pinching, but how will things evolve next? The maker of some of VR’s classic games (coming to Vision Pro soon) has some ideas.
One thing I love about using Apple’s Vision Pro is that it tracks my hands instead of using controllers. Its basic gestures like pinching and swiping are fantastic.
In more complex 3D immersive spaces, though, that hands-only gestural language starts to fall apart. Apple has extended its 2D navigation system across all of visionOS, but the deeper 3D interactions aren't fully there yet.
Meta's Quest headsets and Apple's Vision Pro are competitors. The Quest ships with physical controllers, but it also offers controller-free hand tracking, and Meta's hand tracking is widely considered better for 3D interactions, especially picking up objects in virtual space. How these headsets differ, and how we use hand tracking at all, may well change: it's still early days for mixed reality-capable, hand-tracking headsets. When I spoke recently with one of the biggest game developers in VR, they suggested a lot could shift soon.
Games as a doorway to new ideas
Owlchemy Labs, acquired by Google in 2017, developed the popular VR games Job Simulator and Vacation Simulator. Both are coming to the Vision Pro this year, adapted to work without controllers using only hand tracking.
Owlchemy has been experimenting with hand tracking for a while: the studio already offers it as an experimental mode in Vacation Simulator on the Quest. Last year, months before Apple gave its first Vision Pro demos, I tried an Owlchemy demo exploring more advanced hand-tracking interactions, including pinch-based gestures to move objects and squeezing letters to type on virtual keyboards.
So far in 2024, mixed reality headsets feel split between hand-tracking-only and controller-optional designs. The Meta Quest headsets and Apple's Vision Pro, for instance, have notably different interfaces. Those differences could start to even out, and evolve further than anything we've seen to date.
As Andrew Eiche, Owlchemy's "CEOwl," told me in a recent conversation, we're still in the equivalent of the early days of phones, and phones went on to change their multitouch gestural language extensively over time.
Are we entering the deeper phase of immersive hand tracking?
"The pinch interaction that Apple introduced was a game-changer," says Eiche. "It was so effective that an entire operating system could be built around it." Owlchemy has been working on an advanced hand tracking-based game and has demonstrated combinations of gaze- and pinch-based interactions, as well as deeper controls that the Vision Pro currently lacks in more complex 3D experiences.
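For developers, that gaze-plus-pinch model is what visionOS exposes by default through SwiftUI and RealityKit. Here's a minimal sketch of the idea, not Owlchemy's or Apple's code: the sphere, its size and the gesture wiring are my own illustrative choices.

```swift
import SwiftUI
import RealityKit

// A minimal sketch of visionOS's gaze-plus-pinch model: the system
// resolves where the user is looking, and a pinch "clicks" that target.
// The sphere and its size are illustrative, not from any shipping app.
struct PinchableBall: View {
    var body: some View {
        RealityView { content in
            // A simple sphere the user can target by looking at it.
            let ball = ModelEntity(
                mesh: .generateSphere(radius: 0.1),
                materials: [SimpleMaterial(color: .blue, isMetallic: false)]
            )
            // Collision and input-target components make the entity
            // eligible for gaze targeting and pinch gestures.
            ball.components.set(CollisionComponent(shapes: [.generateSphere(radius: 0.1)]))
            ball.components.set(InputTargetComponent())
            content.add(ball)
        }
        // SpatialTapGesture fires when the user looks at the entity and pinches.
        .gesture(
            SpatialTapGesture()
                .targetedToAnyEntity()
                .onEnded { _ in
                    print("Ball pinched via gaze plus pinch")
                }
        )
    }
}
```

Notably, the system keeps the gaze data to itself; the app only hears about the pinch on the resolved target, which is part of why the interaction feels so effortless.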
Eiche believes that in some ways 3D interactions are easier to get right than 2D ones: they may be technically challenging to build, but from a brain-mapping perspective they're easier to understand. Picking up a virtual ball makes immediate sense; an abstraction like a scrollbar has to be learned.
Eiche also thinks of this phase of hand-based mixed reality as finding whatever works for right now, much like the first steps of touch-based smartphones when the iPhone emerged. "Remember, smartphones when they first came around, browsing the web was just terrible. It was like pinch, zoom, pinch, zoom, links were tiny," Eiche says.
Eiche finds the evolution of gestures to be the most fascinating aspect. “Pull to refresh: I think about it all the time. It’s such a brilliant interaction on the phone. But it wouldn’t have been possible on any other platform. VR hasn’t created its own pull to refresh yet – we still have a long way to go.”
Slam the snooze bar on a VR alarm clock
Right now, individual games and apps on Vision Pro take inconsistent approaches to 3D interaction. Some use pinch-and-drag gestures that work like a mouse, others use full hand-tracked grabbing, and some mix both. There's no common interface style among them yet.
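The deeper hand-tracked route looks quite different in code from the system pinch. Here's a rough sketch of how an app might detect a grab with ARKit's hand tracking on visionOS, assuming the standard HandTrackingProvider API; the 3-centimeter pinch threshold is my own illustrative guess, not a platform constant.

```swift
import ARKit
import simd

// A sketch of the fuller hand-tracking route: instead of the system-level
// pinch, read raw joint positions from ARKit's HandTrackingProvider and
// detect a "grab" by measuring the thumb-to-index distance.
func trackPinch() async throws {
    let session = ARKitSession()
    let handTracking = HandTrackingProvider()
    try await session.run([handTracking])

    for await update in handTracking.anchorUpdates {
        let anchor = update.anchor
        guard let skeleton = anchor.handSkeleton else { continue }

        // Joint transforms are relative to the hand anchor, which is fine
        // for measuring the distance between two joints on the same hand.
        let thumb = skeleton.joint(.thumbTip).anchorFromJointTransform.columns.3
        let index = skeleton.joint(.indexFingerTip).anchorFromJointTransform.columns.3
        let distance = simd_distance(
            SIMD3(thumb.x, thumb.y, thumb.z),
            SIMD3(index.x, index.y, index.z)
        )
        // 3 cm is an assumed threshold for illustration only.
        if distance < 0.03 {
            print("\(anchor.chirality) hand is pinching")
        }
    }
}
```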
Eiche sees hand tracking as inevitable across all mixed reality headsets and glasses but wants to see designers break out of the “2D pain” and embrace more natural interactions, like grabbing objects. His take on all the spatial clock apps on Vision Pro, for example, is that you should slam the snooze bar with your own hand, not gaze and pinch the screen.
Eiche raises another consideration: how multiple mixed reality apps can live side by side with interactions and experiences that still make sense. The Vision Pro lets many apps run together, so how can developers make multitasking work more intuitively?
The Vision Pro already has a sort of continuum between fully immersive VR and more open AR, using the Digital Crown to dial reality in or out. But more apps may need to explore different levels of engagement and immersion, something Eiche compares to full-screen modes on laptops. Maybe hand-tracking interfaces change depending on the level of immersion, too. Owlchemy hasn't made any mixed reality games yet, but Eiche expects them to pose different design challenges: apps need to be ready to live with people who may be partially distracted, doing or looking at something else.
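That VR-to-AR continuum is already something apps can opt into today. Here's a minimal sketch of how a visionOS app declares those immersion levels, with an illustrative scene ID and default style that are my own choices:

```swift
import SwiftUI

// A sketch of visionOS's built-in immersion levels: an ImmersiveSpace
// can declare mixed, progressive and full styles. The "playroom" ID
// and the placeholder content are illustrative only.
@main
struct ImmersionDemoApp: App {
    @State private var style: ImmersionStyle = .progressive

    var body: some Scene {
        ImmersiveSpace(id: "playroom") {
            // Real app content would go here, and its hand-tracking
            // interface could adapt to the current immersion level.
            Text("How immersed are we?")
        }
        // .progressive is the mode where the Digital Crown dials
        // passthrough in and out, the continuum described above.
        .immersionStyle(selection: $style, in: .mixed, .progressive, .full)
    }
}
```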
What about haptics or controllers?
The Vision Pro, leaning completely on hand tracking, skips not only controllers but also any sort of vibrating haptic feedback, something I've found really important for "feeling" things in virtual experiences. Eiche isn't as immediately concerned about a lack of haptics for building really good VR and AR.
Owlchemy's own hand-tracked 3D interfaces in games like Job Simulator, with their buttons, levers and other tactile inputs, lean on hand movements and clever audio cues. To Eiche, those function well enough as a kind of virtual haptics.
"Our phones have lots of haptics in them now, but that's because everybody is on silent mode," Eiche says of phones and watches. "I don't think we're going to be doing silent mode on a headset." He sees visual and audio cues becoming convincing enough to feel real, comparing it to a method actor imagining drinking a cold glass of water. "A sound and a sight does a lot of heavy lifting towards what your brain understands."
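In code, that kind of sound-as-haptics substitute can be as simple as playing a spatialized audio cue from the grabbed object itself. A sketch using RealityKit's audio APIs, where the file name is a placeholder asset, not a real one:

```swift
import RealityKit

// A sketch of "virtual haptics" via sound, in the spirit Eiche describes:
// when a grab lands, play a short spatialized click from the object, so
// the cue is anchored where the "touch" happened.
// "grab_click.wav" is a placeholder asset name.
func playGrabCue(on entity: Entity) {
    do {
        let resource = try AudioFileResource.load(named: "grab_click.wav")
        // playAudio returns a playback controller we don't need here.
        _ = entity.playAudio(resource)
    } catch {
        print("Could not load audio cue: \(error)")
    }
}
```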
Extra controllers or input devices such as Meta’s Touch controllers, a super-powered Apple Pencil, the Apple Watch, or even a ring (like Sony’s mixed reality headset uses) could provide more advanced feedback or input.
Eiche sees those kinds of specialized controllers coming next. “I don’t think that this is the death of peripherals. I think this is the rejuvenation. If you make a haptic glove, you should be so excited about this.”
At WWDC 2024 in June, Apple may prioritize advanced 3D interactions for visionOS 2.0 over controllers, but that doesn't mean controllers won't come later.
Eiche suggests exploring the mixed reality world on Vision Pro with hands alone first, rather than leaning on something like a Pencil. That way, the stylus's use case can be refined and perfected for the moments when it's truly needed.