The JPEG file format played a vital role in the web’s transition from a world of text to a visual experience through an open, efficient container for image sharing. Now the GL Transmission Format (glTF) promises to do the same for 3D objects in the metaverse and digital twins.
JPEG used several compression tricks to drastically shrink images compared to other formats like GIF. The latest version of glTF similarly uses techniques for compressing both the geometry of 3D objects and their textures. glTF is already playing a critical role in e-commerce, as shown by Adobe’s push into the metaverse.
VentureBeat spoke with Neil Trevett, president of the Khronos Group, which manages the glTF standard, to learn more about what glTF means for enterprises. He is also VP of developer ecosystems at Nvidia, where his job is to make it easier for developers to use GPUs. He explains how glTF complements other digital twin and metaverse formats like Universal Scene Description (USD), how to use it and where it is headed.
VentureBeat: What is glTF, and how does it fit into the ecosystem of metaverse and digital twin file formats?
Neil Trevett: At Khronos, we have put a lot of effort into 3D APIs such as OpenGL, WebGL and Vulkan. We found that every application that uses 3D has to import assets at some point or another. The glTF file format is widely used and very complementary to USD, which is becoming the standard for authoring on platforms such as Omniverse. USD is the place to be if you want to bring multiple tools together in advanced pipelines and create very high-quality content, including movies. That’s why Nvidia is investing heavily in USD for the Omniverse ecosystem.
On the other hand, glTF focuses on being efficient and user-friendly as a delivery format. It’s a lightweight, streamlined, and easy-to-handle format that can be used by any platform or device, up to and including web browsers on mobile phones. The tagline we use as an analogy is that “glTF is the JPEG of 3D.”
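To give a sense of how lightweight the delivery format is: a minimal valid glTF 2.0 asset is just a small JSON document. The sketch below, using only Python’s standard library, builds one with a single empty scene (real assets would add meshes, materials and binary buffers).

```python
import json

# A minimal glTF 2.0 asset: plain JSON with a required "asset" version
# field and a default scene. Meshes, materials and buffers are added on
# top of this skeleton in real assets.
gltf = {
    "asset": {"version": "2.0"},
    "scene": 0,
    "scenes": [{"nodes": []}],
}

print(json.dumps(gltf, indent=2))
```

Because the container is ordinary JSON, any platform with a JSON parser, down to a web browser on a phone, can at least read the structure of an asset.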
It also complements the file formats used in authoring tools. For example, Adobe Photoshop uses PSD files for image editing. No professional photographer would edit JPEGs because a lot of information has been lost. PSD files are more advanced than JPEGs and support multiple layers. You wouldn’t send a PSD file to my mom’s cell phone though. You need JPEG to send it to a billion devices as efficiently and quickly as possible. USD and glTF thus complement each other in the same way.
VentureBeat: How do you get from one to the next?
Trevett: It is essential to have a seamless distillation process from USD assets to glTF assets. Nvidia is investing in a glTF connector for Omniverse so that we can import and export glTF assets in and out of Omniverse seamlessly. At the glTF working group in Khronos, we are pleased that USD is meeting the industry’s needs for an authoring format, because that is a huge amount of work. The goal is for glTF to become the perfect USD distillation target to support ubiquitous implementation.
An authoring format and a delivery format have very different design requirements. USD design is all about flexibility. This helps compose things to make a movie or a VR environment. If you want to add another item and mix it with the existing scene, you have to keep all the design information. And you want everything at the ground truth level of resolution and quality.
The design of a transmission format is different. For example, with glTF, the vertex information is not very flexible for reauthoring. But it’s sent in exactly the form the GPU needs to run that geometry most efficiently through a 3D API like WebGL or Vulkan. So glTF puts a lot of design effort into compression to reduce download times. For example, Google has its Draco 3D mesh compression technology, and Binomial has its Basis Universal texture compression technology. We are also starting to put a lot of effort into level of detail (LOD) management so that you can download models very efficiently.
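In the file itself, these compression schemes show up as named Khronos extensions (`KHR_draco_mesh_compression` for Draco geometry and `KHR_texture_basisu` for Basis Universal textures). A loader can inspect the `extensionsRequired` array before committing to decode an asset, as this small sketch shows:

```python
import json

# Compression schemes appear in glTF as named extensions. A viewer can
# inspect "extensionsRequired" to know which decoders it must have
# before it can load the asset at all.
gltf_json = """{
  "asset": {"version": "2.0"},
  "extensionsUsed": ["KHR_draco_mesh_compression", "KHR_texture_basisu"],
  "extensionsRequired": ["KHR_draco_mesh_compression", "KHR_texture_basisu"]
}"""

gltf = json.loads(gltf_json)
required = set(gltf.get("extensionsRequired", []))

print("Needs Draco decoder: ", "KHR_draco_mesh_compression" in required)
print("Needs Basis transcoder:", "KHR_texture_basisu" in required)
```

This extension mechanism is how glTF adds capabilities like compression without breaking loaders that only understand the core format.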
Distillation helps to move from one file format to another. A big part of it is removing the design and authoring information you no longer need. But you don’t want to diminish the visual quality unless you really have to. glTF allows you to maintain visual fidelity, but also gives you the choice of compressing things when aiming for low-bandwidth deployment.
VentureBeat: How much smaller can you make it without losing too much fidelity?
Trevett: It’s like JPEG, where you have a dial for increasing the compression with an acceptable loss of image quality, only glTF has the same for both geometry and texture compression. If it’s a geometry-intensive CAD model, the geometry will be the bulk of the data. But if it’s more of a consumer-oriented model, the texture data can be much larger than the geometry.
With Draco, shrinking data 5 to 10 times is reasonable without significant loss of quality. There is something similar for textures.
Another factor is the amount of memory required, which is a precious resource in mobile phones. Before we implemented Basis Universal compression in glTF, people sent JPEGs, which is great because they are relatively small. But the process of decoding them into full-size textures can take hundreds of megabytes for even a simple model, which can hurt a cell phone’s power and performance. glTF’s universal textures let you take a supercompressed texture, around the size of a JPEG, and transcode it directly to a GPU-native compressed texture, so it never expands to full size. As a result, you reduce both data transfer and memory requirements by 5 to 10 times. This can help if you download resources in a browser on a mobile phone.
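A back-of-envelope calculation shows why this matters. The numbers below are illustrative assumptions, not benchmarks: a decoded JPEG must sit in memory as raw RGBA, while a GPU-native compressed format (such as one transcoded from Basis Universal) stays compressed in video memory at roughly a byte per pixel.

```python
# Back-of-envelope texture memory math (illustrative numbers only).
width, height = 2048, 2048

# A JPEG decodes to raw RGBA: 4 bytes per pixel.
rgba_bytes = width * height * 4
# A full mipmap chain adds roughly one third on top.
rgba_with_mips = rgba_bytes * 4 // 3

# Assume ~1 byte/pixel for a GPU-native compressed format.
bpp_compressed = 1.0
compressed_bytes = int(width * height * bpp_compressed * 4 / 3)

print(f"Decoded RGBA + mips: {rgba_with_mips / 2**20:.1f} MiB")
print(f"GPU-compressed:      {compressed_bytes / 2**20:.1f} MiB")
```

Even for a single 2K texture the gap is several times over, and a model typically carries many textures, which is where the “hundreds of megabytes” for a simple model comes from.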
VentureBeat: How do you efficiently represent the textures of 3D objects?
Trevett: Well, there are two basic classes of texture. The most common is the image texture, such as mapping a logo image onto a t-shirt. The other is the procedural texture, where you generate a pattern, such as marble, wood or stone, just by running an algorithm.
There are several algorithms you can use. For example, Allegorithmic, recently acquired by Adobe, pioneered interesting techniques for generating procedural textures that are now used in Adobe Substance Designer. You often bake a procedural texture into an image because images are easier to process on client devices.
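A toy example makes the idea concrete. The checkerboard below is a procedural texture in miniature: the pattern comes entirely from a formula rather than stored image data. (Real tools like Substance use far richer node graphs, but the principle is the same, and the result is typically baked to an image for delivery.)

```python
# A toy procedural texture: a checkerboard generated by a formula
# instead of stored pixel data.
def checker(u, v, tiles=8):
    """Return 0 or 1 for texture coordinates u, v in [0, 1)."""
    return (int(u * tiles) + int(v * tiles)) % 2

# "Bake" the procedure into a small 8x8 grid of samples.
size = 8
pattern = [[checker(x / size, y / size) for x in range(size)]
           for y in range(size)]

for row in pattern:
    print("".join("#" if cell else "." for cell in row))
```

The baking step at the end is exactly the image conversion mentioned above: once sampled into a grid, the pattern can be compressed and shipped like any other texture.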
Once you have a texture, you can do more with it than just slap it on the model as a piece of wrapping paper. You can use those texture images to get a more refined material look. With physically based rendering (PBR) materials, you try to mimic the characteristics of real-world materials. Is it metallic, which makes it look shiny? Is it translucent? Does it refract light? Some of the more advanced PBR algorithms can use five or six different texture maps that supply parameters characterizing how glossy or translucent the material is.
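In glTF 2.0, the core way to express this is the metallic-roughness PBR model: each material carries factors and texture references that describe how it responds to light. The material below is a hypothetical example (the texture indices are placeholders that would point into the asset’s `textures` array):

```python
import json

# A glTF 2.0 material using the core metallic-roughness PBR model.
# Texture indices are placeholders pointing into the asset's
# "textures" array.
material = {
    "name": "brushed_metal",
    "pbrMetallicRoughness": {
        "baseColorTexture": {"index": 0},  # color map
        "metallicFactor": 1.0,             # fully metallic, so shiny
        "roughnessFactor": 0.3,            # fairly glossy surface
    },
    "normalTexture": {"index": 1},         # fine surface detail
}

print(json.dumps(material, indent=2))
```

Additional maps (normal, occlusion, emissive) and extensions layer on top of this core model, which is how a handful of small textures can describe a convincing real-world material.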
VentureBeat: How has glTF progressed on the scene graph side to show the relationships within objects, such as how car wheels can turn or connect multiple things?
Trevett: This is one area where USD is well ahead of glTF. Most glTF use cases so far have been served by a single asset in a single file. 3D commerce is a leading use case, where you want to pull up a chair and drop it into your living room, as in IKEA’s app. That’s a single glTF asset, and many use cases are happy with that. As we move into the metaverse, VR and AR, people want to create scenes composed of multiple assets. An active area of discussion in the working group is how best to compose and link multiple glTF scenes and assets. It won’t be as sophisticated as USD, since the focus is on delivery rather than authoring, but glTF will have something in the next 12 to 18 months to allow for the composition and linking of multiple assets.
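For what glTF can already do, the format has a simple scene graph of nodes with parent-child links and per-node transforms. The car-wheel example can be sketched as a hierarchy like this (a hypothetical illustration; node names are made up):

```python
import json

# A simple glTF node hierarchy: a car body whose children are wheel
# nodes. Each wheel carries its own rotation (a unit quaternion), so a
# viewer can spin the wheels without touching the body.
gltf = {
    "asset": {"version": "2.0"},
    "scene": 0,
    "scenes": [{"nodes": [0]}],
    "nodes": [
        {"name": "car_body", "children": [1, 2]},
        {"name": "wheel_front", "rotation": [0, 0, 0, 1]},
        {"name": "wheel_rear", "rotation": [0, 0, 0, 1]},
    ],
}

body = gltf["nodes"][0]
wheels = [gltf["nodes"][i]["name"] for i in body["children"]]
print("Wheels under", body["name"], ":", wheels)
```

What the working group is discussing goes beyond this: composing and linking across multiple glTF files, rather than hierarchies inside a single one.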
VentureBeat: How will glTF evolve to support more metaverse and digital twins use cases?
Trevett: We need to bring in things beyond just the physical appearance. We now have geometry, textures and animations in glTF 2.0. The current glTF says nothing about physical properties, sounds or interactions. I think many of the next-generation extensions for glTF will add this kind of behavior and these features.
The industry is converging on USD and glTF for the future. While there are older formats like OBJ, they are starting to show their age, and popular formats like FBX are proprietary. USD is an open-source project and glTF is an open standard. Anyone can participate in both ecosystems and help them evolve to meet customer and market needs. I think both formats will evolve side by side. The goal now is to keep them aligned and keep this efficient distillation process between the two.