• Into the Omniverse: World Foundation Models Advance Autonomous Vehicle Simulation and Safety

    Editor’s note: This blog is a part of Into the Omniverse, a series focused on how developers, 3D practitioners and enterprises can transform their workflows using the latest advances in OpenUSD and NVIDIA Omniverse.
    Simulated driving environments enable engineers to safely and efficiently train, test and validate autonomous vehiclesacross countless real-world and edge-case scenarios without the risks and costs of physical testing.
    These simulated environments can be created through neural reconstruction of real-world data from AV fleets or generated with world foundation models— neural networks that understand physics and real-world properties. WFMs can be used to generate synthetic datasets for enhanced AV simulation.
    To help physical AI developers build such simulated environments, NVIDIA unveiled major advances in WFMs at the GTC Paris and CVPR conferences earlier this month. These new capabilities enhance NVIDIA Cosmos — a platform of generative WFMs, advanced tokenizers, guardrails and accelerated data processing tools.
    Key innovations like Cosmos Predict-2, the Cosmos Transfer-1 NVIDIA preview NIM microservice and Cosmos Reason are improving how AV developers generate synthetic data, build realistic simulated environments and validate safety systems at unprecedented scale.
    Universal Scene Description, a unified data framework and standard for physical AI applications, enables seamless integration and interoperability of simulation assets across the development pipeline. OpenUSD standardization plays a critical role in ensuring 3D pipelines are built to scale.
    NVIDIA Omniverse, a platform of application programming interfaces, software development kits and services for building OpenUSD-based physical AI applications, enables simulations from WFMs and neural reconstruction at world scale.
    Leading AV organizations — including Foretellix, Mcity, Oxa, Parallel Domain, Plus AI and Uber — are among the first to adopt Cosmos models.

    Foundations for Scalable, Realistic Simulation
    Cosmos Predict-2, NVIDIA’s latest WFM, generates high-quality synthetic data by predicting future world states from multimodal inputs like text, images and video. This capability is critical for creating temporally consistent, realistic scenarios that accelerate training and validation of AVs and robots.

    In addition, Cosmos Transfer, a control model that adds variations in weather, lighting and terrain to existing scenarios, will soon be available to 150,000 developers on CARLA, a leading open-source AV simulator. This greatly expands the broad AV developer community’s access to advanced AI-powered simulation tools.
    Developers can start integrating synthetic data into their own pipelines using the NVIDIA Physical AI Dataset. The latest release includes 40,000 clips generated using Cosmos.
    Building on these foundations, the Omniverse Blueprint for AV simulation provides a standardized, API-driven workflow for constructing rich digital twins, replaying real-world sensor data and generating new ground-truth data for closed-loop testing.
    The blueprint taps into OpenUSD’s layer-stacking and composition arcs, which enable developers to collaborate asynchronously and modify scenes nondestructively. This helps create modular, reusable scenario variants to efficiently generate different weather conditions, traffic patterns and edge cases.
    Driving the Future of AV Safety
    To bolster the operational safety of AV systems, NVIDIA earlier this year introduced NVIDIA Halos — a comprehensive safety platform that integrates the company’s full automotive hardware and software stack with AI research focused on AV safety.
    The new Cosmos models — Cosmos Predict- 2, Cosmos Transfer- 1 NIM and Cosmos Reason — deliver further safety enhancements to the Halos platform, enabling developers to create diverse, controllable and realistic scenarios for training and validating AV systems.
    These models, trained on massive multimodal datasets including driving data, amplify the breadth and depth of simulation, allowing for robust scenario coverage — including rare and safety-critical events — while supporting post-training customization for specialized AV tasks.

    At CVPR, NVIDIA was recognized as an Autonomous Grand Challenge winner, highlighting its leadership in advancing end-to-end AV workflows. The challenge used OpenUSD’s robust metadata and interoperability to simulate sensor inputs and vehicle trajectories in semi-reactive environments, achieving state-of-the-art results in safety and compliance.
    Learn more about how developers are leveraging tools like CARLA, Cosmos, and Omniverse to advance AV simulation in this livestream replay:

    Hear NVIDIA Director of Autonomous Vehicle Research Marco Pavone on the NVIDIA AI Podcast share how digital twins and high-fidelity simulation are improving vehicle testing, accelerating development and reducing real-world risks.
    Get Plugged Into the World of OpenUSD
    Learn more about what’s next for AV simulation with OpenUSD by watching the replay of NVIDIA founder and CEO Jensen Huang’s GTC Paris keynote.
    Looking for more live opportunities to learn more about OpenUSD? Don’t miss sessions and labs happening at SIGGRAPH 2025, August 10–14.
    Discover why developers and 3D practitioners are using OpenUSD and learn how to optimize 3D workflows with the self-paced “Learn OpenUSD” curriculum for 3D developers and practitioners, available for free through the NVIDIA Deep Learning Institute.
    Explore the Alliance for OpenUSD forum and the AOUSD website.
    Stay up to date by subscribing to NVIDIA Omniverse news, joining the community and following NVIDIA Omniverse on Instagram, LinkedIn, Medium and X.
    #into #omniverse #world #foundation #models
    Into the Omniverse: World Foundation Models Advance Autonomous Vehicle Simulation and Safety
    Editor’s note: This blog is a part of Into the Omniverse, a series focused on how developers, 3D practitioners and enterprises can transform their workflows using the latest advances in OpenUSD and NVIDIA Omniverse. Simulated driving environments enable engineers to safely and efficiently train, test and validate autonomous vehiclesacross countless real-world and edge-case scenarios without the risks and costs of physical testing. These simulated environments can be created through neural reconstruction of real-world data from AV fleets or generated with world foundation models— neural networks that understand physics and real-world properties. WFMs can be used to generate synthetic datasets for enhanced AV simulation. To help physical AI developers build such simulated environments, NVIDIA unveiled major advances in WFMs at the GTC Paris and CVPR conferences earlier this month. These new capabilities enhance NVIDIA Cosmos — a platform of generative WFMs, advanced tokenizers, guardrails and accelerated data processing tools. Key innovations like Cosmos Predict-2, the Cosmos Transfer-1 NVIDIA preview NIM microservice and Cosmos Reason are improving how AV developers generate synthetic data, build realistic simulated environments and validate safety systems at unprecedented scale. Universal Scene Description, a unified data framework and standard for physical AI applications, enables seamless integration and interoperability of simulation assets across the development pipeline. OpenUSD standardization plays a critical role in ensuring 3D pipelines are built to scale. NVIDIA Omniverse, a platform of application programming interfaces, software development kits and services for building OpenUSD-based physical AI applications, enables simulations from WFMs and neural reconstruction at world scale. Leading AV organizations — including Foretellix, Mcity, Oxa, Parallel Domain, Plus AI and Uber — are among the first to adopt Cosmos models. Foundations for Scalable, Realistic Simulation Cosmos Predict-2, NVIDIA’s latest WFM, generates high-quality synthetic data by predicting future world states from multimodal inputs like text, images and video. This capability is critical for creating temporally consistent, realistic scenarios that accelerate training and validation of AVs and robots. In addition, Cosmos Transfer, a control model that adds variations in weather, lighting and terrain to existing scenarios, will soon be available to 150,000 developers on CARLA, a leading open-source AV simulator. This greatly expands the broad AV developer community’s access to advanced AI-powered simulation tools. Developers can start integrating synthetic data into their own pipelines using the NVIDIA Physical AI Dataset. The latest release includes 40,000 clips generated using Cosmos. Building on these foundations, the Omniverse Blueprint for AV simulation provides a standardized, API-driven workflow for constructing rich digital twins, replaying real-world sensor data and generating new ground-truth data for closed-loop testing. The blueprint taps into OpenUSD’s layer-stacking and composition arcs, which enable developers to collaborate asynchronously and modify scenes nondestructively. This helps create modular, reusable scenario variants to efficiently generate different weather conditions, traffic patterns and edge cases. Driving the Future of AV Safety To bolster the operational safety of AV systems, NVIDIA earlier this year introduced NVIDIA Halos — a comprehensive safety platform that integrates the company’s full automotive hardware and software stack with AI research focused on AV safety. The new Cosmos models — Cosmos Predict- 2, Cosmos Transfer- 1 NIM and Cosmos Reason — deliver further safety enhancements to the Halos platform, enabling developers to create diverse, controllable and realistic scenarios for training and validating AV systems. These models, trained on massive multimodal datasets including driving data, amplify the breadth and depth of simulation, allowing for robust scenario coverage — including rare and safety-critical events — while supporting post-training customization for specialized AV tasks. At CVPR, NVIDIA was recognized as an Autonomous Grand Challenge winner, highlighting its leadership in advancing end-to-end AV workflows. The challenge used OpenUSD’s robust metadata and interoperability to simulate sensor inputs and vehicle trajectories in semi-reactive environments, achieving state-of-the-art results in safety and compliance. Learn more about how developers are leveraging tools like CARLA, Cosmos, and Omniverse to advance AV simulation in this livestream replay: Hear NVIDIA Director of Autonomous Vehicle Research Marco Pavone on the NVIDIA AI Podcast share how digital twins and high-fidelity simulation are improving vehicle testing, accelerating development and reducing real-world risks. Get Plugged Into the World of OpenUSD Learn more about what’s next for AV simulation with OpenUSD by watching the replay of NVIDIA founder and CEO Jensen Huang’s GTC Paris keynote. Looking for more live opportunities to learn more about OpenUSD? Don’t miss sessions and labs happening at SIGGRAPH 2025, August 10–14. Discover why developers and 3D practitioners are using OpenUSD and learn how to optimize 3D workflows with the self-paced “Learn OpenUSD” curriculum for 3D developers and practitioners, available for free through the NVIDIA Deep Learning Institute. Explore the Alliance for OpenUSD forum and the AOUSD website. Stay up to date by subscribing to NVIDIA Omniverse news, joining the community and following NVIDIA Omniverse on Instagram, LinkedIn, Medium and X. #into #omniverse #world #foundation #models
    BLOGS.NVIDIA.COM
    Into the Omniverse: World Foundation Models Advance Autonomous Vehicle Simulation and Safety
    Editor’s note: This blog is a part of Into the Omniverse, a series focused on how developers, 3D practitioners and enterprises can transform their workflows using the latest advances in OpenUSD and NVIDIA Omniverse. Simulated driving environments enable engineers to safely and efficiently train, test and validate autonomous vehicles (AVs) across countless real-world and edge-case scenarios without the risks and costs of physical testing. These simulated environments can be created through neural reconstruction of real-world data from AV fleets or generated with world foundation models (WFMs) — neural networks that understand physics and real-world properties. WFMs can be used to generate synthetic datasets for enhanced AV simulation. To help physical AI developers build such simulated environments, NVIDIA unveiled major advances in WFMs at the GTC Paris and CVPR conferences earlier this month. These new capabilities enhance NVIDIA Cosmos — a platform of generative WFMs, advanced tokenizers, guardrails and accelerated data processing tools. Key innovations like Cosmos Predict-2, the Cosmos Transfer-1 NVIDIA preview NIM microservice and Cosmos Reason are improving how AV developers generate synthetic data, build realistic simulated environments and validate safety systems at unprecedented scale. Universal Scene Description (OpenUSD), a unified data framework and standard for physical AI applications, enables seamless integration and interoperability of simulation assets across the development pipeline. OpenUSD standardization plays a critical role in ensuring 3D pipelines are built to scale. NVIDIA Omniverse, a platform of application programming interfaces, software development kits and services for building OpenUSD-based physical AI applications, enables simulations from WFMs and neural reconstruction at world scale. Leading AV organizations — including Foretellix, Mcity, Oxa, Parallel Domain, Plus AI and Uber — are among the first to adopt Cosmos models. Foundations for Scalable, Realistic Simulation Cosmos Predict-2, NVIDIA’s latest WFM, generates high-quality synthetic data by predicting future world states from multimodal inputs like text, images and video. This capability is critical for creating temporally consistent, realistic scenarios that accelerate training and validation of AVs and robots. In addition, Cosmos Transfer, a control model that adds variations in weather, lighting and terrain to existing scenarios, will soon be available to 150,000 developers on CARLA, a leading open-source AV simulator. This greatly expands the broad AV developer community’s access to advanced AI-powered simulation tools. Developers can start integrating synthetic data into their own pipelines using the NVIDIA Physical AI Dataset. The latest release includes 40,000 clips generated using Cosmos. Building on these foundations, the Omniverse Blueprint for AV simulation provides a standardized, API-driven workflow for constructing rich digital twins, replaying real-world sensor data and generating new ground-truth data for closed-loop testing. The blueprint taps into OpenUSD’s layer-stacking and composition arcs, which enable developers to collaborate asynchronously and modify scenes nondestructively. This helps create modular, reusable scenario variants to efficiently generate different weather conditions, traffic patterns and edge cases. Driving the Future of AV Safety To bolster the operational safety of AV systems, NVIDIA earlier this year introduced NVIDIA Halos — a comprehensive safety platform that integrates the company’s full automotive hardware and software stack with AI research focused on AV safety. The new Cosmos models — Cosmos Predict- 2, Cosmos Transfer- 1 NIM and Cosmos Reason — deliver further safety enhancements to the Halos platform, enabling developers to create diverse, controllable and realistic scenarios for training and validating AV systems. These models, trained on massive multimodal datasets including driving data, amplify the breadth and depth of simulation, allowing for robust scenario coverage — including rare and safety-critical events — while supporting post-training customization for specialized AV tasks. At CVPR, NVIDIA was recognized as an Autonomous Grand Challenge winner, highlighting its leadership in advancing end-to-end AV workflows. The challenge used OpenUSD’s robust metadata and interoperability to simulate sensor inputs and vehicle trajectories in semi-reactive environments, achieving state-of-the-art results in safety and compliance. Learn more about how developers are leveraging tools like CARLA, Cosmos, and Omniverse to advance AV simulation in this livestream replay: Hear NVIDIA Director of Autonomous Vehicle Research Marco Pavone on the NVIDIA AI Podcast share how digital twins and high-fidelity simulation are improving vehicle testing, accelerating development and reducing real-world risks. Get Plugged Into the World of OpenUSD Learn more about what’s next for AV simulation with OpenUSD by watching the replay of NVIDIA founder and CEO Jensen Huang’s GTC Paris keynote. Looking for more live opportunities to learn more about OpenUSD? Don’t miss sessions and labs happening at SIGGRAPH 2025, August 10–14. Discover why developers and 3D practitioners are using OpenUSD and learn how to optimize 3D workflows with the self-paced “Learn OpenUSD” curriculum for 3D developers and practitioners, available for free through the NVIDIA Deep Learning Institute. Explore the Alliance for OpenUSD forum and the AOUSD website. Stay up to date by subscribing to NVIDIA Omniverse news, joining the community and following NVIDIA Omniverse on Instagram, LinkedIn, Medium and X.
    0 Comentários 0 Compartilhamentos
  • Startup Uses NVIDIA RTX-Powered Generative AI to Make Coolers, Cooler

    Mark Theriault founded the startup FITY envisioning a line of clever cooling products: cold drink holders that come with freezable pucks to keep beverages cold for longer without the mess of ice. The entrepreneur started with 3D prints of products in his basement, building one unit at a time, before eventually scaling to mass production.
    Founding a consumer product company from scratch was a tall order for a single person. Going from preliminary sketches to production-ready designs was a major challenge. To bring his creative vision to life, Theriault relied on AI and his NVIDIA GeForce RTX-equipped system. For him, AI isn’t just a tool — it’s an entire pipeline to help him accomplish his goals. about his workflow below.
    Plus, GeForce RTX 5050 laptops start arriving today at retailers worldwide, from GeForce RTX 5050 Laptop GPUs feature 2,560 NVIDIA Blackwell CUDA cores, fifth-generation AI Tensor Cores, fourth-generation RT Cores, a ninth-generation NVENC encoder and a sixth-generation NVDEC decoder.
    In addition, NVIDIA’s Plug and Play: Project G-Assist Plug-In Hackathon — running virtually through Wednesday, July 16 — invites developers to explore AI and build custom G-Assist plug-ins for a chance to win prizes. the date for the G-Assist Plug-In webinar on Wednesday, July 9, from 10-11 a.m. PT, to learn more about Project G-Assist capabilities and fundamentals, and to participate in a live Q&A session.
    From Concept to Completion
    To create his standout products, Theriault tinkers with potential FITY Flex cooler designs with traditional methods, from sketch to computer-aided design to rapid prototyping, until he finds the right vision. A unique aspect of the FITY Flex design is that it can be customized with fun, popular shoe charms.
    For packaging design inspiration, Theriault uses his preferred text-to-image generative AI model for prototyping, Stable Diffusion XL — which runs 60% faster with the NVIDIA TensorRT software development kit — using the modular, node-based interface ComfyUI.
    ComfyUI gives users granular control over every step of the generation process — prompting, sampling, model loading, image conditioning and post-processing. It’s ideal for advanced users like Theriault who want to customize how images are generated.
    Theriault’s uses of AI result in a complete computer graphics-based ad campaign. Image courtesy of FITY.
    NVIDIA and GeForce RTX GPUs based on the NVIDIA Blackwell architecture include fifth-generation Tensor Cores designed to accelerate AI and deep learning workloads. These GPUs work with CUDA optimizations in PyTorch to seamlessly accelerate ComfyUI, reducing generation time on FLUX.1-dev, an image generation model from Black Forest Labs, from two minutes per image on the Mac M3 Ultra to about four seconds on the GeForce RTX 5090 desktop GPU.
    ComfyUI can also add ControlNets — AI models that help control image generation — that Theriault uses for tasks like guiding human poses, setting compositions via depth mapping and converting scribbles to images.
    Theriault even creates his own fine-tuned models to keep his style consistent. He used low-rank adaptationmodels — small, efficient adapters into specific layers of the network — enabling hyper-customized generation with minimal compute cost.
    LoRA models allow Theriault to ideate on visuals quickly. Image courtesy of FITY.
    “Over the last few months, I’ve been shifting from AI-assisted computer graphics renders to fully AI-generated product imagery using a custom Flux LoRA I trained in house. My RTX 4080 SUPER GPU has been essential for getting the performance I need to train and iterate quickly.” – Mark Theriault, founder of FITY 

    Theriault also taps into generative AI to create marketing assets like FITY Flex product packaging. He uses FLUX.1, which excels at generating legible text within images, addressing a common challenge in text-to-image models.
    Though FLUX.1 models can typically consume over 23GB of VRAM, NVIDIA has collaborated with Black Forest Labs to help reduce the size of these models using quantization — a technique that reduces model size while maintaining quality. The models were then accelerated with TensorRT, which provides an up to 2x speedup over PyTorch.
    To simplify using these models in ComfyUI, NVIDIA created the FLUX.1 NIM microservice, a containerized version of FLUX.1 that can be loaded in ComfyUI and enables FP4 quantization and TensorRT support. Combined, the models come down to just over 11GB of VRAM, and performance improves by 2.5x.
    Theriault uses the Blender Cycles app to render out final files. For 3D workflows, NVIDIA offers the AI Blueprint for 3D-guided generative AI to ease the positioning and composition of 3D images, so anyone interested in this method can quickly get started.
    Photorealistic renders. Image courtesy of FITY.
    Finally, Theriault uses large language models to generate marketing copy — tailored for search engine optimization, tone and storytelling — as well as to complete his patent and provisional applications, work that usually costs thousands of dollars in legal fees and considerable time.
    Generative AI helps Theriault create promotional materials like the above. Image courtesy of FITY.
    “As a one-man band with a ton of content to generate, having on-the-fly generation capabilities for my product designs really helps speed things up.” – Mark Theriault, founder of FITY

    Every texture, every word, every photo, every accessory was a micro-decision, Theriault said. AI helped him survive the “death by a thousand cuts” that can stall solo startup founders, he added.
    Each week, the RTX AI Garage blog series features community-driven AI innovations and content for those looking to learn more about NVIDIA NIM microservices and AI Blueprints, as well as building AI agents, creative workflows, digital humans, productivity apps and more on AI PCs and workstations. 
    Plug in to NVIDIA AI PC on Facebook, Instagram, TikTok and X — and stay informed by subscribing to the RTX AI PC newsletter.
    Follow NVIDIA Workstation on LinkedIn and X. 
    See notice regarding software product information.
    #startup #uses #nvidia #rtxpowered #generative
    Startup Uses NVIDIA RTX-Powered Generative AI to Make Coolers, Cooler
    Mark Theriault founded the startup FITY envisioning a line of clever cooling products: cold drink holders that come with freezable pucks to keep beverages cold for longer without the mess of ice. The entrepreneur started with 3D prints of products in his basement, building one unit at a time, before eventually scaling to mass production. Founding a consumer product company from scratch was a tall order for a single person. Going from preliminary sketches to production-ready designs was a major challenge. To bring his creative vision to life, Theriault relied on AI and his NVIDIA GeForce RTX-equipped system. For him, AI isn’t just a tool — it’s an entire pipeline to help him accomplish his goals. about his workflow below. Plus, GeForce RTX 5050 laptops start arriving today at retailers worldwide, from GeForce RTX 5050 Laptop GPUs feature 2,560 NVIDIA Blackwell CUDA cores, fifth-generation AI Tensor Cores, fourth-generation RT Cores, a ninth-generation NVENC encoder and a sixth-generation NVDEC decoder. In addition, NVIDIA’s Plug and Play: Project G-Assist Plug-In Hackathon — running virtually through Wednesday, July 16 — invites developers to explore AI and build custom G-Assist plug-ins for a chance to win prizes. the date for the G-Assist Plug-In webinar on Wednesday, July 9, from 10-11 a.m. PT, to learn more about Project G-Assist capabilities and fundamentals, and to participate in a live Q&A session. From Concept to Completion To create his standout products, Theriault tinkers with potential FITY Flex cooler designs with traditional methods, from sketch to computer-aided design to rapid prototyping, until he finds the right vision. A unique aspect of the FITY Flex design is that it can be customized with fun, popular shoe charms. For packaging design inspiration, Theriault uses his preferred text-to-image generative AI model for prototyping, Stable Diffusion XL — which runs 60% faster with the NVIDIA TensorRT software development kit — using the modular, node-based interface ComfyUI. ComfyUI gives users granular control over every step of the generation process — prompting, sampling, model loading, image conditioning and post-processing. It’s ideal for advanced users like Theriault who want to customize how images are generated. Theriault’s uses of AI result in a complete computer graphics-based ad campaign. Image courtesy of FITY. NVIDIA and GeForce RTX GPUs based on the NVIDIA Blackwell architecture include fifth-generation Tensor Cores designed to accelerate AI and deep learning workloads. These GPUs work with CUDA optimizations in PyTorch to seamlessly accelerate ComfyUI, reducing generation time on FLUX.1-dev, an image generation model from Black Forest Labs, from two minutes per image on the Mac M3 Ultra to about four seconds on the GeForce RTX 5090 desktop GPU. ComfyUI can also add ControlNets — AI models that help control image generation — that Theriault uses for tasks like guiding human poses, setting compositions via depth mapping and converting scribbles to images. Theriault even creates his own fine-tuned models to keep his style consistent. He used low-rank adaptationmodels — small, efficient adapters into specific layers of the network — enabling hyper-customized generation with minimal compute cost. LoRA models allow Theriault to ideate on visuals quickly. Image courtesy of FITY. “Over the last few months, I’ve been shifting from AI-assisted computer graphics renders to fully AI-generated product imagery using a custom Flux LoRA I trained in house. My RTX 4080 SUPER GPU has been essential for getting the performance I need to train and iterate quickly.” – Mark Theriault, founder of FITY  Theriault also taps into generative AI to create marketing assets like FITY Flex product packaging. He uses FLUX.1, which excels at generating legible text within images, addressing a common challenge in text-to-image models. Though FLUX.1 models can typically consume over 23GB of VRAM, NVIDIA has collaborated with Black Forest Labs to help reduce the size of these models using quantization — a technique that reduces model size while maintaining quality. The models were then accelerated with TensorRT, which provides an up to 2x speedup over PyTorch. To simplify using these models in ComfyUI, NVIDIA created the FLUX.1 NIM microservice, a containerized version of FLUX.1 that can be loaded in ComfyUI and enables FP4 quantization and TensorRT support. Combined, the models come down to just over 11GB of VRAM, and performance improves by 2.5x. Theriault uses the Blender Cycles app to render out final files. For 3D workflows, NVIDIA offers the AI Blueprint for 3D-guided generative AI to ease the positioning and composition of 3D images, so anyone interested in this method can quickly get started. Photorealistic renders. Image courtesy of FITY. Finally, Theriault uses large language models to generate marketing copy — tailored for search engine optimization, tone and storytelling — as well as to complete his patent and provisional applications, work that usually costs thousands of dollars in legal fees and considerable time. Generative AI helps Theriault create promotional materials like the above. Image courtesy of FITY. “As a one-man band with a ton of content to generate, having on-the-fly generation capabilities for my product designs really helps speed things up.” – Mark Theriault, founder of FITY Every texture, every word, every photo, every accessory was a micro-decision, Theriault said. AI helped him survive the “death by a thousand cuts” that can stall solo startup founders, he added. Each week, the RTX AI Garage blog series features community-driven AI innovations and content for those looking to learn more about NVIDIA NIM microservices and AI Blueprints, as well as building AI agents, creative workflows, digital humans, productivity apps and more on AI PCs and workstations.  Plug in to NVIDIA AI PC on Facebook, Instagram, TikTok and X — and stay informed by subscribing to the RTX AI PC newsletter. Follow NVIDIA Workstation on LinkedIn and X.  See notice regarding software product information. #startup #uses #nvidia #rtxpowered #generative
    BLOGS.NVIDIA.COM
    Startup Uses NVIDIA RTX-Powered Generative AI to Make Coolers, Cooler
    Mark Theriault founded the startup FITY envisioning a line of clever cooling products: cold drink holders that come with freezable pucks to keep beverages cold for longer without the mess of ice. The entrepreneur started with 3D prints of products in his basement, building one unit at a time, before eventually scaling to mass production. Founding a consumer product company from scratch was a tall order for a single person. Going from preliminary sketches to production-ready designs was a major challenge. To bring his creative vision to life, Theriault relied on AI and his NVIDIA GeForce RTX-equipped system. For him, AI isn’t just a tool — it’s an entire pipeline to help him accomplish his goals. Read more about his workflow below. Plus, GeForce RTX 5050 laptops start arriving today at retailers worldwide, from $999. GeForce RTX 5050 Laptop GPUs feature 2,560 NVIDIA Blackwell CUDA cores, fifth-generation AI Tensor Cores, fourth-generation RT Cores, a ninth-generation NVENC encoder and a sixth-generation NVDEC decoder. In addition, NVIDIA’s Plug and Play: Project G-Assist Plug-In Hackathon — running virtually through Wednesday, July 16 — invites developers to explore AI and build custom G-Assist plug-ins for a chance to win prizes. Save the date for the G-Assist Plug-In webinar on Wednesday, July 9, from 10-11 a.m. PT, to learn more about Project G-Assist capabilities and fundamentals, and to participate in a live Q&A session. From Concept to Completion To create his standout products, Theriault tinkers with potential FITY Flex cooler designs with traditional methods, from sketch to computer-aided design to rapid prototyping, until he finds the right vision. A unique aspect of the FITY Flex design is that it can be customized with fun, popular shoe charms. For packaging design inspiration, Theriault uses his preferred text-to-image generative AI model for prototyping, Stable Diffusion XL — which runs 60% faster with the NVIDIA TensorRT software development kit — using the modular, node-based interface ComfyUI. ComfyUI gives users granular control over every step of the generation process — prompting, sampling, model loading, image conditioning and post-processing. It’s ideal for advanced users like Theriault who want to customize how images are generated. Theriault’s uses of AI result in a complete computer graphics-based ad campaign. Image courtesy of FITY. NVIDIA and GeForce RTX GPUs based on the NVIDIA Blackwell architecture include fifth-generation Tensor Cores designed to accelerate AI and deep learning workloads. These GPUs work with CUDA optimizations in PyTorch to seamlessly accelerate ComfyUI, reducing generation time on FLUX.1-dev, an image generation model from Black Forest Labs, from two minutes per image on the Mac M3 Ultra to about four seconds on the GeForce RTX 5090 desktop GPU. ComfyUI can also add ControlNets — AI models that help control image generation — that Theriault uses for tasks like guiding human poses, setting compositions via depth mapping and converting scribbles to images. Theriault even creates his own fine-tuned models to keep his style consistent. He used low-rank adaptation (LoRA) models — small, efficient adapters into specific layers of the network — enabling hyper-customized generation with minimal compute cost. LoRA models allow Theriault to ideate on visuals quickly. Image courtesy of FITY. “Over the last few months, I’ve been shifting from AI-assisted computer graphics renders to fully AI-generated product imagery using a custom Flux LoRA I trained in house. My RTX 4080 SUPER GPU has been essential for getting the performance I need to train and iterate quickly.” – Mark Theriault, founder of FITY  Theriault also taps into generative AI to create marketing assets like FITY Flex product packaging. He uses FLUX.1, which excels at generating legible text within images, addressing a common challenge in text-to-image models. Though FLUX.1 models can typically consume over 23GB of VRAM, NVIDIA has collaborated with Black Forest Labs to help reduce the size of these models using quantization — a technique that reduces model size while maintaining quality. The models were then accelerated with TensorRT, which provides an up to 2x speedup over PyTorch. To simplify using these models in ComfyUI, NVIDIA created the FLUX.1 NIM microservice, a containerized version of FLUX.1 that can be loaded in ComfyUI and enables FP4 quantization and TensorRT support. Combined, the models come down to just over 11GB of VRAM, and performance improves by 2.5x. Theriault uses the Blender Cycles app to render out final files. For 3D workflows, NVIDIA offers the AI Blueprint for 3D-guided generative AI to ease the positioning and composition of 3D images, so anyone interested in this method can quickly get started. Photorealistic renders. Image courtesy of FITY. Finally, Theriault uses large language models to generate marketing copy — tailored for search engine optimization, tone and storytelling — as well as to complete his patent and provisional applications, work that usually costs thousands of dollars in legal fees and considerable time. Generative AI helps Theriault create promotional materials like the above. Image courtesy of FITY. “As a one-man band with a ton of content to generate, having on-the-fly generation capabilities for my product designs really helps speed things up.” – Mark Theriault, founder of FITY Every texture, every word, every photo, every accessory was a micro-decision, Theriault said. AI helped him survive the “death by a thousand cuts” that can stall solo startup founders, he added. Each week, the RTX AI Garage blog series features community-driven AI innovations and content for those looking to learn more about NVIDIA NIM microservices and AI Blueprints, as well as building AI agents, creative workflows, digital humans, productivity apps and more on AI PCs and workstations.  Plug in to NVIDIA AI PC on Facebook, Instagram, TikTok and X — and stay informed by subscribing to the RTX AI PC newsletter. Follow NVIDIA Workstation on LinkedIn and X.  See notice regarding software product information.
    0 Comentários 0 Compartilhamentos
  • HOW DISGUISE BUILT OUT THE VIRTUAL ENVIRONMENTS FOR A MINECRAFT MOVIE

    By TREVOR HOGG

    Images courtesy of Warner Bros. Pictures.

    Rather than a world constructed around photorealistic pixels, a video game created by Markus Persson has taken the boxier 3D voxel route, which has become its signature aesthetic, and sparked an international phenomenon that finally gets adapted into a feature with the release of A Minecraft Movie. Brought onboard to help filmmaker Jared Hess in creating the environments that the cast of Jason Momoa, Jack Black, Sebastian Hansen, Emma Myers and Danielle Brooks find themselves inhabiting was Disguise under the direction of Production VFX Supervisor Dan Lemmon.

    “s the Senior Unreal Artist within the Virtual Art Departmenton Minecraft, I experienced the full creative workflow. What stood out most was how deeply the VAD was embedded across every stage of production. We weren’t working in isolation. From the production designer and director to the VFX supervisor and DP, the VAD became a hub for collaboration.”
    —Talia Finlayson, Creative Technologist, Disguise

    Interior and exterior environments had to be created, such as the shop owned by Steve.

    “Prior to working on A Minecraft Movie, I held more technical roles, like serving as the Virtual Production LED Volume Operator on a project for Apple TV+ and Paramount Pictures,” notes Talia Finlayson, Creative Technologist for Disguise. “But as the Senior Unreal Artist within the Virtual Art Departmenton Minecraft, I experienced the full creative workflow. What stood out most was how deeply the VAD was embedded across every stage of production. We weren’t working in isolation. From the production designer and director to the VFX supervisor and DP, the VAD became a hub for collaboration.” The project provided new opportunities. “I’ve always loved the physicality of working with an LED volume, both for the immersion it provides and the way that seeing the environment helps shape an actor’s performance,” notes Laura Bell, Creative Technologist for Disguise. “But for A Minecraft Movie, we used Simulcam instead, and it was an incredible experience to live-composite an entire Minecraft world in real-time, especially with nothing on set but blue curtains.”

    Set designs originally created by the art department in Rhinoceros 3D were transformed into fully navigable 3D environments within Unreal Engine. “These scenes were far more than visualizations,” Finlayson remarks. “They were interactive tools used throughout the production pipeline. We would ingest 3D models and concept art, clean and optimize geometry using tools like Blender, Cinema 4D or Maya, then build out the world in Unreal Engine. This included applying materials, lighting and extending environments. These Unreal scenes we created were vital tools across the production and were used for a variety of purposes such as enabling the director to explore shot compositions, block scenes and experiment with camera movement in a virtual space, as well as passing along Unreal Engine scenes to the visual effects vendors so they could align their digital environments and set extensions with the approved production layouts.”

    A virtual exploration of Steve’s shop in Midport Village.

    Certain elements have to be kept in mind when constructing virtual environments. “When building virtual environments, you need to consider what can actually be built, how actors and cameras will move through the space, and what’s safe and practical on set,” Bell observes. “Outside the areas where strict accuracy is required, you want the environments to blend naturally with the original designs from the art department and support the story, creating a space that feels right for the scene, guides the audience’s eye and sets the right tone. Things like composition, lighting and small environmental details can be really fun to work on, but also serve as beautiful additions to help enrich a story.”

    “I’ve always loved the physicality of working with an LED volume, both for the immersion it provides and the way that seeing the environment helps shape an actor’s performance. But for A Minecraft Movie, we used Simulcam instead, and it was an incredible experience to live-composite an entire Minecraft world in real-time, especially with nothing on set but blue curtains.”
    —Laura Bell, Creative Technologist, Disguise

    Among the buildings that had to be created for Midport Village was Steve’sLava Chicken Shack.

    Concept art was provided that served as visual touchstones. “We received concept art provided by the amazing team of concept artists,” Finlayson states. “Not only did they send us 2D artwork, but they often shared the 3D models they used to create those visuals. These models were incredibly helpful as starting points when building out the virtual environments in Unreal Engine; they gave us a clear sense of composition and design intent. Storyboards were also a key part of the process and were constantly being updated as the project evolved. Having access to the latest versions allowed us to tailor the virtual environments to match camera angles, story beats and staging. Sometimes we would also help the storyboard artists by sending through images of the Unreal Engine worlds to help them geographically position themselves in the worlds and aid in their storyboarding.” At times, the video game assets came in handy. “Exteriors often involved large-scale landscapes and stylized architectural elements, which had to feel true to the Minecraft world,” Finlayson explains. “In some cases, we brought in geometry from the game itself to help quickly block out areas. For example, we did this for the Elytra Flight Chase sequence, which takes place through a large canyon.”

    Flexibility was critical. “A key technical challenge we faced was ensuring that the Unreal levels were built in a way that allowed for fast and flexible iteration,” Finlayson remarks. “Since our environments were constantly being reviewed by the director, production designer, DP and VFX supervisor, we needed to be able to respond quickly to feedback, sometimes live during a review session. To support this, we had to keep our scenes modular and well-organized; that meant breaking environments down into manageable components and maintaining clean naming conventions. By setting up the levels this way, we could make layout changes, swap assets or adjust lighting on the fly without breaking the scene or slowing down the process.” Production schedules influence the workflows, pipelines and techniques. “No two projects will ever feel exactly the same,” Bell notes. “For example, Pat Younisadapted his typical VR setup to allow scene reviews using a PS5 controller, which made it much more comfortable and accessible for the director. On a more technical side, because everything was cubes and voxels, my Blender workflow ended up being way heavier on the re-mesh modifier than usual, definitely not something I’ll run into again anytime soon!”

    A virtual study and final still of the cast members standing outside of the Lava Chicken Shack.

    “We received concept art provided by the amazing team of concept artists. Not only did they send us 2D artwork, but they often shared the 3D models they used to create those visuals. These models were incredibly helpful as starting points when building out the virtual environments in Unreal Engine; they gave us a clear sense of composition and design intent. Storyboards were also a key part of the process and were constantly being updated as the project evolved. Having access to the latest versions allowed us to tailor the virtual environments to match camera angles, story beats and staging.”
    —Talia Finlayson, Creative Technologist, Disguise

    The design and composition of virtual environments tended to remain consistent throughout principal photography. “The only major design change I can recall was the removal of a second story from a building in Midport Village to allow the camera crane to get a clear shot of the chicken perched above Steve’s lava chicken shack,” Finlayson remarks. “I would agree that Midport Village likely went through the most iterations,” Bell responds. “The archway, in particular, became a visual anchor across different levels. We often placed it off in the distance to help orient both ourselves and the audience and show how far the characters had traveled. I remember rebuilding the stairs leading up to the rampart five or six times, using different configurations based on the physically constructed stairs. This was because there were storyboarded sequences of the film’s characters, Henry, Steve and Garrett, being chased by piglins, and the action needed to match what could be achieved practically on set.”

    Virtually conceptualizing the layout of Midport Village.

    Complex virtual environments were constructed for the final battle and the various forest scenes throughout the movie. “What made these particularly challenging was the way physical set pieces were repurposed and repositioned to serve multiple scenes and locations within the story,” Finlayson reveals. “The same built elements had to appear in different parts of the world, so we had to carefully adjust the virtual environments to accommodate those different positions.” Bell is in agreement with her colleague. “The forest scenes were some of the more complex environments to manage. It could get tricky, particularly when the filming schedule shifted. There was one day on set where the order of shots changed unexpectedly, and because the physical sets looked so similar, I initially loaded a different perspective than planned. Fortunately, thanks to our workflow, Lindsay Georgeand I were able to quickly open the recorded sequence in Unreal Engine and swap out the correct virtual environment for the live composite without any disruption to the shoot.”

    An example of the virtual and final version of the Woodland Mansion.

    “Midport Village likely went through the most iterations. The archway, in particular, became a visual anchor across different levels. We often placed it off in the distance to help orient both ourselves and the audience and show how far the characters had traveled.”
    —Laura Bell, Creative Technologist, Disguise

    Extensive detail was given to the center of the sets where the main action unfolds. “For these areas, we received prop layouts from the prop department to ensure accurate placement and alignment with the physical builds,” Finlayson explains. “These central environments were used heavily for storyboarding, blocking and department reviews, so precision was essential. As we moved further out from the practical set, the environments became more about blocking and spatial context rather than fine detail. We worked closely with Production Designer Grant Major to get approval on these extended environments, making sure they aligned with the overall visual direction. We also used creatures and crowd stand-ins provided by the visual effects team. These gave a great sense of scale and placement during early planning stages and allowed other departments to better understand how these elements would be integrated into the scenes.”

    Cast members Sebastian Hansen, Danielle Brooks and Emma Myers stand in front of the Earth Portal Plateau environment.

    Doing a virtual scale study of the Mountainside.

    Practical requirements like camera moves, stunt choreography and crane setups had an impact on the creation of virtual environments. “Sometimes we would adjust layouts slightly to open up areas for tracking shots or rework spaces to accommodate key action beats, all while keeping the environment feeling cohesive and true to the Minecraft world,” Bell states. “Simulcam bridged the physical and virtual worlds on set, overlaying Unreal Engine environments onto live-action scenes in real-time, giving the director, DP and other department heads a fully-realized preview of shots and enabling precise, informed decisions during production. It also recorded critical production data like camera movement paths, which was handed over to the post-production team to give them the exact tracks they needed, streamlining the visual effects pipeline.”

    Piglots cause mayhem during the Wingsuit Chase.

    Virtual versions of the exterior and interior of the Safe House located in the Enchanted Woods.

    “One of the biggest challenges for me was managing constant iteration while keeping our environments clean, organized and easy to update,” Finlayson notes. “Because the virtual sets were reviewed regularly by the director and other heads of departments, feedback was often implemented live in the room. This meant the environments had to be flexible. But overall, this was an amazing project to work on, and I am so grateful for the incredible VAD team I was a part of – Heide Nichols, Pat Younis, Jake Tuckand Laura. Everyone on this team worked so collaboratively, seamlessly and in such a supportive way that I never felt like I was out of my depth.” There was another challenge that is more to do with familiarity. “Having a VAD on a film is still a relatively new process in production,” Bell states. “There were moments where other departments were still learning what we did and how to best work with us. That said, the response was overwhelmingly positive. I remember being on set at the Simulcam station and seeing how excited people were to look at the virtual environments as they walked by, often stopping for a chat and a virtual tour. Instead of seeing just a huge blue curtain, they were stoked to see something Minecraft and could get a better sense of what they were actually shooting.”
    #how #disguise #built #out #virtual
    HOW DISGUISE BUILT OUT THE VIRTUAL ENVIRONMENTS FOR A MINECRAFT MOVIE
    By TREVOR HOGG Images courtesy of Warner Bros. Pictures. Rather than a world constructed around photorealistic pixels, a video game created by Markus Persson has taken the boxier 3D voxel route, which has become its signature aesthetic, and sparked an international phenomenon that finally gets adapted into a feature with the release of A Minecraft Movie. Brought onboard to help filmmaker Jared Hess in creating the environments that the cast of Jason Momoa, Jack Black, Sebastian Hansen, Emma Myers and Danielle Brooks find themselves inhabiting was Disguise under the direction of Production VFX Supervisor Dan Lemmon. “s the Senior Unreal Artist within the Virtual Art Departmenton Minecraft, I experienced the full creative workflow. What stood out most was how deeply the VAD was embedded across every stage of production. We weren’t working in isolation. From the production designer and director to the VFX supervisor and DP, the VAD became a hub for collaboration.” —Talia Finlayson, Creative Technologist, Disguise Interior and exterior environments had to be created, such as the shop owned by Steve. “Prior to working on A Minecraft Movie, I held more technical roles, like serving as the Virtual Production LED Volume Operator on a project for Apple TV+ and Paramount Pictures,” notes Talia Finlayson, Creative Technologist for Disguise. “But as the Senior Unreal Artist within the Virtual Art Departmenton Minecraft, I experienced the full creative workflow. What stood out most was how deeply the VAD was embedded across every stage of production. We weren’t working in isolation. From the production designer and director to the VFX supervisor and DP, the VAD became a hub for collaboration.” The project provided new opportunities. “I’ve always loved the physicality of working with an LED volume, both for the immersion it provides and the way that seeing the environment helps shape an actor’s performance,” notes Laura Bell, Creative Technologist for Disguise. “But for A Minecraft Movie, we used Simulcam instead, and it was an incredible experience to live-composite an entire Minecraft world in real-time, especially with nothing on set but blue curtains.” Set designs originally created by the art department in Rhinoceros 3D were transformed into fully navigable 3D environments within Unreal Engine. “These scenes were far more than visualizations,” Finlayson remarks. “They were interactive tools used throughout the production pipeline. We would ingest 3D models and concept art, clean and optimize geometry using tools like Blender, Cinema 4D or Maya, then build out the world in Unreal Engine. This included applying materials, lighting and extending environments. These Unreal scenes we created were vital tools across the production and were used for a variety of purposes such as enabling the director to explore shot compositions, block scenes and experiment with camera movement in a virtual space, as well as passing along Unreal Engine scenes to the visual effects vendors so they could align their digital environments and set extensions with the approved production layouts.” A virtual exploration of Steve’s shop in Midport Village. Certain elements have to be kept in mind when constructing virtual environments. “When building virtual environments, you need to consider what can actually be built, how actors and cameras will move through the space, and what’s safe and practical on set,” Bell observes. “Outside the areas where strict accuracy is required, you want the environments to blend naturally with the original designs from the art department and support the story, creating a space that feels right for the scene, guides the audience’s eye and sets the right tone. Things like composition, lighting and small environmental details can be really fun to work on, but also serve as beautiful additions to help enrich a story.” “I’ve always loved the physicality of working with an LED volume, both for the immersion it provides and the way that seeing the environment helps shape an actor’s performance. But for A Minecraft Movie, we used Simulcam instead, and it was an incredible experience to live-composite an entire Minecraft world in real-time, especially with nothing on set but blue curtains.” —Laura Bell, Creative Technologist, Disguise Among the buildings that had to be created for Midport Village was Steve’sLava Chicken Shack. Concept art was provided that served as visual touchstones. “We received concept art provided by the amazing team of concept artists,” Finlayson states. “Not only did they send us 2D artwork, but they often shared the 3D models they used to create those visuals. These models were incredibly helpful as starting points when building out the virtual environments in Unreal Engine; they gave us a clear sense of composition and design intent. Storyboards were also a key part of the process and were constantly being updated as the project evolved. Having access to the latest versions allowed us to tailor the virtual environments to match camera angles, story beats and staging. Sometimes we would also help the storyboard artists by sending through images of the Unreal Engine worlds to help them geographically position themselves in the worlds and aid in their storyboarding.” At times, the video game assets came in handy. “Exteriors often involved large-scale landscapes and stylized architectural elements, which had to feel true to the Minecraft world,” Finlayson explains. “In some cases, we brought in geometry from the game itself to help quickly block out areas. For example, we did this for the Elytra Flight Chase sequence, which takes place through a large canyon.” Flexibility was critical. “A key technical challenge we faced was ensuring that the Unreal levels were built in a way that allowed for fast and flexible iteration,” Finlayson remarks. “Since our environments were constantly being reviewed by the director, production designer, DP and VFX supervisor, we needed to be able to respond quickly to feedback, sometimes live during a review session. To support this, we had to keep our scenes modular and well-organized; that meant breaking environments down into manageable components and maintaining clean naming conventions. By setting up the levels this way, we could make layout changes, swap assets or adjust lighting on the fly without breaking the scene or slowing down the process.” Production schedules influence the workflows, pipelines and techniques. “No two projects will ever feel exactly the same,” Bell notes. “For example, Pat Younisadapted his typical VR setup to allow scene reviews using a PS5 controller, which made it much more comfortable and accessible for the director. On a more technical side, because everything was cubes and voxels, my Blender workflow ended up being way heavier on the re-mesh modifier than usual, definitely not something I’ll run into again anytime soon!” A virtual study and final still of the cast members standing outside of the Lava Chicken Shack. “We received concept art provided by the amazing team of concept artists. Not only did they send us 2D artwork, but they often shared the 3D models they used to create those visuals. These models were incredibly helpful as starting points when building out the virtual environments in Unreal Engine; they gave us a clear sense of composition and design intent. Storyboards were also a key part of the process and were constantly being updated as the project evolved. Having access to the latest versions allowed us to tailor the virtual environments to match camera angles, story beats and staging.” —Talia Finlayson, Creative Technologist, Disguise The design and composition of virtual environments tended to remain consistent throughout principal photography. “The only major design change I can recall was the removal of a second story from a building in Midport Village to allow the camera crane to get a clear shot of the chicken perched above Steve’s lava chicken shack,” Finlayson remarks. “I would agree that Midport Village likely went through the most iterations,” Bell responds. “The archway, in particular, became a visual anchor across different levels. We often placed it off in the distance to help orient both ourselves and the audience and show how far the characters had traveled. I remember rebuilding the stairs leading up to the rampart five or six times, using different configurations based on the physically constructed stairs. This was because there were storyboarded sequences of the film’s characters, Henry, Steve and Garrett, being chased by piglins, and the action needed to match what could be achieved practically on set.” Virtually conceptualizing the layout of Midport Village. Complex virtual environments were constructed for the final battle and the various forest scenes throughout the movie. “What made these particularly challenging was the way physical set pieces were repurposed and repositioned to serve multiple scenes and locations within the story,” Finlayson reveals. “The same built elements had to appear in different parts of the world, so we had to carefully adjust the virtual environments to accommodate those different positions.” Bell is in agreement with her colleague. “The forest scenes were some of the more complex environments to manage. It could get tricky, particularly when the filming schedule shifted. There was one day on set where the order of shots changed unexpectedly, and because the physical sets looked so similar, I initially loaded a different perspective than planned. Fortunately, thanks to our workflow, Lindsay Georgeand I were able to quickly open the recorded sequence in Unreal Engine and swap out the correct virtual environment for the live composite without any disruption to the shoot.” An example of the virtual and final version of the Woodland Mansion. “Midport Village likely went through the most iterations. The archway, in particular, became a visual anchor across different levels. We often placed it off in the distance to help orient both ourselves and the audience and show how far the characters had traveled.” —Laura Bell, Creative Technologist, Disguise Extensive detail was given to the center of the sets where the main action unfolds. “For these areas, we received prop layouts from the prop department to ensure accurate placement and alignment with the physical builds,” Finlayson explains. “These central environments were used heavily for storyboarding, blocking and department reviews, so precision was essential. As we moved further out from the practical set, the environments became more about blocking and spatial context rather than fine detail. We worked closely with Production Designer Grant Major to get approval on these extended environments, making sure they aligned with the overall visual direction. We also used creatures and crowd stand-ins provided by the visual effects team. These gave a great sense of scale and placement during early planning stages and allowed other departments to better understand how these elements would be integrated into the scenes.” Cast members Sebastian Hansen, Danielle Brooks and Emma Myers stand in front of the Earth Portal Plateau environment. Doing a virtual scale study of the Mountainside. Practical requirements like camera moves, stunt choreography and crane setups had an impact on the creation of virtual environments. “Sometimes we would adjust layouts slightly to open up areas for tracking shots or rework spaces to accommodate key action beats, all while keeping the environment feeling cohesive and true to the Minecraft world,” Bell states. “Simulcam bridged the physical and virtual worlds on set, overlaying Unreal Engine environments onto live-action scenes in real-time, giving the director, DP and other department heads a fully-realized preview of shots and enabling precise, informed decisions during production. It also recorded critical production data like camera movement paths, which was handed over to the post-production team to give them the exact tracks they needed, streamlining the visual effects pipeline.” Piglots cause mayhem during the Wingsuit Chase. Virtual versions of the exterior and interior of the Safe House located in the Enchanted Woods. “One of the biggest challenges for me was managing constant iteration while keeping our environments clean, organized and easy to update,” Finlayson notes. “Because the virtual sets were reviewed regularly by the director and other heads of departments, feedback was often implemented live in the room. This meant the environments had to be flexible. But overall, this was an amazing project to work on, and I am so grateful for the incredible VAD team I was a part of – Heide Nichols, Pat Younis, Jake Tuckand Laura. Everyone on this team worked so collaboratively, seamlessly and in such a supportive way that I never felt like I was out of my depth.” There was another challenge that is more to do with familiarity. “Having a VAD on a film is still a relatively new process in production,” Bell states. “There were moments where other departments were still learning what we did and how to best work with us. That said, the response was overwhelmingly positive. I remember being on set at the Simulcam station and seeing how excited people were to look at the virtual environments as they walked by, often stopping for a chat and a virtual tour. Instead of seeing just a huge blue curtain, they were stoked to see something Minecraft and could get a better sense of what they were actually shooting.” #how #disguise #built #out #virtual
    WWW.VFXVOICE.COM
    HOW DISGUISE BUILT OUT THE VIRTUAL ENVIRONMENTS FOR A MINECRAFT MOVIE
    By TREVOR HOGG Images courtesy of Warner Bros. Pictures. Rather than a world constructed around photorealistic pixels, a video game created by Markus Persson has taken the boxier 3D voxel route, which has become its signature aesthetic, and sparked an international phenomenon that finally gets adapted into a feature with the release of A Minecraft Movie. Brought onboard to help filmmaker Jared Hess in creating the environments that the cast of Jason Momoa, Jack Black, Sebastian Hansen, Emma Myers and Danielle Brooks find themselves inhabiting was Disguise under the direction of Production VFX Supervisor Dan Lemmon. “[A]s the Senior Unreal Artist within the Virtual Art Department (VAD) on Minecraft, I experienced the full creative workflow. What stood out most was how deeply the VAD was embedded across every stage of production. We weren’t working in isolation. From the production designer and director to the VFX supervisor and DP, the VAD became a hub for collaboration.” —Talia Finlayson, Creative Technologist, Disguise Interior and exterior environments had to be created, such as the shop owned by Steve (Jack Black). “Prior to working on A Minecraft Movie, I held more technical roles, like serving as the Virtual Production LED Volume Operator on a project for Apple TV+ and Paramount Pictures,” notes Talia Finlayson, Creative Technologist for Disguise. “But as the Senior Unreal Artist within the Virtual Art Department (VAD) on Minecraft, I experienced the full creative workflow. What stood out most was how deeply the VAD was embedded across every stage of production. We weren’t working in isolation. From the production designer and director to the VFX supervisor and DP, the VAD became a hub for collaboration.” The project provided new opportunities. “I’ve always loved the physicality of working with an LED volume, both for the immersion it provides and the way that seeing the environment helps shape an actor’s performance,” notes Laura Bell, Creative Technologist for Disguise. “But for A Minecraft Movie, we used Simulcam instead, and it was an incredible experience to live-composite an entire Minecraft world in real-time, especially with nothing on set but blue curtains.” Set designs originally created by the art department in Rhinoceros 3D were transformed into fully navigable 3D environments within Unreal Engine. “These scenes were far more than visualizations,” Finlayson remarks. “They were interactive tools used throughout the production pipeline. We would ingest 3D models and concept art, clean and optimize geometry using tools like Blender, Cinema 4D or Maya, then build out the world in Unreal Engine. This included applying materials, lighting and extending environments. These Unreal scenes we created were vital tools across the production and were used for a variety of purposes such as enabling the director to explore shot compositions, block scenes and experiment with camera movement in a virtual space, as well as passing along Unreal Engine scenes to the visual effects vendors so they could align their digital environments and set extensions with the approved production layouts.” A virtual exploration of Steve’s shop in Midport Village. Certain elements have to be kept in mind when constructing virtual environments. “When building virtual environments, you need to consider what can actually be built, how actors and cameras will move through the space, and what’s safe and practical on set,” Bell observes. “Outside the areas where strict accuracy is required, you want the environments to blend naturally with the original designs from the art department and support the story, creating a space that feels right for the scene, guides the audience’s eye and sets the right tone. Things like composition, lighting and small environmental details can be really fun to work on, but also serve as beautiful additions to help enrich a story.” “I’ve always loved the physicality of working with an LED volume, both for the immersion it provides and the way that seeing the environment helps shape an actor’s performance. But for A Minecraft Movie, we used Simulcam instead, and it was an incredible experience to live-composite an entire Minecraft world in real-time, especially with nothing on set but blue curtains.” —Laura Bell, Creative Technologist, Disguise Among the buildings that had to be created for Midport Village was Steve’s (Jack Black) Lava Chicken Shack. Concept art was provided that served as visual touchstones. “We received concept art provided by the amazing team of concept artists,” Finlayson states. “Not only did they send us 2D artwork, but they often shared the 3D models they used to create those visuals. These models were incredibly helpful as starting points when building out the virtual environments in Unreal Engine; they gave us a clear sense of composition and design intent. Storyboards were also a key part of the process and were constantly being updated as the project evolved. Having access to the latest versions allowed us to tailor the virtual environments to match camera angles, story beats and staging. Sometimes we would also help the storyboard artists by sending through images of the Unreal Engine worlds to help them geographically position themselves in the worlds and aid in their storyboarding.” At times, the video game assets came in handy. “Exteriors often involved large-scale landscapes and stylized architectural elements, which had to feel true to the Minecraft world,” Finlayson explains. “In some cases, we brought in geometry from the game itself to help quickly block out areas. For example, we did this for the Elytra Flight Chase sequence, which takes place through a large canyon.” Flexibility was critical. “A key technical challenge we faced was ensuring that the Unreal levels were built in a way that allowed for fast and flexible iteration,” Finlayson remarks. “Since our environments were constantly being reviewed by the director, production designer, DP and VFX supervisor, we needed to be able to respond quickly to feedback, sometimes live during a review session. To support this, we had to keep our scenes modular and well-organized; that meant breaking environments down into manageable components and maintaining clean naming conventions. By setting up the levels this way, we could make layout changes, swap assets or adjust lighting on the fly without breaking the scene or slowing down the process.” Production schedules influence the workflows, pipelines and techniques. “No two projects will ever feel exactly the same,” Bell notes. “For example, Pat Younis [VAD Art Director] adapted his typical VR setup to allow scene reviews using a PS5 controller, which made it much more comfortable and accessible for the director. On a more technical side, because everything was cubes and voxels, my Blender workflow ended up being way heavier on the re-mesh modifier than usual, definitely not something I’ll run into again anytime soon!” A virtual study and final still of the cast members standing outside of the Lava Chicken Shack. “We received concept art provided by the amazing team of concept artists. Not only did they send us 2D artwork, but they often shared the 3D models they used to create those visuals. These models were incredibly helpful as starting points when building out the virtual environments in Unreal Engine; they gave us a clear sense of composition and design intent. Storyboards were also a key part of the process and were constantly being updated as the project evolved. Having access to the latest versions allowed us to tailor the virtual environments to match camera angles, story beats and staging.” —Talia Finlayson, Creative Technologist, Disguise The design and composition of virtual environments tended to remain consistent throughout principal photography. “The only major design change I can recall was the removal of a second story from a building in Midport Village to allow the camera crane to get a clear shot of the chicken perched above Steve’s lava chicken shack,” Finlayson remarks. “I would agree that Midport Village likely went through the most iterations,” Bell responds. “The archway, in particular, became a visual anchor across different levels. We often placed it off in the distance to help orient both ourselves and the audience and show how far the characters had traveled. I remember rebuilding the stairs leading up to the rampart five or six times, using different configurations based on the physically constructed stairs. This was because there were storyboarded sequences of the film’s characters, Henry, Steve and Garrett, being chased by piglins, and the action needed to match what could be achieved practically on set.” Virtually conceptualizing the layout of Midport Village. Complex virtual environments were constructed for the final battle and the various forest scenes throughout the movie. “What made these particularly challenging was the way physical set pieces were repurposed and repositioned to serve multiple scenes and locations within the story,” Finlayson reveals. “The same built elements had to appear in different parts of the world, so we had to carefully adjust the virtual environments to accommodate those different positions.” Bell is in agreement with her colleague. “The forest scenes were some of the more complex environments to manage. It could get tricky, particularly when the filming schedule shifted. There was one day on set where the order of shots changed unexpectedly, and because the physical sets looked so similar, I initially loaded a different perspective than planned. Fortunately, thanks to our workflow, Lindsay George [VP Tech] and I were able to quickly open the recorded sequence in Unreal Engine and swap out the correct virtual environment for the live composite without any disruption to the shoot.” An example of the virtual and final version of the Woodland Mansion. “Midport Village likely went through the most iterations. The archway, in particular, became a visual anchor across different levels. We often placed it off in the distance to help orient both ourselves and the audience and show how far the characters had traveled.” —Laura Bell, Creative Technologist, Disguise Extensive detail was given to the center of the sets where the main action unfolds. “For these areas, we received prop layouts from the prop department to ensure accurate placement and alignment with the physical builds,” Finlayson explains. “These central environments were used heavily for storyboarding, blocking and department reviews, so precision was essential. As we moved further out from the practical set, the environments became more about blocking and spatial context rather than fine detail. We worked closely with Production Designer Grant Major to get approval on these extended environments, making sure they aligned with the overall visual direction. We also used creatures and crowd stand-ins provided by the visual effects team. These gave a great sense of scale and placement during early planning stages and allowed other departments to better understand how these elements would be integrated into the scenes.” Cast members Sebastian Hansen, Danielle Brooks and Emma Myers stand in front of the Earth Portal Plateau environment. Doing a virtual scale study of the Mountainside. Practical requirements like camera moves, stunt choreography and crane setups had an impact on the creation of virtual environments. “Sometimes we would adjust layouts slightly to open up areas for tracking shots or rework spaces to accommodate key action beats, all while keeping the environment feeling cohesive and true to the Minecraft world,” Bell states. “Simulcam bridged the physical and virtual worlds on set, overlaying Unreal Engine environments onto live-action scenes in real-time, giving the director, DP and other department heads a fully-realized preview of shots and enabling precise, informed decisions during production. It also recorded critical production data like camera movement paths, which was handed over to the post-production team to give them the exact tracks they needed, streamlining the visual effects pipeline.” Piglots cause mayhem during the Wingsuit Chase. Virtual versions of the exterior and interior of the Safe House located in the Enchanted Woods. “One of the biggest challenges for me was managing constant iteration while keeping our environments clean, organized and easy to update,” Finlayson notes. “Because the virtual sets were reviewed regularly by the director and other heads of departments, feedback was often implemented live in the room. This meant the environments had to be flexible. But overall, this was an amazing project to work on, and I am so grateful for the incredible VAD team I was a part of – Heide Nichols [VAD Supervisor], Pat Younis, Jake Tuck [Unreal Artist] and Laura. Everyone on this team worked so collaboratively, seamlessly and in such a supportive way that I never felt like I was out of my depth.” There was another challenge that is more to do with familiarity. “Having a VAD on a film is still a relatively new process in production,” Bell states. “There were moments where other departments were still learning what we did and how to best work with us. That said, the response was overwhelmingly positive. I remember being on set at the Simulcam station and seeing how excited people were to look at the virtual environments as they walked by, often stopping for a chat and a virtual tour. Instead of seeing just a huge blue curtain, they were stoked to see something Minecraft and could get a better sense of what they were actually shooting.”
    0 Comentários 0 Compartilhamentos

  • ## Introduction

    Cet été, une opportunité se présente pour ceux qui souhaitent se spécialiser dans la **3D temps réel**. Le studio **MIAM! Animation** s'associe au département **Arts et Technologies de l’Image (ATI)** de l’**Université Paris 8** pour offrir une formation certifiante. Ce programme vise à initier les participants à la transition d’un pipeline précalculé vers un environnement de création en temps réel.

    ## Une formation professionnelle

    La formation proposée par MIAM! et ATI est ...
    ## Introduction Cet été, une opportunité se présente pour ceux qui souhaitent se spécialiser dans la **3D temps réel**. Le studio **MIAM! Animation** s'associe au département **Arts et Technologies de l’Image (ATI)** de l’**Université Paris 8** pour offrir une formation certifiante. Ce programme vise à initier les participants à la transition d’un pipeline précalculé vers un environnement de création en temps réel. ## Une formation professionnelle La formation proposée par MIAM! et ATI est ...
    TDs, formez-vous à la 3D temps réel avec Miam! et ATI
    ## Introduction Cet été, une opportunité se présente pour ceux qui souhaitent se spécialiser dans la **3D temps réel**. Le studio **MIAM! Animation** s'associe au département **Arts et Technologies de l’Image (ATI)** de l’**Université Paris 8** pour offrir une formation certifiante. Ce programme vise à initier les participants à la transition d’un pipeline précalculé vers un environnement de...
    Like
    Love
    Wow
    Sad
    Angry
    154
    1 Comentários 0 Compartilhamentos
  • Blender, developers meeting notes, 4.5 LTS Beta, 5.0 Alpha, project updates, user interface, core module, pipeline and I/O module, new features

    ---

    ## Introduction

    Ah, the Blender developers, those wizards of 3D magic. If you’re wondering what they’ve been up to recently, you’re in for a treat! The latest meeting notes from June 9, 2025, are here, showcasing everything from announcements to project updates, all dressed up with a sprinkle of sarcasm. So, buckle up as we dive into this whimsica...
    Blender, developers meeting notes, 4.5 LTS Beta, 5.0 Alpha, project updates, user interface, core module, pipeline and I/O module, new features --- ## Introduction Ah, the Blender developers, those wizards of 3D magic. If you’re wondering what they’ve been up to recently, you’re in for a treat! The latest meeting notes from June 9, 2025, are here, showcasing everything from announcements to project updates, all dressed up with a sprinkle of sarcasm. So, buckle up as we dive into this whimsica...
    Blender Developers Meeting Notes: What’s Cooking for 2025?
    Blender, developers meeting notes, 4.5 LTS Beta, 5.0 Alpha, project updates, user interface, core module, pipeline and I/O module, new features --- ## Introduction Ah, the Blender developers, those wizards of 3D magic. If you’re wondering what they’ve been up to recently, you’re in for a treat! The latest meeting notes from June 9, 2025, are here, showcasing everything from announcements to...
    Like
    Love
    Wow
    Sad
    Angry
    568
    1 Comentários 0 Compartilhamentos
  • déploiement continu, intégration continue, automatisation, pipelines, développement logiciel, méthodologies agiles, DevOps

    ## Introduction

    La mise en place d'un mécanisme de déploiement continu est devenue une nécessité pour de nombreuses équipes de développement logiciel. Dans un environnement technologique en constante évolution, les organisations cherchent à améliorer leur efficacité et leur rapidité de livraison. Cet article explore les concepts fondamentaux du déploiement continu, en mett...
    déploiement continu, intégration continue, automatisation, pipelines, développement logiciel, méthodologies agiles, DevOps ## Introduction La mise en place d'un mécanisme de déploiement continu est devenue une nécessité pour de nombreuses équipes de développement logiciel. Dans un environnement technologique en constante évolution, les organisations cherchent à améliorer leur efficacité et leur rapidité de livraison. Cet article explore les concepts fondamentaux du déploiement continu, en mett...
    Mise en place d'un mécanisme de déploiement continu
    déploiement continu, intégration continue, automatisation, pipelines, développement logiciel, méthodologies agiles, DevOps ## Introduction La mise en place d'un mécanisme de déploiement continu est devenue une nécessité pour de nombreuses équipes de développement logiciel. Dans un environnement technologique en constante évolution, les organisations cherchent à améliorer leur efficacité et...
    Like
    Love
    Wow
    Sad
    Angry
    657
    1 Comentários 0 Compartilhamentos
  • fxpodcast: the making of the immersive Apple Vision Pro film Bono: Stories of Surrender

    In this episode of the fxpodcast, we go behind the scenes with The-Artery, the New York-based creative studio that brought this ambitious vision to life. We speak with Founder and CCO Vico Sharabani, along with Elad Offer, the project’s Creative Director, about what it took to craft this unprecedented experience. From conceptual direction to VFX and design, The-Artery was responsible for the full production pipeline of the AVP edition.
    Bono’s memoir Surrender: 40 Songs, One Story has taken on new life—this time as a groundbreaking immersive cinematic experience tailored specifically for the Apple Vision Pro. Titled Bono: Stories of Surrender, the project transforms his personal journey of love, loss, and legacy into a first-of-its-kind Apple Immersive Video.

    The-Artery,Founder Vico Sharabani during post-production.
    This is far more than a stereo conversion of a traditional film. Designed natively for the Apple Vision Pro, Bono: Stories of Surrenderplaces viewers directly on stage with Bono, surrounding them in a deeply intimate audiovisual journey. Shot and mastered at a staggering 14K by 7K resolution, in 180-degree stereoscopic video at 90 frames per second, the format pushes the limits of current storytelling, running at data rates nearly 50 times higher than conventional content. The immersive trailer itself diverges significantly from its traditional counterpart, using novel cinematic language, spatial cues, and temporal transitions unique to Apple’s new medium.

    This marks the first feature-length film available in Apple Immersive Video, and a powerful statement on Bono’s and U2’s continued embrace of innovation. Watch the video or listen to the audio podcast as we unpack the creative and technical challenges of building a film for a platform that didn’t exist just a year ago, and what it means for the future of immersive storytelling.
    #fxpodcast #making #immersive #apple #vision
    fxpodcast: the making of the immersive Apple Vision Pro film Bono: Stories of Surrender
    In this episode of the fxpodcast, we go behind the scenes with The-Artery, the New York-based creative studio that brought this ambitious vision to life. We speak with Founder and CCO Vico Sharabani, along with Elad Offer, the project’s Creative Director, about what it took to craft this unprecedented experience. From conceptual direction to VFX and design, The-Artery was responsible for the full production pipeline of the AVP edition. Bono’s memoir Surrender: 40 Songs, One Story has taken on new life—this time as a groundbreaking immersive cinematic experience tailored specifically for the Apple Vision Pro. Titled Bono: Stories of Surrender, the project transforms his personal journey of love, loss, and legacy into a first-of-its-kind Apple Immersive Video. The-Artery,Founder Vico Sharabani during post-production. This is far more than a stereo conversion of a traditional film. Designed natively for the Apple Vision Pro, Bono: Stories of Surrenderplaces viewers directly on stage with Bono, surrounding them in a deeply intimate audiovisual journey. Shot and mastered at a staggering 14K by 7K resolution, in 180-degree stereoscopic video at 90 frames per second, the format pushes the limits of current storytelling, running at data rates nearly 50 times higher than conventional content. The immersive trailer itself diverges significantly from its traditional counterpart, using novel cinematic language, spatial cues, and temporal transitions unique to Apple’s new medium. This marks the first feature-length film available in Apple Immersive Video, and a powerful statement on Bono’s and U2’s continued embrace of innovation. Watch the video or listen to the audio podcast as we unpack the creative and technical challenges of building a film for a platform that didn’t exist just a year ago, and what it means for the future of immersive storytelling. #fxpodcast #making #immersive #apple #vision
    WWW.FXGUIDE.COM
    fxpodcast: the making of the immersive Apple Vision Pro film Bono: Stories of Surrender
    In this episode of the fxpodcast, we go behind the scenes with The-Artery, the New York-based creative studio that brought this ambitious vision to life. We speak with Founder and CCO Vico Sharabani, along with Elad Offer, the project’s Creative Director, about what it took to craft this unprecedented experience. From conceptual direction to VFX and design, The-Artery was responsible for the full production pipeline of the AVP edition. Bono’s memoir Surrender: 40 Songs, One Story has taken on new life—this time as a groundbreaking immersive cinematic experience tailored specifically for the Apple Vision Pro. Titled Bono: Stories of Surrender (Immersive), the project transforms his personal journey of love, loss, and legacy into a first-of-its-kind Apple Immersive Video. The-Artery, (R) Founder Vico Sharabani during post-production. This is far more than a stereo conversion of a traditional film. Designed natively for the Apple Vision Pro, Bono: Stories of Surrender (Immersive) places viewers directly on stage with Bono, surrounding them in a deeply intimate audiovisual journey. Shot and mastered at a staggering 14K by 7K resolution, in 180-degree stereoscopic video at 90 frames per second, the format pushes the limits of current storytelling, running at data rates nearly 50 times higher than conventional content. The immersive trailer itself diverges significantly from its traditional counterpart, using novel cinematic language, spatial cues, and temporal transitions unique to Apple’s new medium. This marks the first feature-length film available in Apple Immersive Video, and a powerful statement on Bono’s and U2’s continued embrace of innovation. Watch the video or listen to the audio podcast as we unpack the creative and technical challenges of building a film for a platform that didn’t exist just a year ago, and what it means for the future of immersive storytelling.
    Like
    Love
    Wow
    Angry
    Sad
    552
    2 Comentários 0 Compartilhamentos
  • EPFL Researchers Unveil FG2 at CVPR: A New AI Model That Slashes Localization Errors by 28% for Autonomous Vehicles in GPS-Denied Environments

    Navigating the dense urban canyons of cities like San Francisco or New York can be a nightmare for GPS systems. The towering skyscrapers block and reflect satellite signals, leading to location errors of tens of meters. For you and me, that might mean a missed turn. But for an autonomous vehicle or a delivery robot, that level of imprecision is the difference between a successful mission and a costly failure. These machines require pinpoint accuracy to operate safely and efficiently. Addressing this critical challenge, researchers from the École Polytechnique Fédérale de Lausannein Switzerland have introduced a groundbreaking new method for visual localization during CVPR 2025
    Their new paper, “FG2: Fine-Grained Cross-View Localization by Fine-Grained Feature Matching,” presents a novel AI model that significantly enhances the ability of a ground-level system, like an autonomous car, to determine its exact position and orientation using only a camera and a corresponding aerialimage. The new approach has demonstrated a remarkable 28% reduction in mean localization error compared to the previous state-of-the-art on a challenging public dataset.
    Key Takeaways:

    Superior Accuracy: The FG2 model reduces the average localization error by a significant 28% on the VIGOR cross-area test set, a challenging benchmark for this task.
    Human-like Intuition: Instead of relying on abstract descriptors, the model mimics human reasoning by matching fine-grained, semantically consistent features—like curbs, crosswalks, and buildings—between a ground-level photo and an aerial map.
    Enhanced Interpretability: The method allows researchers to “see” what the AI is “thinking” by visualizing exactly which features in the ground and aerial images are being matched, a major step forward from previous “black box” models.
    Weakly Supervised Learning: Remarkably, the model learns these complex and consistent feature matches without any direct labels for correspondences. It achieves this using only the final camera pose as a supervisory signal.

    Challenge: Seeing the World from Two Different Angles
    The core problem of cross-view localization is the dramatic difference in perspective between a street-level camera and an overhead satellite view. A building facade seen from the ground looks completely different from its rooftop signature in an aerial image. Existing methods have struggled with this. Some create a general “descriptor” for the entire scene, but this is an abstract approach that doesn’t mirror how humans naturally localize themselves by spotting specific landmarks. Other methods transform the ground image into a Bird’s-Eye-Viewbut are often limited to the ground plane, ignoring crucial vertical structures like buildings.

    FG2: Matching Fine-Grained Features
    The EPFL team’s FG2 method introduces a more intuitive and effective process. It aligns two sets of points: one generated from the ground-level image and another sampled from the aerial map.

    Here’s a breakdown of their innovative pipeline:

    Mapping to 3D: The process begins by taking the features from the ground-level image and lifting them into a 3D point cloud centered around the camera. This creates a 3D representation of the immediate environment.
    Smart Pooling to BEV: This is where the magic happens. Instead of simply flattening the 3D data, the model learns to intelligently select the most important features along the verticaldimension for each point. It essentially asks, “For this spot on the map, is the ground-level road marking more important, or is the edge of that building’s roof the better landmark?” This selection process is crucial, as it allows the model to correctly associate features like building facades with their corresponding rooftops in the aerial view.
    Feature Matching and Pose Estimation: Once both the ground and aerial views are represented as 2D point planes with rich feature descriptors, the model computes the similarity between them. It then samples a sparse set of the most confident matches and uses a classic geometric algorithm called Procrustes alignment to calculate the precise 3-DoFpose.

    Unprecedented Performance and Interpretability
    The results speak for themselves. On the challenging VIGOR dataset, which includes images from different cities in its cross-area test, FG2 reduced the mean localization error by 28% compared to the previous best method. It also demonstrated superior generalization capabilities on the KITTI dataset, a staple in autonomous driving research.

    Perhaps more importantly, the FG2 model offers a new level of transparency. By visualizing the matched points, the researchers showed that the model learns semantically consistent correspondences without being explicitly told to. For example, the system correctly matches zebra crossings, road markings, and even building facades in the ground view to their corresponding locations on the aerial map. This interpretability is extremenly valuable for building trust in safety-critical autonomous systems.
    “A Clearer Path” for Autonomous Navigation
    The FG2 method represents a significant leap forward in fine-grained visual localization. By developing a model that intelligently selects and matches features in a way that mirrors human intuition, the EPFL researchers have not only shattered previous accuracy records but also made the decision-making process of the AI more interpretable. This work paves the way for more robust and reliable navigation systems for autonomous vehicles, drones, and robots, bringing us one step closer to a future where machines can confidently navigate our world, even when GPS fails them.

    Check out the Paper. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter and don’t forget to join our 100k+ ML SubReddit and Subscribe to our Newsletter.
    Jean-marc MommessinJean-marc is a successful AI business executive .He leads and accelerates growth for AI powered solutions and started a computer vision company in 2006. He is a recognized speaker at AI conferences and has an MBA from Stanford.Jean-marc Mommessinhttps://www.marktechpost.com/author/jean-marc0000677/AI-Generated Ad Created with Google’s Veo3 Airs During NBA Finals, Slashing Production Costs by 95%Jean-marc Mommessinhttps://www.marktechpost.com/author/jean-marc0000677/Highlighted at CVPR 2025: Google DeepMind’s ‘Motion Prompting’ Paper Unlocks Granular Video ControlJean-marc Mommessinhttps://www.marktechpost.com/author/jean-marc0000677/Snowflake Charts New AI Territory: Cortex AISQL & Snowflake Intelligence Poised to Reshape Data AnalyticsJean-marc Mommessinhttps://www.marktechpost.com/author/jean-marc0000677/Exclusive Talk: Joey Conway of NVIDIA on Llama Nemotron Ultra and Open Source Models
    #epfl #researchers #unveil #fg2 #cvpr
    EPFL Researchers Unveil FG2 at CVPR: A New AI Model That Slashes Localization Errors by 28% for Autonomous Vehicles in GPS-Denied Environments
    Navigating the dense urban canyons of cities like San Francisco or New York can be a nightmare for GPS systems. The towering skyscrapers block and reflect satellite signals, leading to location errors of tens of meters. For you and me, that might mean a missed turn. But for an autonomous vehicle or a delivery robot, that level of imprecision is the difference between a successful mission and a costly failure. These machines require pinpoint accuracy to operate safely and efficiently. Addressing this critical challenge, researchers from the École Polytechnique Fédérale de Lausannein Switzerland have introduced a groundbreaking new method for visual localization during CVPR 2025 Their new paper, “FG2: Fine-Grained Cross-View Localization by Fine-Grained Feature Matching,” presents a novel AI model that significantly enhances the ability of a ground-level system, like an autonomous car, to determine its exact position and orientation using only a camera and a corresponding aerialimage. The new approach has demonstrated a remarkable 28% reduction in mean localization error compared to the previous state-of-the-art on a challenging public dataset. Key Takeaways: Superior Accuracy: The FG2 model reduces the average localization error by a significant 28% on the VIGOR cross-area test set, a challenging benchmark for this task. Human-like Intuition: Instead of relying on abstract descriptors, the model mimics human reasoning by matching fine-grained, semantically consistent features—like curbs, crosswalks, and buildings—between a ground-level photo and an aerial map. Enhanced Interpretability: The method allows researchers to “see” what the AI is “thinking” by visualizing exactly which features in the ground and aerial images are being matched, a major step forward from previous “black box” models. Weakly Supervised Learning: Remarkably, the model learns these complex and consistent feature matches without any direct labels for correspondences. It achieves this using only the final camera pose as a supervisory signal. Challenge: Seeing the World from Two Different Angles The core problem of cross-view localization is the dramatic difference in perspective between a street-level camera and an overhead satellite view. A building facade seen from the ground looks completely different from its rooftop signature in an aerial image. Existing methods have struggled with this. Some create a general “descriptor” for the entire scene, but this is an abstract approach that doesn’t mirror how humans naturally localize themselves by spotting specific landmarks. Other methods transform the ground image into a Bird’s-Eye-Viewbut are often limited to the ground plane, ignoring crucial vertical structures like buildings. FG2: Matching Fine-Grained Features The EPFL team’s FG2 method introduces a more intuitive and effective process. It aligns two sets of points: one generated from the ground-level image and another sampled from the aerial map. Here’s a breakdown of their innovative pipeline: Mapping to 3D: The process begins by taking the features from the ground-level image and lifting them into a 3D point cloud centered around the camera. This creates a 3D representation of the immediate environment. Smart Pooling to BEV: This is where the magic happens. Instead of simply flattening the 3D data, the model learns to intelligently select the most important features along the verticaldimension for each point. It essentially asks, “For this spot on the map, is the ground-level road marking more important, or is the edge of that building’s roof the better landmark?” This selection process is crucial, as it allows the model to correctly associate features like building facades with their corresponding rooftops in the aerial view. Feature Matching and Pose Estimation: Once both the ground and aerial views are represented as 2D point planes with rich feature descriptors, the model computes the similarity between them. It then samples a sparse set of the most confident matches and uses a classic geometric algorithm called Procrustes alignment to calculate the precise 3-DoFpose. Unprecedented Performance and Interpretability The results speak for themselves. On the challenging VIGOR dataset, which includes images from different cities in its cross-area test, FG2 reduced the mean localization error by 28% compared to the previous best method. It also demonstrated superior generalization capabilities on the KITTI dataset, a staple in autonomous driving research. Perhaps more importantly, the FG2 model offers a new level of transparency. By visualizing the matched points, the researchers showed that the model learns semantically consistent correspondences without being explicitly told to. For example, the system correctly matches zebra crossings, road markings, and even building facades in the ground view to their corresponding locations on the aerial map. This interpretability is extremenly valuable for building trust in safety-critical autonomous systems. “A Clearer Path” for Autonomous Navigation The FG2 method represents a significant leap forward in fine-grained visual localization. By developing a model that intelligently selects and matches features in a way that mirrors human intuition, the EPFL researchers have not only shattered previous accuracy records but also made the decision-making process of the AI more interpretable. This work paves the way for more robust and reliable navigation systems for autonomous vehicles, drones, and robots, bringing us one step closer to a future where machines can confidently navigate our world, even when GPS fails them. Check out the Paper. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter and don’t forget to join our 100k+ ML SubReddit and Subscribe to our Newsletter. Jean-marc MommessinJean-marc is a successful AI business executive .He leads and accelerates growth for AI powered solutions and started a computer vision company in 2006. He is a recognized speaker at AI conferences and has an MBA from Stanford.Jean-marc Mommessinhttps://www.marktechpost.com/author/jean-marc0000677/AI-Generated Ad Created with Google’s Veo3 Airs During NBA Finals, Slashing Production Costs by 95%Jean-marc Mommessinhttps://www.marktechpost.com/author/jean-marc0000677/Highlighted at CVPR 2025: Google DeepMind’s ‘Motion Prompting’ Paper Unlocks Granular Video ControlJean-marc Mommessinhttps://www.marktechpost.com/author/jean-marc0000677/Snowflake Charts New AI Territory: Cortex AISQL & Snowflake Intelligence Poised to Reshape Data AnalyticsJean-marc Mommessinhttps://www.marktechpost.com/author/jean-marc0000677/Exclusive Talk: Joey Conway of NVIDIA on Llama Nemotron Ultra and Open Source Models #epfl #researchers #unveil #fg2 #cvpr
    WWW.MARKTECHPOST.COM
    EPFL Researchers Unveil FG2 at CVPR: A New AI Model That Slashes Localization Errors by 28% for Autonomous Vehicles in GPS-Denied Environments
    Navigating the dense urban canyons of cities like San Francisco or New York can be a nightmare for GPS systems. The towering skyscrapers block and reflect satellite signals, leading to location errors of tens of meters. For you and me, that might mean a missed turn. But for an autonomous vehicle or a delivery robot, that level of imprecision is the difference between a successful mission and a costly failure. These machines require pinpoint accuracy to operate safely and efficiently. Addressing this critical challenge, researchers from the École Polytechnique Fédérale de Lausanne (EPFL) in Switzerland have introduced a groundbreaking new method for visual localization during CVPR 2025 Their new paper, “FG2: Fine-Grained Cross-View Localization by Fine-Grained Feature Matching,” presents a novel AI model that significantly enhances the ability of a ground-level system, like an autonomous car, to determine its exact position and orientation using only a camera and a corresponding aerial (or satellite) image. The new approach has demonstrated a remarkable 28% reduction in mean localization error compared to the previous state-of-the-art on a challenging public dataset. Key Takeaways: Superior Accuracy: The FG2 model reduces the average localization error by a significant 28% on the VIGOR cross-area test set, a challenging benchmark for this task. Human-like Intuition: Instead of relying on abstract descriptors, the model mimics human reasoning by matching fine-grained, semantically consistent features—like curbs, crosswalks, and buildings—between a ground-level photo and an aerial map. Enhanced Interpretability: The method allows researchers to “see” what the AI is “thinking” by visualizing exactly which features in the ground and aerial images are being matched, a major step forward from previous “black box” models. Weakly Supervised Learning: Remarkably, the model learns these complex and consistent feature matches without any direct labels for correspondences. It achieves this using only the final camera pose as a supervisory signal. Challenge: Seeing the World from Two Different Angles The core problem of cross-view localization is the dramatic difference in perspective between a street-level camera and an overhead satellite view. A building facade seen from the ground looks completely different from its rooftop signature in an aerial image. Existing methods have struggled with this. Some create a general “descriptor” for the entire scene, but this is an abstract approach that doesn’t mirror how humans naturally localize themselves by spotting specific landmarks. Other methods transform the ground image into a Bird’s-Eye-View (BEV) but are often limited to the ground plane, ignoring crucial vertical structures like buildings. FG2: Matching Fine-Grained Features The EPFL team’s FG2 method introduces a more intuitive and effective process. It aligns two sets of points: one generated from the ground-level image and another sampled from the aerial map. Here’s a breakdown of their innovative pipeline: Mapping to 3D: The process begins by taking the features from the ground-level image and lifting them into a 3D point cloud centered around the camera. This creates a 3D representation of the immediate environment. Smart Pooling to BEV: This is where the magic happens. Instead of simply flattening the 3D data, the model learns to intelligently select the most important features along the vertical (height) dimension for each point. It essentially asks, “For this spot on the map, is the ground-level road marking more important, or is the edge of that building’s roof the better landmark?” This selection process is crucial, as it allows the model to correctly associate features like building facades with their corresponding rooftops in the aerial view. Feature Matching and Pose Estimation: Once both the ground and aerial views are represented as 2D point planes with rich feature descriptors, the model computes the similarity between them. It then samples a sparse set of the most confident matches and uses a classic geometric algorithm called Procrustes alignment to calculate the precise 3-DoF (x, y, and yaw) pose. Unprecedented Performance and Interpretability The results speak for themselves. On the challenging VIGOR dataset, which includes images from different cities in its cross-area test, FG2 reduced the mean localization error by 28% compared to the previous best method. It also demonstrated superior generalization capabilities on the KITTI dataset, a staple in autonomous driving research. Perhaps more importantly, the FG2 model offers a new level of transparency. By visualizing the matched points, the researchers showed that the model learns semantically consistent correspondences without being explicitly told to. For example, the system correctly matches zebra crossings, road markings, and even building facades in the ground view to their corresponding locations on the aerial map. This interpretability is extremenly valuable for building trust in safety-critical autonomous systems. “A Clearer Path” for Autonomous Navigation The FG2 method represents a significant leap forward in fine-grained visual localization. By developing a model that intelligently selects and matches features in a way that mirrors human intuition, the EPFL researchers have not only shattered previous accuracy records but also made the decision-making process of the AI more interpretable. This work paves the way for more robust and reliable navigation systems for autonomous vehicles, drones, and robots, bringing us one step closer to a future where machines can confidently navigate our world, even when GPS fails them. Check out the Paper. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter and don’t forget to join our 100k+ ML SubReddit and Subscribe to our Newsletter. Jean-marc MommessinJean-marc is a successful AI business executive .He leads and accelerates growth for AI powered solutions and started a computer vision company in 2006. He is a recognized speaker at AI conferences and has an MBA from Stanford.Jean-marc Mommessinhttps://www.marktechpost.com/author/jean-marc0000677/AI-Generated Ad Created with Google’s Veo3 Airs During NBA Finals, Slashing Production Costs by 95%Jean-marc Mommessinhttps://www.marktechpost.com/author/jean-marc0000677/Highlighted at CVPR 2025: Google DeepMind’s ‘Motion Prompting’ Paper Unlocks Granular Video ControlJean-marc Mommessinhttps://www.marktechpost.com/author/jean-marc0000677/Snowflake Charts New AI Territory: Cortex AISQL & Snowflake Intelligence Poised to Reshape Data AnalyticsJean-marc Mommessinhttps://www.marktechpost.com/author/jean-marc0000677/Exclusive Talk: Joey Conway of NVIDIA on Llama Nemotron Ultra and Open Source Models
    Like
    Love
    Wow
    Angry
    Sad
    601
    0 Comentários 0 Compartilhamentos
  • Unity Technical VFX Artist at No Brakes Games

    Unity Technical VFX ArtistNo Brakes GamesVilnius, Lithuania or Remote3 hours agoApplyWe are No Brakes Games, the creators of Human Fall Flat.We are looking for a Unity Technical VFX Artist to join our team to work on Human Fall Flat 2.FULL-TIME POSITION - Vilnius, Lithuania or RemoteRole Overview:As a Unity Technical Artist specializing in VFX and rendering, you will develop and optimize real-time visual effects, create advanced shaders and materials, and ensure a balance between visual fidelity and performance. You will solve complex technical challenges, optimizing effects for real-time execution while collaborating with artists, designers, and engineers to push Unity’s rendering capabilities to the next level.Responsibilities:Develop and optimize real-time VFX solutions that are both visually striking and performant.Create scalable shaders and materials for water, fog, atmospheric effects, and dynamic lighting.Debug and resolve VFX performance bottlenecks using Unity Profiler, RenderDoc, and other tools.Optimize particle systems, volumetric effects, and GPU simulations for multi-platform performance.Document best practices and educate the team on efficient asset and shader workflows.Collaborate with engineers to develop and implement custom rendering solutions.Stay updated on Unity’s latest advancements in rendering, HDRP, and the Visual Effect Graph.Requirements:2+ years of experience as a Technical Artist in game development, with a focus on Unity.Strong understanding of Unity’s rendering pipelineand shader development.Experience developing performance-conscious visual effects, including particle systems, volumetric lighting, and dynamic environmental effects.Proficiency in GPU/CPU optimization techniques, LODs for VFX.Hands-on experience with real-time lighting and atmospheric effects.Ability to debug and profile complex rendering issues effectively.Excellent communication skills and ability to work collaboratively within a multi-disciplinary team.A flexible, R&D-driven mindset, able to iterate quickly in both prototyping and production environments.Nice-to-Have:Experience working on at least one released game project.Experience with Unity HDRP and SRP.Experience with multi-platform development.Knowledge of C#, Python, or C++ for extending Unity’s capabilities.Experience developing custom node-based tools or extending Unity’s Visual Effect Graph.Background in procedural animation, physics-based effects, or fluid simulations.Apply today by sending your Portfolio & CV to jobs@nobrakesgames.com
    Create Your Profile — Game companies can contact you with their relevant job openings.
    Apply
    #unity #technical #vfx #artist #brakes
    Unity Technical VFX Artist at No Brakes Games
    Unity Technical VFX ArtistNo Brakes GamesVilnius, Lithuania or Remote3 hours agoApplyWe are No Brakes Games, the creators of Human Fall Flat.We are looking for a Unity Technical VFX Artist to join our team to work on Human Fall Flat 2.FULL-TIME POSITION - Vilnius, Lithuania or RemoteRole Overview:As a Unity Technical Artist specializing in VFX and rendering, you will develop and optimize real-time visual effects, create advanced shaders and materials, and ensure a balance between visual fidelity and performance. You will solve complex technical challenges, optimizing effects for real-time execution while collaborating with artists, designers, and engineers to push Unity’s rendering capabilities to the next level.Responsibilities:Develop and optimize real-time VFX solutions that are both visually striking and performant.Create scalable shaders and materials for water, fog, atmospheric effects, and dynamic lighting.Debug and resolve VFX performance bottlenecks using Unity Profiler, RenderDoc, and other tools.Optimize particle systems, volumetric effects, and GPU simulations for multi-platform performance.Document best practices and educate the team on efficient asset and shader workflows.Collaborate with engineers to develop and implement custom rendering solutions.Stay updated on Unity’s latest advancements in rendering, HDRP, and the Visual Effect Graph.Requirements:2+ years of experience as a Technical Artist in game development, with a focus on Unity.Strong understanding of Unity’s rendering pipelineand shader development.Experience developing performance-conscious visual effects, including particle systems, volumetric lighting, and dynamic environmental effects.Proficiency in GPU/CPU optimization techniques, LODs for VFX.Hands-on experience with real-time lighting and atmospheric effects.Ability to debug and profile complex rendering issues effectively.Excellent communication skills and ability to work collaboratively within a multi-disciplinary team.A flexible, R&D-driven mindset, able to iterate quickly in both prototyping and production environments.Nice-to-Have:Experience working on at least one released game project.Experience with Unity HDRP and SRP.Experience with multi-platform development.Knowledge of C#, Python, or C++ for extending Unity’s capabilities.Experience developing custom node-based tools or extending Unity’s Visual Effect Graph.Background in procedural animation, physics-based effects, or fluid simulations.Apply today by sending your Portfolio & CV to jobs@nobrakesgames.com Create Your Profile — Game companies can contact you with their relevant job openings. Apply #unity #technical #vfx #artist #brakes
    Unity Technical VFX Artist at No Brakes Games
    Unity Technical VFX ArtistNo Brakes GamesVilnius, Lithuania or Remote3 hours agoApplyWe are No Brakes Games, the creators of Human Fall Flat.We are looking for a Unity Technical VFX Artist to join our team to work on Human Fall Flat 2.FULL-TIME POSITION - Vilnius, Lithuania or RemoteRole Overview:As a Unity Technical Artist specializing in VFX and rendering, you will develop and optimize real-time visual effects, create advanced shaders and materials, and ensure a balance between visual fidelity and performance. You will solve complex technical challenges, optimizing effects for real-time execution while collaborating with artists, designers, and engineers to push Unity’s rendering capabilities to the next level.Responsibilities:Develop and optimize real-time VFX solutions that are both visually striking and performant.Create scalable shaders and materials for water, fog, atmospheric effects, and dynamic lighting.Debug and resolve VFX performance bottlenecks using Unity Profiler, RenderDoc, and other tools.Optimize particle systems, volumetric effects, and GPU simulations for multi-platform performance.Document best practices and educate the team on efficient asset and shader workflows.Collaborate with engineers to develop and implement custom rendering solutions.Stay updated on Unity’s latest advancements in rendering, HDRP, and the Visual Effect Graph.Requirements:2+ years of experience as a Technical Artist in game development, with a focus on Unity.Strong understanding of Unity’s rendering pipeline (HDRP, URP, Built-in) and shader development (HLSL, Shader Graph).Experience developing performance-conscious visual effects, including particle systems, volumetric lighting, and dynamic environmental effects.Proficiency in GPU/CPU optimization techniques, LODs for VFX.Hands-on experience with real-time lighting and atmospheric effects.Ability to debug and profile complex rendering issues effectively.Excellent communication skills and ability to work collaboratively within a multi-disciplinary team.A flexible, R&D-driven mindset, able to iterate quickly in both prototyping and production environments.Nice-to-Have:Experience working on at least one released game project.Experience with Unity HDRP and SRP.Experience with multi-platform development (PC, console, mobile, VR/AR).Knowledge of C#, Python, or C++ for extending Unity’s capabilities.Experience developing custom node-based tools or extending Unity’s Visual Effect Graph.Background in procedural animation, physics-based effects, or fluid simulations.Apply today by sending your Portfolio & CV to jobs@nobrakesgames.com Create Your Profile — Game companies can contact you with their relevant job openings. Apply
    0 Comentários 0 Compartilhamentos