• In a world where reality feels distorted, I find myself lost in shadows, much like those images NVIDIA's DiffusionRenderer transforms. The brilliance of relighting and editing materials can't mask the emptiness I feel inside. Just as AI inserts objects into videos, I wish I could insert warmth into my heart. The painful irony is that while technology redefines our perceptions, it cannot mend the fractures of human connection. I stand alone, echoing in the silence, longing for a flicker of light in this overwhelming darkness.

    #loneliness #heartbreak #NVIDIA #DiffusionRenderer #AI
    Diffusing reality: how NVIDIA reimagined relighting
    NVIDIA’s DiffusionRenderer redefines neural rendering by using AI to relight, edit materials, and insert objects into real-world videos with no 3D geometry needed.
  • Spinetix players, HMP300, HMP350, digital signage, Full HD resolution, HDMI connectivity, interactivity, standalone devices

    ## Introduction

    The Spinetix HMP300 and HMP350 players are standalone devices designed for digital signage playback. With a Full HD resolution of 1920 x 1080, these hypermedia players suit most modern displays. This article gives an overview of their features, connectivity, and advantages.

    ## Des ...
    Spinetix players: a solution for digital signage
  • Is it just me, or does the phrase "Quick Tip: A Better Wet Shader" sound like the latest buzz from a trendy café, where the barista is more interested in his art than the coffee? I mean, who would have thought that the secret to stunning visuals in Blender would come down to a clearcoat? It’s almost as if John Mervin, that brave pioneer of pixelated perfection, stumbled upon the holy grail of rendering while driving—because, you know, multitasking is all the rage these days!

    Let's take a moment to appreciate the genius of recording tutorials while navigating rush hour traffic. Who needs a calm, focused environment when you could be dodging potholes and merging lanes? I can just picture it: "Okay, folks, today we're going to add a clearcoat to our wet shader... but first, let’s avoid this pedestrian!" Truly inspiring.

    But back to the world of wet shaders. Apparently, the key to mastering the art of sheen is just slapping on a clearcoat and calling it a day. Why bother with the complexities of light diffusion, texture mapping, or even the nuances of realism when you can just... coat it? It's like serving a gourmet meal and then drowning it in ketchup—truly a culinary masterpiece!

    And let’s not forget the vast potential here. If adding a clearcoat is revolutionary, imagine the untapped possibilities! Why not just throw in a sprinkle of fairy dust and call it a magical shader? Or better yet, how about a “drive-by” tutorial series that teaches us how to animate while on a rollercoaster? The future of Blender tutorials is bright—especially if you’re driving towards it at 80 mph!

    After all, who needs to focus on the intricacies of shader creation when we can all just slap on a clearcoat and hope for the best? The art of 3D rendering has clearly reached a new zenith. So, to all the aspiring Blender wizards out there, remember: clearcoat is your best friend, and traffic lights are merely suggestions.

    In conclusion, if you ever find yourself needing a quick fix in Blender, just remember—there’s nothing a good clearcoat can’t solve. Just don’t forget to keep your eyes on the road; after all, we wouldn’t want you to miss a tutorial while mastering the art of shaders on the go!

    #WetShader #BlenderTutorial #Clearcoat #3DRendering #DigitalArt
    Quick Tip: A Better Wet Shader
    John Mervin probably made the shortest Blender tutorial ever ;-) You could just add a clearcoat... But why stop there? P.S. Please do not record tutorials while driving. Source
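    For anyone who wants to try the tip itself, here is a minimal Blender Python sketch that raises the clearcoat on a Principled BSDF material. The input is named "Coat Weight" in Blender 4.x and "Clearcoat" in earlier releases, and the roughness value is only a starting point, not a recommendation from the tutorial.

```python
# Minimal sketch: push the active object's material toward a "wet" look by
# raising the clearcoat on its Principled BSDF. Assumes a node-based material;
# input names differ across Blender versions.
import bpy

mat = bpy.context.object.active_material
bsdf = mat.node_tree.nodes.get("Principled BSDF")

if bsdf is not None:
    # Blender 4.x renamed "Clearcoat" to "Coat Weight".
    coat = bsdf.inputs.get("Coat Weight") or bsdf.inputs.get("Clearcoat")
    if coat is not None:
        coat.default_value = 1.0      # full clearcoat layer
    rough = bsdf.inputs.get("Coat Roughness") or bsdf.inputs.get("Clearcoat Roughness")
    if rough is not None:
        rough.default_value = 0.05    # tight, glossy highlights read as wet
```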
  • NVIDIA TensorRT Boosts Stable Diffusion 3.5 Performance on NVIDIA GeForce RTX and RTX PRO GPUs

    Generative AI has reshaped how people create, imagine and interact with digital content.
    As AI models continue to grow in capability and complexity, they require more VRAM, or video random access memory. The base Stable Diffusion 3.5 Large model, for example, uses over 18GB of VRAM — limiting the number of systems that can run it well.
    By applying quantization to the model, noncritical layers can be removed or run with lower precision. NVIDIA GeForce RTX 40 Series and the Ada Lovelace generation of NVIDIA RTX PRO GPUs support FP8 quantization to help run these quantized models, and the latest-generation NVIDIA Blackwell GPUs also add support for FP4.
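    To make the quantization idea concrete, here is a minimal, illustrative sketch of per-tensor FP8 (E4M3) weight quantization in PyTorch. TensorRT's actual pipeline applies per-layer calibration, fused kernels, and hardware-specific scheduling that this toy example does not attempt.

```python
# Minimal sketch of per-tensor FP8 (E4M3) weight quantization, to illustrate
# why quantized weights use roughly a quarter of the FP32 footprint.
import torch

def quantize_fp8(weight: torch.Tensor):
    """Scale a FP32/BF16 weight into the E4M3 range and cast to FP8."""
    fp8_max = torch.finfo(torch.float8_e4m3fn).max        # 448.0
    scale = weight.abs().max().clamp(min=1e-12) / fp8_max
    w_fp8 = (weight / scale).to(torch.float8_e4m3fn)       # 1 byte per value
    return w_fp8, scale

def dequantize_fp8(w_fp8: torch.Tensor, scale: torch.Tensor):
    return w_fp8.to(torch.float32) * scale

w = torch.randn(4096, 4096)                    # a 64 MB FP32 weight...
w_fp8, s = quantize_fp8(w)                     # ...stored in ~16 MB as FP8
print(w.element_size(), w_fp8.element_size())  # 4 vs. 1 byte per element
```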
    NVIDIA collaborated with Stability AI to quantize its latest model, Stable Diffusion (SD) 3.5 Large, to FP8 — reducing VRAM consumption by 40%. Further optimizations to SD3.5 Large and Medium with the NVIDIA TensorRT software development kit (SDK) double performance.
    In addition, TensorRT has been reimagined for RTX AI PCs, combining its industry-leading performance with just-in-time (JIT), on-device engine building and an 8x smaller package size for seamless AI deployment to more than 100 million RTX AI PCs. TensorRT for RTX is now available as a standalone SDK for developers.
    RTX-Accelerated AI
    NVIDIA and Stability AI are boosting the performance and reducing the VRAM requirements of Stable Diffusion 3.5, one of the world’s most popular AI image models. With NVIDIA TensorRT acceleration and quantization, users can now generate and edit images faster and more efficiently on NVIDIA RTX GPUs.
    Stable Diffusion 3.5 quantized to FP8 generates images in half the time with similar quality to FP16. Prompt: A serene mountain lake at sunrise, crystal clear water reflecting snow-capped peaks, lush pine trees along the shore, soft morning mist, photorealistic, vibrant colors, high resolution.
    To address the VRAM limitations of SD3.5 Large, the model was quantized with TensorRT to FP8, reducing the VRAM requirement by 40% to 11GB. This means five GeForce RTX 50 Series GPUs can run the model from memory instead of just one.
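    As a quick sanity check on the quoted figures, taking the base model's 18GB as the FP16 baseline:

```python
# A 40% reduction from the ~18GB FP16 baseline lands at the quoted 11GB.
fp16_vram_gb = 18                        # "over 18GB" for base SD3.5 Large
fp8_vram_gb = fp16_vram_gb * (1 - 0.40)
print(round(fp8_vram_gb, 1))             # 10.8 -> consistent with "to 11GB"
```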
    SD3.5 Large and Medium models were also optimized with TensorRT, an AI backend for taking full advantage of Tensor Cores. TensorRT optimizes a model’s weights and graph — the instructions on how to run a model — specifically for RTX GPUs.
    Combined, FP8 TensorRT delivers a 2.3x performance boost on SD3.5 Large compared with running the original models in BF16 PyTorch, while using 40% less memory. And in SD3.5 Medium, BF16 TensorRT provides a 1.7x performance increase compared with BF16 PyTorch.
    The optimized models are now available on Stability AI’s Hugging Face page.
    NVIDIA and Stability AI are also collaborating to release SD3.5 as an NVIDIA NIM microservice, making it easier for creators and developers to access and deploy the model for a wide range of applications. The NIM microservice is expected to be released in July.
    TensorRT for RTX SDK Released
    Announced at Microsoft Build — and already available as part of the new Windows ML framework in preview — TensorRT for RTX is now available as a standalone SDK for developers.
    Previously, developers needed to pre-generate and package TensorRT engines for each class of GPU — a process that would yield GPU-specific optimizations but required significant time.
    With the new version of TensorRT, developers can create a generic TensorRT engine that’s optimized on device in seconds. This JIT compilation approach can be done in the background during installation or when they first use the feature.
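    For context, here is a sketch of the conventional TensorRT Python flow that pre-builds an engine for the local GPU — the per-GPU step that the new SDK moves to install time or first use. The file names are placeholders, and TensorRT for RTX ships its own interfaces, so treat this only as an illustration of what engine building involves.

```python
# Illustrative only: the classic TensorRT flow that pre-builds a GPU-specific
# engine from an ONNX model. "model.onnx" / "model.engine" are placeholders.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(0)            # explicit batch (default in recent TensorRT)
parser = trt.OnnxParser(network, logger)

with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError("ONNX parse failed")

config = builder.create_builder_config()
engine = builder.build_serialized_network(network, config)  # slow, GPU-specific step
with open("model.engine", "wb") as f:
    f.write(engine)                            # what used to be shipped per GPU class
```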
    The easy-to-integrate SDK is now 8x smaller and can be invoked through Windows ML — Microsoft’s new AI inference backend in Windows. Developers can download the new standalone SDK from the NVIDIA Developer page or test it in the Windows ML preview.
    For more details, read this NVIDIA technical blog and this Microsoft Build recap.
    Join NVIDIA at GTC Paris
    At NVIDIA GTC Paris at VivaTech — Europe’s biggest startup and tech event — NVIDIA founder and CEO Jensen Huang yesterday delivered a keynote address on the latest breakthroughs in cloud AI infrastructure, agentic AI and physical AI. Watch a replay.
    GTC Paris runs through Thursday, June 12, with hands-on demos and sessions led by industry leaders. Whether attending in person or joining online, there’s still plenty to explore at the event.
    Each week, the RTX AI Garage blog series features community-driven AI innovations and content for those looking to learn more about NVIDIA NIM microservices and AI Blueprints, as well as building AI agents, creative workflows, digital humans, productivity apps and more on AI PCs and workstations. 
    #nvidia #tensorrt #boosts #stable #diffusion
  • Rethinking AI: DeepSeek’s playbook shakes up the high-spend, high-compute paradigm


    When DeepSeek released its R1 model this January, it wasn’t just another AI announcement. It was a watershed moment that sent shockwaves through the tech industry, forcing industry leaders to reconsider their fundamental approaches to AI development.
    What makes DeepSeek’s accomplishment remarkable isn’t that the company developed novel capabilities; rather, it was how it achieved comparable results to those delivered by tech heavyweights at a fraction of the cost. In reality, DeepSeek didn’t do anything that hadn’t been done before; its innovation stemmed from pursuing different priorities. As a result, we are now experiencing rapid-fire development along two parallel tracks: efficiency and compute. 
    As DeepSeek prepares to release its R2 model, and as it concurrently faces the potential of even greater chip restrictions from the U.S., it’s important to look at how it captured so much attention.
    Engineering around constraints
    DeepSeek’s arrival, as sudden and dramatic as it was, captivated us all because it showcased the capacity for innovation to thrive even under significant constraints. Faced with U.S. export controls limiting access to cutting-edge AI chips, DeepSeek was forced to find alternative pathways to AI advancement.
    While U.S. companies pursued performance gains through more powerful hardware, bigger models and better data, DeepSeek focused on optimizing what was available. It implemented known ideas with remarkable execution — and there is novelty in executing what’s known and doing it well.
    This efficiency-first mindset yielded incredibly impressive results. DeepSeek’s R1 model reportedly matches OpenAI’s capabilities at just 5 to 10% of the operating cost. According to reports, the final training run for DeepSeek’s V3 predecessor cost a mere $6 million — which was described by former Tesla AI scientist Andrej Karpathy as “a joke of a budget” compared to the tens or hundreds of millions spent by U.S. competitors. More strikingly, while OpenAI reportedly spent $500 million training its recent “Orion” model, DeepSeek achieved superior benchmark results for just $5.6 million — less than 1.2% of OpenAI’s investment.
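    A quick check of the quoted ratio, using the reported figures:

```python
# Quick check on the quoted training-cost ratio.
deepseek_run_musd = 5.6    # reported cost of DeepSeek V3's final training run
openai_orion_musd = 500    # reported spend on OpenAI's "Orion" training
print(f"{deepseek_run_musd / openai_orion_musd:.2%}")  # 1.12% -> "less than 1.2%"
```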
    If you get starry-eyed believing these incredible results were achieved even as DeepSeek was at a severe disadvantage based on its inability to access advanced AI chips, I hate to tell you, but that narrative isn’t entirely accurate (even though it makes a good story). Initial U.S. export controls focused primarily on compute capabilities, not on memory and networking — two crucial components for AI development.
    That means that the chips DeepSeek had access to were not poor quality chips; their networking and memory capabilities allowed DeepSeek to parallelize operations across many units, a key strategy for running their large model efficiently.
    This, combined with China’s national push toward controlling the entire vertical stack of AI infrastructure, resulted in accelerated innovation that many Western observers didn’t anticipate. DeepSeek’s advancements were an inevitable part of AI development, but they brought known advancements forward a few years earlier than would have been possible otherwise, and that’s pretty amazing.
    Pragmatism over process
    Beyond hardware optimization, DeepSeek’s approach to training data represents another departure from conventional Western practices. Rather than relying solely on web-scraped content, DeepSeek reportedly leveraged significant amounts of synthetic data and outputs from other proprietary models. This is a classic example of model distillation, or the ability to learn from really powerful models. Such an approach, however, raises questions about data privacy and governance that might concern Western enterprise customers. Still, it underscores DeepSeek’s overall pragmatic focus on results over process.
    The effective use of synthetic data is a key differentiator. Synthetic data can be very effective when it comes to training large models, but you have to be careful; some model architectures handle synthetic data better than others. For instance, transformer-based models with mixture of experts (MoE) architectures like DeepSeek’s tend to be more robust when incorporating synthetic data, while more traditional dense architectures like those used in early Llama models can experience performance degradation or even “model collapse” when trained on too much synthetic content.
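    For readers unfamiliar with the architecture family, here is a minimal top-k mixture-of-experts layer in PyTorch. It is a generic illustration of sparse expert routing, not DeepSeek's implementation, and the sizes are arbitrary.

```python
# Minimal sketch of a top-k mixture-of-experts (MoE) layer: a router picks
# k of n experts per token, so only a fraction of parameters run per token.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)  # gating network
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                             # x: (tokens, d_model)
        gate = F.softmax(self.router(x), dim=-1)      # routing probabilities
        weights, idx = gate.topk(self.k, dim=-1)      # k experts per token
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for slot in range(self.k):                    # dispatch tokens to their experts
            for e in range(len(self.experts)):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
        return out

y = TopKMoE()(torch.randn(16, 512))  # 16 tokens; only 2 of 8 experts run per token
```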
    This architectural sensitivity matters because synthetic data introduces different patterns and distributions compared to real-world data. When a model architecture doesn’t handle synthetic data well, it may learn shortcuts or biases present in the synthetic data generation process rather than generalizable knowledge. This can lead to reduced performance on real-world tasks, increased hallucinations or brittleness when facing novel situations. 
    Still, DeepSeek’s engineering teams reportedly designed their model architecture specifically with synthetic data integration in mind from the earliest planning stages. This allowed the company to leverage the cost benefits of synthetic data without sacrificing performance.
    Market reverberations
    Why does all of this matter? Stock market aside, DeepSeek’s emergence has triggered substantive strategic shifts among industry leaders.
    Case in point: OpenAI. Sam Altman recently announced plans to release the company’s first “open-weight” language model since 2019. This is a pretty notable pivot for a company that built its business on proprietary systems. It seems DeepSeek’s rise, on top of Llama’s success, has hit OpenAI’s leader hard. Just a month after DeepSeek arrived on the scene, Altman admitted that OpenAI had been “on the wrong side of history” regarding open-source AI. 
    With OpenAI reportedly spending $7 billion to $8 billion annually on operations, the economic pressure from efficient alternatives like DeepSeek has become impossible to ignore. As AI scholar Kai-Fu Lee bluntly put it: “You’re spending $7 billion or $8 billion a year, making a massive loss, and here you have a competitor coming in with an open-source model that’s for free.” This necessitates change.
    This economic reality prompted OpenAI to pursue a massive $40 billion funding round that valued the company at an unprecedented $300 billion. But even with a war chest of funds at its disposal, the fundamental challenge remains: OpenAI’s approach is dramatically more resource-intensive than DeepSeek’s.
    Beyond model training
    Another significant trend accelerated by DeepSeek is the shift toward “test-time compute” (TTC). As major AI labs have now trained their models on much of the available public data on the internet, data scarcity is slowing further improvements in pre-training.
    To get around this, DeepSeek announced a collaboration with Tsinghua University to enable “self-principled critique tuning” (SPCT). This approach trains AI to develop its own rules for judging content and then uses those rules to provide detailed critiques. The system includes a built-in “judge” that evaluates the AI’s answers in real time, comparing responses against core rules and quality standards.
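    A rough sketch of what such a loop can look like in practice, with `llm` as a hypothetical text-completion callable standing in for any language model; this illustrates the generate-principles-critique-score pattern described above, not DeepSeek's code.

```python
# Illustrative-only sketch of a self-critique reward loop in the spirit of
# SPCT. `llm` is a hypothetical callable: prompt string in, completion out.
def score_with_self_derived_principles(llm, prompt: str, answer: str) -> float:
    # 1. The model writes its own judging principles for this prompt.
    principles = llm(f"List the principles a good answer to this prompt must satisfy:\n{prompt}")
    # 2. A built-in "judge" critiques the answer against those principles.
    critique = llm(
        f"Principles:\n{principles}\n\nPrompt:\n{prompt}\n\nAnswer:\n{answer}\n\n"
        "Critique the answer against each principle and end with SCORE: <0-10>."
    )
    # 3. Parse the numeric reward used to rank or refine candidate answers.
    return float(critique.rsplit("SCORE:", 1)[-1].strip().split()[0])

def best_of_n(llm, prompt: str, n: int = 4) -> str:
    candidates = [llm(prompt) for _ in range(n)]  # spend extra test-time compute
    return max(candidates, key=lambda a: score_with_self_derived_principles(llm, prompt, a))
```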
    The development is part of a movement toward autonomous self-evaluation and improvement in AI systems in which models use inference time to improve results, rather than simply making models larger during training. DeepSeek calls its system “DeepSeek-GRM” (generalist reward modeling). But, as with its model distillation approach, this could be considered a mix of promise and risk.
    For example, if the AI develops its own judging criteria, there’s a risk those principles diverge from human values, ethics or context. The rules could end up being overly rigid or biased, optimizing for style over substance, and/or reinforce incorrect assumptions or hallucinations. Additionally, without a human in the loop, issues could arise if the “judge” is flawed or misaligned. It’s a kind of AI talking to itself, without robust external grounding. On top of this, users and developers may not understand why the AI reached a certain conclusion — which feeds into a bigger concern: Should an AI be allowed to decide what is “good” or “correct” based solely on its own logic? These risks shouldn’t be discounted.
    At the same time, this approach is gaining traction, as DeepSeek again builds on the body of work of others (think OpenAI’s “critique and revise” methods, Anthropic’s constitutional AI or research on self-rewarding agents) to create what is likely the first full-stack application of SPCT in a commercial effort.
    This could mark a powerful shift in AI autonomy, but there still is a need for rigorous auditing, transparency and safeguards. It’s not just about models getting smarter, but that they remain aligned, interpretable, and trustworthy as they begin critiquing themselves without human guardrails.
    Moving into the future
    So, taking all of this into account, the rise of DeepSeek signals a broader shift in the AI industry toward parallel innovation tracks. While companies continue building more powerful compute clusters for next-generation capabilities, there will also be intense focus on finding efficiency gains through software engineering and model architecture improvements to offset the challenges of AI energy consumption, which far outpaces power generation capacity. 
    Companies are taking note. Microsoft, for example, has halted data center development in multiple regions globally, recalibrating toward a more distributed, efficient infrastructure approach. While still planning to invest approximately $80 billion in AI infrastructure this fiscal year, the company is reallocating resources in response to the efficiency gains DeepSeek introduced to the market.
    Meta has also responded,
    With so much movement in such a short time, it becomes somewhat ironic that the U.S. sanctions designed to maintain American AI dominance may have instead accelerated the very innovation they sought to contain. By constraining access to materials, DeepSeek was forced to blaze a new trail.
    Moving forward, as the industry continues to evolve globally, adaptability for all players will be key. Policies, people and market reactions will continue to shift the ground rules — whether it’s eliminating the AI diffusion rule, a new ban on technology purchases or something else entirely. It’s what we learn from one another and how we respond that will be worth watching.
    Jae Lee is CEO and co-founder of TwelveLabs.

    #rethinking #deepseeks #playbook #shakes #highspend
  • ByteDance Researchers Introduce DetailFlow: A 1D Coarse-to-Fine Autoregressive Framework for Faster, Token-Efficient Image Generation

    Autoregressive image generation has been shaped by advances in sequential modeling, originally seen in natural language processing. This field focuses on generating images one token at a time, similar to how sentences are constructed in language models. The appeal of this approach lies in its ability to maintain structural coherence across the image while allowing for high levels of control during the generation process. As researchers began to apply these techniques to visual data, they found that structured prediction not only preserved spatial integrity but also supported tasks like image manipulation and multimodal translation effectively.
    Despite these benefits, generating high-resolution images remains computationally expensive and slow. A primary issue is the number of tokens needed to represent complex visuals. Raster-scan methods that flatten 2D images into linear sequences require thousands of tokens for detailed images, resulting in long inference times and high memory consumption. Models like Infinity need over 10,000 tokens for a 1024×1024 image. This becomes unsustainable for real-time applications or when scaling to more extensive datasets. Reducing the token burden while preserving or improving output quality has become a pressing challenge.
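To see why the token count balloons, consider a quick back-of-the-envelope calculation. The sketch below is purely illustrative; the downsampling factors are assumptions rather than figures from any specific model, but they show how one-token-per-patch raster scanning grows quadratically with resolution.

```python
# Rough illustration of why raster-scan token counts explode with resolution.
# The downsampling factors below are assumptions for illustration, not values
# taken from Infinity, VAR, or DetailFlow.

def raster_scan_tokens(resolution: int, downsample: int) -> int:
    """One token per latent patch: (resolution // downsample) ** 2 tokens."""
    side = resolution // downsample
    return side * side

for res in (256, 512, 1024):
    for ds in (16, 8):
        print(f"{res}x{res} image, {ds}x downsampling -> {raster_scan_tokens(res, ds):,} tokens")

# With 8x downsampling, a 1024x1024 image already needs 16,384 tokens,
# the same order of magnitude as the 10,000+ tokens cited for Infinity above.
```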

    Efforts to mitigate token inflation have led to innovations like next-scale prediction seen in VAR and FlexVAR. These models create images by predicting progressively finer scales, which imitates the human tendency to sketch rough outlines before adding detail. However, they still rely on hundreds of tokens—680 in the case of VAR and FlexVAR for 256×256 images. Moreover, approaches like TiTok and FlexTok use 1D tokenization to compress spatial redundancy, but they often fail to scale efficiently. For example, FlexTok’s gFID increases from 1.9 at 32 tokens to 2.5 at 256 tokens, highlighting a degradation in output quality as the token count grows.
    Researchers from ByteDance introduced DetailFlow, a 1D autoregressive image generation framework. This method arranges token sequences from global to fine detail using a process called next-detail prediction. Unlike traditional 2D raster-scan or scale-based techniques, DetailFlow employs a 1D tokenizer trained on progressively degraded images. This design allows the model to prioritize foundational image structures before refining visual details. By mapping tokens directly to resolution levels, DetailFlow significantly reduces token requirements, enabling images to be generated in a semantically ordered, coarse-to-fine manner.
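A rough sketch of the coarse-to-fine idea follows. The mapping function here is a hypothetical example chosen so that pixel count grows roughly linearly with token count; it is not the mapping used in the paper, but it illustrates how each prefix of the 1D token sequence can correspond to a target resolution, so decoding more tokens simply means reconstructing a sharper image.

```python
import math

# Hypothetical token-count-to-resolution mapping for a coarse-to-fine 1D
# tokenizer. DetailFlow ties the number of decoded tokens to a target
# resolution; the square-root form and the constants below are illustrative
# assumptions, not the mapping used in the paper.

def target_resolution(num_tokens: int, base_tokens: int = 16,
                      base_res: int = 64, max_res: int = 256) -> int:
    """Resolution that the first `num_tokens` tokens are trained to reconstruct."""
    res = base_res * math.sqrt(num_tokens / base_tokens)
    res = 16 * round(res / 16)          # snap to a patch-friendly multiple of 16
    return min(max_res, max(base_res, res))

for n in (16, 32, 64, 128, 256):
    r = target_resolution(n)
    print(f"{n:4d} tokens -> {r}x{r} target resolution")
```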

    The mechanism in DetailFlow centers on a 1D latent space where each token contributes incrementally more detail. Earlier tokens encode global features, while later tokens refine specific visual aspects. To train this, the researchers created a resolution mapping function that links token count to target resolution. During training, the model is exposed to images of varying quality levels and learns to predict progressively higher-resolution outputs as more tokens are introduced. It also implements parallel token prediction by grouping sequences and predicting entire sets at once. Since parallel prediction can introduce sampling errors, a self-correction mechanism was integrated. This system perturbs certain tokens during training and teaches subsequent tokens to compensate, ensuring that final images maintain structural and visual integrity.
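Below is a minimal sketch of how grouped decoding and the self-correction perturbation could fit together, assuming a generic autoregressive model interface. The tiny GRU backbone, vocabulary size, group size and perturbation rate are all placeholder assumptions for illustration, not details from the DetailFlow implementation.

```python
import torch
import torch.nn as nn

# Illustrative sketch of grouped (semi-parallel) decoding plus a training-time
# perturbation for self-correction, in the spirit of DetailFlow. All names and
# hyperparameters here are placeholders, not the paper's architecture.

VOCAB, SEQ_LEN, GROUP = 512, 128, 8          # 128 1D tokens, decoded 8 at a time

class TinyARModel(nn.Module):
    """Stand-in for the causal transformer that predicts the next token group."""
    def __init__(self, vocab: int = VOCAB, dim: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.backbone = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, vocab)

    def forward(self, tokens):                           # (B, T) -> (B, T, vocab)
        h, _ = self.backbone(self.embed(tokens))
        return self.head(h)

@torch.no_grad()
def decode_grouped(model, batch_size: int = 2):
    """Coarse-to-fine decoding: sample GROUP tokens at once from the same prefix."""
    seq = torch.zeros(batch_size, 1, dtype=torch.long)   # start token (assumed id 0)
    while seq.shape[1] < SEQ_LEN:
        logits = model(seq)[:, -1]                       # prediction after the prefix
        probs = torch.softmax(logits, dim=-1)
        # Sample a whole group in parallel from the same conditional distribution;
        # within-group tokens do not see each other, which is where the sampling
        # errors that self-correction must absorb come from.
        group = torch.multinomial(probs, GROUP, replacement=True)
        seq = torch.cat([seq, group], dim=1)
    return seq[:, 1:SEQ_LEN + 1]

def perturb_tokens(tokens, rate: float = 0.1):
    """Training-time corruption: later tokens learn to compensate for these errors."""
    noise = torch.randint_like(tokens, VOCAB)
    mask = torch.rand(tokens.shape) < rate
    return torch.where(mask, noise, tokens)

model = TinyARModel()
generated = decode_grouped(model)
corrupted = perturb_tokens(generated)
print(generated.shape, (corrupted != generated).float().mean().item())
```

The within-group independence in the decoding loop is exactly what introduces the sampling errors the perturbation is meant to emulate, which is why the two pieces are trained together.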
    The results from the experiments on the ImageNet 256×256 benchmark were noteworthy. DetailFlow achieved a gFID score of 2.96 using only 128 tokens, outperforming VAR at 3.3 and FlexVAR at 3.05, both of which used 680 tokens. Even more impressive, DetailFlow-64 reached a gFID of 2.62 using 512 tokens. In terms of speed, it delivered nearly double the inference rate of VAR and FlexVAR. A further ablation study confirmed that the self-correction training and semantic ordering of tokens substantially improved output quality. For example, enabling self-correction dropped the gFID from 4.11 to 3.68 in one setting. These metrics demonstrate both higher quality and faster generation compared to established models.

    By focusing on semantic structure and reducing redundancy, DetailFlow presents a viable solution to long-standing issues in autoregressive image generation. The method’s coarse-to-fine approach, efficient parallel decoding, and ability to self-correct highlight how architectural innovations can address performance and scalability limitations. Through their structured use of 1D tokens, the researchers from ByteDance have demonstrated a model that maintains high image fidelity while significantly reducing computational load, making it a valuable addition to image synthesis research.

Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project.
Nikhil is an intern consultant at Marktechpost. He is pursuing an integrated dual degree in Materials at the Indian Institute of Technology, Kharagpur. Nikhil is an AI/ML enthusiast who is always researching applications in fields like biomaterials and biomedical science. With a strong background in materials science, he is exploring new advancements and creating opportunities to contribute.
  • Crafting Atmospheric Images with Fog and Mist

    Fog and mist offer photographers an exceptional opportunity to create deeply atmospheric, moody, and mysterious imagery. By embracing these unique weather conditions, you can transform ordinary scenes into captivating visual stories filled with depth, emotion, and intrigue. Here’s how to effectively harness fog and mist to elevate your photography.

    Understanding the Appeal of Fog and Mist
    Fog and mist naturally diffuse light, softening contrasts and textures within a scene. This soft diffusion creates a dreamlike ambiance and adds emotional depth to photographs, often evoking feelings of tranquility, solitude, or mystery. By obscuring and revealing elements selectively, fog and mist invite viewers to engage deeply with the visual narrative, filling in unseen details with imagination.
    Ideal Conditions and Timing
    The most atmospheric fog and mist usually occur during early mornings or evenings, especially near bodies of water or in valleys and lowlands. Paying close attention to weather forecasts can help you predict ideal conditions. Early preparation and scouting locations ahead of time ensure you’re ready when the perfect atmospheric conditions arise.

    Composition Techniques for Foggy Scenes
    Creating impactful foggy compositions involves thoughtful techniques:

    Layers and Depth: Use the fog’s varying densities to emphasize depth. Layering foreground, midground, and background elements adds visual complexity and interest.
    Silhouettes and Shapes: Fog reduces detail, emphasizing strong shapes and silhouettes. Compose your images around distinctive shapes, trees, or structures to anchor your photograph.
    Simplify the Frame: Minimalism is especially effective in foggy conditions. Embrace simplicity by isolating single elements or subjects against misty backdrops for dramatic effect.

    Lighting and Exposure Considerations
    Fog significantly impacts exposure and lighting conditions:

    Soft Light: The diffused, gentle lighting conditions in fog and mist reduce harsh shadows, creating flattering, ethereal images.
    Exposure Compensation: Fog often tricks camera meters into underexposing scenes, because the meter tries to render all that brightness as a middle gray. Consider dialing in a little positive exposure compensation, typically around +2/3 to +1 stop, to accurately capture the brightness and subtle details of foggy conditions.

    Creative Opportunities with Fog and Mist
    Fog and mist open diverse creative possibilities:

    Black-and-White Photography: Foggy conditions lend themselves exceptionally well to monochrome photography, highlighting contrasts, textures, and shapes dramatically.
    Color Tones and Mood: Color images in foggy conditions can carry gentle pastel tones or cool hues, enhancing the atmospheric and emotional impact of your imagery.

    Enhancing Foggy Images Through Post-Processing
    Post-processing can refine foggy scenes:

    Contrast and Clarity Adjustments: Fine-tune contrast and clarity subtly to maintain the softness and mood without losing important detail.
    Selective Sharpening: Apply selective sharpening to key elements or subjects, ensuring they stand out within the foggy environment without diminishing the atmospheric quality.

    Fog and mist provide photographers with unique conditions to craft images rich in mood, narrative, and visual intrigue. By thoughtfully considering composition, timing, lighting, and post-processing, you can harness the power of fog and mist to produce atmospheric photography that deeply resonates with viewers. Embrace these ethereal elements, and transform everyday scenes into extraordinary visual stories.
    Extended reading: Alpine views: 12 breathtaking mountainscapes to celebrate the chilly season
CGShares https://cgshares.com