• Ah, the wonders of modern gaming! Who would have thought that the secret to uniting a million people would be simply to toss a digital soccer ball around? Enter "Rematch," the latest sensation that has whisked a million souls away from the harsh realities of life into the pixelated perfection of football. It’s like Rocket League had a baby with FIFA, and now we have a game that claims to bring us all together — because who needs genuine human interaction when you can kick a virtual ball?

    Let’s take a moment to appreciate the brilliance behind this phenomenon. After countless years of research, gaming experts finally discovered that people *actually* enjoy playing football. Shocking, right? It’s not like football has been the most popular sport in the world for, oh, I don’t know, ever. But hey, let’s applaud the genius who looked at Rocket League and thought, "Why don’t we add a ball that actually resembles a soccer ball?"

    With Rematch, we’ve moved past the days of traditional socializing. Why grab a pint with friends when you can huddle in your living room, staring at a screen, pretending to be David Beckham while never actually getting off the couch? The thrill of the game has never been so… sedentary. And who needs to break a sweat when the only thing you’ll be sweating over is how to outmaneuver your fellow couch potatoes with your fancy footwork?

    Now, let’s talk about the social implications. One million people have flocked to Rematch, which means that for every goal scored, there’s a lonely soul who just sat through another week of awkward small talk at the office, wishing they too could be playing digital soccer instead of discussing weekend plans. Talk about a win-win! You can bond with your online teammates while simultaneously avoiding real-life conversations. It’s like the ultimate social life hack!

    But wait, there’s more! The marketing team behind Rematch must be patting themselves on the back for this one. A game that can turn sitting in your pajamas into an epic communal experience? Bravo! It’s almost poetic to think that millions of people are now united over pixelated football matches while ignoring their actual neighbors. Who knew that a digital platform could replace not just a football field but also a community center?

    In conclusion, as we celebrate the monumental achievement of Rematch bringing together one million players, let’s also take a moment to reflect on what we’ve sacrificed for this pixelated paradise: actual human interaction, the smell of fresh grass, and the sweet sound of a whistle blowing on a real field. But hey, at least we’re saving the planet one digital kick at a time, right?

    #Rematch #DigitalSoccer #GamingCommunity #PixelatedFootball #SoccerRevolution
    Already 1 million people on Rematch, the football game is bringing lots of people together
    ActuGaming.net: Already 1 million people on Rematch, the football game is bringing lots of people together. Rematch starts from an idea so good, and yet so obvious after the success of Rocket […]
  • Review: The Legend Of Zelda: Breath Of The Wild - Nintendo Switch 2 Edition - The Best Way To Play, But 'Zelda Notes' Sucks

    Tri-forced new content. Editor's Note: As this is our first 'Nintendo Switch 2 Edition' review (and with many more to come, potentially), we want to outline their focus from the off. Our existing reviews — in this case for Breath of the Wild, both the Wii U and Switch versions, and the DLC — are still live and relevant. If you're coming to this game fresh, we recommend you look back at those first. Our NS2 Edition reviews will focus on new features and upgrades, evaluating what they add to the original experience, and whether they're worth your time - and they will be scored accordingly. Read the full article on nintendolife.com
    #review #legend #zelda #breath #wild
  • BenchmarkQED: Automated benchmarking of RAG systems

    One of the key use cases for generative AI involves answering questions over private datasets, with retrieval-augmented generation (RAG) as the go-to framework. As new RAG techniques emerge, there’s a growing need to benchmark their performance across diverse datasets and metrics.
    To meet this need, we’re introducing BenchmarkQED, a new suite of tools that automates RAG benchmarking at scale, available on GitHub. It includes components for query generation, evaluation, and dataset preparation, each designed to support rigorous, reproducible testing.  
    BenchmarkQED complements the RAG methods in our open-source GraphRAG library, enabling users to run a GraphRAG-style evaluation across models, metrics, and datasets. GraphRAG uses a large language model (LLM) to generate and summarize entity-based knowledge graphs, producing more comprehensive and diverse answers than standard RAG for large-scale tasks.
    In this post, we walk through the core components of BenchmarkQED that contribute to the overall benchmarking process. We also share some of the latest benchmark results comparing our LazyGraphRAG system to competing methods, including a vector-based RAG with a 1M-token context window, where the leading LazyGraphRAG configuration showed significant win rates across all combinations of quality metrics and query classes.
    In the paper, we distinguish between local queries, where answers are found in a small number of text regions, and sometimes even a single region, and global queries, which require reasoning over large portions of or even the entire dataset. 
    Conventional vector-based RAG excels at local queries because the regions containing the answer to the query resemble the query itself and can be retrieved as the nearest neighbor in the vector space of text embeddings. However, it struggles with global questions, such as, “What are the main themes of the dataset?” which require understanding dataset qualities not explicitly stated in the text.  
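    To make the local-query case concrete, here is a minimal, self-contained sketch of the nearest-neighbor retrieval that vector-based RAG relies on. The toy 2-D vectors stand in for real text embeddings, and a production system would use an approximate nearest-neighbor index rather than this linear scan:

```python
import math

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def retrieve(query_vec, chunk_vecs, k=2):
    # Rank chunks by similarity to the query; return the top-k indices.
    ranked = sorted(range(len(chunk_vecs)),
                    key=lambda i: cosine(query_vec, chunk_vecs[i]),
                    reverse=True)
    return ranked[:k]

# Toy 2-D "embeddings": the query is closest to chunks 0 and 2.
chunks = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.3]]
print(retrieve([1.0, 0.0], chunks))  # → [0, 2]
```

    A global question ("What are the main themes?") has no single nearby chunk, which is exactly why this mechanism struggles there.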
    AutoQ: Automated query synthesis
    This limitation motivated the development of GraphRAG, a system designed to answer global queries. GraphRAG’s evaluation requirements subsequently led to the creation of AutoQ, a method for synthesizing these global queries for any dataset.
    AutoQ extends this approach by generating synthetic queries across the spectrum of queries, from local to global. It defines four distinct classes based on the source and scope of the query (Figure 1, top), forming a logical progression along the spectrum (Figure 1, bottom).
    Figure 1. Construction of a 2×2 design space for synthetic query generation with AutoQ, showing how the four resulting query classes map onto the local-global query spectrum. 
    AutoQ can be configured to generate any number and distribution of synthetic queries along these classes, enabling consistent benchmarking across datasets without requiring user customization. Figure 2 shows the synthesis process and sample queries from each class, using an AP News dataset.
    Figure 2. Synthesis process and example query for each of the four AutoQ query classes. 
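    As an illustration only (BenchmarkQED's actual configuration API may differ), a query-class specification along the lines AutoQ describes could be sketched as follows. The text names DataLocal and ActivityLocal explicitly; the other two class names here complete the assumed 2×2 design of Figure 1, and the counts match the 50-per-class setup used in the evaluation below:

```python
from dataclasses import dataclass

# Hypothetical illustration only: BenchmarkQED's real API may differ.
@dataclass
class QueryClassSpec:
    name: str   # query class, e.g. "DataLocal"
    count: int  # number of synthetic queries to generate for this class

# A 2x2 spec: query source (data vs. activity) x query scope (local vs. global),
# with 50 queries per class as in the AP News evaluation.
spec = [
    QueryClassSpec("DataLocal", 50),
    QueryClassSpec("DataGlobal", 50),
    QueryClassSpec("ActivityLocal", 50),
    QueryClassSpec("ActivityGlobal", 50),
]
total = sum(c.count for c in spec)
print(total)  # → 200
```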

    AutoE: Automated evaluation framework 
    Our evaluation of GraphRAG focused on analyzing key qualities of answers to global questions. The following qualities were used for the current evaluation:

    Comprehensiveness: Does the answer address all relevant aspects of the question? 
    Diversity: Does it present varied perspectives or insights? 
    Empowerment: Does it help the reader understand and make informed judgments? 
    Relevance: Does it address what the question is specifically asking?  

    The AutoE component scales evaluation of these qualities using the LLM-as-a-Judge method. It presents pairs of answers to an LLM, along with the query and target metric, in counterbalanced order. The model determines whether the first answer wins, loses, or ties with the second. Over a set of queries, whether from AutoQ or elsewhere, this produces win rates between competing methods. When ground truth is available, AutoE can also score answers on correctness, completeness, and related metrics.
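    A minimal sketch of this counterbalanced protocol follows. The `judge` function here is a trivial length-based stand-in so the sketch runs; the real judge is an LLM prompt, not this heuristic:

```python
def judge(query, metric, answer_a, answer_b):
    # Placeholder heuristic standing in for an LLM judge call.
    # Returns which *presented* answer wins: "first", "second", or "tie".
    if len(answer_a) == len(answer_b):
        return "tie"
    return "first" if len(answer_a) > len(answer_b) else "second"

def compare(query, metric, ans1, ans2):
    # Present the pair in both orders to cancel position bias,
    # recording each trial's outcome from ans1's perspective.
    results = []
    for a, b, flipped in [(ans1, ans2, False), (ans2, ans1, True)]:
        verdict = judge(query, metric, a, b)
        if verdict == "tie":
            results.append("tie")
        elif (verdict == "first") != flipped:
            results.append("win")
        else:
            results.append("loss")
    return results

print(compare("q", "comprehensiveness", "a longer answer", "short"))  # → ['win', 'win']
```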
    An illustrative evaluation is shown in Figure 3. Using a dataset of 1,397 AP News articles on health and healthcare, AutoQ generated 50 queries per class (200 total). AutoE then compared LazyGraphRAG to a competing RAG method, running six trials per query across four metrics, using GPT-4.1 as a judge.
    These trial-level results were aggregated using metric-based win rates, where each trial is scored 1 for a win, 0.5 for a tie, and 0 for a loss, and then averaged to calculate the overall win rate for each RAG method.
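    That aggregation step is simple enough to state in a few lines; this sketch computes a win rate from a list of trial outcomes using the 1 / 0.5 / 0 scoring just described:

```python
def win_rate(trials):
    # Each trial outcome scores 1 (win), 0.5 (tie), 0 (loss);
    # the win rate is the mean score over all trials.
    score = {"win": 1.0, "tie": 0.5, "loss": 0.0}
    return sum(score[t] for t in trials) / len(trials)

# Toy example with 4 trials; the real runs aggregate
# 6 trials per query over 50 queries per class.
print(win_rate(["win", "win", "tie", "loss"]))  # → 0.625
```

    A win rate above 0.5 means the method beat its comparison condition on that metric more often than not.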
    Figure 3. Win rates of four LazyGraphRAG (LGR) configurations across methods, broken down by the AutoQ query class and averaged across AutoE’s four metrics: comprehensiveness, diversity, empowerment, and relevance. LazyGraphRAG outperforms comparison conditions where the bar is above 50%.
    The four LazyGraphRAG conditions (LGR_b200_c200, LGR_b50_c200, LGR_b50_c600, LGR_b200_c200_mini) differ by query budget (b50, b200) and chunk size (c200, c600). All used GPT-4o mini for relevance tests and GPT-4o for query expansion (to five subqueries) and answer generation, except for LGR_b200_c200_mini, which used GPT-4o mini throughout.
    Comparison systems were GraphRAG (Local, Global, and Drift Search), Vector RAG with 8k- and 120k-token windows, and three published methods: LightRAG, RAPTOR, and TREX. All methods were limited to the same 8k tokens for answer generation. GraphRAG Global Search used level 2 of the community hierarchy.
    LazyGraphRAG outperformed every comparison condition using the same generative model (GPT-4o), winning all 96 comparisons, with all but one reaching statistical significance. The best overall performance came from the larger budget, smaller chunk size configuration (LGR_b200_c200). For DataLocal queries, the smaller budget (LGR_b50_c200) performed slightly better, likely because fewer chunks were relevant. For ActivityLocal queries, the larger chunk size (LGR_b50_c600) had a slight edge, likely because longer chunks provide a more coherent context.
    Competing methods performed relatively better on the query classes for which they were designed: GraphRAG Global for global queries, Vector RAG for local queries, and GraphRAG Drift Search, which combines both strategies, posed the strongest challenge overall.
    Increasing Vector RAG’s context window from 8k to 120k tokens did not improve its performance compared to LazyGraphRAG. This raised the question of how LazyGraphRAG would perform against Vector RAG with a 1M-token context window containing most of the dataset.
    Figure 4 shows the follow-up experiment comparing LazyGraphRAG to Vector RAG, both using GPT-4.1, whose longer context window enabled this comparison. Even against the 1M-token window, LazyGraphRAG achieved higher win rates across all comparisons, failing to reach significance only for the relevance of answers to DataLocal queries. These queries tend to benefit most from Vector RAG’s ranking of directly relevant chunks, making it hard for LazyGraphRAG to generate answers that have greater relevance to the query, even though these answers may be dramatically more comprehensive, diverse, and empowering overall.
    Figure 4. Win rates of LazyGraphRAG (LGR) over Vector RAG across different context window sizes, broken down by the four AutoQ query classes and four AutoE metrics: comprehensiveness, diversity, empowerment, and relevance. Bars above 50% indicate that LazyGraphRAG outperformed the comparison condition.
    AutoD: Automated data sampling and summarization
    Text datasets have an underlying topical structure, but the depth, breadth, and connectivity of that structure can vary widely. This variability makes it difficult to evaluate RAG systems consistently, as results may reflect the idiosyncrasies of the dataset rather than the system’s general capabilities.
    The AutoD component addresses this by sampling datasets to meet a target specification, defined by the number of topic clusters (breadth) and the number of samples per cluster (depth). This creates consistency across datasets, enabling more meaningful comparisons, as structurally aligned datasets lead to comparable AutoQ queries, which in turn support consistent AutoE evaluations.
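    A hypothetical sketch of sampling to such a target specification (the actual AutoD selection strategy is not detailed here): take the largest topic clusters for breadth and a fixed number of documents from each for depth:

```python
def sample_to_spec(clusters, n_clusters, per_cluster):
    # Hypothetical AutoD-style sampling: keep the n_clusters largest
    # topic clusters (breadth) and per_cluster documents from each (depth).
    biggest = sorted(clusters, key=len, reverse=True)[:n_clusters]
    return [doc for cluster in biggest for doc in cluster[:per_cluster]]

# Pre-clustered toy corpus: three topic clusters of unequal size.
clusters = [["a1", "a2", "a3"], ["b1"], ["c1", "c2"]]
print(sample_to_spec(clusters, n_clusters=2, per_cluster=2))  # → ['a1', 'a2', 'c1', 'c2']
```

    Because two datasets sampled to the same (breadth, depth) spec have aligned topical structure, AutoQ queries and AutoE results become comparable across them.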
    AutoD also includes tools for summarizing input or output datasets in a way that reflects their topical coverage. These summaries play an important role in the AutoQ query synthesis process, but they can also be used more broadly, such as in prompts where context space is limited.
    Since the release of the GraphRAG paper, we’ve received many requests to share the dataset of the Behind the Tech podcast transcripts we used in our evaluation. An updated version of this dataset is now available in the BenchmarkQED repository, alongside the AP News dataset containing 1,397 health-related articles, licensed for open release.  
    We hope these datasets, together with the BenchmarkQED tools, help accelerate benchmark-driven development of RAG systems and AI question-answering. We invite the community to try them on GitHub. 
    BenchmarkQED: Automated benchmarking of RAG systems
    One of the key use cases for generative AI involves answering questions over private datasets, with retrieval-augmented generation as the go-to framework. As new RAG techniques emerge, there’s a growing need to benchmark their performance across diverse datasets and metrics.  To meet this need, we’re introducing BenchmarkQED, a new suite of tools that automates RAG benchmarking at scale, available on GitHub. It includes components for query generation, evaluation, and dataset preparation, each designed to support rigorous, reproducible testing.   BenchmarkQED complements the RAG methods in our open-source GraphRAG library, enabling users to run a GraphRAG-style evaluation across models, metrics, and datasets. GraphRAG uses a large language model to generate and summarize entity-based knowledge graphs, producing more comprehensive and diverse answers than standard RAG for large-scale tasks.  In this post, we walk through the core components of BenchmarkQED that contribute to the overall benchmarking process. We also share some of the latest benchmark results comparing our LazyGraphRAG system to competing methods, including a vector-based RAG with a 1M-token context window, where the leading LazyGraphRAG configuration showed significant win rates across all combinations of quality metrics and query classes. In the paper, we distinguish between local queries, where answers are found in a small number of text regions, and sometimes even a single region, and global queries, which require reasoning over large portions of or even the entire dataset.  Conventional vector-based RAG excels at local queries because the regions containing the answer to the query resemble the query itself and can be retrieved as the nearest neighbor in the vector space of text embeddings. However, it struggles with global questions, such as, “What are the main themes of the dataset?” which require understanding dataset qualities not explicitly stated in the text.   
AutoQ: Automated query synthesis This limitation motivated the development of GraphRAG a system designed to answer global queries. GraphRAG’s evaluation requirements subsequently led to the creation of AutoQ, a method for synthesizing these global queries for any dataset. AutoQ extends this approach by generating synthetic queries across the spectrum of queries, from local to global. It defines four distinct classes based on the source and scope of the queryforming a logical progression along the spectrum. Figure 1. Construction of a 2×2 design space for synthetic query generation with AutoQ, showing how the four resulting query classes map onto the local-global query spectrum.  AutoQ can be configured to generate any number and distribution of synthetic queries along these classes, enabling consistent benchmarking across datasets without requiring user customization. Figure 2 shows the synthesis process and sample queries from each class, using an AP News dataset. Figure 2. Synthesis process and example query for each of the four AutoQ query classes.  About Microsoft Research Advancing science and technology to benefit humanity View our story Opens in a new tab AutoE: Automated evaluation framework  Our evaluation of GraphRAG focused on analyzing key qualities of answers to global questions. The following qualities were used for the current evaluation: Comprehensiveness: Does the answer address all relevant aspects of the question?  Diversity: Does it present varied perspectives or insights?  Empowerment: Does it help the reader understand and make informed judgments?  Relevance: Does it address what the question is specifically asking?   The AutoE component scales evaluation of these qualities using the LLM-as-a-Judge method. It presents pairs of answers to an LLM, along with the query and target metric, in counterbalanced order. The model determines whether the first answer wins, loses, or ties with the second. 
Over a set of queries, whether from AutoQ or elsewhere, this produces win rates between competing methods. When ground truth is available, AutoE can also score answers on correctness, completeness, and related metrics. An illustrative evaluation is shown in Figure 3. Using a dataset of 1,397 AP News articles on health and healthcare, AutoQ generated 50 queries per class . AutoE then compared LazyGraphRAG to a competing RAG method, running six trials per query across four metrics, using GPT-4.1 as a judge. These trial-level results were aggregated using metric-based win rates, where each trial is scored 1 for a win, 0.5 for a tie, and 0 for a loss, and then averaged to calculate the overall win rate for each RAG method. Figure 3. Win rates of four LazyGraphRAG configurations across methods, broken down by the AutoQ query class and averaged across AutoE’s four metrics: comprehensiveness, diversity, empowerment, and relevance. LazyGraphRAG outperforms comparison conditions where the bar is above 50%. The four LazyGraphRAG conditionsdiffer by query budgetand chunk size. All used GPT-4o mini for relevance tests and GPT-4o for query expansionand answer generation, except for LGR_b200_c200_mini, which used GPT-4o mini throughout. Comparison systems were GraphRAG , Vector RAG with 8k- and 120k-token windows, and three published methods: LightRAG, RAPTOR, and TREX. All methods were limited to the same 8k tokens for answer generation. GraphRAG Global Search used level 2 of the community hierarchy. LazyGraphRAG outperformed every comparison condition using the same generative model, winning all 96 comparisons, with all but one reaching statistical significance. The best overall performance came from the larger budget, smaller chunk size configuration. For DataLocal queries, the smaller budgetperformed slightly better, likely because fewer chunks were relevant. 
For ActivityLocal queries, the larger chunk sizehad a slight edge, likely because longer chunks provide a more coherent context. Competing methods performed relatively better on the query classes for which they were designed: GraphRAG Global for global queries, Vector RAG for local queries, and GraphRAG Drift Search, which combines both strategies, posed the strongest challenge overall. Increasing Vector RAG’s context window from 8k to 120k tokens did not improve its performance compared to LazyGraphRAG. This raised the question of how LazyGraphRAG would perform against Vector RAG with 1-million token context window containing most of the dataset. Figure 4 shows the follow-up experiment comparing LazyGraphRAG to Vector RAG using GPT-4.1 that enabled this comparison. Even against the 1M-token window, LazyGraphRAG achieved higher win rates across all comparisons, failing to reach significance only for the relevance of answers to DataLocal queries. These queries tend to benefit most from Vector RAG’s ranking of directly relevant chunks, making it hard for LazyGraphRAG to generate answers that have greater relevance to the query, even though these answers may be dramatically more comprehensive, diverse, and empowering overall. Figure 4. Win rates of LazyGraphRAG  over Vector RAG across different context window sizes, broken down by the four AutoQ query classes and four AutoE metrics: comprehensiveness, diversity, empowerment, and relevance. Bars above 50% indicate that LazyGraphRAG outperformed the comparison condition.  AutoD: Automated data sampling and summarization Text datasets have an underlying topical structure, but the depth, breadth, and connectivity of that structure can vary widely. This variability makes it difficult to evaluate RAG systems consistently, as results may reflect the idiosyncrasies of the dataset rather than the system’s general capabilities. 
The AutoD component addresses this by sampling datasets to meet a target specification, defined by the number of topic clustersand the number of samples per cluster. This creates consistency across datasets, enabling more meaningful comparisons, as structurally aligned datasets lead to comparable AutoQ queries, which in turn support consistent AutoE evaluations. AutoD also includes tools for summarizing input or output datasets in a way that reflects their topical coverage. These summaries play an important role in the AutoQ query synthesis process, but they can also be used more broadly, such as in prompts where context space is limited. Since the release of the GraphRAG paper, we’ve received many requests to share the dataset of the Behind the Tech podcast transcripts we used in our evaluation. An updated version of this dataset is now available in the BenchmarkQED repository, alongside the AP News dataset containing 1,397 health-related articles, licensed for open release.   We hope these datasets, together with the BenchmarkQED tools, help accelerate benchmark-driven development of RAG systems and AI question-answering. We invite the community to try them on GitHub.  Opens in a new tab #benchmarkqedautomatedbenchmarking #ofrag #systems
    WWW.MICROSOFT.COM
    BenchmarkQED: Automated benchmarking of RAG systems
    One of the key use cases for generative AI involves answering questions over private datasets, with retrieval-augmented generation (RAG) as the go-to framework. As new RAG techniques emerge, there’s a growing need to benchmark their performance across diverse datasets and metrics.  To meet this need, we’re introducing BenchmarkQED, a new suite of tools that automates RAG benchmarking at scale, available on GitHub (opens in new tab). It includes components for query generation, evaluation, and dataset preparation, each designed to support rigorous, reproducible testing.   BenchmarkQED complements the RAG methods in our open-source GraphRAG library, enabling users to run a GraphRAG-style evaluation across models, metrics, and datasets. GraphRAG uses a large language model (LLM) to generate and summarize entity-based knowledge graphs, producing more comprehensive and diverse answers than standard RAG for large-scale tasks.  In this post, we walk through the core components of BenchmarkQED that contribute to the overall benchmarking process. We also share some of the latest benchmark results comparing our LazyGraphRAG system to competing methods, including a vector-based RAG with a 1M-token context window, where the leading LazyGraphRAG configuration showed significant win rates across all combinations of quality metrics and query classes. In the paper, we distinguish between local queries, where answers are found in a small number of text regions, and sometimes even a single region, and global queries, which require reasoning over large portions of or even the entire dataset.  Conventional vector-based RAG excels at local queries because the regions containing the answer to the query resemble the query itself and can be retrieved as the nearest neighbor in the vector space of text embeddings. 
However, it struggles with global questions, such as, “What are the main themes of the dataset?” which require understanding dataset qualities not explicitly stated in the text.   AutoQ: Automated query synthesis This limitation motivated the development of GraphRAG a system designed to answer global queries. GraphRAG’s evaluation requirements subsequently led to the creation of AutoQ, a method for synthesizing these global queries for any dataset. AutoQ extends this approach by generating synthetic queries across the spectrum of queries, from local to global. It defines four distinct classes based on the source and scope of the query (Figure 1, top) forming a logical progression along the spectrum (Figure 1, bottom). Figure 1. Construction of a 2×2 design space for synthetic query generation with AutoQ, showing how the four resulting query classes map onto the local-global query spectrum.  AutoQ can be configured to generate any number and distribution of synthetic queries along these classes, enabling consistent benchmarking across datasets without requiring user customization. Figure 2 shows the synthesis process and sample queries from each class, using an AP News dataset. Figure 2. Synthesis process and example query for each of the four AutoQ query classes.  About Microsoft Research Advancing science and technology to benefit humanity View our story Opens in a new tab AutoE: Automated evaluation framework  Our evaluation of GraphRAG focused on analyzing key qualities of answers to global questions. The following qualities were used for the current evaluation: Comprehensiveness: Does the answer address all relevant aspects of the question?  Diversity: Does it present varied perspectives or insights?  Empowerment: Does it help the reader understand and make informed judgments?  Relevance: Does it address what the question is specifically asking?   The AutoE component scales evaluation of these qualities using the LLM-as-a-Judge method. 
It presents pairs of answers to an LLM, along with the query and target metric, in counterbalanced order. The model determines whether the first answer wins, loses, or ties with the second. Over a set of queries, whether from AutoQ or elsewhere, this produces win rates between competing methods. When ground truth is available, AutoE can also score answers on correctness, completeness, and related metrics. An illustrative evaluation is shown in Figure 3. Using a dataset of 1,397 AP News articles on health and healthcare, AutoQ generated 50 queries per class (200 total). AutoE then compared LazyGraphRAG to a competing RAG method, running six trials per query across four metrics, using GPT-4.1 as a judge. These trial-level results were aggregated using metric-based win rates, where each trial is scored 1 for a win, 0.5 for a tie, and 0 for a loss, and then averaged to calculate the overall win rate for each RAG method. Figure 3. Win rates of four LazyGraphRAG (LGR) configurations across methods, broken down by the AutoQ query class and averaged across AutoE’s four metrics: comprehensiveness, diversity, empowerment, and relevance. LazyGraphRAG outperforms comparison conditions where the bar is above 50%. The four LazyGraphRAG conditions (LGR_b200_c200, LGR_b50_c200, LGR_b50_c600, LGR_b200_c200_mini) differ by query budget (b50, b200) and chunk size (c200, c600). All used GPT-4o mini for relevance tests and GPT-4o for query expansion (to five subqueries) and answer generation, except for LGR_b200_c200_mini, which used GPT-4o mini throughout. Comparison systems were GraphRAG (Local, Global, and Drift Search), Vector RAG with 8k- and 120k-token windows, and three published methods: LightRAG (opens in new tab), RAPTOR (opens in new tab), and TREX (opens in new tab). All methods were limited to the same 8k tokens for answer generation. GraphRAG Global Search used level 2 of the community hierarchy. 
LazyGraphRAG outperformed every comparison condition using the same generative model (GPT-4o), winning all 96 comparisons, with all but one reaching statistical significance. The best overall performance came from the larger-budget, smaller-chunk configuration (LGR_b200_c200). For DataLocal queries, the smaller budget (LGR_b50_c200) performed slightly better, likely because fewer chunks were relevant. For ActivityLocal queries, the larger chunk size (LGR_b50_c600) had a slight edge, likely because longer chunks provide more coherent context. Competing methods performed relatively better on the query classes for which they were designed: GraphRAG Global for global queries and Vector RAG for local queries, while GraphRAG Drift Search, which combines both strategies, posed the strongest challenge overall.

Increasing Vector RAG's context window from 8k to 120k tokens did not improve its performance relative to LazyGraphRAG. This raised the question of how LazyGraphRAG would fare against Vector RAG with a 1-million-token context window containing most of the dataset. Figure 4 shows the follow-up experiment, which used GPT-4.1 to enable this comparison. Even against the 1M-token window, LazyGraphRAG achieved higher win rates across all comparisons, failing to reach significance only for the relevance of answers to DataLocal queries. These queries tend to benefit most from Vector RAG's ranking of directly relevant chunks, making it hard for LazyGraphRAG to generate answers with greater relevance to the query, even though those answers may be dramatically more comprehensive, diverse, and empowering overall.

Figure 4. Win rates of LazyGraphRAG (LGR) over Vector RAG across different context window sizes, broken down by the four AutoQ query classes and four AutoE metrics: comprehensiveness, diversity, empowerment, and relevance. Bars above 50% indicate that LazyGraphRAG outperformed the comparison condition.
AutoD: Automated data sampling and summarization

Text datasets have an underlying topical structure, but the depth, breadth, and connectivity of that structure can vary widely. This variability makes it difficult to evaluate RAG systems consistently, as results may reflect the idiosyncrasies of the dataset rather than the system's general capabilities. The AutoD component addresses this by sampling datasets to meet a target specification, defined by the number of topic clusters (breadth) and the number of samples per cluster (depth). This creates consistency across datasets, enabling more meaningful comparisons, as structurally aligned datasets lead to comparable AutoQ queries, which in turn support consistent AutoE evaluations.

AutoD also includes tools for summarizing input or output datasets in a way that reflects their topical coverage. These summaries play an important role in the AutoQ query synthesis process, but they can also be used more broadly, such as in prompts where context space is limited.

Since the release of the GraphRAG paper, we've received many requests to share the dataset of the Behind the Tech podcast transcripts we used in our evaluation. An updated version of this dataset is now available in the BenchmarkQED repository, alongside the AP News dataset containing 1,397 health-related articles, licensed for open release. We hope these datasets, together with the BenchmarkQED tools, help accelerate benchmark-driven development of RAG systems and AI question answering. We invite the community to try them on GitHub.
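As a rough sketch of the sampling step AutoD performs, the following groups documents by topic cluster and draws a fixed-breadth, fixed-depth sample. The clustering itself is stubbed out with precomputed labels, and all names are hypothetical rather than BenchmarkQED's real interface:

```python
# Illustrative AutoD-style sampling to a target specification:
# n_clusters topic clusters (breadth) x per_cluster docs each (depth).
import random

def sample_to_spec(docs, n_clusters, per_cluster, seed=0):
    """docs: list of (cluster_label, text) pairs with precomputed
    topic labels. Returns {label: [texts]} meeting the spec exactly."""
    rng = random.Random(seed)
    by_cluster = {}
    for label, text in docs:
        by_cluster.setdefault(label, []).append(text)
    # Only clusters deep enough to satisfy the spec are eligible.
    eligible = [c for c, items in by_cluster.items()
                if len(items) >= per_cluster]
    if len(eligible) < n_clusters:
        raise ValueError("dataset too shallow or narrow for spec")
    chosen = rng.sample(eligible, n_clusters)
    return {c: rng.sample(by_cluster[c], per_cluster) for c in chosen}
```

Sampling two datasets to the same (breadth, depth) specification is what makes their AutoQ queries, and hence their AutoE win rates, comparable.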
  • HMRC phishing breach wholly avoidable, but hard to stop

    A significant cyber breach at His Majesty’s Revenue and Customs (HMRC) that saw scammers cheat the public purse out of approximately £47m has been met with dismay from security experts thanks to the sheer simplicity of the attack, which originated via account takeover attempts on legitimate taxpayers.
    HMRC disclosed the breach to a Treasury Select Committee this week, revealing that hackers accessed the online accounts of about 100,000 people via phishing attacks and managed to claim a significant amount of money in tax rebates before being stopped.
    It is understood that those individuals affected have been contacted by HMRC – they have not personally lost any money and are not themselves in any trouble. Arrests in the case have already been made.
    During proceedings, HMRC also came in for criticism from the committee’s chair Meg Hillier, who had learned about the incident via an earlier news report, over the length of time taken to come clean about it.

    With phishing emails sent to unwitting taxpayers identified as the initial attack vector for the scammers, HMRC might feel relieved that it has dodged full blame for the incident.
    But according to Will Richmond-Coggan, a partner specialising in data and cyber disputes at law firm Freeths, even though the tax office had gone to pains to stress its own systems were never actually compromised, the incident underscored just how widespread the consequences of cyber attacks can be – snowballing from simple origins into a multimillion pound loss.
    “It is clear from HMRC's explanation that the crime against HMRC was only possible because of earlier data breaches and cyber attacks,” said Richmond-Coggan.
    “Those earlier attacks put personal data in the hands of the criminals which enabled them to impersonate tax payers and apply successfully to claim back tax.”

    Meanwhile, Gerasim Hovhannisyan, CEO of EasyDMARC, an email security provider, pointed out that phishing against both private individuals and businesses and other organisations had long ago moved beyond the domain of scammers chancing their luck.
    While this type of scattergun fraud remains a potent threat, particularly to consumers who may not be informed about cyber security matters – the scale of the HMRC phish surely suggests a targeted operation, likely using carefully crafted email purporting to represent HMRC itself, designed to lure self-assessment taxpayers into handing over their accounts.
    Not only that, but generative artificial intelligence (GenAI) means targeted phishing operations have become exponentially more dangerous in a very short space of time, added Hovhannisyan.
    “[It] has made [phishing] scalable, polished, and dangerously convincing, often indistinguishable from legitimate communication. And while many organisations have strengthened their security perimeters, email remains the most consistently exploited and underestimated attack vector,” he said.
    “These scams exploit human trust, using urgency, authority, and increasingly realistic impersonation tactics. If HMRC can be phished, anyone can.”
    Added Hovhannisyan: “What’s more alarming is that the Treasury Select Committee only learned of the breach through the news. When £47m is stolen through impersonation, institutions can’t afford to stay quiet. Delayed disclosure erodes trust, stalls response, and gives attackers room to manoeuvre.”

    Once again, a service’s end users have turned out to be the entry point for a cyber attack, and as such, whether they are internal or – as in this case – external, they are often considered an organisation’s first line of defence.
    However, it is not always wise to take this approach, and for an organisation like HMRC, daily engaging with members of the public, it is also not really possible. Security education is a difficult proposition at the best of times and although the UK’s National Cyber Security Centre (NCSC) provides extensive advice and guidance on spotting and dealing with phishing emails for consumers – it also operates a phishing reporting service that as of April 2025 has received over 41 million scam reports – bodies like HMRC cannot rely on everybody having visited the NCSC’s website.
    As such, Mike Britton, chief information officer (CIO) at Abnormal AI, a specialist in phishing, social engineering and account takeover prevention, argued that HMRC could and should have done more from a technical perspective.
    “Governments will always be a high tier target for cyber criminals due to the valuable information they hold. In fact, attacks against this sector are rising,” he said.
    “In this case, it looks like criminals utilised account takeover to conduct fraud. To combat this, multifactor authentication (MFA) is key, but as attacks grow more sophisticated, further steps must be taken.”
    Britton said organisations like HMRC really needed to consider adopting more layered security strategies, not only including MFA but also incorporating wider visibility and unified controls across its IT systems.
    Account takeover attacks such as the ones seen in this incident can unfold quickly, he added, so HMRC’s cyber function should also be equipped with the tools to identify and remediate compromised accounts on the fly.

    Read more about trends in phishing

    Quishing, meaning QR code phishing, is an off-putting term for an on-the-rise attack method. Learn how to defend against it.
    A healthy dose of judicious skepticism is crucial to preventing phishing attacks, said David Fine, supervisory special agent at the FBI, during a presentation at a HIMSS event.
    Exchange admins got a boost from Microsoft when it improved how it handles DMARC authentication failures to help organisations fight back from email-based attacks on their users.
  • Google claims Gemini 2.5 Pro preview beats DeepSeek R1 and Grok 3 Beta in coding performance


    Google has released an updated preview of Gemini 2.5 Pro, its “most intelligent” model, first announced in March and upgraded in May, with plans to release the same model to general availability in a couple of weeks. 
    Enterprises can test building new applications or replace earlier versions with an updated version of the “I/O edition” of Gemini 2.5 Pro that, according to a blog post by Google, is more creative in its responses and outperforms other models in coding and reasoning. 
    During its annual I/O developer conference in May, Google announced that it updated Gemini 2.5 Pro to be better than its earlier iteration, which it quietly released. Google DeepMind CEO Demis Hassabis said the I/O edition is the company’s best coding model yet. 
    But this new preview, called Gemini 2.5 Pro Preview 06-05 Thinking, is even better than the I/O edition. The stable version Google plans to release publicly is “ready for enterprise-scale capabilities.”
    The I/O edition, or gemini-2.5-pro-preview-05-06, was first made available to developers and enterprises in May through Google AI Studio and Vertex AI. Gemini 2.5 Pro Preview 06-05 Thinking can be accessed via the same platforms. 
    Performance metrics
    This new version of Gemini 2.5 Pro performs even better than the first release. 
    Google said the new version of Gemini 2.5 Pro improved by 24 points in LMArena and by 35 points in WebDevArena, where it currently tops the leaderboard. The company’s benchmark tests showed that the model outscored competitors like OpenAI’s o3, o3-mini, and o4-mini, Anthropic’s Claude 4 Opus, Grok 3 Beta from xAI and DeepSeek R1. 
    “We’ve also addressed feedback from our previous 2.5 Pro releases, improving its style and structure — it can be more creative with better-formatted responses,” Google said in the blog post. 

    What enterprises can expect
    Google’s continuous improvement of Gemini 2.5 Pro might be confusing for many, but Google previously framed these updates as a response to community feedback. Pricing for the new version is $1.25 per million input tokens (without caching) and $10 per million output tokens. 
    When the very first version of Gemini 2.5 Pro launched in March, VentureBeat’s Matt Marshall called it “the smartest model you’re not using.” Since then, Google has integrated the model into many of its new applications and services, including “Deep Think,” where Gemini considers multiple hypotheses before responding. 
    The release of Gemini 2.5 Pro, and its two upgraded versions, revived Google’s place in the large language model space after competitors like DeepSeek and OpenAI diverted the industry’s attention to their reasoning models. 
    Within a few hours of the announcement of the updated Gemini 2.5 Pro, developers had already begun playing around with it. While many found the update lived up to Google’s promise of being faster, the jury is still out on whether this latest Gemini 2.5 Pro actually performs better. 

  • 'No Work Today': Diehard Nintendo Fans Line Up Early For Switch 2

    Lisa Jones has been a Nintendo fan since the company’s first major console, the NES, launched in the 1980s. “I’ve actually had every system, including the Virtual Boy,” she says. So, with Nintendo about to release its newest console, the Switch 2, Jones knew she had to own it on day one. “I took the day off just to make sure I’d get one,” she told PCMag as she waited outside a Best Buy store, sitting on the concrete while occasionally stretching. Jones was among the diehard Nintendo fans who began lining up outside the store in San Francisco, hoping to snag the console on launch day. The Switch 2 becomes available to consumers at 12 a.m. EST / 9 p.m. PST. But not everyone managed to snag a preorder, prompting some to fall back on the tried-and-true method of lining up in person.

    “Yeah, I’m cold,” said Doonie Love, an actor and model who was first in line at the store. He spoke to us with his black hoodie pulled over his head as the San Francisco wind blew by. Love began waiting at about 9 a.m. after failing to secure a preorder; preorders sold out quickly across retailers weeks ago. Although he’s a Nintendo and Pokémon fan, he actually showed up to the Best Buy on a “whim,” curious to see if people were lining up. “There’s no work today, I just needed something to do,” he said of deciding to wait in line. “I just called someone to bring a jacket, chair, and burrito,” he later added.

    Others like Brad Reinke were ready to line up. “I took the day off. Yeah, I was totally prepared to play video games all day,” he told us while sitting in his foldable chair and eating a pasta takeout order from DoorDash. “We’re here all night so I’ve got to get lunch and dinner in me.” He too is a major Nintendo fan, and bought the Switch 1 on launch day back in 2017. “I’m a big collector and I’m probably going to buy everything they have on sale,” he said.
While Reinke wasn’t able to secure a preorder, he said he enjoys the experience of “midnight releases,” which attract other devoted fans. “There’s good company, everyone’s here for the same reason, so we all have stuff to talk about,” he said.

Meanwhile, another consumer, James Gualtieri, was prepared to work remotely while waiting outside the Best Buy, carrying his laptop and a Wi-Fi hotspot. “I was in a (remote) meeting for half an hour, chatting with folks,” he said.

We visited the Best Buy at around 2 p.m. on Wednesday, where the line for customers without preorders was relatively small, at about 10 people. As a result, it looked like everyone in line had a strong chance of scoring the console on launch day. But Gualtieri told us Best Buy staff wouldn’t confirm whether everyone in line would come away with a Switch 2, since the retailer also has to prioritize preorders. “At the end of the day, it’s not the end of the world if I don’t get one,” he said after already waiting for two hours. Fortunately, Gualtieri’s workplace is located next to the Best Buy store. “If I can’t get one, I’ll try to get in line tomorrow morning. I would really love to get one before the weekend,” he said.

Meanwhile, others like Jones said it was important to snag a Switch 2 soon rather than wait, citing the risk of Trump’s tariffs raising the price. “Get it while you can,” she said, noting that Microsoft recently increased the price of its Xbox consoles.

Best Buy isn’t the only location in San Francisco offering the Switch 2 for tonight’s release. Nintendo’s official store in the city opened last month and is slated to sell the console as well, though there the product will only be available to lucky consumers who were able to snag a preorder, or “warp pass.” Hours before sales were set to begin, the store held a prelaunch “celebration” event, giving fans a chance to demo the Switch 2.
The event attracted a line of over 80 people when it began at 1 p.m. Several Nintendo fans also dressed up for the event, including a consumer named Annie, who cosplayed as Zelda and said, “I came here from Mexico.” “When I was a child I play the Nintendo so much with my friends,” Annie added, while showing off a Zelda tattoo.

Another consumer, Greg H., also looked forward to tonight’s launch, having scored a warp pass to buy the Switch 2 from the official Nintendo store in San Francisco. “There is this nostalgic factor of waiting up until midnight to pick up the console,” he said while standing at the prelaunch event with a Nintendo N64 bag. “There’s also a communal aspect, where you meet a lot of people with the same interest.”