Marktechpost AI shared a link

2025-05-15 07:35:19 ·

Georgia Tech and Stanford Researchers Introduce MLE-Dojo: A Gym-Style Framework Designed for Training, Evaluating, and Benchmarking Autonomous Machine Learning Engineering (MLE) Agents

Machine learning engineering (MLE) involves developing, tuning, and deploying machine learning systems that require iterative experimentation, model optimization, and robust handling of data pipelines. As model complexity increases, so do the challenges of orchestrating end-to-end workflows efficiently. Researchers have explored automating MLE tasks with AI agents to handle these demands. Large Language Models (LLMs), particularly those with strong coding and problem-solving abilities, have shown potential to enhance this process significantly. Their role in automating structured workflows is now being tested through rigorous benchmarks and environments tailored to emulate real-world MLE scenarios.
A primary hurdle in automating machine learning engineering lies in the work’s inherently iterative and feedback-driven nature. Tasks such as hyperparameter tuning, model debugging, and data preprocessing cannot be resolved in one step; they require repeated modifications and evaluations. Traditional evaluation tools for AI models often rely on static datasets and do not allow for real-time error feedback or interactive problem-solving. This limitation prevents LLM agents from learning through trial and error, an essential component for mastering engineering tasks that evolve or require multiple attempts for success.

Earlier tools to evaluate LLMs in engineering or coding tasks have mostly focused on individual subtasks or isolated challenges. These include tools like MLAgentBench and DSBench, which rely on narrow test cases sourced from Kaggle competitions or synthetic datasets. While they cover more than basic tasks, they do not enable agents to perform code execution, debugging, or results interpretation in a live setting. Other environments, like SWE-Gym, focus exclusively on software engineering and lack support for machine learning-specific workflows. These limitations have slowed the creation of versatile, high-performing MLE agents that can handle real-time project complexities.
Researchers from Georgia Institute of Technology and Stanford University have introduced MLE-Dojo, a framework with an interactive environment that connects LLM agents with real-world machine learning tasks derived from over 200 Kaggle competitions. The framework supports tabular data analysis, computer vision, natural language processing, and time-series forecasting challenges. The researchers introduced MLE-Dojo to allow agents to write, execute, and revise code in a sandboxed, feedback-rich setting. The goal was to replicate the interactive cycles that human engineers follow, enabling structured learning for agents. The environment includes pre-installed dependencies and evaluation metrics, and it supports supervised fine-tuning and reinforcement learning strategies.

MLE-Dojo’s structure consists of modular components that support a wide range of MLE challenges. Each task runs within its own Docker container, isolating it for safety and reproducibility. Agents interact with the environment through a Partially Observable Markov Decision Process, receiving observations, performing actions, and gaining rewards based on performance. The environment supports five primary action types: requesting task information, validating code, executing code, retrieving interaction history, and resetting the environment. It also provides a detailed observation space that includes datasets, execution results, and error messages. The agent receives structured feedback after every interaction, allowing for step-wise improvement. This modular setup helps maintain interoperability and simplifies adding new tasks to the system.
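The interaction loop described above can be illustrated with a small toy environment. This is only a sketch under stated assumptions: the class name `ToyMLEEnv`, the action strings, the `score_fn` callback, and the reward rule (improvement over the best score so far) are all hypothetical and are not taken from the MLE-Dojo codebase. The toy also runs code in-process with `exec`, whereas the article describes per-task Docker containers.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class ToyMLEEnv:
    """Hypothetical stand-in for an interactive MLE environment (not the MLE-Dojo API)."""

    task_description: str
    score_fn: Callable[[dict], float]  # maps the executed code's namespace to a score
    history: list = field(default_factory=list)
    best_score: float = 0.0

    def step(self, action: str, code: str = "") -> tuple[dict, float]:
        """Apply one of the five action types and return (observation, reward)."""
        if action == "request_info":
            obs, reward = {"info": self.task_description}, 0.0
        elif action == "validate_code":
            obs, reward = self._validate(code), 0.0
        elif action == "execute_code":
            obs, reward = self._execute(code)
        elif action == "get_history":
            obs, reward = {"history": list(self.history)}, 0.0
        elif action == "reset":
            self.history.clear()
            self.best_score = 0.0
            return {"status": "reset"}, 0.0
        else:
            obs, reward = {"error": f"unknown action: {action}"}, 0.0
        self.history.append((action, obs, reward))
        return obs, reward

    def _validate(self, code: str) -> dict:
        # Syntax check only; nothing is executed here.
        try:
            compile(code, "<agent>", "exec")
            return {"valid": True}
        except SyntaxError as exc:
            return {"valid": False, "error": str(exc)}

    def _execute(self, code: str) -> tuple[dict, float]:
        namespace: dict[str, Any] = {}
        try:
            exec(code, namespace)  # toy only; a real setup would isolate this in a container
            score = float(self.score_fn(namespace))
        except Exception as exc:
            # Errors come back as observations so the agent can revise its code.
            return {"error": f"{type(exc).__name__}: {exc}"}, 0.0
        # Assumed reward rule: only improvements over the best score are rewarded.
        reward = max(0.0, score - self.best_score)
        self.best_score = max(self.best_score, score)
        return {"score": score}, reward


env = ToyMLEEnv(
    task_description="Set a variable named `prediction` to 42.",
    score_fn=lambda ns: 1.0 if ns.get("prediction") == 42 else 0.0,
)
print(env.step("validate_code", "prediction = (")[0])  # reports a syntax error
print(env.step("execute_code", "prediction = 41"))     # score 0.0, reward 0.0
print(env.step("execute_code", "prediction = 42"))     # score 1.0, reward 1.0
```

Returning errors as observations rather than raising them is what lets an agent iterate: each failed attempt becomes feedback for the next step.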
The evaluation included eight frontier LLMs (Gemini-2.5-Pro, DeepSeek-r1, o3-mini, GPT-4o, GPT-4o-mini, Gemini-2.0-Pro, Gemini-2.0-Flash, and DeepSeek-v3) across four core machine learning domains. Gemini-2.5-Pro achieved the highest Elo rating of 1257, followed by DeepSeek-r1 at 1137 and o3-mini at 1108. On HumanRank, which scores an agent relative to human competitors, Gemini-2.5-Pro led with 61.95%. Models like GPT-4o-mini executed code only 20% of the time, adopting conservative strategies, while o3-mini executed code in over 90% of cases. Gemini-2.5-Pro also had the lowest average failure rate across the validation and execution phases, reinforcing its robustness. Among domains, computer vision posed the greatest challenge, with most models scoring under 60 in HumanRank. Reasoning models generally produced longer outputs and maintained more consistent performance across iterations.
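To get a feel for what the Elo gaps mean, the sketch below applies the standard Elo expected-score formula to the three ratings quoted above. The article does not say how the ratings were computed, so the conventional 400-point logistic scale is an assumption here, and the function name `elo_expected_score` is illustrative.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard Elo model (400-point scale assumed)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Ratings as reported in the article.
ratings = {"Gemini-2.5-Pro": 1257, "DeepSeek-r1": 1137, "o3-mini": 1108}

p = elo_expected_score(ratings["Gemini-2.5-Pro"], ratings["DeepSeek-r1"])
print(f"{p:.3f}")  # prints 0.666
```

Under that assumption, the 120-point gap between the top two models corresponds to the leader being favored in roughly two out of three head-to-head comparisons.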

The research highlights the difficulty of applying LLMs to full machine learning workflows. It outlines a comprehensive solution in MLE-Dojo that enables learning through interaction, not just completion. MLE-Dojo sets a new standard for training and evaluating autonomous MLE agents by simulating engineering environments more accurately.

Nikhil is an intern consultant at Marktechpost. He is pursuing an integrated dual degree in Materials at the Indian Institute of Technology, Kharagpur. Nikhil is an AI/ML enthusiast who is always researching applications in fields like biomaterials and biomedical science. With a strong background in Material Science, he is exploring new advancements and creating opportunities to contribute.

Marktechpost AI shared a link

2025-05-14 01:19:32 ·

PwC Releases Executive Guide on Agentic AI: A Strategic Blueprint for Deploying Autonomous Multi-Agent Systems in the Enterprise

In its latest executive guide, “Agentic AI – The New Frontier in GenAI,” PwC presents a strategic approach for what it defines as the next pivotal evolution in enterprise automation: Agentic Artificial Intelligence.
These systems, capable of autonomous decision-making and context-aware interactions, are poised to reconfigure how organizations operate—shifting from traditional software models to orchestrated AI-driven services.
From Automation to Autonomous Intelligence
Agentic AI is not just another AI trend—it marks a foundational shift.
Unlike conventional systems that require human input for each decision point, agentic AI systems operate independently to achieve predefined goals.
Drawing on multimodal data (text, audio, images), they reason, plan, adapt, and learn continuously in dynamic environments.
PwC identifies six defining capabilities of agentic AI:
Autonomy in decision-making
Goal-driven behavior aligned with organizational outcomes
Environmental interaction to adapt in real time
Learning capabilities through reinforcement and historical data
Workflow orchestration across complex business functions
Multi-agent communication to coordinate actions within distributed systems
This architecture enables enterprise-grade systems that go beyond single-task automation to orchestrate entire processes with human-like intelligence and accountability.
Closing the Gaps of Traditional AI Approaches
The report contrasts agentic AI with earlier generations of chatbots and RAG-based systems.
Traditional rule-based bots suffer from rigidity, while retrieval-augmented systems often lack contextual understanding across long interactions.
Agentic AI surpasses both by maintaining dialogue memory, reasoning across systems (e.g., CRM, ERP, IVR), and dynamically solving customer issues.
PwC envisions micro-agents—each optimized for tasks like inquiry resolution, sentiment analysis, or escalation—coordinated by a central orchestrator to deliver coherent, responsive service experiences.
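The micro-agent pattern described above can be sketched in a few lines. Everything here is hypothetical and illustrative: the agent functions, the keyword-based `sentiment_score` heuristic (a stand-in for a real sentiment model), and the escalation threshold are assumptions, not details from the PwC guide.

```python
from typing import Callable

# A micro-agent takes the new message plus the dialogue memory and returns a reply.
Agent = Callable[[str, list], str]


def inquiry_agent(message: str, memory: list) -> str:
    return f"Answering inquiry (turn {len(memory) + 1}): {message}"


def escalation_agent(message: str, memory: list) -> str:
    return "Escalating to a human specialist with the full conversation history."


def sentiment_score(message: str) -> int:
    """Hypothetical keyword heuristic standing in for a sentiment-analysis agent."""
    negative = {"angry", "terrible", "unacceptable", "refund"}
    words = {w.strip(".,!?").lower() for w in message.split()}
    return -len(words & negative)


class Orchestrator:
    """Central coordinator: routes each message to one micro-agent and keeps memory."""

    def __init__(self, agents: dict[str, Agent], escalation_threshold: int = -2):
        self.agents = agents
        self.escalation_threshold = escalation_threshold
        self.memory: list = []  # dialogue memory shared across turns

    def handle(self, message: str) -> tuple[str, str]:
        negative_enough = sentiment_score(message) <= self.escalation_threshold
        route = "escalation" if negative_enough else "inquiry"
        reply = self.agents[route](message, self.memory)
        self.memory.append((message, route, reply))
        return route, reply


bot = Orchestrator({"inquiry": inquiry_agent, "escalation": escalation_agent})
print(bot.handle("Where is my order?"))
print(bot.handle("This is terrible and unacceptable, I want a refund!"))
```

Keeping the routing decision and the dialogue memory in the orchestrator, rather than in each agent, is what lets the individual agents stay small and task-specific.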
Demonstrated Impact Across Sectors
PwC’s guide is grounded in practical use cases spanning industries:
JPMorgan Chase has automated legal document analysis via its COiN platform, saving over 360,000 manual review hours annually.
Siemens leverages agentic AI for predictive maintenance, improving uptime and cutting maintenance costs by 20%.
Amazon uses multimodal agentic models to deliver personalized recommendations, contributing to a 35% increase in sales and improved retention.
These examples demonstrate how agentic systems can optimize decision-making, streamline operations, and enhance customer engagement across functions—from finance and healthcare to logistics and retail.
A Paradigm Shift: Service-as-a-Software
One of the report’s most thought-provoking insights is the rise of service-as-a-software—a departure from traditional licensing models.
In this paradigm, organizations pay not for access to software but for task-specific outcomes delivered by AI agents.
For instance, instead of maintaining a support center, a business might deploy autonomous agents like Sierra and only pay per successful customer resolution.
This model reduces operational costs, expands scalability, and allows organizations to move incrementally from “copilot” to fully autonomous “autopilot” systems.
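The difference between paying for access and paying for outcomes comes down to simple arithmetic. The sketch below contrasts the two; every number and function name in it is hypothetical and chosen only for illustration, not drawn from the PwC guide or from any vendor's pricing.

```python
def seat_based_cost(seats: int, price_per_seat: float) -> float:
    """Traditional licensing: cost depends on access, not on results."""
    return seats * price_per_seat


def outcome_based_cost(
    conversations: int, resolution_rate: float, price_per_resolution: float
) -> float:
    """Service-as-a-software: only successfully resolved conversations are billed."""
    resolved = round(conversations * resolution_rate)
    return resolved * price_per_resolution


# Hypothetical month: 10,000 conversations, 70% resolved by the agent, 1.50 per resolution.
print(outcome_based_cost(10_000, 0.70, 1.50))  # 10500.0
# Hypothetical alternative: 20 support seats at 900 each, regardless of outcomes.
print(seat_based_cost(20, 900.0))              # 18000.0
```

Under outcome-based pricing the bill scales with resolved volume, so an unresolved conversation costs the buyer nothing, whereas a seat license is paid whether or not issues get resolved.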
To implement these systems, enterprises can choose from both commercial and open-source frameworks:
LangGraph and CrewAI offer enterprise-grade orchestration with integration support.
AutoGen and AutoGPT, on the open-source side, support rapid experimentation with multi-agent architectures.
The optimal choice depends on integration needs, IT maturity, and long-term scalability goals.
Crafting a Strategic Adoption Roadmap
PwC emphasizes that success in deploying agentic AI hinges on aligning AI initiatives with business objectives, securing executive sponsorship, and starting with high-impact pilot programs.
Equally crucial is preparing the organization with ethical safeguards, data infrastructure, and cross-functional talent.
Agentic AI offers more than automation—it promises intelligent, adaptable systems that learn and optimize autonomously.
As enterprises recalibrate their AI strategies, those that move early will not only unlock new efficiencies but also shape the next chapter of digital transformation.
Nikhil is an intern consultant at Marktechpost.
He is pursuing an integrated dual degree in Materials at the Indian Institute of Technology, Kharagpur.
Nikhil is an AI/ML enthusiast who is always researching applications in fields like biomaterials and biomedical science.
With a strong background in Material Science, he is exploring new advancements and creating opportunities to contribute.

Source: https://www.marktechpost.com/2025/05/13/pwc-releases-executive-guide-on-agentic-ai-a-strategic-blueprint-for-deploying-autonomous-multi-agent-systems-in-the-enterprise/

PwC Releases Executive Guide on Agentic AI: A Strategic Blueprint for Deploying Autonomous Multi-Agent Systems in the Enterprise
In its latest executive guide, “Agentic AI – The New Frontier in GenAI,” PwC presents a strategic approach for what it defines as the next pivotal evolution in enterprise automation: Agentic Artificial Intelligence. These systems, capable of autonomous decision-making and context-aware interactions, are poised to reconfigure how organizations operate—shifting from traditional software models to orchestrated AI-driven services. From Automation to Autonomous Intelligence Agentic AI is not just another AI trend—it marks a foundational shift. Unlike conventional systems that require human input for each decision point, agentic AI systems operate independently to achieve predefined goals. Drawing on multimodal data (text, audio, images), they reason, plan, adapt, and learn continuously in dynamic environments. PwC identifies six defining capabilities of agentic AI: Autonomy in decision-making Goal-driven behavior aligned with organizational outcomes Environmental interaction to adapt in real time Learning capabilities through reinforcement and historical data Workflow orchestration across complex business functions Multi-agent communication to coordinate actions within distributed systems This architecture enables enterprise-grade systems that go beyond single-task automation to orchestrate entire processes with human-like intelligence and accountability. Closing the Gaps of Traditional AI Approaches The report contrasts agentic AI with earlier generations of chatbots and RAG-based systems. Traditional rule-based bots suffer from rigidity, while retrieval-augmented systems often lack contextual understanding across long interactions. Agentic AI surpasses both by maintaining dialogue memory, reasoning across systems (e.g., CRM, ERP, IVR), and dynamically solving customer issues. 
PwC envisions micro-agents—each optimized for tasks like inquiry resolution, sentiment analysis, or escalation—coordinated by a central orchestrator to deliver coherent, responsive service experiences. Demonstrated Impact Across Sectors PwC’s guide is grounded in practical use cases spanning industries: JPMorgan Chase has automated legal document analysis via its COiN platform, saving over 360,000 manual review hours annually. Siemens leverages agentic AI for predictive maintenance, improving uptime and cutting maintenance costs by 20%. Amazon uses multimodal agentic models to deliver personalized recommendations, contributing to a 35% increase in sales and improved retention. These examples demonstrate how agentic systems can optimize decision-making, streamline operations, and enhance customer engagement across functions—from finance and healthcare to logistics and retail. A Paradigm Shift: Service-as-a-Software One of the report’s most thought-provoking insights is the rise of service-as-a-software—a departure from traditional licensing models. In this paradigm, organizations pay not for access to software but for task-specific outcomes delivered by AI agents. For instance, instead of maintaining a support center, a business might deploy autonomous agents like Sierra and only pay per successful customer resolution. This model reduces operational costs, expands scalability, and allows organizations to move incrementally from “copilot” to fully autonomous “autopilot” systems. To implement these systems, enterprises can choose from both commercial and open-source frameworks: LangGraph and CrewAI offer enterprise-grade orchestration with integration support. AutoGen and AutoGPT, on the open-source side, support rapid experimentation with multi-agent architectures. The optimal choice depends on integration needs, IT maturity, and long-term scalability goals. 
Crafting a Strategic Adoption Roadmap PwC emphasizes that success in deploying agentic AI hinges on aligning AI initiatives with business objectives, securing executive sponsorship, and starting with high-impact pilot programs. Equally crucial is preparing the organization with ethical safeguards, data infrastructure, and cross-functional talent. Agentic AI offers more than automation—it promises intelligent, adaptable systems that learn and optimize autonomously. As enterprises recalibrate their AI strategies, those that move early will not only unlock new efficiencies but also shape the next chapter of digital transformation. Download the Guide here. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter and don’t forget to join our 90k+ ML SubReddit. Here’s a brief overview of what we’re building at Marktechpost: ML News Community – r/machinelearningnews (92k+ members) Newsletter– airesearchinsights.com/(30k+ subscribers) miniCON AI Events – minicon.marktechpost.com AI Reports & Magazines – magazine.marktechpost.com AI Dev & Research News – marktechpost.com (1M+ monthly readers) Partner with us NikhilNikhil is an intern consultant at Marktechpost. He is pursuing an integrated dual degree in Materials at the Indian Institute of Technology, Kharagpur. Nikhil is an AI/ML enthusiast who is always researching applications in fields like biomaterials and biomedical science. 
With a strong background in Material Science, he is exploring new advancements and creating opportunities to contribute.Nikhilhttps://www.marktechpost.com/author/nikhil0980/Multimodal AI Needs More Than Modality Support: Researchers Propose General-Level and General-Bench to Evaluate True Synergy in Generalist ModelsNikhilhttps://www.marktechpost.com/author/nikhil0980/This AI Paper Introduces Effective State-Size (ESS): A Metric to Quantify Memory Utilization in Sequence Models for Performance OptimizationNikhilhttps://www.marktechpost.com/author/nikhil0980/Huawei Introduces Pangu Ultra MoE: A 718B-Parameter Sparse Language Model Trained Efficiently on Ascend NPUs Using Simulation-Driven Architecture and System-Level OptimizationNikhilhttps://www.marktechpost.com/author/nikhil0980/Google Redefines Computer Science R&D: A Hybrid Research Model that Merges Innovation with Scalable Engineering Source: https://www.marktechpost.com/2025/05/13/pwc-releases-executive-guide-on-agentic-ai-a-strategic-blueprint-for-deploying-autonomous-multi-agent-systems-in-the-enterprise/ #pwc #releases #executive #guide #agentic #strategic #blueprint #for #deploying #autonomous #multiagent #systems #the #enterprise


PwC Releases Executive Guide on Agentic AI: A Strategic Blueprint for Deploying Autonomous Multi-Agent Systems in the Enterprise

In its latest executive guide, “Agentic AI – The New Frontier in GenAI,” PwC presents a strategic approach for what it defines as the next pivotal evolution in enterprise automation: Agentic Artificial Intelligence. These systems, capable of autonomous decision-making and context-aware interactions, are poised to reconfigure how organizations operate, shifting from traditional software models to orchestrated AI-driven services.

From Automation to Autonomous Intelligence

Agentic AI is not just another AI trend; it marks a foundational shift. Unlike conventional systems that require human input for each decision point, agentic AI systems operate independently to achieve predefined goals. Drawing on multimodal data (text, audio, images), they reason, plan, adapt, and learn continuously in dynamic environments.

PwC identifies six defining capabilities of agentic AI:

Autonomy in decision-making
Goal-driven behavior aligned with organizational outcomes
Environmental interaction to adapt in real time
Learning capabilities through reinforcement and historical data
Workflow orchestration across complex business functions
Multi-agent communication to coordinate actions within distributed systems

This architecture enables enterprise-grade systems that go beyond single-task automation to orchestrate entire processes with human-like intelligence and accountability.

Closing the Gaps of Traditional AI Approaches

The report contrasts agentic AI with earlier generations of chatbots and RAG-based systems. Traditional rule-based bots suffer from rigidity, while retrieval-augmented systems often lack contextual understanding across long interactions. Agentic AI surpasses both by maintaining dialogue memory, reasoning across systems (e.g., CRM, ERP, IVR), and dynamically solving customer issues.

PwC envisions micro-agents, each optimized for tasks like inquiry resolution, sentiment analysis, or escalation, coordinated by a central orchestrator to deliver coherent, responsive service experiences.

Demonstrated Impact Across Sectors

PwC’s guide is grounded in practical use cases spanning industries:

JPMorgan Chase has automated legal document analysis via its COiN platform, saving over 360,000 manual review hours annually.
Siemens leverages agentic AI for predictive maintenance, improving uptime and cutting maintenance costs by 20%.
Amazon uses multimodal agentic models to deliver personalized recommendations, contributing to a 35% increase in sales and improved retention.

These examples demonstrate how agentic systems can optimize decision-making, streamline operations, and enhance customer engagement across functions, from finance and healthcare to logistics and retail.

A Paradigm Shift: Service-as-a-Software

One of the report’s most thought-provoking insights is the rise of service-as-a-software, a departure from traditional licensing models. In this paradigm, organizations pay not for access to software but for task-specific outcomes delivered by AI agents. For instance, instead of maintaining a support center, a business might deploy autonomous agents like Sierra and only pay per successful customer resolution.

This model reduces operational costs, expands scalability, and allows organizations to move incrementally from “copilot” to fully autonomous “autopilot” systems.

To implement these systems, enterprises can choose from both commercial and open-source frameworks:

LangGraph and CrewAI offer enterprise-grade orchestration with integration support.
AutoGen and AutoGPT, on the open-source side, support rapid experimentation with multi-agent architectures.

The optimal choice depends on integration needs, IT maturity, and long-term scalability goals.
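The micro-agent arrangement the article attributes to PwC (task-specific agents coordinated by a central orchestrator) can be illustrated with a small sketch. This is not PwC's design or the API of any framework named in the article; the agent functions, the `Orchestrator` class, and the keyword-based sentiment rule are all hypothetical stand-ins chosen only to show the routing idea.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


# Hypothetical micro-agents: each handles one narrow task on a customer message.
def resolve_inquiry(message: str) -> str:
    return f"answer drafted for: {message}"


def analyze_sentiment(message: str) -> str:
    # Toy keyword rule standing in for a real sentiment model (assumption).
    negative_words = ("angry", "refund", "terrible")
    lowered = message.lower()
    return "negative" if any(word in lowered for word in negative_words) else "neutral"


def escalate(message: str) -> str:
    return "escalated to a human agent"


@dataclass
class Orchestrator:
    """Hypothetical central coordinator that routes a message between agents."""

    agents: Dict[str, Callable[[str], str]]

    def handle(self, message: str) -> List[str]:
        # Run the sentiment agent first, then pick the next agent from its result.
        steps: List[str] = []
        sentiment = self.agents["sentiment"](message)
        steps.append(f"sentiment={sentiment}")
        next_agent = "escalation" if sentiment == "negative" else "inquiry"
        steps.append(self.agents[next_agent](message))
        return steps


orchestrator = Orchestrator(
    agents={
        "inquiry": resolve_inquiry,
        "sentiment": analyze_sentiment,
        "escalation": escalate,
    }
)

print(orchestrator.handle("Where is my order?"))
print(orchestrator.handle("This is terrible, I want a refund"))
```

Keeping the routing decision in the orchestrator rather than in the agents means each micro-agent stays single-purpose, so one can be replaced without touching the others.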
Crafting a Strategic Adoption Roadmap

PwC emphasizes that success in deploying agentic AI hinges on aligning AI initiatives with business objectives, securing executive sponsorship, and starting with high-impact pilot programs. Equally crucial is preparing the organization with ethical safeguards, data infrastructure, and cross-functional talent.

Agentic AI offers more than automation: it promises intelligent, adaptable systems that learn and optimize autonomously. As enterprises recalibrate their AI strategies, those that move early will not only unlock new efficiencies but also shape the next chapter of digital transformation.

Nikhil is an intern consultant at Marktechpost. He is pursuing an integrated dual degree in Materials at the Indian Institute of Technology, Kharagpur. Nikhil is an AI/ML enthusiast who is always researching applications in fields like biomaterials and biomedical science. With a strong background in Material Science, he is exploring new advancements and creating opportunities to contribute.

Building Design shared a link

2025-05-13 13:56:04 ·

Lessons must be learned from past PFI failures, government infrastructure advisor warns

Comments from NISTA’s Matthew Vickerstaff come as ministers weigh up benefits of relaunching initiative next month
The government’s new infrastructure advisory body has said ministers would need to “learn from the mistakes” of the past if a new generation of PFI contracts is launched as part of the upcoming infrastructure strategy.
Matthew Vickerstaff, deputy chief executive of the National Infrastructure and Service Transformation Authority (NISTA), said there was still a “constant drumbeat” of construction issues on schools built through private finance initiatives (PFI).
Matthew Vickerstaff speaking at the Public Accounts Committee yesterday afternoon
Chancellor Rachel Reeves is understood to be considering reinstating a form of private financing to pay for public projects, including social infrastructure schemes such as schools, ahead of the launch of the government’s 10-Year Infrastructure Strategy next month.
It would be the first major rollout of PFI in England since 2018, when then chancellor Philip Hammond declared the successor scheme to the original PFI programme “inflexible and overly complex”.
Speaking at a meeting of the Public Accounts Committee in Parliament yesterday, Vickerstaff highlighted issues that had blighted historic PFI schemes where construction risk had been transferred to the private sector.
“Just what we’re seeing on school projects, leaking roofs is a consistent, constant drum beat, fire door stopping, acoustics, lighting levels, the ability of classrooms to be operable in a white board environment, problems around leisure centres or sports facilities, contamination of land, latent defects of refurbishments on old buildings creating real problems,” he said.
“The dash to get the schools ready for September, I cannot tell you how many PFI schools have that problem, and we need to get the private sector to fix it.”
But while Vickerstaff said he was “ambivalent” about a new generation of PFI contracts, he argued contractual arrangements on new schemes could contain less risk for the public purse if the government did decide to opt for this route in its infrastructure strategy.
“I would say that compared with 25 years ago, the asset management, the building information systems and computer-aided facilities management has vastly improved so we’re dealing with a generation of contracts that would certainly be improved whether it’s public sector or private sector,” he said.
“I’m ambivalent but what we need to make sure is that we learn from the mistakes and definitely get them to fix what we’re experiencing in some situations.”
Vickerstaff added: “In terms of lessons learned, making sure construction is monitored by a clerk of works and independently certified would be a really important factor moving forward, because construction defects have been a problem because the construction contracts whether it be public sector or private sector have not been well monitored or controlled.”
Meanwhile, a new report by PwC has called on the government to explore a new generation of public-private finance in order to address the deficit in infrastructure including schools and healthcare.
The research, published today, found “strong market appetite” for a new model of public-private partnerships which could be based on the Mutual Investment Model developed in Wales.
PwC corporate finance associate director Dan Whittle said: “There is a strong view that public-private finance has a valuable role to play as a strategic tool to close the UK’s infrastructure gap, particularly at a time when we are constrained by fiscal rules.
“There is no need to reinvent the fundamentals of the PPP model. What must continue to evolve is how we implement this model with refined risk allocation to reflect the current appetite of the market, smarter contract management, and a genuine partnership approach.”
The government is expected to unveil its infrastructure strategy alongside its spending review in June.

Source: https://www.bdonline.co.uk/news/lessons-must-be-learned-from-past-pfi-failures-government-infrastructure-advisor-warns/5135938.article

