• This Windows 11 update makes Start Menu much more desirable and usable again
    www.digitaltrends.com
    The Start Menu has been the central element of Microsoft Windows for nearly three decades. Though loved initially for its resourcefulness, the Menu went through some debatable changes, I would call them abhorrent, with Windows 8, before returning to a smaller footprint in the interface with Windows 8.1 and then Windows 10 and 11. Despite the rescue, it still suffers from damaging additions in the form of recommendations and randomly auto-populating lists that reduce it to a mere glorified search interface. However, Microsoft may now be looking to resolve these issues and bring back a simpler interface with an upcoming update.

Microsoft is testing a new Start Menu interface on Windows 11 that reduces the existing clutter of randomly interspersed apps and files. X user @phantomofearth, renowned for testing new features in Windows Insider builds, gave us a good look at the new interface in a detailed video walkthrough. First, the video shows that the updated interface does away with the current split view of Pinned and Recommended sections and merges them into a single section. It adds a third section labelled All, which lists every installed app and was previously accessible by clicking the All button at the top of the Pinned section. Apps can be arranged in an alphabetical list or in a grid with folders grouping apps by category.

The overhauled Start Menu also gets a vertically scrolling layout, so all your apps can be accessed on a single page with fewer button clicks or taps. In addition, the Start Menu lets you hide recommendations entirely if you'd like it that way. The idea, seemingly, is to give Windows 11 users more control over the menu's usability. While it's difficult for the Start Menu to return to its glorious Windows XP days, I expect some respite from the barrage of unwanted ads with these improvements.

These changes are coming to the latest Windows 11 Insider preview builds in the Dev and Beta channels, with build numbers 26200.5518 and 26120.3671, respectively. That means you can't access them immediately unless you are part of those Insider channels. Microsoft hasn't officially announced when the new Start Menu interface will be available in stable channels, or whether it will be at all. However, with Microsoft's Copilot event, along with the company's 50th anniversary celebration, lined up for early tomorrow morning, we can expect it to share some news.
  • Deloitte is planning layoffs after a federal crackdown on consulting contracts
    www.businessinsider.com
    Deloitte announced layoffs at an all-hands call on Thursday, employees told BI. J. David Ake/Getty Images. 2025-04-04T06:32:53Z

Deloitte is planning layoffs in its government and public services practice. Deloitte has seen 127 federal contracts cut or modified since January as DOGE slashes government costs. Deloitte leaders said at a meeting Thursday that the cuts may cause losses next year, an employee told BI.

Deloitte is preparing for layoffs. Three current Deloitte employees told Business Insider they heard about the company's plans on a call for the firm's consulting and advisory practices on Thursday. On the call, known internally as "A+C On Air," leadership addressed the planned cuts. The employee added that Salzetti said cuts in the division would conclude by the end of April.

In a statement to Business Insider, Jonathan Gandal, a managing director in Deloitte's reputation division, confirmed the layoffs, writing, "We are taking modest personnel actions based on moderating growth in certain areas, our government clients' evolving needs, and low levels of voluntary attrition." It was not immediately clear how many people would be affected by the layoffs. The government and public services practice has over 15,000 employees in the US and is worth $5.5 billion, according to Deloitte's website.

DOGE comes for consulting

The firm is bearing the brunt of DOGE's scrutiny of the federal government's contracts with the consulting industry. The General Services Administration, which is leading the consulting cost-cutting push, asked 10 firms, including Deloitte, to submit a scorecard detailing their pricing and suggestions for where they could cut costs this past Monday. The results of those submissions have not been published yet, but the GSA is pushing for deeper cuts, The Wall Street Journal reported.

Since January, at least 127 of Deloitte's government contracts have been cut or modified, more than double the number for Booz Allen Hamilton, the second-most-affected firm, according to data from the White House's DOGE office analyzed by Business Insider earlier this week. That amounts to about $371.8 million in cuts, or over 11% of the $3.3 billion in contracts Deloitte strikes with US federal agencies a year.

At Thursday's meeting, executives acknowledged the recent contract cuts. Two employees who were on the call told BI that leadership said Deloitte's fiscal year, which typically runs June 1 through May 31, will end with higher revenue projections than planned. "I'm expecting a healthy, but not jaw-dropping bonus in May, and then not really expecting much of any bonus next year," one of the employees said.

Employees told BI that DOGE's actions have shifted the climate at Deloitte, especially for those who work in its public sector practices. One employee added that "the tariffs and chaos are beginning to cause alarm bells in commercial as well," referring to the slew of tariffs Trump has proposed since taking office. Deloitte did not respond to BI's request for comment for more details on bonuses and on DOGE's effect on company culture.

Are you a consultant who has been impacted by DOGE? Reach out to Lakshmi Varanasi at lvaranasi@businessinsider.com or lvaranasi.70 on encrypted messaging app Signal.
  • Here's how the 10 richest people in the world fared after Trump's tariffs
    www.businessinsider.com
    Amazon founder Jeff Bezos, Google CEO Sundar Pichai, and Tesla CEO Elon Musk lost billions this week in the market selloff. SAUL LOEB / POOL / AFP. 2025-04-04T06:26:45Z

The world's top 10 richest people saw $74 billion vanish on paper after Trump's tariffs. Trump's tariffs triggered a huge market sell-off. Musk lost $11 billion and Bezos nearly $16 billion, per the Bloomberg Billionaires Index.

The world's top 10 richest people saw $74 billion vanish on paper after this week's market rout, according to the Bloomberg Billionaires Index. President Donald Trump's sweeping tariffs on Wednesday afternoon triggered market chaos. Stocks suffered their worst single-day loss in five years on Thursday. The S&P 500 dropped nearly 5%, the Dow lost 1,679 points, and the Nasdaq composite plunged 6%. Here's how much the wealthiest have lost since the tariff announcement and how it compares to their net worth, per Bloomberg:

Elon Musk: -$11.0 billion (-2.5%). Elon Musk has remained a fixture in Washington since the start of President Donald Trump's second term. He has seen his net worth fluctuate wildly over the past several weeks, as his involvement with the White House DOGE office has drawn public ire and boycotts against Tesla, sending Tesla's stock down. Musk's wealth largely comes from his stake in Tesla, but he is also the CEO of X/Twitter, Neuralink, the Boring Company, and SpaceX. He's worth $322 billion, per the Bloomberg Billionaires Index, making him the world's richest person.

Jeff Bezos: -$15.9 billion (-6.7%). Bezos is the founder and executive chairman of Amazon, and he is worth $201 billion. He also owns The Washington Post, which he purchased in 2013. Bezos stepped down as Amazon's CEO in 2021 but remains closely tied to the company's strategic operation.

Mark Zuckerberg: -$17.9 billion (-8.6%). Mark Zuckerberg has been facing criticism over rolling back fact-checking on Meta platforms, including Facebook, Threads, and Instagram, and replacing it with "community notes." Zuckerberg is the cofounder, chairman, and CEO of Meta, with a net worth of $189 billion.

Warren Buffett: -$2.57 billion (-1.8%). Warren Buffett, with a net worth of $165 billion, is the chairman and CEO of Berkshire Hathaway, a multinational conglomerate holding company. Through Berkshire, Buffett owns a wide range of businesses, including GEICO, BNSF Railway, and Dairy Queen. Berkshire Hathaway's largest holding is Apple, which makes up around 20% of its portfolio.

Bernard Arnault: -$6.22 billion (-3.5%). Bernard Arnault is the chairman and CEO of LVMH, the world's largest luxury goods conglomerate. The majority of his $163 billion comes from his stake in LVMH, which owns over 75 brands across fashion, cosmetics, jewelry, and spirits, including Louis Vuitton, Dior, and Moët & Chandon. LVMH has been reporting declining sales amid dampened consumer sentiment in multiple countries.

Bill Gates: -$291 million (-0.2%). Bill Gates is the cofounder of Microsoft, though he stepped down from the company's board in 2020 and now owns only a small percentage of its shares. Most of his $162 billion in wealth is managed through Cascade Investment, a private firm that holds major stakes in companies like the Four Seasons Hotels. Gates also runs the Bill & Melinda Gates Foundation, a philanthropic organization that supports global health, education, and climate initiatives.

Larry Ellison: -$8.10 billion (-4.2%). Larry Ellison is the cofounder, executive chairman, and chief technology officer of Oracle, one of the world's largest software and cloud computing companies. With a net worth of $160 billion, Ellison is also a major investor in Tesla and owns a large portion of Lanai, a Hawaiian island. Ellison, along with OpenAI's Sam Altman and SoftBank's Masayoshi Son, is also spearheading Project Stargate, a $500 billion AI infrastructure initiative supported by Trump.

Larry Page: -$4.79 billion (-2.9%). Larry Page is the cofounder of Google and a board member of its parent company, Alphabet. While he stepped down as Alphabet's CEO in 2019, he remains a major shareholder and influential figure, with a net worth of $138 billion. Page is also a major backer of Kitty Hawk and Opener, companies that are developing electric flying vehicles.

Steve Ballmer: -$2.85 billion (-1.9%). Steve Ballmer is the former CEO of Microsoft, a role he held from 2000 to 2014. He remains one of the company's largest individual shareholders, with a net worth of $131 billion. Outside Microsoft, Ballmer also owns the Los Angeles Clippers, an NBA team he purchased in 2014 for $2 billion.

Sergey Brin: -$4.46 billion (-2.8%). Sergey Brin is the cofounder of Google and played a key role in developing its early search algorithms. He served as president of Alphabet until stepping down in 2019. Like Page, Brin retains significant influence at Alphabet through his Class B shares. Most of his $130 billion net worth is tied to Alphabet stock.
  • RT SpaceX: Nearly four days after the @framonauts launched to a polar orbit, Dragon and the Fram2 crew are set to return to Earth on Friday, April 4 ...
    x.com
    RT SpaceX: Nearly four days after the @framonauts launched to a polar orbit, Dragon and the Fram2 crew are set to return to Earth on Friday, April 4. http://spacex.com/launches/mission/?missionId=fram2
Chun: I often say Fram2 is a Svalbard mission. We @framonauts all met on Svalbard, and we love the ice. The mission was planned when I lived there, and we fly polar because, in an ISS-like orbit, we are unable to see where we live. From this perspective, the mission has perfectly
  • Nintendo Switch 2 Supports Nvidia's AI-Powered DLSS and Ray Tracing, Nintendo and Nvidia Confirm
    www.gadgets360.com
    The Nintendo Switch 2 will support Nvidia DLSS upscaling technology and ray tracing in games, Nintendo has said after fully unveiling its next console. The Switch 2's technical specifications confirm that the hybrid console uses a custom Nvidia processor. Nvidia, too, confirmed DLSS and ray-tracing support for the Switch 2 on Thursday, and said its custom GPU would bring next-level visuals and smoother gameplay on the new platform.

DLSS, Ray Tracing on Nintendo Switch 2

Nintendo confirmed Nvidia's Deep Learning Super Sampling (DLSS) and ray tracing technology on the Switch 2 in a roundtable Q&A in New York following the Nintendo Direct broadcast on Wednesday. According to IGN, which attended the roundtable session, Nintendo did not provide details of the DLSS version supported. DLSS upscaling technology uses AI and machine learning to improve the quality of lower-resolution images in real time and boost framerates in games. Nvidia later announced the same in a blog post on Thursday, confirming that the Nintendo Switch 2 is powered by a custom Nvidia processor featuring an Nvidia GPU with dedicated RT Cores and Tensor Cores.

The new RT Cores bring real-time ray tracing, delivering lifelike lighting, reflections, and shadows for more immersive worlds, the company said. Tensor Cores power AI-driven features like Deep Learning Super Sampling (DLSS), boosting resolution for sharper details without sacrificing image quality. Tensor Cores also enable AI-powered face tracking and background removal in video chat use cases, enhancing social gaming and streaming. Nvidia claimed the Switch 2 would produce 10x the graphics performance of the Nintendo Switch and deliver sharper visuals and smoother gameplay. In handheld mode, the Switch 2 display would support variable refresh rate via Nvidia G-SYNC to reduce screen tearing.

Nintendo and Nvidia have not detailed the CPU and GPU specifications of the Switch 2 and have instead shared broad details about the console's capabilities. The Nintendo Switch 2 can deliver up to 4K resolution in docked mode and up to 120fps gaming at 1080p in handheld mode. The Nintendo Switch successor is set to release on June 5, along with a host of launch-day first-party and third-party titles. The Switch 2 features a 7.9-inch LCD display and redesigned magnetic Joy-Con 2 controllers that also support mouse controls. The console is priced at $449.99 (roughly Rs. 38,500) in the US for the sole 256GB storage option, $150 more than the original Nintendo Switch.

Key specs: HDD: 256GB; Processor: Custom Nvidia; Graphics: Custom Nvidia; AV: No; USB: 2 USB Type-C ports; Weight: 535.2 grams; Ethernet: No.
  • Apple Starts Testing iPad Mini With OLED Screen Expected to Launch in 2026, Tipster Claims
    www.gadgets360.com
    Photo Credit: Apple. Apple's seventh-generation iPad Mini was launched in October 2024.

Highlights: Apple is said to be testing an OLED screen for a small tablet. The company is expected to upgrade the iPad Mini with an OLED screen. Apple launched the iPad Pro (2024) with a Tandem OLED screen last year.

Apple is testing a new iPad Mini model equipped with an OLED screen, according to details shared by a tipster. The new 8-inch panel is said to be manufactured by Samsung Display, and the South Korean firm could begin production in H2 2025. Apple is expected to unveil the successor to last year's iPad Mini (7th Generation) in 2026, according to analysts. The company's iPad Pro (2024) was its first tablet model to be equipped with an OLED screen.

Apple's iPad Mini Could Sport an OLED Screen Produced by Samsung

In a post on Weibo, the Chinese microblogging website, Digital Chat Station (translated from Chinese) claims that Apple is evaluating a small OLED screen for the iPad. The smallest tablet in the company's lineup is the iPad Mini, and this indicates that Apple is planning to replace the Liquid Retina LCD screen on the iPad Mini (7th Generation) with an OLED screen. The tipster also says that they do not know whether Apple's next iPad model will feature an OLED screen with a high refresh rate. The LCD screen on the iPad Mini (7th Generation) refreshes at 60Hz, while the more advanced OLED panels used on the iPad Pro (2024) have a 120Hz refresh rate.

Digital Chat Station states that Apple is currently evaluating the OLED panel produced by Samsung, and production could begin in the second half of 2025. The company could launch an upgraded iPad Mini with the new OLED screen in 2026. While the next-gen iPad Mini is expected to arrive with an OLED screen, it will not be as advanced as the one on the iPad Pro, which sports a Tandem OLED screen that delivers increased brightness and improved colour reproduction while reducing power consumption.

According to previous reports, Apple is also working on an upgraded iPad Air with an OLED screen that could launch "as early as 2026". At the time, it was claimed that the iPad Air would be equipped with a less advanced OLED panel to keep costs low. Last year, technology research firm Omdia predicted that Apple's rumoured decision to equip its iPad Air and iPad Mini models with OLED screens would also convince rivals to switch from LCD panels. The demand for these panels could cross the 30-million-unit mark by 2029, according to the firm.

Key specs: Display: 8.30-inch; Processor: A17 Pro; Front Camera: 12-megapixel; Rear Camera: 12-megapixel; Resolution: 1448x2266 pixels; OS: iPadOS 18; Storage: 128GB.
  • Fastest Ways To Make $10,000+ Per Month In 2025: AI, Crypto & Automation Secrets!
    medium.com
    FASTEST WAYS TO MAKE $10,000+ PER MONTH IN 2025: AI, CRYPTO & AUTOMATION SECRETS!

The Game Has Changed: Are You Ready to Print Money in 2025? Welcome to the Golden Era of Online Wealth, where AI works, crypto grows, and you get rich while sleeping. If you're still grinding the 9-5 hustle, you're LAGGING behind. 2025 is the year of automation, passive income, and digital domination, and you're either in, or you're out. Want to start making money IMMEDIATELY? Join the #1 Money-Making Community Now! CLICK HERE TO START PRINTING CASH! Let's get into the 5 fastest, most profitable, and 100% automated ways to stack $10K+ per month in 2025.

1. AI-Powered Affiliate Marketing: The Set-and-Forget Cash Machine
SEO Keywords: best affiliate programs 2025, AI affiliate marketing, passive income with AI
What if you could earn $500+ commissions per sale without selling, cold calling, or even showing your face? That's AI-powered affiliate marketing in 2025. Step 1: Sign up for high-ticket affiliate programs (SaaS, AI tools, Web3 platforms). Step 2: Let AI tools (ChatGPT, Claude, Jasper AI) write blogs, create YouTube scripts & automate your marketing. Step 3: Set up AI-powered email funnels & chatbots that sell 24/7 for you. INSIDER SECRET: Use Quora, Medium, & AI-generated YouTube Shorts to drive FREE traffic to your links! JOIN THE #1 AFFILIATE CASH MACHINE NOW! CLICK HERE TO START!

2. AI Website Flipping: Turn $100 into $10,000 FAST!
SEO Keywords: AI website flipping, best Flippa businesses 2025, profitable website niches
Why buy and sell houses when you can flip AI-generated websites for 10X profits, in DAYS, not months? Step 1: Use 10Web, Framer AI, or Wix AI to create fully automated, high-value websites in hot niches (finance, AI, crypto, SaaS). Step 2: Fill them with SEO-optimized, AI-generated content to attract buyers. Step 3: Sell them on Flippa, Empire Flippers, or Motion Invest for thousands in pure profit. Limited-Time Opportunity! Learn How to Flip Websites for $10K+ Profits! CLICK HERE TO JOIN!

3. AI YouTube Automation: $10K+ Per Month with No Face, No Effort!
SEO Keywords: faceless YouTube automation 2025, AI YouTube script generator, high CPM YouTube niches
What if AI could run a YouTube channel FOR YOU, while you collect ad revenue, sponsorships, and affiliate commissions? In 2025, it's not only possible, it's EASY. Step 1: Use ElevenLabs, HeyGen, and Pictory AI to create high-quality, AI-generated faceless videos. Step 2: Target high-CPM niches (Forex, Crypto, AI, Luxury, SaaS). Step 3: Monetize with ads, affiliate links, digital products, and sponsors. Pro Tip: Channels in the finance & crypto space make $30-$100 per 1,000 views, way higher than entertainment or vlogs! WANT TO START? JOIN NOW & BUILD YOUR AUTOMATED YOUTUBE EMPIRE!

4. AI-Powered Dropshipping: Make Bank Without Touching Inventory!
SEO Keywords: AI dropshipping 2025, fastest Shopify store setup, winning products AI
You don't need warehouses, shipping, or headaches to make money in e-commerce anymore. AI automates EVERYTHING in 2025. Step 1: Find trending products using Minea, Niche Scraper, or AutoDS. Step 2: Use Gemini AI or Jasper AI to generate viral ad copy. Step 3: Scale fast with AI-optimized TikTok ads & Instagram reels. Build an AI-powered store today! CLICK TO START EARNING!

5. Crypto & DeFi Passive Income: Let Your Money Work for You!
SEO Keywords: best crypto to invest 2025, AI crypto trading bots, DeFi passive income
Forget gambling on meme coins; the real money is in AI crypto trading & DeFi staking. Here's how to get steady, passive crypto income in 2025. Step 1: Use AI trading bots like Pionex, Bitsgap, or 3Commas for auto-trading. Step 2: Stake ICP, Solana, or ETH on Lido, Aave, or ICP DeFi protocols for high APY. Step 3: Earn daily rewards and stack passive crypto income. JOIN THE CRYPTO MONEY MACHINE NOW! CLICK HERE TO START!

FINAL THOUGHTS: You Can Make $10K+ Per Month, But Only If You TAKE ACTION! 2025 is not for the lazy. This is the year of AI-driven money, and those who move FAST will win BIG. Your 2025 Wealth Plan Starts NOW: AI Affiliate Marketing: earn commissions while AI does the selling. Website Flipping: build & flip digital assets for 5-figure profits. YouTube Automation: make passive income from faceless videos. AI Dropshipping: scale without ever touching inventory. Crypto & DeFi: generate 24/7 crypto passive income. DON'T WAIT! Get Inside the #1 Online Money-Making Community Now! CLICK HERE TO START PRINTING CASH IN 2025! Which method excites you the most? Comment below & let's WIN together!
  • The Playground of AI: Exploring the Basics of Reinforcement Learning
    medium.com
    The Playground of AI: Exploring the Basics of Reinforcement Learning
18 min read

Photo generated from OpenAI's ChatGPT (prompted March 30, 2025)

The current trend in the field of Data Science revolves around Generative Artificial Intelligence (GenAI), particularly chatbots built on Large Language Models (LLMs). Before that, the focus was on predicting a class or score from different features using classical and deep learning models. Throughout this timeline, however, a subset of machine learning has always existed, though not as widely recognized, and continues to evolve and thrive. This field is Reinforcement Learning (RL), which is focused on training agents to make decisions by interacting with their environment to maximize cumulative rewards.

Photo taken from https://www.devopsschool.com/

Unlike supervised learning, where the objective is to learn from labeled examples, or unsupervised learning, which focuses on identifying patterns in data, RL involves an autonomous agent that learns by making decisions and adapting based on the outcomes of its actions, often without prior data and typically through a trial-and-error process. [1][2] Reinforcement Learning also has deep roots in various disciplines, including psychology, neuroscience, economics, and engineering. This plethora of perspectives and influences makes RL a dynamic and highly interdisciplinary field. [3] In this introduction to Reinforcement Learning, we will explore the foundations and mathematics behind the field, the main framework with a brief teaser on the different advancements, and, of course, a showcase of how RL works in Python.

Fundamentals

Photo taken from https://thedecisionlab.com/

At the core of Reinforcement Learning lies the Markov Decision Process (MDP), a mathematical framework that models decision-making in environments filled with uncertainty. An MDP consists of a set of states representing the different situations an agent can encounter, the actions the agent can take, and a transition probability that dictates the likelihood of moving between states. Additionally, a reward function provides feedback to the agent, helping it learn which actions lead to favorable outcomes. A key aspect of MDPs is the discount factor, which determines how much future rewards influence the agent's decisions, favoring either short-term or long-term gains. Another important property of an MDP is the Markov property: the prediction of the next state relies solely on the current state (and action), independent of past states and actions.

Photo taken from https://people.stfx.ca/

Another key concept in RL is the Multi-Armed Bandit (MAB) problem, which captures the trade-off between exploration and exploitation. In this framework, an agent repeatedly chooses from K possible actions (or arms) to maximize cumulative rewards over time, even though the reward distribution for each action is unknown. The agent must balance exploring new options to gather information and exploiting the best-known choice for immediate benefit. Unlike supervised learning, which provides direct feedback on correct decisions, RL uses evaluative feedback that only reflects the effectiveness of chosen actions.
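To make this explore/exploit tension concrete, here is a minimal epsilon-greedy bandit simulation. It is only an illustrative sketch: the three arms, their Gaussian reward means, and the epsilon value are made-up numbers, not taken from the article.

import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.2, 0.5, 0.8])   # hypothetical reward means for 3 arms (unknown to the agent)
q_estimates = np.zeros(3)                # running estimate of each arm's value
counts = np.zeros(3)
epsilon = 0.1                            # probability of exploring a random arm

for t in range(1000):
    if rng.random() < epsilon:
        arm = int(rng.integers(3))             # explore: try a random arm
    else:
        arm = int(np.argmax(q_estimates))      # exploit: pick the best-known arm
    reward = rng.normal(true_means[arm], 1.0)  # evaluative feedback for the chosen arm only
    counts[arm] += 1
    q_estimates[arm] += (reward - q_estimates[arm]) / counts[arm]  # incremental mean update

print("Estimated arm values:", q_estimates)    # should approach the true means for well-sampled arms

With epsilon set to 0, the agent can lock onto a suboptimal arm early; with epsilon too high, it keeps wasting pulls on arms it already knows are poor. That is the exploration/exploitation trade-off in its simplest form.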
Design

Photo taken from https://lilianweng.github.io/

The vanilla framework of Reinforcement Learning involves an Agent, the actor or decision-maker, operating within the Environment, defined as the world or system that sets the rules within which the agent can operate. Think of it as a game where the player is the Agent and the Environment is the confines of that game. The Agent is bound by the rules and design of the game; it cannot think outside the box, and there are no cheat codes. As a player, or Agent, plays the game, it performs an Action (a, A) from the set of possible moves to interact with the Environment. Performing an action leads to a State (s, S), a specific condition or configuration of the Environment at a given time as perceived by the Agent. As we play the game, we usually aim for an objective in order to progress; this is called the Reward (r, R). Defined as the feedback or result from the Environment based on the Agent's action, it tells the Agent how good or bad the action was. These are the basic parts of a vanilla, or basic, RL framework.

Now we go deeper into the framework of RL. The main goal of an RL problem is to find the optimal strategy, the way of choosing Actions across States that the Agent must follow in order to maximize the Rewards. Such a strategy, the rule the Agent follows to choose an Action in each State, is called a Policy (π). Think of this as the strategy for achieving things like the highest score, finishing the game, or even just trolling around and achieving nothing.

A simple RL model can already work with this framework and let the Agent solve the Environment by trial and error, simulating all the possible combinations in order to identify the Policy or Policies that maximize the Rewards. However, depending on the complexity of the Environment, there can be far too many combinations of Actions and States, which is computationally expensive and time consuming. This is why algorithms were added to the initial framework to solve this dilemma.

First, in order to judge the quality of a Policy, we need a quantitative measure of the expected return of the agent being in a certain state. This is called a Value Function, and it is derived from the Bellman equation, which expresses the value of a state (or state-action pair) in terms of the expected immediate reward plus the discounted value of the next state (or next state-action pair). In RL, the Value Function can be divided into two broad categories: the State Value Function and the Action Value Function.

Equation for the State Value Function (standard form): V_π(s) = E_π[ Σ_{k=0..∞} γ^k R_{t+k+1} | S_t = s ]

The State Value Function represents the expected cumulative reward an agent can achieve starting from a specific State and following a given Policy. This is crucial in the evaluation of deterministic policies or when understanding the value of being in a particular state is required.

Equation for the Action Value Function (standard form): Q_π(s, a) = E_π[ Σ_{k=0..∞} γ^k R_{t+k+1} | S_t = s, A_t = a ]

On the other hand, the Action Value Function represents the expected cumulative reward an agent can achieve from a given State by taking a specific Action and following the given policy thereafter. It is mainly used to evaluate and compare the potential of different actions taken in the same state. Action Value Functions are crucial for action selection, where the goal is to determine the most appropriate action for each situation. Because they account for the expected return of different actions, they are particularly useful in environments with stochastic policies.

Equations for the Optimal State Value Function and Action Value Function (Bellman optimality): V*(s) = max_a Σ_{s',r} p(s', r | s, a) [ r + γ V*(s') ] and Q*(s, a) = Σ_{s',r} p(s', r | s, a) [ r + γ max_{a'} Q*(s', a') ]

Solving an RL task involves identifying a Policy that maximizes long-term Rewards, which follows the Bellman Optimality Equation above. It also reflects the probabilistic nature of RL: the transition to a State with a certain Reward is conditioned on the current State and the chosen Action.
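As a worked illustration of the Bellman optimality backup described above, the short value-iteration loop below solves a tiny made-up MDP. The two states, two actions, transition probabilities, and rewards are arbitrary numbers chosen for the example; none of them come from the article.

import numpy as np

# Toy MDP: P[s, a, s'] is the transition probability, R[s, a] the expected immediate reward.
P = np.array([[[0.9, 0.1],
               [0.2, 0.8]],
              [[0.0, 1.0],
               [0.5, 0.5]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gamma = 0.9  # discount factor

V = np.zeros(2)
for _ in range(500):
    # Bellman optimality backup: V(s) = max_a [ R(s, a) + gamma * sum_s' P(s' | s, a) * V(s') ]
    V = np.max(R + gamma * (P @ V), axis=1)

print("Optimal state values:", V)
print("Greedy (optimal) action per state:", np.argmax(R + gamma * (P @ V), axis=1))

Repeatedly applying this backup converges because the discount factor γ < 1 makes the update a contraction, which is exactly why the discount factor matters beyond just weighing short-term against long-term gains.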
The Bellman Optimality Equation serves as the baseline for developing the RL algorithms and models currently used in the field.

Playbook

There are a lot of models and methods currently developed in the field of Reinforcement Learning. To keep things brief, we will go through the general classifications of the models to give an overview of how things are defined.

Photo taken from https://www.sciencedirect.com

One classification of Reinforcement Learning is Model-free Methods vs Model-based Methods. As the figure above indicates, Model-free Methods determine the optimal policy or value function directly, without creating a model of the environment. In this framework, the Agent learns only from the Observations, Actions, and Rewards it experiences in the Environment (experience-based learning). This makes it a straightforward and flexible approach, especially for complex Environments where understanding the system's dynamics is difficult or impractical. However, it often requires a massive number of interactions with the Environment, making it computationally expensive and slower to learn. [4]

On the other hand, Model-based Methods build a representation of how the environment behaves for planning and improving decision-making. The process involves explicitly learning or using a model of the environment's dynamics instead of relying only on direct experience. This makes Model-based Methods significantly more sample-efficient, as they allow for planning and strategic decision-making rather than pure trial and error. The challenge and downside is learning an accurate model of the environment: if the model is imperfect or inaccurate, the agent may make poor decisions based on incorrect predictions.

Photo taken from https://github.com/

Another classification is in terms of how a Policy is updated based on interaction with the Environment. Here the options are Online, Off-policy, or Offline Reinforcement Learning. First, Online RL is a dynamic learning approach where an agent continuously interacts with the environment, takes Actions, and updates its Policy based on real-time feedback. This method allows the Agent to adapt quickly to changes in the Environment, making it suitable for tasks where conditions are unpredictable. However, since learning happens through direct interaction, Online RL often requires a large number of trials, making it computationally expensive and inefficient for complex problems.

Unlike Online RL, Off-policy RL does not rely solely on real-time interactions. Instead, it allows agents to learn from previously collected data, making training more sample-efficient. This approach enables the agent to improve its Policy using experiences generated by other Policies or past iterations. While Off-policy RL provides flexibility and efficiency, it also introduces challenges such as distribution mismatch, where the data used for training may not fully align with the Optimal Policy being learned.

Offline RL, also known as Batch RL, takes learning a step further by training Policies exclusively from pre-collected datasets without any interaction with the Environment. This makes it highly valuable in situations where real-world data collection is costly, dangerous, or impractical, such as healthcare, robotics, and autonomous driving. Since Offline RL lacks direct interaction with the environment, it faces difficulties in generalizing to new situations and avoiding biases in the dataset.

Again, this is only a glimpse of the diverse models in the field of Reinforcement Learning (one concrete example of a model-free, off-policy update is sketched below).
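The sketch is a minimal tabular Q-learning loop on Gymnasium's FrozenLake environment. Q-learning is not covered by name in this article, and the hyperparameters are arbitrary; it is shown here only to make the model-free, off-policy idea tangible.

import numpy as np
import gymnasium as gym

env = gym.make("FrozenLake-v1", is_slippery=False)   # small discrete environment, easy to tabulate
Q = np.zeros((env.observation_space.n, env.action_space.n))
alpha, gamma, epsilon = 0.1, 0.99, 0.1               # arbitrary learning rate, discount, exploration rate

for episode in range(2000):
    state, _ = env.reset()
    done = False
    while not done:
        # behaviour policy: epsilon-greedy over the current table
        if np.random.rand() < epsilon:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(Q[state]))
        next_state, reward, terminated, truncated, _ = env.step(action)
        # off-policy target: the best next action, regardless of what will actually be taken next
        Q[state, action] += alpha * (reward + gamma * np.max(Q[next_state]) - Q[state, action])
        state = next_state
        done = terminated or truncated
env.close()

print("Greedy action per state:", np.argmax(Q, axis=1))

The update is model-free because it never estimates transition probabilities, and off-policy because the target uses the greedy next action even though the behaviour policy sometimes explores.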
Extensive discussions are needed to understand the ins and outs of each algorithm. But for now, let us move on to showcasing and visualizing how RL works.

Simulation

Now that we know the basics of Reinforcement Learning, we can proceed to applying and simulating an RL problem. The section below includes an overview of the libraries being used, initializing an Environment, simulating an Action, and training an Agent on two different Environments: CartPole and Atari Breakout.

In this code walkthrough, the main libraries used are gymnasium, which provides an Application Programming Interface (API) standard for reinforcement learning with a diverse collection of reference Environments, and Stable Baselines3 (SB3), which contains a set of reliable implementations (i.e., algorithms and wrappers) of reinforcement learning algorithms in PyTorch. Other modules are used for navigating the file directory, ensuring compatibility, and visualizing the results by rendering videos of the Environment simulations.

# File Directory
import glob
import io
import base64
import os
import shutil

# RL
import gymnasium as gym
from stable_baselines3 import PPO  # Algorithm, check docs for others
from stable_baselines3.common.vec_env import DummyVecEnv  # Wrapper for the env

# For Rendering Video in Colab
from gymnasium.wrappers import RecordVideo
from IPython.display import HTML
from pyvirtualdisplay import Display
from IPython import display as ipythondisplay
import matplotlib.pyplot as plt

# Compatibility
import numpy as np
np.bool8 = np.bool_

CartPole Level

env_name = "CartPole-v1"
environ = gym.make(env_name, render_mode="rgb_array")

The first part is initializing the Environment of the RL problem. We can choose from the different Environments in the gymnasium documentation, as well as third-party Environments. Note that each environment has different dependencies, so check the documentation first. For the first simulation, we select CartPole-v1, where the task is to balance a pole attached to a cart by moving the cart left or right. The goal is to balance the pole for as long as possible, with the Environment's threshold set to 500 frames to ensure that an episode (a trial/run of the game) does not go on too long.

environ = gym.make(env_name, render_mode="rgb_array")
env = RecordVideo(environ, video_folder="./video", disable_logger=True, video_length=1000)

for episode in range(5):
    obs, info = env.reset()
    done = False
    score = 0
    while not done:
        action = env.action_space.sample()  # Generate random action
        obs, reward, terminated, truncated, info = env.step(action)  # Proceed on the generated action
        score += reward
        done = terminated or truncated  # Ensure loop ends properly
        print(f'Action: {action}')
        print(f'State: {obs}')
        print(f'Reward: {reward}')
    print(f'Episode: {episode} Total Score: {score}\n')
env.close()

To visualize the CartPole Environment, we can simulate an episode by choosing random Actions, dictated by env.action_space.sample(), and check what happens. If we look at the output of the code, we can see that the Action can be either 0 (move the cart left) or 1 (move it to the right). The State, as described in the documentation, is an array of length 4 whose elements (in sequence) are cart position, cart velocity, pole angle, and pole angular velocity. The third part is the Reward, which is 1 if the pole is still balanced within a certain angle and 0 if the pole exceeds the threshold angle of +12 or -12 degrees. Lastly, at the end of each Episode, we tally how long the pole stayed balanced across the cart's movements and get the total Reward. (The short check below confirms these action and observation spaces directly from the environment.)
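This quick sanity check is not part of the original walkthrough; it simply queries the same gymnasium environment for the spaces described above.

import gymnasium as gym

env_check = gym.make("CartPole-v1")
print(env_check.action_space)          # Discrete(2): 0 = push cart to the left, 1 = push cart to the right
print(env_check.observation_space)     # Box(4,): cart position, cart velocity, pole angle, pole angular velocity
print(env_check.observation_space.low)
print(env_check.observation_space.high)
env_check.close()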
# Opening Video of Policy in Colab
# Similar with Env.render()
def show_video(path='video/*.mp4'):
    mp4list = glob.glob(path)
    if len(mp4list) > 0:
        mp4 = mp4list[0]
        video = io.open(mp4, 'r+b').read()
        encoded = base64.b64encode(video)
        ipythondisplay.display(HTML(data='''<video alt="test" autoplay loop controls style="height: 400px;">
            <source src="data:video/mp4;base64,{0}" type="video/mp4" />
        </video>'''.format(encoded.decode('ascii'))))
    else:
        print("Could not find video")

show_video()

CartPole-v1 Episode with random Actions

As we can see in the rendered video above, the pole was balanced for about 2 seconds before the game was terminated for exceeding the threshold angle. Note that this is the result of taking random Actions at each step, which means the Policy is not optimal. With this, we can proceed to training the Agent by simulating multiple Episodes, or trials, so the Agent learns to approach the Environment better than taking random Actions.

# Wrap env into a dummy vectorized environment (for compatibility purposes)
env = DummyVecEnv([lambda: env])

# Defining the agent (policy, environment, log path)
log_path = "training_logs"  # assumed log directory (defined earlier in the original notebook)
model = PPO('MlpPolicy', env, tensorboard_log=log_path, verbose=1)
model.learn(total_timesteps=20000)  # Timesteps depending on complexity of environment

Training the model, or the Agent, requires the Environment to be wrapped into a vectorized Environment, which ensures compatibility with the code and allows us to train multiple stacked Environments per step to speed up the training process. Next is defining the algorithm for the Policy, which for now is the default Proximal Policy Optimization (PPO), whose main idea is that after an update, the new policy should not be too far from the old policy. As for the MlpPolicy part, it is the base Policy network to be used, and the choice depends on the Environment; MlpPolicy is used with low-dimensional vector observations. Next, we make the Agent learn the Environment by simulating multiple Policies, and we set a maximum number of timesteps to cap the learning process. Given that this is a simple RL problem, 20,000 timesteps is almost enough for us to achieve high Rewards. Running model.learn() outputs the state of the training, showing multiple metrics (losses, variance, deviation) on how training is performing.

from stable_baselines3.common.evaluation import evaluate_policy  # Testing/Validation

# Trained agent
mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=100)
print('Trained Model')
print(f"mean_reward:{mean_reward:.2f} +/- {std_reward:.2f}")

# Random agent, before training
model2 = PPO('MlpPolicy', env, tensorboard_log=log_path, verbose=1)
mean_reward2, std_reward2 = evaluate_policy(model2, env, n_eval_episodes=100)
print('Base Model')
print(f"mean_reward:{mean_reward2:.2f} +/- {std_reward2:.2f}")

The next part is evaluating how the Agent performs after learning the Environment and simulating multiple Policies. As indicated above, the Agent now achieves the threshold Reward of 500, which means that it knows the optimal Policy of the Environment. If we compare the trained model with the base model, we can see the major improvement in the achieved Reward, going from an average of about 33 to 500 (with a standard deviation of 0). (Saving the trained agent so it can be reloaded later is sketched below.)
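The article does not cover persisting the agent, but Stable Baselines3 provides save/load for exactly this. A minimal sketch, reusing the model and env objects from the snippets above; the file name "ppo_cartpole" is an arbitrary choice:

# Save the trained agent to disk so evaluation/rendering can be rerun without retraining
model.save("ppo_cartpole")

# ...later, restore it against the same (or an equivalently wrapped) environment
from stable_baselines3 import PPO
loaded_model = PPO.load("ppo_cartpole", env=env)
mean_reward, std_reward = evaluate_policy(loaded_model, env, n_eval_episodes=10)
print(f"reloaded model: {mean_reward:.2f} +/- {std_reward:.2f}")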
folder_path = "/content/video_test/"  # Change to your folder path
shutil.rmtree(folder_path)

env_name = 'CartPole-v1'
environ = gym.make(env_name, render_mode="rgb_array")
env = RecordVideo(environ, video_folder="/content/video_test", disable_logger=True, video_length=1000)

for episode in range(2):
    obs, info = env.reset()
    done = False
    score = 0
    while not done:
        action, _ = model.predict(obs)
        obs, reward, terminated, truncated, info = env.step(action)  # Proceed on the generated action
        score += reward
        done = terminated or truncated  # Ensure loop ends properly
        # print([action, obs, reward, terminated])
    print(f'Episode: {episode} Score: {score}\n')
env.close()

show_video('video_test/*.mp4')

A CartPole-v1 Episode after Training with 20,000 Timesteps

Visualizing the results of the training, we can see in the rendered video above that the pole stays balanced throughout the simulation. The video lasts around 10 seconds, which is the maximum duration as set in the Environment (500 frames). This is basically how a Reinforcement Learning workflow is done: initialize an Environment, train the Agent with a selected algorithm, choose how long the training will run, then check the results of the training.

Breakout Level

Next, we proceed to a more difficult Environment, Breakout, the famous Atari game. The dynamics of the Environment are similar to Pong: moving a paddle to direct the ball toward the brick walls at the top of the screen. The goal is to destroy as many bricks as possible, if not all of them, before the ball reaches the bottom of the screen.

from stable_baselines3 import A2C
from stable_baselines3.common.vec_env import VecFrameStack
from stable_baselines3.common.env_util import make_atari_env
import ale_py

gym.register_envs(ale_py)

env_atari = gym.make('ALE/Breakout-v5', render_mode="rgb_array")
env = RecordVideo(env_atari, video_folder="/content/atari", disable_logger=True, video_length=1000)

for episode in range(5):
    obs, info = env.reset()
    done = False
    score = 0
    while not done:
        action = env.action_space.sample()  # Generate random action
        obs, reward, terminated, truncated, info = env.step(action)  # Proceed on the generated action
        score += reward
        done = terminated or truncated  # Ensure loop ends properly
        # print([action, obs, reward, terminated])
        print(f'Action: {action}')
        print(f'State: {obs}')
        print(f'Reward: {reward}')
    print(f'Episode: {episode} Score: {score}\n')
env.close()

Again, we initialize the Environment (again, check the documentation to ensure compatibility and install the necessary dependencies) and simulate an Episode using random Actions. For this game, there are four actions that can be taken: 0 for no action, 1 to fire the ball (to start the game), 2 to move the paddle right, and 3 to move the paddle left. The State is an observation space of Box(0, 255, (210, 160, 3), np.uint8), which holds the RGB pixel values of the Environment. The Reward is given when a brick is destroyed in the specific state. Again, the goal is to destroy as many bricks as possible before game over.

show_video('atari/*.mp4')

Breakout Episode with Random Actions

Looking at the results of the episode with random Actions, we can see that the agent does not follow the trajectory of the ball (as expected, given that Actions are random). It got lucky at the end and moved the paddle to hit the ball twice, breaking two bricks and scoring two points. Before moving on to training, the short check below confirms the action labels and observation shape straight from the environment.
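This check is not part of the original walkthrough. The get_action_meanings() call comes from the underlying ale-py Atari environment, so treat it as an assumption-laden sanity check rather than part of the article's code.

import gymnasium as gym
import ale_py

gym.register_envs(ale_py)
env_check = gym.make("ALE/Breakout-v5")
print(env_check.unwrapped.get_action_meanings())  # expected something like ['NOOP', 'FIRE', 'RIGHT', 'LEFT']
print(env_check.action_space)                     # Discrete(4)
print(env_check.observation_space)                # Box(0, 255, (210, 160, 3), uint8)
env_check.close()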
Again, we need to train the Agent and make it learn by simulating different Policies.

env_atari = make_atari_env('ALE/Breakout-v5', n_envs=4, seed=0)
env_atari_vec = VecFrameStack(env_atari, n_stack=4)

# Reset environment to get initial frames
obs = env_atari_vec.reset()

# Capture a frame from each environment
frames = env_atari.get_images()  # Returns a list of 4 frames (one per env)

# Create a 2x2 grid to display the frames
fig, axes = plt.subplots(2, 2, figsize=(10, 10))
for i, ax in enumerate(axes.flat):
    ax.imshow(frames[i])  # Display the frame for each env
    ax.axis("off")
    ax.set_title(f"Env {i+1}")
plt.tight_layout()
plt.show()

Different Breakout Instances to be Stacked During Training

For the training, we will be running four separate instances of the Environment at the same time. These instances run in parallel, speeding up training by processing multiple game states at once. One more thing to note: in this Environment, the trajectory of the ball matters, so the paddle can be moved correctly towards the direction of the ball as it comes down. A single frame is not enough to tell where the ball is going, which is why we need to stack frames (handled by VecFrameStack).

model_atari = A2C('CnnPolicy', env_atari_vec, verbose=1, tensorboard_log=log_path)
model_atari.learn(total_timesteps=500000, log_interval=10000)

For the Atari Breakout problem, we will be using A2C (Advantage Actor-Critic), an algorithm that combines value-based and policy-based approaches. The CnnPolicy is required for image-based observations such as Breakout's. Next, we train the Agent for 500,000 timesteps to simulate different Policies, updating them with the chosen algorithm. As indicated along with the different training metrics, it took around 30 minutes to complete 500,000 timesteps, and that is with the process parallelized across four instances running at the same time.

mean_reward, std_reward = evaluate_policy(model_atari, env_atari_vec, n_eval_episodes=20)
print('Trained Model')
print(f"mean_reward:{mean_reward:.2f} +/- {std_reward:.2f}")

folder_path = "/content/atari_test/"  # Change to your folder path
shutil.rmtree(folder_path)

from stable_baselines3.common.vec_env import VecVideoRecorder  # recorder for vectorized envs

# env_atari = gym.make('ALE/Breakout-v5', render_mode="rgb_array")
env_atari = make_atari_env('ALE/Breakout-v5', n_envs=1, seed=13)
env = VecFrameStack(env_atari, n_stack=4)
env = VecVideoRecorder(env, video_folder="/content/atari_test",
                       record_video_trigger=lambda x: x == 0, video_length=1000)

for episode in range(5):
    obs = env.reset()
    done = False
    score = 0
    while not done:
        action, _ = model_atari.predict(obs)
        obs, reward, dones, info = env.step(action)  # Proceed on the generated action (vectorized API)
        score += reward
        done = dones[0]  # Ensure loop ends properly (vec envs return an array of done flags)
    print(f'Episode: {episode} Score: {score}\n')
env.close()

show_video('atari_test/*.mp4')

A Breakout Episode after Training with 500,000 Timesteps

Looking at the results of the training, we achieved an average reward of 23 destroyed bricks. In the sample simulation of the trained model, we can see that the movement of the paddle is now slightly coordinated with the trajectory of the ball (although it failed badly after one successful life). With the maximum score of 432 for Atari Breakout, imagine how many timesteps are needed to train the model for the Agent to reach the max score. (One practical mitigation, checkpointing the model so training can be resumed later, is sketched below.)
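Given how long Atari training runs take, checkpointing the agent avoids repeating the 500,000-step run. A minimal sketch using Stable Baselines3's save/load, assuming the model_atari and env_atari_vec objects from above; the file name and the extra step count are arbitrary choices, not from the article.

# Save the partially trained Atari agent
model_atari.save("a2c_breakout_500k")

# ...later, reload it and continue training from where it left off
from stable_baselines3 import A2C
model_atari = A2C.load("a2c_breakout_500k", env=env_atari_vec)
model_atari.learn(total_timesteps=250_000, reset_num_timesteps=False)  # keep the old step counter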
The gap between the trained Agent's average of 23 and Breakout's maximum score of 432 highlights a recurring dilemma in RL: it is computationally expensive, and even a simple Environment can take a very long time to train before the Agent knows the optimal Policy.

Next Steps

In the walkthrough above, we simulated a Reinforcement Learning problem and trained an Agent through a trial-and-error process. There are multiple directions to go from here, especially if we want to achieve more reward, as with the Atari Breakout game, where the agent only scored 23 on average after training. The most straightforward option is increasing the number of timesteps to millions to let the Agent experience more policies. This will take forever but will ensure better results. Other options, similar to traditional supervised learning, are hyperparameter tuning and algorithm selection. Each algorithm has its own strengths, so it is important to check the papers of the high-performing ones (if not all). Each algorithm also has its own parameters, such as the learning rate, that can be tweaked per Environment to improve performance.

Another exploration worth doing is, instead of using model-free and on-policy methods, trying model-based, off-policy, or offline RL methods to see how they fare in a specific Environment. There are a lot of branches of RL in terms of models and algorithms, so it is best to know them all.

Lastly, in my opinion, the best way to understand RL and to specialize in it is to learn to create a custom Environment: defining the specifics, the rules, the Agent, the Actions it can take, and how the Rewards are scored. Knowing how this works lets us be creative enough to apply RL to different situations. (A minimal skeleton of such a custom Environment is sketched below.)
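As a starting point for that last suggestion, here is a minimal skeleton of a custom Gymnasium Environment. The task (walking an integer position to a goal at 10) and the reward values are invented purely for illustration; a real custom Environment would define its own spaces, dynamics, and Rewards.

import numpy as np
import gymnasium as gym
from gymnasium import spaces

class WalkToGoalEnv(gym.Env):
    """Toy custom Environment: move an integer position from 0 to a goal at 10."""

    def __init__(self):
        super().__init__()
        self.action_space = spaces.Discrete(2)  # 0 = step left, 1 = step right
        self.observation_space = spaces.Box(low=0.0, high=10.0, shape=(1,), dtype=np.float32)
        self.pos = 0

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.pos = 0
        return np.array([self.pos], dtype=np.float32), {}

    def step(self, action):
        self.pos = int(np.clip(self.pos + (1 if action == 1 else -1), 0, 10))
        terminated = self.pos == 10
        reward = 1.0 if terminated else -0.01  # small step penalty encourages shorter paths
        return np.array([self.pos], dtype=np.float32), reward, terminated, False, {}

# The same Stable Baselines3 workflow used above applies to a custom Environment as well.
env = WalkToGoalEnv()
obs, info = env.reset()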
Conclusion

Reinforcement Learning is normally not as popular or widely used as our typical supervised and unsupervised machine learning worlds. However, it is a vast and quickly evolving field and, just like those two, provides insights that are new even to domain experts. RL is already being applied in different fields: in Robotics, where instead of investing a lot of money into hardware for testing, RL can run the simulations; in Gaming, where AI is used in different aspects such as game content, game testing, and strategies; and in Autonomous Driving, where it enables vehicles to learn optimal behaviors that ensure safety and efficiency. As part of the education sector, it is noteworthy to highlight the application of RL in creating customized curricula to maximize the learning and motivation of students going through classes.

Again, this is only part 1 of a Reinforcement Learning series from someone who started with minimal knowledge of the field. This hopefully becomes a road to specializing in RL, or at least in a specific area of the field. This is only the start of uncovering the ins and outs of the Playground of AI.

Python notebook for the scripts provided: https://github.com/redvjames/RL_sandbox (Tested in Google Colab)

References

[1] Ghasemi, M., Moosavi, A. H., Sorkhoh, I., Agrawal, A., Alzhouri, F., & Ebrahimi, D. (2024). An introduction to reinforcement learning: Fundamental concepts and practical applications. arXiv preprint arXiv:2408.07712. https://doi.org/10.48550/arXiv.2408.07712
[2] Naeem, M., Rizvi, S. T. H., & Coronato, A. (2020). A gentle introduction to reinforcement learning and its application in different fields. IEEE Access, 8, 209320-209344. https://doi.org/10.1109/ACCESS.2020.3038605
[3] Ahilan, S. (2023). A succinct summary of reinforcement learning. arXiv preprint arXiv:2301.01379. https://doi.org/10.48550/arXiv.2301.01379
[4] AlMahamid, F., & Grolinger, K. (2021, September). Reinforcement learning algorithms: An overview and classification. In 2021 IEEE Canadian Conference on Electrical and Computer Engineering (CCECE) (pp. 1-7). IEEE. https://doi.org/10.1109/CCECE53047.2021.9569056