• Reflecting on The Game Awards 2024 | GI Microcast
    www.gamesindustry.biz
    Latest episode available to download now. News by GamesIndustry.biz Staff, Contributor. Published on Dec. 16, 2024.

The latest episode of The GamesIndustry.biz Microcast is available to download now. This week, our show focuses on The Game Awards 2024. Chris reports on his experiences from attending the event, we discuss some of the big winners, and (inevitably) dive deeper into the big announcements from the night, as well as what they indicate about the state of games in the year ahead.

We also have the usual What Do The Numbers Mean? segment, in which we take a look at the initial performance of Indiana Jones and the Great Circle.

You can listen via the player below, download the audio file directly here, or subscribe to our podcast feed, available via Spotify, iTunes, Amazon Music, CastBox, Player FM, TuneIn and other widely used podcast platforms. Episode edited by Alix Attenborough.
  • What impact have layoffs had on the games industry over the past two years?
    www.gamesindustry.biz
    InGame Job CEO Katya Sabirova shares the key takeaways from the Big Games Industry Employment Survey 2024. [Image credit: Big Games Industry Employment Survey, Values Value, InGame Job] Feature by Katya Sabirova, Contributor. Published on Dec. 16, 2024.

Mass layoffs have been the defining theme of the past two years, and they're not over yet. The industry is still feeling the effects of this crisis, even though many players are starting to see signs of stabilization.

Let's take a closer look at the layoffs of 2023-2024 in Europe to understand who was hit the hardest, how long it took professionals to find new jobs, and how their incomes and working conditions were impacted.

Our main source of data is the Big Games Industry Employment Survey 2024, conducted in spring 2024 and presented at the Devcom conference in August 2024. GamesIndustry.biz previously covered a talk by Tanja Loktionova, one of the survey's organizers, where she shared its initial findings. Now, the full report is available for free on the InGame Job portal.

In this article, we'll dig deeper into the wave of mass layoffs in the industry, which was a key focus of the Big Games Industry Employment Survey 2024.

Who suffered the most?

Let's recap: the anonymous survey gathered over 1,800 responses from game industry professionals. The majority were mid-level or higher specialists (9% Junior, 29% Middle, 30% Senior, 28% Lead/Top). [Katya Sabirova, InGame Job & Values Value | Image credit: Values Value]

Additionally, 19% of respondents had over ten years of experience in games. In short, these were mostly seasoned professionals who might have been expected to be safe from layoffs.

However, even among them, some were affected: 15% reported being laid off in 2023-2024 but had already found new jobs by the time of the survey. Another 6.2% said they were laid off and remained unemployed at the time of the survey. Altogether, 21.6% of respondents reported experiencing layoffs. So, who are these professionals who found themselves out of work?

When it comes to seniority levels, the survey shows that layoffs impacted professionals across the board, regardless of expertise. Between 23% and 26% of juniors, mid-levels, and seniors reported being laid off. At the Lead/Top level, the percentage was slightly lower, at 15-16%.

However, the recovery process varied by seniority. Senior+ professionals tended to find new jobs relatively quickly, while mid-levels and juniors faced more challenges. By the time of the survey, 10% of mid-levels and 9% of juniors were still unemployed after being laid off. In contrast, only 3-5% of seniors, leads, and top-level professionals remained out of work. [Image credit: Values Value]

Among the specializations most affected by layoffs, artists (28%), QA specialists (27%), and HR/recruitment professionals (25%) were hit the hardest. Artists and testers, in particular, struggled to find new jobs quickly: 10% of artists and 11% of testers were still unemployed at the time of the survey. [Image credit: Values Value]

Salaries for QA and HR/recruitment professionals also declined in 2024 compared to 2023. This drop appears closely tied to layoffs, as professionals in these fields faced limited job opportunities and often had to accept less favorable offers to reduce the length of their job search.
[Image credit: Values Value]

How long did professionals take to find new jobs after layoffs?

Looking at the full sample of respondents who changed jobs in 2023-2024, more than half managed to find a new position in less than three months. We assume that this group largely includes those who left their previous roles voluntarily.

The concerning figures are those representing longer job search periods: 12.3% took between six months and a year to find a new job, while 8.1% reported searching for over a year. [Image credit: Values Value]

When looking at seniority levels, juniors emerge as the most vulnerable group: nearly half of all junior respondents reported taking more than six months to find a new job. Mid-levels and seniors had similar timelines, with mid-levels trailing slightly behind seniors in speed. Only 5-6% of mid-levels and seniors spent over a year searching for a new role.

Top-level experts and senior leaders demonstrated faster job searches, with 62% finding a new role in under three months. However, 25% of them reported that it took over six months to secure their next position. We believe this is because professionals at this level have more time and flexibility to carefully select roles that meet their expectations and requirements. [Image credit: Values Value]

The next slide, breaking down results by professional fields, highlights how challenging the job search was for QA specialists, artists, and, to some extent, HR managers and recruiters.

QA specialists had the hardest time; nearly half of respondents in this group spent over six months looking for a new job, with 26% searching for more than a year. Similarly, 10% of artists and HR/recruitment professionals also reported spending over a year finding their next role. [Image credit: Values Value]

Sasha Kononenko, recruitment lead and partner at Values Value, told me: "There's an interesting trend with UA managers: from late winter to mid-summer 2024, there was a noticeable influx of candidates actively job hunting. However, by fall, most had already secured new roles, and those who have been with their current companies for a while are now highly reluctant to consider any changes.

"Artists are facing a tough time: those entering the job market are struggling to find new opportunities due to high competition and a limited number of openings."

Who had to switch industries?

A total of 10% of respondents left the gaming industry during the wave of mass layoffs.

The highest percentage of those who left after failing to find a role was among juniors, at 31%. That's significant. Imagine: nearly one-third of entry-level professionals in the gaming industry (the potential mid-level specialists of tomorrow) exited the field after being unable to secure a position. This ongoing crisis is driving young talent away, slowing the industry's growth and hindering its ability to benefit from the fresh, innovative ideas that these professionals often bring.

Other groups also experienced departures during this period: 11% of mid-levels, 8% of seniors, 5% of team leads, and 11% of top-level experts left the industry.

By specialization, 28% of QA specialists, 32% of HR managers and recruiters, 15% of analysts, 14% of product and project managers, 9% of marketers, and 5% of programmers transitioned out of gaming.
As for game designers, the destination of the 10% who left the industry remains unclear, but this is the share of respondents who reported being forced to switch fields.

How did working conditions change for those who switched jobs during the layoffs?

According to the survey:

44% of specialists who changed jobs in 2023-2024 saw an increase in salary and/or career advancement.
24% ended up in lower positions and/or with reduced salaries.
21% found that their salary and position stayed the same.
And, as mentioned above, 10% had to switch to other industries.

Before the crisis, job changes were often a reliable trigger for salary increases and career growth. However, the new data reveals that mass layoffs have disrupted this trend.

In 2023-2024, 25% of mid-levels and 26% of seniors reported accepting lower salaries and/or positions at their new jobs. Similarly, 15% of team leads and 10% of top-level experts experienced the same.

Additionally, nearly a quarter of respondents across all seniority levels (except juniors) indicated that their income and position remained about the same after changing jobs.

Juniors, however, faced unique challenges. A significant 36% of entry-level professionals accepted less favorable conditions (lower salary or position) after switching jobs, while only 11% reported maintaining the same terms.

When it comes to professional fields, there's no surprise: QA specialists and HR professionals were hit the hardest. Additionally, many game designers (25%), artists (31%), and programmers (27%) experienced a decline in earnings and/or career progression after switching jobs last year. [Image credit: Values Value]

For this article, we spoke anonymously with a laid-off employee from one of the largest game studios. Here's their comment:

"It took me about six months to find a new job. I should mention that during the first few months, I wasn't very active in my job search: my priority was dealing with my work visa, which was tied to my previous employment, and I needed to sort that out. Unfortunately, besides finding new employment, those laid off often have to deal with a lot of additional issues. One of them is handling emotional breakdowns, because being laid off is always painful, especially when you don't expect it and feel secure in your position.

"Job hunting is a full-time job. I woke up in the morning and opened LinkedIn. It's recommended to tailor your resume for every job you apply to, and that's good advice, but I didn't have the energy for that. What helped was that I was mainly looking for similar roles. The most effective way to find a job is still through networking. So working on your personal brand is a great idea. People should be able to associate your name with your field when someone is looking for a specialist like you.

"Now I'm working at a company of a completely different scale, so my conditions have changed: I earn less, and the usual corporate benefits like health insurance, company events, and so on are gone. But I view it with calm understanding: now I have more opportunities to see the results of my work and my contribution to the overall mission."

What's happening in the job market now?

How have companies' hiring approaches changed? How have candidates' negotiation strategies evolved when seeking employment? What should we expect from the job market in the future?
[Sasha Kononenko, Values Value | Image credit: Values Value]

Sasha Kononenko shares her observations: "As for companies' hiring approaches, they continue to set high standards and expect finalists to meet those requirements 100%, all while keeping salary expectations reasonable. It feels like companies are leaving less room for compromise, for example, considering candidates with potential who might need time to build missing skills. Employers seem less willing to provide a ramp-up period, expecting new hires to hit the ground running and deliver results immediately.

"Candidates, in turn, are concerned about the stability of potential employers: they want to know about flagship projects in the company's portfolio, what's already generating profit, or if there are investments from venture funds or other sources that indicate the company isn't likely to shut down overnight. These guarantees are now a key focus for candidates."

If you're affected by layoffs and struggling to find the right job, how can you protect yourself from an endless job search? Kononenko suggests focusing on personal branding. There are many webinars now on building and developing a personal brand, regardless of whether you work in PR or bizdev, or are a 2D artist. Being present on professional social networks is essential for everyone; it's a valuable asset in case of layoffs.

With a personal brand and a broad network, the chances of finding a new job quickly are much higher. So, now is the time to shed shyness and introversion, and start sharing your successful and unsuccessful cases, your experience, and industry insights; start discussions and exchange opinions. Build a network of potential hiring managers, industry experts, and interesting people who can offer expert advice or connect you to the right contacts.

Sasha suggests asking yourself: "If I were laid off tomorrow, which companies would I want to work for?" Make a list, check if there are relevant job openings, and connect with a recruiter or talent manager from that company. Even if you don't need it right away, you'll be prepared for worst-case scenarios.

Katya Sabirova is CEO at InGame Job, and PR and comms adviser at Values Value.
  • Catly developer denies it's using generative AI or blockchain technology in its cute cat game
    www.gamedeveloper.com
    At last week's Game Awards ceremony, developer SuperAuthenti debuted a teaser for its in-development open-world adventure cat game Catly, a video that was short on gameplay details and long on exceptionally cute cats with oversized, colorful eyes. After the trailer debuted, viewers, many of them game developers, began speculating that the game or the trailer were made with generative AI technology. As internet sleuths dug through the business profiles of SuperAuthenti and its executives, some began to wonder if the game might also use blockchain technology.

The company (like a cat) played coy at first, telling Digital Trends it would share more details on the game in 2025. That seems to have changed. Today, a SuperAuthenti PR spokesperson told Game Developer that there is no generative AI in Catly or its teaser. Additionally, they said Catly is "not a blockchain game," and there are no non-fungible tokens (NFTs) or other blockchain currency affiliated with the product.

"We did not use generative AI to produce the video and the game," the spokesperson said. "In fact we are very surprised by such speculation. We do not think there are any existing AI tools that could produce a video like that. Industry experts have echoed this opinion."

SuperAuthenti shared a work-in-progress video of the Catly trailer with Game Developer, which contained a number of before-and-after shots showing the pre-rendered kitties bouncing around their playroom. The cats' models did not appear to contain telltale signs of generated 3D models (no unusual symmetry, no melded limbs, and no baked-in textures), and the environment also appeared to be created with traditional 3D animation. Some shots featured the cats rendered with full fur; others showed models implemented before fur animations were added.

The cat models did appear to have been fully mapped and rigged before these clips were captured, but it otherwise didn't seem that different from other behind-the-scenes clips of CG animation.

SuperAuthenti says there isn't any blockchain technology in Catly

As spotted by Digital Trends, speculation over Catly's possible use of blockchain tech was based on a possible connection to blockchain game developer TenthPlanet. TenthPlanet was "started" by William Wei Chen and colleague Kevin Yeung. Yeung is registered as the founder of SuperAuthenti. Animation and VFX news outlet 80 Level stated it reviewed documents saying SuperAuthenti is the sole shareholder of Shanghai Binmao Technology, which previously developed a blockchain-based "botanical and gardening experience."

SuperAuthenti's spokesperson did not address these business connections, but did push back on the idea that Catly uses any blockchain technology. "Catly is not a blockchain game," the spokesperson said. "There are no NFTs. Our company/project has never issued any blockchain currency and any NFTs. Our company does not and has never owned any blockchain currency and NFTs."

The spokesperson said that SuperAuthenti is "excited to reveal more about the game," and its own background, in 2025.
  • League of Legends skins won't get custom VO while SAG-AFTRA strikes Formosa Interactive
    www.gamedeveloper.com
    Justin Carter, Contributing Editor. December 16, 2024. 2 Min Read. [Image via Riot Games]

At a Glance: The ongoing voice actors strike is forcing Riot to change up its approach to voice work for different League skins.

The video game voice actors strike is entering its fifth month, and Riot Games is making "temporary changes" to how it handles voices for in-game League of Legends skins. These changes come months after SAG-AFTRA called for Formosa Interactive, the popular MOBA's voiceover studio, to be struck back in September.

Since League's PC version is a struck title, union actors aren't allowed to record lines for it while the strike is active. As such, skins for champions with actors based in the United States will use already-recorded "base voiceovers (VO)" rather than lines done by different actors. When the strike ends, the studio will update those affected skins with new lines from their original actors "as soon as scheduling and availability will allow."

Some skins alter a character's voice, and actors change their performance and record new lines to reflect that theming. Riot said its new 'policy' only affects English-language voices, so other languages will have custom VO "as planned." It also noted that while the mobile game League of Legends: Wild Rift is not struck, it will ship character skins with similar base VO if an actor for that title opts not to record in solidarity with their peers.

"We know this isn't ideal, and we understand it's frustrating to have to wait for custom VO," said Riot. "But this approach lets us respect the ongoing strike while continuing to deliver new content. We're committed to bringing you updated VO with the quality you expect as soon as we can."

Riot was pulled further into the strike's orbit when the actors union filed an unfair labor charge against Formosa and accused it of seeking non-union talent on a struck game from Riot. Shortly after, the developer released a statement saying it was uninvolved in Formosa's alleged behavior, and that the project in question "relates to a non-Riot [title], and has nothing to do with League or any of our games."

For its part, Formosa had also dismissed SAG-AFTRA's accusations at the time, saying it "has not acted in any manner to undermine employee or union rights, nor our relationship with the union. [...] We stand with developers, publishers, platform holders, and talent to support global game development in a way that is safe and ethical for all."

Read more about: Unionization

About the Author: Justin Carter, Contributing Editor, GameDeveloper.com. A Kansas City, MO native, Justin Carter has written for numerous sites including IGN, Polygon, and SyFy Wire. In addition to Game Developer, his writing can be found at io9 over on Gizmodo. Don't ask him about how much gum he's had, because the answer will be more than he's willing to admit.
  • Waymo is sending autonomous vehicles to Japan for first international tests
    www.theverge.com
    Waymo's autonomous vehicles are going to Tokyo, marking the first time that the Alphabet company is deploying vehicles on public roads in a foreign market. Waymo is billing the excursion as a simple road trip for collecting data about the nuances of Japanese driving, including left-hand traffic and navigating a dense urban environment.

The vehicles will be driven manually for the purposes of gathering mapping data and will be managed by a local taxi fleet operator, Nihon Kotsu. About 25 vehicles will be sent, with the first set to arrive in early 2025. And while the tests will undoubtedly be seen as laying the groundwork for a future Tokyo-based robotaxi service, Waymo said it isn't ready to announce anything quite yet.

"While we look forward to bringing the life-saving benefits of the Waymo Driver global, we have no plans to serve riders in Tokyo at this time," Waymo spokesperson Sandy Karp said. "Rather, we're bringing our technology to learn and understand how Waymo fits into the existing transportation landscape and learning how to best partner with local officials and communities."

The inclusion of GO, a popular taxi app in Japan, in the strategic partnership could signal Waymo's intention to put its autonomous vehicles into service through a locally based mobility provider. Waymo is already doing this in the US, making its autonomous vehicles available on Uber's ridehail app in Austin and Atlanta.

Waymo's robotaxi business in the US is growing, albeit slowly. The company currently has approximately 700 vehicles in operation in several cities, including San Francisco, Los Angeles, Austin, and Phoenix. It also plans to launch a robotaxi service in Atlanta in an exclusive partnership with Uber and is planning to launch in Miami in 2026. Alphabet CEO Sundar Pichai recently said that Waymo was providing 175,000 paid trips per week, or about a million miles.

In Tokyo, Waymo's vehicles will be operated by trained autonomous specialists employed by Nihon Kotsu. Once the company feels like it's ready, it will transition to hands-free autonomous driving with a safety driver behind the wheel. Karp wouldn't say whether that would eventually progress to fully driverless operations. The vehicles will be geofenced to certain neighborhoods in Tokyo, including Minato, Shinjuku, Shibuya, Chiyoda, Chūō, Shinagawa, and Kōtō.

In bringing its vehicles to its first foreign country, Alphabet is trying to project confidence in its technology, especially at a time when companies are pulling back on costly robotaxi projects. General Motors recently announced that it would no longer fund Cruise and would instead pivot to driver-assist technology and personally owned autonomous vehicles. Several companies have tested their autonomous vehicles in Japan, but the country is a bit of a backwater compared to China and the US. Part of the problem seems to be that the country's robust auto industry is focusing its testing in countries other than its native one. Toyota and Nissan are both seeking to deploy robotaxis in China in collaboration with local operators.
  • The Framework Laptop 16 just got a modular gadget that enables quadruple SSDs
    www.theverge.com
    The most ambitious laptop ever made just got a long-promised modular upgrade. Starting today, you can pay $39 to add two extra M.2 slots to the Framework Laptop 16, letting you potentially carry around an AI accelerator, an eGPU adapter, or a grand total of four solid state storage sticks for ludicrous capacity.

As Framework's blog post points out, the new Dual M.2 Adapter is Framework's first new modular component since launch that takes advantage of the Laptop 16's big expansion bay around back. At launch, you only had two options: a Radeon RX 7700S discrete graphics card for extra money, or a mostly empty bay that only contained fans. But now, you can add the Dual M.2 Adapter to that mostly empty bay to fit an additional pair of M.2 2280, 2260, 2240 or 2230 modules, with four lanes of PCIe 4.0 each, on top of the twin SSD slots (M.2 2280 and M.2 2230) that come with the laptop to begin with.

With current stick SSD capacities topping out at around 8TB (2280) and 2TB (2230) respectively, that means you can theoretically cart around 26TB of storage at once... not counting any 1TB Framework Expansion Cards you stick into the sides of the laptop, or any giant SD cards you plug into the $25 full-size SD card modules that Framework finally released this fall. (With 2TB SD cards on the market, I guess the actual maximum capacity of the Framework Laptop 16 is now 38TB.)

And while those who bought the Radeon discrete GPU won't be able to take advantage without swapping out that module, swaps are thankfully quick and easy.

In addition to the adapter, Framework has swapped out the Framework Laptop 16's liquid metal cooling for Honeywell PTM7958 thermal paste, and will help provide that for any customer who asks; while Framework characterizes this as a change to fix possible performance degradation over time, I definitely encountered uncomfortable levels of heat and fan noise right away in my review and long-term tests.

Find more about Framework's recent updates in its full blog post, like the new Framework Mystery Boxes tinkerers can buy to try out an assortment of random, possibly non-functional parts that users have returned to the company.
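For anyone tallying the storage math, here is a quick sketch of one way the headline figures appear to add up; the article states the 26TB and 38TB totals, but the assumption that all six Expansion Card bays hold the full-size SD module with a 2TB card each is mine, not something the piece spells out.

```python
# Hedged back-of-the-envelope for the article's 26TB / 38TB figures (capacities in TB).
builtin_2280, builtin_2230 = 8, 2      # the laptop's two built-in SSD slots
adapter_2280s = 2 * 8                  # two extra M.2 2280 slots via the Dual M.2 Adapter

internal_total = builtin_2280 + builtin_2230 + adapter_2280s
print(internal_total)                  # 26

# Assumption (not stated in the article): all six Expansion Card bays hold the
# full-size SD module, each loaded with a 2TB SD card.
sd_cards = 6 * 2
print(internal_total + sd_cards)       # 38
```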
  • Blender Market Best of 2024 Bundle Re-run
    gamefromscratch.com
    The Blender Market Best of 2024 Humble Bundle has returned for the next two days only. This bundle contains a collection of some of the most popular Blender add-ons, with a big "but": the versions included are a snapshot in time and will not receive updates. If this isn't an issue, the bundle is an amazing opportunity to pick up many popular Blender plugins for $30. Unlike the original bundle, this one does not contain any tiers.

Bundle Contents: Summer Pack, Clay Doh, Cablerator, Gobos Plus, Procedural Signs, Realtime Materials, Real Cloud, Blend Shop, Realistic Touch, Physical Open Waters, Spock: Structured Scifi Packer, Perspective Plotter, Malfunktion Effects & Filters, Hard Ops / Boxcutter Ultimate Bundle, True Terrain 5, Flip Fluids, Human Generator (personal), Retopoflow, Physical Celestial Objects, Procedural Alleys, Flora Paint, True Sky, Cloud Scapes, Traffiq, Kit Ops 3 Pro, Sci-Fi Flex v2, Conform Object, Shaders Plus, Underwater Caustics, Botaniq, and Undergrowth.

You can learn more about the Blender Market Best of 2024 Humble Bundle in the video below. Using links on this page helps support GFS (and thanks so much if you do!). If you have trouble opening a link, paste it into a new tab and it should work just fine.
  • Goo Engine x Blender Anime for Blender
    gamefromscratch.com
    DillonGoo Studios, a studio specializing in anime art style productions, has just announced that it is joining with the Blender development team to bring its custom anime-focused version of Blender (featured here) to mainline Blender. In addition to bringing most of the custom technology in its Goo Engine Blender fork directly to the main version of Blender, the studio will also be creating a new anime-style short film to dogfood the new NPR renderer.

Details from the DillonGoo blog announcement:

"This is the big moment we've been waiting for! The last few years of working on Goo Engine have all been leading to the moment where we can finally integrate the NPR features we've been researching into Blender official! We'll be contributing to the development and design ourselves, and guiding the direction of NPR in Blender.

"OPEN MOVIE ANNOUNCEMENT: Alongside the NPR Engine development, we'll also be producing a short Open Movie project to test out the build! All Open Movie assets will be released here on Patreon! NOTE: The character in the image is not related to the Open Movie. We'll be starting preproduction on it in early 2025!"

Key Links:
Goo Engine Blender NPR Announcement
NPR Prototype Branch on Blender Builds

You can learn more about DillonGoo Studios, Goo Engine, and future anime / NPR rendering in Blender in the video below. This video was sponsored by TechSmith, the makers of Camtasia (which is what I use to create all of my videos). You can learn more about Camtasia here and use code GAMEFROMSCRATCH at checkout for 15% off.
  • Microsoft AI Research Introduces OLA-VLM: A Vision-Centric Approach to Optimizing Multimodal Large Language Models
    www.marktechpost.com
    Multimodal large language models (MLLMs) are advancing rapidly, enabling machines to interpret and reason about textual and visual data simultaneously. These models have transformative applications in image analysis, visual question answering, and multimodal reasoning. By bridging the gap between vision and language, they play a crucial role in improving artificial intelligence's ability to understand and interact with the world holistically.

Despite their promise, these systems must overcome significant challenges. A core limitation is the reliance on natural language supervision for training, which often results in suboptimal visual representation quality. While increasing dataset size and computational complexity has led to modest improvements, these models need more targeted optimization of their visual understanding to achieve the desired performance in vision-based tasks. Current methods frequently struggle to balance computational efficiency against improved performance.

Existing techniques for training MLLMs typically involve using visual encoders to extract features from images and feeding them into the language model alongside natural language data. Some methods employ multiple visual encoders or cross-attention mechanisms to enhance understanding. However, these approaches come at the cost of significantly higher data and computation requirements, limiting their scalability and practicality. This inefficiency underscores the need for a more effective way to optimize MLLMs for visual comprehension.

Researchers at SHI Labs at Georgia Tech and Microsoft Research introduced a novel approach called OLA-VLM to address these challenges. The method aims to improve MLLMs by distilling auxiliary visual information into their hidden layers during pretraining. Instead of increasing visual encoder complexity, OLA-VLM leverages embedding optimization to enhance the alignment of visual and textual data. Introducing this optimization into intermediate layers of the language model ensures better visual reasoning without additional computational overhead during inference.

The technology behind OLA-VLM involves embedding loss functions that optimize representations from specialized visual encoders. These encoders are trained for image segmentation, depth estimation, and image generation tasks. The distilled features are mapped to specific layers of the language model using predictive embedding optimization techniques. Further, special task-specific tokens are appended to the input sequence, allowing the model to incorporate auxiliary visual information seamlessly. This design ensures that the visual features are effectively integrated into the MLLM's representations without disrupting the primary training objective of next-token prediction. The result is a model that learns more robust and vision-centric representations.

The performance of OLA-VLM was rigorously tested on various benchmarks, showing substantial improvements over existing single- and multi-encoder models. On CV-Bench, a vision-centric benchmark suite, OLA-VLM outperformed the LLaVA-1.5 baseline by up to 8.7% in depth estimation tasks, achieving an accuracy of 77.8%. For segmentation tasks, it achieved a mean Intersection over Union (mIoU) score of 45.4%, significantly improving over the baseline's 39.3%. The model also demonstrated consistent gains across 2D and 3D vision tasks, achieving an average improvement of up to 2.5% on benchmarks like distance and relation reasoning.
OLA-VLM achieved these results using only a single visual encoder during inference, making it far more efficient than multi-encoder systems.

To further validate its effectiveness, the researchers analyzed the representations learned by OLA-VLM. Probing experiments revealed that the model achieved superior visual feature alignment in its intermediate layers. This alignment significantly enhanced the model's downstream performance across various tasks. For instance, the researchers noted that integrating special task-specific tokens during training helped to better optimize features for depth, segmentation, and image generation tasks. The results underscored the efficiency of the predictive embedding optimization approach, proving its capability to balance high-quality visual understanding with computational efficiency.

OLA-VLM establishes a new standard for integrating visual information into MLLMs by focusing on embedding optimization during pretraining. This research addresses the gap in current training methods by introducing a vision-centric perspective to improve the quality of visual representations. The proposed approach enhances performance on vision-language tasks and achieves this with fewer computational resources compared to existing methods. OLA-VLM exemplifies how targeted optimization during pretraining can substantially improve multimodal model performance.

In conclusion, the research conducted by SHI Labs and Microsoft Research highlights a groundbreaking advancement in multimodal AI. By optimizing visual representations within MLLMs, OLA-VLM bridges a critical gap between performance and efficiency. This method demonstrates how embedding optimization can effectively address challenges in vision-language alignment, paving the way for more robust and scalable multimodal systems in the future.

Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project.

Nikhil is an intern consultant at Marktechpost. He is pursuing an integrated dual degree in Materials at the Indian Institute of Technology, Kharagpur. Nikhil is an AI/ML enthusiast who is always researching applications in fields like biomaterials and biomedical science. With a strong background in Material Science, he is exploring new advancements and creating opportunities to contribute.
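To make the core idea more concrete, here is a minimal, hypothetical sketch of what distilling a frozen visual encoder's features into an intermediate LLM layer with an auxiliary embedding loss could look like. This is not the authors' implementation: the layer choice, mean pooling, linear projector, cosine objective, and loss weight are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

d_model, d_teacher = 4096, 1024   # hypothetical LLM width and visual-encoder width

# Small projector mapping an intermediate LLM hidden state into the teacher's embedding space.
projector = nn.Linear(d_model, d_teacher)

def embedding_distillation_loss(hidden_states, teacher_embedding):
    """Auxiliary loss pulling a pooled intermediate representation toward a frozen
    task-specific visual embedding (e.g. from a depth or segmentation encoder)."""
    pooled = hidden_states.mean(dim=1)        # (batch, d_model), mean over sequence positions
    predicted = projector(pooled)             # (batch, d_teacher)
    return 1.0 - F.cosine_similarity(predicted, teacher_embedding, dim=-1).mean()

# Toy usage: combine with the usual next-token loss during pretraining.
hidden = torch.randn(2, 128, d_model)   # hidden states from one chosen decoder layer
teacher = torch.randn(2, d_teacher)     # frozen visual-encoder embedding for the same image
lm_loss = torch.tensor(2.3)             # placeholder for the next-token prediction loss
total_loss = lm_loss + 0.5 * embedding_distillation_loss(hidden, teacher)
print(total_loss)
```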
  • The Top 10 AI Research Papers of 2024: Key Takeaways and How You Can Apply Them
    towardsai.net
    The Top 10 AI Research Papers of 2024: Key Takeaways and How You Can Apply Them

December 16, 2024. Author(s): Prashant Kalepu. Originally published on Towards AI. Photo by Maxim Tolchinskiy on Unsplash.

As the curtains draw on 2024, it's time to reflect on the innovations that have defined the year in AI. And let's be real: what a year it has been! From breakthroughs in large language models to revolutionary approaches in computer vision and AI safety, the research community has outdone itself.

But with so much groundbreaking work out there, which ones truly stood out? Which papers made us pause, rethink, and wonder, "How can I use this in my own work?" Well, I've got you covered! Here's my personal list of favorite AI research papers from 2024, the ones that sparked my imagination and made me want to dive straight into experimentation.

Whether you're an AI enthusiast, a researcher hunting for your next big project, or someone curious about what's shaping the AI world, this list isn't just a year-end recap. It's your inspiration board. These papers are not just fascinating; they're also usable, full of ideas, frameworks, and insights you can directly implement in your own work.

So, grab a coffee (or a milkshake, if you're like me) and let's explore the top AI research papers of 2024. By the end of this, I bet you'll have more than a few new ideas brewing for your next project.

1. Vision Mamba

Summary: Vision Mamba introduces the application of state-space models (SSMs) to computer vision tasks. Unlike transformer-based architectures that rely on computationally expensive attention mechanisms, Vision Mamba achieves competitive performance with linear complexity. The paper showcases how these models handle temporal and spatial dependencies in video and image data more efficiently, making them ideal for low-latency applications.

Key Contributions:
State-space models for vision tasks.
Improved speed and memory efficiency compared to transformers.
Competitive results in video and image classification benchmarks.

How You Can Use It:
Robotics and AR/VR Systems: Use Vision Mamba's lightweight architecture to build real-time vision systems.
Multi-Modal Applications: Combine with NLP models to create AI assistants that interpret both text and images.
Edge Computing: Deploy on devices with limited computational resources, like drones or smart glasses.

My Intuition: Imagine you are building a real-time security system for a retail store that detects suspicious behavior using video feeds. Vision Mamba's efficient processing means you can analyze multiple camera feeds on an edge device without needing a powerful server. For example, it could flag unusual patterns, like someone hovering too long in certain aisles or repetitive movement in restricted areas, without delays or memory bottlenecks.
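To give a feel for why state-space models scale linearly with sequence length, here is a toy sketch of a discrete linear state-space recurrence applied to a sequence of image-patch tokens. It is my own illustration of the general SSM idea, not the Vision Mamba architecture, and all shapes and matrices are made up for the example.

```python
import torch

def linear_ssm_scan(x, A, B, C):
    """Toy discrete state-space recurrence: h_t = A h_{t-1} + B x_t, y_t = C h_t.
    Each step touches the previous state exactly once, so cost grows linearly with
    sequence length, unlike the quadratic cost of full self-attention."""
    batch, length, _ = x.shape
    d_state = A.shape[0]
    h = torch.zeros(batch, d_state)
    outputs = []
    for t in range(length):
        h = h @ A.T + x[:, t] @ B.T      # update hidden state: (batch, d_state)
        outputs.append(h @ C.T)          # read out per-token output: (batch, d_out)
    return torch.stack(outputs, dim=1)

# Toy shapes: 2 images, 64 patch tokens of dim 32, 16-dim state, 8-dim output per token.
x = torch.randn(2, 64, 32)
A = 0.1 * torch.randn(16, 16)
B = torch.randn(16, 32)
C = torch.randn(8, 16)
print(linear_ssm_scan(x, A, B, C).shape)   # torch.Size([2, 64, 8])
```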
2. Kernel Arnold Networks (KAN)

Summary: Kernel Arnold Networks (KAN) propose a new way of representing and processing data, challenging traditional deep neural networks. By leveraging kernel methods and differential equations, KAN achieves scalability and robustness, particularly in tasks requiring high interpretability or dynamic adaptability.

Key Contributions:
A unique combination of kernel methods with deep learning principles.
Efficient handling of non-linear relationships.
Application to a broad range of tasks, including physics-based simulations and temporal data analysis.

How You Can Use It:
Time Series Analysis: Apply KAN to financial forecasting or climate modeling, where complex temporal patterns are present.
Scientific Research: Use it for simulation-heavy fields like molecular dynamics or astrophysics.
Real-Time Analytics: Implement it for fraud detection or anomaly recognition in streams of data.

My Intuition: Suppose you're working for an e-commerce company, and your task is to detect abnormal spikes in customer activity, such as sudden bulk purchases of specific products during flash sales. Using KAN, you can model these complex, non-linear patterns in real time and quickly flag unusual behavior for further investigation, ensuring smooth operations.

3. GEMMA Models

Summary: GEMMA Models focus on integrating safety and fairness into AI systems without compromising their performance. By introducing novel training techniques and robust evaluation methods, the paper emphasizes reducing bias, enhancing robustness, and improving generalization capabilities in AI models.

Key Contributions:
Frameworks for fairness in multi-modal AI.
Techniques for adversarial robustness.
Metrics and benchmarks for safety-focused evaluation.

How You Can Use It:
Healthcare AI: Develop models for diagnosis or treatment recommendations, ensuring fairness across demographic groups.
Ethical AI Tools: Create applications that provide transparent insights into decision-making processes.
Real-Time Monitoring: Build tools that detect and mitigate biases during model inference.

My Intuition: Imagine you're building an AI hiring assistant that screens resumes and conducts initial video interviews. Using GEMMA, you can ensure the AI evaluates candidates equally, regardless of gender, ethnicity, or accent, making the hiring process fairer. For instance, if it detects potential bias in how resumes are ranked, the model can adjust its decision-making criteria dynamically. (A small illustrative metric sketch appears after item 4 below.)

4. Qwen 2 Model Series

Summary: Qwen 2, developed by Alibaba, offers a modular and scalable architecture optimized for multi-modal tasks. It integrates text, image, and code generation capabilities with advanced mixture-of-experts techniques, enabling seamless processing of diverse data formats.

Key Contributions:
State-of-the-art performance in multi-modal benchmarks.
Modular design for scalability and efficiency.
Specialization in cross-modal reasoning tasks.

How You Can Use It:
Assistive Technology: Build applications for the visually impaired that interpret and describe images in real time.
Cross-Lingual and Cross-Modal AI: Use Qwen 2 for advanced language translation paired with visual context.
Interactive AI Systems: Develop virtual assistants that understand and respond to multi-modal queries.

My Intuition: Think of a travel assistant app that uses Qwen 2. A user could upload a photo of a restaurant menu in a foreign language, and the app would not only translate the text but also suggest dietary options based on their preferences. For example, it could identify vegetarian dishes by analyzing both the image and the translation context.
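As flagged under item 3, GEMMA is described as shipping metrics and benchmarks for safety-focused evaluation. Purely as a generic illustration (not taken from that paper), here is one of the simplest fairness checks you could run on a screening model's outputs, the demographic parity gap between two groups; the data and any alerting threshold are made up.

```python
import numpy as np

def demographic_parity_gap(predictions, groups):
    """Absolute difference in positive-prediction rates between two groups (labelled 0 and 1)."""
    predictions, groups = np.asarray(predictions), np.asarray(groups)
    rate_group0 = predictions[groups == 0].mean()
    rate_group1 = predictions[groups == 1].mean()
    return abs(rate_group0 - rate_group1)

# Toy example: 1 = "advance to interview". Group 0 advances at 50%, group 1 at 75%,
# so the gap is 0.25; a fairness-aware pipeline might alert when it exceeds a chosen threshold.
preds  = [1, 0, 1, 0, 1, 1, 1, 0]
groups = [0, 0, 0, 0, 1, 1, 1, 1]
print(demographic_parity_gap(preds, groups))   # 0.25
```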
5. Mixture of Experts (MixR A7B)

Summary: MixR A7B presents an advanced modular architecture with mixture-of-experts techniques, allowing it to allocate computational resources dynamically based on the task at hand. This results in improved efficiency for multi-tasking and personalized applications.

Key Contributions:
Modular AI for personalized task performance.
Scalable architecture for large-scale deployments.
Dynamic resource allocation for computational efficiency.

How You Can Use It:
Recommendation Engines: Build AI systems that adapt to individual user preferences in real time.
Personalized Learning Platforms: Develop adaptive educational tools tailored to students' needs.
Efficient AI Deployments: Reduce computational overhead in large-scale AI systems for diverse applications.

My Intuition: Picture an e-learning platform where students of different learning speeds interact with the same AI tutor. Using MixR A7B, the AI could allocate more computational focus to struggling students while reducing resources for those who are advancing quickly, personalizing learning experiences in real time.

6. Gemini 1.5

Summary: Gemini 1.5 is Google's response to the increasing demand for long-context processing in NLP. It introduces a 10-million-token context length, making it ideal for analyzing large documents, such as books or legal texts, with unparalleled efficiency and speed.

Key Contributions:
Industry-leading long-context understanding.
Efficient memory and computational optimization.
Breakthrough performance in summarization and retrieval tasks.

How You Can Use It:
Document Analysis: Summarize lengthy contracts, legal documents, or books.
Research Tools: Build AI systems to help researchers extract insights from large academic datasets.
Advanced Chatbots: Develop chatbots capable of maintaining detailed, context-aware conversations.

My Intuition: Imagine a legal-tech startup building a tool to help lawyers quickly analyze and summarize 500-page legal agreements. With Gemini 1.5, the system could not only summarize key points but also highlight potential risks or conflicting clauses, saving lawyers countless hours of manual work.

7. ChatGPT++: Enhanced In-Context Learning

Summary: ChatGPT++ introduces novel advancements in in-context learning, enabling models to better understand user-provided examples and adapt responses dynamically. The paper focuses on fine-tuning techniques that allow for personalized AI assistants that deliver tailored outputs based on context and history.

Key Contributions:
Enhanced in-context learning capabilities for personalization.
Improved response coherence across extended conversations.
Integration of memory modules to maintain long-term context.

How You Can Use It:
Personalized AI Assistants: Build customer support tools that adapt to a user's tone and past queries.
Learning Platforms: Develop language tutors that adjust based on how well a student performed in previous exercises.
Knowledge Management Tools: Design AI systems that retain and retrieve relevant context for workplace documentation.

My Intuition: Consider a virtual career coach that remembers a user's past mock interviews and adapts its feedback based on their progress. For instance, if someone struggled with behavioral questions in their last session, ChatGPT++ could emphasize those areas in the next interaction, offering more detailed suggestions tailored to improvement over time.
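"ChatGPT++" is not a public API, so here is a generic, hedged sketch of the in-context learning pattern item 7 describes: supplying a few user-provided examples in the prompt so the model adapts its responses. The OpenAI Python client is used only as one familiar chat interface, and the model name, system prompt, and examples are placeholders.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# In-context learning: a few worked examples in the prompt steer the model's behavior.
few_shot_examples = [
    {"role": "user", "content": "Feedback: 'Your answer rambled.' Rewrite it constructively."},
    {"role": "assistant", "content": "Try leading with your main point, then give one supporting example."},
    {"role": "user", "content": "Feedback: 'You ignored the question.' Rewrite it constructively."},
    {"role": "assistant", "content": "Restate the question in your first sentence so the listener knows you are answering it."},
]
new_query = {"role": "user", "content": "Feedback: 'Your STAR story had no result.' Rewrite it constructively."}

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "system", "content": "You are a concise interview coach."}]
             + few_shot_examples + [new_query],
)
print(response.choices[0].message.content)
```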
8. Mistral-7B Instruct

Summary: Mistral-7B Instruct is a fine-tuned large language model (LLM) with only 7 billion parameters but performance comparable to much larger models. It focuses on instruction-following tasks, making it lightweight yet powerful for practical applications.

Key Contributions:
Performance optimization for smaller-scale LLMs.
Fine-tuning for instruction clarity and task-specific outputs.
Reduced computational requirements without sacrificing accuracy.

How You Can Use It:
AI Tools for Small Businesses: Deploy lightweight, cost-effective AI solutions for generating content, answering FAQs, or automating customer queries.
Mobile Apps: Build language-powered apps that run efficiently on mobile devices.
Specialized Assistants: Create domain-specific AI assistants tailored to areas like healthcare or finance.

My Intuition: Imagine creating a mobile app that acts as a personal writing coach for students. Using Mistral-7B Instruct, the app could provide grammar corrections, suggest better phrasing, and explain language rules in simple terms. For example, it could rewrite essays for clarity and explain why changes were made, all on a lightweight, on-device model. (A short loading sketch appears after item 9 below.)

9. Orca LLM: Reasoning with Examples

Summary: Orca LLM focuses on improving reasoning capabilities by training on a novel dataset of example-based reasoning tasks. It bridges the gap between generalist LLMs and specialized reasoning engines, enhancing its ability to solve complex logical problems.

Key Contributions:
Training on example-based reasoning datasets.
Improved performance in multi-step reasoning tasks.
Enhanced capabilities in logical reasoning and structured problem-solving.

How You Can Use It:
AI Tutors: Develop systems that teach critical thinking skills to students by walking them through logical problems step by step.
Data Analytics Tools: Build platforms that assist in decision-making by logically evaluating trade-offs.
Interactive Puzzles: Create games or applications involving AI that solves riddles or logical challenges.

My Intuition: Picture a study tool for competitive exam aspirants, like CAT or GMAT, where the AI breaks down complex quantitative and reasoning questions into step-by-step solutions. Orca could show how to approach problems logically, making the learning experience more interactive and effective.
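As flagged in item 8, here is a minimal sketch of running an instruction-tuned 7B model locally with Hugging Face transformers. The checkpoint name, prompt, and generation settings are assumptions for illustration; in practice you would add quantization and device settings that match your hardware.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.2"   # assumed public instruct checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# A single instruction-following turn, formatted with the model's chat template.
messages = [{"role": "user", "content": "Rewrite this sentence more clearly: "
             "'The essay which was wrote by me is about dogs mostly.'"}]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)

outputs = model.generate(inputs, max_new_tokens=120, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```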
10. CLAW-LM: Context Learning Across Windows

Summary: CLAW-LM introduces a novel approach to handling fragmented contexts in NLP tasks. The model excels at processing context spread across multiple windows, enabling it to maintain a consistent understanding of segmented information.

Key Contributions:
Context aggregation techniques for fragmented inputs.
Improved coherence and relevance in long-form text generation.
Benchmark-leading performance in tasks requiring cross-window context retention.

How You Can Use It:
Academic Research Summaries: Build AI tools that aggregate information from multiple fragmented research papers.
Customer Interaction History: Develop AI for customer support that synthesizes information from scattered tickets.
Multi-Document Summarization: Create tools to summarize insights across multiple reports or articles.

My Intuition: Imagine working in a newsroom and needing to create an in-depth summary of breaking news. CLAW-LM could pull data from multiple news updates (tweets, articles, press releases) and generate a coherent report while retaining important details from each fragmented piece. For instance, it could pull together a timeline of events in a crisis and highlight key developments across different sources.

Final Thoughts

These 10 papers showcase the cutting-edge trends in AI, from advancing computer vision and neural networks to innovating in NLP and multi-modal systems. Whether you're building scalable systems for businesses, creating real-world applications, or diving into the theory behind AI advancements, these papers offer tools, techniques, and inspiration to fuel your journey.

Published via Towards AI.