• ARCHINECT.COM
    Featured jobs this week at Kiss-Architects, Studio One Eleven, Jayson Architecture, and Venn Studio
    Take a look at our latest curated selection of architecture and design firms currently hiring on Archinect Jobs: This week's featured employer highlight includes job openings in Brooklyn, Los Angeles/Long Beach, and San Francisco. For even more opportunities, visit the Archinect job board and explore our active community of job seekers, firms, and schools.
    Brooklyn-based firm Kiss-Architects is hiring for two roles: a Research/Sustainability Intern who is currently attending or has attended a college or university within the last 12 months, has a passion for the environment, and is familiar with BIM, LCA, and energy analysis software; and an Intermediate Architect with three or more years of experience who possesses a knowledge of passive house, LEED, BIM workflows, energy/daylighting simulation, and LCA/embodied carbon analysis.
    Bushwick Inlet Park by Kiss-Architects. Photo: Paul Warchol
    Studio One Eleven is in search of an experienced Senior Designer in Long Beach, California. The...
  • GAMINGBOLT.COM
    Lunar Remastered Collection Releases on April 18th for $49.99
    GungHo Online Entertainment's Lunar Remastered Collection is out on April 18th, retailing for $49.99. It will be available for PS4, Xbox One, Nintendo Switch, and PC, with compatibility for PS5 and Xbox Series X/S.
    Alongside the digital editions, there will be physical editions with reversible covers, sold in North America via Amazon ($54.99) and in Europe for PS4 and Switch via Clear River Games (€54.99). They feature new art from Lunar: Silver Star Story Complete and Lunar 2: Eternal Blue Complete by series animation director and character designer Toshiyuki Kubooka.
    Based on Game Arts' acclaimed RPG classics, Lunar Remastered Collection features improved visuals, widescreen support, additional language options (English, Japanese, French, and German), and fully voiced attacks (including a new English voiceover).
    Fans can opt for a classic mode to experience the original visuals, and try out quality-of-life features like speeding up combat. Additional settings for streamlining battles are also available. Stay tuned for more updates in the coming months.
  • Get 6 free smoke and fire effects sequences from FX BackPack
    FX BackPack's commercial Smoke Impacts collection of stock FX clips and VDB file sequences, one of which is included in its free Discovery Pack of smoke and fire effects.
    New stock effects firm FX BackPack has released the Discovery Pack: six free smoke and fire effects for use in visual effects or motion graphics projects. The assets are provided as sequences of VDB files optimized for use in Blender, but suitable for use in other DCC apps, and as 2K MOV files with alpha channels for compositing.
    Six free stock FX clips of smoke and fire, for use in VFX or motion graphics
    The Discovery Pack consists of six stock effects: three of fire and three of smoke. All are based on Houdini simulations that are then resampled to reduce file sizes while preserving as much of the original detail as possible.
    The fire effects are flame sims with sources shaped for use on the windows or facades of buildings, and on vertical structures like poles, pillars or ropes. The smoke effects consist of a ground explosion, a smoke plume, and horizontal jets mimicking smoke grenades or steam escaping from pipes.
    The 150-to-300-frame animations are provided as 2K ProRes 4444 MOV files with transparent backgrounds that can be composited into footage for use in VFX or motion graphics projects.
    Also available as VDB file sequences for rendering in Blender or other CG software
    Users also get the effects as sequences of VDB files that can be rendered in any DCC app or game engine that supports the OpenVDB file format, now including Unreal Engine. The VDB files are optimized for use in Blender, where they can be used with either the Cycles or Eevee render engines, and are designed to be lightweight and quick to render. Blender users also get a .blend scene file with five readymade smoke shaders and three stylized fire shaders.
    Available as a free sample of FX BackPack's commercial asset collections
    The files are samples from FX BackPack's commercial asset collections, each of which contains 10 variations of smoke or fire effects. The commercial packs cost $4.99 or $9.99, depending on whether you want both the MOV files and the VDB file sequences, or the MOV files alone.
    As well as downloading the samples free on Gumroad, you can buy the Discovery Pack on Blender Market, with all of the money from sales going to the Blender Development Fund. FX BackPack plans to release new collections throughout 2025 and 2026, starting with volumetric effects, but eventually including magic FX, destruction, water and particles.
    License and system requirements
    FX BackPack's Discovery Pack assets are provided as sequences of VDB files, suitable for use in any DCC application that supports the format, and as 2K MOV files, for use in compositing. The Blender files are compatible with Blender 3.0+, and the Cycles and Eevee renderers. The assets are licensed for commercial use.
    Download FX BackPack's smoke and fire FX Discovery Pack from Gumroad (enter a figure of $0 to download it for free, or make a voluntary donation).
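    Before loading a numbered VDB sequence like these into a DCC app, it can help to sanity-check the frame range for gaps. A minimal sketch (the filename pattern is hypothetical, not FX BackPack's actual naming scheme):

    ```python
    import re

    def sequence_frames(filenames):
        """Parse frame numbers from a VDB file sequence and report any gaps."""
        pattern = re.compile(r"_(\d+)\.vdb$")
        frames = sorted(int(m.group(1)) for f in filenames
                        if (m := pattern.search(f)))
        missing = sorted(set(range(frames[0], frames[-1] + 1)) - set(frames))
        return frames[0], frames[-1], missing

    # Hypothetical 150-frame smoke sim with one dropped frame:
    names = [f"smoke_plume_{i:04d}.vdb" for i in range(1, 151) if i != 42]
    first, last, missing = sequence_frames(names)
    print(first, last, missing)  # 1 150 [42]
    ```

    A missing frame would show up as a visible pop when the volume is rendered, so catching it before import saves a render pass.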
  • WWW.SMITHSONIANMAG.COM
    These Fascinating Objects Show How the Palace of Versailles Drove Surprising Scientific Advances in the 17th and 18th Centuries
    A visitor examines a watch crafted by Abraham-Louis Breguet for Marie Antoinette. Science Museum Group
    The sprawling grounds of France's opulent Palace of Versailles are not only a masterpiece of landscaping, but a mark of 17th-century engineering innovation. The property's ornamental fountains and ponds, for example, were only made possible by a specially built machine that pulled water from the Seine and pushed it up a steep hill to Versailles.
    The machine was designed at the request of the French king Louis XIV, the monarch who commissioned Versailles, also known as the Sun King. His and his successors' courts are mainly remembered for ostentatious finery, but their immense wealth also enabled discovery, and Versailles turned into a hotbed for science: a theme explored by a new exhibition at the Science Museum in London, "Versailles: Science and Splendor."
    "The exhibition highlights how science flourished at Versailles, from the kings' personal interest in luxurious scientific instruments and spectacular demonstrations, to its strategic role beyond the palace through newly founded institutions and scientific expeditions," says Glyn Morgan, the museum's curatorial lead for exhibitions, in a statement.
    Jean-Dominique Cassini's map of the moon. Observatoire de Paris
    The show spans the 17th and 18th centuries, covering the reigns of Louis XIV (who founded the French Academy of Sciences in 1666), Louis XV and Louis XVI (who was executed in 1793 during the French Revolution). Through more than 120 artifacts, the exhibition illustrates the importance of art and science in the French royal court.
    Many of the objects show innovations in medicine, such as a curved scalpel designed by Louis XIV's royal surgeon, Charles-François Félix. The Guardian's Jonathan Jones writes, "The rehearsals worked: He fixed the royal fistula and Louis XIV lived on until 1715, his 72-year reign a world record."
    Under the reign of Louis XV (who ruled between 1715 and 1774), a midwife called Madame du Coudray became a powerful force against French infant mortality. The king hired her to travel throughout rural France to train other midwives in the mechanics of birth, employing sophisticated life-sized mannequins, per the statement. Du Coudray ultimately educated more than 5,000 women and physicians, and her last surviving model is on display in the exhibition.
    Part of a mannequin used by Madame du Coudray to train midwives. Musée Flaubert d'Histoire de la Médecine / Métropole Rouen Normandie
    After Louis XV died of smallpox, his son and successor Louis XVI announced that the royal family would get inoculated. Included in the exhibition are posters made to reassure the public of the strategy's success.
    Science at Versailles extended to the wonders of the natural world. French botanists grew and studied exotic plants. Versailles had a menagerie stocked with animals like coatis and cassowaries. Visitors to the Science Museum will be able to see the menagerie's most famous resident: Louis XV's rhinoceros, given to the king by a French governor based in India, and dissected and taxidermied upon its death.
    "As it was studied by scientists, it became incredibly important to our growing zoological knowledge," Morgan tells the Observer's Vanessa Thorpe. "The photographs really did not do justice to just how impressive and characterful it is. The skin is almost jet black."
    Louis XV's rhinoceros was dissected and taxidermied after its death in 1793. Science Museum Group
    Also on display are philosopher Émilie du Châtelet's handwritten, annotated French translation of Isaac Newton's Principia Mathematica; the watch made for Louis XVI's wife, Marie Antoinette, by Abraham-Louis Breguet; and an early map of the moon drawn by Jean-Dominique Cassini in 1679.
    The French Revolution claimed the lives of some of the French royal court's thinkers, but their advances were permanent. Per the Guardian, "Science strode on."
    "Royal ambition, scientific knowledge and ideals of beauty culminated at Versailles in spectacular demonstrations and brilliant innovations from the brightest minds of the time," says Ian Blatchford, director and chief executive of the Science Museum Group, in the statement. "We are thrilled to introduce our visitors to these fascinating stories through the stunning objects on display."
    "Versailles: Science and Splendor" is on view at the Science Museum in London through April 21, 2025.
  • VENTUREBEAT.COM
    OpenAI's agentic era begins: ChatGPT Tasks offers job scheduling, reminders and more
    ChatGPT is taking a significant step toward becoming a full-blown personal assistant with the release of a new feature called Tasks. This could signal that OpenAI will release more agents in the future.
    Currently in beta, Tasks lets ChatGPT Plus, Team and Pro users schedule actions ahead of time. For example, if someone wants to receive project reminders or daily weather updates, they can prompt ChatGPT, which will notify them at the chosen date and time. Tasks can be recurring or one-time reminders.
    To set up a task, users toggle to "4o with scheduled tasks" in the model picker and write a reminder prompt. ChatGPT can also suggest tasks from previous conversations. Tasks work with all versions of ChatGPT and send notifications through desktop, web and mobile. However, users can access the task manager only in the web version of ChatGPT.
    OpenAI said the beta period will help its researchers understand how people use Tasks and refine the feature before making it available to all ChatGPT users.
    Tasks joins other assistant-like features for ChatGPT. During its 12 Days of OpenAI event in December, OpenAI launched screen sharing, letting users open ChatGPT while reading a text message and ask it to help them respond.
    OpenAI's first agent?
    Rumors about OpenAI releasing an AI agent swirled when some users caught ChatGPT providing access to scheduled tasks as far back as December. The agent, called Operator, would be the company's first.
    People believed Tasks could be the precursor to Operator. X user @kimmonismus, aka Chubby, said, "The Information seems to be right and everything is being prepared for the release of Operator, i.e. OpenAI's agent. The Tasks function found by Tibor seems to be the first significant step in the preparation of Operator. It is questionable whether we will get a release this month or just a preview."
    Testing Catalog News on X theorized Tasks could eventually enable ChatGPT to search for specific information, summarize data, open websites, access documents and think through problems.
    VentureBeat reached out to OpenAI about Tasks and Operator. The company declined to answer and said only that Tasks will be an important step toward making ChatGPT a more helpful AI companion.
    OpenAI has already made its first foray into the agentic space with Swarm, a framework it released to help orchestrate AI agents.
    Making ChatGPT a better assistant
    If you're like me, you probably lean on the Reminders app and Google Calendar to remind yourself to text a relative "Happy Birthday," to keep track of brunch plans, or to alert you to news embargoes and planned interviews.
    There are many reminder, calendar and productivity apps available for both consumers and enterprises, including Google Calendar, Outlook Calendar, Asana, Trello and Notion. The productivity-assistant space is not short of applications that remind individuals and teams of tasks they must accomplish.
    OpenAI's big play in such a crowded space is interesting, especially since most people don't think of chatbots as scheduling assistants. But ChatGPT already streamlines the process for users to migrate their coding or writing tasks to the platform or search the web without leaving the chat interface. ChatGPT can even open a developer's IDE nearly automatically.
    As ChatGPT adds more actions to its platform, setting up scheduled tasks and reminders is not so far-fetched. This makes ChatGPT a viable competitor to many productivity and scheduling apps.
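    As a rough analogue of the recurring-reminder behavior described above, the core scheduling arithmetic can be sketched in a few lines of plain Python (this is a hypothetical illustration, not OpenAI's implementation):

    ```python
    from datetime import datetime, timedelta

    def next_run(now, hour, minute):
        """Next occurrence of a daily reminder at hour:minute."""
        run = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
        if run <= now:          # today's slot already passed, roll to tomorrow
            run += timedelta(days=1)
        return run

    now = datetime(2025, 1, 15, 9, 30)
    print(next_run(now, 8, 0))   # 2025-01-16 08:00:00 (8:00 already passed today)
    print(next_run(now, 17, 0))  # 2025-01-15 17:00:00
    ```

    A recurring task is then just a loop that, after firing a notification, recomputes the next run time the same way.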
  • WWW.GAMESINDUSTRY.BIZ
    GOG bolsters game preservation efforts by joining European game archivist organisation
    Says this is "a significant step in bridging the private sector with cultural organisations"
    Image credit: GOG. News by Vikki Blake, Contributor. Published on Jan. 14, 2025
    GOG has partnered with the European Federation of Game Archives, Museums, and Preservation Projects (EFGAMP), Europe's largest federation of organisations working to preserve "video games as cultural heritage".
    With fellow members including The Video Game Museum in Rome, France's MO5.COM, Computerspielemuseum in Berlin, The Netherlands Institute for Sound & Vision, and Embracer Games Archive, GOG said its membership marked "a significant step in bridging the private sector with cultural organisations across Europe."
    It also hoped that by joining EFGAMP, GOG would "reinforce its position as one of the global champions in game preservation" and emphasise its "passionate" advocacy work.
    "GOG was created with video game preservation in mind," said Maciej Gołębiewski, managing director at GOG. "Classic games and the mission to safeguard them for future generations have always been at the core of our work.
    "Over the past decade, we've honed our expertise in this area. The GOG Preservation Program, which ensures compatibility for over 100 games and delivers hundreds of enhancements, is just one example of this commitment. We were thrilled to see the Program warmly received not only by our players but also by our partners and the gaming industry as a whole."
    EFGAMP COO Andreas Lange added: "GOG brings a unique perspective to EFGAMP as a European leader in digital game distribution.
    "Their experience in making classic games accessible to modern audiences complements the work of our existing members by bringing further digital expertise to our collective efforts.
    "As GOG distributes classic games worldwide, GOG is a fantastic addition to EFGAMP, whose members are primarily rooted in the cultural heritage sector."
  • WWW.GAMEDEVELOPER.COM
    Space Ape Games launches offshoot studio NextBeat
    Mobile developer Space Ape Games recently launched a spinoff team called NextBeat, reports PocketGamer. Several Space Ape employees are transferring to the new studio, including co-founder Simon Hade as its CEO and product manager George Yao as head of live games.
    Per MobileGamer, 30 staffers currently make up NextBeat. Several non-Space Ape developers have joined, such as CFO (and Kepler Interactive veteran) Joe Adams.
    NextBeat's emergence comes as Space Ape is set to be fully acquired by Supercell by the spring. The Clash of Clans maker previously invested a stake in the London developer, which makes BeatStar and Transformers: Earth Wars for phones.
    NextBeat is taking BeatStar on tour
    According to PocketGamer, the BeatStar and Country Star properties are also jumping over to NextBeat. Hade and co-founder Olly Barnes told Music Business Worldwide the former game has over 100 million downloads worldwide and is projected to exceed $200 million in lifetime revenue before the end of 2025. As such, Hade said they had to "address this immense opportunity, now."
    "BeatStar taught us that there is huge per-user monetization potential for music in gaming," said Barnes. "The vast mobile gaming audience and ARPU insights from Country Star's genre-focused approach underscored the need to launch a fully resourced company dedicated to this opportunity."
    As Hade told Music Business, NextBeat is considered more of a "dedicated, music-focused venture" than a mobile game developer like Space Ape. His ultimate hope is for the new studio to "expand into more experiences, games and apps that showcase the inspiration and creativity of our artist partners." Such plans, according to the outlet, potentially include entering the educational-tech and mental health sectors.
    For now, the focus is on growing BeatStar into other genres via potential standalone games. Barnes told MobileGamer the next "most obvious evolution" would be nostalgic rock, à la Guitar Hero. Hade showed similar interest in Latin, anime music, and EDM, but said some genres might require "slightly different" monetization approaches.
  • WWW.THEVERGE.COM
    Instagram alternative Pixelfed now has apps
    Instagram alternative Pixelfed now has apps / Pixelfed recently said it's seeing "unprecedented levels of traffic" to the original pixelfed.social server.
    By Umar Shakir, a news writer fond of the electric vehicle lifestyle and things that plug in via USB-C. He spent over 15 years in IT support before joining The Verge. Jan 14, 2025, 7:10 PM UTC. Image: Pixelfed
    Pixelfed, a decentralized and ad-free Instagram alternative, now has apps on iOS and Android, as reported by TechCrunch. The iOS app launched today, while the Android app launched on January 10th.
    The platform is seeing a surge in popularity following Meta's announcement last week that it would be drastically changing its content moderation policies; over the weekend, Pixelfed said that it's seeing unprecedented levels of traffic to the pixelfed.social server and was working to increase resources.
    Pixelfed was also in the news this week because some users claimed that Meta randomly blocked links to the site they shared on Facebook. According to Engadget, Meta blocked the Pixelfed links by mistake and is now reinstating the posts.
    The creator of Pixelfed, Daniel Supernault, also launched a decentralized version of TikTok last October, called Loops. With TikTok facing a ban in the US and the fallout from Meta's content moderation changes, Pixelfed and Loops offer other options for people looking to jump ship.
  • WWW.MARKTECHPOST.COM
    OpenBMB Just Released MiniCPM-o 2.6: A New 8B Parameters, Any-to-Any Multimodal Model that can Understand Vision, Speech, and Language and Runs on Edge Devices
    Artificial intelligence has made significant strides in recent years, but challenges remain in balancing computational efficiency and versatility. State-of-the-art multimodal models, such as GPT-4, often require substantial computational resources, limiting their use to high-end servers. This creates accessibility barriers and leaves edge devices like smartphones and tablets unable to leverage such technologies effectively. Additionally, real-time processing for tasks like video analysis or speech-to-text conversion continues to face technical hurdles, further highlighting the need for efficient, flexible AI models that can function seamlessly on limited hardware.
    OpenBMB Releases MiniCPM-o 2.6: A Flexible Multimodal Model
    OpenBMB's MiniCPM-o 2.6 addresses these challenges with its 8-billion-parameter architecture. The model offers comprehensive multimodal capabilities, supporting vision, speech, and language processing while running efficiently on edge devices such as smartphones, tablets, and iPads. MiniCPM-o 2.6 incorporates a modular design with:
    - SigLip-400M for visual understanding
    - Whisper-300M for multilingual speech processing
    - ChatTTS-200M for conversational capabilities
    - Qwen2.5-7B for advanced text comprehension
    The model achieves a 70.2 average score on the OpenCompass benchmark, outperforming GPT-4V on visual tasks. Its multilingual support and ability to function on consumer-grade devices make it a practical choice for diverse applications.
    Technical Details and Benefits
    MiniCPM-o 2.6 integrates advanced technologies into a compact and efficient framework:
    - Parameter Optimization: Despite its size, the model is optimized for edge devices through frameworks like llama.cpp and vLLM, maintaining accuracy while minimizing resource demands.
    - Multimodal Processing: It processes images up to 1.8 million pixels (1344×1344 resolution) and includes OCR capabilities that lead benchmarks like OCRBench.
    - Streaming Support: The model supports continuous video and audio processing, enabling real-time applications like surveillance and live broadcasting.
    - Speech Features: It offers bilingual speech understanding, voice cloning, and emotion control, facilitating natural, real-time interactions.
    - Ease of Integration: Compatibility with platforms like Gradio simplifies deployment, and its commercial-friendly license supports applications with fewer than one million daily active users.
    These features make MiniCPM-o 2.6 accessible to developers and businesses, enabling them to deploy sophisticated AI solutions without relying on extensive infrastructure.
    Performance Insights and Real-World Applications
    MiniCPM-o 2.6 has delivered notable performance results:
    - Visual Tasks: Outperforming GPT-4V on OpenCompass with a 70.2 average score underscores its capability in visual reasoning.
    - Speech Processing: Real-time English/Chinese conversation, emotion control, and voice cloning provide advanced natural-language interaction capabilities.
    - Multimodal Efficiency: Continuous video/audio processing supports use cases such as live translation and interactive learning tools.
    - OCR Excellence: High-resolution processing ensures accurate document digitization and other OCR tasks.
    These capabilities can impact industries ranging from education to healthcare. For example, real-time speech and emotion recognition could enhance accessibility tools, while video and audio processing enable new opportunities in content creation and media.
    Conclusion
    MiniCPM-o 2.6 represents a significant development in AI technology, addressing long-standing challenges of resource-intensive models and edge-device compatibility. By combining advanced multimodal capabilities with efficient operation on consumer-grade devices, OpenBMB has created a model that is both powerful and accessible. As AI becomes increasingly integral to daily life, MiniCPM-o 2.6 highlights how innovation can bridge the gap between performance and practicality, empowering developers and users across industries to leverage cutting-edge technology effectively.
    Check out the model on Hugging Face. All credit for this research goes to the researchers of this project.
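    The 1.8-million-pixel (1344×1344) input budget cited above implies downscaling larger images while preserving aspect ratio before they reach the vision encoder. A minimal sketch of that arithmetic (the budget figure is from the article; the helper function itself is hypothetical, not OpenBMB's code):

    ```python
    import math

    MAX_PIXELS = 1344 * 1344  # ~1.8 million pixel input budget

    def fit_to_budget(width, height, max_pixels=MAX_PIXELS):
        """Scale (width, height) down so width*height <= max_pixels,
        preserving aspect ratio; leave smaller images untouched."""
        if width * height <= max_pixels:
            return width, height
        scale = math.sqrt(max_pixels / (width * height))
        return int(width * scale), int(height * scale)

    print(fit_to_budget(1024, 768))   # (1024, 768) -- already under budget
    print(fit_to_budget(4000, 3000))  # scaled down to roughly 1551 x 1163
    ```

    Keeping the aspect ratio intact matters for OCR in particular, since anisotropic squashing distorts glyph shapes.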
  • TOWARDSAI.NET
    Fine-tuning Embeddings for RAG applications
    Author(s): Anuar Sharafudinov. Originally published on Towards AI. Credits: GPT-4o
    The rise of Retrieval-Augmented Generation (RAG) has revolutionized how we build intelligent applications. At its core, RAG is all about efficiently turning large chunks of text into actionable embeddings and then letting an AI model piece together contextually relevant answers.
    However, what works in theory can stumble in real-world scenarios. Why? One of the biggest culprits is poor or unclear embedding representations. Often, these representations don't align well with the demands of production-level applications, particularly for tasks like question answering. The solution is fine-tuning embeddings, an impactful way to enhance your RAG implementation.
    RAG 101: How It Works
    Let's break it down. Here's the typical RAG workflow:
    1. User Input: A user submits a question or query.
    2. Query Embedding: The system generates an embedding for the query.
    3. Chunk Matching: It searches for the chunk embeddings most similar to the query using cosine similarity.
    4. Answer Generation: The contents of the retrieved top chunks are sent as context to a language model, which generates the final response.
    This setup works well in theory. However, when embeddings lack precision, the results can feel off-target, especially when dealing with large datasets.
    The Fine-Tuning Solution
    What if you could pre-train your embeddings to anticipate the kinds of questions your users might ask? Here's the idea:
    1. Generate Question-Chunk Pairs: For each chunk of text in your dataset, generate multiple potential questions it could answer.
    2. Fine-Tune the Embedding Model: Train the model to pull embeddings of related questions and chunks closer together in multidimensional space while pushing unrelated ones further apart.
    While this approach might seem like overfitting, it actually focuses on optimizing for generalization. It turns out, fine-tuning embeddings in this way equips the system to handle unseen queries with improved accuracy.
    The Results Speak for Themselves
    Fine-tuning embeddings yielded remarkable improvements across several models. For training, we used one of our internal experimental datasets. It consists of 52 chunks, each approximately 800 tokens long. For each chunk, we used Anthropic's Claude-3-Sonnet to generate 3-5 corresponding questions.
    To evaluate performance, we measured how often the correct chunk appeared within the top 3, top 5, and top 10 retrieved results. To provide broader context, we also included results for OpenAI/text-embedding-large-3. However, since it is a closed-source model, we could not apply fine-tuning to it. Here's a snapshot of the results:
    Open-Sourcing the Code
    If you're inspired to experiment with fine-tuning, we've got you covered. Check out our code repository with training and testing scripts for the Alibaba-NLP/gte-Qwen2-1.5B-instruct and jinaai/jina-embeddings-v3 models. The repo also includes support for two training methods: TripletMarginLoss and CosineEmbeddingLoss.
    Model requirements
    Alibaba-NLP/gte-Qwen2-1.5B-instruct requires about 30GB of VRAM. A GPU with 40GB of memory or higher (e.g., A100) is recommended. Its forward-pass logic is standard and can be applied to many similar embedding models.
    jinaai/jina-embeddings-v3 is a very lightweight model requiring only 8GB of GPU memory for fine-tuning. Its forward-pass logic is slightly specific, but the core concept is clear.
    Training methods
    1. TripletMarginLoss. This method uses an anchor (a), a positive sample (p), and a negative sample (n):
    - Anchor (a): chunk content embedding
    - Positive sample (p): a corresponding question embedding
    - Negative sample (n): an unrelated question embedding
    (Loss function illustration)
    To build a training set, create (chunk, questions) pairs and randomly select unrelated questions as negative samples.
    2. CosineEmbeddingLoss.
    This method uses positive and negative samples from different parts of the training set:
    - x1: the chunk embedding
    - x2: either a positive or a negative sample embedding
    - y: label indicating whether x2 is positive (y=1) or negative (y=-1)
    (Loss function illustration)
    Adapting the Code
    To use your own dataset, modify the prepare_data function in train.py. Ensure it returns chunks and their corresponding questions as pairs.
    Note: The repository does not include question-generation logic, but various approaches are available. Below is sample code that we used for reference.

    # 1. Split the document into chunks (a simple way)
    import re
    import tiktoken

    def split_into_chunks(content, chunk_size):
        enc = tiktoken.get_encoding("o200k_base")
        tokens = enc.encode(content)
        left, chunks = 0, []
        while left < len(tokens):
            chunks.append(enc.decode(tokens[left : left + chunk_size]))
            left += chunk_size
        return chunks

    chunks = split_into_chunks(document_content, 400)

    # 2. Generate questions
    def anthropic_run(system_prompt, user_message):
        import anthropic
        client = anthropic.Anthropic(api_key=ANTHROPIC_API_KEY)
        message = client.messages.create(
            model="claude-3-sonnet-20240229",  # or "claude-3-opus-20240229"
            max_tokens=4096,
            system=system_prompt,
            messages=[{"role": "user", "content": user_message}],
        )
        return message.content[0].text

    system_prompt = '''
    Given a chunk from a document, generate 3-5 questions related to the chunk.
    Each question must be full and not require additional context.
    Example output:
    1. How to open new account?
    2. How much BMW X5 costs?
    '''

    for chunk in chunks:
        # split_into_chunks returns plain strings, so pass the chunk text directly
        out = anthropic_run(system_prompt, chunk)
        question_pattern = re.compile(r'^\s*\d+\.\s+(.*)', re.MULTILINE)
        questions = question_pattern.findall(out)
        print(chunk, questions)
    # now you have (chunk, questions) pairs

    Continuous Improvement
    Another advantage of this approach is its potential for continuous improvement. Over time, as new cases emerge, you can retrain the model. For instance, if the corresponding chunk wasn't found in the top 10 (avoid larger cutoffs to sidestep the "lost in the middle" issue) and the LLM failed to generate an answer, simply add that question and its correct chunk to the training set. This ensures the system evolves to handle similar cases more effectively in the future.
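    As a concrete illustration of the TripletMarginLoss objective described above, here is a minimal pure-Python sketch of the loss formula max(d(a, p) - d(a, n) + margin, 0); the embeddings are toy two-dimensional values, not actual model outputs:

    ```python
    import math

    def euclidean(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def triplet_margin_loss(anchor, positive, negative, margin=1.0):
        """max(d(a, p) - d(a, n) + margin, 0): pull the related question
        toward the chunk, push the unrelated one at least `margin` away."""
        return max(euclidean(anchor, positive) - euclidean(anchor, negative) + margin, 0.0)

    chunk   = [1.0, 0.0]   # anchor: chunk embedding
    related = [0.9, 0.1]   # positive: question answered by the chunk
    other   = [0.0, 1.0]   # negative: unrelated question

    print(triplet_margin_loss(chunk, related, other))  # 0.0 -- margin satisfied
    print(triplet_margin_loss(chunk, other, related))  # large -- margin violated
    ```

    During fine-tuning, gradients of this loss move the question embeddings relative to the chunk embedding; the sketch only shows how the scalar objective behaves for a satisfied versus a violated triplet.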