• GAMINGBOLT.COM
    Marathon Outlines Runners, UESC, and ONI in Alpha Intro Cinematic
    Bungie has officially released the intro cinematic for its Marathon closed alpha, which commences on April 23rd. It provides a brief primer on the Runners, their mission on Tau Ceti IV and much more. Check it out below. The Runners’ goals revolve around salvaging (and extracting) resources from the planet. Interestingly, the United Earth Space Council, which played a major part in the original trilogy, is also present. Players will encounter Runners from the faction, and dealing with them is up to their “discretion.” However, the biggest surprise is the reveal of ONI, the mandatory support AI that accompanies players and helps transfer and reboot their consciousness into a new shell. The name, of course, references Bungie’s 2001 third-person action title, and given its cyberpunk nature, you have to wonder if the universe ties into Marathon or if it’s just a neat Easter egg. The Marathon alpha is available until May 4th, and the full game will launch on September 23rd for Xbox Series X/S, PS5, and PC.
  • EN.WIKIPEDIA.ORG
    Wikipedia picture of the day for April 21
    Sherlock Jr. is a 1924 American silent comedy film starring and directed by Buster Keaton and written by Clyde Bruckman, Jean Havez, and Joseph A. Mitchell. It features Kathryn McGuire, Joe Keaton, and Ward Crane. Production began in January 1924, and the film was released on April 21, 1924. It was selected in 1991 for preservation in the United States National Film Registry by the Library of Congress as being "culturally, historically, or aesthetically significant". In 2000, the American Film Institute, as part of its series AFI 100 Years..., ranked the film at number 62 in AFI's 100 Years...100 Laughs. Film credit: Buster Keaton
  • EN.WIKIPEDIA.ORG
    On this day: April 21
    April 21: Natale di Roma in Italy (AD 47); Patriots' Day in some parts of the United States (2025)
    (Pictured: the Wignacourt Aqueduct)
    900 – A debt was pardoned by the chief of Tondo on the island of Luzon and recorded on the Laguna Copperplate Inscription, the earliest known calendar-dated document found in the Philippines.
    1615 – The Wignacourt Aqueduct (pictured) in Malta was inaugurated; it carried water to Valletta for about 300 years.
    1725 – J. S. Bach's cantata Bleib bei uns, denn es will Abend werden was first performed, on Easter Monday.
    1925 or 1926 – Al-Baqi Cemetery in Medina, the site of the mausoleum of four of the Twelve Imams of Shia Islam, was demolished by Wahhabis.
    1975 – South Vietnamese president Nguyễn Văn Thiệu resigned on hearing of the fall of Xuân Lộc, the last battle of the Vietnam War.
    Births and deaths: Pope Alexander II (d. 1073); Antonín Kammel (b. 1730); Cheryl Gillan (b. 1952); Vivian Maier (d. 2009)
  • WWW.THEVERGE.COM
    Pete Hegseth reportedly spilled Yemen attack details in another Signal chat
    Secretary of Defense Pete Hegseth at the White House on April 10, 2025. | Photo: Anna Moneymaker / Getty Images
    US Defense Secretary Pete Hegseth reportedly shared details about the March 15th Yemen military strikes in another Signal chat with people who weren’t government officials, reports The New York Times. The chat included his wife and “about a dozen” others he knew personally and professionally, the outlet writes, citing conversations with four unnamed sources. The details he shared “included the flight schedules for the F/A-18 Hornets targeting the Houthis in Yemen,” writes the Times, which notes the details were “essentially the same” as those shared in the Signal chat between Hegseth and other officials last month that included Atlantic editor Jeffrey Goldberg, who was added by mistake. But in this case, according to the Times, the chat was one that Hegseth made in January before he was Defense Secretary: Unlike the chat in which The Atlantic was mistakenly included, the newly revealed one was created by Mr. Hegseth. It included his wife and about a dozen other people from his personal and professional inner circle in January, before his confirmation as defense secretary, and was named “Defense | Team Huddle,” the people familiar with the chat said. He used his private phone, rather than his government one, to access the Signal chat. The outlet’s sources told it that “Hegseth typically did not use the chat to discuss sensitive military operations and said it did not include other cabinet-level officials.” According to the Times, a US official confirmed the “informal group chat” but insisted no classified information had ever been discussed on it. The unnamed official wouldn’t comment on whether Hegseth “shared detailed targeting information,” the story says. Other details about the chat mentioned by the Times include that Hegseth’s aides “had warned him a day or two before the Yemen strikes not to discuss such sensitive operational details in his Signal group chat,” and some encouraged him to move any work-related matters from the chat to his government phone, but he never did so.
  • TOWARDSAI.NET
    RAGent: A Multi-Agent PDF Whisperer Built on LangChain + LangGraph
Author(s): Dwaipayan Bandyopadhyay. Originally published on Towards AI.

Retrieval Augmented Generation (RAG) is a well-known approach in generative AI that usually follows a linear flow: chunk a document, store it in a vector database, retrieve the chunks relevant to the user's query, and feed them to an LLM to produce the final response. Recently the term "Agentic AI" has been taking the internet by storm; in simple terms, it refers to breaking a problem down into smaller parts, assigning each part to an "agent" capable of handling that specific task, and combining those agents into a more complex workflow. What if we combine this agentic approach with Retrieval Augmented Generation? In this article, we explain such a concept/architecture, which we developed using LangGraph, FAISS and OpenAI. (Architecture diagram: image by the author.)

We will not explore AI agents and how they work in depth in this article; otherwise, it would become a full-fledged book. Briefly, an "AI agent" can be thought of as an assistant that is a master of one particular task; multiple agents with different capabilities are combined into a full graphical agentic workflow in which each agent may communicate with the others and can understand what the previous agent returned. In our approach, we divided Retrieval Augmented Generation into three tasks and created one agent for each: one agent handles retrieval, another handles augmentation, and the last one handles generation. We then combined all three agents into a complete end-to-end agentic workflow. Let's dive into the coding section.

Firstly, we will install all the necessary packages. The best practice is to create a virtual environment first and then install the packages, as shown below.
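The article does not list the exact packages, so the command below is a plausible set inferred from the imports used throughout the walkthrough (versions unpinned); adjust it to your environment as needed.

# Assumed package set, inferred from the imports used in the code below
pip install langchain langchain-community langchain-openai langgraph faiss-cpu pypdf python-dotenv streamlit ipython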
After they are installed successfully, we will import all the necessary packages and code the Retriever agent first.

Coding the Retriever Agent

from langchain_openai import ChatOpenAI
from langchain_community.vectorstores.faiss import FAISS
from langchain_openai import OpenAIEmbeddings
from langchain.docstore.document import Document
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
from pypdf import PdfReader
import re
from dotenv import load_dotenv
import streamlit as st

load_dotenv()

LLM = ChatOpenAI(model_name="gpt-4o", temperature=0.0)


def extract_text_from_pdf(pdf_path):
    try:
        pdf = PdfReader(pdf_path)
        output = []
        for i, page in enumerate(pdf.pages, 1):
            text = page.extract_text()
            text = re.sub(r"(\w+)-\n(\w+)", r"\1\2", text)            # re-join hyphenated line breaks
            text = re.sub(r"(?<!\n\s)\n(?!\s\n)", " ", text.strip())  # single newlines become spaces
            text = re.sub(r"\n\s*\n", "\n\n", text)                   # normalise paragraph breaks
            output.append((text, i))  # tuple of (text, page number)
        return output
    except Exception as e:
        st.error(f"Error reading PDF: {e}")
        return []


def text_to_docs(text_with_pages):
    docs = []
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=4000, chunk_overlap=200)
    for text, page_num in text_with_pages:
        chunks = text_splitter.split_text(text)
        for i, chunk in enumerate(chunks):
            doc = Document(
                page_content=chunk,
                metadata={"source": f"page-{page_num}", "page_num": page_num}
            )
            docs.append(doc)
    return docs


def create_vectordb(pdf_path):
    text_with_pages = extract_text_from_pdf(pdf_path)
    if not text_with_pages:
        raise ValueError("No text extracted from PDF.")
    docs = text_to_docs(text_with_pages)
    embeddings = OpenAIEmbeddings()
    return FAISS.from_documents(docs, embeddings)


# Define tools
def retrieve_from_pdf(query: str, vectordb) -> dict:
    """Retrieve the most relevant text and page number using similarity search."""
    docs = vectordb.similarity_search(query, k=3)  # top 3 results; only the first (most similar) is used
    if docs:
        doc = docs[0]
        content = f"Page {doc.metadata['page_num']}: {doc.page_content}"
        page_num = doc.metadata["page_num"]
        return {"content": content, "page_num": page_num}
    return {"content": "No content retrieved.", "page_num": None}


RETRIEVE_PROMPT = ChatPromptTemplate.from_messages([
    ("system", """
    You are the Retrieve Agent. Your task is to fetch the most relevant text from a PDF based on the user's query.
    - Use the provided retrieval function to get content and a single page number.
    - Return the content directly with the page number included (e.g., 'Page X: text').
    - If no content is found, return "No content retrieved."
    """),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "{query}"),
])

Explanation of the Code – In this retriever agent code, we first import all the necessary modules and classes. Credentials such as the OpenAI API key are stored in a .env file, which is why the dotenv module is used here alongside the load_dotenv function call. Next, we initialise the LLM with the required arguments such as the model name, temperature, etc.

Descriptions of Functions

extract_text_from_pdf reads and extracts the content of the PDF and cleans it a bit: it fixes hyphenated line breaks that split a word into two pieces, converts single newlines into spaces unless they are part of paragraph spacing, and so on. The cleaning is done page-wise, which is why a loop is applied over the pages using the enumerate function. Finally, the function returns the cleaned content alongside its page number as a list of tuples.
If any unexpected error occurs, it is handled by the try-except block, which ensures the code keeps working without breaking.

text_to_docs handles the chunking. It uses the RecursiveCharacterTextSplitter class from langchain with a chunk size of 4,000 characters and an overlap of 200. It loops over the text_with_pages argument, which receives the output of the previous function, i.e. extract_text_from_pdf, in its list-of-tuples format; two loop variables unpack both items of each tuple. The cleaned text is then split into chunks and wrapped in Document objects, which will later be converted into embeddings. Apart from the page content, each Document holds the page number and a string label containing the page number as metadata. Each Document is appended to a list, which is returned.

create_vectordb uses the two functions above to create embeddings and store them in a FAISS (Facebook AI Similarity Search) vector store. FAISS is a lightweight vector store that keeps its index locally and makes similarity search easy. This function simply creates and returns the vector database; that's it.

retrieve_from_pdf performs the similarity search and fetches the top three chunks. If any are found, only the first chunk (the most similar content) is kept and returned along with its page number as a dictionary.

The RETRIEVE_PROMPT is a ChatPromptTemplate consisting of the instruction, i.e. the system message for the LLM, describing its job as the retrieve agent. It also takes the entire chat history of the session into account and accepts the user query as the human input.

Coding the Augmentation Agent

from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
from typing import Optional


def augment_with_context(content: str, page_num: Optional[int]) -> str:
    """Augment retrieved content with source context."""
    if content != "No content retrieved." and page_num:
        return f"{content}\n\nAdditional context: Sourced from page {page_num}."
    return f"{content}\n\nAdditional context: No specific page identified."


AUGMENT_PROMPT = ChatPromptTemplate.from_messages([
    ("system", """
    You are the Augment Agent. Enhance the retrieved content with additional context.
    - If content is available, append a note with the single page number.
    - If no content is retrieved, return "No augmented content."
    """),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "Retrieved content: {retrieved_content}\nPage number: {page_num}"),
])

Explanation of the Functions

augment_with_context is a very straightforward function that adds a little extra context to solidify the information returned by the retrieval agent. If content and a page number are found, a note citing the page number is appended to the original retrieved content; otherwise, the content is returned with a note that no specific page was identified.

The AUGMENT_PROMPT is a ChatPromptTemplate that instructs the LLM to act as the augment agent, enhancing the retrieved content with additional context; like the retriever prompt, it includes the chat history and passes the retrieved content and page number as the human input.
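Before wiring these pieces into a graph, it can help to sanity-check the retrieval and augmentation helpers on their own. The snippet below is a minimal sketch that is not part of the original article: the query string is only an example, and dbms_notes.pdf is the PDF used later in the walkthrough.

# Minimal standalone check of the retrieval and augmentation helpers (sketch)
from dotenv import load_dotenv
from retriever import create_vectordb, retrieve_from_pdf
from augmentation import augment_with_context

load_dotenv()  # loads the OpenAI API key from .env

vectordb = create_vectordb("dbms_notes.pdf")   # build the FAISS index once
result = retrieve_from_pdf("What is normalization in DBMS?", vectordb)
print(result["page_num"])                      # page number of the best match
print(augment_with_context(result["content"], result["page_num"]))

If this prints a sensible chunk and page number, the building blocks are ready to be combined into the LangGraph workflow.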
Coding the Generator Agent

from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder

GENERATE_PROMPT = ChatPromptTemplate.from_messages([
    ("system", """
    You are the Generate Agent. Create a detailed response based on the augmented content.
    - Focus on DBMS and SQL content.
    - Append "Source: Page X" at the end if a page number is available.
    - If the user query consists of terms like "explain", "simple", "simplify" etc. or relatable, then do not return any page number, otherwise return the proper page number.
    - If the question is not DBMS-related, reply "Not applicable."
    - Use the chat history to maintain context.
    """),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "{query}\nAugmented content: {augmented_content}"),
])

The generator agent consists only of the prompt template, which instructs the LLM on how to generate the final response from the retrieved content and the extra information augmented in the previous two steps. After all these separate agents are created, it's time to bring them under a single umbrella and form the entire end-to-end workflow using LangGraph.

Code for the Graph Creation using LangGraph

import streamlit as st
from langgraph.graph import StateGraph, END
from typing import TypedDict, List, Optional
import re
from IPython.display import display, Image
from retriever import (
    LLM,
    extract_text_from_pdf,
    text_to_docs,
    create_vectordb,
    retrieve_from_pdf,
    RETRIEVE_PROMPT,
)
from augmentation import augment_with_context, AUGMENT_PROMPT
from generation import GENERATE_PROMPT
from dotenv import load_dotenv

load_dotenv()

PDF_FILE_PATH = "dbms_notes.pdf"


# Define the agent state
class AgentState(TypedDict):
    query: str
    chat_history: List[dict]
    retrieved_content: Optional[str]
    page_num: Optional[int]  # single page number instead of a list
    augmented_content: Optional[str]
    response: Optional[str]


def format_for_display(text):
    def replace_latex(match):
        latex_expr = match.group(1)
        return f"$${latex_expr}$$"  # use $$ for Streamlit Markdown to render LaTeX
    text = re.sub(r'\\frac\{([^}]+)\}\{([^}]+)\}', r'$\\frac{\1}{\2}$', text)
    return text


# Define multi-agent nodes
def retrieve_agent(state: AgentState) -> AgentState:
    chain = RETRIEVE_PROMPT | LLM
    retrieved = retrieve_from_pdf(state["query"], st.session_state.vectordb)
    response = chain.invoke({"query": state["query"], "chat_history": state["chat_history"]})
    # print(retrieved)
    return {
        "retrieved_content": retrieved["content"],
        "page_num": retrieved["page_num"]
    }


def augment_agent(state: AgentState) -> AgentState:
    chain = AUGMENT_PROMPT | LLM
    if state["retrieved_content"] and state["retrieved_content"] != "No content retrieved.":
        # Prepare input for the LLM
        input_data = {
            "retrieved_content": state["retrieved_content"],
            "page_num": str(state["page_num"]) if state["page_num"] else "None",
            "chat_history": state["chat_history"]
        }
        # Invoke the LLM to generate augmented content
        response = chain.invoke(input_data)
        augmented_content = response.content  # use the LLM's output
    else:
        augmented_content = "No augmented content."
    return {"augmented_content": augmented_content}


def generate_agent(state: AgentState) -> AgentState:
    chain = GENERATE_PROMPT | LLM
    response = chain.invoke({
        "query": state["query"],
        "augmented_content": state["augmented_content"] or "No augmented content.",
        "chat_history": state["chat_history"]
    })
    return {"response": response.content}


# Define conditional edge logic
def decide_augmentation(state: AgentState) -> str:
    if state["retrieved_content"] and state["retrieved_content"] != "No content retrieved.":
        return "augmentation"
    return "generation"


workflow = StateGraph(AgentState)
workflow.add_node("retrieve_agent", retrieve_agent)
workflow.add_node("augment_agent", augment_agent)
workflow.add_node("generate_agent", generate_agent)
workflow.set_entry_point("retrieve_agent")
workflow.add_conditional_edges(
    "retrieve_agent",
    decide_augmentation,
    {
        "augmentation": "augment_agent",
        "generation": "generate_agent"
    }
)
workflow.add_edge("augment_agent", "generate_agent")
workflow.add_edge("generate_agent", END)
agent = workflow.compile()
# display(Image(agent.get_graph().draw_mermaid_png(output_file_path="tutor_agent.png")))

st.set_page_config(page_title="🤖 RAGent", layout="wide")
st.title("🤖 RAGent : Your Personal Teaching Assistant")
st.markdown("Ask any question from your book and get detailed answers with a single source page!")

# Initialize session state for the vector database
if "vectordb" not in st.session_state:
    with st.spinner("Loading PDF content... This may take a minute."):
        try:
            st.session_state.vectordb = create_vectordb(PDF_FILE_PATH)
        except Exception as e:
            st.error(f"Failed to load PDF: {e}")
            st.stop()

# Initialize chat history in session state
if "messages" not in st.session_state:
    st.session_state.messages = []

# Display chat history
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])

# User input
user_input = st.chat_input("Ask anything from the PDF")
if user_input:
    # Add user message to chat history
    st.session_state.messages.append({"role": "user", "content": user_input})
    with st.chat_message("user"):
        st.markdown(user_input)

    # Display assistant response
    with st.chat_message("assistant"):
        message_placeholder = st.empty()

        # Prepare chat history for the agent
        chat_history = [
            {"type": "human", "content": msg["content"]} if msg["role"] == "user"
            else {"type": "ai", "content": msg["content"]}
            for msg in st.session_state.messages[:-1]  # exclude current input
        ]

        # Prepare initial state
        initial_state = {
            "query": user_input,
            "chat_history": chat_history,
            "retrieved_content": None,
            "page_num": None,
            "augmented_content": None,
            "response": None,
        }

        # Run the agent with a spinner
        with st.spinner("Processing..."):
            final_state = agent.invoke(initial_state)
            answer = final_state["response"]
            formatted_answer = format_for_display(answer)

        # Display response
        message_placeholder.markdown(formatted_answer)

        # Update chat history
        st.session_state.messages.append({
            "role": "assistant",
            "content": formatted_answer
        })
Explanation of the Code

AgentState class — In this class, we define a schema that is enforced on the LLM responses; the entire "state" carries this same structure throughout the workflow. It is passed as an argument when the StateGraph is created.

format_for_display function — This function contains a nested helper used to handle LaTeX-based output. We use it because the document may contain fractions that Streamlit might not render properly, so it acts as an extra precaution.

retrieve_agent function — This uses the retrieve_from_pdf function defined earlier. First we create a chain from the retrieve prompt and the LLM, then invoke it with the user's query and the entire chat_history; the node returns the retrieved content and page number.

augment_agent function — Here we again create a chain, this time with AUGMENT_PROMPT, and check whether the retriever agent returned any content. If it did, the retrieved content, page number and chat_history are passed to the LLM, and the augmented content from its response is returned.

generate_agent function — Finally, we pass the augmented content, the user query and the chat history so that the LLM can leverage the augmented information, generate the final response and display it to the user.

decide_augmentation function — This is an optional step that checks whether the augmentation agent needs to run at all.

After all the necessary agents are created, it's time to combine them into an end-to-end workflow using the StateGraph class of LangGraph. When initialising StateGraph, we pass the AgentState class defined earlier as its parameter to indicate that these are the only keys the state will contain throughout the workflow. We then add the nodes to the StateGraph, set the entry point manually so the graph knows which node executes first, add edges between the nodes to define the shape of the workflow, and add a conditional edge to signify that the node it leads to may or may not be called on any given run. Finally, we compile the entire workflow to check that everything works and that the resulting graph is well formed. The graph can be displayed using the IPython module and the Mermaid.ink method; if everything goes correctly it matches the diagram below. (Graph diagram: image by the author.)

The rest of the code is entirely Streamlit-based, and the UI can be designed however the user likes. We took a very basic approach so that it remains user-friendly, and we use a few session-state entries to maintain the chat history, the user query, etc. Nothing runs without user input: until the user provides a query, the workflow does not start. (Screenshots of the application in working condition: images by the author.)

This article has been written in collaboration with Biswajit Das.

Published via Towards AI
  • FUTURISM.COM
    Investor Says AI Is Already "Fully Replacing People"
    The hype over artificial intelligence might be quieting as the US tech sector stresses over tariffs, but some investors are still knee-deep in the mud, panning for gold. One of them, prominent venture capitalist and former gaming CEO Victor Lazarte, is so confident that he claims AI is already "fully replacing people." While some companies have pumped the brakes on hyped-up promises of a fully automated future, Lazarte is charging full steam ahead. "Big companies talk about, like, 'AI isn't replacing people, it's augmenting them,'" the tycoon said on the Twenty Minute VC podcast. "This is bullshit. It's fully replacing people." As per Business Insider, Lazarte highlighted that lawyers and HR workers should be particularly nervous that AI is coming for their jobs, noting that law school students "should think about what they could do three years from now that AI could not." It's a curious claim coming from a guy like Lazarte, whose firm, Benchmark, is heavily invested in startups like AI-based hiring platform Mercor and AI-powered research lab Decart. The venture capitalist doesn't bother to provide receipts for his claim, instead insinuating: I have a lot of money riding on this, trust me. To dig into his claim, we'll have to look at examples of AI in law and recruiting today — and boy is it a disaster. Starting with law, recent headlines aren't great. A New York Supreme Court judge recently slammed an entrepreneur for trying to pass an AI-generated video off as a stand-in for a human lawyer. The man was reportedly testing his legal-aid startup software, called Pro Se Pro, in the real world. "You are not going to use this courtroom as a launch for your business," boomed the justice. Other high-profile incidents include one where Michael Cohen, Trump's former legal counsel and White House plumber, was caught filing AI-generated briefs, which might have been fine if the software hadn't completely made up the cases he was citing. Though there are a lot of startup founders and investors who, like Lazarte, have a personal interest in passing off AI as "ready for the courtroom," actual legal pros aren't convinced. "I think courts will clamp down before AI appearances can gain a foothold," law professor Mark Bartholomew told BI. The reason, of course, is AI's deep-seated penchant for spitting out an answer as fast as it can, accuracy be damned. "If you type a legal question into the Google search function, then generative AI is all too ready to answer," wrote legal columnist Virginia Hammerle. "That is not a good thing." And when it comes to hiring, well, that's a whole other fiasco. Though today's AI models are chock full of racist and misogynist biases — courtesy of the real-life data they're trained on — companies are nonetheless blazing ahead by putting AI in charge of human resources. One study found that 99 percent of Fortune 500 companies were using AI to filter applicants, and there's a growing push to sell AI to do the actual interviewing as well. That's creating a hellish environment for job seekers, as some candidates deploy their own AI to fight the hiring AI and spam job listings with applications.
    It's a vicious cycle that's boxing out non-AI-savvy job seekers — especially disabled, elderly, and immigrant workers — while making AI spam a precondition for finding a job. When it comes to playing fast and loose with AI, UC Berkeley computer science professor Hany Farid sums it up best: "Just because something is inevitable, it doesn't mean you deploy [it]."
  • WWW.CNET.COM
    Subaru Boosts 2026 Solterra EV's Range, Debuts Larger Trailseeker Electric SUV
    The new Solterra gets more range but a less rugged look. Meanwhile, the new Trailseeker makes room for outdoorsy adventure gear.
  • TECHCRUNCH.COM
    OpenAI’s o3 AI model scores lower on a benchmark than the company initially implied
    A discrepancy between first- and third-party benchmark results for OpenAI’s o3 AI model is raising questions about the company’s transparency and model testing practices. When OpenAI unveiled o3 in December, the company claimed the model could answer just over a fourth of questions on FrontierMath, a challenging set of math problems. That score blew the competition away — the next-best model managed to answer only around 2% of FrontierMath problems correctly. “Today, all offerings out there have less than 2% [on FrontierMath],” Mark Chen, chief research officer at OpenAI, said during a livestream. “We’re seeing [internally], with o3 in aggressive test-time compute settings, we’re able to get over 25%.” As it turns out, that figure was likely an upper bound, achieved by a version of o3 with more computing behind it than the model OpenAI publicly launched last week. Epoch AI, the research institute behind FrontierMath, released results of its independent benchmark tests of o3 on Friday. Epoch found that o3 scored around 10%, well below OpenAI’s highest claimed score. That doesn’t mean OpenAI lied, per se. The benchmark results the company published in December show a lower-bound score that matches the score Epoch observed. Epoch also noted its testing setup likely differs from OpenAI’s, and that it used an updated release of FrontierMath for its evaluations. “The difference between our results and OpenAI’s might be due to OpenAI evaluating with a more powerful internal scaffold, using more test-time [computing], or because those results were run on a different subset of FrontierMath (the 180 problems in frontiermath-2024-11-26 vs the 290 problems in frontiermath-2025-02-28-private),” wrote Epoch. According to a post on X from the ARC Prize Foundation, an organization that tested a pre-release version of o3, the public o3 model “is a different model […] tuned for chat/product use,” corroborating Epoch’s report. “All released o3 compute tiers are smaller than the version we [benchmarked],” wrote ARC Prize. Generally speaking, bigger compute tiers can be expected to achieve better benchmark scores. Granted, the fact that the public release of o3 falls short of OpenAI’s testing promises is a bit of a moot point, since the company’s o3-mini-high and o4-mini models outperform o3 on FrontierMath, and OpenAI plans to debut a more powerful o3 variant, o3-pro, in the coming weeks. It is, however, another reminder that AI benchmarks are best not taken at face value — particularly when the source is a company with services to sell. Benchmarking “controversies” are becoming a common occurrence in the AI industry as vendors race to capture headlines and mindshare with new models. In January, Epoch was criticized for waiting to disclose funding from OpenAI until after the company announced o3. Many academics who contributed to FrontierMath weren’t informed of OpenAI’s involvement until it was made public. More recently, Elon Musk’s xAI was accused of publishing misleading benchmark charts for its latest AI model, Grok 3. Just this month, Meta admitted to touting benchmark scores for a version of a model that differed from the one the company made available to developers.
  • BUILDINGSOFNEWENGLAND.COM
    Charles H. Owens House // 1906
    This stately house on Powell Street in Brookline, Massachusetts, was built in 1906 for Charles H. Owens, Jr., and his wife, Nellie. Charles was a "house decorator" who had built the house next door just two years earlier, before moving into this larger home in 1906. Both houses were designed by the architectural firm of Loring & Phipps, with whom Owens likely collaborated on commissions during his career. The house is 2½ stories tall with a shallow hip roof and is an excellent example of an academic interpretation of the Colonial Revival style. The principal facade has flush-board siding with round-arched windows on the first floor, along with a squat fanlight transom over the center entrance.