
RAG AI: Do it yourself, says NYC data scientist
www.computerweekly.com
Organisations should build their own generative artificial intelligence (GenAI) applications based on retrieval augmented generation (RAG) with open source products such as DeepSeek and Llama.

This is according to Alaa Moussawi, chief data scientist at New York City Council, who recently spoke at the Leap 2025 tech event in Saudi Arabia. The event, held near the Saudi capital Riyadh, majored on AI and came as the desert kingdom announced $15bn of planned investment in AI.

But, says Moussawi, there's nothing to stop any organisation testing and deploying AI with very little outlay at all, as he described the council's first such project way back in 2018.

New York City Council is the legislative branch of the New York City government that's mainly responsible for passing laws and the budget in the city. The council has 51 elected officials plus attorneys and policy analysts. What Moussawi's team set out to do was make the legislative process more fact-based and evidence-driven, and make the everyday work of attorneys, policy analysts and elected officials smoother.

To that end, Moussawi's team built its first AI-like app, a duplicate checker for legislation, for production use at the council in 2018. Whenever a council member has an idea for legislation, it is put into the database and timestamped so it can be checked for originality and credited to the elected official who made that law come to fruition. There are tens of thousands of ideas in the system, and a key step in the legislative process is to check whether an idea has been proposed before.

"If it was, then the idea must be credited to that official," says Moussawi. "It is a very contentious thing. We've had errors happen in the past where a bill got to the point of being voted on and finally another council member recalled they had proposed the idea, but the person who had done the duplicate check manually had somehow missed it."

By today's standards, it's a rudimentary model, says Moussawi. It uses Google's Word2Vec, which was released in 2013 and captures information about the meaning of words based on those around it.

"It's somewhat slow," says Moussawi. "But the important thing is that while it might take a bit of time, five or 10 seconds, to return similarity rankings, it's much faster than a human and it makes their jobs much easier."

The key technology behind the duplicate checker is vector embedding, which is effectively a list of numbers, the vectors, that represents the position of a word in a high-dimensional vector space. "That could often consist of over a thousand dimensions," says Moussawi. "A vector embedding is really just a list of numbers."

Moussawi demonstrated the idea by simplifying things down to two vectors. In a game of cards, for example, you can take the vector for royalty and the vector for woman, and adding them together should give you the vector for queen. Strong vector embeddings can derive these relationships from the data, says Moussawi. Similarly, if you add the vectors for royalty and man, you can expect to get the vector for king.

That's essentially the technology in the council's duplicate checker. It trains itself by using the full set of texts to generate its vector embeddings.

"Then it sums over all the word embeddings to create an idea vector," he says. "We can measure the distance between this idea for a law and another idea for a law. You could measure it with your ruler if you were working with two-dimensional space, or you apply the Pythagorean theorem extended to a higher-dimensional space, which is fairly straightforward."
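The duplicate-checker approach Moussawi describes can be sketched in a few lines of Python. The example below is illustrative only: it assumes the gensim library for Word2Vec, and the toy corpus and tokenisation are placeholders rather than the council's actual system.

```python
# Minimal sketch of the duplicate-checker idea: sum Word2Vec word vectors into
# an "idea vector", then rank stored ideas by Euclidean distance (the
# Pythagorean theorem extended to the embedding space).
# Assumes gensim 4.x; corpus and query are illustrative placeholders.
import numpy as np
from gensim.models import Word2Vec

# Toy corpus of previously proposed ideas (the real system holds tens of thousands).
ideas = [
    "require landlords to report heating outages within 24 hours",
    "create a public dashboard of restaurant inspection results",
    "ban single use plastic bags in city agencies",
]
tokenised = [idea.lower().split() for idea in ideas]

# Train word embeddings on the full set of texts (a pretrained model would also work).
model = Word2Vec(sentences=tokenised, vector_size=50, window=5, min_count=1, seed=1)

def idea_vector(text: str) -> np.ndarray:
    """Sum the embeddings of every known word to form one idea vector."""
    words = [w for w in text.lower().split() if w in model.wv]
    return np.sum([model.wv[w] for w in words], axis=0)

stored = [idea_vector(i) for i in ideas]

def closest_existing_idea(new_idea: str):
    """Return the most similar stored idea and its Euclidean distance."""
    v = idea_vector(new_idea)
    distances = [float(np.linalg.norm(v - s)) for s in stored]
    best = int(np.argmin(distances))
    return ideas[best], distances[best]

match, dist = closest_existing_idea("report heating outages from landlords")
print(f"Closest prior idea: {match!r} (distance {dist:.3f})")
```

In practice the similarity ranking, not a single match, is what the attorneys review, but the principle is the same: two ideas that sit close together in the vector space are likely duplicates.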
And that's all there is to it: the measure of distance between two ideas.

Moussawi is a strong advocate that organisations should get their hands dirty with generative AI. He has a PhD in software engineering and has closely followed developments through the various iterations of neural networks, but is keen to stress their limitations.

"AI text models, including the state-of-the-art models we use today, are about simply predicting the next best word in a sequence of words and repeating the process," he says. "So, for example, if you ask a large language model [LLM], 'Why did the chicken cross the road?', it's going to pump it into the model and predict the next word, 'the', and the next one, 'chicken', and so on.

"That's really all it's doing, and this should somewhat make you understand why LLMs are actually not intelligent or don't have true thought the way we do. By contrast, I'm explaining a concept to you and I'm trying to relay that idea and I'm finding the words to express that idea. A large language model has no idea what word is going to come next in the sequence. It's not thinking about a concept."

According to Moussawi, the big breakthrough in the scientific community that came in 2020 was that compute, datasets and parameters could scale and scale, and you could keep throwing more compute power at them and get better performance.

He stresses that organisations should bear in mind that the science behind the algorithms isn't secret knowledge: "We have all these open source models like DeepSeek and Llama. But the important takeaway is that the fundamental architecture of the technology did not really change very much. We just made it more efficient. These LLMs didn't learn to magically think all of a sudden. We just made it more efficient."

Coming up to date, Moussawi says New York City Council has banned the use of third-party LLMs in the workplace because of security concerns. This means the organisation has opted for open source models that avoid the security concerns that come with cloud-based subscriptions or third-party APIs.

"With the release of the first Llama models, we started tinkering on our local cluster, and you should too. There are C++ implementations that can be run on your laptop. You can do some surprisingly good inference, and it's great for developing a proof-of-concept, which is what we did at the council."

The first thing to do is to index documents into some vector database. This is all work you do just once on the back end to set up your system, so that it's ready to be queried based on the vector database you've built.

Next, you need to set up a pipeline to retrieve the documents relevant to a given query. The idea is that you ask it a prompt and run that prompt's vector against your vector database: legal memos you've stored there, plain language summaries, or other legal documents you've collected, depending on your domain.

This process is known as retrieval augmented generation, or RAG, and it's a great way to provide your model with scope regarding what its output should be limited to.
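A pipeline along these lines can be sketched briefly. The example below is a hedged illustration, not the council's actual stack: the embedding function is a deliberately naive stand-in for a proper embedding model, the documents are placeholders, and the generation step assumes the llama-cpp-python bindings with a locally downloaded GGUF model file.

```python
# Sketch of a simple RAG pipeline: embed documents once into a small vector
# store, retrieve the nearest ones for a query, then ground a local open
# source LLM on them so it can cite its sources.
import numpy as np
from numpy.linalg import norm

# --- One-off back-end work: index documents into a simple vector store ---
documents = [
    "Local Law 97 sets carbon emission limits for large buildings.",
    "The speed camera programme operates 24 hours a day, seven days a week.",
    "Outdoor dining structures must meet accessibility requirements.",
]

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Toy embedding: hash words into a fixed-size unit vector.
    A real system would use a trained embedding model instead."""
    v = np.zeros(dim)
    for word in text.lower().split():
        v[hash(word) % dim] += 1.0
    return v / (norm(v) or 1.0)

index = np.stack([embed(d) for d in documents])  # the "vector database"

# --- Query time: retrieve relevant documents, then generate with citations ---
def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents whose vectors are closest to the query vector."""
    scores = index @ embed(query)  # cosine similarity, since vectors are unit length
    return [documents[i] for i in np.argsort(scores)[::-1][:k]]

def answer(query: str) -> str:
    context = retrieve(query)
    prompt = (
        "Answer using only the sources below and cite them by number.\n\n"
        + "\n".join(f"[{i + 1}] {doc}" for i, doc in enumerate(context))
        + f"\n\nQuestion: {query}\nAnswer:"
    )
    # Assumption: llama-cpp-python is installed and ./model.gguf is a
    # downloaded open source model (for example a Llama-family model).
    from llama_cpp import Llama
    llm = Llama(model_path="./model.gguf")
    return llm(prompt, max_tokens=256)["choices"][0]["text"]

print(answer("What does Local Law 97 regulate?"))
```

The retrieval and indexing steps run without any LLM at all, which is why the generation import sits inside the answering function; everything up to the prompt can be prototyped on a laptop before any model is downloaded.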
Restricting the model to retrieved documents significantly reduces hallucinations and, since it is pulling the documents it responds with from the vector database, it can cite sources. These, says Moussawi, provide guardrails for your model and give the end user a way to ensure the output is legitimate, because sources are being cited.

And that's exactly what Moussawi's team did. His message, while he awaits delivery of the council data science team's first GPUs, is: "What are you waiting for?"

Read more about AI and Saudi Arabia

Storage technology explained: Vector databases at the core of AI: We look at the use of vector data in AI and how vector databases work, plus vector embedding, the challenges for storage of vector data and the key suppliers of vector database products.

Saudi Arabia calls for humanitarian AI after tightening screws on rights protesters: Oppressive state wants a global digital identity system at the heart of all AI, to make it trustworthy and prevent it being used for unauthorised surveillance.