Building an Interactive Bilingual (Arabic and English) Chat Interface with Open Source Meraj-Mini by Arcee AI: Leveraging GPU Acceleration, PyTorch, Transformers, Accelerate, BitsAndBytes, and Gradio
www.marktechpost.com
In this tutorial, we implement a bilingual chat assistant powered by Arcee's Meraj-Mini model, deployed on Google Colab with a T4 GPU. The tutorial shows what open-source language models can do and gives hands-on practice deploying one within the limits of free cloud resources. We'll use the following stack:

- Arcee's Meraj-Mini model
- Transformers library for model loading and tokenization
- Accelerate and bitsandbytes for efficient quantization
- PyTorch for deep learning computations
- Gradio for creating an interactive web interface

    # Confirm that a GPU is attached (in Colab: Runtime > Change runtime type)
    !nvidia-smi --query-gpu=name,memory.total --format=csv

    # Install dependencies
    !pip install -qU transformers accelerate bitsandbytes
    !pip install -q gradio

First, we confirm that a GPU is attached by querying its name and total memory with the nvidia-smi command. We then install or upgrade the key Python libraries (transformers, accelerate, bitsandbytes, and gradio) needed to load the model and serve an interactive application.

    import torch
    from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline, BitsAndBytesConfig

    # 4-bit NF4 quantization with float16 compute and double quantization
    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.float16,
        bnb_4bit_use_double_quant=True
    )

    model = AutoModelForCausalLM.from_pretrained(
        "arcee-ai/Meraj-Mini",
        quantization_config=quant_config,
        device_map="auto"
    )
    tokenizer = AutoTokenizer.from_pretrained("arcee-ai/Meraj-Mini")

Next, we configure 4-bit quantization with BitsAndBytesConfig so the model loads efficiently, then load the arcee-ai/Meraj-Mini causal language model and its tokenizer from Hugging Face. Setting device_map="auto" places the weights on the available devices automatically.

    chat_pipeline = pipeline(
        "text-generation",
        model=model,
        tokenizer=tokenizer,
        max_new_tokens=512,
        temperature=0.7,
        top_p=0.9,
        repetition_penalty=1.1,
        do_sample=True
    )

Here we create a text-generation pipeline for chat interactions using Hugging Face's pipeline function. It sets the maximum number of new tokens, the temperature, top_p, and the repetition penalty to balance diversity and coherence during generation.

    def format_chat(messages):
        # Build a prompt in which each turn is wrapped in <|im_start|> / <|im_end|>
        prompt = ""
        for msg in messages:
            prompt += f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n"
        # Leave an open assistant turn for the model to complete
        prompt += "<|im_start|>assistant\n"
        return prompt

    def generate_response(user_input, history=None):
        # Use None instead of a mutable default so calls do not share one list
        history = [] if history is None else history
        history.append({"role": "user", "content": user_input})
        formatted_prompt = format_chat(history)
        output = chat_pipeline(formatted_prompt)[0]['generated_text']
        # Keep only the text of the final assistant turn
        assistant_response = output.split("<|im_start|>assistant\n")[-1].split("<|im_end|>")[0]
        history.append({"role": "assistant", "content": assistant_response})
        return assistant_response, history

We define two functions to support the conversational interface. The first formats a chat history into a structured prompt using the <|im_start|> and <|im_end|> delimiters. The second appends the new user message, generates a response with the text-generation pipeline, and updates the conversation history.

    import gradio as gr

    with gr.Blocks() as demo:
        chatbot = gr.Chatbot()
        msg = gr.Textbox(label="Message")
        clear = gr.Button("Clear History")

        def respond(message, chat_history):
            chat_history = chat_history or []
            # The Chatbot holds (user, assistant) pairs; format_chat expects role/content dicts
            history = []
            for user_msg, bot_msg in chat_history:
                history.append({"role": "user", "content": user_msg})
                history.append({"role": "assistant", "content": bot_msg})
            response, _ = generate_response(message, history)
            # Clear the textbox and append the new exchange to the chat
            return "", chat_history + [(message, response)]

        msg.submit(respond, [msg, chatbot], [msg, chatbot])
        clear.click(lambda: None, None, chatbot, queue=False)

    demo.launch(share=True)

Finally, we build a web-based chatbot interface using Gradio. It creates UI elements for the chat history, the message input, and a Clear History button, and defines a response function that calls the text-generation pipeline and updates the conversation. The demo is launched with share=True, which makes it reachable through a public link.

Here is the Colab Notebook.
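The prompt format built by format_chat can be checked without loading the model. The sketch below is a minimal, GPU-free illustration: it assumes each turn is delimited as `<|im_start|>role`, a newline, the content, `<|im_end|>`, and a newline. The helpers `pairs_to_messages` and `extract_reply`, the sample conversation, and the simulated model output are hypothetical and are not part of the original tutorial.

```python
from typing import Dict, List, Tuple

Message = Dict[str, str]


def pairs_to_messages(pairs: List[Tuple[str, str]]) -> List[Message]:
    """Hypothetical helper: turn (user, assistant) pairs into role/content dicts."""
    messages: List[Message] = []
    for user_text, assistant_text in pairs:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    return messages


def format_chat(messages: List[Message]) -> str:
    """Build the delimited prompt and leave an open assistant turn at the end."""
    prompt = ""
    for msg in messages:
        prompt += f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n"
    return prompt + "<|im_start|>assistant\n"


def extract_reply(generated_text: str) -> str:
    """Hypothetical helper: return the text of the last assistant turn."""
    tail = generated_text.split("<|im_start|>assistant\n")[-1]
    return tail.split("<|im_end|>")[0].strip()


# One earlier exchange (Arabic) followed by a new English request.
history = pairs_to_messages([("مرحبا", "أهلاً! كيف يمكنني مساعدتك؟")])
history.append({"role": "user", "content": "Translate 'thank you' into Arabic."})

prompt = format_chat(history)
print(prompt)

# Simulated pipeline output (assumption): the prompt followed by the model's completion.
fake_output = prompt + "شكراً<|im_end|>"
print(extract_reply(fake_output))
```

Splitting on the last open assistant marker means earlier assistant turns in the prompt are ignored, so only the newest reply is returned even when the history already contains assistant messages.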