In this tutorial, we explore how to fine-tune NVIDIA's NV-Embed-v1 model on the Amazon Polarity dataset using LoRA (Low-Rank Adaptation) with PEFT (Parameter-Efficient Fine-Tuning) from Hugging Face. By leveraging LoRA, we efficiently adapt the model without modifying all its parameters, making fine-tuning feasible on low-VRAM GPUs.

The implementation in this tutorial can be broken into the following steps:

- Authenticating with Hugging Face to access NV-Embed-v1
- Loading and configuring the model efficiently
- Applying LoRA fine-tuning using PEFT
- Preprocessing the Amazon Polarity dataset for training
- Optimizing GPU memory usage with `device_map="auto"`
- Training and evaluating the model on sentiment classification

By the end of this guide, you'll have a fine-tuned NV-Embed-v1 model optimized for binary sentiment classification, demonstrating how to apply efficient fine-tuning techniques to real-world NLP tasks.

```python
from huggingface_hub import login

login()  # Enter your Hugging Face token when prompted

import os
HF_TOKEN = "...."  # Replace with your actual token
os.environ["HF_TOKEN"] = HF_TOKEN

import torch
import torch.distributed as dist
from transformers import AutoModel, AutoTokenizer, TrainingArguments, Trainer
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
```

First, we log into the Hugging Face Hub with an API token, set the token as an environment variable, and import the libraries needed for distributed training and fine-tuning transformer models with techniques like LoRA.

```python
MODEL_NAME = "nvidia/NV-Embed-v1"
HF_TOKEN = "hf_XXXXXXXXXXXXXXXX"  # Replace with your actual token

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, token=HF_TOKEN)
model = AutoModel.from_pretrained(
    MODEL_NAME,
    device_map="auto",           # Enable efficient GPU placement
    torch_dtype=torch.float16,   # Use FP16 for efficiency
    trust_remote_code=True,      # NV-Embed-v1 ships a custom model definition
    token=HF_TOKEN,
)
```

This snippet sets the model name and authentication token, then loads the corresponding pretrained tokenizer and model from Hugging Face's model hub. It also configures the model to use automatic GPU allocation and FP16 precision for improved efficiency.

```python
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["self_attn.q_proj", "self_attn.v_proj"],
    lora_dropout=0.1,
    bias="none",
    task_type="FEATURE_EXTRACTION",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

With the above code, we configure a LoRA setup with the specified parameters (r=16, lora_alpha=32, and a dropout of 0.1), targeting the self-attention mechanism's query and value projection layers. We then integrate this configuration into the model using PEFT so that only these LoRA layers are trainable for feature extraction, and finally print the trainable parameters.
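Because NV-Embed-v1 uses a custom architecture, the projection-module names can differ from what `target_modules` expects. As an optional sanity check (a minimal sketch, not part of the original code), you can list the attention projection modules that actually exist in the loaded model:

```python
# Optional sanity check (assumed addition, not from the original tutorial):
# list module names ending in q_proj / v_proj so we can confirm that the
# strings passed to LoraConfig(target_modules=...) match this checkpoint.
for name, _module in model.named_modules():
    if name.endswith(("q_proj", "v_proj")):
        print(name)
```

If nothing prints, the projection layers are named differently in this checkpoint and `target_modules` should be adjusted accordingly.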
```python
dataset = load_dataset("amazon_polarity")

def tokenize_function(examples):
    return tokenizer(examples["content"], padding="max_length", truncation=True)

tokenized_datasets = dataset.map(tokenize_function, batched=True)
```

Here, we load the Amazon Polarity dataset, define a function to tokenize its `content` field with padding and truncation, and apply this function to convert the dataset into a tokenized format for model training.

```python
training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    num_train_epochs=1,
    save_strategy="epoch",
    save_total_limit=1,
    logging_dir="./logs",
    logging_steps=10,
    fp16=True,  # Mixed precision
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["test"],
)

trainer.train()
```

With the above code, we set up the training parameters (output directories, batch sizes, logging, and FP16 mixed precision) using TrainingArguments, create a Trainer with the model and the tokenized train/test datasets, and finally initiate the training process.

```python
model.save_pretrained("./fine_tuned_nv_embed")
tokenizer.save_pretrained("./fine_tuned_nv_embed")
print("Training Complete! Model Saved.")
```

Finally, we save the fine-tuned model and its tokenizer to the specified directory and print a confirmation message indicating that training is complete and the model is saved.

By the end of this tutorial, we successfully fine-tuned NV-Embed-v1 on the Amazon Polarity dataset using LoRA and PEFT, ensuring efficient memory usage and scalable adaptation. This tutorial highlights the power of parameter-efficient fine-tuning, enabling domain adaptation of large models without requiring massive computational resources. The approach extends to other transformer-based models, making it valuable for custom embeddings, sentiment analysis, and other NLP-driven applications. Whether you're working on product review classification, AI-driven recommendation systems, or domain-specific search engines, this method lets you fine-tune large-scale models efficiently on a budget.
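As an optional follow-up (a minimal sketch under assumptions, not part of the original post), the saved LoRA adapter in `./fine_tuned_nv_embed` can later be reattached to a freshly loaded base model with PEFT's `PeftModel.from_pretrained`, and optionally merged into the base weights for easier deployment:

```python
# Minimal reload sketch (assumed usage, not from the original tutorial):
# reattach the saved LoRA adapter to a freshly loaded NV-Embed-v1 base model.
import torch
from transformers import AutoModel, AutoTokenizer
from peft import PeftModel

base_model = AutoModel.from_pretrained(
    "nvidia/NV-Embed-v1",
    device_map="auto",
    torch_dtype=torch.float16,
    trust_remote_code=True,  # NV-Embed-v1 uses a custom model definition
)
model = PeftModel.from_pretrained(base_model, "./fine_tuned_nv_embed")
tokenizer = AutoTokenizer.from_pretrained("./fine_tuned_nv_embed")

# Optionally fold the LoRA weights into the base model so the result can be
# saved and served like an ordinary checkpoint, with no PEFT dependency.
merged_model = model.merge_and_unload()
merged_model.save_pretrained("./fine_tuned_nv_embed_merged")
```

Merging is only appropriate once you no longer need to train the adapter separately; keeping the adapter unmerged keeps checkpoints small and lets you swap adapters on top of the same base model.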