
A Code Implementation of Monocular Depth Estimation Using Intel MiDaS Open Source Model on Google Colab with PyTorch and OpenCV
www.marktechpost.com
Monocular depth estimation involves predicting scene depth from a single RGB image, a fundamental task in computer vision with wide-ranging applications, including augmented reality, robotics, and 3D scene understanding. In this tutorial, we use Intel's MiDaS, a state-of-the-art model family for high-quality depth prediction from a single image, specifically its DPT_Large variant, which is built on the Dense Prediction Transformer (DPT) architecture. Using Google Colab as the compute platform, along with PyTorch, OpenCV, and Matplotlib, this tutorial lets you upload your own image and visualize its depth map.

```python
!pip install -q timm opencv-python matplotlib
```

First, we install the necessary Python libraries: timm for model support, opencv-python for image processing, and matplotlib for visualizing the depth maps.

```python
!git clone https://github.com/isl-org/MiDaS.git
%cd MiDaS
```

Then, we clone the official Intel MiDaS repository from GitHub and navigate into its directory to access the model code and transformation utilities.

```python
import torch
import cv2
import matplotlib.pyplot as plt
import numpy as np
from torchvision.transforms import Compose
from google.colab import files

# The midas package is importable because we are inside the cloned MiDaS directory
from midas.transforms import Resize, NormalizeImage, PrepareForNet

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
```

We import all the necessary libraries and MiDaS components required for loading the model, preprocessing images, handling uploads, and visualizing depth predictions.
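The `Resize` transform imported above is configured in the preprocessing step with a 384-pixel target, `keep_aspect_ratio=True`, and `ensure_multiple_of=32`, so the network rarely sees the image at its original size. The following is a simplified sketch of how that input size is chosen. The helper name `midas_input_size` is hypothetical (it is not part of MiDaS), it covers only the `"upper_bound"` and `"minimal"` resize methods, and `midas/transforms.py` in the cloned repository remains the authoritative implementation.

```python
import math


def midas_input_size(width, height, target=384, multiple_of=32, method="upper_bound"):
    """Hypothetical helper: estimate the (width, height) that MiDaS's Resize
    transform produces with keep_aspect_ratio=True. Simplified re-derivation;
    midas/transforms.py is the authoritative implementation."""
    scale_w = target / width
    scale_h = target / height

    if method == "upper_bound":
        # Take the smaller scale so that neither side exceeds the target
        scale = min(scale_w, scale_h)
    elif method == "minimal":
        # Take whichever scale changes the image the least
        scale = scale_w if abs(1 - scale_w) < abs(1 - scale_h) else scale_h
    else:
        raise ValueError(f"unsupported resize method: {method}")

    def snap(x):
        # Round to the nearest multiple (Python's round() rounds halves to even,
        # like np.round); for upper_bound, round down if that would overshoot
        y = round(x / multiple_of) * multiple_of
        if method == "upper_bound" and y > target:
            y = math.floor(x / multiple_of) * multiple_of
        return int(y)

    return snap(scale * width), snap(scale * height)


# A landscape 640x480 photo under both resize methods
print(midas_input_size(640, 480))                    # (384, 288)
print(midas_input_size(640, 480, method="minimal"))  # (512, 384)
```

For a 640 x 480 photo, `"upper_bound"` yields a 384 x 288 network input, while `"minimal"` yields 512 x 384. Keeping both sides multiples of 32 presumably ensures that the network's downsampled feature maps have integer sizes at every scale. After inference, the prediction is interpolated back to the original resolution.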
The `device` line at the end of the import cell selects the GPU (CUDA) when one is available and otherwise falls back to the CPU, so the notebook runs on either Colab runtime.

```python
# Download the pretrained DPT_Large MiDaS model via torch.hub
# (the repository is cached locally after the first call)
model = torch.hub.load("intel-isl/MiDaS", "DPT_Large", pretrained=True)
model = model.to(device)
model.eval()
```

Here, we download the pretrained MiDaS DPT_Large model from Intel's torch.hub repository, move it to the selected device (CPU or GPU), and set it to evaluation mode for inference.

```python
transform = Compose([
    # Resize so that neither side exceeds 384 px, keeping the aspect ratio
    # and making both sides multiples of 32
    Resize(
        384,
        384,
        resize_target=None,
        keep_aspect_ratio=True,
        ensure_multiple_of=32,
        resize_method="upper_bound",
    ),
    # DPT models expect RGB values in [0, 1], normalized with mean = std = 0.5
    NormalizeImage(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
    # Convert the HWC image to a contiguous CHW float32 array
    PrepareForNet(),
])
```

We define MiDaS's image preprocessing pipeline, which resizes the input image, normalizes its pixel values, and formats it for model inference. `NormalizeImage` does not rescale pixel values on its own, so the image must be converted to floats in [0, 1] before it enters the pipeline. The mean and standard deviation of 0.5 match the normalization that the MiDaS repository uses for its DPT models.

```python
uploaded = files.upload()

# Use the first uploaded file
filename = next(iter(uploaded))
img = cv2.imread(filename)
if img is None:
    raise ValueError(f"Could not read {filename} as an image")

# OpenCV loads images in BGR order; convert to RGB
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
```

We allow the user to upload an image in Colab, read it using OpenCV, and convert it from BGR to RGB format for accurate color representation.

```python
# Scale to [0, 1] floats before the transform, as NormalizeImage expects
img_input = transform({"image": img.astype(np.float32) / 255.0})["image"]
input_tensor = torch.from_numpy(img_input).unsqueeze(0).to(device)  # (1, 3, H', W')

with torch.no_grad():
    prediction = model(input_tensor)  # (1, H', W')
    # Resize the prediction back to the original image resolution
    prediction = torch.nn.functional.interpolate(
        prediction.unsqueeze(1),
        size=img.shape[:2],
        mode="bicubic",
        align_corners=False,
    ).squeeze()

depth_map = prediction.cpu().numpy()
```

Now, we apply the preprocessing transform to the uploaded image, convert it to a tensor, perform depth prediction using the MiDaS model, resize the output to match the original image dimensions, and extract the final depth map as a NumPy array. MiDaS predicts relative inverse depth, so larger values correspond to points closer to the camera, and the values have no absolute metric scale.

```python
plt.figure(figsize=(10, 5))

plt.subplot(1, 2, 1)
plt.imshow(img)
plt.title("Original Image")
plt.axis("off")

plt.subplot(1, 2, 2)
plt.imshow(depth_map, cmap="inferno")
plt.title("Depth Map")
plt.axis("off")

plt.tight_layout()
plt.show()
```

Finally, we create a side-by-side visualization of the original image and its corresponding depth map using Matplotlib. The depth map is displayed with the inferno colormap for better contrast, so brighter regions correspond to nearer surfaces.

In conclusion, by completing this tutorial, we've deployed Intel's MiDaS model on Google Colab to perform monocular depth estimation from a single RGB image. Using PyTorch for model inference, OpenCV for image processing, and Matplotlib for visualization, we've built a compact pipeline that generates relative depth maps with minimal setup. This implementation is a solid foundation for further exploration, including video depth estimation, real-time applications, and integration into AR/VR systems.

Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience.
The platform boasts over 2 million monthly views, illustrating its popularity among audiences.