TOWARDSDATASCIENCE.COM
# Modern GUI Applications for Computer Vision in Python
## Introduction

I'm a huge fan of interactive visualizations. As a computer vision engineer, I deal with image processing tasks almost daily, and more often than not I'm iterating on a problem where I need visual feedback to make decisions. Think of a very simple image processing pipeline with a single step that transforms an image using a few parameters: How do you know which parameters to adjust? Does the pipeline even work as expected? Without visualizing your output, you might miss key insights and make suboptimal choices.

Sometimes simply showing the output image and/or some calculated metrics is enough to iterate on the parameters. But I've found myself in many situations where a tool would be immensely helpful for iterating quickly and interactively on my pipeline. So in this article I will show you how to work with the simple built-in interactive elements of OpenCV, as well as how to build more modern user interfaces for computer vision projects using customtkinter.

## Prerequisites

If you want to follow along, I recommend setting up your local environment with uv and installing the following packages:

```
uv add numpy opencv-python pillow customtkinter
```

## Goal

Before we dive into the code, let's quickly outline what we want to build. The application should use the webcam feed and allow the user to select different types of filters that will be applied to the stream. The processed image should be shown in real-time in the window. A rough sketch of a potential UI would look as follows:

## OpenCV – GUI

Let's start with a simple loop that fetches frames from your webcam and displays them in an OpenCV window.

```python
import cv2

cap = cv2.VideoCapture(0)

while True:
    ret, frame = cap.read()
    if not ret:
        break

    cv2.imshow("Video Feed", frame)

    key = cv2.waitKey(1) & 0xFF
    if key == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
```

### Keyboard Input

The simplest way to add interactivity here is through keyboard input.
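A brief aside on the `key = cv2.waitKey(1) & 0xFF` line above: on some platforms waitKey returns the key code with extra modifier bits set in the higher bytes, so masking with 0xFF keeps only the lowest byte before comparing against ord() values. A quick sanity check — the raw value below is a hypothetical example of such a return value, not something every backend produces:

```python
# Hypothetical raw waitKey return value where a backend sets high bits:
# 0x100071 -> the low byte 0x71 is the ASCII code of 'q' (113).
raw_key = 0x100071

key = raw_key & 0xFF  # keep only the lowest byte
print(key == ord('q'))  # True
```

Without the mask, the comparison against ord('q') would fail whenever those high bits are present.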
For example, we can cycle through different filters with the number keys.

```python
...
filter_type = "normal"

while True:
    ...
    if filter_type == "grayscale":
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    elif filter_type == "normal":
        pass
    ...
    if key == ord('1'):
        filter_type = "normal"
    if key == ord('2'):
        filter_type = "grayscale"
    ...
```

Now you can switch between the normal image and the grayscale version by pressing the number keys 1 and 2. Let's also quickly add a caption to the image so we can actually see the name of the filter we're applying.

We need to be careful here: if you take a look at the shape of the frame after the filter, you will notice that the dimensionality of the frame array has changed. Remember that OpenCV image arrays are ordered HWC (height, width, color) with the color channels in BGR order (blue, green, red), so the 640×480 image from my webcam has shape (480, 640, 3).

```python
print(filter_type, frame.shape)
# normal (480, 640, 3)
# grayscale (480, 640)
```

Because the grayscale operation outputs a single-channel image, the color dimension is dropped. If we now want to draw on top of this image, we either need to specify a single-channel color for the grayscale image, or we convert the image back to the original BGR format. The second option is a bit cleaner because it unifies the annotation of the image.

```python
if filter_type == "grayscale":
    frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
elif filter_type == "normal":
    pass

if len(frame.shape) == 2:
    # Convert grayscale to BGR
    frame = cv2.cvtColor(frame, cv2.COLOR_GRAY2BGR)
```

### Caption

I want to add a black border at the bottom of the image, on top of which the name of the filter will be shown. We can use the copyMakeBorder function to pad the image with a border color at the bottom. Then we can add the text on top of this border.
```python
# Add a black border at the bottom of the frame
border_height = 50
border_color = (0, 0, 0)
frame = cv2.copyMakeBorder(frame, 0, border_height, 0, 0, cv2.BORDER_CONSTANT, value=border_color)

# Show the filter name
cv2.putText(
    frame,
    filter_type,
    (frame.shape[1] // 2 - 50, frame.shape[0] - border_height // 2 + 10),
    cv2.FONT_HERSHEY_SIMPLEX,
    1,
    (255, 255, 255),
    2,
    cv2.LINE_AA,
)
```

This is how the output should look: you can switch between the normal and grayscale mode, and the frames will be captioned accordingly.

### Sliders

Instead of using the keyboard as the input method, OpenCV also offers a basic trackbar slider UI element. The trackbar needs to be initialized at the beginning of the script. It has to reference the same window that we will later show our images in, so I will create a variable for the name of the window. Using this name, we can create the trackbar and let it act as a selector for the index into the list of filters.

```python
filter_types = ["normal", "grayscale"]

win_name = "Webcam Stream"
cv2.namedWindow(win_name)

tb_filter = "Filter"
# def createTrackbar(trackbarName: str, windowName: str, value: int, count: int, onChange: _typing.Callable[[int], None]) -> None: ...
cv2.createTrackbar(
    tb_filter,
    win_name,
    0,
    len(filter_types) - 1,
    lambda _: None,
)
```

Notice how we use an empty lambda for the onChange callback; we will fetch the value manually in the loop. Everything else stays the same.

```python
while True:
    ...
    # Get the selected filter type
    filter_id = cv2.getTrackbarPos(tb_filter, win_name)
    filter_type = filter_types[filter_id]
    ...
```

And voilà, we have a trackbar to select our filter. We can now easily add more filters by extending the list and implementing each processing step.

```python
filter_types = [
    "normal",
    "grayscale",
    "blur",
    "threshold",
    "canny",
    "sobel",
    "laplacian",
]
...

if filter_type == "grayscale":
    frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
elif filter_type == "blur":
    frame = cv2.GaussianBlur(frame, ksize=(15, 15), sigmaX=0)
elif filter_type == "threshold":
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    _, frame = cv2.threshold(gray, thresh=127, maxval=255, type=cv2.THRESH_BINARY)
elif filter_type == "canny":
    frame = cv2.Canny(frame, threshold1=100, threshold2=200)
elif filter_type == "sobel":
    frame = cv2.Sobel(frame, ddepth=cv2.CV_64F, dx=1, dy=0, ksize=5)
elif filter_type == "laplacian":
    frame = cv2.Laplacian(frame, ddepth=cv2.CV_64F)
elif filter_type == "normal":
    pass

if frame.dtype != np.uint8:
    # Sobel and Laplacian output signed floats (CV_64F), so rescale the
    # value range to 0-255 and convert back to uint8 before displaying
    cv2.normalize(frame, frame, 0, 255, cv2.NORM_MINMAX)
    frame = frame.astype(np.uint8)
```

## Modern GUI with CustomTkinter

Now I don't know about you, but the current user interface does not look very modern to me. Don't get me wrong, there is some beauty in the style of the interface, but I prefer cleaner, more modern designs. Plus, we're already at the limit of what OpenCV offers out of the box in terms of UI elements: no buttons, text fields, dropdowns, checkboxes, or radio buttons, and no custom layouts. So let's see how we can transform the look and user experience of this basic application into a fresh and clean one.

To get started, we first need to create a class for our app. We create two frames: the first contains our filter selection on the left side, and the second wraps the image display. For now, let's start with a simple placeholder text. Unfortunately customtkinter does not provide an out-of-the-box OpenCV image component, so we will need to build our own in the next few steps. But let's first finish the basic UI layout.
```python
import customtkinter

class App(customtkinter.CTk):
    def __init__(self) -> None:
        super().__init__()

        self.title("Webcam Stream")
        self.geometry("800x600")

        self.filter_var = customtkinter.IntVar(value=0)

        # Frame for filters
        self.filters_frame = customtkinter.CTkFrame(self)
        self.filters_frame.pack(side="left", fill="both", expand=False, padx=10, pady=10)

        # Frame for image display
        self.image_frame = customtkinter.CTkFrame(self)
        self.image_frame.pack(side="right", fill="both", expand=True, padx=10, pady=10)

        self.image_display = customtkinter.CTkLabel(self.image_frame, text="Loading...")
        self.image_display.pack(fill="both", expand=True, padx=10, pady=10)

app = App()
app.mainloop()
```

### Filter Radio Buttons

Now that the skeleton is built, we can start filling in our components. For the left side, I will use the same list of filter_types to populate a group of radio buttons for selecting the filter.

```python
# Create radio buttons for each filter type
self.filter_var = customtkinter.IntVar(value=0)
for filter_id, filter_type in enumerate(filter_types):
    rb_filter = customtkinter.CTkRadioButton(
        self.filters_frame,
        text=filter_type.capitalize(),
        variable=self.filter_var,
        value=filter_id,
    )
    rb_filter.pack(padx=10, pady=10)

    if filter_id == 0:
        rb_filter.select()
```

### Image Display Component

Now we can get started on the interesting part: how do we get our OpenCV frames to show up in the image component? Because there's no built-in component, let's create our own based on the CTkLabel. This allows us to display a loading text while the webcam stream is starting up.

```python
...

class CTkImageDisplay(customtkinter.CTkLabel):
    """
    A reusable ctk widget to display opencv images.
    """

    def __init__(
        self,
        master: Any,
    ) -> None:
        self._textvariable = customtkinter.StringVar(master, "Loading...")
        super().__init__(
            master,
            textvariable=self._textvariable,
            image=None,
        )

...

class App(customtkinter.CTk):
    def __init__(self) -> None:
        ...
        self.image_display = CTkImageDisplay(self.image_frame)
        self.image_display.pack(fill="both", expand=True, padx=10, pady=10)
```

So far nothing has changed; we simply swapped out the existing label with our custom class implementation. In our CTkImageDisplay class we can now define a function to show an image in the component — let's call it set_frame.

```python
import cv2
import numpy.typing as npt
from PIL import Image

class CTkImageDisplay(customtkinter.CTkLabel):
    ...

    def set_frame(self, frame: npt.NDArray) -> None:
        """
        Set the frame to be displayed in the widget.

        Args:
            frame: The new frame to display, in opencv format (BGR).
        """
        target_width, target_height = frame.shape[1], frame.shape[0]

        # Convert the frame to PIL Image format
        frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        frame_pil = Image.fromarray(frame_rgb, "RGB")

        ctk_image = customtkinter.CTkImage(
            light_image=frame_pil,
            dark_image=frame_pil,
            size=(target_width, target_height),
        )
        self.configure(image=ctk_image, text="")
        self._textvariable.set("")
```

Let's digest this. First we need to know how big our image component will be; we can extract that information from the shape property of our image array. To display the image in tkinter, we need a Pillow Image object — we cannot use the OpenCV array directly. To convert an OpenCV array to Pillow, we first convert the color space from BGR to RGB, and then we can use the Image.fromarray function to create the Pillow Image object. Next we create a CTkImage, where we use the same image no matter the theme and set the size according to our frame. Finally, we use the configure method to set the image in our label. At the end, we also reset the text variable to remove the "Loading..." text, even though it would theoretically be hidden behind the image.

To quickly test this, we can set the first image of our webcam in the constructor. (We will see in a second why this is not such a good idea.)

```python
class App(customtkinter.CTk):
    def __init__(self) -> None:
        ...
        cap = cv2.VideoCapture(0)
        _, frame0 = cap.read()
        self.image_display.set_frame(frame0)
```

If you run this, you will notice that the window takes a bit longer to pop up, but after a short delay you should see a static image from your webcam.

NOTE: If you don't have a webcam ready, you can also use a local video file by passing the file path to the cv2.VideoCapture constructor call.

Now this is not very exciting, since the frame doesn't update yet. So let's see what happens if we try to do this naively.

```python
class App(customtkinter.CTk):
    def __init__(self) -> None:
        ...
        cap = cv2.VideoCapture(0)

        while True:
            ret, frame = cap.read()
            if not ret:
                break

            self.image_display.set_frame(frame)
```

This is almost the same as before, except now we run the frame loop as we did in the previous chapter with the OpenCV GUI. If you run this, you will see… exactly nothing. The window never shows up, because we're creating an infinite loop in the constructor of the app! This is also the reason why the window only appeared after a delay in the previous example: opening the webcam stream is a blocking operation, during which the event loop for the window cannot run, so the window doesn't show up yet.

Let's fix this with a slightly better implementation that lets the GUI event loop run while we update the frame periodically. We can use the after method of tkinter to schedule a function call while yielding to the event loop during the wait time.

```python
        ...
        self.cap = cv2.VideoCapture(0)
        self.after(10, self.update_frame)

    def update_frame(self) -> None:
        """
        Update the displayed frame.
        """
        ret, frame = self.cap.read()
        if not ret:
            return

        self.image_display.set_frame(frame)

        self.after(10, self.update_frame)
```

We still set up the webcam stream in the constructor, so we haven't solved that problem yet. But at least we now see a continuous stream of frames in our image component.

### Applying Filters

Now that the frame loop is running,
we can re-implement our filters from the beginning and apply them to our webcam stream. In the update_frame function, we check the current filter variable and apply the corresponding filter function.

```python
def update_frame(self) -> None:
    ...

    # Get the selected filter type
    filter_id = self.filter_var.get()
    filter_type = filter_types[filter_id]

    if filter_type == "grayscale":
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    elif filter_type == "blur":
        frame = cv2.GaussianBlur(frame, ksize=(15, 15), sigmaX=0)
    elif filter_type == "threshold":
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        _, frame = cv2.threshold(gray, thresh=127, maxval=255, type=cv2.THRESH_BINARY)
    elif filter_type == "canny":
        frame = cv2.Canny(frame, threshold1=100, threshold2=200)
    elif filter_type == "sobel":
        frame = cv2.Sobel(frame, ddepth=cv2.CV_64F, dx=1, dy=0, ksize=5)
    elif filter_type == "laplacian":
        frame = cv2.Laplacian(frame, ddepth=cv2.CV_64F)
    elif filter_type == "normal":
        pass

    if frame.dtype != np.uint8:
        # Scale the frame to uint8 if necessary
        cv2.normalize(frame, frame, 0, 255, cv2.NORM_MINMAX)
        frame = frame.astype(np.uint8)

    if len(frame.shape) == 2:
        # Convert grayscale to BGR
        frame = cv2.cvtColor(frame, cv2.COLOR_GRAY2BGR)

    self.image_display.set_frame(frame)

    self.after(10, self.update_frame)
```

And now we're back to the full functionality of the application: you can select any filter on the left side and it will be applied in real-time to the webcam feed!

## Multithreading and Synchronization

Although the application runs as is, there are some problems with the way we run our frame loop. Currently everything runs in a single thread: the main GUI thread. This is why we didn't immediately see the window pop up in the beginning — the webcam initialization blocks the main thread. Now imagine we did some heavier image processing, maybe running the images through a neural network; you wouldn't want your user interface to be blocked while the network is running inference.
This would lead to a very unresponsive experience when clicking the UI elements! A better way to handle this is to separate the image processing from the user interface — it is almost always a good idea to separate your GUI logic from any kind of non-trivial processing. So in our case, we will run a separate thread that is responsible for the image loop: it reads the frames from the webcam stream and applies the filters.

NOTE: Python threads are not "real" threads in the sense that they cannot execute on different logical CPU cores in parallel. In Python multithreading the context switches between threads, but due to the GIL, the Global Interpreter Lock, a single Python process can only execute one thread's bytecode at a time. If you want truly parallel processing, you need multiprocessing. Since our workload here is not CPU-bound but I/O-bound, multithreading suffices.

```python
class App(customtkinter.CTk):
    def __init__(self) -> None:
        ...
        self.webcam_thread = threading.Thread(target=self.run_webcam_loop, daemon=True)
        self.webcam_thread.start()

    def run_webcam_loop(self) -> None:
        """
        Run the webcam loop in a separate thread.
        """
        self.cap = cv2.VideoCapture(0)
        if not self.cap.isOpened():
            return

        while True:
            ret, frame = self.cap.read()
            if not ret:
                break

            # Filters
            ...

            self.image_display.set_frame(frame)
```

If you run this, you will see that our window now opens immediately, and we even see the loading text while the webcam stream is opening. However, as soon as the stream starts, the frames begin to flicker. Depending on many factors, you might experience different visual artifacts or errors at this stage.

(Warning: flashing image)

Why is this happening? The problem is that we're trying to update the new frame while the internal refresh loop of the user interface might simultaneously be reading the frame array to draw it on the screen.
Both threads are competing for the same frame array. It is generally not a good idea to update UI elements directly from a different thread; some frameworks even prevent this and raise exceptions. In Tkinter we can do it, but we get weird results. We need some type of synchronization between our threads.

That's where the queue comes into play. You're probably familiar with queues from the grocery store or theme parks, and the concept here is very similar: the first element that goes into the queue also leaves first (First In, First Out). In this case, we actually just want a queue with a single element: a single-slot queue. The queue implementation in Python's standard library is thread-safe, meaning we can put and get objects from different threads. Perfect for our use case: the processing thread puts image arrays into the queue, and the GUI thread tries to get an element but does not block if the queue is empty.

```python
class App(customtkinter.CTk):
    def __init__(self) -> None:
        ...
        self.queue = queue.Queue(maxsize=1)

        self.webcam_thread = threading.Thread(target=self.run_webcam_loop, daemon=True)
        self.webcam_thread.start()

        self.frame_loop_dt_ms = 16  # ~60 FPS
        self.after(self.frame_loop_dt_ms, self._update_frame)

    def _update_frame(self) -> None:
        """
        Update the frame in the image display widget.
        """
        try:
            frame = self.queue.get_nowait()
            self.image_display.set_frame(frame)
        except queue.Empty:
            pass

        self.after(self.frame_loop_dt_ms, self._update_frame)

    def run_webcam_loop(self) -> None:
        ...
        while True:
            ...
            self.queue.put(frame)
```

Notice how we moved the direct call to set_frame out of the webcam loop, which runs in its own thread, and into the _update_frame function running on the main thread, repeatedly scheduled at 16 ms intervals. Here it's important to use get_nowait in the main thread — if we used the blocking get function instead, we would block there.
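The non-blocking behavior is easy to verify in isolation, without any webcam — a minimal sketch where a string stands in for a frame array:

```python
import queue

frame_queue = queue.Queue(maxsize=1)

frame_queue.put("frame_1")             # fills the single slot
try:
    frame_queue.put_nowait("frame_2")  # slot already taken -> raises immediately
except queue.Full:
    print("queue full, frame dropped")

latest = frame_queue.get_nowait()      # returns "frame_1" immediately
try:
    frame_queue.get_nowait()           # queue is now empty -> raises immediately
except queue.Empty:
    print("queue empty, nothing to draw")
```

In the application, the worker thread plays the put side and the scheduled _update_frame call plays the get_nowait side.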
This call does not block; instead it raises a queue.Empty exception if there's no element to fetch, so we catch and ignore that exception. In the webcam loop, we can use the blocking put function, because it doesn't matter if run_webcam_loop blocks — nothing else needs to run in that thread. And now everything runs as expected: no more flashing frames!

## Conclusion

Combining a UI framework like Tkinter with OpenCV allows us to build modern-looking applications with an interactive graphical user interface. Because the UI runs in the main thread, we run the image processing in a separate thread and synchronize the data between the threads using a single-slot queue. You can find a cleaned-up version of this demo, with a more modular structure, in the repository below. Let me know if you build something interesting with this approach. Take care!

Check out the full source code in the GitHub repo: https://github.com/trflorian/ctk-opencv