MEDIUM.COM
Building Real-World AI: Build Real World AI Applications with Gemini and Imagen
Shaik Nomman

In the realm of Artificial Intelligence, the ability not only to generate realistic visuals but also to understand and describe them opens up exciting possibilities. In this post, we'll walk through building a practical application on Google Cloud's Vertex AI, where we'll use Imagen to generate a bouquet image and Gemini to analyze it and write fitting birthday wishes.

Step 1: Setting the Stage with Vertex AI

Before we dive into the code, ensure you have the google-cloud-aiplatform library installed. You can install it using pip:

    pip install google-cloud-aiplatform

Also, make sure you have your Google Cloud Project ID and the desired location (region) handy, as we'll need them to configure Vertex AI.

Step 2: Crafting the Bouquet with Imagen

Our first task is to generate an image of a bouquet from a text prompt. We'll use the imagen-3.0-generate-002 model available on Vertex AI for this.
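One optional variation before the generation script: the project ID and region from Step 1 could be read from environment variables instead of being hard-coded. The sketch below is a minimal illustration and not part of the original scripts; the helper name load_vertex_config and the variable names GOOGLE_CLOUD_PROJECT and GOOGLE_CLOUD_LOCATION are assumptions chosen for this example.

```python
import os


def load_vertex_config(env=None):
    """Return (project_id, location) read from environment variables.

    The variable names are an assumed convention for this sketch;
    the original post hard-codes both values instead.
    """
    env = os.environ if env is None else env
    project_id = env.get("GOOGLE_CLOUD_PROJECT", "").strip()
    location = env.get("GOOGLE_CLOUD_LOCATION", "").strip()

    # Collect the names of any variables that are unset or empty.
    missing = [
        name
        for name, value in (
            ("GOOGLE_CLOUD_PROJECT", project_id),
            ("GOOGLE_CLOUD_LOCATION", location),
        )
        if not value
    ]
    if missing:
        raise ValueError("Missing required settings: " + ", ".join(missing))
    return project_id, location
```

The returned pair could then be passed to vertexai.init(project=..., location=...), so a missing setting fails early with a readable message rather than partway through a model call.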
Here's the Python code for the image generation script:

    import vertexai
    from vertexai.preview.vision_models import ImageGenerationModel

    PROJECT_ID = 'your-gcp-project-id'  # Replace with your actual Project ID
    LOCATION = 'your-gcp-region'  # Replace with your actual region


    def generate_bouquet_image(project_id: str, location: str, output_file: str, prompt: str):
        """Generates an image of a bouquet based on the given prompt."""
        vertexai.init(project=project_id, location=location)
        model = ImageGenerationModel.from_pretrained("imagen-3.0-generate-002")
        images = model.generate_images(
            prompt=prompt,
            number_of_images=1,
            seed=1,
            add_watermark=False,
        )
        # Save the first (and only) generated image to disk.
        images[0].save(location=output_file)
        print(f"Bouquet image generated and saved to: {output_file}")
        return images


    if __name__ == "__main__":
        output_file_name = 'bouquet_image.jpeg'
        bouquet_prompt = 'Create an image containing a bouquet of 2 sunflowers and 3 roses'
        generate_bouquet_image(PROJECT_ID, LOCATION, output_file_name, bouquet_prompt)

Remember to replace 'your-gcp-project-id' and 'your-gcp-region' with your actual Google Cloud Project ID and region.

This script initializes Vertex AI, loads the Imagen model, and generates an image from the provided prompt. The generated image is saved as bouquet_image.jpeg in the current directory.

Step 3: Describing the Beauty with Gemini

Now that we have our bouquet image, let's use the Gemini model (gemini-2.0-flash-001) to analyze it and generate relevant birthday wishes.
Here's the code for the analysis script:

    import vertexai
    from vertexai.generative_models import GenerativeModel, Part, Image

    PROJECT_ID = 'your-gcp-project-id'  # Replace with your actual Project ID
    LOCATION = 'your-gcp-region'  # Replace with your actual region


    def analyze_bouquet_image(image_path: str):
        """Analyzes the image and generates birthday wishes based on it."""
        vertexai.init(project=PROJECT_ID, location=LOCATION)
        multimodal_model = GenerativeModel('gemini-2.0-flash-001')
        # The prompt combines a text instruction with the image itself.
        responses = multimodal_model.generate_content(
            [
                Part.from_text('Generate birthday wishes based on this image'),
                Part.from_image(Image.load_from_file(image_path)),
            ],
            stream=True,
        )
        print("Birthday wishes (streaming):")
        # Print each chunk as it arrives, without extra newlines.
        for response in responses:
            print(response.text, end="", flush=True)
        print()


    if __name__ == "__main__":
        image_file_path = "./bouquet_image.jpeg"
        analyze_bouquet_image(image_file_path)

Again, replace 'your-gcp-project-id' and 'your-gcp-region' with your actual Google Cloud Project ID and region.

This script initializes Vertex AI and loads the Gemini model. The analyze_bouquet_image function takes the path to the generated image (bouquet_image.jpeg). It then builds a multi-modal prompt from a text instruction and the image, which is loaded using Part.from_image(Image.load_from_file(image_path)).
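Because the analysis script depends on a file written by the earlier step, it may help to check the path before handing it to Image.load_from_file. The following is a minimal sketch using only the Python standard library; validate_image_path is a hypothetical helper introduced for illustration, not part of the Vertex AI SDK or the original script.

```python
import mimetypes
from pathlib import Path


def validate_image_path(image_path: str) -> str:
    """Return the guessed MIME type if image_path points to an image file.

    Hypothetical helper for illustration: the type is guessed from the
    file extension only, so the file contents are not inspected.
    """
    path = Path(image_path)
    if not path.is_file():
        raise FileNotFoundError(f"Image not found: {image_path}")

    mime_type, _ = mimetypes.guess_type(path.name)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Not a recognized image file: {image_path}")
    return mime_type
```

Calling validate_image_path(image_file_path) before analyze_bouquet_image would surface a missing bouquet_image.jpeg as a clear local error rather than a failure inside the model call.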
The stream=True parameter makes generate_content return the response in chunks, so we can print the birthday wishes as they are generated.

Running the Application

To see this in action:

1. Save the first code block as a Python file (e.g., generate_bouquet.py).
2. Save the second code block as another Python file (e.g., analyze_bouquet.py).
3. Replace the placeholder Project ID and region in both files.

First, run the image generation script:

    python3 generate_bouquet.py

This will create the bouquet_image.jpeg file.

Then, run the image analysis script:

    python3 analyze_bouquet.py

This will load the generated image and print the streaming birthday wishes to your console.

The Power of Combining Imagen and Gemini

This example demonstrates a simple yet powerful combination of Imagen's image generation capabilities with Gemini's multi-modal understanding. Imagine extending it to build interactive applications where users describe the visuals they want and receive both a realistic rendering and contextually relevant descriptions or follow-up actions.

Vertex AI makes these models readily accessible, paving the way for solutions in domains ranging from creative content generation to intelligent image analysis.

#VertexAI #Gemini #Imagen #GenerativeAI #MultiModalAI #GoogleCloud #AIApplications #RealWorldAI #Python