TOWARDSAI.NET
Transform Image Data into Insights with VisualInsight's AI Automation
Author(s): Yotam Braun

Originally published on Towards AI.

Extracting insights from images can often feel challenging. Whether you're a researcher, an analyst, or simply curious, efficiently analyzing and understanding images is crucial but not always straightforward. This is where VisualInsight comes in.

GitHub: yotambraun/VisualInsight (https://github.com/yotambraun/VisualInsight)

Challenges with Traditional Image Analysis Methods

- Manual Effort: Finding the right tools, writing custom scripts, and working with large datasets often involves significant manual work.
- Complexity: Navigating advanced algorithms, ML frameworks, or open-source projects can be overwhelming, especially for smaller teams.
- Storage and Security: Ensuring data is securely stored and easily retrievable adds another layer of complexity.
- Scaling: Handling larger datasets requires scalable infrastructure, which often involves high overhead.

VisualInsight addresses these challenges with a seamless and automated solution for image analysis.

Figure 2: Example of the user interface where you can upload images

As you can see, the UI simplifies the process: you just drag and drop your image, with no complicated scripts required.

Introducing VisualInsight

Core Idea

VisualInsight is a Streamlit-based web application that simplifies image analysis using Google Generative AI (Gemini).
It incorporates AWS S3 for secure storage of original images and results.

Figure 3: Analysis results displayed in the Streamlit application

By automating much of the heavy lifting, VisualInsight ensures you spend less time on configuration and more time on innovation.

Key Components

- Streamlit UI: A user-friendly interface for uploading, viewing, and analyzing images.
- LLM Service (Google Gemini): Advanced text-based insights derived from images.
- AWS S3 Storage: Secure storage for files and AI-generated analyses.
- Docker & Terraform: Infrastructure for quick deployments and reproducibility.
- CI/CD via GitHub Actions: Automated builds, tests, and deployments for reliability.

How VisualInsight Works

1. Upload an Image
Drag and drop a JPG or PNG file onto the application.

2. AI Analysis with Google Gemini
The uploaded image is then passed to the LLMService class, which uses Google's Generative AI (Gemini) to generate descriptive insights about the image content.

Figure 4: Further analysis details being displayed to the user

3. Storage in AWS S3
Once analyzed, the application uploads both the original image and any analysis results to an S3 bucket for safekeeping.

4. Display Results
Insights are displayed in the application interface for immediate feedback.

Figure 5: Another view of the analysis interface

Code Highlights

Below are some of the core services that power VisualInsight.

1. LLM Service (app/services/llm_service.py)

Handles the interaction with Google Gemini for image analysis.

    import google.generativeai as genai
    import os
    from datetime import datetime
    from PIL import Image
    from utils.logger import setup_logger

    logger = setup_logger()


    class LLMService:
        def __init__(self):
            # The API key is read from the environment (see the .env file below)
            genai.configure(api_key=os.getenv('GOOGLE_API_KEY'))
            self.model = genai.GenerativeModel('gemini-1.5-flash-002')
            self.prompt = """
            Analyze this Image and provide:
            1. Image type
            2. Key information
            3. Important details
            4. Notable observations
            """

        def analyze_document(self, image: Image.Image) -> dict:
            try:
                logger.info("Sending request to LLM")
                # Generate content directly with the PIL image
                response = self.model.generate_content([
                    self.prompt,
                    image
                ])
                return {
                    "analysis": response.text,
                    "timestamp": datetime.now().isoformat()
                }
            except Exception as e:
                logger.error(f"LLM analysis failed: {str(e)}")
                # Chain the original exception so the traceback is preserved
                raise Exception(f"Failed to analyze document: {str(e)}") from e

What's Happening Here?

- The service configures Google Generative AI (Gemini) with an API key read from GOOGLE_API_KEY.
- A default prompt outlines the kind of analysis we want.
- The analyze_document method sends the image to Gemini and returns its text-based analysis.

2. S3 Service (app/services/s3_service.py)

Uploads files to AWS S3 with timestamped keys and generates presigned URLs for private access.

    import boto3
    import os
    from datetime import datetime
    from utils.logger import setup_logger

    logger = setup_logger()


    class S3Service:
        def __init__(self):
            self.s3_client = boto3.client(
                's3',
                aws_access_key_id=os.getenv('AWS_ACCESS_KEY_ID'),
                aws_secret_access_key=os.getenv('AWS_SECRET_ACCESS_KEY'),
                region_name=os.getenv('AWS_REGION', 'us-east-1')
            )
            self.bucket_name = os.getenv('S3_BUCKET_NAME')

        def upload_file(self, file):
            """Upload file to S3 and return the URL"""
            try:
                # Generate unique filename
                timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
                file_key = f"uploads/{timestamp}_{file.name}"

                # Upload to S3
                self.s3_client.upload_fileobj(
                    file,
                    self.bucket_name,
                    file_key
                )

                # Generate presigned URL that expires in 1 hour
                url = self.s3_client.generate_presigned_url(
                    'get_object',
                    Params={
                        'Bucket': self.bucket_name,
                        'Key': file_key
                    },
                    ExpiresIn=3600
                )

                logger.info(f"File uploaded successfully: {url}")
                return url
            except Exception as e:
                logger.error(f"S3 upload failed: {str(e)}")
                raise Exception(f"Failed to upload file to S3: {str(e)}") from e

Figure 6: The AWS S3 bucket that stores uploaded images and analysis results

Core Features:

- Uses boto3 to interact with AWS S3.
- Generates a time-stamped key for each file.
- Creates a presigned URL for private file access without requiring you to open up the entire bucket.

3. The Streamlit Application (app/main.py)

Provides the user interface for file uploads, analysis initiation, and displaying results.

    import streamlit as st
    import os
    from dotenv import load_dotenv
    from services.s3_service import S3Service
    from services.llm_service import LLMService
    from utils.logger import setup_logger
    from PIL import Image

    # Load environment variables
    load_dotenv()

    # Setup logging
    logger = setup_logger()

    # Initialize services
    s3_service = S3Service()
    llm_service = LLMService()


    def main():
        st.title("Document Analyzer")

        uploaded_file = st.file_uploader("Upload a document", type=['png', 'jpg', 'jpeg'])

        if uploaded_file:
            # Display image
            image = Image.open(uploaded_file)
            st.image(image, caption='Uploaded Document', use_column_width=True)

            if st.button('Analyze Document'):
                with st.spinner('Processing...'):
                    try:
                        # Analyze with LLM directly
                        logger.info("Starting document analysis")
                        analysis = llm_service.analyze_document(image)

                        # Upload to S3 for storage
                        logger.info(f"Uploading file: {uploaded_file.name}")
                        # PIL has already read from this file object, so rewind
                        # it; otherwise the upload starts at the current offset
                        uploaded_file.seek(0)
                        s3_url = s3_service.upload_file(uploaded_file)

                        # Display results
                        st.success("Analysis Complete!")
                        st.json(analysis)
                    except Exception as e:
                        logger.error(f"Error processing document: {str(e)}")
                        st.error(f"Error: {str(e)}")


    if __name__ == "__main__":
        main()

- Streamlit handles the UI: file upload, display, button triggers.
- LLMService and S3Service are orchestrated together to handle the AI query and file upload.
- Real-time logs inform you of the status and highlight any issues.

Running VisualInsight Locally

1. Clone the Repository

    git clone https://github.com/yotambraun/VisualInsight.git
    cd VisualInsight

2. Environment Setup

Create a .env file at the project root:

    AWS_ACCESS_KEY_ID=YOUR_AWS_KEY
    AWS_SECRET_ACCESS_KEY=YOUR_AWS_SECRET
    AWS_REGION=us-east-1
    S3_BUCKET_NAME=YOUR_BUCKET_NAME
    GOOGLE_API_KEY=YOUR_GOOGLE_GENAI_KEY

3. Install Dependencies

    pip install -r requirements.txt
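Before launching the app, it can help to confirm that the variables from the .env file are actually set. The following is a minimal sketch and not part of the VisualInsight repository: the helper name `missing_env_vars` and the `REQUIRED_VARS` tuple are assumptions for illustration, built from the variable names in the .env example. It inspects the process environment only, so the values must already be exported or loaded (for example with `load_dotenv()`, as app/main.py does).

```python
import os
from typing import List, Mapping, Sequence

# Names taken from the .env example; AWS_REGION is left out because
# S3Service falls back to 'us-east-1' when it is unset.
REQUIRED_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "S3_BUCKET_NAME",
    "GOOGLE_API_KEY",
)


def missing_env_vars(
    env: Mapping[str, str] = os.environ,
    required: Sequence[str] = REQUIRED_VARS,
) -> List[str]:
    """Return the required names that are unset or blank in env."""
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Passing the environment in as a mapping keeps the check easy to exercise with a plain dictionary, without touching real credentials.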
4. Run the App

    streamlit run app/main.py

Navigate to http://localhost:8501 in your browser to start using VisualInsight!

Containerization with Docker

Use Docker for consistent application performance across environments.

Figure 7: AWS ECS used for container orchestration

Dockerfile (excerpt):

    FROM python:3.9-slim

    WORKDIR /app

    # Install dependencies
    COPY requirements.txt .
    RUN pip install -r requirements.txt

    # Copy application code
    COPY app/ .

    EXPOSE 8501

    ENTRYPOINT ["streamlit", "run", "main.py", "--server.port=8501", "--server.address=0.0.0.0"]

Steps:

Build locally:

    docker build -t visualinsight:latest .

Run:

    docker run -p 8501:8501 visualinsight:latest

Visit http://localhost:8501 to use the app.

Infrastructure as Code with Terraform

Figure 8: AWS ECR, storing Docker images for the application

I use Terraform to create and manage the AWS resources needed to deploy the application: S3, ECR, ECS, and more.

Why Terraform?

Terraform allows you to define your cloud infrastructure as code. Rather than manually creating AWS resources via the console or CLI, you simply write a configuration file. This ensures that your infrastructure is consistent, version-controlled, and easily replicable across multiple environments.

Key Advantages of Using Terraform:

- Reproducibility: The same configurations can be deployed multiple times without drift.
- Collaboration: Teams can review Terraform files in Git, allowing for better code reviews and fewer mistakes.
- Scalability: Quick spin-up of additional resources if your usage grows.

1. Example Variables (infrastructure/terraform/variables.tf)

    variable "aws_region" {
      description = "AWS region"
      type        = string
      default     = "us-east-1"
    }

    variable "bucket_name" {
      description = "Name of the S3 bucket"
      type        = string
    }
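Because `bucket_name` has no default, a malformed value only surfaces once Terraform calls AWS. A small pre-check can catch common naming mistakes earlier. The sketch below is a hypothetical helper (`is_plausible_bucket_name` is not from the repository) and deliberately covers only a subset of the Amazon S3 general-purpose bucket naming rules: length, allowed characters, first and last character, and adjacent dots. It cannot tell whether a name is already taken, since bucket names are globally unique.

```python
import re

# Subset of the S3 general-purpose bucket naming rules: 3-63 characters,
# lowercase letters, digits, dots and hyphens, starting and ending with a
# letter or digit.
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def is_plausible_bucket_name(name: str) -> bool:
    """Return True if name passes the basic checks above (not the full rule set)."""
    if ".." in name:  # two adjacent dots are not allowed
        return False
    return _BUCKET_RE.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("my-visualinsight-bucket", "My_Bucket", "ab"):
        print(candidate, is_plausible_bucket_name(candidate))
```

A check like this could run in a wrapper script before the value is handed to Terraform; it is a convenience, not a substitute for the error AWS itself would return.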
2. Main Configuration (infrastructure/terraform/main.tf)

    terraform {
      required_providers {
        aws = {
          source  = "hashicorp/aws"
          version = "~> 4.0"
        }
      }
    }

    provider "aws" {
      region = var.aws_region
    }

    resource "aws_s3_bucket" "documents" {
      bucket = var.bucket_name
    }

    resource "aws_ecr_repository" "app" {
      name = "document-analyzer"
    }

    resource "aws_ecs_cluster" "main" {
      name = "document-analyzer-cluster"
    }

    # ... ECS Service, Security Groups, Task Definition, etc.

Why ECR and ECS?

- Amazon ECR (Elastic Container Registry): A private registry for storing your Docker images. Instead of relying on Docker Hub or other third parties, ECR keeps your images secure within your AWS account.
- Amazon ECS (Elastic Container Service): An AWS-native container orchestration service. It manages the scaling and deployment of your containerized application automatically. With Fargate (a serverless compute engine for containers), you don't have to worry about provisioning or managing EC2 instances; it abstracts away the heavy lifting.

In Short:

- ECR stores your built Docker images.
- ECS pulls those images from ECR and runs them as containers in a scalable manner.
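To make the ECR-to-ECS handoff concrete: ECS finds an image through its URI, which for a private ECR repository combines the account ID, region, repository name, and tag. The sketch below shows that composition. The function name `ecr_image_uri` is hypothetical, and the account ID `123456789012` is a placeholder rather than a value from the article; the repository name `document-analyzer` is the one declared in main.tf.

```python
def ecr_image_uri(account_id: str, region: str, repository: str, tag: str) -> str:
    """Build the URI of an image stored in a private ECR repository."""
    # Private ECR registries are addressed per account and per region
    registry = f"{account_id}.dkr.ecr.{region}.amazonaws.com"
    return f"{registry}/{repository}:{tag}"


if __name__ == "__main__":
    # Placeholder account ID; substitute your own
    print(ecr_image_uri("123456789012", "us-east-1", "document-analyzer", "latest"))
    # prints 123456789012.dkr.ecr.us-east-1.amazonaws.com/document-analyzer:latest
```

This is the same `registry/repository:tag` shape that a build step tags and pushes, and that an ECS task definition references when it starts containers.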
3. Deploying via Terraform

    cd infrastructure/terraform
    terraform init
    terraform plan -var="bucket_name=my-visualinsight-bucket"
    terraform apply -var="bucket_name=my-visualinsight-bucket"

Terraform will:

- Create an S3 bucket.
- Create an ECR repository.
- Set up an ECS cluster, tasks, services, IAM roles, and more.

Automated CI/CD with GitHub Actions

Automate the build, test, and deployment process to ensure consistent updates.

Your .github/workflows/deploy.yml takes care of:

- AWS Login: Authenticates with your AWS account using secrets.
- Docker Build & Push: Builds the Docker image and pushes it to Amazon ECR.
- ECS Update: Forces a new deployment on ECS to pull the latest image.

Figure 9: GitHub Actions

Figure 10: GitHub Actions pipeline for CI/CD

Sample Deploy Workflow:

    name: Deploy to AWS

    on:
      push:
        branches: [ main ]

    jobs:
      deploy:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v2

          - name: Configure AWS credentials
            uses: aws-actions/configure-aws-credentials@v1
            with:
              aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
              aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
              aws-region: us-east-1

          - name: Login to Amazon ECR
            id: login-ecr
            uses: aws-actions/amazon-ecr-login@v1

          - name: Build and push Docker image
            env:
              ECR_REGISTRY: ${{ steps.login-ecr.outputs.registry }}
              ECR_REPOSITORY: document-analyzer
              IMAGE_TAG: ${{ github.sha }}
            run: |
              docker build -t $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG .
              docker push $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG

          - name: Deploy to ECS
            run: |
              aws ecs update-service --cluster document-analyzer-cluster --service document-analyzer --force-new-deployment

Whenever you push to main, GitHub Actions will build and deploy your latest changes automatically.

Real-World Impact

- Time Efficiency: With AI-driven analysis, there's no need for manual labeling or advanced ML pipeline setup.
- Scalability: AWS S3 + ECS means you can handle ever-growing image datasets and traffic without re-architecting.
- Reliability: Docker ensures consistent environments, Terraform standardizes infrastructure, and GitHub Actions automates testing and deployment.
- User-Friendly: Streamlit's intuitive UI means non-developers can upload images and see insights in real time.

Conclusion

VisualInsight takes the guesswork out of image analysis. By combining Streamlit, Google Generative AI (Gemini), AWS S3, Terraform, and CI/CD, it delivers a robust, scalable solution that's easy to use and maintain. VisualInsight streamlines the entire workflow so you can focus on making discoveries, not wrestling with infrastructure.

Key Takeaways

- Automation reduces manual work and simplifies processes.
- Infrastructure as Code promotes collaboration and reproducibility.
- Docker ensures consistency across development and production environments.
- CI/CD enables fast and reliable updates.

Feel free to clone the GitHub repository and customize it for your own project needs. If you enjoyed this, consider clapping on Medium, sharing with others, or following me for more deep dives into AI and cloud solutions!

Thanks for Reading!

If you enjoyed this post, please give it a clap.
Feel free to follow me on Medium.

References

- Google Gemini: Google's advanced AI model designed for multimodal data processing, including text, images, and audio.
- Streamlit: An open-source app framework for creating and sharing data applications using Python.
- AWS S3: Amazon Simple Storage Service (S3) is an object storage service offering scalability, data availability, security, and performance.
- Docker: A platform for developing, shipping, and running applications inside containers, ensuring consistency across multiple development and release cycles.
- Terraform: An open-source infrastructure as code software tool that enables you to safely and predictably create, change, and improve infrastructure.
- GitHub Actions: A CI/CD platform that allows you to automate your build, test, and deployment pipeline.
- AWS ECR (Elastic Container Registry): A fully managed container registry that makes it easy for developers to store, manage, and deploy Docker container images.
- AWS ECS (Elastic Container Service): A highly scalable, high-performance container orchestration service that supports Docker containers and allows you to easily run and scale containerized applications on AWS.

These references provide detailed information about each component used in the VisualInsight application.

Published via Towards AI