Interflexion: Scaling AI Predictions with Google Cloud MLOps
Industry: Software as a service (SaaS)
About the Client
Interflexion required a solution to operationalize their custom TensorFlow machine learning models. They needed to move from model development to a production-ready testing environment capable of handling incoming prediction requests, leveraging cloud-native tools for efficiency and scalability.
Partner role in project:
Evonence enabled Interflexion to develop and deploy a TensorFlow-based machine learning model efficiently using Google Cloud. They facilitated model training on Vertex AI, data management with Google Cloud Storage, and scalable deployment via Cloud Run.
The challenge:
Interflexion needed to develop a TensorFlow model, train it on a specific dataset, and then deploy this model for real-time testing and predictions. Key challenges included managing the training process, handling the dataset efficiently, and creating a scalable, accessible endpoint for the trained model to perform inference.
The solution:
Evonence implemented a solution leveraging Google Cloud. The TensorFlow model was trained using Vertex AI Training, utilizing datasets stored in a Google Cloud Storage bucket. The trained model (e.g., nn.h5) was then containerized and deployed as a web service using Cloud Run, providing a scalable API endpoint for image-based predictions.
Leveraging Google’s product suite:
Vertex AI, Google Cloud Storage, Cloud Run, TensorFlow.
Cloud scale and speed:
The solution is built for cloud scale and speed by leveraging Google Cloud's managed services. For model training, Vertex AI Training is used, which can accelerate the process through distributed training across multiple nodes and by utilizing hardware accelerators like NVIDIA GPUs (T4, V100). This is crucial for handling large datasets and complex models efficiently. For deployment, the application is served on Google Cloud Run, a serverless platform that automatically scales to handle fluctuating request volumes, ensuring both low-latency predictions and cost-effectiveness. The model's footprint and inference speed are key criteria in the design, ensuring the application remains efficient when deployed on Cloud Run.
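As an illustration of the distributed training setup described above, the sketch below builds a worker pool layout of the kind Vertex AI custom training jobs accept: one chief replica plus a pool of workers, each with an attached GPU. The container image URI, the machine type, and the replica counts are hypothetical placeholders, not values from the Interflexion project.

```python
# Hypothetical placeholders: the image URI and machine type below are
# illustrative only and are not taken from the Interflexion project.
TRAINER_IMAGE = "us-docker.pkg.dev/example-project/trainers/tf-cnn:latest"
MACHINE_TYPE = "n1-standard-8"


def build_worker_pool_specs(worker_count, accelerator_type="NVIDIA_TESLA_T4",
                            accelerators_per_node=1):
    """Return worker pool specs: one chief replica plus optional workers."""

    def pool(replicas):
        return {
            "machine_spec": {
                "machine_type": MACHINE_TYPE,
                "accelerator_type": accelerator_type,
                "accelerator_count": accelerators_per_node,
            },
            "replica_count": replicas,
            "container_spec": {"image_uri": TRAINER_IMAGE},
        }

    specs = [pool(1)]  # the first pool holds the single chief replica
    if worker_count > 0:
        specs.append(pool(worker_count))  # the second pool holds the workers
    return specs


def total_accelerators(specs):
    """Count the GPUs requested across all pools."""
    return sum(
        spec["replica_count"] * spec["machine_spec"]["accelerator_count"]
        for spec in specs
    )
```

Calling `build_worker_pool_specs(3)` describes four nodes with one T4 each; passing `accelerator_type="NVIDIA_TESLA_V100"` would request V100 GPUs instead. A list like this would typically be handed to the Vertex AI SDK or API when the training job is created.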
Google AI-enhanced predictions:
The system delivers Google AI-enhanced predictions by using a custom-trained TensorFlow model deployed on Google Cloud infrastructure. The core of the solution is a Convolutional Neural Network (CNN), a deep learning architecture well suited to image classification tasks such as recognizing handwritten digits. The model is trained on Vertex AI, using tools such as Vertex AI Vizier for hyperparameter tuning to improve accuracy. When a user sends an image to the deployed API, the application preprocesses it by converting it to grayscale, resizing it to a uniform 28x28 pixels, and normalizing the pixel values. The resulting tensor is then fed into the TensorFlow model, and the API returns the predicted digit as a JSON response.
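The three preprocessing steps can be sketched without any dependencies, as below. This is an illustrative stand-in rather than the project's code: a production service would more likely use an imaging library, and the luminance weights, nearest-neighbor resizing, and scaling to the range 0 to 1 are assumptions, since the case study does not specify them.

```python
def to_grayscale(rgb_rows):
    """Convert rows of (r, g, b) pixels into rows of luminance values (0-255)."""
    # Assumed weights: the common ITU-R BT.601 luma coefficients.
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_rows]


def resize_nearest(gray_rows, size=28):
    """Resize a grayscale image to size x size with nearest-neighbor sampling."""
    src_h, src_w = len(gray_rows), len(gray_rows[0])
    return [[gray_rows[y * src_h // size][x * src_w // size]
             for x in range(size)]
            for y in range(size)]


def normalize(gray_rows):
    """Scale pixel values from 0-255 down to 0.0-1.0 (assumed range)."""
    return [[value / 255.0 for value in row] for row in gray_rows]


def preprocess(rgb_rows, size=28):
    """Apply grayscale conversion, resizing, and normalization in order."""
    return normalize(resize_nearest(to_grayscale(rgb_rows), size))
```

The result is a 28x28 grid of floats. Before inference it would usually be reshaped to whatever input shape the trained model expects, for example a batch of one single-channel image.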
The results:
This approach provided a streamlined and robust MLOps pipeline for Interflexion. Vertex AI simplified the training process, Google Cloud Storage offered secure and accessible data storage, and Cloud Run enabled a cost-effective, scalable deployment for the TensorFlow model. Interflexion successfully deployed their model for testing and inference.
High availability:
High availability is achieved by deploying the application on Google Cloud Run, a fully managed, serverless platform. Cloud Run automatically manages the underlying infrastructure, ensuring the application is always accessible via a unique HTTPS endpoint. Because it is serverless, it inherently handles scaling from zero to many instances based on traffic, providing robust performance without the need to provision or manage servers. The entire application, including the Flask web server and the TensorFlow model, is containerized using Docker, making the deployment consistent and reliable. This architecture ensures that the prediction service remains operational and responsive to incoming API requests.
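One detail of this setup is that a Cloud Run container is expected to listen on the port given in the `PORT` environment variable. The sketch below shows that pattern with Python's standard-library WSGI server standing in for the Flask application mentioned above; the status route and its response body are hypothetical and serve only to make the example testable.

```python
import json
import os
from wsgiref.simple_server import make_server


def listen_port(env=None):
    """Read the port from PORT, falling back to 8080 when it is unset."""
    env = os.environ if env is None else env
    return int(env.get("PORT", "8080"))


def app(environ, start_response):
    """Tiny WSGI app: answers GET / with a status document (hypothetical route)."""
    is_root = environ.get("PATH_INFO", "/") == "/"
    if environ.get("REQUEST_METHOD") == "GET" and is_root:
        status, payload = "200 OK", {"status": "ok"}
    else:
        status, payload = "404 Not Found", {"error": "not found"}
    body = json.dumps(payload).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(body)))])
    return [body]


def main():
    """Serve the app on all interfaces so the container accepts traffic."""
    with make_server("0.0.0.0", listen_port(), app) as server:
        server.serve_forever()
```

Inside the container, the entry point would call `main()`. A Flask application follows the same idea by passing the port read from `PORT` to whichever server runs it.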
AI-based predictions:
The solution provides AI-based predictions by automating the classification of handwritten digits from images. The machine learning model, a Convolutional Neural Network (CNN), takes an image file as input through a POST request to a secure API endpoint. The application processes the image and uses the trained model to identify which digit (0-9) is present. The system then returns a simple, machine-readable JSON response containing the prediction. For example, upon receiving an image of a handwritten "7," the API would return {"prediction": 7}. This automates numerical data extraction and reduces errors associated with manual data entry.
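The last step, turning the model's output into the JSON response, might look like the sketch below. It assumes the model emits one score per digit class (ten in total) and that the predicted digit is the index of the highest score; the function names are hypothetical.

```python
import json


def predict_digit(scores):
    """Return the digit whose class score is highest (argmax over 10 classes)."""
    if len(scores) != 10:
        raise ValueError("expected one score per digit, 0 through 9")
    return max(range(10), key=lambda digit: scores[digit])


def build_response(scores):
    """Serialize the predicted digit in the response shape shown above."""
    return json.dumps({"prediction": predict_digit(scores)})


# Example: scores where class 7 clearly dominates.
example_scores = [0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.91, 0.01, 0.01]
print(build_response(example_scores))  # prints {"prediction": 7}
```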