A Guide for Non-Data Scientists: Building a Predictive Model with Vertex AI AutoML

Introduction

The field of machine learning (ML) has traditionally been the domain of specialized data scientists; however, the paradigm is shifting. Google Cloud's Vertex AI platform is at the forefront of this transformation, offering powerful tools that democratize access to artificial intelligence. This guide provides a formal, step-by-step walkthrough for business analysts, developers, and other professionals who lack a data science background to build, train, and deploy a functional ML predictive model using Vertex AI AutoML.

The objective of this tutorial is to construct a classification model on tabular data, a common format for business data. Such models are fundamental in business analytics, capable of predicting binary outcomes such as customer churn, lead conversion, or transaction fraud. By leveraging the no-code interface of AutoML, you will navigate the end-to-end ML pipeline—from data ingestion to prediction—without writing a single line of code.

Prerequisites

Before commencing this tutorial, please ensure the following requirements are met:

  1. Google Cloud Platform (GCP) Account: You must have an active GCP account. New users may be eligible for a free trial.

  2. A GCP Project: A project must be created within your GCP account, and billing must be enabled for it.

  3. Vertex AI API Enabled: Within your designated project, the Vertex AI API must be enabled. If it is not, you will be prompted to enable it upon first navigating to the Vertex AI service.

  4. A Tabular Dataset: You will need a dataset in CSV (Comma-Separated Values) format. Tabular data is structured in rows and columns, similar to a spreadsheet. For this guide, we will use a hypothetical customer churn dataset. Your CSV file should contain several feature columns (e.g., customer tenure, monthly charges, services subscribed to) and one target column that the model will learn to predict (e.g., a column named 'Churn' with 'Yes' or 'No' values).
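To make the expected CSV shape concrete, the following sketch builds and sanity-checks a tiny, hypothetical churn dataset using only the Python standard library. The column names (`tenure_months`, `monthly_charges`, `num_services`, `Churn`) are illustrative placeholders, not a required schema:

```python
import csv
import io

# A minimal, hypothetical churn dataset in the shape AutoML expects:
# several feature columns plus one target column ("Churn").
rows = [
    ["tenure_months", "monthly_charges", "num_services", "Churn"],
    ["24", "65.50", "3", "No"],
    ["2", "89.90", "1", "Yes"],
    ["48", "45.00", "2", "No"],
]

buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
csv_text = buffer.getvalue()

# Basic sanity checks before uploading: a header row, a target column,
# and only the two expected labels in that column.
parsed = list(csv.reader(io.StringIO(csv_text)))
header, data = parsed[0], parsed[1:]
target_index = header.index("Churn")
labels = {row[target_index] for row in data}
print(sorted(labels))  # -> ['No', 'Yes']
```

Running checks like these locally catches common problems (a missing header, stray labels such as 'yes ' with trailing whitespace) before the data ever reaches the platform.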


Step-by-Step Instructions

Step 1: Prepare and Stage Your Data

Your model's performance is contingent upon the quality of your data. Before uploading, ensure your dataset is clean and properly formatted. The first step in the Vertex AI workflow is to make this data accessible to the platform.

  1. Navigate to Cloud Storage: In the Google Cloud Console, use the navigation menu to go to Cloud Storage > Buckets.

  2. Create a Storage Bucket: Click CREATE BUCKET. Provide a globally unique name for your bucket, select a region for data storage (it is best practice to use the same region where you will run your Vertex AI jobs), and keep the remaining settings at their default values. Click CREATE.
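If a bucket name is rejected, it usually violates the naming rules rather than merely colliding with an existing bucket. The sketch below is a simplified local check covering the common rules (3–63 characters; lowercase letters, digits, dashes, underscores, dots; must start and end with a letter or digit); global uniqueness can only be confirmed by the service itself:

```python
import re

def looks_like_valid_bucket_name(name: str) -> bool:
    """Simplified check for common Cloud Storage bucket naming rules.
    Not exhaustive: the service enforces additional rules (e.g. names
    may not resemble IP addresses) and global uniqueness."""
    if not 3 <= len(name) <= 63:
        return False
    # Lowercase letters, digits, dashes, underscores, dots;
    # first and last character must be a letter or digit.
    return re.fullmatch(r"[a-z0-9][a-z0-9._-]*[a-z0-9]", name) is not None

print(looks_like_valid_bucket_name("my-churn-data-bucket-2024"))  # True
print(looks_like_valid_bucket_name("My_Bucket"))                  # False (uppercase)
```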

  3. Upload Your Dataset: Once the bucket is created, navigate into it and click UPLOAD FILES. Select your prepared CSV dataset from your local machine to upload it.

Step 2: Create a Vertex AI Dataset

Next, you will register your data with Vertex AI, which allows the platform to understand its structure and content.

  1. Navigate to Vertex AI: In the Google Cloud Console, open Vertex AI from the navigation menu.

  2. Create a New Dataset: From the Vertex AI Dashboard, select Datasets from the left-hand navigation pane. Click CREATE.

  3. Configure the Dataset:

    • Provide a descriptive name for your dataset (e.g., Customer_Churn_Data).

    • Select Tabular as the data type.

    • For the objective, select Classification.

    • Click CREATE.

  4. Connect Your Data Source:

    • Choose Select CSV files from Cloud Storage as your data source.

    • Click BROWSE and navigate to the bucket and CSV file you uploaded in Step 1.

    • Click CONTINUE. Vertex AI will begin importing and analyzing your data, a process that may take several minutes depending on the file's size.

Step 3: Train the AutoML Model

With your dataset registered, you can now instruct AutoML to train a predictive model.

  1. Initiate Training: Once the data import is complete, you will be on the dataset's detail page. Click TRAIN NEW MODEL.

  2. Set Training Method: In the model training pane, ensure the Dataset is correct and select AutoML as the training method. Click CONTINUE.

  3. Define the Model Details:

    • Target Column: From the dropdown menu, select the column that you want the model to predict. In our example, this would be the 'Churn' column.

    • Training Budget: Specify the number of node hours for training. AutoML will automatically search for the best model architecture within this budget. For an initial experiment, a budget of 1 node hour is often sufficient to gauge feasibility and obtain a baseline model; Vertex AI provides an estimated cost before training begins.

  4. Start Training: Click START TRAINING. The model training process is fully automated and can take several hours to complete, depending on your dataset size and training budget.
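Should you later want to script this same training run, the console settings above map onto parameters of the google-cloud-aiplatform Python SDK's `AutoMLTabularTrainingJob`. The sketch below shows that mapping as plain dictionaries (verify the parameter names against the current SDK documentation; note the SDK expresses the budget in milli node hours, so 1 node hour is 1000):

```python
# Sketch: how the console choices map onto SDK parameters.
# AutoMLTabularTrainingJob(...) takes the job-level settings,
# and job.run(...) takes the per-run settings.
job_settings = {
    "display_name": "churn-automl-job",               # any descriptive name
    "optimization_prediction_type": "classification", # matches the dataset objective
}

node_hours = 1  # the console "training budget" from above
run_settings = {
    "target_column": "Churn",                           # the column to predict
    "budget_milli_node_hours": node_hours * 1000,       # SDK uses milli node hours
}

print(run_settings["budget_milli_node_hours"])  # 1000
```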

Step 4: Evaluate Model Performance

After training is complete, Vertex AI provides a comprehensive suite of metrics to evaluate the model's predictive accuracy.

  1. Access the Model: Navigate to the Models section in the Vertex AI console. Click on the name of your newly trained model.

  2. Review the 'Evaluate' Tab: This tab presents various performance metrics. For a classification model, pay attention to:

    • Confusion Matrix: This matrix displays the counts of correct and incorrect predictions for each class (e.g., how many 'Yes' churners were correctly identified versus missed).

    • Precision and Recall: These metrics offer a more nuanced view than raw accuracy. Precision measures the fraction of positive predictions that are correct, while recall measures the fraction of actual positive cases the model successfully identifies.
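To build intuition for these metrics, the short example below computes precision and recall from hypothetical confusion-matrix counts for the positive class ('Yes', i.e., the customer churned):

```python
# Hypothetical confusion-matrix counts for the positive class ("Yes"):
true_positives = 40   # churners correctly flagged
false_positives = 10  # non-churners incorrectly flagged as churners
false_negatives = 20  # churners the model missed

# Precision: of everyone the model flagged as churning, how many actually did?
precision = true_positives / (true_positives + false_positives)

# Recall: of everyone who actually churned, how many did the model catch?
recall = true_positives / (true_positives + false_negatives)

print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.80 recall=0.67
```

A model can score well on one metric and poorly on the other; which trade-off matters depends on whether missed churners or false alarms are costlier for your business.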

Step 5: Deploy the Model to an Endpoint

To use your model for predictions, it must be deployed to an endpoint, which makes it accessible as a service.

  1. Initiate Deployment: From your model's detail page, navigate to the DEPLOY & TEST tab and click DEPLOY TO ENDPOINT.

  2. Configure the Endpoint:

    • Select Create new endpoint and provide a name (e.g., churn-prediction-endpoint).

    • Leave the remaining settings, such as machine type and traffic split, at their default values for this initial deployment.

  3. Deploy: Click DEPLOY. Endpoint creation and model deployment will take several minutes.

Step 6: Test the Deployed Model

Finally, you can send new data to your deployed model to receive a prediction.

  1. Navigate to the Endpoint: In the Vertex AI console, select Endpoints from the navigation pane and click on your new endpoint.

  2. Make a Prediction: Within the Test your model section, you can input new data for each feature column. For example, enter the data for a new, hypothetical customer, omitting the value for the 'Churn' target column.

  3. Analyze the Result: Click PREDICT. The model will return its prediction, including a confidence score, indicating the likelihood of the new customer churning.
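For intuition about what the console displays, a tabular classification prediction essentially pairs each class label with a confidence score, and the reported prediction is the highest-scoring class. The sketch below uses a simplified, hypothetical response shape (parallel `classes` and `scores` lists); the exact structure returned by the API may differ:

```python
# Simplified, hypothetical shape of a classification prediction:
# parallel lists of class labels and confidence scores.
prediction = {
    "classes": ["No", "Yes"],
    "scores": [0.18, 0.82],
}

# Pair each label with its score and take the most confident one.
best_index = max(
    range(len(prediction["scores"])),
    key=lambda i: prediction["scores"][i],
)
predicted_class = prediction["classes"][best_index]
confidence = prediction["scores"][best_index]

print(predicted_class, confidence)  # Yes 0.82
```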

Conclusion

You have successfully completed the end-to-end process of building, training, evaluating, and deploying a machine learning model using Vertex AI AutoML. This exercise demonstrates that direct engagement with powerful AI technologies is no longer exclusively restricted to individuals with extensive coding or data science expertise. By leveraging the automated and intuitive capabilities of Vertex AI, you can now translate business data into actionable predictions, driving innovation and efficiency within your organization. We encourage you to apply this process to your own datasets and explore the further possibilities offered by the Google Cloud AI ecosystem.