A leading CPG company’s product engineering group wanted to use historical product formulation data from their research teams, together with public data, to aid their product planning process. We helped the client achieve this goal by using ML to predict the potential success of a new product. This greatly improved the client's ability to select new products to invest in while saving millions of dollars in wasted R&D effort.

The Problem

The client wanted to be smarter about which new product development initiatives they invested in. Predicting the commercial viability of a new product ahead of time would reduce their financial risk and increase development speed. They wanted to look at their current product offering, compare it with the competitive landscape, and identify key gaps to target. While this could have been done manually, it would have taken hundreds of hours to extract key insights. They wanted to use machine learning to perform the analysis and predict which new products would be most successful.

Data Preprocessing

Our data science team worked with the client’s subject matter experts to create a denormalized structure of historical product data, then mapped it against commercially available data on similar products from competitors. The data consisted of geographical availability, ingredients, and marketed features. Creating the training data set was challenging because the historical data was incomplete and the client’s data was structured differently from the commercially available data. We mitigated this by applying a multivariate imputation by chained equations (MICE) algorithm to fill the gaps. This resulted in a usable training data set.
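To illustrate the idea behind chained-equation imputation, here is a minimal sketch in Python with NumPy. It is not the code used on the project: the function name `chained_imputation`, the column meanings, and the sample values are all hypothetical, and the sketch is a simplified, deterministic variant (full MICE also adds random draws and produces several imputed data sets). It assumes every column has at least one observed value.

```python
import numpy as np

def chained_imputation(X, n_rounds=10):
    """Fill NaNs by regressing each column on the others, round after round.

    Simplified, deterministic illustration of the chained-equations idea.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    filled = X.copy()
    # Start from column means so every regression has complete inputs.
    col_means = np.nanmean(X, axis=0)
    filled[missing] = np.take(col_means, np.where(missing)[1])
    for _ in range(n_rounds):
        for j in range(X.shape[1]):
            rows = missing[:, j]
            if not rows.any() or rows.all():
                continue  # nothing to impute, or nothing to fit on
            others = np.delete(filled, j, axis=1)
            design = np.column_stack([np.ones(len(others)), others])
            # Fit on rows where column j was observed, predict the rest.
            coef, *_ = np.linalg.lstsq(design[~rows], filled[~rows, j], rcond=None)
            filled[rows, j] = design[rows] @ coef
    return filled

# Hypothetical numeric product attributes; NaN marks a gap in the history.
products = np.array([
    [1.0, 2.0, 4.0],
    [2.0, np.nan, 5.0],
    [3.0, 3.0, 9.0],
    [4.0, 1.0, 9.0],
    [5.0, 2.0, np.nan],
])
completed = chained_imputation(products)
print(completed)
```

Each pass refines the estimates for one column using the current estimates for the others, which is what lets the method handle gaps spread across several attributes.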

Model Selection

We went through multiple iterations to identify the best fit. The best-suited models were distance-based, where the relationship between two products can be represented using a distance metric (such as Euclidean distance). We trained the model with multiple algorithms (KD-tree, ball tree, and a wide and deep neural network) over multiple iterations, including mini-batch training, and tuned hyperparameters such as the number of layers and the number of nodes in each layer.
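As a rough illustration of the distance-based approach, the sketch below ranks catalog products by Euclidean distance to a proposed product. The product names, feature vectors, and the `nearest_products` helper are hypothetical, not taken from the project. It uses a brute-force search for clarity; a KD-tree or ball tree returns the same exact neighbors and only speeds up the lookup on larger catalogs.

```python
import math

def nearest_products(query, catalog, k=2):
    """Return the k (distance, name) pairs closest to query, nearest first."""
    scored = [(math.dist(query, features), name)
              for name, features in catalog.items()]
    return sorted(scored)[:k]

# Hypothetical products encoded as numeric feature vectors.
catalog = {
    "product_a": [0.0, 0.0],
    "product_b": [3.0, 4.0],
    "product_c": [1.0, 1.0],
}
proposed = [0.0, 0.5]  # hypothetical new product concept

neighbors = nearest_products(proposed, catalog, k=2)
for distance, name in neighbors:
    print(f"{name}: {distance:.3f}")
```

Products that land close to commercially successful neighbors, or in sparsely populated regions of the feature space, are the kind of signal a distance-based model can surface.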

User Experience and Deployment

The client had an existing Google Cloud infrastructure, so we used Google Compute Engine with TPUs (Google’s proprietary ML processors) to train the model. We used various libraries and frameworks, such as Google Cloud ML Engine, SciPy, Flask, Keras, and TensorFlow.

To productionize the model, we built a web application for the product planning team, which consisted of data scientists and product engineers. The model, built using Keras and TensorFlow, was integrated into their existing web app through an API. The architecture supports both batch and online updates to the underlying data used for training and for setting up various machine learning and clustering models.

