Recommind's Advice for Making Machine Learning Work For Business

This article is more than 10 years old.

It’s easy to get caught up in the mystique of machine learning. After all, what’s not to like about the idea of algorithms sucking up and sorting through the data detritus of our companies’ back alleys like a Roomba Sweeper Vac?

The reality is a bit more complex. Like any “breakthrough” technology, machine learning involves some forethought and discipline before being let loose in the enterprise.

I recently spoke with Bob Tennant, CEO of Recommind, a San Francisco-based unstructured information management and analysis company, seeking a few points of advice about where to use machine learning most effectively. Recommind has developed a platform that automates discovery, search, and categorization of data. With user input, the software adjusts its behavior to predict which data will be most relevant for the question at hand. By using Probabilistic Latent Semantic Analysis (PLSA), a machine-learning technique developed by Recommind cofounder and CTO Jan Puzicha at the University of California, Berkeley, Recommind can help sort out, for instance, whether thousands of documents that contain the word “java” are about programming or coffee.

Before unleashing the beast, however, the key question to ask is, “What kinds of business problems would you like to solve?” Tennant says. “To put it another way, ‘what questions would you want to be able to answer, and how much would they be worth if you could answer them?’ It’s a whole new world now. Don’t let your thinking be constrained by what you think may or may not be possible.”

Two nearly universal business problems are prime candidates for machine learning, Tennant says.

Stop the Hoarding!

For the greatest number of companies, machine learning can be used to attack data hoarding, a problem related to information governance, or more specifically, governance-related paralysis.

“Any major company is subject to both government investigation and to litigation,” Tennant says. “Because of those requirements, what’s tended to happen is that data has piled up for a long period of time and everybody’s afraid to get rid of it. And they’re afraid to get rid of it because they don’t know if it is pertinent to a lawsuit or a government investigation that’s ongoing.”

The closest machine-learning analogue to that Roomba is “predictive coding,” machine-learning technology combined with an iterative workflow that leverages a small amount of human input to identify relevant information. Using predictive coding for governance, businesses can quickly find information that’s safe for deletion and feel confident they’ve made the right decision. That enables them to stop hoarding and get rid of the clutter, Tennant says.

“That’s an application of machine learning within the context of a repeatable and defensible workflow that can be applied to any corporation,” he says.
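
To make the iterative workflow concrete, here is a minimal sketch of a predictive-coding loop in Python. It is not Recommind's implementation: it assumes a toy naive Bayes text classifier in place of whatever model a real product uses, and the names `predictive_coding`, `review`, `seed`, `rounds`, and `batch` are all hypothetical. The idea it illustrates is the one described above: a person labels a few documents, the model learns from them, and the person is then asked only about the documents the model is least sure of.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def train(labeled):
    """Fit a tiny naive Bayes model from {document_text: is_relevant}."""
    counts = {True: Counter(), False: Counter()}
    n_docs = {True: 0, False: 0}
    for text, label in labeled.items():
        counts[label].update(tokenize(text))
        n_docs[label] += 1
    vocab = set(counts[True]) | set(counts[False])
    return counts, n_docs, vocab


def prob_relevant(model, text):
    """Estimated probability that a document is relevant."""
    counts, n_docs, vocab = model
    total_docs = n_docs[True] + n_docs[False]
    score = {}
    for label in (True, False):
        # Laplace smoothing keeps unseen words from zeroing out a class.
        s = math.log((n_docs[label] + 1) / (total_docs + 2))
        total_words = sum(counts[label].values())
        for word in tokenize(text):
            s += math.log((counts[label][word] + 1) / (total_words + len(vocab) + 1))
        score[label] = s
    # Clamp the log-odds so exp() cannot overflow on long documents.
    diff = max(min(score[False] - score[True], 50.0), -50.0)
    return 1.0 / (1.0 + math.exp(diff))


def predictive_coding(docs, review, seed, rounds=3, batch=2):
    """Label all docs using only a small amount of human review.

    `review` stands in for the human reviewer (assumption: it returns
    True for documents that must be kept, False otherwise).
    """
    labeled = {d: review(d) for d in seed}
    for _ in range(rounds):
        model = train(labeled)
        unlabeled = [d for d in docs if d not in labeled]
        if not unlabeled:
            break
        # Ask the reviewer about the documents the model is least sure of.
        unlabeled.sort(key=lambda d: abs(prob_relevant(model, d) - 0.5))
        for d in unlabeled[:batch]:
            labeled[d] = review(d)
    model = train(labeled)
    result = {}
    for d in docs:
        if d in labeled:
            result[d] = labeled[d]  # human decision wins
        else:
            result[d] = prob_relevant(model, d) >= 0.5  # model's prediction
    return result
```

In a governance setting, documents that come back `False` would be candidates for deletion, subject to whatever sampling and sign-off the company's defensible workflow requires.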

Improving Integration of Data Sources for Better Marketing

It’s hard enough to cut through all the noise and reach potential customers and customers who may be at risk of churning. It’s so much worse when you connect with them and send the wrong message.

“I still get solicitations from credit card companies on an ongoing basis,” Tennant says. “Sometimes they’re from my own bank, and sometimes that’s right after I’ve just finished yelling at them about having screwed something up.”

The problem is that the entire lifecycle of the customer is not evident to the marketers, who typically focus on acquisition and lack access to current information that would help them market to existing customers. The marketing management system that sent out the flyer is not integrated with the call center software that received that angry phone call, nor is it connected to the CRM system that a banker used when the customer was last in the branch. And none of these systems is connected to click-stream information, such as a web log that captured the fact that the customer was looking around for mortgages.

The outcome might be greatly improved if, instead of sending a credit-card solicitation immediately after an angry phone call, the marketer waited two weeks for the customer to cool off, then sent a mortgage-related offer. When connected to all these data sources, machine learning can help make correlations that prevent marketing fails.
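
As a rough illustration of what connecting those sources buys you, the Python sketch below merges events from the call center, CRM, and click-stream into one per-customer view and applies the cool-off logic from the example above. Everything here is an assumption made for illustration: the `Event` fields, the event kinds, and the `next_offer` function are hypothetical, and the hand-written rule stands in for a decision that a trained model would make from the same combined data.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Event:
    customer_id: str
    source: str  # e.g. "call_center", "crm", "clickstream"
    kind: str    # e.g. "complaint", "branch_visit", "viewed_mortgages"
    day: date


# Two-week cool-off after a complaint, as in the example above.
COOL_OFF = timedelta(days=14)


def next_offer(customer_id, events, today):
    """Pick an offer and the earliest day to send it to one customer."""
    mine = [e for e in events if e.customer_id == customer_id]

    # Click-stream interest decides what to offer.
    wants_mortgage = any(e.kind == "viewed_mortgages" for e in mine)
    offer = "mortgage" if wants_mortgage else "credit_card"

    # A recent complaint delays when to send it.
    complaint_days = [e.day for e in mine if e.kind == "complaint"]
    send_on = today
    if complaint_days:
        send_on = max(today, max(complaint_days) + COOL_OFF)
    return offer, send_on


events = [
    Event("c1", "clickstream", "viewed_mortgages", date(2024, 2, 20)),
    Event("c1", "call_center", "complaint", date(2024, 3, 1)),
]
print(next_offer("c1", events, today=date(2024, 3, 2)))
```

The point of the sketch is the join, not the rule: neither the complaint nor the mortgage browsing is visible to a marketing system that sees only its own records.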

Data Discipline for Maximizing Machine Learning Value

Once you’re convinced machine learning is useful, there are several ways to clear the ground to make it effective, Tennant says. Machine learning can be used diagnostically (figuring out what data you have and what you should keep) as well as opportunistically (finding new data sources and comparing them to the data you have organized and now retain).

Learn to manage what you have. “The first thing is to understand what you have,” Tennant says. “There’s a lot more cost involved in data hoarding than people think about. It’s not the cost of disks [anymore]; it’s the cost of all the overhead of managing the different systems and the clogging of the networks [as data moves back and forth].”

It’s also important to acknowledge that not all of your data will be valuable, and it’s quite likely that right now, you’re not looking at all the data that will be valuable. The days of storing everything indefinitely are over: there’s simply too much data. Some kind of data-management scheme has to be applied as data arrives so that wasteful volumes don’t start accumulating and good stuff doesn’t fall through the cracks.

There is not much to be gained from trying to boil the ocean, but there is almost-instant ROI from coming to grips with your current data and filtering out the noise, Tennant says.

Think about the signals you want from your data. The golden goose of analysis is the extraction of signals: specific, actionable information that defines a course of action. You arrive at signals by first looking at the business problems you want to solve, Tennant says. Initially, you won’t know which machine learning techniques will provide which signals, but working with vendors should allow you to conduct some experiments to figure that out.

This may also mean taking some of the data you’ve collected and sharing it with outside parties. This is a psychological barrier many companies have yet to overcome, but it can be a shrewd move with the right governance. For instance, data could be placed on Kaggle, a leading site for competitions between data scientists, which offers prize money to those who devise the best solutions.

Think about getting signals from data you don’t own. As I’ve alluded to in “Do you Suffer from the Data Not Invented Here Syndrome,” the time has come for companies to ditch the provincial attitude that only data created and housed within the walls of the enterprise is worth exploring. Think about where you can get signals from outside your company, such as the context around the interactions people have with your company and others in your space. Packaged applications can help to deliver some of this data, but analytics vendors have recently moved their offerings into the realm of the business user. Consider developing your own capabilities, using commercial building blocks to create an application that your business users can deploy effectively.

Recommind helps companies with most of the stages of the data discovery and analytical process: What is the business question I am trying to answer? What data do I have? What data should I use? What data should I not use? Where should I go to get data to complete the picture? It becomes particularly useful when dealing with unstructured data such as emails, tweets, documents and social media, Tennant says.

But as with all technologies, no matter how sophisticated, companies need to develop a clear sense of purpose first and do some pruning of their data landscapes before unleashing the robots.

Dan Woods is CTO and editor of CITO Research, a publication that seeks to advance the craft of technology leadership. For more stories like this one visit