Using Machine Learning to Improve Sales - A Simple Example 

You've probably heard by now about all of the advances machine learning is enabling, in areas like voice recognition, conversations, image processing, and self-driving cars. But how can you harness this amazing new power to do something as basic as improve your business's sales? What level of work is required, and what kind of results can you expect to achieve?

This blog article sets out to answer these questions by giving a concrete, real-world example: improving the close rate of an outbound sales team.

You don't need to be a machine learning expert to understand the example I will give. I'll take you through the whole process at a high level, and summarize the results. And for the programmers out there, I'll include the data and sample code.

The first thing you'll need in order to work on any machine learning problem is historical data - the more the better. Even basic machine learning algorithms require hundreds or thousands of data points to achieve reasonable accuracy; some algorithms like neural networks can require millions.

For most (supervised) learning algorithms, the historical data has to be tagged with the "correct" answer. For example, if you are trying to train your algorithm to recognize faces in a picture, you need to start with data where the people are already tagged. Similarly, in our case, where we are trying to predict whether or not a sales lead will purchase our product, we need historical data on prior leads, their attributes, and whether or not they purchased the product. The goal of the machine learning code is then to predict which ones will purchase the product in the future.
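To make the idea of "tagged" data concrete, here is a minimal sketch in plain Python, using made-up leads (the real data set has about a dozen attributes per lead): each record carries the lead's attributes plus the known outcome, which supervised learning trains against.

```python
# Hypothetical tagged historical data: each lead's attributes plus the
# known outcome ("purchased"), which a supervised algorithm learns from.
historical_leads = [
    {"age": 34, "education": "university", "profession": "technician", "purchased": "no"},
    {"age": 51, "education": "high_school", "profession": "retired",   "purchased": "yes"},
    {"age": 29, "education": "university", "profession": "student",    "purchased": "no"},
]

# The attributes are the inputs; the "purchased" field is the label to predict.
features = [{k: v for k, v in lead.items() if k != "purchased"} for lead in historical_leads]
labels = [lead["purchased"] for lead in historical_leads]
```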

I googled for some sample data and found a data set of 3,000 sales records, generously provided by a Portuguese bank and used as the basis of a Kaggle competition. For each lead, the data includes about a dozen attributes (age, education, profession, etc.) as well as whether or not the lead purchased the product being telemarketed (a term deposit).

I randomly pulled out about 10% of these leads and set them aside as a "verification set". Once the algorithm is trained, we will test it by applying its predictions to this verification set; by comparing the predictions to what actually happened, we can see how accurate they are.

The remaining data was used to train the algorithm.
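The split itself is simple. Here is a minimal stdlib-only sketch of the idea (the actual notebook code below uses numpy's np.split for a three-way split): shuffle the records, hold out ~10% for verification, and train on the rest.

```python
import random

def holdout_split(records, holdout_frac=0.1, seed=42):
    """Shuffle records and set aside a fraction as a verification set."""
    shuffled = records[:]                 # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)
    n_holdout = int(len(shuffled) * holdout_frac)
    return shuffled[n_holdout:], shuffled[:n_holdout]   # (train, verification)

train, verification = holdout_split(list(range(3000)))
```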

In practice, the first choice you would make is what platform to use to develop the algorithm. You can write it from scratch, use a pre-built library like TensorFlow, or use an environment geared to developing and deploying machine learning models, like Amazon SageMaker. I chose the last option because it is the easiest, and because the Amazon algorithms are generally scalable and efficient. SageMaker uses convenient Jupyter notebooks (a popular development environment for Python/ML) and runs directly on the AWS cloud with ample computing resources available - important for the training step, which can be resource-intensive.

The next choice is which machine learning algorithm to use. There are about a dozen popular algorithms (cheat sheet here), each tuned to a particular type of problem and data set. I chose XGBoost (a form of gradient boosted tree), which works very well for classification problems (where the answer is one of a limited set of values, like "yes" or "no", as opposed to a number) and does not require a huge data set. In my case, I had ~2,700 records to train with, and just needed to predict "yes" or "no" - whether each lead would buy the product.

With SageMaker, once you have the data set up the way you want it, the actual training takes just a few lines of code. You simply pass the data to an XGBoost training implementation and it trains a model for you; in my case, this took about 15 minutes of execution time. This "model" is essentially a predictor function that will allow you to predict future sales, and AWS lets you set it up as an endpoint that is easily callable from your code.

The whole process took me 3-4 hours, most of which was cleaning up the data beforehand. 

What were the results?

  • Without machine learning, calling every lead on the list, the close rate would have been 7.5%.
  • With machine learning, calling only the leads it predicted would close, the close rate would have been 85%.

In other words, even with this simple example, relatively small data set, and no model tuning, sales close rates with machine learning were over 11 times higher than without it.
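The arithmetic behind these two numbers is worth spelling out. Calling everyone, the close rate is simply actual buyers divided by total leads; calling only the predicted-yes leads, it is true positives divided by all predicted positives (i.e., the model's precision). A small sketch with hypothetical confusion-matrix counts (illustrative numbers, not the actual counts from this run):

```python
# Hypothetical confusion-matrix counts for a 320-lead verification set.
true_pos, false_pos = 17, 3    # predicted "yes": 17 actually bought, 3 didn't
false_neg, true_neg = 7, 293   # predicted "no":  7 buyers missed, 293 correct

total = true_pos + false_pos + false_neg + true_neg

# Call everyone: close rate is all actual buyers over all leads.
baseline_rate = (true_pos + false_neg) / total     # 24 / 320 = 0.075

# Call only predicted-yes leads: close rate is the model's precision.
ml_rate = true_pos / (true_pos + false_pos)        # 17 / 20 = 0.85

improvement = ml_rate / baseline_rate              # roughly 11x
```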

With more work, it's possible to improve these results even further.

Hopefully this example gives you a sense of the power of machine learning, and how it can be applied to real-world problems all businesses face.

Here is the code for those who are curious. You should be able to run it directly in a SageMaker Jupyter notebook.

The data I used is here.

bucket = 'marketing-example-1'
prefix = 'sagemaker/xgboost'
 
# Define IAM role
import boto3
import re
from sagemaker import get_execution_role

role = get_execution_role()

#import libraries
import numpy as np                                # For matrix operations and numerical processing
import pandas as pd                               # For munging tabular data
import matplotlib.pyplot as plt                   # For charts and visualizations
from IPython.display import Image                 # For displaying images in the notebook
from IPython.display import display               # For displaying outputs in the notebook
from time import gmtime, strftime                 # For labeling SageMaker models, endpoints, etc.
import sys                                        # For writing outputs to notebook
import math                                       # For ceiling function
import json                                       # For parsing hosting outputs
import os                                         # For manipulating filepath names
import sagemaker                                  # Amazon SageMaker's Python SDK provides many helper functions
from sagemaker.predictor import csv_serializer    # Converts strings for HTTP POST requests on inference

#download data set
!wget https://fasttrackteam.com/Data/sites/1/media/data.csv

#read into data frame
data = pd.read_csv('./data.csv', sep=',')
pd.set_option('display.max_columns', 500)     # Make sure we can see all of the columns
pd.set_option('display.max_rows', 20)         # Keep the output on one page
data

#clean up data
data['no_previous_contact'] = np.where(data['pdays'] == 999, 1, 0)                                 # Indicator variable to capture when pdays takes a value of 999
data['not_working'] = np.where(np.in1d(data['job'], ['student', 'retired', 'unemployed']), 1, 0)   # Indicator for individuals not actively employed
model_data = pd.get_dummies(data)                                                                  # Convert categorical variables to sets of indicators
model_data = model_data.drop(['duration', 'emp.var.rate', 'cons.price.idx', 'cons.conf.idx', 'euribor3m', 'nr.employed'], axis=1)

#split into train, test, validation sets
train_data, validation_data, test_data = np.split(model_data.sample(frac=1, random_state=1729), [int(0.7 * len(model_data)), int(0.9 * len(model_data))])   # Randomly sort the data then split out first 70%, second 20%, and last 10%

#prep for XGBoost
pd.concat([train_data['convert_yes'], train_data.drop(['convert_no', 'convert_yes'], axis=1)], axis=1).to_csv('train.csv', index=False, header=False)
pd.concat([validation_data['convert_yes'], validation_data.drop(['convert_no', 'convert_yes'], axis=1)], axis=1).to_csv('validation.csv', index=False, header=False)

#copy to S3
boto3.Session().resource('s3').Bucket(bucket).Object(os.path.join(prefix, 'train/train.csv')).upload_file('train.csv')
boto3.Session().resource('s3').Bucket(bucket).Object(os.path.join(prefix, 'validation/validation.csv')).upload_file('validation.csv')

#set up training instances
containers = {'us-west-2': '433757028032.dkr.ecr.us-west-2.amazonaws.com/xgboost:latest',
              'us-east-1': '811284229777.dkr.ecr.us-east-1.amazonaws.com/xgboost:latest',
              'us-east-2': '825641698319.dkr.ecr.us-east-2.amazonaws.com/xgboost:latest',
              'eu-west-1': '685385470294.dkr.ecr.eu-west-1.amazonaws.com/xgboost:latest',
              'ap-northeast-1': '501404015308.dkr.ecr.ap-northeast-1.amazonaws.com/xgboost:latest'}

s3_input_train = sagemaker.s3_input(s3_data='s3://{}/{}/train'.format(bucket, prefix), content_type='csv')
s3_input_validation = sagemaker.s3_input(s3_data='s3://{}/{}/validation/'.format(bucket, prefix), content_type='csv')

#create training job
sess = sagemaker.Session()

xgb = sagemaker.estimator.Estimator(containers[boto3.Session().region_name],
                                    role, 
                                    train_instance_count=1, 
                                    train_instance_type='ml.m4.xlarge',
                                    output_path='s3://{}/{}/output'.format(bucket, prefix),
                                    sagemaker_session=sess)
xgb.set_hyperparameters(max_depth=5,
                        eta=0.2,
                        gamma=4,
                        min_child_weight=6,
                        subsample=0.8,
                        silent=0,
                        objective='binary:logistic',
                        num_round=100)

xgb.fit({'train': s3_input_train, 'validation': s3_input_validation}) 

#create an endpoint based on trained model
xgb_predictor = xgb.deploy(initial_instance_count=1,
                           instance_type='ml.m4.xlarge')

#evaluate results
xgb_predictor.content_type = 'text/csv'
xgb_predictor.serializer = csv_serializer
def predict(data, rows=500):
    split_array = np.array_split(data, int(data.shape[0] / float(rows) + 1))
    predictions = ''
    for array in split_array:
        predictions = ','.join([predictions, xgb_predictor.predict(array).decode('utf-8')])

    return np.fromstring(predictions[1:], sep=',')

predictions = predict(test_data.drop(['convert_no', 'convert_yes'], axis=1).as_matrix())

pd.crosstab(index=test_data['convert_yes'], columns=np.round(predictions), rownames=['actuals'], colnames=['predictions'])

#clean up
sagemaker.Session().delete_endpoint(xgb_predictor.endpoint)
Posted by Brian Conte Monday, July 2, 2018 2:34:00 AM Categories: B2B big data technology

Big Data - A Sample Application 

Exploring Google Books

Google Books, and its associated ngram indices, represents one of the largest publicly available databases in the world. At last count, Google had scanned and indexed over 25 million books containing over 1,000,000,000,000 terms (ngrams) - roughly comparable to all of the text on all of the pages of the internet. The database is impressive not only in its scale and availability, but also in the wealth of knowledge it contains about our culture over the last 200 years.

Here are some examples of some of the insights you can gain with Google Books. These graphs show the relative occurrences in printed material of the specified words and phrases, by year, and to a good approximation reflect what people were thinking (and writing about) during this time.

Political ideologies:


Modes of transportation:

Family roles:

Many more examples are here.

Working with data sets this large required Google to pioneer new concepts in highly scalable parallel data processing, such as MapReduce, popularized by its open-source implementation, Hadoop. These techniques allowed Google to break the massive problem of indexing this vast database into manageable chunks that could be processed by many machines working in parallel. These systems and techniques are now used by many companies for big data problems, such as customer analytics and machine learning.
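The core idea of MapReduce is easy to sketch. Here is a minimal, single-machine Python illustration of the canonical word-count job; in a real system, the map and reduce steps run in parallel across many machines, with the framework handling the sort/shuffle between them.

```python
from itertools import groupby

def mapreduce_wordcount(documents):
    # Map: emit a (word, 1) pair for every word in every document.
    pairs = [(word, 1) for doc in documents for word in doc.split()]
    # Shuffle: group pairs by key (word) - the framework's sort/partition step.
    pairs.sort(key=lambda kv: kv[0])
    # Reduce: sum the counts for each word.
    return {word: sum(count for _, count in group)
            for word, group in groupby(pairs, key=lambda kv: kv[0])}

counts = mapreduce_wordcount(["the cat sat", "the dog sat"])
```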


Brian founded Fast Track with over 15 years of entrepreneurial experience and technology expertise. Brian managed the development of Microsoft's first browser in 1985 and later founded hDC, the first Windows software company. Brian ran hDC, later named Express Systems, for 10 years before selling it to WRQ in 1996, where he remained as CTO. Brian spearheaded the development of one of WRQ's most successful products, Express 2000, which generated more than $10 million in its first year. Brian holds a BSE in Electrical Engineering and Computer Science from Princeton University.
Posted by Brian Conte Tuesday, October 18, 2016 1:31:00 AM Categories: B2B big data custom development enterprise technology web development

Web Development Life Cycle in a Nutshell 

In this article, you will learn the different stages of web development in a language that even a non-techy person can understand.

Your website represents you to the world. An understanding of the web development cycle will enable you to work efficiently with web developers to achieve a prominent online presence.
As a business owner, you have to put a lot of effort and planning into setting up your business, as well as your website and online presence on various social networks. That is why hiring website developers to help build your website is increasingly popular. From identifying the features you need for your website to going live, the whole process is known as the Web Development Life Cycle. Normally, this process goes through six stages, namely:

  • Analyzing
  • Planning
  • Design
  • Development
  • Testing and Delivery
  • Maintenance


Analyzing

This is a crucial stage of the development cycle. This is where you analyze the core values and functions of your company. You should have a clear understanding of your business goals and how your website will help attain them.

In order to analyze your goals in detail, you can break them down into parts and define each one. First, consider what the purpose of the site will be: whether it will promote a service, provide information, or sell a product. Once you have a clear purpose, define the target audience. Think of the 'ideal' person you would want as a visitor; knowing their age, sex, interests, etc. will help determine the best design for the site. Knowing your target audience, you can then analyze what kind of content they'll be looking for on your website.

Planning
After a thorough analysis, you can move on to the next stage of the web development cycle and start planning. It is at this stage that the sitemap is developed.
A sitemap is the basic outline of your website. It lists all the main areas of the site, as well as their subdivisions, and will help you decide what type of content goes on your site. The technical tools to be implemented are also decided at this stage. Keep the target audience in mind: the user interface should be not only easy, but also fun and engaging to navigate.

Design
Now is the time to design the layout of the website. The site will look different for different target groups as per their interests. It is also important to strengthen the identity of your company on the website. You can do that by incorporating the company logo or its colors into the design. Here are a few of the current website layout trends which you can go through for the layout of your website.

The web designer will send you several prototypes. You can either view mockups, or the designer can give you access to view the work in progress. This lets you follow the design and development stages and give feedback, which is necessary as the website needs to match your needs and tastes. You should also decide what type of Content Management System (CMS) to use at this stage of the development cycle. Constant communication is essential here.

Development
In the cycle, the development stage is where the actual, functional website is created using the graphic elements of the prototype. No matter what CMS you will use, it is best to start with generic HTML and CSS. This involves writing valid HTML/CSS code that complies with current web standards, maximizing functionality as well as accessibility for as large an audience as possible.

The home page is the first page to be developed. After that, a template is created for the content pages, which contains the main navigational structure of the website. The developer then places the content into its appropriate areas, and all the other technical features are made functional in this phase.

Testing and Delivery
Websites function as a multi-user and multi-tier system with bandwidth limitations. Consequently, tests for complete functionality and compatibility are done at this stage. Both automated testing and manual testing should be done without fail. Implement analytics tools so that you will be able to track your website’s statistics before, during, and after the website launch.

Once the final output is approved, website owners then perform a final run-through to confirm that everything was uploaded correctly and is functional.  The site can then go live.

Maintenance
The web development cycle doesn't stop at the site's launch. With many online competitors vying for your target customers, the real battle has just started. As a website owner, you will have to make sure your website keeps up with current trends and is filled with content that matters to your target audience. Aside from regularly updating the site's content, you should also perform regular site backups, install additional plugins as needed, and keep tools and plugins upgraded.

Was this rundown helpful to you? Give us your thoughts.

Shubhada worked as a team lead for the Objectstar testing group (a product of Fujitsu) for two years. Later, she was the product lead for the e-filing development and support team for two years. Shubhada then joined Brian and her husband Ajey to start and run the Fast Track India operations. She holds a Masters in Mathematics from Pune University and an advanced diploma in Computer Science. She's on Twitter as @ShubhadaPar.
Posted by Shubhada Paranjape Thursday, March 17, 2016 6:25:00 PM Categories: business partnership technology website

What is Agile Methodology? 


Agile methodology is a set of tools, skills, and practices that collectively form an alternative to conventional product management and development. It is often used in software development, where teams respond to unpredictability through iterative work sprints.


The Origin of Agile

The 1970 publication by Dr. Winston Royce entitled "Managing the Development of Large Software Systems" criticized the sequential process involved in product development.

Dr. Royce emphasized that software should not be developed like a product on an assembly line, where each component is added in sequential phases and each phase must be completed before the next begins – the so-called “waterfall” approach. He opposed this phase-based approach, wherein developers first gather all of the requirements, complete all of the architecture and design, write all the code, do all the testing, and so on, specifically because of the lack of communication between the groups that carry out each phase.

In the waterfall methodology, teams have only a single chance to get each phase right, and there is little room to adapt compared with agile. The waterfall method assumes that every requirement can be identified before design and coding begin. Could you really tell your developers everything they need to know about the software before any of it is up and running? Or would it be easier to convey your idea to the development team if you could give feedback on working software?

Why Go the Agile Way?

Using agile methodology provides opportunities for your team to assess the direction of a project throughout the development process. This is attained through regular iterations, at the end of which teams present a working product increment. The method is described as 'incremental' and 'iterative' because of its repeated, shortened work cycles and the functional product each cycle produces.
 
There are different types of agile methods that follow the principles stated in the Agile Manifesto. The most popular are the following:

Scrum

Scrum focuses on how to manage tasks within a team-based development setting. It is the most widely implemented agile method, possibly because it is the easiest for IT development teams to understand and follow. Scrum is not prescriptive and doesn't demand loads of technical discipline, unlike more rigidly defined agile methods. It lets the development team decide what to do and how to do it, and allows teams to get up to speed and begin practicing agile swiftly and cost-effectively.

Scrum certification helps fulfill the objective of the Agile manifesto by encouraging collaboration, productivity, and accomplishment among team members.      

Dynamic Systems Delivery Method (DSDM)
Possibly the original agile method, DSDM existed even before the term 'agile' was adopted in software development. DSDM fixes cost, time, and quality at the outset and prioritizes scope into “musts”, “shoulds”, “coulds”, and “won’t haves”.

Extreme Programming (XP)

Extreme Programming (XP) is a more thorough agile method that focuses on the analysis, development, and test phases, using frequent releases in short development cycles to improve productivity and introduce checkpoints for accommodating new customer requirements.

Among the three popular types, DSDM is possibly the most comprehensive agile method, while Scrum and XP are easier to implement. The latter two are also complementary, since they address different aspects of software development projects and are built on very similar concepts.

In the last decade, many industries have seen the benefits of using agile methods. Media, marketing, technology, large corporations, and government sectors have seen dramatic improvements in their IT development projects and team efforts, which also provides a much-needed competitive edge.

In agile product development, project management is a little different as it relies more on the team leader's skills in coordination, communication, and facilitation with less emphasis on planning and control. However, not all projects go well with this method and it is not always the key to instant success. The key is to understand many techniques from different agile and waterfall methodologies, and pick out the best approaches that will suit a specific situation.

Agile methodology, combined with your team's skill and experience, gives you a more flexible approach with less documentation, and more collaboration and visibility, allowing for a more rewarding team experience and better products as a result.


Posted by Brian Conte Thursday, January 21, 2016 5:08:00 PM Categories: business partnership custom development enterprise project management small business tips technology web design web development