Cloud, Cost Containment

7 tips to help get your AWS costs under control

Do you feel like your AWS monthly costs are getting out of control? Costs will continue to trend up as your needs grow. With that in mind, here are a few tips to help IT and Finance rein in spending.

  1. Develop a simple metric to track your monthly AWS costs—Cost per X.  Your X can be customers, accounts, transactions—whatever makes sense for your business.  If your Cost per X ever rises substantially, it’s probably worth a deeper discussion.
  2. Tag all instances. Finance and IT should work together to define tags that help with tracking and financial reporting. If applicable, include tags for Customer, Application, and Environment (e.g., development, test, or production). Tags will help you understand if utilization on development and test instances is too high.
  3. Implement a “Tag or Kill” policy to help enforce tagging. Pick one day a week to kill any instances that aren’t tagged.
  4. “Turn off the lights.”  Shut down development and test instances when they aren’t being used. In theory, only production instances should be left on 24/7/365.
  5. Buy EC2 Reserved Instances.  The beauty of AWS is that EC2 instances can be turned on or off as needed. This flexibility comes at a premium.  If you know you’ll need a certain EC2 instance size 100% of the time, it’s definitely worth buying a reserved instance. Reserved instances require an upfront financial commitment, but save significant money in the long run.
  6. Auto Scale your EC2 Instances! AWS Auto Scaling allows you to ramp your EC2 instances up or down based on predefined criteria. Additional instances can be turned on or off as needed. You may be wasting money if your EC2 instances are running at sizes based on peak demand.
  7. Talk about it. Finance and IT should meet regularly to discuss ongoing costs and big swings in the Cost per X, and to modify the strategy as needed.
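
Tips 1 through 3 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a production script: the 20% threshold for a "substantial" rise, the function names, and the sample figures are all hypothetical, and the instance records are plain dictionaries shaped loosely like the ones the EC2 API returns (an `InstanceId` plus a list of `Key`/`Value` tags). Nothing here calls AWS.

```python
# Tag keys suggested in tip 2; adjust to whatever Finance and IT agree on.
REQUIRED_TAGS = {"Customer", "Application", "Environment"}


def cost_per_x(monthly_cost: float, x_count: int) -> float:
    """Tip 1: monthly AWS cost divided by the business unit X."""
    if x_count <= 0:
        raise ValueError("x_count must be positive")
    return monthly_cost / x_count


def rose_substantially(previous: float, current: float,
                       threshold: float = 0.20) -> bool:
    """Flag a month-over-month rise above the threshold.

    The 20% default is an assumption for illustration only.
    """
    if previous <= 0:
        raise ValueError("previous must be positive")
    return (current - previous) / previous > threshold


def untagged(instances: list[dict]) -> list[str]:
    """Tips 2 and 3: IDs of instances missing any required tag key."""
    missing = []
    for inst in instances:
        keys = {tag["Key"] for tag in inst.get("Tags", [])}
        if not REQUIRED_TAGS <= keys:
            missing.append(inst["InstanceId"])
    return missing


# Hypothetical figures: the bill grows while the customer count stays flat.
march = cost_per_x(12_000.0, 400)
april = cost_per_x(15_000.0, 400)
needs_discussion = rose_substantially(march, april)

# Hypothetical inventory: the second instance has no Environment tag.
inventory = [
    {"InstanceId": "i-0001", "Tags": [
        {"Key": "Customer", "Value": "acme"},
        {"Key": "Application", "Value": "billing"},
        {"Key": "Environment", "Value": "production"},
    ]},
    {"InstanceId": "i-0002", "Tags": [
        {"Key": "Customer", "Value": "acme"},
        {"Key": "Application", "Value": "billing"},
    ]},
]
kill_candidates = untagged(inventory)
```

The list of kill candidates is only a report. Whether to stop those instances automatically or to notify their owners first is a policy decision for the weekly "Tag or Kill" review.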
Big Data, Data Science

Using Prescriptive Analytics to Make Better Decisions

Business Analytics is broken down into three distinct phases.

  1. Descriptive – What happened? This phase involves traditional BI tools to help organizations process and report on historical data. Trends are analyzed and decisions are made. The majority of management reporting uses this approach.
  2. Predictive – What will happen? This phase uses machine learning algorithms to build models from historical data and then uses those same models to predict a future outcome or its likelihood.
  3. Prescriptive – What action should be taken? This phase prescribes actions to achieve the best possible outcome based on the predictions made. Actions that lead to the highest chance of success are prescribed.

Prescriptive Analytics predicts and compares the likely outcomes of any number of actions, and then chooses the very best action to help advance an organization’s objectives.
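
As a minimal sketch of that last step, suppose a predictive model has already scored each candidate action with a probability of success. The action names and probabilities below are hypothetical, and picking the single highest score is the simplest possible selection rule; a real system would usually weigh costs and constraints as well.

```python
def prescribe(predicted_success: dict[str, float]) -> str:
    """Return the action with the highest predicted chance of success."""
    if not predicted_success:
        raise ValueError("no candidate actions to compare")
    return max(predicted_success, key=predicted_success.get)


# Hypothetical output of a predictive model, one score per action.
predictions = {
    "send_discount": 0.42,
    "call_customer": 0.57,
    "do_nothing": 0.18,
}
best_action = prescribe(predictions)
```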

Consider the implications for the healthcare industry. A healthcare prediction is most useful when it comes with a prescribed clinical action for each predicted outcome.

Similar insights can help organizations improve decision making and have more control of business outcomes. Prescriptive analytics is an important next step on the path to insight-based actions and recommendations.

Data Science

3 free data tools you never knew you were missing

The right tools can make a world of difference. If you work with data, here are three tools to add to your toolbox.

1. Data Preprocessing

KNIME is an open source data analytics and integration platform. The interface allows you to assemble workflow nodes for data preprocessing (ETL) and data analysis. Modeling and data visualization nodes are also available, but I use other tools for those.

Need to create a monster Pivot Table? The Pivoting node can handle very large files with ease. I used a dataset in comma separated format (csv) and a simple KNIME workflow to create a pivot table with over 100,000 columns.
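
If you have not worked with pivot tables before, the operation itself is easy to illustrate. The sketch below is plain Python, not KNIME, and says nothing about how the Pivoting node is implemented; the column names and sample rows are hypothetical. It groups rows by one column, spreads another column's values across the output columns, and sums a third.

```python
import csv
import io
from collections import defaultdict


def pivot(rows, index, column, value):
    """Sum `value` per (index, column) pair as {index: {column: total}}."""
    table = defaultdict(lambda: defaultdict(float))
    for row in rows:
        table[row[index]][row[column]] += float(row[value])
    # Convert the nested defaultdicts to plain dicts for easy inspection.
    return {key: dict(cols) for key, cols in table.items()}


# Hypothetical csv data; a real file would be opened with open(path).
SAMPLE_CSV = """region,product,sales
East,A,10
East,B,5
West,A,7
East,A,3
"""

rows = csv.DictReader(io.StringIO(SAMPLE_CSV))
result = pivot(rows, index="region", column="product", value="sales")
```

Each distinct value in the `column` field becomes its own output column, which is how a pivot over a high-cardinality field ends up with a very wide table.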


Download KNIME at

2. Data Mining

Weka is a collection of machine learning algorithms that help you complete data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data preprocessing, classification, regression, clustering, association rules, and visualization.

Weka has a large online community and lots of support. The interface is easy to use.


Weka also provides some great visualizations of your dataset.


Download Weka here.

Here are a few sample datasets to get you started.

3. Data Visualization

Tableau Public is a free tool to create interactive data stories on the web. It’s available as a service so you can be up and running as soon as you download it. Connect, create, and publish interactive data visualizations directly to your website. No coding required!

Tableau even provides How-to Videos and sample datasets.

Download Tableau Public here.


Data Science

6 steps to data mining awesomeness

Have a data mining project on the horizon? These 6 steps make up the Cross Industry Standard Process for Data Mining (CRISP-DM) and will help make it awesome!

  1. Gain an understanding of the business problem you are trying to solve. Are the business requirements well defined?
  2. Get to know the data. What data is available? Is it complete? What data is needed?  Now is also a good time to identify any data quality problems. 
  3. Prepare the data. Data is rarely clean or in the right format for your modeling tools. This step can be time consuming.   
  4. Create your model(s). Pick your modeling tool and build your model (e.g., linear regression, classification, or clustering). Several techniques can be used to solve the same data mining problem. Now might also be a good time to revisit Step 3 if the data isn’t quite right.
  5. Evaluate your results. Are the results meaningful? Do they solve the problem you identified in Step 1? By the end of this step, decide whether and how the results will be used.
  6. Deploy your model!  How should the model be deployed? What steps should be taken to maximize the benefit of the model and results?
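
Steps 3 through 5 can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions: the data is made up, the function names are hypothetical, and simple linear regression stands in for whatever technique fits your problem. It also evaluates on the same points it was fitted to, which a real project should avoid by holding out test data.

```python
def prepare(raw):
    """Step 3: drop records with missing values and convert to floats."""
    return [(float(x), float(y)) for x, y in raw
            if x is not None and y is not None]


def fit_line(points):
    """Step 4: ordinary least squares for y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_abs_error(points, slope, intercept):
    """Step 5: average absolute difference between actual and predicted y."""
    return sum(abs(y - (slope * x + intercept)) for x, y in points) / len(points)


# Made-up records; one has a missing x value and is dropped in Step 3.
raw = [(1, 3), (2, 5), (None, 4), (3, 7), (4, 9)]
clean = prepare(raw)
slope, intercept = fit_line(clean)
error = mean_abs_error(clean, slope, intercept)
```

If the error in Step 5 is too high, loop back to Step 3 or try a different technique in Step 4 before moving on to deployment.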

That’s it!

Do you use a different process? I’d love to hear about it. Please leave a comment.