Data Science

A kid’s guide to data science – clustering

Bedtimes are the Worst

Having a bedtime is tough when you’re a kid. I get it. When the YouTube video you’re watching gets cut short because you have to get ready for bed, it’s the worst.

But like most kids, you occasionally get to stay up past your bedtime. Have you ever wondered why adults let kids stay up past their bedtimes? Think about it.

Why? What’s different about those days? Maybe you should keep track of your bedtimes to see if you can discover the magic formula to staying up late!

Tracking Your Bedtimes

Tracking your bedtimes is easy. Let’s make a list of things that might help you solve this puzzle.

  • The date. It’s good to know which days you go to bed on time versus days you stay up late.
  • Your age. As you get older, you’re allowed to stay up later. That’s a good thing!
  • Is it a school night? You almost never stay up late on a school night.
  • Are you home? 
  • Is it a sleepover? Sleepovers are the best!
  • Are you sick?
  • Your bedtime. (Boo, Hiss!)
  • The actual time you went to bed that day. 

Look at this in action.

  • January 14, 2016 (The date)
  • 12 (Your age)
  • Yes (Is it a school night?)
  • Yes (Are you home?)
  • No (Is it a sleepover?)
  • No (Are you sick?)
  • 8:30pm (Your bedtime.)
  • 8:30pm (Actual time you went to bed.)

Great! You’ve got your first bedtime tracked! You’re well on your way.
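If you like to code, here is one way to store a tracked bedtime in Python. It is only a sketch: the class name, the field names, and the stayed_up_late helper are made up for this example.

```python
from dataclasses import dataclass
from datetime import date, time


@dataclass
class BedtimeRecord:
    """One tracked night, with the same eight items as the list above."""
    day: date
    age: int
    school_night: bool
    at_home: bool
    sleepover: bool
    sick: bool
    bedtime: time          # when you were supposed to go to bed
    actual_bedtime: time   # when you really went to bed

    def stayed_up_late(self) -> bool:
        # Simplification: this compares clock times only, so a night that
        # runs past midnight would need extra handling.
        return self.actual_bedtime > self.bedtime


first_night = BedtimeRecord(
    day=date(2016, 1, 14),
    age=12,
    school_night=True,
    at_home=True,
    sleepover=False,
    sick=False,
    bedtime=time(20, 30),
    actual_bedtime=time(20, 30),
)

print(first_night.stayed_up_late())  # False: you went to bed on time
```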

Super Bedtime Tracking

Now let’s change the way we’re tracking your bedtimes to make it easier to track lots of days.

Let’s take a look at what 3 days look like.

Screen Shot 2016-01-14 at 10.08.09 PM

Great!

Now imagine tracking bedtimes for a whole year—all 365 days. That’s a lot of data!

Clustering the Data

If you had a year’s worth of bedtimes tracked, you could begin to look for patterns. Are certain days better than others for staying up late? Clustering, aka grouping, the data allows you to observe meaningful patterns. Do you see any interesting patterns in the table below? It looks different from the table above since we consolidated the data into 4 groups.

Screen Shot 2016-01-25 at 5.16.46 PM

Group 4 is a winner! Nothing exciting here though—you already know that you get to stay up late during weekend sleepovers.

Screen Shot 2016-01-25 at 5.16.14 PM

Groups 1 and 3 are pretty boring. No staying up late on school nights. Boo!

Group 2 is interesting. Friday is not a school night, but being sick means going to bed early.
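Here is a small Python sketch of the same grouping idea. The sample nights are made up (they are not the data from the tables above), and this is simple grouping by matching answers rather than a full clustering algorithm.

```python
from collections import defaultdict

# Made-up sample nights for illustration only.
# Each entry: (day, school_night, sleepover, sick, minutes_past_bedtime)
nights = [
    ("Mon", True, False, False, 0),
    ("Tue", True, False, False, 0),
    ("Wed", True, False, True, 0),
    ("Fri", False, False, True, 0),
    ("Sat", False, True, False, 90),
    ("Sat", False, True, False, 120),
]

# Put nights with the same answers into the same group.
groups = defaultdict(list)
for day, school_night, sleepover, sick, minutes_late in nights:
    groups[(school_night, sleepover, sick)].append(minutes_late)

# Average minutes past bedtime for each group.
average_late = {key: sum(values) / len(values) for key, values in groups.items()}

for (school_night, sleepover, sick), avg in average_late.items():
    print(f"school night={school_night}, sleepover={sleepover}, sick={sick}: "
          f"{avg:.0f} minutes late on average")
```

With a whole year of real data, the group with the biggest average is the one to pay attention to.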

What did we learn?

If you want to stay up late more often…

  1. Have more sleepovers!
  2. Don’t get sick! The easiest way to avoid colds is to wash your hands!

Leadership

3 tips to improve your leadership skills

What makes a great leader? Here are three things you can do to help improve your leadership skills.

1. Become a better communicator.

Great leaders are effective communicators. If you struggle in this department, it may be time to hone your communication skills. When it comes to effective communication, practice makes perfect. I used to hate public speaking and avoided it at all costs. Everything changed when I joined Toastmasters International. Whether you’re giving impromptu comments or a prepared speech, every meeting provides an opportunity to speak in a supportive environment. Find a club near you to get started.

2. Gain confidence.

Great leaders are confident. Did you know that changing your body language can make you feel more confident? In her TED talk, Amy Cuddy explains how “high-power poses” may affect testosterone and cortisol levels and help you feel more confident, regardless of how you really feel. Try a power pose before your next big meeting or interview.

3. Develop self-awareness.

Great leaders have self-awareness. They understand and reflect on their strengths and weaknesses. They embrace their failures, learn from them, and know that they must do better. You can start to develop self-awareness by taking a free online assessment to better understand your personality, style, and strengths and weaknesses.

Big Data

How Big is Your Data?

Big Data architecture can add layers of complexity to an IT environment. Not sure if you’re dealing with Big Data? Use Gartner’s 3Vs as a litmus test: Volume, Velocity, and Variety.

The 3Vs are a reasonable test of whether you should add Big Data to your architecture.

  1. Volume. The amount of data. How much data are you processing? Some smartwatch manufacturers store every user interaction. For these companies, this might be hundreds of terabytes or petabytes of data.
  2. Velocity. The speed of data. How fast is your data moving in and out of the system? Some data is created and captured in real time at the point of the transaction.
  3. Variety. The assortment of data. Do you have different types of data? Data can consist of text, images, audio, video, and the supporting metadata.

Having only 2 of the 3 Vs may mean that you can avoid adding Big Data to your architecture and skip the added complexity.
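The litmus test can be written down as a tiny Python check. This is only a sketch: the function names are made up, and deciding whether each V counts as "high" for your organization is still a judgment call that the code does not make for you.

```python
def count_vs(high_volume: bool, high_velocity: bool, high_variety: bool) -> int:
    """Count how many of the 3Vs apply to your data."""
    return sum([high_volume, high_velocity, high_variety])


def big_data_candidate(high_volume: bool, high_velocity: bool, high_variety: bool) -> bool:
    """Treat only 3 out of 3 as a signal to consider Big Data architecture."""
    return count_vs(high_volume, high_velocity, high_variety) == 3


# Example: lots of fast-moving data, but all of it is plain text.
print(count_vs(True, True, False))            # 2
print(big_data_candidate(True, True, False))  # False
```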

Leadership

Failure is an option

Traf-O-Data 8008

Have you ever heard of Traf-O-Data 8008? If you answered “no,” you’re not alone. Bill Gates’s first attempt at entrepreneurship was a bit of a flop. Traf-O-Data 8008 was supposed to turn traffic tapes into useful data. Things went so badly that the product didn’t work during a demo for a local county government. Some time later, the State of Washington began offering free traffic processing services, ending the need for Traf-O-Data 8008. Instead of giving up, Bill Gates used his first failure to help lay the foundation for what would become Microsoft.

A Key to Success is Failure.

Failure is an option that can bring growth and resilience. Many of the world’s greatest business executives, athletes, and leaders credit failure as a key to their success.

‘Shark Tank’ investor Kevin O’Leary likes to invest in entrepreneurs who have felt the sting of failure because he believes “they have a better chance in the future.”

Make the failure work for you

  1. Separate yourself from the failure. You are not a failure. Something you tried didn’t work.
  2. Accept the failure and move on. You can’t change the past so stop dwelling on it.  The time and energy you spend talking and thinking about the failure could be used on moving forward.
  3. Think about what the experience has taught you. Use this as an opportunity to learn and grow. What aspects of your approach would you adjust or change?
  4. Shift your focus to something new. It’s time to adjust your focus to something new and exciting. Consider the past failure as an asset.

 

Strategic Planning

Your IT Infrastructure is Dying

Eventually, all of your technology will die. According to manufacturer specifications, every product has a useful service life. It’s the IT Circle of Life, aka IT Asset Management, and should be part of your strategic plan and budget.

I once led technology operations for a company with a small data center and an APC Symmetra power backup unit consisting of 30+ batteries.  A simple audit revealed that 24 of the batteries had surpassed their expected life by at least a year.  At $500 per replacement battery, the potential risk to the budget was $12,000. Thankfully, only six batteries failed that year.

Fortunately, small and medium-sized businesses can take a few simple steps to track IT assets and plan accurately for the future.

  • Develop a system. Consider implementing a system to track all IT assets.  Include details like purchase dates, locations, serial numbers, and warranty information. Free software packages like SpiceWorks make it easy.
  • Set lifecycle policies for your IT assets. Use the manufacturer’s End of Service Life or consider the following averages as a guide to get you started.
  • Ensure that resource and budget allocations are accurate. How many IT assets are going to die next year? Include reasonable replacement cost estimates in your budget.
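The steps above can be sketched in a few lines of Python. The Asset class, the purchase dates, and the four-year service life are hypothetical. Only the counts (30 batteries, 24 of them past their expected life) and the $500 replacement cost echo the battery example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Asset:
    name: str
    purchased: date
    service_life_years: int   # from the manufacturer or your lifecycle policy
    replacement_cost: float

    def end_of_life_year(self) -> int:
        return self.purchased.year + self.service_life_years


def replacement_budget(assets, budget_year):
    """Return the assets at or past end of life by budget_year and their total cost."""
    due = [a for a in assets if a.end_of_life_year() <= budget_year]
    return due, sum(a.replacement_cost for a in due)


# Hypothetical inventory: 24 older batteries and 6 newer ones.
inventory = (
    [Asset(f"UPS battery {i}", date(2010, 6, 1), 4, 500.0) for i in range(1, 25)]
    + [Asset(f"UPS battery {i}", date(2015, 6, 1), 4, 500.0) for i in range(25, 31)]
)

due, total = replacement_budget(inventory, budget_year=2016)
print(len(due), total)  # 24 12000.0
```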

Keep it simple and begin tracking today.

Tag Clouds

Twitter Tag Clouds – Visualizing Popular Hashtags

A tag cloud is a visual representation of text data and is typically made up of single-word tags. The frequency of each tag is usually represented by size or color.

I created the following tag clouds using Twitter’s API and two KNIME workflows. Twitter’s API returned 1379 tweets by searching the #browns hashtag. The Browns are currently in the news for firing both their head coach and GM, so I thought the hashtag would make a good candidate for tag clouds.


 

First Tag Cloud

Tag cloud number 1 is based on common keyword tags found in all 1379 tweets. I stripped usernames and URLs from the tweets before processing them. I then used KNIME’s POS Tagger node to assign parts of speech to each term. The resulting tag cloud highlights nouns in brown, verbs in orange, and adjectives in black. Larger words appear more often in the tweets that were analyzed.
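For readers who prefer code to workflow nodes, here is a rough Python sketch of the cleanup and counting steps. It is not the KNIME workflow: the function name and sample tweets are made up, and it skips part-of-speech tagging entirely. It only shows how stripping usernames and URLs and counting terms produces the frequencies that drive font size.

```python
import re
from collections import Counter


def tag_frequencies(tweets):
    """Count how often each term appears after removing URLs and usernames."""
    counts = Counter()
    for tweet in tweets:
        text = re.sub(r"https?://\S+", " ", tweet)  # strip URLs
        text = re.sub(r"@\w+", " ", text)           # strip usernames
        words = re.findall(r"[a-z']+", text.lower())
        # Skip very short words; a real workflow would use a stop word list.
        counts.update(word for word in words if len(word) > 2)
    return counts


# Made-up sample tweets for illustration.
sample_tweets = [
    "The #Browns fired their coach https://example.com/story",
    "@fan123 Browns fans want a new coach",
]

freq = tag_frequencies(sample_tweets)
print(freq.most_common(2))
```

Because the tokenizer keeps letters only, "#Browns" and "Browns" are counted as the same term.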

POS_TagCloud


Second Tag Cloud

Tag cloud number 2 is based on the same tweets and keyword tags. For this tag cloud, I used KNIME’s Named Entity Tagger node to tag terms as either organizations, locations, or people. The resulting tag cloud highlights people in brown, organizations in orange, and locations in black. Terms in green could not be identified by the tagger. As with the cloud above, the larger the font, the higher the tag frequency.

NLP_NE_TagCloud


Interested in creating your own tag clouds? I’ll have instructions posted soon. Until then, feel free to leave a comment with your Twitter username and the hashtag you’d like analyzed. I’ll tag you with the results.

Cloud, Cost Containment

7 tips to help get your AWS costs under control

Do you feel like your AWS monthly costs are getting out of control? Costs will continue to trend up as your needs grow. With that in mind, here are a few tips to help IT and Finance rein in spending.

  1. Develop a simple metric to track your monthly AWS costs—Cost per X.  Your X can be customers, accounts, transactions—whatever makes sense for your business.  If your Cost per X ever rises substantially, it’s probably worth a deeper discussion.
  2. Tag all instances. Finance and IT should work together to define tags that help with tracking and financial reporting. If applicable, include tags for Customer, Application, and Environment (e.g., development, test, or production). Tags will help you understand if utilization on development and test instances is too high.
  3. Implement a “Tag or Kill” policy to help enforce tagging. Pick one day a week to kill any instances that aren’t tagged.
  4. “Turn off the lights.”  Shut down development and test instances when they aren’t being used. In theory, only production instances should be left on 24/7/365.
  5. Buy EC2 Reserved Instances. The beauty of AWS is that EC2 instances can be turned on or off as needed. This flexibility comes at a premium. If you know you’ll need a certain EC2 instance size 100% of the time, it’s worth buying a Reserved Instance. Reserved Instances require a one- or three-year commitment, with or without an upfront payment, but can save significant money in the long run.
  6. Auto Scale your EC2 instances! AWS Auto Scaling allows you to ramp your EC2 instances up or down based on predefined criteria. Additional instances can be turned on or off as needed. You may be wasting money if your EC2 instances are running at sizes based on peak demand.
  7. Talk about it. Finance and IT should meet regularly to discuss ongoing costs and big swings in the Cost per X, and to modify the strategy as needed.
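Tips 1 through 3 can be sketched in plain Python. This example does not call the AWS API: the instance records, the dollar figures, and the 20% threshold for a "substantial" rise are all assumptions chosen for illustration. In practice, the instance list would come from your own inventory or billing export.

```python
REQUIRED_TAGS = {"Customer", "Application", "Environment"}


def cost_per_x(monthly_cost: float, x_count: int) -> float:
    """Monthly AWS cost divided by your X (customers, accounts, transactions)."""
    if x_count <= 0:
        raise ValueError("x_count must be positive")
    return monthly_cost / x_count


def rose_substantially(previous: float, current: float, threshold: float = 0.20) -> bool:
    """Flag a month-over-month rise larger than the (assumed) threshold."""
    return (current - previous) / previous > threshold


def untagged_instances(instances):
    """Return IDs of instances missing any required tag ("Tag or Kill" candidates)."""
    return [i["id"] for i in instances if not REQUIRED_TAGS <= set(i["tags"])]


# Hypothetical numbers: cost rises faster than the customer count.
last_month = cost_per_x(12000, 400)
this_month = cost_per_x(15000, 410)
print(rose_substantially(last_month, this_month))  # True

# Hypothetical instance records.
instances = [
    {"id": "i-aaa", "tags": {"Customer": "Acme", "Application": "web", "Environment": "production"}},
    {"id": "i-bbb", "tags": {"Environment": "test"}},
]
print(untagged_instances(instances))  # ['i-bbb']
```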
Big Data, Data Science

Using Prescriptive Analytics to Make Better Decisions

Business Analytics is broken down into three distinct phases.

  1. Descriptive – What happened? This phase involves traditional BI tools to help organizations process and report on historical data. Trends are analyzed and decisions are made. The majority of management reporting uses this approach.
  2. Predictive – What will happen? This phase uses machine learning algorithms to build models from historical data and then uses those same models to predict a future outcome or its likelihood.
  3. Prescriptive – What action should be taken? This phase prescribes actions to achieve the best possible outcome based on the predictions made. Actions that lead to the highest chance of success are prescribed.

Prescriptive Analytics predicts and compares the likely outcomes of any number of actions, and then chooses the very best action to help advance an organization’s objectives.
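As a minimal sketch of that idea in Python: score each candidate action with a predictive model, then prescribe the highest-scoring one. The prescribe function, the action names, and the probabilities are hypothetical. A real system would get its scores from a trained model rather than a lookup table.

```python
def prescribe(actions, predict_success):
    """Score every action with the predictive model and pick the best one."""
    scored = {action: predict_success(action) for action in actions}
    best = max(scored, key=scored.get)
    return best, scored


# Stand-in for a trained model: made-up predicted chances of keeping a customer.
predicted = {
    "do nothing": 0.40,
    "send discount": 0.62,
    "call customer": 0.71,
}

best_action, scores = prescribe(predicted.keys(), predicted.get)
print(best_action)  # call customer
```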

Consider the implications for the healthcare industry. Healthcare predictions are most useful when they prescribe a clinical action for each predicted outcome.

Similar insights can help organizations improve decision making and have more control of business outcomes. Prescriptive analytics is an important next step on the path to insight-based actions and recommendations.

Data Science

3 free data tools you never knew you were missing

The right tools can make a world of difference. If you work with data, here are three tools to add to your toolbox.

1. Data Preprocessing

KNIME is an open source data analytics and integration platform. The interface allows you to assemble workflow nodes for data preprocessing (ETL) and data analysis. Modeling and data visualization nodes are also available, but I use other tools for those.

Need to create a monster pivot table? The Pivoting node can handle very large files with ease. I used a dataset in comma-separated values (CSV) format and a simple KNIME workflow to create a pivot table with over 100,000 columns.
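If pivot tables are new to you, this small Python sketch shows what the operation does. It is not KNIME and says nothing about how the Pivoting node works internally. The pivot function and the sample data are made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Made-up CSV data: one row per sale.
csv_text = (
    "region,product,sales\n"
    "East,Apples,10\n"
    "East,Pears,5\n"
    "West,Apples,7\n"
    "East,Apples,3\n"
)


def pivot(rows, row_key, col_key, value_key):
    """Sum value_key into a table with one row per row_key and one column per col_key."""
    table = defaultdict(lambda: defaultdict(float))
    for row in rows:
        table[row[row_key]][row[col_key]] += float(row[value_key])
    return table


rows = csv.DictReader(io.StringIO(csv_text))
table = pivot(rows, "region", "product", "sales")

for region, columns in table.items():
    print(region, dict(columns))
```

Each distinct value of the column key becomes its own column, which is how a pivot table can grow very wide.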

Screen Shot 2015-12-18 at 4.49.32 PM

Download KNIME at knime.org.


2. Data Mining

Weka is a collection of machine learning algorithms that help you complete data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data preprocessing, classification, regression, clustering, association rules, and visualization.

Weka has a large online community and lots of support. The interface is easy to use.

Screen Shot 2015-12-18 at 5.10.46 PM

Weka also provides some great visualizations of your dataset.

Screen Shot 2015-12-18 at 5.11.12 PM

Download Weka here.

Here are a few sample datasets to get you started.


3. Data Visualization

Tableau Public is a free tool to create interactive data stories on the web. It’s available as a service so you can be up and running as soon as you download it. Connect, create, and publish interactive data visualizations directly to your website. No coding required!

Tableau even provides How-to Videos and sample datasets.

Download Tableau Public here.