Tag Clouds

Twitter Tag Clouds – Visualizing Popular Hashtags

A tag cloud is a visual representation of text data, typically made up of single-word tags. The frequency of each tag is usually represented by size or color.

I created the following tag clouds using Twitter’s API and two KNIME workflows. Twitter’s API returned 1379 tweets by searching the #browns hashtag. The Browns are currently in the news for firing both their head coach and GM, so I thought the hashtag would make a good candidate for tag clouds.


First Tag Cloud

Tag cloud number 1 is based on common keyword tags found in all 1379 tweets. I stripped usernames and URLs from the tweets before processing them. I then used KNIME’s POS Tagger node to assign parts of speech to each term. The resulting tag cloud highlights nouns in brown, verbs in orange, and adjectives in black. Larger words appear more often in the tweets that were analyzed.
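The tag cloud above was built in KNIME, but the cleanup and counting steps are easy to try in code. Here is a minimal Python sketch, not the KNIME workflow itself. The sample tweets, the regular expressions, and the function names are all assumptions made up for illustration. It strips usernames and URLs and counts how often each remaining term appears.

```python
import re
from collections import Counter

# Hypothetical sample tweets (the original 1379 tweets are not reproduced here).
tweets = [
    "@fan1 The #browns fired the coach today http://example.com/a",
    "Coach fired again? #Browns https://example.com/b @fan2",
]

# Assumed patterns: @usernames and http(s) URLs are removed before counting.
MENTION_OR_URL = re.compile(r"@\w+|https?://\S+")
WORD = re.compile(r"[a-z']+")


def clean(tweet):
    """Remove @usernames and URLs, then lowercase the text."""
    return MENTION_OR_URL.sub(" ", tweet).lower()


def term_counts(tweets):
    """Count how many times each term appears across all tweets."""
    counts = Counter()
    for tweet in tweets:
        counts.update(WORD.findall(clean(tweet)))
    return counts


counts = term_counts(tweets)
print(counts.most_common(3))
```

The counts are what drive the word sizes in a tag cloud. Part-of-speech tagging, which KNIME's POS Tagger node handles, is left out of this sketch.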


Second Tag Cloud

Tag cloud number 2 is based on the same tweets and keyword tags. For this tag cloud, I used KNIME’s Named Entity Tagger node to tag terms as either organizations, locations, or people. The resulting tag cloud highlights people in brown, organizations in orange, and locations in black. Terms in green could not be identified by the tagger. As with the cloud above, the larger the font, the higher the tag frequency.
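To make the size and color rules concrete, here is a small Python sketch of how a term's frequency and entity type could be turned into a font size and a color. The linear scaling, the point sizes, and the sample counts are assumptions for illustration. KNIME's tag cloud node may scale fonts differently.

```python
# Colors as described for the second tag cloud; untagged terms fall back to green.
CATEGORY_COLORS = {
    "person": "brown",
    "organization": "orange",
    "location": "black",
}
UNKNOWN_COLOR = "green"


def font_size(count, min_count, max_count, min_pt=10, max_pt=48):
    """Scale a count linearly into a font size (assumed point range)."""
    if max_count == min_count:
        return max_pt
    scale = (count - min_count) / (max_count - min_count)
    return round(min_pt + scale * (max_pt - min_pt))


def style_tags(tagged_counts):
    """Map each term to a (font size, color) pair."""
    counts = [count for count, _ in tagged_counts.values()]
    lo, hi = min(counts), max(counts)
    return {
        term: (font_size(count, lo, hi), CATEGORY_COLORS.get(category, UNKNOWN_COLOR))
        for term, (count, category) in tagged_counts.items()
    }


# Hypothetical counts and entity types, not taken from the real tweets.
tagged = {
    "browns": (100, "organization"),
    "cleveland": (40, "location"),
    "fired": (10, None),  # the tagger could not identify this term
}

styled = style_tags(tagged)
print(styled)
```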


Interested in creating your own tag clouds? I’ll have instructions posted soon. Until then, feel free to leave a comment with your Twitter username and the hashtag you’d like analyzed. I’ll tag you with the results.

Data Science

3 free data tools you never knew you were missing

The right tools can make a world of difference. If you work with data, here are three tools to add to your toolbox.

1. Data Preprocessing

KNIME is an open-source data analytics and integration platform. The interface allows you to assemble workflow nodes for data preprocessing (ETL) and data analysis. Modeling and data visualization nodes are also available, but I use other tools for those.

Need to create a monster pivot table? The Pivoting node can handle very large files with ease. I used a dataset in comma-separated values (CSV) format and a simple KNIME workflow to create a pivot table with over 100,000 columns.
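If you have not worked with pivot tables before, this Python sketch shows the basic idea on a tiny CSV. It is not how the KNIME Pivoting node works internally. The column names and the data are made up for illustration. One column supplies the row labels, another supplies the column labels, and a third is summed into each cell.

```python
import csv
import io
from collections import defaultdict

# Hypothetical CSV data; a real file would be opened with open("file.csv", newline="").
raw = io.StringIO(
    "region,product,sales\n"
    "East,apples,10\n"
    "East,pears,5\n"
    "West,apples,7\n"
    "East,apples,3\n"
)


def pivot(rows, row_key, col_key, value_key):
    """Sum value_key into a nested dict keyed by row label, then column label."""
    table = defaultdict(lambda: defaultdict(float))
    for row in rows:
        table[row[row_key]][row[col_key]] += float(row[value_key])
    return table


table = pivot(csv.DictReader(raw), "region", "product", "sales")

for region, cells in table.items():
    print(region, dict(cells))
```

Each distinct value in the column-label field becomes its own column, which is how a pivot table can end up with a very large number of columns.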

Download KNIME at knime.org.

2. Data Mining

Weka is a collection of machine learning algorithms that help you complete data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data preprocessing, classification, regression, clustering, association rules, and visualization.

Weka has a large online community and lots of support. The interface is easy to use.

Weka also provides some great visualizations of your dataset.

Download Weka here.

Here are a few sample datasets to get you started.

3. Data Visualization

Tableau Public is a free tool to create interactive data stories on the web. It’s available as a service so you can be up and running as soon as you download it. Connect, create, and publish interactive data visualizations directly to your website. No coding required!

Tableau even provides how-to videos and sample datasets.

Download Tableau Public here.