Create a Custom Wordcloud with Twitter Data and Python

How to analyze more than 100k tweets with a couple of lines of code


It’s no surprise to anyone that Brazilians love soccer, and I include myself in that group of lovers. My favorite club is Athletico, one of the biggest clubs in the world, at least for me, haha.

As a good fan, I like to stay informed about everything happening with the club, and to do that I use Twitter. It’s a good source of information because things happen fast there: in a couple of minutes a small discussion can turn into a big deal (for better or for worse). That’s where my problem started. I noticed I was spending too much time scrolling on my phone to get some news, so I started thinking about how to improve this process: reduce the time spent on the phone and still get a summary of the whole week’s news and discussions. That’s where a word cloud and the Twitter API can be useful! If this sounds interesting, stay with me and I’ll teach you how to do it!

Note: English is not my mother tongue, so you’ll probably find some grammatical errors in this article. Sorry about that!


What will you need?

Well, you don’t need NASA technology to create a word cloud in Python; actually, it’s pretty simple. The first thing you need is a computer with an internet connection (obviously). You also need a Google account and a Twitter account: the first is to access Google Colab, a Python notebook environment that requires no installation, and the other is to get information from Twitter. But to fetch those tweets and use them to generate information, you need to upgrade your Twitter account to developer level; you can find a tutorial for that here.

With all these tools ready, let’s get started!!!

Importing packages and bringing some data…

First of all, let’s import the packages we’ll use, so please open a new notebook in Google Colab and copy the code below:
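The original gist isn’t embedded here, but the import cell looks roughly like this (a sketch assuming the tweepy and wordcloud packages, which you may need to pip install in Colab first):

```python
# In Colab, install anything that isn't there yet:
# !pip install tweepy wordcloud

import tweepy                               # Twitter API client
import pandas as pd                         # tabular data handling
import numpy as np                          # image mask as an array
import matplotlib.pyplot as plt             # displaying the word cloud
from PIL import Image                       # loading the mask image
from wordcloud import WordCloud, STOPWORDS  # the cloud itself + stop words
```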

With those packages ready, let’s configure our Twitter dev credentials (to get those keys, please follow the steps in the article I mentioned before):
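The credentials cell is just four strings; these are placeholders, not real keys:

```python
# Placeholder strings -- replace them with the keys and tokens from
# your own Twitter developer dashboard (and keep them out of public code!).
consumer_key = "YOUR_CONSUMER_KEY"
consumer_secret = "YOUR_CONSUMER_SECRET"
access_token = "YOUR_ACCESS_TOKEN"
access_token_secret = "YOUR_ACCESS_TOKEN_SECRET"
```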

Now we need to use the credentials and test whether they’re working. First, let’s set the variables we’ll use:

Now that we’re done with the variables, it’s time to configure our search subject; please copy and paste the code below:
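A sketch of what that search cell probably looks like, using tweepy’s Cursor (I’m assuming tweepy v3’s api.search here, which was renamed api.search_tweets in v4; preview_tweets is my own helper name):

```python
# The first word is the search keyword; -filter:retweets drops retweets
# so we don't collect the same text many times.
query = "Athletico -filter:retweets"

def preview_tweets(api, query, n=10):
    """Print the first n matching tweets as a sanity check."""
    import tweepy  # assumed installed; see the imports cell
    # tweet_mode="extended" returns the full, untruncated text
    for tweet in tweepy.Cursor(api.search, q=query,
                               tweet_mode="extended").items(n):
        print(tweet.full_text)

# With the authenticated `api` object from the previous step:
# preview_tweets(api, query)
```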

Let me explain something: inside the variable named “query”, the first word (in my code, Athletico) is the keyword you want to search for. After the keyword, I set a filter to exclude retweets because I don’t want duplicate tweets in my query; it’s optional :)

Run it… wait… wait… wait… MOMENT OF TRUTH!


Hope it’s working on your side. If everything is OK, you’ll be able to see 10 tweets about the subject you’re looking for. Awesome, we’re almost there!

Now it’s time to store this data somewhere. To do that, we need to create an empty dictionary and fill it with the JSON structure provided by Twitter. Don’t worry, it’s pretty simple; follow the code:
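The exact fields depend on what you want to keep from each tweet’s JSON; a minimal version holding the user, date, and text might be:

```python
# One list per field we want to keep from each tweet's JSON payload;
# each list will be appended to as tweets come in.
tweets_dict = {"user": [], "date": [], "text": []}
print(tweets_dict)
```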

If you run the variable tweets_dict alone, you’ll probably find something like this:

Image from the author

That’s awesome: now our structure to accommodate the data is done! So let’s bring in some data.

Basically, we need to create a function that interacts with the Twitter cursor, gets the data, and stores it in our dictionary. It’s worth saying that the first four lines of this code are the same as in step 4; you can change the keyword and the number of tweets you want to read. I set it to 100k for no particular reason, haha.
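A sketch of that function, assuming the user/date/text dictionary from before (store_tweets and number_of_tweets are my names; the live call is commented out because it needs the authenticated api object and query from the earlier steps):

```python
number_of_tweets = 100_000  # arbitrary cap, as in the article

tweets_dict = {"user": [], "date": [], "text": []}

def store_tweets(statuses, tweets_dict):
    """Append each tweet's fields to the dictionary of lists."""
    for tweet in statuses:
        tweets_dict["user"].append(tweet.user.screen_name)
        tweets_dict["date"].append(tweet.created_at)
        tweets_dict["text"].append(tweet.full_text)
    return tweets_dict

# Against the live API (same query setup as the sanity-check step;
# expect this to take a while for 100k tweets):
# statuses = tweepy.Cursor(api.search, q=query,
#                          tweet_mode="extended").items(number_of_tweets)
# store_tweets(statuses, tweets_dict)
```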

So, if you followed all the steps exactly as I described, it will take some time to get all the data we need. Don’t worry, your computer will not explode.


Wow, that took a while… Now we have all 100k tweets stored in a dictionary, so let’s transform it into a DataFrame, which is much easier to handle when it comes to data analysis.
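Since tweets_dict maps column names to equal-length lists, pandas converts it directly; here with a tiny stand-in dictionary so the snippet runs on its own:

```python
import pandas as pd

# Stand-in for the real 100k-tweet dictionary collected above
tweets_dict = {
    "user": ["fan_1", "fan_2"],
    "date": ["2022-01-01", "2022-01-02"],
    "text": ["Great match by Athletico!", "What a goal tonight!"],
}

df = pd.DataFrame(tweets_dict)  # one row per tweet, one column per field
print(df.head())
```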

After running this piece of code, you’ll probably see something like this.

Image from the author

Anyone who likes data analysis has noticed by now that what we just did is better than gold: you can do a lot with the DataFrame you just built.

Let’s get back to our focus, the word cloud. From this DataFrame we just need all the texts; the code below extracts them, creates one big string, and counts how many words we have, so we can conclude our project.
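That extraction can be as short as a join over the text column (again with stand-in data so the snippet runs standalone):

```python
import pandas as pd

df = pd.DataFrame({"text": ["Great match by Athletico!",
                            "What a goal tonight!"]})  # stand-in tweets

all_text = " ".join(df["text"])     # every tweet glued into one big string
word_count = len(all_text.split())  # rough whitespace-based word count
print(f"{word_count} words collected")
```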

We are now able to create our beautiful customized word cloud.

Before you run it, let me explain some parts of the code:

  1. The stop words are necessary to remove words like ‘the’, ‘for’, ‘with’, etc. You can also customize the list with the stopwords.update clause;
  2. The variable fura_color is where you set your image path, so you’ll need to change it. I chose a hurricane image colored red and black because “Hurricane” is Athletico’s nickname and red and black are our colors :)
  3. The variable max_words is where you select the maximum number of words that will appear in your visualization.

After all these steps, you’ll get something like this beautiful word cloud:

Image from the author

Well, that’s it for today, my fellow fans. Hope you liked it!


If you liked it, follow me on LinkedIn!
