Create a Custom Wordcloud with Twitter Data and Python
How to analyze more than 100k tweets with a couple of lines of code
It’s no surprise to anyone that Brazilians love soccer, and I count myself among those lovers. My favorite club is Athletico, one of the biggest clubs in the world, at least for me haha.
As a good fan, I like to stay informed about everything that is happening with the club, and for that I use Twitter. It’s a good source of information because things move fast there: in a couple of minutes a small discussion can turn into a big deal (for good or bad). That is where my problem started. I noticed I was spending too much time scrolling on my phone to catch some news, so I began to think about how to improve the process: spend less time on the phone and still get a summary of the whole week’s news and discussions. This is where a word cloud and the Twitter API can be useful! If that sounds interesting, stay with me and I’ll teach you how to do it!
Note: English is not my mother tongue, so you’ll probably find some grammatical errors in the article. Sorry about that!
What will you need?
Well, you don’t need NASA technology to create a word cloud in Python; it’s actually pretty simple. The first thing you need is a computer with an internet connection (obviously). You also need a Google account and a Twitter account. The first gives you access to Google Colab, a Python notebook environment that needs no installation, and the second lets you get information from Twitter. To fetch tweets through the API, though, you need to upgrade your Twitter account to a developer account; you can find a tutorial on how to do it here.
With all these tools ready, let’s get started!
Importing packages and bringing some data…
First of all, let’s import the packages we’ll use. Open a new notebook in Google Colab and copy the code below:
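The original import cell is not shown here, so the package list below is an assumption based on what the rest of the walkthrough uses (tweepy for the Twitter API, pandas for the data frame, and wordcloud, numpy, Pillow and matplotlib for the image). This is a minimal sketch that checks which of them are available before you go on; `REQUIRED` and `missing_packages` are names made up for this example.

```python
import importlib.util

# Assumed package list: importable module name -> name used with pip.
REQUIRED = {
    "tweepy": "tweepy",
    "pandas": "pandas",
    "numpy": "numpy",
    "PIL": "Pillow",
    "wordcloud": "wordcloud",
    "matplotlib": "matplotlib",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of the modules that cannot be found."""
    return [
        pip_name
        for module, pip_name in required.items()
        if importlib.util.find_spec(module) is None
    ]


missing = missing_packages()
if missing:
    print("Install first: pip install " + " ".join(missing))
else:
    print("All packages found.")

# The later cells then import them as:
#   import tweepy
#   import pandas as pd
#   import numpy as np
#   from PIL import Image
#   from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator
#   import matplotlib.pyplot as plt
```

In a Colab cell the install command needs a leading `!`.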
With those packages ready, let’s configure our Twitter dev credentials (to get those keys, follow the steps from the article I mentioned before):
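A minimal sketch of the credentials cell. The four variable names and the environment variable names are assumptions for illustration; the placeholder strings must be replaced with the keys from your own developer account.

```python
import os

# Read each key from an environment variable if it is set (hypothetical
# names), otherwise fall back to a placeholder you replace by hand.
consumer_key = os.environ.get("TWITTER_CONSUMER_KEY", "YOUR_CONSUMER_KEY")
consumer_secret = os.environ.get("TWITTER_CONSUMER_SECRET", "YOUR_CONSUMER_SECRET")
access_token = os.environ.get("TWITTER_ACCESS_TOKEN", "YOUR_ACCESS_TOKEN")
access_token_secret = os.environ.get(
    "TWITTER_ACCESS_TOKEN_SECRET", "YOUR_ACCESS_TOKEN_SECRET"
)
```

Keeping the keys out of the notebook itself makes it safer to share the notebook later.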
Now we need to use the credentials and test whether they work. First, let’s set the variables we’ll use:
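A minimal sketch of this step, assuming tweepy with OAuth 1.0a user authentication. `build_api` and `credentials_work` are helper names made up for this example, not part of tweepy.

```python
def build_api(consumer_key, consumer_secret, access_token, access_token_secret):
    """Create an authenticated tweepy API object from the four keys."""
    # Imported here so the helpers can be defined before tweepy is installed.
    import tweepy

    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    # wait_on_rate_limit makes tweepy sleep instead of failing when the
    # rate limit is reached.
    return tweepy.API(auth, wait_on_rate_limit=True)


def credentials_work(api):
    """Return True if Twitter accepts the credentials, False otherwise."""
    try:
        # Depending on the tweepy version, bad credentials either return a
        # falsy value or raise an exception; both cases end up as False.
        return bool(api.verify_credentials())
    except Exception as error:
        print("Authentication failed:", error)
        return False


# Usage (needs network access and real keys):
# api = build_api(consumer_key, consumer_secret, access_token, access_token_secret)
# print(credentials_work(api))
```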
Now that we’re done with the variables, it’s time to configure our search subject. Copy and paste the code below:
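A minimal sketch of the search cell, assuming an authenticated tweepy `api` object and the standard search endpoint. `preview_tweets` is a helper name made up for this example; the search method is called `search_tweets` in newer tweepy versions and `search` in older ones, so the sketch accepts either.

```python
# Keyword first, then the operator that excludes retweets.
query = "Athletico -filter:retweets"


def preview_tweets(api, query, count=10):
    """Print and return the text of the first `count` tweets for `query`."""
    # Imported here so the function can be defined before tweepy is installed.
    import tweepy

    search = getattr(api, "search_tweets", None) or api.search
    # tweet_mode="extended" asks for the full, untruncated text.
    cursor = tweepy.Cursor(search, q=query, tweet_mode="extended").items(count)

    texts = []
    for tweet in cursor:
        print(tweet.full_text)
        texts.append(tweet.full_text)
    return texts


# Usage (needs network access):
# preview_tweets(api, query)
```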
Let me explain something: the first word inside the variable named “query” (Athletico in my code) is the keyword you want to search for. After the keyword, I added a filter to exclude retweets because I don’t want duplicated tweets in my results. It’s optional :)
Run it… wait… wait… wait… MOMENT OF TRUTH!
Hope it’s working on your side. If everything is OK, you’ll see 10 tweets about the subject you’re looking for. Awesome, we’re almost there!
Now it’s time to store this data somewhere. To do that, we need to create an empty dictionary and fill it with the JSON structure provided by Twitter. Don’t worry, it’s pretty simple. Follow the code:
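A minimal sketch of the empty structure. The field list is my own selection of attributes from Twitter’s tweet JSON, not necessarily the one used in the original cell, and `FIELDS` is a name made up for this example.

```python
# Attributes to keep from each tweet (an assumed selection).
FIELDS = ["created_at", "id", "full_text", "retweet_count", "favorite_count"]

# One empty list per field; every tweet later appends one value to each list.
tweets_dict = {field: [] for field in FIELDS}

print(tweets_dict)
```

A dict comprehension is used instead of `dict.fromkeys(FIELDS, [])` because the latter would make every key share the same list.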
If you run the variable tweets_dict on its own, you’ll probably see something like this:
Awesome, the structure to hold our data is ready! So let’s bring in some data.
Basically, we need a function that interacts with the Twitter cursor, gets the data, and stores it in our dictionary. Note that the first four lines of this code are the same as in step 4; you can change the keyword and the number of tweets you want to read. I set it to 100k for no particular reason haha.
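A minimal sketch of such a function, assuming an authenticated tweepy `api` object and a dictionary with one list per field. `FIELDS`, `fetch_tweets` and `collect_tweets` are names made up for this example, and the field list is an assumed selection.

```python
FIELDS = ["created_at", "id", "full_text", "retweet_count", "favorite_count"]


def fetch_tweets(api, query, limit):
    """Return an iterator over up to `limit` tweets matching `query`."""
    # Imported here so the functions can be defined before tweepy is installed.
    import tweepy

    # The method is `search_tweets` in newer tweepy versions, `search` in older ones.
    search = getattr(api, "search_tweets", None) or api.search
    return tweepy.Cursor(search, q=query, tweet_mode="extended").items(limit)


def collect_tweets(statuses, store, fields=FIELDS):
    """Append the chosen attributes of every status to the lists in `store`."""
    for status in statuses:
        for field in fields:
            # A missing attribute is stored as None so all lists keep the same length.
            store[field].append(getattr(status, field, None))
    return store


# Usage (needs network access; 100k tweets take a long while because the
# API is rate limited):
# tweets_dict = {field: [] for field in FIELDS}
# collect_tweets(fetch_tweets(api, "Athletico -filter:retweets", 100_000), tweets_dict)
```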
If you followed all the steps exactly as described, it will take some time to get all the data we need. Don’t worry, your computer will not explode.
Wow, that took a long time… Now all our 100k tweets are stored in a dictionary. Let’s transform it into a data frame, which is easier to handle for data analysis.
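A minimal sketch of the conversion, assuming pandas and a dictionary whose values are lists of equal length. `to_dataframe` is a helper name made up for this example.

```python
def to_dataframe(store):
    """Turn a dict of equal-length lists into a pandas DataFrame."""
    # Imported here so the function can be defined before pandas is installed.
    import pandas as pd

    # Each key becomes a column, each list position becomes a row.
    return pd.DataFrame.from_dict(store)


# Usage:
# df = to_dataframe(tweets_dict)
# df.head()
```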
After running this piece of code, you’ll probably see something like this:
If you like data analysis, you’ve probably noticed that what we just did is better than gold: you can do a lot of things with the data frame you just built.
Let’s get back to our focus, the word cloud. From this data frame we only need the texts. The code below extracts the text, creates one big string, and counts how many words we have, which is all we need to conclude our project.
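A minimal sketch of this step. It works on any iterable of strings, so in the notebook you would pass the text column of the data frame (assumed here to be called `full_text`). `build_corpus` is a helper name made up for this example, and the two sample tweets are invented.

```python
def build_corpus(texts):
    """Join all texts into one big string and count its words."""
    all_text = " ".join(str(text) for text in texts)
    # Words are counted by splitting on whitespace.
    return all_text, len(all_text.split())


# Small invented sample; in the notebook: build_corpus(df["full_text"])
sample = ["Vamos Furacão!", "Athletico joga hoje"]
all_text, n_words = build_corpus(sample)
print(f"There are {n_words} words in all tweets combined.")
```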
We are now able to create our beautiful customized word cloud.
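A minimal sketch of the word cloud cell, assuming the wordcloud, numpy, Pillow and matplotlib packages. `fura_color` and `max_words` follow the names used in this article; `build_stopwords`, `make_wordcloud`, the image file name and the extra stop words are assumptions for illustration. The stop word list shipped with wordcloud is English, so for tweets in Portuguese you will likely want to add words of your own.

```python
def build_stopwords(base, extra=()):
    """Return a new set with `base` plus the lower-cased `extra` words."""
    words = set(base)
    words.update(word.lower() for word in extra)
    return words


def make_wordcloud(text, image_path, max_words=200, extra_stopwords=()):
    """Draw a word cloud shaped and colored like the image at `image_path`."""
    # Third-party imports live here so build_stopwords works without them.
    import matplotlib.pyplot as plt
    import numpy as np
    from PIL import Image
    from wordcloud import STOPWORDS, ImageColorGenerator, WordCloud

    # The image as an RGB array; pure white areas are left empty.
    fura_color = np.array(Image.open(image_path).convert("RGB"))
    stopwords = build_stopwords(STOPWORDS, extra_stopwords)

    cloud = WordCloud(
        stopwords=stopwords,
        mask=fura_color,
        background_color="white",
        max_words=max_words,
    ).generate(text)

    # Repaint each word with the colors of the image region underneath it.
    cloud.recolor(color_func=ImageColorGenerator(fura_color))

    plt.figure(figsize=(10, 10))
    plt.imshow(cloud, interpolation="bilinear")
    plt.axis("off")
    plt.show()
    return cloud


# Usage ("hurricane.png" is a hypothetical file name):
# make_wordcloud(all_text, "hurricane.png", max_words=500,
#                extra_stopwords=["https", "co", "de", "que"])
```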
Before you run it, let me explain some parts of the code:
- The stop words are needed to remove words like ‘the’, ‘for’, ‘with’, etc. You can also customize the list with stopwords.update;
- The variable fura_color is where you set the path to your image, so you need to change it. I chose a hurricane image colored in red and black because Hurricane is Athletico’s nickname and red and black are our colors :)
- The variable max_words is where you set the maximum number of words that will appear in your visualization.
After all these steps you’ll get something like this beautiful word cloud:
Well, that’s it for today, my friends. Hope you liked it!
If you liked it, follow me on LinkedIn!