When you’re creating a data visualization, one of the first things you do is identify what data you’ll be using. Once you’ve landed on a topic, you need to find a data set that fits your needs, clean it up if necessary, and put it in the proper format for your project. In this post, I’ll be explaining the process that I use while coding with Python.
Say you want to make a visualization about child mortality rates in various countries. There are all sorts of data sources on the Internet where you could get this information, but I’m using the World Bank in this example.
When you follow this link, you’ll be taken to the page pictured below, where you’ll be able to choose your desired indicators (country, time period, and actual data set, i.e. child mortality rates). For example, I could select the United States as the country, the past 20 years as the time period, and the number of air passengers carried by U.S. airlines as the data set.
Once you’ve selected all your desired indicators, click the button to the right of the indicator menu to apply your changes. Your selected data will then be displayed.
Above the selected data, there will be a ‘Download options’ button, pictured below. Click this button and download your data as a CSV file.
You’ll then be able to access your CSV file in your Downloads folder. Your CSV file is now ready to go into Google Sheets.
Create a Google Sheet. Select File>Import>Upload>Replace current sheet and upload your CSV. It will then be dropped into the sheet and look something like this.
You can then clean up your data. This means deleting any blank fields and unnecessary titles, and formatting the data in whatever way you need it. (The article below will help you out with quickly deleting blank fields if you have a large number of them.)
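If you would rather do part of this cleanup in Python than by hand, here is a minimal sketch using only the standard library. It trims whitespace and drops rows and columns that are completely blank; it does not decide which titles are unnecessary, so that part is still up to you. The `clean_rows` helper and the sample text are my own placeholders for illustration, not real World Bank data or the exact layout of the download.

```python
import csv
import io


def clean_rows(rows):
    """Drop fully blank rows and columns, and trim whitespace in each cell."""
    # Trim whitespace and discard rows in which every cell is empty.
    trimmed = [[cell.strip() for cell in row] for row in rows]
    non_blank = [row for row in trimmed if any(row)]
    if not non_blank:
        return []

    # Pad short rows so every row has the same number of columns.
    width = max(len(row) for row in non_blank)
    padded = [row + [""] * (width - len(row)) for row in non_blank]

    # Keep only the columns that contain at least one non-empty cell.
    keep = [i for i in range(width) if any(row[i] for row in padded)]
    return [[row[i] for i in keep] for row in padded]


# Hypothetical sample standing in for a downloaded CSV; not real data.
sample = (
    "Country Name,Series Name,,2019\n"
    ",,,\n"
    "Exampleland,Placeholder series,,1.5\n"
)

cleaned = clean_rows(csv.reader(io.StringIO(sample)))
for row in cleaned:
    print(row)
```

To try this on your own file, you could swap the `io.StringIO(sample)` part for a file opened with `open(path, newline="")` and write the result back out with `csv.writer` before uploading it to Google Sheets.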