We have gone through the recent Medium posts on Python visualization and put together the best ones — with the hope that it will make it easier for you to explore them for your next project!
Amazing articles on data visualization are published on Medium every day. The sheer volume can cause information overload, but it shouldn't stop you from exploring them, because each one can teach you new techniques for creating effective visualizations for your projects.
To help, I have gone through the recent Medium posts on Python visualization and put together the best ones, in the hope that this makes it easier for you to explore them yourself. I've submitted these to the Datapane gallery, which is hosting them for us.
If you don't know Datapane already, it is an open-source framework for people who analyze data in Python and need a way to share their results. Datapane hosts a free public platform with a gallery and community of people who share and collaborate on Python data visualization techniques.
In this article, I will use plots in the gallery as examples to show what makes a plot effective, introduce different kinds of plots, and explain how to create them yourself!
In this article, Saptashwa Bhattacharyya combines SVM, PCA, and grid-search cross-validation in a pipeline that finds the best parameters for binary classification. He then plots a decision boundary to show how well the algorithm has performed.
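To make the idea concrete, here is a minimal sketch of such a pipeline in scikit-learn. It is not the author's code: the dataset (scikit-learn's bundled breast cancer data), the two PCA components, and the parameter grid are all assumptions chosen for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: the bundled breast cancer dataset stands in for the article's data.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Scale, reduce to two principal components, then classify with an RBF SVM.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=2)),
    ("svm", SVC(kernel="rbf")),
])

# Illustrative grid only; the values are not taken from the article.
param_grid = {
    "svm__C": [0.1, 1, 10, 100],
    "svm__gamma": [0.001, 0.01, 0.1, 1],
}

# 5-fold cross-validation over every combination in the grid.
search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X_train, y_train)

best_params = search.best_params_
test_accuracy = search.score(X_test, y_test)
print(best_params, round(test_accuracy, 3))
```

Putting the scaler and PCA inside the pipeline means they are refit on each training fold, so the cross-validation scores are not contaminated by the held-out fold.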
Joint plot: A joint plot is really helpful for showing both the distribution of, and the relationship between, two variables in one plot. The darker the hexagon, the more points (observations) fall in that region.
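A hexagonal joint plot can be sketched with plain matplotlib as below. The data is synthetic and the function name `hex_joint_plot` is hypothetical. If you use seaborn, `jointplot` with `kind="hex"` produces a similar figure in one call.

```python
import numpy as np
import matplotlib.pyplot as plt


def hex_joint_plot(x, y, gridsize=25):
    """Hexbin of (x, y) with marginal histograms on the top and right."""
    fig = plt.figure(figsize=(6, 6))
    grid = fig.add_gridspec(
        2, 2, width_ratios=(4, 1), height_ratios=(1, 4), wspace=0.05, hspace=0.05
    )
    ax_joint = fig.add_subplot(grid[1, 0])
    ax_top = fig.add_subplot(grid[0, 0], sharex=ax_joint)
    ax_right = fig.add_subplot(grid[1, 1], sharey=ax_joint)

    # Darker hexagons hold more observations; empty hexagons are not drawn.
    hexes = ax_joint.hexbin(x, y, gridsize=gridsize, cmap="Blues", mincnt=1)
    ax_top.hist(x, bins=gridsize)
    ax_right.hist(y, bins=gridsize, orientation="horizontal")
    ax_top.tick_params(labelbottom=False)
    ax_right.tick_params(labelleft=False)
    return fig, hexes


# Synthetic, positively related data purely for illustration.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = 0.8 * x + rng.normal(scale=0.5, size=500)
fig, hexes = hex_joint_plot(x, y)
```

Call `plt.show()` or `fig.savefig(...)` afterwards to view the result.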
Contour plot and Kernel density estimation: KDE (Kernel density estimation) is a useful statistical tool that lets you create a smooth curve given a set of data. This can be useful if you want to visualize just the “shape” of some data (instead of the discrete histogram).
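To see what KDE does under the hood, here is a small NumPy sketch of a one-dimensional Gaussian KDE. The function name `gaussian_kde_1d`, the bandwidth rule (a common normal-reference rule of thumb), and the synthetic samples are my own choices for illustration. In practice you would likely reach for a library implementation.

```python
import numpy as np


def gaussian_kde_1d(samples, grid, bandwidth=None):
    """Estimate a density on `grid` by averaging Gaussian bumps centred on each sample."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = samples.size
    if bandwidth is None:
        # Normal-reference rule of thumb; an assumption, not the only valid choice.
        bandwidth = 1.06 * samples.std(ddof=1) * n ** (-1 / 5)
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return kernels.sum(axis=1) / (n * bandwidth)


# Synthetic two-peaked sample purely for illustration.
rng = np.random.default_rng(0)
samples = np.concatenate([rng.normal(-2, 0.5, 300), rng.normal(1, 1.0, 700)])
grid = np.linspace(-6, 6, 601)
density = gaussian_kde_1d(samples, grid)

# A density should integrate to roughly 1 over the grid.
total_area = density.sum() * (grid[1] - grid[0])
```

A smaller bandwidth gives a spikier curve that follows individual points, while a larger one smooths the peaks together.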
Pair plots: Pair plots make it much easier to compare the correlation among different pairs of variables. In the plot below, you can see that the mean area has a strong correlation with the mean radius. The difference in color also helps you see how each label behaves in each pair. It is really clear!
Contour plot of SVM: This contour plot shows how likely it is that a point lying in a given region actually belongs to that region's class. I especially like the contour plot below because I can see in which regions the malignant cells, benign cells, and support vectors lie, and it helps me understand how SVM works.
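A contour plot of this kind can be sketched as follows. This is an illustration under assumptions, not the author's code: it uses scikit-learn's bundled breast cancer data, only the first two features (mean radius and mean texture), and default-like SVM settings. The probabilities come from `SVC(probability=True)`, which fits a calibration step on top of the SVM, so treat them as estimates.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

data = load_breast_cancer()
# Assumption: two features keep the plot two-dimensional.
X = StandardScaler().fit_transform(data.data[:, :2])  # mean radius, mean texture
y = data.target  # 0 = malignant, 1 = benign

model = SVC(kernel="rbf", C=1.0, gamma="scale", probability=True, random_state=0)
model.fit(X, y)

# Evaluate the estimated probability of "benign" on a regular grid.
xx, yy = np.meshgrid(
    np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 200),
    np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 200),
)
grid_points = np.c_[xx.ravel(), yy.ravel()]
prob_benign = model.predict_proba(grid_points)[:, 1].reshape(xx.shape)

fig, ax = plt.subplots()
contour = ax.contourf(xx, yy, prob_benign, levels=10, cmap="RdBu", alpha=0.8)
fig.colorbar(contour, ax=ax, label="Estimated P(benign)")
ax.scatter(X[:, 0], X[:, 1], c=y, cmap="RdBu", edgecolors="k", s=15)
# Circle the support vectors so their position relative to the boundary is visible.
ax.scatter(*model.support_vectors_.T, facecolors="none", edgecolors="yellow",
           s=60, label="support vectors")
ax.set_xlabel("mean radius (standardized)")
ax.set_ylabel("mean texture (standardized)")
ax.legend()
```

The support vectors tend to cluster where the two classes overlap, which is exactly where the contour colors change most quickly.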
3D SVM plot: Even though we often see 2D SVM plots, most of the time, data is multi-dimensional. Seeing this 3D plot is really helpful to understand how SVM works in multi-dimensional space.
A good plot is not only beautiful but also conveys the right message to viewers. Without the right proportions in the graph, viewers may misread the message.
Plots that show change over time
This plot effectively represents each continent's share of the world's population at a single point in time:
But how do you create the plots to show the change of the proportions of population of different continents over time?
The author shows that you can do exactly that with the plots below.
As you can see from the plots above, the time dimension is added to the plot. Now the plots show not only the distribution but also how the overall number and the distribution change over time! Neat!
As the data points grow across both dimensions, it becomes harder to read either a bar chart or a stacked bar chart, because the bars become too small to deliver any meaningful information.
That is why it is clever of the author to use a bubble chart, an effective way to see many more data points in one chart but still clearly show the change in proportion over time.
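One way to sketch a bubble chart of proportions over time is with matplotlib's `scatter`, as below. The groups and numbers are random placeholders, not real population figures, and the layout (time on the x-axis, one row per group, bubble area proportional to share) is an assumption about how such a chart could be built rather than a reproduction of the author's plot.

```python
import numpy as np
import matplotlib.pyplot as plt

years = np.arange(1960, 2021, 10)
groups = ["Group A", "Group B", "Group C", "Group D"]

# Placeholder values that grow over time; NOT real data.
rng = np.random.default_rng(1)
values = rng.uniform(50, 500, size=(len(groups), len(years))).cumsum(axis=1)

# Share of each group within each year (columns sum to 1).
shares = values / values.sum(axis=0, keepdims=True)

fig, ax = plt.subplots(figsize=(8, 3))
for row, name in enumerate(groups):
    # Bubble area is proportional to the group's share in that year.
    ax.scatter(years, np.full(len(years), row), s=shares[row] * 3000, alpha=0.6)
ax.set_yticks(range(len(groups)))
ax.set_yticklabels(groups)
ax.set_xlabel("Year")
ax.set_title("Share by group over time (placeholder data)")
```

Because every year gets one bubble per group, adding more years or groups extends the grid without shrinking any single mark the way extra bars would.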
You might know how a heatmap is used to show the correlation between different variables in the data, but what if you want to get more insight out of the number .71 in the heatmap? In this article, Paul Hiemstra shows how you can do that by combining a heatmap and a 2D histogram to explore the structure of a weather dataset.
Linked plots with Altair
Heatmaps and 2D histograms are both effective at showing correlation, but they are even more effective when combined. The linked plots below do just that.
As you click each square in the heatmap, you will see a 2D histogram of that pair of variables on the right-hand side!
What does a correlation of .98 look like in a 2D histogram? We expect a linear relationship, and the plot on the right confirms it. In contrast, there seems to be no pattern in the 2D histogram for a correlation of .12. Very intuitive and easy to understand.
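You can reproduce the contrast between a strong and a weak correlation with synthetic data. The sketch below is a static matplotlib version, not the linked Altair chart from the article. The helper `correlated_pair` is hypothetical and draws from a bivariate normal distribution with a chosen correlation.

```python
import numpy as np
import matplotlib.pyplot as plt


def correlated_pair(rho, n=5000, seed=0):
    """Draw n points from a bivariate normal with correlation rho."""
    rng = np.random.default_rng(seed)
    cov = [[1.0, rho], [rho, 1.0]]
    return rng.multivariate_normal([0.0, 0.0], cov, size=n).T


fig, axes = plt.subplots(1, 2, figsize=(8, 4))
sample_corrs = []
for ax, rho in zip(axes, (0.98, 0.12)):
    x, y = correlated_pair(rho)
    # The single number a heatmap cell would show for this pair.
    r = np.corrcoef(x, y)[0, 1]
    sample_corrs.append(r)
    # The 2D histogram reveals the structure behind that number.
    ax.hist2d(x, y, bins=40, cmap="viridis")
    ax.set_title(f"r = {r:.2f}")
```

The left panel collapses onto a narrow diagonal band, while the right panel is close to a round blob, even though both are summarized by one number in a heatmap.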
How do you visualize a network with different sources of inflow and outflow? For example, which services are the government's revenues, such as taxes and utilities, spent on? And how does the expenditure on one service compare with the expenditure on the others?
If you click on each flow in the diagram, you can see clearly which services the revenues are spent on. If you look solely at the nodes on the left-hand side, you can compare the proportions of the different revenues. And you can do the same thing with the nodes on the right-hand side. This technique is particularly useful for mapping processes such as a sales pipeline or the paths visitors take on your website. It is amazing how one diagram can convey so much information.
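Flow diagrams like this are commonly drawn as Sankey diagrams. Assuming that is the chart type here, the sketch below shows the main preparation step: turning named flows into the node labels and integer indices that Plotly's `go.Sankey` expects. The helper names and the budget numbers are hypothetical, and Plotly is imported only inside the drawing function so the data preparation runs without it.

```python
def build_sankey_inputs(flows):
    """Turn (source, target, value) triples into node labels and index lists."""
    labels = []
    index = {}
    for source, target, _ in flows:
        for name in (source, target):
            if name not in index:
                index[name] = len(labels)
                labels.append(name)
    return {
        "labels": labels,
        "source": [index[s] for s, _, _ in flows],
        "target": [index[t] for _, t, _ in flows],
        "value": [v for _, _, v in flows],
    }


def sankey_figure(flows):
    """Build a Plotly Sankey figure; requires plotly to be installed."""
    import plotly.graph_objects as go

    parts = build_sankey_inputs(flows)
    return go.Figure(go.Sankey(
        node=dict(label=parts["labels"], pad=15, thickness=20),
        link=dict(source=parts["source"], target=parts["target"], value=parts["value"]),
    ))


# Hypothetical revenue-to-service flows; the numbers are made up for illustration.
flows = [
    ("Taxes", "Education", 40),
    ("Taxes", "Health", 35),
    ("Utilities", "Health", 10),
    ("Utilities", "Transport", 15),
]
parts = build_sankey_inputs(flows)
```

Calling `sankey_figure(flows).show()` renders the diagram, with link widths proportional to the values.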
Animated Bar Chart
The most common way to see a bar chart change over time is to use a slider that you drag yourself. But the speed at which you drag varies, so you will not see the change unfold at a constant rate. That is why the animated bar chart is so effective.
Click the play button on the left-hand side to see how the bar chart changes over time! Now you can clearly see how COVID-19 affects different social groups differently in different time periods.
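The chart in the gallery is interactive, and I have not reproduced it here. As a rough sketch of the same idea, matplotlib's `FuncAnimation` can step a bar chart through time at a fixed interval. The groups and values below are random placeholders, not COVID-19 data. If you prefer Plotly, `plotly.express.bar` accepts an `animation_frame` argument that adds a play button.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

groups = ["Group A", "Group B", "Group C"]

# Placeholder values for 12 time steps; NOT real data.
rng = np.random.default_rng(2)
frames = rng.integers(10, 100, size=(12, len(groups)))

fig, ax = plt.subplots()
bars = ax.bar(groups, frames[0])
ax.set_ylim(0, frames.max() * 1.1)  # fixed axis so heights are comparable across frames


def update(step):
    """Set every bar to its height at the given time step."""
    for bar, height in zip(bars, frames[step]):
        bar.set_height(height)
    ax.set_title(f"Time step {step + 1} of {len(frames)}")
    return bars


# interval is in milliseconds, so each step is shown for half a second.
animation = FuncAnimation(fig, update, frames=len(frames), interval=500, repeat=True)
```

Keep a reference to `animation` and call `plt.show()` to play it. The fixed interval is what gives every time step equal screen time, unlike a manually dragged slider.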
I hope this article gives you a good start for exploring interesting Medium articles on visualization. The best way to learn anything is to try it yourself. Pick one visualization, run the code, and observe the magic!