

Matplotlib vs. Altair

Eugene Teoh

Matplotlib and Altair are both popular Python libraries for visualizing data, but which one is the better choice for your project?

If you have just started out in Python, you might have heard of Matplotlib. Matplotlib has been the go-to visualisation library for anyone starting out in Python, but is it the best?

In this article, I will introduce Altair and Matplotlib, discuss the differences between them, and look at what each is good at and who should use it.

First, let’s analyse what Matplotlib is good at.

I will be using Datapane to embed visualisations from each library so that the plots retain their characteristics.

Matplotlib

Matplotlib is a comprehensive visualisation library with a wide range of functionality. Its concept is based on MATLAB’s plotting API, so those who have used MATLAB will feel at home. It is most probably the first Python visualisation library that data scientists learn.

Matplotlib makes easy things easy and hard things possible.
Matplotlib Documentation

The image below describes the general concepts of a Matplotlib figure.

0_VgK31kEunVCqYghK.png
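To make those concepts concrete, here is a minimal sketch that builds a figure piece by piece with Matplotlib's object-oriented interface. The data, title and labels are made up for illustration; only the Figure, Axes, line, title, axis label, grid and legend elements correspond to the parts of a figure described above.

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative data: two curves over one period.
x = np.linspace(0, 2 * np.pi, 200)

# The Figure is the top-level container; the Axes is one plotting area inside it.
fig, ax = plt.subplots(figsize=(6, 4))

# Each call to plot() adds one line artist to the Axes.
ax.plot(x, np.sin(x), label="sin(x)")
ax.plot(x, np.cos(x), label="cos(x)")

# Title, axis labels, grid and legend are all separate, individually adjustable parts.
ax.set_title("Anatomy of a figure")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.grid(True)
ax.legend()
```

Calling plt.show() afterwards displays the figure in a script; in a notebook the figure is usually rendered automatically.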

Pros

Customizable

Because of its low-level interface, Matplotlib can plot almost anything. If you just want a quick experiment, a few lines of code will plot any mathematical function you want. If you want to plot complicated visualisations, with a little tinkering, you will be able to do it! There is even support for 3D visualisation.

Let’s start with a simple plot.

    import numpy as np
    import matplotlib.pyplot as plt

    x = np.linspace(1, 100)
    y = 3 * x ** 2

    fig = plt.figure()
    plt.plot(x, y)  # pass x explicitly so the horizontal axis shows x, not array indices
    plt.title(r"$y = 3x^2$")

It can even plot the text below:

    import matplotlib.pyplot as plt

    fig = plt.figure()
    # Rotated text inside a rounded box
    plt.text(0.6, 0.7, "learning", size=40, rotation=20.,
             ha="center", va="center",
             bbox=dict(boxstyle="round",
                       ec=(1., 0.5, 0.5),
                       fc=(1., 0.8, 0.8),
                       )
             )
    # Rotated text inside a square box
    plt.text(0.55, 0.6, "machine", size=40, rotation=-25.,
             ha="right", va="top",
             bbox=dict(boxstyle="square",
                       ec=(1., 0.5, 0.5),
                       fc=(1., 0.8, 0.8),
                       )
             )

Animation

Matplotlib also offers a package for live animations. It allows you to plot live data such as a sinusoidal wave, or even the NASDAQ stock market index!

    """
    ==================
    Animated line plot
    ==================
    """
    # https://matplotlib.org/3.1.1/gallery/animation/simple_anim.html
    import numpy as np
    import matplotlib.pyplot as plt
    import matplotlib.animation as animation

    fig, ax = plt.subplots()
    x = np.arange(0, 2*np.pi, 0.01)
    line, = ax.plot(x, np.sin(x))


    def animate(i):
        line.set_ydata(np.sin(x + i / 50))  # update the data.
        return line,


    ani = animation.FuncAnimation(
        fig, animate, interval=20, blit=True, save_count=50)

    # To save the animation, use e.g.
    #
    # ani.save("movie.mp4")
    #
    # or
    #
    # writer = animation.FFMpegWriter(
    #     fps=15, metadata=dict(artist='Me'), bitrate=1800)
    # ani.save("movie.mp4", writer=writer)
    plt.show()

Cons

Not flexible

Because of its low-level interface, plotting simple data is easy. However, when the data gets very complex, more lines of code are required just to handle trivial issues such as formatting.

The plot below shows how disorganised it looks when the data gets large and complex.
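As a rough sketch of the extra bookkeeping involved, consider a scatter plot where each group in a dataset should get its own colour and legend entry. The DataFrame and its column names (x, y, group) are invented for this example. In plain Matplotlib you typically loop over the groups yourself and label each one by hand:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical dataset: 90 random points split evenly into three groups.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "x": rng.normal(size=90),
    "y": rng.normal(size=90),
    "group": np.repeat(["a", "b", "c"], 30),
})

fig, ax = plt.subplots()

# One scatter() call per group; the label has to be passed explicitly
# so that the legend knows which colour belongs to which group.
for name, part in df.groupby("group"):
    ax.scatter(part["x"], part["y"], label=name, alpha=0.7)

# Axis labels and the legend are also set up manually.
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend(title="group")
```

Every additional dimension (marker shape, size, facets) tends to add another loop or another block of formatting code in this style.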

Next, let’s learn about Altair.

Altair

Altair takes a completely different approach from Matplotlib. It is a declarative statistical visualisation library, initially released in 2016, and is built on Vega and Vega-Lite. It uses Pandas DataFrames to represent the data. It was designed with three goals in mind:

  • Be constrained, simple and declarative, so that you can focus on the data rather than on trivial issues such as formatting.
  • Emit JSON output that follows the Vega and Vega-Lite specifications.
  • Render those specifications using existing visualisation libraries.

Pros

Intuitive and structured

Altair provides a very intuitive and structured approach to plotting. I will use the simple example from the Matplotlib section:

    import numpy as np
    import pandas as pd  # needed to build the DataFrame that Altair reads
    import altair as alt

    x = np.linspace(1, 100)
    y = 3 * x ** 2
    df_alt = pd.DataFrame({'x': x, 'y': y})

    alt.Chart(df_alt).mark_line().encode(
        x='x',
        y='y'
    )

You can see how we get the same plot as with Matplotlib, this time by describing what to plot rather than how to draw it, and in just a few lines of code!

Basic characteristics of Altair:

Marks

Marks specify how the data is represented in the plot. For example, mark_line() expresses the data as a line plot, mark_point() makes it into a scatter plot, mark_circle() creates a scatter plot with filled circles.

Encodings

Encodings are specified with encode(). They map columns of the data to visual channels such as x, y, color and shape. For example, if my DataFrame had multiple columns, I could map the x and y axes to different columns of data. Or, if I wanted to colour the marks according to another column, I could add a color channel to the encoding.

Interactive

One of Altair’s most distinctive features is its interactive plots. Calling interactive() on a chart lets you pan and zoom, and with selections you can also highlight certain regions of the plot and much more. This functionality is particularly useful when you have large and complex data.

Flexible

With its declarative nature, Altair can plot complex datasets with only a few lines of code! This gives data scientists a better data visualisation experience, without having to worry much about trivial plotting issues.

Cons

Not as customizable

Altair’s declarative, high-level approach makes it more difficult to plot things such as complex machine learning models. The Altair documentation also advises against creating plots from datasets with more than 5000 rows; by default, doing so raises an error (MaxRowsError).

No 3D Visualisation

Data scientists often need to visualise data in three dimensions to interpret it better. Examples include the output of dimensionality reduction techniques such as Principal Component Analysis (PCA), word2vec embeddings, and much more. In these cases, I would default to Matplotlib or other visualisation libraries with better 3D visualisation support.
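For comparison, here is a minimal sketch of a 3D scatter plot in Matplotlib. The random points are only a stand-in for, say, the first three principal components of a real dataset, and the axis labels are made up. It assumes a reasonably recent Matplotlib version, where the "3d" projection is available without extra imports.

```python
import numpy as np
import matplotlib.pyplot as plt

# Stand-in data: 200 random points in three dimensions.
rng = np.random.default_rng(0)
points = rng.normal(size=(200, 3))

fig = plt.figure()

# projection="3d" creates an Axes that understands x, y and z coordinates.
ax = fig.add_subplot(projection="3d")
ax.scatter(points[:, 0], points[:, 1], points[:, 2], s=10)

ax.set_xlabel("component 1")
ax.set_ylabel("component 2")
ax.set_zlabel("component 3")
```

With an interactive backend, plt.show() opens a window in which the plot can be rotated with the mouse.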

Conclusion

That’s it! I hope you learned something new about both Matplotlib and Altair. Now, you should practise what you have learned in your own projects. If you are keen to learn more about data visualisation, you should explore other libraries such as Seaborn, Plotly, Bokeh and Folium.
