PySide v0.3 – Benchmarks

Measurements

This report compares three implementations of the Qt bindings for Python:

  • PyQt4;
  • PySide v0.2 (Boost based);
  • PySide v0.3 (CPython based).

All the tests were performed using the software/configuration below:

  • Maemo 5 – 3.2010.02-8 – running on a N900 device;
  • Ubuntu 10.04 64-bit – Lucid Lynx – running on a Lenovo T61p Core 2 Duo T7500 2.2GHz / 4GiB memory;
  • Qt 4.6.2 (4.6.2~git20100310-0maemo1+0m5 – see Note below / 4.6.2-0ubuntu5);
  • PySide 0.3.1 (0.3.1-1maemo1 / 0.3.1-1ppa1);
  • PyQt4 (4.7.3-maemo2 – see Note below / 4.7.2-0ubuntu1);
  • Scripts to get results and plot graphs: measurements.tar.bz2 and ps_mem.py.

Note: to have the latest versions of Qt 4.6 and PyQt, and the same Qt version on both the desktop and the N900, we used the packages available in Scratchbox (PR1.2 packages). Installing them on the N900 was just a matter of copying the files from /var/cache/apt/archives (in Scratchbox) to the device.

The Nokia N900 was used as the alternative hardware platform because its processor, disk performance and memory bandwidth are modest compared to modern desktop hardware. The lower hardware performance makes the differences between the implementations more pronounced in the measurements.

We tried to cover as many aspects as possible when comparing the three implementations. The tests include loading time, object creation/destruction, signal connection/disconnection, method call/reimplementation and parent assignment. We also report disk space consumption.

Each test includes a brief description of its purpose and methodology, followed by a graph comparing the results from the different implementations. Notes are added where necessary to ease understanding.

The tests are divided into two main sections: memory and performance. The performance section covers the time each implementation takes to complete the tasks, and the memory section focuses only on memory usage.

Startup measurements

In these tests we measure only the memory and time consumed while loading the QtCore and QtGui bindings (import QtCore, QtGui) on an N900 device. Two instances of Python were running at the same time, both importing QtCore/QtGui, to get shared memory information. The graph on the left shows the memory usage when importing just QtCore. The graph on the right shows the same information when importing QtGui + QtCore.

Note: these tests were run only on the N900 device. On the desktop the imports complete in less than 0.05 seconds, so we decided to skip the desktop tests.
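
As an illustration of the import timing, here is a minimal sketch and not the script from measurements.tar.bz2. The helper name time_import is hypothetical, and the standard-library module json stands in for the Qt bindings so the example runs anywhere.

```python
import importlib
import sys
import time


def time_import(module_name):
    """Return the seconds spent importing module_name."""
    # A module already listed in sys.modules is served from the cache,
    # so drop the top-level entry to make Python load it again.
    # Submodules that were imported earlier may still come from the cache.
    sys.modules.pop(module_name, None)
    start = time.perf_counter()
    importlib.import_module(module_name)
    return time.perf_counter() - start


if __name__ == "__main__":
    # "json" is only a stand-in; the report measured the QtCore and QtGui bindings.
    print("import took %.4f s" % time_import("json"))
```

For a cold-start figure comparable to the one in the report, the measurement would have to run in a fresh Python process for each binding, since a second import in the same process is much cheaper.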

import_memory_core.png import_memory_gui.png
import_time.png

Disk usage measurements

These are the results of an ls -la command. The size drop from v0.2 to v0.3 is remarkable and shows that choosing CPython was the right move. The graph shows the size of the QtCore and QtGui .so (library) files alone. When the dependencies needed by each implementation (such as the sip library, libpyside, etc.) are included, the numbers grow only slightly (by less than 200 Kbytes).
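
The same byte counts can be collected from Python instead of reading them off ls -la. The sketch below is only an illustration: the function names file_sizes and total_kib are hypothetical, and the library paths are expected on the command line because their locations differ between the N900 and the desktop.

```python
import os
import sys


def file_sizes(paths):
    """Map each path to its size in bytes, as shown in the size column of ls -la."""
    # os.path.getsize follows symlinks, so it reports the size of the real file.
    return {path: os.path.getsize(path) for path in paths}


def total_kib(paths):
    """Total size of the given files in KiB."""
    return sum(file_sizes(paths).values()) / 1024.0


if __name__ == "__main__":
    # Pass the binding libraries (the QtCore and QtGui .so files) as arguments.
    for path, size in file_sizes(sys.argv[1:]).items():
        print("%10d  %s" % (size, path))
    print("total: %.1f KiB" % total_kib(sys.argv[1:]))
```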

size-comparision-final.png

Memory usage methodology

The purpose of these tests is to show memory consumption in a number of situations. Each graph is preceded by a brief description of its contents. Graphs on the left show the results from the N900 and graphs on the right show the results from the desktop.

To gather information about memory consumption, the script reads data from /proc/self/status at the beginning and at the end of each memory consumption test. Combining the resident and stack memory figures from both readings gives a picture of the overall memory used by the test.
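
The before/after reading can be sketched as follows. This is an assumption about the calculation, not the code from the report's scripts: it takes "resident+stack" to mean the VmRSS and VmStk fields of /proc/self/status added together, and all function names are hypothetical.

```python
def parse_status(text):
    """Return {field: kB} for the Vm* lines of a /proc/<pid>/status dump."""
    fields = {}
    for line in text.splitlines():
        if line.startswith("Vm"):
            name, value = line.split(":", 1)
            fields[name] = int(value.split()[0])  # the kernel reports these in kB
    return fields


def resident_plus_stack_kb(text):
    """Assumed metric: resident set size (VmRSS) plus stack size (VmStk), in kB."""
    fields = parse_status(text)
    return fields.get("VmRSS", 0) + fields.get("VmStk", 0)


def memory_delta_kb(workload):
    """Run workload() and return (growth of the metric in kB, workload result)."""
    def snapshot():
        # /proc/self/status exists only on Linux.
        with open("/proc/self/status") as handle:
            return resident_plus_stack_kb(handle.read())

    before = snapshot()
    result = workload()  # returned to the caller so the objects stay alive
    after = snapshot()
    return after - before, result
```

A test would pass a function that, for example, creates a list of objects or connects a batch of signals, and then plot the returned difference for each implementation.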

Memory usage graphs

The graph below shows the amount of memory used to connect signals. Because the desktop is a 64-bit system, the amounts of memory used differ somewhat from those on the device. PySide v0.3 shows some improvement over v0.2 and uses about the same amount of memory as PyQt4.

Note: This test was generated for internal benchmarking purposes. Real-life applications rarely create these amounts of connections.

Device / Desktop
memory_connect_signals.png desk_memory_connect_signals.png

The next graph shows the amount of memory used to create objects. All the implementations have almost the same consumption.

memory_connect_signals.png desk_memory_create_objects.png

The next graph shows the amount of memory used to execute method calls. Despite the gap visible in the graph, the difference between the three implementations is not noticeable in practice.

memory_method_call.png desk_memory_method_call.png

This graph shows the memory usage for reimplemented virtual methods. The new implementation (v0.3) shows a better result.

Note: This test was generated for internal benchmarking purposes. Real-life applications rarely create these amounts of connections.

memory_reimplemented_virtual_method.png desk_memory_reimplemented_virtual_method.png

The last graph shows the memory used by the set parent method. All the implementations have similar results.

Note: This test was generated for internal benchmarking purposes. Real-life applications rarely create these amounts of connections.

memory_set_parent.png desk_memory_set_parent.png

Performance graphs methodology

This section compares performance results from the three implementations (PyQt4, PySide v0.2 and v0.3). Each graph is preceded by a brief description of its contents. Graphs on the left show the results from the N900 and graphs on the right show the results from the desktop.

Each graph shows the elapsed time of a test, computed as the final timestamp minus the initial timestamp.
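
A minimal sketch of that subtraction is shown below. The helper name elapsed_seconds and the repetitions parameter are hypothetical and are not taken from the report's scripts, which are not reproduced here.

```python
import time


def elapsed_seconds(task, repetitions=1):
    """Return final timestamp minus initial timestamp around repeated calls to task()."""
    start = time.perf_counter()  # initial timestamp
    for _ in range(repetitions):
        task()
    end = time.perf_counter()    # final timestamp
    return end - start


if __name__ == "__main__":
    # Stand-in workload; a real test would create objects, connect signals, and so on.
    print("%.6f s" % elapsed_seconds(lambda: [object() for _ in range(1000)], repetitions=100))
```

Repeating the operation many times inside one measurement keeps the timer's own overhead small relative to the work being measured, which is presumably why the tests use such large numbers of operations.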

Performance graphs

The first graph shows the time needed to connect signals. All implementations perform similarly.

Note: This test was generated for internal benchmarking purposes. Real-life applications rarely create these amounts of connections.

Device / Desktop
time_connect_signals.png desk_time_connect_signals.png

The next graph shows the time needed to create objects. PySide v0.3 gives a better result here. This is a key point because every object created affects loading time.

time_create_objects.png desk_time_create_objects.png

This graph shows the time needed to destroy an object. All implementations perform similarly.

time_destroy_qobjects.png desk_time_destroy_qobjects.png

The graph below shows the time needed to disconnect a signal. Here we see that there is room for improvement in PySide v0.3.

Note: This test was generated for internal benchmarking purposes. Real-life applications rarely create these amounts of connections.

time_disconnect_signals.png desk_time_disconnect_signals.png

This graph shows the time needed to perform method calls. Again, all implementations perform similarly.

time_method_call.png desk_time_method_call.png

The next graph shows the time used by reimplemented virtual methods. PySide v0.3 shows a slight advantage here too.

Note: This test was generated for internal benchmarking purposes. Real-life applications rarely create these amounts of connections.

time_reimplemented_virtual_method.png desk_time_reimplemented_virtual_method.png

The last graph shows the time needed to execute a set_parent instruction. All implementations perform similarly.

Note: This test was generated for internal benchmarking purposes. Real-life applications rarely create these amounts of connections.

time_set_parent.png desk_time_set_parent.png

Conclusion

One of the main points of this PySide v0.3 release is the impressive size reduction provided by the CPython implementation compared to the previous Boost-based one. As expected, the new PySide implementation also achieves better results in memory usage and performance.

Based on this report it is possible to devise a roadmap for PySide fine-tuning tasks. So far only common techniques have been applied to improve performance. Some source code restructuring could help us achieve even better results, as the “disconnect signals” graph shows.