The document property lookup is a bottleneck #455

@rweickelt

I am working on a 250-page document containing many large tables. The performance of rinohtype is not great (2 minutes, compared to less than 10 s with TeX), so I started to dig into possible bottlenecks. I took my document as a benchmark and ran:

python3 -m cProfile -o test-1.prof -m sphinx.cmd.build -b rinoh src build
pyprof2calltree -i test-1.prof -o test-1.callgrind

then analysed the .callgrind file with K/QCacheGrind:

Image

We can see 18 million calls to the document_part property in layout.py.

About 500,000 of these calls seem to originate from get_style() in style.py:

Image
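For anyone who wants to cross-check the caller breakdown without a GUI, the standard-library pstats module can print the same "who called this function" table. The following is only a sketch: the two small functions are stand-ins that I made up to keep the example self-contained, and they are not rinohtype code. For the real run, passing the path "test-1.prof" instead of the profiler object should work the same way.

```python
import cProfile
import io
import pstats


def caller_report(profile, function_name):
    """Return pstats' caller table for functions whose name matches function_name.

    `profile` may be a cProfile.Profile object or the path of a .prof file.
    """
    buffer = io.StringIO()
    stats = pstats.Stats(profile, stream=buffer)
    stats.sort_stats("ncalls")          # most frequently called first
    stats.print_callers(function_name)  # the argument is treated as a regex filter
    return buffer.getvalue()


# Hypothetical stand-ins for the real call chain, only for demonstration.
def document_part():
    return None


def get_style():
    for _ in range(5):
        document_part()


profiler = cProfile.Profile()
profiler.enable()
get_style()
profiler.disable()

report = caller_report(profiler, "document_part")
print(report)
```

The printed table lists each caller of document_part together with its call count, which is the same information as in the call graph view of K/QCacheGrind.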

I suppose the rest comes from recursive calls in the document/document_part property getters. Looking at layout.py we find:

    @cached
    def get_style(self, attribute, container):
        return self.get_config_value(attribute, container.document)

Although the get_style function is explicitly marked for caching, the cache has little effect. The container argument is different on almost every call, so each call is a cache miss and the container.document property lookup machinery is triggered far more often than necessary. Document access might be better solved with a singleton or, if there are multiple document objects, with an ID plus a getter. That getter could then be cached easily.
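To illustrate what I mean, here is a toy model of the problem. It is a sketch under assumptions, not rinohtype's actual code: the Document and Container classes, the parent chain, and both get_style_* helpers are hypothetical, and a plain dict stands in for the cached decorator. It only shows how the choice of cache key changes the number of document getter calls.

```python
class Document:
    """Hypothetical document that counts config lookups."""

    def __init__(self):
        self.config_lookups = 0

    def get_config_value(self, attribute):
        self.config_lookups += 1
        return f"<{attribute}>"


class Container:
    """Hypothetical container whose `document` property walks up to the root."""

    getter_calls = 0  # class-wide count of `document` getter invocations

    def __init__(self, parent=None, document=None):
        self.parent = parent
        self._document = document

    @property
    def document(self):
        Container.getter_calls += 1
        if self.parent is not None:
            return self.parent.document  # recurse toward the root container
        return self._document


def get_style_keyed_on_container(cache, attribute, container):
    key = (attribute, container)  # a new container means a new key: cache miss
    if key not in cache:
        cache[key] = container.document.get_config_value(attribute)
    return cache[key]


def get_style_keyed_on_document(cache, attribute, document):
    key = (attribute, document)  # same document means same key: cache hit
    if key not in cache:
        cache[key] = document.get_config_value(attribute)
    return cache[key]


def build_leaves(document, depth=4, count=1000):
    """Build a chain of `depth` nested containers with `count` leaves below it."""
    node = Container(document=document)
    for _ in range(depth - 1):
        node = Container(parent=node)
    return [Container(parent=node) for _ in range(count)]


# Variant 1: cache keyed on the container, as in the snippet above.
doc_a = Document()
leaves = build_leaves(doc_a)
Container.getter_calls = 0
cache = {}
for leaf in leaves:
    get_style_keyed_on_container(cache, "font_size", leaf)
per_container = (Container.getter_calls, doc_a.config_lookups)

# Variant 2: resolve the document once, then key the cache on it.
doc_b = Document()
leaves = build_leaves(doc_b)
Container.getter_calls = 0
cache = {}
document = leaves[0].document
for leaf in leaves:
    get_style_keyed_on_document(cache, "font_size", document)
per_document = (Container.getter_calls, doc_b.config_lookups)

print("keyed on container:", per_container)
print("keyed on document: ", per_document)
```

In this toy setup, the first variant runs the document getter 5,000 times for 1,000 leaves (five levels each), while the second runs it 5 times. Whether the real code can resolve the document once like this is exactly the part I am unsure about.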

Solving this bottleneck might yield a performance gain of about 12%. I don't feel competent enough to fix this myself right now, and I am not even sure that my measurements are correct.
