There are handy extensions that can help increase transparency and find suspicious code. With query statistics activated, we collect information about the most frequent queries: number of calls, number of rows involved, as well as mean and max execution time.
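As a minimal sketch: with `pg_stat_statements` (the standard extension that records exactly these counters) the hottest statements can be pulled with a single query. The `psycopg2` driver, the DSN and the PostgreSQL 13+ column names (`mean_exec_time` / `max_exec_time`) are assumptions here, not something this setup prescribes:

```python
import psycopg2  # assumption: psycopg2 driver; any client works

# Top 20 statements by total execution time (pg_stat_statements, PG 13+ column names).
TOP_QUERIES_SQL = """
SELECT query,
       calls,
       rows,
       mean_exec_time,
       max_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
"""

def top_queries(dsn: str = "postgresql://localhost/app"):  # placeholder DSN
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(TOP_QUERIES_SQL)
        return cur.fetchall()

if __name__ == "__main__":
    for query, calls, rows, mean_ms, max_ms in top_queries():
        print(f"{calls:>8} calls  {rows:>10} rows  {mean_ms:8.1f}ms avg  {max_ms:8.1f}ms max  {query[:80]}")
```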
Using all of it, we can export the metrics into Prometheus and expose them on a Grafana dashboard to correlate with CPU spikes:

- What are the hottest tables? What are the access patterns?
- Are there full scans, i.e. extensive disk IO / data-buffer operations in the DB?
- How many rows are fetched?
- When was the last time a vacuum or analyze was executed?
- Can we map the most demanding and frequent queries back to the ORM and the code?
- How is the code itself organized? I.e. is the ORM boilerplate hidden behind wrappers that are re-used across the project (so that all optimisations can be done in a single place and it is easier to find all references)?
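For the export side, a minimal sketch of a scraper built on `prometheus_client` (an assumption – any exporter works), surfacing two of those signals per table from `pg_stat_user_tables`: sequential scans and time since the last autovacuum. Metric names, the port and the DSN are placeholders:

```python
import time

import psycopg2
from prometheus_client import Gauge, start_http_server  # assumption: prometheus_client

# Table-level signals that tend to correlate with CPU / IO spikes.
SEQ_SCANS = Gauge("pg_table_seq_scan_total", "Sequential scans per table", ["table"])
LAST_AUTOVACUUM_AGE = Gauge("pg_table_last_autovacuum_age_seconds",
                            "Seconds since the last autovacuum (-1 = never)", ["table"])

TABLE_STATS_SQL = """
SELECT relname,
       seq_scan,
       COALESCE(EXTRACT(EPOCH FROM now() - last_autovacuum), -1)
FROM pg_stat_user_tables;
"""

def scrape(conn) -> None:
    with conn.cursor() as cur:
        cur.execute(TABLE_STATS_SQL)
        for table, seq_scan, autovacuum_age in cur.fetchall():
            SEQ_SCANS.labels(table=table).set(seq_scan)
            LAST_AUTOVACUUM_AGE.labels(table=table).set(autovacuum_age)
    conn.commit()  # end the read-only transaction between scrapes

if __name__ == "__main__":
    start_http_server(9187)                                 # endpoint Prometheus scrapes
    conn = psycopg2.connect("postgresql://localhost/app")   # placeholder DSN
    while True:
        scrape(conn)
        time.sleep(30)
```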
There are some basic precautions that are always worth keeping an eye on – probably even with some static checkers:
- applying functions to columns in conditions, which forces them to be evaluated for every row (and defeats plain indexes),
- multi-table joins,
- using `in` in conditions with a looong list of values to check against (see the sketch below),
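A couple of illustrative before/after snippets – table and column names are made up:

```python
# Illustrative only – the orders table and its columns are hypothetical.

# Anti-pattern: the function is applied to the column, so it is evaluated
# for every row and a plain index on created_at cannot be used.
SLOW = "SELECT * FROM orders WHERE date_trunc('day', created_at) = '2024-01-01';"

# Sargable rewrite: compare the raw column against a range instead.
FAST = """
SELECT * FROM orders
WHERE created_at >= '2024-01-01'
  AND created_at <  '2024-01-02';
"""

# Anti-pattern: IN (...) with thousands of literals – huge SQL text to parse and plan.
#   "SELECT * FROM orders WHERE id IN (%s)" % ",".join(map(str, ids))

# Better: pass a single array parameter and let Postgres handle it.
FAST_IN = "SELECT * FROM orders WHERE id = ANY(%s);"
# cur.execute(FAST_IN, ([1, 2, 3],))  # psycopg2 adapts a Python list to an array
```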
Additionally, it can be beneficial to integrate into the code itself:
- Upserts, plus checks that skip a DB update when the data has not actually changed (see the first sketch below),
- Pre-computed aggregations and caching (good old LRU) – see the second sketch below.
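A sketch of the upsert-plus-"did anything change?" check in PostgreSQL, assuming a hypothetical `settings` table with a unique constraint on `user_id`; the `IS DISTINCT FROM` guard turns identical writes into no-ops:

```python
import psycopg2  # assumption: psycopg2; the table and columns below are hypothetical

# Upsert that also skips the write entirely when nothing changed:
# the WHERE on DO UPDATE makes identical payloads a no-op, avoiding
# dead tuples, WAL traffic and trigger noise.
UPSERT_SQL = """
INSERT INTO settings (user_id, payload, updated_at)
VALUES (%s, %s, now())
ON CONFLICT (user_id) DO UPDATE
    SET payload    = EXCLUDED.payload,
        updated_at = now()
    WHERE settings.payload IS DISTINCT FROM EXCLUDED.payload;
"""

def save_settings(conn, user_id: int, payload: str) -> None:
    with conn.cursor() as cur:
        cur.execute(UPSERT_SQL, (user_id, payload))
    conn.commit()
```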
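And a minimal in-process caching sketch for a pre-computed aggregation; `functools.lru_cache` has no TTL, so a coarse time bucket is folded into the cache key. All names are illustrative, and a shared cache (Redis/memcached) may fit better for multi-process setups:

```python
import time
from functools import lru_cache

def _count_orders_from_db(day: str) -> int:
    # Placeholder for the expensive aggregation, e.g.
    # SELECT count(*) FROM orders WHERE created_at::date = %s
    return 0

# lru_cache has no expiry, so a coarse time bucket is part of the key –
# cached entries silently roll over every 60 seconds.
@lru_cache(maxsize=1024)
def _orders_per_day(day: str, ttl_bucket: int) -> int:
    return _count_orders_from_db(day)

def orders_per_day(day: str) -> int:
    return _orders_per_day(day, int(time.time() // 60))
```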
Sometimes it can be much more efficient to zoom out a bit and think at the level of the business logic: are there any redundant calls in the function's flow? What about the data schemas? Storage should always be designed around usage patterns: how are we going to query this data? Are we read-heavy or write-heavy? Would some de-normalisation pay off, or is there a need for an extra layer of aggregated data?
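One possible shape for such an aggregated layer is a materialized view – a sketch over a hypothetical `orders` table:

```python
# Hypothetical aggregated layer for a read-heavy workload: a materialized view
# that pre-computes per-day revenue so dashboards stop re-scanning raw orders.
CREATE_DAILY_REVENUE = """
CREATE MATERIALIZED VIEW IF NOT EXISTS daily_revenue AS
SELECT created_at::date AS day,
       count(*)          AS orders,
       sum(amount)       AS revenue
FROM orders
GROUP BY 1;
"""

# Refresh on a schedule (cron / background worker). CONCURRENTLY keeps the view
# readable during the refresh but requires a unique index on it.
REFRESH_DAILY_REVENUE = "REFRESH MATERIALIZED VIEW CONCURRENTLY daily_revenue;"

def refresh_daily_revenue(conn) -> None:
    with conn.cursor() as cur:
        cur.execute(REFRESH_DAILY_REVENUE)
    conn.commit()
```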