MongoDB performance tuning tips
This tutorial is a collection of performance tuning tips for Java developers who would like to improve the performance of their Java apps on top of MongoDB.
Set batch size
If you don’t specify a batch size, the driver falls back to its defaults, which can mean many network round trips to pull back a large result set. To fetch more documents per round trip, set batchSize on the MongoIterable like so:
yourCollection.find().batchSize(100)
// or during aggregation:
scores.aggregate(pipeline).batchSize(100)
This instructs the MongoDB driver to retrieve more than one document per network call – documents are fetched during execution of the hasNext() method, while next() simply takes an already-retrieved document from local memory.
Unlike @BatchSize in Hibernate, the batch size in MongoDB is not constant – it is only the initial batch size! That sounds strange to say the least, and the results may cost you a headache, so be warned. For example, a batch size set to 100 initially returns 99 documents, but subsequent calls can return fewer and fewer documents – dropping even to 30.
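Putting the above together, here is a minimal sketch of iterating a cursor with an explicit batch size. The connection URI, database name and collection name are illustrative – adjust them to your setup:

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoCursor;
import org.bson.Document;

public class BatchSizeExample {
    public static void main(String[] args) {
        // Assumes a locally running mongod; URI and names are placeholders.
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> yourCollection =
                    client.getDatabase("test").getCollection("yourCollection");

            // Ask the driver to fetch up to 100 documents per network round trip.
            try (MongoCursor<Document> cursor =
                         yourCollection.find().batchSize(100).iterator()) {
                while (cursor.hasNext()) {        // hasNext() may trigger a network fetch
                    Document doc = cursor.next(); // next() reads from the local buffer
                    System.out.println(doc.toJson());
                }
            }
        }
    }
}
```

Closing the cursor in a try-with-resources block is worth keeping even in examples – abandoned cursors hold server resources until they time out.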
Fetch only the fields you need
This sounds obvious, but it is very often neglected. When you don’t specify the fields you want, MongoDB sends back whole matching documents. For find results or aggregations this can create quite large network traffic that slows your application down (disk reads plus transfer over the network). So don’t forget to add a $project stage to your aggregation pipeline, or to specify fields in find queries.
The amount of data sent also affects the effective batch size.
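A short sketch of both variants with the Java driver is below. The field names ("status", "name", "score") are made up for illustration; yourCollection and scores are assumed to be already-connected MongoCollection instances:

```java
import static com.mongodb.client.model.Filters.eq;
import static com.mongodb.client.model.Projections.excludeId;
import static com.mongodb.client.model.Projections.fields;
import static com.mongodb.client.model.Projections.include;

import com.mongodb.client.model.Aggregates;
import java.util.Arrays;

// find: send back only the fields the caller will actually use
yourCollection.find(eq("status", "A"))
        .projection(fields(include("name", "score"), excludeId()));

// aggregation: project early, so later stages and the network
// only see the fields that are needed
scores.aggregate(Arrays.asList(
        Aggregates.match(eq("status", "A")),
        Aggregates.project(fields(include("name", "score")))));
```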
Structure documents to support use cases
In MongoDB there are no joins (OK, since 3.2 there is the $lookup operator, but it is quite limited), so you should structure your collections and documents the way they are used. If you put data that the application uses together into different collections, you will have to join that data on the Java side, and this will always cost you performance.
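As an illustration, a hypothetical "orders" collection could embed its line items directly in the order document, so the order screen needs a single find() instead of a Java-side join. All names and values here are invented:

```java
import java.util.Arrays;
import org.bson.Document;

// Assumes "orders" is an already-connected MongoCollection<Document>.
Document order = new Document("orderNo", 1042)
        .append("customer", new Document("name", "Alice").append("city", "Oslo"))
        .append("items", Arrays.asList(
                new Document("sku", "A-1").append("qty", 2),
                new Document("sku", "B-7").append("qty", 1)));
orders.insertOne(order);
```

The trade-off, of course, is duplication and document growth – embedding pays off when the embedded data is read together with its parent and has a bounded size.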
Indexes for common use cases
Data in MongoDB should be structured the way it is used in the application. But a large amount of data without indexes is still slow to query – similar to scanning files. Good, selective indexes allow the application to find data very quickly. To figure out which indexes are good you have to know how your application uses the data – in large organizations this can be challenging. :-)
One problem with indexes is that when you have too many of them, your writes slow down, because every index needs to be updated. When you have too few, your reads slow down, because the indexes don’t support your use cases. You have to make this trade-off yourself, based on your needs.
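For example, given a hypothetical query pattern of "list documents by status, newest first", a compound index matching that shape supports both the filter and the sort. The field names are placeholders:

```java
import com.mongodb.client.model.Indexes;

// Assumes yourCollection is an already-connected MongoCollection<Document>.
// Compound index for: find({status: ...}).sort({date: -1})
yourCollection.createIndex(Indexes.compoundIndex(
        Indexes.ascending("status"), Indexes.descending("date")));
```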
Find slow queries in logs and profiler
Enable profiling and look for slow queries (for example those that don’t use indexes but should) in /var/log/mongodb/mongodb.log. The same information can be found in the system.profile collection.
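Profiling can also be driven from the Java driver via runCommand. A sketch, assuming an already-connected MongoClient and an illustrative "test" database (the 100 ms threshold is arbitrary):

```java
import com.mongodb.client.MongoDatabase;
import org.bson.Document;

MongoDatabase db = client.getDatabase("test");

// Profiling level 1 = record only operations slower than slowms (here 100 ms).
db.runCommand(new Document("profile", 1).append("slowms", 100));

// Later, read back the slowest recorded operations:
db.getCollection("system.profile")
        .find(new Document("millis", new Document("$gt", 100)))
        .sort(new Document("millis", -1))
        .forEach(doc -> System.out.println(doc.toJson()));
```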
Check out the MongoDB Database Profiler page for more details.
One of the best ways to learn about the performance of MongoDB queries is to run explain(“executionStats”) on them. Execution stats give you information about:
- processing stages,
- selected index,
- execution time,
- how effective the index is (keys vs documents examined, documents examined vs documents returned).
It’s a pure fountain of knowledge.
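From Java, the same statistics can be obtained with the explain command through runCommand. A sketch, assuming an already-connected MongoDatabase named db; the collection name "scores" comes from the earlier snippet, the filter is made up:

```java
import org.bson.Document;

Document explain = db.runCommand(new Document("explain",
                new Document("find", "scores")
                        .append("filter", new Document("score",
                                new Document("$gt", 90))))
        .append("verbosity", "executionStats"));

Document stats = explain.get("executionStats", Document.class);
System.out.println("time (ms):     " + stats.get("executionTimeMillis"));
System.out.println("keys examined: " + stats.get("totalKeysExamined"));
System.out.println("docs examined: " + stats.get("totalDocsExamined"));
System.out.println("docs returned: " + stats.get("nReturned"));
```

Keys examined close to documents returned is the sign of a selective index; a large gap between documents examined and documents returned usually means the index doesn’t match the query.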
Make sure indexes fit in memory
This is important for getting full performance. If indexes are too big to fit in RAM, the result is page faults and disk reads.
You can find index sizes from collection stats:
- totalIndexSize: size of all indexes (also: db.yourCollection.totalIndexSize())
- indexSizes: sizes of each index.
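The same stats are reachable from Java through the collStats command. A sketch, assuming an already-connected MongoDatabase named db and the placeholder collection name used throughout this post:

```java
import org.bson.Document;

Document stats = db.runCommand(new Document("collStats", "yourCollection"));
System.out.println("totalIndexSize: " + stats.get("totalIndexSize"));
System.out.println("indexSizes:     " + stats.get("indexSizes"));
```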
Since MongoDB 3.0, storage engines are pluggable, and the new WiredTiger engine supports prefix compression, which results in smaller indexes.
This is just the beginning of the vast topic of MongoDB performance tuning, but I hope this post will help you a bit in your endeavours. :-)