MongoDB aggregation performance capability


I am trying to work through the performance considerations of using MongoDB for a considerable number of documents used in a variety of aggregations.

I have read that a collection has a 32TB capacity, depending on the chunk size and shard key values.

If I have 65,000 customers who each supply (on average) 350 sales transactions per day, that ends up being 22,750,000 documents created daily. By a sales transaction, I mean an object containing the invoice header and line items. Each document averages 2.60KB.

I also have other data being received from these same customers, such as account balances and product catalogues. I estimate around 1,000 product records being active at any one time.

Based on the above, I approximate 8,392,475,000 (8.4 billion) documents in a single year, for a total of 20,145,450,000KB (18.76TB) of data being stored in the collection.

Based on the MongoDB collection capacity of 32TB (34,359,738,368KB), I believe that would put me at 58.63% of capacity.
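As a sanity check, here is a minimal Python sketch of that arithmetic, assuming a plain 365-day year and the quoted 2.60KB average document size (the exact totals shift slightly depending on which day count and average size you use):

```python
# Back-of-envelope sizing for the transaction volumes quoted above.
# Assumes a 365-day year and a 2.60KB average document size.
CUSTOMERS = 65_000
TXNS_PER_CUSTOMER_PER_DAY = 350
AVG_DOC_KB = 2.60
COLLECTION_CAP_KB = 32 * 1024**3  # 32TB expressed in KB

docs_per_day = CUSTOMERS * TXNS_PER_CUSTOMER_PER_DAY  # 22,750,000
docs_per_year = docs_per_day * 365                    # ~8.3 billion
storage_kb = docs_per_year * AVG_DOC_KB
storage_tb = storage_kb / 1024**3

print(f"{docs_per_day:,} docs/day, {docs_per_year:,} docs/year")
print(f"~{storage_tb:.2f}TB/year, {storage_kb / COLLECTION_CAP_KB:.2%} of the 32TB figure")
```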

I want to understand how different aggregation queries will perform when running against it. I want to create a set of staged pipeline aggregations that write to a different collection, which is then used as the source data for business insights analysis.

Across the 8.4 billion transactional documents, I aim to create the aggregated data in a different collection as a set of individual services' output, using $out to avoid issues with the 16MB document size limit on a single result set.
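For illustration, here is a minimal pymongo sketch of one such staged aggregation writing its results out with $out; the collection and field names (sales, daily_sales_summary, customerId, invoiceDate, lineItems) are hypothetical placeholders, not the actual schema:

```python
# A minimal sketch of one staged aggregation, writing its result set to a
# separate collection via $out. All collection and field names here are
# hypothetical placeholders standing in for the real schema.
from datetime import datetime
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["business"]

pipeline = [
    # Restrict each refresh to a window rather than all 8.4B documents.
    {"$match": {"invoiceDate": {"$gte": datetime(2023, 1, 1),
                                "$lt": datetime(2023, 1, 2)}}},
    # Flatten invoice line items so they can be summed per customer.
    {"$unwind": "$lineItems"},
    {"$group": {
        "_id": "$customerId",
        "totalSales": {"$sum": "$lineItems.amount"},
        "lineItemCount": {"$sum": 1},
    }},
    # $out writes each result as its own document, so the 16MB limit
    # applies per output document rather than to the result set as a whole.
    {"$out": "daily_sales_summary"},
]

db.sales.aggregate(pipeline, allowDiskUse=True)
```

One caveat worth knowing here: $out cannot write to a sharded collection, so if the target collection itself needs to be sharded, $merge (MongoDB 4.2+) is the alternative.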

Am I being overly ambitious here in expecting MongoDB to be able to:

  1. Store this volume of data in a collection?
  2. Aggregate and output the results of the refreshed data into a separate collection, to drive business insights, for consumption by services that provide discrete aspects of a customer's business?

Any feedback is welcome; I want to understand the limits of using MongoDB, as opposed to other technologies, for this quantity of data storage and use.

Thanks in advance.

There is no limit on how big a collection in MongoDB can be (in a replica set or a sharded cluster). I think you are confusing this with the maximum size a collection can reach after which it can no longer be sharded.

MongoDB docs: Sharding Operational Restrictions

For the amount of data you are planning to have, it would make sense to go with a sharded cluster from the beginning.
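For example, here is a minimal pymongo sketch of sharding the transaction collection up front, connected through mongos; the database name, collection name, and hashed customerId shard key are illustrative assumptions, not a recommendation for your exact schema:

```python
# A minimal sketch of sharding the transactions collection from the start.
# Assumes a connection through mongos; names and shard key are illustrative.
from pymongo import MongoClient

client = MongoClient("mongodb://mongos-host:27017")  # connect via mongos

# Enable sharding on the database, then shard the collection on a hashed
# key so that ~22.75M daily inserts spread evenly across shards.
client.admin.command("enableSharding", "business")
client.admin.command(
    "shardCollection",
    "business.sales",
    key={"customerId": "hashed"},
)
```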

