python - suggestions for processing a huge amount of data on a single-node system -


I have to load a huge amount of data (a Stack Overflow dump), process it, and generate files for a machine learning application.

The operations are:

  1. Parse the XML file and load it into a database (NoSQL or SQL), using a SAX parser (see the first sketch after this list).
  2. Perform group-by (or aggregate) operations on the database.
  3. Generate CSV files from the results. CSV generation is on-demand and must not take more than 2 minutes for 10,000 records (see the second sketch after this list).
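For step 1, a minimal sketch of the SAX-based load, assuming a MongoDB target via pymongo (3+) and the Posts.xml file from the dump; the collection name and batch size are placeholders:

    import xml.sax
    from pymongo import MongoClient

    BATCH_SIZE = 1000  # flush inserts in batches to avoid a round trip per record

    class PostsHandler(xml.sax.ContentHandler):
        """Stream the dump; in Stack Overflow dumps each record is a <row> element."""

        def __init__(self, collection):
            super().__init__()
            self.collection = collection
            self.batch = []

        def startElement(self, name, attrs):
            if name == "row":
                # All fields of a record live in the row's attributes.
                doc = {key: attrs.getValue(key) for key in attrs.getNames()}
                self.batch.append(doc)
                if len(self.batch) >= BATCH_SIZE:
                    self.collection.insert_many(self.batch)
                    self.batch = []

        def endDocument(self):
            if self.batch:  # flush the final partial batch
                self.collection.insert_many(self.batch)

    client = MongoClient()
    xml.sax.parse("Posts.xml", PostsHandler(client.stackoverflow.posts))

Because SAX never builds the document as a tree, memory stays flat regardless of file size, and batching the inserts avoids paying network latency per record.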
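For step 3, a minimal sketch of on-demand CSV generation that streams rows straight off a query cursor rather than materializing the whole result set; the field list and query here are placeholders:

    import csv
    from pymongo import MongoClient

    FIELDS = ["Id", "Score", "Tags", "CreationDate"]  # placeholder column set

    def export_csv(collection, query, path):
        """Write every matching document to a CSV file, streaming from the cursor."""
        cursor = collection.find(query, projection=FIELDS)
        with open(path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=FIELDS, extrasaction="ignore")
            writer.writeheader()
            for doc in cursor:
                writer.writerow({key: doc.get(key, "") for key in FIELDS})

    client = MongoClient()
    export_csv(client.stackoverflow.posts, {"PostTypeId": "1"}, "questions.csv")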

My constraints: only the Python language can be used, and it should run on a dual-core CPU (no distributed system).

I tried MySQL, and processing 21 million records took days (I created a hell of a lot of indexes and aggregate operations). Now I use MongoDB, and it still takes a lot of time.
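For context, the group-by in step 2 is roughly of this shape; a minimal sketch of a MongoDB aggregation via pymongo, with placeholder field names (allowDiskUse lets large groups spill to disk instead of hitting the in-memory stage limit):

    from pymongo import MongoClient

    client = MongoClient()
    posts = client.stackoverflow.posts

    # Count questions per tag; "PostTypeId" and "Tags" are placeholder fields.
    pipeline = [
        {"$match": {"PostTypeId": "1"}},                      # questions only
        {"$group": {"_id": "$Tags", "count": {"$sum": 1}}},   # group-by + count
        {"$sort": {"count": -1}},
    ]
    for row in posts.aggregate(pipeline, allowDiskUse=True):
        print(row["_id"], row["count"])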

Can anyone suggest a technology I can use to make this faster (so that the total time taken is around 4 to 5 hours)?

