Thursday, April 14, 2011

Rails: Script to import data

I have a couple of scripts that generate & gather large amounts of data that I will need both to seed my database with and in the future to add large amounts more to it. What is the best way to import lots of relational data into a rails database as both seed data and intermittently during production?

I haven't settled on an output format for my script yet but the data's structure largely mirrors my rails model's and contains has_many associations that I would like the import to preserve.

I've googled a fair bit and seen ar-extensions and seed_fu as well as the idea of using fixtures.

With ar-extensions all the examples seem to be staightforward csv imports (likely from table dumps which seems to be its primary use case) with no mention of handling associations or avoiding duplicate updates. In my case I have no id's, foreign keys, or join tables in my script so this seems like it wouldn't work for me unless I was prepared to handle that complexity myself.

With seed_fu, it looks like it could handle the relational aspects of the data creation but would still require me to specify ids (how can you know which ones are available in production?) and mix code with data.

Fixtures also have the same id problem though now it requires objects to be named (I'd probably just end up using numbers for names) and I am not sure how I would avoid accidental duplications of records.

Or would i be better off just putting my data into a local sqlite db first and then using the straight table dumping techniques?

From stackoverflow
  • I was handed a rails app recently that required some imported data and it was all handled in a rake task. All the data came in form of csv formatted files and was handled per class/model etc as necessary. This worked out relatively well for me being new to the system as it was very easy to see where data was going and how it was being applied. As you import the data you can check for id conflicts/collisions and handle them accordingly.

  • I've done this before with CSV files. I have a cron job that collects the data and puts it in accessible CSV files. Then I have a rake task (also in cron, but on another box) that looks for CSVs, and if there are any, calls model methods to create objects out of the CSV rows.

    The models have a csv_create_or_update method that takes a CSV row. This technique sidesteps the id issue, and also allows validations to run (or not) on the data coming in.

0 comments:

Post a Comment