December 13, 2006

Preloading Large Data Sets Using 'rake migrate'
Post by Sandro, on behalf of RailsOnWave, www.railsonwave.com

Suppose you have to migrate a website whose SQL table stores the names of thousands of cities. In keeping with the Rails philosophy, it's convenient to load that data from YAML files, which are more readable and better structured than their CSV equivalents.

Here are the steps that make this easy:

  • Export your SQL data in CSV format. You can use a tool such as phpMyAdmin or something similar. Choose a field separator (| for example) and put the column names in the first row of the CSV data.
  • Use this script (csvtoyml.rb) to generate several YAML files from your CSV data (be sure to change the file name and the separator to match your setup).
  • Put the generated .yml files in a folder under the 'db' directory of your Rails app (e.g. /myapp/db/migrate/bigtable).
  • Use the following piece of code inside one of your migrations to load the data:
require 'active_record/fixtures'  # makes sure the Fixtures class is loaded

# Folder that holds yamlfile0.yml, yamlfile1.yml, ...
fixtures_dir = File.join(File.dirname(__FILE__), "path/to/your/yaml/files")
nr = Dir.glob(File.join(fixtures_dir, "*.yml")).length

(0..(nr-1)).each do |n|
  puts "File #{n} of #{nr - 1}"
  f = Fixtures.new(Modelname.connection,                      # a database connection
                   "tablename",                               # table name
                   Modelname,                                 # model class
                   File.join(fixtures_dir, "yamlfile#{n}"))   # fixture path, without the .yml extension
  f.insert_fixtures
end
Note: You need to split the data into several smaller files. Importing a single file that contains all the data raises a "stack level too deep" error.
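
The csvtoyml.rb script itself is not reproduced in this post. As a rough idea of what the conversion step involves, here is a minimal sketch, not the original script. The helper names (csv_to_fixture_chunks, write_fixture_files), the chunk size of 500 and the "row_N" record labels are all assumptions made for illustration. It also assumes a simple CSV with no quoted fields and no separator characters inside values, and it keeps every value as a string.

```ruby
require 'yaml'

# Hypothetical helper (not the original csvtoyml.rb): turns separator-delimited
# text into fixture-style hashes, grouped into chunks of at most chunk_size records.
# Assumes the first row holds the column names and that no field is quoted.
def csv_to_fixture_chunks(csv_text, separator: '|', chunk_size: 500, prefix: 'row')
  lines = csv_text.lines.map(&:chomp).reject(&:empty?)
  return [] if lines.empty?

  header = lines.shift.split(separator)
  records = lines.each_with_index.map do |line, i|
    # The -1 limit keeps trailing empty fields instead of dropping them.
    ["#{prefix}_#{i}", header.zip(line.split(separator, -1)).to_h]
  end

  # Each chunk becomes one hash: { "row_0" => { "id" => "1", ... }, ... }
  records.each_slice(chunk_size).map(&:to_h)
end

# Hypothetical helper: writes each chunk to <dir>/<basename><n>.yml and
# returns the list of paths written. The default basename matches the
# "yamlfile" + n naming used by the migration snippet.
def write_fixture_files(chunks, dir, basename: 'yamlfile')
  chunks.each_with_index.map do |chunk, n|
    path = File.join(dir, "#{basename}#{n}.yml")
    File.write(path, chunk.to_yaml)
    path
  end
end
```

With these helpers, something like write_fixture_files(csv_to_fixture_chunks(File.read("cities.csv")), "db/migrate/bigtable") would produce yamlfile0.yml, yamlfile1.yml and so on, where cities.csv is a placeholder name for your exported data. Keeping each file small is what avoids the "stack level too deep" error mentioned in the note.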
