Loading CSV to MySQL table – fast…

My pet project is on its way – a subtitles website built with Laravel. I needed some data about the movies, and found out that IMDb offers a daily dump of their database. Parts of it, anyway… Still, it is enough for me: faster than an API, better than scraping.

So, I made a simple script to download the files using file_get_contents, and ungz-ed (I think that is how you say it 🙂 ) them with a script I found here: https://stackoverflow.com/questions/3293121/how-can-i-unzip-a-gz-file-with-php
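The idea from that answer is to decompress in small chunks instead of loading the whole dump into memory. Here is a minimal, self-contained sketch of that step – the function name and the demo file paths are mine, not from the original answer:

```php
<?php
// Decompress a .gz file to $outFile, reading in 4 KB chunks so even a
// large IMDb dump never has to fit in memory (same idea as the
// Stack Overflow answer linked above).
function gunzipFile(string $gzFile, string $outFile): void
{
    $in  = gzopen($gzFile, 'rb');
    $out = fopen($outFile, 'wb');
    while (!gzeof($in)) {
        fwrite($out, gzread($in, 4096));
    }
    gzclose($in);
    fclose($out);
}

// Self-contained demo: compress a tiny TSV, then round-trip it back.
// In the real script the .gz would come from file_get_contents() instead.
$tsv = "tconst\ttitleType\tprimaryTitle\ntt0000001\tshort\tCarmencita\n";
file_put_contents('/tmp/demo.tsv.gz', gzencode($tsv));
gunzipFile('/tmp/demo.tsv.gz', '/tmp/demo.tsv');
echo file_get_contents('/tmp/demo.tsv');
```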

I knew that importing through Laravel’s create method would be slow… Way too slow for the amount of data IMDb has to offer. The first alternative was to open the file in PHP and insert row by row in a “foreach” loop, but why not skip PHP as much as I can?

So, after some digging on the net and some help from Neven (a friend and a DB expert), we found the LOAD DATA INFILE MySQL statement, which loads a CSV directly into the table, skipping PHP altogether. The statement is nice enough to return the number of rows it inserted. My final function looks like this:

private function importFileToTable(string $dataFile, string $tableName): int
{
    // Build the LOAD DATA INFILE statement for a tab-separated file,
    // skipping the header row.
    $sql = "LOAD DATA INFILE '$dataFile' INTO TABLE $tableName FIELDS TERMINATED BY '\\t' LINES TERMINATED BY '\\n' IGNORE 1 ROWS;";

    DB::connection()->disableQueryLog();

    // PDO::exec() returns the number of affected (here: inserted) rows.
    return DB::connection()->getPdo()->exec($sql);
}

LOAD DATA INFILE '$dataFile' – $dataFile is the path to the file. In my case it’s a TSV (Tab-Separated Values) file.
INTO TABLE $tableName – $tableName is the name of the table you want to import into. I have an array of filenames, and iterate through them.
FIELDS TERMINATED BY '\\t' – As I mentioned, I have a TSV, so the tab is the separator for my fields. If you have a CSV you can put “,” or whatever your delimiter is.
LINES TERMINATED BY '\\n' – each line ends with a newline.
IGNORE 1 ROWS – This skips the first row, since it holds the column names. You can also add ENCLOSED BY '"', which automatically strips the quotes if the data is enclosed in them.
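Putting all the clauses together, a CSV variant with quoted fields would look like this (the file path and table name here are made up for illustration):

```sql
LOAD DATA INFILE '/var/lib/mysql-files/title_basics.csv'
INTO TABLE title_basics
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS;
```

One thing to watch out for: the MySQL server may restrict which directories INFILE can read from via the secure_file_priv setting, and reading a file that lives on the client side requires the LOAD DATA LOCAL INFILE variant instead.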

A nice explanation of this statement can be found here: https://www.mysqltutorial.org/import-csv-file-mysql-table/
For now, I have not done any speed tests comparing it to a line-by-line import via PHP, but if I do, I’ll be sure to update this post.
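If I do get around to it, the comparison could be as simple as timing both code paths with microtime(). A minimal, database-free sketch of such a helper – the function name and the dummy callables are mine, purely illustrative:

```php
<?php
// Tiny benchmark helper: run a callable and return the elapsed seconds.
// In a real test the callables would wrap the LOAD DATA import and the
// row-by-row create() import respectively.
function timeIt(callable $fn): float
{
    $start = microtime(true);
    $fn();
    return microtime(true) - $start;
}

// Dummy stand-ins for the two import strategies.
$bulk = fn() => usleep(1000);   // pretend: LOAD DATA INFILE
$loop = fn() => usleep(5000);   // pretend: row-by-row create()

printf("bulk: %.4fs, loop: %.4fs\n", timeIt($bulk), timeIt($loop));
```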
