You are here:Home»KB»Programming»MySQL»How to export HTML from MySQL as a CSV without data corruption
Thursday, 03 May 2012 00:00

How to export HTML from MySQL as a CSV without data corruption

Written by

** this csv export procedure might not maintain cariage returns and will need adjustment if so. CKeditor QuickFaq article is a good example **

This is a worked example of exporting data from a QuickFaq via csv, data manipulation in libre office and then importing the articles into K2. The articles contain full html and must not be modified or escaped. This custom routine will work for migrating all sorts of data to and from a MySQL database with html, or not, without loosing any data

Exporting Data out of QuickFaq

  • export with phpmyadmin with the following settings:

export type CSV
fields terminated by    ,
fileds enclosed by    ¬¬¬replaceme¬¬¬
fields escaped by
lines terminated by
replace NULL by

Remove CRLF characters within fields - not ticked
Put fields names in the first row - ticked

NEW Terms in PHPMyadmin - still checking (29-05-14)

export type CSV
Columns separated with:    ,
Columns enclosed with:    ¬¬¬replaceme¬¬¬
Columns escaped with:
Lines terminated with:
Replace NULL with:

Remove CRLF characters within fields - not ticked
Put fields names in the first row - ticked

  • use notepad++ and replace all " with ""
  • use notepad++ and replace all ¬¬¬replaceme¬¬¬ with "
  • should now be importable as a cvs into libre office

Import CSV in to libre office

The csv export should now be imported into libre office with the following settings. The purpose of this step is so that the exported data can be matched up to the data structure of the K2 articles and categories

  • character set    Unicode (UTF-8), this assumes thats what your charater set is using. (the other optin is possibly Western Europe (ISO-8859-15/EURO) if you are using Latin). There should be no Bomb Characters, this shows there is a character that cannot be handled. I think unicode is probably a complete implementation of Unicode.
  • Language - Default (UK) - should not make any difference
  • from row 1
  • separated by    ,  (comma)
  • Text delimiter   "
  • leave everything else

Notes

  • K2 import/export module makes all " go to ""   , this seems to be the key for a successful import.
  • when you then import this using open office after doubling up the ", the import process removes a " out of each pair making it the syntax appear correct once the spreadsheet opens.

Importing Data into K2

i used navicat as phpmyadmin had some issues with not setting an escaped character.

  • open navicat and then select the K2 items table
  • run the import wizard in navicat
  • Select the appropriate csv file (ie QuickFaq Data - Ready for import into K2.csv)
  • leave encoding as 65001 (UTF-8)
  • next
  • select comma (,) for the Field delimeter, leave Record separator as CRLF (doesnt seem to make any difference), Text Qualifier as "
  • next
  • change first data row to 2
  • change the date settings (might be different for different csv but we will use QuickFaq to K2)

Date order    YMD
Date delimiter    -
Time delimeter    :
DateTime Order    Date Time (should be unchanged)
Decimal symbol    . (should be unchanged)

  • next
  • all fields should match up here
  • next
  • select Copy delete all records in destination, repopulate from the source
  • next
  • click start
  • there should be no errors for a successful import

Additional notes, these might not be required but when these instructions are used again can be pruned or merged.

on the working csv there only seems to be 1 LF at the end whereas the one that does not work has CRLF

\n = linefeed
\r = carriage return

solution is to use the pipe symbol the vertical line, seperating thing does not like multiple symbols

use CSV for export

Fields terminated by   |
Fields enclosed by     "
Fields escaped by    nothing here
Lines terminated by  nothing here
Replace NULL by NULL (possibly leave this blank)

Remove CRLF characters within fields - unticked
Put fields names in the first row   -   ticked

so unexplained is most likely due to a field seperator present in the html code such as ,";  so use one that is not present and is most likely the pipe symbol. a quick text scan will show it up.

to find the fault, find a break in the file when loaded in office/spreadsheet and then scan for that text in the csv file using notepad++ and this will reveal all.

¦ does not seem to work but give it a go

¬¬¬replaceme¬¬¬

Read 1492 times Last modified on Thursday, 29 May 2014 11:14