In this article, I’ll try to explain how you can get the data from the source, do basic analysis, load into your database and then prepare some basic charts. Once you are familiar with the basic stuff, you could further explore and create different complex charts.
Find the right dataset
I used NHS profile data downloaded from Statistics Canada website for this example. This profile presents information from the 2011 National Household Survey (NHS) for various levels of geography, including provinces and territories, census metropolitan areas/census agglomerations, census divisions, census subdivisions, census tracts, federal electoral districts and health regions. You may download NHS data for the geographic level of your choice in either CSV (comma-separated values) or TAB (tab-separated values) format. For this example we are using data for ‘Census tracts’ in CSV format (99-004-XWE2011001-401_CSV.zip).
Analyse your data
Before you start importing data into your database, you may want to analyse the dataset. Since the CSV file is large (60 MB), it might be wise to use some specialized tool to load and analyse the data. Spreadsheet applications such as Microsoft excel may not be the right to tool to handle large csv files. I used R studio to load and view the data. You could use the below command to browse and open csv in R Studio.
data <- read.csv(file.choose())
Once you have the data loaded in R Studio, you can view and analyse the data and do any necessary processing before loading it into the database
Load your data into database
There are different ways in which you can import data into MySQL database. I used the below steps to import my data files into the MySQL database on the cloud server.
Step1: Connect to server via SSH
This article may guide you on this step
Step2: Upload data files to the server using ftp
Upload your data files to a temporary folder in the server using ftp. If the file is zipped then use the below command to unzip from command line once you are connected via SSH.
Step3: Connect to MySQL
Now you need to connect to MySQL from command line.
In order to import data from command line you need to enable local-infile option.The local data offloading is off by default and we need to enable it. You can specify that as an additional option when setting up your client connection:
mysql -u myuser -p --local-infile somedatabase
This is because that feature opens a security hole. So you have to enable it manually in case you really want to use it.
Step4: Create a table in database
To import the CSV file, you need to create the target table in your database.
DROP TABLE IF EXISTS canada_nhs_stats_ct; CREATE TABLE canada_nhs_stats_ct ( Geo_Code int, Prov_Name varchar(50), CMA_CA_Name varchar(50), CT_Name varchar(50), GNR float(3,2), Topic varchar(100), Characteristic varchar(100), Note int, Total int, Flag_Total varchar(30), Male int, Flag_Male varchar(30), Female int, Flag_Female varchar(30) );
Now you are ready to load data into the table.
Step5: Load csv files in to the table
Use the below command to import data from csv file to your database table
load data local infile 'REPLACE_WITH_FILE_PATH/99-004-XWE2011001-401.csv' into table canada_stats_ct FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n' IGNORE 1 ROWS (Geo_Code,Prov_name,CMA_CA_Name,CT_Name,GNR,Topic,Characteristic,Note,Total,Flag_total,Male,Flag_Male,Female,Flag_Female);
Once you have the data in the data base you may use any server side language available on your server to fetch the data and output as required by the Javascrip charts library.
I'll be publishing detailed steps on how to create charts here soon. Till then please refer to this article for creating your own charts using PlusCharts.