Hello Readers,
Big Data is all the rage these days. On this blog we have mainly used R as our analysis program of choice (sorry, Python) to examine, model, and make predictions from data. R works best with data sets of a few hundred thousand rows or fewer. Because it holds data in memory, data sets with millions of rows or more usually slow it to a crawl. (Remember the neural network post?) Your computer's setup, such as available RAM and processor speed, also affects computation speed.
Because big data sets contain millions of rows, we only need to sample a few hundred thousand of them into R. But where do we store the millions of rows before sampling? Easy: in a database built for data sets with millions of rows, such as SQL Server.
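The sampling step itself does not require holding every row in memory. As a minimal sketch, assuming the rows arrive as a stream (for example, lines of a CSV file or rows fetched from a database cursor), reservoir sampling (Algorithm R) keeps a uniform random sample of fixed size k in a single pass. The function name `reservoir_sample` and the demo sizes are illustrative, not part of the original workflow.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(rows: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Return a uniform random sample of up to k items from a stream of unknown length.

    Reservoir sampling (Algorithm R): memory stays O(k) no matter how many
    rows stream past, so millions of rows never sit in memory at once.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            reservoir.append(row)
        else:
            # Row i (0-based) enters the reservoir with probability k / (i + 1),
            # replacing a uniformly chosen existing slot.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = row
    return reservoir


if __name__ == "__main__":
    # Demo on a synthetic stream of one million integers.
    sample = reservoir_sample(range(1_000_000), k=5, seed=42)
    print(len(sample), sorted(sample))
```

With a real file you could pass a `csv.reader` (after skipping its header row) as `rows` and set `k` to a few hundred thousand.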
We are not dealing with hundreds of millions or billions of rows, so resist the urge to mention the Hadoop Distributed File System (HDFS). Or should I say, not yet? Anyway, here we will load U.S. Census data sets with over a million rows each in preparation for sampling in R.
Getting Census Data
Specifically, we concentrate on two data sets from the U.S. Census American Community Survey (ACS) 2011: the nationwide population and housing data. You can download the data via FTP here, or choose 2011 ACS 1-year PUMS from the ACS site. The ACS covers social, economic, demographic, and housing data. A screenshot of the page is shown below:
Figure 1. American Community Survey (Census) Webpage
Figure 2. Choose File Format - CSV
Figure 3. Download Population and Housing Data Files
Figure 4. Unzipped Files
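If you prefer to script the unzip step instead of doing it by hand, here is a minimal Python sketch using only the standard library. It extracts every archive in a folder and counts the data rows in each CSV (header excluded), so you know how large each table will be before importing it. The folder names `acs_zips` and `acs_csv` are hypothetical, and the sketch assumes each CSV has exactly one header row.

```python
import csv
import zipfile
from pathlib import Path
from typing import Dict


def extract_archives(zip_dir: Path, out_dir: Path) -> None:
    """Extract every .zip archive found in zip_dir into out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for archive in sorted(zip_dir.glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(out_dir)


def count_data_rows(csv_dir: Path) -> Dict[str, int]:
    """Map each CSV file name in csv_dir to its number of data rows (header excluded)."""
    counts: Dict[str, int] = {}
    for path in sorted(csv_dir.glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as f:
            reader = csv.reader(f)
            next(reader, None)  # skip the header row, if present
            counts[path.name] = sum(1 for _ in reader)
    return counts


if __name__ == "__main__":
    # Hypothetical folders: downloaded archives in ./acs_zips, CSVs extracted to ./acs_csv.
    zips, csvs = Path("acs_zips"), Path("acs_csv")
    if zips.is_dir():
        extract_archives(zips, csvs)
        for name, n in count_data_rows(csvs).items():
            print(f"{name}: {n:,} rows")
```

Counting with `csv.reader` rather than raw line counts keeps the totals correct even if a quoted field contains a line break.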
Importing into SQL Server
Now for the crucial part: importing the data into a database system, SQL Server, instead of reading the CSV files directly into R. After connecting to your server, select the database into which you want to import the CSV files. Right-click it and, under Tasks, choose Import Data.
Figure 5. Import Files Option
Figure 6. Select Flat File Source
Figure 7. Column Previews
Figure 8. Table Destination
Figure 9. Select Source Tables
Figure 10. Last Check
Figure 11. Import Progressing
Figure 12. Execution Successful
Figure 13. Locating Imported Table
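The wizard steps above can also be scripted. The sketch below is a hedged analogue, not what the wizard does internally: it creates a table from the CSV header, streams the rows in with batched `executemany` calls, and then counts the rows to confirm the import, much like the last wizard screens. To keep it runnable anywhere, it uses Python's built-in `sqlite3` as a stand-in for SQL Server. With SQL Server you would open the connection through a DB-API driver such as `pyodbc`, which uses the same `?` placeholders. Every column is stored as TEXT for simplicity, and the file, table, and database names are hypothetical.

```python
import csv
import sqlite3
from itertools import islice
from pathlib import Path


def quote_ident(name: str) -> str:
    """Quote an identifier with double quotes, escaping any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def import_csv(conn: sqlite3.Connection, csv_path: Path, table: str,
               batch_size: int = 10_000) -> int:
    """Create `table` from the CSV header and insert rows in batches.

    All columns are TEXT (a simplifying assumption); returns rows inserted.
    """
    cur = conn.cursor()
    with csv_path.open(newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)
        cols = ", ".join(f"{quote_ident(c)} TEXT" for c in header)
        cur.execute(f"CREATE TABLE {quote_ident(table)} ({cols})")
        placeholders = ", ".join("?" for _ in header)
        insert_sql = f"INSERT INTO {quote_ident(table)} VALUES ({placeholders})"
        total = 0
        while True:
            # Pull the next batch so only batch_size rows are in memory at a time.
            batch = list(islice(reader, batch_size))
            if not batch:
                break
            cur.executemany(insert_sql, batch)
            total += len(batch)
    conn.commit()
    return total


def row_count(conn: sqlite3.Connection, table: str) -> int:
    """Count rows in `table` to verify that the import completed."""
    cur = conn.cursor()
    cur.execute(f"SELECT COUNT(*) FROM {quote_ident(table)}")
    return cur.fetchone()[0]


if __name__ == "__main__":
    # Hypothetical paths: an extracted ACS person file and a local database file.
    src = Path("acs_csv") / "ss11pusa.csv"
    if src.exists():
        conn = sqlite3.connect("acs2011.db")
        inserted = import_csv(conn, src, "ss11pusa")
        print(inserted, row_count(conn, "ss11pusa"))
        conn.close()
```

Batching keeps memory flat regardless of file size. The final `COUNT(*)` check should match the data-row count of the source CSV.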
Thanks for reading,
Wayne
@beyondvalence