Getting Started with Data Science and Analysis
I’ve been an instructor for Software Carpentry (SWC) for over a year now. It’s been a fascinating experience, and I’m proud to be part of an open source movement promoting best practices. Typically, when people start learning data science or analysis, the first things they look up are along the lines of “learn python”, “free online r course”, “data science python”, “r jobs”, etc., or they scan through the Coursera offerings. I’m a bit biased, but I think the SWC material is one of the best ways to get familiar with the basics. This isn’t a blog post about SWC per se, but about how you might go about learning and navigating some of the material on your own without attending a workshop.
Software Carpentry Material
SWC has a lessons page that links to the various lessons taught during workshops. The core material covers:
- Unix
- Some programming language like Python, R, MATLAB, etc
- Version control using Git or Mercurial
- Databases and SQL
If you watch and listen to the introductory browsercast (there is sound), you will see that we say we teach the above material, but in essence we try to convey:
- Automation of repetitive tasks
- Tracking and sharing work
- Building modular code
- Managing data
It’s a little bait-and-switch :)
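To make "automation of repetitive tasks" and "building modular code" a bit more concrete, here is a minimal sketch in Python. It is not taken from the SWC lessons; the function names, the folder of CSV files, and the column name are all hypothetical. The idea is simply to write one small function that handles a single file, then reuse it across many files instead of repeating the work by hand.

```python
import csv
from pathlib import Path
from statistics import mean


def column_mean(csv_path, column):
    """Return the mean of one numeric column in a single CSV file.

    Hypothetical helper: assumes the file has a header row containing
    `column` and that every value in that column parses as a number.
    """
    with open(csv_path, newline="") as handle:
        return mean(float(row[column]) for row in csv.DictReader(handle))


def summarize_folder(folder, column):
    """Apply column_mean to every .csv file directly inside `folder`."""
    return {
        path.name: column_mean(path, column)
        for path in sorted(Path(folder).glob("*.csv"))
    }
```

Calling something like summarize_folder("data", "weight") would return a dictionary mapping each file name to its mean, assuming a folder named data whose CSV files all have a weight column.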
Installing Things
Each SWC workshop is accompanied by a website. The website contains information about location, instructors, helpers, syllabus, etc. For people reading this post, the most important part may be the installation instructions towards the middle/bottom of the page. There are separate instructions depending on your operating system.
Essentially:
Data Carpentry Material
Software Carpentry’s sister organization, Data Carpentry, also has a set of lesson plans. From their About Us page:
Data Carpentry is a sister organization of Software Carpentry designed to teach basic concepts, skills and tools for working more effectively with data. We develop curricula and run workshops that are 1) domain specific; 2) target fundamental data analysis and data management challenges; and 3) require little or no prior programming experience. In many domains of research the rapid generation of large amounts of data is fundamentally changing how research is done. The deluge of data presents great opportunities, but also many challenges in managing, analyzing and sharing data. Data Carpentry aims to teach the data skills that will enable researchers to be more effective and productive.
Pick the language you want to do the lesson with, use Clone or Download ZIP to get a copy of the lesson, and go through the lessons using the .html, .Rmd, or .md files as mentioned above.
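Once a lesson has been cloned or unzipped, it can help to see which files are worth opening. The following is a rough sketch, not part of the SWC or Data Carpentry tooling: the function name and the folder you pass in are hypothetical, and it only assumes that lesson pages end in .html, .Rmd, or .md.

```python
from pathlib import Path

# File extensions the lessons are distributed in, compared in lowercase.
LESSON_SUFFIXES = {".html", ".rmd", ".md"}


def find_lesson_files(lesson_dir):
    """Return every lesson file under `lesson_dir`, sorted by path.

    Hypothetical helper: searches subfolders too, and matches the
    extension case-insensitively so both .Rmd and .rmd are found.
    """
    root = Path(lesson_dir)
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in LESSON_SUFFIXES
    )
```

For example, looping over find_lesson_files("my-lesson-folder") and printing each path gives a quick table of contents, where my-lesson-folder stands in for wherever you saved the lesson.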
Happy Learning!
Hopefully this has been clear enough. If not, post a comment, and I’ll respond and update this post accordingly.