Over the past month I have started learning skills for a future career in data analysis. I completed my Bachelor's degree in Psychological Science at the University of Queensland. Unlike many of my peers, I enjoyed the large statistics component. My favourite part of my thesis was using SPSS to extract meaning from the data I had collected.
Since I would like to work in data analysis, I decided to supplement my statistical knowledge with practical computer science skills. My focus will be on SQL and Python, and potentially R. I would also like to learn some machine learning and take part in Kaggle competitions. This blog will keep track of my learning.
So far I have been learning the basics. I completed "Intro to SQL: Querying and managing data" on Khan Academy. I liked how each chapter builds from smaller challenges up to a free-choice project. Being free to choose my own data for the projects made the learning feel less hand-held.
I learnt the basics of querying and modifying data. My next step will be learning how to create databases and import data into them. I'm familiar with Excel, so I'd like to try importing data from Excel and querying it. For further SQL learning I will use SQL Server Stairways. It's free, and the range of stairways is impressive, so I can choose which areas to focus on.
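The import-then-query workflow described above can be sketched in a few lines of Python. This is a minimal sketch under some assumptions: it uses SQLite through Python's built-in `sqlite3` module as a lightweight stand-in for SQL Server (the SQL dialects differ in places), it assumes the Excel sheet has first been saved as CSV, and the table name `results` and its columns are hypothetical example data.

```python
import csv
import io
import sqlite3

# Hypothetical data standing in for an Excel sheet saved as CSV.
# In practice you would open the exported .csv file instead of using a string.
CSV_TEXT = """participant,condition,score
1,control,12
2,treatment,17
3,control,9
4,treatment,15
"""


def load_csv_into_sqlite(csv_text: str) -> sqlite3.Connection:
    """Create an in-memory SQLite database and load the CSV rows into a table."""
    conn = sqlite3.connect(":memory:")
    # Create the (hypothetical) table before inserting any rows.
    conn.execute(
        "CREATE TABLE results (participant INTEGER, condition TEXT, score INTEGER)"
    )
    reader = csv.DictReader(io.StringIO(csv_text))
    # CSV values arrive as strings, so convert the numeric columns explicitly.
    rows = [(int(r["participant"]), r["condition"], int(r["score"])) for r in reader]
    # Parameterised inserts avoid building SQL strings by hand.
    conn.executemany("INSERT INTO results VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


conn = load_csv_into_sqlite(CSV_TEXT)

# Once imported, the data can be queried with ordinary SQL:
# here, the mean score for each condition.
query = "SELECT condition, AVG(score) FROM results GROUP BY condition ORDER BY condition"
for condition, mean_score in conn.execute(query):
    print(condition, mean_score)
```

If pandas is installed (it comes with Anaconda), `pandas.read_excel` followed by `DataFrame.to_sql` can load a spreadsheet into a database in fewer lines; the standard-library version above just makes each step visible.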
For Python, I am doing the Intro to Python for Data Science course on DataCamp. I was already familiar with the basics from an introductory software development course I took as a first-year elective at university. The course covers the basics (printing, strings, lists, tuples and calculations) and then moves on to packages and the functions they provide. The fourth chapter of this first course is dedicated to NumPy. NumPy arrays are clearly more useful than lists for data work, because a calculation can be applied to every element at once.
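The difference between lists and NumPy arrays shows up as soon as you try to do arithmetic on them. The sketch below uses hypothetical height and weight values (not data from the course) to compute a body mass index for each person; it assumes NumPy is installed, as it is in the Anaconda distribution.

```python
import numpy as np

# Hypothetical measurements: heights in metres, weights in kilograms.
heights = [1.73, 1.68, 1.71, 1.89]
weights = [65.4, 59.2, 63.6, 88.4]

# Plain lists do not do element-wise arithmetic:
# `heights * 2` repeats the list, and `weights / heights` raises a TypeError.
print(heights * 2)

# NumPy arrays apply the same operators element by element.
np_heights = np.array(heights)
np_weights = np.array(weights)
bmi = np_weights / np_heights ** 2  # one BMI value per person
print(np.round(bmi, 1))

# Arrays also support boolean indexing, e.g. keeping only BMIs above 21.
print(bmi[bmi > 21])
```

Doing the same with lists would need an explicit loop or list comprehension, which is why arrays are the usual starting point for numerical work in Python.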
I've downloaded the Anaconda distribution, so that when I work on my own computer I'll have all the useful data analysis packages. After completing this course I would like to return to SQL for a while.
Currently, I envision the following as my future learning projects:
- Learn how to create SQL databases.
- Work with real data.
- Deal with messy data. I’ve used Excel and SPSS for this and would like to learn other methods.
- Familiarise myself with some of the Python packages I’ve read about: numpy, pandas, matplotlib, scikit-learn, scipy.
- Data visualisation using Python.