After finishing Datacamp’s Introduction to Python course I decided to jump back to SQL and practice with some real data. I was quickly pulled short with the realisation that I needed to take a couple step backs in my SQL learning and improve my understanding of databases. Khan academy had taught me about the language, but not about how I could actually use it.

For Python, I went and downloaded and ran the Python package (and then a beefed up version in the form of Anaconda), that’s not how it works for SQL. SQL is different from Python in that it is a language used for many different Database Management Systems. These are the applications that store and analyse data. Each of these systems has their own flavour of the language, but all make use of the basics I have already learned. My next moves were to pick an application to use, and learn more database basics in general. I didn’t want to have a gaping hole in my fundamental knowledge.

Choosing a Relational Database Management System

A Relational Database Management System (RDBMS) is ….

In order to be able to make my own database and start excitedly running all the queries I can think of, I needed an application to work with. So many choices, so many opinions to read online.

I read through a lot of articles about which system to use. The criteria I had for deciding my relational database management system were:

  1. Free for me to use. I didn’t want to pay to learn a system I might never use professionally. There are quality choices available freely. I can always swap in the future if need be.
  2. Popular. While I can swap, it would be useful if a future position needed me to make use of a program I was already very familiar with. I wanted one popular in industry and that is used for the analysis I can see myself doing in the future.
  3. Easily available learning resources. I’m a beginner. Help will be needed.

From these criteria I decided on Microsoft’s SQL Server. There were several that met the criteria, so the decision was a bit arbitrary in the end. My top 3 choices were Oracle, MySQL and SQL Server.  It mostly came down to the learning resources that I’d already found were making use of it.

Learning About Databases

I watched the 5 part Database Fundamentals video series from Microsoft Virtual Academy. It features Brian Alderman (CEO, Microtechpoint) and Pete Harris (Sr. Content Publisher, Microsoft). The five part series covered:

  1. Introducing core database concepts
  2. Relational concepts
  3. Creating databases and database objects
  4. Using DML statements
  5. SQL Server administration fundamentals

I don’t think the two hosts have worked together much before. They were working at different levels. The main speaker was constantly being reminded it was a beginner’s course by the other. This improved after the first couple hours. Some sections could have used some editing out of unnecessary parts. There are probably videos out there with better production value, however the information presented in the series was useful.

Side-note: Brian Alderman kept saying genre as “jen-rah”, with the ‘jen’ like Jenny. He said it so many times I second guessed my own pronunciation and had to look it up. I was comforted to know it was him and not me pronouncing it wrong.

The second and fourth were my preferred parts of the series, and I’ll go in to what I learned from them more in-depth below. The first part was very general and introductory, though it was good to get my definitions down-pat. Plus some history never hurts. The third covered the creation of the databases. Showing how it can be done by using a GUI or by writing a script. I think I was hoping for more from this section than what was given. I already knew what the CREATE statement was, and they didn’t go in to more in-depth options available when creating a database. The fifth section was the only one specific to SQL Server. It covered security, in the sense of access rights and logins, as well as back-ups and recovery. Most of this was directed towards those more interested in becoming a systems administrator. I watched it through because I thought it would be good to understand the basics on these topics. Especially the backing up and recovery.

Relational concepts and normalisation.

During the second part I gained a greater theoretical understanding of databases during the discussion on normalising data. The normal forms were given as follows:

  1. Eliminate Repeating Groups
  2. Eliminate Redundant Data
  3. Eliminate Columns not Dependent on Key
  4. Isolate Independent Multiple Relationships
  5. Isolate Semantically Related Multiple Relationships

Only the first three were properly covered, as that is the extent most databases go to, apparently. I found the three steps intuitive, but having a conscious understanding of them is helpful. I could see myself not doing the third one on particular. I might have gone for a larger table where columns were related to other parts of the table, but not necessarily all related to the primary key.

Data integrity was another topic touched on that I would not have seen as an obvious advantage to a relational database. The use of foreign keys requires them to be correct and available in another table. If it is not there/ entered correctly then the command will not work.

Using DML statements.

The data modification language covered in the fourth part reiterated many of the SQL query commands I’d already learnt including: SELECT…FROM, WHERE, AND, OR, AS, BETWEEN, JOIN (INNER, OUTER, CROSS), ORDER BY. The aggregate commands: AVG, MIN, MAX were also demoed. There were several I had yet to come across which were: DISTINCT, UNION, EXCEPT and INTERSECT. I’m not sure how often I’d practically use UNION, but I was certainly glad to learn about the others and can see myself using them in queries in the future.

The aggregate commands: AVG, MIN, MAX were also demoed. I’d like to learn more about how much statistical power SQL has. Can I run statistical analyses through SQL Server?

Inserting into, and updating the data tables was covered. I was very happy to learn about the BULK INSERT command. I knew that it would exist and that I’ll use it in the future, but knowing its name saves me one google search.

The fourth part also briefly discussed indexes and triggers. I definitely would like more information on these two subjects. To my understanding, indexes here are different to the indexes in python. In python a list index is just its position in the list. For databases, one can index the data for storage/speed purposes. Indexing is sorting your data. If it is already sorted then it is quicker to sort through. This becomes more important the larger the database gets. There are different types of indexing. I’d like to research in to this subject more and write about it in the future. Triggers are a stored program that are set off when a certain criteria is met. For example one could set it so that if one table is updated it automatically updates a different table. A real life example would be if an employee’s last name was listed in multiple tables. If they changed their name it would only be necessary to update the name on one table.

Overall I found the database video series informative and I’m looking forward to booting up SQL server and creating my first database.


One thought on “Databases

  1. Pingback: French Word Frequency – Part 1 – Sarah Rowley

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s