So, you've just graduated, and you want to be a Data Scientist?
Data Science - an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from data in various forms, both structured and unstructured.
What programming skills do you really need?
The not-so-great news is that many of the skills university taught you aren’t that useful for real-life data science projects. The good news is that, as I understand it, those projects only call for a handful of coding skills. All of them are free to learn, and once you have one under your belt, the next is much easier to pick up.
The coding skills:
- bash/command line
- Python or R (Python is currently more popular and easier to learn)
- (and sometimes Java)
Which two or three you end up using will ultimately depend on the company you work for.
Tip - You can use Kaggle’s free Faster Data Science Education courses to learn coding languages and improve your current skills: https://www.kaggle.com/learn/overview
How do you show experience, when you haven’t had a Data Science job?
Everyone recognises that experience isn’t always easy to get, and that it is a bit of a chicken-and-egg situation. Everyone wants to hire someone with experience, and even at internship level, a person with experience is likely to be chosen over someone who has none. But in most professions, you need a job to gain experience. In data science at least, there are a few answers…
- Personal Projects
A personal project means picking a dataset that intrigues you and building a project around it. Define your project goal, analyse the data, and record and explain any assumptions you make during your analysis.
The key with these projects is to focus on the quality and maintainability of your code. You’ve probably already proven that you have the maths and analytics skills, so good-quality code is what will put you a step ahead of the other applicants.
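As a rough illustration of what maintainable project code can look like, here is a minimal sketch in Python. The dataset, the column names (`city`, `rent`) and the function names are all made up for the example. The point is the shape: small, documented functions, with assumptions written down next to the code that relies on them.

```python
from statistics import mean

# Hypothetical sample rows standing in for a dataset you have chosen.
ROWS = [
    {"city": "Leeds", "rent": "650"},
    {"city": "Leeds", "rent": ""},  # missing value
    {"city": "York", "rent": "720"},
    {"city": "York", "rent": "780"},
]

# Assumptions are recorded explicitly so a reviewer can challenge them.
ASSUMPTIONS = [
    "Rows with an empty 'rent' field are dropped rather than imputed.",
]


def clean_rows(rows: list[dict[str, str]]) -> list[dict]:
    """Drop rows with a missing rent and convert rent to a float."""
    return [
        {"city": row["city"], "rent": float(row["rent"])}
        for row in rows
        if row["rent"].strip()
    ]


def mean_rent_by_city(rows: list[dict]) -> dict[str, float]:
    """Return the average rent for each city."""
    rents: dict[str, list[float]] = {}
    for row in rows:
        rents.setdefault(row["city"], []).append(row["rent"])
    return {city: mean(values) for city, values in rents.items()}


if __name__ == "__main__":
    print(mean_rent_by_city(clean_rows(ROWS)))
    for note in ASSUMPTIONS:
        print("Assumption:", note)
```

Because each step is its own function, it is easy to test one piece at a time, and easy for someone reading your portfolio to follow what you did and why.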
- GitHub Portfolio
GitHub - a web-based version-control and collaboration platform for software developers.
Once you’ve done some personal projects and you’re feeling confident in your code, start to build a GitHub portfolio. You can also use GitHub to host a blog. A blog is a fantastic way to (a) explain your work and (b) practise writing about your methodology.
When you’re feeling confident enough, you can also use GitHub to give back to the Data Science community through open source projects (e.g. sklearn and pandas).
An introduction to GitHub - https://guides.github.com/activities/hello-world/
- Kaggle Competitions
Kaggle - a platform for data science competitions. It describes itself as helping you solve difficult problems, recruit strong teams, and amplify the power of your data science talent.
Kaggle is a great way to practise your coding and to see how other Data Scientists approach problems with their code. Use the other contributors as a way to learn and develop.
Do keep in mind that Kaggle competitions are just that: competitions, and the beat-the-clock kind. In a real data science job, you will spend far more of your time understanding and cleaning the data and building maintainable code than racing against the clock.
A good place to start - https://www.kaggle.com/c/titanic
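To give a flavour of the "understanding the data" work mentioned above, here is a small sketch that counts missing values per column before any modelling starts. The rows are invented for the example, and only a few column names are borrowed from the Titanic training file. The function name `missing_counts` is hypothetical too.

```python
import csv
import io

# A tiny made-up sample; the values are invented for illustration.
SAMPLE_CSV = """PassengerId,Survived,Sex,Age,Embarked
1,0,male,22,S
2,1,female,,C
3,1,female,26,
4,0,male,,S
"""


def missing_counts(csv_text: str) -> dict[str, int]:
    """Count empty cells per column so gaps are visible before modelling."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {name: 0 for name in reader.fieldnames or []}
    for row in reader:
        for name, value in row.items():
            if value is None or not value.strip():
                counts[name] += 1
    return counts


print(missing_counts(SAMPLE_CSV))
# {'PassengerId': 0, 'Survived': 0, 'Sex': 0, 'Age': 2, 'Embarked': 1}
```

A check like this takes minutes, and it tells you which columns need a decision (drop, fill in, or flag) before you reach for an algorithm.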
As a final note, one of our consultants was given this piece of advice for Data Science candidates (especially those doing technical tests at interview stage) by the Head of Data Science at one of our client companies:
“The most elegant and effective solution isn’t always an algorithm that’s complicated and over-engineered. Simplicity and maintainability are the key”.
By Emily Hill