Why Kaggle is the best way to sharpen your ML skills

Author: Gianluca Rossi

Published: June 19, 2021

A recent post on Twitter made me realize many don’t notice the actual value Kaggle brings to the larger ML community and Kagglers.

Most media attention focuses on Kaggle GrandMasters and the record-breaking prizes sometimes offered by the companies sponsoring those challenges. The publicity is good, but it does not do justice to Kaggle’s most valuable contributions.

Before diving into why Kaggle is such a good platform, I want to highlight the value of open competitions, like the ones organized by Kaggle.

Why are competitions so important?

The recent acceleration of innovation in many ML tasks is primarily driven by open competition and shared standards. If it weren’t for benchmarks like ImageNet, COCO, and WikiText, progress in Computer Vision and NLP would have been much slower. Companies and research labs would have continued to invest in research, but we would not be in a state where new SOTA models are released almost weekly.

These open competitions are a positive-sum game for society. Companies and labs don’t need to reinvent the wheel each time. Instead, they can build on top of knowledge shared by others to make progress rapidly. This practice also has other significant impacts on society, such as reduced pollution from less duplicated work and better, more affordable AI-driven products.

I can see lots of similarities between the competitions organized by Kaggle and, for instance, the yearly ImageNet challenge.

It’s a free apprenticeship, not unpaid labor

It is hard to deny the positive impact Kaggle has had in building a global community of people passionate about ML. It is also hard to deny the number of people who have transitioned to a career in Data Science thanks to what they learned while competing on Kaggle.

While competing with others, Kagglers learn transferable skills. For instance, the first thing you learn on Kaggle is the value of a robust cross-validation strategy. This skill is critical in the real world. Similarly, after a few competitions, people organically pick up good practices and learn how to work effectively, motivated by the need to keep up with contestants who have already learned such skills. These skills are hard to acquire in school, in a bootcamp, or by reading a book.

In this sense, Kaggle is a free apprenticeship led by the community.
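To make the cross-validation point above a bit more concrete, here is a minimal sketch of k-fold cross-validation in plain Python. It is only an illustration under my own assumptions: the helper names (`k_fold_indices`, `cross_validate`), the toy "predict the training mean" model, and the target values are all hypothetical, and in a real competition you would more likely reach for a library splitter and a validation scheme tailored to the data.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, n_folds=5, seed=0):
    """Yield (train_idx, valid_idx) pairs; each sample is validated exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding over the shuffled indices gives folds whose sizes differ by at most one.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i, valid_idx in enumerate(folds):
        train_idx = [j for k, fold in enumerate(folds) if k != i for j in fold]
        yield train_idx, valid_idx


def cross_validate(targets, n_folds=5, seed=0):
    """Score a toy 'predict the training mean' model with mean absolute error."""
    scores = []
    for train_idx, valid_idx in k_fold_indices(len(targets), n_folds, seed):
        # "Fit" on the training folds only, then evaluate on the held-out fold.
        prediction = mean(targets[j] for j in train_idx)
        scores.append(mean(abs(targets[j] - prediction) for j in valid_idx))
    return scores


# Hypothetical toy targets, purely for illustration.
targets = [3.0, 5.0, 4.0, 6.0, 5.5, 4.5, 7.0, 3.5, 6.5, 5.0]
fold_scores = cross_validate(targets, n_folds=5)
print(f"MAE per fold: {[round(s, 3) for s in fold_scores]}")
print(f"Mean MAE: {mean(fold_scores):.3f}")
```

The habit that matters here is that the model never sees the held-out fold while fitting, and that you look at the spread of scores across folds rather than a single number.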

A fantastic resource also for experienced scientists

There are also enormous benefits for people who have already established a career in ML. By participating in Kaggle competitions, you can get exposure to ML tasks you don’t face at work and keep up with the SOTA. In addition, constant exposure to best practices and tools allows Kagglers to continuously acquire new skills and get ready for even bigger challenges in the future.

For every Kaggler, the beauty of the platform is not about the Private Leaderboard or the Global Ranking. Those are just tools to encourage people to go the extra mile. Kaggle is about learning.

The platform also rewards people for sharing good datasets, notebooks, and discussions. I would also argue that many users don’t care about the Leaderboard and Rankings in general. Those are just side effects, and people have real lives. In a sense, it’s like running or any other hobby. Not everyone trains to win the Boston Marathon or the Olympics. Most people run just because it’s a healthy habit.

It’s a level playing field

Not everyone is lucky enough to work for a leading tech company, surrounded by dozens, if not hundreds, of the best minds in the field. Kaggle lets you learn from some of the very best talents in the industry, even if you live far from where the ML buzz happens. The only limit to your ability to grow is your motivation to try new things, read what others are sharing, and interact with them.

At the end of each competition, every top-ranked team shares a description of their solution. Sometimes, top-ranked contestants even share the code used to generate their submission. This knowledge-sharing tradition contributes to creating a repository of knowledge everyone can use in future competitions and real-life challenges. Moreover, this knowledge is open to everyone and not hidden behind a paywall.

New SOTA factory and a true benchmark for new models and tools

Nowadays, to be competitive in a Kaggle challenge, it is not sufficient to use out-of-the-box solutions. Instead, competitors often merge ideas from recent research papers or even create entirely new methods that are later published.

Competition in every Kaggle challenge is so fierce that recently released models and tools are immediately put to the test. This is a great way to understand how well the results of research papers generalize.

Conclusion

In summary, Kaggle is a positive-sum game for society and an excellent investment of your time if you’re interested in becoming a better data scientist and staying up to date. I hope more people join this vibrant and welcoming community.