Captain Obvious and the Obviousness of Tracking Your Machine Learning Code
Let's start with numbers so that you will take me seriously -
780 Million miles of driving data has been collected by Tesla
1 Million miles of driving data is collected by Tesla every hour
This is Big Data
Big Data is just that - data. Unsorted, incomprehensible and apparently useless, unless it is collected with clear objectives and in a desired format. Once you have that, you can bring in your machine learning tools to make sense of that seemingly incomprehensible data.
This is moot.
As with any code, you need to take 'snapshots' of your ML code - not for Instagram, but to make sure that if you mess up, you can go back to an earlier state and avoid hitting reset. This is made possible by Git - a version control system built for tracking changes to your code across a project. Without a clear tracking process, you will have a hard time trying to "analyze" code changes and eventually, Joey will come out to say -
Project Maintenance 101: Machine Learning Edition
From time to time you should retrain your model to keep it from going stale. It is important to understand that any change affecting the system can alter it in unforeseen ways. ML projects can stay in service for years, so it's important to have a process for transferring ownership and to write code that does not depend on the coding style of any one colleague.
Install, Apply and Track standards.
While coders get the leeway of being considered as artists and get to do their "own thing" - construction engineers, architects, structural engineers all have rigid and strict standards that prefer finesse over art.
(Art masquerading as finesse)
Where Do We Go From Here?
Classification is the cheat code that you want to use here - determine a suitable classification model. You're working with data and business challenges. Apply, adapt and address these challenges. To expand your ML knowledge or to seek help in rethinking your ML workflow, check out Comet.ml (this newsletter’s sponsor) and start tracking your ML experiments. You can start experimenting and use tools that allow people to build ML models together.
But the quality of the ML software should be high. Sometimes you can’t rely on an open-source library - especially if you need to use your results in production. Clarity, quality code, collaboration options, performance, and documentation written in plain language all contribute to project success and good analysis.
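To make "tracking your ML experiments" a little more concrete, here is a minimal sketch in plain Python. It is not how Comet.ml or Git work internally; it only illustrates the idea of recording, for every run, a fingerprint of the code together with the parameters and results. The function names (fingerprint_code, log_run, load_runs) and the JSON Lines log format are assumptions made up for this example.

```python
import hashlib
import json
import time
from pathlib import Path


def fingerprint_code(paths):
    """Return one SHA-256 hex digest covering the given source files."""
    digest = hashlib.sha256()
    # Sort so the fingerprint does not depend on the order of the arguments.
    for path in sorted(Path(p) for p in paths):
        digest.update(path.name.encode("utf-8"))
        digest.update(path.read_bytes())
    return digest.hexdigest()


def log_run(log_file, code_paths, params, metrics):
    """Append one experiment record to a JSON Lines file and return it."""
    record = {
        "timestamp": time.time(),
        "code_sha256": fingerprint_code(code_paths),
        "params": params,
        "metrics": metrics,
    }
    with open(log_file, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")
    return record


def load_runs(log_file):
    """Read every recorded run back as a list of dictionaries."""
    with open(log_file, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Calling log_run("runs.jsonl", ["train.py"], {"lr": 0.01}, {"accuracy": 0.9}) after each training run would leave a history you can read back with load_runs. Two runs with different code_sha256 values were produced by different code, which is exactly the question you will be asking six months later.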
If you have a few (or many) experiences to share, you can become a contributor on Hacker Noon too. Simply create an account, and join 10,000+ contributors sharing their knowledge and expertise with the rest of us. Maybe you'll get featured on our next newsletter too.