

Data Science at Home
Francesco Gadaleta
Cutting through AI bullsh*t. Come join the discussion on Discord! https://discord.gg/4UNKGf3
Episodes

Aug 29, 2019 • 31min
[RB] Complex video analysis made easy with Videoflow (Ep. 75)
In this episode I speak with Jadiel de Armas, senior software engineer at Disney and author of Videoflow, a Python framework that facilitates the quick development of complex video analysis applications and other stream-processing applications in a multiprocessing environment.
I have inspected the Videoflow repository on GitHub and some of the capabilities of this framework, and I must say it is really interesting. Jadiel is going to tell us a lot more than what you can read on GitHub.
References
Videoflow official GitHub repository
https://github.com/videoflow/videoflow

Aug 27, 2019 • 45min
[RB] Validate neural networks without data with Dr. Charles Martin (Ep. 74)
In this episode, I am with Dr. Charles Martin from Calculation Consulting, a machine learning and data science consulting company based in San Francisco. We speak about the nuts and bolts of deep neural networks and some impressive findings about the way they work.
The questions that Charles answers in the show are essentially two:
Why is regularisation in deep learning seemingly so different from regularisation in other areas of ML?
How can we master DNNs in a theoretically principled way?
References
The WeightWatcher tool for predicting the accuracy of Deep Neural Networks https://github.com/CalculatedContent/WeightWatcher
Slack channel https://weightwatcherai.slack.com/
Dr. Charles Martin Blog http://calculatedcontent.com and channel https://www.youtube.com/c/calculationconsulting
Implicit Self-Regularization in Deep Neural Networks: Evidence from Random Matrix Theory and Implications for Learning - Charles H. Martin, Michael W. Mahoney

Aug 20, 2019 • 21min
How to cluster tabular data with Markov Clustering (Ep. 73)
In this episode I explain how a community detection algorithm known as Markov clustering can be constructed by combining simple concepts like random walks, graphs, and similarity matrices. Moreover, I highlight how one can build a similarity graph from tabular data and then run a community detection algorithm on that graph to find clusters.
You can find a simple hands-on code snippet to play with on the Amethix Blog
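As a rough illustration of the expansion/inflation loop behind Markov clustering (my own minimal sketch, not the Amethix snippet; the toy adjacency matrix and parameter values are arbitrary choices):

```python
import numpy as np

def mcl(adjacency, expansion=2, inflation=2.0, iterations=50):
    """Markov Clustering: alternate expansion (matrix power, i.e. longer
    random walks) with inflation (element-wise power plus column
    re-normalisation) until the flow matrix converges, then read the
    clusters off the non-zero rows."""
    # Add self-loops and normalise columns to get a column-stochastic matrix
    M = adjacency + np.eye(adjacency.shape[0])
    M = M / M.sum(axis=0)
    for _ in range(iterations):
        M = np.linalg.matrix_power(M, expansion)  # expansion: spread the flow
        M = M ** inflation                        # inflation: boost strong currents
        M = M / M.sum(axis=0)                     # re-normalise each column
    # Non-zero rows belong to cluster "attractors"; their non-zero
    # columns list the members of that cluster
    clusters = []
    for row in M:
        members = frozenset(np.flatnonzero(row > 1e-6).tolist())
        if members and members not in clusters:
            clusters.append(members)
    return clusters

# Two obvious communities, {0,1,2} and {3,4,5}, joined by a single edge
A = np.array([
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
], dtype=float)
clusters = mcl(A)
```

Inflation raises every entry to a power and re-normalises, which strengthens flow inside dense regions and starves flow across sparse bridges; higher inflation values therefore yield finer-grained clusters.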
Enjoy the show!
References
[1] S. Fortunato, “Community detection in graphs”, Physics Reports, volume 486, issues 3-5, pages 75-174, February 2010.
[2] Z. Yang, et al., “A Comparative Analysis of Community Detection Algorithms on Artificial Networks”, Scientific Reports volume 6, Article number: 30750 (2016)
[3] S. Dongen, “A cluster algorithm for graphs”, Technical Report, CWI (Centre for Mathematics and Computer Science) Amsterdam, The Netherlands, 2000.
[4] A. J. Enright, et al., “An efficient algorithm for large-scale detection of protein families”, Nucleic Acids Research, volume 30, issue 7, pages 1575-1584, 2002.

Aug 14, 2019 • 14min
Waterfall or Agile? The best methodology for AI and machine learning (Ep. 72)
The two most widely adopted software development models in modern project management are, without a doubt, the Waterfall and Agile methodologies. In this episode I compare the two and explain what I believe is the best choice for your machine learning project.
An interesting post to read (mentioned in the episode) is How businesses can scale Artificial Intelligence & Machine Learning https://amethix.com/how-businesses-can-scale-artificial-intelligence-machine-learning/

Aug 6, 2019 • 22min
Training neural networks faster without GPU (Ep. 71)
Training neural networks faster usually involves powerful GPUs. In this episode I explain an interesting method from a group of researchers at Google Brain, who train neural networks faster by reusing, or "echoing", intermediate outputs of the data pipeline, so that the accelerator is not left idle waiting for fresh data.
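As a toy sketch of the idea (my own code, not the authors' implementation; the name `data_echoing` and the buffer parameters are mine): when the upstream pipeline (reading, decoding, augmentation) is the bottleneck, each batch it produces can be repeated a few times and shuffled through a small buffer before reaching the accelerator, so the expensive upstream stages run less often per training step.

```python
import random

def data_echoing(batches, echo_factor=2, buffer_size=8, seed=0):
    """Yield each upstream batch `echo_factor` times, shuffling the
    echoes through a small buffer so consecutive steps do not always
    see identical batches back to back."""
    rng = random.Random(seed)
    buffer = []
    for batch in batches:
        for _ in range(echo_factor):
            buffer.append(batch)
            if len(buffer) == buffer_size:
                rng.shuffle(buffer)
                yield from buffer
                buffer.clear()
    # Flush whatever is left when the upstream pipeline is exhausted
    rng.shuffle(buffer)
    yield from buffer

# Each of the 4 upstream "batches" reaches the trainer twice
echoed = list(data_echoing(range(4), echo_factor=2, buffer_size=4))
```

The trade-off, as discussed in the paper, is that echoed batches carry less fresh information than new ones, so the speed-up in wall-clock time has to be weighed against a possible increase in the number of steps to reach a target accuracy.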
Enjoy the show!
References
Faster Neural Network Training with Data Echoing
https://arxiv.org/abs/1907.05550

Jul 23, 2019 • 45min
Validate neural networks without data with Dr. Charles Martin (Ep. 70)
In this episode, I am with Dr. Charles Martin from Calculation Consulting, a machine learning and data science consulting company based in San Francisco. We speak about the nuts and bolts of deep neural networks and some impressive findings about the way they work.
The questions that Charles answers in the show are essentially two:
Why is regularisation in deep learning seemingly so different from regularisation in other areas of ML?
How can we master DNNs in a theoretically principled way?
References
The WeightWatcher tool for predicting the accuracy of Deep Neural Networks https://github.com/CalculatedContent/WeightWatcher
Slack channel https://weightwatcherai.slack.com/
Dr. Charles Martin Blog http://calculatedcontent.com and channel https://www.youtube.com/c/calculationconsulting
Implicit Self-Regularization in Deep Neural Networks: Evidence from Random Matrix Theory and Implications for Learning - Charles H. Martin, Michael W. Mahoney

Jul 16, 2019 • 31min
Complex video analysis made easy with Videoflow (Ep. 69)
In this episode I speak with Jadiel de Armas, senior software engineer at Disney and author of Videoflow, a Python framework that facilitates the quick development of complex video analysis applications and other stream-processing applications in a multiprocessing environment.
I have inspected the Videoflow repository on GitHub and some of the capabilities of this framework, and I must say it is really interesting. Jadiel is going to tell us a lot more than what you can read on GitHub.
References
Videoflow official GitHub repository
https://github.com/videoflow/videoflow

Jul 9, 2019 • 42min
Episode 68: AI and the future of banking with Chris Skinner [RB]
In this episode I have a wonderful conversation with Chris Skinner.
Chris and I recently got in touch at The Banking Scene 2019, a fintech conference held in Brussels. At that conference he spoke as a real troublemaker - that's how he defines himself - saying that “People are not educated with loans, credit, money” and that “Banks are failing at digital”.
After I got my hands on his latest book, Digital Human, I invited him to the show to ask him a few questions about innovation, regulation and technology in finance.

Jul 2, 2019 • 29min
Episode 67: Classic Computer Science Problems in Python
Today I am with David Kopec, author of Classic Computer Science Problems in Python, published by Manning Publications.
His book deepens your knowledge of problem-solving techniques from the realm of computer science by challenging you with interesting and realistic scenarios, exercises, and of course algorithms.
It covers the major topics any data scientist should be familiar with, such as search, clustering, graphs, and much more.
Get the book from https://www.manning.com/books/classic-computer-science-problems-in-python and use coupon code poddatascienceathome19 to get a 40% discount.
References
Twitter https://twitter.com/davekopec
GitHub https://github.com/davecom
classicproblems.com

Jun 25, 2019 • 19min
Episode 66: More intelligent machines with self-supervised learning
In this episode I talk about a new paradigm of learning, which at first may seem a bit blurry and not really different from the other methods we know of, such as supervised and unsupervised learning. The method I introduce here is called self-supervised learning.
Enjoy the show!
Don't forget to subscribe to our Newsletter at amethix.com and get the latest updates in AI and machine learning. We do not spam. Promise!
References
Deep Clustering for Unsupervised Learning of Visual Features
Self-supervised Visual Feature Learning with Deep Neural Networks: A Survey


