

DataTalks.Club
DataTalks.Club - the place to talk about data!

Nov 4, 2022 • 50min
Building Data Science Practice - Andrey Shtylenko
We talked about:
Audience Poll
Andrey’s background
What data science practice is
Best DS practice in a traditional company vs IT-centric companies
Getting started with building data science practice (finding out who you report to)
Who the initiative comes from
Finding out what kind of problems you will be solving (Centralized approach)
Moving to a semi-decentralized approach
Resources to learn about data science practice
Pivoting from the role of a software engineer to data scientist
The most impactful realization from data science practice
Advice for individual growth
Finding Andrey online
Links:
Data Teams book: https://www.amazon.com/Data-Teams-Management-Successful-Data-Focused/dp/1484262271/
ML Zoomcamp: https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Oct 28, 2022 • 53min
Large-Scale Entity Resolution - Sonal Goyal
We talked about:
Sonal’s background
How the idea for Zingg came about
What Zingg is
The difference between entity resolution and identity resolution
How duplicate detection relates to entity resolution
How Sonal decided to start working on Zingg
How Zingg works
What Zingg runs on
Switching from consultancy to working on a new open source solution
Why Zingg is open source
Open source licensing
Working on Zingg initially vs now
Zingg’s current and future team
Sonal’s biggest current challenge
Avoiding problems with entity/identity resolution through database design
Identity resolution vs basic joins, data fusions, and fuzzy joins
Deterministic matching vs probabilistic machine learning
Identity and entity resolution applications for fraud detection
Graph algorithms vs classic ML in entity resolution
Identity resolution success stories
What Sonal would do differently given the chance to start over with Zingg
Advice for those seeking to realize their own solution to a data problem
Reading suggestion from Sonal
Conclusion
Links:
Open-Source Spotlight demo "Zingg": https://www.youtube.com/watch?v=zOabyZxN9b0
Creative Selection: Inside Apple's Design Process During the Golden Age of Steve Jobs book: https://www.amazon.com/Creative-Selection-Inside-Apples-Process/dp/1250194466
ML Zoomcamp: https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Oct 21, 2022 • 51min
From Data Science to DataOps - Tomasz Hinc
We talked about:
Tomasz’s background
What Tomasz did before DataOps (Data Science)
Why Tomasz made the transition from data science to DataOps
What is DataOps?
How is DataOps related to infrastructure?
How Tomasz learned the skills necessary to move into DataOps
Becoming comfortable with the terminal
The overlap between DataOps and Data Engineering
Suitable/useful skills for DataOps
Minimal operational skills for DataOps
Similarities between DataOps and Data Science Managers
Tomasz’s interesting projects
Confidence in results and avoiding going too deep with edge cases
Conclusion
Links:
Terminal setup video, 19 minutes long: https://www.youtube.com/watch?v=D2PSsnqgBiw
Command line videos, one and a half hours to become somewhat comfy with the terminal: https://www.youtube.com/playlist?list=PLIhvC56v63IKioClkSNDjW7iz-6TFvLwS
Course from MIT talking about just that (command line, git, storing secrets): https://missing.csail.mit.edu/
ML Zoomcamp: https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Oct 14, 2022 • 54min
Data Science Career Development - Katie Bauer
We talked about:
Katie’s background
What is a data scientist?
What is a data science manager?
Quality of the craft
How data leaders promote career growth
Supporting senior data professionals
Choosing the IC route vs the management route
Managing junior data professionals
Talking to senior stakeholders and PMs as a junior
The importance of hiring juniors
What skills do data science managers need to get hired?
How juniors that are just starting out can set themselves apart from the competition
Asking senior colleagues for help and the rubber duck channel
The challenges of the head of data
Conclusion
Links:
Jobs at GlossGenius: https://boards.greenhouse.io/glossgenius
ML Zoomcamp: https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Oct 7, 2022 • 49min
From Testing Phones to Managing NLP Projects - Alvaro Navas Peire
We talked about:
Alvaro’s background
Working as a QA (Quality Assurance) engineer
Transitioning from QA to Machine Learning
Gathering knowledge about the ML field
Searching for an ML job (improving soft skills and CV)
Data science interview skills
Zoomcamp projects
Zoomcamp project deployment
How to not undersell yourself during interviews
Alvaro’s experience with interviews during his transition
Alvaro’s Zoomcamp notes
Alvaro’s coach
The importance of mathematical knowledge to a transition into ML
Preparing for technical interviews
Alvaro’s typical workday
Alvaro’s team’s tech stack
The importance of a technical background to transitioning into ML
Links:
Alvaro's CV: https://www.dropbox.com/s/89hkt3ug0toqa2n/CV%20nou%20-%20angl%C3%A8s.pdf?dl=0
GitHub profile: https://github.com/ziritrion
LinkedIn profile: https://www.linkedin.com/in/alvaronavas/
ML Zoomcamp: https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Sep 30, 2022 • 53min
Responsible and Explainable AI - Supreet Kaur
We talked about:
Supreet’s background
Responsible AI
Example of explainable AI
Responsible AI vs explainable AI
Explainable AI tools and frameworks (glass box approach)
Checking for bias in data and handling personal data
Understanding whether your company needs a certain type of data
Data quality checks and automation
Responsibility vs profitability
The human touch in AI
The trade-off between model complexity and explainability
Is completely automated AI out of the question?
Detecting model drift and overfitting
How Supreet became interested in explainable AI
Trustworthy AI
Reliability vs fairness
Bias indicators
The future of explainable AI
About DataBuzz
The diversity of data science roles
Ethics in data science
Conclusion
Links:
LinkedIn: https://www.linkedin.com/in/supreet-kaur1995/
Databuzz page: https://www.linkedin.com/company/databuzz-club/
Medium Blog Page: https://medium.com/@supreetkaur_66831
ML Zoomcamp: https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Sep 23, 2022 • 17sec
No episode this week
Have a great weekend!

Sep 16, 2022 • 59min
Leading Data Research - David Bader
We talked about:
David’s background
A day in the life of a professor
David’s current projects
Starting a school
The different types of professors
David’s recent papers
Similarities and differences between research labs and startups
Finding (or creating) good datasets
David’s lab
Balancing research and teaching as a professor
David’s most rewarding research project
David’s most underrated research project
David’s virtual data science seminars on YouTube
Teaching at universities without doing research
Staying up-to-date in research
David’s favorite conferences
Selecting topics for research
Convincing students to stay in academia and competing with industry
Finding David online
Links:
David A. Bader: https://davidbader.net/
NJIT Institute for Data Science: https://datascience.njit.edu/
Arkouda: https://github.com/Bears-R-Us/arkouda
NJIT Data Science YouTube Channel: https://www.youtube.com/c/NJITInstituteforDataScience
ML Zoomcamp: https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Sep 9, 2022 • 56min
Dataset Creation and Curation - Christiaan Swart
We talked about:
Christiaan’s background
Usual ways of collecting and curating data
Getting the buy-in from experts and executives
Starting an annotation booklet
Pre-labeling
Dataset collection
Human level baseline and feedback
Using the annotation booklet to boost annotation productivity
Putting yourself in the shoes of annotators (and measuring performance)
Active learning
Distant supervision
Weak labeling
Dataset collection in career positioning and project portfolios
IPython widgets
GDPR compliance and non-English NLP
Finding Christiaan online
Links:
My personal blog: https://useml.net/
Comtura, my company: https://comtura.ai/
LinkedIn: https://www.linkedin.com/in/christiaan-swart-51a68967/
Twitter: https://twitter.com/swartchris8/
ML Zoomcamp: https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html


