Thank you for visiting!
This site is intended to help explain who I am and who I want to become.
A little bit about me
My name is David (Dave) P. Van Anda. I’m currently based in New Jersey and working full-time at Divirod as a software and data engineer. Divirod is deploying the most comprehensive, scalable water data network in the world, in an effort to map and manage the world’s water resources accurately.
I’m also a graduate student at Indiana University Bloomington studying Data Science, with a particular interest in Machine Learning, Network Science, and Complex Systems. I’m a Research Fellow at IU, working on a project at the Kelley School of Business, where I model and analyze social contagion in corporate and professional environments.
Applied Machine Learning I526 with Dr. James Shanahan – Logistic Regression and regularization. Decision trees and pruning, implementation of decision trees. Support vector machines and making them work in practice. Boosting – implementing different boosting methods with decision trees. Using the algorithms for several tasks – how to set up the problem, debug, select features and develop the learning algorithm. Unsupervised learning – k-means, PCA, hierarchical clustering. Implementing the clustering algorithms. Parallelizing the learning algorithms.
Network Science I606 with Dr. Santo Fortunato – Models and algorithms used in network science. Programming for the analysis of networks of various types and for simulating the dynamics of processes running on them, like epidemic spreading and opinion dynamics.
Data Visualization DS590 with Dr. YY Ahn – Understand, explain, and manipulate different types of data, analyze them by applying exploratory visualization techniques, and create explanatory web-based visualizations. Evaluate the effectiveness of data visualizations based on the principles of human perception, design, types of data, and visualization techniques.
Natural Language Processing DS590 with Dr. Olga Scrivner – Domain-specific NLP techniques for data analysis featuring Healthcare, Banking, Marketing, Customer Service, and Technology domains.
Social Media Mining I639 with Dr. Ali Ghazinejad – Hands-on experience in mining social data for social meaning extraction (with a focus on sentiment analysis, due to the special importance of this task in various real world applications such as those related to market intelligence) using automated methods (e.g., natural language processing [NLP] and machine learning technologies). Read, discuss, and critique claims and findings from contemporary research related to SMM. Address practical issues related to building tools to mine social media.
Statistics S520 with Dr. Jianyu Wang – Discrete and continuous random variables, estimation, hypothesis testing, 1- and 2-sample location problems, ANOVA, and linear regression
Statistics S580 with Dr. Brad Luen – Regression models and non-parametric statistics
Time Series Analysis DS590 with Dr. Olga Scrivner – Regression, forecasting, ARIMA
What I’m Reading Right Now
Podcasts I Like Right Now
- The Jim Rutt Show
- Macro Voices
- The TWIML AI Podcast
Here’s a quick little function to interpolate missing values in a pandas dataframe with linear regression. Enjoy.
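The function itself isn't shown on this page, so here is a minimal sketch of the general idea rather than the original code. It assumes a single numeric predictor column and fits a straight line with NumPy's `polyfit`; the function name `interpolate_with_regression` and the column names `x` and `y` are made up for illustration.

```python
import numpy as np
import pandas as pd


def interpolate_with_regression(df, target, predictor):
    """Fill NaNs in `target` with predictions from a line fit on `predictor`.

    Hypothetical helper: a sketch of the idea, not the original function.
    """
    out = df.copy()  # leave the caller's dataframe untouched

    # Rows usable for fitting: both columns present.
    known = out[target].notna() & out[predictor].notna()
    # Rows to fill: target missing, predictor present.
    missing = out[target].isna() & out[predictor].notna()

    # A line needs at least two points; also skip if there is nothing to fill.
    if known.sum() < 2 or not missing.any():
        return out

    # Ordinary least-squares fit of target = slope * predictor + intercept.
    slope, intercept = np.polyfit(
        out.loc[known, predictor], out.loc[known, target], deg=1
    )
    out.loc[missing, target] = slope * out.loc[missing, predictor] + intercept
    return out


# Toy data: y follows 2 * x + 1, with one value missing.
df = pd.DataFrame(
    {"x": [0.0, 1.0, 2.0, 3.0, 4.0], "y": [1.0, 3.0, np.nan, 7.0, 9.0]}
)
filled = interpolate_with_regression(df, target="y", predictor="x")
```

Rows where the predictor is also missing are left as NaN, since there's nothing to predict from.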
GitHub Repository: https://github.com/d141/Contagious-Inspiration
Notebook: https://github.com/d141/Contagious-Inspiration/blob/main/Train_Inspiration.ipynb
David Van Anda, Indiana University
Introduction
The Bandwagon Effect was first described by Sundar et al. in 2008.[1,2] In their studies, they show that feedback about products from other people will influence an individual’s decision to purchase. This usually comes in the form of ratings…
Continue reading “Inspirational Priming and the Bandwagon Effect on Reddit”