layout | permalink | title | tags | imagefeature | chart | mathjax
---|---|---|---|---|---|---
page | /about/index.html | Aashita Kesarwani | | | true | true
I am a Math PhD graduate with a passion for data and machine learning. I am especially interested in two areas: data visualization and natural language processing. I have recently joined Harvey Mudd College as a Scientific Computing Specialist.
Open source contributions related to data science:
- Bubbly (Author and Maintainer)
  An easy-to-use Python package for plotting interactive and animated bubble charts using Plotly. The animated bubble charts can accommodate up to seven variables in total, viz. X-axis, Y-axis, Z-axis, time, dots, their size and their color, in a compact and captivating way with plenty of customization, and can be used with Plotly as demonstrated in the notebook here.
- nytcomments (Author and Maintainer)
  A Python package with three main functions that perform three distinct tasks: retrieving comments and articles from the New York Times as ready-to-use datasets for data science and machine learning projects.
  - The main function get_dataset returns two dataframes, one for the articles and one for their comments. The retrieval can be customized with a number of parameters, such as a specific timeline for the articles, search keywords, filter queries, etc.
  - The function get_articles is a wrapper for the NYT Article Search API that returns cleaned and preprocessed article data as a ready-to-use pandas dataframe (or CSV files). The retrieval can be customized with the same options as above.
  - The function get_comments retrieves the comments on NYT article(s) given their URLs. It can be used as a substitute for the comments-by-URL option in the NYT Community API, which is now deprecated and has an unresolved issue. Unlike the two functions above, it does not use the NYT API for retrieval.

  The package is accompanied by an illustrative tutorial with detailed information about the functions and their parameters.
- A dataset contributed to Kaggle that was among the 20 featured datasets, comprising over 1.2 million comments with 34 variables and over 9,000 articles with 16 variables, along with ideas for data science projects.
Project concerning comments posted on New York Times articles:
- Exploratory data analysis of the features contained in the comments and articles datasets with statistical graphs.
- Bag-of-words models to predict the probability that a given comment will be selected as an NYT pick.
- Logistic Regression model coupled with Latent Semantic Analysis (LSA) on Tf-Idf vectors of word and character n-grams of the comments' text.
- NB-Logistic Regression model inspired by the paper Baselines and Bigrams: Simple, Good Sentiment and Topic Classification by Sida Wang and Chris Manning.
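For readers unfamiliar with the NB-Logistic Regression idea, here is a minimal sketch of the log-count ratio described in the Wang and Manning paper, using binarized word-presence features. It is an illustration only, not the code from the project: the function names `log_count_ratios` and `nb_features` and the toy comments are hypothetical, and the logistic regression step that would consume these features is left out.

```python
import math
from collections import Counter


def log_count_ratios(docs, labels, alpha=1.0):
    """Return {token: r} with r = log((p / ||p||_1) / (q / ||q||_1)).

    p and q are smoothed counts of documents that contain the token in the
    positive (label 1) and negative (label 0) class respectively.
    """
    vocab = sorted({tok for doc in docs for tok in doc.split()})
    pos, neg = Counter(), Counter()
    for doc, label in zip(docs, labels):
        target = pos if label == 1 else neg
        target.update(set(doc.split()))  # binarized: presence, not frequency
    p = {t: alpha + pos[t] for t in vocab}
    q = {t: alpha + neg[t] for t in vocab}
    p_norm, q_norm = sum(p.values()), sum(q.values())
    return {t: math.log((p[t] / p_norm) / (q[t] / q_norm)) for t in vocab}


def nb_features(doc, ratios):
    """Scale binarized bag-of-words features by r (unknown tokens are dropped)."""
    return {t: ratios[t] for t in set(doc.split()) if t in ratios}


# Toy data (made up for illustration): label 1 = selected as a pick.
docs = ["great insightful comment", "great point", "rude comment", "rude spam"]
labels = [1, 1, 0, 0]
ratios = log_count_ratios(docs, labels)
features = nb_features("great comment", ratios)
```

Tokens that appear mostly in picked comments get a positive ratio and tokens that appear mostly in the other class get a negative one; a logistic regression model would then be fitted on these scaled features instead of the raw counts.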
Cool projects related to the NYT dataset:
- An automated Twitter bot @OnAffairs trained to comment on current affairs using a Markov chain model on comments from NYT articles. [Code]
- Word clouds of various shapes for the visualization of different textual features from the NYT articles and comments datasets.
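The Markov chain approach mentioned for the bot can be sketched in a few lines. This is a generic word-level text generator written for illustration, not the bot's actual code; the names `build_chain` and `generate`, the chain order of 2, and the sample comments are all assumptions.

```python
import random
from collections import defaultdict


def build_chain(comments, order=2):
    """Map each tuple of `order` consecutive words to the words seen right after it."""
    chain = defaultdict(list)
    for comment in comments:
        words = comment.split()
        for i in range(len(words) - order):
            state = tuple(words[i:i + order])
            chain[state].append(words[i + order])
    return chain


def generate(chain, max_words=30, seed=None):
    """Random-walk the chain from a random starting state to produce text."""
    if not chain:
        return ""
    rng = random.Random(seed)
    state = rng.choice(sorted(chain))  # sorted so a fixed seed is reproducible
    output = list(state)
    while len(output) < max_words:
        followers = chain.get(state)
        if not followers:  # dead end: no word ever followed this state
            break
        output.append(rng.choice(followers))
        state = tuple(output[-len(state):])
    return " ".join(output)


# Made-up sample comments; a real run would use a large corpus of comments.
sample_comments = [
    "the quick brown fox jumps",
    "the quick red fox sleeps",
]
chain = build_chain(sample_comments, order=2)
text = generate(chain, max_words=10, seed=0)
```

A higher order makes the output more coherent but closer to the original comments, while a lower order produces more varied and less grammatical text.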
Some of my short projects in data science can be found at the following links:
- Titanic survival prediction (Top 3% score in Kaggle competition)
- Feature Engineering for 2018 NCAA Division I Men’s Basketball Championships predictions
- The effect of recession on housing prices in university towns
- Plotting record temperatures for New Orleans
- Extraction of dates from medical records
My PhD thesis titled Theory of the generalized modified Bessel function \(K_{z,w}(x)\) and \(2\)-adic valuations of integer sequences is linked here.
Please feel free to contact me at contact@aashitak.com or connect with me on other platforms (GitHub/LinkedIn/Kaggle) using the icons in the bottom bar of the website.