Manoj Kumar

About Manoj Kumar:

In my previous role as a data scientist at Infosys, I had the opportunity to collaborate with a talented team of data scientists to develop a disease prediction model using a large healthcare dataset. Our work involved various stages of the data science lifecycle, and I played a crucial role in several key areas.

One of my primary responsibilities was conducting extensive data pre-processing to ensure the quality and reliability of the dataset. I successfully handled categorical variables through label encoding techniques and implemented appropriate methods to address missing data effectively.

To enhance the model's performance and robustness, I explored and implemented ensemble methods such as Random Forest, Gradient Boosting, and XGBoost. By carefully selecting and combining these algorithms, we achieved significant improvements in prediction accuracy, outperforming traditional semi-supervised learning approaches.

Moreover, I leveraged clustering techniques, including K-Means and hierarchical clustering, to uncover valuable insights within the data and further enhance the accuracy of our predictions. Through meticulous evaluation and comparison of various classification algorithms, such as SVM, Decision Tree, and KNN, I gained expertise in selecting the most suitable models and hyperparameters based on appropriate evaluation metrics.

Throughout my projects, I consistently used measures of central tendency (mean, median, and mode) to impute missing values, applied the interquartile range (IQR) technique for outlier detection and treatment, and used the variance inflation factor (VIF) to detect multicollinearity among features. I also utilized preprocessing methods like Standard Scaling, Min-Max Scaling, and Robust Scaling to bring features onto comparable scales and improve model performance.

In addition to my technical skills, I have experience working with automation tools and frameworks. I have scaffolded API project folders using Cookiecutter and Bash scripts, incorporating Flask and version control with Git. Furthermore, I have deployed projects to cloud platforms such as AWS, Azure, and Heroku through CI/CD pipelines built with GitHub Actions or CircleCI, using Docker for containerized deployment.

Documentation is an essential part of my work, and I have prepared detailed High-Level Design (HLD) and Low-Level Design (LLD) documents to provide comprehensive project understanding. I am proficient in applying EDA techniques, including decomposition and manifold methods such as PCA and t-SNE. Furthermore, I have applied deep learning techniques, including Artificial Neural Networks (ANN), to solve regression and classification problems, implementing activation functions, forward propagation, and optimization algorithms like gradient descent.

Throughout my career, I have developed multiple pipelines encompassing data ingestion, EDA, ensemble methods, and various machine learning algorithms, along with hyperparameter tuning. I have also extracted data from databases such as MongoDB Atlas and MySQL, ensuring the integrity and security of the data. I am adept at running experiments on copies of a dataset so that the original data stays untouched, and at maintaining a project dashboard to track and communicate progress effectively.

As a proactive team player, I have utilized tools like Slack to facilitate seamless communication and collaboration within cross-functional teams. I am familiar with RADAR for issue tracking and have a strong commitment to delivering high-quality work.

Experience