simantalahkar/README.md

👋 Hey there! I'm Simanta Lahkar

🔬 Computational Physicist turned Scientific Software Developer and Data Engineer
🧑‍💻 Currently designing databases for large-scale atomistic simulation data at TU Eindhoven × IBM Research
🚀 Passionate about bridging scientific computing with modern data engineering and AI technologies
💡 Love building tools that make complex scientific and engineering workflows accessible and scalable
🌍 Based in Den Bosch, Netherlands
⚡ Fun fact: I enjoy cooking, drone cinematography, and swimming when not debugging simulation data pipelines!

๐ŸŒ Connect with Me:

Portfolio LinkedIn Email

💻 Tech Stack:

Core Programming & Development

Python SQL C++ Git

Data Engineering & Big Data

Apache Spark Apache Airflow PostgreSQL Docker

Data Analysis & Visualization

Matplotlib Plotly Excel

Scientific Computing & Machine Learning

NumPy Pandas scikit-learn TensorFlow MATLAB

Cloud & Infrastructure

Linux Databricks Jupyter Notebook

🚀 What I'm Working On:

🔬 Scientific Cloud-Native Data Infrastructure & AI Integration

Building an open-source, cloud-native pipeline for large-scale molecular dynamics data. It uses MinIO for scalable storage, Apache Spark and Delta Lake to transform raw trajectories into structured formats, and Trino for fast SQL querying. MLflow supports reproducible AI workflows, and Apache Airflow orchestrates the whole pipeline. The focus is scalable, metadata-rich infrastructure for scientific computing.

🧬 LAMMPSKit - Production-Ready Scientific Package

GitHub | PyPI

Developed a modular Python toolkit for LAMMPS simulation analysis, backed by 270+ tests (94% coverage), Dockerized for portability, and supported by a robust CI/CD pipeline. Achieved 60% memory savings and 40% faster execution compared to typical scientific scripting workflows.

โš›๏ธ LAMMPS Extension for Electrochemical Simulations

GitHub Extended LAMMPS with C++ to integrate two open-source packages for novel electrochemical device simulations, navigating complex licensing and attribution challenges.

💼 Professional Focus:

🎯 Seeking opportunities in:

  • Scientific Software Development & Computational Materials Science
  • Data Engineering, Analytics & Data Stewardship
  • Modeling & Simulation Engineering
  • AI/ML Applications in Scientific Computing

🔧 Core Expertise:

  • Data Analysis & Insights: Statistical analysis of large scientific datasets with advanced visualization
  • Materials Science Modeling: Molecular dynamics simulations, DFT calculations, and multi-scale modeling
  • Performance Optimization: Algorithmic improvements achieving significant memory and speed gains
  • Data Pipeline Architecture: Real-time streaming and batch processing for scientific workflows
  • Data Governance: Metadata management, data quality assurance, and reproducible research practices
  • Full-Stack Scientific Computing: From Python APIs to C++ algorithms to cloud deployment
  • Production Software Development: CI/CD, automated testing, containerization, and package distribution

📊 Current Learning Journey:

🌱 Databricks Certified Data Engineer (in progress)
🌱 Cloud-native data lake architectures & data governance
🌱 Advanced statistical analysis and predictive modeling
🌱 Graph-based ML for scientific applications
🌱 Natural language interfaces for scientific databases

🎓 Background:

PhD in Materials Science & Engineering from Shanghai Jiao Tong University with expertise in computational modeling, machine learning, and numerical methods. Transitioned from pure research to building production-ready scientific software that solves real-world problems.

Key Achievement: Led a collaboration with IBM that resulted in a 10x improvement in device stability through innovative simulation algorithms and data processing pipelines.

๐Ÿ—ฃ๏ธ Languages:

  • ๐Ÿ‡ฌ๐Ÿ‡ง English (Professional - C2)
  • ๐Ÿ‡ณ๐Ÿ‡ฑ Dutch (Learning - Beginner)
  • ๐Ÿ‡จ๐Ÿ‡ณ Chinese (Basic - A1)
  • ๐Ÿ‡ฎ๐Ÿ‡ณ Hindi, Assamese, Bengali (Native)

💬 Let's connect! I'm always excited to discuss scientific computing, materials science research, data engineering challenges, or opportunities to make complex data more accessible through better analysis and visualization. Whether you're looking to optimize simulation workflows, design scalable data architectures, implement data governance, or bridge the gap between research and production - I'd love to hear from you!

📫 Reach out: Email | LinkedIn | Portfolio

Pinned

  1. LAMMPS-CTIP-EChemDID (Public)

    Main repo for modified CTIP and EChemDID packages of LAMMPS, and associated helper tools

    C++, 2 stars

  2. lammpskit (Public)

    lammpskit is a Python toolkit for post-processing and analyzing molecular dynamics (MD) simulations with LAMMPS. Its modular data processing and analysis functions are broadly applicable to scienti…

    Python, 2 stars, 3 forks