Yohan M Markose

Data Engineer | Software Developer | Data Scientist

About Me

👨‍💻

Data Engineer with 2+ Years of Experience

I'm a passionate Data Scientist currently pursuing my Master's in Information Systems at Northeastern University (GPA: 4.0). With over 2.5 years of professional experience at IQVIA, I specialize in building scalable data pipelines using Python and SQL, implementing ETL processes, and developing automation solutions that saved 250+ hours per month.

I'm currently exploring agentic RAG pipelines with LangGraph, end-to-end data pipelines built on Snowflake and vector databases, and automated workflows with Apache Airflow. My projects demonstrate modern data engineering practice, including containerized deployments with Docker and Docker Compose and full-stack data product platforms built with FastAPI and Streamlit. I care about building data solutions that solve technical challenges and also make a meaningful impact on business decisions and outcomes.

Technical Skills

Programming Languages

Python SQL Java C++ HTML5 CSS3

Python Libraries & Frameworks

NumPy Pandas Scikit-learn Matplotlib TensorFlow FastAPI Beautiful Soup Selenium Seaborn LangChain LangGraph Streamlit LiteLLM PyMuPDF

Tools & Platforms

Microsoft Excel Apache Airflow Docker Git/GitHub MCP Power BI Pinecone ChromaDB Mistral OCR

Database & Cloud

Snowflake MySQL dbt AWS S3 GitHub Actions Google Cloud Run GCP Redis Streams Google Compute Engine

Professional Experience

Oct 2020 - Apr 2023

Software Developer

IQVIA | Kochi, Kerala, India
  • Led automation team, reducing manual QA tasks by 250+ hours/month
  • Implemented Python ETL pipelines, cutting data prep time from 10 hours to 10 minutes
  • Automated deliverables using Selenium, saving 75+ manual hours monthly
  • Developed Python scripts for multi-source data integration, reducing processing time by 80%
  • Maintained SQL databases and provisioned 15+ reports monthly
  • Mentored 7 team members in Python automation and Excel data analysis

Jan 2020 - Apr 2020

Software Developer - Intern

IQVIA | Kochi, Kerala, India
  • Developed on-demand QA automation solutions using Python
  • Scraped data from client dashboards using Beautiful Soup and Selenium
  • Processed and visualized data for actionable insights

Projects

Venture Scope (Multi-Agent Agentic RAG Application)

MCP Servers LangGraph LangChain Snowflake Pinecone FastAPI Streamlit AWS S3 Docker Google Cloud Run

NVIDIA FINRAG (Agentic RAG)

Agentic RAG pipeline providing curated financial information for NVIDIA using LangGraph. Features three distinct tools: Snowflake for stock analysis, RAG for quarterly reports, and Web Search for real-time data.

LangGraph LangChain Snowflake Pinecone FastAPI Streamlit AWS S3 Docker Google Cloud Run

Financial RAG Pipeline

RAG Pinecone ChromaDB Airflow Selenium LangGraph LangChain FastAPI Streamlit AWS S3 Docker Google Cloud Run

Federal Reserve Economic Data Pipeline (Snowflake Pipeline)

Built Snowflake pipeline using Snowpark for Python to extract and analyze U.S. Treasury yield data. Created dashboard displaying the inverted Treasury yield curve.

Snowflake Snowpark GitHub Actions AWS S3 Docker Google Cloud Run

SEC Bridge - Financial Statement Analytics Platform

Architected data pipeline leveraging Airflow and Snowflake to automate SEC financial dataset processing. Implemented automated quality checks and built full-stack solution.

Snowflake Airflow dbt FastAPI Streamlit AWS S3 Docker Google Cloud Run

Electric Car Emission Impact - Bayesian Statistics

Python Bayesian Statistics NumPy Pandas Matplotlib Seaborn

Education

Sep 2024 - Expected May 2026

Master of Science in Information Systems

Northeastern University | Boston, MA

GPA: 4.0

Relevant Courses: Data Science Engineering Methods, Big-Data Systems and Intelligence Analytics, Application Engineering & Development, Program Structure and Algorithms

Aug 2016 - Jun 2020

Bachelor of Technology in Mechanical Engineering

Mar Athanasius College of Engineering | Kerala, India

Certifications

🏆

IBM Data Science

Jun 2024

📊

Microsoft Certified: Power BI Data Analyst Associate

Dec 2022

🐍

PCAP - Certified Associate In Python Programming

Nov 2021

Let's Connect

I'm actively seeking internship and full-time opportunities in Data Engineering and Analytics.