Sohaib Shamsi — Full-Stack & AI Engineer

About Me

Turning Complex Problems Into Elegant Solutions

I'm a Computer Science graduate from FAST NUCES, Karachi, with a deep focus on Artificial Intelligence, Machine Learning, and Full-Stack Engineering.

I specialize in designing and deploying end-to-end AI pipelines: from data ingestion and model training to production APIs and scalable infrastructure. My work spans fraud detection systems, NLP engines, recommendation algorithms, reinforcement learning environments, and distributed data architectures.

Beyond code, I mentor aspiring AI engineers, compete in programming contests (IEEE Xtreme — Top 10 nationally), and regularly publish technical content on system design and ML engineering.

AI/ML Engineering

End-to-end ML pipelines, model deployment, MLOps

Full-Stack Dev

React, Node.js, Python, REST APIs, databases

Data Engineering

Apache NiFi, Solr, Ozone, distributed systems

Skills

My Tech Arsenal

Technologies and tools I use to bring ideas to life

Core Stack

AI / Machine Learning

PyTorch, TensorFlow, Scikit-learn, Hugging Face, LangChain, OpenAI API, Anthropic API, XGBoost, NLTK / spaCy, OpenCV, MLflow, W&B, RAG Pipelines, RLHF

Full-Stack Development

React, Next.js, Node.js, Express, FastAPI, Flask, TypeScript, Tailwind CSS, PostgreSQL, MongoDB, Redis, GraphQL

Data & Cloud Engineering

Apache NiFi, Apache Solr, Apache Ozone, Apache Tika, Apache Kafka, Docker, Kubernetes, AWS, CI/CD, Terraform, Airflow, Spark

Tools & Practices

Git / GitHub, Linux, VS Code, Jupyter, Agile / Scrum, System Design, REST APIs, Microservices, Testing, Documentation

Projects

What I've Built

A selection of projects that showcase my expertise in AI, ML, and full-stack development

Fraud Alert Triage & Evaluation Pipeline

AI-driven alert triage module for fraud risk management at Paysys. It automatically prioritizes and classifies high-volume transaction alerts using an ML pipeline that retrains dynamically on elastic data windows to adapt to evolving fraud patterns.

Python, XGBoost, Scikit-learn, FastAPI, PostgreSQL, Docker

Dynamic retraining with elastic data windows

Real-time alert classification at scale

Continuous feedback loop integration

Enterprise Data Archiving Pipeline

Engineered a structured and unstructured data archiving pipeline using Apache NiFi, Tika, and Solr for content extraction, indexing, and searchability. Integrated long-term storage with Apache Ozone (S3-compatible) for scalability and durability.

Apache NiFi, Apache Tika, Apache Solr, Apache Ozone, Java, S3

Multi-format content extraction

Full-text search with Solr indexing

S3-compatible lifecycle management

Agentic RL Framework for LLM Teaching

Designed and implemented an agentic AI framework for reinforcement learning tasks using Anthropic models at Preference Model. Built RL agents with frozen language models for inference, developing a teaching pipeline where agents curate high-quality offline datasets.

Python, Anthropic API, PyTorch, RL, Transformers, RLHF

Agentic AI framework for RL tasks

Frozen LLM inference pipeline

Automated judge evaluation system

Insulin Resistance & ML Analysis

Applied advanced ML models to public physiological and clinical time-series datasets to study insulin resistance under noisy, non-stationary data conditions. Built robust preprocessing pipelines for biomedical signal analysis.

Python, TensorFlow, Pandas, SciPy, Matplotlib, Jupyter

Time-series clinical data modeling

Noise-robust feature engineering

Biomedical signal processing

Collaborative Filtering Recommendation System

Implemented matrix factorization and neural collaborative filtering models on user–item interaction datasets (MovieLens-style), analyzing sparsity, cold-start, and scalability trade-offs.

Python, PyTorch, Surprise, NumPy, Pandas, Flask

Matrix factorization & neural CF

Cold-start mitigation strategies

Scalability benchmarking
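
For readers unfamiliar with the technique, matrix factorization can be sketched in a few lines of plain Python. This is a generic toy illustration, not the project's code: the function names (`factorize`, `predict`, `rmse`), the hyperparameters, and the `toy_ratings` data are all hypothetical and chosen only to keep the example small.

```python
import math
import random


def predict(P, Q, u, i):
    """Predicted rating: dot product of user u's and item i's factor vectors."""
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))


def factorize(ratings, n_users, n_items, k=2, lr=0.02, reg=0.02, epochs=1000, seed=0):
    """Fit user factors P and item factors Q to (user, item, rating) triples with SGD."""
    rng = random.Random(seed)
    # Small positive initial factors; the range is an arbitrary choice for this toy example.
    P = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def rmse(ratings, P, Q):
    """Root-mean-square error over the observed ratings."""
    total = sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings)
    return math.sqrt(total / len(ratings))


# Hypothetical toy data: (user index, item index, rating on a 1-5 scale).
toy_ratings = [(0, 0, 5), (0, 1, 3), (0, 2, 1), (1, 0, 4), (1, 2, 1), (2, 1, 1), (2, 2, 5)]
```

After `P, Q = factorize(toy_ratings, n_users=3, n_items=3)`, a call such as `predict(P, Q, 1, 1)` estimates a rating that user 1 never gave, which is the basic mechanism behind the recommendations.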

Reinforcement Learning Task Design

Designed custom RL environments with synthetic and semi-realistic datasets, focusing on reward shaping, evaluation stability, and policy gradient methods for complex sequential decision-making tasks.

Python, OpenAI Gym, Stable Baselines3, PyTorch, Ray RLlib

Custom Gym environment design

Advanced reward shaping

Policy gradient evaluation
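
As a small illustration of reward shaping, here is a sketch of potential-based shaping, one common and well-studied form of the idea. It is an assumption that this particular variant is representative; the project description does not say which shaping scheme was used. The names `shaped_reward` and `corridor_potential` and the 1-D corridor task are hypothetical.

```python
def shaped_reward(reward, potential, state, next_state, gamma=0.99, done=False):
    """Return reward + gamma * phi(next_state) - phi(state).

    The potential of a terminal state is treated as 0, the usual convention
    for potential-based shaping.
    """
    next_phi = 0.0 if done else potential(next_state)
    return reward + gamma * next_phi - potential(state)


GOAL = 5  # Hypothetical goal position in a 1-D corridor.


def corridor_potential(position):
    """Potential is higher (less negative) the closer the agent is to the goal."""
    return -abs(GOAL - position)


# Stepping toward the goal earns a bonus; stepping away is penalized.
toward = shaped_reward(0.0, corridor_potential, state=2, next_state=3)
away = shaped_reward(0.0, corridor_potential, state=2, next_state=1)
```

The appeal of the potential-based form is that the extra terms telescope along a trajectory, so the shaping guides exploration without changing which policy is optimal.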

RAG-Powered Knowledge Engine

Built a Retrieval-Augmented Generation system combining vector databases with LLMs for context-aware question answering over large document corpora. Features semantic search, chunk optimization, and hallucination reduction.

LangChain, OpenAI, Pinecone, FastAPI, React, Docker

Semantic vector search

Context-aware response generation

Hallucination guard rails
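
The retrieval step behind semantic search can be sketched as a cosine-similarity ranking over chunk embeddings. This is a simplified stand-in, not the system's code: a real deployment would get vectors from an embedding model and query a vector database, whereas the three-dimensional vectors, the chunk texts, and the `cosine` and `top_k` helpers below are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec, chunks, k=2):
    """Return the texts of the k chunks most similar to the query vector.

    chunks is a list of (text, vector) pairs.
    """
    ranked = sorted(chunks, key=lambda chunk: cosine(query_vec, chunk[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Hand-written toy "embeddings"; real ones would come from an embedding model.
toy_chunks = [
    ("refund policy", [1.0, 0.0, 0.0]),
    ("shipping times", [0.0, 1.0, 0.0]),
    ("refund deadlines", [0.9, 0.1, 0.0]),
]
```

Calling `top_k([1.0, 0.0, 0.0], toy_chunks)` returns the two refund-related chunks, which would then be passed to the LLM as context for answering the question.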

Real-Time Sentiment Analysis Dashboard

End-to-end NLP pipeline streaming social media data through Kafka, performing real-time sentiment classification with transformer models, and visualizing trends on a live React dashboard with WebSocket updates.

Transformers, Kafka, React, WebSocket, D3.js, FastAPI

Sub-second streaming inference

Transformer-based classification

Interactive D3.js visualizations

AI-Powered Code Review Agent

Autonomous code review bot using LLMs to analyze pull requests, detect bugs, suggest refactoring, and enforce coding standards. Integrates with GitHub Actions to run as part of the CI/CD pipeline.

GPT-4, LangChain, GitHub API, Node.js, TypeScript, Docker

Automated PR analysis

Bug detection & code smell alerts

GitHub Actions integration

Distributed ML Training Platform

Scalable distributed training infrastructure supporting data and model parallelism across GPU clusters. Features automatic hyperparameter tuning, experiment tracking, and one-click model deployment with MLflow.

PyTorch DDP, Ray, MLflow, Kubernetes, Terraform, AWS

Multi-GPU distributed training

Automated hyperparameter search

One-click model serving

Computer Vision Quality Inspector

Deep learning-based visual inspection system for manufacturing defect detection. Uses custom-trained YOLO and EfficientNet models with real-time inference on edge devices, achieving 98.5% defect detection accuracy.

YOLOv8, EfficientNet, OpenCV, ONNX, TensorRT, FastAPI

98.5% detection accuracy

Edge-optimized inference

Real-time video processing

ILF Cross-Currency Payment System

Led the Interledger Framework (ILF) project at Paysys, enabling cross-currency payments over the Interledger Protocol. Built a settlement system that handles multi-currency transactions with real-time exchange rates.

Node.js, TypeScript, Interledger, PostgreSQL, Redis, Docker

Cross-currency settlement engine

Interledger Protocol integration

Real-time FX rate handling


Experience

Career & Achievements

Nov 2024 — 2025

AI/ML Engineer — Preference Model

Remote, USA

Designed and implemented an agentic AI framework for reinforcement learning tasks using Anthropic models.

Built RL agents with frozen language models for inference, focusing on teaching LLMs effective strategies
Developed an RL-based teaching pipeline where agents curate high-quality offline datasets
Implemented automated judge system for model performance evaluation

Jun 2024 — Present

Software Engineer & AI — Paysys

Karachi, Pakistan

Working on AI-driven fraud risk management and cross-currency payment systems.

Built rule engines and AI models for fraud risk management using Tazama's platform
Led ILF (Interledger Framework) project for cross-currency payments with Interledger Protocol
Engineered data archiving pipeline with Apache NiFi, Tika, Solr, and Ozone

2021 — 2025

BS Computer Science — FAST NUCES, Karachi

Relevant Coursework: Machine Learning, Deep Learning, NLP, Computer Vision, Distributed Systems, Data Structures & Algorithms, Database Systems, Operating Systems

Competition

IEEE Xtreme Programming Contest

Achieved Top 10 nationally in the IEEE Xtreme 24-hour competitive programming marathon, demonstrating exceptional problem-solving skills under pressure.

Ongoing

AI/ML Mentor & Technical Writer

Mentored 50+ students in AI and ML fundamentals through online platforms. Regularly publish technical content, system design explanations, and ML engineering guides.

Hi, I'm Sohaib Shamsi