My GitHub Projects 🚀

Clustering Algorithm Analysis

Description: Builds efficient pipelines for approximate nearest-neighbour search and k-means clustering on MNIST. Parts 1–2 invest in heavy index construction to accelerate queries directly in the raw \(28 \times 28\) pixel space, using classic hashing and graph-based indices. Part 3 instead first compresses the images into a latent representation of fewer than 50 dimensions. Timing, approximation, and Silhouette metrics are then compared across all parts.

  • Tech Stack: C/C++, Python, Jupyter Notebook, TensorFlow/Keras
  • Key Features:
    • Vector Search and Clustering (LSH, Hypercube)
    • Graph Nearest Neighbor Search (GNNS, MRNG, NSG)
    • Dimensionality reduction via NN autoencoders
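
To illustrate the hashing side of the vector search listed above, here is a minimal sketch of a single Euclidean LSH table. It is written in Python for brevity rather than the project's C/C++. The class name `EuclideanLSH`, the parameters `k` and `w`, and the bucket layout are assumptions made for illustration, not the repository's actual code.

```python
import math
import random
from collections import defaultdict


class EuclideanLSH:
    """One hash table of Gaussian-projection LSH for Euclidean distance (illustrative)."""

    def __init__(self, dim, k=4, w=4.0, seed=0):
        rng = random.Random(seed)
        # k random projections; each one hashes p to floor((p . v + t) / w)
        self.vs = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(k)]
        self.ts = [rng.uniform(0.0, w) for _ in range(k)]
        self.w = w
        self.buckets = defaultdict(list)  # bucket key -> indices of stored points
        self.points = []

    def _key(self, p):
        # Concatenate the k individual hashes into one bucket key
        return tuple(
            math.floor((sum(a * b for a, b in zip(p, v)) + t) / self.w)
            for v, t in zip(self.vs, self.ts)
        )

    def add(self, p):
        self.points.append(p)
        self.buckets[self._key(p)].append(len(self.points) - 1)

    def query(self, q):
        """Return (index, distance) of the closest point in q's bucket, or None."""
        candidates = self.buckets.get(self._key(q), [])
        if not candidates:
            return None
        best = min(candidates, key=lambda i: math.dist(q, self.points[i]))
        return best, math.dist(q, self.points[best])
```

Only points that land in the query's bucket are compared exactly, which is where the speed-up over a brute-force scan comes from. A practical index would typically keep several such tables to reduce missed neighbours.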

Implementing a Shell

Description: Implementation of mysh, a lightweight, bash-like Unix shell.

  • Tech Stack: C/C++
  • Key Features:
    • I/O redirection
    • Pipelines
    • Background execution
    • Wildcard expansion
    • Alias management
    • Signal handling
    • Command history
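
As a rough idea of how pipelines and output redirection fit together in a shell like this, the sketch below chains commands with `fork`, `pipe`, `dup2`, and `exec`. It uses Python's `os` wrappers around those system calls instead of the project's C/C++, and the function name `run_pipeline` and its `out_path` parameter are hypothetical, not taken from mysh.

```python
import os


def run_pipeline(commands, out_path=None):
    """Run `cmd1 | cmd2 | ... [> out_path]` and return the last exit status."""
    pids = []
    prev_read = None  # read end of the pipe that feeds the next command
    for i, argv in enumerate(commands):
        is_last = i == len(commands) - 1
        read_end, write_end = (None, None) if is_last else os.pipe()
        pid = os.fork()
        if pid == 0:  # child: rewire stdin/stdout, then exec the command
            try:
                if prev_read is not None:
                    os.dup2(prev_read, 0)
                if write_end is not None:
                    os.dup2(write_end, 1)
                elif out_path is not None:
                    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
                    os.dup2(os.open(out_path, flags, 0o644), 1)
                # Python opens pipe/file descriptors close-on-exec, so the
                # leftover originals are closed automatically by exec.
                os.execvp(argv[0], argv)
            finally:
                os._exit(127)  # reached only if setup or exec failed
        pids.append(pid)
        # parent: close its copies so downstream readers eventually see EOF
        for fd in (prev_read, write_end):
            if fd is not None:
                os.close(fd)
        prev_read = read_end
    status = 0
    for pid in pids:
        _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else 1
```

For example, `run_pipeline([["echo", "hello"], ["tr", "a-z", "A-Z"]], out_path="out.txt")` behaves like `echo hello | tr a-z A-Z > out.txt`. Background execution would amount to skipping the final `waitpid` loop and reaping the children later.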

Client-Server Model through TCP

Description: Implementation of a thread-pooled TCP poller server with a stress-testing client, ensuring safe concurrency through POSIX mutexes and condition variables.

  • Tech Stack: C/C++, Bash
  • Key Features:
    • poller - Multithreaded C/C++ server that queues incoming sockets
    • pollSwayer - Multithreaded client that reads an input file and spawns one thread per voter
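
The mutex-and-condition-variable pattern behind a thread-pooled server of this kind can be sketched as a bounded queue shared between one producer and several workers. The version below is in Python, whose `threading.Condition` mirrors the POSIX primitives. The names `BoundedQueue` and `serve`, and the use of plain integers in place of accepted sockets, are assumptions for illustration only.

```python
import threading
from collections import deque


class BoundedQueue:
    """Fixed-capacity FIFO guarded by one lock and two condition variables."""

    def __init__(self, capacity):
        self.items = deque()
        self.capacity = capacity
        lock = threading.Lock()
        self.not_full = threading.Condition(lock)
        self.not_empty = threading.Condition(lock)

    def put(self, item):
        with self.not_full:
            while len(self.items) >= self.capacity:  # loop guards against spurious wakeups
                self.not_full.wait()
            self.items.append(item)
            self.not_empty.notify()

    def get(self):
        with self.not_empty:
            while not self.items:
                self.not_empty.wait()
            item = self.items.popleft()
            self.not_full.notify()
            return item


def serve(jobs, num_workers=4, capacity=2):
    """Feed jobs (stand-ins for client sockets) to a pool of worker threads."""
    queue = BoundedQueue(capacity)
    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            job = queue.get()
            if job is None:  # sentinel: no more work for this worker
                return
            with results_lock:
                results.append(job * 2)  # placeholder for handling a connection

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for job in jobs:
        queue.put(job)  # blocks while the queue is full
    for _ in threads:
        queue.put(None)
    for t in threads:
        t.join()
    return results
```

The producer blocks when the queue is full and the workers block when it is empty, so no thread busy-waits. One `None` sentinel per worker gives a clean shutdown.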

Data Mining Techniques: Customer Profiling & Goodreads Book Analysis

Description: Customers are segmented with Agglomerative and K-Means clustering for profile analysis, while cosine similarity on vectorized book descriptions supports a recommendation system.

  • Tech Stack: Python, Jupyter Notebook
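
The recommendation idea described above can be sketched with plain word-count vectors and cosine similarity. The vectorization here is a simple bag of words chosen for illustration, since the project's actual vectorizer is not stated. The helper names and the sample book descriptions are invented.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words vector: word -> count (a stand-in for the real vectorizer)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def recommend(title, descriptions, top_n=2):
    """Return the top_n other titles whose descriptions are most similar."""
    vectors = {t: vectorize(d) for t, d in descriptions.items()}
    target = vectors[title]
    scores = [(cosine(target, v), t) for t, v in vectors.items() if t != title]
    scores.sort(key=lambda s: s[0], reverse=True)
    return [t for _, t in scores[:top_n]]


# Invented sample data, not from the Goodreads dataset
books = {
    "A": "dragon wizard magic quest",
    "B": "wizard magic school",
    "C": "stock market investing guide",
}
print(recommend("A", books, top_n=1))
```

Books that share no words with the target score 0.0 and sink to the bottom of the ranking, while descriptions with overlapping vocabulary rise to the top.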