Suraj Barman How KV Caching Cuts Autoregressive Transformer Inference Time by Up to 5× Key‑Value (KV) caching stores the immutable key and value matrices generated by the attention layer so that they can be reused during each decoding step, removing the need to recompute them for previo...
Suraj Barman Can LLM Embeddings Improve Time Series Forecasting? A Practical Feature‑Engineering Guide Large language models (LLMs) have reshaped many AI tasks, from text generation to code synthesis. Researchers began probing whether the semantic knowledge captured in LLM embeddings co...
Suraj Barman Building a Simple Semantic Search Engine with Sentence Embeddings Semantic search replaces exact keyword matching with meaning‑based retrieval. By converting text into sentence embeddings, each document is represented as a high‑dimensional vector that captures its ...
Suraj Barman How to Deploy AI Agents to Production: A Practical Architecture and Infrastructure Guide Deploying AI agents to production means turning a working prototype into a dependable service that can handle real‑world traffic and failures. It requires clear architectural choices, a well‑designed ...
Suraj Barman 5 Security Patterns for Agentic AI Systems Context History of Agentic AI Security Agentic AI, where autonomous software agents act on behalf of users or services, has rapidly moved from research labs to production environments. Early deploymen...
Suraj Barman Vector Databases vs. Graph RAG: Choosing the Right Memory Architecture for AI Agents Vector Databases vs. Graph RAG for AI Agent Memory AI agents require persistent memory to handle complex, multi‑step tasks. Two leading architectures, vector databases that store dense embeddings and ...
Suraj Barman How QUIC‑Based Proxy Mode Boosts Cloudflare One Client Performance Cloudflare One's proxy mode now uses QUIC to keep traffic at the transport layer, removing legacy TCP conversion and delivering faster, more reliable connections for zero‑trust environments. Architectu...
Suraj Barman Automatic Return Routing (ARR) – Technical Guide for Cloudflare One Background and Evolution of Return Routing in Cloudflare One The public Internet relies on a one‑to‑one mapping between an IP address and its destination. Anycast extends this idea by announcing the s...
Suraj Barman Dynamic Path MTU Discovery Enhances Cloudflare One Client Resilience Modern enterprise traffic faces frequent packet‑size mismatches that can stall uploads, video calls, or SSH sessions. By embedding ...
Suraj Barman How to Choose Between PCA and t‑SNE for Effective Data Visualization Choosing Between PCA and t‑SNE for Data Visualization Data scientists often need to turn high‑dimensional datasets into 2‑D or 3‑D plots that reveal patterns. PCA and ...
Suraj Barman Top 7 Small Language Models You Can Run on a Laptop Context History of Small Language Models on Consumer Laptops In the past few years, advances in model compression, quantization, and GPU acceleration have turned large language models (LLMs) from clou...
Suraj Barman Bag-of-Words vs TF-IDF vs LLM Embeddings: Performance in scikit-learn Classification & Clustering In scikit-learn pipelines, raw text must be transformed into numeric vectors before modeling. This artic...