Databases | May 20, 2025 | Updated: May 18, 2026

Postgres BML: Binary Model Loading and Vector Speed (2025)

Aunimeda

Postgres BML: Binary Model Loading

By 2025, the "vector database" hype has settled, and PostgreSQL has emerged as the winner. With the introduction of BML (Binary Model Loading), we can now do more than just store vectors; we can run the models that generate them directly inside the database process.

Why BML?

Previously, to get a vector embedding, you had to:

  1. Fetch text from Postgres.
  2. Send it to an external microservice (Python/FastAPI).
  3. Load the model in that service.
  4. Generate the embedding.
  5. Send it back to Postgres.
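
The five steps above can be sketched as a small Python simulation. Everything in it is a hypothetical stand-in: `ROWS` plays the role of the Postgres table, `embedding_service` plays the role of the external Python/FastAPI microservice, and `toy_embed` is a deterministic placeholder rather than a real model. The JSON encode and decode calls mark the points where a real deployment would cross the network.

```python
import hashlib
import json

# Hypothetical stand-in for a Postgres table: id -> row.
ROWS = {
    1: {"content": "This is a deeply technical blog post about Postgres.",
        "embedding": None},
}

_MODEL = None  # the service loads its model lazily, once


def toy_embed(text, dim=8):
    """Deterministic placeholder for an embedding model (NOT a real model)."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:dim]]


def embedding_service(request_body):
    """Stand-in for the external microservice: JSON in, JSON out."""
    global _MODEL
    if _MODEL is None:            # step 3: load the model in the service
        _MODEL = toy_embed
    text = json.loads(request_body)["text"]
    vector = _MODEL(text)         # step 4: generate the embedding
    return json.dumps({"embedding": vector})


def embed_row(row_id):
    text = ROWS[row_id]["content"]                    # step 1: fetch the text
    request_body = json.dumps({"text": text})         # step 2: serialize and send
    response_body = embedding_service(request_body)
    vector = json.loads(response_body)["embedding"]   # step 5: deserialize ...
    ROWS[row_id]["embedding"] = vector                # ... and write it back
    return vector


vector = embed_row(1)
print(len(vector))
```

Each row pays for two serialization passes and one round trip, on top of the inference itself.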

With BML, the model is a first-class database object.

Loading a Model

With the new pg_bml extension, loading a quantized GGUF or ONNX model takes a single command:

SELECT bml.load_model('text-embed-v3', '/models/embed-v3-q4.bml');

In-Database Inference

Once the model is loaded, you can generate embeddings as part of your INSERT or UPDATE pipeline, either from a trigger or with a direct function call.

INSERT INTO documents (content, embedding)
VALUES (
    'This is a deeply technical blog post about Postgres.',
    bml.embed('text-embed-v3', 'This is a deeply technical blog post about Postgres.')
);

The Performance Advantage

Because it eliminates the network round trip and the serialization overhead between application code (JS/Python) and SQL, BML-powered inference can be up to 5x faster than calling an external service. This keeps semantic search real-time even on high-throughput write workloads.
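
To give a feel for the serialization overhead, the sketch below compares the size of one embedding encoded as JSON text with the same values packed as raw 32-bit floats. The 384-dimension vector and its values are made up for illustration. This is not a benchmark of BML or of any particular service.

```python
import json
import struct

DIM = 384  # assumed embedding size, chosen only for illustration
vector = [(i + 1) / DIM for i in range(DIM)]  # made-up values

# Text encoding, as a JSON-over-HTTP service would typically send it.
json_bytes = json.dumps(vector).encode("utf-8")

# Binary encoding: DIM little-endian float32 values, 4 bytes each.
packed_bytes = struct.pack(f"<{DIM}f", *vector)

print(len(json_bytes), len(packed_bytes))
```

The text form is larger, and it also has to be parsed on both sides, which costs CPU time in addition to the extra bytes.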

Hybrid Search in 2025

BML also enables "Smart Reranking" within the same query. You can use a cheap first stage for initial retrieval (standard BM25, or a vector-distance filter as in the query below) and then use a small BML model to rerank the surviving candidates, for example the top 100, by semantic relevance, all within the Postgres execution plan.

SELECT id, content
FROM documents
WHERE embedding <=> bml.embed('text-embed-v3', 'query') < 0.5
ORDER BY bml.rerank('cross-encoder-mini', content, 'query') DESC
LIMIT 10;
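
For readers who want to see the logic outside SQL, here is the same two-stage shape in plain Python. In pgvector, `<=>` is the cosine distance operator, and this sketch assumes the query above uses it the same way; `cosine_distance` reimplements that calculation. `toy_rerank` is a hypothetical word-overlap scorer standing in for `bml.rerank`, and the documents and vectors are invented sample data.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity, which is what pgvector's <=> operator computes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def toy_rerank(content, query):
    """Hypothetical stand-in for a cross-encoder: fraction of query words found."""
    query_words = set(query.lower().split())
    content_words = set(content.lower().split())
    return len(query_words & content_words) / len(query_words)


def search(documents, query_vector, query_text, max_distance=0.5, limit=10):
    # Stage 1: cheap vector filter (the WHERE clause).
    candidates = [d for d in documents
                  if cosine_distance(d["embedding"], query_vector) < max_distance]
    # Stage 2: costlier rerank on the survivors only (the ORDER BY ... DESC).
    candidates.sort(key=lambda d: toy_rerank(d["content"], query_text),
                    reverse=True)
    return candidates[:limit]


# Invented sample data with 2-dimensional embeddings.
documents = [
    {"id": 1, "content": "postgres index tuning", "embedding": [1.0, 0.1]},
    {"id": 2, "content": "postgres vector search", "embedding": [0.9, 0.3]},
    {"id": 3, "content": "baking sourdough bread", "embedding": [-1.0, 0.2]},
]

results = search(documents, [1.0, 0.2], "vector search")
print([d["id"] for d in results])
```

Capping stage 1 at a fixed number of candidates, such as the top 100 mentioned in the text, would bound how many times the reranker runs per query.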

Postgres has evolved from a storage engine to a complete intelligence platform. In 2025, if your data is in Postgres, your models should be too.


