EDB Engineering Newsletter #2 (English)

Jan 07, 2025

Welcome to the 2nd edition of the EDB Engineering Newsletter, where we share interesting links from the data world that the EDB Engineering team has enjoyed, along with news about what the team is up to!

What we’re following

New Amazon S3 Tables: Storage optimized for analytics workloads

This is effectively AWS exposing Iceberg table metadata as a first-class, platform-level resource, managed through the AWS CLI (aws s3tables) and the AWS web console. That management covers CRUD operations and table maintenance (e.g. compaction). All actual ingestion, consumption, and processing of the data in these table buckets, however, happens through other AWS services such as EMR, Redshift, Glue, Athena, and Data Firehose.

https://aws.amazon.com/blogs/aws/new-amazon-s3-tables-storage-optimized-for-analytics-workloads
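S3 Tables runs compaction for you, and the post does not spell out its algorithm. To make the idea concrete, here is a generic sketch (not how S3 Tables implements it) of planning compaction: pick the small data files and bin-pack them into batches whose combined size stays under a target, so each batch can be rewritten as one larger file. The DataFile type, file names, and size thresholds are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataFile:
    path: str        # hypothetical file name within a table
    size_bytes: int


def plan_compaction(files: List[DataFile], target_bytes: int,
                    small_threshold: int) -> List[List[DataFile]]:
    """Group small files into batches of at most target_bytes each.

    Files at or above small_threshold are left alone. Each returned batch is
    meant to be rewritten as a single larger file by some rewrite job.
    """
    # Largest-first keeps batches reasonably full (a simple greedy heuristic).
    small = sorted((f for f in files if f.size_bytes < small_threshold),
                   key=lambda f: f.size_bytes, reverse=True)
    batches, current, current_size = [], [], 0
    for f in small:
        if current and current_size + f.size_bytes > target_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(f)
        current_size += f.size_bytes
    if current:
        batches.append(current)
    # Rewriting a lone file gains nothing, so drop single-file batches.
    return [b for b in batches if len(b) > 1]


if __name__ == "__main__":
    MB = 1024 * 1024
    files = [DataFile(f"part-{i}.parquet", s * MB)
             for i, s in enumerate([10, 20, 30, 40, 500])]
    for batch in plan_compaction(files, target_bytes=64 * MB, small_threshold=100 * MB):
        print([f.path for f in batch])
```

Greedy first-fit is the simplest choice here; a real system would also weigh partitioning, delete files, and snapshot expiry.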

Use Ollama with any GGUF Model on Hugging Face Hub

Ollama now integrates with the Hugging Face Hub, so you can run any GGUF model hosted there, public or private. GGUF is a binary format optimized for quickly loading and saving models, which makes it efficient for inference. The integration lets users run a model with a single command and no additional setup, and it will likely make Ollama even more popular.

https://huggingface.co/docs/hub/en/ollama
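The single command takes the form ollama run hf.co/{username}/{repository}, optionally followed by :{quantization} to pick one GGUF file. A small sketch that builds that command from Python; the repository and quantization tag in the example are illustrative, so check that the tag actually exists in the repository you use.

```python
import shlex
import subprocess


def hf_model_ref(username, repository, quantization=None):
    """Build an hf.co/{username}/{repository}[:{quantization}] model reference."""
    ref = f"hf.co/{username}/{repository}"
    if quantization:
        # Optional tag selecting one GGUF quantization from the repository.
        ref += f":{quantization}"
    return ref


def ollama_run_command(username, repository, quantization=None, prompt=None):
    """Return the argv for `ollama run`, optionally with a one-shot prompt."""
    cmd = ["ollama", "run", hf_model_ref(username, repository, quantization)]
    if prompt is not None:
        cmd.append(prompt)
    return cmd


def run_model(username, repository, quantization=None, prompt=None):
    # Requires a local Ollama install; the model is pulled on first use.
    subprocess.run(ollama_run_command(username, repository, quantization, prompt),
                   check=True)


if __name__ == "__main__":
    cmd = ollama_run_command("bartowski", "Llama-3.2-1B-Instruct-GGUF", "Q4_K_M", "Hello!")
    # Print rather than execute, so the example runs without Ollama installed.
    print(shlex.join(cmd))
```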

Use of Time in Distributed Databases, parts 1-3

Murat Demirbas wrote an excellent series of posts on why and how distributed databases handle time in the face of unreliable networks and unreliable clock hardware. He covers the historical background, from logical and vector clocks to NTP, as well as modern designs like Clock-SI and production systems like Dynamo and Spanner.
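For readers who haven't met vector clocks before, here is a minimal textbook sketch (not taken from the posts): each node keeps a counter per node, ticks its own on local events, takes the element-wise maximum on receiving a message, and two events are concurrent when neither clock dominates the other. Clocks are plain dicts here for brevity.

```python
from typing import Dict

VectorClock = Dict[str, int]  # node id -> counter; missing entries count as 0


def tick(clock: VectorClock, node: str) -> VectorClock:
    """Local event (or send) on `node`: increment its own counter."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(local: VectorClock, received: VectorClock, node: str) -> VectorClock:
    """Receive event on `node`: element-wise max with the sender's clock, then tick."""
    merged = dict(local)
    for n, c in received.items():
        merged[n] = max(merged.get(n, 0), c)
    return tick(merged, node)


def _pairs(a: VectorClock, b: VectorClock):
    nodes = set(a) | set(b)
    return [(a.get(n, 0), b.get(n, 0)) for n in nodes]


def happened_before(a: VectorClock, b: VectorClock) -> bool:
    """a -> b: no counter in a exceeds b's, and at least one is strictly smaller."""
    pairs = _pairs(a, b)
    return all(x <= y for x, y in pairs) and any(x < y for x, y in pairs)


def concurrent(a: VectorClock, b: VectorClock) -> bool:
    """Neither event dominates: some counter is smaller and some is larger."""
    pairs = _pairs(a, b)
    return any(x < y for x, y in pairs) and any(x > y for x, y in pairs)


if __name__ == "__main__":
    a1 = tick({}, "A")          # event on A
    b1 = tick({}, "B")          # independent event on B
    b2 = merge(b1, a1, "B")     # B receives a message sent at a1
    print(concurrent(a1, b1), happened_before(a1, b2))
```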

Building effective agents

This article explains not just what agents and workflows are, but also how you might build them, with or without frameworks like LangChain, and several architectures you might model your system after. We also enjoyed Simon Willison’s review of this post.

https://www.anthropic.com/research/building-effective-agents
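One of the workflows the article describes is prompt chaining: each LLM call processes the previous call's output, with optional programmatic checks ("gates") between steps. A minimal sketch of that pattern; the `llm` callable is a hypothetical stand-in for whatever model API you use, and the step templates are illustrative.

```python
from typing import Callable, Optional, Sequence

LLM = Callable[[str], str]  # hypothetical: takes a prompt, returns completion text


def prompt_chain(llm: LLM, steps: Sequence[str], user_input: str,
                 gate: Optional[Callable[[str], bool]] = None) -> str:
    """Run prompt templates in order, feeding each output into the next step.

    Each template uses an {input} placeholder. If `gate` rejects an
    intermediate output, the chain stops early instead of wasting later calls.
    """
    text = user_input
    for i, template in enumerate(steps):
        text = llm(template.format(input=text))
        is_last = i == len(steps) - 1
        if gate is not None and not is_last and not gate(text):
            raise ValueError(f"gate rejected the output of step {i}")
    return text


if __name__ == "__main__":
    def fake_llm(prompt: str) -> str:
        # Stand-in model that just echoes its prompt in brackets.
        return f"[{prompt}]"

    steps = ["Outline an article about: {input}",
             "Write the article from this outline: {input}"]
    print(prompt_chain(fake_llm, steps, "vector clocks", gate=lambda t: len(t) < 200))
```

Keeping the gate as plain code rather than another model call is a deliberate choice: it is cheap and deterministic.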

Make tuple deformation faster in PostgreSQL

David Rowley has been working on a series of patches to make it faster to “deform” a PostgreSQL tuple. Deforming is the process of extracting a tuple's attributes into two separate arrays: one holding the column values and one holding their isnull flags. In his words, “The performance increase seems nice at around 5-20% with my tests.”

https://www.postgresql.org/message-id/flat/CAApHDvrBztXP3yx%3DNKNmo3xwFAFhEdyPnvrDg3%3DM0RhDs%2B4vYw%40mail.gmail.com
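To illustrate what "deforming" produces, here is a toy sketch using a made-up layout: a null bitmap followed by 4-byte little-endian integers for the non-null attributes only (as in PostgreSQL, NULLs take no space in the data area). This is not PostgreSQL's on-disk format; real heap tuples also have a header, alignment padding, and variable-length types, which make the real loop considerably more involved. The `form` helper is hypothetical and exists only to build test input.

```python
import struct
from typing import List, Optional, Tuple


def deform(raw: bytes, natts: int) -> Tuple[List[Optional[int]], List[bool]]:
    """Split a toy tuple into a values array and an isnull array."""
    bitmap_len = (natts + 7) // 8
    bitmap = raw[:bitmap_len]
    offset = bitmap_len
    values: List[Optional[int]] = [None] * natts
    isnull = [True] * natts
    for att in range(natts):
        if not bitmap[att >> 3] & (1 << (att & 7)):
            continue  # NULL: nothing stored, so the offset does not advance
        (values[att],) = struct.unpack_from("<i", raw, offset)
        isnull[att] = False
        offset += 4
    return values, isnull


def form(values: List[Optional[int]]) -> bytes:
    """Inverse helper: pack ints (or None for NULL) into the toy layout."""
    bitmap = bytearray((len(values) + 7) // 8)
    data = bytearray()
    for att, v in enumerate(values):
        if v is not None:
            bitmap[att >> 3] |= 1 << (att & 7)
            data += struct.pack("<i", v)
    return bytes(bitmap) + bytes(data)


if __name__ == "__main__":
    print(deform(form([7, None, -3]), 3))
```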

Improvements to multi-column GROUP BY in DataFusion 44

In DataFusion v43 (the previous release), a significant improvement avoided converting from column format to row format when grouping by multiple columns, as in queries like SELECT ... FROM ... GROUP BY col1, ..., colN. The change relied on specialized code for the types of col1, …, colN.

v44 extends this support to multi-column grouping on time-related types, and it improves the vectorized append and equal-to operations used within a multi-column GROUP BY.

You can find the full v44 release notes here.
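The core idea can be sketched without DataFusion's Rust internals: keep each group's key values in per-column storage and do the "equal to" and "append" steps column by column, instead of first packing every input row into a row format. This is a scalar Python illustration of that organization, not DataFusion's implementation; the class name is hypothetical, and `intern` is merely named after the operation of mapping rows to group ids.

```python
from collections import defaultdict


class ColumnarGroupValues:
    """Toy multi-column GROUP BY key store that keeps each key column separately."""

    def __init__(self, num_cols):
        # One list per GROUP BY column; index i in every list belongs to group i.
        self.columns = [[] for _ in range(num_cols)]
        # Hash of a key -> ids of groups with that hash (collisions resolved by equality).
        self.buckets = defaultdict(list)

    def _hash_rows(self, batch):
        # Hash column by column, folding each column into the running row hashes.
        hashes = [0] * len(batch[0])
        for col in batch:
            hashes = [hash((h, v)) for h, v in zip(hashes, col)]
        return hashes

    def _equal_to(self, group_id, batch, row):
        # Compare the stored key with the input row one column at a time.
        return all(stored[group_id] == col[row] for stored, col in zip(self.columns, batch))

    def _append(self, batch, row):
        # Append the new key's values to each column store.
        for stored, col in zip(self.columns, batch):
            stored.append(col[row])
        return len(self.columns[0]) - 1

    def intern(self, batch):
        """Map each row of a columnar batch (a list of equal-length columns) to a group id."""
        group_ids = []
        for row, h in enumerate(self._hash_rows(batch)):
            for gid in self.buckets[h]:
                if self._equal_to(gid, batch, row):
                    break
            else:
                gid = self._append(batch, row)
                self.buckets[h].append(gid)
            group_ids.append(gid)
        return group_ids


if __name__ == "__main__":
    groups = ColumnarGroupValues(num_cols=2)
    # GROUP BY (region, year) over one batch of four rows.
    print(groups.intern([["eu", "eu", "us", "eu"], [2024, 2024, 2024, 2025]]))
    print(groups.columns)
```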

PostgreSQL and Fil-C

In a series of Twitter posts, Filip Jerzy Pizło has been sharing his progress building PostgreSQL with Fil-C, his memory-safe fork of Clang. Fil-C can already build other major projects, such as OpenSSH and Lua, without code changes. He has uncovered a number of interesting but safe tricks that PostgreSQL uses, and has been updating Fil-C to handle them. He may even have found a bug in PostgreSQL.

Andrew Ng Explores The Rise Of AI Agents And Agentic Reasoning

https://www.youtube.com/watch?v=KrRD7r7y7NY

Andrew Ng highlighted the transformative potential of “agentic AI.” He described this paradigm as one in which AI agents iteratively plan, act, and refine solutions, mirroring human problem-solving.

He emphasized the role of generative AI in accelerating prototyping, and the shift toward using unstructured data such as text, images, and video for business impact.

Ng mentioned key design patterns such as reflection, tool use, and multi-agent collaboration, alongside advances in large multimodal models (LMMs) capable of handling complex visual tasks.

Finally, he stressed the importance of responsible development, data engineering for unstructured data, and emerging AI applications, describing this era as a pivotal time for AI builders.
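Of the design patterns Ng mentioned, reflection is the easiest to sketch: generate a draft, have a critic review it, and revise until the critic is satisfied or a round limit is hit. This is our own minimal illustration, not code from the talk; `generate` and `critique` are hypothetical callables that would each wrap an LLM call in practice.

```python
from typing import Callable, Optional

Generate = Callable[[str, Optional[str]], str]   # (task, feedback) -> draft
Critique = Callable[[str, str], Optional[str]]   # (task, draft) -> feedback, or None if OK


def reflect(generate: Generate, critique: Critique, task: str, max_rounds: int = 3) -> str:
    """Draft, critique, and revise until the critic returns None or rounds run out."""
    draft = generate(task, None)
    for _ in range(max_rounds):
        feedback = critique(task, draft)
        if feedback is None:
            break  # the critic accepted this draft
        draft = generate(task, feedback)
    # If rounds ran out, the last revision is returned without a final review.
    return draft


if __name__ == "__main__":
    def fake_generate(task, feedback):
        # Stand-in generator: bumps a version number on every revision.
        version = 1 if feedback is None else int(feedback) + 1
        return f"{task}:v{version}"

    def fake_critique(task, draft):
        # Stand-in critic: accepts from version 3 onward.
        version = int(draft.rsplit("v", 1)[1])
        return None if version >= 3 else str(version)

    print(reflect(fake_generate, fake_critique, "query plan summary"))
```

Capping the number of rounds keeps cost bounded even when the critic never approves.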

From the EDB team

Pure parsers and reentrant scanners in PostgreSQL

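Peter Eisentraut has been working on a series of patches to make PostgreSQL's parsers pure and its scanners reentrant, so that they keep their state in explicitly passed structures rather than in global variables, which is a prerequisite for thread safety. Beyond being cleaner, this removes one roadblock stopping PostgreSQL from switching to a thread model, should the community decide to do so.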

https://www.postgresql.org/message-id/flat/eb6faeac-2a8a-4b69-9189-c33c520e5b7b%40eisentraut.org
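The difference between a non-reentrant and a reentrant scanner is where the state lives. A classic scanner keeps its input buffer and position in process-wide globals, so only one scan can be in progress at a time. A reentrant scanner keeps them in a handle passed to every call (Flex calls this handle yyscan_t). A conceptual Python sketch of the reentrant style follows; it is a trivial whitespace tokenizer, not PostgreSQL's grammar or generated code.

```python
from typing import List, Optional


class Scanner:
    """Reentrant scanner: all state lives in the instance, not in globals."""

    def __init__(self, text: str):
        self.text = text
        self.pos = 0

    def next_token(self) -> Optional[str]:
        """Return the next whitespace-separated token, or None at end of input."""
        n = len(self.text)
        while self.pos < n and self.text[self.pos].isspace():
            self.pos += 1
        if self.pos >= n:
            return None
        start = self.pos
        while self.pos < n and not self.text[self.pos].isspace():
            self.pos += 1
        return self.text[start:self.pos]


def parse_all(scanner: Scanner) -> List[str]:
    """A stand-in 'parser' that only consumes tokens from the scanner it is given."""
    tokens = []
    while (tok := scanner.next_token()) is not None:
        tokens.append(tok)
    return tokens


if __name__ == "__main__":
    # Two scans interleaved, as two threads might do; neither disturbs the other.
    a, b = Scanner("SELECT 1"), Scanner("SELECT 2")
    print(a.next_token(), b.next_token(), a.next_token(), b.next_token())
```

With globals, the interleaved calls above would overwrite each other's position; with per-instance state they are independent.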

Cloud Neutral Postgres Databases with Kubernetes and CloudNativePG

With CloudNativePG, organizations can embrace a robust, high-performance, and standardized approach to running PostgreSQL clusters across bare metal, hybrid, or multi-cloud environments, all driven by declarative configuration. This empowers DBAs to implement cloud-neutral, shared-nothing architectures that avoid vendor lock-in while retaining full control over data and performance. As businesses increasingly move away from cloud-dependent models, CloudNativePG offers a scalable, future-proof solution for managing PostgreSQL in diverse and complex environments, including a return to on-premises deployments.

https://www.cncf.io/blog/2024/11/20/cloud-neutral-postgres-databases-with-kubernetes-and-cloudnativepg
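As an illustration of that declarative configuration, here is a minimal CloudNativePG Cluster resource generated from Python. The field names follow the CloudNativePG quickstart as we understand it (apiVersion postgresql.cnpg.io/v1, kind Cluster, spec.instances, spec.storage.size); check them against the documentation for your operator version. It assumes the operator is already installed in your Kubernetes cluster, and the cluster name is illustrative.

```python
import json


def cnpg_cluster(name: str, instances: int = 3, storage_size: str = "1Gi") -> dict:
    """Return a minimal CloudNativePG Cluster manifest as a plain dict."""
    return {
        "apiVersion": "postgresql.cnpg.io/v1",
        "kind": "Cluster",
        "metadata": {"name": name},
        "spec": {
            # One primary plus replicas; the operator handles failover.
            "instances": instances,
            "storage": {"size": storage_size},
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON as well as YAML, e.g.:
    #   python cluster.py | kubectl apply -f -
    print(json.dumps(cnpg_cluster("cluster-example"), indent=2))
```

Because the desired state is just data, the same manifest can be applied unchanged on bare metal, in a hybrid setup, or on any cloud's Kubernetes service.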

Postgres Hacking Workshop – January 2025

Robert Haas hosts the PostgreSQL Hacking Workshop, a monthly virtual meetup, where attendees watch an existing PostgreSQL talk and discuss the talk together. It attracts a mix of experienced PostgreSQL contributors and folks who are newer to PostgreSQL. “We’ve been really lucky to have some experienced people who attend month after month,” Robert said, “but the goal is really to help more people get involved, so I’m always excited to see new people on the call.” So whether you’re new to contributing to PostgreSQL or not, check out the PostgreSQL Hacking Workshop if you’re free!

https://rhaas.blogspot.com/2024/12/postgresql-hacking-workshop-january-2025.html

Explaining ABI Breakage in PostgreSQL 17.1

In the previous edition of this newsletter, we shared a link to the mailing-list thread where Pavan Deolasee first discovered the ABI breakage in PostgreSQL 17.1. Pavan has since written a follow-up blog post explaining in more detail what went wrong.

https://www.enterprisedb.com/blog/explaining-abi-breakage-postgresql-171
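Pavan's post covers the actual change. As a generic illustration only, here is why adding a field to a C struct in a minor release can break already-compiled extensions: code compiled against the old definition keeps using the old field offsets. The struct and field names are hypothetical and are not PostgreSQL's; Python's ctypes is used to mimic C struct layout.

```python
import ctypes


class ExampleInfoV1(ctypes.Structure):
    """Hypothetical struct as an extension was compiled against it."""
    _fields_ = [("relid", ctypes.c_int32), ("flags", ctypes.c_int32)]


class ExampleInfoV2(ctypes.Structure):
    """The same hypothetical struct after a field is inserted in the middle."""
    _fields_ = [
        ("relid", ctypes.c_int32),
        ("new_field", ctypes.c_int32),
        ("flags", ctypes.c_int32),
    ]


for cls in (ExampleInfoV1, ExampleInfoV2):
    print(f"{cls.__name__}: sizeof={ctypes.sizeof(cls)}, offsetof(flags)={cls.flags.offset}")
# ExampleInfoV1: sizeof=8, offsetof(flags)=4
# ExampleInfoV2: sizeof=12, offsetof(flags)=8

# The server fills in a V2 struct, but old extension code still reads it with
# the V1 layout, so "flags" comes from the bytes that now hold new_field.
server_side = ExampleInfoV2(relid=42, new_field=1, flags=7)
stale_view = ExampleInfoV1.from_buffer_copy(bytes(server_side)[: ctypes.sizeof(ExampleInfoV1)])
print("flags as seen by old code:", stale_view.flags)  # 1, not 7
```

Appending a field at the end instead keeps existing offsets stable, but it still changes sizeof, which can break extension code that allocates or embeds the struct itself.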