Matheus das Mercês for AWS Community Builders

Posted on Jun 17

Building a Serverless, Multi-Backend Web Search Service for AI Agents on AWS

#aws #serverless #architecture #ai

Introduction

From simple assistants to complex agentic architectures, AI agents are everywhere. As more teams build AI-powered solutions, web search is becoming a fundamental capability: it lets agents access current information, verify facts, and gather external context.

In this article, I'll walk through how we built a serverless, multi-backend web search service at PostNL, creating a foundation that can evolve from a single search backend to a centralized, provider-agnostic service on AWS.

Why web search as a shared capability

While integrating a search provider is relatively straightforward for a single app, the challenge becomes more interesting when multiple teams start building agents that need to access information beyond what the LLM already knows.

If every team implements web search independently, the result is fragmentation: different providers, different APIs (some of them public web search APIs), and different cost models.

At that point, web search is no longer just an application feature, but a shared capability. Providing a centralized web search service enables teams to consume a consistent interface while allowing platform owners to manage costs, operations, and future provider choices in a single place.

Designing a centralized web search service

At PostNL, the AI Center of Excellence (AI CoE) supports teams across the organization in adopting and scaling AI solutions. As more teams began experimenting with AI agents, the need for a web search service became clear.

Rather than recommending a specific provider, we decided to focus on a platform-oriented solution. Our goal was to provide a single web search capability that teams could easily consume, while keeping the underlying search implementation flexible and replaceable.

From the beginning, we defined a few key design principles:

A single interface for all consumers
Support for multiple search backends
Low operational overhead
Cost-efficient operation
The ability to evolve without impacting consumers

These principles led us to an architecture centered around a lightweight routing layer that sits between AI applications and the underlying web search providers.

Why the backend doesn't matter to consumers

Perhaps the most important aspect of the design is that consumers never interact directly with the underlying web search providers.

Because all requests flow through the routing layer, the backend can evolve independently of the applications using the service. Today, the router forwards requests to the primary search backend; tomorrow it could route traffic to additional providers, apply failover policies, or make routing decisions based on cost, geography, or freshness requirements.

Initial architecture

The service is exposed through a private API Gateway, ensuring that only authorized consumers within the network can access it through a VPC endpoint.
Requests are routed to a lightweight Lambda-based router, which acts as the abstraction layer between clients and the underlying search backend.
The initial backend is a self-hosted SearXNG deployment running on ECS Fargate behind an internal Application Load Balancer, providing a scalable and centrally managed web search capability while keeping the architecture flexible for future providers.

Building the Router

The Router is the central component of the system. It enables future backend changes without impacting consumers.

The overall flow is fixed:

Receive requests from API Gateway
Perform basic request validation, if required
Forward requests to the primary backend
Return responses from the backend
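The flow above can be sketched in a few lines of Go using only the standard library. This is a minimal illustration, not the actual PostNL implementation: `SearchRequest`, `Result`, `Backend`, `Validate`, `Handle`, and the limit of 50 are all hypothetical names and values chosen for the example. In a real Lambda, the request would arrive as an API Gateway proxy event and be translated into this shape by an adapter.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"strings"
)

// SearchRequest is a hypothetical request shape; the article does not define one.
type SearchRequest struct {
	Query string
	Limit int
}

// Result is a hypothetical, provider-neutral search hit.
type Result struct {
	Title string
	URL   string
}

// Backend is whatever the Router forwards to (SearXNG in the initial setup).
type Backend func(ctx context.Context, req SearchRequest) ([]Result, error)

// ErrInvalidRequest lets callers map validation failures to a 4xx response.
var ErrInvalidRequest = errors.New("invalid request")

// Validate performs the basic checks mentioned in the flow.
// The bounds used here are assumptions for illustration.
func Validate(req SearchRequest) error {
	if strings.TrimSpace(req.Query) == "" {
		return fmt.Errorf("%w: query must not be empty", ErrInvalidRequest)
	}
	if req.Limit < 0 || req.Limit > 50 {
		return fmt.Errorf("%w: limit must be between 0 and 50", ErrInvalidRequest)
	}
	return nil
}

// Handle validates the request, forwards it to the primary backend,
// and returns the backend's response unchanged.
func Handle(ctx context.Context, primary Backend, req SearchRequest) ([]Result, error) {
	if err := Validate(req); err != nil {
		return nil, err
	}
	return primary(ctx, req)
}

func main() {
	// A stub backend stands in for the real search provider.
	stub := func(ctx context.Context, req SearchRequest) ([]Result, error) {
		return []Result{{Title: "Example", URL: "https://example.com"}}, nil
	}
	results, err := Handle(context.Background(), stub, SearchRequest{Query: "aws lambda", Limit: 5})
	fmt.Println(len(results), err)
}
```

Wrapping validation failures in a sentinel error is one way to let the outer adapter distinguish client errors from backend failures when building the HTTP response.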

The Router follows a hexagonal architecture, also known as ports and adapters. The application core defines the web search behavior and interfaces, while infrastructure concerns such as API Gateway events, HTTP clients, and provider-specific integrations are implemented as adapters.

This keeps the core logic independent from any specific backend. Today, the Router forwards requests to SearXNG. In the future, additional adapters can be introduced for other providers without changing the public API or affecting consumers.
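To make the ports-and-adapters split concrete, here is a hedged sketch of what a port and a SearXNG adapter might look like. The names `SearchPort`, `SearXNGAdapter`, `searchURL`, and `decodeSearXNG` are hypothetical. The sketch assumes the SearXNG instance has its JSON output format enabled and returns a `results` array whose items carry `title`, `url`, and `content` fields. A local test server stands in for the internal load balancer so the example runs without network access.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/url"
)

// Result is the provider-neutral shape the core exposes to consumers.
type Result struct {
	Title   string `json:"title"`
	URL     string `json:"url"`
	Snippet string `json:"snippet"`
}

// SearchPort is the port: the application core depends only on this interface.
type SearchPort interface {
	Search(ctx context.Context, query string) ([]Result, error)
}

// SearXNGAdapter is one adapter behind the port (hypothetical name).
type SearXNGAdapter struct {
	BaseURL string
	Client  *http.Client
}

// searchURL builds the backend URL, assuming SearXNG's /search endpoint
// with JSON output enabled in its settings.
func searchURL(base, query string) string {
	q := url.Values{}
	q.Set("q", query)
	q.Set("format", "json")
	return base + "/search?" + q.Encode()
}

// decodeSearXNG maps the provider-specific payload to the neutral Result type.
func decodeSearXNG(body []byte) ([]Result, error) {
	var payload struct {
		Results []struct {
			Title   string `json:"title"`
			URL     string `json:"url"`
			Content string `json:"content"`
		} `json:"results"`
	}
	if err := json.Unmarshal(body, &payload); err != nil {
		return nil, fmt.Errorf("decode searxng response: %w", err)
	}
	out := make([]Result, 0, len(payload.Results))
	for _, r := range payload.Results {
		out = append(out, Result{Title: r.Title, URL: r.URL, Snippet: r.Content})
	}
	return out, nil
}

// Search implements SearchPort by calling the SearXNG HTTP API.
func (a SearXNGAdapter) Search(ctx context.Context, query string) ([]Result, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, searchURL(a.BaseURL, query), nil)
	if err != nil {
		return nil, err
	}
	resp, err := a.Client.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("searxng returned status %d", resp.StatusCode)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	return decodeSearXNG(body)
}

func main() {
	// A local test server replaces the real backend for this demonstration.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, `{"results":[{"title":"Go","url":"https://go.dev","content":"The Go language"}]}`)
	}))
	defer srv.Close()

	var port SearchPort = SearXNGAdapter{BaseURL: srv.URL, Client: srv.Client()}
	results, err := port.Search(context.Background(), "golang")
	fmt.Println(results, err)
}
```

Because the core only sees `SearchPort`, adding another provider would mean writing a second type that satisfies the same interface, with no change to the public API.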

Runtime choice

For the runtime, we selected Go because the Router is mostly a lightweight HTTP orchestration component. It does not perform heavy computation; it validates requests, applies routing logic, calls a backend, and returns a response.

Go is a good fit for this type of workload because it provides:

Fast startup times
Low memory usage
Strong HTTP support in the standard library
Simple concurrency primitives
Easy deployment as a small Lambda binary

The reasoning behind Lambda

AWS Lambda was selected to keep the operational footprint small. The Router does not need long-running infrastructure, local state, or complex runtime management. Keeping it stateless allows Lambda to scale horizontally as request volume changes.

At first glance, introducing a Lambda-based router in the request path may raise scalability concerns, especially for a service that could be consumed by multiple applications. However, the router intentionally remains lightweight, performing only request validation, routing, and protocol translation. By keeping the component stateless and focused on orchestration rather than search execution, Lambda provides automatic scaling with minimal operational overhead while maintaining low latency.

This separation keeps the Router simple today while allowing it to evolve later with routing policies, fallback logic, circuit breakers, and additional backend adapters.

Deploying an open-source web search backend

After establishing the Router layer, the next decision was selecting the initial search backend.

Rather than immediately integrating commercial web search providers, we wanted a solution that was self-hosted, easy to experiment with, and replaceable in the future.

For the MVP, we selected SearXNG, an open-source metasearch engine.

Why SearXNG?

Requirement	SearXNG
Self-hosted	✅
Open source	✅
Aggregates multiple engines	✅
Easy container deployment	✅
Vendor independence	✅

SearXNG runs as a containerized service on ECS Fargate and is exposed internally through an Application Load Balancer. For this workload, ECS Fargate offered the simplest path to a production-ready deployment.

Consuming the service

Because the service is exposed through a private API Gateway, teams at PostNL consume it through a standard HTTP interface, with authentication, quotas, and usage tracking handled centrally. This provides clear visibility into adoption and usage patterns while ensuring that the platform can be governed consistently as more teams onboard.
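From a consumer's point of view, a call could look roughly like the sketch below. The article does not describe the actual contract, so everything specific here is an assumption: the endpoint URL, the `/search` path, the `q` query parameter, and the use of an API key sent in the `x-api-key` header (the header API Gateway uses for API keys tied to usage plans). The helper name `buildSearchRequest` is hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

// buildSearchRequest prepares a GET request against the shared search service.
// Endpoint, parameter name, and auth header are assumptions for illustration.
func buildSearchRequest(endpoint, apiKey, query string) (*http.Request, error) {
	u, err := url.Parse(endpoint)
	if err != nil {
		return nil, fmt.Errorf("parse endpoint: %w", err)
	}
	q := u.Query()
	q.Set("q", query)
	u.RawQuery = q.Encode()

	req, err := http.NewRequest(http.MethodGet, u.String(), nil)
	if err != nil {
		return nil, err
	}
	// Assumed auth mechanism: an API key that the gateway maps to a usage plan.
	req.Header.Set("x-api-key", apiKey)
	req.Header.Set("Accept", "application/json")
	return req, nil
}

func main() {
	req, err := buildSearchRequest("https://search.example.internal/search", "demo-key", "parcel tracking api")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// The request is only printed here; a consumer inside the VPC would
	// send it with an http.Client and decode the JSON response.
	fmt.Println(req.Method, req.URL.String())
}
```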

The platform is intentionally focused on search retrieval. Responses consist of search results returned by the backend and are not interpreted or summarized by an LLM. Any reasoning or answer generation remains the responsibility of the consuming application.

Preparing for a multi-backend future

Although the initial implementation relies on a single backend, the service was designed from the beginning to support multiple search providers. The hexagonal architecture described earlier is what makes this possible: the application core stays independent of backend-specific implementations through well-defined ports and adapters.

Potential future enhancements include:

Additional search providers
Provider failover and fallback mechanisms
Health-based routing
Query-specific routing policies
Configurable provider selection

By investing in the abstraction layer early, the service remains flexible and can evolve incrementally as requirements change, while continuing to provide a stable interface for consumers.
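As an illustration of how provider failover could fit behind the same abstraction, the sketch below wraps several providers in a single type that tries them in order. This is one possible approach under stated assumptions, not a description of the PostNL roadmap: `SearchPort`, `FallbackBackend`, `StaticProvider`, and `runDemo` are hypothetical names, and a production version would likely add timeouts, health checks, or a circuit breaker.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Result is a provider-neutral search hit.
type Result struct{ Title, URL string }

// SearchPort is the interface every backend adapter satisfies.
type SearchPort interface {
	Search(ctx context.Context, query string) ([]Result, error)
}

// FallbackBackend tries each provider in order and returns the first success.
// It satisfies SearchPort itself, so the core does not know it is a chain.
type FallbackBackend struct {
	Providers []SearchPort
}

func (f FallbackBackend) Search(ctx context.Context, query string) ([]Result, error) {
	if len(f.Providers) == 0 {
		return nil, errors.New("no providers configured")
	}
	var lastErr error
	for _, p := range f.Providers {
		// Stop early if the caller has already cancelled or timed out.
		if err := ctx.Err(); err != nil {
			return nil, err
		}
		results, err := p.Search(ctx, query)
		if err == nil {
			return results, nil
		}
		lastErr = err
	}
	return nil, fmt.Errorf("all %d providers failed, last error: %w", len(f.Providers), lastErr)
}

// StaticProvider is a test double that returns fixed results or a fixed error.
type StaticProvider struct {
	Results []Result
	Err     error
}

func (s StaticProvider) Search(ctx context.Context, query string) ([]Result, error) {
	return s.Results, s.Err
}

// runDemo builds a chain whose primary fails and whose secondary succeeds.
func runDemo() ([]Result, error) {
	chain := FallbackBackend{Providers: []SearchPort{
		StaticProvider{Err: errors.New("primary unavailable")},
		StaticProvider{Results: []Result{{Title: "from secondary", URL: "https://example.com"}}},
	}}
	return chain.Search(context.Background(), "serverless search")
}

func main() {
	results, err := runDemo()
	fmt.Println(results, err)
}
```

Because the chain implements the same interface as a single provider, swapping it in would not change the contract that consumers see.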

Conclusion

As AI agents become more common, capabilities such as web search are increasingly moving out of individual applications and into shared platform services. By centralizing web search behind a consistent interface, teams can focus on building solutions rather than integrating and operating search providers.

The specific technologies and providers will likely evolve over time, but the underlying principle remains the same: common capabilities are often most valuable when they are provided once and consumed by many.

DEV Community