40° 48′ 32″ N  ·  73° 57′ 41″ W / bbox [-73.985,40.795,-73.945,40.825] / commit 837f98e / built May 2026
Columbia EECS E6895 · Final Project · 2026

An illuminated atlas
of the city,
narrated by an agent.

Palimpsest plans a short walking tour for a bounded slice of Manhattan and narrates it from free, openly licensed public sources. Every claim is cited back to a retrieved document and verified at generation time.

v1 · Wikipedia + OSM · 120 unit tests, all green · pgvector + PostGIS · Anthropic Claude
§01 · the problem

Why
this matters.

A large language model will tell you Riverside Church is a thirteenth-century gothic cathedral with absolute confidence. It isn't. Hallucinations about places — names, dates, who built what — are a citation problem, not a model-scale problem.

“LLMs are confident liars about places.”

The fix is not a bigger model. The fix is a retrieval contract that refuses to render a sentence the model can't ground in a real document from a real source — and a UI that shows the user exactly which document.

Palimpsest treats narration the way an academic treats a footnote: every assertion is anchored to a doc_id, a source feed, a paragraph offset, a retrieval score, and a retrieval timestamp. The verifier rejects any sentence that fails the contract. The frontend renders the citation as a chip you can click.

unconstrained · gpt-style

Riverside Church, completed in 1192, was funded by Andrew Carnegie and houses the oldest bell carillon in North America. Martin Luther King Jr. delivered his “Beyond Vietnam” speech here in 1965.

palimpsest · v1

Riverside Church, completed in 1930 [wiki·p3], was funded by John D. Rockefeller Jr. [wiki·p4] and houses the Laura Spelman Rockefeller Memorial Carillon [wiki·p7]. King delivered “Beyond Vietnam” here in 1967 [wiki·p11].

§02 · the product

An agent
that walks
the page.

Tell it where you want to go. It plans a route inside the bbox, queries each stop against the corpus, narrates the walk, and lets you click any sentence back to the source paragraph it came from.
[Map: a walk near Columbia (streaming). Morningside Heights · Upper West Side, scale 1 : 14 400. Stops: Riverside Church, Low Library, Cathedral of St. John, Morningside Park.]
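
A minimal sketch of the "inside the bbox" check, assuming the header's bbox is in [west, south, east, north] order (inferred from the values, not stated on the page). `BBox` and `dms_to_decimal` are hypothetical names for illustration, not the project's API. The header's own coordinate is converted from degrees/minutes/seconds and tested against the box.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    """Bounding box in [west, south, east, north] order (lon/lat degrees)."""
    west: float
    south: float
    east: float
    north: float

    def contains(self, lon: float, lat: float) -> bool:
        """True when the point lies inside or on the edge of the box."""
        return self.west <= lon <= self.east and self.south <= lat <= self.north


def dms_to_decimal(degrees: int, minutes: int, seconds: float, hemisphere: str) -> float:
    """Convert degrees/minutes/seconds to signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemisphere in ("S", "W") else value


# Values copied from the page header.
TOUR_BBOX = BBox(-73.985, 40.795, -73.945, 40.825)
lat = dms_to_decimal(40, 48, 32, "N")
lon = dms_to_decimal(73, 57, 41, "W")

print(round(lat, 6), round(lon, 6))   # 40.808889 -73.961389
print(TOUR_BBOX.contains(lon, lat))   # True
```

A stop that fails `contains` would simply be excluded from planning under this sketch; how the real planner treats out-of-bounds requests is not described here.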
§03 · how it works

Five pieces,
each minimal.

A retrieval-augmented agent on top of a pgvector + PostGIS corpus. The web shell streams server-sent events; the verifier holds the line on every sentence. No tricks, no magic — just a small, honest pipeline.
01

Browser

3D map on the left, chat pane on the right. React + Vite + TypeScript + MapLibre GL.

apps/web
02

Agent

FastAPI service runs Claude in a tool-use loop and streams events back as SSE.

apps/agent
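
For a sense of what "streams events back as SSE" means on the wire, here is a small sketch of framing one event in the text/event-stream format. The event names (`tool_call`, `sentence`) and payload fields are assumptions for illustration; the page does not list the agent's actual event types.

```python
import json


def format_sse(event: str, data: dict) -> str:
    """Serialize one event in the text/event-stream wire format."""
    # json.dumps emits a single line, so one "data:" field is enough.
    payload = json.dumps(data)
    # Each message is a set of "field: value" lines ended by a blank line.
    return f"event: {event}\ndata: {payload}\n\n"


frame = format_sse(
    "sentence",  # hypothetical event name
    {"text": "Riverside Church was completed in 1930.", "verified": True},
)
print(frame, end="")
```

In the browser, an `EventSource` listener registered for the same event name would receive the JSON string in `event.data`.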
03

Tools

Typed Pydantic schemas: search_places, plan_walk, narrate.

apps/agent/tools
04

Corpus

928 places (Wikipedia + OSM in v1), embedded with bge-small-en-v1.5, stored in pgvector, and spatially indexed with PostGIS.

data/corpus
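
One way a combined vector and spatial lookup over such a corpus could look, as a sketch only. The table and column names (`places`, `doc_id`, `source`, `geom`, `embedding`) are hypothetical; the page does not show the schema. The query relies on pgvector's `<=>` cosine-distance operator (similarity is `1 - distance`) and PostGIS's `ST_MakeEnvelope` with the `&&` bounding-box operator.

```python
MIN_SIMILARITY = 0.42  # threshold stated on this page

# Hypothetical schema: places(doc_id, source, geom, embedding).
SEARCH_SQL = """
SELECT doc_id,
       source,
       1 - (embedding <=> %(query_vec)s::vector) AS score
FROM places
WHERE geom && ST_MakeEnvelope(%(west)s, %(south)s, %(east)s, %(north)s, 4326)
  AND 1 - (embedding <=> %(query_vec)s::vector) >= %(min_score)s
ORDER BY embedding <=> %(query_vec)s::vector
LIMIT %(k)s
"""


def to_vector_literal(values):
    """Render floats in pgvector's text input format, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


def search_params(query_vec, bbox, k=5):
    """Build the named parameters for SEARCH_SQL from a query vector and bbox."""
    west, south, east, north = bbox
    return {
        "query_vec": to_vector_literal(query_vec),
        "west": west,
        "south": south,
        "east": east,
        "north": north,
        "min_score": MIN_SIMILARITY,
        "k": k,
    }


params = search_params([0.1, 0.2, 0.3], (-73.985, 40.795, -73.945, 40.825))
print(params["query_vec"])  # [0.1,0.2,0.3]
```

With a driver such as psycopg, this would run as `cursor.execute(SEARCH_SQL, params)`. Ordering by the raw distance expression with a `LIMIT` is the query shape pgvector documents for index-assisted search; whether the project's queries look like this is an assumption.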
05

Verifier

Rejects any sentence missing a complete 5-field citation. Runs at generation time.

apps/agent/verify
§04 · status

By the
numbers.

Milestone 1 ships bounded, demoable, online-only. The corpus is fully embedded, the contract is enforced, and the verifier passes the full test suite.
928
places indexed
Wikipedia + OpenStreetMap, bounded to a tight Manhattan bbox.
120
unit tests, all green
Pytest + a hand-rolled SSE replay harness for tool-call recordings.
100%
embedding coverage
Every corpus row has a vector. No nulls, no fallbacks, no excuses.
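
The replay harness itself is not shown on this page. As a rough idea of what replaying a recording involves, the sketch below parses recorded text/event-stream output back into `(event, payload)` pairs, following the standard SSE line rules. The function name, event names, and the sample recording are hypothetical.

```python
import json
import re
from typing import Iterator


def parse_sse(stream_text: str) -> Iterator[tuple[str, dict]]:
    """Yield (event_name, decoded_json) pairs from recorded SSE text."""
    event, data_lines = "message", []
    for line in re.split(r"\r\n|\r|\n", stream_text):
        if line == "":
            # A blank line dispatches the buffered event, if any data was seen.
            if data_lines:
                yield event, json.loads("\n".join(data_lines))
            event, data_lines = "message", []
        elif line.startswith(":"):
            continue  # comment or keep-alive line
        else:
            field, _, value = line.partition(":")
            # The format strips a single leading space from the value.
            value = value[1:] if value.startswith(" ") else value
            if field == "event":
                event = value
            elif field == "data":
                data_lines.append(value)


RECORDING = (
    "event: tool_call\n"
    'data: {"name": "search_places"}\n'
    "\n"
    ": keep-alive\n"
    "event: sentence\n"
    'data: {"verified": true}\n'
    "\n"
)

events = list(parse_sse(RECORDING))
print(events)
```

A test can then assert on the sequence of events without calling the model. An event with no terminating blank line is dropped, which matches how a browser treats a truncated stream.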
§05 · the contract

Five fields,
or it doesn't
render.

The verifier is the smallest interesting piece of the system. It checks every sentence the model generates against a strict five-field schema. If any field is missing or fails its check, the sentence is dropped before the user sees it.

The schema.

A citation must resolve to a real document, a real paragraph inside that document, a real source feed, a retrieval score above threshold, and the timestamp of the retrieval call. No string-matching, no “trust me”.

  • doc_id: stable identifier inside the source feed
  • source: the feed it came from (wikipedia / osm / chronicling)
  • paragraph: offset into the document body
  • score: cosine similarity, must clear 0.42
  • retrieved_at: ISO 8601 timestamp of the retrieval call
{
  "sentence": "Riverside Church was completed in 1930.",
  "citations": [
    {
      "doc_id":      "wp:Riverside_Church",
      "source":      "wikipedia",
      "paragraph":   3,
      "score":       0.871,
      "retrieved_at":"2026-05-10T14:22:07Z"
    }
  ],
  "verified": true
}
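
The five-field check can be sketched in a few lines. This is an illustration under stated assumptions, not the project's verifier: it uses plain dictionaries instead of the Pydantic models the stack mentions, treats "clear 0.42" as greater than or equal, and checks only the shape of a citation. Confirming that `doc_id` and `paragraph` resolve to a real document would need a corpus lookup, which is omitted. The function names are hypothetical.

```python
from datetime import datetime
from typing import Any

REQUIRED_FIELDS = ("doc_id", "source", "paragraph", "score", "retrieved_at")
KNOWN_SOURCES = {"wikipedia", "osm", "chronicling"}
MIN_SCORE = 0.42  # assumption: a score equal to the threshold passes


def citation_problems(citation: dict[str, Any]) -> list[str]:
    """Return the reasons a citation fails the contract (empty list = passes)."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in citation]
    if problems:
        return problems
    if not isinstance(citation["doc_id"], str) or not citation["doc_id"]:
        problems.append("doc_id must be a non-empty string")
    source = citation["source"]
    if not isinstance(source, str) or source not in KNOWN_SOURCES:
        problems.append("unknown source feed")
    paragraph = citation["paragraph"]
    if isinstance(paragraph, bool) or not isinstance(paragraph, int) or paragraph < 0:
        problems.append("paragraph must be a non-negative integer")
    score = citation["score"]
    if isinstance(score, bool) or not isinstance(score, (int, float)) or score < MIN_SCORE:
        problems.append("score must be a number at or above the threshold")
    try:
        # Older Python versions reject a trailing "Z", so normalize it first.
        datetime.fromisoformat(str(citation["retrieved_at"]).replace("Z", "+00:00"))
    except ValueError:
        problems.append("retrieved_at is not ISO 8601")
    return problems


def verify_sentence(record: dict[str, Any]) -> bool:
    """A sentence renders only if it has citations and every one passes."""
    citations = record.get("citations") or []
    return bool(citations) and all(not citation_problems(c) for c in citations)


record = {
    "sentence": "Riverside Church was completed in 1930.",
    "citations": [{
        "doc_id": "wp:Riverside_Church",
        "source": "wikipedia",
        "paragraph": 3,
        "score": 0.871,
        "retrieved_at": "2026-05-10T14:22:07Z",
    }],
}
print(verify_sentence(record))  # True
```

Returning the list of problems, rather than a bare boolean, keeps a rejected sentence debuggable in logs while the user-facing rule stays binary.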
§06 · sources

Public
domain,
cited back.

Every claim in the narration resolves to one of these feeds. v1 ships with Wikipedia and OpenStreetMap; the others are slated for v2, either drafted (schemas drafted, ingesters stubbed) or tracked.

Wikipedia

Long-form narrative, dates, names. Pulled via Wikidata SPARQL for the bbox.

v1 · 612 docs

OpenStreetMap

Geometry, addresses, building footprints. POI tags drive the “type of place” field.

v1 · 316 features

Chronicling America

Library of Congress newspaper archive. Periodical mentions, contemporary reporting.

v2 · drafted

NYPL Digital

New York Public Library Digital Collections — historical photographs, maps, plates.

v2 · drafted

NYC Open Data

Landmark designations, building age, zoning. Authoritative civic record for the city.

v2 · drafted

MTA GTFS

Subway and bus geometry. Optional transit leg when the walk gets long.

v2 · tracked

NOAA Weather

Local forecast at tour-start time. Narrator adjusts diction when it's raining.

v2 · tracked

More to come

Curated additions only. Palimpsest does not crawl. Every feed is openly licensed or public-domain, and explicitly listed.

+open to proposals
§07 · the stack

What's
under the
hood.

No frameworks-on-frameworks. Each layer is one well-understood tool, picked because it does its job and gets out of the way.
Frontend
React 18 · Vite · TypeScript · MapLibre GL · TailwindCSS · EventSource
Agent & API
FastAPI · Pydantic v2 · Anthropic Claude · Server-Sent Events · uvicorn · structlog
Retrieval
Postgres 16 · pgvector · PostGIS · HNSW index · bge-small-en-v1.5 · cosine @ 0.42
Infra
Vercel (web) · Fly.io (agent) · Neon (pg) · GitHub Actions · Sentry · pytest · ruff
§08 · the team

Built
by three.

Three graduate students in the Department of Computer Science at Columbia built and shipped Palimpsest over a semester. Course: EECS E6895, Advanced Big Data & AI.

Chenhao Yang

retrieval & corpus

Built the ingest pipeline, the embedding store, and the spatial index. Wrote the verifier.

Columbia · M.S. CS

Kaining Jia

agent & API

Designed the tool-use loop, the SSE stream, and the typed Pydantic contract for every tool call.

Columbia · M.S. CS

Thomas Duan

frontend & cartography

Made the map breathe — engine wrapper, marker style, flyTo choreography, and the chat shell.

Columbia · M.S. CS

Walk the page
with us.

The codebase is public and the documents are public. Clone it, run it against your own bbox, or open an issue with a feed you'd like us to index.