PyData Berlin 2025 Notes

I just came back from the PyData conference in Berlin and, apart from meeting a lot of great people, I took away some tidbits:

Docker’s cache mounts help with the problem that a single pip dependency change invalidates the cached install layer and forces every package to be downloaded again. So instead of doing:
```
RUN pip install -r requirements.txt
```
You can do:
```
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt
```
Or, equivalently, for uv:
```
RUN --mount=type=cache,target=/root/.cache/uv \
    uv sync
```
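The cache directory persists between builds (BuildKit is required), so even when the install layer is rebuilt, only new or changed packages have to be downloaded. That speeds up subsequent builds considerably.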
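
To put the snippet in context, here is a minimal sketch of a complete Dockerfile built around the pip cache mount. The base image `python:3.12-slim`, the `/app` working directory and the `main.py` entry point are assumptions for illustration, not something from the talk:

```dockerfile
# syntax=docker/dockerfile:1
# Assumed base image; any image that runs pip as root keeps its cache in /root/.cache/pip.
FROM python:3.12-slim

WORKDIR /app

# Copy only the dependency manifest first, so that edits to the source code
# do not invalidate the install layer below.
COPY requirements.txt .

# The cache mount keeps pip's download cache across builds, so a changed
# requirements.txt only fetches the packages that are new or updated.
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt

# Copy the rest of the project (hypothetical layout with a main.py entry point).
COPY . .
CMD ["python", "main.py"]
```

The contents of the cache mount are not part of the final image, so the extra speed does not come at the cost of image size.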

You can, through the magic of WebAssembly, run DuckDB in the browser: https://shell.duckdb.org

For documentation, people (e.g. GitHub, Cloudflare, basically everyone) are more or less following the Diátaxis framework. I have a feeling of déjà vu, as if I had seen this somewhere before under a different name. The idea is to split documentation into four categories: tutorials, how-to guides, explanations, and references. Tutorials and how-to guides are action-oriented (the reader is doing something), while explanations and references are knowledge-oriented (on a need-to-know basis).

How did I miss WebLLM before?

For PDF/HTML parsing and text extraction, Docling is the new hotness and already very promising.