Matthias Lüdtke's better-idea.org

PyData Berlin 2025 Notes

I just came back from the PyData conference in Berlin, and apart from meeting a lot of great people, I took away a few tidbits:

  1. Docker's cache mounts help with a familiar problem: a single changed pip dependency invalidates the cached install layer, and every package gets downloaded again. So instead of doing:

    RUN pip install -r requirements.txt
    

    You can do:

    RUN --mount=type=cache,target=/root/.cache/pip \
        pip install -r requirements.txt
    

    Or, equivalently, for uv:

    RUN --mount=type=cache,target=/root/.cache/uv \
        uv sync
    

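    The layer still reruns whenever the dependencies change. However, the cache directory persists between builds, so pip (or uv) reuses packages it has already downloaded or built instead of fetching everything again, which speeds up subsequent builds. Note that RUN --mount requires BuildKit, the default builder in recent Docker releases.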

  2. You can, through the magic of WebAssembly, run DuckDB in the browser: https://shell.duckdb.org

  3. For documentation, many organizations (e.g. GitHub, Cloudflare, basically everyone) more or less follow the Diátaxis framework. I have a feeling of déjà vu, as if I had seen this somewhere before under a different name. The idea is to split documentation into four categories: tutorials, how-to guides, explanations, and references. Tutorials and how-to guides are task-oriented, while explanations and references are information-oriented (consulted on a need-to-know basis).

  4. How did I miss WebLLM before? It runs large language models directly in the browser, using WebGPU for hardware acceleration.

  5. For PDF and HTML parsing and text extraction, docling is the new hotness and already looks very promising.
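Coming back to the cache mounts from point 1, here is a minimal sketch of how they might fit into a complete Dockerfile for a uv project. It assumes a uv-managed project with pyproject.toml and uv.lock at the root of the build context; the base image tag and the /app path are assumptions chosen for illustration. The pattern follows the one described in uv's Docker integration guide: install dependencies first from the lock file alone, then copy the source and install the project itself.

```dockerfile
# syntax=docker/dockerfile:1
# Assumed base image; any Python image should work.
FROM python:3.12-slim

# Copy the uv and uvx binaries from the official uv image.
COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/

WORKDIR /app

# The cache mount lives on a different filesystem than /app, so hard links
# fail; copying avoids the warning uv would otherwise print.
ENV UV_LINK_MODE=copy

# Install only the dependencies. pyproject.toml and uv.lock are bind-mounted,
# so this layer depends on them alone and not on the rest of the source tree.
RUN --mount=type=cache,target=/root/.cache/uv \
    --mount=type=bind,source=uv.lock,target=uv.lock \
    --mount=type=bind,source=pyproject.toml,target=pyproject.toml \
    uv sync --locked --no-install-project

# Copy the project source and install the project itself.
COPY . /app
RUN --mount=type=cache,target=/root/.cache/uv \
    uv sync --locked
```

The split gives two layers of protection: a source-only change does not invalidate the dependency layer at all, and when uv.lock does change, the cache mount still lets uv reuse everything it downloaded in earlier builds.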