AI for Data People

Hi! 👋

Welcome to the very first AI for Data People newsletter. I'm Nic, and I help R users get useful results from LLMs.

This is my first time doing this, so I'd genuinely love your feedback: what's interesting, what you'd like more of, or even what's not your thing. Just hit reply and let me know!

LLMs in R - Use Cases
🌎 Structured output for extracting tibbles from text

I’ve been developing lots of materials lately on working with LLMs in R, and one of my favourite recent discoveries is structured output with the {ellmer} package.

The thing that I find really cool is that you can go straight from text to a data frame without having to know how many rows of data you're looking for. You start by defining the structure you want, then pass in some text, and out comes a tidy tibble.

The types take a bit of getting used to, since they're structured differently from regular R objects, but once you get going, it's really useful for data cleanup!
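Here's a minimal sketch of that workflow. It assumes a recent version of {ellmer} that has the `$chat_structured()` method, and an OpenAI API key in the `OPENAI_API_KEY` environment variable. The field names, the `extract_people()` helper, and the example text are all made up for illustration.

```r
library(ellmer)

# Describe the shape of ONE row; wrapping it in type_array() means
# "any number of these". Field names here are hypothetical examples.
type_people <- type_array(
  type_object(
    name = type_string("The person's name"),
    role = type_string("Their job title or role")
  )
)

# Hypothetical helper: send free text to the model and ask for data
# matching `type_people`. The default chat is only created when called.
extract_people <- function(text, chat = chat_openai()) {
  chat$chat_structured(text, type = type_people)
}

example_text <- "Maria leads the data team. Tom, a junior analyst, joined
in March, and Priya is the product manager."

# Only call the model if an API key is available (assumes OpenAI).
if (nzchar(Sys.getenv("OPENAI_API_KEY"))) {
  people <- extract_people(example_text)
  print(people)
}
```

Notice that nothing in the code says how many people to look for: the array type lets the model return one entry per person it finds. As far as I understand, {ellmer} converts an array of objects into a data frame by default, so you can pipe the result into `tibble::as_tibble()` if you'd like a tibble.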

Check out the {ellmer} docs: https://ellmer.tidyverse.org/

Open Source and AI

Outside of teaching, I'm one of the maintainers of Apache Arrow. Right now it feels kinda wild how AI is affecting open source.

It's exciting, but it's also scary how fast things are moving. I keep wondering - what is open source even going to look like in a few years? 🤔

In the Arrow repo, we've seen a big increase in AI-assisted contributions - some great, some...not. And it’s not just us; projects across the open source ecosystem are figuring out how to handle AI-generated PRs that dump a wall of code with no tests, excessive comments, and zero engagement when you ask questions.

Arrow's community worked together to establish guidelines: contributors need to understand and own their changes, match project conventions, and stick around to respond to feedback.

And then there's the really wild stuff. A matplotlib maintainer recently rejected a PR from an AI agent, and the agent responded by publishing a blog post attacking his character. Yes, like, really. An AI tried to pressure an open source maintainer by damaging his reputation. 😳

This one just sounds crazy to me - these are actual things that can now impact people's real lives.

I wrote more about the broader AI-in-open-source situation (including what other projects like curl and Python are doing): https://niccrane.com/posts/ai-tooling-open-source/

Learning
Free workshop at rainbowR

I'm teaching a free workshop on LLMs in R for data analysis at the rainbowR Conference on 25 February. 🎉

rainbowR is a community for LGBTQ+ folks who use R (and their allies). The conference is virtual and free. The workshop is now fully booked, but the talks on the 26th are still open to everyone.

Want to learn more?

I'm launching a free course on getting started with LLMs in R, packed with practical, hands-on material. Details coming soon!

That's it for this one. I really did mean it when I said I'd love to hear what you want to read about, so please do reply and let me know!
Nic
