Large language models and generative AI: a recent hack day



Engineering blog

Developers in the Product and Engineering department came together with colleagues from across the Guardian to explore the potential of LLMs and more

Fri 22 Dec 2023 06.02 EST

Discussion of large language models (LLMs) and generative artificial intelligence was everywhere in 2023 – not least in the Guardian’s Product and Engineering department. Hack days are a staple of our software development culture, so it was no surprise that at this year’s final hackathon, several developers and data scientists focused their attention on this area – covering potential applications in podcasting, search and image generation. And who wouldn’t want a browser extension that assesses the mood of a news article and finds an appropriate music track to enhance your reading experience?

In total, the teams, including colleagues from across the Guardian, produced and presented 24 hacks in the course of the two-day event – some of which took an entirely different turn. These included: a product that spits out cultural recommendations; a dedicated platform for showcasing Guardian documentaries; experiments with ChatGPT and Google Bard involving tools like Trello to improve efficiency; and a Guardian-themed generative AI screensaver.

Here are three hacks that won an award:

crosswordsplus hack screenshot Photograph: Dana Dramowiczs/The Guardian

Best technical hack: CrosswordsPlus

Play crosswords with your friends in real time with the Guardian’s Puzzles and Crosswords app on iOS. Built using SharePlay, a native iOS framework, this hack enabled multiplayer functionality even if the user is not signed in. Apple takes care of the data transfer and offers end-to-end encryption – you simply start a session using FaceTime, iMessage or AirDrop.

Best entertaining hack: 5/15s in 515s

A hack which takes a weekly “5/15” report – through which teams update on their progress – and converts it into an audio podcast that is 515 seconds long. Perfect for any manager who might not otherwise have time to read everything! The hack works by using an OpenAI GPT-3.5 model to summarise each team’s report before converting the summaries into audio using a text-to-speech model.
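To give a feel for the timing constraint, here is a minimal Python sketch of how a 515-second episode might be divided between teams. It is not the hack's code: the speaking rate of 150 words per minute, the function names and the team names are all assumptions made for illustration, and the placeholder `truncate_words` simply stands in for the GPT-3.5 summarisation step. The text-to-speech step is left out entirely.

```python
from typing import Callable, Dict

EPISODE_SECONDS = 515    # target length, taken from the hack's name
WORDS_PER_MINUTE = 150   # assumed speaking rate, not stated in the write-up


def word_budget(num_teams: int, seconds: int = EPISODE_SECONDS,
                wpm: int = WORDS_PER_MINUTE) -> int:
    """Return how many words each team's summary may use."""
    if num_teams <= 0:
        raise ValueError("need at least one team report")
    total_words = seconds * wpm // 60
    return total_words // num_teams


def truncate_words(text: str, limit: int) -> str:
    """Hypothetical stand-in for the LLM summariser: keep the first `limit` words."""
    return " ".join(text.split()[:limit])


def build_script(reports: Dict[str, str],
                 summarise: Callable[[str, int], str] = truncate_words) -> str:
    """Summarise every report within its share of the budget and join the results."""
    budget = word_budget(len(reports))
    parts = [f"{team}: {summarise(report, budget)}"
             for team, report in reports.items()]
    return "\n".join(parts)
```

In a real pipeline the `summarise` argument would wrap a call to the language model, and the returned script would be handed to whichever text-to-speech service is in use.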

5/15s hack presentation screenshot Illustration: Mahesh Makani/The Guardian

Most Artificially Intelligent hack: Linguini Labelling Method

Building labelled training datasets is crucial for successful machine learning projects, especially in Natural Language Processing (NLP) – but this process is often laborious and time-consuming. Quite often you need multiple annotators labelling thousands of examples to build a performant model, and ideally you want them to annotate the same examples and to reach a good level of agreement.
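One common way to put a number on "a good level of agreement" between two annotators is Cohen's kappa, which corrects the raw agreement rate for the agreement you would expect by chance. The write-up does not say which measure, if any, the team used, so the sketch below is only an illustration of the idea; the function name and the example labels are made up.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators who labelled the same examples."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("both annotators must label the same non-empty set")
    n = len(a)
    # Fraction of examples on which the two annotators gave the same label.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


human = ["pos", "pos", "neg", "neg"]
second = ["pos", "neg", "neg", "neg"]
print(cohens_kappa(human, second))
```

A kappa of 1 means perfect agreement and 0 means agreement no better than chance; the same calculation works whether the second annotator is a person or a model.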

The idea for this hack was to leverage the ability of Large Language Models (LLMs) to recognise concepts in text and use them as additional annotators to speed up the process and make it more robust. The proposed pipeline integrates various LLMs alongside human annotators: when both entities agree on a data point, it seamlessly enters the training dataset; if discrepancies arise, a human review is triggered, focusing efforts solely on annotating “difficult” examples, maximising efficiency.
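The routing step described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the team's implementation: the names `RoutingResult` and `route` are hypothetical, the LLM labels are passed in as plain strings instead of coming from real model calls, and "agreement" is taken to mean that every LLM label matches the human label, which is only one possible rule.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple

# One example: (text, human_label, labels proposed by the LLM annotators)
Example = Tuple[str, str, Sequence[str]]


@dataclass
class RoutingResult:
    training_set: List[Tuple[str, str]] = field(default_factory=list)
    review_queue: List[str] = field(default_factory=list)


def route(examples: Sequence[Example]) -> RoutingResult:
    """Accept examples where humans and LLMs agree; queue the rest for review."""
    result = RoutingResult()
    for text, human_label, llm_labels in examples:
        # Assumed rule: unanimous agreement between the human and all LLMs.
        agreed = len(llm_labels) > 0 and all(
            label == human_label for label in llm_labels
        )
        if agreed:
            result.training_set.append((text, human_label))
        else:
            result.review_queue.append(text)
    return result


batch = [
    ("Shares rallied after the announcement", "business", ["business", "business"]),
    ("The striker moved clubs for a record fee", "sport", ["sport", "business"]),
]
routed = route(batch)
print(len(routed.training_set), "accepted;", len(routed.review_queue), "for review")
```

A looser rule, such as a majority vote among the LLMs, would send fewer examples to review at the cost of letting more disagreements through.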

LLM hack presentation screenshot Photograph: AnnaV Vissens/The Guardian


