Taking Generative AI From Prototype to Production

So… your teams have been experimenting with Generative AI, and even launched a handful of promising prototypes. What do you need to bring these applications to production so they move the needle for your business? 

Artium hosted an event during NY Tech Week to discuss the unique challenges teams face on their journey to turn their Generative AI-powered prototypes into full-fledged production applications.

Watch the video above or listen to the Crafted podcast edition on Apple Podcasts, Spotify, or your favorite podcast app.

On stage are:

      • Catherine Miller (CTO, Flatiron Health)

      • Raghvender Arni (Director, Customer Acceleration Team, AWS Industries)

      • Justin Zhao (Founding Engineer at Predibase)

      • Jacopo Tagliabue (Founder of Bauplan, Adj. Prof. of ML at NYU)

      • Dan Blumberg (host of Crafted, Artium’s podcast about great products and the people who make them)

    If we can help you take your idea from prototype to production, please reach out! Email us at hello@thisisartium.com and tell us what you’re building!

    Full Transcript

    Arni – 00:00:00: No matter what we want to talk about, our customers want to talk about one thing, which is GenAI. Doesn’t matter what the problem is, the answer is GenAI.

    Dan – 00:00:08: Hey, Crafted listeners, we have a special one for you today, and it’s all about the topic of the year, Generative AI, and more specifically, how and whether to take all those experiments, prototypes, and proofs of concept you’ve built and put them into production. I asked these questions and more of an incredible panel at an NYC Tech Week event that Artium hosted in October. I was joined by Raghvender Arni from Amazon Web Services.

    Arni – 00:00:34: Money appears out of nowhere now. People are just like, oh, yeah, no problem. Money is not a problem. Just make it work.

    Dan – 00:00:39: Cat Miller, the CTO of Flatiron Health and a previous guest on this podcast.

    Cat – 00:00:44: When I get stuck, I just copy paste my error message or like a really whiny, why did it do this? And it gives me an answer, which is like correct a lot of the time. And when it’s wrong, I’m like, no stupid head, you’re wrong. Tell me another answer. And it gives me a correct answer. That is frigging magic.

    Dan – 00:01:00: Justin Zhao, the founding engineer of Predibase.

    Justin – 00:01:04: Sort of the fun moments are sort of starting to die down, and we see this realism hammer coming into the picture.

    Dan – 00:01:11: And Jacopo Tagliabue, founder of Bauplan.

    Jacopo – 00:01:15: The real answer is that while everybody wants fancy models, what you really need is better data. The power is in the data.

    Dan – 00:01:21: Welcome to Crafted, a show about great products and the people who make them. I’m your host, Dan Blumberg. I’m a product and engagement leader at Artium, where my colleagues and I help companies build incredible products, recruit high-performing teams, and help you achieve the culture of craft you need to build great software long after we’re gone. As you may suspect, we’ve gone deep on Generative AI. Not only are we advising our clients on how they can use GenAI in their industry, but we’re also building our own software and guiding our clients with a new approach to building software, one with LLMs in the loop. You can learn more on our website at thisisartium.com. Okay, here’s our conversation, GenAI from prototype to production, recorded during NYC Tech Week in October. I started by asking our panel to share more on what they do and where they are in their GenAI journey.

    Arni – 00:02:19: Super happy to be here. So I lead a team called the Customer Acceleration Team within AWS, which is a team of software engineers, designers, data scientists. And we work closely with some of AWS’s largest customers across a range of industries, sitting down and saying, hey, what problem are we trying to solve, and using software to address it. And we work with consulting partners to take it to prod. We work across data, IoT, robotics. So the team is a global team and we work closely across all the spaces. The last nine months, no matter what we want to talk about, our customers want to talk about one thing, which is GenAI. Doesn’t matter what the problem is, the answer is GenAI right now. So that’s usually been what we’ve been spending our time on. I’d say in the last year, we’ve done 90 different engagements with customers. You know, financial services, telco, auto manufacturing, healthcare, energy, media, sports. And it’s been amazing. I’ve been in tech for a long time, and it’s amazing the amount of focus: CEOs and boards of directors are like, hey, what’s our AI strategy? Obviously, there’s much more that goes beneath. They’ve never really sat down and looked at their tech strategy. But now everybody wants to understand the GenAI strategy. So I’ve never seen that level of focus. Normally, when we go in, we have to find funding, find budgeting. Money appears out of nowhere now. People are just like, oh, yeah, no problem. Money is not a problem. Just make it work. So that’s been the zeitgeist over the last nine months, I’d say. What’s been working really well, at least now? The first three to four months was somewhat frustrating. We’d walk in, customers want to do something, mainly because, as you said, their boss wanted to see something. But I think in the last couple of months, wiser counsels have prevailed to say, wait, this is, first of all, expensive. If you can find a GPU, more power to you. So compute is hard to find.
Compute is expensive to use, which means that more and more customers have started to think about use cases. The second is, it takes a little bit of time for customers to think through, hey, here’s my business problem, can GenAI really solve it? And let me apply the right model, the right approach to get there. So my guess is, while the first six to nine months was heavy experimentation and a lot of searching in the dark, the rest of the year and certainly next year is going to be more fine-tuned use cases, much better business cases, and now going into scaling. So that’s what I’m more optimistic about. So that’s been my experience and my introduction, but I’ll let the rest of them speak.

    Justin – 00:05:09: Thanks. That was a great introduction. My name is Justin. I’m a tech lead manager at Predibase. I lead the machine learning team. And before Predibase, I was at Google for six years. I worked on natural language, or I should say, language models inside the Google Assistant. So I’ve been in the machine learning space for a while. And yeah, Predibase is all about infrastructure for machine learning. And over the last year, we see a huge spike in interest in fine-tuning. And so now we’re basically a fine-tuning infrastructure company. Some things that are going well, like I think there’s a lot of excitement. And as someone who kind of joined the machine learning boat like almost 10 years ago at this point, what drew me initially to language models and generative models overall is, for me, it was always a compelling story about how a machine can express creativity. And it’s been very gratifying to see kind of more people join the boat and be like, oh, wow, ChatGPT is cool. And oh, my gosh, this technology is really interesting. And I’m feeling inspired. So that’s really, really exciting. We see that in our customers as well. And also in customer conversations. Something that maybe is not going as well is I think people are at the point where they’re actually trying to think about how this becomes a real app. And for this to become an app, it needs to be financially viable. You need to have the right hardware to train these things. You need to be able to pay for that hardware. And I think there’s questions about efficiency and, you know, concerns about privacy and things like this. So I think sort of the fun moments are sort of starting to die down. And we see like this realism hammer coming into the picture. And I think that’s what we’re going to talk about more today. For sure.

    Cat – 00:06:53: I’m Cat. I’m CTO of Flatiron Health. I’ve been there for nine and a half years, been CTO for one and a half. We’ve been doing ML modeling for seven years at this point as part of our data extraction strategy. So one of the things we do is, we’re an organization that has a ton of data about patients with cancer, and we structure that data. So we take unstructured records, we take the doctor’s note about what happened in a patient visit, and we give structure to it so it can then be used for research purposes to better inform care and decision making and so on. So something that’s going well is I find that we’ve had a couple of successes around the kinds of tasks where you have an internal task that has a fair bit of toil associated with it, where you can have a ChatGPT or, you know, a certain sort of LLM do a first pass and get like 80% there. So examples are like writing SQL. I find that ChatGPT, actually GPT-4 in particular, writes quite good SQL if you give it a database schema. Another example is we have a lot of mapping tasks. So let’s say you’re a doctor’s office and you have a whole bunch of different medicines and you call them different things, and we need to map them to standard SNOMED codes. It’s another example of where you can kind of easily understand why an LLM would be pretty good at that. And it’s definitely possible to build those workflows in a way where they save 80% of your time, because they give you a lot of suggestions, and maybe you can use one to cross-check another. And then you have a human come in and kind of do the hard stuff. So, like, those are going pretty well. And I just talked about them as being internal use cases. So the productionisation as such is really just, can you run it repeatedly? Is it safe to run on whatever kind of confidential information you have in-house, et cetera, et cetera?
So we’re not talking about standing up something that the, you know, entire wide world can then hit as an endpoint, costing us a lot of, you know, dollars in GPUs. So that’s been going well. What I would say is going less well. I mean, one is that I do not find that I have infinite dollars to spend on these things. So I wish I was the companies you were talking to. What I think is actually interesting is, I said that we’ve been in this game for a while. We’ve built a lot of models over time. And so right now we’re building model by model, right? You know, you make a BERT model for stage and then you make another one for a biomarker. You kind of make a bunch of different models. And I was like, it would be great if LLMs meant I didn’t have to build models anymore. I just fine-tune on some information, and then one LLM could kind of answer all my questions. And while it can do that okay, in tasks where we really need high accuracy, which is like a lot of our tasks, it’s really not performing as well as the more traditional ML models that we’d already built. And so, you know, I don’t know if that’s the end state. That’s just where we are right now with our explorations. But that’s a little bit of a bummer, because it sure would have been nice to be able to get all that for free.
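The “LLM first pass plus human review” workflow Cat describes can be sketched in a few lines. This is a hypothetical illustration only: the SNOMED-style codes below are made up, and a fuzzy string match stands in for the model’s suggestion so the routing logic stays visible. The key idea is simply that low-confidence suggestions fall through to a human.

```python
from difflib import SequenceMatcher

# Hypothetical standard vocabulary, standing in for real SNOMED codes.
STANDARD_TERMS = {
    "acetaminophen": "SNOMED-387517004",
    "ibuprofen": "SNOMED-387207008",
    "amoxicillin": "SNOMED-372687004",
}

def suggest_mapping(raw_name: str, threshold: float = 0.8):
    """First-pass mapper: return (code, needs_human_review).

    A real system might ask an LLM for the suggestion; here a fuzzy
    string match plays that role so the confidence routing is clear.
    """
    best_term, best_score = None, 0.0
    for term in STANDARD_TERMS:
        score = SequenceMatcher(None, raw_name.lower(), term).ratio()
        if score > best_score:
            best_term, best_score = term, score
    if best_term is None or best_score < threshold:
        return None, True  # low confidence: route to a human abstractor
    return STANDARD_TERMS[best_term], False

print(suggest_mapping("Acetaminophen 500mg"))  # high confidence, auto-mapped
print(suggest_mapping("Tylenol"))              # no close match, human review
```

The threshold is where the “save 80% of your time” trade-off lives: raise it and more items go to humans, lower it and more low-quality suggestions slip through.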

    Jacopo – 00:09:40: I’m Jacopo. Thanks so much for having me. I’m the CTO and co-founder of Bauplan, and I’m by far the least competent person in GenAI on the panel. My company builds data infrastructure, but I have a lifelong interest in language. My first company was actually an NLP company, so I’ve been in NLP for quite a long time, and then I got out of NLP to build data tools just when everybody got into NLP, and everybody asked me the same question: why did you do that? And I always give the same answer, which is, when crypto people come into your field, that’s the moment for you to get out and do something actually better with your life. No, the real answer is that while everybody wants fancy models, what you really need is better data. The power is in the data. Models get commoditized much more quickly. Unless you’re a company building models, models don’t really matter. For people like me who got PhDs and write papers all the time, it’s painful to admit, but it’s not that hard to do at the end of the day. It’s much easier to commoditize models than to commoditize data, and the transformation that takes your data and feeds it into those models. So at least that’s my, you know, very grumpy old man boomer position on the whole thing. Things that have been going well? Well, you know, aside from being skeptical of much of the hype, it’s fantastic that people get interested in NLP. It is fantastic that we see NLP solving some tasks that were considered impossible to solve, even remotely, like three, four years ago, right? Sometimes models now do things that, if you asked me or other NLP experts five years ago, we would have said, this will take 25 years. And now it’s kind of solved, right? So that, I think, is good. You know, a lot of companies will die. Most of the ideas that we see every week will not survive the next two weeks. That’s just how, you know, hype cycles go. Again, remember blockchain 2017?
But, you know, we got Coinbase out of that, which is, you know, a respectable company. So I’m sure something will come out of this as well. So that’s on the good side. On the bad side, LLMs seem to be sucking the air out of everything related to ML and software, for no good reason. The problem is that, and I’m a B2B person, I’m a B2B founder, most of the problems in the real world tend to be tabular in nature. Even when they’re not tabular, even when they’re unstructured in the Xs, they tend to be structured in the Ys. There are predictions, there are tags, there are classification problems. And a specialized, small, cheap model that you could have made three years ago with good data will still be much better than sending all your data to the OpenAI API, costing four times as much, and spending your afternoon trying to prompt this thing like it’s a black magic box. And while all of this is obviously trivial once you’ve heard it from a person that knows what they’re doing, most companies seem to throw all of this out of the window for some hype or whatever. And it doesn’t matter what the problem is, the solution is GenAI. Well, the solution is never the tech. Tech is never a solution. Again, as painful as that is for somebody that builds tech, publishes papers, does open source all the time. The point is, what is the problem? And do you backprop, if you want to use a machine learning term, from the problem to the solution? And there are just not that many problems, outside of Copilots, which I love, by the way, that have the structure, the logical structure, of an LLM problem. So that’s my kind of contrarian and skeptical position on the whole thing.

    Dan – 00:12:58: Does anyone want to respond to Jacopo’s take that, you know, in many cases, a classifier would work better?

    Cat – 00:13:03: I mean, I think I said something very similar, which is like, we have this repo of models, similar deal. We’ve been doing manual abstraction forever, so we have a ton of labeled data. We have a lot of classification problems. And yeah, we’re definitely seeing that those classically trained models are outperforming at the moment.

    Justin – 00:13:19: I would agree as well. Like some of our conversations with customers, you know, are quite short because they’re just like, oh, well, you should just, you know, write an if statement for this. And, you know, you don’t even need any machine learning whatsoever. And so there’s definitely like an educational gap, right? However, you know, maybe just to balance that conversation a bit, like I think it is, you know, very powerful in terms of it opens your mind up to what is possible. And I think one thing that’s very exciting about LLMs is like, you know, the zero shot inference, right? Where kind of out of the box SQL statements are kind of just getting written at like a pretty good bar, right? But ultimately at the end of the day, it’s going to depend on your task. It depends on the difficulty, depends on a lot of these things.

    Arni – 00:13:58: Yeah, we’ve learned quite a bit. In the early days, the answer was, no matter what the problem was, GenAI and the LLM was the answer. Then we do RAG on it, which is a known technique for matching up an LLM with data, which may be vectorized, need not be, but usually is. And customers would be like, I’m going to take all my data, I’m going to vectorize it, and then I’m going to mash it against this big LLM. To produce the embeddings needed for the vector store, you spend a pile of money. To mash that data with the LLM, you spend a pile more money. But the real answer you wanted is in your data to begin with anyway. So why couldn’t you take a smaller model? Transformers work. They don’t always need 300 billion tokens. You could build smaller models for that specific problem. If your question or your problem is, write me a poem about something, yes, you do need a much larger language model. But unless you’re doing homework, right? If you’re in the enterprise, your problems are not writing poems. Unless you’re in the creative space, obviously, in marketing and all those. But for a lot of those problems, you need to be factual. And you need to get answers that are going to save lives or are going to impact serious business decisions. Your data already has the answers. There are different ways of looking at your data. It’s amazing how much money is being spent in using these models, prompting them to stop hallucinations, changing the temperature. You do all these kinds of shenanigans, and you don’t need that. So I think there’s a broader consensus emerging within our customer base to say, look, one model is not gonna cut it, right? Just thinking that, oh yeah, I’m just gonna throw everything into this one model and I’m gonna solve it, it’s not gonna cut it. Because first of all, we can’t afford it, right? I mean, in the short run, yes, there’s a surplus of money being spent, for all the wrong reasons, I believe.
But in the long run, I think money will be spread across. I can get away with smaller models for 80 to 90% of my tasks. And maybe for 10% of the tasks I do need a larger model; yeah, for those, make the API calls and spend the money. But just like how API gateways, well, some people may or may not like API gateways, but for those that do like them, they provided a nice abstraction from whoever’s calling to a series of systems that sit behind it. We’ll have the need for a new AI gateway of some sort, so that you can take a user’s request and help figure out, do I go here, do I go there? Now, if you’re a startup or a smaller company, you don’t need an AI gateway. You can make those calls with a simple if statement, maybe. But if you’re a big, large enterprise, or you’re selling to a big, large enterprise, I think there’s a huge opportunity for these companies to be able to connect their problem with the right model. Some models coming off the shelf, and some models being fine-tuned or built from scratch. And it’s not expensive. No one should be scared to build a new model. Because if your model is small and not billions of parameters, you can build it in a week. My team’s done it time and time again. Take customers’ data, use Bloom, use GPT-J. There are plenty of known algorithms you can use to go in and build a model or fine-tune an existing model. I think that’s where we’re gonna head going into next year. And it’ll become even more important, because I’ve been in tech, I’ve done quite a few gigs, and I’ve never spent as much time with lawyers as I have in the last seven months, ever, right? Like one fourth of the phone calls, there’s a lawyer on the phone. That’s primarily because people are concerned, what data is going in, what data is coming out? But with a smaller model, where you control your data and you control the output, the amount of liability you have is much smaller as well.
So which means that if you have a blended mode where 80%, 90% is the smaller models and 10% to 20% is the bigger models, you reduce your liability dramatically as well. Because now you’re in much more control. So I think that’s going to be the focus next year: how do you balance out all these models to solve the problem? Start with the problem, not with the answer. Start with the problem, don’t start with the model. And then your problem will navigate you to the choice of model. And there are a few things that came out of GenAI. If there’s one thing, it’s that customers are rediscovering the value of data. People knew it implicitly, but most people forget it. And I’m glad you brought it up. Data all of a sudden is like, oh, man, I need to really understand where my data is, right? And people are rethinking their data strategies, which is really helping. I’ve never met as many CDOs. They as well have been excited, you know, like, hey, people are actually funding my business finally. Because the CDOs have been like, nobody cares about data, really. So they’re actually happy right now. So that’s my added perspective, hopefully.
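Arni’s “AI gateway” idea, routing most requests to small task-specific models and keeping a large general model as the fallback, can be sketched as little more than a lookup. Every model name and task category below is illustrative, not a real endpoint:

```python
# Hypothetical routing table: small, cheap, task-specific models handle
# the common cases; the expensive general-purpose model is the fallback.
SMALL_MODELS = {
    "classification": "in-house-bert-classifier",
    "extraction": "fine-tuned-small-lm",
    "sql": "fine-tuned-sql-model",
}
LARGE_MODEL = "hosted-large-llm"

def route(task_type: str) -> str:
    """Pick a model backend for a request by task type."""
    return SMALL_MODELS.get(task_type, LARGE_MODEL)

print(route("classification"))      # goes to the small in-house model
print(route("creative-writing"))    # open-ended task, falls back to the large model
```

For a startup, this really can be the “simple if statement” Arni mentions; in a large enterprise, the same routing decision would sit behind a gateway service with logging, cost accounting, and guardrails around it.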

    Dan – 00:18:54: Cat, when you appeared on the Crafted podcast, you were telling the story of Flatiron Health’s cautious embrace of AI. And in the early days, it was all humans. You trusted humans and did not trust AI. And then more recently, you said you’re starting to trust it for simple queries from patient records. Like, is a patient a smoker or a nonsmoker? And then you said you could envision a future where, you know, 80 to 90 percent of queries are able to be answered through AI. And I’m curious if you could walk through a little bit of that journey and what you think needs to happen for that future where potentially 90 percent of those questions actually are able to be answered by AI.

    Cat – 00:19:29: You know, I studied AI, but I’ve been out of that field for a long time. So to some extent, this is me watching what’s happening. So when we started the company in like 2011, roughly, a lot of other companies were starting with the same idea of like, oh, there’s lots of data out there. We’ll use AI to extract information from it. And I remember talking to the founders, and Zach was like, no, we’re using people. Those companies that think they can do it with AI, they’re wrong. And that was extremely true in 2011. We definitely watched other companies crash and burn on this idea of being able to extract really deep, nuanced things from a patient record that it’s frankly even hard to get humans to do. And so we really went all in on a human-first, tech-enabled human strategy. And then maybe three or four years down the line, we really looked at it and said, OK, there’s definitely a combination of human and machine that will work here, where you have a machine do some filtering for you. There are certain kinds of elements that are much easier to filter for than others. And so we’ll do a hybrid strategy. And I do think that some of the techniques that have come along in the last, I’ll say, four to five years, like deep learning, and it’s not just LLMs, like the state of the art, and you could speak to this far better than I could, have really come a substantial way in the intervening years. And so the difficulty of tasks that you can give it has really been sliding along and getting better, and it’s almost hard to keep up with it and know exactly where your threshold is. We’re still in the humans-plus-AI strategy. I think there’ll be a crux when we come to the mostly-AI, fewer-humans part of it. But I think that day is coming in the next couple of years. And frankly, I don’t know whether it’s a Generative AI thing or it’s going to go back to more classical models.
But just either newer techniques being developed there, or even just using the existing techniques better as they go from being sort of research grade to something that people are using more in industry. I’d be curious about your take on that evolution.

    Jacopo – 00:21:21: I mean, I think the story of NLP was like, it never works, it never works, it never works. Oh my God, it works. And this was GPT, by the way. So GPT was really, and I know that nobody knew that, because it was not a chat you could talk to, but GPT was really when everybody was like, this is a different thing than what we had before. And that was a fantastic moment for the field. And that’s the moment, I think, for one underappreciated reason: neither the people that didn’t really like neural networks, myself included, I come from cognition, so I’m a neural network skeptic by training, let’s say Gary Marcus, nor the people that really love neural networks, let’s say the people of OpenAI, if you want, nobody could have predicted that. Nobody really knew that by taking an old architecture and just making it much bigger, a completely new behavior would emerge. And it was a fantastic new thing for the field. Like, we learned something that no theory actually predicted. And then we immediately forgot about that, and we went back to asking humans to rate stuff so that we can actually fine-tune it, because we need to be safe, and blah, blah, blah. But it was a fantastic scientific achievement. And most of the things that we saw later, to me, are kind of boring refinements of that very good insight. One thing that I would only add, and maybe Justin can help me on this: evaluating stuff is always very, very hard in machine learning, but with recent models it’s becoming much, much harder. Why is that? Because we’re mostly testing on the training set. The training set is so big that it’s impossible to find out-of-distribution examples unless you really, really try, which is why, when you go online and people post like, GPT solved this homework stuff, no shit, it was in the training set. Everything is in the training set. That’s the point, right? So it’s actually very, very, very hard.
And when it fails, because you find that tiny part of the distribution that is not covered, it actually still fails spectacularly, like older models did. So question for people that work in this every day: what do you do to make sure that evaluation works?

    Justin – 00:23:24: Yeah, great question. I mean, evaluation is definitely a premier challenge for deep learning models. I would say the human metaphor kind of applies here, too. Like, some people say that I’m really smart because I can solve a data structures and algorithms question. But how do I know that it’s me inherently, versus, oh, something I read, like, you know, in my CLRS textbook when I was doing my computer science degree? So in the same way, I think it’s very, very hard. If you guys are familiar with Code Llama and HumanEval, right, they tried to create this clean test set that no LLM ever had access to, of how to write code. And then there’s a new model called Mistral, and there’s strong evidence to show that they just included that in their training set. And, you know, maybe it was an accident. Maybe it was intentional. I don’t know. I’m not going to comment on that. But I think the concerns about evaluation are real. And what you were saying, Cat, gets me thinking about Google. So for the longest time, we were trying to push for generative language models in the Google Assistant. If anyone has used the Google Assistant here, it’s still that, you know, same kind of robotic voice. And, you know, the main concern was, even though you’re telling us these evaluation numbers, we can’t ever be 100% confident that it’s going to be good and perfect. And so I think it remains a challenge today. But I think ChatGPT shows us that hallucinations happen, and there’s a certain amount of forgiveness that humans are willing to extend. And so people can be a little bit bolder.
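One rough defense against the test-set contamination Jacopo and Justin describe is an n-gram overlap check between each evaluation example and the training corpus. The sketch below is a toy version under simplifying assumptions (whitespace tokenization, an in-memory corpus); real decontamination pipelines use text normalization and scalable indexes over the full training data:

```python
def ngrams(text: str, n: int = 5):
    """Set of word n-grams from a whitespace-tokenized, lowercased string."""
    toks = text.lower().split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def looks_contaminated(eval_example: str, training_docs, n: int = 5,
                       threshold: float = 0.5) -> bool:
    """Flag an eval example whose n-grams largely appear in training data.

    Heuristic only: a high overlap ratio suggests the example (or a near
    copy of it) was seen during training, so a correct answer proves little.
    """
    ev = ngrams(eval_example, n)
    if not ev:
        return False
    seen = set()
    for doc in training_docs:
        seen |= ngrams(doc, n)
    overlap = len(ev & seen) / len(ev)
    return overlap >= threshold
```

Examples that get flagged would be dropped from the benchmark, which is, in spirit, what the HumanEval authors attempted by writing problems from scratch.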

    Dan – 00:24:57: I want to get back to the theme of the panel; the title is From Prototype to Production. As a lot of memes have visually illustrated, Generative AI sits on top of what should be a very stable foundation of best practices around data and ways of writing software. And if you implement it before you have a stable foundation, that’s potentially a recipe for disaster. And I’m curious if you could share a little bit, for anyone who’s here in the room tonight, who’s building, who’s listening: what are the types of things that you need to have in place before you go to this next level?

    Arni – 00:25:27: Performance is really hard to gauge. If you have expectations for your user to say, look, the response needs to come back in X number of seconds, whatever the duration may be, for an interactive use case or even maybe a batch use case. These things are so complicated that being able to determine how long it will take as more and more users start to use your system, it’s getting really hard to size these systems properly for production use cases, both for latency and for cost. Because a lot of these models are priced by token. How many of you speak tokens? We don’t speak tokens. At the end of the day, do you sit back and say, I wonder how many words I spoke today? That’s not how you operate as humans. But guess what? You’ll be thinking a lot about that if you start to put this in prod, because your life will revolve around token counting. And you’ll have all kinds of algorithms to figure out, how do I compress my tokens going in, compress my tokens coming out. Look, so sizing and latency have been a huge issue that we’ve seen in early prod. Some customers with early builds have been surprised by the high variation in latency. So that’s number one. I’ve seen very few examples where customers are like, I’m just going to go build this and push this out to the whole open world. Because they don’t know what this thing is going to say. It’s like taking a little kid, like, who knows what they’ll say? It’s one of those things. You have to put a lot of guardrails, literally, around these things. So most of the use cases really have started with internal employees or internal use cases, maybe partners. But certainly not immediately opening up to the outside. Even when they did it, they had heavy guardrails around it. I have a lot of customers who do LLM red teaming, which is important. We work with fairly big customers, so brand is really important for them. One thing goes sideways, and it has a serious impact on their brand.
So they’re really careful about that. So extensive red teaming and extensive guardrails have been put in place to make this thing really viable and usable. Model drift. For some reason, what worked in your lab, what worked in the early stages, doesn’t seem to be the same. These models, they may seem magical, but at the same time, they’re scary. Like, wait a minute, I don’t understand that. The deterministic nature of known algorithms is out the window. So you need to really plan for some very non-deterministic answers, which means that you need model monitoring, model drift detection; you need to capture all these things. So some of our customers are like, man, this is a lot of work, right? Going from, this worked on my laptop, to, this worked on a prototype somewhere, to really putting this in prod. But these are all real things that we’ve seen. So that’s what I’ve experienced. And there’s many more, but I’ll let the rest of the panel weigh in. That’s what I’ve seen with some of our big enterprise customers.
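Arni’s point about token-based pricing can be made concrete with a back-of-the-envelope estimator. The prices and the words-to-tokens ratio below are placeholder assumptions, not any provider’s actual rates, and real billing is computed with the provider’s own tokenizer rather than a word count:

```python
# All numbers here are illustrative assumptions for sizing exercises only.
PRICE_PER_1K_INPUT = 0.01    # assumed dollars per 1,000 input tokens
PRICE_PER_1K_OUTPUT = 0.03   # assumed dollars per 1,000 output tokens
TOKENS_PER_WORD = 1.3        # common rule of thumb for English text

def estimate_cost(prompt_words: int, output_words: int,
                  requests_per_day: int) -> float:
    """Estimated daily spend in dollars for a token-priced endpoint."""
    tokens_in = prompt_words * TOKENS_PER_WORD
    tokens_out = output_words * TOKENS_PER_WORD
    per_request = (tokens_in / 1000 * PRICE_PER_1K_INPUT
                   + tokens_out / 1000 * PRICE_PER_1K_OUTPUT)
    return per_request * requests_per_day

# 500-word prompts, 200-word answers, 10,000 requests a day:
print(round(estimate_cost(500, 200, 10_000), 2))  # roughly $143/day at these assumed rates
```

Running numbers like these early is what makes prompt compression, caching, and routing to smaller models feel urgent rather than optional.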

    Dan – 00:28:19: Justin, what else needs to be in place before we go from prototype to production?

    Justin – 00:28:23: Oh, I was going to say, I think the wisdom from Arni is really valuable. The most valuable thing is experience, you know, when it comes to knowing what model is going to work, or how much you need to care about model drift, even at all. Maybe you should, you know, go talk to Artium to get some.

    Dan – 00:28:42: Find the model whisperers. 

    Justin – 00:28:44: Yeah, yeah. But yeah, I would say like, I would be really big on experimentation culture. You know, there’s a lot of open source stuff. You just have to try it and you have to like kind of really try it to kind of know. And yeah, you might get burned with a big bill here and there. But yeah, definitely like the number one thing I think that I tell, you know, all of our customers in like our machine learning consultation conversations is like start small. You like, you got to start small. Like you can iterate faster. You can build better. Start with like something super rudimentary and try to beat it. And then, you know, add things as needed as you find that you need them.

    Cat – 00:29:20: Yeah, I think it comes down a lot to what is the use case that you’re trying to do. So I said that my successes are in these really small things, like figuring out something that’s an internal efficiency. People can hack that together in a weekend, and there’s very little cost and very little infrastructure that needs to be true there, other than, I think, this point about how do you measure the goodness, or how do you keep a human in the loop, or how do you make sure that it isn’t a bad actor? But if you can basically design the way your algorithm is working, or the way your insertion of AI is working, such that you have those checkpoints, then you actually don’t need a lot of infrastructure and a lot of prep. You can just go build. I think there’s a difference, again, if you’re thinking about things that improve internal efficiency but at a production scale. So let’s say you’re an insurance company. If I was an insurance company, I would be investing in AI super heavily, because you spend a lot of time processing claims and looking for fraud and all sorts of things. So it’s a great use case where the cost is no object if you can get it right, and you should be dumping a lot of money into those fundamentals. And it’s different, again, if you are thinking about, I’m a B2C company and I have a website and I want to add a cool AI feature. That’s the hardest case, I think, because now you need real-time production infrastructure that’s going to be costly, and your users are going to interact with it, and I don’t know if you are going to be able to get your money back or charge for it. So that feels really hard. So to me, what you need in place really depends on what you’re trying to do with it. I’ll also call out the build versus buy question. So, like, I need a chat bot for three different websites, and I’m sure as heck not going to build that myself. Because why? 
I’m going to go buy one. And I think that’s both because, why not leverage that other expertise, and also from a UI perspective. I don’t want to be building UIs for 20 different AI models in my company. That seems like a lot of work, and if I’m not going to invest there, they’re not going to be as good. I’d rather pay someone else when part of what I’m looking for is that UI expertise. So that’s just another element to throw in.

    Jacopo – 00:31:18: I agree with what everybody said. The only thing I would add is, at least the way I see it, in any stack, if you want to build something, you have data ops first. Without data in order, reproducible pipelines, robust pipelines, nothing ever matters. Even if you get a prediction right, you will never be able to know why the prediction was right in the first place, which means it’s not really right. Then you have MLOps. MLOps tends to come in two segments. One is the training part for the models you support. You know, now it’s more fashionable to just call an API, but let’s say you want your own model, you fine-tune it, whatever. There’s a part of that that happens offline, meaning when people don’t look at it, which needs to be scalable, reliable, possibly low cost, because GPUs are very expensive, as our friends at Amazon know. But, you know, they’re already doing fairly well at AWS. We don’t want them to do even better. And then there’s the inference part, right? When you actually need to serve the model. If the inference is in batch, that tends to resemble training a lot, so it’s less time-constrained and less costly. But if inference is real-time, like Cat’s example of your web app or whatever, there may be a significant portion of complexity, cost, and so on. And it’s very hard to get the inference right if you didn’t get the training right first, and it’s very hard to get the training right if you didn’t get the data ops right first. So my suggestion is start with a vertical slice. Vertical, as in end-to-end, because you need to validate the entire system, but with the simplest possible case: the simplest possible data transformation, the simplest possible model, and the simplest possible inference. That, I would say, is a very pragmatic approach to this. On build versus buy, I actually couldn’t agree more. 
A long time ago, when I was still doing ML, like two years ago, I wrote a paper called You Don’t Need a Bigger Boat. And I just want to quote it because I think it’s a very funny title, and I want to say that. The argument made in that paper was that you should build only the things that are really core to your company, and you should buy everything else. Unless you’re one of the humongous conglomerates, like one of the five companies that can pay engineers however much they want and are basically printing money, and we all know who they are, everybody else should buy technology. Do not build technology. You’re going to spend more time, you’re going to waste people’s salaries, you’re going to be more frustrated, and you’re going to end up with an inferior product. Just go to people like Justin from Predibase; his entire startup is literally about that. Do not build your own unless your business depends on it. The other corollary to it is, if you start with buying, you can always change your mind. Buy, get some ROI, and then, if that thing becomes the core of the company because you actually won the lottery, you can still replace it afterwards and invest in building it. But if you start with building, it will take you a year before you can even verify whether that was a good idea in the first place. So buy technology, don’t build it, unless you really know what you’re doing.
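Jacopo’s “simplest possible vertical slice” can be sketched as a toy pipeline. Everything below is illustrative (the function names, data, and model are invented for the example): the simplest data transformation, the simplest model (a majority-class baseline, the kind of rudimentary starting point Justin suggests trying to beat), and the simplest inference, wired end to end.

```python
# Toy "vertical slice": data ops -> offline training -> inference,
# each stage the simplest thing that could possibly work.
# All names and data here are illustrative, not from any real system.
from collections import Counter

def data_op(raw_rows):
    """Data ops: one reproducible transformation that drops bad rows."""
    return [(r["text"].strip().lower(), r["label"])
            for r in raw_rows if r.get("text")]

def train(rows):
    """Offline training: the simplest possible model, a majority-class baseline."""
    majority = Counter(label for _, label in rows).most_common(1)[0][0]
    return {"majority_label": majority}

def infer(model, text):
    """Inference: the path that would serve real traffic."""
    return model["majority_label"]

raw = [
    {"text": "Refund please", "label": "billing"},
    {"text": "App crashes on launch", "label": "bug"},
    {"text": "Charged twice", "label": "billing"},
    {"text": None, "label": "spam"},  # bad row, dropped by data ops
]
model = train(data_op(raw))
prediction = infer(model, "why was I charged?")  # "billing"
```

The point of the exercise is not the model; it is that every stage exists and can be validated end to end. Each stage can then be replaced, with a real feature pipeline, a fine-tuned model, or a serving layer, without rebuilding the others.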

    Dan – 00:34:22: Only a few months ago, I saw you speak on a different panel on the same topic, and you said we’re in this year of experimentation. You sort of said that in your opening comments tonight. And I’m interested, do you still think that we’re in the year of experimentation, and that it’s not, as I heard you say before, time to start thinking about business models? Or are we rounding the corner? We’re almost a year from when, you know, it was around Thanksgiving last year that GPT 3.5 came out. If we’re measuring the year of experimentation on the calendar year, we’re sort of getting close. I’m curious, your clients, you say they’re still sort of throwing money at it. Are they asking questions about business models? Should we be asking those questions? Or should we still just sort of be playing right now, do you think?

    Arni – 00:34:57: I think the experimentation will keep going on because there’s so many new things coming out. Agents, we’ll see how they go, but, you know, they have some promise in some areas, some don’t. Multimodal models have come out, depending on which one you pick, from GPT-4V to other ones. The blend of code generation and language models, that is really picking up. So I don’t think the experimentation itself is going to slow down for the next couple of years, because this innovation is just going to keep going and going and going. But at the same time, I’ll still stick to my other statement, which is clients are starting to see some value from the initial experimentation they did, to say, hmm, maybe I can take this to prod. We just had one major financial customer, a big German insurance company, go live on their website this morning. So we’ve seen those customers from the early waves start to go to prod. But that doesn’t mean they’ve stopped experimenting. The experimentation could be new use cases, or newer tech applied to the existing use cases. This tech is still in its infancy, right? And it’ll be misused in so many ways, which is, you know, that’s what tech is, right? You’ll try it, you’ll misuse it. VCs will spend a lot of money. We, as a big cloud player, will throw a lot of money at it. We’ll get a lot of things wrong. We’ll get a few things right. But the experimentation, I think, is still in its very early stages, so my guess is this wave will pick up for the next several years. The good news is, I have at least 20 customers who’ve gone to prod at scale with this, both for internal use cases and now even starting to see external use cases. But last time we met, that was not the case, right? It was still at zero customers fully in prod; they were all in pre-prod, in early experimentation. That has changed. And it’s all because it’s moving so fast.

    Dan – 00:36:49: Yeah, I was just gonna say, it was only a few months ago that we met, and I heard you say that, so. Last question for the panel. Is there something that today sounds like science fiction that you think in the near future is gonna be totally boring and commonplace and this technology is going to enable?

    Cat – 00:37:04: I actually feel like we’ve already had the magic. I have a GPT instance that knows my side coding project, knows all the context of it. And when I get stuck, I just copy-paste my error message, or a really whiny, why did it do this? And it gives me an answer which is correct a lot of the time. And when it’s wrong, I’m like, no, stupid head, you’re wrong, tell me another answer. And it gives me a correct answer. That is freaking magic. And I say that as a developer whose biggest problem was I’d get stuck and I really wanted someone to talk me out of it. I hated the Google hole. And that is amazing. So for all of our skepticism that it’s not a silver bullet, it doesn’t fix everything, that is incredible. I also like chat bots on websites now. I now want a chat bot, because it’s going to get all my information and get all my stuff together, and then it’ll send me to an agent with all that filled in. I don’t want to discount how amazing that leap forward is. I mean, I too remember learning about neural nets, and when I was learning about them, they were like, this is stuff from the 70s that we learned didn’t work years ago. Right? And so, I don’t know, that’s actually the huge leap forward. I think we’re still just processing that. I think it is science fiction.

    Arni – 00:38:14: The big aha for me: my son’s 14, and he’s learning coding. We’ll see how far that goes. But over the weekend, he was like, dad, I’m gonna build my first, he’s learning C++, I don’t know why, for some robotics program, I guess. And he’s like, I’m gonna write Tic-Tac-Toe in C++, I just learned it. I’m like, okay, go do it. Couple of hours later, he comes back, fully functional, did it great. Then I said, this looks good. Just put it through ChatGPT and say, can you optimize this? He’s like, look at this, it’s only so many lines, I did so well. I said, ask it. And sure enough, he drops it in without any context. Optimize this. Comes back: looks like you’re trying to build a game of tic-tac-toe. And as it starts to spit out the code, he can see it typing it out, and the number of lines goes from that to, like, this. And you should see the look on his face. For me, that is science fiction, right? When you can make kids go, what did you just do? And there’s nothing magical. I mean, you know how it works. Once you get behind the curtain, even though you can’t exactly see how it’s working, you have a general idea of how it is working. But to see his expression, like, holy crap, how did it do that? Now the joy for him is, I wanna go back and learn how I can write like that. And now he wants to go back and figure out the optimization and all that. So the whole notion, some of us have a shared past here, we were at Pivotal Labs, where pair programming was a big deal. And pairing was always human-to-human pairing. But I think, whether it’s Copilot or CodeWhisperer, or ChatGPT itself, to have that companion, no matter what job you’re doing, to be able to ask it, have it come back and help you, not that you always know, but someone can come and teach you, I think that is gonna remain the aha for a while now for us. 
There are no flying cars, there’s no fanciness here. I don’t want a flying car. I like to drive my car; I don’t want to fly it. But that, for me, still remains the big aha. Simple things. 

    Dan – 00:40:19: Anyone else want to take the science fiction question?

    Jacopo – 00:40:22: The NLP literature in the last year or so is like a paper every two days, and so on and so forth. And they’re all excruciatingly boring. The field was already boring in the last five years, but last year, this year, it’s just impossible to find a good idea. But there’s one exception. There was a paper by Percy Liang and a bunch of other people that basically takes tiny agents in a video game and makes them kind of evolve, living the life of a tiny village by themselves. The paper has no line of code; there’s not a single formula in the entire paper. It looks like a paper from psychologists in the late 80s. Are you familiar with the topic? And it’s fantastic. There are these tiny agents, and after a week, something like a week of digital time, of talking together, they actually organize, spontaneously, a birthday party for one of the other agents that actually has a birthday. And all of that is completely unscripted. All of that is just, you know, one possible generation of the evolution of these agents. And I think that’s actually a novel, incredible, new, exciting area of research. This is more science fiction, but if these things become reliable enough, and we solve a lot of the problems that we need to solve, there’s going to be a world where human agents are only one part of the types of agency you experience in the digital world. There are going to be autonomous decisions, with all the problems, but also all the opportunities, that that entails, made by these agents. And I think that paper, which is fantastic because everybody can read it, even if you don’t know anything about the science, is a tiny, tiny glimpse of that future. And I will be very excited if, five years from now, that actually is the reality that everybody lives in.

    Justin – 00:41:58: Very similar, but with a little bit of a more dystopian twist, if anyone has seen Black Mirror. But also, I think it’s very exciting, where you can imagine we have these characters, but the character is yourself. And one of the great things about computers is that once you write something, once you create something, you can Command-C, Command-V, and you can just have hundreds of versions of yourself just doing things for you, on your behalf, maybe in this village. And I have no idea how that’s going to go, but I can definitely see the technical feasibility of that very soon. Yeah. I don’t know if I want to live in that future, but we’ll figure it out.

    Arni – 00:42:38: I don’t want more copies of me, no.

    Dan – 00:42:41: Well, thank you all for bringing the copies of yourselves that you brought tonight. Thank you so much, Arni, Justin, Cat, Jacopo. Thank you again to Geeks Who Lead and Peter Bell. Thank you to Betterment for hosting us. Thank you all for coming. I know it’s a really busy week during NYC Tech Week. Thank you to NYC Tech Week again. If you’re building something ambitious, we’d love to chat. There are a bunch of us from Artium in the room; come find us, let’s get a drink, and let’s continue the conversation. Thank you so much. Thanks so much for listening to our live event. If you’ve made it this far, we know you are really into this topic, and we’d love to hear more about what you are building. At Artium, we love partnering with visionaries to help them build incredible products, recruit high-performing teams, and achieve the culture of craft you need to build great software long after we’re gone. You can learn more about us at thisisartium.com and start a conversation by emailing hello@thisisartium.com. And please share this episode with a friend, rate it, review it, and help Crafted grow.

    Jacopo – 00:43:44: Never works, it never works, it never works, oh my god it works.


    Get our newsletter on how we help partners unlock their potential



