Operations and Maintenance
Hosted By Matt And Matt

Full Scale

See All Episodes With Matt And Matt

Ep. #957 - SDLC: Operations and Maintenance

It’s a wrap! Today’s episode of Startup Hustle marks the finale of our Software Development Lifecycle (SDLC) series. Matt DeCoursey and Matt Watson share best practices for avoiding maintenance nightmares.

Miss the first episodes? Journey with the Matts throughout the entire SDLC series.

Covered In This Episode

Does software development end when the product is deployed? Nope.

Get Started with Full Scale

According to Matt and Matt, software development never ends. After deployment, consistently performing maintenance checks is vital. Without the right operations and maintenance in place, your tech product will likely not perform optimally. But there’s no need to fret! Matt and Matt share their practical tips on how to make the last SDLC phase more efficient and effective.

What are you waiting for? Tune in to this Startup Hustle episode.

Hear What Entrepreneurs Have to Say in Startup Hustle Podcast

Highlights

  • Fact: software just doesn’t work all the time (02:50)
  • A discussion on regular synthetic monitoring checks (04:17)
  • On keeping a service level agreement (SLA) and a service level objective (SLO) to maintain the functionality of the software (05:25)
  • Is there a chance your software won’t work as expected after launch? (07:43)
  • Software needs to be maintained (09:50)
  • On the renewal of SSL certificates (11:09)
  • Dallas Cowboys losing their domain name (11:58)
  • The importance of security (15:25)
  • Updating code is not simple (17:07)
  • GDPR Compliance (17:46)
  • Things involved in software maintenance (21:19)
  • On having basic software monitoring (22:38)
  • Questions to ask for operations and maintenance during the development process (25:05)
  • Matt and Matt’s recommendations for excellent software operations and maintenance (26:16)
  • Benefits of having site reliability engineers focused on operations and maintenance (27:54)
  • On software developers being on-call (30:42)

Key Quotes

So you have all these different little apps connecting to all these different little things. At any time, any of those could break. That’s why it’s important to have some basic monitoring, even if it’s the simplest monitoring. At least have a starting point to know whether the most important things about your software work or not.

– Matt Watson

It’s absolutely crucial to approach your users and ask them: what are we doing right? What are we not doing right? What can we do better? And, you know, from a maintenance standpoint, and operations, remember why you built it. They are the ones that are the most important. When it comes to setting priorities and everything like that, you want to listen to the people that use it.

– Matt DeCoursey

Software is never done. If you’re going to own a software company or product, you need to be able to maintain it and support it. People will say, if it’s built well, it shouldn’t break. There are things you can’t control.

– Matt Watson

Sponsor Highlight

Are you looking for qualified individuals to help with software maintenance? Full Scale has the right developers, testers, and leaders for your project. Not only that—this Inc. 5000 company also has a client-friendly platform to help you manage the team. Build your software development team quickly and affordably today!

Go check out all of our Startup Hustle partners for additional business solutions.

Rough Transcript

Following is an auto-generated text transcript of this episode. Apologies for any errors!

Matt DeCoursey 00:00
And we’re back! Back for another episode of Startup Hustle. Matt DeCoursey here with Matt Watson. Hi, Matt.

Matt Watson 00:08
Are you ready for your operation today?

Matt DeCoursey 00:11
It’s just some general maintenance. You know, I’m getting tuned up a little bit as I’ve gotten a little older. You need to kind of oil the joints up a little bit. Tuck the tummy. Sand out the wrinkles. Where are we? Are we talking about operations and maintenance of software as in episode eight of our software development lifecycle series? Are we talking about the operations and maintenance part of becoming an old man?

Matt Watson 00:40
We’re just trying to keep you actually working.

Matt DeCoursey 00:43
It’s a whole other series, brother. That is a whole other series. Well, Matt, here we are. It’s eight of eight. We’ve gone through another series. You know, this one was nowhere near the 52-part series that we did, which was now almost, man, we started that one a year and a half ago. I realize how time flies, but with time flying, you’re guaranteed to have a net now. You’ve made it a little bit further down the road. You’re no longer in the minimally lovable product category. And you want to move forward with the software platform. You have some users; you figured out how to do it all. And it’s time to deal with the operations and maintenance. Before we get too far into that, today’s episode of Startup Hustle is powered by FullScale.io. Hiring software developers is difficult. Full Scale can help you build a software team quickly and affordably and has the platform to help you manage that team. Visit FullScale.io to learn more. Now, here we are with post-deployment. So, we talked about last week, and we aren’t talking about sending me to Abu Dhabi. We’re talking about moving the server stuff on the left to the server stuff on the right. That’s what I gathered from the last episode. That’s an easy way to look at it. Yeah, operations and maintenance though, this is, you know, look, this doesn’t sound sexy. But if you mess this up, it can be game over.

Matt Watson 02:10
Well, the problem with software is it just doesn’t always work all the time.

Matt DeCoursey 02:16
If it’s built well, it should never break, right? True or not true? Don’t tell yourself that. Software breaks for a lot of different reasons that you cannot control.

Matt Watson 02:28
So I had a problem last week at my company where a customer’s website would work fine, and then randomly, when he would go to it, it wouldn’t work. Part of it would load, and part of it wouldn’t. And, of course, they would report the issue, and then when I’d go to check it, it worked just fine. You check it again the next day, and it doesn’t work. And then the next day, it works. Nobody could pinpoint why it works and then doesn’t work. Nobody could reproduce it. And the thing is, whenever you have your software deployed, you need some form of monitoring for it. Instead of going off a gut feeling, like, well, Matt tested it, and the other Matt tested it, and whoever tested it, and it worked or didn’t work, you need something that tests it for you. It tests it every minute, maybe even from different parts of the country or the world, so that you don’t just run off your gut feelings, right? There are all kinds of application monitoring tools that exist. At the last company I had, Stackify, that’s what we did. We created applications.

Matt DeCoursey 03:30
So having to build a couple of those.

Matt Watson 03:33
So the way I solved this problem last week is I set up a synthetic monitoring check. Basically, all I did was put in the URL, the website address for this webpage, and once a minute it would try to load the page. Then, after an hour goes by, I can go and look at the chart and see: is it working? Is it slow? What is going on? And what you’ll find funny is that a pattern emerged: every 15 minutes, it was slow for two to three minutes. And then the rest of the time...
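
A minimal sketch of the kind of synthetic check described here, assuming nothing about the tool actually used in the episode. The function names, the 10-second timeout, and the slowness threshold are all hypothetical. It loads a URL, records how long that took, and then reports which minutes of the hour had slow or failed checks, which is how an every-15-minutes pattern would show up.

```python
import time
import urllib.request
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class CheckResult:
    started_at: float  # Unix timestamp when the check began
    ok: bool           # True if the page loaded without an error
    seconds: float     # how long the load attempt took


def fetch_page(url: str, timeout: float = 10.0) -> None:
    """Load the page once; raises on network errors and HTTP error statuses."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()


def run_check(url: str,
              fetch: Callable[[str], None] = fetch_page,
              clock: Callable[[], float] = time.monotonic) -> CheckResult:
    """Run one synthetic check and time it. `fetch` and `clock` are injectable for testing."""
    started_at = time.time()
    begin = clock()
    try:
        fetch(url)
        ok = True
    except Exception:
        ok = False
    return CheckResult(started_at, ok, clock() - begin)


def slow_minutes(results: List[CheckResult], threshold: float) -> List[int]:
    """Minute-of-hour values (UTC) where a check failed or took longer than `threshold` seconds."""
    minutes = set()
    for result in results:
        if not result.ok or result.seconds > threshold:
            minutes.add(time.gmtime(result.started_at).tm_min)
    return sorted(minutes)
```

In use, something would call `run_check` once a minute (a scheduler or a simple loop with `time.sleep(60)`), keep the results, and pass them to `slow_minutes` after an hour or so. If the answer clusters around 0, 15, 30, and 45, that points at a recurring job rather than random failure.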

Matt DeCoursey 04:06
I know why. I know why. You had a bunch of piled-up repetitive tasks. I’ll schedule it for the 15th. Yes. So about one before.

Matt Watson 04:19
So I immediately have to go to the developer and the rest of the development team. I’m like, What the hell is running every 15 minutes? Exactly 15 minutes. What the hell is going on? Right? So then, you know, you start going to AWS and looking at logs and the database logs.

Matt DeCoursey 04:34
You called me, dude. You should. Yeah.

Matt Watson 04:37
But the point is, without this tool, it was really difficult to know. So when you start at the very high level, everybody should have some kind of SLA, a service level agreement for uptime: 99.9% of the time, or whatever, our software is going to work. And the new thing these days is that people also have what they call service level objectives, SLOs, because it’s not always about uptime, right? Take something like Gigabook as an example. It’s like, oh, our website’s up, the app is up, so it’s available. But the issue is that the calendar sync on the back end doesn’t work. You need an SLA or an SLO for the different parts of the system that do different things. You can get really complicated with it; there’s all sorts of stuff you could do there. But the key, using Gigabook as an example, is that you’ve got to monitor all those different pieces, right? Like, how often is the calendar sync working or not working? We’ve got to be able to monitor all the different moving pieces so we have some SLA-type stuff that we can track, because inevitably, things just don’t work. Like the example I gave earlier: it didn’t work because something every 15 minutes was trying to delete data out of the database that we didn’t even use anyway. It was just some logging data, some bullshit we didn’t even need.
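
To put numbers on the 99.9% figure and on the idea of separate objectives per component, here is a small sketch. The component names and check counts in the usage note are made up for illustration, and the 30-day period is an assumption, not something stated in the episode.

```python
from typing import Dict, Tuple


def allowed_downtime_minutes(target_percent: float, period_days: int = 30) -> float:
    """Minutes of downtime that a target such as 99.9 permits over the period."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (100.0 - target_percent) / 100.0


def availability_percent(successes: int, total: int) -> float:
    """Share of checks that succeeded, as a percentage."""
    if total <= 0:
        raise ValueError("need at least one check")
    return 100.0 * successes / total


def slo_report(checks: Dict[str, Tuple[int, int]],
               targets: Dict[str, float]) -> Dict[str, bool]:
    """For each component, report whether measured availability meets its target.

    `checks` maps a component name to (successful checks, total checks).
    """
    return {
        name: availability_percent(*checks[name]) >= target
        for name, target in targets.items()
    }
```

A 99.9% target over 30 days leaves roughly 43 minutes of downtime. With hypothetical data such as `{"website": (1440, 1440), "calendar_sync": (1400, 1440)}` and a 99.9 target for both, the report marks the website as meeting its objective and the calendar sync as missing it, which is the "site is up but sync is broken" situation described above.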

Matt DeCoursey 05:59
If that had occurred at 11 minutes after the hour and 26 minutes after the hour, or something like that, it wouldn’t have caused the problem. It was actually building Gigabook that made me guess exactly what your problem was. Because scheduled things run on the zeros, the 15s, the 30s. There’s just a natural time for creating deadlocks. We had all these reminders, notifications, everything. And what had happened wasn’t even us: our server and database was performing a little backup every 15 minutes or something like that, and it locked the database while it did the backup, which only took 30 seconds or a minute. But, you know, we were sitting there going, why can’t we get this damn reminder to send at 15 or 30 minutes? Because it would come in at 12:16 or 12:31. And you’re like, what? These things will drive you crazy when you’re working with things where timing matters.

Matt Watson 07:06
Well, and these are the problems, right? Everything works perfectly. We deployed our app a month ago, and we’ve made no changes to it. Everything works perfectly, and all of a sudden, it doesn’t. There are a lot of different reasons these things go wrong. The example I gave was a database that was full of too much data, and when it was trying to delete it, it was timing out, and you gave some other examples. But it can also be like, oh, AWS did this thing and we have to upgrade versions, or AWS is having a problem, or some third party that we rely on doesn’t work. Gigabook, you know, uses different APIs for calendars and running credit cards, and any of those could break at any time. The great thing about the cloud is that it makes it easy to deploy apps and do all the things we do, but it also adds a lot of points of failure. So you have all these different little apps connecting to all these different little things, and at any time, any of those could break. That’s why it’s important to have some basic monitoring, even if it’s the simplest monitoring, to at least have a starting point to know: do the most important things about your software work or not? And then you can dig deeper if there’s a problem.

Matt DeCoursey 08:17
Well, the examples that we just mentioned are operations and maintenance at that point, because part of operations when you’re running a software platform is making sure that the platform is operating. And now that you mention it, I’m chuckling, and I made the buzzer sound, because, okay, look, even the best constructed and most meticulously engineered software is still gonna break. I mentioned things that you can’t control because I’ve had so many people say that to me. A big red flag when we meet with people about providing services and building them software teams at Full Scale is: how much will it cost to build, and how long will it take? To me, that is a rookie question all day, because software is never done. If you’re going to be a software company or own a product, you need to be able to maintain it and support it. And people will say, well, if it’s built well, it shouldn’t break. There are things you can’t control. You’ve mentioned Azure a lot; when they make major updates to those servers, that can happen, and that can break little things. All right, so Gigabook is in PHP, which has different versions that move up the line. If you want to have something very exploitable that’s four versions old, go for it. It might be broken, but you’re never going to know. Then when you go from PHP version seven to eight, that was a big thing and broke a lot of stuff. Let’s also talk about when Apple made major security updates and launched iOS 14. It legit broke half of the App Store. Okay, so if you didn’t have a developer on hand, you weren’t able to maintain it. These aren’t things that were a product of a poor software build. Same with Chrome: Chrome changes something in their browser, and it can break your user interface.

Matt Watson 10:20
You bring up Apple breaking something. The thing that I swear breaks more software applications than everything else combined is SSL certificates expiring. And that is just a maintenance thing. Every year, you have to renew the certificates. And you can thank Apple partly for this, because they made the change a year or two ago where Safari would only work with certificates issued for roughly 12 months or less, or whatever. So now you have to change certificates even more often: instead of changing them every three years, now you’ve got to do it every year. The good news is a lot has become more automated. There are ways to automate the certificate stuff on Azure and AWS, and there’s this thing called Let’s Encrypt. But that alone, talking about operations and maintenance, just renewing a damn SSL certificate has taken down so many websites and so many of my own applications over the years because people just overlook it. Next thing you know, everything’s down, and you’re like, oh, shit, it’s the simplest thing in the world that took it down.
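
One way to avoid being surprised by an expiring certificate is to check the expiry date on a schedule and alert well ahead of time. The sketch below uses only Python’s standard `ssl` and `socket` modules; the function names and the 30-day warning window are assumptions for illustration, not a recommendation from the episode.

```python
import socket
import ssl


def cert_not_after(hostname: str, port: int = 443, timeout: float = 10.0) -> str:
    """Return the certificate's 'notAfter' string, e.g. 'Jan  5 09:34:43 2018 GMT'.

    Uses the default (verifying) context, so an already-expired certificate
    makes the handshake fail with ssl.SSLCertVerificationError instead.
    """
    context = ssl.create_default_context()
    with socket.create_connection((hostname, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after: str, now: float) -> float:
    """Days between `now` (a Unix timestamp) and the certificate's expiry."""
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400.0


def needs_renewal(days_left: float, warn_days: float = 30.0) -> bool:
    """True when the certificate is inside the warning window (hypothetical default: 30 days)."""
    return days_left <= warn_days
```

A daily job could call `days_until_expiry(cert_not_after("example.com"), time.time())` (with `import time`) and raise an alert when `needs_renewal` returns true. Automated issuance such as Let’s Encrypt reduces the manual work, but a check like this still catches the case where the automation itself silently stopped.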

Matt DeCoursey 11:20
Well, okay, you want to talk about funny fuck-ups? The Dallas Cowboys, several years ago, had to buy their domain back from someone, because not only had it expired, it had truly expired. When you renew your domain, GoDaddy does your registration for 12 months, but they actually do it for 13 months. The 12 is there because they know you’re going to forget about it, and they’re giving you a month to figure it out. And then they actually release it. So someone had just paid, I don’t know, I can only imagine what that person’s day was like when they got that automated email, like, congratulations, you now own dallascowboys.com, or whatever. And I mean, is that a great business to be in, buying expired domains like that? But you talk about just little goofy things. These are operations. And with that, everything went away. I couldn’t remember, but I think he was actually a Cowboys fan, so I think he only charged them about 50 grand.

Matt Watson 12:30
Well, and a lot of application problems can come from unforeseen things, where you sign up a customer who all of a sudden has different requirements or uses than your other customers. For example, at Stackify, we did a lot of stuff with logging data. For a lot of customers, sending five or 10 gigabytes of logs a month was a lot of data. And all of a sudden, we sign up somebody who sends that much in an hour, or sends a terabyte of logs an hour. Well, then things just didn’t perform as well. Things would get slow and time out or not work, and it was all because they were sending us profoundly more data than the other clients. It’d be no different with an application like Gigabook that has scheduling and stuff. Seems pretty simple. But then all of a sudden, somebody’s like, well, we have 1,000 users, and we need to show all of their calendars on the same page at one time, and the software would probably puke. It just could not handle it at some point, because it wasn’t designed for that level of data throughput.

Matt DeCoursey 13:26
Yeah, throughput. And you talked about maintenance; maintenance is also a kind of optimization, in some regards. What Matt’s mentioning are scalability issues that we’re hoping you have at some point, because it means that you’re driving users and people are using what you built. In most cases, I don’t look at these kinds of things as terrible problems to be faced with, because the opposite is that no one’s using it and you don’t have any load to deal with. But in these particular situations, you talk about knowing and understanding what you’ve built and how it operates. So you talked about the calendars. The information that you’ll see on any calendar, whether it’s Gigabook or Google, is in a database somewhere, and a well-maintained and optimized site will just go get the pieces that it knows it needs for the calendar. Instead, a very early version of that was looking through the entire database. So all of a sudden you get more users, more people turning pages, more people looking at calendars, and you talk about the software puking. Well, that’s what happens. If I had to sprint even just a quarter mile now, I’d throw up afterward, because I’m not really in a good state for that. And that’s the thing: are you able to handle whatever comes up? And with maintenance as well, I think security is a big area. Security needs, and what was it called, GDP? What is it, the...
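
The difference between "go get the pieces it needs" and "look through the entire database" can be sketched with SQLite from Python’s standard library. The table, column, and index names below are hypothetical and are not Gigabook’s actual schema; the point is only that the query filters by user and date range and that an index exists to support that filter.

```python
import sqlite3
from typing import List, Tuple


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical appointments table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE appointments ("
        " id INTEGER PRIMARY KEY,"
        " user_id INTEGER NOT NULL,"
        " starts_at TEXT NOT NULL,"  # ISO 8601 text, so string order matches time order
        " title TEXT NOT NULL)"
    )
    # The index lets the database jump straight to one user's date range
    # instead of scanning every row in the table.
    conn.execute(
        "CREATE INDEX idx_appointments_user_start ON appointments (user_id, starts_at)"
    )
    return conn


def add_appointment(conn: sqlite3.Connection, user_id: int, starts_at: str, title: str) -> None:
    conn.execute(
        "INSERT INTO appointments (user_id, starts_at, title) VALUES (?, ?, ?)",
        (user_id, starts_at, title),
    )


def month_view(conn: sqlite3.Connection, user_id: int,
               month_start: str, next_month_start: str) -> List[Tuple[str, str]]:
    """Fetch only the rows one user's calendar page needs, ordered by start time."""
    cursor = conn.execute(
        "SELECT starts_at, title FROM appointments"
        " WHERE user_id = ? AND starts_at >= ? AND starts_at < ?"
        " ORDER BY starts_at",
        (user_id, month_start, next_month_start),
    )
    return cursor.fetchall()
```

The early-version behaviour described above would correspond to selecting every row and filtering in application code, which gets slower with every user and every page turn. Here the work per page stays roughly proportional to what is actually shown.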

Matt Watson 15:06
GDPR? That’s a different kind of data privacy kind of thing.

Matt DeCoursey 15:10
But still, that’s maintenance and operations.

Matt Watson 15:13
Yeah, it’s like we had to do a whole bunch of work to be in compliance, you know, these things change.

Matt DeCoursey 15:18
And depending on what business you’re in, like, if you’re in anything that has to deal with sensitive data, then these are things you need to get pretty good at.

Matt Watson 15:31
And security is a really big one. The problem is, as of today, there are probably hundreds or thousands of exploitable security holes in Windows, Linux, browsers, whatever the thing is. Either we know about them, or we don’t. And in a year or two, we’ll know about several of them. They’ll be found, or, like, the CIA has known about them forever and used them to exploit things, and then finally somebody in the wild finds out they exist, right? The problem is that the code we shipped two years ago had that exploit in it. So now we’ve got to run around and keep updating things for these bugs that have probably been around for a long time, but nobody knew they existed. And it’s a lot of work. You get these high-profile companies that get exploited by these things, and it’s because they’re using some old version of Java, and they find out there’s an exploit in Struts, which is some API thing that the old Java apps used. As soon as you find that out, you have to run around and update the version for all of it. But as you mentioned earlier, updating the version may not be as simple as a Windows Update on your computer and a reboot. It could be like, hey, we had to do this major code change and spend weeks or months updating the version of PHP, because the new version also had all this other crap in it that I didn’t care about, but to get the security fixes, I had to update all of it. It can be a huge project just updating all your JavaScript frameworks and your server libraries and your operating system and all this stuff, just to keep up with security on a quarterly or annual basis.

Matt DeCoursey 17:09
Well, and that’s operational and maintenance planning. I remember the time when GDPR came out. That was when we shared offices, so I was at the Stackify office with you. And man, did that thing drag all your other shit into the gutter, from a planning standpoint, because it overwhelmed all of the other stuff that you really needed and wanted to do. Where operations and maintenance can really be important is when you’re looking at something like GDPR, where you’ve got to do this insane amount of updating, and you’re also promising to deliver something else on the other side. You might find that you fail at both and end up non-compliant with disappointed customers. That’s why I said at the very beginning of the episode that if you fail at this part of the lifecycle, it can mean a lot of trouble. And I think a key component of that is learning how to understand what’s a priority and what is not.

Matt Watson 18:21
Yeah.

Matt DeCoursey 18:22
So how many non-priority things have people pitched you as, like, this is what we’ve got to do? Actually, you mentioned that earlier: internal tools that only you and your team see don’t always have to be pretty to be functional.

Matt Watson 18:37
So, you know, when I started working at the place I’m working at now, Camp digital, the first project they were concerned about when I started was upgrading a version of Angular, because they were worried that it’s old, it’s deprecated, it has security issues. But then, come to find out, it was in an internal app that literally nobody used. Like two people a day use this thing. I’m like, why do we care? It’s not like we have consumers that can access this and somehow exploit the system. They can’t even log into it. And that is always the challenge with software development. We did a whole episode about this, around planning and stuff: trying to weigh out what the highest priorities are. Inevitably, there are always maintenance things and operational things that are important, that take time, and that come back into the planning phase. Like, hey, we have to do this work for SOC 2 compliance, or AWS is no longer going to support this thing, so we have to rewrite this other thing, or we have to upgrade versions of PHP or Ruby or .NET, or whatever it is. There’s always work that comes out of maintenance that is not necessarily planned work. It’s not like, hey, we’re going to ship this new feature for a client. It’s just homework and shit we have to do just to stay in business, just to keep the machine running, right? How do we keep DeCoursey alive and running? It’s like Darth Vader.

Matt DeCoursey 20:02
Why do people always compare me to Darth Vader? You know? Screw me Darth Vader toys and like my respirator.

Matt Watson 20:09
And all your employees gave me the old Darth Vader.

Matt DeCoursey 20:12
Okay, in Darth Vader’s defense now he didn’t always do the right thing he had some anger problems he’d sometimes choke people with that air choke thing but he was running a pretty successful franchise the Empire seem to be run fairly well other than some minor flaws with minor skirmishes do the Emperor the sand just empty was misunderstood, misunderstood.

Matt Watson 20:43
There’s always a lot of maintenance to keep things working.

Matt DeCoursey 20:47
I still don’t understand what the Empire actually did. What was the revenue driver there? Was it just like general suppression and just like just taking over everything like something out to pay for the desk?

Matt Watson 20:59
It’s like our world today. It’s harvesting oil and selling oil except back then it was harvested. Is it harvesting something else in the movies?

Matt DeCoursey 21:07
Maybe? What are kyber crystals? There you go. There you go. That’s what powers a lightsaber. Right? And a lot of other things, apparently. Yep, seems like a good time to mention that today’s episode of Startup Hustle, it’s powered by FullScale.io and not kyber crystals. Hiring software developers is difficult and Full Scale can help you build a software team quickly and affordably and has the platform to help you manage that team. Visit FullScale.io to learn more. You know what, just to really cause controversy, Star Wars is so much better than Harry Potter. Yeah, I’m gonna get tweeted or something on that one. So you’re still listening to this thrilling episode about software operations and maintenance. Know that I don’t always like to throw controversial shouts out there. But is Star Wars greater than Harry Potter? Well, so I’m going to punch me in the head if she heard me say that.

Matt Watson 22:04
I’m gonna recommend to anybody who’s listening to this: if you have some kind of software that’s in production, you need some form of basic application monitoring. It can be as simple as a ping. You’ve heard of Pingdom, or some kind of check where you just put in your website address and it tells you: does your homepage load? Does the login page of your software load? At least have that, at a bare minimum. But if you have an application that’s revenue-generating, that’s important to your business, you need more than that. You should use something like Stackify, my old company, or Datadog, or New Relic, or AppDynamics. There are all these companies that do this kind of stuff. They’re APM, or application performance monitoring, tools. We talked about KPIs and stuff in the previous episode: knowing how many people are logging into the software, how it is performing, and what kind of errors are happening, so I can find those errors and fix them. All of those sorts of things: database performance, server performance, all this kind of stuff, so you can keep an eye on it and know if things are working. The problem is that the bigger your system gets, the more applications you have, and the more different kinds of users you have, the more complicated all this gets. A lot of software problems are not as simple as whether the system is up or down. It’s like, oh, when this specific user logs in and does this specific thing, it doesn’t work. And it’s because they tried to put in a value that was too long, or this database field is null, or whatever dumb thing it is that’s causing problems. But it only happens to them; it doesn’t happen to everybody. So it makes it really hard to find those needles in a haystack. At Stackify, we had 20 or 30 different applications, and we monitored all of them very closely. But we would get hundreds of errors a day. A lot of them were just noise, like a random database timeout or random this, random that. It’s just part of operating apps; they randomly get errors. But if you can’t see those errors, you can’t go through them, figure out which ones are important and which ones aren’t, and figure out which ones to fix. Then you put that back into your planning: okay, in our next sprint, we’ve got to spend a few minutes figuring out how to fix these problems, because it requires constant care and feeding to keep these systems going.
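
Sorting hundreds of daily errors into "noise" and "worth fixing" usually starts with grouping similar messages and counting how many distinct users each group touches. The sketch below is one hypothetical way to do that, not how Stackify or any other APM product works: it replaces numbers in each message so near-duplicates collapse into one signature, then ranks signatures by affected users first and raw count second.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def signature(message: str) -> str:
    """Collapse runs of digits so 'timeout after 30 ms' and 'timeout after 45 ms' group together."""
    return re.sub(r"\d+", "<n>", message)


def triage(errors: Iterable[Tuple[str, str]]) -> List[Tuple[str, int, int]]:
    """Group (user_id, message) pairs into (signature, occurrences, distinct users).

    Sorted so that errors hitting the most users come first, then the most frequent.
    """
    counts: Dict[str, int] = defaultdict(int)
    users: Dict[str, Set[str]] = defaultdict(set)
    for user_id, message in errors:
        key = signature(message)
        counts[key] += 1
        users[key].add(user_id)
    rows = [(key, counts[key], len(users[key])) for key in counts]
    rows.sort(key=lambda row: (-row[2], -row[1], row[0]))
    return rows
```

An error that one user triggers many times ranks below an error that fewer events spread across many users, which matches the idea of prioritizing what affects the broader user base. The ranking rule is a judgment call; a team might reasonably weight revenue-critical features or specific customers differently.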

Matt DeCoursey 24:24
You know, there’s something the production team didn’t add in here for me, but from my own experience, I think of operations and maintenance as the part of the development process where it’s absolutely crucial to approach your users and ask them: what are we doing right? What are we not doing right? What can we do better? And from a maintenance and operations standpoint, remember, that’s why you built it. They’re the ones that are the most important. When it comes to setting priorities and everything like that, you want to listen to the people that use it. Now, the caveat is, as Matt just described, the really weird one-off kind of situation. Those aren’t always urgent. In fact, most of the time they’re not. You’ve got one person that does some weird combo of shit. That’s the thing I think you also want to look out for, because you can chase that stuff down the rabbit hole for, like, 1,000 years. The things that should be a priority are the things that affect the broader base of users, the core functionality of what you’ve built, and anything that will help bring or keep users in the system.

Matt Watson 25:54
So when you do have problems, you have to figure out how to tell your customers about them. From an operations and maintenance standpoint, we talked about SLAs earlier, having some kind of service level agreement that you track. But something else that’s really common these days that I recommend, again, if you’ve got a revenue-generating product, and it’s important, and you have customers, is setting up a status page. Status pages are really common these days. There’s Statuspage.io, which is owned by Atlassian, the same people who have Jira. It starts at something like $30 a month. Basically, you create a little page where you can post when you’re going to do deployments or when you have outages and all that kind of stuff. Your customers can sign up to receive notifications if your system goes down. So if you are having problems, which we all do from time to time, you can use your status page as a way to alert your customers that you’re having issues. You also have a history: okay, on this date in June, we had this problem. You can also put things like your SLA numbers on there, and it helps your customers see that, oh, they don’t have any issues, their uptime is 99.9%, whatever. So status pages are another great thing to have. I know this is not exciting, and it’s not sexy, but it’s part of the operations stuff that is very, very functional and useful.

Matt DeCoursey 27:16
Yeah. And, I mean, you know, this topic could go on and on and on. I mean, we could go down 10 million different rabbit holes. Now, when it comes to you talking about service level agreements and stuff, I want to hit on one thing that I saw. You know, I got to be involved with Stackify’s growth, because of its relationship with Full Scale. And I swear, man, when you added site reliability engineers, SREs, people focused on that kind of operations and maintenance, in my conversations with the people that worked locally in that office, their tone changed so dramatically, and so quickly, because there was someone there to kind of support them and have their back on the maintenance side of things. Another thing: at the time, you guys were all, like, an eight-to-five, nine-to-five kind of operation. But you had users in 60 different countries, which meant someone was really in there in the weeds on the platform 24 hours a day. So what was happening, especially when it was in its earlier phases, and things were growing, and you’re figuring a lot of stuff out, is your people were getting to work in the morning, and there was a stack of tickets. So, like, you’re having to, like, inquire or whatever. And you talked about having people working in some different time zones, or like different shifts, so that rather than the local engineering team coming to work every day and being like, dammit, here’s these tickets I got to deal with, they were potentially coming in to solutions rather than problems, which helps from an operational standpoint and a morale standpoint. And also, your users are getting responses quicker; at least it was like someone is home. So when it comes to operations and maintenance stuff, like, I’m telling you, people understand that you can’t always fix their problems instantly. But one thing you can do instantly, or close to it, is acknowledge that you have received the problem.
And that you know, like so that whole thing is from India. This is operations and maintenance in some regards. Now, if you’re an early stage startup, you just built that minimally lovable product, you don’t have the budget or need for a site reliability engineer. But if you are a more sophisticated organization, or you have the resources for it, these things really kind of matter. And that’s part of the advice that I give a lot of people at Full Scale: what companies that already have a big engineering team can do is put some people on an opposing schedule, so you kind of begin to operate and maintain things, find problems, and solve them on more of a 24-hour cycle.
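
To make the “opposing schedule” idea concrete, here is a small sketch that checks which hours of the day have nobody on shift. Everything in it is an assumption for illustration: the `uncovered_hours` helper, the choice to express shifts as UTC start hour plus length, and the sample schedules are hypothetical and do not describe how any real team in the episode is staffed.

```python
def uncovered_hours(shifts):
    """Return the UTC hours (0-23) with nobody on shift.

    `shifts` is a list of (start_hour_utc, length_hours) pairs; a shift
    may wrap past midnight.
    """
    covered = set()
    for start_hour, length in shifts:
        for offset in range(length):
            covered.add((start_hour + offset) % 24)
    return sorted(set(range(24)) - covered)


# Hypothetical schedules: one local office, then one and two extra teams
# on offset schedules.
single_office = [(14, 9)]
with_opposing_team = [(14, 9), (2, 9)]
three_teams = [(14, 9), (22, 9), (6, 9)]

for name, schedule in [
    ("single office", single_office),
    ("with opposing team", with_opposing_team),
    ("three teams", three_teams),
]:
    gaps = uncovered_hours(schedule)
    print(f"{name}: {len(gaps)} uncovered hours {gaps}")
```

A check like this only shows when someone is awake and working; it says nothing about whether that person can actually resolve an issue, which is where the on-call and tooling discussion that follows comes in.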

Matt Watson 30:04
Absolutely. And you bring up a great subject we haven’t talked about from an operations perspective, which is the on-call part of it, right? So if you’re a software developer, pretty much your whole career you’ve probably been on call, or your coworkers have been on call, or you take turns being on call, right? It’s like, so Saturday evening, if a server goes down, who’s gonna deal with it, right? That whole part of it, we could probably have a whole episode about on-call and best practices. But nobody wants to be on call, and nobody wants to deal with it. And you’re absolutely right, having employees that work all different time zones and all that helps a lot, because it’s like, hey, I can sleep at night, because at least I know, you know that.

Matt DeCoursey 30:44
That was the key to sleeping at night. Because before that changed, I’d get there and we’d be like, so-and-so isn’t going to be on today, because they had to get up at one in the morning and fix something.

Matt Watson 30:53
Yeah. And that’s the reality, right? The bigger you get, you know, the bigger your system gets, it happens, right? So things happen, the system goes down, and somebody’s got to figure out what it is. And again, we talked in our last episode about deployment pipelines, and we talked about monitoring all these things. Having this stuff in a place where a lot of people can go in and click buttons and redeploy code and do this and do that and restart things makes it easier for on-call, so they can remedy, you know, and resolve things, or at least, you know, patch it up. So you may have to wait for somebody else to come in in the morning, but at least you keep the machine running until somebody else comes in.

Matt DeCoursey 31:28
I still think one of the key pieces of that is just acknowledging to the person reporting the issue that you have it, that you acknowledge it. Not just, we know something, and then like, two days later, it’s like, oh, hi, thanks for letting us know about this. Like, even just a simple reply. Hey, Matt, you know, we mentioned hiring people and having people that work 24 hours a day. The teams at Full Scale are on 24 hours a day for different clients; we always have someone working somewhere. If you need to hire software engineers, testers, or leaders, let Full Scale help. We have the people and the platform to help you build and manage a team of experts. When you visit FullScale.io, all you need to do is answer a few questions. Let the platform match you up with our fully vetted, highly experienced team of software engineers, testers, and leaders. At Full Scale, we specialize in building long-term teams that only work for you. Learn more when you visit FullScale.io. And I actually want to roll from that last line into this thing. Hey, look, having a long-term outlook on what you’re building with your software and your business is really important. Whether you’re using Full Scale or someone else, having people that understand your platform, I think, makes the maintenance easier, like keeping your team intact and not having a short-term outlook on who’s working on it. Because the people that built it are usually the people that can fix it the fastest.

Matt Watson 32:50
And most of the time, with anything to do with software development, there’s always a lot of tribal knowledge, right? So people have worked on the project for a long time, and they understand the intricacies of how different things work. And yeah, if people keep coming and going, or you let everybody go and then you hire somebody new, they just don’t understand. That tribal knowledge is gone. And that tribal knowledge is really, really important when it comes to software development.

Matt DeCoursey 33:13
Well, that’s why you don’t have that short-term outlook when it comes to your team. You know, I get some of that a lot. I talk to a lot of people, and they’re like, I really only need help for a couple of months. I’m like, well, what then? They’re like, well, if something breaks, then we’ll get someone to fix it. Yeah, that’s not how software platforms work, people.

Matt Watson 33:30
And it’s probably like randomly calling a plumber to come in and fix your drain issue. Well, they don’t just walk in and understand what to fix.

Matt DeCoursey 33:39
I mean, they might, depending. On some levels, that’s pretty straightforward. But the thing is, you know, there’s a lot to be said about knowing where to hit it with a hammer.

Matt Watson 33:56
Yeah, a plumber, they could come and figure it out. But software developers, they don’t even know where to start. They don’t know how any of this shit works.

Matt DeCoursey 34:01
Yep. Well, Matt, I think we have maintained the series quite well during this episode. And congratulations on a successful deployment and operation and maintenance. This is the conclusion of our series. You know, I like doing series. I’m enjoying the series.

Matt Watson 34:23
I think we’ll have to figure out our next series now.

Matt DeCoursey 34:25
I think we’re gonna need to give away some money to people that want to build software.

Matt Watson 34:30
Yes, I like the idea.

Matt DeCoursey 34:31
Let’s follow the model of the 52-part series and this series, and you know, if you want to go back and capture this whole series that came out once a week, there’s a link in the show notes, so it’ll help you get to all of them faster. We’re doing our best to maintain and operate our massive list of episodes. Got episode 1000 coming up this year, man. What are we gonna do?

Matt Watson 34:57
We’re still not a stadium right now.

Matt DeCoursey 34:59
Yeah, totally sounds like a lot of work. Sounds like a lot of work, so I gotta go, man. I’m gonna get working on that. I’ll see you down the road.

Matt Watson 35:08
Yeah.