• 2 Posts
  • 41 Comments
Joined 1Y ago
Cake day: Jun 19, 2023


Certs are a waste of time tbh. If you have 8 years of experience, you should have more than enough to fill out a resume already.

An AWS cert is almost certainly even more useless for you specifically unless you wanted to get into devops/sre and do systems design. I have been in sre for a very long time and have never even heard of anyone writing tooling in Java. That section of the industry is entirely dominated by go, python, and (more often than anything else) bash for really quick automation.


As a counter balance to that though, interviewers need to understand what they are hiring for and tailor the questions asked to those requirements.

For example, there is genuinely very little coding required of an SRE these days but EVERY job interview wants you to do some leetcode style algorithm design… Since containers took over, the number of times I have used anything beyond relatively unremarkable bash scripts is exceptionally small. It’s extremely unlikely that I will be responsible for a task that is so dependent on performance that I need to design a perfect O(1) algorithm. On terraform though, I’m a fucking surgeon.

SRE specifically should HEAVILY focus on system design and almost all other things should have much much less priority… I’ve failed plenty of skill assessments just because of the code though.


Containerization (even for small things) makes modern infrastructure a LOT easier.


I honestly love it. Of course it’s not perfect but I don’t ever want to go back to the old way if I can avoid it.


Most of the resistance I have seen comes down to a misunderstanding of the benefits that kubernetes offers. The assumption is that kube is used for autoscaling and that, if the inbound traffic is predictable, then the added complexity is unnecessary. When that happens the “kube isn’t right for all situations” turns into “kube isn’t right for any situation” whether the person in question would ever admit that or not…

All of this ignores the MASSIVE reliability enhancement kube delivers and the huge amount of effort currently going into modern tool development surrounding the kube ecosystem.


Real talk, you don’t have the luxury of being an idealist right out of university. Your goal is to get a job. When you’re in that job you will likely not have the luxury of being an idealist either.

When you have enough experience making practical, reasoned decisions, then you can stand on principles.

For context, I have been in this business for nearly 20 years. The people I have personally worked with who have resisted things on philosophical grounds ALWAYS get left behind. I’ve seen it with systemd, the cloud, and now I’m seeing it again with kubernetes. You cannot escape the collective inertia of an entire industry.

Obviously there are still thresholds… I would never work for someone like Raytheon. You have to draw lines somewhere but saying you aren’t going to work for a company that does user behavior tracking is short sighted and impractical.


Fuck if this isn’t the truth… Saying this as a Sr. SRE with no degree or certs.


I have a function called up. I do up X where X is the number of directories I want to go up.

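# Go up N directories (default: 1).
up() {
  local n=${1:-1} path= i
  # Reject anything that is not a positive integer.
  if ! [[ $n =~ ^[1-9][0-9]*$ ]]; then
    echo "usage: up [N]" >&2
    return 1
  fi
  # Build ../ repeated N times, then cd once so "cd -" still works.
  for (( i = 0; i < n; i++ )); do
    path+=../
  done
  cd "$path"
}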

EDIT: Don’t know if it’s just me but if you see &lt; it should be the less than character.



We had a service that compiles a dataset once per quarter. The total size is ~30 GB. We were starting a container, storing the dataset on an EFS volume, and mounting it like any other disk.

Every time a pod started it would need to read this data into memory, so the container itself started quickly but it still took a while to be ready for traffic.

Since we didn’t need to update it very often, we decided to just package the compiled dataset into the container image and skip the EFS volume. We set the image pull policy to IfNotPresent, which cut egress traffic costs from EFS to zero. Now there is a cost to pull the image from ECR but that’s only if the pod is being scheduled onto a node it hasn’t run on before. There was no noticeable change in behavior or performance and we saved a bunch on cost.

Sometimes the big, dumb option is the right choice.
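For anyone wanting to try the pull policy change described above, here is a rough sketch. The deployment name `dataset-service` and the assumption that the dataset lives in the first container of the pod spec are made up for illustration, so adjust both to your setup.

```shell
# Sketch only: "dataset-service" and container index 0 are assumed, not from the original setup.

# Print a JSON patch that sets imagePullPolicy on the first container.
# The "add" op sets the field whether or not it is already present.
pull_policy_patch() {
  local policy="${1:-IfNotPresent}"
  printf '[{"op":"add","path":"/spec/template/spec/containers/0/imagePullPolicy","value":"%s"}]\n' "$policy"
}

# Apply it once the name matches your cluster:
# kubectl patch deployment dataset-service --type=json -p "$(pull_policy_patch)"
```

With IfNotPresent, a node only pulls the image when it does not already have that tag cached, so this only makes sense if every dataset rebuild gets a new image tag.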


Thing is, I had a reachable goal which made it easier for me to learn and feel good as I had a tangible result.

IMO, this exact thing is what separates the people who succeed and those who give up. If you are only approaching the code as some abstract concept then it will never work. Anyone learning this stuff needs to understand that the code is more like a hammer to a carpenter than anything else… It’s a very physical tool used for doing a real job. If you don’t have any nails to hit, you’re not going to get anything done.


We focus a lot more on production than the average developer. It’s our job to make sure whatever devs build is run quickly, efficiently, safely, and scalably.

You will work with a lot of kubernetes, Argo, terraform, Prometheus, grafana. You’ll design build pipelines and software rollout strategies. You plan for zero downtime migrations and upgrades, database maintenance… You’ll have your hands in everything from capacity planning to security to cost optimization to developer support… User permissions, infrastructure, networking, observability… You will write RFCs and set up POCs for new tools. You define and track error budgets and figure out how to keep your org under those projections. When there is an outage you will be involved in writing post mortems.

The days are so varied and unpredictable that it keeps things interesting. The landscape changes so often you’re never really stuck doing the same thing over and over.

I genuinely love it.

EDIT: The SRE Podcast from Google is actually really great for learning about this world. The first season talks about what you’ll be doing and why (based around the O’Reilly SRE book). The second season talks about what to expect in different stages of your career progression.


Don’t program (as much). Point yourself towards DevOps, SRE, and/or Platform Engineering. You’ll be designing complex systems and will have your hands in dozens of different tech stacks.

Sometimes I think a straight dev job would be interesting but I legitimately love the SRE space.


Not scrolling through all the comments to see if someone mentioned this yet or not but every December I check what is on the best albums of the year lists… Generally I check per-genre that I’m into. Like best black metal of 2023, best jazz of 2023, etc etc…

Other than that, bandcamp and YouTube are the biggest. I honestly buy more on bandcamp these days than I torrent though. It’s such a great site.


An RFC that essentially boiled down to saying, in excruciating detail, that I am qualified for the job I was hired for and that I can be trusted not to break the website.


It’s completely overkill for pretty much everyone but I have been thinking about building a kubernetes native client for months now.

Like the torrent should be treated as a normal resource with a Torrent CRD. It should be scheduled onto whichever node has available capacity and rescheduled onto a different node if it goes down. If allowed by the tracker, multiple instances could be run. You could set resource limits programmatically, easily configure block storage, build dashboards, export logs/metrics… It would be open ended enough that you could have interfaces built as browser extensions, web ui, mobile app, tui, cli and unopinionated enough that the method for torrent ingestion could be left up to the user. HTTP request, watch directory, rss client, download manager… You could even do stuff like throw magnet links into a queue… etc, etc…

I keep thinking it would be a great project but I just do not have the spare time to dedicate to it… I imagine it could be used for large scale deployments for something like the Internet archive or whatever.


In the case of small little indie bands, they often aren’t on torrent sites at all. Given the choice between Spotify and Bandcamp, I’m going to buy the album on Bandcamp 100% of the time. I can contribute to the artist more and usually end up with a vinyl copy in the process.

Pirating has always been a solution to poor ease of access to content. If I could pay a legitimate subscription for a site with the catalog of PTP or RED, I would do it in a heartbeat. It will never happen though.


I remember reading about COBOL devs being able to earn pretty solid incomes. I thought, “Let’s check this out.” I found a site that did common things in different languages and compared them. Reverse string in ruby. 1 line. Reverse string in COBOL. 40-50 lines. “Ehh… Maybe I don’t want to learn this after all.”


I have really enjoyed the small projects I have written in rust but, being in the SRE space, it would be irresponsible and selfish to use anything other than bash, python, or go. It feels like the overwhelming majority of tools I use these days have been written in go.


Mostly as kodi/plex front ends. I’ve set them up as a kubernetes cluster in the past but they didn’t have enough ram to run my torrent client. Now I just use an old Thinkpad running talos.


I still can’t believe voters didn’t give him the boot.

That was the Trump-era midterms IIRC… He’s there now because he has a D next to his name. That’s about it.


I had this just the other day actually. I am in SRE and the overwhelming majority of the code I write is terraform, instructions for a Dockerfile or CI pipeline, or just some random ass bash to compile information… I don’t actually do that much coding at all and what I do end up doing is pretty rudimentary.

EVERY job interview I go on though wants to do some leetcode style code puzzle. For the one I got the other day, I just said to the guy, “I honestly don’t know how to do this. The code I write isn’t fancy or clever. It’s mostly just to get things done.” We worked through it together though and I understood the logic by the end but he was mostly holding my hand. What I was doing was throwing out ideas and trying to work out pros/cons with the interviewer. That was enough apparently because he green-lit me for the next round…

These types of interview questions really annoy me because they are not representative of the job in any way. In addition to work, I also have a life that does not involve computers. After putting in a 40 hour week on engineering stuff, grinding leetcode over a weekend is a hard sell.


Certbot in cron if you’re still managing servers.

I’m using cert-manager in kube.

I haven’t manually managed a certificate in years… Would never want to do it again either.
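For the certbot-in-cron route, a minimal sketch of the crontab entry. The 03:17 schedule and the nginx reload hook are assumptions, so swap in whatever serves your certs. Some distro packages already ship a renewal timer, in which case you may not need this at all.

```shell
# Sketch only: the schedule and the reload command are assumptions.

# Print a crontab line that attempts renewal daily and reloads the
# web server only when a certificate was actually renewed.
certbot_cron_line() {
  local hook="${1:-systemctl reload nginx}"
  printf '17 3 * * * certbot renew --quiet --deploy-hook "%s"\n' "$hook"
}

# Append it to root's crontab:
# ( crontab -l 2>/dev/null; certbot_cron_line ) | crontab -
```

`certbot renew` only touches certificates that are close to expiry, so running it daily is cheap.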


It auto discovers machines/instances/VMs/containers in the mesh and figures out the secure routing on the fly. If you couldn’t ensure a consistent IP from the home address it wouldn’t matter… The service mesh would work it out.

It is probably overkill for this project though… Something to think about…


Would love to see some base salaries posted along with the responses. If you’re getting paid shit base maybe this is how they make up for it?

I’m in SRE. No on call benefits at all. Base salary is 175k USD plus 20% annual bonus.


With Prometheus I would add a section to the scrape config to rewrite the labels attached to each metric. Does such a thing exist for telegraf? I’ve never used it.

Or could you change the grafana query to just aggregate the values for all pods in that deployment?
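On the Prometheus side, the label rewrite could look roughly like the fragment this prints. It is a sketch under assumptions: the job name is made up, the metrics are assumed to carry a `pod` label, and pod names are assumed to follow the usual `<deployment>-<hash>-<hash>` pattern. I don't know what the telegraf equivalent is.

```shell
# Sketch only: job name, the "pod" label, and the pod naming pattern are assumptions.

# Print a scrape config fragment that derives a "deployment" label
# from the pod name, e.g. web-5d4f8c7b9-x2k4q -> deployment="web".
relabel_fragment() {
  cat <<'EOF'
scrape_configs:
  - job_name: example-pods
    metric_relabel_configs:
      - source_labels: [pod]
        regex: '(.+)-[a-z0-9]+-[a-z0-9]+'
        target_label: deployment
        replacement: '$1'
EOF
}

# The grafana query can then aggregate across pods, for example:
#   sum by (deployment) (some_metric)
```

The heredoc delimiter is quoted so the shell leaves `$1` alone for Prometheus to interpret.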


Istio is a service mesh. You basically run proxies on the vps and the rpi. The apps make calls to localhost and the proxy layer figures out the communication between each proxy.

Duck dns is just a dynamic dns service. It gives you a stable address even if you don’t have a static ip.
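If you go the duck dns route, the update is just an HTTP GET, so a cron job can do it. A rough sketch follows. The subdomain, token, and script path are placeholders, and leaving `ip=` empty asks the service to use the address the request came from.

```shell
# Sketch only: subdomain, token, and script path below are placeholders.

# Build the update URL for a given subdomain and account token.
duckdns_url() {
  printf 'https://www.duckdns.org/update?domains=%s&token=%s&ip=\n' "$1" "$2"
}

# In a small script run from cron, e.g. every 5 minutes:
#   curl -fsS "$(duckdns_url myhome YOUR_TOKEN)"
# crontab entry (hypothetical path):
#   */5 * * * * /usr/local/bin/duckdns-update.sh
```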


This would be nice because I don’t need a static ip and I don’t have to leak my ip address.

How does the VPS know how to find your rpi?

Could you not just use something like duck dns on a cronjob and give out that url?

I would also need to figure out how to supply ejabberd with the correct certificates for the domain. Since it’s running on a different computer than the reverse proxy, would I have to somehow copy the certificate over every time it has to be renewed?

Since the VPS is doing your TLS termination, you would need an encrypted tunnel of some sort. Have you considered something like Istio? That provides mTLS out of the box really… I’ve never seen it for this kind of use case but I don’t see why it wouldn’t work.


Figured this would be one of the responses. Thanks. I don’t interact with node very often. I assumed there was a better option but wasn’t sure which… This is just the first result.


You can do it bro. Dockerfiles are basically just shell scripts with a few extras.

It uses npm to build so start with a node base container. You can find them on docker hub. Alpine-based images are a good starting point.

FROM appdynamics/nodejs-agent:23.5.0-19-alpine

RUN git clone https://github.com/stophecom/sharrr-svelte.git && \
    cd sharrr-svelte/ && \
    npm install && \
    npm run build

If you need to access files from outside of the container, include a VOLUME line. If it needs to be accessible from a specific network port, add an EXPOSE line. Add a CMD line at the end to start whatever command needs to be run to start the process.

Save your Dockerfile and build.

docker build . -t my-sharrr-image

There are build instructions in the readme. What’s stopping you?


Yep. IO.

OP, this might be overkill for you but it might be worth standing up a grafana/prometheus stack… You’d be able to see this stuff a lot faster and potentially narrow in on a root cause.




I still get nagged constantly to add a birthday to my Google account to make sure I am older than 13 or something… To satisfy some dumb law I think. The email account is like 15 years old though. How could I be under 13 if it’s that old?


I think the idea with soft serve is that you use hooks and use a dedicated ci/cd tool. I use adnanh/webhook for lightweight ci/cd on personal projects.


I think the typical approach is in the opposite direction… Write a comment about what the code should do then have the AI write the code, adjusting for any mistakes it makes.


node-0 node-1 node-2 …

Everything runs kubernetes so the names are mostly irrelevant.

Years ago I worked at a company that named everything after WoW characters. I wished murder was legal in those days.


Branching per environment gets to be a nightmare really quickly. I am trying to avoid that.

I am not worried about vendor lock in.


Single instance of github labels?
I'm trying to move my org into a more gitops workflow. I was thinking a good way to do promotions between environments would be to auto sync based on PR label.

Thinking about it though, because you can apply the same label to multiple PRs at once, I can see situations where there would be conflicts. Say a PR is labeled "qa" so that it's promoted to the qa env and automated testing is started. Then a different change is ready, that PR is also labeled "qa", and it syncs, overwriting the version currently deployed in qa. I obviously don't want this.

Is there a way to enforce only a single instance of a label across all PRs in a repository? Or maybe there is some kind of queue system out there that I'm not aware of? I'm using github, argocd, and circleci.
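One possible shape for a guard, as a sketch rather than a known-good solution: a CI step that uses the GitHub CLI to check whether any other open PR already carries the label, and fails if so. The function names are made up, and this is a check, not a real queue, so two PRs labeled at nearly the same moment could still race.

```shell
# Sketch only: function names are hypothetical; requires the GitHub CLI (gh) at run time.

# Given the current PR number followed by every PR number that has the
# label, print the ones that are NOT the current PR.
other_label_holders() {
  local current="$1" pr
  shift
  for pr in "$@"; do
    if [ "$pr" != "$current" ]; then
      printf '%s\n' "$pr"
    fi
  done
}

# Fail (non-zero) when another open PR already holds the label.
check_label_free() {
  local label="$1" current="$2" labeled others
  labeled=$(gh pr list --state open --label "$label" --json number --jq '.[].number')
  # $labeled is intentionally unquoted so each PR number becomes one argument.
  others=$(other_label_holders "$current" $labeled)
  if [ -n "$others" ]; then
    echo "label '$label' is already held by PR(s): $others" >&2
    return 1
  fi
}

# Example CI usage: check_label_free qa 42 || exit 1
```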

Self-taught, senior SRE leveling up my skills?
Wondering if anyone else has been in a similar situation..

For some background, I installed my first Linux server as a teenager around 2000-2001. I started working in ops around 2007, transitioned into SRE around 2011, have been working in that space ever since, and I'm now comfortably sitting in Sr SRE roles. For that entire time, I never did any formal training of any kind. I'm entirely self taught.

Because of this more unconventional approach to this industry, I am positive that I have knowledge gaps. The thing is, I don't really feel affected by those knowledge gaps very often. I think I have written code in at least a half dozen languages.. I can pick a new language up pretty quickly too. What I'm writing generally isn't very large projects but I'm not typically writing large projects at work.. Since containers took over I feel like 90%+ is simple automation or glue code.. I've never really had a problem I couldn't solve in code though.

The place where I feel these gaps the most is the interview process. Algorithm design might be important for some people but I really don't come across situations very often where I need to be concerned about perfect O(1) performance. System design questions during interviews aren't great either.. "How would you make this system better?" I can explain some things but the closer I get to the front end, the weaker I get.. I personally just have zero interest in front end development so I've never cared enough to learn it.

Lately I feel like I've missed out on working in more interesting roles entirely because of these types of interviews. Sometimes not even because of failing a challenge.. Late last year I was interviewing with Etsy and the feedback I got was, "You didn't do anything wrong. Everyone on the team said yes but there was another candidate that everyone said yes to as well. They just had a little more experience than you did in a few areas. We only budgeted for one new engineer though so we took the other guy."

Maybe I don't know what I don't know though.. I guess I'm wondering what a solution for this might be? Part time comp sci degree? Bootcamp? Library card and some willpower?

No one wanted Biden in 2008 either. In 2020 the media kept saying he was the front runner with no evidence to support that… He was behind in polls and if he didn’t win the South Carolina primary he would have dropped out. Everyone would have moved on to Buttigieg or Bernie.

Now we’re essentially in the same place we were in in 2016…

“Who are you excited for?”

“Neither of them.”