I’ve recently been spending time setting up notifications for my self-hosted services using ntfy. I’ve got Beszel reporting the status of my containers and system to ntfy. However, only 12 of my 44 containers seem to have a healthcheck “enabled” or built in as a feature. So I’m now wondering what is considered best practice for monitoring the uptime/health of my containers. I’m already using Uptime Kuma, with the “docker container” option for each container I deem necessary to monitor; I don’t monitor all 44 of them 😅

So I’m left with these questions:

  1. How do you notify yourself about the status of a container?
  2. Is there a “quick” way to know if a container has a healthcheck as a feature?
  3. Does the healthcheck feature simply depend on the developer of each app, or on the person building the container?
  4. Is it better to simply monitor the HTTP(S) requests to each service? (I believe in my case this would make Caddy a single point of failure for this kind of monitoring.)

Thanks for any input!

  • irmadlad@lemmy.world
    2 days ago

    Dozzle will tell you just about everything you want to know about the health of a container. Sadly, to my knowledge, it does not integrate with any notification platforms like ntfy, even though there is a long-standing request for that feature.

  • nesc@lemmy.cafe
    2 days ago
    1. I don’t; in general, they all get restarted automatically. I have monitoring configured for services, not for the containers themselves.
    2. Yes: look at the original Dockerfile/Containerfile. If it has a HEALTHCHECK instruction, you can assume the image does some kind of check.
    3. The person building the container. Often it doesn’t make sense to create a healthcheck at all. Sometimes the healthcheck feature is provided by the application as well, but it still needs to be part of the Containerfile.
    4. Better to monitor your service (application+db+proxy+queue+whatever), not containers in isolation.
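
    Automatic restarts like those in point 1 are usually handled with a restart policy. A minimal compose sketch, with a hypothetical service name and image. Note that a restart policy only reacts to the container exiting; plain Docker (outside Swarm) does not restart a container just because its healthcheck reports it as unhealthy.

    ```yaml
    services:
      myapp:                          # hypothetical service name
        image: example/myapp:latest   # hypothetical image
        # Restart whenever the container exits, unless you stopped it yourself
        restart: unless-stopped
    ```
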
  • realitaetsverlust@piefed.zip
    2 days ago

    How do you notify yourself about the status of a container?

    I usually notice if a container or application is down because that usually results in something in my house not working. Sounds stupid, but I’m not hosting a highly available cluster at home.

    Is there a “quick” way to know if a container has healthcheck as a feature.

    Check the documentation

    Does healthcheck feature simply depend on the developer of each app, or the person building the container?

    If the developer adds a healthcheck feature, you should use that. If there is none, you can always build one yourself. If it’s a web app, a simple HTTP request does the trick: validate the response, and if the status code is 200 and the output contains a certain string, it seems to be up. If it’s not a web app, like a database, a simple SELECT 1 on the database can tell you whether it’s reachable.
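
    Both ideas can be written as compose healthchecks. A sketch under some assumptions: the web app (hypothetical name and image) has curl installed, listens on port 8080 inside the container, and serves a page containing the hypothetical string `Welcome`; the database check assumes the official postgres image and that psql can connect over the local socket as the `postgres` user without a password, so adjust it to your setup. Note that `curl -f` treats any HTTP status of 400 or above as an error rather than checking for exactly 200.

    ```yaml
    services:
      webapp:                              # hypothetical service
        image: example/webapp:latest       # hypothetical image
        healthcheck:
          # The pipeline's result is grep's: on an HTTP error curl -f outputs no body,
          # and grep -q also fails if the page lacks the expected string
          test: ["CMD-SHELL", "curl -fsS http://localhost:8080/ | grep -q 'Welcome' || exit 1"]
          interval: 1m
          retries: 3

      db:
        image: postgres:16
        # ...your existing environment, volumes, etc.
        healthcheck:
          # Run a trivial query; psql exits non-zero if it can't connect or the query fails
          test: ["CMD-SHELL", "psql -U postgres -d postgres -c 'SELECT 1' || exit 1"]
          interval: 30s
          retries: 5
    ```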

    Is it better to simply monitor the http(s) request to each service? (I believe this in my case would make Caddy a single point of failure for this kind of monitor).

    If you only run a bunch of web services that you use on demand, monitoring the HTTP requests to each service is more than enough. Caddy being a single point of failure is not a problem, because Caddy being dead still makes the service unusable. And you will immediately know whether Caddy died or the service behind it, because the error looks different: if the upstream is dead, Caddy returns a 502; if Caddy is dead, you’ll get a connection error such as “Connection refused” or “Connection timed out”.

    • lps2@lemmy.ml
      2 days ago

      For databases, many, like Postgres, have a ping/ready command you can use to make sure the server is up without the overhead of an actual query! Redis is the same way (I feel like Postgres and Redis health checks cover a lot of the common stack patterns).
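
      For Redis, a minimal compose sketch of that ping check, assuming the official redis image (which includes redis-cli) and no password; with a password you would need to pass it to redis-cli (e.g. via -a). Piping through grep makes the check require an actual PONG reply.

      ```yaml
      services:
        redis:
          image: redis:7
          healthcheck:
            # PING is answered with PONG when the server is up and responsive
            test: ["CMD-SHELL", "redis-cli ping | grep -q PONG"]
            interval: 30s
            timeout: 5s
            retries: 3
      ```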

  • manwichmakesameal@lemmy.world
    2 days ago

    I use Uptime Kuma with notifications through Home Assistant. I get notifications on my phone and watch. I had notifications set up to go to a room on my Matrix homeserver, but I recently migrated it and don’t feel like messing with the room.

  • Zelaf@sopuli.xyz
    2 days ago

    So I’m also using Beszel and ntfy to track my systems because they’re lightweight and very, very easy. Having previously tried Grafana, Prometheus and various TSDBs, I feel way better off.

    I’ve been following Beszel’s development closely, because it was previously missing features like container monitoring and systemd monitoring. I’m very thankful they added those recently, since containers are my primary way of hosting all my applications. The “Healthy” or “Unhealthy” status is reported directly by Docker itself, not monitored by Beszel, so it has to be configured: either in the Dockerfile of the container image, or afterwards using the healthcheck options when running the container.
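
    A healthcheck defined when running the container replaces whatever the image defines. A minimal compose sketch of both directions, with hypothetical service and image names (the curl check assumes the image has curl and serves HTTP on port 8080). With plain docker run, the equivalents are the --health-cmd family of flags and --no-healthcheck.

    ```yaml
    services:
      app-with-check:                      # hypothetical service
        image: example/app:latest          # hypothetical image
        healthcheck:
          # Overrides any HEALTHCHECK from the image's Dockerfile
          test: ["CMD", "curl", "-f", "http://localhost:8080/"]
          interval: 1m

      app-without-check:                   # hypothetical service
        image: example/other:latest        # hypothetical image
        healthcheck:
          # Disables a HEALTHCHECK inherited from the image
          disable: true
    ```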

    As some other comments mentioned, some containers come with a healthcheck built in, which Docker then runs automatically. Others don’t have a healthcheck in the image’s build file, but some of those document how to add one to the docker run command or compose file. Beszel and ntfy themselves are examples of this.

    For containers without a healthcheck in the build file, either the docs explain how to add one to the compose file or you have to figure it out yourself. Images built on a common base like Alpine or Debian often have a tool like curl or wget available (Alpine ships BusyBox wget; curl isn’t always installed). If the service serves a web page, you can check that. Some programs also have a healthcheck command built in that you can use.
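
    For Alpine-based images that lack curl, BusyBox wget can usually do the same job. A sketch with a hypothetical service, image and port; `--spider` checks the URL without saving the response, and wget exits non-zero if the request fails or the server returns an HTTP error.

    ```yaml
    services:
      alpine-app:                          # hypothetical service
        image: example/alpine-app:latest   # hypothetical Alpine-based image
        healthcheck:
          # -q: quiet; --spider: request the URL but don't download the body
          test: ["CMD", "wget", "-q", "--spider", "http://localhost:8080/"]
          interval: 1m
          retries: 3
    ```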

    As an example, PostgreSQL ships a pg_isready command that checks whether the database is ready to accept connections. The easiest way to add it would be:

        healthcheck:
          test: ["CMD", "pg_isready", "-U", "root", "-d", "db_name"]
          interval: 30s
          retries: 5
          start_period: 60s
    

    That’ll run pg_isready -U root -d db_name inside the container every 30 seconds. Failures during the first 60 seconds (start_period) don’t count toward the 5 retries, which gives the database time to start up. Tune the values to the speed of your system.

    Another example: for a container that has curl available inside it, you can add something like

        healthcheck:
          test: ["CMD", "curl", "-f", "http://localhost:3000/"]
          interval: 1m
          retries: 3
    

    This will run curl -f http://localhost:3000/ every minute. If the command in either example exits with a non-zero code on enough consecutive checks (the retries value), Docker reports the container as unhealthy. Beszel then reads that status and reports that the container is not healthy. Some web apps also expose something like a /health endpoint you can point the curl command at.

    Unless the developer has put extra time into the healthcheck, it is often just a basic way to see that the program inside the container is running. However, a container usually exits on its own if the program it runs crashes or quits, so a healthcheck isn’t always necessary: the container abruptly stopping is the signal. This is why something like Uptime Kuma is worth running alongside Beszel, because it can tell you when a web address or similar is down, including when a container has exited, which Beszel is still sadly lacking as of now.

    I would recommend reading up on the Docker Compose spec for healthchecks, since the other options let you set things like timeouts. Combine that with whatever program you’re running the healthcheck with, and you can get very creative if you must.
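
    A sketch of the full set of Compose healthcheck options, with illustrative values. The /health path and port 3000 are hypothetical (check whether your app has such an endpoint), curl is assumed to be in the image, and start_interval is a newer option that, as far as I know, needs Docker Engine 25 or later, so drop it if your setup rejects it.

    ```yaml
    healthcheck:
      # Hypothetical /health endpoint; -f turns HTTP errors into a non-zero exit
      test: ["CMD", "curl", "-fsS", "http://localhost:3000/health"]
      interval: 30s        # time between checks once the container has started
      timeout: 10s         # a check running longer than this counts as failed
      retries: 3           # consecutive failures before the status becomes "unhealthy"
      start_period: 60s    # startup grace period: failures here don't count toward retries
      start_interval: 5s   # time between checks during start_period
    ```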

    My personal recommendation would be to stick with Uptime Kuma for proper service-availability checks, since it’s easier to configure and gives you an overview of things like slow page loads and stopped containers, while using Beszel to monitor performance and resource usage.

    • Sips'@slrpnk.net (OP)
      2 days ago

      Maybe a transition to a cluster homelab should be the goal of 2026, would be fun.

      • Noxy@pawb.social
        2 days ago

        maybe! three raspis and k3s have served me mostly well for years, tho with USB/SATA adapters cuz the microSD was getting rather unreliable after a while

        • Sips'@slrpnk.net (OP)
          2 days ago

          Nice one, that! Fortunately I just rebuilt my server with an i5-12400 and a fancy new case, and I’m slowly transitioning to an all-SSD build. I would probably lean towards a single-node cluster using Talos.

          • Noxy@pawb.social
            2 days ago

            I haven’t heard of Talos before, sounds like it’s not fully open source?

            • Sips'@slrpnk.net (OP)
              1 day ago

              Talos is really awesome: it’s a minimal OS built strictly to run Kubernetes. We use it at work, and it’s running in production for a lot of people. It’s extremely minimal and can only be managed through its own API, via the talosctl command. Its minimalism makes it great for security and less resource-heavy than the alternatives.

              Check this out for a quick, funny taste of why one should consider using Talos:

              [60sec video from Sidero Labs, creators of Talos] https://www.youtube.com/watch?v=UiJYaU16rYU

              Talos is under MPL 2.0; AFAIK that is open source.

  • frongt@lemmy.zip
    2 days ago

    If I go to its web interface (because everything is a web interface) and it’s down, then I know it has a problem.

    I could set up monitoring, but I wouldn’t care enough to fix it until I had free time to use it either.

    • tuckerm@feddit.online
      2 days ago

      Same here. I’m the only user of my services, so if I try visiting the website and it’s down, that’s how I know it’s down.

      I prefer phrasing it differently, though. “With my current uptime monitoring strategy, all endpoints serve as an on-demand healthcheck endpoint.”

      One legitimate thing I do, though, is have a systemd service that starts each docker compose file. If a container crashes, systemd will notice (I think it keeps an eye on the PIDs automatically) and restart them.