Discussion thread for official Zabbix Template Ceph

  • jeeva
    Junior Member
    • May 2021
    • 4

    #16
    Setup and configure zabbix-agent2 compiled with the Ceph monitoring plugin.
    Where does one get this zabbix-agent2 compiled with ceph?


    • kuziev
      Junior Member
      • Jan 2022
      • 3

      #17
      Originally posted by setsimmo
      I tried to troubleshoot the issues with ceph.osd.stats and found the following:

      The default RESTful permissions do not allow the pg dump* commands to be run by the user as created. I couldn't track down where to set permissions in Ceph for RESTful module users, as the users created by "ceph restful create-key" do not show up under other modules.

      More details on the RESTful API module can be found here: https://docs.ceph.com/en/latest/mgr/restful/
      Fix: grant the mgr service the required auth caps, then restart the mgr service.
      Code:
      ceph auth caps mgr.$id mon 'allow *' osd 'allow *' mds 'allow *'


      • bilbolodz
        Junior Member
        • Jan 2019
        • 14

        #18
        I'm trying to figure out how to detect the active manager on Ceph and choose which host to query for the API. According to the documentation, the RESTful API runs only on the manager that is active at that moment. My active manager can be on any one of 6 hosts!
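
For anyone who wants to find the active manager's endpoint programmatically, here is a minimal sketch. It assumes that `ceph mgr services` prints a JSON object mapping module names to the URLs served by the active manager. The sample output is invented, and `restful_endpoint` and `endpoint_host` are hypothetical helper names.

```python
import json
from typing import Optional
from urllib.parse import urlparse


def restful_endpoint(mgr_services_json: str) -> Optional[str]:
    """Return the URL published for the restful module, or None if it is absent."""
    services = json.loads(mgr_services_json)
    return services.get("restful")


def endpoint_host(url: str) -> Optional[str]:
    """Extract the host part of an endpoint URL."""
    return urlparse(url).hostname


# Sample shaped like the assumed output of `ceph mgr services`; the values are made up.
sample = (
    '{"dashboard": "https://192.168.99.52:8443/", '
    '"restful": "https://192.168.99.52:8003/"}'
)

url = restful_endpoint(sample)
print(url, endpoint_host(url))
```

On a real cluster the JSON would come from running the command on a node with admin access. Nothing here talks to Ceph directly.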


        • kuziev
          Junior Member
          • Jan 2022
          • 3

          #19
          Originally posted by bilbolodz
          I'm trying to figure out how to detect the active manager on Ceph and choose which host to query for the API. According to the documentation, the RESTful API runs only on the manager that is active at that moment. My active manager can be on any one of 6 hosts!

          nginx config
          Code:
          upstream cephmgrrestfull {
              server 192.168.99.51:8003  weight=1 max_fails=1 fail_timeout=120 backup;
              server 192.168.99.52:8003  weight=1 max_fails=1 fail_timeout=120 ;
              server 192.168.99.53:8003  weight=2 max_fails=1 fail_timeout=120 backup;
          }
          server {
              listen 8003 ssl http2 default_server;
          
              server_name server_domain_or_IP;
              include snippets/self-signed.conf;
              include snippets/ssl-params.conf;
          
              location / {
                  proxy_pass https://cephmgrrestfull;
              }
          }


          • bilbolodz
            Junior Member
            • Jan 2019
            • 14

            #20
            Thanks, but that requires a service external to the Ceph cluster (nginx) running somewhere, which I'd like to avoid. It's very strange that Ceph doesn't offer a built-in service for API redundancy. It already offers HA solutions for S3 (HAProxy) and the dashboard (HTTP redirect to the active manager), so why not for the API?


            • kuziev
              Junior Member
              • Jan 2022
              • 3

              #21
              Nginx is already used as the web server for Zabbix. If you have Apache instead, you can probably do the same with it ( https://httpd.apache.org/docs/2.4/mo..._balancer.html ).



              • bilbolodz
                Junior Member
                • Jan 2019
                • 14

                #22
                That could indeed be a smart idea: run the HA proxy for Ceph on the Zabbix server itself!
                Last edited by bilbolodz; 10-08-2022, 13:01.


                • bilbolodz
                  Junior Member
                  • Jan 2019
                  • 14

                  #23
                  Actually, my workmate found a better solution which does NOT require any additional software:
                  • Install Zabbix agent 2 on every node that can run mgr. It's generally a good idea to add these nodes to the usual Zabbix monitoring anyway.
                  • Register the IPs of ALL possible mgr nodes under a common DNS name (e.g. ceph-mgr.intra.blabla.com).
                  • In Zabbix, create a host (e.g. ceph-cluster) that represents your Ceph cluster and set its interface to "Agent", BUT using the DNS name: ceph-mgr.intra.blabla.com
                  • Assign the Ceph by Zabbix agent 2 template to the host ceph-cluster.
                  • Set {$CEPH.CONNSTRING} to https://ceph-mgr.intra.blabla.com:8003 for the host ceph-cluster.
                  • Add ceph-cluster to the Hostname directive in zabbix_agent2.conf on EVERY node that can run mgr (hint: the Hostname directive accepts multiple names separated by commas), then restart the agent.
                  • Enjoy working Ceph cluster monitoring.


                  • tinomms
                    Junior Member
                    • Mar 2022
                    • 4

                    #24
                    Hi there folks.

                    I need some help with this please. I can't find any clear, easy to follow documentation for how to set up this plugin with our CEPH cluster hosted on Proxmox.

                    My question might have an obvious answer to some, but the documentation doesn't say, so go easy on me. Is the API key something that is generated at the command line (if so, how?), or is it something you enter yourself (kind of like a password) in the plugin's ceph.conf file? If neither, what is it and how is it generated?

                    Thanks

                    Tino
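
On the API key question: earlier in the thread the key is described as coming from "ceph restful create-key". Here is a minimal sketch of how such a user/key pair could be used to call the API by hand. It assumes the RESTful module accepts HTTP Basic authentication with the user name and the generated key. The user name `zabbix`, the key value, and the helper names are made up for illustration.

```python
import base64
import ssl
import urllib.request


def build_request(base_url: str, user: str, api_key: str, path: str = "/") -> urllib.request.Request:
    """Build a GET request carrying an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{user}:{api_key}".encode()).decode()
    req = urllib.request.Request(base_url.rstrip("/") + path)
    req.add_header("Authorization", f"Basic {token}")
    return req


def insecure_context() -> ssl.SSLContext:
    """TLS context that skips verification, for a self-signed mgr certificate only."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # must be disabled before setting CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


# Hypothetical values: replace with your own endpoint, user name and generated key.
req = build_request("https://ceph-mgr.intra.blabla.com:8003", "zabbix", "example-key")
print(req.full_url)

# To actually send it (not executed here):
# urllib.request.urlopen(req, context=insecure_context(), timeout=5)
```

If a hand-built request like this is rejected with 401, the user/key pair is the first thing to check, before looking at the Zabbix side.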


                    • rmday
                      Junior Member
                      • Mar 2023
                      • 1

                      #25
                      Hello!

                      I am a new poster because I am new to both Zabbix and Ceph. My group upgraded our Ceph cluster (vended by Croit) to a new version, and since then none of the data has been reaching our Zabbix instance. It seems to have stopped on the day of the upgrade. As a result, Zabbix shows some OSDs as down and others as up, and we would not see any change in their state except directly from the Ceph management node.

                      We have the zabbix agent on all the nodes and the [ceph integration](https://www.zabbix.com/integrations/ceph) had been working for over a year. I am just not sure how to get started troubleshooting this and cannot make a new topic.

                      Any "get started troubleshooting" help would be appreciated!


                      • ttyzzx
                        Junior Member
                        • Sep 2023
                        • 1

                        #26
                        Hi,

                        It took me somewhat longer than it should have to get this working. I have a couple of suggestions for the README:

                        The first line under Setup states "Setup and configure zabbix-agent2 compiled with the Ceph monitoring plugin." I was unsure what this meant: I'm using the official packages and have no need to "compile" anything. Perhaps stating that the official packages already support this would be less confusing.

                        It would be useful to reference that this uses the CEPH RESTAPI, and either give a few commands to activate it, or link to the appropriate CEPH page: https://docs.ceph.com/en/latest/mgr/restful/ - Finding this was my light-bulb moment, I was struggling because it was not clear how zabbix and ceph were glued together.

                        It would be useful to highlight the existence of the configuration file found at /etc/zabbix_agent2.d/plugins.d/ceph.conf.

                        These things are probably really obvious to those who know about them, but can be real stumbling blocks to those of us coming fresh to this.

                        Cheers,

                        Chris


                        • jartoun
                          Junior Member
                          • Apr 2024
                          • 1

                          #27
                          Hi everyone,

                          I am trying to get this template to work with a rook-ceph cluster... I have enabled the dashboard and the RESTful API and created an API user, but the Zabbix template does not work.
                          I always get {"status": "401 Unauthorized", "detail": "You are not authorized to access that resource", "request_id": "252990b4-55ce-4f6b-8990-06943f624129"}


                          2 questions...

                          1. Has anyone actually got this template to work with rook-ceph? If so, how?
                          2. How do I change default value strings? For example, the template seems to add port number 8003 to the #CONNSTRING macro; is there any way to change that?

                          BR
                          jartoun


                          • bbrendon
                            Senior Member
                            • Sep 2005
                            • 870

                            #28
                            From what I understand, this template uses the old API, which causes memory leaks in ceph-mgr because of Ceph bugs. Is a new template for the new API coming? Has anyone started working on it?

                            HTML Code:
                            https://tracker.ceph.com/issues/59580
                            https://www.reddit.com/r/ceph/comments/1ecp6rf/problem_with_restful_module/
                            https://www.spinics.net/lists/ceph-users/msg77420.html


                            • lrizzo_inap
                              Junior Member
                              • Aug 2023
                              • 1

                              #29
                              Hello,

                              As stated above in multiple messages, a Ceph cluster usually has multiple managers running at the same time, but Zabbix, by relying on the agent to access the Ceph REST API, seems to be limited to connecting to only one host at a time. If the node hosting the manager that Zabbix checks fails, one could lose all visibility into the Ceph cluster for a prolonged period, and even with DNS as suggested above it might take quite some time before monitoring is restored.
                              Is there any possibility of load balancing access to the Zabbix agent with something like HAProxy?

                              I am a little inexperienced with haproxy (and with the inner workings of zabbix agent <--> zabbixproxy/zabbix server connectivity) and my first attempt at adding a haproxy service for port 10050 is not working properly even after adding the haproxy IPs in zabbix configs as servers; has anyone attempted something like this successfully or have any suggestion on which settings/configurations might help?

                              Thanks



                              • lrizzo_inap
                                lrizzo_inap commented
                                While I have not yet fully completed the testing, this configuration seems to fit the bill, as zabbix_get gets a successful agent.ping.
                                You might need to make each manager active one at a time and run "ceph restful create-self-signed-cert" on each one (or install a proper certificate), so that the REST API restarts with the cert and is reachable for the health check upon manager failover.

                                HAProxy frontend/backend configs (adjust server name/FQDN/IP as needed):

                                frontend zabbix_agent_frontend
                                    bind *:10050
                                    mode tcp
                                    default_backend ceph_mgr_agents

                                backend ceph_mgr_agents
                                    mode tcp
                                    option ssl-hello-chk
                                    balance first

                                    server mgr1 mgr1.ceph.monitoring:10050 check port 8003 inter 5s fall 3 rise 2
                                    server mgr2 mgr2.ceph.monitoring:10050 check port 8003 inter 5s fall 3 rise 2
                                    server mgr3 mgr3.ceph.monitoring:10050 check port 8003 inter 5s fall 3 rise 2
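
To make the idea behind "check port 8003" plus "balance first" concrete, here is a rough sketch of the selection logic: agent traffic goes to the first listed node whose REST API port answers. It only does a plain TCP connect. HAProxy's ssl-hello-chk does more than that, and the inter/fall/rise settings are not modelled. `port_open` and `pick_backend` are hypothetical names.

```python
import socket
from typing import Iterable, Optional


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or name resolution failed
        return False


def pick_backend(hosts: Iterable[str], check_port: int = 8003,
                 timeout: float = 2.0) -> Optional[str]:
    """Return the first host, in listed order, whose check port is reachable."""
    for host in hosts:
        if port_open(host, check_port, timeout):
            return host
    return None


# Example call (host names taken from the config above; not executed here):
# pick_backend(["mgr1.ceph.monitoring", "mgr2.ceph.monitoring", "mgr3.ceph.monitoring"])
```

Because the REST API is only expected to listen on the active manager, the node that passes the check should be the one whose agent can reach it locally.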
                            • Cond3nz
                              Junior Member
                              • Jul 2025
                              • 1

                              #30
                              Originally posted by setsimmo
                              I tried to troubleshoot the issues with ceph.osd.stats and found the following:

                              The default RESTful permissions do not allow the pg dump* commands to be run by the user as created. I couldn't track down where to set permissions in Ceph for RESTful module users, as the users created by "ceph restful create-key" do not show up under other modules.

                              More details on the RESTful API module can be found here: https://docs.ceph.com/en/latest/mgr/restful/

                              Thanks for your information, it just helped me:
                              Code:
                              ceph auth caps mgr.{node name} mon "allow profile mgr, allow command 'pg dump' " mds "allow *" osd "allow *"
                              Last edited by Cond3nz; 09-07-2025, 19:50.

