Greetings,
As an end-user/sysadmin of both Zabbix and Ceph, I'm working on a template that I hope to share back with the community. Several good ones already exist; however, the template I'm creating is for a Ceph storage system from a company that specializes in Ceph hardware.
I have no affiliation with the company other than as a user/admin at my place of employment.
I know that Ceph has a built-in module for exporting data to Zabbix, and I have a little experience with it from when we ran our own Ceph built from source. In fact, I'm planning to borrow ideas for graphs/triggers/etc. from that template when I get to that stage. However, this storage device exposes its metrics via Prometheus. Thanks to some help earlier in the Help forum, I can now reliably pull in the data from Prometheus and create a few graphs. I've exported my template and would really appreciate some feedback.
https://gitlab.com/i.am.stack/zabbix_ceph_softiron
Specifically:
1) I feel like I have a ton of duplication, and I wonder if there is a better way of doing things. For example, I have a primary item that requests the information from Prometheus. Then, under discovery, I have one rule per item prototype. That seems wasteful, since I end up with a ton of discovery rules that each contain just one item prototype (I'm still building out triggers/graphs for them, but it is essentially still one per rule). Is there a better way of doing this? Also, it's frustrating that cloning a discovery rule doesn't clone its item prototypes as well, which means I'm copy-pasting a lot of information. Having to duplicate each rule by hand with slight tweaks also makes me wonder if I'm doing it "wrong". I'd _really_ appreciate feedback on this, as I still have a lot more items to add.
2) I want to keep a sample of each metric in the discovery rule's Description. I haven't done it for all of them yet, but the "Rocksdb Compact" discovery rule is an example. It seems like a good place for it: the description doesn't show up anywhere I've found, yet it could help someone later see what data I was working with when I built the template. Is this wise, and would it be useful for people who use the template in the future?
3) Same question, but for primary items, for example "Pages Degraded" under the template's Items. The difference here is that an item's description shows up in the help bubble when a user looks at Monitoring -> Latest data, and I feel that having the raw metric appear there is less useful and more confusing. However, I'm not sure where else to store it for others who may want to see what the raw metric looks like. Or is it best to keep sample output only in the Git repo?
4) Right now, I'm still capturing a ton of raw data and trying to figure out what is best to keep/modify/adjust/etc. Prometheus has the concept of metric types: counters can _only_ go up or reset to zero; gauges can go up or down; histograms and summaries are not used by this vendor right now; and untyped is a catch-all bucket, which for the moment I'm capturing as raw text until I have a better idea of what values this storage system sends. Are there good habits for capturing Prometheus data in Zabbix? That is, is there a "configure gauges like this and counters like this" best practice? I haven't found one yet, but I thought it was worth asking.
5) Is there anything I'm doing right now in the template that I could do better?
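To make question 1 more concrete, here is a minimal sketch of one possible consolidation, assuming the metrics in question share the same labels (for example, a per-pool label). Instead of one discovery rule per metric, a single rule would discover each distinct label set once, and several item prototypes could then hang off that one rule. This is plain Python illustrating the data shape, not Zabbix code; the metric names (`demo_ops_total`, `demo_used_bytes`), the `pool` label, and the `{#POOL}` macro naming are all hypothetical, and the parser is a simplified reading of the Prometheus text format.

```python
import json
import re

# One sample line: metric name, optional {labels}, value, optional timestamp.
SAMPLE_RE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)"
    r"(?:\{(?P<labels>.*)\})?"
    r"\s+(?P<value>\S+)(?:\s+\S+)?$"
)
# Label pairs such as pool="a", allowing escaped characters inside the value.
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_exposition(text):
    """Parse Prometheus text exposition format into a list of sample dicts."""
    types = {}
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            parts = line.split(None, 3)
            # "# TYPE <name> <type>" applies to the samples that follow.
            if len(parts) == 4 and parts[1] == "TYPE":
                types[parts[2]] = parts[3]
            continue
        m = SAMPLE_RE.match(line)
        if not m:
            continue  # this simplified parser skips anything it cannot read
        samples.append({
            "name": m.group("name"),
            "labels": dict(LABEL_RE.findall(m.group("labels") or "")),
            "value": m.group("value"),
            "type": types.get(m.group("name"), "untyped"),
        })
    return samples


def discover_label_sets(samples, name_pattern):
    """Return one LLD row per distinct label set among matching metrics."""
    rx = re.compile(name_pattern)
    seen = set()
    rows = []
    for s in samples:
        if not rx.search(s["name"]):
            continue
        key = tuple(sorted(s["labels"].items()))
        if key in seen:
            continue  # already discovered through another metric family
        seen.add(key)
        # Hypothetical macro naming: label "pool" becomes {#POOL}.
        rows.append({"{#" + k.upper() + "}": v for k, v in key})
    return rows


# Made-up input for illustration only; not real output from the appliance.
text = """\
# TYPE demo_ops_total counter
demo_ops_total{pool="a"} 10
demo_ops_total{pool="b"} 4
# TYPE demo_used_bytes gauge
demo_used_bytes{pool="a"} 2048
"""

print(json.dumps(discover_label_sets(parse_exposition(text), r"^demo_")))
```

Both metric families collapse into one row per pool. If the Prometheus pattern preprocessing in item prototypes accepts LLD macros the way I expect (an assumption worth checking in the Zabbix docs), each row could feed one prototype per metric name, e.g. a pattern along the lines of `demo_ops_total{pool="{#POOL}"}`, all under a single rule. This only helps where metrics genuinely share a label set; metrics with different labels would still need their own rule.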
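For question 4, a rough sketch of how the counter/gauge distinction tends to matter in practice: a raw counter value is rarely interesting on its own, while its per-second rate is, and a rate has to cope with the counter resetting to zero. A gauge, by contrast, can be stored as-is. Zabbix's "Change per second" preprocessing step is the usual built-in way to turn a counter into a rate; the Python below only illustrates the arithmetic and is not how Zabbix implements it. The `Reading` type and the choice to drop the interval on a reset are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reading:
    value: float
    timestamp: float  # seconds since epoch


def counter_rate(prev: Optional[Reading], cur: Reading) -> Optional[float]:
    """Per-second rate of a Prometheus counter between two polls.

    Counters only go up or reset to zero, so a drop means the exporter or
    daemon restarted; this sketch returns None (no value) in that case.
    """
    if prev is None:
        return None  # first poll: nothing to compare against yet
    elapsed = cur.timestamp - prev.timestamp
    if elapsed <= 0:
        return None  # duplicate or out-of-order poll
    if cur.value < prev.value:
        return None  # counter reset detected; skip this interval
    return (cur.value - prev.value) / elapsed


def gauge_value(cur: Reading) -> float:
    """Gauges can go up or down, so the latest reading is kept as-is."""
    return cur.value


# Hypothetical polls every 30 s, with a counter reset between t=30 and t=60.
polls = [Reading(100, 0), Reading(160, 30), Reading(5, 60), Reading(35, 90)]
rates = [counter_rate(p, c) for p, c in zip([None] + polls[:-1], polls)]
print(rates)  # [None, 2.0, None, 1.0]
```

Dropping the reset interval loses one data point but avoids a large negative spike on the graph; an alternative some tools use is to treat the new value as the increase since zero.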
Thanks!