48 changes: 25 additions & 23 deletions conf/defaults.yaml
@@ -25,7 +25,7 @@ java.library.path: "/usr/local/lib:/opt/local/lib:/usr/lib:/usr/lib64"
storm.local.dir: "storm-local"
storm.log4j2.conf.dir: "log4j2"
storm.zookeeper.servers:
- "localhost"
- "localhost"
storm.zookeeper.port: 2181
storm.zookeeper.root: "/storm"
storm.zookeeper.session.timeout: 20000
@@ -52,7 +52,7 @@ storm.nimbus.retry.intervalceiling.millis: 60000
storm.nimbus.zookeeper.acls.check: true
storm.nimbus.zookeeper.acls.fixup: true

storm.auth.simple-white-list.users: []
storm.auth.simple-white-list.users: [ ]
storm.cluster.state.store: "org.apache.storm.cluster.ZKStateStorageFactory"
storm.meta.serialization.delegate: "org.apache.storm.serialization.GzipThriftSerializationDelegate"
storm.codedistributor.class: "org.apache.storm.codedistributor.LocalFileSystemCodeDistributor"
@@ -62,7 +62,7 @@ storm.health.check.timeout.ms: 5000
storm.disable.symlinks: false

### nimbus.* configs are for the master
nimbus.seeds : ["localhost"]
nimbus.seeds: [ "localhost" ]
nimbus.thrift.port: 6627
nimbus.thrift.threads: 64
nimbus.thrift.max_buffer_size: 1048576
@@ -163,10 +163,10 @@ storm.blobstore.acl.validation.enabled: false
### supervisor.* configs are for node supervisors
# Define the amount of workers that can be run on this machine. Each worker is assigned a port to use for communication
supervisor.slots.ports:
    - 6700
    - 6701
    - 6702
    - 6703
supervisor.childopts: "-Xmx256m"
supervisor.run.worker.as.user: false
#how long supervisor will wait to ensure that a worker process is started
@@ -184,8 +184,8 @@ supervisor.worker.heartbeats.max.timeout.secs: 600
#For topology configurable heartbeat timeout, maximum allowed heartbeat timeout.
worker.max.timeout.secs: 600
supervisor.enable: true
supervisor.supervisors: []
supervisor.supervisors.commands: []
supervisor.supervisors: [ ]
supervisor.supervisors.commands: [ ]
supervisor.memory.capacity.mb: 4096.0
#By convention 1 cpu core should be about 100, but this can be adjusted if needed
# using 100 makes it simple to set the desired value to the capacity measurement
@@ -278,6 +278,8 @@ topology.max.task.parallelism: null
topology.max.spout.pending: null # ideally should be larger than topology.producer.batch.size. (esp. if topology.batch.flush.interval.millis=0)
topology.state.synchronization.timeout.secs: 60
topology.stats.sample.rate: 0.05
topology.stats.ewma.enable: false
topology.stats.ewma.smoothing.factor: 0.0625
topology.builtin.metrics.bucket.size.secs: 60
topology.fall.back.on.java.serialization: false
topology.worker.childopts: null
@@ -287,16 +289,16 @@ topology.worker.shared.thread.pool.size: 4

# Spout Wait Strategy - employed when there is no data to produce
topology.spout.wait.strategy: "org.apache.storm.policy.WaitStrategyProgressive"
topology.spout.wait.park.microsec : 100 # park time for org.apache.storm.policy.WaitStrategyPark. Busy spins if set to 0.
topology.spout.wait.park.microsec: 100 # park time for org.apache.storm.policy.WaitStrategyPark. Busy spins if set to 0.

topology.spout.wait.progressive.level1.count: 0 # number of iterations to spend in level 1 [no sleep] of WaitStrategyProgressive, before progressing to level 2
topology.spout.wait.progressive.level2.count: 0 # number of iterations to spend in level 2 [parkNanos(1)] of WaitStrategyProgressive, before progressing to level 3
topology.spout.wait.progressive.level3.sleep.millis: 1 # sleep duration for idling iterations in level 3 of WaitStrategyProgressive

# Bolt Wait Strategy - employed when there is no data in its receive buffer to process
topology.bolt.wait.strategy : "org.apache.storm.policy.WaitStrategyProgressive"
topology.bolt.wait.strategy: "org.apache.storm.policy.WaitStrategyProgressive"

topology.bolt.wait.park.microsec : 100 # park time for org.apache.storm.policy.WaitStrategyPark. Busy spins if set to 0.
topology.bolt.wait.park.microsec: 100 # park time for org.apache.storm.policy.WaitStrategyPark. Busy spins if set to 0.

topology.bolt.wait.progressive.level1.count: 1 # number of iterations to spend in level 1 [no sleep] of WaitStrategyProgressive, before progressing to level 2
topology.bolt.wait.progressive.level2.count: 1000 # number of iterations to spend in level 2 [parkNanos(1)] of WaitStrategyProgressive, before progressing to level 3
@@ -363,20 +365,20 @@ blacklist.scheduler.assume.supervisor.bad.based.on.bad.slot: true

dev.zookeeper.path: "/tmp/dev-storm-zookeeper"

pacemaker.servers: []
pacemaker.servers: [ ]
pacemaker.port: 6699
pacemaker.base.threads: 10
pacemaker.max.threads: 50
pacemaker.client.max.threads: 2
pacemaker.thread.timeout: 10
pacemaker.childopts: "-Xmx1024m"
pacemaker.auth.method: "NONE"
pacemaker.kerberos.users: []
pacemaker.kerberos.users: [ ]
pacemaker.thrift.message.size.max: 10485760

#default storm daemon metrics reporter plugins
storm.daemon.metrics.reporter.plugins:
- "org.apache.storm.daemon.metrics.reporters.JmxPreparableReporter"
- "org.apache.storm.daemon.metrics.reporters.JmxPreparableReporter"
storm.daemon.metrics.reporter.interval.secs: 10

storm.metricstore.class: "org.apache.storm.metricstore.rocksdb.RocksDbStore"
@@ -399,8 +401,8 @@ storm.cgroup.inherit.cpuset.configs: false
# Configs for CGroup support
storm.cgroup.hierarchy.dir: "/cgroup/storm_resources"
storm.cgroup.resources:
- "cpu"
- "memory"
- "cpu"
- "memory"
storm.cgroup.hierarchy.name: "storm"
storm.supervisor.cgroup.rootdir: "storm"
storm.cgroup.cgexec.cmd: "/bin/cgexec"
@@ -419,12 +421,12 @@ storm.worker.min.cpu.pcore.percent: 0.0

storm.topology.classpath.beginning.enabled: false
worker.metrics:
"CGroupMemory": "org.apache.storm.metrics2.cgroup.CGroupMemoryUsage"
"CGroupMemoryLimit": "org.apache.storm.metrics2.cgroup.CGroupMemoryLimit"
"CGroupCpu": "org.apache.storm.metrics2.cgroup.CGroupCpu"
"CGroupCpuGuarantee": "org.apache.storm.metrics2.cgroup.CGroupCpuGuarantee"
"CGroupCpuGuaranteeByCfsQuota": "org.apache.storm.metrics2.cgroup.CGroupCpuGuaranteeByCfsQuota"
"CGroupCpuStat": "org.apache.storm.metrics2.cgroup.CGroupCpuStat"
"CGroupMemory": "org.apache.storm.metrics2.cgroup.CGroupMemoryUsage"
"CGroupMemoryLimit": "org.apache.storm.metrics2.cgroup.CGroupMemoryLimit"
"CGroupCpu": "org.apache.storm.metrics2.cgroup.CGroupCpu"
"CGroupCpuGuarantee": "org.apache.storm.metrics2.cgroup.CGroupCpuGuarantee"
"CGroupCpuGuaranteeByCfsQuota": "org.apache.storm.metrics2.cgroup.CGroupCpuGuaranteeByCfsQuota"
"CGroupCpuStat": "org.apache.storm.metrics2.cgroup.CGroupCpuStat"

# The number of buckets for running statistics
num.stat.buckets: 20
42 changes: 39 additions & 3 deletions docs/Metrics.md
@@ -180,15 +180,22 @@ Similar to the tuple counting metrics storm also collects average latency metric

##### `__complete-latency`

The complete latency is just for spouts. It is the average amount of time it took for `ack` or `fail` to be called for a
tuple after it was emitted. If acking is disabled this metric is likely to be blank or 0 for all values, and should be
ignored.

##### `__execute-latency`

This is just for bolts. It is the average amount of time that the bolt spent in the call to the `execute` method. The
higher this gets, the lower the throughput of tuples per bolt instance.

##### `__process-latency`

This is also just for bolts. It is the average amount of time between when `execute` was called to start processing a
tuple, to when it was acked or failed by the bolt. If your bolt is a very simple bolt and the processing is synchronous
then `__process-latency` and `__execute-latency` should be very close to one another, with process latency being
slightly smaller. If you are doing a join or have asynchronous processing then it may take a while for a tuple to be
acked so the process latency would be higher than the execute latency.

##### `__skipped-max-spout-ms`

@@ -207,6 +214,35 @@ This metric indicates the overflow count last time BP status was sent, with a mi

This metric records how much time a spout was idle because the topology was deactivated. This is the total time in milliseconds, not the average amount of time and is not sub-sampled.

#### Tuple Jitter Metrics

To activate the jitter-based metrics, set `topology.stats.ewma.enable` to `true`; this switches the system to jitter estimation based on an exponentially weighted moving average (EWMA).
In this model, the jitter estimate is updated dynamically by weighting each new latency sample against historical data using a smoothing factor (`topology.stats.ewma.smoothing.factor`).
This parameter, which defaults to 0.0625 (equivalent to 1/16, or a 4-bit right shift), determines the metric's reactivity: higher values make the jitter more sensitive to recent spikes, while lower values prioritize long-term stability.
Operators should be aware that enabling this feature triples the gauge count for every component-stream pair per task; this significant increase in metric cardinality can impact TSDB storage and costs, so backend capacity should be verified before deployment.
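
The underlying update rule is small enough to sketch. Below is a minimal illustration of an RFC 3550 §A.8 style EWMA jitter estimator; the class and method names are hypothetical, not Storm's actual implementation. With the default smoothing factor of 1/16, the update reduces to the RFC's familiar `J += (|D| - J) / 16`.

```java
/**
 * Minimal sketch of an RFC 3550 (Appendix A.8) style jitter estimator.
 * Names are illustrative only; Storm's internal classes may differ.
 */
public class EwmaJitter {
    private final double smoothingFactor; // e.g. topology.stats.ewma.smoothing.factor = 0.0625
    private double jitterMs = 0.0;        // running jitter estimate, in milliseconds
    private Double lastLatencyMs = null;  // previous latency sample, if any

    public EwmaJitter(double smoothingFactor) {
        this.smoothingFactor = smoothingFactor;
    }

    /**
     * Feed one latency sample (e.g. a complete, execute, or process latency)
     * and return the updated jitter estimate:
     *     J = J + alpha * (|D| - J)
     * where D is the difference between consecutive samples and alpha is the
     * smoothing factor (1/16 by default, matching RFC 3550).
     */
    public double update(double latencyMs) {
        if (lastLatencyMs != null) {
            double d = Math.abs(latencyMs - lastLatencyMs);
            jitterMs += smoothingFactor * (d - jitterMs);
        }
        lastLatencyMs = latencyMs;
        return jitterMs;
    }
}
```

RFC 3550 chose the 1/16 gain because it damps single-sample noise while still converging reasonably fast, and it can be computed with a cheap bit shift in integer arithmetic; the same trade-off applies when tuning `topology.stats.ewma.smoothing.factor`.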

##### `__complete-jitter`

This metric is specific to spouts. It measures the variation (jitter) in the total completion time (end-to-end latency)
of tuples, calculated using the exponentially weighted moving average (EWMA) algorithm as defined in RFC 1889 §A.8 / RFC 3550 §A.8.
While `__complete-latency` indicates the average amount of time it took for a tuple to be fully processed by the
topology (from emission to the final ack), the jitter metric quantifies the consistency of that process. If acking is
disabled, this metric is likely to be blank or 0 and should be ignored.

##### `__execute-jitter`

This metric is specific to bolts. It measures the variation (jitter) in the time spent within the `execute` method,
calculated using the exponentially weighted moving average (EWMA) algorithm as defined in RFC 1889 §A.8 / RFC 3550 §A.8.
While `__execute-latency` provides the average time spent in the `execute` call, the jitter metric quantifies the
predictability of that execution time. It is a critical indicator of computational "smoothness".

##### `__process-jitter`

This metric is specific to bolts. It measures the variation (jitter) in the process latency, calculated using the
exponentially weighted moving average (EWMA) algorithm as defined in RFC 1889 §A.8 / RFC 3550 §A.8.
While `__process-latency` provides the average time a tuple spends being processed, the jitter metric quantifies the
stability of that processing time. It helps identify "noisy" execution environments where processing times fluctuate
significantly, even if the average remains within acceptable limits.

#### Error Reporting Metrics

Storm also collects error reporting metrics for bolts and spouts.