Mustang solves the problem of efficiently indexing Boolean expressions (both Disjunctive Normal Form (DNF) and Conjunctive Normal Form (CNF)) in a high-dimensional multi-valued attribute space. The goal is to rapidly find the set of Boolean expressions that evaluate to true for a given assignment of values to attributes.
Applications include:
- Online advertising — Boolean expressions represent advertiser targeting criteria; attribute assignments represent visiting users.
- General targeting systems — Any system where rules/criteria must be matched against incoming events at scale.
Mustang uses an inverted list data structure adapted from ranked information retrieval, enabling it to efficiently return the top-N matching expressions — a critical capability in systems with ranked advertising, notifications, or personalization.
| Module | Description |
|---|---|
mustang-models |
Domain model: Criteria, Predicate, Detail, PreOperation, scoring, ratification types |
mustang-core |
Core indexing & search engine (MustangEngine) |
mustang-dw-bundle |
Optional Dropwizard bundle for exposing a REST API over the engine |
<dependency>
<groupId>com.phonepe</groupId>
<artifactId>mustang-core</artifactId>
<version>${mustang.version}</version>
</dependency><dependency>
<groupId>com.phonepe</groupId>
<artifactId>mustang-dw-bundle</artifactId>
<version>${mustang.version}</version>
</dependency>Note: Find the latest version on Maven Central.
Criteria represents a Boolean expression in one of two normalized forms:
-
DNF (Disjunctive Normal Form) — disjunction of conjunctions
(A ∈ {a1,a2} ∧ B ∉ {b1,b2}) ∨ (A ∈ {a1,a3} ∧ D ∉ {d1}) -
CNF (Conjunctive Normal Form) — conjunction of disjunctions
(A ∈ {a1,a2} ∨ B ∉ {b1,b2}) ∧ (A ∈ {a1,a3} ∨ D ∉ {d1})
Composition is a set of Predicates:
Conjunction (∧)— satisfied only when all constituent predicates evaluate totrueDisjunction (∨)— satisfied when any constituent predicate evaluates totrue
Predicate is a conditional:
IncludedPredicate— inclusion (∈)ExcludedPredicate— exclusion (∉)
Detail carries the caveat that a predicate enforces:
Detail |
Caveat |
|---|---|
EqualityDetail |
EQUALITY |
RegexDetail |
REGEX |
RangeDetail |
RANGE (gt, gte, lt, lte, between — open & closed) |
VersioningDetail |
VERSIONING |
SubSetDetail |
SUBSET |
SuperSetDetail |
SUPERSET |
EqualSetDetail |
EQUALSET |
ExistenceDetail |
EXISTENCE (attribute must be present) |
NonExistenceDetail |
NON_EXISTENCE (attribute must be absent) |
Caveat |
Data Types |
|---|---|
EQUALITY |
String, Number, Boolean |
SUBSET |
Collections (List, Set) |
SUPERSET |
Collections (List, Set) |
EQUALSET |
Collections (List, Set) |
REGEX |
String |
RANGE |
Number |
VERSIONING |
String |
EXISTENCE |
Any |
NON_EXISTENCE |
Any |
PreOperations allow lightweight transformations on attribute values before evaluation:
| PreOperation | Description |
|---|---|
SubStringPreOperation |
Extract substring by index range |
ModuloPreOperation |
Compute value % rhs |
DivisionPreOperation |
Compute value / rhs |
MultiplicationPreOperation |
Compute value * rhs |
AdditionPreOperation |
Compute value + rhs |
SubtractionPreOperation |
Compute value - rhs |
LengthPreOperation |
Returns string/collection length |
SizePreOperation |
Returns collection size |
BinaryConversionPreOperation |
Convert number to binary string |
DateTimePreOperation |
Extract date/time component |
ObjectMapper mapper = new ObjectMapper();
MustangEngine engine = MustangEngine.builder().mapper(mapper).build();Criteria dnf = DNFCriteria.builder()
.id("C1")
.conjunction(Conjunction.builder()
.predicate(IncludedPredicate.builder()
.lhs("$.a")
// Optional: transform the attribute value before matching
.preOperation(SubStringPreOperation.builder().beginIndex(1).endIndex(2).build())
.detail(EqualityDetail.builder()
.values(Sets.newHashSet("A1", "A2", "A3"))
.build())
.build())
.predicate(IncludedPredicate.builder()
.lhs("$.n")
.detail(RangeDetail.builder()
.lowerBound(3)
.includeLowerBound(true) // n >= 3
.build())
.build())
.predicate(IncludedPredicate.builder()
.lhs("$.x")
.preOperation(ModuloPreOperation.builder().rhs(4).build())
.detail(RangeDetail.builder()
.upperBound(3) // (x % 4) < 3
.build())
.build())
.build())
.build();Criteria cnf = CNFCriteria.builder()
.id("C2")
.disjunction(Disjunction.builder()
.predicate(IncludedPredicate.builder()
.lhs("$.a")
.detail(EqualityDetail.builder()
.values(Sets.newHashSet("A1", "A2"))
.build())
.build())
.predicate(ExcludedPredicate.builder()
.lhs("$.b")
.detail(RegexDetail.builder()
.regex("B.?")
.build())
.build())
.predicate(IncludedPredicate.builder()
.lhs("$.v")
.detail(VersioningDetail.builder()
.check(CheckType.ABOVE)
.baseVersion("1.2.3.4-alpha")
.build())
.build())
.build())
.build();// Single
engine.add("my_index", criteria);
// Bulk
engine.add("my_index", Arrays.asList(criteria1, criteria2, criteria3));// Build the evaluation context from a JSON event
JsonNode event = mapper.readTree("{\"a\":\"A1\",\"b\":\"B3\",\"n\":5,\"p\":true}");
EvaluationContext context = EvaluationContext.builder().node(event).build();
// Search (with scoring/ranking)
Set<String> results = engine.search("my_index", context);
// Search (skip scoring for raw speed)
Set<String> results = engine.search("my_index", context, false);engine.update("my_index", updatedCriteria); // upsert semantics
engine.delete("my_index", criteria);List<Criteria> matching = engine.scan(criteriaList, context);boolean result = engine.evaluate(criteria, context);// Build a fresh index under a temp name, then atomically replace
engine.add("my_index_new", allCriteria);
engine.replace("my_index", "my_index_new");engine.ratify("my_index"); // run in background
// Poll for results
RatificationResult result = engine.getRatificationResult("my_index");SimilarityStats stats = engine.checkSimilarity("my_index", candidateCriteria);String exported = engine.exportIndexGroup("my_index");
engine.importIndexGroup(exported);
Set<String> results = engine.search("my_index", context);Supply weights on predicates and the engine scores criteria based on overlap with the assignment:
// Score_conj(E, S) = Σ w_E(A,v) * w_S(A,v) for (A,v) ∈ IN(E) ∩ S
// DNFCriteria score = max over conjunctions
// CNFCriteria score = sum over disjunctionsmustang-dw-bundle exposes read-only REST endpoints over a MustangEngine instance:
// In your Application class
bootstrap.addBundle(new MustangBundle<>(config -> config.getMustangConfig()));Endpoints include /mustang/search, /mustang/scan, /mustang/debug, and /mustang/ratify.
3.x is fully backward-compatible with 2.x. Necessary transformations are applied transparently. All users on 2.x are encouraged to upgrade.
Full documentation is available at the project docs site.
To build the docs locally:
cd docs
pip install -r requirements.txt
zensical build --clean
# Output is in docs/site/See CONTRIBUTING.md.
See CODE_OF_CONDUCT.md.
Distributed under the Apache 2.0 License.
The core indexing algorithm is based on:
"Indexing Boolean Expressions" — Proceedings of VLDB 2009. See reference/vldb09-indexing.pdf.