The core of the system is its distributed document store. Bagri is designed to work with self-describing document formats such as JSON and XML. During the parsing phase, the system collects all unique paths in a document and stores them in the system vocabulary. Document data elements (XML attributes and text elements, JSON values) are identified by a document identifier plus a path identifier and are stored in distributed data caches. Indexed values and transaction logs are stored in additional caches.
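The idea of a path vocabulary plus data elements keyed by document and path can be sketched in a few lines of Java. This is a minimal, hypothetical illustration, not Bagri's API: the class `DocumentStoreSketch`, the `DataKey` record and the `/name`-style path notation are assumptions, a plain `HashMap` stands in for the distributed data cache, and arrays and XML attributes are ignored for brevity. It requires Java 16+ for records.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: names and path notation are illustrative, not Bagri's API.
class DocumentStoreSketch {

    // A data element is identified by (document id, path id).
    record DataKey(long docId, int pathId) {}

    // Vocabulary: each unique path string is stored once and mapped to a compact int id.
    private final Map<String, Integer> pathIds = new HashMap<>();
    // Local stand-in for a distributed data cache.
    private final Map<DataKey, Object> dataCache = new HashMap<>();

    // Returns the id of a known path, or assigns the next id to a new one.
    int pathId(String path) {
        return pathIds.computeIfAbsent(path, p -> pathIds.size() + 1);
    }

    // "Parses" a JSON-like nested map: every leaf value becomes one data element.
    void store(long docId, Map<String, Object> doc) {
        flatten(docId, "", doc);
    }

    @SuppressWarnings("unchecked")
    private void flatten(long docId, String prefix, Map<String, Object> node) {
        for (Map.Entry<String, Object> e : node.entrySet()) {
            String path = prefix + "/" + e.getKey();
            if (e.getValue() instanceof Map) {
                flatten(docId, path, (Map<String, Object>) e.getValue());
            } else {
                dataCache.put(new DataKey(docId, pathId(path)), e.getValue());
            }
        }
    }

    // Looks up a value by document id and path; null if the path or element is unknown.
    Object get(long docId, String path) {
        Integer id = pathIds.get(path);
        return id == null ? null : dataCache.get(new DataKey(docId, id));
    }

    int vocabularySize() { return pathIds.size(); }
    int elementCount() { return dataCache.size(); }

    public static void main(String[] args) {
        DocumentStoreSketch store = new DocumentStoreSketch();
        store.store(1L, Map.of("name", "Alice", "address", Map.of("city", "Oslo")));
        store.store(2L, Map.of("name", "Bob", "address", Map.of("city", "Rome")));
        System.out.println(store.vocabularySize());           // 2 (/name, /address/city)
        System.out.println(store.elementCount());             // 4
        System.out.println(store.get(2L, "/address/city"));   // Rome
    }
}
```

The two documents share the same two vocabulary entries, while each stores only its own values under `(docId, pathId)` keys.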
Document types and their unique paths can be registered in two ways: by registering the corresponding document schemas (XSD), or on the fly while incoming documents are processed. In this way, Bagri addresses the major disadvantage of all self-describing document formats: their verbosity.
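The two registration modes can be pictured as two entry points into the same vocabulary. The sketch below is hypothetical: `PathRegistry`, `registerSchemaPaths` and the XPath-like path strings are assumptions, and extracting paths from an actual XSD is not shown. It only demonstrates that schema-derived paths are registered up front, while unseen paths are added as documents arrive.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the two registration modes; not Bagri's API.
class PathRegistry {
    private final Map<String, Integer> ids = new HashMap<>();

    // Mode 1: bulk registration of paths assumed to be already derived from an XSD.
    void registerSchemaPaths(List<String> schemaPaths) {
        schemaPaths.forEach(this::register);
    }

    // Mode 2: on-the-fly registration while an incoming document is parsed.
    // Returns the existing id if the path is already known.
    int register(String path) {
        return ids.computeIfAbsent(path, p -> ids.size() + 1);
    }

    boolean isKnown(String path) { return ids.containsKey(path); }

    int size() { return ids.size(); }

    public static void main(String[] args) {
        PathRegistry registry = new PathRegistry();
        registry.registerSchemaPaths(List.of("/order/@id", "/order/customer/text()"));
        int known = registry.register("/order/customer/text()"); // registered by the schema
        int added = registry.register("/order/note/text()");     // new path, added on the fly
        System.out.println(known + " " + added + " " + registry.size()); // 2 3 3
    }
}
```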
The document structure is separated from the data elements and stored efficiently in distinct vocabulary entries, as opposed to the original format, where every document carries its own structure inside itself.