Data extraction and transformation via XQuery


Because the system was initially designed to process XML documents, there was no doubt about which query language to use for data management. XQuery is a mature and very flexible language designed specifically for manipulating semi-structured data, and it is backed by a wide IT community under W3C governance. Recent extensions to the language that add maps, arrays and JSON serialization have shown that we made the right choice.

Do you want to compare XQuery with SQL? To keep it short, we’ll present just a few advantages of XQuery over SQL:

XQuery: a powerful data extraction (XPath) and transformation (XSLT) language

  • The selection and navigation features of XPath outperform those of SQL by far
  • A much wider set of supported data types
  • A wide, well-organized set of general-purpose functions agreed through the W3C standardization process
  • Can be extended by developing custom server modules
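
To give a feel for path-based navigation, here is a minimal sketch. It is not XQuery itself: it uses Python's standard xml.etree.ElementTree module, which supports only a limited subset of XPath, and the sample catalog document is invented for this illustration. Even so, a single path expression selects nested elements by structure and attribute value, which in SQL would typically take joins across several tables.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
CATALOG = """
<catalog>
  <book year="2014"><title>Learning XQuery</title><author><name>Ann</name></author></book>
  <book year="2009"><title>SQL Basics</title><author><name>Bob</name></author></book>
  <journal year="2014"><title>XML Weekly</title></journal>
</catalog>
"""

root = ET.fromstring(CATALOG.strip())

# One path expression: the <title> of every <book> whose year attribute is 2014.
titles = [t.text for t in root.findall("./book[@year='2014']/title")]

# Descendant navigation: every <name> element at any depth below the root.
names = [n.text for n in root.findall(".//name")]

print(titles)
print(names)
```

A full XQuery engine goes well beyond this subset (more axes, functions, FLWOR expressions), so the sketch only hints at the navigation style.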


A language for both client-side and server-side processing

  • No equivalent in SQL. Every database uses its own server-side language: PL/SQL in Oracle, Transact-SQL in MSSQL and Sybase, etc.


Actively developed by the IT community

  • XQuery 1.0 through 3.1, XQuery Update, XQuery Full Text


New features added regularly

  • XQuery: the latest standard extensions were added in 2015
  • SQL: little recent evolution; the latest changes to the standard were made in 2011


But how is XQuery used against data stored in a distributed cache?

Here is how, in short:

  • An XQuery statement consists of two logical parts: selection of the data and transformation of the selected data;
  • The first part of the query is analysed and transformed into a number of simple queries against the distributed cache;
  • These queries are performed on the distributed cache in parallel, and indexes are used whenever possible;
  • The selected documents are streamed back to the XQuery engine for the second phase, the transformation.
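
The flow above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the system's actual implementation: the cache partitions are simulated with in-memory Python lists, the helper names (select_from_partition, run_query) and the sample documents are hypothetical, and indexes are not modelled at all.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical cache partitions; each one holds documents as plain dicts.
PARTITIONS = [
    [{"id": 1, "type": "order", "total": 120}, {"id": 2, "type": "invoice", "total": 80}],
    [{"id": 3, "type": "order", "total": 40}],
    [{"id": 4, "type": "order", "total": 300}, {"id": 5, "type": "order", "total": 15}],
]


def select_from_partition(partition, doc_type, min_total):
    """Selection phase: one simple query evaluated against a single partition."""
    return [d for d in partition if d["type"] == doc_type and d["total"] >= min_total]


def run_query(partitions, doc_type, min_total):
    # Phase 1: run the same simple query on every partition in parallel
    # and gather the selected documents (pool.map keeps partition order).
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        per_partition = pool.map(
            lambda p: select_from_partition(p, doc_type, min_total), partitions
        )
        selected = [doc for part in per_partition for doc in part]

    # Phase 2: transform the selected documents in one place,
    # standing in for the XQuery engine's transformation step.
    return ["order {}: {}".format(d["id"], d["total"]) for d in selected]


result = run_query(PARTITIONS, "order", 100)
print(result)
```

Only the documents that pass the selection leave their partition, so the transformation step works on a much smaller stream than the full data set.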


Sounds quite similar to distributed MapReduce jobs, doesn’t it?