This version is a prototype and does not adhere to the current version of the CSD profile.

Use Cases

  • Prototype the CSD interface profile to ensure that it can be deployed using commodity open-source technologies
  • An easy-to-deploy, XQuery-based CSD directory facade for other registries.  For example:
    • For any FRED-API compliant Facility Registry (e.g. DHIS or ResourceMap) once the issue on XML Endpoints is resolved.
  • Provide a performance-testing platform for various XQuery engines (BaseX, eXist-db, Zorba, and others)
  • CSD document management
  • Document change notification (RSS feed) for subscription by CSD consumers

CareServicesConsumerQuery

Submit a POST request to index.php with your document. Some sample documents can be found in the test_csdq directory. You can test this with, for example:

curl -X POST -d @/home/ubuntu/CSD/test_csdq/stats_named.xq.xml \
 http://csd.ohie.org/CSD/index.php

Searches on CSD

Named/canned XQuery searches are made available under queries/search via the WebDAV server. If there is a file queries/search/blah/blah.xq, you can access it as http://csd.ohie.org/CSD/index.php/search/blah/blah.xq. The dynamic context of the search is the published CSD document at the time of Zorba server initialization. For the time being, any GET variables are passed on to the query. This is wide open at the moment and we need to lock it down.
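
To make the mechanics concrete, here is a minimal sketch of what a canned search query (in the spirit of the search_by_nid.xq used in the performance tests below) might look like. The namespace URI, the element names, and the assumption that the "nid" GET variable is bound to an external variable are illustrative only and are not taken from the actual scripts or the CSD .xsd:

declare namespace csd = "urn:ihe:iti:csd:2013";   (: namespace URI is an assumption :)
declare variable $nid as xs:string external;      (: assumed to be bound from the "nid" GET variable :)

for $provider in /csd:CSD/csd:providerDirectory/csd:provider
where $provider/csd:otherID[@code = 'nid'] = $nid
return $provider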

Executing Queries on Source Documents

Named/canned XQuery scripts used to query against arbitrary source documents are made available under queries/exec via the WebDAV server. If there is a file queries/exec/blah/blah.xq, you can access it as http://csd.ohie.org/CSD/index.php/exec/blah/blah.xq. Each document under "docs" is available to the XQuery. For example, "docs/source_data.xml" would be available as "doc('source_data.xml')".

Currently there is one script, queries/exec/fred_conversion.xq, to test translation of the FRED-API XML Endpoint to CSD.
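
As a rough sketch only, a FRED-to-CSD translation along those lines might be structured like the following; the source document name, the element names, and the namespace are assumptions for illustration and are not the contents of fred_conversion.xq:

declare namespace csd = "urn:ihe:iti:csd:2013";   (: namespace URI is an assumption :)

<csd:facilityDirectory>
{
  for $f in doc('fred_facilities.xml')//facility  (: assumed FRED export posted under docs/ :)
  return
    <csd:facility oid="{ $f/id }">
      <csd:primaryName>{ string($f/name) }</csd:primaryName>
    </csd:facility>
}
</csd:facilityDirectory>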

Publishing

Named/canned XQuery scripts used to publish CSD documents are made available under queries/publish via the WebDAV server. If there is a file queries/publish/blah/blah.xq, you can access it as http://csd.ohie.org/CSD/index.php/publish/blah/blah.xq. Each document under "docs" is available to the XQuery. For example, "docs/source_data.xml" would be available as "doc('source_data.xml')". If the resulting document passes validation against the .xsd, it is published to the WebDAV server at docs/CSD.xml.

Currently there is one publishing script, queries/publish/simple_merge.xq.
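
As a hedged sketch of what a merge-and-publish script could do (the source document names, element names, and namespace are illustrative assumptions, not the contents of simple_merge.xq):

declare namespace csd = "urn:ihe:iti:csd:2013";   (: namespace URI is an assumption :)

<csd:CSD>
  <csd:facilityDirectory>{ doc('facility_source.xml')//csd:facility }</csd:facilityDirectory>
  <csd:providerDirectory>{ doc('provider_source.xml')//csd:provider }</csd:providerDirectory>
</csd:CSD>

The result of such a script would then be validated against the .xsd and, if valid, written to docs/CSD.xml as described above.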

Currently, there is no way to tell the Zorba server to reload/refresh the CSD source document after publication. You need to log in to the server and restart the "zorba_server/init.php" process manually.

Updating

Named/canned XQuery scripts used to update the published CSD document are made available under queries/update via the WebDAV server. If there is a file queries/update/blah/blah.xq, you can access it as http://csd.ohie.org/CSD/index.php/update/blah/blah.xq. Each document under "docs" is available to the XQuery. For example, "docs/source_data.xml" would be available as "doc('source_data.xml')". The document "CSD.xml" is set to be the default context. If the resulting document passes validation against the .xsd, it is published to the WebDAV server.

There are no update scripts as of yet. The intention is that scripts here are used to update the currently published CSD.xml, for example with an update of facilities. How this update occurs is defined by the update script; a rough sketch follows the workflow below. An example workflow would be:

  • XML data is exported from a FRED-API compliant FR with updates since a given time.
  • This data is POSTed to the WebDAV server under the docs/ directory as, for example, "fr_updates.xml".
  • There is an XQuery script, for example "fr_update.xq", under "queries/update/" that would:
    • scan fr_updates.xml for any new facilities and add them to CSD.xml, which is the default context
    • scan fr_updates.xml for any existing facilities and either delete/replace the entire <facility> node, or overwrite any values contained within it as necessary.
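
A minimal sketch of what such an fr_update.xq could look like, using the XQuery Update Facility (which not every engine supports; see the evaluation criteria below). It assumes fr_updates.xml already contains CSD-style facility elements keyed by an oid attribute; the names are illustrative, not from an actual script:

declare namespace csd = "urn:ihe:iti:csd:2013";   (: namespace URI is an assumption :)

for $new in doc('fr_updates.xml')//csd:facility   (: assumes updates are already in CSD form :)
let $existing := /csd:CSD/csd:facilityDirectory/csd:facility[@oid = $new/@oid]
return
  if (exists($existing)) then
    replace node $existing with $new              (: overwrite an existing facility :)
  else
    insert node $new into /csd:CSD/csd:facilityDirectory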

Document Change Notification

The Subversion repository where docs are stored has an XML log format. This log format is transformed by XSLT to an RSS feed. You can see the current changes at index.php/rss.

The XSL document is pulled (and cached) from https://github.com/nexml/nexml/blob/master/xslt/svnlog.xslt (Thanks to Rutger Vos for sharing).  You can substitute your own XSLT if you wish.

  • CSD
  • XQuery Engines
    • In-browser JavaScript engine: XQIB
    • BaseX XML Database
    • exist-db
    • Zorba
      • Zorba and large data sets
      • IBM Article (dated)
      • Zorba source code
      • Zorba and PHP introductory slides
      • WARNING: Zorba has indices (needed to optimize joins, for example) but, according to their documentation, the "Zorba query optimizer does not yet detect index-related rewrites automatically. Although, we do plan to offer automatic index-related rewrites in the near future, we also expect the probing functions to remain useful for manual rewrites because both the XQuery language and the kind of indexes that are allowed in Zorba can be much more complex than their relational counterparts."  This means that "generic" named XQueries which are independent of the XQuery engine cannot be effectively used by Zorba at this point – they will need to be rewritten to take advantage of any defined indices.
    • LUX
  • When choosing an XQuery Engine for deployment, we would suggest the following Evaluation Criteria:
    • Programming Language
    • Source Code Last Updated
    • Source Documentation
    • XQuery version
    • XQuery Update Facility support
    • Data storage
    • Index Support
    • Client libraries (PHP, Python, Java, etc.)
    • Mailing list/user forum last activity

Potential Issues for Public Comment

Public comments can be submitted according to this process at the following comment form.

  • how to map hcIdentifier used in HPD to II?
  • no date of birth in persons
  • do we need Org/Facility Contact to be required?
  • orgType, orgTypeDescription, orgPrimaryName are all required but perhaps should not be. Similarly for facility.
  • minOccurs on orgLanguage is not set, but is set for facilityLanguage
  • no place to add other identifiers for facility
  • should creation date be required?
  • can a facility have multiple facilityorgid entries? How to represent one facility in multiple hierarchies that are not necessarily mapped to services?
  • is a lastmodified time/create time for CSD doc needed?
  • CareServicesDiscoveryQuery/XQueryFunction has no ability to set parameters (e.g. nid).  Options would include:
    • for un-ordered named variables, something we define ourselves, such as <XQueryFunction name="lookup_by_nid.xq"><value name="nid">123</value></XQueryFunction>
    • XML-RPC for ordered, unnamed variables
    • some other structured XML that knows about XQuery data types (note: XQuery shares the same data types as XML Schema 1.0, XSD)
    • XForms data model?
    • the eXist-db RESTful web API, which has parameters for every query such as start, max, session-id, cache=yes/no
  • CareServicesDiscoveryQuery/XQueryResult/queryResult may not be valid XML. Should it always be CDATA?
  • do we need/want this: CareServicesDiscoveryQuery/XQueryResult/queryResult? Possibly only if the Accept header text/xml was sent by the client. What should the Accept header be if we want the raw result? Maybe XQueryFunction and XQueryExpression could have an optional attribute to specify the "return" style, e.g. raw or CSDQ format.
  • the element name encryptioinCertificate is misspelled in the XSD
  • query result error should have a code

Implementation Notes

Source code:

On the test server the repo lives under /home/ubuntu/CSD .

WebDAV + Subversion

Subversion is used to hold the source code and the source CSD documents. In deployment, the PHP source code and the CSD documents should not be in the same repository.

index.php, the PHP Glue

CareServicesConsumerQuery Using Different XQuery Engines

These POSTs are, by default, performed against a Zorba instance listening over TCP/IP at:

http://csd.ohie.org/CSD/index.php

To POST to Zorba over a Unix domain socket, use:

http://csd.ohie.org/CSD/index.php/zorba_uds

To POST to exist-db, use:

http://csd.ohie.org/CSD/index.php/exist

Searching Using Different XQuery Engines

All searches are, by default, performed against a Zorba instance listening over TCP/IP at:

http://csd.ohie.org/CSD/index.php/search/path/to/search_query.xq

To perform a search on Zorba over a Unix domain socket, use:

http://csd.ohie.org/CSD/index.php/search_zorba_uds/path/to/search_query.xq

To perform a search on exist-db, use:

http://csd.ohie.org/CSD/index.php/search_exist/path/to/search_query.xq

Zorba Implementation Notes

Zorba does not install exactly according to the instructions found online. In particular, make sure you:

  • run "sudo apt-get install swig" before running cmake
  • pass "-D CMAKE_INSTALL_PREFIX=/usr/local .." to cmake in the build directory


There is a server script to keep an instance of Zorba running with the CSD document loaded in memory. There are actually two: one that binds to a TCP/IP socket on localhost, which can be started with:

cd ~/CSD/zorba_server
php init.php > /tmp/zorba_server.out &

and one that opens an AF_UNIX socket at /tmp/zorba_server.sock:

cd ~/CSD/zorba_server
sudo -u www-data rm /tmp/zorba_server.sock
sudo -u www-data php init_uds.php > /tmp/zorba_server_uds.out &

Performance Testing

Currently there are no indices on the Zorba server's data store.

  • It can process about 90 NID look-ups per second through the web interface on localhost.
    • WebDAV lookups are cached in php-apc
  • Actually executing the query on Zorba takes, for example, 0.0078308582305908 seconds, so there is not too much overhead in the PHP script.
    • Caching compiled queries (currently set at 200 max) reduces the average response time (after the cache is warm) to 0.0058491230010986 seconds, which amounts to about 100 NID lookups per second.
    • Caching the result of a NID lookup in APC didn't seem to make much difference in response time.
  • Loading the CSD.xml document into Zorba's memory store required 2.5 MB.

Results on an Amazon m1.large instance:

$ httperf --hog --server localhost --uri /CSD/index.php/search/search_by_nid.xq?nid=1197580084757020         --num-conn 500
httperf --hog --client=0/1 --server=localhost --port=80 --uri=/CSD/index.php/search/search_by_nid.xq?nid=1197580084757020 --send-buffer=4096 --recv-buffer=16384 --num-conns=500 --num-calls=1
httperf: warning: open file limit > FD_SETSIZE; limiting max. # of open files to FD_SETSIZE
Maximum connect burst length: 1

Total: connections 500 requests 500 replies 500 test-duration 5.260 s

Connection rate: 95.1 conn/s (10.5 ms/conn, <=1 concurrent connections)
Connection time [ms]: min 5.1 avg 10.5 max 16.9 median 10.5 stddev 1.0
Connection time [ms]: connect 0.2
Connection length [replies/conn]: 1.000

Request rate: 95.1 req/s (10.5 ms/req)
Request size [B]: 120.0

Reply rate [replies/s]: min 95.0 avg 95.0 max 95.0 stddev 0.0 (1 samples)
Reply time [ms]: response 10.3 transfer 0.0
Reply size [B]: header 193.0 content 281.0 footer 0.0 (total 474.0)
Reply status: 1xx=0 2xx=500 3xx=0 4xx=0 5xx=0

CPU time [s]: user 0.98 system 3.38 (user 18.6% system 64.3% total 82.9%)
Net I/O: 55.1 KB/s (0.5*10^6 bps)

Errors: total 0 client-timo 0 socket-timo 0 connrefused 0 connreset 0
Errors: fd-unavail 0 addrunavail 0 ftab-full 0 other 0
