Comparing XML in RPS, eCTD


Recently, I was providing some training to a GlobalSubmit client and one of the participants asked me about an xml document that was present in a folder along with the sample eCTD that we use for training. The document was called “porp.xml”. I explained that GlobalSubmit’s VALIDATE product can transform eCTD into RPS and when it does, it produces the XML backbone used for RPS, which is called porp.xml. This is a single xml document that replaces eCTD’s index.xml, regional xml, and study tagging files.

The next question was “Why in the world is it called porp?” I couldn’t answer that one. But recently, I attended several training classes held by the font of all RPS knowledge, GlobalSubmit’s CTO Jason Rock. I took advantage of that opportunity to gain insight into what is different about RPS xml when compared to eCTD xml. [For the record, it’s called porp.xml because it had to be different than eCTD – and porp is a combination of po, the business domain within HL7, and rp for regulated product.]

But on to bigger topics. I, along with most other remotely technical people who have dabbled in eCTD for years, am pretty comfortable with eCTD xml. I can create the xml for a sequence by hand, and I can look at a sponsor’s xml and figure out what it represents and what is wrong with it. But when I look at the xml for RPS, all bets are off. Jason walked us through the xml in our training class and here are some of the observations that I made:

  • With RPS, any pretense of representing the xml as “human readable” is over. New code systems and levels of indirection make this almost impossible.
  • The table of contents is not readily apparent. It’s created by combining a content code and keywords to determine placement within the TOC or tree. For example, the location of a drug substance specification would be determined by a code representing its document type and then several keywords representing substance name and possibly manufacturer – not by nesting the document within a TOC section.
  • Everything is referenced by ID. For example, a Context of Use (sort of the replacement for a leaf) contains references to the ID of a content file, as well as any keywords necessary, such as a code representing a specific route of administration or species.  Instead of just looking at something like a study ID, you see a code representing the study ID which you must then locate.
  • IDs themselves are more complex. With RPS, IDs must be either an OID or a GUID. An OID is formed by taking a unique numeric string and appending additional digits in a unique fashion. A GUID is a string of 32 hexadecimal characters, such as {21EC2020-3AEA-1069-A2DD-08002B30309D}. Either way, they are much harder to identify visually than the simpler IDs used by most of today’s eCTD publishing tools.
  • Many new concepts are introduced, such as “Sets” (the equivalent of version trees in a document management system), explicit ordering of elements (a concept not present in eCTD, although many people assume it is), and status (active vs. obsolete, which applies to keywords as well as documents).
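
To make the indirection in the observations above a little more concrete, here is a rough Python sketch. It is not the RPS schema and not how any GlobalSubmit product works. The lookup tables, the codes (`cc-spec`, `kw-001`, `kw-002`), the dictionary keys, the file name, and both function names are invented for illustration, and the OID value used in the usage note is made up. The sketch only shows the general idea: check whether an ID looks like an OID or a GUID, then follow a Context of Use's references to find the file and work out where it would sit in the tree.

```python
import re
import uuid

# Dot-separated decimal arcs, the general shape of an OID.
OID_PATTERN = re.compile(r"[0-2](\.(0|[1-9]\d*))+")


def classify_id(value: str) -> str:
    """Return 'oid', 'guid', or 'invalid' for an identifier string."""
    if OID_PATTERN.fullmatch(value):
        return "oid"
    try:
        # uuid.UUID accepts the braced, hyphenated form shown in the post.
        uuid.UUID(value)
        return "guid"
    except ValueError:
        return "invalid"


# Hypothetical lookup tables. In a real RPS message these would be coded
# XML elements scattered through the backbone, not Python dictionaries.
CONTENT_CODES = {"cc-spec": "Drug Substance Specification"}
KEYWORDS = {
    "kw-001": ("substance", "Example Substance"),
    "kw-002": ("manufacturer", "Example Manufacturer"),
}
DOCUMENTS = {"{21EC2020-3AEA-1069-A2DD-08002B30309D}": "specification.pdf"}


def resolve_context_of_use(cou: dict) -> tuple:
    """Follow the references in a Context of Use to a TOC placement and a file."""
    heading = CONTENT_CODES[cou["content_code"]]
    labels = [KEYWORDS[ref][1] for ref in cou["keyword_refs"]]
    placement = " / ".join([heading] + labels)
    return placement, DOCUMENTS[cou["document_ref"]]


# A made-up Context of Use: nothing in it is readable until every
# reference has been looked up.
example_cou = {
    "content_code": "cc-spec",
    "keyword_refs": ["kw-001", "kw-002"],
    "document_ref": "{21EC2020-3AEA-1069-A2DD-08002B30309D}",
}
```

Calling `resolve_context_of_use(example_cou)` follows three separate lookups before anything human readable comes out, which is exactly the work your eyes have to do when reading the raw XML. `classify_id("2.999.1.5")` would report an OID and the braced string from the post would report a GUID, while a simple ID such as `study-123` would be rejected.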

The bottom line is that RPS XML is only for the brave. Most of those intrepid souls creating XML by hand for eCTD will have to give up that practice, and everyone will need to ensure that they have a publishing vendor who is highly knowledgeable concerning the standard and who is able to produce very high quality software.

For those of you who would like to see what RPS xml looks like, check out Example RPS Code: BLA Multiple Sequence, along with other Informational Documents (e.g., Plans, Rosters, RPS Technical Walkthrough, Implementation Guide), on the RPS wiki.

Author: GS
