LargeInstanceProcessing

From XBRLWiki

Revision as of 14:47, 18 March 2014; Eric.jarry (Talk | contribs)

In progress...

Introduction

Several families of taxonomies have led to potentially large instances (e.g. more than a few tens of kilobytes, up to several gigabytes).

The taxonomies currently known to have this characteristic are:

Note: the European taxonomies are intended to be used by all countries of the European Union, and beyond.

The size of these instances is typically due to lists of details for things like loans, financial products or assets.

Some tests have been carried out and have revealed difficulties.

The subject is being addressed by XBRL International, in the Standards Board and the Best Practices Board, and the topic has been discussed at XBRL International conferences, in particular during the 24th XBRL Conference in Yokohama (December 2012):

A Working Group Note has been published by XBRL International. It mainly proposes adopting a streaming solution and a suitable structure for XBRL instances.

This Wiki is a forum where this topic can be freely discussed.

Types of difficulties

Difficulties may arise at different stages of processing an instance, when:

  • loading the taxonomy
  • generating the instance
  • signing the instance
  • transmitting the instance
  • parsing the instance
  • validating the instance
  • checking business rules
  • reporting errors
  • rendering the instance

Loading the taxonomy

In some cases, big instances correspond to big taxonomies.

When a Data Point Model appears in instances (the case of highly dimensional taxonomies), instances are bigger than for moderately dimensional taxonomies, where some dimensional aspects are hidden. This large set of dimensional elements also leads to a big taxonomy.

Sometimes it is necessary to split a taxonomy into several entry points to avoid an overly large DTS; this is the case of the CORE taxonomy, which had to be split into four parts.

In the case of multi-lingual taxonomies, like the European ones, the existence of labels in several languages also inflates the size of the taxonomy. Care must be taken to include only the labels used in a given country (there are 24 languages in the European Union, plus Norwegian and Icelandic).
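As an illustration of keeping only the labels needed in one country, here is a minimal Python sketch that prunes a label linkbase by `xml:lang`. The function name `keep_languages` is hypothetical, and the sketch assumes labels are plain `link:label` resources inside `link:labelLink` elements; a real taxonomy often ships one label linkbase per language, in which case simply not loading the unwanted files achieves the same result.

```python
import xml.etree.ElementTree as ET

LINK = "{http://www.xbrl.org/2003/linkbase}"
XLINK = "{http://www.w3.org/1999/xlink}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def keep_languages(linkbase_xml, wanted):
    """Return the linkbase with only the labels whose xml:lang is in `wanted`."""
    root = ET.fromstring(linkbase_xml)
    for label_link in list(root.iter(LINK + "labelLink")):
        # Drop label resources written in languages that are not wanted.
        for label in label_link.findall(LINK + "label"):
            if label.get(XML_LANG) not in wanted:
                label_link.remove(label)
        # Drop arcs that no longer point to any remaining label resource.
        remaining = {
            label.get(XLINK + "label")
            for label in label_link.findall(LINK + "label")
        }
        for arc in label_link.findall(LINK + "labelArc"):
            if arc.get(XLINK + "to") not in remaining:
                label_link.remove(arc)
    return ET.tostring(root, encoding="unicode")
```

Calling `keep_languages(text, {"fr"})` would keep only the French labels and the arcs that still lead to them.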

Generating the instance

The FRIS document puts constraints on the ordering of units and contexts, which should appear before facts, but this rule must be relaxed because it hinders the streaming of instances.

This aspect is covered by the Working Group Note.
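To illustrate why the ordering constraint matters for a generator, here is a minimal Python sketch (not taken from the Working Group Note) that writes each context just before the first fact that uses it, so the producer never needs the full list of contexts up front. The function name `write_instance`, the `ex` namespace, the identifier scheme and the row layout are all assumptions made for the example; schema references and units are omitted for brevity.

```python
import io
from xml.sax.saxutils import escape, quoteattr


def write_instance(out, rows):
    """Stream rows of (context_id, instant, concept, value) to `out`.

    A context is written the first time its id is seen, so facts can be
    emitted as soon as they are produced.
    """
    seen = set()  # only context ids are kept, not the facts themselves
    out.write(
        '<xbrl xmlns="http://www.xbrl.org/2003/instance" '
        'xmlns:ex="http://example.com/hypothetical">\n'
    )
    for context_id, instant, concept, value in rows:
        if context_id not in seen:
            seen.add(context_id)
            out.write(
                f"<context id={quoteattr(context_id)}>"
                '<entity><identifier scheme="http://example.com/scheme">X'
                "</identifier></entity>"
                f"<period><instant>{escape(instant)}</instant></period>"
                "</context>\n"
            )
        out.write(
            f"<ex:{concept} contextRef={quoteattr(context_id)}>"
            f"{escape(value)}</ex:{concept}>\n"
        )
    out.write("</xbrl>\n")
```

With a strict "all contexts first" rule, the same generator would have to read every row once to collect the contexts before writing the first fact.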

Signing the instance

Typically, supervisors require transmitted instances to be signed to ensure integrity and non-repudiation.

Sometimes it is also necessary to encrypt the instance to ensure confidentiality.

Security tools may have limitations, so adequate tools must be chosen.

It could also be possible to sign or encrypt a compressed file, but this would require a canonical compression algorithm.
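The need for a canonical compression can be seen with a small Python sketch. It uses a SHA-256 digest as a stand-in for the value a signature would cover (an assumption; no particular signature scheme is implied) and shows that the same instance compressed with different gzip settings yields different bytes. The function name `digest_of_compressed` is hypothetical.

```python
import gzip
import hashlib


def digest_of_compressed(data, mtime=0, level=9):
    """Digest of the gzip-compressed form of `data`.

    The gzip header stores a timestamp, so the compressed bytes (and any
    signature computed over them) depend on `mtime` as well as on the
    compression level and the compressor implementation.
    """
    compressed = gzip.compress(data, compresslevel=level, mtime=mtime)
    return hashlib.sha256(compressed).hexdigest()


instance = b"<xbrl>" + b"<fact>1</fact>" * 1000 + b"</xbrl>"

# Same content, same settings: the digests agree.
same = digest_of_compressed(instance) == digest_of_compressed(instance)
# Same content, different header timestamp: the digests differ.
differs = digest_of_compressed(instance, mtime=0) != digest_of_compressed(instance, mtime=1)
```

Both parties would therefore have to agree on every compression parameter, or sign the uncompressed instance instead.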

More work needed

Transmitting the instance

Sending a multi-gigabyte document may cause difficulties but should be possible (technologies exist to exchange video files of several gigabytes).

It is possible to transmit a compressed file, which should be much smaller given the high compression ratio of XML / XBRL files.
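A rough way to estimate the gain on a given kind of instance is to compress a sample and compare sizes. The Python sketch below does this on a synthetic, highly repetitive document; the element names and the helper names `synthetic_instance` and `compression_ratio` are made up for the example, and real instances may compress better or worse.

```python
import gzip


def synthetic_instance(n_facts):
    """Build a repetitive XBRL-like document (hypothetical element names)."""
    facts = "".join(
        f'<ex:Loan contextRef="c{i}" unitRef="EUR" decimals="0">'
        f"{i * 37 % 100000}</ex:Loan>\n"
        for i in range(n_facts)
    )
    return (
        '<xbrl xmlns="http://www.xbrl.org/2003/instance" '
        'xmlns:ex="http://example.com/hypothetical">\n'
        f"{facts}</xbrl>\n"
    ).encode("utf-8")


def compression_ratio(xml_bytes):
    """Uncompressed size divided by gzip-compressed size."""
    return len(xml_bytes) / len(gzip.compress(xml_bytes))


ratio = compression_ratio(synthetic_instance(2000))
```

The repeated tag and attribute names are what make the ratio high; the fact values themselves compress much less.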

Parsing the instance

This aspect is covered by the Working Group Note.

Validating the instance

In this section, validation means enforcement of the rules defined in XBRL 2.1 and XBRL Dimensions 1.0.

As far as memory is concerned, such validation may be done fact by fact, with no need to keep each fact in memory afterwards.

For dimensional validation, the context (or a representation of it) must be accessed; it is thus necessary to keep context-related information available.
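The two points above can be sketched in Python with the standard `xml.etree.ElementTree.iterparse`: facts are handed out one at a time and then discarded, while a compact summary of each context (here, only its explicit dimension members) is kept. This is a minimal sketch under several assumptions: the function name `stream_facts` is hypothetical, each context is assumed to appear before the facts that reference it, and tuples and typed dimensions are ignored.

```python
import io
import xml.etree.ElementTree as ET

XBRLI = "{http://www.xbrl.org/2003/instance}"
XBRLDI = "{http://xbrl.org/2006/xbrldi}"


def stream_facts(source):
    """Yield (concept_tag, value, dimensions) for each top-level fact.

    `dimensions` is a tuple of (dimension, member) pairs taken from the
    referenced context, or None if that context has not been seen yet.
    """
    contexts = {}  # context id -> tuple of (dimension, member) pairs
    root = None
    depth = 0
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            depth += 1
            if root is None:
                root = elem
            continue
        depth -= 1
        if depth != 1:
            continue  # only direct children of the root are handled
        if elem.tag == XBRLI + "context":
            contexts[elem.get("id")] = tuple(
                (m.get("dimension"), (m.text or "").strip())
                for m in elem.iter(XBRLDI + "explicitMember")
            )
        elif elem.get("contextRef") is not None:
            yield (
                elem.tag,
                (elem.text or "").strip(),
                contexts.get(elem.get("contextRef")),
            )
        root.clear()  # detach processed children so they can be freed
```

A validator would consume the generator and check each fact against the taxonomy as it arrives; only the `contexts` dictionary grows with the size of the instance.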

Checking business rules

Business checks are typically exercised through assertions (defined by the XBRL Formula specifications).

This is a difficult point for XBRL processors, which spend a lot of time on this task.

Software providers may propose optimisations in the way formulas are expressed (for example, removing unneeded filters, factoring out filters that are used several times, or putting expressions in variables).

Several optimisations may be considered (to be discussed):

Disposition of facts no longer needed

To process assertions, all information in the instance must be accessible, except for facts for which all assertion evaluations have already fired. For example, a fact that binds alone to an assertion (e.g. A > 0) no longer needs to be accessible for this assertion after it has fired.

If, for each fact considered, a reference count is initialised with the number of possible assertion evaluations concerning this fact and decremented each time such an evaluation fires, it would be possible to free the memory associated with the fact once the count reaches zero.

However, freeing memory for a single fact may have some drawbacks:

  • given the memory fragmentation, it may be suboptimal for languages that use a garbage collector like Java or C#;
  • computing the possible number of evaluations may be difficult, considering implicit filtering and fall-back values;
  • the memory consumption may be lower but the time taken to handle the reference count would increase the processing time.
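The reference-count idea can be sketched in a few lines of Python. The class `FactStore` and its methods are hypothetical, and the sketch assumes that the number of evaluations expected for each fact is known in advance, which is precisely the difficult part noted above.

```python
class FactStore:
    """Keep each fact only until every evaluation that needs it has fired."""

    def __init__(self):
        self._values = {}   # fact id -> value
        self._pending = {}  # fact id -> evaluations still expected

    def add(self, fact_id, value, expected_evaluations):
        # A fact that no assertion can bind to is never stored.
        if expected_evaluations > 0:
            self._values[fact_id] = value
            self._pending[fact_id] = expected_evaluations

    def get(self, fact_id):
        return self._values[fact_id]

    def evaluation_fired(self, fact_ids):
        """Record that one evaluation binding `fact_ids` has fired."""
        for fact_id in fact_ids:
            self._pending[fact_id] -= 1
            if self._pending[fact_id] == 0:
                # Last evaluation done: the fact can be released.
                del self._pending[fact_id]
                del self._values[fact_id]

    def __len__(self):
        return len(self._values)
```

For example, a fact A used only by the assertion A > 0 would be added with a count of 1 and released as soon as that evaluation fires, while a fact B used by two assertions would stay until both have fired. The bookkeeping in `evaluation_fired` is the extra processing cost mentioned in the last point of the list.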

Slicing the instance into reporting units

(to be provided)

Reporting errors

Rendering the instance