Currently Polyglot book files do not contain any metadata. This is problematic for many reasons.
Since it is anticipated that the adoption of this header format will take some time, we have opted for a header consisting entirely of printable ASCII characters (with some qualification, see below). The header can therefore be inspected with a text editor or with a simple dump program such as od on Linux. In particular, the following command
od -w16 -a <file> | less
will display the header.
Applications that process Polyglot books as a whole (such as the Polyglot book merging utility) and that are unaware of the header may unintentionally corrupt it. The resulting file will nevertheless still behave correctly from the point of view of key lookup.
A broken header may be easily deleted and recreated with the pgheader utility which is further described below.
Polyglot as of version 1.4.70b is aware of the new header and will treat it correctly.
The header data will be embedded in null records. Chess positions that correspond to null keys are actually known (they were constructed by Peter Österlund) but the probability that such a position would occur in an actual chess game is totally negligible, and moreover the probability of a collision with another key in the book is much larger. Of course a header-aware application may simply regard a null key as invalid.
If there are no null records then the book is assumed to contain no header.
If there is no null character in the header data then the book is assumed to contain no header.
The logical header is a UTF-8 encoded Unicode character string (without byte order mark, see below). Note that a character string consisting of 7-bit ASCII characters is a valid UTF-8 string.
As the logical header may be arbitrarily long this may present problems for applications that use fixed length buffers.
An application may refuse to parse a header which it considers too long. However it should always be able to process a logical header of at most 2048 characters (including the null character).
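The rules above (null records, terminating null character, length limit) can be sketched in C as follows. This is only an illustration, not the code in pgheader.c: the function name and return codes are hypothetical, and it assumes that a book record is 16 bytes long, that the first 8 bytes are the key, that the remaining 8 bytes of a null record carry header data, and that the null records sit at the start of the file. The length limit is counted in bytes here.

```c
#include <stddef.h>
#include <string.h>

#define PG_RECORD_SIZE 16   /* assumed: 8-byte key followed by 8 data bytes */
#define PG_KEY_SIZE     8
#define PG_MAX_HEADER 2048  /* length every reader should be able to process */

/* Hypothetical helper, not part of pgheader.h.
 * Copies the logical header (including its terminating null character)
 * from the leading null records of `book` into `out`.
 * Returns the header length in bytes, excluding the terminator,
 * -1 if the book contains no header, -2 if the header does not fit. */
long extract_logical_header(const unsigned char *book, size_t book_size,
                            char *out, size_t out_size)
{
    static const unsigned char null_key[PG_KEY_SIZE] = {0};
    size_t used = 0;

    for (size_t off = 0; off + PG_RECORD_SIZE <= book_size;
         off += PG_RECORD_SIZE) {
        if (memcmp(book + off, null_key, PG_KEY_SIZE) != 0)
            break;                        /* first ordinary record reached */
        for (size_t i = PG_KEY_SIZE; i < PG_RECORD_SIZE; i++) {
            if (used == out_size)
                return -2;                /* refuse an overlong header */
            out[used++] = (char)book[off + i];
            if (book[off + i] == 0)
                return (long)(used - 1);  /* terminating null found */
        }
    }
    return -1;  /* no null records, or no null character in the data */
}
```

A caller would typically pass a buffer of PG_MAX_HEADER bytes and treat both negative results as "no usable header".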
A certain number of fields (depending on the format version) are predefined. The predefined fields should not contain leading or trailing spaces. Numbers are written in decimal form without leading zeros.
The definition of the first three fields is independent of the format version.
<major>.<minor>
where <major> and <minor> are respectively the major and minor version number written in decimal form. They should be non-negative integers and contain no leading zeros.
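As an illustration, a version field of this form could be validated as follows. The function names are hypothetical and not part of pgheader.h; overflow of very large numbers is not handled in this sketch.

```c
#include <ctype.h>

/* Parses a decimal number without leading zeros starting at *s.
 * On success stores the value, advances *s past the digits and returns 0;
 * otherwise returns -1. Overflow is not checked in this sketch. */
static int parse_number(const char **s, unsigned *value)
{
    const char *p = *s;
    unsigned v = 0;

    if (!isdigit((unsigned char)p[0]))
        return -1;                        /* empty or signed number */
    if (p[0] == '0' && isdigit((unsigned char)p[1]))
        return -1;                        /* leading zero */
    while (isdigit((unsigned char)*p)) {
        v = v * 10 + (unsigned)(*p - '0');
        p++;
    }
    *s = p;
    *value = v;
    return 0;
}

/* Hypothetical helper: parses "<major>.<minor>"; returns 0 on success. */
int parse_version(const char *field, unsigned *major, unsigned *minor)
{
    if (parse_number(&field, major) != 0 || *field != '.')
        return -1;
    field++;                              /* skip the dot */
    if (parse_number(&field, minor) != 0 || *field != '\0')
        return -1;                        /* trailing characters */
    return 0;
}
```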
Variant names should consist of printable ASCII characters and contain no spaces or upper case letters. For known variants the standard variant names from the Chess Engine Communication Protocol should be used.
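The syntactic part of this rule could be checked as in the following sketch. The function name is hypothetical, and rejecting an empty name is an assumption that the text above does not state. Whether a name is a known variant is a separate question (see pgheader_known_variant below).

```c
#include <stdbool.h>

/* Hypothetical helper: true if `name` consists only of printable ASCII
 * characters other than space and upper case letters. */
bool valid_variant_name(const char *name)
{
    if (*name == '\0')
        return false;                     /* assumption: empty name rejected */
    for (const unsigned char *p = (const unsigned char *)name; *p; p++) {
        if (*p <= 0x20 || *p > 0x7E)
            return false;                 /* control, space, DEL, non-ASCII */
        if (*p >= 'A' && *p <= 'Z')
            return false;                 /* upper case letter */
    }
    return true;
}
```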
Having zero variants is legal but the meaning of this is undefined in v1.0 of the format.
The non-predefined fields are free format. They should be regarded as comments and would typically include license information, author data, source files etc.
Currently the logical header is structured like a shallow tree. It is recommended to keep this tree-like format for further versions of the format according to the following Backus-Naur form
<header> := <magic>\n<version>\n<root-field>[\n<field>]*
<magic> := @PG@
<version> := <number>.<number>
<root-field> := <count>[\n<multi-field>]*
<multi-field> := [<root-field> | <field>]
<field> := <string>
<count> := <number>
where <string> is assumed to contain no linefeed characters and <count> is the total number of fields contained in the corresponding subtree (but not including the <count> field itself).
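One consequence of the <count> rule is that a reader can skip a whole subtree without understanding its contents. The following sketch, with a hypothetical function name, locates the first free-format field by skipping the version line, the root <count> line and the <count> fields that follow it. It does not distinguish a malformed header from one that simply has no free-format fields.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Returns a pointer just past the next linefeed, or NULL if there is none. */
static const char *next_line(const char *p)
{
    const char *nl = strchr(p, '\n');
    return nl ? nl + 1 : NULL;
}

/* Hypothetical helper: returns a pointer to the first free-format field of
 * a logical header, or NULL if the header is malformed or has no such field. */
const char *skip_predefined_fields(const char *header)
{
    const char *p = header;
    char *end;
    unsigned long count;

    if (strncmp(p, "@PG@\n", 5) != 0)
        return NULL;                      /* bad magic (this also rejects a BOM) */
    p = next_line(p + 5);                 /* skip the version line */
    if (p == NULL || *p < '0' || *p > '9')
        return NULL;
    count = strtoul(p, &end, 10);         /* root <count> */
    if (*end != '\n' && *end != '\0')
        return NULL;
    /* Skip the <count> line itself, then the fields of its subtree. */
    for (unsigned long i = 0; i <= count; i++) {
        p = next_line(p);
        if (p == NULL)
            return NULL;
    }
    return p;
}
```

Because only the root <count> is interpreted, the same code also skips predefined fields that a later format version might add to the subtree.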
For clarification it should also be pointed out that the header should not contain a byte order mark (BOM) since it breaks compatibility with 7-bit ASCII. But this issue is actually moot since a valid BOM would be at the beginning of the header, where normally the magic string is. So a header containing a BOM would be invalid.
A BOM is not necessary since by design UTF-8 has no endianness ambiguity and moreover the official specification for UTF-8 specifies a BOM as optional.
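Putting the grammar together, a version 1.0 logical header (magic string first, no BOM) might be assembled as in the sketch below. The function names are hypothetical and this is not the implementation of pgheader_create. It assumes the version 1.0 layout used in this document's examples: a root count, a variant count, the variant names, and then a single free-format comment.

```c
#include <stddef.h>
#include <stdio.h>

/* Appends "\n<text>" to out; returns 0 on success, -1 if it does not fit. */
static int append_field(char *out, size_t out_size, size_t *used,
                        const char *text)
{
    int n = snprintf(out + *used, out_size - *used, "\n%s", text);
    if (n < 0 || (size_t)n >= out_size - *used)
        return -1;
    *used += (size_t)n;
    return 0;
}

/* Hypothetical helper: writes a version 1.0 logical header into out.
 * `comment` may be NULL. Returns 0 on success, -1 if out is too small. */
int build_header_v1_0(char *out, size_t out_size,
                      const char *const *variants, size_t nvariants,
                      const char *comment)
{
    size_t used;
    /* The root count covers the variant count field plus the variant names. */
    int n = snprintf(out, out_size, "@PG@\n1.0\n%zu\n%zu",
                     nvariants + 1, nvariants);
    if (n < 0 || (size_t)n >= out_size)
        return -1;
    used = (size_t)n;
    for (size_t i = 0; i < nvariants; i++)
        if (append_field(out, out_size, &used, variants[i]) != 0)
            return -1;
    if (comment != NULL && append_field(out, out_size, &used, comment) != 0)
        return -1;
    return 0;
}
```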
"@PG@\n1.0\n3\n2\nnormal\nsuicide\n(normally comments here)"
In version 1.1 of the format it might perhaps be
"@PG@\n1.1\n4\n2\nnormal\nsuicide\n[somenewfield]\n(normally comments here)"
$ ./pgheader -h
pgheader <options> [<file>];
Update a header, adding a default one if necessary
<file> input file
Options:
-h print this help
-l print the known variant list
-s print the header
-S print the header data
-d delete the header
-v <variants> comma separated list of supported variants
-f force inclusion of unknown variants
-c <comment> free format string, may contain newlines encoded as
two character strings "\n"
The following command adds a comment to the very widely used polyglot
book "performance.bin" by Marc Lacrosse.
$ ./pgheader performance.bin -c "performance.bin by Marc Lacrosse."
We verify that the header has indeed been added.
$ ./pgheader -s performance.bin
Variants supported: normal
Comment: performance.bin by Marc Lacrosse.
Here is the actual header data as shown by "./pgheader -S performance.bin".
@ P G @ \n 1 . 0
\n 2 \n 1 \n n o r
m a l \n p e r f
o r m a n c e .
b i n b y M
a r c L a c r
o s s e . \0 \0 \0
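The dump above shows the logical header followed by null padding up to a multiple of 8 bytes. Assuming that each null record is 16 bytes long, with 8 null key bytes followed by 8 bytes of header data, the raw records could be produced from the logical header as in the following sketch. The function name is hypothetical and independent of pgheader_create_raw.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: packs a logical header into null records.
 * Each record is assumed to be 8 null key bytes followed by 8 data bytes.
 * Returns a malloc'ed buffer (caller frees) and stores its size in
 * *raw_size, or returns NULL on allocation failure. */
unsigned char *pack_raw_header(const char *header, size_t *raw_size)
{
    size_t len = strlen(header) + 1;      /* include the terminating null */
    size_t nrec = (len + 7) / 8;          /* 8 data bytes per record */
    unsigned char *raw = calloc(nrec, 16); /* zeroed: null keys and padding */

    if (raw == NULL)
        return NULL;
    for (size_t i = 0; i < len; i++)
        raw[(i / 8) * 16 + 8 + (i % 8)] = (unsigned char)header[i];
    *raw_size = nrec * 16;
    return raw;
}
```

For the performance.bin header shown above this yields 7 records (112 bytes), the last of which ends in the terminating null character and two padding bytes.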
The actual API is contained in the source files pgheader.h
and pgheader.c. It provides the following functions
int pgheader_known_variant(const char *variant);
int pgheader_detect(const char *infile);
int pgheader_create(char **header, const char *variants, const char *comment);
int pgheader_create_raw(char **raw_header, const char *header, unsigned int *size);
int pgheader_parse(const char *header, char **variants, char **comment);
int pgheader_read(char **header, const char *infile);
int pgheader_read_raw(char **raw_header, const char *infile, unsigned int *size);
int pgheader_write(const char *header, const char *infile, const char *outfile);
int pgheader_delete(const char *infile, const char *outfile);
const char * pgheader_strerror(int pgerror);
For instructions about using these functions see the comments in pgheader.h.
#------------------------------------------------------------------------------
# polyglot: file(1) magic polyglot chess opening book files
#
# From Michel Van den Bergh
0 string \x00\x00\x00\x00\x00\x00\x00\x00@PG@\x0a Polyglot chess opening book
>13 string 1.0\x00\x00\x00\x00\x00\x00\x00\x00\x0a (version 1.0)
!:mime application/x-polyglot
It should be appended to /usr/share/file/magic and the latter file should then be recompiled with file -C.