/*  Part of SWI-Prolog

    Author:        Jeffrey Rosenwald, extended by Peter Ludemann
    E-mail:        jeffrose@acm.org, peter.ludemann@gmail.com
    WWW:           http://www.swi-prolog.org
    Copyright (c)  2010-2013, Jeffrey Rosenwald
    All rights reserved.

    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions
    are met:

    1. Redistributions of source code must retain the above copyright
       notice, this list of conditions and the following disclaimer.

    2. Redistributions in binary form must reproduce the above copyright
       notice, this list of conditions and the following disclaimer in
       the documentation and/or other materials provided with the
       distribution.

    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
    FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
    COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
    INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
    BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
    LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
    CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
    LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
    ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
    POSSIBILITY OF SUCH DAMAGE.
*/

:- module(protobufs,
          [ protobuf_message/2,            % ?Template ?Codes
            protobuf_message/3,            % ?Template ?Codes ?Rest
            protobuf_parse_from_codes/3,   % +WireCodes, +MessageType, -Term
            protobuf_serialize_to_codes/3, % +Term, +MessageType, -WireCodes
            protobuf_field_is_map/2,       % +MessageType, +FieldName
            protobuf_map_pairs/3           % ?ProtobufTermList, ?DictTag, ?Pairs

            % TODO: Restore the following to the public interface, if
            %       someone needs them.  For now, the tests directly specify
            %       them using, e.g. protobufs:uint32_codes(..., ...).
            %
            % protobuf_segment_message/2,  % ?Segments ?Codes
            % protobuf_segment_convert/2,  % +Form1 ?Form2
            % uint32_codes/2,
            % int32_codes/2,
            % float32_codes/2,
            % uint64_codes/2,
            % int64_codes/2,
            % float64_codes/2,
            % int64_zigzag/2,
            % uint32_int32/2,
            % uint64_int64/2,
            % uint32_codes_when/2,
            % int32_codes_when/2,  % TODO: unused
            % float32_codes_when/2,
            % uint64_codes_when/2,
            % int64_codes_when/2,  % TODO: unused
            % float64_codes_when/2,
            % int64_zigzag_when/2,
            % uint32_int32_when/2,
            % uint64_int64_when/2,
            % int64_float64_when/2,
            % int32_float32_when/2,
            % protobuf_var_int//1,
            % protobuf_tag_type//2
          ]).

:- use_module(library(apply_macros)).  % autoload(library(apply), [maplist/3, foldl/4]).
:- autoload(library(error), [must_be/2, domain_error/2, existence_error/2]).
:- autoload(library(lists), [append/3]).
:- autoload(library(utf8), [utf8_codes//1]).
:- autoload(library(dif), [dif/2]).
:- autoload(library(dcg/high_order), [sequence//2]).
:- autoload(library(when), [when/2]).
:- autoload(library(debug), [assertion/1]). % TODO: remove

:- set_prolog_flag(optimise, true). % For arithmetic using is/2.

/** <module> Google's Protocol Buffers ("protobufs")

Protocol buffers are Google's language-neutral, platform-neutral,
extensible mechanism for serializing structured data -- think XML, but
smaller, faster, and simpler.
You define how you want your data to be
structured once. This takes the form of a template that describes the
data structure. You use this template to encode and decode your data
structure into wire-streams that may be sent-to or read-from your peers.
The underlying wire stream is platform independent, lossless, and may be
used to interwork with a variety of languages and systems regardless of
word size or endianness. Techniques exist to safely extend your data
structure without breaking deployed programs that are compiled against
the "old" format.

The idea behind Google's Protocol Buffers is that you define your
structured messages using a domain-specific language and tool
set. Further documentation on this is at
[https://developers.google.com/protocol-buffers](https://developers.google.com/protocol-buffers).

There are two ways you can use protobufs in Prolog:
  * with a compiled =|.proto|= file: protobuf_parse_from_codes/3 and
    protobuf_serialize_to_codes/3.
  * with a lower-level interface protobuf_message/2, which allows you
    to define your own domain-specific language for parsing and
    serializing protobufs.

The protobuf_parse_from_codes/3 and protobuf_serialize_to_codes/3
interface translates between a "wire stream" and a Prolog term. This
interface takes advantage of SWI-Prolog's
[dict](</pldoc/man?section=bidicts>).
There is a =protoc= plugin (=protoc-gen-swipl=) that generates a
Prolog file of meta-information that captures the =|.proto|= file's
definition in the =protobufs= module:
   * =|proto_meta_normalize(Unnormalized, Normalized)|=
   * =|proto_meta_package(Package, FileName, Options)|=
   * =|proto_meta_message_type(Fqn, Package, Name)|=
   * =|proto_meta_message_type_map_entry(Fqn)|=
   * =|proto_meta_field_name(Fqn, FieldNumber, FieldName, FqnName)|=
   * =|proto_meta_field_json_name(FqnName, JsonName)|=
   * =|proto_meta_field_label(FqnName, LabelRepeatOptional) % 'LABEL_OPTIONAL', 'LABEL_REQUIRED', 'LABEL_REPEATED'|=
   * =|proto_meta_field_type(FqnName, Type) % 'TYPE_INT32', 'TYPE_MESSAGE', etc|=
   * =|proto_meta_field_type_name(FqnName, TypeName)|=
   * =|proto_meta_field_default_value(FqnName, DefaultValue)|=
   * =|proto_meta_field_option_packed(FqnName)|=
   * =|proto_meta_enum_type(FqnName, Fqn, Name)|=
   * =|proto_meta_enum_value(FqnName, Name, Number)|=
   * =|proto_meta_field_oneof_index(FqnName, Index)|=
   * =|proto_meta_oneof(FqnName, Index, Name)|=

The protobuf_message/2 interface allows you to define your message
template as a list of predefined
Prolog terms that correspond to production rules in the Definite Clause
Grammar (DCG) that realizes the interpreter. Each production rule has an
equivalent rule in the protobuf grammar. The process is not unlike
specifying the format of a regular expression. To encode a template to
a wire-stream, you pass a grounded template, =X=, and variable, =Y=, to
protobuf_message/2. To decode a wire-stream, =Y=, you pass an ungrounded
template, =X=, along with a grounded wire-stream, =Y=, to
protobuf_message/2. The interpreter will unify the unbound variables in
the template with values decoded from the wire-stream.
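
As a minimal sketch of both directions (the field number =1=, the value
=150= and the predicate names =encode_demo/1= and =decode_demo/1= are
illustrative choices, not taken from any particular =|.proto|= file),
the template term =|unsigned(Tag, Value)|= describes a varint field:

```prolog
:- use_module(library(protobufs)).

% Encode: the template is ground, the wire-stream is unbound.
% Field number 1 with wire type varint gives the key byte 8;
% 150 is written as the two-byte varint [150,1].
encode_demo(Codes) :-
    protobuf_message(protobuf([unsigned(1, 150)]), Codes).

% Decode: the wire-stream is ground, the template carries a variable
% that is unified with the decoded value.
decode_demo(Value) :-
    protobuf_message(protobuf([unsigned(1, Value)]), [8,150,1]).
```

Calling =|encode_demo(Codes)|= gives =|Codes = [8,150,1]|=, and
=|decode_demo(Value)|= gives =|Value = 150|=.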

For an overview and tutorial with examples, see
[library(protobufs): Google's Protocol Buffers](#protobufs-main).
Examples of usage may also be found by inspecting
[[test_protobufs.pl][https://github.com/SWI-Prolog/contrib-protobufs/blob/master/test_protobufs.pl]]
and the
[[demo][https://github.com/SWI-Prolog/contrib-protobufs/tree/master/demo]]
directory, or by looking at the "addressbook" example that is typically
installed at
/usr/lib/swi-prolog/doc/packages/examples/protobufs/interop/addressbook.pl

@see https://developers.google.com/protocol-buffers
@see https://developers.google.com/protocol-buffers/docs/encoding
@author Jeffrey Rosenwald (JeffRose@acm.org)
@author Peter Ludemann (peter.ludemann@gmail.com)
@compat SWI-Prolog
*/

:- use_foreign_library(foreign(protobufs)).

%! protobuf_parse_from_codes(+WireCodes:list(int), +MessageType:atom, -Term) is semidet.
% Process bytes (list of int) that are the serialized form of a message (designated
% by =MessageType=), creating a Prolog term.
%
% =Protoc= must have been run (with the =|--swipl_out=|= option) and the resulting
% top-level _pb.pl file loaded. For more details, see the "protoc" section of the
% overview documentation.
%
% Fails if the message can't be parsed or if the appropriate meta-data from =protoc=
% hasn't been loaded.
%
% All fields that are omitted from the =WireCodes= are set to their
% default values (typically the empty string or 0, depending on the
% type; or =|[]|= for repeated groups). There is no way of testing
% whether a value was specified in =WireCodes= or given its default
% value (that is, there is no equivalent of the Python
% implementation's =HasField=). Optional embedded messages and groups
% do not have any default value -- you must check their existence by
% using get_dict/3 or similar.
% If a field is part of a "oneof" set,
% then none of the other fields is set. You can determine which field
% had a value by using get_dict/3.
%
% @tbd document the generated terms (see library(http/json) and json_read_dict/3)
% @tbd add options such as =true= and =value_string_as= (similar to json_read_dict/3)
% @tbd add option for form of the [dict](</pldoc/man?section=bidicts>) tags (fully qualified or not)
% @tbd add option for outputting fields in the C++/Python/Java order
%      (by field number rather than by field name).
%
% @bug Ignores =|.proto|= [extensions](https://developers.google.com/protocol-buffers/docs/proto#extensions).
% @bug =map= fields don't get special treatment (but see protobuf_map_pairs/3).
% @bug Generates fields in a different order from the C++, Python,
%      Java implementations, which use the field number to determine
%      field order whereas currently this implementation uses field
%      name. (This isn't strictly speaking a bug, because it's allowed
%      by the specification; but it might cause some surprise.)
%
% @param WireCodes Wire format of the message from e.g., read_stream_to_codes/2.
%          (The stream should have options `encoding(octet)` and `type(binary)`,
%          either as options to read_file_to_codes/3 or by calling set_stream/2
%          on the stream to read_stream_to_codes/2.)
% @param MessageType Fully qualified message name (from the =|.proto|= file's =package= and =message=).
%        For example, if the =package= is =google.protobuf= and the
%        message is =FileDescriptorSet=, then you would use
%        =|'.google.protobuf.FileDescriptorSet'|= or =|'google.protobuf.FileDescriptorSet'|=.
%        If there's no package name, use e.g.: =|'MyMessage'|= or =|'.MyMessage'|=.
%        You can see the packages by looking at
%        =|protobufs:proto_meta_package(Pkg,File,_)|=
%        and the message names and fields by
%        =|protobufs:proto_meta_field_name('.google.protobuf.FileDescriptorSet',
%        FieldNumber, FieldName, FqnName)|= (the initial '.' is not optional for these facts,
%        only for the top-level name given to protobuf_parse_from_codes/3).
% @param Term The generated term, as nested [dict](</pldoc/man?section=bidicts>)s.
% @see  [library(protobufs): Google's Protocol Buffers](#protobufs-serialize-to-codes)
% @error version_error(Module-Version) you need to recompile the =Module=
%        with a newer version of =|protoc|=.
protobuf_parse_from_codes(WireCodes, MessageType0, Term) :-
    verify_version,
    must_be(ground, MessageType0),
    (   proto_meta_normalize(MessageType0, MessageType)
    ->  true
    ;   existence_error(protobuf_package, MessageType0)
    ),
    protobuf_segment_message(Segments, WireCodes),
    % protobuf_segment_message/2 can leave choicepoints, and backtracking
    % through all the possibilities would cause a combinatoric explosion;
    % instead, segment_to_term/3 calls protobuf_segment_convert/2 to
    % change segments that were guessed incorrectly.
    !,
    maplist(segment_to_term(MessageType), Segments, MsgFields),
    !, % TODO: remove
    combine_fields(MsgFields, MessageType{}, Term),
    !. % TODO: remove? - but proto_meta might have left choicepoints if loaded twice

verify_version :-
    (   protoc_gen_swipl_version(Module, Version),
        Version @< [0,9,1] % This must be sync-ed with changes to protoc-gen-swipl
    ->  throw(error(version_error(Module-Version), _))
    ;   true
    ).

%! protobuf_serialize_to_codes(+Term:dict, +MessageType:atom, -WireCodes:list(int)) is det.
% Process a Prolog term into bytes (list of int) that are the serialized form of a
% message (designated by =MessageType=).
%
% =Protoc= must have been run (with the =|--swipl_out=|= option) and the resulting
% top-level _pb.pl file loaded. For more details, see the "protoc" section of the
% overview documentation.
%
% Fails if the term isn't of an appropriate form or if the appropriate
% meta-data from =protoc= hasn't been loaded, or if a field name is incorrect
% (and therefore nothing in the meta-data matches it).
%
% @bug =map= fields don't get special treatment (but see protobuf_map_pairs/3).
% @bug =oneof= is not checked for validity.
%
% @param Term The Prolog form of the data, as nested [dict](</pldoc/man?section=bidicts>)s.
% @param MessageType Fully qualified message name (from the =|.proto|= file's =package= and =message=).
%        For example, if the =package= is =google.protobuf= and the
%        message is =FileDescriptorSet=, then you would use
%        =|'.google.protobuf.FileDescriptorSet'|= or =|'google.protobuf.FileDescriptorSet'|=.
%        If there's no package name, use e.g.: =|'MyMessage'|= or =|'.MyMessage'|=.
%        You can see the packages by looking at
%        =|protobufs:proto_meta_package(Pkg,File,_)|=
%        and the message names and fields by
%        =|protobufs:proto_meta_field_name('.google.protobuf.FileDescriptorSet',
%        FieldNumber, FieldName, FqnName)|= (the initial '.' is not optional for these facts,
%        only for the top-level name given to protobuf_serialize_to_codes/3).
% @param WireCodes Wire format of the message, which can be output using
%        =|format('~s', [WireCodes])|=.
% @see [library(protobufs): Google's Protocol Buffers](#protobufs-serialize-to-codes)
% @error version_error(Module-Version) you need to recompile the =Module=
%        with a newer version of =|protoc|=.
% @error existence_error if a field can't be found in the meta-data
protobuf_serialize_to_codes(Term, MessageType0, WireCodes) :-
    verify_version,
    must_be(ground, MessageType0),
    (   proto_meta_normalize(MessageType0, MessageType)
    ->  true
    ;   existence_error(protobuf_package, MessageType0)
    ),
    term_to_segments(Term, MessageType, Segments),
    !, % TODO: remove
    protobuf_segment_message(Segments, WireCodes),
    !. % TODO: remove? - but proto_meta might have left choicepoints if loaded twice

%
% Map wire type (atom) to its encoding (an int)
%
wire_type(varint,           0). % for int32, int64, uint32, uint64, sint32, sint64, bool, enum
wire_type(fixed64,          1). % for fixed64, sfixed64, double
wire_type(length_delimited, 2). % for string, bytes, embedded messages, packed repeated fields
wire_type(start_group,      3). % for groups (deprecated)
wire_type(end_group,        4). % for groups (deprecated)
wire_type(fixed32,          5). % for fixed32, sfixed32, float

%
%  basic wire-type processing handled by C-support code in DCG-form
%

fixed_uint32(X, [A0, A1, A2, A3 | Rest], Rest) :-
    uint32_codes_when(X, [A0, A1, A2, A3]).
/* equivalent to:
fixed_uint32(X) -->
  [ A0,A1,A2,A3 ],
  { uint32_codes_when(X, [A0,A1,A2,A3]) }.
*/

fixed_uint64(X, [A0, A1, A2, A3, A4, A5, A6, A7 | Rest], Rest) :-
    uint64_codes_when(X, [A0, A1, A2, A3, A4, A5, A6, A7]).

fixed_float64(X, [A0, A1, A2, A3, A4, A5, A6, A7 | Rest], Rest) :-
    float64_codes_when(X, [A0, A1, A2, A3, A4, A5, A6, A7]).

fixed_float32(X, [A0, A1, A2, A3 | Rest], Rest) :-
    float32_codes_when(X, [A0, A1, A2, A3]).

%
%   Start of the DCG
%

code_string(N, Codes, Rest, Rest1) :-
    length(Codes, N),
    append(Codes, Rest1, Rest),
    !.
/*
code_string(N, Codes) -->
        { length(Codes, N) },
        Codes, !.
*/

%
% deal with Google's method of packing unsigned integers in variable
% length, modulo 128 strings.
%
% protobuf_var_int//1 and protobuf_tag_type//2 productions were rewritten in straight
% Prolog for speed's sake.
%

%! protobuf_var_int(?A:int)// is det.
% Conversion between an int A and a list of codes, using the
% "varint" encoding.
% The behavior is undefined if =A= is negative.
% This is a low-level predicate; normally, you should use
% protobuf_message/2 and the appropriate template term.
% e.g. phrase(protobuf_var_int(300), S) => S = [172,2]
%      phrase(protobuf_var_int(A), [172,2]) -> A = 300
protobuf_var_int(A, [A | Rest], Rest) :-
    A < 128,
    !.
protobuf_var_int(X, [A | Rest], Rest1) :-
    nonvar(X),
    X1 is X >> 7,
    A is 128 + (X /\ 0x7f),
    protobuf_var_int(X1, Rest, Rest1),
    !.
protobuf_var_int(X, [A | Rest], Rest1) :-
    protobuf_var_int(X1, Rest, Rest1),
    X is (X1 << 7) + A - 128,
    !.

%! protobuf_tag_type(?Tag:int, ?WireType:atom)// is det.
% Conversion between Tag (number) + WireType and wirestream codes.
% This is a low-level predicate; normally, you should use
% protobuf_message/2 and the appropriate template term.
% @arg Tag The item's tag (field number)
% @arg WireType The item's wire type (see prolog_type//2 for how to
%               convert this to a Prolog type)
protobuf_tag_type(Tag, WireType, Rest, Rest1) :-
    nonvar(Tag), nonvar(WireType),
    wire_type(WireType, WireTypeEncoding),
    A is Tag << 3 \/ WireTypeEncoding,
    protobuf_var_int(A, Rest, Rest1),
    !.
protobuf_tag_type(Tag, WireType, Rest, Rest1) :-
    protobuf_var_int(A, Rest, Rest1),
    WireTypeEncoding is A /\ 0x07,
    wire_type(WireType, WireTypeEncoding),
    Tag is A >> 3.

%! prolog_type(?Tag:int, ?PrologType:atom)// is semidet.
% Match Tag (field number) + PrologType.
% When PrologType is a variable, backtracks through all the possibilities
% for a given wire encoding.
% Note that 'repeated' isn't here because it's handled by single_message//3.
% See also segment_type_tag/3.
prolog_type(Tag, double) -->     protobuf_tag_type(Tag, fixed64).
prolog_type(Tag, integer64) -->  protobuf_tag_type(Tag, fixed64).
prolog_type(Tag, unsigned64) --> protobuf_tag_type(Tag, fixed64).
prolog_type(Tag, float) -->      protobuf_tag_type(Tag, fixed32).
prolog_type(Tag, integer32) -->  protobuf_tag_type(Tag, fixed32).
prolog_type(Tag, unsigned32) --> protobuf_tag_type(Tag, fixed32).
prolog_type(Tag, integer) -->    protobuf_tag_type(Tag, varint).
prolog_type(Tag, unsigned) -->   protobuf_tag_type(Tag, varint).
prolog_type(Tag, signed32) -->   protobuf_tag_type(Tag, varint).
prolog_type(Tag, signed64) -->   protobuf_tag_type(Tag, varint).
prolog_type(Tag, boolean) -->    protobuf_tag_type(Tag, varint).
prolog_type(Tag, enum) -->       protobuf_tag_type(Tag, varint).
prolog_type(Tag, atom) -->       protobuf_tag_type(Tag, length_delimited).
prolog_type(Tag, codes) -->      protobuf_tag_type(Tag, length_delimited).
prolog_type(Tag, utf8_codes) --> protobuf_tag_type(Tag, length_delimited).
prolog_type(Tag, string) -->     protobuf_tag_type(Tag, length_delimited).
prolog_type(Tag, embedded) -->   protobuf_tag_type(Tag, length_delimited).
prolog_type(Tag, packed) -->     protobuf_tag_type(Tag, length_delimited).

%
%   The protobuf-2.1.0 grammar allows negative values in enums.
%   But they are encoded as unsigned in the golden message.
%   As such, they use the maximum length of a varint, so it is
%   recommended that they be non-negative. However, that's controlled
%   by the =|.proto|= file.
%
:- meta_predicate enumeration(:, ?, ?).

enumeration(Type) -->
    { call(Type, Value) },
    payload(signed64, Value).

%! payload(?PrologType, ?Payload)// is det.
% Process the codes into =Payload=, according to =PrologType=
% TODO: payload//2 "mode" is sometimes module-sensitive, sometimes not.
%       payload(enum, A)// has A as a callable
%       all other uses of payload//2, the 2nd arg is not callable.
%     - This confuses check/0; it also makes defining an enumeration
%       more difficult because it has to be defined in module protobufs
%       (see vector_demo.pl, which defines protobufs:commands/2)
payload(enum, Payload) -->
    enumeration(Payload).
payload(double, Payload) -->
    fixed_float64(Payload).
payload(integer64, Payload) -->
    { uint64_int64_when(Payload0, Payload) },
    fixed_uint64(Payload0).
payload(unsigned64, Payload) -->
    fixed_uint64(Payload).
payload(float, Payload) -->
    fixed_float32(Payload).
payload(integer32, Payload) -->
    { uint32_int32_when(Payload0, Payload) },
    fixed_uint32(Payload0).
payload(unsigned32, Payload) -->
    fixed_uint32(Payload).
payload(integer, Payload) -->
    { nonvar(Payload), int64_zigzag(Payload, X) }, % TODO: int64_zigzag_when/2
    !,
    protobuf_var_int(X).
payload(integer, Payload) -->
    protobuf_var_int(X),
    { int64_zigzag(Payload, X) }. % TODO: int64_zigzag_when/2
payload(unsigned, Payload) -->
    protobuf_var_int(Payload),
    { Payload >= 0 }.
payload(signed32, Payload) --> % signed32 is not defined by prolog_type//2
                               % for wire-stream compatibility reasons.
    % signed32 ought to write 5 bytes for negative numbers, but both
    % the C++ and Python implementations write 10 bytes. For
    % wire-stream compatibility, we follow C++ and Python, even though
    % protoc decode appears to work just fine with 5 bytes --
    % presumably there are some issues with decoding 5 bytes and
    % getting the sign extension correct with some 32/64-bit integer
    % models.  See CodedOutputStream::WriteVarint32SignExtended(int32
    % value) in google/protobuf/io/coded_stream.h.
    payload(signed64, Payload).
payload(signed64, Payload) -->
    % protobuf_var_int//1 cannot handle negative numbers (note that
    % zig-zag encoding always results in a positive number), so
    % compute the 64-bit 2s complement, which is what is produced
    % by C++ and Python.
    { nonvar(Payload) },
    !,
    { uint64_int64(X, Payload) }, % TODO: uint64_int64_when
    protobuf_var_int(X).
payload(signed64, Payload) -->
    % See comment in previous clause about negative numbers.
    protobuf_var_int(X),
    { uint64_int64(X, Payload) }. % TODO: uint64_int64_when
payload(codes, Payload) -->
    { nonvar(Payload),
      !,
      length(Payload, Len)
    },
    protobuf_var_int(Len),
    code_string(Len, Payload).
payload(codes, Payload) -->
    protobuf_var_int(Len),
    code_string(Len, Payload).
payload(utf8_codes, Payload) -->
    { nonvar(Payload), % TODO: use freeze/2 or when/2
      !,
      phrase(utf8_codes(Payload), B)
    },
    payload(codes, B).
payload(utf8_codes, Payload) -->
    payload(codes, B),
    { phrase(utf8_codes(Payload), B) }.
payload(atom, Payload) -->
    { nonvar(Payload),
      atom_codes(Payload, Codes)
    },
    payload(utf8_codes, Codes),
    !.
payload(atom, Payload) -->
    payload(utf8_codes, Codes),
    { atom_codes(Payload, Codes) }.
payload(boolean, true) -->
    payload(unsigned, 1).
payload(boolean, false) -->
    payload(unsigned, 0).
payload(string, Payload) -->
    {   nonvar(Payload)
    ->  string_codes(Payload, Codes)
    ;   true
    },
    % string_codes produces a list of unicode, not bytes
    payload(utf8_codes, Codes),
    { string_codes(Payload, Codes) }.
payload(embedded, protobuf(PayloadSeq)) -->
    { ground(PayloadSeq),
      phrase(protobuf(PayloadSeq), Codes)
    },
    payload(codes, Codes),
    !.
payload(embedded, protobuf(PayloadSeq)) -->
    payload(codes, Codes),
    { phrase(protobuf(PayloadSeq), Codes) }.
payload(packed, TypedPayloadSeq) -->
    { TypedPayloadSeq =..
          [PrologType, PayloadSeq],  % TypedPayloadSeq = PrologType(PayloadSeq)
      ground(PayloadSeq),
      phrase(packed_payload(PrologType, PayloadSeq), Codes)
    },
    payload(codes, Codes),
    !.
payload(packed, enum(EnumSeq)) -->
    !,
    % TODO: combine with next clause
    % TODO: replace =.. with a predicate that gives all the possibilities - see detag/6.
    { EnumSeq =.. [ Enum, Values ] }, % EnumSeq = Enum(Values)
    payload(codes, Codes),
    { phrase(packed_enum(Enum, Values), Codes) }.
payload(packed, TypedPayloadSeq) -->
    payload(codes, Codes),
    % TODO: replace =.. with a predicate that gives all the possibilities - see detag/6.
    { TypedPayloadSeq =.. [PrologType, PayloadSeq] },  % TypedPayloadSeq = PrologType(PayloadSeq)
    { phrase(packed_payload(PrologType, PayloadSeq), Codes) }.

packed_payload(enum, EnumSeq) -->
    { ground(EnumSeq) }, !,
    { EnumSeq =.. [EnumType, Values] }, % EnumSeq = EnumType(Values)
    packed_enum(EnumType, Values).
packed_payload(PrologType, PayloadSeq) -->
    sequence_payload(PrologType, PayloadSeq).

% sequence_payload//2 (because sequence//2 isn't compile-time expanded)
sequence_payload(PrologType, PayloadSeq) -->
    sequence_payload_(PayloadSeq, PrologType).

sequence_payload_([], _PrologType) --> [ ].
sequence_payload_([Payload|PayloadSeq], PrologType) -->
    payload(PrologType, Payload),
    sequence_payload_(PayloadSeq, PrologType).

packed_enum(Enum, [ A | As ]) -->
    % TODO: replace =.. with a predicate that gives all the possibilities - see detag/6.
    { E =.. [Enum, A] },
    payload(enum, E),
    packed_enum(Enum, As).
packed_enum(_, []) --> [ ].

start_group(Tag) --> protobuf_tag_type(Tag, start_group).

end_group(Tag) --> protobuf_tag_type(Tag, end_group).
%
%
nothing([]) --> [], !.

protobuf([Field | Fields]) -->
    % TODO: don't use =..
    %       -- move logic to single_message
    (   { Field = repeated_embedded(Tag, protobuf(EmbeddedFields), Items) }
    ->  repeated_embedded_messages(Tag, EmbeddedFields, Items)
    ;   { Field =.. [ PrologType, Tag, Payload] },  % Field = PrologType(Tag, Payload)
        single_message(PrologType, Tag, Payload),
        (   protobuf(Fields)
        ;   nothing(Fields)
        )
    ),
    !.

repeated_message(repeated_enum, Tag, Type, [A | B]) -->
    % TODO: replace =.. with a predicate that gives all the possibilities - see detag/6.
    { TypedPayload =.. [Type, A] },  % TypedPayload = Type(A)
    single_message(enum, Tag, TypedPayload),
    (   repeated_message(repeated_enum, Tag, Type, B)
    ;   nothing(B)
    ).
repeated_message(Type, Tag, [A | B]) -->
    { Type \= repeated_enum },
    single_message(Type, Tag, A),
    repeated_message(Type, Tag, B).
repeated_message(_Type, _Tag, A) -->
    nothing(A).

repeated_embedded_messages(Tag, EmbeddedFields, [protobuf(A) | B]) -->
    { copy_term(EmbeddedFields, A) },
    single_message(embedded, Tag, protobuf(A)), !,
    repeated_embedded_messages(Tag, EmbeddedFields, B).
repeated_embedded_messages(_Tag, _EmbeddedFields, []) -->
    [ ].

%! single_message(+PrologType:atom, ?Tag, ?Payload)// is det.
% Processes a single messages (e.g., one item in the list in protobuf([...]).
% The PrologType, Tag, Payload are from Field =.. [PrologType, Tag, Payload]
% in the caller
single_message(repeated, Tag, enum(EnumSeq)) -->
    !,
    { EnumSeq =.. [EnumType, Values] },  % EnumSeq = EnumType(Values)
    repeated_message(repeated_enum, Tag, EnumType, Values).
single_message(repeated, Tag, Payload) -->
    !,
    % TODO: replace =.. with a predicate that gives all the possibilities - see detag/6.
    { Payload =.. [PrologType, A] },  % Payload = PrologType(A)
    { PrologType \= enum },
    repeated_message(PrologType, Tag, A).
single_message(group, Tag, A) -->
    !,
    start_group(Tag),
    protobuf(A),
    end_group(Tag).
single_message(PrologType, Tag, Payload) -->
    { PrologType \= repeated, PrologType \= group },
    prolog_type(Tag, PrologType),
    payload(PrologType, Payload).

%!  protobuf_message(?Template, ?WireStream) is semidet.
%!  protobuf_message(?Template, ?WireStream, ?Rest) is nondet.
%
%   Marshals and unmarshals byte streams encoded using Google's
%   Protobuf grammars. protobuf_message/2 provides a bi-directional
%   parser that marshals a Prolog structure to WireStream, according
%   to rules specified by Template. It can also unmarshal WireStream
%   into a Prolog structure according to the same grammar.
%   protobuf_message/3 provides a difference list version.
%
%   @bug The protobuf specification states that the wire-stream can have
%   the fields in any order and that unknown fields are to be ignored.
%   This implementation assumes that the fields are in the exact order
%   of the definition and match exactly. If you use
%   protobuf_parse_from_codes/3, you can avoid this problem.
%
%   @param Template is a protobuf grammar specification. On decode,
%   unbound variables in the Template are unified with their respective
%   values in the WireStream. On encode, Template must be ground.
%
%   @param WireStream is a code list that was generated by a protobuf
%   encoder using an equivalent template.

protobuf_message(protobuf(TemplateList), WireStream) :-
    must_be(list, TemplateList),
    phrase(protobuf(TemplateList), WireStream),
    !.

protobuf_message(protobuf(TemplateList), WireStream, Residue) :-
    must_be(list, TemplateList),
    phrase(protobuf(TemplateList), WireStream, Residue).

%! protobuf_segment_message(+Segments:list, -WireStream:list(int)) is det.
%! protobuf_segment_message(-Segments:list, +WireStream:list(int)) is det.
%
%  Low level marshalling and unmarshalling of byte streams. The
%  processing is independent of the =|.proto|= description, similar to
%  the processing done by =|protoc --decode_raw|=. This means that
%  field names aren't shown, only field numbers.
%
%  For unmarshalling, a simple heuristic is used on length-delimited
%  segments: first interpret it as a message; if that fails, try to
%  interpret as a UTF8 string; otherwise, leave it as a "blob" (if the
%  heuristic was wrong, you can convert to a string or a blob by using
%  protobuf_segment_convert/2).  32-bit and 64-bit numbers are left as
%  codes because they could be either integers or floating point (use
%  int32_codes_when/2, float32_codes_when/2, int64_codes_when/2,
%  uint32_codes_when/2, uint64_codes_when/2, float64_codes_when/2 as
%  appropriate); variable-length numbers ("varint" in the [[Protocol
%  Buffers encoding
%  documentation][https://developers.google.com/protocol-buffers/docs/encoding#varints]]),
%  might require "zigzag" conversion, int64_zigzag_when/2.
%
%  For marshalling, use the predicates int32_codes_when/2,
%  float32_codes_when/2, int64_codes_when/2, uint32_codes_when/2,
%  uint64_codes_when/2, float64_codes_when/2, int64_zigzag_when/2 to
%  put integer and floating point values into the appropriate form.
%
%  @bug This predicate is preliminary and may change as additional
%       functionality is added.
%
%  @param Segments a list containing terms of the following form (=Tag= is
%  the field number; =Codes= is a list of integers):
%    * varint(Tag,Varint) - =Varint= may need int64_zigzag_when/2
%    * fixed64(Tag,Int) - =Int= is signed, derived from the 8 codes
%    * fixed32(Tag,Int) - =Int= is signed, derived from the 4 codes
%    * message(Tag,Segments)
%    * group(Tag,Segments)
%    * string(Tag,String) - =String= is a SWI-Prolog string
%    * packed(Tag,Type(Scalars)) - =Type= is one of
%              =varint=, =fixed64=, =fixed32=; =Scalars=
%              is a list of =Varint= or =Codes=, which should
%              be interpreted as described under those items.
%              Note that the protobuf specification does not
%              allow packed repeated string.
%    * length_delimited(Tag,Codes)
%    * repeated(List) - =List= of segments
%  Of these, =group= is deprecated in the protobuf documentation and
%  shouldn't appear in modern code, having been superseded by nested
%  message types.
%
%  For deciding how to interpret a length-delimited item (when
%  =Segments= is a variable), an attempt is made to parse the item in
%  the following order (although code should not rely on this order):
%    * message
%    * string (it must be in the form of a UTF-8 string)
%    * packed (which can backtrack through the various =Type=s)
%    * length_delimited - which always is possible.
%
%  The interpretation of length-delimited items can sometimes guess
%  wrong; the interpretation can be undone by either backtracking or
%  by using protobuf_segment_convert/2 to convert the incorrect
%  segment to a string or a list of codes.  Backtracking through all
%  the possibilities is not recommended, because of combinatoric
%  explosion (there is an example in the unit tests); instead, it is
%  suggested that you take the first result and iterate through its
%  items, calling protobuf_segment_convert/2 as needed to reinterpret
%  incorrectly guessed segments.
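%
%  As a minimal sketch of unmarshalling without any =|.proto|=
%  description (the three-byte wire-stream is an illustrative input;
%  the module qualifier is needed because this predicate is not
%  exported, and once/1 is used here only to commit to the first
%  interpretation):
%
%  ```prolog
%  ?- once(protobufs:protobuf_segment_message(Segments, [8,150,1])).
%  Segments = [varint(1, 150)].
%  ```
%
%  The key byte 8 decodes to field number 1 with wire type =varint=,
%  and the bytes =|[150,1]|= decode to the value 150.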
%
%  @param WireStream a code list that was generated by a protobuf
%  encoder.
%
%  @see https://developers.google.com/protocol-buffers/docs/encoding
protobuf_segment_message(Segments, WireStream) :-
    phrase(segment_message(Segments), WireStream).

segment_message(Segments) -->
    sequence_segment(Segments).

% sequence_segment//1 (because sequence//2 isn't compile-time expanded)
sequence_segment([]) --> [ ].
sequence_segment([Segment|Segments]) -->
    segment(Segment),
    sequence_segment(Segments).

segment(Segment) -->
    { nonvar(Segment) },
    !,
    % repeated(List) can be created by field_segment_scalar_or_repeated/7
    (   { Segment = repeated(Segments) }
    ->  sequence_segment(Segments)
    ;   { segment_type_tag(Segment, Type, Tag) },
        protobuf_tag_type(Tag, Type),
        segment(Type, Tag, Segment)
    ).
segment(Segment) -->
    % { var(Segment) },
    protobuf_tag_type(Tag, Type),
    segment(Type, Tag, Segment).

segment(varint, Tag, varint(Tag,Value)) -->
    protobuf_var_int(Value).
segment(fixed64, Tag, fixed64(Tag, Int64)) -->
    payload(integer64, Int64).
segment(fixed32, Tag, fixed32(Tag, Int32)) -->
    payload(integer32, Int32).
segment(start_group, Tag, group(Tag, Segments)) -->
    segment_message(Segments),
    protobuf_tag_type(Tag, end_group).
segment(length_delimited, Tag, Result) -->
    segment_length_delimited(Tag, Result).

segment_length_delimited(Tag, Result) -->
    { nonvar(Result) },
    !,
    { length_delimited_segment(Result, Tag, Codes) },
    { length(Codes, CodesLen) },
    protobuf_var_int(CodesLen),
    code_string(CodesLen, Codes).
segment_length_delimited(Tag, Result) -->
    % { var(Result) },
    protobuf_var_int(CodesLen),
    code_string(CodesLen, Codes),
    { length_delimited_segment(Result, Tag, Codes) }.

length_delimited_segment(message(Tag,Segments), Tag, Codes) :-
    protobuf_segment_message(Segments, Codes).
length_delimited_segment(group(Tag,Segments), Tag, Codes) :-
    phrase(segment_group(Tag, Segments), Codes).
length_delimited_segment(string(Tag,String), Tag, Codes) :-
    (   nonvar(String)
    ->  string_codes(String, StringCodes),
        phrase(utf8_codes(StringCodes), Codes)
    ;   phrase(utf8_codes(StringCodes), Codes),
        string_codes(String, StringCodes)
    ).
length_delimited_segment(packed(Tag,Payload), Tag, Codes) :-
    % We don't know the type of the fields, so we try the 3
    % possibilities. This has a problem: an even number of fixed32
    % items can't be distinguished from half the number of fixed64
    % items; but it's all we can do. The good news is that usually
    % varint (possibly with zig-zag encoding) is more common because
    % it's more compact (I don't know whether 32-bit or 64-bit is more
    % common for floating point).
    packed_option(Type, Items, Payload),
    phrase(sequence_payload(Type, Items), Codes).
length_delimited_segment(length_delimited(Tag,Codes), Tag, Codes).

segment_group(Tag, Segments) -->
    start_group(Tag),
    segment_message(Segments),
    end_group(Tag).

% See also prolog_type//2. Note that this doesn't handle repeated(List),
% which is used internally (see field_segment_scalar_or_repeated/7).
segment_type_tag(varint(Tag,_Value), varint, Tag).
segment_type_tag(fixed64(Tag,_Value), fixed64, Tag).
segment_type_tag(group(Tag,_Segments), start_group, Tag).
segment_type_tag(fixed32(Tag,_Value), fixed32, Tag).
segment_type_tag(length_delimited(Tag,_Codes), length_delimited, Tag).
segment_type_tag(message(Tag,_Segments), length_delimited, Tag).
segment_type_tag(packed(Tag,_Payload), length_delimited, Tag).
segment_type_tag(string(Tag,_String), length_delimited, Tag).

%! detag(+Compound, -Name, -Tag, -Value, List, -CompoundWithList) is semidet.
%  Deconstruct =Compound= of the form =|Name(Tag,Value)|= and create a
%  new =CompoundWithList= that replaces =Value= with =List=. This is
%  used by packed_list/2 to transform =|[varint(1,0),varint(1,1)]|= to
%  =|varint(1,[0,1])|=.
%
%  Some of the =Compound= items are impossible for =packed= with the
%  current protobuf spec, but they don't do any harm.
detag(varint(Tag,Value), varint, Tag, Value, List, varint(List)).
detag(fixed64(Tag,Value), fixed64, Tag, Value, List, fixed64(List)).
detag(fixed32(Tag,Value), fixed32, Tag, Value, List, fixed32(List)).
detag(length_delimited(Tag,Codes), length_delimited, Tag, Codes, List, length_delimited(List)).
detag(message(Tag,Segments), message, Tag, Segments, List, message(List)).
detag(packed(Tag,Payload), packed, Tag, Payload, List, packed(List)). % TODO: delete?
detag(string(Tag,String), string, Tag, String, List, string(List)).

% See also prolog_type//2, but pick only one for each wirestream type
% For varint(Items), use one that doesn't do zigzag
packed_option(integer64, Items, fixed64(Items)).
packed_option(integer32, Items, fixed32(Items)).
packed_option(unsigned, Items, varint(Items)).
% packed_option(integer, Items, varint(Items)).
% packed_option(double, Items, fixed64(Items)).
% packed_option(float, Items, fixed32(Items)).
% packed_option(signed64, Items, varint(Items)).
% packed_option(boolean, Items, varint(Items)).
% packed_option(enum, Items, varint(Items)).

%! protobuf_segment_convert(+Form1, ?Form2) is multi.
%  A convenience predicate for dealing with the situation where
%  protobuf_segment_message/2 interprets a segment of the wire stream
%  as a form that you don't want (e.g., as a message but it should have
%  been a UTF8 string).
%
%  =Form1= is converted back to the original wire stream, then the
%  predicate nondeterministically attempts to convert the wire stream to
%  a =|string|= or =|length_delimited|= term (or both: the latter
%  always succeeds).
%
%  The possible conversions are:
%    message(Tag,Segments) => string(Tag,String)
%    message(Tag,Segments) => length_delimited(Tag,Codes)
%    string(Tag,String) => length_delimited(Tag,Codes)
%    length_delimited(Tag,Codes) => length_delimited(Tag,Codes)
%
%  Note that for fixed32, fixed64, only the signed integer forms are
%  given; if you want the floating point forms, then you need to use
%  int64_float64_when/2 and int32_float32_when/2.
%
%  For example:
%  ~~~{.pl}
%  ?- protobuf_segment_convert(
%         message(10,[fixed64(13,7309475598860382318)]),
%         string(10,"inputType")).
%  ?- protobuf_segment_convert(
%         message(10,[fixed64(13,7309475598860382318)]),
%         length_delimited(10,[105,110,112,117,116,84,121,112,101])).
%  ?- protobuf_segment_convert(
%         string(10, "inputType"),
%         length_delimited(10,[105,110,112,117,116,84,121,112,101])).
%  ?- forall(protobuf_segment_convert(string(1999,"\x1\\x0\\x0\\x0\\x2\\x0\\x0\\x0\"),Z), writeln(Z)).
%  string(1999,