Age | Commit message | Author | Files | Lines
2020-11-08 | Use reinterpret-casting in yxml_setchar() [HEAD, master] | Yorhel | 2 | -14/+4
This avoids a memcpy() when the compiler doesn't optimize it away. I initially avoided doing this for fear of triggering UB and having optimizing compilers optimize the code away, but I believe this is actually safe when the types involved are (unsigned) chars.
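A minimal C sketch of the two approaches that entry contrasts, assuming a helper that stores a byte value (0 to 255) into a `char` buffer. The names `setchar_memcpy` and `setchar_cast` are hypothetical; this is not yxml's actual source. Writing through an `unsigned char *` is permitted because C allows any object to be accessed through a character type, and it avoids the implementation-defined conversion of values above 127 to a possibly signed `char`.

```c
#include <string.h>

/* Hypothetical helper, variant 1: convert to unsigned char (well-defined
   modulo-256 truncation) and copy that single byte with memcpy(). */
static void setchar_memcpy(char *dest, unsigned ch) {
    unsigned char c = (unsigned char)ch;
    memcpy(dest, &c, 1);
}

/* Hypothetical helper, variant 2: store through an unsigned char lvalue.
   C permits accessing any object through a character type, so this is
   not an aliasing violation and needs no library call. */
static void setchar_cast(char *dest, unsigned ch) {
    *(unsigned char *)dest = (unsigned char)ch;
}
```

Both variants leave the same byte in the buffer; the second simply does not depend on the compiler inlining memcpy().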
2020-03-07 | Minor doc fixes + encoding considerations section | Yorhel | 4 | -3/+26
2019-11-18 | Add explicit (unsigned char*) cast in yxml_init() | Yorhel | 2 | -2/+2
Fixes #5.
2019-03-23 | Convert documentation from POD to pandoc-markdown | Yorhel | 2 | -450/+428
2015-10-27 | Use yxml_ret_t internally wherever that makes sense | Yorhel | 3 | -62/+59
2015-10-27 | Fix parsing of PIs that start with xml | Yorhel | 12 | -47/+111
Bug #2:
There is no such thing as YXML_ELEMCLOSE. It's START and END. (Or OPEN and CLOSE, but that's not the terminology I chose to use, apparently.)
2014-02-26 | Copyright year bump | Yorhel | 8 | -8/+8
2014-02-26 | define 'inline' for MSVC compilers | Yorhel | 1 | -0/+4
2014-02-26 | Add yxml_symlen() utility function | Yorhel | 3 | -1/+32
2014-02-26 | Initialize x->attr in yxml_init() | Yorhel | 2 | -2/+2
Shouldn't really matter to any application that respected the documentation, but it's better than leaving it uninitialized.
2014-02-26 | Add extern "C" stuff for C++ | Yorhel | 1 | -0/+9
2014-01-10 | Smaller and faster entity comparison | Yorhel | 2 | -20/+18
This reduces the size by a bit over 100 bytes and improves performance a tiny bit.
2013-12-05 | Change buffer arg of yxml_parse() to void pointer | Yorhel | 4 | -7/+7
2013-12-05 | yxml.h: Add double-inclusion guards | Yorhel | 1 | -0/+4
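The include guard from this entry and the `extern "C"` wrapper from the 2014-02-26 entry are the usual pair of C header idioms. A sketch with a hypothetical header `mylib.h` and a hypothetical function `mylib_answer()`: the guard makes a second `#include` of the header expand to nothing, and the wrapper gives the declarations C linkage when the header is compiled as C++ (a C compiler never sees it because `__cplusplus` is undefined).

```c
/* ---- hypothetical mylib.h ---- */
#ifndef MYLIB_H
#define MYLIB_H

#ifdef __cplusplus
extern "C" {             /* C linkage for the declarations below in C++ */
#endif

int mylib_answer(void);  /* hypothetical function, for illustration only */

#ifdef __cplusplus
}
#endif

#endif /* MYLIB_H */

/* ---- hypothetical mylib.c ---- */
int mylib_answer(void) { return 42; }
```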
2013-11-14 | Some more documentation | Yorhel | 2 | -13/+203
2013-11-13 | Made a start on some documentation | Yorhel | 1 | -0/+249
2013-11-12 | Remove function argument names in yxml.h | Yorhel | 1 | -3/+3
2013-11-12 | Remove unused YXML_EATTR enum value | Yorhel | 1 | -5/+4
2013-10-14 | test: Use correct printf formatting for debug output | Yorhel | 1 | -1/+2
2013-10-14 | Don't use int casting hack when comparing entity references | Yorhel | 2 | -14/+10
Now using plain old byte-for-byte comparisons. Performance-wise the difference is acceptable, but it does add more than 100 bytes to the library. :-(
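For reference, a plain byte-for-byte lookup of XML's five predefined entities (`lt`, `gt`, `amp`, `apos`, `quot`) might look like the sketch below. `resolve_entity()` is a hypothetical name, and this only illustrates the simple approach; it is not the code from this commit or the smaller variant from the 2014-01-10 entry.

```c
#include <string.h>

/* Hypothetical helper: map the name of one of XML's five predefined
   entities to its character using plain byte-for-byte string comparison.
   Returns 0 for any other name. */
static char resolve_entity(const char *name) {
    static const struct {
        const char *name;
        char ch;
    } ents[] = {
        {"lt", '<'}, {"gt", '>'}, {"amp", '&'}, {"apos", '\''}, {"quot", '"'}
    };
    for (size_t i = 0; i < sizeof ents / sizeof ents[0]; i++)
        if (strcmp(name, ents[i].name) == 0)
            return ents[i].ch;
    return 0;
}
```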
2013-09-26 | API: Split YXML_DATA for content/pi/attr + remove start-of-content token | Yorhel | 29 | -121/+119
The start-of-content token (previous use of YXML_CONTENT) had to be returned together with YXML_ELEMSTART in some cases, which resulted in the ugly bitmask hack. The token is not strictly necessary for parsing, and I don't think that it was even that valuable, so it's been removed. The YXML_DATA token has been split up into YXML_(CONTENT|PICONTENT|ATTRVAL) to give the application more context as to what kind of data it is receiving. This has the added benefit that the application doesn't need to keep track of whether it is in a context that it doesn't care about (e.g. in a PI) in order to handle data. If the application is interested in element content and not in PIs, then it can now simply ignore the YXML_PI* tokens.
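A sketch of the benefit described above, using a hypothetical token enum and event struct (these are not yxml's real types or values): because PI data arrives under its own token kind, an application that only wants element content can ignore every other token without tracking whether it is inside a PI. For brevity the sketch iterates over a pre-built array of events instead of driving a real parser.

```c
#include <stddef.h>

/* Hypothetical token kinds, loosely modelled on the split described above.
   These are NOT yxml's actual enum names or values. */
enum tok {
    TOK_ELEMSTART, TOK_ELEMEND, TOK_CONTENT, TOK_ATTRVAL,
    TOK_PISTART, TOK_PICONTENT, TOK_PIEND
};

/* One parser event: a token kind plus its data string (NULL if none). */
struct event {
    enum tok t;
    const char *data;
};

/* Concatenate element content into out (NUL-terminated) and return its
   length. PI and attribute data have their own token kinds, so they are
   skipped without any "inside a PI?" bookkeeping. */
static size_t collect_content(const struct event *ev, size_t nev, char *out) {
    size_t n = 0;
    for (size_t i = 0; i < nev; i++) {
        if (ev[i].t != TOK_CONTENT)
            continue;
        for (const char *p = ev[i].data; *p; p++)
            out[n++] = *p;
    }
    out[n] = '\0';
    return n;
}
```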
2013-09-26 | Add test for internationalized element names | Yorhel | 2 | -0/+7
2013-09-25 | Allow non-ASCII characters in attribute and element names | Yorhel | 2 | -2/+2
Similar reasoning as for allowing non-ASCII characters in data: We can't validate them anyway because yxml operates on bytes and is unaware of the encoding. This does allow a wide range of characters as element/attribute names that aren't formally allowed, but the most common use of those names in applications is simply to check whether a particular element/attribute name matches one that it knows, and unknown names are generally ignored. Without this change, it is impossible to parse "international" XML documents with yxml. It is possible now, but applications do need to do further validation on their own if they want to be conforming.
2013-09-24 | Add support for non-ASCII character references and encode them as UTF-8 | Yorhel | 5 | -11/+57
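Encoding a character reference such as `&#233;` or `&#x20AC;` as UTF-8 comes down to splitting the code point into 6-bit groups behind a length-marking lead byte. A minimal sketch, assuming the code point has already been parsed and validated (at most U+10FFFF, not a surrogate); `utf8_encode()` is a hypothetical name.

```c
#include <stddef.h>

/* Hypothetical helper: encode code point cp as UTF-8 into out (room for
   4 bytes). Assumes cp <= 0x10FFFF and is not a surrogate (0xD800-0xDFFF).
   Returns the number of bytes written. */
static size_t utf8_encode(unsigned long cp, unsigned char *out) {
    if (cp < 0x80) {                 /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {              /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}
```

With this sketch, `&#233;` (é) yields the bytes C3 A9 and `&#x20AC;` (€) yields E2 82 AC.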
2013-09-24 | Fix returning of ']' chars within CDATA + de-generalized ?-in-PI | Yorhel | 7 | -42/+55
I thought I'd handle the ?-in-PI and ]-in-CDATA problems in a more general solution, but realized that wasn't any simpler or smaller than these specific solutions.
2013-09-24 | Fix returning of extra '?' in processing instruction data | Yorhel | 6 | -10/+45
2013-09-24 | Consistent naming: s/setdata/dataset/ and s/setattrval/attrvalset/ | Yorhel | 3 | -24/+24
2013-09-24 | API: Change 'data' field from a single char to a string | Yorhel | 4 | -21/+32
Allowing multiple bytes to be returned in a single YXML_DATA token. This is (unfortunately) necessary for a few special cases:
- &#N; for N > 127
- <? ? ?>
- <![CDATA[ ] ]]>
I'll fix those separately in the next commits. This data string is now also re-used as a temporary buffer for entity/char references, removing the private 'ref' field.
2013-09-23 | Fixed one bug with CDATA and found another + added attr/content tests | Yorhel | 16 | -1/+84
2013-09-23 | Fix minor bug in comment parsing + add some tests for comments | Yorhel | 11 | -18/+23
2013-09-23 | Pass PIs to the application + verify that PI name isn't /xml/i | Yorhel | 19 | -91/+120
2013-09-22 | Add test suite and a bunch of tests | Yorhel | 50 | -2/+257
The tests aren't complete yet, there's still a lot of cases that aren't covered.
2013-09-22 | Fix parsing of XML declaration without encoding but with standalone | Yorhel | 2 | -2/+4
2013-09-22 | Fix segfault when a PI is given before an XML declaration | Yorhel | 2 | -1/+2
2013-09-22 | Define YXML_ELEMSTCONT | Yorhel | 1 | -1/+1
This avoids complaints from gcc when using a yxml_ret_t variable in a switch statement.
2013-09-22 | Improve parsing of doctype declarations | Yorhel | 2 | -6/+73
This code should handle all declarations that don't use a conditional section anywhere.
2013-09-22 | Fix incorrect reporting of ELEMSTART as ELEMEND | Yorhel | 2 | -2/+2
2013-09-21 | bench: Use yxml_eof() | Yorhel | 1 | -1/+3
2013-09-21 | Add 'misc3' state to handle 'Misc' data after the root element closed | Yorhel | 4 | -24/+57
Previously the misc2 state was entered after the root element had closed, which would still allow character content and new open tags to be parsed. The latter was already detected with the 'afterelem' trick added in 6bc21882f (now removed again), but that commit did not disallow character content. This also removes the YXML_EMULROOT error code, such errors are now reported as YXML_ESYN. This commit adds ~600 bytes, improves performance in one benchmark and worsens it in the other. Neither difference is very significant, however.
2013-09-21 | Add yxml_eof() function to detect unexpected EOF errors | Yorhel | 3 | -0/+24
2013-09-21 | Remove 'retmisc' hack + implement state remembering in state machine | Yorhel | 5 | -45/+45
This is a more generic solution, and should ease the implementation of "proper" <!DOCTYPE ..> parsing (should I decide to go on with that). This change does not affect performance, and only increases the size by 50 bytes or so.
2013-09-21 | Renumber YXML_ constants to be more suitable for use in a switch | Yorhel | 1 | -7/+9
2013-09-21 | Remove YXML_EOD and add YXML_EMULROOT to signal error | Yorhel | 4 | -13/+16
2013-09-21 | bench: Upgrade to latest musl | Yorhel | 1 | -2/+2
2013-09-05 | Rename some YXML_ tokens for consistency + add token for end-of-attr | Yorhel | 4 | -52/+55
Previously, reading the value of an attribute required reading all DATA tokens until the next YXML_ATTR or the next YXML_EOA. The new YXML_ATTRSTART and YXML_ATTREND tokens are clearer and provide a tighter bound on where you can stop waiting for more DATA tokens for the attribute value. The YXML_EOA token has been renamed to YXML_CONTENT, because that's what it actually signifies: the start of the contents of the element. I've also documented the token ranges in which the elem, data and attr fields in yxml_t remain valid.
2013-09-05 | Remove unused attrlen field + remove YXML_MAX_REF define | Yorhel | 2 | -7/+4
2013-09-04 | Use the stack buffer for attribute names | Yorhel | 4 | -40/+63
This makes the yxml_t struct a bit smaller and removes the hardcoded limit on the length of an attribute name. Reduces the compiled size by 200 bytes for some reason I don't understand (better code re-use?). Doesn't seem to affect the performance.
2013-09-04 | Normalize white space in attribute value data | Yorhel | 3 | -2/+14
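XML 1.0 (section 3.3.3) specifies that in an attribute value each literal whitespace character (#x20, #x9, #xA, #xD) is replaced by a space, while a character reference such as `&#10;` is kept as the referenced character. Attributes without a declaration should be treated as CDATA by a non-validating processor, so no further collapsing applies. A sketch, assuming end-of-line normalization has already run so that a CR LF pair reaches this step as a single LF; `normalize_attr_ws()` is hypothetical.

```c
/* Hypothetical helper: attribute-value normalization for CDATA attributes,
   applied in place to literal characters. Each TAB, LF or CR becomes a space.
   Assumes end-of-line normalization already ran, so CR LF arrives as one LF. */
static void normalize_attr_ws(char *s) {
    for (; *s; s++)
        if (*s == '\t' || *s == '\n' || *s == '\r')
            *s = ' ';
}
```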
2013-09-04 | Normalize end-of-line sequences to a single '\n' | Yorhel | 3 | -10/+29
Decreases performance a bit and increases the size a bit. But, well, correctness is important. :-(
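XML 1.0 (section 2.11) requires both the two-character sequence CR LF and a lone CR to be passed on as a single LF. Since yxml consumes its input one byte at a time, this only needs one bit of state: whether the previous byte was a CR. A sketch, where `normalize_eol()` is a hypothetical name that processes a buffer instead of single bytes.

```c
#include <stddef.h>

/* Hypothetical helper: XML end-of-line normalization. Both "\r\n" and a
   lone '\r' become a single '\n'. Only one bit of state is kept (whether
   the previous byte was '\r'), so the same logic works byte by byte. */
static size_t normalize_eol(const char *in, size_t len, char *out) {
    size_t n = 0;
    int prev_cr = 0;
    for (size_t i = 0; i < len; i++) {
        char c = in[i];
        if (c == '\n' && prev_cr) {
            prev_cr = 0;  /* LF of a CR LF pair: '\n' was already emitted for the CR */
            continue;
        }
        prev_cr = (c == '\r');
        out[n++] = prev_cr ? '\n' : c;
    }
    return n;
}
```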