Age | Commit message | Author | Files | Lines |
|
This avoids a memcpy() when the compiler doesn't optimize it away. I
initially avoided doing this out of fear of triggering UB and causing
optimizing compilers to optimize the code away, but I believe this is
actually safe when the types involved are (unsigned) chars.
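
A rough sketch of the kind of change described above, not the actual diff: C lets any object be accessed through an lvalue of character type, so a byte can be stored through an unsigned char pointer without copying. The two helper names below are hypothetical and exist only for this illustration.

```c
#include <string.h>

/* Hypothetical "before": store a byte value (0..255) into a char buffer
 * by copying it from an unsigned char temporary. */
static void setchar_memcpy(char *dest, unsigned ch) {
    unsigned char tmp = (unsigned char)ch;
    memcpy(dest, &tmp, 1);
}

/* Hypothetical "after": write through an unsigned char pointer directly.
 * Access through a character type is permitted by the aliasing rules. */
static void setchar_cast(char *dest, unsigned ch) {
    *(unsigned char *)dest = (unsigned char)ch;
}
```

Both helpers leave the same byte in the buffer; the second just does not depend on the compiler eliding the one-byte memcpy().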
|
Fixes #5.
|
Bug #2: http://dev.yorhel.nl/yxml/bug/2
|
There is no such thing as YXML_ELEMCLOSE. It's START and END. (Or OPEN
and CLOSE, but that's not the terminology I chose to use, apparently.)
|
Shouldn't really matter to any application that respected the
documentation, but it's better than leaving it uninitialized.
|
This reduces the size by a bit over 100 bytes and improves performance a
tiny bit.
|
Now using plain old byte-for-byte comparisons. Performance-wise the
difference is acceptable, but it does add more than 100 bytes to the
library. :-(
|
The start-of-content token (previous use of YXML_CONTENT) had to be
returned together with YXML_ELEMSTART in some cases, which resulted in
the ugly bitmask hack. The token is not strictly necessary for parsing,
and I don't think that it was even that valuable, so it's been removed.
The YXML_DATA token has been split up into
YXML_(CONTENT|PICONTENT|ATTRVAL) to give the application more context as
to what kind of data it is receiving. This has the added benefit that
the application doesn't need to keep track of whether it is in a context
that it doesn't care about (e.g. in a PI) in order to handle data. If
the application is interested in element content and not in PIs, then it
can now simply ignore the YXML_PI* tokens.
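
To illustrate the point about ignoring tokens, here is a minimal sketch of an application-side handler. It does not use the real yxml.h: the tok_t enum and event_t struct are stand-ins invented for this example (the names only mirror the tokens mentioned above), so that the snippet compiles on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in token names, loosely mirroring the ones in the message above.
 * These are NOT the definitions from yxml.h. */
typedef enum {
    TOK_ELEMSTART, TOK_CONTENT, TOK_ELEMEND,
    TOK_ATTRVAL,
    TOK_PISTART, TOK_PICONTENT, TOK_PIEND
} tok_t;

/* Hypothetical event: a token plus the data string it carries (if any). */
typedef struct { tok_t tok; const char *data; } event_t;

/* Append only element content to out (capacity cap, including the NUL).
 * Returns the number of bytes stored. No "am I inside a PI?" flag is
 * needed: tokens the application does not care about are just skipped. */
static size_t collect_content(const event_t *ev, size_t n, char *out, size_t cap) {
    size_t len = 0;
    assert(out != NULL && cap > 0);
    for (size_t i = 0; i < n; i++) {
        switch (ev[i].tok) {
        case TOK_CONTENT: {
            size_t dl = strlen(ev[i].data);
            if (dl > cap - 1 - len)
                dl = cap - 1 - len;   /* truncate rather than overflow */
            memcpy(out + len, ev[i].data, dl);
            len += dl;
            break;
        }
        default:
            /* PI, attribute and element boundary tokens are ignored here. */
            break;
        }
    }
    out[len] = '\0';
    return len;
}
```

With a single generic data token, the same handler would have needed extra state to tell PI data apart from element content.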
|
Similar reasoning as for allowing non-ASCII characters in data: We can't
validate them anyway because yxml operates on bytes and is unaware of
the encoding. This does allow a wide range of characters as
element/attribute names that aren't formally allowed, but the most
common use of those names in applications is simply to check whether a
particular element/attribute name matches one that it knows, and unknown
names are generally ignored.
Without this change, it is impossible to parse "international" XML
documents with yxml. It is possible now, but applications do need to do
further validation on their own if they want to be conforming.
|
I thought I'd handle the ?-in-PI and ]-in-CDATA problems in a more
general solution, but realized that wasn't any simpler or smaller than
these specific solutions.
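
For context, the ]-in-CDATA problem can be sketched as follows. This is an illustration under my own assumptions, not the code from this commit: cdata_feed() and its pending counter are hypothetical. A ']' cannot be reported as data right away, because it may turn out to be part of the "]]>" terminator.

```c
#include <stddef.h>

/* Feed one byte of a CDATA section body.
 * *pending counts ']' bytes seen but not yet reported (0..2).
 * out receives the data bytes that are now known to be literal
 * (at most 3 bytes plus a NUL, so out must hold 4 chars).
 * Returns 1 when the "]]>" terminator is complete, 0 otherwise. */
static int cdata_feed(int *pending, char ch, char *out) {
    size_t n = 0;
    if (ch == ']') {
        if (*pending < 2)
            (*pending)++;        /* hold it back, might start "]]>" */
        else
            out[n++] = ']';      /* third ']' in a row: the oldest is data */
        out[n] = '\0';
        return 0;
    }
    if (ch == '>' && *pending == 2) {
        *pending = 0;            /* saw "]]>": section ends, no data */
        out[0] = '\0';
        return 1;
    }
    while (*pending > 0) {       /* held-back ']' bytes were data after all */
        out[n++] = ']';
        (*pending)--;
    }
    out[n++] = ch;
    out[n] = '\0';
    return 0;
}
```

Feeding the bytes of "a]b]]]>" one at a time reports the data "a", "]b" and "]" before signalling the end of the section. The "?>" case in a PI is analogous, with a single held-back '?'.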
|
Allowing multiple bytes to be returned in a single YXML_DATA token. This
is (unfortunately) necessary for a few special cases:
- &#N; for N > 127,
- <? ? ?>
- <![CDATA[ ] ]]>
I'll fix those separately in the next commits. This data string is now
also re-used as a temporary buffer for entity/char references, removing
the private 'ref' field.
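
The &#N; case shows why one byte per token is not enough: a character reference above 127 expands to several bytes of UTF-8. The function below is a minimal sketch of that encoding step; utf8_encode() is a name made up for this example and it makes no claim about how the library itself does it.

```c
#include <stddef.h>

/* Encode code point cp as UTF-8 into out and NUL-terminate it.
 * out must have room for at least 5 bytes.
 * Returns the number of bytes written before the NUL, or 0 if cp is
 * above U+10FFFF. Surrogates are not rejected in this sketch. */
static size_t utf8_encode(unsigned long cp, char *out) {
    unsigned char *o = (unsigned char *)out;
    size_t n = 0;
    if (cp < 0x80) {
        o[n++] = (unsigned char)cp;
    } else if (cp < 0x800) {
        o[n++] = (unsigned char)(0xC0 | (cp >> 6));
        o[n++] = (unsigned char)(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        o[n++] = (unsigned char)(0xE0 | (cp >> 12));
        o[n++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        o[n++] = (unsigned char)(0x80 | (cp & 0x3F));
    } else if (cp < 0x110000) {
        o[n++] = (unsigned char)(0xF0 | (cp >> 18));
        o[n++] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        o[n++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        o[n++] = (unsigned char)(0x80 | (cp & 0x3F));
    }
    o[n] = '\0';
    return n;
}
```

For example, &#233; becomes the two bytes 0xC3 0xA9, which have to reach the application together in one data string.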
|
The tests aren't complete yet; there are still a lot of cases that
aren't covered.
|
This avoids complaints from gcc when using a yxml_ret_t variable in a
switch statement.
|
This code should handle all declarations that don't use a conditional
section anywhere.
|
Previously the misc2 state was entered after the root element had
closed, which would still allow for character content and new open tags
to be parsed. The latter was already detected with the 'afterelem' trick
added in 6bc21882f (now removed again), but that commit did not disallow
character content. This also removes the YXML_EMULROOT error code; such
errors are now reported as YXML_ESYN.
This commit adds ~600 bytes and improves performance for one benchmark
and worsens performance for the other. Neither difference is very
significant, however.
|
This is a more generic solution, and should ease the implementation of
"proper" <!DOCTYPE ..> parsing (should I decide to go on with that).
This change does not affect performance, and only increases the size
by 50 bytes or so.
|
Previously, reading the value of an attribute required reading all DATA
tokens until the next YXML_ATTR or the next YXML_EOA. The new
YXML_ATTRSTART and YXML_ATTREND are clearer and provide a tighter bound
as to where you can stop waiting for more DATA tokens for the attribute
value.
The YXML_EOA token has been renamed to YXML_CONTENT, because that's what
it actually signifies: The start of the contents of the element.
I've also documented the token ranges in which the elem, data and attr
fields in yxml_t remain valid.
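
As a sketch of what the explicit start/end pair buys the application: the value can be assembled between the two tokens and used as soon as the end token arrives. The ev_t enum and item_t struct below are stand-ins written for this example (they only echo the token names above and are not taken from yxml.h).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in tokens for illustration only, not the yxml.h definitions. */
typedef enum { EV_ELEMSTART, EV_ATTRSTART, EV_DATA, EV_ATTREND, EV_CONTENT } ev_t;

/* Hypothetical stream item: a token plus its data string (if any). */
typedef struct { ev_t ev; const char *data; } item_t;

/* Copy the value of the first attribute in the stream into out
 * (capacity cap, including the NUL). Returns 1 as soon as the matching
 * end token is seen, 0 if the stream ends first. */
static int first_attr_value(const item_t *it, size_t n, char *out, size_t cap) {
    size_t len = 0;
    int in_attr = 0;
    assert(out != NULL && cap > 0);
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        switch (it[i].ev) {
        case EV_ATTRSTART:
            in_attr = 1;
            break;
        case EV_DATA:
            if (in_attr) {
                size_t dl = strlen(it[i].data);
                if (dl > cap - 1 - len)
                    dl = cap - 1 - len;   /* truncate rather than overflow */
                memcpy(out + len, it[i].data, dl);
                len += dl;
                out[len] = '\0';
            }
            break;
        case EV_ATTREND:
            if (in_attr)
                return 1;   /* value is complete, no need to look further */
            break;
        default:
            break;
        }
    }
    return 0;
}
```

Without an end token, the handler would have to keep reading until the next attribute or the start of the element content before it could trust the value.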
|
This makes the yxml_t struct a bit smaller and removes the hardcoded
limit on the length of an attribute name.
Reduces the compiled size by 200 bytes for some reason I don't
understand (better code re-use?). Doesn't seem to affect the
performance.
|
Decreases performance a bit and increases the size a bit. But, well,
correctness is important. :-(
|