Age | Commit message | Author | Files | Lines |
|
|
|
|
|
Not that it does anything yet.
|
|
|
|
I've only done a few basic tests, so no doubt there are some bugs
left; changing HASH_READ_BUF_SIZE will likely expose one.
I haven't measured the performance yet. The code *should* be very
fast, but I can't make any claims at this point.
|
|
I had already planned to use fn->fl == NULL to indicate the cancelled
state, which kinda makes sense because in most cases that pointer would
be dangling after cancellation. The only situation where this isn't the
case is when hashing is "cancelled" after a hash job failed with an
error, but that's handled pretty much the same.
|
|
|
|
|
|
|
|
|
|
I feel like a noob for making such an elementary mistake...
|
|
These will be needed by share/hash.c. Totally untested code for now, as
usual.
|
|
|
|
|
|
This is a partial revert of 38d80827ba30059a9a513c45793f853ed59f067f. I
also added the respective hp* macros, although I won't be using these
for now.
I realized why I had implemented the single-pointer list anchor case: it
halves the memory usage of the hash index. With the hlist_ macros, that
index is simply a hash set of share_fl_t pointers. Without them, it'd be
a hash set of struct { share_fl_t *head, *tail; }, i.e. twice as large.
The tail pointer really isn't necessary.
|
|
Very much unfinished, early stages, etc. Doesn't do anything useful yet,
other than queueing up files to hash and calculating some fun scheduling
information.
|
|
Will be needed for the hashing code to figure out what file to hash.
(And later on likely for other purposes, too)
|
|
The new share_t structure has the same or a slightly longer lifetime
than the share_conf_t structure, so embedding makes sense.
|
|
I'll be needing them now.
|
|
I wanted the linked list abstraction to be convenient to use, yet result
in exactly the same kind of (generated) code as if each list were
hand-written. That's the reason for the hlist_* macros, but I'm starting
to doubt whether that optimization is really worth the effort of
maintaining two separate APIs. I'm keeping the idea open for an slist_
abstraction, however; saving memory on pointers in list nodes likely
*is* worth the effort.
I also figured that, with a tiny change, list_insert_before() could be
used as a list_insert_after() too, so I fixed that and renamed it to a
single, more powerful list_insert().
|
|
This allows directories themselves to have a size, too. I initially
preferred using bit fields, but bit fields are not specified to work on
64-bit integers. The current solution is more portable.
Note that the size field isn't updated for directories yet. I'll get to
that once files are actually being hashed. Unhashed files should be
treated as if they don't exist in the list yet, after all.
|
|
|
|
This is an extra level of indirection and complicates the structure of
the code a bit. However, this does allow us to schedule multiple
simultaneous actions on the file list without running into data races or
memory management issues.
|
|
|
|
|
|
|
|
|
|
This makes it suitable for more use cases.
|
|
A quick benchmark indicates that Globster scans a little more than twice
as slowly as 'ncdu -q0o- >/dev/null'. I haven't looked at exactly where
the overhead goes (I do have several ideas), but I suppose it's fast
enough for now. Scanning 30k files/dirs in 0.2s isn't so bad, after all;
more than fast enough for disk I/O to be the bottleneck in the majority
of cases.
|
|
|
|
Another bug related to the special handling of the root path. Sigh.
|
|
The special handling required for the root directory '/' compared to any
subdirectory '/blah' is a minefield...
|
|
The scanning code is now mostly functional. There are still some details
missing regarding progress/error reporting, handling of filename
encodings and non-existent directories, and some configuration-dependent
features such as symlink following and exclude patterns.
|
|
Despite its simplicity, I keep finding it hard to get such string
comparison functions right. This function makes it significantly harder
to make any mistakes in path matching, and also keeps the intent of the
code clear.
|
|
Note that due to the way this is implemented, virtual mount paths will
*always* exist, even if there exists a file with the same name in a
higher-precedence path.
I'm not really sure what behaviour is the most intuitive in that
situation. In terms of "correctness", one would expect a
lower-precedence path to be ignored if its mount path clashes with a
higher-precedence file. On the other hand, what point is there in
configuring a path if it's going to be ignored?
(Again, code in this commit is totally untested. What the hell am I
doing, really?)
|
|
Totally untested, as usual.
|
|
|
|
|
|
|
|
The 'trace' log level is for very verbose debugging information that you
really don't want to have enabled for anything except the specific
sub-component that you happen to be debugging.
|
|
The file isn't necessary anymore and will be recreated when a new
globster instance is started.
|
|
|
|
|
|
Now that a UTF-8 processing library is being used, let's actually make
use of it.
|
|
|
|
Still untested, and the code is still never called.
|
|
Totally untested code. It is never called, either.
|
|
|
|
Startup time more than doubles when the TTH leaves are stored in the
sharefiles table, as was my previous idea. I don't expect storing the
leaves in a separate table in the same file would help much, since the
data would still not be stored in a sequential chunk of the file.
Luckily, SQLite's ATTACH functionality makes this splitting quite easy.
The downside of splitting the hash data into a separate table is that
garbage collection (that is, removing unused hash data) will require
some extra memory to keep a hash table of all used TTH hashes.
For those people who have been experimenting with this branch: please
delete your globster.sqlite3 to have the tth_leaves column removed; the
db.c update code won't do it. (Of course, stable globster releases will
have a better updating process.)
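The ATTACH-based split might look roughly like this; the file and table
names below are hypothetical illustrations, not the actual schema:

```sql
-- Keep TTH leaf data in a second database file, attached at startup.
ATTACH DATABASE 'hashdata.sqlite3' AS hashdata;
CREATE TABLE IF NOT EXISTS hashdata.tth_leaves (
  tth    BLOB NOT NULL PRIMARY KEY,
  leaves BLOB NOT NULL
);
-- Cross-database queries then work transparently, e.g.:
--   SELECT l.leaves FROM sharefiles f
--   JOIN hashdata.tth_leaves l ON l.tth = f.tth;
```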
|
|
Improves performance a bit on single-core systems. Not as significant as
I had hoped.
|