HACKING



This is a POD file. If you prefer reading a man page, run `perldoc ./HACKING`.
Or use `pod2text HACKING` to generate an easier to read text file. Both of
these programs should come with a default Perl installation.


=head1 DEBUGGING

=head2 General

Any debugging messages are written to C<~/.ncdc/stderr.log>. Note that this
file is emptied every time ncdc starts up, so be sure to either make backups or
check the contents of the log file before this happens!

Messages starting with C<CRITICAL> almost always indicate a bug. If you find
one, please report it! Messages starting with C<WARNING> can happen both
because something unexpected happened or because of a bug. If you see such a
message along the lines of "assertion failure" or "line should not be reached",
then this always indicates a bug.

By default, debug-level messages are not logged. Enable the I<log_debug>
setting within ncdc to get these as well. Be warned, however, that depending on
how you use ncdc, this may generate a lot of data. My ncdc generates about
1 GiB per week, for instance.
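
With that much data it helps to filter the log before reading it. The
following Python sketch is not part of ncdc. It assumes only that the words
C<CRITICAL> and C<WARNING> appear literally in the affected log lines; the
function name C<problem_lines> is made up for illustration.

```python
import os

# Log location as documented above.
LOG_PATH = os.path.expanduser("~/.ncdc/stderr.log")

def problem_lines(lines, levels=("CRITICAL", "WARNING")):
    """Return the lines that mention one of the given log levels."""
    return [line.rstrip("\n") for line in lines
            if any(level in line for level in levels)]

if __name__ == "__main__" and os.path.exists(LOG_PATH):
    # errors="replace" keeps the script going if the log contains odd bytes.
    with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
        for line in problem_lines(log):
            print(line)
```

Run it before restarting ncdc, since the log is emptied on startup.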


=head2 GDB

GDB can be used to debug various kinds of crashes (but not all of them) and
deadlocks, or to find out exactly when a C<CRITICAL> or C<WARNING> message is
being logged. Note that you usually want ncdc to be compiled with debugging
symbols when using GDB. Compile as follows:

  ./configure CFLAGS=-g && make clean && make

How you run gdb depends on the kind of problem you are experiencing. Here are
some tips for various situations:

=over

=item ncdc crashes shortly after startup.

The easiest way is to just start ncdc within gdb directly.

  $ gdb /path/to/ncdc
  [..]
  > run

=item ncdc crashes after I perform a certain action.

Running ncdc within gdb as above is not ideal in that case, since the gdb
messages may garble the screen and make ncdc annoying to use. It is easier to
attach gdb to a running ncdc instead. To do so, run ncdc as you usually would,
and then run the following in a different terminal:

  $ gdb /path/to/ncdc $ncdc_pid
  [..]
  > continue

=item ncdc crashes randomly after a long period of time.

In this case it may be easier to use core dumps, as follows:

  $ ulimit -c unlimited
  $ ncdc  # run ncdc as you usually would
  [let it crash]
  $ gdb /path/to/ncdc core

=back

The above tips work well for debugging crashes, but gdb can also be used to
find out when a C<WARNING> or C<CRITICAL> message is logged. Run ncdc with the
I<G_DEBUG> environment variable set to either I<fatal-criticals> or
I<fatal-warnings>. For example, the following will let ncdc crash as soon as a
C<WARNING> message is logged.

  $ G_DEBUG=fatal-warnings /path/to/ncdc

Once gdb has caught the crash, you can use gdb commands to fetch more
information about it. There are many online resources on debugging with gdb,
but here are a few important commands:

=over

=item bt

Get a backtrace.

=item info threads

List the active threads.

=item thread $number

Switch to another thread.

=back

On the off chance that you have found a deadlock: these are easiest to debug by
attaching gdb to the deadlocked ncdc. Running ncdc within gdb or using core
dumps isn't very useful in these situations.


=head2 Valgrind

Valgrind can be used to debug crashes for which GDB fails. Bugs caused by
incorrect handling of memory are very easily detected with valgrind. A major
downside, however, is that ncdc takes approximately 10 times as much memory and
is 10 times slower than when run natively or in GDB. As with GDB, make sure you
have ncdc compiled with debugging symbols. To run ncdc in valgrind, use the
following command:

  $ G_DEBUG=gc-friendly G_SLICE=always-malloc valgrind --leak-check=full\
      --num-callers=25 --log-file=valgrindlog /path/to/ncdc

Any issue that valgrind finds is written to the file C<valgrindlog>. Note that
some of the reported problems, especially memory leaks, do not necessarily come
from ncdc itself, but may be harmless issues within glib or other libraries.


=head1 IDENTIFIERS

Ncdc uses several kinds of identifiers internally. Some of them are explained
here.

=over

=item hubid

A "hub id" is a 64-bit integer that uniquely identifies a hub. The number is
randomly generated and assigned upon the first run of C</open> for a hub with
that name. The use of hub IDs allows a user to change the name of a hub tab
without compromising any other IDs or related configuration.

=item CID / PID

These are defined in the ADC spec. A CID uniquely identifies a user across
multiple hubs, but only works on the ADC protocol. Ncdc doesn't try to assign a
CID to users on NMDC hubs.

=item uid

A user id, internal to ncdc. This uniquely identifies a single user on a single
hub. It's a 64-bit unsigned integer, generated by taking the first 8 bytes of a
tiger hash. The data that is hashed depends on the protocol: for ADC it's
C<tiger(hubid | cid)>, where C<cid> is the base32-encoded CID of the user. For
NMDC, it's C<tiger(hubid | name)>, where C<name> is the name of the user as
sent by the hub (thus in the encoding that the hub uses). In both cases, the
byte representation of C<hubid> is hashed, so these IDs are dependent on the
byte order of the CPU architecture.

This ID is used wherever a user needs to be identified. It is also stored on
disk in the database file (in the download queue) and used as the filename for
user file lists in the C<fl/> directory. It is usually represented in ASCII as
a 16-character hexadecimal value.

=back
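
The byte order dependency of the uid can be made concrete with a small Python
sketch. This is not ncdc's code, and several things in it are assumptions: that
C<|> means plain concatenation, that the first 8 bytes of the digest are read
as an integer in the same byte order as C<hubid>, and that the hexadecimal form
is lowercase. Python's standard library has no Tiger implementation, so the
hash is passed in as the hypothetical parameter C<hash_fn>; all function names
are made up for illustration.

```python
import struct

def hubid_bytes(hubid, byteorder="="):
    """Raw 8-byte form of a hub id.

    "=" is the native byte order of this CPU, "<" forces little-endian
    and ">" forces big-endian.
    """
    return struct.pack(byteorder + "Q", hubid)

def uid_from_digest(digest, byteorder="="):
    """Read the first 8 bytes of a digest as an unsigned 64-bit integer."""
    return struct.unpack(byteorder + "Q", digest[:8])[0]

def make_uid(hash_fn, hubid, ident, byteorder="="):
    """Build a uid from a hub id and a user identifier.

    hash_fn: callable taking bytes and returning digest bytes (Tiger in ncdc).
    ident:   base32 CID for ADC, or the nick as sent by the hub for NMDC.
    """
    digest = hash_fn(hubid_bytes(hubid, byteorder) + ident)
    return uid_from_digest(digest, byteorder)

def uid_hex(uid):
    """16-character hexadecimal form (lowercase is an assumption)."""
    return format(uid, "016x")

# The same hub id gives different hash input on little- and big-endian CPUs.
print(hubid_bytes(1, "<").hex())  # 0100000000000000
print(hubid_bytes(1, ">").hex())  # 0000000000000001
```

Because the hash input differs between architectures, a uid computed on one
machine cannot be expected to match the uid computed for the same user on a
machine with the opposite byte order.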


=head1 SQLITE SCHEMA

This is the SQL schema used to store stuff in the db.sqlite3 file.  C<PRAGMA
user_version> is set to 1. Note that this schema does not include foreign key
clauses or other checks, in order to improve portability with older SQLite
versions.

=head2 Config & variables

  CREATE TABLE vars (
    name TEXT NOT NULL,
    hub INTEGER NOT NULL DEFAULT 0,
    value TEXT NOT NULL,
    PRIMARY KEY(name, hub)
  );

Stores key-value pairs for configuration data and various other variables that
need to be kept around for more than a single run. For global variables, C<hub>
is 0. For hub-local variables, C<hub> is a random 64-bit integer. It is treated
as unsigned in the code, but stored signed in the database. For every existing
value of C<hub>, there should be a I<hubname> key indicating the name that
belongs to the hub tab, including the preceding C<#> character.
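
The signed/unsigned distinction matters when the database is accessed from
other tools. The following Python sketch, using the standard C<sqlite3> module
and an in-memory database, shows one way to convert between the two forms and
to list the hub tabs. The helper names, the hub id and the hub name are made up
for illustration.

```python
import sqlite3

MASK64 = (1 << 64) - 1

def to_signed64(value):
    """Map an unsigned 64-bit hub id to the signed value stored in SQLite."""
    return value - (1 << 64) if value >= (1 << 63) else value

def to_unsigned64(value):
    """Map a signed value read from SQLite back to the unsigned hub id."""
    return value & MASK64

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE vars (
    name TEXT NOT NULL,
    hub INTEGER NOT NULL DEFAULT 0,
    value TEXT NOT NULL,
    PRIMARY KEY(name, hub))""")

# Made-up hub id that does not fit in a signed 64-bit integer as-is.
hubid = 0xF000000000000001
db.execute(
    "INSERT INTO vars (name, hub, value) VALUES ('hubname', ?, '#example')",
    (to_signed64(hubid),))

def hub_names(db):
    """Return {unsigned hub id: tab name} for every hub in the vars table."""
    rows = db.execute("SELECT hub, value FROM vars WHERE name = 'hubname'")
    return {to_unsigned64(hub): name for hub, name in rows}
```

Passing the unsigned value directly would fail for ids of 2^63 and above,
since SQLite integers are signed 64-bit values.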

  CREATE TABLE share (
    name TEXT NOT NULL PRIMARY KEY,
    path TEXT NOT NULL
  );

Stores the shared directories. C<name> is the virtual name, C<path> is the
absolute filesystem path obtained by C<realpath()>.

  CREATE TABLE users (
    hub INTEGER NOT NULL,
    uid INTEGER NOT NULL,
    nick TEXT NOT NULL,
    flags INTEGER NOT NULL
  );

Stores information about "special" users. Currently only used for users who are
granted a slot. The C<uid> column is set, but its value is not currently used.
Matching is instead done on the C<nick> column. C<flags> is a bit mask; the
flag for a granted slot is 1.
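
As an illustration of the bit mask, the Python sketch below lists the nicks
that have a granted slot on a given hub. It is not taken from ncdc: the
constant name C<FLAG_GRANT>, the function name and the example rows are
assumptions; only the table layout and the flag value 1 come from the text
above.

```python
import sqlite3

FLAG_GRANT = 1  # flag value from the text above; the name is made up

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE users (
    hub INTEGER NOT NULL,
    uid INTEGER NOT NULL,
    nick TEXT NOT NULL,
    flags INTEGER NOT NULL)""")

# Made-up example rows: (hub, uid, nick, flags).
db.executemany("INSERT INTO users VALUES (?, ?, ?, ?)",
               [(1, 10, "alice", 1), (1, 11, "bob", 0)])

def granted_nicks(db, hub):
    """Return the nicks on the given hub whose granted-slot bit is set."""
    rows = db.execute(
        "SELECT nick FROM users WHERE hub = ? AND (flags & ?) != 0 "
        "ORDER BY nick", (hub, FLAG_GRANT))
    return [nick for (nick,) in rows]
```

Testing the bit with C<&> rather than comparing C<flags> to 1 keeps the query
correct if more flags are added later.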

=head2 Hash data

  CREATE TABLE hashdata (
    root TEXT NOT NULL PRIMARY KEY,
    size INTEGER NOT NULL,
    tthl BLOB NOT NULL
  );

Unsurprisingly, this stores the hash data of shared files. C<root> is the TTH
root, encoded in base32. C<size> is the size of the file and C<tthl> is the TTH
data.

  CREATE TABLE hashfiles (
    id INTEGER PRIMARY KEY,
    filename TEXT NOT NULL UNIQUE,
    tth TEXT NOT NULL,
    lastmod INTEGER NOT NULL
  );

A mapping of I<files> to I<hashes>. The C<id> column is an alias for the SQLite
C<rowid>, and used internally in ncdc to speed up certain operations.
C<filename> is the absolute and canonical path to the file, as obtained from
C<realpath()>. C<tth> refers to C<hashdata (root)>, and C<lastmod> is the last
modification time, as a UNIX timestamp.

It is not uncommon to have multiple files with the same TTH. A row in
C<hashfiles> should B<always> have a corresponding row in C<hashdata>. It is
possible to have a row in C<hashdata> with no row in C<hashfiles> referring to
it, or to have entries in C<hashfiles> that are not in a shared directory at
all. These are cleaned up with C</gc>.
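
The relationship between the two tables can be checked with two queries. The
Python sketch below runs them against an in-memory database with made-up rows.
It only illustrates the rules stated above and is not necessarily what C</gc>
does internally; the function names are hypothetical.

```python
import sqlite3

def orphaned_hashdata(db):
    """TTH roots in hashdata that no hashfiles row refers to (allowed)."""
    rows = db.execute("""SELECT root FROM hashdata
        WHERE root NOT IN (SELECT tth FROM hashfiles) ORDER BY root""")
    return [root for (root,) in rows]

def dangling_hashfiles(db):
    """Files whose tth has no hashdata row; this should never happen."""
    rows = db.execute("""SELECT filename FROM hashfiles
        WHERE tth NOT IN (SELECT root FROM hashdata) ORDER BY filename""")
    return [name for (name,) in rows]

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE hashdata (
    root TEXT NOT NULL PRIMARY KEY,
    size INTEGER NOT NULL,
    tthl BLOB NOT NULL);
CREATE TABLE hashfiles (
    id INTEGER PRIMARY KEY,
    filename TEXT NOT NULL UNIQUE,
    tth TEXT NOT NULL,
    lastmod INTEGER NOT NULL);
""")

# Made-up rows: AAA is shared by two files, BBB is unused, CCC is missing.
db.executemany("INSERT INTO hashdata VALUES (?, ?, ?)",
               [("AAA", 1, b"x"), ("BBB", 2, b"x")])
db.executemany(
    "INSERT INTO hashfiles (filename, tth, lastmod) VALUES (?, ?, ?)",
    [("/a", "AAA", 0), ("/a-copy", "AAA", 0), ("/c", "CCC", 0)])

print(orphaned_hashdata(db))   # ['BBB']
print(dangling_hashfiles(db))  # ['/c']
```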

=head2 Download queue

  CREATE TABLE dl (
    tth TEXT NOT NULL PRIMARY KEY,
    size INTEGER NOT NULL,
    dest TEXT NOT NULL,
    priority INTEGER NOT NULL DEFAULT 0,
    error INTEGER NOT NULL DEFAULT 0,
    error_msg TEXT,
    tthl BLOB
  );

Each row represents a file in the download queue. File list downloads are not
included. C<tth> is the base32-encoded TTH root of the file, C<size> is the
file size in bytes, and C<dest> is the full destination path where the file
will be moved to after downloading. Possible values for C<priority> are defined
in the C<DLP_*> macros in dl.c. Possible C<error> values are defined in the
C<DLE_*> macros. C<error_msg> is NULL if there is no error. C<tthl> is the
downloaded TTH data, NULL if it hasn't been fetched yet.

  CREATE TABLE dl_users (
    tth TEXT NOT NULL,
    uid INTEGER NOT NULL,
    error INTEGER NOT NULL DEFAULT 0,
    error_msg TEXT,
    PRIMARY KEY(tth, uid)
  );

Stores the users from which a download queue item can be downloaded. C<tth>
refers to C<dl (tth)>, C<uid> is the user id, stored as a 64-bit signed
integer but internally represented as an unsigned integer in ncdc. C<error> and
C<error_msg> have the same meaning as for the C<dl> table, but represent
errors that only affect the user rather than the file (e.g. when the file is
not available from this user).

It is possible for a C<dl> row to have no corresponding rows in C<dl_users>,
but a C<dl_users> row must always refer to a row in C<dl>.
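
Both directions of this relationship can be inspected with plain SQL. The
Python sketch below uses an in-memory database with made-up rows: a C<LEFT
JOIN> counts the users per queued file, including files without any user, and
a second query finds C<dl_users> rows that violate the rule above. The function
names are hypothetical and not part of ncdc.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE dl (
    tth TEXT NOT NULL PRIMARY KEY,
    size INTEGER NOT NULL,
    dest TEXT NOT NULL,
    priority INTEGER NOT NULL DEFAULT 0,
    error INTEGER NOT NULL DEFAULT 0,
    error_msg TEXT,
    tthl BLOB);
CREATE TABLE dl_users (
    tth TEXT NOT NULL,
    uid INTEGER NOT NULL,
    error INTEGER NOT NULL DEFAULT 0,
    error_msg TEXT,
    PRIMARY KEY(tth, uid));
""")

# Made-up rows: AAA has two users, BBB has none, CCC is not in the queue.
db.executemany("INSERT INTO dl (tth, size, dest) VALUES (?, ?, ?)",
               [("AAA", 100, "/dl/a"), ("BBB", 200, "/dl/b")])
db.executemany("INSERT INTO dl_users (tth, uid) VALUES (?, ?)",
               [("AAA", 1), ("AAA", -2), ("CCC", 3)])

def sources_per_file(db):
    """Return [(dest, user count)], including files without any user."""
    return db.execute("""SELECT dl.dest, COUNT(dl_users.uid)
        FROM dl LEFT JOIN dl_users ON dl_users.tth = dl.tth
        GROUP BY dl.tth ORDER BY dl.dest""").fetchall()

def stray_dl_users(db):
    """Return dl_users rows that do not refer to any row in dl."""
    return db.execute("""SELECT tth, uid FROM dl_users
        WHERE tth NOT IN (SELECT tth FROM dl)
        ORDER BY tth, uid""").fetchall()

print(sources_per_file(db))  # [('/dl/a', 2), ('/dl/b', 0)]
print(stray_dl_users(db))    # [('CCC', 3)]
```

The negative uid in the example rows reflects the signed storage described
above: a uid of 2^63 or higher reads back as a negative number.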