The code run for the "compiled" experiments in this paper wasn't actually compiled: it was manually written to mimic the code the compiler would have generated. As far as I remember, the compiler we had could generate the bulk of the code and did the passes described in the paper, but it did not place some things (if I recall correctly, some sync-related code) in the correct location; other things it was able to place, though I've forgotten the details at this point. Non-trivial effort was put into making the manual code mimic what the compiled code would have looked like, so the ideas of the paper still hold some merit.
Given my involvement in writing this code, I do not include this paper in my CV; I mention it only to make this note.
Related note: this paper is cited in my dissertation, but for the manual implementation of Betweenness-Centrality rather than for the compiler part. Ideally, though, given the context above, I would not have included it as a citation out of principle had I remembered.
The paper found on the GNNSys webpage contains an incorrect appendix: the learning rates for DGL and DistDGL are not 0.01 in all cases. DistDGL used 0.003 in our experiments, and DGL used 0.01 except for reddit with GCN, which used 0.02.
This has been corrected in the version of the paper posted on my webpage (unlike the other errors noted on this page, it is corrected there because this is a workshop paper, which is considered non-archival).
The paper mentions that the source code for TriCore was not available at the time of writing. As we found out after publication, the code was in fact public when we published this paper. During writing and collection of experimental results, we had contacted the authors and searched for the code, but the authors did not reply, nor were we able to find it; this is why the paper says it is not available.
Important Note (originally added 4/12/21): The triangle counting algorithm in this paper had already appeared in past work. Even though DistTC cites that work (listed below), I was not aware that the algorithm in that paper is essentially the same.
PATRIC: A Parallel Algorithm for Counting Triangles in Massive Networks, CIKM'13.
http://www.cs.uno.edu/~arif/paper/patric.pdf
What DistTC adds is a GPU implementation, compared to the original's CPU implementation. The use of the CuSP partitioner also makes it easier to explore different kinds of edge cuts than the original formulation.
If DistTC is cited in your work, please also cite PATRIC.
There was a correctness bug in MRBC that affected the indochina and rmat24 runs in the paper. With regard to runtimes, only the indochina scaling numbers are significantly different: with the corrected version of MRBC, the 1-host indochina runtime is around 5% slower while the at-scale runtimes do not change significantly, so because scaling is measured relative to the 1-host baseline, the scaling to 32 hosts reported in the Results section ends up looking better if the new numbers are used.
The correct version of MRBC is available on the Galois GitHub.
Last update: May 13, 2025