If I understand things correctly, all edge equations are done in clip space (or NDC, but this doesn't really matter for the purposes of the discussion). This means that all triangles are viewed from the camera projected on a 2D plane with a fixed distance along the z-axis. This is confirmed by the picture where the triangle vertices both have the same z-value of 1.

If all triangles are projected on a 2D plane, you can't be in clip space or NDC since those still define a volume, but in screen-space instead. And edge equations are indeed computed in screen-space as you want to process flat triangles.

Now, in the picture attached, assume that the point not named is called p_2 = (1,0,1). Taking the cross product p_1 x p_2 = (1, 1, -1) which has a direction pointing along the z-direction, which I find odd. Aren't we purely looking for the normals to the triangle sides in the clip space?

If you look at e(x,y) = dot(n, (x,y,1)), you could re-write it as e(x,y) = dot(n.xy, (x,y)) + n.z. The direction of the normal is really only the first two components, x and y, with z being used as an offset.

If you look at the point c_0 = (0.5,0.5,1) located right on the segment p1p2, e(0.5,0.5) = dot((1,1), (0.5,0.5)) + -1 = 1 + -1 = 0, as expected.

Furthermore, it seems like you will get the correct sign of the normal (pointing outwards from the triangle) if you take each cross product in a counter clockwise fashion i.e. p_0 x p_1, p_1 x p_2 and p_2 x p_0. This has not been clarified in detail, but it is important. Maybe you can elaborate more on this ordering?

You are taking the points in clockwise order there, not counter-clockwise.

I would recommend you read section 2.1 (subsections 2.1.1 and later not included) of

those notes that are used in the EDAN35 course: it explains in more details how it works; note that it defines the points in counter-clockwise order, and as such a point is inside if the edge equation is positive.

When they define e_0(x, y) = −(p^2_y − p^1_y)(x − p^1_x) + (p^2_x − p^1_x)(y − p^1_y) it actually comes from e_0(x, y) = dot(n.xy, (x, y) - p^1) where n = (p^1_x, p^1_y, 1) x (p^2_x, p^2_y, 1)