If you can live with the risk of false positives (ie. two files which are not 100% identical but your code says they are), then the digest and checksum algorithms can be used to lessen the work, particularly if the files lives on two different machines with less than optimal bandwidth so that a binary comparison is infeasible.