2 Ways To Find Duplicate Files On a Mac

Tuesday June 28, 2022. 05:00 PM , from MacMost

If you suspect that you have some large duplicate files on your Mac, you can find them without any special software. You can use the Finder to search for files and sort them so duplicates are together. You can also use the Terminal to find duplicates with a multi-part command.

Video Transcript: Hi, this is Gary with MacMost.com. Let me show you two ways to find duplicate files on your Mac.
MacMost is brought to you thanks to a great group of more than 1000 supporters. Go to MacMost.com/patreon. There you can read more about the Patreon Campaign. Join us and get exclusive content and course discounts.
So I'm often asked how you can find duplicate files on your Mac. After all, we may occasionally make a mistake and create a duplicate file somewhere, and then you've got two copies of the same file taking up a lot of space. Of course, if you find this happens to you often, you're going to want to figure out what you're doing that leads to the duplicate files so it doesn't keep happening. But let's say you want to review what you've got now to see if there are any duplicates.
Well, a simple way to do it in the Finder is to do a search. So here I am in my Documents folder, and that's where I want to start the search, because I want to look for duplicates in the Documents folder. I'm going to press Command F to start a search, then choose Other from the search attribute menu and select File Size, because I only want to search for large files here. I don't care about tiny little files that may be identical; I want to see if there's anything big out there that I can get rid of. So I set File Size to "is greater than" and enter something like 1 MB. Now I'm only going to get files larger than that. I also want to search only the folder I'm in, so I'll choose Documents instead of This Mac. Then I'm going to sort by size. If you don't see a Size column at the top, you can Control-click anywhere in the column headers and select Size. Now I can sort by size, and any files that are exactly the same size will appear right next to each other. For instance, here are a bunch of files that seem like they may be duplicates, because they're the same size and you can see they have similar names.
So I can easily look at where each one is located, figure out if I've got a duplicate, and get rid of the extra copy. I can work through these large files in order and stop whenever the files get too small for me to care about.
Now, this isn't a perfect method, because you have to carefully inspect everything to make sure it's actually a duplicate (two files can be the same size without being identical), and, of course, you're getting a list of all your large files, not just the duplicates. But the advantage is that you don't need any special skills. You can do it right here in the Finder with a simple search, and it will help you spot duplicates, particularly large ones.
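For comparison, the same idea as the Finder search (list only large files, biggest first, so equal sizes line up) can be sketched in Terminal. This isn't from the video; it's a minimal sketch that builds a throwaway folder with hypothetical sample files, then uses find to pick files over 1 MB and wc -c to print each one's size in bytes.

```shell
# Throwaway folder with hypothetical sample files, so the example is safe to run
demo=$(mktemp -d)
cd "$demo" || exit 1
head -c 3000000 /dev/zero > movie.mov         # 3,000,000 bytes
cp movie.mov "movie copy.mov"                 # same size (and same contents)
head -c 2000000 /dev/zero > archive.zip       # 2,000,000 bytes
head -c 500 /dev/zero > note.txt              # too small to be listed

# Files larger than 1 MB, with their size in bytes, biggest first.
# wc -c prints "<bytes> <path>"; running it once per file (\;) avoids a "total" line.
listing=$(find . -type f -size +1M -exec wc -c {} \; | sort -nr)
printf '%s\n' "$listing"
```

As with the Finder search, equal sizes only suggest duplicates, so you'd still check the contents before deleting anything.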
But there's another way to do this, using Terminal. You can have Terminal search for duplicate files. It takes a multi-part command, but it's not too difficult to understand. I've created some files in this Test folder, and if I look in the subfolder, there's a duplicate of this file here, so we have something to test with. We're going to start off by running the command in the Test folder so we can understand how it works, and then we can apply it to all the files in the Documents folder.
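If you'd like a practice folder similar to the one in the video, here's one way to build it. The folder and file names are hypothetical, and the files are filled with random bytes so that each distinct file gets its own checksum.

```shell
# Build a throwaway Test folder (hypothetical names) with one duplicate in a subfolder
test_dir=$(mktemp -d)/Test
mkdir -p "$test_dir/Subfolder"
cd "$test_dir" || exit 1

# About 7.4 MB of random data, similar in size to the file in the video
head -c 7400000 /dev/urandom > clip.mov
cp clip.mov Subfolder/clip.mov               # the duplicate

head -c 2500000 /dev/urandom > photos.zip    # large, but unique
echo "hello" > small.txt                     # too small to matter

ls -lR                                       # show everything, including the subfolder
```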
So I'm going to run Terminal and make sure I'm in this folder. I can type pwd (print working directory), and it shows me I am indeed in my Test folder. If not, you can use cd to change directory: type cd and a space, then go up a level in the Finder so you can see the Test folder and drag it into the Terminal window. It fills in the full path so you don't have to type it, and pressing Return changes to that directory. So that's how you know where you are, and that's how you get to a new location.
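Typed out, the same navigation looks like this. A throwaway folder stands in for Documents (an assumption, so the example is safe to run). Quoting the path keeps a space in a folder name intact, which is what dragging a folder into Terminal handles for you by escaping the spaces.

```shell
# A throwaway folder standing in for ~/Documents (hypothetical layout)
docs=$(mktemp -d)
mkdir -p "$docs/My Test"

cd "$docs/My Test" || exit 1   # quotes keep the space in "My Test" intact
here=$(pwd)                    # pwd prints the folder you are currently in
echo "$here"

cd ..                          # go up one level
pwd
```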
Now that we're in here, we can run the command that will find duplicates. Here's the command, and you can see it takes up a few lines. Every time you see a vertical line, like this one, this one, or this one, that's a pipe: it takes the output from the previous command and sends it to the next command. So you basically have a chain of commands, each one sending its output to the next. If we run this, it quickly tells us that there are two files here that are the same. What are these two numbers? The second number is the size of the file in bytes. You can see here that this file is indeed 7.4 MB, and that's the same size in bytes. What's the first number? That's something called a checksum. What's a checksum? Basically, imagine you took a book, assigned a number to every letter, like A is 1, B is 2, C is 3, and added up all of the letters using those numbers. You'd come up with a big number for the total, but it would be tiny compared to the book itself. The book might be millions of letters long, but the total would only be around eight or nine digits. That number works like a fingerprint for the book: count the same book again and you'll always get the same total, while a different book will usually give a different one.
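The book analogy can be tried directly. This toy function (letter_sum is a hypothetical name) adds up a=1, b=2, and so on up to z=26 for the letters in a piece of text. It also shows the analogy's weak spot: rearranging the same letters gives the same total, which is one reason real checksums use a more complex calculation.

```shell
# Toy "checksum" from the analogy: a=1, b=2, ..., z=26, summed over the text.
# Illustration only; cksum uses a very different calculation (a CRC).
letter_sum() {
  printf '%s' "$1" |
    tr 'A-Z' 'a-z' |            # treat upper and lower case the same
    tr -cd 'a-z' |              # keep only the letters a to z
    od -An -v -tu1 |            # print each byte as a decimal number (a = 97)
    awk '{ for (i = 1; i <= NF; i++) s += $i - 96 } END { print s + 0 }'
}

letter_sum "abc"      # 1 + 2 + 3
letter_sum "listen"
letter_sum "silent"   # same letters, rearranged
```

The first call prints 6, and the last two both print 79 even though the words are different.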
A checksum works in a similar way, though the calculation is more complex than simple addition. It processes every byte in the file to calculate a number. The chance of two different files having the same checksum is very small, so matching checksums are a strong sign that the files are identical. This is the checksum for this file, and this is the checksum for this file. The fact that they're the same indicates that these are the same file, and having exactly the same size is even more confirmation. So it's telling us that these two files are identical: they're duplicates. Now let's look at exactly how this works.
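You can see this behavior with cksum itself. This sketch makes an exact copy of a file and a second copy with one extra byte, then compares the first two fields (checksum and size) of each. The checksum cksum prints is a 32-bit CRC (cyclic redundancy check), so a match is strong evidence rather than absolute proof; the last step uses cmp, which compares two files byte by byte, as a final check before deleting anything.

```shell
demo=$(mktemp -d)
cd "$demo" || exit 1

head -c 2000000 /dev/urandom > original.bin
cp original.bin copy.bin                     # exact duplicate
cp original.bin changed.bin
printf 'x' >> changed.bin                    # one extra byte at the end

cksum original.bin copy.bin changed.bin      # checksum, size in bytes, name

# Keep only "checksum size" for each file
a=$(cksum original.bin | cut -d ' ' -f 1,2)
b=$(cksum copy.bin     | cut -d ' ' -f 1,2)
c=$(cksum changed.bin  | cut -d ' ' -f 1,2)

# cmp -s prints nothing and succeeds only if every byte matches
if cmp -s original.bin copy.bin; then
  echo "original.bin and copy.bin are identical"
fi
```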
I've split the command up so that each part that sends its output to a new command is on its own line, so we can look at each piece. The find command finds things. Where? Starting at the current location; that's what the dot means. Of what type? -type f means regular files. It only looks for things larger than 1 MB in size (-size +1M). Then, for each file it finds, -exec runs a program called cksum with no extra options. The {} is replaced by the path of each file, and the \; (a backslash followed by a semicolon) marks the end of the command that -exec runs. The result is a checksum for every file found. Let's take just this part and run it to see what we get. You can see it looks at every file in that folder that's larger than 1 MB and lists its checksum. Here it's easy to see that we've got two duplicates, but that wouldn't be so easy if we were looking through the entire Documents folder with thousands of files.
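Here is just that first step run on a small throwaway tree (hypothetical file names). The backslash before the semicolon matters: without it, the shell treats ; as the end of the whole command line, and find never sees the end of its -exec command.

```shell
# Throwaway test tree: one large file, its duplicate, and one small file
demo=$(mktemp -d)
cd "$demo" || exit 1
mkdir sub
head -c 3000000 /dev/urandom > big.dat
cp big.dat sub/big-copy.dat
echo "tiny" > small.txt

# Step 1: regular files (-type f) over 1 MB (-size +1M) under the current folder (.).
# -exec runs cksum once per file; {} is replaced by the file's path, and \; ends
# the -exec command. Each output line is: checksum, size in bytes, path.
step1=$(find . -type f -size +1M -exec cksum {} \;)
printf '%s\n' "$step1"
```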
So the next thing we do is this, which takes the results and does two things with them. First, it saves them to a file. We're going to use the /tmp directory, a place on your Mac where you can save temporary files for little commands like this, and save to a file called filelist.tmp. Second, it sends the same data on to the next command. That's why it's called tee: like a T-joint in plumbing, it sends the same data two different ways. Then we use cut to get just fields 1 and 2, using a space as the divider between fields (-f 1,2 -d ' '). So it gets this and this: the checksum and the file size for every file. That's what we've got after this part. So let's try that out.
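This sketch runs the first two steps on a throwaway folder. To keep the example self-contained, it saves the list to a file created by mktemp rather than the fixed /tmp/filelist.tmp used in the video; the tee and cut parts are the same.

```shell
demo=$(mktemp -d)
cd "$demo" || exit 1
head -c 3000000 /dev/urandom > big.dat
cp big.dat big-copy.dat

list=$(mktemp)   # stands in for /tmp/filelist.tmp from the video

# tee copies the full cksum lines into $list AND passes them down the pipe;
# cut then keeps fields 1 and 2 (checksum and size), splitting on spaces (-d ' ').
step2=$(find . -type f -size +1M -exec cksum {} \; | tee "$list" | cut -f 1,2 -d ' ')
printf '%s\n' "$step2"    # checksum and size only
cat "$list"               # checksum, size, and path
```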
We can see that's exactly what we get: it drops the file name. Great. What's the next part? It's sort. Let's try that. It's sorted now; these happened to be in order already, so we don't see any change here, but if they weren't in order, they would be now. Sorting matters because duplicate lines are identical, so after sorting they end up right next to each other. Next, uniq -d looks for lines that are not unique: it prints one copy of each line that appears more than once in a row. So we get just this one line, because the other line was unique and this one appeared twice. So now we've found every duplicate.
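The sort and uniq -d steps can be tried on their own with a few hypothetical "checksum size" lines like the ones cut produces. The second run skips sort, and because uniq only compares neighboring lines, it misses the duplicate.

```shell
# Hypothetical "checksum size" lines, as the cut step would produce them
input='2222222222 3000000
1111111111 7400000
3333333333 2500000
1111111111 7400000'

# sort puts identical lines next to each other; uniq -d then prints one copy
# of each line that appears more than once in a row
dupes=$(printf '%s\n' "$input" | sort | uniq -d)
echo "$dupes"

# Without sort, the two matching lines aren't adjacent, so nothing is printed
unsorted=$(printf '%s\n' "$input" | uniq -d)
echo "without sort: [$unsorted]"
```

The first echo prints 1111111111 7400000, and the second prints an empty pair of brackets.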
The next thing we want to do is use grep to go back to the file where we saved the original output of find. grep -f - reads its search patterns from the output of the uniq step, so it shows us any lines in that file that contain those checksum-and-size pairs, which sit at the beginning of each line. (-h keeps grep from adding a file name to each line, and -i makes the match ignore upper and lower case, which doesn't matter for numbers.) If there are two files that are the same, it finds both of them, because both have the same checksum and the same file size. So we run that, and we see that it found two lines: this one and this one. The last thing we want to do is sort again, because if we find a lot of duplicates, they won't be in any particular order. The first copy might be on the first line and the second copy on the fifteenth line. Sorting puts the duplicates back together. We want to sort by number (-n), using the file size, with the biggest files at the top, so we do a reverse sort (-r). -k2 means sort by the second key, the file size, rather than the first one, the checksum.
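Here's the lookup and final sort on a hypothetical saved file list. Note how the copies of a.mov sit on lines 1 and 4 of the list, so grep returns them apart; the numeric reverse sort on the second field then brings them back together. grep matches each pattern anywhere in a line, but in practice the checksum and size only appear at the start.

```shell
list=$(mktemp)
# Hypothetical contents of the saved file list: checksum, size, path
cat > "$list" <<'EOF'
1111111111 2000000 ./a.mov
3333333333 5000000 ./x.mp4
2222222222 3000000 ./b.zip
1111111111 2000000 ./sub/a.mov
2222222222 3000000 ./old/b.zip
EOF

# Patterns as uniq -d would output them
patterns='1111111111 2000000
2222222222 3000000'

# -f - reads the patterns from standard input; -h omits file names;
# -i (from the original command) makes no difference for digits
matches=$(printf '%s\n' "$patterns" | grep -hif - "$list")

# -n numeric, -r reverse (largest first), -k2 sort starting at the second field
result=$(printf '%s\n' "$matches" | sort -nrk2)
printf '%s\n' "$result"
rm -f "$list"
```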
So now we put it all together and get this result. It finds these two duplicates, but what if we go up a level with cd .. and check where we are? We're in the Documents folder. Let's run the command again, and we can see it found a bunch of things. Because the results are sorted, identical files are grouped together. The first three lines here are three identical files. Then there are two duplicates in two different locations. Then there's another set of three identical files. There's three; there's two. Now we've identified a bunch of places in the Documents folder where we have duplicate files.
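To see the grouping without touching your real Documents folder, this sketch builds a throwaway tree with one set of three identical files, one pair, and one large unique file (all names are hypothetical), then runs the full command from this post with its output captured.

```shell
demo=$(mktemp -d)
cd "$demo" || exit 1
mkdir -p Projects Backup Old
head -c 4000000 /dev/urandom > Projects/video.mov
cp Projects/video.mov Backup/video.mov
cp Projects/video.mov Old/video-old.mov              # three identical files
head -c 2000000 /dev/urandom > Projects/report.pdf
cp Projects/report.pdf Backup/report.pdf             # two identical files
head -c 3000000 /dev/urandom > Projects/unique.zip   # large but not duplicated

# The full command from this post, then the temporary file is deleted
result=$(find . -type f -size +1M -exec cksum {} \; | tee /tmp/filelist.tmp | cut -f 1,2 -d ' ' | sort | uniq -d | grep -hif - /tmp/filelist.tmp | sort -nrk2)
rm /tmp/filelist.tmp
printf '%s\n' "$result"
```

The three video files come first because they're the largest, followed by the two copies of the report; unique.zip doesn't appear.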
I'll include the full code for this command in this post at MacMost.com. I hope you find it useful.

Here’s the code for the Terminal command. This should all be on one single long line.
find . -type f -size +1M -exec cksum {} \; | tee /tmp/filelist.tmp | cut -f 1,2 -d ' ' | sort | uniq -d | grep -hif - /tmp/filelist.tmp | sort -nrk2; rm /tmp/filelist.tmp
Note: I have added a part at the end that is not in the video to delete the temporary file to keep things cleaner.
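If you'd rather keep this as a reusable command, one option is to wrap it in a shell function. This is a variation, not the command from the video: the name find_dupes is made up, it takes the folder and minimum size as optional arguments, it drops grep's -i (which has no effect on digits), and it uses mktemp for a uniquely named temporary file instead of the fixed /tmp/filelist.tmp, so two runs at once can't overwrite each other's list. It works in zsh (the default shell on recent macOS) and bash, so you could add it to your ~/.zshrc.

```shell
# find_dupes is a hypothetical helper name, not a built-in command.
# Usage: find_dupes [folder] [minimum size]   (defaults: current folder, 1M)
find_dupes() {
  local dir min list
  dir=${1:-.}
  min=${2:-1M}
  list=$(mktemp) || return 1        # unique temporary file
  find "$dir" -type f -size +"$min" -exec cksum {} \; |
    tee "$list" |                   # save the full lines: checksum, size, path
    cut -f 1,2 -d ' ' |             # keep checksum and size
    sort |
    uniq -d |                       # one copy of each repeated checksum/size pair
    grep -hf - "$list" |            # look up every file with those pairs
    sort -nrk2                      # biggest files first
  rm -f "$list"                     # clean up the temporary file
}
```

For example, find_dupes ~/Documents checks your Documents folder, and find_dupes ~/Movies 100M only looks at files over 100 MB in your Movies folder.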