From f313b6c0f59bc3d157f7cf3d07848b7e1baabdc9 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Emre=20AKY=C3=9CZ?=
Date: Wed, 10 May 2023 04:13:36 +0000
Subject: [PATCH 1/7] Fix File Naming Conventions for Unix Environment
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Luke mentions Unix naming conventions in his videos. Here is a script that applies those conventions consistently to all file and directory names, in parallel, quickly and safely.

Luke also asks: "What do you think about naming files with underscores instead of dashes?", and worries that using underscores seems like a "soydev" thing 😂. I give my answer below; the justification is more objective than a mere opinion.

### What The Script Does

**1.** Check if the item is a directory. If so:
- **a)** Remove non-English characters.
- **b)** Replace spaces, dots, and dashes with underscores.
- **c)** Collapse consecutive underscores into one.
- **d)** Convert the name to lowercase.
- **e)** Remove any other special characters.
- **f)** If the resulting name is empty, set it to "untitled".
- **g)** Strip leading and trailing underscores so that every file or directory name starts and ends with an alphanumeric character.

**2.** If the item is a file, apply the same transformations as for directories, but keep the file extension intact.

**3.** Check if the original name and the new name are different. If so, and if a file or directory with the new name already exists, create a unique name.

- The script can run under Dash and uses parallel processes, each in its own subshell environment, for safety and performance. As a result it can rename more than 100,000 files with extremely weird names in 30 seconds. (I tested bash built-in functions, tr, awk and sed; sed was the fastest for this task, with awk very close behind.)
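The transformation steps listed above can be sketched as a small standalone function. This is a minimal illustration rather than the script from this patch: `sanitize_name` is a hypothetical helper, it handles only the directory-style case (no extension handling), and it lowercases with `tr` instead of GNU sed's `\L` so that the sketch does not rely on a GNU extension.

```shell
#!/bin/sh
# Hypothetical helper, not part of the patch: sanitize a single name.
sanitize_name() {
    # 1) drop everything except ASCII letters, digits, space, dot, underscore, dash
    # 2) turn runs of spaces, dots, and dashes into one underscore
    # 3) collapse repeated underscores
    # 4) trim leading and trailing underscores
    out=$(printf '%s\n' "$1" |
        sed -E 's/[^a-zA-Z0-9 ._-]//g; s/[ .-]+/_/g; s/_+/_/g; s/^_+//; s/_+$//' |
        tr 'A-Z' 'a-z')    # lowercase (swapped in for sed \L)
    # fall back to "untitled" when nothing survives
    [ -n "$out" ] || out="untitled"
    printf '%s\n' "$out"
}

sanitize_name 'My  Holiday -- Photos (2023)'   # prints my_holiday_photos_2023
```

Keeping the whole transformation in one `sed` invocation means only one extra process per name, which matters when many names are processed.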
**Examples of How Every File Should Look:**

this_is_an_example_directory_name

**OR**

this_is_an_example_video_file.mp4

### Why "_" is Preferred Instead of a Space or a Dot or a Dash

In Unix environments, it is generally recommended to replace spaces in filenames with underscores (_), rather than dots (.) or dashes (-). This is because underscores are treated as ordinary word characters by most Unix utilities and programming languages. Dots (.) typically separate a file's name from its extension, so using them to replace spaces can lead to confusion and errors. Dashes (-) are sometimes used in place of spaces, but a leading dash marks a command-line option in Unix, so a name that starts with a dash can be misread as an option and lead to unexpected behavior.

- **Readability:** Underscores make file and directory names more readable, as they clearly separate words and components in the name, whereas spaces can be easily overlooked, and dots can be mistaken for extension separators.
- **Compatibility:** Many command-line tools and scripts mishandle file names containing spaces unless the names are quoted or escaped, and dots need escaping in regular expressions. Underscores, on the other hand, do not require special handling and are generally better supported across various tools and environments.
- **URL encoding:** When sharing file paths in URLs or web applications, spaces must be percent-encoded (as "%20"), which makes the URLs less readable and more cumbersome to work with. Underscores need no encoding.

### The Reason Behind Using a Subshell Environment

Subshells are used in the script to isolate the execution environment of each parallel process. This isolation ensures that the processes do not interfere with each other, as they have their own separate environments, including local variables and function definitions.
This separation is particularly important when running multiple processes in parallel, because no shell state is shared that could cause race conditions or other synchronization issues.

Using subshells in the script also simplifies the process of launching parallel processes. By executing the process_item function within a subshell, the script can easily leverage the -P flag of xargs to specify the maximum number of parallel processes to run. This results in improved performance and efficiency when processing a large number of files and directories.

### The Benefit of Removing Non-English Characters

- **Compatibility:** Non-English characters can cause compatibility issues with some tools, applications, or systems that are not properly configured to handle them. By removing these characters, you reduce the risk of encountering issues related to character encoding and ensure broader compatibility across different environments.
- **Consistency:** Standardizing file and directory names by removing non-English characters can make it easier to organize, search, and manage your files. It helps maintain a consistent naming convention across your file system, which can be beneficial for both human users and automated processes.
- **Accessibility:** Using only English characters in file and directory names can improve accessibility for users who may not be familiar with non-English characters or languages. This can be particularly important in multi-user or multi-language environments where not all users might be comfortable with non-English characters.

### A Lot More Details

- find . -depth -name '*' -print0: This find command searches for all files and directories recursively in the current directory (.). -depth makes find process each directory's contents before the directory itself, so children are renamed before their parent. -name '*' matches all items. -print0 prints the results separated by a null character (useful for handling filenames with spaces or special characters).
- | xargs -0 -n1 -P10 -I{} sh -c '...': The find command output is piped (|) to xargs. The -0 option tells xargs to expect null-terminated items. -n1 processes one item at a time. -P10 runs up to 10 parallel processes. -I{} sets the placeholder for input items. sh -c '...' runs a shell script with the given commands for each input item.
- generate_unique_name() { ... }: This is a function that generates a unique name for a file or directory. It takes three arguments: the base name, the extension (if any), and the destination path. It increments a counter and appends it to the base name until a unique name is found, then prints the unique name.
- process_item() { ... }: This is the main function that processes a single file or directory path. It sanitizes the name and renames the item if needed.
- [ "$item_path" = "." ] && return: This line checks if the item path is the current directory (.). If it is, the function returns without doing anything.
- dir_name=$(dirname "$item_path"); base_name=$(basename "$item_path"): These commands extract the directory name and base name from the item path.
- if [ -d "$item_path" ]; then ... else ... fi: This conditional block checks if the item is a directory (-d) and processes it accordingly.
- new_name=$(echo "$base_name" | sed -E "s/[^a-zA-Z0-9 _.-]+//g; s/[ .-]+/_/g; s/_+/_/g; s/^_//; s/_$//; s/(.*)/\L\1/"): This line uses sed to sanitize the base name by removing unwanted characters, replacing spaces, periods, and dashes with underscores, collapsing repeated underscores, trimming a leading or trailing underscore, and converting the name to lowercase (\L is a GNU sed extension). The -E flag enables extended regular expressions.
- [ -z "$new_name" ] && new_name="untitled": If the new name is empty, it is set to "untitled".
- file_ext="${base_name##*.}" base_name_no_ext="${base_name%.*}": For files, this line extracts the file extension and the base name without the extension.
- new_name="${new_base_name_no_ext}.${file_ext}": For files, this line constructs the new file name with the sanitized base name and the original file extension.
- if [ "$base_name" != "$new_name" ]; then ... fi: This conditional block checks if the original name and the new name are different.
- [ -e "${dir_name}/${new_name}" ] && new_name=$(generate_unique_name "${new_name%.*}" "${new_name##*.}" "$dir_name"): If the new name already exists, the generate_unique_name function is called to get a unique name.
- mv "$item_path" "${dir_name}/${new_name}" 2>/dev/null || true: This line moves (renames) the item to the new path with the sanitized name. Any error message is discarded (2>/dev/null), and || true lets the script continue even if mv fails.
- process_item "{}": This line calls the process_item function with the input item path (represented by {}) as the argument.
- ' 2>/dev/null: This part of the script suppresses any error messages by redirecting the standard error output to /dev/null.
---
 .local/bin/fixnames | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)
 create mode 100644 .local/bin/fixnames

diff --git a/.local/bin/fixnames b/.local/bin/fixnames
new file mode 100644
index 00000000..37f101d5
--- /dev/null
+++ b/.local/bin/fixnames
@@ -0,0 +1,33 @@
+#!/bin/sh
+
+find . -depth -name '*' -print0 | xargs -0 -n1 -P10 -I{} sh -c '
+    generate_unique_name() {
+        base_name="$1"; ext="$2"; dest_path="$3"; count=1
+        [ -z "$ext" ] && new_name="$base_name" || new_name="${base_name}.${ext}"
+        while [ -e "${dest_path}/${new_name}" ]; do
+            [ -z "$ext" ] && new_name="${base_name}_${count}" || new_name="${base_name}_${count}.${ext}"
+            count=$(( count + 1 ))
+        done
+        echo "$new_name"
+    }
+
+    process_item() {
+        item_path="$1"; [ "$item_path" = "." ] && return
+        dir_name=$(dirname "$item_path"); base_name=$(basename "$item_path")
+        if [ -d "$item_path" ]; then
+            new_name=$(echo "$base_name" | sed -E "s/[^a-zA-Z0-9 _.-]+//g; s/[ .-]+/_/g; s/_+/_/g; s/^_//; s/_$//; s/(.*)/\L\1/")
+            [ -z "$new_name" ] && new_name="untitled"
+        else
+            file_ext="${base_name##*.}"
+            base_name_no_ext="${base_name%.*}"
+            new_base_name_no_ext=$(echo "$base_name_no_ext" | sed -E "s/[^a-zA-Z0-9 _.-]+//g; s/[ .-]+/_/g; s/_+/_/g; s/^_//; s/_$//; s/(.*)/\L\1/")
+            [ -z "$new_base_name_no_ext" ] && new_base_name_no_ext="untitled"
+            new_name="${new_base_name_no_ext}.${file_ext}"
+        fi
+        if [ "$base_name" != "$new_name" ]; then
+            [ -e "${dir_name}/${new_name}" ] && new_name=$(generate_unique_name "${new_name%.*}" "${new_name##*.}" "$dir_name")
+            mv "$item_path" "${dir_name}/${new_name}" 2>/dev/null || true
+        fi
+    }
+    process_item "{}"
+' 2>/dev/null

From 62618ae588884a90ca914ec9fadbef8ba5b26964 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Emre=20AKY=C3=9CZ?=
Date: Sat, 1 Jul 2023 04:13:50 +0300
Subject: [PATCH 2/7] increased safety | exclude dotfiles | check root

---
 .local/bin/fixnames | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/.local/bin/fixnames b/.local/bin/fixnames
index 37f101d5..c76869ae 100644
--- a/.local/bin/fixnames
+++ b/.local/bin/fixnames
@@ -1,6 +1,8 @@
 #!/bin/sh
 
-find . -depth -name '*' -print0 | xargs -0 -n1 -P10 -I{} sh -c '
+[ "$(id -u)" = "0" ] && echo "This script should not be run as root" >&2 && exit 1
+
+find . -depth \( -name '.*' -prune \) -o -name '*' -print0 | xargs -0 -n1 -P10 -I{} sh -c '
     generate_unique_name() {
         base_name="$1"; ext="$2"; dest_path="$3"; count=1
         [ -z "$ext" ] && new_name="$base_name" || new_name="${base_name}.${ext}"
@@ -10,7 +12,6 @@ find . -depth -name '*' -print0 | xargs -0 -n1 -P10 -I{} sh -c '
         done
         echo "$new_name"
     }
-
     process_item() {
         item_path="$1"; [ "$item_path" = "." ] && return
         dir_name=$(dirname "$item_path"); base_name=$(basename "$item_path")

From fe6b9043b87b1220b38d4d786b1837c4ead27094 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Emre=20AKY=C3=9CZ?=
Date: Fri, 7 Jul 2023 03:27:50 +0300
Subject: [PATCH 3/7] increased safety

---
 .local/bin/fixnames | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.local/bin/fixnames b/.local/bin/fixnames
index c76869ae..d8401246 100644
--- a/.local/bin/fixnames
+++ b/.local/bin/fixnames
@@ -2,7 +2,7 @@
 
 [ "$(id -u)" = "0" ] && echo "This script should not be run as root" >&2 && exit 1
 
-find . -depth \( -name '.*' -prune \) -o -name '*' -print0 | xargs -0 -n1 -P10 -I{} sh -c '
+find . -type d -path '*/.*' -prune -o -print0 | xargs -0 -n1 -P10 -I{} sh -c '
     generate_unique_name() {
         base_name="$1"; ext="$2"; dest_path="$3"; count=1
         [ -z "$ext" ] && new_name="$base_name" || new_name="${base_name}.${ext}"

From d08eea1cf5862a21bbe83247bf0436803456cc82 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Emre=20AKY=C3=9CZ?=
Date: Fri, 7 Jul 2023 03:42:37 +0300
Subject: [PATCH 4/7] check files with no extensions

---
 .local/bin/fixnames | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/.local/bin/fixnames b/.local/bin/fixnames
index d8401246..40b65a7f 100644
--- a/.local/bin/fixnames
+++ b/.local/bin/fixnames
@@ -13,22 +13,23 @@ find . -type d -path '*/.*' -prune -o -print0 | xargs -0 -n1 -P10 -I{} sh -c '
         echo "$new_name"
     }
     process_item() {
-        item_path="$1"; [ "$item_path" = "." ] && return
+        item_path="$1";
+        [ "$item_path" = "." ] && return
         dir_name=$(dirname "$item_path"); base_name=$(basename "$item_path")
-        if [ -d "$item_path" ]; then
+        [ -d "$item_path" ] && {
             new_name=$(echo "$base_name" | sed -E "s/[^a-zA-Z0-9 _.-]+//g; s/[ .-]+/_/g; s/_+/_/g; s/^_//; s/_$//; s/(.*)/\L\1/")
             [ -z "$new_name" ] && new_name="untitled"
-        else
+        } || {
             file_ext="${base_name##*.}"
             base_name_no_ext="${base_name%.*}"
             new_base_name_no_ext=$(echo "$base_name_no_ext" | sed -E "s/[^a-zA-Z0-9 _.-]+//g; s/[ .-]+/_/g; s/_+/_/g; s/^_//; s/_$//; s/(.*)/\L\1/")
             [ -z "$new_base_name_no_ext" ] && new_base_name_no_ext="untitled"
-            new_name="${new_base_name_no_ext}.${file_ext}"
-        fi
-        if [ "$base_name" != "$new_name" ]; then
+            [ -z "$file_ext" ] || [ "$file_ext" = "$base_name_no_ext" ] && new_name="$new_base_name_no_ext" || new_name="${new_base_name_no_ext}.${file_ext}"
+        }
+        [ "$base_name" != "$new_name" ] && {
             [ -e "${dir_name}/${new_name}" ] && new_name=$(generate_unique_name "${new_name%.*}" "${new_name##*.}" "$dir_name")
             mv "$item_path" "${dir_name}/${new_name}" 2>/dev/null || true
-        fi
+        }
     }
     process_item "{}"
 ' 2>/dev/null

From a06fe894c98fb24318f2152a1d6665e4dee42b3c Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Emre=20AKY=C3=9CZ?=
Date: Thu, 19 Oct 2023 23:14:09 +0300
Subject: [PATCH 5/7] highly improve and minimize

---
 .local/bin/fixnames | 47 +++++++++++++++------------------------------
 1 file changed, 15 insertions(+), 32 deletions(-)

diff --git a/.local/bin/fixnames b/.local/bin/fixnames
index 40b65a7f..4259de18 100644
--- a/.local/bin/fixnames
+++ b/.local/bin/fixnames
@@ -1,35 +1,18 @@
-#!/bin/sh
+#!/bin/dash
 
 [ "$(id -u)" = "0" ] && echo "This script should not be run as root" >&2 && exit 1
 
-find . -type d -path '*/.*' -prune -o -print0 | xargs -0 -n1 -P10 -I{} sh -c '
-    generate_unique_name() {
-        base_name="$1"; ext="$2"; dest_path="$3"; count=1
-        [ -z "$ext" ] && new_name="$base_name" || new_name="${base_name}.${ext}"
-        while [ -e "${dest_path}/${new_name}" ]; do
-            [ -z "$ext" ] && new_name="${base_name}_${count}" || new_name="${base_name}_${count}.${ext}"
-            count=$(( count + 1 ))
-        done
-        echo "$new_name"
-    }
-    process_item() {
-        item_path="$1";
-        [ "$item_path" = "." ] && return
-        dir_name=$(dirname "$item_path"); base_name=$(basename "$item_path")
-        [ -d "$item_path" ] && {
-            new_name=$(echo "$base_name" | sed -E "s/[^a-zA-Z0-9 _.-]+//g; s/[ .-]+/_/g; s/_+/_/g; s/^_//; s/_$//; s/(.*)/\L\1/")
-            [ -z "$new_name" ] && new_name="untitled"
-        } || {
-            file_ext="${base_name##*.}"
-            base_name_no_ext="${base_name%.*}"
-            new_base_name_no_ext=$(echo "$base_name_no_ext" | sed -E "s/[^a-zA-Z0-9 _.-]+//g; s/[ .-]+/_/g; s/_+/_/g; s/^_//; s/_$//; s/(.*)/\L\1/")
-            [ -z "$new_base_name_no_ext" ] && new_base_name_no_ext="untitled"
-            [ -z "$file_ext" ] || [ "$file_ext" = "$base_name_no_ext" ] && new_name="$new_base_name_no_ext" || new_name="${new_base_name_no_ext}.${file_ext}"
-        }
-        [ "$base_name" != "$new_name" ] && {
-            [ -e "${dir_name}/${new_name}" ] && new_name=$(generate_unique_name "${new_name%.*}" "${new_name##*.}" "$dir_name")
-            mv "$item_path" "${dir_name}/${new_name}" 2>/dev/null || true
-        }
-    }
-    process_item "{}"
-' 2>/dev/null
+find . -depth -type d -path '*/.*' -prune -o -print0 | xargs -0 -P 0 -I {} dash -c '
+
+base="${1##*/}"
+path="${1%/*}"
+
+pattern="s/[^a-zA-Z0-9 ._-]//g; s/[ .-]/_/g; s/_+/_/g; s/^_+//; s/_+$//; s/[A-Z]/\L&/g"
+
+[ -f "$1" ] && pattern="$pattern; s/_([^_]+)$/.\\1/"
+
+new_name="$(echo "$base" | sed -E "$pattern")"
+
+[ "$base" != "$new_name" ] && [ -e "$path/$new_name" ] && new_name="${$}_${new_name}"
+[ "$base" != "$new_name" ] && mv "$1" "$path/$new_name"
+' _ {}

From 01661e9bba239d73390c142097b5344f09640bab Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Emre=20AKY=C3=9CZ?=
Date: Fri, 20 Oct 2023 00:54:21 +0300
Subject: [PATCH 6/7] improve find command

---
 .local/bin/fixnames | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.local/bin/fixnames b/.local/bin/fixnames
index 4259de18..13ac6ed9 100644
--- a/.local/bin/fixnames
+++ b/.local/bin/fixnames
@@ -2,7 +2,7 @@
 
 [ "$(id -u)" = "0" ] && echo "This script should not be run as root" >&2 && exit 1
 
-find . -depth -type d -path '*/.*' -prune -o -print0 | xargs -0 -P 0 -I {} dash -c '
+find . -depth \( -path '*/.*' -o -path '*/.*/*' \) -prune -o \( -type f -o -type d \) -print0 | xargs -0 -P 0 -I {} dash -c '
 
 base="${1##*/}"
 path="${1%/*}"

From 476b0916c7d89576c6296cd03248fa3eed89bc56 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Emre=20AKY=C3=9CZ?=
Date: Tue, 9 Jan 2024 01:31:45 +0300
Subject: [PATCH 7/7] improve | add info

---
 .local/bin/fixnames | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/.local/bin/fixnames b/.local/bin/fixnames
index 13ac6ed9..d0aaf963 100644
--- a/.local/bin/fixnames
+++ b/.local/bin/fixnames
@@ -1,18 +1,23 @@
 #!/bin/dash
 
-[ "$(id -u)" = "0" ] && echo "This script should not be run as root" >&2 && exit 1
+[ -z "${1}" ] && {
+    printf "%s\n" "Specify a file path as an argument. e.g., fixnames " >&2
+    exit "1"
+}
 
-find . -depth \( -path '*/.*' -o -path '*/.*/*' \) -prune -o \( -type f -o -type d \) -print0 | xargs -0 -P 0 -I {} dash -c '
+[ "$(id -u)" = "0" ] && printf "%s\n" "This script should not be run as root" >&2 && exit "1"
+
+find "${1}" -depth \( -path '*/.*' -o -path '*/.*/*' -o -path '.' \) -prune -o \( -type f -o -type d \) -print0 | sort -zr | xargs -0 -P "0" -I {} dash -c '
 
 base="${1##*/}"
 path="${1%/*}"
 
 pattern="s/[^a-zA-Z0-9 ._-]//g; s/[ .-]/_/g; s/_+/_/g; s/^_+//; s/_+$//; s/[A-Z]/\L&/g"
 
-[ -f "$1" ] && pattern="$pattern; s/_([^_]+)$/.\\1/"
+[ -f "${1}" ] && pattern="${pattern}; s/_([^_]+)$/.\\1/"
 
-new_name="$(echo "$base" | sed -E "$pattern")"
+new_name="$(printf "%s\n" "${base}" | sed -E "${pattern}")"
 
-[ "$base" != "$new_name" ] && [ -e "$path/$new_name" ] && new_name="${$}_${new_name}"
-[ "$base" != "$new_name" ] && mv "$1" "$path/$new_name"
+[ "${base}" != "${new_name}" ] && [ -e "${path}/${new_name}" ] && new_name="${$}_${new_name}"
+[ "${base}" != "${new_name}" ] && mv "${1}" "${path}/${new_name}"
 ' _ {}