

Ugh, so much AI generated slop starting to fill up Bandcamp. Enshittification launch countdown is almost complete, I fear…
There’s excellent music on there and if you buy stuff from a trusted artist it’s a good deal for them, but as a discovery platform it’s becoming a dead-internet failure.
Amt: 8003, Process:10.67s (17.52T/s), Generate:1408.00s (5.68T/s), Total: 1418.67s
<think> Okay, I need to write a Python script that sorts all .gguf files in the current directory into subdirectories based on their publisher. The publishers are determined from URLs stored in files named dl{YYYYMMDD}.txt. Each line in these files has a URL, and some might be commented out with a #. The publisher is the first part after the domain in the URL. Let me think through how to approach this.
First, I need to find all the .gguf files in the current directory. Then, for each of them, I need to determine their publisher. But how do I get the URLs associated with each .gguf file? Wait, the problem says that the URLs are in the dl{YYYYMMDD}.txt files. But how do I know which URL corresponds to which .gguf file? Hmm, maybe the URLs in the txt files are the ones that were used to download the .gguf files. So each .gguf file was downloaded via a URL listed in one of these txt files. But how do I associate each .gguf file with its URL?
Wait, the problem statement says that the URLs are in the files named dl{YYYYMMDD}.txt. But the .gguf files themselves don’t have any metadata linking them to the URLs. So maybe the assumption is that each .gguf file was downloaded using one of the URLs in these txt files. But how to determine which URL corresponds to which .gguf file? That’s a problem. Unless there’s some naming convention I’m missing.
Wait, maybe the URLs in the txt files are the ones that were used to download the .gguf files. For example, if a URL is “https://huggingface.co/unsloth/.../file.gguf”, then the corresponding .gguf file would be named something like “file.gguf”. But the problem says that the .gguf files are in the current directory. So perhaps the script needs to look through all the URLs in the txt files, extract the publisher, and then for each .gguf file, find which URL it came from. But without any direct link, how?
Wait, maybe the .gguf files are named after the file part of the URL. For example, the URL ends with “Qwen3-30B-A3B-Q6_K.gguf?download=true”, so the .gguf file would be named “Qwen3-30B-A3B-Q6_K.gguf”. If that’s the case, then for each .gguf file, we can check if there’s a URL in the txt files that has the same filename. But that might not be reliable. Alternatively, maybe the script is supposed to process all the URLs in the txt files, extract their publishers, and then for each .gguf file, determine which publisher it belongs to based on the URL that was used to download it. However, without knowing which URL corresponds to which .gguf file, this seems impossible.
Wait, maybe the problem is that the user expects that each .gguf file was downloaded via one of the URLs in the txt files. So the script should process all the URLs in the txt files, extract the publisher, and then for each .gguf file, find which URL (if any) it corresponds to. But how?
Alternatively, perhaps the script is supposed to take each .gguf file, and for each of them, look through all the URLs in the txt files to find which one matches. But how? For example, if the .gguf file is named “Qwen3-30B-A3B-Q6_K.gguf”, then the URL would have that as the filename. So perhaps the script can check for each .gguf file, whether there’s a URL in the txt files that has that filename. But that’s a bit of a stretch. However, given the problem statement, perhaps that’s the intended approach.
Alternatively, maybe the problem is that the URLs in the txt files are the ones that were used to download the .gguf files, and each .gguf file’s name is derived from the URL. So the script can process all the URLs, extract the publisher, and then for each .gguf file, determine which URL it came from by matching the filename. But this requires that the .gguf files are named exactly as the file part of the URL.
Assuming that’s the case, here’s the plan:
But how to handle cases where multiple URLs have the same filename? Or if no URL matches? The problem statement says that the URLs are in the txt files, but maybe not all .gguf files are accounted for. But the script should process as many as possible.
Alternatively, perhaps the script is supposed to process all URLs in the txt files, extract the publisher, and then create directories for each publisher, but how does that relate to the .gguf files? Maybe the .gguf files are already in the directory, and the script is supposed to move them into the publisher’s directory based on the URLs in the txt files. But without knowing which .gguf file corresponds to which URL, this seems impossible.
Wait, maybe the problem is that the .gguf files were downloaded using the URLs in the txt files, and the script needs to associate each .gguf file with the URL that was used to download it. But how? Unless there’s some other metadata. But the problem statement doesn’t mention that.
This is a bit confusing. Let me re-read the problem statement.
The user says: “The URLs are in files in the current directory named in the format dl{YYYYMMDD}.txt, with one url per line. Some url lines may be commented out with a single # character, but they are still a valid source of info. An example url is … and the publisher is always the first node after the domain, in this case unsloth.”
So the URLs are stored in th […trimmed because lemmy won’t handle >10,000 chars]
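Side note: the extraction step this reasoning keeps circling around is small enough to sketch directly. This is an illustrative sketch only (not part of the model output), assuming the huggingface-style layout of the example URL, where the publisher is the first path node and the filename is the last:

from urllib.parse import urlparse

# Sketch: publisher is the first path node, filename is the last.
# The query string (?download=true) is not part of .path, so it drops away.
url = "https://huggingface.co/unsloth/Qwen3-30B-A3B-GGUF/resolve/main/Qwen3-30B-A3B-Q6_K.gguf?download=true"
parts = urlparse(url).path.strip("/").split("/")
publisher, filename = parts[0], parts[-1]
print(publisher, filename)  # unsloth Qwen3-30B-A3B-Q6_K.gguf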
Here’s a well-documented Python script that fulfills the requirements of sorting .gguf files into directories based on their source publishers, as extracted from URLs stored in dl{YYYYMMDD}.txt files. It:

- finds all .gguf files in the current working directory,
- reads all dl{YYYYMMDD}.txt files in the current directory,
- handles URL lines that are commented out (with a #) but still considered valid,
- and moves each .gguf file into the corresponding publisher directory.

Example input file (dl20231010.txt):

https://huggingface.co/unsloth/Qwen3-30B-A3B-GGUF/resolve/main/Qwen3-30B-A3B-Q6_K.gguf?download=true
# https://huggingface.co/another_publisher/model.gguf
import os
import shutil
from urllib.parse import urlparse


def main():
    # Step 1: Find all dl{YYYYMMDD}.txt files in the current directory
    txt_files = [f for f in os.listdir() if f.startswith('dl') and f.endswith('.txt')]

    for txt_file in txt_files:
        print(f"Processing {txt_file}...")
        try:
            with open(txt_file, 'r', encoding='utf-8') as f:
                lines = f.readlines()
        except Exception as e:
            print(f"Error reading {txt_file}: {e}")
            continue

        for line in lines:
            line = line.strip()
            # Step 2: Remove any leading '#' and whitespace
            if line.startswith('#'):
                line = line[1:].strip()
            if not line:
                continue

            try:
                # Step 3: Parse the URL and extract the filename
                parsed = urlparse(line)
                path = parsed.path
                filename = os.path.basename(path)

                # Skip non-.gguf files
                if not filename.endswith('.gguf'):
                    continue

                # Step 4: Check if the file exists in the current directory
                if not os.path.exists(filename):
                    print(f"File '{filename}' not found for URL '{line}'")
                    continue

                # Step 5: Extract publisher from the URL path
                path_parts = parsed.path.split('/')
                if len(path_parts) < 3:
                    print(f"Invalid URL format: '{line}'")
                    continue
                publisher = path_parts[1]

                # Step 6: Create the publisher directory if it doesn't exist
                publisher_dir = os.path.join(os.getcwd(), publisher)
                os.makedirs(publisher_dir, exist_ok=True)

                # Step 7: Move the file
                src_path = os.path.join(os.getcwd(), filename)
                dst_path = os.path.join(publisher_dir, filename)
                shutil.move(src_path, dst_path)
                print(f"Moved '{filename}' to '{publisher_dir}'")
            except Exception as e:
                print(f"Error processing URL '{line}': {e}")


if __name__ == "__main__":
    main()
Notes and assumptions:

- The URLs come from the dl{YYYYMMDD}.txt files in the current directory.
- The .gguf files are named exactly as per the URL’s file part (e.g., Qwen3-30B-A3B-Q6_K.gguf).
- If a .gguf file is referenced by multiple URLs, it will be moved only once (after that, it will no longer exist in the current directory).
- Files are moved with shutil.move, which is safe and works across file systems.

Example output:

Processing dl20231010.txt...
Moved 'Qwen3-30B-A3B-Q6_K.gguf' to 'unsloth'
File 'model.gguf' not found for URL 'https://huggingface.co/another_publisher/model.gguf'

This script is designed to be robust and user-friendly, ensuring all .gguf files are properly organized based on their source publishers.
Amt:3313, Process:107.21s (1.74T/s), Generate:3463.16s (0.96T/s), Total:3570.37s
<think>
Okay, let’s tackle this problem step by step. The user wants a script that sorts all .gguf files into subdirectories based on their publisher, which is derived from the URLs in the dl*.txt files. Hmm, first, I need to figure out how to extract the publisher from each URL.
The example URL given is https://huggingface.co/unsloth/Qwen3-30B-A3B-GGUF/resolve/main/Qwen3-30B-A3B-Q6_K.gguf?download=true, and the publisher here is ‘unsloth’. Looking at the URL structure, the publisher is the first path component after the domain. So, after ‘huggingface.co’, the next part is the publisher.
So, the plan is: for each .gguf file in the current directory, find its corresponding URL in any of the dl*.txt files. Then parse that URL to get the publisher, create a directory if it doesn’t exist, and move the file there. But wait, how do I link each .gguf file to its URL? The filenames might not directly correspond to the URLs, so maybe the filenames are part of the URL paths?
Alternatively, perhaps each line in the dl*.txt files contains a URL that ends with the filename. So, the filename is the last part of the URL before any query parameters. For example, in the example URL, the filename is Qwen3-30B-A3B-Q6_K.gguf. So, the script can look through all the dl*.txt files, parse each URL, extract the filename from the URL, and then check if that filename exists in the current directory. If it does, then we can get the publisher from that URL and move the file accordingly.
Wait, but the problem states that the URLs are in the dl files, and each .gguf file was downloaded from one of those URLs. So each .gguf file corresponds to exactly one URL in one of the dl files. Therefore, the script needs to:
But first, how to parse the URLs correctly. Let’s think about splitting the URL. The example URL’s path after the domain is ‘/unsloth/Qwen3-30B-A3B-GGUF/resolve/main/Qwen3-30B-A3B-Q6_K.gguf’. The first path component after the domain is ‘unsloth’, which is the publisher. So splitting the path by ‘/’ and taking the second element (since the first is empty due to leading ‘/’) would give the publisher.
For the filename, the last part of the path before any query parameters. So split the URL on ‘?’ to get the path part, then split on ‘/’ and take the last element.
Potential edge cases: URLs with different domains? The example uses huggingface.co, but maybe others could be present. But the problem says “the publisher can only be found in the URL used to originally download each .gguf file”, so the structure should be consistent. So the first path component after the domain is always the publisher.
Now, steps for the script:
First, process all dl*.txt files. For each file:
How to do this in bash? Maybe using parameter expansion or cut.
Alternatively, using awk or sed. Let’s think of code snippets.
Parsing the publisher:
Given a URL like ‘https://huggingface.co/unsloth/…’, the path starts after ‘huggingface.co/’. So splitting the URL into parts after the domain.
Maybe using cut:
domain_part=$(echo $url | cut -d '/' -f 3)   # gets huggingface.co
path_part=$(echo $url | cut -d '/' -f 4-)    # gets everything after the domain
Then the first element of path_part is the publisher. So split path_part on ‘/’ again.
publisher=$(echo $path_part | cut -d '/' -f 1)
Wait, let’s see:
For the example URL:
Original URL: https://huggingface.co/unsloth/Qwen3-30B-A3B-GGUF/resolve/main/Qwen3-30B-A3B-Q6_K.gguf?download=true
Breaking down:
After splitting by ‘/’, the components are:
0: https:
1: (empty)
2: huggingface.co
3: unsloth
4: Qwen3-30B-A3B-GGUF
5: resolve
6: main
7: Qwen3-30B-A3B-Q6_K.gguf?download=true
Wait, actually, the URL is split into parts by the slashes. The third component (index 2 if starting from 0) is huggingface.co. The fourth component (index 3) is unsloth, which is the publisher.
So, to get the publisher, we can take the 4th field when splitting by ‘/’.
Wait, perhaps using:
publisher=$(echo $url | awk -F/ '{print $4}')
Yes, because:
The URL is split into fields by ‘/’, so $4 would be ‘unsloth’.
Similarly, for the filename:
The filename is the last part of the path before any ‘?’: split the URL on ‘?’, take the first part, then split on ‘/’ and take the last element. Alternatively, using basename:
filename=$(basename $(echo $url | cut -d '?' -f1))
Yes, that’s better. For example, the URL part before ? is:
https://huggingface.co/unsloth/Qwen3-30B-A3B-GGUF/resolve/main/Qwen3-30B-A3B-Q6_K.gguf
basename of that is “Qwen3-30B-A3B-Q6_K.gguf”.
So putting it all together, for each URL line:
publisher=$(echo "$url" | awk -F/ '{print $4}')
filename=$(basename "$(echo "$url" | cut -d '?' -f1)")
Then, we can build an associative array (in bash) mapping filenames to publishers.
Once all URLs are processed, we can loop over the .gguf files in the current directory. For each file:
Now, in bash, associative arrays can be used. Let’s structure the code:
First, read all the dl files.
declare -A publisher_map
for dlfile in dl*.txt; do
    while IFS= read -r line; do
        # Ignore comments and empty lines
        if [[ $line == \#* || -z $line ]]; then
            continue
        fi
        url="$line"

        # Extract publisher
        publisher=$(echo "$url" | awk -F/ '{print $4}')
        # Extract filename
        fname=$(basename "$(echo "$url" | cut -d '?' -f1)")
        # Store in the map
        publisher_map["$fname"]=$publisher
    done < "$dlfile"
done
Then, process each .gguf file:
for file in *.gguf; do
    if [[ ! -f $file ]]; then
        continue
    fi
    fname=$(basename "$file")

    if [[ -v publisher_map[$fname] ]]; then
        publisher=${publisher_map[$fname]}
        dir="$publisher"
        mkdir -p "$dir"
        mv -- "$file" "$dir/"
        echo "Moved $file to $dir"
    else
        echo "Warning: $file has no corresponding URL in dl files. Skipping."
    fi
done
Wait, but in bash, checking if a key exists in an associative array uses -v, so that part is okay.
But need to make sure that the filenames are exactly as they appear in the URL’s path. For example, if the URL’s filename has a different case, or maybe trailing spaces? Probably need to ensure exact matches.
Another thing: what about files that are not in any of the dl files? The problem says “all .gguf files”, but perhaps some files were not downloaded via these URLs? The user might want those to be skipped, or maybe an error. But according to the problem statement, they need to be sorted into subdirectories based on their original download URLs, so files not present in any dl file ca […trimmed because lemmy won’t handle so many chars]
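Side note: the core of this plan is just a map from URL basename to publisher. Here is the same mapping step sketched in Python purely as an illustration of the idea (the actual answer below is the bash script):

import glob
from urllib.parse import urlparse

# Sketch: build {filename: publisher} from every URL in the dl*.txt files
publisher_map = {}
for dlfile in glob.glob("dl*.txt"):
    with open(dlfile, encoding="utf-8") as f:
        for line in f:
            url = line.strip().lstrip("#").strip()  # commented-out URLs still count
            if not url:
                continue
            parts = urlparse(url).path.strip("/").split("/")
            if parts and parts[-1].endswith(".gguf"):
                publisher_map[parts[-1]] = parts[0]  # basename -> publisher (first path node)

print(publisher_map)  # e.g. {'Qwen3-30B-A3B-Q6_K.gguf': 'unsloth'}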
Here’s a bash script to organize the .gguf files into publisher directories based on their download URLs:
#!/bin/bash

# Create an associative array to hold filename -> publisher mappings
declare -A publisher_map

# Process all dl*.txt files
for dlfile in dl*.txt; do
    while IFS= read -r line; do
        # Commented-out URLs (leading '#') are still a valid source of info,
        # so strip the comment marker instead of skipping the line
        line=$(echo "$line" | sed -e 's/^[[:space:]]*//' -e 's/^#[[:space:]]*//')
        # Skip empty lines
        if [[ -z $line ]]; then
            continue
        fi
        url="$line"
        # Extract publisher (4th part after splitting on '/')
        publisher=$(echo "$url" | awk -F/ '{print $4}')
        # Extract filename (basename of path without query parameters)
        fname=$(basename "$(echo "$url" | cut -d '?' -f1)")
        # Store in the map
        publisher_map["$fname"]=$publisher
    done < "$dlfile"
done

# Process each .gguf file
for file in *.gguf; do
    if [[ ! -f $file ]]; then
        continue
    fi
    fname=$(basename "$file")

    if [[ -v publisher_map[$fname] ]]; then
        publisher=${publisher_map[$fname]}
        dir="$publisher"
        # Create directory if not exists
        mkdir -p "$dir"
        # Move the file
        mv -- "$file" "$dir/"
        echo "Moved '$file' to '$dir'"
    else
        echo "Warning: No publisher found for '$file'" >&2
    fi
done
To use this script:

1. Save it as sort_gguf.sh in your current directory
2. Make it executable: chmod +x sort_gguf.sh
3. Run it: ./sort_gguf.sh
This script does the following:

- Builds a filename → publisher map from every URL in the dl*.txt files (commented-out lines included).
- Creates each publisher directory as needed and moves the matching .gguf files into it.
- Prints a warning for any .gguf file with no corresponding URL.

Note: The script assumes:

- Each .gguf file is named exactly as the file part of its download URL.
- The publisher is always the first path node after the domain.
- The .gguf files and dl*.txt files are all in the current directory.
QwQ-32B-Q6_K vs Qwen3-30B-A3B-Q6_K on an i5-9500 with plenty of DDR4@2666:
Please write a script to sort all the .gguf files in the current directory into subdirectories. They need to go into publisher directories (which may or may not exist yet). The publisher can only be found in the URL used to originally download each .gguf file. The URLs are in files in the current directory named in the format dl{YYYYMMDD}.txt, with one url per line. Some url lines may be commented out with a single # character, but they are still a valid source of info. An example url is https://huggingface.co/unsloth/Qwen3-30B-A3B-GGUF/resolve/main/Qwen3-30B-A3B-Q6_K.gguf?download=true and the publisher is always the first node after the domain, in this case unsloth.
So… (spoiler alert for everyone who is only up to the June 2023 episode of APH in the Vice article):
In September 2023 the $10B housing bill was passed by Labor and the Greens.
Bit of a shame Labor held back for so long on the Greens amendments, but Labor did show here they can work around the inevitable delays of robust parliamentary discourse by approving interim funding for housing in June to get things started while the details of long-term funding were nutted out with the crossbench.
Don’t do it. Alkalines are all shit now and will leak all over your electronics.
Get some decent NiMH.
Done. It was a little bitter, but that’s just the flavour of the grain. It’s satisfyingly nutritious without being full of sugar to mess with my teeth or give me a sugar crash. It’ll keep me going for a few hours now.
If I’m smart, with my new-found energy I’ll now set about making a proper big meal with vegetables and more substantial proteins with plenty of leftovers to fridge for the coming food crises of tomorrow and the next day and the day after. I might make a biryani. Oh but I should really do my assignments. I really shouldn’t open any more tabs.
Damn, I’m hungry now. Having scales handy helps things happen predictably and quickly.
You can make porridge with flour and water. Even better if you have interesting flours you bought in a moment of excitement but never figured out what to do with. Butter and salt help the palatability.
Hmm… would it be too much to paste some of these up around the local schools?
I think Assange is a legendary journalist who (to put it mildly) is bearing the brunt of a fucked up assault on the free press by CIA/etc.
Nothing at all! If anything I mentioned it as a point of approval from me, making clear they’re not in the same camp as the banned-from-twitter-because-right-wing-extremists.
I don’t think it’s that kind of banned from twitter. The figures behind it seem to associate themselves with Wikileaks and/or Julian Assange. Suzie Dawson, for example, is hosting the video presentations about the platform.
To be clear I don’t mean to shit on the platform, I’m just approaching it with a lot of cynicism. I want to understand what it is and its problems and merits.