Automatically tagging hundreds of thousands of songs may seem like a matter of writing strings, but once strict data consistency and multi-language support become requirements, it tends to force extremely painful refactoring. The third-generation tagging architecture (V3) of AI Music Organizer was designed specifically to end the dilemmas caused by confused responsibilities in earlier architectures.
I. Why Traditional Tag Architectures Fail
In past implementation pipelines, we mixed several completely different responsibilities:
- The raw value output by the classifier (e.g., the model spits out “Rock”).
- The Key used for internal system comparisons.
- The final string written back to the file (ID3/FLAC).
- The localized UI display text.
This resulted in severe business pain points:
- Unstable Keys: Special characters like “/”, “,”, and “&” would be repeatedly rewritten during the sanitization process, causing the same tag to turn into different strings across different pipelines.
- Redundant Misjudged Writebacks: When the underlying audio file’s tag is actually already “摇滚” (Rock in Chinese) and the analysis engine produces “Rock”, a simple string comparison would generate a redundant Patch (requesting a change from “摇滚” to “Rock”).
- Difficulty in User Customization: Because the “sanitized string was the key”, when users wanted to hard-map original English tags to their native language, the logic became highly fragile.
II. The Core of the V3 Architecture: Semantic Keys
In the V3 architecture, we introduced the concept of the Semantic Key, explicitly separating the four responsibilities above:
1. Semantic Keys Become the Sole Source of Truth
The internal system no longer cares about the string “摇滚” itself. All comparisons and data flows are based entirely on the stable baseline provided by the classifier (i.e., the raw classifier value). For compound values that chain a genre and a style (e.g., Funk / Soul---Contemporary R&B), we abandoned global sanitization and merely split them into independent Genre and Style Semantic Keys.
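The split described above might look roughly like the following sketch. It assumes that “---” is the delimiter between the genre part and the style part; the SemanticKey class and the split_cascading function are hypothetical names used only for illustration, not the project's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SemanticKey:
    kind: str   # "genre" or "style" (hypothetical labels)
    value: str  # raw classifier text, kept as-is apart from trimming whitespace


def split_cascading(raw: str, sep: str = "---") -> list[SemanticKey]:
    """Split a compound classifier value into Genre and Style keys.

    Assumption: `sep` separates the genre part from the style part.
    Characters such as "/" and "&" are deliberately left untouched.
    """
    genre, found, style = raw.partition(sep)
    keys = [SemanticKey("genre", genre.strip())]
    if found and style.strip():
        keys.append(SemanticKey("style", style.strip()))
    return keys


keys = split_cascading("Funk / Soul---Contemporary R&B")
for key in keys:
    print(key.kind, "->", key.value)
```

Because nothing is sanitized, the same classifier output always yields the same keys, which is the stability property the section relies on.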
2. Two-Tier Storage with Locale Mapping
We structured the database with tag_value_catalog (the primary semantic table) and tag_value_locale (the multi-language mapping table):
- Catalog only cares about “What is this tag concept?” (e.g., id: 12, semantic_key: 'Rock').
- Locale cares about how to render this semantic concept under various locales like English, Chinese, Japanese, etc. (e.g., display_value: '摇滚', write_value: '摇滚').
[!IMPORTANT]
- Semantic Keys do not bear display responsibilities.
- Locale mappings do not bear Identity (unique recognition) responsibilities.
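As a rough illustration of the two-tier layout, here is a minimal sketch using Python's built-in sqlite3 module. The table names and the id, semantic_key, display_value, and write_value columns come from the description above; the catalog_id and locale columns, the locale code 'zh-CN', the resolve_write_value helper, and the fallback to the semantic key are all assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tag_value_catalog (
    id           INTEGER PRIMARY KEY,
    semantic_key TEXT NOT NULL UNIQUE          -- identity lives here only
);
CREATE TABLE tag_value_locale (
    catalog_id    INTEGER NOT NULL REFERENCES tag_value_catalog(id),
    locale        TEXT NOT NULL,               -- assumed column, e.g. 'zh-CN'
    display_value TEXT NOT NULL,               -- shown in the UI
    write_value   TEXT NOT NULL,               -- written back to the file
    PRIMARY KEY (catalog_id, locale)
);
""")
conn.execute("INSERT INTO tag_value_catalog (id, semantic_key) VALUES (12, 'Rock')")
conn.execute("INSERT INTO tag_value_locale VALUES (12, 'zh-CN', '摇滚', '摇滚')")


def resolve_write_value(semantic_key: str, locale: str) -> str:
    """Return the string to write for a semantic key under a locale.

    Assumption: when no locale mapping exists, fall back to the key itself.
    """
    row = conn.execute(
        """SELECT l.write_value
             FROM tag_value_catalog AS c
             JOIN tag_value_locale AS l ON l.catalog_id = c.id
            WHERE c.semantic_key = ? AND l.locale = ?""",
        (semantic_key, locale),
    ).fetchone()
    return row[0] if row else semantic_key


print(resolve_write_value("Rock", "zh-CN"))
```

Note that the lookup always starts from the semantic key and never from a localized string, so renaming a display value cannot change a tag's identity.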
III. The Art of Compare: When to Actually Write Back
Under the V3 architecture, determining “whether we really need to modify the file metadata this time” is split into two core phases:
Phase A: Semantic Alignment (Merge)
No matter how many raw classifier values the current analysis produces, the system first extracts their corresponding semantic keys, deduplicates them, and merges them with any unwritten Drafts.
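A minimal sketch of this merge step, assuming that Drafts can be represented as a plain list of semantic keys and that draft entries keep their position ahead of newly analyzed ones. The merge_semantic_keys function and its to_key parameter are hypothetical; the default identity mapping reflects the statement that the semantic key is the raw classifier value.

```python
from typing import Callable, Iterable


def merge_semantic_keys(
    raw_values: Iterable[str],
    draft_keys: Iterable[str],
    to_key: Callable[[str], str] = lambda raw: raw,
) -> list[str]:
    """Deduplicate analyzed keys and merge them with unwritten draft keys."""
    analyzed = (to_key(raw) for raw in raw_values)
    # dict.fromkeys removes duplicates while preserving first-seen order.
    return list(dict.fromkeys([*draft_keys, *analyzed]))


merged = merge_semantic_keys(["Rock", "Rock", "Jazz"], ["Jazz", "Blues"])
print(merged)
```

No localized string is involved at this stage; the merge operates purely on semantic keys.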
Phase B: Writeback Determination (Diff)
This is the most crucial step! Whether to generate a file modification patch no longer depends on the semantic key. Instead, the system takes the desired write value (the string expected to be written under the current user’s preference settings) and compares it directly with the file current value (the actual string currently read from the file).
Real-world Scenarios:
- Scenario 1: The file originally contained the English Rock, and now you configure it for a Chinese interface and Chinese tags.
  - Semantic Key: Rock
  - File Current: Rock
  - Desired Write: 摇滚
  - Conclusion: Generate a Patch! The system will perfectly rewrite the original file.
- Scenario 2: The file has already been rewritten to 摇滚, and the system scans the Semantic Key for Rock again.
  - Semantic Key: Rock
  - File Current: 摇滚
  - Desired Write: 摇滚
  - Conclusion: No Patch Generated! This drastically saves disk I/O and ensures the purity of the history logs.
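The two scenarios above can be reproduced with a small sketch of the writeback check. The Patch class and the plan_writeback function are hypothetical names for illustration; the only rule taken from the text is that the decision compares the desired write value with the file current value, not the semantic key.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Patch:
    semantic_key: str
    old_value: str  # what the file currently contains
    new_value: str  # what should be written


def plan_writeback(
    semantic_key: str, file_current: str, desired_write: str
) -> Optional[Patch]:
    """Return a Patch only when the file's string differs from the desired one."""
    if file_current == desired_write:
        return None  # nothing to write, so no I/O and no history entry
    return Patch(semantic_key, file_current, desired_write)


# Scenario 1: the file holds "Rock", the user now prefers Chinese tags.
scenario_1 = plan_writeback("Rock", file_current="Rock", desired_write="摇滚")
# Scenario 2: the file was already rewritten to "摇滚".
scenario_2 = plan_writeback("Rock", file_current="摇滚", desired_write="摇滚")

print(scenario_1)
print(scenario_2)
```

The semantic key is carried along only so the patch can be traced back to its concept; it plays no part in the comparison itself.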
Conclusion
A tag system based on Semantic Key separation sounds rather traditional, but it is the foundation supporting subsequent “user-defined mapping rules” and “complex LLM vector unification.” Software engineering is often like this: decouple the responsibilities cleanly, and the business logic becomes refreshingly clear again.
