You may all be interested in the proof-of-concept GitHub repository we currently have for development of Morroblivion v065:
https://github.com/ponyrider0/Morroblivion-JSON
In short, this is the full source code of the Morroblivion ESM, with all changes from v064 to the latest v065 developer build stored and tracked as separate commits. Instead of YAML, I am using a JSON-like format that departs from legal JSON in a few places to make text processing and version control easier for certain records, such as Script and Book records. Currently, it is only a one-way process going from the binary TES4 ESM to decompiled JSON source code. Decompiler source code is here:
https://github.com/ponyrider0/ESM2JSON-Scripts.
My goal is to use the TES4 ESM exporter code from this branch of openmw (
https://github.com/ponyrider0/openmw) to produce a full ESM compiler/decompiler development and modding system. The inspiration and template for this system is the WeiDU system used for modding of Infinity Engine games:
https://weidu.org/. For those unfamiliar with WeiDU, it is a comprehensive byte-decompiler/-compiler, translation manager, regular expression/string processor, binary patcher and mod management system. One thing that seems to be missing from the WeiDU Infinity Engine modding ecosystem is integration with an open-source version control system like git.
Some lessons learned so far:
- Each ESM record must be stored as an individual file in order for git to be able to properly identify, track and merge changes made at the record level. The first iteration of the JSON source-code format combined all records of a given type into a single text file; for example, all activator records were stored in ACTI.json, all alchemy records were stored in ALCH.json, etc. This resulted in efficient physical disk space usage and short decompilation times. However, as mentioned in the previous post by Chris, it was very difficult for git to identify and track changes made in multiple records unless the records were sorted in exactly the same order each time. It was also impossible to merge source code from a small ESP patch file that contained a subset of records into a larger ESM master file that contained a superset of records. These issues were completely resolved by storing each record as an individual file.
- Currently, each record is stored individually, with filenames based on their TES4 FormID, and organized into file directories based on their record type (ACTI, ALCH, etc.). Adapting this system to use TES3 string-based record IDs should be straightforward as long as the files stay organized in file directories based on record type. With this design, ESP patch files can be developed in separate branches or forks and merged into an ESM master using standard git pull requests. In fact, the Morroblivion-JSON repository has several branches containing decompiled versions of v064, v065, and several ESP hotfixes, which were then successfully merged together using the built-in git branch-merge feature rather than a separate ESM/ESP-based merging tool.
- The biggest issues with the current design have already been partly mentioned on this thread: compiler/decompiler performance and disk space efficiency. The decompilation time is very heavily bottlenecked by the OS filesystem performance. There are approximately 70,000 total records in Morrowind + Tribunal + Bloodmoon. There are ~450,000 total records in the Morroblivion ESM (this is because all worldspace/map objects are stored as subrecords in their parent CELL record in TES3, but are stored as full-fledged records in a CELL subfolder in TES4). The current decompilation process uses a TES4Edit-based script which takes about 4 hours to generate the ~450,000 files on Windows NTFS. Based on performance monitor data, I think most of that time is spent waiting on the Windows filesystem, but I won't know for sure until I port the TES4Edit-based decompilation scripts to Python or C++. Regarding disk space usage and efficiency, the current full JSON source-code tree of the Morroblivion v065 ESM contains 431,390 files, with a total data size of 956 MB, taking up 2.17 GB of disk space. I could probably improve these performance and efficiency numbers by combining all worldspace/object records into a combined JSON file, but then I would lose granularity in tracking those changes within git. I will have to evaluate the costs and benefits in the future.
- Another issue with the current design is that completely deleting a record/file in an ESP cannot be propagated to a master with a simple file-tree copy procedure. My current plan is to leverage the ESM format's "Delete" flag bit to mark a record as deleted; these files can then be purged in a post-processing step at any point after merging into the master repository.
- My eventual plan is to replace all hard-coded 32-bit FormIDs in the filename as well as in the record data with string-based record IDs. Then, these can be dynamically resolved into FormIDs when compiling the source code into TES4 engine binary format... or left as string-based record IDs when compiling for TES3/OpenMW engine!
- A pre-/post-processing step should be done after decompilation and before recompilation. This deals with the issues related to whitespace, line endings, formatting and character encoding which have previously been mentioned on this thread. Example: the string buffers from script and book subrecords should be converted from a byte-packed array into a line-by-line text format so that git and diff tools can identify and track the changes in more granular (line-by-line) chunks of data. The general JSON source code can also be formatted with proper nested indentation for easier readability when manual merges need to be done. During the compilation stage, standard JSON minify and other off-the-shelf pre-processing operations can be used to standardize formatting and to clean and strip comments from the source code so that it is in a legal JSON format. The legalized JSON files can then be parsed by existing JSON processing libraries.
- Additional pre-/post-processing steps can be added to do things similar to WeiDU mods, such as a script-based mod package/installer which modifies all strings at mod-installation time using regular expressions. Or a mod which searches for all creature records of a certain type and modifies the attributes by a specific mathematical formula. WeiDU mods can even do very advanced things like decompile in-game scripts from an ESM, then search the decompiled scripts for specific instructions and insert/replace those instructions with new ones, then recompile the scripts into an active ESM file.
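To make the one-file-per-record layout and the delete-flag purge described in the list above a bit more concrete, here is a minimal Python sketch. It is only an illustration under assumptions, not the code used in the repository: the `<TYPE>/<FormID>.json` naming, the `flags` and `edid` keys, and the helper names are all hypothetical, and the 0x20 value for the "Deleted" bit is taken from my understanding of the TES4 record header and should be verified against the format documentation.

```python
import json
import tempfile
from pathlib import Path

# Assumption: bit 0x20 of the TES4 record-header flags marks a record as deleted.
DELETED_FLAG = 0x20


def record_path(root: Path, record_type: str, form_id: int) -> Path:
    """Hypothetical layout: <root>/<record type>/<8-hex-digit FormID>.json"""
    return root / record_type / f"{form_id:08X}.json"


def write_record(root: Path, record_type: str, form_id: int, record: dict) -> Path:
    """Write one record to its own file, indented and key-sorted for stable diffs."""
    path = record_path(root, record_type, form_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    text = json.dumps(record, indent=2, sort_keys=True) + "\n"
    path.write_text(text, encoding="utf-8")
    return path


def purge_deleted(root: Path) -> list:
    """Remove every record file whose (hypothetical) 'flags' field has the deleted bit set."""
    removed = []
    # sorted() materializes the listing first, so unlinking is safe while looping.
    for path in sorted(root.glob("*/*.json")):
        record = json.loads(path.read_text(encoding="utf-8"))
        if record.get("flags", 0) & DELETED_FLAG:
            path.unlink()
            removed.append(path.relative_to(root).as_posix())
    return removed


def demo():
    """Build a two-record tree, purge it, and report what was removed and what is left."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        write_record(root, "ACTI", 0x01000ED4, {"edid": "ex_lever", "flags": 0})
        write_record(root, "ALCH", 0x01000ED5,
                     {"edid": "ex_potion", "flags": DELETED_FLAG})
        removed = purge_deleted(root)
        remaining = sorted(p.relative_to(root).as_posix()
                           for p in root.glob("*/*.json"))
        return removed, remaining
```

In this sketch the purge runs over the merged tree, so an ESP branch only needs to commit a record with the deleted bit set; the file itself disappears later, when the master is cleaned up.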
So what is the ultimate point of this proof-of-concept? To demonstrate that a massive mod like Morroblivion can be developed in a distributed, collaborative manner by leveraging existing version control tools already used by the open-source community. To explore and experiment with new ways of developing and deploying mods when not constrained by a single binary ESM format which is locked to one game engine.
So the current to-do list:
1) Adapt the TES4 export code and script byte-compiler from
https://github.com/ponyrider0/openmw (aka modexporter) to complete the decompiler/compiler system.
2) Port the current TES4Edit-based decompiler scripts from
https://github.com/ponyrider0/ESM2JSON-Scripts to python or C++.
3) Implement a dynamic Record ID to FormID resolver, probably basing it on the FormID lookup system from the above branch of openmw/modexporter.
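For item 3, a rough Python sketch of what a Record ID to FormID resolver could look like follows. This is not the lookup system from openmw/modexporter, just an assumption-laden illustration: the class name, the starting object index of 0x001000, and the choice to lower-case IDs before lookup are all hypothetical. The only format detail it relies on is that a TES4 FormID is a 32-bit value whose top byte is the mod (load-order) index and whose lower 24 bits are the object index.

```python
OBJECT_INDEX_MASK = 0x00FFFFFF  # lower 24 bits of a FormID hold the object index


class FormIDResolver:
    """Hypothetical resolver: hands out one FormID per string-based record ID."""

    def __init__(self, mod_index: int, first_object_index: int = 0x001000):
        if not 0 <= mod_index <= 0xFF:
            raise ValueError("mod index must fit in one byte")
        self.mod_index = mod_index
        # Assumption: an arbitrary starting point for newly assigned object indices.
        self.next_object_index = first_object_index
        self.table = {}

    def resolve(self, record_id: str) -> int:
        """Return the FormID for record_id, assigning a new one on first use."""
        key = record_id.lower()  # assumption: record IDs compare case-insensitively
        if key not in self.table:
            if self.next_object_index > OBJECT_INDEX_MASK:
                raise OverflowError("ran out of 24-bit object indices")
            self.table[key] = (self.mod_index << 24) | self.next_object_index
            self.next_object_index += 1
        return self.table[key]


resolver = FormIDResolver(mod_index=0x01)
lever = resolver.resolve("ex_lever")
potion = resolver.resolve("ex_potion")
print(f"{lever:08X} {potion:08X}")
```

A real resolver would presumably also need to save its table between builds, so that a record keeps the same FormID from one compiled ESM to the next; that part is left out here.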
Ideas for the future:
- Adapt openmw/modexporter to export directly to JSON format.
- Adapt the JSON compiler to output TES3 binary compatible format.
- Automate the toolchain and integrate with existing ESM/ESP editors like openmw-cs, TES Construct Set, Morrowind Enchanted Editor, TES4Edit, etc.
- Automate the GitHub repository to compile and package nightly builds of ESM files.
- Create a JSON importer to read JSON format into openmw data structures.
- Create a new mod format that is based on decompiler/string processor/compiler scripts similar to WeiDU mods.
- Dynamically resolve Record ID to FormID at game engine start-up to circumvent the 255-mod maximum seen in the TES4/TES5 engines.