WFDis Documentation
What is this?
WFDis is an AI-based automated reverse engineering project for binary executables, which is currently in development.
Because of the large resources required, it uses a client/server model, with a web browser for the front end. With the addition of a simpler tracing disassembler, the front end has become useful in its own right and is available for public testing and use with 6502 binaries. The full back end is not yet public.
The current focus is creating all the UI elements and interactions the system will wish to express. This public version offers "Human Mode" reverse engineering: the user manually discovers traits of the binary and triggers UI elements to describe them. The server-based version is "AI Mode", which automatically fleshes out a descriptive understanding of the code and reports it through this same presentation substrate.
Why 6502?
These 8-bit environments are much simpler to instantiate than modern platforms, but offer far broader challenges in reverse engineering:
- There are no standard ABIs, calling conventions, or stack frames
- Low-level hardware is fully open to custom bit-banging, including timing dependencies
- Banking systems and loaded overlays make addresses ambiguous
- Code is hand-written, without the discernible patterns that compilers produce
- Data structures are completely ad hoc, with no malloc/free equivalent
- Tons of nasty tricks are used to squeeze the most out of every cycle & every byte, including self-modifying code
This promotes focusing development effort on WFDis's unique "smarts" rather than on reimplementing myriad platform details. However, the infrastructure is already designed to express modern CPUs, memory systems, and abstract OS interfaces. While there is less of this low-level code style today, analysis that can tackle it is still absolutely applicable for malware, kernels, drivers, embedded systems, and debugging compiler output.
There is also strong general interest in these older computing platforms, with a generational wave of nostalgia for popular home computers and video games from a few decades back. The 6502 was used in many systems from Commodore, Atari, Apple, and Nintendo, just to name a few.
Load behavior depends on the extension of the provided file's name:
- .prg - Commodore 64 executable. Binary file with a 2-byte load address prepended.
- .p00 - Commodore 64 executable, including PETSCII filename.
- .sid - Commodore 64 sound program.
- .d64 - Commodore 1541 disk image.
- .t64 - Commodore tape image.
- .bin - raw binary. The load address will be asked for separately.
- .rom - same as .bin
- .wfdis - resume a downloaded snapshot
Files without a recognized extension are assumed to be in .prg format. The .prg loader is also used for BASIC and SID file formats, which it autodetects based on file contents.
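As a rough illustration of the extension dispatch and the .prg layout described above, here is a minimal Python sketch. The loader names and function names are hypothetical, not WFDis internals; it assumes only that the first two bytes of a .prg hold the load address in little-endian order (low byte first), as on the C64.

```python
from pathlib import PurePath

# Hypothetical loader names; the real WFDis loaders are not public.
LOADERS = {
    ".prg": "prg", ".p00": "p00", ".sid": "sid", ".d64": "d64",
    ".t64": "t64", ".bin": "raw", ".rom": "raw", ".wfdis": "snapshot",
}

def choose_loader(filename: str) -> str:
    """Pick a loader by extension, falling back to .prg for unknown ones."""
    ext = PurePath(filename).suffix.lower()
    return LOADERS.get(ext, "prg")

def load_prg(data: bytes) -> tuple[int, bytes]:
    """Split a .prg into (load_address, payload).

    The first two bytes are the load address, stored little-endian.
    """
    if len(data) < 2:
        raise ValueError("a .prg needs at least a 2-byte load address")
    load_address = data[0] | (data[1] << 8)
    return load_address, data[2:]
```

For example, a file starting with the bytes $01 $08 loads at $0801, the usual start of C64 BASIC programs.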
BASIC programs and SID files are automatically disassembled from their known entry points. Raw bin/rom images are disassembled from their $fffx vectors, if the image overlaps those locations. Otherwise, the disassembly must be manually begun by selecting an address and pressing Shift-a.
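The entry points mentioned above could be located roughly as follows. This is a sketch, not WFDis code: the function names are hypothetical, the vectors are the standard 6502 NMI/RESET/IRQ locations at $fffa, $fffc and $fffe (each a little-endian address), and the PSID init and play addresses are read as big-endian words at header offsets 10 and 12. The special meanings of zero in those PSID fields are not handled here.

```python
import struct

# Standard 6502 hardware vectors; each holds a little-endian 16-bit address.
VECTORS = {"NMI": 0xFFFA, "RESET": 0xFFFC, "IRQ/BRK": 0xFFFE}

def vector_entry_points(load_address: int, image: bytes) -> dict[str, int]:
    """Return the vectors whose two bytes both lie inside the loaded image."""
    end = load_address + len(image)  # one past the last loaded address
    found = {}
    for name, addr in VECTORS.items():
        if load_address <= addr and addr + 1 < end:
            off = addr - load_address
            found[name] = image[off] | (image[off + 1] << 8)
    return found

def psid_entry_points(data: bytes) -> dict[str, int]:
    """Read the init/play addresses from a PSID header (big-endian fields)."""
    if data[:4] != b"PSID":
        raise ValueError("not a PSID file")
    init_address, play_address = struct.unpack_from(">HH", data, 10)
    return {"init": init_address, "play": play_address}
```

An 8 KB image loaded at $e000 covers all three vectors, while a small image at $c000 yields none, which is the case where tracing has to be started manually.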
If a Commodore 64 file format is loaded, labels for various ROM routines and I/O locations are automatically created.
Numeric Formats
Inputs that require an address or value can accept multiple formats:
- Label names
- c000 - Numbers interpreted as hexadecimal
- $c000 - Numbers interpreted as hexadecimal
- +49152 - Numbers interpreted as decimal
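The formats listed above could be parsed along these lines. This is a hedged sketch: the function name is hypothetical, and giving label names priority over bare hex (so a label named "beef" would shadow the number $beef) is an assumption, since the precedence is not stated here.

```python
def parse_value(text: str, labels: dict[str, int]) -> int:
    """Parse an address/value input in one of the documented formats.

    Assumption: a known label name wins over reading the same text as hex.
    """
    text = text.strip()
    if text in labels:
        return labels[text]          # label name
    if text.startswith("+"):
        return int(text[1:], 10)     # +49152 -> decimal
    if text.startswith("$"):
        return int(text[1:], 16)     # $c000 -> hexadecimal
    return int(text, 16)             # c000 -> bare numbers are hexadecimal
```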
Importing Labels
A file defining labels can be imported, which affects only the current overlay. VICE label files (which can be generated by ca65) are supported, as well as a more freeform syntax. Semicolons indicate comments.
Sample lines:
al 000801 .foo   ; VICE format, hexadecimal and an ignored dot
foo: $0801
foo = 2049       ; Freeform syntax defaults to decimal
foo EQU $0801
The regex for separating label & address is (=|:| \.?[eE][qQ][uU]? ), which should support enough variation for common cases.
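A minimal sketch of a parser for the sample syntax above, reusing the quoted separator regex. The VICE-line pattern, the comment stripping, and the $ prefix for freeform hex are assumptions inferred from the samples rather than WFDis's actual parser; the function names are hypothetical.

```python
import re

# The separator regex quoted above, used as-is.
SEPARATOR = re.compile(r"(=|:| \.?[eE][qQ][uU]? )")
# Assumed shape of a VICE label line: "al <hex address> .<name>".
VICE_LINE = re.compile(r"^al\s+([0-9A-Fa-f]+)\s+\.?(\S+)$")

def parse_number(text: str) -> int:
    """Freeform values default to decimal; a leading $ means hex."""
    text = text.strip()
    return int(text[1:], 16) if text.startswith("$") else int(text, 10)

def parse_label_file(source: str) -> dict[str, int]:
    labels = {}
    for raw in source.splitlines():
        line = raw.split(";", 1)[0].strip()   # semicolons start comments
        if not line:
            continue
        vice = VICE_LINE.match(line)
        if vice:                               # VICE: hex address, dot ignored
            labels[vice.group(2)] = int(vice.group(1), 16)
            continue
        parts = SEPARATOR.split(line, maxsplit=1)
        if len(parts) != 3:
            raise ValueError(f"unrecognized label line: {raw!r}")
        name, _separator, value = parts
        labels[name.strip()] = parse_number(value)
    return labels
```

Because the regex has a capturing group, re.split returns the separator as the middle element, which is why a well-formed freeform line splits into exactly three parts.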
Emulation
Many media-loaded programs contain decompression or relocation code before actually getting to the software itself. A rudimentary emulator is included to attempt to run such routines and capture their output for further disassembly.
Select the entire code block to emulate and press Shift-r. Emulation exits when the PC reaches the instruction after the last selected instruction. All reads (including those made by read-modify-write instructions) must come from known loaded bytes, or from bytes that the emulated code has already written. This rule often stops emulation when routines update visuals to show progress (e.g. inc $d020, which reads the VIC-II border color register, an I/O location that was never loaded).
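A minimal sketch of that memory rule, assuming a flat 64 KB address space with no banking or I/O emulation. The class and method names are hypothetical and only illustrate the known-byte check, not the actual WFDis emulator.

```python
class TrackedMemory:
    """64 KB of memory that only allows reads from 'known' bytes.

    A byte is known if it came from the loaded file or was written by the
    emulated code. Reading any other byte aborts the emulation.
    """

    def __init__(self) -> None:
        self.data = bytearray(0x10000)
        self.known = bytearray(0x10000)  # 1 = byte has a known value

    def load(self, address: int, payload: bytes) -> None:
        for i, value in enumerate(payload):
            addr = (address + i) & 0xFFFF
            self.data[addr] = value
            self.known[addr] = 1

    def read(self, address: int) -> int:
        address &= 0xFFFF
        if not self.known[address]:
            raise RuntimeError(f"read from unknown byte ${address:04x}")
        return self.data[address]

    def write(self, address: int, value: int) -> None:
        address &= 0xFFFF
        self.data[address] = value & 0xFF
        self.known[address] = 1

    def inc(self, address: int) -> int:
        """INC is read-modify-write, so the byte must already be known."""
        value = (self.read(address) + 1) & 0xFF
        self.write(address, value)
        return value
```

Under this model, inc $d020 fails until something has stored to $d020, while writes alone (such as sta $d020) always succeed.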
Saving
The disassembled context can be saved to browser localStorage. Ensure that browser privacy settings, or plugins like Self-Destructing Cookies, do not automatically wipe out your saved information.
It can also be downloaded to a .wfdis file on your computer. However, browser security measures require a manual step for each download, so this is not the default.
Bugs & Upcoming Features
- Line breaking multi-element lines cleanly.
- Exporting to various .asm source file formats.
- The emulator currently only supports generic 6502 instructions; this will follow the CPU model selection in later versions.
- Redundant instructions, like the many NOPs of the 65C02 unused opcodes, are currently indistinguishable in the disassembly.
- Conditional branches which are always taken at runtime still trace the non-taken path. In the future, manual intervention (adding "stop signs") and re-disassembly will be able to work around this.
- Overlapping meanings for the same bytes are supported but not well displayed. For example, in BIT $EA the operand byte $EA is also traced as a NOP instruction.
- Only PSID .sid files are currently supported; the RSID format is not detected.
- C64 labels are only added to the loaded memory, and not into subsequent emulation output.
- Multi-line comment editing.
- JMP ($xxFF) bug is not yet handled.
- .p01, .p02, etc. extensions are not yet recognized.
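The JMP ($xxFF) bug refers to the NMOS 6502 behavior where an indirect JMP whose pointer sits at the end of a page fetches its high byte from the start of the same page, instead of the next page. A sketch of how a tracer could resolve such targets follows; the function name and the nmos flag are hypothetical, and the non-NMOS branch reflects the corrected behavior of the 65C02.

```python
def jmp_indirect_target(memory: bytes, pointer: int, nmos: bool = True) -> int:
    """Resolve the target of JMP (pointer).

    On NMOS parts the high byte is fetched without carrying into the
    pointer's high byte, so JMP ($10FF) reads $10FF and then $1000.
    The 65C02 reads $10FF and then $1100, as the instruction suggests.
    """
    low = memory[pointer & 0xFFFF]
    if nmos:
        high_addr = (pointer & 0xFF00) | ((pointer + 1) & 0x00FF)
    else:
        high_addr = (pointer + 1) & 0xFFFF
    return low | (memory[high_addr] << 8)
```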
Fixed Limitations
This Human Mode version does not support the following features. Support will only come through the full AI Mode version.
- 65816 support, as CPU register width flags require more dynamic analysis.
- Other non-65xx architectures. This front end version is focused on getting the UI working well. The back end has more flexible CPU handling.
- Automated BASIC launchers currently only understand numeric literal SYS addresses. AI Mode is smarter.
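To illustrate what "numeric literal SYS addresses" means, here is a sketch that scans a tokenized Commodore BASIC V2 program for the first SYS followed by plain decimal digits. The function name is hypothetical and this is not the WFDis implementation. It relies on the standard tokenized line layout (a 2-byte little-endian link to the next line, a 2-byte line number, tokens, then a $00 terminator, with a $0000 link ending the program) and the SYS token $9E.

```python
from typing import Optional

SYS_TOKEN = 0x9E  # Commodore BASIC V2 token for SYS

def find_sys_address(load_address: int, program: bytes) -> Optional[int]:
    """Return the first numeric-literal SYS target, or None.

    Expressions such as SYS(49152) or SYS A are deliberately not handled.
    """
    offset = 0
    while offset + 4 <= len(program):
        link = program[offset] | (program[offset + 1] << 8)
        if link == 0:
            break                            # $0000 link: end of program
        pos = offset + 4                     # skip link and line number
        while pos < len(program) and program[pos] != 0:
            if program[pos] == SYS_TOKEN:
                pos += 1
                while pos < len(program) and program[pos] == 0x20:
                    pos += 1                 # allow spaces: SYS 2061
                digits = bytearray()
                while pos < len(program) and 0x30 <= program[pos] <= 0x39:
                    digits.append(program[pos])
                    pos += 1
                if digits:
                    return int(digits.decode("ascii"))
                continue                     # SYS without a literal
            pos += 1
        next_offset = link - load_address    # links are absolute addresses
        if next_offset <= offset:
            break                            # malformed link; avoid looping
        offset = next_offset
    return None
```

For the common launcher line 10 SYS2061 loaded at $0801, this returns 2061 ($080d), the first byte after the BASIC program.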
Credits
NMOS illegal opcode naming conventions are sourced from All About Your 64. There are many different mnemonics for these opcodes; this is the list I'm most familiar with.
PETSCII to Unicode mappings are from the work on Recode.
Changelog
2024-04-19:
- Finally tracked down and fixed some of the major DOM & JS slowness.
2024-04-14:
- Added HuC6280 support (PC Engine/TurboGrafx 16).
- Mitigated some issues with overlapping asm instructions.
- Display of rmb/smb/bbr/bbs instruction names fixed.
- The time since the previous update has been spent on AI research.
2019-02-17:
- Fixed a breaking change on keyboard handling from browser updates.
2018-12-24:
- Sped up rendering by about 40x, so loading large files is much more tolerable. Still slow, though.
2018-11-25:
- Added label file importing.
2018-07-14:
- Cleanup, optimization, and bugfixes noticed during rewrite.
2017-11-04:
- Bugfixed NOPs in "65C02 with bit operations" ISA.
- Fixed links to help page in Firefox.
2017-10-14:
- Fixed spurious tracing after JMP indirect instructions.
- Replaced external JS libraries that were failing to load with internal implementations.
2017-09-12:
- Labels hidden in the middle of the line are now broken out and displayed as " = * + 3" style offsets.
- References pointing inside of multi-byte lines are now displayed in "label + 1" style, if the line has a label.
- Multiple characters and sprites per line can be displayed now, if they are selected together when changed, similar to bytes and words.
- Added #!load=filename support in the URL, for easier demonstration purposes.
- Bugfixed file loading and NaN display issues caused by recent changes.
2017-08-29:
- Smooth scrolling during link navigation can now be toggled.
- Minor bugfixes & typo corrections.
2017-08-10:
- T64 and P00 file support.
- Multicolor graphics display overhauled and bugfixed.
- Responsiveness during load and "Thinking" indicator added.
2017-07-26:
- D64 file listings display start & end addresses of PRG files.
- Better multi-file loading and cross-file resolving of addresses.
- Labels within unknown byte spans now display.
- More types added: bytes, words, addresses, PETSCII text.
2017-07-07:
- Proper undo and redo support.
- Speed improvements. Minor rendering bugfixes.
2017-04-02:
- Debugging & polish around spacing and cursor movement & selection.
- Added Create and Delete functionality for byte locations.
- Line address prefixes are now unselectable, making it easier to copy/paste from the page into source code files. This will be reverted when true .asm export is finished.
2017-03-25:
- Allow traces from different CPU variants to coexist in a single file. This is part of an internal overhaul, prerequisite to saving .asm files properly.
- Debugged alignment issue with char/sprite display and reenabled multicolor.
2017-03-11:
- Direct loading of .wfdis files for easier handling. Other minor improvements.
2017-03-01:
- Added R6501 support by request.
2017-02-19:
- First version of help.html. Solidifying existing exploratory features into actual public usefulness.
Contact
The WFDis thread on the 6502.org forums is a good place to post, or PM user White Flame there.