Sunday, May 27, 2012

FLV Script Data Extractor in Progress

Recently, I have been working on a program to extract script data from FLV files. Script data is basically any data in the file that isn't audio or video data. This includes metadata (included at the start of the file) and event data (included anywhere). I have a very specific purpose for extracting this data, and when I first started this project, I kept the scope focussed on just that purpose.

I began by researching FLV files in general. I knew that some data was embedded in the file, but I did not know how or where. After digging around, I came across something called a cue point, which is apparently used to trigger an event in ActionScript (ActionScript 3 is Flash’s current scripting language). I tried downloading some programs that were supposed to be able to view and edit cue points, but none of them found the embedded data I sought. Dismayed, I put the project on ice for a while.

After renewing my interest in the project (I forget exactly what sparked it), I decided to try loading the FLV file directly into Flash and seeing if I could access the embedded data through ActionScript. Just accessing the necessary classes for doing this was a pain. You see, I use a program called FlashDevelop instead of Adobe Flash Professional CS, so I had to download a special library file to get access to the classes I needed. I suppose that’s the price you pay when you use free software. The relevant ActionScript 3 class for playing FLV files is FLVPlayback. This class contains ways of accessing those cue points I mentioned before. I wrote a script to load the FLV file and then check for cue points, but FLVPlayback always reported no cue points. I found this perplexing. I consulted Adobe’s ActionScript 3 documentation and met with no success. Unfortunately, Adobe’s documentation is not always thorough and I sometimes find myself faced with bizarre behavior and no documentation to explain it, but I digress.

Now, I didn’t know whether there was some reason why ActionScript reported no cue points or whether this FLV used some other, less obvious means to embed its data, but I did know that I was frustrated by the lack of an explanation. I decided that I would just learn the FLV file format myself and write a C++ program to find and extract the data I wanted.

After some searching, I finally found Adobe’s own documentation on the FLV file format. I first opened my test FLV file in a hex editor to verify that I had the correct documentation. To my delight, all of the header bytes matched up perfectly. Armed with this excellent documentation, I started a new project in Visual C++ 2010 Express (free software makes me smile). As with many new projects, I did not start with a well-defined notion of exactly what I wanted to do. At that point, I did not fully understand the FLV format and mistakenly thought that all script data was referred to as metadata, so, like a derp, I named the project FLV Metadata Parser. It’s no big deal. I just renamed the output file later after I realized my error. After several days of effort, I had a working script data extractor. Having proved my technique worked, I spent some time generalizing my code so that I could do more than just output the script data. After all, my ultimate goal was never simply to do that. Realizing that a general script data extractor may be useful to others, I decided to clean it up and provide it for download. I’m still in the process of cleaning it up and testing it, but I should be able to make it available shortly.
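
To give a rough idea of what checking the header bytes and pulling out script data involves, here is a minimal sketch. It is not the extractor described above, just an illustration based on the tag layout in Adobe's FLV specification: a 9-byte header starting with "FLV", followed by tags that each carry an 11-byte tag header, with tag type 18 marking script data. The names `ScriptTag`, `readBE`, and `extractScriptTags` are made up for this example, and the payload is returned raw (still AMF-encoded) rather than decoded.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// One script data tag found in the file (hypothetical struct for this sketch).
struct ScriptTag {
    std::uint32_t timestampMs;          // tag timestamp in milliseconds
    std::vector<std::uint8_t> payload;  // raw tag body, still AMF-encoded
};

// Read a big-endian unsigned integer of `count` bytes (FLV fields are big-endian).
static std::uint32_t readBE(const std::vector<std::uint8_t>& b,
                            std::size_t pos, std::size_t count) {
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < count; ++i) {
        value = (value << 8) | b[pos + i];
    }
    return value;
}

// Walk every tag in an in-memory FLV file and collect the script data tags.
std::vector<ScriptTag> extractScriptTags(const std::vector<std::uint8_t>& flv) {
    // Header: 'F' 'L' 'V', version, flags, then a 4-byte offset to the body.
    if (flv.size() < 9 || flv[0] != 'F' || flv[1] != 'L' || flv[2] != 'V') {
        throw std::runtime_error("not an FLV file");
    }
    std::size_t pos = readBE(flv, 5, 4);  // DataOffset (9 for version 1)
    pos += 4;                             // skip PreviousTagSize0

    const std::size_t kTagHeaderSize = 11;
    const std::uint8_t kScriptDataType = 18;  // 8 = audio, 9 = video

    std::vector<ScriptTag> tags;
    while (pos + kTagHeaderSize <= flv.size()) {
        const std::uint8_t type = flv[pos] & 0x1F;  // low 5 bits hold the type
        const std::uint32_t dataSize = readBE(flv, pos + 1, 3);
        // 24-bit timestamp plus an extension byte holding the top 8 bits.
        const std::uint32_t timestamp =
            readBE(flv, pos + 4, 3) |
            (static_cast<std::uint32_t>(flv[pos + 7]) << 24);

        const std::size_t bodyStart = pos + kTagHeaderSize;
        if (bodyStart + dataSize > flv.size()) {
            break;  // truncated file: stop rather than read past the end
        }
        if (type == kScriptDataType) {
            tags.push_back({timestamp,
                            std::vector<std::uint8_t>(
                                flv.begin() + bodyStart,
                                flv.begin() + bodyStart + dataSize)});
        }
        pos = bodyStart + dataSize + 4;  // skip the body and PreviousTagSize
    }
    return tags;
}
```

To use it, read the whole file into a `std::vector<std::uint8_t>` and pass it to `extractScriptTags`. Each returned payload would still need an AMF decoder to turn it into readable names and values.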
