Wednesday, May 30, 2012

FLV Data Extractor and the Mystery Tag

I finally finished that FLV Data Extractor I mentioned in my last post. You can download a copy of it here: The program has a command line interface, but I am strongly considering writing a generic GUI that can be placed over command line programs. I imagine that many modern computer users are not even aware of the Windows console, let alone how to use it, so if you want lots of people to use a program, it should probably have a GUI. The command line interface still has its merits. For example, you can use programs with CLIs in batch files, although I suppose windowed programs can accept command line arguments for the same purpose. I’ll have to play around with the generic GUI idea and see how it works out. I may just end up modifying the FLV Data Extractor to have a GUI that also supports the command line.

If you try out the program, you’ll probably notice that it asks you to agree to some terms before using it. While I distribute most of my software for free, I can’t get over the idea of someone using my software for profit. I suppose this is a personal hang-up of mine and that I should get over it, but I can’t, at least not now.

On a final note, while testing the FLV Data Extractor, I came across something strange. Adobe’s FLV format specification only mentions three types of data tags: audio (0x08), video (0x09), and script data (0x12), but some of my files contained another tag type, 0x0F. I tried googling for information but turned up nothing. I then examined the data inside the mystery tag, and it looks like a script data tag except that there is an extra 0x00 byte before the start of the event name. This tag is why I added the -u flag to the command line options. I don’t know what the extra byte is meant to indicate. If it’s a count of some kind, or if it indicates the presence of some extra data, it will cause problems with the -u flag. If you happen to find any FLV files that produce errors or unknown tags (even with -u set), feel free to send them to me for analysis.
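
For anyone curious how an unknown tag type like 0x0F can be spotted and stepped over, here is a minimal C++ sketch based on the tag layout in Adobe’s specification: an 11-byte tag header (type, 24-bit data size, timestamp, stream ID), then the payload, then a 4-byte previous-tag-size field. This is not the extractor’s actual code. The names TagHeader, classify, parseTagHeader, and nextTagOffset are made up for illustration, and masking the first byte down to its low five bits assumes the newer revision of the specification, which reserves the top three bits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Tag types documented in Adobe's FLV specification; anything else is Unknown.
enum class TagKind { Audio, Video, ScriptData, Unknown };

// Hypothetical struct for illustration; not taken from the extractor's source.
struct TagHeader {
    std::uint8_t  type;       // low 5 bits of the first byte: 0x08, 0x09, 0x12, ...
    std::uint32_t dataSize;   // payload length in bytes (24-bit big-endian)
    std::uint32_t timestamp;  // milliseconds: 24-bit value plus 8-bit extension
};

const std::size_t kTagHeaderSize = 11;        // type + size + timestamp + stream ID
const std::size_t kPreviousTagSizeBytes = 4;  // trailer written after every tag

// Read a 24-bit big-endian unsigned integer.
std::uint32_t readU24(const std::uint8_t* p) {
    assert(p != nullptr);
    return (static_cast<std::uint32_t>(p[0]) << 16) |
           (static_cast<std::uint32_t>(p[1]) << 8) |
            static_cast<std::uint32_t>(p[2]);
}

// Map a tag type byte to one of the documented kinds.
TagKind classify(std::uint8_t type) {
    switch (type) {
        case 0x08: return TagKind::Audio;
        case 0x09: return TagKind::Video;
        case 0x12: return TagKind::ScriptData;
        default:   return TagKind::Unknown;  // e.g. the mystery 0x0F tag
    }
}

// Parse the tag header at `offset`. Returns false if the buffer is too short.
bool parseTagHeader(const std::vector<std::uint8_t>& buf, std::size_t offset,
                    TagHeader& out) {
    if (offset > buf.size() || buf.size() - offset < kTagHeaderSize) return false;
    const std::uint8_t* p = buf.data() + offset;
    out.type = static_cast<std::uint8_t>(p[0] & 0x1F);  // assumption: top 3 bits reserved
    out.dataSize = readU24(p + 1);
    // Byte 7 extends the 24-bit timestamp to 32 bits (it holds the high byte).
    out.timestamp = (static_cast<std::uint32_t>(p[7]) << 24) | readU24(p + 4);
    return true;
}

// Offset of the next tag header: skip this header, its payload, and the trailer.
std::size_t nextTagOffset(std::size_t offset, const TagHeader& h) {
    return offset + kTagHeaderSize + h.dataSize + kPreviousTagSizeBytes;
}
```

Because every tag header carries its own data size, a reader can step past a tag it does not recognize without understanding the payload at all. Interpreting the payload, including that extra 0x00 byte, is a separate problem.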

Sunday, May 27, 2012

FLV Script Data Extractor in Progress

Recently, I have been working on a program to extract script data from FLV files. Script data is basically any data in the file that isn’t audio or video data. This includes metadata (included at the start of the file) and event data (included anywhere). I have a very specific purpose for extracting this data, and when I first started this project, I kept the scope focussed on just that specific purpose.

I began by researching FLV files in general. I knew that there was somehow some data embedded in the file, but I did not know how or where. After digging around, I came across something called a cue point, which is apparently used to trigger an event in the ActionScript (ActionScript 3 is Flash’s current scripting language). I tried downloading some programs that were supposed to be able to view and edit cue points, but none of them found the embedded data I sought. Dismayed, I put the project on ice for a while.

After renewing my interest in the project (I forget exactly what sparked this renewed interest), I decided to try loading the FLV file directly into Flash and seeing if I could access the embedded data through ActionScript. Just accessing the necessary classes for doing this was a pain. You see, I use a program called FlashDevelop instead of Adobe Flash Professional CS, so I had to download a special library file to get access to the classes I needed. I suppose that’s the price you pay when you use free software. The relevant ActionScript 3 class for playing FLV files is FLVPlayback. This class contains ways of accessing those cue points I mentioned before. I wrote some script for loading the FLV file and then checking for cue points, but FLVPlayback always reported no cue points. I found this perplexing. I consulted Adobe’s ActionScript 3 documentation and met with no success. Unfortunately, Adobe’s documentation is not always thorough and I sometimes find myself faced with bizarre behavior and no documentation to explain it, but I digress.

Now, I didn’t know if there was some reason why ActionScript reported no cue points or if maybe this FLV used some other, less obvious means to embed its data, but I did know that I was frustrated by the lack of an explanation. I decided that I would just learn the FLV file format myself and write a C++ program to find and extract the data I wanted.

After some searching, I finally found Adobe’s own documentation on the FLV file format. I first opened my test FLV file in a hex editor to verify that I had the correct documentation. To my delight, all of the header bytes matched up perfectly. Armed with this excellent documentation, I started a new project in Visual C++ 2010 Express (free software makes me smile). As with many new projects, I didn’t have a well-defined notion of exactly what I wanted to do. When I started this project, I did not fully understand the FLV format and mistakenly thought that all script data was referred to as metadata, so, like a derp, I named the project FLV Metadata Parser. It’s no big deal. I just renamed the output file later after I realized my error. After several days of effort, I had a working script data extractor. Having proved that my technique worked, I spent some time generalizing my code so that I could do more than just output the script data. After all, my ultimate goal was never simply to do that. Realizing that a general script data extractor might be useful to others, I decided to clean it up and provide it for download. I’m still in the process of cleaning it up and testing it, but I should be able to make it available shortly.
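
The header check described above (comparing the first bytes of the file against the documentation) can also be done in code. Below is a rough C++ sketch, assuming the 9-byte header layout from Adobe’s specification: the signature bytes 'F', 'L', 'V', a version byte, a flags byte (0x04 for audio, 0x01 for video), and a 32-bit big-endian offset to the start of the file body. FlvHeader and parseFlvHeader are hypothetical names chosen for this example, not part of the extractor.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical struct for illustration; field names are not from the spec.
struct FlvHeader {
    std::uint8_t  version;     // file format version (1 in the files I know of)
    bool          hasAudio;    // flags bit 0x04
    bool          hasVideo;    // flags bit 0x01
    std::uint32_t dataOffset;  // header length in bytes, normally 9
};

// Read a 32-bit big-endian unsigned integer.
std::uint32_t readU32(const std::uint8_t* p) {
    assert(p != nullptr);
    return (static_cast<std::uint32_t>(p[0]) << 24) |
           (static_cast<std::uint32_t>(p[1]) << 16) |
           (static_cast<std::uint32_t>(p[2]) << 8) |
            static_cast<std::uint32_t>(p[3]);
}

// Returns true and fills `out` if `buf` starts with a plausible FLV header.
bool parseFlvHeader(const std::vector<std::uint8_t>& buf, FlvHeader& out) {
    const std::size_t kHeaderSize = 9;
    if (buf.size() < kHeaderSize) return false;
    // The signature is the ASCII letters "FLV".
    if (buf[0] != 'F' || buf[1] != 'L' || buf[2] != 'V') return false;
    out.version = buf[3];
    out.hasAudio = (buf[4] & 0x04) != 0;
    out.hasVideo = (buf[4] & 0x01) != 0;
    out.dataOffset = readU32(buf.data() + 5);
    // An offset smaller than the header itself cannot be valid.
    return out.dataOffset >= kHeaderSize;
}
```

A file that fails this check is either not an FLV or does not match the documentation at hand, which is the same conclusion the hex editor comparison gives.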