MMF for Large File Processing

File processing in Tcl is usually done with channel and file commands. However, when you face a huge number of extremely large files, the performance of traditional file processing is poor, and a file cannot be larger than 2GB because of the limitations of 32-bit processors. The Memory Mapping File (MMF) package is designed to solve these problems.

A Memory Mapping File (MMF) allocates an address space and links it to a disk file, so you can access the file like a block of memory. The operating system handles tasks such as paging and cache management. In Windows 2000 and XP, MMF is the sole method for sharing memory among processes.

Compared to traditional file processing, MMF can be up to 7 times faster.

MMF Package

The MMF package is a Tcl extension. It consists of three objects: mmf, map, and view. mmf is a file object, map is a memory mapping object of an mmf, and view is a view of a map.

Key properties of MMF package

  • MMF is best suited to fast processing of large files.
  • MMF can be used to reverse, copy, split, merge, transmit, and search files.
  • MMF is the only way to share memory among processes.
  • MMF allows multiple GUI viewers to present a common data set.
  • MMF gives high performance for application integration.
  • MMF requires no I/O buffers and consumes few system resources.
  • MMF does not apply to shared network files.

Memory Mapping File (mmf)

  • mmf::create FileName DesiredAccess ShareMode Creation FlagsAndAttributes

    Creates and returns a memory mapping file object. mmf is the namespace.

    FileName The name of the file to open or create.
    DesiredAccess The access control for the object.

      generic_none Query access to the object.
      generic_read Read access to the object.
      generic_write Write access to the object.

    ShareMode Specifies how the object can be shared.

      share_none Not shared
      share_read Shared for reading
      share_write Shared for writing

    Creation Specifies how the file is created or opened.

      create_new Creates a new file. The function fails if the specified file already exists.
      create_always Creates a new file. The function overwrites the file if it exists.
      open_existing Opens the file. The function fails if the file does not exist.
      open_always Opens the file, if it exists. If the file does not exist, the function creates the file as if Creation were create_new.
      truncate_existing Opens the file. Once opened, the file is truncated so that its size is zero bytes.

    FlagsAndAttributes A list of flags and attributes.

    The flags are listed below.

      flag_write_through Instructs the system to write through any intermediate cache and go directly to disk. Windows can still cache write operations, but cannot lazily flush them.
      flag_no_buffering Instructs the system to open the file with no intermediate buffering or caching.
      flag_random_access Indicates that the file is accessed randomly. The system can use this as a hint to optimize file caching.
      flag_sequential_scan Indicates that the file is to be accessed sequentially from beginning to end. Specifying this flag can increase performance for applications that read large files using sequential access.

    The attributes are listed below.

      attribute_archive The file should be archived. Applications use this attribute to mark files for backup or removal.
      attribute_compressed The file or directory is compressed. For a file, this means that all of the data in the file is compressed. For a directory, this means that compression is the default for newly created files and subdirectories.
      attribute_hidden The file is hidden.
      attribute_normal The file has no other attributes set. This attribute is valid only if used alone.
      attribute_offline The data of the file is not immediately available.
      attribute_readonly The file is read only.
      attribute_system The file is part of or is used exclusively by the operating system.
      attribute_temporary The file is being used for temporary storage. File systems attempt to keep all of the data in memory for quicker access.

  • mmf::memory Protect LowMaximumSize HighMaximumSize [MapName]

    Creates a memory-only mapping with the parameters below. No file is created. The allocated memory block can be used for data sharing among processes.

    Protect Protection for the mapping object, one of the following items.

      page_readonly Gives read-only access to the pages. The file must have generic_read access.
      page_readwrite Gives read-write access to the pages. The file must have generic_read and generic_write access.
      page_writecopy Gives copy-on-write access to the pages. The file must have generic_read and generic_write access.
    LowMaximumSize The low-order 32 bits of the maximum size of the file-mapping object. When both HighMaximumSize and LowMaximumSize are zero, the maximum size of the file-mapping object is equal to the current size of the file.
    HighMaximumSize The high-order 32 bits of the maximum size of the file-mapping object.
    MapName The name of the mapping object.

  • mmf::open DesiredAccess bInheritHandle MapName

    Opens an existing named memory mapping object with the parameters below.

    DesiredAccess The access mode.

      map_write Read-write access. The map must have been created with page_readwrite.
      map_read Read-only access. The map must have been created with page_readwrite or page_readonly.
      map_all_access Same as map_write.
      map_copy Copy-on-write access. If you create the map with page_writecopy and the view with map_copy, you receive a view of the file. If you write to it, the pages become swappable automatically and your modifications do not go to the original data file.
    bInheritHandle When bInheritHandle is true, a sub-process inherits this handle automatically.
    MapName The name of the mapping object. It must be the same name that was given when the mapping object was created.

  • mmf::copy src dest

    Copies the file src to dest with mmf.

  • mmf::merge src dest

    Merges the files in the list src into the file named dest. The total file size is assumed to be less than 2GB.

  • mmf::split src dest size

    Splits the file named src into files dest_i, each of the given size.

  • $mmf close

    Closes the mmf object.

  • $mmf length

    Returns the mmf file length as a list pair {low high}.

  • $mmf position LowOffset HighOffset Method

    Sets the file position and size.
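Because `$mmf length` returns the length as a {low high} pair of 32-bit values, a script usually has to combine the two halves before doing arithmetic on the size. The following is a minimal sketch, assuming Tcl 8.5 or later (where integers are not limited to 32 bits). The proc name `pairToSize` is hypothetical and is not part of the MMF package.

```tcl
# Combine a {low high} pair of 32-bit values into one byte count.
# pairToSize is a hypothetical helper, not an MMF package command.
proc pairToSize {pair} {
    lassign $pair low high
    # Mask the low word in case it is reported as a negative 32-bit value.
    return [expr {($high << 32) | ($low & 0xFFFFFFFF)}]
}

puts [pairToSize {2994176 0}]   ;# 2994176
puts [pairToSize {0 1}]         ;# 4294967296
```

With the package loaded, the pair would come from `[$mmf length]`.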

Memory Mapping Object (map)

  • $mmf map Protect LowMaximumSize HighMaximumSize [MapName]

    Creates a memory mapping with the parameters below.

    Protect Protection for the mapping object, one of the following items.

      page_readonly Gives read-only access to the pages. The file must have generic_read access.
      page_readwrite Gives read-write access to the pages. The file must have generic_read and generic_write access.
      page_writecopy Gives copy-on-write access to the pages. The file must have generic_read and generic_write access.
    LowMaximumSize The low-order 32 bits of the maximum size of the file-mapping object. When both HighMaximumSize and LowMaximumSize are zero, the maximum size of the file-mapping object is equal to the current size of the file.
    HighMaximumSize The high-order 32 bits of the maximum size of the file-mapping object.
    MapName The name of the mapping object.

  • $map close

    Closes the map object.
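The LowMaximumSize and HighMaximumSize parameters are the two 32-bit halves of one size, so a size above 4GB has to be split before it is passed to `$mmf map`. The following is a minimal sketch, assuming Tcl 8.5 or later. The proc name `sizeToPair` is hypothetical and is not part of the MMF package.

```tcl
# Split a byte count into its low-order and high-order 32-bit halves.
# sizeToPair is a hypothetical helper, not an MMF package command.
proc sizeToPair {size} {
    set low  [expr {$size & 0xFFFFFFFF}]
    set high [expr {($size >> 32) & 0xFFFFFFFF}]
    return [list $low $high]
}

# 6 GB = 6442450944 bytes
puts [sizeToPair 6442450944]   ;# 2147483648 1
```

The two elements would then be passed in the order the command expects, for example `$mmf map {page_readwrite} $low $high`.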

Memory Mapping View (view)

  • $map view DesiredAccess LowFileOffset HighFileOffset NumOfBytesToMap

    Creates a view object from the map with the parameters below.

    DesiredAccess The access mode.

      map_write Read-write access. The map must have been created with page_readwrite.
      map_read Read-only access. The map must have been created with page_readwrite or page_readonly.
      map_all_access Same as map_write.
      map_copy Copy-on-write access. If you create the map with page_writecopy and the view with map_copy, you receive a view of the file. If you write to it, the pages become swappable automatically and your modifications do not go to the original data file.
    LowFileOffset The low-order 32 bits of the file offset. The offset must be a multiple of the allocation granularity.
    HighFileOffset The high-order 32 bits of the file offset.
    NumOfBytesToMap The number of bytes to map. 0 means that the entire file is mapped.

  • $view close

    Closes the view object.

  • $view flush dwBaseOffset dwNumOfBytesToFlush

    Flushes dwNumOfBytesToFlush bytes of the view memory, starting at dwBaseOffset.

  • $view get type offset length

    Returns a string of the given length, starting at offset. The type can be binary or string.

  • $view put type str offset

    Puts str into the view at offset. The type can be binary or string.
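Since the file offset of a view must be a multiple of the allocation granularity, an arbitrary byte position has to be rounded down before the view is created, and the remainder becomes an offset inside the view. The following is a minimal sketch. The proc name `alignView` is hypothetical, and the default granularity of 65536 bytes is an assumption (a value commonly reported by Windows); the MMF documentation above does not state how to query the actual value.

```tcl
# Round a file offset down to the allocation granularity.
# Returns {base delta bytes}: the aligned view offset, the position of the
# requested byte inside the view, and the number of bytes to map.
# alignView is hypothetical; 65536 is an assumed granularity.
proc alignView {offset length {granularity 65536}} {
    set base  [expr {$offset - ($offset % $granularity)}]
    set delta [expr {$offset - $base}]
    return [list $base $delta [expr {$length + $delta}]]
}

puts [alignView 100000 500]   ;# 65536 34464 34964
```

In this example the view would start at file offset 65536 and map 34964 bytes, and the 500 bytes of interest would begin at offset 34464 within the view.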

Examples

  1. Open File for Read/Write

    Load mmf package.

    package require mmf
    

    Test file access

    proc FileAccess {fname} {
      set mmf [mmf::create $fname {generic_read generic_write} \
        {share_none} {open_existing} {attribute_normal}]
        set map [$mmf map {page_readwrite} 0 0 "MMF File"]
          set view [$map view {map_write} 0 0 0]
    

    Do the processing here, then close the view, map, and mmf in reverse order.

          $view close
        $map close
      $mmf close
    }
    
  2. File Copy

    Open the src mmf for reading and create the dest mmf for reading and writing.

    proc FileCopy {src dest} {
      set mmf(0) [mmf::create $src {generic_read} \
        {share_read} {open_existing} {attribute_normal}]
      set mmf(1) [mmf::create $dest {generic_read generic_write} \
        {share_read share_write} {create_always} {attribute_normal}]
    

    Create two maps, one read-only and the other read-write. [$mmf(0) length] returns the length as a pair of low and high 32-bit values. Here we assume that the file size is less than 2GB, so only the low value is used.

        set len [lindex [$mmf(0) length] 0]
        set map(0) [$mmf(0) map {page_readonly} 0 0 "MMF"]
        set map(1) [$mmf(1) map {page_readwrite} $len 0]
    

    Create views with offset 0 and size 0, which maps the whole file.

          set view(0) [$map(0) view {map_read} 0 0 0]
          set view(1) [$map(1) view {map_write} 0 0 0]
    

    Copy the contents of the source view to the destination view.

          set s [$view(0) get binary 0 $len]
          $view(1) put binary $s 0
    

    Close the two views, maps, and mmfs.

          $view(1) close 
          $view(0) close
        $map(1) close
        $map(0) close
      $mmf(1) close
      $mmf(0) close
    }
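The FileCopy example maps the whole file in a single view, which is why it assumes a file under 2GB. For larger files, one possible approach is to copy through a sequence of smaller views. The following is a minimal sketch that only computes the list of {offset length} windows; it does not call the MMF package. The proc name `viewWindows` and the 64MB default window are hypothetical choices. The window size is kept a multiple of 65536 bytes (an assumed allocation granularity) so that every offset stays aligned.

```tcl
# Plan a copy of `total` bytes as a list of {offset length} windows.
# viewWindows is a hypothetical helper; requires Tcl 8.5+ for min().
proc viewWindows {total {window 67108864}} {
    set plan {}
    for {set off 0} {$off < $total} {incr off $window} {
        # The last window may be shorter than the others.
        lappend plan [list $off [expr {min($window, $total - $off)}]]
    }
    return $plan
}

# The 35388656-byte benchmark file with 16MB windows
puts [viewWindows 35388656 16777216]
# {0 16777216} {16777216 16777216} {33554432 1834224}
```

Each window would then be mapped with `$map view`, passing the offset as LowFileOffset and HighFileOffset and the length as NumOfBytesToMap. Whether the offsets given to `$view get` and `$view put` are relative to the start of the view is not stated in this document, so that detail would need to be checked against the package.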
    
  3. Shared Data Among Processes

    MMF can be used to share memory among processes. The following example demonstrates one Tk process that sends text from a dialog box and another Tk process that receives that text from the shared memory.

    First process: writes to shared memory

    Loads the mmf package and sets the window size.

    package require mmf
    wm geometry . 160x60+480+100
    

    Creates a read/write memory mapping object of size 4K named "SharedMemory".

    set m [mmf::memory {page_readwrite} 4096 0 "SharedMemory"]
    

    Creates a read/write view of the entire memory mapping object, which is selected by size 0.

    set v [$m view {map_read map_write} 0 0 0]
    

    Creates the entry widget.

    entry .field -textvariable s -relief sunken
    

    Creates the Send and Close buttons. Clicking Send puts the entry variable s into the view.

    frame .f
    button .f.send -text Send -command {
      $v put string $s 0}
    button .f.close -text Close -command {
      $v close; $m close; exit}
    

    Packs the entry and buttons.

    pack .f.send .f.close -side left -padx 2 -pady 2
    pack .field .f -padx 4 -pady 2
    

    Second process: reads from shared memory

    Loads the mmf package and sets the window size.

    package require mmf
    wm geometry . 160x60+300+100
    

    Opens the existing memory mapping object named "SharedMemory" with read/write access.

    set m [mmf::open {map_read map_write} 0 "SharedMemory"]
    

    Creates a read/write view of the entire memory mapping object, which is selected by size 0.

    set v [$m view {map_read map_write} 0 0 0]
    

    Creates the entry widget.

    entry .field -textvariable s -relief sunken
    

    Creates the Receive and Close buttons. Clicking Receive gets the string from the shared view.

    frame .f
    button .f.send -text Receive -command {
      set s [$v get string 0 4096]}
    button .f.close -text Close -command {
      $v close; $m close; exit}
    

    Packs the entry and buttons.

    pack .f.send .f.close -side left -padx 2 -pady 2
    pack .field .f -padx 4 -pady 2
    

Performance

The first benchmark measures the copy time for a file of about 2.9MB with both the MMF and NORMAL methods. MMF is about 320.5 percent faster, which is roughly 4.2 times the speed of the normal method.

Benchmark File Copy (1)
File Length 2994176
MMF 200000 microseconds per iteration
NORMAL 841000 microseconds per iteration
SPEEDUP 320.5%

The second benchmark measures the copy time for a file of about 35MB with both the MMF and NORMAL methods. MMF is nearly 640 percent faster, which is roughly 7.4 times the speed of the normal method.

Benchmark File Copy (2)
File Length 35388656
MMF 2373000 microseconds per iteration
NORMAL 17536000 microseconds per iteration
SPEEDUP 638.980193847%
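The SPEEDUP figures in both tables are consistent with the time saved relative to the MMF time, that is (NORMAL - MMF) / MMF, rather than with the plain ratio NORMAL / MMF. The following sketch recomputes both numbers from the timings listed above. The proc names are hypothetical, and the formula is inferred from the reported values, not taken from the benchmark script.

```tcl
# Percent by which MMF is faster: (normal - mmf) / mmf * 100.
proc speedupPercent {normal mmf} {
    return [expr {($normal - $mmf) * 100.0 / $mmf}]
}

# Plain speed ratio: how many times as fast MMF is.
proc speedRatio {normal mmf} {
    return [expr {double($normal) / $mmf}]
}

foreach {normal mmf} {841000 200000 17536000 2373000} {
    puts [format "%.1f%% faster, %.1fx the speed" \
        [speedupPercent $normal $mmf] [speedRatio $normal $mmf]]
}
```

The first pair gives 320.5 percent and the second gives about 639 percent, matching the tables; the corresponding ratios are about 4.2 and 7.4.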

Summary

For larger files, MMF is much faster than the normal method. Since MMF is 4 to 7 times as fast, the MMF package is well worth using. Moreover, MMF can handle files larger than 4GB on a 32-bit PC. Finally, a string can be accessed through a view of a mapping, so it can be processed as a Tcl string by all the powerful Tcl commands and libraries.