MC-GPU
MC-GPU_kernel_v1.3.cu File Reference

Definition of the CUDA GPU kernel for the simulation of x ray tracks in a voxelized geometry. More...

Go to the source code of this file.

Defines

#define LEAP_DISTANCE   256
 Upper limit of the number of random values sampled in a single track.
#define a1_RANECU   40014
 Multipliers and moduli for the two MLCG in RANECU.
#define m1_RANECU   2147483563
#define a2_RANECU   40692
#define m2_RANECU   2147483399

Functions

void track_particles (int history_batch, int histories_per_thread, int num_p, int seed_input, unsigned long long int *image, ulonglong2 *voxels_Edep, float2 *voxel_mat_dens, float2 *mfp_Woodcock_table, float3 *mfp_table_a, float3 *mfp_table_b, struct rayleigh_struct *rayleigh_table, struct compton_struct *compton_table, struct detector_struct *detector_data_array, struct source_struct *source_data_array, ulonglong2 *materials_dose)
 Initialize the image array, ie, set all pixels to zero Essentially, this function has the same effect as the command: "cutilSafeCall(cudaMemcpy(image_device, image, image_bytes, cudaMemcpyHostToDevice))";.
void tally_voxel_energy_deposition (float *Edep, short3 *voxel_coord, ulonglong2 *voxels_Edep)
 Tally the dose deposited in the voxels.
void tally_image (float *energy, float3 *position, float3 *direction, signed char *scatter_state, unsigned long long int *image, struct source_struct *source_data_SHARED, struct detector_struct *detector_data_SHARED)
 Tally a radiographic projection image.
void source (float3 *position, float3 *direction, float *energy, int2 *seed, int *absvox, struct source_struct *source_data_SHARED, struct detector_struct *detector_data_SHARED)
 Source that creates primary x rays, according to the defined source model.
void move_to_bbox (float3 *position, float3 *direction, float3 size_bbox, int *intersection_flag)
 Functions that moves a particle inside the voxelized geometry bounding box.
void init_PRNG (int history_batch, int histories_per_thread, int seed_input, int2 *seed)
 Initialize the pseudo-random number generator (PRNG) RANECU to a position far away from the previous history (leap frog technique).
int abMODm (int m, int a, int s)
 Calculate "(a1*a2) MOD m" with 32-bit integers and avoiding the possible overflow, using the Russian Peasant approach modulo m and the approximate factoring method, as described in: L'Ecuyer and Cote, ACM Trans.
float ranecu (int2 *seed)
 Pseudo-random number generator (PRNG) RANECU returning a float value (single precision version).
double ranecu_double (int2 *seed)
 Pseudo-random number generator (PRNG) RANECU returning a double value.
int locate_voxel (float3 *position, short3 *voxel_coord)
 Find the voxel that contains the current position.
void rotate_double (float3 *direction, double costh, double phi)
 Rotates a vector; the rotation is specified by giving the polar and azimuthal angles in the "self-frame", as determined by the vector to be rotated.
void GRAa (float *energy, double *costh_Rayleigh, int *mat, float *pmax_current, int2 *seed, struct rayleigh_struct *cgra)
 Sample a Rayleigh interaction using the sampling algorithm used in PENELOPE 2006.
void GCOa (float *energy, double *costh_Compton, int *mat, int2 *seed, struct compton_struct *cgco_SHARED)
 Random sampling of incoherent (Compton) scattering of photons, using the sampling algorithm from PENELOPE 2006: Relativistic impulse approximation with analytical one-electron Compton profiles.
void tally_materials_dose (float *Edep, int *material, ulonglong2 *materials_dose)
 Tally the depose deposited inside each material.

Detailed Description

Definition of the CUDA GPU kernel for the simulation of x ray tracks in a voxelized geometry.

This kernel has been optimized to yield a good performance in the GPU but can still be compiled in the CPU without problems. All the CUDA especific commands are enclosed in pre-processor directives that are skipped if the parameter "USING_CUDA" is not defined at compilation time.

Author:
Andreu Badal (Andreu.Badal-Soler@fda.hhs.gov)
Date:
2012/12/12

Definition in file MC-GPU_kernel_v1.3.cu.


Define Documentation

#define a1_RANECU   40014

Multipliers and moduli for the two MLCG in RANECU.

Definition at line 814 of file MC-GPU_kernel_v1.3.cu.

Referenced by init_PRNG(), and update_seed_PRNG().

#define a2_RANECU   40692

Definition at line 816 of file MC-GPU_kernel_v1.3.cu.

Referenced by init_PRNG().

#define LEAP_DISTANCE   256

Upper limit of the number of random values sampled in a single track.

Definition at line 812 of file MC-GPU_kernel_v1.3.cu.

Referenced by init_PRNG(), and update_seed_PRNG().

#define m1_RANECU   2147483563

Definition at line 815 of file MC-GPU_kernel_v1.3.cu.

Referenced by init_PRNG(), and update_seed_PRNG().

#define m2_RANECU   2147483399

Definition at line 817 of file MC-GPU_kernel_v1.3.cu.

Referenced by init_PRNG().


Function Documentation

int abMODm ( int  m,
int  a,
int  s 
) [inline]

Calculate "(a1*a2) MOD m" with 32-bit integers and avoiding the possible overflow, using the Russian Peasant approach modulo m and the approximate factoring method, as described in: L'Ecuyer and Cote, ACM Trans.

Math. Soft. 17 (1991).

This function has been adapted from "seedsMLCG.f", see: Badal and Sempau, Computer Physics Communications 175 (2006)

Parameters:
[in]m,a,sMLCG parameters
Returns:
(a1*a2) MOD m

Definition at line 919 of file MC-GPU_kernel_v1.3.cu.

Referenced by init_PRNG(), and update_seed_PRNG().

void GCOa ( float *  energy,
double *  costh_Compton,
int *  mat,
int2 seed,
struct compton_struct cgco_SHARED 
) [inline]

Random sampling of incoherent (Compton) scattering of photons, using the sampling algorithm from PENELOPE 2006: Relativistic impulse approximation with analytical one-electron Compton profiles.

Parameters:
[in,out]energyincident and final photon energy (eV)
[out]costh_Comptoncosine of the polar scattering angle
[in]materialCurrent voxel material
[in]seedRANECU PRNG seed

Definition at line 1287 of file MC-GPU_kernel_v1.3.cu.

References compton_struct::fco, compton_struct::fj0, MAX_MATERIALS, MAX_SHELLS, max_value, min_value, compton_struct::noscco, ranecu(), and compton_struct::uico.

Referenced by track_particles().

void GRAa ( float *  energy,
double *  costh_Rayleigh,
int *  mat,
float *  pmax_current,
int2 seed,
struct rayleigh_struct cgra 
) [inline]

Sample a Rayleigh interaction using the sampling algorithm used in PENELOPE 2006.

Parameters:
[in]energyParticle energy (not modified with Rayleigh)
[out]costh_RayleighCosine of the angular deflection
[in]materialCurrent voxel material

Definition at line 1181 of file MC-GPU_kernel_v1.3.cu.

References rayleigh_struct::aco, rayleigh_struct::bco, rayleigh_struct::itlco, rayleigh_struct::ituco, min_value, NP_RAYLEIGH, rayleigh_struct::pco, ranecu_double(), and rayleigh_struct::xco.

Referenced by track_particles().

void init_PRNG ( int  history_batch,
int  histories_per_thread,
int  seed_input,
int2 seed 
) [inline]

Initialize the pseudo-random number generator (PRNG) RANECU to a position far away from the previous history (leap frog technique).

Each calculated seed initiates a consecutive and disjoint sequence of pseudo-random numbers with length LEAP_DISTANCE, that can be used to in a parallel simulation (Sequence Splitting parallelization method). The basic equation behind the algorithm is: S(i+j) = (a**j * S(i)) MOD m = [(a**j MOD m)*S(i)] MOD m , which is described in: P L'Ecuyer, Commun. ACM 31 (1988) p.742

This function has been adapted from "seedsMLCG.f", see: A Badal and J Sempau, Computer Physics Communications 175 (2006) p. 440-450

Parameters:
[in]historyParticle bach number.
[in]seed_inputInitial PRNG seed input (used to initiate both MLCGs in RANECU).
[out]seedInitial PRNG seeds for the present history.

Definition at line 841 of file MC-GPU_kernel_v1.3.cu.

References a1_RANECU, a2_RANECU, abMODm(), LEAP_DISTANCE, m1_RANECU, m2_RANECU, int2::x, and int2::y.

Referenced by track_particles().

int locate_voxel ( float3 position,
short3 voxel_coord 
) [inline]

Find the voxel that contains the current position.

Report the voxel absolute index and the x,y,z indices. The structure containing the voxel number and size is read from CONSTANT memory.

Parameters:
[in]positionParticle position
[out]voxel_coordPointer to three integer values (short3*) that will store the x,y and z voxel indices.
Returns:
Returns "absvox", the voxel number where the particle is located (negative if position outside the voxel bbox).

Definition at line 1033 of file MC-GPU_kernel_v1.3.cu.

References EPS_SOURCE, voxel_struct::inv_voxel_size, voxel_struct::num_voxels, voxel_struct::size_bbox, voxel_data_CONST, int3::x, float3::x, short3::x, int3::y, float3::y, short3::y, float3::z, and short3::z.

Referenced by track_particles().

void move_to_bbox ( float3 position,
float3 direction,
float3  size_bbox,
int *  intersection_flag 
) [inline]

Functions that moves a particle inside the voxelized geometry bounding box.

An EPSILON distance is added to make sure the particles will be clearly inside the bbox, not exactly on the surface.

This algorithm makes the following assumtions:

  • The back lower vertex of the voxel bounding box is always located at the origin: (x0,y0,z0)=(0,0,0).
  • The initial value of "position" corresponds to the focal spot location.
  • When a ray is not pointing towards the bbox plane that it should cross according to the sign of the direction, I assign a distance to the intersection =0 instead of the real negative distance. The wall that will be crossed to enter the bbox is always the furthest and therefore a 0 distance will never be used except in the case of a ray starting inside the bbox or outside the bbox and not pointing to any of the 3 planes. In this situation the ray will be transported a 0 distance, meaning that it will stay at the focal spot.

(Interesting information on ray-box intersection: http://tog.acm.org/resources/GraphicsGems/gems/RayBox.c)

Parameters:
[in,out]positionParticle position: initially set to the focal spot, returned transported inside the voxel bbox.
[out]directionSampled particle direction (cosine vectors).
[out]intersection_flagSet to <0 if particle outside bbox and will not cross the voxels, not changed otherwise.
[out]size_bboxSize of the bounding box.

Definition at line 714 of file MC-GPU_kernel_v1.3.cu.

References EPS_SOURCE, NEG_EPS_SOURCE, NEG_INF, float3::x, float3::y, and float3::z.

Referenced by source().

float ranecu ( int2 seed) [inline]

Pseudo-random number generator (PRNG) RANECU returning a float value (single precision version).

Parameters:
[in,out]seedPRNG seed (seed kept in the calling function and updated here).
Returns:
PRN double value in the open interval (0,1)

Definition at line 965 of file MC-GPU_kernel_v1.3.cu.

References int2::x, and int2::y.

Referenced by GCOa(), source(), and track_particles().

double ranecu_double ( int2 seed) [inline]

Pseudo-random number generator (PRNG) RANECU returning a double value.

Definition at line 995 of file MC-GPU_kernel_v1.3.cu.

References int2::x, and int2::y.

Referenced by GRAa(), and track_particles().

void rotate_double ( float3 direction,
double  costh,
double  phi 
) [inline]

Rotates a vector; the rotation is specified by giving the polar and azimuthal angles in the "self-frame", as determined by the vector to be rotated.

This function is a literal translation from Fortran to C of PENELOPE (v. 2006) subroutine "DIRECT".

Parameters:
[in,out](u,v,w)input vector (=d) in the lab. frame; returns the rotated vector components in the lab. frame
[in]costhcos(theta), angle between d before and after turn
[in]phiazimuthal angle (rad) turned by d in its self-frame

Definition at line 1103 of file MC-GPU_kernel_v1.3.cu.

References float3::x, float3::y, and float3::z.

Referenced by track_particles().

void source ( float3 position,
float3 direction,
float *  energy,
int2 seed,
int *  absvox,
struct source_struct source_data_SHARED,
struct detector_struct detector_data_SHARED 
) [inline]

Source that creates primary x rays, according to the defined source model.

The particles are automatically moved to the surface of the voxel bounding box, to start the tracking inside a real material. If the sampled particle do not enter the voxels, it is init in the focal spot and the main program will check if it arrives at the detector or not.

Parameters:
[in]source_dataStructure describing the source.
[in]source_energy_data_CONSTGlobal variable in constant memory space describing the source energy spectrum.
[out]positionInitial particle position (particle transported inside the voxel bbox).
[out]directionSampled particle direction (cosine vectors).
[out]energySampled energy of the new x ray.
[in]seedCurrent seed of the random number generator, requiered to sample the movement direction.
[out]absvoxSet to <0 if primary particle will not cross the voxels, not changed otherwise (>0).

Definition at line 626 of file MC-GPU_kernel_v1.3.cu.

References source_struct::cos_theta_low, source_struct::D_cos_theta, source_struct::D_phi, source_energy_struct::espc, source_energy_struct::espc_alias, source_energy_struct::espc_cutoff, source_struct::max_height_at_y1cm, move_to_bbox(), source_energy_struct::num_bins_espc, source_struct::phi_low, source_struct::position, ranecu(), source_struct::rot_fan, detector_struct::rotation_flag, voxel_struct::size_bbox, source_energy_data_CONST, voxel_data_CONST, float3::x, float3::y, and float3::z.

Referenced by track_particles().

void tally_image ( float *  energy,
float3 position,
float3 direction,
signed char *  scatter_state,
unsigned long long int *  image,
struct source_struct source_data_SHARED,
struct detector_struct detector_data_SHARED 
) [inline]

Tally a radiographic projection image.

This function is called whenever a particle escapes the voxelized volume. The code checks if the particle would arrive at the detector if it kept moving in a straight line after exiting the voxels (assuming vacuum enclosure). An ideal image formation model is implemented: each pixel counts the total energy of the x rays that enter the pixel (100% detection efficiency for any energy). The image due to primaries and different kinds of scatter is tallied separately.

In the GPU, and atomicAdd() function is used to make sure that multiple threads do not update the same pixel at the same time, which would result in a lose of information. Since the atomicAdd function is only available for 'unsigned long long int' data, the float pixel values are scaled by a factor "SCALE_eV" defined in the header file (eg, #define SCALE_eV 10000.0f) and stored as unsigned long long integers in main memory.

WARNING! If the total tallied signal (for all particles) is larger than "1.8e19/SCALE_eV", there will be a bit overflow and the value will be reset to 0 giving bogus results.

WARNING! The detector plane should be located outside the voxels bounding box. However, since the particles are moved outside the bbox in the last step, they could cross the detector plane anyway. If the particles are less than 2.0 cm behind the detector, they are moved back and detected. Therefore the detector can be a few cm inside the bbox and still work. If the Woodcock mean free path is larger than the distance from the bbox to the detector, we may lose some particles behind the detector!

Parameters:
[in]energyX-ray energy
[in]positionParticle position
[in]directionParticle direction (cosine vectors)
[in]scatter_stateFlag marking primaries, single Compton, single Rayleigh or multiple scattered radiation
[out]imageInteger array containing the image, ie, the pixel values (in tenths of meV)

Definition at line 482 of file MC-GPU_kernel_v1.3.cu.

References detector_struct::center, detector_struct::corner_min_rotated_to_Y, source_struct::direction, detector_struct::inv_pixel_size_X, detector_struct::inv_pixel_size_Z, detector_struct::num_pixels, detector_struct::rot_inv, detector_struct::rotation_flag, SCALE_eV, detector_struct::total_num_pixels, int2::x, float3::x, int2::y, float3::y, and float3::z.

Referenced by track_particles().

void tally_materials_dose ( float *  Edep,
int *  material,
ulonglong2 materials_dose 
) [inline]

Tally the depose deposited inside each material.

This function is called whenever a particle suffers a Compton or photoelectric interaction. The energy released in each interaction is added and later in the report function the total deposited energy is divided by the total mass of the material in the voxelized object to get the dose. This naturally accounts for multiple densities for voxels with the same material (not all voxels have same mass). Electrons are not transported in MC-GPU and therefore we are approximating that the dose is equal to the KERMA (energy released by the photons alone). This approximation is acceptable when there is electronic equilibrium and when the range of the secondary electrons is shorter than the organ size.

The function uses atomic functions for a thread-safe access to the GPU memory. We can check if this tally was disabled in the input file checking if the array materials_dose was allocated in the GPU (disabled if pointer = NULL).

Parameters:
[in]EdepEnergy deposited in the interaction
[in]materialCurrent material id number
[out]materials_doseulonglong2 array storing the mateials dose [in eV/g] and dose^2 (ie, uncertainty).

Definition at line 1547 of file MC-GPU_kernel_v1.3.cu.

References SCALE_eV, ulonglong2::x, and ulonglong2::y.

Referenced by track_particles().

void tally_voxel_energy_deposition ( float *  Edep,
short3 voxel_coord,
ulonglong2 voxels_Edep 
) [inline]

Tally the dose deposited in the voxels.

This function is called whenever a particle suffers a Compton or photoelectric interaction. It is not necessary to call this function if the dose tally was disabled in the input file (ie, dose_ROI_x_max_CONST < 0). Electrons are not transported in MC-GPU and therefore we are approximating that the dose is equal to the KERMA (energy released by the photons alone). This approximation is acceptable when there is electronic equilibrium and when the range of the secondary electrons is shorter than the voxel size. Usually the doses will be acceptable for photon energies below 1 MeV. The dose estimates may not be accurate at the interface of low density volumes.

We need to use atomicAdd() in the GPU to prevent that multiple threads update the same voxel at the same time, which would result in a lose of information. This is very improbable when using a large number of voxels but gives troubles with a simple geometries with few voxels (in this case the atomicAdd will slow down the code because threads will update the voxel dose secuentially).

Parameters:
[in]EdepEnergy deposited in the interaction
[in]voxel_coordVoxel coordinates, needed to check if particle located inside the input region of interest (ROI)
[out]voxels_Edepulonglong2 array containing the 3D voxel dose and dose^2 (ie, uncertainty) as unsigned integers scaled by SCALE_eV.

Definition at line 418 of file MC-GPU_kernel_v1.3.cu.

References dose_ROI_x_max_CONST, dose_ROI_x_min_CONST, dose_ROI_y_max_CONST, dose_ROI_y_min_CONST, dose_ROI_z_max_CONST, dose_ROI_z_min_CONST, SCALE_eV, short3::x, ulonglong2::x, short3::y, ulonglong2::y, and short3::z.

Referenced by track_particles().

void track_particles ( int  history_batch,
int  histories_per_thread,
int  num_p,
int  seed_input,
unsigned long long int *  image,
ulonglong2 voxels_Edep,
float2 voxel_mat_dens,
float2 mfp_Woodcock_table,
float3 mfp_table_a,
float3 mfp_table_b,
struct rayleigh_struct rayleigh_table,
struct compton_struct compton_table,
struct detector_struct detector_data_array,
struct source_struct source_data_array,
ulonglong2 materials_dose 
)

Initialize the image array, ie, set all pixels to zero Essentially, this function has the same effect as the command: "cutilSafeCall(cudaMemcpy(image_device, image, image_bytes, cudaMemcpyHostToDevice))";.

CUDA performs some initialization work the first time a GPU kernel is called. Therefore, calling a short kernel before the real particle tracking is performed may improve the accuracy of the timing measurements in the relevant kernel.

Parameters:
[in,out]imagePointer to the image array.
[in]pixels_per_imageNumber of pixels in the image (ie, elements in the array). Main function to simulate x-ray tracks inside a voxelized geometry. Secondary electrons are not simulated (in photoelectric and Compton events the energy is locally deposited).

The following global variables, in the GPU __constant__ memory are used: voxel_data_CONST, source_energy_data_CONST, detector_data_CONST, mfp_table_data_CONST.

Parameters:
[in]history_batchParticle batch number (only used in the CPU version when CUDA is disabled!, the GPU uses the built-in variable threadIdx)
[in]num_pProjection number in the CT simulation. This variable defines a specific angle and the corresponding source and detector will be used.
[in]histories_per_threadNumber of histories to simulate for each call to this function (ie, for GPU thread).
[in]seed_inputRandom number generator seed (the same seed is used to initialize the two MLCGs of RANECU).
[in]voxel_mat_densPointer to the voxel densities and material vector (the voxelized geometry), stored in GPU glbal memory.
[in]mfp_Woodcock_tableTwo parameter table for the linear interpolation of the Woodcock mean free path (MFP) (stored in GPU global memory).
[in]mfp_table_aFirst element for the linear interpolation of the interaction mean free paths (stored in GPU global memory).
[in]mfp_table_bSecond element for the linear interpolation of the interaction mean free paths (stored in GPU global memory).
[in]rayleigh_tablePointer to the table with the data required by the Rayleigh interaction sampling, stored in GPU global memory.
[in]compton_tablePointer to the table with the data required by the Compton interaction sampling, stored in GPU global memory.
[in,out]imagePointer to the image vector in the GPU global memory.
[in,out]dosePointer to the array containing the 3D voxel dose (and its uncertainty) in the GPU global memory.

Definition at line 135 of file MC-GPU_kernel_v1.3.cu.

References dose_ROI_x_max_CONST, linear_interp::e0, GCOa(), GRAa(), linear_interp::ide, init_PRNG(), locate_voxel(), MAX_MATERIALS, mfp_table_data_CONST, rayleigh_struct::pmax, ranecu(), ranecu_double(), rotate_double(), source(), tally_image(), tally_materials_dose(), tally_voxel_energy_deposition(), float2::x, float3::x, float2::y, float3::y, and float3::z.

Referenced by main().