MC-GPU
|
Definition of the CUDA GPU kernel for the simulation of x ray tracks in a voxelized geometry. More...
Go to the source code of this file.
Defines | |
#define | LEAP_DISTANCE 256 |
Upper limit of the number of random values sampled in a single track. | |
#define | a1_RANECU 40014 |
Multipliers and moduli for the two MLCG in RANECU. | |
#define | m1_RANECU 2147483563 |
#define | a2_RANECU 40692 |
#define | m2_RANECU 2147483399 |
Functions | |
void | track_particles (int history_batch, int histories_per_thread, int num_p, int seed_input, unsigned long long int *image, ulonglong2 *voxels_Edep, float2 *voxel_mat_dens, float2 *mfp_Woodcock_table, float3 *mfp_table_a, float3 *mfp_table_b, struct rayleigh_struct *rayleigh_table, struct compton_struct *compton_table, struct detector_struct *detector_data_array, struct source_struct *source_data_array, ulonglong2 *materials_dose) |
Initialize the image array, ie, set all pixels to zero Essentially, this function has the same effect as the command: "cutilSafeCall(cudaMemcpy(image_device, image, image_bytes, cudaMemcpyHostToDevice))";. | |
void | tally_voxel_energy_deposition (float *Edep, short3 *voxel_coord, ulonglong2 *voxels_Edep) |
Tally the dose deposited in the voxels. | |
void | tally_image (float *energy, float3 *position, float3 *direction, signed char *scatter_state, unsigned long long int *image, struct source_struct *source_data_SHARED, struct detector_struct *detector_data_SHARED) |
Tally a radiographic projection image. | |
void | source (float3 *position, float3 *direction, float *energy, int2 *seed, int *absvox, struct source_struct *source_data_SHARED, struct detector_struct *detector_data_SHARED) |
Source that creates primary x rays, according to the defined source model. | |
void | move_to_bbox (float3 *position, float3 *direction, float3 size_bbox, int *intersection_flag) |
Functions that moves a particle inside the voxelized geometry bounding box. | |
void | init_PRNG (int history_batch, int histories_per_thread, int seed_input, int2 *seed) |
Initialize the pseudo-random number generator (PRNG) RANECU to a position far away from the previous history (leap frog technique). | |
int | abMODm (int m, int a, int s) |
Calculate "(a1*a2) MOD m" with 32-bit integers and avoiding the possible overflow, using the Russian Peasant approach modulo m and the approximate factoring method, as described in: L'Ecuyer and Cote, ACM Trans. | |
float | ranecu (int2 *seed) |
Pseudo-random number generator (PRNG) RANECU returning a float value (single precision version). | |
double | ranecu_double (int2 *seed) |
Pseudo-random number generator (PRNG) RANECU returning a double value. | |
int | locate_voxel (float3 *position, short3 *voxel_coord) |
Find the voxel that contains the current position. | |
void | rotate_double (float3 *direction, double costh, double phi) |
Rotates a vector; the rotation is specified by giving the polar and azimuthal angles in the "self-frame", as determined by the vector to be rotated. | |
void | GRAa (float *energy, double *costh_Rayleigh, int *mat, float *pmax_current, int2 *seed, struct rayleigh_struct *cgra) |
Sample a Rayleigh interaction using the sampling algorithm used in PENELOPE 2006. | |
void | GCOa (float *energy, double *costh_Compton, int *mat, int2 *seed, struct compton_struct *cgco_SHARED) |
Random sampling of incoherent (Compton) scattering of photons, using the sampling algorithm from PENELOPE 2006: Relativistic impulse approximation with analytical one-electron Compton profiles. | |
void | tally_materials_dose (float *Edep, int *material, ulonglong2 *materials_dose) |
Tally the depose deposited inside each material. |
Definition of the CUDA GPU kernel for the simulation of x ray tracks in a voxelized geometry.
This kernel has been optimized to yield a good performance in the GPU but can still be compiled in the CPU without problems. All the CUDA especific commands are enclosed in pre-processor directives that are skipped if the parameter "USING_CUDA" is not defined at compilation time.
Definition in file MC-GPU_kernel_v1.3.cu.
#define a1_RANECU 40014 |
Multipliers and moduli for the two MLCG in RANECU.
Definition at line 814 of file MC-GPU_kernel_v1.3.cu.
Referenced by init_PRNG(), and update_seed_PRNG().
#define a2_RANECU 40692 |
Definition at line 816 of file MC-GPU_kernel_v1.3.cu.
Referenced by init_PRNG().
#define LEAP_DISTANCE 256 |
Upper limit of the number of random values sampled in a single track.
Definition at line 812 of file MC-GPU_kernel_v1.3.cu.
Referenced by init_PRNG(), and update_seed_PRNG().
#define m1_RANECU 2147483563 |
Definition at line 815 of file MC-GPU_kernel_v1.3.cu.
Referenced by init_PRNG(), and update_seed_PRNG().
#define m2_RANECU 2147483399 |
Definition at line 817 of file MC-GPU_kernel_v1.3.cu.
Referenced by init_PRNG().
int abMODm | ( | int | m, |
int | a, | ||
int | s | ||
) | [inline] |
Calculate "(a1*a2) MOD m" with 32-bit integers and avoiding the possible overflow, using the Russian Peasant approach modulo m and the approximate factoring method, as described in: L'Ecuyer and Cote, ACM Trans.
Math. Soft. 17 (1991).
This function has been adapted from "seedsMLCG.f", see: Badal and Sempau, Computer Physics Communications 175 (2006)
[in] | m,a,s | MLCG parameters |
Definition at line 919 of file MC-GPU_kernel_v1.3.cu.
Referenced by init_PRNG(), and update_seed_PRNG().
void GCOa | ( | float * | energy, |
double * | costh_Compton, | ||
int * | mat, | ||
int2 * | seed, | ||
struct compton_struct * | cgco_SHARED | ||
) | [inline] |
Random sampling of incoherent (Compton) scattering of photons, using the sampling algorithm from PENELOPE 2006: Relativistic impulse approximation with analytical one-electron Compton profiles.
[in,out] | energy | incident and final photon energy (eV) |
[out] | costh_Compton | cosine of the polar scattering angle |
[in] | material | Current voxel material |
[in] | seed | RANECU PRNG seed |
Definition at line 1287 of file MC-GPU_kernel_v1.3.cu.
References compton_struct::fco, compton_struct::fj0, MAX_MATERIALS, MAX_SHELLS, max_value, min_value, compton_struct::noscco, ranecu(), and compton_struct::uico.
Referenced by track_particles().
void GRAa | ( | float * | energy, |
double * | costh_Rayleigh, | ||
int * | mat, | ||
float * | pmax_current, | ||
int2 * | seed, | ||
struct rayleigh_struct * | cgra | ||
) | [inline] |
Sample a Rayleigh interaction using the sampling algorithm used in PENELOPE 2006.
[in] | energy | Particle energy (not modified with Rayleigh) |
[out] | costh_Rayleigh | Cosine of the angular deflection |
[in] | material | Current voxel material |
Definition at line 1181 of file MC-GPU_kernel_v1.3.cu.
References rayleigh_struct::aco, rayleigh_struct::bco, rayleigh_struct::itlco, rayleigh_struct::ituco, min_value, NP_RAYLEIGH, rayleigh_struct::pco, ranecu_double(), and rayleigh_struct::xco.
Referenced by track_particles().
void init_PRNG | ( | int | history_batch, |
int | histories_per_thread, | ||
int | seed_input, | ||
int2 * | seed | ||
) | [inline] |
Initialize the pseudo-random number generator (PRNG) RANECU to a position far away from the previous history (leap frog technique).
Each calculated seed initiates a consecutive and disjoint sequence of pseudo-random numbers with length LEAP_DISTANCE, that can be used to in a parallel simulation (Sequence Splitting parallelization method). The basic equation behind the algorithm is: S(i+j) = (a**j * S(i)) MOD m = [(a**j MOD m)*S(i)] MOD m , which is described in: P L'Ecuyer, Commun. ACM 31 (1988) p.742
This function has been adapted from "seedsMLCG.f", see: A Badal and J Sempau, Computer Physics Communications 175 (2006) p. 440-450
[in] | history | Particle bach number. |
[in] | seed_input | Initial PRNG seed input (used to initiate both MLCGs in RANECU). |
[out] | seed | Initial PRNG seeds for the present history. |
Definition at line 841 of file MC-GPU_kernel_v1.3.cu.
References a1_RANECU, a2_RANECU, abMODm(), LEAP_DISTANCE, m1_RANECU, m2_RANECU, int2::x, and int2::y.
Referenced by track_particles().
int locate_voxel | ( | float3 * | position, |
short3 * | voxel_coord | ||
) | [inline] |
Find the voxel that contains the current position.
Report the voxel absolute index and the x,y,z indices. The structure containing the voxel number and size is read from CONSTANT memory.
[in] | position | Particle position |
[out] | voxel_coord | Pointer to three integer values (short3*) that will store the x,y and z voxel indices. |
Definition at line 1033 of file MC-GPU_kernel_v1.3.cu.
References EPS_SOURCE, voxel_struct::inv_voxel_size, voxel_struct::num_voxels, voxel_struct::size_bbox, voxel_data_CONST, int3::x, float3::x, short3::x, int3::y, float3::y, short3::y, float3::z, and short3::z.
Referenced by track_particles().
void move_to_bbox | ( | float3 * | position, |
float3 * | direction, | ||
float3 | size_bbox, | ||
int * | intersection_flag | ||
) | [inline] |
Functions that moves a particle inside the voxelized geometry bounding box.
An EPSILON distance is added to make sure the particles will be clearly inside the bbox, not exactly on the surface.
This algorithm makes the following assumtions:
(Interesting information on ray-box intersection: http://tog.acm.org/resources/GraphicsGems/gems/RayBox.c)
[in,out] | position | Particle position: initially set to the focal spot, returned transported inside the voxel bbox. |
[out] | direction | Sampled particle direction (cosine vectors). |
[out] | intersection_flag | Set to <0 if particle outside bbox and will not cross the voxels, not changed otherwise. |
[out] | size_bbox | Size of the bounding box. |
Definition at line 714 of file MC-GPU_kernel_v1.3.cu.
References EPS_SOURCE, NEG_EPS_SOURCE, NEG_INF, float3::x, float3::y, and float3::z.
Referenced by source().
Pseudo-random number generator (PRNG) RANECU returning a float value (single precision version).
[in,out] | seed | PRNG seed (seed kept in the calling function and updated here). |
Definition at line 965 of file MC-GPU_kernel_v1.3.cu.
References int2::x, and int2::y.
Referenced by GCOa(), source(), and track_particles().
double ranecu_double | ( | int2 * | seed | ) | [inline] |
Pseudo-random number generator (PRNG) RANECU returning a double value.
Definition at line 995 of file MC-GPU_kernel_v1.3.cu.
References int2::x, and int2::y.
Referenced by GRAa(), and track_particles().
void rotate_double | ( | float3 * | direction, |
double | costh, | ||
double | phi | ||
) | [inline] |
Rotates a vector; the rotation is specified by giving the polar and azimuthal angles in the "self-frame", as determined by the vector to be rotated.
This function is a literal translation from Fortran to C of PENELOPE (v. 2006) subroutine "DIRECT".
[in,out] | (u,v,w) | input vector (=d) in the lab. frame; returns the rotated vector components in the lab. frame |
[in] | costh | cos(theta), angle between d before and after turn |
[in] | phi | azimuthal angle (rad) turned by d in its self-frame |
Definition at line 1103 of file MC-GPU_kernel_v1.3.cu.
References float3::x, float3::y, and float3::z.
Referenced by track_particles().
void source | ( | float3 * | position, |
float3 * | direction, | ||
float * | energy, | ||
int2 * | seed, | ||
int * | absvox, | ||
struct source_struct * | source_data_SHARED, | ||
struct detector_struct * | detector_data_SHARED | ||
) | [inline] |
Source that creates primary x rays, according to the defined source model.
The particles are automatically moved to the surface of the voxel bounding box, to start the tracking inside a real material. If the sampled particle do not enter the voxels, it is init in the focal spot and the main program will check if it arrives at the detector or not.
[in] | source_data | Structure describing the source. |
[in] | source_energy_data_CONST | Global variable in constant memory space describing the source energy spectrum. |
[out] | position | Initial particle position (particle transported inside the voxel bbox). |
[out] | direction | Sampled particle direction (cosine vectors). |
[out] | energy | Sampled energy of the new x ray. |
[in] | seed | Current seed of the random number generator, requiered to sample the movement direction. |
[out] | absvox | Set to <0 if primary particle will not cross the voxels, not changed otherwise (>0). |
Definition at line 626 of file MC-GPU_kernel_v1.3.cu.
References source_struct::cos_theta_low, source_struct::D_cos_theta, source_struct::D_phi, source_energy_struct::espc, source_energy_struct::espc_alias, source_energy_struct::espc_cutoff, source_struct::max_height_at_y1cm, move_to_bbox(), source_energy_struct::num_bins_espc, source_struct::phi_low, source_struct::position, ranecu(), source_struct::rot_fan, detector_struct::rotation_flag, voxel_struct::size_bbox, source_energy_data_CONST, voxel_data_CONST, float3::x, float3::y, and float3::z.
Referenced by track_particles().
void tally_image | ( | float * | energy, |
float3 * | position, | ||
float3 * | direction, | ||
signed char * | scatter_state, | ||
unsigned long long int * | image, | ||
struct source_struct * | source_data_SHARED, | ||
struct detector_struct * | detector_data_SHARED | ||
) | [inline] |
Tally a radiographic projection image.
This function is called whenever a particle escapes the voxelized volume. The code checks if the particle would arrive at the detector if it kept moving in a straight line after exiting the voxels (assuming vacuum enclosure). An ideal image formation model is implemented: each pixel counts the total energy of the x rays that enter the pixel (100% detection efficiency for any energy). The image due to primaries and different kinds of scatter is tallied separately.
In the GPU, and atomicAdd() function is used to make sure that multiple threads do not update the same pixel at the same time, which would result in a lose of information. Since the atomicAdd function is only available for 'unsigned long long int' data, the float pixel values are scaled by a factor "SCALE_eV" defined in the header file (eg, #define SCALE_eV 10000.0f) and stored as unsigned long long integers in main memory.
WARNING! If the total tallied signal (for all particles) is larger than "1.8e19/SCALE_eV", there will be a bit overflow and the value will be reset to 0 giving bogus results.
WARNING! The detector plane should be located outside the voxels bounding box. However, since the particles are moved outside the bbox in the last step, they could cross the detector plane anyway. If the particles are less than 2.0 cm behind the detector, they are moved back and detected. Therefore the detector can be a few cm inside the bbox and still work. If the Woodcock mean free path is larger than the distance from the bbox to the detector, we may lose some particles behind the detector!
[in] | energy | X-ray energy |
[in] | position | Particle position |
[in] | direction | Particle direction (cosine vectors) |
[in] | scatter_state | Flag marking primaries, single Compton, single Rayleigh or multiple scattered radiation |
[out] | image | Integer array containing the image, ie, the pixel values (in tenths of meV) |
Definition at line 482 of file MC-GPU_kernel_v1.3.cu.
References detector_struct::center, detector_struct::corner_min_rotated_to_Y, source_struct::direction, detector_struct::inv_pixel_size_X, detector_struct::inv_pixel_size_Z, detector_struct::num_pixels, detector_struct::rot_inv, detector_struct::rotation_flag, SCALE_eV, detector_struct::total_num_pixels, int2::x, float3::x, int2::y, float3::y, and float3::z.
Referenced by track_particles().
void tally_materials_dose | ( | float * | Edep, |
int * | material, | ||
ulonglong2 * | materials_dose | ||
) | [inline] |
Tally the depose deposited inside each material.
This function is called whenever a particle suffers a Compton or photoelectric interaction. The energy released in each interaction is added and later in the report function the total deposited energy is divided by the total mass of the material in the voxelized object to get the dose. This naturally accounts for multiple densities for voxels with the same material (not all voxels have same mass). Electrons are not transported in MC-GPU and therefore we are approximating that the dose is equal to the KERMA (energy released by the photons alone). This approximation is acceptable when there is electronic equilibrium and when the range of the secondary electrons is shorter than the organ size.
The function uses atomic functions for a thread-safe access to the GPU memory. We can check if this tally was disabled in the input file checking if the array materials_dose was allocated in the GPU (disabled if pointer = NULL).
[in] | Edep | Energy deposited in the interaction |
[in] | material | Current material id number |
[out] | materials_dose | ulonglong2 array storing the mateials dose [in eV/g] and dose^2 (ie, uncertainty). |
Definition at line 1547 of file MC-GPU_kernel_v1.3.cu.
References SCALE_eV, ulonglong2::x, and ulonglong2::y.
Referenced by track_particles().
void tally_voxel_energy_deposition | ( | float * | Edep, |
short3 * | voxel_coord, | ||
ulonglong2 * | voxels_Edep | ||
) | [inline] |
Tally the dose deposited in the voxels.
This function is called whenever a particle suffers a Compton or photoelectric interaction. It is not necessary to call this function if the dose tally was disabled in the input file (ie, dose_ROI_x_max_CONST < 0). Electrons are not transported in MC-GPU and therefore we are approximating that the dose is equal to the KERMA (energy released by the photons alone). This approximation is acceptable when there is electronic equilibrium and when the range of the secondary electrons is shorter than the voxel size. Usually the doses will be acceptable for photon energies below 1 MeV. The dose estimates may not be accurate at the interface of low density volumes.
We need to use atomicAdd() in the GPU to prevent that multiple threads update the same voxel at the same time, which would result in a lose of information. This is very improbable when using a large number of voxels but gives troubles with a simple geometries with few voxels (in this case the atomicAdd will slow down the code because threads will update the voxel dose secuentially).
[in] | Edep | Energy deposited in the interaction |
[in] | voxel_coord | Voxel coordinates, needed to check if particle located inside the input region of interest (ROI) |
[out] | voxels_Edep | ulonglong2 array containing the 3D voxel dose and dose^2 (ie, uncertainty) as unsigned integers scaled by SCALE_eV. |
Definition at line 418 of file MC-GPU_kernel_v1.3.cu.
References dose_ROI_x_max_CONST, dose_ROI_x_min_CONST, dose_ROI_y_max_CONST, dose_ROI_y_min_CONST, dose_ROI_z_max_CONST, dose_ROI_z_min_CONST, SCALE_eV, short3::x, ulonglong2::x, short3::y, ulonglong2::y, and short3::z.
Referenced by track_particles().
void track_particles | ( | int | history_batch, |
int | histories_per_thread, | ||
int | num_p, | ||
int | seed_input, | ||
unsigned long long int * | image, | ||
ulonglong2 * | voxels_Edep, | ||
float2 * | voxel_mat_dens, | ||
float2 * | mfp_Woodcock_table, | ||
float3 * | mfp_table_a, | ||
float3 * | mfp_table_b, | ||
struct rayleigh_struct * | rayleigh_table, | ||
struct compton_struct * | compton_table, | ||
struct detector_struct * | detector_data_array, | ||
struct source_struct * | source_data_array, | ||
ulonglong2 * | materials_dose | ||
) |
Initialize the image array, ie, set all pixels to zero Essentially, this function has the same effect as the command: "cutilSafeCall(cudaMemcpy(image_device, image, image_bytes, cudaMemcpyHostToDevice))";.
CUDA performs some initialization work the first time a GPU kernel is called. Therefore, calling a short kernel before the real particle tracking is performed may improve the accuracy of the timing measurements in the relevant kernel.
[in,out] | image | Pointer to the image array. |
[in] | pixels_per_image | Number of pixels in the image (ie, elements in the array). Main function to simulate x-ray tracks inside a voxelized geometry. Secondary electrons are not simulated (in photoelectric and Compton events the energy is locally deposited). |
The following global variables, in the GPU __constant__ memory are used: voxel_data_CONST, source_energy_data_CONST, detector_data_CONST, mfp_table_data_CONST.
[in] | history_batch | Particle batch number (only used in the CPU version when CUDA is disabled!, the GPU uses the built-in variable threadIdx) |
[in] | num_p | Projection number in the CT simulation. This variable defines a specific angle and the corresponding source and detector will be used. |
[in] | histories_per_thread | Number of histories to simulate for each call to this function (ie, for GPU thread). |
[in] | seed_input | Random number generator seed (the same seed is used to initialize the two MLCGs of RANECU). |
[in] | voxel_mat_dens | Pointer to the voxel densities and material vector (the voxelized geometry), stored in GPU glbal memory. |
[in] | mfp_Woodcock_table | Two parameter table for the linear interpolation of the Woodcock mean free path (MFP) (stored in GPU global memory). |
[in] | mfp_table_a | First element for the linear interpolation of the interaction mean free paths (stored in GPU global memory). |
[in] | mfp_table_b | Second element for the linear interpolation of the interaction mean free paths (stored in GPU global memory). |
[in] | rayleigh_table | Pointer to the table with the data required by the Rayleigh interaction sampling, stored in GPU global memory. |
[in] | compton_table | Pointer to the table with the data required by the Compton interaction sampling, stored in GPU global memory. |
[in,out] | image | Pointer to the image vector in the GPU global memory. |
[in,out] | dose | Pointer to the array containing the 3D voxel dose (and its uncertainty) in the GPU global memory. |
Definition at line 135 of file MC-GPU_kernel_v1.3.cu.
References dose_ROI_x_max_CONST, linear_interp::e0, GCOa(), GRAa(), linear_interp::ide, init_PRNG(), locate_voxel(), MAX_MATERIALS, mfp_table_data_CONST, rayleigh_struct::pmax, ranecu(), ranecu_double(), rotate_double(), source(), tally_image(), tally_materials_dose(), tally_voxel_energy_deposition(), float2::x, float3::x, float2::y, float3::y, and float3::z.
Referenced by main().