Dudes Study Note

Posts

Azure Data Store Test

- May 01, 2023

emitestdata
Key: [REDACTED]
Connection String: DefaultEndpointsProtocol=https;AccountName=emitestdata;AccountKey=[REDACTED];EndpointSuffix=core.windows.net
Client ID: 76814de2-1089-4b58-aed2-6b16ec12cd30
Client Name: emi2everything
Client Secret id: cdc0af8a-3979-4f86-bbbc-151872cab98c

Performance Issue with the Tuple Join

- November 04, 2019

Spark 2.2 Count Table Multiple Time Issue

- March 20, 2018

Problem
In Spark 2.2, counting a table multiple times returns the same value regardless of the table's actual state.

Description
Table A initially has 100 records.
Loop:
(1) int countBefore = Spark table A count
(2) // Remove 2 records from table A
(3) int countAfter = Spark table A count
* Note that (1) was tried with both session.sql(select count(*)) and session.sql(select *).count()

Behavior
1st iteration: countBefore = 100, countAfter = 98
2nd iteration: countBefore = 100, countAfter = 96

Explanation and Solution
It looks like the count is cached as an optimization, even though the variable countBefore is declared inside the loop iteration, so variable scope does not help here. The solution is to call session.catalog.clearCache() at the beginning of every iteration. The other attempt, session.catalog.refreshTable(tableA), did not solve the problem.

Java Concurrency: Final

- February 01, 2018

What does FINAL mean? (The definition below applies only to JDK 6+.)
1) The address of the enclosing object is not allowed to escape until the final variable is initialized and the change has been written to memory.
2) The JVM can execute special CPU caching instructions.
3) All changes made in the constructor to a final variable, even through a reference, are pushed to memory.
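The three points above can be illustrated with a small immutable class. This is only a sketch: the names Range and FinalFieldDemo are hypothetical and do not come from the original post. It shows the usual pattern of assigning every final field inside the constructor and never handing out `this` before the constructor returns.

```java
import java.util.Arrays;

// Hypothetical immutable class, used only for illustration.
final class Range {
    private final int low;     // final primitive field
    private final int[] marks; // final reference; the array is filled in the constructor

    Range(int low, int[] marks) {
        this.low = low;
        // Defensive copy, completed before the constructor returns.
        this.marks = Arrays.copyOf(marks, marks.length);
        // 'this' is not passed to any other object or thread here,
        // so the reference does not escape during construction.
    }

    int low() { return low; }

    int markAt(int index) { return marks[index]; }
}

final class FinalFieldDemo {
    public static void main(String[] args) {
        int[] source = {10, 20};
        Range range = new Range(3, source);
        source[1] = 99; // later changes to the caller's array do not affect the copy
        System.out.println(range.low() + " " + range.markAt(1));
    }
}
```

Registering `this` with a listener or starting a thread from inside the constructor would let the reference escape early, which is the situation point 1 warns about.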

Java Synchronization: volatile

- February 01, 2018

What does "volatile" mean? (New JMM, Java 1.5+)

1) A volatile variable is read from memory and written to memory instead of a local cache. The picture below shows an example of a multi-core system. Each CPU has a cache, which is very fast memory local to that CPU. As an optimization, a lot of data is kept in the cache. But in a multi-threaded application that uses a shared variable, if each CPU updates and fetches the value from its own cache, the threads can see a wrong value for the shared variable. Example: a counter. With volatile, every time the variable is read, it is fetched from memory. Every time it is written, it is pushed from L1 to L2 to memory.

2) Instructions that use a volatile variable cannot be reordered.

3) A volatile variable observes what happened. For example, the old values are w = 0, x = 0, f = true, and f is a volatile variable. Now CPU1 updates x = 2 and f = false. As f is volatile, it is flushed to RAM. x is not volatile; however, f observes that x changes as well, so
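The scenario in point 3 can be sketched in Java as follows. The class and method names are made up for illustration; only x and f come from the text. One thread writes the plain field x and then the volatile flag f, and another thread waits for the flag to change before reading x. Under the Java memory model (Java 5 and later), a reader that sees f == false also sees x == 2.

```java
final class VolatileFlagDemo {
    static int x = 0;                 // plain, non-volatile field
    static volatile boolean f = true; // volatile flag

    // Runs one writer and one reader thread and returns the value of x
    // that the reader observed after it saw f become false.
    static int writeThenRead() {
        x = 0;
        f = true;
        int[] seen = new int[1];

        Thread reader = new Thread(() -> {
            while (f) {
                // Spin until the volatile write f = false becomes visible.
            }
            seen[0] = x; // this read happens after the volatile read of f
        });
        Thread writer = new Thread(() -> {
            x = 2;     // ordinary write, placed before the volatile write
            f = false; // volatile write publishes the earlier write to x
        });

        reader.start();
        writer.start();
        try {
            reader.join();
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(writeThenRead());
    }
}
```

If f were not volatile, the reader would have no guarantee of ever seeing the update to f or x, and the loop might never finish.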

Java Object Size and Overhead

- February 01, 2018

Let's look at a few details of the object header and calculate the memory an object occupies inside the JVM heap. Each object contains the following information.
• The object header.
• The memory for primitive types.
• The memory for reference types.
• Offset / alignment: these are a few unused bytes placed after the object's data. They keep every address in memory a multiple of the machine word, which speeds up memory reads and reduces the number of bits needed for a pointer to an object. It is also worth noting that in Java the size of any object is a multiple of 8 bytes!
• Object Header: on a 32-bit system the header size is 8 bytes; on a 64-bit system it is 16 bytes. It contains the following information.
1 Hash Code -
2 Garbage Collection Information - each Java object contains the information needed for memory management.
3 Type Information Block Pointe
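The size calculation described above can be sketched as simple arithmetic. This is a rough estimate only, not a measurement of a real JVM: the class and method names are hypothetical, the header sizes (8 and 16 bytes) are the ones quoted in the text, and the reference sizes (4 bytes on 32-bit, 8 bytes on 64-bit) are an assumption that ignores compressed pointers and any JVM-specific field layout.

```java
final class ObjectSizeEstimate {
    static final int ALIGNMENT = 8; // object sizes are rounded up to a multiple of 8 bytes

    // Rounds size up to the next multiple of ALIGNMENT.
    static long alignUp(long size) {
        return (size + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    // header + primitive fields + reference fields, then alignment padding.
    static long estimate(int headerBytes, int primitiveBytes,
                         int referenceCount, int referenceBytes) {
        long raw = headerBytes + primitiveBytes + (long) referenceCount * referenceBytes;
        return alignUp(raw);
    }

    public static void main(String[] args) {
        // An object with one int field (4 bytes) and one reference field.
        System.out.println(estimate(16, 4, 1, 8)); // 64-bit: 16 + 4 + 8 = 28, padded to 32
        System.out.println(estimate(8, 4, 1, 4));  // 32-bit: 8 + 4 + 4 = 16, no padding needed
    }
}
```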