Consider the standard partially linear model (PLM): Y_i = X_i^T\beta + g(W_i) + u_i. Here we assume (X_i, W_i, u_i)_{i=1}^n are i.i.d., X_i \in \mathbb{R}^p, W_i, u_i \in \mathbb{R} with p \gg n, g(\cdot) is an unknown nonlinear function, and s := \|\beta\|_0 is much smaller than p. Our aim is to estimate \beta. To this end, the existing literature has focused on least squares estimators (LSEs), and the best theoretical results to date establish rate-optimality under the following four assumptions: (i) g \in \mathcal{G} for some known function class \mathcal{G} with finite entropy integral; (ii) s^2 \log p / n \to 0; (iii) u_1 is subgaussian; (iv) \max_{i,j} |X_{ij}| \leq M for some universal constant M. While some of these could be proof artifacts, the others are tied to long-standing problems in nonparametric statistics and are arguably difficult to relax. In this talk, an alternative to the LSE, Honoré and Powell's pairwise difference approach, is shown to attain rate-optimality with all four assumptions relaxed. In particular, without sacrificing the rate, the estimator automatically adapts to the unknown g, and works even when g is piecewise Hölder with discontinuities, a result new to both the statistics and econometrics communities. In addition, we improve the scaling condition, allow X_1 to be subgaussian, and require only \mathbb{E} u_1^2 \leq M_u for some universal constant M_u. These improvements rest on newly developed U-process theory.
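
To fix ideas, here is a minimal sketch of the pairwise difference idea; the kernel weighting and the \ell_1 penalty below are our illustrative assumptions for the high-dimensional setting, not a verbatim statement of the estimator analyzed in the talk. Differencing two observations eliminates the nuisance function up to a small remainder:

Y_i - Y_j = (X_i - X_j)^T\beta + \{g(W_i) - g(W_j)\} + (u_i - u_j).

One may then estimate \beta by weighted, penalized least squares over pairs:

\hat{\beta} \in \arg\min_{\beta \in \mathbb{R}^p} \binom{n}{2}^{-1} \sum_{1 \leq i < j \leq n} K\big((W_i - W_j)/h\big)\, \big\{(Y_i - Y_j) - (X_i - X_j)^T\beta\big\}^2 + \lambda \|\beta\|_1,

where the kernel K with bandwidth h downweights pairs with W_i far from W_j, so that the nuisance term g(W_i) - g(W_j) is small on the pairs that carry weight; no model for g itself is ever fit, which is the source of the adaptivity. The objective is a second-order U-statistic in \beta, which is why U-process tools drive the analysis.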